Design goals

The design goals of the Eternity Service protocol described in this document are to:

The design does not address the orthogonal goal of:

because anonymous browsing can be provided by other services, such as Freedom, Anonymizer, LWPA/ProxyMate and Crowds.

Similarly the following issue:

is not addressed, as it can be handled separately by third party filtering services such as NetNanny, or more positively via third party ratings or categorised indexes (such as Yahoo).

System components

The Eternity service consists of three major components: storage, retrieval and publication services. If the publication service fails, existing materials can still be retrieved.

Storage

Storage servers hold all the materials available in the Eternity system in a highly redundant way. Each file is held at a number of locations specified at publication; the value of the file and risk of attack should affect the publisher’s decision.

Server operators are particularly vulnerable to legal attacks. British libel actions are commonly brought against the distributor of libellous material as well as its author. As protection against coercion of server administrators, our system hides the material they are serving even from them.

Data is stored in uniform 5k blobs that are DES_EDE3 encrypted with a user-chosen key. Each blob in the system is given a unique 160-bit name by the person who publishes it; if a blob with that name already exists, the system forces the user to choose a new one. The theoretical limit of data held at any one time is therefore 5k * 2^160, roughly 5 * 2^130 terabytes. Given that the current Web is estimated to occupy around 2TB, this should be more than adequate for future expansion.

A blob is structured as follows:

0         1           2          4                      12
+--------------------------------------------------------+
| Version | Algorithm | Reserved | Initialisation Vector |
+========================================================+
| Link | Padding Length |  Padding  |         Data       | 
+=======================/================/===============+
12     32              34                              5156

This is the version 0 format; the algorithm field is also set to 0 to represent DES_EDE3 encryption. Future formats may use different algorithms if a problem is found with this one, but it was chosen for its resistance to many years of cryptanalysis and its freedom from patent hindrance. The reserved field is available for any future parameters that become necessary, without needing to change the version number. An algorithm-dependent initialisation vector is chosen randomly to set up the following cipher feedback encryption. DES has an 8-octet block size.

The rest of the blob is encrypted in CFB mode. If the file continues beyond the data contained in this blob, the first 20 bytes of the encrypted section hold the name of the continuation blob; otherwise they hold the first 20 bytes of the blob's data. A two-octet field then specifies the length of the random padding that, together with the data contained in the blob, makes up the remaining 5122 octets. If this field is zero, a client can begin to retrieve the continuation blob while it receives the remainder of the current blob. Any random padding comes before the data itself to take advantage of the feedback in the encryption mode.

The total length of the encrypted section is 5144 bytes, a multiple of DES’ 8-byte block size. Hence the section will take 643 DES blocks and require no padding before encryption.
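To make the layout concrete, here is a minimal sketch in Python of how a version 0 blob might be assembled and encrypted. It assumes the pycryptodome library for DES_EDE3 in 64-bit CFB mode; the function and constant names are ours, not part of the protocol.

    # A minimal sketch of the version 0 blob format, using pycryptodome's DES3
    # (triple-DES EDE) in 64-bit CFB mode.  Constants follow the layout above.

    import os
    import struct
    from Crypto.Cipher import DES3   # pip install pycryptodome

    BLOB_SIZE = 5156          # total blob size in octets
    HEADER_SIZE = 12          # version | algorithm | reserved | IV
    PLAINTEXT_SIZE = 5144     # encrypted section: 643 eight-octet DES blocks
    PAYLOAD_SIZE = 5122       # padding + data (5144 - 20-byte link - 2-byte length)

    def build_blob(key: bytes, link: bytes, data: bytes, pad_len: int) -> bytes:
        """Assemble and encrypt one blob.  `key` is a 24-byte DES_EDE3 key;
        `link` is the 20-byte continuation name (or the first 20 bytes of the
        data when there is no continuation)."""
        assert len(link) == 20 and pad_len + len(data) == PAYLOAD_SIZE

        iv = os.urandom(8)                                       # random per-blob IV
        header = struct.pack("!BB2s8s", 0, 0, b"\x00\x00", iv)   # version 0, algorithm 0

        padding = os.urandom(pad_len)                             # random padding before the data
        plaintext = link + struct.pack("!H", pad_len) + padding + data

        cipher = DES3.new(key, DES3.MODE_CFB, iv=iv, segment_size=64)
        return header + cipher.encrypt(plaintext)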

For backup and audit purposes, the blob names and keys are chosen randomly by the user's software. For public files, they are derived from a URL used to access the file. The first blob of data from a file is named SHA1(URL). If the file contains less than 5k of data, the blob is padded with random data. If the file is larger, the first blob contains the random 160-bit label assigned to the next blob, and so on until the whole file is split up into 5k segments. This prevents files from being traced by their size, or traffic analysis from being used to follow their transmission across a network. Goldberg and Wagner [GW97] suggested using files padded to several different sizes such as 1k, 2k, 4k, 8k and 16k. While this is more efficient, it makes it easy to differentiate between backup data and public Web pages stored on a server. This would reduce the protection that the presence of less controversial data gives the system against total shutdown.
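The chaining scheme can be sketched as follows, again in Python. split_file is a hypothetical helper producing the name, link field, data and padding length for each blob of a public file, suitable for feeding to something like the build_blob sketch above; the handling of the final blob's link field is an assumption.

    import hashlib
    import os

    CHUNK = 5122   # payload octets (padding + data) available per blob

    def split_file(url: str, data: bytes):
        """Yield (blob_name, link, chunk, pad_len) tuples for a public file.
        The first blob is named SHA1(URL); continuations get random names."""
        name = hashlib.sha1(url.encode()).digest()
        offset = 0
        while True:
            chunk = data[offset:offset + CHUNK]
            offset += CHUNK
            if offset < len(data):
                link = os.urandom(20)                   # name of the continuation blob
                pad_len = 0                             # full blob, no padding needed
            else:
                link = chunk[:20].ljust(20, b"\x00")    # no continuation: reuse the data (assumed)
                pad_len = CHUNK - len(chunk)            # pad the final blob to full size
            yield name, link, chunk, pad_len
            if offset >= len(data):
                break
            name = link                                 # the continuation blob takes that name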

To reduce server operator liability, even public files are encrypted. The key for all the blobs that make up a public file is derived from the URL that refers to it. Operators know only the hash of the URL and therefore cannot derive the key.

For simplicity we take the first 192 bits from

SHA1(1|URL) | SHA1(2|URL)

where | is concatenation. The last bit of each of the 24 bytes is changed to a parity bit; hence the effective key length is 168 bits. Schneier lists a small number of weak DES keys [Schn96], but because SHA1 should give keys uniformly distributed throughout the entire keyspace, we feel the small security risk of using them is outweighed by the added complexity our protocol would need to avoid them.
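A sketch of this derivation, assuming the octets "1" and "2" are prepended as ASCII characters (the exact encoding is not specified here):

    import hashlib

    def public_file_key(url: str) -> bytes:
        """Derive the 24-byte DES_EDE3 key for a public file from its URL:
        the first 192 bits of SHA1("1" | URL) | SHA1("2" | URL), with the low
        bit of each octet overwritten by an odd parity bit (168 effective bits)."""
        raw = hashlib.sha1(b"1" + url.encode()).digest() + \
              hashlib.sha1(b"2" + url.encode()).digest()
        key = bytearray(raw[:24])
        for i, octet in enumerate(key):
            ones = bin(octet >> 1).count("1")            # ones in the top seven bits
            key[i] = (octet & 0xFE) | ((ones + 1) & 1)   # set low bit for odd parity
        return bytes(key)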

As a final protection against dawn raids, servers can store blobs encrypted using an operator-chosen password. The version, algorithm and reserved fields are left unencrypted so that this layer contains no known plaintext. The name of the file is also encrypted. Even an attacker in full control of a server cannot determine whether it holds a given blob, and any password an operator reveals will decrypt to random-looking data whose correctness cannot be checked.

Retrieval

It is vital that an attacker cannot find the locations of a given blob; this would allow her to physically or legally attack the servers holding it and attempt to remove it. Anderson suggested a trusted network of index servers that mapped locators supplied by clients to the real location of files. Unfortunately, any list of locations becomes a very valuable target for an attacker. Goldberg and Wagner used "chained" URLs, which required several separate servers to co-operate to find a given file. If an attacker could prevent the operation of any of those servers, the file could not be retrieved.

At some expense in terms of bandwidth, our design therefore avoids such lists altogether. A process wishing to retrieve a known blob multicasts the message GET blobname on a well-known address. This is currently 249.221.1.5 port 666; this may change if the Eternity protocol is standardised. All storage servers listen to this address and so receive the request. Servers know only which blobs they hold; no-one else knows the location of any blob. As long as one server holding a given blob is operating, it can return that blob to a client.
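A request might therefore look like the following sketch, which assumes UDP transport and a raw 20-byte blob name; neither is specified above, so treat the details as illustrative. The resulting message is the 24-octet request size used in the bandwidth estimates below.

    import socket

    ETERNITY_GROUP = ("249.221.1.5", 666)    # well-known request address from the text

    def request_blob(blob_name: bytes) -> None:
        """Multicast a GET request for a blob (blob_name is its 160-bit name)."""
        msg = b"GET " + blob_name            # 4 + 20 = 24 octets
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 32)
        sock.sendto(msg, ETERNITY_GROUP)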

There are three potential ways a server that has a copy of a blob could send it to a client. In all cases the client would not know which servers it received blobs from.

The simplest is for all servers holding a blob to respond to all requests for that blob. The data would be sent through an anonymous TCP network [Goldberg98] to hide the identity of the servers. This is bandwidth inefficient, and is also risky in that the set of servers holding a certain blob could be trivially identified if the anonymous network was compromised.

An improvement is for each server holding a blob stored on n servers to respond to requests for that blob with probability p, which would naively be set at 1/n. Ideally, each request would then be answered by exactly one of the servers holding the requested blob. Realistically, the chance of exactly r responses is C(n,r) p^r (1-p)^(n-r), and there is a probability of (1-p)^n that a request for an existing blob is ignored by all the servers holding it. A typical value of n=5 would cause 33% of requests to be ignored. Increasing p to 2/n would double average reply traffic but reduce ignored requests to 8%. If clients were set to request a blob m further times before deciding it was not available, the probability of a false negative would be (1-p)^(n+m). Requests are far less bandwidth intensive (24 bytes) than replies (5156 bytes), so pn should be set equal to 24m/5156. Both should be chosen with regard to the false negative rate that is deemed acceptable: (1-p)^(n+m) < 0.001 would seem reasonable. This is a bandwidth-efficient method, but requires servers to know how many other servers hold a given blob in order to set p. Ideally, they should not have this information, which would aid attackers that compromise a server or operate one of their own. The method also loses its efficiency when clients serve blobs from their caches: because nobody then knows the number of copies of a blob in the system, a server cannot independently decide whether to reply to a request.
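The ignored-request figures quoted above can be reproduced directly:

    def ignored(p: float, n: int) -> float:
        """Probability that a single request for an existing blob gets no reply,
        with n holders each answering independently with probability p."""
        return (1 - p) ** n

    n = 5
    print(ignored(1 / n, n))     # ~0.33: a third of first requests go unanswered
    print(ignored(2 / n, n))     # ~0.08 once p is doubled to 2/n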

The final option is for the server to multicast its reply. When a retrieval server receives a request, it waits for a random period of time, then multicasts an acknowledgement from an anonymous IP address where it can be reached by the requesting client. If it hears an acknowledgement from another server before its timeout, it suppresses its own response with a high probability.

Because of the time taken for an acknowledgement to travel across the network, several servers are likely to respond. This allows clients to choose the nearest server to request a blob from first, before moving to other servers if transmission problems occur. It also reduces the impact of a denial of service attack in which an attacker always acknowledges a blob request but then refuses to supply it. Reducing the probability with which a server will suppress a response to a blob request increases overall traffic, but lowers the impact of this attack. To further reduce this possibility, the acknowledgement includes the hash of the requested blob; other servers only suppress their responses when they see a correct hash for that blob. Attackers can therefore only produce acknowledgements for blobs whose content they know.
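A sketch of this suppression logic; the backoff range and suppression probability are chosen arbitrarily for illustration, and heard_acks stands in for whatever mechanism collects the hashes seen in other servers' acknowledgements.

    import hashlib
    import random
    import time

    SUPPRESS_PROB = 0.9    # assumed: chance of staying quiet after a valid acknowledgement

    def maybe_acknowledge(blob: bytes, heard_acks) -> bool:
        """Decide whether this server should acknowledge a request it can serve.
        `heard_acks` is a callable returning the hashes seen so far in other
        servers' acknowledgements."""
        time.sleep(random.uniform(0.0, 0.5))       # random backoff (assumed range)
        expected = hashlib.sha1(blob).digest()     # hash of the requested blob's content
        if expected in heard_acks() and random.random() < SUPPRESS_PROB:
            return False                           # someone who knows the blob has answered
        return True                                # multicast our own acknowledgement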

All three methods are bandwidth inefficient, but the alternative is that all the storage servers need to communicate with each other to decide who is to send back the blob. This adds protocol complexity, but more importantly requires the setting up and management of trust relationships between servers. This would require either some central authority that would then become a very valuable target, or a huge amount of work by each server administrator.

Clients also act as servers. They cache any blobs they retrieve for a short length of time, and during that period will act as a server to any other client that requests the blob. To unpublish a blob, an attacker must therefore remove it from the cache of every client that has accessed it recently as well as all the servers that hold it.

To encourage a better distribution of different blobs throughout all client caches, clients monitor the blobs supplied by other servers, and add those blobs to a deletion list. If a blob is already on the deletion list, it is moved up one position on the list. When a client needs to reduce the size of its cache, it removes a random selection of blobs on the list before deleting other least recently used blobs in the cache. This reduces the ability of an attacker to force all clients to remove certain blobs from their caches by repeatedly requesting and supplying those blobs.
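A minimal sketch of such a cache; the class and method names are ours and the eviction counts are illustrative.

    import random
    from collections import OrderedDict

    class BlobCache:
        """Client-side cache with the deletion list described above."""
        def __init__(self):
            self.cache = OrderedDict()      # blob_name -> blob, oldest first
            self.deletion_list = []         # blobs seen being served by other servers

        def saw_served_elsewhere(self, name: bytes) -> None:
            """Another server supplied this blob: add it to the deletion list,
            or move it up one position if it is already there."""
            if name in self.deletion_list:
                i = self.deletion_list.index(name)
                if i > 0:
                    self.deletion_list[i - 1], self.deletion_list[i] = \
                        self.deletion_list[i], self.deletion_list[i - 1]
            else:
                self.deletion_list.append(name)

        def shrink(self, n: int) -> None:
            """Evict a random selection from the deletion list first, then fall
            back to the oldest cached blobs until n entries have been removed."""
            victims = random.sample(self.deletion_list,
                                    min(n, len(self.deletion_list)))
            for name in victims:
                self.cache.pop(name, None)
                self.deletion_list.remove(name)
            while len(self.cache) and n > len(victims):
                self.cache.popitem(last=False)      # drop the oldest cached blob
                n -= 1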

Gateway retrieval

Public files are known by URLs of the form *.cypherspace/*. Separate Eternity systems could co-exist by using different top level domains. Eternity-compatible browsers recognise such URLs and know how to retrieve them from the Eternity system. But it would be extremely useful if clients without Eternity support could read public Eternity files without having to install new software.

This would be possible using gateway servers and one of two small changes to clients. A gateway machine would take HTTP requests from browsers for a public Eternity file, retrieve the file and then return it via the same HTTP connection. A user could access such a server by filling in a form on a page on the server that was part of the World Wide Web.

To save even this step, certain DNS servers could be programmed to reply to any request for a .cypherspace address with the IP address of a nearby gateway, or return a different gateway address to each query for load balancing. The gateway would listen on the standard HTTP port (80) and service requests sent there. It would require HTTP 1.1 clients, which identify the requested host as well as the path, so that the gateway can reconstruct the entire URL. Users would then just need to add one of these DNS machines to their list of nameservers. If a proxy server took this step instead, any client using the proxy would have access to Eternity files without any changes at all.
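A gateway of this kind could be sketched with Python's standard http.server; eternity_fetch is a hypothetical helper standing in for the multicast retrieval described earlier.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def eternity_fetch(url: str) -> bytes:
        """Hypothetical: retrieve and decrypt a public Eternity file by URL."""
        raise NotImplementedError("multicast retrieval goes here")

    class GatewayHandler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"

        def do_GET(self):
            host = self.headers.get("Host", "")
            if not host.endswith(".cypherspace"):
                self.send_error(404)
                return
            body = eternity_fetch(host + self.path)   # reconstruct the full URL
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 80), GatewayHandler).serve_forever()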

Authentication

Because clients multicast requests for Eternity files, it would be possible for an attacker to respond to requests for certain files with different pieces of information. Private Eternity files are encrypted with a user-chosen key; without that key, a masquerading file from an attacker will be obviously forged. But how does a client know that a public file it receives is authentic?

One option would be for publication servers (see later section) to sign blobs before putting them into the Eternity system. Clients would then check the authenticity of these signatures when retrieving a blob. But this makes the publication servers' private signature keys high-value targets. If one were compromised, an attacker could break the whole system by replying to every client request with false but correctly signed blobs; the client would not then know which blob was the "real" one. Even with smartcards we could not guarantee the security of publication server keys [AK96], and an attacker would have a great incentive to steal them.

But, with a small twist, we can get rid of that dependency. When someone publishes a file in the system, they are given a URL slightly altered from the one they requested, with base64(SHA1(first_blob)) embedded. This adds 27 characters to the URL and lets the person who requests the document do the authentication, removing the need for centralised signatures. Eternity files cannot be changed, so having their hash as part of their URL is entirely acceptable. Anderson and Petitcolas use this idea to authenticate medical formularies [AK97].
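A sketch of how the embedded hash might be produced and checked; where exactly the 27 base64 characters sit within the URL is an assumption.

    import base64
    import hashlib

    def make_public_url(requested_url: str, first_blob: bytes) -> str:
        """Embed base64(SHA1(first_blob)), 27 characters without padding,
        into the published URL (placement is illustrative)."""
        digest = base64.b64encode(hashlib.sha1(first_blob).digest()).rstrip(b"=")
        return requested_url + "/" + digest.decode()

    def verify_first_blob(published_url: str, first_blob: bytes) -> bool:
        """Client-side check: the retrieved first blob must hash to the value
        embedded in the URL used to request it."""
        expected = published_url.rsplit("/", 1)[-1]
        digest = base64.b64encode(hashlib.sha1(first_blob).digest()).rstrip(b"=")
        return digest.decode() == expected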

In this system, the storage and retrieval services are entirely separate. There is no need for any global indexes, and storage and gateway servers can be started and stopped with little damage to the integrity of the overall system.

Publication

In order for an Eternity system to succeed, those who operate it will need an incentive to resist the attacks it will doubtless come under. A large part of the administrative burden of operating an anonymous remailer is answering messages from people receiving unwanted mail through the server. While Eternity is a pull rather than push system – it only serves content to those who request it – powerful attackers are likely to become interested if material they see as objectionable is published in it.

We therefore need a mechanism for paying servers for holding Eternity data. It has to be more complex than a publisher handing over a specified sum of e-cash to a given server; by design, even a publisher must not know where their data is being held. We also need a mechanism to check that servers really are serving material to clients during the period they have contracted to do so.

Our protocol therefore makes use of a trusted third party. A payment server chooses, through a market mechanism, the servers that are to hold data being published; it then pays those servers after checking they are fulfilling the quality of service they contracted to provide.

A user who wants to publish file x first gives it to a payment server P. P checks the requested filename is free (by trying to retrieve blob filename), then multicasts a "Who wants to store a file (size n) for m months?" message. Any listening retrieval server that has available capacity replies "I'll store it for y e-dollars".

P chooses the user-required number of servers from among the lowest bidders, charges the user their combined charges plus commission, and sends each chosen server the file blob(s) along with one check blob and a first payment.

P can now forget which retrieval servers are holding which files (very important, as otherwise the compromise of P would compromise all the files published through it). All it needs to remember for each file is: "try retrieving blob b at the following randomised intervals, and pay retrieval server r x e-dollars if it passes those tests." If availability does not meet the contracted conditions, the retrieval server will not be paid.
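The record P keeps might look like the following sketch; the field names and the check frequency are illustrative, not part of the protocol.

    import random
    import time
    from dataclasses import dataclass, field

    @dataclass
    class AuditRecord:
        """The only per-file state P retains after forgetting blob locations."""
        check_blob: bytes                    # name of the blob to try retrieving
        server_id: str                       # retrieval server r to pay
        payment: float                       # x e-dollars owed if the checks pass
        checks: list = field(default_factory=list)

        def schedule(self, months: int, checks_per_month: int = 4) -> None:
            """Pick randomised future check times over the contracted period."""
            now = time.time()
            period = months * 30 * 24 * 3600
            self.checks = sorted(now + random.uniform(0, period)
                                 for _ in range(months * checks_per_month))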

When anyone requests a blob, they cannot (by design) know who has returned it, so without this checking protocol the payment server could not check that a specific retrieval server was holding a specific blob. If the checks used a special protocol instead of ordinary requests, the retrieval server could cheat by replying to checks but not to normal requests.

Each component is highly separated. If an attacker closes down any one server, it has no serious effect on the rest of the system. Ideally, each retrieval server would have one or two 'buddies' who kept a list of the serial numbers of all blobs the other buddies hold. Then if one of the buddies were shut down, the others could tell the rest of the network which blobs were now less available, whereupon other servers could fetch copies from the rest of the network. If a gateway server is closed down, this merely needs to be notified to all of our DNS nameservers, which will no longer refer clients to it. If a nameserver is closed down, users can simply change to another one.

Other publication servers can be seamlessly slotted into our system. Any system can be integrated simply by providing a gateway that can take Eternity requests, retrieve blobs using a system-specific protocol, and return if necessary an acknowledgement and blob to a client. The second author’s prototype USENET Eternity [Back99], for example, can be integrated with a simple gateway script that can access a News server.

Auxiliary applications

Two services are easily added to this system to make it more Web-like. Search engines can index Eternity pages, but must be able to redirect users without local Eternity support to a Web gateway to access pages they have discovered in their searches. Search software simply needs to be configured with a list of known gateways, and return links to Eternity pages in the form http://www.gateway.com/getpage?url=www.company.cypherspace/searchresults. A random gateway could be used each time for load balancing purposes. This also overcomes a difficulty Anderson worried about in extreme situations: that governments would make the very publishing of an Eternity URL illegal. Even in that extreme scenario, clients could still locate Eternity material through a search engine.

Entirely independent audits can also be conducted of the readership of a certain Eternity file. An auditing agent configured with a list of URLs can listen for requests for the SHA1 hashes of those URLs, and log the details of those requests. Advertising agencies can therefore obtain independent data on how often pages they may wish to advertise on are read. Agencies currently need to rely on site operators or their agents, who have an incentive to boost such figures.