A Survey of Peer-to-Peer Content Distribution Technologies

STEPHANOS ANDROUTSELLIS-THEOTOKIS AND DIOMIDIS SPINELLIS

Athens University of Economics and Business

Distributed computer architectures labeled "peer-to-peer" are designed for the sharing of computer resources (content, storage, CPU cycles) by direct exchange, rather than requiring the intermediation or support of a centralized server or authority. Peer-to-peer architectures are characterized by their ability to adapt to failures and accommodate transient populations of nodes while maintaining acceptable connectivity and performance.

Content distribution is an important peer-to-peer application on the Internet that has received considerable research attention. Content distribution applications typically allow personal computers to function in a coordinated manner as a distributed storage medium by contributing, searching, and obtaining digital content.

In this survey, we propose a framework for analyzing peer-to-peer content distribution technologies. Our approach focuses on nonfunctional characteristics such as security, scalability, performance, fairness, and resource management potential, and examines the way in which these characteristics are reflected in—and affected by—the architectural design decisions adopted by current peer-to-peer systems.

We study current peer-to-peer systems and infrastructure technologies in terms of their distributed object location and routing mechanisms, their approach to content replication, caching and migration, their support for encryption, access control, authentication and identity, anonymity, deniability, accountability and reputation, and their use of resource trading and management schemes.

Categories and Subject Descriptors: C.2.1 [Computer-Communication Networks]: Network Architecture and Design—Network topology; C.2.2 [Computer-Communication Networks]: Network Protocols—Routing protocols; C.2.4 [Computer-Communication Networks]: Distributed Systems—Distributed databases; H.2.4 [Database Management]: Systems—Distributed databases; H.3.4 [Information Storage and Retrieval]: Systems and Software—Distributed systems

General Terms: Algorithms, Design, Performance, Reliability, Security

Additional Key Words and Phrases: Content distribution, DOLR, DHT, grid computing, p2p, peer-to-peer

Authors' address: Athens University of Economics and Business, 76 Patission St., GR-104 34, Athens, Greece; email: [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected].

© 2004 ACM 0360-0300/04/1200-0335 $5.00

ACM Computing Surveys, Vol. 36, No. 4, December 2004, pp. 335–371.

1. INTRODUCTION

A new wave of network architectures labeled peer-to-peer is the basis of operation of distributed computing systems such as [Gnutella 2003], Seti@Home [SetiAtHome 2003], OceanStore [Kubiatowicz et al. 2000], and many others. Such architectures are generally characterized by the direct sharing



of computer resources (CPU cycles, storage, content) rather than requiring the intermediation of a centralized server.

The motivation behind basing applications on peer-to-peer architectures derives to a large extent from their ability to function, scale, and self-organize in the presence of a highly transient population of nodes, network, and computer failures, without the need of a central server and the overhead of its administration. Such architectures typically have as inherent characteristics scalability, resistance to censorship and centralized control, and increased access to resources. Administration, maintenance, responsibility for the operation, and even the notion of "ownership" of peer-to-peer systems are also distributed among the users, instead of being handled by a single company, institution or person (see also Agre [2003] for an interesting discussion of institutional change through decentralized architectures). Finally, peer-to-peer architectures have the potential to accelerate communication processes and reduce collaboration costs through the ad hoc administration of working groups [Schoder and Fischbach 2003].

This report surveys peer-to-peer content distribution technologies, aiming to provide a comprehensive account of applications, features, and implementation techniques. As this is a new and (thankfully) rapidly evolving field, and advances and new capabilities are constantly being introduced, this article presents what essentially constitutes a "snapshot" of the state of the art around the time of its writing—as is unavoidably the case for any survey of a thriving research field. We do, however, believe that the core information and principles presented will remain relevant and useful for the reader.

In the next section, we define the basic concepts of peer-to-peer computing. We classify peer-to-peer systems into three categories (communication and collaboration, distributed computation, and content distribution). Content distribution systems are further discussed and categorized.

We then present the main attributes of peer-to-peer content distribution systems, and analyze the aspects of their architectural design that affect these attributes, with reference to specific existing peer-to-peer content distribution systems and technologies.

Throughout this report the terms "node", "peer" and "user" are used interchangeably, according to the context, to refer to the entities that are connected in a peer-to-peer network.

1.1. Defining Peer-to-Peer Computing

A quick look at the literature reveals a considerable number of different definitions of "peer-to-peer", mainly distinguished by the "broadness" they attach to the term.

The strictest definitions of "pure" peer-to-peer refer to totally distributed systems, in which all nodes are completely equivalent in terms of functionality and tasks they perform. These definitions fail to encompass, for example, systems that employ the notion of "supernodes" (nodes that function as dynamically assigned localized mini-servers) such as Kazaa [2003], which are, however, widely accepted as peer-to-peer, or systems that rely on some centralized server infrastructure for a subset of noncore tasks (e.g. bootstrapping, maintaining reputation ratings, etc).

According to a broader and widely accepted definition in Shirky [2000], "peer-to-peer is a class of applications that take advantage of resources—storage, cycles, content, human presence—available at the edges of the internet". This definition, however, encompasses systems that completely rely upon centralized servers for their operation (such as seti@home, various instant messaging systems, or even the notorious Napster), as well as various applications from the field of Grid computing.

Overall, it is fair to say that there is no general agreement about what "is" and what "is not" peer-to-peer.

We feel that this lack of agreement on a definition—or rather the acceptance of


various different definitions—is, to a large extent, due to the fact that systems or applications are labeled "peer-to-peer" not because of their internal operation or architecture, but rather as a result of how they are perceived "externally", that is, whether they give the impression of providing direct interaction between computers. As a result, different definitions of "peer-to-peer" are applied to accommodate the various different cases of such systems or applications.

From our perspective, we believe that the two defining characteristics of peer-to-peer architectures are the following:

—The sharing of computer resources by direct exchange, rather than requiring the intermediation of a centralized server. Centralized servers can sometimes be used for specific tasks (system bootstrapping, adding new nodes to the network, obtaining global keys for data encryption); however, systems that rely on one or more global centralized servers for their basic operation (e.g. for maintaining a global index and searching through it—Napster, Publius) are clearly stretching the definition of peer-to-peer.

As the nodes of a peer-to-peer network cannot rely on a central server coordinating the exchange of content and the operation of the entire network, they are required to actively participate by independently and unilaterally performing tasks such as searching for other nodes, locating or caching content, routing information and messages, connecting to or disconnecting from other neighboring nodes, encrypting, introducing, retrieving, decrypting and verifying content, as well as others.

—Their ability to treat instability and variable connectivity as the norm, automatically adapting to failures in both network connections and computers, as well as to a transient population of nodes.

This fault-tolerant, self-organizing capacity suggests the need for an adaptive network topology that will change as nodes enter or leave and network connections fail or recover, in order to maintain its connectivity and performance.

We therefore propose the following definition:

Peer-to-peer systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority.

This definition is meant to encompass "degrees of centralization" ranging from the pure, completely decentralized systems such as Gnutella, to "partially centralized" systems1 such as Kazaa. However, for the purposes of this survey, we shall not restrict our presentation and discussion of architectures and systems to our own proposed definition, and we will take into account systems that are considered peer-to-peer by other definitions as well, including systems that employ a centralized server (such as Napster, instant messaging applications, and others).

The focus of our study is content distribution, a significant area of peer-to-peer systems that has received considerable research attention.

1.2. Peer-to-Peer and Grid Computing

Peer-to-peer and Grid computing are two approaches to distributed computing, both concerned with the organization of resource sharing in large-scale computational societies.

Grids are distributed systems that enable the large-scale coordinated use and sharing of geographically distributed resources, based on persistent, standards-based service infrastructures, often with a high-performance orientation [Foster et al. 2001].

1In our definition, we refer to "global" servers, to make a distinction from the dynamically assigned "supernodes" of partially centralized systems (see also Section 3.3.3).


As Grid systems increase in scale, they begin to require solutions to issues of self-configuration, fault tolerance, and scalability, for which peer-to-peer research has much to offer.

Peer-to-peer systems, on the other hand, focus on dealing with instability, transient populations, fault tolerance, and self-adaptation. To date, however, peer-to-peer developers have worked mainly on vertically integrated applications, rather than seeking to define common protocols and standardized infrastructures for interoperability.

In summary, one can say that "Grid computing addresses infrastructure, but not yet failure, while peer-to-peer addresses failure, but not yet infrastructure" [Foster and Iamnitchi 2003].

In addition to this, the form of sharing initially targeted by peer-to-peer has been of limited functionality, providing a global content distribution and filesharing space lacking any form of access control.

As peer-to-peer technologies move into more sophisticated and complex applications, such as structured content distribution, desktop collaboration, and network computation, it is expected that there will be a strong convergence between peer-to-peer and Grid computing [Foster 2000]. The result will be a new class of technologies combining elements of both peer-to-peer and Grid computing, which will address scalability, self-adaptation, and failure recovery, while, at the same time, providing a persistent and standardized infrastructure for interoperability.

1.3. Classification of Peer-to-Peer Applications

Peer-to-peer architectures have been employed for a variety of different application categories, which include the following.

Communication and Collaboration. This category includes systems that provide the infrastructure for facilitating direct, usually real-time, communication and collaboration between peer computers. Examples include chat and instant messaging applications, such as Chat/Irc, Instant Messaging (Aol, Icq, Yahoo, Msn), and Jabber [Jabber 2003].

Distributed Computation. This category includes systems whose aim is to take advantage of the available peer computer processing power (CPU cycles). This is achieved by breaking down a computer-intensive task into small work units and distributing them to different peer computers, which execute their corresponding work unit and return the results. Central coordination is invariably required, mainly for breaking up and distributing the tasks and collecting the results. Examples of such systems include projects such as Seti@home [Sullivan III et al. 1997; SetiAtHome 2003], genome@home [Larson et al. 2003; GenomeAtHome 2003], and others.
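The work-unit scheme described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (all names are ours, not taken from any surveyed system): a coordinator splits a compute-intensive task into independent units, each "peer" processes one, and the coordinator merges the results. Real systems such as Seti@home add networking, redundancy, and result verification, which are omitted here.

```python
# Hypothetical sketch of centrally coordinated distributed computation:
# split a task into work units, process each unit independently
# (standing in for remote peers), and combine the partial results.

def split_into_work_units(data, unit_size):
    """Break the task's input into independently processable chunks."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]

def peer_compute(unit):
    """Work performed by a single peer; here, a stand-in computation."""
    return sum(x * x for x in unit)

def coordinate(data, unit_size=4):
    """Central coordination: distribute units, then merge the results."""
    units = split_into_work_units(data, unit_size)
    results = [peer_compute(u) for u in units]  # in reality, sent to remote peers
    return sum(results)

# Splitting then merging gives the same answer as computing directly.
print(coordinate(list(range(10))))  # → 285, i.e. the sum of squares 0..9
```

The essential property, which systems in this category depend on, is that the work units are independent, so they can be executed in any order on any subset of peers.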

Internet Service Support. A number of different applications based on peer-to-peer infrastructures have emerged for supporting a variety of Internet services. Examples of such applications include peer-to-peer multicast systems [VanRenesse et al. 2003; Castro et al. 2002], Internet indirection infrastructures [Stoica et al. 2002], and security applications, providing protection against denial of service or virus attacks [Keromytis et al. 2002; Janakiraman et al. 2003; Vlachos et al. 2004].

Database Systems. Considerable work has been done on designing distributed database systems based on peer-to-peer infrastructures. Bernstein et al. [2002] propose the Local Relational Model (LRM), in which the set of all data stored in a peer-to-peer network is assumed to be comprised of inconsistent local relational databases interconnected by sets of "acquaintances" that define translation rules and semantic dependencies between them. PIER [Huebsch et al. 2003] is a scalable distributed query engine built on top of a peer-to-peer overlay network topology that allows relational queries to run across thousands of computers. The Piazza system [Halevy et al. 2003] provides an infrastructure for building semantic Web [Berners-Lee et al. 2001] applications, consisting of nodes that can supply either source data (e.g. from a


relational database), schemas (or ontologies) or both. Piazza nodes are transitively connected by chains of mappings between pairs of nodes, allowing queries to be distributed across the Piazza network. Finally, Edutella [Nejdl et al. 2003] is an open-source project that builds on the W3C metadata standard RDF, to provide a metadata infrastructure and querying capability for peer-to-peer applications.

Content Distribution. Most of the current peer-to-peer systems fall within the category of content distribution, which includes systems and infrastructures designed for the sharing of digital media and other data between users. Peer-to-peer content distribution systems range from relatively simple direct filesharing applications, to more sophisticated systems that create a distributed storage medium for securely and efficiently publishing, organizing, indexing, searching, updating, and retrieving data. There are numerous such systems and infrastructures. Some examples are: the late Napster, Publius [Waldman et al. 2000], Gnutella [Gnutella 2003], Kazaa [Kazaa 2003], Freenet [Clarke et al. 2000], MojoNation [MojoNation 2003], Oceanstore [Kubiatowicz et al. 2000], PAST [Druschel and Rowstron 2001], Chord [Stoica et al. 2001], Scan [Chen et al. 2000], FreeHaven [Dingledine et al. 2000], Groove [Groove 2003], and Mnemosyne [Hand and Roscoe 2002].

This survey will focus on content distribution, one of the most prominent application areas of peer-to-peer systems.

1.4. Peer-to-Peer Content Distribution

In its most basic form, a peer-to-peer content distribution system creates a distributed storage medium that allows for the publishing, searching, and retrieval of files by members of its network. As systems become more sophisticated, nonfunctional features may be provided, including provisions for security, anonymity, fairness, increased scalability and performance, as well as resource management and organization capabilities. All of these will be discussed in the following sections.
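The three basic operations named above (publishing, searching, and retrieval) can be illustrated with a deliberately naive, single-process sketch. All names here are hypothetical; an actual peer-to-peer system would partition both the index and the stored files across many nodes, which is precisely what the architectures discussed later in this survey address.

```python
# Toy, centralized illustration of the three basic content distribution
# operations: publish, search, retrieve. This is NOT how a peer-to-peer
# system works internally; it only fixes the interface whose
# distributed realization the rest of the survey examines.

class ToyContentStore:
    def __init__(self):
        self._files = {}  # file name -> content

    def publish(self, name, content):
        """Make a file available under a given name."""
        self._files[name] = content

    def search(self, keyword):
        """Return the names of published files matching a keyword."""
        return [n for n in self._files if keyword in n]

    def retrieve(self, name):
        """Fetch a published file's content, or None if absent."""
        return self._files.get(name)

store = ToyContentStore()
store.publish("icarus.txt", "Flying too close to the sun.")
print(store.search("icarus"))        # → ['icarus.txt']
print(store.retrieve("icarus.txt"))  # → Flying too close to the sun.
```

The interesting design questions, taken up in the following sections, are how this index and the content itself are spread over an unreliable, transient population of peers.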

An examination of current peer-to-peer technologies in this context suggests that they can be grouped as follows (with examples in Tables I and II).

—Peer-to-Peer Applications. This category includes content distribution systems that are based on peer-to-peer technology. We attempt to further subdivide them into the following two groups, based on their application goals and perceived complexity:

Peer-to-peer "file exchange" systems. These systems are targeted towards simple, one-off file exchanges between peers. They are used for setting up a network of peers and providing facilities for searching and transferring files between them. These are typically light-weight applications that adopt a best-effort approach without addressing security, availability, and persistence. It is mainly systems in this category that are responsible for spawning the reputation (and in some cases notoriety) of peer-to-peer technologies.

Peer-to-peer content publishing and storage systems. These systems are targeted towards creating a distributed storage medium in—and through—which users will be able to publish, store, and distribute content in a secure and persistent manner. Such content is meant to be accessible in a controlled manner by peers with appropriate privileges. The main focus of such systems is security and persistence, and often the aim is to incorporate provisions for accountability, anonymity and censorship resistance, as well as persistent content management (updating, removing, version control) facilities.

—Peer-to-Peer Infrastructures. This category includes peer-to-peer based infrastructures that do not constitute working applications, but provide peer-to-peer based services and application frameworks. The following infrastructure services are identified:

Routing and location. Any peer-to-peer content distribution system relies on a network of peers within which requests and messages must be routed


Table I. A Classification of Current Peer-to-Peer Systems

(RM: Resource Management; CR: Censorship Resistance; PS: Performance and Scalability; SPE: Security, Privacy and Encryption; A: Anonymity; RA: Reputation and Accountability; RT: Resource Trading.)

Peer-to-Peer File Exchange Systems

System | Brief Description
Napster | Distributed file sharing—hybrid decentralized.
Kazaa [2003] | Distributed file sharing—partially centralized.
Gnutella [2003] | Distributed file sharing—purely decentralized.

Peer-to-Peer Content Publishing and Storage Systems

System | Brief Description | Main Focus
Scan [Chen et al. 2000] | A dynamic, scalable, efficient content distribution network. Provides dynamic content replication. | PS
Publius [Waldman et al. 2000] | A censorship-resistant system for publishing content. Static list of servers. Enhanced content management (update and delete). | RM
Groove [Groove 2003] | Internet communications software for direct real-time peer-to-peer interaction. | RM, PS, SPE
FreeHaven [Dingledine et al. 2000] | A flexible system for anonymous storage. | A, RA
Freenet [Clarke et al. 2000] | Distributed anonymous information storage and retrieval system. | A, RA
MojoNation [MojoNation 2003] | Distributed file storage. Fairness through the use of currency mojo. | SPE, RT
Oceanstore [Kubiatowicz et al. 2000] | An architecture for global scale persistent storage. Scalable, provides security and access control. | RM, PS, SPE
Intermemory [Chen et al. 1999] | System of networked computers. Donate storage in exchange for the right to publish data. | RT
Mnemosyne [Hand and Roscoe 2002] | Peer-to-peer steganographic storage system. Provides privacy and plausible deniability. | SPE
PAST [Druschel and Rowstron 2001] | Large scale persistent peer-to-peer storage utility. | PS, SPE
Dagster [Stubblefield and Wallach 2001] | A censorship-resistant document publishing system. | CR, SPE
Tangler [Waldman and Mazières 2001] | A content publishing system based on document entanglements. | CR, SPE

with efficiency and fault tolerance, and through which peers and content can be efficiently located. Different infrastructures and algorithms have been developed to provide such services.

Anonymity. Peer-to-peer based infrastructure systems have been designed with the explicit aim of providing user anonymity.

Reputation Management. In a peer-to-peer network, there is no central organization to maintain reputation information for users and their behavior. Reputation information is, therefore, hosted in the various network nodes. In order for such reputation information to be kept secure, up-to-date, and available throughout the network, complex reputation management infrastructures need to be employed.

2. ANALYSIS FRAMEWORK

In this survey, we present a description and analysis of applications, systems, and infrastructures that are based on peer-to-peer architectures and aim at either offering content distribution solutions, or supporting content distribution related activities.

Our approach is based on:

—identifying the feature space of nonfunctional properties and characteristics of content distribution systems,

—determining the way in which the nonfunctional properties depend on, and can be affected by, various design features, and

—providing an account, analysis, and evaluation of the design features and solutions that have been adopted by


Table II. A Classification of Current Peer-to-Peer Infrastructures

Infrastructures for routing and location

Chord [Stoica et al. 2001] | A scalable peer-to-peer lookup service. Given a key, it maps the key to a node.
CAN [Ratnasamy et al. 2001] | Scalable content addressable network. A distributed infrastructure that provides hash-table functionality for mapping file names to their locations.
Pastry [Rowstron and Druschel 2001] | Infrastructure for fault-tolerant wide-area location and routing.
Tapestry [Zhao et al. 2001] | Infrastructure for fault-tolerant wide-area location and routing.
Kademlia [Maymounkov and Mazières 2002] | A scalable peer-to-peer lookup service based on the XOR metric.

Infrastructures for anonymity

Anonymous remailer mixnet [Berthold et al. 1998] | Infrastructure for anonymous connection.
Onion Routing [Goldschlag et al. 1999] | Infrastructure for anonymous connection.
ZeroKnowledge Freedom [Freedom 2003] | Infrastructure for anonymous connection.
Tarzan [Freedman et al. 2002] | A peer-to-peer decentralized anonymous network layer.

Infrastructures for reputation management

Eigentrust [Kamvar et al. 2003] | A distributed reputation management algorithm.
A partially distributed reputation management system [Gupta et al. 2003] | A partially distributed approach based on a debit/credit or debit-only scheme.
PeerTrust [Xiong and Liu 2002] | A decentralized, feedback-based reputation management system using satisfaction and number of interaction metrics.

current peer-to-peer systems, as well as their shortcomings, potential improvements, and proposed alternatives.

For the identification of the nonfunctional characteristics on which our study is based, we used the work of Shaw and Garlan [1995], which we adapted to the field of peer-to-peer content distribution.

The various design features and solutions that we examine, as well as their relationship to the relevant nonfunctional characteristics, were assembled through a detailed analysis and study of current peer-to-peer content distribution systems that are either deployed, researched, or proposed.

The resulting analysis framework is illustrated in Figure 1. The boxes in the periphery show the most important attributes of peer-to-peer content distribution systems, described as follows.

Security. Further analyzed in terms of:

Integrity and authenticity. Safeguarding the accuracy and completeness of data and processing methods. Unauthorized entities cannot change data; adversaries cannot substitute a forged document for a requested one.

Privacy and confidentiality. Ensuring that data is accessible only to those authorized to have access, and that there is control over what data is collected, how it is used, and how it is maintained.

Availability and persistence. Ensuring that authorized users have access to data and associated assets when required. For a peer-to-peer content distribution system this often means always. This property entails stability in the presence of failure, or changing node populations.
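A common building block for the integrity and authenticity property is hash-based verification: content retrieved from an untrusted peer is accepted only if it matches a cryptographic digest obtained through a trusted channel. The sketch below is a generic illustration of this idea, not the mechanism of any one system surveyed here (some systems go further and derive the content's key from the hash itself, making identifiers self-certifying).

```python
# Generic sketch of hash-based integrity checking: a peer that serves
# altered content cannot also produce a matching SHA-256 digest, so a
# forged document substituted for a requested one is detected.

import hashlib

def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_id: str) -> bool:
    """Check that data received from an untrusted peer was not altered."""
    return content_id(data) == expected_id

original = b"The quick brown fox"
cid = content_id(original)           # published alongside the content
assert verify(original, cid)         # an intact copy passes
assert not verify(b"tampered", cid)  # a forged document is rejected
```

Note that this protects integrity only; privacy, confidentiality, and availability require separate mechanisms (encryption, access control, replication), discussed later in the survey.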

Scalability. Maintaining the system's performance attributes independent of the number of nodes or documents in its network. A dramatic increase in the number of nodes or documents will have minimal effect on performance and availability.


Fig. 1. Illustration of the way in which various design features affect the main characteristics of peer-to-peer content distribution systems.

Performance. The time required for performing the operations allowed by the system, typically publication, searching, and retrieval of documents.

Fairness. Ensuring that users offer and consume resources in a fair and balanced manner. May rely on accountability, reputation, and resource trading mechanisms.

Resource Management Capabilities. In their most basic form, peer-to-peer content distribution systems allow the publishing, searching, and retrieval of documents. More sophisticated systems may afford more advanced resource management capabilities, such as editing or removal of documents, management of storage space, and operations on metadata.

Semantic Grouping of Information. An area of research that has attracted considerable attention recently is the semantic grouping and organization of content and information in peer-to-peer networks. Various grouping schemes are encountered, such as semantic grouping, based on the content itself; grouping, based on locality or network distance; grouping, based on organization ties, as well as others.

    The relationships between nonfunc-tional characteristics are depicted as aUML diagram in Figure 1. The Figureschematically illustrates the relationshipbetween the various design features andthe main characteristics of peer-to-peercontent distribution systems.

    The boxes in the center show the designdecisions that affect these attributes. Wenote that these design decisions are mostlyindependent and orthogonal.

    We see, for example, that the perfor-mance of a peer-to-peer content distribu-tion system is affected by the distributedobject location and routing mechanisms,as well as by the data replication,caching, and migration algorithms. Fair-ness, on the other hand, depends onthe system’s provisions for accountabil-ity and reputation, as well as on any re-source trading mechanisms that may beimplemented.


    In the following sections of this survey, the different design decisions and features will be presented and discussed, based on an examination of existing peer-to-peer content distribution systems, and with reference to the way in which they affect the main attributes of these systems, as presented. A relevant presentation and discussion of various desired properties of peer-to-peer systems can also be found in Kubiatowicz [2003], while an analysis of peer-to-peer systems from the end-user perspective can be found in Lee [2003].

    3. PEER-TO-PEER DISTRIBUTED OBJECT LOCATION AND ROUTING

    The operation of any peer-to-peer content distribution system relies on a network of peer computers (nodes), and connections (edges) between them. This network is formed on top of—and independently from—the underlying physical computer (typically IP) network, and is thus referred to as an "overlay" network. The topology, structure, and degree of centralization of the overlay network, and the routing and location mechanisms it employs for messages and content, are crucial to the operation of the system, as they affect its fault tolerance, self-maintainability, adaptability to failures, performance, scalability, and security; in short, almost all of the system's attributes as laid out in Section 2 (see also Figure 1).

    Overlay networks can be distinguished in terms of their centralization and structure.

    3.1. Overlay Network Centralization

    Although in their purest form peer-to-peer overlay networks are supposed to be totally decentralized, in practice this is not always true, and systems with various degrees of centralization are encountered. Specifically, the following three categories are identified.

    Purely Decentralized Architectures. All nodes in the network perform exactly the same tasks, acting both as servers and clients, and there is no central coordination of their activities. The nodes of such networks are often termed "servents" (SERVers + cliENTS).

    Partially Centralized Architectures. The basis is the same as with purely decentralized systems. Some of the nodes, however, assume a more important role, acting as local central indexes for files shared by local peers. The way in which these supernodes are assigned their role by the network varies between different systems. It is important, however, to note that these supernodes do not constitute single points of failure for a peer-to-peer network, since they are dynamically assigned and, if they fail, the network will automatically take action to replace them with others.

    Hybrid Decentralized Architectures. In these systems, there is a central server facilitating the interaction between peers by maintaining directories of metadata describing the shared files stored by the peer nodes. Although the end-to-end interaction and file exchanges may take place directly between two peer nodes, the central servers facilitate this interaction by performing the lookups and identifying the nodes storing the files. The terms "peer-through-peer" or "broker mediated" are sometimes used for such systems [Kim 2001].

    Obviously, in these architectures, there is a single point of failure (the central server). This typically renders them inherently unscalable and vulnerable to censorship, technical failure, or malicious attack.

    3.2. Network Structure

    By structure, we refer to whether the overlay network is created nondeterministically (ad hoc) as nodes and content are added, or whether its creation is based on specific rules. We categorize peer-to-peer networks as follows, in terms of their structure:

    Unstructured. The placement of content (files) is completely unrelated to the overlay topology.

    In an unstructured network, content typically needs to be located. Searching mechanisms range from brute force methods, such as flooding the network with propagating queries in a breadth-first or depth-first manner until the desired content is located, to more sophisticated and resource-preserving strategies that include the use of random walks and routing indices (discussed in more detail in Section 3.3.4). The searching mechanisms employed in unstructured networks have obvious implications, particularly in regards to matters of availability, scalability, and persistence.

    Unstructured systems are generally more appropriate for accommodating highly transient node populations. Some representative examples of unstructured systems are Napster, Publius [Waldman et al. 2000], Gnutella [Gnutella 2003], Kazaa [Kazaa 2003], Edutella [Nejdl et al. 2003], and FreeHaven [Dingledine et al. 2000], as well as others.

    Structured. These have emerged mainly in an attempt to address the scalability issues that unstructured systems were originally faced with. In structured networks, the overlay topology is tightly controlled and files (or pointers to them) are placed at precisely specified locations. These systems essentially provide a mapping between content (e.g. file identifier) and location (e.g. node address), in the form of a distributed routing table, so that queries can be efficiently routed to the node with the desired content [Lv et al. 2002].

    Structured systems offer a scalable solution for exact-match queries, that is, queries where the exact identifier of the requested data object is known (as compared to keyword queries). Using exact-match queries as a substrate for keyword queries remains an open research problem for distributed environments [Witten et al. 1999].

    A disadvantage of structured systems is that it is hard to maintain the structure required for efficiently routing messages in the face of a very transient node population, in which nodes are joining and leaving at a high rate [Lv et al. 2002].

    Typical examples of structured systems include Chord [Stoica et al. 2001], CAN [Ratnasamy et al. 2001], PAST [Druschel and Rowstron 2001], and Tapestry [Zhao et al. 2001], among others.

    Table III. A classification of peer-to-peer content distribution systems and location and routing infrastructures in terms of their network structure, with some typical examples.

                                              Centralization
                                Hybrid             Partial                  None
      Unstructured              Napster,           Kazaa, Morpheus,         Gnutella,
                                Publius            Gnutella, Edutella       FreeHaven
      Structured                                                            Chord, CAN,
      infrastructures                                                       Tapestry, Pastry
      Structured                                                            OceanStore, Mnemosyne,
      systems                                                               Scan, PAST, Kademlia,
                                                                            Tarzan

    A category of networks that are in between structured and unstructured are referred to as loosely structured networks. Although the location of content is not completely specified, it is affected by routing hints. A typical example is Freenet [Clarke et al. 2000; Clarke et al. 2002].

    Table III summarizes the categories we outlined, with examples of peer-to-peer content distribution systems and architectures. Note that all structured and loosely structured systems are inherently purely decentralized; form follows function.

    In the following sections, the overlay network topology and operation of different peer-to-peer systems is discussed, following the above classification, according to degree of centralization and network structure.

    3.3. Unstructured Architectures

    3.3.1. Hybrid Decentralized. Figure 2 illustrates the architecture of a typical hybrid decentralized peer-to-peer system.

    Each client computer stores content (files) shared with the rest of the network. All clients connect to a central directory server that maintains:

    —A table of registered user connection information (IP address, connection bandwidth, etc.)

    —A table listing the files that each user holds and shares in the network, along with metadata descriptions of the files (e.g. filename, time of creation, etc.)

    Fig. 2. Typical hybrid decentralized peer-to-peer architecture. A central directory server maintains an index of the metadata for all files in the network.

    A computer that wishes to join the network contacts the central server and reports the files it maintains. Client computers send requests for files to the server. The server searches for matches in its index, returning a list of users that hold the matching file. The user then opens direct connections with one or more of the peers that hold the requested file, and downloads it (see Figure 2).
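    The join/search/download cycle above can be sketched in a few lines of Python. The class, method, and field names below are hypothetical, standing in for whatever a concrete protocol would define; only the division of labor (central index, direct peer downloads) follows the architecture described.

    ```python
    # Minimal sketch of a hybrid decentralized (central-directory) system.
    # All identifiers here are illustrative; real protocols carry richer metadata.

    class DirectoryServer:
        def __init__(self):
            self.users = {}   # user -> connection info (IP address, bandwidth, ...)
            self.files = {}   # filename -> set of users sharing it

        def register(self, user, conn_info, shared_files):
            """A joining peer reports its connection info and shared files."""
            self.users[user] = conn_info
            for f in shared_files:
                self.files.setdefault(f, set()).add(user)

        def search(self, filename):
            """Return connection info of peers holding the file; the actual
            download then happens directly between the peers."""
            return {u: self.users[u] for u in self.files.get(filename, set())}

    server = DirectoryServer()
    server.register("alice", {"ip": "10.0.0.1", "kbps": 512}, ["song.mp3", "doc.pdf"])
    server.register("bob", {"ip": "10.0.0.2", "kbps": 256}, ["song.mp3"])
    hits = server.search("song.mp3")   # both alice and bob hold the file
    ```

    Note that only the lookup is centralized: the returned connection information is used to open direct peer-to-peer transfers, which is why the directory can be small relative to the content it indexes.
    
    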

    The advantage of hybrid decentralized systems is that they are simple to implement, and they locate files quickly and efficiently. Their main disadvantage is that they are vulnerable to censorship, legal action, surveillance, malicious attack, and technical failure, since the content shared, or at least descriptions of it, and the ability to access it are controlled by the single institution, company, or user maintaining the central server. Furthermore, these systems are considered inherently unscalable, as there are bound to be limitations to the size of the server database and its capacity to respond to queries. Large Web search engines have, however, repeatedly provided counterexamples to this notion.

    Examples of hybrid decentralized content distribution systems include the notorious Napster and Publius systems [Waldman et al. 2000] that rely on a static, system-wide list of servers. Their architecture does not provide any smooth, decentralized support for adding a new server, or removing dead or malicious servers. A comprehensive study of the behavior of hybrid peer-to-peer systems and a comparison of their query characteristics and their performance in terms of bandwidth and CPU cycle consumption is presented in Yang and Garcia-Molina [2001].

    It should be noted that systems that do not fall under the hybrid decentralized category may still use some central administration server to a limited extent, for example, for initial system bootstrapping (e.g. Mojonation [MojoNation 2003]), or for allowing new users to join the network by providing them with access to a list of current users (e.g. gnutellahosts.com for the gnutella network).

    3.3.2. Purely Decentralized. In this section, we examine the Gnutella network [Gnutella 2003], an interesting and representative member of purely decentralized peer-to-peer architectures, due to its open architecture, achieved scale, and self-organizing structure. FreeHaven [Dingledine et al. 2000] is another system using routing and search mechanisms similar to those of Gnutella.

    Like most peer-to-peer systems, Gnutella builds a virtual overlay network with its own routing mechanisms [Ripeanu and Foster 2002], allowing its users to share files with other peers.

    There is no central coordination of the activities in the network and users connect to each other directly through a software application that functions both as a client and a server (users are referred to as servents).

    Gnutella uses IP as its underlying network service, while the communication between servents is specified in a form of application-level protocol supporting four types of messages [Jovanovich et al. 2001]:

    Ping. A request for a certain host to announce itself.


    Pong. Reply to a Ping message. It contains the IP and port of the responding host and number and size of the files being shared.

    Query. A search request. It contains a search string and the minimum speed requirements of the responding host.

    Query Hits. Reply to a Query message. It contains the IP, port, and speed of the responding host, the number of matching files found, and their indexed result set.

    After joining the Gnutella network (by connecting to nodes found in databases such as gnutellahosts.com), a node sends out a Ping message to any node it is connected to. The nodes send back a Pong message identifying themselves, and also propagate the Ping message to their neighbors.

    In order to locate a file in unstructured systems such as Gnutella, nondeterministic searches are the only option, since the nodes have no way of guessing where (at which nodes) the files may lie.

    The original Gnutella architecture uses a flooding (or broadcast) mechanism to distribute Ping and Query messages: each Gnutella node forwards the received messages to all of its neighbors. The response messages received are routed back along the opposite path through which the original request arrived. To limit the spread of messages through the network, each message header contains a time-to-live (TTL) field. At each hop, the value of this field is decremented, and when it reaches zero, the message is dropped.

    The above mechanism is implemented by assigning each message a unique identifier and equipping each host with a dynamic routing table of message identifiers and node addresses. Since the response messages contain the same ID as the original messages, the host checks its routing table to determine along which link the response message should be forwarded. In order to avoid loops, the nodes use the unique message identifiers to detect and drop duplicate messages, to improve efficiency, and preserve network bandwidth [Jovanovic 2000].
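    The TTL-limited flooding just described can be sketched as follows. The toy topology, node names, and TTL value are purely illustrative; real Gnutella messages also carry a hops count and descriptor payloads.

    ```python
    # Sketch of Gnutella-style query flooding with a TTL and duplicate detection.
    from collections import deque

    def flood(topology, origin, ttl):
        """Breadth-first flood from `origin`; returns the set of nodes reached.
        Each node forwards a message to all its neighbors until the TTL hits
        zero, dropping messages it has already seen (loop/duplicate detection
        via the message's unique identifier)."""
        seen = {origin}                 # stands in for the per-host ID table
        queue = deque([(origin, ttl)])
        while queue:
            node, t = queue.popleft()
            if t == 0:
                continue                # TTL expired: message is dropped
            for neighbor in topology[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, t - 1))
        return seen

    # A small example overlay; node "E" lies beyond a TTL-2 "horizon" from "A".
    topology = {
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
        "D": ["B", "E"], "E": ["D"],
    }
    reached = flood(topology, "A", ttl=2)   # {"A", "B", "C", "D"}; "E" unreached
    ```

    The example also illustrates the "virtual horizon" discussed below: with TTL 2, node E never sees the query, while raising the TTL to 3 reaches the whole overlay at the cost of more messages.
    
    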

    Once a node receives a QueryHit message, indicating that the target file has been identified at a certain node, it initiates a download by establishing a direct connection between the two nodes. Figure 3 illustrates an example of the Gnutella search mechanism.

    Scalability issues in the original purely decentralized systems arose from the fact that the use of the TTL effectively segmented the network into "sub-networks", imposing on each user a virtual "horizon" beyond which their messages could not reach [Jovanovic 2000]. Removing the TTL limit, on the other hand, would result in the network being swamped with messages.

    Significant research has been carried out to address the above issues, and various solutions have been proposed. These will be discussed in more detail in Section 3.3.4.

    3.3.3. Partially Centralized. Partially centralized systems are similar to purely decentralized ones, but they use the concept of supernodes: nodes that are dynamically assigned the task of servicing a small subpart of the peer network by indexing and caching files contained therein.

    Peers are automatically elected to become supernodes if they have sufficient bandwidth and processing power (although a configuration parameter may allow users to disable this feature) [FastTrack 2003].

    Supernodes index the files shared by peers connected to them, and proxy search requests on behalf of these peers. All queries are therefore initially directed to supernodes.

    Two major advantages of partially centralized systems are that:

    —Discovery time is reduced in comparison with purely decentralized systems, while there still is no unique point of failure. If one or more supernodes go down, the nodes connected to them can open new connections with other supernodes, and the network will continue to operate.


    Fig. 3. An example of the Gnutella search mechanism. Solid lines between the nodes represent connections of the Gnutella network. The search originates at the "requestor" node, for a file maintained by another node. Request messages are dispatched to all neighboring nodes, and propagated from node-to-node as shown in the four consecutive steps (a) to (d). When a node receives the same request message (based on its unique identifier) multiple times, it replies with a failure message to avoid loops and minimize traffic. When the file is identified, a success message is returned.

    —The inherent heterogeneity of peer-to-peer networks is taken advantage of, and exploited. In a purely decentralized network, all of the nodes will be equally (and usually heavily) loaded, regardless of their CPU power, bandwidth, or storage capabilities. In partially centralized systems, however, the supernodes will undertake a large portion of the entire network load, while most of the other (so-called "normal") nodes will be very lightly loaded, in comparison (see also [Lv et al. 2002; Zhichen et al. 2002]).

    Kazaa is a typical, widely used instance of a partially centralized system implementation (as it is a proprietary system, there is no detailed documentation on its structure and operation). Edutella [Nejdl et al. 2003] is another partially centralized architecture.

    Yang and Garcia-Molina [2002a, 2002b] present research addressing the design of, and searching techniques for, partially centralized peer-to-peer networks.

    The concept of supernodes has also been proposed in a more recent version of the Gnutella protocol. A mechanism for dynamically selecting supernodes organizes the Gnutella network into an interconnection of superpeers (as they are referred to) and client nodes.


    When a node with enough CPU power joins the network, it immediately becomes a superpeer and establishes connections with other superpeers, forming a flat unstructured network of superpeers. If it establishes a minimum required number of connections to client nodes within a specified time, it remains a superpeer. Otherwise, it turns into a regular client node.

    3.3.4. Shortcomings and Evolution of Unstructured Architectures. A number of methods have been proposed to overcome the original unscalability of unstructured peer-to-peer systems. These have been shown to drastically improve their performance.

    Lv et al. [2002] proposed to replace the original flooding approach by multiple parallel random walks, where each node chooses a neighbor at random, and propagates the request only to it. The use of random walks, combined with proactive object replication (discussed in Section 4), was found to significantly improve the performance of the system as measured by query resolution time (in terms of numbers of hops), per-node query load, and message traffic generated. Proactive data replication schemes are further examined in Cohen and Shenker [2001].
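    The multiple-random-walk idea can be sketched as follows. The topology, walker count, and per-walk TTL are illustrative only; a real deployment would add further refinements such as walker termination checks with the originator.

    ```python
    # Sketch of a k-walker random-walk search, as an alternative to flooding:
    # each walker forwards the query to ONE randomly chosen neighbor per hop.
    import random

    def random_walk_search(topology, start, target_holder, walkers=4, ttl=16,
                           rng=random):
        """Launch several independent random walks. Returns (found, messages):
        whether any walk reached a node holding the target, and the total
        number of messages sent (the per-node load flooding would avoid)."""
        messages = 0
        for _ in range(walkers):
            node = start
            for _ in range(ttl):
                node = rng.choice(topology[node])
                messages += 1
                if node == target_holder:
                    return True, messages
        return False, messages

    topology = {
        "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
        "D": ["B", "C", "E"], "E": ["D"],
    }
    rng = random.Random(1)            # seeded for reproducibility
    found, cost = random_walk_search(topology, "A", "E", rng=rng)
    ```

    Compared with flooding, each hop generates one message instead of one per neighbor, which is exactly the message-traffic reduction the study above measured; the trade-off is a probabilistic, rather than exhaustive, search.
    
    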

    Yang and Garcia-Molina [2002b] suggested the use of more sophisticated broadcast policies, selecting which neighbors to forward search queries to based on their past history, as well as the use of local indices: data structures where each node maintains an index of the data stored at nodes located within a radius from itself.

    A similar solution to the information retrieval problem is proposed by Kalogeraki et al. [2002] in the form of an Intelligent Search Mechanism built on top of a Modified Random BFS Search Mechanism. Each peer forwards queries to a subset of its neighbors, selecting them based on a profile mechanism that maintains information about their performance in recent queries. Neighbors are ranked according to their profiles, and queries are forwarded selectively to the most appropriate ones only.

    Chawathe et al. [2003] presented the Gia System, which addresses the above issues by means of:

    —A dynamic topology adaptation protocol that ensures that most nodes will be at a short distance from high capacity nodes. This protocol, coupled with replicating pointers to the content of neighbors, ensures that high capacity nodes will be able to provide answers to a very large number of queries.

    —An active flow control scheme that exploits network heterogeneity to avoid hotspots, and

    —A search protocol that is based on random walks directed towards high capacity nodes.

    Simulations of Gia were found to increase overall system capacity by three to five orders of magnitude. Similar approaches, based on taking advantage of the underlying network heterogeneity, are described in Lv et al. [2002] and Zhichen et al. [2002].

    In another approach, Crespo and Garcia-Molina [2002] use routing indices to address the searching and scalability issues. Routing indices are tables of information about other nodes, stored within each node. They provide a list of neighbors that are most likely to be "in the direction" of the content corresponding to the query. These tables contain information about the total number of files maintained by nearby nodes, as well as the number of files corresponding to various topics. Three different variations of the approach are presented, and simulations are shown to improve performance by 1-2 orders of magnitude with respect to flooding techniques.

    Finally, the connectivity properties and reliability of unstructured peer-to-peer networks such as Gnutella have been studied in Ripeanu and Foster [2002]. In particular, emphasis was placed on the fact that peer-to-peer networks exhibit the properties of power-law networks, in which the number of nodes with L links is proportional to L^-k, where k is a network dependent constant. In other words, most nodes have few links (thus, a large fraction of them can be taken away without seriously damaging the network connectivity), while there are a few highly-connected nodes which, if taken away, are likely to cause the whole network to be broken down into disconnected pieces. One implication of this is that such networks are robust when facing random node attacks, yet vulnerable to well-planned attacks. The topology mismatch between the Gnutella network and the underlying physical network infrastructure was also documented in Ripeanu and Foster [2002]. Tsoumakos and Roussopoulos [2003] present a more comprehensive analysis and comparison of the above methods.

    Overall, unstructured peer-to-peer content distribution systems might be the preferred choice for applications where the following assumptions hold:

    —keyword searching is the common operation,

    —most content is typically replicated at a fair fraction of participating sites,

    —the node population is highly transient,

    —users will accept a best-effort content retrieval approach, and

    —the network size is not so large as to incur scalability problems [Lv et al. 2002] (the last issue is alleviated through the various methods described in this section).

    3.4. Structured Architectures

    The various structured content distribution systems and infrastructures employ different mechanisms for routing messages and locating data. Four of the most interesting and representative mechanisms and their corresponding systems are examined in the following sections.

    —Freenet is a loosely structured system that uses file and node identifier similarity to produce an estimate of where a file may be located, and a chain mode propagation approach to forward queries from node-to-node.

    —Chord is a system whose nodes maintain a distributed routing table in the form of an identifier circle on which all nodes are mapped, and an associated finger table.

    —CAN is a system using an n-dimensional Cartesian coordinate space to implement the distributed location and routing table, whereby each node is responsible for a zone in the coordinate space.

    —Tapestry (and the similar Pastry and Kademlia) are based on the Plaxton mesh data structure, which maintains pointers to nodes in the network whose IDs match the elements of a tree-like structure of ID prefixes up to a digit position.

    3.4.1. Freenet—A Loosely Structured System. The defining characteristic of loosely structured systems is that the nodes of the peer-to-peer network can produce an estimate (not with certainty) of which node is most likely to store certain content. This affords them the possibility of avoiding blindly broadcasting request messages to all (or a random subset) of their neighbors. Instead, they use a chain mode propagation approach, where each node makes a local decision about which node to send the request message to next.

    Freenet [Clarke et al. 2000] is a typical, purely decentralized, loosely structured content distribution system. It operates as a self-organizing peer-to-peer network, pooling unused disk space in peer computers to create a collaborative virtual file system. Important features of Freenet include its focus on security, publisher anonymity, deniability, and data replication for availability and performance.

    Files in Freenet are identified by unique binary keys. Three types of keys are supported, the simplest of which is based on applying a hash function to a short descriptive text string that accompanies each file as it is stored in the network by its original owner.
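    The hashing step behind this simplest key type can be approximated as follows. This is only a sketch: the choice of SHA-1 here is an assumption, and real Freenet keyword-signed keys involve additional machinery (such as a key pair derived from the descriptive string) beyond the plain hash shown.

    ```python
    # Simplified sketch of deriving a Freenet-style binary key from a short
    # descriptive text string. Only the hashing step is shown.
    import hashlib

    def descriptive_key(description: str) -> bytes:
        """Map a human-readable description to a fixed-length binary key."""
        return hashlib.sha1(description.encode("utf-8")).digest()

    # The same descriptive string always yields the same key, so anyone who
    # knows (or guesses) the description can compute the key and request
    # the corresponding file.
    key = descriptive_key("text/philosophy/sun-tzu/art-of-war")
    ```

    The determinism is the point: it lets publishers advertise nothing more than a memorable string, at the cost of key collisions between files that different publishers describe identically.
    
    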

    Each Freenet node maintains its own local data store, which it makes available to the network for reading and writing, as well as a dynamic routing table containing the addresses of other nodes and the files they are thought to hold. To search for a file, the user sends a request message specifying the key and a timeout (hops-to-live) value.

    Freenet uses the following types of messages, which all include the node identifier (for loop detection), a hops-to-live value (similar to the Gnutella TTL, see Section 3.3.2), and the source and destination node identifiers:

    Data insert. A node inserts new data in the network. A key and the actual data (file) are included.

    Data request. A request for a certain file. The key of the file requested is also included.

    Data reply. A reply initiated when the requested file is located. The actual file is also included in the reply message.

    Data failed. A failure to locate a file. The location (node) of the failure and the reason are also included.

    New nodes join the Freenet network by first discovering the address of one or more existing nodes, and then starting to send Data Insert messages. To insert a new file in the network, the node first calculates a binary key for the file, and then sends a data insert message to itself. Any node that receives the insert message first checks to see if the key is already taken. If the key is not found, the node looks up the closest key (in terms of lexicographic distance) in its routing table, and forwards the insert message to the corresponding node. By this mechanism, newly inserted files are placed at nodes possessing files with similar keys.

    This continues as long as the hops-to-live limit is not reached. In this way, more than one node will store the new file. At the same time, all the participating nodes will update their routing tables with the new information (this is the mechanism through which the new nodes announce their presence to the rest of the network). If the hops-to-live limit is reached without any key collision, an "all clear" result will be propagated back to the original inserter, informing that the insert was successful.

    If the key is found to be taken, the node returns the preexisting file as if a request were made for it. In this way, malicious attempts to supplant existing files by inserting junk will result in the existing files being spread further.

    If a node receives a request for a locally-stored file, the search stops and the data is forwarded back to the requestor.

    If the node does not store the file that the requestor is looking for, it forwards the request to its neighbor that is most likely to have the file, by searching for the file key in its local routing table that is closest to the one requested. The messages, therefore, form a chain, as they propagate from node-to-node. To avoid huge chains, messages are deleted after passing through a certain number of nodes, based on the hops-to-live value they carry. Nodes also store the ID and other information of the requests they have seen, in order to handle "data reply" and "data failed" messages.
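    The chain-mode forwarding just described can be sketched as follows, with integer keys and a made-up three-node network standing in for Freenet's binary keys and routing tables; the distance metric and all node names are illustrative.

    ```python
    # Sketch of Freenet-style chain-mode request forwarding: a node that does
    # not hold the requested key forwards the request to the neighbor whose
    # routing-table entries are closest to that key.

    def closest_neighbor(routing_table, requested_key):
        """routing_table maps neighbor -> list of keys it is thought to hold.
        Pick the neighbor advertising the key closest to the requested one."""
        def best_distance(keys):
            return min(abs(k - requested_key) for k in keys)
        return min(routing_table, key=lambda n: best_distance(routing_table[n]))

    def chain_search(nodes, start, requested_key, hops_to_live=10):
        """Forward the request node-to-node until some node stores the key,
        or the hops-to-live counter runs out ("data failed")."""
        node, visited = start, []
        for _ in range(hops_to_live):
            visited.append(node)
            if requested_key in nodes[node]["store"]:
                return node, visited      # data reply flows back along `visited`
            node = closest_neighbor(nodes[node]["table"], requested_key)
        return None, visited              # hops-to-live exhausted

    nodes = {
        "A": {"store": {10},     "table": {"B": [40, 55], "C": [90]}},
        "B": {"store": {40, 55}, "table": {"A": [10], "C": [90]}},
        "C": {"store": {90},     "table": {"B": [40]}},
    }
    holder, path = chain_search(nodes, "A", 55)   # A forwards to B, which holds 55
    ```

    The `visited` list is what makes the reply path possible: the data reply retraces the chain, which is also where the caching of successful results at intermediate nodes would occur.
    
    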

    If a node receives a backtracking "data failed" message from a downstream node, it selects the next best node from its routing stack and forwards the request to it. If all nodes in the routing table have been explored in this way and failed, it sends back a "data failed" message to the node from which it originally received the data request message.

    If the requested file is eventually found at a certain node, a reply is passed back through each node that forwarded the request, to the original node that started the chain. This data reply message will include the actual data, which is cached in all intermediate nodes for future requests. A subsequent request with the same key will be served immediately with the cached data. A request for a similar key will be forwarded to the node that previously provided the data.

    To address the problem of obtaining the key that corresponds to a specific file, Freenet recommends the use of a special class of lightweight files called "indirect files". When a real file is inserted, the author also inserts a number of indirect files that are named according to search keywords and contain pointers to the real file. These indirect files differ from normal files in that multiple files with the same key (i.e. search keyword) are permitted to exist, and requests for such keys keep going until a specified number of results is accumulated, instead of stopping at the first file found. The problem of managing the large volume of such indirect files remains open. Figure 4 illustrates the use of indirect files.

    Fig. 4. The use of indirect files in Freenet. The diagram illustrates a regular file with key "A652D17D88", which is supposed to contain a tutorial about photography. The author chose to insert the file itself, and then a set of indirect files (marked with an i), named according to search keywords she considered relevant. The indirect files are distributed among the nodes of the network; they do not contain the actual document, simply a "pointer" to the location of the regular file containing the document.

    The following properties of Freenet are a result of its routing and location algorithms:

    —Nodes tend to specialize in searching for similar keys over time, as they get queries from other nodes for similar keys.

    —Nodes store similar keys over time, due to the caching of files as a result of successful queries.

    —Similarity of keys does not reflect similarity of files.

    —Routing does not reflect the underlying network topology.

    3.4.2. Chord. Chord [Stoica et al. 2001] is a peer-to-peer routing and location infrastructure that performs a mapping of file identifiers onto node identifiers. Data location can be implemented on top of Chord by identifying data items (files) with keys and storing the (key, data item) pairs at the node that the keys map to.

    In Chord, nodes are also identified bykeys. The keys are assigned both to filesand nodes by means of a deterministicfunction, a variant of consistent hashing

Fig. 5. A Chord identifier circle consisting of the three nodes 0, 1, and 3. In this example, key 1 is located at node 1, key 2 at node 3, and key 6 at node 0.

[Karger et al. 1997]. All node identifiers are ordered in an "identifier circle" modulo 2^m (Figure 5 shows an identifier circle with m = 3). Key k is assigned to the first node whose identifier is equal to, or follows, k in the identifier space. This node is called the successor node of key k. The use of consistent hashing tends to balance load, as each node receives roughly the same number of keys.

The only routing information required is for each node to be aware of its successor node on the circle. Queries for a given key are passed around the circle via these successor pointers until a node that contains the key is encountered. This is the node the query maps to.
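The successor rule can be sketched directly, using the identifier circle of Figure 5 (m = 3, nodes 0, 1, and 3). This is an illustrative toy, not Chord's implementation:

```python
def successor(node_ids, key, m=3):
    """Return the first node whose ID is equal to or follows `key`
    on the identifier circle modulo 2**m."""
    space = 2 ** m
    for step in range(space):
        candidate = (key + step) % space
        if candidate in node_ids:
            return candidate

nodes = {0, 1, 3}
# Matches Figure 5: key 1 -> node 1, key 2 -> node 3, key 6 -> node 0.
assert successor(nodes, 1) == 1
assert successor(nodes, 2) == 3
assert successor(nodes, 6) == 0
```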

When a new node n joins the network, certain keys previously assigned to n's successor will become assigned to n. When node n leaves the network, all keys assigned to it will be reassigned to its successor. These are the only changes in key assignments that need to take place in order to maintain load balance.

As discussed, only one data element per node needs to be correct for Chord to guarantee correct (though slow) routing of queries. Performance degrades gracefully when routing information becomes out of date due to nodes joining and leaving the system, and availability remains high only as long as nodes fail independently. Since


Fig. 6. CAN: (a) Example 2-d [0, 1] × [0, 1] coordinate space partitioned between 5 CAN nodes; (b) Example 2-d space after node F joins.

the overlay topology is not based on the underlying physical IP network topology, a single failure in the IP network may manifest itself as multiple, scattered link failures in the overlay [Saroiu et al. 2002].

To increase the efficiency of the location mechanism described previously, which may, in the worst case, require traversing all N nodes to find a certain key, Chord maintains additional routing information in the form of a "finger table". In this table, each entry i points to the successor of node n + 2^i. For a node n to perform a lookup for key k, the finger table is consulted to identify the highest node n′ whose ID is between n and k. If such a node exists, the lookup is repeated starting from n′. Otherwise, the successor of n is returned. Using the finger table, both the amount of routing information maintained by each node and the time required for resolving lookups are O(log N) for an N-node system in the steady state.
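A minimal finger-table lookup over the identifier circle of Figure 5 can be sketched as follows. This is a self-contained toy (interval helpers, finger construction, and the hop loop are illustrative simplifications, not Chord's actual protocol):

```python
# Toy Chord lookup on the circle of Figure 5 (m = 3, nodes {0, 1, 3}).

def successor(node_ids, key, m=3):
    """First node whose ID equals or follows `key` on the circle mod 2**m."""
    space = 2 ** m
    for step in range(space):
        if (key + step) % space in node_ids:
            return (key + step) % space

def in_between(x, a, b, space):
    """True if x lies strictly between a and b, going clockwise."""
    return a < x < b if a < b else (x > a or x < b)

def build_fingers(node_ids, n, m=3):
    """Entry i of n's finger table: successor of (n + 2**i) mod 2**m."""
    return [successor(node_ids, (n + 2 ** i) % 2 ** m, m) for i in range(m)]

def lookup(node_ids, start, key, m=3):
    """Route a lookup for `key` from node `start`, hopping via fingers."""
    space = 2 ** m
    n = start
    while True:
        succ = successor(node_ids, (n + 1) % space, m)
        if key == succ or in_between(key, n, succ, space):
            return succ          # key in (n, succ]: succ is responsible
        nxt = succ               # default: follow the successor pointer
        for f in reversed(build_fingers(node_ids, n, m)):
            if in_between(f, n, key, space):
                nxt = f          # closest preceding finger of `key`
                break
        n = nxt
```

Starting from node 0, a lookup for key 6 resolves to node 0, and a lookup for key 2 resolves to node 3, in agreement with Figure 5.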

Achord [Hazel and Wiley 2002] is proposed as a variant of Chord that provides censorship resistance by limiting each node's knowledge of the network, in ways similar to Freenet (see Section 3.4.1).

3.4.3. CAN. The CAN ("Content-Addressable Network") [Ratnasamy et al. 2001] is essentially a distributed, Internet-scale hash table that maps file names to their location in the network,

by supporting the insertion, lookup, and deletion of (key, value) pairs in the table.

Each individual node of the CAN network stores a part (referred to as a "zone") of the hash table, as well as information about a small number of adjacent zones in the table. Requests to insert, lookup, or delete a particular key are routed via intermediate zones to the node that maintains the zone containing the key.

CAN uses a virtual d-dimensional Cartesian coordinate space (see Figure 6) to store (key K, value V) pairs. The zone of the hash table that a node is responsible for corresponds to a segment of this coordinate space. Any key K is, therefore, deterministically mapped onto a point P in the coordinate space. The (K, V) pair is then stored at the node that is responsible for the zone within which point P lies. For example, in the case of Figure 6(a), a key that maps to coordinate (0.1, 0.2) would be stored at the node responsible for zone B.

To retrieve the entry corresponding to K, any node can apply the same deterministic function to map K to P, and then retrieve the corresponding value V from the node covering P. Unless P happens to lie in the requesting node's zone, the request must be routed from node to node until it reaches the node covering P.
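The deterministic key-to-point mapping can be sketched as below. The hash-based mapping and the four-zone partition are hypothetical illustrations (they do not reproduce the partition shown in Figure 6):

```python
import hashlib

def key_to_point(key):
    """Deterministically map a key to a point in the unit square,
    here using two halves of a hash digest (an illustrative choice)."""
    h = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(h[:4], "big") / 2 ** 32
    y = int.from_bytes(h[4:8], "big") / 2 ** 32
    return (x, y)

# Hypothetical zones: node -> ((x0, x1), (y0, y1)), half-open rectangles.
zones = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 1.0), (0.5, 1.0)),
}

def node_for(point):
    """Return the node whose zone contains the point."""
    x, y = point
    for node, ((x0, x1), (y0, y1)) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node

# Store (K, V) at the node covering P; any node recomputes P to fetch V.
P = key_to_point("some-file-name")
owner = node_for(P)
```

Because the mapping is deterministic, every node derives the same point P for a given key, which is what makes routing to the owner possible.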

CAN nodes maintain a routing table containing the IP addresses of nodes that hold zones adjoining their own, to enable


routing between arbitrary points in space. Intuitively, routing in CAN works by following the straight-line path through the Cartesian space from source to destination coordinates. For example, in Figure 6(a), a request from node A for a key mapping to point p would be routed through nodes A, B, E, along the straight line represented by the arrow.

A new node that joins the CAN system is allocated its own portion of the coordinate space by splitting the allocated zone of an existing node in half, as follows:

(1) The new node identifies a node already in the CAN network, using a bootstrap mechanism as described in Francis [2000].

(2) Using the CAN routing mechanism, the node randomly chooses a point P in the coordinate space and sends a JOIN request to the node covering P. The zone is then split, and half of it is assigned to the new node.

(3) The new node builds its routing table with the IP addresses of its new neighbors, and the neighbors of the split zone are also notified to update their routing tables to include the new node.
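The zone split in step (2) can be sketched for the 2-d case. Alternating the split dimension with depth is one common convention; the function below is an illustrative assumption, not CAN's specified rule:

```python
def split_zone(zone, depth):
    """Split a 2-d zone ((x0, x1), (y0, y1)) in half, alternating the
    split dimension with depth. Returns (kept_half, new_node_half)."""
    (x0, x1), (y0, y1) = zone
    if depth % 2 == 0:               # even depth: split along x
        xm = (x0 + x1) / 2
        return ((x0, xm), (y0, y1)), ((xm, x1), (y0, y1))
    ym = (y0 + y1) / 2               # odd depth: split along y
    return ((x0, x1), (y0, ym)), ((x0, x1), (ym, y1))

whole = ((0.0, 1.0), (0.0, 1.0))
old, new = split_zone(whole, 0)      # existing node keeps `old`
```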

When nodes gracefully leave CAN, the zones they occupy and the associated hash table entries are explicitly handed over to one of their neighbors. Under normal conditions, a node sends periodic update messages to each of its neighbors, reporting its zone coordinates, its list of neighbors, and their zone coordinates. If there is a prolonged absence of such update messages, the neighbor nodes realize there has been a failure, and initiate a controlled takeover mechanism. If many of the neighbors of a failed node also fail, an expanding ring search mechanism is initiated by one of the neighboring nodes, to identify any functioning nodes outside the failure region.

A list of design improvements over the basic CAN design is proposed, including the use of multi-dimensional coordinate space, or multiple coordinate spaces, for improving network latency and fault tolerance; the employment of more

advanced routing metrics, such as connection latency and underlying IP topology, alongside the Cartesian distance between source and destination; allowing multiple nodes to share the same zone, mapping the same key onto different points; and the application of caching and replication techniques [Ratnasamy et al. 2001].

3.4.4. Tapestry. Tapestry [Zhao et al. 2001] supports the location of objects and the routing of messages to objects (or the closest copy of them, if more than one copy exists in the network) in a distributed, self-administering, and fault-tolerant manner, offering system-wide stability by bypassing failed routes and nodes, and rapidly adapting communication topologies to circumstances.

The topology of the network is self-organizing as nodes come and go, and network latencies vary. The routing and location information is distributed among the network nodes; the topology's consistency is checked on-the-fly, and if it is lost due to failures or destroyed, it is easily rebuilt or refreshed.

Tapestry is based on the location and routing mechanisms introduced by Plaxton, Rajaraman, and Richa [1997], in which they present the Plaxton mesh, a distributed data structure that allows nodes to locate objects and route messages to them across an arbitrarily-sized overlay network, while using routing maps of small and constant size. In the original Plaxton mesh, the nodes can take on the roles of servers (where objects are stored), routers (which forward messages), and clients (origins of requests).

Each node maintains a neighbor map, as shown in the example in Table IV. The neighbor map has multiple levels, each level l containing pointers to nodes whose ID must be matched with l digits (the x's represent wildcards). Each entry in the neighbor map corresponds to a pointer to the closest node in the network whose ID matches the number in the neighbor map, up to a digit position.

For example, the 5th entry for the 3rd level for node 67493 points to the node


Table IV. The Neighbor Map Held by Tapestry Node With ID 67493

            Level 5   Level 4   Level 3   Level 2   Level 1
  Entry 0    07493     x0493     xx093     xxx03     xxxx0
  Entry 1    17493     x1493     xx193     xxx13     xxxx1
  Entry 2    27493     x2493     xx293     xxx23     xxxx2
  Entry 3    37493     x3493     xx393     xxx33     xxxx3
  Entry 4    47493     x4493     xx493     xxx43     xxxx4
  Entry 5    57493     x5493     xx593     xxx53     xxxx5
  Entry 6    67493     x6493     xx693     xxx63     xxxx6
  Entry 7    77493     x7493     xx793     xxx73     xxxx7
  Entry 8    87493     x8493     xx893     xxx83     xxxx8
  Entry 9    97493     x9493     xx993     xxx93     xxxx9

Each entry in this table corresponds to a pointer to another node.

closest to 67493 in network distance whose ID ends in ..593. Table IV shows an example neighbor map maintained by a node with ID 67493.

Messages are, therefore, incrementally routed to the destination node digit by digit, from right to left. Figure 7 shows an example path taken by a message from the node with ID = 67493 to the node with ID = 34567. The digits are resolved right to left as follows:

xxxx7 → xxx67 → xx567 → x4567 → 34567
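The suffix-resolution chain above can be sketched as a small function. The wildcard-padded intermediate IDs are only illustrative; a real Plaxton mesh would substitute, at each hop, the closest actual node whose ID carries the required suffix:

```python
def route_path(source, dest):
    """Resolve the destination ID digit by digit, right to left.
    'x' marks the digits that are not yet fixed at each hop."""
    path, current = [source], source
    for l in range(1, len(dest) + 1):
        suffix = dest[-l:]
        if not current.endswith(suffix):
            # Hop to (a stand-in for) a node whose ID ends in `suffix`.
            current = "x" * (len(dest) - l) + suffix
            path.append(current)
    return path

path = route_path("67493", "34567")
# Reproduces the chain: 67493, xxxx7, xxx67, xx567, x4567, 34567
```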

The Plaxton mesh uses a root node for each object, which serves to provide a guaranteed node from which the object can be located. When an object o is inserted in the network and stored at node ns, a root node nr is assigned to it by using a globally consistent deterministic algorithm. A message is then routed from ns to nr, storing data in the form of a mapping (object id o, storer id ns) at all nodes along the way. During a location query, messages destined for o are initially routed towards nr, until a node is encountered containing the (o, ns) location mapping.

The Plaxton mesh offers: (1) simple fault-handling by its potential to route around a single link or node by choosing a node with a similar suffix, and (2) scalability (with the only bottleneck existing at the root nodes). Its limitations include: (1) the need for global knowledge

Fig. 7. Tapestry: Plaxton mesh routing example, showing the path taken by a message originating from node 67493 and destined for node 34567 in a Plaxton mesh using 5-digit decimal IDs.

required for assigning and identifying root nodes, and (2) the vulnerability of the root nodes.

The Plaxton mesh assumes a static node population. Tapestry extends its design to adapt it to the transient populations of peer-to-peer networks and provide adaptability, fault tolerance, as well as various optimizations described in Zhao et al. [2001] and outlined below:

—Each node additionally maintains a list of back-pointers, which point to nodes where it is referred to as a neighbor. These are used in dynamic node insertion algorithms to generate the appropriate neighbor maps for new nodes. Dynamic algorithms are employed for node insertion, populating neighbor maps, and notifying neighbors of new node insertions.

—The concept of distance between nodes becomes semantically more flexible, and locations of more than one replica of an object are stored, allowing the application architecture to define how the "closest" node will be interpreted. For example, in the OceanStore architecture [Kubiatowicz et al. 2000], a "freshness" metric is incorporated in the concept of distance, which is taken into account


when finding the closest replica of a document.

—The use of soft-state to maintain cached content, based on the announce/listen approach [Deering 1998], is adopted by Tapestry to detect, circumvent, and recover from failures in routing or object location. Caches are periodically updated by refreshment messages, or purged if no such messages are received. Additionally, the neighbor map is extended to maintain two backup neighbors, in addition to the closest (primary) neighbor. Furthermore, to avoid costly reinsertions of nodes after failures, when a node realizes that a neighbor is unreachable, instead of removing its pointer, it temporarily marks it as invalid in the hope that the failure will be repaired, and, in the meantime, routes messages through alternative paths.

—To avoid the single point of failure that root nodes constitute, Tapestry assigns multiple roots to each object. This enables a trade-off between reliability and redundancy. A distributed algorithm called surrogate routing is employed to compute a unique root node for an object in a globally consistent fashion, given the nonstatic set of nodes in the network.

—A set of optimizations improves performance by adapting to environment changes. Tapestry nodes tune their neighbor pointers by running refresher threads that update network latency values. Algorithms are implemented to detect query hotspots and offer suggestions as to where additional copies of objects can be placed to significantly improve query response time. A "hotspot cache" is also maintained at each node.

Tapestry is used by several systems such as OceanStore [Kubiatowicz et al. 2000; Rhea et al. 2001], Mnemosyne [Hand and Roscoe 2002], and Scan [Chen et al. 2000].

OceanStore further enhances the performance and fault tolerance of Tapestry by applying an additional "introspection layer", where nodes monitor usage patterns, network activity, and resource

availability to adapt to regional outages and denial of service attacks.

Pastry [Rowstron and Druschel 2001] is a scheme very similar to Tapestry, differing mainly in its approach to achieving network locality and object replication. It is employed by the PAST [Druschel and Rowstron 2001] large-scale persistent peer-to-peer storage utility.

Finally, Kademlia [Maymounkov and Mazieres 2002] is proposed as an improved XOR-topology-based routing algorithm similar to Tapestry and Pastry, focusing on consistency and performance. It introduces the use of a concurrency parameter that lets users trade bandwidth for better latency and fault recovery.

3.4.5. Comparison, Shortcomings, and Evolution of Structured Architectures. In the previous sections, we have presented characteristic examples of structured peer-to-peer object location and routing systems, and their main characteristics. A comparison of these (and similar) systems can be based on the size of the routing information maintained by each node (the routing table), their search and retrieval performance (measured in number of hops), and the flexibility with which they can adapt to changing network topologies.

It turns out that in their basic design, most of these systems are equivalent in terms of routing table space cost, which is O(log N), where N is the size of the network. Similarly, their performance is again mostly O(log N), with the exception of CAN, where the performance is O((d/4)N^(1/d)), d being the number of employed dimensions.

Maintaining the routing table in the face of transient node populations is relatively costly for all systems, perhaps more so for Chord, as nodes joining or leaving induce changes to all other nodes, and less so in Kademlia, which maintains a more flexible routing table. Liben-Nowell et al. [2002a] introduce the notion of the "half-life" of a system as an approach to this issue.


Overall, these systems present relatively equivalent solutions to routing and location in distributed peer-to-peer environments; they focus, however, on different aspects, and prioritize differently the various design issues adopted by the various systems (see also Balakrishnan et al. [2003]).

One of the major limitations of structured peer-to-peer systems is that they are designed to support exact-match lookups. One needs to know the exact identifier (key) of a data item in order to be able to locate the node that stores it. To address this issue, keyword query techniques can be designed on top of exact-match queries, although it is arguable whether such techniques can be efficiently implemented [Lv et al. 2002]. In Garces-Erice et al. [2004], an approach is proposed for augmenting peer-to-peer systems based on distributed hash tables with a mechanism for locating data using incomplete information. The mechanism relies on distributing hierarchically-organized indexes that maintain information about the data stored in nodes. A system to support complex queries over structured peer-to-peer systems is proposed in Harren et al. [2002]. This approach relies on the underlying peer-to-peer system for both indexing and routing, and implements a parallel query processing layer on top of it.

Keleher et al. [2002], furthermore, argue that by creating keys for accessing data items (i.e., by virtualizing the names of the data items), two main problems arise:

—Locality is Destroyed. Files originating from a single site are not usually colocated, meaning that opportunities for enhanced browsing, prefetching, and efficient searching are lost. They propose an approach allowing systems to exploit object locality.

—Useful Application-Level Information is Lost. Data used by applications is often naturally organized in a hierarchical fashion. The virtualization of the file namespace that takes place as files are represented by keys discards this information.

The ability of structured peer-to-peer systems to maintain the structure required for routing, and thus to function efficiently when nodes are joining and leaving at a high rate, is an open research issue. The resilience of structured peer-to-peer systems in the face of a very transient user population is considered in Lv et al. [2002] and Liben-Nowell et al. [2002b].

Finally, Saroiu et al. [2002] argue that, due to the deterministic nature of the topology of structured systems, it is possible for an attacker to construct a set of node identifiers that, if inserted in the network, could intercept all lookup requests originating from a particular node. There are mechanisms for preventing a peer from deliberately selecting arbitrary identifiers; however, they can be probabilistically circumvented if enough addresses are selected.

4. CONTENT CACHING, REPLICATION, AND MIGRATION

Peer-to-peer content distribution systems rely on the replication of content on more than one node for improving the availability of content, enhancing performance, and resisting censorship attempts. Content replication can be categorized as follows:

Passive Replication. Content replication occurs naturally in peer-to-peer systems as nodes request and copy content from one another. This is referred to as passive replication.

Cache-Based Replication. A form of replication employed in several systems (e.g., OceanStore, MojoNation, Freenet), cache-based replication is the result of caching copies of content as it passes through nodes in the network. In Freenet [Clarke et al. 2000], for instance, when a search request succeeds in locating a file, the file is transferred through the network node by node back to the originator of the request (see also Section 3.4.1). In the process, copies of the file are cached on all intermediate nodes, increasing its availability and improving its location characteristics for future requests.


Active Replication. Active (or proactive) content replication and migration methods are often employed to improve locality of data, as well as availability and performance.

Introspective replica management techniques are employed by OceanStore, on top of extensive caching, whereby traffic is observed, and replicas of content objects are created to accommodate demand [Rhea et al. 2001]. OceanStore thus adapts to regional outages and denial of service attacks by migrating data to areas of use, and maintaining sufficiently high levels of data redundancy.

A similar approach is seen in MojoNation, where additional copies of popular data are made by a central services broker that monitors traffic and requests, and are distributed to peer nodes for storage [MojoNation 2003].

Dynamic replica management algorithms are used in Scan to dynamically place a minimum number of replicas, while meeting quality of service and server capacity constraints [Chen et al. 2000]. Such algorithms are designed to satisfy both client latency and server load constraints by first searching for existing replicas that meet the client latency constraint without being overloaded, and, if unsuccessful, proceeding to place new replicas. Different versions of these algorithms differ in the degree to which replicas are enforced to lie close to the client requesting them.

In proactive replication studies for the Gnutella network [Cohen and Shenker 2001], it was found that the optimal replication scheme for nondeterministic search is to replicate objects such that

p_r ∝ √q_r,

where p_r is the number of replicas of object r, and q_r is the object's query rate.
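A square-root allocation of a fixed replica budget can be sketched as follows. The query rates and the budget are hypothetical numbers chosen for illustration:

```python
import math

def sqrt_allocation(query_rates, total_replicas):
    """Allocate a replica budget across objects in proportion to the
    square root of each object's query rate."""
    weights = {o: math.sqrt(q) for o, q in query_rates.items()}
    total = sum(weights.values())
    return {o: max(1, round(total_replicas * w / total))
            for o, w in weights.items()}

rates = {"a": 100, "b": 25, "c": 1}      # hypothetical query rates
alloc = sqrt_allocation(rates, 32)
# Weights are 10 : 5 : 1, so the 32 replicas split as 20, 10, and 2.
```

Note how the square root moderates the skew: object "a" is queried 100 times more often than "c" but receives only 10 times as many replicas.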

By replicating content, data consistency and synchronization issues arise that need to be addressed, especially in systems that allow the deletion and update of content (see also Section 9). Some applications effectively decide to weaken their consistency restrictions in favor of more

extensive data replication and higher availability.

Replication is of increased significance to systems employing steganographic file storage techniques for encryption, such as Mnemosyne [Hand and Roscoe 2002], as these systems must ensure that enough copies of the files are available to avoid collisions. Such encryption techniques are further discussed in Section 5.

Replication is more of a challenge for structured systems such as Chord, where the location of objects is directly bound to the object identifiers. In some cases, the use of aliases is employed to allow for replication [Stoica et al. 2001; Saroiu et al. 2002]. In Tapestry, a replication function produces a set of random keys, yielding a set of replica roots at random points in the ID space; a replica function in Chord maps an object's key to the node IDs of the neighbor set of the key's root; in CAN, a replica function produces random keys for storing replicas at diverse locations [Wallach 2002].

Finally, mechanisms for trading content, discussed in Section 8, essentially result in an additional form of data replication. In FreeHaven, for example, peers frequently trade and exchange parts of documents (shares) with each other, with the sole aim of increasing availability and censorship resistance [Dingledine et al. 2001a].

    5. SECURITY

Peer-to-peer content distribution architectures present a particular challenge for providing the levels of availability, privacy, confidentiality, integrity, and authenticity often required, due to their open and autonomous nature. The network nodes must be considered untrusted parties, and no assumptions can be made regarding their behavior.

This section discusses various approaches that address a number of security issues characteristic of peer-to-peer content distribution systems. We particularly focus on secure storage, secure routing, access control, authentication, and identity management.


    5.1. Secure Storage

Various cryptographic algorithms and protocols are employed to provide security for content published and stored in peer-to-peer networks.

Self-Certifying Data. Self-certifying data is data whose integrity can be verified by the node retrieving it [Castro et al. 2002]. A node inserting a file in the network calculates a cryptographic hash of the file's content, based on a known hashing function, to produce the file key. When a node retrieves the file using its key, it calculates the same hash function to verify the integrity of the data. Note the requirement for common knowledge of the hashing function by all nodes in the network. The above approach is adopted by the CFS system [Dabek et al. 2001], while PAST uses a similar approach based on signed files [Druschel and Rowstron 2001]. Note that the method depends on using the hash-derived signature as a file's unique access key.
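The self-certification check amounts to a few lines; the choice of SHA-256 here is an illustrative assumption (any hash function commonly known to all nodes would serve):

```python
import hashlib

def content_key(data: bytes) -> str:
    """File key = cryptographic hash of the content, using a hash
    function known to all nodes."""
    return hashlib.sha256(data).hexdigest()

def verify(key: str, data: bytes) -> bool:
    """A retrieving node recomputes the hash and compares it to the
    key it used for retrieval."""
    return content_key(data) == key

blob = b"published content"
key = content_key(blob)
assert verify(key, blob)                 # intact content verifies
assert not verify(key, b"tampered!")     # any modification is detected
```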

Information Dispersal. The information dispersal algorithm by Rabin [1989] is widely used. Publius [Waldman et al. 2000], Mnemosyne [Hand and Roscoe 2002], and FreeHaven [Dingledine et al. 2000] employ the algorithm for encoding information to be published in the network. Files are encoded into m blocks, so that any n of them are sufficient to reassemble the original data (n < m). This gives resilience proportional to a redundancy factor equal to m/n.

Shamir's Secret Sharing Scheme. Shamir's [1979] secret sharing scheme is also used in several systems (Publius [Waldman et al. 2000], MojoNation [MojoNation 2003], and PAST [Druschel and Rowstron 2001]). The publisher of the content encrypts a file with a key K, then splits K into l shares, so that any k of them can reproduce K, but k − 1 give no hints about K. Each server then stores one of the key shares along with the file block. In order for the file to become inaccessible, at least (l − k + 1) of the servers holding key shares must be shut down.
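The k-of-l threshold property can be demonstrated with a minimal implementation of Shamir's scheme over a prime field. The prime, the example secret, and the parameter names below are illustrative choices:

```python
import random

P = 2 ** 61 - 1   # a Mersenne prime; all arithmetic is modulo P

def split(secret, l, k):
    """Split `secret` into l shares; any k of them reconstruct it.
    The secret is the constant term of a random degree-(k-1) polynomial."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, l + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret; the modular
    inverse is computed via Fermat's little theorem (P is prime)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(424242, l=5, k=3)
assert reconstruct(shares[:3]) == 424242   # any 3 of the 5 shares suffice
```

With k = 3 and l = 5, knocking out up to l − k = 2 share holders leaves the key recoverable; only the loss of l − k + 1 = 3 holders destroys it.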

Anonymous Cryptographic Relays. A mechanism based on publisher, forwarder, storer, and client peers communicating through anonymous connection layers is proposed by Serjantov [2002]. A publisher selects several forwarders and sends to them, via an anonymous connection, encrypted shares of a file. The forwarders, in turn, select other nodes to act as storers for the shares, and forward the shares to them (again via an anonymous connection). Once all shares are stored, the publisher destroys them, and announces the file name, together with the list of forwarders that were used.

In order to retrieve the content, a client will contact the forwarders; the forwarders will, in turn, contact random servers to act as decryptors for the addresses of the storers holding the shares. The forwarders will then contact the storers, requesting the shares. The storers will decrypt the shares and send them back to the client. The process repeats until enough shares are collected to reconstruct the document.

Forwarders are visible to a potential attacker, but they are less likely to be attacked since they store neither the content nor the addresses of the nodes storing the content.

Smartcards. The use of smartcards for tracking each node's use of remote resources and issuing digitally signed tickets was also suggested in the PAST system [Druschel and Rowstron 2001]. This would allow nodes to prove to other nodes that they are operating within their quota. It is argued, however, that it may be impractical to issue smartcards to a very large number of nodes, and that the danger of the smartcard keys being compromised by nodes of the network is considerable [Wallach 2002].

Distributed Steganographic File Systems. A distributed steganographic file system, based on Anderson et al. [1998], is implemented by Mnemosyne [Hand and Roscoe 2002]. The main property of a steganographic file system is that encrypted blocks are indistinguishable from a random substrate, so that their presence


cannot even be detected. The system is prepared by first writing random data to all blocks, and then files are stored by encrypting their blocks and placing them at pseudo-randomly chosen locations (e.g., by hashing the block number with a randomly chosen key). To avoid collisions, a considerable amount of replication is required.
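The pseudo-random block placement can be sketched as below. The specific hash construction and key are illustrative assumptions; the essential point is that the placement is deterministic for the key holder but looks random to anyone else:

```python
import hashlib

def block_location(key: bytes, block_no: int, n_blocks: int) -> int:
    """Pseudo-random location of a file block, obtained by hashing the
    block number together with the file's secret key."""
    h = hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
    return int.from_bytes(h[:8], "big") % n_blocks

# The owner recomputes the same locations deterministically from the key;
# without it, the chosen locations are indistinguishable from random.
locs = [block_location(b"secret-key", i, 1_000_000) for i in range(4)]
```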

Erasure Coding. A mechanism based on erasure coding for data durability and a Byzantine agreement protocol [Lamport et al. 1982] for consistency and update serialization is implemented by OceanStore [Rhea et al. 2001].

With erasure coding, the data is broken into blocks and spread