Computer Networks 54 (2010) 2756–2774

Contents lists available at ScienceDirect

Computer Networks

journal homepage: www.elsevier.com/locate/comnet

Preventing DDoS attacks on internet servers exploiting P2P systems

Xin Sun *, Ruben Torres, Sanjay Rao

School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Avenue, West Lafayette, IN 47907, United States

* Corresponding author. Tel.: +1 765 409 1291. E-mail addresses: [email protected] (X. Sun), [email protected] (R. Torres), [email protected] (S. Rao).

Article info

Article history: Received 3 June 2009; Received in revised form 4 November 2009; Accepted 13 May 2010; Available online 6 July 2010

Responsible Editor: C. Douligeris

Keywords: P2P; DDoS; Membership validation; Experimental evaluation

Abstract

Recently, there has been a spurt of work [1–7] showing that a variety of extensively deployed P2P systems may be exploited to launch DDoS attacks on web and other Internet servers, external to the P2P system. In this paper, we dissect these attacks and categorize them based on the underlying cause for attack amplification. We show that the attacks stem from a violation of three key principles: (i) membership information must be validated before use; (ii) innocent participants must only propagate validated information; and (iii) the system must protect against multiple references to the victim. We systematically explore the effectiveness of an active probing approach to validating membership information in thwarting such DDoS attacks. The approach does not rely on centralized authorities for membership verification, and is applicable to both structured (DHT-based) and unstructured P2P systems. We believe these considerations are important to ensure the mechanisms can be integrated with a range of existing P2P deployments. We evaluate the techniques in the context of a widely deployed DHT-based file-sharing system, and a video broadcasting system with stringent performance requirements. Our results show the promise of the approach in limiting DDoS attacks while not sacrificing application performance.

© 2010 Elsevier B.V. All rights reserved.

1389-1286/$ - see front matter. doi:10.1016/j.comnet.2010.05.021

1. Introduction

Peer-to-peer (P2P) systems have matured to the point we have recently seen several commercial offerings [8–10]. Given the increasing prevalence of the technology, it becomes critical to consider how such systems can be deployed in a safe, secure and robust manner.

Several works [11–15] have studied how malicious nodes in a P2P system may disrupt the normal functioning, and performance of the system itself. In this paper, however, we focus on attacks where malicious nodes in a P2P system may impact the external Internet environment, by causing large-scale distributed denial of service (DDoS) attacks on web servers and other sites not even part of the overlay system. In particular, an attacker could subvert membership management mechanisms, and force a large fraction of nodes in the system to believe in the existence of, and communicate with a potentially arbitrary node in the Internet. The attacks are hard to detect and track down as the packets being exchanged between the attacker and innocent nodes are not distinguishable from normal protocol packets.

While the community has been aware of the possibility of exploiting P2P systems to launch DDoS attacks for several years (for example [1]), a number of researchers have highlighted the criticality of the problem in recent years. The feasibility of exploiting the intrinsic characteristics of P2P systems for indirection attacks was first systematically shown in [2]. Since then, several works including our own [3–6,16] have demonstrated the generality of the problem, by showing that a variety of extensively deployed systems may be exploited to launch DDoS attacks. The systems include unstructured file-sharing systems such as Gnutella [3], and BitTorrent [4,5], DHT-based file-sharing systems such as Overnet [2], and Kad [6,16], and a video broadcasting system based on End System Multicast (ESM) [6].

Attacks can be significant; for example, [6] showed an attack where over 700 Mbps of UDP traffic was generated at the victim by exploiting Kad using 200 attacker machines. Creating the attack heuristics was as simple as modifying two source files and less than 200 lines.

While traditional ways to launch DDoS attacks such as DNS reflector attacks [17], or botnet-based DDoS attacks [18] are more widespread today, there is evidence to suggest that exploiting P2P systems to launch DDoS attacks in the wild is on the rise [7,19]. For instance, Prolexic Technologies has reported that they have observed what they term a DC++ attack [7] that involved over 300 K IP addresses. Discussions in the eMule forum [19,20] have indicated DDoS attacks on DNS servers exploiting the DHT-based Kad system. We present evidence of this in Section 2.3 through traffic measurements collected at the edge of a MiniPoP of an ISP. We believe it is imperative to systematically study the problem given the large-scale deployments of P2P systems, the emerging reports of attacks in the wild, and the relative lack of attention to the area.

In this paper, we seek to obtain a deeper understanding of the threats by studying the intrinsic design limitations of existing P2P systems which leave them vulnerable to such DDoS attacks. As a first contribution of this paper, we categorize all known DDoS attacks presented to date. In our classification, we focus on the underlying cause for achieving amplification. By amplification, we refer to the ratio of the number of messages (or bytes) received by the victim to the total number of messages (or bytes) sent by all malicious nodes. We focus on amplification since this is the key factor that determines whether an attack is attractive to a malicious node. We then articulate key design principles that P2P designers must follow to avoid these sources of amplification. The principles highlight the need to validate membership information before it is further used or propagated, and the need to protect against multiple references to the victim. While these principles are almost obvious in retrospect, the failure to follow the guidelines in a wide range of deployed systems, and the resulting repercussions are striking.

As a second contribution of this paper, we systematically explore the effectiveness of an active probing approach to validating membership information in mitigating the threats. We focus on this approach since it does not rely on centralized authorities for membership verification, and since it is applicable to both structured (DHT-based) and unstructured approaches. The key issues with an active probing approach are ensuring that the probes used for validation themselves do not become a source of DDoS, and dealing with benign validation failures that may occur due to packet loss, churn in group membership, and the presence of hosts behind Network Address Translators (NATs). We present simple mechanisms to address these issues, and show that with the mechanisms, the maximum amplification achievable by attackers can be bounded. We have incorporated these mechanisms in two contrasting applications: a DHT-based file-sharing system (Kad [21]), and a video broadcasting system (ESM [22]) with stringent performance requirements. Through extensive experimental evaluation, we show that the schemes may be suitably parameterized to ensure that DDoS attacks are effectively limited, while not sacrificing application performance.

2. DDoS attacks by exploiting P2P systems

Recently researchers have shown how a variety of P2P systems can be exploited to launch DDoS attacks on any Internet host such as a web server [3–6,23,16]. Each of these works presents attack heuristics on a specific system, and to date there have been several different attacks reported on five widely deployed P2P systems. In this section, we begin by presenting an overview of these systems in Section 2.1, and then summarize the attacks exploiting them in Section 2.2.

2.1. Systems background

Kad and Overnet are large-scale DHT-based file-sharing systems with each having more than one million concurrent users. Kad is supported by the popular eMule [21] client and its clones, while Overnet is supported by eDonkey [24]. These two systems are similar because they both implement the Kademlia [25] protocol, and differ primarily in implementation issues. In both systems, each participating node maintains a routing table with a subset of peers. For any given file, there are multiple index nodes, each of which maintains a list of members who own that file. Index nodes are regular participants, who have an ID close to a file ID. Every node periodically publishes to the index nodes what files it owns. When a node wants to download a file, it first employs an iterative query lookup mechanism to locate an index node, and it obtains a list of members having the file from the index node.

BitTorrent is a very popular tracker-based unstructured P2P system for file sharing. In BitTorrent, if a node wants to download a file (say a movie), it must first download the torrent file for that movie, which is published through out-of-band channels such as websites. The torrent file contains a list of trackers. A tracker is a central server which maintains the membership information of a swarm. Each node contacts one or more trackers to obtain a list of peers in the swarm, and starts exchanging data with the peers. Each node also contacts the trackers periodically afterwards to discover more peers.

Gnutella is another popular unstructured P2P file-sharing system which has a two-tier hierarchy. In Gnutella, when a node launches a query for a file, other nodes may reply with a query hit message that includes the IP and port of a peer that has the file. In addition, when the node requests the peer for the file, the format of the file request message is HTTP based.

ESM is one of the first operationally deployed P2P video streaming systems. It constructs a multicast tree on top of an unstructured overlay for data delivery, and employs a gossip-based membership management protocol. Each node periodically picks another node at random, and sends it a subset of the peers it knows. A node adds to its routing table any peer that it has not already known, and may use these peers for various protocol operations such as parent selection.
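The gossip exchange described for ESM can be illustrated with a minimal sketch. The class name `GossipNode`, its methods, and the subset size are assumptions made for illustration; this is not ESM's actual implementation. Peers are added without any validation here, which is the behavior the attacks in Section 2.2 exploit.

```python
import random


class GossipNode:
    """Toy model of push-based gossip membership (hypothetical, not ESM code)."""

    def __init__(self, name, subset_size=3, rng=None):
        self.name = name
        self.subset_size = subset_size      # assumed number of peers per gossip message
        self.routing_table = set()
        self.rng = rng or random.Random()

    def receive(self, peers):
        # Add any peer not already known; nothing is validated in this sketch.
        for peer in peers:
            if peer != self.name:
                self.routing_table.add(peer)

    def gossip_round(self, nodes_by_name):
        # Pick one known peer at random and push a subset of known peers to it.
        if not self.routing_table:
            return None
        known = sorted(self.routing_table)
        target = self.rng.choice(known)
        subset = self.rng.sample(known, min(self.subset_size, len(known)))
        if target in nodes_by_name:         # unreachable targets are simply skipped
            nodes_by_name[target].receive(subset + [self.name])
        return target
```

In this model, a single node that learns a bogus entry will pass it on in later rounds, so the entry spreads without further effort from whoever introduced it.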

2.2. Attacks

In any scalable P2P system, a node A may learn about another node C from a peer node B, as shown in Fig. 1(a). For example, this is required when a node locates index nodes, or obtains a list of sources for a file. However, this operation can be exploited by a malicious node M to redirect A to the victim V, as shown in Fig. 1(b). V can potentially be any Internet host such as a web server. In this section we summarize various ways in which this vulnerability has been exploited to cause large-scale DDoS attacks.

Fig. 1. (a) Normal operation and (b) attack.

Attacks exploiting Kad and Overnet: While [6,2] have presented attacks on Kad and Overnet, respectively, we believe that most of these attacks are applicable to both systems. Thus we summarize them together here.

Index poisoning [2]: This attack is depicted in Fig. 2. Here, a malicious node M publishes to an index node I that the victim holds some file. Later any innocent participant A looking for a set of sources for the file will contact I and be redirected to the victim V. The redirected participants will try to establish TCP connections to the victim in order to download the file. This could not only result in TCP SYNs, but also result in successful TCP connections if for instance an actual web or mail server were running on the victim. The bar for such an attack could be raised by not requiring nodes to insert their IP and port information in application layer messages to begin with; however, we note that index poisoning attacks could still occur if malicious nodes could conduct packet level spoofing.

Fig. 2. Index poisoning attack in Overnet.

NAT-buddy exploit: This attack may be viewed as a variant of the Index poisoning attack. It exploits a NAT traversal mechanism that is commonly used in today's P2P systems, including both Kad and Overnet. In such a mechanism, a node behind a NAT could select a public node as its buddy. When the NAT node publishes a file, information about the buddy is included in its publish message to the index nodes. A malicious node exploits this to launch a DDoS attack, by advertising the victim as its buddy. When innocent participants obtain a set of sources from the index node, they contact the buddy (victim) as per normal protocol operations, resulting in a DDoS attack on the victim. This attack is briefly discussed in [2]. The attack could potentially be prevented by modifying the underlying protocols so that NAT nodes are required to publish content through their buddies; however, the protocol modifications must ensure the overheads at the buddy are not increased significantly, and must include mechanisms to ensure malicious nodes cannot deliberately increase the overheads on the buddy, for instance by providing information about non-existent files.

Search hijack [6]: This attack exploits the parallel lookup mechanism of Kad and Overnet, where multiple nodes may be included in a reply to a query. In particular, when a malicious node receives a search query from an innocent participant, it includes in the reply multiple (fake) logical identifiers, all sharing the IP address of the victim. This results in the innocent participant querying the victim multiple times. Note that in order to enable distinct users behind the same NAT to participate in the system, Kad, Overnet, and many other P2P systems allow a participating node to communicate with multiple logical identifiers even though they share the same IP address.

Routing table poisoning [2]: This attack is specific to Overnet. It exploits the announcement messages, which enable nodes to announce themselves to others. In particular, a malicious node may put the victim's IP address in an announcement message and send it to an innocent participant, by exploiting a vulnerability specific to Overnet. This results in the innocent participant adding the victim to its routing table, and using it for normal protocol operations. As with index poisoning attacks, the bar for such an attack could be raised by not requiring nodes to insert their IP and port information in application layer messages; however, the attacks could still occur if malicious nodes could conduct packet level spoofing.

Attacks exploiting BitTorrent: Sia [5] presents an attack where malicious nodes in a BitTorrent swarm can report to the tracker that the victim is a participating peer, through packet spoofing. This would cause the tracker to redirect innocent participants to the victim, who in turn repeatedly initiate TCP connections to the victim. Harrington et al. [23] present a variant of this attack where the tracker itself is malicious and falsely tells innocent participants in the swarm that the victim is a participating peer. Defrawy et al. [4] present an attack where the attacker publishes fake torrent files to web sites of known torrent search engines. Each of the fake torrent files includes the victim's IP several times, each with a different port. Innocent participants who download the torrent files believe that there are different trackers running on the same IP, and repeatedly try to connect to each of the trackers.

Attacks exploiting Gnutella: Athanasopoulos et al. [3] present an attack where malicious nodes include the victim's IP address in their replies to file query messages sent by innocent participants. The victim in this attack is a web server which hosts some files. Further, the file names in the reply messages are constructed in such a way that the file requests sent by innocent participants to the victim will look exactly like genuine HTTP requests from normal web clients. This causes the victim to upload an entire file to the redirected nodes.

Attacks exploiting ESM: This attack is presented in [6], where a malicious node M exploits the push-based nature of the gossip protocol in ESM. In particular, a malicious node generates false information about the victim being part of the group, and aggressively pushes the information as part of its gossip messages to innocent participants. In addition, the malicious node includes the victim's IP several times in a gossip message, each with a different logical ID, similar to the Search hijack attack in Kad.

Fig. 3. (a) Number of unsuccessful Kad flows sent to ports that received more than 500 unsuccessful flows. (b) Fraction of unsuccessful Kad flows to ports that received more than 500 unsuccessful flows. [Plots not reproduced; x-axis: port number; y-axis: (a) unsuccessful flows, (b) fraction of unsuccessful flows.]

2.3. Potential evidence for real attacks in the wild

We have analyzed a one day trace collected at the edge of a MiniPoP of an ISP, where we found potential evidence of abnormal traffic to DNS servers from peers running the Kad system. Our trace consists of flow-level logs.¹ Our analysis considers all Kad UDP traffic between hosts inside the network and hosts in the Internet. Fig. 3(a) shows the number of unanswered flows (i.e., flows for which only outbound traffic was seen), as a function of the destination port number. Impulses were plotted only for ports that received more than 500 unanswered flows. The spike at port 4672 is expected since this is the default UDP Kad port. However, it is interesting that there is a spike at port 53 (DNS port). Fig. 3(b) shows the fraction of unanswered flows out of the total outgoing flows, as a function of the port number. The graph shows that port 53 has the highest ratio of unanswered flows with 92%.

¹ A flow is identified by the 5-tuple comprising source and destination addresses and ports, and protocol. A UDP flow starts when the first packet is observed, and is considered to end if no packet is seen in any direction for 200 s. If a packet is never observed in the reverse path, then the UDP flow is said to be unanswered.

We considered whether these results could be due to actual Kad clients running on port 53, which are aiming to hide their presence in firewalled networks. We believe there are two reasons this is unlikely to be the case. First, a majority of flows (92%) to port 53 are unanswered, while for other ports, the percentage of unanswered flows was at most 60%. Second, manual investigation (e.g., by doing an nslookup on the target IP addresses) indicated that most destinations were DNS servers (e.g., in China and Thailand).

We believe these results are abnormal, and potentially point to a DDoS attack exploiting the Kad system. Our results also corroborate discussions in the eMule forum [19], and works by other researchers [20], which further strengthens our belief that these results may be due to a DDoS attack. Finally, we note that others have observed DDoS attacks in the wild exploiting other P2P systems [7].

3. Eliminating attack amplification

While specific solutions may potentially be designed for each of the attacks listed in Section 2, the fact that a multitude of attacks have been reported against a range of systems leads us to explore more general principles and techniques to protect against the entire class of attacks. In this section, we dissect the attacks in Section 2.2, and identify a few underlying patterns which lead to large attack amplification. Based on the insights, we articulate a few generic design principles for P2P developers.

3.1. Classifying attacks by amplification causes

Based on the source of amplification, the attacks described in Section 2.2 can be classified as follows (also in Table 1):

3.1.1. Repeated packets to victim

This is the simplest source of amplification, where an innocent participant may keep using the victim it learned from a malicious node for various protocol operations, despite the fact that it has never been able to communicate with the victim. Surprisingly, many of the systems have this vulnerability.

3.1.2. Multifake

In this case, a malicious node may convey the same victim multiple times to an innocent participant, by disguising the victim as multiple different members of the system. This is a major source of amplification in the case of the Search hijack attack on Kad and Overnet, the attack on ESM, and the attack on BitTorrent involving fake torrent files [4]. This heuristic can be generalized to mount an attack on a network, by having the malicious node include fake membership information about several different IP addresses, all of them belonging to the same network.
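To make the effect of Multifake concrete, recall that the Introduction defines amplification as the ratio of messages (or bytes) received by the victim to those sent by all malicious nodes. The sketch below computes that ratio for a hypothetical malicious reply that lists the victim under several fake identities. The function names and the numbers are illustrative assumptions, not measurements from the paper.

```python
def amplification(received_by_victim, sent_by_attackers):
    """Ratio of messages (or bytes) reaching the victim to those sent by malicious nodes."""
    if sent_by_attackers <= 0:
        raise ValueError("attackers must send at least one message")
    return received_by_victim / sent_by_attackers


def multifake_messages_to_victim(fake_entries_per_reply, replies, contacts_per_entry=1):
    # Each fake entry is treated as a distinct member, so an innocent node
    # contacts the victim once (or more) per entry in every malicious reply.
    return fake_entries_per_reply * replies * contacts_per_entry


# Hypothetical numbers: 10 malicious replies, each listing the victim under 8 fake IDs.
hits = multifake_messages_to_victim(fake_entries_per_reply=8, replies=10)
print(amplification(hits, sent_by_attackers=10))  # 8.0
```

Under these assumptions the amplification grows linearly with the number of fake entries per reply, which is why a bound on references to the same victim matters.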

3.1.3. Delegation

For attacks in this class, amplification is achieved because the innocent participants spread fake membership information. This is a major source of amplification in the Index poisoning and NAT-buddy exploit attacks in Kad and Overnet, where the index nodes propagate the victim; in the attacks on BitTorrent, where the trackers and torrent websites propagate the victim; and in the attacks on ESM, where innocent participants gossip about the victim to each other.

3.1.4. Triggering large reply from victim

This class includes attacks such as those on Gnutella [3], where an entire file is sent in response to a query, leading to high amplification.

3.2. Principles for robust design

We next present several key design principles which, if followed, could eliminate all the amplification causes.

Validate before use: Each node must validate membership information it learns, before adding the information to its routing table and/or using it for protocol operations. This eliminates the Repeated packets to victim vulnerability since it ensures that no protocol packets will be sent to a node until it has been validated. Further, it also eliminates the Triggering large reply from victim vulnerability because an innocent node would not send file requests to the victim server since the server cannot be successfully validated.

Validate before propagation: Any membership information must be first validated before being propagated to any other nodes. This ensures that innocent participants do not propagate fake membership information. A potential source of attack amplification such as Delegation is avoided as attackers are now constrained to infecting the innocent participants directly. It is worth pointing out that this principle must be followed for all operations involving exchanging membership information. For instance, in Kad, the principle is followed when nodes learn other members through the search process, but not followed when index nodes receive publish messages from other nodes; thus, Kad is still vulnerable to the Index poisoning attack.

Preventing multiple references to the victim: This principle guards against amplification due to Multifake, where an attacker conveys the same victim multiple times to an innocent participant by disguising the victim as multiple distinct members of the system. A particularly interesting case is where the attacker is able to convey the victim multiple times in a single membership message, as in the Search hijack attack, because this has interesting implications for our defense mechanisms as we will see in Section 4.

Table 1
Classification of DDoS attacks by their major source of amplification.

System | Attack | Repeated packets to victim | Multifake | Delegation | Triggering large reply from victim
Kad & Overnet | Index poisoning [2] | - | - | √ | -
Kad & Overnet | NAT-buddy exploit [2] | - | - | √ | -
Kad & Overnet | Search hijack [6] | - | √ | - | -
Overnet | RT poisoning [2] | √ | - | - | -
BitTorrent | Fake report to tracker [5] | - | - | √ | -
BitTorrent | Malicious tracker [23] | √ | - | - | -
BitTorrent | Fake torrent file [4] | - | √ | √ | -
Gnutella | Gnutella attack [3] | - | - | - | √
ESM | ESM attack [6] | √ | √ | √ | -

4. Enhancing DDoS resilience

As discussed in Section 3, the key principle involved in limiting DDoS amplification is validating membership information. Several approaches may be adopted to this end, and we discuss this further in Section 4.1. In this paper, we explore in depth the potential of an active probing approach to validating membership information. We discuss the considerations that motivated us to focus on such an approach, and details of the approach in the rest of the section.

4.1. Approaches for validating membership information

We discuss possible approaches that may be adopted for validating membership information:

Use of centralized authorities: Centralized authorities can simplify validation of membership information by providing signed certificates that indicate that a member belongs to a group. The certificates may be distributed through the membership management mechanisms, and enable a participant to verify the membership information corresponds to a genuine member. However, the existence of central authorities cannot be assumed in many P2P deployments, and we only consider mechanisms that do not rely on their existence.

DHT-specific approaches: It may be feasible to leverage the properties of DHTs, and design solutions tailored to them. For instance, one approach is to assign each node an ID dependent on its IP address, for instance the hash of its IP address [26]. An attacker that attempts to provide fake membership information in response to a search query must then ensure that (i) the victim ID obeys the necessary relationship to the victim IP; and (ii) the victim ID is close to the target ID of the search. These dual constraints on the victim ID may be difficult to simultaneously satisfy, complicating the attack. Issues that need to be addressed with such an approach include the need to accommodate multiple participants behind the same NAT which share a common IP address, and the fact the victim ID is only loosely constrained by the target ID. While the approach has promise, we do not explore it further since our focus in this paper is on mechanisms that apply to both structured and unstructured approaches.

Corroboration from multiple sources: Another approach to validating membership information is based on prior works on Byzantine-tolerant diffusion algorithms [27–30]. In this approach, a node will accept and communicate with a newly-learned peer only if it learns about the peer from multiple other nodes (say k). Such schemes are susceptible to attacks where the attacker has control over k or more nodes, because the attacker can then make these malicious nodes lie about the same fake membership information, and defeat the corroboration. Such attacks may be particularly easy to conduct in conjunction with a Sybil attack. Another concern with the approach is that from a performance perspective, the larger the k value, the longer it may take to receive responses from all k nodes, which can slow down convergence and performance [31].
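Two of the approaches above lend themselves to a brief sketch: binding a node ID to the hash of its IP address, and accepting a peer only after k distinct sources have reported it. The helper names, the choice of SHA-1, and the 160-bit ID width are assumptions made for illustration; the cited works may define these details differently.

```python
import hashlib
from collections import defaultdict


def node_id_for_ip(ip):
    # Assumed binding: ID = SHA-1 of the textual IP address (160 bits).
    return int.from_bytes(hashlib.sha1(ip.encode("ascii")).digest(), "big")


def id_matches_ip(claimed_id, ip):
    # A receiver can recompute the hash and reject entries that do not match.
    return claimed_id == node_id_for_ip(ip)


class Corroborator:
    """Accept a peer only after k distinct reporters have mentioned it."""

    def __init__(self, k):
        self.k = k
        self.reporters = defaultdict(set)   # peer -> set of reporters seen so far

    def report(self, reporter, peer):
        self.reporters[peer].add(reporter)  # repeated reports from one source count once
        return self.accepted(peer)

    def accepted(self, peer):
        return len(self.reporters.get(peer, ())) >= self.k
```

As the text notes, the second check is only as strong as the assumption that an attacker controls fewer than k identities, so a Sybil attacker can still satisfy it.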

4.2. Validating peers through active probing

In this paper, we explore the potential of an active probing approach for membership validation. We focus on such an approach because (i) it does not rely on centralized authorities; and (ii) the technique applies to both unstructured approaches as well as structured DHT-based approaches. Thus, the technique has potential to be widely applicable to a range of existing P2P deployments.

In the approach, when a node receives a membership message, it probes any member included that it did not know before, and does not send further messages to it or propagate it unless it receives a response. We use the term membership message very broadly to refer to any protocol message that contains information about other participants in the group. This includes, for example, search replies and file publish messages in Kad, Overnet and Gnutella and gossip messages in ESM. It also includes messages where a node may directly announce itself to other nodes.
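The rule in the preceding paragraph can be sketched as a small membership table that keeps newly learned members in a pending set until a probe response arrives. The class `ValidatedMembership`, its method names, and the `send_probe` callback are hypothetical names introduced for illustration; they are not part of Kad or ESM.

```python
class ValidatedMembership:
    """Members become usable (and propagatable) only after a probe response."""

    def __init__(self, send_probe):
        self.send_probe = send_probe      # assumed callback: send_probe(member)
        self.validated = set()
        self.pending = set()

    def on_membership_message(self, members):
        for member in members:
            if member in self.validated or member in self.pending:
                continue                  # already known, or already being probed
            self.pending.add(member)
            self.send_probe(member)

    def on_probe_response(self, member):
        if member in self.pending:        # ignore responses that were never solicited
            self.pending.remove(member)
            self.validated.add(member)

    def usable_members(self):
        # Only validated members are used for protocol operations or propagated.
        return set(self.validated)
```

This sketch probes every unknown member immediately. Section 4.3 discusses why the number of such probes must itself be limited.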

It is possible that a probe sent to the victim could trigger a spurious response from some other application running on the port under attack. To prevent this, the probe request and response should contain: (i) a predefined byte pattern unique to the application and distinct patterns for the request and the response and (ii) a sequence number in the request which will be incremented in the response.
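A probe exchange with these two properties might be encoded as follows. The 4-byte patterns, the 32-bit sequence number, and the function names are assumptions chosen for illustration; the text above does not specify a wire format.

```python
import struct

REQUEST_MAGIC = b"P2PQ"    # hypothetical application-specific request pattern
RESPONSE_MAGIC = b"P2PR"   # hypothetical pattern, deliberately different for responses
_FMT = "!4sI"              # 4-byte pattern followed by an unsigned 32-bit sequence number
_MASK = 0xFFFFFFFF


def build_request(seq):
    return struct.pack(_FMT, REQUEST_MAGIC, seq & _MASK)


def build_response(request):
    magic, seq = struct.unpack(_FMT, request)
    if magic != REQUEST_MAGIC:
        raise ValueError("not a probe request")
    # The responder increments the sequence number it received.
    return struct.pack(_FMT, RESPONSE_MAGIC, (seq + 1) & _MASK)


def is_valid_response(data, sent_seq):
    # Replies from unrelated applications fail the length, pattern, or sequence check.
    if len(data) != struct.calcsize(_FMT):
        return False
    magic, seq = struct.unpack(_FMT, data)
    return magic == RESPONSE_MAGIC and seq == ((sent_seq + 1) & _MASK)
```

Using distinct patterns for request and response means that a host which merely echoes the request back does not pass validation.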

While probing-based validation has potential, the key issues are ensuring the probes themselves do not become a source of DDoS, and dealing with benign validation failures that may occur due to packet loss, membership churn, and the presence of hosts behind Network Address Translators (NATs). We discuss heuristics to handle these issues in the rest of the section.

4.3. Preventing DDoS exploiting validation probes

To prevent validation packets from being a source of DDoS attacks, two broad approaches may be adopted. A first approach involves coordination across members in the system, so as to ensure only a subset of members conduct the validation. Such coordination mechanisms are themselves subject to attack when malicious nodes are involved, given that members may not be truthful in sharing information about validation failures. While reputation mechanisms such as [32] may be used to address these concerns, these mechanisms are often themselves vulnerable to attacks such as whitewashing (see [33] for a survey of schemes and attacks). Further, many schemes (for example [32]) rely on centralized authorities or pretrusted nodes, and we are interested in solutions not depending on their existence. Thus we focus on mechanisms that only rely on locally observable events. More specifically we employ two schemes as described below:

4.3.1. Source-throttling

This scheme seeks to limit the attack traffic that a single membership message from a malicious node can induce. This prevents attacks like the Search hijack in Section 2.2, which enabled an attacker to achieve significant attack amplification. In the scheme, when a node receives a membership message, rather than try to validate all the members included in the message at the same time, validations are performed to at most m members initially, where m is a parameter. A new validation to an additional member is conducted only when a previous validation is successful. This mechanism ensures that a single message from an attacker can trigger at most m validation messages to the victim under attack. Combined with the rest of the validation framework, this limits the total attack amplification achievable by the attacker to m, as we discuss in Section 5.1.

While small values of m are desirable to keep attack amplification small, the main concern is the potential impact on performance. In particular, validation failures may occur for benign reasons resulting in unnecessary throttling. Consequently genuine information in a received membership message may be ignored. Further, the validation process may incur some delay, possibly resulting in higher latencies or convergence times for the application. In Section 7, we evaluate the feasibility of employing small m values for real applications.
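A minimal sketch of source-throttling follows, assuming a hypothetical `send_probe` callback and explicit success and failure notifications. The class and method names are illustrative and are not taken from the Kad or ESM implementations. At most m validations are issued for a membership message initially, and each success releases one more.

```python
from collections import deque


class SourceThrottledMessage:
    """Validation state for the members carried in one membership message."""

    def __init__(self, members, m, send_probe):
        self.queue = deque(members)       # members not yet probed
        self.send_probe = send_probe      # assumed callback: send_probe(member)
        self.probes_sent = 0
        for _ in range(min(m, len(self.queue))):
            self._probe_next()

    def _probe_next(self):
        if self.queue:
            self.send_probe(self.queue.popleft())
            self.probes_sent += 1

    def on_success(self, member):
        # A successful validation allows one additional member to be probed.
        self._probe_next()

    def on_failure(self, member):
        # Failures release nothing, so an all-fake message costs at most m probes.
        pass
```

For example, a message that lists the same victim under 20 fake identities with m = 2 results in only two probes, whereas a genuine list is worked through one success at a time.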

4.3.2. Destination-throttling

The source-throttling scheme by itself could prove highly effective in thwarting the attacker in many circumstances, since it bounds the amplification the attacker can achieve. We augment this scheme with a simple mechanism we term destination-throttling, which can limit the number of packets each innocent participant sends to the victim. We note that even with destination-throttling the victim can receive O(N) attack packets, where N is the total number of participants in the system. However, since the amplification is bounded, it would require the combined set of all attacker machines to send O(N) messages to innocent nodes to redirect them to the victim.

An obvious solution to limiting packets that each innocent node sends to the victim is to black-list sources that cause repeated validation failures, and ignore further membership messages from them. This heuristic by itself does not suffice however, since there are potentially many sources, and further the malicious node may mount a Sybil attack. Instead, with destination-throttling, repeated validation failures to a destination are used as an indication that it is under attack, and future validations are not sent in such a case.

A destination could refer to an ⟨IP,port⟩, an IP, or an entire network. The network to which an IP belongs may be determined by a longest prefix match on a database of prefixes and netmask information extracted from BGP routing table snapshots [34,35]. Such a database could be obtained by a client in an out-of-band fashion at the start of the session, and the information is unlikely to change over the duration of a typical session. Alternately, though less accurately, an IP could be assumed to belong to a /24 network. In the rest of the paper, we use the terms prefix and network interchangeably.

Every client maintains the total number of validation packets, and the number of failed validations to each ⟨IP,port⟩, IP, and prefix. Each validation failure is associated with a time-stamp indicating when the failure occurred, and only failures that occurred in a recent time window T are considered. A validation packet is suppressed if either the ⟨IP,port⟩, IP address, or prefix is suspected under attack. An ⟨IP,port⟩ is suspected under attack if more than F_ipport failures have been observed to it. A prefix (IP) is suspected under attack if it has seen at least F_prefix (F_ip) failures that involve D_prefix (D_ip) distinct IPs (⟨IP,port⟩ pairs). The set of parameters must be chosen so that the likelihood of destinations being falsely suspected under attack due to benign validation failures is small. We discuss this further in Section 7.

With the scheme as above, the total validation failures to a prefix before it is suspected under attack could be as low as F_prefix, and as high as F_ipport × D_ip × D_prefix. To better contain the magnitude of a potential attack, once F_prefix failures have been seen to the prefix, validations are permitted only if there has been no prior failure to the IP. With this modification, at most F_prefix + D_prefix validation failures are allowed to the prefix. A similar heuristic is applied to failures to individual IPs.

The destination-throttling scheme could potentially be exploited by malicious nodes to create attacks on the performance of the P2P system itself. We analyze the potential for such attacks in Section 5.2. A variant of the destination-throttling scheme that could be used to raise the bar against such attacks is to consider a prefix under attack if the percentage of failed validations to a prefix exceeds a threshold. While we believe such a variant could be easily integrated with our solution, we focus on a solution based on the total validation failures to each prefix to ensure the total number of packets sent by each innocent node to a victim prefix can be bounded under DDoS attacks.

4.4. Avoiding validation failures with NATs

Many P2P systems employ NAT-Agnostic membership management operations. In these systems, when member B propagates information about member X to member A, there is no indication as to whether X is behind a NAT or not. This can result in benign failures with probing-based validation. In particular, if X were behind a NAT, a validation packet sent by A to X will not successfully reach X unless X had previously contacted A.

While alleviating communication issues with NATs is an ongoing area of work [36,37], these techniques are not always effective with symmetric NATs [38], which is the most restrictive type of NATs, and accounts for close to 30% of NATs [36]. With symmetric NATs, different connections initiated by the same internal node generate different external IP and port mappings. Incoming packets are only allowed from the public nodes to which packets have been sent. Two nodes both behind symmetric NATs cannot communicate with each other.

To handle NATs (including symmetric NATs) and firewalls, we require that membership management operations be NAT-Aware. In particular, membership information about nodes behind a NAT is propagated with a flag indicating they are behind a NAT. When a node A learns about X, it probes X only if it is not behind a NAT, thereby avoiding benign validation failures. We discuss potential attacks on the scheme, where a malicious member may falsify information regarding whether another member is behind a NAT, in Section 5.

4.5. Illustrating the scheme

Probing-based validation mechanisms may be easily integrated with various P2P systems, to protect the systems from the exploits described in Section 2.2. As a concrete example, Fig. 4 illustrates the steps to prevent attacks exploiting buddy mechanisms in Kad and Overnet with our scheme: (1) Node N is behind a NAT, and sends an advertisement to node I indicating that a public node B is its buddy; (2) I validates B, and accepts information only on successful validation; (3) C obtains information that N has the file and B is the buddy; (4) C validates B, and then sends it a request; (5) this is relayed to N, which then sends C the file. Note that steps 2 and 4 are the validation messages that are triggered with our framework. The other exploits described in Section 2.2 are similarly handled, and we omit the details.

Fig. 4. Complete sequence of steps for normal buddy operations with probing-based validation mechanisms.

5. Analysis

In this section, we analyze the effectiveness of probing-based validation mechanisms in thwarting DDoS attacks. We also analyze the vulnerability of the mechanisms to new attacks that may impact application performance, and suggest refinements to minimize the impact.

5.1. DDoS attacks

We first consider DDoS attacks on hosts not participating in the P2P system, which is the primary focus of our paper, and what we view as a more critical threat. We then present possible refinements for handling attacks on participating nodes.

5.1.1. DDoS attacks on hosts not in the P2P system

We discuss the key measures of interest:

Message amplification: this is the ratio of the number of messages received by the victim to the total number of messages sent by all malicious nodes. We observe that with source-throttling, the message amplification is at most m, independent of the number of malicious nodes. To see this, consider that a DDoS attack on a victim not in the P2P system consists entirely of validation messages. Each validation message must be triggered by a membership message directly received from a malicious node. This is because innocent members only propagate membership information they can validate, and hence do not propagate the victim. Thus the message amplification achievable is bounded by the maximum number of validations that may be sent to the victim due to a single membership message from an attacker node. This is bounded by m, by the throttling heuristics.

Bandwidth amplification: this is the ratio of attack traffic received by the victim to the total traffic sent by all attacker nodes. More precisely, the bandwidth amplification can be expressed as $m \times \frac{\text{validation message size}}{\text{membership message size}}$. Given that a validation message is small in general, and smaller than or at most comparable to the size of the membership message, the bandwidth amplification is also bounded by m.

Attack magnitudes: The destination-based throttling scheme ensures that the total number of packets sent by each participating node to a victim is bounded. In particular, at most F_prefix + D_prefix packets are sent to any victim prefix over a period of time T, where T is the time for which a failed validation is considered. Further, since this scheme is entirely based on the destination to which validation failures are observed, and does not depend on the source which triggered the validation, the bound holds irrespective of the number of attackers or under Sybil attacks. We note that the victim can still receive O(N) attack packets, where N is the total number of participants in the system. However, we believe this is not a significant concern because the amplification is bounded by m, and the attacker must send O(N) messages to innocent nodes to redirect them to the victim. While it may be possible to design mechanisms that can ensure not all innocent participants send attack packets to the victim, such mechanisms are likely to involve coordination across the nodes. Such coordination mechanisms are not only complex, but also are themselves subject to possible attack when malicious nodes are involved. We made a deliberate decision to avoid such mechanisms given the feasibility of bounding attack amplification without them.
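To make these bounds concrete, the following sketch evaluates the bandwidth amplification bound and the per-window attack magnitude. All numeric values (message sizes, thresholds, population) are assumptions chosen for illustration, not measurements from the paper, and the function names are hypothetical.

```python
def bandwidth_amplification(m: int, validation_bytes: int,
                            membership_bytes: int) -> float:
    """Upper bound on attack bytes at the victim per attacker byte."""
    return m * validation_bytes / membership_bytes


def attack_packets_per_window(n_nodes: int, f_prefix: int, d_prefix: int) -> int:
    """Upper bound on validation packets sent to one victim prefix in a window T,
    assuming every one of n_nodes innocent participants is redirected."""
    return n_nodes * (f_prefix + d_prefix)


# Hypothetical values for illustration only.
amp = bandwidth_amplification(m=2, validation_bytes=40, membership_bytes=200)
total = attack_packets_per_window(1_000_000, f_prefix=10, d_prefix=5)
```

With these assumed sizes the bandwidth amplification stays below m because a validation message is smaller than a membership message, and the total attack volume grows linearly in the number of participants rather than with the number of attackers.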

Man-in-the-middle attacks: The above analysis assumes an attacker model where malicious nodes can only join the P2P system as regular participants. A more sophisticated attacker could potentially conduct a man-in-the-middle attack, perhaps by compromising routers. In particular, an attacker could intercept and respond to validation packets sent by innocent participants to the victim, tricking these participants into thinking the validation is successful. This is a potential concern because the tricked participants, which we term delegates, could propagate information about the victim, and consequently the amplification bounds above do not hold.

We note that to be effective, the attacker must strategically intercept validation packets from a moderate number of delegates, and this may not be trivial. If validations are intercepted from too many delegates (for instance, the man-in-the-middle is located close to the victim), the attacker is likely to see all validations to the victim; if the validations are intercepted from too few delegates, the total traffic induced at the victim due to the delegates is small.

The message amplification under man-in-the-middle attacks may be expressed as

\[ \frac{m\,M_{redirn} + M_{del}}{M_{redirn} + M_{MIM}}. \]

Here $M_{redirn}$ is the number of redirection messages sent by the malicious nodes, and $M_{MIM}$ is the number of validations to the victim that are intercepted by the attacker to conduct the man-in-the-middle attack. $M_{del}$ is the number of validation messages received at the victim induced by membership information propagated by the delegates. Let us assume that the maximum number of innocent participants to which each delegate could spread information about the victim is K. This bound could be achieved by simply having a delegate conduct periodic revalidation, and limiting the rate at which it spreads information about innocent participants, or by having the delegate conduct a revalidation each time it spreads membership information to K other participants. Then, $M_{del}$ is bounded by $K\,M_{MIM}$, and the overall message amplification is at most

\[ \frac{m\,M_{redirn} + K\,M_{MIM}}{M_{redirn} + M_{MIM}}. \]

It is easily verified that this quantity is between m and K. Finally, the amplification is likely to be even smaller because the analysis does not consider that the man-in-the-middle will not only see traffic due to validations from the delegates, but also normal protocol traffic packets from the delegates.
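The claim that the bound lies between m and K can be checked by writing it as a weighted average. The derivation below assumes the bound has the form stated above, with redirection messages weighted by m and intercepted validations weighted by K; the symbols Q and \lambda are introduced here only for the illustration.

```latex
\[
Q = \frac{m\,M_{redirn} + K\,M_{MIM}}{M_{redirn} + M_{MIM}}
  = \lambda\, m + (1-\lambda)\, K,
\qquad
\lambda = \frac{M_{redirn}}{M_{redirn} + M_{MIM}} \in [0, 1].
\]
\[
\min(m, K) \;\le\; Q \;\le\; \max(m, K).
\]
```

Since Q is a convex combination of m and K, it approaches m when the attacker mostly sends redirection messages and approaches K when most of its effort goes into intercepting validations.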

5.1.2. DDoS attacks on participating nodes

We next discuss the case when a participating node is the victim of the attack. A straightforward way to extend the probing-based validation mechanisms is to require the victim to explicitly deny validation requests if it receives them at an excessive rate. On receipt of such a denial, an innocent participant considers the validation to have failed. While this can help limit the attack, this is not enough because innocent participants that were previously able to successfully validate the victim (which we again refer to as delegates) could still continue to propagate the membership messages.

The validation traffic seen at the victim due to membership information spread by the delegates depends on (i) the number of delegates; and (ii) the rate at which each delegate spreads the victim to other innocent participants. The second factor is under the control of the delegates and is easily controlled. However, it becomes important to ensure the total number of delegates does not grow in unbounded fashion.

A possible heuristic to limit the number of delegates is to have each node A restrict the number of other nodes that may successfully validate it over any window of time, and deny all other requests. In addition, as described above, delegates are required to periodically revalidate A, and if a revalidation fails, they are required to stop sending further packets to A, and stop propagating A to others.
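One way to picture the delegate-limiting heuristic is a small table of peers that have been granted a successful validation within the current window. The sketch below is an illustration under that assumption; the class name, the sliding-window expiry, and the rule that an existing delegate may always revalidate are choices made for the example and are not specified in the text.

```python
import time
from typing import Dict, Optional


class DelegateLimiter:
    """Grant validations to at most `limit` distinct peers per window."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._granted: Dict[str, float] = {}  # peer -> time of last grant

    def handle_validation(self, peer: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget delegates whose last grant has aged out of the window.
        self._granted = {p: t for p, t in self._granted.items()
                         if now - t < self.window_s}
        if peer in self._granted or len(self._granted) < self.limit:
            self._granted[peer] = now  # new delegate, or a revalidation
            return True
        return False  # explicit denial: the peer treats the validation as failed
```

A peer that receives `False` would stop sending packets to this node and stop propagating it, which is the behaviour required of delegates whose revalidation fails.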

An attacker could exploit this heuristic by sending a large number of validation messages to the victim, thereby causing the victim to deny validations from innocent peers. We term such an attack a Disconnection Attack, and note that the attack targets the performance of the P2P system itself. It is unclear how attractive such an attack is, since it would require the malicious node to send I × U messages each revalidation period, where I is the delegate limit per node, and U is the number of nodes to be disconnected. While it is perhaps not hard to disconnect a single node, disconnecting a significant fraction of nodes in the system involves a large number of messages, given that Kad and Gnutella have over a million users. For instance, if I = 10,000, U = 1000 (representing a disconnection of 0.1% of all participants), and assuming a revalidation is conducted every minute, a malicious node would need to send about 160,000 messages per second. On the other hand, this revalidation traffic would not be a significant burden for innocent nodes: considering that typically each node has a few hundred neighbors, the revalidation traffic required of a typical node would be about 2 messages per second. Finally, we note that if the attacker only controlled a small number of IP addresses or prefixes, such disconnection attacks could be further prevented by black-listing peers that repeatedly send validation messages.
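The attacker's sending rate in this example follows directly from I × U messages per revalidation period. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def disconnection_rate(delegate_limit: int, nodes_to_disconnect: int,
                       revalidation_period_s: float) -> float:
    """Messages per second an attacker must sustain to keep U nodes disconnected."""
    return delegate_limit * nodes_to_disconnect / revalidation_period_s


# Values from the example above: I = 10,000, U = 1000, one revalidation per minute.
rate = disconnection_rate(10_000, 1_000, 60.0)
```

This evaluates to roughly 1.7 × 10^5 messages per second, the same order as the figure of about 160,000 quoted in the text, and it scales linearly with the number of nodes the attacker wants to disconnect.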

In structured DHT-based overlays, a concern with limiting delegates is that a node may deny a validation request from a peer with an ID close to its own, thereby impacting the DHT structure. While this may not be a concern ordinarily if the delegate limit is chosen conservatively, it may be possible for an attacker to create such a situation by redirecting a lot of innocent nodes (whose IDs are far from the victim's ID) to the victim. To defend against this type of attack, the scheme may be modified for structured overlays to have nodes probabilistically accept a peer as a delegate, based on the ID distance between the two. The closer the peer, the higher the probability. The scheme is effective given that a node will not have too many peers with an ID close to it, and an attacker has no control over the ID of an innocent node. We defer a more detailed investigation of these issues to future work.

5.2. Attacks on destination-throttling

In this section, we discuss possible attacks on the destination-throttling scheme. We note these attacks do not apply to the source-throttling scheme, which was the primary mechanism in limiting attack amplification. We identify two variants of the attack:

Attacks on destination-throttling: A malicious node M may flood another node A with several fake membership entries corresponding to a victim prefix. The resulting validation failures could force A to throttle the prefix, resulting in A being disconnected from valid participants in that prefix. A similar attack could be performed to cause A to be disconnected from a single node rather than a prefix. A malicious node M may do so by flooding A with incorrect port information about the victim node. However, this attack is less attractive for an attacker than the previous one, since it only disconnects A from a single node. While our analysis below focuses on attacks to prefixes, we believe that similar arguments will hold for attacks to a single node.

Attacks on NAT-Aware mechanisms: When a malicious node M propagates membership information to node A, it may falsely indicate that a participating node N has a public IP address, even though N is behind a NAT. This could induce validation failures from A to N, potentially resulting in A throttling the prefix to which N belongs.

We observe that a malicious node must send at least $d = \frac{F_{prefix} + D_{prefix}}{m}$ messages to disconnect a node from participants in one prefix. This is true since at least F_prefix + D_prefix validation failures are required to a prefix before future validations to it are suppressed, and at most m validation failures may be induced by a single membership message due to the source-throttling mechanisms. Based on our parameterization results in Section 7, we expect d to be of the order of 10 messages.

While such attacks are theoretically feasible, we believe they are not attractive for an attacker in practice, since a large number of messages must be sent to cause a noticeable degradation in application performance. We note that the number of messages involved to disconnect a node depends on the number of prefixes spanned by participating nodes, which is of the order of tens of thousands, and is at least an order of magnitude larger than for the attacks in Section 5.1.2. Further, in a system like Kad, to limit the ability of n participants to access a file, an attacker must disconnect the participants from all prefixes with nodes that have the file. Further, popular files are likely to be distributed across many prefixes. Disconnecting n participants from p prefixes requires at least d × n × p messages. This can be high, considering that the system can have millions of clients distributed over tens of thousands of prefixes. In addition, note that any such disconnection is temporary, since validation failures are timed out after time T, and sustaining the disconnection requires continued messages from the attacker.
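The cost of such a disconnection attack can be worked through numerically. The sketch below assumes the per-prefix cost is (F_prefix + D_prefix)/m, as implied by the two facts stated above; the parameter values are illustrative picks from the ranges discussed in Section 7, and n and p are hypothetical.

```python
def messages_per_prefix(f_prefix: int, d_prefix: int, m: int) -> float:
    """Minimum attacker messages (d) to make one node throttle one prefix."""
    return (f_prefix + d_prefix) / m


def total_disconnection_messages(f_prefix: int, d_prefix: int, m: int,
                                 n: int, p: int) -> float:
    """Minimum messages (d * n * p) to disconnect n participants from p prefixes."""
    return messages_per_prefix(f_prefix, d_prefix, m) * n * p


# Illustrative values: F_prefix = 10, D_prefix = 10, m = 2, so d = 10.
d = messages_per_prefix(10, 10, 2)
# Hypothetical target: 1000 participants cut off from 100 prefixes.
cost = total_disconnection_messages(10, 10, 2, n=1000, p=100)
```

With these assumed values the attacker needs at least a million messages per window T, and must repeat them every window because failures time out.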

It is possible to limit the extent of disconnection attacks if the malicious nodes are localized to a small number of prefixes. In particular, once a prefix Z is suspected of being under a DDoS attack, source IP prefixes that have triggered a validation failure to Z are black-listed. Further validations to Z are sent only if they are triggered by source IP prefixes that have not been black-listed. We note however that the fact that malicious nodes are localized to a small number of prefixes may not be known a priori. Thus, it is desirable to dynamically tune the scheme so that the extent of disconnection attacks is limited if the number of prefixes spanned by malicious nodes is small, while the scheme remains resistant to DDoS attacks if the number of attacker prefixes is large. To handle this, once prefix Z is suspected of being under a DDoS attack, a validation triggered by a source prefix that is not black-listed is sent only if the source prefix belongs to a randomly selected fraction f of all possible prefixes. f is initialized to 1, and dynamically reduced, for example by a factor of 2 on each validation failure after the number of black-listed source prefixes exceeds a threshold. Intuitively, we expect the scheme to stabilize at f values which are inversely proportional to P, the number of prefixes spanned by attackers. Further, the number of additional validation messages to the victim is at most log P. We can bound this quantity even more, by not permitting any further validations once f goes below a threshold.
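The adaptive black-listing rule can be sketched as a per-destination filter. The text does not say how the random fraction f of source prefixes is selected; the sketch below assumes a hash of the prefix string mapped to [0, 1), which is one possible realization and not the authors' stated method. The class and method names are hypothetical.

```python
import hashlib


class AdaptiveSourceFilter:
    """Decides whether a source prefix may trigger a validation to a suspected prefix Z."""

    def __init__(self, blacklist_threshold: int, min_fraction: float = 0.0):
        self.blacklisted = set()
        self.fraction = 1.0  # f in the text, initialized to 1
        self.blacklist_threshold = blacklist_threshold
        self.min_fraction = min_fraction

    @staticmethod
    def _position(source_prefix: str) -> float:
        """Map a prefix to a stable pseudo-random point in [0, 1) (assumed method)."""
        digest = hashlib.sha256(source_prefix.encode()).digest()
        return int.from_bytes(digest[:4], "big") / 2**32

    def may_validate(self, source_prefix: str) -> bool:
        if source_prefix in self.blacklisted:
            return False
        if self.fraction < self.min_fraction:
            return False  # f fell below the cut-off: no further validations
        return self._position(source_prefix) < self.fraction

    def record_failure(self, source_prefix: str) -> None:
        self.blacklisted.add(source_prefix)
        if len(self.blacklisted) > self.blacklist_threshold:
            self.fraction /= 2  # halve f on each failure past the threshold
```

Because f halves on each failure past the threshold, an attacker spanning P prefixes can cause only on the order of log P additional failed validations before f is small enough to exclude nearly all of its prefixes.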

Finally, the destination-throttling scheme as presented in the paper considers a prefix under attack if the number of validation failures to a prefix exceeds a threshold. A potential approach to raise the bar against disconnection attacks is to consider a prefix under attack if the percentage of validation failures to the prefix exceeds a threshold. We have focused on the former variant in this paper to bound the total number of packets sent by each innocent node to a victim prefix under DDoS attacks. However, the latter variant could be easily integrated if the need to prevent disconnection attacks is viewed as more critical.
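The two criteria contrasted here can be sketched side by side for a single prefix. The code below is an illustration only: it reads the count-based rule of Section 4.3.2 as "at least F_prefix failures involving at least D_prefix distinct IPs within the window T", assumes events are recorded in time order, and uses hypothetical names throughout.

```python
from collections import deque
from typing import Deque, Tuple


class PrefixFailureTracker:
    """Tracks validations sent to one prefix within a sliding window of T seconds."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._events: Deque[Tuple[float, str, bool]] = deque()  # (time, ip, failed)

    def record(self, now: float, ip: str, failed: bool) -> None:
        self._events.append((now, ip, failed))

    def _recent(self, now: float) -> Deque[Tuple[float, str, bool]]:
        # Drop events that are older than the window T.
        while self._events and now - self._events[0][0] > self.window_s:
            self._events.popleft()
        return self._events

    def blocked_by_count(self, now: float, f_prefix: int, d_prefix: int) -> bool:
        """Count-based rule: bounds packets sent to the prefix."""
        failed_ips = [ip for _, ip, failed in self._recent(now) if failed]
        return len(failed_ips) >= f_prefix and len(set(failed_ips)) >= d_prefix

    def blocked_by_ratio(self, now: float, max_failed_ratio: float) -> bool:
        """Percentage-based variant: harder to trigger when many validations succeed."""
        events = self._recent(now)
        if not events:
            return False
        failed = sum(1 for _, _, was_failed in events if was_failed)
        return failed / len(events) > max_failed_ratio
```

The count-based rule trips after a fixed number of failures regardless of how many validations succeed, which is what bounds traffic to a victim; the ratio-based rule lets legitimate successes offset injected failures, which is what raises the bar for disconnection attacks.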

6. Evaluation methodology

The primary question driving our evaluations is how effective the validation framework is in preventing DDoS attacks without sacrificing application performance. To understand this, we have integrated our framework into a file sharing application (Kad) and a video broadcasting application (ESM), two mature and contrasting P2P applications, with very different membership management designs and performance requirements. In the rest of the section, we present our evaluation goals, metrics and our experimental methodology.

6.1. Evaluation goals

We have the following goals:

Performance under normal conditions: We study the impact that the validation framework has on application performance under normal conditions when there are no attacks. This enables us to determine the performance of the throttling heuristics under benign validation failures, arising due to packet loss, churn, and NATs.

Impact of throttling parameters: As our analysis in Section 5 indicates, the attack amplification and attack magnitudes achievable with the validation framework are directly dependent on the choice of the m parameter for the source-throttling scheme and the various F and D parameters for the destination-throttling scheme. On the one hand, it is desirable to keep these parameters small to limit DDoS attacks. On the other hand, the smaller the values, the greater the potential impact on performance. Thus our evaluations explore how various parameters of the throttling mechanisms impact application performance, and seek to identify operating ranges where the impact is small.

Impact on contrasting applications: The performance with the schemes is dependent on the application itself. Hence, we explore the issues in the context of two very different applications, DHT-based structured file-sharing Kad and unstructured video broadcasting ESM.

Performance under attacks: Finally, we evaluate the benefits of the validation framework both in limiting DDoS attacks, and in preventing degradation in application performance in the presence of malicious nodes.

6.2. Performance metrics

In our evaluations with file distribution applications (Kad), we consider the fraction of successful searches, and the time a successful search takes to locate an index node. For video broadcast (ESM), we consider the fraction of the streaming video rate received by participating nodes and the join time of nodes to the multicast tree. In addition, in evaluating the destination-throttling scheme, we measure the number of prefixes (as well as IPs and ⟨IP,port⟩ pairs) blocked by the scheme, in the absence of attackers.

6.3. Methodology

Our experimental methodology employs experimentsboth on the live Kad network, and on Planetlab.

Live Kad experiments: The performance of both throttling schemes is sensitive to the extent to which benign validation failures are seen in realistic application deployment settings. This in turn depends on realistic churn rates, packet loss rates, and the fraction of participating NAT hosts. In addition, the performance of the destination-throttling scheme is sensitive to the number of participating nodes that share an IP prefix, and the number of participating nodes that share an IP address. To evaluate the performance of the schemes under such realistic conditions, several of our experiments are conducted on the live Kad network. We do this by implementing our validation framework in a Kad client, and having it join the live Kad network. We compare the performance of an unmodified Kad client on the live network with multiple instances of the modified Kad client running different parameter settings.

Planetlab experiments: One limitation of evaluations on live Kad is that the throttling schemes are implemented only on the clients we have control over. To evaluate the throttling schemes in settings where all participating clients implement the validation framework, we evaluate Kad on Planetlab. In addition, Planetlab evaluations enable us to compare the performance of the schemes in the absence of attacks, and under attack scenarios.

Since there are no long-running live ESM broadcasts, experiments on live ESM deployments are not feasible, and our ESM experiments are conducted on Planetlab. To ensure realism, our experiments with ESM leverage traces from real broadcast events [22] to model the group dynamics patterns, and bandwidth-resource constraints of nodes. Since most nodes on Planetlab have public IPs, we emulate NAT connectivity restrictions by implementing packet filtering to ensure that two Planetlab incarnations that are behind a NAT cannot communicate with each other. Note that our emulations model symmetric NATs, the properties of which are elaborated in Section 4.4.

7. Parameterizing the validation framework

In this section, we present results evaluating the impact of the validation framework and its various parameters on the performance of Kad and ESM. Our goal is to identify the possible sweet-spot parameters that are small enough to minimize the DDoS attacks, yet large enough to tolerate most benign validation failures. We implemented the throttling schemes in real Kad and ESM clients. For comparison, in our experiments we also ran unmodified Kad and ESM clients, which we refer to as Base-Kad and Base-ESM. The experiments with Kad in this section were conducted in the live Kad network, while the experiments with ESM were conducted on Planetlab. All the experiments were conducted in the absence of attackers.

7.1. File sharing application

We parameterize the source-throttling scheme in Section 7.1.1 and the destination-throttling scheme in Section 7.1.2.

7.1.1. Source-throttling: impact of m

As described in Section 2.1, when a Kad node conducts a search for a keyword or a file, it issues queries, and receives replies containing membership entries. This enables the node to locate index nodes for the keyword or the file. The source-throttling scheme impacts the search performance because entries returned in a reply message may not be fully utilized, and the validation process may incur some extra delay, as we have explained in Section 4.3. A factor that may affect the results is the number of membership entries returned in a reply message. In Kad, this number is a client-specified parameter which can be set by the node, and is included in the query messages sent to peers so they will know how many entries to include in the replies. In the mainstream implementations such as eMule and aMule client software, the default values for this number are 2, 4 and 11, depending on the type of the search. While we set it at 11 for our experiments, we also study the sensitivity of our results to this parameter.

We first measure the impact of the m parameter on application performance. We let four versions of our modified Kad client (i.e., with source-throttling) run in parallel, with each version running a different value of m (i.e., one version with m = 1, one version with m = 2 and so on). In addition, we let an unmodified client (i.e., Base-Kad) run at the same time. A random set of 1000 keywords from the English dictionary were picked, and each client conducted one search for each of these keywords in sequence in a one hour period. For all the keywords for which the Base-Kad client returned at least one index node, we measured the time each client took to locate the first index node.

In all, the Base-Kad client returned at least one index node for 567 searches. The fraction of these searches for which the source-throttling scheme also returned one or more index nodes was 94.5% for m = 1, 98.8% for m = 2, 99.5% for m = 4, and 99.6% for m = 8. Thus, while the throttling scheme did impact some of the searches, the effect was minor for m = 2 and higher values.

Fig. 5. Source-throttling: impact of m on search delay. Measured in Kad. [Plot: CDF of searches versus delay in seconds, with curves for m = 1, m = 2, m = 4, m = 8 and Base-Kad.]

Fig. 5 plots the CDF of the search delay for successful searches (i.e., the time taken for the first index node to be returned). The line on the top represents the Base-Kad scheme, and the line on the bottom represents the source-throttling scheme with m set at 1. The remaining lines are the source-throttling scheme with m set to 2, 4, and 8. These lines are very close to each other and practically indistinguishable. Overall, the results indicate that while extremely aggressive levels of throttling (m = 1) can result in noticeable degradation in application performance, the degradation is minimal for even slightly higher values of m, such as m = 2 or m = 4.

Sensitivity: The previous graph showed results for one experiment from one site. We now consider six distinct Planetlab sites and conduct five runs from each site. For all sites, we make two hosts join the Kad network, one host running Base-Kad and the other running source-throttling with m = 2. Fig. 6 shows the search delay for each Planetlab site. We show two bars per site, one for Base-Kad and the other for source-throttling with m = 2. For each run, the 90th percentile of the delay across successful searches is taken. Each bar is the average of the 90th percentile of the search delay over five runs. The error bars show the standard deviation. We observe that the results across sites are consistent with the results in Fig. 5.

Fig. 6. Source-throttling: sensitivity to Planetlab Site. For each site, each bar is the average over five runs of the 90th percentile of the delay of successful searches. The error bars show the standard deviation. Measured in Kad.

Additionally, we study the sensitivity of the results to the number of membership entries that are returned in a reply (P) in Fig. 7. Here we only consider source-throttling with m = 2. Each group of bars corresponds to a different value of P, with each bar corresponding to a different site. For each run, the 90th percentile of the delay across successful searches is taken. Each bar is the average of the 90th percentile of search delay over five runs. The error bars show the standard deviation. We notice that the performance of m = 2 is not sensitive to the change in the P setting. This further confirms the feasibility of using small m values in realistic settings. We repeated these experiments on additional Planetlab sites and results were similar.

Fig. 7. Source-throttling: sensitivity to the number of membership entries returned in a reply message. For each site, each bar is the average over five runs of the 90th percentile of the delay of successful searches. The error bars show the standard deviation. Measured in Kad.

7.1.2. Destination-throttling: impact of D and F

We next study the impact of the destination-throttling scheme on Kad performance. In doing so, the key metric is the extent to which destinations (prefixes, IPs, and ⟨IP,port⟩ pairs) are unnecessarily blocked due to benign validation failures. Note that in the destination-throttling scheme, only the validation failures in a recent window of time are considered in deciding whether a destination is to be blocked or not. Hence our evaluation focuses on understanding validation failures seen by typical clients over such a time window. We believe that a reasonable window length should be tens of minutes, and use one hour in our evaluation.²

² If the number of peers in the system is N, each peer could incur P validation failures before blocking a prefix, each validation packet is B bytes, and the validation failures are considered for a time T, then the expected validation traffic to a victim under a DDoS attack is N × P × B/T. For a population of 1 million, with P = 20, B = 100 bytes, and T = 1 h, this is about 4.4 Mbps, which we believe is reasonable.
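The figure in the footnote can be reproduced by substituting its values into the expression for expected validation traffic. The only step added here is the conversion from bytes to bits, which the footnote leaves implicit.

```latex
\[
\frac{N \cdot P \cdot B}{T}
  = \frac{10^{6} \times 20 \times 100\ \text{bytes} \times 8\ \text{bits/byte}}{3600\ \text{s}}
  \approx 4.4 \times 10^{6}\ \text{bit/s}
  \approx 4.4\ \text{Mbps}
\]
```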

A key factor that impacts the results is the aggressiveness with which clients conduct searches, as this impacts the number of validations conducted and failed validations seen. Thus we have done a sensitivity study to the rate at which our clients conduct searches. We choose the rate to be 100, 300 and 1000 per hour. According to a study of client query patterns [39], even 100 per hour is higher than the search rate of a typical client. Three hundred per hour is comparable to the most aggressive clients, while 1000 per hour is significantly beyond even the most aggressive clients and stresses our scheme. Adopting a similar methodology as in Section 7.1.1, we instrument multiple versions of a client to join the live Kad network in parallel, each conducting an appropriate number of keyword searches over a one hour period.

Table 2 shows the percentage of prefixes blocked for a client conducting a particular number of searches in one hour. Results are shown for various combinations of D_prefix and F_prefix. A prefix is blocked if it has seen more than F_prefix failures involving D_prefix distinct IP addresses. To determine the prefix to which a client belongs, we use the Route Views dataset that helps map IP addresses to prefixes based on BGP data [34]. If any prefix is coarser than a /16 however, we simply consider the /16 as the prefix. The results show that barring extremely small values of the D_prefix and F_prefix parameters, the number of blocked prefixes is small. In particular, for 300 searches per hour, no prefixes are blocked if D_prefix is 5, and F_prefix is 10 or larger. Even with a search rate as high as 1000 per hour, the percentage of falsely blocked prefixes is less than 0.3% for a D_prefix of 10 or larger, and an F_prefix of 10 or larger. Overall, choosing F_prefix values in the 10–15 range, and D_prefix in the 5–10 range is effective.

We have conducted a similar evaluation to study the impact of the F_ip, D_ip, and F_ipport parameters. Our results show that with a D_ip value of 3 or larger, and F_ip and F_ipport values of 5 or larger, the scheme works well. With these settings, no destinations are blocked at the IP level even when the client conducts 1000 searches per hour. Further, as the client search rate varies from 100 to 1000 per hour, only 0.2–0.4% of the ⟨IP,Port⟩s contacted are blocked. Interestingly, most of the blocked destinations corresponded to port 53, which is used for DNS service. We believe this corresponds to real attempts at DDoS attacks exploiting the Kad system in the wild, as indicated by Zhou et al. [20], and discussions on the eMule forum [19].

Sensitivity: We conducted sensitivity experiments of the results in Table 2 for 300 searches per hour and two (D_prefix,F_prefix) combinations of parameters, on six Planetlab sites. Fig. 8 presents our results. There are two bars per site, each for a different combination of (D_prefix,F_prefix). Each bar shows the average over five runs of the percentage of contacted prefixes that were blocked. Error bars show the standard deviation. We can see that the results are consistent across sites with the results in Table 2. In particular, notice that for (D_prefix,F_prefix) = (10,10), five out of the six sites did not block any prefixes for the experiments conducted.

Table 2
Destination-throttling: percentage of contacted prefixes blocked with the particular search rate and for various combinations of (D_prefix,F_prefix). Measured in Kad.

# of searches   (D_prefix,F_prefix) (%)
                (5,5)    (5,10)   (10,10)  (10,15)  (15,15)
100             0.07     0        0        0        0
300             0.55     0        0        0        0
1000            4.53     0.44     0.29     0.07     0.05

Fig. 8. Destination-throttling: sensitivity to Planetlab site. For each site, each bar is the average over five runs of the percentage of contacted prefixes that are blocked. The error bars show the standard deviation. Note that for all but one site, when (D_prefix,F_prefix) = (10,10), 0% of prefixes are blocked. Measured in Kad.

7.2. Video broadcasting application

Our ESM parameterization experiments were conducted on Planetlab. We leverage a trace from a real deployment [22] to emulate group dynamics and resource constraints in the system. We only show results for the source-throttling scheme. We were unable to parameterize the destination-throttling scheme since it requires realistic distributions of participants sharing a prefix or an IP address and this information was not available in the trace. Given this, we use the results from the Kad experiments to select the F and D parameters for ESM.
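For reference, the prefix-level destination-throttling rule evaluated in Section 7.1.2 can be sketched as follows. This is a minimal illustration rather than the clients' actual implementation. It assumes IPv4 addresses, reads the blocking condition as "more than F_prefix recent failures spread over at least D_prefix distinct IPs", and takes the BGP prefix as an input instead of looking it up in Route Views data. The class and function names are hypothetical.

```python
import ipaddress
from collections import defaultdict


def throttling_prefix(ip, bgp_prefix):
    """Return the prefix used for throttling.

    Prefixes coarser than /16 are replaced by the /16 that contains ip.
    """
    net = ipaddress.ip_network(bgp_prefix)
    if net.prefixlen < 16:
        return ipaddress.ip_network(f"{ip}/16", strict=False)
    return net


class PrefixThrottle:
    """Track recent validation failures per prefix (assumed semantics)."""

    def __init__(self, d_prefix=10, f_prefix=10, window_s=3600):
        self.d_prefix = d_prefix
        self.f_prefix = f_prefix
        self.window_s = window_s
        self.failures = defaultdict(list)  # prefix -> [(timestamp, ip)]

    def record_failure(self, prefix, ip, now):
        self.failures[prefix].append((now, ip))

    def is_blocked(self, prefix, now):
        # Only failures inside the recent time window are considered.
        recent = [(t, ip) for t, ip in self.failures[prefix]
                  if now - t <= self.window_s]
        self.failures[prefix] = recent
        distinct_ips = {ip for _, ip in recent}
        return (len(recent) > self.f_prefix
                and len(distinct_ips) >= self.d_prefix)
```

A client would call `record_failure` whenever a validation probe to a destination fails and consult `is_blocked` before sending further probes into the same prefix. Analogous counters could be kept per IP and per ⟨IP,Port⟩ with their own thresholds.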

The Planetlab experiments emulate a 20 min segment of the trace, with 148 joins, 173 leaves and 377 nodes in total. The streaming video rate employed is 450 Kbps, which represents typical media streaming rates in real settings [22]. In addition, we vary the fraction of NAT nodes in the system and turn off the NAT-aware heuristics, to increase the likelihood of benign failures and stress our scheme.

We observed that the average streaming rate received by nodes throughout the experiment is not affected by small values of m. More than 94% of the nodes received more than 90% of the streaming rate for m = 1, m = 2 and m = 4. To explain these results, consider that a key factor that may affect the performance of ESM is the number of entries the nodes have in their routing table. Having many entries ensures that nodes have enough potential parents to contact when they join the group or when a parent leaves. The source-throttling scheme can impact ESM by reducing the rate at which nodes learn about others, since membership information received may not be fully utilized. But this does not affect nodes in the long run, since over time they are still able to build a large routing table. Hence, the value of m does not affect the average streaming rate received by nodes.

To explore the potential impact of the source-throttling scheme on the performance of nodes in the initial phase, we consider the time it takes for a node to join the multicast tree. Fig. 9 shows the 90th percentile of the join time for various values of m and different NAT percentages. There is one bar for each value of m. Each bar is the average over five runs. Error bars show the standard deviation. The results show that while there is some increase in join time for small values of m, the impact is limited. For instance, the join time for m = 2 and m = 4 is only 2 s higher than Base-ESM, for settings with 65% NATs. Note that with NAT-aware heuristics in place, this difference would be even smaller.

Fig. 9. Source-throttling: impact of m on the join time of nodes. Each bar is the average over five runs of the 90th percentile of the join time. The error bars show the standard deviation. Measured in ESM.

7.3. Discussion

Our results indicate that the performance degradation with Kad and ESM is minimal, even with small values of the throttling parameters. Since the amplification and magnitudes of potential DDoS attacks are directly determined by these parameters, these results indicate the promise of the validation framework in effectively controlling DDoS attacks without impacting performance of these applications.

While the validation framework itself may be integrated easily in many P2P systems, the appropriate parameter choices are potentially dependent on the particular application. We now discuss the factors that might impact the parameter choice, and why we expect the parameters can potentially be kept small in general.

The primary concern with the source-throttling scheme is that when a membership message is received, benign validation failures incurred by some of the membership entries could prevent other potentially useful entries in that message from being validated (and hence utilized). To get more insight into this, consider that each membership message includes K entries, and assume the probability a probe will fail for benign reasons is p. With the source-throttling scheme, it may be easily verified that the expected number of membership entries that are probed is $m/p$, and the number of potentially useful entries probed is $m/p - m$. Since the expected number of potentially useful entries included in the entire membership message is $(1 - p)K$, the fraction of potentially useful entries that our source-throttling scheme actually utilizes is $m/(pK)$. If the probability p of benign validation failures is kept low, for example by including NAT-aware heuristics, and by ensuring the system does a good job of minimizing stale membership information, then small m values will have only minor performance impact. Even if the application cannot eliminate benign validation failures, it may have other mechanisms that can help limit the performance impact. For example, when the Kad system receives a membership message in response to a search request, it preferentially probes entries that are closer to the search target. This ensures that the most potentially useful entries are utilized first, even if some of the entries are not utilized.

In our work, we have assumed the parameters are set uniformly across all clients and in static fashion. One could envision the need for the parameters to be set depending on the client characteristics (e.g. aggressiveness of search patterns), or with differing levels of thresholds for different destination networks, based on their number of participants or bandwidth capabilities. In our evaluations, we have found that setting parameters conservatively based on worst-case scenarios (e.g. based on extremely aggressive search patterns) is sufficient to keep parameters low. That said, there may be potential benefits in other applications and deployment scenarios to tuning the parameters to individual clients or prefixes. Self-tuning mechanisms to dynamically determine appropriate parameters for any application and deployment scenario are an interesting direction of future work.

8. Evaluation under attack

In this section, we present results evaluating the effectiveness of our validation framework in minimizing DDoS attacks while achieving good performance even under attack. We implemented the validation framework on a Kad client and an ESM client. We refer to our modified clients as Resilient-Kad and Resilient-ESM. Again, we refer to the unmodified Kad and ESM clients as Base-Kad and Base-ESM. We set various throttling parameters according to the results obtained in Section 7. In particular, we set m = 2, D_prefix = 10, F_prefix = 10, D_ip = 3, F_ip = 5 and F_ipport = 5 for both Resilient-Kad and Resilient-ESM. All the experiments in this section were conducted on Planetlab.
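As a rough sanity check on the choice m = 2, the source-throttling expressions from Section 7.3 can be evaluated numerically and compared against a small simulation. The sketch below makes several assumptions that are not stated in the paper: a node probes the entries of one membership message in order and stops at the m-th failed probe, probes fail independently with probability p, and the message is long enough that it is never exhausted. The function names, the cap at 1.0, and the example value p = 0.1 are illustrative only.

```python
import random


def expected_probes(m, p):
    """Expected number of entries probed before the m-th failure: m / p."""
    return m / p


def expected_useful_probes(m, p):
    """Expected number of successfully validated entries: m / p - m."""
    return m / p - m


def utilized_fraction(m, p, k):
    """Fraction of useful entries utilized, m / (p * K).

    Capped at 1.0 here (an added assumption) because a node cannot use
    more entries than the message contains.
    """
    return min(1.0, m / (p * k))


def simulate_probes(m, p, trials=20_000, seed=1):
    """Monte Carlo estimate of the number of probes until m failures."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        failures = 0
        probes = 0
        while failures < m:
            probes += 1
            if rng.random() < p:
                failures += 1
        total += probes
    return total / trials


m, p = 2, 0.1
print(expected_probes(m, p), simulate_probes(m, p))
print(utilized_fraction(m, p, k=50))
```

Under these assumptions, a lower benign failure probability p directly raises the number of entries a node can validate per message for a fixed m, which is the reason NAT-aware heuristics and fresh membership information make small m values practical.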

8.1. File sharing application

In our Kad experiments, the inter-arrival patterns of nodes, and the stay time duration follow a Weibull distribution, based on [40]. A mean stay time of 10 min is assumed. Each experiment lasts 30 min, and involves 680 clients in total, with peak group size around 270. Each client conducts a search for a random logical identifier every 60 s, which is intended to simulate a search for a keyword or a file. There are five attackers that stay through the entire experiment and there is a single victim. Each attacker conducts the Search hijack attack from Section 2.2, and employs the Attraction [6] and Multifake (Section 3.1) heuristics. In particular, when an attacker initially joins the network, it proactively pushes information about itself to about 100 innocent nodes, forcing them to add the attacker to their routing tables. This causes many nodes to send search queries to the attacker and be redirected to the victim. In addition, in every search response, the attacker includes the victim's contact information about 100 times. This causes even larger attack magnitudes since innocent nodes send several messages to the victim for every response from the attacker (see [6] for more details).

Fig. 10 shows the attack traffic generated at the victim as a function of time, from one experiment. With Base-Kad, the traffic was as high as 10 Mbps throughout the run. Further, the traffic at each attacker was only about 250 Kbps, which while higher than what a normal user sees, is 40 times lower than the traffic seen by the victim. However, with Resilient-Kad, the attack magnitude was effectively reduced by a factor of 100,000 from 10 Mbps to 0.1 Kbps.

Fig. 10. Traffic seen at the victim as a function of time, with five attackers, and 50% of the nodes behind NAT. Measured in Kad. [Plot: traffic rate (kbps, logarithmic axis from 0.01 to 10000) versus time (0 to 30 min), for Base-Kad and Resilient-Kad.]

Fig. 11 compares the performance of the Base-Kad and the Resilient-Kad schemes. There are three sets of bars, each corresponding to a setting with a particular percentage of nodes behind NAT. Each set has four bars corresponding to the two schemes, with and without the presence of attackers. The graph shows the search delay, averaged over all searches conducted by all nodes throughout a run, then averaged over five runs. Error bars show the standard deviation. Here search delay is measured as the time taken to locate the node which is not behind NAT and which has an ID that is the closest to the target ID. Our goal is to emulate location of index nodes in the real Kad network, and we note that only public nodes can be index nodes. We make the following observations.

Fig. 11. Average time a search takes to locate the index node, with different NAT percentages. Each bar is the average over five runs. The error bars show the standard deviation. Measured in Kad. [Plot: delay (s, 0 to 10) versus NAT percentage (0%, 50%, 65%), with bars for Base-Kad and Resilient-Kad, each with no attack and under attack.]

First, in the presence of attackers, the search delay with Base-Kad degraded significantly (over eight seconds in all NAT percentage settings). This was because nodes kept getting redirected by the attackers to the victim when they conducted searches. However, with Resilient-Kad, in the presence of the attackers the degradation was small and the search delay was still under 3 s for all NAT percentage settings. This was because fake information provided about the victim was quickly throttled and not used as part of the searches.

Second, in the absence of attackers, Resilient-Kad performed slightly better than Base-Kad. There are mainly two reasons for this. First, Base-Kad does not involve NAT-aware membership management. Thus, its routing table could include nodes behind NAT, and many search messages could fail since nodes behind NAT are contacted, leading to longer search times. In contrast, Resilient-Kad contains NAT-aware membership management leading to a routing table with fewer useless entries. Second, the destination-throttling scheme implemented in Resilient-Kad has the side effect that it can help purge stale membership information. In particular, consider a scenario where node A has left the group. Node B learns stale membership information about node A from some other node. In Base-Kad this membership information is accepted, adding to useless entries in B's table for some time. Further, the stale information may also be propagated by B to others. In contrast, with Resilient-Kad this membership information is not accepted due to the validation process, consequently helping avoid useless entries.

Finally, the performance of all schemes gets better when the NAT percentage increases. This is because, as the number of public nodes decreases, there are fewer hops to reach the index node of a given target ID. Resilient-Kad nodes see better improvements since they only maintain entries of public nodes while Base-Kad nodes maintain useless entries of peers behind NAT.

8.2. Video broadcasting application

We consider the effectiveness of Resilient-ESM. We assume 10% of the nodes are malicious and perform an attack as described in Section 2.2 and in [6]. Our results indicate that the Resilient-ESM scheme is effective in containing attacks, reducing the attack magnitude from 10 Mbps to 0.1 Kbps.

Next, we consider application performance with Base-ESM and Resilient-ESM. Fig. 12 shows the fraction of nodes that see more than 90% of the source rate, for both schemes, with and without attackers. Each bar is the average over five runs. Error bars show the standard deviation. We observe that Base-ESM shows significant degradation in performance in the presence of attackers. For instance, less than 70% of the nodes received more than 90% of the source rate, in settings with 65% NATs. This is because under attack, much of the membership information in the routing table of nodes is fake. This reduces the number of potential parents to contact when nodes join the group or when a parent leaves. Furthermore, we notice that the performance degradation on Base-ESM becomes more significant as the fraction of nodes behind NAT increases. This is because in these regimes there are fewer potential parents in the system, so the impact of having fake entries in the routing table of nodes is even higher. In contrast, with Resilient-ESM, invalid membership information was not used, leading to fast convergence and high performance. In particular, 95% of the nodes received more than 90% of the source rate, for all NAT percentages, with and without attackers.

Fig. 12. Fraction of nodes that see more than 90% of the source rate, with 10% of the nodes as attackers and different NAT percentages. Each bar is the average over five runs. The error bars show the standard deviation. Measured in ESM.

9. Interactions with P2P developers

We have initiated discussions with P2P developers alerting them to the potential for DDoS attacks exploiting their systems, and encouraging them to make changes to their systems to address the vulnerabilities. When we contacted the developers of eMule, they indicated they were already aware of our workshop paper [6], which had shown the feasibility of exploiting Kad (part of the eMule software) to cause DDoS attacks, and had implemented a number of changes to address the vulnerabilities. We identified some limitations of the changes, and the developers indicated they would implement additional mechanisms to address these in a future release. The combined set of changes limits the total number of entries in search response messages, and limits each response to have only one IP associated with an ID, and at most 10 IPs for each /24 prefix. Similar restrictions are placed on the routing table entries. These changes are primarily intended to defend against attack heuristics such as Multifake, in order to bound attack amplification. These fixes are easily implementable and help solve some of the immediate problems. However, the fixes still suffer from a few limitations: (i) the amplification on /24 prefixes could be as high as 10; (ii) attacks on prefixes coarser than /24 are not prevented and the amplification of attacks on such coarser prefixes is not bounded; and (iii) each innocent participant could send an unbounded number of packets to the victim. We are in ongoing discussions with the developers to get our throttling mechanisms integrated, which could address these limitations. Our overall interactions show that P2P developers recognize the potential for DDoS attacks exploiting their systems, and the value of designing systematic solutions to counter the threat.

10. Related work

In Section 2.2, we have described most of the previous research, which focuses on exploiting individual P2P systems for DDoS attacks. In contrast, we have classified known DDoS attacks by amplification source, and identified principles to prevent amplification. In addition, ours is the first work aimed at enhancing the resilience of P2P systems to DDoS attacks.

The idea of employing probing-based mechanisms to validate membership information is briefly discussed in [2]. Athanasopoulos et al. [3] discuss a solution specific to Gnutella which requires a node to complete a handshake with a peer prior to any file request message being sent to the peer, which could be viewed as a form of validation. In Kad, nodes employ a form of probing-based validation mechanism when nodes learn about members in search response messages (but nodes do not validate members learnt through publish messages). However, none of these consider the possibility that validation packets could be exploited to cause DDoS attacks. In fact, this was exploited in [6] to achieve large attack amplification. Further, we have also considered issues such as benign validation failures, DDoS attacks on entire network prefixes, and disconnection attacks. In addition, we have presented analysis and comprehensive performance evaluations of our framework. Finally, we have shown that our mechanisms are general and can defend against attacks on a diverse range of systems. Freedman et al. [41] have used probing-based mechanisms to validate membership information to improve DHT lookup performance. In contrast, our focus is on DDoS detection and the interoperability of validation mechanisms with sources of benign failures such as NATs.

In our earlier paper [31], we showed the limitations of three other techniques in enhancing the resilience of P2P systems. A first technique was to limit DDoS attacks by using pull-based membership management rather than push. However, while this reduces the susceptibility to attacks in some contexts, pull-based protocols are still vulnerable as our attacks on Kad show [6]. A second technique was to corroborate membership information from multiple sources. However, this technique is highly susceptible to Sybil attacks, and can incur significant performance degradation. A third technique is to limit the number of IDs associated with the same IP address, and the number of IPs sharing the same prefix that a node accepts. The limits are applied even if only genuine participants are involved, greatly limiting communication between participants. In contrast, this paper takes a different approach that requires nodes to be validated, and bounds the number of validation failures to each IP address or prefix, which in turn results in far fewer false positives. The focus on validations, throttling schemes, and analysis of their performance and security distinguishes our current work.

Researchers (e.g. [42]) have looked at exploiting unstructured file systems to launch DDoS attacks on P2P systems by introducing unnecessary queries, and having them flooded by the system. In contrast to these attacks, our focus is on DDoS attacks on external servers, caused by introducing fake membership management information.

Several works [11–15] focus on how malicious nodes in a peer-to-peer system may disrupt the normal functioning, and performance of the overlay itself. Many of these works, and most notably [13,14], focus on attacks on structured DHT-based overlays, and rely on trusted authorities to assign node IDs to principals in a certified manner. Our work differs in several ways. First, we focus on DDoS attacks on the external Internet environment, i.e., on nodes not participating in the overlay. Second, we focus on mechanisms that may be easily integrated into both structured DHT-based and unstructured non-DHT-based systems. Third, we do not rely on a centralized authority to certify mem-

retrospect, the failure to follow the guidelines in a wide range of deployed systems, and the resulting repercussions are striking.

Second, we have shown the effectiveness of an active probing approach to validating membership information in thwarting such DDoS attacks. We have focused on such an approach given that it does not rely on centralized authorities for membership verification, and is applicable to both structured and unstructured P2P systems. Despite the simplicity of the approach, it can keep attack amplification low (to a factor of 2), while having a modest impact on performance. For a video broadcast application with stringent performance requirements, and for m = 2, the average source rate seen by nodes is practically unaffected, and when the 90th percentile of client join time is considered, the increase is less than 12%. With Kad and for m = 2, the search time increases by less than 0.3 s on average.

While we have taken a key step towards enhancing the resilience of peer-to-peer systems to DDoS attacks, we are extending our work in several directions. From a security perspective, we are investigating mechanisms that can bound amplic
