
Research Article
Towards Accurate Node-Based Detection of P2P Botnets

Chunyong Yin 1,2

1 School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
2 Jiangsu Engineering Center of Networking Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China

Correspondence should be addressed to Chunyong Yin yinchunyongnjugmailcom

Received 4 April 2014; Accepted 15 May 2014; Published 24 June 2014

Academic Editor: Yuxin Mao

Copyright © 2014 Chunyong Yin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Botnets are a serious security threat to the current Internet infrastructure. In this paper, we propose a novel direction for P2P botnet detection called node-based detection. This approach focuses on the network characteristics of individual nodes. Based on our model, we examine a node's flows and extract the useful features over a given time period. We have tested our approach on real-life data sets and achieved detection rates of 99–100% and low false positive rates of 0–2%. Comparison with other similar approaches on the same data sets shows that our approach outperforms the existing approaches.

1. Introduction

1.1. Background and Motivation. Botnets are groups of computers which are linked to each other through similar network processes and which perform coordinated tasks like information crawling, Internet Relay Chat (IRC), and information sharing. While botnets can be used for benign purposes, over the past decade there has been a drastic increase in the number of malicious botnets [1], which have become a serious concern to the security of Internet applications and networking infrastructure. To build a malicious botnet, the attacker, known as the botmaster, uses one or more central servers to compromise and acquire control of vulnerable computers using malware. These compromised computers, known as “bots,” link to the central server, which acts as the command and control (C&C) center, using protocols like IRC or HTTP. The botmaster uses these channels to deliver additional malware and instructions to the bots for launching different kinds of attacks. Malicious botnets are capable of a wide range of attacks, including e-mail spam, keystroke logging, packet sniffing, DoS attacks, and identifying new targets for enlisting in the botnet, among others [2–6]. In this paper, unless explicitly stated otherwise, we use the term “botnet” to refer to a malicious botnet.

Figure 1 illustrates a botnet of five bots connected to two C&C servers controlled by a botmaster. This botnet is operating in a centralized mode; that is, one or two servers control the bots in the network. Such centralized botnets can be shut down or blocked if the C&C servers are identified, thereby rendering the botnet ineffective. To increase resiliency to detection, recent botnets are built using peer-to-peer networking principles, where any node can act as a client as well as a server. Accordingly, in a P2P botnet, any node can act as a bot as well as the C&C server. In Figure 2, we show an example P2P network, and in Figure 3, we show a corresponding P2P botnet. The botmaster can connect to any P2P bot in the network and operate it as the C&C server. Compared to the server-client botnet, the P2P botnet can realize a highly scalable and extensible network structure which is resilient to firewall sanctions and node/path failures.

In this paper, we focus on the problem of detecting P2P bots in a distributed network. Mitigating the threat of a P2P botnet is a challenging task, as the botnet has no central C&C server which can be blocked and the botmaster uses the overlay structure of the P2P network to stay connected to the bots. As shown in Figure 3, even if some bots are blocked by firewalls, the botmaster can continue communication with these bots along alternate routing paths as long as the blocked nodes are connected to at least one other P2P bot. This implies that it is essential to detect the bots in a systematic and comprehensive manner. Furthermore, the malware programs used in P2P bots are typically self-propagating, which helps in discovering new peers, and they mutate as well to avoid signature-based detection. Due to their resilience, P2P botnets represent a major threat to Internet applications and infrastructure. Therefore, to safeguard the Internet from strategic coordinated attacks, there is an urgent need to devise solutions to detect P2P bots and render the P2P botnets ineffective.

Figure 1: C&C botnet.

Figure 2: P2P network.

Figure 3: P2P botnet bypassing firewalls.

Hindawi Publishing Corporation, The Scientific World Journal, Volume 2014, Article ID 425491, 10 pages, http://dx.doi.org/10.1155/2014/425491

1.2. Limitation of Prior Art. The existing solutions for P2P botnet detection can be broadly classified into signature-based [7–13] and flow-based detection [14–18]. Signature-based P2P bot detection inspects each packet in the network traffic entering or leaving the Internet gateway of the network for the presence of special features such as port numbers, byte sequences in the payload, and blacklisted IP addresses. These special features, known as signatures, are extracted from known botnet infections in the past and stored in a signature database. While signature-based detection has a good detection rate and is easily deployed, it has two major limitations. First, it is deterministic and relies only on detecting known botnet infections, so it cannot detect unknown bots. Even known bots can evade signature detection by changing their communication ports or by encrypting packet payloads to hide the bot-specific features. Second, inspection of each packet results in performance degradation, especially when the traffic consists of a large amount of benign data.

Flow-based bot detection examines network flows between two nodes, where a flow is defined as a set of packets which have the same source address, source port, destination address, and destination port. The intuition in these approaches is that flow features such as the count of the packets in the flow, the order of the packet arrivals, and the interval between packets can model the botnet communication patterns more accurately than direct packet inspection. The extracted features are used to construct a classifier that can differentiate normal flows from malicious bot flows. Since classifiers use statistical profiling, flow-based analysis is capable of detecting unknown bots which exhibit behavioral similarities to known bots. However, flow-based techniques suffer from two key limitations. First, there are several flows between any two network nodes which need to be analyzed, and usually most of these flows belong to normal network processes. Second, the flow features need to be extracted at runtime, which implies that flow-based analysis requires considerable computational overhead at runtime. At any given instant, there are a significant number of flows in the network, which exacerbates the impact of these limitations further.
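The flow definition above can be sketched in a few lines of Python. This is an illustration only and not code from the systems surveyed here: the packet record layout (dictionaries with `src`, `sport`, `dst`, `dport`, and `time` keys) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def group_into_flows(packets):
    """Group packet records by the 4-tuple that defines a flow."""
    flows = defaultdict(list)
    for pkt in packets:
        # Hypothetical record layout: one dict per captured packet.
        key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
        flows[key].append(pkt)
    return dict(flows)


def flow_features(flow_packets):
    """Return the packet count and mean inter-packet interval of one flow."""
    times = sorted(pkt["time"] for pkt in flow_packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return {"packet_count": len(times), "mean_interval": mean_gap}


packets = [
    {"src": "10.0.0.1", "sport": 4000, "dst": "10.0.0.2", "dport": 80, "time": 0.0},
    {"src": "10.0.0.1", "sport": 4000, "dst": "10.0.0.2", "dport": 80, "time": 2.0},
    {"src": "10.0.0.1", "sport": 4000, "dst": "10.0.0.2", "dport": 80, "time": 6.0},
    {"src": "10.0.0.3", "sport": 5000, "dst": "10.0.0.2", "dport": 53, "time": 1.0},
]
flows = group_into_flows(packets)
```

Every flow must be tracked and summarized separately, which is the per-flow overhead that the two limitations above refer to.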

1.3. Proposed Approach. In this paper, we describe a novel P2P bot detection approach, called node-based bot detection, in which we analyze the network profile of nodes to detect bot characteristics. A sample network profile of a node may comprise the different protocols used by the node, the number of flows in a particular time period, packet statistics, and so on. Our approach is based on the intuition that P2P bots exhibit a distinct network profile due to the various P2P network maintenance tasks they are required to perform. A P2P bot will be more active in communicating with other P2P bots and exchanging various instructions related to command and control. Also, unlike normal P2P nodes, the P2P bots exhibit nearly uniform network activity based on the instructions of the current C&C server. Based on these observations, our approach consists of identifying and quantifying the network profile features that are typical of a P2P bot. To extract these features, we monitor the network flows at each node and generate the node's network profile. The final network profile of a node is a combination of the features typical of P2P bots and the features observed from the network flows at the node. Finally, we use machine learning based classification techniques to detect whether the network profile of a node corresponds to the network profile of a P2P bot.

1.4. Technical Challenges and Solutions. There are several technical challenges in our approach. First, the process of constant flow monitoring at a node results in a major computational and storage overhead. We address this issue by using a sampling approach, wherein we periodically sample a network flow at different time intervals. Although sampling may not detect the same number of bots as those detected by constant flow monitoring in the same time interval, due to the cyclic nature of P2P botnets, the sampling approach eventually detects all the bots in the P2P botnet. Second, quantifying the network profile of P2P bots is nontrivial, as different botnets exhibit different semantics and use variable protocols. To address this concern, we abstract and model the general network features of a P2P botnet using the profiles of a few existing P2P botnets. We focus on the communication patterns of P2P botnets and do not consider the individual protocol and payload features. By avoiding payload inspection, we overcome the difficulty of handling encrypted payloads and also avoid compromising the privacy of individual users. We combine these unique bot-specific features with the flow statistics of the node to obtain the network profile of a node. Third, differentiating between the behavior of a normal node and a P2P bot is a complex problem. To this end, we use machine learning techniques to cluster and classify the collected network profile features. We use the decision tree technique because of its efficiency and ease of implementation. To evaluate our approach, we use real-life data sets which contain a mix of malicious and nonmalicious data. We ensure that the nonmalicious data dominates the malicious content in order to estimate the sensitivity of our approach.

1.5. Key Contributions. The key contributions of our work are as follows. (a) We describe the first node-based approach to detect P2P bots in a P2P botnet. Our approach is a significant departure from the signature-based and flow-based detection approaches. (b) We describe a sampling technique to reduce the overhead of monitoring for the network administrator. (c) We abstract and quantify the network profile features of a P2P bot. Our abstraction technique avoids dealing with issues like packet encryption and user privacy. (d) We describe the use of efficient machine learning algorithms to classify the network profile into normal and P2P bot profiles. (e) We have evaluated our approach on real-life malicious traffic datasets and obtained a detection accuracy of 99–100% with extremely low false positives in the range of 0–2%. We also show that existing state-of-the-art techniques perform poorly on the same data set when compared to our approach.

1.5.1. Organization. In Section 2, we describe the related research in this domain. In Section 3, we describe our node-based detection approach. In Section 4, we perform a detailed evaluation of our approach and compare our scheme with other existing approaches in this domain. We summarize our paper and describe future directions in Section 5.

2. Related Research

Signature-based bot detection has been widely studied [7–13]. This approach is effective at detecting known bots, for example, Phatbot. Kolbitsch et al. [19] proposed a signature-based malware detection system which uses special graphs to detect different kinds of bots. However, the detection rate of this approach is only 64%. The utility of signature-based methods is limited, as they are not capable of detecting unknown bots or variants of known bots. In the current Internet scenario, the number of new bot variants is increasing rapidly, thereby necessitating more adaptive approaches for bot detection.

Flow-based analysis for bot detection has a better detection rate. These techniques [14, 15] were proposed to model a wider range of bot behaviors than those covered by signature-based techniques. Livadas et al. [16] developed a system to detect C&C traffic of botnets based on flow analysis. This system consists of two stages: the first stage extracts several per-flow traffic features, including flow duration, maximum initial congestion window, and average byte count per packet, and the second stage uses a Bayesian network classifier to train and detect bots. However, the observed false positive rate is still very high, 15.04%, as it fails to capture botnet-specific network profiles effectively. Choi et al. [17] proposed a botnet detection mechanism based on the monitoring of DNS traffic during the connection stage of bots. However, a botnet can easily evade this mechanism if it rarely uses DNS at its initialization and limits or avoids DNS usage at later stages.

Wang et al. [20] presented a detection approach for P2P botnets that observes the stability of control flows in the botnet initialization time intervals. However, this approach suffers from high performance and storage overhead while achieving detection accuracy similar to that of earlier approaches. Kang et al. [15] proposed a novel real-time detection model, named the multistream fused model, in which they process different types of packets in a graded manner. However, this model cannot achieve desirable detection accuracy when deployed in a large-scale network environment. Liu et al. [18] presented a general P2P botnet detection model based on macroscopic features of the network streams by utilizing clustering techniques. The proposed method is unreliable or ineffective if only a single infected machine is present on the network.

To the best of our knowledge, there has been no research focusing on the application of node-based analysis to P2P bot detection. The node-based approach has distinct advantages that separate it from signature-based and flow-based techniques.


3. Node-Based P2P Bot Detection

In this section, we describe our node-based P2P bot detection approach. Our approach consists of four important steps: P2P bot quantification, efficient flow monitoring, classification, and evaluation. In Section 3.1, we describe our methodology for modeling P2P bots, which is the most important step of our detection approach. Using this model, we identify the features to quantify a P2P bot. In Section 3.2, we describe our approach for reducing the complexity of the flow monitoring at the network nodes. In Section 3.3, we describe our classification approach for identifying P2P bot profiles from the set of network profiles of all nodes and describe the evaluation metrics of the classification approach.

3.1. Our Model for Quantifying P2P Bot Features. In our node-based P2P bot detection approach, we monitor the communication flows at every node in the network to check for bot infection. Since each flow can exhibit many features, it is important to identify and isolate features which are unique to P2P bots. Our model of P2P bots is based on two key observations. First, since a P2P bot is part of a P2P network, it exhibits the communication behavior of a normal P2P node but with some distinguishable differences. Second, a P2P bot exhibits different types of network activity compared to regular P2P nodes. Using these observations, we identify several important features of a P2P bot. We group the features into two categories, the P2P bot communication model and the P2P bot behavior model, and describe them as follows.

3.1.1. P2P Bot Communication Model. In a P2P network, a node might attempt to connect to one or more network peers periodically in order to maintain the connection status or to query for data of interest. A P2P bot performs a similar activity, with the key difference that the P2P bot attempts such connections more actively so as to ensure connectivity across the P2P botnet. This behavior is uniform across all P2P bots in the P2P botnet. Furthermore, unlike regular P2P communication, where the P2P node attempts connections based on responses received from other peers, the P2P bot attempts to initiate connections proactively. Therefore, at the beginning of its activity, a bot sends connection requests to other bot nodes according to its peer list. A certain fraction of such requests fails because some peers are shut down or not infected. In contrast, the success rate is usually high when normal P2P applications send connection requests. Thus, the success rate of connection requests is an important criterion for P2P bot detection. From the observed P2P botnet data, we note that if the success rate of a node's connection attempts is below 50%, the node tends to be a bot.
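As a rough illustration of this criterion, the following Python sketch computes the connection success rate of a node and compares it against the 50% figure noted above. The function names and the representation of attempts as a list of booleans are assumptions made for the example, and the check is only one indicator, not a complete detector.

```python
def connection_success_rate(attempts):
    """Fraction of connection attempts that succeeded, or None if there were none.

    `attempts` is a hypothetical list of booleans, one per connection request
    sent by the node in the observation period (True = succeeded).
    """
    if not attempts:
        return None
    return sum(1 for ok in attempts if ok) / len(attempts)


def low_success_rate(attempts, threshold=0.5):
    """True if the node's success rate falls below the threshold (50% by default)."""
    rate = connection_success_rate(attempts)
    return rate is not None and rate < threshold


# Example: 3 of 10 requests succeed, as might happen when many listed peers are offline.
bot_like = [True] * 3 + [False] * 7
normal_like = [True] * 9 + [False] * 1
```

A node with no observed attempts yields no rate at all, so the sketch declines to flag it rather than treating missing data as a low rate.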

3.1.2. P2P Bot Behavior Model. To understand the unique features of P2P botnets, we chose four kinds of real P2P bots available in the wild. We analyzed these bots in a controlled virtual environment built with VMware. A summary of the results is shown in Table 1. Using this data, we identify the following network features of interest specific to P2P bot behavior. A feature represents a characteristic of a node in a given time window T, which is chosen by the system analyst performing the P2P bot detection.

Number of Different Protocols. A majority of the P2P bots utilize both UDP and TCP packets in the same flow; that is, they send and receive to the same destination port and address from the same source port and source address. For instance, Bot 2 makes an SMTP connection using TCP and sends several UDP packets in the same flow. The more important aspect is that the UDP packets outnumber the TCP packets, which is indicative of P2P bot behavior.

Large Number of Flows. The number of flows at a P2P bot is higher than at a normal P2P node. For instance, in Bot 3, there is a high amount of ICMP traffic from the P2P bot towards port 53 of random IP addresses on the Internet. These packets correspond to discovery packets intended to locate new targets for the P2P bot infection. The number of flows reflects the extent of the node's connections with other nodes.

Large Number of Packets. In P2P botnets, due to the P2P topology, a large number of packets are exchanged among the P2P nodes. This differs from a server-client botnet, where the communication among client nodes is minimal or nonexistent. Therefore, a P2P botnet typically generates a higher average number of packets exchanged. This behavior is observed uniformly across all the bots we have analyzed.

Average Packet Length. Since the P2P bots need to exchange updates or instructions regularly with other bots, the size of the packets is necessarily small to avoid detection by the IDS. However, this results in many continuous, uniform, small packets, which is a useful feature for P2P bot detection. For instance, in Bots 1, 2, and 3, there is constant communication among peers regarding P2P status and other features.

Ratios of Packets Exchanged. The number of packets sent and received by P2P bots exhibits a certain uniformity across all the bots. These values differ considerably from regular P2P communication, as observed from our analysis of the sample bots. One important feature of interest is the ratio of the number of packets sent to the number of packets received, RNP, where a higher value of RNP indicates that these nodes are more active than other nodes. Another important feature is the ratio of the average length of packets sent to the average length of packets received, RLP, where the value of RLP is an indicator of the peering relationship between nodes: a lower value indicates a normal P2P node and a higher value indicates a P2P node controlled by some other peer nodes.

Table 2 lists the seven features we have selected for the purposes of P2P bot detection. In this list, features such as the source and destination IP addresses are extracted directly from the TCP/UDP headers. Other features, such as the number of protocols used, require additional processing and computation. Therefore, we perform flow monitoring on individual nodes to extract the desired features. We describe our flow monitoring approach next.

Table 1: Sample bot analysis and behavior.

| Name | Host behavior | Network behavior | Remark |
| --- | --- | --- | --- |
| Bot 1: Phatbot | (1) Modify the registry; (2) add startup item; (3) modify a file; (4) terminate antivirus thread | (1) Start the IRC thread; (2) start the P2P server thread; (3) start the P2P client thread | (1) Modifies a file named host in the system directory; (2) starts the IRC client thread and connects to the IRC server; (3) starts both client and server threads to improve P2P communication |
| Bot 2: Zhelatin.zy | (1) Modify the registry; (2) add a startup item; (3) copy file | (1) Connect to SMTP server; (2) UDP connection | (1) Copies itself to the shared directory in order to propagate; (2) connects to the SMTP server through an SMTP thread; (3) opens many UDP connections from the same source port to random target ports |
| Bot 3: Sinit | - | (1) UDP protocol; (2) high ICMP traffic; (3) sending packets to port 53 | (1) Sends special discovery packets to port 53 of random IP addresses on the Internet |
| Bot 4: Nugache | (1) Modify the registry | (1) Open TCP port 8; (2) encrypted data transmission | (1) Modifies the registry and installs the list of hosts into the Windows registry; (2) has a static list of IP addresses (20 initial peers) to which it tries to connect on TCP port 8; (3) the exchanged data is encrypted |

Table 2: Features for node-based analysis.

| Feature | Description |
| --- | --- |
| (1) Node | Computer address for transmitting information |
| (2) NP | Number of protocols used in the time interval |
| (3) NF | Number of flows used in the time interval |
| (4) NPS | Number of packets sent in the time interval |
| (5) ALPS | Average length of packets sent |
| (6) RNP | Ratio of the number of packets sent to the number of packets received in the time interval |
| (7) RLP | Ratio of the average length of packets sent to the average length of packets received in the time interval |
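To make the feature definitions in Table 2 concrete, the following Python sketch computes them for one node from the packets observed in a single time window. It is a minimal illustration under stated assumptions, not our Java implementation: the packet record layout (`src`, `sport`, `dst`, `dport`, `proto`, `length`) and the choice to return `None` for a ratio with a zero denominator are both hypothetical.

```python
def node_features(node, packets):
    """Compute the Table 2 features for `node` from one time window of packets."""
    sent = [p for p in packets if p["src"] == node]
    received = [p for p in packets if p["dst"] == node]
    involved = sent + received

    def mean_length(pkts):
        return sum(p["length"] for p in pkts) / len(pkts) if pkts else 0.0

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio is reported as None.
        return numerator / denominator if denominator else None

    alps = mean_length(sent)
    return {
        "Node": node,
        "NP": len({p["proto"] for p in involved}),
        "NF": len({(p["src"], p["sport"], p["dst"], p["dport"]) for p in involved}),
        "NPS": len(sent),
        "ALPS": alps,
        "RNP": ratio(len(sent), len(received)),
        "RLP": ratio(alps, mean_length(received)),
    }


window = [
    {"src": "A", "sport": 1000, "dst": "B", "dport": 25, "proto": "tcp", "length": 100},
    {"src": "A", "sport": 1000, "dst": "B", "dport": 25, "proto": "udp", "length": 60},
    {"src": "A", "sport": 1001, "dst": "C", "dport": 53, "proto": "udp", "length": 40},
    {"src": "B", "sport": 25, "dst": "A", "dport": 1000, "proto": "tcp", "length": 200},
]
features = node_features("A", window)
```

In this toy window, node A uses two protocols over three distinct flows and sends three packets for every packet it receives, so NP is 2, NF is 3, and RNP is 3.0.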

3.2. Flow Monitoring. Our node-based detection approach monitors the flows at each node to extract the network features identified in Table 2. We note that even with this requirement, the node-based analysis is still much more efficient than signature-based analysis. However, there are two challenges in flow monitoring. First, analyzing each flow at a node requires capturing each packet. The process of packet capturing is known to suffer from high packet losses. Specialized hardware might be required to handle the packet loss, which may prove to be an expensive option. Second, as some features like NF, NP, RNP, and RLP cannot be obtained from packet headers directly, the packets need to be stored and processed. This results in a large storage overhead for the bot detection process. To overcome these two challenges, we adopt the sampling approach proposed in [21, 22].

Figure 4: Effect of sampling on bot detection (normal detection versus sampling detection).

Specifically, for each flow at the node, we sample the packets in a periodic manner, thereby reducing the number of packets that need to be captured. However, reducing the number of captured packets can reduce the accuracy of bot detection. To evaluate the impact of sampling on bot detection, we compare continuous packet capturing against sampling and show the results in Figure 4. This figure shows that normal capturing can detect more bots than sampling detection within the same time window. For example, at around time t1, normal packet capturing detects two bots, while sampling detects one bot. However, the two methods eventually detect the same number of bots after a few time windows. For example, at around time t2, both normal capturing and sampling detect two bots. The asymptotic results are possible due to the cycle limit, that is, the constant reassignment of the C&C server to different P2P bots.

Therefore, using sampling in combination with our bot detection approach can reduce the overhead of flow monitoring without sacrificing the detection accuracy when considered over a time period. We note that the trade-off in terms of detection time is reasonable, as our approach detects all P2P bots within acceptable time windows.
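The periodic sampling idea can be sketched as follows. This is an illustration under an assumed schedule, not the exact scheme of [21, 22]: we assume that capture runs for `window` seconds, pauses for `interval` seconds, and then repeats, with packet timestamps measured from the start of monitoring. The function name and the record layout are hypothetical.

```python
def sample_packets(packets, window, interval):
    """Keep only packets that fall inside a periodic capture window.

    Assumed schedule: capture for `window` seconds, skip `interval` seconds,
    repeat. With interval == 0 every packet is kept (continuous capture).
    """
    if window <= 0 or interval < 0:
        raise ValueError("window must be positive and interval non-negative")
    period = window + interval
    return [p for p in packets if (p["time"] % period) < window]


# One packet per second for a minute.
trace = [{"time": t} for t in range(60)]
sampled = sample_packets(trace, window=10, interval=20)
continuous = sample_packets(trace, window=10, interval=0)
```

With a 10-second window and a 20-second pause, only one third of this toy trace is retained, which is the kind of reduction in captured data that sampling is meant to provide.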

3.3. Classification Technique and Evaluation Metrics. For our P2P bot detection approach, we require classification techniques which have high performance, in order to support real-time detection goals, and at the same time have high detection accuracy. Machine learning classification techniques attempt to cluster and classify data based on feature sets. We have selected the decision tree classifier for our evaluation. Decision tree based classifiers exhibit desirably low computational complexity with high performance. In a decision tree, interior nodes represent input features, with edges extending from them which correspond to possible values of the features. These edges eventually lead to a leaf node, which represents an output variable corresponding to a decision. For our approach, the decision tree is trained on the real-life P2P bot data using the feature set from Table 2. During the detection phase, the feature set extracted from a node's flow information is given to the classifier, which classifies this feature set as malicious or nonmalicious.
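The structure of a decision tree described above can be illustrated with a small hand-written example in Python. The tree below is purely hypothetical: its features are taken from Table 2, but the thresholds and shape are invented for illustration and are not the tree learned in our experiments.

```python
# Interior node: (feature name, threshold, subtree if value <= threshold,
# subtree otherwise). Leaf: a class label string.
# Hypothetical tree; thresholds are illustrative, not learned from data.
TOY_TREE = (
    "RNP", 1.5,
    "nonmalicious",
    ("NF", 50, "nonmalicious", "malicious"),
)


def classify(tree, features):
    """Follow edges from the root until a leaf label is reached."""
    while not isinstance(tree, str):
        feature, threshold, low_branch, high_branch = tree
        tree = low_branch if features[feature] <= threshold else high_branch
    return tree


quiet_node = {"RNP": 1.0, "NF": 100}
active_node = {"RNP": 2.0, "NF": 100}
```

Classification costs one comparison per level of the tree, which is why decision trees suit real-time detection goals.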

We consider the standard metrics true positive (TP), true negative (TN), false positive (FP), and false negative (FN) with respect to the classification of a feature set as malicious or nonmalicious. The TP and TN values indicate the number of feature sets correctly classified as malicious and benign, respectively. The FP and FN values indicate the number of feature sets incorrectly classified as malicious and benign, respectively. The true positive rate (TPR) and the false positive rate (FPR) are calculated using the equations in (1). We define the detection accuracy of our technique using the term detection rate (DR) and set it to TPR. The TPR estimates the performance of our P2P botnet detection technique in terms of the probability that a malicious feature set is correctly classified as malicious. In contrast, the FPR estimates the probability of normal traffic being classified as malicious. Finally, we use the standard precision metric to indicate the probability that a feature set classified as malicious is actually malicious.

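$$
\begin{aligned}
\mathrm{DR} = \mathrm{TPR} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \\
\mathrm{FPR} &= \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}, \\
\mathrm{Precision} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}.
\end{aligned}
\tag{1}
$$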

We note that the detection rate DR approaches 1 if the false negatives FN tend to zero. Similarly, the precision approaches 1 if the false positives FP tend to zero. Therefore, both the detection rate and the precision are equally important in the P2P bot detection process.
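The three metrics can be computed directly from the confusion counts, as in the following Python sketch. The function name and the example counts are hypothetical and chosen only to show the arithmetic; they are not results from our experiments. Returning 0.0 for an empty denominator is an assumption of the sketch.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return detection rate (TPR), false positive rate, and precision."""

    def safe_div(numerator, denominator):
        # Assumption: a metric with an empty denominator is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "DR": safe_div(tp, tp + fn),
        "FPR": safe_div(fp, fp + tn),
        "Precision": safe_div(tp, tp + fp),
    }


# Illustrative counts only: 100 malicious and 100 benign feature sets.
metrics = detection_metrics(tp=99, tn=98, fp=2, fn=1)
```

For these counts, DR = 99/100 = 0.99, FPR = 2/100 = 0.02, and precision = 99/101, which is approximately 0.980.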

4. Experimental Evaluation

In this section, we describe the evaluation results of our node-based P2P bot detection approach. First, we present the performance results of our scheme, showing the detection rate and the precision achieved. Next, we compare our approach with a general flow-based detection approach and with a state-of-the-art detection tool, BotHunter [23], which uses event correlation based analysis.

4.1. Experimental Methodology and Implementation. We construct our experimental dataset by combining two separate datasets, one containing malicious traffic related to the Storm botnet and the other containing malicious traffic related to the Waledac botnet, obtained from the French chapter of the honeynet project [24]. Waledac is currently one of the most prevalent P2P botnets and has a more highly decentralized communication structure than the Storm botnet. While Storm uses Overnet for P2P communication, Waledac utilizes HTTP communication and a fast-flux based DNS network extensively. The highly distributed nature of Waledac makes it resilient to bot detection approaches. Next, we incorporated two benign datasets into our experimental dataset: one is from the Traffic Lab at Ericsson Research in Hungary [25] and the other is from the Lawrence Berkeley National Lab (LBNL) [26]. The Ericsson Lab dataset contains a large amount of traffic from different applications, including HTTP web browsing behaviors, World of Warcraft gaming packets, and traffic from popular BitTorrent clients such as Azureus. The LBNL trace data provides additional nonmalicious background traffic. As the LBNL is a research institute with a medium-sized enterprise network, their trace data provides a different variety of benign traffic, such as web, email, network backup, and streaming media data. This variety of traffic serves as a good model of the day-to-day use of enterprise networks.

We implemented our approach in Java. Our program extracts all node information from a given packet capture (pcap) file and parses the individual node information into relevant features for use in classification. For classification, we utilized the popular Weka machine learning framework [21] with the decision tree instantiation on our data. We used the standard training approach for training and testing our solution. The key intuition is that the feature vectors correspond to individual node flows and the analysis is done based on these features.

4.2. Performance of Our Approach. To evaluate our approach, we used different time windows to represent the amount of flow data analyzed; that is, in a wider time window, we analyze more flow data. The time window attempts to align with the bot life cycle, that is, to capture the entire bot-specific network activity. While it might seem that larger time windows are preferred, our results show that with reasonable time windows we are able to achieve 99–100% detection rates. We have tested our approach using regular and sampled monitoring approaches for flow capture. We have tabulated the values of detection rate DR, false positive rate FPR, and precision in Table 3. We selected time windows of 10, 20, 30, 60, and 180 seconds. For regular flow analysis, the time interval of capture is set to 0 seconds; that is, all packets are captured and analyzed. For sampled flow capturing, we set the time intervals of capturing to 10, 20, 30, 60, and 180 seconds. With large time windows, our sampled approach reduces the processing overhead considerably while retaining near-optimal detection accuracy.

Table 3: Detection rate and precision of node-based detection.

| Time interval (s) | Time window (s) | Detection rate | FPR | Precision |
| --- | --- | --- | --- | --- |
| 0 | 10 | 0.997 | 0.026 | 0.997 |
| 0 | 20 | 0.998 | 0.024 | 0.998 |
| 0 | 30 | 0.998 | 0.007 | 0.997 |
| 0 | 60 | 0.999 | 0 | 0.999 |
| 0 | 180 | 1 | 0 | 1 |
| 10 | 10 | 0.998 | 0.004 | 0.998 |
| 10 | 20 | 0.998 | 0 | 0.997 |
| 10 | 30 | 0.999 | 0 | 0.998 |
| 10 | 60 | 0.995 | 0 | 0.995 |
| 10 | 180 | 0.996 | 0 | 0.996 |
| 20 | 10 | 0.998 | 0.005 | 0.998 |
| 20 | 20 | 0.997 | 0.035 | 0.996 |
| 20 | 30 | 0.999 | 0.015 | 0.998 |
| 20 | 60 | 0.997 | 0 | 0.996 |
| 20 | 180 | 1 | 0 | 1 |
| 30 | 10 | 0.998 | 0.016 | 0.998 |
| 30 | 20 | 0.998 | 0.007 | 0.998 |
| 30 | 30 | 0.997 | 0.053 | 0.997 |
| 30 | 60 | 0.999 | 0 | 0.998 |
| 30 | 180 | 1 | 0 | 1 |
| 60 | 10 | 0.998 | 0.053 | 0.997 |
| 60 | 20 | 0.999 | 0 | 0.998 |
| 60 | 30 | 0.997 | 0 | 0.997 |
| 60 | 60 | 0.999 | 0.01 | 0.998 |
| 60 | 180 | 0.999 | 0 | 0.998 |
| 180 | 10 | 1 | 0 | 1 |
| 180 | 20 | 1 | 0 | 1 |
| 180 | 30 | 1 | 0 | 1 |
| 180 | 60 | 0.999 | 0.026 | 0.999 |
| 180 | 180 | 1 | 0 | 1 |

From Table 3, we make the important observation that, for both regular and sampled monitoring, the average bot detection rate of our approach is 99%. For regular monitoring, which corresponds to a time interval of 0 seconds, there is a gradual reduction in the false positive rate as the time window increases, and for the time window of 180 seconds, the FPR falls to 0 and the DR and precision values reach 1. This result shows that our approach is capable of detecting P2P bots with 100% accuracy given a sufficient time window. This result, combined with the 99% accuracy achieved in the other time windows, validates our node-based bot detection approach.

For sampled monitoring, we chose the time intervals of 0, 10, 20, 30, 60, and 180 seconds. From the results in Table 3, we observe that sampling has a limited effect on DR and precision, while the FPR shows a slight increase for smaller time windows. However, the impact of FP on precision is very slight. These results show that our bot detection approach works equally well with sampled flow monitoring and hence can scale to real-time detection for high performance systems. The main advantage of sampling is that it eliminates more than 60% of the input raw packet traces while retaining the high detection rates and very low false positive rates (0–2%) when viewed in absolute terms. The number of bots found in different time windows and the length of packets captured are illustrated in Figure 5. We can see from this figure that the amount of data processed, denoted by the length of packets, is considerably smaller for sampling detection while detecting the same number of bots.

Figure 5: Bots detected in different time windows and data processed for detection. (a) The number of bots found versus time (s), for sampling and normal detection. (b) The amount of data processing, given as the length of packets versus time (s), for sampling and normal detection.

4.3. Comparison with Flow-Based Detection. We compared our node-based approach with the generic flow-based bot detection approaches [15–18, 20, 27]. Since our node-based approach has broader adaptability to new bot behaviors, we expected it to perform better than flow-based approaches. To verify this expectation, we implemented flow-based detection by extracting 12 features from the network flows, as shown in Table 4. The summary of our experimental results is shown in Table 5. Our approach has a lower false positive rate and a higher detection rate than the flow-based approach. More importantly, our approach performs better because sampled detection reduces the processing and storage overhead considerably.
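Several of the per-flow statistics named in Table 4 can be computed directly from the packets of one flow. The sketch below is a hedged illustration, not the feature extractor used in the experiments: the function name `flow_features`, the `(timestamp, payload_length)` tuples, and the use of the population variance for PV are assumptions.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def flow_features(packets: List[Tuple[float, int]], interval: float) -> Dict[str, float]:
    """Compute a subset of the Table 4 statistics for a single flow.

    `packets` holds (timestamp in seconds, payload length in bytes) pairs
    observed during a time interval of `interval` seconds.
    """
    if not packets or interval <= 0:
        raise ValueError("need at least one packet and a positive interval")
    ordered = sorted(packets)                      # order by arrival time
    times = [t for t, _ in ordered]
    sizes = [s for _, s in ordered]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        "PX": len(ordered),                        # packets exchanged
        "APL": mean(sizes),                        # average payload length
        "PV": pvariance(sizes),                    # payload length variance (population, assumed)
        "PPS": len(ordered) / interval,            # packets per second
        "FPS": sizes[0],                           # size of the first packet
        "TBP": mean(gaps) if gaps else 0.0,        # average time between packets
    }


features = flow_features([(0.0, 60), (1.0, 100), (3.0, 140)], interval=10.0)
print(features)
```

Every flow has to be processed this way at runtime, which is the overhead that the node-based approach with sampling aims to reduce.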

4.4. Comparison with BotHunter. BotHunter [23] is one of the few publicly accessible botnet detection tools relevant to our work. BotHunter mainly consists of a correlation engine that creates associations among alerts generated by Snort [22]. To generate Snort alerts, BotHunter uses two custom plugins: the SLADE plugin for detecting payload anomalies and the SCADE plugin for detecting inbound and outbound scanning of the network. In addition to the regular Snort rule sets, BotHunter uses an enhanced rule set that is specifically designed to detect malicious traffic related to botnet activities, such as egg downloads and C&C traffic. The correlation engine analyzes all the alerts, creates associations among them, and generates a report of botnet infections.

When we tested BotHunter on our dataset, the generated alerts indicated that there is a spambot in the dataset. More specifically, three alerts with priority 1 reported the presence of botnet traffic. However, all three alerts pointed to the same IP address, corresponding to a machine infected with the Waledac botnet. Moreover, BotHunter failed to detect the other machine, which was infected with the Storm botnet. Finally, among the 97043 unique malicious flows in the system, BotHunter was able to detect only 56 flows, which is a very small percentage of the total flows. These results show that our approach performs better than BotHunter in terms of both performance and detection accuracy.

Table 4: Features for flow-based detection comparison.

Attribute   Description
SrcIp       Flow source IP address
SrcPort     Flow source port address
DstIp       Flow destination IP address
DstPort     Flow destination port address
Protocol    Transport layer protocol or “mixed”
APL         Average payload packet length for time interval
PV          Variance of payload packet length for time interval
PX          Number of packets exchanged for time interval
PPS         Number of packets exchanged per second in time interval T
FPS         The size of the first packet in the flow
TBP         The average time between packets in time interval
NR          The number of reconnections for a flow
FPH         Number of flows from this address over the total number of flows generated per hour

Table 5: Comparison of flow-based and our approach.

Attribute             Flow-based   Our approach
True positive rate    98.3%        100%
False positive rate   0.01%        0%


5. Conclusion and Future Work

We described node-based detection, a novel direction for detecting P2P botnets. Our approach is node-centric and focuses on modeling the network behavior of individual nodes. Our model is constructed by combining the P2P communication model with the observed behavior of real-life P2P botnets. We identified useful features that are indicative of bot behavior and extracted these features from the flows at individual nodes. Due to the generality of our P2P bot model, we are able to use sampling to reduce the effort of flow monitoring at individual nodes while retaining high detection accuracy. Since our model is based on observed behavior, our approach is resilient to the variations in protocols and the payload obfuscations usually employed by P2P botnets. Our experimental results over different time windows show that, by choosing an appropriate time window, we can achieve a detection accuracy of 99–100%. We have also shown that our approach outperforms existing approaches considerably.
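To make the node-centric view concrete, the following sketch aggregates the flows observed in one time window into a per-node profile. It is an illustration only: the record layout `FlowRecord`, the function `node_profiles`, and the particular profile fields are assumptions chosen for the example and do not reproduce the feature set used in our experiments.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FlowRecord(NamedTuple):
    node: str       # monitored node that owns the flow
    peer: str       # remote endpoint
    protocol: str   # e.g. "TCP" or "UDP"
    packets: int    # packets seen in the time window
    n_bytes: int    # bytes seen in the time window


def node_profiles(flows: Iterable[FlowRecord]) -> Dict[str, dict]:
    """Summarize all flows of each node into one profile per node."""
    grouped = defaultdict(list)
    for flow in flows:
        grouped[flow.node].append(flow)

    profiles = {}
    for node, records in grouped.items():
        total_packets = sum(r.packets for r in records)
        total_bytes = sum(r.n_bytes for r in records)
        profiles[node] = {
            "flows": len(records),
            "distinct_peers": len({r.peer for r in records}),
            "protocols": sorted({r.protocol for r in records}),
            "avg_packets_per_flow": total_packets / len(records),
            "avg_packet_length": total_bytes / total_packets if total_packets else 0.0,
        }
    return profiles


# Hypothetical flows captured in a single time window.
window = [
    FlowRecord("10.0.0.5", "192.0.2.1", "UDP", 4, 400),
    FlowRecord("10.0.0.5", "192.0.2.2", "UDP", 6, 600),
    FlowRecord("10.0.0.5", "192.0.2.2", "TCP", 10, 5000),
    FlowRecord("10.0.0.8", "192.0.2.3", "TCP", 20, 30000),
]
profiles = node_profiles(window)
print(profiles["10.0.0.5"])
```

A classifier then sees one feature vector per node and time window rather than one per flow, which is what keeps the amount of data to analyze small.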

We note that it is important to design a system that can evaluate detection performance online, instead of the off-line mechanism used in our present work. It is also important to train the detection system online, rather than through an off-line training process, so that it is suitable for live deployment. Such a system is ideal for identifying new threats such as zero-day malware. We will explore these issues in our future work. Further, we will explore the use of an Artificial Immune System (AIS) to address the large number of behavior problems and to identify the key influencing factors.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was funded by the National Natural Science Foundation of China under Grant no. 61373134.

References

[1] B. Stone-Gross, M. Cova, B. Gilbert, R. Kemmerer, C. Kruegel, and G. Vigna, “Analysis of a botnet takeover,” IEEE Security and Privacy, vol. 9, no. 1, pp. 64–72, 2011.

[2] X. Ma, X. Guan, J. Tao et al., “A novel IRC botnet detection method based on packet size sequence,” in Proceedings of the IEEE International Conference on Communications (ICC ’10), pp. 1–5, Cape Town, South Africa, May 2010.

[3] W. Liao and C. Chang, “Peer to peer botnet detection using data mining scheme,” in Proceedings of the International Conference on Internet Technology and Applications (ITAP ’10), pp. 1–4, Wuhan, China, August 2010.

[4] C. Mazzariello, “IRC traffic analysis for botnet detection,” in Proceedings of the 4th International Symposium on Information Assurance and Security (IAS ’08), pp. 318–323, Naples, Italy, September 2008.

[5] D. A. L. Romana, Y. Musashi, R. Matsuba, and K. Sugitani, “Detection of bot worm-infected PC terminals,” Information, vol. 10, no. 5, pp. 673–686, 2007.

[6] P. Wang, S. Sparks, and C. C. Zou, “An advanced hybrid peer-to-peer botnet,” IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 2, pp. 113–127, 2010.

[7] X. Dong, F. Liu, X. Li, and X. Yu, “A novel bot detection algorithm based on API call correlation,” in Proceedings of the 7th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD ’10), vol. 3, pp. 1157–1162, Yantai, China, August 2010.

[8] W. Zilong, W. Jinsong, H. Wenyi, and X. Chengyi, “The detection of IRC botnet based on abnormal behavior,” in Proceedings of the 2nd International Conference on MultiMedia and Information Technology (MMIT ’10), pp. 146–149, Kaifeng, China, April 2010.

[9] S. Wang, Q.-J. Du, G.-X. Yu et al., “Method of choosing optimal characters for network intrusion detection system,” Computer Engineering, vol. 36, no. 15, pp. 140–144, 2010.

[10] B. Al-Duwairi and L. Al-Ebbini, “BotDigger: a fuzzy inference system for botnet detection,” in Proceedings of the 5th International Conference on Internet Monitoring and Protection (ICIMP ’10), pp. 16–21, Barcelona, Spain, May 2010.

[11] Y. Al-Hammadi and U. Aickelin, “Detecting bots based on key logging activities,” in Proceedings of the 3rd International Conference on Availability, Security, and Reliability (ARES ’08), pp. 896–902, Piscataway, NJ, USA, March 2008.

[12] M. Crotti, F. Gringoli, P. Pelosato, and L. Salgarelli, “A statistical approach to IP-level classification of network traffic,” in Proceedings of the IEEE International Conference on Communications (ICC ’06), vol. 1, pp. 170–176, Istanbul, Turkey, July 2006.

[13] X. Wang, F. Liu, J. Ma, and Z. Lei, “Research of automatically generating signatures for botnets,” Journal of Beijing University of Posts and Telecommunications, vol. 34, no. 4, pp. 109–112, 2011.

[14] K. Wang, C. Huang, S. Lin, and Y. Lin, “A fuzzy pattern-based filtering algorithm for botnet detection,” Computer Networks, vol. 55, no. 15, pp. 3275–3286, 2011.

[15] J. Kang, Y. Song, and J. Zhang, “Accurate detection of peer-to-peer botnet using multi-stream fused scheme,” Journal of Networks, vol. 6, no. 5, pp. 807–814, 2011.

[16] C. Livadas, R. Walsh, D. Lapsley, and W. T. Strayer, “Using machine learning techniques to identify botnet traffic,” in Proceedings of the 31st Annual IEEE Conference on Local Computer Networks (LCN ’06), pp. 967–974, Tampa, Fla, USA, November 2006.

[17] H. Choi, H. Lee, and H. Kim, “Botnet detection by monitoring group activities in DNS traffic,” in Proceedings of the 7th IEEE International Conference on Computer and Information Technology (CIT ’07), pp. 715–720, Aizuwakamatsu, Japan, October 2007.

[18] D. Liu, Y. Li, Y. Hu, and Z. Liang, “A P2P-botnet detection model and algorithms based on network streams analysis,” in Proceedings of the International Conference on Future Information Technology and Management Engineering (FITME ’10), vol. 1, pp. 55–58, Changzhou, China, October 2010.

[19] C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou, and X. Wang, “Effective and efficient malware detection at the end host,” in Proceedings of 18th USENIX Security Symposium, pp. 351–366, USENIX Association, Montreal, Canada, 2009.

[20] B. Wang, Z. Li, H. Tu, and J. Ma, “Measuring peer-to-peer botnets using control flow stability,” in Proceedings of the International Conference on Availability, Reliability, and Security (ARES ’09), pp. 663–669, Fukuoka, Japan, March 2009.

[21] I. H. Witten, E. Frank, and M. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 3rd edition, 2011.

[22] M. Roesch, “Snort: lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Conference on System Administration (USENIX LISA ’99), pp. 229–238, USENIX Association, Berkeley, Calif, USA, 1999.

[23] G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee, “BotHunter: detecting malware infection through IDS-driven dialog correlation,” in Proceedings of the 16th USENIX Security Symposium, pp. 167–182, 2007.

[24] French Chapter of Honeynet, http://www.honeynet.org/chapters/france.

[25] G. Szabó, D. Orincsay, S. Malomsoky, and I. Szabó, “On the validation of traffic classification algorithms,” in Proceedings of the 9th International Conference on Passive and Active Network Measurement (PAM ’08), pp. 72–81, Cleveland, Ohio, USA, 2008.

[26] LBNL Enterprise Trace Repository, 2005, http://www.icir.org/enterprise-tracing.

[27] L. Braun, G. Munz, and G. Carle, “Packet sampling for worm and botnet detection in TCP connections,” in Proceedings of the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS ’10), pp. 264–271, IEEE, Osaka, Japan, April 2010.




Figure 1: C&C botnet.

Figure 2: P2P network.

Figure 3: P2P botnet bypassing firewalls.

used in P2P bots are typically self-propagating (which helps in discovering new peers) and mutating as well, to avoid signature-based detection. Due to their resilience, P2P botnets represent a major threat to Internet applications and infrastructure. Therefore, to safeguard the Internet from strategic coordinated attacks, there is an urgent need to devise solutions that detect P2P bots and render P2P botnets ineffective.

1.2. Limitation of Prior Art. The existing solutions for P2P botnet detection can be broadly classified into signature-based [7–13] and flow-based detection [14–18]. Signature-based P2P bot detection inspects each packet in the network traffic entering or leaving the Internet gateway of the network for the presence of special features, such as port numbers, byte sequences in the payload, and blacklisted IP addresses. These special features, known as signatures, are extracted from known past botnet infections and stored in a signature database. While signature-based detection has a good detection rate and is easily deployed, it has two major limitations. First, it is deterministic and relies only on detecting known botnet infections, so it cannot detect unknown bots. Even known bots can evade signature detection by changing their communication ports or by encrypting the packet payload to hide bot-specific features. Second, inspecting each packet degrades performance, especially when the traffic consists of a large amount of benign data.
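The first limitation can be seen in a toy signature matcher. The sketch below is illustrative and not taken from any of the cited systems; the `SignatureDB` class, the `matches_signature` function, and every port, byte pattern, and IP address in it are made-up values.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SignatureDB:
    """Hypothetical signature database; all values used below are made up."""
    ports: FrozenSet[int]
    payload_patterns: Tuple[bytes, ...]
    blacklisted_ips: FrozenSet[str]


def matches_signature(db: SignatureDB, src_ip: str, dst_ip: str,
                      dst_port: int, payload: bytes) -> bool:
    """Return True if the packet matches any stored signature."""
    if dst_port in db.ports:
        return True
    if src_ip in db.blacklisted_ips or dst_ip in db.blacklisted_ips:
        return True
    return any(pattern in payload for pattern in db.payload_patterns)


db = SignatureDB(
    ports=frozenset({6667}),
    payload_patterns=(b"BOTCMD",),
    blacklisted_ips=frozenset({"203.0.113.7"}),
)

# A packet from a known infection is flagged.
known = matches_signature(db, "192.0.2.10", "198.51.100.5", 6667, b"BOTCMD scan")
# The same bot on another port with an encrypted payload is missed.
evasive = matches_signature(db, "192.0.2.10", "198.51.100.5", 8443, b"\x8f\x02\x1c")
print(known, evasive)
```

The second call returns `False` even though the sender is the same bot, because none of the stored signatures survives a port change combined with payload encryption.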

Flow-analysis-based bot detection examines network flows between two nodes, where a flow is defined as a set of packets that have the same source address, source port, destination address, and destination port. The intuition behind these approaches is that flow features, such as the count of packets in the flow, the order of packet arrivals, and the interval between packets, can model botnet communication patterns more accurately than direct packet inspection. The extracted features are used to construct a classifier that can differentiate between normal flows and malicious bot flows. Since classifiers use statistical profiling, flow-based analysis is capable of detecting unknown bots that exhibit behavioral similarities to known bots. However, flow-based techniques suffer from two key limitations. First, there are several flows between any two network nodes that need to be analyzed, and usually most of these flows belong to normal network processes. Second, the flow features need to be extracted at runtime, which implies that flow-based analysis incurs considerable computational overhead at runtime. At any given instant, there is a significant number of flows in the network, which further amplifies the impact of these limitations.
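The flow definition above maps directly onto a grouping step. The following sketch is a minimal illustration of that definition; the `Packet` record and the `group_flows` function are hypothetical names, and the addresses are documentation-range examples.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    ts: float      # capture time in seconds
    src: str       # source address
    sport: int     # source port
    dst: str       # destination address
    dport: int     # destination port
    length: int    # packet length in bytes


FlowKey = Tuple[str, int, str, int]


def group_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets that share source address/port and destination address/port."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.ts):   # keep arrival order per flow
        flows[(pkt.src, pkt.sport, pkt.dst, pkt.dport)].append(pkt)
    return dict(flows)


packets = [
    Packet(0.0, "192.0.2.1", 40000, "192.0.2.9", 80, 60),
    Packet(0.5, "192.0.2.1", 40001, "192.0.2.9", 80, 60),
    Packet(1.2, "192.0.2.1", 40000, "192.0.2.9", 80, 1500),
]
flows = group_flows(packets)
key = ("192.0.2.1", 40000, "192.0.2.9", 80)
# Interval between consecutive packets of one flow.
gaps = [b.ts - a.ts for a, b in zip(flows[key], flows[key][1:])]
print(len(flows), gaps)
```

Even two nodes talking to the same service produce a separate flow per source port, which is why the number of flows to analyze grows quickly.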

1.3. Proposed Approach. In this paper, we describe a novel P2P bot detection approach, called node-based bot detection, in which we analyze the network profile of nodes to detect bot characteristics. A sample network profile of a node may comprise the different protocols used by the node, the number of flows in a particular time period, packet statistics, and so on. Our approach is based on the intuition that P2P bots exhibit a distinct network profile due to the various P2P network maintenance tasks they are required to perform. A P2P bot will be more active in communicating with other P2P bots and exchanging various instructions related to command and control. Also, unlike normal P2P nodes, P2P bots exhibit nearly uniform network activity based on the instructions of the current C&C server. Based on these observations, our approach consists of identifying and quantifying the network profile features that are typical of a






2. Related Research





















[Figure residue: legend entries Bot 1 (Phatbot) and Bot 2 (Zhelatin.zy), with nodes labeled Bot1 and Bot2.]

[Figure residue: packet sequences P1 to P9 on a time axis (t0, t1, t2; spacings Δt and Δt′), shown for normal detection and sampling detection.]









$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \tag{1}$$
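Equation (1) gives the false positive rate; the detection rate and precision can be computed from the same confusion-matrix counts. The sketch below assumes the standard definitions DR = TP / (TP + FN) and precision = TP / (TP + FP), which are not restated in this part of the text. The function name `detection_metrics` and the example counts are hypothetical.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return DR, FPR, and precision from confusion-matrix counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "DR": ratio(tp, tp + fn),         # detection rate (assumed standard definition)
        "FPR": ratio(fp, fp + tn),        # false positive rate, equation (1)
        "precision": ratio(tp, tp + fp),  # assumed standard definition
    }


# Made-up counts, for illustration only.
m = detection_metrics(tp=998, fp=2, tn=98, fn=2)
print(m)
```

Note that FPR is normalized by the number of benign samples (FP + TN), so a small absolute number of false positives can still yield a noticeable rate when few benign samples are present.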










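Table 3: DR, FPR, and precision by time interval of capture and time window.

Time interval (s)   Time window (s)   DR      FPR     Precision
0                   10                0.997   0.026   0.997
0                   20                0.998   0.024   0.998
0                   30                0.998   0.007   0.997
0                   60                0.999   0       0.999
0                   180               1       0       1
10                  10                0.998   0.004   0.998
10                  20                0.998   0       0.997
10                  30                0.999   0       0.998
10                  60                0.995   0       0.995
10                  180               0.996   0       0.996
20                  10                0.998   0.005   0.998
20                  20                0.997   0.035   0.996
20                  30                0.999   0.015   0.998
20                  60                0.997   0       0.996
20                  180               1       0       1
30                  10                0.998   0.016   0.998
30                  20                0.998   0.007   0.998
30                  30                0.997   0.053   0.997
30                  60                0.999   0       0.998
30                  180               1       0       1
60                  10                0.998   0.053   0.997
60                  20                0.999   0       0.998
60                  30                0.997   0       0.997
60                  60                0.999   0.01    0.998
60                  180               0.999   0       0.998
180                 10                1       0       1
180                 20                1       0       1
180                 30                1       0       1
180                 60                0.999   0.026   0.999
180                 180               1       0       1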






