ResearchArticle An Effective Conversation-Based...
Transcript of ResearchArticle An Effective Conversation-Based...
Research ArticleAn Effective Conversation-Based Botnet Detection Method
Ruidong Chen12 Weina Niu12 Xiaosong Zhang12 Zhongliu Zhuo12 and Fengmao Lv1
1School of Computer Science and Engineering University of Electronic Science and Technology of China ChengduSichuan 611731 China2Center for Cyber Security University of Electronic Science and Technology of China Chengdu Sichuan 611731 China
Correspondence should be addressed to Xiaosong Zhang johnsonzxsuestceducn
Received 25 January 2017 Accepted 12 March 2017 Published 9 April 2017
Academic Editor Lixiang Li
Copyright copy 2017 Ruidong Chen et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
A botnet is one of the most grievous threats to network security since it can evolve into many attacks such as Denial-of-Service(DoS) spam and phishing However current detection methods are inefficient to identify unknown botnet The high-speednetwork environmentmakes botnet detectionmore difficult To solve these problems we improve the progress of packet processingtechnologies such as New Application Programming Interface (NAPI) and zero copy and propose an efficient quasi-real-timeintrusion detection system Our work detects botnet using supervised machine learning approach under the high-speed networkenvironment Our contributions are summarized as follows (1) Build a detection framework using PF_RING for sniffing andprocessing network traces to extract flow features dynamically (2) Use random forest model to extract promising conversationfeatures (3) Analyze the performance of different classification algorithms The proposed method is demonstrated by well-knownCTU13 dataset and nonmalicious applications The experimental results show our conversation-based detection approach canidentify botnet with higher accuracy and lower false positive rate than flow-based approach
1 Introduction
Botnet [1] comprises many compromised hosts under thecontrol of the botmaster remotely Early botnets relied onInternet RelayChat (IRC) [2] andHypertext transfer protocol(HTTP) [3] to communicate The problem of that is single-point invalid and easy to be detected and destroyed Mostbotnets are decentralized and use P2P technology [4] toconstruct command and control (CampC) mechanism [5]Noncentral node P2P botnet [6] is harder to detect than IRCand HTTP-based one What is more bots evolve into attackswhich are difficult to track their position Most currentDenial-of-Service (DoS) and spam are caused by botnet [7]Thus the botnet is one of the greatest threats to networksecurity
In the past researchers used signature-based [8] andanomaly-based intrusion detection systems (IDS) [9 10] todetect botnet However the former has two shortcomingsone is that the original detection rules cannot effectivelydetect bot program that changes communication means theother is that inaccurate signatures cause high false positive
Lack of scalability under huge network traffic is anotherproblem
Currently the backbone network is based on 1Gbps or10Gbps optical fibers which renders massive traffic datain short time Moreover fast growing P2P applicationspose significant strain to data storage Therefore identifyingbotnet traffic under high-speed network is a challengingissue [11] In this paper a detection platform with highdetection accuracy and powerful traffic processing ability isproposed It uses conversation-based network traffic analysisand supervisedmachine learning to identifymalicious botnettraffic The experimental results show that random forestalgorithm [12] has higher detection accuracy and lowerfalse positive rate Moreover we further explore the top fiveclassifiers (RandomForest REPTree RandomTree BayesNetand DecisionTump [13])
The contributions of the paper are threefold First a novelbotnet detection system with low latency and high accu-racy is introduced Second our detection method identifiesbotnet traffic using conversation-based traffic analysis andsupervised machine learning Our approach outperforms the
HindawiMathematical Problems in EngineeringVolume 2017 Article ID 4934082 9 pageshttpsdoiorg10115520174934082
2 Mathematical Problems in Engineering
accuracy based on flow since the false positive rate of botnettraffic decrease is 132 percent In addition to the above twowe evaluate performances of the five well supervisedmachinelearning algorithms (MLAs) [14] The detection rate of thebotnet is up to 936 and the false alarm rate is about 03by the random forest algorithm
The remainder of this paper is organized as followsSection 2 gives an overview of botnet detection related worksSection 3 shows the proposed detection method Section 4provides the preliminary experimental results conclusion aresummarized in Section 5
2 Related Work
Botnet detection methods fall into two categories hostbehavior-based detection [15] and network-based detection[16]
21 Host-Based Detection Host-based detection is the earli-est method To determine whether a host is compromisedthis method continuously monitors the change of processfiles network connections and registries under a controlledenvironment [17 18] Host-based detection is useful indetecting known bots However it performs poorly becauseit cannot detect new or variant bots For example host-baseddetection has a sense of inability to identify bots with newtechnologies like a rootkit counter debug
22 Network-Based Detection Network-based detection [19ndash21]mainly identifies traffic in CampC control phrase of a botnetbecause behavior features in this phrase are different fromother phrases Network-based detection mainly focuses onanalyzing two kinds of network behaviors the rate of failedconnection and flow features Most commonly used flow fea-tures include the number of uplink (downlink) data packetsthe number of uplink (downlink) transmission bytes theaverage variance-length of uplink (downlink) data packetsthe maximum length of uplink (downlink) data packets theaverage variance-length of uplink (downlink) data packetsthe duration time of data flow (ms) the rate of the lengthof data packets in uplink and downlink and the total lengthof loaded data packets in a flow Nowadays researchersintroduce machine learning and neural network to network-based detection to identify unknown botnet trafficThus thismethod is a hot research point in recognition and analysis ofbotnet traffic
Network-based detection method has a high detectionrate because it extracts common flow features independentof botnet category However in the high-speed and complexnetwork existing detection platforms based on flow featuresare ineffective due to high packet drop rate
3 Our Detection Method
In this section we describe the components of our proposedbotnet traffic detection framework
Standard NIC
mmap
Kernel layer pf_ringKo
Configuration files
Packets inflow_1
Application layer
Ring Buffer
Ring Buffer
Ring Buffer
flow_nPackets in
middot middot middot
NA
PI
Figure 1 The packet process module architecture
The framework consists in the following
(1) Traffic processmodule for clustering captured packetsinto different flow buffer
(2) Flow-based feature extraction module for generatingstatistical characteristics of flow
(3) Conversation-based feature selection module forextracting promising conversation-based feature set
(4) Botnet detectionmodule for identifying botnet trafficusing machine learning algorithm
Packet process module is used to extract the requiredfields out of the packets After the extraction of the desiredinformation from the packet process module the flow-basedfeature extractionmodule is used for generating flow featuresBased on the flow features conversation-based feature selec-tion module can obtain promising conversation feature setfor the botnet detection module Botnet traffic detection isaccomplished using supervised classification algorithm [22]
31 Traffic Process Module Libcap [23] is used for sniffingthe packets from the network interface due to its simpleoperation and cross-platform After the NIC captures thepackets Libcap copies packets from the driver to kernel-levelusing DMA in order to filter themThen Libcap copy filteredpackets into application in user level for further analyzingwhereasmultiple copies of Libcap imposemore overhead andconsume more time The mechanics of Libcap makes packetloss and do not reduce the user session PF_RING [24] is anew network socket that uses NewApplication ProgrammingInterface (NAPI) and zero copy to capture packet data froma live network Thus PF_RING is used to capture trafficonto successive pcaps The detailed packet capture process isshown in Figure 1
First the kernel layer of the packet process modulereads the configuration file to set the parameter values likepacket length ClusterId ClusterId is the ID of Ring Buffercreated by PF_RING Parameter values are stored in theconfiguration file so that we can modify them at any timeSecond network devices are turned on and Ring Buffer iscreated using pfring_open (device snaplen flags) functionof PF_RING where device denotes the name of the networkdevice snaplen denotes the packet length and flags denotes
Mathematical Problems in Engineering 3
Flowexistence
Create flow andupdate status
Flag is0x02
Update flowinformation
End
Drop packet
No
No
FIN or RST value is 1
Outputcomplete flow
Yes
YesYes
Strat
Figure 2 The packet process module architecture
whether it is in mixed mode Here we set snaplen value as60 because header fields of a packet are needed in this paperThird we save the header information payload length andarrival time of a packet in different flow buffer according tofive tuples (SrcIp DstIp SrcPort DstPort Proto) which isused to mark a flow That is if two different packets havethe same sourcedestination hostport and the same protocolthey belong to the same flow
32 Flow-Based Feature Extraction Module There are differ-ent flow reorganization methods for different transport layerprotocols Using TCP packets as an example we use a three-way handshake to represent the start of a flowWhen a packetwhose FIN or RST value is 1 comes the end of this flowis marked The detailed TCP flow reorganization process isshown in Figure 2
When a packet comes we decide whether the flow thispacket belongs to exists If a packet whose flag value is 0x02and the flow does not exist we create a flow according to Ipprotocol and port When the flag of the packet takes othervalues this packet needs to be dropped An instance of a flowreorganization state machine can be in only one of the fivestates handshake_1 handshake_2 handshake_3 data trans-mission and end If a packet whose flag value is 0x02 thisprocess is in the status of handshake_1 Only when a packetwhose flag value is 0x12 is coming the flow reorganizationwill be in handshake_2 status Then the arrival of a packetwhose flag value is 0x10 marks handshake_3 status Afterthe three-way handshake data begins transmitting In theprocedure of flow reorganization whenever there is a packetwhose flag value is 0x02 it turns back to the handshake_2status
After analyzing the data characters of a botnet we findthat there is a flow similarity of the same botnet Herea conversation contains many flows with different sourceports That is two flows having the same sourcedestinationIP destination port and protocol can be classified as thesame conversation Promising conversation feature gener-ating is based on the flow features Thus the flow-basedfeature module extracts statistical features including flowduration the average interval of up (down) flow the maxi-malminimumaverage length of up (down) flow the numberof valid up (down) packets in a flow the number of trans-mission bytes of up (down) flows and the number of smallpackets in a flow
33 Conversation-Based Feature Selecting Module
331 Conversation Features
(1) The Duration Time of Flows in a Conversation Thecommunication between the botmaster and other bot hostsis done by bots Thus the duration time of botnet flowis usually fixed and short However the duration time ofnormal flow is determined by user behaviors Here we canextract the average duration time of flows in a conversation(avg_duration) the minimum and maximum duration timeof flows in a conversation (min_durationmax_duration) thestandard deviation of duration time of flows in a conversation(std_duration) and the average arriving intervals of upand down flows in a conversation (avg_finter avg_binter)Assuming that there are 119899 flows in a conversation avg_finter= (avg_finter_1198911+sdot sdot sdot+avg_finter_119891119899)119899 where avg_finter_1198911denotes the average arriving intervals of up packets in the firstflow
(2) The Distribution of Flows in Conversation During thecommunication process among nodes in a botnet the sizeand the number of transmitted data packets are small AndCampC communication flows produced from bot hosts in thesame botnet have great similarity [25] Thus we extractthe average length of up and down flows in a conversa-tion (avg_fpkl avg_bpkl) the minimum length of up anddown flows in a conversation (min_fpkl min_bpkl) themaximum length of up and down flows in a conversation(max_fpkl max_bpkl) the standard variation of the lengthof up and down flows in a conversation (std_avg_fpklstd_avg_bpkl) the average number of valid up and downflows in a conversation (avg_fpks avg_bpks) the standardvariation of the number of valid up and down flows ina conversation (std_avg_fpks std_avg_bpks) the averagenumber of transmission bytes of up and down flows ina conversation (avg_fpksl avg_bpksl) and the standardvariation of transmission bytes of up and down flows in aconversation (std_fpksl std_bpksl)
(3) The Distribution of Small Packets in Conversation Thereare many packets within the range of 40320 bytes [26] in thebotnet traffic because bots need to constantly connect to newhosts However a packet size of the benign server traffic is
4 Mathematical Problems in Engineering
largeThus the distribution of small packet in a conversationis an interesting characteristic of botnet traffic like the min-imum of small packet in a conversation (min_spacket) themaximum of small packet in a conversation (max_spacket)the average of small packet in a conversation (avg_spacket)and the standard variance of small packet in a conversation(std_spacket) In conclusion there are 26 features extractedfrom conversations which are shown in Table 1
332 Feature Selection We use random forest algorithm [12]to select promising features All the classification trees inrandom forest is binary tree Construction of classificationtree meets the principle of recursive splitting from top tobottom For each classification binary tree all the train setthey used is sampled from the original dataset In otherwords several samples in the original train set may appearmany times in the train set of one classification tree and maynever appear in any classification tree samples Algorithm 1describes how to construct a random forest algorithm indetail
In the procedure of random forest model establishmentGini coefficient is used to select feature Here are 2 classesthus the value of 119870 is 2 We suppose that feature 119860_119894 (119894 =1 26) splits dataset 119863 into 119873 parts 1198631 119863119899 On thecondition of the feature 119860_119894 Gini coefficient of dataset 119863 isshown as follows
Gini (119863 119860_119894) = 119873sum119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| Gini (119863119899)
Gini (119863119899) = 1 minus119870sum119896=1
( 1003816100381610038161003816119862119896100381610038161003816100381610038161003816100381610038161198631198991003816100381610038161003816)2
(1)
Then we select promising features according to randomforest model The feature selection process is shown inAlgorithm 2 In the algorithm input data with 27 columnsincludes 26 conversation features and a class label
In every iteration we first rank features according totheir importance and then delete the feature with minimumvalue until detection rate no longer changes The formulafor calculating an RF score of features is shown in (2) Inthe following equation there is119872 decision tree with feature119860_119894 Gini(119863 119860_119894) indicatesGini coefficient of dataset119863 usingfeature 119860_119894 in the current decision tree
RF_score (119860_119894) = 119872sum119898=1
119870 lowast 119873minus119898prod119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| minus Gini (119863 119860_119894) (2)
Depending on the following random forest model thedetection rate is generated using testing data In this workwe use the features including std_bpksl std_avg_fpklstd_avg_fpks std_f(b)pksl std spacket Feature vectorsconstructed in this paper include SrcIp DstIp DstPortPro std_bpksl std_avg_fpkl std_avg_fpks std_f(b)pkslstd_spacket and using SrcIp DstIp DstPort Proto repre-sents the conversion of visiting the same service
Table 1 Conversation features
Feature value Description of feature value
avg_duration The average duration timeof flows in a conversation
min_durationThe minimum duration
time of flows in aconversation
max_durationThe maximum duration
time of flows in aconversation
std_durationThe standard deviation ofduration time of flows in a
conversation
avg_f(b)interThe average interval of up
(down) flows in aconversation
avg_f(b)pklThe average length of upand down flows in a
conversation
min_f(b)pklThe minimum length of up
(down) flows in aconversation
max_f(b)pklThe maximum length of up
(down) flows in aconversation
std_avg_f(b)pklThe standard variation ofthe length of up (down)flows in a conversation
avg_f(b)pksThe average number of up(down) valid flows in a
conversation
std_avg_f(b)pks
The standard variation ofthe number of up (down)
valid flows in aconversation
avg_f(b)pkslThe average of transmissionbytes of up (down) flows in
a conversation
std_f(b)pksl
The standard variation oftransmission bytes of up
(down) flows in aconversation
min_spacket The minimum of smallpacket in a conversation
max_spacket The maximum of smallpacket in a conversation
avg_spacket The average of small packetin a conversation
std_spacketThe standard variance of
small packet in aconversation
34 Botnet Detection Module In order to achieve scal-ability in botnet detection module we use API pro-vided by Weka to implement machine learning algo-rithms [14] The conversation feature need be saved inCSV format at the conversation-based feature selecting
Mathematical Problems in Engineering 5
Input data a labeled dataset with 119901 featuresOutput PF a random forest model(1) 119899 larr the number of decision trees119898 larr the number of selected features(2) initialization119898 = 119872 119899 = 119873 119894 = 1(3) while 119894 le 119873 do(4) draw a bootstrap sample 119885lowast of size 119878 from data(5) repeat(6) select119872 features at random from the 119901 features(7) calculate the Gini coefficient of selected119872 features(8) select the feature with lower Gini coefficient among the119872(9) split the node into two daughter nodes(10) until the minimum node size is reached(11) construct decision tree 119894(12) 119894 = 119894 + 1(13) end while
Algorithm 1 Random forest algorithm
Input train_data a labeled training set test_data a labeled testing setOutput PF a list of promising features(1) 120575 larr an error range 119861119863_119897 larr the botnet traffic detection rate 119861119863_119888 larrthe current botnet traffic detection rate(2) initialization 119894 = 1 120575 = Θ(3) 119861119863_119897 = 119861119863_119888 = Randonforest (all features)(4) while |119861119863_119888 minus 119861119863_119897| le Θ do(5) 119861119863_119897 = 119861119863_119888(6) calculate RF scores of importance(7) rank the RF scores(8) delete the feature with the smallest importance from train_data and test_data(9) 119861119863_119888 = randomforest (remaining_features)(10) end while
Algorithm 2 Feature selection algorithm
module First the module reads feature vectors usingwekacoreconvertersConverterUtilsDataSource Howeversome features like SrcIp DstIp DstPort and Proto haveno efficiency in identifying botnet traffic Second thismodule deletes them using wekacoreInstances Third weuse random forest algorithm to train these data throughwekaclassifierstreesRandomForest Fourth this moduleuses the trained classifier to predict unlabeled data by call-ing classifyInstance(unlabeledinstance (119894)) function Hereldquounlabeledrdquo denotes testing data without a label
4 Experimental Results and PerformanceAnalysis
41 Experimental Setup Famous public datasets used todetect botnet traffic include dataset disclosed from Informa-tion Security and Object Technology (ISOT) organization[25] Stratosphere [27] and the CTU University [28] Thedataset from Stratosphere contains many types of botnetbehaviors traffic such as the traffic of scanned port traffic
of CampC communication and attack traffic However mostbotnet traffic of this dataset is IRC and HTTP botnet andthere is only one type of P2P botnet traffic The dataset fromISOT only contains three types of botnet traffic WaledacStorm and Zeus and many background traffic The datasetfrom the CTU University consists of thirteen scenarios ofdifferent botnet samples Thus in the experiment we usedataset fromCTUUniversity And the distributions of botnettypes about training and test in our experiment are listed inTables 2 and 3 For example Rbot contains three types ofbotnets namely IRC DDoS and the US
42 The Results and Analysis of Experiments During theprocess of experiment we assess our detection method byadopting the train set and test set from CTU13 The CUT13dataset provides a better test environment for unknownbotnet because this test set contains many types of botnettraffic which do not exist in the training set
The effectiveness of the top five classifiers namely ran-dom forest REPTree randomTree BayesNet and Decision-Tump [29] has been studied with the CTU botnet traffic and
6 Mathematical Problems in Engineering
Dec
ision
Tum
p
Dec
ision
Tum
p
Dec
ision
Tum
p
Baye
sNet
Baye
sNet
Baye
sNet
REPT
ree
REPT
ree
REPT
ree
rand
omTr
ee
rand
omTr
eera
ndom
Tree
Rand
omfo
rest
Rand
omfo
rest
Rand
omfo
rest
0
05
1
0
005
01
015
0
05
1
0
005
01
015
02
Rand
omfo
rest
rand
omTr
ee
REPT
ree
Baye
sNet
Dec
ision
Tum
p
A_T
PB_
FP
B_TP
B_FN
Figure 3 Detection rate of the top five classifiers
Table 2 Distribution of botnet types in the training dataset
Botnet name Type Portion of datasetRbot IRC DDoS US 01Virut SPAM PS HTTP 0485Menti PS 389Sogou HTTP 0035Murlo PS 164Neris IRC SPAM CF PS 313
Table 3 Distribution of botnet types in the testing dataset
Botnet name Type Portion of datasetNeris IRC SPAM CF 321Rbot IRC PS US 2646Rbot IRC DDoS US 0088Virut SPAM PS HTTP 04Menti PS 333Sogou HTTP 0036Murlo PS 14Neris IRC SPAM CF PS 289NSISay P2P 171Virut SPAM PS HTTP 107
normal traffic generated by benign programs The detailedcontrast tests are done in WEKA in terms of A_TP B_TPB_FP and B_TN explained as follows the ratio of benign
traffic and botnet traffic recognized correctly the ratio ofbotnet traffic detected as botnet conversation the ratio ofbegin traffic classified as botnet traffic and the ratio of botnettraffic identified as normal trafficThey are defined as follows
A_TP = TP + TNTM + TB
B_TP = TPTM
B_FP = TNTP
B_TN = FNTM
(3)
where true positive (TP) indicates that the number of bot-net conversations is correctly classified true negative (TN)indicates that the number of benign traffic conversationsis correctly classified false positive (FP) expresses that thenumber of benign traffics is detected as botnet traffic falsenegative (FN) indicates that the number of botnet traffics isdetected as benign traffic TM indicates the total number ofbotnets and TB expresses the total number of benign traffics
The experiment result is shown in Figure 3The whole recognition rate of DecisionTump is the lowest
because there is a one-level decision tree in the Decision-Tump Random forest algorithm selects variables automati-cally during the model formation and establishes the optimaldiscriminantmodelThus the detection rate of random forestalgorithm is the highest Meanwhile random forest has alower false positive and false negative rates than the otherfour Moreover there is no obvious difference among the
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
2 Mathematical Problems in Engineering
accuracy based on flow since the false positive rate of botnettraffic decrease is 132 percent In addition to the above twowe evaluate performances of the five well supervisedmachinelearning algorithms (MLAs) [14] The detection rate of thebotnet is up to 936 and the false alarm rate is about 03by the random forest algorithm
The remainder of this paper is organized as followsSection 2 gives an overview of botnet detection related worksSection 3 shows the proposed detection method Section 4provides the preliminary experimental results conclusion aresummarized in Section 5
2 Related Work
Botnet detection methods fall into two categories hostbehavior-based detection [15] and network-based detection[16]
21 Host-Based Detection Host-based detection is the earli-est method To determine whether a host is compromisedthis method continuously monitors the change of processfiles network connections and registries under a controlledenvironment [17 18] Host-based detection is useful indetecting known bots However it performs poorly becauseit cannot detect new or variant bots For example host-baseddetection has a sense of inability to identify bots with newtechnologies like a rootkit counter debug
22 Network-Based Detection Network-based detection [19ndash21]mainly identifies traffic in CampC control phrase of a botnetbecause behavior features in this phrase are different fromother phrases Network-based detection mainly focuses onanalyzing two kinds of network behaviors the rate of failedconnection and flow features Most commonly used flow fea-tures include the number of uplink (downlink) data packetsthe number of uplink (downlink) transmission bytes theaverage variance-length of uplink (downlink) data packetsthe maximum length of uplink (downlink) data packets theaverage variance-length of uplink (downlink) data packetsthe duration time of data flow (ms) the rate of the lengthof data packets in uplink and downlink and the total lengthof loaded data packets in a flow Nowadays researchersintroduce machine learning and neural network to network-based detection to identify unknown botnet trafficThus thismethod is a hot research point in recognition and analysis ofbotnet traffic
Network-based detection method has a high detectionrate because it extracts common flow features independentof botnet category However in the high-speed and complexnetwork existing detection platforms based on flow featuresare ineffective due to high packet drop rate
3 Our Detection Method
In this section we describe the components of our proposedbotnet traffic detection framework
Standard NIC
mmap
Kernel layer pf_ringKo
Configuration files
Packets inflow_1
Application layer
Ring Buffer
Ring Buffer
Ring Buffer
flow_nPackets in
middot middot middot
NA
PI
Figure 1 The packet process module architecture
The framework consists in the following
(1) Traffic processmodule for clustering captured packetsinto different flow buffer
(2) Flow-based feature extraction module for generatingstatistical characteristics of flow
(3) Conversation-based feature selection module forextracting promising conversation-based feature set
(4) Botnet detectionmodule for identifying botnet trafficusing machine learning algorithm
Packet process module is used to extract the requiredfields out of the packets After the extraction of the desiredinformation from the packet process module the flow-basedfeature extractionmodule is used for generating flow featuresBased on the flow features conversation-based feature selec-tion module can obtain promising conversation feature setfor the botnet detection module Botnet traffic detection isaccomplished using supervised classification algorithm [22]
31 Traffic Process Module Libcap [23] is used for sniffingthe packets from the network interface due to its simpleoperation and cross-platform After the NIC captures thepackets Libcap copies packets from the driver to kernel-levelusing DMA in order to filter themThen Libcap copy filteredpackets into application in user level for further analyzingwhereasmultiple copies of Libcap imposemore overhead andconsume more time The mechanics of Libcap makes packetloss and do not reduce the user session PF_RING [24] is anew network socket that uses NewApplication ProgrammingInterface (NAPI) and zero copy to capture packet data froma live network Thus PF_RING is used to capture trafficonto successive pcaps The detailed packet capture process isshown in Figure 1
First the kernel layer of the packet process modulereads the configuration file to set the parameter values likepacket length ClusterId ClusterId is the ID of Ring Buffercreated by PF_RING Parameter values are stored in theconfiguration file so that we can modify them at any timeSecond network devices are turned on and Ring Buffer iscreated using pfring_open (device snaplen flags) functionof PF_RING where device denotes the name of the networkdevice snaplen denotes the packet length and flags denotes
Mathematical Problems in Engineering 3
Flowexistence
Create flow andupdate status
Flag is0x02
Update flowinformation
End
Drop packet
No
No
FIN or RST value is 1
Outputcomplete flow
Yes
YesYes
Strat
Figure 2 The packet process module architecture
whether it is in mixed mode Here we set snaplen value as60 because header fields of a packet are needed in this paperThird we save the header information payload length andarrival time of a packet in different flow buffer according tofive tuples (SrcIp DstIp SrcPort DstPort Proto) which isused to mark a flow That is if two different packets havethe same sourcedestination hostport and the same protocolthey belong to the same flow
32 Flow-Based Feature Extraction Module There are differ-ent flow reorganization methods for different transport layerprotocols Using TCP packets as an example we use a three-way handshake to represent the start of a flowWhen a packetwhose FIN or RST value is 1 comes the end of this flowis marked The detailed TCP flow reorganization process isshown in Figure 2
When a packet comes we decide whether the flow thispacket belongs to exists If a packet whose flag value is 0x02and the flow does not exist we create a flow according to Ipprotocol and port When the flag of the packet takes othervalues this packet needs to be dropped An instance of a flowreorganization state machine can be in only one of the fivestates handshake_1 handshake_2 handshake_3 data trans-mission and end If a packet whose flag value is 0x02 thisprocess is in the status of handshake_1 Only when a packetwhose flag value is 0x12 is coming the flow reorganizationwill be in handshake_2 status Then the arrival of a packetwhose flag value is 0x10 marks handshake_3 status Afterthe three-way handshake data begins transmitting In theprocedure of flow reorganization whenever there is a packetwhose flag value is 0x02 it turns back to the handshake_2status
After analyzing the data characters of a botnet we findthat there is a flow similarity of the same botnet Herea conversation contains many flows with different sourceports That is two flows having the same sourcedestinationIP destination port and protocol can be classified as thesame conversation Promising conversation feature gener-ating is based on the flow features Thus the flow-basedfeature module extracts statistical features including flowduration the average interval of up (down) flow the maxi-malminimumaverage length of up (down) flow the numberof valid up (down) packets in a flow the number of trans-mission bytes of up (down) flows and the number of smallpackets in a flow
33 Conversation-Based Feature Selecting Module
331 Conversation Features
(1) The Duration Time of Flows in a Conversation Thecommunication between the botmaster and other bot hostsis done by bots Thus the duration time of botnet flowis usually fixed and short However the duration time ofnormal flow is determined by user behaviors Here we canextract the average duration time of flows in a conversation(avg_duration) the minimum and maximum duration timeof flows in a conversation (min_durationmax_duration) thestandard deviation of duration time of flows in a conversation(std_duration) and the average arriving intervals of upand down flows in a conversation (avg_finter avg_binter)Assuming that there are 119899 flows in a conversation avg_finter= (avg_finter_1198911+sdot sdot sdot+avg_finter_119891119899)119899 where avg_finter_1198911denotes the average arriving intervals of up packets in the firstflow
(2) The Distribution of Flows in Conversation During thecommunication process among nodes in a botnet the sizeand the number of transmitted data packets are small AndCampC communication flows produced from bot hosts in thesame botnet have great similarity [25] Thus we extractthe average length of up and down flows in a conversa-tion (avg_fpkl avg_bpkl) the minimum length of up anddown flows in a conversation (min_fpkl min_bpkl) themaximum length of up and down flows in a conversation(max_fpkl max_bpkl) the standard variation of the lengthof up and down flows in a conversation (std_avg_fpklstd_avg_bpkl) the average number of valid up and downflows in a conversation (avg_fpks avg_bpks) the standardvariation of the number of valid up and down flows ina conversation (std_avg_fpks std_avg_bpks) the averagenumber of transmission bytes of up and down flows ina conversation (avg_fpksl avg_bpksl) and the standardvariation of transmission bytes of up and down flows in aconversation (std_fpksl std_bpksl)
(3) The Distribution of Small Packets in Conversation Thereare many packets within the range of 40320 bytes [26] in thebotnet traffic because bots need to constantly connect to newhosts However a packet size of the benign server traffic is
4 Mathematical Problems in Engineering
largeThus the distribution of small packet in a conversationis an interesting characteristic of botnet traffic like the min-imum of small packet in a conversation (min_spacket) themaximum of small packet in a conversation (max_spacket)the average of small packet in a conversation (avg_spacket)and the standard variance of small packet in a conversation(std_spacket) In conclusion there are 26 features extractedfrom conversations which are shown in Table 1
332 Feature Selection We use random forest algorithm [12]to select promising features All the classification trees inrandom forest is binary tree Construction of classificationtree meets the principle of recursive splitting from top tobottom For each classification binary tree all the train setthey used is sampled from the original dataset In otherwords several samples in the original train set may appearmany times in the train set of one classification tree and maynever appear in any classification tree samples Algorithm 1describes how to construct a random forest algorithm indetail
In the procedure of random forest model establishmentGini coefficient is used to select feature Here are 2 classesthus the value of 119870 is 2 We suppose that feature 119860_119894 (119894 =1 26) splits dataset 119863 into 119873 parts 1198631 119863119899 On thecondition of the feature 119860_119894 Gini coefficient of dataset 119863 isshown as follows
Gini (119863 119860_119894) = 119873sum119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| Gini (119863119899)
Gini (119863119899) = 1 minus119870sum119896=1
( 1003816100381610038161003816119862119896100381610038161003816100381610038161003816100381610038161198631198991003816100381610038161003816)2
(1)
Then we select promising features according to randomforest model The feature selection process is shown inAlgorithm 2 In the algorithm input data with 27 columnsincludes 26 conversation features and a class label
In every iteration we first rank features according totheir importance and then delete the feature with minimumvalue until detection rate no longer changes The formulafor calculating an RF score of features is shown in (2) Inthe following equation there is119872 decision tree with feature119860_119894 Gini(119863 119860_119894) indicatesGini coefficient of dataset119863 usingfeature 119860_119894 in the current decision tree
RF_score (119860_119894) = 119872sum119898=1
119870 lowast 119873minus119898prod119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| minus Gini (119863 119860_119894) (2)
Depending on the following random forest model thedetection rate is generated using testing data In this workwe use the features including std_bpksl std_avg_fpklstd_avg_fpks std_f(b)pksl std spacket Feature vectorsconstructed in this paper include SrcIp DstIp DstPortPro std_bpksl std_avg_fpkl std_avg_fpks std_f(b)pkslstd_spacket and using SrcIp DstIp DstPort Proto repre-sents the conversion of visiting the same service
Table 1 Conversation features
Feature value Description of feature value
avg_duration The average duration timeof flows in a conversation
min_durationThe minimum duration
time of flows in aconversation
max_durationThe maximum duration
time of flows in aconversation
std_durationThe standard deviation ofduration time of flows in a
conversation
avg_f(b)interThe average interval of up
(down) flows in aconversation
avg_f(b)pklThe average length of upand down flows in a
conversation
min_f(b)pklThe minimum length of up
(down) flows in aconversation
max_f(b)pklThe maximum length of up
(down) flows in aconversation
std_avg_f(b)pklThe standard variation ofthe length of up (down)flows in a conversation
avg_f(b)pksThe average number of up(down) valid flows in a
conversation
std_avg_f(b)pks
The standard variation ofthe number of up (down)
valid flows in aconversation
avg_f(b)pkslThe average of transmissionbytes of up (down) flows in
a conversation
std_f(b)pksl
The standard variation oftransmission bytes of up
(down) flows in aconversation
min_spacket The minimum of smallpacket in a conversation
max_spacket The maximum of smallpacket in a conversation
avg_spacket The average of small packetin a conversation
std_spacketThe standard variance of
small packet in aconversation
34 Botnet Detection Module In order to achieve scal-ability in botnet detection module we use API pro-vided by Weka to implement machine learning algo-rithms [14] The conversation feature need be saved inCSV format at the conversation-based feature selecting
Mathematical Problems in Engineering 5
Input data a labeled dataset with 119901 featuresOutput PF a random forest model(1) 119899 larr the number of decision trees119898 larr the number of selected features(2) initialization119898 = 119872 119899 = 119873 119894 = 1(3) while 119894 le 119873 do(4) draw a bootstrap sample 119885lowast of size 119878 from data(5) repeat(6) select119872 features at random from the 119901 features(7) calculate the Gini coefficient of selected119872 features(8) select the feature with lower Gini coefficient among the119872(9) split the node into two daughter nodes(10) until the minimum node size is reached(11) construct decision tree 119894(12) 119894 = 119894 + 1(13) end while
Algorithm 1 Random forest algorithm
Input train_data a labeled training set test_data a labeled testing setOutput PF a list of promising features(1) 120575 larr an error range 119861119863_119897 larr the botnet traffic detection rate 119861119863_119888 larrthe current botnet traffic detection rate(2) initialization 119894 = 1 120575 = Θ(3) 119861119863_119897 = 119861119863_119888 = Randonforest (all features)(4) while |119861119863_119888 minus 119861119863_119897| le Θ do(5) 119861119863_119897 = 119861119863_119888(6) calculate RF scores of importance(7) rank the RF scores(8) delete the feature with the smallest importance from train_data and test_data(9) 119861119863_119888 = randomforest (remaining_features)(10) end while
Algorithm 2 Feature selection algorithm
module First the module reads feature vectors usingwekacoreconvertersConverterUtilsDataSource Howeversome features like SrcIp DstIp DstPort and Proto haveno efficiency in identifying botnet traffic Second thismodule deletes them using wekacoreInstances Third weuse random forest algorithm to train these data throughwekaclassifierstreesRandomForest Fourth this moduleuses the trained classifier to predict unlabeled data by call-ing classifyInstance(unlabeledinstance (119894)) function Hereldquounlabeledrdquo denotes testing data without a label
4 Experimental Results and PerformanceAnalysis
41 Experimental Setup Famous public datasets used todetect botnet traffic include dataset disclosed from Informa-tion Security and Object Technology (ISOT) organization[25] Stratosphere [27] and the CTU University [28] Thedataset from Stratosphere contains many types of botnetbehaviors traffic such as the traffic of scanned port traffic
of CampC communication and attack traffic However mostbotnet traffic of this dataset is IRC and HTTP botnet andthere is only one type of P2P botnet traffic The dataset fromISOT only contains three types of botnet traffic WaledacStorm and Zeus and many background traffic The datasetfrom the CTU University consists of thirteen scenarios ofdifferent botnet samples Thus in the experiment we usedataset fromCTUUniversity And the distributions of botnettypes about training and test in our experiment are listed inTables 2 and 3 For example Rbot contains three types ofbotnets namely IRC DDoS and the US
42 The Results and Analysis of Experiments During theprocess of experiment we assess our detection method byadopting the train set and test set from CTU13 The CUT13dataset provides a better test environment for unknownbotnet because this test set contains many types of botnettraffic which do not exist in the training set
The effectiveness of the top five classifiers namely ran-dom forest REPTree randomTree BayesNet and Decision-Tump [29] has been studied with the CTU botnet traffic and
6 Mathematical Problems in Engineering
Dec
ision
Tum
p
Dec
ision
Tum
p
Dec
ision
Tum
p
Baye
sNet
Baye
sNet
Baye
sNet
REPT
ree
REPT
ree
REPT
ree
rand
omTr
ee
rand
omTr
eera
ndom
Tree
Rand
omfo
rest
Rand
omfo
rest
Rand
omfo
rest
0
05
1
0
005
01
015
0
05
1
0
005
01
015
02
Rand
omfo
rest
rand
omTr
ee
REPT
ree
Baye
sNet
Dec
ision
Tum
p
A_T
PB_
FP
B_TP
B_FN
Figure 3 Detection rate of the top five classifiers
Table 2 Distribution of botnet types in the training dataset
Botnet name Type Portion of datasetRbot IRC DDoS US 01Virut SPAM PS HTTP 0485Menti PS 389Sogou HTTP 0035Murlo PS 164Neris IRC SPAM CF PS 313
Table 3 Distribution of botnet types in the testing dataset
Botnet name Type Portion of datasetNeris IRC SPAM CF 321Rbot IRC PS US 2646Rbot IRC DDoS US 0088Virut SPAM PS HTTP 04Menti PS 333Sogou HTTP 0036Murlo PS 14Neris IRC SPAM CF PS 289NSISay P2P 171Virut SPAM PS HTTP 107
normal traffic generated by benign programs The detailedcontrast tests are done in WEKA in terms of A_TP B_TPB_FP and B_TN explained as follows the ratio of benign
traffic and botnet traffic recognized correctly the ratio ofbotnet traffic detected as botnet conversation the ratio ofbegin traffic classified as botnet traffic and the ratio of botnettraffic identified as normal trafficThey are defined as follows
A_TP = TP + TNTM + TB
B_TP = TPTM
B_FP = TNTP
B_TN = FNTM
(3)
where true positive (TP) indicates that the number of bot-net conversations is correctly classified true negative (TN)indicates that the number of benign traffic conversationsis correctly classified false positive (FP) expresses that thenumber of benign traffics is detected as botnet traffic falsenegative (FN) indicates that the number of botnet traffics isdetected as benign traffic TM indicates the total number ofbotnets and TB expresses the total number of benign traffics
The experiment result is shown in Figure 3The whole recognition rate of DecisionTump is the lowest
because there is a one-level decision tree in the Decision-Tump Random forest algorithm selects variables automati-cally during the model formation and establishes the optimaldiscriminantmodelThus the detection rate of random forestalgorithm is the highest Meanwhile random forest has alower false positive and false negative rates than the otherfour Moreover there is no obvious difference among the
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 3
Flowexistence
Create flow andupdate status
Flag is0x02
Update flowinformation
End
Drop packet
No
No
FIN or RST value is 1
Outputcomplete flow
Yes
YesYes
Strat
Figure 2 The packet process module architecture
whether it is in mixed mode Here we set snaplen value as60 because header fields of a packet are needed in this paperThird we save the header information payload length andarrival time of a packet in different flow buffer according tofive tuples (SrcIp DstIp SrcPort DstPort Proto) which isused to mark a flow That is if two different packets havethe same sourcedestination hostport and the same protocolthey belong to the same flow
32 Flow-Based Feature Extraction Module There are differ-ent flow reorganization methods for different transport layerprotocols Using TCP packets as an example we use a three-way handshake to represent the start of a flowWhen a packetwhose FIN or RST value is 1 comes the end of this flowis marked The detailed TCP flow reorganization process isshown in Figure 2
When a packet comes we decide whether the flow thispacket belongs to exists If a packet whose flag value is 0x02and the flow does not exist we create a flow according to Ipprotocol and port When the flag of the packet takes othervalues this packet needs to be dropped An instance of a flowreorganization state machine can be in only one of the fivestates handshake_1 handshake_2 handshake_3 data trans-mission and end If a packet whose flag value is 0x02 thisprocess is in the status of handshake_1 Only when a packetwhose flag value is 0x12 is coming the flow reorganizationwill be in handshake_2 status Then the arrival of a packetwhose flag value is 0x10 marks handshake_3 status Afterthe three-way handshake data begins transmitting In theprocedure of flow reorganization whenever there is a packetwhose flag value is 0x02 it turns back to the handshake_2status
After analyzing the data characters of a botnet we findthat there is a flow similarity of the same botnet Herea conversation contains many flows with different sourceports That is two flows having the same sourcedestinationIP destination port and protocol can be classified as thesame conversation Promising conversation feature gener-ating is based on the flow features Thus the flow-basedfeature module extracts statistical features including flowduration the average interval of up (down) flow the maxi-malminimumaverage length of up (down) flow the numberof valid up (down) packets in a flow the number of trans-mission bytes of up (down) flows and the number of smallpackets in a flow
33 Conversation-Based Feature Selecting Module
331 Conversation Features
(1) The Duration Time of Flows in a Conversation Thecommunication between the botmaster and other bot hostsis done by bots Thus the duration time of botnet flowis usually fixed and short However the duration time ofnormal flow is determined by user behaviors Here we canextract the average duration time of flows in a conversation(avg_duration) the minimum and maximum duration timeof flows in a conversation (min_durationmax_duration) thestandard deviation of duration time of flows in a conversation(std_duration) and the average arriving intervals of upand down flows in a conversation (avg_finter avg_binter)Assuming that there are 119899 flows in a conversation avg_finter= (avg_finter_1198911+sdot sdot sdot+avg_finter_119891119899)119899 where avg_finter_1198911denotes the average arriving intervals of up packets in the firstflow
(2) The Distribution of Flows in Conversation During thecommunication process among nodes in a botnet the sizeand the number of transmitted data packets are small AndCampC communication flows produced from bot hosts in thesame botnet have great similarity [25] Thus we extractthe average length of up and down flows in a conversa-tion (avg_fpkl avg_bpkl) the minimum length of up anddown flows in a conversation (min_fpkl min_bpkl) themaximum length of up and down flows in a conversation(max_fpkl max_bpkl) the standard variation of the lengthof up and down flows in a conversation (std_avg_fpklstd_avg_bpkl) the average number of valid up and downflows in a conversation (avg_fpks avg_bpks) the standardvariation of the number of valid up and down flows ina conversation (std_avg_fpks std_avg_bpks) the averagenumber of transmission bytes of up and down flows ina conversation (avg_fpksl avg_bpksl) and the standardvariation of transmission bytes of up and down flows in aconversation (std_fpksl std_bpksl)
(3) The Distribution of Small Packets in Conversation Thereare many packets within the range of 40320 bytes [26] in thebotnet traffic because bots need to constantly connect to newhosts However a packet size of the benign server traffic is
4 Mathematical Problems in Engineering
largeThus the distribution of small packet in a conversationis an interesting characteristic of botnet traffic like the min-imum of small packet in a conversation (min_spacket) themaximum of small packet in a conversation (max_spacket)the average of small packet in a conversation (avg_spacket)and the standard variance of small packet in a conversation(std_spacket) In conclusion there are 26 features extractedfrom conversations which are shown in Table 1
332 Feature Selection We use random forest algorithm [12]to select promising features All the classification trees inrandom forest is binary tree Construction of classificationtree meets the principle of recursive splitting from top tobottom For each classification binary tree all the train setthey used is sampled from the original dataset In otherwords several samples in the original train set may appearmany times in the train set of one classification tree and maynever appear in any classification tree samples Algorithm 1describes how to construct a random forest algorithm indetail
In the procedure of random forest model establishmentGini coefficient is used to select feature Here are 2 classesthus the value of 119870 is 2 We suppose that feature 119860_119894 (119894 =1 26) splits dataset 119863 into 119873 parts 1198631 119863119899 On thecondition of the feature 119860_119894 Gini coefficient of dataset 119863 isshown as follows
Gini (119863 119860_119894) = 119873sum119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| Gini (119863119899)
Gini (119863119899) = 1 minus119870sum119896=1
( 1003816100381610038161003816119862119896100381610038161003816100381610038161003816100381610038161198631198991003816100381610038161003816)2
(1)
Then we select promising features according to randomforest model The feature selection process is shown inAlgorithm 2 In the algorithm input data with 27 columnsincludes 26 conversation features and a class label
In every iteration we first rank features according totheir importance and then delete the feature with minimumvalue until detection rate no longer changes The formulafor calculating an RF score of features is shown in (2) Inthe following equation there is119872 decision tree with feature119860_119894 Gini(119863 119860_119894) indicatesGini coefficient of dataset119863 usingfeature 119860_119894 in the current decision tree
RF_score (119860_119894) = 119872sum119898=1
119870 lowast 119873minus119898prod119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| minus Gini (119863 119860_119894) (2)
Depending on the following random forest model thedetection rate is generated using testing data In this workwe use the features including std_bpksl std_avg_fpklstd_avg_fpks std_f(b)pksl std spacket Feature vectorsconstructed in this paper include SrcIp DstIp DstPortPro std_bpksl std_avg_fpkl std_avg_fpks std_f(b)pkslstd_spacket and using SrcIp DstIp DstPort Proto repre-sents the conversion of visiting the same service
Table 1 Conversation features
Feature value Description of feature value
avg_duration The average duration timeof flows in a conversation
min_durationThe minimum duration
time of flows in aconversation
max_durationThe maximum duration
time of flows in aconversation
std_durationThe standard deviation ofduration time of flows in a
conversation
avg_f(b)interThe average interval of up
(down) flows in aconversation
avg_f(b)pklThe average length of upand down flows in a
conversation
min_f(b)pklThe minimum length of up
(down) flows in aconversation
max_f(b)pklThe maximum length of up
(down) flows in aconversation
std_avg_f(b)pklThe standard variation ofthe length of up (down)flows in a conversation
avg_f(b)pksThe average number of up(down) valid flows in a
conversation
std_avg_f(b)pks
The standard variation ofthe number of up (down)
valid flows in aconversation
avg_f(b)pkslThe average of transmissionbytes of up (down) flows in
a conversation
std_f(b)pksl
The standard variation oftransmission bytes of up
(down) flows in aconversation
min_spacket The minimum of smallpacket in a conversation
max_spacket The maximum of smallpacket in a conversation
avg_spacket The average of small packetin a conversation
std_spacketThe standard variance of
small packet in aconversation
34 Botnet Detection Module In order to achieve scal-ability in botnet detection module we use API pro-vided by Weka to implement machine learning algo-rithms [14] The conversation feature need be saved inCSV format at the conversation-based feature selecting
Mathematical Problems in Engineering 5
Input data a labeled dataset with 119901 featuresOutput PF a random forest model(1) 119899 larr the number of decision trees119898 larr the number of selected features(2) initialization119898 = 119872 119899 = 119873 119894 = 1(3) while 119894 le 119873 do(4) draw a bootstrap sample 119885lowast of size 119878 from data(5) repeat(6) select119872 features at random from the 119901 features(7) calculate the Gini coefficient of selected119872 features(8) select the feature with lower Gini coefficient among the119872(9) split the node into two daughter nodes(10) until the minimum node size is reached(11) construct decision tree 119894(12) 119894 = 119894 + 1(13) end while
Algorithm 1 Random forest algorithm
Input train_data a labeled training set test_data a labeled testing setOutput PF a list of promising features(1) 120575 larr an error range 119861119863_119897 larr the botnet traffic detection rate 119861119863_119888 larrthe current botnet traffic detection rate(2) initialization 119894 = 1 120575 = Θ(3) 119861119863_119897 = 119861119863_119888 = Randonforest (all features)(4) while |119861119863_119888 minus 119861119863_119897| le Θ do(5) 119861119863_119897 = 119861119863_119888(6) calculate RF scores of importance(7) rank the RF scores(8) delete the feature with the smallest importance from train_data and test_data(9) 119861119863_119888 = randomforest (remaining_features)(10) end while
Algorithm 2 Feature selection algorithm
module First the module reads feature vectors usingwekacoreconvertersConverterUtilsDataSource Howeversome features like SrcIp DstIp DstPort and Proto haveno efficiency in identifying botnet traffic Second thismodule deletes them using wekacoreInstances Third weuse random forest algorithm to train these data throughwekaclassifierstreesRandomForest Fourth this moduleuses the trained classifier to predict unlabeled data by call-ing classifyInstance(unlabeledinstance (119894)) function Hereldquounlabeledrdquo denotes testing data without a label
4 Experimental Results and PerformanceAnalysis
41 Experimental Setup Famous public datasets used todetect botnet traffic include dataset disclosed from Informa-tion Security and Object Technology (ISOT) organization[25] Stratosphere [27] and the CTU University [28] Thedataset from Stratosphere contains many types of botnetbehaviors traffic such as the traffic of scanned port traffic
of CampC communication and attack traffic However mostbotnet traffic of this dataset is IRC and HTTP botnet andthere is only one type of P2P botnet traffic The dataset fromISOT only contains three types of botnet traffic WaledacStorm and Zeus and many background traffic The datasetfrom the CTU University consists of thirteen scenarios ofdifferent botnet samples Thus in the experiment we usedataset fromCTUUniversity And the distributions of botnettypes about training and test in our experiment are listed inTables 2 and 3 For example Rbot contains three types ofbotnets namely IRC DDoS and the US
42 The Results and Analysis of Experiments During theprocess of experiment we assess our detection method byadopting the train set and test set from CTU13 The CUT13dataset provides a better test environment for unknownbotnet because this test set contains many types of botnettraffic which do not exist in the training set
The effectiveness of the top five classifiers namely ran-dom forest REPTree randomTree BayesNet and Decision-Tump [29] has been studied with the CTU botnet traffic and
6 Mathematical Problems in Engineering
Dec
ision
Tum
p
Dec
ision
Tum
p
Dec
ision
Tum
p
Baye
sNet
Baye
sNet
Baye
sNet
REPT
ree
REPT
ree
REPT
ree
rand
omTr
ee
rand
omTr
eera
ndom
Tree
Rand
omfo
rest
Rand
omfo
rest
Rand
omfo
rest
0
05
1
0
005
01
015
0
05
1
0
005
01
015
02
Rand
omfo
rest
rand
omTr
ee
REPT
ree
Baye
sNet
Dec
ision
Tum
p
A_T
PB_
FP
B_TP
B_FN
Figure 3 Detection rate of the top five classifiers
Table 2 Distribution of botnet types in the training dataset
Botnet name Type Portion of datasetRbot IRC DDoS US 01Virut SPAM PS HTTP 0485Menti PS 389Sogou HTTP 0035Murlo PS 164Neris IRC SPAM CF PS 313
Table 3 Distribution of botnet types in the testing dataset
Botnet name Type Portion of datasetNeris IRC SPAM CF 321Rbot IRC PS US 2646Rbot IRC DDoS US 0088Virut SPAM PS HTTP 04Menti PS 333Sogou HTTP 0036Murlo PS 14Neris IRC SPAM CF PS 289NSISay P2P 171Virut SPAM PS HTTP 107
normal traffic generated by benign programs The detailedcontrast tests are done in WEKA in terms of A_TP B_TPB_FP and B_TN explained as follows the ratio of benign
traffic and botnet traffic recognized correctly the ratio ofbotnet traffic detected as botnet conversation the ratio ofbegin traffic classified as botnet traffic and the ratio of botnettraffic identified as normal trafficThey are defined as follows
A_TP = TP + TNTM + TB
B_TP = TPTM
B_FP = TNTP
B_TN = FNTM
(3)
where true positive (TP) indicates that the number of bot-net conversations is correctly classified true negative (TN)indicates that the number of benign traffic conversationsis correctly classified false positive (FP) expresses that thenumber of benign traffics is detected as botnet traffic falsenegative (FN) indicates that the number of botnet traffics isdetected as benign traffic TM indicates the total number ofbotnets and TB expresses the total number of benign traffics
The experiment result is shown in Figure 3The whole recognition rate of DecisionTump is the lowest
because there is a one-level decision tree in the Decision-Tump Random forest algorithm selects variables automati-cally during the model formation and establishes the optimaldiscriminantmodelThus the detection rate of random forestalgorithm is the highest Meanwhile random forest has alower false positive and false negative rates than the otherfour Moreover there is no obvious difference among the
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
4 Mathematical Problems in Engineering
largeThus the distribution of small packet in a conversationis an interesting characteristic of botnet traffic like the min-imum of small packet in a conversation (min_spacket) themaximum of small packet in a conversation (max_spacket)the average of small packet in a conversation (avg_spacket)and the standard variance of small packet in a conversation(std_spacket) In conclusion there are 26 features extractedfrom conversations which are shown in Table 1
332 Feature Selection We use random forest algorithm [12]to select promising features All the classification trees inrandom forest is binary tree Construction of classificationtree meets the principle of recursive splitting from top tobottom For each classification binary tree all the train setthey used is sampled from the original dataset In otherwords several samples in the original train set may appearmany times in the train set of one classification tree and maynever appear in any classification tree samples Algorithm 1describes how to construct a random forest algorithm indetail
In the procedure of random forest model establishmentGini coefficient is used to select feature Here are 2 classesthus the value of 119870 is 2 We suppose that feature 119860_119894 (119894 =1 26) splits dataset 119863 into 119873 parts 1198631 119863119899 On thecondition of the feature 119860_119894 Gini coefficient of dataset 119863 isshown as follows
Gini (119863 119860_119894) = 119873sum119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| Gini (119863119899)
Gini (119863119899) = 1 minus119870sum119896=1
( 1003816100381610038161003816119862119896100381610038161003816100381610038161003816100381610038161198631198991003816100381610038161003816)2
(1)
Then we select promising features according to randomforest model The feature selection process is shown inAlgorithm 2 In the algorithm input data with 27 columnsincludes 26 conversation features and a class label
In every iteration we first rank features according totheir importance and then delete the feature with minimumvalue until detection rate no longer changes The formulafor calculating an RF score of features is shown in (2) Inthe following equation there is119872 decision tree with feature119860_119894 Gini(119863 119860_119894) indicatesGini coefficient of dataset119863 usingfeature 119860_119894 in the current decision tree
RF_score (119860_119894) = 119872sum119898=1
119870 lowast 119873minus119898prod119899=1
10038161003816100381610038161198631198991003816100381610038161003816|119863| minus Gini (119863 119860_119894) (2)
Depending on the following random forest model thedetection rate is generated using testing data In this workwe use the features including std_bpksl std_avg_fpklstd_avg_fpks std_f(b)pksl std spacket Feature vectorsconstructed in this paper include SrcIp DstIp DstPortPro std_bpksl std_avg_fpkl std_avg_fpks std_f(b)pkslstd_spacket and using SrcIp DstIp DstPort Proto repre-sents the conversion of visiting the same service
Table 1 Conversation features
Feature value Description of feature value
avg_duration The average duration timeof flows in a conversation
min_durationThe minimum duration
time of flows in aconversation
max_durationThe maximum duration
time of flows in aconversation
std_durationThe standard deviation ofduration time of flows in a
conversation
avg_f(b)interThe average interval of up
(down) flows in aconversation
avg_f(b)pklThe average length of upand down flows in a
conversation
min_f(b)pklThe minimum length of up
(down) flows in aconversation
max_f(b)pklThe maximum length of up
(down) flows in aconversation
std_avg_f(b)pklThe standard variation ofthe length of up (down)flows in a conversation
avg_f(b)pksThe average number of up(down) valid flows in a
conversation
std_avg_f(b)pks
The standard variation ofthe number of up (down)
valid flows in aconversation
avg_f(b)pkslThe average of transmissionbytes of up (down) flows in
a conversation
std_f(b)pksl
The standard variation oftransmission bytes of up
(down) flows in aconversation
min_spacket The minimum of smallpacket in a conversation
max_spacket The maximum of smallpacket in a conversation
avg_spacket The average of small packetin a conversation
std_spacketThe standard variance of
small packet in aconversation
34 Botnet Detection Module In order to achieve scal-ability in botnet detection module we use API pro-vided by Weka to implement machine learning algo-rithms [14] The conversation feature need be saved inCSV format at the conversation-based feature selecting
Mathematical Problems in Engineering 5
Input data a labeled dataset with 119901 featuresOutput PF a random forest model(1) 119899 larr the number of decision trees119898 larr the number of selected features(2) initialization119898 = 119872 119899 = 119873 119894 = 1(3) while 119894 le 119873 do(4) draw a bootstrap sample 119885lowast of size 119878 from data(5) repeat(6) select119872 features at random from the 119901 features(7) calculate the Gini coefficient of selected119872 features(8) select the feature with lower Gini coefficient among the119872(9) split the node into two daughter nodes(10) until the minimum node size is reached(11) construct decision tree 119894(12) 119894 = 119894 + 1(13) end while
Algorithm 1 Random forest algorithm
Input train_data a labeled training set test_data a labeled testing setOutput PF a list of promising features(1) 120575 larr an error range 119861119863_119897 larr the botnet traffic detection rate 119861119863_119888 larrthe current botnet traffic detection rate(2) initialization 119894 = 1 120575 = Θ(3) 119861119863_119897 = 119861119863_119888 = Randonforest (all features)(4) while |119861119863_119888 minus 119861119863_119897| le Θ do(5) 119861119863_119897 = 119861119863_119888(6) calculate RF scores of importance(7) rank the RF scores(8) delete the feature with the smallest importance from train_data and test_data(9) 119861119863_119888 = randomforest (remaining_features)(10) end while
Algorithm 2 Feature selection algorithm
module First the module reads feature vectors usingwekacoreconvertersConverterUtilsDataSource Howeversome features like SrcIp DstIp DstPort and Proto haveno efficiency in identifying botnet traffic Second thismodule deletes them using wekacoreInstances Third weuse random forest algorithm to train these data throughwekaclassifierstreesRandomForest Fourth this moduleuses the trained classifier to predict unlabeled data by call-ing classifyInstance(unlabeledinstance (119894)) function Hereldquounlabeledrdquo denotes testing data without a label
4 Experimental Results and PerformanceAnalysis
41 Experimental Setup Famous public datasets used todetect botnet traffic include dataset disclosed from Informa-tion Security and Object Technology (ISOT) organization[25] Stratosphere [27] and the CTU University [28] Thedataset from Stratosphere contains many types of botnetbehaviors traffic such as the traffic of scanned port traffic
of CampC communication and attack traffic However mostbotnet traffic of this dataset is IRC and HTTP botnet andthere is only one type of P2P botnet traffic The dataset fromISOT only contains three types of botnet traffic WaledacStorm and Zeus and many background traffic The datasetfrom the CTU University consists of thirteen scenarios ofdifferent botnet samples Thus in the experiment we usedataset fromCTUUniversity And the distributions of botnettypes about training and test in our experiment are listed inTables 2 and 3 For example Rbot contains three types ofbotnets namely IRC DDoS and the US
42 The Results and Analysis of Experiments During theprocess of experiment we assess our detection method byadopting the train set and test set from CTU13 The CUT13dataset provides a better test environment for unknownbotnet because this test set contains many types of botnettraffic which do not exist in the training set
The effectiveness of the top five classifiers namely ran-dom forest REPTree randomTree BayesNet and Decision-Tump [29] has been studied with the CTU botnet traffic and
6 Mathematical Problems in Engineering
Dec
ision
Tum
p
Dec
ision
Tum
p
Dec
ision
Tum
p
Baye
sNet
Baye
sNet
Baye
sNet
REPT
ree
REPT
ree
REPT
ree
rand
omTr
ee
rand
omTr
eera
ndom
Tree
Rand
omfo
rest
Rand
omfo
rest
Rand
omfo
rest
0
05
1
0
005
01
015
0
05
1
0
005
01
015
02
Rand
omfo
rest
rand
omTr
ee
REPT
ree
Baye
sNet
Dec
ision
Tum
p
A_T
PB_
FP
B_TP
B_FN
Figure 3 Detection rate of the top five classifiers
Table 2 Distribution of botnet types in the training dataset
Botnet name Type Portion of datasetRbot IRC DDoS US 01Virut SPAM PS HTTP 0485Menti PS 389Sogou HTTP 0035Murlo PS 164Neris IRC SPAM CF PS 313
Table 3 Distribution of botnet types in the testing dataset
Botnet name Type Portion of datasetNeris IRC SPAM CF 321Rbot IRC PS US 2646Rbot IRC DDoS US 0088Virut SPAM PS HTTP 04Menti PS 333Sogou HTTP 0036Murlo PS 14Neris IRC SPAM CF PS 289NSISay P2P 171Virut SPAM PS HTTP 107
normal traffic generated by benign programs The detailedcontrast tests are done in WEKA in terms of A_TP B_TPB_FP and B_TN explained as follows the ratio of benign
traffic and botnet traffic recognized correctly the ratio ofbotnet traffic detected as botnet conversation the ratio ofbegin traffic classified as botnet traffic and the ratio of botnettraffic identified as normal trafficThey are defined as follows
A_TP = TP + TNTM + TB
B_TP = TPTM
B_FP = TNTP
B_TN = FNTM
(3)
where true positive (TP) indicates that the number of bot-net conversations is correctly classified true negative (TN)indicates that the number of benign traffic conversationsis correctly classified false positive (FP) expresses that thenumber of benign traffics is detected as botnet traffic falsenegative (FN) indicates that the number of botnet traffics isdetected as benign traffic TM indicates the total number ofbotnets and TB expresses the total number of benign traffics
The experiment result is shown in Figure 3The whole recognition rate of DecisionTump is the lowest
because there is a one-level decision tree in the Decision-Tump Random forest algorithm selects variables automati-cally during the model formation and establishes the optimaldiscriminantmodelThus the detection rate of random forestalgorithm is the highest Meanwhile random forest has alower false positive and false negative rates than the otherfour Moreover there is no obvious difference among the
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 5
Input data a labeled dataset with 119901 featuresOutput PF a random forest model(1) 119899 larr the number of decision trees119898 larr the number of selected features(2) initialization119898 = 119872 119899 = 119873 119894 = 1(3) while 119894 le 119873 do(4) draw a bootstrap sample 119885lowast of size 119878 from data(5) repeat(6) select119872 features at random from the 119901 features(7) calculate the Gini coefficient of selected119872 features(8) select the feature with lower Gini coefficient among the119872(9) split the node into two daughter nodes(10) until the minimum node size is reached(11) construct decision tree 119894(12) 119894 = 119894 + 1(13) end while
Algorithm 1 Random forest algorithm
Input train_data a labeled training set test_data a labeled testing setOutput PF a list of promising features(1) 120575 larr an error range 119861119863_119897 larr the botnet traffic detection rate 119861119863_119888 larrthe current botnet traffic detection rate(2) initialization 119894 = 1 120575 = Θ(3) 119861119863_119897 = 119861119863_119888 = Randonforest (all features)(4) while |119861119863_119888 minus 119861119863_119897| le Θ do(5) 119861119863_119897 = 119861119863_119888(6) calculate RF scores of importance(7) rank the RF scores(8) delete the feature with the smallest importance from train_data and test_data(9) 119861119863_119888 = randomforest (remaining_features)(10) end while
Algorithm 2 Feature selection algorithm
module First the module reads feature vectors usingwekacoreconvertersConverterUtilsDataSource Howeversome features like SrcIp DstIp DstPort and Proto haveno efficiency in identifying botnet traffic Second thismodule deletes them using wekacoreInstances Third weuse random forest algorithm to train these data throughwekaclassifierstreesRandomForest Fourth this moduleuses the trained classifier to predict unlabeled data by call-ing classifyInstance(unlabeledinstance (119894)) function Hereldquounlabeledrdquo denotes testing data without a label
4 Experimental Results and PerformanceAnalysis
41 Experimental Setup Famous public datasets used todetect botnet traffic include dataset disclosed from Informa-tion Security and Object Technology (ISOT) organization[25] Stratosphere [27] and the CTU University [28] Thedataset from Stratosphere contains many types of botnetbehaviors traffic such as the traffic of scanned port traffic
of CampC communication and attack traffic However mostbotnet traffic of this dataset is IRC and HTTP botnet andthere is only one type of P2P botnet traffic The dataset fromISOT only contains three types of botnet traffic WaledacStorm and Zeus and many background traffic The datasetfrom the CTU University consists of thirteen scenarios ofdifferent botnet samples Thus in the experiment we usedataset fromCTUUniversity And the distributions of botnettypes about training and test in our experiment are listed inTables 2 and 3 For example Rbot contains three types ofbotnets namely IRC DDoS and the US
42 The Results and Analysis of Experiments During theprocess of experiment we assess our detection method byadopting the train set and test set from CTU13 The CUT13dataset provides a better test environment for unknownbotnet because this test set contains many types of botnettraffic which do not exist in the training set
The effectiveness of the top five classifiers namely ran-dom forest REPTree randomTree BayesNet and Decision-Tump [29] has been studied with the CTU botnet traffic and
6 Mathematical Problems in Engineering
Dec
ision
Tum
p
Dec
ision
Tum
p
Dec
ision
Tum
p
Baye
sNet
Baye
sNet
Baye
sNet
REPT
ree
REPT
ree
REPT
ree
rand
omTr
ee
rand
omTr
eera
ndom
Tree
Rand
omfo
rest
Rand
omfo
rest
Rand
omfo
rest
0
05
1
0
005
01
015
0
05
1
0
005
01
015
02
Rand
omfo
rest
rand
omTr
ee
REPT
ree
Baye
sNet
Dec
ision
Tum
p
A_T
PB_
FP
B_TP
B_FN
Figure 3 Detection rate of the top five classifiers
Table 2 Distribution of botnet types in the training dataset
Botnet name Type Portion of datasetRbot IRC DDoS US 01Virut SPAM PS HTTP 0485Menti PS 389Sogou HTTP 0035Murlo PS 164Neris IRC SPAM CF PS 313
Table 3 Distribution of botnet types in the testing dataset
Botnet name Type Portion of datasetNeris IRC SPAM CF 321Rbot IRC PS US 2646Rbot IRC DDoS US 0088Virut SPAM PS HTTP 04Menti PS 333Sogou HTTP 0036Murlo PS 14Neris IRC SPAM CF PS 289NSISay P2P 171Virut SPAM PS HTTP 107
normal traffic generated by benign programs The detailedcontrast tests are done in WEKA in terms of A_TP B_TPB_FP and B_TN explained as follows the ratio of benign
traffic and botnet traffic recognized correctly the ratio ofbotnet traffic detected as botnet conversation the ratio ofbegin traffic classified as botnet traffic and the ratio of botnettraffic identified as normal trafficThey are defined as follows
A_TP = TP + TNTM + TB
B_TP = TPTM
B_FP = TNTP
B_TN = FNTM
(3)
where true positive (TP) indicates that the number of bot-net conversations is correctly classified true negative (TN)indicates that the number of benign traffic conversationsis correctly classified false positive (FP) expresses that thenumber of benign traffics is detected as botnet traffic falsenegative (FN) indicates that the number of botnet traffics isdetected as benign traffic TM indicates the total number ofbotnets and TB expresses the total number of benign traffics
The experiment result is shown in Figure 3The whole recognition rate of DecisionTump is the lowest
because there is a one-level decision tree in the Decision-Tump Random forest algorithm selects variables automati-cally during the model formation and establishes the optimaldiscriminantmodelThus the detection rate of random forestalgorithm is the highest Meanwhile random forest has alower false positive and false negative rates than the otherfour Moreover there is no obvious difference among the
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
6 Mathematical Problems in Engineering
Dec
ision
Tum
p
Dec
ision
Tum
p
Dec
ision
Tum
p
Baye
sNet
Baye
sNet
Baye
sNet
REPT
ree
REPT
ree
REPT
ree
rand
omTr
ee
rand
omTr
eera
ndom
Tree
Rand
omfo
rest
Rand
omfo
rest
Rand
omfo
rest
0
05
1
0
005
01
015
0
05
1
0
005
01
015
02
Rand
omfo
rest
rand
omTr
ee
REPT
ree
Baye
sNet
Dec
ision
Tum
p
A_T
PB_
FP
B_TP
B_FN
Figure 3 Detection rate of the top five classifiers
Table 2 Distribution of botnet types in the training dataset
Botnet name Type Portion of datasetRbot IRC DDoS US 01Virut SPAM PS HTTP 0485Menti PS 389Sogou HTTP 0035Murlo PS 164Neris IRC SPAM CF PS 313
Table 3 Distribution of botnet types in the testing dataset
Botnet name Type Portion of datasetNeris IRC SPAM CF 321Rbot IRC PS US 2646Rbot IRC DDoS US 0088Virut SPAM PS HTTP 04Menti PS 333Sogou HTTP 0036Murlo PS 14Neris IRC SPAM CF PS 289NSISay P2P 171Virut SPAM PS HTTP 107
normal traffic generated by benign programs The detailedcontrast tests are done in WEKA in terms of A_TP B_TPB_FP and B_TN explained as follows the ratio of benign
traffic and botnet traffic recognized correctly the ratio ofbotnet traffic detected as botnet conversation the ratio ofbegin traffic classified as botnet traffic and the ratio of botnettraffic identified as normal trafficThey are defined as follows
A_TP = TP + TNTM + TB
B_TP = TPTM
B_FP = TNTP
B_TN = FNTM
(3)
where true positive (TP) indicates that the number of bot-net conversations is correctly classified true negative (TN)indicates that the number of benign traffic conversationsis correctly classified false positive (FP) expresses that thenumber of benign traffics is detected as botnet traffic falsenegative (FN) indicates that the number of botnet traffics isdetected as benign traffic TM indicates the total number ofbotnets and TB expresses the total number of benign traffics
The experiment result is shown in Figure 3The whole recognition rate of DecisionTump is the lowest
because there is a one-level decision tree in the Decision-Tump Random forest algorithm selects variables automati-cally during the model formation and establishes the optimaldiscriminantmodelThus the detection rate of random forestalgorithm is the highest Meanwhile random forest has alower false positive and false negative rates than the otherfour Moreover there is no obvious difference among the
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 7
Table 4 Experimental parameters settings
Transmit speed (Gbps)Internal time (s)
30 60The number of flows The number of conversations The number of flows The number of conversations
1 138825 39734 203741 6138010 261630 92933 452930 158722
Conversation Flow0
02
04
06
08
1
B_TP
B_TNB_FP
Figure 4 Detection effect of flow-based and conversation-basedfeatures
detection effect of BayesNet REPTree and randomTree Thebotnet traffic detection of Decision-Tump is 844 Howeverthe detection accuracy of the other four algorithms is morethan 90The false positive rate and true negative rate of thetop five algorithm are under 10 except for DecisionTump
Kirubavathi and Anitha [20] proposed a botnet detectionmethod via mining of traffic flow characteristics In theirwork they used features like small packets packet ratioinitial packet length and bot response packet to identifybotnet traffic Here we compare the detection rates of flowfeature and conversation feature The result is shown inFigure 4
As it can be seen from Figure 4 the false positives rateof conversation-based detection and flow-based detection is03 and 135 respectively Thus the experimental resultsshow that the false positives rate of our proposed methoddecreases more than ten times Meanwhile the botnet iden-tification rate of our method does not reduce
In theory the higher the number of classification treesthe higher the classification accuracy rate However if thenumber and depth of classification tree are extremely highthey will reversely affect the classification speed of classifierIn order to determine the two parameter values of the numberand depth of classification tree from random forest algorithmin this paper we analyze the influence on the classificationaccuracy by adjusting parameters In the experiment thenumber of classification trees can be set as 10 50 100 and200 and the depth of each classification tree can be set as2 4 10 20 and so forth The experiment results of differentclassification tree size and different classification tree depthare shown in Figure 5
When the number of the classification trees is 100 andthe depth is 10 the detection rate of random forest algorithm
reaches the maximum Afterward regardless of increasingthe number or the depth of the classification trees thedetection rate does not increase anymore Thus when thenumber of the classification trees is set as 100 and the depthof classification tree is set as 10 in the experiment the randomforest works the best
43Online P2PBotnet TrafficDetection Platform Our frame-work has been implemented in Python and utilizes MicrosoftNetworkMonitor to capture packets from a network interfaceor a pcap file Because the timeout value of TCPUDP packetsis 60 s we set the time window as 60 s in this paper to extractconversation feature While we experimented with differenttime window settings the 60-second time window showedthe best accuracy at considerably low computational com-plexity In the high-speed network environment we count thenumber of conversations and the data flows contained in theinterval of 60 s and gather the 1 Gbps and 10Gbps networkin many times The interval of the gathering is 60 s and 30 sand then we compute the average value The result is shownin Table 4 In Table 4 119879 stands for time 119878 stands for speedand conversa stands for conversation
According to Table 4 in the 10Gbps network and theinterval of 60 s the average number of passing flows is452930 and the average number of conversations is only158722 Thus using the conversation features can greatlyreduce the number of feature vectors The reason of that isa conversation consists of any number of flows that have thesame sourcedestination hostport and the same protocolAccording to the foregoing experiments we can see thatthe time of using random forest algorithm to detect 204711feature vectors is 271 seconds Thus half real-time botnetdetection platform based on random forest classifier andconversation features can identify botnet traffic under thehigh-speed network environment
5 Conclusion and Future Work
In this paper we propose an efficient botnet traffic detectionsystem which can handle heavy network bandwidths Ourframework utilizes PF_RING to solve the high packet droprate of Libcap RF-RING has low latency and low overheadto extract required fields of traffic Then feature selection isconducted to reduce the dimensionality of data Conversationfeatures combine the advantages of the existing detectionmethods based on flow statistical behaviors and flow sim-ilarity We select promising features using random forestalgorithm in order to reduce the feature dimension This
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
8 Mathematical Problems in Engineering
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
084086088
09092094096
20 18 16 20018014 16012Classification tree depth
14010 120
Classification tree size1008 806 60404 202 0
0002004006008
01012
B_TP
B_TN
Figure 5 Detection rate for different number and different depth of classification trees
framework selects the machine learning which obtained thebest learning performance The experiments are conductedon the offline public dataset and online real data Theexperimental results show that conversation features used inthis paper behave better than flow features in the CTU13 opensource dataset Among all the classification algorithms thedetection rate of random forest is the highest which is upto 936 And the false alarm rate is only 03 which is tentimes less than detection based on traffic flow characteristics
The future work will focus on mining association rulesaccording to our proposed conversation features Moreoverwe need to further identify specific botnet categories in orderto design corresponding defense plans
Conflicts of Interest
The authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (Grant nos 61572115 61502086 and61402080) and the Key Basic Research of Sichuan Province(Grant no 2016JY0007)
References
[1] Z Zhu G Lu Y Chen Z J Fu P Roberts and K Han ldquoBotnetresearch surveyrdquo in Proceedings of the 32nd Annual IEEEInternational Computer Software and Applications Conference(COMPSAC rsquo08) pp 967ndash972 IEEE August 2008
[2] C Mazzariello ldquoIRC traffic analysis for botnet detectionrdquo inProceedings of the 4th International Conference on Information
Assurance and Security (IAS rsquo08) pp 318ndash323 IEEE September2008
[3] J-S Lee H C Jeong J-H Park M Kim and B-N Noh ldquoTheactivity analysis of malicious http-based botnets using degreeof periodic repeatabilityrdquo in Proceedings of the InternationalConference on Security Technology (SECTECH rsquo08) pp 83ndash86IEEE December 2008
[4] W Zhou and X Wu ldquoSurvey of p2p technologiesrdquo ComputerEngineering and Design vol 27 no 1 pp 76ndash79 2006
[5] H R Zeidanloo and A A Manaf ldquoBotnet command and con-trol mechanismsrdquo in Proceedings of the International Conferenceon Computer and Electrical Engineering (ICCEE rsquo09) pp 564ndash568 IEEE December 2009
[6] D Dittrich and S Dietrich ldquoP2P as botnet command andcontrol a deeper insightrdquo in Proceedings of the 3rd InternationalConference on Malicious and Unwanted Software (MALWARErsquo08) pp 41ndash48 IEEE October 2008
[7] M Feily A Shahrestani and S Ramadass ldquoA survey of botnetand botnet detectionrdquo in Proceedings of the 3rd InternationalConference on Emerging Security Information Systems andTechnologies (SECURWARE rsquo09) pp 268ndash273 IEEE June 2009
[8] R Villamarın-Salomon and J C Brustoloni ldquoBayesian botdetection based on DNS traffic similarityrdquo in Proceedings of the24th Annual ACM Symposium on Applied Computing (SAC rsquo09)pp 2035ndash2041 ACM March 2009
[9] S Arshad M Abbaspour M Kharrazi and H SanatkarldquoAn anomaly-based botnet detection approach for identifyingstealthy botnetsrdquo in Proceedings of the IEEE InternationalConference on Computer Applications and Industrial Electronics(ICCAIE rsquo11) pp 564ndash569 IEEE December 2011
[10] M N Sakib and C-T Huang ldquoUsing anomaly detectionbased techniques to detect HTTP-based botnet CampC trafficrdquo inProceedings of the IEEE International Conference on Communi-cations (ICC rsquo16) pp 1ndash6 IEEE Kuala Lumpur Malaysia May2016
[11] P V Amoli and THamalainen ldquoA real time unsupervisedNIDSfor detecting unknown and encrypted network attacks in high
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 9
speed networkrdquo in Proceedings of the 2nd IEEE InternationalWorkshop on Measurements and Networking (M amp N rsquo13) pp149ndash154 IEEE October 2013
[12] K Singh S C Guntuku A Thakur and C Hota ldquoBig dataanalytics framework for peer-to-peer botnet detection usingrandom forestsrdquo Information Sciences vol 278 pp 488ndash4972014
[13] S Kalmegh ldquoAnalysis of WEKA data mining algorithm REP-Tree simple CART and RandomTree for classification of Indiannewsrdquo International Journal of Innovative Science Engineeringand Technology vol 2 no 2 pp 438ndash446 2015
[14] M Hall E Frank G Holmes B Pfahringer P Reutemann andI H Witten ldquoThe weka data mining softwarerdquo ACM SIGKDDExplorations Newsletter vol 11 no 1 pp 10ndash18 2009
[15] S Saad I Traore A Ghorbani et al ldquoDetecting P2P botnetsthrough network behavior analysis and machine learningrdquo inProceedings of the 9th Annual International Conference on Pri-vacy Security and Trust (PST rsquo11) pp 174ndash180 IEEE MontrealCanada July 2011
[16] M R Rostami B Shanmugam and N B Idris ldquoAnalysis anddetection of P2P botnet connections based on node behaviourrdquoin Proceedings of the World Congress on Information andCommunication Technologies (WICT rsquo11) pp 928ndash933 IEEEDecember 2011
[17] H Zhang M Gharaibeh S Thanasoulas and C Papadopou-los ldquoBotdigger detecting DGA bots in a single networkrdquoin Proceedings of the IEEE International Workshop on TrafficMonitoring and Analaysis Louvain La Neuve Belgium April2016
[18] W Wang B-X Fang and X Cui ldquoBotnet detecting methodbased on group-signature filterrdquo Journal on Communicationsvol 31 no 2 pp 29ndash35 2010
[19] K Shanthi andD Seenivasan ldquoDetection of botnet by analyzingnetwork traffic flow characteristics using open source toolsrdquoin Proceedings of the 9th IEEE International Conference onIntelligent Systems andControl (ISCO rsquo15) pp 1ndash5 IEEE January2015
[20] G Kirubavathi and R Anitha ldquoBotnet detection via mining oftraffic flow characteristicsrdquo Computers and Electrical Engineer-ing vol 50 pp 91ndash101 2016
[21] J Zhang R Perdisci W Lee X Luo and U Sarfraz ldquoBuildinga scalable system for stealthy P2P-botnet detectionrdquo IEEETransactions on Information Forensics and Security vol 9 no1 pp 27ndash38 2014
[22] M Stevanovic and J M Pedersen ldquoAn efficient flow-based bot-net detection using supervised machine learningrdquo in Proceed-ings of the International Conference on Computing Networkingand Communications (ICNC rsquo14) pp 797ndash801 IEEE February2014
[23] LMGarcia ldquoProgrammingwith libpcapmdashsniffing the networkfrom our own applicationrdquo Hakin9-Computer Security Maga-zine p 2-2008 2008
[24] M M Rathore A Ahmad and A Paul ldquoReal time intrusiondetection system for ultra-high-speed big data environmentsrdquoThe Journal of Supercomputing vol 72 no 9 pp 3489ndash35102016
[25] D Zhao I Traore B Sayed et al ldquoBotnet detection basedon traffic behavior analysis and flow intervalsrdquo Computers andSecurity vol 39 pp 2ndash16 2013
[26] H Choi H Lee and H Kim ldquoBotGAD detecting botnets bycapturing group activities in network trafficrdquo in Proceedings
of the 4th International ICST Conference on CommunicationSystem Software and Middleware p 2 ACM June 2009
[27] P Judge D Alperovitch and W Yang ldquoUnderstanding andreversing the profit model of spam (position paper)rdquo in Pro-ceedings of the 4th Workshop on the Economics of InformationSecurity June 2005
[28] F Haddadi D-T Phan and A N Zincir-Heywood ldquoHow tochoose from different botnet detection systemsrdquo in Proceedingsof the IEEEIFIP Network Operations and Management Sympo-sium (NOMS rsquo16) pp 1079ndash1084 IEEE April 2016
[29] A Sharma and S K Sahay ldquoAn effective approach for classifi-cation of advanced malware with high accuracyrdquo InternationalJournal of Security and Its Applications vol 10 no 4 pp 249ndash266 2016
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Submit your manuscripts athttpswwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of