2014 IEEE International Advance Computing Conference (IACC), Gurgaon, India (2014.02.21-2014.02.22)

Efficient Hybrid Technique for Detecting Zero-Day Polymorphic Worms

Ratinder Kaur Computer Science and Engineering Department

Thapar University Patiala, India

[email protected]

Maninder Singh Computer Science and Engineering Department

Thapar University Patiala, India

[email protected]

Abstract—This paper presents an efficient technique for detecting zero-day polymorphic worms with almost zero false positives. Zero-day polymorphic worms not only exploit unknown vulnerabilities but also change their own representation on each new infection or encrypt their payloads using a different key per infection. Thus, there are many variations in the signatures of the same worm, which makes fingerprinting very difficult. With their ability to propagate rapidly, these worms increasingly threaten Internet hosts and services. If these zero-day worms are not detected and contained in time, they can potentially disable the Internet or wreak serious havoc. The detection of zero-day polymorphic worms is therefore of paramount importance.

Keywords—Zero-day attack, intrusion detection, hybrid system, polymorphic worm.

I. INTRODUCTION

Zero-day polymorphic worms continue to threaten the Internet with their ability to rapidly infect large numbers of hosts by exploiting zero-day vulnerabilities. The problem is worsened by the fact that these worms can change their patterns dynamically and evade most existing intrusion detection systems, rendering those systems ineffective. With major innovations in worm technology, zero-day worms are spreading more quickly, maliciously and efficiently than ever. Modern zero-day attacks are characterized not only by the freshness of the vulnerabilities they exploit but also by the many distinct behaviors they can exhibit. These include complex mutation to evade defenses; multi-vulnerability scanning to identify potential targets; targeted exploitation that launches directed attacks against vulnerable hosts; remote shells that open arbitrary ports on compromised hosts to connect to at a later time; and malware drops, in which malicious code is downloaded from an external source to continue propagation.

In recent years, the number of zero-day attacks reported each year has increased immensely. According to Symantec's Internet Security Threat Report of 2013, there was a 42% increase in zero-day attacks in 2012 [1]. Among the most dangerous zero-day exploits ever seen in cyberspace are the Hydraq Trojan, Stuxnet, Duqu and Flame. The Hydraq Trojan, also known as the Aurora attack, aimed to steal information from several companies. Stuxnet (discovered in June 2010) tampered with the control system associated with uranium enrichment at the Natanz facility and remained undetected for a long time. Duqu (discovered in September 2011), which is related to Stuxnet, exploits zero-day Windows kernel vulnerabilities, uses stolen digital keys and is highly targeted. Flame (discovered in 2012) is a modular computer malware that exploits some of the same zero-day vulnerabilities in Microsoft Windows as Stuxnet.

To defend against these new attacks, the research community has proposed various techniques, which can be broadly classified as statistical-based, signature-based, behavior-based and hybrid techniques.

Most statistical-based techniques [2] [3] [4] depend on attack profiles built from historical data. Due to the static nature of these profiles, the detection techniques are unable to adapt to changes in the environment over time: any change in the data pattern requires an updated profile and constant retraining. Setting the limit (or detection parameters) for judging new observations (new attacks) is a critical step in designing a statistical detection approach, since it has a dramatic effect on the quality of detection. If the threshold is very narrow, it will frequently be exceeded, resulting in a high rate of false positive alarms; if it is very wide, it will rarely be exceeded, resulting in many false negatives. At times, the detection parameters are manually extracted or adjusted to detect new attacks. All these factors restrict statistical detection approaches to offline operation; hence, they cannot be used for instant detection and protection in real time.
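The threshold trade-off described above can be illustrated with a small Python sketch that flags an observation when its anomaly score exceeds a limit and counts the resulting false positives and false negatives. The scores and labels are hypothetical toy values, not data from this paper.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Flag an observation as an attack when its anomaly score exceeds threshold.

    scores: anomaly scores, e.g. deviation from a historical profile.
    labels: True for real attacks, False for benign traffic.
    Returns (false_positives, false_negatives).
    """
    fp = sum(1 for s, attack in zip(scores, labels) if s > threshold and not attack)
    fn = sum(1 for s, attack in zip(scores, labels) if s <= threshold and attack)
    return fp, fn


# Hypothetical toy data: benign traffic mostly scores low, attacks score high.
scores = [0.1, 0.3, 0.4, 0.6, 0.2, 0.9, 0.7, 0.95, 0.5, 0.85]
labels = [False, False, False, False, False, True, True, True, False, True]

for threshold in (0.25, 0.5, 0.99):  # narrow, moderate and wide limits
    fp, fn = confusion_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, the narrow limit (0.25) raises four false alarms and misses nothing, while the wide limit (0.99) raises no alarms and misses all four attacks.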

Signature-based detection techniques mainly focus on polymorphic worms. There are three types of signatures: content-based, semantic-based and vulnerability-based. Content-based signatures [5] [6] [7] capture features specific to a worm implementation; thus, they might not be generic enough and can be evaded by other exploits. Furthermore, attackers can evade content-based signatures by misleading the signature generation process with crafted packets injected into normal traffic. Semantic-based signatures [8] are computationally more expensive to generate than substring-based approaches. Moreover, they cannot be implemented in existing IDSes such as Snort. Vulnerability-driven signatures [9] capture the characteristics of the vulnerability that the worm exploits, but they are difficult to generate.
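The brittleness of content-based signatures against polymorphism can be illustrated with a toy Python example: an exact-substring signature taken from one instance stops matching once the same body is XOR-encoded with a different key per infection. The byte strings and the single-byte XOR encoder are hypothetical stand-ins for a real exploit and a real polymorphic engine.

```python
def xor_encode(payload: bytes, key: int) -> bytes:
    """Toy single-byte XOR encoder standing in for a polymorphic engine."""
    return bytes(b ^ key for b in payload)


def matches(signature: bytes, packet: bytes) -> bool:
    """Content-based matching: the signature must occur as a substring."""
    return signature in packet


# Hypothetical exploit body: xor eax,eax; push eax; push "//sh"; push "/bin".
shellcode = b"\x31\xc0\x50\x68//sh\x68/bin"
signature = b"//sh\x68/bin"  # content signature extracted from one plain instance

variants = [xor_encode(shellcode, key) for key in (0x00, 0x41, 0x7F)]
print([matches(signature, v) for v in variants])  # [True, False, False]
```

Only the unencoded instance (key 0x00) still matches, which is why signature generators look for bytes that stay invariant across all instances instead.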

Behavior-based techniques [10] look for essential characteristics of worms that do not require examining payload byte patterns. They suffer from the fact that they cannot effectively capture the context in which the worm program interacts with the real victim machine.

978-1-4799-2572-8/14/$31.00 ©2014 IEEE

Hybrid techniques [11] [12] combine heuristics with different intrusion detection techniques, such as signature-based and anomaly-based detection, to detect zero-day polymorphic worms.

In this paper, an efficient hybrid technique for detecting zero-day polymorphic worms is presented. The proposed technique combines signature detection and anomaly detection. A honeynet is used as an anomaly detector to identify and capture new attacks. After detection, the new attacks are validated for polymorphism, and accurate signatures are generated to contain them.

A. Contribution

This paper's main contributions are fourfold, as given below.

• We propose an efficient technique that offers better sensitivity and specificity. It identifies zero-day attacks from data collected automatically on high-interaction honeypots.

• We have strengthened the basic existing techniques by combining their advantages and minimizing their disadvantages.

• Our technique does not need any prior knowledge of zero-day attacks because a honeynet is used as the anomaly detector.

• Our technique can detect zero-day attacks in their early phase and can contain them before major consequences occur.

The remainder of the paper is organized as follows. Section II summarizes related work. Section III presents the detailed working of the proposed technique. Section IV describes the results, and Section V concludes the paper.

II. RELATED WORK

This section discusses several techniques proposed for the detection of zero-day attacks.

Supervised Learning [13] is a novel method that employs several data mining techniques to detect and classify zero-day malware based on the frequency of Windows API calls. A machine learning framework is developed using eight different classifiers: the Naïve Bayes (NB) algorithm, the k-Nearest Neighbor (kNN) algorithm, the Sequential Minimal Optimization (SMO) algorithm with four different kernels (SMO-Normalized PolyKernel, SMO-PolyKernel, SMO-Puk, and SMO-Radial Basis Function (RBF)), the Backpropagation Neural Network algorithm, and the J48 decision tree. This system is reported to outperform similar signature-free techniques that detect polymorphic and unknown malware based on analysis of Windows API calls.

Contextual Anomaly Detection [14] is a contextual misuse and anomaly detection prototype for detecting zero-day attacks. The contextual misuse detection utilizes similarity with attack context profiles, and the anomaly detection technique identifies new types of attacks using the One Class Nearest Neighbor (1-NN) algorithm.

SweetBait [7] is a distributed system that combines network intrusion detection and prevention techniques. It employs different types of honeypot sensors, both high-interaction (Argos [15]) and low-interaction (SweetSpot, similar to honeyd [16]), to recognize and capture suspicious traffic. SweetBait automatically generates signatures for worms that scan random IP address space, without any prior knowledge; for non-scanning worms, Argos is used instead. A novel aspect of this signature generation approach is that forensic shellcode is inserted in place of the malevolent shellcode to gather useful information about the attack process.

LISABETH [17] automatically generates signatures for polymorphic worms. It uses invariant-byte analysis of traffic content, as originally proposed in Polygraph [5] and refined by Hamsa [6]. LISABETH relies on the hypothesis that every worm has its own set of invariants and that an attacker must insert all the invariant bytes into every worm sample. LISABETH and Hamsa are equally sensitive to the size of the suspicious flow pool, but LISABETH is less sensitive than Hamsa to the size of the innocuous flow pool. LISABETH has shown significant improvement over Polygraph and Hamsa in terms of efficiency and noise tolerance.

In Honeycyber [18], a "double-honeynet" is proposed as a new detection method to identify zero-day worms and to isolate attack traffic from innocuous traffic. It uses unlimited honeynet outbound connections to capture the different payloads produced by each infection of the same worm. It then uses Principal Component Analysis (PCA) to determine the most significant substrings shared between polymorphic worm instances and uses them as signatures [20].

ZASMIN [19], a Zero-day Attack Signature Management Infrastructure, is an early detection system for novel network attacks. It provides early detection and validation of attacks at the moment they start to spread on the network. To filter malicious traffic, it uses the dispersion of destination IP addresses, the TCP connection trial count, the TCP connection success count and the stealth scan trial count. Attack validation is done by call-function and instruction-spectrum analysis, and signatures are generated using content analysis.
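As an illustration of the traffic features listed for ZASMIN, the Python sketch below flags sources that contact many distinct destination IP addresses while completing few TCP connections within one time window. Using the distinct-destination count as the dispersion measure, the two thresholds, and the omission of the stealth-scan count are all assumptions; [19] may define these features differently.

```python
from collections import defaultdict


def suspicious_sources(events, min_distinct_dsts=20, max_success_ratio=0.2):
    """Flag scanning-like sources within one time window.

    events: iterable of (src_ip, dst_ip, connected) tuples, where connected
    says whether the TCP connection attempt succeeded.
    """
    destinations = defaultdict(set)
    trials = defaultdict(int)
    successes = defaultdict(int)
    for src, dst, connected in events:
        destinations[src].add(dst)         # dispersion of destination addresses
        trials[src] += 1                   # TCP connection trial count
        successes[src] += int(connected)   # TCP connection success count
    flagged = []
    for src, dsts in destinations.items():
        success_ratio = successes[src] / trials[src]
        if len(dsts) >= min_distinct_dsts and success_ratio <= max_success_ratio:
            flagged.append(src)
    return sorted(flagged)


window = [("10.0.0.5", f"192.168.1.{i}", i == 0) for i in range(30)]  # hypothetical scanner
window += [("10.0.0.7", "192.168.1.10", True)] * 5                     # hypothetical client
print(suspicious_sources(window))  # ['10.0.0.5']
```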

LESG [9] is a network-based automatic worm signature generator that generates length-based signatures for zero-day polymorphic worms that exploit buffer overflow vulnerabilities. The system generates vulnerability-driven signatures at the network level without any host-level analysis of worm execution or vulnerable programs.
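A length-based signature of the kind LESG generates can be pictured as a protocol field paired with a maximum benign length. The Python sketch below only shows matching against an already-parsed message; the class name, the `Host` field and the 255-byte limit are hypothetical, and the learning of thresholds from traffic is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LengthSignature:
    """Hypothetical length-based signature: (protocol field, maximum benign length)."""

    field_name: str
    max_length: int

    def matches(self, message: dict[str, bytes]) -> bool:
        """True when the field is present and longer than the benign limit."""
        value = message.get(self.field_name)
        return value is not None and len(value) > self.max_length


sig = LengthSignature("Host", 255)
print(sig.matches({"Host": b"A" * 300}), sig.matches({"Host": b"example.org"}))
```

Because the check depends only on a field's length, it is unaffected by how a polymorphic engine rewrites the bytes inside the overflowing field.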

Network-Level Emulation [10] is a heuristic detection method that scans network traffic streams for the presence of previously unknown polymorphic shellcode. The approach relies on a NIDS-embedded CPU emulator that executes every potential instruction sequence in the inspected traffic, aiming to identify the execution behavior of polymorphic shellcode. The proposed approach is robust to obfuscation techniques such as self-modification and non-self-contained polymorphic shellcode, but not to non-self-modifying shellcode and indirect control-transfer instructions.

Hybrid Detection for Zero-day Polymorphic Shellcodes (HDPS) [11] is a hybrid detection approach. It uses an elaborate method to detect NOP sleds that is robust against polymorphism, metamorphism and other obfuscations. It employs a heuristic method to detect the return address and achieves high efficiency by incorporating a Markov model to detect executable code. This method filters normal packets accurately and with low overhead. However, this approach cannot block shellcodes in network packets, and it is hard to obtain the transition matrices of the Markov model.

Honeyfarm [12] is a hybrid scheme that combines anomaly and signature detection with honeypots. This system takes advantage of existing detection approaches to develop an effective defense against Internet worms. The system works at three levels. At the first level, signature-based detection is used to filter known worm attacks. At the second level, an anomaly detector is set up to detect any deviation from normal behavior. At the last level, honeypots are deployed to detect zero-day attacks: low-interaction honeypots are used to track attacker activities, while high-interaction honeypots help in analyzing new attacks and vulnerabilities. The controller is responsible for redirecting suspicious traffic to the respective honeypots deployed in the honeyfarm.

Figure 1: System Components

III. PROPOSED TECHNIQUE

The proposed technique works in two phases: first it detects zero-day polymorphic worms, and then it contains them. It combines two malware detection techniques, signature-based detection and anomaly detection, using the strengths of each to detect zero-day worms (through the anomaly detector) and to contain them (through signature-based detection). The main drawback of anomaly-based techniques, a high false positive rate, is reduced by using honeypot technology and multi-level attack evaluation.

Fig. 1 shows the three main components: the Suspicious Traffic Filter (STF), Zero-day Attack Evaluation (ZAE) and the Signature Generator (SG). STF is the first layer of defense against zero-day attacks. ZAE takes its input (malicious traffic) from STF to evaluate and analyze the captured zero-day attack. SG generates a new signature for the zero-day attack and updates the signature database in STF. These three components work together as an interrelated process.
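The feedback loop between the three components can be summarized in a short Python sketch. The class, the function, and the `evaluate`/`generate` callables are hypothetical stand-ins for STF, ZAE and SG; STF is reduced here to substring matching against its signature database, leaving out the honeynet and sensor comparison.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


class SuspiciousTrafficFilter:
    """Hypothetical STF stand-in: blocks payloads containing a known signature."""

    def __init__(self) -> None:
        self.signatures: set[bytes] = set()

    def is_known_attack(self, payload: bytes) -> bool:
        return any(sig in payload for sig in self.signatures)


def run_pipeline(
    payloads: Iterable[bytes],
    stf: SuspiciousTrafficFilter,
    evaluate: Callable[[bytes], bool],             # stands in for ZAE
    generate: Callable[[bytes], Optional[bytes]],  # stands in for SG
) -> list[bytes]:
    """Pass traffic through STF -> ZAE -> SG and return the payloads STF blocked."""
    blocked = []
    for payload in payloads:
        if stf.is_known_attack(payload):
            blocked.append(payload)            # known attack: contained by signature
        elif evaluate(payload):                # ZAE confirms a new polymorphic attack
            signature = generate(payload)      # SG derives a content signature
            if signature:
                stf.signatures.add(signature)  # feedback into STF's database
    return blocked


stf = SuspiciousTrafficFilter()
traffic = [b"GET /index.html", b"..EVIL..", b"xxEVILxx"]
print(run_pipeline(traffic, stf, lambda p: b"EVIL" in p, lambda p: b"EVIL"))  # [b'xxEVILxx']
```

In this sketch the first instance of a new worm is only analyzed, and later instances are blocked once SG has pushed the signature back into STF.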

A. Suspicious Traffic Filter (STF)

STF observes all network traffic between an edge network and the Internet. The traffic is passed simultaneously to both the honeynet and the IDS/IPS sensors through a port-mirroring switch, as shown in Fig. 2.

The honeynet identifies the mechanism of a new attack and collects evidence of the attacker's activity. If the honeynet encounters a known attack, it blocks and logs it. If it encounters a zero-day attack, it redirects the unknown traffic to high-interaction honeypots on which Sebek is installed. All inbound and outbound communication of the attacker is then monitored and logged. The captured unknown attack packets are stored remotely in a more compact form using Snort in binary capture mode (SNORT BIN) for further analysis. Similarly, the IDS/IPS sensors filter known attacks from the same traffic and store the rest of the filtered traffic in an online repository. The data collected from the honeynet and the IDS/IPS sensors is then compared and analyzed to see whether attack traces similar to those captured by the honeynet are also found in the sensors' filtered traffic. If so, the trace is a new attack that the IDS/IPS sensors did not detect. This separates unknown malicious code from benign traffic. Comparing the data of both repositories thus performs a low-level attack validation at the first stage.

For comparing the data, an efficient algorithm known as Longest Common Prefix (LCP) has been used. Consider a new attack pattern captured by the honeynet, H = {aba}, of length m, and filtered traffic stored by the sensors, S = {cabacabadabac}, of length n. The suffix tree constructed for S is shown in Fig. 3 ($ is the terminal symbol). The path that matches the pattern H = {aba} is shown in boldface in the tree and ends at an internal node. The subtree rooted at that node has three leaves, which correspond to the three occurrences of the new attack pattern {aba} in S. Moreover, the labels of those leaves give the positions of H in S. The matching path is found in O(m) time after O(n) preprocessing, which represents a great saving of time.
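The lookup described above can be sketched in Python. For brevity, this sketch builds an uncompressed suffix trie in O(n²) time rather than a linear-time suffix tree (for example, via Ukkonen's algorithm), so only the O(m) descent to the pattern node mirrors the complexity stated above. The class and function names are illustrative, not taken from the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    leaf_start: int | None = None  # start offset of the suffix that ends here


def build_suffix_trie(text: str, terminal: str = "$") -> TrieNode:
    """Insert every suffix of text + terminal; O(n^2) nodes in the worst case."""
    root = TrieNode()
    text += terminal  # the unique terminal makes every suffix end at its own leaf
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, TrieNode())
        node.leaf_start = start
    return root


def find_occurrences(root: TrieNode, pattern: str) -> list[int]:
    """Descend along the pattern (O(m)), then collect the leaf labels below that node."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return []  # the pattern does not occur in the sensor traffic
        node = node.children[ch]
    starts, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.leaf_start is not None:
            starts.append(current.leaf_start)
        stack.extend(current.children.values())
    return sorted(starts)


sensor_traffic = build_suffix_trie("cabacabadabac")  # S from the example
print(find_occurrences(sensor_traffic, "aba"))       # [1, 5, 9]
```

With 0-based indexing, the three leaves below the `aba` node report the offsets 1, 5 and 9 in S.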

B. Zero-day Attack Evaluation (ZAE)

The packets from the suspicious traffic filter may contain a few false positives, so they need to be evaluated. ZAE deals with malicious strings and polymorphism. A typical polymorphic shellcode consists of a NOP sled, a decryptor, the encrypted shellcode and a return address. The purpose of this structure is that when a function returns after a buffer overflow, the return address directs execution into the NOP sled, which eventually reaches the shellcode. ZAE works in two stages.

Stage 1: Detection of NOP sled and return address. ZAE uses a simple pattern-matching method to detect the NOP sled and the return address. In a polymorphic worm, n-byte (n > 1) instructions other than 0x90 can also be used as NOPs, but such instructions decrease the probability that a jump into the sled lands on a correctly aligned instruction.


Figure 2: STF Module

To detect a NOP sled, runs of similar NOP instructions appearing consecutively are examined. Certain non-self-contained polymorphic worms may not use a NOP sled at all, but instead take advantage of a register (or instructions in libraries) that holds the base address of the injected shellcode once the instruction pointer has been hijacked. Such library instructions are always located at the same memory address. Simple pattern matching is used to recognize JMP ESP-like instructions.
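A minimal Python sketch of the consecutive-NOP check is shown below. The set of NOP-like opcodes is an assumption for illustration (0x90 plus the one-byte x86-32 INC/DEC register opcodes 0x40 to 0x4F), and since the paper does not state a minimum sled length, `MIN_SLED` is hypothetical.

```python
from __future__ import annotations

NOP_LIKE = frozenset([0x90, *range(0x40, 0x50)])  # assumed one-byte NOP equivalents
MIN_SLED = 30                                      # hypothetical minimum sled length


def longest_nop_run(payload: bytes, nop_like: frozenset[int] = NOP_LIKE) -> tuple[int, int]:
    """Return (start offset, length) of the longest run of NOP-like bytes, or (-1, 0)."""
    best_start, best_len = -1, 0
    run_start, run_len = 0, 0
    for i, byte in enumerate(payload):
        if byte in nop_like:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def has_nop_sled(payload: bytes, min_len: int = MIN_SLED) -> bool:
    return longest_nop_run(payload)[1] >= min_len


print(longest_nop_run(b"\x00\x01" + b"\x90" * 40 + b"\xcc"))  # (2, 40)
```

Because bytes 0x40 to 0x4F are also the ASCII characters '@' through 'O', text payloads can produce short spurious runs, which is one reason a Stage 1 hit alone is not treated as conclusive.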

Figure 3: Suffix Tree

The return address is chosen from a limited set of addresses so that it points at or near the beginning of the injected code. Therefore, only its least significant bytes can be mutated. To detect the return address, the payload is checked for a 4-byte pattern that appears consecutively. Because this check operates together with the further evaluation in Stage 2, the false positive rate from Stage 1 is very low.
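The repeated-return-address check can be sketched in Python as follows, assuming a little-endian x86 target so that the three high-order bytes of each 4-byte word are its last three bytes in memory. The function name and the `min_repeats` default are hypothetical. A run of identical filler words (such as 0x90909090) also passes this test, which fits the text's point that Stage 1 is relied on only together with Stage 2.

```python
from __future__ import annotations


def find_return_address_run(payload: bytes, min_repeats: int = 4) -> tuple[int, int] | None:
    """Find the longest run of consecutive 4-byte words sharing their high-order bytes.

    On little-endian x86 the low-order byte of an address is stored first, so
    payload[i + 1:i + 4] holds the (invariant) high-order bytes of the word at i.
    Returns (offset, number of words), or None if no run reaches min_repeats.
    """
    best: tuple[int, int] | None = None
    for start in range(len(payload) - 3):
        high = payload[start + 1:start + 4]
        count, pos = 1, start + 4
        while pos + 4 <= len(payload) and payload[pos + 1:pos + 4] == high:
            count += 1
            pos += 4
        if count >= min_repeats and (best is None or count > best[1]):
            best = (start, count)
    return best


# Hypothetical addresses 0xbfff0010 ... 0xbfff0050: only the lowest byte varies.
words = b"".join((0xBFFF0000 + k).to_bytes(4, "little") for k in (0x10, 0x20, 0x30, 0x40, 0x50))
print(find_return_address_run(b"ABCD" + words))  # (4, 5)
```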

Stage 2: Detection of decryptor. Polymorphic worms mutate the shellcode and/or the decryptor by inserting junk instructions or by using different instructions that achieve the same result, e.g., push value; pop reg instead of mov reg, value. However, to do harm, the decryptor has to decrypt the encrypted shellcode, and in order to do so it needs to find the absolute memory address of the injected shellcode.

To find this address, the decryptor takes advantage of the CPU program counter (PC), i.e., EIP (Extended Instruction Pointer). During the execution of the decryptor, the PC points to an address within the memory region where the decryptor and the encrypted payload have been placed. The decryptor loads the current value of the PC into a register to compute the absolute address of the shellcode. The code used to retrieve the current PC value is referred to as GetPC code. ZAE disassembles the payload starting at every byte offset to detect the seed instruction of the GetPC code. The core idea of this method is to detect whether the PC value stored by the decryptor is used to access memory. ZAE also detects related seed instructions such as fstenv, fnstenv, fsave and fnsave, which are used to find the absolute memory address. The following steps are involved:

• Disassemble the packet starting at every byte offset.

• Find the seed instruction that stores the PC value on the stack.

• Detect a register that loads the PC value from the stack.

• Trace the relation between the register containing the PC value and the other registers used to compute an absolute memory address.

• Check whether the loaded PC value is used to access memory. If so, the payload is a polymorphic shellcode.
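The seed-finding step can be approximated with byte patterns, as in the hypothetical Python sketch below. It does not perform the per-byte disassembly and register tracing listed above; it only flags two well-known x86 GetPC seeds: `call $+5` (E8 00 00 00 00) immediately followed by a one-byte `pop r32` (0x58 to 0x5F), and FNSTENV or FNSAVE (opcodes D9 /6 and DD /6, which also cover FSTENV and FSAVE after their 9B prefix) with a memory operand. Because these bytes can also occur in benign data, a real detector would continue with the register-tracing and memory-access checks.

```python
def find_getpc_seeds(payload: bytes) -> list[tuple[int, str]]:
    """Return (offset, kind) for each candidate GetPC seed found in payload."""
    seeds = []
    for i, opcode in enumerate(payload):
        # CALL rel32 with displacement 0 pushes the address of the next
        # instruction; a POP r32 right after it loads that PC value.
        if (payload[i:i + 5] == b"\xe8\x00\x00\x00\x00"
                and i + 5 < len(payload) and 0x58 <= payload[i + 5] <= 0x5F):
            seeds.append((i, "call/pop"))
        # FNSTENV (D9 /6) and FNSAVE (DD /6) store the FPU environment, which
        # includes the address of the last FPU instruction (a PC value).
        if opcode in (0xD9, 0xDD) and i + 1 < len(payload):
            modrm = payload[i + 1]
            reg_field, mod_field = (modrm >> 3) & 0b111, modrm >> 6
            if reg_field == 6 and mod_field != 0b11:  # /6 with a memory operand
                seeds.append((i, "fnstenv" if opcode == 0xD9 else "fnsave"))
    return seeds


# fldz; fnstenv [esp-0xc]; pop ebx  (a common GetPC idiom)
print(find_getpc_seeds(b"\xd9\xee\xd9\x74\x24\xf4\x5b"))  # [(2, 'fnstenv')]
```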

C. Signature Generation (SG)

After a zero-day polymorphic worm has been evaluated, it is fed to the next module for signature generation. The proposed technique generates content-based signatures that treat worms as strings of bytes; they do not depend on any protocol or server information. Content-based signatures have fast matching algorithms and can easily be incorporated into firewalls or NIDSes.

The goal of SG is to find invariant byte sequences in a polymorphic worm payload and then generate a signature from them. Most software vulnerabilities require certain invariant bytes to be exploited successfully; any change to these bytes renders an exploit non-functional. It has also been shown that multiple invariant bytes remain the same in all variants of a polymorphic worm. Such bytes are very useful for signature generation because they are indispensable for the exploit to work and their content is replicated in all worm instances. These substrings correspond to invariant exploit framing, invariant return addresses (overwrite values), and invariant bytes.

Invariant exploit framing typically consists of reserved keywords or well-known protocol constants needed to exploit the vulnerability successfully. An invariant return address is a jump target whose value is used to force a jump to injected code in the payload or in library code. This address is chosen from a limited set of addresses because the overwritten address must point at or near the beginning of the injected code. This means that the high-order bytes of the return address are generally invariant and only the low-order bytes can be mutated. Invariant bytes are fixed values present in every instance of a polymorphic worm; they can be part of the worm body or of the polymorphic decryptor. Instructions in the worm body or decryptor routines can be subjected to polymorphism, but some invariant values still remain because polymorphic engines are not perfect.
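A simple way to pull out byte strings shared by many worm instances is to count, for every substring of a fixed minimum length, how many samples contain it. The Python sketch below does this with hypothetical parameters `min_len` and `min_samples`, which play the roles of the minimum length X and the instance threshold K used by SG. It returns fixed-length tokens only and does not merge overlapping tokens into longer ones, which a fuller implementation would do.

```python
from collections import Counter


def invariant_tokens(samples: list[bytes], min_len: int, min_samples: int) -> list[bytes]:
    """Return every min_len-byte substring present in at least min_samples samples."""
    counts: Counter = Counter()
    for sample in samples:
        # A set, so a token repeated inside one sample is counted once for it.
        counts.update({sample[i:i + min_len] for i in range(len(sample) - min_len + 1)})
    return sorted(token for token, n in counts.items() if n >= min_samples)


# Hypothetical worm instances sharing a 4-byte invariant (0xdeadbeef).
samples = [
    b"\x11\x22GET /a\xde\xad\xbe\xef\x01",
    b"\x33GET /b\xde\xad\xbe\xef\x02\x44",
    b"\x55\x66\x77\xde\xad\xbe\xef",
]
print(invariant_tokens(samples, min_len=4, min_samples=3))  # [b'\xde\xad\xbe\xef']
```

Lowering `min_samples` to 2 would also admit the protocol framing `GET /` (as the tokens `GET ` and `ET /`) that only two of the samples share.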

SG extracts, from the polymorphic worms validated in ZAE, all invariant byte sequences of minimum length X that appear in at least K different worm instances out of a total of N worms. In polymorphism, the worm is not only encrypted; in some cases, its instructions can be rearranged randomly, or no-operation (NOP) instructions may be inserted randomly between instructions. Therefore, the signature for a polymorphic worm is generated in the form of a Longest Common Subsequence (LCSeq). For example, given the sequences X = abcbdab and Y = bdcaba, bca is a common subsequence, while bcba and bdab are two LCSeqs.
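The LCSeq in this example can be computed with the textbook dynamic-programming recurrence; the Python sketch below returns one LCSeq of two sequences. It only illustrates the definition for two inputs: how SG combines many worm instances is not detailed here, and an exact LCSeq over many sequences is expensive to compute in general.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    # table[i][j] = length of an LCSeq of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCSeq.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("abcbdab", "bdcaba"))  # bcba
```

The tie-breaking rule in the walk-back decides which of the equally long LCSeqs (here bcba rather than bdab) is returned.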

IV. RESULTS

All experiments were run on an isolated network in a research lab. The honeynet comprises Honeywall Roo-1.4 and high-interaction honeypots with the Linux Sebek client installed on them. The honeynet is fed with real network packets. Tools such as the Metasploit Framework 4.6.1 were used to generate the exploit payloads. Snort is used for the IDS/IPS sensors and operates on the same network traffic as the honeynet. We also developed a small zero-day attack evaluation and signature generation engine for our experiments. To generate zero-day polymorphic worms, various polymorphic engines were used to mutate known shellcodes; these mutated shellcodes act as zero-day attacks for our system. In the experiment, the total number of packets was 15453, of which 734 were zero-day attacks. Attack packets are thus only about 5% of the normal packets, as zero-day attacks are rare events in reality.

Three standard metrics were used to evaluate the performance of our technique: True Positive Rate (TPR), False Positive Rate (FPR) and the Receiver Operating Characteristic (ROC) curve. TPR is the percentage of malicious code correctly identified as malicious; FPR is the percentage of benign code wrongly identified as malicious. An ROC curve plots the true positive rate as a function of the false positive rate; each point on the curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. A test with perfect discrimination has an ROC curve that passes through the upper left corner (100% sensitivity and 100% specificity). Therefore, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test. Usually, an ROC area close to 1 is considered good, while an area near 0.5 indicates performance no better than random guessing. Table I shows the TPR, FPR and ROC area for each polymorphic engine, and Fig. 4 and Fig. 5 show the detection rate and false positive rate as bar graphs.
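For reference, the two rates and the area under an ROC curve can be computed as in the Python sketch below. The area uses the trapezoidal rule over (FPR, TPR) points, one common way to compute it; the counts and points in the examples are hypothetical and not taken from the experiments.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (TPR, FPR): TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under an ROC curve given as (FPR, TPR) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


print(rates(tp=90, fn=10, fp=5, tn=95))              # (0.9, 0.05)
print(roc_auc([(0.0, 0.0), (1.0, 1.0)]))             # 0.5, i.e. random guessing
print(roc_auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))  # 1.0, i.e. perfect discrimination
```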

TABLE I. RESULTS

Polymorphic Engine    TPR      FPR      ROC Area
ADMmutate             0.99     0.018    0.981
Alpha2                0.966    0.102    0.932
Clet                  0.959    0.062    0.966
CountDown             0.948    0.051    0.966
JmpCallAdditive       0.986    0.025    0.982
Pex                   0.94     0.064    0.939
PexFnStenvMov         0.934    0.069    0.932
PexFnStenvSub         0.94     0.084    0.928
ShikataGaNai          0.981    0.031    0.982
TAPiON                0.967    0.099    0.934

Figure 4: Detection Rate

Figure 5: False Positive Rate

Fig. 6 shows the ROC curve, plotted using the averages of the above values: average TPR = 0.961 and average FPR = 0.06. The proposed technique is compared with the results of another hybrid technique, Honeyfarm [12], which also combines the advantages of anomaly-based and signature-based detection with honeypots. The area under the ROC curve indicates that our technique is efficient, with almost zero false positives.
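As a quick arithmetic check of the averages quoted above, the Python snippet below recomputes the unweighted mean of the per-engine TPR and FPR values in Table I. It assumes every engine is weighted equally, which the reported averages suggest.

```python
from statistics import mean

# (TPR, FPR) per polymorphic engine, transcribed from Table I.
TABLE_I = {
    "ADMmutate": (0.99, 0.018),
    "Alpha2": (0.966, 0.102),
    "Clet": (0.959, 0.062),
    "CountDown": (0.948, 0.051),
    "JmpCallAdditive": (0.986, 0.025),
    "Pex": (0.94, 0.064),
    "PexFnStenvMov": (0.934, 0.069),
    "PexFnStenvSub": (0.94, 0.084),
    "ShikataGaNai": (0.981, 0.031),
    "TAPiON": (0.967, 0.099),
}

avg_tpr = mean(tpr for tpr, _ in TABLE_I.values())
avg_fpr = mean(fpr for _, fpr in TABLE_I.values())
print(f"average TPR = {avg_tpr:.4f}, average FPR = {avg_fpr:.4f}")
# average TPR = 0.9611, average FPR = 0.0605
```

Rounded, these agree with the reported average TPR = 0.961 and average FPR = 0.06.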

Figure 7: ROC Curve: Comparison with Honeyfarm

V. CONCLUSIONS

In this paper, a comprehensive security solution containing multiple layers of defense for detecting zero-day polymorphic worms is presented. The proposed technique is accurate and efficient against various types of polymorphic worms. Results show an average detection rate of 96% with almost zero false positives. The key problem of false positives is addressed by using honeypots (where all traffic is malicious by default), by performing low-level attack validation in the STF module, and then by performing detailed zero-day attack validation for polymorphism in the second module.

Future research directions include developing attack classification models that automatically determine the actual class of a newly detected attack, which would ease interpretation and reporting for network administrators and researchers alike.

ACKNOWLEDGMENT

The authors would like to thank Tata Consultancy Services (TCS) for their support of this research work. The authors are grateful to the Computer Science and Engineering Department of Thapar University, Patiala, for its constant help in providing excellent infrastructure and a good working environment. The authors also thank the anonymous reviewers for their constructive comments and feedback on this paper.

REFERENCES

[1] Symantec, “Internet security threat report,” 2013. [Online]. Available: https://scm.symantec.com/resources/istr18_en.pdf

[2] W. C. Sun and Y. M. Chen, “A rough set approach for automatic key attributes identification of zero-day polymorphic worms,” Expert Systems with Applications, Elsevier, vol. 36, no. 3, pp. 4672–4679, 2009.

[3] S. Almotairi, A. Clark, G. Mohay, and J. Zimmermann, “A technique for detecting new attacks in low-interaction honeypot traffic,” in Proc. IEEE 4th International Conference on Internet Monitoring and Protection, Washington DC, USA, May 2009, pp. 7–13.

[4] J. Song, H. Takakura, and Y. Kwon, “A generalized feature extraction scheme to detect 0-day attacks via IDS alerts,” in Proc. IEEE International Symposium on Applications and the Internet, Washington, DC, USA, Aug. 2008, pp. 55–61.

[5] J. Newsome, B. Karp, and D. Song, “Polygraph: Automatically generating signatures for polymorphic worms,” in Proc. IEEE Symposium on Security and Privacy (S&P’05), Oakland, CA, May 2005, pp. 226–241.

[6] Z. Li, M. Sanghi, Y. Chen, M. Kao, and B. Chavez, “Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience,” in Proc. IEEE Symposium on Security and Privacy (S&P’06), Berkeley/Oakland, CA, Jun. 2006, pp. 15–47.

[7] G. Portokalidis and H. Bos, “Sweetbait: Zero-hour worm detection and containment using low- and high-interaction honeypots,” Computer Networks: The International Journal of Computer and Telecommunications Networking, vol. 51, no. 5, pp. 1256–1274, Apr. 2007.

[8] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna, “Polymorphic worm detection using structural information of executables,” in Proc. LNCS Springer 8th International Symposium on Recent Advances in Intrusion Detection (RAID’05), Seattle, Sep. 2005, pp. 207–227.

[9] L. Wang, Z. Li, Y. Chen, Z. Fu, and X. Li, “Thwarting zero-day polymorphic worms with network-level length-based signature generation,” IEEE/ACM Transactions on Networking (TON), vol. 18, no. 1, pp. 53–66, Feb. 2010.

[10] M. Polychronakis, K. G. Anagnostakis, and E. P. Markatos, “Network-level polymorphic shellcode detection using emulation,” Journal in Computer Virology, vol. 2, no. 4, pp. 257–274, Jul. 2006.

[11] C. Ting, Z. Xiaosong, and L. Zhi, “A hybrid detection approach for zero-day polymorphic shellcodes,” in Proc. IEEE International Conference on E-Business and Information System Security, Wuhan, May 2009, pp. 1–5.

[12] P. Jain and A. Sardana, “Defending against internet worms using honeyfarm,” in Proc. CUBE International Information Technology Conference (CUBE’12), Pune, India, Dec. 2012, pp. 795–800.

[13] M. Alazab, S. Venkatraman, P. Watters, and M. Alazab, “Zero-day malware detection based on supervised learning algorithms of API call signatures,” in Proc. 9th Australasian Data Mining Conference (AusDM’11), Ballarat, Australia, Dec. 2011, pp. 171–182.

[14] A. AlEroud and G. Karabatis, “A contextual anomaly detection approach to discover zero-day attacks,” in Proc. IEEE International Conference on Cyber Security (CYBERSECURITY ’12), Washington, DC, Dec. 2012, pp. 40–45.

[15] G. Portokalidis, A. Slowinska, and H. Bos, “Argos: An emulator for fingerprinting zero-day attacks,” in Proc. ACM 1st SIGOPS/EuroSys European Conference on Computer Systems, New York, US, Oct. 2006, pp. 15–27.

[16] N. Provos, “A virtual honeypot framework,” in Proc. USENIX 13th USENIX Security Symposium, San Diego, CA, Aug. 2004, pp. 1–14.

[17] L. Cavallaro, A. Lanzi, L. Mayer, and M. Monga, “Lisabeth: Automated content-based signature generator for zero-day polymorphic worms,” in Proc. ACM 4th International Workshop on Software Engineering for Secure Systems, Leipzig, Germany, May 2008, pp. 41–48.

[18] M. M. Z. E. Mohammed, H. A. Chan, and N. Ventura, “Honeycyber: Automated signature generation for zero-day polymorphic worms,” in Proc. IEEE Military Communications Conference (MILCOM’2008), San Diego, CA, Nov. 2008, pp. 1–6.

[19] I. Kim et al., “A case study of unknown attack detection against zero-day worm in the honeynet environment,” in Proc. IEEE 11th International Conference on Advanced Communication Technology (ICACT’ 2009), Phoenix Park, Apr. 2009, pp. 1715–1720.

[20] M. M. Z. E. Mohammed, H. A. Chan, N. Ventura, M. Hashim, I. Amin, and E. Bashier, “Detection of zero-day polymorphic worms using principal component analysis,” in Proc. IEEE 6th International Conference on Networking and Services, Cancun, Mar. 2010, pp. 277–281.
