
Indian Institute of Technology, Roorkee

Progress Report

High Performance Intrusion Detection System

Project Guide: Dr. Anjali Sardana

Asst. Professor

Electronics and Computer Science Department

Submitted by

Mohd Junaid Siddiqui (11536016)


    Contents

1 Introduction

2 Issues in High Performance Computing Architecture
  2.1 GPU Architecture
  2.2 Pattern Matching
  2.3 Regular Expression and DFA

3 Optimizations
  3.1 Based on minimizing the communication between CPU and GPU
      3.1.1 Transferring Network Packets to the GPU
  3.2 Based on minimizing the computational cost
      3.2.1 Data storage in GPU
      3.2.2 Delayed Input DFAs (D2FAs)

4 Classification of HPCA for IDS
  4.1 Based on minimizing the communication cost incurred between CPU and GPU
  4.2 Based on minimizing the computation cost

5 Algorithms for minimizing computational cost over GPU
  5.1 Converting DFAs to D2FAs

6 Conclusion

References


    1 Introduction

Intrusion detection is the act of detecting unwanted traffic on a network or a device. An IDS can be a piece of installed software or a physical appliance that monitors network traffic in order to detect unwanted activity and events such as illegal and malicious traffic, traffic that violates security policy, and traffic that violates acceptable use policies. Many IDS tools will also store a detected event in a log to be reviewed at a later date, or will combine events with other data to make decisions regarding policies or damage control. An IPS is a type of IDS that can prevent or stop unwanted traffic; the IPS usually logs such events and related information. An IDS can be classified into two categories: a Host-based Intrusion Detection System (HIDS) and a Network-based Intrusion Detection System (NIDS). A NIDS is one common type of IDS that analyzes network traffic at all layers of the Open Systems Interconnection (OSI) model and makes decisions about the purpose of the traffic, analyzing for suspicious activity. Most NIDSs are easy to deploy on a network and can often view traffic from many systems at once. Host-based intrusion detection systems (HIDS), on the other hand, analyze network traffic and system-specific settings such as software calls, local security policy, local log audits, and more. A HIDS must be installed on each machine and requires configuration specific to that operating system and software.

Moreover, both of these can employ different detection techniques, which are categorized as signature-based detection and anomaly-based detection. An IDS can use signature-based detection, relying on known traffic data to analyze potentially unwanted traffic. This type of detection is very fast and easy to configure. However, an attacker can slightly modify an attack to render it undetectable by a signature-based IDS. Still, signature-based detection, although limited in its detection capability, can be very accurate. An IDS that looks at network traffic and detects data that is incorrect, not valid, or generally abnormal performs anomaly-based detection. This method is useful for detecting unwanted traffic that is not specifically known. For instance, an anomaly-based IDS will detect that an Internet Protocol (IP) packet is malformed; it does not detect that the packet is malformed in a specific way, but indicates that it is anomalous.

With ever-increasing storage capacity and link speeds, the amount of data that needs to be searched, analyzed, categorized, and filtered is growing rapidly. For instance, network monitoring applications, such as network intrusion detection systems and spam filters, need to scan the contents of a vast amount of network traffic against a large number of threat signatures. Signature-based network intrusion detection systems (NIDS) have been widely deployed to protect networks from attacks. The pattern matching algorithm used to deeply inspect packet content dominates the performance of a NIDS and may become its bottleneck in high speed network environments. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large but finite set of strings. However, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions.

An important class of algorithms used for searching and filtering information relies on pattern matching. Pattern matching is one of the core operations used by applications such as traffic classification, intrusion detection systems, virus scanners, spam filters, and content monitoring filters. Unfortunately, this core and powerful operation has significant overheads in terms of both memory space and CPU cycles, as every byte of the input has to be processed and compared against a large set of patterns.

A possible solution to the increased overhead introduced by pattern matching is the use of hardware platforms, although with a high and often prohibitive cost for many organizations.

Specialized devices, such as ASICs and FPGAs, can be used to inspect an input data stream and offload the CPU. Both are very efficient and perform well; however, they are complex to program and modify, and they are usually tied to a specific implementation. The advent of commodity massively parallel architectures, such as modern graphics processors, is a compelling alternative for inexpensively removing the burden of computationally intensive operations from the CPU. The data-parallel execution model of modern graphics processing units (GPUs) is a perfect fit for the implementation of high-performance pattern matching algorithms. A GPU-based pattern matching engine enables content scanning at multi-gigabit rates and allows for real-time inspection of the large volume of data transferred over modern network links. The original purpose of the graphics processor is computer graphics applications, such as 3D processing for games. The demands of 3D animation drive graphics processors to do real-time, smooth, and vivid rendering jobs. As a result, the computational power of modern graphics processors has increased dramatically in recent years, even surpassing that of general-purpose processors in floating point computation. This computational power derives from parallel computing across multiple stream processors, and it has also caught the eye of developers outside the computer game and graphics fields. The development of non-graphics applications has been under way for a while; this kind of application is called General Purpose Computation on Graphics Processing Units (GPGPU). As GPUs become increasingly powerful and ubiquitous, researchers have begun exploring ways to tap their power for non-graphics, general-purpose (GPGPU) applications. The main reason behind this evolution is that GPUs are specialized for computationally intensive and highly parallel operations - required for graphics rendering - and are therefore designed such that more transistors are devoted to data processing rather than data caching and flow control. The release of software development kits (SDKs) from big vendors like NVIDIA and ATI has started a trend of using GPUs as computational units to offload the CPU.


    Figure 1: CPU and GPU architecture

2 Issues in High Performance Computing Architecture

    2.1 GPU Architecture

Driven by the insatiable market demand for real-time, high-definition 3D graphics, the programmable Graphics Processing Unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational horsepower and very high memory bandwidth. The reason behind the discrepancy in floating-point capability between the CPU and the GPU is that the GPU is specialized for compute-intensive, highly parallel computation, and is therefore designed such that more transistors are devoted to data processing rather than data caching and flow control. More specifically, the GPU is especially well-suited to address problems that can be expressed as data-parallel computations - the same program is executed on many data elements in parallel - with high arithmetic intensity, the ratio of arithmetic operations to memory operations. Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control, and because it is executed on many data elements and has high arithmetic intensity, the memory access latency can be hidden with calculations instead of big data caches. The GPU consists of a large number of shader processors and conceptually operates as a Single Instruction Multiple Data (SIMD) device. Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level parallelism for some time. Current NVIDIA GPUs are many-core processor chips, scaling from 8 to 240 cores. This degree of hardware parallelism reflects the fact that GPU architectures evolved to fit the needs of real-time computer graphics, a problem domain with tremendous inherent parallelism.

Modern graphics processing units (GPUs) have evolved into massively parallel computational devices, containing hundreds of processing cores that can be used for general-purpose computing beyond graphics rendering. The fundamental difference between CPUs and GPUs comes from how transistors are assigned to different tasks in the processor. A GPU devotes most of its die area to a large array of Arithmetic Logic Units (ALUs). In contrast, most CPU resources serve a large cache hierarchy and a control plane for sophisticated acceleration of a single thread. The architecture of modern GPUs [6] is based on a set of multiprocessors, each of which contains a set of stream processors operating on SIMD (Single Instruction Multiple Data) programs. For this reason, a GPU is ideal for parallel applications requiring high memory bandwidth to access different sets of data. Both NVIDIA [10] and AMD provide convenient programming libraries to use their GPUs as general purpose processors (GPGPU), capable of executing a very high number of threads in parallel. A unit of work issued by the host computer to the GPU is called a kernel. A typical GPU kernel execution takes the following four steps: (i) the DMA controller transfers input data from host memory to GPU memory; (ii) a host program instructs the GPU to launch the kernel; (iii) the GPU executes threads in parallel; and (iv) the DMA controller transfers the result data back to host memory from device memory. A kernel is executed on the device as many different threads organized in thread blocks, and each multiprocessor executes one or more thread blocks. A fast shared memory is managed explicitly by the programmer among thread blocks. The global, constant, and texture memory spaces can be read from or written to by the host, are persistent across kernel launches by the same application, and are optimized for different memory usages.
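As a minimal CUDA sketch of these four steps (the kernel body, buffer size, and launch configuration are illustrative assumptions, not taken from the report):

#include <cuda_runtime.h>
#include <stdlib.h>

// Toy kernel: each thread processes one byte of the input.
__global__ void process(const unsigned char *in, unsigned char *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] ^ 0xff;
}

int main(void) {
    const int n = 1 << 20;
    unsigned char *h_in  = (unsigned char *)calloc(n, 1);
    unsigned char *h_out = (unsigned char *)malloc(n);
    unsigned char *d_in, *d_out;
    cudaMalloc((void **)&d_in, n);
    cudaMalloc((void **)&d_out, n);
    cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice);    // (i) host-to-device DMA
    process<<<(n + 255) / 256, 256>>>(d_in, d_out, n);    // (ii) host launches kernel
    cudaDeviceSynchronize();                              // (iii) threads run in parallel
    cudaMemcpy(h_out, d_out, n, cudaMemcpyDeviceToHost);  // (iv) device-to-host DMA
    cudaFree(d_in); cudaFree(d_out); free(h_in); free(h_out);
    return 0;
}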

    2.2 Pattern Matching

String searching and regular expression matching are two of the most common pattern matching operations [2]. In string searching, a set of fixed strings is searched for in a body of text. Regular expressions, on the other hand, offer significant advantages, providing flexibility and expressiveness in specifying the context of each match. In addition to matching strings of text, they offer wild-card characters, logical operators, repeating patterns, range constraints, and recursive forms. Thus, a single regular expression can cover a large number of individual string representations. Both string patterns and regular expressions can be matched efficiently by compiling the patterns into a Deterministic Finite Automaton (DFA). A sequence of bytes can then be processed using O(n) operations irrespective of the number of patterns, which is very efficient in terms of speed. This is achieved because at any state, every possible input byte leads to at most one new state. To take advantage of the extreme thread-level parallelism of modern GPUs, the DFA-based matching process can be parallelized by splitting the input data stream into different chunks. Each chunk is scanned independently by a different thread using the same automaton, which is stored in device memory. Although threads use the same automaton, each thread maintains its own state, eliminating any need for communication between them.
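A sketch of this chunked scan as a CUDA kernel (the flattened table layout, the separate final-state bitmap, and all names are assumptions; match reporting is reduced to a per-chunk flag):

#include <stddef.h>

// Each thread scans one chunk with its own automaton state; the state
// table is shared, read-only, and resides in device memory.
__global__ void dfa_scan(const int *table,             // |Q| x 256 next-state table
                         const unsigned char *is_final,
                         const unsigned char *data,
                         int chunk_len, int n_chunks, int *matched) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_chunks) return;
    const unsigned char *chunk = data + (size_t)t * chunk_len;
    int state = 0;                                     // per-thread DFA state
    for (int i = 0; i < chunk_len; i++) {
        state = table[state * 256 + chunk[i]];
        if (is_final[state]) matched[t] = 1;           // record a match for this chunk
    }
}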


Name           Reg. Ex.  Designation
Epsilon        ε         Denoting the set {ε}, containing only the empty string.
Character      a         Denoting the set {a}, for some character a.
Concatenation  RS        Denoting the set {αβ | α in R and β in S}; e.g., {ab}{d,ef} = {abd, abef}.
Alternation    R|S       Denoting the set union of R and S; e.g., {ab}|{ab,d,ef} = {ab,d,ef}.
Kleene star    R*        Denoting the smallest superset of R that contains ε and is closed under string concatenation; this is the set of all strings that can be made by concatenating zero or more strings in R.

Table 1: Regular expression operations.

    2.3 Regular Expression and DFA

A regular expression is a very convenient form of representing a set of strings. Regular expressions are usually used to give a concise description of a set of patterns, without having to list all of them. For example, the expression (a|b)*aa represents the infinite set {aa, aaa, baa, ...}, which is the set of all strings over the characters a and b that end in aa. Formally, a regular expression is built from the operations described in Table 1.

A deterministic finite automaton (DFA) represents a finite state machine that recognizes a regular expression. A finite automaton is represented by the 5-tuple (Σ, Q, T, q0, F), where Σ is the alphabet, Q is the set of states, T is the transition function, q0 is the initial state, and F is the set of final states. Given an input string I0 I1 I2 ... IN, a DFA processes the input as follows: at step 0, the DFA is in state s0 = q0; at each subsequent step i, the DFA transitions into state si = T(s(i-1), Ii). Because each transition is unique for every state and character combination, no backtracking is needed during the matching phase.
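A minimal sequential sketch of this processing loop (the flattened row-major table layout and the names are assumptions):

#include <stddef.h>

// Runs the DFA over the input: one table lookup per byte, O(n) overall.
// T is a |Q| x 256 next-state table, flattened row-major; q0 = 0.
static int dfa_run(const int *T, const unsigned char *in, size_t n) {
    int s = 0;                       // s0 = q0
    for (size_t i = 0; i < n; i++)
        s = T[s * 256 + in[i]];      // si = T(s(i-1), Ii)
    return s;                        // state after consuming the whole input
}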

A DFA accepts a string if, starting from the initial state and moving from state to state, it reaches a final state. The transition function can be represented by a two-dimensional table T, which defines the next state T[s, c] for a state s and a character c. For example, the regular expression (abc+)+ is recognized by the DFA shown in Figure 2. The automaton has four states; state 0 is the start state, and state 3 is the only final state.

Figure 2: The DFA state machine

Figure 3: State transition table

Many existing tools that use regular expressions, such as grep(1), flex(1), and pcre(3), have support for converting regular expressions into DFAs. The most common approach is to first compile them into non-deterministic finite automata (NFAs) and then convert them into DFAs. Each regular expression can be converted into an NFA using the Thompson algorithm. The generated NFA is then converted to a DFA incrementally, using the subset construction algorithm. The basic idea of subset construction is to define a DFA in which each state is a set of states of the corresponding NFA. The resulting DFA achieves O(1) computational cost for each consumed character of the input during the matching phase. Each DFA is represented as a two-dimensional state table array that is mapped onto the memory space of the GPU. The dimensions of the array are equal to the number of states and the size of the alphabet, respectively. Each cell contains the next state to move to, as well as an indication of whether the state is a final state or not.
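As a sketch, one way to encode such a cell (packing the final-state indication into the top bit of a 32-bit word is an assumption, not necessarily the exact layout used in [2]):

#include <stdint.h>

// Pack the next state and a final-state flag into one 32-bit table cell.
static inline uint32_t pack_cell(uint32_t next, int is_final) {
    return next | (is_final ? 0x80000000u : 0u);
}
static inline uint32_t cell_next(uint32_t cell)     { return cell & 0x7fffffffu; }
static inline int      cell_is_final(uint32_t cell) { return cell >> 31; }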

    3 Optimizations

3.1 Based on minimizing the communication between CPU and GPU

3.1.1 Transferring Network Packets to the GPU

The first thing to consider is the transfer of the packets to the memory space of the GPU [2]. A major bottleneck for this operation is the extra overhead caused by the PCIe bus that interconnects the graphics card with the base system. The PCIe bus suffers many overheads, especially for small data transfers; with a large buffer, the per-byte cost of transferring to the device is minimized. As a consequence, network packets are transferred to the memory space of the GPU in batches. A separate packet buffer is allocated to collect the incoming packets. Whenever the buffer gets full, all packets are transferred to the GPU in one operation. The format of the packet buffer plays a significant role in the overall packet processing throughput. First, it affects the transfer overheads, as small data transfer units achieve a reduced bandwidth due to PCIe and DMA overheads. Second, the packet buffer scheme affects the parallelization approach, i.e., the distribution of the network packets to the stream processors. The simpler the buffer format, the better the parallelization scheme. There are two different approaches for collecting packets. The first uses fixed buckets [2], as shown in Figure 4, for storing the network packets, and has previously been adopted in similar works. The second approach uses a more sophisticated, index-based [2] scheme: instead of pre-allocating a different fixed-size bucket for each packet, all packets are stored back-to-back in a serial packet buffer.

A separate index is maintained that keeps pointers to the corresponding offsets in the buffer, as shown in Figure 5. Each thread reads the corresponding packet offset independently, using its own thread number, without any lock or synchronization mechanism. In order to avoid an extra transaction over the PCIe bus, the index array is stored at the beginning of the packet buffer. The packet buffer and the indices are transferred to the GPU at once, adding a minor transfer cost, since the size of the index array is quite small relative to the size of the packet buffer.
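Sketches of the two layouts (all sizes and names are illustrative assumptions):

#include <stdint.h>

#define MAX_PKTS 4096

// Fixed buckets (Figure 4): one pre-allocated slot per packet; thread t
// scans slot t, at the cost of padding unused bucket space.
struct bucket_buffer {
    unsigned char slot[MAX_PKTS][1536];   // 1536 >= max Ethernet frame
    uint16_t      len[MAX_PKTS];
};

// Index-based buffer (Figure 5): the offset index sits at the head of the
// buffer, so one DMA transfer moves the index and the payloads together.
struct indexed_buffer {
    uint32_t      offset[MAX_PKTS];       // byte offset of packet t's payload
    uint16_t      len[MAX_PKTS];
    unsigned char payload[1];             // packets stored back-to-back
};

// Device side: thread t locates its packet with no locks or synchronization.
__device__ const unsigned char *my_packet(const struct indexed_buffer *b) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    return b->payload + b->offset[t];
}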


    Figure 4: Packets are stored to different buckets

Figure 5: Packets are stored sequentially and indexed by a separate directory.


... states of the automaton. A modification to the standard DFA that can be represented much more compactly is available. These modifications are based on a technique used in the Aho-Corasick string matching algorithm; Kumar et al. [11] extend this technique and apply it to DFAs obtained from regular expressions, rather than simple string sets.


    Figure 6: Data transfer rate between host and device (Gbit/s).

    4 Classification of HPCA for IDS

4.1 Based on minimizing the communication cost incurred between CPU and GPU

There are various algorithms available for this optimization. The main aim is to reduce the number of transfers of packets - to be scanned - between the CPU and the GPU. The first solution is to transfer the packets in batches rather than transferring them individually. The experiments carried out by Giorgos Vasiliadis et al. [2] show that the transfer rate increases with the buffer size; see Figure 6.

As a consequence, network packets are transferred to the memory space of the GPU in batches: a separate packet buffer is allocated to collect the incoming packets, and whenever the buffer gets full, all packets are transferred to the GPU in one operation. The format of the packet buffer affects both the transfer overheads and the parallelization approach. The two approaches for collecting packets, fixed buckets and the index-based scheme, are described in detail in Section 3.1.1.

    4.2 Based on minimizing the computation cost

Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large but finite set of strings. However, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application.

In [11], S. Kumar et al. introduced a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space requirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Their approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space requirements, they describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Their architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput.

5 Algorithms for minimizing computational cost over GPU

Traditionally, deep packet inspection has been limited to comparing packet content to sets of strings. State-of-the-art systems, however, are replacing string sets with regular expressions, due to their increased expressiveness. The memory needed to represent a DFA is determined by the product of the number of states and the number of transitions from each state. For an ASCII alphabet, each state has 256 outgoing edges.

In [11], the authors introduced a highly compact DFA representation. Their approach reduces the number of transitions associated with each state. The main observation is that groups of states in a DFA often have identical outgoing transitions, and this duplicate information can be used to reduce memory requirements. For example, suppose there are two states s1 and s2 that make transitions to the same set of states, {S}, for some set of input characters, {C}. We can eliminate these transitions from one state, say s1, by introducing a default transition from s1 to s2 that is followed for all the characters in {C}. Essentially, s1 now only maintains unique next states for those transitions not common to s1 and s2, and uses the default transition to s2 for the common transitions. A DFA augmented with such default transitions is referred to as a Delayed Input DFA (D2FA).
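A sketch of how matching follows default transitions at lookup time (array names are assumptions; a missing labeled edge is encoded as -1):

// Next-state lookup in a D2FA: explicit transitions are consulted first;
// on a miss, the default transition is followed without consuming input.
static int d2fa_next(const int *labeled,   // |Q| x 256, -1 where no labeled edge
                     const int *dflt,      // default transition per state
                     int state, unsigned char c) {
    while (labeled[state * 256 + c] < 0)
        state = dflt[state];               // follow default edges, no input consumed
    return labeled[state * 256 + c];
}

The trade-off is visible here: a single input byte may now cost several memory accesses along a chain of default transitions, which is why the construction below bounds the length of such chains.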

    5.1 Converting DFAs to D2FAs

Although we are in general interested in any equivalent D2FA for a given DFA, there is no general procedure for synthesizing a D2FA directly. Consequently, the procedure constructs a D2FA by transforming an ordinary DFA, introducing default transitions in a systematic way while maintaining equivalence. The procedure does not change the state set or the set of matching patterns for a given state; hence, equivalence can be maintained by ensuring that the destination state function δ(x) does not change.

Figure 7: Automata which recognize the expressions a+, b+c+, and c+d+

Consider two states u and v, where both u and v have a transition labeled by the symbol a to a common third state w, and no default transition. If we introduce a default transition from u to v, we can eliminate the a-transition from u without affecting the destination state function δ(x). A slightly more general version of this observation is stated below.

Lemma 1. Consider a D2FA with distinct states u and v, where u has a transition labeled by the symbol a and no outgoing default transition. If δ(a, u) = δ(a, v), then the D2FA obtained by introducing a default transition from u to v and removing the transition from u to δ(a, u) is equivalent to the original D2FA.

Note that, by the same reasoning, if there are multiple symbols a for which u has a labeled outgoing edge and for which δ(a, u) = δ(a, v), the introduction of a default edge from u to v allows us to eliminate all of these edges. The procedure for converting a DFA to a smaller D2FA applies this transformation repeatedly; hence, the equivalence of the initial and final D2FAs follows by induction. The D2FA on the right side of Figure 7 was obtained from the DFA on the left by applying this transformation to the state pairs (2,1), (3,1), (5,1), and (4,1). For each state we can have only one default transition, so it is important to choose default transitions carefully to obtain the largest possible reduction. We also restrict the choice of default transitions to ensure that there is no cycle defined by default transitions. With this restriction, the default transitions define a collection of trees with the transitions directed towards the tree roots, and we can identify the set of transitions that gives the largest space reduction by solving a maximum weight spanning tree problem in an undirected graph, which we refer to as the space reduction graph. The space reduction graph for a given DFA is a complete, undirected graph, defined on the same vertex set as the DFA. The edge joining a pair of vertices (states) u and v is assigned a weight w(u,v) that is one less than the number of symbols a for which δ(a, u) = δ(a, v). Notice that the spanning tree of the space reduction graph that corresponds to the default transitions for the D2FA in Figure 7 has a total weight of 3+3+3+2 = 11, which is the difference in the number of transitions in the two automata. Also, note that this is a maximum weight spanning tree for this graph.
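As a sketch, the edge weight in the space reduction graph can be computed directly from the DFA's transition table (the flattened row-major layout is an assumption):

// w(u, v) = (number of symbols on which u and v agree) - 1.
static int edge_weight(const int *table, int u, int v) {
    int same = 0;
    for (int c = 0; c < 256; c++)
        if (table[u * 256 + c] == table[v * 256 + c]) same++;
    return same - 1;
}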

procedure refined_max_span_tree(graph G = (V, W); modifies set edge default)
    vertex u, v; set edge edges; set edge weight_set[1..255];
    default := {}; edges := W;
    for each edge (u, v) in edges do
        if weight(u, v) > 0 then
            add (u, v) to weight_set[weight(u, v)];
        fi
    od
    for integer i := 255 downto 1 do
        while weight_set[i] is not empty do
            select and remove the (u, v) from weight_set[i] which leads to the
                smallest growth in the diameter of the default transition tree;
            if vertices u and v belong to different default trees then
                if default ∪ {(u, v)} maintains the diameter bound then
                    default := default ∪ {(u, v)};
                fi
            fi
        od
    od
end
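For comparison, here is a runnable sketch of the unrefined baseline: a plain Kruskal-style maximum weight spanning tree over the space reduction graph using union-find, omitting the diameter-bound refinement of the procedure above (all types and names are assumptions):

#include <algorithm>
#include <vector>

struct Edge { int u, v, w; };            // space reduction graph edge

static int find_root(std::vector<int> &parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   // path halving
        x = parent[x];
    }
    return x;
}

// Returns the edges chosen as default transitions: heaviest first, joining
// distinct default trees only, and ignoring the diameter bound of [11].
std::vector<Edge> max_span_tree(std::vector<Edge> edges, int n_states) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge &a, const Edge &b) { return a.w > b.w; });
    std::vector<int> parent(n_states);
    for (int s = 0; s < n_states; s++) parent[s] = s;
    std::vector<Edge> chosen;
    for (const Edge &e : edges) {
        int ru = find_root(parent, e.u), rv = find_root(parent, e.v);
        if (ru != rv && e.w > 0) {       // positive saving, different trees
            parent[ru] = rv;
            chosen.push_back(e);
        }
    }
    return chosen;
}

Without the diameter bound, default-transition chains can grow long, increasing the worst-case number of memory accesses per input byte; the refinement in the procedure above exists precisely to cap that chain length.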


    6 Conclusion

As far as the exploitation of GPUs is concerned, we can focus on utilizing multiple GPUs instead of a single one. Modern motherboards support dual GPUs, and there are PCI Express backplanes that support multiple slots. We believe that building such clusters of GPUs will be able to support multi-gigabit-per-second intrusion detection systems.

More development is needed in memory space efficiency; solutions are required to convert the state transition table into efficient data structures. A new representation for regular expressions, called the Delayed Input DFA (D2FA), significantly reduces the space requirements of a DFA by replacing its multiple transitions with a single default transition. Since the construction of an efficient D2FA from a DFA is NP-hard, heuristics are applied for D2FA construction that provide deterministic performance guarantees. Results suggest that a D2FA constructed from a DFA can reduce memory space requirements by more than 95%, so that the entire automaton can fit in on-chip memories. Since embedded memories provide ample bandwidth, further space reductions are possible by splitting the regular expressions into multiple groups and creating a D2FA for each of them.

The distribution of pattern signatures in GPU memory, in particular the per-signature-per-thread-block technique, also needs to be addressed.


    References

[1] Weinsberg, Y., et al. High performance string matching algorithm for a network intrusion prevention system (NIPS). IEEE, 2006.

[2] Vasiliadis, G., Polychronakis, M., and Ioannidis, S. Parallelization and characterization of pattern matching using GPUs. IEEE, pp. 216-225.

[3] Tumeo, A., Villa, O., and Sciuto, D. Efficient pattern matching on GPUs for intrusion detection systems. ACM, 2010.

[4] Vasiliadis, G., et al. Gnort: High performance network intrusion detection using graphics processors. Springer, 2008, pp. 116-134.

[5] Huang, N.F., et al. A GPU-based multiple-pattern matching algorithm for network intrusion detection systems. IEEE, 2008, pp. 62-67.

[6] Dharmapurikar, S. and Lockwood, J. Fast and scalable pattern matching for content filtering. IEEE, 2005.

[7] Liu, R.T., et al. A fast pattern matching algorithm for network processor-based intrusion detection system. IEEE, 2004.

[8] Taherkhani, M.A. and Abbaspour, M. An efficient hardware architecture for deep packet inspection in hybrid intrusion detection systems. IEEE, 2009.

[9] Vasiliadis, G., et al. Regular expression matching on graphics hardware for intrusion detection. Springer, 2009.

[10] NVIDIA CUDA, http://www.nvidia.com/object/cuda_home_new.html

[11] Kumar, S., et al. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. ACM SIGCOMM Computer Communication Review, 36(4):339-350, 2006.
