Krzysztof Fabjański Common string pattern searching.
Presentation layout:
➢ methods of network traffic collection and its
representation
➢ process of signature generation
➢ summary and conclusions
Methods of network traffic collection and its representation
collecting network traffic:
➢ PC + tcpdump
➢ PC + snort
➢ honeynet
➢ nepenthes (malware collection)
➢ arakis
representation of network traffic:
➢ tcpdump format (with payload)
Sample tcpdump with payload
Process of signature generation
➢ identification of the attack
➢ classification of the threat
➢ classification of the vulnerability
➢ network traffic representation
➢ proposal of the signature
➢ normalization and validation
➢ introduction of the new signature to the rule set
Area of interest.
Problem of a huge amount of information
Loading the web site sport.onet.pl took 3 seconds. During that time tcpdump captured 195 packets. The dump file consisted of 5666 lines and its size was 415155 bytes.
Problem of similarities
Should we have 3 larger signatures:
AA|C|HH
KK|WW|II
DD|LL|DD
or 1 common signature:
ABC|C
[Figure: example strings built from two-letter blocks (AA, BB, CC, DD, EE, FF, HH, II, JJ, KK, LL, WW); the strings share the run AA BB CC and a later CC]
Different types of analysis (for and against)
Offline (DBSCAN):
➢ good precision
➢ low efficiency
➢ time-consuming
Online (suffix tree algorithm):
➢ good precision
➢ good efficiency
➢ very fast
Suffix Trees
Suffix Trees are universal data structures useful in a variety of string processing problems.
Bioinformatics:
➢ Align entire genomes
➢ Detect repeats in DNA
➢ Sequence homology
Traditional Text Applications:
➢ Finding the largest palindrome
➢ Finding the longest common substring in a set
➢ Exact and approximate substring matching
Building the Suffix Tree with the naive algorithm
[Figure: suffix tree for the string abcabd$, with edges ab (leading to cabd$ and d$), b (leading to cabd$ and d$), cabd$, d$ and $]
Running time O(n²)
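The naive construction can be sketched as follows: every suffix is inserted from the root, and an edge is split where the suffix stops matching its label. This is a minimal illustration only; the names `Node`, `insert_suffix` and `build_naive` are hypothetical, and the input is assumed to end with a unique terminator such as `$`.

```python
class Node:
    def __init__(self):
        # first character of the edge label -> (edge label, child node)
        self.children = {}


def insert_suffix(root, suffix):
    """Insert one suffix, splitting an edge where the match ends."""
    node = root
    while suffix:
        first = suffix[0]
        if first not in node.children:
            node.children[first] = (suffix, Node())   # new leaf edge
            return
        label, child = node.children[first]
        k = 0                                          # common prefix length
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k < len(label):                             # mismatch inside the edge
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[first] = (label[:k], mid)
            child = mid
        node = child
        suffix = suffix[k:]


def build_naive(text):
    """O(n^2) construction: n suffixes, each inserted in O(n) time."""
    root = Node()
    for i in range(len(text)):
        insert_suffix(root, text[i:])
    return root


tree = build_naive("abcabd$")
print(sorted(tree.children))   # first characters of the edges leaving the root
```

Each of the n insertions may scan up to n characters, which is where the quadratic running time comes from.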
Building the Suffix Tree with the Ukkonen algorithm, O(n)
➢ online algorithm
➢ uses suffix links, which link the node for xα to the node for α
[Figure: suffix tree for the string abcabd$ with a suffix link drawn from node xα to node α]
Building the Suffix Tree with the Ukkonen algorithm - pseudocode
    create a root
    add a branch and leaf with S[1] label
    LastExtension = 1
    for Phase = 2 to length[S]
    do
        for Extension = LastExtension to Phase
        do
            find the end of the path with S[Extension .. Phase - 1] label
            extend the path
            if rule for extension == 3 then end the loop
        done
        LastExtension = Extension
    done
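A runnable sketch of the same idea is given below. It is not the presenter's code: it uses the common "active point" formulation of Ukkonen's algorithm instead of explicit Phase and Extension counters, and the names `Node`, `build_suffix_tree` and `contains` are hypothetical. Edge labels are stored as index pairs into the text, and `end=None` marks a leaf whose edge grows with every phase.

```python
class Node:
    def __init__(self, start, end=None):
        self.start = start      # incoming edge label is text[start:end]
        self.end = end          # None = open leaf edge (runs to end of text)
        self.children = {}
        self.link = None        # suffix link (internal nodes only)


def build_suffix_tree(text):
    root = Node(0, 0)
    node, edge, length = root, 0, 0        # active point
    remainder = 0                          # suffixes still to be inserted
    for pos, ch in enumerate(text):        # one phase per character
        remainder += 1
        last_internal = None
        while remainder > 0:               # extensions of this phase
            if length == 0:
                edge = pos
            first = text[edge]
            child = node.children.get(first)
            if child is None:
                node.children[first] = Node(pos)          # new leaf
                if last_internal is not None:
                    last_internal.link = node
                    last_internal = None
            else:
                end = child.end if child.end is not None else pos + 1
                edge_len = end - child.start
                if length >= edge_len:                    # walk down the edge
                    node, edge, length = child, edge + edge_len, length - edge_len
                    continue
                if text[child.start + length] == ch:      # rule 3: already there
                    if last_internal is not None and node is not root:
                        last_internal.link = node
                        last_internal = None
                    length += 1
                    break
                split = Node(child.start, child.start + length)  # split the edge
                node.children[first] = split
                split.children[ch] = Node(pos)
                child.start += length
                split.children[text[child.start]] = child
                if last_internal is not None:
                    last_internal.link = split
                last_internal = split
            remainder -= 1
            if node is root and length > 0:
                length -= 1
                edge = pos - remainder + 1
            elif node is not root:
                node = node.link or root                  # follow the suffix link
    return root


def contains(root, text, pattern):
    """True if pattern labels a path starting at the root."""
    node, i = root, 0
    while i < len(pattern):
        child = node.children.get(pattern[i])
        if child is None:
            return False
        end = child.end if child.end is not None else len(text)
        label = text[child.start:end]
        chunk = pattern[i:i + len(label)]
        if not label.startswith(chunk):
            return False
        i += len(chunk)
        node = child
    return True
```

Usage: `root = build_suffix_tree("abcabd$")`, then `contains(root, "abcabd$", "cab")` checks whether `cab` occurs in the text.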
Comparison of two strings s1 and s2 in steps
➢ building a suffix tree for s1
➢ finding the longest match of the suffixes of s2 on the suffix
tree of s1
➢ returning the longest part of s2 matched on the suffix tree
of s1
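The steps above can be sketched as follows. To keep the example short, an uncompressed suffix trie (nested dictionaries, quadratic size) stands in for the suffix tree of s1; a linear-size tree built with Ukkonen's algorithm would be searched the same way. The function names are hypothetical.

```python
def build_suffix_trie(s):
    """Stand-in for the suffix tree of s: one dict level per character."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def longest_common_substring(s1, s2):
    trie = build_suffix_trie(s1)
    best = ""
    for i in range(len(s2)):            # try every suffix of s2
        node, j = trie, i
        while j < len(s2) and s2[j] in node:
            node = node[s2[j]]          # follow the match as far as it goes
            j += 1
        if j - i > len(best):
            best = s2[i:j]
    return best


print(longest_common_substring("xabcdy", "zzabcdq"))
```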
Comparison of more than two strings using a generalized suffix tree
➢ concatenation of the strings {s1,s2,...,sn}
➢ building a suffix tree for the concatenated string s using
the Ukkonen approach
➢ returning the substring which is common to {s1,s2,...,sn}
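A minimal sketch of the concatenation approach follows. It makes two assumptions that are not in the slides: the strings are joined with separator characters from the Unicode private-use area (assumed absent from the input), and a plain suffix trie stands in for the tree built with Ukkonen's algorithm. Each trie node records which of the original strings reach it; the function name is hypothetical.

```python
def longest_common_substring_all(strings):
    # One unique separator per string (assumed not to occur in the input).
    seps = [chr(0xE000 + k) for k in range(len(strings))]
    text = "".join(s + sep for s, sep in zip(strings, seps))
    # owner[i] = index of the original string that position i belongs to
    owner = [sid for sid, s in enumerate(strings) for _ in range(len(s) + 1)]
    sep_set = set(seps)

    root = ({}, set())                   # node = (children, owning strings)
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch in sep_set:            # never match across a separator
                break
            node = node[0].setdefault(ch, ({}, set()))
            node[1].add(owner[i])

    best = ""
    stack = [(root, "")]
    while stack:                         # visit nodes shared by all strings
        (children, _), path = stack.pop()
        for ch, child in children.items():
            if len(child[1]) == len(strings):
                if len(path) + 1 > len(best):
                    best = path + ch
                stack.append((child, path + ch))
    return best


print(longest_common_substring_all(["abc", "aba", "cab"]))
```

When several common substrings share the maximum length, this sketch returns whichever one the traversal reaches first.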
Common string pattern searching – main assumptions
➢ online string comparison should require O(n) running time
➢ should find all possible common substrings
➢ should cluster the strings into sets with common substrings
Common string pattern searching proposition
➢ generalized suffix tree as the main structure (strings are
added in online mode, without concatenation)
➢ additional variables describing the weight of a particular
node (number of matches)
➢ additional structure: a list of strings with numbers
denoting the starting position of the suffix in those strings
(possible use of hash tables)
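An example for the strings abc$, aba$ and cab$:
[Figure: generalized suffix tree of the three strings; each node carries a weight followed by {start position, string} entries]
Node annotations (weight, then {start position string}):
1 {3 abc$}   1 {3 aba$}   1 {4 cab$}
3 {1 abc$} {1 aba$} {2 cab$}
3 {2 abc$} {2 aba$} {3 cab$}
2 {3 abc$} {1 cab$}
3 {4 abc$} {4 aba$} {4 cab$}
1 {3 abc$}   1 {3 aba$}   1 {4 cab$}   1 {4 abc$}   1 {2 cab$}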
Expected result: ab | $
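The example can be reproduced with a small sketch of the proposed structure. It is an illustration under assumptions, not the presenter's implementation: an uncompressed trie stands in for the generalized suffix tree, `occ` plays the role of the per-node list of {start position, string} entries (1-based positions), and `weight` is the number of distinct strings that reach the node. All names are hypothetical.

```python
from collections import defaultdict


class Node:
    def __init__(self):
        self.children = {}
        self.occ = defaultdict(set)   # string index -> 1-based start positions

    @property
    def weight(self):
        return len(self.occ)          # number of distinct strings matched


def add_string(root, sid, s):
    """Add every suffix of s online; no concatenation is needed."""
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.children.setdefault(ch, Node())
            node.occ[sid].add(i + 1)


def common_patterns(root, n):
    """Substrings present in all n strings, keeping only the maximal ones."""
    found = []
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        for ch, child in node.children.items():
            if child.weight == n:
                found.append(path + ch)
                stack.append((child, path + ch))
    return sorted(p for p in found
                  if not any(p != q and p in q for q in found))


strings = ["abc$", "aba$", "cab$"]
root = Node()
for sid, s in enumerate(strings):
    add_string(root, sid, s)
print(common_patterns(root, len(strings)))
```

For the three strings of the example this yields `$` and `ab`, and the node for `ab` holds start positions 1, 1 and 2, as in the annotation `3 {1 abc$} {1 aba$} {2 cab$}`.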
Thank you for your attention