Sequence Classification Using Statistical Pattern Recognition

José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis

Computer Science Department, Universidad Carlos III de Madrid

Avda. de la Universidad, 30. 28911 Leganés, Spain

{jiglesia, ledezma, masm}@inf.uc3m.es


Transcript of Sequence Classification Using Statistical Pattern Recognition

Page 1: Sequence Classification Using Statistical Pattern Recognition


Sequence Classification Using Statistical Pattern Recognition

José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis

Computer Science Department Universidad Carlos III de Madrid

Avda. de la Universidad, 30. 28911 Leganés, Spain

{jiglesia, ledezma, masm}@inf.uc3m.es

Page 2: Sequence Classification Using Statistical Pattern Recognition

Outline

• Motivation and Introduction
• Sequence classification
• Our approach
  – Library Creation
  – Classification
• Target Environment Description
• Experiments and Results
• Conclusions and Future Works

Page 3: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Motivation and Introduction)

Page 4: Sequence Classification Using Statistical Pattern Recognition

Motivation

Opponent behavior modelling / classification (environment: soccer simulation domain)

[Diagram labels: Opponent Modeling; Pattern Recognition; Off-Line Analysis; No-Pattern LogFile; Pattern LogFile; Base Strategy; Pattern; Recognized Patterns; On-Line Comparing Method; Pattern Detection; On-Line Detection; Environment Information; Advice to Players; RoboCup Soccer Server; Pattern Recognized]

Page 5: Sequence Classification Using Statistical Pattern Recognition

Introduction

Behavior Classification

Behavior as a sequence of elements

Sequence Classification

Sequence: “set of elements ordered so that they can be labelled with the positive integers” (Merriam-Webster Dictionary)

Page 6: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Sequence classification)

Page 7: Sequence Classification Using Statistical Pattern Recognition

Sequence classification

• Given:
  – Classes C = {c1, c2, …, cn}
  – Sequence E = {e1, e2, …, en}

• Determine: which class ci ∈ C the sequence E belongs to.

Page 8: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Our approach)

Page 9: Sequence Classification Using Statistical Pattern Recognition

Our approach

[Diagram with two phases.

Library Creation: Sequence 1 (Class 1), Sequence 2 (Class 2), …, Sequence n (Class n), illustrated with UNIX commands (pwd, fs, fg, vi, man, ls, …, finger, more, ls, ...), produce Pattern 1, Pattern 2, Pattern 3, which are stored in the Pattern Library.

Classification: the sequence to classify (vi, more, ls, …) produces the pattern to classify; Compare_Patterns compares it with each pattern in the library (On-Line Sequence Classification). Classification Result: SEQUENCE CLASS.]

Page 10: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Our approach, Library Creation)

Page 11: Sequence Classification Using Statistical Pattern Recognition

Library Creation

Trie (retrieval) data structure:

• Special search tree used for storing elements and their prefixes.
• Every node:
  – represents an element
  – stores useful information (times appeared, …)

Page 12: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sequence to insert initially in the trie: {pwd vi pwd vi pwd ls}

Page 13: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sequence to insert initially in the trie: {pwd vi pwd vi pwd ls}

Sub-sequence length: 3

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}
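The split shown on this slide can be sketched as follows. This is only an illustration, assuming the sequence is cut into consecutive, non-overlapping sub-sequences of a fixed length. The helper name `split_into_subsequences` is hypothetical, and the slide does not say what happens to a shorter trailing remainder (here it is simply kept).

```python
def split_into_subsequences(sequence, length):
    """Cut a sequence into consecutive, non-overlapping chunks of `length` elements."""
    if length <= 0:
        raise ValueError("length must be positive")
    # Assumption: a trailing chunk shorter than `length` is kept as-is.
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


commands = ["pwd", "vi", "pwd", "vi", "pwd", "ls"]
subsequences = split_into_subsequences(commands, 3)
print(subsequences)
```

With the slide's sequence and a sub-sequence length of 3, this yields the two sub-sequences {pwd vi pwd} and {vi pwd ls}.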

Page 14: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls} (highlighted: pwd vi pwd)

Root

Page 15: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls} (highlighted: pwd vi pwd)

Root
  pwd [1]
    vi [1]
      pwd [1]

Page 16: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls} (highlighted: vi pwd)

Root
  pwd [1]
    vi [1]
      pwd [1]
  vi [1]
    pwd [1]

Page 17: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls} (highlighted: pwd)

Root
  pwd [2]
    vi [1]
      pwd [1]
  vi [1]
    pwd [1]

Page 18: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}

Root
  pwd [2]
    vi [1]
      pwd [1]
  vi [2]
    pwd [2]
      ls [1]

Page 19: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}

Root
  pwd [3]
    vi [1]
      pwd [1]
    ls [1]
  vi [2]
    pwd [2]
      ls [1]

Page 20: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sub-sequences to insert in the trie: {pwd vi pwd} and {vi pwd ls}

Root
  pwd [3]
    vi [1]
      pwd [1]
    ls [1]
  vi [2]
    pwd [2]
      ls [1]
  ls [1]

Page 21: Sequence Classification Using Statistical Pattern Recognition

Library Creation - An example trie

Sequence: {pwd vi pwd vi pwd ls}

Root
  pwd [3]
    vi [1]
      pwd [1]
    ls [1]
  vi [2]
    pwd [2]
      ls [1]
  ls [1]
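The build steps on the preceding slides are consistent with inserting each sub-sequence together with all of its suffixes and incrementing a counter on every node visited. The following is a minimal sketch under that assumption. `TrieNode`, `insert_with_suffixes`, and `render` are hypothetical names for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    count: int = 0                                # times this element appeared after this prefix
    children: dict = field(default_factory=dict)  # element -> TrieNode


def insert_with_suffixes(root, subsequence):
    """Insert a sub-sequence and each of its suffixes, incrementing node counts."""
    for start in range(len(subsequence)):
        node = root
        for element in subsequence[start:]:
            node = node.children.setdefault(element, TrieNode())
            node.count += 1


def render(node, depth=0):
    """Return the trie as indented 'element [count]' lines."""
    lines = []
    for element, child in node.children.items():
        lines.append("  " * depth + f"{element} [{child.count}]")
        lines.extend(render(child, depth + 1))
    return lines


root = TrieNode()
for subsequence in (["pwd", "vi", "pwd"], ["vi", "pwd", "ls"]):
    insert_with_suffixes(root, subsequence)

print("\n".join(render(root)))
```

For the two sub-sequences of the example, the resulting counts match the final trie on the slide (pwd [3], vi [2], ls [1] below the root).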

Page 22: Sequence Classification Using Statistical Pattern Recognition

Library Creation - Evaluating Dependences

Evaluate the relation/dependence between an element and its prefix.

Two approaches:

– Frequency-based method.
– Statistical dependence method.

Our approach: the statistical value used is the chi-square value.

This value is stored in every node of the trie.

Page 23: Sequence Classification Using Statistical Pattern Recognition

Library Creation - Evaluating Dependences

2 x 2 Contingency Table

                   Event        Different event   Total
Prefix             O11          O12               O11 + O12
Different prefix   O21          O22               O21 + O22
Total              O11 + O21    O12 + O22         O11 + O12 + O21 + O22

O11: How many times the current node/element follows its prefix.

O12: How many times the prefix is followed by a different node/element.

O21: How many times a different prefix (of the same length) is followed by the same node.

O22: How many times a different prefix (of the same length) is followed by a different node.

E_{ij} = \frac{\text{Row}_i\ \text{Total} \times \text{Column}_j\ \text{Total}}{\text{Grand Total}}

X^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
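The chi-square computation for the 2 x 2 contingency table can be sketched directly from the two formulas on this slide. The function name `chi_square_2x2` and the counts in the usage lines are hypothetical. How the four observed counts are collected from the trie is not shown here.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Chi-square statistic for a 2 x 2 table of observed counts."""
    observed = [[o11, o12], [o21, o22]]
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return 0.0  # Assumption: an empty table carries no evidence of dependence.

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * column_totals[j] / grand_total
            if expected > 0:  # skip cells whose expected count is zero
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: the element always follows this prefix and never another one.
strong_dependence = chi_square_2x2(10, 0, 0, 10)
# Hypothetical counts: the element is equally likely after any prefix.
no_dependence = chi_square_2x2(5, 5, 5, 5)
print(strong_dependence, no_dependence)
```

A larger value indicates a stronger dependence between the element and its prefix; a value of zero means the observed counts equal the expected ones.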

Page 24: Sequence Classification Using Statistical Pattern Recognition

Library Creation - Evaluating Dependences

Sequence Pattern Trie (each node: element [times appeared] [chi-square value])

Root
  pwd [3]
    vi [1] [5.1]
      pwd [1] [4.3]
    ls [1] [4.3]
  vi [2]
    pwd [2] [3.5]
      ls [1] [4.3]
  ls [2]

A Sequence Pattern Trie is created for each class.

Page 25: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Our approach, Classification)

Page 26: Sequence Classification Using Statistical Pattern Recognition

Classification

[Diagram with two phases.

Library Creation: Sequence 1 (Class 1), Sequence 2 (Class 2), …, Sequence n (Class n), illustrated with UNIX commands (pwd, fs, fg, vi, man, ls, finger, more, ls, ...), produce Pattern 1 (Class Trie), Pattern 2, Pattern 3, which are stored in the Pattern Library.

Classification: the sequence to classify (vi, more, ls) produces the pattern to classify (Testing Trie); Compare_Patterns compares it with each pattern in the library (On-Line Sequence Classification). Output: ONLINE SEQUENCE CLASS.]

Page 27: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

Testing Trie:

Root
  pwd [3]
    vi [1] [5.1]
      who [1] [4.3]
  vi [2]
    who [2] [3.5]

Class Trie:

Root
  pwd [3]
    vi [1] [7.1]
      pwd [1] [7.3]
  vi [2]
    pwd [2] [1.5]
      ls [1] [0.3]
  ls [2]

If the node (and its prefix) are in both tries:

If ( abs(chi2_TestingTrie – chi2_ClassTrie) ≤ ThresholdValue ):

Similarity between both tries.

Result [Element_TestingTrie, Prefix_TestingTrie, Chi2_TestingTrie]

Page 28: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

Example with the same two tries (node vi with prefix pwd):

If the node (and its prefix) are in both tries:

If ( abs(5.1 – 7.1) ≤ ThresholdValue ):

Similarity between both tries.

Result [vi, pwd, 5.1]

Page 29: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

If the node (and its prefix) are only in the Testing Trie:

Difference between both tries.

Result [Element_TestingTrie, Prefix_TestingTrie, (Chi2_TestingTrie * -1)]

Page 30: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

Example with the same two tries (node who with prefix pwd vi):

If the node (and its prefix) are only in the Testing Trie:

Difference between both tries.

Result [who, pwd vi, (-4.3)]

Page 31: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

Example with the same two tries (node who with prefix vi):

If the node (and its prefix) are only in the Testing Trie:

Difference between both tries.

Result [who, vi, (-3.5)]

Page 32: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

Result:

[Element1, Prefix1, Value1]
[Element2, Prefix2, Value2]
[Element3, Prefix3, Value3]
[Element4, Prefix4, Value4]
[Elementn, Prefixn, Valuen]

Each comparison (ClassTrie, TestingTrie) yields a comparison value.

Page 33: Sequence Classification Using Statistical Pattern Recognition

Classification – Comparing Process

Result:

[vi, pwd, +5.1]
[who, pwd vi, -4.3]
[who, vi, -3.5]

Comparison Value: 5.1 - 4.3 - 3.5 = -2.7
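The comparing process of the last slides can be sketched as below, using the Testing Trie and Class Trie values from the example. Several points are assumptions made for illustration only: each trie is flattened into a dictionary keyed by (prefix, element), `ThresholdValue` is set to a hypothetical 2.5, nodes found in both tries whose values differ by more than the threshold contribute nothing, and nodes found only in the Class Trie are ignored. The slides do not specify the last two cases.

```python
def compare_tries(testing_trie, class_trie, threshold):
    """Return the result list and its sum (the comparison value)."""
    results = []
    for (prefix, element), chi2_testing in testing_trie.items():
        if (prefix, element) in class_trie:
            chi2_class = class_trie[(prefix, element)]
            if abs(chi2_testing - chi2_class) <= threshold:
                # Similarity: keep the testing-trie value with a positive sign.
                results.append((element, prefix, chi2_testing))
            # Assumption: a larger gap adds nothing to the result.
        else:
            # Difference: node only in the Testing Trie, value negated.
            results.append((element, prefix, -chi2_testing))
    comparison_value = sum(value for _, _, value in results)
    return results, comparison_value


# (prefix, element) -> chi-square value, taken from the example tries.
testing_trie = {
    (("pwd",), "vi"): 5.1,
    (("pwd", "vi"), "who"): 4.3,
    (("vi",), "who"): 3.5,
}
class_trie = {
    (("pwd",), "vi"): 7.1,
    (("pwd", "vi"), "pwd"): 7.3,
    (("vi",), "pwd"): 1.5,
    (("vi", "pwd"), "ls"): 0.3,
}

results, comparison_value = compare_tries(testing_trie, class_trie, threshold=2.5)
print(results, round(comparison_value, 1))
```

Under these assumptions the result list contains one similarity (vi after pwd) and two differences (who after pwd vi, who after vi), and their sum reproduces the comparison value of the example up to floating-point rounding.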

Page 34: Sequence Classification Using Statistical Pattern Recognition

Classification

[Same diagram as on the previous Classification slide (Library Creation, Pattern Library with Pattern 1, Pattern 2, Pattern 3, sequence to classify, pattern to classify, On-Line Sequence Classification, ONLINE SEQUENCE CLASS). Each Compare_Patterns step now produces a comparison value.]

Page 35: Sequence Classification Using Statistical Pattern Recognition

Classification

[Same diagram. The comparison values produced by the Compare_Patterns steps are compared, and the ONLINE SEQUENCE CLASS is the one with the greatest comparison value.]

Page 36: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Target Environment Description)

Page 37: Sequence Classification Using Statistical Pattern Recognition

Environment – UNIX command line sequences

Raw command history:

# Start session 1
cd ~/private/docs
ls -laF | more
cat foo.txt bar.txt zorch.txt > a.txt
exit
# End session 1

# Start session 2
cd ~/games/
xquake &
fg
…

Parsed sequence:

**SOF**
cd
<1>        (one "file name" argument)
ls
-laF
|
more
cat
<3>        (three "file name" arguments)
>
<1>        (one "file name" argument)
exit
**EOF**
…

Command histories of 9 UNIX computer users over 2 years. UCI Repository of ML Database [Newman C., Hettich S., Merz, C. (1998)]
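The mapping from a raw session to the parsed token sequence can be illustrated with a small sketch. The data set is distributed already parsed, so this is not the procedure used to build it. The function `tokenize_session` and its rules for telling commands, flags, operators, and "file name" arguments apart are assumptions chosen only to reproduce the example on this slide.

```python
COMMAND_SEPARATORS = {"|", ";", "&", "&&", "||"}  # a new command may follow these
REDIRECTIONS = {">", ">>", "<"}


def tokenize_session(command_lines):
    """Turn the command lines of one session into a token sequence."""
    tokens = ["**SOF**"]
    for line in command_lines:
        expect_command = True  # the first word of a line is a command
        pending_arguments = 0  # consecutive "file name" arguments seen so far
        for word in line.split():
            is_argument = (
                not expect_command
                and word not in COMMAND_SEPARATORS
                and word not in REDIRECTIONS
                and not word.startswith("-")  # flags such as -laF are kept
            )
            if is_argument:
                pending_arguments += 1
                continue
            if pending_arguments:
                tokens.append(f"<{pending_arguments}>")
                pending_arguments = 0
            tokens.append(word)
            expect_command = word in COMMAND_SEPARATORS
        if pending_arguments:
            tokens.append(f"<{pending_arguments}>")
    tokens.append("**EOF**")
    return tokens


session_1 = [
    "cd ~/private/docs",
    "ls -laF | more",
    "cat foo.txt bar.txt zorch.txt > a.txt",
    "exit",
]
tokens = tokenize_session(session_1)
print(" ".join(tokens))
```

Replacing file names by their count hides private information while keeping the shape of each command, which is what the <1> and <3> tokens in the example express.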

Page 38: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Experiments & Results)

Page 39: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

9 files (users) containing from about 10,000 to 60,000 commands each.

1. Extracting Patterns: a trie is created for each user (Pattern Library).

Page 40: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

2. Classification Algorithm: the sequence to classify (sequences of very different sizes) is classified in the class with the greatest value (result value).

Page 41: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

3. Evaluating the result: Calculate:

– If the sequence is correctly classified: difference between the greatest value and the second greatest value (+)
– If it is not correctly classified: difference between the real classification value and the greatest value (-)

(The greater the difference, the better the classification)
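Steps 2 and 3 can be sketched together as follows. The comparison values are the ones shown for the example test user in the backup slides at the end of this deck. The function names `classify` and `evaluation_value` are hypothetical, and ties between classes are not handled because the slides do not discuss them.

```python
def classify(comparison_values):
    """Step 2: pick the class with the greatest comparison value."""
    return max(comparison_values, key=comparison_values.get)


def evaluation_value(comparison_values, real_class):
    """Step 3: positive margin if correctly classified, negative gap otherwise."""
    ranked = sorted(comparison_values.values(), reverse=True)
    if classify(comparison_values) == real_class:
        # Greatest value minus second greatest value (+).
        return ranked[0] - ranked[1]
    # Real classification value minus greatest value (-).
    return comparison_values[real_class] - ranked[0]


# "User On-Line vs Class UserN" values from the backup slides.
comparison_values = {
    "User0": 21, "User1": 49, "User2": 9, "User3": 3, "User4": 12,
    "User5": 29, "User6": -1, "User7": 0, "User8": 11,
}

predicted = classify(comparison_values)
margin_if_user1 = evaluation_value(comparison_values, real_class="User1")
margin_if_user2 = evaluation_value(comparison_values, real_class="User2")
print(predicted, margin_if_user1, margin_if_user2)
```

With these values the predicted class is User1. If the test user really is User1 the evaluation is 49 - 29 = 20, and if the test user really is User2 it is 9 - 49 = -40.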

Page 42: Sequence Classification Using Statistical Pattern Recognition

Results – UNIX command line sequences

[Figure: Unix Commands Classification – User 6. Classification value (vertical axis) against length of the sequence to classify (horizontal axis); average of 25 simulation results.]

Page 43: Sequence Classification Using Statistical Pattern Recognition

Results – UNIX command line sequences

[Figure: Minimum length for classifying a UNIX computer user correctly. Length of the sequence to classify (vertical axis) for each UNIX computer user (class) (horizontal axis).]

Page 44: Sequence Classification Using Statistical Pattern Recognition

Outline (current section: Conclusions and Future Works)

Page 45: Sequence Classification Using Statistical Pattern Recognition

Conclusions

• A threshold must be found
• Long time for creating the tries
• Results depend on the length of the sub-sequences used to create the trie

Page 46: Sequence Classification Using Statistical Pattern Recognition

Conclusions

• Effective method to classify UNIX users
• If a behavior can be represented by sequences, the proposed classification method can be used
• If a new class is added, only its trie must be created (the others are not modified)
• This method could be used for other tasks: sequence prediction, sequence clustering…
• RoboCup Coach 2006 Competition (successful results)

Page 47: Sequence Classification Using Statistical Pattern Recognition

Future Works

• Pattern Library: one trie for all classes (users).
• Classification method without threshold value
• Analysis comparing our approach to others (HMMs)

Page 48: Sequence Classification Using Statistical Pattern Recognition

Thank you!

Page 49: Sequence Classification Using Statistical Pattern Recognition

Questions

Page 50: Sequence Classification Using Statistical Pattern Recognition

Related to Questions...

Page 51: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Pattern Library:

USER 0 (Class0), Pattern/Class User0: **SOF** cd <1> ls -laF | more cat <3> > …
USER 1 (Class1), Pattern/Class User1: **SOF** ls <1> exit <1> ls -laF xquake & fg …
USER 8 (Class8), Pattern/Class User8: **SOF** vi <1> vi <3> ls -la cat <2> …

Test User (User On-Line, real class ClassUser1): **SOF** ls -laF | More cd <4> …

Sequence Classification (User On-Line ∈ Class c):

User On-Line vs Class User0    21
User On-Line vs Class User1    49
User On-Line vs Class User2     9
User On-Line vs Class User3     3
User On-Line vs Class User4    12
User On-Line vs Class User5    29
User On-Line vs Class User6    -1
User On-Line vs Class User7     0
User On-Line vs Class User8    11

Page 52: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Same pattern library, test user (real class ClassUser1) and comparison values. Highlighted row (greatest value): User On-Line vs Class User1, 49.

Page 53: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Real class: ClassUser1. Class with the greatest value: User1 (49).

Correctly Classified

Page 54: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Real class: ClassUser1. Class with the greatest value: User1 (49).

Correctly Classified

Evaluation: 20 (greatest value 49 minus second greatest value 29)

Page 55: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Same pattern library and comparison values, now with a test user whose real class is ClassUser2.

Page 56: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Real class: ClassUser2. Highlighted row (real class value): User On-Line vs Class User2, 9.

Page 57: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Real class: ClassUser2. Class with the greatest value: User1 (49).

Not Correctly Classified

Page 58: Sequence Classification Using Statistical Pattern Recognition

Experiments – UNIX command line sequences

Real class: ClassUser2 (value 9). Class with the greatest value: User1 (49).

Not Correctly Classified

Evaluation: -40 (real classification value 9 minus greatest value 49)