FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Maged EL-Sayed, Carolina Ruiz, and Elke A. Rundensteiner. 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), pp.128-135, November 12-13, 2004, Washington, DC, USA. Advisor: Professor Hsin-Hsi Chen. Reporter: Clarence Min-Chi Hsieh. Natural Language Processing Laboratory, Dept. of Computer Science and Info. Engineering, NTU. 2005/10/11


Page 1: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Maged EL-Sayed, Carolina Ruiz, and Elke A. Rundensteiner

6th ACM International Workshop on Web Information and Data Management (WIDM 2004), pp.128-135, 2004

November 12-13, 2004, Washington, DC, USA

Advisor: Professor Hsin-Hsi Chen
Reporter: Clarence Min-Chi Hsieh

Natural Language Processing Laboratory, Dept. of Computer Science and Info. Engineering, NTU

2005/10/11

Page 2: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Slide - 2
Copyright © Natural Language Processing Lab., NTU, 2005

Reporter: Clarence Min-Chi Hsieh

Outline

Introduction
FS-Tree Construction
Mining the FS-Tree
Maintaining the FS-Tree Incrementally
Mining the FS-Tree Incrementally
Interactive Mining
Experimental Evaluation
Conclusions

Page 3: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Introduction

Path Traversal Pattern
– FS, SS
– ABC, BCD…

Web Traversal Pattern
– IPA, MFTP
– ABDCA, CACADB…

Page 4: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Introduction (Cont.)

Consider Backward Traversal

Subsequence
– Need Continuous

MSuppR_link: system-defined
MSuppR_seq: user-defined

MSuppC_link
MSuppC_seq

Page 5: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction

SID   InSeq
1     dgi
2     dg
3     cdehi
4     cde
5     cbcdg
6     cb
7     abcdgi
8     abcd
9     bdehi
10    bdeh
11    cdebfabc
12    cdefabc
13    aic
14    die
15    igdba
16    efa
17    ef
18    efab

Total # of links = 50

MSuppC_link = 2
MSuppC_seq = 3

MSuppR_link = 4%
MSuppR_seq = 6%

System-defined: MSuppR_link
User-defined: MSuppR_seq
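
The relation between the ratio thresholds and the count thresholds on this slide can be checked with a short sketch. It assumes that the total of 50 links covers sequences 1 to 15 only (the lengths of those sequences do add up to 50 links) and that a ratio is turned into a count by rounding up; the rounding rule and the helper names `total_links` and `support_count` are illustrative assumptions, not taken from the slides.

```python
import math

# Input sequences SID 1..15 from the slide (SID 16..18 are assumed to arrive later).
DB = dict(enumerate(
    "dgi dg cdehi cde cbcdg cb abcdgi abcd bdehi bdeh "
    "cdebfabc cdefabc aic die igdba".split(), start=1))


def total_links(db):
    """A sequence of n pages contributes n - 1 links."""
    return sum(len(seq) - 1 for seq in db.values())


def support_count(ratio_percent, n_links):
    """Convert a support ratio in percent to a count (rounding up is an assumption)."""
    return math.ceil(ratio_percent * n_links / 100)


n = total_links(DB)
msupp_c_link = support_count(4, n)  # MSuppR_link = 4%
msupp_c_seq = support_count(6, n)   # MSuppR_seq = 6%
print(n, msupp_c_link, msupp_c_seq)
```

With the slide's data this reproduces the stated values: 50 links, MSuppC_link = 2 and MSuppC_seq = 3.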

Page 6: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Link   Count
d-g    4
g-i    2
c-d    7
d-e    6
e-h    3
h-i    2
c-b    2
b-c    5
f-a    2
a-b    4
b-d    2
e-b    1
b-f    1
e-f    1
a-i    1
i-c    1
d-i    1
i-e    1
i-g    1
g-d    1
d-b    1
b-a    1

[Table: the SID / InSeq input sequences (SID 1 to 18) are shown again next to the link counts.]

Page 7: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

When FS-Tree Built

Header Table (HT), with a ListH pointer per link

Link   Count
d-g    4
g-i    2
c-d    7
d-e    6
e-h    3
h-i    2
c-b    2
b-c    5
f-a    2
a-b    4
b-d    2

Non-Frequent Links Table (NFLT)

Link   Count   SID
e-b    1       11
b-f    1       11
e-f    1       12
a-i    1       13
i-c    1       13
d-i    1       14
i-e    1       14
i-g    1       15
g-d    1       15
d-b    1       15
b-a    1       15
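
A minimal sketch of how the two tables on this slide could be derived from the input sequences: count every link, then put links that reach MSuppC_link into the header table and the rest, together with the SIDs they occur in, into the NFLT. The function names and the dictionary layout are illustrative assumptions; the ListH pointers of the real header table are not modelled.

```python
from collections import Counter, defaultdict

# Input sequences SID 1..15 from the slides.
DB = dict(enumerate(
    "dgi dg cdehi cde cbcdg cb abcdgi abcd bdehi bdeh "
    "cdebfabc cdefabc aic die igdba".split(), start=1))
MSUPP_C_LINK = 2


def links(seq):
    """Consecutive page pairs of a sequence, written like the slides: 'd-g'."""
    return [seq[i] + "-" + seq[i + 1] for i in range(len(seq) - 1)]


def split_links(db, min_link_count):
    """Return (ht, nflt): frequent link counts, and non-frequent links with SIDs."""
    count = Counter()
    sids = defaultdict(list)
    for sid, seq in db.items():
        for link in links(seq):
            count[link] += 1
            sids[link].append(sid)
    ht = {l: c for l, c in count.items() if c >= min_link_count}
    nflt = {l: (c, sids[l]) for l, c in count.items() if c < min_link_count}
    return ht, nflt


ht, nflt = split_links(DB, MSUPP_C_LINK)
print(ht)
print(nflt)
```

Keeping the SIDs in the NFLT is what later allows a link that becomes frequent to be traced back to the sequences that contain it.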

Page 8: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 1: dgi

[Figure: Header Table (HT) and FS-tree. New branch Root - d - g - i, with link counts d-g:1 and g-i:1; the last node is labelled i:1.]

Page 9: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 2: dg

[Figure: Header Table (HT) and FS-tree. Link count d-g rises to 2 and node g is labelled g:2; g-i stays at 1.]

Page 10: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 3: cdehi

[Figure: Header Table (HT) and FS-tree. New branch Root - c - d - e - h - i, each link with count 1; the last node is labelled i:3.]

Page 11: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 4: cde

[Figure: Header Table (HT) and FS-tree. On the branch c - d - e - h - i, link counts c-d and d-e rise to 2 and node e is labelled e:4.]

Page 12: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 5: cbcdg

[Figure: Header Table (HT) and FS-tree. New branch below the root-level c: c - b - c - d - g, with link counts c-b:1, b-c:1, c-d:1, d-g:1; the last node is labelled g:5.]

Page 13: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 6: cb

[Figure: Header Table (HT) and FS-tree. Link count c-b rises to 2 and node b is labelled b:6.]

Page 14: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 7: abcdgi

[Figure: Header Table (HT) and FS-tree. New branch Root - a - b - c - d - g - i, each link with count 1; the last node is labelled i:7.]

Page 15: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 8: abcd

[Figure: Header Table (HT) and FS-tree. On the branch a - b - c - d - g - i, link counts a-b, b-c and c-d rise to 2 and node d is labelled d:8.]

Page 16: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 9: bdehi

[Figure: Header Table (HT) and FS-tree. New branch Root - b - d - e - h - i, each link with count 1; the last node is labelled i:9.]

Page 17: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 10: bdeh

[Figure: Header Table (HT) and FS-tree. On the branch b - d - e - h - i, link counts b-d, d-e and e-h rise to 2 and node h is labelled h:10.]

Page 18: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 11: cdebfabc (shown on the slide in three parts: cd, ebf, abc)

[Figure: Header Table (HT) and FS-tree. On the branch c - d - e, link counts c-d and d-e rise to 3. A new branch Root - f - a - b - c is added with link counts f-a:1, a-b:1, b-c:1. The non-frequent links e-b and b-f are not inserted.]

Page 19: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 12: cdefabc (shown on the slide in three parts: cd, ef, abc)

[Figure: Header Table (HT) and FS-tree. On the branch c - d - e, link counts c-d and d-e rise to 4. On the branch f - a - b - c, link counts f-a, a-b and b-c rise to 2. The non-frequent link e-f is not inserted.]

Page 20: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 13: aic

[Figure: Header Table (HT) and FS-tree, identical to the previous slide. Both links of the sequence (a-i, i-c) are non-frequent, so the tree does not change.]

Page 21: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 14: die

[Figure: Header Table (HT) and FS-tree, identical to the previous slide. Both links of the sequence (d-i, i-e) are non-frequent, so the tree does not change.]

Page 22: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

FS-Tree Construction (Cont.)

Insert SID 15: igdba

[Figure: Header Table (HT) and FS-tree, identical to the previous slide. All links of the sequence (i-g, g-d, d-b, b-a) are non-frequent, so the tree does not change.]

Final FS-tree branches and link counts:
Root - d - g - i: d-g:2, g-i:1
Root - c - d - e - h - i: c-d:4, d-e:4, e-h:1, h-i:1
Root - c - b - c - d - g: c-b:2, b-c:1, c-d:1, d-g:1
Root - a - b - c - d - g - i: a-b:2, b-c:2, c-d:2, d-g:1, g-i:1
Root - b - d - e - h - i: b-d:2, d-e:2, e-h:2, h-i:1
Root - f - a - b - c: f-a:2, a-b:2, b-c:2
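
The construction walked through on the preceding slides can be sketched as follows. This is a simplified reading of the slides, not the authors' implementation: each sequence is cut at its non-frequent links, and every remaining run of frequent links is inserted from the root, with a counter on each parent-to-child link. The header table pointers (ListH) and the per-node sequence labels such as `e:4` are left out, and all names are hypothetical.

```python
from collections import Counter

# Input sequences SID 1..15 from the slides.
DB = dict(enumerate(
    "dgi dg cdehi cde cbcdg cb abcdgi abcd bdehi bdeh "
    "cdebfabc cdefabc aic die igdba".split(), start=1))
MSUPP_C_LINK = 2


def links(seq):
    """Consecutive page pairs, here as two-character strings such as 'dg'."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def frequent_links(db, min_count):
    count = Counter(link for seq in db.values() for link in links(seq))
    return {link for link, c in count.items() if c >= min_count}


def frequent_runs(seq, frequent):
    """Cut a sequence at non-frequent links; keep runs with at least one link."""
    runs, run = [], seq[0]
    for link in links(seq):
        if link in frequent:
            run += link[1]
        else:
            if len(run) > 1:
                runs.append(run)
            run = link[1]
    if len(run) > 1:
        runs.append(run)
    return runs


def build_tree(db, min_count):
    """Nested dicts: page -> {'count': count of the link from the parent, 'children': {...}}."""
    frequent = frequent_links(db, min_count)
    root = {}
    for sid in sorted(db):
        for run in frequent_runs(db[sid], frequent):
            children = root
            for depth, page in enumerate(run):
                node = children.setdefault(page, {"count": 0, "children": {}})
                if depth > 0:  # root-level nodes have no incoming link
                    node["count"] += 1
                children = node["children"]
    return root


def link_count(tree, path):
    """Count stored on the last link of a root-to-node path such as 'cde'."""
    node = tree[path[0]]
    for page in path[1:]:
        node = node["children"][page]
    return node["count"]


tree = build_tree(DB, MSUPP_C_LINK)
print(link_count(tree, "cd"), link_count(tree, "cde"), link_count(tree, "fabc"))
```

For the slide data this yields the same link counts as the final tree, for example c-d:4 and d-e:4 on the c branch and 2 on every link of the f - a - b - c branch.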

Page 23: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree

Step 1: Extracting Derived Paths
Step 2: Constructing Conditional Sequence Base
Step 3: Constructing Conditional FS-Tree
Step 4: Extracting Frequent Sequences

Page 24: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree (Cont.)

Step 1

[Figure: Header Table (HT) with ListH pointers and the complete FS-tree built from sequences 1 to 15.]

Page 25: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree (Cont.)

Step 2

[Figure: the two tree paths that contain link e-h: Root - c - d - e - h with link counts c-d:4, d-e:4, e-h:1, and Root - b - d - e - h with link counts b-d:2, d-e:2, e-h:2.]

Conditional Sequence base:
(c-d:1, d-e:1), (b-d:2, d-e:2)

Step 3

Conditional FS-Tree:

[Figure: the conditional FS-tree built up path by path. Final tree: Root - e - d, with d branching to c and b; link counts d-e:3, c-d:1, b-d:2.]

Page 26: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree (Cont.)

Step 4

Depth first traversal

[Figure: conditional FS-tree Root - e - d, with d branching to c and b; link counts d-e:3, c-d:1, b-d:2.]

Output
<deh : 3>
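
Steps 2 to 4 for the link e-h can be sketched as below, taking the two derived paths from the Step 2 slide as given input. The rule used for the conditional sequence base (drop the target link and give every remaining link the target link's count on that path) is inferred from the numbers on the slides rather than stated there, and a dictionary of reversed path prefixes stands in for the conditional FS-tree. All function names are hypothetical.

```python
from collections import defaultdict

MSUPP_C_SEQ = 3


def conditional_base(derived_paths):
    """Step 2: drop the last (target) link; remaining links take its count."""
    base = []
    for path in derived_paths:
        *prefix, (_, last_count) = path
        if prefix:
            base.append([(link, last_count) for link, _ in prefix])
    return base


def frequent_sequences(target_link, derived_paths, min_count=MSUPP_C_SEQ):
    """Steps 3 and 4: merge reversed base paths, keep chains that reach min_count."""
    counts = defaultdict(int)  # reversed link chain -> accumulated count
    for path in conditional_base(derived_paths):
        rev_links = [link for link, _ in reversed(path)]
        count = path[0][1]
        for k in range(1, len(rev_links) + 1):
            counts[tuple(rev_links[:k])] += count
    result = {}
    for rev_chain, count in counts.items():
        if count >= min_count:
            chain = list(reversed(rev_chain)) + [target_link]
            # Spell the chain of links as a page sequence, e.g. d-e, e-h -> deh.
            pages = chain[0].split("-")[0] + "".join(l.split("-")[1] for l in chain)
            result[pages] = count
    return result


# Derived paths for link e-h, as read from the Step 2 slide.
paths_eh = [
    [("c-d", 4), ("d-e", 4), ("e-h", 1)],
    [("b-d", 2), ("d-e", 2), ("e-h", 2)],
]
print(conditional_base(paths_eh))
print(frequent_sequences("e-h", paths_eh))
```

In this sketch d-e accumulates 1 + 2 = 3 and survives, while c-d (1) and b-d (2) fall below MSuppC_seq = 3, which matches the slide's output <deh : 3>. Unlike the slides, the sketch would also report shorter chains when a longer chain is frequent.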

Page 27: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree (Cont.)

The Answers

Link: d-g
Derived Paths: (d-g:2), (c-b:2, b-c:1, c-d:1, d-g:1), (a-b:2, b-c:2, c-d:2, d-g:1)
Conditional Sequence bases: (c-b:1, b-c:1, c-d:1), (a-b:1, b-c:1, c-d:1)
Conditional FS-Trees: (none)
Frequent Sequences: (none)

Link: c-d
Derived Paths: (c-d:4), (c-b:2, b-c:1, c-d:1), (a-b:2, b-c:2, c-d:2)
Conditional Sequence bases: (c-b:1, b-c:1), (a-b:2, b-c:2)
Conditional FS-Trees: (b-c:3)
Frequent Sequences: <bcd : 3>

Page 28: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree (Cont.)

The Answers

Link: e-h
Derived Paths: (c-d:4, d-e:4, e-h:1), (b-d:2, d-e:2, e-h:2)
Conditional Sequence bases: (c-d:1, d-e:1), (b-d:2, d-e:2)
Conditional FS-Trees: (d-e:3)
Frequent Sequences: <deh : 3>

Link: d-e
Derived Paths: (c-d:4, d-e:4), (b-d:2, d-e:2)
Conditional Sequence bases: (c-d:4), (b-d:2)
Conditional FS-Trees: (c-d:4)
Frequent Sequences: <cde : 4>

Page 29: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Mining the FS-Tree (Cont.)

The Answers

Link: a-b
Derived Paths: (a-b:2), (f-a:2, a-b:2)
Conditional Sequence bases: (f-a:2)
Conditional FS-Trees: (none)
Frequent Sequences: (none)

Link: b-c
Derived Paths: (c-b:2, b-c:1), (a-b:2, b-c:2), (f-a:2, a-b:2, b-c:2)
Conditional Sequence bases: (c-b:1), (a-b:2), (f-a:2, a-b:2)
Conditional FS-Trees: (a-b:4)
Frequent Sequences: <abc : 4>
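
As a cross-check of the answers on these slides, the frequent sequences can also be found by brute force: enumerate every contiguous subsequence of at least three pages (two links) in sequences 1 to 15 and count occurrences. This is deliberately not the FS-tree mining procedure, only an independent way to confirm its output on this small example; the function name is hypothetical.

```python
from collections import Counter

# Input sequences SID 1..15 from the slides.
DB = dict(enumerate(
    "dgi dg cdehi cde cbcdg cb abcdgi abcd bdehi bdeh "
    "cdebfabc cdefabc aic die igdba".split(), start=1))
MSUPP_C_SEQ = 3


def contiguous_counts(db, min_pages=3):
    """Count every contiguous subsequence with at least min_pages pages."""
    counts = Counter()
    for seq in db.values():
        for start in range(len(seq)):
            for end in range(start + min_pages, len(seq) + 1):
                counts[seq[start:end]] += 1
    return counts


frequent = {s: c for s, c in contiguous_counts(DB).items() if c >= MSUPP_C_SEQ}
print(sorted(frequent.items()))
# → [('abc', 4), ('bcd', 3), ('cde', 4), ('deh', 3)]
```

The four results agree with <abc : 4>, <bcd : 3>, <cde : 4> and <deh : 3> from the slides. Brute force grows quadratically with sequence length, which is the cost the FS-tree is meant to avoid.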

Page 30: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Maintaining the FS-Tree Incrementally

New sequences:

SID   InSeq
16    efa
17    ef
18    efab

Link count increments: e-f:3, f-a:2, a-b:1

MSuppC_link = 2
MSuppC_seq = 3

e-f in NFLT Becomes Frequent, Move to Table HT

Non-Frequent Links Table (NFLT)

Link   Count   SID
e-f    1       12

Retrieve the Sequence from Original DB

SID   InSeq
12    cdefabc

Delete this record from NFLT (Move to HT)
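
The bookkeeping on this slide can be sketched as follows: add the new sequences' links to the counts, move any NFLT link that reaches MSuppC_link into the header table, and report which old sequences have to be fetched from the original database. Following the slide, MSuppC_link is kept fixed at 2. The function `apply_increment` and the table layout are illustrative assumptions, only two NFLT records are shown, and the tree update itself is not modelled.

```python
MSUPP_C_LINK = 2


def links(seq):
    """Consecutive page pairs of a sequence, written like the slides: 'e-f'."""
    return [seq[i] + "-" + seq[i + 1] for i in range(len(seq) - 1)]


def apply_increment(ht, nflt, new_db, min_link_count):
    """Update ht/nflt in place; return {promoted link: old SIDs to re-insert}."""
    for sid, seq in new_db.items():
        for link in links(seq):
            if link in ht:
                ht[link] += 1
            else:
                count, sids = nflt.get(link, (0, []))
                nflt[link] = (count + 1, sids + [sid])
    promoted = {}
    for link in [l for l, (c, _) in nflt.items() if c >= min_link_count]:
        count, sids = nflt.pop(link)  # delete the record from NFLT
        ht[link] = count              # move the link to HT
        # Only sequences already in the original DB need to be retrieved.
        promoted[link] = [s for s in sids if s not in new_db]
    return promoted


# State after building the tree from SID 1..15 (NFLT abbreviated to two records).
ht = {"d-g": 4, "g-i": 2, "c-d": 7, "d-e": 6, "e-h": 3, "h-i": 2,
      "c-b": 2, "b-c": 5, "f-a": 2, "a-b": 4, "b-d": 2}
nflt = {"e-f": (1, [12]), "e-b": (1, [11])}

promoted = apply_increment(ht, nflt, {16: "efa", 17: "ef", 18: "efab"}, MSUPP_C_LINK)
print(promoted, ht["e-f"], ht["f-a"], ht["a-b"])
```

For the slide's increment this promotes e-f with SID 12 as the sequence to retrieve, and leaves the header table with e-f:4, f-a:4 and a-b:5.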

Page 31: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Maintaining the FS-Tree Incrementally (Cont.)

Re-insert SID 12: cdefabc

Header Table (HT) after the update: d-g 4, g-i 2, c-d 7, d-e 6, e-h 3, h-i 2, c-b 2, b-c 5, f-a 4, a-b 5, b-d 2, e-f 4

[Figure: FS-tree. The part of sequence 12 stored on the root-level branch f - a - b - c is marked "Delete", and the link counts on that branch drop to 1. A new branch f - a - b - c is added below node e of the branch c - d - e, each link with count 1; its last node is labelled c:12. Link counts c-d:4 and d-e:4 are unchanged.]

Page 32: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Maintaining the FS-Tree Incrementally (Cont.)

Insert SID 16: efa

Header Table (HT): d-g 4, g-i 2, c-d 7, d-e 6, e-h 3, h-i 2, c-b 2, b-c 5, f-a 4, a-b 5, b-d 2, e-f 4

[Figure: FS-tree. New branch Root - e - f - a, with link counts e-f:1 and f-a:1; the last node is labelled a:16.]

Page 33: FS-Miner: Efficient and Incremental Mining of Frequent Sequence Patterns in Web Logs

Maintaining the FS-Tree Incrementally (Cont.)

Insert SID 17: ef

Header Table (HT): d-g 4, g-i 2, c-d 7, d-e 6, e-h 3, h-i 2, c-b 2, b-c 5, f-a 4, a-b 5, b-d 2, e-f 4

[Figure: FS-tree. On the branch Root - e - f - a, link count e-f rises to 2 and node f is labelled f:17; f-a stays at 1.]


Maintaining the FS-Tree Incrementally (Cont.)

SID: 18, InSeq: efab

Header Table (HT), Link: Count
d-g: 4, g-i: 2, c-d: 7, d-e: 6, e-h: 3, h-i: 2, c-b: 2, b-c: 5, f-a: 4, a-b: 5, b-d: 2, e-f: 4

[Figure: FS-tree with session 18 added. The branch is now e, f, a, b with link counts e-f: 3, f-a: 2, a-b: 1 and sequence-end markers f:17, a:16, b:18.]
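The three insertions above (SID 16 efa, SID 17 ef, SID 18 efab) can be replayed with a small sketch. This is not the authors' implementation: the class and attribute names (`Node`, `FSTreeSketch`, `link_count`, `session_ids`) are hypothetical, and the sketch assumes every link of an inserted session is kept in the tree, so it ignores the support thresholds and any splitting of sessions. The `delete` method here simply walks down from the root by page. The header counter covers only the three new sessions, so its values differ from the full header table on the slides.

```python
from collections import Counter


class Node:
    """One page in the tree (hypothetical layout, not taken from the slides)."""

    def __init__(self, page):
        self.page = page
        self.children = {}     # next page -> Node
        self.link_count = 0    # sessions traversing the link parent -> this page
        self.session_ids = []  # sessions whose last page is this node


class FSTreeSketch:
    def __init__(self):
        self.root = Node(None)
        self.header = Counter()  # (from_page, to_page) -> total link count

    def insert(self, sid, pages):
        """Add one session, creating nodes as needed and bumping link counts."""
        node = self.root
        for page in pages:
            child = node.children.get(page)
            if child is None:
                child = node.children[page] = Node(page)
            if node is not self.root:
                child.link_count += 1
                self.header[(node.page, page)] += 1
            node = child
        node.session_ids.append(sid)

    def delete(self, sid, pages):
        """Remove one session, decrementing counts and pruning unused nodes."""
        path = [self.root]
        for page in pages:
            path.append(path[-1].children[page])
        path[-1].session_ids.remove(sid)
        # Walk back up from the last page toward the root.
        for parent, child in zip(reversed(path[:-1]), reversed(path[1:])):
            if parent is not self.root:
                child.link_count -= 1
                self.header[(parent.page, child.page)] -= 1
            if not child.children and not child.session_ids:
                del parent.children[child.page]


tree = FSTreeSketch()
tree.insert(16, "efa")
tree.insert(17, "ef")
tree.insert(18, "efab")
print(dict(tree.header))
```

After the three calls the new branch holds the same link counts as the last figure: e-f 3, f-a 2, a-b 1.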


Mining the FS-Tree Incrementally

Type 1:
– Mine for those Links if they are Affected
Type 2 and 4:
– Mine for these Links
Type 3 and 5:
– Delete Previously Discovered Patterns that Include these Links
Type 6, 7, 8, and 9:
– Do Nothing

[Figure: link transitions numbered 1 to 9 between Frequent Links, Potentially Frequent Links, and Non-Frequent Links, drawn over the Header Table (HT) and the Non-Frequent Links Table (NFLT).]
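The rule set above can be sketched as a function of a link's status before and after a database update. The sketch assumes a two-threshold reading: a link is frequent when its count reaches `msuppc_seq`, potentially frequent when it reaches only `msuppc_link`, and non-frequent otherwise. The grouping of type numbers in the comments is inferred from the actions listed on the slide, the `affected` flag is supplied by the caller, and all function and parameter names are hypothetical.

```python
def link_class(count, msuppc_seq, msuppc_link):
    """Classify a link by its count (assumed two-threshold definition)."""
    if count >= msuppc_seq:
        return "frequent"
    if count >= msuppc_link:
        return "potentially frequent"
    return "non-frequent"


def incremental_action(old_count, new_count, msuppc_seq, msuppc_link, affected):
    """Pick the action for one link after an update."""
    was_frequent = link_class(old_count, msuppc_seq, msuppc_link) == "frequent"
    is_frequent = link_class(new_count, msuppc_seq, msuppc_link) == "frequent"
    if was_frequent and is_frequent:
        # Type 1: stays frequent; re-mine only if the update touched it.
        return "mine" if affected else "keep existing patterns"
    if is_frequent:
        # Types 2 and 4: the link became frequent.
        return "mine"
    if was_frequent:
        # Types 3 and 5: the link stopped being frequent.
        return "delete patterns containing this link"
    # Types 6 to 9: the link was not frequent before and is not frequent now.
    return "do nothing"


# Example with assumed thresholds MSuppC_seq = 3 and MSuppC_link = 2.
print(incremental_action(2, 4, 3, 2, affected=True))
```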


Mining the FS-Tree Incrementally (Cont.)

Header Table (HT), Link: Count
d-g: 4, g-i: 2, c-d: 7, d-e: 6, e-h: 3, h-i: 2, c-b: 2, b-c: 5, f-a: 4, a-b: 5, b-d: 2, e-f: 4

The Answers

Link a-b
Derived Paths:
– (c-d:4, d-e:4, e-f:1, f-a:1, a-b:1)
– (a-b:2)
– (f-a:1, a-b:1)
– (e-f:3, f-a:2, a-b:1)
Conditional Sequence Bases:
– (c-d:1, d-e:1, e-f:1, f-a:1)
– (f-a:1)
– (e-f:1, f-a:1)
Conditional FS-Trees:
– (f-a:3)
Frequent Sequences:
– <fab : 3>


Mining the FS-Tree Incrementally (Cont.)

The Answers

Link e-f
Derived Paths:
– (c-d:4, d-e:4, e-f:1)
– (e-f:3)
Conditional Sequence Bases:
– (c-d:1, d-e:1)
Conditional FS-Trees:
– (none)
Frequent Sequences:
– (none)

Link f-a
Derived Paths:
– (c-d:4, d-e:4, e-f:1, f-a:1)
– (f-a:1)
– (e-f:3, f-a:2)
Conditional Sequence Bases:
– (c-d:1, d-e:1, e-f:1)
– (e-f:2)
Conditional FS-Trees:
– (e-f:3)
Frequent Sequences:
– <efa : 3>
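The answers for links a-b, e-f, and f-a can be reproduced with a short sketch. It takes a shortcut and is not the authors' procedure: instead of building a conditional FS-tree, it counts every contiguous chain of links that ends right before the target link. The minimum sequence support count of 3 is an assumption read off the example (f-a:3 is kept while e-f with total 2 is not), and the helper names `parse`, `conditional_base`, and `frequent_sequences` are hypothetical.

```python
from collections import Counter


def parse(path_text):
    """Turn 'c-d:4, d-e:4' into [(('c', 'd'), 4), (('d', 'e'), 4)]."""
    items = []
    for item in path_text.split(","):
        link, count = item.strip().split(":")
        first, second = link.split("-")
        items.append(((first, second), int(count)))
    return items


def conditional_base(derived_paths):
    """Drop the target link from each path; the remaining links take the
    count that the target link had on that path."""
    base = []
    for path in derived_paths:
        *prefix, (_target, target_count) = path
        if prefix:
            base.append([(link, target_count) for link, _ in prefix])
    return base


def frequent_sequences(derived_paths, target, min_count):
    """Sum the support of each link chain adjacent to `target` and keep
    the chains that reach `min_count`."""
    support = Counter()
    for prefix in conditional_base(derived_paths):
        count = prefix[0][1]
        links = [link for link, _ in prefix]
        for start in range(len(links)):
            support[tuple(links[start:])] += count
    result = {}
    for chain, count in support.items():
        if count >= min_count:
            pages = [chain[0][0]] + [link[1] for link in chain] + [target[1]]
            result["".join(pages)] = count
    return result


a_b_paths = [
    parse("c-d:4, d-e:4, e-f:1, f-a:1, a-b:1"),
    parse("a-b:2"),
    parse("f-a:1, a-b:1"),
    parse("e-f:3, f-a:2, a-b:1"),
]
f_a_paths = [
    parse("c-d:4, d-e:4, e-f:1, f-a:1"),
    parse("f-a:1"),
    parse("e-f:3, f-a:2"),
]
e_f_paths = [parse("c-d:4, d-e:4, e-f:1"), parse("e-f:3")]

print(frequent_sequences(a_b_paths, ("a", "b"), 3))
print(frequent_sequences(f_a_paths, ("f", "a"), 3))
print(frequent_sequences(e_f_paths, ("e", "f"), 3))
```

With these inputs `conditional_base` returns the three conditional sequence bases listed for a-b, and the calls yield fab with support 3, efa with support 3, and no sequence for e-f.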


Interactive Mining

Setting MSuppC_link to a Small Enough Value
– Enough Information Is Kept in the FS-Tree
– No Need to Reference the Original Database
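A minimal sketch of the first step of such an interactive query, assuming the header table from the earlier slides and an assumed MSuppC_link of 2 (the smallest count in that table): any new sequence threshold at or above MSuppC_link can be answered from the stored counts, while a lower one cannot. Only the selection of frequent links is shown; the function name and the error message are hypothetical.

```python
# Link counts as listed in the header table (HT) on the earlier slides.
HEADER_TABLE = {
    "d-g": 4, "g-i": 2, "c-d": 7, "d-e": 6, "e-h": 3, "h-i": 2,
    "c-b": 2, "b-c": 5, "f-a": 4, "a-b": 5, "b-d": 2, "e-f": 4,
}


def frequent_links(header_table, new_msuppc_seq, msuppc_link):
    """Select the frequent links for a changed threshold without rescanning."""
    if new_msuppc_seq < msuppc_link:
        # Links below MSuppC_link were never stored in the tree.
        raise ValueError("threshold is below MSuppC_link; the stored data is not enough")
    return {link: count for link, count in header_table.items()
            if count >= new_msuppc_seq}


print(frequent_links(HEADER_TABLE, 5, msuppc_link=2))
print(frequent_links(HEADER_TABLE, 4, msuppc_link=2))
```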


Experimental Evaluation

MS Data Set
– Microsoft Anonymous Web Data Set
– 32,711 Sessions
   From 1 up to 35 page references
– 294 distinct pages
MSNBC Data Set
– MSNBC Anonymous Web Data Set
– 989,818 Sessions
   From 1 up to several thousand page references
– 17 distinct pages
http://kdd.ics.uci.edu


Experimental Evaluation (Cont.)

Scalability with the Number of Input Sessions
– MS Data Set

No MSuppR_seq??


Experimental Evaluation (Cont.)

Scalability with the Number of Input Sessions
– MSNBC Data Set

No MSuppR_seq??


Experimental Evaluation (Cont.)

Scalability with Support Threshold
– MS Data Set


Experimental Evaluation (Cont.)

Scalability with Support Threshold
– MSNBC Data Set


Experimental Evaluation (Cont.)

Incremental Mining
– MS Data Set


Experimental Evaluation (Cont.)

Incremental Mining
– MSNBC Data Set


Conclusions

Two Scans of the Input Database
Allows for Incremental Discovery of Frequent Sequences when the Input Database is Updated
Allows Interactive Response to Changes to the Minimum Support