Maintaining Knowledge-Bases of Navigational Patterns from Streams of Navigational Sequences

Ajumobi Udechukwu, Ken Barker, Reda Alhajj

Proceedings of the 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA’05)

Advisor: Jia-Ling Koh

Speaker: Chun-Wei Hsieh



2

Introduction

Navigational patterns: traversal patterns

Two broad techniques for mining navigational patterns

– 1. level-wise, apriori-based techniques

– 2. tree-based techniques

3

Methodology

Sliding window

Batch-update strategy

– Batch: the web log in the base time unit

Example: batches accumulate as the stream advances

B1

B1 B2

B1 B2 B3

B1 B2 B3 B4

B1 B2 B3 B4 B5

B1 B2 B3 B4 B5 B6
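The sliding window with batch updates can be sketched as follows. This is a minimal illustration, not the authors' code: the window length of 4 batches and the helper name `slide` are assumptions made for the example.

```python
from collections import deque

WINDOW_SIZE = 4  # assumed window length, measured in batches


def slide(window, batch_id, size=WINDOW_SIZE):
    """Append the newest batch and return the batches that left the window."""
    window.append(batch_id)
    expired = []
    while len(window) > size:
        expired.append(window.popleft())
    return expired


window = deque()
history = []  # (window contents, expired batches) after each arrival
for b in ["B1", "B2", "B3", "B4", "B5", "B6"]:
    expired = slide(window, b)
    history.append((list(window), expired))
    print(b, "->", list(window), "expired:", expired)
```

Each arriving batch is the unit of update: the window never changes between batch boundaries.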

4

Adapted GST

Adapted generalized suffix tree

Appending a stop symbol to all strings

Mining without thresholds
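The idea can be illustrated with a simplified, uncompressed suffix trie rather than a true generalized suffix tree. This is a sketch under assumptions: the class `Node` and the functions `build_suffix_trie` and `all_patterns` are hypothetical names, and the counts are plain occurrence counts. Every suffix of every sequence is inserted with the stop symbol appended, and all patterns are reported with no minimum-support threshold.

```python
STOP = "$"  # stop symbol appended to every sequence


class Node:
    def __init__(self):
        self.children = {}
        self.count = 0  # number of inserted suffixes passing through this node


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence, each terminated by STOP."""
    root = Node()
    for seq in sequences:
        symbols = list(seq) + [STOP]
        for start in range(len(symbols)):
            node = root
            for sym in symbols[start:]:
                node = node.children.setdefault(sym, Node())
                node.count += 1
    return root


def all_patterns(node, prefix=()):
    """Yield (pattern, count) for every path that excludes the stop symbol."""
    for sym, child in node.children.items():
        if sym == STOP:
            continue
        label = prefix + (sym,)
        yield "".join(label), child.count
        yield from all_patterns(child, label)


trie = build_suffix_trie(["LQR", "LQ"])
patterns = dict(all_patterns(trie))
print(patterns)
```

Because nothing is pruned, rare patterns stay available if they become frequent in later batches.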

5

Adapted GST

Figure: step-by-step construction of the adapted GST for the example strings LQR and LQ, each ending with the stop symbol $. Edges are labeled with L, Q, R and $; nodes carry number-pair labels such as 1,1 and 2,4.

6

Adapted GST

7

The Challenge of Adapted GST

“LQ” occurs in B1 with a support count of 4, and “L” occurs independently in B2 with a support count of 2.

Since every occurrence of “LQ” also contains “L”, the total count of “L” should be 4 + 2 = 6.
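The arithmetic above can be checked with a small sketch. It assumes each batch stores a count only on the longest label reached, so the support of a pattern is the sum over all stored labels that start with it. The function `total_support` and the dictionary layout are hypothetical.

```python
def total_support(pattern, batch_counts):
    """Sum the counts of every stored label that begins with `pattern`."""
    total = 0
    for counts in batch_counts.values():
        for label, count in counts.items():
            if label.startswith(pattern):
                total += count
    return total


# Stored counts from the slide: "LQ" in B1, "L" on its own in B2.
batch_counts = {"B1": {"LQ": 4}, "B2": {"L": 2}}
print(total_support("L", batch_counts))   # 6
print(total_support("LQ", batch_counts))  # 4
```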

8

AC-NAP tree 1

9

AC-NAP tree 2

Output all node labels and counts to a database
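Writing node labels and counts to a database might look like the sketch below. The experiments used Java and MySQL; this example swaps in Python's built-in sqlite3 with an in-memory database so that it is self-contained. The table name `node_counts`, its columns, and the nested-dictionary tree layout are all assumptions for illustration.

```python
import sqlite3

# Hypothetical tree for one batch: edge label -> node with count and children.
tree = {
    "L": {"count": 2, "children": {"Q": {"count": 4, "children": {}}}},
    "R": {"count": 1, "children": {}},
}


def node_rows(children, batch_id, prefix=""):
    """Walk the tree and yield one (batch_id, label, count) row per node."""
    for edge, node in children.items():
        label = prefix + edge
        yield (batch_id, label, node["count"])
        yield from node_rows(node["children"], batch_id, label)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node_counts (batch_id INTEGER, label TEXT, cnt INTEGER)"
)
conn.executemany("INSERT INTO node_counts VALUES (?, ?, ?)", node_rows(tree, 1))
conn.commit()

rows = conn.execute(
    "SELECT label, cnt FROM node_counts ORDER BY label"
).fetchall()
print(rows)
```

Tagging each row with its batch number lets later steps work on whole batches at once.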

10

Maintaining patterns within a window

11

Maintaining patterns within a window

Count total support

Remove out-of-date patterns
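Both steps can be sketched against a table of per-batch counts. As before, sqlite3 stands in for the MySQL database used in the experiments. The table `pattern_counts`, the window length of 2 batches, and the function `maintain_window` are assumptions. The sketch deletes expired batches first and then sums, which gives the same totals as summing only over the batches in the window.

```python
import sqlite3

WINDOW_SIZE = 2  # assumed window length, measured in batches

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pattern_counts (batch_id INTEGER, label TEXT, cnt INTEGER)"
)
conn.executemany(
    "INSERT INTO pattern_counts VALUES (?, ?, ?)",
    [(1, "L", 6), (1, "LQ", 4), (2, "L", 3), (3, "L", 1), (3, "Q", 5)],
)


def maintain_window(conn, newest_batch, size=WINDOW_SIZE):
    """Drop batches that left the window, then total the support per pattern."""
    oldest_kept = newest_batch - size + 1
    conn.execute("DELETE FROM pattern_counts WHERE batch_id < ?", (oldest_kept,))
    result = conn.execute(
        "SELECT label, SUM(cnt) FROM pattern_counts GROUP BY label"
    ).fetchall()
    return dict(result)


totals = maintain_window(conn, newest_batch=3)
print(totals)
```

Patterns that occurred only in expired batches, such as "LQ" here, disappear from the totals.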

12

Experiments

OS: Microsoft Windows XP Professional Edition

CPU: 2 GHz Intel Pentium 4

RAM: 512 MB

Programming language: Java

DBMS: MySQL

Data: real-world web logs of “msnbc.com”

13

Experiments

14

Experiments

15

Experiments