When to Update the Sequential Patterns of Stream Data?

Q. Zheng, K. Xu, and S. Ma, in Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2003. Adviser: Jia-Ling Koh. Speaker: Shu-Ning Shin. Date: 2004.8.12.

Page 1: When to Update the Sequential Patterns of Stream Data?

When to Update the Sequential Patterns of Stream Data?

Q. Zheng, K. Xu, and S. Ma, in Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2003.

Adviser: Jia-Ling Koh
Speaker: Shu-Ning Shin
Date: 2004.8.12

Page 2: When to Update the Sequential Patterns of Stream Data?

Introduction

The paper presents an experimental method called TPD (Tradeoff between Performance and Difference). TPD decides when to update the sequential patterns of stream data by trading off the performance of incremental updating algorithms against the difference between the old and the new sequential patterns.

Page 3: When to Update the Sequential Patterns of Stream Data?

Stream Data Model (1)

Stream event: Ei = <ei, tn>
  ei: the stream event type
  tn: the time at which the stream event occurs
Stream tuple: Qi = ((ek1, ek2, …, ekm), ti) = (Ek1, Ek2, …, Ekm)
Length of a stream tuple: |Qi| = |(ek1, ek2, …, ekm)| = m

Page 4: When to Update the Sequential Patterns of Stream Data?

Stream Data Model (2)

Stream queue: Sij = <Qi, Qi+1, …, Qj>, where ti < ti+1 < … < tj
  Sij = <(Ei1, …, Eik) … (Ej1, …, Ejm)>
Length of a queue: |Sij| = |<Qi, Qi+1, …, Qj>| = j-i+1
Stream viewing window: Wk = <Qm, …, Qn | d=n-m+1>
Size of a viewing window: |Wk| = n-m+1 = d

Page 5: When to Update the Sequential Patterns of Stream Data?

Stream Data Model (3)

occur(seqm, Wk): the number of times seqm occurs in Wk
  seqm = <ei1, ei2, …, eim>
  Wk: a stream viewing window
support(seqm, Wk) = occur(seqm, Wk) / |Wk|
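The two definitions can be sketched in Python. The slides do not say exactly how occurrences are counted, so this sketch assumes a contiguous match: seq occurs at position i if its j-th event type appears in tuple Q(i+j) for every j. The function names, the set-based tuple representation, and the sample window are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Set

Window = List[Set[str]]  # each stream tuple Q is modeled as a set of event types


def occur(seq: Sequence[str], window: Window) -> int:
    """Count start positions where seq matches consecutive tuples (assumed semantics)."""
    m = len(seq)
    if m == 0:
        return 0
    count = 0
    for start in range(len(window) - m + 1):
        if all(seq[j] in window[start + j] for j in range(m)):
            count += 1
    return count


def support(seq: Sequence[str], window: Window) -> float:
    """support(seq, W) = occur(seq, W) / |W|, with |W| the number of tuples."""
    if not window:
        return 0.0
    return occur(seq, window) / len(window)


# Hypothetical window of four tuples; the third tuple holds two simultaneous events.
w = [{"e1"}, {"e2"}, {"e1", "e3"}, {"e2"}]
print(occur(["e1", "e2"], w))    # 2
print(support(["e1", "e2"], w))  # 0.5
```

A different counting rule (for example, allowing gaps between matched tuples) would change only `occur`; `support` stays the same ratio.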

Page 6: When to Update the Sequential Patterns of Stream Data?

Stream Data Model - Example

S18 = <Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8>
S18 = <E2, E5, E1, (E3, E6), E7, E9, E10>
W5 = <Q1, Q2, Q3, Q4, Q5, Q6, Q7 | d=7>

Page 7: When to Update the Sequential Patterns of Stream Data?

Sliding Stream Viewing Window

ΔWi: incremental window, i = 0, 1, 2, 3, …
ΔW0: initial window
Wi+1 = Wi + ΔWi+1
|ΔW1| / |W0|: incremental ratio of stream data
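As a minimal sketch of the window update and the incremental ratio, the snippet below models a window as a Python list of tuple labels. The helper names `slide` and `incremental_ratio` and the sample data are hypothetical.

```python
from typing import List


def slide(window: List[str], increment: List[str]) -> List[str]:
    """W(i+1) = W(i) + dW(i+1): append the incremental window to the current one."""
    return window + increment


def incremental_ratio(increment_size: int, initial_size: int) -> float:
    """|dW1| / |W0|, the incremental ratio of stream data."""
    if initial_size <= 0:
        raise ValueError("initial window must be non-empty")
    return increment_size / initial_size


w0 = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical initial window
dw1 = ["Q5"]                   # hypothetical incremental window
w1 = slide(w0, dw1)
print(len(w1))                               # 5
print(incremental_ratio(len(dw1), len(w0)))  # 0.25
```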

Page 8: When to Update the Sequential Patterns of Stream Data?

Estimation of difference between the old and new sequential patterns

Difference:
  LWk: old frequent sequences in Wk
  LWk+1: new frequent sequences in Wk+1
  LWk Δ LWk+1: symmetric difference

\[
d(L_{W_k}, L_{W_{k+1}}) =
\begin{cases}
\dfrac{|L_{W_k} \,\Delta\, L_{W_{k+1}}|}{|L_{W_k} \cup L_{W_{k+1}}|}, & \text{if } L_{W_k} \neq \emptyset \\[2ex]
0, & \text{otherwise}
\end{cases}
\]
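A small sketch of a difference measure based on the symmetric difference follows. The normalization is an assumption: the size of the symmetric difference is divided by the size of the union of the two pattern sets, and the measure is taken to be 0 when the old set is empty. The function name and the sample patterns are hypothetical.

```python
from typing import Set, Tuple

Pattern = Tuple[str, ...]  # a frequent sequence as a tuple of event types


def difference(old: Set[Pattern], new: Set[Pattern]) -> float:
    """|old Δ new| / |old ∪ new| if old is non-empty, else 0 (assumed normalization)."""
    if not old:
        return 0.0
    return len(old ^ new) / len(old | new)


# Hypothetical frequent sequences before and after an update.
old_patterns = {("a",), ("b",), ("a", "b")}
new_patterns = {("a",), ("a", "b"), ("c",)}
print(difference(old_patterns, new_patterns))  # 0.5
```

With this normalization the measure lies in [0, 1]: 0 means the pattern sets are identical, 1 means they share no pattern.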

Page 9: When to Update the Sequential Patterns of Stream Data?

The Algorithm of Updating Sequential Pattern (IUS) (1)

The IUS algorithm uses the frequent and negative border sequences in DB and db as the candidates to compute the new frequent sequences and negative border sequences in the updated database U.

DB: the original database, which contains old time-related data.
db: the increment database, which contains new time-related data.
dd: the decrement database, which contains the time-related data deleted from DB.
U: the updated database. When the database is updated incrementally, U = DB + db. When it is updated decrementally, U = DB - dd.
Support(F, X): the support of the sequence F in database X, where X ∈ {db, dd, DB, U}.
Min_supp: minimum support threshold of a frequent sequence.
Min_nbd_supp: minimum support threshold of a negative border sequence.
CX: candidate sequences in database X, where X ∈ {db, dd, DB, U}.
LX: frequent sequences in database X, where X ∈ {db, dd, DB, U}.
NBD(X) = CX - LX: the sequences in database X whose sub_sets are frequent and whose support is lower than Min_supp and greater than Min_nbd_supp.

Page 10: When to Update the Sequential Patterns of Stream Data?

IUS (2)

Property 1: Let B be a frequent sequence in Wk. If A ⊂ B and A ≠ ∅, then occur(A, DB) > occur(B, DB).

Property 2: For a sequence S with S ∈ LU (U = DB + db), we have S ∈ LDB or S ∈ Ldb.

Proof: Assume S ∉ LDB and S ∉ Ldb, that is, occur(S, DB) < Min_sup*|DB| and occur(S, db) < Min_sup*|db|. Then occur(S, DB+db) < Min_sup*|DB+db|, so Support(S, U) < Min_sup, which contradicts the given condition.
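The step in the proof of Property 2 can be written out as a short derivation. It assumes, as the proof appears to do implicitly, that occurrence counts add across the two databases and that |DB + db| = |DB| + |db|.

```latex
\begin{align*}
\mathrm{occur}(S, DB + db)
  &= \mathrm{occur}(S, DB) + \mathrm{occur}(S, db) \\
  &< \mathit{Min\_sup}\cdot|DB| + \mathit{Min\_sup}\cdot|db| \\
  &= \mathit{Min\_sup}\cdot|DB + db| \\
\Rightarrow\quad \mathrm{Support}(S, U)
  &= \frac{\mathrm{occur}(S, DB + db)}{|DB + db|} < \mathit{Min\_sup}
\end{align*}
```

So a sequence that is frequent in neither DB nor db cannot be frequent in U, which is why only sequences from the two existing sets need to be considered as candidates.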

Page 11: When to Update the Sequential Patterns of Stream Data?

IUS – using the stream data model

Wk: the original stream viewing window, which contains old time-related data.
ΔWk+1: the increment stream viewing window, which contains new time-related data.
Wk+1: the updated stream viewing window. When the stream data is updated incrementally, Wk+1 = Wk + ΔWk+1.
Support(F, X): the support of the sequence F in stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}.
Min_supp: minimum support threshold of a frequent sequence.
Min_nbd_supp: minimum support threshold of a negative border sequence.
CX: candidate sequences in stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}.
LX: frequent sequences in stream viewing window X, where X ∈ {Wk+1, Wk, ΔWk+1}.
NBD(X) = CX - LX: the sequences in stream viewing window X whose sub_sets are frequent and whose support is lower than Min_supp and greater than Min_nbd_supp, where X ∈ {Wk+1, Wk, ΔWk+1}.
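The split of candidates into frequent sequences and negative border sequences can be sketched as below. This is only an illustration of the two thresholds, not the IUS algorithm itself: candidate generation is not shown, every key in `supports` is assumed to be a candidate whose sub-sequences are already frequent, and a support equal to Min_supp is assumed to count as frequent. The function name and sample supports are hypothetical.

```python
from typing import Dict, Set, Tuple

Pattern = Tuple[str, ...]  # a candidate sequence as a tuple of event types


def split_candidates(
    supports: Dict[Pattern, float], min_supp: float, min_nbd_supp: float
) -> Tuple[Set[Pattern], Set[Pattern]]:
    """Return (frequent sequences, negative border sequences) for one window."""
    frequent = {p for p, s in supports.items() if s >= min_supp}
    # Negative border: below Min_supp but still above Min_nbd_supp.
    border = {p for p, s in supports.items() if min_nbd_supp < s < min_supp}
    return frequent, border


# Hypothetical supports of three candidates in one window.
supports = {("a",): 0.6, ("b",): 0.35, ("a", "b"): 0.1}
frequent, border = split_candidates(supports, min_supp=0.5, min_nbd_supp=0.2)
print(frequent)  # {('a',)}
print(border)    # {('b',)}
```

Candidates at or below Min_nbd_supp, such as `("a", "b")` here, fall into neither set and are dropped.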

Page 12: When to Update the Sequential Patterns of Stream Data?


IUS – Algorithm (1)

Page 13: When to Update the Sequential Patterns of Stream Data?


IUS – Algorithm (2)

Page 14: When to Update the Sequential Patterns of Stream Data?

Tradeoff between Performance and Difference (TPD) (1)

Use the speedup to measure the performance of IUS:
  Speedup = the execution time of Robust_search / the execution time of IUS
Use the difference to measure the change between the old and the new frequent sequences.
Use Min-Max normalization to bring both measures to the same scale.
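The two measures can be sketched as follows. The normalization formula itself is not part of this transcript, so the sketch uses the standard min-max rescaling to [0, 1]; the target range, the function names, and the sample numbers are assumptions.

```python
from typing import List


def speedup(robust_search_time: float, ius_time: float) -> float:
    """Speedup = execution time of Robust_search / execution time of IUS."""
    return robust_search_time / ius_time


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical timings in seconds, not measurements from the paper.
print(speedup(120.0, 40.0))                # 3.0
print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```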

Page 15: When to Update the Sequential Patterns of Stream Data?

TPD (2)

The TPD method plots the speedup curve and the difference curve, both as functions of the incremental window size, on the same graph under the same scale.
The points where the two curves intersect give the suitable range of the incremental ratio of the initial window for IUS.
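One way to locate the intersection of the two curves is linear interpolation between sampled points, sketched below. The function name `crossings` and the sample curves are hypothetical; both curves are assumed to be already normalized to the same scale and sampled at the same incremental window sizes.

```python
from typing import List


def crossings(xs: List[float], a: List[float], b: List[float]) -> List[float]:
    """Return x positions where curves a and b cross, using linear interpolation."""
    points = []
    for i in range(len(xs) - 1):
        d0 = a[i] - b[i]
        d1 = a[i + 1] - b[i + 1]
        if d0 == 0:
            points.append(xs[i])
        elif d0 * d1 < 0:
            t = d0 / (d0 - d1)  # fraction of the segment where the gap reaches zero
            points.append(xs[i] + t * (xs[i + 1] - xs[i]))
    if len(xs) > 0 and a[-1] == b[-1]:
        points.append(xs[-1])
    return points


# Hypothetical normalized curves over incremental window sizes (in thousands).
sizes = [2.0, 4.0, 6.0, 8.0]
speedup_norm = [1.0, 0.6, 0.2, 0.0]     # assumed shape: a falling curve
difference_norm = [0.0, 0.2, 0.6, 1.0]  # assumed shape: a rising curve
print(crossings(sizes, speedup_norm, difference_norm))
```

Dividing each returned size by the initial window size gives the corresponding incremental ratio.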

Page 16: When to Update the Sequential Patterns of Stream Data?

Experiment

A set of experiments was conducted to find when to update sequential patterns for stream data.

Environment:
  DELL PC Server with 2 Pentium II CPUs
  Memory 512M, Disk 16G
  Operating system: Red Hat Linux 6.0

Data1: the alarms in GSM networks, containing 194 alarm types and 100k alarm events. The alarm events in Data1 range in time from 2001-08-11-18 to 2001-08-13-17.

Page 17: When to Update the Sequential Patterns of Stream Data?

Experiment 1 – on Data 1

|initial window| = 20k
The intersection point: 6K
The suitable range of incremental ratio of initial window: 30% of W0.

Page 18: When to Update the Sequential Patterns of Stream Data?

Experiment 2 – on Data 1

|initial window| = 40k
The intersection point: 9K~10K
The suitable range of incremental ratio of initial window: 22.5%~25% of W0.

Page 19: When to Update the Sequential Patterns of Stream Data?

Experiment 3 – on Data 1

|initial window| = 50k
The intersection point: 15K~18K
The suitable range of incremental ratio of initial window: 30%~36% of W0.

Page 20: When to Update the Sequential Patterns of Stream Data?

Experiment 4 – on Data 1

|initial window| = 60k
The intersection point: 10K~12K
The suitable range of incremental ratio of initial window: 16.7%~20% of W0.
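The ratios reported for the four experiments follow directly from the intersection points and the initial window sizes. The short check below recomputes them; the helper name `ratio_range` is hypothetical, and the sizes are the figures given on the experiment slides.

```python
# (initial window size, low end, high end of the intersection range),
# as reported on the four experiment slides.
experiments = [
    (20_000, 6_000, 6_000),
    (40_000, 9_000, 10_000),
    (50_000, 15_000, 18_000),
    (60_000, 10_000, 12_000),
]


def ratio_range(initial, low, high):
    """Return the intersection range as percentages of the initial window."""
    return (round(100 * low / initial, 1), round(100 * high / initial, 1))


for initial, low, high in experiments:
    print(initial, ratio_range(initial, low, high))
# 20000 (30.0, 30.0)
# 40000 (22.5, 25.0)
# 50000 (30.0, 36.0)
# 60000 (16.7, 20.0)
```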

Page 21: When to Update the Sequential Patterns of Stream Data?

Conclusion

Using the TPD method, it is shown experimentally that the suitable incremental ratio at which to update is about 20 to 30 percent of the size of the initial window for the IUS algorithm.