Online Frequent Episode Mining Xiang Ao 1, Ping Luo 1, Chengkai Li 2, Fuzhen Zhuang 1 and Qing He 1...


Page 1: Online Frequent Episode Mining. Xiang Ao 1, Ping Luo 1, Chengkai Li 2, Fuzhen Zhuang 1 and Qing He 1. 2015-9-18.


Agenda

Introduction

• Problem Formulation

• Solution Framework

• Experimental Results

• Conclusions

Introduction

• Frequent episode mining (FEM) techniques are widely used to analyze data sequences in many domains: manufacturing, telecommunication, finance, biology, news analysis, and system log analysis.

[Figure: an example event sequence. Labels: Time stamps, Events.]

• An episode (specifically, a serial episode in this paper) is a totally ordered set of events.

• E.g., D → A is an episode.
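A minimal sketch of these definitions, not code from the paper. It assumes an event sequence is a mapping from time stamps to event sets, and that the events of a serial episode must occur at strictly increasing time stamps. The sequence `seq` and the function name `occurs` are hypothetical.

```python
from typing import Dict, Sequence, Set

# Hypothetical toy sequence: time stamp -> set of events seen at that time stamp.
seq: Dict[int, Set[str]] = {1: {"D"}, 2: {"C"}, 3: {"A"}, 4: {"A", "B"}}


def occurs(seq: Dict[int, Set[str]], episode: Sequence[str]) -> bool:
    """Return True if the serial episode occurs in seq.

    Assumption of this sketch: the events of the episode must appear in
    order at strictly increasing time stamps.
    """
    pos = 0  # index of the next episode event still to be matched
    for t in sorted(seq):
        # Match at most one episode event per time stamp, as early as possible.
        if pos < len(episode) and episode[pos] in seq[t]:
            pos += 1
    return pos == len(episode)
```

With this toy input, `occurs(seq, ("D", "A"))` holds because D at time stamp 1 precedes A at time stamp 3, while `occurs(seq, ("A", "D"))` does not.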

Introduction

• FEM aims at identifying all the frequent episodes whose frequencies are larger than a user-specified threshold.

Introduction

• Usually, FEM algorithms are time-consuming:

1. The anti-monotonicity property may fail to hold for episode frequency [Achar, 2012].

2. Testing whether an episode occurs in a sequence is an NP-complete problem [Tatti, 2011].

[Achar, 2012] A. Achar, S. Laxman, and P. Sastry, “A unified view of the apriori-based algorithms for frequent episode discovery,” KAIS, 2012.

[Tatti, 2011] N. Tatti and B. Cule, “Mining closed episodes with simultaneous events,” in KDD, 2011.

Introduction

• Previous studies on FEM mostly process data offline in a batch mode.

[Figure: historical data (the example event sequence over time stamps 1 to 7) is fed to an FEM algorithm, which outputs the frequent episodes. For the updated data (time stamp 8 added), the updated frequent episodes are different.]

Introduction

• In this paper, we consider the online frequent episode mining (OFEM) problem.

[Figure: the example event sequence keeps growing beyond time stamp 7, with new event sets arriving at time stamps 8, 9, 10, and so on.]

• Newly emerging episodes may become valuable.

• Old episodes may become obsolete.

• Time-critical applications need efficient methods to find recent and frequent episodes.

Introduction

• Examples of motivating applications: predictive maintenance and high-frequency trading.

• Fast-growing data

• Recency effect

• Time-critical analysis

Introduction

Challenges for an OFEM algorithm:

• Events that are infrequent at the current moment may become frequent in the future.

• The computation is intensive and generates a large number of episode occurrences.

• Efficiently mining all occurrences of episodes over the growing sequence is also a major challenge.

Introduction

Contributions of this paper:

• Propose an algorithm, MESELO (Mining frEquent Serial Episode via Last Occurrence), for online frequent episode mining.

• Design a data structure, the episode trie, to compactly store all minimal occurrences of episodes.

• Introduce the concept of last episode occurrence.

• Compare our method with several state-of-the-art batch-mode FEM methods based on minimal occurrence.

Agenda

• Introduction

Problem Formulation

• Solution Framework

• Experimental Results

• Conclusions

Problem Formulation

[Figure: the example event sequence growing past time stamp 8, with the valid sequence of window size ∆ marked.]

• Frequent episodes may change as the sequence continues growing.

• ∆: window size of the valid sequence.
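A minimal sketch of the valid sequence as a sliding window. It assumes the valid sequence consists of the latest ∆ time stamps, i.e. [k - ∆ + 1, k] for the current time stamp k, and that time stamps are integers. The class name `ValidSequence` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Set, Tuple


class ValidSequence:
    """Keep only the most recent `window` time stamps of a growing sequence."""

    def __init__(self, window: int) -> None:
        self.window = window  # plays the role of the window size (Delta)
        self.items: Deque[Tuple[int, Set[str]]] = deque()

    def append(self, t: int, events: Iterable[str]) -> None:
        """Add the event set of time stamp t and expire time stamps that are too old."""
        self.items.append((t, set(events)))
        # Drop everything outside [t - window + 1, t].
        while self.items and self.items[0][0] <= t - self.window:
            self.items.popleft()

    def timestamps(self) -> List[int]:
        return [t for t, _ in self.items]
```

For example, with `window=3`, appending time stamps 1 through 5 leaves only 3, 4 and 5 in the valid sequence.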

Agenda

• Introduction

• Problem Formulation

Solution Framework

• Experimental Results

• Conclusions

Solution Framework

A minimal occurrence is an occurrence of an episode whose time window does not contain any other occurrence of the same episode.

• A → B is a serial episode in the example.

• Consider another episode D → D in the example.

In addition, a minimal episode occurrence is bounded by a user-specified parameter, the maximal occurrence window δ.

• The support of A → B is 2 in the example.
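A brute-force sketch of minimal occurrences and support, not the MESELO algorithm. It assumes integer time stamps, strictly increasing event times within a serial episode, and that support counts the minimal occurrences no wider than δ. The three-time-stamp `fragment` (A at 5, A and B at 6, B at 7) is inferred from the episode tries in the slides, and `delta=3` is an arbitrary choice.

```python
from typing import Dict, List, Sequence, Set, Tuple

Seq = Dict[int, Set[str]]


def occurs_within(seq: Seq, episode: Sequence[str], lo: int, hi: int) -> bool:
    """True if the serial episode occurs at strictly increasing time stamps in [lo, hi]."""
    pos = 0
    for t in range(lo, hi + 1):
        if pos < len(episode) and episode[pos] in seq.get(t, set()):
            pos += 1
    return pos == len(episode)


def minimal_occurrences(seq: Seq, episode: Sequence[str], delta: int) -> List[Tuple[int, int]]:
    """Windows [ts, te] of width at most delta that contain the episode,
    while neither [ts + 1, te] nor [ts, te - 1] contains it."""
    result = []
    for ts in sorted(seq):
        for te in range(ts, ts + delta):  # window width te - ts + 1 <= delta
            if (occurs_within(seq, episode, ts, te)
                    and not occurs_within(seq, episode, ts + 1, te)
                    and not occurs_within(seq, episode, ts, te - 1)):
                result.append((ts, te))
    return result


# Assumed fragment of the sequence; delta = 3 is an arbitrary choice.
fragment: Seq = {5: {"A"}, 6: {"A", "B"}, 7: {"B"}}
occurrences = minimal_occurrences(fragment, ("A", "B"), delta=3)
support = len(occurrences)
```

On this fragment alone, the minimal occurrences of A → B are [5, 6] and [6, 7]. The window [5, 7] also contains A → B but is not minimal, because it contains [5, 6].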

Solution Framework

• The concept of local time window

[Figure: the example event sequence before and after time stamp 8 arrives. Labels: Valid Sequence, Local time window (δ - 1), Frequent episodes, Updated frequent episodes.]

Solution Framework

• The concept of last episode occurrence

[Figure: the example event sequence with the valid sequence and the local time window marked. The legend distinguishes three kinds of occurrences of A→B in the local time window: a last occurrence; a minimal but not last occurrence; and a last minimal occurrence.]

In MESELO, only last minimal episode occurrences can be further expanded into new minimal episode occurrences.

Solution Framework

• The concept of minimal occurrence starting at i and ending no later than j.

• Definition (Minimal episode occurrence starting at ti and ending no later than tj). Given a time window [ti, tj], we use $M_i^j$ to denote the set of all minimal episode occurrences for which the start time is equal to ti and the end time is not larger than tj.

• In the running example, $M_5^7$ = {(A, [5, 5]), (A → A, [5, 6]), (A → B, [5, 6]), (A → B → B, [5, 7]), (A → A → B, [5, 7])}.
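A brute-force sketch that enumerates $M_i^j$ directly from the definition. It is exponential and is not how MESELO computes it. The three-time-stamp `fragment` is an assumption inferred from the listed $M_5^7$, and the function names are hypothetical. The bound δ is not modelled; the sketch assumes j - i + 1 does not exceed it.

```python
from itertools import combinations, product
from typing import Dict, Sequence, Set, Tuple

Seq = Dict[int, Set[str]]
Occurrence = Tuple[Tuple[str, ...], Tuple[int, int]]


def occurs_within(seq: Seq, episode: Sequence[str], lo: int, hi: int) -> bool:
    """True if the serial episode occurs at strictly increasing time stamps in [lo, hi]."""
    pos = 0
    for t in range(lo, hi + 1):
        if pos < len(episode) and episode[pos] in seq.get(t, set()):
            pos += 1
    return pos == len(episode)


def minimal_from(seq: Seq, i: int, j: int) -> Set[Occurrence]:
    """Brute-force M_i^j: minimal occurrences that start at i and end no later than j."""
    found: Set[Occurrence] = set()
    later = range(i + 1, j + 1)
    for n in range(len(later) + 1):
        for rest in combinations(later, n):
            times = (i,) + rest  # strictly increasing, first time stamp is i
            end = times[-1]
            # Pick one event at each chosen time stamp to form a candidate episode.
            for episode in product(*(sorted(seq.get(t, set())) for t in times)):
                # Minimal iff no strictly smaller window contains the episode.
                if (not occurs_within(seq, episode, i + 1, end)
                        and not occurs_within(seq, episode, i, end - 1)):
                    found.add((episode, (i, end)))
    return found


# Assumed fragment: A at 5, A and B at 6, B at 7.
fragment: Seq = {5: {"A"}, 6: {"A", "B"}, 7: {"B"}}
M_5_7 = minimal_from(fragment, 5, 7)
```

On this fragment the result contains the same five occurrences as the $M_5^7$ listed above. (A → B, [5, 7]) is excluded because [5, 6] already contains A → B.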

Solution Framework

[Figure: the valid sequence (window size Δ) and the latest δ - 1 time stamps, before and after the sequence grows from time stamp k to k+1. Axis labels include k-Δ+1, k-Δ+2, k-δ+1, k-δ+3, k-1, k and k+1.]

Solution Framework

• Use an episode trie $T_i^j$ to denote $M_i^j$.

• Each node p = p.event:p.time consists of two fields, p.event and p.time.

• p.event registers which event this node represents.

• p.time registers the occurrence timestamp.

• The event field of the root is associated with the empty string (labeled as “root”), and the time field of the root is equal to ti.

• The event sequence along the path from the root to p denotes a minimal episode occurrence, and its occurrence window is [ti, p.time]. E.g., (A → A, [5, 6]).

The episode trie $T_5^7$:

root:5
    A:5
        A:6
            B:7
        B:6
            B:7

Legend: a non-last occurrence node denotes a minimal but not last occurrence; a last occurrence node denotes a last minimal occurrence.

In fact, $T_i^j$ corresponds to $M_i^j$. In MESELO, only last occurrence nodes can be further expanded into new minimal episode occurrences.
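A minimal sketch of the episode trie as a data structure, under stated assumptions. The `Node` fields follow p.event and p.time from the slide, but `build_trie` and `occurrences_of` are hypothetical helpers. The last / non-last flag is not modelled, and the MESELO update rule is not implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set, Tuple

Occurrence = Tuple[Tuple[str, ...], Tuple[int, int]]


@dataclass
class Node:
    event: str  # p.event ("root" for the root)
    time: int   # p.time (t_i for the root)
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_trie(start: int, occurrences: Set[Occurrence]) -> Node:
    """Build a trie from a set M_i^j; assumes every prefix of an episode is also in the set."""
    root = Node("root", start)
    # Shorter episodes first, so every prefix node exists before it is extended.
    for episode, (_, end) in sorted(occurrences, key=lambda o: len(o[0])):
        node = root
        for event in episode[:-1]:
            node = node.children[event]
        node.children[episode[-1]] = Node(episode[-1], end)
    return root


def occurrences_of(root: Node) -> Iterator[Occurrence]:
    """Yield (episode, [t_i, p.time]) for the path from the root to every node p."""
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        for child in node.children.values():
            episode = prefix + (child.event,)
            yield episode, (root.time, child.time)
            stack.append((child, episode))


# M_5^7 as listed in the slides.
M_5_7: Set[Occurrence] = {
    (("A",), (5, 5)),
    (("A", "A"), (5, 6)),
    (("A", "B"), (5, 6)),
    (("A", "B", "B"), (5, 7)),
    (("A", "A", "B"), (5, 7)),
}
T_5_7 = build_trie(5, M_5_7)
```

Walking every root-to-node path of `T_5_7` with `occurrences_of` gives back exactly the set `M_5_7`. Shared prefixes such as A:5 are stored once, which is where the compactness comes from.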

Solution Framework

• MESELO Algorithm

Basically,

• Step 1: create a new $T_{k+1}^{k+1}$ and update the superscript of each $T_i^k$, which changes from k to k+1.

• Step 2: transfer the episode trie $T_{k-\delta+2}^{k+1}$ out of the main memory.

Valid Sequence; latest δ timestamps; $E_{k+1} = E_8 = \{B\}$

Before processing:

(a)
root:5
    A:5
        A:6
            B:7
        B:6
            B:7

(b)
root:6
    A:6
        B:7
    B:6
        B:7

(c)
root:7
    B:7

After processing:

(d)
root:5
    A:5
        A:6
            B:7
                B:8
        B:6
            B:7
                B:8

(e)
root:6
    A:6
        B:7
            B:8
    B:6
        B:7
            B:8

(f)
root:7
    B:7
        B:8

(g)
root:8
    B:8

For more details, the proof of soundness and completeness of the algorithm, and the complexity analysis, please refer to the paper.

Agenda

• Introduction

• Motivation

• Problem Formulation

• Solution Framework

Experimental Results

• Conclusions

Experimental Results

Data sets: online mode and batch mode.

Environments

Mining Server:

• 2.00 GHz Intel Xeon E5-2620

• 32 GB memory

• Windows 2008

Database Server:

• 2.00 GHz Intel Xeon E5-2620

• 16 GB memory

• Linux CentOS

• 100MB connection

Baselines

Mode           Method
Online mode    BRUTE
Online mode    MESELO-BS
Batch mode     PPS [ICDM’04]
Batch mode     MINEPI+ [Info. Sys.’08]
Batch mode     UP-Span [KDD’13]
Batch mode     DFS [DKE’13]

Degradation of MESELO Alg.

Experimental Results (1)

• Online mode data preparation

• Data from China Stock Exchange Daily Trading list (denoted as Stock-1 to Stock-6) over 2,509 trading days from January 1st, 2004 to May 9th, 2014.

• We always select the leading stocks from each industry.

Table 1. Details of online mode data sets

Industry Name          # of Stocks    Datasets Name
Pharmaceuticals        1              Stock-1
Security               2              Stock-2
Electricity Power      4              Stock-3
Iron and Steel         6              Stock-4
Nonferrous-material    8              Stock-5
Estate                 10             Stock-6

• Build stock events from the daily closing price:

1. Calculate the increase ratio r of the price between two consecutive trading days.

2. Discretize the value of r into 4 levels: UH (r ≥ 3.5%), UL (0% ≤ r < 3.5%), DL (−3.5% < r < 0%), DH (r ≤ −3.5%).

3. Thus, each stock produces exactly one of the four events every trading day.
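A minimal sketch of the discretization step. Two assumptions are made here. First, the increase ratio is taken as (today - previous) / previous. Second, the boundary values ±3.5% are assigned to UH and DH respectively, so that the four levels do not overlap. The function names are hypothetical.

```python
from typing import List


def stock_event(r: float) -> str:
    """Map a daily increase ratio r (0.035 means 3.5%) to one of four events."""
    if r >= 0.035:
        return "UH"  # up, high
    if r >= 0.0:
        return "UL"  # up, low
    if r > -0.035:
        return "DL"  # down, low
    return "DH"      # down, high


def events_from_closing_prices(prices: List[float]) -> List[str]:
    """Produce one event per pair of consecutive trading days."""
    return [
        stock_event((today - prev) / prev)
        for prev, today in zip(prices, prices[1:])
    ]
```

For instance, closing prices 100, 104, 104, 100 yield the ratios +4%, 0% and about −3.85%, which map to UH, UL and DH.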

Experimental Results (2)

• Online mode experimental results

• Comparison method:

• Sequentially read every event set of the coming time stamp, and perform online frequent episode mining.

• Record the execution time at each time stamp and use their average value as the measure for the comparison.

Note: the average time over all time stamps is only related to δ.
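A minimal sketch of this measurement protocol. The `process` callback stands in for whatever online miner is being timed; it is a hypothetical placeholder, not an API from the paper.

```python
import time
from typing import Callable, Iterable, List, Set


def average_time_per_timestamp(
    event_sets: Iterable[Set[str]],
    process: Callable[[int, Set[str]], None],
) -> float:
    """Feed event sets one time stamp at a time and average the per-step wall time."""
    durations: List[float] = []
    for t, events in enumerate(event_sets, start=1):
        started = time.perf_counter()
        process(t, events)  # hypothetical online miner update for time stamp t
        durations.append(time.perf_counter() - started)
    return sum(durations) / len(durations) if durations else 0.0
```

Only the call to `process` is timed, so reading the next event set does not count towards the average.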

Experimental Results (4)

• Batch mode data preparation

Table 2. Details of batch mode data sets

Datasets Name    Data Type
Retail           Market basket data from stores.
ChainStore       Market basket data from stores.
Kosarak          Click-stream data from web sites.
BMS              Click-stream data from web sites.

Note: The four datasets were originally intended for sequential pattern mining. We follow the processing steps in [1].

[1] C.-W. Wu, Y.-F. Lin, S. Y. Philip, and V. S. Tseng, “Mining high utility episodes in complex event sequences,” in KDD, 2013.

Sequential pattern mining data form:

Tid    Events
1      A, B, D
2      B, E
3      A, F
…      …

Episode mining data form:

Time stamp    1      2     3     ...
Events        ABD    BE    AF    ...

[Figure: conversion from the sequential pattern mining data form to the episode mining data form. Labels: Horizontal, Vertical.]
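A minimal sketch of the conversion shown in the figure. It assumes each transaction id (Tid) is used directly as a time stamp and the items of a transaction become simultaneous events. The actual processing steps in [1] may differ, and `to_event_sequence` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def to_event_sequence(transactions: List[Tuple[int, List[str]]]) -> Dict[int, Set[str]]:
    """Treat each transaction id as a time stamp and its items as simultaneous events."""
    sequence: Dict[int, Set[str]] = {}
    for tid, items in transactions:
        sequence.setdefault(tid, set()).update(items)
    return sequence


# The three rows shown in the sequential pattern mining data form.
rows = [(1, ["A", "B", "D"]), (2, ["B", "E"]), (3, ["A", "F"])]
sequence = to_event_sequence(rows)
```

The resulting mapping has the event set {A, B, D} at time stamp 1, {B, E} at 2 and {A, F} at 3.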

Experimental Results (5)

• Batch mode performance evaluations

• Comparison method: min_sup & δ variations

1. Fix δ and vary min_sup. (See Fig. 8)

2. Fix min_sup and vary δ. (See Fig. 9)

BMS has a shorter sequence length and, most importantly, fewer events per timestamp than the other datasets.

Agenda

• Introduction

• Motivation

• Problem Formulation

• Solution Framework

• Experimental Results

Conclusions

Conclusions

New problem: online frequent episode mining.

• Especially useful for time-critical applications with growing sequences.

Efficient online algorithm (i.e., MESELO).

• Experiments on real data sets show that MESELO is at least one order of magnitude faster than the other baselines.

New concepts of last episode occurrence and episode trie.

• Detecting the minimal episode occurrences efficiently.

• All minimal episode occurrences are stored in a compact way.

Thanks! Q&A

http://mldm.ict.ac.cn/MLDM/~aox