Matching Heterogeneous Events with Patterns
Xiaochen Zhu¹, Shaoxu Song¹, Jianmin Wang¹, Philip S. Yu², Jiaguang Sun¹
¹ Tsinghua University, China
² University of Illinois at Chicago, USA
ICDE 2014
Outline
- Motivation
- Event Matching Framework
- A* Search Algorithm
  - Computing the Normal Distance G
  - Simple Upper Bound of H
- Advanced Bounding Function
- Pay-As-You-Go Matching
- Experiments
- Conclusion
Information System and Event Log
Information systems play an important role in large enterprises:
- Enterprise Resource Planning (ERP)
- Office Automation (OA)
These systems record the business history in their event logs.
| Trace ID | Trace |
| --- | --- |
| 1 | ABCDEF |
| 2 | ACBDEF |
| 3 | ACBDFE |
| 4 | ABCDFE |
| 5 | ACBDEF |
| 6 | ACBDEF |
| 7 | ACBDFE |
| 8 | ACBDFE |
| 9 | ACBDFE |
| 10 | ACBDFE |

Events of trace 1 (ABCDEF):

| Event ID | Trace ID | Event Name | Timestamp |
| --- | --- | --- | --- |
| 1 | 1 | Order Received (A) | 04-22 13:33:34 |
| 2 | 1 | Payment (B) | 04-22 15:10:17 |
| 3 | 1 | Check Inventory (C) | 04-22 15:18:11 |
| 4 | 1 | Ship Goods (D) | 04-22 15:31:50 |
| 5 | 1 | Record Order (E) | 04-23 08:14:26 |
| 6 | 1 | Send Notification (F) | 04-23 08:17:18 |
Event Data Integration
Applications: complex event processing, provenance analysis, and decision support.
All of them rely on exploring the correspondence among events.
[Figure: a business data warehouse integrating the event logs produced by the information systems of the Beijing, Shanghai, and Guangzhou subsidiaries.]
Heterogeneous Events
Different events may represent the same activity
Event log with English names:

| Event Name | Timestamp |
| --- | --- |
| Order Received (A) | 04-22 13:33:34 |
| Payment (B) | 04-22 15:10:17 |
| Check Inventory (C) | 04-22 15:18:11 |
| Ship Goods (D) | 04-22 15:31:50 |
| Record Order (E) | 04-23 08:14:26 |
| Send Notification (F) | 04-23 08:17:18 |

Event log with abbreviations of the Chinese phonetic representation:

| Event Name | Timestamp |
| --- | --- |
| JD (1) | 03-18 09:12:07 |
| YD (2) | 03-18 09:27:14 |
| TJD (3) | 03-18 09:30:18 |
| CK (5) | 03-18 09:35:32 |
| ZF (4) | 03-18 09:50:12 |
| FH (6) | 03-18 10:30:47 |
| DL (7) | 03-18 12:31:12 |
| FT (8) | 03-18 12:40:40 |
Convert Event Log to Graph
- Text similarity fails: the event names of the two logs share no common text.
- Use statistical and structural information instead: Event Log → Event Dependency Graph (V, E, f).
[Figure: the example log (traces 1 to 10, as above) and its event dependency graph over events A to F. Vertex weights are frequencies of appearance, e.g., f(A,A) = 1.0; edge weights are frequencies of consecutive events, e.g., f(A,C) = 0.8.]
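A minimal Python sketch of the log-to-graph conversion. It assumes that the vertex weight f(v,v) is the fraction of traces in which event v appears and that the edge weight f(u,v) is the fraction of traces in which u is immediately followed by v, each counted at most once per trace; `dependency_graph` is a hypothetical helper, not the authors' code. On the ten example traces it yields f(A,A) = 1.0 and f(A,C) = 0.8, the two values labeled on the slide.

```python
from collections import Counter

def dependency_graph(traces):
    """Build the weight function f of an event dependency graph (V, E, f).

    Assumptions: f(v, v) is the fraction of traces containing event v, and
    f(u, v) is the fraction of traces in which u is immediately followed by v.
    Each event or pair is counted at most once per trace. Following the slide's
    f(A,A) notation, vertex and edge weights share one dict, so a log with
    self-loops such as "AA" would need separate dicts.
    """
    n = len(traces)
    vertex_counts = Counter()
    edge_counts = Counter()
    for trace in traces:
        vertex_counts.update(set(trace))                # distinct events in this trace
        edge_counts.update(set(zip(trace, trace[1:])))  # distinct consecutive pairs
    f = {(v, v): c / n for v, c in vertex_counts.items()}
    f.update({pair: c / n for pair, c in edge_counts.items()})
    return f

LOG = ["ABCDEF", "ACBDEF", "ACBDFE", "ABCDFE", "ACBDEF",
       "ACBDEF", "ACBDFE", "ACBDFE", "ACBDFE", "ACBDFE"]

f = dependency_graph(LOG)
print(f[("A", "A")], f[("A", "C")], f[("D", "E")])  # 1.0 0.8 0.4
```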
Graph-Based Matching Framework
- Event logs → dependency graphs
- Event matching → vertex mapping (an injective mapping M: V1 → V2)
[Figure: Event Log 1 and Event Log 2 are converted into dependency graphs G1 (vertices A, B, C) and G2 (vertices 1, 2, 3) with frequency-weighted vertices and edges; three candidate vertex mappings from G1 to G2 are shown.]
How do we evaluate which mapping is best?
Evaluation of Mapping
Feature space: Vertex + Edge
- Vertex: A, B, C
- Edge: (A,B), (A,C), (C,B)

Similarity of corresponding elements under the mapping M = {A→1, B→2, C→3}:
- vertices: A→1, B→2, C→3, e.g., S(B→2)
- edges: (A,B)→(1,2), (A,C)→(1,3), (C,B)→(3,2), e.g., S((A,C)→(1,3))

[Figure: the dependency graphs G1 (A, B, C) and G2 (1, 2, 3) from the previous slide, with vertex and edge frequencies.]
Normal Distance
Normal Distance*: the summation of the similarities of corresponding elements, D_N(M) = Σ_x S(x → M(x)) over all vertices and edges x of G1. Higher is better.
* J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
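A minimal sketch of the normal distance, assuming the similarity of Kang and Naughton's metric, S = 1 - |a - b| / (a + b) for a frequency a in G1 and its counterpart b in G2 (taken as 1 when both are 0). The toy frequencies below are hypothetical (G2 is G1 relabeled by A→1, B→2, C→3), and `similarity` / `normal_distance` are illustrative names.

```python
def similarity(a, b):
    """Assumed element similarity: 1 - |a - b| / (a + b); 1.0 when a = b = 0."""
    if a + b == 0:
        return 1.0
    return 1.0 - abs(a - b) / (a + b)

def normal_distance(f1, f2, mapping):
    """Sum the similarity of every vertex and edge of G1 with its image in G2.

    f1, f2 are keyed by (u, v) as in f(u, v), with a vertex v stored as (v, v).
    mapping sends G1 vertices to G2 vertices; a missing image has frequency 0.
    """
    total = 0.0
    for (u, v), a in f1.items():
        b = f2.get((mapping[u], mapping[v]), 0.0)
        total += similarity(a, b)
    return total

# Hypothetical toy graphs: G2 is G1 relabeled by A->1, B->2, C->3.
f1 = {("A", "A"): 1.0, ("B", "B"): 1.0, ("C", "C"): 1.0,
      ("A", "B"): 0.2, ("A", "C"): 0.8, ("C", "B"): 0.8}
f2 = {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 1.0,
      (1, 2): 0.2, (1, 3): 0.8, (3, 2): 0.8}

print(normal_distance(f1, f2, {"A": 1, "B": 2, "C": 3}))            # 6.0
print(round(normal_distance(f1, f2, {"A": 3, "B": 2, "C": 1}), 2))  # 3.8
```

The correct relabeling scores the maximum of one point per element (3 vertices + 3 edges), while the swapped mapping loses points on every edge.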
Event Matching Problem
Candidate mappings: {A→1, B→2, C→3} and {A→3, B→2, C→1}
Problem: Given two event logs, the event matching problem is to find an event mapping M that maximizes the normal distance D_N(M).
[Figure: G1 (A, B, C) and G2 (1, 2, 3) with the two candidate mappings drawn between them.]
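For tiny graphs the problem can be solved by exhaustive enumeration, which makes the objective concrete. This sketch reuses the assumed similarity 1 - |a - b| / (a + b) and the same hypothetical toy frequencies as before (G2 is G1 relabeled by A→1, B→2, C→3); `best_mapping_bruteforce` is an illustrative name, not the authors' method.

```python
from itertools import permutations

def similarity(a, b):
    # Assumed normal-distance similarity; 1.0 when both frequencies are 0.
    return 1.0 if a + b == 0 else 1.0 - abs(a - b) / (a + b)

def normal_distance(f1, f2, mapping):
    # Sum over the vertices (v, v) and edges (u, v) of G1.
    return sum(similarity(a, f2.get((mapping[u], mapping[v]), 0.0))
               for (u, v), a in f1.items())

def best_mapping_bruteforce(f1, f2):
    """Try every injective mapping V1 -> V2 and keep the one with maximum D_N.

    Assumes |V1| <= |V2| and that every vertex v has an entry (v, v).
    """
    v1 = sorted({u for u, _ in f1})
    v2 = sorted({x for x, _ in f2})
    best, best_score = None, float("-inf")
    for image in permutations(v2, len(v1)):
        mapping = dict(zip(v1, image))
        score = normal_distance(f1, f2, mapping)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

f1 = {("A", "A"): 1.0, ("B", "B"): 1.0, ("C", "C"): 1.0,
      ("A", "B"): 0.2, ("A", "C"): 0.8, ("C", "B"): 0.8}
f2 = {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 1.0,
      (1, 2): 0.2, (1, 3): 0.8, (3, 2): 0.8}

print(best_mapping_bruteforce(f1, f2))  # ({'A': 1, 'B': 2, 'C': 3}, 6.0)
```

Enumeration visits n! mappings when both graphs have n events, so it only serves as a reference for very small inputs.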
Vertex+Edge, Not Enough
Wrong mapping M = {A→6, B→2, C→1, D→3, E→4, F→5}: D_N(M) = 14.00
True mapping M_truth = {A→3, B→4, C→5, D→6, E→7, F→8}: D_N(M_truth) = 13.91

[Figure: dependency graphs G1 (events A to F) and G2 (events 1 to 8) with vertex and edge frequencies, and the two mappings drawn between them.]

Vertex + Edge is not discriminative enough: the wrong mapping scores higher than the true one.
Fail!
More Features: Event Patterns
Event pattern: a particular order of event occurrences.
| Pattern | Frequency |
| --- | --- |
| B | 1.0 |
| SEQ(D,E) | 0.4 |
| AND(B,C) | 1.0 |
| SEQ(A,AND(B,C),D) | 1.0 |

[Figure: the example log (traces 1 to 10), with each trace marked as a match or not a match of a pattern.]
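To make these frequencies concrete, here is a sketch that assumes SEQ(p1, ..., pk) means the sub-patterns occur consecutively in that order and AND(p1, ..., pk) means they occur consecutively in any order; a pattern's frequency is taken as the fraction of traces containing at least one occurrence. Under these assumptions it reproduces the four values in the table. Single-character event names and the tuple encoding of patterns are simplifications of ours.

```python
from itertools import permutations, product

def expansions(pattern):
    """Return every event string that `pattern` can match (assumed semantics).

    A pattern is a one-character event name or a tuple (op, children) with
    op "SEQ" (children consecutive, in the given order) or "AND"
    (children consecutive, in any order).
    """
    if isinstance(pattern, str):
        return {pattern}
    op, children = pattern
    if op not in ("SEQ", "AND"):
        raise ValueError(f"unknown operator: {op!r}")
    child_sets = [expansions(c) for c in children]
    orders = [child_sets] if op == "SEQ" else permutations(child_sets)
    return {"".join(parts) for order in orders for parts in product(*order)}

def pattern_frequency(pattern, traces):
    """Fraction of traces that contain at least one occurrence of `pattern`."""
    targets = expansions(pattern)
    hits = sum(any(t in trace for t in targets) for trace in traces)
    return hits / len(traces)

LOG = ["ABCDEF", "ACBDEF", "ACBDFE", "ABCDFE", "ACBDEF",
       "ACBDEF", "ACBDFE", "ACBDFE", "ACBDFE", "ACBDFE"]
PATTERNS = ["B", ("SEQ", ("D", "E")), ("AND", ("B", "C")),
            ("SEQ", ("A", ("AND", ("B", "C")), "D"))]

print([pattern_frequency(p, LOG) for p in PATTERNS])  # [1.0, 0.4, 1.0, 1.0]
```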
Pattern Normal Distance
Given an event matching M and a set of patterns P: D_N(M) = Σ_{p ∈ P} S(p → M(p)).
Vertices and edges can also be seen as patterns, so the Pattern Normal Distance is compatible with the Normal Distance.
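A sketch of the pattern normal distance under the same assumptions as before (normal-distance similarity, SEQ ordered, AND unordered). Patterns are renamed through the mapping, with AND children stored as a frozenset so that AND(4,5) and AND(5,4) coincide. The Log 1 frequencies are the four from the pattern table; Log 2 is a hypothetical copy of Log 1 relabeled by M_truth, and `map_pattern` / `pattern_normal_distance` are illustrative names.

```python
def similarity(a, b):
    # Assumed normal-distance similarity; 1.0 when both frequencies are 0.
    return 1.0 if a + b == 0 else 1.0 - abs(a - b) / (a + b)

def map_pattern(pattern, mapping=None):
    """Rename the events of a pattern (identity when mapping is None).

    Patterns are event names or (op, children) tuples; AND children become a
    frozenset so that the order inside AND does not matter.
    """
    if isinstance(pattern, str):
        return pattern if mapping is None else mapping[pattern]
    op, children = pattern
    mapped = [map_pattern(c, mapping) for c in children]
    return (op, frozenset(mapped) if op == "AND" else tuple(mapped))

def pattern_normal_distance(patterns, freq1, freq2, mapping):
    """Sum of S(p, M(p)) over p in P; a pattern absent from freq2 has frequency 0."""
    total = 0.0
    for p in patterns:
        a = freq1[map_pattern(p)]
        b = freq2.get(map_pattern(p, mapping), 0.0)
        total += similarity(a, b)
    return total

BC = ("AND", ("B", "C"))
PATTERNS = ["B", ("SEQ", ("D", "E")), BC, ("SEQ", ("A", BC, "D"))]
FREQS = [1.0, 0.4, 1.0, 1.0]  # Log 1 frequencies from the pattern table
TRUTH = {"A": "3", "B": "4", "C": "5", "D": "6", "E": "7", "F": "8"}

freq1 = {map_pattern(p): f for p, f in zip(PATTERNS, FREQS)}
# Hypothetical Log 2: the same frequencies under the relabeling TRUTH.
freq2 = {map_pattern(p, TRUTH): f for p, f in zip(PATTERNS, FREQS)}

print(pattern_normal_distance(PATTERNS, freq1, freq2, TRUTH))  # 4.0
swapped = dict(TRUTH, B="5", C="4")
print(pattern_normal_distance(PATTERNS, freq1, freq2, swapped))  # 3.0
```

Swapping B and C only costs the vertex pattern B: the AND pattern, and the complex pattern built on it, are unaffected by the order of their children.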
Matching Events with Patterns
Patterns:
- Vertex patterns: A, B, C, D, E, F
- Edge patterns: SEQ(A,B), SEQ(A,C), SEQ(B,C), SEQ(C,B), SEQ(B,D), SEQ(C,D), SEQ(D,E), SEQ(D,F), SEQ(E,F), SEQ(F,E)
- Complex pattern: SEQ(A, AND(B, C), D), which M_truth maps to SEQ(3, AND(4, 5), 6)

Wrong mapping M = {A→6, B→2, C→1, D→3, E→4, F→5}: D_N(M) = 14.00
True mapping M_truth = {A→3, B→4, C→5, D→6, E→7, F→8}: D_N(M_truth) = 14.91

[Figure: the same dependency graphs G1 and G2 as on the previous slide, with both mappings drawn between them.]

With the complex pattern added, the true mapping now scores higher than the wrong one.
Hardness of Matching Events
There is a large number of possible mappings. In a survey of a real Chinese bus manufacturer, the average number of distinct events is 18, and the number of all possible event mappings grows factorially with it.
The key issue is efficiency.
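For a sense of scale (our illustration, assuming both logs contain n events and every event is mapped one-to-one), the number of candidate mappings is n!. Python's `math.perm` (Python 3.8+) counts injective mappings directly; `count_injective_mappings` is an illustrative wrapper.

```python
import math

def count_injective_mappings(n1, n2):
    """Number of injective mappings from n1 events into n2 events (n1 <= n2)."""
    return math.perm(n2, n1)

print(count_injective_mappings(4, 4))    # 24
print(count_injective_mappings(18, 18))  # 6402373705728000
```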
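A* Search Algorithm
- Input: two dependency graphs and pre-defined patterns
- Output: a vertex mapping with the maximum pattern normal distance
- Process: growth of an A* tree
- Tree node: a partial mapping M, the unmapped events U1 of G1, and the unmapped events U2 of G2; e.g., the root is M: {}, U1: {A,B,C,D}, U2: {1,2,3,4}
- Two scores g and h: g is the current (exact) score; h is the remaining score (an upper bound)
- Heuristic: always visit the tree node with the highest g + h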
Growth of A* Search Tree

| Node | M | U1 | U2 | g | h | g + h |
| --- | --- | --- | --- | --- | --- | --- |
| Root | {} | {A,B,C,D} | {1,2,3,4} | | | |
| 1 | {A→1} | {B,C,D} | {2,3,4} | 0.8 | 3.0 | 3.8 |
| 2 | {A→2} | {B,C,D} | {1,3,4} | 1.0 | 3.0 | 4.0 |
| 3 | {A→3} | {B,C,D} | {1,2,4} | 0.7 | 3.0 | 3.7 |
| 4 | {A→4} | {B,C,D} | {1,2,3} | 0.5 | 3.0 | 3.5 |
| 5 | {A→2, C→1} | {B,D} | {3,4} | 1.8 | 2.0 | 3.8 |
| 6 | {A→2, C→3} | {B,D} | {1,4} | 2.0 | 2.0 | 4.0 |
| 7 | {A→2, C→4} | {B,D} | {1,3} | 1.2 | 2.0 | 3.2 |
| 10 | {A→2, C→3, B→4, D→1} | {} | {} | 4.0 | 0.0 | 4.0 |

g: current (exact) score; h: remaining score (upper bound). Node 2, and then node 6, have the highest g + h and are expanded, leading to the complete mapping in node 10.
Terminate when U1 or U2 is empty.
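A compact Python sketch of this best-first search, under several assumptions of ours: patterns are tuples of G1 vertices (a vertex is a 1-tuple, an edge a 2-tuple), frequencies are given as dicts, the similarity is the assumed normal-distance one, and h is the simple bound (1.0 per pattern not yet fully mapped). G1 vertices are mapped in a fixed order, and a node counts as complete when U1 or U2 is empty. Because h never underestimates what the remaining patterns can add, the first complete node popped is optimal. `astar_match` and the toy data are hypothetical.

```python
import heapq
from itertools import count

def similarity(a, b):
    # Assumed normal-distance similarity; 1.0 when both frequencies are 0.
    return 1.0 if a + b == 0 else 1.0 - abs(a - b) / (a + b)

def astar_match(v1, v2, patterns, f1, f2):
    """Best-first (A*) search for the mapping with the highest pattern normal distance.

    patterns: tuples of G1 vertices, e.g. ("A",) or ("A", "B").
    f1: frequency of each pattern in G1; f2: frequency of G2 vertex tuples (missing = 0).
    Returns (mapping, score).
    """
    def g_and_remaining(m):
        g, remaining = 0.0, 0
        for p in patterns:
            if all(v in m for v in p):
                g += similarity(f1[p], f2.get(tuple(m[v] for v in p), 0.0))
            else:
                remaining += 1
        return g, remaining

    tie = count()  # tie-breaker so heapq never compares dicts
    heap = [(-float(len(patterns)), next(tie), {})]  # root: g = 0, h = |P|
    while heap:
        neg_priority, _, m = heapq.heappop(heap)
        u1 = [v for v in v1 if v not in m]
        used = set(m.values())
        u2 = [w for w in v2 if w not in used]
        if not u1 or not u2:          # complete node: h = 0, so priority = g
            return m, -neg_priority
        v = u1[0]                     # expand by mapping the next G1 vertex
        for w in u2:
            child = {**m, v: w}
            g, remaining = g_and_remaining(child)
            complete = len(u1) == 1 or len(u2) == 1
            # Simple bound: 1.0 per remaining pattern; 0 once nothing more can be mapped.
            h = 0.0 if complete else float(remaining)
            heapq.heappush(heap, (-(g + h), next(tie), child))
    return {}, 0.0

V1, V2 = ["A", "B"], [1, 2]
PATTERNS = [("A",), ("B",), ("A", "B")]
F1 = {("A",): 1.0, ("B",): 0.5, ("A", "B"): 0.5}
F2 = {(1,): 0.5, (2,): 1.0, (2, 1): 0.5}
print(astar_match(V1, V2, PATTERNS, F1, F2))  # ({'A': 2, 'B': 1}, 3.0)
```

Python's `heapq` is a min-heap, so priorities are negated to pop the node with the highest g + h first.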
Incremental Computing of G
G1: A B C D; G2: 1 2 3 4
Patterns: A, B, C, D, SEQ(A,B), SEQ(B,C), SEQ(C,B), SEQ(C,D), SEQ(A,B,C), SEQ(B,C,D)

Parent node M1: M = {A→1, B→2}, U1 = {C,D}, U2 = {3,4}
Child node M2: M = {A→1, B→2, C→3}, U1 = {D}, U2 = {4}

1. Newly introduced patterns: C, SEQ(B,C), SEQ(C,B), SEQ(A,B,C)
2. Prune unmapped patterns: SEQ(C,B)
3. Compute similarities: C→3, SEQ(B,C)→SEQ(2,3), SEQ(A,B,C)→SEQ(1,2,3)

g of the parent M1 + these similarities = g of the child M2
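The incremental update can be sketched as follows, with patterns encoded as tuples of G1 events (a vertex is a 1-tuple). Only patterns whose last unmapped event is the one just added are scored. A pattern whose image does not occur in G2 is skipped, which is exact under the assumed similarity because 1 - |a - 0| / (a + 0) = 0 for a > 0. The frequencies and the parent score 2.0 are hypothetical, and the helper names are ours.

```python
def similarity(a, b):
    # Assumed normal-distance similarity; 1.0 when both frequencies are 0.
    return 1.0 if a + b == 0 else 1.0 - abs(a - b) / (a + b)

def newly_introduced(patterns, parent, v):
    """Patterns that become fully mapped when event v is added to `parent`."""
    return [p for p in patterns if v in p and all(u == v or u in parent for u in p)]

def child_g(parent_g, parent, v, w, patterns, f1, f2):
    """g(child) = g(parent) + similarities of the newly introduced patterns."""
    child = {**parent, v: w}
    delta = 0.0
    for p in newly_introduced(patterns, parent, v):
        image = tuple(child[u] for u in p)
        if image not in f2:
            # Prune: a G1 pattern with no counterpart in G2 would score 0 anyway.
            continue
        delta += similarity(f1[p], f2[image])
    return parent_g + delta

PATTERNS = [("A",), ("B",), ("C",), ("D",), ("A", "B"), ("B", "C"), ("C", "B"),
            ("C", "D"), ("A", "B", "C"), ("B", "C", "D")]
parent = {"A": 1, "B": 2}
print(newly_introduced(PATTERNS, parent, "C"))
# [('C',), ('B', 'C'), ('C', 'B'), ('A', 'B', 'C')]

# Hypothetical frequencies; G2 has no SEQ(3,2), so SEQ(C,B) is pruned.
F1 = {("C",): 1.0, ("B", "C"): 0.6, ("C", "B"): 0.4, ("A", "B", "C"): 0.6}
F2 = {(3,): 1.0, (2, 3): 0.6, (1, 2, 3): 0.6}
print(child_g(2.0, parent, "C", 3, PATTERNS, F1, F2))  # 5.0
```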
Estimating Upper Bound of H
Example: G1: A B C D; G2: 1 2 3 4
Patterns: A, B, C, D, SEQ(A,B), SEQ(B,C), SEQ(C,B), SEQ(C,D), SEQ(A,B,C), SEQ(B,C,D)
Node: M = {A→1, B→2, C→3}, U1 = {D}, U2 = {4}
Remaining patterns: D, SEQ(C,D), SEQ(B,C,D)

Simple Bounding Function
We assume each remaining pattern has a matching pattern with similarity 1.0, so here h = 3.

Advanced Bounding Function
Motivation: the estimate must be computed quickly, so finding the actual counterpart of each remaining pattern and computing its similarity online is too costly.
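The simple bound just counts the patterns that are not yet fully mapped; for the example node it gives h = 3. Patterns are encoded as tuples of G1 events, and the helper names are illustrative.

```python
def remaining_patterns(patterns, mapping):
    """Patterns that still contain at least one unmapped event."""
    return [p for p in patterns if not all(v in mapping for v in p)]

def simple_h(patterns, mapping):
    """Simple bound: each remaining pattern is assumed to reach similarity 1.0."""
    return float(len(remaining_patterns(patterns, mapping)))

PATTERNS = [("A",), ("B",), ("C",), ("D",), ("A", "B"), ("B", "C"), ("C", "B"),
            ("C", "D"), ("A", "B", "C"), ("B", "C", "D")]
m = {"A": 1, "B": 2, "C": 3}
print(remaining_patterns(PATTERNS, m))  # [('D',), ('C', 'D'), ('B', 'C', 'D')]
print(simple_h(PATTERNS, m))            # 3.0
```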
Advanced Bounding Function
Use another frequency to take the place of the unknown counterpart frequency:
- the highest vertex frequency
- the highest edge frequency

Upper bounds are derived per case of pattern:
- a general pattern
- a simple pattern SEQ(…)
- a simple pattern AND(…)
- a complex pattern
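One way to realize this idea, offered as our sketch rather than the paper's exact case table: under the assumed similarity S(a, x) = 1 - |a - x| / (a + x), for x < a the similarity equals 2x / (a + x), which increases with x. So if the counterpart frequency x is only known to be at most f_max, S is at most 1 when a ≤ f_max and at most 2 f_max / (a + f_max) otherwise. The caps below (highest frequency among unmapped G2 vertices for vertex patterns, highest G2 edge frequency for SEQ patterns of two or more events) are valid because an occurrence of a SEQ contains its first two events consecutively; AND and complex patterns would need their own caps. The frequencies are hypothetical.

```python
def sim_upper_bound(a, f_max):
    """Largest value of 1 - |a - x| / (a + x) over 0 <= x <= f_max.

    For x < a the similarity equals 2x / (a + x), which grows with x, so the
    best case is x = min(a, f_max).
    """
    if a <= f_max:
        return 1.0
    return 2.0 * f_max / (a + f_max)

def advanced_h(remaining, f1, max_free_vertex_freq, max_edge_freq):
    """Upper bound of h with a frequency cap per remaining pattern.

    Assumed caps: a vertex pattern maps to a still-unmapped G2 vertex; a SEQ of
    two or more events occurs only where its first two events are consecutive,
    so it is capped by the highest edge frequency of G2.
    """
    h = 0.0
    for p in remaining:
        cap = max_free_vertex_freq if len(p) == 1 else max_edge_freq
        h += sim_upper_bound(f1[p], cap)
    return h

# Hypothetical values for the remaining patterns D, SEQ(C,D), SEQ(B,C,D).
REMAINING = [("D",), ("C", "D"), ("B", "C", "D")]
F1 = {("D",): 1.0, ("C", "D"): 0.8, ("B", "C", "D"): 0.6}
print(round(advanced_h(REMAINING, F1, 0.5, 0.4), 3))  # 2.133
```

With these numbers the estimate is 2.133 instead of the simple bound 3, so fewer A* nodes survive with a high g + h.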
Pay-As-You-Go Matching
Motivation: interesting event patterns are identified gradually, so the best matching may change.
Two heuristic strategies:
- Continue: materialize the leaf nodes of the previous A* tree.
- Restart: materialize the previous answer for pruning.

[Figure: the A* search tree from the earlier example (the root, nodes {A→1} to {A→4}, the children {A→2, C→1}, {A→2, C→3}, {A→2, C→4}, and the complete mapping {A→2, C→3, B→4, D→1}).]
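A minimal sketch of the pruning idea behind Restart, under our reading of the figure: the previous best mapping is re-scored with the enlarged pattern set. Since it is a complete mapping, its new score cannot exceed the new optimum, so a node whose g + h falls below that score can be discarded. The frequencies reuse a hypothetical two-event example, and all names are ours.

```python
def similarity(a, b):
    # Assumed normal-distance similarity; 1.0 when both frequencies are 0.
    return 1.0 if a + b == 0 else 1.0 - abs(a - b) / (a + b)

def score(mapping, patterns, f1, f2):
    """Pattern normal distance over the patterns fully mapped by `mapping`."""
    return sum(similarity(f1[p], f2.get(tuple(mapping[v] for v in p), 0.0))
               for p in patterns if all(v in mapping for v in p))

def restart_lower_bound(previous_best, new_patterns, f1, f2):
    """Re-score the previous answer; it is a lower bound on the new optimum."""
    return score(previous_best, new_patterns, f1, f2)

def prunable(g, h, lower_bound):
    """A node whose optimistic total g + h is below the bound cannot win."""
    return g + h < lower_bound

# Hypothetical example: SEQ(A,B) is added after matching on vertices only,
# where the previous answer was {A->2, B->1}.
NEW_PATTERNS = [("A",), ("B",), ("A", "B")]
F1 = {("A",): 1.0, ("B",): 0.5, ("A", "B"): 0.5}
F2 = {(1,): 0.5, (2,): 1.0, (2, 1): 0.5}

lb = restart_lower_bound({"A": 2, "B": 1}, NEW_PATTERNS, F1, F2)
g = score({"A": 1}, NEW_PATTERNS, F1, F2)  # node {A->1}: only pattern A is mapped
print(lb, prunable(g, 2.0, lb))  # 3.0 True
```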
Experiment Setting
- Real-life data set: collected from the bus manufacturer; the true mapping is generated manually by domain experts.
- Criteria: the F-measure of precision and recall, which evaluates the accuracy of event matching.
- Baselines: Opaque Matching¹, Iterative Matching².

| Statistic | Value |
| --- | --- |
| No. of event logs | 38 |
| No. of traces | 3000 |
| Min event size | 2 |
| Max event size | 11 |

1. J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
2. S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.
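The F-measure criterion can be computed per matched log pair as below, treating each (event, counterpart) pair as one prediction; this pair-level counting is our assumption about how precision and recall were defined, and `evaluate` is an illustrative name.

```python
def evaluate(found, truth):
    """Precision, recall and F-measure of a found event mapping vs. the true one."""
    correct = len(set(found.items()) & set(truth.items()))
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f = evaluate({"A": 1, "B": 2, "C": 4}, {"A": 1, "B": 2, "C": 3, "D": 4})
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```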
Effectiveness and Efficiency
[Figure: effectiveness and efficiency plots comparing Our Approach with the baselines.]
Performance on pay-as-you-go
More patterns yield higher accuracy, and the pay-as-you-go strategies accelerate the recomputation of the event matching when new patterns arrive.
Conclusion
- A pattern-based generic framework (vertex, edge, and complex patterns) that is compatible with existing methods.
- An advanced bounding function.
- Support for matching in a pay-as-you-go style.
Q & A
Thanks!