An Inductive Database for Mining Temporal Patterns in Event Sequences
Alexandre Vautier, Marie-Odile Cordier and René Quiniou
[email protected] - France
The application Cardiac arrhytmias
P wave
Normal QRS complex
…ok, but how can a specific
arrhythmia be automatically characterized ?
Abnormal QRS complex
Electrocardiograms
Abnormal rhythm
Normal rhythm
M. Rabbit, you suffer from bigeminy, a severecardiac arrhythmia
M. Dog, you are ok !...
A problem definition close to supervised machine learning
Discretized and labeled electrocardiograms
p,Q,p,q,p,Q,p,Q
p,q,p,q,p,q,p,qp,Q,p,p,Q,p,p,Q
N
..ok, which patterns are frequent in the sequence labeled
but not frequent in sequences labeled ?
P
N
p,Q,p,q,p,Q,p,Q
p,q,p,q,p,q,p,q
p,Q,p,p,Q,p,p,Q
Temporal patterns representative of the sequence
P
P
Frequent temporal patterns
Formalization of the problemThe framework of inductive database (IDB)
Sequences {Lbigeminy} 2 P
{Llbb, Lmobitz, Lnormal} 2 N
Temporal patterns…
An IDB
…ok, which temporal patterns Csatisfy Quexpert(P,N,T,C) = (9L2P, freq(C,L) ¸ TL) Æ (8L2N, freq(C,L) < TL) ?
Sequences {Lbigeminy} 2 P
{Llbb, Lmobitz, Lnormal} 2 N
Temporal patterns{C|freq(C, Lbigeminy)¸ T
0},
{C|freq(C, Llbbb)¸ T1},{C|freq(C, Lmobitz)¸ T2},{C|freq(C, Lnormal)¸ T3},{C| Quexpert(P,N,T,C) }
Plan
Introduction
Problem features Sequences Chronicles
Inductive databases Order relation What is frequency ?
Algorithms Frequent Minimal Chronicles Search (Fmc Search) Querying the IDB
Experiments and problems to be solved
Conclusion and future work
Long sequences of time-stamped events with few types Numerical temporal information of major importance
An example of an event sequence:
Features of sequences
AB AB AB B AA B …
Events
time0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Features of temporal patternsChronicles
Chronicle: a set of events temporally constrained May contain several events of the same type Specifies numerical temporal constraint between events:
the uncertain delay represented by an interval [dmin,dmax] dmin,dmax 2 Z Is easily readable by an expert of the application domain
C, t0 A, t1 B, t2[5;10]
[-2;20]
C,1 B,5 A,8 B,10 C,15 C,16 A,26 B,34A,27
Instances IC(L)
Event sequence: an ordered list of time-stamped events
C:
Inductive databaseRequired definitions
A query language that makes use of frequency constraintsfreq(C,L) ¸ T and freq(C,L) · T
If a query on frequency satisfies monotonicity or anti-monotonicity properties then a search based on frequency is easier to compute
An order relation on chronicles must be defined
An order relation on chronicles
C is more general than C’ (C v C’) ,
each event of C can be matched to an event of C‘ each temporal constraint of C is more general than the corresponding
constraint in C'
A,t2
B,t3
B,t4
[5;10]
[9;20]A,t0
B,t1
[8;21]v
C C’
How to compute the frequency of a chronicle in a sequence ?
B B
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
A A BB
IC(L)
L:
AB B[2,3] [-1,3]
[1,5]
C:
BA BBAB
The cardinal of the set ofAll the instances ?Minimal occurences ? [Mannila,97]Earliest distinct instances ? [Dousson, 99]Distinct instances ?
Monotonicity and anti-monotonicity properties
Constraints on frequency should satisfy monotonicity or anti-monotonicity properties
0 1 2 3 4 5
A AB B
A B[-2,2]
A
freq(C,L)
¸freq(C’,L)
L:
C:
C’:
2 instances
3 instances
·
Minimal occurences [Mannila, 97] don’t have monotonicity and anti-monotonicity properties
Recognition criterion
Let IC(L) be the set of instances of the chronicle C in the sequence L
A recognition criterion selects a unique set E of instances from IC(L)
The frequency of the chronicle C in the sequence L according to the recognition criterion Q is
freqQ(C,L) = |E|
A monotonic criterion is a recognition criterion Q such that
C v C’ ) freqQ(C,L) ¸ freqQ(C’,L)
Fmc Search
Fmc Search: Frequent Minimal Chronicles SearchfreqQ(C,L) ¸ T, maxwin(C) · W
Input: L: An event sequence T: A minimum frequency threshold Q: A recognition criterion (application dependent) W: A maximal time window
Output: FmcQ,W(L,T) Every chronicle from FmcQ,W(L,T) satisfies 3 properties:
is as specific as possible generalizes at least T instances… …that respect the recognition criterion Q
Algorithm:
Step 1 Step 2 Step 3 Step 4
x xxx x FmcQ,W(L,T)
Fmc SearchStep 1: Chronicle instance extraction
The instances of every frequent chronicle are extracted from the sequence L. Their temporal constraints are set to [-W,W]
Implemented in the software FACE (Frequency Analyser for Chronicle Extraction)
x xxx x FmcQ,W(L,T)
The numerical temporal constraints of chronicles found by FACE are not specific enough
Fmc SearchStep 2: Fuzzy clustering of instances
A fuzzy clustering of each set IC(L) found at step 1 is performed
B BA A BB AB
AB
BBx
x xx
IC(L)
An instance has a membership
degree to each cluster
x xxx x FmcQ,W(L,T)
Fmc SearchStep 3: Chronicle construction from clusters
For each fuzzy cluster of step 2: Instances are sorted in the decreasing order of their membership
degree The T first instances that respect the Q criterion are kept to construct a
chronicle This chronicle is the lgg (least general generalization) of the selected
instances
x xxx x FmcQ,W(L,T)
The specificity of chronicles depends on the
clustering
Fmc SearchStep 4: Chronicle filtering - keep the most specific
Compute the set of frequent minimal (maximally specific) chronicles FmcQ,W (L,T) The most specific chronicles are retained
Monotonicity property:A chronicle C that satisfies freqQ(C,L) ¸ T is more general than at least one chronicle of FmcQ,W (L,T)
x xxx x FmcQ,W(L,T)
Querying the IDB
A chronicle C satisfies this query iff: C is more general than at least one chronicle of FmcQ,W(LP,TP)
monotonicity property C is not more general than every chronicle of FmcQ,W(LN,TN)
anti-monotonicity property
Version space
?
T
An adaptation of Mitchell’salgorithm computes this
version space
Remember my query:
Quexpert(P,N,T,C). For the explanation P = {LP} and N = {LN}
freq(C,LP)¸TP Æ freq(C,LN)<TN
ExperimentsCharacterization of cardiac arrhythmias
Data: 4 sequences of cardiac events elaborated from electrocardiograms labeled by an expert containing ~4000 events of 3 types
(P waves, normal QRS complexes, abnormal QRS complexes)
A typical query :freqQ
d(C,Lbigeminy) ¸ 5% Æ freqQ
d(C,Lnormal) · 10% Æ freqQ
d(C,Lmobitz) · 10% Æ freqQ
d(C,Llbbb) · 10% Æ W = 3 s
ExperimentsAn example of cardiac chronicle
Characterizes bigeminy arrhythmia
Also found by a supervised learning method (ILP) from ECGs [Carrault, 03]
p = “P waves”q = “normal QRS”Q = “abnormal QRS”
Problems to be solved
The step 3 of the Fmc-search has to cluster up to 180,000 instances per chronicle
For a minimum threshold of 5%, up to 1000 chronicles can be extracted in one sequence
This slows down Mitchell's algorithm dramatically
Finding Fmc is an NP-complete problem The set FmcQ,W(LN,TN)is correct but not complete
Results have to be filtered in order to give the correct solution
?
T
Optimal FmcQ,W(LN,TN)
Practical FmcQ,W(LN,TN)
Conclusion
An original method to extract temporal patterns in the form of chronicles Chronicles express constraints on time by numerical intervals
A formalization of the problem in the framework of inductive database which provides the definition of:
An order relation on temporal patterns A monotonic recognition criterion and the related frequency
A management of numerical temporal constraints (this task is very hard)
An algorithm that finds Fmc in sequences
A method to reuse and adapt Mitchell’s algorithm
Future work
Control the clustering step of Fmc search in order to compute only Fmcs that are needed by Mitchell’s algorithm
Adapt Mitchell’s algorithm in order to provide an approximate solution whose quality is user-defined
Extend the method to other measures of interest
Explore new applications intrusion detection
in Event Sequences
An IDB for Mining Temporal Patterns
Alexandre Vautier,Marie-Odile Cordier,
and René Quiniou
RENNES - France
Maximum size of IC(L) as a function of the number of events in a chronicle
Top Related