/faculteit technologie management
Genetic Process Mining
Ana Karla Medeiros, Ton Weijters, Wil van der Aalst
Eindhoven University of Technology
Department of Information Systems
Outline
• Process Mining
• Genetic Algorithms
• Genetic Process Mining
  – Internal Representation
  – Fitness measure
  – Genetic Operators
• Experiments and Results
• Conclusion and Future Work
Process Mining
X = apply for license
A = motorbike classes
B = car classes
C = theoretical exam
D = practical motorbike exam
E = practical car exam
Y = get result
Process Mining (cont.)
• Most of the current techniques cannot handle
  – Structural constructs: non-free choice, duplicate tasks and invisible tasks
  – Noisy logs
  – Reason: local approach
Genetic Process Mining (GPM)
Aim: Use a genetic algorithm to tackle non-free choice, invisible tasks, duplicate tasks and noise.
Internal Representation
Fitness Measure
Genetic Operators
GPM – Build the Initial Population
• Causal Matrix
[Figure: process model over tasks X, A, B, C, D, E, Y]

          X      A    B    C        D        E        Y        Output
Input     True   X    X    A \/ B   A /\ C   B /\ C   D \/ E
X         0      1    1    0        0        0        0        A \/ B
A         0      0    0    1        1        0        0        C /\ D
B         0      0    0    1        0        1        0        C /\ E
C         0      0    0    0        1        1        0        D \/ E
D         0      0    0    0        0        0        1        Y
E         0      0    0    0        0        0        1        Y
Y         0      0    0    0        0        0        0        True
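The causal matrix for the example can be written down as a small data structure. The following is a minimal sketch, assuming each condition is stored as a list of sets that is read as an AND of ORs (so `[{"C"}, {"D"}]` means C /\ D and `[{"A", "B"}]` means A \/ B). The names `TASKS`, `CAUSAL`, `INPUT`, `OUTPUT` and `is_consistent` are illustrative and do not come from the slides.

```python
# Tasks of the driving-license example, in the column order of the matrix.
TASKS = ["X", "A", "B", "C", "D", "E", "Y"]

# 0/1 causal relation: CAUSAL[row][col] == 1 means "row is followed by col".
CAUSAL = {
    "X": [0, 1, 1, 0, 0, 0, 0],
    "A": [0, 0, 0, 1, 1, 0, 0],
    "B": [0, 0, 0, 1, 0, 1, 0],
    "C": [0, 0, 0, 0, 1, 1, 0],
    "D": [0, 0, 0, 0, 0, 0, 1],
    "E": [0, 0, 0, 0, 0, 0, 1],
    "Y": [0, 0, 0, 0, 0, 0, 0],
}

# Assumption of this sketch: a condition is a list of sets (AND of ORs).
# An empty list stands for "True".
OUTPUT = {
    "X": [{"A", "B"}],      # A \/ B
    "A": [{"C"}, {"D"}],    # C /\ D
    "B": [{"C"}, {"E"}],    # C /\ E
    "C": [{"D", "E"}],      # D \/ E
    "D": [{"Y"}],
    "E": [{"Y"}],
    "Y": [],                # True
}
INPUT = {
    "X": [],                # True
    "A": [{"X"}],
    "B": [{"X"}],
    "C": [{"A", "B"}],      # A \/ B
    "D": [{"A"}, {"C"}],    # A /\ C
    "E": [{"B"}, {"C"}],    # B /\ C
    "Y": [{"D", "E"}],      # D \/ E
}


def successors(task):
    """Tasks marked with 1 in the row of `task`."""
    return {TASKS[j] for j, bit in enumerate(CAUSAL[task]) if bit}


def predecessors(task):
    """Tasks marked with 1 in the column of `task`."""
    col = TASKS.index(task)
    return {t for t in TASKS if CAUSAL[t][col]}


def is_consistent():
    """Check that every condition mentions exactly the tasks set in the 0/1 matrix."""
    for t in TASKS:
        if set().union(*OUTPUT[t]) != successors(t):
            return False
        if set().union(*INPUT[t]) != predecessors(t):
            return False
    return True
```

Calling `is_consistent()` checks that the Input row and Output column agree with the 0/1 entries, which is a cheap sanity check when such a matrix is typed in by hand.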
GPM – Build the Initial Population
• Every individual has the same number of tasks
(1) Log
(2) Set of tasks: X, A, B, C, D, E, Y
(3) Randomly created individuals
[Figure: two randomly created individuals over tasks X, A, B, C, D, E, Y]
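Steps (1) to (3) could be sketched as follows. The slides do not say how the randomness is distributed, so this sketch assumes that each possible causal link is switched on with a fixed probability and that the input and output tasks of every task are then split into random subsets (read as an AND of ORs). The names `random_partition`, `random_individual`, `initial_population` and the parameter `causality_prob` are hypothetical.

```python
import random


def random_partition(items, rng):
    """Split items into random non-empty subsets (a list of sets, AND of ORs)."""
    subsets = []
    for item in sorted(items):
        if subsets and rng.random() < 0.5:
            rng.choice(subsets).add(item)   # join an existing subset
        else:
            subsets.append({item})          # start a new subset
    return subsets


def random_individual(tasks, rng, causality_prob=0.3):
    """Create one individual: random causal links plus random input/output subsets."""
    causal = {(a, b) for a in tasks for b in tasks if rng.random() < causality_prob}
    individual = {}
    for t in tasks:
        inputs = {a for (a, b) in causal if b == t}
        outputs = {b for (a, b) in causal if a == t}
        individual[t] = {
            "input": random_partition(inputs, rng),
            "output": random_partition(outputs, rng),
        }
    return individual


def initial_population(log, size, seed=0):
    """(1) read the log, (2) collect its set of tasks, (3) build random individuals."""
    rng = random.Random(seed)
    tasks = sorted({task for trace in log for task in trace})
    return [random_individual(tasks, rng) for _ in range(size)]


if __name__ == "__main__":
    log = [["X", "A", "C", "D", "Y"], ["X", "B", "C", "E", "Y"]]
    population = initial_population(log, size=3)
    print(sorted(population[0]))  # every individual contains the same tasks
```

Because all individuals are built from the task set of the log, they differ only in their causal links and in how inputs and outputs are grouped.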
GPM – Calculate Fitness
• Main idea
  – Benefit the individuals that can parse more frequent material in the log
• Challenges
  – How to assess an individual’s fitness?
  – How to punish individuals that allow for undesired extra behavior?
Fitness - How to assess an individual’s fitness?
- Use a continuous semantics parser and register problems
- Notation: L = log and CM = causal matrix
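A rough sketch of what a continuous semantics parser could look like: every task in a trace is fired even when it is not enabled, and the problems (missing tokens, tokens left behind) are registered instead of stopping the replay. The token game used here is a simplifying assumption of this sketch: a task puts one token per output subset, and needs one token per input subset. The names `parse_trace` and `ORIGINAL` are hypothetical and this is not the ProM plug-in's implementation.

```python
from collections import Counter

# Causal matrix of the example; each condition is a list of sets (AND of ORs).
ORIGINAL = {
    "X": {"input": [], "output": [{"A", "B"}]},
    "A": {"input": [{"X"}], "output": [{"C"}, {"D"}]},
    "B": {"input": [{"X"}], "output": [{"C"}, {"E"}]},
    "C": {"input": [{"A", "B"}], "output": [{"D", "E"}]},
    "D": {"input": [{"A"}, {"C"}], "output": [{"Y"}]},
    "E": {"input": [{"B"}, {"C"}], "output": [{"Y"}]},
    "Y": {"input": [{"D", "E"}], "output": []},
}


def parse_trace(trace, cm):
    """Replay one trace and register problems instead of stopping at them."""
    marking = Counter()   # tokens per (producer task, output subset) place
    missing = 0           # input subsets that had no token when a task fired
    parsed = 0            # tasks that fired while properly enabled
    for task in trace:
        enabled = True
        for in_subset in cm[task]["input"]:
            # Look for a token produced by a task in this subset and aimed at `task`.
            place = next(
                (p for p, n in marking.items()
                 if n > 0 and p[0] in in_subset and task in p[1]),
                None,
            )
            if place is None:
                missing += 1      # register the problem, keep going
                enabled = False
            else:
                marking[place] -= 1
        if enabled:
            parsed += 1
        for out_subset in cm[task]["output"]:
            marking[(task, frozenset(out_subset))] += 1
    left = sum(marking.values())  # tokens never consumed
    return {"parsed": parsed, "missing": missing, "left": left}


if __name__ == "__main__":
    print(parse_trace(["X", "A", "C", "D", "Y"], ORIGINAL))
```

A fitness measure could then reward individuals with many parsed tasks and few missing or leftover tokens across the whole log L; how these counts are weighted is not recoverable from the slides.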
Trace: X,A,C,D,Y
For noise-free logs, fitness punishes: AND-join, OR-join, OR-split, AND-split
[Figure: original net and individual, each over tasks X, A, B, C, D, E, Y]
Fitness - How to punish individuals that allow for undesired extra behavior?
Fitness = 1
[Figure: three nets over tasks X, A, B, C, D, E, Y]
- Count the number of enabled tasks at every reachable marking
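The counting idea can be illustrated with a small sketch. It assumes the same simplified token game as a list-of-sets causal matrix (one token per output subset, one token needed per input subset) and, as a further assumption, only looks at the markings visited while replaying the log rather than at the full set of reachable markings. The names `is_enabled`, `fire`, `enabled_count`, `ORIGINAL` and `OVERGENERAL` are hypothetical.

```python
from collections import Counter

ORIGINAL = {
    "X": {"input": [], "output": [{"A", "B"}]},
    "A": {"input": [{"X"}], "output": [{"C"}, {"D"}]},
    "B": {"input": [{"X"}], "output": [{"C"}, {"E"}]},
    "C": {"input": [{"A", "B"}], "output": [{"D", "E"}]},
    "D": {"input": [{"A"}, {"C"}], "output": [{"Y"}]},
    "E": {"input": [{"B"}, {"C"}], "output": [{"Y"}]},
    "Y": {"input": [{"D", "E"}], "output": []},
}

# Hypothetical individual in which some AND conditions are relaxed to OR.
OVERGENERAL = {t: {"input": list(c["input"]), "output": list(c["output"])}
               for t, c in ORIGINAL.items()}
OVERGENERAL["A"]["output"] = [{"C", "D"}]
OVERGENERAL["B"]["output"] = [{"C", "E"}]
OVERGENERAL["D"]["input"] = [{"A", "C"}]
OVERGENERAL["E"]["input"] = [{"B", "C"}]


def is_enabled(task, marking, cm):
    """A task is enabled if every input subset can take a token from some place."""
    return all(
        any(n > 0 and p[0] in subset and task in p[1] for p, n in marking.items())
        for subset in cm[task]["input"]
    )


def fire(task, marking, cm):
    """Consume one token per input subset (when available), then produce outputs."""
    for subset in cm[task]["input"]:
        for place, n in marking.items():
            if n > 0 and place[0] in subset and task in place[1]:
                marking[place] -= 1
                break
    for subset in cm[task]["output"]:
        marking[(task, frozenset(subset))] += 1


def enabled_count(log, cm):
    """Sum the number of enabled tasks over the markings visited during replay."""
    total = 0
    for trace in log:
        marking = Counter()
        for task in trace:
            total += sum(is_enabled(t, marking, cm) for t in cm)
            fire(task, marking, cm)
    return total


if __name__ == "__main__":
    log = [["X", "A", "C", "D", "Y"], ["X", "B", "C", "E", "Y"]]
    print(enabled_count(log, ORIGINAL), enabled_count(log, OVERGENERAL))
```

In this sketch a task without input conditions (X) counts as enabled in every marking, which adds the same constant to both individuals. The relaxed individual ends up with the higher count, so a fitness that penalizes this count would prefer the original net even though both replay the log.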
GPM – Calculate Fitness
where L = log, CM = causal matrix and CM[] = population
GPM – Create next population
• Genetic operators
  – Crossover
    • Recombines existing material in the population
    • Crossover point = task
    • Crossover probability
    • Subsets are swapped
  – Mutation
    • Introduces new material in the population
    • Every task of an individual can be mutated
    • Mutation probability
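The two operators could be sketched as below, assuming an individual is a dictionary that maps each task to its input and output conditions, each a list of sets. This is a simplified illustration: crossover swaps the complete conditions of one randomly chosen task, mutation regroups the subsets of one condition, and neither tries to keep the conditions of the other tasks consistent afterwards. The names `crossover`, `mutate`, `repartition` and the default probabilities are hypothetical.

```python
import copy
import random


def repartition(items, rng):
    """Regroup items into random non-empty subsets (a list of sets)."""
    subsets = []
    for item in items:
        if subsets and rng.random() < 0.5:
            rng.choice(subsets).add(item)
        else:
            subsets.append({item})
    return subsets


def crossover(parent1, parent2, rng, crossover_prob=0.8):
    """With probability crossover_prob, swap the subsets of one task between parents."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    if rng.random() < crossover_prob:
        task = rng.choice(sorted(child1))          # crossover point = task
        for side in ("input", "output"):
            child1[task][side], child2[task][side] = (
                child2[task][side],
                child1[task][side],
            )
    return child1, child2


def mutate(individual, rng, mutation_prob=0.1):
    """Every task can be mutated: regroup its input or output subsets at random."""
    mutant = copy.deepcopy(individual)
    for task in sorted(mutant):
        if rng.random() < mutation_prob:
            side = rng.choice(("input", "output"))
            members = sorted(set().union(*mutant[task][side]))
            mutant[task][side] = repartition(members, rng)
    return mutant
```

Working on deep copies keeps the parents unchanged, so the same parents can be selected again when the next population is built.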
GPM – Create next population
• Genetic operators - Crossover
[Figure: Parent 1 and Parent 2, and the resulting Offspring 1 and Offspring 2, each over tasks X, A, B, C, D, E, Y]
Experiments and Results
• Experiments
  – ProM framework
    • Genetic Algorithm Plug-in
    • http://www.processmining.org
  – Simulated data
• Results
  – The genetic algorithm found models that could parse all the traces in the log
Conclusion and Future Work
• Conclusion
  – Genetic algorithms can be used to mine process models
• Future Work
  – Tackle duplicate tasks
    • How to detect the right level of abstraction?
  – Apply genetic process mining to "real-life" logs
    • How to deal with noise?