Post on 31-Mar-2015
/faculteit technologie management
PM-1/faculteit wiskunde en informatica
Process mining: Discovering Process Models from Event Logs
Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology, P.O. Box 513, NL-5600 MB Eindhoven, The Netherlands.
Outline
• Who we are ...
  – I&T group
  – selected research projects
• Process mining
  – purpose
  – basic idea
  – (re)discovery problem
  – mining algorithm $\alpha(W)$
  – comparison
  – example/tools
  – case study
• Conclusion
Who we are ...
Information & Technology (I&T) group at EUT
• I&T group (35 persons), Department of Technology Management, Eindhoven University of Technology.
• Three subgroups:
  – Business Process Management (workflow management, Petri nets, mining, ...)
  – ICT Architectures (agents, transactions, ...)
  – Software Engineering (software quality, ...)
Selected research projects
• process mining
• workflow verification
• workflow patterns
• web services composition languages
• case handling
• XRL/flower
• business process improvement
• ...
In most cases using/extending Petri net theory!
Workflow verification: Woflan
• Can interface with Staffware, Protos, COSA, Meteor.
• Can handle Event-driven Process Chains (ARIS)
Workflow patterns
• The academic response
• A quest for the basic requirements
• 20 basic patterns
• 20+ systems evaluated
• Joint work with QUT, ATOS, etc.
• http://www.tm.tue.nl/it/research/patterns
• +/- 150 pageviews per working day (>25,000 in total)
Web services composition languages
Pattern                                 XPDL  UML   BPEL  XLANG WSFL  BPML  WSCI
Sequence                                +     +     +     +     +     +     +
Parallel Split                          +     +     +     +     +     +     +
Synchronization                         +     +     +     +     +     +     +
Exclusive Choice                        +     +     +     +     +     +     +
Simple Merge                            +     +     +     +     +     +     +
Multi Choice                            +     -     +     -     +     -     -
Synchronizing Merge                     -     -     +     -     +     -     -
Multi Merge                             -     -     -     -     -     +/-   +/-
Discriminator                           -     -     -     -     -     -     -
Arbitrary Cycles                        +     -     -     -     -     -     -
Implicit Termination                    +     -     +     -     +     +     +
MI without Synchronization              -     -     +     +     +     +     +
MI with a Priori Design Time Knowledge  +     +     +     +     +     +     +
MI with a Priori Runtime Knowledge      -     +     -     -     -     -     -
MI without a Priori Runtime Knowledge   -     -     -     -     -     -     -
Deferred Choice                         -     +     +     +     -     +     +
Interleaved Parallel Routing            -     -     +/-   -     -     -     -
Milestone                               -     -     -     -     -     -     -
Cancel Activity                         -     +     +     +     +     +     +
Cancel Case                             -     +     +     +     +     +     +
• Also process support.
• Standards considered are BPML, BPEL4WS, XLANG, WSFL, WSCI.
• Joint work with QUT (Brisbane, Australia).
Process mining
Team members:
• Wil van der Aalst
• Ton Weijters
• Laura Maruster
• Ana-Karla Medeiros
• Boudewijn van Dongen
• Eric Verbeek
Business Process Management
process design
implementation/configuration
process enactment
diagnosis
No feedback loop
process design
implementation/configuration
process enactment
diagnosis
The basic idea
process mining
Toy example
case 1 : task A    case 2 : task A    case 3 : task A    case 3 : task B
case 1 : task B    case 1 : task C    case 2 : task C    case 4 : task A
case 2 : task B    case 2 : task D    case 5 : task A    case 4 : task C
case 1 : task D    case 3 : task C    case 3 : task D    case 4 : task B
case 5 : task E    case 5 : task D    case 4 : task D
Resulting traces:
ABCD {cases 1,3}
ACBD {cases 2,4}
AED {case 5}
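The step from the interleaved event list to the three traces can be sketched in a few lines of Python. This is only an illustration: the names `EVENTS`, `traces_by_case`, and `cases_by_trace` are hypothetical, and each task is assumed to be a single letter so that a trace can be written as a string.

```python
from collections import defaultdict

# Events of the toy example in the order they were logged: (case id, task).
EVENTS = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"), (2, "C"),
    (4, "A"), (2, "B"), (2, "D"), (5, "A"), (4, "C"), (1, "D"), (3, "C"),
    (3, "D"), (4, "B"), (5, "E"), (5, "D"), (4, "D"),
]

def traces_by_case(events):
    """Concatenate the tasks of each case in logged order."""
    traces = defaultdict(str)
    for case, task in events:
        traces[case] += task
    return dict(traces)

def cases_by_trace(traces):
    """Group case ids by the trace they produced."""
    groups = defaultdict(list)
    for case in sorted(traces):
        groups[traces[case]].append(case)
    return dict(groups)

TRACES = traces_by_case(EVENTS)
GROUPS = cases_by_trace(TRACES)
print(GROUPS)
```

Only the order of events within a case matters here; the interleaving of different cases is discarded.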
Result: A Petri net model
[Figure: the traces ABCD, ACBD, and AED are mapped by $\alpha(W)$ to a Petri net with transitions A, B, C, D, and E.]
Petri nets are used as a formalism; the target language can be different, e.g., Event-driven Process Chains.
The focus of this presentation is on the following theoretical question:
[Figure: a workflow log is generated from a WF-net (WF1); applying workflow mining techniques to that log constructs a WF-net (WF2). Is WF1 = WF2?]
• Assumption: complete workflow logs without noise.
• Let $T$ be a set of tasks. $\sigma \in T^*$ is a workflow trace and $W \subseteq T^*$ is a workflow log.
• Let $W$ be a workflow log over $T$, i.e., $W \subseteq T^*$. Let $a,b \in T$:
  – $a >_W b$ if and only if there is a trace $\sigma = t_1 t_2 t_3 \ldots t_{n-1}$ and $i \in \{1, \ldots, n-2\}$ such that $\sigma \in W$ and $t_i = a$ and $t_{i+1} = b$,
  – $a \rightarrow_W b$ if and only if $a >_W b$ and not $(b >_W a)$,
  – $a \#_W b$ if and only if not $(a >_W b)$ and not $(b >_W a)$, and
  – $a \parallel_W b$ if and only if $a >_W b$ and $b >_W a$.
• Let $N = (P,T,F)$ be a sound WF-net, i.e., $N \in \mathcal{W}$. $W$ is a workflow log of $N$ if and only if $W \subseteq T^*$ and every trace $\sigma \in W$ is a firing sequence of $N$ starting in state $[i]$, i.e., $(N,[i])[\sigma\rangle$.
• $W$ is a complete workflow log of $N$ if and only if (1) for any workflow log $W'$ of $N$: $>_{W'} \subseteq >_W$ and (2) for any $t \in T$ there is a $\sigma \in W$ such that $t \in \sigma$.
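A minimal sketch of the four ordering relations, assuming a log is given as a list of strings with one letter per task (the function names `follows` and `relations` are illustrative, not part of any tool):

```python
from itertools import product

def follows(log):
    """Direct succession >_W: (a, b) if b directly follows a in some trace."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def relations(log):
    """Return (>_W, ->_W, ||_W, #_W) as sets of task pairs."""
    tasks = sorted({task for trace in log for task in trace})
    gt = follows(log)
    causal, parallel, unrelated = set(), set(), set()
    for a, b in product(tasks, repeat=2):
        forward, backward = (a, b) in gt, (b, a) in gt
        if forward and not backward:
            causal.add((a, b))       # a ->_W b
        elif forward and backward:
            parallel.add((a, b))     # a ||_W b
        elif not forward and not backward:
            unrelated.add((a, b))    # a #_W b
    return gt, causal, parallel, unrelated

LOG = ["ABCD", "ACBD", "AED"]
GT, CAUSAL, PARALLEL, UNRELATED = relations(LOG)
print(sorted(CAUSAL))
print(sorted(PARALLEL))
```

Pairs with only the backward succession (b >_W a but not a >_W b) fall into none of the three sets for (a, b); they are recorded as the causal pair (b, a).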
Example 1
case 1 : task A    case 2 : task A    case 3 : task A    case 3 : task B
case 1 : task B    case 1 : task C    case 2 : task C    case 4 : task A
case 2 : task B    case 2 : task D    case 5 : task A    case 4 : task C
case 1 : task D    case 3 : task C    case 3 : task D    case 4 : task B
case 5 : task E    case 5 : task D    case 4 : task D
W = { A B C D, A C B D, A E D }
$A >_W B$, $A >_W C$, $A >_W E$, $B >_W C$, $B >_W D$, $C >_W B$, $C >_W D$, $E >_W D$
$A \rightarrow_W B$, $A \rightarrow_W C$, $A \rightarrow_W E$, $B \rightarrow_W D$, $C \rightarrow_W D$, $E \rightarrow_W D$
$B \parallel_W C$, $C \parallel_W B$
$\#_W$: rest
The log is complete if the relation $>_W$ cannot be extended.
$X \rightarrow_W Y$ xor $Y \rightarrow_W X$ xor $X \parallel_W Y$ xor $X \#_W Y$
Example 2
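[Figure: WF-net with transitions A, B, C, and D.]
W = { A B C D, A C B D } is complete
$A >_W B$, $A >_W C$, $B >_W C$, $B >_W D$, $C >_W B$, $C >_W D$
$A \rightarrow_W B$, $A \rightarrow_W C$, $B \rightarrow_W D$, $C \rightarrow_W D$
$B \parallel_W C$, $C \parallel_W B$
$\#_W$: rest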
Example 3
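W = { A B D, A C D } is complete
$A >_W B$, $A >_W C$, $B >_W D$, $C >_W D$
$A \rightarrow_W B$, $A \rightarrow_W C$, $B \rightarrow_W D$, $C \rightarrow_W D$
$\parallel_W$: none
$\#_W$: rest
[Figure: WF-net with transitions A, B, C, and D.]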
Causal relations imply connecting places
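• Let $N = (P,T,F)$ be a sound WF-net and let $W$ be a complete workflow log of $N$. For any $a,b \in T$: $a \rightarrow_W b$ implies $a\bullet \cap \bullet b \neq \emptyset$.
• I.e., if there is a causal relation between two transitions according to the workflow log, then there has to be a place connecting these two transitions.
• Surprisingly, this holds for any sound WF-net!
[Figure: a WF-net with transitions A, B, C, and D, shown next to the causal relations $A \rightarrow_W B$, $A \rightarrow_W C$, $B \rightarrow_W D$, $C \rightarrow_W D$.]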
Connecting places “often” imply causal relations
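• Let $N = (P,T,F)$ be a sound SWF-net and let $W$ be a complete workflow log of $N$. For any $a,b \in T$: $a\bullet \cap \bullet b \neq \emptyset$ and $b\bullet \cap \bullet a = \emptyset$ implies $a \rightarrow_W b$.
• No “short loops” (i.e., loops of length 1 or 2).
• Structured Workflow Nets (SWF-nets) have no implicit places and the following two constructs cannot be used:
[Figure: the two constructs that cannot be used in SWF-nets.]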
Example 4: loops of length 1 are harmful
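[Figure: WF-net with transitions A, B, and D, where B forms a loop of length 1.]
$A \rightarrow_W B$, $A \rightarrow_W D$, $B \rightarrow_W D$
There is a place connecting B to B, but $B \rightarrow_W B$ does not hold.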
Example 5: loops of length 2 are harmful
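[Figure: WF-net with transitions A, B, C, and D, where B and C form a loop of length 2.]
$A \rightarrow_W B$, $B \rightarrow_W D$
There is a place connecting B to C, but $B \rightarrow_W C$ does not hold (because C can be followed directly by B).
There is a place connecting C to B, but $C \rightarrow_W B$ does not hold (because B can be followed directly by C).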
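Why short loops hide causal relations can be checked on small logs. The sketch below is illustrative only: the two logs `LOOP_1` and `LOOP_2` are assumed traces for nets shaped like Examples 4 and 5 (they are not taken from the slides), and tasks are single letters.

```python
def follows(log):
    """Direct succession >_W over a list of single-letter traces."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def causal(log):
    """Causal relation ->_W: a >_W b holds but b >_W a does not."""
    gt = follows(log)
    return {(a, b) for (a, b) in gt if (b, a) not in gt}

# Assumed logs: B may repeat (loop of length 1); B and C may alternate (length 2).
LOOP_1 = ["AD", "ABD", "ABBD"]
LOOP_2 = ["ABD", "ABCBD", "ABCBCBD"]

CAUSAL_1 = causal(LOOP_1)
CAUSAL_2 = causal(LOOP_2)
print(sorted(CAUSAL_1))  # no (B, B): B >_W B holds in both directions
print(sorted(CAUSAL_2))  # no (B, C) or (C, B): each directly follows the other
```

In both logs the looping tasks end up in the parallel relation rather than the causal one, so the places that connect them leave no trace in $\rightarrow_W$.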
Example 6: Implicit places remain undetected
[Figure: WF-net with transitions A, B, and C and an implicit place.]
$A \rightarrow_W B$, $B \rightarrow_W C$
More complex examples can be given showing that the two other requirements for non-SWF-nets are needed.
Parallelism can “often” be detected
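• Let $N = (P,T,F)$ be a sound SWF-net such that for any $a,b \in T$: $a\bullet \cap \bullet b = \emptyset$ or $b\bullet \cap \bullet a = \emptyset$, and let $W$ be a complete workflow log of $N$.
  1. If $a,b \in T$ and $a\bullet \cap b\bullet \neq \emptyset$, then $a \#_W b$.
  2. If $a,b \in T$ and $\bullet a \cap \bullet b \neq \emptyset$, then $a \#_W b$.
  3. If $a,b,t \in T$, $a \rightarrow_W t$, $b \rightarrow_W t$, and $a \#_W b$, then $a\bullet \cap b\bullet \cap \bullet t \neq \emptyset$.
  4. If $a,b,t \in T$, $t \rightarrow_W a$, $t \rightarrow_W b$, and $a \#_W b$, then $\bullet a \cap \bullet b \cap t\bullet \neq \emptyset$.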
• This is a complex way of stating that for sound SWF-nets without short loops, it is possible to distinguish XOR-splits from AND-splits and XOR-joins from AND-joins.
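The distinction can be illustrated on the logs of Examples 2 and 3, where A is followed by B and C in both cases. The helper `split_kind` is a hypothetical name, and the sketch assumes single-letter tasks and a complete log.

```python
def follows(log):
    """Direct succession >_W over a list of single-letter traces."""
    return {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def split_kind(log, b, c):
    """Classify two successors of a common task from the log alone."""
    gt = follows(log)
    forward, backward = (b, c) in gt, (c, b) in gt
    if forward and backward:
        return "AND"       # b ||_W c: both are executed, in either order
    if not forward and not backward:
        return "XOR"       # b #_W c: they never follow each other directly
    return "sequence"      # one causally precedes the other

EXAMPLE_2 = ["ABCD", "ACBD"]
EXAMPLE_3 = ["ABD", "ACD"]

KIND_2 = split_kind(EXAMPLE_2, "B", "C")
KIND_3 = split_kind(EXAMPLE_3, "B", "C")
print(KIND_2, KIND_3)
```

The same test applied to the predecessors of a common task separates AND-joins from XOR-joins.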
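Mining algorithm $\alpha(W)$
Let $W$ be a workflow log over $T$. $\alpha(W)$ is defined as follows.
1. $T_W = \{ t \in T \mid \exists_{\sigma \in W}\ t \in \sigma \}$,
2. $T_I = \{ t \in T \mid \exists_{\sigma \in W}\ t = first(\sigma) \}$,
3. $T_O = \{ t \in T \mid \exists_{\sigma \in W}\ t = last(\sigma) \}$,
4. $X_W = \{ (A,B) \mid A \subseteq T_W \wedge B \subseteq T_W \wedge \forall_{a \in A} \forall_{b \in B}\ a \rightarrow_W b \wedge \forall_{a_1,a_2 \in A}\ a_1 \#_W a_2 \wedge \forall_{b_1,b_2 \in B}\ b_1 \#_W b_2 \}$,
5. $Y_W = \{ (A,B) \in X_W \mid \forall_{(A',B') \in X_W}\ A \subseteq A' \wedge B \subseteq B' \Rightarrow (A,B) = (A',B') \}$,
6. $P_W = \{ p_{(A,B)} \mid (A,B) \in Y_W \} \cup \{ i_W, o_W \}$,
7. $F_W = \{ (a, p_{(A,B)}) \mid (A,B) \in Y_W \wedge a \in A \} \cup \{ (p_{(A,B)}, b) \mid (A,B) \in Y_W \wedge b \in B \} \cup \{ (i_W, t) \mid t \in T_I \} \cup \{ (t, o_W) \mid t \in T_O \}$, and
8. $\alpha(W) = (P_W, T_W, F_W)$.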
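The eight steps can be sketched directly in Python. This is a brute-force illustration rather than a tool implementation: it enumerates all subsets of tasks (exponential, fine for toy logs), assumes single-letter tasks and non-empty traces, restricts A and B to non-empty sets (an assumption of this sketch), and uses a hypothetical naming scheme `place_name` for the places.

```python
from itertools import combinations

def place_name(a, b):
    """Hypothetical label for the place p(A,B)."""
    return "p({" + ",".join(sorted(a)) + "},{" + ",".join(sorted(b)) + "})"

def alpha(log):
    """Return (places, transitions, arcs) mined from a list of traces."""
    gt = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    t_w = sorted({task for trace in log for task in trace})   # step 1
    t_i = {trace[0] for trace in log}                          # step 2
    t_o = {trace[-1] for trace in log}                         # step 3

    def causal(a, b):
        return (a, b) in gt and (b, a) not in gt

    def unrelated(a, b):
        return (a, b) not in gt and (b, a) not in gt

    # Non-empty task sets whose members are pairwise in #_W (including a # a).
    groups = [
        frozenset(c)
        for size in range(1, len(t_w) + 1)
        for c in combinations(t_w, size)
        if all(unrelated(x, y) for x in c for y in c)
    ]
    # Step 4: every a in A causally precedes every b in B.
    x_w = [(a, b) for a in groups for b in groups
           if all(causal(x, y) for x in a for y in b)]
    # Step 5: keep only the maximal pairs.
    y_w = [(a, b) for (a, b) in x_w
           if not any((a, b) != (a2, b2) and a <= a2 and b <= b2
                      for (a2, b2) in x_w)]
    # Step 6: one place per maximal pair, plus source and sink place.
    places = {place_name(a, b) for a, b in y_w} | {"i_W", "o_W"}
    # Step 7: connect places to their input and output transitions.
    arcs = set()
    for a, b in y_w:
        p = place_name(a, b)
        arcs |= {(x, p) for x in a} | {(p, y) for y in b}
    arcs |= {("i_W", t) for t in t_i} | {(t, "o_W") for t in t_o}
    return places, set(t_w), arcs                              # step 8

PLACES, TRANSITIONS, ARCS = alpha(["ABCD", "ACBD", "AED"])
print(sorted(PLACES))
```

For the toy log this yields four internal places, each pairing A or D with a choice between E and one of the parallel tasks B and C.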
Solution to the rediscovery problem
• Let $N = (P,T,F)$ be a sound SWF-net and let $W$ be a complete workflow log of $N$. If for all $a,b \in T$: $a\bullet \cap \bullet b = \emptyset$ or $b\bullet \cap \bullet a = \emptyset$, then $\alpha(W) = N$ modulo renaming of places.
• I.e., any sound SWF-net without short loops can be rediscovered!
[Figure: a workflow log is generated from a WF-net (WF1); applying workflow mining techniques to that log constructs a WF-net (WF2). Is WF1 = WF2?]
Example 7: Sound SWF-net without short loops
[Figure: two WF-nets, each with transitions A, B, C, and D.]
Example 8: A WF-net with an implicit place
[Figure: a WF-net with transitions A, B, and C and an implicit place, and the net produced by $\alpha(W)$.]
Example 9: Loop of length 1
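[Figure: a WF-net with transitions A, B, and D containing a loop of length 1, and the net produced by $\alpha(W)$.]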
Example 10: Loop of length 2
[Figure: a WF-net with transitions A, B, C, and D containing a loop of length 2, and the net produced by $\alpha(W)$.]
Example 11: Loop of length 3
[Figure: a WF-net with transitions A, B, C, D, and E containing a loop of length 3, and the net produced by $\alpha(W)$.]
No problem!
Example 12: Non-free-choice constructs may be harmful
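[Figure: a WF-net with transitions A, B, C, D, and E containing a non-free-choice construct, and the net produced by $\alpha(W)$.]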
Example 13: Free-choice is not enough
[Figure: a WF-net with transitions A, B, C, D, E, F, and G, and the net produced by $\alpha(W)$.]
Behaviorally equivalent!
Example 14: Example with “hidden” tasks?
[Figure: a WF-net with tasks A, B, C, D, and E in which the AND-split and AND-join are hidden tasks.]
Any suggestions?
Simplification!
[Figure: the net with transitions A, B, C, D, and E produced by $\alpha(W)$.]
Behaviorally equivalent!
Results and issues
• Proven to be correct for a large class of processes.
• Notion of completeness is needed (direct successor relation).
• Can handle parallelism and time.
• Open issues:
  – noise
  – incomplete logs
  – data
  – advanced process patterns (hidden tasks, NFC, etc.)
  – behavioral equivalence
• On each of these issues we have some preliminary results.
Scientific competition
• J.E. Cook (and A.L. Wolf) – New Mexico State University / University of Colorado, USA
• J. Herbst (and D. Karagiannis) – DaimlerChrysler, Germany
• R. Agrawal, D. Gunopulos, M.K. Maxeiner, K. Küspert, and F. Leymann – IBM, Germany
• G. Schimm – OFFIS, Germany
• S.Y. Hwang et al. – Sun Yat-Sen University, Taiwan
• M. Golani and S.S. Pinter – IBM, Israel
• D. Grigori, F. Casati, et al. – HP, USA
Our approach differs because we incorporate time and noise and take parallelism as a starting point.
Practical competition (ARIS PPM)
• IDS Scheer's ARIS Process Performance Manager.
• No process mining but interesting links with systems like SAP.
Tools/standards for process mining
• workflow management systems: Staffware, InConcert, MQ Series
• case handling / CRM systems: FLOWer, Vectus, Siebel
• ERP systems: SAP R/3, BaaN, Peoplesoft
• common XML format for storing/exchanging workflow logs
• mining tools: EMiT, Thumb
Example: processing customer orders
Example in Staffware: 7 tasks and all basic routing constructs
Fragment of Staffware log
Case 21
Diractive Description Event User yyyy/mm/dd hh:mm
----------------------------------------------------------------------------
Start swdemo@staffw_edl 2003/02/05 15:00
Register order Processed To swdemo@staffw_edl 2003/02/05 15:00
Register order Released By swdemo@staffw_edl 2003/02/05 15:00
Prepare shipment Processed To swdemo@staffw_edl 2003/02/05 15:00
(Re)send bill Processed To swdemo@staffw_edl 2003/02/05 15:00
(Re)send bill Released By swdemo@staffw_edl 2003/02/05 15:01
Receive payment Processed To swdemo@staffw_edl 2003/02/05 15:01
Prepare shipment Released By swdemo@staffw_edl 2003/02/05 15:01
Ship goods Processed To swdemo@staffw_edl 2003/02/05 15:01
Ship goods Released By swdemo@staffw_edl 2003/02/05 15:02
Receive payment Released By swdemo@staffw_edl 2003/02/05 15:02
Archive order Processed To swdemo@staffw_edl 2003/02/05 15:02
Archive order Released By swdemo@staffw_edl 2003/02/05 15:02
Terminated 2003/02/05 15:02
Case 22
Diractive Description Event User yyyy/mm/dd hh:mm
----------------------------------------------------------------------------
Start swdemo@staffw_edl 2003/02/05 15:02
Register order Processed To swdemo@staffw_edl 2003/02/05 15:02
Register order Released By swdemo@staffw_edl 2003/02/05 15:02
Prepare shipment Processed To swdemo@staffw_edl 2003/02/05 15:02
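Turning such a log fragment into a trace can be sketched with a regular expression. The sketch makes two assumptions that are not stated on the slide: fields are separated by single spaces as in the listing above, and a "Released By" line marks the completion of a task (with "Processed To" marking that it was scheduled). The names `LINE` and `completed_tasks` are illustrative.

```python
import re

# One task line: <task> <event> <user> <yyyy/mm/dd hh:mm>
LINE = re.compile(
    r"^(?P<task>.+?) (?P<event>Processed To|Released By) "
    r"(?P<user>\S+) (?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2})$"
)

CASE_21 = """\
Start swdemo@staffw_edl 2003/02/05 15:00
Register order Processed To swdemo@staffw_edl 2003/02/05 15:00
Register order Released By swdemo@staffw_edl 2003/02/05 15:00
Prepare shipment Processed To swdemo@staffw_edl 2003/02/05 15:00
(Re)send bill Processed To swdemo@staffw_edl 2003/02/05 15:00
(Re)send bill Released By swdemo@staffw_edl 2003/02/05 15:01
Receive payment Processed To swdemo@staffw_edl 2003/02/05 15:01
Prepare shipment Released By swdemo@staffw_edl 2003/02/05 15:01
Ship goods Processed To swdemo@staffw_edl 2003/02/05 15:01
Ship goods Released By swdemo@staffw_edl 2003/02/05 15:02
Receive payment Released By swdemo@staffw_edl 2003/02/05 15:02
Archive order Processed To swdemo@staffw_edl 2003/02/05 15:02
Archive order Released By swdemo@staffw_edl 2003/02/05 15:02
Terminated 2003/02/05 15:02
"""

def completed_tasks(text):
    """Return the tasks of one case in order of their 'Released By' events."""
    trace = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        # Lines such as 'Start ...' and 'Terminated ...' do not match and are skipped.
        if match and match.group("event") == "Released By":
            trace.append(match.group("task"))
    return trace

TRACE_21 = completed_tasks(CASE_21)
print(TRACE_21)
```

Traces obtained this way, one per case, form the workflow log that the mining step takes as input.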
Fragment of XML file
<?xml version="1.0"?>
<!DOCTYPE WorkFlow_log SYSTEM "http://www.tm.tue.nl/it/research/workflow/mining/WorkFlow_log.dtd">
<WorkFlow_log>
  <source program="staffware"/>
  <process id="main_process">
    <case id="case_0">
      <log_line>
        <task_name>Case start</task_name>
        <event kind="normal"/>
        <date>05-02-2003</date>
        <time>15:04</time>
      </log_line>
      <log_line>
        <task_name>Register order</task_name>
        <event kind="schedule"/>
        <date>05-02-2003</date>
        <time>15:04</time>
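Reading this format can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the fragment above; the closing tags in `SAMPLE` are an assumption, added only so that the truncated fragment becomes well-formed, and `read_events` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, closed off by hand (the closing tags are assumed).
SAMPLE = """\
<WorkFlow_log>
  <source program="staffware"/>
  <process id="main_process">
    <case id="case_0">
      <log_line>
        <task_name>Case start</task_name><event kind="normal"/>
        <date>05-02-2003</date><time>15:04</time>
      </log_line>
      <log_line>
        <task_name>Register order</task_name><event kind="schedule"/>
        <date>05-02-2003</date><time>15:04</time>
      </log_line>
    </case>
  </process>
</WorkFlow_log>
"""

def read_events(xml_text):
    """Return (case id, task name, event kind, date, time) per log line."""
    root = ET.fromstring(xml_text)
    events = []
    for case in root.iter("case"):
        for line in case.iter("log_line"):
            events.append((
                case.get("id"),
                line.findtext("task_name"),
                line.find("event").get("kind"),
                line.findtext("date"),
                line.findtext("time"),
            ))
    return events

EVENTS = read_events(SAMPLE)
print(EVENTS)
```

Because every source system is exported to the same structure, a single reader of this kind is enough for the mining tools.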
EMiT
Focus on time and causality.
Thumb
Focus on noise.
Thumb is able to deal with noise (D/F-graphs)
[Figure: causality graphs mined from a log with no noise and from a log with 10% noise.]
Real case: CJIB
• Processing of fines
• 130136 cases
• 99 different activities
Process in EMiT
Complete process model
Validated by CJIB
SAP R/3
Conclusion
• Process mining is both a scientific and practical challenge.
• Preliminary results are promising.
• Challenging problems:
  – Finding the right data in real information systems.
  – Dealing with noise and incompleteness.
  – Dealing with advanced synchronization patterns.
  – Dealing with hidden tasks/behavioral equivalence.