Process Mining An index to the state of the art and an outline of open research challenges at DIAG...
-
Upload
amber-graves -
Category
Documents
-
view
215 -
download
3
Transcript of Process Mining An index to the state of the art and an outline of open research challenges at DIAG...
Process MiningAn index to the state of the art and an outline of open research challenges at DIAG
Claudio Di Ciccio, Massimo Mecella
Seminars in Software and Services for the Information Society
Process Mining
• Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions, stemming from a set of recorded real executions (logs).
• ProM [AalstEtAl2009] is one of the most used plug-in based software environment for implementing workflow mining (and more) techniques.• The new version 6.0 is available for download at
www.processmining.org
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 2
Definition
Process Mining
• Process Mining involves:• Process discovery
• Control flow mining, organizational mining, decision mining;
• workflow
• Conformance checking
• Operational support
• We will focus on the control flow mining
• Many control flow mining algorithms proposed• α [AalstEtAl2004], α++ [WenEtAl2007] and α# [WenEtAl2010]
• Fuzzy [GüntherAalst2007]
• Heuristic [WeijtersEtAl2001]
• Genetic [MedeirosEtAl2007]
• Two-step [AalstEtAl2010]
• …
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 3
Definition
Process Mining
• The rest of the lesson is based on the following material:• Van der Aalst, W. M. P.: “Process Discovery: An
Introduction”• Available at
http://www.processmining.org/_media/processminingbook/process_mining_chapter_05_process_discovery.pdf
• From the teaching material for [Aalst2011.book]
• De Medeiros, A. K. A.: “Process Mining: Control-Flow Mining Algorithms”
• Available athttp://www.processmining.org/_media/courses/processmining/lecture3_controlflowminingalgorithms.ppt
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 4
Further reading
• Artful processes [HillEtAl06]– informal processes typically
carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)
• “knowledge workers”[ACTIVE09]
• Knowledge workers create artful processes “on the fly”
• Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared
– Loosely structured– Highly flexible
Artful processes and knowledge workersA different context for process mining
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 5
A different context (2)
• In collaborative contexts, knowledge workers share their information and outcomes with other knowledge workers
– E.g., a software development mgr.
• Typically, by means of several email conversations
– Email conversations are actual traces of running processes that knowledge workers adhere to
Email conversations
P. 6Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
A different context (4)
• Personal information management (PIM)– how to organize one’s own activities, contacts, etc. through the
usage of software
• Information warfare– in supporting anti-crime intelligence agencies
• Enterprise engineering– for knowledge-heavy industries, where preserving documents
making up product data is not enough
• eHealth– for the automatic discovery of medical treatment procedures on
top of patient health records
Some areas of applicability
P. 7Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
MailOfMine
MailOfMine is the approach and the implementation of a collection of techniques, the aim of which is to is to automatically build, on top of a collection of email messages, a set of workflow models that represent the artful processes laying behind the knowledge workers’ activities.
[DiCiccioEtAl11]
[DiCiccioMecella12]
[DiCiccioMecella/TR12]
[DiCiccioMecella13]
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 8
What is MailOfMine?
On the visualization of processes
P. 9
The imperative model
• Represents the whole process at once
• The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
On the visualization of processes
• Rather than using a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints
• the idea is that every task can be performed, except the ones which do not respect such constraints
• this technique fits with processes that are highly flexible and subject to changes, such as artful processes
P. 10
The declarative modelIf A is performed,
B must be perfomed,no matter
before or afterwards(responded existence)
Whenever B is performed,C must be performed
afterwardsand B can not be repeated
until C is done(alternate response)
The notation here is based on [AalstEtAl06, MaggiEtAl11] (DecSerFlow, Declare)
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
On the visualization of processes
P. 11
Imperative vS declarative
Imperative
Declarative
Declarative models work better in presence of a partial specification of the process scheme
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
A real discovered process model“Spaghetti process” [Aalst2011.book]
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 12
Declare constraint templates
P. 13
Existence templates
Existence(n, A)Activity A occurs at least n times in the process instanceBCAAC ✓BCAAAC ✓BCAC ✗(for n = 2)
Absence(A)Activity A does not occur in the process instanceBCC ✓BCAC ✗
Absence(n+1, A)Activity A occurs at most n+1 times in the process instanceBCAAC ✗BCAC ✓BCC ✓(for n = 2)
Exactly(n, A)Activity A occurs exactly n times in the process instanceBCAAC ✗BCAAAC ✗BCAC ✗(for n = 2)
Init(A)Activity A is the first to occur in each process instanceBCAAC ✗ACAAAC ✓ BCC ✗
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
Declare constraint templates
P. 14
Relation templates
RespondedExistence(A, B)If A occurs in the process instance, then B occurs as wellCAC ✗CAACB ✓BCAC ✓BCC ✓
Response(A, B)If A occurs in the process instance, then B occurs after ABCAAC ✗CAACB ✓CAC ✗BCC ✓AlternateResponse(A, B)Each time A occurs in the process instance, then B occurs afterwards, before A recursBCAAC ✗CAACB ✗CACB ✓CABCA ✗BCC ✓CACBBAB ✓
ChainResponse(A, B)Each time A occurs in the process instance, then B occurs immediately afterwardsBCAAC ✗BCAABC ✗BCABABC ✓
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
Declare constraint templates
P. 15
Relation templates
RespondedExistence(B, A)If B occurs in the process instance, then A occurs as wellCAC ✓CAACB ✓BCAC ✓BCC ✗
Precedence(A, B)B occurs in the process instance only if preceded by ABCAAC ✗CAACB ✓CAC ✓BCC ✓AlternatePrecedence(A, B)Each time B occurs in the process instance, it is preceded by A and no other B can recur in betweenBCAAC ✗CAACB ✓CACB ✓CABCA ✓BCC ✗CACBAB ✓
ChainPrecedence(A, B)Each time B occurs in the process instance, then B occurs immediately beforehandBCAAC ✗BCAABC ✗CABABCA ✓
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
Declare constraint templates
P. 16
Relation templates
CoExistence(A, B)If B occurs in the process instance, then A occurs, and viceversaCAC ✗CAACB ✓BCAC ✓BCC ✗
Succession(A, B)A occurs if and only if it is followed by B in the process instanceBCAAC ✗CAACB ✓CAC ✗BCC ✗AlternateSuccession(A, B)A and B occur in the process instance if and only if the latter follows the former, and they alternate each other in the traceBCAAC ✗CAACB ✗CACB ✓CABCA ✗BCC ✗CACBAB ✓
ChainSuccession(A, B)A and B occur in the process instance if and only if the latter immediately follows the formerBCAAC ✗BCAABC ✗CABABC ✓
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
Declare constraint templates
P. 17
Negative relation templates
NotCoExistence(A, B)A and B never occur together in the process instanceCAC ✓CAACB ✗BCAC ✗BCC ✓
NotSuccession(A, B)A can never occur before B in the process instanceBCAAC ✓CAACB ✗CAC ✓BCC ✓
NotChainSuccession(A, B)A and B occur in the process instance if and only if the latter does not immediately follows the formerBCAAC ✓BCAABC ✗CBACBA ✓
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma)
Relation constraint templates subsumptionConstraint templates are not independent of each other
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 18
MINERful++
MINERful++ is the workflow discovery algorithm of MailOfMine
Its input is a collection of strings T and an alphabet ΣT
Each string t is a trace Each character is an event (enacted
task) The collection represents the log
Its output is a declarative process model
What is a declarative process model?
Our mining algorithm
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 19
The declarative specification of an artful process (e.g.)
A project meeting is scheduledWe suppose that a final agenda will be committed
(“confirmAgenda”) after that requests for a new proposal (“requestAgenda”), proposals themselves (“proposeAgenda”) and comments (“commentAgenda”) have been circulated.
Shortcuts for tasks (process alphabet):
– p (“proposeAgenda”)– r (“requestAgenda”)– c (“commentAgenda”)– n (“confirmAgenda”)
Scenario
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 20
The declarative specification of an artful process (e.g.)
Existence constraints1. Participation(n)2. Uniqueness(n)3. End(n)
Relation constraints4. Response(r, p)5. RespondedExistence(c, p)6. Succession(p, n)
The agenda1. must be confirmed,2. only once:3. it is the last thing to do.
During the compilation:4. the proposal follows a
request;5. if a comment circulates,
there has been / will be a proposal;
6. after the proposal, there will be a confirmation, and there can be no confirmation without a preceding proposal.
Constraints on activities
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 21
MINERful++
MINERful++ is a two-step algorithm1. Construction of a Knowledge Base
2. Constraints inference by means of queries evaluated on the KB
This allows the discovery of constraints through a faster procedure on data which are smaller in size than the whole input
Returned constraints are weighted with their support
the normalized fraction of cases in which the constraint is verified over the set of input traces
Workflow discovery by constraints inference
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 22
MINERful++ by example
In order to see how it works now
we see a run of MINERful++ over a string, compliant with the previous example:
r r p c r p c r c p c n
We start with the construction of the algorithm’s Knowledge Base
Workflow discovery by constraints inference
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 23
MINERful++ by example
p
– p occurred 3 times in 1 stringγp(3) = 1
• For each m ≠ 3 γp(m) = 0
– p did not occur as the first nor as the last charactergi(p) = 0gl(p) = 0
n
– γn(1) = 1
– For each m ≠ 1, γn(m) = 0
– n occurred as the last character in 1 stringgi(n) = 0gl(n) = 1
Building the “ownplay” of p and n
r r p c r p c r c p c n
r r p c r p c r c p c n
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 24
MINERful++ by example
With respect to the occurrence of p, n occurred…
i. Never before: 3 timesδp,n(-∞) = 3
ii. 2 char’s after: 1 timeδp,n(2) = 1
iii. 6 char’s after: 1 timeδp,n(6) = 1
iv. 9 char’s after: 1 timeδp,n(9) = 1
v. Repetitions in-between:i. onwards: 2 times
b→p,n = 2
ii. backwards: neverb←
p,n = 0
Looking at the string
i. r r p c r p c r c p c n
ii. r r p c r p c r c p c n
iii. r r p c r p c r c p c n
iv. r r p c r p c r c p c n
v. i. r r p c r p c r c p c n
ii. r r p c r p c r c p c n
Building the “interplay” of p and n
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 25
MINERful++ by example
-∞ -5 -2 +1 +2 +4 +5 +8 +9
δr,p 2 1 2 2 2 1 2 1 1
Building the “interplay” of r and p
b→r,p = 1
b←r,p = 0
r r p c r p c r c p c n
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 26
MINERful++
Let r and p be the number of times in which r and p respectively appear in the log
we have, e.g.,support for Response(r, p)
hint: how many times p was not read in the traces after r occurred?In those cases, Response(r, p) does not hold
support for Succession(r, p) how many times p was not read in the traces after
r occurred, nor r was read before p occurred?In those cases, Succession(r, p) does not hold
…
Computing the support of constraints
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 27
MINERful++
Computing the support of constraints
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 28
MINERful++ by example
1. r r p c r p c r c p c n
2. r c c r p p c p c r c p p n
3. c c c c c r r p c r r p r r p r p p p n
4. c p r p n
5. r p p n
6. r r r c r c p n
7. r p c n
8. c r p r r c c p p n
9. p c c n
10. c r r c c c r r p c p c c p n
Support for…– Response(r, p)
1.0– Succession(r, p)
0.96364– Precedence(c, n)
0.9
The support can be used to prune out those constraints falling under a given threshold
– E.g., 0.95
A threshold equal to 1.0 selects those constraints which are always valid on the log
Computing the support of constraints
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 29
Relation constraint templates subsumption
E.g.,– A trace like
a b a b c a b c csatisfies (w.r.t. b and a):
• RespondedExistence(a, b), RespondedExistence(b, a),CoExistence(a, b), CoExistence(b, a), Response(a, b), AlternateResponse(a, b), ChainResponse(a, b), Precedence(a, b), AlternatePrecedence(a, b), ChainPrecedence(a, b),Succession(a, b), AlternateSuccession(a, b),ChainSuccession(a, b)
– The mining algorithm would have to show the most strict constraint only (ChainSuccession(a, b))
MINERful++ faces this issue, by pruning the returned constraints on the basis of the subsumption hierarchy of constraints
Constraint templates are not independent of each other
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 30
Relation constraint templates subsumptionConstraint templates are not independent of each other
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 31
Relation constraint templates subsumptionA hint on the pruning procedure
1.0
1.0
1.0
1.0
1.0
1.01.01.0
1.01.01.0
✖✖✖
✖✖
✖✖
✖✖✖
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 32
Relation constraint templates subsumptionA hint on the pruning procedure
0.8
0.8
0.9
0.9
0.9
0.90.70.7
0.90.90.9
✖ ✖?✖?✖
✖
✖
✖
?✖?✖
Support
Support
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 33
Evaluation
MINERful++ is– Independent on the formalism used for expressing constraints– Modular (two-phase)– Capable of eliminating redundancy in the process model– Fast
Characteristics of MINERful++
Sony VAIO VGN-FE11HIntel Core Duo T2300 1.66 GHz2 MB L2 cache2 GB of DDR2 RAM @ 667 Mhz
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 34
Evaluation
Linear w.r.t. the number of traces in the log|T|Quadratic w.r.t. the size of traces in the log|tmax|Quadratic w.r.t. the size of the alphabet|ΣT|Hence, polynomial in the size of the inputO(|T|·|tmax |2·|ΣT |2)
On the complexity of MINERful++
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 35
Preliminary resultsMINERful++ on real data
Logs extracted from 2 mailboxes
– [DiCiccioMecella2012]
5 traces– 34.75 events each on average– 139 events read in total
Evaluation of inferred constraints conducted with an expert domainPrecision ≈ 0.794
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 36
Future work
Integrate MINERful++ with ProM– MINERful++ is already capable of reading/writing XES logs
Study the effects of errors in logs on the inferred workflow– Error-injected synthetic logs can help us conduct an automated
analysis on the quality of resultsRefine the estimation calculi for supportAuto-tune the support threshold
– Depending on the constraint template, which can be more or less “robust”
Enlarge the set of discovered constraint templates to the branching Declare constraints
Research in progress
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 37
References
• [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011).
• [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Güther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: Prom: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings., CEUR-WS.org (2009)
• [AalstEtAl2004] van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9) (2004) 1128–1142.
• [WenEtAl2007] Wen, L., van der Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2) (2007) 145–180.
• [WenEtAl2010] Wen, L., Wang, J., van der Aalst, W. M. P., Huang, B., Sun, J.: Mining process models with prime invisible tasks. Data Knowl. Eng., 69 (2010), 999-1021.
• [GüntherEtAl2007] Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. Proc. BPM 2007.
• [WeijtersEtAl2001] Weijters, A., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering 10 (2001) 2003.
• [MedeirosEtAl2007] Medeiros, A.K., Weijters, A.J., Aalst, W.M.: Genetic process mining: an experi- mental evaluation. Data Min. Knowl. Discov. 14(2) (2007) 245–304.
• [AalstEtAl2010] van der Aalst, W., Rubin, V., Verbeek, H., van Dongen, B., Kindler, E., Gnther, C.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling 9 (2010) 87–111 10
Cited articles and resources, in order of appearance
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 38
References
• [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)
• [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the ACTIVE integrated approach. BT Technology Journal 26(2), 165–176 (2009)
• [DiCiccioEtAl11] Di Ciccio, C., Mecella, M., Catarci, T.: Representing and visualizing mined artful processes in MailOfMine. Proc. USAB 2011:83-94
• [DiCiccioMecella12] Di Ciccio, C., Mecella,M.: Mining constraints for artful processes. Proc. BIS 2012
• [DiCiccioMecella/TR12] Di Ciccio, C., Mecella, M.: MINERful, a mining algorithm for declarative process constraints in MailOfMine. Technical report, Dipartimento di Ingegneria Informatica, Automatica e Gestionale “Antonio Ruberti” – SAPIENZA, Universita` di Roma (2012).
• [DiCiccioMecella13] Di Ciccio, C., Mecella, M.: A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows. Proc. CIDM 2013
• [AalstEtAl06] van der Aalst, W.M.P., Pesic, M.: Decserflow: Towards a truly declarative service flow language. Proc. WS-FM 2006
• [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declar- ative process models. Proc. CIDM 2011
Cited articles and resources, in order of appearance
Process MiningClaudio Di Ciccio (DIAG, SAPIENZA – Università di Roma) P. 39