RuleML 2015: When Processes Rule Events
Transcript of RuleML 2015: When Processes Rule Events
Lecture Outline
Big Data: the New Playground
Events, Processes, and Anything in Between
Complex Event Processing Optimization
Process Mining with Schedules
When Processes Rule Events
Avigdor Gal
Technion – Israel Institute of Technology
Presentation Outline
Big data: the New Playground
Events, Processes, and Anything in Between
Complex Event Processing Optimization
Process Mining with Schedules
Big Data: is it a Storm in a Teacup?
Big data is a game changer
From Theory to Systems: empirical evaluation counts
From Systems to Data: large-scale empirical evaluation counts
Who is a Data Scientist?
"The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that's going to be a hugely important skill in the next decades." (Hal Varian, Google's Chief Economist)
Data Volume: No Longer the Size of a Teacup
Volume
Table: Big Data Cross Table
Big data may be a single dataset with a lot of data
Data Velocity: Replacing a Teacup with a Tea Hose
Volume
Velocity
Table: Big Data Cross Table
Big data may be data that rapidly changes
Data Variety: When One Tea Type is Just not Enough
Volume
Velocity
Variety
Table: Big Data Cross Table
Big data may be a small dataset with many different schemata
Data Veracity: Is it Coffee or Black Tea with Milk?
Volume
Velocity
Variety
Veracity
Table: Big Data Cross Table
Big data may be data with varying levels of trustworthiness
Data Gathering: where and when to expect the fountain to burst
Gathering
Volume
Velocity
Variety
Veracity
Signal and Event Processing
Table: Big Data Cross Table
Data Management: Not your typical DBA anymore
Gathering Managing
Volume
Velocity
Variety
Veracity
Cloud Computing, NoSQL, NewSQL
Table: Big Data Cross Table
Data Analytics: When Data Analysis Explodes Multi-Dimensionally
Gathering Managing Analyzing
Volume
Velocity
Variety
Veracity
Data & Process Mining, ML, IR, NLP
Table: Big Data Cross Table
Data Visualization: The Machine Offering to Mankind
Gathering Managing Analyzing Visualizing
Volume
Velocity
Variety
Veracity
User Experience
Table: Big Data Cross Table
Big Data Cross Table
Gathering Managing Analyzing Visualizing
Volume
Velocity
Variety
Veracity
[The labels "Events" and "Processes" are written vertically inside the table, each spanning all four rows.]
Table: Big Data Cross Table
Event Processing
Events
An event e is an occurrence within a particular system or domain.
It is something that has happened, or is contemplated as having happened in that domain. [Etzion and Niblett, 2010]
Point-based semantics.
An event type E ∈ ℰ is a specification for a set of events that share the same semantic intent and structure.
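The definitions above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class names `EventType` and `Event`, the `conforms` check, and the bus-position attributes and values are all hypothetical and do not come from the slides. Point-based semantics is modeled by giving each event exactly one timestamp.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class EventType:
    """A specification shared by events with the same intent and structure."""
    name: str
    attributes: Tuple[str, ...]  # attribute names every event of this type carries


@dataclass
class Event:
    """A single occurrence. Point-based semantics: one timestamp, no duration."""
    event_type: EventType
    timestamp: float  # assumed unit: seconds since an arbitrary epoch
    payload: Dict[str, Any] = field(default_factory=dict)

    def conforms(self) -> bool:
        # The event matches its type when it carries exactly the declared attributes.
        return set(self.payload) == set(self.event_type.attributes)


# Hypothetical event type and reading, made up for illustration only.
BUS_POSITION = EventType("BusPosition", ("bus_id", "lat", "lon"))
reading = Event(BUS_POSITION, 1000.0, {"bus_id": "bus-7", "lat": 1.0, "lon": 2.0})
incomplete = Event(BUS_POSITION, 1002.0, {"bus_id": "bus-7"})
```

Here `reading.conforms()` holds because the payload carries every declared attribute, while `incomplete.conforms()` does not.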
Complex Event Processing
Systems: Amit [Adi and Etzion, 2004], SASE [Wu et al., 2006], Cayuga [Demers et al., 2007], CEDR [Barga et al., 2007], ESPER [].
DEBS 2016: Orange County, California
Event Processing
Urban Traffic Management
Traffic Flow
Bus Log
Events and Big Data
Volume: 23 Million records per month (∼ 4GB)
Velocity: 770,000 new records per day (an event every 2-6 seconds)
Variety: Homogeneous
Veracity: GPS locations
Processes
Process models describe time dependencies among activities:
Business processes
Scheduled activities
Used as a template for execution by a process engine.
A process model can be modeled as a graph containing activity nodes and control nodes:
Petri nets [Reisig, 1985]
BPMN [bpm, 2011]
Process Models
Bus Log
Bus Model
[Figure: a bus route from s to d through the intermediate stops ω_2, ω_3, ..., ω_i, ..., ω_{n-1}]
Traveling Time = Drive Time + Delay Time + Stop Time
Between Events and Processes
Given processes, detect (complex) events
Given events, discover processes
From Processes to CEP
Optimisation of event pattern matching on three levels
Approach based on domain knowledge
Results taken from: M. Weidlich, H. Ziekow, A. Gal, J. Mendling, M. Weske - Optimising Event Pattern Matching using Business Process Models. IEEE Transactions on Knowledge and Data Engineering (TKDE), accepted for publication, 2015.
(Thanks to Matthias Weidlich for the slides)
Optimization by Transformation
Sequentialization Rule
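The slide names the rule without spelling it out, so the following is only one plausible reading, not the authors' definition: if the process model guarantees that activity a always strictly precedes activity b, a conjunction pattern over a and b can be rewritten into a cheaper sequence pattern. The tuple encoding of patterns, the function name `sequentialize`, and the `strict_order` set are assumptions made for this sketch.

```python
from typing import Set, Tuple

Pattern = Tuple[str, str, str]  # (operator, left activity, right activity)


def sequentialize(pattern: Pattern, strict_order: Set[Tuple[str, str]]) -> Pattern:
    """Rewrite AND(a, b) into SEQ(a, b) when the model fixes the order of a and b.

    strict_order holds pairs (x, y) meaning: according to the process model,
    x always occurs before y. Patterns that cannot be rewritten are returned as-is.
    """
    operator, left, right = pattern
    if operator != "AND":
        return pattern
    if (left, right) in strict_order:
        return ("SEQ", left, right)
    if (right, left) in strict_order:
        return ("SEQ", right, left)
    return pattern


# Hypothetical ordering knowledge for a paper reviewing process.
order = {("collect reviews", "decide")}
rewritten = sequentialize(("AND", "decide", "collect reviews"), order)
unchanged = sequentialize(("AND", "invite reviewer", "decide"), order)
```

Under this reading, the matcher no longer has to consider both arrival orders for the rewritten pattern, which is where the saving would come from.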
Optimization by Plan Selection
Sequentialization Rule
Optimization by Early Termination
Sequentialization Rule
Performance Analysis
Datasets
A publicly available process log that contains recorded execution sequences of a paper reviewing process (http://www.processmining.org/logs/start):
The model defines 20 activities.
The log comprises 3730 events that are related to 100 process instances.
Each event is associated with a timestamp and a reference to an activity of the process model.
Process models of a German insurance company:
1021 process models, ranging from 4 to 339 nodes.
The average size of the process models is around 23 nodes.
The log was simulated using annotations of the process models.
Complex Events Processing with Processes
Gathering ...
Volume
Velocity Optimization
Variety Optimisation in event processing networks
Veracity
Table: Big Data Cross Table
... Analysis
Volume Mining of constraints
Velocity
Variety
Veracity Probabilistic mining of constraints
Table: Big Data Cross Table
From Events to Processes
Online Traveling Time Prediction: when Processes Rule Events
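Using information on bus stops, the prediction of the journey traveling time $T(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1})$ is traced back to the sum of traveling times per segment:
\[ T(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1}) = T(\langle \omega_1, \omega_2 \rangle, t_{\omega_1}) + \ldots + T(\langle \omega_{n-1}, \omega_n \rangle, t_{\omega_{n-1}}) \]
where
\[ t_{\omega_{n-1}} = t_{\omega_1} + T(\langle \omega_1, \omega_{n-1} \rangle, t_{\omega_1}). \]
[Figure: a bus route from s to d through the intermediate stops ω_2, ω_3, ..., ω_i, ..., ω_{n-1}]
Traveling Time = Drive Time + Delay Time + Stop Time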
(Thanks to Arik Senderovich for the slides)
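The per-segment decomposition on this slide can be sketched as a loop that advances a clock: each segment is predicted at the time the bus is expected to enter it, which is the start time plus the traveling time accumulated so far. The function name `journey_time`, the `segment_time` callback signature, and the toy predictor are assumptions for illustration; any per-segment predictor could be plugged in.

```python
from typing import Callable, Sequence

# Hypothetical signature: (origin stop, destination stop, entry time) -> seconds.
SegmentPredictor = Callable[[str, str, float], float]


def journey_time(stops: Sequence[str], t_start: float,
                 segment_time: SegmentPredictor) -> float:
    """Sum per-segment traveling times, entering each segment at its predicted time."""
    clock = t_start
    total = 0.0
    for origin, destination in zip(stops, stops[1:]):
        duration = segment_time(origin, destination, clock)
        total += duration
        clock += duration  # entry time of the next segment
    return total


def toy_segment_time(origin: str, destination: str, t: float) -> float:
    # Made-up predictor: segments entered from t = 100 onwards are slower.
    return 60.0 if t < 100.0 else 90.0


example_total = journey_time(["w1", "w2", "w3"], 50.0, toy_segment_time)
```

In the toy run the first segment is entered at time 50 and the second at time 110, so the two segments receive different predictions even though the predictor is the same.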
Prediction: The Snapshot Principle in Single-Station Queues
The snapshot principle stems from a heavy-traffic approximation of a queueing system under limits of its parameters, as the workload converges to capacity.
[Figure: a single-station queue, Station1]
The principle states that the total time in the station (waiting + service) remains constant.
In our context, a bus that passes through a segment, e.g., $\langle \omega_i, \omega_{i+1} \rangle \in S \times S$, will have the same traveling time as another bus that has just passed through that segment (not necessarily of the same type, line, etc.).
The Snapshot Principle in Single-Station Queues
Based on the above, we define a single-segment snapshot predictor, Last-Bus-to-Travel-Segment (LBTS), denoted by $\theta_{LBTS}(\langle \omega_i, \omega_{i+1} \rangle, t_{\omega_1})$.
In real-life settings, the applicability of snapshot principle predictors should be tested case by case.
The snapshot principle was shown to be of empirical value in previous research, where queueing techniques were applied to predict delays.
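A minimal sketch of an LBTS-style predictor follows, assuming the bus log has already been turned into per-segment traversal records. The `Traversal` record, its field names, and the choice to return `None` when no bus has yet completed the segment are assumptions of this sketch, not details given on the slides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Traversal:
    """One bus passing through one segment (hypothetical record layout)."""
    start_stop: str
    end_stop: str
    enter_time: float
    exit_time: float


def lbts(log: Iterable[Traversal], start_stop: str, end_stop: str,
         now: float) -> Optional[float]:
    """Traveling time of the last bus that finished the segment by time `now`."""
    finished = [r for r in log
                if r.start_stop == start_stop
                and r.end_stop == end_stop
                and r.exit_time <= now]
    if not finished:
        return None  # no earlier bus to take a snapshot from
    last = max(finished, key=lambda r: r.exit_time)
    return last.exit_time - last.enter_time


# Made-up log: two buses of any line crossing the same segment.
log = [
    Traversal("w1", "w2", enter_time=0.0, exit_time=70.0),
    Traversal("w1", "w2", enter_time=100.0, exit_time=190.0),
    Traversal("w2", "w3", enter_time=70.0, exit_time=120.0),
]
```

Note that the lookup ignores which line the earlier bus belonged to, in keeping with the remark that the last bus need not be of the same type or line.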
Snapshot Principle in a Network
In our case, the LBTS predictor needs to be lifted to a network setting.
The snapshot principle holds for networks of queues, when the routing through this network is known in advance.
In scheduled transportation such as buses this is the case, as the order of stops (and segments) is predefined:
[Figure: a network of queues, Station1 through Station7]
Snapshot Principle in a Network
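We define a multi-segment (network) snapshot predictor that we refer to as the Last-Bus-to-Travel-Network, or $\theta_{LBTN}(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1})$, given a sequence of stops (with $\omega_1$ being the start stop and $\omega_n$ being the end stop).
According to the snapshot principle in networks we get that:
\[ \theta_{LBTN}(\langle \omega_1, \ldots, \omega_n \rangle, t_{\omega_1}) = \sum_{i=1}^{n-1} \theta_{LBTS}(\langle \omega_i, \omega_{i+1} \rangle, t_{\omega_1}). \]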
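The network predictor can be sketched as a sum of single-segment snapshots, all taken at the same start time. The `History` layout (per segment, a list of exit time and traveling time pairs), the function names, and the decision to return `None` when any segment lacks an earlier traversal are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Segment = Tuple[str, str]
# Hypothetical layout: segment -> list of (exit_time, traveling_time) observations.
History = Dict[Segment, List[Tuple[float, float]]]


def lbts(history: History, segment: Segment, t_start: float) -> Optional[float]:
    """Traveling time of the last bus that left `segment` no later than t_start."""
    past = [(exit_t, travel) for exit_t, travel in history.get(segment, [])
            if exit_t <= t_start]
    return max(past)[1] if past else None  # max picks the latest exit time


def lbtn(history: History, stops: Sequence[str], t_start: float) -> Optional[float]:
    """Sum the LBTS snapshots over consecutive stop pairs, all taken at t_start."""
    total = 0.0
    for segment in zip(stops, stops[1:]):
        estimate = lbts(history, segment, t_start)
        if estimate is None:
            return None  # some segment has no snapshot yet
        total += estimate
    return total


# Made-up observations for a three-stop journey.
history: History = {
    ("w1", "w2"): [(10.0, 5.0), (20.0, 7.0)],
    ("w2", "w3"): [(15.0, 3.0), (40.0, 9.0)],
}
prediction = lbtn(history, ["w1", "w2", "w3"], t_start=30.0)
```

With a start time of 30, the snapshot for the first segment is 7 and for the second is 3 (the observation exiting at time 40 is not yet available), giving a prediction of 10.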
Performance Analysis
Data
8 days of bus data, between September and October of 2014.
Each day: approximately 11500 traveled segments.
First trip for each day: no associated last travel time.
Prediction for line 046A.
Data comes from all buses that share segments with line 046A.
Performance Analysis
[Figure: sample square estimation error and root mean square error, each plotted against the index of the segment in the trip]
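The root mean square error named on this slide can be computed per segment index along the following lines. This is a generic sketch, not the evaluation code behind the plot: the input layout (triples of segment index, predicted time, and actual time) and the function name are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical sample layout: (segment index in the trip, predicted, actual).
Sample = Tuple[int, float, float]


def rmse_by_segment_index(samples: Iterable[Sample]) -> Dict[int, float]:
    """Group squared errors by segment index and take the root of their mean."""
    squared_errors = defaultdict(list)
    for index, predicted, actual in samples:
        squared_errors[index].append((predicted - actual) ** 2)
    return {index: math.sqrt(sum(errors) / len(errors))
            for index, errors in sorted(squared_errors.items())}


# Made-up predictions and observations, in seconds.
samples = [(1, 10.0, 13.0), (1, 10.0, 6.0), (2, 5.0, 5.0)]
errors = rmse_by_segment_index(samples)
```

For segment index 1 the squared errors are 9 and 16, so the result is the square root of 12.5; segment index 2 has a perfect prediction and an error of 0.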
Process Mining with Schedules
... Analysis
Volume Better prediction
Velocity Segmentation
Variety
Veracity
Table: Big Data Cross Table
Process Mining with Schedules
... Management ...
Volume
Velocity
Variety
Veracity Event Cleaning
Table: Big Data Cross Table
Thank You
Avigdor Gal
Technion – Israel Institute of Technology
A. Adi and O. Etzion. Amit - the situation manager. The International Journal on Very Large Data Bases, 13(2):177–203, May 2004.
Roger S. Barga, Jonathan Goldstein, Mohamed H. Ali, and Mingsheng Hong. Consistent streaming through time: A vision for event stream processing. In CIDR [DBL, 2007], pages 363–374.
Business Process Model and Notation (BPMN) Version 2.0. Technical report, Object Management Group (OMG), January 2011.
CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings. www.cidrdb.org, 2007.
Alan J. Demers, Johannes Gehrke, Biswanath Panda, Mirek Riedewald, Varun Sharma, and Walker M. White. Cayuga: A general purpose event monitoring system. In CIDR [DBL, 2007], pages 412–422.
Opher Etzion and Peter Niblett. Event Processing in Action. Manning Publications Company, 2010.
Wolfgang Reisig. Petri Nets: An Introduction, volume 4 of Monographs in Theoretical Computer Science. An EATCS Series. Springer, 1985.
Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex event processing over streams. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 407–418, New York, NY, USA, 2006. ACM.