Data Science: Better processes and products thanks to (Big) data
Wil van der Aalst www.vdaalst.com @wvdaalst
www.processmining.org
©Wil van der Aalst & TU/e (use only with permission & acknowledgements)
DSC/e: Competences and Research Programs 28 groups and 420+ people involved
Context: Why are we using data science, does it have the intended effect, and will people accept it?
Analysis: How to turn data into real value (models, answers/decisions, and visualizations/insights)?
Enabling technologies: How to get the data and deal with computational/infrastructural challenges (big data and hard questions)?
Probability and Statistics
Stochastic Networks
Data Mining
Process Mining
Visualization
Large-Scale Distributed Systems
Data-Intensive Algorithms
Data-Driven Operations Management
Data-Driven Innovation and Business
Human and Social Analytics
Privacy, Security, Ethics, and Governance
Internet of Things
[RP1] Process Analytics: Improving Service While Cutting Costs
[RP2] Customer Journey: Correlating Events to Learn and Influence Customer Behavior
[RP3] Smart Maintenance & Diagnostics: Safeguarding Availability
[RP4] Quantified Self: Improving Performance and Well-Being
[RP5] Data Value and Privacy: Economic and Legal Aspects of Data Science
[RP6] Smart Cities: Ensuring Safety and Convenience for Citizens
[RP7] Smart Grids: Data Intensive Infrastructures
Data Science Flagship (Philips & DSC/e)
• 4 Strategic topics
• 4 TU/e departments
• 16 PhD students
• 30 Data science specialists
1. Data Driven Value Propositions
2. Healthcare Smart Maintenance
3. Optimizing Healthcare Workflows
4. Continuous Personal Health
“Data Science University” in Den Bosch
Process Mining: On the interface between process science and data science
Spreadsheet: Killer App for early computers
• VisiCalc (killer app for Apple II, Oct. 1979)
• Lotus 1-2-3 (killer app for IBM PC, 1983)
• Microsoft Excel (1985)
Spreadsheet: Static data
fact derived
Spreadsheet: Static data
31 items sold, total value, average, distribution
Spreadsheet: Static data
How to analyze operational processes?
Process Mining: Spreadsheet for behavior
• Input: events (“things that have happened”)
• Mandatory per event:
− case identifier
− activity name
− timestamp/date
• Optional:
− resource
− transaction type
− costs
− …
Table columns: case identifier, activity name, timestamp, resource (each row = one event)
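The event format above can be sketched in a few lines of Python. This is a minimal illustration, not a standard format: the `Event` class, the `traces` helper, and the toy log are all hypothetical names and data introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    case_id: str                    # mandatory: the case the event belongs to
    activity: str                   # mandatory: what happened
    timestamp: datetime             # mandatory: when it happened
    resource: Optional[str] = None  # optional: who or what performed it


def traces(events):
    """Group events by case and order each case by timestamp."""
    by_case = defaultdict(list)
    for e in events:
        by_case[e.case_id].append(e)
    return {
        case: [e.activity for e in sorted(evs, key=lambda e: e.timestamp)]
        for case, evs in by_case.items()
    }


# Hypothetical toy log (not taken from the slides); one row = one event.
log = [
    Event("c1", "register", datetime(2015, 1, 1, 9, 0), "Ann"),
    Event("c2", "register", datetime(2015, 1, 1, 9, 30), "Ann"),
    Event("c1", "decide", datetime(2015, 1, 2, 14, 0), "Bob"),
    Event("c1", "check", datetime(2015, 1, 1, 11, 0)),
]

result = traces(log)
n_cases = len(result)
n_events = len(log)
n_activities = len({e.activity for e in log})
```

Counting cases, events, and distinct activities in this way gives the kind of summary figures a process mining tool reports for a loaded log.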
Process Mining: Spreadsheet for behavior
208 cases
5987 events
74 activities
Process Mining: Spreadsheet for behavior
batching for the activities “opstellen eindnota” (draft final invoice) and “archiveren” (archive)
Loesje van der Aalst
desire line
Process Discovery
Process Mining: Spreadsheet for behavior
process discovery
NO modeling needed!
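One simple way to see how a process map can be derived from events without manual modeling is to count which activity directly follows which. This directly-follows graph is only one basic discovery technique and not necessarily the one behind these slides; the function name and the toy traces are hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical traces: one list of activity names per case, in time order.
toy_traces = [
    ["register", "check", "decide"],
    ["register", "check", "recheck", "decide"],
    ["register", "check", "decide"],
]

dfg = directly_follows(toy_traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Each counted pair becomes an arc in the discovered graph, and the counts show which paths are frequent and which are rare.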
Conformance Checking: event data and process model
Conformance Checking: desire line, very safe system
Process Mining: Spreadsheet for behavior
conformance checking
model: discovered or hand-made?
Process Mining: Spreadsheet for behavior
conformance checking
fitness of 93.5%
Process Mining: Spreadsheet for behavior
conformance checking
final inspection is skipped 40 times
Process Mining: Spreadsheet for behavior
conformance checking
move on model (something should have happened, but did not)
move on log (something happened that should not have happened)
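The two kinds of deviation can be illustrated by aligning one observed trace with one allowed run of the model. The sketch below is a strong simplification: it assumes the model permits a single sequential run and uses a longest-common-subsequence alignment, whereas real conformance checking handles models with many possible runs. The function name and the example activities are hypothetical.

```python
def align(model_run, trace):
    """Align a trace with one model run using a longest common subsequence."""
    n, m = len(model_run), len(trace)
    # lcs[i][j] = length of the LCS of model_run[i:] and trace[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if model_run[i] == trace[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    moves, i, j = [], 0, 0
    while i < n and j < m:
        if model_run[i] == trace[j]:
            moves.append(("sync", model_run[i]))           # model and log agree
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            moves.append(("move on model", model_run[i]))  # expected, not observed
            i += 1
        else:
            moves.append(("move on log", trace[j]))        # observed, not expected
            j += 1
    moves += [("move on model", a) for a in model_run[i:]]
    moves += [("move on log", a) for a in trace[j:]]
    return moves


# Hypothetical example: the inspection is skipped and an extra call is made.
model_run = ["register", "check", "final inspection", "ship"]
trace = ["register", "check", "call customer", "ship"]
alignment = align(model_run, trace)
for kind, activity in alignment:
    print(f"{kind:14s} {activity}")
```

Counting the "move on model" entries for one activity across all cases yields statements such as how often a given step was skipped.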
Process Mining: Spreadsheet for behavior
performance analysis
average flow time is 1.92 months
bottleneck
NO modeling needed!
Process Mining: Spreadsheet for behavior
performance analysis
waiting time of 15.74 days
NO modeling needed!
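Figures such as an average flow time or a waiting time follow directly from the timestamps in the event data. The sketch below assumes one timestamp per event, so "waiting time" is approximated here as the time between two directly following events in the same case; the function names and the toy log are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical toy log: (case id, activity name, timestamp).
log = [
    ("c1", "register", datetime(2015, 1, 1)),
    ("c1", "decide", datetime(2015, 1, 11)),
    ("c2", "register", datetime(2015, 1, 5)),
    ("c2", "decide", datetime(2015, 1, 25)),
]


def flow_times_days(log):
    """Flow time per case: time from its first to its last event, in days."""
    first, last = {}, {}
    for case, _activity, ts in log:
        first[case] = min(ts, first.get(case, ts))
        last[case] = max(ts, last.get(case, ts))
    return {c: (last[c] - first[c]).total_seconds() / 86400 for c in first}


def mean_waiting_days(log, src, dst):
    """Average time in days between src and a directly following dst."""
    by_case = {}
    for case, activity, ts in sorted(log, key=lambda row: row[2]):
        by_case.setdefault(case, []).append((activity, ts))
    gaps = [
        (t2 - t1).total_seconds() / 86400
        for events in by_case.values()
        for (a1, t1), (a2, t2) in zip(events, events[1:])
        if a1 == src and a2 == dst
    ]
    return mean(gaps) if gaps else None


flows = flow_times_days(log)
avg_flow = mean(flows.values())
avg_wait = mean_waiting_days(log, "register", "decide")
```

The pair of activities with the largest average gap is a natural first candidate when looking for a bottleneck.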
Process Mining: Spreadsheet for behavior
animating reality
real cases
NO modeling needed!
Process Mining: Spreadsheet for behavior
animating reality
16 cases are queueing
How to get started?
• Event Data
• Process Mining Tools
• Data Science Mindset
Starting point for process mining: event data

patient  activity           timestamp          doctor      age  cost
5781     make X-ray         [email protected]  Dr. Jones   45   70.00
5541     blood test         [email protected]  Dr. Scott   61   40.00
5833     blood test         [email protected]  Dr. Scott   24   40.00
5781     blood test         [email protected]  Dr. Scott   45   40.00
5781     CT scan            [email protected]  Dr. Fox     45   1200.00
5833     surgery            [email protected]  Dr. Scott   24   2300.00
5781     handle payment     [email protected]  Carol Hope  45   0.00
5541     radiation therapy  [email protected]  Dr. Jones   61   140.00
5541     radiation therapy  [email protected]  Dr. Jones   61   140.00
…        …                  …                  …           …    …

Column roles: patient = case id, activity = activity name, timestamp = timestamp, doctor = resource, age and cost = other data
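As a small illustration of how such a table becomes input for analysis, the rows above can be grouped by case id. The sketch leaves out the timestamp column and assumes the rows are already in chronological order, which the table itself does not guarantee; the variable names are hypothetical.

```python
from collections import defaultdict

# (patient, activity, doctor, age, cost) rows copied from the table above.
# Assumption: the rows are listed in chronological order.
rows = [
    (5781, "make X-ray", "Dr. Jones", 45, 70.00),
    (5541, "blood test", "Dr. Scott", 61, 40.00),
    (5833, "blood test", "Dr. Scott", 24, 40.00),
    (5781, "blood test", "Dr. Scott", 45, 40.00),
    (5781, "CT scan", "Dr. Fox", 45, 1200.00),
    (5833, "surgery", "Dr. Scott", 24, 2300.00),
    (5781, "handle payment", "Carol Hope", 45, 0.00),
    (5541, "radiation therapy", "Dr. Jones", 61, 140.00),
    (5541, "radiation therapy", "Dr. Jones", 61, 140.00),
]

cost_per_case = defaultdict(float)     # case id  -> total cost
trace_per_case = defaultdict(list)     # case id  -> sequence of activities
events_per_resource = defaultdict(int) # resource -> number of events

for patient, activity, doctor, _age, cost in rows:
    cost_per_case[patient] += cost
    trace_per_case[patient].append(activity)
    events_per_resource[doctor] += 1

for patient, trace in trace_per_case.items():
    print(patient, " -> ".join(trace), f"(total cost {cost_per_case[patient]:.2f})")
```

The case id turns individual rows into per-patient traces, while the resource and cost columns support questions about workload and spending.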
How to get started?
• Event Data
• Process Mining Tools
• Data Science Mindset
900+ plug-ins available covering the whole process mining spectrum
How to get started?
• Event Data
• Process Mining Tools
• Data Science Mindset
Process Mining: Data Science in Action
43,000 + 25,000 people joined!
Starts again on October 7th 2015! Register via https://www.coursera.org/course/procmin
Conclusion
http://www.tue.nl/dsce/
Get started today!
Spreadsheet for behavior:
• data-oriented analysis (data mining, machine learning, business intelligence)
• process model analysis (simulation, verification, optimization, gaming, etc.)
• performance-oriented questions, problems and solutions
• compliance-oriented questions, problems and solutions