6 December 2016
Grab'n Go: Session 18
Analytics: The engine of the data-driven businesses of the future
#deloittegng
Advanced Analytics is hyped…
© 2015 Deloitte
Machine Learning, Advanced Visualization, Artificial Intelligence, Simulation, ?, Real-time Analytics, Sentiment Analysis, Image Recognition
Business Case, Data, Technology, Competences, Examples
The winning formula of being data-driven
Hypothesis + Data + New Technology + Data Scientist → Insight
The winning formula of being data-driven
Hypothesis + Data + New Technology + Data Scientist → Action
The potential does not lie in the insight but in the action.
From Descriptive to Predictive
Typical transition journey towards advanced insight
(Chart: added value grows with analytical sophistication through the stages Descriptive, Diagnostic, Predictive and Prescriptive, from the business user to the data scientist; the project approach moves from Structured through Agile to Experiment (lab).)
• Operational reporting: standardized, static reports (e.g. the Monday-morning weekly sales report); in-system reports (open orders)
• KPI reporting: integrated performance reporting, setting the basis for integrated performance management; interactive dashboards providing the right level of insight
• Visual data exploration: visualizing data and relationships; cause explanation and anomaly detection
• Predictive modelling: forecasting and predicting future outcomes; modelling and understanding correlations and causalities
• Simulation & optimization: simulating and experimenting with possible scenarios; finding the best solution out of many; generating structured business cases
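To make the step from descriptive to predictive concrete, here is a minimal Python sketch on a hypothetical weekly sales series (the numbers are invented for illustration, not taken from the slides). The descriptive part reports what happened; the predictive part fits an ordinary least-squares trend line and extrapolates one week ahead. A real predictive model would use more drivers and proper validation.

```python
# Hypothetical weekly sales figures (illustrative only, not from the slides).
weekly_sales = [100.0, 104.0, 108.0, 112.0]

def describe(series):
    """Descriptive analytics: summarise what happened."""
    return {
        "total": sum(series),
        "last_week": series[-1],
        "change_vs_previous_week": series[-1] - series[-2],
    }

def forecast_next(series):
    """Predictive analytics: fit y = a + b*t by least squares and extrapolate one step."""
    n = len(series)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, series))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # value of the trend line at the next week

print(describe(weekly_sales))
# {'total': 424.0, 'last_week': 112.0, 'change_vs_previous_week': 4.0}
print(forecast_next(weekly_sales))  # 116.0
```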
From BI to Advanced Analytics
Business should not get what they ask for, but what they need…
Traditional BI:
“I would like to understand my revenue broken down by a number of dimensions…”
“I would like to know the sick leave broken down by organization and time…”
Advanced Analytics:
“I would like to understand what is causing my UK revenue to fall, and to be notified when and why this starts to happen, so that I can stop it before it becomes a problem…”
“I would like to know what causes longer sick leave in specific departments, so that I can prevent people from becoming stressed…”
Types of Data-Driven Business
1. Old Business, New Data (core business: do the same thing better). Data is used to diversify products or services to improve the business, but the product itself remains as-is.
2. Transforming Business (transformation). Data is an integrated part of the product or service; product development is changed to support the transformation.
3. Data Native (find inspiration). The business is 100% based on data and originates from a data insight.
Types of Data-Driven Business
Data-enabled differentiation
• Product Innovators: enhance their products and services with data
• Systems Innovators: use data to integrate multiple product types
Data brokering
• Data Providers: gather and sell raw data without adding much value to it
• Data Brokers: combine data from multiple sources and sell insights
Data-based delivery networks
• Value Chain Integrators: share data to extend product offerings or reduce costs
• Delivery Network Collaborators: share data to foster marketplaces and enable advertising
From data to insight to action
What happened? Why did it happen? What will happen? What do we want to happen?
Descriptive → Predictive → Prescriptive
Data → Knowledge → Action → Business outcome (e.g. reduce cost, improve efficiency, automation, prediction, new services, improve products, new products, general insights, …)
Plan your actions
BIG DATA = “ALL DATA”
Data-Driven Business: New Technologies, New Data Types, New Sources, New Data, New Insight, New Decision Making, New Scenarios
Draw your data landscape
Data sources: internal (CRM, ERP, R&D) and external; structured and unstructured
Data characteristics, the V’s: volume, variety, velocity, variability
Essentials to consider: shared data, alternate data, calculated data, meta data, master data, data catalogue, privacy
The Big Data industry
The Open Source Community
Machine Learning, Advanced Visualization, Artificial Intelligence, Simulation, Real-time Analytics, Sentiment Analysis, Image Recognition, ?
From traditional data management to big data technologies: Hadoop, Spark, Scala, R, MATLAB, Python, Jupyter
• The amount of data is now fit for purpose with machine learning algorithms.
• Processing power is now generally available.
• Algorithms are democratized due to open source.
Jupyter Notebook: the data scientist's “Swiss Army knife”
The Open Source Community
The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
The Notebook has support for over 40 programming languages, including those popular in data science such as Python, R, Julia and Scala.
Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.
Code can produce rich output such as images, videos, LaTeX and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.
Leverage big data tools, such as Apache Spark, from Python, R and Scala.
Advanced Analytics: Machine Learning
Analyse patterns in data
Can we, based on earlier examples, identify the shape?
Training data built from known patterns is used to build ML models that can identify similar patterns in data. The strength of the model determines how accurately it identifies similar patterns in new data.
Model scores for new examples: 0.98, 0.01, 0.03, 0.18, 0.27, …
New classifiers and examples are added to the training data to strengthen the model and to be able to identify new patterns.
Possible errors: false positive, false negative
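The scores on the slide can be read as the model's confidence that an example matches the known pattern. A minimal sketch, assuming a hypothetical decision threshold and hypothetical ground-truth labels (neither is given in the deck), shows how false positives and false negatives are counted once scores become yes/no decisions.

```python
# Scores from the slide; the true labels are hypothetical.
scores = [0.98, 0.01, 0.03, 0.18, 0.27]
true_labels = [True, False, False, True, False]  # hypothetical ground truth

def confusion_counts(scores, labels, threshold=0.5):
    """Count true/false positives/negatives for a given decision threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, actual in zip(scores, labels):
        predicted = score >= threshold
        if predicted and actual:
            counts["TP"] += 1
        elif predicted and not actual:
            counts["FP"] += 1  # flagged, but not actually a match
        elif not predicted and actual:
            counts["FN"] += 1  # missed a real match
        else:
            counts["TN"] += 1
    return counts

print(confusion_counts(scores, true_labels))
# {'TP': 1, 'FP': 0, 'TN': 3, 'FN': 1}
print(confusion_counts(scores, true_labels, threshold=0.15))
# {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 0}
```

Lowering the threshold here trades one false negative for one false positive, which is the balance the slide's two error types refer to.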
Process overview of analysis
(Illustration: time consumption along the time line for data, modelling and prediction.)
Insight-Driven Organisation: asking the right questions
From data to action and decision:
• Descriptive (human input): What happened?
• Diagnostic (human input): Why did it happen?
• Predictive (human input): What will happen?
• Prescriptive (decision support): What should be done?
• Cognitive computing (decision automation): decides what should be done, and learns and improves from the outcome through feedback and learning.
Non-insight-driven vs. insight-driven organisation (IDO), across data, information, insight and outlook:
• Past: “What has happened?” vs. “Why and how did it happen?”
• Present: “What is currently happening?” vs. “What is the next best action?”
• Future: “What is going to happen?” vs. “What does simulation tell us: the options, the pros and cons?”
The Data Scientist: Creating the “Purple People”
(Figure: a spectrum from business & communication skills to technical & analytical skills, spanning traditional analytics and advanced analytics.)
WHO ARE THE “PURPLE PEOPLE”?
There’s massive confusion about what a data scientist actually is. For some, a person who can manage spreadsheets and do basic reporting might qualify. For others, the data scientist definition speaks to a rare blend of statistical sophistication, data management skills and business acumen.
SKILLS SHORTAGES
Professionals who can deliver data-backed insights that create business value, not just number crunchers, are especially hard to find.
MACHINES AND ARTIFICIAL INTELLIGENCE
The shortage of data scientists, coupled with the growing amount of data, has led organisations to use artificial intelligence technologies to solve complex problems in new ways and make more effective decisions. Knowledge about artificial intelligence is increasingly becoming a critical asset.
The Data Scientist: Creating the “Purple People”
“PURPLE CAPABILITIES”: storytelling, business acumen, technical skills, data analysis
These skills may all be present in a very highly skilled individual, or be complementary skills within a team; however, it is this blend of skills within a capability which is critical for success. Consider creating a ‘Purple Team’ for every analytics project you embark on.
Advanced Analytics Examples
• Annual accounts
• Store location
• Shockwaves in flows
• Social media
• Learning
© 2016 Deloitte Greenhouse
Control of Annual Accounts
• Annual accounts for smaller companies: 727 companies were controlled manually, and 24 of them had errors in property fields.
• Can we build a model that can predict the same error in new annual accounts?
• Basic data from the fiscal year: no strong indicators. With additional data and data from the previous fiscal year, strong indicators were found.
• First model: prediction on 65%. It analysed 10,000 new financial accounts and identified 173 accounts with errors.
• The 173 identified annual accounts were added to the training set and the model was retrained.
• Updated model: prediction on ~92%.
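The counts on this slide are easier to judge as rates. A small worked calculation using only the slide's figures shows how rare the error class is; the slide does not say which metric the 65% and ~92% refer to, so the "trivial accuracy" line is only an illustration of why plain accuracy would say little here.

```python
# Figures from the slide.
controlled, with_error = 727, 24
new_accounts, flagged = 10_000, 173

error_rate_manual = with_error / controlled  # share of manually controlled accounts with errors
flag_rate_new = flagged / new_accounts       # share of new accounts flagged by the first model

# A trivial rule "no account has an error" would be right on this share of the
# controlled set, so accuracy alone is a weak yardstick on such imbalanced data.
trivial_accuracy = 1 - error_rate_manual

print(f"{error_rate_manual:.1%}, {flag_rate_new:.2%}, {trivial_accuracy:.1%}")
# 3.3%, 1.73%, 96.7%
```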
Store Location
(Figure panels: city-level store location reduction; top city-level store expansion. Legend: optimal no. of shops vs. current no. of shops.)
Problem: why does productivity drop when we accept many clients?
Service workflow in a small biotech company
• Biological samples come in
• Laboratory treatment in several stages: cleaning, preparation
• Large equipment, terabytes of data
• Weeks of computation, report writing and data delivery
Inventory and workflow data in Excel sheets
Biotech: basic analysis
When did different events occur?
(Figure, two panels over weeks 19 to 85. Top: total flow, rate [samples/week]. Bottom: density, no. of samples in the workflow. Annotations: clients in, products/results out, weeks of lower productivity between good periods. During this period the workers complained and were often sick.)
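Both panels can be derived from an inventory log like the company's Excel sheets. A minimal sketch, assuming a hypothetical log with one arrival week and one delivery week per sample (the real sheet layout is not shown): flow is the number of samples delivered per week, and density is the number of samples in the workflow, i.e. cumulative arrivals minus cumulative deliveries.

```python
from collections import Counter

# Hypothetical log: (arrival_week, delivery_week) per sample; None = not yet delivered.
samples = [(1, 3), (1, 4), (2, 4), (2, 5), (3, None)]

def weekly_flow_and_density(samples, weeks):
    """Return {week: (flow, density)}: deliveries in the week and samples in the workflow at week end."""
    arrivals = Counter(a for a, _ in samples)
    deliveries = Counter(d for _, d in samples if d is not None)
    result = {}
    in_workflow = 0
    for week in weeks:
        # Stock bookkeeping: the change in stock equals arrivals minus deliveries.
        in_workflow += arrivals[week] - deliveries[week]
        result[week] = (deliveries[week], in_workflow)
    return result

for week, (flow, density) in weekly_flow_and_density(samples, range(1, 6)).items():
    print(week, flow, density)
# 1 0 2
# 2 0 4
# 3 1 4
# 4 2 2
# 5 1 1
```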
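Something flows, capacity is inflexible
Model: painfully simple, but useful
(Diagram: projects move from reception to delivery; the stretch between positions $x_1$ and $x_2$ is observed.)
Flow: $q = N_{\text{items}} / \text{time}$ [1/s]
Density: $\rho = N_{\text{items}} / \text{length}$ [1/m]
Velocity: $v = l / t$ [m/s]
Flow = velocity × density: $q = v \cdot \rho$, since $\frac{1}{\mathrm{s}} = \frac{\mathrm{m}}{\mathrm{s}} \cdot \frac{1}{\mathrm{m}}$
“The accountant’s equation” is bookkeeping: any change of stock must balance the net flux.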
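The bookkeeping statement can be written out for the stretch of workflow between $x_1$ and $x_2$. This is a sketch assuming the standard one-dimensional continuity equation from traffic-flow theory, which the slide paraphrases but does not spell out; $q$, $\rho$ and $v$ are as defined above.

$$\frac{d}{dt}\int_{x_1}^{x_2} \rho(x,t)\,dx = q(x_1,t) - q(x_2,t)$$

For smooth $\rho$ and $q$ this gives the local form

$$\frac{\partial \rho}{\partial t} + \frac{\partial q}{\partial x} = 0, \qquad q = v\,\rho .$$

In words: the stock between $x_1$ and $x_2$ grows exactly when more items enter at $x_1$ than leave at $x_2$.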
Problem: why does the flow start to stall, compromising delivery dates?
Answer: phantom traffic jams (density shockwave)
Observations, simulations and properties
No bottlenecks!
(Figure: real traffic flow data, Japanese Highway Authority.)
Recommendations for the biotech company
Do not often exceed your critical density.
(Figure: fundamental traffic diagram, flow [samples/week] vs. density [samples in workflow].)
• Optimal delivery rate is 65 samples analysed per week.
• Do not process much more than 650 samples per week
• Breakdown predicted if > 2000 samples are to be processed per week.
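The slides do not name the functional form behind their fundamental diagram. A common textbook choice is the Greenshields model, $q(\rho) = v_f\,\rho\,(1 - \rho/\rho_{\max})$, in which flow peaks at the critical density $\rho_c = \rho_{\max}/2$ and falls beyond it. The sketch below uses that model with hypothetical parameters; `v_free` and `rho_jam` are placeholders, not the company's fitted values.

```python
def greenshields_flow(density, v_free, rho_jam):
    """Flow q = v * rho with the linear speed-density law v = v_free * (1 - rho / rho_jam)."""
    if not 0 <= density <= rho_jam:
        raise ValueError("density must lie between 0 and rho_jam")
    velocity = v_free * (1 - density / rho_jam)
    return velocity * density

def critical_point(v_free, rho_jam):
    """Density at which flow is maximal, and that maximal flow (Greenshields model)."""
    rho_c = rho_jam / 2
    return rho_c, greenshields_flow(rho_c, v_free, rho_jam)

# Hypothetical parameters: workflow length taken as 1, so v_free is the fraction of the
# workflow a sample passes per week; rho_jam is the number of samples at which flow stops.
v_free, rho_jam = 0.5, 800.0
print(critical_point(v_free, rho_jam))          # (400.0, 100.0)
print(greenshields_flow(600.0, v_free, rho_jam))  # 75.0
```

In this toy model, loading the workflow beyond the critical density lowers throughput (75 instead of 100 samples per week), which is the intuition behind "do not often exceed your critical density".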
Does Twitter activity predict box office sales?
Twitter and Hollywood box office revenue
The early rise in social activity gives away the coming storm.
The viral property modelled as an epidemic
(Diagram: groups “liable to buy a ticket”, “moviegoer” and “losing interest”, with the opening night in theatres marked.)
Example: Avengers
• Opening revenue est. $623M
• Model revenue est. $750M
• 1 tweet ≈ $750 at the box office
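The slide models buzz as an epidemic with three groups: people liable to buy a ticket, moviegoers, and people losing interest. A minimal sketch, assuming the classic discrete-time SIR (susceptible, infected, recovered) form with hypothetical rates `beta` and `gamma`; the deck gives neither its equations nor its parameters.

```python
def simulate_sir(population, initially_active, beta, gamma, days):
    """Discrete-time SIR: S = liable to buy a ticket, I = moviegoers, R = losing interest."""
    s = population - initially_active
    i = float(initially_active)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_active = beta * s * i / population  # contacts that turn potential buyers into moviegoers
        new_lost = gamma * i                    # moviegoers who lose interest
        s -= new_active
        i += new_active - new_lost
        r += new_lost
        history.append((s, i, r))
    return history

# Hypothetical parameters, purely illustrative.
history = simulate_sir(population=1_000_000, initially_active=100, beta=0.5, gamma=0.2, days=60)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(peak_day, round(history[peak_day][1]))
```

In this toy model the number of active people grows roughly geometrically long before the peak, which is the sense in which an early rise in activity "gives away the coming storm".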
Machine learning
Data science: supervised vs. unsupervised learning, a look “under the hood”
(Images: Iris setosa, Iris versicolor, Iris virginica.)
Fisher's Iris Data
Sepal length  Sepal width  Petal length  Petal width  Species
5.1           3.5          1.4           0.2          I. setosa
4.9           3.0          1.4           0.2          I. setosa
4.7           3.2          1.3           0.2          I. setosa
4.6           3.1          1.5           0.2          I. setosa
5.0           3.6          1.4           0.2          I. setosa
…             …            …             …            …
A classic data set from 1936 by the statistician Ronald Fisher, called the Iris flower dataset, is often used as a benchmark for learning algorithms.
When learning is supervised, you have expert information on the categorization of your data, and you optimize the algorithm to replicate those labels. Unsupervised learning is pure exploration.
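A minimal sketch of the difference, using the petal measurements of the five I. setosa rows in the table plus a second group of illustrative petal values (hypothetical, not quoted from Fisher's table). The supervised part is a nearest-centroid classifier trained on labelled rows; the unsupervised part is a plain k-means with k = 2 that sees the same rows without labels. Both are stand-in techniques chosen for brevity, not the method used in the talk.

```python
import math

# Petal length and width for the five I. setosa rows in the table.
setosa = [(1.4, 0.2), (1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (1.4, 0.2)]
# Illustrative petal measurements for a second group (hypothetical values).
other = [(4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (6.0, 2.5), (5.1, 1.9)]

def centroid(points):
    """Component-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

# Supervised: the labels are known, so learn one centroid per label ...
centroids = {"setosa": centroid(setosa), "other": centroid(other)}

def classify(point):
    """... and predict the label of the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

def k_means(points, k, iterations=10):
    """Unsupervised: group unlabelled points into k clusters."""
    # Farthest-first initialisation: start with the first point, then repeatedly
    # add the point farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            clusters[nearest].append(p)
        # Move each centre to its cluster mean; keep the old centre if a cluster is empty.
        centres = [centroid(c) if c else centres[j] for j, c in enumerate(clusters)]
    return clusters

print(classify((1.6, 0.3)), classify((5.0, 1.7)))       # setosa other
print([len(c) for c in k_means(setosa + other, k=2)])  # [5, 5]
```

The classifier can only ever answer with the labels it was given; k-means finds the same two groups here, but it cannot name them, which is the exploratory nature of unsupervised learning.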
6 December 2016
Michael Winther & Jacob Bock Axelsen
Deloitte Consulting