Lecture on Data Science in a Data-Driven Culture

Data-Driven Culture DATA-DRIVEN and DATA-SCIENCE Johan Himberg / Reaktor 29.2.2016

Transcript of Lecture on Data Science in a Data-Driven Culture

Page 1: Lecture on Data Science in a Data-Driven Culture

Data-Driven Culture DATA-DRIVEN and DATA-SCIENCE

Johan Himberg / Reaktor 29.2.2016

Page 2: Lecture on Data Science in a Data-Driven Culture

Why “data-driven” WHY

Brynjolfsson et al. (2011) on data-driven decision making:

• Survey data on the business practices and IT investments of 179 large, publicly traded companies.

• Firms that emphasise “data driven decision making” have output and productivity that is 5-6% higher than what would be expected given other investments and IT usage.

• The relationship also appears in asset utilisation, return on equity and market value.

Page 3: Lecture on Data Science in a Data-Driven Culture

Data Science in business WHY

• Business acumen: what for
• Operations Research: optimal decisions and actions
• Probability theory: how to handle uncertainties
• Analytics: insights and machine learning from data
• Computer Science: how to implement all that

Page 4: Lecture on Data Science in a Data-Driven Culture

Data Science & analytics BASICS

Page 5: Lecture on Data Science in a Data-Driven Culture

BASICS

Some dimensions

1. Business case

2. Analytical task

   1. Active - Passive system

   2. Informative - Operative aim

3. Modelling (model selection and fitting)

4. Data: structure, amount, velocity, and source

REAKTOR / JOHAN HIMBERG, FEBRUARY 2016

Page 6: Lecture on Data Science in a Data-Driven Culture

Data Science & analytics BUSINESS CASES

Page 7: Lecture on Data Science in a Data-Driven Culture


Beware of empty “data-speak”

A quote from my colleague Janne Sinkkonen, from a presentation at the Helsinki University machine learning course:

“Data-speak” hides the processes behind data. What creates the data? What is done with the results?

The goal is not “data analysis”

Define your goal and setup without using the word ‘data’.

REAKTOR 2016

Page 8: Lecture on Data Science in a Data-Driven Culture

Information business BUSINESS CASE

• Sell audiences: Google, Facebook, media, …

• Sell information: credit rating, car register, …

Page 9: Lecture on Data Science in a Data-Driven Culture

Operations BUSINESS CASE

• Create beneficial events: marketing (targeting, cross-sell, up-sell, conversion); find right product/service to sell or buy, find a good doctor, expert etc.

• Avoid non-beneficial events: churn, people leaving, waste, credit loss, fraud, …; system failures, …

• Optimize: customer value, work force, schedules, prices, discounts, stocks, relevancy for customer, production quality, speed

• Rationalise: process efficiency, lead times, handle complexity, search time …

• Understand: customer & product base, transactions, or processes
  • internally: ERP, CRM, HR, sales systems, production, …
  • externally: location, routes, weather, demographics, estates, …

Page 10: Lecture on Data Science in a Data-Driven Culture

Strategic BUSINESS CASE

• Efficiency and competition
  • React faster, streamlined decision making, risk awareness
  • Financial efficiency
  • Innovations

• Well-informed strategic decisions
  • Understanding customer groups’ needs for product and service development
  • Understanding and predicting world events, economics, demographics, …
  • React to market fluctuation or changes in financial environment

• Internal and external image and culture
  • Transparency, learning as a part of company culture
  • Customer satisfaction, personalisation, brand

Page 11: Lecture on Data Science in a Data-Driven Culture

Example VIRTUES

Netflix: “The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content.”

- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering

Page 12: Lecture on Data Science in a Data-Driven Culture

Data Science & analytics TASKS & RISKS

Page 13: Lecture on Data Science in a Data-Driven Culture

BASICS

Some dimensions

1. Business case

2. Analytical task

   1. Active - Passive system

   2. Informative - Operative aim

3. Modelling (model selection and fitting)

4. Data: structure, amount, velocity, and source

Page 14: Lecture on Data Science in a Data-Driven Culture

BASICS


Informative - Operative

Informative (for understanding)

Analysis results for understanding things, results for management for making decisions: reports, predictions, what-if analyses, simulations, visualisations,…

Operative

Automated system that makes decisions based on some rules or models, or results that are directly operative, if not automated.


Page 15: Lecture on Data Science in a Data-Driven Culture

BASICS


Active - Passive

Active

You make an “intervention” and gather evidence in tests designed to reveal an effect.

Example: A/B testing.

Passive

Data is just collected, captured “as it happens”: customer transactions, sales, web-browsing, tweets
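As a minimal sketch of how an active test such as the A/B test mentioned above could be evaluated, the snippet below runs a two-proportion z-test on conversion counts. The function name `two_proportion_z_test` and all visitor and conversion numbers are illustrative assumptions, not figures from the lecture.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 5000 visitors per variant.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With passively collected data there is no such designed comparison, which is why the same question (did the change cause the effect?) is much harder to answer from transactions alone.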


Page 16: Lecture on Data Science in a Data-Driven Culture

BASICS

Use cases

Descriptive (informative, passive): What has happened?
• Customer profiles
• Customer segmentation
• Shopping cart analysis

Diagnostic (informative, active): Why did it happen?
• Marketing impact analysis
• Price elasticity analysis
• Web design testing

Predictive (operative, passive): What will happen?
• Up-sell/cross-sell
• New customer acquisition
• Churn prediction
• Life-time value prediction
• Demography prediction

Prescriptive (operative, active): What should I do?
• Marketing impact optimisation
• Recommendation system in a dynamic environment

Page 17: Lecture on Data Science in a Data-Driven Culture

Data Science & analytics RISKS & PROBLEMS

Page 18: Lecture on Data Science in a Data-Driven Culture

RISKS / PROBLEMS

Issues by analytics use case

Descriptive (informative, passive)
• isolated / ad hoc reports
• isolated ad hoc decisions
• feedback loop (report - decision - effect)
• ignoring statistics
• analysts as sql-monkeys
• UI / visualization

Diagnostic (informative, active)
• statistical skills
• testing and organisation
• correlation vs. causality
• requires lots of communication

Predictive (operative, passive)
• what to predict: how to quantify the target
• access to historical data
• quantifying and understanding the risk(s)
• prediction accuracy validation for future

Prescriptive (operative, active)
• what to optimize?
• complex software system
• technical feedback loop
• co-op between “human” and “artificial intelligence”
• monitoring

Page 19: Lecture on Data Science in a Data-Driven Culture

Examples… RISKS / PROBLEMS

• Focusing on wrong things
  • not recognising the analytics use cases
  • “data first”: long time from investment to benefits
  • not starting from the beef: actions and decisions
  • thinking only IT solutions and products
  • careful examination and validation of the algorithms, but not setting targets and risks according to the business target

• Organisation
  • silos: communication through hierarchy
  • no access to data, internal politics
  • technical details decided by business people
  • business criteria set by technical people

Page 20: Lecture on Data Science in a Data-Driven Culture

…more examples RISKS / PROBLEMS

• Underestimating complexity (time & scope)
  • both software and analytics to be built simultaneously
  • the time and effort needed with “data wrangling”
  • the time used for UIs and visualisations
  • the feedback loop

• Unrealistic expectations (quality)
  • on analytical systems in general (they are not that intelligent); rules needed
  • a product, a model, an algorithm, a data scientist solves all the problems
  • risks and targets cannot always be defined properly right away
  • there is no guarantee on accuracy on a particular case before trying

Page 21: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk WISE - DETERMINED - CURIOUS

Page 22: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk VIRTUES

• Wise: Solve the right problems with analytics!
• Determined: aim at specific, concrete things
• Curious: be ready to divert, seek for evidence
• Bayesian: understand uncertainties and risks
• Truthful: don’t bend results upon wishes, it’s data science
• Courageous: act on evidence
• Active and Agile: test, don’t just observe; inspect - adapt - learn
• Transparent and Helpful: co-operate from end-to-end, don’t silo

Page 23: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk WISE - DETERMINED - CURIOUS

Page 24: Lecture on Data Science in a Data-Driven Culture

Aim at the right things VIRTUES

Netflix prize competition (2006-2008)

Who gets the best RMSE (root mean squared error) on true user likings?

BUT

“The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize our algorithms to give the highest scores to titles that a member is most likely to play and enjoy.”

“Netflix Prize objective... is just one of the many components of an effective recommendation system... We also need to take into account factors such as context, title popularity… Supporting all the different contexts in which we want to make recommendations requires a range of algorithms that are tuned to the needs of those contexts.”

- 2012 Xavier Amatriain and Justin Basilico, Personalization Science and Engineering
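To make the contrast between the competition metric and the ranking goal concrete, here is a small sketch: two hypothetical models with the same RMSE that order the same two titles differently. The ratings, the model outputs and the helper names `rmse` and `ranking` are assumptions made up for illustration; they are not Netflix data or Netflix code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def ranking(scores):
    """Item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


actual = [5.0, 4.0]    # hypothetical true ratings for title 0 and title 1
model_a = [4.4, 4.6]   # each prediction is off by 0.6, and the order is flipped
model_b = [5.6, 3.4]   # each prediction is off by 0.6, and the order is kept

for name, predicted in [("A", model_a), ("B", model_b)]:
    print(name, "RMSE:", round(rmse(predicted, actual), 3), "ranking:", ranking(predicted))
```

Both models score equally on RMSE, yet only one of them puts the preferred title first, which is the kind of gap between the metric and the business objective that the quote points at.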

Page 25: Lecture on Data Science in a Data-Driven Culture

Curiosity VIRTUES

Always aim at something specific … but be open-minded and curious

Example: Röntgen and Fleming (Nobel laureates)

• their most famous findings were “accidental”, but

• they were skilled scientists doing disciplined research for some other aim

Explore occasionally “from data to insights”. But not aimlessly.

If you find something interesting, make a disciplined analysis, preferably a test.

Page 26: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk BAYESIAN - TRUTHFUL

Page 27: Lecture on Data Science in a Data-Driven Culture

Understanding probabilities VIRTUES

The main ingredients of data science!

Making decisions based on data analysis requires the concepts of risk and probability.
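One way to see how risk and probability enter a decision is to compare the expected cost of each possible action. The sketch below does this for a hypothetical retention offer; the function names and every number (churn probability, value lost on churn, offer cost, offer effect) are assumptions chosen only for illustration.

```python
def expected_costs(p_churn, churn_loss, offer_cost, offer_effect):
    """Expected cost per customer of sending or not sending a retention offer.

    offer_effect is the assumed relative reduction in churn probability
    when the offer is sent (0.5 means the churn risk is halved).
    """
    no_offer = p_churn * churn_loss
    offer = offer_cost + p_churn * (1.0 - offer_effect) * churn_loss
    return {"no offer": no_offer, "offer": offer}


def best_action(costs):
    """Pick the action with the lowest expected cost."""
    return min(costs, key=costs.get)


# Two hypothetical customers: one with low and one with high churn risk.
for p_churn in (0.05, 0.30):
    costs = expected_costs(p_churn, churn_loss=100.0, offer_cost=5.0, offer_effect=0.5)
    print(p_churn, costs, "->", best_action(costs))
```

The same offer is worth sending to the high-risk customer and not to the low-risk one, so the decision depends on the estimated probability and on how uncertain that estimate is.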

Page 28: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk COURAGE

Page 29: Lecture on Data Science in a Data-Driven Culture

Courage

“Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making.”

- Wikipedia 2016-02-24

“I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.”

- Fictional character Dr. House in Fox television series “House”

Page 30: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk HELPFUL - TRANSPARENT - AGILE

Page 31: Lecture on Data Science in a Data-Driven Culture

Agile - Transparent

Doing data-driven work and data science in any organisation model boils down to

“Involve everyone along the information path”

Agile development - Team decides details

Start from

• concrete actions that can be optimized

• decisions they require, and

• how to measure the effects properly

Remember the feedback loop!

Develop constantly

Lecture @AaltoBIZ, Johan Himberg, 2015

Page 32: Lecture on Data Science in a Data-Driven Culture

Think & plan from deployment to data

Pick an aim!

Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5

Action (optimize, decide, deploy). For example:
• Automatised decisions; recommendation, targeting
• Simulation
• prescriptive, predictive modelling

Information (report, visualize, model). For example:
• documentation on meaning of the data
• KPIs, profiles, segments, factors, DW dashboards
• descriptive, diagnostic, predictive modelling

Data (big, small, open, local, web, meta, …). For example:
• source integrations
• Extract - Load - Transform
• Metadata
• modelling for cleansing & consistency

testing: what is the impact

modelling: what are the actions, what are the insights

wrangling: what data means

Page 33: Lecture on Data Science in a Data-Driven Culture

Data-Driven is inherently iterative and benefits from agility. Data and processes are often not as assumed. Be curious, keep a backlog, inspect, adapt.

Business drivers: aim 1, aim 2 (start from here!), aim 3, aim 4, aim 5

Action. For example:
• Business: we need to optimise customer retention
• Marketing: we could start with a special offer by SMS
• Data Scientist: we’ll set up test & control groups!

Information. For example:
• Solution expert: Field ZPOR means revenue per unit and it is calculated based on …
• Customer transactions are not in the Data Warehouse; they’re aggregated on a monthly level. Let’s get daily data from system Z

Data. For example:
• Now we have transactions for 1M users for 1 yr, fields a, b, c, d, e …
• …

testing: what is the impact

modelling: what are the actions, what are the insights

wrangling: what data means
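The data scientist's "test & control groups" step in the example above could look roughly like the sketch below, which assigns each customer to a group deterministically by hashing the customer id. The function name `assign_group`, the salt string and the id format are hypothetical; a real marketing automation system may handle group assignment in its own way.

```python
import hashlib


def assign_group(customer_id, test_share=0.5, salt="sms-offer"):
    """Deterministically assign a customer to the 'test' or 'control' group.

    Hashing the salted id gives a stable, roughly uniform value in [0, 1),
    so the same customer always lands in the same group for this campaign.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_share else "control"


# Hypothetical customer ids.
customers = [f"cust-{i}" for i in range(10)]
for customer in customers:
    print(customer, assign_group(customer))
```

Only the test group would receive the SMS offer; comparing retention between the two groups afterwards is what turns the campaign into evidence about its impact.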

Page 34: Lecture on Data Science in a Data-Driven Culture

Execute based on model, collect data

THE LOOP: results

Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5

Action. For example:
• deploy campaign, collect responses

Information. For example:
• calibrate & apply model

Data. For example:
• get data for modeling
• store results

testing: what is the impact

modelling: what are the actions, what are the insights

wrangling: what data means

Page 35: Lecture on Data Science in a Data-Driven Culture

Information path focused backlog

Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5

Action. Backlog example:
• test & control group handling in marketing automation
• Involve N.N. in the process

Information. Backlog example:
• define new information source
• Look for a new data source for determining income on zip code areas
• correct documentation
• automatization for the campaign modelling

Data. Backlog example:
• better system configuration & architecture
• automatization for the campaign process…
• new data: record information on all campaigns

testing: what is the impact

modelling: what are the actions, what are the insights

wrangling: what data means

Page 36: Lecture on Data Science in a Data-Driven Culture

Don’t silo

• A change of culture; information (not data) is everybody’s business, just as money is

• One data scientist can’t excel at all of this:

• PO / Technical Account Manager

• Business specialist

• Solution owner / process owner

• Data Steward

• Developer

• Visualization / UX expert

Page 37: Lecture on Data Science in a Data-Driven Culture

Data Scientists’ special role

• Data scientists’ main tasks are in methods, but also in the processes and machinery of
  • making evidence based decisions (automated if possible)
  • finding out confidence on the outcome (by active tests if possible)
  • getting insights based on models and data

• Data scientists often act as “glue”.

Page 38: Lecture on Data Science in a Data-Driven Culture

Culture that helps to handle risk TECHNOLOGY

Page 39: Lecture on Data Science in a Data-Driven Culture

Technology

• Different analytical tasks need different tools. One has to integrate different systems. Remember that you need a feedback loop!

• Prefer systems
  • that give mass-access to historical, transactional data on individual level instead of just aggregates (avoid being “blinded by averages”)
  • from which you’ll get the data, transformations, and results out to another system (avoid being “data hostage”)
  • where you see what the analytics actually does at least on modular level (avoid being “method hostage”). Prefer being able to see the actual implementation (open source).

• Pick a product when you know the task, your needs, the product quality.
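A tiny illustration of being “blinded by averages”: the two hypothetical customer segments below have the same mean monthly spend, but the individual-level data tells very different stories. The segment names and spend figures are made up for this sketch.

```python
from statistics import mean, median

# Hypothetical monthly spend per customer in two segments.
segment_a = [50, 52, 48, 50]   # everyone spends about the same
segment_b = [0, 0, 0, 200]     # one big spender, three inactive customers

for name, spend in [("A", segment_a), ("B", segment_b)]:
    print(name, "mean:", mean(spend), "median:", median(spend), "max:", max(spend))
```

A system that exposes only the aggregate would report the two segments as identical, while access to the individual transactions shows that they call for different actions.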

Page 40: Lecture on Data Science in a Data-Driven Culture

References

• Brynjolfsson, Erik and Hitt, Lorin M. and Kim, Heekyung Hellen, Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486

• Netflix case: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html

• Big Data landscape: http://mattturck.com/2016/02/01/big-data-landscape/#more-917

• Data science skills

• http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html

• http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html

Page 41: Lecture on Data Science in a Data-Driven Culture

www.reaktor.com