Data Science and Culture

Data Science & Culture (Or how to stop worrying and love data driven culture) Ícaro Medeiros Data Science Forum São Paulo, Jun 2017

Transcript of Data Science and Culture

Page 1: Data Science and Culture

Data Science & Culture

(Or how to stop worrying and love data driven culture)

Ícaro Medeiros

Data Science Forum São Paulo, Jun 2017

Page 2: Data Science and Culture

Inspired by (not limited to)

refs

Page 3: Data Science and Culture

Big Data

http://www.kdnuggets.com/2017/02/origins-big-data.html

✦ Fundamental blocks: advances in CS, e.g. distributed systems, databases, massive AI, etc.

✦ Fuzzy concept, ill-defined

✦ Popularized by Gartner (hype-fueled consulting firm)

Page 4: Data Science and Culture

✦ Big Data no longer considered an emerging technology (pervasive in industry)

✦ Entered Trough of Disillusionment in 2013

https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/

Page 5: Data Science and Culture

http://www.mikelnino.com/2016/03/chronology-big-data.html

Chronology of antecedents

Page 6: Data Science and Culture

Data science

✦ Statistics (late 19th century)

✦ Computer Science (1950s)

✦ Machine Learning (1950s)

✦ Data Mining (1990s)

✦ Data Science (2010s)

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

yet another hyped term

Page 7: Data Science and Culture

Beware: controversy

✦ Data science is not all-science

✴ It’s getting more and more engineering-like, a practice

✴ Data storytelling is a creative endeavor

✦ Hyper-inflated expectations, misunderstood concepts and a rush to get value: a dangerous recipe

Page 8: Data Science and Culture

A new hope

machine learning

big data

https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data

or hype

Page 9: Data Science and Culture

Hype: not that bad

✦ Haters gonna hate, i.e. don’t fully hate the hype

✴ More practitioners = faster evolution of tech and processes

✴ Highly skilled professionals and innovation

✦ Academics sometimes look for difficult unwanted problems

✴ Industry is more pragmatic, especially in tech

https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science

Page 10: Data Science and Culture

What we need…

✦ Forget about Big Data pokémons

✴ OH so in Big Data we don’t need people to think about schemas?

✦ Forget about misunderstood business expectations

✴ OH in deep learning we don’t need people to train models?

✦ You need PEOPLE

✴ Collaborating with shared values

✴ Awesome in tech but more importantly: CREATIVE

Page 11: Data Science and Culture

Shared values and practices

Culture

Page 12: Data Science and Culture
Page 13: Data Science and Culture

Good people

✦ People are more important than ideas

✴ A mediocre team will screw up a good idea

✴ Mediocre idea to great team: they will fix it or rethink it

✦ A good lab: different kinds of autonomous thinkers

✴ Why hire smart people if they can't fix what’s broken?

✦ Prefer a heterogeneous and complementary team instead of looking for unicorns

Page 14: Data Science and Culture

The mythical 10x professional

https://twitter.com/icaromedeiros/status/838968884023668737

Page 15: Data Science and Culture

Good communication

✦ Honesty, excellence, originality and self-criticism (values)

✦ Communication structure <> organizational structure

✦ Be ready to hear the truth

✴ Sincerity is only valuable if people are open and willing to give up on ideas that will not work

✦ Braintrust: Leave ego and Jobs outside the door

Page 16: Data Science and Culture

Power to the people!

✦ Product quality is everyone’s responsibility

✴ Don’t ask permission to take responsibility

✦ Passion and excellence versus autonomy

✦ Good things might shadow the bad

✴ People struggle to explore bad things to avoid being called “complainers”

Page 17: Data Science and Culture

Rebels

http://qaspire.com/2017/05/19/sketchnote-what-rebels-want-from-their-boss/

Page 18: Data Science and Culture

Destroy data silos!

✦ Without information about data there is no science

✦ Software and data should be a collective property within the company

✦ Knowledge management matters

✦ Communication between areas must be enforced

Page 19: Data Science and Culture

Data portals

✦ Self-service platforms to publish datasets

✴ Descriptions, schemas, samples, relations between datasets, etc

✦ Open Data initiatives, mostly governments

✦ OSS platforms: CKAN, Airbnb’s Dataportal

✦ Examples: data.gov.uk, dados.gov.br, etc
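
To make the "descriptions, schemas, samples, relations" bullet concrete, here is a minimal sketch of what one catalog entry in a data portal might hold. The class name `DatasetEntry`, its fields and the helper `undocumented_columns` are hypothetical illustrations, not the data model of CKAN or of any other platform mentioned on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry for one published dataset."""
    name: str
    description: str
    schema: dict[str, str]                      # column name -> human-readable type
    sample_rows: list[dict] = field(default_factory=list)
    related_datasets: list[str] = field(default_factory=list)

    def undocumented_columns(self) -> list[str]:
        """Columns that appear in the samples but are missing from the schema."""
        seen = {column for row in self.sample_rows for column in row}
        return sorted(seen - set(self.schema))


# Illustrative data only.
orders = DatasetEntry(
    name="orders",
    description="One row per confirmed customer order.",
    schema={"order_id": "integer", "amount": "decimal"},
    sample_rows=[{"order_id": 1, "amount": 9.9, "channel": "web"}],
    related_datasets=["customers"],
)
print(orders.undocumented_columns())
```

A check like `undocumented_columns` is one cheap way to notice when the published description has drifted away from the data itself.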

Page 20: Data Science and Culture

“When it comes to creative inspiration, job titles and hierarchy are meaningless”

Page 21: Data Science and Culture
Page 22: Data Science and Culture

Data storytelling

✦ Explain what the numbers tell in clear, layman's terms

✦ Make hidden premises clear

✴ Outside data insights

✦ Convince others about actions

✴ Decreases insights-to-value interval

✦ From data to knowledge

https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs

Page 23: Data Science and Culture

What is creativity

✦ Unexpected connections of concepts and ideas

✦ It's a marathon, it needs rhythm

✦ Creativity must start somewhere and there’s power in healthy feedback within an iterative process

Page 24: Data Science and Culture

Visual communication

✦ Clean, straightforward graphs > visually appealing ones

✴ Choose dataviz libs wisely

✦ “Don’t make me think”

✦ The right graph for the right audience

✴ Prefer a language everyone understands

Page 25: Data Science and Culture

Visual communication 101

Page 26: Data Science and Culture

Stats are not enough

https://www.autodeskresearch.com/publications/samestats

Page 27: Data Science and Culture

Stats are not enough

https://www.autodeskresearch.com/publications/samestats
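
The point of these two slides can be checked in a few lines. The sketch below does not use the dataset from the link above; it swaps in the first two sets of Anscombe's quartet, a classic example of the same idea, and compares their summary statistics. The helper names `pearson` and `summary` are illustrative.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet, sets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summary(xs, ys):
    """Mean of x, mean of y and correlation, rounded for comparison."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "corr": round(pearson(xs, ys), 2),
    }


print("set I :", summary(X, Y1))
print("set II:", summary(X, Y2))
```

The two summaries come out almost identical, yet a scatter plot shows set I as a noisy line and set II as a smooth curve. Plotting before summarizing is what reveals the difference.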

Page 28: Data Science and Culture

Strategy

Page 29: Data Science and Culture

Avoid ego-trip data science

✦ “OH my cluster has 10 Petabytes, I’m awesome”

✦ Fancy ML algorithms are not the goal

✦ The most important V in Big Data is value

https://twitter.com/amyhoy/status/847097034536554497

Page 30: Data Science and Culture

KPI versus HiPPO

✦ Tech adoption per se is meaningless

✴ Slide-driven Big Data

✴ KPIs should grow from Big Data and data insights initiatives

✦ Poorly defined goals -> bad decisions

✦ Define viable but ambitious goals

✦ Data beats opinion

Page 31: Data Science and Culture

Set goal, plan and GO!

✦ Business questions can't be like “OH we want to detect things related to millennials”

✦ Clear goals must be set, with actionable metrics

✦ Balance perfect models versus time-to-market

✦ Brad Bird: “Sometimes, as a director, you’re guiding. Sometimes you’re letting the car drive”

https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data

Page 32: Data Science and Culture

The process

✦ The process is not the goal

✴ It has no agenda or taste, it’s just a tool

✦ Quality is the best business plan

✦ Agile is a mindset: not only kanbans or scrum

✦ If the model will become operational, mix scientists and engineers from the start

Page 33: Data Science and Culture

Build vs Buy

✦ If you buy and your core business is not techie, you can be illiterate in tech

✴ Benchmark before buying

✴ Accelerate results and boost internal knowledge

✦ If you build and have a good-enough techie culture, you’re more or less good to go

✴ Assess pros and cons consciously

✦ If you surf the tech hype AND build good systems you’re awesome

Page 34: Data Science and Culture

https://twitter.com/Doug_Laney/status/847452219641356288

When data goes to vendors…

Page 35: Data Science and Culture

http://www.louisdorard.com/machine-learning-canvas/

Page 36: Data Science and Culture

DATA ENGINEERING

Page 37: Data Science and Culture

Big Data vs Great Data

✦ If your logical models do not make sense

✦ If your most frequently run queries are slow

✦ If you have string-only databases

✦ If you have unused expensive data

✦ Maybe your data lake is a swamp

Page 38: Data Science and Culture

“The data is a mess”

✦ First step: accelerate human understanding of data

✴ Metadata, context, hidden assumptions

✦ Datasets might serve multiple purposes

✴ Define rationale and context

✴ Data portals and understandable datasets > Dashboards

https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science

https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770

Page 39: Data Science and Culture

Data lost in translation

✦ Heterogeneous and siloed databases (and people)

✦ Rethink ESB (microservices network)

✦ State-of-the-art: data workflow

✴ Luigi, Airflow (open source), almost every big tech vendor

✴ Transparency, reusability, reproducibility, traceability

✴ Automation and monitoring all the way!

https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
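
As a rough illustration of the data workflow idea, the sketch below declares tasks with explicit dependencies, runs them in dependency order and keeps a log of what ran. It deliberately uses only the Python standard library (`graphlib`, Python 3.9+) and is not the Luigi or Airflow API. The task names, the sample rows and the `run_workflow` helper are all made up for the example.

```python
from graphlib import TopologicalSorter


def extract(_inputs):
    # Stand-in for reading from a source system (hypothetical rows).
    return [{"user": "a", "amount": "10"}, {"user": "b", "amount": "32"}]


def clean(inputs):
    # Cast string amounts to integers so downstream steps get typed data.
    return [{"user": r["user"], "amount": int(r["amount"])} for r in inputs["extract"]]


def aggregate(inputs):
    return sum(r["amount"] for r in inputs["clean"])


# Each task is declared once, next to the names of the tasks it depends on.
TASKS = {
    "extract": (extract, []),
    "clean": (clean, ["extract"]),
    "aggregate": (aggregate, ["clean"]),
}


def run_workflow(tasks):
    """Run tasks in dependency order; return their results and the run order."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results, run_log = {}, []
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func({dep: results[dep] for dep in deps})
        run_log.append(name)  # traceability: what ran, in which order
    return results, run_log


results, run_log = run_workflow(TASKS)
print(run_log, results["aggregate"])
```

Declaring dependencies as data rather than burying them in scripts is what gives the reusability and traceability listed above. Real workflow tools add scheduling, retries and monitoring on top of the same idea.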

Page 40: Data Science and Culture

Beyond relational models

✦ Not all data problems fit well in traditional SQL or DW models

✴ Key-value, columnar, graph-based, inverted index, etc

✦ Models are a framework for problem-solving

✴ Not the ultimate answer

✴ There’s no one-size-fits-all model

Page 41: Data Science and Culture

Do not forget fluency

✦ Check the company lingua franca

✦ Make it easy for critical decision-makers

✴ Ad hoc SQL queries?

✴ Dashboards?

✴ Reports?

Page 42: Data Science and Culture

EXPERIMENTATION

Page 43: Data Science and Culture

Experiments

✦ Missions to discover facts towards understanding

✴ They don’t fail, any result produces new information

✴ If the initial theory was wrong: good

✴ With new facts you can reformulate the question

✦ Get more modeling questions asked more often

✦ Iterative data science

Page 44: Data Science and Culture

Product experimentation (A/B)

✦ Product experimentation should be hypothesis-driven (not feature-driven)

✦ Define the proper exposed population

✴ Not only new users, not only heavy users, not only early adopters

✦ Understanding effect is essential

https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
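
One common way to read the effect of an A/B experiment on a conversion rate is a two-proportion z-test. The sketch below is a generic textbook version, not the method described in the linked article. The function name and the visitor and conversion counts are invented for illustration.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Hypothetical experiment: 1000 users per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The test only says whether the difference is likely to be noise. Whether the exposed population was the right one, and whether the effect matters for the business, still has to be argued from the hypothesis.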

Page 45: Data Science and Culture

5 stages of A/B tests

https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari

Page 46: Data Science and Culture

Some other quick tips

✦ Focus on outcomes (not algorithms or methods)

✦ Design the right metric and evaluation

✦ Good experiments don't produce obvious insights

✦ Mix of data and intuition

https://twitter.com/mrdatascience/status/869957499662860288

Page 47: Data Science and Culture

Being data driven

✦ Be BAYESIAN - uncertainty is everywhere

✦ Be CURIOUS - keep learning

✦ Be AGILE - Fail fast, not too fast: evidence comes first

https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
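
A small example of what "being Bayesian" can mean in practice: instead of reporting a single conversion rate, report a distribution over plausible rates. The sketch assumes a uniform Beta(1, 1) prior and a handful of made-up observations. The function names are illustrative, and the interval is estimated by sampling with the standard library rather than computed exactly.

```python
import random


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial update: Beta(a, b) prior -> Beta(a + successes, b + failures)."""
    return prior_a + successes, prior_b + failures


def posterior_summary(a, b, draws=20000, seed=0):
    """Posterior mean and an approximate 95% credible interval from samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    low = samples[int(0.025 * draws)]
    high = samples[int(0.975 * draws)]
    return a / (a + b), (low, high)


# Hypothetical data: 3 conversions out of 10 visitors.
a, b = beta_posterior(successes=3, failures=7)
post_mean, (low, high) = posterior_summary(a, b)
print(f"mean = {post_mean:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

With so few observations the interval is wide, which is the honest answer: the data is compatible with many different underlying rates.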

Page 48: Data Science and Culture

Being data driven

✦ Be TRUTHFUL - don’t torture data to please opinions

✦ Be HELPFUL - work across silos, support democracy

✦ Be WISE - know when to be analytical or intuitive

https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/

Page 49: Data Science and Culture

Take-away message

With the right people, Democracy, Creativity, Strategy, Big Great Data™ and Experiments there's a good chance to do great SCIENCE

Page 50: Data Science and Culture

Ícaro Medeiros

Data Scientist

icaromedeiros