Data Science & Data Products at Neue Zürcher Zeitung
Data Science & Data Products at NZZ
René Pfitzner, Lead Data Scientist
19th Swiss Big Data User Group Meeting
Zürich, 23rd January 2017
Outline
I. Introduction: NZZ, media challenges, trends
II. Data Science @NZZ: Goals, principles, approaches
III. Data Products @NZZ: Our “stack”, insights & demo
IV. NZZ Companion: Individual news fueled by data science
I. Introduction: Myself, NZZ, media challenges, trends
Self-Intro
● Lead Data Scientist at NZZ
● media innovation
● algorithmic approaches for news media
● background in StatPhys (statistical physics)
● Python, Scala, Spark, R
www.renepfitzner.net
@RenePfitznerZH
Newspaper Revenue: the reality
[Chart: US newspaper advertising revenue, corrected for inflation]
Data: Newspaper Association of America
Graphics: https://commons.wikimedia.org/wiki/File:Naa_newspaper_ad_revenue.svg
Newspaper Decline
[Chart: Number of newspapers in the United States, annotated "-25%"]
Data: www.census.gov
Graphics: https://commons.wikimedia.org/wiki/File:Number_of_newspaper_firms.png
Is this something to worry about?
Well, it should be!
Media = Fourth Estate!
Wikipedia: Decline of newspapers
https://en.wikipedia.org/wiki/Decline_of_newspapers
II. Data Science @NZZ: Goals, principles, approaches
Data Science: Goals
[Diagram: Data Science at NZZ, with three goals: Decision Making, Data Products, Marketing Optimization]
Data Science: Data Products
Attempt at a definition:
A data product is a digital product that provides some benefit to a downstream consuming application, incorporating data and data-based methods (e.g. ML).
Data Science: Data Products
What good is Data Science if you cannot put it into production?
Data Science: Data Products
Provision & Integration?
https://blog.treasuredata.com/blog/2016/03/15/self-study-list-for-data-engineers-and-aspiring-data-architects/
Data Science: Data Products
[Diagram: Provision & Integration? / Data Product]
III. Data Products @NZZ: Our “stack”, insights & demo
Data Products: Our stack
REST APIs
Data Products: What is Spark?
● “General engine for fast big data processing”
● it’s more: a parallel computing framework
● “Hadoop on steroids” → in-memory!
Data Products: How and where?
REST APIs
- on-premise / hosted; gcloud (Dataproc)
- gcloud; partly dockerized; Kubernetes
- gcloud & hosted; dockerized; Kubernetes; microservice approach
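The third option above pairs REST APIs with dockerized services on Kubernetes in a microservice approach. As a minimal sketch of what one such service could look like, assuming precomputed recommendations held in memory, the following uses only Python's standard library (`http.server`). The endpoint `/recommendations`, the `article_id` parameter and the table `RECOMMENDATIONS` are hypothetical illustrations, not NZZ's actual API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup table; in practice a batch job (e.g. on Spark) would refresh it.
RECOMMENDATIONS = {
    "a1": ["a2", "a3"],
    "a2": ["a1"],
}


def route(path):
    """Map a GET request path to an (HTTP status, JSON-serializable payload) pair."""
    url = urlparse(path)
    if url.path != "/recommendations":
        return 404, {"error": "not found"}
    article_id = parse_qs(url.query).get("article_id", [None])[0]
    if article_id is None:
        return 400, {"error": "missing article_id"}
    return 200, {
        "article_id": article_id,
        "recommendations": RECOMMENDATIONS.get(article_id, []),
    }


class RecommendationHandler(BaseHTTPRequestHandler):
    """Thin HTTP layer: all logic lives in route(), which is easy to unit-test."""

    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="0.0.0.0", port=8080):
    """Create (but do not start) the HTTP server; call serve_forever() to run it."""
    return HTTPServer((host, port), RecommendationHandler)
```

Calling `make_server().serve_forever()` inside a container would expose the service on port 8080; keeping the logic in `route()` lets it be tested without opening a socket. For real production traffic, a dedicated WSGI/ASGI server would be the more usual choice.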
Data Products: Article Recommendations
- recommendations based on current article
- mixed with advertisement
- article click rate x3
- ad conversion rate x3
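The slide only states that article recommendations are mixed with advertisement. A minimal sketch, assuming a fixed-slot scheme in which every third widget slot shows an ad while ads are available; the function `mix_with_ads`, its parameters and the slot rule are hypothetical, not NZZ's actual logic:

```python
from typing import Dict, List


def mix_with_ads(recommendations: List[str], ads: List[str],
                 ad_every: int = 3, size: int = 6) -> List[Dict[str, str]]:
    """Fill up to `size` widget slots in order.

    Every `ad_every`-th slot (1-based) shows the next ad while ads remain;
    all other slots, and ad slots once ads run out, show the next article.
    Stops early when no article is left for a slot.
    """
    if ad_every < 1:
        raise ValueError("ad_every must be >= 1")
    articles = iter(recommendations)
    sponsored = iter(ads)
    slots: List[Dict[str, str]] = []
    for position in range(1, size + 1):
        if position % ad_every == 0:
            ad = next(sponsored, None)
            if ad is not None:
                slots.append({"type": "ad", "id": ad})
                continue
        article = next(articles, None)
        if article is None:
            break
        slots.append({"type": "article", "id": article})
    return slots
```

For example, four recommended articles and two ads yield the slot order `a1, a2, ad1, a3, a4, ad2`.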
Data Products: Article Recommendations
Network-based
- weighted co-reading net
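One way to read "weighted co-reading net": articles are nodes, and two articles are linked with a weight equal to the number of readers who read both. The sketch below assumes that definition (the talk does not specify the weighting) and recommends the current article's heaviest neighbors. `build_coreading_net`, `recommend` and the input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def build_coreading_net(histories: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, int]]:
    """Return article -> {co-read article: number of readers who read both}."""
    net: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for articles in histories.values():
        # Each unordered pair of distinct articles read by the same reader adds 1.
        for a, b in combinations(sorted(set(articles)), 2):
            net[a][b] += 1
            net[b][a] += 1
    return net


def recommend(net: Dict[str, Dict[str, int]], current: str, k: int = 3) -> List[str]:
    """Top-k neighbors of `current` by edge weight (ties broken alphabetically)."""
    neighbors = net.get(current, {})
    ranked = sorted(neighbors.items(), key=lambda item: (-item[1], item[0]))
    return [article for article, _ in ranked[:k]]
```

Raw counts favor very popular articles; normalizing the weight (for example by the Jaccard similarity of the two articles' reader sets) is a common alternative.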
Trending articles
- clicks, click trend
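The slide names clicks and click trend as the signals for trending articles. A minimal sketch, assuming clicks are counted in two consecutive time windows and that volume and trend are combined multiplicatively; the scoring formula, the `min_clicks` threshold and the function name `trending` are illustrative assumptions, not the formula used at NZZ:

```python
from typing import Dict, List, Tuple


def trending(clicks: Dict[str, Tuple[int, int]], k: int = 3,
             min_clicks: int = 10) -> List[str]:
    """Rank articles by recent click volume times click trend.

    `clicks` maps article id -> (clicks in previous window, clicks in current window).
    Trend is current / (previous + 1); the +1 avoids division by zero for new articles.
    Articles with fewer than `min_clicks` current clicks are ignored as noise.
    """
    scores = {}
    for article, (previous, current) in clicks.items():
        if current < min_clicks:
            continue
        trend = current / (previous + 1)
        scores[article] = current * trend
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return ranked[:k]
```

Multiplying by volume keeps a tiny article that jumps from 1 to 10 clicks from outranking a major story that is also growing.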
Topic detection
- word2vec
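The slide lists word2vec for topic detection. A minimal sketch of one simple way to use such vectors, assuming they were already trained (e.g. with gensim's `Word2Vec`): average the word vectors of an article and assign the topic whose seed-word centroid is most cosine-similar. The seed words, the 2-d toy vectors and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical 2-d toy vectors; real word2vec vectors have roughly 100-300 dimensions.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "bank": [0.85, 0.15], "interest": [0.8, 0.2], "franc": [0.9, 0.1],
    "football": [0.15, 0.85], "match": [0.2, 0.8], "goal": [0.1, 0.9],
}


def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [emb[t] for t in tokens if t in emb]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_topic(tokens: Sequence[str], emb: Dict[str, Vector],
                 topic_seeds: Dict[str, List[str]]) -> Optional[str]:
    """Assign the topic whose seed-word centroid is closest to the article vector."""
    doc = mean_vector(tokens, emb)
    if doc is None:
        return None
    best_topic, best_sim = None, -math.inf
    for topic, seeds in topic_seeds.items():
        centroid = mean_vector(seeds, emb)
        if centroid is None:
            continue
        sim = cosine(doc, centroid)
        if sim > best_sim:
            best_topic, best_sim = topic, sim
    return best_topic
```

Unknown tokens (such as stop words missing from the vocabulary) are simply skipped when averaging.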
Data Products: Learnings
● Spark is great as a general-purpose engine
● … is easily maintained
● … lets you go fast from dev to prod
● Scala forces you to think more & structure better
● cons: development notebooks
● More technical details? Talk to me later...
IV. NZZ Companion: Individual news fueled by data science
NZZ News Companion: Facts
● changing news consumption behavior
● the vast majority of article clicks originate from the startpage
● highly volatile startpage
→ data-enhanced content delivery
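As an illustration of data-enhanced content delivery, here is a minimal sketch of re-ranking the startpage for one reader by blending the editorial position with the reader's topic affinity. The linear blend, the `weight` parameter, the affinity scores and the function name `personalize` are assumptions for illustration; the talk does not describe how the NZZ Companion ranks articles.

```python
from typing import Dict, List, Tuple


def personalize(startpage: List[Tuple[str, str]], affinity: Dict[str, float],
                weight: float = 0.5) -> List[str]:
    """Re-rank startpage articles for one reader.

    startpage: (article_id, topic) pairs in editorial order, top first.
    affinity:  reader's interest per topic in [0, 1] (e.g. from reading history).
    weight:    0.0 keeps the editorial order, 1.0 ranks purely by affinity.
    """
    n = len(startpage)
    scored = []
    for position, (article_id, topic) in enumerate(startpage):
        editorial = 1.0 - position / n  # top slot scores 1.0, later slots less
        score = (1.0 - weight) * editorial + weight * affinity.get(topic, 0.0)
        scored.append((score, position, article_id))
    # Highest score first; ties keep the editorial order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [article_id for _, _, article_id in scored]
```

With `weight=0.0` the editorial order is preserved, which gives a safe fallback for readers without any history.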
NZZ News Companion: Prototype
NZZ News Companion: DNI (Digital News Initiative)
https://www.digitalnewsinitiative.com/
Be a news-innovation beta tester!
@RenePfitznerZH
www.renepfitzner.net