Data Journalism Big and Small - DATA SCIENCE NIGHTS...Data Journalism: Big and Small ... working on...
NORTHWESTERN UNIVERSITY
CHICAGO • SAN FRANCISCO
Data Journalism: Big and Small
Joe Germuska
Chief Nerd, Knight Lab
@JoeGermuska
🔗 Notes: http://bit.ly/dj-big-and-small
What is Knight Lab?
a community of designers, developers, students, and educators working on experiments designed to push journalism into new spaces.
The Lab provides an open, collaborative environment for interdisciplinary exploration and conversation, where students and professionals learn together and from one another.
In short, we’re energized by hard questions worth answering; we believe in the process as much as the product.
tools studio community
Easy-to-use
Free
Internationally popular
No coding required
Open source if you do
TOOLS
Interdisciplinary
Small teams
One project
No lectures
By application
CS or JOUR credit
STUDIO
VR / AR
Environmental sensors
Data visualization
Conversational UI
App prototypes
Audience Research
COMMUNITY
🗓 Open Lab Tuesdays 7-9
🗓 Lab Lunch Thursdays 12-1
Regular Events
📰 facebook.com/knightlab
📰 bit.ly/knightlab-community
Announcements
📍 Fisk Hall 109-111
Device Lab
Drop-in sometimes works
Or make an appointment [email protected]
Data Journalism: the Early Years
Prehistory
kudos to Scott Klein 📈 https://tinyletter.com/abovechart/
The Bombay Times, 1842 ☝ Earliest line chart in a newspaper?
The Chicago Tribune, 1901 👉
New York Tribune, 1848 ☝ Audit of Congressional mileage reimbursements
Detroit 1967
“nobody knew who the rioters were and why they had rioted”
Atlanta 1988
“The Color of Money”, Bill Dedman, Atlanta Journal-Constitution
First Pulitzer Prize for a data-driven story
Miami 1992
“What went wrong” Miami Herald (analysis by Steve Doig)
Public Service Pulitzer
Data Sources
Storm damage inspections
Property tax roll
Building master file
Building/zoning db
Campaign contribs (hand-entered from paper records)
“more than 45 reels of magnetic tape”
“Computer-Assisted Reporting”
NICAR: National Institute for Computer-Assisted Reporting
Has evolved a lot!
NICAR 2018 here in Chicago, March 8-11
You should go!
https://ire.org/conferences/nicar18/
21st Century: It’s all “computer assisted”
collection
analysis
publication
Atlanta Journal-Constitution “Doctors & Sex Abuse” (2017)
Grew out of other reporting on prison medical care
“Is Georgia unusual nationally?”
Created about 50 web scrapers to gather 100,000 disciplinary documents
Machine learning flagged more than 6,000 documents for reporters to read and research in full
“Betting worth billions. Elite players. Violent threats. Covert messages with Sicilian gamblers. And suspicious matches at Wimbledon. Leaked files expose match-fixing evidence that tennis authorities have kept secret for years.”
BuzzFeed / BBC “The Tennis Racket” (2016)
John Templon (Medill MSJ ’09) analyzed 26,000 matches
Inspired by paper in Journal of Prediction Markets
“Monte Carlo Method” 🎲 1M simulations per player
Check out the GitHub repo!
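The Monte Carlo idea on this slide can be sketched in a few lines. This is a minimal illustration, not BuzzFeed's actual code (their GitHub repo has that). It assumes each match comes with a known pre-match win probability for the player, and it estimates how often chance alone would produce at least as many losses as were observed. The function name, its parameters, and the example probabilities are all hypothetical.

```python
import random


def chance_of_losses(win_probs, observed_losses, n_sims=100_000, seed=0):
    """Estimate P(at least `observed_losses` losses) by simulation.

    win_probs: assumed pre-match win probability for the player in each match
    (hypothetical inputs; a real analysis would derive them from betting odds).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    hits = 0
    for _ in range(n_sims):
        # Play every match once: the player loses when the draw is >= p.
        losses = sum(1 for p in win_probs if rng.random() >= p)
        if losses >= observed_losses:
            hits += 1
    return hits / n_sims


# Hypothetical player: favored (70%) in ten matches, lost eight of them.
estimate = chance_of_losses([0.7] * 10, observed_losses=8)
print(f"Chance of 8+ losses by luck alone: {estimate:.4f}")
```

A very small estimate does not prove anything on its own; it only says the pattern would be rare if the assumed probabilities were right, which is a reason to report further.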
“BuzzFeed News Trained A Computer To Search For Hidden Spy Planes. This Is What We Found.” (2017)
✈ Extracted features from 20,000 flights
✈ “random forest” algorithm
✈ Follow-ups found:
✈ State & local cops
✈ Customs & Border Patrol
✈ Air Force Special Ops
✈ DEA
✈ Contractors
New York Times “512 Paths to the White House” (2012)
☑ Addressed key questions in reader minds
☑ Played out scenarios
☑ Provided push-button interaction for less engaged readers
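The “512” in the title is the number of ways nine battleground states can split between two candidates (2^9 = 512). The sketch below enumerates those scenarios. It is an illustration, not the Times' code: the state list, electoral vote counts, and starting totals are assumptions recalled from the 2012 race and should be verified before reuse, and `count_paths` is a hypothetical helper name.

```python
from itertools import product

# Assumed 2012 electoral votes for nine battleground states
# (from memory; verify against an official source before reuse).
SWING_STATES = {
    "FL": 29, "OH": 18, "NC": 15, "VA": 13, "WI": 10,
    "CO": 9, "IA": 6, "NV": 6, "NH": 4,
}
# Assumed electoral votes each side starts with from all other states.
BASE_VOTES = {"Obama": 237, "Romney": 191}
VOTES_TO_WIN = 270


def count_paths(swing=SWING_STATES, base=BASE_VOTES):
    """Count how many swing-state scenarios each candidate wins."""
    a, b = list(base)
    counts = {a: 0, b: 0, "tie": 0}
    # Every scenario assigns each swing state to one of the two candidates.
    for outcome in product((a, b), repeat=len(swing)):
        totals = dict(base)
        for winner, votes in zip(outcome, swing.values()):
            totals[winner] += votes
        if totals[a] >= VOTES_TO_WIN:
            counts[a] += 1
        elif totals[b] >= VOTES_TO_WIN:
            counts[b] += 1
        else:
            # With 538 votes in play, neither side at 270 means 269-269.
            counts["tie"] += 1
    return counts


paths = count_paths()
print(sum(paths.values()), "scenarios:", paths)
```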
New York Times “You Draw It: How Family Income Predicts Children’s College Chances” (2015)
✏ Engages readers
✏ Collects data for NYT
✏ “You draw it” now a recurring form
Clips File
Data science beyond reporting
How Promotion Affects Pageviews on the New York Times Website
Brian Abelson, Source
“Information Disorder”
“Information Pollution”
🌀 Misinformation
🌀 Disinformation
🌀 Propaganda
Please don’t say “fake news”
Study: Breitbart-led right-wing media ecosystem altered broader media agenda
Benkler, Faris, Roberts, and Zuckerman
Data science applications in studying journalism as well as making it
Tools from Journalism
🔧 dedupe.io - match data records even when strings aren’t exact matches
🔧 TabulaPDF - get data out of PDFs
🔧 csvkit - data “Swiss army knife”
🔧 twick - Quick Twitter archiver
🔧 klaxon - Watch websites for changes
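To make “match data records even when strings aren’t exact matches” concrete, here is a small sketch using only Python's standard library. It is not how dedupe.io works internally; it swaps in a simple similarity ratio from `difflib` to show the general idea. The helper names, the 0.85 threshold, and the sample records are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def probable_match(a, b, threshold=0.85):
    """Return True when two strings are similar enough to review as duplicates."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold


# Hypothetical records with a typo and a punctuation/case difference.
records = ["Jon Smith", "John Smith", "Chicago Tribune Co.", "chicago tribune co"]
pairs = [
    (a, b)
    for i, a in enumerate(records)
    for b in records[i + 1:]
    if probable_match(a, b)
]
print(pairs)
```

Comparing every pair like this gets slow on large files, and a fixed threshold will produce false matches, so flagged pairs are candidates for a human to check rather than confirmed duplicates.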
Hall of Fame
🔧 Django web framework
🔧 Underscore / Backbone / CoffeeScript
🔧 D3
News Databases
⚡ break free from story format
⚡ generate leads for newsroom
⚡ can be “expensive”
⚡ can be good sources for your data work
⚡ ProPublica Represent
⚡ ProPublica Nonprofit explorer
⚡ Census Reporter
⚡ crime.chicagotribune.com (RIP)
⚡ schools.chicagotribune.com (RIP)
Examples
ProPublica Represent
Conversation
@knightlab
@joegermuska
✉ [email protected]
🔗 knightlab.northwestern.edu
🔗 bit.ly/dj-big-and-small