Data journalism
Uploaded by: paul-bradshaw
![Page 1: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/1.jpg)
![Page 2: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/2.jpg)
![Page 3: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/3.jpg)
Philip Meyer, Detroit, 1967
Knight Newspapers reporter and Nieman Fellow interested in social research methods. Teamed up with an academic to test the stories being told about the riots (poor immigrants being ‘deviant’). Field research, analysis and publication took one month. The stories were debunked: no correlation with income or origin. Line about information abundance and the need for ‘truth about the facts’.
![Page 4: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/4.jpg)
Online Journalism
City University
Paul Bradshaw
Data journalism: “The truth about the facts”
![Page 5: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/5.jpg)
1. How is 2012 different to 1967?
2. Getting data
3. Getting stories
Themes
![Page 6: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/6.jpg)
Holly Watt, 2009
![Page 7: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/7.jpg)
The Guardian and Wikileaks
![Page 8: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/8.jpg)
![Page 9: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/9.jpg)
![Page 10: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/10.jpg)
![Page 11: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/11.jpg)
![Page 12: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/12.jpg)
“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”
Adrian Holovaty
![Page 13: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/13.jpg)
![Page 14: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/14.jpg)
![Page 15: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/15.jpg)
• Times Data Blog
![Page 16: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/16.jpg)
![Page 17: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/17.jpg)
”QUOTE”
Now is a good time.
![Page 18: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/18.jpg)
“The Tribune’s more than three dozen interactive databases, collectively have drawn three times as many page views as the site’s stories. [75% of traffic]”
http://bit.ly/dj2dmz
![Page 19: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/19.jpg)
Everything is zeroes and ones
![Page 20: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/20.jpg)
Numbers
Text
Live data
Behavioural data
Images, audio, video
If it’s digitised, it’s a subject for data journalism
![Page 21: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/21.jpg)
![Page 22: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/22.jpg)
(comparison, themes)
![Page 23: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/23.jpg)
Times film genres
![Page 24: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/24.jpg)
The process.
![Page 25: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/25.jpg)
![Page 26: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/26.jpg)
Start with the data and look for the stories? (MPs’ expenses)
Or start with a lead and look for the data?
Passive vs active data journalism
![Page 27: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/27.jpg)
Official sources: ONS, data.gov.uk, etc.
Secondary FOI: disclosure logs, WDTK, Hansard
Reports and research: Google alerts
Unofficial sources: Scraperwiki, OpenlyLocal, OpenCorporates, OpenCharities, etc.
Compile: Reactive
![Page 28: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/28.jpg)
Communities, mailing lists, groups
Advanced search: site:gov.uk (etc), filetype:pdf (etc)
Tip: database contents are invisible to search engines
Scrapers: use tools, write your own, or ask
Compile: Proactive
![Page 29: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/29.jpg)
![Page 30: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/30.jpg)
“disclosure log” site:gov.uk
“hate crime” filetype:xls site:police.uk
“confidential” filetype:pdf site:gov.uk
Walkthrough: advanced search
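The same searches can be composed in code when many need to be run. A minimal Python sketch, assuming a standard Google search URL; `build_query` and `search_url` are hypothetical helper names, not part of any tool named in these slides.

```python
from urllib.parse import quote_plus


def build_query(phrase, site=None, filetype=None):
    """Combine a quoted phrase with optional filetype: and site: operators."""
    parts = [f'"{phrase}"']
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search?q="):
    """URL-encode the query so it can be opened in a browser (base URL is an assumption)."""
    return base + quote_plus(query)


queries = [
    build_query("disclosure log", site="gov.uk"),
    build_query("hate crime", filetype="xls", site="police.uk"),
    build_query("confidential", filetype="pdf", site="gov.uk"),
]

for q in queries:
    print(q, "->", search_url(q))
```

Changing the `site` or `filetype` argument is enough to repeat a search across other domains or formats.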
![Page 31: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/31.jpg)
RSS, XML, JSON, RDF - and APIs
Scraperwiki
Outwit Hub
Google Refine
Yahoo! Pipes
Google Docs formulae
Feeds and scrapers
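Feeds are easy for a computer to read because every item repeats the same structure. A rough Python sketch using only the standard library; the RSS and JSON strings are made-up examples, not real feeds.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical RSS feed: each <item> has the same child elements.
rss = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First item</title><link>http://example.com/1</link></item>
<item><title>Second item</title><link>http://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(rss)
items = [
    {"title": item.findtext("title"), "link": item.findtext("link")}
    for item in root.iter("item")
]

# Hypothetical JSON response of the kind an API might return.
payload = json.loads(
    '{"results": [{"title": "First item", "link": "http://example.com/1"}]}'
)

for entry in items:
    print(entry["title"], entry["link"])
print(payload["results"][0]["title"])
```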
![Page 32: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/32.jpg)
Format? Table? Pattern? URL?
'Structured' data
![Page 33: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/33.jpg)
http://www.eib.org/projects/pipeline/?start=2009&end=2010&status=&region=&country=united+kingdom&sector=
http://www.ltscotland.org.uk/scottishschoolsonline/schools/5thyear.asp?iSchoolID=5237521
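A URL like the first one is structured data in its own right: each name=value pair is a filter that can be changed. A sketch in Python, taking the query string to hold the parameters start, end, status, region, country and sector; `with_params` is a hypothetical helper.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = (
    "http://www.eib.org/projects/pipeline/"
    "?start=2009&end=2010&status=&region=&country=united+kingdom&sector="
)

# Split the query string into a dictionary, keeping the empty filters.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(params)


def with_params(url, **changes):
    """Return the same URL with some query parameters replaced."""
    parts = urlsplit(url)
    current = {
        key: values[0]
        for key, values in parse_qs(parts.query, keep_blank_values=True).items()
    }
    current.update(changes)
    return parts._replace(query=urlencode(current)).geturl()


# Generate the URL for the following year without touching the other filters.
next_year = with_params(url, start="2010", end="2011")
print(next_year)
```

Looping over years or countries this way produces the list of pages a scraper would need to visit.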
![Page 34: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/34.jpg)
'Structured' HTML? (Use Firebug)
<p> <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>The complainant requested a copy of the authorities approved business plan [...]<br /><strong>Section of Act/EIR & Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /><a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
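Because every entry repeats the same labels (Case Ref, Date, Public Authority), a pattern can pull them out. A minimal Python sketch using regular expressions on a shortened copy of the snippet above; the field names and patterns are my own assumptions, and a real page may need more forgiving patterns.

```python
import re

# Shortened copy of the snippet; the elided summary text is left as [...].
snippet = (
    '<p> <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />'
    'Public Authority: London Borough of Southwark <br />Summary: </strong>'
    '[...]<br /><a title="Opens in new window" '
    'href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" '
    'target="_blank">View PDF of Decision Notice FS50295557</a></p>'
)

# One pattern per field; each captures the text after its label.
PATTERNS = {
    "case_ref": r"Case Ref:\s*(\S+)",
    "date": r"Date:\s*(\d{2}/\d{2}/\d{4})",
    "authority": r"Public Authority:\s*(.*?)\s*<br",
    "pdf": r'href="([^"]+\.ashx)"',
}


def extract(html):
    """Return a dict of the labelled fields found in one entry."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, html)
        record[field] = match.group(1) if match else None
    return record


record = extract(snippet)
print(record)
```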
![Page 35: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/35.jpg)
=ImportHTML("http://bob.com/mytable", "table", 1)
=ImportXML("http://backtweets.com/search.xml?itemsperpage=100&...")
=ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2)
Spreadsheet formulae
![Page 36: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/36.jpg)
1. Open a spreadsheet
2. In cell A1 type a URL of a page with a table, e.g. http://www.horsedeathwatch.com
3. In cell A2 type: =ImportHTML(A1, "table", 1)
Instructions at http://excelnotes.posterous.com/tag/importhtml
Walkthrough: =IMPORT (Google Docs)
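For comparison, roughly what a formula like =ImportHTML(url, "table", 1) does can be sketched in Python with the standard library. This is an illustration only, not how Google Docs implements it: it works on an HTML string rather than fetching a URL, ignores nested tables, and the sample table is invented.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <table> in a page, row by row."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the rows of the index-th table (counting from 1, as the formula does)."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables[index - 1]


# Invented sample page with a single table.
sample = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Example One</td><td>10</td></tr>
</table>
"""

rows = import_html_table(sample)
print(rows)
```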
![Page 37: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/37.jpg)
"A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."
Ethics
![Page 38: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/38.jpg)
If you have to do a job more than once...
Let the computer do the work
![Page 39: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/39.jpg)
Start with a question
What is the average? Who is top? Bottom?
Time: what has happened since last year? 10 years ago?
Space: Trends in fields/regions?
What is the context?
![Page 40: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/40.jpg)
![Page 41: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/41.jpg)
![Page 42: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/42.jpg)
Total expenditure =SUM(D:D)
Biggest single spend =MAX(D:D)
Average invoice value =MEDIAN(D:D)
Spend per day =SUM(D:D)/30
Number of invoices =COUNT(D2:D200)
Number of invoices over £5000 =COUNTIF(D2:D200,">5000")
Interview the data
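The same questions can be asked outside a spreadsheet. A short Python sketch, with an invented list of invoice values standing in for column D and the same 30-day period assumed for spend per day.

```python
from statistics import median

# Invented invoice values standing in for column D.
invoices = [120.0, 7500.0, 560.0, 980.0, 5200.0]

total = sum(invoices)                              # =SUM(D:D)
biggest = max(invoices)                            # =MAX(D:D)
typical = median(invoices)                         # =MEDIAN(D:D)
per_day = total / 30                               # =SUM(D:D)/30, assuming a 30-day period
count = len(invoices)                              # =COUNT(D2:D200)
over_5000 = sum(1 for v in invoices if v > 5000)   # =COUNTIF(D2:D200,">5000")

print(total, biggest, typical, round(per_day, 2), count, over_5000)
```

The median is used for the average here, as in the formulae, because one very large invoice would pull a mean upwards.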
![Page 43: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/43.jpg)
= indicates this is a formula
SUM is the function to be applied
( opens the list of ingredients for that function
D2:D300 this is a range of cells*
) ends the list of ingredients
*You might instead use a single cell, a value, or a ‘nested’ formula
Basic calculations
![Page 44: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/44.jpg)
![Page 45: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/45.jpg)
Walkthrough: using formulae
Use =COUNTIF to get a total number (e.g. loans over £1m)
Use =SUMIF to find the total value of those loans
Use =IF to create a new column that divides loans into 2 types
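As a rough equivalent of those three formulae, a Python sketch over an invented list of loan values; the £1m threshold follows the example above and the labels are my own.

```python
# Invented loan values in pounds.
loans = [250_000, 1_500_000, 3_200_000, 900_000]
THRESHOLD = 1_000_000

# =COUNTIF: how many loans are over the threshold?
big_count = sum(1 for value in loans if value > THRESHOLD)

# =SUMIF: what are those loans worth in total?
big_total = sum(value for value in loans if value > THRESHOLD)

# =IF: a new column that sorts each loan into one of two types.
labels = ["over £1m" if value > THRESHOLD else "£1m or under" for value in loans]

print(big_count, big_total, labels)
```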
![Page 46: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/46.jpg)
Data health warning!
Remember the context: spending over £500
![Page 47: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/47.jpg)
![Page 48: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/48.jpg)
Insert > Pivot table > Layout...
Put focus category in left column
In middle: count or sum or average
Across top: sub-categories
Sort, then re-edit to add count or sum, sub-categories
Data journalism on a deadline: Pivot tables
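What a pivot table does can be sketched in a few lines of Python: group rows by a focus category and a sub-category, then count or sum each group. The spending rows and the `pivot` helper below are invented for illustration.

```python
from collections import defaultdict

# Invented spending rows.
rows = [
    {"department": "Housing", "supplier": "Acme", "amount": 600.0},
    {"department": "Housing", "supplier": "Acme", "amount": 900.0},
    {"department": "Housing", "supplier": "Bolt", "amount": 1200.0},
    {"department": "Transport", "supplier": "Acme", "amount": 700.0},
]


def pivot(rows, row_key, col_key, value_key, agg=sum):
    """Group values by (row category, column category) and aggregate each group."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[row_key], row[col_key])].append(row[value_key])
    table = defaultdict(dict)
    for (row_label, col_label), values in cells.items():
        table[row_label][col_label] = agg(values)
    return {label: dict(columns) for label, columns in table.items()}


totals = pivot(rows, "department", "supplier", "amount", agg=sum)
counts = pivot(rows, "department", "supplier", "amount", agg=len)

print(totals)
print(counts)
```

Swapping `agg` between `sum` and `len` mirrors re-editing the pivot table to switch between sum and count.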
![Page 49: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/49.jpg)
Questions?
![Page 50: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/50.jpg)
Links
OnlineJournalismClasses.tumblr.com
Delicious.com/paulb/cityoj08
Delicious.com/paulb/DJ
Delicious.com/paulb/vis
Delicious.com/paulb/data
![Page 51: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/51.jpg)
- Use advanced search to find data
- Use tools to scrape data
- Visualise a politician's speeches using Wordle or Many Eyes
- Google form to crowdsource beer cost data?
Lab
![Page 52: Data journalism](https://reader035.fdocuments.us/reader035/viewer/2022081414/54c906ed4a7959486f8b4576/html5/thumbnails/52.jpg)
Books
Darrell Huff - How To Lie With Statistics
Blastland & Dilnot - The Tiger That Isn't
Donna Wong - The WSJ Guide to Information Graphics
Brian Suda - A Practical Guide to Designing with Data