Data Journalism - Newsroom Statistics

DATA JOURNALISM Dr. Bahareh Heravi @Bahareh360 Week 11 Newsroom Statistics

Transcript of Data Journalism - Newsroom Statistics

Page 1: Data Journalism - Newsroom Statistics

DATA JOURNALISM

Dr. Bahareh Heravi @Bahareh360

Week 11: Newsroom Statistics

Page 2: Data Journalism - Newsroom Statistics

What we have learned so far

What Data Journalism is about

Finding Data

Data collection

Data scraping

Data mashing and summarisation

Data cleaning

Data analysis

Data visualisation with graphs, charts and infographics

Data visualisation with maps

FOI

Social Media as a source

Page 3: Data Journalism - Newsroom Statistics

   

NEWSROOM STATISTICS

Page 4: Data Journalism - Newsroom Statistics

We have learned before

Simple newsroom math

sum, average, median

Rate

Percent change
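As a quick refresher, the measures listed on this slide can be sketched in a few lines of Python. The numbers are made up purely for illustration (they are not course data), and the helper names `rate_per` and `percent_change` are assumptions chosen for this example.

```python
from statistics import mean, median

# Hypothetical yearly counts, invented purely for illustration.
counts = [12, 15, 9, 30, 14]

total = sum(counts)       # sum
average = mean(counts)    # average (arithmetic mean)
middle = median(counts)   # median (middle value once sorted)


def rate_per(events, population, per=100_000):
    """Events per `per` people, so places of different sizes can be compared."""
    return events / population * per


def percent_change(old, new):
    """Change from old to new, expressed as a percentage of the old value."""
    return (new - old) / old * 100


print(total, average, middle)
print(rate_per(30, 150_000))    # 30 events among 150,000 people
print(percent_change(12, 15))   # change from 12 to 15
```

Note how the single large value (30) pulls the average above the median, which is one reason to report both.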

Page 5: Data Journalism - Newsroom Statistics

   

ANALYSING RELATIONSHIPS

Page 6: Data Journalism - Newsroom Statistics

Correlation analysis

Correlation concerns the strength of the relationship between the values of two variables.

Are height and weight correlated?

Are engine size and max speed in cars correlated?

Page 7: Data Journalism - Newsroom Statistics

Correlation

Page 8: Data Journalism - Newsroom Statistics
Page 9: Data Journalism - Newsroom Statistics

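[Figure: scale of correlation coefficients running from -1 (perfect negative) through 0 (no correlation) to 1 (perfect positive), with -0.5 and 0.5 marked and the ranges on each side of zero labelled weak and strong.]

Source: Statistics without Tears, Derek Rowntree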

Page 10: Data Journalism - Newsroom Statistics

[Figure: illustrations of correlation coefficients of -1, -0.8, -0.3, 0, 0.3, 0.8 and 1.]

Page 11: Data Journalism - Newsroom Statistics

Student   Theory   Practical
A         59       70
B         63       69
C         64       76
D         70       79
E         76       74
F         78       80
G         82       77
H         79       86
I         86       84
J         92       90
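To make the idea concrete, here is a minimal sketch that computes a correlation coefficient for the ten students' marks. The slides do not name a specific coefficient; Pearson's r is assumed here, and the function name `pearson_r` is just an illustrative choice.

```python
from math import sqrt

# Marks for students A to J, taken from the table on this slide.
theory = [59, 63, 64, 70, 76, 78, 82, 79, 86, 92]
practical = [70, 69, 76, 79, 74, 80, 77, 86, 84, 90]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(theory, practical)
print(round(r, 2))
```

For these marks r comes out at roughly 0.84, a strong positive correlation: students who do well in theory tend to do well in the practical too.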

Page 12: Data Journalism - Newsroom Statistics

[Figure: two scatter plots of Theory against Practical marks for all ten students, with both axes running from 50 to 95.]

Page 13: Data Journalism - Newsroom Statistics

Student   Theory   Practical
G         82       77
H         79       86
I         86       84

[Figure: two scatter plots of Theory against Practical marks for students G, H and I only, zoomed in to marks between 76 and 87.]

?!
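One way to read the puzzle on this slide is to compare the correlation for all ten students with the correlation for G, H and I alone. This is a sketch under the assumption that Pearson's r is the coefficient in use; the slides do not say, and `pearson_r` is an illustrative name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# All ten students (A to J) from the earlier table.
all_theory = [59, 63, 64, 70, 76, 78, 82, 79, 86, 92]
all_practical = [70, 69, 76, 79, 74, 80, 77, 86, 84, 90]

# Students G, H and I only.
sub_theory = [82, 79, 86]
sub_practical = [77, 86, 84]

r_all = pearson_r(all_theory, all_practical)
r_sub = pearson_r(sub_theory, sub_practical)
print(round(r_all, 2), round(r_sub, 2))
```

With only three students the coefficient is weak and even slightly negative, although the full group shows a strong positive relationship. A handful of points can easily mislead.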

Page 14: Data Journalism - Newsroom Statistics

   

SIGNIFICANCE TEST

Page 15: Data Journalism - Newsroom Statistics

Significance test

A significance test helps determine whether an observed relationship is real, or just one that we would expect to see quite often by chance anyway.

We start out assuming that there is no real relationship between the two variables: this is the null hypothesis.

Page 16: Data Journalism - Newsroom Statistics

p value

p value: the probability of seeing a relationship at least as strong as the observed one purely by chance. The smaller the p value, the more statistically significant the relationship.

The p value is the calculated probability of an observed difference or relationship of this size occurring by chance when no difference or relationship actually exists (that is, when the null hypothesis is true).

If the p value is small enough (?*), we can reject the null hypothesis.
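The definition of the p value can be acted out directly with a permutation test: shuffle one variable many times, so that any real link is broken, and count how often chance alone produces a correlation at least as strong as the observed one. This is a swapped-in technique for illustration; statistical packages typically use a formula-based test instead. The marks are the ten students' Theory and Practical scores from the earlier slide, and the function names, the number of shuffles and the seed are assumptions.

```python
import random
from math import sqrt

theory = [59, 63, 64, 70, 76, 78, 82, 79, 86, 92]
practical = [70, 69, 76, 79, 74, 80, 77, 86, 84, 90]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_shuffles=10_000, seed=0):
    """Share of random shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # simulate the null hypothesis: no real link
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles


p = permutation_p_value(theory, practical)
print(p)
```

A small result here means that shuffled, relationship-free data rarely looks as strongly correlated as the real marks do, which is the grounds for rejecting the null hypothesis.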

Page 17: Data Journalism - Newsroom Statistics

p value cut-offs

p < 0.05 (the 0.05 level): significant *

p < 0.01 (the 0.01 level): highly significant **
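The cut-offs on this slide amount to a simple lookup. A minimal sketch follows; the function name and the "not significant" label for values at or above 0.05 are assumptions added for this example.

```python
def significance_label(p):
    """Map a p value to the conventional labels from the slide."""
    if p < 0.01:
        return "highly significant **"
    if p < 0.05:
        return "significant *"
    # Label for everything else is an assumption, not from the slide.
    return "not significant"


for p in (0.003, 0.03, 0.2):
    print(p, significance_label(p))
```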

Page 18: Data Journalism - Newsroom Statistics
Page 19: Data Journalism - Newsroom Statistics

WARNING

Correlation = Causation?

A correlation between two variables does not, on its own, show that one causes the other.
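A small made-up example shows why the warning matters. Both series below are built from a third variable (temperature) plus a small wobble, and neither is computed from the other, yet they correlate strongly. All numbers and variable names are invented for illustration, and Pearson's r is assumed as the coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily temperatures, invented for illustration.
temperature = [12, 14, 15, 17, 19, 21, 23, 24, 26, 28]

# Small made-up wobbles so the two series are not perfectly tidy.
wobble_a = [1, -2, 0, 2, -1, 1, -2, 0, 2, -1]
wobble_b = [-1, 1, 2, -2, 0, -1, 1, 2, -2, 0]

# Each series depends on temperature only, never on the other series.
ice_cream_sales = [10 * t + a for t, a in zip(temperature, wobble_a)]
sunburn_cases = [2 * t + b for t, b in zip(temperature, wobble_b)]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(round(r, 2))
```

The coefficient is close to 1, but ice cream does not cause sunburn: the hidden third variable drives both.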

Page 20: Data Journalism - Newsroom Statistics
Page 21: Data Journalism - Newsroom Statistics

Other statistical analysis tools

R

PSPP

Excel Solver

Page 22: Data Journalism - Newsroom Statistics

Hands-on

Correlation analysis and significance test for:

Penalty points in counties in Ireland and rate of road fatalities.

Use SPSS or PSPP

Go back to your penalty points and road fatalities story/data.

Page 23: Data Journalism - Newsroom Statistics

You have now completed all the data analysis and visualisation needed for our penalty points story.

Well done!

Page 24: Data Journalism - Newsroom Statistics

Resources:

Statistics without Tears: A Primer for Non-mathematicians, Derek Rowntree, first published 1981

Statistics Done Wrong, Alex Reinhart, 2015, http://www.statisticsdonewrong.com/

Page 25: Data Journalism - Newsroom Statistics

Questions?

Bahareh R. Heravi

@Bahareh360