Big Data Visualization - OECD 2 CBS Netherlands
Edwin de Jonge, December 3, 2013
Big Data Visualization
“Turning Statistics into Knowledge”, Aguascalientes
With thanks to Piet Daas, Martijn Tennekes and Alex Priem
Overview
• Big Data
  • Research ‘theme’ at Statistics Netherlands
  • Data-driven approach
• Visualization as a tool
  • Why?
  • Examples in our office
    • Census
    • Social Security
    • Social Media
    • Not shown: traffic loops, mobile phone data
Why Visualization?
October 1st 2013, Statistics Netherlands
Effective Display!
(see Tor Nørretranders, “Bandwidth of our senses”)
Anscombe’s quartet…
    DS1          DS2          DS3          DS4
   x      y     x      y     x      y     x      y
  10   8.04    10   9.14    10   7.46     8   6.58
   8   6.95     8   8.14     8   6.77     8   5.76
  13   7.58    13   8.74    13  12.74     8   7.71
   9   8.81     9   8.77     9   7.11     8   8.84
  11   8.33    11   9.26    11   7.81     8   8.47
  14   9.96    14   8.10    14   8.84     8   7.04
   6   7.24     6   6.13     6   6.08     8   5.25
   4   4.26     4   3.10     4   5.39    19  12.50
  12  10.84    12   9.13    12   8.15     8   5.56
   7   4.82     7   7.26     7   6.42     8   7.91
   5   5.68     5   4.74     5   5.73     8   6.89
Anscombe’s quartet
Property                                  Value
Mean of x1, x2, x3, x4                    All equal: 9
Variance of x1, x2, x3, x4                All equal: 11
Mean of y1, y2, y3, y4                    All equal: 7.50
Variance of y1, y2, y3, y4                All equal: 4.1
Correlation of x and y, DS1 to DS4        All equal: 0.816
Linear regression, DS1 to DS4             All equal: y = 3.00 + 0.500x
(Values rounded as shown; variances are sample variances.)
Looks the same, right?
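The figures in this table are easy to recompute. The sketch below is a minimal Python illustration (standard library only; the names `QUARTET` and `summary` are our own, not from the slides) that reproduces the means, sample variances, Pearson correlation and least-squares line for each of the four datasets. Up to rounding, all four give the same numbers, which is exactly why the summary alone cannot tell the datasets apart.

```python
from statistics import mean, variance

# Anscombe's quartet as (x values, y values) per dataset, copied from the table above.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # DS1 to DS3 share the same x values
QUARTET = {
    "DS1": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "DS2": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "DS3": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "DS4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Means, sample variances, Pearson correlation and least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),  # sample variance, divides by n - 1
        "mean_y": my,
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "intercept": my - slope * mx,
        "slope": slope,
    }


for name, (x, y) in QUARTET.items():
    print(name, {key: round(value, 3) for key, value in summary(x, y).items()})
```

Only when the four datasets are plotted do their very different shapes (a noisy line, a curve, a line with one outlier, a vertical cluster plus one distant point) become visible.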
Let’s plot!
Visualization for Big Data
Use appropriate:
- Summarization
- Granularity
- Noise filtering
Research: What works for big data?
Scatter plot with 100 data points
Scatter plot with 100 000 data points
Example 1: Census
Example: Virtual Census
- Every 10 years a census needs to be conducted
- In the Netherlands, no longer conducted with surveys
  • The last traditional census was held in 1971
- Now done by (re-)using existing information
  • Linking administrative sources and available sample survey data at a large scale
- Check the result
  • How?
  • With a visualisation method: the Tableplot
Making the Tableplot
1. Load the file: 17 million records
2. Sort the records on a key variable: 17 million records
   • Age in this example
3. Combine the records: 100 groups (170,000 records each)
   • Numeric variables: calculate the average (avg. age)
   • Categorical variables: ratio between the categories present (male vs. female)
4. Plot a figure of a selected number of variables
   • The colours used are important; up to 12
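The four steps can be sketched in code. This is a hypothetical, minimal Python illustration of steps 2 and 3 only (sorting and binning); the helper `tableplot_bins`, the synthetic `people` records and the 1,000-fold scale-down (17,000 records into 100 groups of 170) are assumptions for the example, not the software used at Statistics Netherlands, and the plotting step is left out.

```python
import random
from collections import Counter
from statistics import mean


def tableplot_bins(records, sort_key, numeric, categorical, n_bins=100):
    """Hypothetical helper mirroring steps 2 and 3: sort the records on `sort_key`,
    cut them into `n_bins` consecutive groups of (nearly) equal size, summarise each group."""
    rows = sorted(records, key=lambda r: r[sort_key])
    n = len(rows)
    bins = []
    for b in range(n_bins):
        # Integer boundaries: every record falls in exactly one group.
        group = rows[b * n // n_bins:(b + 1) * n // n_bins]
        if not group:
            continue
        summary = {"size": len(group)}
        for col in numeric:
            summary[col] = mean(r[col] for r in group)  # numeric: group average
        for col in categorical:
            counts = Counter(r[col] for r in group)
            # categorical: share of each category within the group
            summary[col] = {cat: c / len(group) for cat, c in counts.items()}
        bins.append(summary)
    return bins


# Synthetic stand-in for the census file (made-up data, 1,000 times smaller).
random.seed(1)
people = [{"age": random.randint(0, 99), "sex": random.choice(["male", "female"])}
          for _ in range(17_000)]
bins = tableplot_bins(people, "age", numeric=["age"], categorical=["sex"])
print(len(bins), bins[0]["size"], round(bins[0]["age"], 1), round(bins[-1]["age"], 1))
```

Each entry of `bins` corresponds to one row of the plot: a bar for the average of every numeric variable and a stacked bar of category shares for every categorical one.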
Tableplot of the census test file
Tableplot: Monitor data quality
- All data in the office passes through these stages:
  • Raw data (collected)
  • Preprocessed (technically correct)
  • Edited (completed data)
  • Final (outliers removed, etc.)
Processing of data, shown in three panels: raw (unedited) data, edited data, final data
Example 2: Social Security Register
Social Security Register
- Contains all financial data on jobs, benefits and pensions in the Netherlands
- Collected by the Dutch Tax Office
- A total of 20 million records each month
- How to obtain insight into so much data?
  • With a visualisation method: a heat map
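A heat map scales to 20 million records a month because it draws counts per grid cell rather than individual records. The Python sketch below illustrates that idea under our own assumptions: the cell size (5 years by 1,000 euro), the logarithmic shading, the text rendering and the randomly generated age/income pairs are all illustrative choices, not the settings or data behind the slides.

```python
import math
import random
from collections import Counter


def heatmap_counts(pairs, x_width, y_width):
    """Count (x, y) pairs per grid cell of x_width by y_width; a heat map colours these counts."""
    cells = Counter()
    for x, y in pairs:
        cells[(int(x // x_width), int(y // y_width))] += 1
    return cells


def render(cells, shades=" .:*#"):
    """Draw the grid as text, darker characters meaning more records (log scale).
    Assumes at least one non-empty cell."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    top = math.log1p(max(cells.values()))
    lines = []
    for y in range(max(ys), min(ys) - 1, -1):  # highest y (income) row first
        row = ""
        for x in range(min(xs), max(xs) + 1):
            level = math.log1p(cells.get((x, y), 0)) / top  # 0.0 (empty) to 1.0 (fullest)
            row += shades[round(level * (len(shades) - 1))]
        lines.append(row)
    return "\n".join(lines)


# Made-up (age, monthly income in euro) records; no real register data is used here.
random.seed(7)
records = [(random.randint(18, 80), random.lognormvariate(7.8, 0.5)) for _ in range(100_000)]
grid = heatmap_counts(records, x_width=5, y_width=1000)
print(render(grid))
```

The log scale is a design choice: income distributions are skewed, so with a linear scale a few crowded cells would dominate and the sparse high-income cells would disappear.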
Heat map: Age vs. ‘Income’
(Figure: heat map with axes Age and Income (euro))
(Figure: heat map, both axes labelled “amount”)
Example 3: Social media
Daily Sentiment in Dutch Social Media
Social media: daily sentiment in Dutch messages
Granularity: From day to week
Panels: “Social media: daily sentiment in Dutch messages” and “Social media: daily & weekly sentiment in Dutch messages”
Granularity: From day to month
Panels: “Social media: daily sentiment in Dutch messages” and “Social media: daily, weekly & monthly sentiment in Dutch messages”
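Moving from daily to weekly or monthly granularity amounts to averaging the daily values over longer periods, which also smooths out day-to-day noise. A minimal Python sketch, assuming a sentiment score per day is already available (how that score is computed from the messages is not described here; `aggregate`, `iso_week` and `year_month` are hypothetical helpers, and the toy series is made up):

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def aggregate(daily, period):
    """Average the daily scores within each group defined by period(day)."""
    groups = defaultdict(list)
    for day, score in daily.items():
        groups[period(day)].append(score)
    return {key: mean(scores) for key, scores in sorted(groups.items())}


def iso_week(day):
    return tuple(day.isocalendar())[:2]  # (ISO year, ISO week number)


def year_month(day):
    return (day.year, day.month)


# Toy daily series (made-up scores): +1 every day in January 2013, -1 every day in February.
start = date(2013, 1, 1)
days = (start + timedelta(days=i) for i in range(59))
daily = {d: (1 if d.month == 1 else -1) for d in days}
weekly = aggregate(daily, iso_week)
monthly = aggregate(daily, year_month)
print(monthly)  # {(2013, 1): 1, (2013, 2): -1}
```

Note that ISO weeks can straddle two months (week 5 of 2013 here mixes January and February days), so weekly and monthly averages are not nested.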
Enter: Consumer confidence!
Panels: “Social media: daily sentiment in Dutch messages” and “Social media: monthly sentiment in Dutch messages & Consumer confidence”
Correlation: 0.88
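The slide does not say how the correlation of 0.88 was computed; a natural reading is a Pearson correlation between the monthly sentiment series and the consumer confidence index over the months both cover. A minimal Python sketch of that calculation, with hypothetical helper names and toy numbers that are not the actual series:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def monthly_correlation(sentiment, confidence):
    """Correlate two monthly series keyed by (year, month), using only months present in both."""
    months = sorted(set(sentiment) & set(confidence))
    return pearson([sentiment[m] for m in months], [confidence[m] for m in months])


# Toy numbers for illustration only; these are not the series behind the reported 0.88.
sentiment = {(2013, 1): -0.10, (2013, 2): -0.05, (2013, 3): -0.08, (2013, 4): 0.02}
confidence = {(2013, 1): -36, (2013, 2): -34, (2013, 3): -35, (2013, 4): -30, (2013, 5): -28}
print(round(monthly_correlation(sentiment, confidence), 3))
```

Aligning on shared months matters: the two sources need not cover exactly the same period, and correlating misaligned lists would silently pair the wrong months.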
Conclusions
- Big data is a very interesting data source for official statistics
- Visualisation is a great way of getting/creating insight
  • Not only for data exploration, but also for finding errors
The future of statistics?