Big Data and Road Sensors · Data journalism and (almost) real time statistics . Respond to ....

Marco Puts Big Data and Road Sensors Implementing a Big Data Statistics

Page 1

Marco Puts

Big Data and Road Sensors Implementing a Big Data Statistics

Page 2

Statistics Netherlands and Big Data

Why a Big Data approach?
– Shorter time to publication
– Respond to current events
– Higher reliability
– More detail
– More efficient processes

Considerations:
‐ Infrastructure
‐ Competences
‐ Culture

Page 3

Road sensors

Road sensor data
– Passing vehicle counts for each minute (24/7) at about 60,000 sensors in the Netherlands
– Types of sensors:
  ‐ Induction loop
  ‐ Camera
  ‐ Bluetooth
– Length categories (e.g. small, medium, long vehicles)
– Large volume: approx. 230 million records/day
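
As a rough sanity check on the volume figure, the sketch below works out what 230 million records per day implies per sensor. It uses only the two approximate numbers from the slide. The interpretation in the final comment (several records per sensor per minute, for instance one per length category or lane) is an assumption, not something the slide states.

```python
# Back-of-the-envelope check using the approximate figures from the slide.
SENSORS = 60_000               # approx. number of road sensors
RECORDS_PER_DAY = 230_000_000  # approx. records per day
MINUTES_PER_DAY = 24 * 60      # counts are reported for each minute, 24/7

records_per_sensor_per_day = RECORDS_PER_DAY / SENSORS
records_per_sensor_per_minute = records_per_sensor_per_day / MINUTES_PER_DAY

print(f"{records_per_sensor_per_day:,.0f} records per sensor per day")
print(f"{records_per_sensor_per_minute:.2f} records per sensor per minute")

# More than one record per sensor per minute would be consistent with
# separate records per length category or lane (an assumption).
```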

Page 4

Dutch highways

Page 5

Dutch highways with road sensors

Page 6

Data journalism and (almost) real time statistics

Respond to current events

Within two days!

Page 7

Statistics Netherlands and Big Data

Big Data Research (and development)
‐ Data Driven
‐ Case Based
‐ Roadmap

Bottom-up Approach
‐ There is no Theory of Big Data yet
‐ Explorative Research
‐ Findings → Methodology

Page 8

Big Data Processes
Data driven vs. output driven

[Diagram labels: Stats, Big Mess of Data]

Page 9

How to get into production?

Page 10

A Lean view on Statistical Processes

[Diagram labels: Big Mess of Data; PIB, IB, MB; annotations T, T/O, O/O/M/W]

• Bring processes to the data (T)
• Minimize human interaction (M)
• Added value of data cleaning (O/O)?
• Simple rules (O/O)
• Speeding up the processes: (H)PC (W)
• Process data in small chunks (I/W)
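
The idea of processing data in small chunks can be sketched as a streaming aggregation that never holds the full dataset in memory. The record layout `(sensor_id, minute_of_day, vehicle_count)`, the function names `chunks` and `daily_totals`, and the chunk size are all hypothetical and chosen for illustration. The slides do not describe the actual implementation.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical record layout: (sensor_id, minute_of_day, vehicle_count)
Record = tuple[str, int, int]


def chunks(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def daily_totals(records: Iterable[Record], chunk_size: int = 1000) -> Counter:
    """Sum vehicle counts per sensor, one chunk at a time."""
    totals: Counter = Counter()
    for chunk in chunks(records, chunk_size):
        for sensor_id, _minute, count in chunk:
            totals[sensor_id] += count
    return totals


sample = [("A", 0, 3), ("B", 0, 1), ("A", 1, 5), ("B", 1, 0), ("A", 2, 2)]
totals = daily_totals(sample, chunk_size=2)
print(dict(totals))
```

Because `records` can be any iterable, the same function would work on a generator that reads from a file or a message queue, so only one chunk is in memory at a time.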

Page 11

Examples of Big Data processing

The Challenge during exploration

Cultural Change

Raw data (2010‐2014): 80 TB
→ Selection + Transformation
Transformed data: 70 GB
→ Data cleaning (filtering)
Microdata: 500 MB
→ Estimation
Statistics: 6 KB
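
To make the scale of this reduction concrete, the sketch below computes the factor by which the data shrinks at each step, using the sizes given on the slide. It assumes decimal units (1 TB = 10^12 bytes). The slide does not say which convention was used, and the sizes themselves are approximate.

```python
# Sizes as listed on the slide, converted to bytes using decimal units (an assumption).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

stages = [
    ("Raw data", 80 * TB),
    ("Transformed data", 70 * GB),
    ("Microdata", 500 * MB),
    ("Statistics", 6 * KB),
]

# Reduction factor between each pair of consecutive stages.
factors = [
    (src, dst, src_size / dst_size)
    for (src, src_size), (dst, dst_size) in zip(stages, stages[1:])
]

for src, dst, factor in factors:
    print(f"{src} -> {dst}: about {factor:,.0f}x smaller")

overall = stages[0][1] / stages[-1][1]
print(f"Overall: about {overall:.1e}x smaller")
```

Under these assumptions the largest single reduction is the estimation step (microdata to statistics), while selection and transformation already remove roughly three orders of magnitude.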

Page 12

Process in small chunks

3V’s

BIG DATA

Page 13

Minimize Human Interaction
“Traditional” approach

[Diagram labels: Impact, automatic, manual, Macro]

Page 14

Minimize Human Interaction
Alternative approach

[Diagram labels: Manually, Monitoring, Automatic, Q, Q, Process parameters]
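
One way to read this slide is that the process runs automatically while quality indicators (the Q's) are monitored, and people step in only to adjust process parameters when an indicator falls outside its expected range. The sketch below illustrates that reading. It is an interpretation, not the method shown in the presentation, and the indicator (share of missing minutes), the threshold, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessParameters:
    """Hypothetical tunable parameter: set manually, applied automatically."""
    max_missing_share: float = 0.05


def missing_share(counts: list[Optional[int]]) -> float:
    """Quality indicator Q: fraction of minutes without a reported count."""
    if not counts:
        return 1.0
    return sum(c is None for c in counts) / len(counts)


def needs_attention(counts: list[Optional[int]], params: ProcessParameters) -> bool:
    """Return True only when the indicator exceeds the configured threshold."""
    return missing_share(counts) > params.max_missing_share


params = ProcessParameters()
good_day = [4, 7, 5, 6] * 5       # 20 minutes, none missing
bad_day = [4, None, 5, None] * 5  # 20 minutes, half missing

print(needs_attention(good_day, params))
print(needs_attention(bad_day, params))
```

In this reading, manual work shifts from inspecting individual records to watching the indicator and tuning `max_missing_share`.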

Page 15

Conclusion

– Big data is still a new phenomenon within official statistics

– No standard methodology

– Standard processes not efficient

– Big Data enables us to:
  ‐ Publish faster and in more detail
  ‐ Respond to current events

– But we need to change!
