National Datawarehouse for Traffic Information – Big Data supplier
Els Rijnierse
Contents
• Introducing NDW
• Experiences with our big data
• Challenges, choices and changes
Posting
• The last slide will ask you to post your impression, sharing what struck you most with all conference attendees
NDW is a collaborative venture
• 24 road authorities:
  - National
  - 6 out of 12 provinces
  - Cities, either independent or in an alliance
• Covering >6,000 km of road network (the total Dutch road network is 130,000 km)
Introducing NDW
What is our aim?
• Develop and maintain a joint database for traffic data that is up to date, complete and unambiguous, with known quality
• Create efficiency by working together and sharing information
• Stimulate effective use of this data for:
  - real-time traffic management
  - real-time traffic information
  - analyses, policy making and research
Traffic management
Central source for all road authorities
Objectives
• Fewer traffic jams
• Predictability
• Safer roads
• Lower emissions
• More collaboration
Data for traffic flow
Happy road users
NDW organisation
[Diagram. Parties: Supervisory Board; demand side: road authorities, service providers; supply side: participating governments (IDP), commercial parties (EDP); system provider (external). Relations shown: accountability/supervision, common data need, individual data need, selection from data, individual data supply, infrastructure supply.]
Data types - 1
• Traffic flow per lane per vehicle class on 14,818 measuring sites
• Travel time (realised or estimated) per lane on 9,424 measuring sites
• Traffic speed per lane per vehicle class on 13,410 measuring sites
(a measuring site may produce more than one kind of data)
Every minute, traffic data from more than 24,000 measuring sites is collected, processed and distributed to users within 75 seconds.
Data collection
Some figures on figures
• Over 24,000 measurement sites
• Giving approx. 460,000 figures on speed, flow and travel time each minute
• => >27 million per hour
• => >600 million per day
• => >240 billion per year
+ metadata on these figures
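The hourly, daily and yearly figures follow directly from the per-minute rate. A quick check, assuming a constant 460,000 values per minute and a 365-day year:

```python
# Back-of-the-envelope check of the volumes quoted above.
# Only the per-minute rate comes from the slide; a 365-day year is assumed.
VALUES_PER_MINUTE = 460_000

per_hour = VALUES_PER_MINUTE * 60
per_day = per_hour * 24
per_year = per_day * 365

print(f"per hour: {per_hour:,}")  # 27,600,000
print(f"per day:  {per_day:,}")   # 662,400,000
print(f"per year: {per_year:,}")  # 241,776,000,000
```

Metadata on these figures comes on top of this.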
Real-time traffic data (February 2012: 5 cm snow)
Data types - 2
• Road works, planned and actual
• Reports of congestion and accidents
• Status (open/closed) of bridges
• Near future: status (open/closed) of peak lanes and regular lanes
Data on road availability is collected on occurrence, i.e. whenever an event happens.
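Because on-occurrence data arrive as events rather than one value per minute, a natural store is a time-ordered log of state changes; the status at any moment is the most recent change at or before it. A minimal sketch, assuming events arrive in chronological order; `StatusLog`, `record` and `status_at` are hypothetical names, not part of any NDW interface or data format:

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StatusLog:
    """Status history of one object (e.g. a bridge), stored as state changes only.

    Hypothetical structure for illustration, not NDW's actual data format.
    """
    times: list[datetime] = field(default_factory=list)
    states: list[str] = field(default_factory=list)

    def record(self, when: datetime, state: str) -> None:
        # Events are assumed to arrive in chronological order.
        if self.times and when < self.times[-1]:
            raise ValueError("events must be recorded in time order")
        self.times.append(when)
        self.states.append(state)

    def status_at(self, when: datetime) -> str | None:
        # Binary search for the latest change at or before `when`;
        # None if no change has been recorded yet.
        i = bisect_right(self.times, when)
        return self.states[i - 1] if i else None


bridge = StatusLog()  # hypothetical bridge
bridge.record(datetime(2024, 1, 1, 8, 0), "closed")
bridge.record(datetime(2024, 1, 1, 8, 12), "open")
print(bridge.status_at(datetime(2024, 1, 1, 8, 5)))  # closed
print(bridge.status_at(datetime(2024, 1, 1, 7, 0)))  # None
```

Storing only changes keeps the volume proportional to the number of events rather than to elapsed time, and a lookup costs O(log n) in the number of recorded changes.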
Cooperation between CBS (Statistics Netherlands) and NDW
• NDW collects and distributes raw data; we do not aim to do any statistical analysis.
• CBS started with small NDW datasets (1 day) and is now working on a larger set (3 months) to develop a new methodology.
• Conclusion:
Forget everything you learned about statistics
Experiences
When to start calculating (Experiences with big data – 1)
Traditional statistical methodology: gather and store everything, and perform the statistical analyses at set times.
When using big data:
• This traditional way of working does not produce statistics any faster.
• It requires huge datastores for storing the raw data.
• We strongly advise starting the statistical analyses the moment data starts streaming in, and storing only aggregated intermediate results.
• Adapt your algorithms so they correctly handle the unpredictable gaps in the raw data that will occur.
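The advice to aggregate while data streams in, and to handle gaps explicitly, can be sketched as a running per-site, per-hour aggregate that never keeps the raw minutes. A minimal sketch, assuming one value per site per minute; `HourAggregate`, `on_minute_value`, the site IDs and the 0.8 coverage threshold are hypothetical, not an NDW or CBS rule:

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

EXPECTED_MINUTES = 60  # assumed: one value per site per minute


@dataclass
class HourAggregate:
    """Running aggregate for one site and one hour; raw minute values are not kept."""
    count: int = 0      # minutes that delivered a valid value
    total: float = 0.0  # sum of the valid values

    def add(self, value: float | None) -> None:
        # None marks an invalid value; it is skipped, not imputed.
        if value is not None:
            self.count += 1
            self.total += value

    def mean(self, min_coverage: float = 0.8) -> float | None:
        """Mean per-minute value, or None if too many minutes are missing.

        Minutes that never arrive are gaps too, so coverage is measured
        against EXPECTED_MINUTES. The threshold is an assumption.
        """
        if self.count == 0 or self.count / EXPECTED_MINUTES < min_coverage:
            return None
        return self.total / self.count


# (site_id, hour) -> aggregate, updated as each minute value streams in.
aggregates: defaultdict[tuple[str, int], HourAggregate] = defaultdict(HourAggregate)


def on_minute_value(site_id: str, hour: int, value: float | None) -> None:
    aggregates[(site_id, hour)].add(value)


# Hypothetical site "A": 10 of 60 minutes are invalid (coverage 50/60).
for minute in range(60):
    on_minute_value("A", 8, None if minute % 6 == 0 else 10.0)
# Hypothetical site "B": only 30 minutes ever arrive.
for minute in range(30):
    on_minute_value("B", 8, 12.0)

print(aggregates[("A", 8)].mean())  # 10.0
print(aggregates[("B", 8)].mean())  # None
```

Memory grows with the number of site-hours, not with the number of raw values, and an hour with too many gaps is reported as missing rather than silently biased.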
Technical issues (Experiences with big data - 2)
• Traditional relational databases, and statistical tools such as SPSS, SAS and R, are not fast enough, run out of memory and cannot retrieve raw data quickly enough.
• A storage technique suited to fast retrieval of raw data requires some coding and programming to work with that raw data.
• Recalculating after a wrong choice of method takes longer and longer, because the amount of raw data grows quickly every day.
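One way to limit (not remove) the cost of recalculation is to keep summaries that can be merged: hourly summaries then roll up into daily or yearly figures without re-reading the raw minutes. A sketch using the pairwise mean/variance combination of Chan et al.; `Summary` and the sample values are hypothetical. If the method itself was wrong, the raw data still has to be reprocessed:

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Mergeable summary: count, mean and M2 (sum of squared deviations)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @staticmethod
    def of(values: list[float]) -> Summary:
        if not values:
            return Summary()
        m = sum(values) / len(values)
        return Summary(len(values), m, sum((v - m) ** 2 for v in values))

    def merge(self, other: Summary) -> Summary:
        # Chan et al. pairwise update: the result equals the summary of the
        # concatenated data, so merging order does not matter.
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    def variance(self) -> float:
        # Population variance of all values summarised so far.
        return self.m2 / self.n if self.n else 0.0


# Two hypothetical hourly summaries rolled up into one daily summary.
hours = [Summary.of([1.0, 2.0, 3.0]), Summary.of([4.0, 5.0])]
day = reduce(Summary.merge, hours, Summary())
print(day.n, day.mean, day.variance())  # 5 3.0 2.0
```

Tracking the mean and M2 rather than a raw sum of squares avoids the loss of precision the naive formula suffers at very large counts.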
Challenges, Choices, Changes
Devil's Triangle (diagram with three corners):
• Content awareness
• IT knowledge
• Statistical knowledge
Challenge
Government policy is that public data are open data, which means our raw data are available on the web (www.ndw.nu/datalevering). Anybody can download them, produce surveys, statistics and tables, draw conclusions and publish these (long) before the statistical office does.
Be aware of the publicity this might cause, the discussions about ‘the truth’, and the status of a response or statement from the statistical office.
Take on the challenge of producing real-time statistics.
Choice
Traditionally, the raw data used for statistics are stored at the statistical office.
Big data should be left at their origin and retrieved when needed.
Change
Look for appropriate IT infrastructure and develop a new way of handling data.
www.sendsteps.com
Prepare to react; keep your phone ready!
TXT
1. Text to +316 4250 0030
2. Type Session <space> WS3 <space> your answer
Internet
1. Go to sendc.com
2. Log in with Session
3. Type WS3 <space> your answer
Posting messages is anonymous. No additional charge per message.
When using big data for our statistics, the biggest change in our way of working will be…
Internet: go to sendc.com, log in with Session and type WS3 <space> your answer
TXT: send Session <space> WS3 <space> your answer to 06 4250 0030
Challenges, Changes and Choices when using these amounts of data for statistics
Forget everything you learned about statistics: how do you produce one representative figure for traffic intensity from this?
• When to start calculating: immediately after the data becomes available, and continuously
• Where to store what data (raw or intermediate): raw data at the providers, intermediate results at the statistical office
• Tools and IT techniques: no more SPSS, R and SAS, but programming and working with new tools
• Algorithms: asymptotically, both the time complexity and the space complexity should be of a lower order, because of the large volumes of data
• As/when open data is government policy: be aware that others also produce statistics, faster and with different conclusions
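As an illustration of the point on algorithms: an exact median over a day's values needs all of them in memory, whereas a fixed-width histogram uses O(bins) memory and O(1) time per value, at the cost of accuracy within half a bin width. This is one standard sketching technique, not the method used by NDW or CBS; `StreamingHistogram`, the 0 to 200 km/h range and the bin count are hypothetical choices:

```python
from __future__ import annotations

from math import ceil


class StreamingHistogram:
    """Approximate quantiles of a stream in O(bins) memory and O(1) time per value.

    For values inside [lo, hi) the answer is within half a bin width of the
    exact rank-based quantile.
    """

    def __init__(self, lo: float, hi: float, bins: int) -> None:
        self.lo = lo
        self.width = (hi - lo) / bins
        self.counts = [0] * bins
        self.n = 0

    def add(self, value: float) -> None:
        # Out-of-range values are clamped into the first or last bin.
        i = int((value - self.lo) // self.width)
        i = min(max(i, 0), len(self.counts) - 1)
        self.counts[i] += 1
        self.n += 1

    def quantile(self, q: float) -> float:
        """Midpoint of the bin holding the value of rank ceil(q * n), q in [0, 1]."""
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        if self.n == 0:
            raise ValueError("no values added")
        rank = max(1, ceil(q * self.n))
        seen = 0
        for i, c in enumerate(self.counts):
            seen += c
            if seen >= rank:
                return self.lo + (i + 0.5) * self.width
        raise AssertionError("unreachable: rank <= n")


# Hypothetical speeds (km/h) in 1 km/h bins between 0 and 200.
hist = StreamingHistogram(0.0, 200.0, 200)
for speed in range(100):
    hist.add(float(speed))
print(hist.quantile(0.5))  # 49.5
```

Memory stays fixed however many values stream in, so the cost of a daily or yearly median no longer grows with the data volume.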