Big Data Europe Transport Pilot case, Luigi Selmi
Objective of the Pilot SC4
L. Selmi - BDE - SC4 Webinar
A scalable, fault-tolerant and flexible platform based on open source frameworks that can process unbounded data sets and graphs.
Message Broker
Apache Kafka is a high-throughput, distributed, durable messaging system.
Stream and Batch Processor
Apache Flink is an open source platform for distributed stream and batch data processing.
Storage and Indexing
PostGIS is a spatial database that stores the road network data. Elasticsearch is a distributed, open source document database built on top of Apache Lucene. It stores the results of the workflow.
Visualization
The pilot SC4 can process real-time Floating Car Data (FCD), map-match it, and classify each road segment according to its traffic level.
Distributed computing: the theoretical minimum
Minimum requirement for fault-tolerance and scalability
● Cluster of 3 nodes (Docker swarm)
● 4 CPU cores per node
● 1 (Flink) worker per node
● 1 (Flink) slot per CPU core
Max parallelism = 12
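The figure of 12 follows directly from the cluster layout listed above. A minimal sketch of the arithmetic, where the function name and its parameters are illustrative and not part of the pilot code:

```python
def max_parallelism(nodes, cores_per_node, slots_per_core=1):
    """Total number of Flink task slots available in the cluster.

    Assumes one worker per node and one slot per CPU core,
    as stated on the slide.
    """
    return nodes * cores_per_node * slots_per_core


# 3 nodes x 4 cores per node x 1 slot per core
print(max_parallelism(nodes=3, cores_per_node=4))
```

With this layout, adding a fourth node of the same size would raise the available slots to 16.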
Parallelization: map-match subtasks
1. source()
2. mapMatch()
3. keyBy()/window()/apply()
4. sink()
The subtasks can be distributed in slots with different parallelism (e.g. from 1 to 12)
Parallelization: Flink dataflow
A slot can process all the subtasks in a pipeline
Parallelization: input and output data
Input fields: device_id, timestamp, lat, lon, speed, orientation, transit
The mapMatch subtask preserves the time order of the records, so that the next task, keyBy(road_seg)/window(15 min)/apply(), returns the correct average speed and number of vehicles within the time window for each road segment.
Output fields: road_seg_id, start_date, num_vehicles, avg_speed
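The keyBy/window/apply step can be illustrated outside Flink with a few lines of plain Python. This is only a sketch of the aggregation logic, not the pilot's implementation: the tuple layout of the map-matched records, the function names, and the choice to count distinct device_id values as num_vehicles are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts, minutes=15):
    """Floor a timestamp to the start of its tumbling window."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def aggregate(records):
    """Group map-matched records by road segment and 15-minute window.

    records: iterable of (road_seg_id, device_id, timestamp, speed),
    a hypothetical layout for the output of the map-matching step.
    Returns rows of (road_seg_id, start_date, num_vehicles, avg_speed).
    """
    groups = defaultdict(list)
    for road_seg_id, device_id, ts, speed in records:
        groups[(road_seg_id, window_start(ts))].append((device_id, speed))

    rows = []
    for (road_seg_id, start), items in sorted(groups.items()):
        # Assumption: a vehicle is counted once per window and segment.
        num_vehicles = len({device for device, _ in items})
        avg_speed = sum(speed for _, speed in items) / len(items)
        rows.append((road_seg_id, start, num_vehicles, avg_speed))
    return rows


sample = [
    ("seg1", "a", datetime(2017, 1, 1, 10, 2), 30.0),
    ("seg1", "b", datetime(2017, 1, 1, 10, 14), 50.0),
    ("seg1", "a", datetime(2017, 1, 1, 10, 16), 40.0),
]
for row in aggregate(sample):
    print(row)
```

In the sample, the first two records fall in the 10:00 window and the third in the 10:15 window, so the segment yields two output rows.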
Pilot Cycle 2 Targets
● Extend the functionalities
● Improve the technology
● Lower the boundaries
Cycle 2 - Extend the functionalities
Short-term traffic forecasts
1. Map-match 44 GB of historical Floating Car Data from CERTH (Thessaloniki)
2. Train a model (using ANN)
3. Make predictions using the model and the near real-time data
Cycle 2 - Improve the technology
● Improve the map-matching algorithm
● Parallelize the processing of the historical data
● Finalize the “dockerization” of the components
Cycle 2 - Lower the boundaries
● Set up different visualizations for traffic monitoring and forecasting
● Visualize the traffic pattern in a road segment
● Visualize the location of a vehicle and the matched road segment (for tests)
Thanks
BDE project website: https://www.big-data-europe.eu/
Code repository: https://github.com/big-data-europe
Contact: [email protected]