Open source big data landscape and possible ITS applications

Post on 15-Apr-2017



Tomasz Szymański, Adam Warski

SoftwareMill

Open source big data landscape and possible ITS applications

Big Data? Fast Data?
• No clear definition
• Big Data
  – 100s+ of GB?
  – Time frame?
• Fast Data
  – Real-time
  – Single-node vs multi-node

Why Open Source?
• Large developer base
  – Easy to learn
• Projects usually backed by a commercial entity
  – Support
• Cost efficiency
  – Leverage latest developments
• Future-proofing
  – Tools with a large user base will be around for longer

Apache Spark / Cassandra / Kafka
• Data ingestion: Kafka
• Data processing: Spark
• Data storage: Cassandra

Apache Spark / Cassandra / Kafka
• Spark: largest cluster 8k nodes; users include eBay, Baidu, NASA, Amazon
• Cassandra: over 75k nodes storing 10 PB of data at Apple
• Kafka: over 1.1 trillion messages per day at LinkedIn
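To put the Kafka and Cassandra figures on a more familiar scale, the short calculation below converts them into per-second and per-node averages. It assumes decimal units (1 PB = 10^15 bytes) and an even spread of load and data, neither of which the slide states. Because the slide says "over", the results are best read as rough lower bounds on the averages, not as measured peak rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide (both stated as "over", so treat as lower bounds).
kafka_messages_per_day = 1.1e12   # LinkedIn
cassandra_bytes = 10e15           # 10 PB at Apple, assuming decimal petabytes
cassandra_nodes = 75_000

# Average rate, assuming messages are spread evenly over the day.
kafka_messages_per_second = kafka_messages_per_day / SECONDS_PER_DAY

# Average data per node in GB, assuming data is spread evenly across nodes.
gb_per_node = cassandra_bytes / cassandra_nodes / 1e9

print(f"{kafka_messages_per_second:,.0f} messages/s on average")
print(f"{gb_per_node:.0f} GB per node on average")
```

This works out to roughly 12.7 million messages per second and about 133 GB per node.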

Possible ITS applications

Hotspot detection
Computed using New York open taxi data, Akka & Apache Flink
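The slide does not say how the hotspots were computed, so the following is only a minimal sketch of one common approach: snap each pickup coordinate to a grid cell and report cells whose pickup count reaches a threshold. It is plain Python, not Akka or Flink code. The function names, the 0.005-degree cell size, the threshold and the sample points are all assumptions made for illustration.

```python
from collections import Counter
from math import floor


def grid_cell(lat, lon, cell_deg=0.005):
    """Map a coordinate to an integer grid cell (cell_deg is an assumed cell size)."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))


def find_hotspots(pickups, min_count, cell_deg=0.005):
    """Return {cell: count} for every cell with at least min_count pickups."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in pickups)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Made-up sample points for illustration, not real taxi data.
sample_pickups = [
    (40.7581, -73.9855),
    (40.7583, -73.9853),
    (40.7584, -73.9851),
    (40.6413, -73.7781),
]
hotspots = find_hotspots(sample_pickups, min_count=3)
print(hotspots)
```

In a streaming setting the same count would typically be kept per time window, so that a cell is a hotspot only while activity stays high.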

Architecture of a traffic-jam detection system
Leveraging Apache Kafka, Hadoop, Spark, Cassandra & Akka
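The slide names the components but not the detection rule itself. As a hedged illustration of the kind of logic the processing stage might run, the sketch below keeps a sliding window of recent speed readings per road segment and flags a segment when the window's mean speed drops below a threshold. It is plain Python with no Kafka or Spark calls. The class name `JamDetector`, the window size, the 15 km/h threshold and the segment IDs are hypothetical.

```python
from collections import defaultdict, deque


class JamDetector:
    """Flags a road segment when its recent mean speed falls below a threshold."""

    def __init__(self, window=3, jam_speed_kmh=15.0):
        self.window = window
        self.jam_speed_kmh = jam_speed_kmh
        # One bounded queue of recent readings per segment; old readings drop off.
        self.readings = defaultdict(lambda: deque(maxlen=window))

    def observe(self, segment_id, speed_kmh):
        """Record one speed reading; return True if the segment now looks jammed."""
        recent = self.readings[segment_id]
        recent.append(speed_kmh)
        if len(recent) < self.window:
            return False  # not enough readings yet to judge
        return sum(recent) / len(recent) < self.jam_speed_kmh


# Made-up stream of (segment_id, speed in km/h) readings.
detector = JamDetector(window=3, jam_speed_kmh=15.0)
stream = [("A1", 50.0), ("A1", 12.0), ("A1", 10.0), ("A1", 8.0), ("B7", 60.0)]
flags = [detector.observe(segment, speed) for segment, speed in stream]
print(flags)
```

Averaging over a window rather than reacting to a single reading avoids flagging one slow vehicle as a jam, at the cost of a short detection delay.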

Summing up and the future
• Open source has a lot to offer
• Open data?
• Fast-evolving field
  – Rapid development, rapid data insights
  – Leverage in ITS!

technical expertise

’s ITS domain experts

• Founded in 2009
• Bespoke software development services
• Various domains, including logistics & transport
• Big data a common theme in our projects