Post on 15-Apr-2017
Tomasz Szymański Adam Warski
SoftwareMill
Open source big data landscape and possible ITS applications
Big Data? Fast Data?
• No clear definition
• Big Data
– 100s+ of GB?
– Time frame?
• Fast Data
– Real-time
– Single-node vs multi-node
Why Open Source?
• Large developer base
– Easy to learn
• Projects usually backed by a commercial entity
– Support
• Cost efficiency
– Leverage latest developments
• Future-proofing
– Tools with a large user base will be around for longer
Apache Spark / Cassandra / Kafka
• Data ingestion: Kafka
• Data processing: Spark
• Data storage: Cassandra
Apache Spark / Cassandra / Kafka
• Spark: largest cluster 8k nodes; users include eBay, Baidu, NASA, Amazon
• Cassandra: over 75k nodes storing 10 PB of data at Apple
• Kafka: over 1.1 trillion messages per day at LinkedIn
Possible ITS applications
Hotspot detection
Computed using New York open taxi data, Akka & Apache Flink
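To make the idea of hotspot detection concrete, here is a minimal sketch in plain Python. It is not the Akka and Flink implementation behind the slide, which is not shown here. It assumes the simplest possible approach: snap each pickup coordinate to a fixed-size grid cell, count pickups per cell, and report cells above a threshold. The names `grid_cell`, `detect_hotspots`, `cell_deg` and `min_count` are hypothetical, as are the sample coordinates.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to an integer grid cell of cell_deg degrees per side."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def detect_hotspots(pickups, cell_deg=0.01, min_count=3):
    """Return (cell, count) pairs with at least min_count pickups, busiest first.

    pickups is an iterable of (lat, lon) tuples. This is an assumed,
    simplified stand-in for a streaming job, not the original method.
    """
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in pickups)
    hot = [(cell, n) for cell, n in counts.items() if n >= min_count]
    # Sort by descending count, then by cell for a stable order.
    return sorted(hot, key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: three pickups close together, one far away.
sample_pickups = [
    (40.7581, -73.9851),
    (40.7589, -73.9859),
    (40.7585, -73.9852),
    (40.6413, -73.7781),
]
hotspots = detect_hotspots(sample_pickups, min_count=3)
```

In a streaming setting the same counting would typically run per time window rather than over a whole list, so that hotspots appear and disappear as traffic changes.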
Architecture of a traffic-jam detection system
Leveraging Apache Kafka, Hadoop, Spark, Cassandra & Akka
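The detection logic at the heart of such a system can be sketched independently of the infrastructure. The example below is an assumption about what that logic might look like, not a description of the architecture on the slide: it keeps a sliding window of the most recent speed readings per road segment and flags a segment when the window's mean speed drops below a threshold. It runs in memory and uses none of Kafka, Hadoop, Spark, Cassandra or Akka. The class `JamDetector`, its parameters and the segment ids are all hypothetical.

```python
from collections import defaultdict, deque


class JamDetector:
    """Flags road segments whose recent mean speed falls below a threshold."""

    def __init__(self, window=3, jam_speed_kmh=20.0):
        self.window = window
        self.jam_speed_kmh = jam_speed_kmh
        # One bounded queue of recent speeds per segment; old readings drop off.
        self._speeds = defaultdict(lambda: deque(maxlen=window))

    def observe(self, segment_id, speed_kmh):
        """Record one reading; return True if the segment now looks jammed."""
        readings = self._speeds[segment_id]
        readings.append(speed_kmh)
        if len(readings) < self.window:
            return False  # not enough data yet to judge
        return sum(readings) / len(readings) < self.jam_speed_kmh


detector = JamDetector(window=3, jam_speed_kmh=20.0)
# Hypothetical readings for segment "A": traffic slows down over time.
results = [detector.observe("A", s) for s in (50.0, 15.0, 10.0, 5.0)]
```

With the technologies listed on the slide, readings like these would presumably arrive through a message queue and the per-segment state would live in a distributed stream processor; the windowed-average rule itself stays the same.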
Summing up and the future
• Open source has a lot to offer
• Open data?
• Fast-evolving field
– Rapid development, rapid data insights
– Leverage in ITS!
technical expertise
's ITS domain experts
• Founded in 2009
• Bespoke software development services
• Various domains, including logistics & transport
• Big data a common theme in our projects