DC Spark bake off - Realtime TCP Packet Analysis using Spark and Azure Event Hubs
Washington DC Area Apache Spark Interactive
Spark Bake-off
Team Name: Silvio Fiorito
Solution Title: Real-time Packet Analysis using Spark
Team Introductions
Silvio Fiorito
– Background in development and app security
– Started working with Hadoop in 2012
– Started using Spark at v0.6 in early 2013
– Built a few prototypes for low-latency query services with Spark/Shark and then SparkSQL
– Twitter: @granturing
Solution Overview
Real-time TCP packet analysis of geographically distributed hosts
– Must support high throughput from many hosts
– 3 demo VMs (2 x Azure & 1 x AWS)
Local Flume agent pushes events to Azure Event Hub
Events are partitioned and persisted up to 7 days
Spark Streaming app ingests streams
– Reconstruct packets
– Lookups for geo-ip and port description
– Clusters using pre-trained k-means model
– Saves data to Azure Table Storage and publishes on Service Bus Topic
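
The per-event lookup and clustering steps above can be illustrated with a minimal plain-Python sketch. It is not the code from the talk: the lookup tables, the feature choice (packet length and destination port), the centroid values, and the names `lookup_geo`, `nearest_centroid`, and `enrich` are all assumptions made for illustration. It only shows the idea of enriching one event and assigning it to the nearest centroid of an already-trained k-means model.

```python
import math

# Hypothetical reference data, for illustration only; the talk does not
# specify its geo-IP source, port table, feature set, or centroids.
PORT_DESCRIPTIONS = {22: "ssh", 80: "http", 443: "https"}
GEO_BY_PREFIX = {"192.0.2.": "site-a", "198.51.100.": "site-b"}  # documentation-only IP ranges
CENTROIDS = [(60.0, 53.0), (1200.0, 443.0)]  # assumed features: (packet length, destination port)


def lookup_geo(ip):
    """Return the label of the first matching prefix, or 'unknown'."""
    for prefix, label in GEO_BY_PREFIX.items():
        if ip.startswith(prefix):
            return label
    return "unknown"


def nearest_centroid(features, centroids):
    """Return the index of the centroid closest to `features` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(features, centroids[i]))


def enrich(event):
    """Add geo label, port description, and cluster id to one packet event."""
    features = (float(event["length"]), float(event["dst_port"]))
    return {
        **event,
        "geo": lookup_geo(event["src_ip"]),
        "port_description": PORT_DESCRIPTIONS.get(event["dst_port"], "unknown"),
        "cluster": nearest_centroid(features, CENTROIDS),
    }


if __name__ == "__main__":
    print(enrich({"src_ip": "192.0.2.10", "dst_port": 443, "length": 1400}))
```

In a streaming job, a function of this shape would typically be applied to each record of the incoming stream before the results are written out; how the original app wires this into Spark Streaming is not shown in the slides.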