Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: Spark Summit East talk...

© 2016 Mesosphere, Inc. All Rights Reserved.

@joerg_schad @dcos #smack

Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search

Spark Summit East
February 08, 2017


Jörg Schad
Distributed Systems Engineer

@joerg_schad


HYPERSCALE MEANS VOLUME AND VELOCITY

Batch, Micro-Batch, Event Processing

Days, Hours, Minutes, Seconds, Microseconds

Reports what has happened using descriptive analytics
Solves problems using predictive and prescriptive analytics

Billing, Chargeback
Product Recommendations
Real-time Pricing and Routing
Real-time Advertising
Predictive User Interface


SMACK stack

EVENTS: Ubiquitous data streams from connected devices (Sensors, Devices, Clients)

INGEST: Apache Kafka. Ingest millions of events per second

STORE: Apache Cassandra. Distributed & highly scalable database

ANALYZE: Apache Spark. Real-time and batch data processing

ACT: Akka. Visualize data and build data-driven applications

DC/OS
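The ingest, store, analyze, act flow of the SMACK stack can be sketched in a few lines. This is only an illustration under stated assumptions: every function below is a hypothetical in-memory stand-in (a deque for Kafka, a dict for Cassandra, a plain function for Spark, a threshold check for Akka), and none of it uses the real client libraries.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, Iterable, List, Tuple

Event = Tuple[str, float]  # (device id, sensor reading); hypothetical event shape


def ingest(events: Iterable[Event], log: Deque[Event]) -> None:
    """INGEST stand-in (Kafka's role): append events to an ordered buffer."""
    log.extend(events)


def store(log: Deque[Event], table: Dict[str, List[float]]) -> None:
    """STORE stand-in (Cassandra's role): persist readings keyed by device id."""
    while log:
        device, value = log.popleft()
        table.setdefault(device, []).append(value)


def analyze(table: Dict[str, List[float]]) -> Dict[str, float]:
    """ANALYZE stand-in (Spark's role): compute the average reading per device."""
    return {device: mean(values) for device, values in table.items()}


def act(averages: Dict[str, float], threshold: float) -> List[str]:
    """ACT stand-in (Akka's role): pick the devices that need a reaction."""
    return sorted(d for d, avg in averages.items() if avg > threshold)


def run_pipeline(events: Iterable[Event], threshold: float) -> List[str]:
    """Run the four stages in order and return the devices above the threshold."""
    log: Deque[Event] = deque()
    table: Dict[str, List[float]] = {}
    ingest(events, log)
    store(log, table)
    return act(analyze(table), threshold)
```

In a real deployment each stage is a separate distributed service; the sketch only shows the order in which data moves between them.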


NAIVE APPROACH

Typical Datacenter: siloed, over-provisioned servers, low utilization

Industry Average: 12-15% utilization

MySQL

microservice

Cassandra

Spark/Hadoop

Kafka


Mesos & DC/OS


MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS

Typical Datacenter: siloed, over-provisioned servers, low utilization

Mesos / DC/OS: automated schedulers, workload multiplexing onto the same machines

MySQL

microservice

Cassandra

Spark/Hadoop

Kafka
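A rough sketch of why multiplexing raises utilization: it compares one dedicated machine per workload with packing the same workloads onto shared machines. The CPU demands are made-up illustrative numbers, and the first-fit-decreasing packing is a simple stand-in, not how Mesos or DC/OS actually schedule.

```python
from typing import Dict, List

# Hypothetical CPU demand per workload, as a fraction of one machine.
WORKLOADS: Dict[str, float] = {
    "mysql": 0.15,
    "microservice": 0.10,
    "cassandra": 0.20,
    "spark": 0.25,
    "kafka": 0.10,
}


def siloed_machines(workloads: Dict[str, float]) -> List[List[str]]:
    """Siloed datacenter: every workload gets its own dedicated machine."""
    return [[name] for name in workloads]


def multiplexed_machines(workloads: Dict[str, float],
                         capacity: float = 1.0) -> List[List[str]]:
    """Shared cluster: first-fit-decreasing packing onto as few machines as fit."""
    machines: List[dict] = []
    for name, demand in sorted(workloads.items(), key=lambda kv: -kv[1]):
        for machine in machines:
            # Small tolerance guards against floating-point rounding.
            if machine["used"] + demand <= capacity + 1e-9:
                machine["names"].append(name)
                machine["used"] += demand
                break
        else:
            # No existing machine has room, so start a new one.
            machines.append({"names": [name], "used": demand})
    return [machine["names"] for machine in machines]


def utilization(workloads: Dict[str, float],
                machines: List[List[str]],
                capacity: float = 1.0) -> float:
    """Total demand divided by total provisioned capacity."""
    return sum(workloads.values()) / (len(machines) * capacity)


if __name__ == "__main__":
    for label, placement in [("siloed", siloed_machines(WORKLOADS)),
                             ("multiplexed", multiplexed_machines(WORKLOADS))]:
        print(label, len(placement), "machines,",
              f"{utilization(WORKLOADS, placement):.0%} utilization")
```

With these assumed demands, the same work fits on far fewer machines once workloads share hardware, which is the effect the slide describes.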


DC/OS ENABLES MODERN DISTRIBUTED APPS

Datacenter Operating System (DC/OS)

Distributed Systems Kernel (Mesos)

Big Data + Analytics Engines

Microservices (in containers)

Streaming

Batch

Machine Learning

Analytics

Functions & Logic

Search

Time Series

SQL / NoSQL

Databases

Modern App Components

Distributed systems kernel to abstract resources

Ecosystem of frameworks & apps

Consistent architecture to run on top of kernel

User Interface (GUI & CLI)

Core system services (e.g., distributed init, cron, service discovery, package management & installer, storage)

Any Infrastructure (Physical, Virtual, Cloud)


EXAMPLE: REAL-TIME TRACKING


GEO-ENABLED IoT


DATA FLOW


DEMO


THANK YOU!

ANY QUESTIONS?

@dcos

[email protected]

/groups/8295652

/dcos/dcos/examples/dcos/demos

chat.dcos.io


Keep it running!


SERVICE OPERATIONS

● Configuration Updates (e.g., scaling, re-configuration)
● Binary Upgrades
● Cluster Maintenance (e.g., backup, restore, restart)
● Monitor progress of operations
● Debug any runtime blockages


APACHE SPARK (STREAMING)

Typical Use: distributed, large-scale data processing; micro-batching

Why Spark Streaming?
● Micro-batching processes data in small, frequent batches, giving much lower latency than traditional batch jobs
● Well-defined role means it fits in well with other pieces of the pipeline
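To make the micro-batching idea concrete, here is a minimal, library-free sketch. It does not use Spark Streaming's API; the function names and the `(timestamp, key)` event shape are assumptions chosen for illustration. Events are grouped into fixed-length time windows, and each window is then processed as one small batch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, key); hypothetical event shape


def micro_batches(events: Iterable[Event],
                  batch_interval: float) -> Dict[int, List[Event]]:
    """Group events into consecutive windows of `batch_interval` seconds.

    The returned dict maps a window index to the events that arrived in it,
    ordered by window index.
    """
    batches: Dict[int, List[Event]] = defaultdict(list)
    for ts, key in events:
        batches[int(ts // batch_interval)].append((ts, key))
    return dict(sorted(batches.items()))


def count_by_key(batch: Iterable[Event]) -> Dict[str, int]:
    """Per-batch computation: count how often each key occurs in one batch."""
    counts: Dict[str, int] = defaultdict(int)
    for _, key in batch:
        counts[key] += 1
    return dict(counts)


if __name__ == "__main__":
    stream = [(0.1, "a"), (0.9, "b"), (1.2, "a"), (2.5, "a"), (2.6, "a")]
    for window, batch in micro_batches(stream, batch_interval=1.0).items():
        print(window, count_by_key(batch))
```

The batch interval is the main tuning knob in this sketch: a shorter interval lowers the delay before an event is processed, at the cost of more, smaller batches.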
