Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: Spark Summit East talk...

16
© 2016 Mesosphere, Inc. All Rights Reserved. 1 @joerg_schad @dcos #smack Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search Spark Summit East February 08, 2017

Transcript of Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: Spark Summit East talk...

© 2016 Mesosphere, Inc. All Rights Reserved. 1

@joerg_schad @dcos #smack

Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search

Spark Summit EastFebruary 08, 2017

© 2016 Mesosphere, Inc. All Rights Reserved. 2

Jörg SchadDistributed Systems Engineer

@joerg_schad

© 2016 Mesosphere, Inc. All Rights Reserved. 3

HYPERSCALE MEANS VOLUME AND VELOCITY

Batch Event ProcessingMicro-Batch

Days Hours Minutes Seconds Microseconds

Solves problems using predictive and prescriptive analyticsReports what has happened using descriptive analytics

Predictive User InterfaceReal-time Pricing and Routing Real-time AdvertisingBilling, Chargeback Product Recommendations

© 2016 Mesosphere, Inc. All Rights Reserved. 4

SMACK stack

EVENTSUbiquitous data streams from connected devices

INGEST

Apache Kafka

STORE

Apache Spark

ANALYZE

Apache Cassandra

ACT

Akka

Ingest millions of events per second

Distributed & highly scalable databaseReal-time and batch

process dataVisualize data and build data driven applications

DC/OS

Sensors

Devices

Clients

© 2016 Mesosphere, Inc. All Rights Reserved. 5

NAIVE APPROACH

Typical Datacentersiloed, over-provisioned servers,

low utilization

Industry Average12-15% utilization

mySQL

microservice

Cassandra

Spark/Hadoop

Kafka

© 2016 Mesosphere, Inc. All Rights Reserved. 6

Mesos & DC/OS

© 2016 Mesosphere, Inc. All Rights Reserved. 7

MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS

Typical Datacentersiloed, over-provisioned servers,

low utilization

Mesos/ DC/OSautomated schedulers, workload multiplexing onto the

same machines

mySQL

microservice

Cassandra

Spark/Hadoop

Kafka

© 2016 Mesosphere, Inc. All Rights Reserved. 8

DC/OS ENABLES MODERN DISTRIBUTED APPS

Datacenter Operating System (DC/OS)

Distributed Systems Kernel (Mesos)

Big Data + Analytics EnginesMicroservices (in containers)

Streaming

Batch

Machine Learning

Analytics

Functions & Logic

Search

Time Series

SQL / NoSQL

Databases

Modern App Components

Distributed systems kernel to abstract resources

Ecosystem of frameworks & apps

Consistent architecture to run on top of kernel

User Interface (GUI & CLI)

Core system services (e.g., distributed init, cron, service discovery, package mgt & installer, storage)

Any Infrastructure (Physical, Virtual, Cloud)

© 2016 Mesosphere, Inc. All Rights Reserved. 9

EXAMPLE:REAL-TIME TRACKING

© 2016 Mesosphere, Inc. All Rights Reserved. 10

GEO-ENABLED IoT

© 2016 Mesosphere, Inc. All Rights Reserved. 11

DATA FLOW

© 2016 Mesosphere, Inc. All Rights Reserved. 12

DEMO

© 2016 Mesosphere, Inc. All Rights Reserved. 13

THANK YOU!

ANY QUESTIONS?

@dcos

[email protected]

/groups/8295652

/dcos/dcos/examples/dcos/demos

chat.dcos.io

© 2017 Mesosphere, Inc. All Rights Reserved. 14

Keep it running!

© 2016 Mesosphere, Inc. All Rights Reserved. 15

SERVICE OPERATIONS

● Configuration Updates (ex: Scaling, re-configuration)● Binary Upgrades● Cluster Maintenance (ex: Backup, Restore, Restart)● Monitor progress of operations● Debug any runtime blockages

© 2016 Mesosphere, Inc. All Rights Reserved. 16

Typical Use: distributed, large-scale data processing; micro-batching

Why Spark Streaming?● Micro-batching creates very low

latency, which can be faster● Well defined role means it fits in well

with other pieces of the pipeline

APACHE SPARK (STREAMING)