© 2014 MapR Technologies

Why Spark on Hadoop Matters

MC Srivas, CTO and Founder, MapR Technologies

Apache Spark Summit - July 1, 2014


MapR Overview

Top Ranked

Exponential Growth

500+ Customers

Cloud Leaders

3X bookings Q1 ’13 – Q1 ’14

80% of accounts expand 3X

90% software licenses

< 1% lifetime churn

> $1B in incremental revenue generated by 1 customer


Rapidly Evolving Landscape

[Diagram: the Apache Hadoop and OSS ecosystem on the MapR Data Platform. Areas shown: Management; Security; Execution Engines; Batch; Streaming; NoSQL & Search; ML, Graph; SQL; Data Governance and Operations; Provision; Workflow & Data Gov.; Data Integration & Access. Projects shown: YARN, MR v1 & v2, Tez*, Pig, Cascading, Spark, Spark Streaming, Storm*, HBase, Solr, Accumulo*, Mahout, MLlib, GraphX, Hive, Impala, Shark, Drill*, Juju, Savannah*, Whirr, Oozie, Falcon*, ZooKeeper, Sentry*, Knox*, Sqoop, Flume, HttpFS, Hue.]

* 2014 TIMELINE


The Complete Spark Stack on Hadoop

[Diagram: the same ecosystem diagram as the previous slide.]


A Winning Combination


Spark Advantages:

IN-MEMORY PERFORMANCE

EASE OF DEVELOPMENT

COMBINE WORKFLOWS

• Easier APIs
• Python, Scala, Java

• RDDs
• DAGs Unify Processing

• Shark, ML, Streaming, GraphX


Hadoop Advantages:

UNLIMITED SCALE

WIDE RANGE OF APPLICATIONS

ENTERPRISE PLATFORM

• Multiple data sources
• Multiple applications
• Multiple users

• Reliability
• Multi-tenancy
• Security

• Files
• Databases
• Semi-structured


The Combination of Spark on Hadoop

IN-MEMORY PERFORMANCE

EASE OF DEVELOPMENT

COMBINE WORKFLOWS

UNLIMITED SCALE

WIDE RANGE OF APPLICATIONS

ENTERPRISE PLATFORM

Operational Applications Augmented by In-Memory Performance


Case Studies


Industry Leading Ad-Targeting Platform

• High-performance analytics over MapR M7 NoSQL

• Load data from M7 tables into RDDs to augment scoring in real time

• Results fed back to M7 for other applications


Leading Pharma Company: NextGen Genomics

Existing process takes several weeks to align chemical compounds with genes

ADAM on Spark allows realignment in a few hours

Geneticists can minimize their dependency on engineering


Cisco: Security Intelligence Operations

Sensor data lands in M7

Spark Streaming on M7 performs a first check for known threats

Data is then processed with GraphX and Mahout

Results are queried with SQL via Shark and Impala


Insurance Giant: Addressing Health Care Regulations

Patient information in M7 is combined with clinical records to compute the probability of readmission

The process uses Spark with transactional data in M7

Insurance options are decided in real time on online portals


In Summary


Spark on Hadoop gains traction for real-time applications


Pick the Right Tool for the Job


MapR is Unbiased Open Source (à la Linux)

• Open source distribution is about providing choice
  – Linux includes MySQL, PostgreSQL and SQLite
  – Linux includes Apache httpd, nginx and Lighttpd

                | MapR Distribution for Hadoop                                   | Distribution C      | Distribution H
Spark           | Spark (all of it) and Shark                                    | Spark only          | No
Interactive SQL | Shark, Impala, Drill, Hive/Tez                                 | One option (Impala) | One option (Hive/Tez)
Versions        | Hive 0.10, 0.11, 0.12, 0.13; Pig 0.11, 0.12; HBase 0.94, 0.98  | One version         | One version


@mapr maprtech


Engage with us!

MapR

maprtech

mapr-technologies

Thank you