Apache Spark and Its Role in the Enterprise Data Hub
Mike Olson, Chief Strategy Officer, Cloudera
mike.olson@cloudera.com, @mikeolson

Page 2:

2 ©2014 Cloudera, Inc. All rights reserved.

Spark Unifies and Simplifies Hadoop

Batch Processing

Stream Processing

Machine Learning

Page 3:

Developing and supporting Spark together to ensure customer success

Page 4:

Spark at Cloudera

October 2013: Databricks and Cloudera partner

February 2014: Spark support added to CDH

July 2014: Continuing support & innovation

Page 5:

Spark is a Core Component of Hadoop

Commit Activity Past 12 Months

Hadoop Core: 2589

Spark: 4149

All Other Ecosystem Projects Shipped by Cloudera: 12438
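
The commit counts on this slide can be turned into shares with a little arithmetic. The sketch below is not part of the original deck: the variable names are illustrative, and the only inputs are the three figures shown on the slide.

```python
# Commit counts as shown on the slide (past 12 months).
commits = {
    "Hadoop Core": 2589,
    "Spark": 4149,
    "All Other Ecosystem Projects Shipped by Cloudera": 12438,
}

total = sum(commits.values())

# Share of all commits attributed to each group, as a percentage.
shares = {name: 100.0 * count / total for name, count in commits.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

On these figures Spark accounts for roughly a fifth of the commits shown, and about 1.6 times as many as Hadoop Core.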

Page 6:

Fully Integrated into CDH

• Integrated and supported part of our platform

• Diverse use cases in production

• Well-trained support staff and external training

[Diagram of CDH components: 3rd party apps; batch processing; interactive SQL; search engine; machine learning; stream processing; workload management; storage (filesystem, online NoSQL)]

Page 7:

Customer Adoption

Search personalization through machine learning investigations

Fast processing of millions of stock positions and future scenarios

Genomics research using Spark pipelines

Predictive modeling of disease conditions

Page 8:

What’s Next?

Page 9:

Cloudera Developer Training for Apache Spark

The only hands-on deep dive into building unified applications with Spark

Public GA: Aug 5, Redwood City

Page 10:

Dell In-Memory Appliances for Cloudera Enterprise

• Simplifies and speeds up complex cluster deployments

• Includes Cloudera Enterprise and ScaleMP's Versatile SMP (vSMP) architecture

• Built on the Intel(R) Xeon(R) processor-based Dell R920 hardware

• Optimized for Spark

Page 11:

Spark as the Standard Processing Engine

Page 12:

Bringing the Communities Together

The Hive and Spark communities are coming together to drive consolidation in the Hadoop ecosystem.

Page 13:

Hive on Spark

Page 14:

Architecture

[Diagram: Spark (batch processing, stream processing); Hive (parser, metastore, semantic analyser, logical plan, optimizer, task execution layer); MR; Tez; HDFS]

Page 15:

Our SQL on Hadoop Vision

SQL

BI and SQL Analytics

Batch Processing

Mixed Spark and SQL Applications

Page 16:

Mike Olson
mike.olson@cloudera.com, @mikeolson

Thank you!