Spark and the Enterprise by Tony Baer

www.ovum.com

© Copyright Ovum 2015. All rights reserved.

Spark & the Enterprise

Tony Baer

[email protected]

Presentation for Spark Summit East 2016

Spark eating the Big Data world

§ 40+ committers

§ 1000 contributors

§ 179 projects using Spark engine

§ 370k+ LOC

Source: Databricks, January 2015

The most active Apache project

“The leading candidate for ‘successor’ to MapReduce today is Apache Spark.”

Mike Olson, Chief Strategy Officer, Cloudera, 12/30/2013

Bob Picciano, SVP, Analytics, IBM, 6/15/2015

When IBM put its muscle behind Linux in 1999, that move marked the beginning of its ascendancy in corporations and Internet-class data centers. The same sort of thing could happen now with Spark.

What’s there to like?

Ease of development

Performance

Flexibility

Extensibility

Versatility

10 – 100x faster than MapReduce

Batch + Real time

Core projects & 80+ libraries

Orchestrate multiple analytic processes

Higher level programming abstraction

So what?

What’s really there to like?

Ease of development

Performance

Flexibility

Extensibility

Versatility

10 – 100x faster than MapReduce

Orchestrate multiple analytic processes

Higher level programming abstraction

Batch + Real time

Handle different & complex scenarios

Better Productivity

Wider tool selection

Smarter predictions

Handle more varied scenarios

Better programming

Core projects & 80+ libraries

What’s really there to like?

Ease of development

Performance

Versatility

Extensibility

Versatility

10 – 100x faster than MapReduce

Orchestrate multiple analytic processes

Higher level programming abstraction

Batch + Real time

Handle more complex scenarios

Better Productivity

Wider tool selection

Smarter predictions

Handle more varied & complex scenarios

Better programming

Core projects & 80+ libraries

So what?

How will this impact the business?

Focus on the results

§ What use cases/business scenarios/business problems can Spark address?

§ How does Spark impact analytics?

§ What questions are asked?

§ How are questions asked?

§ What types of analytics are performed?

§ How timely are the results?

§ What insights can be obtained?

Common analytics use cases

§ Workload shift: ETL processes, Batch analytics

§ Customer Engagement: Customer Retention, Customer Experience, Upsell/Cross-Sell, Social Tribe Influence, Real-time Customer Offer

§ Risk/Fraud/Security: Risk Mitigation, Fraud Detection/Prevention, Intrusion Detection

§ Operations: Operational Efficiency, Process Optimization, Asset & Service Mgmt., Performance Mgmt.

Many use cases are familiar… the results are different

Smart City: Managing traffic flow

From sense & respond to…

Real-time analytics + interactive query + long-running ML = better insights for managing traffic

Monitoring Automotive product performance

From:

§ Track warranty & repair trends (after the fact)

To:

§ Identify signals from social media to help the auto manufacturer & dealer network anticipate performance issues

§ Use Spark MLlib machine learning capabilities

Benefits:

§ Provided advance warning of customer feedback

§ MLlib eliminated the need to custom-program ML functions

Source: Toyota 12-week pilot program

Data wrangling to spot financial fraud

From:

§ DW populated with data from internal sources (mostly OLTP data)

To:

§ Broaden the data set to widely varying sources (transactions, text messages, social media) with tens or hundreds of millions of records

§ Use a Spark-based, ML-powered data prep tool to harmonize data and ID outliers & patterns

Benefits:

§ Spark’s performance enabled the team to expand the data pool, query interactively & run more what-if scenarios for spotting fraud

Customer Experience (CX) Management

From:

§ Surveys, focus groups, CRM data

To:

§ Predictive analytics for improving the customer experience

§ Spark-enabled machine learning for identifying CX trends & customer satisfaction levels; graph analytics for connecting customer experiences across different channels and identifying influencers & followers

Benefits:

§ Changes CX management from reactive to proactive

Why Spark? From: The tech argument

Ease of development

Performance

Flexibility

Extensibility

Versatility

10 – 100x faster than MapReduce

Batch + Real time

Core projects & 80+ libraries

Orchestrate multiple analytic processes

Higher level programming abstraction

Why Spark? To: Business benefits

§ Automotive product performance

§ Machine learning enables the automotive OEM to be proactive in deciphering the signals to anticipate consumer sentiment/perceptions of product performance

§ Business Benefit: Head off product complaints/potential liability/reputational issues before they explode

§ Financial fraud detection

§ Spark’s scalability allows crunching of more complete data sets; performance produces more timely results; machine learning IDs emergent outliers of interest

§ Business Benefit: More thorough, timely detection of fraud

§ Customer Experience

§ Machine learning allows proactive deciphering of signals; graph computing identifies social tribes & influencers

§ Business Benefit: Stay more in sync with customers. Act on, rather than react to, events, trends & changes in customer climate

Takeaways

§ Spark enthusiasm in the practitioner community has gone viral

§ The Spark community has been highly successful in sparking vendor support

§ Spark practitioners must take the Spark message to a higher level: talk to the business

§ Keep your message real:

§ Business benefits

§ Don’t promise the sky

§ Spark is not the only path to ML, graph, streaming, etc., but its API compatibility provides accessibility and enables flexibility & versatility

§ Spark is still in adolescence.

Thank you

Tony Baer

Ovum

(646) 546-5330

[email protected]

Twitter: @TonyBaer