Lambda architecture with Spark

Lambda Architecture with Spark: Efficiently Combining Historical and New Data for Analytics

Laurent Bride, CTO
Kurt Layson, Account Executive, Michigan
Vincent Galopin, Solutions Engineering Manager

March 10, 2016

©2016 Talend Inc


2

Agenda

• Struggles in Traditional Architectures

• What is the Lambda Architecture?

• Spark: Unified Development Framework

• Demonstration: Spark Batch & Streaming jobs in Talend

3

Traditional Architecture

[Diagram: Historical Data and New Data. Sources and stores shown: Web Logs, Internet of Things, DBMS / EDW, Hadoop, Social Media, Cloud Dataset]

4

Situation

I need fast, on-the-fly access to historical data, combined with real-time data from the stream, for analysis.

5

Traditional Architecture

[Diagram: Historical Data and New Data]

6

Lambda Architecture

A data processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods

https://en.wikipedia.org/wiki/Lambda_architecture

7

Lambda Architecture

• Batch Layer

• Speed Layer

• Serving Layer

https://www.mapr.com/developercentral/lambda-architecture
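
The three layers listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Talend or Spark code: the page-hit records, the `SpeedLayer` class, and the `batch_view` and `serve` functions are hypothetical names invented for the example. The batch layer recomputes a view from the full historical dataset, the speed layer keeps an incremental view of events that arrived since the last batch run, and the serving layer merges both views to answer a query.

```python
from collections import Counter

# Hypothetical data: (page, hits) records. Illustrative only.
master_dataset = [("home", 3), ("cart", 1), ("home", 2)]  # historical data
recent_events = [("home", 1), ("checkout", 4)]            # new data


def batch_view(events):
    """Batch layer: recompute totals from the complete master dataset."""
    view = Counter()
    for page, hits in events:
        view[page] += hits
    return view


class SpeedLayer:
    """Speed layer: update a small real-time view one event at a time."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, page, hits):
        self.view[page] += hits


def serve(batch, realtime, page):
    """Serving layer: merge the batch view and the real-time view."""
    return batch[page] + realtime[page]


batch = batch_view(master_dataset)

speed = SpeedLayer()
for page, hits in recent_events:
    speed.on_event(page, hits)

print(serve(batch, speed.view, "home"))      # 6
print(serve(batch, speed.view, "checkout"))  # 4
```

In this sketch the real-time view would be discarded once a new batch run has absorbed the recent events, which is why the two views can simply be added together.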

8

Lambda Architecture

• Robust and Fault Tolerant

• Scalable

• Extensible

9

Lambda Architecture

But I still use different technologies to handle Batch & Streaming!

10

Introducing Spark

Unified Development Framework

11

Spark Batch

http://www.slideshare.net/databricks/2015-0317-scala-days

12

Spark Streaming
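
The idea of one development framework for both modes can be illustrated without any Spark API. The sketch below is plain Python and every name in it (`count_words`, `micro_batches`, the sample lines) is a hypothetical stand-in: the business logic is written once, then applied to a historical dataset in a single pass and to a live feed cut into small batches, which is roughly how Spark Streaming treats a stream as a sequence of small batch jobs.

```python
from itertools import islice


def count_words(records):
    """Business logic written once: case-insensitive word counts."""
    counts = {}
    for line in records:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def micro_batches(stream, size):
    """Cut an iterator into lists of at most `size` records."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Batch mode: the whole historical dataset in one pass.
historical = ["Spark batch", "spark streaming"]
batch_result = count_words(historical)

# Streaming mode: the same function applied to each micro-batch.
live_feed = iter(["Lambda spark", "batch layer", "speed layer"])
stream_results = [count_words(chunk) for chunk in micro_batches(live_feed, 2)]

print(batch_result)
print(stream_results)
```

The point of the design is that `count_words` is not duplicated: only the way records are fed to it differs between the two modes.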

13

[Diagram: platform overview. Products: Application Integration, Cloud Integration, Data Integration, Big Data Integration, Master Data Management]

• Studio: Comprehensive Eclipse-based user interface

• Repository: Consolidated metadata & project information

• Deployment: Web-based deployment & scheduling

• Execution: Same container for batch processing, message routing & services

• Monitoring: Single web-based monitoring console

• Self-Service: Discovery & cleansing for business users

14

Data Fabric

[Diagram: the same platform overview as on the previous slide, with the five integration products grouped under the label "Data Fabric"]

15

1st Data Integration Platform on Spark

Real Time Big Data Integration and Unlimited Scale

Visually develop jobs that run 100% on Spark

• 5X faster using independent benchmarks

• 10X developer productivity gained over hand-coding

Spark

• 100X faster with in-memory processing

900 components including 100+ new Spark components

• HDFS, RDBMS, NoSQL, Cloud Storage, Transformation, Messaging, In-memory analytics & machine learning recommendations, and much more

• In-memory data caching & “windowed” computations

• Click to enable Spark Streaming for real-time data processing

Benefits: Make decisions faster. Tremendous developer productivity.
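
A "windowed" computation, as mentioned on this slide, aggregates over the most recent slice of a stream rather than over everything seen so far. The following is a minimal plain-Python sketch, not the Spark Streaming API: `windowed_sums` and the sample batches are hypothetical, and the window length is measured in number of micro-batches here, whereas Spark Streaming expresses it as a time duration.

```python
from collections import deque


def windowed_sums(batches, window_length):
    """Yield the sum over the last `window_length` micro-batches."""
    window = deque(maxlen=window_length)  # oldest batch total drops out automatically
    for batch in batches:
        window.append(sum(batch))
        yield sum(window)


# Four micro-batches of numeric readings (illustrative data).
batches = [[1, 2], [3], [4, 5], [6]]
results = list(windowed_sums(batches, window_length=2))
print(results)  # [3, 6, 12, 15]
```

Each output value covers only the current and previous micro-batch, so old data stops influencing the result as the window slides forward.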

16

Talend Demonstration

1. Talend Studio User Interface

2. Building a Spark Job

3. Building a Real-time Recommendation pipeline

4. Introduction to the Talend Real-time Big Data Sandbox

17

For More Information

- Download the Talend Sandbox: http://www.talend.com/products/real-time-big-data

- Check the Apache Spark project: http://spark.apache.org/

- Find out more about the Lambda Architecture: http://lambda-architecture.net/