How to Build Fast Data Applications: Evaluating the Top Contenders


Transcript of How to Build Fast Data Applications: Evaluating the Top Contenders

Page 1: How to Build Fast Data Applications: Evaluating the Top Contenders


HOW TO BUILD FAST DATA APPLICATIONS: EVALUATING THE TOP CONTENDERS

Dheeraj Remella, Director of Solutions Architecture

VoltDB

Page 2: How to Build Fast Data Applications: Evaluating the Top Contenders

© 2016 VoltDB

OUR SPEAKER

Dheeraj Remella, Dir. of Solutions Architecture, VoltDB

Page 3: How to Build Fast Data Applications: Evaluating the Top Contenders


VOLTDB – PURPOSE-BUILT FOR FAST DATA

•  What?
   •  Operational database with integrated processing and data pipeline in a single system

•  Why?
   •  “Streaming apps are really database apps when your database is fast enough”

Page 4: How to Build Fast Data Applications: Evaluating the Top Contenders

Collect, Explore, Analyze, Act

Big Data analytic results:

1.  Discoveries: seasonal predictions, scientific results, long-term capacity planning

2.  Optimizations: market segmentation, fraud heuristics, optimal customer journey

Page 5: How to Build Fast Data Applications: Evaluating the Top Contenders


DATA ARCHITECTURE FOR FAST + BIG DATA

[Diagram labels]

•  Enterprise Apps: CRM, ERP, etc.; ETL
•  BIG DATA: Data Lake (HDFS, etc.), SQL on Hadoop, Map Reduce, Exploratory Analytics, BI Reporting
•  FAST DATA: Fast Operational Database, Ingest / Interactive, Export, Real-time Analytics, Fast Serve Analytics, Decisioning

Page 6: How to Build Fast Data Applications: Evaluating the Top Contenders


IN THE BIG CORNER

Systems facilitating exploration and analytics of large collections.

Example Technologies
•  Columnar OLAP warehouses
•  Hadoop Ecosystem
   •  MapReduce
   •  Hive, Pig
   •  SQL.next: Impala, Drill, Shark

Example Applications
•  User segmentation & pre-scoring
•  Seasonal trending
•  Recommendation matrices
•  Building search indexes
•  Data Science: statistical clustering, machine learning

Page 7: How to Build Fast Data Applications: Evaluating the Top Contenders


IN THE FAST CORNER

Systems facilitating real time ingest, analytics and decisions against incoming streams of events.

Example Technologies
•  Streaming frameworks
•  Fast OLAP
•  VoltDB (fast OLTP)

Example Applications
•  Micro-personalization
•  Recommendation serving
•  Alerting/alarming
•  Operational monitoring
•  Data enrichment (ETL elimination)
•  High throughput authorization
   •  Ex: API quota enforcement
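To make the API quota enforcement example concrete, here is a minimal single-process sketch in Python. It is not VoltDB code: the class `QuotaEnforcer` and the parameters `limit` and `window_seconds` are hypothetical names chosen for illustration, and the state lives in an in-memory dictionary. The point it illustrates is that every request both reads and updates state and receives an immediate decision, which is why this kind of workload resembles fast OLTP.

```python
from collections import defaultdict


class QuotaEnforcer:
    """Fixed-window API quota check (hypothetical names, in-memory state)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (api_key, window index) -> number of requests admitted so far
        self.counts = defaultdict(int)

    def authorize(self, api_key, timestamp):
        """Return True and count the request if the key is under quota."""
        window = int(timestamp // self.window_seconds)
        key = (api_key, window)
        if self.counts[key] >= self.limit:
            return False  # over quota: reject, counter unchanged
        # In a real system this read-check-update must run atomically.
        self.counts[key] += 1
        return True


enforcer = QuotaEnforcer(limit=2, window_seconds=60)
# Two requests fit in the first window, the third is rejected,
# and the request at t=61 falls into a fresh window.
decisions = [enforcer.authorize("key-1", t) for t in (0, 10, 20, 61)]
```

In a distributed deployment the check and the increment would need to execute as one transaction per key; the sketch sidesteps that by running in a single thread.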

Page 8: How to Build Fast Data Applications: Evaluating the Top Contenders


TYPICAL FAST DATA QUESTIONS

[Diagram labels: Hadoop, Volume; SQL / OLAP, Data Science; Fast, Velocity]

•  Is the fast layer streaming?
   •  It is often more like fast OLTP

•  How do the pieces communicate?
   •  OLAP analytics from Big -> Fast
   •  New events from Fast -> Big

•  Where do “analytics” belong?
   •  Analytics per-event: with Fast
   •  Analytics across history: with Big

•  Are streaming frameworks equivalent?
   •  Traditional SQL CEP (Esper, Streambase)
   •  Tuple DAGs (Storm)
   •  Window processors on Hadoop (Spark)

Page 9: How to Build Fast Data Applications: Evaluating the Top Contenders


HOW TO SOLVE IT*

*  With credit to G. Polya

Considering Data / Considering Processing

What are the types of data to be managed in fast data applications?

How does data flow through fast data applications?

What are the calculations & analytics that are necessary?

Page 10: How to Build Fast Data Applications: Evaluating the Top Contenders

Data | Temporality | Examples

Incoming events | Event Stream | Click stream, tick stream, sensors, metrics

Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, time-series rollups

Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data

OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends

Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations

Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
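The difference between an event stream and persistent, queryable analytic results can be sketched as follows. This is an illustrative Python example under stated assumptions: the function `rollup_per_minute`, the `(timestamp, metric, value)` event shape, and the one-minute bucket size are all hypothetical. Events pass through once, while the rollup table they update remains available for queries.

```python
from collections import defaultdict


def rollup_per_minute(events):
    """Aggregate (timestamp_seconds, metric, value) events into per-minute rollups."""
    rollups = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for timestamp, metric, value in events:
        minute = int(timestamp // 60) * 60  # start of the minute bucket
        bucket = rollups[(metric, minute)]
        bucket["count"] += 1
        bucket["sum"] += value
    return dict(rollups)


# Incoming events: transient, seen once.
events = [
    (5, "latency_ms", 10.0),
    (42, "latency_ms", 30.0),
    (65, "latency_ms", 20.0),
]

# Real-time analytic results: persistent state that can be queried later.
rollups = rollup_per_minute(events)
first_minute = rollups[("latency_ms", 0)]
average_first_minute = first_minute["sum"] / first_minute["count"]
```

Storing count and sum rather than the average keeps each bucket cheap to update per event and lets buckets be recombined into coarser rollups later.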

Page 11: How to Build Fast Data Applications: Evaluating the Top Contenders


SOURCES OF STATE

1.  Analytics outputs must be query-able.

2.  “Lookup tables” to create groupings for analytics and to supply enrichment data.

3.  Session management: grouping, filtering and aggregating create intermediate state.

Page 12: How to Build Fast Data Applications: Evaluating the Top Contenders

Considering Data / Considering Processing

What are the types of data to be managed in fast data applications?

How does data flow through fast data applications?

What are the calculations & analytics that are necessary?

Page 13: How to Build Fast Data Applications: Evaluating the Top Contenders


DATA FLOWS

Real-time Analytics
•  Streaming summaries for operations
•  KPI measurement
•  Analytics for apps

Page 14: How to Build Fast Data Applications: Evaluating the Top Contenders


DATA FLOWS

Fast Request/Response (and side effects)
•  Mobile Authorization
•  Campaign Evaluation
•  Quota Enforcement
•  Micro-Personalization
•  Recommendation Serving

Page 15: How to Build Fast Data Applications: Evaluating the Top Contenders


DATA FLOWS

Data Pipelines
•  Data enrichment
•  Sessionization and re-assembly of incoming events
•  Correlation (by time, location, identity)
•  Filtering

[Diagram: Pipeline -> Data Lake]
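The pipeline steps listed on this slide can be sketched in a few lines of Python. This is an illustration only: the functions `enrich` and `sessionize`, the event fields (`user`, `device_id`, `ts`), the `device_lookup` table, and the 60-second inactivity gap are all hypothetical. Note that enrichment needs a look-up table and sessionization needs per-user intermediate state, which is where the state requirements of a pipeline come from.

```python
def enrich(event, device_lookup):
    """Add look-up metadata to an event; unknown devices get region None."""
    return {**event, "region": device_lookup.get(event["device_id"])}


def sessionize(events, gap_seconds):
    """Group each user's events into sessions split by inactivity gaps."""
    sessions = {}  # user -> list of sessions, each a list of events
    for event in sorted(events, key=lambda e: e["ts"]):
        user_sessions = sessions.setdefault(event["user"], [])
        last_ts = user_sessions[-1][-1]["ts"] if user_sessions else None
        if last_ts is not None and event["ts"] - last_ts <= gap_seconds:
            user_sessions[-1].append(event)  # continue the open session
        else:
            user_sessions.append([event])  # start a new session
    return sessions


device_lookup = {"d1": "EU", "d2": "US"}  # persistent look-up data
raw = [
    {"user": "u1", "device_id": "d1", "ts": 0},
    {"user": "u1", "device_id": "d1", "ts": 20},
    {"user": "u1", "device_id": "d1", "ts": 500},
    {"user": "u2", "device_id": "d9", "ts": 10},
]

enriched = [enrich(e, device_lookup) for e in raw]
filtered = [e for e in enriched if e["region"] is not None]  # drop unknown devices
sessions = sessionize(filtered, gap_seconds=60)
```

Here the three events from `u1` form two sessions, because the third arrives long after the gap, and the `u2` event is filtered out because its device is not in the look-up table.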

Page 16: How to Build Fast Data Applications: Evaluating the Top Contenders


1ST GENERATION FAST DATA: STREAMING ANALYTICS

• Examples: Spark Streaming, Storm, Kinesis, Tibco Streambase, et al.

• Technical:
   • Lack “state” for transaction processing (operational)
   • Complex programming model
   • No ability to do ad hoc queries

• Functional:
   • 1st Gen only offers streaming analytics
   • Separate database required for any meaningful work
   • Proprietary interface is inconsistent with the rest of the data pipeline
   • Does not support the application's requirement for interaction

[Diagram: 1st Gen Streaming: stream analytics, predefined queries]

Page 17: How to Build Fast Data Applications: Evaluating the Top Contenders


2ND GENERATION FAST DATA: STREAMING ANALYTICS & OPERATIONAL WORK

• Streaming Analytics converges with the operational applications

• Convergence is necessary to use data in real-time

• Automated application interactions are informed by data

• Brings the application into the “data analytics” world

• Streaming Analytics alone is passive; Fast Data is interactive

[Diagram: 1st Gen Streaming: stream analytics, predefined queries. 2nd Gen (VoltDB): ad hoc queries, support for operational work]

Page 18: How to Build Fast Data Applications: Evaluating the Top Contenders

Considering Data / Considering Processing

What are the types of data to be managed in fast data applications?

How does data flow through fast data applications?

What are the calculations & analytics that are necessary?

Page 19: How to Build Fast Data Applications: Evaluating the Top Contenders

[Diagram labels: Continuous Query; Transactional Event Evaluation; Transformation]

Page 20: How to Build Fast Data Applications: Evaluating the Top Contenders


FAST DATA STACK

Applications, Message Queues, Data Sources

Ingest

Analyze, Decide

•  Counters
•  Aggregations
•  Time series
•  Statistics

•  Store results
•  Query and recombine
•  Fast serving

•  Per-event policy evaluations
•  Responses (synchronous): authorization, personalization
•  Side-effects (asynchronous): alerts, alarms

Export & Pipeline
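The layers of this stack can be traced through a single event handler. The Python sketch below is a hedged illustration, not an implementation of any product: `handle_event`, the `threshold` policy, and the in-memory `counts`, `alerts`, and `export_queue` structures are hypothetical stand-ins for the analyze, decide, and export layers.

```python
from collections import Counter

counts = Counter()  # analyze: per-user event counters (queryable state)
alerts = []         # side effects, delivered asynchronously in a real system
export_queue = []   # records bound for the export pipeline


def handle_event(event, threshold=3):
    """Ingest one event, update analytics, decide, and queue it for export."""
    user = event["user"]
    counts[user] += 1                    # analyze: update the counter
    allowed = counts[user] <= threshold  # decide: per-event policy evaluation
    if not allowed:
        alerts.append(f"user {user} exceeded {threshold} events")  # side effect
    export_queue.append({**event, "allowed": allowed})  # export & pipeline
    return allowed                       # synchronous response to the caller


responses = [handle_event({"user": "u1", "action": "click"}) for _ in range(4)]
```

The first three events are allowed and the fourth is refused, which also raises one alert. All four events, with their decisions attached, are queued for export, so the downstream store sees both the input and the outcome.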

Page 21: How to Build Fast Data Applications: Evaluating the Top Contenders

APACHE-ISH TECHNOLOGY STACK

Applications, Message Queues, Data Sources

Ingest

Analyze, Decide

•  Counters, Aggregations, Time series, Statistics
•  Store results, Query and recombine, Fast serving
•  Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)

Export & Pipeline

Technologies overlaid on the stack:
•  Kafka / RabbitMQ
•  Storm, Flume, Sqoop
•  Storm + Serving Layer
•  Spark + Serving Layer
•  Cassandra, HBase
•  Hadoop, Message queues

Page 22: How to Build Fast Data Applications: Evaluating the Top Contenders

VOLTDB TECHNOLOGY STACK

Applications, Message Queues, Data Sources

Ingest

Analyze, Decide

•  Counters, Aggregations, Time series, Statistics
•  Store results, Query and recombine, Fast serving
•  Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)

Export & Pipeline

Technologies overlaid on the stack:
•  Kafka / RabbitMQ
•  VoltDB
•  SQL, Java for Analytics
•  Transactions / ACID
•  Hadoop, Message queues

Page 23: How to Build Fast Data Applications: Evaluating the Top Contenders


OLTP (Transactions First)

Streaming Event Processors

OLAP (Columnar Analytics)

Page 24: How to Build Fast Data Applications: Evaluating the Top Contenders

STREAM TECHNOLOGY STACK

Applications, Message Queues, Data Sources

Ingest

Analyze, Decide

•  Counters, Aggregations, Time series, Statistics
•  Store results, Query and recombine, Fast serving
•  Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)

Export & Pipeline

Page 25: How to Build Fast Data Applications: Evaluating the Top Contenders

OLAP TECHNOLOGY STACK

Applications, Message Queues, Data Sources

Ingest

Analyze, Decide

•  Counters, Aggregations, Time series, Statistics
•  Store results, Query and recombine, Fast serving
•  Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)

Export & Pipeline

Page 26: How to Build Fast Data Applications: Evaluating the Top Contenders


QUESTIONS?

•  Use the chat window to type in your questions

•  Try VoltDB yourself: www.voltdb.com/download

Page 27: How to Build Fast Data Applications: Evaluating the Top Contenders


THANK YOU!
