Building Fast Applications for Streaming Data

BUILDING FAST DATA APPLICATIONS WITH STREAMING DATA

Ryan Betts, CTO, VoltDB

© 2014 VoltDB PROPRIETARY

AGENDA

•  Fast Data Application Patterns

•  Digging Deeper: Looking at the Data

•  Streaming Approach

•  DB Approach

•  Summary

Collect, Explore, Analyze, Act

Data leads to applications. Applications create more data.

DATA ARCHITECTURE FOR FAST + BIG DATA

[Diagram labels]

•  Enterprise Apps: CRM, ERP, etc.
•  ETL
•  Data Lake (HDFS, etc.)
•  BIG DATA: SQL on Hadoop, MapReduce, Exploratory Analytics, BI Reporting
•  FAST DATA: Fast Operational Database (Streaming Analytics, Fast Serve Analytics, Decisioning)
•  Ingest / Interactive
•  Export

IN THE BIG CORNER

Systems facilitating exploration and analytics of large data sets

Example Technologies

•  Columnar OLAP warehouses
•  Hadoop Ecosystem
  - MapReduce
  - Hive, Pig
  - SQL.next: Impala, Drill, Shark

Example Applications

•  User segmentation & pre-scoring
•  Seasonal trending
•  Recommendation matrix calculations
•  Building search indexes
•  Data Science: statistical clustering, machine learning

IN THE FAST CORNER

Systems facilitating real time ingest, analytics and decisions against incoming event feeds

Example Technologies

•  Streaming frameworks
•  VoltDB

Example Applications

•  Micro-personalization
•  Recommendation serving
•  Alerting/alarming
•  Operational monitoring
•  Data enrichment (ETL elimination)
•  High throughput authorization
  - Ex: API quota enforcement

REAL TIME SCORING EXAMPLE

[Diagram labels]

•  OLAP / Hadoop
•  User segmentation model: calculated on the Big side and cached in the Fast side
•  Personalization requests
•  Score-based responses
•  Game play events and scoring decisions exported to Big

FAST AND BIG IN COMBINATION

Fast Profile

•  In memory: user segmentation
  - GB to TB (300M+ rows)
•  10k to 1M+ requests/sec
•  99th percentile latency under 5 ms (five 9s, i.e. 99.999%, under 50 ms)
•  VoltDB export to Vertica

Big Profile

•  TB to PB of historical data
•  Columnar analytics for fast reporting
•  Real-time ingest of historical data (possibly via VoltDB)
•  Vertica UDx to VoltDB

TYPICAL FAST QUESTIONS

[Diagram labels: Hadoop, SQL OLAP, Fast]

•  Is the fast layer streaming?
  - It is often more like OLTP
•  How do the pieces communicate?
  - OLAP analytics from Big -> Fast
  - New events from Fast -> Big
•  Where do "analytics" belong?
  - Analytics with decisions: with Fast
  - Analytics against history: with Big
•  Are streaming frameworks equivalent?
  - Traditional SQL CEP (Esper)
  - Tuple DAGs (Storm)
  - Window processors on Hadoop (Spark)

THREE FAST DATA APPLICATION PATTERNS

•  Real-Time Analytics
  - Real-time analytics for operations
  - Real-time KPI measurement
  - Real-time analytics for apps
•  Data Pipelines
  - Streaming data enrichment
  - Sessionization / re-assembly
  - Correlation (by time, by location, by id)
  - Filtering
  - Pre-aggregation
•  Fast Request/Response
  - Mobile Authorization
  - Campaign Authorization
  - Fast API Quota Enforcement
  - Micro-Personalization
  - Recommendation Serving

DATA FLOWS

[Diagram labels: Ingest; Real Time Analytics; Pipeline; Request/Response; Using Real Time Data; Export; Data Lake (HDFS/OLAP/Queue)]

THE INPUT FEED IS ONLY A SMALL PART OF FAST DATA

Data | Temporality | Examples
Input Feed | Event Stream | Click stream, tick stream, sensors, metrics
Real-Time Analytic Results | Persistent/Queryable | Counters, streaming aggregates, time-series rollups
Event Metadata | Persistent (Look-Ups) | Device version, location, user profiles, point of interest data
OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
Responses | Event Stream | Policy enforcement decisions, personalization recommendations
Pipeline Output | Event Stream | Enriched, filtered, correlated transform of input feed

THREE REQUIREMENTS CREATE STATE

1.  RT analytics outputs must be queryable

2.  Metadata, dimension data and "lookup tables" are needed to create groupings for analytics and to supply enrichment data

3.  Grouping, filtering and aggregating generate intermediate state – open sessions, partially assembled logical events
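
To make the third requirement concrete, here is a minimal sketch of sessionization in plain Java. It is not VoltDB or Storm code. The class name `SessionizerSketch`, the inactivity-gap rule and the in-memory map are all assumptions made for illustration. The point is only that open sessions have to be remembered between events.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: sessionization keeps one open session per user.
class SessionizerSketch {
    // One session: first and last event time plus an event count.
    static final class Session {
        final String userId;
        final long startMillis;
        long lastMillis;
        int events;

        Session(String userId, long startMillis) {
            this.userId = userId;
            this.startMillis = startMillis;
            this.lastMillis = startMillis;
            this.events = 1;
        }
    }

    private final long gapMillis;                                // assumed inactivity gap
    private final Map<String, Session> open = new HashMap<>();   // the intermediate state

    SessionizerSketch(long gapMillis) {
        this.gapMillis = gapMillis;
    }

    // Feeds one event; returns the session this event closed, or null if none closed.
    Session onEvent(String userId, long timeMillis) {
        Session current = open.get(userId);
        if (current != null && timeMillis - current.lastMillis <= gapMillis) {
            current.lastMillis = timeMillis;
            current.events++;
            return null;
        }
        open.put(userId, new Session(userId, timeMillis));
        return current; // null when the user had no open session
    }

    int openSessions() {
        return open.size();
    }

    public static void main(String[] args) {
        SessionizerSketch sessions = new SessionizerSketch(1000);
        sessions.onEvent("u1", 0);
        sessions.onEvent("u1", 500);
        Session closed = sessions.onEvent("u1", 5000);
        System.out.println("closed session events: " + closed.events);
    }
}
```

In a real deployment this map would need to be durable, partitioned and queryable, which is the argument the slide is making.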

STORM: A COMMON ALTERNATIVE

•  Spouts and Bolts

•  Streaming computation

•  Run snippets of Java against each event

•  Connect queues to backends with intermediate code

But…

1.  Need a "lookup" database for dimension data.
2.  Need a "serving" database for analytic results.
3.  Need additional management clusters (ZooKeeper).
4.  No ad-hoc queries.
5.  Lots of custom code (rarely declarative).

STREAMING OPERATORS NEED STATE

Require State

•  Filter

•  Join

•  Aggregate

•  Group By

Stateless

•  Partition
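
A small sketch of the contrast, in plain Java with hypothetical names (`OperatorStateSketch`, `GroupByCount`). Partitioning can be computed from the event alone, while a group-by count has to keep a running total per key. The hash-modulo partitioner is an assumption chosen for brevity, not a description of how any particular system routes events.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a stateless versus a stateful streaming operator.
class OperatorStateSketch {
    // Stateless: the partition depends only on the event's key.
    static int partition(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Stateful: a group-by count must remember a running total per key.
    static final class GroupByCount {
        private final Map<String, Long> counts = new HashMap<>();

        // Adds one event for the key and returns the updated count.
        long add(String key) {
            return counts.merge(key, 1L, Long::sum);
        }

        long get(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    public static void main(String[] args) {
        GroupByCount clicksByUser = new GroupByCount();
        for (String user : new String[] {"alice", "bob", "alice"}) {
            clicksByUser.add(user);
        }
        System.out.println("alice=" + clicksByUser.get("alice"));
        System.out.println("partition(alice)=" + partition("alice", 4));
    }
}
```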

VOLTDB: REAL-TIME ANALYTICS

•  Operational analytics and monitoring
•  RT analytics enabling user-facing applications
•  KPI for internal BI/Dashboards
•  In-memory MPP SQL over ODBC/JDBC
•  Cheap + correct materialized views for streaming aggregations

[Diagram labels: Ingest; VoltDB with Metadata (Dimension table) and Session state (Fact table); SQL, Views]

VOLTDB: REQUEST/RESPONSE DECISIONS

•  Authorization
  - RT balance checks, quota enforcement
•  Personalization and Recommendation Serving
  - Combine pre-score with immediate context
•  Fully ACID transaction model
•  Thousands to millions per second
•  At less than 5 ms latencies

[Diagram labels: Metadata (Dimension table); Session state (Fact table); ACID Transactions]
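
As a rough illustration of quota enforcement as a request/response decision, the sketch below checks and consumes a per-key quota in one atomic step. It is plain Java, not a VoltDB stored procedure. The class `QuotaSketch`, the fixed-window policy and the `synchronized` method standing in for a transaction are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: fixed-window API quota, checked and updated atomically.
class QuotaSketch {
    private final int limitPerWindow;
    private final long windowMillis;
    // apiKey -> {windowStartMillis, callsUsedInWindow}
    private final Map<String, long[]> usage = new HashMap<>();

    QuotaSketch(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true and consumes one call if the key is under quota; otherwise false.
    synchronized boolean tryAcquire(String apiKey, long nowMillis) {
        long windowStart = nowMillis - Math.floorMod(nowMillis, windowMillis);
        long[] state = usage.get(apiKey);
        if (state == null || state[0] != windowStart) {
            state = new long[] {windowStart, 0};   // new window: reset the counter
            usage.put(apiKey, state);
        }
        if (state[1] >= limitPerWindow) {
            return false;                          // over quota: reject without updating
        }
        state[1]++;
        return true;
    }

    public static void main(String[] args) {
        QuotaSketch quota = new QuotaSketch(2, 1000);
        System.out.println(quota.tryAcquire("key-1", 0));
        System.out.println(quota.tryAcquire("key-1", 10));
        System.out.println(quota.tryAcquire("key-1", 20));
    }
}
```

The check and the increment must happen together. If they were separate steps, two concurrent requests could both pass the check, which is why the slide pairs authorization with an ACID transaction model.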

VOLTDB: DATA PIPELINES WITH EXPORT

•  Filtering (ex: only RFID / iBeacon readings that show a change from the previous location)
•  Sessionization
•  Common version re-writing
•  Data enrichment
•  MPP streaming Export
•  Row data, Thrift messages, CSV
•  OLAP, HDFS and message queues

[Diagram labels: VoltDB with Metadata (Dimension table) and Session state (Fact table); Export]
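
The filtering example (pass on only readings that show a change of location) can be sketched as follows. This is plain Java with a hypothetical `LocationChangeFilter` class and string-valued tag and location ids, which are assumptions. In the pipeline the slide describes, the last-known location would live in a database table rather than an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: forward a reading only when the tag's location changed.
class LocationChangeFilter {
    private final Map<String, String> lastLocationByTag = new HashMap<>();

    // Records the reading and returns true when it should be passed downstream.
    boolean accept(String tagId, String location) {
        String previous = lastLocationByTag.put(tagId, location);
        return !location.equals(previous);   // first sighting also counts as a change
    }

    public static void main(String[] args) {
        LocationChangeFilter filter = new LocationChangeFilter();
        System.out.println(filter.accept("tag-1", "dock-A"));
        System.out.println(filter.accept("tag-1", "dock-A"));
        System.out.println(filter.accept("tag-1", "dock-B"));
    }
}
```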

PIPELINE DEPLOYED: VOLTDB…

•  Manages game state for online poker and archives completed games to Hadoop.
•  Ingests smart meter readings from concentrators, supports real-time applications and buffers data for end-of-day billing mediation systems.

PIPELINE DEPLOYED: VOLTDB…

•  Ingests RFID readings, supports real-time applications that push social media updates based on VoltDB leaderboards.
•  Processes clickstream logs and exports correlated USERID records for use at CDN endpoints for advertising targeting.
•  Processes SKU catalogs from suppliers to produce a correlated catalog that is exported to indexing and post-processing for an online retailer.

VOLTDB EXPORT ANSWERS THE QUESTIONS:

How do I stream filtered, enriched, updated results to OLAP/HDFS systems?

How do I send alerts, alarms, SMS, or messages to downstream applications?

VOLTDB EXPORT UI

ddl.sql

    CREATE TABLE events (
      EventID INTEGER,
      time TIMESTAMP,
      msg VARCHAR(128));
    EXPORT TABLE events;

deployment.xml

    <export enabled="true" target="file">

Application SQL

    INSERT into TABLE values…

EXPORTING TO HDFS

    <export enabled="true" target="http">
      <configuration>
        <property name="endpoint">
          http://hadoopserver/webhdfs/v1.0/%t/%p.%t.%g.csv
        </property>
      </configuration>
    </export>

EXPORT FORMATS

•  CSV

•  TSV

•  Avro container

•  Raw data

EXTENSIBLE API

    public void onBlockStart() throws RestartBlockException {
    }

    public boolean processRow(int rowSize, byte[] rowData)
        throws RestartBlockException {
    }

    public void onBlockCompletion() throws RestartBlockException {
    }

All of these export connectors are hosted plugins to the VoltDB database. VoltDB manages HA, fault tolerance, configuration, and MPP scale-out.

THANK YOU!

VOLTDB:

•  We say "Ingest & Export" vs. Spout and Bolt
•  Scale (ACID) snippets of Java for each incoming event.

AND…

•  Actually serve real-time analytics via SQL
•  Metadata for lookup/enrichment implicit in DB function
•  Integrate with OLAP systems to use OLAP reports with event-based processing
•  Generate fast transactional responses
•  Support ad-hoc queryability
•  Declarative aggregations vs. code
•  Fast: no need to micro-batch

INTEGRATING DATA SOURCES WITH VOLTDB

BULK LOADERS

•  CSV loader
•  Kafka loader
•  JDBC loader
•  Vertica UDx
•  Extensible loader API

APPLICATION INTERFACES

•  JDBC
•  ODBC
•  HTTP JSON
•  Native client drivers / SDKs

INTEGRATING WITH CSV DATA

    csvloader volttable -f data.csv

INTEGRATING WITH KAFKA

    kafkaloader volttable \
      --zookeeper=zkserver:2181 \
      --topic=topicname

INTEGRATING WITH JDBC SOURCES

    jdbcloader volttable \
      --jdbcurl=jdbc:postgresql://server/db \
      --jdbcdriver=org.postgresql.Driver \
      --jdbctable=table

INTEGRATING WITH HP VERTICA UDX

    SELECT voltdbload(c1, c2, c3
      USING PARAMETERS voltservers='localhost',
      volttable='volttable')
    FROM T;

EXTENSIBLE VOLTBULKLOADER: BULK SMASH

    VoltBulkLoader loader =
        client.getNewBulkLoader(tableName, maxBatchSize, failureCallback);

    for (…) {
      loader.insertRow(handle, values);
    }
    loader.drain();
    loader.close();

All of these tools are built on an MIT-licensed extensible API that provides performance optimizations, batching, and load balancing.

NATIVE CLIENT LIBRARIES

•  Java
•  C++
•  PHP
•  Node.js
•  Go
•  Python
•  Erlang
•  Ruby

Or, just…

    curl 'http://localhost:8080/api/1.0/?Procedure=Vote&Parameters=[1,1,0]'

VOLTDB EXPORT TOPOLOGIES

•  VoltDB -> Queue (Kafka, RabbitMQ)

•  VoltDB -> HDFS (for Pig/Hive/etc. processing)

•  VoltDB -> OLAP (Vertica, Netezza, etc.)

•  VoltDB -> HTTP endpoint, e.g. Elasticsearch

VOLTDB EXPORT PROGRAMMING CONTRACT

•  Export data is durable until exported

•  MPP scale-out of export data flows

•  At-least-once delivery during HA events

•  Built-in row ids (for uniqueness filtering)

•  Extensible API for open source connectors
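
Because delivery is at-least-once, a downstream consumer may see the same exported row twice after an HA event. The sketch below shows one way a consumer could use row ids to drop repeats. It is plain Java with a hypothetical `ExportDeduplicator` class. Representing the row id as a single `long` and keeping every id in an unbounded in-memory set are simplifying assumptions, not the format VoltDB actually emits.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: drop rows that were already delivered once.
class ExportDeduplicator {
    private final Set<Long> seenRowIds = new HashSet<>();

    // Returns true the first time a row id is seen, false for any repeat delivery.
    boolean firstDelivery(long rowId) {
        return seenRowIds.add(rowId);
    }

    public static void main(String[] args) {
        ExportDeduplicator dedup = new ExportDeduplicator();
        long[] delivered = {1L, 2L, 2L, 3L};   // row 2 is redelivered
        for (long rowId : delivered) {
            if (dedup.firstDelivery(rowId)) {
                System.out.println("process row " + rowId);
            }
        }
    }
}
```

A production consumer would bound this state, for example by tracking only a high-water mark per partition, if the ids it receives are ordered.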

INTEGRATING VOLTDB WITH EXPORT TARGETS

•  Local file system export
•  JDBC export
•  Kafka export
•  RabbitMQ export
•  HDFS export
•  HTTP export
•  Extensible API

EXPORTING TO LOCAL FILE SYSTEM

    <export enabled="true" target="file">
      <configuration>
        <property name="type">csv</property>
        <property name="nonce">MyExport</property>
      </configuration>
    </export>

EXPORTING TO JDBC DESTINATIONS

    <export enabled="true" target="jdbc">
      <configuration>
        <property name="jdbcurl">jdbc:postgresql://server/db</property>
        <property name="jdbcuser">guest</property>
      </configuration>
    </export>

EXPORTING TO KAFKA

    <export enabled="true" target="kafka">
      <configuration>
        <property name="metadata.broker.list">server1</property>
      </configuration>
    </export>

EXPORTING TO RABBITMQ

    <export enabled="true" target="rabbitmq">
      <configuration>
        <property name="broker.host">server1</property>
      </configuration>
    </export>