C*ollege Credit: CEP Distributed Processing on Cassandra with Storm
![Page 1: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/1.jpg)
COMPLEX EVENT PROCESSING W/ CASSANDRA
Brian O’Neill, Lead Architect, Health Market Science (@boneill42)
Taylor Goetz, Development Lead, Health Market Science (@ptgoetz)
![Page 2: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/2.jpg)
Agenda
Use Case
○ What is CEP? Why? What for?
Storm Background
○ Cluster configuration
Examples / Demo
Future: Trident
![Page 3: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/3.jpg)
Our Products
Master Data Management
Good, bad doctors?
Prescriber eligibility and remediation.
![Page 4: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/4.jpg)
Cassandra to the Rescue
1000’s of Feeds
Δt
C* Masterfile
Big Data for us == Variety of Data
![Page 5: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/5.jpg)
But…
Search unstructured data
Real-time Analytics / Reporting
Transactional Processing
○ Changes reflected immediately.
Wide-row Indexes
![Page 6: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/6.jpg)
What might that look like?
C*
RDBMS
I’m happy
Dropwizard
wide-row index
Provide for Polyglot Persistence
![Page 7: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/7.jpg)
What we did wrong… (part 1)
Could not react to transactional changes
Needed extra logic to track what changed
Took too long
![Page 8: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/8.jpg)
What we did wrong… (part 2)
Good Intentions
○ Take the onus off the clients
Bad Result
○ Guaranteeing execution
○ Write Overhead
C*Wide Row….Indexing
TRIGGERS
![Page 9: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/9.jpg)
What is Complex Event Processing?
Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events),[1] and deriving a conclusion from them.
Complex event processing, or CEP, is event processing that combines data from multiple sources[2] to infer events or patterns that suggest more complicated circumstances.
http://en.wikipedia.org/wiki/Complex_event_processing
![Page 10: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/10.jpg)
Imagine…
Treating CRUD operations as events in a system.
Then Suddenly,
CEP = (ETL or Analytics)
![Page 11: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/11.jpg)
What Storm is to us…
CrudOp
A High Throughput Data Processing Pipeline
RDBMS
SoR
Dimensional Counts
ETL Enrichment
Fuzzy Index
![Page 12: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/12.jpg)
Enter Storm
![Page 13: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/13.jpg)
Storm Overview
Open-Sourced by Twitter in 2011
Distributed Realtime Computation System
Fault Tolerant
Highly Scalable
Guaranteed Processing
Operates on one or more streams of data (i.e. CEP)
![Page 14: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/14.jpg)
Anatomy of a Storm Cluster
Nimbus
○ Master Node
ZooKeeper
○ Cluster Coordination
Supervisors
○ Worker Nodes
![Page 15: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/15.jpg)
Storm Components
Spouts
○ Stream Sources
Bolts
○ Unit of Computation
Topologies
○ Combination of n Spouts and n Bolts
○ Defines the overall “Computation”
![Page 16: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/16.jpg)
Storm Spouts
Represents a source (stream) of data
○ Queues (JMS, Kafka, Kestrel, etc.)
○ Twitter Firehose
○ Sensor Data
Emits “Tuples” (Events) based on source
○ Primary Storm data structure
○ Set of Key-Value pairs
![Page 17: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/17.jpg)
Storm Bolts
Receive Tuples from Spouts or other Bolts
Operate on, or React to Data
○ Functions/Filters/Joins/Aggregations
○ Database writes/lookups
Optionally emit additional Tuples
![Page 18: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/18.jpg)
Storm Topologies
Data flow between spouts and bolts
Routing of Tuples between spouts/bolts
○ Stream “Groupings”
Parallelism of Components
Long-Lived
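One way to picture a stream grouping: a fields grouping routes every tuple that carries the same value in the grouping field to the same bolt task. The sketch below models only that routing idea in plain Java, with no Storm dependency. The names `FieldsGroupingSketch` and `taskFor` are hypothetical, and the hash-then-modulo rule is an assumption for illustration; Storm's own implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical model of a "fields grouping"; not Storm's actual implementation.
class FieldsGroupingSketch {

    // Pick a task index in [0, numTasks) from the grouping field's value.
    static int taskFor(Object fieldValue, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        // Repeated words always land on the same task.
        String[] words = {"storm", "cassandra", "storm", "trident", "cassandra"};
        for (String word : words) {
            tasks.get(taskFor(word, numTasks)).add(word);
        }
        for (int i = 0; i < numTasks; i++) {
            System.out.println("task " + i + " received " + tasks.get(i));
        }
    }
}
```

Because the task is a pure function of the field value, a counting bolt can keep per-word state locally without coordinating with other tasks.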
![Page 19: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/19.jpg)
Storm and Cassandra
Use Cases:
Write Storm Tuple data to C*
○ Computation Results
○ Pre-computed indices
Read data from C* and emit Storm Tuples
○ Dynamic Lookups
http://github.com/hmsonline/storm-cassandra
![Page 20: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/20.jpg)
Storm Cassandra Bolt Types
CassandraBolt
C*CassandraLookupBolt
STORM
CassandraBolt
○ Writes data to Cassandra
○ Available in Batching and Non-Batching
CassandraLookupBolt
○ Reads data from Cassandra
![Page 21: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/21.jpg)
Storm-Cassandra Project
Provides generic Bolts for writing/reading Storm Tuples to/from C*
Diagram: Storm Tuple → TupleMapper → C* Rows; C* Columns → ColumnsMapper → Storm Tuples
![Page 22: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/22.jpg)
Storm-Cassandra Project
TupleMapper Interface
○ Tells the CassandraBolt how to write a tuple to an arbitrary data model
Given a Storm Tuple:
○ Map to Column Family
○ Map to Row Key
○ Map to Columns
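The three mapping steps can be sketched as a small interface. This is a minimal illustration, not the storm-cassandra API: `SketchTupleMapper`, `WordCountMapper`, the `word_counts` column family, and the use of a `Map` to stand in for a Storm Tuple are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a tuple mapper; a tuple is modeled as field name -> value.
interface SketchTupleMapper {
    String mapToColumnFamily(Map<String, Object> tuple);
    String mapToRowKey(Map<String, Object> tuple);
    Map<String, String> mapToColumns(Map<String, Object> tuple);
}

// Maps a (word, count) tuple to row key = word with a single "count" column.
class WordCountMapper implements SketchTupleMapper {
    @Override
    public String mapToColumnFamily(Map<String, Object> tuple) {
        return "word_counts"; // assumed column family name
    }

    @Override
    public String mapToRowKey(Map<String, Object> tuple) {
        return String.valueOf(tuple.get("word"));
    }

    @Override
    public Map<String, String> mapToColumns(Map<String, Object> tuple) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("count", String.valueOf(tuple.get("count")));
        return columns;
    }
}

class TupleMapperDemo {
    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("word", "storm");
        tuple.put("count", 3L);
        SketchTupleMapper mapper = new WordCountMapper();
        // Shows where the write would go: column family, row key, then columns.
        System.out.println(mapper.mapToColumnFamily(tuple)
                + "[" + mapper.mapToRowKey(tuple) + "] = " + mapper.mapToColumns(tuple));
    }
}
```

Keeping the data-model decision in the mapper means the bolt that performs the write stays generic.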
![Page 23: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/23.jpg)
Storm-Cassandra Project
ColumnsMapper Interface
○ Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple
Given a C* Row Key and list of Columns:
○ Return a list of Storm Tuples
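The read direction can be sketched the same way. Again this is an illustration under assumptions, not the project's API: `SketchColumnsMapper` and `OneTuplePerColumnMapper` are hypothetical names, a row's columns are modeled as a `Map`, and each emitted tuple is modeled as a list of values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a columns mapper; tuples are modeled as lists of values.
interface SketchColumnsMapper {
    List<List<Object>> mapToValues(String rowKey, Map<String, String> columns);
}

// Emits one (rowKey, columnName, columnValue) tuple per column in the row.
class OneTuplePerColumnMapper implements SketchColumnsMapper {
    @Override
    public List<List<Object>> mapToValues(String rowKey, Map<String, String> columns) {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(rowKey);
            tuple.add(column.getKey());
            tuple.add(column.getValue());
            tuples.add(tuple);
        }
        return tuples;
    }
}

class ColumnsMapperDemo {
    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("colA", "valA");
        columns.put("colB", "valB");
        // One row with two columns yields two tuples.
        System.out.println(new OneTuplePerColumnMapper().mapToValues("row1", columns));
    }
}
```

A wide row therefore fans out into many tuples, which is one way a lookup bolt could feed downstream bolts.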
![Page 24: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/24.jpg)
Storm-Cassandra Project
Current State:
Version 0.4.0-WIP
Uses Astyanax Client
Several out-of-the-box *Mapper Implementations:
○ Basic Key-Value Columns
○ Value-less Columns
○ Counter Columns
○ Lookup by row key
○ Lookup by range query
Initial pass at Trident support
Initial pass at Composite Column Support
![Page 25: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/25.jpg)
Storm-Cassandra Project
Future Plans:
Switch to CQL (???)
Full Trident Support
![Page 26: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/26.jpg)
Word Count Demo
http://github.com/hmsonline/storm-cassandra
![Page 27: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/27.jpg)
DRPC
![Page 28: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/28.jpg)
Reach Demo
![Page 29: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/29.jpg)
Next Level : Trident
![Page 30: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/30.jpg)
Trident
Provides a higher-level abstraction for stream processing
○ Constructs for state management and batching
Adds additional primitives that abstract away common topological patterns
Deprecates transactional topologies
Distributed with Storm
![Page 31: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/31.jpg)
Sample Trident Operations
Partition Local
Functions ( execute(x) → x + y )
Filters ( isKeep(x) → 0,x )
PartitionAggregate
○ Combiner ( pairwise combining )
○ Reducer ( iterative accumulation )
○ Aggregator ( byoa )
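The difference between these operations is easier to see on a single in-memory partition. The sketch below is plain Java, not the Trident API: `PartitionLocalSketch` and its method names are hypothetical, and a tuple is reduced to a single integer for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

// Hypothetical plain-Java model of partition-local operations; this is not the Trident API.
class PartitionLocalSketch {

    // Function: appends a derived field to each tuple, turning x into the pair (x, x + y).
    static List<List<Integer>> addField(List<Integer> partition, int y) {
        List<List<Integer>> out = new ArrayList<>();
        for (Integer x : partition) {
            out.add(Arrays.asList(x, x + y));
        }
        return out;
    }

    // Filter: each tuple is either kept or dropped.
    static List<Integer> filter(List<Integer> partition, Predicate<Integer> isKeep) {
        List<Integer> kept = new ArrayList<>();
        for (Integer x : partition) {
            if (isKeep.test(x)) {
                kept.add(x);
            }
        }
        return kept;
    }

    // Combiner: values are merged pairwise, starting from a zero value.
    static int combine(List<Integer> partition, int zero, BinaryOperator<Integer> combiner) {
        int result = zero;
        for (Integer x : partition) {
            result = combiner.apply(result, x);
        }
        return result;
    }

    // Reducer: a running accumulator is updated once per tuple (here, a count).
    static long reduceCount(List<Integer> partition) {
        long count = 0;
        for (Integer ignored : partition) {
            count = count + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(1, 2, 3, 4);
        System.out.println("function: " + addField(partition, 10));
        System.out.println("filter:   " + filter(partition, x -> x % 2 == 0));
        System.out.println("combiner: " + combine(partition, 0, Integer::sum));
        System.out.println("reducer:  " + reduceCount(partition));
    }
}
```

A combiner's pairwise merge can be applied to partial results from several partitions, while a reducer walks the tuples one at a time.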
![Page 32: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/32.jpg)
A sample topology
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(MemcachedState.opaque(serverLocations), new Count(), new Fields("count"))
        .parallelismHint(6);
https://github.com/nathanmarz/storm/wiki/Trident-state
![Page 33: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/33.jpg)
Trident State
Sequenced writes by batch/transaction id.
Spouts
Transactional
○ Batch contents never change
Opaque
○ Batch contents can change
State
Transactional
○ Store tx_id with counts to maintain sequencing of writes.
Opaque
○ Store previous value in order to overwrite the current value when contents of a batch change.
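The two update rules can be sketched for a single counter. This is a plain-Java model, not the Trident State API: the class names are hypothetical, and using `-1` to mean "no batch applied yet" is an assumption made for the example.

```java
// Hypothetical model of the two state update rules; this is not the Trident State API.
class TransactionalCount {
    long txId = -1; // id of the last batch applied; -1 is an assumed "none yet" marker
    long count = 0;

    // Batch contents never change, so a replay of the same txId can be skipped.
    void apply(long batchTxId, long partialCount) {
        if (batchTxId == txId) {
            return;
        }
        count += partialCount;
        txId = batchTxId;
    }
}

class OpaqueCount {
    long txId = -1; // id of the last batch applied
    long prev = 0;  // value before the batch identified by txId was applied
    long count = 0;

    // Batch contents may change on replay, so recompute from the previous value.
    void apply(long batchTxId, long partialCount) {
        if (batchTxId == txId) {
            count = prev + partialCount;
        } else {
            prev = count;
            count += partialCount;
            txId = batchTxId;
        }
    }
}

class TridentStateDemo {
    public static void main(String[] args) {
        TransactionalCount transactional = new TransactionalCount();
        transactional.apply(1, 5);
        transactional.apply(1, 5); // replay of the same batch is ignored
        System.out.println("transactional count = " + transactional.count);

        OpaqueCount opaque = new OpaqueCount();
        opaque.apply(1, 5);
        opaque.apply(1, 4); // replay with different contents overwrites the first attempt
        System.out.println("opaque count = " + opaque.count);
    }
}
```

In the demo the transactional counter ends at 5 and the opaque counter ends at 4, since the replayed batch replaces the earlier partial count instead of adding to it.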
![Page 34: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/34.jpg)
Shameless Shoutouts
HMS (https://github.com/hmsonline/)
○ storm-cassandra
○ storm-elastic-search
○ storm-jdbi (coming soon)
ptgoetz (https://github.com/ptgoetz)
○ storm-jms
○ storm-signals
![Page 35: C*ollege Credit: CEP Distribtued Processing on Cassandra with Storm](https://reader035.fdocuments.us/reader035/viewer/2022062511/54c665fb4a795928268b458e/html5/thumbnails/35.jpg)
Brian O’Neill, Lead Architect, Health Market Science (@boneill42)
Taylor Goetz, Development Lead, Health Market Science (@ptgoetz)
THANKS!