Coherence & Big Data
Ben Stopford
Can you do ‘Big Data’ in Coherence?
Maybe?!?!?!
• Problem: Cost of memory / 6x storage ratio
  – Elastic data (disk or RAM)
  – Keep the number of indexes small
  – Off-heap indexes (coming)
• Problem: Getting your (big) data loaded
  – Recoverable caching
  – Use another distributed backing store
But
• Elastic data & recoverable caching are separate (plan to unify)
  – RC => ED is IO intensive (two distinct copies)
  – 2x disk footprint
  – No compression
  – Rebalance time
  – Memory ratio (the 6x)
>>> Low TB zone
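A back-of-envelope sizing sketch shows why the 6x ratio keeps a purely in-memory deployment in the low-TB zone. It assumes the 6x means roughly six bytes of cluster RAM provisioned per byte of raw data; the 256 GB node size is a hypothetical figure, not one from the slides.

```python
import math

# From the slide: ~6x memory ratio. Reading it as "RAM per byte of raw data"
# is an assumption made for this sketch.
MEMORY_RATIO = 6


def ram_needed_tb(raw_data_tb, ratio=MEMORY_RATIO):
    """Cluster RAM (TB) needed to hold raw_data_tb of data fully in memory."""
    return raw_data_tb * ratio


def nodes_needed(raw_data_tb, ram_per_node_gb):
    """Node count for a given usable RAM per node in GB (hypothetical figure)."""
    total_gb = ram_needed_tb(raw_data_tb) * 1024
    return math.ceil(total_gb / ram_per_node_gb)


if __name__ == "__main__":
    for raw_tb in (1, 5, 20):
        print(raw_tb, "TB raw ->", ram_needed_tb(raw_tb), "TB RAM,",
              nodes_needed(raw_tb, ram_per_node_gb=256), "nodes of 256 GB")
```

Under these assumptions the node count grows linearly with the data set, which is the cost pressure the elastic data and backing-store options are meant to relieve.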
BIG DATA BANDWAGON
Backing Layer
[Diagram labels: Coherence; NoSQL]
• Recent data in cache
• Fast data load
• Lower cost full history
• Write-through
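The backing-layer idea on this slide can be sketched as a small write-through cache. This is a minimal illustration, not Coherence's API: `WriteThroughCache` is a hypothetical class, and a plain dict stands in for the NoSQL store.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Bounded cache of recent entries in front of a store holding full history."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict as a stand-in for a NoSQL store (assumption)
        self.capacity = capacity
        self.recent = OrderedDict()         # least recently used entry first

    def _cache(self, key, value):
        self.recent[key] = value
        self.recent.move_to_end(key)
        while len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # evicted from cache, still in the store

    def put(self, key, value):
        # Write-through: update the backing store synchronously, then the cache.
        self.backing_store[key] = value
        self._cache(key, value)

    def get(self, key):
        if key in self.recent:
            self.recent.move_to_end(key)
            return self.recent[key]
        value = self.backing_store[key]      # read-through on a miss (KeyError if unknown)
        self._cache(key, value)
        return value


store = {}
cache = WriteThroughCache(store, capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(k, v)
```

After the three puts the cache holds only the two most recent keys, while the store keeps all three; reading `"a"` pulls it back in from the store.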
Hadoop
• Backing
  – HDFS
    • Big files (~GBs)
    • No random write (OK if you journal writes)
    • Use sequence files
    • Hard to manage active set
  – HBase (better option)
    • Fast writes (LSM)
    • Supports predicate pushdown
    • More complex setup (ZK, NN, etc.)
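The "no random write, OK if you journal writes" point can be sketched as an append-only journal that is replayed to recover the latest state. This is an illustrative sketch only: an in-memory `io.StringIO` stands in for an append-only file, and the JSON-lines record format is an assumption, not a sequence file.

```python
import io
import json


def journal_write(journal, key, value):
    """Append one update; earlier records are never modified in place."""
    journal.seek(0, io.SEEK_END)
    journal.write(json.dumps({"key": key, "value": value}) + "\n")


def replay(journal):
    """Rebuild the latest value per key by reading the journal front to back."""
    state = {}
    journal.seek(0)
    for line in journal:
        record = json.loads(line)
        state[record["key"]] = record["value"]  # later records win
    return state


journal = io.StringIO()  # stand-in for an append-only file (assumption)
journal_write(journal, "trade-1", {"qty": 100})
journal_write(journal, "trade-2", {"qty": 50})
journal_write(journal, "trade-1", {"qty": 120})  # an "update" is just another append
latest = replay(journal)
```

The journal keeps all three records, so the cost of avoiding random writes is extra storage and a replay step on load.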
NoSQL Backing
• Cassandra: low memory footprint, write optimised
• MongoDB: read/memory optimised (3.0 big improvement). Rich queries.
• Oracle NoSQL: KV with secondary indexes & range predicates
• Riak: KV but can scan with MR API. Eventual consistency may not suit.
• Couchbase: heavily memory optimised. Fast but too similar to Coherence to be a good fit.
Streams
Message Stream Products
• RabbitMQ
• Kafka
• Aeron
Messaging as a Backing Store
• Great complement for Coherence
• Write through to a topic. Immutable state.
[Diagram labels: clients (sync, async); cache of recent data with a rich query API; event stream (system of record); async views: relational, raw, streaming, historic; async streaming; other data center; DB; inbound stream processors; direct reads & writes]
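Writing through to a topic of immutable events can be sketched as below. `Topic` and `TopicBackedCache` are hypothetical names; the in-memory list is a stand-in for a Kafka-style topic rather than any real client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Immutable record of one write."""
    offset: int
    key: str
    value: object


class Topic:
    """Append-only event log; stand-in for a Kafka-style topic (assumption)."""

    def __init__(self):
        self._events = []

    def append(self, key, value):
        event = Event(len(self._events), key, value)
        self._events.append(event)
        return event

    def read_from(self, offset=0):
        return list(self._events[offset:])


class TopicBackedCache:
    """Cache whose writes go to the topic first; the topic is the system of record."""

    def __init__(self, topic):
        self.topic = topic
        self.data = {}

    def put(self, key, value):
        self.topic.append(key, value)  # write through to the topic
        self.data[key] = value

    @classmethod
    def recover(cls, topic):
        """Reload a cache by replaying the topic from the beginning."""
        cache = cls(topic)
        for event in topic.read_from(0):
            cache.data[event.key] = event.value
        return cache


topic = Topic()
cache = TopicBackedCache(topic)
cache.put("a", 1)
cache.put("a", 2)
cache.put("b", 3)
recovered = TopicBackedCache.recover(topic)
```

Because every write is kept as a separate event, the topic holds the full history while the cache holds only the current value per key.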
Hang Tertiary ‘VIEWS’
• Search: Elasticsearch, Solr
• Graph: Neo4j, OrientDB
• Relational: Oracle, Postgres, Teradata
• Analytic: Exadata, Teradata, Greenplum
• Document archive: Mongo
• Hadoop: HBase, HDFS, Parquet, Avro, PB, etc.
• Complexity increases with the Polyglot Persistence pattern.
• Replica instantiation is good
Stream Processors
• Storm
• Samza
• Spark Streaming (microbatch)
• Libraries such as Esper
Lambda Architecture
[Diagram labels: All your data; Stream layer (fast); Batch Layer; Serving Layer; Query; Query]
Lambda Architecture
[Diagram labels: All your data; Kafka + Storm; Hadoop; Cassandra; Query; Query]
- Cool architecture for use cases that cannot work in a single pass.
- General applicability limited by double-query & double-coding.
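The double-query and double-coding cost can be seen in a toy count-per-key example. This is a hedged sketch with hypothetical names (`batch_view`, `SpeedView`, `query`); it uses no Storm, Hadoop or Cassandra API.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: full recomputation over all data, complete as of the last run."""
    return Counter(e["key"] for e in events)


class SpeedView:
    """Stream layer: incremental counts for events the batch run has not covered."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        # Double-coding: the same counting logic is written a second time here.
        self.counts[event["key"]] += 1


def query(key, batch, speed):
    # Double-query: every read consults both layers and merges the results.
    return batch[key] + speed.counts[key]


history = [{"key": "EURUSD"}, {"key": "EURUSD"}, {"key": "GBPUSD"}]
batch = batch_view(history)
speed = SpeedView()
speed.on_event({"key": "EURUSD"})  # arrived after the last batch run
total = query("EURUSD", batch, speed)
```

Even for a count, the logic lives in two places and the merge rule has to be kept correct as either layer changes.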
Kappa Architecture
[Diagram labels: All your data; Stream; Stream Processor; Views: Search, NoSQL, SQL; Client; Client]
Kappa Architecture
[Diagram labels: All your data; Kafka; Samza or Storm; Views: Elasticsearch, Cassandra, Oracle; Client; Client]
- Simpler choice where stream processors can handle full problem set
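In contrast to Lambda, a Kappa-style pipeline has one code path: a stream processor reads the log and populates every view. The sketch below uses hypothetical toy views and a Python list as the log; it is not Samza or Storm code, and the inverted index does not remove stale words on update.

```python
class SearchView:
    """Toy inverted index: word -> set of keys (stale words are not removed)."""

    def __init__(self):
        self.index = {}

    def apply(self, key, text):
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(key)


class KeyValueView:
    """Latest value per key."""

    def __init__(self):
        self.latest = {}

    def apply(self, key, text):
        self.latest[key] = text


def stream_processor(log, views, from_offset=0):
    """Single code path: every event in the log is applied to every view."""
    for key, text in log[from_offset:]:
        for view in views:
            view.apply(key, text)
    return len(log)  # next offset to resume from


log = [("n1", "Coherence cache"), ("n2", "Kafka stream"), ("n1", "Coherence grid")]
search, kv = SearchView(), KeyValueView()
next_offset = stream_processor(log, [search, kv])
```

Adding a new view means writing one more `apply` method and replaying the log into it, with no separate batch job to keep in step.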
Operational / Analytic Bridge
[Diagram labels: All your data; Client; Client; Client; Operational; Stream; Stream Processor; Views: Search, SQL, NoSQL]
Operational / Analytic Bridge
[Diagram labels: All your data; Client; Client; Client; Coherence; Kafka, RabbitMQ; Samza; Views: Hadoop, Oracle, Cassandra, MongoDB, …]
- Adds coordination layer needed for collaborative updates
Nice Stuff
• Scale-by-Sharding at the front, Scale-by-Replication at the back
• Some “normalisation” at front. Fully denormalised at the back.
• Rewind used to recreate ‘views’
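The three points above can be combined in one small sketch: writes are sharded by key at the front, and each back-end view is a full replica rebuilt by rewinding the log. The names (`shard_for`, `build_replica`, `write`) and the CRC32 hash are illustrative assumptions, not Coherence's partitioning scheme.

```python
import zlib


def shard_for(key, shard_count):
    """Front: each key is owned by exactly one shard (stable hash, assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def build_replica(log):
    """Back: each view is a full, denormalised copy built by replaying the whole log."""
    replica = {}
    for key, value in log:
        replica[key] = value
    return replica


log = []                             # stand-in for the event stream
shards = [dict() for _ in range(4)]  # front-end partitions


def write(key, value):
    shards[shard_for(key, len(shards))][key] = value
    log.append((key, value))


for i in range(10):
    write(f"order-{i}", {"qty": i})

replica_a = build_replica(log)
replica_b = build_replica(log)  # "rewind": a new or lost view is recreated from offset 0
```

Each shard holds only its slice of the keys, while every replica holds all of them, so the front scales writes and the back scales reads.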
So
• New Coherence features should make TB+ generally viable
• Sensible caching/processing layer over a simpler store
• NoSQL can provide a sensible interim backing store for larger datasets
• Forms a great write-through layer atop a streaming architecture (Op/Analytic Bridge)
Thanks!