Deep dive into event store using Apache Cassandra

Post on 18-Jul-2015

It's about time: Deep dive into event store using Apache Cassandra

by Nikunj Thakkar

Agenda

Big Data at-a-glance

● What is Big Data?
● So far in AJM Big Data Series
● Where is it? Am I using it?

Introduction to Apache Cassandra

● What, When and Why of Cassandra
● Protocol, Architecture, Queries and everything else
● Interesting use cases
● Demo

Big Data at-a-glance

What is Big Data?

A large amount of data that can only be processed during night hours

What was missing from my job description?

So far in AJM Big Data Series

4 V's of Big Data

Volume

Variety

Velocity

Veracity

CAP Theorem

Consistency

Availability

Partition tolerance
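
As a worked illustration of how a replicated store such as Cassandra trades these properties off per request, the sketch below checks the usual overlap rule: with a replication factor RF, a read that contacts R replicas is guaranteed to meet the latest acknowledged write to W replicas when R + W > RF. The function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(q)                                  # 2
    print(reads_see_latest_write(q, q, rf))   # True: quorum reads + quorum writes
    print(reads_see_latest_write(1, 1, rf))   # False: fast, but may read stale data
```

Contacting fewer replicas favours availability and latency; contacting a quorum on both sides favours consistency.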

Family of NoSQL Databases

Wide Column Store / Column Families
Document Store
Key Value / Tuple Store
Graph Databases
Multimodel Databases
Object Databases
Grid & Cloud Database Solutions
XML Databases
Multidimensional Databases
Multivalue Databases

Big Data: Where is it?

Big Data: Am I using it?

Targeted marketing
Public sector
Health care
Social media and web data
Global personal location tracking
Automated device-generated data

Introduction to Apache Cassandra

Hey wait, first tell me about events and event stores.
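
The slide only poses the question, so the following is a minimal sketch under the usual definitions: an event is an immutable record that something happened, and an event store is an append-only log of such records from which current state can be rebuilt by replay. The class, field, and event names (`InMemoryEventStore`, `deposited`, `withdrawn`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    stream_id: str   # which entity the event belongs to
    sequence: int    # position within the stream, assigned on append
    kind: str        # e.g. "deposited" or "withdrawn"
    amount: int


class InMemoryEventStore:
    """Append-only log of events, grouped by stream."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = defaultdict(list)

    def append(self, stream_id: str, kind: str, amount: int) -> Event:
        stream = self._streams[stream_id]
        event = Event(stream_id, len(stream) + 1, kind, amount)
        stream.append(event)  # events are only ever added, never changed
        return event

    def read(self, stream_id: str) -> List[Event]:
        # Return a copy so callers cannot mutate the stored log.
        return list(self._streams.get(stream_id, []))


def balance(events: List[Event]) -> int:
    """Rebuild current state by replaying events in order."""
    total = 0
    for e in events:
        if e.kind == "deposited":
            total += e.amount
        elif e.kind == "withdrawn":
            total -= e.amount
    return total
```

Appending `deposited 100` and then `withdrawn 30` to one stream and replaying it with `balance` yields 70; the log itself still holds both events.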

What is Apache Cassandra?

Top level Apache Project

Born at Facebook

Google's Bigtable + Amazon's Dynamo = Cassandra

Demo Time

Network Topology – Multiple DC
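
A multi-data-center topology is normally expressed per keyspace with `NetworkTopologyStrategy`, which takes a replica count for each data center. The sketch below only renders such a CQL statement as a string; the helper name, the keyspace name `demo`, and the data center names `dc1` and `dc2` are assumptions, and real data center names must match what the cluster's snitch reports.

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, replicas_per_dc: Dict[str, int]) -> str:
    """Render a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in sorted(replicas_per_dc.items()):
        options.append("'" + dc + "': " + str(count))
    body = ", ".join(options)
    return ("CREATE KEYSPACE IF NOT EXISTS " + keyspace
            + " WITH replication = {" + body + "};")


if __name__ == "__main__":
    # Three replicas in dc1, two in dc2 (hypothetical names and counts).
    print(create_keyspace_cql("demo", {"dc1": 3, "dc2": 2}))
```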

Why Apache Cassandra?

Elastic scalability

Always-on architecture: no single point of failure

Fast linear-scale performance

Flexible data storage

Easy data distribution

Operational simplicity

Transaction support

Apache Cassandra: When to use?

Just kidding.... We will cover this part in use cases. :) :)

Apache Cassandra: Interesting Facts

Protocol

Thrift vs CQL Binary Protocol

Architecture

Key structures

➔ Node
➔ Data Center
➔ Cluster
➔ Commit Log
➔ Table
➔ SSTable
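
How the commit log, the in-memory table, and SSTables fit together on one node can be sketched as a toy write path: a write is first appended to the commit log, then applied in memory, and the in-memory data is later flushed to an immutable, sorted SSTable. `ToyNode` and `memtable_limit` are hypothetical names, and the sketch leaves out compaction, tombstones, and replication.

```python
from typing import Dict, List, Optional, Tuple


class ToyNode:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: List[Tuple[str, str]] = []       # append-only durability log
        self.memtable: Dict[str, str] = {}                # in-memory, mutable
        self.sstables: List[List[Tuple[str, str]]] = []   # immutable, sorted by key
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. log first, for crash recovery
        self.memtable[key] = value            # 2. then update memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a new sorted SSTable."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            for k, v in table:
                if k == key:
                    return v
        return None
```

Existing SSTables are never modified in this model; an overwrite simply lands in a newer structure that reads consult first.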

Key components

➔ Gossip
➔ Partitioner
➔ Replication factor
➔ Replica placement strategy
➔ Snitch
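
The partitioner, replication factor, and replica placement strategy can be illustrated with a toy token ring: the partitioner hashes a partition key to a token, the first node at or after that token owns the key, and the remaining replicas are the next nodes clockwise. This is a simplified sketch, not Cassandra's implementation: it gives each node a single token, ignores racks and data centers, and uses MD5 purely as a stand-in hash. `ToyRing` and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(value: str) -> int:
    """Hash a string to a ring position. MD5 is a stand-in for this sketch."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    def __init__(self, nodes: List[str]) -> None:
        # Each node gets one token derived from its name (an assumption).
        self._ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str, replication_factor: int) -> List[str]:
        """First node at or after the key's token, then the next ones clockwise."""
        start = bisect.bisect_left(self._tokens, token(partition_key))
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = ToyRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("user-42", 3))  # three distinct nodes, stable across calls
```

Because placement depends only on the hash and the ring, any node can compute where a key lives without asking a central coordinator.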

Cassandra Query Language

➔ CRUD
➔ Data Modeling
➔ Indexing
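
The three topics above can be made concrete with a handful of CQL statements, held here as Python strings so they could be handed to whichever driver is in use. The `events` table, its columns, and the index name are hypothetical, chosen to match the event store theme of the talk; a keyspace is assumed to be selected already.

```python
# CQL statements for a hypothetical "events" table; all names are illustrative.
CQL = {
    # Data modeling: stream_id is the partition key, so one stream lives in
    # one partition; event_time is a clustering column that orders its rows.
    "create_table": """
        CREATE TABLE IF NOT EXISTS events (
            stream_id  text,
            event_time timeuuid,
            event_type text,
            payload    text,
            PRIMARY KEY ((stream_id), event_time)
        ) WITH CLUSTERING ORDER BY (event_time ASC);
    """,
    # CRUD: create, read, update, delete. '?' marks a bind parameter.
    "insert": "INSERT INTO events (stream_id, event_time, event_type, payload) "
              "VALUES (?, now(), ?, ?);",
    "select": "SELECT event_time, event_type, payload FROM events "
              "WHERE stream_id = ?;",
    "update": "UPDATE events SET payload = ? "
              "WHERE stream_id = ? AND event_time = ?;",
    "delete": "DELETE FROM events WHERE stream_id = ?;",
    # Indexing: a secondary index allows filtering on a non-key column.
    "create_index": "CREATE INDEX IF NOT EXISTS events_by_type "
                    "ON events (event_type);",
}

if __name__ == "__main__":
    for name, statement in CQL.items():
        print(name, "->", " ".join(statement.split()))
```

In an event store the `update` and `delete` statements would rarely be used, since events are meant to be immutable; they are listed only to complete the CRUD set.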

Apache Cassandra @ Disqus

➔ Disqus is a discussion platform for the web. It connects publishers with users and allows them to have a public discourse in a medium that allows communication across the web.

➔ Disqus uses Cassandra in a number of different places. In the product it is mainly used for content recommendation, and also a little for advertising. Say you are reading an article about the war in Syria and you notice another interesting article about what the British PMs have released as a public statement on whether or not it is legal to go to war, and you are interested in reading that response. Cassandra powers the analytics and content engine behind how Disqus recommends content.

➔ Main cluster: 24 nodes
➔ CPU: 6-core Xeons at 3 GHz, the biggest CPUs, because CPU was turning out to be a small bottleneck at times
➔ RAM: 24 GB per node, with an 8 GB heap
➔ 32 or 48 GB of RAM wasn't helping much
➔ It handles a load of about 30,000 reads a second
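
As a back-of-the-envelope reading of the figures above, the sketch below derives the average client read rate per node and the share of RAM given to the heap. It assumes reads are spread evenly across the 24 nodes; the actual work per node depends on the replication factor and consistency level, which the slides do not state.

```python
NODES = 24
CLUSTER_READS_PER_SECOND = 30_000
RAM_GB_PER_NODE = 24
HEAP_GB_PER_NODE = 8


def reads_per_node(total_reads: int, nodes: int) -> float:
    """Average client reads per second per node, assuming an even spread."""
    return total_reads / nodes


def heap_fraction(heap_gb: int, ram_gb: int) -> float:
    """Share of a node's RAM assigned to the JVM heap."""
    return heap_gb / ram_gb


if __name__ == "__main__":
    print(reads_per_node(CLUSTER_READS_PER_SECOND, NODES))               # 1250.0
    print(f"{heap_fraction(HEAP_GB_PER_NODE, RAM_GB_PER_NODE):.0%}")     # 33%
```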

Apache Cassandra @ many other companies

Thank you :)

Resources for Material

● http://smartdatacollective.com/bernardmarr/277731/big-data-25-facts-everyone-needs-know

● http://blog.gramener.com/1984/indian-elections-2014-big-data-for-billion-people

● http://indiaspora.org/blog/indian-elections-2014-big-data-for-billion-people/

● http://www.slideshare.net/BernardMarr/140228-big-data-volume-velocity-variety-varacity-value

● http://www.datastax.com/documentation/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html

● http://planetcassandra.org/

● http://planetcassandra.org/blog/disqus-discusses-migration-from-redis-to-cassandra-for-horizontal-scalability/

● http://wiki.apache.org/cassandra/

Resources for Graphics

● http://newstonight.net/content/obesity-pushing-diabetes-cases

● http://lordapes.blogspot.in/

● http://blog.marketo.com/2013/07/big-data-it-doesnt-mean-what-you-think-it-means.html

● http://www.portaloko.hr/clanak/20-stvari-koje-muskarci-nikada-nece-shvatiti-kod-zena/0/59710/

● http://www.blankchapters.com/wp-content/uploads/2012/12/meme-data-data-everywhere.png

● https://medium.com/media-changes/don-draper-applies-for-a-job-in-2013-59aec7398582

● http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

● http://www.slideshare.net/planetcassandra/apache-cassandra-and-datastax-enterprise-explained-with-peter-halliday-at-wildhacks-nu

● http://qph.is.quoracdn.net/main-qimg-dce3b73956c5313650022a5b22068982?convert_to_webp=true

● http://treasure.diylol.com/uploads/post/image/553987/resized_kevin-hart-meme-generator-i-would-take-questions-but-the-way-my-presentation-is-set-up-f1cfd6.jpg