MongoDB to Cassandra

MongoDB to Cassandra: The Atlas Odyssey

Fred van den Driessche, Engineer, @fredvdd

Tom McAdam, CTO

Adam Horwich, Systems Engineer, @Mmmkayness

http://flickr.com/photos/dhammza/88644497/

Video and audio metadata from 20+ sources

Profiles and activity from video and audio products, social networks

Our platform - late 2012

MetaBroadcast platform

Analytic requests and groupings

Main clients Main Partners

Data Partners

What is Atlas?

etc...

/content

/schedules

/topics

sitemaps

radioplayer

interlinking

Atlas Data Model

brand item

series version

broadcast location

MongoDB

• flexible

• features

• really simple

• shell

Where MongoDB falls short

• too simple

• lack of control

• sharding

• embedding

Where to?

• add a cache?

Atlas API

• content

• http://atlas.metabroadcast.com/3.0/content.json?uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations&apiKey=6ed2a984627daff816198acde82

• http://atlas.metabroadcast.com/3.0/content.json?apiKey=aaaa&uri=http://www.bbc.co.uk/programmes/b0074g7p&annotations=description,brand_summary,locations

• schedules

• http://atlas.metabroadcast.com/3.0/schedule.json?from=now&to=now.plus.3h&channel=bbcone&publisher=bbc.co.uk

• http://atlas.metabroadcast.com/3.0/schedule.json?from=1948-12-24&to=1948-12-25&channel=radio4&publisher=bbc.co.uk

• api explorer http://atlas.metabroadcast.com/#apiExplorer
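The schedule requests above are plain query strings, so they can be assembled from a parameter map. The following is a minimal sketch, not part of the Atlas client code: the class and method names are hypothetical, and only the endpoint and the parameter names (from, to, channel, publisher) come from the URLs on this slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a schedule URL like the ones shown on the slide.
final class AtlasScheduleUrl {
    // Endpoint taken from the slide.
    static final String BASE = "http://atlas.metabroadcast.com/3.0/schedule.json";

    // Joins the parameters as key=value pairs, in insertion order.
    static String build(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return BASE + "?" + query;
    }

    // Form-encodes a single key or value (this overload needs Java 10 or later).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("from", "now");
        params.put("to", "now.plus.3h");
        params.put("channel", "bbcone");
        params.put("publisher", "bbc.co.uk");
        System.out.println(build(params));
    }
}
```

A content request would be built the same way, with the uri, annotations and apiKey parameters; the encoder would then percent-encode the nested programme URL.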

Why Cassandra?

• scalability/performance

• row caches

• consistency control

• column-based model matches our use case

• ElasticSearch

• messaging

• tooling: bootstraps

What is Atlas?

etc...

Data ingest server DB

Update bus

HTTP server

Data model

• columns to model annotations

• secondary indexes

    index.direct(keyspace, SEGMENT_URI_INDEX_CF, ConsistencyLevel.CL_QUORUM)
        .from(segment.getCanonicalUri())
        .to(segment.getIdentifier())
        .index()
        .execute(requestTimeout, TimeUnit.MILLISECONDS);

ID generation

• give external data our own ID on ingest

• needs to be user-friendly: http://www.radiotimes.com/programme/cf2/eastenders

• mongo: findAndModify()

• solution: uses Astyanax client with its distributed locking

• more details: http://metabroadcast.com/blog/let-cassandra-identify-your-data
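The ID scheme above amounts to two steps: take the next number under a lock, then render it as a short string. The sketch below illustrates that shape only. It is not the Atlas implementation: a local ReentrantLock stands in for the Astyanax distributed lock, an in-memory field stands in for the counter row in Cassandra, and the base-36 alphabet is an assumption (the slides do not say how IDs such as "cf2" are encoded).

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of lock-guarded ID allocation plus a short string encoding.
final class IdGenerator {
    // Assumed alphabet; the real codec is not described in the slides.
    private static final String ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Local stand-in for a distributed lock held across the cluster.
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for the last allocated ID, which would live in a Cassandra row.
    private long lastId;

    IdGenerator(long lastId) {
        this.lastId = lastId;
    }

    // Read, increment and write back while holding the lock, which is the
    // step that findAndModify() performs atomically in MongoDB.
    long next() {
        lock.lock();
        try {
            lastId = lastId + 1;
            return lastId;
        } finally {
            lock.unlock();
        }
    }

    // Encodes a non-negative ID as a compact lowercase string.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % ALPHABET.length())));
            id /= ALPHABET.length();
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        IdGenerator ids = new IdGenerator(1000);
        System.out.println(encode(ids.next()));
    }
}
```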

Where we’re at

• already live with some data

• alpha release of schedule endpoint coming soon

• later: roll out across other endpoints

Ops in Cassandra

• we love Puppet

• it’s great for automation and deployment

• MongoDB: 1 file

• Cassandra: 2 files!

• oh... tokens

Cassandra Tokens

• define where data is written to in a cluster

• therefore balanced tokens = balanced cluster

• tokens should be rack aware

• tools available to provide appropriate tokens for you
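Balanced tokens are evenly spaced points on the ring. The sketch below shows the usual calculation, assuming RandomPartitioner (token range 0 to 2^127 - 1) and a single datacentre; the slides do not say which partitioner Atlas uses, and the class name is hypothetical. Rack awareness is not handled here: it is typically a matter of assigning consecutive tokens to nodes in alternating racks.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evenly spaced initial tokens for an N-node ring.
final class TokenCalculator {
    // Size of the RandomPartitioner ring (assumption: RandomPartitioner is in use).
    private static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    // Token for node i of n is i * 2^127 / n.
    static List<BigInteger> tokens(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        List<BigInteger> result = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            result.add(RING_SIZE.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(nodeCount)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one initial_token value per node for a 4 node cluster.
        for (BigInteger token : tokens(4)) {
            System.out.println(token);
        }
    }
}
```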

Cassandra plays nicely with AWS

• datacentre / rack aware

• AWS Region = Datacentre

• AWS Availability Zone = Rack

• only recently introduced in MongoDB but simple to implement in Cassandra

• horizontally (and vertically) scalable

Monitoring

• Nagios is a little threadbare for Cassandra

• basic TCP service check

• stats from API not very helpful

• nodetool and CLI tools useful

• manual effort to integrate them

• if only there was some useful service...

OpsCenter

• wonderful for an overview

• not so much for alerting ;)

• ohai API

• can integrate metrics into Nagios

Disaster Recovery

• we operate a 4 node cluster presently

• replication factor of 3 with quorum reads/writes

• DR complicated by tokens

• cluster should be balanced

• snapshot + S3 Backups
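The replication settings above can be checked with a little arithmetic: a quorum is a majority of the replicas, and reads see the latest write whenever the read and write replica counts together exceed the replication factor. A minimal sketch follows; the class and method names are illustrative and not from the slides.

```java
// Hypothetical helper for reasoning about replication factor and quorum.
final class QuorumMath {
    // A quorum is a majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads overlap the latest write when R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // Replicas of a given key that can be down while quorum operations still succeed.
    static int toleratedReplicaFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum = " + q);
        System.out.println("consistent = " + readsSeeLatestWrite(q, q, rf));
        System.out.println("tolerated failures = " + toleratedReplicaFailures(rf));
    }
}
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and writes keep working with one replica of any given key unavailable.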

Cluster Happiness and Headaches

• little maintenance overhead

• cluster rebalancing

• uncommon maintenance procedure

• schema changes are cumbersome

• little scope for rollback, can put cluster in unrecoverable state

Summary

• Mongo is good, Atlas has outgrown it

• Cassandra isn’t a drop-in replacement

• Ops more complex but so far so good

Questions?
