Nyc summit intro_to_cassandra

Java, Big Data and Apache Cassandra

Nate [email protected] @zznate

Apache Cassandra: Origins in big data


But first... the CAP Theorem

Consistency, Availability, Partition Tolerance

Thou shalt have but 2

- Conjecture made by Eric Brewer in 2000
- Published as formal proof in 2002
- See: http://en.wikipedia.org/wiki/CAP_theorem for more

CAP Theorem: Cassandra Style

- Explicit choice of partition tolerance and availability
- You can opt for more consistency at the cost of availability
- Consistency is tunable (per operation)
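Tunable consistency is commonly reasoned about with replica arithmetic: with N replicas, a read that contacts R replicas and a write that waits for W replicas must overlap on at least one replica whenever R + W > N. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and are not part of any Cassandra client API.

```java
// Illustrative only: these names are hypothetical, not a Cassandra API.
final class QuorumSketch {

    // Number of replicas that form a majority for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read overlaps the latest acknowledged write on at least one replica
    // when the read and write replica counts sum to more than N.
    static boolean readSeesLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("quorum(3) = " + q);                                    // prints "quorum(3) = 2"
        System.out.println("QUORUM/QUORUM: " + readSeesLatestWrite(n, q, q));      // prints "QUORUM/QUORUM: true"
        System.out.println("ONE/ONE: " + readSeesLatestWrite(n, 1, 1));            // prints "ONE/ONE: false"
    }
}
```

Reading and writing at a majority trades some availability for consistency; reading and writing a single replica does the opposite.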

Apache Cassandra Concepts

- No read before write
- Merge on read
- Idempotent
- Schema optional
- All nodes share the same role
- Still performs well with larger-than-memory data sets

Generally complements other systems

(Not intended to be one-size-fits-all)

*** You should always use the right tool for the right job anyway

How does this differ from an RDBMS?

Substantially.

vs. RDBMS - No Joins

Unless:
- you do them on the client
- you do them via Map/Reduce

vs. RDBMS - Schema Optional

(Though you can add meta information for validation and type checking)

*** Supports secondary indexes too: WHERE state = 'TX'

vs. RDBMS - Prematerialized and Transaction-less

- No ACID transactions
- Limited support for ad-hoc queries

*** You are going to give up both of these anyway when you shard an RDBMS ***

vs. RDBMS - Facilitates Consolidation

It can be your caching layer
* Off-heap cache (provided you install JNA)

It can be your analytics infrastructure
* true map/reduce
* pig driver
* hive driver coming soon

vs. RDBMS - Shared-Nothing Architecture

Every node plays the same role: no masters, no slaves, no special nodes

*** No single point of failure

vs. RDBMS - Real Linear Scalability

Want 2x performance? Add 2x nodes (with no downtime!)

vs. RDBMS - Performance

Reads on par with writes

Clustering

Consistent Hashing FTW:
- No fancy shard logic or tedious management of such required
- Ring ownership continuously gossiped between nodes
- Any node can act as a coordinator to service client requests for any key
  * requests forwarded to the appropriate nodes by the coordinator, transparently to the client

Clustering

Single node cluster (easy development setup)
- One node owns the whole hash range

Clustering

Two node cluster
- Key range divided between nodes

Clustering

Consistent Hashing: md5(zznate) = C
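As a rough illustration of how a key such as zznate could be mapped to an owning node, the sketch below hashes the key with MD5 and walks a sorted ring of tokens. The node names, the evenly spaced token positions, and the class itself are assumptions made for the example; Cassandra's own partitioner differs in detail.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Illustrative ring: node names and token positions are made up for the example.
final class HashRingSketch {

    // Maps each node's token (its position on the ring) to the node name.
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(String name, BigInteger token) {
        ring.put(token, name);
    }

    // MD5 of the key, interpreted as a non-negative 128-bit integer.
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // The owner is the first node whose token is >= the key's token,
    // wrapping around to the lowest token past the end of the ring.
    // Assumes at least one node has been added.
    String ownerOf(String key) {
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(md5Token(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        BigInteger max = BigInteger.ONE.shiftLeft(128);
        String[] names = {"A", "B", "C", "D"};
        // Four nodes, each owning one quarter of the hash range.
        for (int i = 0; i < names.length; i++) {
            BigInteger token = max.multiply(BigInteger.valueOf(i + 1)).divide(BigInteger.valueOf(4));
            ring.addNode(names[i], token);
        }
        System.out.println("zznate is owned by node " + ring.ownerOf("zznate"));
    }
}
```

Because every node can compute the same mapping from gossiped ring state, any node can coordinate a request for any key.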

Clustering: The Client's Perspective

Client Read: get(zznate)
md5 = C

Clustering Scale Out



Clustering - Multi-DC

Clustering - Reliability




Clustering - Multi-Datacenter

Clustering Multi-DC Reliability

Storage (Briefly)

Understanding the on-disk format is extremely helpful in designing your data model correctly

Storage - SSTable

- SSTables are immutable (Merge on read)
- Newest timestamp wins
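One way to picture merge on read is a set of immutable sorted maps that are reconciled per column at read time, keeping the cell with the newest timestamp. The sketch below models that idea only; the class names and the single-row simplification are assumptions, and this is not Cassandra's storage code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of merge on read; not Cassandra's actual storage code.
final class MergeOnReadSketch {

    // A column value plus the write timestamp used to resolve conflicts.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Each "SSTable" is modeled as a sorted map of column name to cell
    // for a single row. The inputs are never modified.
    static Map<String, Cell> merge(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> e : table.entrySet()) {
                Cell current = merged.get(e.getKey());
                // Newest timestamp wins, regardless of which table is read first.
                if (current == null || e.getValue().timestamp > current.timestamp) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = new TreeMap<>();
        older.put("price", new Cell("589.55", 1L));
        older.put("name", new Cell("Google", 1L));
        Map<String, Cell> newer = new TreeMap<>();
        newer.put("price", new Cell("583.16", 2L));

        List<Map<String, Cell>> tables = new ArrayList<>();
        tables.add(newer);
        tables.add(older);
        Map<String, Cell> row = merge(tables);
        System.out.println("price = " + row.get("price").value); // prints "price = 583.16"
        System.out.println("name = " + row.get("name").value);   // prints "name = Google"
    }
}
```

The fewer tables a read has to consult, the cheaper this merge becomes.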

Storage Compaction

- Merges SSTables, keeping the count down and making merge on read more efficient
- Discards tombstones (more on this later!)

Data Model

Data Model

"...sparse, persistent, distributed, multi-dimensional sorted map."

(The Bigtable paper)

Data Model

Keyspace

- Collection of Column Families

- Controls replication

Column Family

- Similar to a table

- Columns ordered by name

Data Model Column Family

Static Column Family

- Model my object data

Dynamic Column Family

- Pre-calculated query results

Nothing stopping you from mixing them!

Data Model Static CF

Stocks

GOOG => price: 589.55, name: Google
AAPL => price: 401.76, name: Apple
NFLX => price: 78.73, name: Netflix
NOK => price: 6.90, name: Nokia, exchange: NYSE

Data Model Prematerialized Query

StockHist

GOOG => 10/25/2011: 583.16, 10/24/2011: 596.42, 10/21/2011: 590.49
AAPL => 10/25/2011: 397.77, 10/24/2011: 405.77, 10/21/2011: 392.87
NFLX => 10/25/2011: 77.37, 10/24/2011: 118.84, 10/21/2011: 117.04
NOK => 10/25/2011: 6.71, 10/24/2011: 6.76, 10/21/2011: 6.61

Data Model Prematerialized Query

Additional examples:
- Timeline of tweets by a user
- Timeline of tweets by all of the people a user is following
- List of comments sorted by score
- List of friends grouped by state
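A prematerialized query can be pictured as a row whose columns are already stored in the order the query needs, so that answering it is a contiguous slice rather than a sort or a join. The sketch below is an in-memory stand-in for the StockHist column family; the class, its method names, and the use of LocalDate as the column name type are assumptions for illustration, and the NOK values are taken to be the 6.xx figures from the StockHist slide.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative in-memory stand-in for a dynamic column family.
final class StockHistSketch {

    // Row key -> (column name -> value); columns stay sorted by name (here, by date).
    private final Map<String, NavigableMap<LocalDate, Double>> rows = new TreeMap<>();

    void insert(String ticker, LocalDate date, double close) {
        rows.computeIfAbsent(ticker, k -> new TreeMap<>()).put(date, close);
    }

    // A "query" is a contiguous, inclusive slice of one row's sorted columns.
    NavigableMap<LocalDate, Double> slice(String ticker, LocalDate from, LocalDate to) {
        NavigableMap<LocalDate, Double> row = rows.get(ticker);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        StockHistSketch hist = new StockHistSketch();
        hist.insert("NOK", LocalDate.of(2011, 10, 25), 6.71);
        hist.insert("NOK", LocalDate.of(2011, 10, 21), 6.61);
        hist.insert("NOK", LocalDate.of(2011, 10, 24), 6.76);
        // Columns come back already ordered, whatever the insertion order was.
        System.out.println(hist.slice("NOK", LocalDate.of(2011, 10, 24), LocalDate.of(2011, 10, 25)));
    }
}
```

The other examples in the list follow the same shape: pick the row key and column ordering so that the answer is one slice of one row.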

API Operations

Five general categories

- Retrieving
- Writing/Updating/Removing (all the same op!)
- Increment counters
- Meta Information
- Schema Manipulation
- CQL Execution

Big Data Fun and Hijinks

- Hadoop integration
- Pig integration
- Hive integration
  * open source version coming soon
  * available in DataStax Enterprise

Big Data: Map/Reduce Integration

Cassandra implementations of:
- InputFormat and OutputFormat
- RecordReader and RecordWriter
- InputSplit for Column Families

*** See org.apache.cassandra.hadoop package and examples for more

Big Data: Pig Integration

grunt> name_group = GROUP score_data BY name PARALLEL 3;

grunt> name_total = FOREACH name_group GENERATE group, COUNT(score_data.name), LongSum(score_data.score) AS total_score;

grunt> ordered_scores = ORDER name_total BY total_score DESC PARALLEL 3;

grunt> DUMP ordered_scores;

Using a Client

Hector Client: http://hector-client.org
- Most popular Java client
- In use at very large installations
- A number of tools and utilities built on top
- Very active community
- MIT Licensed

*** Like any open source project fully dependent on another open source project, it has its warts

Sample Project for Experimenting

https://github.com/zznate/cassandra-tutorial
https://github.com/zznate/hector-examples
- Built using Hector
- Really basic: designed to be beginner level with very few moving parts
- Modify/abuse/alter as needed

*** Descriptions of what is going on and how to run each example are in the Javadoc comments.

Hector: ColumnFamilyTemplate

Familiar, type-safe approach
- Based on the template-method design pattern
- Generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type)

ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get(), StringSerializer.get());

*** (no generics for clarity)


Key Format

Column Name Format
- Cassandra calls this a comparator
- Remember: defines column order in on-disk format


ColumnFamilyResult res = template.queryColumns("zznate");
String value = res.getString("email");
Date startDate = res.getDate("startDate");

Key Format

Column Name Format


ColumnFamilyResult wrapper = template.queryColumns("zznate", "patricioe", "thobbs");

while (wrapper.hasNext()) {
    emails.put(wrapper.getKey(), wrapper.getString("email"));
    ...
}

- Querying multiple rows: pass several keys to queryColumns
- Iterating over results: loop on the returned ColumnFamilyResult


ColumnFamilyUpdater updater = template.createUpdater("zznate");
updater.setString("companyName", "DataStax");
updater.addKey("sergek");
updater.setString("companyName", "PrestoSports");

template.update(updater);

Insert:
- Creating an updater for a key: template.createUpdater("zznate")
- Adding multiple rows: updater.addKey("sergek")
- Invoking batch execution: template.update(updater)


template.deleteColumn("zznate", "notNeededStuff");
template.deleteColumn("zznate", "somethingElse");
template.deleteColumn("patricioe", "aDifferentColumnName");
...
template.deleteRow(someuser);

template.executeBatch();

Deleting Data:
- Single column: template.deleteColumn(key, columnName)
- Whole row: template.deleteRow(key)

Deletion

Again: Every mutation is an insert!

- Merge on read

- SSTables are immutable

- Highest timestamp wins

Deletion As Seen by CLI

[default@Tutorial] list Portfolio;
Using default limit of 100

-------------------

RowKey: 12783

=> (column=GOOG, value=30, timestamp=1310340410528000)

-------------------

RowKey: 15736

=> (column=AAPL, value=20, timestamp=1310143852392000)

=> (column=NOK, value=90, timestamp=1310143852444000)

=> (column=IBM, value=50, timestamp=1310143852448000)


=> (column=INTC, value=200, timestamp=1310143852457000)

Deletion As Seen by CLI

[default@Tutorial] list Portfolio;
Using default limit of 100

-------------------

RowKey: 12783

-------------------

RowKey: 15736

=> (column=AAPL, value=20, timestamp=1310143852392000)

=> (column=NOK, value=90, timestamp=1310143852444000)

=> (column=IBM, value=50, timestamp=1310143852448000)


=> (column=INTC, value=200, timestamp=1310143852457000)

Deletion FYI

Sending a deletion for a nonexistent row:

mutator.addDeletion("14100", "INTC", 75, stringSerializer);

Does not exist? You just inserted a tombstone!

[default@Tutorial] list Portfolio;
Using default limit of 100

. . .

-------------------

RowKey: 14100

-------------------

. . .
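The behavior in the listing above follows from deletes being ordinary writes: with no read before write, a delete records a tombstone whether or not the column ever existed. The sketch below models that with an in-memory map; the class and its methods are hypothetical and are not Hector or Cassandra APIs.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model of deletes as writes; not Cassandra's actual storage code.
final class TombstoneSketch {

    // A null value marks a tombstone; the timestamp orders it against other writes.
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Row key -> (column name -> newest cell seen so far).
    private final Map<String, Map<String, Cell>> rows = new TreeMap<>();

    void insert(String key, String column, String value, long timestamp) {
        write(key, column, new Cell(value, timestamp));
    }

    // A delete is written blindly: it never checks whether the column exists.
    void delete(String key, String column, long timestamp) {
        write(key, column, new Cell(null, timestamp));
    }

    private void write(String key, String column, Cell cell) {
        Map<String, Cell> row = rows.computeIfAbsent(key, k -> new TreeMap<>());
        Cell current = row.get(column);
        // Highest timestamp wins; on a tie the existing cell is kept.
        if (current == null || cell.timestamp > current.timestamp) {
            row.put(column, cell);
        }
    }

    // Returns only live columns; tombstones are filtered out at read time.
    Map<String, String> read(String key) {
        Map<String, String> live = new TreeMap<>();
        Map<String, Cell> row = rows.get(key);
        if (row != null) {
            for (Map.Entry<String, Cell> e : row.entrySet()) {
                if (e.getValue().value != null) {
                    live.put(e.getKey(), e.getValue().value);
                }
            }
        }
        return live;
    }

    // True if any cell, live or tombstone, has been written for the key.
    boolean hasRowKey(String key) {
        return rows.containsKey(key);
    }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch();
        store.delete("14100", "INTC", 75L); // row 14100 was never written
        System.out.println("row key present: " + store.hasRowKey("14100")); // prints "row key present: true"
        System.out.println("live columns: " + store.read("14100"));         // prints "live columns: {}"
    }
}
```

In this model the empty row key remains visible until something discards the tombstone, which is the role compaction plays.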

Integrating with existing patterns


Yes.



Hector Object Mapper (simple, JPA 1.0-style annotations):

https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29

Hector JPA (experimental open-jpa implementation):

https://github.com/riptano/hector-jpa


private static final String STOCK_CQL =
    "select price FROM Stocks WHERE KEY = ?";

jdbcTemplate.query(STOCK_CQL, stockTicker,
    new RowMapper<Stock>() {
        public Stock mapRow(ResultSet rs, int row) throws SQLException {
            // Cast to the Cassandra result set to reach the row key
            CassandraResultSet crs = (CassandraResultSet) rs;
            Stock stock = new Stock();
            stock.setTicker(new String(crs.getKey()));
            stock.setPrice(crs.getDouble("price"));
            return stock;
        }
    });


private static final String UPDATE_PORTFOLIO_CQL =
    "update Portfolios set ? = ? where KEY = ?";

jdbcTemplate.update(UPDATE_PORTFOLIO_CQL,
    new Object[] {position.getTicker(),
                  position.getCount(),
                  portfolio.getName()});


private static final String UPDATE_PORT_CQL =
    "update Portfolios set ? = ? where KEY = ?";

jdbcTemplate.batchUpdate(UPDATE_PORT_CQL,
    new BatchPreparedStatementSetter() {
        public void setValues(PreparedStatement ps, int index) throws SQLException {
            Position pos = portfolio.getConstituents().get(index);
            ps.setString(1, pos.getTicker());
            ps.setLong(2, pos.getShares());
            ps.setString(3, portfolio.getName());
        }

        public int getBatchSize() {
            return portfolio.getConstituents().size();
        }
    });

Putting it Together

Take control of consistency

If you do need a high degree of consistency, use thresholds to trigger different behavior

- Bank account: on values over $10,000, wait to hear from all replicas
- Distributed shopping cart: show a confirmation page to verify order resolution

*** What is your appetite for risk?
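The bank account threshold could be expressed as a small policy function that picks a consistency level per operation. In the sketch below the enum, the class, and the choice of QUORUM for smaller amounts are all assumptions made for illustration; only the $10,000 cutoff and "wait for all replicas" come from the example above.

```java
// Illustrative policy: the enum and class are hypothetical, not a client API.
final class ConsistencyPolicySketch {

    // Stand-ins for per-operation consistency levels.
    enum Level { ONE, QUORUM, ALL }

    // $10,000 expressed in cents, matching the bank account example.
    static final long THRESHOLD_CENTS = 1_000_000L;

    // Values over the threshold wait for every replica. Using QUORUM for
    // everything else is an assumption made for this sketch.
    static Level levelFor(long amountCents) {
        return amountCents > THRESHOLD_CENTS ? Level.ALL : Level.QUORUM;
    }

    public static void main(String[] args) {
        System.out.println(levelFor(250_000L));   // $2,500: prints "QUORUM"
        System.out.println(levelFor(2_500_000L)); // $25,000: prints "ALL"
    }
}
```

Where the threshold sits, and what the lower level is, depends on the application's appetite for risk.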

Uniquely identify operations in the application

Facilitates idempotent behavior and out-of-order execution
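A minimal sketch of this idea, assuming each operation carries an application-generated UUID: replays of the same operation are ignored, and because credits commute, distinct operations can arrive in any order. The class and method names are hypothetical, and the in-memory set stands in for whatever durable store the application would actually use.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Illustrative only: an in-memory ledger keyed by application-assigned operation ids.
final class IdempotentOpsSketch {

    // Ids of operations that have already been applied.
    private final Set<UUID> applied = new HashSet<>();
    private final Map<String, Long> balances = new HashMap<>();

    // Applies a credit once; a replay of the same operation id is a no-op.
    // Returns true if the operation changed the balance.
    boolean credit(UUID opId, String account, long cents) {
        if (!applied.add(opId)) {
            return false;
        }
        balances.merge(account, cents, Long::sum);
        return true;
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        IdempotentOpsSketch ledger = new IdempotentOpsSketch();
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        ledger.credit(second, "zznate", 500L); // arrives out of order
        ledger.credit(first, "zznate", 200L);
        ledger.credit(first, "zznate", 200L);  // retry of the same operation
        System.out.println(ledger.balance("zznate")); // prints "700"
    }
}
```

With unique operation ids, a client that times out can simply retry without risking a double application.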

Denormalization

The point of normalization is to avoid update anomalies

*** But in an append-only system, we don't do updates

Summary

- Take advantage of strengths

- Look for idempotence and asynchronicity in your business processes

- If it's not in the API, you are probably doing it wrong

- Seek death is still possible if you model incorrectly

Questions

Nate [email protected] @zznate

Development Resources

Hector Documentation: http://hector-client.org

Cassandra Maven Plugin: http://mojo.codehaus.org/cassandra-maven-plugin/

CCM localhost cassandra cluster: https://github.com/pcmanus/ccm

OpsCenter: http://www.datastax.com/products/opscenter

Cassandra AMIs: https://github.com/riptano/CassandraClusterAMI

Additional Resources

DataStax Documentation: http://www.datastax.com/docs/0.8/index

Apache Cassandra project wiki: http://wiki.apache.org/cassandra/

The Dynamo Paper

http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

P. Helland. Building on Quicksand

http://arxiv.org/pdf/0909.1788

P. Helland. Life Beyond Distributed Transactions

http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

S. Anand. Netflix's Transition to High-Availability Storage Systems

http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf

The Megastore Paper

http://research.google.com/pubs/archive/36971.pdf
