Strata west 2012_java_cassandra

Page 1: Strata west 2012_java_cassandra

1 CONFIDENTIAL |

NATE MCCALL, Sr. Software Developer

Building Applications With Apache Cassandra

Page 2: Strata west 2012_java_cassandra

What We’ll Cover

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Page 3: Strata west 2012_java_cassandra

Requirements

JDK 1.6 or greater

Apache Maven 3.0.2 or greater

Apache Cassandra 1.0.7 – DataStax community edition:

http://www.datastax.com/download/community/versions

IDE such as Eclipse or IntelliJ will be helpful but not necessary

Several thumb drives available (please share)

All source on GitHub: https://github.com/zznate/strata-west-2012

Getting Started

Page 4: Strata west 2012_java_cassandra

Learning by doing

Looking at and writing code

Examples are constructed explicitly to show off certain concepts

Move ahead if it gets slow – just start hacking

You must be comfortable writing and debugging software

How We’ll Cover It

Page 5: Strata west 2012_java_cassandra

It does not have to be hard.

Getting Down To It

Page 6: Strata west 2012_java_cassandra

Getting Down To It

Page 7: Strata west 2012_java_cassandra

It does not have to be mysterious.

Getting Down To It

Page 8: Strata west 2012_java_cassandra

Getting Down To It

Page 9: Strata west 2012_java_cassandra

You can leverage a mature language with stable clients against a proven, best-of-breed solution that is in use in high-traffic production environments right now.

Getting Down To It

Page 10: Strata west 2012_java_cassandra

What We’ll Cover

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Page 11: Strata west 2012_java_cassandra

Scale Out. But Really Though.

Best of Breed

Linear scaling

Real multi-datacenter support

“Fix it on Monday” fault tolerance

Page 12: Strata west 2012_java_cassandra

Static Column Family

GOOG Price=589.55 Name=Google

APPL Price=401.76 Name=Apple

NFLX Price=78.73 Name=Netflix

NOK Price=6.90 Name=Nokia Exchange=NYSE

Schema Optional

Not all columns are required
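A minimal sketch of the "not all columns are required" idea, using plain in-memory maps rather than any Cassandra client API. The class and method names (`StaticColumnFamilySketch`, `insert`, `get`, `sample`) are hypothetical and only model the row layout shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory stand-in for a static column family; not a Cassandra client API. */
final class StaticColumnFamilySketch {
    // row key -> (column name -> value); each row carries only the columns it was given
    private final Map<String, Map<String, String>> rows =
            new LinkedHashMap<String, Map<String, String>>();

    void insert(String rowKey, String column, String value) {
        Map<String, String> row = rows.get(rowKey);
        if (row == null) {
            row = new LinkedHashMap<String, String>();
            rows.put(rowKey, row);
        }
        row.put(column, value);
    }

    /** Returns the column value, or null when the row or the column is absent. */
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Two of the rows from the slide; only NOK has an Exchange column. */
    static StaticColumnFamilySketch sample() {
        StaticColumnFamilySketch cf = new StaticColumnFamilySketch();
        cf.insert("GOOG", "Price", "589.55");
        cf.insert("GOOG", "Name", "Google");
        cf.insert("NOK", "Price", "6.90");
        cf.insert("NOK", "Name", "Nokia");
        cf.insert("NOK", "Exchange", "NYSE");
        return cf;
    }

    public static void main(String[] args) {
        StaticColumnFamilySketch cf = sample();
        System.out.println("NOK Exchange:  " + cf.get("NOK", "Exchange"));
        System.out.println("GOOG Exchange: " + cf.get("GOOG", "Exchange"));
    }
}
```

A missing column is simply absent from the row; nothing is stored for it.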

Page 13: Strata west 2012_java_cassandra

Dynamic Column Family

GOOG 10/25/11=583.16 10/24/11=596.42 10/23/11=590.49

APPL 10/25/11=397.77 10/24/11=405.77 10/23/11=392.87

NFLX 10/25/11=77.37 10/24/11=118.14 10/23/11=117.23

NOK 10/25/11=6.71 10/24/11=6.76 10/23/11=6.61

Prematerialized Queries

Store it how you read it
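A rough sketch of "store it how you read it": columns in a row are kept sorted by column name, so a date range becomes a contiguous slice. This models one row with a `TreeMap`; the names `DynamicRowSketch`, `insert`, `slice` and `goog` are hypothetical, and no Cassandra client is involved.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory model of one wide row whose column names are dates. */
final class DynamicRowSketch {
    // Sorted by column name, mirroring how columns are ordered within a row
    private final NavigableMap<LocalDate, Double> columns = new TreeMap<LocalDate, Double>();

    void insert(LocalDate day, double close) {
        columns.put(day, close);
    }

    /** Values of all columns named from..to (both inclusive), in column order. */
    List<Double> slice(LocalDate from, LocalDate to) {
        return new ArrayList<Double>(columns.subMap(from, true, to, true).values());
    }

    /** The GOOG row from the slide, inserted out of order on purpose. */
    static DynamicRowSketch goog() {
        DynamicRowSketch row = new DynamicRowSketch();
        row.insert(LocalDate.of(2011, 10, 25), 583.16);
        row.insert(LocalDate.of(2011, 10, 23), 590.49);
        row.insert(LocalDate.of(2011, 10, 24), 596.42);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(goog().slice(LocalDate.of(2011, 10, 24), LocalDate.of(2011, 10, 25)));
    }
}
```

Insertion order does not matter; the read comes back in column-name order.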

Page 14: Strata west 2012_java_cassandra

The API

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Page 15: Strata west 2012_java_cassandra

Starting up

If you didn’t look beforehand: http://www.datastax.com/docs/1.0/getting_started/index

We want to run the Cassandra process in the foreground to see what’s going on:

cd $CASSANDRA_HOME

bin/cassandra -f

Common API Usage

Page 16: Strata west 2012_java_cassandra

DataStax OpsCenter

If you are not sure why you should have monitoring, have this running at all times.

http://www.datastax.com/docs/opscenter/index

Common API Usage

Page 17: Strata west 2012_java_cassandra

Static Column Families

See org.apache.tutorial.BasicUsageExample

Common API Usage

Page 18: Strata west 2012_java_cassandra

Dynamic Column Families

See org.apache.tutorial.TimeseriesInserter

– A Cassandra row can hold up to 2 billion columns

Common API Usage

Page 19: Strata west 2012_java_cassandra

Dynamic Column Families

See org.apache.tutorial.TimeseriesIterationQuery

– Encapsulate paging in iteration for easier traversal of wide rows

Common API Usage
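One way to picture "encapsulate paging in iteration" is an `Iterator` that fetches a page at a time behind the scenes. The sketch below is an in-memory stand-in, not the tutorial's TimeseriesIterationQuery and not a client API: `PagingIteratorSketch`, `fetchNextPage` and the `fetches` counter are hypothetical names, and a `NavigableSet` plays the part of the remote wide row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.NoSuchElementException;

/** Iterates a wide row page by page so callers never deal with paging themselves. */
final class PagingIteratorSketch implements Iterator<Integer> {
    private final NavigableSet<Integer> wideRow; // stands in for the remote row
    private final int pageSize;
    private Iterator<Integer> page = new ArrayList<Integer>().iterator();
    private Integer lastSeen = null;   // last column name handed to the caller
    private boolean exhausted = false; // set once a short page comes back
    int fetches = 0;                   // number of simulated round trips

    PagingIteratorSketch(NavigableSet<Integer> wideRow, int pageSize) {
        this.wideRow = wideRow;
        this.pageSize = pageSize;
    }

    // Simulated slice query: up to pageSize columns strictly after lastSeen
    private void fetchNextPage() {
        fetches++;
        NavigableSet<Integer> rest =
                lastSeen == null ? wideRow : wideRow.tailSet(lastSeen, false);
        List<Integer> buffer = new ArrayList<Integer>();
        for (Integer column : rest) {
            if (buffer.size() == pageSize) {
                break;
            }
            buffer.add(column);
        }
        if (buffer.size() < pageSize) {
            exhausted = true;
        }
        page = buffer.iterator();
    }

    @Override
    public boolean hasNext() {
        if (!page.hasNext() && !exhausted) {
            fetchNextPage();
        }
        return page.hasNext();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        lastSeen = page.next();
        return lastSeen;
    }
}
```

The caller just loops with `hasNext()` and `next()`; the page size becomes a tuning knob instead of part of the calling code.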

Page 20: Strata west 2012_java_cassandra

Using CQL

See comments in class files as we go

– Use cqlsh for queries, some administration tasks

– Caveat: no composites or super column support

Common API Usage

Page 21: Strata west 2012_java_cassandra

JdbcTemplate

Some compiling required

– Not quite there on the typing support

– Pooling library needs work

– Give this a try if you want: https://github.com/riptano/jdbc-conn-pool

Specifically: https://github.com/riptano/jdbc-conn-pool/tree/master/portfolio-example

Common API Usage

Page 22: Strata west 2012_java_cassandra

JdbcTemplate Configuration via ResourceRef

Common API Usage

Page 23: Strata west 2012_java_cassandra

JdbcTemplate Configuration via Context

Common API Usage

Page 24: Strata west 2012_java_cassandra

JdbcTemplate Insertion

Common API Usage

Page 25: Strata west 2012_java_cassandra

JdbcTemplate Selection

Common API Usage

Page 26: Strata west 2012_java_cassandra

Storage and On-Disk Structure

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Page 27: Strata west 2012_java_cassandra

Merge-On-Read

On-disk structure is immutable

No read-before-write

Highest timestamp wins

Delete markers (“tombstones”) thrown out on merge

Benefits
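A small sketch of the merge-on-read rule for a single column: every SSTable may hold its own version, the highest timestamp wins, and a winning tombstone means the column reads as deleted. `MergeOnReadSketch`, `Cell` and `read` are hypothetical names; ties on timestamp are not modeled here.

```java
import java.util.Arrays;
import java.util.List;

/** Resolves the versions of one column found across several immutable SSTables. */
final class MergeOnReadSketch {
    /** One stored version of a column; a null value marks a tombstone. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Returns the live value, or null when the column is absent or deleted. */
    static String read(List<Cell> versions) {
        Cell winner = null;
        for (Cell cell : versions) {
            if (winner == null || cell.timestamp > winner.timestamp) {
                winner = cell; // highest timestamp wins
            }
        }
        return (winner == null || winner.isTombstone()) ? null : winner.value;
    }

    public static void main(String[] args) {
        List<Cell> versions = Arrays.asList(
                new Cell("589.55", 1L), new Cell("583.16", 2L));
        System.out.println(read(versions));
    }
}
```

Writes never look at existing data; all reconciliation is deferred to the read (or to compaction).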

Page 28: Strata west 2012_java_cassandra

Compaction

Merge SSTables

Keeps SSTable count down

Makes merge-on-read process more efficient

Groups rows into single SSTable

Can vary by workload:

Size-Tiered compaction

Leveled compaction

Benefits
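Compaction can be pictured as doing that merge once, ahead of time, for whole tables. The sketch below merges several sorted maps into one and drops tombstones; it is a simplification under stated assumptions (tombstones are treated as immediately droppable, and neither size-tiered nor leveled selection is modeled). `CompactionSketch`, `Cell`, `compact` and `demo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Merges several SSTable-like sorted maps into a single one. */
final class CompactionSketch {
    /** One stored version; a null value marks a tombstone. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    static SortedMap<String, Cell> compact(List<SortedMap<String, Cell>> sstables) {
        SortedMap<String, Cell> merged = new TreeMap<String, Cell>();
        // Pass 1: keep the highest-timestamp version of every key
        for (SortedMap<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        // Pass 2: throw out tombstones (assumed droppable in this sketch)
        Iterator<Cell> cells = merged.values().iterator();
        while (cells.hasNext()) {
            if (cells.next().value == null) {
                cells.remove();
            }
        }
        return merged;
    }

    /** Two small tables: the newer one updates GOOG and deletes NFLX. */
    static SortedMap<String, Cell> demo() {
        SortedMap<String, Cell> older = new TreeMap<String, Cell>();
        older.put("GOOG", new Cell("589.55", 1L));
        older.put("NFLX", new Cell("78.73", 1L));
        older.put("NOK", new Cell("6.90", 1L));
        SortedMap<String, Cell> newer = new TreeMap<String, Cell>();
        newer.put("GOOG", new Cell("583.16", 2L));
        newer.put("NFLX", new Cell(null, 2L));
        List<SortedMap<String, Cell>> tables = new ArrayList<SortedMap<String, Cell>>();
        tables.add(older);
        tables.add(newer);
        return compact(tables);
    }
}
```

After compaction a read touches one table instead of two, which is the "more efficient merge-on-read" benefit listed above.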

Page 29: Strata west 2012_java_cassandra

Indexing Techniques

See org.apache.tutorial.CompositeDataLoader

– Store a static index in a single row

Common API Usage

Page 30: Strata west 2012_java_cassandra

Indexing Techniques

See org.apache.tutorial.CompositeQuery

– Use slice of composites to narrow in on query

Common API Usage

Page 31: Strata west 2012_java_cassandra

Indexing Techniques

See org.apache.tutorial.CompositeQuery

– Let’s add another level to the composite

Common API Usage

Page 32: Strata west 2012_java_cassandra

Indexing Techniques

See org.apache.tutorial.CompositeQuery

– Add a third level to the composite to narrow the search to “cities in California starting with ‘Ag’”

Common API Usage
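The composite slices above can be sketched with sorted strings: each extra level narrows the range. This is only an illustration of the idea, assuming a hypothetical `country:state:city` string encoding in place of real composite column names; `CompositeSliceSketch`, `insert`, `slice` and `sample` are made-up names, and the sample cities are illustrative data, not the tutorial's data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Models an index row whose sorted column names are (country, state, city) composites. */
final class CompositeSliceSketch {
    private final NavigableSet<String> columnNames = new TreeSet<String>();

    void insert(String country, String state, String city) {
        // Hypothetical encoding: components joined with ':' so they sort level by level
        columnNames.add(country + ":" + state + ":" + city);
    }

    /** All column names starting with the given prefix, in sorted order. */
    List<String> slice(String prefix) {
        String end = prefix + Character.MAX_VALUE; // upper bound just past the prefix range
        return new ArrayList<String>(columnNames.subSet(prefix, true, end, true));
    }

    static CompositeSliceSketch sample() {
        CompositeSliceSketch index = new CompositeSliceSketch();
        index.insert("US", "CA", "Anaheim");
        index.insert("US", "CA", "Agoura Hills");
        index.insert("US", "TX", "Austin");
        index.insert("US", "CA", "Agua Dulce");
        index.insert("US", "CA", "Alameda");
        return index;
    }

    public static void main(String[] args) {
        CompositeSliceSketch index = sample();
        System.out.println(index.slice("US:"));      // one level: country
        System.out.println(index.slice("US:CA:"));   // two levels: country, state
        System.out.println(index.slice("US:CA:Ag")); // third level: city prefix
    }
}
```

Because the whole index lives in one sorted row, each query is a single contiguous range read.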

Page 33: Strata west 2012_java_cassandra

Revisiting the Time Series Example

See org.apache.tutorial.BucketingTimeSeriesInserter

– Uses buckets for granularity

Every minute gets a distinct row, e.g. 2012_02_28_13_30

Common API Usage
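A possible way to derive the per-minute row key shown above from an event time. This is a sketch, not the tutorial's BucketingTimeSeriesInserter: `MinuteBucketSketch` and `rowKeyFor` are hypothetical names, and it assumes all writers agree on one time zone (a `LocalDateTime` carries none).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Builds one row key per minute, e.g. 2012_02_28_13_30. */
final class MinuteBucketSketch {
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    /** Seconds and smaller units are dropped, so every event in a minute shares a row. */
    static String rowKeyFor(LocalDateTime eventTime) {
        return eventTime.format(BUCKET);
    }

    public static void main(String[] args) {
        System.out.println(rowKeyFor(LocalDateTime.of(2012, 2, 28, 13, 30, 45)));
    }
}
```

Changing the pattern (for example dropping `_mm`) changes the bucket granularity without touching the rest of the write path.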

Page 34: Strata west 2012_java_cassandra

Revisiting the Time Series Example

See org.apache.tutorial.BucketingTimeSeriesQuery

– More advanced slicing examples

– Keys can be rebuilt for any time window

– Keep rows grouped tightly on disk

I need the 30 minutes between 3 and 4pm for every day last week

Storage Model
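"Keys can be rebuilt for any time window" might look like the sketch below: given a first day, a number of days, a start time and a length in minutes, it regenerates every per-minute row key in the window. `TimeWindowKeysSketch` and `rowKeys` are hypothetical names, the key format is the one from the bucketing slide, and reading 15:00 to 15:29 as "the 30 minutes" is an assumption.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Rebuilds per-minute row keys for the same time-of-day window on several days. */
final class TimeWindowKeysSketch {
    private static final DateTimeFormatter BUCKET =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    static List<String> rowKeys(LocalDate firstDay, int days, LocalTime windowStart, int minutes) {
        List<String> keys = new ArrayList<String>(days * minutes);
        for (int d = 0; d < days; d++) {
            LocalDateTime start = LocalDateTime.of(firstDay.plusDays(d), windowStart);
            for (int m = 0; m < minutes; m++) {
                keys.add(start.plusMinutes(m).format(BUCKET));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // 30 minutes starting at 3pm, for seven consecutive days
        List<String> keys = rowKeys(LocalDate.of(2012, 2, 20), 7, LocalTime.of(15, 0), 30);
        System.out.println(keys.size() + " keys, first " + keys.get(0));
    }
}
```

The resulting key list can then be handed to a multi-row read; no scan over unrelated rows is needed.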

Page 35: Strata west 2012_java_cassandra

Tombstones

See org.apache.tutorial.TombstoneDemoInserter and TombstoneDemoQuery

Storage Model

Page 36: Strata west 2012_java_cassandra

Tombstone

Output before deletion

Page 37: Strata west 2012_java_cassandra

Tombstone

Output after deletion

Page 38: Strata west 2012_java_cassandra

Understanding the Ring and Consistency

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Page 39: Strata west 2012_java_cassandra

The Ring

Lexicographically similar tokens are hashed to (very) different values

Provides for shared knowledge of key location

The actual token range is from 0 to 2^127

The token is created by converting an MD5 hash of the key to a java.lang.BigInteger

Token Distribution Distributed Hashing
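The key-to-token step can be sketched with the JDK alone. This assumes the key is encoded as UTF-8 and that the absolute value of the signed 128-bit hash is taken (as Cassandra's RandomPartitioner does); `Md5TokenSketch` and `tokenFor` are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Turns a row key into a non-negative token by MD5-hashing it. */
final class Md5TokenSketch {
    static BigInteger tokenFor(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            // The 16 digest bytes are read as a signed integer; abs() keeps tokens non-negative
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JDK", e);
        }
    }

    public static void main(String[] args) {
        // Similar keys land far apart on the ring
        System.out.println("foo -> " + tokenFor("foo"));
        System.out.println("fop -> " + tokenFor("fop"));
    }
}
```

The same key always yields the same token, which is what lets every node work out where a key lives.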

Page 40: Strata west 2012_java_cassandra

The Ring

The next token after the highest possible value is the lowest possible value.

Token Distribution as a Ring Wrapping Ranges

Page 41: Strata west 2012_java_cassandra

The Ring

Nodes distribute ownership via Token ranges

A node owns its token and the range immediately before

Nodes continuously “gossip” ring ownership

Any node can act as a coordinator to service requests for any other node

4 Node Token Distribution Simplified Ring Example

“foo”

Page 42: Strata west 2012_java_cassandra

The Ring

Node     Initial Token   First Token   Last Token
Node 1   0               76            0
Node 2   25              1             25
Node 3   50              26            50
Node 4   75              51            75

Inclusive token ranges for a four node cluster
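The table can be reproduced with a sorted map: a key's token belongs to the first node whose token is greater than or equal to it, wrapping around to the lowest token. A minimal sketch using the simplified small token values from the table; `RingSketch`, `addNode`, `ownerOf` and `fourNodes` are hypothetical names, and replication is not modeled.

```java
import java.util.Map;
import java.util.TreeMap;

/** Looks up which node owns a token on a simplified ring. */
final class RingSketch {
    private final TreeMap<Integer, String> nodesByToken = new TreeMap<Integer, String>();

    void addNode(int token, String name) {
        nodesByToken.put(token, name);
    }

    /** A node owns its own token and the range immediately before it. */
    String ownerOf(int keyToken) {
        Map.Entry<Integer, String> owner = nodesByToken.ceilingEntry(keyToken);
        if (owner == null) {
            // Past the highest node token: wrap around to the lowest one
            owner = nodesByToken.firstEntry();
        }
        return owner.getValue(); // assumes the ring has at least one node
    }

    static RingSketch fourNodes() {
        RingSketch ring = new RingSketch();
        ring.addNode(0, "Node 1");
        ring.addNode(25, "Node 2");
        ring.addNode(50, "Node 3");
        ring.addNode(75, "Node 4");
        return ring;
    }

    public static void main(String[] args) {
        RingSketch ring = fourNodes();
        System.out.println("token 26 -> " + ring.ownerOf(26));
        System.out.println("token 76 -> " + ring.ownerOf(76));
    }
}
```

Tokens 76 and above wrap to Node 1, matching its "First Token 76, Last Token 0" row.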

Page 43: Strata west 2012_java_cassandra

Integrating with Web Applications

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Page 44: Strata west 2012_java_cassandra

Using Spring

AccountController and AccountDao

– Similar to JDBC example for wiring

Web Application Integration

Page 45: Strata west 2012_java_cassandra

Probably as far as we’ll get…

DataStax Documentation: http://www.datastax.com/docs/1.0/index

Apache Cassandra project wiki: http://wiki.apache.org/cassandra/

“The Dynamo Paper”: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

P. Helland. Building on Quicksand: http://arxiv.org/pdf/0909.1788

P. Helland. Life Beyond Distributed Transactions: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

“The Megastore Paper”: http://research.google.com/pubs/archive/36971.pdf

The Hector Client: http://hector-client.org

Web Application Integration