Strata west 2012_java_cassandra

Post on 29-Nov-2014

1 CONFIDENTIAL |

NATE MCCALL, Sr. Software Developer

Building Applications With Apache Cassandra

What We’ll Cover

Cassandra Basics

Common API Usage

Storage Model

Ring Overview

Web Application Integration

Requirements

JDK 1.6 or greater

Apache Maven 3.0.2 or greater

Apache Cassandra 1.0.7 – DataStax community edition:

http://www.datastax.com/download/community/versions

IDE such as Eclipse or IntelliJ will be helpful but not necessary

Several thumb drives available (please share)

All source on GitHub: https://github.com/zznate/strata-west-2012

Getting Started

Learning by doing: looking at and writing code

Examples are constructed explicitly to show off certain concepts

Move ahead if it gets slow – just start hacking

You must be comfortable writing and debugging software

How We’ll Cover It

It does not have to be hard.

Getting Down To It

It does not have to be mysterious.

You can leverage a mature language with stable clients against a proven, best-of-breed solution that is in use in high-traffic production environments right now.

Scale Out. But Really Though.

Best of Breed

– Linear scaling

– Real multi-datacenter support

– “Fix it on Monday” fault tolerance

Static Column Family

GOOG Price=589.55 Name=Google

APPL Price=401.76 Name=Apple

NFLX Price=78.73 Name=Netflix

NOK Price=6.90 Name=Nokia Exchange=NYSE

Schema optional: not all columns are required
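As a rough mental model only, a static column family can be pictured as a map of row keys to sparse, sorted maps of columns. The sketch below is plain Java with hypothetical names (StaticColumnFamilySketch, insert, get); it is not a Cassandra or Hector API and only illustrates that a row may carry columns other rows lack.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a static column family; not a Cassandra API.
class StaticColumnFamilySketch {
    // row key -> (column name -> value); columns within a row are sorted by name.
    private final Map<String, TreeMap<String, String>> rows = new LinkedHashMap<>();

    void insert(String rowKey, String column, String value) {
        TreeMap<String, String> columns = rows.get(rowKey);
        if (columns == null) {
            columns = new TreeMap<>();
            rows.put(rowKey, columns);
        }
        columns.put(column, value);
    }

    // Returns null when the row does not have the column; nothing forces it to exist.
    String get(String rowKey, String column) {
        TreeMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        StaticColumnFamilySketch stocks = new StaticColumnFamilySketch();
        stocks.insert("GOOG", "Price", "589.55");
        stocks.insert("GOOG", "Name", "Google");
        stocks.insert("NOK", "Price", "6.90");
        stocks.insert("NOK", "Name", "Nokia");
        stocks.insert("NOK", "Exchange", "NYSE"); // only NOK carries this column
        System.out.println("NOK Exchange:  " + stocks.get("NOK", "Exchange"));
        System.out.println("GOOG Exchange: " + stocks.get("GOOG", "Exchange"));
    }
}
```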

Dynamic Column Family

GOOG 10/25/11=583.16 10/24/11=596.42 10/23/11=590.49

APPL 10/25/11=397.77 10/24/11=405.77 10/23/11=392.87

NFLX 10/25/11=77.37 10/24/11=118.14 10/23/11=117.23

NOK 10/25/11=6.71 10/24/11=6.76 10/23/11=6.61

Prematerialized queries: store it how you read it
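One way to picture a dynamic column family is a single row whose column names are the dates themselves, kept in sorted order so that a date range is a contiguous slice. The following is a minimal plain-Java sketch under that assumption; WideRowSketch and its methods are hypothetical names, and a TreeMap stands in for Cassandra's sorted columns.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one wide row (for example the row for "GOOG").
class WideRowSketch {
    // Column name = trading date, column value = price; sorted by column name.
    private final NavigableMap<LocalDate, Double> columns = new TreeMap<>();

    void insert(LocalDate date, double price) {
        columns.put(date, price);
    }

    // Inclusive slice over column names, analogous to a column range query.
    NavigableMap<LocalDate, Double> slice(LocalDate start, LocalDate end) {
        return columns.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        WideRowSketch goog = new WideRowSketch();
        goog.insert(LocalDate.of(2011, 10, 23), 590.49);
        goog.insert(LocalDate.of(2011, 10, 24), 596.42);
        goog.insert(LocalDate.of(2011, 10, 25), 583.16);
        // The data is already stored in the order it will be read.
        System.out.println(goog.slice(LocalDate.of(2011, 10, 24), LocalDate.of(2011, 10, 25)));
    }
}
```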

The API

Starting up

If you didn’t look beforehand: http://www.datastax.com/docs/1.0/getting_started/index

We want to run the Cassandra process in the foreground to see what’s going on:

cd $CASSANDRA_HOME

./bin/cassandra -f

Common API Usage

DataStax OpsCenter

If you are not sure why you should have monitoring, have this running at all times.

http://www.datastax.com/docs/opscenter/index

Static Column Families: see org.apache.tutorial.BasicUsageExample

Dynamic Column Families: see org.apache.tutorial.TimeseriesInserter

– A Cassandra row can hold up to 2 billion columns

Dynamic Column Families: see org.apache.tutorial.TimeseriesIterationQuery

– Encapsulate paging in iteration for easier traversal of wide rows
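The idea of hiding paging behind an iterator can be sketched in plain Java as follows. This is not the tutorial's TimeseriesIterationQuery; PagedColumnIterator is a hypothetical class, and a NavigableMap stands in for the remote wide row. Each call to fetchNextPage plays the role of one slice query that starts just after the last column already seen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;

// Hypothetical iterator that walks a wide row one page of columns at a time.
class PagedColumnIterator implements Iterator<Map.Entry<String, String>> {
    private final NavigableMap<String, String> row; // stand-in for the stored row
    private final int pageSize;
    private Iterator<Map.Entry<String, String>> page;
    private String lastSeen;    // name of the last column handed to the caller
    private boolean exhausted;  // true once a short (or empty) page came back
    int pagesFetched;           // exposed only to make the paging visible

    PagedColumnIterator(NavigableMap<String, String> row, int pageSize) {
        this.row = row;
        this.pageSize = pageSize;
    }

    // Stand-in for one slice query: at most pageSize columns after lastSeen.
    private void fetchNextPage() {
        NavigableMap<String, String> remaining =
                lastSeen == null ? row : row.tailMap(lastSeen, false);
        List<Map.Entry<String, String>> buffer = new ArrayList<>();
        for (Map.Entry<String, String> column : remaining.entrySet()) {
            if (buffer.size() == pageSize) {
                break;
            }
            buffer.add(column);
        }
        exhausted = buffer.size() < pageSize;
        page = buffer.iterator();
        pagesFetched++;
    }

    @Override
    public boolean hasNext() {
        while ((page == null || !page.hasNext()) && !exhausted) {
            fetchNextPage();
        }
        return page != null && page.hasNext();
    }

    @Override
    public Map.Entry<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<String, String> column = page.next();
        lastSeen = column.getKey();
        return column;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("read-only sketch");
    }
}
```

The caller just loops with hasNext() and next() and never sees page boundaries, which is the point of encapsulating the paging.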

Using CQL: see comments in class files as we go

– Use cqlsh for queries and some administration tasks

– Caveat: no composite or super column support

JdbcTemplate: some compiling required

– Not quite there on the typing support

– Pooling library needs work

– Give this a try if you want: https://github.com/riptano/jdbc-conn-pool

– Specifically: https://github.com/riptano/jdbc-conn-pool/tree/master/portfolio-example

JdbcTemplate: Configuration via ResourceRef

JdbcTemplate: Configuration via Context

JdbcTemplate: Insertion

JdbcTemplate: Selection

Storage and On-Disk Structure

Merge-On-Read

On-disk structure is immutable

Benefits:

– No read-before-write

– Highest timestamp wins

– Delete markers (“tombstones”) thrown out on merge
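The merge rules for this slide can be sketched in a few lines of plain Java. This is an illustration only, not Cassandra's read path: MergeOnReadSketch and Cell are hypothetical names, each map stands in for one immutable SSTable's view of the same row, and a null value is used to mark a tombstone. Timestamp ties are not handled.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of merge-on-read for a single row.
class MergeOnReadSketch {
    // One stored version of a column; a null value marks a tombstone.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Merges the row as seen in several immutable "SSTables" into one result.
    static Map<String, String> read(List<Map<String, Cell>> sstables) {
        Map<String, Cell> newest = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> column : sstable.entrySet()) {
                Cell current = newest.get(column.getKey());
                // Highest timestamp wins; nothing already written is modified.
                if (current == null || column.getValue().timestamp > current.timestamp) {
                    newest.put(column.getKey(), column.getValue());
                }
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Cell> column : newest.entrySet()) {
            // A winning tombstone hides the column from the reader.
            if (!column.getValue().isTombstone()) {
                result.put(column.getKey(), column.getValue().value);
            }
        }
        return result;
    }
}
```

Because writes and deletes only ever append a newer cell, no read is needed before a write; the reconciliation cost is paid at read time instead.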

Compaction

Merge SSTables

Benefits:

– Keeps SSTable count down

– Makes merge-on-read process more efficient

– Groups rows into a single SSTable

– Can vary by workload: size-tiered compaction, leveled compaction
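To give a feel for how size-tiered compaction picks what to merge, here is a simplified plain-Java sketch. It groups SSTables of similar size into buckets and reports buckets that have reached a minimum count. The 0.5x to 1.5x similarity band and the threshold of 4 used in the example are assumptions chosen for illustration, and SizeTieredSketch is a hypothetical class, not Cassandra's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified sketch of size-tiered bucketing.
class SizeTieredSketch {
    // Groups SSTable sizes so each bucket holds tables of roughly similar size.
    static List<List<Long>> buckets(List<Long> sstableSizes) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        for (long size : sorted) {
            List<Long> target = null;
            for (List<Long> bucket : buckets) {
                double avg = average(bucket);
                // Assumed similarity band: within 0.5x to 1.5x of the bucket average.
                if (size >= 0.5 * avg && size <= 1.5 * avg) {
                    target = bucket;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                buckets.add(target);
            }
            target.add(size);
        }
        return buckets;
    }

    // Buckets with at least minThreshold tables are candidates for one merge.
    static List<List<Long>> compactionCandidates(List<Long> sstableSizes, int minThreshold) {
        List<List<Long>> candidates = new ArrayList<>();
        for (List<Long> bucket : buckets(sstableSizes)) {
            if (bucket.size() >= minThreshold) {
                candidates.add(bucket);
            }
        }
        return candidates;
    }

    private static double average(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.size();
    }
}
```

Merging a whole bucket into one SSTable is what keeps the SSTable count down and brings a row's fragments together.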

Indexing Techniques: see org.apache.tutorial.CompositeDataLoader

– Store a static index in a single row

Indexing Techniques: see org.apache.tutorial.CompositeQuery

– Use slice of composites to narrow in on query

Indexing Techniques: see org.apache.tutorial.CompositeQuery

– Let’s add another level to the composite

Indexing Techniques: see org.apache.tutorial.CompositeQuery

– Add a third level to the composite to narrow the search to “cities in California starting with ‘Ag’”
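The narrowing effect of each composite level can be sketched without a cluster. In the plain-Java example below, a composite column name is flattened into a "country:state:city" string and a sorted map stands in for the single index row; real composite types compare component by component rather than as one string. CompositeIndexSketch and the country/state/city layout are assumptions made for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a static index stored as the sorted columns of one row.
class CompositeIndexSketch {
    // Column name = "country:state:city" (a flattened stand-in for a composite).
    private final NavigableMap<String, String> indexRow = new TreeMap<>();

    void add(String country, String state, String city, String value) {
        indexRow.put(country + ":" + state + ":" + city, value);
    }

    // Returns every column whose name starts with the given composite prefix.
    NavigableMap<String, String> slice(String prefix) {
        // The upper bound is the prefix followed by the largest possible char.
        return indexRow.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        CompositeIndexSketch index = new CompositeIndexSketch();
        index.add("US", "CA", "Agoura Hills", "");
        index.add("US", "CA", "Aguanga", "");
        index.add("US", "CA", "Alameda", "");
        index.add("US", "TX", "Agua Dulce", "");
        System.out.println(index.slice("US:").keySet());      // one level
        System.out.println(index.slice("US:CA:").keySet());   // two levels
        System.out.println(index.slice("US:CA:Ag").keySet()); // three levels
    }
}
```

Each added level shortens the contiguous range that has to be read, which is why the slice gets cheaper as the prefix grows.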

Revisiting the Time Series Example: see org.apache.tutorial.BucketingTimeSeriesInserter

– Uses buckets for granularity

– Every minute gets a distinct row, e.g. 2012_02_28_13_30
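A minute-granularity bucket key in the format shown above can be derived from a timestamp with a date pattern. This is a minimal sketch, not the tutorial's BucketingTimeSeriesInserter; MinuteBucketSketch and rowKeyFor are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that maps a timestamp to its per-minute row key.
class MinuteBucketSketch {
    // Year, month, day, hour (24h) and minute joined by underscores.
    private static final DateTimeFormatter MINUTE_BUCKET =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    // Seconds and smaller units are dropped, so one minute maps to one row.
    static String rowKeyFor(LocalDateTime timestamp) {
        return timestamp.format(MINUTE_BUCKET);
    }

    public static void main(String[] args) {
        System.out.println(rowKeyFor(LocalDateTime.of(2012, 2, 28, 13, 30, 45)));
    }
}
```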

Revisiting the Time Series Example: see org.apache.tutorial.BucketingTimeSeriesQuery

– More advanced slicing examples

– Keys can be rebuilt for any time window

– Keep rows grouped tightly on disk

I need the 30 minutes between 3 and 4pm for every day last week
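Because bucket keys are derived purely from time, the keys for a request like the one above can be rebuilt on the client and fetched in one multi-row read. The sketch below assumes per-minute keys in the yyyy_MM_dd_HH_mm form and, since the request does not say which 30 minutes, picks 15:00 to 15:29 as an arbitrary example window. BucketWindowSketch and rowKeys are hypothetical names.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that rebuilds per-minute row keys for a time window.
class BucketWindowSketch {
    private static final DateTimeFormatter MINUTE_BUCKET =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    // One key per minute of the window, repeated for each consecutive day.
    static List<String> rowKeys(LocalDate firstDay, int days, LocalTime windowStart, int minutes) {
        List<String> keys = new ArrayList<>();
        for (int d = 0; d < days; d++) {
            LocalDateTime start = LocalDateTime.of(firstDay.plusDays(d), windowStart);
            for (int m = 0; m < minutes; m++) {
                keys.add(start.plusMinutes(m).format(MINUTE_BUCKET));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Seven days starting 2012-02-20, 30 minutes from 15:00 each day.
        List<String> keys = rowKeys(LocalDate.of(2012, 2, 20), 7, LocalTime.of(15, 0), 30);
        System.out.println(keys.size() + " row keys, first " + keys.get(0));
    }
}
```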

Storage Model

Tombstones: see org.apache.tutorial.TombstoneDemoInserter and TombstoneDemoQuery

Tombstone

Output before deletion

Tombstone

Output after deletion

Understanding the Ring and Consistency

The Ring

Lexicographically similar keys are hashed to (very) different token values

Provides for shared knowledge of key location

The actual token range is from 0 to 2^127

The token is created by converting an MD5 hash of the key to a java.lang.BigInteger
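The key-to-token conversion described above can be tried with nothing but the JDK. This is a minimal sketch: TokenSketch is a hypothetical class, the key is assumed to be UTF-8 text, and taking the absolute value of the signed BigInteger is an assumption about how the sign of the 128-bit hash is handled.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of deriving a ring token from a row key.
class TokenSketch {
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Interpret the 16 hash bytes as a signed integer, then drop the sign.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        // Keys that differ by one character land far apart on the ring.
        System.out.println("foo -> " + token("foo"));
        System.out.println("fop -> " + token("fop"));
    }
}
```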

Token Distribution Distributed Hashing

The next token after the highest possible value is the lowest possible value.

Token Distribution as a Ring Wrapping Ranges

Nodes distribute ownership via Token ranges

A node owns its token and the range immediately before

Nodes continuously “gossip” ring ownership

Any node can act as a coordinator to service requests for any other node

4 Node Token Distribution Simplified Ring Example

“foo”

Inclusive token ranges for a four-node cluster:

Node     Initial Token   First Token   Last Token
Node 1   0               76            0
Node 2   25              1             25
Node 3   50              26            50
Node 4   75              51            75
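The ownership rule behind these ranges, including the wrap-around past the highest token, can be sketched with a sorted map. RingSketch is a hypothetical class using the simplified integer tokens from the four-node example, and it assumes the ring has at least one node.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of locating the owner of a token on the ring.
class RingSketch {
    // Initial token -> node name, kept in token order.
    private final TreeMap<Integer, String> nodesByToken = new TreeMap<>();

    void addNode(String name, int initialToken) {
        nodesByToken.put(initialToken, name);
    }

    // The owner is the node with the smallest initial token >= the key's token.
    String ownerOf(int keyToken) {
        Map.Entry<Integer, String> owner = nodesByToken.ceilingEntry(keyToken);
        if (owner == null) {
            // Past the highest token: wrap around to the lowest one.
            owner = nodesByToken.firstEntry();
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("Node 1", 0);
        ring.addNode("Node 2", 25);
        ring.addNode("Node 3", 50);
        ring.addNode("Node 4", 75);
        System.out.println("token 26 -> " + ring.ownerOf(26));
        System.out.println("token 76 -> " + ring.ownerOf(76));
    }
}
```

Since every node learns the same token assignments through gossip, any node can run this lookup and act as coordinator for a request.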

Integrating with Web Applications

Using Spring: AccountController and AccountDao

– Similar to JDBC example for wiring

Web Application Integration

Probably as far as we’ll get…

DataStax Documentation: http://www.datastax.com/docs/1.0/index

Apache Cassandra project wiki: http://wiki.apache.org/cassandra/

“The Dynamo Paper”: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

P. Helland. Building on Quicksand: http://arxiv.org/pdf/0909.1788

P. Helland. Life Beyond Distributed Transactions: http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

“The Megastore Paper”: http://research.google.com/pubs/archive/36971.pdf

The Hector Client: http://hector-client.org
