Paris Cassandra Meetup - Cassandra for Developers

Posted on 26-May-2015


Cassandra for Developers: DataStax Drivers in Practice

Michaël Figuière, Drivers & Developer Tools Architect

@mfiguiere

© 2014 DataStax, All Rights Reserved.

Cassandra Peer to Peer Architecture

[Diagram: six peer nodes]

Each node contains a replica of some partitions of tables.

Every node has the same role; there is no master or slave.

Cassandra Peer to Peer Architecture

[Diagram: six nodes, three of which are marked as replicas]

Each partition is stored in several replicas to ensure durability and high availability.

Client / Server Communication

[Diagram: four clients connected to a cluster of six nodes, three of which are marked as replicas]

Coordinator node: forwards all read/write requests to the corresponding replicas.

Tunable Consistency

3 replicas

[Diagram: timeline of three replicas, each holding the value A]

Tunable Consistency

Write 'B' and wait for an acknowledgement from one node.

[Diagram: timeline of three replicas going from A A A to B A A]


Tunable Consistency

R + W < N

Write and wait for an acknowledgement from one node.

Read waiting for one node to answer.

[Diagram: timeline of three replicas going from A A A to B A A, followed by the read]

The read may be answered by a replica that has not yet received 'B'.

Tunable Consistency

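R + W = N

Write and wait for acknowledgements from two nodes.

Read waiting for one node to answer.

[Diagram: timeline of three replicas going from A A A to B B A, followed by the read]

The read may still be answered by the one replica that has not yet received 'B'.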

Tunable Consistency

R + W > N

Write and wait for acknowledgements from two nodes.

Read waiting for two nodes to answer.

[Diagram: timeline of three replicas during the write and the read]

At least one of the replicas answering the read has received 'B'.

Tunable Consistency

R = W = QUORUM

[Diagram: timeline of three replicas during a QUORUM write and a QUORUM read]

QUORUM = (N / 2) + 1, using integer division (2 for N = 3)
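The arithmetic behind the tunable consistency slides can be checked with a few lines of Python. This is a minimal sketch, not driver code: the function names `quorum` and `read_overlaps_write` are made up for illustration, and N is taken to be the number of replicas of a partition.

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas: (n / 2) + 1 with integer division."""
    return n // 2 + 1


def read_overlaps_write(r: int, w: int, n: int) -> bool:
    """True when any r replicas read must include one of the w replicas written."""
    return r + w > n


n = 3
q = quorum(n)
print(q)                                # 2
print(read_overlaps_write(1, 1, n))     # False: R + W < N
print(read_overlaps_write(1, 2, n))     # False: R + W = N
print(read_overlaps_write(q, q, n))     # True:  R = W = QUORUM
```

With three replicas, reading and writing at QUORUM is the cheapest symmetric setting for which the overlap condition holds.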

Cassandra Query Language (CQL)

• Similar to SQL, mostly a subset
• Without joins, sub-queries, and aggregations
• Primary Key contains:
  • A Partition Key, used to select the partition that will store the Row
  • Some Clustering Columns, used to define how Rows are grouped and sorted on disk
• Supports Collections
• Supports User Defined Types (UDT)

CQL: Create Table

    CREATE TABLE users (
      login text,
      name text,
      age int,
      …
      PRIMARY KEY (login)
    );

login is the partition key: it is hashed, and rows are spread over the cluster across different partitions.

Just like in SQL!

CQL: Clustered Table

    CREATE TABLE mailbox (
      login text,
      message_id timeuuid,
      interlocutor text,
      message text,
      PRIMARY KEY ((login), message_id)
    );

message_id is a clustering column: all the rows with the same login are grouped together and sorted by message_id on disk.

A TimeUUID is a UUID that can be sorted chronologically.

CQL: Queries

Get message by user and message_id (date):

    SELECT * FROM mailbox
    WHERE login = 'jdoe'
    AND message_id = '2014-09-25 16:00:00';

Get messages by user and date interval:

    SELECT * FROM mailbox
    WHERE login = 'jdoe'
    AND message_id <= '2014-09-25 16:00:00'
    AND message_id >= '2014-09-20 16:00:00';

WHERE clauses can only constrain primary key columns, and range queries are not possible on the partition key.
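To make the partition key / clustering column distinction concrete, here is a small in-memory model of the mailbox table. It is only a sketch of the storage idea, not how Cassandra is implemented: the helper names `insert` and `select_range` are hypothetical, and plain `datetime` values stand in for TimeUUIDs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime

# partition key (login) -> rows kept sorted by clustering column (message_id)
mailbox = defaultdict(list)


def insert(login, message_id, message):
    """Place the row inside its partition at the position that keeps it sorted."""
    rows = mailbox[login]
    keys = [mid for mid, _ in rows]
    rows.insert(bisect_right(keys, message_id), (message_id, message))


def select_range(login, start, end):
    """Equality on the partition key, range on the clustering column."""
    rows = mailbox.get(login, [])
    keys = [mid for mid, _ in rows]
    return rows[bisect_left(keys, start):bisect_right(keys, end)]


insert("jdoe", datetime(2014, 9, 26), "late")
insert("jdoe", datetime(2014, 9, 21), "early")
insert("jdoe", datetime(2014, 9, 24), "mid")
insert("asmith", datetime(2014, 9, 22), "other")

hits = select_range("jdoe", datetime(2014, 9, 20, 16), datetime(2014, 9, 25, 16))
print([message for _, message in hits])   # ['early', 'mid']
```

Because rows inside a partition are already sorted, the range query is a contiguous slice; a range over logins would have no such ordering to exploit, which mirrors the restriction on partition keys.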

CQL: Collections

    CREATE TABLE users (
      login text,
      name text,
      age int,
      friends set<text>,
      hobbies list<text>,
      languages map<int, text>,
      …
      PRIMARY KEY (login)
    );

It's not possible to use nested collections… yet.

set and list have semantics similar to their Java counterparts.

Cassandra 2.1: User Defined Type (UDT)

Without a UDT:

    CREATE TABLE users (
      login text,
      …
      street_number int,
      street_name text,
      postcode int,
      country text,
      …
      PRIMARY KEY (login)
    );

With a UDT:

    CREATE TYPE address (
      street_number int,
      street_name text,
      postcode int,
      country text
    );

    CREATE TABLE users (
      login text,
      …
      location frozen<address>,
      …
      PRIMARY KEY (login)
    );

Cassandra 2.1: UDT Insert / Update

    INSERT INTO users (login, name, location)
    VALUES ('jdoe', 'John DOE', {
      street_number: 124,
      street_name: 'Congress Avenue',
      postcode: 95054,
      country: 'USA'
    });

    UPDATE users
    SET location = {
      street_number: 125,
      street_name: 'Congress Avenue',
      postcode: 95054,
      country: 'USA'
    }
    WHERE login = 'jdoe';


Request Pipelining

[Diagram: request/response exchanges between Client and Cassandra, without and with request pipelining]

Notifications

[Diagram: a client and three nodes, shown without and with notifications]

Asynchronous Driver Architecture

[Diagram: several client threads issuing requests through a single driver instance to multiple nodes, with the steps numbered 1 to 6]

Failover

[Diagram: client threads, driver, and nodes as on the previous slide, with the steps numbered 1 to 7]

DataStax Drivers Highlights

• Asynchronous architecture using non-blocking I/O
• Prepared Statements support
• Automatic failover
• Node discovery
• Tunable load balancing
  • Round Robin, Latency Awareness, Multi Data Centers, Replica Awareness
• Cassandra tracing support
• Compression & SSL
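The "automatic failover" item can be pictured as walking an ordered list of candidate hosts until one of them answers. The sketch below is an assumption about the general shape of that loop, not the drivers' actual code: `execute_with_failover`, `send`, and the local `NoHostAvailable` class are hypothetical names defined here for illustration.

```python
class NoHostAvailable(Exception):
    """Raised when every host in the query plan has failed."""


def execute_with_failover(query_plan, send):
    """Try each host in order; return the first successful response."""
    errors = {}
    for host in query_plan:
        try:
            return send(host)
        except ConnectionError as exc:
            errors[host] = exc      # remember the failure and move on
    raise NoHostAvailable(errors)


def flaky_send(host):
    # Hypothetical transport: node1 is down, the others answer.
    if host == "node1":
        raise ConnectionError("node1 is down")
    return "rows from " + host


print(execute_with_failover(["node1", "node2", "node3"], flaky_send))
# rows from node2
```

The order of `query_plan` is where a load balancing policy comes in: the policy decides which hosts are tried first, and the failover loop only consumes that order.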

DataCenter Aware Balancing

[Diagram: two datacenters, A and B, each with three nodes and their own clients]

Local nodes are queried first; if none are available, the request may be sent to a remote node.
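The datacenter-aware rule fits in a few lines. This is a simplified sketch under stated assumptions: hosts are given as hypothetical `(address, datacenter)` pairs, `up` is the set of addresses currently believed to be alive, and `dc_aware_plan` is an illustrative name rather than a driver API.

```python
def dc_aware_plan(hosts, local_dc, up):
    """Order live hosts so that the local datacenter comes first."""
    local = [addr for addr, dc in hosts if dc == local_dc and addr in up]
    remote = [addr for addr, dc in hosts if dc != local_dc and addr in up]
    return local + remote


hosts = [("10.0.0.1", "A"), ("10.0.0.2", "A"), ("10.1.0.1", "B")]

# All nodes up: local nodes first, the remote node only as a last resort.
print(dc_aware_plan(hosts, "A", up={"10.0.0.1", "10.0.0.2", "10.1.0.1"}))
# ['10.0.0.1', '10.0.0.2', '10.1.0.1']

# Whole local datacenter down: the request can still go to datacenter B.
print(dc_aware_plan(hosts, "A", up={"10.1.0.1"}))
# ['10.1.0.1']
```

A real policy would typically also rotate the starting point within each group so that load is spread across the local nodes.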

Token Aware Balancing

Nodes that own a replica of the partition key being read or written by the query are contacted first.

[Diagram: a client and six nodes, three of which are marked as replicas]

The partition key is inferred from the Prepared Statement's metadata.
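Token-aware routing can be sketched with a toy token ring. Several simplifications are assumed here: each node owns a single token, replicas are the next nodes clockwise on the ring, and MD5 is used only as a stand-in hash. The names `token`, `replicas_for`, and `token_aware_plan` are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(partition_key: str) -> int:
    """Stand-in hash of the partition key (an assumption, not Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


def replicas_for(key_token, ring, replication_factor):
    """ring is a list of (token, node) sorted by token, one token per node."""
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)   # first node with token >= key
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def token_aware_plan(key_token, ring, replication_factor):
    """Replicas first, then every other node as a fallback."""
    replicas = replicas_for(key_token, ring, replication_factor)
    return replicas + [node for _, node in ring if node not in replicas]


ring = [(0, "n1"), (100, "n2"), (200, "n3"), (300, "n4")]
print(token_aware_plan(150, ring, 3))   # ['n3', 'n4', 'n1', 'n2']
```

Contacting a replica first saves a network hop, since that node can serve the request itself instead of acting as a coordinator for another node.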

State of DataStax Drivers

Driver     Cassandra 1.2    Cassandra 2.0    Cassandra 2.1
Java       1.0 - 2.1        2.0 - 2.1        2.1
Python     1.0 - 2.1        2.0 - 2.1        2.1
C#         1.0 - 2.1        2.0 - 2.1        2.1
Node.js    1.0              1.0              Later
C++        1.0-beta4        1.0-beta4        Later
Ruby       1.0-beta3        1.0-beta3        Later

Later versions of Cassandra can use earlier Drivers, but some features won't be supported.

DataStax Driver in Practice

Java

    <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>2.1.0</version>
    </dependency>

Python

    $ pip install cassandra-driver

C#

    PM> Install-Package CassandraCSharpDriver

Ruby

    $ gem install cassandra-driver --pre

Node.js

    $ npm install cassandra-driver

Connect and Write

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    Cluster cluster = Cluster.builder()
        .addContactPoints("10.1.2.5", "cassandra_node3")
        .build();

    Session session = cluster.connect("my_keyspace");

    session.execute(
        "INSERT INTO user (user_id, name, email) " +
        "VALUES (12345, 'johndoe', 'john@doe.com')");

The rest of the nodes will be discovered by the driver.

A keyspace is just like a schema in the SQL world.

Read

    ResultSet resultSet = session.execute(
        "SELECT * FROM user WHERE user_id IN (1,8,13)");

    List<Row> rows = resultSet.all();
    for (Row row : rows) {
        int userId = row.getInt("user_id");   // user_id was inserted as an int
        String name = row.getString("name");
        String email = row.getString("email");
    }

ResultSet also implements Iterable<Row>.

Session is a thread-safe object. A singleton should be instantiated at startup.

Write with Prepared Statements

    PreparedStatement insertUser = session.prepare(
        "INSERT INTO user (user_id, name, email) VALUES (?, ?, ?)");

    BoundStatement statement = insertUser.bind(12345, "johndoe", "john@doe.com");
    statement.setConsistencyLevel(ConsistencyLevel.QUORUM);

    session.execute(statement);

Parameters can be named as well.

PreparedStatement objects are also thread-safe; just create a singleton at startup.

BoundStatement is a stateful, NON thread-safe object.

The Consistency Level can be set for each statement.

Asynchronous Read

    ResultSetFuture future = session.executeAsync(
        "SELECT * FROM user WHERE user_id IN (1,2,3)");

    ResultSet resultSet = future.get();   // blocks until the result is available

    List<Row> rows = resultSet.all();
    for (Row row : rows) {
        int userId = row.getInt("user_id");
        String name = row.getString("name");
        String email = row.getString("email");
    }

executeAsync() will not block; it returns immediately unless all the connections are busy.

Asynchronous Read with Callbacks

    ResultSetFuture future = session.executeAsync(
        "SELECT * FROM user WHERE user_id IN (1,2,3)");

    future.addListener(new Runnable() {
        public void run() {
            // Process the results here
        }
    }, executor);

ResultSetFuture implements Guava's ListenableFuture.

    executor = MoreExecutors.sameThreadExecutor();

Only if your listener code is trivial and non-blocking, as it will be executed in the IO thread.

    executor = Executors.newCachedThreadPool();

…or any thread pool that you prefer.
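The same submit / callback / blocking-get pattern can be tried without a cluster using Python's standard library. This is an analogy rather than driver code: `run_query` is a hypothetical stand-in for a network round trip, and `concurrent.futures` plays the role of the driver's future type.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(cql):
    """Hypothetical stand-in for sending a query and waiting for the response."""
    time.sleep(0.05)
    return [{"user_id": 1, "name": "johndoe"}]


io_pool = ThreadPoolExecutor(max_workers=2)
seen_by_callback = []

# Returns a future right away; the query runs on a pool thread.
future = io_pool.submit(run_query, "SELECT * FROM user WHERE user_id IN (1,2,3)")

# The callback fires once the result is ready, here on the worker thread
# that completed the future, so it should stay short and non-blocking.
future.add_done_callback(lambda f: seen_by_callback.append(f.result()))

rows = future.result()        # blocks until the result is available
io_pool.shutdown(wait=True)   # worker threads (and their callbacks) are done after this

print(rows[0]["name"])        # johndoe
```

As with the same-thread executor on the slide, a slow callback here would hold up the thread that delivers results, which is why heavier processing belongs on a separate pool.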

Query Builder

    import static com.datastax.driver.core.querybuilder.QueryBuilder.*;

    Statement selectAll =
        select().all().from("user").where(eq("user_id", userId));

    session.execute(selectAll);

    Statement insert = insertInto("user")
        .value("user_id", 2)
        .value("name", "johndoe")
        .value("email", "john@doe.com");

    session.execute(insert);

The static import of QueryBuilder is required in order to use the DSL.

Python

    import logging
    from cassandra.cluster import Cluster

    log = logging.getLogger(__name__)

    cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'])
    session = cluster.connect('mykeyspace')

    def handle_success(rows):
        user = rows[0]
        try:
            process_user(user.name, user.age, user.id)
        except Exception:
            log.error("Failed to process user %s", user.id)
            # don't re-raise errors in the callback

    def handle_error(exception):
        log.error("Failed to fetch user info: %s", exception)

    future = session.execute_async("SELECT * FROM users WHERE user_id=3")
    future.add_callbacks(handle_success, handle_error)

It's also possible to retrieve the result from the future object synchronously.

C#

    var cluster = Cluster.Builder()
        .AddContactPoints("host1", "host2", "host3")
        .Build();
    var session = cluster.Connect("sample_keyspace");

    var task = session.ExecuteAsync(statement);
    task.ContinueWith((t) =>
    {
        var rs = t.Result;
        foreach (var row in rs)
        {
            // Get the values from each row
        }
    }, TaskContinuationOptions.OnlyOnRanToCompletion);

Asynchronously execute a query using the TPL.

C / C++

    CassString query = cass_string_init(
        "SELECT keyspace_name FROM system.schema_keyspaces;");
    CassStatement* statement = cass_statement_new(query, 0);

    CassFuture* result_future = cass_session_execute(session, statement);

    if (cass_future_error_code(result_future) == CASS_OK) {
        const CassResult* result = cass_future_get_result(result_future);
        CassIterator* rows = cass_iterator_from_result(result);

        while (cass_iterator_next(rows)) {
            // Process results
        }

        cass_result_free(result);
        cass_iterator_free(rows);
    }

    cass_future_free(result_future);
    cass_statement_free(statement);

Each structure must be freed with the appropriate function.

Node.js

    var assert = require('assert');
    var cassandra = require('cassandra-driver');

    var client = new cassandra.Client({
        contactPoints: ['host1', 'h2'],
        keyspace: 'ks1'
    });
    var query = 'SELECT email, last_name FROM user_profiles WHERE key=?';

    client.execute(query, ['guy'], function(err, result) {
        assert.ifError(err);
        console.log('got user profile with email ' + result.rows[0].email);
    });

Here we're using a Parameterized Statement, which is not prepared but still allows parameters.

Ruby

    require 'cassandra'

    cluster = Cassandra.cluster
    session = cluster.connect('system')

    future = session.execute_async('SELECT * FROM schema_columnfamilies')

    future.on_success do |rows|
      rows.each do |row|
        puts "The keyspace #{row['keyspace_name']} has a table called #{row['columnfamily_name']}"
      end
    end

    future.join

Register a listener on the future, which will be called when results are available.

Object Mapper

• Avoid boilerplate for common use cases

• Map Objects to Statements and ResultSets to Objects

• Do NOT hide Cassandra from the developer

• No “clever tricks” à la Hibernate

• Not JPA compatible, but JPA-ish API


Object Mapper in Practice

    <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-mapping</artifactId>
      <version>2.1.0</version>
    </dependency>

Additional artifact for object mapping.

Available from Driver 2.1.0.

Basic Object Mapping

    CREATE TYPE address (
        street text,
        city text,
        zip int
    );

    CREATE TABLE users (
        email text PRIMARY KEY,
        address frozen<address>
    );

    @UDT(keyspace = "ks", name = "address")
    public class Address {
        private String street;
        private String city;
        private int zip;

        // getters and setters omitted...
    }

    @Table(keyspace = "ks", name = "users")
    public class User {
        @PartitionKey
        private String email;
        private Address address;

        // getters and setters omitted...
    }

Basic Object Mapping

    MappingManager manager =
        new MappingManager(session);

    Mapper<User> mapper = manager.mapper(User.class);

    User myProfile =
        mapper.get("xyz@example.com");

    ListenableFuture<Void> saveFuture =
        mapper.saveAsync(anotherProfile);

    mapper.delete("xyz@example.com");

Mapper, just like Session, is a thread-safe object. Create a singleton at startup.

get() returns a mapped row for the given Primary Key.

saveAsync() returns a ListenableFuture from Guava, completed when the write is acknowledged.
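The core idea of the mapper, turning a result row into a typed object and a nested UDT value into a nested object, can be illustrated with Python dataclasses. This is a conceptual sketch only: `map_row` is a hypothetical helper, rows are modelled as plain dictionaries, and the sample address values are made up.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Address:          # plays the role of the address UDT
    street: str
    city: str
    zip: int


@dataclass
class User:             # plays the role of the users table
    email: str
    address: Address


def map_row(cls, row):
    """Build an instance of cls from a dict, recursing into nested dataclasses."""
    kwargs = {}
    for field in fields(cls):
        value = row[field.name]
        if is_dataclass(field.type) and isinstance(value, dict):
            value = map_row(field.type, value)
        kwargs[field.name] = value
    return cls(**kwargs)


row = {
    "email": "xyz@example.com",
    "address": {"street": "Congress Avenue", "city": "Austin", "zip": 78701},
}
user = map_row(User, row)
print(user.address.zip)   # 78701
```

The mapping stays a thin, explicit layer over rows, in the spirit of the "do NOT hide Cassandra from the developer" bullet.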

Accessors

    @Accessor
    interface UserAccessor {
        @Query("SELECT * FROM ks.users LIMIT :max")
        Result<User> firstN(@Param("max") int limit);
    }

    UserAccessor accessor =
        manager.createAccessor(UserAccessor.class);
    Result<User> users = accessor.firstN(10);

    for (User user : users) {
        System.out.println(
            user.getAddress().getZip()
        );
    }

Result is like ResultSet but specialized for a mapped class…

…so we iterate over it just like we would with a ResultSet.

We’re Hiring!

@mfiguiere

Cassandra Tech Day - Paris, November 4th

Cassandra Summit Europe - London, December 3-4th