Things You Should Be Doing When Using Cassandra Drivers

Post on 18-Jul-2015

Things You Should Be Doing When Using Cassandra Drivers

Rebecca Mills, Junior Evangelist at DataStax, @rebccamills

What do I do?

2 Confidential

•  Try to create awareness for open source Cassandra

•  Develop content

•  Identify problems newcomers might be encountering

•  Develop strategies and material to make that initial use easier

Of course all this extends to drivers!

•  Learning and playing with the drivers as much as I can

•  Develop “Getting Started” tutorials for drivers in various programming languages

•  Making it my mission to bring the details to light

So How Can We Communicate with Cassandra in “X” Language?

We have what you need!

•  DataStax provides drivers for Java, Python, and C#

•  Fresh out of the oven: Ruby, Node.js, and C++

•  Also loads of open source drivers to choose from

•  Check out the Planet Cassandra Client Drivers section

Let’s get into some of the basics of smart Cassandra driver usage:

1. One Cluster instance per cluster

•  Configure different important aspects of the way connections and queries will be handled.

•  Contact points

•  Retry Policies

•  Load Balancing Policies

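from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'],
                  compression=True,
                  load_balancing_policy=TokenAwarePolicy(
                      DCAwareRoundRobinPolicy(local_dc='US_EAST')))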

2. One Session per keyspace

•  Query execution, connection pooling

•  Long-lived object

•  Not to be used in a request/response short-lived fashion

•  Share the same cluster and session instances across your application

Cluster & Session

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'],
                  compression=True,
                  load_balancing_policy=TokenAwarePolicy(
                      DCAwareRoundRobinPolicy(local_dc='US_EAST')))

session = cluster.connect('demo')
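One way to picture "share the same cluster and session instances across your application" is a pair of cached accessor functions. The sketch below is only an illustration: `FakeCluster`, `FakeSession`, `get_cluster`, `get_session`, and `handle_request` are hypothetical names, and the stand-in classes replace the real driver so the example runs without a Cassandra node.

```python
import functools


class FakeSession:
    """Hypothetical stand-in for a driver session (not the real driver class)."""

    def __init__(self, keyspace):
        self.keyspace = keyspace


class FakeCluster:
    """Hypothetical stand-in for a driver cluster object."""

    instances_created = 0

    def __init__(self, contact_points):
        FakeCluster.instances_created += 1
        self.contact_points = contact_points

    def connect(self, keyspace):
        return FakeSession(keyspace)


@functools.lru_cache(maxsize=None)
def get_cluster():
    # Built once on first use, then reused by every caller.
    return FakeCluster(('10.1.1.3', '10.1.1.4', '10.1.1.5'))


@functools.lru_cache(maxsize=None)
def get_session(keyspace):
    # One long-lived session per keyspace, created from the shared cluster.
    return get_cluster().connect(keyspace)


def handle_request(user_id):
    # Request handlers borrow the shared session instead of connecting again.
    session = get_session('demo')
    return session.keyspace, user_id
```

However many requests are handled, only one cluster object and one session per keyspace are created, which is the long-lived usage the slides recommend.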

3. Use Prepared Statements

•  If you execute a statement more than once

•  Has multiple benefits

•  Prepare once, bind and execute multiple times

•  We’ll talk more about this soon!

Cool

Useful

Deep Dives:

•  Prepared Statements

•  Load Balancing Policies

•  Retry Policies

•  Connection Pooling

•  Async API

Why use Prepared Statements?

•  More performant than using strings

•  Will be parsed only once on the server

•  We expect you to use them with repeated queries in production

•  Avoid CQL injection
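The injection point can be illustrated without a Cassandra cluster. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in: it speaks SQL rather than CQL, and the malicious input is SQL-specific, so it only shows the general principle that a bound value is treated as data while a value pasted into the query text can change the query itself.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real cluster.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (lastname TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES ('Jones', 35)")
conn.execute("INSERT INTO users VALUES ('Smith', 24)")

malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the query text and changes its meaning.
unsafe_query = "SELECT lastname FROM users WHERE lastname = '%s'" % malicious
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Safe: the input travels as a bound value and is only ever compared as data.
safe_rows = conn.execute(
    "SELECT lastname FROM users WHERE lastname = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The string-built query returns every row, while the bound version returns none because no user has that literal last name.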

Prepared Statements

Consider a string:

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Jones', 35, 'Austin', 'bob@example.com', 'Bob')
""")

Prepared Statements

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Smith', 24, 'Tampa', 'ken@example.com', 'Bob')
""")

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Power', 45, 'New York', 'kate@example.com', 'Kate')
""")

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Renolds', 33, 'Miami', 'carl@example.com', 'Carl')
""")

Prepared Statements

Now the same, as a prepared statement:

prepared_stmt = session.prepare(
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES (?, ?, ?, ?, ?)")

bound_stmt = prepared_stmt.bind(['Jones', 35, 'Austin', 'bob@example.com', 'Bob'])

result = session.execute(bound_stmt)

What’s the difference?

Prepared Statements

INSERT with strings: the client sends Cassandra the entire query string. Large amount of data, parse cost.

INSERT with PreparedStatements: the client sends Cassandra the query ID and bound values. Smaller amount of data, no parsing.
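The size difference between the two kinds of request can be approximated with a few lines of Python. This is a rough sketch, not the real wire format: `string_payload` and `prepared_payload` are hypothetical helpers, and the 16-byte MD5 digest of the query text is only an assumed stand-in for the identifier the server hands back.

```python
import hashlib

query = ("INSERT INTO users (lastname, age, city, email, firstname) "
         "VALUES (?, ?, ?, ?, ?)")
rows = [
    ('Jones', 35, 'Austin', 'bob@example.com', 'Bob'),
    ('Power', 45, 'New York', 'kate@example.com', 'Kate'),
]


def as_literal(value):
    # Render a value the way it would appear inside a query string.
    return "'%s'" % value if isinstance(value, str) else str(value)


def string_payload(row):
    # Simple statement: the whole query text travels with every request.
    text = query
    for value in row:
        text = text.replace('?', as_literal(value), 1)
    return text.encode('utf-8')


def prepared_payload(statement_id, row):
    # Prepared statement: only an identifier plus the bound values travel.
    values = b''.join(str(value).encode('utf-8') for value in row)
    return statement_id + values


# Assumption: a 16-byte digest of the query text stands in for the server-issued ID.
statement_id = hashlib.md5(query.encode('utf-8')).digest()

for row in rows:
    print(len(string_payload(row)), len(prepared_payload(statement_id, row)))
```

For each row the first number (full query text) is larger than the second (identifier plus values), and the gap grows with the length of the statement.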

So what does that mean to me?

Speed!

Prepared Statements

http://techblog.netflix.com/2013/12/astyanax-update.html

Prepared Statements

Preparing a statement inside a for loop is an anti-pattern:

for (int i = 0; i < 10; i++) {
    // Anti-pattern: the same statement is prepared again on every iteration.
    PreparedStatement ps = session.prepare("UPDATE user SET disabled = 1 WHERE id = ?");
    session.execute(ps.bind(i));
}
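The cost of that loop can be made visible with a small counter. The sketch below is a simplification under stated assumptions: `FakeServer` is a hypothetical stand-in that treats every `prepare` call as one parse, and `prepare_inside_loop` and `prepare_once` are illustrative names, not driver functions.

```python
class FakeServer:
    """Hypothetical stand-in that counts how often query text is parsed."""

    def __init__(self):
        self.parse_count = 0
        self.prepared = {}

    def prepare(self, query):
        # Every prepare call parses the text and returns an identifier.
        self.parse_count += 1
        statement_id = len(self.prepared)
        self.prepared[statement_id] = query
        return statement_id

    def execute(self, statement_id, values):
        # Executing by identifier needs no parsing.
        return self.prepared[statement_id], values


def prepare_inside_loop(server, ids):
    # Anti-pattern: the same statement is prepared again on each iteration.
    for user_id in ids:
        stmt = server.prepare("UPDATE user SET disabled = 1 WHERE id = ?")
        server.execute(stmt, [user_id])
    return server.parse_count


def prepare_once(server, ids):
    # Preferred: prepare once, then bind and execute many times.
    stmt = server.prepare("UPDATE user SET disabled = 1 WHERE id = ?")
    for user_id in ids:
        server.execute(stmt, [user_id])
    return server.parse_count


print(prepare_inside_loop(FakeServer(), range(10)))
print(prepare_once(FakeServer(), range(10)))
```

Moving the prepare call out of the loop is the whole fix: the loop body then only binds and executes.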

Load Balancing

•  A load balancing policy determines which node an insert or query is sent to.

•  A client can read from or write to any node, which is sometimes inefficient.

•  If a node receives a read or write for data owned by another node, it coordinates that request for the client.

•  We can use a load balancing policy to control that action.

Load Balancing deep dive

Using this example:

Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")
    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
    .withLoadBalancingPolicy(
        new TokenAwarePolicy(
            new DCAwareRoundRobinPolicy()))
    .build();

Example data model

CREATE TABLE users (
    username text PRIMARY KEY,
    firstName text,
    lastName text
);

INSERT INTO users (username, firstName, lastName)
VALUES ('rmills', 'Rebecca', 'Mills');

INSERT INTO users (username, firstName, lastName)
VALUES ('pmcfadin', 'Patrick', 'McFadin');

Discover cluster

Client: .addContactPoint("10.0.0.1")

Ring of four nodes, RF=3:

Node      Token range
10.0.0.1  00-25
10.0.0.2  26-50
10.0.0.3  51-75
10.0.0.4  76-100

Populate connection pool

Client and the four-node ring in DC1:

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
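The replica table for this ring can be reproduced in a few lines. The sketch assumes a simple placement rule in which each node holds its own range plus the ranges of the two nodes before it on the ring (RF=3). That rule matches the table on the slide, but real replication strategies may place data differently. `ranges_held_by` is a hypothetical helper name.

```python
ring = [
    ('10.0.0.1', '00-25'),
    ('10.0.0.2', '26-50'),
    ('10.0.0.3', '51-75'),
    ('10.0.0.4', '76-100'),
]
REPLICATION_FACTOR = 3


def ranges_held_by(index):
    # A node holds its own range plus the ranges of the nodes just before it
    # on the ring, wrapping around, until it holds REPLICATION_FACTOR ranges.
    return [ring[(index - step) % len(ring)][1]
            for step in range(REPLICATION_FACTOR)]


# Map each node to [primary range, replica range, replica range].
table = {node: ranges_held_by(i) for i, (node, _) in enumerate(ring)}

for node, ranges in table.items():
    print(node, *ranges)
```

Each printed row lists the node, its primary range, and then the two ranges it holds as a replica.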

Request for data

SELECT firstName
FROM users
WHERE userName = 'rmills';

Murmur3 hash of 'rmills': Token = 15

Token Aware

SELECT firstName
FROM users
WHERE userName = 'rmills';

Token = 15

.withLoadBalancingPolicy(new TokenAwarePolicy(...))

Token Aware

SELECT firstName
FROM users
WHERE userName = 'rmills';

Token = 15

Which node? Token 15 falls in the range 00-25, which is held by 10.0.0.1 (primary) and by 10.0.0.2 and 10.0.0.3 (replicas).

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

Token Aware

SELECT firstName
FROM users
WHERE userName = 'rmills';

.withLoadBalancingPolicy(
    new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy()))

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

Token Aware - Retry

SELECT firstName
FROM users
WHERE userName = 'rmills';

.withLoadBalancingPolicy(
    new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy()))

[Diagram: client and four-node ring in DC1, labeled "Timeout" and "Retry"]

Without Token Aware

Using this modified example:

Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")
    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
    .withLoadBalancingPolicy(
        new DCAwareRoundRobinPolicy())
    .build();

Request for data

SELECT firstName
FROM users
WHERE userName = 'pmcfadin';

Murmur3 hash of 'pmcfadin': Token = 77

No Token Aware

SELECT firstName
FROM users
WHERE userName = 'pmcfadin';

Token = 77

.withLoadBalancingPolicy(
    new DCAwareRoundRobinPolicy())

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

Data placement

SELECT firstName
FROM users
WHERE userName = 'pmcfadin';

Token = 77 falls in the range 76-100, which is held by 10.0.0.4 (primary) and by 10.0.0.1 and 10.0.0.2 (replicas).

Standard Round Robin

SELECT firstName
FROM users
WHERE userName = 'pmcfadin';

Token = 77

.withLoadBalancingPolicy(
    new DCAwareRoundRobinPolicy())

Coordinate: round robin ignores data placement, so the request can land on a node that holds no replica for token 77 (10.0.0.3 in this ring), which then has to coordinate it for the client.

Load Balancing

•  Default before Java driver 2.0.2: RoundRobinPolicy

•  Now: TokenAwarePolicy, which adds token awareness to a child policy

•  Acts as a filter, wraps around another policy

•  Used to reduce network hops, as only replicas will be considered
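The effect of wrapping a child policy with token awareness can be simulated on the four-node ring from the earlier slides. This is a sketch under assumptions, not the driver's implementation: `RoundRobin`, `TokenAware`, `replicas_for`, and `needs_extra_hop` are hypothetical names, the token value 77 is taken from the slides rather than computed with Murmur3, and keeping non-replicas at the end of the plan as a fallback is a choice made for this example.

```python
# Token ranges and the nodes holding them, following the four-node ring (RF=3).
RANGES = [
    (0, 25, ['10.0.0.1', '10.0.0.2', '10.0.0.3']),
    (26, 50, ['10.0.0.2', '10.0.0.3', '10.0.0.4']),
    (51, 75, ['10.0.0.3', '10.0.0.4', '10.0.0.1']),
    (76, 100, ['10.0.0.4', '10.0.0.1', '10.0.0.2']),
]
NODES = ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4']


def replicas_for(token):
    for low, high, owners in RANGES:
        if low <= token <= high:
            return owners
    raise ValueError('token outside the ring: %r' % token)


class RoundRobin:
    """Child policy: rotates the starting node, ignoring where data lives."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def query_plan(self, token):
        start = self._next
        self._next = (self._next + 1) % len(self._nodes)
        return self._nodes[start:] + self._nodes[:start]


class TokenAware:
    """Wrapper policy: moves replicas of the token to the front of the plan."""

    def __init__(self, child):
        self._child = child

    def query_plan(self, token):
        plan = self._child.query_plan(token)
        owners = replicas_for(token)
        replicas = [node for node in plan if node in owners]
        others = [node for node in plan if node not in owners]
        return replicas + others


def needs_extra_hop(policy, token):
    # The first node in the plan becomes the coordinator; if it holds no
    # replica it must forward the request to a node that does.
    coordinator = policy.query_plan(token)[0]
    return coordinator not in replicas_for(token)


plain = RoundRobin(NODES)
aware = TokenAware(RoundRobin(NODES))
plain_hops = sum(needs_extra_hop(plain, 77) for _ in range(8))
aware_hops = sum(needs_extra_hop(aware, 77) for _ in range(8))
print(plain_hops, aware_hops)
```

Over eight requests for token 77, plain round robin picks the non-replica node 10.0.0.3 as coordinator twice, while the token-aware wrapper always starts with a replica.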

Load Balancing - Whitelist

•  Ensures only the hosts from a provided list are used

•  Wraps a child policy

•  Used to limit the effects of automatic peer discovery

•  Execute queries only on a given list of hosts
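A whitelist wrapper follows the same wrapping idea as the other policies. In the sketch below, `Whitelist` and `RoundRobin` are hypothetical stand-ins rather than driver classes; the wrapper simply drops any node from the child's plan that is not on the allowed list, even if peer discovery found it.

```python
class RoundRobin:
    """Child policy: rotates through every node it is given."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def query_plan(self):
        start = self._next
        self._next = (self._next + 1) % len(self._nodes)
        return self._nodes[start:] + self._nodes[:start]


class Whitelist:
    """Wrapper policy: drops every node that is not on the allowed list."""

    def __init__(self, child, allowed):
        self._child = child
        self._allowed = set(allowed)

    def query_plan(self):
        return [n for n in self._child.query_plan() if n in self._allowed]


# All four nodes were discovered, but only two may receive queries.
discovered = ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4']
policy = Whitelist(RoundRobin(discovered), allowed=['10.0.0.1', '10.0.0.2'])
plans = [policy.query_plan() for _ in range(4)]

for plan in plans:
    print(plan)
```

Every plan contains only the two allowed hosts, in whatever order the child policy produced them.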

Asynchronous Statements

•  Native binary protocol supports request pipelining

•  A single connection can be used for several simultaneous and independent request/response exchanges

Asynchronous Statements

•  You don’t have to wait for a query to complete and return rows directly; I/O is non-blocking

•  The method returns a future object almost immediately

Asynchronous Statements

import logging

from cassandra import ReadTimeout

log = logging.getLogger(__name__)

query = "SELECT * FROM users WHERE lastname=%s"
future = session.execute_async(query, [lastname])

# ... do some other work

try:
    rows = future.result()  # blocks until the response arrives
    user = rows[0]
    print(user.name, user.age)
except ReadTimeout:
    log.exception("Query timed out:")

Asynchronous Statements

# build a list of futures
futures = []
query = "SELECT * FROM users WHERE user_id=%s"
for user_id in ids_to_fetch:
    futures.append(session.execute_async(query, [user_id]))

# wait for them to complete and use the results
for future in futures:
    rows = future.result()
    print(rows[0].name, rows[0].age)
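The submit-first, collect-later pattern can be tried without a cluster by using the standard library's `concurrent.futures`. This is a stand-in only: the driver pipelines requests over its connections rather than using a thread pool, and `fake_query` is a hypothetical function that sleeps briefly to imitate a network round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(lastname):
    # Hypothetical stand-in for a network round trip to a node.
    time.sleep(0.05)
    return {'name': lastname, 'age': 30}


lastnames = ['Jones', 'Smith', 'Power', 'Renolds']

with ThreadPoolExecutor(max_workers=len(lastnames)) as pool:
    # Submit everything first: each call returns a future straight away.
    futures = [pool.submit(fake_query, name) for name in lastnames]
    # ... other work could happen here while the queries are in flight ...
    # Collect the results in submission order; result() blocks if needed.
    results = [future.result() for future in futures]

print([row['name'] for row in results])
```

Because all requests are in flight before the first result is awaited, the total wait is close to one round trip rather than the sum of all of them.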

Where can I download the drivers?

Planet Cassandra

•  A great place for Apache Cassandra resources!

•  Blog posts, webinars, tutorials, and much, much more!

•  Also a great place for your driver needs

Thank You! Twitter: @rebccamills
