2 Dundee - Cassandra-3

40
©2013 DataStax Confidential. Do not distribute without consent. @chbatey Christopher Batey Cassandra 2.2 and 3.0

Transcript of 2 Dundee - Cassandra-3

©2013 DataStax Confidential. Do not distribute without consent.

@chbateyChristopher Batey

Cassandra 2.2 and 3.0

@chbatey

First comes a blog• Each new feature has a vastly more detailed blog post:

http://christopher-batey.blogspot.co.uk/

@chbatey

Were did 2.2 come from?

@chbatey

New features• 2.2- JSON- User defined functions- User defined aggregates- Role based authentication- The small print• 3.0- New storage engine- Materialised views

@chbatey

Hello JSON• create TABLE user (username text primary key,

first_name text , last_name text , emails set<text> , country text);• INSERT INTO user JSON '{"username": "chbatey",

"first_name":"Christopher", "last_name": "Batey", “emails":["[email protected]"]}';

@chbatey

Goodbye JSON

@chbatey

JSON + User Defined Types• CREATE TYPE movie (title text, time timestamp,

description text);• ALTER TABLE user ADD movies set<frozen<movie>>;• UPDATE user SET movies = { { title:'Batman',

time:'2011-02-03T04:05:00+0000', description: 'This film rocks' } } where username = 'chbatey';

@chbatey

Out it comes

@chbatey

Cassandra HTTP Wrapper?

@RequestMapping(method = {RequestMethod.POST}, value = "/{keyspace}/{table}", consumes = "application/json") public ResponseEntity<String> store(@PathVariable String keyspace, @PathVariable String table, @RequestBody String body) { session.execute(String.format("insert into %s.%s JSON '%s'", keyspace, table, body)); return ResponseEntity.ok("OK");}

Keyspace Table

Raw JSON

curl --header "Content-Type: application/json" -X POST -v "localhost:8080/twotwo/user" --data '{"username": "trev2", "country": null, "emails": ["[email protected]", "[email protected]"], "first_name": "trevor", "last_name": "bunting", "movies": null}'

@chbatey

User defined functions• Run code on the server !Dangerous!• Java + JavaScript supported out of the box

@chbatey

UDF exampleCREATE TABLE user ( username text primary key, first_name text , last_name text , emails set<text> , country text);

@chbatey

Custom name

CREATE FUNCTION name ( first_name text, last_name text ) CALLED ON NULL INPUT RETURNS text LANGUAGE java AS ‘ return first_name + " " + last_name; ‘;

cqlsh:twotwo> select name(first_name, last_name) FROM user;

twotwo.name(first_name, last_name)------------------------------------ Christopher Batey

@chbatey

User defined aggregatesCREATE AGGREGATE average ( int ) SFUNC averageState STYPE tuple<int,bigint> FINALFUNC averageFinal INITCOND (0, 0);

Called for every row state passed between

Initial state

Return type (CQL)

Optional function called onfinal state

@chbatey

State functionCREATE FUNCTION averageState ( state tuple<int,bigint>, value int ) CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java AS ' if (val != null) { state.setInt(0, state.getInt(0)+1); state.setLong(1, state.getLong(1)+val.intValue()); } return state; ';

Type Columns

@chbatey

Final functionCREATE FUNCTION averageFinal ( state tuple<int,bigint> ) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS ' double r = 0; if (state.getInt(0) == 0) return null; double r = state.getLong(1) / state.getInt(0); return Double.valueOf(r); ';

Type

@chbatey

Putting it all together

@chbatey

Customer events

CREATE AGGREGATE count_by_type(text) SFUNC countEventTypes STYPE map<text, int> INITCOND {};

CREATE FUNCTION countEventTypes( state map<text, int>, type text ) CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java AS ' Integer count = (Integer) state.get(type); if (count == null) count = 1; else count = count + 1; state.put(type, count); return state; ' ;

@chbatey

Customer events

@chbatey

Built in aggregates• count• max• min• avg• sum

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java

@chbatey

Built in time functions

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java

@chbatey

Built in aggregates in action

@chbatey

“Materialised views” with Spark

@chbatey

Pure C*

@chbatey

Small print• New types- smallint - short- tinyint - byte- date - time• Warnings now sent back to client- batch too large

@chbatey

Time

@chbatey

Materialsed views• Designed to stop *you* having to duplicate• Do we need a secondary index primer?

@chbatey

Customer events tableCREATE TABLE if NOT EXISTS customer_events ( customer_id text, staff_id text, store_type text, time timeuuid , event_type text, PRIMARY KEY (customer_id, time))

create INDEX on customer_events (staff_id) ;

@chbatey

Indexes to the rescue?customer_id time staff_idchbatey 2015-03-03 08:52:45 trevorchbatey 2015-03-03 08:52:54 trevorchbatey 2015-03-03 08:53:11 billchbatey 2015-03-03 08:53:18 billrusty 2015-03-03 08:56:57 billrusty 2015-03-03 08:57:02 billrusty 2015-03-03 08:57:20 trevor

staff_id customer_idtrevor chbateytrevor chbateybill chbateybill chbateybill rustybill rustytrevor rusty

@chbatey

Secondary index are local • The staff_id partition in the secondary index is not

distributed like a normal table• The secondary index entries are only stored on the node

that contains the customer_id partition

@chbatey

Indexes to the rescue?

staff_id customer_idtrevor chbateytrevor chbateybill chbateybill chbatey

staff_id customer_idbill rustybill rustytrevor rusty

A B

chbatey rusty

customer_id time staff_idchbatey 2015-03-03 08:52:45 trevorchbatey 2015-03-03 08:52:54 trevorchbatey 2015-03-03 08:53:11 billchbatey 2015-03-03 08:53:18 billrusty 2015-03-03 08:56:57 billrusty 2015-03-03 08:57:02 billrusty 2015-03-03 08:57:20 trevor

customer_events tablestaff_id customer_idtrevor chbateytrevor chbateybill chbateybill chbateybill rustybill rustytrevor rusty

staff_id index

@chbatey

Do it your self indexCREATE TABLE if NOT EXISTS customer_events ( customer_id text, statff_id text, store_type text, time timeuuid , event_type text, PRIMARY KEY (customer_id, time))

CREATE TABLE if NOT EXISTS customer_events_by_staff ( customer_id text, statff_id text, store_type text, time timeuuid , event_type text, PRIMARY KEY (staff_id, time))

@chbatey

KillrWeather data model

@chbatey

KillrWeather data model

@chbatey

KillrWeather data model

@chbatey

KillrWeather data modelINSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip ) values ('station1', 2012, 12, 25, 1, 'GB', 'Cumbria', 14.0, 20) ;

INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip ) values ('station2', 2012, 12, 25, 1, 'GB', 'Cumbria', 4.0, 2) ;

INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip ) values ('station3', 2012, 12, 25, 1, 'GB', 'Greater London', 16.0, 10) ;

@chbatey

Querying by state?

@chbatey

Fine print• Primary key columns + one other in your MV primary key• Un-used Primary key columns are added to the end of

your MV PK• If the part of your primary key is NULL then it won't

appear in the materialised view• This is not free!

@chbatey

Combining aggregates + MVs

@chbatey

Including the month

@chbatey

Conclusions• We still denormalise and duplicate to achieve scalability

and performance• We just let C* do it for us :)