
Page 1: Cassandra Data Modeling

Cassandra Data Modeling Workshop
Matthew F. Dennis // @mdennis

Page 2: Cassandra Data Modeling

Overview

● Hopefully interactive
● Use cases submitted via Google Moderator, email, IRC, etc.
● Interesting and/or common requests in the slides to get us started
● Bring up others if you have them!

Page 3: Cassandra Data Modeling

Data Modeling Goals

● Keep data queried together on disk together
● In a more general sense, think about the efficiency of querying your data and work backward from there to a model in Cassandra
● Don't try to normalize your data (contrary to many use cases in relational databases)
● Usually better to keep a record that something happened as opposed to changing a value (not always advisable or possible)

Page 4: Cassandra Data Modeling

ClickStream Data (use case #1)

● A ClickStream (in this context) is the sequence of actions a user of an application performs
● Usually this refers to clicking links in a WebApp
● Useful for ad selection, error recording, UI/UX improvement, A/B testing, debugging, et cetera
● Not a lot of detail in the Google Moderator request on what the purpose of collecting the ClickStream data was – so I made some up

Page 5: Cassandra Data Modeling

ClickStream Data Defined

● Record actions of a user within a session for debugging purposes if app/browser/page/server crashes

Page 6: Cassandra Data Modeling

Recording Sessions

● CF for sessions a user has had
  ● Row Key is user name/id
  ● Column Name is session id (TimeUUID)
  ● Column Value is empty (or length of session, or some aggregated details about the session after it ended)
● CF for actual sessions (write path sketched below)
  ● Row Key is TimeUUID session id
  ● Column Name is timestamp/TimeUUID of each click
  ● Column Value is details about that click (serialized)
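A minimal sketch of this two-CF write path, assuming the Thrift-era pycassa client (the slides don't name one). The keyspace name and the helper functions are illustrative, not part of the original deck:

    import json
    from uuid import uuid1

    import pycassa

    pool = pycassa.ConnectionPool('ClickStream')              # assumed keyspace
    user_sessions = pycassa.ColumnFamily(pool, 'UserSessions')
    sessions = pycassa.ColumnFamily(pool, 'Sessions')

    def start_session(user_id):
        session_id = uuid1()                                  # TimeUUID session id
        user_sessions.insert(user_id, {session_id: ''})       # empty value (or aggregates later)
        return session_id

    def record_click(session_id, click):
        # Column Name is the TimeUUID of the click; Column Value is the serialized details
        sessions.insert(session_id, {uuid1(): json.dumps(click)})

    sid = start_session('user3')
    record_click(sid, {'url': '/home', 'button': 'signup'})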

Page 7: Cassandra Data Modeling

UserSessions Column Family

    userId → Session_01 (TimeUUID) | Session_02 (TimeUUID) | Session_03 (TimeUUID)
             (empty/agg)           | (empty/agg)           | (empty/agg)

● Most recent session
● All sessions for a given time period

Page 8: Cassandra Data Modeling

Sessions Column Family

● Retrieve entire session's ClickStream (row)

● Order of clicks/events preserved

● Retrieve ClickStream for a slice of time within the session

● First action taken in a session

● Most recent action taken in a session

● Why JSON/XML/etc?

    SessionId (TimeUUID) → timestamp_01             | timestamp_02             | timestamp_03
                           ClickData (json/xml/etc) | ClickData (json/xml/etc) | ClickData (json/xml/etc)
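A sketch of the queries this layout supports, again assuming pycassa; the session id and time bounds are stand-in inputs:

    import time
    from uuid import uuid1

    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('ClickStream')
    sessions = pycassa.ColumnFamily(pool, 'Sessions')
    session_id = uuid1()                     # assumed: an existing session's row key

    # entire session's ClickStream (click order preserved by the comparator)
    stream = sessions.get(session_id, column_count=10000)

    # ClickStream for a slice of time within the session
    t1 = time.time()
    t0 = t1 - 3600
    window = sessions.get(session_id,
                          column_start=convert_time_to_uuid(t0, lowest_val=True),
                          column_finish=convert_time_to_uuid(t1, lowest_val=False))

    # first and most recent actions taken in the session
    first = sessions.get(session_id, column_count=1)
    last = sessions.get(session_id, column_count=1, column_reversed=True)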

Page 9: Cassandra Data Modeling

Alternatives?

Page 10: Cassandra Data Modeling

Of Course (depends on what you want to do)

● Secondary Indexes
● All Sessions in one row
● Track by time of activity instead of session

Page 11: Cassandra Data Modeling

Secondary Indexes Applied

● Drop UserSessions CF and use secondary indexes

● Uses a “well known” column to record the user in the row; secondary index is created on that column

● Doesn't work so well when storing aggregates about sessions in the UserSessions CF

● Better when you want to retrieve all sessions a user has had
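A sketch of the secondary-index variant, assuming pycassa; comparator details and the column/CF names are glossed over and illustrative:

    from uuid import uuid1

    import pycassa
    from pycassa.index import create_index_clause, create_index_expression

    pool = pycassa.ConnectionPool('ClickStream')
    sessions = pycassa.ColumnFamily(pool, 'Sessions')

    # each session row carries its owner in the indexed "well known" column
    sessions.insert(uuid1(), {'userId': 'user3'})

    # all sessions a given user has had, via the secondary index
    clause = create_index_clause([create_index_expression('userId', 'user3')],
                                 count=1000)
    for key, columns in sessions.get_indexed_slices(clause):
        pass  # each key is a session row belonging to user3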

Page 12: Cassandra Data Modeling

All Sessions In One Row Applied

● Row Key is userId
● Column Name is composite of timestamp and sessionId (sketch below)
● Can efficiently request activity of a user across all sessions within a specific time range
● Rows could potentially grow quite large, be careful
● Reads will almost always require at least two seeks on disk
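A sketch of the composite-column layout; it assumes a CF created with comparator CompositeType(LongType(), TimeUUIDType()) and the pycassa client, with all names illustrative:

    import json
    import time
    from uuid import uuid1

    import pycassa

    pool = pycassa.ConnectionPool('ClickStream')
    activity = pycassa.ColumnFamily(pool, 'UserActivity')   # illustrative CF

    # Column Name is (timestamp, sessionId); Column Value is the click details
    now = int(time.time())
    session_id = uuid1()
    activity.insert('user3', {(now, session_id): json.dumps({'url': '/home'})})

    # slice on the leading timestamp component for a time range across sessions
    last_day = activity.get('user3', column_start=(now - 86400,),
                            column_finish=(now,))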

Page 13: Cassandra Data Modeling

Time Period Partitioning Applied

● Row Key is composite of userId and time “bucket” (bucket computation sketched below)
  ● e.g. jan_2011 or jan_01_2011 for month or day buckets respectively
● Column Name is TimeUUID of click
● Column Value is serialized click data
● Avoids always requiring multiple seeks when the user has old data but only recent data is requested
● Easy to lazily aggregate old activity
● Can still efficiently request activity of a user across all sessions within a specific time range
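A sketch of the read-side fan-out over bucketed row keys (month buckets shown; the key format follows the slides, the helper is illustrative):

    from datetime import date

    def month_buckets(user_id, start, end):
        """Row keys like 'user3:jan_2011' covering [start, end]."""
        names = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
        y, m = start.year, start.month
        keys = []
        while (y, m) <= (end.year, end.month):
            keys.append('%s:%s_%d' % (user_id, names[m - 1], y))
            y, m = (y + 1, 1) if m == 12 else (y, m + 1)
        return keys

    keys = month_buckets('user3', date(2011, 1, 7), date(2011, 5, 23))
    # ['user3:jan_2011', 'user3:feb_2011', ..., 'user3:may_2011']
    # then: activity.multiget(keys, column_start=..., column_finish=...)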

Page 14: Cassandra Data Modeling

Rolling Time Window Of Data Points (use case #2)

● Similar to RRDTool, which was the example given
● Essentially store a series of data points within a rolling window
● A common request from Cassandra users for this and/or similar

Page 15: Cassandra Data Modeling

Data Points Defined

● Each data point has a value (or multiple values)
● Each data point corresponds to a specific point in time or an interval/bucket (e.g. 5th minute of the 17th hour on some date)

Page 16: Cassandra Data Modeling

Time Window Model

● Row Key is the id of the time window data you are tracking (e.g. server7:render_time)

● Column Name is timestamp (or TimeUUID) the event occurred at

● Column Value is the value of the event (e.g. 0.051)

    s7:rt → TimeUUID0 | TimeUUID1 | TimeUUID2
            0.051     | 0.014     | 0.173

(row key s7:rt abbreviates System7:RenderTime; e.g. some request took 0.014 seconds to render)

Page 17: Cassandra Data Modeling

The Details

● Cassandra TTL values are key here
  ● When you insert each data point, set the TTL to the max time range you will ever request; there is very little overhead to expiring columns
● When querying, construct TimeUUIDs for the min/max of the time range in question and use them as the start/end in your get_slice call
● Consider partitioning the rows by a known time period (e.g. “year”) if you plan on keeping a long history of data (NB: requires slightly more complex logic in the app if a time range spans such a period)
● Very efficient queries for any window of time (sketch below)
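A sketch of the TTL-based window, assuming pycassa; the keyspace, CF name, and the particular window sizes are illustrative:

    import time
    from uuid import uuid1

    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('Metrics')
    points = pycassa.ColumnFamily(pool, 'TimeWindow')

    MAX_WINDOW = 7 * 24 * 3600   # largest range we will ever query, in seconds

    # write: the column expires on its own once it falls out of the window
    points.insert('server7:render_time', {uuid1(): '0.051'}, ttl=MAX_WINDOW)

    # read: TimeUUID bounds for the range, used as start/end of the slice
    now = time.time()
    hour = points.get('server7:render_time',
                      column_start=convert_time_to_uuid(now - 3600, lowest_val=True),
                      column_finish=convert_time_to_uuid(now, lowest_val=False))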

Page 18: Cassandra Data Modeling

Rolling Window Of Counters (use case #3)

● “How to model rolling time window that contains counters with time buckets of monthly (12 months), weekly (4 weeks), daily (7 days), hourly (24 hours)? Example would be; how many times user logged into a system in last 24 hours, last 7 days ...”

● Timezones and the “rolling window” requirement are what make this interesting

Page 19: Cassandra Data Modeling

Rolling Time Window Details

● One row for every granularity you want to track (e.g. day, hour)
● Row Key consists of the granularity, metric, user and system
● Column Name is a “fixed” time bucket on UTC time
● Column Values are counts of the logins in that bucket
● get_slice calls return multiple counters which are then summed up (write path sketched below)
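A sketch of the write path: on each login, bump the day and hour buckets. It assumes pycassa and a CF with CounterColumnType values; row keys follow the slides' naming:

    import time

    import pycassa

    pool = pycassa.ConnectionPool('Logins')
    counters = pycassa.ColumnFamily(pool, 'LoginCounts')  # counter columns

    def record_login(user, system, when=None):
        t = time.gmtime(when or time.time())              # buckets are on UTC time
        day = time.strftime('%Y%m%d', t)                  # e.g. 20110523
        hour = time.strftime('%Y%m%d%H', t)               # e.g. 2011052316
        counters.add('%s:%s:L:D' % (user, system), day)   # +1 to the day bucket
        counters.add('%s:%s:L:H' % (user, system), hour)  # +1 to the hour bucket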

Page 20: Cassandra Data Modeling

Rolling Time Window Counter Model

U3:S5:L:D (user3:system5:logins:by_day)

    20110107 | ... | 20110523
    2        | ... | 7

2 logins on Jan 7th 2011 for user 3 on system 5; 7 logins on May 23rd 2011 for user 3 on system 5

U3:S5:L:H (user3:system5:logins:by_hour)

    2011010710 | ... | 2011052316
    1          | ... | 2

1 login on Jan 7th 2011 in the 10th hour for user 3 on system 5; 2 logins on May 23rd 2011 in the 16th hour for user 3 on system 5

Page 21: Cassandra Data Modeling

Rolling Time Window Queries

● Time window is rolling and there are other timezones besides UTC
● one get_slice for the “middle” counts
● one get_slice for the “left end”
● one get_slice for the “right end”

Page 22: Cassandra Data Modeling

Example: logins for the past 7 days

● Determine date/time boundaries
● Determine UTC days that are wholly contained within your boundaries to select and sum
● Select and sum counters for the remaining hours on either side of the UTC days
● O(1) queries (3 in this case), can be requested from C* in parallel (sketch below)
● NB: some timezones are annoying (e.g. 15 minute or 30 minute offsets); I try to ignore them
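A sketch of this split, assuming pycassa and the row keys from the model above. Whole UTC days come from the day row, the ragged edges from the hour row:

    import time

    import pycassa

    pool = pycassa.ConnectionPool('Logins')
    counters = pycassa.ColumnFamily(pool, 'LoginCounts')
    DAY = 86400

    def slice_sum(row_key, start_col, finish_col):
        try:
            cols = counters.get(row_key, column_start=start_col,
                                column_finish=finish_col)
            return sum(cols.values())
        except pycassa.NotFoundException:    # no logins in that slice
            return 0

    def logins_last_7_days(user, system, now=None):
        now = int(now or time.time())
        start = now - 7 * DAY
        day0 = start if start % DAY == 0 else start - start % DAY + DAY  # first whole UTC day
        day1 = now - now % DAY                                           # today's UTC midnight
        d = lambda s: time.strftime('%Y%m%d', time.gmtime(s))
        h = lambda s: time.strftime('%Y%m%d%H', time.gmtime(s))

        total = slice_sum('%s:%s:L:D' % (user, system), d(day0), d(day1 - 1))  # middle
        for lo, hi in ((start, day0 - 1), (day1, now)):                        # both ends
            if lo <= hi:
                total += slice_sum('%s:%s:L:H' % (user, system), h(lo), h(hi))
        return total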

Page 23: Cassandra Data Modeling

Alternatives? (of course)

● If you're counting logins and each user doesn't log in hundreds of times a day, just have one row per user with a TimeUUID column name for the time the login occurred (sketch below)
● Supports any timezone/range/granularity easily
● More expensive for large ranges (e.g. year) regardless of granularity, so cache results (in C*) lazily
● NB: caching results for rolling windows is not usually helpful (because, well, it's rolling and always changes)
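A sketch of this one-row-per-user alternative, assuming pycassa and an illustrative CF with a TimeUUID comparator; get_count does the range counting server side:

    import time
    from uuid import uuid1

    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('Logins')
    logins = pycassa.ColumnFamily(pool, 'UserLogins')  # TimeUUID comparator

    logins.insert('user3', {uuid1(): ''})              # record a login

    now = time.time()
    n = logins.get_count('user3',
                         column_start=convert_time_to_uuid(now - 86400, lowest_val=True),
                         column_finish=convert_time_to_uuid(now, lowest_val=False))
    # n = logins in the past 24 hours; any timezone math happens client side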

Page 24: Cassandra Data Modeling

Eventually Atomic (use case #4)

● “When there are many to many or one to many relations involved how to model that and also keep it atomic? for eg: one user can upload many pictures and those pictures can somehow be related to other users as well.”

● Attempting full ACID compliance in distributed systems is a bad idea (and impossible in the general sense)

● However, consistency is important and can certainly be achieved in C*

● Many approaches / alternatives

● I like transaction log approach, especially in the context of C*

Page 25: Cassandra Data Modeling

Transaction Logs(in this context)

● Records what is going to be performed before it is actually performed

● Performs the actions that need to be atomic (in the indivisible sense, not the all at once sense)

● Marks that the actions were performed

Page 26: Cassandra Data Modeling

In Cassandra

● Serialize all actions that need to be performed in a single column – JSON, XML, YAML (yuck!), cpickle, JSO, et cetera
● Row Key = randomly chosen C* node token
● Column Name = TimeUUID
● Perform actions
● Delete Column (full flow sketched below)
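A sketch of the whole flow, assuming pycassa; pick_node_token and apply_action are hypothetical placeholders for app-specific logic:

    import json
    from uuid import uuid1

    import pycassa

    pool = pycassa.ConnectionPool('App')
    xact_log = pycassa.ColumnFamily(pool, 'XACT_LOG')
    xact_log.write_consistency_level = pycassa.ConsistencyLevel.QUORUM

    def apply_action(action):
        """Hypothetical: perform one idempotent write described by `action`."""
        pass

    def atomically(actions, pick_node_token):
        row_key = pick_node_token()                   # randomly chosen node token
        col = uuid1()                                 # TimeUUID column name
        # 1. record what is going to be performed
        xact_log.insert(row_key, {col: json.dumps(actions)})
        # 2. perform the actions (each must be idempotent)
        for action in actions:
            apply_action(action)
        # 3. mark done by deleting the column
        xact_log.remove(row_key, columns=[col])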

Page 27: Cassandra Data Modeling

Configuration Details

● Short GC_Grace on the XACT_LOG Column Family (e.g. 1 hour)

● Write to XACT_LOG at CL.QUORUM or CL.LOCAL_QUORUM for durability (if it fails with an unavailable exception, pick a different node token and/or node and try again; same semantics as a traditional relational DB)

● 1M memtable ops, 1 hour memtable flush time

Page 28: Cassandra Data Modeling

Failures

● Before insert into the XACT_LOG
● After insert, before actions
● After insert, in middle of actions
● After insert, after actions, before delete
● After insert, after actions, after delete

Page 29: Cassandra Data Modeling

Recovery

● Each C* node has a cron job offset from every other by some time period
● Each job runs the same code: multiget_slice for all node tokens for all columns older than some time period (sketch below)
● Any columns found need to be replayed in their entirety and are deleted after replay (normally there are no columns because normally things are working normally)
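A sketch of the recovery job, assuming pycassa and reusing the hypothetical apply_action from the earlier sketch:

    import json
    import time

    import pycassa
    from pycassa.util import convert_time_to_uuid

    pool = pycassa.ConnectionPool('App')
    xact_log = pycassa.ColumnFamily(pool, 'XACT_LOG')

    def apply_action(action):
        """Hypothetical: idempotent replay of one logged action."""
        pass

    def recover(node_tokens, older_than=600):
        # only columns old enough that they should already have completed
        cutoff = convert_time_to_uuid(time.time() - older_than, lowest_val=False)
        rows = xact_log.multiget(node_tokens, column_finish=cutoff)
        for row_key, columns in rows.items():
            for col, payload in columns.items():
                for action in json.loads(payload):
                    apply_action(action)             # replay in its entirety
                xact_log.remove(row_key, columns=[col])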

Page 30: Cassandra Data Modeling

XACT_LOG Comments

● Idempotent writes are awesome (that's why this works so well)

● Doesn't work so well for counters (they're not idempotent)

● Clients must be able to deal with temporarily inconsistent data (they have to do this anyway)

● Could use a reliable queuing service (e.g. SQS) instead of polling – push to SQS first, then XACT log.

Page 31: Cassandra Data Modeling

Cassandra Data Modeling Workshop
Matthew F. Dennis // @mdennis

Q?