Cassandra data modeling talk
Building a Cassandra Based Application
From 0 to Deploy
Patrick McFadin, Solution Architect at DataStax
Wednesday, November 7, 12
Me
• Solution Architect at DataStax, THE Cassandra company
• Cassandra user since 0.7
• Follow me here: @PatrickMcFadin
Goals
• Take a new application concept
• What is the data model??
• Express that in CQL 3
• Some sample code
The Plan
• Conceptualize a new application
• Identify the entity tables
• Identify query tables
• Code. Rinse. Repeat.
• Deploy
Start with a concept
• Video sharing website
www.killrvideos.com
[Page mockup: a video player with Video Title, Description, Username, Rating, Tags (Foo, Bar), and Comments, plus a Recommended list, an "Upload New!" button, and an "Ads by Google" slot]
*Cat drawing by goodrob13 on Flickr
Break down the features
• Post a video
• View a video
• Add a comment
• Rate a video
• Tag a video
Create Entity Tables
Basic storage unit
Users
CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email varchar,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);
Row layout: Username (row key) | password | firstname | lastname | created_date | email
• Similar to an RDBMS table. Fairly fixed columns
• Username is unique
• Use secondary indexes on firstname and lastname for lookup
• Adding columns with Cassandra is super easy
Users: The insert code
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// stringSerializer is assumed to be a shared StringSerializer.get() instance
static void setUser(User user, Keyspace keyspace) {
    // Create a mutator that allows you to talk to Cassandra
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
    try {
        // Use the mutator to insert data into our table
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("firstname", user.getFirstname()));
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("lastname", user.getLastname()));
        mutator.addInsertion(user.getUsername(), "users",
            HFactory.createStringColumn("password", user.getPassword()));
        // Once the mutator is ready, execute on Cassandra
        mutator.execute();
    } catch (HectorException he) {
        he.printStackTrace();
    }
}
Videos (one-to-many)
CREATE TABLE videos (
    videoid uuid,
    videoname varchar,
    username varchar,
    description varchar,
    tags varchar,
    upload_date timestamp,
    PRIMARY KEY (videoid, videoname)
);
Row layout: VideoId <UUID> (row key) | tags | videoname | username | upload_date | description
• Use a UUID as a row key for uniqueness
• Allows for same video names
• Tags should be stored in some sort of delimited format
• Index on username may not be the best plan
Videos: The get code
import java.util.UUID;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

// uuidSerializer and stringSerializer are assumed to be shared
// UUIDSerializer.get() and StringSerializer.get() instances
static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
    Video video = new Video();
    // Create a slice query. We'll be getting specific column names
    SliceQuery<UUID, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);
    sliceQuery.setColumnNames("videoname", "username", "description", "tags");
    // Execute the query and get the list of columns
    ColumnSlice<String, String> result = sliceQuery.execute().get();
    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());
    // Tags are stored comma-delimited in a single column
    video.setTags(result.getColumnByName("tags").getValue().split(","));
    return video;
}
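The get code splits the tags column on a comma, so the write path has to join tags the same way. A minimal sketch of that round trip, assuming a comma delimiter and a hypothetical helper class TagCodec (neither the class nor its method names come from the talk):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the talk's code: keeps tags in one
// comma-delimited string, matching the split(",") used when reading.
final class TagCodec {
    private static final String DELIMITER = ",";

    // Join tags into the single varchar value stored in the tags column.
    static String encode(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(DELIMITER)) {
                throw new IllegalArgumentException("tag contains delimiter: " + tag);
            }
        }
        return String.join(DELIMITER, tags);
    }

    // Split the stored value back into individual tags.
    static List<String> decode(String stored) {
        if (stored == null || stored.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(stored.split(DELIMITER)));
    }

    public static void main(String[] args) {
        String stored = encode(Arrays.asList("cats", "funny"));
        System.out.println(stored);
        System.out.println(decode(stored));
    }
}
```

Rejecting tags that contain the delimiter is one possible design choice; escaping would be another.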
Comments (many-to-many)
CREATE TABLE comments (
    videoid uuid,
    username varchar,
    comment_ts timestamp,
    comment varchar,
    PRIMARY KEY (videoid, username, comment_ts)
);
Row layout: VideoId <UUID> (row key) | username | comment_ts | comment
• Videos have many comments
• Comments have many users
• Order is as inserted
• Use getSlice() to pull some or all of the comments
Comments... pt 2
• This is what’s really going on
• VideoID is the key
• A composite of username and comment_ts is the column name
• 1 column per comment
Wide row, time ordered: VideoId <UUID> (row key) | username:comment_ts = comment | username:comment_ts = comment | ..
Ratings
CREATE TABLE video_rating (
    videoid uuid,
    rating_count counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);
Row layout: VideoId <UUID> (row key) | rating_count <counter> | rating_total <counter>
• Use counter for single call update
• rating_count is how many ratings were given
• rating_total is the sum of all ratings
• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6
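The arithmetic in that example can be sketched in plain Java. The class below is a hypothetical in-memory stand-in for one video_rating row, not the talk's code and not a Cassandra client call: each rating bumps both values, and the average is computed on read.

```java
// Hypothetical in-memory stand-in for one video_rating row.
final class VideoRating {
    private long ratingCount; // how many ratings were given
    private long ratingTotal; // sum of all ratings

    // One rating touches both values, like a single counter update call.
    void rate(int stars) {
        ratingCount += 1;
        ratingTotal += stars;
    }

    // Average = rating_total / rating_count; 0.0 before the first rating.
    double average() {
        return ratingCount == 0 ? 0.0 : (double) ratingTotal / ratingCount;
    }

    public static void main(String[] args) {
        VideoRating rating = new VideoRating();
        for (int stars : new int[] {5, 5, 4, 5, 4}) {
            rating.rate(stars);
        }
        System.out.println(rating.average()); // 23 / 5 = 4.6
    }
}
```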
Video Events
CREATE TABLE video_event (
    videoid_username varchar,
    event varchar,
    event_timestamp timestamp,
    video_timestamp bigint,
    PRIMARY KEY (videoid_username, event_timestamp, event)
) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
Row layout: VideoId:Username (row key) | start_<timestamp> | stop_<timestamp> | start_<timestamp> (each holding video_<timestamp>), ordered latest .. oldest
• Track viewing events
• Combine Video ID and Username for a unique row
• Stop time can be used to pick up where they left off
• Great for usage analytics later
• Reverse comparator!
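A rough in-memory sketch of how the reversed ordering helps with "pick up where they left off". VideoEventRow, the ':' separator in the row key, and the "start"/"stop" event names are assumptions for illustration. The sorted map only mimics the DESC clustering order and, unlike the table, keeps one event per timestamp.

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical model of one video_event row, newest event first.
final class VideoEventRow {
    static final class Event {
        final String name;         // "start" or "stop" (assumed event names)
        final long videoTimestamp; // position inside the video

        Event(String name, long videoTimestamp) {
            this.name = name;
            this.videoTimestamp = videoTimestamp;
        }
    }

    // Keyed by event_timestamp with a reversed comparator, so iteration
    // runs latest to oldest.
    private final TreeMap<Long, Event> events =
        new TreeMap<>(Comparator.<Long>reverseOrder());

    // Row key: video id and username combined (separator is an assumption).
    static String rowKey(UUID videoId, String username) {
        return videoId + ":" + username;
    }

    void record(long eventTimestamp, String name, long videoTimestamp) {
        events.put(eventTimestamp, new Event(name, videoTimestamp));
    }

    // Name of the most recent event, or null when the row is empty.
    String newestEventName() {
        return events.isEmpty() ? null : events.firstEntry().getValue().name;
    }

    // Position of the most recent "stop" event, or -1 if there is none.
    long resumePosition() {
        for (Event e : events.values()) {
            if ("stop".equals(e.name)) {
                return e.videoTimestamp;
            }
        }
        return -1L;
    }
}
```

Because the newest events come first, finding the resume point usually means reading only the head of the row.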
Create Query Tables
Indexes to support fast lookups
Index table principles
• Lookup by rowkey
• Indexed
• Cached (most times)
[Diagram: rows RowKey1 through RowKey12; a lookup goes straight to RowKey5]
Index table principles
• Get row by the key
• Slice. Get data in one pass
• Cached (sometimes)
[Diagram: row RowKey5 holds Col1 through Col8; a GetSlice from Col3 to Col6 returns Col3 Col4 Col5 Col6 in one sequential read]
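The slice read can be pictured with a sorted map standing in for one row. This is only an illustration of the access pattern under that assumption, not how Cassandra stores data internally. RowSlice and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a single row whose columns stay sorted by name.
final class RowSlice {
    private final TreeMap<String, String> columns = new TreeMap<>();

    void put(String name, String value) {
        columns.put(name, value);
    }

    // Names of all columns from start to end, both inclusive, in order.
    List<String> slice(String start, String end) {
        return new ArrayList<>(columns.subMap(start, true, end, true).keySet());
    }

    public static void main(String[] args) {
        RowSlice row = new RowSlice();
        for (int i = 1; i <= 8; i++) {
            row.put("Col" + i, "value" + i);
        }
        System.out.println(row.slice("Col3", "Col6"));
    }
}
```

Since the columns are already sorted, the slice is a single contiguous range rather than several separate lookups.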
Video by Username
CREATE TABLE username_video_index (
    username varchar,
    videoid uuid,
    upload_date timestamp,
    video_name varchar,
    PRIMARY KEY (username, videoid, upload_date)
);
Wide row: Username (row key) | VideoId:<timestamp> | .. | VideoId:<timestamp>
• Username is unique
• One column for each new video uploaded
• Column slice for time span. From x to y
• VideoId is added at the same time a Video record is added
Video by Tag
CREATE TABLE tag_index (
    tag varchar,
    videoid varchar,
    timestamp timestamp,
    PRIMARY KEY (tag, videoid)
);
Row layout: tag (row key) | VideoId = timestamp | .. | VideoId = timestamp
• Tag is unique regardless of video
• Great for “List videos with X tag”
• Tags have to be updated in Video and Tag at the same time
• Index integrity is maintained in app logic
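Keeping the index consistent in application logic might look roughly like the sketch below. TaggedVideoStore is a hypothetical in-memory stand-in for the videos and tag_index tables. The point is only that one save call writes both places and removes stale index entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory stand-in for the videos and tag_index tables.
final class TaggedVideoStore {
    private final Map<UUID, List<String>> videos = new HashMap<>();  // videoid -> tags
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>(); // tag -> videoids

    // Save a video's tags and update the tag index in the same call.
    void saveTags(UUID videoId, List<String> tags) {
        List<String> previous = videos.put(videoId, new ArrayList<>(tags));
        if (previous != null) {
            // Drop index entries for the tags this video had before.
            for (String old : previous) {
                Set<UUID> ids = tagIndex.get(old);
                if (ids != null) {
                    ids.remove(videoId);
                    if (ids.isEmpty()) {
                        tagIndex.remove(old);
                    }
                }
            }
        }
        for (String tag : tags) {
            tagIndex.computeIfAbsent(tag, t -> new LinkedHashSet<>()).add(videoId);
        }
    }

    // "List videos with X tag": one lookup by tag.
    Set<UUID> videosWithTag(String tag) {
        return tagIndex.getOrDefault(tag, Collections.emptySet());
    }
}
```

Against a real cluster the two writes would be separate mutations, so the application also has to decide what to do when only one of them succeeds.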
Deployment
• Replication factor?
• Multi-datacenter?
• Cost?
Deployment
• Today != tomorrow
• Scale when needed
• Have expansion plan ready
DataStax Enterprise
• Analytics - Hadoop
• Search - Solr
Hadoop
• Embedded with Cassandra
• No single point of failure
• Use native Cassandra (C*) data
• Hive, Pig, Mahout
Solr
• Embedded with Cassandra
• Fast reverse-index
• Shards Solr by key range
OpsCenter
Thank you!
Connect with me at @PatrickMcFadin or on LinkedIn