Real data models of silicon valley
![Page 1: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/1.jpg)
Real Data Models of Silicon Valley
Patrick McFadin
Chief Evangelist for Apache Cassandra
@PatrickMcFadin
![Page 2: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/2.jpg)
It's been an epic year
![Page 3: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/3.jpg)
I've had a ton of fun!
• Traveling the world talking to people like you!
Warsaw
Stockholm
Melbourne
New York
Vancouver
Dublin
![Page 4: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/4.jpg)
What's new?
• Cassandra 2.1 is out!
• Amazing changes for performance and stability
![Page 5: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/5.jpg)
Where are we going?
• 3.0 is next. Just hold on…
![Page 6: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/6.jpg)
KillrVideo.com
• 2012 Summit
• Complete example for data modeling
www.killrvideo.com
[Figure: mockup of a KillrVideo video page showing the video title, recommended videos, ads, comments, description, an upload button, username, rating, and tags]
*Cat drawing by goodrob13 on Flickr
![Page 7: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/7.jpg)
It’s alive!!!
• Hosted on Azure
• Code on Github
![Page 8: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/8.jpg)
Data Model - Revisited
• Add in some 2.1 data models
• Replace (or remove) some app code
• Become part of the Cassandra OSS download
![Page 9: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/9.jpg)
User Defined Types
• Complex data in one place
• No multi-gets (reads across multiple partitions)
• Nesting!
CREATE TYPE address ( street text, city text, zip_code int, country text, cross_streets set<text> );
![Page 10: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/10.jpg)
Before
CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );
CREATE TABLE video_metadata ( video_id uuid PRIMARY KEY, height int, width int, video_bit_rate set<text>, encoding text );
SELECT * FROM videos WHERE videoid = ?;
SELECT * FROM video_metadata WHERE video_id = ?;
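[Figure: video page built from both query results. Title: Introduction to Apache Cassandra. Description: A one hour talk on everything you need to know about a totally amazing database. Playback rate: 480, 720]
In-application join: the application issues both queries and merges the results itself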
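To show what the "Before" model pushes into the application, here is a minimal Python sketch of that in-application join. It assumes each SELECT result arrives as a plain dict. The function name and the sample values are hypothetical and do not come from the KillrVideo code.

```python
from typing import Optional


def join_video_with_metadata(video_row: dict, metadata_row: Optional[dict]) -> dict:
    """Merge a videos row with its video_metadata row in application code.

    The two tables name their key differently (videoid vs. video_id),
    so the join has to match one against the other.
    """
    merged = dict(video_row)
    if metadata_row is None:
        # The second query found nothing, so there is no metadata to attach.
        merged["metadata"] = None
        return merged
    if metadata_row["video_id"] != video_row["videoid"]:
        raise ValueError("rows belong to different videos")
    # Nest the non-key metadata columns, roughly the shape the "After"
    # model stores directly in the videos row as a video_metadata UDT.
    merged["metadata"] = {k: v for k, v in metadata_row.items() if k != "video_id"}
    return merged


# Sample rows standing in for the two SELECT results (illustrative values).
video = {"videoid": "v1", "name": "Introduction to Apache Cassandra"}
meta = {"video_id": "v1", "height": 720, "width": 1280, "encoding": "h264"}
print(join_video_with_metadata(video, meta)["metadata"]["height"])  # 720
```

Once the metadata is embedded as a UDT, both the second query and this merge step go away, because one read of videos already returns the nested data.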
![Page 11: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/11.jpg)
After
• Now video_metadata is embedded in videos
CREATE TYPE video_metadata ( height int, width int, video_bit_rate set<text>, encoding text );
CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, metadata set <frozen<video_metadata>>, added_date timestamp, PRIMARY KEY (videoid) );
![Page 12: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/12.jpg)
Wait! Frozen??
• Staying out of technical debt
• 3.0 UDTs will not have to be frozen
• Applicable to User Defined Types and Tuples (wait for it…)
Do you want to build a schema? Do you want to store some JSON?
![Page 13: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/13.jpg)
Let’s store some JSON
{ "productId": 2, "name": "Kitchen Table", "price": 249.99, "description": "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": { "Home Furnishings": { "catalogPage": 45, "url": "/home/furnishings" }, "Kitchen Furnishings": { "catalogPage": 108, "url": "/kitchen/furnishings" } } }
![Page 14: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/14.jpg)
Let’s store some JSON
CREATE TYPE dimensions ( units text, length float, width float, height float );
![Page 15: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/15.jpg)
Let’s store some JSON
CREATE TYPE category ( catalogPage int, url text );
![Page 16: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/16.jpg)
Let’s store some JSON
CREATE TABLE product ( productId int, name text, price float, description text, dimensions frozen <dimensions>, categories map <text, frozen <category>>, PRIMARY KEY (productId) );
![Page 17: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/17.jpg)
Let’s store some JSON
INSERT INTO product (productId, name, price, description, dimensions, categories) VALUES (2, 'Kitchen Table', 249.99, 'Rectangular table with oak finish', { units: 'inches', length: 50.0, width: 66.0, height: 32 }, { 'Home Furnishings': { catalogPage: 45, url: '/home/furnishings' }, 'Kitchen Furnishings': { catalogPage: 108, url: '/kitchen/furnishings' } } );
dimensions frozen <dimensions>
categories map <text, frozen <category>>
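In the INSERT, each JSON object becomes one of two literals: a UDT literal with bare field names, or a map literal with quoted keys. The sketch below is a small Python illustration that builds those literals from the product dict. The JSON alone cannot tell a UDT from a map, so the caller passes in the paths of UDT-typed values. The helper names are hypothetical. This is string building for illustration only; a real application should use the driver's bound parameters.

```python
def cql_literal(value, udt_fields=frozenset(), path=""):
    """Render a Python value as a CQL literal (illustrative, not injection-safe)."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # CQL escapes a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, dict):
        if path in udt_fields:
            # UDT literal: {field: value, ...} with bare field names.
            items = (f"{k}: {cql_literal(v, udt_fields, path + '.' + k)}" for k, v in value.items())
        else:
            # Map literal: {'key': value, ...}; map values get the path suffix '[]'.
            items = (f"{cql_literal(k)}: {cql_literal(v, udt_fields, path + '[]')}" for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def insert_statement(table, row, udt_fields):
    """Build an INSERT whose column list follows the dict's key order."""
    cols = ", ".join(row)
    vals = ", ".join(cql_literal(v, udt_fields, k) for k, v in row.items())
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"


product = {
    "productId": 2,
    "name": "Kitchen Table",
    "price": 249.99,
    "description": "Rectangular table with oak finish",
    "dimensions": {"units": "inches", "length": 50.0, "width": 66.0, "height": 32},
    "categories": {
        "Home Furnishings": {"catalogPage": 45, "url": "/home/furnishings"},
        "Kitchen Furnishings": {"catalogPage": 108, "url": "/kitchen/furnishings"},
    },
}
# "dimensions" is a UDT; "categories" is a map whose values ("categories[]") are UDTs.
print(insert_statement("product", product, {"dimensions", "categories[]"}))
```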
![Page 18: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/18.jpg)
Retrieving fields
![Page 19: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/19.jpg)
Counters pt Deux
• Introduced in 0.8
• Commit log replay would change counters
• Repair could change counters
• Performance was inconsistent. Lots of GC
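Replay and repair cause trouble for counters partly because a counter update is an increment, not an overwrite, so applying the same mutation twice changes the result. The toy Python model below contrasts a last-write-wins cell with a counter cell. It is a simplified illustration and does not reflect how Cassandra implements counters internally.

```python
def apply_overwrite(cell: dict, value: int, timestamp: int) -> None:
    """Regular cell: last write wins, so replaying a mutation is harmless."""
    if timestamp >= cell.get("ts", -1):
        cell["value"], cell["ts"] = value, timestamp


def apply_increment(cell: dict, delta: int) -> None:
    """Counter cell: each application adds delta, so a replay double-counts."""
    cell["value"] = cell.get("value", 0) + delta


regular, counter = {}, {}
for _ in range(2):  # the same mutation delivered twice, e.g. by a replay
    apply_overwrite(regular, 5, timestamp=100)
    apply_increment(counter, 5)
print(regular["value"], counter["value"])  # 5 10
```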
![Page 20: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/20.jpg)
The good
• Stable under load
• No commit log replay issues
• No repair weirdness
![Page 21: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/21.jpg)
The bad
• Still can’t delete/reset counters
• Still needs to do a read before write.
![Page 22: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/22.jpg)
Usage
Wait for it…
It’s the same! Carry on…
![Page 23: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/23.jpg)
Static Fields
• New as of 2.0.6
• VERY specific, but useful
• Thrift people will like this
CREATE TABLE t ( k text, s text STATIC, i int, PRIMARY KEY (k, i) );
![Page 24: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/24.jpg)
Why?
CREATE TABLE weather ( id int, time timestamp, weatherstation_name text, temperature float, PRIMARY KEY (id, time) );
Storage layout without a static column (partition key, i.e. storage row key, ID = 1):
2014-09-08 12:00:00 : name = SFO, 2014-09-08 12:00:00 : temp = 63.4 (partition row 1)
2014-09-08 12:01:00 : name = SFO, 2014-09-08 12:01:00 : temp = 63.9 (partition row 2)
2014-09-08 12:02:00 : name = SFO, 2014-09-08 12:02:00 : temp = 64.0 (partition row 3)
Storage layout with the station name as a static column (partition key ID = 1):
name = SFO (stored once for the whole partition)
2014-09-08 12:00:00 : temp = 63.4
2014-09-08 12:01:00 : temp = 63.9
2014-09-08 12:02:00 : temp = 64.0
CREATE TABLE weather ( id int, time timestamp, weatherstation_name text static, temperature float, PRIMARY KEY (id, time) );
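The two layouts above can be imitated with a toy Python model of a single partition: the static value is stored once and then projected into every CQL row on read. The class and method names are hypothetical. The model shows the semantics only and does not reflect Cassandra's storage engine.

```python
from datetime import datetime


class Partition:
    """Toy model of one partition of the weather table with a static column."""

    def __init__(self, key):
        self.key = key
        self.static = {}  # one copy per partition, e.g. weatherstation_name
        self.rows = {}    # clustering key (time) -> regular columns

    def write(self, time, temperature, weatherstation_name=None):
        self.rows[time] = {"temperature": temperature}
        if weatherstation_name is not None:
            # Writing a static column overwrites it for the whole partition.
            self.static["weatherstation_name"] = weatherstation_name

    def select(self):
        """Return CQL rows in clustering order, each showing the shared static value."""
        return [
            {"id": self.key, "time": t, **self.static, **cols}
            for t, cols in sorted(self.rows.items())
        ]


p = Partition(1)
p.write(datetime(2014, 9, 8, 12, 0), 63.4, "SFO")
p.write(datetime(2014, 9, 8, 12, 1), 63.9)
p.write(datetime(2014, 9, 8, 12, 2), 64.0)
print([r["weatherstation_name"] for r in p.select()])  # ['SFO', 'SFO', 'SFO']
```

The name is written only once, yet every row reads it back. A later write of a new name (for example a hypothetical "SFO-2") changes it for all rows in the partition.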
![Page 25: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/25.jpg)
Usage
• Put static at the end of the column declaration
• Can’t be part of the primary key
CREATE TABLE video_event ( videoid uuid, userid uuid, preview_image_location text static, event varchar, event_timestamp timeuuid, video_timestamp bigint, PRIMARY KEY ((videoid,userid),event_timestamp,event) ) WITH CLUSTERING ORDER BY (event_timestamp DESC,event ASC);
![Page 26: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/26.jpg)
Tuples
• A fixed-length group of typed, positional values
• Up to 256 different elements
CREATE TABLE tuple_table ( id int PRIMARY KEY, three_tuple frozen <tuple<int, text, float>>, four_tuple frozen <tuple<int, text, float, inet>>, five_tuple frozen <tuple<int, text, float, inet, ascii>> );
![Page 27: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/27.jpg)
Example Usage
• Track a drone’s position
• x, y, z in 3D Cartesian coordinates
CREATE TABLE drone_position ( droneId int, time timestamp, position frozen <tuple<float, float, float>>, PRIMARY KEY (droneId, time) );
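Tuple fields are positional, so they map directly onto Python tuples. The hypothetical Python sketch below takes rows shaped like drone_position, orders them by time, and adds up the straight-line distance between consecutive (x, y, z) positions. The function name and the sample rows are illustrative, and the code needs Python 3.8 or later for `math.dist`.

```python
import math
from datetime import datetime


def distance_traveled(rows):
    """Sum straight-line distances between consecutive positions.

    rows: iterable of (time, (x, y, z)) pairs, mirroring the time clustering
    column and the frozen<tuple<float, float, float>> position column.
    """
    # Sort by time, the order a clustering column on time would return.
    positions = [position for _, position in sorted(rows)]
    # math.dist computes the Euclidean distance between two points.
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


rows = [
    (datetime(2014, 9, 8, 12, 0, 0), (0.0, 0.0, 0.0)),
    (datetime(2014, 9, 8, 12, 0, 1), (3.0, 4.0, 0.0)),
    (datetime(2014, 9, 8, 12, 0, 2), (3.0, 4.0, 12.0)),
]
print(distance_traveled(rows))  # 17.0
```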
![Page 28: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/28.jpg)
What about partition size?
• A CQL partition is a logical projection of a storage row
• Storage rows can have up to 2 billion cells
• Each cell can hold up to 2 GB of data
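Here is a rough Python estimate that relates the 2 billion cell limit to CQL rows. It rests on a simplifying assumption about the pre-3.0 storage format: each CQL row written with INSERT contributes one cell per regular column plus one row-marker cell, and each static column contributes one cell per partition. Clustering values live in the cell names, so they add no cells. Read the results as ballpark figures, not exact counts.

```python
CELL_LIMIT = 2_000_000_000  # "up to 2 billion cells" per storage row


def cells_per_partition(rows, regular_columns, static_columns=0, row_marker=True):
    """Approximate storage cells in one partition (simplified pre-3.0 model)."""
    per_row = regular_columns + (1 if row_marker else 0)
    return rows * per_row + static_columns


def max_rows(regular_columns, static_columns=0, row_marker=True):
    """Largest CQL row count that stays within CELL_LIMIT under the same model."""
    per_row = regular_columns + (1 if row_marker else 0)
    return (CELL_LIMIT - static_columns) // per_row


# weather table without the static column: weatherstation_name and temperature
print(cells_per_partition(1_000_000, regular_columns=2))  # 3000000
print(max_rows(regular_columns=2))  # 666666666
```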
![Page 29: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/29.jpg)
How much is too much?
• How many cells before performance degrades?
• How many bytes per partition before it’s unmanageable?
• What is “practical”?
![Page 30: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/30.jpg)
Old answer
• 2011: Pre-Cassandra 1.2 (actually tested on 0.8)
• Aaron Morton, Cassandra MVP and Founder of The Last Pickle
![Page 31: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/31.jpg)
Conclusion
• Keep partition (storage row) length under 10k cells
• Keep total size under 64 MB (larger partitions fall back to multi-pass compaction)
• Reads that span multiple 64 KB pages will start to hurt
TL;DR - It’s a performance tunable
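The 2011 rules of thumb can be expressed as a simple check. This Python sketch uses the slide's thresholds (under 10k cells, under 64 MB) as defaults. The function name is hypothetical. The TL;DR says this is a performance tunable, so both limits are parameters, and a caller following newer guidance would pass larger values.

```python
def partition_warnings(cells, size_bytes, max_cells=10_000, max_bytes=64 * 1024 * 1024):
    """Flag a partition that exceeds the 2011 rules of thumb."""
    issues = []
    if cells >= max_cells:
        issues.append(f"{cells} cells, guideline is under {max_cells}")
    if size_bytes >= max_bytes:
        issues.append(f"{size_bytes} bytes, guideline is under {max_bytes} (multi-pass compaction)")
    return issues


print(partition_warnings(5_000, 1_000_000))  # []
print(len(partition_warnings(250_000, 100 * 1024 * 1024)))  # 2
```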
![Page 32: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/32.jpg)
The tests revisited
• Attempted to reproduce the same tests using CQL
• Cassandra 2.1, 2.0 and 1.2
• Tested partition sizes (cells per partition):
1. 100
2. 2,114
3. 5,000
4. 10,000
5. 100,000
6. 1,000,000
7. 10,000,000
8. 100,000,000
9. 1,000,000,000
![Page 33: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/33.jpg)
Results
[Chart: latency in milliseconds vs. cells per partition]
![Page 34: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/34.jpg)
The new answer
• Hundreds of thousands of cells per partition is not a problem
• Hundreds of megabytes per partition works best operationally
• The issue to manage is operations
![Page 35: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/35.jpg)
Thank You!
Follow me on twitter for more @PatrickMcFadin
![Page 36: Real data models of silicon valley](https://reader033.fdocuments.us/reader033/viewer/2022051610/548478b05906b5886f8b474c/html5/thumbnails/36.jpg)
Cassandra Summit 2014
September 10-11 | #CassandraSummit