Real Time Analytics with Cassandra
Vagmi Mudumbai
@vagmi / @reducedata
What is Cassandra?
Based on Dynamo
Built by Facebook
It is both a Key Value Store and a Column Store
The CAP Theorem
Column Families
HashMap<RowKey,SortedMap<ColumnName, Value>>
id  name     email              country
1   Vagmi    [email protected]  IN
2   Karthik  yeskarthik@blah    IN
3   MarkZ    mark@fb            US
Rowkey   1                  2                3
name     Vagmi              Karthik          MarkZ
email    [email protected]  yeskarthik@blah  mark@fb
country  IN                 IN               US
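
As a rough illustration of the HashMap<RowKey,SortedMap<ColumnName, Value>> model, the sketch below mimics a column family in plain Python. The class name ColumnFamily and its methods insert, get_row and slice are hypothetical names chosen for this example; they are not Cassandra APIs, and a real column family keeps columns sorted on disk rather than sorting on read.

```python
class ColumnFamily:
    """Toy in-memory model: row key -> columns kept in column-name order."""

    def __init__(self):
        # Outer map: row key -> {column name: value}
        self._rows = {}

    def insert(self, row_key, column, value):
        # Each row may hold a different set of columns.
        self._rows.setdefault(row_key, {})[column] = value

    def get_row(self, row_key):
        # Return the row's columns sorted by column name (the "SortedMap" part).
        return sorted(self._rows.get(row_key, {}).items())

    def slice(self, row_key, start, end):
        # Range scan over column names, inclusive on both ends.
        return [(c, v) for c, v in self.get_row(row_key) if start <= c <= end]


users = ColumnFamily()
users.insert("1", "name", "Vagmi")
users.insert("1", "country", "IN")
users.insert("3", "name", "MarkZ")
users.insert("3", "country", "US")

print(users.get_row("1"))
print(users.slice("3", "a", "d"))
```

Because columns within a row are ordered, a range of column names can be read in one pass, which is what the later slides rely on when they use dates as column names.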
The Problem
As a user, I want to view real time metrics and filter by dimensions like time, city,
category, etc.
select sum(measure) from events where time between A and B and country='US' and
device_platform='Android'
The wrong way
HashMap<RowKey,SortedMap<ColumnName, Value>>
Counters
create column family view_counts_hourly with comparator=UTF8Type and default_validation_class=CounterColumnType and key_validation_class=UTF8Type;
RowKey 20140101 20140102 20140103 20140104 ... ... 20140628 ... 20150308
sid1#us 2553 2341 2342 3242 ... ... 32342 ... 33423
sid1#us#chrome 1556 1532 1892 ... ... ... ... ... ...
sid1#us#chrome#25 833 899 1200
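
The counter layout above can be sketched in memory as follows. This is only an illustration under assumptions: the helper names new_counts, row_keys, record_view and views_between are hypothetical, a nested dict stands in for the view_counts_hourly counter column family, and the dimension order (country, browser, version) is read off the example row keys.

```python
from collections import defaultdict


def new_counts():
    # row key -> {day column -> counter value}
    return defaultdict(lambda: defaultdict(int))


def row_keys(site_id, *dimensions):
    """Build one row key per dimension prefix, e.g. sid1#us, sid1#us#chrome."""
    keys = []
    key = site_id
    for dim in dimensions:
        key = f"{key}#{dim}"
        keys.append(key)
    return keys


def record_view(counts, day, site_id, country, browser, version):
    # One event increments a counter in every prefix row, under the day column.
    for key in row_keys(site_id, country, browser, version):
        counts[key][day] += 1


def views_between(counts, row_key, start_day, end_day):
    # Days are YYYYMMDD strings, so string comparison matches date order.
    row = counts.get(row_key, {})
    return sum(n for day, n in row.items() if start_day <= day <= end_day)


counts = new_counts()
record_view(counts, "20140101", "sid1", "us", "chrome", "25")
record_view(counts, "20140102", "sid1", "us", "chrome", "25")
print(views_between(counts, "sid1#us#chrome", "20140101", "20140102"))
```

The trade-off in this layout is write amplification: each event touches one row per dimension prefix, in exchange for a filtered sum becoming a single-row column slice.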
But what about uniques?
Bitmaps to the rescue
1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 1
u1 u2 u3 u4 u5 u6 u7 u8 u9 u10 u11 u12 u13 ... ... ...
UID - 1328abc2838fd283e282
Fast Hash Function - Murmur32
1 0 1 0 1 1 0 0 0 1 1 0 1 0 0 1
u1 u2 u3 u4 u5 u6 u7 u8 u9 u10 u11 u12 u13 ... ... ...
RowKey 20140101 20140102 20140103 20140104 ... ... 20140628 ... 20150308
sid1#us 10101 10111 11100 11101 ... ... ... ... 11101
sid1#us#chrome ... ... ... ... ... ... ... ... ...
sid1#us#chrome#25 10101 11101 11100 ... ... ... ... ... ...
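
A minimal sketch of the bitmap idea, under stated assumptions: zlib.crc32 stands in for Murmur32 so the example needs only the standard library, Python integers stand in for the stored bitmaps, and the names BITMAP_BITS, bit_position, record_visit and uniques are hypothetical. The bitmap width is an arbitrary choice for the example; the slides do not state one.

```python
import zlib

BITMAP_BITS = 1 << 20  # assumed width; wider bitmaps mean fewer collisions


def bit_position(uid, bits=BITMAP_BITS):
    # Hash the user id to a bit index (crc32 used here in place of Murmur32).
    return zlib.crc32(uid.encode("utf-8")) % bits


def record_visit(store, row_key, day, uid):
    # store maps (row key, day) -> bitmap; set the bit for this user.
    cell = (row_key, day)
    store[cell] = store.get(cell, 0) | (1 << bit_position(uid))


def uniques(store, row_key, days):
    # OR the daily bitmaps together, then count the set bits.
    merged = 0
    for day in days:
        merged |= store.get((row_key, day), 0)
    return bin(merged).count("1")


store = {}
uid = "1328abc2838fd283e282"
record_visit(store, "sid1#us", "20140101", uid)
record_visit(store, "sid1#us", "20140102", uid)
# The same user on two days is counted once across the range.
print(uniques(store, "sid1#us", ["20140101", "20140102"]))
```

Unlike counters, bitmaps can be merged across days without double counting a returning user. Two different users that hash to the same bit are counted once, so the result is a lower bound whose accuracy depends on the bitmap width.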
But I do not have Big Data
Oh, and we're hiring
Thanks
@vagmi on GitHub / Twitter / Facebook