C* Summit EU 2013: Denormalizing Your Data: A Java Library to Support Structured Data in Cassandra
#CASSANDRAEU CASSANDRASUMMITEU
C* Path: Denormalize your data
Eric Zoerner | Software Developer, eBuddy BV
Cassandra Summit Europe 2013, London

About eBuddy

XMS

Cassandra in eBuddy Messaging Platform
• User Data Service
• User Discovery Service
• Persistent Session Store
• Message History
• Location-based Discovery

Some Statistics
• Current size of data: 1.4 TB total (3x replication); 467 GB actual data
• 12 million sessions (11 million users plus groups)
• Almost a billion rows in one column family (inverse social graph)

C* Path

The Problem (a “classic”)
Complex Object
Person: name: String, birthdate: Date, nickname: String
Address: street: String, city: String, province: String, postalCode: String, countryCode: String
Phone: name: String, number: String
Person 1 to * Address; Person 1 to * Phone
?? → Key-Value Store (RDB table, NoSQL, etc.)

Some Strategies
Serialization
Normalization
[Figure: the object normalized into three tables, with sample rows for persons 110 and 111: Person (id, name, nickname, birthdate), Address (address_id, person_id, street, city), Phone (person_id, name, phone)]
Decomposition
path                   value
name/                  John
addresses/@0/street    123 Main St.
phones/@0/number       +31123456789
...                    ...

Strategies Comparison
                     Serialization   Normalization   Decomposition
Single Write         ✔               ✘               ✔
Single Read          ✔               ✘               ✔
Consistent Updates   ✔               ✔               not enforced
Structural Access    ✘               ✔               ✔
Cycles               ✔               ✔               ✘

C* Path
Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra
https://github.com/ebuddy/c-star-path
* Artifacts available at Maven Central.

C* Path: Decomposition
• Easy to Use
• Simple API
• Good for Cassandra because:
– Structural Access: Write parts of objects without reading first
– Good for denormalizing data, can read or write large complex objects with one read or write operation

How does it work?

API Example - Write to a Path
StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;
Path path = dao.createPath("some", "path", "to", "my", "pojo");
dao.writeToPath(rowKey, path, pojo);

API Example - Read from a Path
Path path = dao.createPath("some", "path", "to", "my", "pojo");
Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });

API Example - Delete
dao.deletePath(rowKey, path);

API Example - Batch Operations
BatchContext batch = dao.beginBatch();
dao.writeToPath(rowKey1, path, pojo1, batch);
dao.writeToPath(rowKey2, path, pojo2, batch);
dao.deletePath(rowKey3, path, batch);
dao.applyBatch(batch);

Read or write at any level of a path
Person person = …;
Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);
Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);

Write Implementation: Decomposition
• Step 1:
– Convert the domain object into a basic structure of Maps, Lists, and simple values. This step uses the Jackson (FasterXML) library and honors the Jackson annotations.
• Step 2:
– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by Decomposer
• Step 3:
– Write this map as key-value pairs in the database

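Step 2 can be illustrated with a minimal sketch. This is not the library's Decomposer: the class and method names are hypothetical, the input is assumed to already be the Map/List structure that step 1 produces, and URL encoding and list terminators are left out. Every path gets a trailing slash here, which is a choice of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of step 2, not the C* Path Decomposer itself.
public class DecomposeSketch {

    /** Flattens nested Maps and Lists into path -> simple value pairs. */
    public static Map<String, Object> decompose(Object structure) {
        Map<String, Object> out = new TreeMap<>();
        walk("", structure, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // A map contributes one path element per key.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(prefix + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // A list contributes one "@index" path element per position.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "@" + i + "/", list.get(i), out);
            }
        } else {
            out.put(prefix, node); // simple value: String, Number, or Boolean
        }
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "Singel 45");
        address.put("place", "Amsterdam");
        List<Object> addresses = new ArrayList<>();
        addresses.add(address);

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "John");
        person.put("addresses", addresses);

        decompose(person).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Each resulting entry corresponds to one key-value pair written in step 3.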
Example Decomposition - step 1
Simplify structure into regular Maps, Lists, and simple values
Map
  name = "John"
  birthdate = "-39080932298"
  nickname = "Jack"
  addresses = <List>
    [0] = <Map>
      street = "123 Main"
      place = "New York"
    [1] = <Map>
      street = "Singel 45"
      place = "Amsterdam"
  phones = <List>
    [0] = <Map>
      name = "mobile"
      number = "+31651234567"

Example Decomposition - step 2
path                   value
name/                  "John"
birthdate/             "-39080932298"
nickname/              "Jack"
addresses/@0/street    "123 Main St."
addresses/@0/place     "New York"
addresses/@1/street    "Singel 45"
addresses/@1/place     "Amsterdam"
phones/@0/name         "mobile"
phones/@0/number       "+31651234567"

Read Implementation: Composition
• Step 1:
– Read path-value pairs from database
• Step 2:
– “Merge” path-value maps back into basic structure (Maps, Lists, simple values), done by Composer
• Step 3:
– Use Jackson to convert basic structure back into domain object using a TypeReference

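The “merge” in step 2 can be sketched as the inverse of decomposition. The code below is an illustration only, not the library's Composer: names are hypothetical, paths are assumed to be slash-separated strings with "@index" elements for list positions, and URL decoding and list terminators are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of step 2, not the C* Path Composer itself.
@SuppressWarnings("unchecked")
public class ComposeSketch {

    /** Merges path -> value pairs back into nested Maps and Lists. */
    public static Object compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new TreeMap<>();
        for (Map.Entry<String, Object> e : pathValues.entrySet()) {
            String[] elements = e.getKey().split("/"); // a trailing "/" adds no empty element
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                // Fails with ClassCastException if a simple value already sits at this
                // element, i.e. if the stored structure is inconsistent.
                node = (Map<String, Object>) node.computeIfAbsent(
                        elements[i], k -> new TreeMap<String, Object>());
            }
            node.put(elements[elements.length - 1], e.getValue());
        }
        return listify(root);
    }

    /** Turns every map whose keys are all "@index" elements into a List. */
    private static Object listify(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean isList = !map.isEmpty() && map.keySet().stream().allMatch(k -> k.startsWith("@"));
        if (!isList) {
            map.replaceAll((k, v) -> listify(v));
            return map;
        }
        // Order by numeric index, not by string, so that "@10" follows "@2".
        Map<Integer, Object> byIndex = new TreeMap<>();
        map.forEach((k, v) -> byIndex.put(Integer.parseInt(k.substring(1)), listify(v)));
        return new ArrayList<>(byIndex.values());
    }

    public static void main(String[] args) {
        Map<String, Object> pairs = new LinkedHashMap<>();
        pairs.put("name/", "John");
        pairs.put("addresses/@0/street/", "123 Main St.");
        pairs.put("addresses/@1/street/", "Singel 45");
        System.out.println(compose(pairs));
    }
}
```

The resulting structure is what step 3 would hand to Jackson for conversion into the domain object.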
Design & Challenges

Path Encoding
• Paths stored as strings
• Forward slashes in paths (but hidden by Path API)
• Path elements are internally URL encoded, allowing use of special characters in the implementation
• Special characters: @ for list indices (@0, @1, @2, ...)

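A small sketch shows why URL encoding the elements frees up characters such as "/" and "@" for internal use. It relies only on the JDK's `URLEncoder`; the class and method names are hypothetical and the library's actual encoding may differ in detail.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration of path encoding, not the C* Path implementation.
public class PathEncodingSketch {

    /** URL-encodes a user-supplied path element, so "/" and "@" never appear raw. */
    public static String encodeElement(String element) {
        try {
            return URLEncoder.encode(element, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds the reserved element for a list position; it is deliberately not encoded. */
    public static String indexElement(int index) {
        return "@" + index;
    }

    /** Joins already-encoded elements with "/" and a trailing slash. */
    public static String join(String... encodedElements) {
        StringBuilder sb = new StringBuilder();
        for (String element : encodedElements) {
            sb.append(element).append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A list position and a user key that merely looks like one stay distinct.
        System.out.println(join(encodeElement("phones"), indexElement(0), encodeElement("number")));
        System.out.println(join(encodeElement("phones"), encodeElement("@0"), encodeElement("a/b")));
    }
}
```

Because a raw "@" or "/" can only come from the implementation, a user key such as "@0" is stored as "%400" and cannot be mistaken for a list index.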
Challenge: “Shrinking Lists”
➀ Write a list.
dao.writeToPath(key, "x", {"1","2"});
x/@0/    "1"
x/@1/    "2"
➁ Write a shorter list.
dao.writeToPath(key, "x", {"3"});
x/@0/    "3"
x/@1/    "2"
➂ Read the list.
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3","2"} ✘

Solution: Implementation writes a list terminator value.
dao.writeToPath(key, "x", {"1","2"});
x/@0/    "1"
x/@1/    "2"
x/@2/    0xFFFFFFFF
dao.writeToPath(key, "x", {"3"});
x/@0/    "3"
x/@1/    0xFFFFFFFF
x/@2/    0xFFFFFFFF
dao.readFromPath(key, "x", new TypeReference<List<String>>() {});
{"3"} ✔
Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.
This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
Conclusion: The user must know what they are doing and understand the implementation.

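The terminator idea, and why it is only a partial fix, can be reproduced with an in-memory stand-in for a row. This is a sketch under assumptions: a `TreeMap` plays the role of the Cassandra row, a sentinel object stands in for the stored terminator value (shown as 0xFFFFFFFF on the slide), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the list terminator, not the C* Path implementation.
public class ListTerminatorSketch {

    /** Stand-in for the stored terminator value. */
    public static final Object TERMINATOR = new Object();

    /** Writes the elements under prefix/@i/ plus a terminator after the last one; deletes nothing. */
    public static void writeList(Map<String, Object> row, String prefix, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(prefix + "/@" + i + "/", values.get(i));
        }
        row.put(prefix + "/@" + values.size() + "/", TERMINATOR);
    }

    /** Reads elements in index order and stops at the terminator or at a missing index. */
    public static List<String> readList(Map<String, Object> row, String prefix) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            Object value = row.get(prefix + "/@" + i + "/");
            if (value == null || value == TERMINATOR) {
                return result;
            }
            result.add((String) value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new TreeMap<>();
        writeList(row, "x", Arrays.asList("1", "2", "3"));
        writeList(row, "x", Arrays.asList("9"));
        System.out.println(readList(row, "x")); // the terminator hides the old tail
        System.out.println(row.get("x/@2/"));   // a positional read still finds the stale "3"
    }
}
```

Reading the whole list stops at the terminator, while the old element at x/@2/ remains in the row until it is explicitly deleted.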
Challenge: Inconsistent Updates
Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure.
Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);
x/address/street/         "Singel 45"
x/name/                   "John"
path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);
x/address/street/         "Singel 45"
x/name/                   "John"
x/name/address/street/    "Singel 45"
x/name/name/              "John" ✘
Solution: Don’t do that! ✔
* If it does happen...
The implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert to a now incompatible POJO will fail.
Conclusion: The user must know what they are doing and understand the implementation.

Issue: Sorting
Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
Instead of storing paths as strings, the implementation could have used DynamicComposite.
We tried it.
It can work. CQL supports it as a user-defined type.
Unfortunately it causes cqlsh to crash, making it difficult to “browse” the data.
Using DynamicComposite for paths is still under consideration for a future version.

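The sorting question is easy to see with list indices. The sketch below uses plain JDK sorting (no Cassandra involved, names hypothetical) to contrast string ordering, which is what string-typed paths get, with the numeric ordering one might want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of string vs. numeric ordering of "@index" elements.
public class PathSortSketch {

    /** Orders elements as plain strings, the way string-typed paths are compared. */
    public static List<String> stringOrder(List<String> elements) {
        List<String> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    /** Orders "@index" elements by the numeric value of the index. */
    public static List<String> numericOrder(List<String> elements) {
        List<String> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s.substring(1))));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> indices = Arrays.asList("@2", "@10", "@1");
        System.out.println(stringOrder(indices));  // "@10" lands before "@2"
        System.out.println(numericOrder(indices)); // "@10" lands last
    }
}
```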
Cassandra Data Model

Thrift
column family
row key: <UUID>
column name          column value
x/address/street/    "Singel 45"
x/name               "John"
…                    …
- OR -
super column family (coming soon)
row key: <UUID>
super column name: x
column name          column value
address/street/      "Singel 45"
name                 "John"
…                    …

Thrift
ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(keyspace, KeySerializer, StringSerializer, StructureSerializer);
StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
Thrift implementation relies on the Hector client.

CQL
CREATE TABLE person (
  key text,
  path text,
  value text,
  PRIMARY KEY (key, path)
)
• Cannot use the path itself as a column name because it is “dynamic”
• Dynamic column family

CQL: Data Model Constraints
• Need to do a range (“slice”) query on the path ⇒ path must be a clustering key
• Also, the path must be the first clustering key, since otherwise we would have to provide an equals condition on the preceding clustering keys in a query.
• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work since Cassandra indexes only work with equals conditions:
Bad Request: No indexed columns present in by-columns clause with Equal operator
CREATE TABLE person ( key text, path text, value text, PRIMARY KEY (key, path) )

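The reason a range query matters can be sketched without a cluster: when the entries of one partition are kept sorted by path, everything below a path prefix forms one contiguous range. In the sketch below a `NavigableMap` stands in for the clustering order of a partition; the names are hypothetical and this is not how the library issues its queries.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of a prefix "slice" over path-ordered entries.
public class PathSliceSketch {

    /** Returns all entries whose path starts with the prefix, as one contiguous range. */
    public static Map<String, String> slice(NavigableMap<String, String> partition, String prefix) {
        // Upper bound for ASCII (URL-encoded) paths that start with the prefix.
        String end = prefix + Character.MAX_VALUE;
        return partition.subMap(prefix, true, end, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> partition = new TreeMap<>();
        partition.put("x/address/street/", "Singel 45");
        partition.put("x/name/", "John");
        partition.put("y/name/", "Mary");

        System.out.println(slice(partition, "x/"));      // the whole object stored at x
        System.out.println(slice(partition, "x/name/")); // just one part of it
    }
}
```

An equality-only lookup cannot express this prefix range, which is consistent with the slide's point that the path has to be a clustering key.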
CQL
StructuredDataSupport<K> dao =
    new CqlStructuredDataSupport<K>(String tableName, String partitionKeyColumnName, String pathColumnName, String valueColumnName, Session session);
CQL implementation relies on the DataStax Java driver.

And the rest…

Planned Features
• Sets with simple values: element values stored in path
• DynamicComposites?
• Multiple row reads and writes
• Slice queries on path ranges
Credits and Acknowledgements
• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
• Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
• Image credits:
– Slide “Some Strategies”: image “binary” by noegranado, http://www.flickr.com/photos/43360884@N04/6949896929/