Entity Relationships in a Document Database at CouchConf Boston
-
Upload
bradley-holt -
Category
Technology
-
view
2.051 -
download
1
description
Transcript of Entity Relationships in a Document Database at CouchConf Boston
Entity Relationships in a Document Database
MapReduce Views for SQL Users
Entity:An object de!ned by its identity and a thread of continuity[1]
1. "Entity" Domain-driven Design Community <http://domaindrivendesign.org/node/109>.
Entity RelationshipModel
Join vs. Collation
SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title`FROM `publisher`FULL OUTER JOIN `book` ON `publisher`.`id` = `book`.`publisher_id`ORDER BY `publisher`.`id`, `book`.`title`;
SQL Query Joining Publishers and Books
Joined Result Set
publisher.id publisher.name book.title
oreilly O'Reilly MediaBuilding iPhone Apps with HTML, CSS, and JavaScript
oreilly O'Reilly MediaCouchDB: The Definitive
Guide
oreilly O'Reilly MediaDocBook: The Definitive
Guide
oreilly O'Reilly Media RESTful Web Services
Joined Result Set
publisher.id publisher.name book.title
oreilly O'Reilly MediaBuilding iPhone Apps with HTML, CSS, and JavaScript
oreilly O'Reilly MediaCouchDB: The Definitive
Guide
oreilly O'Reilly MediaDocBook: The Definitive
Guide
oreilly O'Reilly Media RESTful Web Services
Publisher (“left”)
Joined Result Set
publisher.id publisher.name book.title
oreilly O'Reilly MediaBuilding iPhone Apps with HTML, CSS, and JavaScript
oreilly O'Reilly MediaCouchDB: The Definitive
Guide
oreilly O'Reilly MediaDocBook: The Definitive
Guide
oreilly O'Reilly Media RESTful Web Services
Publisher (“left”) Book “right”
Collated Result Set
key id value
["oreilly",0] "oreilly" "O'Reilly Media"
["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1] "oreilly" "CouchDB: The Definitive Guide"
["oreilly",1] "oreilly" "DocBook: The Definitive Guide"
["oreilly",1] "oreilly" "RESTful Web Services"
Collated Result Set
key id value
["oreilly",0] "oreilly" "O'Reilly Media"
["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1] "oreilly" "CouchDB: The Definitive Guide"
["oreilly",1] "oreilly" "DocBook: The Definitive Guide"
["oreilly",1] "oreilly" "RESTful Web Services"
Publisher
Collated Result Set
key id value
["oreilly",0] "oreilly" "O'Reilly Media"
["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1] "oreilly" "CouchDB: The Definitive Guide"
["oreilly",1] "oreilly" "DocBook: The Definitive Guide"
["oreilly",1] "oreilly" "RESTful Web Services"
Publisher
Books
View Result SetsMade up of columns and rows
Every row has the same three columns:• key• id• value
Columns can contain a mixture of logical data types
One to Many Relationships
Embedded Entities: Nest related entities within a document
A single document represents the “one” entity
Nested entities (JSON Array) represents the “many” entities
Simplest way to create a one to many relationship
Embedded Entities
Example: Publisher with Nested Books{ "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media", "books":[ { "title":"CouchDB: The Definitive Guide" }, { "title":"RESTful Web Services" }, { "title":"DocBook: The Definitive Guide" }, { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" } ]}
function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], doc.books[i].title); } }}
Map Function
Result Setkey id value
["oreilly",0] "oreilly" "O'Reilly Media"
["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1] "oreilly" "CouchDB: The Definitive Guide"
["oreilly",1] "oreilly" "DocBook: The Definitive Guide"
["oreilly",1] "oreilly" "RESTful Web Services"
Only works if there aren’t a large number of related entities:• Too many nested entities can result in very large documents• Slow to transfer between client and server• Unwieldy to modify• Time-consuming to index
Limitations
Related Documents: Reference an entity by its identi!er
A document representing the “one” entity
Separate documents for each “many” entity
Each “many” entity references its related “one” entity by the “one” entity’s document identi!er
Makes for smaller documents
Reduces the probability of document update con"icts
Related Documents
Example: Publisher
{ "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media"}
Example: Related Book
{ "_id":"9780596155896", "collection":"book", "title":"CouchDB: The Definitive Guide", "publisher":"oreilly"}
Map Function
function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { emit([doc.publisher, 1], doc.title); }}
Result Set
key id value
["oreilly",0] "oreilly" "O'Reilly Media"
["oreilly",1] "9780596155896" "CouchDB: The Definitive Guide"
["oreilly",1] "9780596529260" "RESTful Web Services"
["oreilly",1] "9780596805791" "Building iPhone Apps with HTML, CSS, and JavaScript"
["oreilly",1] "9781565925809" "DocBook: The Definitive Guide"
When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query
Only works for one to many relationships
Limitations
Many to Many Relationships
List of Keys: Reference entities by their identi!ers
A document representing each “many” entity on the “left” side of the relationship
Separate documents for each “many” entity on the “right” side of the relationship
Each “many” entity on the “right” side of the relationship maintains a list of document identi!ers for its related “many” entities on the “left” side of the relationship
List of Keys
Books and Related Authors
Example: Book
{ "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide"}
Example: Book
{ "_id":"9781565920514", "collection":"book", "title":"Making TeX Work"}
Example: Book
{ "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide"}
Example: Author
{ "_id":"muellner", "collection":"author", "name":"Leonard Muellner", "books":[ "9781565925809" ]}
Example: Author
{ "_id":"walsh", "collection":"author", "name":"Norman Walsh", "books":[ "9780596805029", "9781565925809", "9781565920514" ]}
Map Function
function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("author" == doc.collection) { for (var i in doc.books) { emit([doc.books[i], 1], doc.name); } }}
Result Setkey id value
["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
["9780596805029",1] "walsh" "Norman Walsh"
["9781565920514",0] "9781565920514" "Making TeX Work"
["9781565920514",1] "walsh" "Norman Walsh"
["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
["9781565925809",1] "muellner" "Leonard Muellner"
["9781565925809",1] "walsh" "Norman Walsh"
Authors and Related Books
function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], {"_id":doc.books[i]}); } }}
Map Function
Result Setkey id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "muellner" {"_id":"9781565925809"}
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "walsh" {"_id":"9780596805029"}
["walsh",1] "walsh" {"_id":"9781565920514"}
["walsh",1] "walsh" {"_id":"9781565925809"}
Including Docs include_docs=true
key id value doc (truncated)
["muellner",0] "muellner" … {"name":"Leonard Muellner"}
["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"}
["walsh",0] "walsh" … {"name":"Norman Walsh"}
["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"}
["walsh",1] "walsh" … {"title":"Making TeX Work"}
["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}
Or, we can reverse the references…
Example: Author
{ "_id":"muellner", "collection":"author", "name":"Leonard Muellner"}
Example: Author
{ "_id":"walsh", "collection":"author", "name":"Norman Walsh"}
Example: Book
{ "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide", "authors":[ "walsh" ]}
Example: Book
{ "_id":"9781565920514", "collection":"book", "title":"Making TeX Work", "authors":[ "walsh" ]}
Example: Book
{ "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide", "authors":[ "muellner", "walsh" ]}
Map Function
function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { for (var i in doc.authors) { emit([doc.authors[i], 1], doc.title); } }}
Result Setkey id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "9781565925809" "DocBook: The Definitive Guide"
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide"
["walsh",1] "9781565920514" "Making TeX Work"
["walsh",1] "9781565925809" "DocBook: The Definitive Guide"
Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs)
A document representing an entity with lots of relationships could become quite large
Limitations
Relationship Documents: Create a document to represent each individual relationship
A document representing each “many” entity on the “left” side of the relationship
Separate documents for each “many” entity on the “right” side of the relationship
Neither the “left” nor “right” side of the relationship contain any direct references to each other
For each distinct relationship, a separate document includes the document identi!ers for both the “left” and “right” sides of the relationship
Relationship Documents
Example: Book
{ "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide"}
Example: Book
{ "_id":"9781565920514", "collection":"book", "title":"Making TeX Work"}
Example: Book
{ "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide"}
Example: Author
{ "_id":"muellner", "collection":"author", "name":"Leonard Muellner"}
Example: Author
{ "_id":"walsh", "collection":"author", "name":"Norman Walsh"}
Example: Relationship Document{ "_id":"44005f2c", "collection":"book-author", "book":"9780596805029", "author":"walsh"}
Example: Relationship Document{ "_id":"44005f72", "collection":"book-author", "book":"9781565920514", "author":"walsh"}
Example: Relationship Document{ "_id":"44006720", "collection":"book-author", "book":"9781565925809", "author":"muellner"}
Example: Relationship Document{ "_id":"44006b0d", "collection":"book-author", "book":"9781565925809", "author":"walsh"}
Books and Related Authors
function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("book-author" == doc.collection) { emit([doc.book, 1], {"_id":doc.author}); }}
Map Function
Result Setkey id value
["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"
["9780596805029",1] "44005f2c" {"_id":"walsh"}
["9781565920514",0] "9781565920514" "Making TeX Work"
["9781565920514",1] "44005f72" {"_id":"walsh"}
["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"
["9781565925809",1] "44006720" {"_id":"muellner"}
["9781565925809",1] "44006b0d" {"_id":"walsh"}
Including Docs include_docs=true
key id value doc (truncated)
["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"}
["9780596805029",1] … … {"name":"Norman Walsh"}
["9781565920514",0] … … {"title":"Making TeX Work"}
["9781565920514",1] … … {"author","name":"Norman Walsh"}
["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"}
["9781565925809",1] … … {"name":"Leonard Muellner"}
["9781565925809",1] … … {"name":"Norman Walsh"}
Authors and Related Books
function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book-author" == doc.collection) { emit([doc.author, 1], {"_id":doc.book}); }}
Map Function
Result Setkey id value
["muellner",0] "muellner" "Leonard Muellner"
["muellner",1] "44006720" {"_id":"9781565925809"}
["walsh",0] "walsh" "Norman Walsh"
["walsh",1] "44005f2c" {"_id":"9780596805029"}
["walsh",1] "44005f72" {"_id":"9781565920514"}
["walsh",1] "44006b0d" {"_id":"9781565925809"}
Including Docs include_docs=true
key id value doc (truncated)
["muellner",0] … … {"name":"Leonard Muellner"}
["muellner",1] … … {"title":"DocBook: The Definitive Guide"}
["walsh",0] … … {"name":"Norman Walsh"}
["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"}
["walsh",1] … … {"title":"Making TeX Work"}
["walsh",1] … … {"title":"DocBook: The Definitive Guide"}
Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs)
Maintaining relationship documents may require more work
Limitations
Final Thoughts
Document databases have no tables (and therefore no columns)
Indexes (views) are queried directly, instead of being used to optimize more generalized queries
Result set columns can contain a mix of logical data types
No built-in concept of relationships between documents
Related entities can be embedded in a document, referenced from a document, or both
Document Databases Compared to Relational Databases
CaveatsNo referential integrity
No atomic transactions across document boundaries
Some patterns may involve denormalized (i.e. redundant) data
Data inconsistencies are inevitable (i.e. eventual consistency)
Consider the implications of replication—what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)
Additional TechniquesUse the startkey and endkey parameters to retrieve one entity and its related entities:startkey=["9781565925809"]&endkey=["9781565925809",{}]
De!ne a reduce function and use grouping levels
Use UUIDs rather than natural keys for better performance
Use the bulk document API when writing Relationship Documents
When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results
Cheat Sheet
One to Many
Many to Many
<= N* Relations
> N* Relations
Embedded Entities
Related Documents
List of Keys Relationship Documents
✓ ✓
✓ ✓
✓ ✓
✓ ✓
* where N is a large number for your system
http://oreilly.com/catalog/9781449303129/ http://oreilly.com/catalog/9781449303433/
Thank You@BradleyHolt
http://[email protected]
Copyright © 2011-2012 Bradley Holt. All rights reserved.