Engineer
Bryan Reinero
@blimpyacht
Relational to MongoDB
Unhelpful Terms
• NoSQL
• Big Data
• Distributed
What’s the data model?
MongoDB
• Non-relational
• Scalable
• Highly available
• Full featured
• Document database
Terminology
RDBMS ➜ MongoDB
Table, View ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
Partition ➜ Shard
Sample Document
{
  maker : "M.V. Agusta", type : "sportbike", rake : 7, trail : 3.93,
  engine : {
    type : "internal combustion", layout : "inline", cylinders : 4, displacement : 750
  },
  transmission : {
    type : "cassette", speeds : 6, pattern : "sequential",
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}
Relational DBs
• Attribute columns are valid for every row
• Duplicate rows are not allowed
• Every column has the same type and same meaning
As a document store, MongoDB supports a flexible schema
1st Normal Form: No repeating groups
• Can't use equality to match elements
• Must use regular expressions to find data
• Aggregate functions are difficult
• Updating a specific element is difficult

Product_id | Maker   | Name   | Categories
1234       | Nokia   | Lumia  | "electronics,hand held, smart phone"
5678       | Apple   | iPad   | "PDA,tablet"
91011      | Samsung | Galaxy | "smart phone,tablet"
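To make the 1NF problems concrete, here is a small plain-JavaScript sketch (runnable with Node.js, no database involved) over rows shaped like the product table above. The row objects and the helper `countCategories` are hypothetical illustrations, not part of any SQL or MongoDB API; the point is only that an element packed into a delimited string cannot be matched by equality and has to be parsed before it can be counted.

```javascript
// Hypothetical rows mirroring the product table: each product's categories
// are packed into one comma-separated string (a repeating group).
const rows = [
  { maker: "Nokia", name: "Lumia", categories: "electronics,hand held, smart phone" },
  { maker: "Apple", name: "iPad", categories: "PDA,tablet" },
  { maker: "Samsung", name: "Galaxy", categories: "smart phone,tablet" },
];

// Equality compares the whole string, so it cannot match a single element.
const byEquality = rows.filter((r) => r.categories === "tablet");

// A regular expression has to locate "tablet" between delimiters instead.
const tabletPattern = /(^|,)\s*tablet\s*(,|$)/;
const byRegex = rows.filter((r) => tabletPattern.test(r.categories));

// Aggregating means splitting and trimming every string before counting.
function countCategories(data) {
  const counts = new Map();
  for (const row of data) {
    for (const raw of row.categories.split(",")) {
      const category = raw.trim();
      counts.set(category, (counts.get(category) || 0) + 1);
    }
  }
  return counts;
}

const counts = countCategories(rows);
```

Updating one element (say, renaming "tablet") would likewise mean rewriting the whole string in every affected row.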
The Tao of MongoDB
{ _id : ObjectId(), maker : "Nokia", name : "Lumia", categories : [
    "electronics", "handheld", "smart phone"
] }
// querying is easy
db.products.find( { "categories" : "handheld" } );
// can be indexed
db.products.createIndex( { "categories" : 1 } );
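Two ideas sit behind the query and index above: an equality query on an array field matches a document when any element equals the value, and an index on an array field is multikey, holding one entry per element. The plain-JavaScript sketch below models both with an in-memory `Map`. The names `matchesArrayField` and `buildMultikeyIndex` are hypothetical, the model is simplified (it ignores matching against the whole array), and a real MongoDB index is a B-tree rather than a hash map.

```javascript
// Hypothetical in-memory documents shaped like the product document above.
const products = [
  { _id: 1, maker: "Nokia", name: "Lumia", categories: ["electronics", "handheld", "smart phone"] },
  { _id: 2, maker: "Apple", name: "iPad", categories: ["PDA", "tablet"] },
  { _id: 3, maker: "Samsung", name: "Galaxy", categories: ["smart phone", "tablet"] },
];

// Equality against an array field matches if any element equals the value;
// a scalar field is compared directly.
function matchesArrayField(doc, field, value) {
  const v = doc[field];
  return Array.isArray(v) ? v.includes(value) : v === value;
}

// Multikey index model: one entry per distinct array element, each pointing
// back to the _id of the document that contains it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of new Set(values)) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

// Collection scan versus index lookup for the same predicate.
const scanned = products.filter((p) => matchesArrayField(p, "categories", "smart phone"));
const categoryIndex = buildMultikeyIndex(products, "categories");
const viaIndex = categoryIndex.get("smart phone") || [];
```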
// updates are easy
db.products.update(
    { "categories" : "electronics" },
    { $set : { "categories.$" : "consumer electronics" } }
);
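The `$` in `"categories.$"` stands for the position of the first array element that matched the query. Below is a minimal plain-JavaScript model of that rule; `positionalSet` is a hypothetical helper, not a MongoDB function. It applies the rule to every matching document, as `{ multi : true }` or `updateMany` would; the shell `update()` call above, without that option, modifies only the first matching document.

```javascript
// Hypothetical documents; the third one contains "electronics" at index 1.
const products = [
  { _id: 1, categories: ["electronics", "handheld", "smart phone"] },
  { _id: 2, categories: ["PDA", "tablet"] },
  { _id: 3, categories: ["tablet", "electronics"] },
];

// Models { field: match } with { $set: { "field.$": replacement } }:
// in each document whose array contains `match`, only the FIRST matching
// element is replaced; documents without a match are left untouched.
function positionalSet(docs, field, match, replacement) {
  let modified = 0;
  for (const doc of docs) {
    const arr = doc[field];
    if (!Array.isArray(arr)) continue;
    const pos = arr.indexOf(match);
    if (pos === -1) continue;
    arr[pos] = replacement;
    modified += 1;
  }
  return modified;
}

const modifiedCount = positionalSet(products, "categories", "electronics", "consumer electronics");
```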
db.products.aggregate( [
    { $unwind : "$categories" },
    { $group : { "_id" : "$categories", "counts" : { "$sum" : 1 } } }
] );
Unwind the array
Tally the occurrences
// output
"result" : [
    { "_id" : "smart phone", "counts" : 1589 },
    { "_id" : "handheld", "counts" : 2403 },
    { "_id" : "electronics", "counts" : 4767 }
]
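The two pipeline stages can be modeled in plain JavaScript to show what each one does to the data. This is a sketch over hypothetical sample documents, with hypothetical helper names `unwind` and `groupCount`; it is not how the server executes a pipeline. MongoDB does not guarantee the order of `$group` output, so the order produced here simply follows first appearance.

```javascript
// Hypothetical sample documents; only the categories field matters here.
const products = [
  { name: "Lumia", categories: ["electronics", "handheld", "smart phone"] },
  { name: "iPad", categories: ["electronics", "tablet"] },
  { name: "Galaxy", categories: ["electronics", "handheld", "smart phone", "tablet"] },
  { name: "Accessory" }, // no categories field, so the unwind step drops it
];

// $unwind model: emit one copy of the document per array element.
// In this sketch a missing, null or empty array yields no output and a
// scalar value is treated as a one-element array.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    const items = Array.isArray(value) ? value : value == null ? [] : [value];
    return items.map((item) => ({ ...doc, [field]: item }));
  });
}

// $group model for { _id: "$field", counts: { $sum: 1 } }:
// add 1 to the group keyed by the field's value for every document.
function groupCount(docs, field) {
  const tally = new Map();
  for (const doc of docs) {
    tally.set(doc[field], (tally.get(doc[field]) || 0) + 1);
  }
  return Array.from(tally, ([_id, counts]) => ({ _id, counts }));
}

const unwound = unwind(products, "categories");
const result = groupCount(unwound, "categories");
```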
Meh, big deal… Right?
Aren’t nested structures just a pre-joined schema?
• I could use an adjacency list
• I could use an intersection table
Goals of Normalization
• Model data in an understandable form
• Reduce fact redundancy and data inconsistency
• Enforce integrity constraints
Performance is not a primary goal
Normalize or Denormalize
It is commonly held that denormalization is faster
• Normalization can be fast, right? It requires proper indexing, and indexing affects write performance
• Does denormalization commit me to a join strategy? Indexing overhead is a commitment too
• Does denormalization improve a finite set of queries at the cost of several others? MongoDB works best in service to an application
Object–Relational Impedance Mismatch
• Inheritance hierarchies
• Polymorphic associations
Table Per Subclass
Vehicles: vin, registration, maker
  Motorcycle: engine, rake, trail
    Racebike: racing number, class, team, rider
Table Per Subclass
Vehicles
- electric
  - car
  - bus
  - motorcycle
- internal combustion
  - motorcycle
  - aircraft
- human powered
  - bicycle
  - skateboard
- horse drawn
Table Per Concrete Class
• Each class is mapped to a separate table
• Inherited fields are present in each class's table
• Can't support polymorphic relationships

SELECT maker FROM Motorcycles WHERE Motorcycles.country = 'Italy'
UNION
SELECT maker FROM Automobiles WHERE Automobiles.country = 'Italy';
Table Per Class Family
• Classes mapped to a single table
• Discriminator column (Type) to identify class
• Many empty columns, nullability issues

Vehicle_id | Maker       | Type       | Name
1234       | M.V. Agusta | sportbike  | F4
5678       | M.V. Agusta | helicopter | A104
91011      | Triton      | submarine  | Triton 95

A sportbike row still carries every other class's columns:
maker = "M.V. Agusta", type = "sportbike", num_doors = 0, wing_area = 0, maximum_depth = 0
Is 0 a meaningful value here, or should these be NULL?
The Tao of MongoDB
{ maker : "M.V. Agusta",
  type : "sportbike",
  engine : { type : "internal combustion", cylinders : 4, displacement : 750 },
  rake : 7,
  trail : 3.93
}
{ maker : "M.V. Agusta",
  type : "helicopter",
  engine : { type : "turboshaft", layout : "axial", massflow : 1318 },
  Blades : 4,
  undercarriage : "fixed"
}
• Discriminator column
• Shared indexing strategy
• Polymorphic attributes
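Here is a plain-JavaScript sketch of why one collection of polymorphic documents avoids both the UNION of the concrete-class design and the padded columns of the class-family table. The in-memory `vehicles` array is a hypothetical stand-in for a collection; the comments give the shell filters these predicates correspond to.

```javascript
// Hypothetical vehicles collection holding several classes side by side,
// shaped like the two documents above.
const vehicles = [
  {
    maker: "M.V. Agusta", type: "sportbike",
    engine: { type: "internal combustion", cylinders: 4, displacement: 750 },
    rake: 7, trail: 3.93,
  },
  {
    maker: "M.V. Agusta", type: "helicopter",
    engine: { type: "turboshaft", layout: "axial", massflow: 1318 },
    Blades: 4, undercarriage: "fixed",
  },
];

// One predicate on a shared attribute spans every class, with no UNION of
// per-class tables (shell filter: { maker : "M.V. Agusta" }).
const byMaker = vehicles.filter((v) => v.maker === "M.V. Agusta");

// The discriminator field narrows the result to a single class
// (shell filter: { type : "sportbike" }).
const sportbikes = vehicles.filter((v) => v.type === "sportbike");

// A shared nested path is queried the same way across classes
// (shell filter: { "engine.type" : "turboshaft" }).
const turbines = vehicles.filter((v) => v.engine && v.engine.type === "turboshaft");

// Polymorphic attributes: a field exists only on the classes that use it,
// so nothing is padded with 0 or NULL.
const hasRake = vehicles.map((v) => "rake" in v);
```

Because `maker`, `type` and `engine.type` live at the same paths in every document, one index on any of them serves all classes at once.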
Relaxed ACID
• Atomic operations at the document level
• Consistency – strong / eventual (replication)
• Isolation – read and write locks per logical database
• Durability – write-ahead journal, replication
The Tao of MongoDB
• Document database
• Flexible schema
• Relaxed ACID
This favors denormalization. What’s the consequence?
Scaling MongoDB
Client Application ➜ MongoDB
• Single Instance or Replica Set
• Sharded cluster
Partitioning
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
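Treating the shard key space as a line, each chunk is a half-open segment [min, max) and every key falls in exactly one of them. The sketch below finds a key's chunk by binary search over sorted boundaries. The boundary values and shard names are hypothetical; MongoDB keeps real chunk metadata on the config servers, and `-Infinity`/`Infinity` here only stand in for its MinKey/MaxKey bounds.

```javascript
// Hypothetical chunks over a numeric vehicle-id shard key, sorted by min.
// Together they cover the whole line with no gaps or overlaps.
const chunks = [
  { min: -Infinity, max: 2000, shard: "shard1" },
  { min: 2000, max: 3000, shard: "shard2" },
  { min: 3000, max: 5000, shard: "shard3" },
  { min: 5000, max: Infinity, shard: "shard4" },
];

// Binary search for the chunk whose [min, max) segment contains `key`.
function findChunk(sortedChunks, key) {
  let lo = 0;
  let hi = sortedChunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = sortedChunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c;
  }
  return null; // only reachable if the chunks leave a gap in the key space
}

// Place a few sample keys on the line.
const placements = [3456, 5678, 1234, 4567, 2345].map((id) => [id, findChunk(chunks, id).shard]);
```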
The Mechanism of Sharding
Complete Data Set
• Define shard key on vehicle id (sample keys: 3456, 5678, 1234, 4567, 2345)
• The key space is split into chunks
• Chunks are distributed across Shard 1, Shard 2, Shard 3, Shard 4
Targeted Operations
Client ➜ mongos ➜ Shard 1 | Shard 2 | Shard 3 | Shard 4
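The difference between a targeted and a broadcast operation can be sketched as a tiny router. This is a hypothetical model in the spirit of mongos, not its actual logic: the `vehicle_id` shard key field, the chunk layout and the shard names are all assumptions. The sketch only distinguishes an equality match on the shard key; a real mongos can also target a subset of shards for range predicates on a range-based shard key.

```javascript
// Hypothetical chunk layout over a numeric "vehicle_id" shard key;
// each chunk covers [min, max).
const chunks = [
  { min: -Infinity, max: 2000, shard: "shard1" },
  { min: 2000, max: 3000, shard: "shard2" },
  { min: 3000, max: 5000, shard: "shard3" },
  { min: 5000, max: Infinity, shard: "shard4" },
];
const SHARD_KEY = "vehicle_id";
const allShards = [...new Set(chunks.map((c) => c.shard))];

// Equality on the shard key goes only to the shard owning that key;
// any other query is sent to every shard (scatter-gather), whose partial
// results the router would then merge.
function route(query) {
  const key = query[SHARD_KEY];
  if (typeof key === "number") {
    const owner = chunks.find((c) => key >= c.min && key < c.max);
    return [owner.shard];
  }
  return allShards;
}

const targeted = route({ vehicle_id: 3456, maker: "M.V. Agusta" });
const broadcast = route({ maker: "M.V. Agusta" });
```

Including the shard key in a query keeps the work on a single shard as the cluster grows. A query without it costs every shard a round trip.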
Data Growth
Shard 1 Shard 2 Shard 3 Shard 4
Load Balancing
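As data grows, new chunks can pile up on one shard, and MongoDB's balancer migrates chunks in the background to even out the load. The loop below is a deliberately simplified, hypothetical model that balances by chunk count with an assumed threshold of one chunk. The real balancer's policy (its thresholds, and in newer versions balancing by data size rather than chunk count) is not modeled here.

```javascript
// Hypothetical chunk counts per shard after data growth has piled new
// chunks onto shard1. THRESHOLD is an assumption, not a MongoDB setting.
const chunkCounts = { shard1: 9, shard2: 3, shard3: 2, shard4: 2 };
const THRESHOLD = 1;

// Repeatedly move one chunk from the most-loaded shard to the least-loaded
// one until their counts differ by at most `threshold`. Mutates `counts`
// and returns the list of [from, to] migrations performed.
function balance(counts, threshold) {
  const moves = [];
  for (;;) {
    const shards = Object.keys(counts);
    let most = shards[0];
    let least = shards[0];
    for (const s of shards) {
      if (counts[s] > counts[most]) most = s;
      if (counts[s] < counts[least]) least = s;
    }
    if (counts[most] - counts[least] <= threshold) return moves;
    counts[most] -= 1;
    counts[least] += 1;
    moves.push([most, least]);
  }
}

const moves = balance(chunkCounts, THRESHOLD);
```

Each migration moves a whole chunk, so the key ranges stay intact and only their owning shard changes.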
Relational if you need to
• Enforce data constraints
• Service a broad set of queries
• Minimize redundancy
The Tao of MongoDB
• Avoid ad-hoc queries
• Model data for use, not storage
• Index effectively, index efficiently
Engineer, 10gen
Bryan Reinero
@blimpyacht
Thank You