How to Achieve Scale with MongoDB
Sr. Solutions Architect, MongoDB
Jake Angerman
Today’s Webinar Agenda
Achieve Scale:
1. Optimization Tips: Schema Design, Indexes, Monitoring Your Workload
2. Scale Vertically
3. Horizontal Scaling
Optimization Tips to Scale Your App
Premature Optimization
• "There is no doubt that the grail of efficiency leads to
abuse. Programmers waste enormous amounts of time
thinking about, or worrying about, the speed of
noncritical parts of their programs, and these attempts
at efficiency actually have a strong negative impact
when debugging and maintenance are considered. We
should forget about small efficiencies, say about 97% of
the time: premature optimization is the root of all
evil. Yet we should not pass up our opportunities in
that critical 3%."
- Donald Knuth, 1974
Schema Design
• Document Model
• Dynamic Schema
• Collections
{
  "customer_id": 123,
  "first_name": "John",
  "last_name": "Smith",
  "address": {
    "street": "123 Main Street",
    "city": "Houston",
    "state": "TX",
    "zip_code": "77027"
  },
  policies: [
    { policy_number: 13, description: "short term", deductible: 500 },
    { policy_number: 14, description: "dental", visits: […] }
  ]
}
The Importance of Schema Design
• MongoDB schemas are designed the opposite way from relational schemas!
• Relational Schema:
  – normalize data
  – write complex queries to join the data
  – let the query planner figure out how to make queries efficient
• MongoDB Schema:
  – denormalize the data
  – create a (potentially complex) schema with prior knowledge of your actual (not just predicted) query patterns
  – write simple queries
Real World Example: Optimizing Schema for Scale
Product catalog schema for retailer selling in 20 countries
{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  de_CH: …,
  <… and so on for other locales …>
}
What's good about this schema?
• Each document contains all the data about the product across all possible locales.
• It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc).
But that's not how the data was accessed
db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );
… and so forth for other locales
The data model did not fit the access pattern.
Why is this inefficient?
Data in RED (the one locale a query requests) are being used. Data in BLUE (every other locale) take up memory but are not in demand.
{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  de_CH: …,
  <… and so on for other locales …>
}
{
  _id: 42,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  de_CH: …,
  <… and so on for other locales …>
}
Consequences of the schema
• Each document contained 20x more data than the common use case requires
• Disk IO was too high for the relatively modest query load on the dataset
• MongoDB lets you request a subset of a document's contents via projection…
• … but the entire document must be loaded into RAM to service the request
Consequences of the schema redesign
{
  _id: "375-en_GB",
  name: …,
  description: …,
  <… the rest of the document …>
}
• Queries induced minimal memory overhead
• 20x as many distinct products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced
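The redesign can be sketched as a small migration step. This is a minimal illustration, not the retailer's actual code: the helper name splitByLocale and the sample field values are assumptions, and only the "<product id>-<locale>" _id format comes from the slide.

```javascript
// Hypothetical helper: split one all-locales product document into one
// document per locale, keyed "<productId>-<locale>" like "375-en_GB".
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Sample data invented for illustration.
const product = {
  _id: 375,
  en_US: { name: "Widget", description: "A widget" },
  fr_FR: { name: "Bidule", description: "Un bidule" },
};

const perLocaleDocs = splitByLocale(product);
console.log(perLocaleDocs.map((doc) => doc._id));
```

Each resulting document could then be inserted on its own, so a read for one locale becomes a lookup by _id and only that locale's data has to be in RAM.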
Schema Design Patterns
• Pattern: pre-computing interesting quantities, ideally
with each write operation
• Pattern: putting unrelated items in different collections
to take advantage of indexing
• Anti-pattern: appending to arrays ad infinitum
• Anti-pattern: importing relational schemas directly into
MongoDB
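As a rough sketch of the first pattern (pre-computing with each write), the snippet below builds an update document that maintains running totals. The field names review_count and rating_sum and both helper names are assumptions made up for illustration; $inc is the standard MongoDB update operator.

```javascript
// Hypothetical field names: review_count and rating_sum are assumptions.
// Build the update document to apply with each new review, so the totals
// are maintained at write time instead of being recomputed on every read.
function reviewCounterUpdate(rating) {
  return { $inc: { review_count: 1, rating_sum: rating } };
}

// Reading the pre-computed values back is then a cheap division.
function averageRating(product) {
  return product.review_count > 0
    ? product.rating_sum / product.review_count
    : null;
}

const update = reviewCounterUpdate(4);
const avg = averageRating({ review_count: 4, rating_sum: 18 });
console.log(JSON.stringify(update), avg);
```

In the shell, the update document would be passed to something like db.products.update({ _id: 375 }, reviewCounterUpdate(4)), where the products collection is again hypothetical. The reviews themselves would live in their own collection rather than in an ever-growing array.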
Schema Design Tips
• Avoid inherently slow operations
  – Updates of unindexed arrays of several thousand elements
  – Updates of indexed arrays of several hundred elements
  – Document moves
• Arrays are great, but know how to use them
Schema Design resources
• Blog series, "6 rules of thumb"
  – Part 1: http://goo.gl/TFJ3dr
  – Part 2: http://goo.gl/qTdGhP
  – Part 3: http://goo.gl/JFO1pI
Indexing
• Indexes are tree-structured sets of references to your
documents
• Indexes are the single biggest tunable performance
factor in the database
• Indexing and schema design go hand in hand
Indexing Mistakes
• Failing to build necessary indexes
• Building unnecessary indexes
• Running ad-hoc queries in production
Indexing Fixes
• Failing to build necessary indexes
  – Run .explain(), examine the slow query log, mtools, and the system.profile collection
• Building unnecessary indexes
  – Talk to your application developers about usage
• Running ad-hoc queries in production
  – Use a staging environment, use secondaries
mongod log files
Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms
Fields in the log line, left to right:
• date and time: Sun Jun 29 06:35:37.646
• thread: [conn2]
• operation: query
• namespace: test.docs
• n… counters: ntoreturn, ntoskip, nscanned, nreturned
• number of yields: numYields: 5
• lock times: locks(micros) r:2145254
• duration: 1156ms
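To make the log format concrete, here is a minimal parser sketch for the sample line. The function name parseSlowQueryLine and its regular expressions are assumptions fitted to this one line, not a general parser for mongod logs, which is the job of a tool like mtools.

```javascript
// Hypothetical helper: extract a few fields from a mongod query log line
// shaped like the sample in the slides. Fields that are absent come back null.
function parseSlowQueryLine(line) {
  const counter = (name) => {
    const m = line.match(new RegExp(name + ":\\s*(\\d+)"));
    return m ? Number(m[1]) : null;
  };
  const opAndNs = line.match(/\] (\w+) (\S+) /);
  const duration = line.match(/(\d+)ms\s*$/);
  return {
    operation: opAndNs ? opAndNs[1] : null,
    namespace: opAndNs ? opAndNs[2] : null,
    nscanned: counter("nscanned"),
    nreturned: counter("nreturned"),
    numYields: counter("numYields"),
    durationMs: duration ? Number(duration[1]) : null,
  };
}

const sampleLine =
  'Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ' +
  "ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms";

const parsed = parseSlowQueryLine(sampleLine);
console.log(parsed);
```

The pair worth watching is nscanned against nreturned: scanning 806381 entries to return 0 documents suggests that no useful index supported the query.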
You need a tool when doing log file analysis
mtools
• http://github.com/rueckstiess/mtools
• log file analysis for poorly performing queries
  – Show me queries that took more than 1000 ms from 6 am to 6 pm:
  – mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
Graphing with mtools
% mplotqueries --type histogram --group namespace --bucketSize 3600
Real World Example: Indexing for Scale
Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms
Document schema
{
_id: ObjectId("53b9ab7e939f1e229b4f574c"),
firstName: "Alice",
lastName: "Smith",
parent: {
company: 22794,
employeeId: 83881
}
}
But there's an index!?!
db.system.indexes.find().toArray()
[{
"v" : 1,
"key" : {
"company" : 1,
"employeeId" : 1
},
"ns" : "test.docs",
"name" : "company_1_employeeId_1"
}]
This isn't the index you're looking for.
Did you see the problem?
{
_id: ObjectId("53b9ab7e939f1e229b4f574c"),
firstName: "Alice",
lastName: "Smith",
parent: {
company: 22794,
employeeId: 83881
}
}
The index was created incorrectly
db.system.indexes.find().toArray()
[{
"v" : 1,
"key" : {
"parent.company" : 1,
"parent.employeeId" : 1
},
"ns" : "test.docs",
"name" : "parent.company_1_parent.employeeId_1"
}]
Subdocument needed: the index keys must use the same dotted paths as the query (parent.company, parent.employeeId).
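Why the original index could not help can be sketched with a deliberately simplified model. The function usableIndexPrefix is an assumption made up for illustration and is not how the real query planner works; it only counts how many leading index fields also appear in the query.

```javascript
// Simplified model (an assumption, not the real query planner): count how many
// leading fields of a compound index key also appear as fields of the query.
function usableIndexPrefix(indexKey, query) {
  let count = 0;
  for (const field of Object.keys(indexKey)) {
    if (!(field in query)) break;
    count += 1;
  }
  return count;
}

const query = { "parent.company": "22794", "parent.employeeId": "83881" };
const wrongIndex = { company: 1, employeeId: 1 };
const fixedIndex = { "parent.company": 1, "parent.employeeId": 1 };

const wrongPrefix = usableIndexPrefix(wrongIndex, query);
const fixedPrefix = usableIndexPrefix(fixedIndex, query);
console.log(wrongPrefix, fixedPrefix);
```

Under this model the index on company and employeeId matches none of the query's fields, while the index on the dotted paths matches both.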
Indexing Strategies
• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with a compound index, if possible
  – db.collection.ensureIndex({A:1, B:1, C:1})
  – allows queries using leftmost prefix
• Order compound index fields thusly: equality, sort, then range
  – see http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
• Create indexes that support covered queries
• Prevent collection scans in pre-production environments
  – mongod --notablescan
  – db.getSiblingDB("admin").runCommand( { setParameter: 1, notablescan: 1 } )
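The equality, sort, range ordering can be sketched as a small key builder. The helper name compoundIndexKey, the example field names (status, created, qty), and the choice of ascending order for every field are assumptions for illustration only.

```javascript
// Hypothetical helper: assemble a compound index key with fields ordered
// equality first, then sort, then range. All fields ascending (1) for simplicity.
function compoundIndexKey({ equality = [], sort = [], range = [] }) {
  const key = {};
  for (const field of [...equality, ...sort, ...range]) {
    if (!(field in key)) key[field] = 1; // keep the first position of a repeated field
  }
  return key;
}

// Imagined query: find({ status: "A", qty: { $gt: 10 } }).sort({ created: 1 })
const indexKey = compoundIndexKey({
  equality: ["status"],
  sort: ["created"],
  range: ["qty"],
});
console.log(Object.keys(indexKey).join(", "));
```

The resulting key object could be passed to db.collection.ensureIndex(...); field order matters because JavaScript preserves insertion order for these string keys.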
Monitoring Your Workload
• Log files, iostat, mtools, mongotop are for debugging
• MongoDB Management Service (MMS) can do
metrics collection and reporting
What can MMS do?
Database Metrics
Hardware statistics (CPU, disk)
MMS Monitoring Setup
Cloud Version of MMS
1. Go to http://mms.mongodb.com
2. Create an account
3. Install one agent in your datacenter
4. Add hosts from the web interface
5. Enjoy!
Today’s Webinar Agenda
Achieve Scale:
1. Optimization Tips
2. Scale Vertically: Hardware Considerations
3. Horizontal Scaling
Vertical Scaling
Factors:
– RAM
– Disk
– CPU
– Network
[Diagram: two replica sets, each with one Primary and two Secondaries]
Horizontal Scaling
Working Set Exceeds Physical Memory
RAM - Measure your working set and index sizes
• db.serverStatus({workingSet:1}).workingSet
  { "computationTimeMicros": 2751, "note": "thisIsAnEstimate", "overSeconds": 1084, "pagesInMemory": 2041 }
• db.stats().indexSize
  2032880640
• In this example, (2041 * 4096) + 2032880640 = 2041240576 bytes = 1.9 GB
• Note: this is a subset of the virtual memory used by
mongod
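The arithmetic above can be reproduced in a few lines. The function and constant names are assumptions for illustration; the 4096-byte page size and both input numbers are the ones used in the slide.

```javascript
// Page size in bytes, as used in the slide's arithmetic.
const PAGE_SIZE_BYTES = 4096;

// Hypothetical helper: working set pages converted to bytes, plus index size.
function workingSetPlusIndexBytes(pagesInMemory, indexSizeBytes) {
  return pagesInMemory * PAGE_SIZE_BYTES + indexSizeBytes;
}

const totalBytes = workingSetPlusIndexBytes(2041, 2032880640);
const totalGiB = totalBytes / 2 ** 30;
console.log(totalBytes, totalGiB.toFixed(1));
```

Comparing this figure with the physical RAM on the host gives a first indication of whether the working set still fits in memory.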
Real World Example: Vertical Scaling
• System that tracked status information for entities in
the business
• State changes happen in batches; sometimes 10% of
entities get updated, sometimes 100% get updated
Initial Architecture
Sharded cluster with 4 shards using spinning disks
[Diagram: Application / mongos, mongod]
Adding shards to scale horizontally
• Application was a success! Business entities grew by a
factor of 5
• Cluster capacity multiplied by 5, but so did the TCO
[Diagram: Application / mongos, mongod, …16 more shards…]
More success means more shards
• 10x growth means … 200 shards
• Horizontal scaling with sharding is linear scaling, but
an order of magnitude was needed
• Bulk updates of random documents are limited by the speed of the disks
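The shard counts in this example follow from simple linear scaling, sketched below. The helper name shardsNeeded is an assumption, and the model assumes each shard keeps the same capacity as the data grows.

```javascript
// Hypothetical helper: under linear scaling, the shard count grows by the
// same factor as the data, assuming constant capacity per shard.
function shardsNeeded(currentShards, growthFactor) {
  return Math.ceil(currentShards * growthFactor);
}

const afterFirstGrowth = shardsNeeded(4, 5); // 4 initial shards, 5x growth
const afterTenX = shardsNeeded(afterFirstGrowth, 10); // a further 10x growth
console.log(afterFirstGrowth, afterTenX);
```

Cost grows by the same factor as the shard count, which is why an order-of-magnitude improvement had to come from somewhere other than adding shards.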
Final architecture
• Scaling the random IOPS with SSDs was a vertical
scaling approach
[Diagram: Application / mongos, mongod on SSD]
Before you add hardware…
• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
  – schema and index problems can look like hardware problems
• Tune the Operating System
  – ulimits, swap, NUMA, NOOP scheduler with hypervisors
• Tune the IO subsystem
  – ext4 or XFS vs SAN, RAID10, readahead, noatime
• See MongoDB "production notes" page
• Heed logfile startup warnings
Today’s Webinar Agenda
Achieve Scale:
1. Optimization Tips
2. Scale Vertically
3. Horizontal Scaling: The Basics of Sharding
The Basics of Horizontal Scaling (aka Sharding)
Rule of Thumb
To make good decisions about MongoDB implementations, you must understand MongoDB, your
applications, the workload your applications generate, and your business requirements.
Summary
• Don't throw hardware at the problem until you
examine all other possibilities (schema, indexes, OS,
IO subsystem)
• Know what is considered "normal" performance by
monitoring
• Horizontal scaling in MongoDB is implemented with
sharding, but you must understand schema design
and indexing before you shard
Sharding a sub-optimally designed database will not
make it performant
Today’s Webinar Agenda
Achieve Scale:
1. Optimization Tips: Schema Design, Indexes, Monitoring Your Workload
2. Scale Vertically
3. Horizontal Scaling: The Basics of Sharding
Limited Time: Get Expert Advice for Free
If you’re thinking about scaling, why reinvent the wheel?
Our experts can collaborate with you to provide detailed guidance.
Sign Up For a Free One Hour Consult:
http://bit.ly/1rkXcfN
Questions?
Stay tuned after the webinar and take our survey for your chance to win MongoDB schwag.
Thank You