Sharding in MongoDB Days 2013
Transcript of Sharding in MongoDB Days 2013
Introduction To Sharding
J. Randall Hunt
Hackathoner, MongoDB
@jrhunt, [email protected]
#MongoDBDays Chicago
In Today's Talk
• What? Why? When?
• How?
• What's happening behind the scenes?
What Is Sharding?
This is a picture of my cat.
This is a picture of ~100 cats.
http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg
This is a cat trying to find a home.
[Diagram: webserver, mongod]
100 cats trying to find a home.
[Diagram: webserver, mongod]
(not to scale)
Scale Up?
Data Store Scalability
• Custom Hardware
• Custom Software
In the past you've had two options for achieving data store scalability: 1) custom hardware (Oracle?) or 2) custom software (Google, Facebook). These solutions were custom because the problems were not yet common enough. The number of people on the internet 10 years ago is incredibly small compared to the number who will be using web services 10 years from now.
Scale Out?
The MongoDB Sharding Solution
• Automatically partition your data
• Worry about failover at the partition layer
• Application independent
• Free and open source
Why Do I Shard?
Input/Output
Your input/output exceeds the capacity of a single node or replica set.
This is not easy to do!
Working Set Exceeds Physical Memory
[Diagram: RAM filling up with data, then indexes, sorts, and aggregations]
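The slide's point can be turned into back-of-the-envelope arithmetic. This is a minimal sketch in plain JavaScript, assuming you already have rough byte estimates for each component the slide names; the numbers and the `workingSetBytes` / `shardsNeeded` helpers are hypothetical, and it assumes a good shard key spreads the working set evenly and that all of each node's RAM is available to it.

```javascript
// Sizes are in bytes; all estimates below are hypothetical.
const GB = 1024 ** 3;

// Sum the components of the working set named on the slide.
function workingSetBytes({ data, indexes, sorts, aggregations }) {
  return data + indexes + sorts + aggregations;
}

// Smallest number of shards whose combined RAM covers the working set,
// assuming the working set divides evenly across shards and all RAM is usable.
function shardsNeeded(workingSet, ramPerShard) {
  return Math.max(1, Math.ceil(workingSet / ramPerShard));
}

const estimate = { data: 90 * GB, indexes: 20 * GB, sorts: 4 * GB, aggregations: 6 * GB };
const ws = workingSetBytes(estimate);
console.log(ws / GB, shardsNeeded(ws, 64 * GB)); // 120 2
```

In practice you would leave headroom rather than plan for 100% of RAM, so treat the result as a lower bound.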
How Does Sharding Work?
MongoDB's Sharding Infrastructure
[Diagram, built up over several slides: an app server talking to a single mongod; then to several mongod shards; then through a mongos router; then with mongod --configsvr config servers added]
Terminology
• Shards
• Chunks
• Config Servers
• mongos
A shard is a server, or a group of servers, that holds chunks of data split up according to a shard key; each shard holds a subset of a collection's data. A chunk is a group of documents falling in a particular range of the shard key, and chunks can be moved logically from server to server. Config servers hold information about where chunks live. mongos is the router and balancer: it communicates with the config servers and figures out how to intelligently direct your query.
What exactly is a shard?
• Shard is a node of the cluster
• Can be a single mongod or an entire replica set
[Diagram: a shard is either a replica set (one primary, two secondaries) or a single mongod]
Now what do shards hold? Chunks, which are partitions of your data that live in certain ranges.
Partitioning
• User defines a shard key or uses hash-based sharding
• Shard key defines a range of data
• The key space is like points on a line
• A range is a segment of that line
[Diagram: the key space as a line from -∞ to +∞]
Remember interval notation?
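In MongoDB each chunk covers a half-open range of shard key values, [min, max): inclusive of its lower bound, exclusive of its upper bound, and together the chunks tile the whole line. A minimal sketch of that idea in plain JavaScript, assuming plain numbers as shard key values and `-Infinity`/`Infinity` standing in for MongoDB's `MinKey`/`MaxKey`; the chunk list and the `coversKeySpace` / `findChunk` helpers are hypothetical.

```javascript
// Chunks sorted by lower bound; each covers [min, max).
const chunks = [
  { min: -Infinity, max: 100 },
  { min: 100, max: 500 },
  { min: 500, max: Infinity },
];

// Check that the chunks tile the key space with no gaps or overlaps.
function coversKeySpace(chunks) {
  if (chunks[0].min !== -Infinity || chunks[chunks.length - 1].max !== Infinity) return false;
  return chunks.every((c, i) => i === 0 || chunks[i - 1].max === c.min);
}

// Binary search for the chunk whose [min, max) contains key.
function findChunk(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (key < chunks[mid].min) hi = mid - 1;
    else if (key >= chunks[mid].max) lo = mid + 1;
    else return chunks[mid];
  }
  return undefined; // only reachable if the chunks do not cover the key space
}

console.log(coversKeySpace(chunks)); // true
console.log(findChunk(chunks, 100)); // { min: 100, max: 500 }
```

The half-open convention is what lets adjacent chunks share a boundary value without ever both claiming the same key.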
Data Distribution
Initially a single chunk
Default Max Chunk Size: 64 MB
MongoDB will automatically split and migrate chunks as they reach the max size
[Diagram: a config server, three mongos routers, Shard 1, and Shard 2 (a mongod)]
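MongoDB picks split points itself; the sketch below only illustrates the principle of cutting an oversized chunk into two roughly equal halves at a shard key value. It is plain JavaScript that assumes each document's size is known, shrinks the 64 MB limit to a toy value so the example stays readable, and uses a hypothetical `splitIfNeeded` helper and document shape. Documents sharing a key value must stay together, which is why a chunk holding only one key value cannot be split (MongoDB calls such chunks "jumbo").

```javascript
// Toy stand-in for the 64 MB default so the example stays small.
const MAX_CHUNK_BYTES = 1000;

// A chunk covers [min, max) and holds documents { key, bytes } sorted by key.
function chunkBytes(chunk) {
  return chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
}

// Split an oversized chunk in two at a shard key value near the byte midpoint.
// Returns [chunk] unchanged if it is small enough or holds a single key value.
function splitIfNeeded(chunk) {
  const total = chunkBytes(chunk);
  if (total <= MAX_CHUNK_BYTES) return [chunk];

  // Walk the sorted documents until about half the bytes are behind us.
  let running = 0;
  let i = 0;
  while (i < chunk.docs.length && running < total / 2) {
    running += chunk.docs[i].bytes;
    i++;
  }
  i = Math.min(i, chunk.docs.length - 1);

  let splitKey = chunk.docs[i].key;
  if (splitKey === chunk.docs[0].key) {
    // Midpoint fell inside the run of the smallest key; use the next distinct key.
    const next = chunk.docs.find((d) => d.key > splitKey);
    if (next === undefined) return [chunk]; // one key value: a "jumbo" chunk
    splitKey = next.key;
  }

  return [
    { min: chunk.min, max: splitKey, docs: chunk.docs.filter((d) => d.key < splitKey) },
    { min: splitKey, max: chunk.max, docs: chunk.docs.filter((d) => d.key >= splitKey) },
  ];
}

const docs = [1, 2, 3, 4, 5, 6].map((k) => ({ key: k, bytes: 300 }));
const [a, b] = splitIfNeeded({ min: -Infinity, max: Infinity, docs });
console.log(a.max, a.docs.length, b.docs.length); // 4 3 3
```

Splitting only changes metadata; moving one of the halves to another shard is a separate migration step.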
Shards and Shard Keys
[Diagram labels: Chunks! Shard Keys!]
What is a config server?
• A config server is for storing shard meta-data
• It stores chunk ranges and locations
• Run with 3 in production!
[Diagram: a single config server, or three config servers]
This is not a replica set; the three servers are purely for failover purposes. Pro tip: use CNAMEs to identify these.
What is a mongos?
• Acts as a router / balancer for queries and ops
• No local data (persists all info to the config servers)
• Can run with just one or many
[Diagram: mongos deployment options, e.g. one app server with several mongos processes, or several app servers sharing one mongos]
MongoDB's Sharding Infrastructure
[Diagram: three config servers, three shards, and three app servers, each with its own mongos]
Get Started With Sharding?
1. Choose a shard key (we'll talk about this later)
2. Start config servers
3. Turn on sharding
4. Profit.
Mechanics of Sharding (Oh hey there, devops!)
Start the Configuration Server
mongod --configsvr
Starts a configuration server on the default port (27019)
Start the mongos router
mongos --configdb catconf.mongodb.com:27019
Start the mongod
mongod --shardsvr
Starts a mongod on the default shard port (27018). The shard is not yet connected to the rest of the cluster. (It could have already been part of the cluster.)
Add the Shard
On mongos:
sh.addShard('cat1.mongodb.com:27018')
For a replica set:
sh.addShard('<rsname>/<seedlist>')
Check that everything is working!
[mongos] admin> db.runCommand({ listshards: 1 })
{
  "shards": [
    { "_id": "shard0000", "host": "cat1.mongodb.com:27018" }
  ],
  "ok": 1
}
Now enable sharding
• Enable sharding on a database: sh.enableSharding("<dbname>")
• Shard a collection (with a key): sh.shardCollection("<dbname>.cats", {"name": 1})
• Use a compound shard key to avoid duplicate shard key values (many cats share a name): sh.shardCollection("<dbname>.cats", {"name": 1, "uniqueid": 1})
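Compound shard key values are ordered field by field, so documents that share a `name` are still told apart, and can still land in different chunks, by `uniqueid`. A minimal sketch of that ordering in plain JavaScript, assuming each field holds values of a single type (all strings or all numbers) and only ascending fields; `compareKeys` is a hypothetical helper, and MongoDB's real comparison additionally orders values across BSON types.

```javascript
// Compare two compound shard key values field by field, in key-pattern order.
function compareKeys(pattern, a, b) {
  for (const field of Object.keys(pattern)) {
    if (a[field] < b[field]) return -1;
    if (a[field] > b[field]) return 1;
  }
  return 0;
}

const pattern = { name: 1, uniqueid: 1 };
const cats = [
  { name: "Tom", uniqueid: 7 },
  { name: "Felix", uniqueid: 3 },
  { name: "Tom", uniqueid: 2 },
];

// With {name: 1} alone, both "Tom" documents would share one key value;
// with the compound key they are distinct and sort in a well-defined order.
const sorted = [...cats].sort((x, y) => compareKeys(pattern, x, y));
console.log(sorted.map((c) => `${c.name}/${c.uniqueid}`).join(" ")); // Felix/3 Tom/2 Tom/7
```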
Tag Aware Sharding
• Total control over the distribution of your data!
• Tag a range of shard keys: sh.addTagRange(<collection>,<min>,<max>,<tag>)
• Tag a shard: sh.addShardTag("shard0000","NYC")
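Conceptually, tag-aware sharding adds one lookup: a shard key value falls into at most one tagged range, and the chunk holding it should live only on shards carrying that tag. A minimal sketch in plain JavaScript, assuming numeric keys, half-open `[min, max)` ranges as with chunks, and that keys outside every tagged range may live on any shard; the shard names, tags, and `eligibleShards` helper are hypothetical. In a real cluster the balancer enforces tags by migrating chunks, not by checking individual documents.

```javascript
// Shard tags, in the spirit of sh.addShardTag("shard0000", "NYC").
const shardTags = {
  shard0000: ["NYC"],
  shard0001: ["NYC"],
  shard0002: ["SFO"],
};

// Tag ranges over the shard key, in the spirit of sh.addTagRange(ns, min, max, tag).
const tagRanges = [
  { min: 0, max: 1000, tag: "NYC" },
  { min: 1000, max: 2000, tag: "SFO" },
];

// Shards allowed to hold the chunk containing `key`.
function eligibleShards(key) {
  const all = Object.keys(shardTags);
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  if (!range) return all; // untagged keys: any shard will do
  return all.filter((s) => shardTags[s].includes(range.tag));
}

console.log(eligibleShards(42));   // [ 'shard0000', 'shard0001' ]
console.log(eligibleShards(1500)); // [ 'shard0002' ]
console.log(eligibleShards(5000)); // [ 'shard0000', 'shard0001', 'shard0002' ]
```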
The Balancer
• Ensures even distribution of chunks across the cluster
• Transparent to driver and application
• Very tunable, but the defaults are often sensible
Try to minimize clock skew with ntpd.
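The balancer's core idea can be pictured as a loop: while the gap between the most- and least-loaded shards is too large, move one chunk from the former to the latter. A minimal sketch in plain JavaScript, assuming chunk counts are all that matter and a fixed threshold of 2; MongoDB's actual migration thresholds, locking, and chunk selection differ, and `balance` is a hypothetical helper.

```javascript
// chunkCounts maps shard name -> number of chunks it holds.
// Returns the final counts and the simulated migrations as [from, to] pairs.
function balance(chunkCounts, threshold = 2) {
  // With a threshold below 2, a one-chunk gap would bounce back and forth forever.
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const counts = { ...chunkCounts };
  const moves = [];
  for (;;) {
    const shards = Object.keys(counts).sort((x, y) => counts[x] - counts[y]);
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    // Simulate migrating a single chunk, then re-evaluate.
    counts[most] -= 1;
    counts[least] += 1;
    moves.push([most, least]);
  }
  return { counts, moves };
}

const result = balance({ shard0000: 8, shard0001: 1, shard0002: 0 });
console.log(result.counts);       // { shard0000: 3, shard0001: 3, shard0002: 3 }
console.log(result.moves.length); // 5
```

Every move with a gap of at least 2 strictly reduces the imbalance, which is why the loop is guaranteed to stop.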
Routing Requests (Oh hi there, application developers!)
Cluster Request Routing
Scatter-gather vs. Targeted
Choose your own adventure!
Targeted Query
[Diagram: mongos in front of three shards]
1. Routable request received
2. Request routed to appropriate shard
3. Shard returns results
4. mongos returns results to client
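A request is routable when mongos can tell from the query alone which chunks, and therefore which shards, could hold matching documents. A minimal sketch in plain JavaScript, assuming a single-field numeric shard key (the field name `catId` is hypothetical), handling only an equality match on that field, and treating everything else as a broadcast; the chunk map and `routeQuery` helper are hypothetical, and real mongos also targets range queries on the shard key.

```javascript
// Chunk map as the config servers would describe it: [min, max) -> shard.
const chunkMap = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 500, shard: "shard0001" },
  { min: 500, max: Infinity, shard: "shard0002" },
];
const SHARD_KEY = "catId"; // hypothetical shard key field

// Returns the shards mongos would contact for a query filter.
function routeQuery(filter) {
  const value = filter[SHARD_KEY];
  if (Number.isFinite(value)) {
    // Equality on the shard key: exactly one chunk can hold a match (targeted).
    const chunk = chunkMap.find((c) => value >= c.min && value < c.max);
    return [chunk.shard];
  }
  // No usable shard key condition: ask every shard (scatter-gather).
  return [...new Set(chunkMap.map((c) => c.shard))];
}

console.log(routeQuery({ catId: 250 }));    // [ 'shard0001' ]
console.log(routeQuery({ name: "Felix" })); // [ 'shard0000', 'shard0001', 'shard0002' ]
```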
Non-targeted queries
[Diagram: mongos in front of three shards]
1. Request received
2. Farm request out to all shards
3. Shards return results to mongos
4. mongos returns results to client
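For a sorted query that goes to every shard, each shard can sort its own results, leaving mongos only the job of merging already-sorted streams in step 4. A minimal sketch of that merge in plain JavaScript, assuming each shard's results arrive as an in-memory array sorted ascending by one numeric field; `mergeSorted` is a hypothetical helper, and a production merge would stream from cursors and use a heap instead of a linear scan over shards.

```javascript
// Merge per-shard result arrays, each already sorted by `field`, into one sorted array.
function mergeSorted(shardResults, field) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    // Pick the shard whose next unread document has the smallest value.
    for (let s = 0; s < shardResults.length; s++) {
      const doc = shardResults[s][positions[s]];
      if (doc === undefined) continue; // this shard is exhausted
      if (best === -1 || doc[field] < shardResults[best][positions[best]][field]) best = s;
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

const fromShards = [
  [{ age: 1 }, { age: 7 }],
  [{ age: 3 }],
  [{ age: 2 }, { age: 9 }],
];
console.log(mergeSorted(fromShards, "age").map((d) => d.age)); // [ 1, 2, 3, 7, 9 ]
```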
Choosing A Shard Key
Things to remember!
• The shard key is immutable
• Shard key values are immutable
• The shard key must be indexed
• Shard key values are limited to 512 bytes in size
• Try to choose a field used in queries
• Only the shard key can be guaranteed unique across shards
The shard key should not be monotonically increasing!
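Why monotonically increasing keys hurt: if every new key is larger than all existing ones (timestamps, counters), every insert lands in the chunk that ends at +∞, so one shard takes all the writes. A minimal sketch in plain JavaScript comparing that with hashing the key first, assuming three chunks on three shards; the chunk boundaries and `shardFor` / `tally` helpers are hypothetical, and a toy 32-bit FNV-1a hash stands in for MongoDB's own hash function.

```javascript
// Existing keys 0..2999 were split into three chunks, one per shard.
// Each chunk's lower bound is the previous chunk's max (or -infinity).
const rangeChunks = [
  { max: 1000, shard: "A" },
  { max: 2000, shard: "B" },
  { max: Infinity, shard: "C" }, // the chunk that ends at +infinity
];

// With hashing, the chunks split the 32-bit hash space instead.
const hashChunks = [
  { max: 2 ** 32 / 3, shard: "A" },
  { max: (2 ** 32 / 3) * 2, shard: "B" },
  { max: Infinity, shard: "C" },
];

// Toy 32-bit FNV-1a hash of the key's decimal string (not MongoDB's hash).
function fnv1a(key) {
  let h = 0x811c9dc5;
  for (const ch of String(key)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// First chunk whose exclusive upper bound is above the value.
function shardFor(chunks, value) {
  return chunks.find((c) => value < c.max).shard;
}

// Tally where the next 3000 inserts land when keys keep increasing.
function tally(useHash) {
  const counts = { A: 0, B: 0, C: 0 };
  for (let key = 3000; key < 6000; key++) {
    const shard = useHash ? shardFor(hashChunks, fnv1a(key)) : shardFor(rangeChunks, key);
    counts[shard] += 1;
  }
  return counts;
}

console.log(tally(false)); // { A: 0, B: 0, C: 3000 }
console.log(tally(true));  // spread across A, B, and C
```

Hashing buys write distribution at the cost of range queries on the original key, which can no longer be targeted to a few chunks.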
How to choose your key?
• Cardinality
• Write Distribution
• Query Isolation
• Reliability
• Index Locality
Cardinality: can your data be broken down enough?
Query Isolation: query targeting to a specific shard.
Reliability: shard outages!
A good shard key can:
• Optimize routing
• Minimize (unnecessary) traffic
• Allow best scaling
Consider pre-splitting. No unique indexes unless they are part of the shard key. Geo keys cannot be part of a shard key: $near won't work, but the other $geo operators work fine.
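Cardinality and write distribution can be checked against a sample of your data before you commit to a key. A minimal sketch in plain JavaScript, assuming the sample is already in memory; the hypothetical `keyStats` helper reports how many distinct values a candidate key has and what share of the sample its most common value accounts for. Because a single key value cannot be split across chunks, that share is a floor on how large one chunk can grow.

```javascript
// Distinct-value count and hottest-value share for a candidate shard key.
function keyStats(docs, fields) {
  if (docs.length === 0) return { distinct: 0, hottestShare: 0 };
  const counts = new Map();
  for (const doc of docs) {
    // Join the key's fields into one string (assumes values never contain "|").
    const value = fields.map((f) => String(doc[f])).join("|");
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const hottest = Math.max(...counts.values());
  return { distinct: counts.size, hottestShare: hottest / docs.length };
}

const sample = [
  { name: "Tom", uniqueid: 1 },
  { name: "Tom", uniqueid: 2 },
  { name: "Tom", uniqueid: 3 },
  { name: "Felix", uniqueid: 4 },
];
console.log(keyStats(sample, ["name"]));             // { distinct: 2, hottestShare: 0.75 }
console.log(keyStats(sample, ["name", "uniqueid"])); // { distinct: 4, hottestShare: 0.25 }
```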
Thanks!
• What's Next?
• Resources: https://education.mongodb.com/, https://www.mongodb.com/presentations
• Me: @jrhunt, [email protected]
In summary (and this is not a sales pitch), lots of other databases out there have sharding and replication, but not many of them provide the granularity of control that you need for your applications while maintaining sensible defaults.