Introduction to Sharding
description
Transcript of Introduction to Sharding
![Page 1: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/1.jpg)
Software Engineer, 10gen
@brandonmblack
Brandon Black
#MongoDBDays
Introduction to Sharding
![Page 2: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/2.jpg)
Agenda
• Scaling Data
• MongoDB's Approach
• Architecture
• Configuration
• Mechanics
![Page 3: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/3.jpg)
Scaling Data
![Page 4: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/4.jpg)
Examining Growth
• User Growth– 1995: 0.4% of the world’s population– Today: 30% of the world is online (~2.2B)– Emerging Markets & Mobile
• Data Set Growth– Facebook’s data set is around 100 petabytes– 4 billion photos taken in the last year (4x a decade
ago)
![Page 5: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/5.jpg)
Read/Write Throughput Exceeds I/O
![Page 6: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/6.jpg)
Working Set Exceeds Physical Memory
![Page 7: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/7.jpg)
Vertical Scalability (Scale Up)
![Page 8: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/8.jpg)
Horizontal Scalability (Scale Out)
![Page 9: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/9.jpg)
Data Store Scalability
• Custom Hardware– Oracle
• Custom Software– Facebook + MySQL– Google
![Page 10: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/10.jpg)
Data Store Scalability Today
• MongoDB Auto-Sharding
• A data store that is– Free– Publicly available– Open source
(https://github.com/mongodb/mongo)– Horizontally scalable– Application independent
![Page 11: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/11.jpg)
MongoDB's Approach to Sharding
![Page 12: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/12.jpg)
Partitioning
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
![Page 13: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/13.jpg)
Data Distribution
• Initially 1 chunk
• Default max chunk size: 64mb
• MongoDB automatically splits & migrates chunks when max reached
![Page 14: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/14.jpg)
Routing and Balancing
• Queries routed to specific shards
• MongoDB balances cluster
• MongoDB migrates data to new nodes
![Page 15: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/15.jpg)
MongoDB Auto-Sharding
• Minimal effort required– Same interface as single mongod
• Two steps– Enable Sharding for a database– Shard collection within database
![Page 16: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/16.jpg)
Architecture
![Page 17: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/17.jpg)
What is a Shard?
• Shard is a node of the cluster
• Shard can be a single mongod or a replica set
![Page 18: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/18.jpg)
• Config Server– Stores cluster chunk ranges and locations– Can have only 1 or 3 (production must have
3)– Not a replica set
Meta Data Storage
![Page 19: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/19.jpg)
Routing and Managing Data
• Mongos– Acts as a router / balancer– No local data (persists to config database)– Can have 1 or many
![Page 20: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/20.jpg)
Sharding infrastructure
![Page 21: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/21.jpg)
Configuration
![Page 22: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/22.jpg)
Example Cluster
• Don’t use this setup in production!- Only one Config server (No Fault Tolerance)- Shard not in a replica set (Low Availability)- Only one mongos and shard (No Performance Improvement)- Useful for development or demonstrating configuration
mechanics
![Page 23: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/23.jpg)
Starting the Configuration Server
• mongod --configsvr• Starts a configuration server on the default port
(27019)
![Page 24: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/24.jpg)
Start the mongos Router
• mongos --configdb <hostname>:27019• For 3 configuration servers:
mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>
• This is always how to start a new mongos, even if the cluster is already running
![Page 25: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/25.jpg)
Start the shard database
• mongod --shardsvr• Starts a mongod with the default shard port (27018)• Shard is not yet connected to the rest of the cluster• Shard may have already been running in production
![Page 26: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/26.jpg)
Add the Shard
• On mongos: - sh.addShard(‘<host>:27018’)
• Adding a replica set: - sh.addShard(‘<rsname>/<seedlist>’)
![Page 27: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/27.jpg)
Verify that the shard was added
• db.runCommand({ listshards:1 }) { "shards" : [{"_id”: "shard0000”,"host”: ”<hostname>:27018” } ],
"ok" : 1 }
![Page 28: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/28.jpg)
Enabling Sharding
• Enable sharding on a database
sh.enableSharding(“<dbname>”)
• Shard a collection with the given key
sh.shardCollection(“<dbname>.people”,{“country”:1})
• Use a compound shard key to prevent duplicates
sh.shardCollection(“<dbname>.cars”,{“year”:1, ”uniqueid”:1})
![Page 29: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/29.jpg)
Tag Aware Sharding
• Tag aware sharding allows you to control the distribution of your data
• Tag a range of shard keys– sh.addTagRange(<collection>,<min>,<max>,<t
ag>)
• Tag a shard– sh.addShardTag(<shard>,<tag>)
![Page 30: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/30.jpg)
Mechanics
![Page 31: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/31.jpg)
Partitioning
• Remember it's based on ranges
![Page 32: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/32.jpg)
Chunk is a section of the entire range
![Page 33: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/33.jpg)
Chunk splitting
• A chunk is split once it exceeds the maximum size• There is no split point if all documents have the same
shard key• Chunk split is a logical operation (no data is moved)
![Page 34: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/34.jpg)
Balancing
• Balancer is running on mongos• Once the difference in chunks between the most
dense shard and the least dense shard is above the migration threshold, a balancing round starts
![Page 35: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/35.jpg)
Acquiring the Balancer Lock
• The balancer on mongos takes out a “balancer lock”• To see the status of these locks:
use configdb.locks.find({ _id: “balancer” })
![Page 36: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/36.jpg)
Moving the chunk
• The mongos sends a moveChunk command to source shard
• The source shard then notifies destination shard• Destination shard starts pulling documents from
source shard
![Page 37: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/37.jpg)
Committing Migration
• When complete, destination shard updates config server- Provides new locations of the chunks
![Page 38: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/38.jpg)
Cleanup
• Source shard deletes moved data- Must wait for open cursors to either close or time out- NoTimeout cursors may prevent the release of the lock
• The mongos releases the balancer lock after old chunks are deleted
![Page 39: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/39.jpg)
Routing Requests
![Page 40: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/40.jpg)
Cluster Request Routing
• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
![Page 41: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/41.jpg)
Cluster Request Routing: Targeted Query
![Page 42: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/42.jpg)
Routable request received
![Page 43: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/43.jpg)
Request routed to appropriate shard
![Page 44: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/44.jpg)
Shard returns results
![Page 45: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/45.jpg)
Mongos returns results to client
![Page 46: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/46.jpg)
Cluster Request Routing: Non-Targeted Query
![Page 47: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/47.jpg)
Non-Targeted Request Received
![Page 48: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/48.jpg)
Request sent to all shards
![Page 49: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/49.jpg)
Shards return results to mongos
![Page 50: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/50.jpg)
Mongos returns results to client
![Page 51: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/51.jpg)
Cluster Request Routing: Non-Targeted Query with Sort
![Page 52: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/52.jpg)
Non-Targeted request with sort received
![Page 53: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/53.jpg)
Request sent to all shards
![Page 54: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/54.jpg)
Query and sort performed locally
![Page 55: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/55.jpg)
Shards return results to mongos
![Page 56: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/56.jpg)
Mongos merges sorted results
![Page 57: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/57.jpg)
Mongos returns results to client
![Page 58: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/58.jpg)
Shard Key
![Page 59: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/59.jpg)
Shard Key
• Shard key is immutable
• Shard key values are immutable
• Shard key must be indexed
• Shard key limited to 512 bytes in size
• Shard key used to route queries– Choose a field commonly used in queries
• Only shard key can be unique across shards– `_id` field is only unique within individual shard
![Page 60: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/60.jpg)
Shard Key Considerations
• Cardinality
• Write Distribution
• Query Isolation
• Reliability
• Index Locality
![Page 61: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/61.jpg)
Conclusion
![Page 62: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/62.jpg)
Read/Write Throughput Exceeds I/O
![Page 63: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/63.jpg)
Working Set Exceeds Physical Memory
![Page 64: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/64.jpg)
Sharding Enables Scale
• MongoDB’s Auto-Sharding– Easy to Configure– Consistent Interface– Free and Open Source
![Page 65: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/65.jpg)
• What’s next?– Webinar: Technical Overview of MongoDB (March
7th)– MongoDB User Group
• Resourceshttps://education.10gen.com/http://www.10gen.com/presentationshttp://github.com/brandonblack/presentations
![Page 66: Introduction to Sharding](https://reader035.fdocuments.us/reader035/viewer/2022081414/54b79df54a795998738b45e1/html5/thumbnails/66.jpg)
Software Engineer, 10gen
@brandonmblack
Brandon Black
#MongoDBDays
Thank You