App Sharding to Autosharding at Sailthru
Ian White, CTO and Co-Founder, Sailthru
www.sailthru.com [email protected]
App Sharding to Autosharding
• Every user is unique
• Email, onsite, mobile, social, offline personalization on an individual level
• Optimizes conversion and drives retention for eCommerce and media
• Founded in 2008 by three engineers
• 170 employees in NYC, SF, LA, London
Sailthru
Using MongoDB since 2009 as the primary datastore
• 120 replica set nodes on metal infrastructure
• 40 TB of data
• 25,000 writes/second
Basic Sailthru Objects
• 850 Million User Profiles
• 75 Million Content Documents
• 2.5 Billion Messages Per Month
The Challenge
• Sailthru is both:
• Some apps are read-heavy
• Some apps are write-heavy
Why Shard?
• Using MongoDB since 2009
• No autosharding capabilities at the time
• Too much data for a single node
Application Sharding?
• Application-level sharding
• Partition data by client
• A Db class examines the query and routes it to the appropriate replica set and collection
Application Sharding
Query: db['profile'].find({"client_id": 450, "email": "[email protected]"})
Rewritten query: db['profile.450'].find({"email": "[email protected]"})
Shard map config file: {"profile": {"shard_key": "client_id", "shards": {"450": "profile1", "766": "profile2"}}}
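The routing step above can be sketched in Python. This is a hypothetical reconstruction of the Db-class behavior, not Sailthru's actual code: the shard map names a shard key per logical collection, and the router strips that key from the query and rewrites the collection name to `<collection>.<key value>`.

```python
# Illustrative shard map in the shape shown on the slide.
SHARD_MAP = {
    "profile": {
        "shard_key": "client_id",
        "shards": {"450": "profile1", "766": "profile2"},
    }
}

def route_query(collection, query, shard_map=SHARD_MAP):
    """Return (replica_set, sharded_collection, rewritten_query)."""
    entry = shard_map[collection]
    key = entry["shard_key"]
    value = str(query[key])                  # e.g. "450"
    replica_set = entry["shards"][value]     # e.g. "profile1"
    # Drop the shard key; it is now encoded in the collection name.
    rewritten = {k: v for k, v in query.items() if k != key}
    return replica_set, f"{collection}.{value}", rewritten

rs, coll, q = route_query("profile", {"client_id": 450, "email": "user@example.com"})
# rs == "profile1", coll == "profile.450", q == {"email": "user@example.com"}
```

The email address and map values here are placeholders; the real router would then issue the rewritten query against the chosen replica set.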
App Sharding: Advantages
• Smaller indexes due to collection partitioning
• Ability to add client-specific indexes (not done much in practice)
App Sharding: Problems
• Uneven load distribution
• Writes bottlenecked by capacity of single server
• Manual rebalancing and allocation = lots of work for DB team
Solution:
Autosharding (available since MongoDB 1.6)
Selecting a Shard Key
• Individual reads
• Individual writes
• Cursored reads
Shard Key Options
•client_id? Uneven distribution
•email? Hard to handle null bucket
•_id? Uneven time-based distribution
Best Option
sh.shardCollection("<db>.profile", { _id: "hashed" })
• Hashed shard key on _id
• Available since MongoDB 2.4
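A small illustration of why a hashed key helps: time-ordered _ids written raw all land in the highest-range chunk, while a hash of the _id spreads them across chunks. This sketch uses MD5 purely for illustration; it is not MongoDB's actual hash function.

```python
import hashlib

NUM_CHUNKS = 4  # stand-in for chunk ranges across shards

def chunk_for(value):
    """Map a value to a chunk by hashing it (illustrative only)."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % NUM_CHUNKS

# Monotonically increasing ids, a stand-in for time-ordered ObjectIds.
ids = range(1000, 1100)
counts = [0] * NUM_CHUNKS
for _id in ids:
    counts[chunk_for(_id)] += 1
# Every chunk receives a share of the writes instead of one hot chunk.
```

With a raw (range-based) key, all 100 of these ids would fall in the same chunk; hashed, the write load is spread roughly evenly.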
What about lookups by email?
Don’t want to hit every shard on every lookup
Solution: a key collection
{'_id': '<client> <keytype> <sha256_of_value>', 'sid': <mongoid>}
[Diagram: profile and profile.key collections, linked by _id]
• Two quick lookups against individual shards are more scalable than hitting every shard
• And the key collection can be autosharded too!
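The two-step lookup can be sketched as follows. This is a hedged reconstruction: the _id format matches the slide, but the function and field names are illustrative, and `db` stands for any PyMongo-style database handle.

```python
import hashlib

def key_id(client_id, keytype, value):
    """Build the key-collection _id: '<client> <keytype> <sha256_of_value>'."""
    digest = hashlib.sha256(value.encode()).hexdigest()
    return f"{client_id} {keytype} {digest}"

def find_profile_by_email(db, client_id, email):
    """Two targeted lookups instead of a scatter-gather across all shards."""
    # Lookup 1: hit exactly one shard of profile.key via its hashed _id.
    key_doc = db["profile.key"].find_one({"_id": key_id(client_id, "email", email)})
    if key_doc is None:
        return None
    # Lookup 2: 'sid' is the profile's _id, again targeting a single shard.
    return db["profile"].find_one({"_id": key_doc["sid"]})
```

Because both collections are sharded on a hashed _id, each `find_one` is routed to exactly one shard, keeping email lookups O(1) in the number of shards.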
How We Did the Move
Uptime is critical: we cannot bring the service down for infrastructure changes
Solution: Mongo-Connector
Created by MongoDB interns two summers ago. The Swiss army knife of moving data from set to set.
Solution: Mongo-Connector
• Tail the oplog in the legacy replica set
• Pipe data into the autoshard cluster with mongo-connector
• Repoint the app to read/write the autoshard cluster
• Zero downtime
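The tail-and-replay idea behind this migration can be sketched as a pure-Python toy: read ordered oplog-style entries from the legacy set and apply them to the target, so the target converges while the source stays live. The entry format here is simplified and illustrative, not MongoDB's actual oplog schema.

```python
def apply_oplog_entry(target, entry):
    """Replay one simplified oplog entry onto a dict-backed target store."""
    op, doc_id = entry["op"], entry["_id"]
    if op == "i":                                  # insert
        target[doc_id] = entry["doc"]
    elif op == "u":                                # update (field-level merge)
        target.setdefault(doc_id, {}).update(entry["doc"])
    elif op == "d":                                # delete
        target.pop(doc_id, None)
    return target

# Replaying a short, ordered stream of operations:
oplog = [
    {"op": "i", "_id": 1, "doc": {"email": "a@example.com"}},
    {"op": "u", "_id": 1, "doc": {"name": "Ann"}},
    {"op": "d", "_id": 2},
]
target = {2: {"stale": True}}
for entry in oplog:
    apply_oplog_entry(target, entry)
# target == {1: {"email": "a@example.com", "name": "Ann"}}
```

The real mongo-connector tails `local.oplog.rs` with a tailable cursor and writes through a doc-manager interface, but the ordering-and-replay principle is the same.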
Solution: Mongo-Connector
• Our fork contains some improvements
• ts (timestamp) and ns (namespace) are added to a separate collection instead of the target document
https://github.com/sailthru/mongo-connector
But Wait! There’s More
• Mongo-Connector can also be used to
• Pipe data into alternate data stores (Hadoop, Solr, etc.)
• Change autoshard keys if you chose poorly the first time
In Conclusion
• Autosharding is helpful
• Think about your shard key early
• Start by writing to a mongos, even when it's just one set
Q&A
www.sailthru.com [email protected] @sailthru
NYC HQ: 160 Varick St., 12th Floor, New York, NY 10013
San Francisco: 25 Taylor St., Room 724, San Francisco, CA 94102
London: 18 Soho Square, London, UK, W1D 3QL
Los Angeles: 7083 Hollywood Blvd, Los Angeles, CA 90028
Ian White, CTO and Co-Founder, Sailthru