App Sharding to Autosharding at Sailthru

Page 1: App Sharding to Autosharding at Sailthru
Page 2: App Sharding to Autosharding at Sailthru

Ian White, CTO and Co-Founder, Sailthru

@[email protected]

www.sailthru.com [email protected]

App Sharding to Autosharding

Page 3: App Sharding to Autosharding at Sailthru

• Every user is unique

• Email, onsite, mobile, social, offline personalization on an individual level

• Optimizes conversion and drives retention for eCommerce and media

• Founded in 2008 by three engineers

• 170 employees in NYC, SF, LA, London

Page 4: App Sharding to Autosharding at Sailthru

Sailthru

Using MongoDB since 2009 (primary datastore)

120 replica set nodes, 40 TB, on metal infrastructure

25,000 writes/second

Page 5: App Sharding to Autosharding at Sailthru

Basic Sailthru Objects

850 Million User Profiles

75 Million Content Documents

2.5 Billion Messages Per Month

Page 6: App Sharding to Autosharding at Sailthru

The Challenge

• Some apps are read-heavy

• Some apps are write-heavy

• Sailthru is both

Page 7: App Sharding to Autosharding at Sailthru

Why Shard?

• Using MongoDB since 2009

• No autosharding capabilities at the time

• Too much data for a single node

Page 8: App Sharding to Autosharding at Sailthru

Application Sharding?

• Application-level sharding

• Partition data by client

• Db class examines query and routes to an appropriate replica set and collection

Page 9: App Sharding to Autosharding at Sailthru

Application Sharding

Query (application): db['profile'].find({"client_id": 450, "email": "[email protected]"})

Query (after routing): db['profile.450'].find({"email": "[email protected]"})

Shard Map Config File: {"profile": {"shard_key": "client_id", "shards": {"450": "profile1", "766": "profile2"}}}
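
The routing step performed by the Db class can be sketched roughly as follows. This is a minimal illustration and not Sailthru's code: the function name `route_query` and the `default_set` fallback are assumptions, and the shard map simply mirrors the config file on the slide.

```python
# Shard map in the shape shown on the slide: logical collection -> shard key
# plus a mapping from shard-key value to the replica set that holds it.
SHARD_MAP = {
    "profile": {
        "shard_key": "client_id",
        "shards": {"450": "profile1", "766": "profile2"},
    }
}


def route_query(collection, query, shard_map=SHARD_MAP, default_set="default"):
    """Return (replica_set, physical_collection, rewritten_query).

    Hypothetical helper: pulls the shard key out of the query, looks up the
    replica set for that value, and targets the per-client collection.
    """
    config = shard_map.get(collection)
    if config is None:
        # Collection is not application-sharded; pass the query through.
        return default_set, collection, dict(query)

    key = config["shard_key"]
    if key not in query:
        raise ValueError(f"query on {collection!r} must include {key!r}")

    rewritten = dict(query)
    key_value = rewritten.pop(key)
    # Unknown key values fall back to the assumed default replica set.
    replica_set = config["shards"].get(str(key_value), default_set)
    return replica_set, f"{collection}.{key_value}", rewritten
```

Calling `route_query("profile", {"client_id": 450, "email": "user@example.com"})` yields the replica set `profile1`, the collection `profile.450`, and a query that contains only the `email` field.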

Page 10: App Sharding to Autosharding at Sailthru

App Sharding: Advantages

• Smaller indexes due to collection partitioning

• Ability to add specific indices per client (not done much in practice)

Page 11: App Sharding to Autosharding at Sailthru

App Sharding: Problems

• Uneven load distribution

• Writes bottlenecked by capacity of single server

• Manual rebalancing and allocation = lots of work for DB team

Page 12: App Sharding to Autosharding at Sailthru

Solution:

Autosharding (since MongoDB 1.6)

Page 13: App Sharding to Autosharding at Sailthru

Selecting a Shard Key

• Individual reads

• Individual writes

• Cursored reads

Page 14: App Sharding to Autosharding at Sailthru

Shard Key Options

• client_id? Uneven distribution

• email? Hard to handle null bucket

• _id? Uneven time-based distribution

Page 15: App Sharding to Autosharding at Sailthru

Best Option

sh.shardCollection( "profile", { _id: "hashed" } )

• Hash of _id

• Available since MongoDB 2.4
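
To see why hashing helps with a time-ordered _id, here is a small simulation. It is a sketch under stated assumptions: md5 with a modulo stands in for MongoDB's hashed index (which assigns ranges of hash values to chunks), the integer ids stand in for the increasing timestamp prefix of an ObjectId, and the shard count and range boundaries are made up for the illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size for the illustration


def hashed_shard(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a hash of the id (md5 is a stand-in hash here)."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def ranged_shard(doc_id, boundaries):
    """Pick a shard by comparing the raw id against sorted range boundaries."""
    for shard, upper in enumerate(boundaries):
        if doc_id < upper:
            return shard
    return len(boundaries)  # everything past the last boundary


# Ids that only ever grow, like the timestamp prefix of an ObjectId.
new_ids = range(1_000_000, 1_010_000)

# Range boundaries chosen back when only older ids existed.
boundaries = [250_000, 500_000, 750_000]

ranged_counts = Counter(ranged_shard(i, boundaries) for i in new_ids)
hashed_counts = Counter(hashed_shard(i) for i in new_ids)
```

With ranges on the raw id, every new write lands on the last shard until chunks are split and moved; with the hashed key, the same writes spread across all four shards in roughly equal shares.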

Page 16: App Sharding to Autosharding at Sailthru

What about lookups by email?

Don’t want to hit every shard on every lookup

Page 17: App Sharding to Autosharding at Sailthru

Solution: key collection

{'_id': '<client> <keytype> <sha256_of_value>', 'sid': <mongoid>}

[Diagram: profile and profile.key collections, each keyed on _id]

• Two quick lookups to individual shards is more scalable than hitting all.

• And autoshard the key collection too!
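
The two-step lookup can be sketched with plain dictionaries standing in for the two collections. This is an illustration rather than Sailthru's implementation: the helper names, the hex encoding of the SHA-256 digest, and the use of the literal string "email" as the key type are assumptions; only the _id layout and the sid field come from the slide.

```python
import hashlib

# In-memory stand-ins for the two autosharded collections, keyed on _id.
profile = {}
profile_key = {}


def make_key_id(client, keytype, value):
    """Build the key document _id: '<client> <keytype> <sha256_of_value>'."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{client} {keytype} {digest}"


def save_profile(profile_id, client, email):
    """Write the profile and the key document that points at it."""
    profile[profile_id] = {"_id": profile_id, "client_id": client, "email": email}
    key_id = make_key_id(client, "email", email)
    profile_key[key_id] = {"_id": key_id, "sid": profile_id}


def find_profile_by_email(client, email):
    """Two _id lookups: the key collection first, then the profile itself."""
    key_doc = profile_key.get(make_key_id(client, "email", email))
    if key_doc is None:
        return None
    return profile.get(key_doc["sid"])
```

Because both lookups are by _id, and _id is the (hashed) shard key of each collection, each one can be served by a single shard instead of being broadcast to all of them.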

Page 18: App Sharding to Autosharding at Sailthru

How We Did The Move

Uptime is critical: cannot bring service down for infrastructure changes

Page 19: App Sharding to Autosharding at Sailthru

Solution: Mongo-Connector

Created by MongoDB interns two summers ago.

The Swiss army knife of moving data from set to set.

Page 20: App Sharding to Autosharding at Sailthru

Solution: Mongo-Connector

• Tail oplog in legacy replica set

• Pipe data into autoshard cluster with mongo-connector

• Repoint app to read/write autoshard

• Zero downtime
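
The idea behind the steps above, replaying the source's oplog onto a new target, can be sketched in a few lines. This is not mongo-connector's code: the target is an in-memory dict, timestamps are plain integers rather than BSON timestamps, and only inserts, `$set`/`$unset` or replacement updates, and deletes are handled. The entry fields (`ts`, `op`, `ns`, `o`, `o2`) follow the classic oplog document layout.

```python
def apply_oplog_entry(target, entry):
    """Apply one oplog-style entry to `target`, a dict of {ns: {_id: doc}}."""
    op, ns = entry["op"], entry["ns"]
    collection = target.setdefault(ns, {})

    if op == "i":  # insert: `o` is the full document
        doc = dict(entry["o"])
        collection[doc["_id"]] = doc
    elif op == "u":  # update: `o2` selects the document, `o` describes the change
        doc_id = entry["o2"]["_id"]
        change = entry["o"]
        if "$set" in change or "$unset" in change:
            doc = collection.setdefault(doc_id, {"_id": doc_id})
            doc.update(change.get("$set", {}))
            for field in change.get("$unset", {}):
                doc.pop(field, None)
        else:  # full-document replacement
            collection[doc_id] = {"_id": doc_id, **change}
    elif op == "d":  # delete: `o` holds the _id of the removed document
        collection.pop(entry["o"]["_id"], None)
    # Other entry types (no-ops, commands) are ignored in this sketch.


def replay(target, oplog, after_ts=0):
    """Replay entries newer than `after_ts`; return the last timestamp applied."""
    last_ts = after_ts
    for entry in oplog:
        if entry["ts"] > after_ts:
            apply_oplog_entry(target, entry)
            last_ts = entry["ts"]
    return last_ts
```

Keeping the last applied timestamp is what lets the copy resume where it left off, so the target can keep trailing the legacy set until the application is repointed.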

Page 21: App Sharding to Autosharding at Sailthru

Solution: Mongo-Connector

• Our fork contains some improvements

• ts (timestamp) and ns (namespace) get added in a separate collection instead of the target document

https://github.com/sailthru/mongo-connector

Page 22: App Sharding to Autosharding at Sailthru

But Wait! There’s More

• Mongo-Connector can also be used to

• Pipe data into alternate data stores (Hadoop, Solr, etc.)

• Change autoshard keys if you made a mistake

Page 23: App Sharding to Autosharding at Sailthru

In Conclusion

• Autosharding is helpful

• Think about shard key early

• Start by writing to a mongos, even when it's just one set

Page 24: App Sharding to Autosharding at Sailthru

Q&A

www.sailthru.com [email protected] @sailthru

NYC HQ: 160 Varick St., 12th Floor, New York, NY 10013

San Francisco: 25 Taylor St., Room 724, San Francisco, CA 94102

London: 18 Soho Square, London, UK, W1D 3QL

Los Angeles: 7083 Hollywood Blvd, Los Angeles, CA 90028
