App Sharding to Autosharding at Sailthru
Ian White, CTO and Co-Founder, Sailthru
www.sailthru.com [email protected]
App Sharding to Autosharding
• Every user is unique
• Email, onsite, mobile, social, offline personalization on an individual level
• Optimizes conversion and drives retention for eCommerce and media
• Founded in 2008 by three engineers
• 170 employees in NYC, SF, LA, London
Sailthru
Using MongoDB since 2009 as the primary datastore
• 120 replica set nodes on metal infrastructure
• 40 TB of data
• 25,000 writes/second
Basic Sailthru Objects
• 850 Million User Profiles
• 75 Million Content Documents
• 2.5 Billion Messages Per Month
The Challenge
• Sailthru is both:
• Some apps are read-heavy
• Some apps are write-heavy
Why Shard?
• Using MongoDB since 2009
• No autosharding capabilities at the time
• Too much data for a single node
Application Sharding?
• Application-level sharding
• Partition data by client
• A Db class examines the query and routes it to the appropriate replica set and collection
Application Sharding
Query: db['profile'].find({"client_id": 450, "email": "[email protected]"})
Rewritten query: db['profile.450'].find({"email": "[email protected]"})
Shard map config file: {"profile": {"shard_key": "client_id", "shards": {"450": "profile1", "766": "profile2"}}}
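The routing step above can be sketched in Python. This is a hypothetical reconstruction of the Db-class behavior, not Sailthru's actual code: the shard map names a shard key per logical collection, and the router strips that key from the query and rewrites the collection name to `<collection>.<key value>`.

```python
# Illustrative shard map in the shape shown on the slide.
SHARD_MAP = {
    "profile": {
        "shard_key": "client_id",
        "shards": {"450": "profile1", "766": "profile2"},
    }
}

def route_query(collection, query, shard_map=SHARD_MAP):
    """Return (replica_set, sharded_collection, rewritten_query)."""
    entry = shard_map[collection]
    key = entry["shard_key"]
    value = str(query[key])                  # e.g. "450"
    replica_set = entry["shards"][value]     # e.g. "profile1"
    # Drop the shard key; it is now encoded in the collection name.
    rewritten = {k: v for k, v in query.items() if k != key}
    return replica_set, f"{collection}.{value}", rewritten

rs, coll, q = route_query("profile", {"client_id": 450, "email": "user@example.com"})
# rs == "profile1", coll == "profile.450", q == {"email": "user@example.com"}
```

The email address and map values here are placeholders; the real router would then issue the rewritten query against the chosen replica set.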
App Sharding: Advantages
• Smaller indexes due to collection partitioning
• Ability to add client-specific indexes (not done much in practice)
App Sharding: Problems
• Uneven load distribution
• Writes bottlenecked by capacity of single server
• Manual rebalancing and allocation = lots of work for DB team
Solution:
Autosharding (available since MongoDB 1.6)
Selecting a Shard Key
• Individual reads
• Individual writes
• Cursored reads
Shard Key Options
•client_id? Uneven distribution
•email? Hard to handle null bucket
•_id? Uneven time-based distribution
Best Option
sh.shardCollection("<db>.profile", { _id: "hashed" })
• Hashed shard key on _id
• Available since MongoDB 2.4
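A small illustration of why a hashed key helps: time-ordered _ids written raw all land in the highest-range chunk, while a hash of the _id spreads them across chunks. This sketch uses MD5 purely for illustration; it is not MongoDB's actual hash function.

```python
import hashlib

NUM_CHUNKS = 4  # stand-in for chunk ranges across shards

def chunk_for(value):
    """Map a value to a chunk by hashing it (illustrative only)."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % NUM_CHUNKS

# Monotonically increasing ids, a stand-in for time-ordered ObjectIds.
ids = range(1000, 1100)
counts = [0] * NUM_CHUNKS
for _id in ids:
    counts[chunk_for(_id)] += 1
# Every chunk receives a share of the writes instead of one hot chunk.
```

With a raw (range-based) key, all 100 of these ids would fall in the same chunk; hashed, the write load is spread roughly evenly.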
What about lookups by email?
Don’t want to hit every shard on every lookup
Solution: a key collection
{'_id': '<client> <keytype> <sha256_of_value>', 'sid': <mongoid>}
[Diagram: profile and profile.key collections, linked by _id]
• Two quick lookups against individual shards are more scalable than hitting every shard
• And the key collection can be autosharded too!
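The two-step lookup can be sketched as follows. This is a hedged reconstruction: the _id format matches the slide, but the function and field names are illustrative, and `db` stands for any PyMongo-style database handle.

```python
import hashlib

def key_id(client_id, keytype, value):
    """Build the key-collection _id: '<client> <keytype> <sha256_of_value>'."""
    digest = hashlib.sha256(value.encode()).hexdigest()
    return f"{client_id} {keytype} {digest}"

def find_profile_by_email(db, client_id, email):
    """Two targeted lookups instead of a scatter-gather across all shards."""
    # Lookup 1: hit exactly one shard of profile.key via its hashed _id.
    key_doc = db["profile.key"].find_one({"_id": key_id(client_id, "email", email)})
    if key_doc is None:
        return None
    # Lookup 2: 'sid' is the profile's _id, again targeting a single shard.
    return db["profile"].find_one({"_id": key_doc["sid"]})
```

Because both collections are sharded on a hashed _id, each `find_one` is routed to exactly one shard, keeping email lookups O(1) in the number of shards.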
How We Did the Move
Uptime is critical: we cannot bring the service down for infrastructure changes
Solution: Mongo-Connector
Created by MongoDB interns two summers ago. The Swiss army knife of moving data from set to set.
Solution: Mongo-Connector
• Tail the oplog in the legacy replica set
• Pipe data into the autoshard cluster with mongo-connector
• Repoint the app to read/write the autoshard cluster
• Zero downtime
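The tail-and-replay idea behind this migration can be sketched as a pure-Python toy: read ordered oplog-style entries from the legacy set and apply them to the target, so the target converges while the source stays live. The entry format here is simplified and illustrative, not MongoDB's actual oplog schema.

```python
def apply_oplog_entry(target, entry):
    """Replay one simplified oplog entry onto a dict-backed target store."""
    op, doc_id = entry["op"], entry["_id"]
    if op == "i":                                  # insert
        target[doc_id] = entry["doc"]
    elif op == "u":                                # update (field-level merge)
        target.setdefault(doc_id, {}).update(entry["doc"])
    elif op == "d":                                # delete
        target.pop(doc_id, None)
    return target

# Replaying a short, ordered stream of operations:
oplog = [
    {"op": "i", "_id": 1, "doc": {"email": "a@example.com"}},
    {"op": "u", "_id": 1, "doc": {"name": "Ann"}},
    {"op": "d", "_id": 2},
]
target = {2: {"stale": True}}
for entry in oplog:
    apply_oplog_entry(target, entry)
# target == {1: {"email": "a@example.com", "name": "Ann"}}
```

The real mongo-connector tails `local.oplog.rs` with a tailable cursor and writes through a doc-manager interface, but the ordering-and-replay principle is the same.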
Solution: Mongo-Connector
• Our fork contains some improvements
• ts (timestamp) and ns (namespace) are added to a separate collection instead of the target document
https://github.com/sailthru/mongo-connector
But Wait! There’s More
• Mongo-Connector can also be used to
• Pipe data into alternate data stores (Hadoop, Solr, etc.)
• Change autoshard keys if you chose poorly the first time
In Conclusion
• Autosharding is helpful
• Think about your shard key early
• Start by writing to a mongos, even when it's just one set
Q&A
www.sailthru.com [email protected] @sailthru
NYC HQ: 160 Varick St., 12th Floor, New York, NY 10013
San Francisco: 25 Taylor St., Room 724, San Francisco, CA 94102
London: 18 Soho Square, London, UK, W1D 3QL
Los Angeles: 7083 Hollywood Blvd, Los Angeles, CA 90028
Ian White, CTO and Co-Founder, Sailthru