Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive
Scaling The Facebook Realtime Endpoint Using MongoDB
PRESENTED BY:
Justin Medoy and Mike Sherov
SNAP Interactive
[email protected]@snap-interactive
Redefining the Way People Meet & Socialize Online
What are Facebook Realtime Updates?
Facebook says: "Real-time updates enable your application to subscribe to changes in data in Facebook."
What it means: "You provide a URL, Facebook pings it when users do stuff."
Pings from Facebook
● Every minute we get around 20 pings from Facebook that together contain data for around 11,000 users
{
  "object": "user",
  "entry": [
    { "uid": 1335845740, "changed_fields": [ "name", "picture" ], "time": 232323 },
    ....
  ]
}
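A minimal sketch of reading one such ping, assuming the payload shape shown above. The second entry in the sample, the function name `changed_uids`, and the idea of filtering on a set of interesting fields are illustrative assumptions, not the presenters' code (their endpoint was written in PHP).

```python
import json

# Sample body shaped like the slide's example. The second entry is made up
# purely for illustration.
payload = """{"object": "user", "entry": [
  {"uid": 1335845740, "changed_fields": ["name", "picture"], "time": 232323},
  {"uid": 1234, "changed_fields": ["likes"], "time": 232324}
]}"""


def changed_uids(raw_body, fields_we_care_about):
    """Return the uids in one ping whose changed_fields overlap the given set."""
    body = json.loads(raw_body)
    if body.get("object") != "user":
        return []
    wanted = set(fields_we_care_about)
    return [
        entry["uid"]
        for entry in body.get("entry", [])
        if wanted & set(entry.get("changed_fields", []))
    ]
```

Note that the ping only names the fields that changed; the new values still have to be fetched separately.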
WHAT?!? Where's the data?
● Facebook tells you which fields changed, but not what the current data is.
Retrieving User Data from the Graph
● Solution: go back to Facebook and grab the user's data
https://graph.facebook.com?ids=<USERID>&fields=music,movies,likes
*This will only get data that the user has made publicly available
● To avoid timeouts, each call to Facebook only asks for the data for 25 users
*Our CURL timeouts for Facebook have been lowered from the default 60 seconds to 25 seconds
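The batching described above might look roughly like this. The sketch only builds the request URLs and makes no network calls; joining several user IDs with commas in `ids`, and the helper names `batches` and `graph_url`, are assumptions for illustration.

```python
from urllib.parse import urlencode

BATCH_SIZE = 25            # users per Graph call, per the slide
CURL_TIMEOUT_SECONDS = 25  # lowered from the default 60, per the slide


def batches(uids, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` user IDs."""
    for start in range(0, len(uids), size):
        yield uids[start:start + size]


def graph_url(uids, fields=("music", "movies", "likes")):
    """Build one Graph request URL for a batch of users (assumed ID format)."""
    query = urlencode(
        {"ids": ",".join(str(u) for u in uids), "fields": ",".join(fields)},
        safe=",",  # keep the commas readable instead of percent-encoding them
    )
    return "https://graph.facebook.com?" + query


# 11,000 users a minute at 25 per call works out to 440 calls a minute.
urls = [graph_url(batch) for batch in batches(list(range(1, 11001)))]
```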
Update the user's profile
● Facebook won't tell you exactly what's changed but we can figure it out from our own data
All Data - Stored Data = Changed Data
● The next step is to update the user's profile with this changed data
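As a rough illustration of "All Data - Stored Data = Changed Data", assuming both sides are simple field-to-value mappings (the function name and sample values are hypothetical):

```python
_MISSING = object()  # sentinel so a field we never stored always counts as changed


def changed_data(all_data, stored_data):
    """Return the fields whose freshly fetched value differs from the stored one."""
    return {
        field: value
        for field, value in all_data.items()
        if stored_data.get(field, _MISSING) != value
    }


# Made-up example values.
fetched = {"music": ["Lady Gaga", "Queen"], "movies": ["Alien"]}
stored = {"music": ["Lady Gaga"], "movies": ["Alien"]}
update = changed_data(fetched, stored)
```

Only the contents of `update` would then need to be written back to the user's profile.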
Mongo Architecture
● Mongo 2.0.2
● Mongo PHP driver 1.2.10
● Two separate replica sets
  ○ User data
  ○ Interest data
● Why separate replica sets?
  ○ Keep as much of the index as possible in memory
  ○ Disk reads are expensive
User Data Replica Set
Design Challenge
● Random access pattern over 106 million documents
User Data Replica Set
● Large $in queries
● High page faults in MMS
● We upgraded from 32 GB to 128 GB of RAM on each node
Indexes
● We added duplicates of some of our indexes with reversed fields
● Updating all of these extra indexes was a huge bottleneck
Indexes
● Unique index uid_1
● profile.sync_1_installed_1_platforms.facebook_1
● email_1
● uid_1_installed_1
● last_login_1_uid_1
Indexes
● There were certain minutes when Facebook would tell us that the data had changed for more than 40,000 users
  ○ Limit the amount of data Facebook can send in one minute
● A high number of writes and a large number of indexes prevented the secondaries from reading the oplog because of the global write lock
  ○ Increase the size of the oplog
  ○ This is fixed in 2.2.1
Indexes and the realtime endpoint
profile.sync_1_installed_1_platforms.facebook_1
● Filtered 11,000 users a minute down to a few hundred
  ○ Moved filtering logic out of PHP into the index
● Added efficiency from covered index
  ○ All we need is platforms.facebook, which is part of the index
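A hedged sketch of what such a query could look like, written as plain Python dicts in the shape a MongoDB driver expects. The filter values (`True` for `profile.sync` and `installed`, and matching `platforms.facebook` against the pinged IDs with `$in`) are guesses for illustration; the slides only give the index name. `looks_covered` is a simplified, hypothetical check and ignores MongoDB's other conditions for covered queries.

```python
# Fields of the index named on the slide, in order.
INDEX_FIELDS = ("profile.sync", "installed", "platforms.facebook")


def build_query(facebook_ids):
    """Return (filter, projection) for the realtime lookup (assumed shape)."""
    filter_spec = {
        "profile.sync": True,  # assumed value
        "installed": True,     # assumed value
        "platforms.facebook": {"$in": list(facebook_ids)},
    }
    # Return only the indexed field and drop _id, which is not in the index.
    projection = {"platforms.facebook": 1, "_id": 0}
    return filter_spec, projection


def looks_covered(filter_spec, projection, index_fields=INDEX_FIELDS):
    """Simplified test: every filtered and returned field is in the index."""
    indexed = set(index_fields)
    returned = {field for field, keep in projection.items() if keep}
    id_excluded = projection.get("_id", 1) == 0
    return id_excluded and set(filter_spec) <= indexed and returned <= indexed
```

The point of the projection is that the server can answer from the index alone, without touching the documents on disk.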
Interest Replica Set
Different set of challenges than the User replica set
● Needs to power typeahead
● 64 million interests
● Access pattern based on interest popularity
  ○ Lady Gaga is going to get accessed more than Ladybug or JavaScript
The Typeahead
{
  "_id" : ObjectId("4f511a230624967b7d000003"),
  "name" : "Rubiks Cube",
  "search" : "rubiks cube",
  "subsearch" : [
    "r", "ru", "rub", "rubi", "rubik", "rubiks", "rubiks ", "rubiks c", "rubiks cu", "rubiks cub"
  ],
  "popularity" : NumberLong(907)
}
The Typeahead
● Add an array with the first few characters of the interest
● Add an index on that field
● This allows us to have 10 entries in 1 index instead of 10 separate indexes
http://docs.mongodb.org/manual/core/indexes/#index-type-multikey
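A small sketch of how the prefix array could be generated, assuming the search string is the lowercased name and that at most 10 prefixes are stored (which matches the "Rubiks Cube" document, but the exact rule is not stated in the slides). The function names are hypothetical.

```python
def subsearch_prefixes(name, max_prefixes=10):
    """Return the leading substrings of the lowercased name, shortest first."""
    search = name.lower()
    count = min(max_prefixes, len(search))
    return [search[:length] for length in range(1, count + 1)]


def interest_document(name, popularity):
    """Build a document shaped like the slide's example (without _id)."""
    return {
        "name": name,
        "search": name.lower(),
        "subsearch": subsearch_prefixes(name),
        "popularity": popularity,
    }


doc = interest_document("Rubiks Cube", 907)
```

Because `subsearch` is an array, a single index on it holds one entry per prefix, which is what makes it a multikey index.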
Typeahead indexes
subsearch_1_popularity_-1
● Specifying -1 for the popularity component of the index naturally causes the typeahead to show more popular interests first
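To illustrate the ordering, here is an in-memory simulation of the lookup, not a real MongoDB call. The sample interests and popularity numbers are made up, and the driver call mentioned in the comment is an assumed equivalent rather than the presenters' code.

```python
def prefixes(name, max_prefixes=10):
    """Leading substrings of the lowercased name (assumed generation rule)."""
    search = name.lower()
    return [search[:n] for n in range(1, min(max_prefixes, len(search)) + 1)]


# Made-up sample data for illustration only.
interests = [
    {"name": "Ladybug", "subsearch": prefixes("Ladybug"), "popularity": 40},
    {"name": "Lady Gaga", "subsearch": prefixes("Lady Gaga"), "popularity": 9000},
    {"name": "JavaScript", "subsearch": prefixes("JavaScript"), "popularity": 300},
]


def typeahead(docs, typed, limit=5):
    """Match documents whose subsearch contains the typed prefix, most popular first.

    Roughly what a query such as
    find({"subsearch": typed}).sort("popularity", -1).limit(limit)
    would return when served by the subsearch_1_popularity_-1 index.
    """
    prefix = typed.lower()
    matches = [doc for doc in docs if prefix in doc["subsearch"]]
    matches.sort(key=lambda doc: doc["popularity"], reverse=True)
    return [doc["name"] for doc in matches[:limit]]
```

With the descending popularity component in the index, the database can walk the index in order and stop after `limit` entries instead of sorting every match.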
Lessons Learned
● Don't over-index
● Use covered indexes when possible
● Use indexes to reduce the size of returned data
● Keep everything in memory
● Use a multikey index for typeaheads
● Use -1 in the index for natural sorting
SNAP Interactive, Inc.
Contact Information
● SNAP Interactive, Inc.: SNAP-Interactive.com
● Justin Medoy, Team Lead / Software [email protected]
● Mike Sherov, Lead [email protected] @mikesherov
● For more information on our open positions, email [email protected] or check our website at www.snap-interactive.com/jobs/job-openings
meet people like you