Real-time Location Based Social Discovery using MongoDB

Real-time Location Based Social Discovery using MongoDB Fredrik Björk Director of Engineering MongoSV, Dec 4th 2012


The slides from my MongoSV 2012 presentation


Page 1: Real-time Location Based Social Discovery using MongoDB

Real-time Location Based Social Discovery using MongoDB

Fredrik Björk, Director of Engineering

MongoSV, Dec 4th 2012

Page 2

What is Banjo?

• The most powerful location-based mobile technology that brings you the moments you would otherwise miss

• Aggregates geo-tagged posts from Facebook, Twitter, Instagram and Foursquare in real time

Page 3

Page 4

Stats

• Launched June 2011

• 3 million users

• Social graph of 400 million profiles

• 50 billion connections

• ~200 geo posts created per second

Page 5

Why MongoDB?

• Developer friendly

• Easy to maintain and scale

• Automatic failover

• Rapid prototyping of features

• Good fit for consuming, storing and presenting JSON data

• Geospatial features out of the box

Page 6

Infrastructure

• ~160 EC2 instances (75% MongoDB, 25% Redis)

• SSD drives for low latency

• App servers (Sinatra & Rails) hosted on Heroku

• Mongos with authentication running on dedicated servers

Page 7

Geo-tagged posts

• Consumed as JSON from social network APIs via streaming, polling and real-time callbacks

• Exposed via REST APIs as JSON to the Banjo iOS and Android apps

Page 8

Schema design


https://twitter.com/fbjork/status/262989592561606656

Page 9

> db.posts.find({ _id: '2:262989592561606656' })

{
  _id: "2:262989592561606656",
  username: "fbjork",
  text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
  ...
}

https://twitter.com/fbjork/status/262989592561606656

• _id is composed of provider (Facebook: 1, Twitter: 2 etc.) and post id for uniqueness

Page 10

• Coordinates are stored inside an array with latitude, longitude

{
  _id: "2:262989592561606656",
  username: "fbjork",
  text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
  coordinates: [37.784234,-122.438212],
  ...
}

Page 11

• Friends are stored inside an array

{
  _id: "2:262989592561606656",
  username: "fbjork",
  text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
  coordinates: [37.784234,-122.438212],
  friend_ids: [8816792, 10324882, 2006261, ...]
}
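The document shape built up over the last three pages can be sketched in plain Ruby. This is only an illustration under assumptions: the helper names (post_key, parse_post_key, build_post) and the PROVIDERS table are hypothetical, and only the Facebook: 1 and Twitter: 2 codes come from the slides.

```ruby
# Hypothetical provider table; the slides only confirm Facebook: 1 and Twitter: 2.
PROVIDERS = { facebook: 1, twitter: 2 }.freeze

# Build the composite _id "<provider code>:<post id>".
def post_key(provider, post_id)
  code = PROVIDERS.fetch(provider) # raises KeyError for an unknown provider
  "#{code}:#{post_id}"
end

# Split a composite _id back into [provider code, post id].
def parse_post_key(key)
  code, post_id = key.split(':', 2)
  [Integer(code), post_id]
end

# Assemble a post document in the shape shown on the slides.
def build_post(provider:, post_id:, username:, text:, coordinates:, friend_ids:)
  {
    _id: post_key(provider, post_id),
    username: username,
    text: text,
    coordinates: coordinates, # [latitude, longitude], as on the slides
    friend_ids: friend_ids
  }
end

post = build_post(
  provider: :twitter,
  post_id: '262989592561606656',
  username: 'fbjork',
  text: 'Will give a presentation at #MongoSV',
  coordinates: [37.784234, -122.438212],
  friend_ids: [8816792, 10324882, 2006261]
)
puts post[:_id]
```

Prefixing the provider code keeps ids from different networks from colliding, since each network only guarantees uniqueness within its own id space.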

Page 12

Page 13

Geospatial Indexing

• Create the geo index:

> db.posts.ensureIndex( { coordinates: '2d' } )

Page 14

Find nearby posts in Miami:

> db.posts.find( { coordinates: { $near: [25.792627,-80.226142] } } )

{ _id: "2:809438082", coordinates: [25.792610,-80.226100], username: "Rebecca_Boorsma", text: "I love Miami!", ... }

{ _id: "2:1234567", coordinates: [25.781324,-80.431423], username: "foo", text: "Another day, another dollar", ... }
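The ordering a $near query produces on a 2d index (closest first, using flat planar distance) can be imitated in a few lines of Ruby. This is a brute-force sketch for intuition only, not how MongoDB evaluates the query. The Post struct, planar_distance, nearest and the limit parameter are hypothetical names.

```ruby
# Hypothetical in-memory stand-in for documents in the posts collection.
Post = Struct.new(:id, :coordinates, :text)

# Straight-line distance between two coordinate pairs, treated as points
# on a flat plane (no spherical correction).
def planar_distance(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Return up to `limit` posts, closest to `center` first.
# The default limit is an arbitrary choice for this sketch.
def nearest(posts, center, limit = 100)
  posts.sort_by { |post| planar_distance(post.coordinates, center) }.first(limit)
end

posts = [
  Post.new('2:1234567', [25.781324, -80.431423], 'Another day, another dollar'),
  Post.new('2:809438082', [25.792610, -80.226100], 'I love Miami!')
]

nearest(posts, [25.792627, -80.226142]).each do |post|
  puts "#{post.id} #{post.text}"
end
```

The sketch scans every post, which is exactly the cost the 2d index avoids.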

Page 15

Page 16

Find friend posts globally:

> db.posts.find({ friend_ids: { $in: [2006261] } })

{
  _id: "2:10248172",
  username: "fbjork",
  friend_ids: [8816792, 10324882, 2006261, ...],
  ...
}

Page 17

Find friend posts in a location:

> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] }, friend_ids: { $in: [2006261] } })

{
  _id: "2:10248172",
  username: "fbjork",
  friend_ids: [8816792, 10324882, 2006261, ...],
  ...
}

Page 18

Compound geo indexes

• Create a compound index on coordinates and friend_ids:

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )

Page 19

• Fails for compound indexes with large arrays

• Geospatial indexes have a size limit of 1000 bytes

> db.posts.ensureIndex( { coordinates: '2d', friend_ids: 1 } )

Error: Key too large to index

Page 20

Geospatial query performance

• Do we need a compound index at all?

• Geospatial index is usually restrictive enough

• Problem: Array traversal (using $in) is CPU hungry for large arrays

• Solution: Pre-sharded array fields

Page 21

Pre-sharded array fields

• When dealing with large arrays, e.g. @BarackObama follower ids

• Partition fields using pre-sharding

• shard = Hash(key) MOD shard_count

• Keep array sizes in the low hundreds

Page 22

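{
  friends_0: [1000, 1002, 1006],
  friends_1: [1004],
  friends_2: [1001, 1003, 1005]
}

# shard_example.rb
require 'zlib'

SHARDS = 3
friend_ids = [1000, 1001, 1002, 1003, 1004, 1005, 1006]

# Print the shard index (CRC32 of the id, modulo the shard count) for each id
friend_ids.each { |f| puts Zlib.crc32(f.to_s) % SHARDS }

# Output, one value per line: 0 2 0 2 1 2 0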

Page 23

Find friend posts using pre-sharding of the friend arrays:

> db.posts.find({ coordinates: { $near: [25.792627,-80.226142] }, friends_0: { $in: [1000] } })

{
  friends_0: [1000, 1002, 1006],
  friends_1: [1004],
  friends_2: [1001, 1003, 1005]
}
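The application has to pick the right friends_N field before it can run a query like this one. A minimal sketch of that step follows, reusing the CRC32-modulo rule from shard_example.rb. The helper names (shard_for, shard_field, presharded_fields, friend_near_query) are hypothetical, and the query is built as a plain Ruby hash rather than sent through any driver.

```ruby
require 'zlib'

SHARD_COUNT = 3 # same shard count as the slide example

# shard = Hash(key) MOD shard_count, with CRC32 as the hash function.
def shard_for(friend_id, shard_count = SHARD_COUNT)
  Zlib.crc32(friend_id.to_s) % shard_count
end

# Name of the array field that holds a given friend id.
def shard_field(friend_id, shard_count = SHARD_COUNT)
  "friends_#{shard_for(friend_id, shard_count)}"
end

# Split one large id list into friends_0 .. friends_(n-1) arrays.
def presharded_fields(friend_ids, shard_count = SHARD_COUNT)
  fields = (0...shard_count).map { |i| ["friends_#{i}", []] }.to_h
  friend_ids.each { |id| fields[shard_field(id, shard_count)] << id }
  fields
end

# Query document for "posts near center from this friend"; only the one
# array field that can contain the id is searched.
def friend_near_query(friend_id, center)
  {
    'coordinates' => { '$near' => center },
    shard_field(friend_id) => { '$in' => [friend_id] }
  }
end

fields = presharded_fields([1000, 1001, 1002, 1003, 1004, 1005, 1006])
puts fields.inspect
puts friend_near_query(1000, [25.792627, -80.226142]).inspect
```

Because the same function is used when writing and when querying, each lookup scans one short array instead of the full friend list.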

Page 24

Capped collections

• Good fit for storing a feed of posts for a period of time

• Eliminates need to expire old posts

• Documents can’t grow

• Documents can’t be deleted

• Resizing collections is painful

• Can’t be sharded

Page 25

TTL collections

• We switched to TTL collections with MongoDB 2.2

• Deleting and growing documents is now possible

• Easier to change expiration times

• Can be sharded (not by geo)
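The effect of a TTL collection can be imitated in memory to show why changing the expiration time is simple: retention is a single number compared against a timestamp. MongoDB does this server-side through a TTL index on a date field. The code below only mimics the outcome, and the FeedPost struct, the created_at field name and the unexpired helper are assumptions.

```ruby
# Hypothetical in-memory post with a creation timestamp.
FeedPost = Struct.new(:id, :created_at)

# Keep only the posts that are still inside the retention window.
def unexpired(posts, expire_after_seconds, now = Time.now)
  cutoff = now - expire_after_seconds
  posts.select { |post| post.created_at >= cutoff }
end

now = Time.utc(2012, 12, 4, 12, 0, 0)
posts = [
  FeedPost.new('2:1', now - 30),   # 30 seconds old
  FeedPost.new('2:2', now - 7200)  # 2 hours old
]

puts unexpired(posts, 3600, now).map(&:id).inspect   # one-hour window
puts unexpired(posts, 86_400, now).map(&:id).inspect # one-day window
```

Widening the window from an hour to a day changes only the argument. A capped collection would need to be resized instead.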

Page 26

Questions

Page 27

Thank you!

Available: iPhone and Android

@fbjork