TAO: Facebook's Distributed Data Store for the Social Graph

Papers We Love Iasi Kickstart


Page 1: TAO: Facebook's Distributed Data Store for the Social Graph

Papers We Love Iasi

Kickstart

Page 2: TAO: Facebook's Distributed Data Store for the Social Graph

Presentation online

http://j.mp/1S0wLZj

Page 3: TAO: Facebook's Distributed Data Store for the Social Graph

paperswelove.org

Page 4: TAO: Facebook's Distributed Data Store for the Social Graph
Page 5: TAO: Facebook's Distributed Data Store for the Social Graph

Why PWL Iasi?

● Because it’s cool
● There is no theory-oriented community
● Discuss language/framework-agnostic topics
● Bring together academia and industry
● Get more people (and institutions?) interested in actual Research & Development

Page 6: TAO: Facebook's Distributed Data Store for the Social Graph

4 presentations in 2016

Page 7: TAO: Facebook's Distributed Data Store for the Social Graph

Who?

Adrian-Tudor Panescu, Software Engineer

Adrian Burlacu, PhD, Automatic Control and Applied Informatics, TUIASI

Alexandru Archip, PhD, Computer Engineering, TUIASI

Marius Kloetzer, PhD, Automatic Control and Applied Informatics, TUIASI

Alexandra Adam, The Human Side™

Page 8: TAO: Facebook's Distributed Data Store for the Social Graph

And you!

● We are looking for speakers/moderators!
○ ~1 hour presentation/open discussion on a significant paper on your favourite subject/field
● And for sponsors
○ Mainly for ensuring that we have a venue for the meetups
● Feel free to contact me: [email protected]

Page 9: TAO: Facebook's Distributed Data Store for the Social Graph

https://github.com/papers-we-love/papers-we-love#how-to-read-a-paper

Page 10: TAO: Facebook's Distributed Data Store for the Social Graph

Code of Conduct and other info: http://www.meetup.com/Papers-We-Love-Iasi/

Page 11: TAO: Facebook's Distributed Data Store for the Social Graph
Page 12: TAO: Facebook's Distributed Data Store for the Social Graph

TAO: Facebook’s Distributed Data Store for the Social Graph

Bronson et al., Facebook Inc., USENIX ATC ’13

Page 13: TAO: Facebook's Distributed Data Store for the Social Graph


Page 14: TAO: Facebook's Distributed Data Store for the Social Graph

● A billion reads and millions of writes per second
● Petabyte-sized data set
● Geographically spread
● Users are unique and impatient
○ Privacy constraints must be satisfied at view time

How do we store and serve all this?


Page 15: TAO: Facebook's Distributed Data Store for the Social Graph

Before TAO

[Diagram labels: Cache, Query, Store]


Page 16: TAO: Facebook's Distributed Data Store for the Social Graph

The 3 contributions

● Characterize Facebook’s workload
● Describe a proper data model
● Present an actual large-scale implementation


Page 17: TAO: Facebook's Distributed Data Store for the Social Graph

The Associations and Objects


Page 18: TAO: Facebook's Distributed Data Store for the Social Graph

The data model

● There are only 2 data types: nodes (objects) and edges (associations)
○ Labeled directed multigraph
○ You need only 2 tables in the DB
● Facebook leverages certain application characteristics:
○ They don’t need a full graph query API
○ “Most of the data is old, but many of the queries are for the newest subset”
● “Likely to be useful for any application domain that needs to efficiently generate fine-grained customized content from highly interconnected data.”
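The data model on this slide can be sketched as a small in-memory store. This is only an illustration, assuming the two tables are plain Python dictionaries. The method names are loosely modeled on the association API described in the paper (`assoc_add`, `assoc_count`, `assoc_range`), while the class `TinyTao`, the `obj_add` helper, the storage layout, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A node: a typed object carrying key-value data."""
    id: int
    otype: str
    data: dict = field(default_factory=dict)


@dataclass
class Assoc:
    """A directed, typed edge from id1 to id2 with a creation time."""
    id1: int
    atype: str
    id2: int
    time: int
    data: dict = field(default_factory=dict)


class TinyTao:
    """Hypothetical in-memory stand-in for the two tables."""

    def __init__(self):
        self.objects = {}                # id -> Obj
        self.assocs = defaultdict(dict)  # (id1, atype) -> {id2: Assoc}
        self._next_id = 1

    def obj_add(self, otype, **data):
        oid = self._next_id
        self._next_id += 1
        self.objects[oid] = Obj(oid, otype, data)
        return oid

    def assoc_add(self, id1, atype, id2, time, **data):
        # Assumption: at most one edge of a given type between two nodes;
        # adding it again overwrites the previous edge.
        self.assocs[(id1, atype)][id2] = Assoc(id1, atype, id2, time, data)

    def assoc_count(self, id1, atype):
        return len(self.assocs.get((id1, atype), {}))

    def assoc_range(self, id1, atype, pos, limit):
        # Newest edges first, matching "many of the queries are for the newest subset".
        edges = sorted(self.assocs.get((id1, atype), {}).values(),
                       key=lambda a: a.time, reverse=True)
        return edges[pos:pos + limit]


# Usage: a post with three comments, then fetch the two newest.
tao = TinyTao()
post = tao.obj_add("post", text="hello")
for t in (100, 101, 102):
    comment = tao.obj_add("comment", text=f"comment at {t}")
    tao.assoc_add(post, "comment", comment, time=t)

newest = tao.assoc_range(post, "comment", pos=0, limit=2)
```

No general graph traversal is offered here, only point lookups and time-ordered ranges on one node's edge list, which is the kind of restricted API the slide refers to.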


Page 19: TAO: Facebook's Distributed Data Store for the Social Graph

“TAO provides basic access to the nodes and edges of a constantly changing graph in data centers across multiple regions. It is optimised heavily for reads, and explicitly favours efficiency and availability over consistency.”

Page 20: TAO: Facebook's Distributed Data Store for the Social Graph

CAP Principle

Consistency

Availability

Partition tolerance

TAO

Towards Robust Distributed Systems, Eric A. Brewer, 2000

Page 21: TAO: Facebook's Distributed Data Store for the Social Graph


Page 22: TAO: Facebook's Distributed Data Store for the Social Graph

Summary

● 1 storage layer (MySQL)
● 2 cache layers (custom, LRU) directly implementing the graph abstraction
○ Leader (DB I/O), protects DB from thundering herds
○ Follower (Client I/O)
○ Consistency maintained via asynchronous maintenance messages
● A full copy of Facebook’s data is stored in each region (a cluster of data centers in geographical proximity)
● Deployment is master/slave per shard: one master region and multiple slave regions
○ Writes are always forwarded first to the master
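The two cache layers can be sketched roughly as follows. This is a simplified, single-threaded illustration and not TAO's implementation: the class names `LRUCache`, `Leader`, and `Follower`, the capacities, and the `db_reads` counter are all hypothetical, the database is a plain dictionary, and invalidations are delivered synchronously here whereas the slide says maintenance messages are asynchronous.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

    def invalidate(self, key):
        self.items.pop(key, None)


class Leader:
    """The only tier that talks to the database (DB I/O)."""

    def __init__(self, db, capacity=1024):
        self.db = db
        self.cache = LRUCache(capacity)
        self.followers = []
        self.db_reads = 0                    # instrumentation for the example

    def read(self, key):
        value = self.cache.get(key)
        if value is None:
            self.db_reads += 1
            value = self.db.get(key)
            if value is not None:
                self.cache.put(key, value)
        return value

    def write(self, key, value, origin):
        self.db[key] = value
        self.cache.put(key, value)
        for follower in self.followers:
            if follower is not origin:
                follower.cache.invalidate(key)  # synchronous here, for simplicity


class Follower:
    """The tier that clients talk to (Client I/O)."""

    def __init__(self, leader, capacity=128):
        self.leader = leader
        self.cache = LRUCache(capacity)
        leader.followers.append(self)

    def read(self, key):
        value = self.cache.get(key)
        if value is None:
            value = self.leader.read(key)
            if value is not None:
                self.cache.put(key, value)
        return value

    def write(self, key, value):
        self.leader.write(key, value, origin=self)
        self.cache.put(key, value)           # the writer sees its own write


# Usage: two followers miss on the same key, but the DB is read only once.
db = {"user:1": "Alice"}
leader = Leader(db)
f1, f2 = Follower(leader), Follower(leader)
f1.read("user:1")
f2.read("user:1")
reads_after_two_misses = leader.db_reads

f1.write("user:1", "Alicia")
seen_by_f2 = f2.read("user:1")
```

Because every follower miss is funneled through the leader, the database sees one read per key rather than one per follower, which is the effect the slide calls protection from thundering herds. A real leader would also have to coalesce concurrent in-flight requests, which this sketch does not model.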


Page 23: TAO: Facebook's Distributed Data Store for the Social Graph

Consistency

● Remember that we are eventually consistent!
● Write requests to slaves are forwarded to the master
○ If applied, all slaves are informed
○ Follower caches are invalidated via maintenance messages
● Propagated changesets use a version number to solve conflicts generated by stale data
● The master DB is the single source of truth
○ Requests can be marked as critical and will always be forwarded to the master DB (e.g., logins)
● Replication lag: <1s (85%), <3s (99%), <10s (99.8%)
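One way to picture the version-number rule is a cache that refuses to apply a changeset older than what it already holds. The sketch below is an assumption-laden illustration: the class `VersionedCache`, the method `apply_changeset`, the use of monotonically increasing integer versions, and the key format are all hypothetical and not taken from the paper.

```python
class VersionedCache:
    """Hypothetical cache that keeps a version next to every value."""

    def __init__(self):
        self.entries = {}  # key -> (version, value)

    def apply_changeset(self, key, version, value):
        """Apply an update only if it is newer than the cached entry."""
        current = self.entries.get(key)
        if current is not None and current[0] >= version:
            return False   # stale or duplicate changeset: ignore it
        self.entries[key] = (version, value)
        return True

    def get(self, key):
        entry = self.entries.get(key)
        return None if entry is None else entry[1]


# Usage: version 2 arrives before the delayed version 1.
cache = VersionedCache()
applied_new = cache.apply_changeset("post:7:likes", version=2, value=11)
applied_old = cache.apply_changeset("post:7:likes", version=1, value=10)
current_value = cache.get("post:7:likes")
```

With this rule, messages that arrive out of order cannot overwrite newer data with older data, so the cache converges on the latest version regardless of delivery order.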


Page 24: TAO: Facebook's Distributed Data Store for the Social Graph

Evaluation: request types

Random sample of 6.5 million requests over 40 days

Page 25: TAO: Facebook's Distributed Data Store for the Social Graph

Evaluation: read latency

Overall hit rate: 96.4%

Page 26: TAO: Facebook's Distributed Data Store for the Social Graph

Evaluation: write latency

Send packet US West - Netherlands - US West: 150ms

Page 27: TAO: Facebook's Distributed Data Store for the Social Graph

Evaluation: hit rate vs. throughput


Page 28: TAO: Facebook's Distributed Data Store for the Social Graph

Related work

● Spanner: Google’s globally distributed database
● Redis: in-memory key-value store
● Dynamo, Voldemort, COPS: distributed key-value stores
● BigTable, PNUTS, SimpleDB, HBase: NoSQL (NoACID)
● Pig Latin, Pregel: graph processing


Page 29: TAO: Facebook's Distributed Data Store for the Social Graph

Conclusions

● Paper describes a solution to a practical problem
● Data model, API and implementation for a read-intensive, eventually-consistent, geographically-distributed graph
● Simple data model, layered cache which incorporates application logic
● Interesting to see how they leverage domain knowledge to optimize the system
● Evaluation on real data from production system


Page 30: TAO: Facebook's Distributed Data Store for the Social Graph

Thank you!