TAO: Facebook's Distributed Data Store for the Social Graph

Papers We Love Iasi

Kickstart

Presentation online

http://j.mp/1S0wLZj

paperswelove.org

Why PWL Iasi?

● Because it’s cool● There is no theory-oriented community● Discuss language/ framework-agnostic topics● Bring together academia and industry● Get more people (and institutions?) interested in

actual Research & Development

4 presentations in 2016

Adrian-Tudor PanescuSoftware Engineer

Adrian BurlacuPhD, Automatic Control and Applied Informatics,

TUIASI

Alexandru ArchipPhD, Computer Engineering, TUIASI

Marius KloetzerPhD, Automatic Control and Applied Informatics,

TUIASI

Alexandra AdamThe Human Side™

And you!

● We are looking for speakers/ moderators!○ ~1 hour presentation/ open discussion on a significant

paper on your favourite subject/ field● And for sponsors

○ Mainly for ensuring that we have a venue for the meetups

● Feel free to contact me: adrian@panescu.com

https://github.com/papers-we-love/papers-we-love#how-to-read-a-paper

Code of Conduct and other info:http://www.meetup.com/Papers-We-Love-Iasi/

TAO: Facebook’s Distributed Data Store

for the Social Graph

Bronson et al., Facebook Inc., USENIX’13

● Billion reads and million writes per second● Petabyte-sized data set● Geographically spread● Users are unique and impatient

○ Privacy constraints must be satisfied at view time

How do we store and serve all this?

Before TAO

The 3 contributions

● Characterize Facebook’s workload● Describe a proper data model● Present an actual large-scale implementation

The Associations and Objects

The data model

● There are only 2 data types: nodes and edges○ Labeled directed multigraph○ You need only 2 tables in the DB

● Facebook leverages certain application characteristics:○ They don’t need a full graph query API○ “Most of the data is old, but many of the queries are for the newest

subset”● “Likely to be useful for any application domain that needs to efficiently

generate fine-grained customized content from highly interconnected data.”

“TAO provides basic access to the nodes and edges of a constantly

changing graph in data centers across multiple regions. It is optimised heavily

for reads, and explicitly favours efficiency and availability over

consistency.”8

CAP Principle

Consistency

Availability

Partitiontolerance

Towards Robust Distributed Systems, Eric A. Brewer, 2000 9

Summary

● 1 storage layer (MySQL)● 2 cache layers (custom, LRU) directly implementing the graph abstraction

○ Leader (DB I/O), protects DB from thundering herds○ Follower (Client I/O)○ Consistency maintained via asynchronous maintenance messages

● A full copy of Facebook’s data is stored in a cluster of data centers in geographical proximity

● A region has a master and multiple slaves (per shard!) deployments○ Writes are always forwarded first to the master

Consistency

● Remember that we are eventually consistent!● Write request to slaves forwarded to master

○ If applied, all slaves are informed○ Follower caches are invalidated via maintenance messages

● Propagated changesets use a version number to solve conflicts generated by stale data

● The master DB is the single source of truth○ Requests can be marked as critical and will always be forwarded to the

master DB (e.g., logins)● Replication lag: <1s (85%), <3s (99%), <10s (99.8%)

Evaluation: request types

Random sample of 6.5 million requests over 40 days 13

Evaluation: read latency

Overall hit rate: 96.4% 14

Evaluation: write latency

Send packet US West - Netherlands - US West: 150ms 15

Evaluation: hit rate vs. throughput

Related work

● Spanner: Google’s globally distributed database● Redis: in-memory key-value store● Dynamo, Voldemort, COPS: distributed key-value store● BigTable, PNUTS, SimpleDB, HBase: NoSQL (NoACID)● Pig Latin, Pregel: graph processing

Conclusions

● Paper describes a solution to a practical problem● Data model, API and implementation for a read-intensive,

eventually-consistent, geographically-distributed graph● Simple data model, layered cache which incorporates

application logic● Interesting to see how they leverage domain knowledge to

optimize the system● Evaluation on real data from production system

Thank you!

TAO: Facebook's Distributed Data Store for the Social Graph

Internet

Transcript of TAO: Facebook's Distributed Data Store for the Social Graph

Facebook's Counterclaims to Yahoo's Patent Lawsuit

Facebook's hack programming language

xsMobi: Optimize Pages for Facebook's Open Graph plus use Facebook-ready FMBL

Facebook's Q2 Earnings Slide Deck

Understanding Facebook's Open Graph

Tao: Facebook's Distributed Data Store For The Social Graphpavlo/courses/fall2013/static/slides/tao.pdf · Tao: Facebook's Distributed Data Store For The Social Graph Bronson et.

Final Facebook's Media Laws

Understanding Facebook's React.js

The History of Facebook's Developer Platform

BuddyMedia - Facebook's Edgerank

social media: facebook's educational implication

Facebook's dilemma case study

Facebook's Timeline Era

Facebook's Brain-Computer Interface

Facebook's culture of innovation

Facebook's Dismissal of Paul Ceglia Case

TAO Facebook’s Distributed Data Store for the Social Graph

Facebook's Global Impact By: Elizabeth Thomas

SMX London: Understanding Facebook's Graph Search

Facebook's Timeline Era - Managing Your Brand Through Facebook's Evolution