PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research
Mina Farid
University of Waterloo
CS 848 Presentation
8 February 2010
Outline
  Motivation
  Data and Query Model
  Consistency
  System Architecture
  Applications
  Experiments
Motivation
  Scalability
  Response time (SLAs)
  High availability and fault tolerance
  Relaxed consistency guarantees, lying between two extremes:
    Serializable transactions
    Eventual consistency: update any replica; all updates are propagated to all replicas, but potentially in different orders
Data and Query Model
  Simplified relational data model (tables, records, attributes)
  Flexible schemas
  Queries: selection and projection from a single table
    Targeted at specific applications
    Scans touch only a few records
    No ad-hoc queries
  Support for hashed and ordered tables
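To make the query model concrete, here is a minimal in-memory sketch. All names (`OrderedTable`, `select_project`) are hypothetical and not the PNUTS API: records are dictionaries with a flexible schema, an ordered table keeps its keys sorted so a short range scan is cheap, and a query is a selection plus a projection over one table. A hashed table would drop the sorted key list, keeping point lookups but losing ordered range scans.

```python
import bisect
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, Any]  # flexible schema: records need not share attributes


class OrderedTable:
    """Hypothetical in-memory ordered table: keys kept sorted for range scans."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # sorted primary keys
        self._rows: Dict[str, Record] = {}  # primary key -> record

    def set(self, key: str, record: Record) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep key order for scans
        self._rows[key] = record

    def get(self, key: str) -> Optional[Record]:
        return self._rows.get(key)

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, Record]]:
        """Yield (key, record) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def select_project(rows, predicate: Callable[[Record], bool], attrs: List[str]) -> List[Record]:
    """Selection (keep records matching predicate), then projection (keep attrs)."""
    return [{a: rec[a] for a in attrs if a in rec} for _, rec in rows if predicate(rec)]


table = OrderedTable()
table.set("apple", {"price": 1.0, "color": "red"})
table.set("banana", {"price": 0.5})  # no "color" attribute: schemas are flexible
table.set("grape", {"price": 2.0, "color": "green"})
cheap = select_project(table.scan("apple", "grape"), lambda r: r["price"] < 1.5, ["price", "color"])
# cheap == [{"price": 1.0, "color": "red"}, {"price": 0.5}]
```

Note that projection silently skips attributes a record does not have, which is one simple way to cope with flexible schemas.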
Consistency
  PNUTS sits in between two extremes: general serializability and eventual consistency
  Guarantees concern updates to one record at a time
  Per-record timeline consistency: all replicas of a record apply its updates in the same order
  For a given version, all replicas contain the same information
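Per-record timeline consistency can be illustrated with a small simulation, a sketch under simplifying assumptions rather than how PNUTS is built: the master copy assigns consecutive version numbers to writes, and every replica applies versions strictly in sequence. A replica may lag behind the master, but it never sees the record's updates out of order. The classes `Replica` and `Master` are hypothetical, and raising an error on a gap stands in for the ordering that the real system obtains from its message broker.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Update = Tuple[int, Any]  # (version number, new value)


@dataclass
class Replica:
    """One copy of a single record; applies updates strictly in version order."""
    version: int = 0
    value: Any = None
    history: List[Update] = field(default_factory=list)

    def apply(self, version: int, value: Any) -> None:
        # Timeline consistency: a replica may lag, but never skips or reorders versions.
        if version != self.version + 1:
            raise ValueError(f"update {version} arrived while at version {self.version}")
        self.version, self.value = version, value
        self.history.append((version, value))


class Master(Replica):
    """The master copy serializes all writes to the record by assigning versions."""

    def write(self, value: Any) -> Update:
        update = (self.version + 1, value)
        self.apply(*update)
        return update  # would be published asynchronously to the other replicas


master = Master()
fresh, lagging = Replica(), Replica()
log = [master.write(v) for v in ("a", "b", "c")]
for update in log:
    fresh.apply(*update)
for update in log[:2]:  # the third update has not reached this replica yet
    lagging.apply(*update)
```

Here `lagging` is stale (version 2, value "b"), but its history is a prefix of the master's, so for any version it holds, it agrees with every other replica.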
Consistency (cont’d)
  Each record has a master replica; updates are forwarded to this master replica
  The master record carries the version information
  API calls and their consistency levels:
    Read-any
    Read-critical(required_version)
    Read-latest
    Write
    Test-and-set-write(required_version)
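The API calls above trade freshness for latency. The sketch below models one record with a master copy (the full version log) and one local replica that may be stale: read-any returns whatever the local replica holds, read-critical returns a version at least as new as `required_version`, read-latest returns the newest committed version, and test-and-set-write succeeds only if the record is still at `required_version`. The class `ReplicatedRecord` and its `sync` method are hypothetical; the method names only mirror the calls on the slide.

```python
from typing import Any, List, Optional, Tuple


class ReplicatedRecord:
    """Hypothetical model of one record: a master log plus one lagging local replica."""

    def __init__(self, value: Any) -> None:
        self.log: List[Any] = [value]  # master timeline; version i is log[i - 1]
        self.local_version = 1         # how far the local replica has caught up

    def read_any(self) -> Tuple[int, Any]:
        # Cheapest read: whatever the local replica has, possibly stale.
        return self.local_version, self.log[self.local_version - 1]

    def read_critical(self, required_version: int) -> Tuple[int, Any]:
        # Any version >= required_version: use the local copy if it is new enough.
        if self.local_version >= required_version:
            return self.read_any()
        return self.read_latest()

    def read_latest(self) -> Tuple[int, Any]:
        # Latest committed version, served by the master copy.
        return len(self.log), self.log[-1]

    def write(self, value: Any) -> int:
        self.log.append(value)  # the master appends to the record's timeline
        return len(self.log)

    def test_and_set_write(self, required_version: int, value: Any) -> Optional[int]:
        # Write only if nobody has updated the record since required_version.
        if len(self.log) != required_version:
            return None
        return self.write(value)

    def sync(self) -> None:
        # Stand-in for asynchronous propagation catching the local replica up.
        self.local_version = len(self.log)


rec = ReplicatedRecord("v1")
new_version = rec.write("v2")              # 2
stale = rec.read_any()                     # (1, "v1"): local replica not yet updated
critical = rec.read_critical(2)            # (2, "v2"): had to go to the master
rejected = rec.test_and_set_write(1, "x")  # None: record already moved past version 1
accepted = rec.test_and_set_write(2, "v3") # 3
rec.sync()
```

Test-and-set-write is the building block for optimistic read-modify-write: read a version, compute, and write back only if no one else wrote in between.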
System Architecture
[Figure: a region containing a Tablet Controller, Routers, a Message Broker, and Storage Units 1 to N. The tablet assignment maps T1 to SU1, T2 to SU2, T3 to SU3, and T4 to SU1.]
System Architecture – Data Storage and Retrieval
  Each region has a full complement of system components and a complete copy of the data
  Tables are partitioned into tablets; a tablet is simply a group of records from one table
  Tablets are stored on storage unit servers
  Storage units respond to:
    get()
    scan()
    set()
Routers’ Mapping – Ordered Table
  Routers decide:
    Which tablets contain which records
    Which storage unit (SU) holds which tablets
[Figure: the key space from MIN_STRING to MAX_STRING is split at Banana, Grape, and Lemon into Tablets 1 to 4.]
  Interval mapping (first key of tablet → tablet → storage unit):
    MIN_STRING → T1 → SU1
    Banana → T2 → SU2
    Grape → T3 → SU3
    Lemon → T4 → SU1 (up to MAX_STRING)
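A router's lookup on the interval mapping above amounts to a binary search over the tablet boundary keys, followed by a second lookup from tablet to storage unit. This is a minimal sketch: the constants copy the mapping on the slide, the empty string stands in for MIN_STRING, and plain string comparison is assumed (so capitalization matters). A hashed table could apply the same lookup to a hash of the key instead of the key itself.

```python
import bisect
from typing import Dict, List

# Interval mapping from the slide: each tablet starts at its boundary key.
# The empty string models MIN_STRING because it sorts before every other key.
BOUNDARIES: List[str] = ["", "Banana", "Grape", "Lemon"]
TABLETS: List[str] = ["T1", "T2", "T3", "T4"]
TABLET_TO_SU: Dict[str, str] = {"T1": "SU1", "T2": "SU2", "T3": "SU3", "T4": "SU1"}


def tablet_for(key: str) -> str:
    """Return the tablet whose key range contains `key`."""
    # bisect_right counts boundaries <= key, so subtracting 1 gives the
    # index of the last boundary that does not exceed the key.
    idx = bisect.bisect_right(BOUNDARIES, key) - 1
    return TABLETS[idx]


def route(key: str) -> str:
    """Router lookup: key -> tablet -> storage unit."""
    return TABLET_TO_SU[tablet_for(key)]


# "Cherry" falls between Banana and Grape, so it lives in T2 on SU2.
cherry_su = route("Cherry")
```

Because the mapping is small and changes only when tablets split or move, routers can keep it in memory and answer each lookup in logarithmic time.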
System Architecture
[Figure: a region (Tablet Controller, Routers, Message Broker, Storage Units 1 to N) in which the routers hold copies of the interval mapping (MIN → T1, Banana → T2, Grape → T3, Lemon → T4) and of the tablet assignment (T1 → SU1, T2 → SU2, T3 → SU3, T4 → SU1) maintained by the tablet controller.]
System Architecture
[Figure: two regions (Region 1 and Region 2), each with its own Tablet Controller, Routers, Message Broker, and Storage Units, and each holding the same interval mapping (MIN → T1, Banana → T2, Grape → T3, Lemon → T4) and tablet assignment (T1 → SU1, T2 → SU2, T3 → SU3, T4 → SU1).]
System Architecture – Replication and Consistency
1- Yahoo! Message Broker (YMB)
  Reliable, topic-based publish/subscribe system
  Updates are propagated asynchronously to all replicas
  Provides only ‘partial ordering’:
    Messages published to a particular YMB will be delivered to all subscribers in the same order
    Messages published to different YMBs may be delivered in any order
  Solution: per-record mastership
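The partial-ordering guarantee can be simulated by delivering each broker's messages in FIFO order while interleaving different brokers at random. In this sketch the broker names and the `deliver` function are hypothetical; it shows that every subscriber sees each broker's messages in publication order, even though the cross-broker interleaving can differ per subscriber. This is why per-record mastership helps: if every update to a record is published through its master's broker, all replicas receive that record's updates in the same order.

```python
import random
from typing import Dict, List, Tuple

Message = Tuple[str, int]  # (broker name, sequence number within that broker)


def deliver(published: Dict[str, List[Message]], rng: random.Random) -> List[Message]:
    """Deliver all messages to one subscriber.

    Each broker's messages arrive in publication order (FIFO per broker),
    but messages from different brokers are interleaved arbitrarily.
    """
    queues = {broker: list(msgs) for broker, msgs in published.items()}
    out: List[Message] = []
    while any(queues.values()):
        broker = rng.choice([b for b, q in queues.items() if q])
        out.append(queues[broker].pop(0))
    return out


published = {
    "YMB-west": [("YMB-west", i) for i in range(3)],
    "YMB-east": [("YMB-east", i) for i in range(3)],
}
sub_a = deliver(published, random.Random(1))
sub_b = deliver(published, random.Random(2))

# Per-broker order is identical at every subscriber; cross-broker order is not guaranteed.
for broker, msgs in published.items():
    assert [m for m in sub_a if m[0] == broker] == msgs
    assert [m for m in sub_b if m[0] == broker] == msgs
```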
System Architecture – Replication and Consistency
2- Consistency and Record Mastership
  One copy of each record is designated the master; updates are forwarded to that master copy
  An update commits once it is published to the message broker
  Different records in the same table can be mastered in different clusters
  Which copy is the master, and how is it selected?
    Each record carries metadata identifying its master copy (this can change)
    Mastership goes to the copy receiving the most updates
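Mastership migration toward the copy that receives the most updates can be sketched as a small piece of per-record metadata. The window of three recent updates and the strict-majority rule below are illustrative assumptions, not details taken from the slides; `RecordMeta` is a hypothetical name.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class RecordMeta:
    """Per-record metadata: current master region plus origins of recent updates.

    The window size and majority rule are assumptions made for illustration.
    """
    master: str
    window: int = 3
    recent: Deque[str] = field(default_factory=deque)

    def record_update(self, origin_region: str) -> str:
        """Note where an update came from; migrate mastership if another region dominates."""
        self.recent.append(origin_region)
        if len(self.recent) > self.window:
            self.recent.popleft()  # keep only the last `window` origins
        region, count = Counter(self.recent).most_common(1)[0]
        if region != self.master and count > self.window // 2:
            self.master = region   # metadata now names the new master copy
        return self.master


meta = RecordMeta(master="west")
masters = [meta.record_update(r) for r in ["west", "east", "east", "west"]]
# masters == ["west", "west", "east", "east"]
```

Requiring a majority of a recent window, rather than moving on every remote write, keeps mastership from bouncing between regions when updates arrive from several places.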
Query Processing
  Multi-record queries are handled by a scatter-gather engine (part of the router), which:
    Splits a multi-record request into multiple single-record requests
    Issues those requests in parallel
    Assembles and evaluates the results, and sends them back to the client
    Handles range and scan queries (and also supports top-k)
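The scatter-gather steps above can be sketched with a thread pool: one single-record get per key is issued in parallel, then the results are filtered and optionally reduced to the top k before being returned. The `fetch` callable stands in for a storage unit's get(), and `scatter_gather`, `STORE`, and the `score` attribute are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, int]


def scatter_gather(
    keys: List[str],
    fetch: Callable[[str], Optional[Record]],
    predicate: Callable[[Record], bool] = lambda r: True,
    top_k: Optional[Tuple[str, int]] = None,
) -> List[Tuple[str, Record]]:
    """Split a multi-record request into single-record gets, run them in
    parallel, then evaluate the predicate (and optional top-k) on the results."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch, keys))  # scatter: one get per key, order kept
    rows = [(k, r) for k, r in zip(keys, results) if r is not None and predicate(r)]
    if top_k is not None:                      # gather: optional top-k by one attribute
        attr, k = top_k
        rows = sorted(rows, key=lambda kr: kr[1][attr], reverse=True)[:k]
    return rows


STORE: Dict[str, Record] = {"alice": {"score": 7}, "bob": {"score": 3}, "carol": {"score": 9}}
best_two = scatter_gather(["alice", "bob", "carol", "dave"], STORE.get, top_k=("score", 2))
high = scatter_gather(["alice", "bob", "carol"], STORE.get, predicate=lambda r: r["score"] > 5)
```

Missing keys (here "dave") simply yield no row, and `pool.map` returns results in request order, so the assembled answer does not depend on which get finishes first.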
Applications
  User databases: millions of records, frequent updates, important data, relaxed consistency
  Social applications: flexible schemas, large number of small updates, no real-time requirements (relaxed consistency)
  Content metadata: manage structured metadata; scalable and consistent
  Session data: scalable storage to manage state, but only low consistency is required
Experiments
  Main criterion: average request latency (response time)
  Experimental setup: 3 regions (2 West, 1 East)
  1- Inserting data
  2- Varying load
  3- Varying number of storage units
Future Enhancements
  Planned features include:
    Indexing and materialized views
    Bundled updates (atomic, non-isolated updates to multiple records)
Conclusion
Thank You! Questions?
Google BigTable
  Record-oriented access to very large tables
  Does not support:
    Geographic replication
    Secondary indexes
    Materialized views
    Hash-organized tables
Dynamo
  Focuses on availability
  Provides geographic replication via a ‘gossip’ mechanism
  Its eventual consistency model does not suit all applications: “updates are committed in different orders at different replicas”, and the replicas are reconciled later (updates may be rolled back)
  Does not support: ordered tables
Boxwood
  Provides a B-tree implementation
  The design favors consistency over scalability (tens of machines)