Pnuts
Transcript of Pnuts
1
PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research
2
Motivation
• Web applications need:
o Scalability - architectural scalability, scale linearly
o Geographic scope - data replicas on multiple continents
o High availability - despite failures, apps will still be able to read data
o Relaxed consistency needs - tolerate stale or reordered data
3
Relaxed Consistency
• Not strict consistency
o Very expensive.
• Not eventual consistency
o Ex: a photo sharing application
o U1: Remove someone from the list of people who can view his photos
o U2: Post spring-break photos
o If a replica applies U2 before U1, the removed person can still see the new photos
4
What is PNUTS?
• PNUTS is a massively parallel and geographically distributed database system for Yahoo!’s web applications.
• Its architecture is based on record-level, asynchronous geographic replication, and on the use of a guaranteed message-delivery service rather than a persistent log.
5
System architecture
6
System architecture
• Storage Units
o Store several hundred tablets; a tablet is usually several hundred megabytes.
• Routers
o The router stores an interval mapping, which defines the boundaries of each tablet, and also maps each tablet to a storage unit.
• Tablet Controller
o Routers contain only a cached copy of the interval mapping. The mapping is owned by the tablet controller.
• YMB - Yahoo Message Broker
o Topic-based pub/sub system
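The interval mapping held by a router can be pictured as a sorted list of tablet boundaries plus a tablet-to-storage-unit table. The following is a minimal sketch under that assumption, not the PNUTS implementation; the `Router` class, its method names and the example keys are hypothetical.

```python
from bisect import bisect_right


class Router:
    """Hypothetical router holding a cached copy of the interval mapping."""

    def __init__(self, boundaries, storage_units):
        # boundaries[i] is the smallest key of tablet i + 1 (assumed layout);
        # tablet 0 covers every key below boundaries[0].
        if len(storage_units) != len(boundaries) + 1:
            raise ValueError("need exactly one storage unit per tablet")
        self.boundaries = list(boundaries)
        self.storage_units = list(storage_units)

    def tablet_for(self, key):
        # Binary search over the sorted boundaries finds the enclosing tablet.
        return bisect_right(self.boundaries, key)

    def storage_unit_for(self, key):
        # Second half of the mapping: tablet index -> storage unit.
        return self.storage_units[self.tablet_for(key)]


# Three tablets: keys < "g", "g" <= keys < "p", keys >= "p".
router = Router(boundaries=["g", "p"], storage_units=["su1", "su2", "su1"])
```

Because the router only caches the mapping, a real system would also need a way to refresh it from the tablet controller; that step is left out here.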
7
Yahoo Message Broker
• Distributed publish-subscribe service.
• Guarantees delivery once a message is published.
• Published updates are asynchronously propagated to different regions and applied to their replicas.
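A topic-based pub/sub service with asynchronous delivery can be sketched as a per-topic message log plus one read cursor per subscriber. This is an in-memory illustration only; the `MessageBroker` class and its methods are hypothetical and say nothing about how YMB is actually built.

```python
from collections import defaultdict


class MessageBroker:
    """Hypothetical in-memory topic-based publish-subscribe broker."""

    def __init__(self):
        self._log = defaultdict(list)  # topic -> every message published so far
        self._cursor = {}              # (topic, subscriber) -> next index to deliver

    def subscribe(self, topic, subscriber):
        # A new subscriber starts at the current end of the topic's log.
        self._cursor[(topic, subscriber)] = len(self._log[topic])

    def publish(self, topic, message):
        # Once appended, the message stays in the log until every subscriber reads it.
        self._log[topic].append(message)

    def poll(self, topic, subscriber):
        # Each subscriber catches up at its own pace (asynchronous delivery),
        # but always sees messages in publish order.
        start = self._cursor[(topic, subscriber)]
        pending = self._log[topic][start:]
        self._cursor[(topic, subscriber)] = len(self._log[topic])
        return pending
```

A region that polls late simply receives a longer batch, in the same order as every other region.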
8
Types of Table
9
Tablet splitting and balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• Storage unit may become a hotspot
• Shed load by moving tablets to other servers
[Figure: storage units and the tablets they hold]
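The two balancing steps can be sketched in a few lines. The slide does not say how a split point is chosen or how load is measured, so the median-key split and the tablet-count load measure below are assumptions, and both function names are hypothetical.

```python
def split_tablet(records, max_records):
    """Split an overfull tablet (a dict of key -> value) at its median key.

    Assumption: "overfull" means more than max_records records.
    """
    if len(records) <= max_records:
        return [records]
    keys = sorted(records)
    mid = len(keys) // 2
    low = {k: records[k] for k in keys[:mid]}
    high = {k: records[k] for k in keys[mid:]}
    return [low, high]


def shed_load(assignment):
    """Move one tablet from the fullest storage unit to the emptiest one.

    assignment maps storage unit name -> list of tablet ids and is updated
    in place. Assumption: tablet count stands in for load.
    """
    busiest = max(assignment, key=lambda su: len(assignment[su]))
    idlest = min(assignment, key=lambda su: len(assignment[su]))
    if len(assignment[busiest]) - len(assignment[idlest]) > 1:
        assignment[idlest].append(assignment[busiest].pop())
    return assignment
```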
10
Query processing
11
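Accessing data
[Figure: a Get for key k is forwarded to one of several storage units (SU), and the record is returned along the same path]
1. Get key k
2. Get key k
3. Record for key k
4. Record for key k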
12
Bulk read
[Figure: a scatter/gather engine in front of several storage units (SU)]
1. {k1, k2, … kn}
2. Get k1, Get k2, Get k3
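The scatter/gather step can be sketched with a thread pool: the key set is scattered as one get per key, and the answers are gathered into a single result. This is an illustration only; `bulk_read`, `locate` and the dict-based storage units are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def bulk_read(keys, locate, storage_units):
    """Fetch many keys in parallel and gather the results into one dict.

    locate(key) returns the name of the storage unit holding the key;
    storage_units maps that name to a plain dict standing in for the unit.
    """
    def get(key):
        # One "Get k" request against the storage unit that owns the key.
        return key, storage_units[locate(key)].get(key)

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Scatter: one task per key. Gather: collect the (key, record) pairs.
        return dict(pool.map(get, keys))
```

Keys that are missing from their storage unit come back as `None` in this sketch.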
13
Per-record timeline consistency
• All replicas of a given record apply all updates to the record in the same order.
14
Per-record timeline consistency
• An example sequence of updates to a record
• 3 events: insert, update and delete.
• One replica assigned as the master
• Generation: advances on each new insert
• Version: advances on each update
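One way to picture "same order at every replica" is a replica that only applies an update when it is the next version in line. The slides do not say how ordering is enforced, so the buffering below is an assumption, and the `Replica` class is hypothetical.

```python
class Replica:
    """Hypothetical replica that applies updates strictly in version order."""

    def __init__(self):
        self.version = 0    # newest version applied so far (0 = nothing yet)
        self.value = None
        self._pending = {}  # version -> value, for updates that arrived early

    def receive(self, version, value):
        if version <= self.version:
            return  # already applied; ignore the duplicate
        self._pending[version] = value
        # Apply every update that is next in line; hold back the rest.
        while self.version + 1 in self._pending:
            self.version += 1
            self.value = self._pending.pop(self.version)
```

Two replicas that receive the same updates in different network orders end up in the same state and never move backwards along the timeline.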
15
Consistency model
• Goal: make it easier for applications to reason about updates and cope with asynchrony
• Web applications typically manipulate one record at a time
[Figure: timeline of a single record in Generation 1: the record is inserted, updated repeatedly and finally deleted, producing versions v. 1 through v. 8]
16
Consistency model
Read-any: Returns a possibly stale version of the record.
[Figure: record timeline v. 1 to v. 8 (Generation 1) with stale versions and the current version marked, and a Read-any call shown against it]
17
Consistency model
Read latest: Returns the latest copy of the record that reflects all writes that have succeeded.
[Figure: record timeline v. 1 to v. 8 (Generation 1) with stale versions and the current version marked, and a Read latest call shown against it]
18
Consistency model
Read-critical(required version): Returns a version of the record that is strictly newer than, or the same as, the required version.
[Figure: record timeline v. 1 to v. 8 (Generation 1) with stale versions and the current version marked, and a "Read ≥ v.6" call shown against it]
19
Consistency model
Test-and-set-write(required version): Performs the requested write to the record if and only if the present version of the record is the same as the required version.
[Figure: record timeline v. 1 to v. 8 (Generation 1) with stale versions and the current version marked; a "Write if = v.7" call returns ERROR]
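The four calls can be sketched against a single record whose master keeps the full version history while a local replica lags behind. This is a toy model under stated assumptions, not the PNUTS API: the `RecordStore` class, `VersionError`, the `replicate` helper and the fallback to the master in `read_critical` are all hypothetical.

```python
class VersionError(Exception):
    """Raised when a required version cannot be satisfied."""


class RecordStore:
    """Hypothetical single record: master history plus a lagging local replica."""

    def __init__(self, value):
        self.history = [value]    # history[i] is version i + 1 at the master
        self.replica_version = 1  # newest version the local replica has applied

    def write(self, value):
        self.history.append(value)
        return len(self.history)  # the new version number

    def replicate(self, up_to):
        # Pretend the replica has caught up to version `up_to`.
        self.replica_version = min(up_to, len(self.history))

    def read_any(self):
        # May be stale: answers from whatever the local replica has.
        v = self.replica_version
        return v, self.history[v - 1]

    def read_latest(self):
        # Reflects all successful writes: answers from the master copy.
        v = len(self.history)
        return v, self.history[v - 1]

    def read_critical(self, required_version):
        if required_version > len(self.history):
            raise VersionError("required version does not exist yet")
        if self.replica_version >= required_version:
            return self.read_any()
        return self.read_latest()  # assumption: fall back to the master

    def test_and_set_write(self, required_version, value):
        if len(self.history) != required_version:
            raise VersionError("record has moved past the required version")
        return self.write(value)
```

With eight versions written, `test_and_set_write(7, ...)` raises `VersionError`, which mirrors the "Write if = v.7" ERROR case in the figure.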
20
Consistency model
Mechanism: per-record mastership
[Figure: record timeline v. 1 to v. 8 (Generation 1) with stale versions and the current version marked; a "Write if = v.7" call returns ERROR]
Consistency levels
• Eventual consistency
o Transactions:
  • Alice changes status from “Sleeping” to “Awake”
  • Alice changes location from “Home” to “Work”
o Region 1 (applies Awake, then Work): (Alice, Home, Sleeping) -> (Alice, Home, Awake) -> (Alice, Work, Awake)
o Region 2 (applies Work, then Awake): (Alice, Home, Sleeping) -> (Alice, Work, Sleeping) -> (Alice, Work, Awake)
o “Invalid” state visible: (Alice, Work, Sleeping) in Region 2
o Final state consistent
Consistency levels
• Timeline consistency
o Transactions:
  • Alice changes status from “Sleeping” to “Awake”
  • Alice changes location from “Home” to “Work”
o Region 1 (applies Awake, then Work): (Alice, Home, Sleeping) -> (Alice, Home, Awake) -> (Alice, Work, Awake)
o Region 2 (applies Awake, then Work): (Alice, Home, Sleeping) -> (Alice, Work, Awake)
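The Alice example can be replayed in a few lines to show where the “invalid” state comes from. This is an illustrative sketch; `apply_updates` and the dict representation of the record are assumptions made for the example.

```python
def apply_updates(record, updates):
    """Return every state a replica passes through, starting at `record`."""
    states = [dict(record)]
    for field, value in updates:
        record = {**record, field: value}
        states.append(record)
    return states


start = {"name": "Alice", "location": "Home", "status": "Sleeping"}
awake = ("status", "Awake")
work = ("location", "Work")
invalid = {"name": "Alice", "location": "Work", "status": "Sleeping"}

# Eventual consistency: regions may apply the two updates in different orders.
eventual_region1 = apply_updates(start, [awake, work])
eventual_region2 = apply_updates(start, [work, awake])

# Timeline consistency: one order is fixed for the record and every region follows it.
timeline_order = [awake, work]
timeline_region1 = apply_updates(start, timeline_order)
timeline_region2 = apply_updates(start, timeline_order)
```

Under eventual consistency Region 2 passes through `invalid` before converging; under timeline consistency no region ever does. A real replica may skip intermediate versions, as Region 2 does on the slide, but it never sees a state outside the timeline.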
23
Experiments
24
Experimental setup
• Production PNUTS code
o Enhanced with ordered table type
• Three PNUTS regions
o 2 west coast, 1 east coast
o 5 storage units, 2 message brokers, 1 router
• Workload parameters
o Request rate: 1200-3600 requests/second
o Read:write mix ratio: 0-50% writes
o Locality: 80%
25
Inserts
• Inserts required:
o 75.6 ms per insert in West 1 (tablet master)
o 131.5 ms per insert into the non-master West 2
o 315.5 ms per insert into the non-master East
• These results show the expected effect: the cost of inserting is significantly higher if the insert is initiated in a non-master region that is far away from the tablet master.
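The size of the non-master penalty follows directly from the three reported latencies. The short calculation below only restates those numbers as an added delay and a slowdown factor relative to the master region; the variable names are illustrative.

```python
# Reported insert latencies in milliseconds, by region where the insert starts.
insert_latency_ms = {"West 1": 75.6, "West 2": 131.5, "East": 315.5}
master_ms = insert_latency_ms["West 1"]  # West 1 hosts the tablet master

# For each region: (extra milliseconds over the master, slowdown factor).
overhead = {
    region: (round(ms - master_ms, 1), round(ms / master_ms, 2))
    for region, ms in insert_latency_ms.items()
}

for region, (extra_ms, factor) in overhead.items():
    print(f"{region}: +{extra_ms} ms, {factor}x the master latency")
```

Inserts from West 2 take about 1.7 times as long as at the master, and inserts from East a little over 4 times as long.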
26
10% writes by default
Lessons learned (1)
• Simpler is better than clever
o Clever approaches are hard to implement, test, debug and maintain
• Incremental is better than big-bang
Lessons learned (2)
• Non-algorithmic challenges can be hard
o Dealing with network config, legacy software and requirements, the “corporate way,” multiple stakeholders…
• Researchers should get dirty hands
o Being a part of shipping a real system can radically readjust your worldview
o Write some test cases to understand system complexity