PNUTS: Yahoo!’s Hosted Data Serving Platform (Yahoo! Research). Presented by Liyan & Fang.
What does a web application need?
• Scalability
– Architectural scalability
– Scale during periods of rapid growth with minimal operational effort
• Response Time and Geographic Scope
– Fast response time to geographically distributed users
• High Availability and Fault Tolerance
– Read and even write data during failures
• Relaxed Consistency Guarantees
– Eventual consistency: update one replica first and then update the others
What do we need from our DBMS?
• Web applications need:
– Scalability
  • And the ability to scale linearly
– Geographic scope
– High availability
• Web applications typically have:
– Simplified query needs
  • No joins, aggregations
– Relaxed consistency needs
  • Applications can tolerate stale or reordered data
What is PNUTS?
• Parallel database
• Geographic replication
• Indexes and views
• Structured, flexible schema
• Hosted, managed infrastructure
CREATE TABLE Parts (
ID VARCHAR,
StockNumber INT,
Status VARCHAR
…
)
Example records:
A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E
Query model
• Per-record operations
– Get
– Set
– Delete
• Multi-record operations
– Multiget
– Scan
– Getrange
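The six operations listed above can be illustrated with a small in-memory stand-in. This is only a sketch: the class name `ToyTable`, the method signatures, and the half-open interval used by `getrange` are assumptions for illustration, not the PNUTS API.

```python
from bisect import bisect_left


class ToyTable:
    """In-memory stand-in for one table (hypothetical, not the PNUTS API)."""

    def __init__(self):
        self._rows = {}

    # Per-record operations
    def get(self, key):
        return self._rows.get(key)

    def set(self, key, value):
        self._rows[key] = value

    def delete(self, key):
        self._rows.pop(key, None)

    # Multi-record operations
    def multiget(self, keys):
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self, predicate=lambda key, value: True):
        # Visit every record in key order and keep those matching the predicate.
        return [(k, v) for k, v in sorted(self._rows.items()) if predicate(k, v)]

    def getrange(self, low, high):
        # Assumed semantics: keys in the half-open interval [low, high).
        keys = sorted(self._rows)
        start, end = bisect_left(keys, low), bisect_left(keys, high)
        return [(k, self._rows[k]) for k in keys[start:end]]


parts = ToyTable()
for key, stock, status in [("A", 42342, "E"), ("B", 42521, "W"), ("C", 66354, "W")]:
    parts.set(key, {"StockNumber": stock, "Status": status})
```

Here `getrange` only makes sense when keys are kept in order, which is why the sketch sorts them before slicing.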
Detailed architecture
Data-path components: clients, REST API, routers, tablet controller, storage units, message broker.
Data tables are horizontally partitioned into groups of records called tablets.
• Storage units
– Store tablets
– Respond to get() and scan() requests by retrieving and returning matching records
– Respond to set() requests by processing the update
– To commit an update, the storage unit must first write it to the message broker
• Router
– Determines which storage unit is responsible for a given record to be read or written by the client
– First determines which tablet contains the record, then which storage unit has that tablet
• Tablet controller
– Determines when it is time to move a tablet between storage units for load balancing or recovery, or when a large tablet must be split
– The routers’ copy of the interval mapping is then updated
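The two-step lookup (record to tablet, tablet to storage unit) and the effect of moving a tablet can be sketched as follows. The class `IntervalMapping`, its method names, the boundary keys, and the use of a deep copy to model the router's copy are all assumptions made for illustration.

```python
import copy
from bisect import bisect_right


class IntervalMapping:
    """Sorted tablet boundaries plus the storage unit that holds each tablet."""

    def __init__(self, boundaries, storage_units):
        # boundaries[i] is the first key of tablet i + 1; tablet 0 starts at MIN.
        if len(storage_units) != len(boundaries) + 1:
            raise ValueError("need exactly one storage unit per tablet")
        self.boundaries = list(boundaries)
        self.storage_units = list(storage_units)

    def tablet_for(self, key):
        # Step 1: which tablet contains the record?
        return bisect_right(self.boundaries, key)

    def storage_unit_for(self, key):
        # Step 2: which storage unit has that tablet?
        return self.storage_units[self.tablet_for(key)]

    def move_tablet(self, tablet, new_storage_unit):
        self.storage_units[tablet] = new_storage_unit


# The tablet controller's mapping, and a router holding a copy of it.
controller_mapping = IntervalMapping(["g", "n", "t"], ["SU1", "SU2", "SU3", "SU1"])
router_mapping = copy.deepcopy(controller_mapping)

controller_mapping.move_tablet(3, "SU2")                  # e.g. load balancing
stale_answer = router_mapping.storage_unit_for("zebra")   # router copy not yet updated
router_mapping = copy.deepcopy(controller_mapping)        # router refreshes its copy
fresh_answer = router_mapping.storage_unit_for("zebra")
```

The sketch shows why the router's copy has to be refreshed after a move: until then it still points at the old storage unit.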
Detailed architecture
Local region: clients, REST API, routers, tablet controller, storage units. YMB (Yahoo! Message Broker) connects the local region to the remote regions.
• Record-level mastering: mastership is assigned on a record-by-record basis, and different records in the same table can be mastered in different clusters.
• Over one week of observed updates, 85 percent of the writes to a given record originated in the same datacenter.
• A master publishes its updates to a single broker, and thus updates are delivered to replicas in commit order.
• YMB takes multiple steps to ensure messages are not lost before they are applied to the database.
• Messages published to one YMB cluster are relayed to other YMB clusters for delivery to local subscribers.
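Range queries
Router interval mapping (tablet → storage unit):
MIN-Canteloupe → SU1
Canteloupe-Lime → SU3
Lime-Strawberry → SU2
Strawberry-MAX → SU1
Keys by tablet (stored on storage units 1 to 3):
MIN-Canteloupe: Apple, Avocado, Banana, Blueberry
Canteloupe-Lime: Canteloupe, Grape, Kiwi, Lemon
Lime-Strawberry: Lime, Mango, Orange, Pear
Strawberry-MAX: Strawberry, Tomato, Watermelon
The router splits the range query Grapefruit…Pear? into Grapefruit…Lime? and Lime…Pear?
Mapping listed by storage unit:
SU1: Strawberry-MAX
SU2: Lime-Strawberry
SU3: Canteloupe-Lime
SU4: MIN-Canteloupe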
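How a router might split the Grapefruit…Pear query along tablet boundaries can be sketched as below. The boundaries and owners are copied from the slide's interval mapping (with the slide's spelling of the keys); the function name `split_range` and the half-open treatment of the range are assumptions.

```python
from bisect import bisect_right

# Tablet boundaries and owners taken from the slide's interval mapping.
BOUNDARIES = ["Canteloupe", "Lime", "Strawberry"]
OWNERS = ["SU1", "SU3", "SU2", "SU1"]  # MIN-Canteloupe, Canteloupe-Lime, ...


def split_range(low, high):
    """Split the half-open key range [low, high) into per-tablet sub-ranges."""
    pieces = []
    tablet = bisect_right(BOUNDARIES, low)
    while low < high:
        # Upper edge of the current tablet, or `high` if this is the last tablet.
        upper = BOUNDARIES[tablet] if tablet < len(BOUNDARIES) else high
        end = min(upper, high)
        pieces.append((low, end, OWNERS[tablet]))
        low = end
        tablet += 1
    return pieces


sub_queries = split_range("Grapefruit", "Pear")
```

With this mapping the query becomes two sub-queries, one per tablet, matching the Grapefruit…Lime and Lime…Pear split on the slide.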
Updates
Components in the figure: routers, storage units (SU), message brokers.
1. Write key k
2. Write key k
3. Write key k
4. (unlabeled)
5. SUCCESS
6. Write key k
7. Sequence # for key k
8. Sequence # for key k
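One plausible reading of the numbered steps is sketched below: the write is routed to a master storage unit, published to the message broker, acknowledged with SUCCESS, delivered to the other storage units, and answered with a sequence number. Which component sends each message is an assumption, as are all class and function names; the figure's labels alone do not pin this down.

```python
class MessageBroker:
    """Toy broker: an ordered log that is later delivered to subscribers."""

    def __init__(self):
        self.pending = []
        self.subscribers = []

    def publish(self, message):
        self.pending.append(message)   # assumed commit point for the write
        return "SUCCESS"

    def deliver(self):
        # Subscribers see messages in the order they were published.
        for message in self.pending:
            for storage_unit in self.subscribers:
                storage_unit.apply(message)
        self.pending.clear()


class StorageUnit:
    def __init__(self):
        self.records = {}              # key -> (sequence number, value)

    def apply(self, message):
        key, sequence, value = message
        self.records[key] = (sequence, value)


class MasterStorageUnit(StorageUnit):
    def __init__(self, broker):
        super().__init__()
        self.broker = broker

    def write(self, key, value):
        sequence = self.records.get(key, (0, None))[0] + 1
        message = (key, sequence, value)
        if self.broker.publish(message) != "SUCCESS":
            raise RuntimeError("write was not committed")
        self.apply(message)
        return sequence                # handed back towards the client


def route_write(master, key, value):
    """Stand-in for the router: forward the write, return the sequence number."""
    return master.write(key, value)


broker = MessageBroker()
master = MasterStorageUnit(broker)
replica = StorageUnit()
broker.subscribers.append(replica)

first = route_write(master, "k", "v1")
second = route_write(master, "k", "v2")
before_delivery = dict(replica.records)
broker.deliver()
```

The replica lags until the broker delivers, which is the asynchrony the consistency model has to account for.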
Consistency model
• Goal: make it easier for applications to reason about updates and cope with asynchrony
• What happens to a record with primary key “Brian”?
Figure: timeline for Generation 1 of the record. The record is inserted, updated repeatedly (versions v. 1 through v. 8), and finally deleted.
Consistency model
Figure: version timeline v. 1 to v. 8 (Generation 1), with stale versions and the current version marked, and a read.
Read-any: returns a possibly stale version of the record.
E.g., in a social networking application, displaying a friend’s status does not strictly require the most up-to-date value, so read-any can be used.
Consistency model
Read up-to-date
Figure: version timeline v. 1 to v. 8 (Generation 1); the read returns the current version rather than a stale version.
Consistency model
Figure: version timeline v. 1 to v. 8 (Generation 1), with a read labelled “Read ≥ v.6”.
Read-critical(required version): returns a version of the record that is the same as, or newer than, the required version.
For example, a user writes a record and then wants to read a version of the record that definitely reflects that change.
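The difference between the three read calls can be sketched with a local replica that lags behind the master copy. The `Replica` class, the function names, and the choice to fall back to the master when the local copy is too old are assumptions for illustration; the slides do not say how a sufficiently new version is located.

```python
class Replica:
    """One copy of a record together with the version it currently holds."""

    def __init__(self, version, value):
        self.version = version
        self.value = value


def read_any(local):
    # May be stale: whatever the nearest copy has.
    return local.version, local.value


def read_critical(local, master, required_version):
    # The local copy is good enough if it is at least as new as required.
    if local.version >= required_version:
        return local.version, local.value
    # Assumed fallback: ask the master copy, which holds the current version.
    return master.version, master.value


def read_up_to_date(master):
    # Always the current version.
    return master.version, master.value


local_copy = Replica(5, "status at v. 5")
master_copy = Replica(8, "status at v. 8")
```

A client that has just written v. 6 would pass 6 as the required version, so the stale local copy at v. 5 is never returned to it.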
Consistency model
Write
Figure: version timeline v. 1 to v. 8 (Generation 1), showing a write relative to the stale versions and the current version.
Consistency model
Figure: version timeline v. 1 to v. 8 (Generation 1); a “Write if = v.7” request returns ERROR.
Test-and-set-write(required version): performs the requested write to the record if and only if the present version of the record is the same as the required version.
This call can be used to implement transactions that first read a record and then write to it based on the read, e.g., incrementing the value of a counter.
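The counter example can be sketched as a read followed by a conditional write that is retried on conflict. The `Record` class, the `VersionMismatch` exception, and the retry limit are assumptions for illustration rather than the platform's actual interface.

```python
class VersionMismatch(Exception):
    """Raised when the record's present version is not the required one."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1

    def read(self):
        return self.version, self.value

    def write(self, value):
        # Unconditional write: always succeeds and creates a new version.
        self.value = value
        self.version += 1
        return self.version

    def test_and_set_write(self, required_version, value):
        # Write only if nobody else has written since `required_version`.
        if self.version != required_version:
            raise VersionMismatch(f"expected v. {required_version}, found v. {self.version}")
        return self.write(value)


def increment(record, max_attempts=10):
    """Read, add one, and write back; retry if another writer got in first."""
    for _ in range(max_attempts):
        version, value = record.read()
        try:
            return record.test_and_set_write(version, value + 1)
        except VersionMismatch:
            continue
    raise RuntimeError("too many conflicting writers")


counter = Record(0)
new_version = increment(counter)
```

A writer holding an outdated version gets an error instead of silently overwriting a newer value, which is the ERROR case in the figure.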
Record and Tablet Mastership
• Data in PNUTS is replicated across sites
• Hidden field in each record stores which copy is the master copy
– Updates can be submitted to any copy
– Forwarded to master, applied in order received by master
• Record also contains origin of last few updates
– Mastership can be changed by current master, based on this information
– Mastership change is simply a record update
• Tablet mastership
– Required to ensure primary key consistency
– Can be different from record mastership
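Moving record mastership based on the origin of the last few updates can be sketched as below. The policy used here (move the master once the last three writes all came from one other region) and the names `MasteredRecord` and `recent_origins` are assumptions; the slides only say that the decision is based on this history.

```python
from collections import deque


class MasteredRecord:
    """A record that remembers its master region and recent write origins."""

    def __init__(self, master, history=3):
        self.master = master                         # the hidden mastership field
        self.recent_origins = deque(maxlen=history)  # origin of last few updates
        self.value = None

    def write(self, value, origin):
        # Writes from any region are applied at the master in arrival order.
        self.value = value
        self.recent_origins.append(origin)
        self._maybe_move_master()

    def _maybe_move_master(self):
        # Assumed policy: every remembered write came from one other region.
        if len(self.recent_origins) < self.recent_origins.maxlen:
            return
        origins = set(self.recent_origins)
        if len(origins) == 1 and self.master not in origins:
            # The mastership change is itself just an update to the record.
            self.master = origins.pop()


profile = MasteredRecord(master="west")
profile.write("a", origin="east")
profile.write("b", origin="east")
master_after_two = profile.master
profile.write("c", origin="east")
master_after_three = profile.master
```

Because most writes to a record tend to come from one datacenter, a policy of this kind keeps the master close to the writers.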
Other Features
• Per-record transactions
• Copying a tablet (e.g., for failure recovery)
– Request copy
– Publish checkpoint message
– Get copy of tablet as of when checkpoint is received
– Apply later updates
• Tablet split
– Has to be coordinated across all copies
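The tablet-copy steps can be sketched by treating the update stream as a list that contains a checkpoint marker. The `CHECKPOINT` marker, the `(key, value)` message shape, and the function name `copy_tablet` are assumptions made for illustration.

```python
CHECKPOINT = "checkpoint"   # hypothetical marker message


def copy_tablet(update_stream):
    """Rebuild a tablet copy from a snapshot at the checkpoint plus later updates."""
    checkpoint_at = update_stream.index(CHECKPOINT)

    # State of the source tablet at the moment it receives the checkpoint.
    snapshot = {}
    for key, value in update_stream[:checkpoint_at]:
        snapshot[key] = value

    # The new copy starts from that snapshot...
    new_copy = dict(snapshot)
    # ...and then applies every update published after the checkpoint.
    for key, value in update_stream[checkpoint_at + 1:]:
        new_copy[key] = value
    return new_copy


stream = [("a", 1), ("b", 2), CHECKPOINT, ("a", 3), ("c", 4)]
recovered = copy_tablet(stream)
```

The checkpoint gives both sides an agreed position in the update stream, so no update is applied twice or missed.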
Query Processing
• Range scan can span tablets
– Only one tablet scanned at a time
– Client may not need all results at once
  • Continuation object returned to client to indicate where range scan should continue
• Notification
– One pub-sub topic per tablet
– Client knows about tables, does not know about tablets
  • Automatically subscribed to all tablets, even as tablets are added/removed
– Usual problem with pub-sub: undelivered notifications, handled in usual way
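A range scan that visits one tablet per call and hands back a continuation can be sketched as follows. Representing the continuation as the index of the next tablet, and the names `scan_step` and `scan_all`, are assumptions; the slides do not describe what the continuation object contains.

```python
def scan_step(tablets, low, high, continuation=0):
    """Scan a single tablet; return (rows, continuation or None when done)."""
    tablet = tablets[continuation]
    rows = [(key, value) for key, value in tablet if low <= key < high]
    next_index = continuation + 1
    # Done if no tablets remain or the next tablet starts at or past `high`.
    finished = next_index == len(tablets) or tablets[next_index][0][0] >= high
    return rows, (None if finished else next_index)


def scan_all(tablets, low, high):
    """Client loop: keep calling until no continuation is returned."""
    results, continuation = [], 0
    while continuation is not None:
        rows, continuation = scan_step(tablets, low, high, continuation)
        results.extend(rows)
    return results


# Each tablet is a non-empty, key-ordered list of (key, value) pairs.
tablets = [
    [("Apple", 1), ("Banana", 2)],
    [("Grape", 3), ("Kiwi", 4)],
    [("Lime", 5), ("Pear", 6)],
]
first_rows, first_continuation = scan_step(tablets, "Banana", "Lime")
```

A client that only needs the first few results can stop after any step and simply discard the continuation.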
Experimental setup
• Production version supports
– Hash tables
– Ordered tables
• Database
– 3 regions: 2 west coast, 1 east coast
– 1 KB records, 128 tablets per region
– Each process had 100 client threads
– 300 clients in total across the system
• Workload
– 1200-3600 requests/second
– 0-50% writes
– 80% locality
Inserts
• Inserts (hash tables)
– 75.6 ms per insert in West 1 (tablet master)
– 131.5 ms per insert into the non-master West 2
– 315.5 ms per insert into the non-master East
• Inserts (ordered tables)
– 33 ms per insert in West 1
– 105.8 ms per insert in the non-master West 2
– 324.5 ms per insert in the non-master East
• 10% writes by default
• Latency decreases, and then increases, with increasing load
– The high latency at low request rates resulted from an anomaly in the HTTP client library we used, which closed TCP connections between requests at low request rates, requiring expensive TCP setup for each call.
• As the proportion of reads increases, the average latency decreases.
Scalability
Figure: average latency (ms, axis 0 to 160) versus number of storage units (1 to 6), for hash table and ordered table.
Size of range scans
Figure: average latency (ms, axis 0 to 8000) versus fraction of table scanned (0 to 0.12), for 30 clients and 300 clients.