Building a Database on S3
(Transcript of the presentation "Building a Database on S3")

Matthias Brantner, Daniela Florescu+, David Graf, Donald Kossmann, Tim Kraska
Systems Group, ETH Zurich / 28msec Inc. / + Oracle

September 25, 2007
Motivation
Building a web page, starting a blog, and making both searchable for the public have become a commodity. But providing your own service (and getting rich) still comes at a high cost:
- Have the right (business) idea
- Run your own web server and database
- Maintain the infrastructure
- Keep the service up 24 x 7
- Back up the data
- Tune the system if the service is used more often

And then comes the Digg effect.
Requirements for Data Management on the Web
- Scalability: response time independent of the number of clients
- No administration: "outsource" patches, backups, fault tolerance
- 100 percent read + write availability: no client is ever blocked under any circumstances
- Cost ($$$): gets cheaper every year, leverage new technology; pay as you go, no investment upfront
Utility Computing as a Solution
- Scalability: response time independent of the number of clients
- No administration: "outsource" patches, backups, fault tolerance
- 100 percent read + write availability: no client is ever blocked under any circumstances
- Cost ($$$): gets cheaper every year, leverage new technology; pay as you go, no investment upfront
- Consistency: an optimization goal, not a constraint (?)
Open questions:
- How much does it cost?
- How much consistency is required by my application?
Amazon Web Services (AWS)
- Most popular utility provider
- Gives us all the necessary building blocks (storage, CPU cycles, etc.)
- Other providers are also appearing on the market

Amazon infrastructure services:
- Simple Storage Service (S3): (virtually) infinite store. Costs: $0.15 per GB-month + transfer costs ($0.10-$0.17 per GB in/out)
- Simple Queuing Service (SQS): message service; allows a message to be received exclusively. Costs: $0.0001 per message sent + transfer costs
- Elastic Compute Cloud (EC2): virtual instance with 1-8 virtual cores (= 1.0-2.5 GHz Opterons), 1.7-15 GB of memory, and 160-1690 GB of instance storage. Costs: $0.10-$0.80 per hour + transfer costs
- SimpleDB: basically a text index. Costs: $0.14 per Amazon SimpleDB machine hour consumed
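As a rough illustration of these building blocks (not part of the talk), the following sketch creates an S3 bucket and an SQS queue using the present-day boto3 API; the bucket and queue names are made up, and the 2007-era interfaces the talk refers to differ in detail.

```python
# Minimal sketch (not from the talk): the two AWS building blocks the protocol
# relies on -- an S3 bucket for pages and an SQS queue for messages.
# Names such as "s3db-pages" are made-up examples.
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# S3: a bucket acts as the (virtually) infinite shared store for database pages.
# (Outside us-east-1 a CreateBucketConfiguration/LocationConstraint would be needed.)
s3.create_bucket(Bucket="s3db-pages")

# SQS: queues provide the messaging layer used later for pending updates and locks.
queue_url = sqs.create_queue(QueueName="s3db-demo-queue")["QueueUrl"]
print("queue:", queue_url)
```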
Plan of Attack
Step 1: Use S3 as a huge shared disk
  - leverage scalability, no-admin features
Step 2: Allow concurrent access to the shared disk in a distributed system
  - keep the properties of a distributed system, maximize consistency
Step 3: Make application-specific trade-offs
  - consistency vs. cost
  - consistency vs. availability
  - consistency à la carte (levels of consistency)
Shared-Disk Architecture
[Figure: clients 1..M, each (e.g., on EC2) running an Application, a Record Manager, and a Page Manager, all accessing pages 1..N stored on S3. The whole client stack could be executed completely on an EC2 client.]
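To make the layering concrete, here is a minimal, hypothetical Page Manager sketch that uses an S3 bucket as the shared disk; the bucket name and per-page key scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a Page Manager that treats an S3 bucket
# as the shared disk. The "s3db-pages" bucket and "page-<no>" key scheme are assumed.
import boto3

BUCKET = "s3db-pages"  # assumed bucket holding one object per database page

class PageManager:
    """Caches pages locally and reads/writes them as S3 objects."""

    def __init__(self, bucket: str = BUCKET):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.cache: dict[int, bytes] = {}  # page number -> page bytes

    def read_page(self, page_no: int) -> bytes:
        if page_no not in self.cache:
            obj = self.s3.get_object(Bucket=self.bucket, Key=f"page-{page_no}")
            self.cache[page_no] = obj["Body"].read()
        return self.cache[page_no]

    def write_page(self, page_no: int, data: bytes) -> None:
        # In the naive shared-disk variant the page is simply overwritten on S3;
        # the protocols later in the talk replace this with commit + checkpointing.
        self.cache[page_no] = data
        self.s3.put_object(Bucket=self.bucket, Key=f"page-{page_no}", Body=data)
```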
Problem: Eventual Consistency
[Figure: two clients (each with Application, Record Manager, Page Manager) write the same page on S3.]
- Two clients update the same page
- Last update wins
- Consistency problems:
  - inconsistency between indexes and pages
  - lost records
  - lost updates
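A tiny, assumed example of how "last update wins" loses an update when two clients write the same page object:

```python
# Illustration only (made-up bucket/key names): S3 keeps whichever version of the
# page object was written last, so client 1's change silently disappears.
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "s3db-pages", "page-1"

# Both clients start from the same page contents.
base = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()

# Client 1 appends record A and writes the page back.
s3.put_object(Bucket=BUCKET, Key=KEY, Body=base + b"|record-A")

# Client 2, unaware of client 1's write, appends record B to its stale copy.
s3.put_object(Bucket=BUCKET, Key=KEY, Body=base + b"|record-B")

# The page on S3 now contains record B only: record A is a lost update.
```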
Plan of Attack
Step 1: Use S3 as a huge shared disk
  - leverage scalability, no-admin features
Step 2: Allow concurrent access to the shared disk in a distributed system
  - keep the properties of a distributed system, maximize consistency
Step 3: Make application-specific trade-offs
  - consistency vs. cost
  - consistency vs. availability
  - consistency à la carte (levels of consistency)
Levels of Consistency [Tanenbaum]
- Shared-Disk (naïve approach): no concurrency control at all
- Eventual Consistency (Basic Protocol): updates become visible at some point and will persist; no lost updates at the page level
- Atomicity: all or none of the updates of a transaction become visible
- Monotonic reads, read your writes, monotonic writes, ...
- Strong Consistency: database-style consistency (ACID) via OCC
Basic Protocol: Queues
[Figure: clients 1..M; one Lock queue and one Pending Update (PU) queue per page; pages 1..N on S3.]
- One PU queue and one Lock queue are associated with each page
- Lock queues contain exactly one message (inserted directly after creating the queue)
- Updates are committed to the pages in two phases
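A possible sketch (assumed queue names, modern boto3 API) of this per-page queue setup, including the single lock message inserted right after the LOCK queue is created:

```python
# Minimal sketch of the per-page queues: one PU queue and one LOCK queue per page,
# with the LOCK queue seeded with exactly one message. Names are assumptions.
import boto3

sqs = boto3.client("sqs")

def create_page_queues(page_no: int) -> tuple[str, str]:
    """Create the PU and LOCK queues for one page and seed the lock token."""
    pu_url = sqs.create_queue(QueueName=f"PU-page-{page_no}")["QueueUrl"]
    lock_url = sqs.create_queue(QueueName=f"LOCK-page-{page_no}")["QueueUrl"]
    # The single message in the LOCK queue acts as the exclusive lock token:
    # whoever receives (and thereby temporarily hides) it may checkpoint the page.
    sqs.send_message(QueueUrl=lock_url, MessageBody=f"lock-page-{page_no}")
    return pu_url, lock_url

# Example: create queues for pages 1..5
queues = {p: create_page_queues(p) for p in range(1, 6)}
```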
Basic Protocol
Step 1: Commit
- Clients commit update records (log records) to the PU queues
- This is the commit of the transaction; afterwards, the transaction is finished
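A hedged sketch of Step 1, assuming a simple JSON log-record format and the queue naming from the previous sketch; the actual record layout is not specified in the talk:

```python
# Minimal sketch (assumed message format, not the authors' code) of Step 1:
# committing a transaction by pushing its log records into the PU queues of the
# affected pages. A log record is modeled here as (page_no, serialized update).
import json
import boto3

sqs = boto3.client("sqs")

def pu_queue_url(page_no: int) -> str:
    # Assumes the naming scheme from the previous sketch.
    return sqs.get_queue_url(QueueName=f"PU-page-{page_no}")["QueueUrl"]

def commit(log_records: list[tuple[int, dict]]) -> None:
    """Send every log record of the transaction to the PU queue of its page.

    Once all records are in SQS, the transaction counts as committed; making the
    updates visible on S3 is deferred to the checkpointing step."""
    for page_no, update in log_records:
        sqs.send_message(
            QueueUrl=pu_queue_url(page_no),
            MessageBody=json.dumps({"page": page_no, "update": update}),
        )

# Example: a transaction that updates pages 2 and 4 (hypothetical record contents)
commit([(2, {"set": {"stock": 7}}), (4, {"insert": {"id": 42, "name": "book"}})])
```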
Basic Protocol
Step 2: Checkpointing
- Checkpointing propagates updates from SQS to S3; updates become visible on S3
- Checkpointing requires synchronization, achieved by using SQS as exclusive locks

The checkpointing steps:
1. Receive the lock (the single message in the page's LOCK queue)
2. Refresh the page from S3
3. Receive messages from the PU queue: as many as possible
4. Apply the log records to the cached page
5. Put the new version of the page to S3
6. Delete all the log records that were received in step 3
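The six steps could look roughly as follows; this is a sketch under the same assumptions as before, with apply_update() standing in for the redo logic the talk does not spell out, and with SQS's visibility timeout playing the role of the exclusive lock:

```python
# Minimal checkpointing sketch (assumed names and formats, present-day boto3 API).
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")
BUCKET = "s3db-pages"  # assumed bucket name

def apply_update(page: bytes, update: dict) -> bytes:
    """Placeholder for applying one (idempotent) log record to a page image."""
    return page  # illustration only

def checkpoint(page_no: int, lock_url: str, pu_url: str) -> None:
    # 1. Receive the lock: hiding the single LOCK message gives temporary exclusivity.
    lock = sqs.receive_message(QueueUrl=lock_url, MaxNumberOfMessages=1,
                               VisibilityTimeout=60).get("Messages", [])
    if not lock:
        return  # somebody else is checkpointing this page right now

    # 2. Refresh the page from S3.
    page = s3.get_object(Bucket=BUCKET, Key=f"page-{page_no}")["Body"].read()

    # 3. Receive as many pending-update messages as possible.
    msgs = sqs.receive_message(QueueUrl=pu_url, MaxNumberOfMessages=10,
                               VisibilityTimeout=60).get("Messages", [])

    # 4. Apply the log records to the cached page.
    for m in msgs:
        page = apply_update(page, json.loads(m["Body"])["update"])

    # 5. Put the new version of the page to S3.
    s3.put_object(Bucket=BUCKET, Key=f"page-{page_no}", Body=page)

    # 6. Delete exactly the log records received in step 3. (Releasing the lock by
    #    letting its visibility timeout expire is one possible policy.)
    for m in msgs:
        sqs.delete_message(QueueUrl=pu_url, ReceiptHandle=m["ReceiptHandle"])
```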
Basic Protocol
- Extremely simple: no additional infrastructure (except SQS) is needed
- The protocol is also resilient to failures: applying a log record twice does no harm, as log records are idempotent
- The protocol has a price: dollars and freshness of the data
- Still weak consistency / transactional properties, e.g.:
  - no atomicity
  - no monotonic guarantees: the ordering of the log record messages is not important
  - no concurrency control at the record level
Atomicity: All or none of the updates of a transaction become visible
[Figure: a client with its PU queues, LOCK queues, and an ATOMIC queue, on top of S3.]
- Each client manages an ATOMIC queue

Commit protocol:
1. Send all log records to the ATOMIC queue.
2. Send the commit log record.
3. Send all log records to the corresponding PU queues.
4. Delete all messages after committing.
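A sketch of this commit protocol under the same assumed message format; the per-client ATOMIC queue name and the transaction id field are illustrative choices, not prescribed by the talk:

```python
# Minimal sketch of atomic commit: stage everything in the client's ATOMIC queue,
# then fan out to the PU queues, then clean up. Names/formats are assumptions.
import json
import uuid
import boto3

sqs = boto3.client("sqs")

def queue_url(name: str) -> str:
    return sqs.get_queue_url(QueueName=name)["QueueUrl"]

def atomic_commit(client_id: str, log_records: list[tuple[int, dict]]) -> None:
    atomic_url = queue_url(f"ATOMIC-{client_id}")  # assumed per-client queue name
    txn_id = str(uuid.uuid4())                     # id shared by all records of the txn

    # 1. Send all log records of the transaction to the ATOMIC queue.
    staged = []
    for page_no, update in log_records:
        body = json.dumps({"txn": txn_id, "page": page_no, "update": update})
        sqs.send_message(QueueUrl=atomic_url, MessageBody=body)
        staged.append((page_no, body))

    # 2. Send the commit log record; from here on the transaction is a "winner".
    sqs.send_message(QueueUrl=atomic_url,
                     MessageBody=json.dumps({"txn": txn_id, "commit": True}))

    # 3. Send all log records to the corresponding PU queues.
    for page_no, body in staged:
        sqs.send_message(QueueUrl=queue_url(f"PU-page-{page_no}"), MessageBody=body)

    # 4. Delete all messages from the ATOMIC queue after committing.
    #    (Short polling may return empty early; fine for a sketch.)
    while True:
        msgs = sqs.receive_message(QueueUrl=atomic_url,
                                   MaxNumberOfMessages=10).get("Messages", [])
        if not msgs:
            break
        for m in msgs:
            sqs.delete_message(QueueUrl=atomic_url, ReceiptHandle=m["ReceiptHandle"])
```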
Atomicity (cont'd)
- When a client fails, it checks its ATOMIC queue at restart
- Winners are all log records that carry the same id as one of the commit records found in the ATOMIC queue; all other log records are losers
- Winners are propagated, the others are deleted
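And a matching recovery sketch for the restart case, again with assumed names and formats:

```python
# Minimal recovery sketch: propagate "winner" log records (those whose transaction
# id matches a commit record in the ATOMIC queue) and delete the rest.
import json
import boto3

sqs = boto3.client("sqs")

def drain(queue_url: str) -> list[dict]:
    """Receive all currently visible messages from a queue (short-polling sketch)."""
    msgs = []
    while True:
        batch = sqs.receive_message(QueueUrl=queue_url,
                                    MaxNumberOfMessages=10).get("Messages", [])
        if not batch:
            return msgs
        msgs.extend(batch)

def recover(atomic_url: str) -> None:
    msgs = drain(atomic_url)
    records = [json.loads(m["Body"]) for m in msgs]
    committed = {r["txn"] for r in records if r.get("commit")}

    for m, r in zip(msgs, records):
        if not r.get("commit") and r["txn"] in committed:
            # Winner: propagate to the PU queue of its page (idempotent, so a
            # record that was already sent before the crash does no harm).
            pu_url = sqs.get_queue_url(QueueName=f"PU-page-{r['page']}")["QueueUrl"]
            sqs.send_message(QueueUrl=pu_url, MessageBody=m["Body"])
        # Remove the message from the ATOMIC queue (losers are simply discarded).
        sqs.delete_message(QueueUrl=atomic_url, ReceiptHandle=m["ReceiptHandle"])
```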
Plan of Attack
Step 1: Use S3 as a huge shared disk
  - leverage scalability, no-admin features
Step 2: Allow concurrent access to the shared disk in a distributed system
  - keep the properties of a distributed system, maximize consistency
Step 3: Make application-specific trade-offs
  - consistency vs. cost
  - consistency vs. availability
  - consistency à la carte (levels of consistency)
Experiments and Results
- Goal: study the trade-offs in terms of consistency, latency, and cost ($)
- We used a subset of the TPC-W benchmark (models a bookstore scenario)
- All experiments were done with a complex customer transaction involving the following steps:
  a) retrieve the customer record from the database;
  b) search for six specific products;
  c) place orders for three of the six products.
Running Time per Transaction [secs]

                                 Avg.   Max.
  Naïve (Shared-Disk)            11.3   12.1
  Basic (Eventual Consistency)    4.0    5.9
  Monotonicity                    4.0    6.8
  Atomicity + Monotonicity        2.8    4.6

Notes:
- Does not include checkpointing (it runs asynchronously)
- Every transaction simulates around 12 clicks, so the time is less than a second per click
- The time is independent of the number of clients
- It's fast. Not 15K-SCSI-RAID0-fast, but internet-latency-fast.
Cost per 1000 Transactions ($)

                                 Step 1:   Step 2: Checkpoint    Total
                                 Commit    + Atomic Queue
  Naïve (Shared-Disk)             0.15          0                 0.15
  Basic (Eventual Consistency)    0.7           1.1               1.8
  Monotonicity                    0.7           1.4               2.1
  Atomicity + Monotonicity        0.3           2.6               2.9

Notes:
- Interaction with SQS becomes expensive
- For a bookstore, this is a transactional cost of about 3 milli-dollars (i.e., 0.3 cents)
- Updates in particular have a big influence on the cost
- Not cheap, but affordable in many scenarios
Summary and Future Work
- The architecture allows transparent scaling: no need to change code, hardware, ...
- Consistency is a goal, not a constraint: consistency à la carte for your applications
- Amazon's Web Services are a viable candidate for many Web 2.0 and interactive applications
- Future work:
  - SimpleDB as the main index
  - EC2 as application server
  - further studies of stronger consistency protocols
Thank you for your interest.
Questions?
Contact: [email protected]
Cost per 1000 Transactions ($), Various Checkpoint Intervals
[Figure: chart not included in the transcript.]