![Page 1: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/1.jpg)
1
Durability for Memory-Based Key-Value Stores
Kiarash Rezahanjani
July 4, 2012
![Page 2: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/2.jpg)
2
Durability
set(university, UPC)
Ack
get(university)
UPC
Data Store
(university, KTH)
![Page 3: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/3.jpg)
3
Durability
Ack
Data Store
Non-volatile
set(university, UPC)
Commodity
![Page 4: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/4.jpg)
4
Durability
Ack
Data Store
set(myKey, U)
Commodity
![Page 5: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/5.jpg)
5
Durability
Disk
Write Read
Slow: seek time + rotational time + transfer time
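The cost breakdown on this slide can be turned into a rough estimate. The sketch below is illustrative only: the drive parameters (9 ms seek, 7200 RPM, 100 MB/s) are assumed values, not figures from the presentation, and the average rotational delay is taken as half a revolution.

```python
def disk_access_ms(seek_ms, rpm, transfer_mb_per_s, size_bytes):
    """Estimate one random disk access: seek + rotational delay + transfer."""
    # On average the platter must turn half a revolution before the sector arrives.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Time to move the payload itself once the head is in position.
    transfer_ms = size_bytes / (transfer_mb_per_s * 1_000_000.0) * 1000.0
    return seek_ms + rotational_ms + transfer_ms


# Hypothetical commodity drive and a 200-byte write (assumed numbers).
estimate_ms = disk_access_ms(seek_ms=9.0, rpm=7200, transfer_mb_per_s=100.0, size_bytes=200)
print(f"estimated access time: {estimate_ms:.2f} ms")
```

With these assumed numbers the seek and rotational terms dominate and the transfer term is negligible for small writes, which is why small random writes to disk are slow.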
![Page 6: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/6.jpg)
6
Cache in memory
Primary copy of objects
Cached Objects
Reads: fast
Writes: slow
Consistency?
![Page 7: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/7.jpg)
7
Cache in memory
MySQL Servers
Memcache servers
Application Servers
Update Obj A
Delete Obj A
Read Obj A -> cache miss
Read Obj A
Set Obj A
Stale data
Spending resources
Writes are still Slow
Complicates development
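The update, delete, miss, read, set sequence on this slide is the usual cache-aside pattern. The following is a minimal sketch using plain dictionaries in place of the MySQL and memcache servers; the class and attribute names are hypothetical and nothing here talks to a real database or cache.

```python
class CacheAsideStore:
    """Toy cache-aside store: dicts stand in for MySQL and memcache."""

    def __init__(self):
        self.db = {}        # primary copy (stands in for the MySQL servers)
        self.cache = {}     # cached objects (stands in for the memcache servers)
        self.db_reads = 0   # counts how often a read falls through to the database

    def update(self, key, value):
        self.db[key] = value        # the write still goes to the slow primary copy
        self.cache.pop(key, None)   # delete the cached object so readers do not see stale data

    def read(self, key):
        if key in self.cache:       # cache hit: served from memory
            return self.cache[key]
        self.db_reads += 1          # cache miss: read from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # set the object in the cache for later reads
        return value


store = CacheAsideStore()
store.update("A", "v1")
first = store.read("A")    # miss, then set
second = store.read("A")   # hit
store.update("A", "v2")    # invalidates the cached copy
third = store.read("A")    # miss again
```

The application has to perform the invalidation itself on every update; forgetting it, or racing with a concurrent read, is how stale data ends up in the cache.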
![Page 8: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/8.jpg)
8
Memory-Based Databases
Primary Copy of Objects
Back up
Writes Reads
No stale data
Reads are fast
Write latency?
Durability?
No inconsistency
![Page 9: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/9.jpg)
9
Approaches towards durability
Periodic snapshots (State A, State B): data loss
Synchronous logging: slow
Asynchronous logging: data loss
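The trade-off between the two logging modes can be sketched as follows. This is a simplified illustration, not the presenter's implementation: `SyncLog` forces each entry to disk before acknowledging, while `AsyncLog` acknowledges immediately and writes later, so anything still pending is lost if the process crashes. Both class names are hypothetical.

```python
import os
import tempfile


class SyncLog:
    """Acknowledge only after the entry has been forced to disk (durable, slow)."""

    def __init__(self, path):
        self.file = open(path, "ab")

    def append(self, entry):
        self.file.write(entry + b"\n")
        self.file.flush()
        os.fsync(self.file.fileno())   # wait for the disk before acknowledging
        return "ack"


class AsyncLog:
    """Acknowledge immediately and write later (fast, may lose pending entries)."""

    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []

    def append(self, entry):
        self.pending.append(entry)     # held only in memory at this point
        return "ack"

    def flush(self):
        for entry in self.pending:
            self.file.write(entry + b"\n")
        self.pending.clear()
        self.file.flush()
        os.fsync(self.file.fileno())


log_dir = tempfile.mkdtemp()
sync_path = os.path.join(log_dir, "sync.log")
async_path = os.path.join(log_dir, "async.log")

sync_log = SyncLog(sync_path)
sync_ack = sync_log.append(b"set k v")

async_log = AsyncLog(async_path)
async_ack = async_log.append(b"set k v")
bytes_before_flush = os.path.getsize(async_path)  # entry was acknowledged but is not on disk
async_log.flush()
bytes_after_flush = os.path.getsize(async_path)
```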
![Page 10: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/10.jpg)
10
Approaches towards durability
Data
Replica Replica
Replica
Expensive
Catastrophic failure: all gone
![Page 11: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/11.jpg)
11
Project Goals
Durable write
Low latency
Cheap, commodity hardware
Availability, able to recover quickly
![Page 12: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/12.jpg)
12
Target systems
• Data is big = many machines
• Read-dominant workload
• Simple key-value store
• Small writes
– Example: Facebook
• Terabytes of data = 2000 memcache servers
• Write/read ratio < 6%
• Memcache is a key-value store
• Status update, tag photo, profile update, etc.
![Page 13: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/13.jpg)
13
Solution
![Page 14: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/14.jpg)
14
Design decisions
Periodic snapshot vs.
Message logging
![Page 15: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/15.jpg)
15
Design decisions
Local disk vs.
Remote location
![Page 16: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/16.jpg)
16
Design decisions
Remote file server vs.
Local disks of database cluster
![Page 17: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/17.jpg)
17
Design Decision
Database
client
write
Log Ack
Remote storage
![Page 18: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/18.jpg)
18
Design Decision
Database
client
write
Log Ack
Two problems
1) Synchronous logging is a must (asynchronous logging: data loss)
2) Data availability: replication
![Page 19: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/19.jpg)
19
Replication
Log Ack
Log Log Log
![Page 20: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/20.jpg)
20
Replication
Broadcast: master, slave, slave (Log, Ack)
Chain replication: head, tail (Log, Ack)
![Page 21: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/21.jpg)
21
Replication
Broadcast
master
slave slave slave
Log Ack
![Page 22: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/22.jpg)
22
Replication
Chain replication
head tail
Log Ack
![Page 23: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/23.jpg)
23
Replication
Chain replication
head tail
Log Ack
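One way to compare the two replication schemes on these slides is to count how many copies of a log entry each node has to send for a single write. The helper below is a hypothetical illustration of that count only; it does not model latency or failures.

```python
def outbound_copies(strategy, replicas):
    """Return, per node, how many copies of one log entry that node sends."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    if strategy == "broadcast":
        # The master sends the entry to every slave; slaves forward nothing.
        return [replicas - 1] + [0] * (replicas - 1)
    if strategy == "chain":
        # Each node forwards once to its successor; the tail forwards nothing.
        return [1] * (replicas - 1) + [0]
    raise ValueError(f"unknown strategy: {strategy}")


broadcast_load = outbound_copies("broadcast", 3)
chain_load = outbound_copies("chain", 3)
print("broadcast:", broadcast_load)
print("chain:    ", chain_load)
```

Under broadcast the sending load concentrates on the master, while a chain spreads it evenly, at the cost of the entry passing through every node before the tail can acknowledge.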
![Page 24: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/24.jpg)
24
Chain Replication
Database
client
write
Log Ack
Log Log Log
![Page 25: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/25.jpg)
25
Chain Replication
Database
client
write
Log Ack
Log Log Log
Stable Storage Unit
Available logs
Synchronous logging abstraction
Low latency
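A stable storage unit built as a chain can be sketched in a few lines. This is a simplified, single-process model under stated assumptions: the `ChainNode` class and `build_chain` helper are hypothetical, each node keeps its log in a list, and the acknowledgement travels back through the call stack rather than being sent by the tail over the network.

```python
class ChainNode:
    """One log server in a chain; forwards each entry to its successor."""

    def __init__(self, name, successor=None):
        self.name = name
        self.successor = successor
        self.log = []

    def append(self, entry):
        self.log.append(entry)                  # keep a copy on this node
        if self.successor is None:
            return "ack"                        # the tail acknowledges the write
        return self.successor.append(entry)     # otherwise pass the entry down the chain


def build_chain(names):
    """Link nodes so that names[0] is the head and names[-1] is the tail."""
    node = None
    for name in reversed(names):
        node = ChainNode(name, node)
    return node


head = build_chain(["head", "middle", "tail"])
result = head.append("set(university, UPC)")

# Walk the chain to confirm every replica holds the entry once the ack arrives.
copies = []
node = head
while node is not None:
    copies.append((node.name, list(node.log)))
    node = node.successor
```

From the database's point of view the unit behaves like synchronous logging: the write returns only after all replicas in the chain hold the entry.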
![Page 26: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/26.jpg)
26
Log Server
Log
![Page 27: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/27.jpg)
27
Log Server
Receiver
Persister
Reader
Sequential Write
Seek time
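The receiver, persister, and reader roles can be sketched with a queue and a background thread. This is an assumption-laden illustration rather than the actual log server: entries are acknowledged once buffered in memory, the persister appends them to a single file in arrival order (sequential writes, so no seek per entry), and the 4-byte length prefix is a made-up record format.

```python
import os
import queue
import tempfile
import threading


class LogServer:
    """Receiver buffers entries; persister appends them to one file in order."""

    def __init__(self, path):
        self.path = path
        self.buffer = queue.Queue()
        self.persister = threading.Thread(target=self._persist, daemon=True)
        self.persister.start()

    def receive(self, entry):
        self.buffer.put(entry)      # receiver: buffer in memory and acknowledge
        return "ack"

    def _persist(self):
        with open(self.path, "ab") as log_file:
            while True:
                entry = self.buffer.get()
                if entry is None:   # sentinel: stop persisting
                    break
                # Append-only write: length prefix followed by the payload.
                log_file.write(len(entry).to_bytes(4, "big") + entry)
                log_file.flush()

    def close(self):
        self.buffer.put(None)
        self.persister.join()

    def read_all(self):
        """Reader: scan the file from the start and return the entries in order."""
        entries = []
        with open(self.path, "rb") as log_file:
            while True:
                header = log_file.read(4)
                if len(header) < 4:
                    break
                entries.append(log_file.read(int.from_bytes(header, "big")))
        return entries


log_path = os.path.join(tempfile.mkdtemp(), "unit.log")
server = LogServer(log_path)
acks = [server.receive(payload) for payload in (b"set a 1", b"set b 2", b"set c 3")]
server.close()
recovered = server.read_all()
```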
![Page 28: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/28.jpg)
28
Zookeeper
ID3 ID2 ID1
Forming storage units
1. Query Zookeeper
2. Get list of servers
3. Leader sends request
4. Leader sends list of members
5. Upload storage unit data
6. Start the service
ID2 ID3 ID1
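The six steps above can be sketched with an in-memory registry standing in for Zookeeper. Everything here is hypothetical: `Registry` is not the Zookeeper API, the unit size of three is taken from the replication factor used later in the evaluation, and every contacted server is assumed to accept the leader's request.

```python
class Registry:
    """In-memory stand-in for the coordination service (not the Zookeeper API)."""

    def __init__(self, servers):
        self.free_servers = list(servers)
        self.units = {}

    def list_free(self):
        return list(self.free_servers)

    def publish_unit(self, leader, members):
        self.units[leader] = list(members)
        for member in members:
            if member in self.free_servers:
                self.free_servers.remove(member)


def form_storage_unit(leader, registry, size=3):
    # Steps 1-2: query the registry and get the list of available servers.
    candidates = [s for s in registry.list_free() if s != leader]
    followers = candidates[: size - 1]
    if len(followers) < size - 1:
        raise RuntimeError("not enough free servers to form a storage unit")
    # Steps 3-4: the leader sends a request, then the member list, to each follower
    # (acceptance is assumed in this sketch).
    members = [leader] + followers
    # Step 5: upload the storage unit data so clients can discover the unit.
    registry.publish_unit(leader, members)
    # Step 6: the unit can now start the service.
    return members


registry = Registry(["ID1", "ID2", "ID3", "ID4"])
unit = form_storage_unit("ID1", registry)
```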
![Page 29: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/29.jpg)
29
Storage System
Stable storage unit Stable storage unit
Stable storage unit Stable storage unit
Zookeeper
Client
Client
Client
![Page 30: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/30.jpg)
30
Failover
ID 4: 40%
ID 5: 45%
ID 1: 50%
ID 6: 20%
ID 2: 20%
ID 3: 30%
Stable Storage Unit Stable Storage Unit
Client
![Page 31: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/31.jpg)
31
Failover
ID 4: 40%
ID 5: 45%
ID 1: 50%
ID 6: 20%
ID 2: 20%
ID 3: 30%
Stable Storage Unit Stable Storage Unit
Client
![Page 32: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/32.jpg)
32
Failover
ID 4: 40%
ID 5: 45%
ID 1: 50%
ID 6: 20%
ID 2: 20%
ID 3: 30%
Stable Storage Unit Stable Storage Unit
Client
![Page 33: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/33.jpg)
33
Evaluation
• Throughput and latency of stable storage unit
– Log entry sizes
– Replication factors
• Comparison with WAL to local disk
![Page 34: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/34.jpg)
34
Single synchronous client
Replication factor of 3

| Entry size (bytes) | Latency (ms) | Throughput (entries/sec) |
| --- | --- | --- |
| 200 | 0.45 | 2200 |
| 1024 | 0.62 | 1600 |
| 4096 | 0.99 | 1000 |
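For a single synchronous client, which sends the next entry only after the previous one is acknowledged, throughput should be roughly the inverse of latency. The short check below applies that relation to the latencies reported on this slide; the function name is hypothetical.

```python
# Latencies from the slide, in milliseconds, keyed by entry size in bytes.
latencies_ms = {200: 0.45, 1024: 0.62, 4096: 0.99}


def sync_client_throughput(latency_ms):
    """Entries per second when each request waits for the previous ack."""
    return 1000.0 / latency_ms


predicted = {size: round(sync_client_throughput(ms)) for size, ms in latencies_ms.items()}
for size, entries_per_sec in predicted.items():
    print(f"{size:>5} bytes: about {entries_per_sec} entries/sec")
```

The predicted values (about 2222, 1613, and 1010 entries/sec) are close to the reported 2200, 1600, and 1000, which is consistent with the reported throughput being bounded by round-trip latency rather than by the storage unit's capacity.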
![Page 35: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/35.jpg)
35
Throughput vs. Latency
Replication factor of 3
[Figure: latency (ms) vs. throughput (entries/sec) for entry sizes of 5 B, 200 B, 1 KB, 4 KB, and 10 KB; labeled throughput values: 34000, 28000, 14000, 5000]
![Page 36: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/36.jpg)
36
Additional replica
Entry size of 200 bytes
[Figure: latency (microseconds) vs. throughput (entries/sec) for replication factors RF 3 and RF 2]
![Page 37: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/37.jpg)
37
Sustained load
![Page 38: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/38.jpg)
38
WAL to local disk vs Stable storage unit
Latency (ms) by entry size (bytes)

| Configuration | 200 | 1024 | 4096 |
| --- | --- | --- | --- |
| Disk (cache enabled) | 1.76 | 1.81 | 2.03 |
| Disk (cache disabled) | 49.46 | 49.81 | 49.87 |
| Stable Storage Unit | 0.45 | 0.62 | 1 |
| Stable Storage Unit (buffer full) | 3.01 | 4.2 | 15 |
![Page 39: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/39.jpg)
39
Resource utilization
• Throughput of 6,000 entries/sec
• Log entries of 200 bytes
– CPU utilization = 9%
– Bandwidth = 29 Mb/s
– Dedicated disk
– Small memory requirement
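The bandwidth figure can be sanity-checked with simple arithmetic. The sketch below assumes a replication factor of 3 (as in the evaluation slides), counts one copy of the payload per replica, and ignores protocol headers; these are assumptions for the check, not a statement of how the figure was measured.

```python
entries_per_sec = 6_000
entry_size_bytes = 200
replication_factor = 3   # assumed, matching the replication factor used in the evaluation

# Payload rate for one copy of the log stream, in megabits per second.
payload_mbps = entries_per_sec * entry_size_bytes * 8 / 1_000_000
# Total payload traffic if each entry is transferred once per replica.
total_mbps = payload_mbps * replication_factor

print(f"one copy:     {payload_mbps:.1f} Mb/s")
print(f"three copies: {total_mbps:.1f} Mb/s")
```

Under these assumptions the result is 28.8 Mb/s, close to the 29 Mb/s reported on the slide.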
![Page 40: Presentation](https://reader036.fdocuments.us/reader036/viewer/2022081413/546757c5af7959485c8b602e/html5/thumbnails/40.jpg)
40
Summary
Durable write
Low latency
High availability
No additional resources
Avoid dependencies
Scalable