Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation...
Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage
Andromachi Hatzieleftheriou, Stergios V. Anastasiadis
Department of Computer Science, University of Ioannina, Greece
A. Hatzieleftheriou, University of Ioannina
Outline
Motivation
Design
Implementation
Evaluation
Conclusions
Motivation
• Synchronous small writes are critical for system and application reliability
• Multistream concurrency makes the I/O effectively random
• With page-sized disk accesses:
  - async writes perform well due to batching in memory
  - sync writes generate wasteful traffic due to excessive full-page I/Os
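The cost of full-page I/O for small synchronous writes can be illustrated with a little arithmetic. The sketch below is not from the slides: it assumes that every page touched by a request is journaled in full, ignores metadata and descriptor blocks, and uses the hypothetical helper names `full_page_journal_bytes` and `waste_ratio`.

```python
PAGE_SIZE = 4096  # bytes; matches the 4KB page size quoted on the slide


def full_page_journal_bytes(request_size: int, offset: int = 0) -> int:
    """Bytes journaled if every page touched by the request is logged in full.

    Assumption for illustration only: metadata and descriptor blocks are ignored.
    """
    if request_size <= 0:
        return 0
    first_page = offset // PAGE_SIZE
    last_page = (offset + request_size - 1) // PAGE_SIZE
    return (last_page - first_page + 1) * PAGE_SIZE


def waste_ratio(request_size: int, offset: int = 0) -> float:
    """Journaled bytes divided by the bytes the application actually modified."""
    return full_page_journal_bytes(request_size, offset) / request_size


if __name__ == "__main__":
    for size in (100, 1024, 4096):
        print(size, full_page_journal_bytes(size), waste_ratio(size))
```

Under these assumptions a 1KB synchronous write journals four times the data it modifies, and an unaligned 4KB write touches two pages.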
[Figure: Write Traffic (Linux ext3). Total journal volume (MB, log scale) vs. request size (KB), page size = 4KB. Series: Data Journaling (data & metadata), Ordered (metadata only).]
Design Goals
1. Reliable storage: keep data on disk
2. Inexpensive synchronous small writes: sequential disk throughput
3. Reduce disk bandwidth waste due to:
   - writes with high positioning overhead
   - unnecessary writes of unmodified data
• Proposed approach:
  - batch random small writes in memory
  - journal data updates at subpage granularity
Wasteless Journaling
• Idea:
  1. Synchronously transfer data deltas from memory to the journal
  2. Occasionally move data blocks from memory to their final location
• Still wasteful: large writes duplicate disk traffic
[Diagram: MEMORY holds Pages; DISK holds the Journal and the Filesystem; data deltas are transferred to the Journal.]
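The two steps of wasteless journaling can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not the authors' kernel code: the class `WastelessStore`, its method names, and the use of Python containers to stand in for the page cache, the journal, and the filesystem are all hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


class WastelessStore:
    """Toy model: deltas go to the journal synchronously, pages move later."""

    def __init__(self):
        self.pages = {}     # in-memory pages: block number -> bytearray
        self.journal = []   # stand-in for the on-disk journal: (block, offset, data)
        self.disk = {}      # stand-in for final locations: block number -> bytes
        self.dirty = set()  # blocks modified since the last checkpoint

    def sync_write(self, block: int, offset: int, data: bytes) -> None:
        """Step 1: update the page in memory and journal only the modified bytes."""
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write must fit inside one page")
        if block not in self.pages:
            self.pages[block] = bytearray(self.disk.get(block, bytes(PAGE_SIZE)))
        self.pages[block][offset:offset + len(data)] = data
        self.journal.append((block, offset, bytes(data)))
        self.dirty.add(block)

    def checkpoint(self) -> None:
        """Step 2: occasionally move whole pages to their final location."""
        for block in sorted(self.dirty):
            self.disk[block] = bytes(self.pages[block])
        self.dirty.clear()
        self.journal.clear()  # journaled deltas are no longer needed

    def journal_bytes(self) -> int:
        """Data bytes currently held in the journal (tags and headers ignored)."""
        return sum(len(data) for _, _, data in self.journal)


if __name__ == "__main__":
    store = WastelessStore()
    store.sync_write(7, 10, b"abc")
    print(store.journal_bytes())  # journal holds only the modified bytes
    store.checkpoint()
    print(store.disk[7][10:13])
```

In this model a small write costs only its own length in journal traffic, while a write that covers a full page is still written twice (journal, then final location), which is the duplication the slide calls wasteful.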
Selective Journaling
• Definition: a write threshold differentiates requests by size
• Idea:
  1. Transfer large requests to their final location without journaling the data
  2. Treat small requests according to wasteless journaling
[Diagram: MEMORY holds Pages; DISK holds the Journal and the Filesystem; data deltas are transferred to the Journal.]
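The size-based routing can be sketched in a few lines. The threshold value, the function names `route_write` and `split_requests`, and the string labels are assumptions made for illustration; the slides do not state the threshold used in the prototype.

```python
WRITE_THRESHOLD = 4096  # bytes; hypothetical value, not taken from the slides


def route_write(request_size: int, threshold: int = WRITE_THRESHOLD) -> str:
    """Decide where the data of a write request goes.

    Requests below the threshold are journaled as data deltas (wasteless path);
    the rest go straight to their final location without journaling the data.
    """
    if request_size <= 0:
        raise ValueError("request size must be positive")
    return "journal_delta" if request_size < threshold else "final_location"


def split_requests(sizes, threshold: int = WRITE_THRESHOLD):
    """Partition request sizes into (journaled, direct) lists."""
    journaled = [s for s in sizes if route_write(s, threshold) == "journal_delta"]
    direct = [s for s in sizes if route_write(s, threshold) == "final_location"]
    return journaled, direct


if __name__ == "__main__":
    print(split_requests([512, 1024, 4096, 65536]))
```

Treating a request of exactly the threshold size as large is a choice made here; it follows the wording "below a write threshold" used in the conclusions.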
Consistency
• Wasteless Journaling: atomic updates of both data and metadata
• Selective Journaling: data updates are journaled or not depending on request size
  - consistency at least as strict as the default ext3 journaling mode (ordered)
Prototype Implementation
[Diagram: Page Cache block buffers hold Original Data and Modified Data. Data copies of the modified bytes become Data Deltas in a Multiwrite Journal Block. A Journal Descriptor Block holds a Header and one Tag per data delta; each tag records:
  - block number of the final location
  - offset in page
  - length in bytes]
• Multiwrite journal block: accumulates multiple subpage data updates
• During recovery: apply data deltas to the corresponding final disk blocks
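The tag fields and the recovery step can be sketched as follows. This is a minimal sketch, not the ext3 on-disk format: the field widths in `TAG`, the 4-byte count used as a header, and the function names `pack_multiwrite` and `recover` are assumptions made for illustration.

```python
import struct

PAGE_SIZE = 4096  # bytes; assumed block and page size

# Assumed tag layout (not the real on-disk format):
# final block number (u64), offset in page (u16), length in bytes (u16).
TAG = struct.Struct("<QHH")
HEADER = struct.Struct("<I")  # assumed header: number of tags that follow


def pack_multiwrite(deltas):
    """Pack (block, offset, data) deltas into a descriptor and one journal block."""
    tags = bytearray()
    payload = bytearray()
    for block, offset, data in deltas:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("delta must fit inside one page")
        tags += TAG.pack(block, offset, len(data))
        payload += data
    if len(payload) > PAGE_SIZE:
        raise ValueError("deltas do not fit in one multiwrite journal block")
    return HEADER.pack(len(deltas)) + bytes(tags), bytes(payload)


def recover(descriptor: bytes, journal_block: bytes, disk: dict) -> None:
    """Replay: apply each data delta to its final disk block."""
    (count,) = HEADER.unpack_from(descriptor, 0)
    pos, cursor = HEADER.size, 0
    for _ in range(count):
        block, offset, length = TAG.unpack_from(descriptor, pos)
        pos += TAG.size
        page = bytearray(disk.get(block, bytes(PAGE_SIZE)))
        page[offset:offset + length] = journal_block[cursor:cursor + length]
        disk[block] = bytes(page)
        cursor += length


if __name__ == "__main__":
    desc, blk = pack_multiwrite([(5, 0, b"hi"), (9, 100, b"xyz")])
    disk = {}
    recover(desc, blk, disk)
    print(disk[5][:2], disk[9][100:103])
```

Because each tag carries the offset and length, recovery only overwrites the modified byte range and leaves the rest of the final block untouched.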
Experiments
• Implemented in Linux kernel 2.6.18 ext3
• Experimentation environment: x86-based servers
  - quad-core 2.66GHz processor
  - 3GB RAM
  - Seagate Cheetah SAS 300GB 15KRPM disks
• Workloads:
  - Microbenchmarks
  - Postmark
  - MPI-IO over PVFS2
Latency
• Data journaling and wasteless journaling achieve substantially lower write latency, similar to NILFS (a stable Linux port of LFS)
• NILFS read latency is significantly higher due to poor storage locality
[Figure: Write latency (ms, log scale) vs. number of streams (0 to 100) at 1 Mbps/stream. Series: Selective, Ordered, Wasteless, Data, NILFS.]
[Figure: Read latency (us, log scale) vs. number of streams (0 to 100) at 1 Mbps/stream. Series: NILFS, Selective, Ordered, Data, Wasteless.]
Disk Traffic
• Data journaling is expensive in terms of journal traffic
• Ordered journaling incurs increased filesystem traffic
• Wasteless and selective journaling substantially reduce journal and filesystem traffic
[Figure: Journal throughput (MB/s, log scale) vs. number of streams (0 to 8000) at 1Kbps/stream. Series: Data, Wasteless, Selective, Ordered.]
[Figure: File system throughput (MB/s) vs. number of streams (0 to 8000) at 1Kbps/stream. Series: Ordered, Data, Wasteless, Selective.]
Lower is better!
Application-Level Workloads
• Small-files workload (Postmark): wasteless journaling increases transaction throughput
• Parallel I/O workload: 13 clients, 1 PVFS2 data server, 1 PVFS2 metadata server (15 machines)
  - wasteless journaling doubles the throughput of parallel application checkpointing
[Figure: Postmark. Transactions/s vs. request size (KB). Series: Wasteless, Data, Selective, Ordered.]
[Figure: MPI-IO over PVFS2 (write size 1KB). Throughput (MB/s) vs. threads per client. Series: Wasteless, Data, Selective, Ordered.]
Conclusions & Future Work
• Key concept: apply subpage journaling of data updates to ensure reliability
• Wasteless Journaling merges subpage writes into page-sized journal blocks
• Selective Journaling journals only updates below a write threshold
• Performance benefits demonstrated over ext3:
  - reduced write latency
  - improved transaction throughput
  - avoided bandwidth waste
• Future work: extend to virtualization environments and flash memory systems