Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008
![Page 1: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/1.jpg)
Transactional Flash
V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008
Shimin Chen
Big Data Reading Group
![Page 2: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/2.jpg)
Introduction
SSDs expose the same block-level API as disks
• A lost opportunity
Goal: new abstractions that better match the nature of the new medium and the needs of file systems and databases
![Page 3: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/3.jpg)
Idea: Transactional Flash (TxFlash)
An SSD (with new features)
• Addressing: a linear array of pages
• Supports read and write operations
• Supports a simple transactional construct
Each transaction consists of a series of write operations
• Atomicity
• Isolation
• Durability
![Page 4: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/4.jpg)
Why is this useful?
Transaction abstraction required in many places: file system journals, etc.
Each application implements its own transaction mechanism
• Complexity
• Redundant work
• Reliability of the implementation
It would be great if the storage layer provided a transactional API
![Page 5: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/5.jpg)
Previous Work: disk-based
Copy-on-write + logging
• Fragmentation leads to poor read performance
Checkpointing and cleaning
• Cleaning cost
SSDs mitigate these problems
• SSDs already do copy-on-write (CoW) for flash-related reasons
• Random read accesses are fast
![Page 6: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/6.jpg)
Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion
![Page 7: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/7.jpg)
TxFlash Architecture & API
WriteAtomic(p1…pn)
• Declares that pages p1…pn belong to one transaction
• Followed by write(p1)…write(pn)
• Provides atomicity, isolation, and durability
Abort
• Aborts an in-progress transaction
In-progress transaction
• Conflicting writes are not issued
Core of TxFlash
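To make the call sequence on this slide concrete, here is a minimal in-memory sketch. The class `ToyTxFlash` and its method names are hypothetical stand-ins, not the device's real interface, and the sketch assumes at most one transaction in progress at a time. It shows only the visible behavior: pages named in `write_atomic` become readable together once the last one has been written, and `abort` discards the partial updates.

```python
class ToyTxFlash:
    """Hypothetical in-memory stand-in for a TxFlash device (illustration only)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # committed page contents
        self.pending = {}               # page# -> buffered data (None until written)

    def write_atomic(self, page_numbers):
        """Declare the pages that make up the next transaction."""
        if self.pending:
            raise RuntimeError("a transaction is already in progress")
        self.pending = {p: None for p in page_numbers}

    def write(self, page, data):
        """Plain write, or one write of the in-progress transaction."""
        if page not in self.pending:
            self.pages[page] = data     # ordinary, non-transactional write
            return
        self.pending[page] = data
        if all(v is not None for v in self.pending.values()):
            # The last page of the transaction arrived: expose all updates at once.
            for p, d in self.pending.items():
                self.pages[p] = d
            self.pending = {}

    def abort(self):
        """Drop the in-progress transaction; committed contents are untouched."""
        self.pending = {}

    def read(self, page):
        return self.pages[page]
```

Plain `write` calls outside a transaction behave like writes on an ordinary SSD in this sketch, which mirrors the backward-compatibility point made on the next slide.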
![Page 8: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/8.jpg)
Simple Interface
WriteAtomic: multi-page writes
• Useful for file systems
Not a full-fledged transaction: no reads inside a transaction
• Reduces complexity
Backward compatible
![Page 9: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/9.jpg)
Flash is good for this purpose
Copy-on-write: already supported by the FTL
Fast random reads
High concurrency
• Multiple flash chips inside
New device
• A new interface is more likely to be adopted
![Page 10: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/10.jpg)
Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion
![Page 11: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/11.jpg)
Traditional Commit
First write to a log:
• Intention record: (data, page# & version#, transaction ID)
• …
• Intention record
• Commit record
A transaction is committed if and only if its commit record exists
Intention records are then applied to the original data
Once the modifications are done, the records can be garbage collected
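The commit rule above can be sketched as a small recovery routine. This is an illustration under assumptions, not the protocol of any particular system: the log is modeled as a list of dictionaries, and the field names `kind`, `txid`, `page`, `version`, and `data` are made up for the example.

```python
def recover(log):
    """Rebuild page contents from a log, applying only committed transactions.

    A transaction counts as committed only if its commit record is in the log.
    """
    committed = {rec["txid"] for rec in log if rec["kind"] == "commit"}
    latest = {}  # page# -> (version#, data) of the newest committed intention
    for rec in log:
        if rec["kind"] == "intention" and rec["txid"] in committed:
            page, version = rec["page"], rec["version"]
            if page not in latest or version > latest[page][0]:
                latest[page] = (version, rec["data"])
    return {page: data for page, (version, data) in latest.items()}
```

Intention records whose transaction has no commit record are simply ignored, which is why the commit record has to be written after the intention records it covers.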
![Page 12: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/12.jpg)
Traditional Commit on SSDs
Optimizations:
• All writes can be issued in parallel
• The original data is not updated; only the remap table is
Problem: the commit record
• Extra latency, because it is written after the other writes
• Garbage collection is complicated: it must know whether all the updates have completed
![Page 13: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/13.jpg)
New Proposal (1): Simple Cyclic Commit
No commit record
Intention records of the same transaction use next links to form a cycle
• (data, page# & version#, next page# & version#)
A transaction is committed if and only if all of its intention records are written
Flash page (4KB) + metadata (128B) are co-located
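As a rough sketch of the record layout on this slide, the helper below builds the intention records for one transaction and chains them with next links so that the last record points back to the first. The names `Intention` and `build_cycle`, and the choice to order pages by page number, are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple

PageVersion = Tuple[int, int]  # (page#, version#)


class Intention(NamedTuple):
    data: bytes
    page: int
    version: int
    next_link: PageVersion  # the next intention of the same transaction


def build_cycle(updates: Dict[int, bytes],
                current_version: Dict[int, int]) -> List[Intention]:
    """Create one intention per updated page, linked into a cycle."""
    pages = sorted(updates)
    # Each updated page gets the version after its current one (0 if unseen).
    new_version = {p: current_version.get(p, 0) + 1 for p in pages}
    records = []
    for i, p in enumerate(pages):
        nxt = pages[(i + 1) % len(pages)]  # wrap around to close the cycle
        records.append(Intention(updates[p], p, new_version[p],
                                 (nxt, new_version[nxt])))
    return records
```

A single-page transaction ends up with a next link to itself, which is the smallest possible cycle.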
![Page 14: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/14.jpg)
Problem
![Page 15: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/15.jpg)
Solution:
Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page
![Page 16: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/16.jpg)
Operations
Initialization: set version# to 0 and next-link to self
Transaction
Garbage collection:
• Any uncommitted intention
• A committed page, if a newer version is committed
Recovery: scan all pages, then look for cycles
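The recovery step can be sketched as a scan that follows next links and keeps only records that lie on a complete cycle. This is a simplified illustration: it assumes garbage collection has not yet removed any record of a committed transaction, and it does not handle the ambiguity case discussed on the earlier slides. The function name and the dictionary layout are hypothetical.

```python
def last_committed_versions(stored):
    """Return page# -> highest version# that lies on a complete cycle.

    `stored` maps (page#, version#) -> next link (page#, version#) for every
    intention found by scanning the flash.
    """
    committed = set()
    for start in stored:
        if start in committed:
            continue
        path = []
        key = start
        # Follow next links until a record is missing or one repeats.
        while key in stored and key not in path:
            path.append(key)
            key = stored[key]
        if key == start:  # the links closed a cycle through the start record
            committed.update(path)
    latest = {}
    for page, version in committed:
        latest[page] = max(version, latest.get(page, -1))
    return latest
```

A freshly initialized page (version 0, next link to itself) forms a one-record cycle, so it is reported as committed at version 0.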
![Page 17: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/17.jpg)
New Proposal (2): Back Pointer Cyclic Commit
Another way to deal with ambiguity
Intention record:
• (data, page# & version#, next-link, link to last committed version)
![Page 18: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/18.jpg)
A3 is a straddler of A2
Some complexity in garbage collection and recovery because of this
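One way to read the straddler idea: if a newer intention of a page carries a back pointer to a committed version that is older than some intermediate version, that intermediate version cannot have been committed. The sketch below encodes this reading for a single page. The function name and the version-to-back-pointer dictionary are assumptions for illustration, not the protocol's actual data structures.

```python
def is_straddled(back_pointers, version):
    """Check whether `version` of a page is straddled by a newer intention.

    `back_pointers` maps version# -> version# of the last committed version
    recorded in that intention, for all stored intentions of one page.
    """
    return any(back < version < newer
               for newer, back in back_pointers.items())
```

With intentions A1, A2, A3 where A3's back pointer refers to A1, `is_straddled({1: 0, 2: 1, 3: 1}, 2)` is true, matching "A3 is a straddler of A2".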
![Page 19: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/19.jpg)
Protocol Comparison
![Page 20: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/20.jpg)
Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion
![Page 21: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/21.jpg)
Implementation
Simulator
• DiskSim
• Trace-driven SSD simulator (USENIX'08)
• Modifications for TxFlash
• Supports transactions of up to 4MB
Pseudo-device driver for recording traces
TxExt3:
• Employs TxFlash for the Ext3 file system
• Transaction: Ext3 journal commit
![Page 22: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/22.jpg)
Experimental Setup
TxFlash device:
• 32GB: 8 x 4GB flash packages
• 4 I/O operations within every flash package
• 15% of space reserved for garbage collection
Workloads on top of Ext3:
• IOzone: micro-benchmark (no sync writes)
• Linux-build (no sync writes)
• Maildir (sync writes)
• TPC-B: simulates 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)
Synthetic workloads
![Page 23: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/23.jpg)
Cyclic commit vs. Traditional commit
![Page 24: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/24.jpg)
Unlike database logging, transaction sizes are large: no sync writes, and data is included
![Page 25: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/25.jpg)
• Simple cyclic commit has a high cost when there are aborts
![Page 26: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/26.jpg)
![Page 27: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/27.jpg)
TxFlash vs. SSD
Remove WriteAtomic from the traces
Use the SSD simulator
The SSD does not provide any transaction guarantees (so it should have better performance)
![Page 28: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/28.jpg)
Space comparison: TxFlash needs 25% more main memory than the SSD
• 4+1 MB per 4GB of flash: 40 MB for the 32GB TxFlash device
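The figures on this slide can be checked with a few lines of arithmetic. The sketch assumes that, in "4+1 MB", the 4 MB is what a plain SSD needs per 4GB of flash and the 1 MB is the TxFlash addition; the slide does not spell this split out, but it is the reading under which the 25% figure follows.

```python
device_gb = 32
package_gb = 4
ssd_mb_per_package = 4      # assumed baseline SSD memory per 4GB of flash
txflash_extra_mb = 1        # assumed extra memory for TxFlash per 4GB of flash

packages = device_gb // package_gb                                # 8 flash packages
total_mb = packages * (ssd_mb_per_package + txflash_extra_mb)     # 8 * 5 = 40
overhead = txflash_extra_mb / ssd_mb_per_package                  # 1 / 4 = 0.25

print(f"{packages} packages, {total_mb} MB total, {overhead:.0%} more than the SSD")
```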
![Page 29: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/29.jpg)
End-to-end performance
TxFlash: run the pseudo-device driver on a real SSD
• The performance is close to that of TxFlash
Ext3: Use SSD as journal
SSD cache is disabled in both cases
![Page 30: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/30.jpg)
![Page 31: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008](https://reader035.fdocuments.us/reader035/viewer/2022062422/56813065550346895d963cfb/html5/thumbnails/31.jpg)
Summary
TxFlash: adds a transactional interface to the SSD
Cyclic commit protocols
A nice solution for file system journaling