Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008...
Shimin Chen, Big Data Reading Group
Introduction
SSDs expose the same block-level API as disks, which is a lost opportunity.
Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases.
Idea: Transactional Flash (TxFlash), an SSD with new features:
• Addressing: a linear array of pages
• Supports read and write operations
• Supports a simple transactional construct: each transaction consists of a series of write operations and provides atomicity, isolation, and durability
Why is this useful?
The transaction abstraction is required in many places (file system journals, etc.), and each application implements its own, which brings:
• Complexity
• Redundant work
• Risk to the reliability of each implementation
It would be better if the storage layer provided a transactional API.
Previous work: disk-based copy-on-write + logging
• Fragmentation leads to poor read performance
• Checkpointing and cleaning are required
• Cleaning has a cost
SSDs mitigate these problems:
• SSDs already do copy-on-write for flash-related reasons
• Random reads are fast
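The copy-on-write that an FTL already performs can be sketched with a page remap table. The `ToyFTL` class here is a hypothetical illustration, not a real FTL: each logical write lands on a fresh physical page, the remap table is switched to point at it, and the old version stays on flash until cleaning. Cleaning is omitted, so the sketch simply runs out of free pages.

```python
from __future__ import annotations


class ToyFTL:
    """Hypothetical, simplified flash translation layer (not a real FTL)."""

    def __init__(self, num_physical_pages: int) -> None:
        self.flash: list[bytes | None] = [None] * num_physical_pages
        self.free: list[int] = list(range(num_physical_pages))
        self.remap: dict[int, int] = {}  # logical page -> physical page
        self.stale: set[int] = set()     # old versions awaiting cleaning

    def write(self, logical: int, data: bytes) -> None:
        phys = self.free.pop(0)          # copy-on-write: always a fresh page
        self.flash[phys] = data
        old = self.remap.get(logical)
        if old is not None:
            self.stale.add(old)          # previous version stays intact on flash
        self.remap[logical] = phys       # one table update redirects reads

    def read(self, logical: int) -> bytes | None:
        phys = self.remap.get(logical)
        return None if phys is None else self.flash[phys]


ftl = ToyFTL(num_physical_pages=4)
ftl.write(0, b"v1")
ftl.write(0, b"v2")
print(ftl.read(0), ftl.flash[0])  # b'v2' b'v1'
```

Because the old version survives until cleaning, a transactional layer can keep it as the rollback copy at no extra write cost.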
Outline: Introduction, The Case for TxFlash, Commit Protocols, Implementation, Evaluation, Conclusion
TxFlash Architecture & API
WriteAtomic(p1…pn): declares that p1…pn form a transaction; it is followed by write(p1)…write(pn). Provides atomicity, isolation, and durability.
Abort: aborts an in-progress transaction.
Assumption: in-progress transactions do not issue conflicting writes.
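A minimal in-memory model of the WriteAtomic/Abort contract follows. The method names (`write_atomic`, `write`, `abort`) are illustrative assumptions, not the device's actual interface. Pages declared in `write_atomic` become visible together only once the last of them has been written, and `abort` discards the staged writes. Durability is not modeled, and the conflict check only enforces the stated assumption that in-progress transactions do not touch the same pages.

```python
from __future__ import annotations

import itertools


class ToyTxFlash:
    """Hypothetical in-memory model of the TxFlash interface, not its real API."""

    def __init__(self, num_pages: int) -> None:
        self.pages: list[bytes] = [b""] * num_pages  # committed contents
        self._ids = itertools.count(1)
        # transaction id -> {declared page: staged data, or None if not yet written}
        self._pending: dict[int, dict[int, bytes | None]] = {}

    def write_atomic(self, page_numbers: list[int]) -> int:
        """Declare the pages p1..pn of a new transaction and return its id."""
        busy = {p for staged in self._pending.values() for p in staged}
        if busy & set(page_numbers):
            # The slides make this the caller's job; the model only checks it.
            raise ValueError("conflicts with an in-progress transaction")
        tx = next(self._ids)
        self._pending[tx] = {p: None for p in page_numbers}
        return tx

    def write(self, tx: int, page: int, data: bytes) -> None:
        staged = self._pending[tx]
        if page not in staged:
            raise KeyError(f"page {page} was not declared in write_atomic")
        staged[page] = data
        if all(v is not None for v in staged.values()):
            # Last declared page arrived: all pages become visible together.
            for p, v in staged.items():
                self.pages[p] = v
            del self._pending[tx]

    def abort(self, tx: int) -> None:
        """Drop an in-progress transaction; committed pages are untouched."""
        del self._pending[tx]
```

For example, after `t = dev.write_atomic([1, 2])` and `dev.write(t, 1, b"a")`, page 1 still reads as empty; it changes only when `dev.write(t, 2, b"b")` completes the transaction.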
Core of TxFlash
Simple interface: WriteAtomic provides multi-page writes
• Useful for file systems
• Not full-fledged transactions: no reads inside a transaction
• Reduces complexity
• Backward compatible
Flash is a good fit for this purpose:
• Copy-on-write is already supported by the FTL
• Fast random reads
• High concurrency: multiple flash chips inside
• New device: a new interface is more likely to be adopted
Traditional Commit
First write to a log:
• Intention records: (data, page# & version#, tranx ID), …, followed by a commit record
A transaction is committed iff its commit record exists. The intention records are then applied to modify the original data; once the modifications are done, the records can be garbage collected.
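The log-then-commit scheme can be sketched as follows. The record types and the `log_transaction`/`recover` helpers are hypothetical, and the log is a Python list standing in for stable storage. Recovery redoes the intentions of committed transactions into their home pages, after which every record can be collected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    tx_id: int
    page: int
    version: int
    data: bytes


@dataclass(frozen=True)
class Commit:
    tx_id: int


def log_transaction(log: list[Intention | Commit], tx_id: int,
                    writes: dict[int, tuple[int, bytes]]) -> None:
    """Append intention records, then the commit record that makes them count."""
    for page, (version, data) in writes.items():
        log.append(Intention(tx_id, page, version, data))
    log.append(Commit(tx_id))  # only after every intention is in the log


def recover(log: list[Intention | Commit], home: dict[int, bytes]) -> None:
    """Redo committed transactions from the log into the original pages."""
    # A transaction is committed iff its commit record reached the log.
    committed = {r.tx_id for r in log if isinstance(r, Commit)}
    for r in log:  # log order: later versions overwrite earlier ones
        if isinstance(r, Intention) and r.tx_id in committed:
            home[r.page] = r.data
    # The original data now reflects every committed transaction, so all
    # records (including those of uncommitted transactions) can be collected.
    log.clear()
```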
Traditional Commit on SSDs
Optimizations:
• All writes can be issued in parallel
• No need to update the original data; just update the remap table
Problems with the commit record:
• Extra latency: it can only be written after all other writes complete
• Garbage collection is complicated: it must know whether all the updates have completed
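A back-of-the-envelope model makes the commit-record cost concrete. It is a simplification assuming that the n intention writes are issued in parallel with latencies t_1, …, t_n, and that the commit record is one more write of latency t_c that may only start once all of them finish:

```latex
T_{\text{traditional}} = \max_{1 \le i \le n} t_i + t_c
\qquad
t_i = t_c = t_w \;\Rightarrow\; T_{\text{traditional}} = 2\,t_w
```

Under the same model, a protocol that needs no separate commit record finishes in \max_i t_i = t_w.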
New Proposal (1): Simple Cyclic Commit
• No commit record
• Intention records of the same transaction use next links to form a cycle: (data, page# & version#, next page# & version#)
• A transaction is committed iff all of its intention records are written
• A flash page (4KB) and its metadata (128B) are co-located, so each page can carry its own record fields
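Under simple cyclic commit, deciding whether a transaction committed reduces to checking that its next links close into a cycle. A sketch, assuming records are kept in a dictionary keyed by (page#, version#); `Record`, `make_transaction`, and `is_committed` are hypothetical names. A single-page transaction links to itself, like the version-0 pages created at initialization.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical intention record: (data, page# & version#, next page# & version#)."""
    page: int
    version: int
    next_page: int
    next_version: int
    data: bytes = b""


def make_transaction(writes: list[tuple[int, int]]) -> list[Record]:
    """Link the (page#, version#) writes of one transaction into a cycle."""
    n = len(writes)
    return [Record(p, v, *writes[(i + 1) % n]) for i, (p, v) in enumerate(writes)]


def is_committed(start: tuple[int, int], store: dict[tuple[int, int], Record]) -> bool:
    """Committed iff following next links from `start` returns to `start`."""
    seen: set[tuple[int, int]] = set()
    cur = start
    while cur not in seen:
        rec = store.get(cur)
        if rec is None:
            return False  # some intention of the transaction never reached flash
        seen.add(cur)
        cur = (rec.next_page, rec.next_version)
    return cur == start


tx = make_transaction([(1, 1), (2, 1), (3, 1)])
store = {(r.page, r.version): r for r in tx}
print(is_committed((1, 1), store))  # True
del store[(3, 1)]                   # simulate a crash before page 3 was written
print(is_committed((1, 1), store))  # False
```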
Problem: ambiguity when a cycle is broken
Solution: any uncommitted intention on stable storage must be erased before any new writes are issued to the same or a referenced page.
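The ambiguity arises when a next link points to a record that is not on flash: it may never have been written (an aborted transaction), or it may have committed and later been garbage collected after a newer version of its page committed. A simplified reading of how the erase-before-write rule settles this follows, assuming version numbers are never reused; all names are hypothetical. If a newer version of the missing page exists, an uncommitted referrer would already have been erased, so a surviving referrer must be committed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    page: str
    version: int
    next_page: str
    next_version: int


def scc_committed(start: tuple[str, int], store: dict[tuple[str, int], Record]) -> bool:
    """Committed test, assuming the erase-before-write rule was always followed."""
    if start not in store:
        return False
    seen: set[tuple[str, int]] = set()
    cur = start
    while cur not in seen:
        rec = store.get(cur)
        if rec is None:
            page, version = cur
            # Missing target: explainable as "committed, then collected" only if
            # a newer version of that page was written, which the rule forbids
            # while an uncommitted record still references this page.
            return any(p == page and v > version for p, v in store)
        seen.add(cur)
        cur = (rec.next_page, rec.next_version)
    return cur == start


# History 1: T1 commits A2+B2, T2 later commits B3, GC collects obsolete B2.
# History 2: T1 writes only A2 before a crash, then T2 writes B3.
# Both leave the same records on flash:
store = {("A", 2): Record("A", 2, "B", 2), ("B", 3): Record("B", 3, "B", 3)}
# The rule forbids history 2 (A2 would have been erased before B3 was written),
# so the surviving A2 must be committed.
print(scc_committed(("A", 2), store))  # True
```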
Operations
• Initialization: set version# to 0 and the next link to the page itself
• Transaction
• Garbage collection: collect any uncommitted intention; collect a committed page if a newer version of it is committed
• Recovery: scan all pages, then look for cycles
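Initialization, garbage collection, and recovery can be combined into one post-crash pass. A sketch under simplifying assumptions: every transaction in flight at the crash is treated as aborted, the committed test follows next links and accepts a missing link only when a newer version of that page exists (the erase-before-write rule), and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    page: int
    version: int
    next_page: int  # next link: (page#, version#) of the next intention
    next_version: int


def initialize(num_pages: int) -> dict[tuple[int, int], Record]:
    """Version 0 of every page links to itself: a trivially complete cycle."""
    return {(p, 0): Record(p, 0, p, 0) for p in range(num_pages)}


def committed(start: tuple[int, int], store: dict[tuple[int, int], Record]) -> bool:
    seen: set[tuple[int, int]] = set()
    cur = start
    while cur not in seen:
        rec = store.get(cur)
        if rec is None:
            # Missing link: acceptable only if a newer version of that page exists.
            return any(p == cur[0] and v > cur[1] for p, v in store)
        seen.add(cur)
        cur = (rec.next_page, rec.next_version)
    return cur == start


def recover(store: dict[tuple[int, int], Record]) -> dict[int, int]:
    """Scan all records, collect garbage in place, return page -> current version."""
    ok = {key for key in store if committed(key, store)}
    current: dict[int, int] = {}
    for page, version in ok:
        current[page] = max(version, current.get(page, -1))
    for key in list(store):
        page, version = key
        # Collect uncommitted intentions and committed versions superseded by a newer one.
        if key not in ok or version < current[page]:
            del store[key]
    return current


store = initialize(3)
store[(0, 1)] = Record(0, 1, 1, 1)  # committed transaction over pages 0 and 1
store[(1, 1)] = Record(1, 1, 0, 1)
store[(2, 1)] = Record(2, 1, 0, 2)  # torn: its partner (0, 2) never reached flash
print(recover(store))  # page -> current version: 0 and 1 at version 1, 2 at version 0
```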
New Proposal (2): Back Pointer Cyclic Commit
Another way to deal with the ambiguity. Intention record:
(data, page# & version#, next link, link to the last committed version)
A3 is a straddler of A2.
This adds some complexity to garbage collection and recovery.
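Back pointers let the device prove that a version never committed without erasing it first. The sketch assumes one reading of the straddler relation: a record A_j straddles A_i when j > i but A_j's back pointer names a committed version older than i. Since in-progress transactions do not conflict, A_i's transaction had already finished when A_j was written, so A_i cannot have committed. `BPRecord`, `straddlers`, and `known_uncommitted` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BPRecord:
    """Hypothetical back pointer cyclic commit intention record."""
    page: str
    version: int
    next_page: str        # next link, as in simple cyclic commit
    next_version: int
    last_committed: int   # back pointer: last committed version of this page when written


def straddlers(page: str, version: int, records: list[BPRecord]) -> list[BPRecord]:
    """Newer versions of `page` whose back pointer skips over `version`."""
    return [
        r for r in records
        if r.page == page and r.version > version and r.last_committed < version
    ]


def known_uncommitted(page: str, version: int, records: list[BPRecord]) -> bool:
    # A straddler's writer saw an older committed version, so `version` never committed.
    return bool(straddlers(page, version, records))


a1 = BPRecord("A", 1, "A", 1, 0)
a2 = BPRecord("A", 2, "B", 2, 1)  # its transaction never finished
a3 = BPRecord("A", 3, "A", 3, 1)  # written later; last committed A was still A1
print(known_uncommitted("A", 2, [a1, a2, a3]))  # True: A3 straddles A2
```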
Protocol Comparison
Implementation
Simulator: DiskSim-based trace-driven SSD simulator (USENIX'08), modified for TxFlash
• Supports transactions of up to 4MB
Pseudo-device driver for recording traces
TxExt3: employs TxFlash for the Ext3 file system; each Ext3 journal commit is a transaction
Experimental Setup
TxFlash device:
• 32GB: 8 × 4GB flash packages
• 4 I/O operations within every flash package
• 15% of space reserved for garbage collection
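The device parameters imply the following quantities, assuming the 15% reserve is taken from the full 32GB of raw flash and that the 4 I/O operations per package can proceed concurrently:

```latex
\begin{aligned}
\text{capacity} &= 8 \times 4\,\text{GB} = 32\,\text{GB} \\
\text{reserved for GC} &= 0.15 \times 32\,\text{GB} = 4.8\,\text{GB}
  \quad\Rightarrow\quad 32 - 4.8 = 27.2\,\text{GB usable} \\
\text{concurrent operations} &\le 8 \times 4 = 32
\end{aligned}
```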
Workloads on top of Ext3:
• IOzone: micro benchmark (no sync writes)
• Linux-build (no sync writes)
• Maildir (sync writes)
• TPC-B: simulates 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)
• Synthetic workloads
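Cyclic commit vs. Traditional commit
Unlike database logging, transactions are large: there are no sync writes, and data are included.
• Simple cyclic commit has a high cost if there are aborts, since every uncommitted intention must be erased before its pages are written again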
TxFlash vs. SSD
Remove WriteAtomic from the traces and run them on the SSD simulator. The SSD does not provide any transaction guarantees, so it should perform better.
Space comparison: TxFlash needs 25% more main memory than the SSD
• 4+1 MB per 4GB of flash, i.e., 40 MB for the 32GB TxFlash device
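The memory figures are consistent with each other if "4+1 MB" is read as 4 MB of main memory for a plain SSD plus 1 MB extra for TxFlash per 4GB package (an interpretation of the slide, not stated on it):

```latex
\begin{aligned}
\text{SSD:} &\quad 4\,\text{MB per } 4\,\text{GB package} \\
\text{TxFlash:} &\quad 4 + 1 = 5\,\text{MB per package}, \qquad \tfrac{5 - 4}{4} = 25\% \\
\text{32GB device:} &\quad 8 \times 5\,\text{MB} = 40\,\text{MB}
\end{aligned}
```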
End-to-end performance
TxFlash: run the pseudo-device driver on a real SSD; the performance is close to that of TxFlash
Ext3: uses the SSD as the journal
The SSD cache is disabled in both cases
Summary
TxFlash:
• Adds a transaction interface to the SSD
• Cyclic commit protocols
• A nice solution for file system journaling