Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008

Post on 31-Dec-2015



Transactional Flash

V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008

Shimin Chen

Big Data Reading Group

Introduction

SSDs expose the same block-level API as disks: a lost opportunity

Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases

Idea: Transactional Flash (TxFlash)

An SSD (w/ new features)
Addressing: a linear array of pages
Supports read and write operations
Supports a simple transactional construct

Each tranx consists of a series of write operations
Guarantees: atomicity, isolation, durability

Why is this useful?

Transaction abstraction required in many places: file system journals, etc.

Each application implements its own:
Complexity
Redundant work
Reliability of the implementation

Great if the storage layer provides a transactional API

Previous Work: disk-based

Copy-on-Write + Logging
Fragmentation: poor read performance

Checkpointing and cleaning
Cleaning cost

SSDs mitigate these problems
SSDs already do CoW for flash-related reasons
Random read accesses are fast

Outline

Introduction, The Case for TxFlash, Commit Protocols, Implementation, Evaluation, Conclusion

TxFlash Architecture & API

WriteAtomic(p1…pn)
p1…pn are in a tranx
Followed by write(p1)…write(pn)
Guarantees atomicity, isolation, durability

Abort
Aborts the in-progress tranx

While a tranx is in progress
Do not issue conflicting writes
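The interface above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class name `TxFlashSketch`, its method names, and the restriction to one in-progress tranx are hypothetical choices made for illustration, not the device's actual firmware or driver API.

```python
class TxFlashSketch:
    """Toy in-memory model of WriteAtomic/Abort (hypothetical, for illustration)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # committed page contents
        self.pending = None             # page numbers named by the open tranx
        self.staged = {}                # page number -> data written so far

    def write_atomic(self, page_numbers):
        # Announce which pages belong to the tranx; the writes follow.
        if self.pending is not None:
            raise RuntimeError("this sketch allows one tranx in progress")
        self.pending = set(page_numbers)
        self.staged = {}

    def write(self, page, data):
        if self.pending is None or page not in self.pending:
            self.pages[page] = data     # plain, non-transactional write
            return
        self.staged[page] = data
        if set(self.staged) == self.pending:
            # Every page of the tranx has arrived: make them all visible.
            for p, d in self.staged.items():
                self.pages[p] = d
            self.pending, self.staged = None, {}

    def abort(self):
        # Drop the staged writes; committed contents are untouched.
        self.pending, self.staged = None, {}

    def read(self, page):
        return self.pages[page]
```

In this model a reader never sees a partially applied tranx: staged data becomes visible only once the last write of the tranx arrives, and Abort discards everything staged so far.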

Core of TxFlash

Simple Interface

WriteAtomic: multi-page writes
Useful for file systems

Not full-fledged tranx: no reads in tranx
Reduces complexity

Backward compatible

Flash is good for this purpose

Copy-on-write: already supported by FTL
Fast random reads
High concurrency: multiple flash chips inside
New device: new interface more likely

Outline

Introduction, The Case for TxFlash, Commit Protocols, Implementation, Evaluation, Conclusion

Traditional Commit

First write to a log:
Intention record: (data, page# & version#, tranx ID)
…
Intention record
Commit record

Tranx is committed == commit record exists
Intention records modify original data
If modifications are done, the records can be garbage collected
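The commit rule on this slide can be sketched in a few lines. The record layout (dicts with `kind`, `tx`, `page`, `data` keys) and the function names are assumptions made for illustration; version numbers are omitted to keep the example short.

```python
def committed_tranx_ids(log):
    """A tranx is committed exactly when its commit record is in the log."""
    return {rec["tx"] for rec in log if rec["kind"] == "commit"}


def replay(log, pages):
    """Apply intention records of committed tranx to a copy of the pages."""
    committed = committed_tranx_ids(log)
    result = dict(pages)
    for rec in log:
        if rec["kind"] == "intent" and rec["tx"] in committed:
            result[rec["page"]] = rec["data"]
    return result


log = [
    {"kind": "intent", "tx": 1, "page": "A", "data": "a1"},
    {"kind": "intent", "tx": 1, "page": "B", "data": "b1"},
    {"kind": "commit", "tx": 1},
    {"kind": "intent", "tx": 2, "page": "A", "data": "a2"},  # no commit record
]
recovered = replay(log, {"A": "a0", "B": "b0"})
```

Tranx 2 has an intention record but no commit record, so its write to page A is ignored during replay.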

Traditional Commit on SSDs

Optimizations:
All writes can be issued in parallel
Do not update the original data, just update the remap table

Problem: commit record
Extra latency after other writes
Garbage collection is complicated: must know if all the updates complete or not

New Proposal (1): Simple Cyclic Commit

No commit record
Intention records of the same tranx use next links to form a cycle:
(data, page# & version#, next page# & version#)

Tranx is committed == all intention records are written

Flash page (4KB) + metadata (128B) are co-located
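A minimal sketch of the cycle test follows. It assumes the stored intention records are available as a mapping from (page#, version#) to the next link, and it ignores the data payload; the function name and the mapping layout are hypothetical.

```python
def is_committed(stored, start):
    """Follow next links from `start`; committed iff they lead back to it.

    `stored` maps (page, version) -> next (page, version) for every
    intention record found on flash.
    """
    current = start
    for _ in range(len(stored)):
        if current not in stored:
            return False          # a record of the cycle was never written
        current = stored[current]
        if current == start:
            return True           # the cycle closed: all records are present
    return False


# Tranx writing A (version 1) and B (version 1); both records reached flash.
complete = {("A", 1): ("B", 1), ("B", 1): ("A", 1)}
# Same tranx, but the record for B was lost before it was written.
partial = {("A", 1): ("B", 1)}
```

Here `is_committed(complete, ("A", 1))` holds while `is_committed(partial, ("A", 1))` does not. The test is only unambiguous if stale uncommitted intentions are erased before the same or a referenced page is written again.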

Problem

Solution:

Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page

Operations

Initialization: set version# to 0, next-link to self

Transaction

Garbage collection:
For any uncommitted intention
For a committed page, if a newer version is committed

Recovery: scan all pages, then look for cycles
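The two garbage-collection cases can be written down as a small function. It assumes the set of committed records has already been determined (for example by the cycle check during recovery); the function name and the (page#, version#) tuples are hypothetical.

```python
def collectible(records, committed):
    """Return the records that garbage collection may reclaim.

    records:   all (page, version) intention records on flash
    committed: the subset of `records` known to be committed
    """
    newest = {}
    for page, version in committed:
        newest[page] = max(version, newest.get(page, -1))

    garbage = set()
    for page, version in records:
        if (page, version) not in committed:
            garbage.add((page, version))   # case 1: uncommitted intention
        elif version < newest[page]:
            garbage.add((page, version))   # case 2: newer committed version exists
    return garbage


records = {("A", 0), ("A", 1), ("A", 2), ("B", 0)}
committed = {("A", 0), ("A", 1), ("B", 0)}
garbage = collectible(records, committed)
```

In this example A version 2 is reclaimable because it is uncommitted, and A version 0 is reclaimable because A version 1 supersedes it.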

New Proposal (2): Back Pointer Cyclic Commit

Another way to deal with ambiguity
Intention record: (data, page# & version#, next-link, link to last committed version)

A3 is a straddler of A2

Some complexity in garbage collection and recovery because of this
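A sketch of the straddler test, under an explicit assumption about its definition: a later version of the same page whose back pointer (link to last committed version) refers to a version older than j is taken to be a straddler of version j, which shows that version j was not committed. The function name and the per-page mapping are hypothetical.

```python
def has_straddler(versions, j):
    """versions maps version# -> back-pointer version# for a single page.

    Assumed definition: version k straddles version j when k > j and
    the back pointer of k refers to a version older than j.
    """
    return any(k > j and back < j for k, back in versions.items())


# Page A: version 2 was written but never committed, so version 3
# still points back to version 1 as the last committed version.
page_a = {1: 0, 2: 1, 3: 1}
```

With these records `has_straddler(page_a, 2)` is true (A3 skips over A2), while `has_straddler(page_a, 1)` is false.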

Protocol Comparison

Outline

Introduction, The Case for TxFlash, Commit Protocols, Implementation, Evaluation, Conclusion

Implementation

Simulator: DiskSim
Trace-driven SSD simulator (USENIX'08)
Modifications for TxFlash
Supports tranx of maximum size 4MB

Pseudo-device driver for recording traces

TxExt3:
Employs TxFlash for the Ext3 file system
Tranx: Ext3 journal commit

Experimental Setup

TxFlash device:
32GB: 8x 4GB flash packages
4 I/O operations within every flash package
15% of space reserved for garbage collection

Workloads on top of Ext3:
IOzone: micro benchmark (no sync writes)
Linux-build (no sync writes)
Maildir (sync writes)
TPC-B: simulates 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)

Synthetic workloads

Cyclic commit vs. Traditional commit

Unlike database logging, large tranx sizes: no sync; data are included

Simple cyclic commit has a high cost if there are aborts

TxFlash vs. SSD

Remove WriteAtomic from traces
Use SSD simulator
SSD does not provide any transaction guarantees (so should have better performance)

Space comparison: TxFlash needs 25% more main memory than SSD
4+1 MB per 4GB flash: 40 MB for the 32GB TxFlash device
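The memory figures can be checked with a short calculation. It assumes "4+1 MB" means 4 MB of baseline SSD state plus 1 MB of TxFlash-specific state per 4GB package; that reading and the constant names are assumptions.

```python
PACKAGES = 8                # 32GB device built from 8x 4GB flash packages
SSD_MB_PER_PACKAGE = 4      # assumed baseline SSD memory per package
EXTRA_MB_PER_PACKAGE = 1    # assumed additional memory TxFlash needs per package

txflash_total_mb = PACKAGES * (SSD_MB_PER_PACKAGE + EXTRA_MB_PER_PACKAGE)
ssd_total_mb = PACKAGES * SSD_MB_PER_PACKAGE
overhead = (txflash_total_mb - ssd_total_mb) / ssd_total_mb

print(txflash_total_mb, ssd_total_mb, overhead)
```

Under this reading the device needs 40 MB against 32 MB for a plain SSD, which matches the 25% figure.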

End-to-end performance

TxFlash:
Run pseudo-device driver on real SSD
The performance is close to that of TxFlash

Ext3: Use SSD as journal

SSD cache is disabled in both cases

Summary

TxFlash:
Adds a transaction interface to the SSD
Cyclic commit protocols

Nice solution for file system journaling