Project Almanac: A Time-Traveling Solid-State...
Transcript of Project Almanac: A Time-Traveling Solid-State...
Project Almanac: A Time-Traveling Solid-State Drive
Xiaohao Wang Yifan Yuan Chance C. Coats Jian Huang
Systems and Platform Research Group
You Zhou
Retaining Past Storage States is Important
User file operations such as multi-versioning
Retaining Past Storage States is Important
Recovering files following a system crash
Retaining Past Storage States is Important
Storage forensics and police investigations
Software-Based Approaches to Retain Past Storage States
Journaling and
Logging
Software-Based Approaches to Retain Past Storage States
Journaling and
Logging
Local and Cloud
Backups
Software-Based Approaches to Retain Past Storage States
Journaling and
Logging
Local and Cloud
BackupsSnapshots and
Checkpointing
Software-Based Approaches to Retain Past Storage States
Journaling and
Logging
Local and Cloud
BackupsSnapshots and
Checkpointing
Software-based approaches not only reduce storage
performance, but also increase storage cost!
The Security Problem with Software-Based Approaches
The Security Problem with Software-Based Approaches
Software systems are vulnerable to malware
The Security Problem with Software-Based Approaches
Malware may obtain kernel privileges and stop
or destroy backups and logging functions
The Underlying Problem: Threat Model Analysis
Block Interface Driver
Kernel Space
User Space
Applications
Block I/O Storage Device
Read/Write Requests
The Underlying Problem: Threat Model Analysis
Block Interface Driver
Kernel Space
User Space
Applications
Block I/O Storage Device
Read/Write Requests
Software cannot be trusted to retain storage states
in the presence of malware attacks!
The Underlying Problem: Threat Model Analysis
Block Interface Driver
Kernel Space
User Space
Applications
Block I/O Storage Device
Read/Write Requests
We therefore turn to hardware solutions to improve
the security of retaining storage states!
Project Almanac: Our Goals
Firmware-isolated
Protection
Project Almanac: Our Goals
Firmware-isolated
Protection
Minimal
Performance
Overhead
Project Almanac: Our Goals
Firmware-isolated
Protection
Preserving
Software
Functionality
Minimal
Performance
Overhead
Hardware Protection Motivated by Flash Technology
Block Interface Driver
Kernel Space
User Space
Applications
Read/Write Requests
Block I/O Storage Device
Hardware Protection Motivated by Flash Technology
Block Interface Driver
Kernel Space
User Space
Applications
Read/Write Requests
Hard Disk Drive
Hardware Protection Motivated by Flash Technology
Block Interface Driver
Kernel Space
User Space
Applications
Read/Write Requests
Hard Disk Drive Flash-based SSD
Why Solid-State Drives?
Lower Latency and
Higher Throughput
Why Solid-State Drives?
Lower Latency and
Higher Throughput
Massive
Parallelism
Why Solid-State Drives?
Lower Latency and
Higher Throughput
Massive
ParallelismCommodity Pricing
$0.20/GB
Why Solid-State Drives?
Lower Latency and
Higher Throughput
Massive
ParallelismCommodity Pricing
$0.20/GB
Flash is widely used in modern computing systems!
Solid-State Drive: A Perfect Candidate
SSD Controller/Firmware
Embedded
ProcessorBlock I/O
Interface
DRAM
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Channel 0
Channel 1
Channel 2
Channel 3
Solid-State Drive: A Perfect Candidate
SSD Controller/Firmware
Embedded
ProcessorBlock I/O
Interface
DRAM
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Channel 0
Channel 1
Channel 2
Channel 3
Firmware-Isolated Interface
Solid-State Drive: A Perfect Candidate
SSD Controller/Firmware
Embedded
ProcessorBlock I/O
Interface
DRAM
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Channel 0
Channel 1
Channel 2
Channel 3
Embedded Processor with DRAM
Solid-State Drive: A Perfect Candidate
SSD Controller/Firmware
Embedded
ProcessorBlock I/O
Interface
DRAM
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Flash
Chip
Channel 0
Channel 1
Channel 2
Channel 3
Massive Parallelism from Flash Channels
How SSDs Used in Modern Computer Systems?
User Applications
How SSDs Used in Modern Computer Systems?
User Applications
File System
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash-based Disk
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
Write A
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
Write A
A
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
A
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
A
Write A’
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
A
Write A’
A A’
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
AA A’
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
AA A’
Garbage
Collection
How SSDs Used in Modern Computer Systems?
User Applications
File System
Flash Translation Layer (FTL)
Flash Chips
Out-of-Place Updates
AA A’
Garbage
Collection
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Write A
Blocks Blocks
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Write A
Blocks Blocks
A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Write A
Blocks Blocks
A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Write A
Blocks Blocks
A A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
Write A’
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
Write A’
A’
A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
A’
A
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
A’
A
Write A’
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
A’
A
Write A’
A’
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
A’
A A’
Retaining Storage-States with Out-of-Place Update
Solid-State Drive Hard Disk Drive
Blocks Blocks
A A
A’
A A’
Past Storage States are Inherently Retained in Flash!
Limited Storage Capacity? Compression is Key!
Differences between pages are encoded as deltas
Limited Storage Capacity? Compression is Key!
Page
A’
Reference Flash
Page
Differences between pages are encoded as deltas
Limited Storage Capacity? Compression is Key!
Page
A’
Reference Flash
Page
+
Retained Invalid
Flash Page
Page
A
Differences between pages are encoded as deltas
Limited Storage Capacity? Compression is Key!
Page
A’
Reference Flash
Page
+ =Page
A
Delta
Compressed Delta
with CR of ~0.2
Retained Invalid
Flash Page
Page
A
Differences between pages are encoded as deltas
Limited Storage Capacity? Compression is Key!
Page
A’
Reference Flash
Page
+ =Page
A
Delta
Compressed Delta
with CR of ~0.2
Retained Invalid
Flash Page
Page
A
Multiple deltas are coalesced into delta blocks
Limited Storage Capacity? Compression is Key!
Page
A’
Reference Flash
Page
+ =Page
A
Delta
Compressed Delta
with CR of ~0.2
Retained Invalid
Flash Page
Page
A
Delta Compression allows for retaining more storage state!
How to Achieve the Time-Traveling Property?
Flash Page
OOB Region
Data
Utilize Out-of-Band
Region
How to Achieve the Time-Traveling Property?
Flash Page
OOB Region
Data
Utilize Out-of-Band
Region
LPA
How to Achieve the Time-Traveling Property?
Flash Page
OOB Region
Data
Utilize Out-of-Band
Region
LPATimestamp
How to Achieve the Time-Traveling Property?
Flash Page
OOB Region
Data
Utilize Out-of-Band
Region
LPATimestampBack-Pointer
PPA Y
Null
A’
A
Page X
Page Y
How to Achieve the Time-Traveling Property?
Flash Page
OOB Region
Data
Utilize Out-of-Band
Region
Maintain Chains of
Back-Pointers
LPATimestampBack-Pointer
D → Z
C → Y
B → X
A → W
...PPA Y
Null
A’
A
Page X
Page Y
How to Achieve the Time-Traveling Property?
Flash Page
OOB Region
Data
Utilize Out-of-Band
Region
Maintain Chains of
Back-Pointers
Quickly Traverse
with Mapping Tables
LPATimestampBack-Pointer
Maintaining a Complete Chain
Data page chains are vulnerable to Garbage Collection
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Data page Y is selected for garbage collection!
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Data page Y is selected for garbage collection!
Data Page A
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Data page Y is selected for garbage collection!
Data page A and Z will be
lost!
Data Page A
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Delta page chains are NOT vulnerable to garbage collection!
Data Page A
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Page Y and Z are compressed to create a delta chain
Data Page A
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
Page Y and Z are compressed to create a delta chain
...
B → Q
...
Index Mapping
Table
... ... ... ...
Delta Page Q Delta Page P
Delta Compress
Data Page A
Maintaining a Complete Chain
...
B → X
...
Address
Mapping Table
Data Y Data Z Data --
Data Page X Data Page Y Data Page Z
--
...
B → Q
...
Index Mapping
Table
... ... ... ...
Delta Page Q Delta Page P
All storage states can be retained with data and delta chains!
Data Page A
Reclaiming Garbage Data in Time Order
Garbage collection must quickly find the state of invalid pages
Reclaiming Garbage Data in Time Order
Garbage collection must quickly find the state of invalid pages
Page
A
Page
B
Reclaiming Garbage Data in Time Order
Garbage collection must quickly find the state of invalid pages
Page
A
Retained Invalid
Page → Keep
Page
B
Reclaiming Garbage Data in Time Order
Garbage collection must quickly find the state of invalid pages
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
A table could be used to track page invalidation timestamps
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
Even for reasonable SSD sizes, this table is too large to cache
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
We need a space-efficient solution!
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
We use Bloom filters to track invalid pages efficiently
0
1
1
...
0
1
0
Active Bloom
Filter
Invalid PPAs
Hash Functions
h0
h1
h2
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
0
1
1
...
0
1
0
Active Bloom
Filter
Invalid PPAs
Hash Functions
h0
h1
h2
Invalid pages are added to the active Bloom filter
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
0
1
1
...
0
1
0
Active Bloom
Filter
Invalid PPAs
Hash Functions
h0
h1
h2
0
1
1
...
0
1
0
Inactive Bloom
Filter
More Bloom filters are created as pages are invalidated
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
0
1
1
...
0
1
0
Active Bloom
Filter
Invalid PPAs
Hash Functions
h0
h1
h2
0
1
1
...
0
1
0
Inactive Bloom
Filter
All filters are checked for hits during garbage collection
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
0
1
1
...
0
1
0
Active Bloom
Filter
Invalid PPAs
Hash Functions
h0
h1
h2
0
1
1
...
0
1
0
Inactive Bloom
Filter
Any pages which hit in the filters must be retained
Reclaiming Garbage Data in Time Order
Page
A
Retained Invalid
Page → Keep
Expired Page →
Reclaimable
Page
B
Physical Page
Address
Invalidation Time
0 T0
1 T11
...
N-2 T2
N-1 T5
0
1
1
...
0
1
0
Active Bloom
Filter
Invalid PPAs
Hash Functions
h0
h1
h2
0
1
1
...
0
1
0
Inactive Bloom
Filter
Bloom filters quickly identify expired data and save space!
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Retention
Duration
Manager
Garbage
Collection
Garbage Collection
Feedback
Retrievable Time Window
011...
010
T0
011...
010
T1
BF0
BF1
011...
010
T2
BF2
Invalid
PPAs
Hash Functions
h0
h1
h2
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Bloom Filters Hold
PPAs in Order
Retention
Duration
Manager
Garbage
Collection
Garbage Collection
Feedback
Retrievable Time Window
011...
010
T0
011...
010
T1
BF0
BF1
011...
010
T2
BF2
Invalid
PPAs
Hash Functions
h0
h1
h2
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Bloom Filters Hold
PPAs in Order
Retention
Duration
Manager
Garbage
Collection
Garbage Collection
Feedback
Retrievable Time Window
011...
010
T0
011...
010
T1
BF0
BF1
011...
010
T2
BF2
Invalid
PPAs
Hash Functions
h0
h1
h2
Workload Variations: Keeping an Adaptive Window
Trade-off Between
Performance and
Retention Time
Bloom Filters Hold
PPAs in Order
Retention
Duration
Manager
Garbage
Collection
Garbage Collection
Feedback
TimeKits: State Query
Perform fast state queries by leveraging back-pointer chains
TimeKits: State Query
Per-address state queries retrieve the history of a given LPA
TimeKits: State Query
Time-range state queries retrieve all LPA changes in a range of time
TimeKits: State Rollback
Possible to rollback any address to a previous state
A Timestamp
T1
TimeKits: State Rollback
Possible to rollback any address to a previous state
A Timestamp
T1
TimeKits: State Rollback
Possible to rollback any address to a previous state
Update to A
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
Possible to rollback any address to a previous state
Update to A
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
Possible to rollback any address to a previous state
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
Possible to rollback any address to a previous state
Rollback A to previous timestamp
ATimestamp
T2
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
A Timestamp
T1
Possible to rollback any address to a previous state
Rollback A to previous timestamp
ATimestamp
T2
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
A Timestamp
T1
Possible to rollback any address to a previous state
ATimestamp
T2
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
A Timestamp
T1
Channel parallelism allows fast rollback of multiple addresses
ATimestamp
T2
ATimestamp
T2
A Timestamp
T1
ATimestamp
T1
TimeKits: State Rollback
A Timestamp
T1
Channel parallelism allows fast rollback of multiple addresses
B Timestamp
T3
B Timestamp
T3
B Timestamp
T4
Implementation and Experimental Setup HW Platform
Cosmos+ OpenSSD FPGA
development Board
1TB SSD, 4KB page
with 12B OOB data
Benchmarks
Storage traces from MSR and FIU
IOZone benchmarks
PostMark benchmark
OLTP database engine
Ransomware malware samples
Experiment
Setup
Performance: TimeSSD vs. Regular SSDs
Performance: TimeSSD vs. Regular SSDs
<10% performance
overhead
Device Lifetime: TimeSSD vs. Regular SSDs
Device Lifetime: TimeSSD vs. Regular SSDs
10 - 15% increase in
write traffic
Device Lifetime: TimeSSD vs. Regular SSDs
10 - 15% increase in
write traffic
SW solutions have
WA factors of 2.5-6!
TimeSSD vs. Software-Based Solutions
TimeSSD vs. Software-Based Solutions
Similar Performance
TimeSSD vs. Software-Based Solutions
Similar Performance1.5 - 2.2x better
performance
Workload-Adaptive State Retention
Workload-Adaptive State Retention
Guaranteed 3 day
retention window
Workload-Adaptive State Retention
Workload-Adaptive State Retention
Retained for > 50
days
Data Recovery Time After Ransomware Attacks
Data Recovery Time After Ransomware Attacks
14% overheadTimeSSD must
decompress past states
Data Recovery Time After Ransomware Attacks
14% overheadTimeSSD must
decompress past states
Future work using HW
accelerators!
Project Almanac Summary
Firmware Isolation
Increased Security
Project Almanac Summary
Firmware Isolation
Increased Security
Minimal Impact on
Performance and
Lifetime
Achieved Software
Functionality
Project Almanac Summary
Firmware Isolation
Increased Security
Minimal Impact on
Performance and
Lifetime
Thanks!
Q&A
Xiaohao Wang Yifan Yuan Chance C. Coats Jian Huang
Systems and Platform Research Group
You Zhou