Post on 08-Feb-2021
Tolerating Holes in Wearable Memories
Karin Strauss
Microsoft Research
11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss
What if memory is not reliable?
DRAM is starting to scale poorly
• Scaling results in less charge per cell
– Leaks relatively more charge
– More susceptible to transient errors
– Maintaining state requires higher refresh rate
• More manufacturing failures
Phase-change memory (PCM)
• Expected to scale further
• Multi-level cells (MLC)
• Slower writes
• Wears out faster
[Diagram: PCM cell, with phase-change material between a top electrode and a bottom electrode on a substrate]
Where is memory headed?
Desired properties:
• Dynamic failure recovery
• Graceful degradation
• Reasonable memory longevity
Living with wearable memory
[Diagram: applications in PCM; the OS and critical data in DRAM]
Critical data is protected from failures
[Chart: useful memory (%) vs. writes to memory (0 to 2.5×10^10); series: DRAM, PCM+EC line]
But lots of cells are still alive!
Surviving wear-induced failures
[Diagram: a line of data bits plus metadata; the metadata holds a pointer to a failed cell (e.g. 01011) and a replacement bit]
But lots of cells are still alive!
• Cells within a line
• Lines within a page
Error correcting pointers [ISCA 2010]
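Error-correcting pointers can be sketched as follows. This is a minimal Java illustration under simplifying assumptions, not the ISCA 2010 design: failures are modeled as cells stuck at their current value, and each correction entry is a pointer to a failed cell plus one replacement bit. The class `EcpLine` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of error-correcting pointers (ECP) protecting one memory line. */
class EcpLine {
    /** One correction entry: a pointer to a failed cell plus a replacement bit. */
    private static final class Entry {
        final int pointer;
        boolean replacement;

        Entry(int pointer) {
            this.pointer = pointer;
        }
    }

    private final int size;          // data cells in the line
    private final int maxEntries;    // correction entries available in the metadata
    private final BitSet cells = new BitSet();
    private final BitSet failed = new BitSet();   // simulated wear-out: stuck cells
    private final List<Entry> entries = new ArrayList<>();

    EcpLine(int size, int maxEntries) {
        this.size = size;
        this.maxEntries = maxEntries;
    }

    /** Simulates a wear-out failure: cell i keeps whatever value it holds now. */
    void failCell(int i) {
        failed.set(i);
    }

    /** Writes data; returns false if the line has more failed cells than entries (line is dead). */
    boolean write(BitSet data) {
        if (failed.cardinality() > maxEntries) {
            return false;
        }
        for (int i = 0; i < size; i++) {
            if (failed.get(i)) {
                entryFor(i).replacement = data.get(i);   // value goes to the replacement bit
            } else {
                cells.set(i, data.get(i));
            }
        }
        return true;
    }

    /** Reads the line, letting each pointer override the failed cell it points to. */
    BitSet read() {
        BitSet out = (BitSet) cells.clone();
        for (Entry e : entries) {
            out.set(e.pointer, e.replacement);
        }
        return out;
    }

    private Entry entryFor(int i) {
        for (Entry e : entries) {
            if (e.pointer == i) {
                return e;
            }
        }
        Entry e = new Entry(i);   // allocate an entry on the first failure of this cell
        entries.add(e);
        return e;
    }
}
```

With two entries, a 16-cell line keeps returning correct data after one or two stuck cells and reports itself dead once a third cell fails.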
Getting the best out of wearable memory
• Reduce [PLDI’13]: discard failed blocks, instead of pages
• Reuse [MICRO’13, ASPLOS’16]: use imperfect pages to store approximate data
• Recycle [ISCA’13]: use bits in dead pages to maintain live pages
Zombie Memory: Extending Memory Lifetime by Reviving Dead Blocks
with Rodolfo Azevedo, John Davis, Parikshit Gopalan, Mark Manasse and Sergey Yekhanin
Using a dead page to keep others alive
• Dead page bit recycling to correct nearly-dead lines in other pages
• Deployment of correction resources is on-demand
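On-demand deployment can be sketched as a pool of dead blocks that live blocks draw from only when they need it. This is a hypothetical model, not the Zombie Memory hardware: each block is assumed to correct a fixed number of failures on its own, a dead block is paired with a live block only once that budget is exceeded, and the partner is assumed to add a fixed number of correctable failures. `ZombiePairing`, `builtInCapacity`, and `spareCapacity` are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: live blocks borrow spare bits from dead blocks only when needed. */
class ZombiePairing {
    private final int builtInCapacity;   // failures a block corrects by itself (assumed)
    private final int spareCapacity;     // extra failures a paired dead block covers (assumed)
    private final Deque<Integer> deadPool = new ArrayDeque<>();
    private final Map<Integer, Integer> partner = new HashMap<>();

    ZombiePairing(int builtInCapacity, int spareCapacity) {
        this.builtInCapacity = builtInCapacity;
        this.spareCapacity = spareCapacity;
    }

    /** Retires a block: it and any partner it held become spare-bit donors in the pool. */
    void retire(int block) {
        Integer p = partner.remove(block);
        if (p != null) {
            deadPool.addLast(p);
        }
        deadPool.addLast(block);
    }

    /** Called when a live block's failure count grows; returns true if it stays usable. */
    boolean onFailures(int block, int failures) {
        if (failures <= builtInCapacity) {
            return true;                              // built-in correction suffices
        }
        if (failures > builtInCapacity + spareCapacity) {
            return false;                             // even a partner cannot save it
        }
        if (!partner.containsKey(block)) {
            Integer dead = deadPool.pollFirst();      // deploy a dead block on demand
            if (dead == null) {
                return false;                         // no spare bits available
            }
            partner.put(block, dead);
        }
        return true;
    }

    /** Returns the dead block paired with this block, or null if none. */
    Integer partnerOf(int block) {
        return partner.get(block);
    }
}
```

The design choice the sketch highlights is laziness: a block consumes nothing from the pool until its own correction budget runs out.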
Recycling blocks improves memory lifetime
SLC: 92% longer lifetime than state-of-the-art
MLC: 11-17x longer lifetime than drift-tolerant code
[Charts: lifetime comparisons]
Using Managed Runtime Systems to Tolerate Holes in Wearable Memory
with Tiejun Gao, Steve Blackburn, Kathryn McKinley, Doug Burger and James Larus
[Diagram: program, OS, and hardware layers]
• OS: page granularity
• Hardware correction: diminishing returns
Coping with failures today
Insufficient
Opportunity
• Memory abstraction
• Finer granularity than OS
• Application transparency
[Diagram: a managed program running on a managed runtime, above the OS and hardware]
Managed program: “Yay, I don’t need to do anything!”
Managed runtimes to the rescue
Faulty pages are still usable
• Allocator steps over failures
• OS notifies the managed runtime of failures
• OS maintains failure map
• Plenty of PCM and a small amount of DRAM
[Diagram: PCM and DRAM; components shown are the failure map, the OS and critical data, and the managed runtime]
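The failure map and the OS-to-runtime notification can be sketched as below. The OS-side bookkeeping is modeled as a per-page bitmap of failed lines, and the managed runtime subscribes to hear about each new failure. `FailureMap`, `reportFailure`, and the callback shape are assumptions made for illustration, not an actual OS interface.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical sketch of an OS-maintained failure map that notifies the managed runtime. */
class FailureMap {
    private final Map<Long, BitSet> failedLines = new HashMap<>();    // page -> failed lines
    private final List<BiConsumer<Long, Integer>> listeners = new ArrayList<>();

    /** The managed runtime registers a callback that receives (page, line) for new failures. */
    void subscribe(BiConsumer<Long, Integer> listener) {
        listeners.add(listener);
    }

    /** Records a failed line; listeners are notified only the first time it is reported. */
    void reportFailure(long page, int line) {
        BitSet lines = failedLines.computeIfAbsent(page, p -> new BitSet());
        if (!lines.get(line)) {
            lines.set(line);
            for (BiConsumer<Long, Integer> l : listeners) {
                l.accept(page, line);
            }
        }
    }

    /** Lets the allocator check whether a line must be stepped over. */
    boolean isFailed(long page, int line) {
        BitSet lines = failedLines.get(page);
        return lines != null && lines.get(line);
    }
}
```

Deduplicating reports keeps the runtime from reacting twice to the same hole.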
System architecture
Type                   Step over holes   Good locality
Contiguous allocation  ✗                 ✓
Free list              ✓                 ✗
Mark region            ✓                 ✓
Immix
• Mark-region memory manager
• Best proven performance
What kind of allocator?
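[Diagram: Immix heap of blocks divided into lines; lines are free, freshly allocated, or live (marked in the previous collection); a recycled block pool, with allocation between the recycled allocation start and the recycled allocation limit]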
Immix
[Diagram: the same Immix heap with failed lines added; lines are free, freshly allocated, live (marked in the previous collection), or failed]
Failure-aware Immix
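The core of a failure-aware Immix allocator, bump allocation through runs of free lines that skips both live and failed lines, can be sketched as follows. The 256-byte line size, the `FailureAwareBlock` class, and the choice to skip a hole that is too small for the request are simplifying assumptions; real Immix also handles objects that span lines and overflow allocation of medium objects, which this sketch omits.

```java
/** Hypothetical sketch: bump allocation in one block, stepping over live and failed lines. */
class FailureAwareBlock {
    static final int LINE_SIZE = 256;     // bytes per line (assumed for illustration)
    private final boolean[] unavailable;  // line is live (marked) or failed
    private int cursor;                   // bump pointer, byte offset in the block
    private int limit;                    // end of the current run of free lines
    private int nextLine;                 // next line to scan for a free run

    FailureAwareBlock(boolean[] live, boolean[] failed) {
        if (live.length != failed.length) {
            throw new IllegalArgumentException("line maps must have the same length");
        }
        unavailable = new boolean[live.length];
        for (int i = 0; i < live.length; i++) {
            unavailable[i] = live[i] || failed[i];   // failures look like permanently live lines
        }
    }

    /** Returns the byte offset of the new object, or -1 if no remaining hole fits it. */
    int allocate(int bytes) {
        while (cursor + bytes > limit) {
            if (!nextHole()) {
                return -1;
            }
        }
        int result = cursor;
        cursor += bytes;
        return result;
    }

    /** Moves the cursor to the next run of contiguous free lines; false at end of block. */
    private boolean nextHole() {
        int n = unavailable.length;
        while (nextLine < n && unavailable[nextLine]) {
            nextLine++;
        }
        if (nextLine >= n) {
            return false;
        }
        int start = nextLine;
        while (nextLine < n && !unavailable[nextLine]) {
            nextLine++;
        }
        cursor = start * LINE_SIZE;
        limit = nextLine * LINE_SIZE;
        return true;
    }
}
```

Treating a failed line exactly like a live one is what makes the approach transparent: the allocator already knows how to step over lines it cannot use.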
• Better memory efficiency
• Transparent to applications
• Fragmentation
Problem solved?
[Chart: time / time of best Immix (1.00 to 1.16) vs. failure cluster granularity (64 to 16384), at a 10% failure rate]
Conceptual failure clustering
[Diagram: lines 0-11 of a page before and after redirection, each with a redirection map and a free page]
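One way to picture failure clustering is a per-page redirection map that swaps failed lines near the start of a page with working lines near the end, so the failures end up in one contiguous hole. This two-pointer swap is a hypothetical illustration of the idea, not the exact scheme described in the talk; `FailureClustering` and `clusterFailures` are illustrative names.

```java
/** Hypothetical sketch: per-page redirection that clusters failed lines at the end of the page. */
class FailureClustering {
    /**
     * Returns map with map[logical] = physical line. Failed lines near the start are swapped
     * with working lines near the end, so logical lines 0..k-1 are all working (k = working count).
     */
    static int[] clusterFailures(boolean[] failed) {
        int n = failed.length;
        int[] map = new int[n];
        for (int i = 0; i < n; i++) {
            map[i] = i;                              // identity: no redirection yet
        }
        int lo = 0;
        int hi = n - 1;
        while (true) {
            while (lo < hi && !failed[lo]) lo++;     // next failed line from the start
            while (lo < hi && failed[hi]) hi--;      // next working line from the end
            if (lo >= hi) {
                break;
            }
            map[lo] = hi;                            // logical lo now uses working physical hi
            map[hi] = lo;                            // logical hi absorbs the failed line
            lo++;
            hi--;
        }
        return map;
    }
}
```

For a 12-line page with failed lines 2 and 7, logical lines 0 through 9 all land on working physical lines, and the two failures form a single hole at logical lines 10 and 11.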
Practical failure clustering
• Jikes RVM 3.1.2 Release
• DaCapo 9.12-bach and DaCapo-2006-10
• Intel Core i7 2600, 4 GB, Ubuntu 10.04.1 LTS
• 20 invocations for each benchmark
Evaluation methodology
[Chart: time / time of best Immix (0.95 to 1.45) for avrora, eclipse, pmd, and the geometric mean, at failure rates of 0%, 10%, 25%, and 50%]
• No overhead in the absence of failures
• Overheads grow slowly as failure rate increases
• Overheads are low, even for 50% failure rate
Results
[Chart: time / time of best Immix (0.95 to 1.4) vs. heap size / minimum (1 to 6); series: S-IX PCM 0%, PCM 10% No CL, PCM 10%, Immix, FT Immix 0%, FT Immix 10% No CL, FT Immix 10%]
Clustering hardware helps performance
Approximate Storage
with Adrian Sampson, Jacob Nelson and Luis Ceze
with Qing Guo, Luis Ceze and Henrique Malvar
Approximation enables optimizations
Not all applications need 100% precision
• Machine learning
• Sensor data processing
• Image and video processing
Can trade off accuracy for:
• Extended lifetime
• Faster writes
• Denser cells
Trading off accuracy in a disciplined way
EnerJ provides safety
Statically separate critical and
non-critical program components
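@approx  int a = ...;   // approximate data
@precise int p = ...;   // precise data
p = a;                  // ✗ approximate value flows into precise storage
a = p;                  // ✓ precise value may flow into approximate storage
if (a == 10) {          // ✗ approximate condition guards a precise write
    p = 2;
}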
[Figure: a stored word before and after a single-bit change]
Reusing ‘broken’ memory
[Figure labels: “no change”, “irrelevant change”]
Opportunities:
• Worn out memory still useful for ‘approximate data’
• Lower number of iterations on writes
• Lower energy writes
• Faster writes
• Density improvements
[Figure: two resistance R(Ω) diagrams, each showing the four 2-bit cell levels 11, 10, 01, 00]
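Why relaxing precision cuts write iterations can be sketched with an iterative program-and-verify loop for a multi-level cell. The pulse model (each pulse closes half the remaining gap to the target) and the tolerance values are assumptions chosen only to make the trend visible: a wider tolerance band, as an approximate write would allow, ends the loop sooner. `ProgramAndVerify` and `pulsesToProgram` are hypothetical names.

```java
/** Hypothetical sketch of iterative program-and-verify for one multi-level cell. */
class ProgramAndVerify {
    /**
     * Counts program pulses until the cell's value is within tolerance of the target.
     * Assumed pulse model: each pulse moves the value halfway toward the target.
     */
    static int pulsesToProgram(double start, double target, double tolerance) {
        if (tolerance <= 0) {
            throw new IllegalArgumentException("tolerance must be positive");
        }
        double value = start;
        int pulses = 0;
        while (Math.abs(value - target) > tolerance) {   // verify step
            value += 0.5 * (target - value);              // program step
            pulses++;
        }
        return pulses;
    }
}
```

With start 0.0 and target 1.0, this model needs 7 pulses for a tolerance of 0.01 but only 4 for a tolerance of 0.1.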
Approximation in memory is useful
Metric        Improvement   Accuracy loss
Density       157%          0.1-4.2%*
Write speed   24-114%       1-10%
Lifetime      2-39%         < 10%
Evaluated for both main memory and persistent storage
Approximations enable desirable further improvements
Applications benefit differently
How about encoded images?
Certain bits are more important than others
[ASPLOS 2016]
Different level of protection for bits of different importance
[Diagram: bits of different importance, with error correction applied to only some of them]
Algorithm co-development to the rescue
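Selective error correction can be sketched by protecting only the important bits. To keep it short, this sketch substitutes a 3x repetition code with majority voting for the stronger codes a real design would use; `SelectiveProtection` and the split into important and unimportant bits are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: important bits get a 3x repetition code, the rest are stored raw. */
class SelectiveProtection {
    /** Stores each important bit three times, followed by the unimportant bits as-is. */
    static boolean[] encode(boolean[] important, boolean[] unimportant) {
        boolean[] out = new boolean[important.length * 3 + unimportant.length];
        int k = 0;
        for (boolean b : important) {
            out[k++] = b;
            out[k++] = b;
            out[k++] = b;
        }
        for (boolean b : unimportant) {
            out[k++] = b;
        }
        return out;
    }

    /** Recovers the important bits by majority vote, correcting one flip per group of three. */
    static boolean[] decodeImportant(boolean[] stored, int importantCount) {
        boolean[] out = new boolean[importantCount];
        for (int i = 0; i < importantCount; i++) {
            int votes = (stored[3 * i] ? 1 : 0) + (stored[3 * i + 1] ? 1 : 0) + (stored[3 * i + 2] ? 1 : 0);
            out[i] = votes >= 2;
        }
        return out;
    }

    /** Returns the unimportant bits exactly as stored, errors included. */
    static boolean[] decodeUnimportant(boolean[] stored, int importantCount) {
        return Arrays.copyOfRange(stored, importantCount * 3, stored.length);
    }
}
```

A flipped bit in the protected region is voted away on decode, while a flip among the unprotected bits passes through; that is tolerable only because those bits matter less, and it is where the density savings come from.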
Configuration                          Bit capacity
Baseline 2-level cells                 1x
Optimized 8-level cells                3x
8-level, completely error-corrected    2.2x
8-level, selectively error-corrected   2.7x
[Figure row: image quality for each configuration]
Selective approximation improves density
Memory Going Forward & Opportunities in Memory Management
The rise of new memory technologies
• DRAM (capacitive): electrons escape from cells
• Flash: electrons damage oxide, cause wear
• Resistive: PCM, CB-RAM, STT-MRAM, Memristor, 3D XPoint
Different points in the space:
• Read/write speeds
• Read/write throughput
• Read/write energy
• Reliability/wear issues
Heterogeneity
Heterogeneity even within DRAM parts
Multiple memory bandwidths in the same system
• 3D stacking + faster signaling (HMC, HBM)
  – High bandwidth
  – Low capacity
  – Higher cost
• Traditional DRAM
  – Low bandwidth
  – High capacity
  – Lower cost
I/O boundaries are blurring
[Diagram: processors; volatile DRAM; non-volatile NVRAM, SSD, and HDD; fast network I/O]
Unification:
• Persistent heaps
• Slow/fast persistence
• Remote memory
• Unified memory
Scaling is getting more expensive
Past: just wait and memory doubles for same cost
Near future: memory doubles but gets more expensive
Distant future: memory doubles by adding more chips
Resource usage will become an issue again
Resource constraints
Opportunities created by memory trends
Programming languages/constructs to express:
• Locality of data
• Importance of data
• Resource constraints
Memory management mechanisms that are:
• Footprint-conscious
• Aware of locality patterns
• Aware of performance, power and wear heterogeneity
Tools that allow programmers to profile:
• Memory space usage
• Power/energy consumption
• Wear intensity
• Doug Burger
• Gabriel Loh
• Rodolfo Azevedo
• Steve Blackburn
• Luis Ceze
• John Davis
• Tiejun Gao
• Parikshit Gopalan
• Qing Guo
• Andrew Hay
• James Larus
• Henrique Malvar
• Mark Manasse
• Kathryn McKinley
• Jacob Nelson
• Adrian Sampson
• Stuart Schechter
• Timothy Sherwood
• Sergey Yekhanin
Thanks to my collaborators!
Questions?