Tolerating Holes in Wearable Memories - TS | Data6111/02/2016 Tolerating Holes in Wearable Memories...

40
Tolerating Holes in Wearable Memories Karin Strauss Microsoft Research

Transcript of Tolerating Holes in Wearable Memories - TS | Data6111/02/2016 Tolerating Holes in Wearable Memories...

  • Tolerating Holes in Wearable Memories

    Karin Strauss

    Microsoft Research

  • 010101010101010101011010101010101010101011010101010101010101010101101010101010101010101101011010101010101010101011010110101010101010101010110101101010101010101010101101011010101010101010101011010110101010101010101010110101101010101010101010101101011010101010101010101011010110101010101010101010110101101010101010101010101101011010101010101010101011010110101010101010100101101010101010101010101101011010101010101010

    11/02/2016 1Tolerating Holes in Wearable Memories - Karin Strauss

    What if memory is not reliable?

  • DRAM is starting to scale poorly • Scaling results in less charge / cell

    – Leaks relatively more charge

    – More susceptible to transient errors

    – Maintaining state requires higher refresh rate

    • More manufacturing failures

    Phase change memory (PCM)• Expected to scale further

    • Multi-level cells (MLC)

    • Slower writes

    • Wears out fasterSubstrate

    Bottom electrode

    Top electrode

    Phase-

    change

    material

    11/02/2016 2Tolerating Holes in Wearable Memories - Karin Strauss

    Where is memory headed?

  • Desired properties:

    •Dynamic failure recovery

    •Graceful degradation

    •Reasonable memory longevity

    11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 4

    Living with wearable memory

    OSCritical

    data

    PCM DRAM

    Applications

    Critical data

    is protected

    from failures

    0

    10

    20

    30

    40

    50

    60

    70

    80

    90

    100

    0 5E+09 1E+10 1.5E+10 2E+10 2.5E+10

    Use

    ful M

    em

    ory

    (%

    )

    Writes to Memory

    DRAM PCM+EC line

  • 01010101010100001011110101010101010101010111010101010101101010101010101101010110101001110010101010101011101011101010101101010101010101001010101101010101010100101010101010

    01010101010100001011110101010101010101010111010101010101101010101010101101010110101001110010101010101011010101010101010110010101011010110101010101010010101010101010101010

    01010101010100001011110101010101010101010111010101010101101010101010101101010110110101011010100111001010101010101101010101010010101011010101010101010101000101010101010100

    But lots of cells are still alive!11/02/2016 5Tolerating Holes in Wearable Memories - Karin Strauss

    Surviving wear-induced failures

  • 01010101010100001011110101010101010101010111010101010101101010101010101101010110110101011010100111001010101010101101010101010010101011010101010101010101000101010101010100

    11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 6

    0101010111010101 00000

    Data Metadata

    01011

    Replacement bit

    But lots of cells are still alive!

    • Cells within a line

    • Lines within a page

    Error correcting pointers [ISCA 2010]

  • Getting the best out of wearable memory

    11/02/2016 7Tolerating Holes in Wearable Memories - Karin Strauss

    Reduce [PLDI’13]

    Discard failed blocks,

    instead of pages

    Use imperfect pages to

    store approximate data

    Reuse [MICRO’13, ASPLOS’16]Recycle [ISCA’13]

    Use bits in dead pages

    to maintain live pages

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 8

    Extending Memory Lifetime by

    Reviving Dead Blocks

    Zombie Memory

    with Rodolfo Azevedo, John Davis, Parikshit Gopalan, Mark Manasse and Sergey Yekhanin

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 9

    Using a dead page to keep others alive

    • Dead page bit recycling to correct nearly-dead lines in other pages

    • Deployment of correction resources is on-demand

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 12

    Recycling blocks improves memory lifetime

    SLC: 92% longer lifetime than state-of-the-art

    MLC: 11-17x longer lifetime than drift-tolerant code

    lifetim

    elif

    etim

    e

    P Z

    B Z

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 17

    Using Managed Runtime Systems to

    Tolerate Holes in Wearable Memory

    with Tiejun Gao, Steve Blackburn, Kathryn McKinley, Doug Burger and James Larus

  • OS

    Page granularity

    Hardware correction

    Diminishing returns

    OS

    Hardware

    Program

    Hardware

    OS

    11/02/2016 18Tolerating Holes in Wearable Memories - Karin Strauss

    Coping with failures today

    Insufficient

  • Opportunity

    •Memory abstraction

    •Finer granularity than OS

    •Application transparency

    Managed program

    OS

    Hardware

    Managed runtime

    Yay, I don’t need

    to do anything!

    11/02/2016 19Tolerating Holes in Wearable Memories - Karin Strauss

    Managed runtimes to the rescue

  • 010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101

    01010101010101010101010101010101010101010101010101010101010101010101010101010101

    010101010101010101010101010101010101010101010101

    0101010101010101

    01010101010101010101010101010101010101010101010101010101010101010101010101010101

    010101010101010101010101010101010101010101010101

    0101010101010101

    11/02/2016 20Tolerating Holes in Wearable Memories - Karin Strauss

    Faulty pages are still usable

  • PCM

    • Allocator steps over failures

    • OS notifies the managed runtime of failures

    • OS maintains failure map

    • Plenty of PCM and a small amount of DRAM

    Failure map

    DRAM

    OSCritical

    data

    Managed runtime

    11/02/2016 21Tolerating Holes in Wearable Memories - Karin Strauss

    System architecture

  • Type Step over holes Good locality

    Contiguous Allocation

    Free List

    Mark Region

    ✓✗✓ ✗✓ ✓

    Immix

    • Mark-region memory manager

    • Best proven performance

    11/02/2016 22Tolerating Holes in Wearable Memories - Karin Strauss

    What kind of allocator?

  • recycled allocation limit

    recycled allocation start

    Recycled block pool

    Freshly allocated Live: marked in previous collection

    block

    line

    Free

    11/02/2016 23Tolerating Holes in Wearable Memories - Karin Strauss

    Immix

  • block

    line

    Recycled block pool

    recycled allocation limit

    recycled allocation start

    Freshly allocated Live: marked in previous collection FailureFree

    11/02/2016 24Tolerating Holes in Wearable Memories - Karin Strauss

    Failure-aware Immix

  • • Better memory efficiency

    • Transparent to applications

    • Fragmentation

    11/02/2016 25Tolerating Holes in Wearable Memories - Karin Strauss

    Problem solved?

  • 1

    1.04

    1.08

    1.12

    1.16

    64

    Tim

    e /

    Tim

    e b

    est

    Imm

    ix

    Failure Cluster Granularity

    10%

    64 128 256 512 1024 2048 4096 8192 16384

    f

    11/02/2016 26Tolerating Holes in Wearable Memories - Karin Strauss

    Conceptual failure clustering

  • 0 1 2 3 4 5 6 7 8 9 10 11

    After redirection

    0 1 2 3 4 5 6 7 8 9 10 11

    Redirection map Redirection map

    Free page Free page

    0 0

    1 1

    7 10

    2 92 2

    Before redirection

    11/02/2016 27Tolerating Holes in Wearable Memories - Karin Strauss

    Practical failure clustering

  • • Jikes RVM 3.1.2 Release

    •DaCapo 9.12-bach and DaCapo-2006-10

    • Intel Core i7 2600, 4GB, Ubuntu 10.04.1 LTS

    •20 invocations for each benchmark

    11/02/2016 28Tolerating Holes in Wearable Memories - Karin Strauss

    Evaluation methodology

  • 0.95

    1

    1.05

    1.1

    1.15

    1.2

    1.25

    1.3

    1.35

    1.4

    1.45

    avrora eclipse pmd geomean

    Tim

    e /

    Tim

    e b

    est

    Imm

    ix

    0%

    10%

    25%

    50%

    Failure rate

    • No overhead in the absence of failures

    • Overheads grow slowly as failure rate increases

    • Overheads are low, even for 50% failure rate

    11/02/2016 30Tolerating Holes in Wearable Memories - Karin Strauss

    Results

  • 0.95

    1

    1.05

    1.1

    1.15

    1.2

    1.25

    1.3

    1.35

    1.4

    1 2 3 4 5 6

    Tim

    e /

    Tim

    e b

    est

    Imm

    ix

    Heap Size / Minimum

    S-IX PCM 0% PCM 10% No CL PCM 10%Immix FT Immix 0% FT Immix 10%, No Cl FT Immix 10%

    11/02/2016 31Tolerating Holes in Wearable Memories - Karin Strauss

    Clustering hardware helps performance

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 32

    Approximate Storage

    with Adrian Sampson, Jacob Nelson and Luis Ceze

    with Qing Guo, Luis Ceze and Henrique Malvar

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 33

    Approximation enables optimizations

    Not all applications need 100% precision

    • Machine learning

    • Sensor data processing

    • Image and video processing

    Can trade-off accuracy for:

    • Extended lifetime

    • Faster writes

    • Denser cells

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 34

    Trading off accuracy in a disciplined way

    EnerJ provides safety

    Statically separate critical and

    non-critical program components

    Precise Approximate

    ✗✓

    int a = ...;

    int p = ...;

    @approx

    @precise

    p = a;

    a = p;

    ✗if (a == 10){

    p = 2;

    }

  • 011010100011101100000001

    011010100011101000000001

    11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 35

    Reusing ‘broken’ memory

    no change irrelevant change

    Opportunities:

    • Worn out memory still useful for ‘approximate data’

    • Lower number of iterations on writes

    • Lower energy writes

    • Faster writes

    • Density improvements

    R(Ω)

    11

    10

    01

    00

    11

    10

    01

    00

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 36

    Approximation in memory is useful

    improvement accuracy loss

    density 157% 0.1-4.2%*

    write speed 24-114% 1-10%

    lifetime 2-39% < 10%

    Evaluated for both main memory and persistent storage

    Approximations enable desirable further improvements

    Applications benefit differently

  • 11/02/2016 Approximate Storage - Karin Strauss 37

    How about encoded images?

    01010101010100001011110101010101010101010111010101010101101010101010101101010110110101011010100111001010101010101101010101010010101011010101010101010101000101010101010100

    010101010101000010111101010101010101010101110101010100

    Certain bits are more important than others

    [ASPLOS 2016]

  • 11/02/2016 Approximate Storage - Karin Strauss 38

    Different level of protection for bits of different importance

    010010110010011010100010101001010101011101011010110101

    010101010101000010111101010101010101010101110101010100

    0010110

    0010

    Error correction

    Algorithm co-development to the rescue

  • 11/02/2016 Approximate Storage - Karin Strauss 39

    Baseline 2-

    level cells

    Optimized

    8-level cells

    8-level,

    completely

    error-

    corrected

    8-level,

    selectively

    error-

    corrected

    Bit

    capacity1x 3x 2.2x 2.7x

    Image

    quality

    Selective approximation improves density

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 40

    Memory Going Forward &

    Opportunities in Memory Management

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 41

    The rise of new memory technologies

    Capacitive

    DRAM Flash

    Resistive

    Electrons

    escape

    from cells

    Electrons

    damage oxide,

    cause wear

    Different points in the space:

    • Read/write speeds

    • Read/write throughput

    • Read/write energy

    • Reliability/wear issues

    PCM CB-RAM

    STT-MRAM Memristor

    3D Xpoint

    Heterogeneity

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 42

    Heterogeneity even within DRAM parts

    Multiple memory bandwidths in the same system

    - High bandwidth

    - Low capacity

    - Higher cost

    - Low bandwidth

    - High capacity

    - Lower cost

    Traditional DRAM3D stacking + faster signaling

    • HMC

    • HBM

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 43

    I/O boundaries are blurring

    Volatile

    Non-volatile

    Processors

    HDD

    Non-volatile SSD

    DRAMNVRAM

    Fast

    Network

    I/O

    • Persistent heaps

    • Slow/fast persistence

    • Remote memory

    • Unified memoryUnification

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 44

    Scaling is getting more expensive

    Past: just wait and memory doubles for same cost

    Tim

    e Near future: memory doubles but gets more expensive

    Distant future: memory doubles by adding more chips

    Resource usage will become an issue again

    Resource constraints

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 45

    Programming languages/constructs to express:• Locality of data

    • Importance of data

    • Resource constraints

    Memory management mechanisms that are:• Footprint-conscious

    • Aware of locality patterns

    • Aware of performance, power and wear heterogeneity

    Opportunities created by memory trends

    Tools that allow programmers to profile:• Memory space usage

    • Power/energy consumption

    • Wear intensity

  • 11/02/2016 Tolerating Holes in Wearable Memories - Karin Strauss 46

    • Doug Burger

    • Gabriel Loh

    • Rodolfo Azevedo

    • Steve Blackburn

    • Luis Ceze

    • John Davis

    • Tiejun Gao

    • Parikshit Gopalan

    • Qing Guo

    • Andrew Hay

    • James Larus

    • Henrique Malvar

    • Mark Manasse

    • Kathryn McKinley

    • Jacob Nelson

    • Adrian Sampson

    • Stuart Schechter

    • Timothy Sherwood

    • Sergey Yekhanin

    Thanks to my collaborators!

  • Questions?

    11/02/2016 47Tolerating Holes in Wearable Memories - Karin Strauss