conf-isca-2005


  • 8/6/2019 conf-isca-2005

    1/22

    1

    High Efficiency Counter Mode Security Architecture via Prediction and Pre-computation

    Weidong Shi

    Hsien-Hsin (Sean) Lee

    Mrinmoy Ghosh

    Chenghuai Lu

    Alexandra Boldyreva

    School of Electrical and Computer Engineering

    Georgia Institute of Technology

  • 8/6/2019 conf-isca-2005

    2/22

    2

    Content

    Motivation

    Related Work

    Counter/Decryption Pad Prediction

    Profile Prediction Failures

    Two-level Prediction

    Context Based Prediction

    Conclusions

  • 8/6/2019 conf-isca-2005

    3/22

    3

    Why Encrypt System Memory?

    Protect sensitive data stored in RAM (many simple devices can bypass OS memory protection and directly access physical memory)

    Digital Rights Management (the industry has witnessed the gradual addition of encryption to each platform component: encrypted PCI-E, encrypted disk, encrypted flash memory, then toward encrypted RAM)

    Anti-reverse engineering (the majority of software licenses require users not to reverse engineer, counting on users not to break the promise)

    Military (customer of encrypted FPGA chips, lots of embedded military software)

    Program randomization (intrusion prevention, CCS 2003)

  • 8/6/2019 conf-isca-2005

    4/22

    4

    Different Solutions

    SoC. Memory is on-chip. Applies to limited platforms such as small embedded systems (cell phones).

    [Diagram labels: Crypto Engine, Processor Core, Cache]

    Configurable system

    RAM encryption. More usage models.

    [Diagram labels: Crypto Engine, Flash, Micro Controller]

    Create a little secure world; limited application scenarios (code signing, BIOS signature verification).

  • 8/6/2019 conf-isca-2005

    5/22

    5

    Related Work

    Use a dedicated cache (sequence number cache) to reduce the latency overhead of memory decryption (Micro 2003)

    Prefetch-based memory pre-decryption (WASSA 2004)

    Prediction-based memory decryption (this paper)

    Fully exploits the pre-computation capability enabled by counter mode encryption.

    Uses otherwise wasted idle crypto engine pipeline stages for prediction and pre-computation.

    Less area overhead than caching and less memory pressure than prefetch-based pre-decryption.

  • 8/6/2019 conf-isca-2005

    6/22

    6

    Counter Mode - Encryption

    [Figure: Processor Core, Crypto Engine, and cache lines. For each 16B block of a cache line, the AES block cipher takes the key, the block's virtual address (VAddr, VAddr+2) and the line's counter (Counter, then Counter+1 and Counter+2 on later updates) and produces an encryption pad, which yields the encrypted 16B block.]

    Each memory line has its own counter. Each time a memory line is updated, its counter is incremented.
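The scheme on this slide can be sketched in a few lines. This is a minimal illustration, not the hardware design from the talk: `make_pad` uses SHA-256 from the standard library purely as a stand-in for the AES block cipher, and the names `make_pad`, `xor16`, and `MemoryLine` are hypothetical. Combining pad and data with XOR is the standard counter-mode construction.

```python
import hashlib

BLOCK = 16  # bytes per encrypted block, as on the slide


def make_pad(key: bytes, vaddr: int, counter: int) -> bytes:
    """Derive a 16-byte pad from (key, address, counter).

    SHA-256 is only a stand-in here; the slides use an AES block cipher.
    """
    seed = key + vaddr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hashlib.sha256(seed).digest()[:BLOCK]


def xor16(data: bytes, pad: bytes) -> bytes:
    """Counter mode: ciphertext = plaintext XOR pad, and vice versa."""
    return bytes(d ^ p for d, p in zip(data, pad))


class MemoryLine:
    """One 16-byte block in untrusted RAM with its own counter."""

    def __init__(self, vaddr: int, counter: int):
        self.vaddr = vaddr
        self.counter = counter
        self.ciphertext = bytes(BLOCK)

    def write_back(self, key: bytes, plaintext: bytes) -> None:
        # Every update increments the counter, so a pad is never reused.
        self.counter += 1
        pad = make_pad(key, self.vaddr, self.counter)
        self.ciphertext = xor16(plaintext, pad)

    def read(self, key: bytes) -> bytes:
        # The pad depends only on key, address and counter, not on the data.
        return xor16(self.ciphertext, make_pad(key, self.vaddr, self.counter))


key = b"k" * 32
line = MemoryLine(vaddr=0x0000FF00, counter=0xABCDDCBA12344321)
line.write_back(key, b"secret data 0001")
first = line.ciphertext
line.write_back(key, b"secret data 0001")
print(line.read(key))            # the plaintext round-trips
print(first != line.ciphertext)  # same data, new counter, new ciphertext
```

Because the pad does not depend on the data, it can be computed before the encrypted line arrives, which is what the prediction scheme in the following slides relies on.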

  • 8/6/2019 conf-isca-2005

    7/22

    7

    Counter Mode - Decryption

    [Figure: Processor Core, Crypto Engine, and cache lines. For each encrypted 16B block of a cache line, the AES block cipher takes the key, the block's virtual address (VAddr, VAddr+2) and the line's counter (Counter+2) and produces the encryption pad used to recover the 16B block.]

    The counter has to be fetched for a memory line that misses in L2.

  • 8/6/2019 conf-isca-2005

    8/22

    8

    Counter Prediction

    Counters exhibit both spatial and temporal coherence.

    To exploit spatial coherence, memory blocks from the same page start counting from the same initial value (the page root counter).

    [Figure: memory lines of one page and their counters. Example counter values: 0xabcddcba12344321, 0xabcddcba12344325, 0xabcddcba12344e0a, 0xabcddcba123443f1. Line labels: static data, infrequently updated data, frequently updated data. A table maps Page Base Addr (0x0000ff00) to Page Root Counter, 64 bits (0xabcddcba12344321).]
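A minimal sketch of the prediction step, under stated assumptions: the candidate window is taken to be the page root counter plus offsets 0 through the prediction depth, and the helper names (`predict_counters`, `precompute_pads`, `check_prediction`, `fake_pad`) are hypothetical. Pad generation is replaced by a placeholder, since the point is only which counter values get speculated on.

```python
DEPTH = 5  # prediction depth; the value listed in the experiment setup


def predict_counters(page_root: int, depth: int = DEPTH) -> list[int]:
    """Guess that a line was updated 0..depth times since the root was set."""
    return [page_root + i for i in range(depth + 1)]


def precompute_pads(vaddr, candidates, make_pad):
    """Speculatively build one pad per candidate counter."""
    return {c: make_pad(vaddr, c) for c in candidates}


def check_prediction(pads, fetched_counter):
    """Return the pre-computed pad on a hit, or None on a miss."""
    return pads.get(fetched_counter)


# Placeholder pad function (hypothetical); real hardware would run AES.
def fake_pad(vaddr, counter):
    return f"pad({vaddr:#x},{counter:#x})"


root = 0xABCDDCBA12344321
pads = precompute_pads(0x0000FF00, predict_counters(root), fake_pad)

print(check_prediction(pads, 0xABCDDCBA12344325))  # offset 4: hit
print(check_prediction(pads, 0xABCDDCBA12344E0A))  # far from the root: None
```

On a hit the pad is already available when the encrypted line and its counter arrive; on a miss the pad has to be generated from the fetched counter as usual.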

  • 8/6/2019 conf-isca-2005

    9/22

    9

    Use Free Idle Pipeline Stages for Prediction

    Unrolled and pipelined AES decryption logic often stays idle for tens to hundreds of cycles when data misses in L2.

    [Figure: timeline. The memory pipeline retrieves the counter value and the encrypted line while the AES pipeline sits idle; the AES pipeline then generates the decryption pad, producing the decrypted line.]

  • 8/6/2019 conf-isca-2005

    10/22

  • 8/6/2019 conf-isca-2005

    11/22

    11

    Handle Frequent Updates

    Window-based dynamic tracking of the prediction rate for each page.

    For frequently updated memory blocks, reset the page root counter according to the prediction history vector. All future write-backs will count from the new number.

    If total(miss) > threshold, reset the corresponding Page Root Counter to a new number.

    [Figure: each TLB entry holds a Page Base Addr (e.g. 0x0000ff00), a Page Root Counter of 64 bits (e.g. 0xabcddcba12344321) and a Prediction History Vector of 16 bits (e.g. 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0). The prediction logic checks the counter value and shifts the outcome (miss = 1, hit = 0) into the vector, which acts as a shift register.]
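The history mechanism can be sketched as follows. Several details are assumptions rather than facts from the slides: the threshold value `THRESHOLD`, drawing the new root at random, and clearing the history after a reset. The class name `PageEntry` is hypothetical.

```python
import secrets

WINDOW_BITS = 16  # prediction history window from the slides
THRESHOLD = 8     # hypothetical; the slides do not give a value


class PageEntry:
    """Per-page state kept alongside a TLB entry (sketch)."""

    def __init__(self, page_root: int):
        self.page_root = page_root
        self.history = 0  # 16-bit shift register, 1 = miss, 0 = hit

    def record(self, miss: bool) -> bool:
        """Shift in one outcome; reset the root if misses exceed the threshold.

        Returns True when the page root counter was reset.
        """
        mask = (1 << WINDOW_BITS) - 1
        self.history = ((self.history << 1) | int(miss)) & mask
        if bin(self.history).count("1") > THRESHOLD:
            # Assumption: the new root is a fresh random 64-bit number.
            self.page_root = secrets.randbits(64)
            self.history = 0
            return True
        return False


entry = PageEntry(0xABCDDCBA12344321)
resets = [entry.record(True) for _ in range(9)]
print(resets)  # eight False values, then True on the ninth miss
```

The example vector on the slide, 0001001001100000, contains four misses, so with this hypothetical threshold it would not trigger a reset.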

  • 8/6/2019 conf-isca-2005

    12/22

    12

    Experiment Setup

    Parameter                    Value
    L1 I/D Cache                 DM, 8KB
    L2 Cache                     4-way, unified, 256KB/1MB
    Memory Bus                   200MHz, 8B wide
    CPU Clock                    1GHz
    AES Latency (256-bit)        Total 64 pipeline stages, 1ns each
    Prediction History Window    16 bits
    Prediction Depth             5

    SimpleScalar 3.0; SPEC2000 INT/FP benchmarks with high L2 misses.

    Prediction hit rate study (8 billion instructions)

    IPC performance (400 million instructions on a representative window)

  • 8/6/2019 conf-isca-2005

    13/22

    13

    Prediction Rate

    [Figure: Prediction Hit Rate (256K L2). Y axis: hit rate, 0 to 1.2. X axis: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, average. Series: 128K_Counter_#_Cache, 512K_Counter_#_Cache, Pred.]

    [Figure: Prediction Hit Rate (1M L2). Same axes and series.]

    Prediction hit rate over 8 billion instructions

    No counter number cache when using prediction

    Prediction depth = 5

    Average prediction hit rate is about 82-83%

  • 8/6/2019 conf-isca-2005

    14/22

    14

    IPC

    [Figure: Normalized IPC (256K L2). Y axis: normalized IPC, 0 to 1.2. X axis: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, average. Series: Counter_Cache_4K, Counter_Cache_128K, Counter_Cache_512K, Pred.]

    [Figure: Normalized IPC (1M L2). Same axes and series.]

    IPC is normalized to the scenario without decryption.

    In general, prediction outperforms a 128K counter cache.

    On average, prediction is on par with a 512K counter cache.

  • 8/6/2019 conf-isca-2005

    15/22

    15

    Prediction Miss

    Reasons for prediction misses

    Prediction depth is too small.

    Reset of the page root counter. Memory lines whose counter values are based on the old page root counter cannot be predicted correctly using the new page root counter.

    Solutions (details in the next few slides)

    Two-level prediction (divide the prediction depth into sub-ranges; increases the effective prediction depth without adding more predictions)

    Page root counter history memorization (predict using both the current page root counter and the previous root counter; only a marginal improvement)

    Context-based prediction (exploits the temporal coherence of accessing memory locations with coherent update frequency)

  • 8/6/2019 conf-isca-2005

    16/22

    16

    Two-level Prediction

    Divide prediction window into ranges (power of 2)

    With 2 bits per line, effectively quadruples the prediction depth.

    Overhead is about 2KB of on-chip memory for a 64-entry TLB.

    [Figure: counter numbers in natural order, divided into four consecutive prediction windows labeled 00, 01, 10, 11.]
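A rough sketch of two-level prediction and its storage cost. It assumes 128 lines per page (the page size used in the "Why Does It Work?" example later in the deck) and that the four ranges are consecutive windows of `depth + 1` counter values; the function names are hypothetical.

```python
LINES_PER_PAGE = 128  # assumed; taken from the later "Why Does It Work?" slide
TLB_ENTRIES = 64
RANGE_BITS = 2        # per-line range selector


def overhead_bytes(entries=TLB_ENTRIES, lines=LINES_PER_PAGE, bits=RANGE_BITS):
    """On-chip storage needed for the per-line range selectors."""
    return entries * lines * bits // 8


def two_level_candidates(page_root, range_id, depth):
    """Candidates inside the sub-range picked by the line's 2-bit selector.

    Assumes ranges are consecutive windows of (depth + 1) counter values.
    """
    window = depth + 1
    base = page_root + range_id * window
    return [base + i for i in range(window)]


def range_for(page_root, counter, depth):
    """Selector to store on write-back, or None if beyond the last range."""
    rid = (counter - page_root) // (depth + 1)
    return rid if 0 <= rid < (1 << RANGE_BITS) else None


print(overhead_bytes())                  # 2048 bytes, i.e. 2KB
print(two_level_candidates(1000, 2, 4))  # [1010, 1011, 1012, 1013, 1014]
print(range_for(1000, 1013, 4))          # 2
```

The number of speculative pads per access stays at `depth + 1`, while the selector lets those pads cover four times as many counter values, which matches the "quadruple" claim on the slide.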

  • 8/6/2019 conf-isca-2005

    17/22

    17

    Context Based Prediction

    Store the depth value of the previous line's counter number in a global register (the Context Register).

    Generate new predictions based on the Page Root Counter and the value in the Context Register.

    Can be combined with regular and two-level predictions. Feed all the predictions into the decryption pipeline.

    [Figure: prediction window positioned over counter numbers in natural order.]
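A minimal sketch of combining regular and context-based prediction. It assumes the context register holds the previous line's offset from its page root counter, that it is updated on every access, and that the contextual window spans the same `depth + 1` values as the regular one; the class `ContextPredictor` is hypothetical.

```python
class ContextPredictor:
    """Regular prediction plus a global context register (sketch)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.context = 0  # offset of the previous line's counter from its root

    def candidates(self, page_root: int) -> list[int]:
        window = range(self.depth + 1)
        regular = [page_root + i for i in window]
        contextual = [page_root + self.context + i for i in window]
        # Preserve order and drop duplicates before feeding the pipeline.
        return list(dict.fromkeys(regular + contextual))

    def observe(self, page_root: int, actual_counter: int) -> bool:
        """Check the prediction, then remember this line's offset."""
        hit = actual_counter in self.candidates(page_root)
        self.context = actual_counter - page_root
        return hit


pred = ContextPredictor(depth=4)
root = 1000
print(pred.observe(root, 1040))  # False: offset 40 is outside the regular window
print(pred.observe(root, 1041))  # True: close to the previous line's offset
print(pred.observe(root, 1002))  # True: still covered by the regular window
```

Lines that are accessed together and updated at a similar rate tend to have similar offsets, so one miss is enough to bring the following lines back into the predicted window.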

  • 8/6/2019 conf-isca-2005

    18/22

    18

    Why Does It Work?

    [Figure: a memory page of 128 memory lines.]

    while (1) {
        for all lines of the page
            write to the line;
        for all lines of the page
            read the line;
    }

                                          Regular Prediction (prediction depth = 4)     Context-Based Prediction
    Prediction miss of memory read (%)    20% (for each line, every 5 reads, 1 miss)    0.1% (for every 128*5 reads, 1 miss)

  • 8/6/2019 conf-isca-2005

    19/22

    19

    Prediction Rate

    [Figure: Prediction Hit Rate (256K L2). Y axis: hit rate, 0 to 1.2. X axis: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, average. Series: Regular_Pred, Two-level_Pred, Context + Regular_Pred.]

    [Figure: Prediction Hit Rate (1M L2). Same axes and series.]

    8 billion instruction window

    Two-level prediction: about 93% prediction hit rate

    Context-based + regular prediction: almost 99% prediction hit rate

  • 8/6/2019 conf-isca-2005

    20/22

    20

    IPC

    [Figure: Normalized IPC (256K L2). Y axis: normalized IPC, 0 to 1.2. X axis: ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise, average. Series: Regular_Pred, 2level_Pred, Context + Regular_Pred.]

    [Figure: Normalized IPC (1M L2). Same axes and series.]

    IPC normalized to the scenario of no decryption

    1-3% loss of performance using the best prediction

  • 8/6/2019 conf-isca-2005

    21/22

    21

    Conclusions

    Counter value prediction allows pads to be pre-computed speculatively without counter value caching.

    Spatial and temporal coherence of memory update frequency enables effective counter value prediction.

    Uses idle cycles of the pipelined decryption engine.

    Counter prediction achieves better performance than some of the large cache settings.

    Complementary to caching techniques.

  • 8/6/2019 conf-isca-2005

    22/22

    22

    Questions