conf-isca-2005
Transcript of conf-isca-2005
-
8/6/2019 conf-isca-2005
1/22
1
High Efficiency Counter Mode Security Architecture via Prediction and Pre-computation
Weidong Shi
Hsien-Hsin (Sean) Lee
Mrinmoy Ghosh
Chenghuai Lu
Alexandra Boldyreva
School of Electrical and Computer Engineering
Georgia Institute of Technology
-
Content
Motivation
Related Work
Counter/Decryption Pad Prediction
Profile Prediction Failures
Two-Level Prediction
Context Based Prediction
Conclusions
-
Why Encrypt System Memory?
Protect sensitive data stored in RAM (many simple devices can bypass OS memory protection and directly access physical memory)
Digital Rights Management (the industry is gradually adding encryption to each platform component: encrypted PCI-E, encrypted disk, encrypted flash memory, then toward encrypted RAM)
Anti-reverse engineering (most software licenses forbid reverse engineering but rely on users keeping that promise)
Military (customers of encrypted FPGA chips, lots of embedded military software)
Program randomization (intrusion prevention, CCS 2003)
-
Different Solutions
SoC: memory is on-chip. Applies only to limited platforms such as small embedded systems (cell phones).
[Figure labels: Crypto Engine, Processor Core, Cache]
Configurable system: RAM encryption. Supports more usage models.
[Figure labels: Crypto Engine, Flash, Micro Controller]
Creates a small secure world with limited application scenarios (code signing, BIOS signature verification).
-
Related Work
Use a dedicated cache (sequence number cache) to reduce the latency overhead of memory decryption (Micro 2003)
Prefetch based memory pre-decryption (WASSA 2004)
Prediction based memory decryption (this paper)
  Fully exploits the pre-computation capability enabled by counter mode encryption.
  Uses otherwise wasted idle crypto engine pipeline stages for prediction and pre-computation.
  Less area overhead than caching and less memory pressure than prefetch based pre-decryption.
-
Counter Mode - Encryption
[Figure: the crypto engine sits between the processor core's cache lines and memory. For each 16B block of a cache line, the AES block cipher takes the key and a seed built from the block's virtual address and counter (VAddr, Counter) and produces an encryption pad, which is XORed with the 16B block to give the encrypted 16B. Successive updates of the line use Counter+1, Counter+2.]
Each memory line has its own counter. Each time a memory line is updated, its counter is incremented.
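A minimal sketch of the per-line counter scheme described above, under stated assumptions: HMAC-SHA256 truncated to 16 bytes stands in for the AES block cipher so the example runs with the standard library only, and the seed layout (8-byte address followed by 8-byte counter) and the names `pad` and `crypt_line` are hypothetical, not taken from the slides.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per cipher block, matching the 16B chunks on the slide


def pad(key: bytes, vaddr: int, counter: int) -> bytes:
    """Pad for one 16B block. HMAC-SHA256 is a stand-in for AES (assumption)."""
    seed = vaddr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hmac.new(key, seed, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def crypt_line(key: bytes, vaddr: int, counter: int, line: bytes) -> bytes:
    """Encrypt or decrypt a cache line: XOR with the pad is its own inverse."""
    out = bytearray()
    for off in range(0, len(line), BLOCK):
        # Each 16B block gets its own pad, derived from its address and the
        # line's counter (per-block address step is an assumption).
        out += xor(line[off:off + BLOCK], pad(key, vaddr + off, counter))
    return bytes(out)


key = b"k" * 32
line = bytes(range(32))      # hypothetical 32B cache line
counter = 7
cipher = crypt_line(key, 0x1000, counter, line)
assert crypt_line(key, 0x1000, counter, cipher) == line
# After a write-back the counter is incremented, so the same data encrypts differently.
assert crypt_line(key, 0x1000, counter + 1, line) != cipher
```

Because the pad depends only on the key, address and counter, it can be computed before the encrypted data arrives, which is the property the prediction scheme relies on.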
-
Counter Mode - Decryption
[Figure: the same datapath used for decryption. The AES block cipher regenerates the encryption pad from the key, the block's VAddr and the line's current counter (Counter+2 in the example), and the pad is XORed with each encrypted 16B block to recover the 16B of the cache line.]
The counter has to be fetched for a memory line that misses in L2.
-
Counter Prediction
Counters exhibit both spatial and temporal coherence.
To exploit spatial coherence, memory blocks from the same page start counting from the same initial value (the page root counter).
[Figure: memory lines of one page with counter values such as 0xabcddcba12344321, 0xabcddcba12344325, 0xabcddcba12344e0a and 0xabcddcba123443f1, labeled static data, infrequently updated data and frequently updated data. A table maps each Page Base Addr (e.g. 0x0000ff00) to a 64-bit Page Root Counter (e.g. 0xabcddcba12344321).]
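The page root counter idea can be sketched as follows. This is an illustration under assumptions, not the authors' hardware: it assumes a prediction depth of d means the candidates root, root+1, ..., root+d, and `toy_pad` is a hypothetical placeholder for the real pad generation. The counter values are the ones shown on the slide.

```python
from typing import Callable, Dict, List, Optional


def predict_counters(root: int, depth: int) -> List[int]:
    """Candidate counters root .. root+depth (inclusive range is an assumption)."""
    return [root + i for i in range(depth + 1)]


def precompute_pads(root: int, depth: int,
                    make_pad: Callable[[int], bytes]) -> Dict[int, bytes]:
    """Speculatively compute one pad per predicted counter while memory is busy."""
    return {c: make_pad(c) for c in predict_counters(root, depth)}


def select_pad(pads: Dict[int, bytes], fetched: int) -> Optional[bytes]:
    """When the real counter arrives, a hit means the pad is already available."""
    return pads.get(fetched)  # None models a prediction miss


def toy_pad(counter: int) -> bytes:
    # Hypothetical placeholder for the AES pad computation.
    return counter.to_bytes(8, "big") * 2


ROOT = 0xabcddcba12344321
pads = precompute_pads(ROOT, depth=5, make_pad=toy_pad)
assert select_pad(pads, 0xabcddcba12344325) is not None  # root + 4: hit
assert select_pad(pads, 0xabcddcba12344e0a) is None      # far from root: miss
```

A line that has been updated only a few times since the page root counter was set stays within the window; a frequently updated line drifts out of it, which is the case the later slides address.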
-
Use Free Idle Pipeline Stages for Prediction
Unrolled and pipelined AES decryption logic often stays idle for tens to hundreds of cycles when data misses in L2.
[Figure: timeline. The memory pipeline retrieves the counter value and the encrypted line; the AES pipeline is idle during that time, then generates the decryption pad that yields the decrypted line.]
-
Handle Frequent Updates
Window based dynamic tracking of prediction rate for each page.
For frequently updated memory blocks, reset the page root counter to a new number according to the prediction history vector. All future write-backs will count from the new number.
[Figure: each TLB entry holds the Page Base Addr (e.g. 0x0000ff00), the 64-bit Page Root Counter (e.g. 0xabcddcba12344321) and a 16-bit Prediction History Vector (e.g. 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0). The prediction logic takes the counter value and shifts each outcome (miss = 1, hit = 0) into the vector's shift register. If total(miss) > threshold, the corresponding Page Root Counter is reset to a new number.]
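The window based tracking can be sketched as below. The class name `PageEntry`, the threshold value, the way the new root is supplied, and clearing the history after a reset are all assumptions made for illustration; the slide specifies only the 16-bit shift register, the miss = 1 / hit = 0 encoding and the total(miss) > threshold test.

```python
class PageEntry:
    """Per-page state kept alongside a TLB entry (hypothetical layout)."""

    def __init__(self, root: int, window: int = 16, threshold: int = 8):
        self.root = root            # page root counter
        self.window = window        # history length in bits (16 on the slide)
        self.threshold = threshold  # assumed value; not given on the slide
        self.history = 0            # prediction history vector

    def record(self, miss: bool, new_root: int) -> bool:
        """Shift in one outcome (miss = 1, hit = 0); reset the root if needed."""
        mask = (1 << self.window) - 1
        self.history = ((self.history << 1) | int(miss)) & mask
        if bin(self.history).count("1") > self.threshold:
            self.root = new_root    # future write-backs count from here
            self.history = 0        # assumption: start a fresh window
            return True
        return False


entry = PageEntry(root=0xabcddcba12344321, threshold=3)
outcomes = [entry.record(True, new_root=0x1000) for _ in range(4)]
assert outcomes == [False, False, False, True]
assert entry.root == 0x1000
```

The shift register keeps only the most recent outcomes, so a page that was once updated heavily but has since gone quiet does not keep triggering resets.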
-
Experiment Setup
Parameter                   Value
L1 I/D Cache                DM, 8KB
L2 Cache                    4-way, unified, 256KB/1MB
Memory Bus                  200MHz, 8B wide
CPU Clock                   1GHz
AES Latency (256-bit)       64 pipeline stages in total, 1ns each
Prediction History Window   16 bits
Prediction Depth            5
SimpleScalar 3.0 with SPEC2000 INT/FP benchmarks that have high L2 misses.
Prediction hit rate study (8 billion instructions)
IPC performance (400 million instructions on a representative window)
-
Prediction Rate
[Figure: Prediction Hit Rate (256K L2). Bar chart, y-axis 0 to 1.2, for ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise and the average; series 128K_Counter_#_Cache, 512K_Counter_#_Cache and Pred.]
Prediction hit rate under 8 billion instructions
No counter number cache when using prediction
Prediction depth = 5
Average prediction hit rate, about 82-83%
[Figure: Prediction Hit Rate (1M L2). Bar chart, y-axis 0 to 1.2, for ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise and the average; series 128K_Counter_#_Cache, 512K_Counter_#_Cache and Pred.]
-
IPC
[Figure: Normalized IPC (256K L2). Bar chart, y-axis 0 to 1.2, for ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise and the average; series Counter_Cache_4K, Counter_Cache_128K, Counter_Cache_512K and Pred.]
[Figure: Normalized IPC (1M L2). Same benchmarks and series.]
IPC is normalized to the scenario without decryption.
In general, prediction outperforms the 128K counter cache.
On average, prediction is on par with the 512K counter cache.
-
Prediction Miss
Reasons for prediction misses
  Prediction depth is too small.
  Reset of the page root counter number. Memory lines whose counter values are based on the old page root counter cannot be predicted correctly using the new page root counter.
Solutions (details in the next few slides)
  Two-level prediction (divide the prediction depth into sub-ranges, increasing the effective prediction depth without adding more predictions)
  Page root counter history memorization (predict using both the current page root counter and the previous root counter; only a marginal improvement)
  Context based prediction (exploit temporal coherence of accessing memory locations with coherent update frequency)
-
Two-level Prediction
Divide the prediction window into ranges (power of 2).
With 2 bits per line, this effectively quadruples the prediction depth.
Overhead is about 2KB of on-chip memory for a 64-entry TLB.
[Figure: counter numbers in natural order, divided into four consecutive prediction windows labeled 00, 01, 10 and 11.]
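A sketch of how the 2-bit range selector could work, with assumptions labeled: the range size of 8 counters is a hypothetical choice, the function names are illustrative, and the overhead check assumes 128 lines per page, the page size used in the deck's later "Why Does It Work?" example.

```python
from typing import List, Optional


def range_index(root: int, counter: int, range_size: int,
                bits: int = 2) -> Optional[int]:
    """Which range the counter falls in, or None if beyond all ranges."""
    offset = counter - root
    if offset < 0:
        return None
    idx = offset // range_size
    return idx if idx < (1 << bits) else None


def candidates(root: int, idx: int, range_size: int) -> List[int]:
    """Only the selected range is predicted, so the prediction count is unchanged."""
    base = root + idx * range_size
    return [base + i for i in range(range_size)]


def overhead_bytes(tlb_entries: int, lines_per_page: int,
                   bits_per_line: int) -> int:
    """On-chip storage for the per-line range bits."""
    return tlb_entries * lines_per_page * bits_per_line // 8


root = 1000
idx = range_index(root, 1019, range_size=8)   # offset 19 falls in range 2
assert idx == 2
assert 1019 in candidates(root, idx, range_size=8)
assert len(candidates(root, idx, range_size=8)) == 8
# 64 entries * 128 lines * 2 bits = 16384 bits = 2048 bytes, i.e. about 2KB.
assert overhead_bytes(64, 128, 2) == 2048
```

With four ranges the reachable counter span is four times the single-window span while the number of pads computed per access stays the same, which matches the "quadruple" claim on the slide.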
-
Context Based Prediction
Store the depth value of the previous line's counter number in a global register (the Context Register).
Generate new predictions based on the Page Root Counter and the value in the Context Register.
Can be combined with regular and two-level predictions. Feed all the predictions into the decryption pipeline.
[Figure: a prediction window positioned along the counter numbers in natural order.]
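One way to picture the context register, as a sketch under assumptions rather than the design from the paper: the context is taken to be the previous line's offset from its page root counter, and the context based candidates are assumed to be a window of the same depth starting at root + context. The class and method names are hypothetical.

```python
from typing import List


class ContextPredictor:
    def __init__(self, depth: int):
        self.depth = depth
        self.context = 0  # offset of the previous line's counter from its root

    def candidates(self, root: int) -> List[int]:
        """Regular window plus a window shifted by the context register."""
        regular = [root + i for i in range(self.depth + 1)]
        shifted = [root + self.context + i for i in range(self.depth + 1)]
        return list(dict.fromkeys(regular + shifted))  # drop duplicates, keep order

    def access(self, root: int, actual_counter: int) -> bool:
        """Return True on a prediction hit, then remember this line's offset."""
        hit = actual_counter in self.candidates(root)
        self.context = actual_counter - root
        return hit


# Lines of one page that have all been updated 40 times since the root was set.
predictor = ContextPredictor(depth=4)
results = [predictor.access(1000, 1040) for _ in range(4)]
assert results == [False, True, True, True]
```

In this toy run only the first access misses: once one line reveals how far the page has drifted from its root, neighbouring lines with a similar update count are predicted correctly.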
-
Why Does It Work?
[Figure: a memory page of 128 memory lines.]
while (1) {
    for all lines of the page
        write to the line;
    for all lines of the page
        read the line;
}
Prediction miss of memory read (%)
Regular Prediction (prediction depth = 4): 20% (for each line, 1 miss every 5 reads)
Context Based Prediction: 0.1% (1 miss for every 128*5 reads)
-
Prediction Rate
[Figure: Prediction Hit Rate (256K L2). Bar chart, y-axis 0 to 1.2, for ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise and the average; series Regular_Pred, Two-level_Pred and Context + Regular_Pred.]
[Figure: Prediction Hit Rate (1M L2). Same benchmarks and series.]
8 billion instruction window
Two-level prediction: about 93% prediction hit rate
Context based + regular prediction: almost 99% prediction hit rate
-
IPC
[Figure: Normalized IPC (256K L2). Bar chart, y-axis 0 to 1.2, for ammp, applu, art, bzip2, gcc, gzip, mcf, mgrid, parser, swim, twolf, vortex, vpr, wupwise and the average; series Regular_Pred, 2level_Pred and Context + Regular_Pred.]
[Figure: Normalized IPC (1M L2). Same benchmarks and series.]
IPC normalized to scenario of no decryption
1-3% loss of performance using best prediction
-
Conclusions
Counter value prediction allows pads to be pre-computed speculatively without counter value caching.
Spatial and temporal coherence of memory update frequency enables effective counter value prediction.
Uses idle cycles of the pipelined decryption engine.
Counter prediction achieves better performance than some of the large cache settings.
Complementary to caching techniques.
-
Questions