University of Michigan, Electrical Engineering and Computer Science
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles*, Krisztián Flautner*
Advanced Computer Architecture Lab, University of Michigan
*ARM Ltd.
The Expression Gap
• RISC ISAs are the lowest common denominator
  ► Don’t match applications’ computation
  ► Don’t match hardware capabilities
• Need efficient execution
• Impressive design wins through customization
  ► Performance, power, etc.
Customization Gains: Performance
[Figure: bar chart of speedup (y-axis, 0 to 4) for 3DES, AES, Blowfish, MD5, RC4, and SHA, comparing OptimoDE (5-issue VLIW, 333 MHz) with OptimoDE + Custom ISA.]
Traditional ISA Customization
• Demanding parts of applications run on special hardware
• New instructions use the special hardware
[Figure: an instruction stream containing XOR, MPY, LD, SHR, MOV, AND, and CUSTOM operations, shown alongside a CPU and custom hardware.]
Objectives of Transparent ISA Customization
• Increase execution efficiency of processors
• Architecture framework for subgraph acceleration
  ► Create a pipeline with fixed interface
  ► Design and verify once
• Support Plug-and-Play style accelerators
• CISC on Demand
Traditional vs. Transparent Customization
Traditional
• Significant ISA change
• High NRE
  ► Verification
  ► Masks
• Control placed in binary
  ► Software migration
• No legacy codes

Transparent
• No ISA change
• Baseline CPU unchanged
• Hardware generates control
  ► Eases software burden
• Forward compatible
Architecture Framework
[Figure: architecture framework diagram with four numbered components. Labels: Application, Compiler, Standard Pipeline, Subgraph Execution Unit (Inputs, Outputs), Control Generation (Instructions, Augments Instruction Stream).]
Configurable Compute Array (CCA)
• Array of function units
• Two types of FUs: arith/logic, logic
• Covers 82% of important subgraphs
• Crossbar between rows
• 3.19ns critical path
• 0.61 mm² in 0.13 µm
[Figure: CCA with inputs I1 to I4 and outputs O1 and O2.]
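A minimal sketch of how a subgraph might be checked against such an array. Everything concrete here is an assumption for illustration: the four-row layout in `ROWS`, the row widths, the op sets, and the rule that an op may be placed in any row below its deepest producer. The slides do not give the CCA's actual configuration.

```python
# Hypothetical CCA layout: each row is (kind, width). Values are illustrative only.
ROWS = [("arith", 4), ("logic", 3), ("arith", 2), ("logic", 1)]

LOGIC_OPS = {"AND", "OR", "XOR"}
ARITH_OPS = {"ADD", "SUB"}  # arith/logic rows also accept LOGIC_OPS


def row_accepts(kind, op):
    """An arith/logic row runs both op classes; a logic row runs logic ops only."""
    if op in LOGIC_OPS:
        return True
    return kind == "arith" and op in ARITH_OPS


def map_subgraph(ops):
    """Greedily place ops onto rows.

    ops: list of (name, opcode, [producer names]) in topological order.
    Producers not defined in ops are treated as external inputs.
    Returns {name: row index}, or None if the subgraph does not fit.
    """
    placed = {}
    used = [0] * len(ROWS)
    for name, opcode, srcs in ops:
        # Assumed: values only flow downward, so start below the deepest producer.
        first = max((placed[s] + 1 for s in srcs if s in placed), default=0)
        for r in range(first, len(ROWS)):
            kind, width = ROWS[r]
            if used[r] < width and row_accepts(kind, opcode):
                placed[name] = r
                used[r] += 1
                break
        else:
            return None  # no row could host this op
    return placed


# Two independent ops feeding an XOR: fits in two rows.
fits = map_subgraph([
    ("t1", "AND", ["in1"]),
    ("t2", "ADD", ["in2"]),
    ("t3", "XOR", ["t1", "t2"]),
])

# Three dependent ADDs need three arith rows; this layout has only two.
too_deep = map_subgraph([
    ("a1", "ADD", ["in1"]),
    ("a2", "ADD", ["a1"]),
    ("a3", "ADD", ["a2"]),
])
```

Under these assumptions `fits` places `t1` and `t2` in row 0 and `t3` in row 1, while `too_deep` is rejected because logic-only rows cannot host the dependent additions.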
Compiler
• Identify and delineate subgraphs
• “Procedural Abstraction” – used in compression
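One way to picture the delineation step: the subgraph's instructions are lifted out of line into a small procedure and a call is left in their place, as procedural abstraction does for code compression. The sketch below is an illustration under assumptions, not the Trimaran-based implementation: instructions are plain strings, `BL` stands in for the call, the surrounding `LD`/`ST` lines are hypothetical, and the `Subg`/`RET` shape follows the listing on the control generation slide.

```python
def outline_subgraph(instrs, start, end, label):
    """Move instrs[start:end] into a procedure named `label`.

    Returns (main, procedure): main has a call where the subgraph was,
    and procedure is the labeled subgraph terminated by RET.
    """
    if not 0 <= start < end <= len(instrs):
        raise ValueError("subgraph range is out of bounds")
    body = instrs[start:end]
    main = instrs[:start] + [f"BL {label}"] + instrs[end:]
    procedure = [f"{label}:"] + body + ["RET"]
    return main, procedure


code = [
    "LD r1, [r0]",       # hypothetical surrounding instruction
    "AND r3, r1, #-4",
    "SEXT r2, r4",
    "AND r2, r2, #3",
    "OR r3, r3, r2",
    "ST r3, [r0]",       # hypothetical surrounding instruction
]
main, proc = outline_subgraph(code, 1, 5, "Subg")
```

Because the outlined code uses only ordinary instructions, a core without an accelerator could presumably still run it unchanged, which fits the "No ISA change" point made earlier.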
Control Generation
Subg:
    AND  r3, r1, #-4
    SEXT r2, r4
    AND  r2, r2, #3
    OR   r3, r3, r2
    RET
[Figure: the subgraph mapped onto the CCA, with inputs I1 to I4 and outputs O1 and O2.]
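To make the example concrete, the sketch below derives the external inputs and written registers of the `Subg` listing and evaluates it in software. The tuple encoding, the helper names, and the 8-bit width chosen for `SEXT` are assumptions for illustration; the slide does not state the operand width, and the actual control generation is done in hardware.

```python
MASK32 = 0xFFFFFFFF

# The subgraph from the slide, encoded as (opcode, dest, sources).
# Sources are register names or integer immediates.
SUBG = [
    ("AND", "r3", ["r1", -4]),
    ("SEXT", "r2", ["r4"]),
    ("AND", "r2", ["r2", 3]),
    ("OR", "r3", ["r3", "r2"]),
]


def live_ins_and_defs(subg):
    """Return (registers read before being written, registers written)."""
    inputs, written = [], []
    for _, dest, srcs in subg:
        for s in srcs:
            if isinstance(s, str) and s not in written and s not in inputs:
                inputs.append(s)
        if dest not in written:
            written.append(dest)
    return inputs, written


def sext8(x):
    """Sign-extend the low 8 bits to 32 bits (the width is an assumption)."""
    x &= 0xFF
    return (x - 0x100 if x & 0x80 else x) & MASK32


def run(subg, regs):
    """Interpret the subgraph over a register dict using 32-bit values."""
    regs = dict(regs)
    for op, dest, srcs in subg:
        vals = [(regs[s] if isinstance(s, str) else s) & MASK32 for s in srcs]
        if op == "AND":
            regs[dest] = vals[0] & vals[1]
        elif op == "OR":
            regs[dest] = vals[0] | vals[1]
        elif op == "SEXT":
            regs[dest] = sext8(vals[0])
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs


inputs, written = live_ins_and_defs(SUBG)
result = run(SUBG, {"r1": 0x1237, "r4": 0xFE})
```

The subgraph reads two external registers (`r1`, `r4`), which is within the four inputs shown on the slide. Which of the written registers remain live afterwards depends on surrounding code the slide does not show.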
Evaluation
• Ported Trimaran compiler to ARM ISA
  ► Subgraph identification engine
• Synthesized control generator and accelerator
• SimpleScalar configured as ARM926EJ-S
  ► 5-stage pipe, 250 MHz
  ► 1-cycle 16k I/D caches
  ► Single issue
  ► 1-cycle subgraph execution latency
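The one-cycle subgraph latency is consistent with the 3.19 ns CCA critical path reported earlier, since a 250 MHz clock has a 4 ns period. A quick check, ignoring register setup and routing overhead (an assumption, since the slides do not break the timing down):

```python
import math

CLOCK_MHZ = 250          # from the evaluation setup
CCA_CRITICAL_NS = 3.19   # from the CCA slide

period_ns = 1000 / CLOCK_MHZ
cycles = math.ceil(CCA_CRITICAL_NS / period_ns)
print(f"period = {period_ns} ns, CCA latency = {cycles} cycle(s)")
```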
Performance Results
[Figure: bar chart of speedup (y-axis, 1 to 5) in three benchmark groups. SPECint: 164.gzip, 181.mcf, 197.parser, 256.bzip2, 300.twolf. MediaBench: cjpeg, djpeg, epic, unepic, g721encode, g721decode, gsmencode, gsmdecode, pegwitenc, pegwitdec, rawcaudio, rawdaudio. Encryption: blowfish, md5, rc4, Rijndael, sha. The value 6.51 is annotated on the chart.]
• 1.6 IPC on a single-issue core
Plug-and-Play Benefits
Baseline area: 0.61 mm²
Baseline speedup: 1.8
Effect of CCA Pipelining
[Figure: bar chart of speedup (y-axis, 1 to 5) for cjpeg, djpeg, epicdec, epicenc, g721decode, g721encode, gsmdecode, gsmencode, pegwitdec, pegwitenc, rawcaudio, rawdaudio, blowfish, md5, rc4, rijndael, and sha, with four series labeled 1, 2, 3, and 4.]
Average speedup (series 1 to 4): 2.17, 1.86, 1.64, 1.48
Conclusions
• Expression gap between ISAs and computation
  ► Inherent inefficiency
• Transparent ISA Customization
  ► Fixed core, low NRE
  ► Plug-and-Play accelerators
  ► Enables “CISC on demand”
• 1.8x speedup for 15% area overhead