1 An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors

Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles*, Krisztián Flautner*

Advanced Computer Architecture Lab, University of Michigan
*ARM Ltd.

University of Michigan, Electrical Engineering and Computer Science

Date posted: 21-Dec-2015

2 The Expression Gap

• RISC ISAs are the lowest common denominator
  ► Don't match applications' computation
  ► Don't match hardware capabilities

• Need efficient execution

• Impressive design wins through customization
  ► Performance, power, etc.

3 Customization Gains: Performance

[Bar chart: speedup (y-axis, 0 to 4) for 3Des, AES, Blowfish, Md5, Rc4, and SHA. Series: OptimoDE (5 Issue VLIW, 333 MHz) and OptimoDE + Custom ISA.]

4 Traditional ISA Customization

• Demanding parts of applications run on special hardware
• New instructions use the special hardware

[Figure labels: original dataflow graph with nodes XOR, MPY, LD, XOR, SHR, XOR, MOV, AND; customized graph with nodes CUSTOM, MPY, LD, SHR; blocks labeled CPU and Custom Hardware.]

5 Objectives of Transparent ISA Customization

• Increase execution efficiency of processors

• Architecture framework for subgraph acceleration
  ► Create a pipeline with fixed interface
  ► Design and verify once

• Support Plug-and-Play style accelerators

• CISC on Demand

6 Traditional vs. Transparent Customization

Traditional
• Significant ISA change
• High NRE
  ► Verification
  ► Masks
• Control placed in binary
  ► Software migration
• No legacy codes

Transparent
• No ISA change
• Baseline CPU unchanged
• Hardware generates control
  ► Eases software burden
• Forward compatible

7 Architecture Framework

[Diagram labels: Compiler; Application (with marked subgraphs, "Subg."); Instructions; Standard Pipeline; Control Generation; Subgraph Execution Unit (Inputs, Outputs); "Augments Instruction Stream". Four steps are numbered 1 to 4.]

8 Configurable Compute Array (CCA)

• Array of function units
• Two types of FUs: arith/logic, logic
• Covers 82% of important subgraphs
• Crossbar between rows
• 3.19 ns critical path
• 0.61 mm² in 0.13 µm

[Figure labels: inputs I1, I2, I3, I4; outputs O1, O2.]
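
The CCA bullets describe rows of function units of two types, joined by crossbars. A minimal sketch of how a dataflow subgraph might be checked against such an array follows. The row count, row widths, opcode sets, and the greedy placement policy are all assumptions made for illustration; they are not the dimensions or the mapping algorithm from the talk. The sketch also assumes a value may skip over rows on its way to a consumer.

```python
from dataclasses import dataclass

# Hypothetical opcode classes: "arith" ops need an arith/logic FU,
# "logic" ops can run on either FU type.
ARITH_OPS = {"ADD", "SUB"}
LOGIC_OPS = {"AND", "OR", "XOR"}


@dataclass
class Row:
    kind: str   # "arith/logic" or "logic"
    width: int  # number of function units in the row


# Assumed array shape, for illustration only.
ROWS = [Row("arith/logic", 4), Row("logic", 3), Row("arith/logic", 2), Row("logic", 1)]


def row_supports(row, opcode):
    """True if a function unit in this row can execute the opcode."""
    if opcode in LOGIC_OPS:
        return True
    return opcode in ARITH_OPS and row.kind == "arith/logic"


def map_subgraph(ops):
    """Greedily place ops onto rows.

    ops: list of (name, opcode, [producer names]) in topological order.
    Producers that are not ops in the list are treated as external inputs.
    Returns {name: row index}, or None if this greedy pass finds no fit.
    """
    placement = {}
    used = [0] * len(ROWS)
    for name, opcode, producers in ops:
        # An op must sit below every op that produces one of its inputs.
        first = max((placement[p] + 1 for p in producers if p in placement), default=0)
        for r in range(first, len(ROWS)):
            if used[r] < ROWS[r].width and row_supports(ROWS[r], opcode):
                placement[name] = r
                used[r] += 1
                break
        else:
            return None  # no free, compatible FU at or below the required row
    return placement
```

With this assumed shape, the chain AND, ADD, XOR maps to rows 0, 2, and 3, because the ADD has to skip the logic-only row 1. An opcode outside both sets (for example a multiply) is rejected. The greedy pass is not optimal: a subgraph it rejects might still fit under a smarter placement.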

9 Architecture Framework

[Architecture framework diagram, repeated from slide 7.]

10 Compiler

• Identify and delineate subgraphs
• “Procedural Abstraction” (used in code compression)
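
One way to read "delineate subgraphs" with procedural abstraction is that the compiler outlines each chosen subgraph into a small routine and replaces it with a call, which would match the "Subg: ... RET" listing on the Control Generation slide. The sketch below shows that transformation on a list of instruction strings. The function name `outline_subgraph`, the `BL` call mnemonic, and the restriction to a contiguous range are assumptions for illustration, not details confirmed by the slides.

```python
def outline_subgraph(instructions, start, end, label="Subg"):
    """Move instructions[start:end] into a separate routine.

    Returns (main_stream, routine). The call and return mnemonics
    ("BL", "RET") are illustrative assumptions.
    """
    if not 0 <= start < end <= len(instructions):
        raise ValueError("subgraph range out of bounds")
    body = instructions[start:end]
    # The original location keeps only a call to the outlined routine.
    main_stream = instructions[:start] + [f"BL {label}"] + instructions[end:]
    # The routine is the subgraph body followed by a return.
    routine = [f"{label}:"] + body + ["RET"]
    return main_stream, routine


if __name__ == "__main__":
    # The LDR and STR lines are hypothetical surrounding code.
    code = ["LDR r1, [r0]", "AND r3, r1, #-4", "OR r3, r3, r2", "STR r3, [r0]"]
    main_stream, routine = outline_subgraph(code, 1, 3)
    print(main_stream)
    print(routine)
```

A binary transformed this way still runs on a core without an accelerator, since the call and return are ordinary instructions. That is consistent with the "No ISA change" claim.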

11 Architecture Framework

[Architecture framework diagram, repeated from slide 7.]

12 Control Generation

[Figure labels: inputs I1, I2, I3, I4; outputs O1, O2.]

Subg:
    AND r3, r1, #-4
    SEXT r2, r4
    AND r2, r2, #3
    OR r3, r3, r2
    RET
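
To configure the array for the subgraph above, control generation needs at least its external inputs, the registers it writes, and the dependence order of its operations. The sketch below derives those from the listing. It is a software illustration, not the hardware algorithm from the talk. It assumes ARM-style operand order (destination first), and the function name `analyze` is hypothetical.

```python
import re

# Subgraph body from the slide, without the label and the RET.
SUBGRAPH = [
    "AND r3, r1, #-4",
    "SEXT r2, r4",
    "AND r2, r2, #3",
    "OR r3, r3, r2",
]


def analyze(instructions):
    """Return (live-in registers, written registers, [(instruction, level)]).

    The level is the length of the longest dependence chain leading to the
    instruction, so level 0 ops depend only on external inputs.
    """
    live_in, written = [], []
    level_of_reg = {}  # register -> level of the op that last wrote it
    levels = []
    for text in instructions:
        _opcode, operands = text.split(None, 1)
        regs = re.findall(r"r\d+", operands)  # immediates such as #-4 are skipped
        dest, sources = regs[0], regs[1:]     # assumed: destination comes first
        level = 0
        for src in sources:
            if src in level_of_reg:
                level = max(level, level_of_reg[src] + 1)
            elif src not in live_in:
                live_in.append(src)           # read before any write: external input
        level_of_reg[dest] = level
        if dest not in written:
            written.append(dest)
        levels.append((text, level))
    return live_in, written, levels


if __name__ == "__main__":
    inputs, outputs, levels = analyze(SUBGRAPH)
    print("inputs:", inputs)
    print("written:", outputs)
    for text, level in levels:
        print(level, text)
```

For this listing the sketch reports two external inputs (r1 and r4), two written registers (r3 and r2), and three dependence levels. The two inputs and two written registers are consistent with the I1, I2 and O1, O2 labels on the slide.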

13 Architecture Framework

[Architecture framework diagram, repeated from slide 7.]

14 Pipeline Interface

15 Evaluation

• Ported Trimaran compiler to ARM ISA
  ► Subgraph identification engine

• Synthesized control generator and accelerator

• SimpleScalar configured as ARM926EJ-S
  ► 5 stage pipe, 250 MHz
  ► 1 cycle 16k I/D caches
  ► Single issue
  ► 1 cycle subgraph execution latency

16 Performance Results

[Bar chart: speedup (y-axis, 1 to 5; one bar exceeds the axis and is labeled 6.51) per benchmark. SPECint: 164.gzip, 181.mcf, 197.parser, 256.bzip2, 300.twolf. MediaBench: cjpeg, djpeg, epic, unepic, g721encode, g721decode, gsmencode, gsmdecode, pegwitenc, pegwitdec, rawcaudio, rawdaudio. Encryption: blowfish, md5, rc4, Rijndael, sha.]

1.6 IPC on a single-issue core
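
An IPC above 1 on a single-issue core is possible because one subgraph invocation retires several original instructions. The back-of-the-envelope model below makes that arithmetic explicit. It assumes a baseline CPI of 1, ignores call and return overhead, and uses hypothetical inputs (fraction of dynamic instructions covered, average subgraph size). None of these values come from the talk.

```python
def effective_ipc(covered_fraction, avg_subgraph_size, subgraph_latency=1):
    """Original instructions retired per cycle under a simple collapse model.

    covered_fraction: share of dynamic instructions that fall inside subgraphs.
    avg_subgraph_size: average number of instructions collapsed per subgraph.
    subgraph_latency: cycles the accelerator takes per subgraph.
    Assumes a single-issue core with baseline CPI of 1 and no call overhead.
    """
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must be between 0 and 1")
    if avg_subgraph_size < 1 or subgraph_latency < 1:
        raise ValueError("size and latency must be at least 1")
    # Uncovered instructions cost one cycle each; each covered instruction
    # costs its share of one subgraph execution.
    cycles_per_instruction = (
        (1.0 - covered_fraction)
        + covered_fraction * subgraph_latency / avg_subgraph_size
    )
    return 1.0 / cycles_per_instruction


if __name__ == "__main__":
    # Illustrative inputs only: half the instructions covered, four per subgraph.
    for latency in (1, 2, 3, 4):
        print(latency, round(effective_ipc(0.5, 4, latency), 2))
```

In this model, raising `subgraph_latency` lowers the effective IPC, the same direction as the falling averages on the CCA pipelining slide. The model is too coarse to reproduce the measured numbers.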

17 Plug-and-Play Benefits

Baseline area: 0.61 mm²
Baseline speedup: 1.8

18 Effect of CCA Pipelining

[Bar chart: speedup (y-axis, 1 to 5) per benchmark for four series labeled 1, 2, 3, and 4. Benchmarks: cjpeg, djpeg, epicdec, epicenc, g721decode, g721encode, gsmdecode, gsmencode, pegwitdec, pegwitenc, rawcaudio, rawdaudio, blowfish, md5, rc4, rijndael, sha.]

Average: 2.17, 1.86, 1.64, 1.48

19 Conclusions

• Expression gap between ISAs and computation
  ► Inherent inefficiency

• Transparent ISA Customization
  ► Fixed core, low NRE
  ► Plug-and-Play accelerators
  ► Enables “CISC on demand”

• 1.8x speedup for 15% area overhead

20 Questions?

More info: http://cccp.eecs.umich.edu