Marvin Tom University of British Columbia Department of Electrical and Computer Engineering

Channel Width Reduction Techniques for System-on- Chip Circuits in Field-Programmable Gate Arrays Marvin Tom University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada


Page 1

Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays

Marvin Tom

University of British Columbia
Department of Electrical and Computer Engineering

Vancouver, BC, Canada

Page 2

Contributions

• Two new FPGA benchmark circuit “suites”
  – Meta Circuit: mimic “System-on-Chip” design by randomly “stitching” real designs
  – Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand

• Two new FPGA CAD flows
  – DHPack: Design Hierarchy Packing
    • Identify congested IP blocks ==> depopulate ==> reduced interconnect demand
    • Conference paper: “Logic Block Clustering…”, published at DAC 2005
  – Un/DoPack: UnPack and DoPack
    • Find “local” interconnect congestion ==> depopulate ==> reduced interconnect demand
    • Conference paper, submitted to DAC 2006

• Discoveries…
  – “Non-uniform” depopulation limits area inflation
  – “BLE limiting” gives better interconnect controllability than “Input limiting”
  – “Interconnect variation” important for area inflation and FPGA architecture design
  – “Routing closure” achieved by re-clustering and incremental place & route

• UNROUTABLE circuits made ROUTABLE ==> buy an FPGA with MORE LOGIC!!!

Page 3

Mesh-Based FPGA Architecture

• 9 logic blocks
• 4 wires per channel
• 3*4=12 total horizontal tracks

[Figure: 3 x 3 mesh of logic blocks (L)]

• Larger FPGAs have more “aggregate” interconnect

• 16 logic blocks
• 4 wires per channel
• 4*4=16 total horizontal tracks

[Figure: 4 x 4 mesh of logic blocks (L)]
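The track arithmetic on this slide can be reproduced with a short sketch. It follows the slide's own counting, one horizontal channel per row of logic blocks; the function name is illustrative and not taken from the talk.

```python
def total_horizontal_tracks(rows: int, wires_per_channel: int) -> int:
    """Count horizontal tracks with one horizontal channel per row of logic blocks."""
    return rows * wires_per_channel


# The two meshes from the slide: 3 x 3 and 4 x 4, both with 4 wires per channel.
for side in (3, 4):
    tracks = total_horizontal_tracks(side, wires_per_channel=4)
    print(f"{side * side} logic blocks: {tracks} total horizontal tracks")
```

The channel width stays at 4 in both cases; only the number of channels grows with the array, which is the sense in which larger FPGAs have more "aggregate" interconnect.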

Page 4

Logic Utilization vs. Channel Width

• Trade-off logic utilization for channel width
  – User can always buy more logic…. (not more wires)

[Figure: Routed Channel Width (y-axis, 10 to 90) vs. CLB Count (x-axis, 0 to 700) for the benchmark circuits alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584, seq, spla, tseng; circuits labelled a second time in the plot: pdc, ex1010, frisc, spla, apex4, elliptic]

[Figure: FPGA 1 and FPGA 2, two meshes of logic blocks (L), FPGA 2 larger than FPGA 1. Trade-off: CLB count for channel width]

But….. can we achieve lower Total Area? ( = SIZE * CLB Count)

( No! but we can break even! )
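To make the break-even idea concrete, here is a minimal sketch of Total Area = SIZE * CLB Count. The linear tile-size model and its coefficients (`logic_area`, `area_per_track`) are assumptions made for illustration, not the area model used in the talk; the channel widths 95 and 75 are the ones quoted in the DHPack results, and the starting count of 1000 CLBs is made up.

```python
def tile_size(channel_width: int, logic_area: float = 1.0, area_per_track: float = 0.02) -> float:
    """Hypothetical tile area: fixed logic area plus routing area that grows with channel width."""
    return logic_area + area_per_track * channel_width


def total_area(size: float, clb_count: float) -> float:
    """Total Area = SIZE * CLB Count, as on the slide."""
    return size * clb_count


def break_even_clb_count(clbs_before: float, width_before: int, width_after: int) -> float:
    """CLB count at which the narrower-channel FPGA matches the original total area."""
    return total_area(tile_size(width_before), clbs_before) / tile_size(width_after)


# Shrinking the channel from 95 to 75 tracks makes each tile smaller, so more CLBs
# fit in the same total area before the design costs more than it did originally.
budget = break_even_clb_count(clbs_before=1000, width_before=95, width_after=75)
print(f"Break-even CLB count at channel width 75: {budget:.0f}")
```

Under this assumed model, any depopulated solution that uses fewer CLBs than `budget` breaks even or better on total area.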

Page 5

Logic Element: BLE and CLB

• Basic Logic Element (BLE)
  – ‘k’-input LUT + FF

• Configurable Logic Block (CLB)
  – ‘N’ BLEs, ‘N’ outputs
  – ‘I’ shared inputs

Note: I < k*N

[Figure: a CLB with ‘I’ inputs and ‘N’ outputs containing BLE #1 to BLE #5, shown alongside a 4 x 4 mesh of logic blocks (L)]

Page 6

CLB Depopulation

• General Approach
  – Use existing clustering tools
  – Do not fill CLB while clustering

1. Input-Limited
  • E.g. maximum 67% input utilization per CLB
  • Might use all BLEs

2. BLE-Limited
  • E.g. maximum 60% BLE utilization per CLB
  • Might use all inputs

[Figure: a CLB with ‘I’ inputs and ‘N’ outputs containing BLE #1 to BLE #5]
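The two depopulation styles amount to capping a different CLB resource during clustering. The sketch below expresses that as per-CLB limits; the class and function names are hypothetical, rounding down is an assumption, and the example uses the cluster size of 16 and 51 inputs quoted on the results slide.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClbLimits:
    max_bles: int    # BLEs the clusterer may place in one CLB
    max_inputs: int  # distinct CLB inputs the clusterer may use


def input_limited(n_bles: int, n_inputs: int, fraction: float) -> ClbLimits:
    """Cap input utilization only; all BLEs may still be used."""
    return ClbLimits(n_bles, math.floor(fraction * n_inputs))


def ble_limited(n_bles: int, n_inputs: int, fraction: float) -> ClbLimits:
    """Cap BLE utilization only; all inputs may still be used."""
    return ClbLimits(math.floor(fraction * n_bles), n_inputs)


def fits(limits: ClbLimits, bles_used: int, inputs_used: int) -> bool:
    """Check whether a candidate cluster respects the depopulation limits."""
    return bles_used <= limits.max_bles and inputs_used <= limits.max_inputs


print(input_limited(16, 51, 0.67))  # 67% input utilization
print(ble_limited(16, 51, 0.60))    # 60% BLE utilization
```

An existing clustering tool would call something like `fits` before adding a BLE to a cluster, which is how "do not fill CLB while clustering" can be enforced without changing the clustering algorithm itself.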

Page 7

Reducing Channel Width Results
(max cluster size 16, max num inputs 51)

• Input-Limited
  • No channel width control

• BLE-Limited
  • (almost) monotonically increasing ==> good channel width control

[Figure: Routed Channel Width (y-axis, 30 to 90) vs. Cluster Size (BLE-Limit) (x-axis, 1 to 17) and Number of Inputs (Input-Limit) (x-axis, 6 to 54); series: Input-limited clma, BLE-Limited clma]

Page 8

Meta Benchmark Circuit Creation

• Mimic process of creating large designs
  – “IP Blocks” <==> MCNC Circuits
  – SoC <==> Randomly integrate/stitch together “IP Blocks”
  – IP Blocks have varied interconnect needs

• Considered 3 stitching schemes…
  – Independent
    • IP Blocks are not connected to each other
  – Pipeline
    • Outputs of one IP block connected to inputs of next IP block
  – Clique
    • Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
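The three stitching schemes can be sketched at block level as follows. This is an illustration under stated assumptions, not the tool used in the talk: each connection is recorded only as (source block, destination block), the random shuffle stands in for "randomly integrate", round-robin is one way to get a uniform distribution, and the output counts in the example are made up.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source IP block, destination IP block) for one output


def stitch(outputs_per_block: Dict[str, int], scheme: str, seed: int = 0) -> List[Edge]:
    """Return block-level connections for one of the three stitching schemes."""
    names = list(outputs_per_block)
    random.Random(seed).shuffle(names)  # random integration order (assumed)
    edges: List[Edge] = []
    if scheme == "independent":
        return edges  # IP blocks are not connected to each other
    if scheme == "pipeline":
        # Every output of one block feeds the next block in the chain.
        for src, dst in zip(names, names[1:]):
            edges.extend((src, dst) for _ in range(outputs_per_block[src]))
        return edges
    if scheme == "clique":
        # Outputs of each block are spread evenly over all other blocks.
        for src in names:
            others = [n for n in names if n != src]
            for i in range(outputs_per_block[src]):
                if others:
                    edges.append((src, others[i % len(others)]))
        return edges
    raise ValueError(f"unknown stitching scheme: {scheme}")


blocks = {"pdc": 4, "clma": 6, "ex1010": 2}  # output counts are made up
for scheme in ("independent", "pipeline", "clique"):
    print(scheme, len(stitch(blocks, scheme)))
```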

Page 9

DHPack: Meta Circuit P&R

• Use VPR FPGA tools from University of Toronto

• Observation 1
  – VPR placer successfully groups IP blocks from random initial placement

• Observation 2
  – VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

Page 10

DHPack: Meta Circuit P&R Results

• Clique MetaCircuit
  – P&R channel width results closely match “constraints”

• Shrink Channel Width
  – by ~20% (from 95 to 75): NO AREA INCREASE
  – by ~50% (from 95 to 50): 1.7x area increase

[Figure: Normalized Area (y-axis, 0.8 to 2) vs. Channel Width Constraint (x-axis, 40 to 100)]

[Figure: Channel Width (y-axis, 40 to 100) vs. Channel Width Constraint (x-axis, 40 to 100); series: Constraint, Routed]

Page 11

Meta Circuits vs. Stdev Circuits

• Meta Circuit Drawbacks
  – Design hierarchy boundaries not well-defined
  – Coarse-grained IP block boundary
  – Stitching unrealistic
    • Flip Flop placed at every output
    • Connections only have FO1 (fanout of 1)

• Stdev Circuits (created using GNL)
  – Synthetic clone of Meta circuits
  – Hierarchically specify Rent parameter of each partition
    • Root: # I/Os, # IP blocks
    • Second Level: 20 IP blocks, # LEs, Rent parameter

Page 12

Stdev Circuits: Rent Parameters

• 7 benchmark circuits
• 240/120 primary inputs/outputs, approx 52,000 CLBs
• Rent parameter: Average 0.62, vary Stdev 0.0 to 0.12

[Figure: Rent Parameter (y-axis, 0.35 to 0.80) for each of the MCNC Circuit IP Blocks (x-axis: bigkey, s38584.1, elliptic, diffeq, s298, alu4, misex3, pdc, ex5p, ex1010); series: Stdev000, Stdev002, Stdev004, Stdev006, Stdev008 / meta clone, Stdev010, Stdev012]

Page 13

Un/DoPack Flow

• Iterative non-uniform cluster depopulation tool

• Step 1: Traditional SIS/VPR
• Step 2: UnPack:
  – Congestion Calculator
• Step 3: DoPack:
  – Incremental Re-Cluster
• Step 4,5: Fast Place/Route

[Flowchart boxes: inputs (Circuit Description, Architecture Description, Channel Width Constraint, Array Size Constraint); Synthesize and Technology Map (SIS/Flowmap); Cluster (iRAC Replica); Placement (VPR); Routing (VPR); Channel Width Constraint Met? (Yes: Success!); Congestion Calculator (UnPack); Incremental Cluster (DoPack); Fast Placement (Incremental or VPR); Fast Routing (VPR); Channel Width Constraint Met? (Yes: Success!); Array Size Limits Reached? (Yes: Failure)]
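The iterative flow can be sketched as a control loop. This is a sketch under assumptions, not the author's implementation: the five callables are hypothetical stand-ins for the real tools (SIS/Flowmap, the iRAC replica, VPR, UnPack, DoPack), and the exact point at which the array-size check happens is a guess based on the flowchart boxes.

```python
from typing import Any, Callable, Optional, Tuple


def un_do_pack(
    initial_flow: Callable[[], Tuple[Any, int]],
    unpack: Callable[[Any], Any],
    dopack: Callable[[Any, Any], Any],
    fast_place_route: Callable[[Any], Tuple[Any, int]],
    array_limit_reached: Callable[[Any], bool],
    channel_width_constraint: int,
) -> Optional[Any]:
    """Return a design that meets the channel width constraint, or None on failure."""
    design, routed_width = initial_flow()                # Step 1: traditional SIS/VPR
    while routed_width > channel_width_constraint:
        if array_limit_reached(design):
            return None                                  # "Failure" box
        congestion = unpack(design)                      # Step 2: congestion calculator
        design = dopack(design, congestion)              # Step 3: incremental re-cluster
        design, routed_width = fast_place_route(design)  # Steps 4, 5: fast place/route
    return design                                        # "Success!" box


# Toy stand-ins: the "design" is just an array size, and each pass grows it by one,
# lowering the routed channel width by 5 tracks.
result = un_do_pack(
    initial_flow=lambda: (10, 120),
    unpack=lambda size: None,
    dopack=lambda size, congestion: size + 1,
    fast_place_route=lambda size: (size, 120 - 5 * (size - 10)),
    array_limit_reached=lambda size: size >= 20,
    channel_width_constraint=100,
)
print(result)
```

Passing the tools in as callables keeps the loop independent of any particular placer or router, which matches the slide's choice between an incremental placer and VPR for the fast placement step.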

Page 14

Un/DoPack Flow: SIS/VPR

• Step 1: Traditional SIS/VPR

[Flowchart boxes shown: Circuit Description, Architecture Description, Channel Width Constraint, Array Size Constraint]

Page 15

Un/DoPack Flow: SIS/VPR

• Step 1: Traditional SIS/VPR

[Flowchart boxes shown: Synthesize and Technology Map (SIS/Flowmap), Cluster (iRAC Replica), Placement (VPR), Routing (VPR)]

Page 16

Un/DoPack Flow: SIS/VPR

• Step 1: Traditional SIS/VPR

[Flowchart boxes shown: Channel Width Constraint Met? (Yes: Success!; No)]

Page 17

Un/DoPack Flow: UnPack

• Step 2: UnPack
  – Generate Congestion Map
  – CLB Label = Largest channel width (CW) occupancy in 4 adjacent channels
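The labelling rule can be sketched as below. The data layout is an assumption made for illustration: for an M x M array, `h_occ` holds M+1 rows of M horizontal channel segments and `v_occ` holds M rows of M+1 vertical channel segments, so each CLB touches exactly four segments. The occupancy values are made up.

```python
from typing import List


def congestion_map(h_occ: List[List[int]], v_occ: List[List[int]]) -> List[List[int]]:
    """Label each CLB with the largest occupancy among its 4 adjacent channels.

    Assumed layout for an M x M array: h_occ[r][c] and h_occ[r + 1][c] are the
    horizontal segments on either side of CLB (r, c); v_occ[r][c] and
    v_occ[r][c + 1] are the vertical segments on either side of it.
    """
    m = len(v_occ)
    return [
        [
            max(h_occ[r][c], h_occ[r + 1][c], v_occ[r][c], v_occ[r][c + 1])
            for c in range(m)
        ]
        for r in range(m)
    ]


# A 2 x 2 array with made-up channel occupancies.
h_occ = [[1, 2], [3, 4], [5, 6]]
v_occ = [[7, 0, 1], [2, 9, 3]]
labels = congestion_map(h_occ, v_occ)
print(labels)
```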

Page 18

Un/DoPack Flow: UnPack

• Step 2: UnPack:
  – Depop Center = Largest CLB label

[Figure: M X M Array]

Page 19

Un/DoPack Flow: UnPack

• Step 2: UnPack:
  – Depop Radius = M/4
  – Depop Amt: 1 new row/col in array

[Figure: M X M Array]
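A rough sketch of how the depopulation region might be chosen from the CLB labels. Several details are assumptions not stated on the slides: ties for the largest label go to the first CLB in row-major order, the radius is measured as Manhattan distance, and "1 new row/col" is read as freeing enough logic to fill one extra row and one extra column.

```python
from typing import List, Tuple


def depop_center(labels: List[List[int]]) -> Tuple[int, int]:
    """CLB with the largest label (first one in row-major order on ties)."""
    m = len(labels)
    cells = [(r, c) for r in range(m) for c in range(m)]
    return max(cells, key=lambda rc: labels[rc[0]][rc[1]])


def depop_region(labels: List[List[int]]) -> List[Tuple[int, int]]:
    """CLBs within radius M/4 of the depopulation center (Manhattan distance assumed)."""
    m = len(labels)
    cr, cc = depop_center(labels)
    radius = m / 4
    return [
        (r, c)
        for r in range(m)
        for c in range(m)
        if abs(r - cr) + abs(c - cc) <= radius
    ]


def clbs_in_new_row_and_column(m: int) -> int:
    """Extra CLBs gained when an M x M array grows to (M+1) x (M+1)."""
    return (m + 1) ** 2 - m ** 2


# An 8 x 8 array with a single made-up congestion peak.
labels = [[0] * 8 for _ in range(8)]
labels[3][4] = 10
print(depop_center(labels), len(depop_region(labels)), clbs_in_new_row_and_column(8))
```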

Page 20

Un/DoPack Flow: DoPack

• Step 3: DoPack:
  – Incremental Re-Cluster

[Flowchart boxes shown: Incremental Cluster (DoPack), entered on the “No” branch]

Page 21

Un/DoPack Flow: Fast P&R

• Step 4,5: Fast Place/Route

• Fast Placement
  – UBC Incremental Placer (under development)
  – VPR “-fast” option

• Router
  – Use full routed solution
    • Slow but reliable

Page 22

Before: 120 / 79 / 27 (Peak / Avg / Stddev)

After: 100 / 79 / 20 (Peak / Avg / Stddev)
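A Peak / Avg / Stddev triple like the ones above can be computed from a congestion map as sketched here. This assumes the numbers summarize per-CLB congestion labels and that the standard deviation is the population form; neither is stated on the slide, and the sample values are made up.

```python
import statistics
from typing import List, Tuple


def congestion_summary(labels: List[List[int]]) -> Tuple[int, float, float]:
    """Return (peak, average, population standard deviation) over all CLB labels."""
    flat = [value for row in labels for value in row]
    return max(flat), statistics.fmean(flat), statistics.pstdev(flat)


peak, avg, stddev = congestion_summary([[2, 4], [4, 6]])
print(f"{peak} / {avg:.0f} / {stddev:.0f}  (Peak / Avg / Stddev)")
```

Read this way, the slide's before and after rows show the peak and the spread dropping while the average stays the same.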

Page 23

Normalized Area of GNL Benchmarks

[Figure: Normalized Area (y-axis, 0.90 to 2.00) vs. % of Maximum Channel Width (x-axis, 0.5 to 1.05); series: stdev000, stdev002, stdev004, stdev006, stdev008/clone, stdev010, stdev012]

Page 24

Absolute Area of GNL Benchmarks

[Figure: Normalized Area (y-axis, 0.90 to 2.00) vs. Absolute Channel Width (x-axis, 60 to 170); series: stdev000, stdev002, stdev004, stdev006, stdev008/clone, stdev010, stdev012]

Page 25

Interconnect Variation: Impact on FPGA Architecture Design

[Figure: Minimum Routed Channel Width (y-axis, 70 to 140); series: Baseline, 10% Area Increase, 20% Area Increase, 25% Area Increase]

High Variation Circuits Require Wide Channel Width

Page 26

Contributions

• Two new FPGA benchmark circuit “suites”
  – Meta Circuit: mimic “System-on-Chip” design by randomly “stitching” real designs
  – Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand

• Two new FPGA CAD flows
  – DHPack: Design Hierarchy Packing
    • Identify congested IP blocks ==> depopulate ==> reduced interconnect demand
    • Conference paper: “Logic Block Clustering…”, published at DAC 2005
  – Un/DoPack: UnPack and DoPack
    • Find “local” interconnect congestion ==> depopulate ==> reduced interconnect demand
    • Conference paper, submitted to DAC 2006

• Discoveries…
  – “Non-uniform” depopulation limits area inflation
  – “BLE limiting” gives better interconnect controllability than “Input limiting”
  – “Interconnect variation” important for area inflation and FPGA architecture design
  – “Routing closure” achieved by re-clustering and incremental place & route

• UNROUTABLE circuits made ROUTABLE ==> buy an FPGA with MORE LOGIC!!!

Page 27

End of Talk