University of British Columbia, Dept. of Electrical and Computer Engineering

November 30, 2007

A Combined Clustering and Placement Algorithm for FPGAs

Mark Yamashita

Contributions

• New algorithm to do clustering and placement

• Novel approach that trades off depth to control duplication

• Timing model/placement incorporated into clustering

• Delay improves by an average of 11%

• Controllable trade-off between area overhead and delay improvements

• Plan to submit to FPL '08

Motivation

• FPGAs need to be faster
  • 4x slower than ASICs

• Limitations of existing clustering approaches:
  • No depth control during clustering, often greedy
  • Provide no means for duplication, or
  • Use duplication in excess
  • Inaccurate timing models

Motivation

• Goal:
  • Improve critical-path delay by improving clustering

• Approach:
  • Use placement information to form accurate timing model
  • Make better clustering decisions
  • Use duplication to reduce depth
  • Take advantage of otherwise unused logic in FPGA
  • Control amount of duplication by relaxing depth
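
To make the idea of a placement-aware timing model concrete, here is a minimal Python sketch. It is not the model used in this work: the function name, the delay constants (t_logic, t_local, t_wire), and the Manhattan-distance wire model are all assumptions chosen for illustration. It computes the critical-path delay of a netlist as a longest-path search in which connections inside a cluster are cheap and connections between clusters cost more the farther apart the clusters are placed.

```python
def critical_path_delay(fanin, cluster_of, position,
                        t_logic=1.0, t_local=0.1, t_wire=0.5):
    """Longest arrival time over a combinational DAG.

    fanin      : dict node -> list of driver nodes
    cluster_of : dict node -> cluster id
    position   : dict cluster id -> (x, y) grid location
    All delay constants are hypothetical, in arbitrary units.
    """
    arrival = {}

    def arrive(v):
        if v in arrival:
            return arrival[v]
        worst = 0.0
        for u in fanin.get(v, []):
            if cluster_of[u] == cluster_of[v]:
                hop = t_local  # stays inside one cluster
            else:
                (x1, y1) = position[cluster_of[u]]
                (x2, y2) = position[cluster_of[v]]
                # assumed model: delay grows with Manhattan distance
                hop = t_wire * (abs(x1 - x2) + abs(y1 - y2))
            worst = max(worst, arrive(u) + hop)
        arrival[v] = worst + t_logic
        return arrival[v]

    nodes = set(fanin) | {u for drivers in fanin.values() for u in drivers}
    return max(arrive(v) for v in nodes)


# Toy netlist: c = f(a, b), d = f(c); d sits in a different cluster.
fanin = {"c": ["a", "b"], "d": ["c"]}
cluster_of = {"a": "A", "b": "A", "c": "A", "d": "B"}
far = critical_path_delay(fanin, cluster_of, {"A": (0, 0), "B": (2, 1)})
near = critical_path_delay(fanin, cluster_of, {"A": (0, 0), "B": (1, 0)})
```

In this toy case moving cluster B next to cluster A shortens the critical path, which is the kind of information a clustering step cannot see without placement.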

Algorithm Overview

[Figure: algorithm overview flow diagram; surviving label: T-VP]

Phase 1: Microcluster Formation

Phase 1: Example

Phase 1: Lawler Levitt Turner Algorithm
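
The slide names the algorithm without restating it. As a reminder of the general technique, the sketch below shows the classic Lawler-Levitt-Turner labeling and cluster-formation steps in Python. It is a textbook-style illustration, not the implementation used in this work, and it assumes unit-size nodes, zero delay inside a cluster, and unit delay between clusters. The function names and the toy netlist are hypothetical.

```python
def lawler_labels(fanin, capacity):
    """Label each node with the fewest cluster boundaries on its worst input path.

    fanin    : dict node -> list of predecessor nodes (a DAG)
    capacity : maximum number of nodes per cluster (unit-size nodes assumed)
    """
    order, seen = [], set()

    def visit(v):  # depth-first topological sort
        if v in seen:
            return
        seen.add(v)
        for u in fanin.get(v, []):
            visit(u)
        order.append(v)

    for v in fanin:
        visit(v)

    labels, ancestors = {}, {}
    for v in order:
        preds = fanin.get(v, [])
        anc = set()
        for u in preds:
            anc.add(u)
            anc |= ancestors[u]
        ancestors[v] = anc
        if not preds:
            labels[v] = 0  # source nodes start at label 0 (a convention)
            continue
        k = max(labels[u] for u in preds)
        same = sum(1 for a in anc if labels[a] == k)
        # v keeps label k only if it fits with every label-k ancestor
        labels[v] = k if same + 1 <= capacity else k + 1
    return labels, ancestors


def lawler_clusters(fanin, capacity):
    """Form clusters from the outputs back; shared nodes get duplicated."""
    labels, ancestors = lawler_labels(fanin, capacity)
    driven = {u for preds in fanin.values() for u in preds}
    work = sorted(set(labels) - driven)  # nodes with no fanout are roots
    clusters = {}
    while work:
        r = work.pop()
        if r in clusters:
            continue
        members = {r} | {a for a in ancestors[r] if labels[a] == labels[r]}
        clusters[r] = members
        for m in members:
            for u in fanin.get(m, []):
                if u not in members:
                    work.append(u)  # a cluster input roots its own cluster
    return labels, clusters


# Toy netlist: c = f(a, b), d = f(c), e = f(c); two nodes per cluster.
labels, clusters = lawler_clusters({"c": ["a", "b"], "d": ["c"], "e": ["c"]}, 2)
```

In the toy netlist, node c ends up in both the cluster rooted at d and the cluster rooted at e. This is the duplication that the later slides set out to reduce.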

Phase 1

Phase 1: Node Duplication Reduction

Phase 1: Block Usage Results

[Figure: bar chart of Total Blocks (0 to 20,000) per MCNC Circuit (tseng, ex5p, apex4, dsip, misex3, diffeq, alu4, des, bigkey, seq, apex2, s298, frisc, elliptic, spla, pdc, ex1010, s38417, s38584.1, clma). Series: TVPack, Lawler, Reduced.]

Phase 1: Additional Duplication Reduction Through Depth Relaxation

[Figure: Tcrit [ns] (11.5 to 13.1) and CLB Count (200 to 500) versus Clustering Method (Lawler's, Single Pass, 70%, 50%, 30%, 20%, 10%, 5%, TVPack). Series: Tcrit [ns], CLBs.]

Algorithm Overview

[Figure: algorithm overview flow diagram; surviving label: T-VP]

Phase 2: Microcluster Compaction with Orchestrator

• Iteratively move microclusters to improve timing

• Can fit multiple microclusters to the same CLB position, provided the aggregate of all microclusters meets CLB constraints

• If an area constraint is given, remove duplication and fragmentation until constraint is met
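
The check that several microclusters can share one CLB position can be sketched as follows. This is a minimal Python illustration rather than the Orchestrator's actual test. It assumes the CLB constraints are a limit on the number of LUTs and a limit on distinct external inputs, that a LUT duplicated in two microclusters needs only one copy once both sit in the same CLB, and that the limits are 10 and 22. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microcluster:
    name: str
    luts: frozenset     # LUTs in this microcluster (duplicates share a name)
    inputs: frozenset   # nets read by those LUTs
    outputs: frozenset  # nets driven by those LUTs


def fits_in_clb(microclusters, max_luts=10, max_inputs=22):
    """True if the aggregate of the microclusters meets the assumed CLB limits."""
    luts = set().union(*(m.luts for m in microclusters))
    driven = set().union(*(m.outputs for m in microclusters))
    read = set().union(*(m.inputs for m in microclusters))
    external = read - driven  # nets that must enter through CLB input pins
    return len(luts) <= max_luts and len(external) <= max_inputs


# Two microclusters that both contain a copy of LUT "c".
m1 = Microcluster("m1", frozenset({"c", "d"}),
                  frozenset({"a", "b", "c"}), frozenset({"c", "d"}))
m2 = Microcluster("m2", frozenset({"c", "e"}),
                  frozenset({"a", "b", "c"}), frozenset({"c", "e"}))
shared_ok = fits_in_clb([m1, m2], max_luts=3, max_inputs=2)
too_small = fits_in_clb([m1, m2], max_luts=2, max_inputs=2)
```

Taking the union of LUT names means the shared copy of "c" is counted once, so co-locating the two microclusters uses three LUTs rather than four.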

Phase 2: Orchestrator Example

Results: Timing

[Figure: bar chart of Tcrit [ns] (0 to 25) per MCNC Benchmark. Series: T-VPack, Orchestrator.]

Results: Area

[Figure: two charts per MCNC Benchmark: CLBs used (0 to 1400) and Tcrit [ns] (0 to 25). Series in each: T-VPack, Orchestrator.]

Results: Timing vs. Area

[Figure: Tcrit [ns] (11.5 to 14) and CLBs (200 to 400) versus Clustering (Unlimited, Min +3, Min +2, Min +1, Minimum, TVPack). Series: Tcrit [ns], CLBs.]

Results: Timing vs. Depth

[Figure: Timing Improvement (-5% to 25%) versus Depth Improvement (0% to 60%).]

Conclusions

• Reducing depth contributes to a reduction in critical path delay

• Node duplication, when used effectively, reduces critical path delay

• Duplication can be used to give the designer a performance-area trade-off

Future Work

• Promising post-placement optimizations:
  • Retiming
    • Leverage a more significant depth reduction
  • Logic reintroduction
    • Create duplication to increase performance

Contributions

• New algorithm to do clustering and placement

• Novel approach that trades off depth to control duplication

• Timing model/placement incorporated into clustering

• Delay improves by an average of 11%

• Controllable trade-off between area overhead and delay improvements

• Plan to submit to FPL '08

Thank You