Soha Hassoun Tufts University Medford, MA Thanks to: Carl Ebeling University of Washington


Page 1

Fine Grain Incremental Rescheduling Via Architectural Retiming

Soha Hassoun
Tufts University
Medford, MA

Thanks to: Carl Ebeling
University of Washington
Seattle, WA

Page 2

Example

Problem -- Clock period is too large

[Figure labels: RAM, Write Address, Read Address, Offset]

Page 3

Pipelining

Problems with consecutive dependent operations

[Figure labels: RAM, Write Address, Read Address, Offset]

Page 4

Performance Bottleneck

Latency constrained paths

Latency = n

Page 5

Performance Bottleneck

Latency constrained paths

Latency = n

Approach: apply architectural retiming at the RT level

Page 6

Architectural Retiming

Problem: too much work, too little time

[Figure labels: $y_k$]

Page 7

Architectural Retiming

Problem: too much work, too little time

[Figure labels: pipeline register, D, $y_k$]

Page 8

Architectural Retiming

Problem: too much work, too little time

[Figure labels: negative register N, pipeline register, D, C, $y_k$]

Page 9

Architectural Retiming

Problem: too much work, too little time

[Figure labels: negative register N, pipeline register, D, C, $y_k$]

precomputation, prediction

Page 10

Outline

• Precomputation: incremental rescheduling without resource constraints
• Prediction: incremental rescheduling with resource constraints
• Results

Page 11

Precomputation Function

$D^{t} = C^{t+1}$

[Figure labels: f, g, h, $y_k$, $x_i$, C, D, N]

Page 12

Precomputation Function

$D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$

[Figure labels: f, g, h, $y_k$, $x_i$, C, D, N]

Page 13

Precomputation Function

$D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$

$x_i^{t+1} = x_i^{\prime\, t} = g(\ldots, y_k^{t}, \ldots)$

[Figure labels: f, g, h, $y_k$, $x_i$, C, D, N]

Page 14

Precomputation Function

$D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$

$x_i^{t+1} = x_i^{\prime\, t} = g(\ldots, y_k^{t}, \ldots)$

$D^{t} = f(\ldots, g(\ldots, y_k^{t}, \ldots), \ldots) = f'(\ldots, y_k^{t}, \ldots)$

[Figure labels: f', f, g, h, $y_k$, $x_i$, C, D, N]
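The substitution on this slide can be checked with a small simulation. The sketch below is only an illustration: the bodies of `f` and `g`, the 8-bit width, and the input trace are hypothetical stand-ins, since the slides do not give concrete logic. It models a register $x_i$ whose next value is $g(y_k)$, a signal $C = f(x_i)$, and the precomputed signal $D = f'(y_k)$ with $f' = f \circ g$.

```python
from typing import List, Tuple


def g(y: int) -> int:
    """Hypothetical logic that computes the next value of register x_i."""
    return (y + 3) & 0xFF


def f(x: int) -> int:
    """Hypothetical logic that computes C from the register output x_i."""
    return (2 * x) & 0xFF


def f_prime(y: int) -> int:
    """Precomputation function f' = f o g, evaluated one cycle early."""
    return f(g(y))


def simulate(ys: List[int], x0: int = 0) -> Tuple[List[int], List[int]]:
    """Return the traces of C and D for the input trace ys."""
    x = x0
    c_trace, d_trace = [], []
    for y in ys:
        c_trace.append(f(x))        # C^t = f(x_i^t)
        d_trace.append(f_prime(y))  # D^t = f'(y_k^t)
        x = g(y)                    # x_i^{t+1} = g(y_k^t)
    return c_trace, d_trace


ys = [5, 9, 250, 17, 42]
c, d = simulate(ys)
# D at cycle t equals C at cycle t+1, which is the negative-register behavior.
assert d[:-1] == c[1:]
```

In a real design the benefit depends on how far synthesis can simplify the composed logic of `f_prime`; the sketch only confirms the timing relationship.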

Page 15

Incremental Rescheduling

[Figure labels: f, g, h, $y_k$, N]

Time n: g
Time n+1: f, h

Page 16

Incremental Rescheduling

[Figure labels: f', f, g, h, $y_k$, N]

Before:
Time n: g
Time n+1: f, h

After:
Time n: f'
Time n+1: h
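To see why moving work from time n+1 to time n can shorten the clock period, here is a minimal sketch. The delay numbers are hypothetical, and it makes two simplifying assumptions: all operations scheduled in the same time step are chained one after another, and f' costs as much as g followed by f (the worst case, with no logic simplification).

```python
from collections import defaultdict

# Hypothetical combinational delays in arbitrary units (not from the slides).
DELAY = {"g": 2, "f": 3, "h": 4}
# Worst-case assumption: f' is simply g followed by f, with no simplification.
DELAY["f'"] = DELAY["g"] + DELAY["f"]


def clock_period(schedule):
    """Largest summed delay of the operations chained within one time step."""
    per_step = defaultdict(int)
    for op, step in schedule.items():
        per_step[step] += DELAY[op]
    return max(per_step.values())


n = 0
before = {"g": n, "f": n + 1, "h": n + 1}  # f and h share time step n+1
after = {"f'": n, "h": n + 1}              # f' produces C one step early

print(clock_period(before))  # 7
print(clock_period(after))   # 5
```

Under these assumptions the longest step drops from 7 to 5 units, because f no longer shares a time step with h.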

Page 17

Precomputing With Register Arrays

[Figure labels: Write Address, Write Data, Read Address, Read Data]

Page 18

Precomputing With Register Arrays

[Figure labels: Write Address, Write Data, Read Address, Read Data, Out, N, F]

Page 19

Precomputing With Register Arrays

$F^{t} = \mathit{Out}^{t+1}$

[Figure labels: Write Address, Write Data, Read Address, Read Data, Out, N, F]

Page 20

Precomputing With Register Arrays

$F^{t} = \mathit{Out}^{t+1} = \mathit{Array}^{t+1}[\mathit{ReadAddress}^{t+1}]$

[Figure labels: Write Address, Write Data, Read Address, Read Data, Out, N, F]

Page 21

Synthesizing Bypass Paths

[Figure labels: Write Address, Precomputed Read Address, Write Data, Read Data, comparator (=?); register array with Write Address, Read Address, Write Data, Read Data]
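A bypass path for a register array can be sketched as follows. This is an illustrative model under stated assumptions, not the circuit from the slide: the array is written every cycle (there is no write enable), a write becomes visible on the next cycle, and the next read address is available one cycle early. The function names and the test data are hypothetical.

```python
def reference_out(writes, read_addrs, size=8):
    """Out^t = Array^t[ReadAddress^t]; a write becomes visible one cycle later."""
    array = [0] * size
    outs = []
    for (wa, wd), ra in zip(writes, read_addrs):
        outs.append(array[ra])
        array[wa] = wd
    return outs


def precomputed_out(writes, read_addrs, size=8):
    """F^t = Out^{t+1}, built from Array^t plus a bypass from the write port."""
    array = [0] * size
    fs = []
    for t in range(len(writes) - 1):
        wa, wd = writes[t]
        next_ra = read_addrs[t + 1]  # precomputed read address
        # Comparator (=?): forward the write data if it targets the address
        # that will be read next; otherwise read the current array contents.
        fs.append(wd if wa == next_ra else array[next_ra])
        array[wa] = wd
    return fs


writes = [(1, 10), (2, 20), (1, 30), (3, 40)]  # (write address, write data)
read_addrs = [0, 1, 1, 1]
out = reference_out(writes, read_addrs)
f_trace = precomputed_out(writes, read_addrs)
assert f_trace == out[1:]
```

The comparison `wa == next_ra` plays the role of the `=?` block: without it, a read that follows a write to the same address would return stale data.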

Page 22

Precomputing RAM Output

[Figure labels: RAM, N]

Page 23

Prediction

[Figure labels: f, g, i, C, D, Z, N]

What if?
• can’t precompute,
• too many additional resources, or
• performance is unsatisfactory

Page 24

Prediction

[Figure labels: f, g, i, C, D, Z, N]

What if?
• can’t precompute,
• too many additional resources, or
• performance is unsatisfactory

Predict C one cycle before its arrival

Page 25

Schedule with Mispredictions

[Figure labels: C, H, R1, R2]

Schedule over cycles t-1, t, t+1:
C: c1, c2
H: h1, h2

Page 26

Schedule with Mispredictions

[Figure labels: C, H, R1, R2, Negative Register, Verify]

Schedule over cycles t-1, t, t+1:
C: c1, c2
H: h1, h2

Page 27

Schedule with Mispredictions

[Figure labels: C, H, R1, R2, Negative Register, Verify]

Schedule over cycles t-1, t, t+1:
C: c1
H:

Page 28

Schedule with Mispredictions

[Figure labels: C, H, R1, R2, Negative Register, Verify]

Schedule over cycles t-1, t, t+1:
C: c1, c2
Negative Register: c1*, c2*
H: h1, h2
Verify: c1* =? c1, c2* =? c2
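The predict-then-verify schedule can be mimicked in software. The sketch below is a simplified model with several assumptions that the slides do not state: a misprediction costs exactly one extra cycle, the predictor repeats the last observed value of C, and `h` is an arbitrary stand-in function. The starred values c1*, c2* on the slide correspond to `guess` here.

```python
def run(c_values, h, predict):
    """Feed H with predictions of C; return (results, cycles used)."""
    results, cycles, history = [], 0, []
    for c in c_values:
        guess = predict(history)  # negative register: guess C one cycle early
        speculative = h(guess)    # H starts from the guess
        cycles += 1
        if guess != c:            # verify: c* =? c
            speculative = h(c)    # nullify the result and redo with the actual C
            cycles += 1           # assumed penalty: one extra cycle
        results.append(speculative)
        history.append(c)
    return results, cycles


def last_value(history):
    """Hypothetical predictor: repeat the last observed value (0 at start)."""
    return history[-1] if history else 0


def h(c):
    """Hypothetical stand-in for block H."""
    return c * 10


c_values = [0, 0, 1, 1, 1, 0]
results, cycles = run(c_values, h, last_value)
print(results)  # [0, 0, 10, 10, 10, 0]
print(cycles)   # 8
```

The results always match what H would compute from the actual values of C; only the cycle count depends on prediction accuracy (here 6 cycles plus 2 mispredictions).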

Page 29

Synthesis Issues in Prediction

• Negative register as predicting FSM
  • use signal transition probabilities
  • incorporate don’t care conditions
• Nullifying mispredictions
  • Two correction strategies
    • As-Soon-As-Possible restoration
    • As-Late-As-Possible correction
• Add handshaking signals to coordinate with interface
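One way signal transition probabilities might drive a predicting FSM is to record, for each observed value, which value most often follows it. The sketch below is a hypothetical illustration of that idea, not the synthesis procedure from the talk; the trace and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def build_predictor(trace):
    """Map each observed value to its most frequent successor in the trace."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    return {cur: nxts.most_common(1)[0][0] for cur, nxts in counts.items()}


def accuracy(table, trace):
    """Fraction of transitions in the trace that the table predicts correctly."""
    pairs = list(zip(trace, trace[1:]))
    hits = sum(1 for cur, nxt in pairs if table.get(cur) == nxt)
    return hits / len(pairs)


# Hypothetical sampled values of the signal being predicted.
trace = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0]
table = build_predictor(trace)
print(table)                   # {0: 0, 1: 1}
print(accuracy(table, trace))  # 8 of 12 transitions are predicted correctly
```

A hardware predictor would encode such a table as FSM next-state logic, and don't-care conditions could be used to simplify entries for values that never occur.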

Page 30

Related Work

• Precomputation
  • Bypass Synthesis
  • lookahead [Kogge ’81, …]
• Prediction / Speculative Execution
  • Most likely path, arbitrarily deep [Holtmann & Ernst ’93, ’95]
  • Pre-execution [Radivojevic & Brewer ’94]
  • Possible multiple paths & arbitrarily deep [Lakshminarayana et al. ’98]
  • Percolation scheduling [Potasman et al. ’90]

Page 31

Results

[Bar chart: Speed up and Area Increase for Seq, QC, GCD-prec, FA1, FA2, MIM, MIM-pred, GCD-pred; vertical axis from 0 to 2.5]

Page 32

Architectural Retiming

• Improves throughput while preserving functionality and sometimes latency
• Bridges the gap between HLS and logic optimizations
• Unifies several sequential optimizations
  • bypass synthesis
  • lookahead transformation
  • branch prediction
  • fine-grain cross register optimizations

Page 33

Ph.D. Forum at DAC ’99

• Goal
  • increase interaction between academia and industry
• Format
  • students present work at poster session at DAC
  • researchers give feedback
• Who’s eligible?
  • Students within 1 or 2 years of finishing Ph.D. thesis

www.cs.washington.edu/homes/soha/forum

Page 34

The End

Page 35

Precomputing in Single-Register Cycles

[Figure: Original Circuit; labels: A, B]

Page 36

Precomputing in Single-Register Cycles

[Figure: Original Circuit; labels: A, B, N]

Page 37

Precomputing in Single-Register Cycles

Lookahead -- A(n) is a function of B(n-2)

[Figure labels: N, A, B, A', B']

[Kogge, ’81], [Parhi & Messerschmitt, ’89]
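The lookahead idea can be illustrated on a first-order linear recurrence, a standard textbook case rather than the A/B circuit on the slide. Assuming a hypothetical recurrence x(n) = a·x(n-1) + u(n), substituting it into itself once gives x(n) = a²·x(n-2) + a·u(n-1) + u(n), so each output depends on the state from two samples earlier. The sketch checks that both forms agree.

```python
def direct(a, u, x0):
    """x(n) = a * x(n-1) + u(n), with x(-1) = x0."""
    xs, x = [], x0
    for un in u:
        x = a * x + un
        xs.append(x)
    return xs


def lookahead(a, u, x0):
    """x(n) = a^2 * x(n-2) + a * u(n-1) + u(n): the loop spans two samples."""
    xs = []
    for n, un in enumerate(u):
        if n == 0:
            xs.append(a * x0 + un)  # no u(n-1) exists yet; use the original form
        else:
            x_nm2 = xs[n - 2] if n >= 2 else x0
            xs.append(a * a * x_nm2 + a * u[n - 1] + un)
    return xs


a, x0 = 3, 1
u = [1, 2, 3, 4, 5]
print(direct(a, u, x0))     # [4, 14, 45, 139, 422]
print(lookahead(a, u, x0))  # [4, 14, 45, 139, 422]
```

In hardware terms, the rewritten recurrence places two registers in the feedback loop instead of one, which gives retiming room to pipeline the loop logic.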

Page 38

Precomputing RAM Output

[Figure labels: RAM]

Page 39

Precomputing RAM Output

[Figure labels: RAM]

Page 40

Speculative Execution

Scope and Depth

[Figure labels: c1, c2, c3, c4, c5, c6]

Page 41

Speculative Execution

Scope and Depth