3 Pipelining and Parallel Processing

34
1 Pipelining and Parallel Processing Shao-Yi Chien

description

pipelining and parallel processing

Transcript of 3 Pipelining and Parallel Processing

Page 1: 3 Pipelining and Parallel Processing

1

Pipelining and Parallel Processing

Shao-Yi Chien

Page 2: 3 Pipelining and Parallel Processing

Shao-Yi Chien 2DSP in VLSI Design

Introduction

Pipelining and parallel are the most important design techniques in VLSI DSP systemsMake use of the inherent parallel property of DSP algorithms

Pipelining: different function units working in parallelParallel: duplicated function units working in parallel

Page 3: 3 Pipelining and Parallel Processing

Shao-Yi Chien 3DSP in VLSI Design

An Example:Car Production Line

Throughput: how many cars produced in one hourLatency: how long will it take to produce one car

mTCost: 1Throughput: 1/mTLatency: mT

T T T

mTmT

mT

mCost: >1Throughput: 1/TLatency: >mT

Cost: >nThroughput: (1/mT)*nLatency: >mT

n

Page 4: 3 Pipelining and Parallel Processing

Shao-Yi Chien 4DSP in VLSI Design

Pipelining of Digital Filters (1/3)FIR filter

Critical path: TM+2TA

Page 5: 3 Pipelining and Parallel Processing

Shao-Yi Chien 5DSP in VLSI Design

Pipelining of Digital Filters (2/3)

Pipelined FIR filter

AMsample TTT +≥AM

sample TTf

+≤

1

Page 6: 3 Pipelining and Parallel Processing

Shao-Yi Chien 6DSP in VLSI Design

Pipelining of Digital Filters (3/3)

Schedule

Page 7: 3 Pipelining and Parallel Processing

Shao-Yi Chien 7DSP in VLSI Design

Pipeline

Can reduce the critical path to increase the working frequency and sample rate

TM+2TA TM+TA

Page 8: 3 Pipelining and Parallel Processing

Shao-Yi Chien 8DSP in VLSI Design

Drawbacks of Pipelining

Increasing latency (in cycle)For M-level pipelined system, the number of delay elements in any path from input to output is (M-1) greater than the origin one

Increase the number of latches (registers)

Page 9: 3 Pipelining and Parallel Processing

Shao-Yi Chien 9DSP in VLSI Design

How to Do Pipelining?

Put pipelining latches across any feed-forward cutset of the graphCutset

A cutset is a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint

Feed-forward cutsetThe data move in the forward direction on all the edges of the cutset

Page 10: 3 Pipelining and Parallel Processing

Shao-Yi Chien 10DSP in VLSI Design

How to Do Pipelining?

Feed-forward cutset

Page 11: 3 Pipelining and Parallel Processing

Shao-Yi Chien 11DSP in VLSI Design

Example

In the SFG in Fig. (a), all the computation time for each node is 1 u.t.(a) Calculate the critical path(b) The critical path is reduced to 2 u.t. by inserting 3 extra delay elements. Is it a valid pipelining?

Page 12: 3 Pipelining and Parallel Processing

Shao-Yi Chien 12DSP in VLSI Design

Answer

(a) The critical path is A3 A5 A4 A6, 4 u.t.(b) No, it is not a valid pipelining. Fig. (c) is the corrected one

Page 13: 3 Pipelining and Parallel Processing

Shao-Yi Chien 13DSP in VLSI Design

Fine-Grain Pipelining

Critical path (TM=10, TA=2)TM+2TA=14TM+TA=12TM1=6 or TM2+TA=6

Page 14: 3 Pipelining and Parallel Processing

Shao-Yi Chien 14DSP in VLSI Design

Notes for Pipelining (1/2)

Pipelining is a very simple design technique which can maintain the input output data configuration and sampling frequencyTclk=TsampleSupported in many EDA toolsStill has some limitations

Pipeline bubblesHas some problems for recursive systemIntroduces large hardware cost for 2-D or 3-D dataCommunication bound

Page 15: 3 Pipelining and Parallel Processing

Shao-Yi Chien 15DSP in VLSI Design

Notes for Pipelining (2/2)

Effective pipeliningPut pipelining registers on the critical pathBalance pipelining

10 (2+8): critical path=810 (5+5): critical path=5

Page 16: 3 Pipelining and Parallel Processing

Shao-Yi Chien 16DSP in VLSI Design

Parallel of Digital Filters (1/5)Single-input single-output (SISO) system

Multiple-input multiple-output (MIMO) system

3-Parallel System!

Page 17: 3 Pipelining and Parallel Processing

Shao-Yi Chien 17DSP in VLSI Design

Parallel of Digital Filters (2/5)

Parallel processing, block processingBlock size (L): the number of data to be processed at the same timeBlock delay (L-slow)

A latch is equivalent to L clock cycles at the sample rate

Page 18: 3 Pipelining and Parallel Processing

Shao-Yi Chien 18DSP in VLSI Design

Parallel of Digital Filters (3/5)

The critical path is the same

Tclk is not equal to Tsample

L-slow

Page 19: 3 Pipelining and Parallel Processing

Shao-Yi Chien 19DSP in VLSI Design

Parallel of Digital Filters (4/5)Whole system

Page 20: 3 Pipelining and Parallel Processing

Shao-Yi Chien 20DSP in VLSI Design

Parallel of Digital Filters (5/5)

Parallel-to-serial converterSerial-to-parallel converter

Page 21: 3 Pipelining and Parallel Processing

Shao-Yi Chien 21DSP in VLSI Design

Parallel Processing v.s. Pipelining

Parallel processing is superior than pipelining processing for the I/O bottleneck (communication bounded)

Pipelining only can increase the clock rateSystem clock rate = sample rate for pipelining systemUse parallel processing can further lower the required working frequency

Page 22: 3 Pipelining and Parallel Processing

Shao-Yi Chien 22DSP in VLSI Design

Pipelining-Parallel Architecture

Page 23: 3 Pipelining and Parallel Processing

Shao-Yi Chien 23DSP in VLSI Design

Notes for Parallel Processing

The input/output data access scheme should be carefully designed, it will cost a lot sometimesTclk>Tsample, fclk<fsample

Large hardware costCombine with pipelining processing

Page 24: 3 Pipelining and Parallel Processing

Shao-Yi Chien 24DSP in VLSI Design

Low Power Issues

Pipelining and parallel processing are also beneficial for low power designPropagation delay

Power consumption

Assume the sampling frequency is the same

Page 25: 3 Pipelining and Parallel Processing

Shao-Yi Chien 25DSP in VLSI Design

Pipelining for Low Power (1/2)

For M-level pipeliningCritical path is reduced to 1/MThe capacitance is also reduced to Ccharge/MThe supply voltage can be reduced to , and the propagation delay remains unchanged

Page 26: 3 Pipelining and Parallel Processing

Shao-Yi Chien 26DSP in VLSI Design

Pipelining for Low Power (2/2)

Power consumption:

How about the parameter ?

Solve this equation to get

Page 27: 3 Pipelining and Parallel Processing

Shao-Yi Chien 27DSP in VLSI Design

ExampleParameters

TM=10 u.t.TA=2 u.t.Tm1=6 u.t.Tm2=4 u.t.CM=5CAVt=0.6VNormal Vcc=5V

(a) New supply voltage?(b) Power saving percentage?

Page 28: 3 Pipelining and Parallel Processing

Shao-Yi Chien 28DSP in VLSI Design

Answer(a)Origin system:Pipelined system:

(b)

Invalid value, less than threshold voltage

Page 29: 3 Pipelining and Parallel Processing

Shao-Yi Chien 29DSP in VLSI Design

Parallel Processing for Low Power (1/2)

For L-parallel systemClock period: Tseq LTseq

Ccharge remains unchangedCtotol LCtotal

Have more time to charge the capacitance, the supply voltage can be lower

Page 30: 3 Pipelining and Parallel Processing

Shao-Yi Chien 30DSP in VLSI Design

Parallel Processing for Low Power (2/2)

Power consumption

To derive the parameter

Page 31: 3 Pipelining and Parallel Processing

Shao-Yi Chien 31DSP in VLSI Design

Example

ParametersTM=8 u.t.TA=1 u.t.Tsample=9 u.t.CM=8CAVt=0.45VNormal Vcc=3.3V

(a) New supply voltage?(b) Power saving percentage?

Page 32: 3 Pipelining and Parallel Processing

Shao-Yi Chien 32DSP in VLSI Design

Answer(a)Origin:2-parallel system:

(b)

Invalid value, less than threshold voltage

Page 33: 3 Pipelining and Parallel Processing

Shao-Yi Chien 33DSP in VLSI Design

Pipelining-Parallel for Low Power

Page 34: 3 Pipelining and Parallel Processing

Shao-Yi Chien 34DSP in VLSI Design

Conclusion

PipeliningReduce the critical pathIncrease the clock speedReduce power consumption at the same speed

ParallelIncrease effective sampling rateReduce power consumption at the same speed