Pipelining and Parallel Processing -...
Transcript of Pipelining and Parallel Processing -...
1
Pipelining and
Parallel Processing
Shao-Yi Chien
DSP in VLSI Design Shao-Yi Chien 2
Introduction
Pipelining and parallel are the most important
design techniques in VLSI DSP systems
Make use of the inherent parallel property of
DSP algorithms
Pipelining: different function units working in parallel
Parallel: duplicated function units working in parallel
DSP in VLSI Design Shao-Yi Chien 3
An Example:
Car Production Line
mT
T T T
mT
mT
mT
m
n
Cost: 1
Throughput: 1/mT
Latency: mT
Throughput: how
many cars produced in
one hour
Latency: how long will
it take to produce one
carCost: >1
Throughput: 1/T
Latency: >mT
Cost: >n
Throughput: (1/mT)*n
Latency: >mT
DSP in VLSI Design Shao-Yi Chien 4
Pipelining of Digital Filters (1/3)
FIR filter
Critical path: TM+2TA
DSP in VLSI Design Shao-Yi Chien 5
Pipelining of Digital Filters (2/3)
Pipelined FIR filter
AMsample TTT AM
sampleTT
f
1
DSP in VLSI Design Shao-Yi Chien 6
Pipelining of Digital Filters (3/3)
Schedule
DSP in VLSI Design Shao-Yi Chien 7
Pipeline
Can reduce the critical path to increase
the working frequency and sample rate
TM+2TATM+TA
DSP in VLSI Design Shao-Yi Chien 8
Drawbacks of Pipelining
Increasing latency (in cycle)
For M-level pipelined system, the number of
delay elements in any path from input to
output is (M-1) greater than the origin one
Increase the number of latches (registers)
DSP in VLSI Design Shao-Yi Chien 9
How to Do Pipelining?
Put pipelining latches across any feed-forward cutset of the graph
Cutset A cutset is a set of edges of a graph such that if these
edges are removed from the graph, the graph becomes disjoint
Feed-forward cutset The data move in the forward direction on all the
edges of the cutset
DSP in VLSI Design Shao-Yi Chien 10
How to Do Pipelining?
Feed-forward cutset
DSP in VLSI Design Shao-Yi Chien 11
Example
In the SFG in Fig. (a), all the computation time for each node is 1 u.t.
(a) Calculate the critical path
(b) The critical path is reduced to 2 u.t. by inserting 3 extra delay elements. Is it a valid pipelining?
DSP in VLSI Design Shao-Yi Chien 12
Fine-Grain Pipelining
Critical path (TM=10, TA=2) TM+2TA=14
TM+TA=12
TM1=6 or TM2+TA=6
DSP in VLSI Design Shao-Yi Chien 13
Notes for Pipelining (1/2)
Pipelining is a very simple design technique which can maintain the input output data configuration and sampling frequency
Tclk=Tsample
Supported in many EDA tools
Still has some limitations Pipeline bubbles
Has some problems for recursive system
Introduces large hardware cost for 2-D or 3-D data
Communication bound
DSP in VLSI Design Shao-Yi Chien 14
Notes for Pipelining (2/2)
Effective pipelining
Put pipelining registers on the critical path
Balance pipelining
10(2+8): critical path=8
10(5+5): critical path=5
DSP in VLSI Design Shao-Yi Chien 15
Parallel of Digital Filters (1/5) Single-input single-output (SISO) system
Multiple-input multiple-output (MIMO) system
3-Parallel System!
DSP in VLSI Design Shao-Yi Chien 16
Parallel of Digital Filters (2/5)
Parallel processing, block processing
Block size (L): the number of data to be
processed at the same time
Block delay (L-slow)
A latch is equivalent to L clock cycles at the
sample rate
DSP in VLSI Design Shao-Yi Chien 17
Parallel of Digital Filters (3/5)
The critical path is the same
Tclk is not equal to Tsample
L-slow
DSP in VLSI Design Shao-Yi Chien 18
Parallel of Digital Filters (4/5)
Whole system
DSP in VLSI Design Shao-Yi Chien 19
Parallel of Digital Filters (5/5)
Serial-to-parallel converter Parallel-to-serial converter
DSP in VLSI Design Shao-Yi Chien 20
Parallel Processing v.s.
Pipelining Parallel processing is superior than pipelining
processing for the I/O bottleneck
(communication bounded)
Pipelining only can increase the clock rate
System clock rate = sample rate for pipelining system
Use parallel processing can further lower the required
working frequency
DSP in VLSI Design Shao-Yi Chien 21
Pipelining-Parallel Architecture
DSP in VLSI Design Shao-Yi Chien 22
Notes for Parallel Processing
The input/output data access scheme
should be carefully designed, it will cost a
lot sometimes
Tclk>Tsample, fclk<fsample
Large hardware cost
Combined with pipelining processing
DSP in VLSI Design Shao-Yi Chien 23
Low Power Issues
Pipelining and parallel processing are also beneficial for low power design
Propagation delay
Power consumption
Assume the sampling frequency is the same
DSP in VLSI Design Shao-Yi Chien 24
Pipelining for Low Power (1/2)
For M-level pipelining
Critical path is reduced to 1/M
The capacitance is also reduced to Ccharge/M
The supply voltage can be reduced to , and the
propagation delay remains unchanged
DSP in VLSI Design Shao-Yi Chien 25
Pipelining for Low Power (2/2)
Power consumption:
How about the parameter ?
Solve this equation to get
DSP in VLSI Design Shao-Yi Chien 26
Example
Parameters TM=10 u.t.
TA=2 u.t.
Tm1=6 u.t.
Tm2=4 u.t.
CM=5CA
Vt=0.6V
Normal Vcc=5V
(a) New supply voltage?
(b) Power saving percentage?
DSP in VLSI Design Shao-Yi Chien 27
Answer(a)
Origin system:
Pipelined system:
(b)
Invalid value, less
than threshold
voltage
DSP in VLSI Design Shao-Yi Chien 28
Parallel Processing for Low
Power (1/2) For L-parallel system
Clock period: TseqLTseq
Ccharge remains unchanged
CtotolLCtotal
Have more time to charge the capacitance, the supply
voltage can be lower
DSP in VLSI Design Shao-Yi Chien 29
Parallel Processing for Low
Power (2/2)
Power consumption
To derive the parameter
DSP in VLSI Design Shao-Yi Chien 30
Example
Parameters TM=8 u.t.
TA=1 u.t.
Tsample=9 u.t.
CM=8CA
Vt=0.45V
Normal Vcc=3.3V
(a) New supply voltage?
(b) Power saving percentage?
DSP in VLSI Design Shao-Yi Chien 31
Answer(a)
Origin:
2-parallel system:
(b)
Invalid value, less
than threshold
voltage
DSP in VLSI Design Shao-Yi Chien 32
Pipelining-Parallel for Low
Power
DSP in VLSI Design Shao-Yi Chien 33
Conclusion
Pipelining
Reduce the critical path
Increase the clock speed
Reduce power consumption at the same speed
Parallel
Increase effective sampling rate
Reduce power consumption at the same speed