Pipelining and Parallel Processing -...

VLSI Digital Signal Processing Systems Pipelining and Parallel Processing Lan-Da Van (范倫達), Ph. D. Department of Computer Science National Chiao Tung University Taiwan, R.O.C. Spring, 2007 [email protected] http://www.cs.nctu.edu.tw/~ldvan/

  • VLSI Digital Signal Processing Systems

    Lan-Da Van VLSI-DSP-3-2

    Outlines

    - Introduction
    - Pipelining of FIR Digital Filter
    - Parallel Processing
    - Pipelining and Parallel Processing for Low Power
    - Conclusions

  • VLSI Digital Signal Processing Systems

    Introduction

    Pipelining
      - Reduces the critical path
      - Increases the clock speed or sample speed
      - Can reduce power consumption at the same speed

    Parallel processing
      - Does not reduce the critical path
      - Does not increase the clock speed, but increases the sample speed
      - Can reduce power consumption at the same speed

  • VLSI Digital Signal Processing Systems

    A 3-tap FIR Filter

    Direct-form structure

    $$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$

    $$T_{sample} \ge T_M + 2T_A$$

    $$f_{sample} \le \frac{1}{T_M + 2T_A}$$
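
    The direct-form relation y(n) = a x(n) + b x(n-1) + c x(n-2) and its sample-period bound can be sketched in software as follows. This is only an illustrative model that assumes a zero initial state in the delay line; the names `fir3_direct` and `min_sample_period` are hypothetical and do not come from the slides.

```python
def fir3_direct(x, a, b, c):
    """Direct-form 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    x1 = x2 = 0  # delay-line contents x(n-1) and x(n-2), assumed zero at start
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn  # shift the delay line by one sample
    return y


def min_sample_period(t_m, t_a):
    """Critical path of the direct form: one multiplier plus two adders."""
    return t_m + 2 * t_a


print(fir3_direct([1, 2, 3, 4], a=1, b=2, c=3))
print(min_sample_period(t_m=10, t_a=2))
```

    The multiplier and adder times passed to `min_sample_period` are example values only; any consistent time unit works.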

  • VLSI Digital Signal Processing Systems

    Pipelining and Parallel Concept

    Pipelining
      - Introduce pipelining latches along the datapath

    Parallel processing
      - Duplicate the hardware

  • VLSI Digital Signal Processing Systems

    Pipelining FIR Filter

    Critical path: $2T_A + T_M \rightarrow T_A + T_M$

  • VLSI Digital Signal Processing Systems

    Pipelining (1/2)

    Drawbacks
      - Increases the number of delay elements (registers/latches) in the critical path
      - Increases latency

    Clock period limitation: the critical path may be between
      - an input and a latch
      - a latch and an output
      - two latches
      - an input and an output

    Pipelining latches can only be placed across a feed-forward cutset of the graph.
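
    The latency drawback can be illustrated with a small behavioral model. The sketch below assumes one particular latch placement for the 3-tap FIR filter (one latch on the first adder's output and one on the delay-line edge feeding the third tap); the figure is not part of this transcript, so that placement and the function names are assumptions.

```python
def fir3_direct(x, a, b, c):
    """Reference: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    x1 = x2 = 0
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return y


def fir3_pipelined(x, a, b, c):
    """Two-stage pipelined model; each stage has one multiplier and one adder."""
    x1 = x2 = 0      # delay line: x(n-1), x(n-2)
    latch_sum = 0    # pipeline latch holding a*x + b*x1 from the previous cycle
    latch_x = 0      # pipeline latch on the delay-line edge feeding tap c
    y = []
    for xn in x:
        # Stage 2 works on values latched during the previous clock cycle.
        y.append(latch_sum + c * latch_x)
        # Stage 1 computes new values and stores them in the latches.
        latch_sum = a * xn + b * x1
        latch_x = x2
        x2, x1 = x1, xn
    return y


x = [1, 2, 3, 4, 5]
print(fir3_direct(x, 1, 2, 3))
print(fir3_pipelined(x, 1, 2, 3))
```

    Under these assumptions the pipelined output is the direct-form output delayed by one clock cycle, which is the added latency the slide refers to.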

  • VLSI Digital Signal Processing Systems

    Pipelining (2/2)

    Cutset: a set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint.

    Feed-forward cutset: a cutset in which the data move in the forward direction on all of its edges.
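
    The two definitions can be checked mechanically on a small graph. The sketch below makes a simplifying assumption that is not in the slides: the cut must split the graph into exactly two parts, and every cut edge must cross from the same part to the other. The function name and the example graph (a node-level model of the 3-tap FIR filter) are hypothetical.

```python
from collections import defaultdict


def is_feed_forward_cutset(nodes, edges, cut):
    """True if removing `cut` splits the graph in two and all cut edges point the same way."""
    cut = set(cut)
    # Undirected adjacency built from the edges that remain after removing the cut.
    adj = defaultdict(set)
    for u, v in edges:
        if (u, v) not in cut:
            adj[u].add(v)
            adj[v].add(u)
    # Label connected components with a depth-first search.
    comp = {}
    for start in nodes:
        if start in comp:
            continue
        comp[start] = start
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp[w] = start
                    stack.append(w)
    if len(set(comp.values())) != 2:
        return False  # the edges do not split the graph into two parts
    # Feed-forward: every cut edge goes from the same part to the other part.
    directions = {(comp[u], comp[v]) for u, v in cut}
    return len(directions) == 1 and all(s != d for s, d in directions)


# Node-level model of the 3-tap FIR: x -> x1 -> x2 is the delay line,
# s1 and s2 are the two adders (s2 drives the output).
nodes = ["x", "x1", "x2", "s1", "s2"]
edges = [("x", "x1"), ("x1", "x2"), ("x", "s1"), ("x1", "s1"), ("x2", "s2"), ("s1", "s2")]

print(is_feed_forward_cutset(nodes, edges, [("x1", "x2"), ("s1", "s2")]))
print(is_feed_forward_cutset(nodes, edges, [("s1", "s2")]))
```

    In this model, cutting only the adder-to-adder edge leaves the graph connected through the delay line, so it is not a cutset at all.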

  • VLSI Digital Signal Processing Systems

    Example 3.2.1

    [Figure: graphs for Example 3.2.1, annotated "4 u.t.", "Error!", and "2 u.t."]

  • VLSI Digital Signal Processing Systems

    Transposition Theorem

    Reversing the direction of all edges in a given SFG and interchanging the input and output ports preserve the functionality of the system.

  • VLSI Digital Signal Processing Systems

    Data-Broadcast Structure

    Critical path is reduced to $T_M + T_A$.
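
    A behavioral comparison can make the equivalence concrete. The sketch below models a data-broadcast (transposed) 3-tap FIR filter, in which the input is broadcast to all three multipliers and partial sums are held in registers, and checks it against the direct form. The function names and the zero initial state are assumptions made for illustration.

```python
def fir3_direct(x, a, b, c):
    """Direct form: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    x1 = x2 = 0
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return y


def fir3_broadcast(x, a, b, c):
    """Data-broadcast form: x(n) feeds every multiplier; registers hold partial sums."""
    s1 = s2 = 0  # s1 holds b*x(n-1) + c*x(n-2); s2 holds c*x(n-1)
    y = []
    for xn in x:
        y.append(a * xn + s1)  # longest path here: one multiplier and one adder
        s1 = b * xn + s2       # uses s2 from the previous cycle
        s2 = c * xn
    return y


x = [3, -1, 4, 1, -5, 9]
print(fir3_direct(x, 2, 3, 5) == fir3_broadcast(x, 2, 3, 5))
```

    Each output in `fir3_broadcast` depends on a single multiply and a single add after the registers, which mirrors the reduced critical path stated on the slide.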

  • VLSI Digital Signal Processing Systems

    Fine-Grain Pipelining

    Let $T_M = 10$ u.t., $T_A = 2$ u.t., and the desired clock period be $(T_M + T_A)/2 = 6$ u.t.
    Break the multiplier into 2 smaller units with processing times of 6 u.t. and 4 u.t.
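
    A quick check of these numbers, assuming the pipelining latch is placed between the two multiplier parts so that the second part shares a stage with the adder (the figure is not reproduced here, so this placement and the symbols $T_{m1}$, $T_{m2}$ are assumptions):

```latex
\begin{aligned}
\text{Stage 1:}\quad & T_{m1} = 6\ \text{u.t.} \\
\text{Stage 2:}\quad & T_{m2} + T_A = 4 + 2 = 6\ \text{u.t.} \\
T_{clk} &= \max(T_{m1},\; T_{m2} + T_A) = 6\ \text{u.t.}
\quad\text{versus}\quad T_M + T_A = 12\ \text{u.t. before the split}
\end{aligned}
```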

  • VLSI Digital Signal Processing Systems

    Parallel Processing

    Parallel processing and pipelining are duals of each other.
      - If a computation can be pipelined, it can also be processed in parallel.
      - Parallelism converts a single-input single-output (SISO) system into a multiple-input multiple-output (MIMO) system.

  • VLSI Digital Signal Processing Systems

    Parallel Processing of 3-Tap FIR Filter (1/2)

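    $$y(n) = a\,x(n) + b\,x(n-1) + c\,x(n-2)$$

    $$y(3k) = a\,x(3k) + b\,x(3k-1) + c\,x(3k-2)$$

    $$y(3k+1) = a\,x(3k+1) + b\,x(3k) + c\,x(3k-1)$$

    $$y(3k+2) = a\,x(3k+2) + b\,x(3k+1) + c\,x(3k)$$

    $$T_{clk} \ge T_M + 2T_A$$

    $$T_{iter} = T_{sample} = \frac{1}{L}\,T_{clk} \ge \frac{1}{3}(T_M + 2T_A)$$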
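
    The 3-parallel (block) formulation, in which y(3k), y(3k+1), and y(3k+2) are produced in the same clock cycle from x(3k), x(3k+1), and x(3k+2), can be sketched as below. The function names, the zero initial state, and the requirement that the input length be a multiple of 3 are assumptions made to keep the example short.

```python
def fir3_direct(x, a, b, c):
    """Sequential reference: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    x1 = x2 = 0
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return y


def fir3_parallel3(x, a, b, c):
    """3-parallel model: clock cycle k consumes x(3k), x(3k+1), x(3k+2) at once."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")
    xm1 = xm2 = 0  # x(3k-1) and x(3k-2), carried over from the previous block
    y = []
    for k in range(len(x) // 3):
        x0, x1, x2 = x[3 * k: 3 * k + 3]
        y0 = a * x0 + b * xm1 + c * xm2  # y(3k)
        y1 = a * x1 + b * x0 + c * xm1   # y(3k+1)
        y2 = a * x2 + b * x1 + c * x0    # y(3k+2)
        y.extend([y0, y1, y2])
        xm1, xm2 = x2, x1
    return y


x = [1, 2, 3, 4, 5, 6]
print(fir3_parallel3(x, 1, 2, 3) == fir3_direct(x, 1, 2, 3))
```

    Each loop iteration stands for one clock cycle, so three samples are produced per cycle while every individual output still needs one multiply and two adds.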

  • VLSI Digital Signal Processing Systems

    Parallel Processing of 3-Tap FIR Filter (2/2)

  • VLSI Digital Signal Processing Systems

    Complete Parallel Processing System

    The critical path remains unchanged, but the iteration period is reduced.

  • VLSI Digital Signal Processing Systems

    S/P and P/S Converter

    [Figure: serial-to-parallel and parallel-to-serial converters, each annotated "Edge Trigger!"]

  • VLSI Digital Signal Processing Systems

    Why Parallel Processing?

    Parallel processing duplicates the hardware, so the cost increases. Why use it? The answer lies in the fact that the fundamental limit to pipelining is the I/O bottleneck, referred to as the communication bound, which is composed of the I/O pad delay and the wire delay.

    [Figure: parallel transmission]

  • VLSI Digital Signal Processing Systems

    Combined Fine-Grain Pipelining and Parallel Processing

    $$T_{iter} = T_{sample} = \frac{1}{LM}\,T_{clk} = \frac{1}{6}(T_M + 2T_A)$$
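
    As a rough numerical illustration, the sketch below evaluates the idealized sample period $(T_M + 2T_A)/(LM)$ for a block size L and pipelining level M. The function name is hypothetical, and the values $T_M = 10$ u.t. and $T_A = 2$ u.t. are borrowed from the fine-grain pipelining slide as an assumption.

```python
def sample_period(t_m, t_a, L=1, M=1):
    """Idealized sample period of the 3-tap direct-form FIR: (T_M + 2*T_A) / (L*M)."""
    return (t_m + 2 * t_a) / (L * M)


cases = [
    ("sequential", 1, 1),
    ("3-parallel", 3, 1),
    ("3-parallel with 2-stage pipelining", 3, 2),
]
for label, L, M in cases:
    print(f"{label}: {sample_period(10, 2, L, M):.2f} u.t.")
```

    The model treats M-level pipelining as dividing the critical path evenly, which is an idealization.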

  • VLSI Digital Signal Processing Systems

    Underlying Low Power Concept

    Propagation delay

    $$T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$

    Power consumption

    $$P = C_{total} V_0^2 f$$

    Sequential filter

    $$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad f = \frac{1}{T_{seq}}, \qquad P_{seq} = C_{total} V_0^2 f$$

  • VLSI Digital Signal Processing Systems

    Pipelining for Low-Power (1/2)

    M-level pipelined system
      - Critical path --> 1/M; capacitance to be charged in a single clock cycle --> 1/M
      - If the clock frequency is maintained, the power supply can be reduced to $\beta V_0$ ($0 < \beta < 1$)

  • VLSI Digital Signal Processing Systems

    Pipelining for Low-Power (2/2)

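    Power consumption

    $$P_{pip} = C_{total}\,\beta^2 V_0^2 f = \beta^2 P_{seq}$$

    Propagation delay

    $$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{pip} = \frac{(C_{charge}/M)\,\beta V_0}{k (\beta V_0 - V_t)^2}$$

    Let $T_{seq} = T_{pip}$ to get

    $$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$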
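
    The condition $M(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$ is a quadratic in the supply scaling factor $\beta$. Expanding it, as a worked step that is not on the slide, gives the standard form and its roots:

```latex
\begin{aligned}
M V_0^2\,\beta^2 - \bigl[\,2 M V_0 V_t + (V_0 - V_t)^2\,\bigr]\beta + M V_t^2 &= 0 \\[4pt]
\beta &= \frac{B \pm \sqrt{B^2 - 4 M^2 V_0^2 V_t^2}}{2 M V_0^2},
\qquad B = 2 M V_0 V_t + (V_0 - V_t)^2 \\[4pt]
\beta_1 \beta_2 &= \left(\frac{V_t}{V_0}\right)^2
\end{aligned}
```

    Because the product of the roots equals $(V_t/V_0)^2$, one root gives $\beta V_0 > V_t$ and the other gives $\beta V_0 < V_t$. Only the larger root keeps the supply above the threshold voltage, so the smaller root is discarded as infeasible.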

  • VLSI Digital Signal Processing Systems

    Example 3.4.1 (1/2)

    Consider an original 3-tap FIR filter and its fine-grain pipelined version shown in the following figures. Assume $T_M = 10$ u.t., $T_A = 2$ u.t., $V_t = 0.6$ V, $V_0 = 5$ V, and $C_M = 5C_A$. In the fine-grain pipelined filter, the multiplier is broken into 2 parts, m1 and m2, with computation times of 6 u.t. and 4 u.t., respectively, and capacitances of 3 times and 2 times that of an adder, respectively.
      (a) What is the supply voltage of the pipelined filter if the clock period remains unchanged?
      (b) What is the power consumption of the pipelined filter as a percentage of the original filter?

  • VLSI Digital Signal Processing Systems

    Example 3.4.1 (2/2)

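    Solution:

    Original: $C_{charge} = C_M + C_A = 6C_A$

    Fine-grain: $C_{charge} = C_{m1} = C_{m2} + C_A = 3C_A$

    $$2 (5\beta - 0.6)^2 = \beta (5 - 0.6)^2 \;\Rightarrow\; \beta = 0.6033 \text{ or } 0.0239 \text{ (infeasible)}$$

    $$V_{pip} = \beta V_0 = 3.0165\ \text{V}$$

    $$\text{Ratio} = \beta^2 = 36.4\%$$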
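
    The numbers in this solution can be reproduced with a short script. The sketch below solves $m(\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$ for the larger root, where $m$ is the ratio of the original charging capacitance to the pipelined one ($6C_A / 3C_A = 2$ here). The function name `solve_beta` is hypothetical.

```python
import math


def solve_beta(m, v0, vt):
    """Larger root of m*(beta*v0 - vt)**2 = beta*(v0 - vt)**2."""
    qa = m * v0 ** 2
    qb = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    qc = m * vt ** 2
    # The smaller root would put the supply below the threshold voltage.
    return (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)


v0, vt = 5.0, 0.6
beta = solve_beta(m=6 / 3, v0=v0, vt=vt)
print(f"beta  = {beta:.4f}")
print(f"V_pip = {beta * v0:.4f} V")
print(f"ratio = {100 * beta ** 2:.1f} %")
```

    Small differences in the last digit of the voltage compared with the slide come from rounding $\beta$ before multiplying by $V_0$.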

  • VLSI Digital Signal Processing Systems

    Comparison

    System                  Sequential FIR    Pipelined FIR              Pipelined FIR
                            (original)        (without reducing V0)      (with reducing V0)
    Power                   Ref               2 Ref                      0.364 Ref
    Clock period (u.t.)     12                6                          12
    Sample period (u.t.)    12                6                          12

    Thinking Again!

  • VLSI Digital Signal Processing Systems

    Parallel Processing for Low-Power

    L-parallel system
      - To maintain the same sample rate, the clock period is increased to $L T_{seq}$.
      - This means that $C_{charge}$ is charged in $L T_{seq}$, and the power supply can be reduced to $\beta V_0$.

  • VLSI Digital Signal Processing Systems

    Parallel Processing for Low-Power

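    Power consumption

    $$P_{par} = (L C_{total}) (\beta V_0)^2 \frac{f}{L} = \beta^2 P_{seq}$$

    Propagation delay

    $$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{par} = L T_{seq} = \frac{C_{charge}\,\beta V_0}{k (\beta V_0 - V_t)^2}$$

    Setting $T_{par} = L T_{seq}$ gives

    $$L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$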

  • VLSI Digital Signal Processing Systems

    Example 3.4.2 (1/2)

    Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-parallel version in Fig. 3.18(b). The two architectures are operated at a sample period of 9 u.t. Assume $T_M = 8$ u.t., $T_A = 1$ u.t., $V_t = 0.45$ V, $V_0 = 3.3$ V, and $C_M = 8C_A$.
      (a) What is the supply voltage of the 2-parallel filter?
      (b) What is the power consumption of the 2-parallel filter as a percentage of the original filter?

    Solution:

    Original: $C_{charge} = C_M + C_A = 9C_A$

    2-parallel: $C_{charge} = C_M + 2C_A = 10C_A$

    $$9 (3.3\beta - 0.45)^2 = 5\beta (3.3 - 0.45)^2 \;\Rightarrow\; \beta = 0.6589 \text{ or } 0.0282 \text{ (infeasible)}$$

    $$V_{par} = \beta V_0 = 2.1743\ \text{V}$$

    $$\text{Ratio} = \beta^2 = 43.41\%$$

  • VLSI Digital Signal Processing Systems

    Example 3.4.2 (2/2)

  • VLSI Digital Signal Processing Systems

    Example 3.4.3 (1/2)

    A more efficient structure than the previous one is depicted in Fig. 3.18(c).
      (a) What is the supply voltage of the efficient 2-parallel filter?
      (b) What is the power consumption of the efficient 2-parallel filter as a percentage of the original filter?

    Solution:

    Original: $C_{charge} = C_M + C_A = 9C_A$

    New 2-parallel: $C_{charge} = C_M + 4C_A = 12C_A$

    $$2 \cdot 9 (3.3\beta - 0.45)^2 = 12\beta (3.3 - 0.45)^2 \;\Rightarrow\; \beta = 0.745 \text{ or } 0.025 \text{ (infeasible)}$$

    $$V_{par} = \beta V_0 \approx 2.4585\ \text{V}$$

    $$\text{Ratio} = \frac{P_{par}}{P_{seq}} = \frac{55 C_A\,\beta^2 V_0^2\,(f_s/2)}{35 C_A\,V_0^2\,f_s} = \frac{55}{2 \cdot 35}\,\beta^2 = 43.6\%$$
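
    Examples 3.4.2 and 3.4.3 differ only in the charging capacitance on the critical path and in the total capacitance, so both can be checked with one helper. In the sketch below the function names are hypothetical, the total capacitances of $35C_A$ (original) and $55C_A$ (efficient 2-parallel) are taken from the ratio in this solution, and $70C_A$ for the plain 2-parallel filter is an assumption based on doubling the hardware.

```python
import math


def solve_beta(m, v0, vt):
    """Larger root of m*(beta*v0 - vt)**2 = beta*(v0 - vt)**2."""
    qa = m * v0 ** 2
    qb = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    qc = m * vt ** 2
    return (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)


def parallel_power_ratio(L, c_charge_seq, c_charge_par, c_total_seq, c_total_par, v0, vt):
    """Return (beta, P_par / P_seq) for an L-parallel filter; capacitances in units of C_A."""
    # L * T_seq = T_par  =>  (L * C_seq / C_par) * (beta*v0 - vt)^2 = beta * (v0 - vt)^2
    beta = solve_beta(L * c_charge_seq / c_charge_par, v0, vt)
    # The parallel filter is clocked at f / L, hence the division by L.
    return beta, (c_total_par / c_total_seq) * beta ** 2 / L


beta_342, ratio_342 = parallel_power_ratio(2, 9, 10, 35, 70, v0=3.3, vt=0.45)
beta_343, ratio_343 = parallel_power_ratio(2, 9, 12, 35, 55, v0=3.3, vt=0.45)
print(f"Example 3.4.2: beta = {beta_342:.4f}, ratio = {100 * ratio_342:.2f} %")
print(f"Example 3.4.3: beta = {beta_343:.4f}, ratio = {100 * ratio_343:.2f} %")
```

    Both calls use the threshold voltage $V_t = 0.45$ V given in the statement of Example 3.4.2.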

  • VLSI Digital Signal Processing Systems

    Example 3.4.3 (2/2)

  • VLSI Digital Signal Processing Systems

    Combining Pipelining and Parallel Processing

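    Parallel-pipelined structure

    $$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad T_{pp} = L T_{seq} = \frac{(C_{charge}/M)\,\beta V_0}{k (\beta V_0 - V_t)^2}$$

    $$\Rightarrow\; M L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$

    For $M = L = 2$, $V_0 = 5$ V, and $V_t = 0.6$ V: $\beta \approx 0.4$ and $\beta^2 \approx 0.16$.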

  • VLSI Digital Signal Processing Systems

    Conclusions

    - Methodologies for pipelining a 3-tap FIR filter
    - Methodologies for parallel processing of a 3-tap FIR filter
    - Methodologies for using pipelining and parallel processing for low power
    - Pipelining and parallel processing of recursive digital filters using look-ahead techniques are addressed in Chapter 10.

  • VLSI Digital Signal Processing Systems

    Self-Test Exercises

    - STE1: Problem 8 of Chap. 3 in the textbook.
    - STE2: Problem 9 of Chap. 3 in the textbook.
    - STE3: Problem 10 of Chap. 3 in the textbook.
