ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.
![Page 1: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/1.jpg)
ELEC692 VLSI Signal Processing Architecture
Lecture 1: Introduction to DSP Systems
![Page 2: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/2.jpg)
Issues of VLSI Signal Processing Architecture
• Performance
• Area/Cost
• Speed of execution, throughput and clock rate
• Power dissipation or amount of energy required to perform a given task
• Fixed-point DSP systems: finite wordlength performance
  – Quantization and roundoff noise
• Special features of DSP systems
  – Real-time throughput requirements
  – Data-driven property
![Page 3: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/3.jpg)
Typical DSP algorithm and applications (I)
• Speech coding and decoding, speech encryption and decryption
  – Cell phones, cordless phones, multimedia computers, secure communications
• Speech recognition
  – Advanced user interface, phones, consumer products, machine/human interface
• Speech synthesis
  – Advanced user interface, consumer products, machine/human interface
• Modem algorithms
  – Phones, wireless communications, data/fax modems, secure communications
![Page 4: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/4.jpg)
Typical DSP algorithm and applications (II)
• Noise cancellation
  – Audio applications, wireless communications
• Audio equalization
  – Audio applications
• Image compression and decompression
  – Digital camera, video, multimedia applications
• Beamforming
  – Navigation, radar/sonar, wireless communications
• Echo cancellation
  – Speakerphones, modems, telephone switches
![Page 5: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/5.jpg)
Issues in wireless system design
• Ubiquitous services put wireless system spectrum at a premium
• Current spectral efficiency far below theoretical limits
• Emerging solutions
  – Adoption of better spectrum utilization techniques
    • E.g. interference cancellation, multiple antennas, MIMO systems
    • Multi-functional, adaptive systems
• Even higher bit-rate wireless applications
  – IEEE 802.11a, wireless IEEE 1394
![Page 6: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/6.jpg)
Improving Spectral Density and higher bit rate comes at a performance and power cost
• Digital baseband processing requirements
• Systems compared: wide-band CDMA, and FDMA with multiple antenna

| | Match Filter | Blind MMSE | Exact Decorrelator | SVD |
|---|---|---|---|---|
| Performance (bits/sec/Hz) | 1 | 2 | 2 | 6 |
| Multiplications | 124 | 496 | 230,000 | 736 |
| Memory | 248 | 1240 | 640,000 | 2120 |
| ALU | 124 | 502 | 240,000 | 800 |
| Word-length | 8-bit | 12-bit | 16-bit | 16-bit |

From Jan Rabaey of UC Berkeley
![Page 7: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/7.jpg)
Shannon beats Moore’s Law
![Page 8: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/8.jpg)
Energy plays a critical role
Battery capacity
![Page 9: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/9.jpg)
Programmable processor vs. ASIC
• DSP Selection guide for mobile multimedia
![Page 10: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/10.jpg)
DSP computation - Convolution
$$y(n) = x(n) * h(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n-k)$$
• Convolution is used to describe and analyze linear time-invariant (LTI) systems, which are completely characterized by their unit-sample (or impulse) response h(n)
• Finite impulse response (FIR): systems whose h(n) contains a finite number of nonzero samples, i.e. h(n) is of finite duration
• Infinite impulse response (IIR): h(n) is of infinite duration
• A system is causal if y(n0) depends only on the input samples x(k) with k <= n0.
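As a minimal sketch of the convolution sum for finite, causal sequences (the function name `convolve` and the sample values are illustrative assumptions, not taken from the lecture):

```python
def convolve(x, h):
    """Linear convolution y(n) = sum_k x(k) h(n-k) of two finite sequences.

    Both sequences are assumed to start at index 0 and to be zero elsewhere.
    """
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            # Contribution of x(k) h(n-k) to output index n = k + j.
            y[k + j] += xk * hj
    return y


x = [1, 2, 3]   # input samples x(0), x(1), x(2)
h = [1, 1]      # 2-tap FIR impulse response h(0), h(1)
y = convolve(x, h)
print(y)        # [1, 3, 5, 3]
```

Because h(n) has finite duration here, the infinite sum collapses to at most len(h) products per output sample, which is what makes FIR filtering directly implementable.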
![Page 11: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/11.jpg)
DSP computation - Correlation
• Widely used in digital communication
• Correlation of 2 sequences a(n) and x(n):
$$y(n) = \sum_{k=-\infty}^{\infty} a(k)\,x(n+k)$$
• It can be described as a convolution as follows:
$$y(n) = \sum_{k=-\infty}^{\infty} a(-k)\,x(n-k) = a(-n) * x(n)$$
• If a(n) and x(n) have finite length N, i.e. they are nonzero only for n = 0, 1, …, N-1, the digital correlation operation is given as:
$$y(n) = \sum_{k=0}^{N-1} a(k)\,x(n+k)$$
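A small sketch of the finite-length correlation, assuming the convention y(n) = Σ a(k) x(n+k) and treating samples outside 0..N-1 as zero. The helper names `correlate` and `correlate_via_convolution` are hypothetical; the second one checks the "correlation as convolution with a(-n)" identity.

```python
def correlate(a, x):
    """Return {n: y(n)} with y(n) = sum_{k=0}^{N-1} a(k) x(n+k), n = -(N-1)..N-1."""
    N = len(a)
    if len(x) != N:
        raise ValueError("both sequences are assumed to have length N")
    return {
        n: sum(a[k] * x[n + k] for k in range(N) if 0 <= n + k < N)
        for n in range(-(N - 1), N)
    }


def correlate_via_convolution(a, x):
    """Same result computed as the convolution a(-n) * x(n)."""
    N = len(a)
    a_rev = a[::-1]                      # a(-n) occupies indices -(N-1)..0
    full = [0] * (2 * N - 1)
    for i, ai in enumerate(a_rev):
        for j, xj in enumerate(x):
            full[i + j] += ai * xj
    # full[0] corresponds to output index n = -(N-1).
    return {n: full[n + N - 1] for n in range(-(N - 1), N)}


a = [1, 2, 3]
x = [0, 1, 0]
y = correlate(a, x)
print(y)   # {-2: 0, -1: 3, 0: 2, 1: 1, 2: 0}
```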
![Page 12: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/12.jpg)
DSP computation – Digital Filters
• A causal digital filter is characterized by its unit-sample response h(n), by its frequency response $H(e^{j\omega})$, or by a difference equation.
• A linear, time-invariant, and causal filter is given by
$$y(n) = -\sum_{k=1}^{N} a_k\,y(n-k) + \sum_{k=0}^{M-1} b_k\,x(n-k)$$
• If $a_k = 0$ for 1 <= k <= N, we have
$$y(n) = \sum_{k=0}^{M-1} b_k\,x(n-k)$$
• This is a non-recursive M-tap finite impulse response (FIR) filter, where $h(k) = b_k$.
• If at least one $a_k$ is nonzero, the filter is recursive and its unit-sample response has infinite duration. This is referred to as an IIR filter.
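The difference equation can be evaluated sample by sample. The sketch below assumes the sign convention y(n) = -Σ a_k y(n-k) + Σ b_k x(n-k) and zero initial conditions; the function name `lti_filter` is an illustrative choice.

```python
def lti_filter(b, a, x):
    """Evaluate y(n) = -sum_{k=1}^{N} a_k y(n-k) + sum_{k=0}^{M-1} b_k x(n-k).

    b = [b_0, ..., b_{M-1}], a = [a_1, ..., a_N]; samples before n = 0 are zero.
    """
    y = []
    for n in range(len(x)):
        # Feed-forward (FIR) part.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback (recursive) part; a[k - 1] holds a_k.
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


impulse = [1, 0, 0, 0]
print(lti_filter([1, 2, 3], [], impulse))   # FIR: response equals b, then zero
print(lti_filter([1], [-0.5], impulse))     # IIR: 1, 0.5, 0.25, 0.125, ...
```

With an empty `a` the impulse response stops after M samples (FIR); with a single nonzero feedback coefficient it keeps decaying and never becomes exactly zero (IIR).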
![Page 13: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/13.jpg)
DSP computation – Digital Filters
• Linear-phase FIR filter
  – Unit-sample responses are symmetric and require only about half the number of multiplications
  – For an M-tap linear-phase FIR filter: h(n) = h(M-1-n).
  – E.g. 7-tap linear-phase FIR filter with impulse response h(0)=h(6)=b0, h(1)=h(5)=b1, h(2)=h(4)=b2, h(3)=b3
  – y(n) = b0x(n) + b1x(n-1) + b2x(n-2) + b3x(n-3) + b2x(n-4) + b1x(n-5) + b0x(n-6)
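A sketch of how the symmetry saves multiplications for an odd-length filter: samples that share a coefficient are added first and multiplied once. For the 7-tap example this uses 4 multiplications per output instead of 7. The function names and the integer coefficient values are assumptions made for illustration.

```python
def fir_direct(h, x):
    """Reference M-tap FIR: one multiplication per tap."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_symmetric(b, x):
    """Odd-length linear-phase FIR with h = [b0, ..., b_c, ..., b0].

    b holds the first half of the taps plus the centre tap, so M = 2*len(b) - 1.
    """
    M = 2 * len(b) - 1

    def sample(i):
        return x[i] if i >= 0 else 0   # zero before the first input sample

    y = []
    for n in range(len(x)):
        acc = b[-1] * sample(n - (M - 1) // 2)           # centre tap
        for k in range(len(b) - 1):
            # Pre-add the two samples sharing coefficient b[k], then multiply once.
            acc += b[k] * (sample(n - k) + sample(n - (M - 1 - k)))
        y.append(acc)
    return y


b = [1, 2, 3, 4]                  # b0, b1, b2, b3
h = [1, 2, 3, 4, 3, 2, 1]         # h(n) = h(6 - n)
x = list(range(1, 11))
print(fir_symmetric(b, x) == fir_direct(h, x))   # True
```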
![Page 14: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/14.jpg)
DSP computation – Adaptive Filter
• The filter coefficients change and are updated at each iteration.
• Used for applications such as echo cancellation, channel equalization, voiceband modems and many others.
• It predicts one random process y(n) from observations of another random process x(n) using linear models such as digital filters.
• Coefficients are updated in order to minimize the difference between the filter output and the desired signal. The updating process continues until the coefficients converge.
• Consists of two blocks: a general filter block and a coefficient updating block.
![Page 15: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/15.jpg)
DSP computation – LMS Adaptive Filter
• Notations:
  – $W^T(n) = [w_1(n), w_2(n), \ldots, w_N(n)]$ = weight vector
  – $U^T(n) = [u(n), u(n-1), \ldots, u(n-N+1)]$ = vector of current and past input samples
  – $\hat{d}(n)$ is the estimated signal and e(n) is the estimation error.
  – We have
$$\hat{d}(n) = W^T(n-1)\,U(n)$$
$$e(n) = d(n) - \hat{d}(n) = d(n) - W^T(n-1)\,U(n)$$
![Page 16: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/16.jpg)
DSP computation – LMS Adaptive Filter
• In the n-th iteration, the LMS algorithm selects $W^T(n)$ to minimize the squared error $e^2(n)$.
• An LMS adaptive filter consists of an FIR filter block with coefficient vector $W^T(n)$ and input sequence u(n), and a weight update block.
![Page 17: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/17.jpg)
DSP computation – LMS Adaptive Filter
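• Weight update algorithm
$$\nabla_{W^T}\,e^2 = \nabla_{W^T}\left(d - W^T U\right)^2 = -2dU + 2\left(W^T U\right)U = -2\left(d - W^T U\right)U = -2eU$$
$$W(n) = W(n-1) + \frac{1}{2}\,\mu\left(-\nabla_{W^T}\,e^2(n)\right)$$
$$W(n) = W(n-1) + \mu\,e(n)\,U(n)$$
where $\mu$ is the step size.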
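A minimal sketch of the two LMS blocks (FIR filter block and weight update block), assuming the update W(n) = W(n-1) + μ e(n) U(n) with a step size called `mu` here. The system-identification setup, the unknown response `h_true`, the seed and the step size are all illustrative assumptions.

```python
import random


def lms(u, d, num_taps, mu):
    """Run the LMS recursion and return the final weights and the error sequence."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(u)):
        # U(n) = [u(n), u(n-1), ..., u(n-N+1)], zero before the first sample.
        U = [u[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        d_hat = sum(wk * uk for wk, uk in zip(w, U))      # filter block: W^T(n-1) U(n)
        e = d[n] - d_hat                                  # estimation error e(n)
        w = [wk + mu * e * uk for wk, uk in zip(w, U)]    # weight update block
        errors.append(e)
    return w, errors


# Hypothetical test case: identify an unknown 2-tap FIR system from its output.
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h_true = [0.5, -0.25]
d = [sum(h_true[k] * u[n - k] for k in range(len(h_true)) if n - k >= 0)
     for n in range(len(u))]

w, errors = lms(u, d, num_taps=2, mu=0.1)
print([round(wk, 4) for wk in w])   # close to [0.5, -0.25]
```

The step size trades convergence speed against stability; the value used here is simply small enough for this bounded input.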
![Page 18: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/18.jpg)
Other common DSP computations
• Motion estimation
  – Used in interframe predictive coding
• Discrete Cosine Transform
  – Frequency transform used in image processing
• Fast Fourier Transform
  – Frequency transform used in communication and audio/voice processing
• Vector Quantization
  – Used for data compression in speech, image and video coding
• Viterbi algorithm
  – Error control coding, used for communication and other data correction applications
• Decimator and expander
  – Multirate systems for image compression, digital audio and adaptive signal processing
![Page 19: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/19.jpg)
Implementation of DSP algorithms
• Many applications can be implemented on programmable DSP processors or media microprocessors
• For some applications, due to complexity and power issues, special VLSI architectures or ASICs are still required
• E.g. MPEG2 encoder
  – Block matching for motion estimation (ME) of an HDTV frame needs ~370 GOPs/sec
  – 2D-DCT for HDTV needs 3.84 GOPs/sec
![Page 20: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/20.jpg)
DSP representation
• Non-terminating programs and iteration based
$$y(n) = h_0\,x(n) + h_1\,x(n-1) + h_2\,x(n-2), \quad \text{for } n = 1 \text{ to } \infty$$
[Figure: DSP block with input x(n) and output y(n)]
• Iteration period: time required to execute one iteration
• Sampling rate (throughput): number of samples processed per second
• Latency: difference between the time an output is generated and the time at which its corresponding input was received
• Critical path delay
• Clock period (clock rate is not equal to sampling rate)
![Page 21: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/21.jpg)
DSP representation
• Mathematical formulation
• Behavioral descriptive language
  – Applicative languages
    • Set of equations
  – Prescriptive languages
    • Specify order of assignment statements
    • E.g. Pascal, C, SystemC
  – Descriptive languages
    • Represent structure of the DSP system
    • E.g. VHDL, Verilog
• Graphical representation
  – For investigating and analyzing data flow properties
  – Exhibit parallelism and data-driven (dependency) properties, provide insight for space-time tradeoff
  – Mapping DSP algorithms to hardware implementation
  – Block diagram, Signal-Flow Graph (SFG), Data-Flow Graph (DFG), and dependence graph (DG)
![Page 22: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/22.jpg)
Block Diagram
• Consists of functional blocks connected with directed edges, which represent the data flow from an input block to an output block.
• Edges may or may not contain delay elements
![Page 23: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/23.jpg)
Signal Flow Graph (SFG)
• An SFG is a graph whose nodes represent computations/tasks; a directed edge e(j,k) denotes a branch originating at node j and terminating at node k.
• With the input signal at node j and the output signal at node k, e(j,k) denotes a linear transformation from the signal at node j to the signal at node k.
• In digital networks, the edges are usually restricted to constant gain multipliers or delay elements
• Adders and multipliers are described by a node with multiple incoming edges and one outgoing edge.
• 2 special nodes – sink and source
![Page 24: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/24.jpg)
Example SFG of a direct-form 3-tap FIR filter
![Page 25: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/25.jpg)
Transposition of SFG
• Linear SFGs can be transformed into different forms
  – Flow graph reversal or transposition for single-input-single-output (SISO) systems
  – Transform operations
    • Reversing the direction of all edges
    • Exchanging the input and output nodes while keeping the edge gain or edge delay unchanged
  – Resulting SFG maintains the same functionality
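As a sketch of what "same functionality" means, the two functions below simulate a direct-form FIR filter and its transposed form and produce identical outputs. In the direct form the delay elements hold past input samples; in the transposed form they hold partial sums. The function names and coefficient values are illustrative assumptions.

```python
def fir_direct_form(h, x):
    """Direct form: delay line of past inputs feeding a chain of adders."""
    delay = [0] * (len(h) - 1)            # x(n-1), x(n-2), ...
    y = []
    for xn in x:
        taps = [xn] + delay
        y.append(sum(hk * tk for hk, tk in zip(h, taps)))
        delay = taps[:-1]                 # shift the delay line by one sample
    return y


def fir_transposed_form(h, x):
    """Transposed form: input broadcast to all multipliers, delays hold partial sums."""
    state = [0] * (len(h) - 1)
    y = []
    for xn in x:
        products = [hk * xn for hk in h]
        y.append(products[0] + (state[0] if state else 0))
        # New partial sum i = h[i+1] * x(n) + old partial sum i+1 (0 past the end).
        state = [products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0)
                 for i in range(len(state))]
    return y


h = [1, 2, 3]                # 3-tap FIR
x = [1, 0, 0, 0, 2]
print(fir_direct_form(h, x))        # [1, 2, 3, 0, 2]
print(fir_transposed_form(h, x))    # [1, 2, 3, 0, 2]
```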
![Page 26: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/26.jpg)
Data Flow Graph (DFG)
• Graph G = (N,E) where nodes represent computations (or functions or subtasks) and directed edges represent data paths (communications between nodes). Each edge has a non-negative number of delays associated with it.
![Page 27: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/27.jpg)
Data Flow Graph (DFG)
• DFG captures the data-driven property
• A node can execute only when all of its input data are available
• Concurrent execution
• A node with multiple input edges can only execute after all its preceding nodes have executed, thus describing the precedence constraints
  – If an edge has zero delay: intra-iteration precedence
  – If an edge has non-zero delay: inter-iteration precedence
• DFGs are generally used for high-level synthesis, to map concurrent implementations of DSP applications onto parallel hardware
  – Task scheduling and resource allocation
![Page 28: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/28.jpg)
Example of DFG
![Page 29: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/29.jpg)
Synchronous Data Flow graph (SDFG)
• Special case of DFG
  – Number of data samples produced or consumed by each node in each execution is specified a priori
  – Both for single-rate and multi-rate systems
  – Unrolling (unfolding) multirate systems to single-rate
![Page 30: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/30.jpg)
Dependence Graph
• A directed graph that shows the dependence of the computation
• Nodes represent computations and edges represent precedence constraints
• Similar to DFG, except that the nodes in a DFG only cover the computations in one iteration, whereas a DG contains the computations for all iterations. A DFG contains delay elements that store and pass data between iterations, while a DG does not contain delay elements.
![Page 31: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/31.jpg)
Example of a DG
![Page 32: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/32.jpg)
Critical Path of a DFG
• Critical path: path with the longest computation time among all paths that contain zero delay (i.e. without delay element)
• The minimum clock period of the DSP system depends on the critical path delay
• In DSP systems, e.g. filter element, the critical path depends on the delay of the following:
  – Input to the delay element
  – Input to the output
  – Delay element to the output
  – Delay element to delay element
• E.g.
[Figure: example filter with input In, output Out, delay elements (D), and multiplier and adder nodes annotated with computation times 2 and 1]
![Page 33: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/33.jpg)
Critical path comparison
[Figure: direct-form and transposed-form 4-tap FIR filters, each with input x(n), output y(n), four multipliers, three adders and three delay elements D]
Direct Form 4-tap FIR
• Critical path = delay(mult) + (N-1) delay(add)
• Delay element: shorter bitwidth
Transposed Form 4-tap FIR
• Critical path = delay(mult) + delay(add)
• Delay element: longer bitwidth
• Fanout of the input is larger
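The critical path can be found mechanically as the longest path that uses only zero-delay edges. The sketch below assumes a DFG given as node computation times plus (from, to, delays) edges; the 3-tap graphs and the times (multiplier 2, adder 1) are hypothetical values chosen to illustrate the direct-form versus transposed-form comparison.

```python
from functools import lru_cache


def critical_path(node_time, edges):
    """Longest computation time over all paths containing no delay element.

    Assumes the zero-delay subgraph is acyclic, as it must be in a valid DFG.
    """
    succ = {v: [] for v in node_time}
    for u, v, delays in edges:
        if delays == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return node_time[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in node_time)


times = {"m0": 2, "m1": 2, "m2": 2, "a1": 1, "a2": 1}

# Direct form: the adders form a delay-free chain after the multipliers.
direct_edges = [("m0", "a1", 0), ("m1", "a1", 0), ("a1", "a2", 0), ("m2", "a2", 0)]
# Transposed form: a delay element sits between consecutive adders.
transposed_edges = [("m0", "a1", 0), ("m1", "a2", 0), ("a2", "a1", 1), ("m2", "a2", 1)]

print(critical_path(times, direct_edges))       # 4 = mult + (N-1) adds, N = 3
print(critical_path(times, transposed_edges))   # 3 = mult + add
```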
![Page 34: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/34.jpg)
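Iteration Period
• Iteration: execution of all computations of an algorithm once
• Iteration period: the time required for execution of an iteration
• E.g. y(n) = ay(n-1) + x(n)
[Figure: block diagram of y(n) = ay(n-1) + x(n) with an adder, a multiplier by a and a delay element D; equivalent DFG with node A (computation time 2) and node B (computation time 4) in a loop containing one delay D]
Execution order: $A_0 \rightarrow B_0 \rightarrow A_1 \rightarrow B_1 \rightarrow A_2 \rightarrow B_2 \rightarrow \cdots$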
![Page 35: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/35.jpg)
Loop Bound
• Loop: a directed path that begins and ends at the same node.
• Loop bound of the loop
  – Lower bound on the loop computation time
  – Defined as $t_l / w_l$, where $t_l$ is the loop computation time and $w_l$ is the number of delays in the loop
• E.g.
[Figure: DFG with input x(n), output y(n), node A (2) and node B (4) connected in a loop with one delay D]
A → B → A is a loop with $t_l = 2 + 4 = 6$ and $w_l = 1$, hence the loop bound is 6/1 = 6.
![Page 36: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/36.jpg)
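Loop Bound
• Another example
[Figure: DFG with input x(n), output y(n), node A (2) and node B (4) connected in a loop with two delays (2D)]
A → B → A is a loop with $t_l = 2 + 4 = 6$ and $w_l = 2$ (since 2D), hence the loop bound is 6/2 = 3.
It means that, on average, one iteration of the loop can be executed every 3 time units. This is possible because the two delays split the iterations into two independent sets of precedence constraints:
  – Even: $A_0 \rightarrow B_0 \rightarrow A_2 \rightarrow B_2 \rightarrow A_4 \rightarrow B_4 \rightarrow \cdots$
  – Odd: $A_1 \rightarrow B_1 \rightarrow A_3 \rightarrow B_3 \rightarrow A_5 \rightarrow B_5 \rightarrow \cdots$
• Another example
[Figure: DFG with nodes A (2), B (4), C (5) and delay elements 2D and D]
Two loops:
  – A → B → A: T = 6, W = 2, bound = 3
  – A → B → C → A: T = 11, W = 1, bound = 11
Hence the overall bound for this DFG is max{3, 11} = 11.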
![Page 37: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/37.jpg)
Iteration Bound
• Critical loop: the loop with maximum loop bound
• Iteration bound ($T_{it}$): the loop bound of the critical loop
$$T_{it} = \max_{i \,\in\, \text{all loops}} \left\{ \frac{T_i}{W_i} \right\}$$
where $T_i$ is the computation time of loop i and $W_i$ is the number of delays in loop i
• It is not possible to achieve an iteration period lower than the iteration bound, even with infinite processing power
• E.g.
[Figure: DFG with nodes A (4), B (3), C (2), D (4) and delay elements D, D, D, 2D]
  – Loop A → B → A: T/W = 7/1 = 7
  – Loop A → B → C → A: T/W = 9/2 = 4.5
  – Loop B → C → D → B: T/W = 9/3 = 3
  – Iteration bound = max(7, 4.5, 3) = 7
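For small graphs the iteration bound can be computed directly from its definition by enumerating every loop. The sketch below is a brute-force illustration, not one of the algorithms named in the lecture. The placement of delays on individual edges is an assumption, chosen so that each loop reproduces the T and W values quoted in the examples.

```python
from fractions import Fraction


def iteration_bound(node_time, edges):
    """Maximum of T/W over all simple loops; edges are (from, to, delays)."""
    succ = {}
    for u, v, w in edges:
        succ.setdefault(u, []).append((v, w))
    best = None

    def dfs(start, u, visited, t, w):
        nonlocal best
        for v, dw in succ.get(u, []):
            if v == start:
                if w + dw == 0:
                    raise ValueError("loop without delay elements")
                bound = Fraction(t, w + dw)
                if best is None or bound > best:
                    best = bound
            elif v > start and v not in visited:
                # Only visit nodes ordered after `start`, so each loop is found once.
                dfs(start, v, visited | {v}, t + node_time[v], w + dw)

    for s in sorted(node_time):
        dfs(s, s, {s}, node_time[s], 0)
    return best


# Four-node example: loops with T/W = 7/1, 9/2 and 9/3 (edge delays assumed).
times4 = {"A": 4, "B": 3, "C": 2, "D": 4}
edges4 = [("A", "B", 0), ("B", "A", 1), ("B", "C", 1),
          ("C", "A", 1), ("C", "D", 0), ("D", "B", 2)]
print(iteration_bound(times4, edges4))   # 7

# Three-node example: loops with T/W = 6/2 and 11/1 (edge delays assumed).
times3 = {"A": 2, "B": 4, "C": 5}
edges3 = [("A", "B", 0), ("B", "A", 2), ("B", "C", 0), ("C", "A", 1)]
print(iteration_bound(times3, edges3))   # 11
```

Loop enumeration grows exponentially with graph size, which is why matrix-based methods are preferred for larger DFGs.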
![Page 38: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/38.jpg)
Algorithms for computing iteration bound
• Longest Path Matrix Algorithm
• Minimum Cycle Mean Algorithm
![Page 39: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/39.jpg)
Longest Path Matrix Algorithm (LPM)
• Construct a series of matrices; the iteration bound is found by examining the diagonal elements of the matrices
• Let d be the number of delay elements in the DFG, and $d_i$ be the i-th delay element.
• Construct matrices $L^{(m)}$, m = 1, 2, …, d, such that the element $l^{(m)}_{i,j}$ is the longest computation time of all paths from delay element $d_i$ to $d_j$ that pass through exactly m-1 delays. $l^{(m)}_{i,j} = -1$ if no such path exists.
• $L^{(m+1)}$ can be obtained from $L^{(1)}$ and $L^{(m)}$ recursively by
$$l^{(m+1)}_{i,j} = \max_{k \in K}\left(-1,\; l^{(1)}_{i,k} + l^{(m)}_{k,j}\right)$$
where K is the set of all k such that neither $l^{(1)}_{i,k} = -1$ nor $l^{(m)}_{k,j} = -1$; if there is no such k, $l^{(m+1)}_{i,j} = -1$
![Page 40: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/40.jpg)
LPM algorithm
• The diagonal element $l^{(m)}_{i,i}$ represents the longest computation time of all loops with m delays that contain $d_i$. The iteration bound is then equal to
$$T_{it} = \max_{i,\,m \,\in\, \{1,\ldots,d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\}$$
![Page 41: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/41.jpg)
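LPM algorithm (example)
[Figure: DFG with nodes 1, 2, 3 (computation time 1 each) and nodes 4, 5, 6 (computation time 2 each), and four delay elements d1, d2, d3, d4]
$$L^{(1)} = \begin{bmatrix} -1 & 0 & -1 & -1 \\ 4 & -1 & 0 & -1 \\ 5 & -1 & -1 & 0 \\ 5 & -1 & -1 & -1 \end{bmatrix}$$
E.g. $l^{(1)}_{3,1}$: all paths from d3 to d1 that pass through exactly zero delays. Path: d3 → 5 → 3 → 2 → 1 → d1, so $l^{(1)}_{3,1} = 2+1+1+1 = 5$
E.g.
$$l^{(2)}_{2,1} = \max_{k \in \{3\}}\left(-1,\; l^{(1)}_{2,k} + l^{(1)}_{k,1}\right) = \max(-1,\; 0+5) = 5$$
$$L^{(2)} = \begin{bmatrix} 4 & -1 & 0 & -1 \\ 5 & 4 & -1 & 0 \\ 5 & 5 & -1 & -1 \\ -1 & 5 & -1 & -1 \end{bmatrix}$$
$$L^{(3)} = \begin{bmatrix} 5 & 4 & -1 & 0 \\ 8 & 5 & 4 & -1 \\ 9 & 5 & 5 & -1 \\ 9 & -1 & 5 & -1 \end{bmatrix}$$
$$L^{(4)} = \begin{bmatrix} 8 & 5 & 4 & -1 \\ 9 & 8 & 5 & 4 \\ 10 & 9 & 5 & 5 \\ 10 & 9 & -1 & 5 \end{bmatrix}$$
$$T_{it} = \max_{i,\,m \,\in\, \{1,2,\ldots,d\}} \left\{ \frac{l^{(m)}_{i,i}}{m} \right\} = \max\left\{\frac{4}{2}, \frac{4}{2}, \frac{5}{3}, \frac{5}{3}, \frac{5}{3}, \frac{8}{4}, \frac{8}{4}, \frac{5}{4}, \frac{5}{4}\right\} = 2$$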
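A short sketch of the LPM recursion, taking $L^{(1)}$ as given (building $L^{(1)}$ from the DFG is not shown). The function name `lpm_iteration_bound` is an illustrative choice; -1 marks "no path", which keeps it distinct from a valid path of computation time 0 between two adjacent delay elements.

```python
from fractions import Fraction


def lpm_iteration_bound(L1):
    """Return (iteration bound, [L(1), ..., L(d)]) from the d-by-d matrix L(1)."""
    d = len(L1)

    def next_matrix(Lm):
        nxt = [[-1] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                # k ranges over the set K where both partial paths exist.
                candidates = [L1[i][k] + Lm[k][j] for k in range(d)
                              if L1[i][k] != -1 and Lm[k][j] != -1]
                if candidates:
                    nxt[i][j] = max(candidates)
        return nxt

    matrices = [L1]
    for _ in range(d - 1):
        matrices.append(next_matrix(matrices[-1]))

    # Diagonal entry of L(m) = longest loop through that delay with m delays.
    ratios = [Fraction(L[i][i], m)
              for m, L in enumerate(matrices, start=1)
              for i in range(d) if L[i][i] != -1]
    return max(ratios), matrices


L1 = [[-1, 0, -1, -1],
      [4, -1, 0, -1],
      [5, -1, -1, 0],
      [5, -1, -1, -1]]
bound, matrices = lpm_iteration_bound(L1)
print(bound)          # 2
print(matrices[1])    # L(2)
```

The same function applied to a 2-delay graph with L(1) = [[4, 4], [8, 8]] gives L(2) = [[12, 12], [16, 16]] and an iteration bound of 8.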
![Page 42: ELEC692 VLSI Signal Processing Architecture Lecture 1 Introduction to DSP Systems.](https://reader035.fdocuments.us/reader035/viewer/2022062221/56649e055503460f94af16ce/html5/thumbnails/42.jpg)
LPM algorithm (another example)
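[Figure: DFG with nodes 1 to 7, computation times (1), (2), (1), (1), (2), (1), (1), and two delay elements d1, d2]
$$L^{(1)} = \begin{bmatrix} 4 & 4 \\ 8 & 8 \end{bmatrix}, \qquad L^{(2)} = \begin{bmatrix} 12 & 12 \\ 16 & 16 \end{bmatrix}$$
$$T_{it} = \max\left\{\frac{4}{1}, \frac{8}{1}, \frac{12}{2}, \frac{16}{2}\right\} = 8$$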