VLSI DSP 2008 Y.T.HWANG 1-1
VLSI Designs for Digital Signal Processing
Instructor: Yin-Tsung Hwang
Department of Electrical Engineering
National Chung Hsing University
Chapter 1: Introduction to VLSI DSP
SOC Design Era
[Figure: SOC architecture. Blocks shown: memory, mixed signal, processor, coprocessor or special FU, RF, peripherals, JTAG, interface; software (application algorithm, C language).]
Bluetooth SOC – block diagram
Bluetooth SOC – die photograph
Die size: 40 mm²
CMOS 0.25 µm technology
5 metal layers
Customized function unit & glue logic
Hardwired vs. Programmable Approaches
System design is often a tradeoff among:
Cost (hardwired)
Time to market (programmable processor)
Area/power (hardwired)
Design flexibility (programmable)
Performance (hardwired)
HW/SW partitioning:
Software modules: executed on a CPU (e.g. ARM, PPC), microprocessor (e.g. 8051), or DSP
Hardware modules: to perform customized or specific functions
Wireless Base Station Example
Turbo coding
Spectrum spread/de-spread
Multi-user detection
Rake receiver
Smart antennas
HW components
SW components
DSP for reality processing (1)
Features of DSP algorithms:
Filtering, transform, coding and so on
Real-time processing throughput
Require high speed and massive computing capabilities
Large volume of data
Sophisticated algorithms, reduced communication BW
DSP for reality processing (2)
The roles of DSP
Design by VLSI
Merits:
High integration
high parallelism
High data bandwidth
Reduced power consumption
suitable for modular design
FFT processor design
What is VLSI DSP?
Implementing specific DSP algorithms in VLSI:
Converting DSP algorithms to VLSI circuitry
A hardwired solution
Exploiting the merits of VLSI design
Goals:
To meet the computing demands
To reduce the power consumption
To reduce the chip area
To reduce the production cost
To achieve wide data bandwidth
To enhance the performance
Why VLSI DSP? − computing demand (1)
Performance requirements driven by broadband communication
Why VLSI DSP? − computing demand (2)
Current programmable processor solution
Why VLSI DSP? − computing demand (3)
Current ASIC (FPGA) solution
ASIC (FPGA) vs. programmable processor (1)
ASIC (FPGA) vs. programmable processor (2)
TI C6X processor vs. Xilinx 4000 series
Why VLSI DSP? − power issue (1)
Lead microprocessor power continues to increase
Power delivery and dissipation will be prohibitive
Why VLSI DSP? − power issue (2)
Chip power density
Why VLSI DSP? − power issue (3)
Why VLSI DSP? − bandwidth issue
Dedicated interconnect among modules to increase data bandwidth
Applications of VLSI DSP
Video signal processing (2D, 3D filters)
Digital communications
Neural networks
Signal synthesis
Modulation / demodulation
Fast Fourier transforms
and more ...
VLSI DSP design issues
Algorithm aspects
Architecture aspects
Circuit implementation aspects (not addressed in this course)
Algorithm to architecture mapping
Algorithm Aspect (1)
Point type:
gray scale transformation
histogram equalization
quantization
Filter type:
template matching
window technique (e.g. Hamming window)
Convolution / correlation
linear phase filtering
median filtering
moving average
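Two of the filter-type operations listed above, the moving average and the median filter, can be sketched in a few lines. This is a minimal software illustration, not a hardware description; the function names, the window width, and the test signal are assumptions made for the example.

```python
from statistics import median


def moving_average(x, width):
    """Mean of each sliding window of `width` samples (valid positions only)."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def median_filter(x, width):
    """Median of each sliding window of `width` samples (valid positions only)."""
    return [median(x[i:i + width]) for i in range(len(x) - width + 1)]


# Assumed test signal: a constant level corrupted by one impulse.
signal = [1, 1, 9, 1, 1]
print(moving_average(signal, 3))
print(median_filter(signal, 3))
```

On this signal the median filter removes the impulse completely, while the moving average spreads it over every window that contains it.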
Algorithm Aspect (2)
Filter type (cont.):
Wiener filtering: optimal for stationary signals, to remove additive noise
Kalman filtering: good for non-stationary signals; state vectors, measurement equations, errors
Inverse filtering: special case of Kalman filtering
Adaptive filtering: Least Mean Square (LMS), Recursive Least Squares (RLS)
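As an illustration of the adaptive filtering entry, the LMS update can be sketched as below. This is a minimal sketch under stated assumptions: the function name, the step size mu, the 2-tap "unknown" system, and the noise-free system-identification setup are all hypothetical choices for the example, not taken from the slides.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Adapt FIR weights w so that the filter output tracks d (LMS rule)."""
    w = [0.0] * num_taps       # adaptive weights
    taps = [0.0] * num_taps    # delay line holding x[n], x[n-1], ...
    errors = []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]                        # shift in the new sample
        y_n = sum(wi * ti for wi, ti in zip(w, taps))   # filter output
        e_n = d_n - y_n                                 # estimation error
        w = [wi + mu * e_n * ti for wi, ti in zip(w, taps)]  # LMS weight update
        errors.append(e_n)
    return w, errors


# Hypothetical demo: identify an unknown 2-tap FIR system from its input/output.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
unknown = [0.5, -0.3]
d = [unknown[0] * x[n] + (unknown[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
weights, errors = lms_filter(x, d, num_taps=2, mu=0.1)
print(weights)
```

Each iteration needs only multiply-accumulate operations on the current tap vector, which is one reason LMS is usually cheaper per sample than RLS.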
Algorithm Aspect (3)
Matrix algebra type:
Singular value decomposition: image processing (coding & enhancement)
channel estimation
Maximum entropy estimation
stochastic pointer estimation
Sorter type:
bitonic sort
bubble sort
insertion sort
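Of the sorters listed above, bitonic sort is the one whose compare-and-swap pattern is fixed in advance and independent of the data, which makes it a natural fit for a hardwired sorting network. A minimal recursive sketch follows; the function names are hypothetical and the input length is assumed to be a power of two.

```python
def _bitonic_merge(seq, ascending):
    """Sort a bitonic sequence with a fixed pattern of compare-and-swap steps."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    seq = list(seq)
    half = n // 2
    for i in range(half):
        # Each pair (i, i + half) is independent, so hardware can do them in parallel.
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return _bitonic_merge(seq[:half], ascending) + _bitonic_merge(seq[half:], ascending)


def bitonic_sort(values, ascending=True):
    """Bitonic sort; assumes len(values) is a power of two."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    first = bitonic_sort(values[:half], True)     # ascending half
    second = bitonic_sort(values[half:], False)   # descending half -> bitonic sequence
    return _bitonic_merge(first + second, ascending)


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```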
Algorithm Aspect (4)
Transform type:
Fourier transform: DFT, FFT, IFFT; time-domain vs. frequency-domain transformation; data modulation (OFDM)
Cosine transform: discrete cosine transform (MPEG, JPEG); modified discrete cosine transform (MP3, MPEG4)
Wavelet transform: multi-resolution sub-band coding, JPEG2000, MPEG4
Hough transform: to determine all possible straight lines & curves
Hadamard transform: speech processing, word recognition, data compression
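To make the relationship between the DFT and the FFT concrete, the sketch below computes the same transform both ways. It assumes a radix-2 decimation-in-time FFT and a power-of-two input length; the function names are hypothetical, and a real design would use fixed-point butterflies rather than Python complex numbers.

```python
import cmath


def dft(x):
    """Direct DFT from the definition: N complex multiplications per output."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for m in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]  # twiddle factor times odd part
        out[m] = even[m] + t                            # butterfly, upper output
        out[m + n // 2] = even[m] - t                   # butterfly, lower output
    return out


samples = [1, 2, 3, 4, 0, 0, 0, 0]
max_diff = max(abs(a - b) for a, b in zip(dft(samples), fft(samples)))
print(max_diff)
```

Both functions return the same spectrum up to rounding error; the FFT simply reuses the half-length results instead of recomputing them.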
Algorithm Aspect (5)
Algorithm selection:
Performance evaluation, e.g. LMS vs. RLS
Computing complexity, e.g. full search vs. fast algorithm in motion estimation
Data bandwidth complexity, e.g. memory bandwidth, data storage size, I/O pin count
Numerical property, e.g. FIR vs. IIR, rounding effect
Parallelism, e.g. Schur vs. Levinson-Durbin algorithm
Hardware module reuse
Algorithm Aspect (6)
Algorithm refinement:
Transform for parallelism, e.g. look-ahead transform
Computing complexity reduction: fast computing algorithm, e.g. FFT in lieu of DFT
Relaxation in computation, e.g. norm 1 in lieu of norm 2
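The look-ahead transform mentioned above can be illustrated on a first-order recursion. Assuming the recursion y[n] = a*y[n-1] + x[n] (an example chosen here, not taken from the slides), substituting it into itself once gives y[n] = a^2*y[n-2] + a*x[n-1] + x[n]. The feedback loop now spans two samples instead of one, which leaves room for pipelining or for computing even and odd outputs in parallel. The sketch below checks that both forms give the same output; the function names are hypothetical.

```python
def iir_direct(x, a):
    """Original recursion: y[n] = a*y[n-1] + x[n]."""
    y, prev = [], 0.0
    for x_n in x:
        prev = a * prev + x_n
        y.append(prev)
    return y


def iir_look_ahead(x, a):
    """One-step look-ahead form: y[n] = a^2*y[n-2] + a*x[n-1] + x[n]."""
    a2 = a * a
    y = []
    for n, x_n in enumerate(x):
        y_n2 = y[n - 2] if n >= 2 else 0.0   # zero initial state
        x_n1 = x[n - 1] if n >= 1 else 0.0
        y.append(a2 * y_n2 + a * x_n1 + x_n)
    return y


data = [1.0, 0.5, -2.0, 3.0, 0.0, 1.5]
print(iir_direct(data, 0.9))
print(iir_look_ahead(data, 0.9))
```

The price of the longer loop is the extra feed-forward term a*x[n-1], so the transform trades a small amount of hardware for relaxed timing in the recursive path.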
Architecture Aspect (1)
Architecture classification:
Customized function unit
Array processor
Customized function unit:
For fine grain level parallelism
Exploit computing concurrency within each processing iteration
[Figure: customized function unit example. Labeled blocks: channel estimation, coefficient update, matched filter, feed-forward filter, feedback filter, slicer, dual-port RAM (64*12 bits), LUT (256*10 bits), RAM.]
Architecture Aspect (2)
Array processor:
Systolic architecture
For massive data parallelism
Exploit computing concurrency across processing iterations
Array processor vs. SIMD & MIMD architectures
[Figure: systolic array example. Labels shown: c11 to c44, a21 to a43, and delay elements D.]
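The systolic idea, locally connected processing elements (PEs) that pass data to their neighbours every clock cycle, can be sketched with a cycle-level simulation. The example below is an assumed illustrative design, not the array in the slide's figure: an N x N output-stationary array computing C = A * B, where operands of A move right, operands of B move down, and PE(i, j) accumulates c[i][j]. All names are hypothetical.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an N x N output-stationary systolic array for C = A * B."""
    n = len(a)
    acc = [[0] * n for _ in range(n)]     # accumulator inside each PE
    a_reg = [[0] * n for _ in range(n)]   # operand moving left -> right
    b_reg = [[0] * n for _ in range(n)]   # operand moving top -> bottom
    for t in range(3 * n - 2):            # cycles until the last PE finishes
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:                # left edge: row i enters skewed by i cycles
                    k = t - i
                    new_a[i][j] = a[i][k] if 0 <= k < n else 0
                else:                     # interior: take from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:                # top edge: column j enters skewed by j cycles
                    k = t - j
                    new_b[i][j] = b[k][j] if 0 <= k < n else 0
                else:                     # interior: take from the upper neighbour
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]   # one MAC per PE per cycle
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Every PE talks only to its immediate neighbours, and all PEs work in the same cycle on different iterations of the computation, which is the concurrency across iterations that the slide refers to.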
Architecture Aspect (3)
VLSI array processors:
parallel + pipelined processing
wide internal communication BW
locally connected
VLSI DSP design flow
Application & functional spec.: spec. refinement
Algorithm: algorithm simulation, transform, simplification
Architecture: architecture mapping, resource allocation, scheduling, pipelining, retiming
Circuit implementation: processor element design, Verilog coding, timing closure
IP authoring: test bench, behavioral model
Algorithm to architecture mapping
Mapping:
From algorithm domain to architecture domain
Resource allocation
Scheduling
Architecture refinement:
pipelining (advantages: 1. clock rate increase, 2. power saving)
parallel processing
multi-rate
retiming
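The clock-rate advantage of pipelining comes from shortening the longest combinational path between registers. A minimal sketch, assuming a datapath with one multiplier followed by two cascaded adders and arbitrary delay values (T_M and T_A below are made-up numbers, and `clock_period` is a hypothetical helper):

```python
def clock_period(stages):
    """Minimum clock period: the longest total combinational delay of any stage."""
    return max(sum(stage) for stage in stages)


T_M, T_A = 10.0, 4.0   # assumed multiplier and adder delays, arbitrary units

unpipelined = [[T_M, T_A, T_A]]     # multiply, then two cascaded additions
pipelined = [[T_M], [T_A, T_A]]     # pipeline register inserted after the multiplier

speedup = clock_period(unpipelined) / clock_period(pipelined)
print(clock_period(unpipelined), clock_period(pipelined), speedup)  # 18.0 10.0 1.8
```

The power-saving advantage is usually obtained by spending this timing slack on a lower supply voltage at unchanged throughput; that trade-off is not modeled in this sketch.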
Performance gain at different levels
Algorithm (improvement > 10 times):
better algorithm (e.g. QR factorization vs. matrix inverse)
fast algorithm (e.g. FFT in lieu of DFT)
Architecture (improvement about 1~2 times):
pipelined
parallel processing
multi-rate
retiming
Circuit (improvement about 10%~30%):
fast circuit design
improved technology
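The size of the algorithm-level gain can be checked with a quick operation count for the FFT-in-lieu-of-DFT example. The sketch below uses the textbook counts of N^2 complex multiplications for a direct DFT and (N/2)*log2(N) for a radix-2 FFT (counting trivial twiddle factors as multiplications); the function names and the choice N = 1024 are assumptions for illustration.

```python
def dft_complex_mults(n):
    """Complex multiplications in a direct N-point DFT."""
    return n * n


def fft_radix2_complex_mults(n):
    """Complex multiplications in a radix-2 FFT; n must be a power of two >= 2."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1        # log2(n), exact for powers of two
    return (n // 2) * stages           # n/2 butterflies per stage


n = 1024
ratio = dft_complex_mults(n) / fft_radix2_complex_mults(n)
print(dft_complex_mults(n), fft_radix2_complex_mults(n), ratio)  # 1048576 5120 204.8
```

For N = 1024 the count drops by a factor of about 200, well beyond the 10 times quoted for algorithm-level improvement and far more than architecture or circuit refinements are said to deliver.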
Conclusion
Algorithm mapping is crucial!
Good command of algorithms
Skills in architecture design
Bridging the gap between algorithm design and hardware implementation