APPLIED SIGNAL PROCESSING AND IMPLEMENTATION
(ASPI)
Introduction for 7th semester, Fall 2005
Embedded Systems group: pk, yml, abo, ssc, jmk, dlc, rab, oo
Dicom group: kjh, pr, uh, ....
2ASPI Introduction
Outline
1. Rationale for ASPI
2. Basic ASPI Model (A3)
3. Trends: S8 -> S9 -> S10
4. Course structure
5. Project examples: S8 – S9/S10
6. Lab facilities
7. Demonstrations
8. Conclusion
Rationale for ASPI/1
Embedded System:
• a collection of heterogeneous parts
• subject to stringent design constraints such as ...
Rationale for ASPI/2
Embedded Systems
[Figure: "From" and "To" examples of embedded systems; one is labeled Nokia 7710]
Rationale for ASPI/3
Shannon Beats Moore’s Law and Energy Plays a Major Role
[Figure: log-scale plot (1 to 10000000) comparing Algorithmic Complexity (Shannon’s Law), Processor Performance (~Moore’s Law) and Battery Capacity across the 1G, 2G and 3G mobile generations]
Source: Jan Rabaey, Summer Course, 2000
Basic ASPI Model (A3)
Applications (example: Equalizer)
Algorithms (example: FIR/IIR)
Architectures (example: DSP/FPGA)
For each application => many candidate algorithms
For each algorithm => many implementation architectures
=> Large number of solutions => Large Design Space
=> ASPI challenge
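The growth of the design space can be illustrated with a small enumeration. This is only a sketch: the dictionary layout and the assumption that every algorithm can be mapped onto every architecture are hypothetical; only the Equalizer, FIR/IIR and DSP/FPGA labels come from the slide.

```python
from itertools import product

# Hypothetical candidate lists; only these labels appear on the slide.
candidates = {
    "Equalizer": {
        "algorithms": ["FIR", "IIR"],
        "architectures": ["DSP", "FPGA"],
    },
}

def design_space(application):
    """Return every (algorithm, architecture) pair for one application."""
    entry = candidates[application]
    return list(product(entry["algorithms"], entry["architectures"]))

solutions = design_space("Equalizer")
for algorithm, architecture in solutions:
    print(f"Equalizer -> {algorithm} -> {architecture}")
print(len(solutions), "candidate solutions")
```

With more candidate algorithms and architectures per application, the number of pairs grows multiplicatively, which is the design space problem named on the slide.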
FPGA
FPGA components:
1. Dedicated I/O blocks
2. Programmable Logic Array Blocks (LAB)
- combinatorial / sequential circuits
- routing resources
3. Dedicated blocks
- RAM blocks
- multipliers
- processors (ARM/PowerPC)
4. Development tools
FPGA
ASPI Design Principle
Transform a serial specification into a combination of:
• Serial, parallel and pipelined units
that satisfies the design constraints: Area, Time => Power
[Figure: serial, parallel and pipelined structures]
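A first-order model can make the area/time trade-off between the three structures concrete. The sketch below is an idealized assumption, not a model from the course: all operations are independent, each takes `t_op` time units on a unit of area `a_op`, and pipeline register overhead is ignored. All function and parameter names are hypothetical.

```python
import math

def serial(n_ops, t_op, a_op):
    """One unit processes the operations one after another."""
    return n_ops * t_op, a_op

def parallel(n_ops, t_op, a_op, units):
    """Several identical units work side by side."""
    return math.ceil(n_ops / units) * t_op, units * a_op

def pipelined(n_ops, t_op, a_op, stages):
    """One unit split into equal stages; register area is ignored here."""
    stage_time = t_op / stages
    # The first result needs `stages` steps, then one result per step.
    return (stages + n_ops - 1) * stage_time, a_op

for name, (time, area) in {
    "serial": serial(8, 4, 1),
    "parallel": parallel(8, 4, 1, units=4),
    "pipelined": pipelined(8, 4, 1, stages=4),
}.items():
    print(f"{name:10s} time={time:5.1f} area={area}")
```

Under these assumptions the parallel version buys time with area, while the pipelined version gets close to the parallel time at roughly the serial area.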
Trends: S8 -> S9 -> S10
Application: Non-Linear Signal Processing/Mobile Communication
1. Algorithm selection
2. Simulation
3. Architecture selection and mapping
Example later
[Figure: A3 model (Applications, Algorithms, Architectures) annotated with steps 1 to 3]
Solution Space for FIR filter
[Figure: bar/line chart of Cycle Count (left axis, 0 to 16000) and Code Size (right axis, 0 to 100) over successive optimizations, grouped into C code modifications and compiler optimization (compiler optimiser)]
Cycle Count data labels: 14892, 12890, 11962, 2065, 2065, 2065, 504, 504, 924
Code Size data labels: 31, 45, 45, 52, 52, 56, 76, 95, 52
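To show what a source-level modification of a FIR filter can look like, here is a minimal sketch of a direct-form FIR and a variant with the tap loop unrolled by four. The slide does not list which modifications were measured, so the unrolling is an assumed example, and Python is used only for illustration (the cycle counts in the chart refer to compiled code on a real target).

```python
def fir(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out

def fir_unrolled4(samples, taps):
    """Same filter, tap loop unrolled by 4 with four independent accumulators."""
    if len(taps) % 4:
        raise ValueError("number of taps must be a multiple of 4")
    # Zero-padding removes the boundary test from the inner loop.
    padded = [0.0] * (len(taps) - 1) + list(samples)
    out = []
    for n in range(len(samples)):
        base = n + len(taps) - 1  # index of x[n] in the padded buffer
        acc0 = acc1 = acc2 = acc3 = 0.0
        for k in range(0, len(taps), 4):
            acc0 += taps[k] * padded[base - k]
            acc1 += taps[k + 1] * padded[base - k - 1]
            acc2 += taps[k + 2] * padded[base - k - 2]
            acc3 += taps[k + 3] * padded[base - k - 3]
        out.append(acc0 + acc1 + acc2 + acc3)
    return out

x = [1, 0, 0, 0, 5]
h = [1, 2, 3, 4]
print(fir(x, h))
print(fir_unrolled4(x, h))
```

The independent accumulators are the point of the rewrite: on a processor with several functional units they expose multiply-accumulate operations that can run side by side, usually at the cost of larger code.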
Trends: S8 -> S9 -> S10
Application: Non-Linear Signal Processing/Mobile Communication
• Algorithm selection
• Simulation
• Architecture selection and modelling
• Design Space Exploration
• HW/SW Co-Design
[Figure: A3 model (Applications, Algorithms, Architectures) annotated with steps 1 to 5]
Design Space Exploration
Constraints: Area, Time => Power = Area*fclock
[Figure: Area versus Time plot with the limits Amax and Tmax; the possible solutions lie along the curve A*T ~ K]
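The feasible region can be sketched numerically. The example below assumes the idealized relation A*T = K exactly and uses made-up values for K, Amax, Tmax and the candidate areas; none of these numbers come from the slides.

```python
def feasible_points(k, a_max, t_max, areas):
    """Keep the candidate areas whose implied time T = K / A meets both limits."""
    points = []
    for area in areas:
        time = k / area
        if area <= a_max and time <= t_max:
            points.append((area, time))
    return points

# Hypothetical numbers, chosen only to show the filtering.
K = 1200.0
points = feasible_points(K, a_max=400, t_max=10,
                         areas=[50, 100, 200, 300, 400, 600])
for area, time in points:
    print(f"area={area:4d}  time={time:4.1f}")
```

Small designs fail the time limit and large designs fail the area limit, so only the middle of the A*T curve survives.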
HW/SW Co-Design
Trends: S8 -> S9 -> S10
• Implementing a complete design trajectory
• With solutions whose properties satisfy the constraints
[Figure: A3 model (Applications, Algorithms, Architectures) annotated with Constraints and Properties]
ASPI Course Structure
[Diagram: Design Methodology across the 8th and 9th semesters, covering Algorithm analysis, SW Platform analysis, SW compilers, HW Platform analysis, HW compilers and Design Space Exploration]
8th Semester Courses
F8-1 Engineering Responsibilities 1 ECTS SE
FP8-12 Higher Order Statistical Analysis 1 ECTS SE
FP8-9 Joint Time Frequency Analysis 1 ECTS SE
FP8-13 DSP Algorithms and Architectures 1 ECTS SE
FP8-16 Adaptive Systems 2 ECTS PE
FP8-19 Inverse Filtering and Deconvolution 1 ECTS PE
FP8-18 Multidimensional Signal Processing 1 ECTS PE
ASPI8-4 DSP Design Methodology 0.6 ECTS PE
FP8-17 Software Programmable Platform Analysis 1.4 ECTS PE
Project 20 ECTS
9th Semester Courses
FP9-2 Discrete-Time Kalman Filtering 2 ECTS SE
ASPI9-2A HW/SW CoDesign 2 ECTS PE
ASPI9-2B HW Platform Analysis, Comp. & Optim. 2 ECTS PE
ASPI9-3 Non-linear Signal Processing 1 ECTS PE
ASPI9-4 Neural Networks 1 ECTS PE
Mob9-2 Radio Communication III 1.4 ECTS EL
Project 22 ECTS
EL: Elective course
Technology
Simulation tools / Language:
• Matlab/M
• Ptolemy/(M)any
• Design Trotter/C
Processors / Language:
• ARM/C++, ASM
• TI 320-6413/C++, ASM
• Blackfin/C++, ASM
• Microblaze/C++, ASM
• NIOS/C++, ASM
Programmable Logic:
• Xilinx FPGA/Handel-C
• Altera FPGA/Handel-C
Lab facilities
• Celoxica RC203 board: Xilinx Virtex FPGA
• Altera Stratix board: Altera Stratix FPGA
• Analog Devices Blackfin board: Analog Devices Blackfin DSP
Project Examples: S8/S9/S10
1. S8 Noise Suppression in Speech
2. S9 FPGA implementation of a JPEG 2000 encoder/decoder
3. S10 Reed-Solomon Decoder for DVB-H
Most projects involve external contacts in other research groups or companies
Noise Suppression in Speech
ASPI 8, Group 840
Søren Birk Sørensen
Andreas Popp
Michael Smed Kristensen
Agenda
Application
• System overview
Algorithm
• Principle of the algorithm
• Results
Architecture
• Implementation
System overview
Requirements
• Improved speech intelligibility
• Improved signal-to-noise ratio (SNR)
• Acceptable delay in the system (latency)
Principle
Results
• SNR not significantly improved
• Speech intelligibility: from "Very poor" to "Good"
• Latency: 35 ms
Implementation
Parts of the algorithm were implemented on a TI TMS320C6713 development board
• Floating point
• Varying pipeline depth
• 8 instructions in parallel
Analysis of the compilation result
• Subsequent optimization
Optimizations performed
Execution time
• Different algorithm for the autocorrelation computation
• Loop unrolling gives more parallelism
• Informing the compiler about data dependencies
  - Utilization of the pipeline
• Different division computation
  - Shorter execution time
Result of optimization
Autocorrelation computation
• 24096 cycles -> 2624 cycles
• 153% more than the estimated minimum number of cycles
Levinson function
• 3842 cycles -> 1122 cycles
• 26% more than the estimated minimum number of cycles
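For readers unfamiliar with the two kernels, the following is a minimal textbook sketch of an autocorrelation computation followed by the Levinson-Durbin recursion. It is an assumption that the project's "Levinson function" is of this kind; the slides do not show the group's code or say which autocorrelation algorithm replaced the original one. Function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the prediction-error filter.

    Returns (a, err) where a[0] = 1 and A(z) = sum_k a[k] z^-k,
    and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient; this is the division step
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Autocorrelation of an ideal first-order process, r[k] = 0.5 ** k
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)
```

Both kernels are dominated by multiply-accumulate loops, which is why loop unrolling and dependency information for the compiler matter on a processor that can issue 8 instructions in parallel.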
9th semester project example
"FPGA implementation of a JPEG 2000 encoder/decoder"
Motivation
• JPEG2000 is up to six times more complex to implement than JPEG
• 2 complex DSP algorithms at the heart of JPEG2000
  • Discrete Wavelet Transform (DWT)
  • Embedded Block Coding with Optimized Truncation (EBCOT)
• FPGAs provide the ability to accelerate arithmetic operations via parallel processing
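As an illustration of the DWT step, the sketch below implements one level of the integer 5/3 lifting transform, the filter JPEG 2000 uses for its reversible path, on a 1-D even-length signal. The slides do not say which filter or decomposition the project used, so this choice, the simplified mirror handling at the edges, and the function names are assumptions.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform (even-length integer list)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even-length signal")
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict step: detail = odd sample minus floor(mean of its even neighbours).
    detail = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]  # mirror at right edge
        detail.append(odd[n] - ((even[n] + right) >> 1))
    # Update step: approximation = even sample plus rounded quarter of the details.
    approx = []
    for n in range(half):
        left = detail[n - 1] if n > 0 else detail[0]  # mirror at left edge
        approx.append(even[n] + ((left + detail[n] + 2) >> 2))
    return approx, detail

def dwt53_inverse(approx, detail):
    """Undo dwt53_forward by running the lifting steps backwards."""
    half = len(approx)
    even = []
    for n in range(half):
        left = detail[n - 1] if n > 0 else detail[0]
        even.append(approx[n] - ((left + detail[n] + 2) >> 2))
    out = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        out.extend([even[n], detail[n] + ((even[n] + right) >> 1)])
    return out

signal = [10, 12, 14, 200, 3, 7, 7, 9]
low, high = dwt53_forward(signal)
print(low, high, dwt53_inverse(low, high) == signal)
```

Every output sample depends only on a few neighbouring inputs and uses only additions and shifts, which is the kind of regular arithmetic that maps well onto parallel FPGA logic.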
JPEG2K Block diagram (encoder)
Project flow
• Analysis of reference C-code
  • processing analysis (search for potential parallelism)
  • memory analysis (memory requirements)
• Sketch an architecture based on the analysis (architectural exploration)
• FPGA implementation
  • Handel-C language to describe the architecture
  • Handel-C to FPGA (Celoxica Design-suite)
  • Analysis -> architectural refinement
S10 Project: Reed-Solomon Decoder
Application:
• from DVB-T to DVB-H
• FEC: RS(n,k,t) => RS(255, 191, 64)
• Constraints:
  • Frame size: up to 2 MB
  • Data rate: 2 MB/s
  • Time constraint: ASAP
[Figure: frame layout with Data and Parity sections; Nokia 7710]
Complexity:
• Execution on ARM: 22 min/2MB frame
Algorithm:
• Galois field arithmetic GF(2^8)
  • Data: 8-bit bytes
  • Operators: binary +, *, not
  • Properties:
    • no carry, overflow or rounding error => bitwise operations in parallel
    • short critical path (delay) => high clock rate
• Identification of parallelism
  • coarse grain @ function level
  • fine grain @ operations level
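The carry-free property can be seen in a minimal sketch of GF(2^8) arithmetic. The field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption: it is widely used for Reed-Solomon codes, but the slides do not state which polynomial the project used. Function names are hypothetical.

```python
PRIM_POLY = 0x11D  # assumed field polynomial, not stated in the slides

def gf_add(a, b):
    """Addition in GF(2^8) is a bitwise XOR: no carry, no overflow."""
    return a ^ b

def gf_mul(a, b):
    """Shift-and-add multiplication with reduction by PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:       # degree 8 reached: reduce back to 8 bits
            a ^= PRIM_POLY
        b >>= 1
    return result

def gf_pow(a, n):
    """Repeated multiplication; fine for a sketch, slow for real use."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a non-zero element, using a^255 = 1 so a^-1 = a^254."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)

print(hex(gf_add(0x53, 0xCA)), hex(gf_mul(0x02, 0x80)), hex(gf_inv(0x02)))
```

Because every step is an XOR, AND or shift on independent bits, a hardware multiplier reduces to a shallow network of gates, which is what makes the short critical path on the slide possible.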
Results:
• Execution on ARM: 22 min/2MB frame
• Parallelism: the error locator and the evaluator polynomial can be computed concurrently
• Reusable DataPath: Syndrome computation, Chien Search, polynomial evaluation and error correction can be performed on the same parallel DataPath
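Syndrome computation, Chien search and polynomial evaluation all reduce to evaluating a polynomial over GF(2^8) at powers of a field element, which is one way to see why they can share a data path. The sketch below shows syndrome computation only. It is an illustration under assumptions: field polynomial 0x11D, generator element 2, first root alpha^0, and only 4 parity symbols to keep the demo short (RS(255, 191) has 64). None of these conventions are given in the slides, and all names are hypothetical.

```python
PRIM_POLY = 0x11D  # assumed field polynomial

def gf_mul(a, b):
    """GF(2^8) multiplication by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result

def poly_eval(coeffs, x):
    """Horner evaluation; coeffs are ordered highest degree first."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc

def poly_mul(p, q):
    """Polynomial product over GF(2^8)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out

def generator_poly(n_parity):
    """g(x) = product of (x + alpha^j) for j = 0..n_parity-1, with alpha = 2."""
    g, root = [1], 1
    for _ in range(n_parity):
        g = poly_mul(g, [1, root])  # subtraction equals addition (XOR) here
        root = gf_mul(root, 2)
    return g

def syndromes(received, n_parity):
    """Evaluate the received word at each root; all zeros means no detected error."""
    out, root = [], 1
    for _ in range(n_parity):
        out.append(poly_eval(received, root))
        root = gf_mul(root, 2)
    return out

codeword = generator_poly(4)   # g(x) itself is a valid codeword
corrupted = list(codeword)
corrupted[2] ^= 0x55           # inject a single symbol error
print(syndromes(codeword, 4), syndromes(corrupted, 4))
```

Each syndrome is an independent Horner loop, so in hardware they can be computed side by side on identical multiply-and-add cells.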
• DataPath: 65 8-bit blocks
• Design Space Exploration:
S10 Project: Reed-Solomon Decoder (DSE)
Conclusion
ASPI salient features:
• based on Models and Methods
• application independent but also
• application related
• encompasses new technologies and tools
• driven by current research projects
• local & global industry cooperation
Any questions before the student presentation continues?
Advertisement
My A3 'upbringing' has really paid off: we switch back and forth between application, algorithm and architecture exactly as we did in the good old days in the VLSI group.
Unfortunately we do not get to do much about the arithmetic: the synthesis tools come with very efficient module generators for multipliers, adders etc., and in the 0.18u technology we work in they are more than fast enough.
So the arithmetic is more a part of my background for understanding what the module generators spit out, and how we best exploit them. (And yet, things are looking up: I am about to design a divider for the next-generation IC !-)
Excerpt from an e-mail from: Jack Andersen <[email protected]>
ASPI Home Page, Staff etc
Home Page: http://kom.aau.dk/~dsp/aspi-05/sites/default/
Secretary: Dorthe Sparre, NJV12 A5-214, Tel. 9635 8616, [email protected]
Staff: Peter Koch, Yannick LeMoullec, Ole Olsen, Daniel Lázaro Cuadrado, Anders B. Olsen, Jesper Michael Kristensen, Søren Skovgaard Christensen, Rasmus Abildgren
Location:
Offices: B1-208, -211, -213, NJV12 A5-207
Lab: NJ14 3-015
Students: A6-108