Sub-Nyquist Sampling: Continuous to Finite Module, Orthogonal Matching Pursuit Block
• Supervisors: Inna Rivkin & Moshe Mishali
• Winter 2009 – Spring 2010
Final Presentation – Part A
Performed by: Yoni Smolin
2/8/2010
Sub-Nyquist Digital System
• On every clock cycle, the system solves $y = Az$, where $A$ is the sampling matrix and $z$ is sparse.
• How?
  • Expand - computes y from input samples.
  • CTF - locates the nonzero elements of z (the support).
  • DSP - computes their values: $z_{\mathrm{supp}} = A_{\mathrm{supp}}^{\dagger} y$.
[Figure: Sampled data → Expand (×3) → y[n] → CTF → Supp → DSP → z[n]; diagram labels: 12, 4]
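The DSP step above can be sketched in a few lines. This is a minimal real-valued illustration, assuming the support is already known; the helper names `solve` and `values_on_support` and the small matrix are made up for the example and are not taken from the project.

```python
def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[r])) + [float(b[r])] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def values_on_support(A, y, supp):
    """z_supp = pinv(A_supp) y, computed through the normal equations."""
    cols = [[row[k] for row in A] for k in supp]  # columns of A_supp
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(gram, rhs)


# Hypothetical 3x4 sampling matrix and a sparse z with support {1, 3}.
A = [[1, 0, 1, 2],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]
z = [0, 3, 0, -2]
y = [sum(a * x for a, x in zip(row, z)) for row in A]
z_supp = values_on_support(A, y, [1, 3])
```

Once the support is fixed, the underdetermined system reduces to a small least-squares problem, which is why the CTF only has to deliver indices and not values.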
CTF
• The frame matrix - a basis for recovery: $Q = \sum_{n=1}^{70} y_n y_n^H = [y_1\; y_2\; \dots\; y_{70}]\,[y_1\; y_2\; \dots\; y_{70}]^H$
[Figure: CTF = Frame construction → Support recovery; y → frame matrix → Support]
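As a rough software model of the frame construction, Q can be accumulated one sample vector at a time. The function name `frame_matrix` and the two short vectors are illustrative assumptions; the slide uses 70 vectors.

```python
def frame_matrix(samples):
    """Accumulate Q = sum_n y_n y_n^H over equal-length complex vectors."""
    m = len(samples[0])
    Q = [[0j] * m for _ in range(m)]
    for y in samples:
        for r in range(m):
            for c in range(m):
                Q[r][c] += y[r] * y[c].conjugate()
    return Q


# Two hypothetical sample vectors of length 2.
samples = [[1 + 1j, 2], [0, 1j]]
Q = frame_matrix(samples)
```

The result is Hermitian by construction, so a hardware version would only need to compute one triangle.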
CTF
• Support recovery – applying OMP for Q: solve $Q = AU$ for a sparse $U$; the output is the support of $U$.
[Figure: CTF = Frame construction → Support recovery; y → frame matrix → Support]
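OMP – Algorithm (SBR2)
1. initialize: $R_0 = Q$, $\mathrm{Supp} = \emptyset$
2. loop: do (i'th iteration)
   • Matching: $\mathrm{best} = \arg\max_k \|a_k^H R_{i-1}\|_2$, then $\mathrm{Supp} = \mathrm{Supp} \cup \mathrm{best}$
   • Solution approximation (pseudo-inverse): $\hat{U}_{Q;i} = (A_{\mathrm{Supp}}^H A_{\mathrm{Supp}})^{-1} A_{\mathrm{Supp}}^H Q$
   • Residual update: $R_i = Q - A_{\mathrm{Supp}} \hat{U}_{Q;i}$
   while: residual test in the Frobenius norm ($R_i$, $R_{i-1}$, $\|\cdot\|_F^2$) against threshold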
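The three steps (matching, solution approximation, residual update) can be sketched as follows. This is a real-valued illustration, not the SBR2 code itself: the function names, the `max_iter` guard, and the stopping rule (squared Frobenius norm of the residual above a threshold) are assumptions made for the example.

```python
import math


def solve(M, B):
    """Solve M X = B (B may have several columns) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(float, M[r])) + list(map(float, B[r])) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def omp_pinv(A, Q, threshold, max_iter):
    m, n_cols, width = len(A), len(A[0]), len(Q[0])
    col = lambda k: [A[r][k] for r in range(m)]
    R = [row[:] for row in Q]  # R_0 = Q
    supp = []
    # Assumed stopping rule: squared Frobenius norm of the residual.
    while sum(x * x for row in R for x in row) > threshold and len(supp) < max_iter:
        # Matching: column with the largest ||a_k^T R||_2 (squared here).
        def score(k):
            a = col(k)
            return sum(sum(a[r] * R[r][j] for r in range(m)) ** 2 for j in range(width))
        best = max((k for k in range(n_cols) if k not in supp), key=score)
        supp.append(best)
        # Solution approximation: U = (A_s^T A_s)^-1 A_s^T Q.
        cols = [col(k) for k in supp]
        gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
        rhs = [[sum(ci[r] * Q[r][j] for r in range(m)) for j in range(width)] for ci in cols]
        U = solve(gram, rhs)
        # Residual update: R = Q - A_s U.
        R = [[Q[r][j] - sum(cols[s][r] * U[s][j] for s in range(len(supp)))
              for j in range(width)] for r in range(m)]
    return sorted(supp)


s = 1 / math.sqrt(3)
A = [[1, 0, 0, s], [0, 1, 0, s], [0, 0, 1, s]]  # unit-norm columns
Q = [[2, 1], [0, 0], [1, -3]]                   # built from columns 0 and 2
support = omp_pinv(A, Q, threshold=1e-9, max_iter=3)
```

Note the matrix inversion inside the loop: its size grows with every iteration, which is the part a hardware design would want to avoid.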
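OMP – Adaptation for Hardware
1. initialize: $R_0 = Q$, $\mathrm{Supp} = \emptyset$
2. loop: do (i'th iteration)
   • Matching: $\mathrm{best} = \arg\max_k \|a_k^H R_{i-1}\|_2$, then $\mathrm{Supp} = \mathrm{Supp} \cup \mathrm{best}$
   while: residual test in the Frobenius norm ($R_i$, $R_{i-1}$, $\|\cdot\|_F^2$) against threshold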
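OMP – Adaptation for Hardware
1. initialize: $R_0 = Q$, $\mathrm{Supp} = \emptyset$
2. loop: do (i'th iteration)
   • Matching: $\mathrm{best} = \arg\max_k \|a_k^H R_{i-1}\|_2$, then $\mathrm{Supp} = \mathrm{Supp} \cup \mathrm{best}$
   • Modified Gram-Schmidt (M.G.S.): $\{q_1, \dots, q_{i-1}\}$ (old basis), $a_{\mathrm{best}}$ (new column) $\to$ $\{q_1, \dots, q_{i-1}, q_i\}$ (orthonormal basis vectors written here as $q_j$)
   • Residual update: $R_i = R_{i-1} - q_i q_i^H R_{i-1}$
   while: residual test in the Frobenius norm ($R_i$, $R_{i-1}$, $\|\cdot\|_F^2$) against threshold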
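A software sketch of this adapted loop, where the pseudo-inverse is replaced by one Modified Gram-Schmidt step per iteration. It is real-valued and assumes linearly independent columns; the name `omp_mgs`, the `max_iter` guard, and the stopping rule (squared Frobenius norm of the residual above a threshold) are assumptions for illustration only.

```python
import math


def omp_mgs(A, Q, threshold, max_iter):
    m, n_cols, width = len(A), len(A[0]), len(Q[0])
    R = [row[:] for row in Q]  # R_0 = Q
    supp, basis = [], []
    energy = lambda M: sum(x * x for row in M for x in row)
    while energy(R) > threshold and len(supp) < max_iter:
        # Matching: column with the largest ||a_k^T R||_2 (squared here).
        def score(k):
            return sum(sum(A[r][k] * R[r][j] for r in range(m)) ** 2
                       for j in range(width))
        best = max((k for k in range(n_cols) if k not in supp), key=score)
        supp.append(best)
        # M.G.S.: i-1 projections onto the old basis, then normalization.
        v = [A[r][best] for r in range(m)]
        for q in basis:
            c = sum(qr * vr for qr, vr in zip(q, v))
            v = [vr - c * qr for vr, qr in zip(v, q)]
        inv_norm = 1.0 / math.sqrt(sum(x * x for x in v))
        q_new = [x * inv_norm for x in v]
        basis.append(q_new)
        # Residual update: R_i = R_{i-1} - q_i (q_i^T R_{i-1}).
        proj = [sum(q_new[r] * R[r][j] for r in range(m)) for j in range(width)]
        R = [[R[r][j] - q_new[r] * proj[j] for j in range(width)] for r in range(m)]
    return sorted(supp)


s = 1 / math.sqrt(3)
A = [[1, 0, 0, s], [0, 1, 0, s], [0, 0, 1, s]]  # unit-norm columns
Q = [[3.0, 1.0], [1.0, 0.0], [1.0, 0.0]]        # built from columns 0 and 3
support = omp_mgs(A, Q, threshold=1e-9, max_iter=3)
```

Each iteration now needs only inner products, one inverse square root, and a subtraction, which matches the atomic operations a fixed datapath can provide.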
OMP – Atomic Operations
• 1 OMP iteration: 101 projections, 1 M.G.S. iteration, update residual.
[Figure: atomic operations; dimension labels: 12, 2]
Implementation Considerations
• CTF must fit in a single Stratix III FPGA.
• Number representation: Fixed-point, Q1.16.
• Fully parameterized design: N, L, m.
• Guiding principle: Achieve best throughput.
Implying:
• A pipelined datapath.
• Parallel execution of vector-matrix multiplications.
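To make the fixed-point representation concrete, here is a small model of quantization and multiplication with 16 fractional bits. The interpretation of Q1.16 as one integer bit plus a sign bit (18 bits total) is an assumption, as are the saturating conversion and the truncating multiply; the actual design may round or saturate differently.

```python
FRAC_BITS = 16
INT_BITS = 1  # assumed: one integer bit plus a sign bit, 18 bits in total
SCALE = 1 << FRAC_BITS
MAX_RAW = (1 << (INT_BITS + FRAC_BITS)) - 1
MIN_RAW = -(1 << (INT_BITS + FRAC_BITS))


def to_fixed(x):
    """Quantize a float to a raw integer, saturating at the format limits."""
    raw = int(round(x * SCALE))
    return max(MIN_RAW, min(MAX_RAW, raw))


def to_float(raw):
    return raw / SCALE


def fixed_mul(a, b):
    """Multiply two raw values and drop the extra 16 fractional bits.

    The arithmetic right shift rounds toward minus infinity.
    """
    return (a * b) >> FRAC_BITS


half = to_fixed(0.5)
product = to_float(fixed_mul(half, to_fixed(-0.25)))
```

Under this reading the representable range is [-2, 2), which is one reason the columns of A and the residual need careful scaling.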
CTF Block
[Figure: CTF block diagram with blocks Frame, OMP, A memory, Merge]
Block Diagram - OMP
Block Diagram - Datapath
[Figure: datapath blocks: MAMU, SUB, $\|\cdot\|_2^2$, keep max, support, $1/\sqrt{\cdot}$, stop?]
Block Diagram - Datapath
[Figure: the same datapath annotated with signal widths: 18 bit, 21 bit]
Matrix Multiplier (MAMU)
[Figure: a single MAMU column: 12 inputs (1, 2, …, 12) are multiplied and summed (+) into 1 output; signal width labels: 36]
Delay
• Multiplication - 2 cycles
• Projection - 5 cycles
Hardware consumption
• DSP blocks – 64%
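One MAMU column can be modeled as 12 parallel products followed by a summation. The pairwise adder tree below is an assumption about how the sum might be organized in hardware; the slide only states the 12-input, 1-output shape and the cycle counts.

```python
def adder_tree(values):
    """Sum values pairwise, level by level, as a hardware adder tree would."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def mamu_column(inputs, column):
    """One column: 12 products computed in parallel, then summed to one output."""
    assert len(inputs) == len(column) == 12
    return adder_tree(x * a for x, a in zip(inputs, column))


out = mamu_column(list(range(1, 13)), [1] * 12)
```

Replicating this column lets a whole vector-matrix product finish in the time of a single projection.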
Inverse Square root
• Implementation alternatives:
• Remark: Input number must be > .
1
megafunction | Logic | DSP | Delay | representation | accuracy
ALTFP_INV_SQRT | 1,000 | 21 | 38 cycles at 400MHz | floating point (requires conversions) | ─
DW_INV_SQRT | 2,292 | 0 | 257 nSec | fixed point | 2^-16
ALTSQRT + LPM_DIVIDE | 1,256 | 0 | 134 nSec | fixed point | 2^-16 (98% of the time)
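The square-root-then-divide alternative can be modeled in integer arithmetic. This is only a behavioral sketch assuming 16 fractional bits; it uses Python's `math.isqrt` and integer division and does not reproduce the megafunctions' interfaces or timing.

```python
import math

FRAC_BITS = 16  # assumed number of fractional bits


def inv_sqrt_fixed(x_raw):
    """Return 1/sqrt(x) as a raw fixed-point value, for x_raw = x * 2**16 > 0."""
    # Integer square root stage: sqrt(x) scaled by 2**16.
    sqrt_raw = math.isqrt(x_raw << FRAC_BITS)
    # Divider stage: 2**32 / (sqrt(x) * 2**16) = (1/sqrt(x)) * 2**16.
    return (1 << (2 * FRAC_BITS)) // sqrt_raw


exact = inv_sqrt_fixed(4 << FRAC_BITS) / (1 << FRAC_BITS)
approx = inv_sqrt_fixed(2 << FRAC_BITS) / (1 << FRAC_BITS)
```

Both stages truncate, so the result can differ from the true value in the last bit or so, which is consistent with an accuracy figure quoted as holding most of the time rather than always.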
FSM Controller
OMP iteration - Matching
Pipelined calculation = 101 cycles
OMP iteration – Gram Schmidt
( i-1 projections ) + normalization
OMP iteration – Update Residual
Parallel calculation: stopping condition & update of R
Overflow
• In theory, overflow can occur only for:
  → Solution: Divide R by 2 & recalculate: R · (some vector) becomes (R/2) · (some vector).
• Error detection flags: Overflow: MAMU, $\|\cdot\|_2^2$, SUB, $1/\sqrt{\cdot}$.
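The divide-and-recalculate remedy can be sketched as a retry loop around a flagged operation. The 18-bit signed limit, the 16 fractional bits, and the returned shift count are assumptions for illustration; the slides do not say how the scaling is tracked afterwards.

```python
LIMIT = 1 << 17  # assumed signed 18-bit range: [-2**17, 2**17 - 1]


def dot_with_flag(r, v, frac_bits=16):
    """Fixed-point dot product that also reports an overflow flag."""
    acc = sum(a * b for a, b in zip(r, v)) >> frac_bits
    overflow = not (-LIMIT <= acc < LIMIT)
    return acc, overflow


def safe_dot(r, v):
    """Halve R and recalculate until the result fits."""
    shifts = 0
    acc, overflow = dot_with_flag(r, v)
    while overflow:
        r = [x >> 1 for x in r]  # divide R by 2
        shifts += 1
        acc, overflow = dot_with_flag(r, v)
    return acc, shifts


ones = [1 << 16] * 4  # four raw values equal to 1.0
result = safe_dot([100000] * 4, ones)
```

In this model the true value is `acc * 2**shifts`, so a caller that compares magnitudes across columns would need to apply the same scaling to every candidate.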
Synthesis Report
[Table: synthesis results with columns Frame and OMP]
* Without debug environment
Logical Simulation – Test Bench
• Input text files are generated by Matlab.
• Output text files are evaluated in Excel.
[Figure: test bench with blocks OMP, R memory, A memory and files Q.txt, A.txt, supp.txt]
Logical Simulation – Example
Simulation output:
• Support: → → →
• Iteration: 1st 2nd 3rd 4th 5th →
[Figure: spectrum plot over frequency f with axis marks at (1/2) f_NYQ; labels: "z9 9338 64"]
Logical Simulation – Results
Type | # signals | % exact recovery | mistaken elements
FM | 200 | 92% | 1 (7.5%), 3 (0.5%)
• Problem: recovery < 100%. Solutions:
  • Regular mode: CTF can be reinitiated until successful recovery.
  • Iterative mode: Support merging can help overcome mistaken elements.
Challenges
• Swap normal ↔ iterative modes on the fly.
• A must be normalized before system startup.
• Q's calculation:
  • Should be wide to avoid overflow.
    → transfer only 18 relevant bits to OMP.
  • Can be accelerated by upsampling y.
• Wide mux (5,184 bits) can pose a latency problem.
• Stopping condition varies with input signal's energy.
Gantt