Conventional & Unconventional Applications of FPGA
Wu, Jinyuan
Fermilab
Oct, 2010
Fermi National Accelerator Laboratory
Colliding Experiments
Oct. 2010, Wu Jinyuan, Fermilab [email protected]
Conventional & Unconventional Applications of FPGA 4
Introduction
• There is no clear distinction between conventional and unconventional applications of FPGA.
• The range of FPGA applications is very likely broader than we can imagine.
The outline of this talk:
• The Starting Point
• Topics on Averages
• Using FPGA as ADC
• TDC Implemented with FPGA
• Serial Communication Between FPGA Devices
• Doublet Finding in Trigger System
• Triplet Finding in Trigger System
The Starting Point
Cares Must Be Taken Outside FPGA (1)
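[Figure: signal chain Shaper/LP Filter, ADC, FPGA. Panels: spectrum of original signal; band limiting by the LP filter; ADC input; sampling in the ADC; aliasing without LP filtering.]
The signal must be band limited below the Nyquist frequency, i.e. below (1/2) × the sampling frequency.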
The “Trend” vs. The Sampling Theorem
The trend: there will be no hardware analog processing; everything is done digitally in software.
It sounds very stylish.
A shaper/low-pass filter is a minimum requirement.
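The need for a low-pass filter in front of the ADC can be illustrated with a minimal sketch. The sampling rate and tone frequencies below are assumed values chosen for illustration, not numbers from the talk. A tone above half the sampling frequency produces exactly the same samples as a lower-frequency alias, so no digital processing after the ADC can tell them apart.

```python
import math

def sample_tone(freq_hz, fs_hz, n_samples):
    """Sample a unit-amplitude cosine of frequency freq_hz at rate fs_hz."""
    return [math.cos(2 * math.pi * freq_hz * k / fs_hz) for k in range(n_samples)]

fs = 100.0              # sampling frequency (assumed value)
f_high = 70.0           # tone above the Nyquist frequency fs/2 = 50
f_alias = fs - f_high   # 30 Hz: where the 70 Hz tone folds to after sampling

high = sample_tone(f_high, fs, 32)
alias = sample_tone(f_alias, fs, 32)

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(high, alias))
print(max_diff)
```

Only an analog filter ahead of the sampler can remove the 70 Hz component before it folds onto 30 Hz.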
Cares Must Be Taken Outside FPGA (2)
[Figure: signal chain Shaper/LP Filter, ADC, FPGA, with dither added at the ADC input. Two plots of ADC value (51 to 54) vs. sampling index (0 to 150). With dither: Signal, Signal+Noise, ADC(signal+noise), Weighted Average, Threshold. Without dither: Signal, ADC(signal), Threshold.]
Resolution finer than the ADC LSB can be achieved by adding noise at ADC input and digital filtering.
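A minimal sketch of the idea, under stated assumptions: the ADC is modeled as ideal rounding with an LSB of 1, and a deterministic, evenly spaced dither spanning one LSB stands in for the random noise so that the result is reproducible. The function names and the value 52.3 are illustrative only.

```python
import math

def adc(v):
    """Ideal ADC model with an LSB of 1: round to the nearest integer code."""
    return math.floor(v + 0.5)

def average_with_dither(v, n):
    """Digitize v n times, each time with a different dither offset, then average."""
    total = 0
    for k in range(n):
        dither = (k + 0.5) / n - 0.5   # evenly spaced over one LSB (stand-in for noise)
        total += adc(v + dither)
    return total / n

signal = 52.3
no_dither = adc(signal)                      # stuck at one code, error 0.3 LSB
with_dither = average_with_dither(signal, 100)
print(no_dither, with_dither)
```

Without dither every conversion returns the same code, so averaging gains nothing. With dither the fraction of conversions landing on the upper code tracks the fractional part of the input.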
Adding Noise for Finer Resolution
Photo Credit: www.telegraph.co.uk, trinities.org
Mechanical pressure gauges usually do not track small pressure changes well.
Gauge readers may lightly tap the gauge to get a more accurate reading.
The idea of dithering at ADC input is similar.
Why Are Band Limiting & Dithering Ignored?
• Pre-amplifiers usually have a naturally limited bandwidth and an intrinsic noise larger than the LSB of the ADC.
• So much of the time, band limiting and dithering can be “safely” ignored since the requirements are satisfied automatically.
• High-bandwidth, low-noise devices have now become easily accessible. A design can be too fast and too quiet.
• Do not forget to review the band limiting and dithering requirements for each design.
Topics on Averages
From Sum to Average
When N = 2^k values are summed, the word length grows by k bits. These k bits are the fractional bits of the average.
[Figure: addition of 4 values; the sum word is split into integer and fractional bits.]
Gain of Measurement Precision
When N independent measured values are averaged, the measurement error shrinks by a factor of 1/sqrt(N):
$\sigma_N = \sigma_1 / \sqrt{N}$
If N = 2^k, the precision gain is k/2 bits (not k bits).
[Figure: addition of 64 values; the sum word is split into integer and fractional bits.]
N      Word Length   Precision Gain
4      +2 bits       +1 bit
16     +4 bits       +2 bits
64     +6 bits       +3 bits
256    +8 bits       +4 bits
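The two statements about sums of N = 2^k values can be checked with a short sketch. The helper names and sample values are illustrative assumptions. The first function reads the sum as a fixed-point number with k fractional bits. The second computes the precision gain in bits as log2(sqrt(N)).

```python
import math

def sum_as_fixed_point(values, k):
    """Sum 2**k integers and split the sum into integer part and k fractional bits."""
    assert len(values) == 1 << k
    total = sum(values)
    integer_part = total >> k                 # upper bits: integer part of the average
    fractional_bits = total & ((1 << k) - 1)  # lower k bits: fraction in units of 1/2**k
    return integer_part, fractional_bits

def precision_gain_bits(n):
    """Bits of precision gained by averaging n independent measurements."""
    return math.log2(math.sqrt(n))

ipart, frac = sum_as_fixed_point([51, 52, 52, 52], 2)
average = ipart + frac / 4    # 51 + 3/4 = 51.75
print(average, [precision_gain_bits(n) for n in (4, 16, 64, 256)])
```

No division is needed to obtain the average: the sum already is the average once the binary point is placed k bits from the right.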
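Weighted Averages
• The weighted average is a special case of the inner product.
• Multipliers are usually needed.
For inputs y1 … y7 and coefficient sets c1 … c7, d1 … d7, e1 … e7:
$y_0 = \sum_i c_i y_i, \qquad h = \sum_i d_i y_i, \qquad \eta = \sum_i e_i y_i$
$\sum_i c_i = 1, \qquad \sum_i d_i = 1, \qquad \sum_i e_i = 1$
[Figure: the input samples y1 … y7 feed three multiplier (X) and sum stages, one per coefficient set.]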
Exponentially Weighted Average
• No multipliers are needed.
• The average is available at any time.
• It can be used to track the pedestal of the input signals.
s[n] = s[n-1] + (x[n] - s[n-1])/N, N = 2, 4, 8, 16, 32, …
[Figure: N(t), S(t), Y=S+N and the tracked pedestals Ped32, Ped64, Ped128 vs. sample index (0 to 500).]
[Figure: block diagram: the difference between x[n] and s[n-1] is scaled by 1/2^K and added to s[n-1] to give s[n].]
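The recursion s[n] = s[n-1] + (x[n] - s[n-1])/N with N = 2^K can be sketched with shifts only. In this illustration (names and values are assumptions) the accumulator holds s scaled by 2^K, so the division becomes a right shift and no multiplier is involved.

```python
def track_pedestal(samples, k):
    """Exponentially weighted average with N = 2**k, using add and shift only.

    acc holds s[n] * 2**k, so acc += x - (acc >> k) is the scaled form of
    s[n] = s[n-1] + (x[n] - s[n-1]) / 2**k.
    """
    acc = 0
    out = []
    for x in samples:
        acc += x - (acc >> k)
        out.append(acc >> k)   # current pedestal estimate, available at every sample
    return out

pedestal = track_pedestal([100] * 1000, 5)   # constant input, N = 32
print(pedestal[0], pedestal[-1])
```

A larger K gives a smoother but slower-moving pedestal, which matches the Ped32/Ped64/Ped128 curves named in the plot legend.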
Sliding Sum/Sliding Average
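• For each input point, a sliding sum is computed.
• It is preferable to implement the sliding sum recursively.
• The recursive implementation uses far fewer resources.
Direct form:
$s_n = \sum_{i=n-k+1}^{n} x_i$
Recursive form:
$s_n = s_{n-1} + x_n - x_{n-k}$
[Figure: block diagrams of the direct sliding sum and of the recursive form, in which x[n] is added to and x[n-K] subtracted from the accumulator s[n].]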
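A short sketch comparing the direct and recursive forms of the sliding sum. The function names are illustrative, and samples before the start of the record are assumed to be zero. The direct form costs k additions per output. The recursive form costs one addition and one subtraction, independent of k.

```python
def sliding_sum_direct(x, k):
    """Sum of the last k samples, recomputed from scratch for every output."""
    return [sum(x[max(0, n - k + 1): n + 1]) for n in range(len(x))]

def sliding_sum_recursive(x, k):
    """Same result using s[n] = s[n-1] + x[n] - x[n-k]."""
    s = 0
    out = []
    for n, xn in enumerate(x):
        s += xn
        if n >= k:
            s -= x[n - k]   # drop the sample that just left the window
        out.append(s)
    return out

data = [1, 2, 3, 4, 5]
print(sliding_sum_recursive(data, 3))
```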
Sliding Average as a Low-Pass filter
Sliding average removes high frequency random noise.
[Figure: block diagram of the recursive sliding sum: x[n] is added to and x[n-K] subtracted from the accumulator s[n].]
The CIC Filters
Sliding sum:
$s[m] = \sum_{k=m-(K-1)}^{m} 1 \cdot x[k]$
Cascaded Integrator Comb (CIC) sum of 2nd order:
$y[n] = \sum_{k=n-(2K-1)}^{n} h[k] \cdot x[k]$
• The CIC-2 filter is a weighted average.
• Sliding Sum = CIC-1 sum.
• The frequency response of the CIC-2 sum is a sinc^2(x) function, which has 2nd-order zeros and better stop band suppression.
[Figure: sinc(x) and sinc^2(x) vs. x (0 to 20, frequency axis); the zero (e.g. 360 Hz) is marked.]
The CIC-2 Sum as a Low-Pass Filter
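[Figure: waveforms filtered with the sliding sum and with the CIC-2 sum.]
[Figure: block diagrams. Sliding sum: s[n] accumulates x[n] - x[n-K]. CIC-2 as two cascaded sliding sums: y[n] accumulates s[n] - s[n-K]. Equivalent form: u[n] accumulates x[n] - 2x[n-K] + x[n-2K], and y[n] accumulates u[n].]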
• No Multipliers are needed to implement the CIC-2 sum.
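As a minimal sketch, the CIC-2 sum can be modeled as two cascaded recursive sliding sums. The names and the window length K = 4 are assumptions, and samples before the start of the record are taken as zero. Feeding in a single impulse exposes the weights of the equivalent weighted average, which come out triangular even though only additions and subtractions are used.

```python
def sliding_sum(x, k):
    """Recursive sliding sum: s[n] = s[n-1] + x[n] - x[n-k]."""
    s = 0
    out = []
    for n, xn in enumerate(x):
        s += xn
        if n >= k:
            s -= x[n - k]
        out.append(s)
    return out

def cic2_sum(x, k):
    """2nd-order CIC sum modeled as a sliding sum of a sliding sum."""
    return sliding_sum(sliding_sum(x, k), k)

impulse = [1] + [0] * 9
weights = cic2_sum(impulse, 4)   # impulse response = effective weights
print(weights)
```

The triangular weighting is what gives the sinc^2 frequency response its 2nd-order zeros.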
Using FPGA as ADC
The Single Slope ADC
[Figure: two multi-channel readout chains. Conventional: signal source, shaper, line driver, ADC, FPGA. Single slope: signal source, shaper, comparison with the common VREF, TDC inside the FPGA.]
• The analog signal of each channel from the shaper is fed to a comparator and compared with a common ramping reference voltage VREF.
• Pulses, rather than analog signals, are transmitted on the cable.
• The times of the transitions, which represent the input voltage values, are digitized by TDC blocks inside the FPGA.
• This approach is sometimes (mistakenly) referred to as a “Wilkinson ADC”.
[Figure: reference ramp; input voltages V1 and V2 map to transition times T1 and T2.]
TDC Resolution Requirement
• Consider a sampling rate of 2 MHz: the whole ramping cycle is 500 ns.
• Allocate 409.6 ns for upward ramping.
• To achieve 12-bit ADC precision, the TDC LSB is (409.6 ns)/4096 = 100 ps.
• A TDC with a 100 ps LSB can be comfortably implemented in FPGA today.
[Figure: reference ramp over a 500 ns cycle; input voltages V1 and V2 map to transition times T1 and T2.]
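The arithmetic behind the TDC resolution requirement can be written out as a small sketch. The function name is illustrative, and times are kept in picoseconds to stay in exact numbers.

```python
def tdc_lsb_ps(ramp_time_ps, adc_bits):
    """TDC LSB needed so that the upward ramp is divided into 2**adc_bits steps."""
    return ramp_time_ps / (1 << adc_bits)

sampling_rate_hz = 2_000_000
cycle_ns = 1e9 / sampling_rate_hz   # one full ramping cycle: 500 ns
lsb_ps = tdc_lsb_ps(409_600, 12)    # 409.6 ns upward ramp, 12-bit precision
print(cycle_ns, lsb_ps)
```

The same function shows the trade-off directly: each extra bit of ADC precision halves the required TDC LSB for a fixed ramp time.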
Digital Noise During Digitization
• Typical ADC devices create digital noise that may interfere with the analog circuits.
• The time interval for resetting the common reference voltage may be noisy, but the analog signal is not sampled during it.
• There are no digital control activities while the common reference voltage is ramping up.
[Figure: ramp timing with T1/V1 and T2/V2; reset intervals marked Noisy, ramping intervals marked Clean; Shaper and ADC blocks.]
Single Slope ADC Test: Waveform Digitization
[Figure: input waveform overlapped with trigger and reference voltage (1 to 2.5 V vs. t, 2500 to 5500 ns), with leading and trailing ramps.]
[Figure: raw data (0 to 64 vs. 0 to 256) for the leading and trailing ramps, and the calibrated waveform.]
[Figure: test circuit: two TDC inputs in the FPGA, VREF, and components labeled 50, 50, 100 and 1000 pF.]
Shown here is a demo of a 6-bit single slope TDC.
Sampling rate in this test is 22 MHz.
Both leading and trailing reference ramps are used in this example.
Nonlinear reference ramping is OK. The measurement can be calibrated.
TDC Implemented with FPGA
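Multi-Sampling TDC
[Figure: block diagram inside the FPGA: the input is sampled with four clock phases c0, c90, c180, c270 (multiple sampling, Q0 to Q3), followed by clock domain changing (QF, QE, QD), transition detection & encode, and a coarse time counter; outputs DV, T0, T1, TS.]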
Ultra low-cost: 48 channels in $18.27 EP2C5Q208C7.
Sampling rate: 360 MHz x4 phases = 1.44 GHz.
LSB = 0.69 ns.
[Figure: placement of 4 channels in a Cyclone FPGA.]
Logic elements with non-critical timing are freely placed by the fitter of the compiler.
This picture represents a placement in a Cyclone FPGA.
TDC Using FPGA Logic Chain Delay
• This scheme uses current FPGA technology.
• Low-cost chip families can be used (e.g. EP2C8T144C6, $31.68).
• Fine TDC precision can be implemented in slow devices (e.g. 20 ps in a 400 MHz chip).
[Figure: delay chain driven by IN, with the register array clocked by CLK.]
Two Major Issues In a Free Operating FPGA
[Figure: bin width (ps, 0 to 180) vs. bin (0 to 64).]
1. The widths of the bins are different and vary with supply voltage and temperature.
2. Some bins are ultra-wide due to LAB boundary crossing.
Digital Calibration Using Twice-Recording Method
[Figure: long delay chain driven by IN, with the register array clocked by CLK.]
• Use a longer delay line.
• Some signals may be registered twice, at two consecutive clock edges.
• N2 - N1 = (1/f)/Δt, where 1/f is the clock period and Δt is the average bin width.
• The two measurements can be used:
  • to calibrate the delay;
  • to reduce digitization errors.
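The relation between the two recorded bin numbers and the average bin width can be turned around to calibrate the delay line. The sketch below is only an illustration under assumed names and values: it solves N2 - N1 = (1/f)/(average bin width) for the bin width and then, as an assumption, converts a bin number to time by plain scaling.

```python
def average_bin_width_ps(n1, n2, clock_period_ps):
    """Average bin width from one edge recorded at two consecutive clock edges."""
    if n2 <= n1:
        raise ValueError("n2 must be larger than n1")
    return clock_period_ps / (n2 - n1)

def bins_to_ps(n, n1, n2, clock_period_ps):
    """Convert a bin number to time using the average bin width (assumed linear scale)."""
    return n * average_bin_width_ps(n1, n2, clock_period_ps)

period_ps = 2500                                  # 400 MHz clock (assumed value)
width = average_bin_width_ps(10, 60, period_ps)   # 50 bins span one clock period
print(width, bins_to_ps(10, 10, 60, period_ps))
```

Because N2 - N1 is remeasured continuously, the scale follows supply voltage and temperature drifts. It remains an average calibration, not a bin-by-bin one.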
Digital Calibration Result
• The power supply voltage changes from 2.5 V to 1.8 V (about the same as 100 °C to 0 °C).
• The delay speed changes by 30%.
• The difference of the two TDC numbers reflects the delay speed.
[Figure: TDC Output at Different PS Voltage: TDC outputs N1 and N2 (0 to 25) and the corrected time Tc vs. VCCINT (1.5 to 2.5 V).]
[Equation: corrected time Tc expressed in terms of N0, N1, N2, L and T.]
Warning: the calibration is based on average bin width, not bin-by-bin widths.
Auto Calibration Using Histogram Method
• It provides a bin-by-bin calibration at a certain temperature.
• It is a turn-key solution (bin in, ps out).
• It is semi-continuous (auto update of the LUT every 16K events).
[Figure: DNL histogram (counts 0 to 2500 vs. bin 0 to 64) accumulated over 16K events and used to fill the LUT; the LUT maps In (bin) to Out (ps); bin widths (ps, 0 to 180) vs. bin (0 to 64).]
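A sketch of how a bin-to-picosecond lookup table could be built from a histogram. This follows the common code-density approach and is an assumption about the details, not the author's exact implementation: hits are assumed to arrive uniformly within the clock period, so each bin's width is proportional to its count, and the LUT entry is taken as the bin center.

```python
def build_lut(counts, clock_period_ps):
    """Return the center time (ps) of each bin, given per-bin hit counts."""
    total = sum(counts)
    lut = []
    edge = 0.0                                   # leading edge of the current bin
    for c in counts:
        width = clock_period_ps * c / total      # wider bins collect more hits
        lut.append(edge + width / 2)
        edge += width
    return lut

# Toy example: 3 bins, 8 events, 800 ps clock period (all assumed values).
lut = build_lut([1, 3, 4], 800)
print(lut)
```

In the scheme described above the histogram would be refreshed every 16K events, so the table would be rebuilt at that rate.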
Good, However
• Auto calibration solved some problems.
• However, it won’t eliminate the ultra-wide bins.
[Figure: bin width (ps, 0 to 180) vs. bin (0 to 64).]
Cell Delay-Based TDC + Wave Union Launcher
[Figure: Wave Union Launcher between the input In and the delay chain; register array clocked by CLK.]
The wave union launcher creates multiple logic transitions after receiving an input logic step.
The wave union launchers can be classified into two types:
• Finite Step Response (FSR)
• Infinite Step Response (ISR)
This is similar to the classification of filters or other linear systems:
• Finite Impulse Response (FIR)
• Infinite Impulse Response (IIR)
Wave Union Launcher A (FSR Type)
[Figure: Wave Union Launcher A between the input In and the delay chain clocked by CLK; control: 1 = Unleash, 0 = Hold.]
Wave Union Launcher A: 2 Measurements/hit
[Figure: the wave union pattern in the delay chain after 1: Unleash.]
Sub-dividing Ultra-wide Bins
[Figure: edges 1 and 2 of the wave union in the delay chain after 1: Unleash.]
Device: EP2C8T144C6
• Plain TDC: max. bin width 160 ps; average bin width 60 ps.
• Wave Union TDC A: max. bin width 65 ps; average bin width 30 ps.
[Figure: bin width (ps, 0 to 180) vs. bin (0 to 128) for the Plain TDC and the Wave Union TDC A.]
FPGA TDC
• A possible choice for the TDC is a delay line based architecture called the Wave Union TDC, implemented in FPGA.
• Shown here is an ASIC-like implementation in a 144-pin device.
• 18 channels (16 regular channels + 2 timing reference channels).
• This FPGA costs $28, or $1.75/channel. (AD9222: $5.06/channel)
• LSB ~ 60 ps.
• RMS resolution < 25 ps.
• Power consumption 1.3 W, or 81 mW/channel. (AD9222: 90 mW/channel)
[Figure: Wave Union Launcher A between the input In and the delay chain clocked by CLK.]
More Measurements
Two measurements are better than one. Let’s try 16 measurements?
Wave Union Launcher B (ISR Type)
[Figure: Wave Union Launcher B between the input In and the delay chain clocked by CLK; control: 1 = Oscillate, 0 = Hold.]
Wave Union Launcher B: 16 Measurements/hit
[Figure: 1 hit gives 16 measurements @ 400 MHz; shown for VCCINT = 1.20 V and VCCINT = 1.18 V.]
Delay Correction
[Figure: two plots vs. measurement index m (0 to 16): calibrated times (0 to 3000) and raw bin numbers (16 to 64).]
Delay Correction Process:
• Raw hits TN(m) in bins are first calibrated into TM(m) in picoseconds.
• Jumps are compensated for in the FPGA so that TM(m) become T0(m), which have the same value for each hit.
• Take the average of T0(m) to get better resolution:
$t_{0,\mathrm{av}} = \frac{1}{16} \sum_{m=0}^{15} t_0(m)$
The raw data contain:
• U-Type Jumps: [48-63] [16-31]
• V-Type Jumps: other small jumps.
• W-Type Jumps: [16-31] [48-63]
The processes are all done in FPGA.
The Test Module
Two NIM inputs
FPGA with 8ch TDC
Data Output via Ethernet
BNC adapter to add delay @ 150 ps step.
Test Result
[Figure: test setup: NIM inputs, LeCroy 429A NIM fan-out, NIM/LVDS converters, two groups of four Wave Union TDC B channels; BNC adapters add delays @ 140 ps step. Measured time differences at steps 0, 1, 2, spaced by 140 ps, with RMS 10 ps.]
Wave Union?
Photograph: Qi Ji, 2010
Coincidence in Trigger System
Parameters in Coincidence Finding
[Figure: coincidence chain block diagrams built from the blocks Disc, Edge Detecting, Delay, Pulse Stretch, Sampling and Coincidence Logic.]
Some Details
[Figure: detail of the chain Disc, Edge Detecting, Delay, Pulse Stretch, Sampling.]
Doublet Finding in Trigger System
Hit Matching
Software                            | FPGA, Typical                      | FPGA, Resource Saving Approaches
O(n^2): for(){ for(){…} }           | O(n)*O(N): Comparator Array        | Hash Sorter, O(n)*O(N): in RAM
O(n^3): for(){ for(){ for(){…} } }  | O(n)*O(N^2): CAM, Hough Trans.     | Tiny Triplet Finder, O(n)*O(N*logN)
O(n^4): four nested for() loops     |                                    |
Example of Doublet Match, PET
Positrons and electrons annihilate to produce pairs of photons. The back-to-back photons hit the detector at nearly the same time.
Detector hits are digitized and hits at nearly the same time are to be matched together.
The process takes O(n^2) clock cycles.
[Figure: comparator approach: time T and data D of Group 1 and Group 2 hits; the time difference is tested with T<A? and T>(-A)?.]
Hash Sorter
[Figure: Group 1 and Group 2 data D with key K passing through the hash sorter.]
• Pass 1: Data in Group 1 are stored in the hash sorter bins based on the key number K.
• Pass 2: Data in Group 2 are fetched through and paired up with the corresponding Group 1 data with the same key number K.
• The entire pairing process takes 2n clock cycles, rather than n^2 clock cycles.
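The two-pass pairing can be modeled in software with a dictionary standing in for the hash sorter bins. This is a sketch under assumptions: the function name, the (key, data) tuples and the sample values are illustrative, and the key K is assumed to be something like a coarse time stamp shared by matching hits.

```python
from collections import defaultdict

def match_pairs(group1, group2):
    """Pair up items from two groups that carry the same key K."""
    bins = defaultdict(list)
    for key, data in group1:           # Pass 1: store Group 1 data by key
        bins[key].append(data)
    pairs = []
    for key, data in group2:           # Pass 2: fetch the bin for each Group 2 key
        for partner in bins.get(key, []):
            pairs.append((key, partner, data))
    return pairs

g1 = [(3, "a"), (7, "b")]
g2 = [(7, "x"), (5, "y"), (3, "z")]
print(match_pairs(g1, g2))
```

Each group is traversed once, which corresponds to the 2n clock cycles quoted above, instead of comparing every Group 1 hit against every Group 2 hit.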
Conclusion
Many things can be done in FPGA beyond our imagination.
The End
Thanks
Delay Line Based TDC Architectures
[Figure: delay line based TDC architectures with HIT and CLK inputs: Delay Hit, Delay CLK, Delay Both; variants in which CLK is used as the clock and in which HIT is used as the clock.]
Only this architecture needs dual coarse time counters.
Digital Phase Follower
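[Figure: digital phase follower block diagram: the input In is sampled with clock phases c0, c90, c180, c270 (multiple sampling, Q0 to Q3), followed by clock domain changing (QF, QE, QD), transition detection, phase selection (SEL, was3is0, was0is3; Shift0, Shift2), a tri-speed shift register (b0, b1) and frame detection to Data Out.]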
The input data rate is 1bit/clock cycle. Four clock phases, c0, c90, c180 and c270 are used to detect input transition edge. The phase for data sample follows the variation of the transition edge.
Schematics of Digital Phase Follower
[Figure: schematics of the digital phase follower: top level with blocks phtrk1, DS5B and Word24_13z, and gate-level sheets built from DFF/DFFE registers, AND/OR/XOR/NOT gates, an lpm_counter and an lpm_mux; inputs IN1, CLK0, CLK90, CLK180, CLK270, EN; outputs include EE[3..0], QQ[11..0], JMP, WTN, BT, C1, C0, Q[4..0].]
CLK: 375 MHz. Data rate: 375 Mbits/s.
The Two-Cycle Serial IO
This scheme is slower than digital phase follower but the logic is simpler. The CLK1 and CLK2 can be generated with two free running crystal oscillators.
[Figure: timing diagram. Transmitter: CLK1 and Data Out (start bit = 1, b15, b14). Receiver: CLK2 and Data In (start bit = 1, X, b15, X, b14).]
• One data bit is transmitted every 2 clock cycles.
• A logic transition is detected between these two falling edges.
• Input data are stable at these clock edges.
Schematics of the Two-Cycle Serial IO
[Figure: schematic of the two-cycle serial IO: inputs CK200, CK100, DD[15..0], DRDY, SDIN, DV; outputs QQ[15..0], SDOUT, POPCMD, QQOK; built from lpm_shiftreg and lpm_counter blocks with DFF/DFFE registers and gates; timing diagram of SDIN, SEQ, SDINQ and the shifted-in bits SD15 to SD10.]
CLK: 200 MHz. Data rate: 100 Mbits/s.
The FM coding
• A bit is transmitted in two unit time intervals, usually in two internal clock cycles at frequency f.
• For bit=1, the output toggles each cycle, i.e., with frequency (f/2), and for bit=0, the output toggles every two cycles, i.e., with frequency (f/4).
• When not transmitting data, the output toggles at frequency (f/4) until seeing the start bit.
• The data stream is naturally DC balanced and suitable for AC coupled transmission.
• The polarity of the interconnection doesn’t matter.
[Figure: FM-coded waveform for the bit sequence 0, start bit = 1, 0, 0, 1, 1.]
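The FM coding rules can be sketched as a small encoder and decoder working on one level per clock cycle. This is an illustration only: the function names are assumptions, and the decoder assumes the bit boundaries are already known, which a real receiver would have to establish, for example from the idle pattern and the start bit.

```python
def fm_encode(bits, level=0):
    """Two clock cycles per bit: toggle at every bit boundary, plus mid-bit for a 1."""
    out = []
    for b in bits:
        level ^= 1            # every bit starts with a toggle
        out.append(level)
        if b:
            level ^= 1        # bit=1: a second toggle, so the output toggles each cycle
        out.append(level)
    return out

def fm_decode(levels):
    """A bit is 1 when the two cycles of its interval differ, else 0."""
    return [int(levels[i] != levels[i + 1]) for i in range(0, len(levels) - 1, 2)]

bits = [1, 0, 0, 1, 1]
wave = fm_encode(bits)
inverted = [1 - v for v in wave]
print(wave, fm_decode(wave), fm_decode(inverted))
```

Decoding the inverted waveform gives the same bits, which is the sense in which the polarity of the interconnection does not matter.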
Schematics of FM Decoder
[Figure: schematic of the FM decoder: inputs CK212, INA; outputs DV, DQ[17..0], PQ; built from DFF/DFFE registers, lpm_counter and lpm_decode blocks and gates; timing diagram of INAQ, INATOG, TOGCNT, SSETFCNT, BTK, CNTSHFT, OKSample, BitCNT, DV and the decoded bits DQ[17] to DQ[0], PQ.]
Logic 0: INA: 13.25 MHz or 8×CK212.
BitCNT: 13..31, init to 13×8+256 = 360.
CLK: 212 MHz. Data rate: 26.5 Mbits/s. The ratio of 8 CLK cycles/bit in this design is not an intrinsic limit.
The Clock-Command Combined Carrier Coding (C5)
• A data train contains 5 pulses and each pulse is transmitted in four unit time intervals, usually in four internal clock cycles at frequency f.
• Information is carried with wide, normal and narrow pulses, and the first pulse is always wide or narrow.
• When not transmitting data, all pulses have normal width.
• The data stream is DC balanced over 5 pulses and suitable for AC coupled transmission.
• All leading edges are evenly spread so that the pulse train can be used directly to drive the receiver side logic or PLL.
Schematics of C5 Decoder
[Figure: schematics of the C5 decoder: Delay block (inputs CC, C40; outputs T38, T58), Composer block (inputs CC, T38, T58; outputs Y[0..4], CmdValid, CmdBit[3..0]) built from DFF registers, NAND/AND/NOT gates, an lpm_dff and a modulus 5 lpm_counter, and a Cyclone altpll (inclk0 period 36.000 ns; c0 ratio 4/1, e0 ratio 1/1).]
Data rate: 36 ns/bit or 27.7 Mbits/s.
Internal clock: 111 MHz.
Measurement Result for Wave Union TDC A
[Figure: measurement chain with the labels Raw TDC, Histogram, LUT, 53 MHz, Separate Crystal and Wave Union Histogram.]
• Plain TDC: delta t RMS width: 40 ps; 25 ps single hit.
• Wave Union TDC A: delta t RMS width: 25 ps; 17 ps single hit.
[Figure: histogram (0 to 3500) of dt (1000 to 1500 ps) for Un-calibrated, Plain TDC and Wave Union TDC A.]
An Application in Liquid Argon Time Projection Chamber
Liquid Argon Time Projection Chamber
• Passing charged particles ionize the argon.
• Electric fields drift the electrons over meters to the wire chamber planes.
• Waveforms of all wires are digitized, which creates a large amount of data.
[Figure: data from BO detector of FNAL: drift time vs. wire number for the Induction #1, Induction #2 and Collection planes.]
Data Reduction on Liquid Argon TPC Data
• Hit waveforms in the TPC carry useful information.
• Digitizing the waveforms creates a large volume of data.
• Data reduction without losing useful information is necessary.
[Figure: data from BO detector of FNAL: drift time vs. wire number, and a digitized waveform (0 to 700) over samples 0 to 2000.]
Serial Communication between FPGA Devices
Classical Picture of Serial Communications
The parallel data is converted to serial bits driven by crystal oscillator X1 in the transmitter device.
The serial data stream is used to generate a recovered clock at the receiver device with a phase lock loop (PLL).
The recovered clock is used to drive the serial-to-parallel converter and store the data into a first-in-first-out (FIFO) buffer.
The FIFO buffer is used to transfer data from the recovered clock domain to the local clock domain generated by crystal oscillator X2.
[Figure: Parallel-to-Serial Converter driven by X1; PLL generating the Recovered Clock; Serial-to-Parallel Converter; FIFO; Local Logic driven by X2.]
Serial Data Receiving Without PLL etc.
• Generating a recovered clock with a PLL, VCO, VCXO etc. is an analog process and is not convenient in an FPGA, especially for applications with multiple receiving channels.
• There are purely digital methods to receive the serial data:
  • Digital Phase Follower: 1 bit/CLK
  • The Two-Cycle Serial IO: 1 bit/(2 CLK)
  • FM Encoder and Decoder: 1 bit/(2-16 CLK)
  • Clock-Command Combined Carrier Coding (C5): 4 bits/(20 CLK)
• The transmitter and receiver can be driven by two independent free running crystal oscillators.
[Figure: Parallel-to-Serial Converter driven by X1; Digital Serial-to-Parallel Converter and Local Logic driven by X2.]
See backup slides.
Triplet Finding in Trigger System
Hits, Hit Data & Triplets
• Hit data come out of the detector planes in random order.
• Hit data from 3 planes generated by the same particle track are organized together to form triplets.
• Three data items must satisfy the condition: xA + xC = 2 xB.
• A total of n^3 combinations must be checked (e.g. 5x5x5 = 125).
• Three layers of loops are needed if the process is implemented in software.
• Large silicon resources may be needed without careful planning: O(N^2).
[Figure: triplet finding among hits on Plane A, Plane B and Plane C.]
Tiny Triplet Finder Operations, Pass I: Filling Bit Arrays
• For any hit, fill a corresponding logic cell.
• xA + xC = 2 xB
• xA = -xC + constant
[Figure: physical planes and bit arrays/shifters. Note: flipped bit order.]
Tiny Triplet Finder Operations, Pass II: Making Match
• For any center plane hit, logically shift the bit array.
• Perform a bit-wise AND in this range.
• A triplet is found.
[Figure: physical planes and bit arrays/shifters.]
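The two passes can be modeled in software with Python integers as bit arrays. This is a sketch under assumptions rather than the hardware design: hit coordinates are taken as integers in 0 … n_bits-1, the plane C array is stored in flipped bit order, and the shift distance n_bits - 1 - 2·xB is derived here from the condition xA + xC = 2 xB. The function and variable names are illustrative.

```python
def find_triplets(hits_a, hits_b, hits_c, n_bits):
    """Find (xA, xB, xC) with xA + xC == 2 * xB using shift and bit-wise AND."""
    mask = (1 << n_bits) - 1
    # Pass I: fill the bit arrays (plane C in flipped bit order).
    array_a = 0
    for xa in hits_a:
        array_a |= 1 << xa
    array_c = 0
    for xc in hits_c:
        array_c |= 1 << (n_bits - 1 - xc)
    # Pass II: for each center plane hit, shift array A and AND it with array C.
    triplets = []
    for xb in hits_b:
        shift = n_bits - 1 - 2 * xb
        shifted_a = array_a << shift if shift >= 0 else array_a >> -shift
        match = shifted_a & array_c & mask
        pos = 0
        while match:
            if match & 1:       # bit pos set: xA = pos - shift, xC = n_bits - 1 - pos
                triplets.append((pos - shift, xb, n_bits - 1 - pos))
            match >>= 1
            pos += 1
    return triplets

found = find_triplets([0, 2, 7], [1, 4, 6], [2, 6, 5, 1], 8)
print(sorted(found))
```

One shift and one AND per center plane hit replace the inner two loops of the software search, which is where the saving in clock cycles comes from.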
Tiny? Yes, Tiny! Logic Cell Usage:
• AM, CAM, Hough Transform etc.: O(N^2)
• Tiny Triplet Finder: O(N*logN)
The triplet finding process for FPGA schemes takes 2n clock cycles. The Tiny Triplet Finder uses far fewer logic elements.
Tiny Triplet Finder: Reuse Coincident Logic via Shifting Hit Patterns
[Figure: detector layers C1, C2 and C3.]
• One set of coincident logic is implemented.
• For an arbitrary hit on C3, rotate, i.e., shift the hit patterns for C1 and C2 to search for coincidence.
Tiny Triplet Finder for Circular Tracks
[Figure: two bit array/shifter pairs with shift scale factors *R1/R3 and *R2/R3 feeding bit-wise coincident logic; triplet map output to decoder; hit map with both axes 0 to 128.]
1. Fill the C1 and C2 bit arrays. (n1 clock cycles)
2. Loop over C3 hits, shift bit arrays and check for coincidence. (n3 clock cycles)
Also works with more than 3 layers.
Link List Structure of Hash Sorter
[Figure: hash sorter built from an Index RAM, a Pointer RAM and a DATA RAM, with DIN, DOUT and key K.]
Using the hash sorter, matching pairs can be grouped together using 2n, rather than n^2, clock cycles.