Mixed-Signal Circuits for the Data-Driven World
Boris Murmann
September 8, 2015
Research Overview
Murmann Mixed-Signal Group
Student Support (Past and Present)
The Big Picture
Various terms for the same overarching trend
› Third paradigm
› Ubiquitous computing
› Internet of Everything
Fusion of Physical and Virtual Worlds
[Figure: hardware platforms (physical world) and software, networks (virtual world)]
The Internet of Everything
Source: Vivante Corporation
This Trend is Real and Here to Stay…
Source: Kim, ISSCC 2015, Keynote Talk
Needs in Mixed-Signal IC Design
All-time classics
› Low-power analog interfaces
› Low-power wireless and wireline links
› Efficient power management and energy harvesting
Frontiers
› Improved interfacing with the biological world
› In-sensor computing, machine learning
› Flexible and stretchable electronics
Overarching
› Boost in design productivity
Current Projects (1)
ADCs and wireline communication circuits
› High-resolution SAR ADC with loop-embedded buffer
› IF ADC for digital radar
› Wideband ADC for optical communications
› Analog equalization in ADC-based serial links
› Flash ADC for serial links
RF building blocks
› Compressed sensing RF receiver
› Low-rate training of PA pre-distorters
Power management
› Energy harvesting for CMOS imagers
Current Projects (2)
Instrumentation and sensing
› Finite-rate-of-innovation ultrasound interface
› Always-on image sensor
› Densely integrated 3-D ultrasound interface
› Label-free biomolecule detection
› Analog interfaces using stretchable organic TFTs
Custom circuits for machine learning
› Architecture optimization of deep neural networks
› Hardware implementation of deep neural networks
Design productivity
› Automatic layout generation for analog FinFET circuits
Drive Problem in High-Resolution SAR ADC
Tracking time is only a small fraction of the cycle
Large input capacitance (several picofarads)
Driver often consumes more power than the SAR ADC
Reference: Kramer, ISSCC 2015
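[Figure: signal → buffer → SAR ADC (sampling capacitor Cs) → Dout; timing diagram of V vs. t with tracking time Ttracking and conversion time Tconversion]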
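To see why the driver is so costly, a rough single-pole settling estimate helps. This is a sketch under stated assumptions, not the analysis from Kramer, ISSCC 2015: the 20 % tracking fraction and the 4 pF sampling capacitor are hypothetical values standing in for "a small fraction of the cycle" and "several picofarads"; only the 14-bit, 35 MS/s operating point is taken from the implementation described in this talk.

```python
import math

def settling_time_constants(n_bits):
    """Time constants a single-pole response needs to settle a full-scale step
    to within 1/2 LSB of an n_bits converter: exp(-t/tau) <= 2**-(n_bits + 1)."""
    return (n_bits + 1) * math.log(2)

def max_driver_resistance(n_bits, fs, tracking_fraction, c_s):
    """Largest driver output resistance (ohms) for which R * Cs settles in the tracking window."""
    t_track = tracking_fraction / fs                      # tracking time per conversion (s)
    tau_max = t_track / settling_time_constants(n_bits)   # largest allowed time constant (s)
    return tau_max / c_s

# Hypothetical operating point: 14 b, 35 MS/s, 20 % tracking fraction, 4 pF sampling capacitor.
r_max = max_driver_resistance(14, 35e6, 0.2, 4e-12)
print(f"R_out <= {r_max:.0f} ohm")  # prints: R_out <= 137 ohm
```

Under these assumptions the driver must settle a multi-picofarad load to 14-bit accuracy in under 6 ns through roughly 137 Ω or less, which illustrates how a conventional buffer can end up drawing more power than the ADC itself.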
Conceptual Block Diagram
[Figure: SAR loop with input Vin, T&H, buffer, DAC (output VDAC), residue Vres, and SAR logic]
Moving the Buffer Inside the Loop
Buffer is linearized by the loop gain of the SAR algorithm
Can use a power-efficient, weakly nonlinear source follower
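Why a weakly nonlinear buffer can be tolerated inside the loop can be sketched behaviorally. The model below assumes, as a simplification, that the same static, strictly increasing buffer transfer acts on both the input and the DAC voltage before the comparator; `source_follower` is a hypothetical curve, not the measured characteristic from Kramer, ISSCC 2015. Under that assumption each comparator decision, and therefore the final code, matches the ideal binary search.

```python
def sar_convert(vin, n_bits, vref=1.0, buffer=lambda v: v):
    """Binary-search (SAR) conversion of vin in [0, vref) to an n_bits code.

    `buffer` is a static transfer curve applied, in this simplified model, to both
    the input and the DAC voltage before the comparator.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)                 # tentatively set the next bit
        v_dac = trial * vref / (1 << n_bits)      # ideal binary-weighted DAC output
        if buffer(vin) >= buffer(v_dac):          # comparator decision
            code = trial                          # keep the bit
    return code

def source_follower(v):
    """Hypothetical weakly nonlinear but strictly increasing transfer curve on [0, 1]."""
    return 0.85 * v - 0.05 * v ** 3

for vin in (0.1234, 0.5, 0.7777):
    ideal = sar_convert(vin, 14)
    buffered = sar_convert(vin, 14, buffer=source_follower)
    print(vin, ideal, buffered, ideal == buffered)
```

If the nonlinearity had memory, or differed between the two paths, the comparisons would no longer be equivalent; this is consistent with the requirement that the buffer nonlinearities be static.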
Implementation (14b, 35MS/s)
Buffer nonlinearities must be “static” to prevent path mismatch
Reference: Kramer, ISSCC 2015
[Figure: implementation block diagram, annotated with 200 fF, 12.5 mW, and 30 mW]
Simulated Source-Follower Nonlinearity
[Plot: simulated fundamental and HD2 to HD5 (dBFS, 0 to −100) vs. f (10^6 to 10^8 Hz); fs/2 = 17.5 MHz marked]
Comparison
Metric | This Work | Kapusta, ISSCC’13 | Inerfield, VLSI’14
Architecture | SAR | TI-SAR | SAR
CMOS technology (nm) | 40 | 65 | 28
Resolution (bits) | 14 | 14 | 15
Sampling rate (MS/s) | 35 | 80 | 100
SNDR at low freq. (dB) | 75 | 73.6 | 71
SNDR at Nyquist (dB) | 74.4 | 71.3 | 67.1
SFDR at low freq. (dB) | 99 | 85 | 89.2
SFDR at Nyquist (dB) | 90 | 80 | 76
Integrated buffer | Yes | No | Yes
Power incl. reference (mW) | 42.5 | 35.1 | 8.0
Power of input buffer (mW) | 12 | - | Not reported
Power Distribution (54.5 mW total)
Future work should replace the current-steering DAC with a switched-capacitor (SC) DAC
Old School Signal Processing Pipeline
Severe bottlenecks going forward
› Energy of brute-force digitization
› Energy of sending bits to/through the cloud
[Figure: A/D → µP → cloud → information]
Research Opportunities
[Figure: analog-to-information → classification/inference → cloud → actionable output]
Generate only as many bits as needed to capture relevant information
Exploit the robustness of classification/inference algorithms to get by with
approximate computing and/or memory
Precursor of a New Direction – Motion Processor
Do not feed raw inertial sensor data to the main processor
Fuse sensors and output features (detected gestures, etc.)
Invensense MotionProcessing™ platform, www.invensense.com
Example: “Always-On” Vision Sensor
[Choi, ISSCC 2015]
Standard imaging pipeline optimized for
visually appealing images
How to approach fundamental power
limits for pure classification tasks?
Logarithmic Frontend for HOG Features
Acquisition of log gradients with 2–3-bit resolution is sufficient
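One reason a logarithmic frontend pairs well with coarse gradient quantization: in the log domain a gradient depends only on the ratio of neighboring intensities, so a global illumination change cancels out. The sketch below is a minimal 1-D illustration under assumptions of its own (hypothetical pixel values, a 3-bit uniform quantizer, a hypothetical clipping range `g_max`); it is not the circuit from the talk and it omits the orientation binning of full HOG features.

```python
import math

def log_gradients(row):
    """Horizontal log-domain gradients: log I[x+1] - log I[x] = log(I[x+1] / I[x])."""
    return [math.log(b / a) for a, b in zip(row, row[1:])]

def quantize(g, bits=3, g_max=1.0):
    """Clip g to [-g_max, g_max] and map it uniformly to an integer code in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    g = max(-g_max, min(g_max, g))
    step = 2 * g_max / (levels - 1)
    return min(levels - 1, max(0, round((g + g_max) / step)))

row = [10.0, 20.0, 20.0, 5.0, 40.0]        # hypothetical pixel intensities
brighter = [2 * p for p in row]             # same scene under 2x illumination
codes = [quantize(g) for g in log_gradients(row)]
codes_bright = [quantize(g) for g in log_gradients(brighter)]
print(codes, codes_bright)                  # identical code sequences
```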
Example: Ultrasound Imaging
[Analog Devices]
20 frames/s
256 channels × 50 MS/s × 12 bits = 154 Gb/s (!)
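The headline number follows directly from the channel count, sampling rate, and resolution. The short check below assumes every channel is digitized continuously, so the frame rate does not enter the product.

```python
def raw_data_rate(channels, sample_rate, bits):
    """Aggregate raw ADC output rate in bit/s when every channel is digitized continuously."""
    return channels * sample_rate * bits

rate = raw_data_rate(256, 50e6, 12)
print(f"{rate / 1e9:.1f} Gb/s")  # prints: 153.6 Gb/s (~154 Gb/s)
```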
Transducer Signal Spectrum
[Plot: normalized transducer signal spectrum (dB, 0 to −70) vs. f (0 to 7 MHz)]
Nyquist: fs > 2·7.5 MHz
With margin for filter roll-off: fs ~ 30–50 MHz
Time Domain Signal
[Plot: transducer signal (a.u.) vs. time (0 to 120 µs) along one scan line (depth axis)]
Time Domain Signal (Zoom)
Reflections carry the transducer’s pulse response signature
There are really just two pieces of information in each reflection:
Arrival time and amplitude
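In finite-rate-of-innovation terms, this observation can be written as a pulse-stream model. The symbols below (K reflections, amplitudes a_k, arrival times t_k, known pulse response h(t), scan-line duration T) are introduced here for illustration:

```latex
x(t) = \sum_{k=1}^{K} a_k \, h(t - t_k), \qquad 0 \le t < T,
\qquad \rho = \frac{2K}{T}
```

Each scan line then carries only 2K unknowns, so its rate of innovation ρ is set by the number of reflections rather than by the bandwidth of h(t).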
[Plot: zoomed transducer signal (a.u.) vs. time (µs), two panels: 79–83 µs and 105–108 µs]
Yet, we sample at Nyquist and accurately digitize the (known) pulse
response of the transducer over, and over, and over again…
Innovation Rate Sampling
[Figure: input and signal after filter; conventional (Nyquist) sampling vs. innovation-rate sampling]
Sample ~50x below Nyquist
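A textbook way to recover arrival times and amplitudes from very few samples is the annihilating-filter method for a stream of Diracs. The sketch below assumes the known pulse shape has already been removed and that noiseless Fourier-series coefficients are available; the echo positions and amplitudes are hypothetical, and this is not claimed to be the algorithm or hardware of the project above.

```python
import numpy as np

def dirac_stream_coeffs(amps, times, T, M):
    """Fourier-series coefficients X[m], m = -M..M, of x(t) = sum_k a_k delta(t - t_k) on [0, T)."""
    m = np.arange(-M, M + 1)
    return (amps[None, :] * np.exp(-2j * np.pi * np.outer(m, times) / T)).sum(axis=1) / T

def annihilating_filter(X, M, K, T):
    """Recover K arrival times and amplitudes from the 2M + 1 coefficients X (requires M >= K).

    X[i] holds the coefficient for m = i - M.
    """
    rows = range(-M + K, M + 1)
    # Annihilation: sum_{l=0..K} h[l] X[m - l] = 0 with h[0] = 1; solve for h[1..K].
    A = np.array([[X[m - l + M] for l in range(1, K + 1)] for m in rows])
    b = -np.array([X[m + M] for m in rows])
    h = np.linalg.lstsq(A, b, rcond=None)[0]
    # The filter's roots are u_k = exp(-2j*pi*t_k/T); their phases give the arrival times.
    u = np.roots(np.concatenate(([1.0], h)))
    times = np.mod(-np.angle(u) * T / (2 * np.pi), T)
    # Amplitudes from the Vandermonde system T * X[m] = sum_k a_k * u_k**m.
    V = u[None, :] ** np.arange(-M, M + 1)[:, None]
    amps = np.linalg.lstsq(V, T * X, rcond=None)[0]
    order = np.argsort(times)
    return times[order], amps[order].real

# Hypothetical scan line: two echoes, noiseless coefficients, 2K + 1 = 5 of them.
T = 1.0
X = dirac_stream_coeffs(np.array([1.0, 0.5]), np.array([0.2, 0.55]), T, M=2)
t_est, a_est = annihilating_filter(X, M=2, K=2, T=T)
print(t_est, a_est)
```

Five noiseless coefficients are enough here to recover the four unknowns of two echoes; with measurement noise, more coefficients and a denoising step would be needed.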
Example: Digital Predistorter Training
Acquire a wideband spectrum and digitize it at GS/s rates just to estimate/track 50–100 parameters?
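[Figure: DPD training loop with x[n], DPD, z[n], DAC, PA, upconversion e^{iΩc t}, downconversion e^{−iΩc t}, ADC, y[n], and model extraction (10–50 parameters); 100 MHz signal bandwidth vs. 700 MHz observation bandwidth; ADC: fs ~ 1.4 GS/s, BW ~ 700 MHz, resolution ~12 bits]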
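The cited ISCAS 2014 work identifies a memory polynomial, y[n] = Σ_k Σ_m a_{k,m} x[n−m] |x[n−m]|^(k−1). For reference, the sketch below shows a conventional full-rate least-squares identification of such a model (assumed here as the baseline); it is not the low-rate method of Hammler et al. The nonlinearity order K, memory depth M, and the test signals are hypothetical. The model has only K·M coefficients (15 here, inside the 10–50 range shown), regardless of the acquisition bandwidth.

```python
import numpy as np

def memory_polynomial_basis(x, K, M):
    """Regression matrix with columns x[n-m] * |x[n-m]|**(k-1), k = 1..K, m = 0..M-1.

    Samples before n = 0 are taken as zero.
    """
    N = len(x)
    cols = []
    for k in range(1, K + 1):
        for m in range(M):
            xd = np.concatenate((np.zeros(m, dtype=complex), x[:N - m]))  # x delayed by m
            cols.append(xd * np.abs(xd) ** (k - 1))
    return np.column_stack(cols)

def identify_memory_polynomial(x, y, K, M):
    """Least-squares estimate of the K*M complex coefficients from full-rate samples."""
    Phi = memory_polynomial_basis(x, K, M)
    return np.linalg.lstsq(Phi, y, rcond=None)[0]

# Hypothetical baseband test signal and "true" PA coefficients (K = 5, M = 3).
rng = np.random.default_rng(0)
x = (rng.standard_normal(4000) + 1j * rng.standard_normal(4000)) / np.sqrt(2)
a_true = rng.standard_normal(15) + 1j * rng.standard_normal(15)
y = memory_polynomial_basis(x, 5, 3) @ a_true
a_est = identify_memory_polynomial(x, y, 5, 3)
```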
Sampling the Spectrum, not the Time-Domain Signal
Obtain spectrum samples according to the “degrees of freedom”
Independent of the signal bandwidth and the Nyquist rate
[Figure: spectrum samples along f, obtained by downconversion]
Implementation
N. Hammler, Y.C. Eldar, and B. Murmann, “Low-Rate Identification of Memory Polynomials,” ISCAS 2014.
[Figure: implementation with windowed integration followed by slow ADCs]
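One way to read "windowed integration + slow ADCs": mixing the input down by a chosen frequency f0 and integrating over a window yields a single number, the spectrum value at f0, which a slow ADC can then digitize. The discrete-time sketch below assumes a rectangular window and hypothetical test tones and sample rate; it is not the circuit of Hammler et al.

```python
import cmath
import math

def spectrum_sample(x, fs, f0):
    """One spectrum sample: mix x down by f0 (multiply by exp(-j*2*pi*f0*t)),
    then integrate (average) over a rectangular window spanning all of x."""
    acc = 0j
    for n, xn in enumerate(x):
        acc += xn * cmath.exp(-2j * math.pi * f0 * n / fs)
    return acc / len(x)

# Hypothetical input: tones at 50 Hz (amplitude 1) and 120 Hz (amplitude 0.3), 1 s window.
fs = 1000.0
x = [math.cos(2 * math.pi * 50 * n / fs) + 0.3 * math.cos(2 * math.pi * 120 * n / fs)
     for n in range(1000)]
for f0 in (50.0, 80.0, 120.0):
    print(f0, abs(spectrum_sample(x, fs, f0)))   # about 0.5, 0.0, 0.15
```

Each output is one low-rate number per window, independent of how wide the input band is; only as many such samples are needed as the model has degrees of freedom.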
Identification of an LDMOS Doherty PA
10 MHz WCDMA input
1024-fold fs reduction
(92.16 MHz to 90 kHz)
(Measured result)
N. Hammler, Y.C. Eldar, and B. Murmann, “Low-Rate Identification of Memory Polynomials,” ISCAS 2014.
[Plot: NMSE (dB) vs. number of measurements M (10^2 to 10^5), conventional vs. proposed method; data tips at M = 46,080 (−39.48 dB) and M = 95 (−38.51 dB)]
Compressed Sensing RF Receiver
[Figure: x(t) is multiplied by p(t), a Tp-periodic binary sequence, and sampled to produce y[n]]
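$$\begin{bmatrix} Y_1(e^{j\omega}) \\ Y_2(e^{j\omega}) \\ Y_3(e^{j\omega}) \end{bmatrix} = \begin{bmatrix} P_a & P_b & P_c \\ P_a' & P_b' & P_c' \\ P_a'' & P_b'' & P_c'' \end{bmatrix} \begin{bmatrix} X(j\omega_a) \\ X(j\omega_b) \\ X(j\omega_c) \end{bmatrix}$$
[Figure: x(t) mixed with p(t), p′(t), and p″(t) in three parallel branches, producing y1[n], y2[n], y3[n]]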
Yonina Eldar’s Modulated Wideband Converter
[Mishali, Eldar, “From Theory to Practice…”, 2010]
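The matrix equation says each branch output is a different linear combination of the same spectral slices, so with as many independent branches as occupied slices the slices follow from inverting P. The sketch below assumes three occupied slices at hypothetical harmonics 1, 2, 3, hypothetical length-16 ±1 sequences, and mixing coefficients taken as each sequence's DFT (the chip pulse-shape factor is omitted because it only rescales columns).

```python
import numpy as np

def mixing_matrix(seqs, harmonics):
    """P[i, j]: DFT coefficient (scaled by 1/L) of periodic sequence i at harmonics[j]."""
    L = seqs.shape[1]
    return np.fft.fft(seqs, axis=1)[:, harmonics] / L

def recover_slices(P, Y):
    """Solve Y = P X for the spectral slices X (square, invertible P assumed)."""
    return np.linalg.solve(P, Y)

rng = np.random.default_rng(1)
harmonics = [1, 2, 3]                        # hypothetical occupied slices a, b, c
for _ in range(100):                         # redraw until the 3x3 system is well conditioned
    seqs = rng.choice([-1.0, 1.0], size=(3, 16))
    P = mixing_matrix(seqs, harmonics)
    if np.linalg.cond(P) < 1e3:
        break
else:
    raise RuntimeError("no well-conditioned sequence set found")

X_true = np.array([1.0 + 0.5j, -0.3j, 0.8])  # X(j w_a), X(j w_b), X(j w_c) at one frequency
Y = P @ X_true                               # branch outputs Y_1, Y_2, Y_3
X_est = recover_slices(P, Y)
```

With more candidate slices than branches the system becomes underdetermined, and recovery then relies on the sparsity of the occupied spectrum (compressed sensing), which this square example does not show.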
Spectrum Sensing Using Frequency-Diverse Sequence
[Figure: signal spectrum X(jω) and sequence spectrum P(jω); the mixer output spectrum X(jω) ∗ P(jω) is observed at baseband (BB)]
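Mixing multiplies in time and therefore convolves the spectra. The discrete-time sketch below checks this identity for a hypothetical random input and a hypothetical ±1 sequence, computing the mixer output spectrum as an explicit circular convolution.

```python
import numpy as np

def mixer_spectrum(x, p):
    """DFT of the mixer output x[n] * p[n], computed as the circular convolution of the
    input and sequence spectra: Z[k] = (1/N) * sum_m X[m] * P[(k - m) mod N]."""
    N = len(x)
    X, P = np.fft.fft(x), np.fft.fft(p)
    m = np.arange(N)
    return np.array([np.sum(X * P[(k - m) % N]) for k in range(N)]) / N

rng = np.random.default_rng(2)
x = rng.standard_normal(64)                 # hypothetical wideband input
p = rng.choice([-1.0, 1.0], size=64)        # hypothetical +/-1 sequence (one period)
Z = mixer_spectrum(x, p)
```

In this sketch the baseband bin is Z[0] = (1/N) Σ_m X[m] P[−m], so content at input bin m reaches baseband scaled by the sequence coefficient P[−m]. A sequence whose spectrum is small at a blocker's frequency therefore attenuates that blocker, which is one way to read the motivation for an optimized sequence.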
Optimized Sequence for Reception
[Figure: X(jω), optimized sequence spectrum P(jω), and mixer output X(jω) ∗ P(jω) at baseband (BB)]
Hardware Based Harmonic Cancellation
Sequence design offers limited blocker rejection
The nth harmonic of a periodic signal shifted in time by τ is phase-shifted by nωτ
› Choose τ such that nωτ = kπ, with k odd
The nth harmonic of p(t + τ) is then 180° out of phase with that of p(t)
› The sum has no nth harmonic
This is a generalization of a technique used in power inverters*
*[Kassakian, Schlecht, Verghese. “Principles …” 1991]
[Figure: p(t) with nth-harmonic coefficient c_n and p(t + τ) with c_n e^{jnωτ}; their sum c_n + c_n e^{jnωτ} vanishes when the phase shift is 180°]
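The cancellation condition can be checked numerically. The sketch below uses a hypothetical ±1 square wave sampled with 120 points per period and cancels the 3rd harmonic by choosing τ = T/6, so that 3ωτ = π; it illustrates the principle only and does not model the prototype's delay circuit.

```python
import numpy as np

def harmonic_cancellation(N=120, n_cancel=3):
    """Harmonic magnitudes of a +/-1 square wave p and of p(t) + p(t + tau), tau = T / (2 n_cancel).

    With tau = T / (2n), the nth harmonic of p(t + tau) is rotated by n * w * tau = pi,
    so it cancels in the sum (as do its odd multiples 3n, 5n, ...).
    N is the number of samples per period.
    """
    if N % (2 * n_cancel):
        raise ValueError("N must be divisible by 2 * n_cancel")
    p = np.where(np.arange(N) < N // 2, 1.0, -1.0)   # one period of the square wave
    shift = N // (2 * n_cancel)                      # tau in samples
    p_sum = p + np.roll(p, -shift)                   # np.roll(p, -s)[n] == p[n + s], i.e. p(t + tau)
    return np.abs(np.fft.fft(p)) / N, np.abs(np.fft.fft(p_sum)) / N

mag_p, mag_sum = harmonic_cancellation()
for k in (1, 3, 5, 9):
    print(f"harmonic {k}: p alone {mag_p[k]:.3f}, sum {mag_sum[k]:.3f}")
```

The 3rd and 9th harmonics vanish, while the 5th is scaled by |1 + e^{j5π/3}| = √3 rather than removed, which is why each delay targets one specific harmonic.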
Block Diagram of Prototype IC
Five identical branches
› Four active branches
› A rotating calibration branch
A test tone is used to calibrate
the mixing coefficients and the
harmonic cancellation delay
Design Specifications
Parameter | Target
Target frequency band | 699–915 MHz
Channel bandwidth | 1.5 MHz
Sequence-based harmonic rejection | 30–40 dB
Harmonic-cancellation-based rejection | 30–40 dB
Digital-to-Delay Converter (τ Block)
Coarse adjustment by shifting the data in the shift register
› Adjustment range: 0–667 ns, in increments of 500 ps
Fine adjustment by switching extra capacitance in or out
› Adjustment range: 0–594 ps, in increments of 2 ps
[Figure: coarse stage (500 ps resolution) and fine stage (2 ps resolution)]
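How a target delay could be split across the two stages can be sketched with integer picoseconds. The step sizes and ranges come from the bullets above; the splitting rule itself (fill the coarse stage first, then round the remainder to the fine grid) is an assumption for illustration, not the prototype's documented calibration procedure.

```python
COARSE_STEP_PS = 500      # coarse step: one shift-register position
FINE_STEP_PS = 2          # fine step: one switched-capacitance increment
COARSE_MAX_PS = 667_000   # coarse range 0-667 ns
FINE_MAX_PS = 594         # fine range 0-594 ps

def delay_codes(target_ps):
    """Split a target delay (integer ps) into (coarse, fine) codes."""
    if not 0 <= target_ps <= COARSE_MAX_PS + FINE_MAX_PS:
        raise ValueError("target delay out of range")
    coarse = min(target_ps // COARSE_STEP_PS, COARSE_MAX_PS // COARSE_STEP_PS)
    remainder = target_ps - coarse * COARSE_STEP_PS
    fine = min(round(remainder / FINE_STEP_PS), FINE_MAX_PS // FINE_STEP_PS)
    return coarse, fine

def realized_delay_ps(coarse, fine):
    """Delay produced by a (coarse, fine) code pair."""
    return coarse * COARSE_STEP_PS + fine * FINE_STEP_PS

print(delay_codes(1234), realized_delay_ps(*delay_codes(1234)))  # prints: (2, 117) 1234
```

Because the fine range (594 ps) exceeds one coarse step (500 ps), the two stages overlap, and any target in range is reached to within one ps under this splitting rule.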
Compute Fabric for Convolutional Neural Networks
Lane tracking and similar applications burn ~100 W on a GPU board
A. Coates, 2014
43
State-of-the-Art Custom Hardware
Main idea: Bring memory closer to computation
Read from 256-bit, 36 MB eDRAM: ~19 pJ (off-chip DRAM is ~6 nJ)
Advancements in 3D integration will improve this further
At some point, we should become compute-limited
› Opportunity for analog computing?
Reference: Y. Chen, MICRO 2014
Area: 67.7 mm²
Digital Multiplier
Charge Domain Dot Product Kernel
~5x energy reduction over
custom digital
[Figure: two binary-weighted capacitor arrays (8Cu, 4Cu, 2Cu, Cu) with SIGN control and VDD/2; charges QP and QN form the differential output vOD]
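Example: 12 × 9
[Figure: five binary-weighted capacitor arrays (8Cu, 4Cu, 2Cu, Cu) with control signals VREF, RESET, INW, SHARE, INX; annotation: logical inverse of 9]
$$Q_W = 12\,C_u V_{ref}, \qquad V_W = \frac{12}{15} V_{ref}$$
$$Q_{WX} = \frac{12}{15} V_{ref} \cdot 9\,C_u, \qquad V_{WX} = \frac{12}{15} \cdot \frac{9}{15} V_{ref}$$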
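The 12 × 9 example can be reproduced with an ideal charge-sharing model using exact fractions. This sketch assumes ideal capacitors with no parasitics, ignores the SIGN/differential path, and assumes that the (15 − x)Cu capacitors (the logical inverse of x) hold no charge during the second sharing step. The averaging helper is a further assumption about how products from several cells could be accumulated by shorting equal capacitances; it is not taken from the slide.

```python
from fractions import Fraction

ARRAY_UNITS = 15  # 8Cu + 4Cu + 2Cu + Cu: total capacitance of one 4-bit array, in units of Cu

def charge_domain_multiply(w, x, v_ref=Fraction(1)):
    """Ideal two-step charge-sharing product of 4-bit codes w and x (returns V_WX).

    Step 1: w*Cu charged to v_ref, shared over 15Cu  -> V_W  = (w/15) * v_ref
    Step 2: x*Cu charged to V_W, shared over 15Cu    -> V_WX = (w/15) * (x/15) * v_ref
    Assumption: the other (15 - x)*Cu, i.e. the logical inverse of x, hold no charge.
    """
    if not (0 <= w <= 15 and 0 <= x <= 15):
        raise ValueError("w and x must be 4-bit codes")
    q_w = w * v_ref                     # Q_W in units of Cu * V
    v_w = q_w / ARRAY_UNITS
    q_wx = v_w * x                      # Q_WX in units of Cu * V
    return q_wx / ARRAY_UNITS

def charge_share_average(voltages):
    """Shorting N equal capacitances together leaves their mean voltage."""
    return sum(voltages, Fraction(0)) / len(voltages)

print(charge_domain_multiply(12, 9))   # prints: 12/25, i.e. (12/15)*(9/15) = 108/225
```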
Charge Domain Dot Product Energy Breakdown
The Machine Learning Team: Top-Down Approach
[Figure: N multiply cells with digital inputs DX,i and DY,i (i = 1, …, N); their combined output VX·Y is digitized by an ADC to DOUT,X·Y; ALUs. Layers: architecture; digital, memory; analog]
Summary
The transition to the “third paradigm” brings interesting
opportunities and challenges
› Mixed-signal IC design is no exception
Improving mixed-signal components is still important
› But we must increasingly follow a system/application-guided
approach for large returns