30.3 - A 360-Channel Speech Preprocessor that Emulates the ... · Figure 30.3.4: Automatic gain...

3
556 2006 IEEE International Solid-State Circuits Conference ISSCC 2006 / SESSION 30 / SILICON FOR BIOLOGY / 30.3 30.3 A 360-Channel Speech Preprocessor that Emulates the Cochlear Amplifier Bo Wen, Kwabena Boahen University of Pennsylvania, Philadelphia, PA Demand for speech recognition in portable devices, such as cell- phones and PDAs, remains unmet. These applications will undoubtedly benefit from a front-end that functions like the bio- logical cochlea: hundreds of digital output channels, similar to the auditory nerve, for fine frequency discrimination and fre- quency-selective automatic gain control (AGC) to cope with large input dynamic range and combat noise. Mixed analog-digital cochlea-like front-ends (silicon cochleae) have promise in provid- ing these desirable features in real-time at low power. The first silicon cochlea, developed by Lyon and Mead [1], employed a cascade of second-order low-pass filters (LPFs) with exponentially decreasing resonant frequencies. However, the cas- cade structure accumulates noise and lacks fault-tolerance. Connecting filter banks in parallel, rather than in series, prom- ised to eliminate these problems inherent in the cascade [2]. However, another problem emerges: destructive interference occurs when outputs are combined due to the large phase change at resonance [3]. To address the limitations of current silicon cochlea designs, inspiration was sought from the biological cochlea, which exhibits exquisite sensitivity and selectivity in detecting and analyzing sound. The cochlea achieves this performance through an active mechanism (often referred to as the cochlear amplifier) that is still not understood. However, its microanatomy provides clues, based on which we previously proposed active bidirectional cou- pling (ABC) as the amplification mechanism [4]. A silicon cochlea implementation is described here that uses ABC to overcome the limitations of existing architectures. In essence, our architecture is the first to employ negative damping (i.e., active behavior) instead of undamping (i.e., passive behavior). A version of our novel cochlear architecture was fabricated with two 360×13 diffusive grids and 360 second-order sections, each driving 6 pulse-frequency modulators (PFMs) (Fig. 30.3.1). The second-order sections model the stiffness (S), damping (β), and mass (M) of the basilar membrane (BM), the main vibrating organ in the cochlea. The BM interacts with the cochlear fluid, whose motion is governed by Laplace’s equation, solved by the diffusive grid [5]. In our current-mode design, a current (I in ) represents the fluid’s velocity potential, whose spatial derivative is the fluid velocity, and another (I mem ) the BM’s velocity (see Fig. 30.3.1). I in and I mem are related (with physical analogs) by 2 I in /t 2 = SI mem + βI mem /t + M2 I mem /t 2 . This relation is real- ized using two interacting first-order LPFs, as described by: τ 1 I s s + I s = I o I in , τ 2 I o s + I o = I in bI s , and I mem = I in + I s I o . The phys- ical analogs are thus obtained as S = (b+1)/τ 1 τ 2 , β = (τ 1 +τ 2 )/τ 1 τ 2 , and M = 1, with τ 1 and τ 2 increasing exponentially from section to section, to simulate the BM’s nonuniform stiffness and damping. For a given choice of τ 1 and τ 2 (capacitance limited by silicon area), the gain factor b increases the quality factor Q = (b+1) / {(τ 1 /τ 2 ) + (τ 2 /τ 1 )}. ABC is included in the cochlea design by exchanging currents between neighboring second-order sections (see Fig. 30.3.1). The second LPF thus becomes τ 2 I o s + I o = I in bI s r fb (b+1)T(I sb ) + r ff (b+1)T(I sf ), where I sf and I sb are the outputs of the first LPF in the upstream and downstream neighbors, respectively; r ff and r fb are the coupling strength in each direction; and T(I s ) = I s I sat /(I s + I sat ), accomplished by a current-limiting transistor, which sets the sat- uration level I sat (controlled by V sat ). A log-domain Class AB second-order section that constrains the common-mode dynamically, thus avoiding instability associated with negative feedback correction, was synthesized (Fig. 30.3.2). Following the approach in [6], a first-order LPF is described by τ(I out + I out )s + (I out + I out ) = I in + I in and τI out + I out + I out + I out = I q 2 , where I q (controlled by V q ) sets the quiescent current. Hence, the positive path’s nodal equation is CdV out + /dt = I τ {(I in + I in ) – (I out + I q 2 /I out + )}/(I out + + I out ), where I τ (controlled by V τ ) sets the time con- stant (τ = Cu T /κI τ ; κ is the subthreshold slope coefficient and u T is the thermal voltage). The negative path simply swaps the super- scripts + and . These two paths interact in push–pull. To obtain sinusoidal current as the input to the BM circuit, we set the dif- ferential voltages applied to the diffusive grids to be the loga- rithm of a half-wave rectified sinusoid (with appropriate DC off- set). I mem is encoded by the PFMs, whose digital outputs are transmitted through an address event interface [7]. Tuned to frequencies from 210Hz to 14kHz, the chip exhibits rel- atively large amplification (Fig. 30.3.3) and compression at high input intensities (Fig. 30.3.4). Four octave-spaced pure tones elic- it maximum responses at monotonically increasing channels. Increasing the input current level from 0 to 48dB yields 24dB compression at the peak; frequency tuning becomes broader as well, with Q 10 = 1.84 (i.e., f peak over f 10dB , the width 10dB below the peak) at 0dB and 1.14 at 48dB; phase accumulation remains the same. Thus, ABC sharpens frequency tuning and increases dynamic range. Further, the chip processes natural sounds in real time (Fig. 30.3.5). A VLSI implementation of a 2D nonlinear cochlear model is pre- sented that utilizes a novel active mechanism, ABC, which ampli- fies the traveling wave to a degree that decreases with input intensity, thus realizing frequency-selective AGC (Fig. 30.3.6 summarizes the specifications; Fig. 30.3.7 shows the die micro- graph). Rather than detecting the wave’s amplitude and imple- menting a control loop, our biomorphic architecture simply employs nonlinear interactions between adjacent neighbors, emulating the cochlear amplifier. In addition to AGC, this cochlear amplifier nonlinearity is thought to suppress weaker tones in favor of stronger ones, thereby enhancing formant per- ception in noisy environments. These biological features of our silicon cochlea are desirable in speech recognition systems that seek to match biological performance. References: [1] R. F. Lyon and C. A. Mead, “An Analog Electronic Cochlea,” IEEE Trans. Acoust. Speech and Signal Proc., vol. 36, pp. 1119-1134, 1988. [2] L. Watts, “Cochlear Mechanics: Analysis and Analog VLSI,” Ph.D. the- sis, Pasadena, CA, Caltech, 1993. [3] E. Fragnière, “A 100-Channel Analog CMOS Auditory Filter Bank for Speech Recognition,” ISSCC Dig. Tech. Papers, pp. 140-141, Feb., 2005. [4] B. Wen and K. Boahen, “A Linear Cochlear Model with Active Bi-direc- tional Coupling,” IEEE EMBS, pp. 2013-2016, 2003. [5] A. G. Andreou and K. A. Boahen, “Translinear Circuits in Subthreshold MOS,” Journal of Analog Integrated Circuits and Signal Processing, vol. 9, pp. 141-166, 1996. [6] K. Zaghloul and K. A. Boahen, “An On-Off Log-Domain Circuit that Recreates Adaptive Filtering in the Retina,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 1, pp. 99-107, Jan., 2005. [7] K. Boahen, “A Burst-Mode Word-Serial Address-Event Channel-I: Transmitter Design,” IEEE Transactions on Circuits and Systems I, vol. 51, no. 7, pp. 1269-1280, July, 2004. 1-4244-0079-1/06/$20.00 ©2006 IEEE

Transcript of 30.3 - A 360-Channel Speech Preprocessor that Emulates the ... · Figure 30.3.4: Automatic gain...

Page 1: 30.3 - A 360-Channel Speech Preprocessor that Emulates the ... · Figure 30.3.4: Automatic gain control. Chirp (from 16k to 200Hz, 1.5 s) Click (0.1 s) Supply voltages Analog: 2.4V;

556 • 2006 IEEE International Solid-State Circuits Conference

ISSCC 2006 / SESSION 30 / SILICON FOR BIOLOGY / 30.3

30.3 A 360-Channel Speech Preprocessor that Emulates the Cochlear Amplifier

Bo Wen, Kwabena Boahen

University of Pennsylvania, Philadelphia, PA

Demand for speech recognition in portable devices, such as cell-phones and PDAs, remains unmet. These applications willundoubtedly benefit from a front-end that functions like the bio-logical cochlea: hundreds of digital output channels, similar tothe auditory nerve, for fine frequency discrimination and fre-quency-selective automatic gain control (AGC) to cope with largeinput dynamic range and combat noise. Mixed analog-digitalcochlea-like front-ends (silicon cochleae) have promise in provid-ing these desirable features in real-time at low power.

The first silicon cochlea, developed by Lyon and Mead [1],employed a cascade of second-order low-pass filters (LPFs) withexponentially decreasing resonant frequencies. However, the cas-cade structure accumulates noise and lacks fault-tolerance.Connecting filter banks in parallel, rather than in series, prom-ised to eliminate these problems inherent in the cascade [2].However, another problem emerges: destructive interferenceoccurs when outputs are combined due to the large phase changeat resonance [3].

To address the limitations of current silicon cochlea designs,inspiration was sought from the biological cochlea, which exhibitsexquisite sensitivity and selectivity in detecting and analyzingsound. The cochlea achieves this performance through an activemechanism (often referred to as the cochlear amplifier) that isstill not understood. However, its microanatomy provides clues,based on which we previously proposed active bidirectional cou-pling (ABC) as the amplification mechanism [4]. A silicon cochleaimplementation is described here that uses ABC to overcome thelimitations of existing architectures. In essence, our architectureis the first to employ negative damping (i.e., active behavior)instead of undamping (i.e., passive behavior).

A version of our novel cochlear architecture was fabricated withtwo 360×13 diffusive grids and 360 second-order sections, eachdriving 6 pulse-frequency modulators (PFMs) (Fig. 30.3.1). Thesecond-order sections model the stiffness (S), damping (β), andmass (M) of the basilar membrane (BM), the main vibratingorgan in the cochlea. The BM interacts with the cochlear fluid,whose motion is governed by Laplace’s equation, solved by thediffusive grid [5]. In our current-mode design, a current (Iin) represents the fluid’s velocity potential, whose spatial derivativeis the fluid velocity, and another (Imem) the BM’s velocity (see Fig. 30.3.1). Iin and Imem are related (with physical analogs)by ∂2Iin/∂t2 = SImem+ β∂Imem/∂t + M∂2Imem/∂t2. This relation is real-ized using two interacting first-order LPFs, as described by: τ1Iss+ Is = Io – Iin, τ2Ios + Io = Iin – bIs, and Imem = Iin + Is – Io. The phys-ical analogs are thus obtained as S = (b+1)/τ1τ2, β = (τ1+τ2)/τ1τ2,and M = 1, with τ1 and τ2 increasing exponentially from section tosection, to simulate the BM’s nonuniform stiffness and damping.For a given choice of τ1 and τ2 (capacitance limited by siliconarea), the gain factor b increases the quality factor Q = √(b+1) /{√(τ1/τ2) + √(τ2/τ1)}.

ABC is included in the cochlea design by exchanging currentsbetween neighboring second-order sections (see Fig. 30.3.1). Thesecond LPF thus becomes τ2Ios + Io = Iin – bIs – rfb(b+1)T(Isb) + rff

(b+1)T(Isf), where Isf and Isb are the outputs of the first LPF in theupstream and downstream neighbors, respectively; rff and rfb are

the coupling strength in each direction; and T(Is) = IsIsat/(Is + Isat),accomplished by a current-limiting transistor, which sets the sat-uration level Isat (controlled by Vsat).

A log-domain Class AB second-order section that constrains thecommon-mode dynamically, thus avoiding instability associatedwith negative feedback correction, was synthesized (Fig. 30.3.2).Following the approach in [6], a first-order LPF is described byτ(Iout

+ – Iout–)s + (Iout

+ – Iout–) = Iin

+ – Iin– and τIout

+Iout–+ Iout

+Iout– = Iq

2,where Iq (controlled by Vq) sets the quiescent current. Hence, thepositive path’s nodal equation is CdVout

+/dt = Iτ{(Iin+ – Iin

–) – (Iout+ –

Iq2/Iout

+)}/(Iout+ + Iout

–), where Iτ (controlled by Vτ) sets the time con-stant (τ = CuT/κIτ; κ is the subthreshold slope coefficient and uT isthe thermal voltage). The negative path simply swaps the super-scripts + and –. These two paths interact in push–pull. To obtainsinusoidal current as the input to the BM circuit, we set the dif-ferential voltages applied to the diffusive grids to be the loga-rithm of a half-wave rectified sinusoid (with appropriate DC off-set). Imem is encoded by the PFMs, whose digital outputs aretransmitted through an address event interface [7].

Tuned to frequencies from 210Hz to 14kHz, the chip exhibits rel-atively large amplification (Fig. 30.3.3) and compression at highinput intensities (Fig. 30.3.4). Four octave-spaced pure tones elic-it maximum responses at monotonically increasing channels.Increasing the input current level from 0 to 48dB yields 24dBcompression at the peak; frequency tuning becomes broader aswell, with Q10 = 1.84 (i.e., fpeak over ∆f10dB, the width 10dB belowthe peak) at 0dB and 1.14 at 48dB; phase accumulation remainsthe same. Thus, ABC sharpens frequency tuning and increasesdynamic range. Further, the chip processes natural sounds inreal time (Fig. 30.3.5).

A VLSI implementation of a 2D nonlinear cochlear model is pre-sented that utilizes a novel active mechanism, ABC, which ampli-fies the traveling wave to a degree that decreases with inputintensity, thus realizing frequency-selective AGC (Fig. 30.3.6summarizes the specifications; Fig. 30.3.7 shows the die micro-graph). Rather than detecting the wave’s amplitude and imple-menting a control loop, our biomorphic architecture simplyemploys nonlinear interactions between adjacent neighbors,emulating the cochlear amplifier. In addition to AGC, thiscochlear amplifier nonlinearity is thought to suppress weakertones in favor of stronger ones, thereby enhancing formant per-ception in noisy environments. These biological features of oursilicon cochlea are desirable in speech recognition systems thatseek to match biological performance.

References:[1] R. F. Lyon and C. A. Mead, “An Analog Electronic Cochlea,” IEEE Trans.Acoust. Speech and Signal Proc., vol. 36, pp. 1119-1134, 1988.[2] L. Watts, “Cochlear Mechanics: Analysis and Analog VLSI,” Ph.D. the-sis, Pasadena, CA, Caltech, 1993.[3] E. Fragnière, “A 100-Channel Analog CMOS Auditory Filter Bank forSpeech Recognition,” ISSCC Dig. Tech. Papers, pp. 140-141, Feb., 2005.[4] B. Wen and K. Boahen, “A Linear Cochlear Model with Active Bi-direc-tional Coupling,” IEEE EMBS, pp. 2013-2016, 2003.[5] A. G. Andreou and K. A. Boahen, “Translinear Circuits in SubthresholdMOS,” Journal of Analog Integrated Circuits and Signal Processing, vol. 9,pp. 141-166, 1996.[6] K. Zaghloul and K. A. Boahen, “An On-Off Log-Domain Circuit thatRecreates Adaptive Filtering in the Retina,” IEEE Transactions on Circuitsand Systems I: Regular Papers, vol. 52, no. 1, pp. 99-107, Jan., 2005.[7] K. Boahen, “A Burst-Mode Word-Serial Address-Event Channel-I:Transmitter Design,” IEEE Transactions on Circuits and Systems I, vol.51, no. 7, pp. 1269-1280, July, 2004.

1-4244-0079-1/06/$20.00 ©2006 IEEE

Session_30_Penmor 1/8/06 11:29 AM Page 556

Page 2: 30.3 - A 360-Channel Speech Preprocessor that Emulates the ... · Figure 30.3.4: Automatic gain control. Chirp (from 16k to 200Hz, 1.5 s) Click (0.1 s) Supply voltages Analog: 2.4V;

557DIGEST OF TECHNICAL PAPERS •

Continued on Page 672

ISSCC 2006 / February 8, 2006 / 2:30 PM

Figure 30.3.1: Cochlear chip architecture. Figure 30.3.2: Basilar membrane (BM) circuit.

Figure 30.3.3: Top: Frequency responses. Bottom: Longitudinal responses.

Figure 30.3.5: Cochleagram. Figure 30.3.6: Chip specifications.

Figure 30.3.4: Automatic gain control.

Chirp (from 16k to 200Hz, 1.5 s)

Click (0.1 s)

Analog: 2.4V; Digital: 2.5VSupply voltages

115,000Transistor count

0.25µm 1P5M CMOS Technology

3.76 × 2.91mm2Chip die size

360Channel count

52mWPower consumption

24dBAGC compression at peak

20 ~ 70dB/octCutoff slope

0.90 ~ 2.73Q10

0.3 ~ 1.4cyclesPhase at peak

17 ~ 42dBDynamic range

210 ~ 14kHzFrequency range

2160PFM outputs

ValuesSpecification

30

Session_30_Penmor 1/8/06 11:29 AM Page 557

Page 3: 30.3 - A 360-Channel Speech Preprocessor that Emulates the ... · Figure 30.3.4: Automatic gain control. Chirp (from 16k to 200Hz, 1.5 s) Click (0.1 s) Supply voltages Analog: 2.4V;

672 • 2006 IEEE International Solid-State Circuits Conference 1-4244-0079-1/06/$20.00 ©2006 IEEE

ISSCC 2006 PAPER CONTINUATIONS

Figure 30.3.7: Cochlear chip micrograph.

500µm