Neuromorphic Audition Group
Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm
Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma
Outline
• Field Programmable Analog Array (Dave)
• Speaker Identification (Malcolm, Nima and Max)
• Speech Recognition (Hynek, Misha, Jordon)
• STRF Noise Suppression (Nima, Shihab, Dave)
• Reconstructions from STRF/Modulation Detectors (Nima, Shihab)
• Social sonar demonstration using silicon cochlea and RoboQuad toy (Toby and Malcolm)
• Cochlear ITD Detector (Andrew, Malcolm, Shih-Chii)
• Cochlear Periodicity Detector (Teddy, John, Malcolm, Shih-Chii)
FPAA
Speaker ID
[Block diagram: Features (MFCC, STRF) → Model (GMM, ART) → Winner Take All]
Speaker ID - STRF
Speaker ID - ART
Malcolm Slaney, Heather Ames, Max Versace
Supervised Fuzzy Adaptive Resonance Theory neural network (ARTMAP) uses top-down expectations to learn categories
First test: three synthesized vowels (large clusters) spoken by three speakers (different colors) represented in 2D feature space.
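As a rough illustration of the kind of category learning ART-style networks perform, here is a minimal sketch of unsupervised Fuzzy ART (complement coding, choice function, vigilance test, weight update). It is not the ARTMAP network used for the results in these slides: the supervised map field is left out, and the class name FuzzyART, the method train_one and all parameter values are assumptions chosen for illustration.

```python
def complement_code(x):
    """Complement-code a feature vector whose entries lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(u, v) for u, v in zip(a, b)]


class FuzzyART:
    """Hypothetical minimal Fuzzy ART; parameter defaults are illustrative."""

    def __init__(self, vigilance=0.8, alpha=0.001, beta=1.0):
        self.rho = vigilance   # how closely an input must match a category
        self.alpha = alpha     # small choice parameter
        self.beta = beta       # learning rate (1.0 = fast learning)
        self.weights = []      # one weight vector per learned category

    def train_one(self, x):
        """Present one input; return the index of the category it joins."""
        i = complement_code(x)

        def choice(j):
            w = self.weights[j]
            return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

        # Try categories from the highest choice value downwards.
        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            w = self.weights[j]
            overlap = fuzzy_and(i, w)
            # Vigilance test: does the top-down expectation match the input?
            if sum(overlap) / sum(i) >= self.rho:
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * wk
                                   for o, wk in zip(overlap, w)]
                return j
        # No category passed the vigilance test: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1


art = FuzzyART(vigilance=0.8)
labels = [art.train_one(p) for p in ([0.1, 0.1], [0.12, 0.09], [0.9, 0.9])]
```

In this toy run the two nearby points share a category and the distant point opens a new one. Raising the vigilance value makes the categories finer.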
Speaker ID - ART Results
[Block diagram, used for both training and testing: Speech input (.wav) → Feature extraction (12 MFCC + E, first and second derivatives) → Vowel extraction (½-wave rectify, lowpass filter, choice of high-energy time slices) → Features (feature vectors for “vowel” data) → Utterance-independent transformation (TBD) → Transformed features → ARTMAP. Training produces the acoustic model of speaker identity; testing produces the predicted speaker identity.]
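The vowel-extraction labels in the diagram (½-wave rectify, lowpass filter, choice of high-energy time slices) can be read as an envelope-based frame selector. The sketch below is one possible reading under stated assumptions, not the group's implementation: the one-pole filter, its coefficient, the frame length and the fraction of frames kept are all hypothetical choices.

```python
import math


def half_wave_rectify(signal):
    """Keep positive samples, set negative samples to zero."""
    return [max(0.0, s) for s in signal]


def one_pole_lowpass(signal, coeff=0.9):
    """Smooth with y[n] = coeff * y[n-1] + (1 - coeff) * x[n] (assumed filter)."""
    y, out = 0.0, []
    for s in signal:
        y = coeff * y + (1.0 - coeff) * s
        out.append(y)
    return out


def high_energy_slices(envelope, frame_len, keep_fraction=0.25):
    """Return indices of the frames with the largest mean envelope."""
    n_frames = len(envelope) // frame_len
    energies = [sum(envelope[k * frame_len:(k + 1) * frame_len]) / frame_len
                for k in range(n_frames)]
    n_keep = max(1, int(round(keep_fraction * n_frames)))
    ranked = sorted(range(n_frames), key=lambda k: energies[k], reverse=True)
    return sorted(ranked[:n_keep])


# Toy input: quiet, loud, quiet segments of a sine wave (400 samples each).
amplitudes = [0.01] * 400 + [1.0] * 400 + [0.01] * 400
signal = [a * math.sin(2.0 * math.pi * n / 20.0) for n, a in enumerate(amplitudes)]

envelope = one_pole_lowpass(half_wave_rectify(signal))
selected = high_energy_slices(envelope, frame_len=100)
```

With the toy input, the selected frames fall inside the loud middle segment; feature vectors would then be taken only from those time slices.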
50% correct after 100 cross-validations (number of ARTMAP instances run) on 10-speaker identification
Continued work:
1. Improved vowel extraction
2. Utterance-independent transformation of feature space
Why do we care?
• Top-down
• Online
Speaker ID - Results
Test: % correct / % correct in 5 dB noise
• MFCC (Baseline): 81.3% / 81.0%
• STRF: 79.8%
• ART: ~60%
Very preliminary work! We are comparing against a technology (MFCC+GMM) that has been perfected over decades.
ASR - Phoneme Posteriors
ASR - Combining Information
[Diagram labels: Training, Context, X, C]
Q{X, C} = Pr{Correct | X, C}
Machines: P(word|sound) P(word|context)
Humans: [1-P(word|sound)] [1-P(word|context)]
Maximize
Inverse model: from neural responses to sound
Reconstruction of speech in white noise
• Reconstructed speech is “cleaner” than the original noisy speech
[Figure panels: Original Spectrograms; Reconstructed Spectrograms]
Psychoacoustically-motivated Speech Enhancement
• Perceptual loudness: L = (b*e(t))^a
• By mapping loudness using the same type of function, noise can be decreased
• Results from STRF processing
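To make the power-law form L = (b*e(t))^a concrete, the sketch below applies it to a toy energy envelope. The slides do not give values for a and b, so the numbers here are assumptions, and the function name power_law is hypothetical. The sketch only shows how the exponent changes the ratio between low-level and high-level frames; it does not reproduce the enhancement results.

```python
def power_law(envelope, a, b=1.0):
    """Apply L = (b * e)^a to each non-negative envelope value e."""
    return [(b * e) ** a for e in envelope]


# Toy envelope: low-level noise frames around two high-energy speech frames.
envelope = [0.05, 0.05, 0.8, 1.0, 0.05]

compressed = power_law(envelope, a=0.3)  # a < 1 (assumed): small values are raised
expanded = power_law(envelope, a=2.0)    # a > 1 (assumed): small values are pushed down

# Ratio of a noise frame to the peak frame before and after expansion.
ratio_before = envelope[0] / envelope[3]
ratio_after = expanded[0] / expanded[3]
```

With an exponent above 1 the low-level frames shrink relative to the peak, which is one way a mapping of this type can reduce background noise. With an exponent below 1 the opposite happens.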
Noise suppression using inverse model
• Train G-filters on reconstructing clean stimuli from corresponding noisy responses. Apply the trained filters to new noisy responses
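The train-then-apply idea can be illustrated with a deliberately simplified stand-in: one least-squares gain per channel, fitted to map noisy responses onto clean stimuli and then applied to new noisy responses. The slides do not specify the form of the G-filters, so the scalar-gain model and the names train_gains and apply_gains are assumptions made for this sketch.

```python
def train_gains(clean, noisy):
    """Fit one gain per channel minimising sum((clean - g * noisy)^2).

    clean, noisy: lists of frames; each frame is a list of per-channel values.
    """
    n_channels = len(clean[0])
    gains = []
    for c in range(n_channels):
        num = sum(s[c] * r[c] for s, r in zip(clean, noisy))
        den = sum(r[c] * r[c] for r in noisy)
        gains.append(num / den if den > 0 else 0.0)
    return gains


def apply_gains(gains, noisy):
    """Apply the trained per-channel gains to new noisy frames."""
    return [[g * v for g, v in zip(gains, frame)] for frame in noisy]


# Toy data: channel 0 carries the stimulus, channel 1 carries only noise.
clean_train = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
noisy_train = [[1.1, 0.5], [1.9, -0.4], [3.2, 0.3]]

gains = train_gains(clean_train, noisy_train)
denoised = apply_gains(gains, [[2.0, 0.7]])
```

In the toy data the noise-only channel receives a gain of zero, so it is suppressed in the new response, while the stimulus channel keeps a gain close to one.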
[Figure panels: Cortical decomposition; “Trained” inverse filters]
Noise Suppression for White, Jet and City Noise
RS Media
Linux version 2.4.18-rmk5-mx1ads-p3 ([email protected]) (gcc version 2.95.3 20010315 (release)) #517 Fri Feb 16 11:40:45 HKT 2007
Processor: ARM/CIRRUS Arm920Tsid(wb) revision 0
Architecture: Motorola MX1ADS
On node 0 totalpages: 8192
zone(0): 8192 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=fe01 ro mem=32M
Console: colour dummy device 80x30
Calibrating delay loop... 98.50 BogoMIPS
Memory: 32MB = 32MB total
Memory: 30816KB available (1023K code, 316K data, 60K init)
Dentry-cache hash table entries: 4096 (order: 3, 32768 bytes)
Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttySA0 at I/O 0x206000 (irq = 29) is a MX1ADS
ttySA1 at I/O 0x207000 (irq = 23) is a MX1ADS
pty: 256 Unix98 ptys configured
DMA Initializing
Cochlear - ITD Detector
[Figure axes: Time, Position]
Cochlear - JAER Demo
Cochlear - Periodicity detector
[Figure panels: Response to “hiss”; Response to “coo”]
When both channels are conditionally independent:
• pCpA – probability of correct recognition in both channels
• pC(1-pA) – correct in ch1 but not in ch2
• pA(1-pC) – correct in ch2 but not in ch1
These three cases are mutually exclusive, so the probability of correct recognition is
p = pCpA + pC(1-pA) + pA(1-pC) = pC+pA-pCpA
Probability of error
e = (1-p) = 1-pC-pA+pCpA = (1-pC)(1-pA) = eCeA
[Diagram: stimulus → context (top-down) channel, pC, and acoustic (bottom-up) channel, pA → decision]
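The two-channel result above can be checked numerically. This is a small sketch with made-up channel accuracies (0.6 for context, 0.7 for acoustic); the function names are hypothetical.

```python
def combined_correct(p_context, p_acoustic):
    """p = pC + pA - pC*pA for two conditionally independent channels."""
    return p_context + p_acoustic - p_context * p_acoustic


def combined_error(p_context, p_acoustic):
    """e = (1 - pC)(1 - pA): recognition fails only if both channels fail."""
    return (1.0 - p_context) * (1.0 - p_acoustic)


p_c, p_a = 0.6, 0.7  # assumed example values, not measured results

# Sum of the three mutually exclusive "correct" cases.
by_cases = p_c * p_a + p_c * (1.0 - p_a) + p_a * (1.0 - p_c)

p = combined_correct(p_c, p_a)
e = combined_error(p_c, p_a)
```

For these example values the combined accuracy is 0.88 and the combined error is 0.12, which is the product of the individual error rates 0.4 and 0.3.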