Computational Audition at AFRL/HE: Past, Present, and Future


Dr. Timothy R. Anderson

Human Effectiveness Directorate

Air Force Research Laboratory


Biologically Based Signal Processing

• Research, development, and applications of:
– Biologically based algorithms
– Perceptually relevant features
– Human-centered metrics and models
– to improve robustness of speech processing systems

Speech Technologies

[Figure: application areas for speech technologies: JAOC, sensor-decision maker-shooter, future JAOC command & control, combat plans, AWACS, chem-bio defense environment]

Why Is This Area Important?

• Present signal processing systems (e.g., speech and speaker recognition, speech coding) are not robust in adverse military environments.

• Biological principles offer potential to provide improved performance in military environments.


Technical Challenges
• Identification and modeling of features and processes used by biological systems
• Incorporation of those key features and processes into computationally efficient algorithms and structures

Approach
• Develop psychoacoustic testing procedures
• Characterize key features and processes
• Develop human-centered models and metrics
• Implement computationally efficient algorithms
• Provide support to operational test and warfighting exercises to evaluate system utility

Biologically Based Signal Processing

[Figure: technology position chart. Vertical axis: Dominant, Strong, Favorable, Tenable, Weak. Horizontal axis: Embryonic, Growth, Mature, Aging.]

Research Areas

• Cockpit Speech Recognition
• Robust Speech Recognition
– Monaural Speech Recognition
– Binaural Speech Recognition
– Auditory Model Front-ends
• Speaker Recognition/Verification
– Biologically Based Speaker ID
– Channel Robustness
– Speaker Recognizability Test


Phoneme Classification

• Kohonen Self-Organizing Feature Map
– 16 × 16
• 10-speaker database (TIMIT)
• 10 sentences/speaker
• Leave-one-out method (per speaker)
• Features calculated with
– 16 ms window
– 5 ms frame step
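The setup above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments. It assumes 16 kHz audio (the TIMIT sampling rate), so the 16 ms window and 5 ms step become 256 and 80 samples. The class name, the Gaussian neighborhood, and the linear decay schedules are illustrative choices that the slide does not specify.

```python
import math
import random

SAMPLE_RATE_HZ = 16000  # TIMIT audio is sampled at 16 kHz
WINDOW_MS = 16          # window length from the slide
STEP_MS = 5             # frame step from the slide


def frame_starts(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Start indices of every full analysis window in a signal."""
    window = sample_rate * WINDOW_MS // 1000  # 256 samples at 16 kHz
    step = sample_rate * STEP_MS // 1000      # 80 samples at 16 kHz
    return list(range(0, num_samples - window + 1, step))


class SelfOrganizingMap:
    """Minimal rectangular Kohonen map (hypothetical helper class)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One randomly initialized weight vector per grid node.
        self.weights = [[[rng.random() for _ in range(dim)]
                         for _ in range(cols)] for _ in range(rows)]

    def best_matching_unit(self, x):
        """Grid coordinates of the node whose weights are closest to x."""
        best, best_dist = (0, 0), float("inf")
        for r in range(self.rows):
            for c in range(self.cols):
                d = sum((w - v) ** 2 for w, v in zip(self.weights[r][c], x))
                if d < best_dist:
                    best, best_dist = (r, c), d
        return best

    def train(self, data, epochs=10, lr0=0.5, radius0=None):
        """Online training; rate and radius decay linearly (an assumption)."""
        radius0 = radius0 or max(self.rows, self.cols) / 2
        total = epochs * len(data)
        t = 0
        for _ in range(epochs):
            for x in data:
                frac = 1.0 - t / total
                lr = lr0 * frac
                radius = max(radius0 * frac, 0.5)
                br, bc = self.best_matching_unit(x)
                for r in range(self.rows):
                    for c in range(self.cols):
                        grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                        h = math.exp(-grid_d2 / (2 * radius ** 2))
                        w = self.weights[r][c]
                        # Pull the node toward x, scaled by grid proximity.
                        for i, v in enumerate(x):
                            w[i] += lr * h * (v - w[i])
                t += 1
```

A 16 × 16 map as on the slide would be created with `SelfOrganizingMap(16, 16, dim=n)`, where `n` is the length of the per-frame feature vector. After training, each node can be labeled with the phoneme that most often selects it as best matching unit.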


TRADITIONAL VS. AUDITORY MONAURAL

[Figure: phoneme recognition rate (%, axis 0–50) versus signal-to-noise ratio (dB) at 1, 5, 10, 15, 20, 32, and Clean; series: AIM, MFCC]

Binaural Speech Recognition

• Past
• Present
• Future



• Stereausis
• Cocktail Party Processor
• BAIM
• BINAP


EXPERIMENT SETUP

[Figure: experiment setup showing the positions of the sound source and the noise source]

MONAURAL VS. BINAURAL COCKTAIL PARTY PROCESSOR

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: CPP, MONO]

MONAURAL VS. BINAURAL AUDITORY IMAGE MODEL

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: BAIM, AIM]

BINAURAL

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: CPP, BAIM]

MONAURAL

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: AIM, MONO]

BAIM VS. CPP-AIM

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: BAIM, AIM, CPP-AIM]

COINCIDENCE

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: BAIM, BINAP]

MONAURAL, BINAURAL AND TRADITIONAL

[Figure: percent score (axis 0–50) at SNR conditions 1, 5, 10, 15, 20, 32, and Clean; series: CPP, BAIM, AIM, MONO, MFCC, BINAP, CPP-AIM]


RESULTS

BINAURAL AUDITORY MODEL PROVIDES BETTER REPRESENTATION THAN TRADITIONAL TECHNIQUES:

TASK: PHONEME RECOGNITION

SPEECH: LOW TO HIGH SNR

RESULTS: 7-12 dB BINAURAL ADVANTAGE
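One common way to express an advantage in dB is the horizontal shift between two recognition-rate curves: the difference in SNR at which each system reaches the same score. The slides do not say how the 7-12 dB figure was derived, so the sketch below is only an assumption about the method. The function names are hypothetical, and the numbers are placeholders invented for illustration, not values read from the charts.

```python
def snr_at_rate(snrs_db, rates, target):
    """Linearly interpolate the SNR at which a rising curve reaches target."""
    points = list(zip(snrs_db, rates))
    for (s0, r0), (s1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1 and r1 > r0:
            return s0 + (target - r0) * (s1 - s0) / (r1 - r0)
    return None  # target lies outside the measured range


def snr_advantage_db(snrs_db, rates_a, rates_b, target):
    """dB by which system A reaches `target` at a lower SNR than system B."""
    a = snr_at_rate(snrs_db, rates_a, target)
    b = snr_at_rate(snrs_db, rates_b, target)
    if a is None or b is None:
        return None
    return b - a


# Placeholder curves (NOT measured data): recognition rate in percent
# at each SNR for a hypothetical binaural and monaural front end.
snrs = [1, 5, 10, 15, 20, 32]
binaural = [10, 20, 30, 35, 40, 45]
monaural = [2, 5, 12, 20, 28, 40]

print(snr_advantage_db(snrs, binaural, monaural, target=20))
```

Repeating the calculation at several target rates gives a range of advantages rather than a single number, which is one way a result such as "7-12 dB" could arise.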



• Past
• Present
– No Current Work
• Future




– Implement binaural ASR system
– Investigate further binaural fusion mechanisms
– Meeting room data
– Implement binaural system using AIM chips


Auditory Model Front Ends




• Tanner Research “Analog Speech Recognition”
– Implementation of AIM
– 56-channel analog filter bank
– Single SBUS board
– 1.5 × real-time



• AFIT – Designed Digital Implementation

• Middle ear, BMM, adaptive thresholding
– 32 channels per chip
– 300 Hz to 7 kHz
– 44.1 kHz sampling rate
– 2 chips provide 64 channels in real time
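To make the channel layout concrete, the sketch below places 64 filter center frequencies between 300 Hz and 7 kHz. The slide does not state how the channels are spaced. Equal spacing on the ERB-rate scale of Glasberg and Moore is assumed here because it is a common choice for auditory filter banks, and the function names are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    """Glasberg and Moore ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def center_frequencies(low_hz=300.0, high_hz=7000.0, channels=64):
    """Center frequencies equally spaced on the ERB-rate scale (assumed)."""
    lo = hz_to_erb_rate(low_hz)
    hi = hz_to_erb_rate(high_hz)
    step = (hi - lo) / (channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(channels)]


cfs = center_frequencies()
# One possible split across two 32-channel chips (an assumption):
chip_a, chip_b = cfs[:32], cfs[32:]
```

With this spacing the channels are dense at low frequencies and sparse at high frequencies, as in the cochlea. The 7 kHz upper edge sits well below the 22.05 kHz Nyquist limit of a 44.1 kHz sampling rate.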



• Past
• Present
– Single-board system designed and prototyped (USB)
– Current chip design undergoing debug
– Second fabrication run this fall
• Future




– Debug and verify chip fabrication
– Debug PC-based real-time auditory model front end
– Implement complete end-to-end auditory ASR
– Investigate feedback mechanisms in auditory model for ASR


Biologically Based SID




• Auditory Models Investigated
– Payton’s Auditory Model (PAM)
– Auditory Image Model (AIM)
• VQ codebook used to model speaker
• 37 speakers from TIMIT (dr1 and dr2; 12 female, 25 male)
– MFCC: 94%
– PAM: 67%
– AIM: 91%
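A VQ codebook speaker model can be sketched as follows: each speaker's training frames are clustered into a small codebook, and a test utterance is assigned to the speaker whose codebook quantizes it with the least average distortion. This is a minimal sketch under stated assumptions, not the system behind the figures above. Plain k-means with squared Euclidean distance is assumed, and the function names and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, size, iters=20, seed=0):
    """Cluster one speaker's frames into `size` code vectors (k-means)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k]))
            cells[nearest].append(f)
        for k, cell in enumerate(cells):
            if cell:  # keep the old code vector if its cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest code vector."""
    total = sum(min(sq_dist(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify(frames, codebooks):
    """Return the speaker whose codebook gives the lowest distortion.

    `codebooks` maps a speaker label to that speaker's codebook.
    """
    return min(codebooks,
               key=lambda spk: average_distortion(frames, codebooks[spk]))
```

The same code applies whichever front end produces the frames (MFCC, PAM, or AIM features), which is what makes a side-by-side comparison of feature sets straightforward.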



• Using perceptual features
– Formants, formant bandwidths, and pitch
• Voiced frames
• Using GMM classifier
• Conducting experiments on larger databases
– Switchboard
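The scoring side of a GMM classifier can be sketched as below: each speaker is a mixture of diagonal-covariance Gaussians, and the voiced frames of a test utterance are assigned to the speaker whose model gives the highest average log-likelihood. Training the mixtures (normally by EM) is omitted. The diagonal covariances, the function names, and the model layout are assumptions, not details taken from the slides.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame; gmm is a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # Log-sum-exp keeps the sum stable when the terms are very negative.
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the speaker whose GMM best explains the frames.

    `models` maps a speaker label to that speaker's GMM.
    """
    def average_ll(gmm):
        return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)

    return max(models, key=lambda spk: average_ll(models[spk]))
```

Each frame would hold the perceptual features listed above, for example pitch followed by formant frequencies and bandwidths. Restricting the frames to voiced speech matters because pitch is undefined elsewhere.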



[Figures, four consecutive slides: plots whose curves are labeled "F0 Base" and "MFCCs, no CMS" on every slide, plus "MFCCs, no Deltas, no CMS" on the first and third]


• Performance isn’t the best, but this feature set…
– Uses only 9 features versus 19–38 for MFCCs
– Hasn’t been as heavily researched as MFCCs



• Determine reasons for performance differences between various databases
• Channel & score normalizations
• Pitch-synchronous features
• Closed-phase analysis
• Glottal model features



• Investigate other auditory-based features
– Vocal agitation
– Formants, formant bandwidths, and pitch calculated from the auditory model
– Auditory model features
• Conduct experiments on other databases
– Broadcast news
– Military training exercises


Speaker Recognizability Test




• Dynastat “The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems”
– Determined perceptually relevant features
• Perceptual voice traits (PVT)
• 21 traits currently identified
– Developed methodology to measure these traits
• Human listeners
– Developed measure to determine loss due to channel
• Diagnostic Speaker Recognizability Test (DSRT)



• Use perceptual voice traits to identify groups of similar and distinctive speakers
• Determine if current SID systems have difficulty with these similar speakers
• Implementing in-house
– Web-based listening test for
• PVT rating
• DSRT
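One simple way to find similar and distinctive speakers from PVT ratings is to treat each speaker's mean ratings as a vector and compare vectors by distance. The sketch below is purely illustrative. The slides do not describe the grouping method, so the Euclidean distance, the function names, and the data layout are all assumptions.

```python
import math
from itertools import combinations


def trait_distance(a, b):
    """Euclidean distance between two speakers' trait-rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_pairs(ratings):
    """Speaker pairs sorted from most to least similar.

    `ratings` maps a speaker label to a list of mean listener ratings,
    one entry per perceptual voice trait.
    """
    pairs = [(trait_distance(ratings[s], ratings[t]), s, t)
             for s, t in combinations(sorted(ratings), 2)]
    return sorted(pairs)


def distinctiveness(ratings):
    """Mean distance from each speaker to every other speaker."""
    scores = {}
    for s in ratings:
        others = [trait_distance(ratings[s], ratings[t])
                  for t in ratings if t != s]
        scores[s] = sum(others) / len(others)
    return scores
```

The closest pairs from `rank_pairs` are candidates for checking whether an automatic SID system confuses the same speakers that listeners find alike. With the 21 traits identified so far, each rating vector would have 21 entries.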



• Obtain PVT ratings for larger database
– Switchboard
• Determine acoustic correlates of perceptually relevant features
• Use as features for speaker recognition
• Utilize DSRT for communication system testing


Summary

• Computational Audition offers potential for improved performance in adverse military environments

• Much research remains to be done
– Fidelity of model
– Model feedback pathways

• Computational issues are no longer the limiting factor in performing meaningful experiments


Questions?
