ESAT/PSI-Speech
Hugo Van hamme
NSF workshop
May 2015
4 July 2012
Leuven
KU Leuven introduction
Leuven - university
• Associatie K.U.Leuven
– University + 12 university colleges
– 85000 students, 600 programs
• KU Leuven
– 35000 students, 350 programs
• Faculty of Engineering
– 2000 students (Ba+Ma), 60 programs
• Department of Electrical Engineering (ESAT)
  - 150 Ma students, 6 programs
  - 270 PhD students and postdocs …
  - 35 FTE permanent staff
Centre for Processing of Speech and Images (PSI)
  - 37 PhD students and postdocs
  - 8 FTE permanent staff
  - Speech research group
    - 12 PhD students and 1.4 postdocs
    - 5 Master students
    - 2.3 FTE permanent staff
      - Patrick Wambacq
      - Dirk Van Compernolle
      - Hugo Van hamme
ESAT/PSI-Speech research areas
• noise robustness – speech enhancement – source separation – source localization
• new paradigms for speech recognition – episodic models
• build and consolidate digital infrastructure for the Dutch language
• speaker properties (text-independent): ID, language, dialect, age, height
• acoustic environment modeling – ADL recognition
• zero-resource ASR - language acquisition by machines
• speech assessment - education
Speech assessment
• Reading tutor (dyslexia) / trainer after CI fitting
• Assess native (?) pronunciation, reading/respeak tracking
• Children’s speech, hesitant, poorly articulated
Zero-resource speech recognition
Why?
• Assistive technologies:
– people with limited fine motor control
– alternative to scanning
– cope with dysarthric voices
• Huge inter-speaker variation
• Timing, extraneous sounds
• Dialects
• Long-term: interacting with robots
– “Fetch a Hoegaarden Grand Cru from the fridge”
– “Get my red slippers”
– “Open the garden window for me”
What’s different?
• Learn acoustic model and language model from examples with noisy, high-level supervision information
– Not like traditional ASR
– Not like the zero-resource challenge (IS15)
• Our first steps:
– Home automation
– “open the kitchen door”, “kitchen door open”
– Learn from demonstrations = weak supervision
• Learn acoustic model, vocabulary and grammar (ASR)
• Learn mapping to semantic frames (NLU)
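To make the idea of weak supervision concrete, here is a minimal toy sketch in Python. It is not the method behind these slides: it works on text tokens instead of audio, and the `Frame` fields, the `DEMOS` list, and the `train`/`decode` helpers are all hypothetical names introduced for illustration. Each demonstration pairs a whole utterance with the semantic frame of the demonstrated action, with no word-level labels, and the mapping from tokens to slot values is estimated from co-occurrence counts alone.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical semantic frame for a home-automation command."""
    action: str  # e.g. "open"
    target: str  # e.g. "kitchen_door"


# Weak supervision: each utterance is paired only with the frame of the
# demonstrated action. Nothing says which word carries which slot value.
DEMOS = [
    ("open the kitchen door", Frame("open", "kitchen_door")),
    ("kitchen door open", Frame("open", "kitchen_door")),
    ("close the kitchen door", Frame("close", "kitchen_door")),
    ("open the garden window", Frame("open", "garden_window")),
    ("garden window close", Frame("close", "garden_window")),
]


def train(demos):
    """Count how often each token co-occurs with each (slot, value) pair."""
    token_count = Counter()
    cooc = defaultdict(Counter)
    for text, frame in demos:
        labels = [("action", frame.action), ("target", frame.target)]
        for tok in set(text.split()):
            token_count[tok] += 1
            for label in labels:
                cooc[tok][label] += 1
    return token_count, cooc


def decode(text, token_count, cooc):
    """Per slot, pick the value most strongly predicted by any single token."""
    best = {}  # slot -> (score, value)
    for tok in set(text.split()):
        if tok not in token_count:
            continue  # token never seen in a demonstration
        for (slot, value), n in cooc[tok].items():
            score = n / token_count[tok]  # estimate of P(value | token)
            if slot not in best or score > best[slot][0]:
                best[slot] = (score, value)
    return Frame(action=best["action"][1], target=best["target"][1])


token_count, cooc = train(DEMOS)
print(decode("garden window open", token_count, cooc))
# prints: Frame(action='open', target='garden_window')
```

Both word orders of the same command ("open the kitchen door", "kitchen door open") decode to the same frame, because the sketch ignores word order entirely. That is a simplification of this toy, not a claim about the learned grammar mentioned above.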
VIVOCA results
Work ahead
• Larger vocabularies
– How does a word spurt come about?
• Faster learning
– Ideally from one example
• More complex instructions and semantic representations
– Continuous state space
– Dynamic representation of semantics
– Uncertainty in meaning
– ...
– Related to many current research topics in robotics
What’s needed?
Speech assessment
• Investment in non-native and regional accent data
• Getting government involved is hard (budget cuts etc.)
Zero-resource ASR
• Interaction data
– grow complexity of task
– Limited reuse from one task to the next
• Understanding by the community of the relevance of the problem
– Cf. reviewer instructions for the IS15 Zero-resource Challenge
• Investment attitude in Europe/Belgium
• Industrial interest is growing internationally
Questions?