ESAT/PSI-Speech Hugo Van hamme NSF workshop May 2015 4 July 2012...

ESAT/PSI-Speech

Hugo Van hamme

NSF workshop

May 2015

4 July 2012

Leuven

KU Leuven introduction

http://nl.wikipedia.org/wiki/Bestand:Belgium_location_map.svg

//upload.wikimedia.org/wikipedia/commons/2/2e/Leuven-Grote-Markt.jpg

http://nl.wikipedia.org/wiki/Bestand:Begijnhofleuven2005sep2.jpg

Leuven - university

• Associatie K.U.Leuven

– University + 12 university colleges

– 85000 students, 600 programs

• KU Leuven

– 35000 students, 350 programs

• Faculty of Engineering

– 2000 students (Ba+Ma), 60 programs

• Department of Electrical Engineering (ESAT)

– 150 Ma students, 6 programs

– 270 PhD students and postdocs …

– 35 FTE permanent staff

• Centre for Processing of Speech and Images (PSI)

– 37 PhD students and postdocs

– 8 FTE permanent staff

• Speech research group

– 12 PhD students and 1.4 postdocs

– 5 Master students

– 2.3 FTE permanent staff: Patrick Wambacq, Dirk Van Compernolle, Hugo Van hamme

http://nl.wikipedia.org/wiki/Bestand:Holbein-erasmus.jpg

ESAT/PSI-Speech research areas

• noise robustness – speech enhancement – source separation – source localization

• new paradigms for speech recognition – episodic models

• build and consolidate digital infrastructure for the Dutch language

• speaker properties (text-independent): ID, language, dialect, age, height

• acoustic environment modeling – ADL recognition

• zero-resource ASR - language acquisition by machines

• speech assessment - education

Speech assessment

• Reading tutor (dyslexia) / trainer after CI fitting

• Assess native (?) pronunciation, reading/respeak tracking

• Children’s speech, hesitant, poorly articulated

Zero-resource speech recognition

Why ?

• Assistive technologies:

– people with limited fine motor control

– alternative to scanning

– cope with dysarthric voices

• Huge inter-speaker variation

• Timing, extraneous sounds

• Dialects

• Long-term: interacting with robots

– “Fetch a Hoegaarden Grand Cru from the fridge”

– “Get my red slippers”

– “Open the garden window for me”

What’s different ?

• Learn acoustic model and language model from examples with noisy, high-level supervision information

– Not like traditional ASR

– Not like the zero-resource challenge (IS15)

• Our first steps:

– Home automation

– “open the kitchen door”, “kitchen door open”

– Learn from demonstrations = weak supervision

• Learn acoustic model, vocabulary and grammar (ASR)

• Learn mapping to semantic frames (NLU)
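To make the weak-supervision setup above concrete, here is a minimal Python sketch of how a demonstration could be paired with a semantic frame. The names `SemanticFrame`, `Demonstration`, `group_by_frame` and the slots `action` and `target` are illustrative assumptions, not the group's actual representation. The only point is that both word orders from the slide map to the same frame, and that the learner sees the frame but no word-level transcription or alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticFrame:
    # Hypothetical slots for a home-automation command.
    action: str
    target: str


@dataclass(frozen=True)
class Demonstration:
    # Text stand-in for the recorded audio; a real system would hold
    # acoustic features here, not a transcript.
    utterance: str
    # The frame comes from the demonstrated action, not from the words.
    frame: SemanticFrame


def group_by_frame(demos):
    """Collect the utterances that were demonstrated with the same frame."""
    groups = {}
    for demo in demos:
        groups.setdefault(demo.frame, []).append(demo.utterance)
    return groups


demos = [
    Demonstration("open the kitchen door", SemanticFrame("open", "kitchen door")),
    Demonstration("kitchen door open", SemanticFrame("open", "kitchen door")),
]
groups = group_by_frame(demos)
```

In this sketch both utterances end up under one frame, which is the kind of high-level supervision the slide refers to: the vocabulary and word order still have to be discovered by the learner.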

VIVOCA results

Work ahead

• Larger vocabularies

– How does a word spurt come about ?

• Faster learning

– Ideally from one example

• More complex instructions and semantic representations

– Continuous state space

– Dynamic representation of semantics

– Uncertainty in meaning

– ...

– Related to many current research topics in robotics

What’s needed ?

Speech assessment

• Investment in non-native and regional accent data

• Getting government involved is hard (budget cuts etc.)

Zero-resource ASR

• Interaction data

– grow complexity of task

– Limited reuse from one task to the next

• Understanding by community of relevance of the problem

– Cf. reviewer instructions for IS15 Zero-resource Challenge

• Investment attitude in Europe/Belgium

• Industrial interest is growing internationally

Questions ?
