
Developmental Speech Recognition, Bielefeld: 17-18 Feb. 2011

Discovering the Particulate Structure of Speech

Prof. Roger K. Moore

Chair of Spoken Language Processing

Dept. Computer Science, University of Sheffield, UK

(Visiting Prof., Dept. Phonetics, University College London)

(Visiting Prof., Bristol Robotics Laboratory)

Overview

• Human versus machine speech recognition

• Developmentally-inspired ASR

• Research conducted in the EU-FP6 ACORNS FET project

• The particulate structure of speech

• Phylogenetic and ontogenetic perspectives

• The role of the production system

• Relevant research at USFD

Human SR vs. Machine SR

[Figure: human SR vs. machine SR across the corpora Connected Digits, Alphabet Letters, Resource Management, Wall Street Journal Business, and Switchboard]

Taken from Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22, 1-16.

• What’s going on here?

• The definition of ‘recognition’ in machine SR is fundamentally correct …

– “the most likely explanation of the incoming data given a model of how it was produced”

• Any shortfalls in performance must therefore be due to ...

– insufficient fidelity of the data

– having the wrong model

• ASR researchers have been investigating both for ~60 years
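
The definition of ‘recognition’ quoted above is commonly written as a maximum a posteriori rule: choose the hypothesis W that maximises P(X|W) P(W) for the incoming data X. The following is a minimal sketch of that rule, assuming a toy vocabulary; the function name `recognise` and all scores are hypothetical and are not taken from the talk.

```python
import math

def recognise(log_likelihoods, log_priors):
    """Return the hypothesis W that maximises log P(X|W) + log P(W)."""
    scores = {w: log_likelihoods[w] + log_priors[w] for w in log_likelihoods}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical acoustic-model scores log P(X|W) for one utterance X.
log_likelihoods = {"duck": -12.0, "book": -11.5, "bath": -15.0}

# Hypothetical language-model priors log P(W).
log_priors = {
    "duck": math.log(0.6),
    "book": math.log(0.3),
    "bath": math.log(0.1),
}

best, scores = recognise(log_likelihoods, log_priors)
print(best)
```

In this framing, the two shortfalls listed above correspond to poor input data X and to a poorly chosen model behind P(X|W) and P(W).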

Human SR vs. Machine SR

[Figure: logarithmic axis from 0 to 10,000,000; curves: Supervised, Unsupervised, Unsupervised (reduced LM training); annotations: 2 year-old, 10 year-old, 80 year-old, >70 lifetimes]

Moore, R. K. (2003). A comparison of the data requirements of automatic speech recognition systems and human listeners, EUROSPEECH03. Geneva.

• What’s going on here?

• From an ML perspective …

– wrong type of data?

– underusing the data?

– lack of suitable priors?

• Answer = all three!

            Human                              Machine
Learning    incremental                        one-shot
Context     rich (situated & embodied)         poor (domain-specific)
Style       conversational & communicative     formal & performed
Priors      acquisition device                 AM & LM structure
Structure   constructed                        calibrated
Memory      dynamic (episodic & semantic)      static (probabilistic)

Developmentally-Inspired ASR

• These key differences have inspired a number of investigations into the possibility of an artificial embodied agent acquiring spoken language through incremental learning in a situated environment

• The classic study was published by Deb Roy in 1998

• In December 2006 the EU funded a 3-year Future and Emerging Technology project called ‘ACORNS’ (Acquisition of COmmunication and RecogNition Skills)

http://www.acorns-project.org/

‘Little ACORNS’ (LA)

ACORNS Memory Architecture

ten Bosch, L., Van hamme, H., Boves, L., & Moore, R. K. (2009). A computational model of language acquisition: the emergence of words. Fundamenta Informaticae, 90, 229-249.

ACORNS Pattern Discovery Algorithms

Acoustic DP-Ngrams

Aimetti, G., & Moore, R. K. (2009). Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions, INTERSPEECH. Brighton, UK.

[Figure: example utterance “Ewan is shy”; remaining labels not recoverable]

Acoustic DP-Ngrams

Aimetti, G., & Moore, R. K. (2009). Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions, INTERSPEECH. Brighton, UK.
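
The slides name the DP-ngram approach without giving its details. As a rough illustration of how dynamic programming can locate a repeated stretch shared by two utterances, the sketch below uses a generic Smith-Waterman-style local alignment. This is an assumption standing in for the cited method, not a reproduction of it; the function name, the scalar per-frame "features", the threshold, and the gap penalty are all hypothetical.

```python
def local_alignment(a, b, match_threshold=0.5, gap_penalty=1.0):
    """Find the best-matching local stretch shared by sequences a and b.

    a, b: lists of floats standing in for per-frame acoustic features.
    Returns (score, (start_a, end_a), (start_b, end_b)), ends exclusive.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Positive when the frames are closer than the threshold.
            sim = match_threshold - abs(a[i - 1] - b[j - 1])
            candidates = [
                (0.0, None),                                   # restart
                (score[i - 1][j - 1] + sim, (i - 1, j - 1)),   # align frames
                (score[i - 1][j] - gap_penalty, (i - 1, j)),   # skip in a
                (score[i][j - 1] - gap_penalty, (i, j - 1)),   # skip in b
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell to where the local path started.
    i, j = best_cell
    end_a, end_b = i, j
    while back[i][j] is not None:
        i, j = back[i][j]
    return best, (i, end_a), (j, end_b)

# Two toy "utterances" that share the pattern 2.0, 3.0, 2.0.
utt1 = [0.1, 0.9, 2.0, 3.0, 2.0, 0.2]
utt2 = [5.0, 2.0, 3.0, 2.0, 7.0]
align_score, span1, span2 = local_alignment(utt1, utt2)
print(utt1[span1[0]:span1[1]], utt2[span2[0]:span2[1]])
```

Stretches recovered in this way could then be stored as candidate word-like units, which is the role the repeated acoustic patterns play in the slides.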

Episodic Traces

[Figure: Dendrogram of Exemplar Units Within Internal Class DUCK; x-axis: Exemplar Index (exemplars 1 to 30)]

“duck” “theduck” “the” “is”

Exemplar Units

“nappy” “book”

“shoe” “bath”

“daddy” “car”

“telephone” “mummy”

“Ewan” “bottle”

Pattern Discovery (after 100 utterances)

‘objects’ emerging from audio-visual pattern discovery

Word Recognition

Epigenetic Landscape

Aimetti, G., ten Bosch, L., & Moore, R. K. (2009). Modelling early language acquisition with a dynamic systems perspective, 9th Int. Conf. on Epigenetic Robotics. Venice.

Effect of Fetal Hearing

Aimetti, G., & Moore, R. K. (2009). Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions, INTERSPEECH. Brighton, UK.

Whole Words → Sub-Words

Time-frequency ‘patches’ derived using ‘non-negative matrix factorisation’ (NMF)

Van Segbroeck, M., & Van hamme, H. (2009). Unsupervised learning of time-frequency patches as a noise-robust representation of speech. Speech Communication, 51(11), 1124-1138.
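
Non-negative matrix factorisation approximates a non-negative matrix V (for example, stacked time-frequency data) as a product W H, where the columns of W act as reusable parts and H holds their activations. The sketch below uses the standard multiplicative updates for the squared-error cost; it is an assumption for illustration only, since the cost function and patch construction in the cited paper may differ. The function name `nmf` and the toy data are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise non-negative V (n x m) as W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 6 "frequency bins" x 8 "frames" built from 2 hidden parts.
rng = np.random.default_rng(1)
true_parts = rng.random((6, 2))
true_activations = rng.random((2, 8))
V = true_parts @ true_activations

W, H = nmf(V, rank=2)
error = np.linalg.norm(V - W @ H)
print(W.shape, H.shape, error)
```

Because both factors stay non-negative, the parts can only add up, never cancel, which is what makes the learned columns of W interpretable as sub-word building blocks.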

Whole Words → Sub-Words

Parsing words using NMF-based sub-word structure

Van Segbroeck, M., & Van hamme, H. (2009). Unsupervised learning of time-frequency patches as a noise-robust representation of speech. Speech Communication, 51(11), 1124-1138.

Towards a General Principle

• It is not enough simply to ‘decompose’ speech into a hierarchy of seemingly arbitrary units

• There needs to be an underlying driving principle for the existence (and hence learning) of such structure

• One candidate is ‘the particulate principle of self-diversifying systems’ (Abler, 1989)

Self-Diversifying Systems

Abler, W. L. (1989). On the particulate principle of self-diversifying systems. Journal of Social and Biological Structures, 12, 1-13.

[Figure: ‘Blending’ constituents versus ‘Particulate’ constituents]

Self-Diversifying Systems

• Examples …

– chemical interaction

– biological inheritance

– human language

• All such systems “make infinite use of finite means” (Humboldt, 1836)

• Properties

– multidimensional

– hierarchical

– periodic
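
The contrast between blending and particulate constituents can be made concrete with a toy sketch. It assumes that blending means averaging two values and that particulate combination means concatenating discrete symbols; both function names and the example inventories are hypothetical.

```python
from itertools import combinations, product

def blend_generation(values):
    """Blending: each pair is replaced by its average, so extremes vanish."""
    return [(a + b) / 2 for a, b in combinations(values, 2)]

def particulate_combinations(particles, length):
    """Particulate: discrete units are concatenated and remain distinct."""
    return ["".join(p) for p in product(particles, repeat=length)]

# Blending shrinks the range of the population.
population = [0.0, 1.0, 2.0, 3.0]
blended = blend_generation(population)
spread_before = max(population) - min(population)
spread_after = max(blended) - min(blended)

# Particulate combination grows as k ** n for k units and length n.
counts = [len(particulate_combinations("pat", n)) for n in (1, 2, 3)]
print(spread_before, spread_after, counts)
```

The growth of the particulate counts with sequence length is one way to read “infinite use of finite means”.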

The Particulate Structure of Speech

• Grounded in …

– sensorimotor channels

– drives and intentions

• Structure is constructed …

– phylogenetically

– ontogenetically

• Emergent structures …

– pragmatic

– semantic

– syntactic

– lexico-morphemic

– phonological

– articulatory

The Particulate Structure of Speech

• Abler noted that the physical basis of human speech is fundamentally different from that of biological inheritance or chemical systems

• Consecutive speech gestures and their consequent acoustic signals exhibit blending

• This increases the length of time during which information concerning any one speech sound is present in the speech signal, thus giving the speech signal resistance to interference

• However, if blending ran to completion, it would obliterate most of the communicative power

• Abler concluded that the psychophysical thresholds of human speech perception superimpose a particulate structure over a blending structure

A Phylogenetic Perspective

• Spoken language also appears to differ from other particulate systems in that it is driven by ‘contrast’

• This is because it is a behaviour exhibited by living organisms, and it has evolved as a consequence of managing ‘energetics’

• In fact, the structure of all particulate systems is the result of constraints/attractors in …

– energy

– entropy

– time

• Living systems have solved the ‘persistence’ problem by actively managing these dimensions

Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49, 418-435.

• Dependencies exist between many living organisms, and some actively manage such dependencies

• Managing inter-organism dependencies represents a ‘communication’ system

• Many communication systems have evolved which exploit …

– information transfer

– manipulation

• Human speech has emerged as the highest information-rate system (probably because of the high DoF of the vocal articulators)

“The evolution of the active management of communication under energetic, informational and temporal constraints leads to an efficient contrastive particulate system with a structure and complexity that is a direct consequence of the degrees-of-freedom of the available signalling apparatus and the discriminability supported by the sensory inputs.”

Roger K. Moore, Feb 2011

An Ontogenetic Perspective

• So, what are the implications for an organism/system that has to acquire communicative skills?

• Is the particulate structure …

– pre-programmed?

– inferred/acquired from the signal?

– an emergent consequence?

• “Ontogeny recapitulates phylogeny” (Haeckel, 1866)

• Learning proceeds through a process of differentiation and factorisation, rather than clustering and segmentation (Karmiloff-Smith, 1992; Hendriks-Jansen, 1996)

The Role of Production

• The child is an active participant, not a passive observer

• Meaning is grounded in doing (Rizzolatti & Arbib, 1998)

• Speech understanding (and hence speech recognition) arises from inferring ‘communicative intentions’

• I.e. it is an ‘inverse’ problem (that can be solved computationally using ‘analysis-by-synthesis’)

• This is equivalent to invoking generative processes in perceptual interpretation (by recruiting information from the actual motor system)

• Production and perception develop hand-in-hand
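
Treating understanding as an ‘inverse’ problem solved by analysis-by-synthesis can be sketched as follows: run a forward (production) model for each candidate intention and keep the one whose predicted signal best matches the observation. The intention labels, templates, and function names below are hypothetical placeholders, not a model from the talk.

```python
# Hypothetical forward model: each intention predicts a short signal.
PRODUCTION_MODEL = {
    "greet": [0.1, 0.8, 0.3],
    "request": [0.9, 0.2, 0.5],
    "warn": [0.7, 0.7, 0.9],
}

def synthesise(intention):
    """Generate the signal the listener's own production system predicts."""
    return PRODUCTION_MODEL[intention]

def distance(x, y):
    """Squared Euclidean distance between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def infer_intention(observed, candidates):
    """Pick the intention whose synthesised signal is closest to the input."""
    return min(candidates, key=lambda c: distance(observed, synthesise(c)))

observed = [0.85, 0.25, 0.45]
inferred = infer_intention(observed, list(PRODUCTION_MODEL))
print(inferred)
```

The point of the sketch is that perception reuses the generative model: nothing is learned about the signal that is not already expressed in the production system.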

Relevant Research at USFD

• Incremental learning of particulate phonological structure

– acquisition of phonemic contrast in word pairs

• Speech energetics

– biomimetic/animatronic model of the human tongue and vocal tract (AnTon)

Hofe, R., & Moore, R. K. (2008). Towards an investigation of speech energetics using 'AnTon': an animatronic model of a human tongue and vocal tract. Connection Science, 20(4), 319–336.

Aimetti, G., Moore, R. K., & ten Bosch, L. (2010). Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching, INTERSPEECH. Makuhari, Japan.

• Vocabulary growth

– no evidence for the ‘vocabulary spurt’

• PRESENCE

– predictive sensorimotor control and emulation

Moore, R. K., & ten Bosch, L. (2009). Modelling vocabulary growth from birth to young adulthood, INTERSPEECH. Brighton, UK.

[Figure: x-axis: Age (years), 1 to 16]

Moore, R. K. (2007). PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE Trans. Computers, 56(9), 1176-1188.

[Figure: architecture diagram; surviving labels: needs, intention, action, interpretation, sensitivity, feeling; node annotations include S:E(U:m), S:E(U:i), S:E(U:n), S:E(U:E(S:m)), S:E(U:E(S:i))]

vocal interactivity in and between humans, animals and robots

Summary

• Human versus machine speech recognition

• Developmentally-inspired ASR

• Research conducted in the EU-FP6 ACORNS FET project

• The particulate structure of speech

• Phylogenetic and ontogenetic perspectives

• The role of the production system

• Relevant research at USFD

Thanks to …

The ACORNS team

Thank You

http://www.dcs.shef.ac.uk/~roger
