Philip Jackson, Boon-Hooi Lo and Martin Russell Electronic Electrical and Computer Engineering...

Philip Jackson, Boon-Hooi Lo and Martin Russell

Electronic, Electrical and Computer Engineering

Models of speech dynamics for ASR, using intermediate linear representations

http://web.bham.ac.uk/p.jackson/balthasar/

Abstract

INTRODUCTION

Speech dynamics into ASR

• dynamics of speech production to constrain recognizer
  – noisy environments
  – conversational speech
  – speaker adaptation
• efficient, complete and trainable models
  – for recognition
  – for analysis
  – for synthesis

INTRODUCTION

Articulatory trajectories

[Figure: from West (2000)]

Articulatory-trajectory model

[Figure: layers labelled finite-state, intermediate and surface; annotation: source dependent]

Articulatory-trajectory model

Multi-level Segmental HMM

• segmental finite-state process
• intermediate “articulatory” layer
  – linear trajectories
• mapping required
  – linear transformation
  – radial basis function network
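The three components listed above can be illustrated generatively. As a minimal sketch, assuming numpy, frames indexed t = 1, ..., T within a segment, a straight-line trajectory f(t) = m + c t in the intermediate layer, a linear mapping W, and Gaussian noise with covariance R on the acoustic layer, one segment could be sampled as below. The function and argument names are hypothetical, not taken from the authors' software.

```python
import numpy as np

def generate_segment(m, c, W, R, duration, rng):
    """Sample one segment of a linear-trajectory, linear-mapping model (sketch).

    m, c     : intercept and slope of the intermediate trajectory, shape (p,)
    W        : intermediate-to-acoustic mapping, shape (q, p)
    R        : acoustic-layer noise covariance, shape (q, q)
    duration : number of frames, supplied by the segmental (state-level) process
    rng      : numpy random Generator
    Returns the intermediate trajectory (duration, p) and acoustic frames (duration, q).
    """
    t = np.arange(1, duration + 1, dtype=float)    # frame index t = 1..duration
    f = m[None, :] + t[:, None] * c[None, :]       # straight line in the intermediate layer
    mean = f @ W.T                                 # linear intermediate-to-acoustic mapping
    noise = rng.multivariate_normal(np.zeros(W.shape[0]), R, size=duration)
    return f, mean + noise

# Illustrative use: a 2-D intermediate layer mapped to 3 acoustic dimensions
rng = np.random.default_rng(0)
W_demo = np.array([[1.0, 0.0], [0.5, 2.0], [0.0, -1.0]])
f_demo, y_demo = generate_segment(np.array([0.2, 1.0]), np.array([0.1, -0.05]),
                                  W_demo, 0.01 * np.eye(3), 5, rng)
```

Setting c to zero reduces the sketch to a constant trajectory within each segment.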

Linear-trajectory model

INTRODUCTION

[Figure: segmental HMM (states 1 to 5), intermediate layer, articulatory-to-acoustic mapping, acoustic layer]

Linear-trajectory equations

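Defined as

$f_i(t) = m_i + c_i\, t$,

where $m_i$ and $c_i$ are the trajectory parameters of state $i$.

Segment probability:

$b_i(y) = \prod_{t=1}^{\tau} \mathcal{N}\big(y_t;\ W f_i(t),\ R_i\big)$,

where $\tau$ is the segment duration.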
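The segment probability is usually evaluated in the log domain to avoid underflow over long segments. A minimal sketch, assuming numpy, f(t) = m + c t with t = 1, ..., T, and a positive-definite covariance R for the segment's state; the function name is hypothetical.

```python
import numpy as np

def segment_log_prob(Y, m, c, W, R):
    """log b_i(y) for one segment: sum over t of log N(y_t; W f(t), R), f(t) = m + c t.

    Y    : acoustic frames of the segment, shape (T, q), indexed t = 1..T
    m, c : trajectory parameters, shape (p,)
    W    : mapping, shape (q, p); R : state covariance, shape (q, q)
    """
    T, q = Y.shape
    t = np.arange(1, T + 1, dtype=float)
    F = m[None, :] + t[:, None] * c[None, :]       # trajectory f(t), shape (T, p)
    resid = Y - F @ W.T                            # y_t - W f(t), shape (T, q)
    _, logdet = np.linalg.slogdet(R)               # log |R|, R assumed positive definite
    # Mahalanobis term r_t^T R^{-1} r_t for every frame, via a solve instead of an inverse
    maha = np.sum(resid * np.linalg.solve(R, resid.T).T, axis=1)
    return float(-0.5 * np.sum(maha + logdet + q * np.log(2.0 * np.pi)))
```

Summing per-frame log densities is equivalent to taking the log of the product, and the linear solve avoids forming R^{-1} explicitly.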

THEORY

Linear mapping

Objective function

with matched sequences $X = \{x(t)\}$ and $Y = \{y(t)\}$,

$E = \sum_t \big(y(t) - W x(t)\big)^{\top} R^{-1} \big(y(t) - W x(t)\big)$

$D(X, Y) = \min_W E$
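With a single, shared positive-definite R, setting the gradient of E to zero gives sum_t y(t) x(t)^T = W sum_t x(t) x(t)^T, in which R cancels, so W is an ordinary least-squares solution. A minimal sketch, assuming numpy and frame-aligned arrays (for example 3 formants and 13 MFCCs); the function name is hypothetical.

```python
import numpy as np

def fit_linear_mapping(X, Y):
    """Least-squares estimate of W from matched sequences X (T, p) and Y (T, q).

    Minimises sum_t (y_t - W x_t)^T R^{-1} (y_t - W x_t) for one shared,
    positive-definite R; R cancels from the normal equations, so it is not needed.
    """
    # lstsq solves X @ B ~= Y column by column; B is W transposed, shape (p, q)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B.T                                     # W, shape (q, p)
```

If the covariance differed between states (R_i), it would no longer cancel and a weighted solution would be needed instead.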

THEORY

Trajectory parameters

Utterance probability, $\Pr(y \mid s, M)$, and, for the optimal (ML) state sequence $\hat{s}$:

[Equations: ML estimates of the trajectory parameters]
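A minimal sketch of one way to obtain ML trajectory parameters for a single segment, given W and the state covariance R. This derivation is made here under the linear-trajectory model with f(t) = m + c t, t = 1, ..., T, and a full-column-rank W; it is not necessarily the form used on the slide, and the names P and ml_trajectory are hypothetical. With P = (W^T R^{-1} W)^{-1} W^T R^{-1}, the residual y_t - W P y_t is R^{-1}-orthogonal to the columns of W, so the error splits into a constant plus a term in P y_t - f(t) with the same weight matrix at every frame. Minimising that term over m and c is an ordinary straight-line fit to P y_t in each dimension.

```python
import numpy as np

def ml_trajectory(Y, W, R):
    """ML intercept m and slope c of f(t) = m + c t for one segment (sketch).

    Y : acoustic frames of the segment, shape (T, q), frames t = 1..T
    W : linear mapping, shape (q, p), assumed full column rank
    R : covariance of the segment's state, shape (q, q), symmetric positive definite
    """
    T = Y.shape[0]
    t = np.arange(1, T + 1, dtype=float)
    Rinv_W = np.linalg.solve(R, W)                 # R^{-1} W, shape (q, p)
    # Generalised least-squares projection P = (W^T R^{-1} W)^{-1} W^T R^{-1}
    P = np.linalg.solve(W.T @ Rinv_W, Rinv_W.T)    # shape (p, q)
    Z = Y @ P.T                                    # frames projected to the intermediate layer
    t_bar = t.mean()
    z_bar = Z.mean(axis=0)
    if T == 1:
        c = np.zeros(Z.shape[1])                   # slope is undefined for a one-frame segment
    else:
        # Ordinary straight-line fit per intermediate dimension
        c = ((t - t_bar)[:, None] * (Z - z_bar)).sum(axis=0) / ((t - t_bar) ** 2).sum()
    m = z_bar - c * t_bar
    return m, c
```

For noise-free data generated by the model, P W = I, so the sketch recovers m and c exactly.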

THEORY

Non-linear (RBF) mapping

[Figure: RBF network mapping formant trajectories $x_j(t)$ to the acoustic layer $y_k(t)$]
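The slide does not give the network's details. As a sketch, assuming Gaussian basis functions with fixed centres and one common width, and a linear output layer with bias whose weights are fitted by least squares (one common way to train an RBF network, not necessarily the authors'), a formant-to-acoustic mapping could look like this. Centre placement, width and function names are assumptions.

```python
import numpy as np

def rbf_features(X, centres, width):
    """Gaussian activations phi_j(x) = exp(-||x - mu_j||^2 / (2 width^2)).

    X : formant frames, shape (T, p); centres : shape (J, p). Returns (T, J).
    """
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def _with_bias(Phi):
    """Append a constant column so the output layer has a bias term."""
    return np.hstack([Phi, np.ones((Phi.shape[0], 1))])

def fit_rbf_output_weights(X, Y, centres, width):
    """Least-squares output weights A, shape (q, J + 1), so that y ~= A [phi(x); 1]."""
    B, *_ = np.linalg.lstsq(_with_bias(rbf_features(X, centres, width)), Y, rcond=None)
    return B.T

def rbf_map(X, A, centres, width):
    """Map formant frames (T, p) to acoustic frames (T, q) with a fitted network."""
    return _with_bias(rbf_features(X, centres, width)) @ A.T
```

Only the output layer is linear in its parameters here; moving the centres or widths as well would need an iterative optimiser.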

THEORY

Trajectory parameters

With the RBF, the least-squares solution is sought by gradient descent:

[Equations: gradient-descent update for the trajectory parameters]
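Because the RBF mapping is non-linear, the trajectory parameters no longer have a closed form. A minimal sketch, assuming a Gaussian RBF network g(f) = A phi(f) + b with fixed centres and width, an unweighted squared error E = sum_t ||y_t - g(f(t))||^2, and f(t) = m + c t with t = 1, ..., T. The chain rule gives dE/df(t) = -2 J(f(t))^T (y_t - g(f(t))), and since df(t)/dm = I and df(t)/dc = t I, the gradients are sums of that term weighted by 1 and by t. Names, step size and iteration count are hypothetical.

```python
import numpy as np

def rbf_and_jacobian(f, A, b, centres, width):
    """Gaussian RBF network g(f) = A phi(f) + b and its Jacobian dg/df at one point f (p,)."""
    diff = f[None, :] - centres                                  # (J, p)
    phi = np.exp(-(diff ** 2).sum(axis=1) / (2.0 * width ** 2))  # basis activations, (J,)
    dphi = -(phi[:, None] * diff) / width ** 2                   # d phi_j / d f, (J, p)
    return A @ phi + b, A @ dphi                                 # (q,), (q, p)

def fit_trajectory_gd(Y, A, b, centres, width, m0, c0, lr=1e-3, n_iter=500):
    """Plain gradient descent on E(m, c) = sum_t ||y_t - g(m + c t)||^2, t = 1..T.

    No line search: lr must be small enough for the updates to stay stable.
    """
    m = np.array(m0, dtype=float)
    c = np.array(c0, dtype=float)
    for _ in range(n_iter):
        grad_m = np.zeros_like(m)
        grad_c = np.zeros_like(c)
        for t, y in enumerate(Y, start=1):
            g, J = rbf_and_jacobian(m + c * t, A, b, centres, width)
            dE_df = -2.0 * J.T @ (y - g)           # dE/df(t) by the chain rule, (p,)
            grad_m += dE_df                        # df(t)/dm = I
            grad_c += t * dE_df                    # df(t)/dc = t I
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c
```

A finite-difference check of rbf_and_jacobian is a cheap way to catch sign errors before running the descent.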

THEORY

Tests on TIMIT

• N. American English, at 8 kHz
  – MFCC13 acoustic features (incl. zeroth)
  a) F1-3: formants F1, F2 and F3, estimated by the Holmes formant tracker
  b) F1-3+BE5: five band energies added
  c) PFS12: synthesiser control parameters

METHOD

TIMIT baseline performance

[Chart: ID_0 and ID_1; x-axis: features]

• Constant-trajectory SHMM (ID_0)
• Linear-trajectory SHMM (ID_1)

RESULTS

Performance across feature sets

[Chart: ID_0, (a) F1-3, (b) F1-3+BE5, (c) PFS12, ID_1; x-axis: features]

RESULTS

Phone categorisation

Set  No. of groups  Description
A    1              all data
B    2              silence; speech
C    6              linguistic categories: silence/stop; vowel; liquid; nasal; fricative; affricate
D    10             as Deng and Ma (2000): silence; vowel; liquid; nasal; UV fric; /s,ch/; V fric; /z,jh/; UV stop; V stop
E    10             discrete articulatory regions
F    49             silence; individual phones
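The groupings A to F set how many separate mappings are trained, from a single global mapping (A) to one per phone (F). Assuming each group gets its own least-squares W, estimated only from frames whose phone label falls in that group, the idea can be sketched as below. The label-to-group dictionary is a hypothetical input, and the TIMIT-style labels in the usage are for illustration only.

```python
import numpy as np

def fit_grouped_mappings(X, Y, labels, group_of):
    """Estimate one least-squares mapping W_g per phone group g (sketch).

    X        : intermediate frames, shape (T, p)
    Y        : acoustic frames, shape (T, q)
    labels   : phone label of each frame, length T
    group_of : dict from phone label to group name (hypothetical grouping)
    Returns a dict from group name to W_g of shape (q, p).
    """
    groups = np.array([group_of[lab] for lab in labels])
    mappings = {}
    for g in np.unique(groups):
        sel = groups == g                          # frames belonging to group g
        B, *_ = np.linalg.lstsq(X[sel], Y[sel], rcond=None)
        mappings[str(g)] = B.T
    return mappings

# Illustrative use with grouping B (silence; speech)
group_of_B = {"h#": "silence", "aa": "speech", "s": "speech"}
```

Finer groupings give more flexible piecewise-linear mappings but leave fewer training frames per W.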

METHOD

Discrete articulatory regions

Region  Features                     Description
0       -voice                       Silence, non-speech
1       +voice, VT open              Vowel, glide
2       +voice, VT part.             Liquid, approximant
3       +voice, VT closed, +velum
4       +voice, VT closed            Voiced plosive (closure)
5       -voice, VT closed            Voiceless plosive (closure)
6       +voice, VT open, +plosion    Voiced plosive (release)
7       -voice, VT open, +plosion    Voiceless plosive (release)
8       +voice, VT part., +fric/asp  Voiced fricative
9       -voice, VT part., +fric/asp  Voiceless fricative

METHOD

Performance across groupings

[Chart: ID_0, A (1), B (2), C (6), D (10), E (10), F (49), ID_1; x-axis: mappings]

RESULTS

Results across groupings

[Chart: ID_0, A (1), B (2), C (6), D (10), E (10), F (49), ID_1; x-axis: mappings; panels: (a) F1-3, (b) F1-3+BE5, (c) PFS12]

RESULTS

Tests on MOCHA

• S. British English, at 16 kHz
  – MFCC13 acoustic features (incl. zeroth)
  – articulatory x- and y-coordinates from 7 EMA coils
  – PCA9+Lx: first nine articulatory modes plus the laryngograph log energy
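PCA9+Lx can be sketched as principal component analysis of the 14 EMA coordinates (x and y for 7 coils), keeping the first nine modes, with the laryngograph log energy appended as a tenth feature. This assumes mean-centring with no further scaling of the coordinates, and a per-frame Lx log energy that has already been computed and time-aligned; in practice the principal directions would be estimated on training data and reused for test data. The function name is hypothetical.

```python
import numpy as np

def pca_plus_lx(ema, lx_log_energy, n_modes=9):
    """First n_modes principal components of the EMA coordinates, plus Lx log energy.

    ema           : x- and y-coordinates for 7 coils, shape (T, 14)
    lx_log_energy : per-frame laryngograph log energy, shape (T,)
    Returns features of shape (T, n_modes + 1).
    """
    centred = ema - ema.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    modes = centred @ Vt[:n_modes].T               # projections onto the first modes
    return np.hstack([modes, lx_log_energy[:, None]])
```

The resulting modes are mutually uncorrelated over the data used to estimate them, which keeps the intermediate layer compact.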

METHOD

MOCHA baseline performance

[Chart: ID_0 and ID_1; x-axis: mappings]

RESULTS

Performance across mappings

[Chart: ID_0, A (1), B (2), C (6), D (10), E (10), F (49), ID_1; x-axis: mappings]

RESULTS

Model visualisation

DISCUSSION

[Figure panels: original acoustic data; constant-trajectory model; linear-trajectory model, (F) PFS12 (c)]

Conclusions

• Theory of Multi-level Segmental HMMs
• Benefits of linear trajectories
• Results show near-optimal performance with linear mappings
• Progress towards unified models of the speech production process
• What next?
  – unsupervised (embedded) training, to derive pseudo-articulatory representations
  – implement non-linear mapping (i.e., RBF)
  – include biphone language model, and segment duration models

SUMMARY

Philip Jackson, Boon-Hooi Lo and Martin Russell Electronic Electrical and Computer Engineering...

Documents

Transcript of Philip Jackson, Boon-Hooi Lo and Martin Russell Electronic Electrical and Computer Engineering...

Boon or Bane

HOAX CATEGORIZATION By Brenda Lee Hooi Fern A REPORT ...

Boon Thesis

ASR 6-49-2011 - ASR - Recent

By KHOO KHAY HOOI - Universiti Sains Malaysiaeprints.usm.my/26507/1/CUSTOMER_LOYALTY... · By . KHOO KHAY HOOI . ... 4.2 Goodness of measures ... have confirmed the brand equity and

052808 Asr Ch8 Asr Ecotox Phase1

Oil Curse and Finance-Growth Nexus in Malaysia: The Role ...€¦ · Oil Curse and Finance-Growth Nexus in Malaysia: The Role of Investment Ramez Abubakr Badeeb, Hooi Hooi Lean* and

Boon Chuan & Joyce

Boon models pdf

C:cygwinhomeAdministratorcvs modules asr-journal asr-draft › _pdf › brochures › COMNET.iprandom.2007.… · Title: C:cygwinhomeAdministratorcvs_modules asr-journal asr-draft.ps

Cisco ASR 9000 System Aggregation Services Routers · (Figure 1): the Cisco ASR 9010 Router, the Cisco ASR 9006 Router, and the Cisco ASR 9922 Router, and the Cisco ASR 9000v Router

Mechanical boon: will automation advance Australia? · Mechanical boon: will automation advance ... automation susceptibility. The education sector is the least ... Mechanical boon:

ONG TZE BOON

Alcoholism a boon.....?

Recycling a bOOn

Cisco ASR 1000 Series Aggregation Services Routers · PDF fileand and ) ) Cisco ASR 1000 Cisco Cisco or Cisco ASR Cisco ASR Cisco ASR 1000 Series or Cisco Cisco ASR 1000 Series EPA

Boon dos interchange

See Hooi Ewe BW - BW 5

(KABU) ASSA ADACHI, MASUMI AKIYOSHI, KUMI …...BEH LIAN YIM BEH SIEW LING BEH, BOH SIN & HO, PETER KOK WAI BEH, BOON SENG & JALILAH BINTI AWANG DAHALAN BEH, HOOI PING & YE, KIM CHYE

Boon and Rogers