Do we need linguistic knowledge for speech technology applications in African languages?
1. AFLAT 2010 Malta
Copyright JC Roux
2. Introduction
3. Aims
4. Multidisciplinary nature of our activities
5. Interaction between linguists and engineers / computer
scientists
Every time I fire a linguist, the performance of our speech
recognition system goes up.
6. Striking a balance
7. Factors
8. Linguistic knowledge
Demonstrate how the development of technologies has an impact on
the type of linguistic knowledge needed for particular
applications
Focus: Two approaches towards speech synthesis
Unit selection / Concatenative synthesis (Rule driven)
HMM-based generation synthesis (Data driven)
9. Unit selection / Concatenative synthesis
Widely accepted and approved technique
Festival Speech Synthesis System (Black et al. 1998)
FestVox (Black & Lenzo, 2000)
Rule based: e.g. prosodic models based on detailed data on
duration, stress and intonation (major languages of the world)
Normally excellent (domain specific) quality of speech
10. Concatenative synthesis in a tone language
Consider the following two words in isiXhosa where tonal movement
takes place when the diminutive suffix /-ana/ is invoked:
(1) umfundisi (teacher)
(2) umfundisi + ana > umfundisana, with tonal movement HL > LH
In order to generate a speech version of example (2) consider the
following (oversimplified) description of the process:
11. Language Component
TEXT INPUT
umfundisana
MORPHOLOGICAL COMPONENT
u + mu + fund + is + ana
# + m+ fnd + s + ana#
LEXICON / TONAL ASSIGNMENT
GRAPHEME PHONEME CONVERSION
# u + mu + fund + is + ana #
NORMALISATION
# + m_+ fnd + s + ana #
# + m+ fnd + s + ana #
PHONOLOGICAL RULES
V -> Ø / m __ + [ V root
Nas -> [+ syllabic] / # V __ + C
PHONETIC FORM
[mfundsa:na]
PROSODIC RULES
Penultimate vowel lengthening
Tonal assignment HL > LH
# + m+ fnd + s + a:na #
# + m+ fund + s+ a:na #
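The language component above can be pictured as a chain of small string rewrites. The following is a minimal sketch, not the system described in the talk: it starts from the morpheme analysis u + mu + fund + is + ana, drops the vowel of the prefix mu, and applies penultimate vowel lengthening. The function names are hypothetical, the conditioning context of the vowel-deletion rule is deliberately simplified, and tonal assignment is left out altogether.

```python
VOWELS = set("aeiou")


def join_morphemes(morphemes):
    """Concatenate morphemes, dropping the vowel of the prefix 'mu'.

    Simplified stand-in for the vowel-deletion rule: the deletion is
    applied whenever 'mu' is followed by another morpheme.
    """
    out = []
    for i, morpheme in enumerate(morphemes):
        if morpheme == "mu" and i + 1 < len(morphemes):
            morpheme = "m"
        out.append(morpheme)
    return "".join(out)


def lengthen_penultimate_vowel(word):
    """Mark the second-to-last vowel as long by appending ':' to it."""
    positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(positions) < 2:
        return word  # no penultimate vowel to lengthen
    i = positions[-2]
    return word[: i + 1] + ":" + word[i + 1 :]


morphemes = ["u", "mu", "fund", "is", "ana"]
surface = join_morphemes(morphemes)
phonetic = lengthen_penultimate_vowel(surface)
print(surface)   # umfundisana
print(phonetic)  # umfundisa:na
```

Each step needs an explicit, implementable rule; the tonal step is the one for which no such rule is supplied here.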
12. Technical Component : Speech Generation
SPEECH DATABASE of PRE-RECORDED UNITS:
bstlb.., bstlb.., ffff (diphones, triphones)
UNIT SELECTION PROCESS
Selection of applicable units from the speech database to match the
required phonetic form
SETS OF POTENTIAL CANDIDATES
[Diagram: candidate units from the database for each position in the target form]
CONCATENATION
m fundsa:na
SMOOTHING OF JUNCTURES
SPOKEN FORM [mfundsa:na]
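The selection and concatenation steps above can be illustrated with a small dynamic-programming search. This is a sketch under stated assumptions, not the Festival implementation: the unit labels, the `Unit` type, the single pitch value per unit and the pitch-difference join cost are all hypothetical, and no target cost is modelled (candidates are simply looked up by label). Smoothing of junctures is not shown.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    label: str       # phonetic label of the pre-recorded unit
    take: int        # which recording of that label (hypothetical id)
    pitch_hz: float  # one pitch value standing in for the whole unit


def join_cost(left, right):
    """Hypothetical join cost: pitch mismatch at the juncture."""
    return abs(left.pitch_hz - right.pitch_hz)


def select_units(targets, database):
    """Choose one candidate per target label, minimising the summed join cost."""
    candidates = [database[t] for t in targets]
    # best[i][unit] = (cheapest cost of a path ending in unit, previous unit)
    best = [{c: (0.0, None) for c in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for c in candidates[i]:
            prev, cost = min(
                ((p, best[i - 1][p][0] + join_cost(p, c)) for p in candidates[i - 1]),
                key=lambda pair: pair[1],
            )
            layer[c] = (cost, prev)
        best.append(layer)
    # Backtrack from the cheapest final unit.
    last = min(best[-1], key=lambda c: best[-1][c][0])
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy database: made-up labels and pitch values, for illustration only.
database = {
    "um": [Unit("um", 0, 120.0), Unit("um", 1, 100.0)],
    "fu": [Unit("fu", 0, 118.0), Unit("fu", 1, 140.0)],
    "ndi": [Unit("ndi", 0, 90.0), Unit("ndi", 1, 121.0)],
    "sa:": [Unit("sa:", 0, 125.0)],
    "na": [Unit("na", 0, 80.0), Unit("na", 1, 123.0)],
}
path = select_units(["um", "fu", "ndi", "sa:", "na"], database)
print([u.take for u in path])          # [0, 0, 1, 0, 1]
print("".join(u.label for u in path))  # umfundisa:na
```

The search only chooses among recordings that already exist, so the required phonetic form (including length and tone) has to be specified before selection starts.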
13. Availability of resources for rule based synthesis (1)
Morphology?
UNISA morphology project for Southern African languages of the
Bantu group: great progress, at implementation level
Phonology ?
Various studies on aspects of the phonologies of African languages
within different theories; theoretical models per se have limited
implementation value
Phonetics?
Mainly impressionistic descriptions; very few quantitative
studies in formats that are suitable for speech technology
applications
Experimental phonetic data mainly represent laboratory-based speech
(read speech), lacking real-world authenticity
14. Availability of resources for rule based synthesis (2)
Prosody - Intonation?
Pronunciation dictionaries
Mapping: orthography to pronunciation > manual / G2P; progress
with the Lwazi project of Meraka, but a problem remains: basic tonal
representation
Zerbian & Barnard (2008): Phonetics of intonation in Southern
African Bantu languages
Emphasises the need for quantitative acoustic data on intonational
processes.
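The orthography-to-pronunciation mapping mentioned above can be sketched as a greedy longest-match lookup. This is an illustration only, not the Lwazi tooling: the table is a small, hypothetical excerpt of isiXhosa grapheme correspondences, and every unlisted letter is assumed to map to itself.

```python
# Hypothetical, partial grapheme-to-phoneme table (multi-letter graphemes included).
G2P_TABLE = {
    "xh": "ǁʰ",  # aspirated lateral click
    "hl": "ɬ",
    "dl": "ɮ",
    "sh": "ʃ",
    "ny": "ɲ",
    "c": "ǀ",    # dental click
    "q": "ǃ",    # alveolar click
    "x": "ǁ",    # lateral click
}
MAX_LEN = max(len(g) for g in G2P_TABLE)


def g2p(word):
    """Convert a word to a list of phone symbols, longest grapheme first."""
    word = word.lower()
    phones = []
    i = 0
    while i < len(word):
        for n in range(MAX_LEN, 0, -1):
            chunk = word[i : i + n]
            if chunk in G2P_TABLE:
                phones.append(G2P_TABLE[chunk])
                i += len(chunk)
                break
        else:
            # Assumption: letters not in the table are pronounced as written.
            phones.append(word[i])
            i += 1
    return phones


print(g2p("isiXhosa"))     # ['i', 's', 'i', 'ǁʰ', 'o', 's', 'a']
print(g2p("umfundisana"))  # ['u', 'm', 'f', 'u', 'n', 'd', 'i', 's', 'a', 'n', 'a']
```

The output is purely segmental: since the orthography carries no tone marks, a mapping of this kind cannot supply tonal representation on its own.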
15. Nature of tonal data in Southern Bantu
Impressionistic descriptions (Interpretations of researcher)
Examples of inconsistencies: isiXhosa, the same speaker, three
different researchers (1969, 1973 & 1992), with different tone
marking for the same items!
[Detailed discussions in Roux, 1991, 1995 (a) (b), 2001,
2003]
16. Questions
Which rules are to be used for implementation in a rule-based
speech synthesis system?
Alternatively:
When will reliable (tonal) rules be available in a useable format
for speech synthesis, particularly in African tone languages?
But, technologies change and give rise to new approaches to
linguistic data and the definition of linguistic knowledge
17. HMM-based Speech Synthesis System (HTS)
2002: First version released, following pioneering work of Keiichi
Tokuda (http://www.sp.nitech.ac.jp/~tokuda/)
18. Technical detail on HMM synthesis for less resourced
languages
Described in
Roux, JC & Visagie, AS. 2007. Data-driven approach to rapid
prototyping Xhosa speech synthesis. Proceedings of the 6th ISCA
Speech Synthesis Workshop, Bonn, Germany, pp 143-147
Maia, R, Zen, H, Tokuda, K, Kitamura, T & Resende, FGV. 2003.
Towards the development of a Brazilian Portuguese Text-to-Speech
system based on HMM. Eurospeech, Geneva, pp 2465-2468
19. Types of knowledge involved (1)
20. Types of knowledge involved (2)
21. Types of knowledge involved (3)
Point is:
Fine-grained phonetic data / tonological rules are not necessarily
required for generating plausible, intelligible speech, as long as
the carriers of that information are present in the text data (and
corresponding speech recordings) used for training the system
22. Characteristics of text-to-speech (TTS) systems
23. Examples
isiXhosa
- based on 43 minutes of actual recorded speech
24. no tonal information included
25. 3 339 words
SA English
- based on 140 minutes of actual recorded speech