Do we need linguistic knowledge for speech technology applications in African languages?

Justus C Roux, AFLAT 2010 – Malta. Copyright JC Roux

1. AFLAT 2010 Malta

2. Introduction
3. Aims
4. Multidisciplinary nature of our activities
5. Interaction between linguists and engineers / computer scientists
"Every time I fire a linguist, the performance of our speech recognition system goes up." (remark attributed to Frederick Jelinek)
6. Striking a balance
7. Factors
8. Linguistic knowledge
Demonstrate how the development of technologies has an impact on the type of linguistic knowledge needed for particular applications
Focus: two approaches to speech synthesis
Unit selection / Concatenative synthesis (Rule driven)
HMM-based generation synthesis (Data driven)
9. Unit selection / Concatenative synthesis
Widely accepted and approved technique
Festival Speech Synthesis System (Black et al., 1998)
FestVox (Black & Lenzo, 2000)
Rule based: e.g. prosodic models built on detailed data on duration, stress and intonation, available for the major languages of the world
Normally excellent (domain specific) quality of speech
10. Concatenative synthesis in a tone language
Consider the following two words in isiXhosa, where tonal movement takes place when the diminutive suffix /-ana/ is added:
(1) umfundisi (teacher)
(2) umfundisi + ana > umfundisana: HL > LH
To generate a spoken version of example (2), consider the following (oversimplified) description of the process:
11. Language Component
TEXT INPUT
umfundisana
MORPHOLOGICAL COMPONENT
u + mu + fund + is + ana
LEXICON / TONAL ASSIGNMENT
# u + mu + fund + is + ana #
GRAPHEME PHONEME CONVERSION
# u + mu + fund + is + ana #
NORMALISATION
PHONOLOGICAL RULES
V → Ø / m__ + [ V root
Nas → [+ syllabic] / # V__ + C
# u + m_ + fund + is + ana #
# u + m + fund + is + ana #
PHONETIC FORM
[umfundisa:na]
PROSODIC RULES
Penultimate vowel lengthening
Tonal assignment HL > LH
# u + m + fund + is + a:na #
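The segmental part of this cascade can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the function names are hypothetical, the vowel-deletion rule is hard-coded for the prefix mu- instead of being stated over features and conditioning environments, and tone is not modelled at all, since reliable tonal rules are exactly what the following slides question.

```python
"""Toy rule cascade for isiXhosa umfundisana (segments only; tone not modelled)."""
from typing import List


def delete_prefix_vowel(morphs: List[str]) -> List[str]:
    # Hypothetical simplification of "V -> Ø / m__ + [ root": the prefix "mu"
    # loses its vowel. The real conditioning environment is simplified away.
    return ["m" if m == "mu" else m for m in morphs]


def lengthen_penult(word: str) -> str:
    # Penultimate vowel lengthening: insert ":" after the second-to-last vowel.
    positions = [i for i, ch in enumerate(word) if ch in "aeiou"]
    if len(positions) < 2:
        return word
    i = positions[-2]
    return word[: i + 1] + ":" + word[i + 1 :]


def derive(morphs: List[str]) -> str:
    # Apply the rules in order, remove morpheme boundaries, wrap in brackets.
    surface = "".join(delete_prefix_vowel(morphs))
    return "[" + lengthen_penult(surface) + "]"


print(derive(["u", "mu", "fund", "is", "ana"]))  # [umfundisa:na]
```

Ordering matters in such a cascade: each rule sees only the output of the previous one, which is why rule-based synthesis depends on having the full, correctly ordered rule set.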
12. Technical Component: Speech Generation
SPEECH DATABASE OF PRE-RECORDED UNITS:
(diphones, triphones)
UNIT SELECTION PROCESS
Selection of applicable units from the speech database to match the required phonetic form
SETS OF POTENTIAL CANDIDATES
[Figure: sets of candidate units from the database for each segment of the target form]
CONCATENATION
umfundisa:na
SMOOTHING OF JUNCTURES
SPOKEN FORM [umfundisa:na]
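Unit selection is commonly formulated as a search for the candidate sequence that minimises the sum of a target cost (how well a unit matches the required specification) and a join cost (how smoothly adjacent units concatenate), solved by dynamic programming. The Python sketch below assumes a single toy feature per unit (for example F0 in Hz) and equal weights; the function name, unit ids and numbers are hypothetical, and Festival's actual cost functions use many more features.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (unit id, toy acoustic feature such as F0 in Hz)


def select_units(targets: List[float], candidates: List[List[Candidate]],
                 w_target: float = 1.0, w_join: float = 1.0) -> List[str]:
    """Viterbi search for the unit sequence minimising target + join cost."""
    cost: List[List[float]] = []  # cost[i][j]: cheapest path ending in candidate j of slot i
    back: List[List[int]] = []    # back[i][j]: predecessor index on that path
    for i, target in enumerate(targets):
        row_cost, row_back = [], []
        for _, feat in candidates[i]:
            target_cost = w_target * abs(feat - target)
            if i == 0:
                row_cost.append(target_cost)
                row_back.append(-1)
                continue
            # Join cost: feature mismatch at the concatenation point.
            scores = [cost[i - 1][k] + w_join * abs(prev_feat - feat)
                      for k, (_, prev_feat) in enumerate(candidates[i - 1])]
            k_best = min(range(len(scores)), key=scores.__getitem__)
            row_cost.append(scores[k_best] + target_cost)
            row_back.append(k_best)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = back[i][j]
    return path[::-1]


targets = [120.0, 110.0, 100.0]
candidates = [[("a1", 118.0), ("a2", 140.0)],
              [("b1", 112.0), ("b2", 90.0)],
              [("c1", 101.0), ("c2", 130.0)]]
print(select_units(targets, candidates))  # ['a1', 'b1', 'c1']
```

Smoothing of junctures then only has to repair small mismatches, because the join cost has already steered the search away from badly matching neighbours.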
13. Availability of resources for rule based synthesis (1)
Morphology?
UNISA morphology project for Southern African languages of the Bantu group: great progress at implementation level
Phonology?
Various studies on aspects of the phonologies of African languages within different theories; theoretical models per se have limited implementation value
Phonetics?
Mainly impressionistic descriptions; very few quantitative studies in formats suitable for speech technology applications
Experimental phonetic data mainly represent laboratory-based, read speech, lacking real-world authenticity
14. Availability of resources for rule based synthesis (2)
Prosody - Intonation?
Pronunciation dictionaries
Mapping orthography to pronunciation: manual or G2P; progress with the Lwazi project of the Meraka Institute, but the basic problem of tonal representation remains
Zerbian & Barnard (2008): Phonetics of intonation in Southern African Bantu languages
Emphasises the need for quantitative acoustic data on intonational processes.
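Why G2P leaves the tonal problem untouched can be seen from a minimal rule-based converter: isiXhosa spelling is largely transparent for segments but marks no tone, so nothing tonal can come out. The Python sketch below uses greedy longest-match over a deliberately tiny, hypothetical grapheme table; it is not the Lwazi method, and a real system would need the full grapheme inventory.

```python
from typing import List

# Hypothetical, deliberately tiny grapheme-to-phoneme table for isiXhosa spelling.
# A real system needs the full inventory (e.g. aspirated clicks such as "xh").
G2P_TABLE = {
    "hl": "ɬ", "dl": "ɮ",                # lateral fricatives
    "ph": "pʰ", "th": "tʰ", "kh": "kʰ",  # aspirated stops
    "c": "ǀ", "q": "ǃ", "x": "ǁ",        # dental, (post)alveolar and lateral clicks
}
LONGEST = max(len(g) for g in G2P_TABLE)


def g2p(word: str) -> List[str]:
    """Greedy longest-match conversion; letters not in the table map to themselves."""
    word = word.lower()
    phones: List[str] = []
    i = 0
    while i < len(word):
        for size in range(LONGEST, 0, -1):
            chunk = word[i:i + size]
            if chunk in G2P_TABLE:
                phones.append(G2P_TABLE[chunk])
                i += len(chunk)
                break
        else:
            # No special grapheme starts here: keep the letter as its own phone.
            phones.append(word[i])
            i += 1
    return phones


print(g2p("ukuhlala"))  # no tone in the output: the orthography does not mark it
```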
15. Nature of tonal data in Southern Bantu
Impressionistic descriptions (interpretations of the researcher)
Example of inconsistency in isiXhosa: three different researchers (1969, 1973 & 1992) transcribed the same speaker with different tone marking for the same items!
[Detailed discussions in Roux 1991, 1995a, 1995b, 2001, 2003]
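One way to make such inconsistencies measurable rather than anecdotal is per-syllable agreement between transcribers. The Python sketch below is illustrative only: the labels R1 to R3 and the tone strings are invented, not the 1969, 1973 and 1992 data, and raw percent agreement ignores chance agreement (a statistic such as Cohen's kappa would correct for that).

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_agreement(marks: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Share of syllables on which each pair of transcribers marks the same tone."""
    result: Dict[Tuple[str, str], float] = {}
    for (a, tones_a), (b, tones_b) in combinations(marks.items(), 2):
        if len(tones_a) != len(tones_b):
            raise ValueError("transcriptions must cover the same syllables")
        same = sum(x == y for x, y in zip(tones_a, tones_b))
        result[(a, b)] = same / len(tones_a)
    return result


# Hypothetical H/L markings of one four-syllable item by three transcribers.
marks = {"R1": "LHLL", "R2": "LHHL", "R3": "HHLL"}
for pair, score in pairwise_agreement(marks).items():
    print(pair, score)
```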
16. Questions
Which rules are to be used for implementation in a rule-based speech synthesis system?
Alternatively:
When will reliable (tonal) rules be available in a usable format for speech synthesis, particularly in African tone languages?
But technologies change, giving rise to new approaches to linguistic data and to the definition of linguistic knowledge
17. HMM-based Speech Synthesis System (HTS)
2002: first version released, following the pioneering work of Keiichi Tokuda (http://www.sp.nitech.ac.jp/~tokuda/)
18. Technical detail on HMM synthesis for less resourced languages
Described in
Roux, JC & Visagie, AS. 2007. Data-driven approach to rapid prototyping Xhosa speech synthesis. Proceedings of the 6th ISCA Speech Synthesis Workshop, Bonn, Germany, pp 143-147.
Maia, R, Zen, H, Tokuda, K, Kitamura, T & Resende, FGV. 2003. Towards the development of a Brazilian Portuguese Text-to-Speech system based on HMM. Eurospeech, Geneva, pp 2465-2468.
19. Types of knowledge involved (1)
20. Types of knowledge involved (2)
21. Types of knowledge involved (3)
Point is:
Fine-grained phonetic data and tonological rules are not necessarily required for generating plausible, intelligible speech, as long as the carriers of that information are present in the text data (and the corresponding speech recordings) used for training the system
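In HMM-based synthesis the text side typically reaches the models as context-dependent labels derived automatically from the phone sequence, while the acoustic correlates, pitch included, are learned from the aligned recordings. The Python sketch below builds simplified quinphone labels loosely modelled on the `p1^p2-p3+p4=p5` prefix of HTS-style full-context labels; the function name and the padding symbol `x` are assumptions, and real HTS labels carry many more fields (syllable, word and phrase positions, and so on). No tone field appears: in line with the point above, tonal behaviour would have to be captured from the recordings through these contexts.

```python
from typing import List


def quinphone_labels(phones: List[str], pad: str = "x") -> List[str]:
    """Simplified HTS-style context labels: left-left^left-centre+right=right-right."""
    padded = [pad, pad] + list(phones) + [pad, pad]
    labels = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2:i + 3]
        labels.append(f"{ll}^{l}-{c}+{r}={rr}")
    return labels


for label in quinphone_labels(["a", "n", "a"]):
    print(label)
```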
22. Characteristics of text-to-speech (TTS) systems
23. Examples
isiXhosa

  • based on 43 minutes of actual recorded speech

24. no tonal information included
25. 3 339 words
SA English

  • based on 140 minutes of actual recorded speech