Portability of ASR Technology to new Languages: multilinguality issues and speech/text resources
J. Kunzmann, K. Choukri, E. Janke, A. Kießling, K. Knill, L. Lamel, T. Schultz, and S. Yamamoto
Automatic Speech Recognition and Understanding (ASRU), December 2001
Topics which will be addressed
• Everybody speaks English: why bother with other languages?
• Doing another language is simply training with other data: no science left?
• Language portability: only an acoustic issue?
• Multilingual ASR: what is it good for?
• Data: what is available, what do we need?
• Beyond ASR
Why bother with other languages?
Myth: "Everyone speaks English, why bother?"
• About 4500 to 6000 different languages exist in the world.
• The number of languages on the Internet is increasing.
• English Internet pages: from 80% to 40% of all pages in 10 years
• The user's mother tongue matters for acceptance.
• Non-native speech
Top-15 Languages of the World
[Figure: bar chart of the number of speakers (millions) of the 15 most widely spoken languages, distinguishing native-language and official-language speakers. Source: Webster's New Encyclopedic Dictionary, 1992.]
Another language? - No science
Myth: "ASR in another language: it's just training on another database; there is no science here."
BUT: other languages bring unseen challenges.
• Have we even seen "all" language characteristics?
• Have we seen most of the language characteristics?
• Do we have the big picture?
• How do the differences affect ASR?
Language Characteristics
What is a word? "The written string between two blanks"?
• Example (Turkish): Osman-lı-laş-tır-ama-yabil-ecek-ler-imiz-den-miş-siniz
• Inflection system?
Effects for ASR: language modeling
• text processing, number of words in a text
• vocabulary size, OOV rates
• performance comparison?
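To make the effect on vocabulary size and OOV rate concrete, the Python sketch below counts both for full words and for morph units. It assumes a morphological segmentation is already available (marked with hyphens, as in the example above); the tiny Turkish word list and the helper names `vocab_and_oov`, `as_words` and `as_morphs` are hypothetical illustrations, not part of the original material.

```python
from collections import Counter


def vocab_and_oov(train_tokens, test_tokens, max_vocab=None):
    """Return (vocabulary size, OOV rate of test_tokens w.r.t. that vocabulary)."""
    counts = Counter(train_tokens)
    if max_vocab is None:
        vocab = set(counts)
    else:
        # Keep only the most frequent units, like a fixed-size recognition lexicon.
        vocab = {tok for tok, _ in counts.most_common(max_vocab)}
    oov = sum(1 for tok in test_tokens if tok not in vocab)
    return len(vocab), oov / len(test_tokens)


def as_words(segmented):
    # "ev-ler" -> "evler": whole orthographic words (the "string between two blanks").
    return [w.replace("-", "") for w in segmented]


def as_morphs(segmented):
    # "ev-ler" -> ["ev", "ler"]: assumes the segmentation is given.
    return [m for w in segmented for m in w.split("-")]


train = ["ev-ler", "ev-de", "okul-da"]  # evler, evde, okulda
test = ["ev-ler-de", "okul-da"]         # "evlerde" never occurs as a whole word

print(vocab_and_oov(as_words(train), as_words(test)))    # (3, 0.5)
print(vocab_and_oov(as_morphs(train), as_morphs(test)))  # (5, 0.0)
```

On a corpus this small the morph vocabulary is actually larger than the word vocabulary; the point is the OOV rate: "evlerde" ("in the houses") is unseen as a word, yet all of its morphs were seen in training. With realistic corpora in agglutinative languages, sub-word units are typically used to lower the OOV rate, at the price of modeling shorter units.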
Language Characteristics
Grapheme-to-phoneme relation / writing system
• Some languages have no written form at all!
Effects for ASR: pronunciation dictionary
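When the grapheme-to-phoneme relation is close, a pronunciation dictionary can be generated largely by rules. A minimal sketch, assuming a hypothetical, German-looking toy rule table (not a real G2P for any language) applied by greedy longest match; `g2p` and `build_lexicon` are illustrative names.

```python
# Hypothetical rules for a toy orthography (German-looking, but NOT a real German G2P).
RULES = {
    "sch": "S", "ch": "x", "ei": "aI",
    "a": "a", "e": "e", "i": "i", "n": "n", "s": "s", "t": "t", "l": "l",
}


def g2p(word, rules=RULES):
    """Greedy longest-match conversion of a written word into phoneme symbols."""
    max_len = max(len(g) for g in rules)
    phones, i = [], 0
    while i < len(word):
        # Try the longest grapheme first so "sch" wins over "s" + "ch".
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:  # no rule matched at position i
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def build_lexicon(words, rules=RULES):
    """Pronunciation dictionary: word -> space-separated phoneme string."""
    return {w: " ".join(g2p(w, rules)) for w in words}


print(build_lexicon(["schein", "acht", "teil"]))
# {'schein': 'S aI n', 'acht': 'a x t', 'teil': 't aI l'}
```

Irregular orthographies need exception lists or data-driven G2P on top of such rules, and a language with no written form gives them nothing to start from.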
Language Characteristics
Linguistic structure
• Phoneme system (number/confusability)
• Tonality, stress pattern
• Phonotactics (mora, consonant clusters)
• Coarticulation
Effects for ASR:
• Myth: IPA for real?
• What kind of acoustic units?
• Suprasegmental modeling?
Questions to the audience
• Everybody speaks English: why bother? For how many languages are speech interfaces needed?
• ASR in another language: no science? Have we seen most of the language characteristics already? Do we have the big picture?
Language Portability
Standard porting steps: do they work?
• Audio data, text data, pronunciation model (for some applications and languages: yes)
• What are suitable acoustic models for bootstrapping?
• Language-independent ASR?
• What phoneme set to use?
• What lexicon?
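One common answer to the bootstrapping question is to seed each phoneme of the new language with the acoustic model of a phoneme from an already trained language, then re-estimate on target-language data. The sketch assumes knowledge-based mapping: identical IPA symbols are reused, otherwise the source phoneme sharing the most articulatory features is chosen. The feature table and the name `map_phonemes` are hypothetical; data-driven mappings (e.g. from confusion statistics) are an alternative.

```python
# Hypothetical articulatory feature sets per IPA symbol (illustration only).
FEATURES = {
    "p": {"bilabial", "plosive", "voiceless"},
    "b": {"bilabial", "plosive", "voiced"},
    "t": {"alveolar", "coronal", "plosive", "voiceless"},
    "d": {"alveolar", "coronal", "plosive", "voiced"},
    "ʈ": {"retroflex", "coronal", "plosive", "voiceless"},
    "s": {"alveolar", "coronal", "fricative", "voiceless"},
    "ʂ": {"retroflex", "coronal", "fricative", "voiceless"},
    "a": {"vowel", "open", "front", "unrounded"},
    "i": {"vowel", "close", "front", "unrounded"},
    "y": {"vowel", "close", "front", "rounded"},
    "u": {"vowel", "close", "back", "rounded"},
}


def map_phonemes(target_inventory, source_inventory, features=FEATURES):
    """Choose a source-language phoneme to bootstrap each target-language phoneme."""
    mapping = {}
    candidates = sorted(source_inventory)  # fixed order makes ties deterministic
    for phone in target_inventory:
        if phone in source_inventory:
            mapping[phone] = phone  # same IPA symbol: reuse the source model directly
        else:
            # Otherwise take the source phoneme sharing the most features;
            # max() keeps the first candidate among equally good ones.
            mapping[phone] = max(
                candidates, key=lambda s: len(features[phone] & features[s])
            )
    return mapping


source = {"p", "b", "t", "d", "s", "a", "i", "u"}  # already trained language
target = ["p", "ʈ", "ʂ", "y", "a"]                 # new language
print(map_phonemes(target, source))
# {'p': 'p', 'ʈ': 't', 'ʂ': 's', 'y': 'i', 'a': 'a'}
```

Here y goes to i only because i comes first among equally scored candidates (u scores the same); such ties are exactly where linguistic judgment or data-driven scores should decide.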
Why we need portability of ASR technologies to N languages
Portability of ASR systems/technology for human-machine interfaces in N languages. When ASR systems/technology are applied to other languages, there is a:
• lack of speech corpora for acoustic modeling
• lack of spoken language corpora for language modeling
Portability of ASR systems/technology for multilingual speech translation
Extension to multilingual speech communication
Portability of ASR technology for multilingual speech translation
• Speech translation = speech recognition + machine translation + other functions
• Speech recognition requires a huge speech corpus.
• Machine translation technology is shifting from rule-based technology to corpus-based technology, such as stochastic MT or example-based MT.
• Corpus-based MT technology requires a huge sentence-aligned bilingual spoken language corpus.
• One of the key issues is the creation of sentence-aligned corpora.
• Some huge bilingual text corpora are available.
• Bilingual spoken language corpora are lacking.
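Since creating sentence-aligned corpora is named as a key issue, here is a simplified length-based aligner in the spirit of Gale and Church: dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 "beads", scoring each bead by how far the target length deviates from the expected length. The cost function, the bead penalties and the default length ratio of 1.0 are assumptions for illustration; a real aligner would use a probabilistic length model, a ratio estimated for the language pair, and usually lexical cues too.

```python
import math

# Bead types (source sentences, target sentences) with hypothetical extra
# penalties that discourage anything other than a 1-1 match.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def bead_cost(src_len, tgt_len, ratio):
    """Length mismatch of a bead, normalized by the expected target length."""
    if src_len == 0 or tgt_len == 0:
        return 0.0  # insertions/deletions are charged only via their penalty
    expected = src_len * ratio
    return abs(tgt_len - expected) / math.sqrt(expected)


def align(src, tgt, ratio=1.0):
    """Align two sentence lists; returns [(src_indices, tgt_indices), ...]."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order: every predecessor of (i, j) is final before (i, j) is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                cost = best[i][j] + penalty + bead_cost(s_len, t_len, ratio)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m) to (0, 0).
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


print(align(["aaaa", "bbbbbbbbbb"], ["xxxx", "yyyyyyyyyy"]))  # [([0], [0]), ([1], [1])]
print(align(["aaaa", "bbbb"], ["xxxxxxxx"]))                   # [([0, 1], [0])]
```

The second call shows why 2-1 beads matter: two short source sentences translated as one longer target sentence are merged rather than forced into a 1-1 pair plus an unmatched sentence.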
Multilinguality
What is multilinguality?
• Seen/unseen languages
• Non-native speech and language
• Multiple systems with language switching
Should we be building language-independent models?
Is multilingual pronunciation modelling possible?
Is multilingual language modelling sensible?
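One way to read "language-independent models" is to pool the phoneme inventories of several languages so that phonemes sharing an IPA symbol share one acoustic model, trained on data from every language that uses it. The sketch below does only the bookkeeping; the inventories and the name `pool_inventories` are hypothetical, and whether identical IPA symbols really sound alike across languages is exactly the "IPA for real?" question.

```python
from collections import Counter


def pool_inventories(inventories):
    """Merge per-language phoneme sets into one shared unit set.

    Returns the shared set and, for each unit, the number of languages
    whose data would be pooled to train it."""
    usage = Counter(p for phones in inventories.values() for p in set(phones))
    return set(usage), usage


# Hypothetical, deliberately incomplete inventories for three unnamed languages.
inventories = {
    "lang_A": {"a", "i", "u", "p", "t", "k", "s"},
    "lang_B": {"a", "e", "i", "o", "u", "p", "t", "k", "ʃ"},
    "lang_C": {"a", "i", "u", "y", "t", "k", "s", "x"},
}
shared, usage = pool_inventories(inventories)
per_language = sum(len(p) for p in inventories.values())
everywhere = sorted(p for p, n in usage.items() if n == len(inventories))
print(per_language, len(shared), everywhere)  # 24 12 ['a', 'i', 'k', 't', 'u']
```

In this toy case 24 language-specific units collapse into 12 shared ones, and five units receive training data from all three languages.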
Data
What data do we need? Make a wish:
• speech, transcriptions, lexicon, text corpora
• number of languages
• amount of speech and text data (#hours, #speakers, #words)
• application domain
What is available?
Do we have the right data?
Data: What is available?
From ELRA and LDC:
• transcribed speech data in >20 languages
• pronunciation dictionaries for on the order of 10 languages
• text corpora in >20 languages
GlobalPhone
What is planned? Speecon, OrienTel
Bilingual data: ATR
THE WORLD ACCORDING TO SPEECON (www.speecon.com)
[Figure: map of Speecon collection regions; labels include Japan, Russia, Israel, USA (US English + US Spanish), China, and Taiwan.]
Languages: Danish, Dutch, UK English, US English, Finnish, Flemish, French, German & Austrian German, Swiss German, Hebrew, Italian, Japanese, Mandarin Chinese, Polish, Portuguese, Russian, Spanish, Swedish, Mandarin (Taiwan), US Spanish, ...
GlobalPhone Multilingual Database
• Uniformity
• Widespread languages
• Newspaper domain + large text corpora
Total resources:
• 15 languages so far
• Fully transcribed: 15 × 20 h = 300 h of speech
• 1400 native speakers
• Ready; available soon
Languages: Arabic, Chinese-Mandarin, Chinese-Shanghai, English, French, German, Japanese, Korean, Croatian, Portuguese, Russian, Spanish, Swedish, Tamil, Turkish
Speech/Text Resources being collected at ATR
[Figure: a Japanese-English bilingual corpus is extended by translation into a multilingual spoken language corpus (Korean, Chinese, Italian, French, Thai, German); various speech corpora for Japanese and other languages are collected alongside.]
Speech/Text Resources being collected at ATR
For both ASR and MT:
• Bilingual conversation aided by human translators: 16,000 utterances
• Bilingual conversation via speech translation systems (under construction)
For MT:
• Text of bilingual conversation: 500,000 utterances
• Expanding with various methods, including paraphrasing; the extension rate is high for paraphrasing.
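Growth from paraphrasing is multiplicative, which is one plausible reason the extension rate is high. A hypothetical sketch, assuming each sentence is split into phrase slots, each slot has a list of paraphrase alternatives, and "extension rate" means expanded sentences per original sentence (the slide does not define it); the phrases and function names are illustrative only.

```python
from itertools import product

# Hypothetical paraphrase alternatives for phrase slots (travel-conversation style).
PARAPHRASES = {
    "I would like": ["I would like", "I'd like", "I want", "Could I have"],
    "a room": ["a room", "a single room"],
    "for tonight": ["for tonight", "for this evening", "tonight"],
}


def expand(slots, paraphrases):
    """Yield every sentence formed by picking one alternative per phrase slot.

    Slots without listed alternatives are kept unchanged."""
    options = [paraphrases.get(slot, [slot]) for slot in slots]
    for choice in product(*options):
        yield " ".join(choice)


def extension_rate(corpus, paraphrases):
    """Return (expanded sentences, expanded count / original count)."""
    expanded = [s for slots in corpus for s in expand(slots, paraphrases)]
    return expanded, len(expanded) / len(corpus)


corpus = [["I would like", "a room", "for tonight"], ["I would like", "a taxi"]]
expanded, rate = extension_rate(corpus, PARAPHRASES)
print(len(expanded), rate)  # 28 14.0
print(expanded[0])          # I would like a room for tonight
```

The two seed sentences yield 4 × 2 × 3 + 4 × 1 = 28 variants. Generated paraphrases still need checking for grammaticality and meaning before they are used as MT training data.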
Data: Do we have the right set?
Do we have the right data?
• What goal do you want to achieve?
• How much data do we need?
• Scripts / ready-to-go data?
• What do we need?
"You can't always get what you want, you get what you need" (Rolling Stones)
Beyond ASR
We cross cultural borders.
• Are concepts the same across languages?
• Defining concepts: time concepts (when does the day start?), politeness concepts
• What is the relationship between "words" and concepts?
• Generation
Topics which may have been addressed
• Everybody speaks English: why bother with other languages?
• Doing another language is simply training with other data: no science left?
• Language portability: only an acoustic issue?
• Multilingual ASR: what is it good for?
• Data: what is available, what do we need?
• Beyond ASR