Multilingual Speech Processing - Rapid Language Adaptation Tools & Technologies
Tanja Schultz1,2 & Alan W Black1
1 Language Technologies Institute (LTI), Carnegie Mellon
2 Cognitive Systems Lab, Karlsruhe Institute of Technology
Interspeech 2010, Tutorial on Multilingual Speech Processing
Sunday, September 26 2010, T-S2-R3 13:00 – 15:45, Makuhari, Japan
Tutorial Agenda
13:00 – 14:15 Part 1: Introduction and Motivation
o Motivation
o History and Leveraged Work
o Rapid Language Adaptation Server: Spice
o What, Why, and How
o Building process
14:15 – 14:30 BREAK
14:30 – 15:45 Part 2: SPICE - Under the hood
o Latest Experiments and Results in ASR and TTS
o Lessons Learnt from past studies
o Future
Outline Part 1
13:00 – 14:15 Part 1: Introduction
o Motivation
o World of Languages – Languages of the World
o Speech Processing Systems
o Speech-to-Speech Translation
o Spoken Dialog Systems
o History and Leveraged Work
o Globalphone
o FestVox
o Spice: A Rapid Language Adaptation Server
o User's Level View
o Walkthrough
o What, Why and How to use SPICE
o Overview of the Building Process
Introduction Outline
o Many Languages – so what?
o Growing Language Diversity on the web
o Why do we need Speech Processing in many languages?
o Is this really science – not just retraining on a new language?
o Language Characteristics
o Written form, scripts, letter-to-sound relationship
o Issues and Differences between languages
o Language Extinction
o Do we care? What can we do about it?
o Challenges of Multilingual Speech Processing
o Lack of Resources
o Lack of Experts
o Solutions
o Prior Work: GlobalPhone and FestVox
o Intelligent Learning Systems
o Rapid Language Adaptation Server
Many Languages – So What?
Do we really need Speech Processing in many languages?
Myth: “Everyone speaks English, why bother?”
NO: About 6900 different languages in the world
Increasing number of languages on the web
Humanitarian and military needs
Rural areas, uneducated people, illiteracy
Why is this a research issue?
Myth: “It’s just retraining on foreign data – simple!”
NO: Other languages bring unseen challenges, for example:
different scripts, no vowelization, no writing system
no word segmentation, rich morphology,
tonality, click sounds,
social factors: trust, access, exposure, cultural background
Everyone speaks English, why bother?
o Huge number of Languages in the world: 6912
o Language is not only a communication tool but fundamental to cultural identity and empowerment
o Treat linguistic diversity as we treat bio-diversity (David Crystal)
o The strongest ecosystems are the most diverse
o Cultures, ideas, memories are transmitted through language
[Figure: number of living languages by speaker population]
Speakers               Languages
[100 - 999] Mio                8
[10 - 99] Mio                 75
[1 - 9] Mio                  264
[100,000 - 1 Mio]            892
[10,000 - 99,999]          1,779
[1,000 - 9,999]            1,967
[100 - 999]                1,071
[10 - 99]                    344
[1 - 9]                      204
UNK                          308
Each dot gives the geographic center of the 6,912 living
languages, http://www.ethnologue.com (accessed Jul 2007)
Top Languages – Distribution
http://www.ethnologue.com, as of 25.08.2010
Distribution of Living Languages
So we need language support but why Speech?
Computerization: Speech is the key technology
Ubiquitous Information Access: on the go, phone-based
Mobile Devices: Too small and cumbersome for keyboards
Globalization:
Cross-cultural Human-Human Interaction
Multilingual Communities: EU, South Africa, …
Humanitarian needs, disaster, health care
Military ops, communicate with local people
Human-Machine Interfaces
People expect speech-driven applications in their mother tongue
Speech Processing in multiple Languages
Why Speech Processing?
ML Speech Processing – Research Issue?
It’s just retraining on foreign data - no science!
o New language – new challenges
o Writing system: different or no script, no vowelization, G-2-P
o Word segmentation, morphology
o Sound system: tonals, clicks
o Different Cultures – social factors
o trust, access, exposure, background
o Lack of Data and Resources
o Audio recordings, corresponding transcripts
o Pronunciation Dictionaries, Lexicon
o Text corpora, parallel bilingual data
o Lack of Experts
o Technology experts without language expertise
o Native language experts without technology expertise
Language Characteristics
Prosody, Tonality: Stress, Pitch, Length pattern, Tonal contours
(e.g. Mandarin 4, Cantonese 8, Thai & Vietnamese 5)
Sound system: simple vs very complex sound systems
(e.g. Hawaiian 5V+8C vs. German 17V+3D+22C)
Phonotactics: simple syllable structure vs complex
consonant clusters (e.g. Japanese Mora-syllables vs. German pf, st, ks)
Segmentation: Written form separate words by white space?
(NO: Chinese, Japanese, Thai, Vietnamese)
Morphology: short units, compounds, agglutination
English: Natural segmentation into short units – great!
German: Compounds – not quite so good
Donau-dampf-schiffahrts-gesellschafts-kapitäns-mütze …
Turkish: Agglutination – looooong phrases
Osman-l-laç-tr-ama-yabil-ecek-ler-imiz-den-miş-siniz
behaving as if you were of those whom we might consider not
converting into Ottoman
Writing Systems
Writing systems – basic unit is a Grapheme:
Logographic: based on semantic units, grapheme represents meaning
Chinese: >10,000 hanzi; Japanese: ~7,000 kanji; Korean to some extent
Phonographic: based on sound units, grapheme represents sound
Segmental: grapheme roughly corresponds to phonemes
Latin (190), Cyrillic (65), Arabic (22) graphemes
Abjads = consonantal segmental phonographic, e.g. Arabic
Syllabic: grapheme represents entire syllable, e.g. Japanese kana
Abugidas = mix of segmental and syllabic systems
Featural: elements smaller than phone, e.g. articulatory features
e.g. Korean: ~5,600 gulja
[Map legend: writing systems of the world (Wikipedia, August 2007)]
Segmental: Latin, Cyrillic, Latin & Cyrillic, Greek, Georgian or Armenian
Abjads: Arabic, Arabic & Latin, Hebrew & Arabic
Abugidas: North Indic, South Indic, Ethiopic, Thaana, Canadian Syllabic
Logographic + syllabic: pure logographic, mixed logographic & syllabaries,
featural syllabary + limited logographic, featural-alphabetic syllabary
Scripts – Examples
Scripts of some languages: Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, English, Greek,
Hebrew, Hindi, Italian, Japanese, Korean, Romanian, Russian, Serbian, Thai
How many languages have a written form?
• Omniglot lists about 780 languages that have scripts
• True number might be closer to 1000
(Source: Simon Ager, 2007, www.omniglot.com)
Logographic scripts, mostly 2 representatives:
• Chinese: ~10,000 hanzi
• Japanese: ~7,000 kanji (+ 3 other scripts)
Phonographic:
• Korean: ~5,600 gulja
• Arabic, Devanagari, Cyrillic, Roman: ~100 characters
Grapheme-to-Phoneme Relation
Grapheme-to-Phoneme (Letter-to-Sound) Relationship:
Logographic: NO relationship at all
concern for Chinese, Japanese, Korean
Phonographic: segmental: close – far – complicated
e.g. Finnish, Spanish: more or less 1:1, -- English: try „Phydough“
Phonographic: segmental – consonantal
e.g. Arabic: no short vowels written
Phonographic: syllabic
e.g. Thai, Devanagari: C-V flips
Automatic Generation of Pronunciations might get complicated
[Figure (after DeFrancis/Unger): ratio of phonetic to semantic code, ordered from phonographic to logographic: Finnish, French, English, Korean, Japanese, Chinese]
One more Reason for MLSP …
6900 Languages in the world …. BUT
o Extinction of languages on massive scale (David Crystal, Spotlight 3/2000)
o Half of all existing languages will die out over the next century
o On average: One language dies every two weeks!
o Survey Feb 1999 from Summer Institute of Linguistics
51 languages with 1 speaker left
28 of those in Australia alone
500 languages with 500 spks
1500 languages with < 1000 spks
3000 languages with < 10,000 spks
5000 languages with < 100,000 spks
96% of the world's languages are
spoken by only 4% of its people
[Figure: histogram of living languages by number of speakers, ranging from 8 languages with 100 - 999 Mio speakers to 204 languages with 1 - 9 speakers; 308 unknown]
The Future of Language
Is a language with 100,000 speakers safe?
o Survival for generations depends on pressure imposed on language
o Dominance of another language, Attitude of the speakers
o Example Breton: at the beginning of the 20th century it had 1 Mio speakers, now
down to 250,000; without effort Breton could be gone in 50 years
Reasons that languages die:
o Disaster: Earthquake on Papua New Guinea: Sissano, Warapu, Arop
o Genocide: 90% of America's natives died within 200 years of the Europeans' arrival
o Cultural assimilation: Colonialism, Suppression, Assimilation:
o (1) Political, social, economic pressure to speak the dominant language,
o (2) Emerging bilingualism,
o (3) self-conscious semilingualism, (4) monolingualism
Why should we care?
o Massive death of languages reduces the diversity
o Bio-diversity has been accepted to be a good thing
o Maybe we should accept this for language diversity (D. Crystal)
What can we do?
What do we learn from other languages?
o Intellectual issues: increase awareness of world history
such as movements of early civilization
o Practical issues: medical practices, alternative treatment forms
o Literature … but also new things about the language itself
o Slovakian proverb: “with each newly learned language you acquire a new soul”
How to save endangered languages:
o Community itself must want it, Surrounding culture must respect it
o Funding for courses, materials, and teachers, support the community
o Get linguists into the field, publish information, grammars, dictionaries
Costs associated:
o Depends on conditions (written vs. unwritten languages, etc.)
o Crystal estimates about $80,000 / year per language
o 3000 endangered languages is about $700 Mio …
o Organizations to raise funds
o Foundation for Endangered Languages (FEL), UNESCO project
Challenges of MLSP
o Lack of Resources: Stochastic approach needs lots of data
o Hundreds of hours audio recordings and corresponding transcriptions
Audio data 40 languages; Transcriptions take up to 40x real time
o Pronunciation dictionaries for large vocabularies (>100,000 words)
Large vocabulary pronunciation dictionaries 20 languages
o Mono- and bilingual text corpora: few language pairs, pivot mostly English
o Algorithms are language independent – MLSP is not!
o Other Languages bring unseen challenges (segmentation, G2P, etc.)
o Have we already seen ALL or MOST of the language characteristics?
o Social and Cultural Aspects
o Non-native speech and language, code switching
o Combinatorial explosion (domain, speaking style, accent, dialect, ...)
o Few native speakers at hand for minority (endangered) languages
o Do we have the right data?
o Lack of Language Experts
o Bridge the gap between technology experts and language experts
Intelligent systems that learn a language from the user
o Efficient learning algorithms for speech processing
o Learning:
o Interactive learning with user in the loop
o Statistical modeling approaches
o Efficiency:
o Reduce amount of data (save time and costs): at least by factor of 10
o Speed up development cycles: days rather than months
Rapid Language Adaptation from universal models
o Bridge the gap between language and technology experts
o Technology experts do not speak all languages in question
o Native users are not in control of the technology
One Solution: Learning Systems
GlobalPhone
Prior Work: GlobalPhone and FestVox
Multilingual Database
Widespread languages
Native Speakers
Uniform Data
Broad Domain
Large Text Resources
Internet, Newspaper
Corpus
19 Languages … counting
1800 native speakers
400 hrs Audio data
Read Speech
Filled pauses annotated
Arabic
Ch-Mandarin
Ch-Shanghai
German
French
Japanese
Korean
Croatian
Czech
Portuguese
Russian
Spanish
Swedish
Tamil
Turkish
+ Thai
+ Creole
+ Polish
+ Bulgarian
+ Vietnamese
+ ... ???
GlobalPhone
http://www.cs.cmu.edu/~tanja/GlobalPhone
Available from ELRA
Or check with Tanja
Phones in GlobalPhone
Multilingual Speech Processing, Schultz&Kirchhoff (ed.), Chapter 4, p.86
[Figure: bar chart of word error rate and phoneme error rate [%] (axis 0 to 50) per language: JA, DE, EN, KO, CH, TU, FR, PO, KR, SP, BL, CZ, PL, RU]
GlobalPhone Recognizers in 14 Languages
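Both metrics in the chart are conventionally computed as an edit distance between reference and hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition, not the GlobalPhone scoring tool; the function names and the toy sequences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Error rate in percent: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Word error rate on words, phoneme error rate on phone symbols.
wer = error_rate("hi you are".split(), "hi you".split())
per = error_rate(["h", "ai", "j", "u"], ["h", "a", "j", "u"])
print(f"WER {wer:.1f}%  PER {per:.1f}%")
```

The same function serves both metrics because only the token type changes: words for WER, phones for PER.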
GlobalPhone: Morphology & OOV-Rate
Language        Corpus        Size (Mio of      Vocabulary (thousands   OOV at
                              word tokens)      of word types)          60k / 64k
English         WSJ           19                105                     1%
Spanish         Newspaper     100               490                     ~1.5%
Portuguese      Newspaper     11                270                     4.3%
German          Broadcast N   45                >900                    4.4%
Czech           Newspaper     16                415                     8%
Serbo-Croatian  Internet      12                350                     8.7% (49k)
MSA             Newspaper     19                690                     11%
Turkish         Newspaper     16                500                     15%
Korean          Newspaper     15 (eojeols)      1400 (eojeols)          31%
                              44 (syllables)    3.5 (syllables)         ~0.01%
Chinese         Newspaper     82 (pinyin)       59                      0%
Multilingual Speech Processing, Schultz&Kirchhoff (ed.), Chapter 4, 5, 9
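An OOV rate of the kind reported in the table is usually measured by fixing a vocabulary of the N most frequent word types and counting the share of test tokens outside it. The sketch below illustrates this under the assumption of pre-tokenized text; the function names and the toy data are hypothetical and not taken from the cited chapters.

```python
from collections import Counter


def top_n_vocabulary(train_tokens, n):
    """Return the n most frequent word types of the training text."""
    return {word for word, _ in Counter(train_tokens).most_common(n)}


def oov_rate(test_tokens, vocabulary):
    """Percentage of test tokens that are not covered by the vocabulary."""
    if not test_tokens:
        return 0.0
    misses = sum(1 for token in test_tokens if token not in vocabulary)
    return 100.0 * misses / len(test_tokens)


train = "the cat sat on the mat the cat ran".split()
test = "the dog sat on the mat".split()
vocab = top_n_vocabulary(train, 60000)  # 60k cap as in the table header
print(f"OOV rate: {oov_rate(test, vocab):.1f}%")
```

For morphologically rich languages the choice of unit matters, as the Korean rows show: the same text gives very different OOV rates when counted in eojeols or in syllables.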
FestVox
Prior Work: GlobalPhone and FestVox
FestVox: Building Synthetic Voices
http://festvox.org [Black and Lenzo 2000]
o Documentation, Tools, Scripts, Examples
o Building Synthetic Voices in the Festival Speech Synthesis System
o Supports:
o Diphone, unit selection, (later Statistical Parametric Synthesis)
o Lexicon, letter to sound rules
o Text processing support.
Early FestVox Example Languages
o CMU development
o Croatian, Thai, Chinese (Mandarin), Japanese,
Catalan, Spanish, Nepali, Telugu, Tamil, Dari,
Pashto, Farsi
o Non-CMU
o At least: Italian, Malay, Maori, Mongolian,
Spanish, Telugu, Hindi, Japanese, English
(Many), German, Swedish, Polish, …
TTS Build Tasks
o Define phone set
o Define pronunciations (LTS vs. Lexicon)
o Design prompt list
o Record data
o Write text front-end
o Number, symbol expansion
o Write/train prosody model
o Deal with something novel
o Word segmentation, no vowels, declensions
TTS Build Results
o Results strongly correlated to effort
o Must-have for funded project
o Involve speech experts
o End of semester (student graduates)
o Almost random distribution rights
o Others can't always use the previous results
o No explicit copyrights (and no way to change them)
o Results often not in format for re-use
Joint Speech Model Development
CMU projects: Arabic, Thai, Croatian, Farsi
o Shared audio data collection
o Prompts with phonetic coverage
o Lots of (ASR) / Single (TTS) speaker(s)
o Shared Phone set
o Sometimes “similar” e.g. with/without Tone
o Shared Pronunciation Data
o (Note) input and output are different vocab
But we need a much tighter coupling ….
Speech Processing: Interactive Creation & Evaluation toolkit
• National Science Foundation, Grant 2004-2008 (Schultz & Black)
• Bridge the gap between technology experts and language experts
• Automatic Speech Recognition (ASR),
• Machine Translation (MT),
• Text-to-Speech (TTS)
• Develop web-based intelligent systems
• Interactive Learning with user in the loop
• Rapid Adaptation from universal models
• SPICE webpage http://cmuspice.org
Rapid Language Adaptation Toolkit (RLAT)
• Text Data Webcrawling, Focused Recrawling, Text Normalization
• Wiktionary-based Pronunciation Generation, Telephone Interface
• RLAT webpage http://csl.ira.uka.de/rlat-dev
SPICE and RLAT
Speech Processing Systems
[Diagram: Input: Speech → ASR (AM, Lex, LM) → NLP / MT → TTS → Output: Speech & Text, e.g. "Hello"]
Resources shown in the diagram:
o Phone set & Speech data
o Pronunciation rules, e.g. hi /h/ /ai/, you /j/ /u/, we /w/ /i/
o Text data, e.g. "hi you", "you are", "I am"
Speech-to-Speech Translation
[Diagram: two translation directions, Input L_s → Output L_t and Input L_t → Output L_s, between L_source and L_target. The components shown for source language s and target language t are acoustic models (AM), pronunciation dictionaries (Dict: word to phone sequence), language models (LM: N-grams) and translation lexicons (Lex: word s to word t).]
SPICE Design Principles
o Data Sharing → Language Universal Models
o Knowledge Sharing across System Components
Monolingual Dialog Systems
o Speech Models
o ASR acoustic and language models
o TTS models
o Lexical coverage
o NLP models
o Parsing
o Generation
o Interpretation (back-end processing)
SPICE
o Collects:
o Appropriate text data
o Appropriate audio data
o Defines:
o Phoneme set
o Rich prompt set
o Lexical pronunciations
o Produces:
o Pronunciation model
o ASR acoustic model
o ASR language model
o TTS voice
o Maintains:
o Projects and users login
o Data and Models
User's View of Spice
Building support for a new language
o Login and Project Registration
o Text Collection & Prompt Selection
o Speech Data Collection
o Phone Set Specification and Selection
o Lexical construction
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
Login and Project Registration
o Separate “projects“ for each language
o could share info between different projects
o All task times are logged
o Allows us to do cost/efficiency studies
Text Collection
o We need text data for the target language
o Web crawler
o Plus boost data from similar sites
o Language encoding
o Non-trivial, but ...
o Deal with very common alphabets
o Internally all utf-8
o In-domain vs general text
o Character analysis
o Find the character classes:
o casing, numerals, punctuation etc
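The character analysis step can be illustrated with Unicode general categories once the text has been decoded to a common encoding. This is a minimal sketch of the idea, not the SPICE implementation; the class names and the function names are assumptions chosen for this example.

```python
import unicodedata
from collections import Counter


def char_class(ch):
    """Coarse character class derived from the Unicode general category."""
    category = unicodedata.category(ch)
    if category == "Lu":
        return "upper"
    if category == "Ll":
        return "lower"
    if category.startswith("L"):
        return "letter-other"      # scripts without casing, e.g. Thai
    if category.startswith("N"):
        return "numeral"
    if category.startswith("P"):
        return "punctuation"
    if category.startswith("Z") or ch.isspace():
        return "space"
    return "other"


def analyze(raw_bytes, encoding="utf-8"):
    """Decode crawled bytes and count characters per class."""
    text = raw_bytes.decode(encoding)
    return Counter(char_class(ch) for ch in text)


counts = analyze("Hi, 42!".encode("utf-8"))
print(dict(counts))
```

A script in which no character falls into the upper or lower class signals that casing-based text normalization does not apply to that language.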
Prompt Selection
o Prompts for recording:
o Collection without transcription
o “Good” coverage will give “clean” models
o Prompts should be:
o Easy to say (no hard words, numerals etc)
o Rich in variability
Finding Nice Prompts
o Only contain high frequency words
o No unusual words with unusual spelling
o 5 to 15 words
o Make them easy to say without errors
o Easy to say in one breath group
o “Phonetically” rich
o But we have no phonetic information yet
o Make them orthographically rich
o Greedily select to maximize tri-graphs
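The selection criteria above can be sketched as a greedy set-cover over letter trigraphs. This is an illustrative sketch under stated assumptions: trigraphs are taken over the lower-cased sentence including spaces, the optional `frequent_words` filter stands in for the high-frequency word constraint, and all names are hypothetical rather than the SPICE code.

```python
def trigraphs(sentence):
    """Set of letter triples (including spaces) in a lower-cased sentence."""
    text = sentence.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def select_prompts(sentences, max_prompts, frequent_words=None,
                   min_words=5, max_words=15):
    """Greedily pick prompts that add the most not-yet-covered trigraphs."""
    def usable(sentence):
        words = sentence.lower().split()
        if not min_words <= len(words) <= max_words:
            return False
        return frequent_words is None or all(w in frequent_words for w in words)

    candidates = [s for s in sentences if usable(s)]
    covered, chosen = set(), []
    while candidates and len(chosen) < max_prompts:
        best = max(candidates, key=lambda s: len(trigraphs(s) - covered))
        gain = trigraphs(best) - covered
        if not gain:               # nothing new left to cover
            break
        chosen.append(best)
        covered |= gain
        candidates.remove(best)
    return chosen


corpus = [
    "aaa aaa aaa aaa aaa",
    "abc def ghi jkl mno",
    "abc def ghi jkl mnp",
    "too short",
]
print(select_prompts(corpus, max_prompts=2))
```

Orthographic trigraphs serve as a proxy for phonetic richness because no phone set or lexicon exists at this point of the build process.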
Speech Data Collection
o Online audio recording tools
o Collaboratively record large number of speakers
o Speakers may be separate from the developer
o Visual feedback during recording
o Automatic upload on completion
o Java based for portability
o Works with *many* browsers
o In control of recording
o We can control the recording format
o File contents and directory structure
New Alternative: Telephone-based Collection
o Option 1: use web-based recorder
o Option 2: Telephone (RLAT)
IVR: Interactive Voice Response
PSTN: Public Switched Telephone Network
VoIP: Voice over IP
Implementation
• Dialplan implementation: extensions.conf
• Phone number +49 721 180 30 681
Phoneme Selection
o Selection from standard IPA chart
o User's names for phonemes
o Can match their lexicon (if one exists)
o Can match their familiarity
o Audio feedback
o Click to hear recording of each phone
o Allows us to map their phone names
o We map phones to IPA
o Get phonetic features for user's phones
o (what are vowels, what are stops etc)
User's View of Spice
Building support for a new language
o Login and Project Registration
o Text Collection & Prompt Selection
o Speech Data Collection
o Phone Set Specification and Selection
o Lexical construction
o ASR Bootstrap & training
o ASR Language model
o TTS Voice Building
BREAK
13:00 – 14:15 Part 1: Introduction
o Motivation, History, Leveraged Work
o Spice: A Rapid Language Adaptation Server
o User's Level Walkthrough
Login, Data Collection, Phone Selection
After the Break:
o Spice – Under the Hood
o Lexical construction
o ASR Bootstrap & training
o ASR Language model
o TTS Voice Building
o Latest Experiments and Results
o Future
SPICE: Demo Tape
Outline Part 2
14:30 – 15:45 Part 2:
o SPICE – Under the hood
o Text collection & Prompt Selection
o Phone set specification
o Lexical construction
1. Forget construction, i.e. perform Grapheme-based ASR
2. Harvest Internet (Wiktionary) Resources
3. Interactive Learning (LexLearner)
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
o Latest Experiments and Results
o Evaluation
o Future Steps
Rapid Portability: Pronunciation Dictionary
[Diagram: speech processing pipeline, Input: Speech → AM, Lex, LM → NLP / MT → TTS → Output: Speech & Text]
Pronunciation rules, e.g. hi /h/ /ai/, you /j/ /u/, we /w/ /i/
Text data examples:
„adios“ /a/ /d/ /i/ /o/ /s/
„Hallo“ /h/ /a/ /l/ /o/
„Phydough“ ???
Phoneme- vs Grapheme-based ASR
[Figure: bar chart of word error rate [%] (axis 0 to 50) for Phoneme, Grapheme and Grapheme (FTT) systems in English, Spanish, German, Russian and Thai]
Problem:
• 1 Grapheme ≠ 1 Phoneme
Flexible Tree Tying (FTT):
One decision tree
• Improved parameter tying
• Less over-specification
• Fewer inconsistencies
[Diagram: single decision tree with questions such as 0=vowel?, 0=obstruent?, 0=begin-state?, -1=syllabic?, 0=mid?, -1=obstruent?, 0=end? and leaves such as AX-m, IX-m, AX-b]
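In grapheme-based ASR the pronunciation dictionary is replaced by the spelling itself: each word is modeled as its sequence of written symbols. The following minimal sketch shows that idea; keeping Unicode letters and combining marks is an assumption made here, and real systems may need script-specific handling.

```python
import unicodedata


def grapheme_lexicon(words):
    """Map each word to its sequence of written symbols (letters and marks)."""
    lexicon = {}
    for word in words:
        units = [ch for ch in word.lower()
                 if unicodedata.category(ch)[0] in "LM"]
        if units:
            lexicon[word] = units
    return lexicon


for word, units in grapheme_lexicon(["adios", "Hallo", "Phydough"]).items():
    print(word, " ".join(units))
```

No linguistic knowledge is needed to build such a lexicon, which is why it is attractive for a new language; the cost is that the acoustic models must absorb irregular letter-to-sound relations.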
Wiktionary as Source
o Automatic Pronunciation Dictionary Generation
o Idea: Automatically extract pronunciations from Wiktionary
Paper here at Interspeech 2010, Oral presentation, Wednesday, 5:20pm, Hall A/B
Tim Schlippe, Sebastian Ochs, and Tanja Schultz
Wiktionary as a Source for Automatic Pronunciation Extraction
Wiktionary
• Quantity Check: Given a word list, what is the percentage of words for which phonetic
notations are found in a complete IPA representation?
• Quality Check: How many pronunciations derived from Wiktionary are identical to
existing GlobalPhone pronunciations?
How does adding Wiktionary pronunciations impact the performance
of ASR systems?
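The quantity and quality checks can be sketched as two simple ratios. The sketch assumes pronunciations have already been extracted and converted to a common phone notation; the data structures (word mapped to a list of phone tuples), the function names and the toy entries are illustrative and not taken from the paper.

```python
def quantity_check(word_list, wiktionary):
    """Percentage of words with at least one pronunciation found."""
    if not word_list:
        return 0.0
    found = sum(1 for word in word_list if wiktionary.get(word))
    return 100.0 * found / len(word_list)


def quality_check(wiktionary, reference):
    """Return (number compared, percentage identical to a reference variant)."""
    compared = identical = 0
    for word, pronunciations in wiktionary.items():
        for pron in pronunciations:
            if word in reference:
                compared += 1
                identical += pron in reference[word]
    percent = 100.0 * identical / compared if compared else 0.0
    return compared, percent


# Toy data; pronunciations are tuples of phones already in a common notation.
wiktionary = {
    "hallo": [("h", "a", "l", "o")],
    "adios": [("a", "d", "i", "o", "s"), ("a", "d", "j", "o", "s")],
}
reference = {
    "hallo": [("h", "a", "l", "o")],
    "adios": [("a", "d", "j", "o", "s")],
    "haus": [("h", "aU", "s")],
}
words = ["hallo", "adios", "haus", "phydough"]
print(quantity_check(words, wiktionary))
print(quality_check(wiktionary, reference))
```

Pronunciations that differ from the reference are not necessarily wrong; they may be valid new variants, which is why the impact on ASR performance is evaluated separately.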
Wiktionary as a Source
Top-Ten of Wiktionary
Language Editions
(July 2010)
http://meta.wikimedia.org/wiki/List_of_Wiktionaries
Quantity of Pronunciations found
(% of pages with pronunciations)
Wiktionary as a Source
• Quantity of proper names
Proper names can be of diverse etymological origin and can surface
in another language without undergoing the process of assimilation
to the phonetic system of the new language.
They are important because they are difficult to generate with letter-to-sound rules.
Search pronunciations of 189 international city names and
201 country names to investigate the coverage of proper names:
Wiktionary as a Source
Amount of compared pronunciations, percentage of
identical ones and amount of new pronunciation variants:
Impact on ASR performance
Approach I: Using all Wiktionary pronunciations for training and decoding
Approach II: Using only those Wiktionary pronunciations in decoding that were
chosen in training (see table):
Dictionary: Interactive Learning
* Follows the work of Davel & Barnard
* Word list: extracted from text
[Flowchart, reconstructed from its labels]
1. Select the best word w_i from word list W
2. Generate pronunciation P(w_i) with G-2-P
3. Play P(w_i) to the user via TTS: P(w_i) okay?
   Yes: store in Lex, delete w_i
   No: user improves P(w_i), update G-2-P, delete w_i
   Skip: delete w_i
* Update after each w_i → effective training
* G-2-P
- explicit map rules
- neural networks
- decision trees
- instance learning (grapheme context)
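The interactive loop can be sketched with a toy model that is updated after every word. Everything here is an assumption for illustration: the G-2-P model is a context-free map rule (most frequent phone per grapheme), word selection is plain list order, and the user is simulated by a lookup table. It is not the LexLearner implementation.

```python
from collections import Counter, defaultdict


class MapRuleG2P:
    """Toy G-2-P model: most frequent phone seen for each grapheme."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, word):
        phones = []
        for g in word:
            seen = self.counts.get(g)
            # Unseen grapheme: fall back to the grapheme itself.
            phones.append(seen.most_common(1)[0][0] if seen else g)
        return phones

    def update(self, word, phones):
        # Assumes a 1:1 grapheme-phone alignment; real data needs alignment.
        if len(word) == len(phones):
            for g, p in zip(word, phones):
                self.counts[g][p] += 1


def learn_lexicon(word_list, ask_user):
    """ask_user(word, guess) returns True (okay), a corrected list, or None (skip)."""
    g2p, lexicon = MapRuleG2P(), {}
    for word in word_list:                 # stand-in for "select best word"
        guess = g2p.predict(word)
        answer = ask_user(word, guess)     # the real system plays the guess via TTS
        if answer is None:                 # skipped: word is dropped
            continue
        phones = guess if answer is True else list(answer)
        lexicon[word] = phones
        g2p.update(word, phones)           # update after each word
    return lexicon, g2p


truth = {"si": ["s", "i"], "casa": ["k", "a", "s", "a"], "saca": ["s", "a", "k", "a"]}
corrections = []


def simulated_user(word, guess):
    if guess == truth[word]:
        return True
    corrections.append(word)
    return truth[word]


lexicon, g2p = learn_lexicon(["si", "casa", "saca"], simulated_user)
print(len(corrections), "correction(s) needed:", corrections)
```

Updating after each word is what makes the loop efficient: once the user has corrected "c" in one word, later words with the same grapheme are predicted correctly and only need confirmation.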
Lex Learner
Issues and Challenges
o How to make best use of the human?
o Definition of successful completion
o Which words to present in what order
o How to be robust against mistakes
o Feedback that keeps users motivated to continue
o How many words to be solicited?
o G2P complexity depends on the language (SP easy, EN hard)
o 80% coverage: hundreds (SP) to thousands (EN)
o G2P rule system perplexity
Language Perplexity
English 50.11
Dutch 16.80
German 16.70
Afrikaans 11.48
Italian 3.52
Spanish 1.21
Lex Learner TTS
o TTS feedback good for lexical pronunciation
o [Davel and Barnard 2004]
o Play TTS version of predicted pronunciation
o This even helps expert phoneticians
o Need full IPA TTS voice
o We do phonetic based TTS
o Unit selection (high fidelity)
o Flat prosody (but only isolated words)
o Good enough to keep user on track
Outline Part 2
14:30 – 15:45 Part 2:
o SPICE – Under the hood
o Text collection & Prompt Selection
o Phone set specification
o Lexical construction
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
o Latest Experiments and Results
o Lessons Learnt from past studies
o Evaluation
o Future
Rapid Portability: TTS
[Diagram: speech processing pipeline, Input: Speech → AM, Lex, LM → NLP / MT → TTS → Output: Speech & Text; resource shown: Phone set & Speech data]
Statistical Parametric TTS
o Text-to-speech for Applications:
o Common technologies
o Diphone: too hard to record and label
o Unit selection: too much to record and label accurately
o Statistical Parametric: “just right”
o Statistical Parametric Synthesis
o “HMM synthesis”
o clustergen trajectory synthesis
o Clusters representing context-dependent allophones
o PRO:
o can work with little speech (10 minutes)
o Robust to poor data
o CON:
o Signal sounds “buzzy”, can lack varied prosody
Voice Building Process
o Can usually collect 300-500 utterances
o Single speaker, rich prompt set
o Have lexical coverage (from Lex Learner)
o Automatic labeling from acoustic models
o Automatic: spectral and prosodic models
o But prosody will be similar to recordings
o No text processing front end (yet)
Cross Lingual Voice Conversion
o Use non-native acoustic models
o Adapt them to target language
o Use small amount of target data
o Align and build mapping function (MLLR or GMM)
o Requires phoneme mapping
o Automatic or by hand
[Anumanchipalli and Black SLTU 2010]
CLVC from English
o Conversion to German and Telugu
Using ASR data for TTS
o Conventional TTS databases:
o Single speaker, well recorded
o Conventional ASR databases:
o Multi speaker, varied quality
o [Yamagishi et al IS2009]
o Speaker Adaptive Training
o Statistical Parametric Synthesis
o (Select “similar” speakers from DB)
Rapid Portability: Acoustic Models
[Diagram: speech processing pipeline, Input: Speech → AM, Lex, LM → NLP / MT → TTS → Output: Speech & Text; resource shown: Phone set & Speech data]
Phone set & Speech data
Rapid Portability: Data
Step 1:
• Uniform multilingual database (GlobalPhone)
• Build Monolingual acoustic models in many languages
Multilingual Acoustic Modeling
Step 2:
• Combine monolingual acoustic models to a set of
multilingual “language independent” acoustic model
Speech production is independent of language → IPA
1) IPA-based Universal Sound Inventory
2) Each sound class is trained by data sharing
Reduction from 485 to 162 sound classes
m,n,s,l appear in all 12 languages
p,b,t,d,k,g,f and i,u,e,a,o in almost all
Universal Sound Inventory
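Data sharing through an IPA-based inventory can be sketched as grouping language-specific phones by their IPA symbol, so that each symbol becomes one sound class trained on data from every language that has it. The phone names, language codes and the tiny phone sets below are toy assumptions, not the GlobalPhone inventory.

```python
from collections import defaultdict


def build_universal_inventory(phone_sets):
    """Group language-specific phones that share the same IPA symbol."""
    inventory = defaultdict(set)
    for language, phones in phone_sets.items():
        for name, ipa in phones.items():
            inventory[ipa].add((language, name))
    return dict(inventory)


phone_sets = {  # toy excerpt, not the GlobalPhone phone sets
    "DE": {"M": "m", "N": "n", "UE": "y"},
    "ES": {"M": "m", "N": "n", "RR": "r"},
    "JA": {"M": "m", "N": "n"},
}
inventory = build_universal_inventory(phone_sets)
monolingual = sum(len(phones) for phones in phone_sets.values())
shared_by_all = sorted(ipa for ipa, members in inventory.items()
                       if len({lang for lang, _ in members}) == len(phone_sets))
print(monolingual, "language-specific phones ->", len(inventory), "sound classes")
print("in all languages:", shared_by_all)
```

The reduction from 485 to 162 sound classes reported on the slide is the same effect at full scale: phones that several languages share collapse into one class.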
Rapid Portability: Acoustic Models
[Diagram: speech processing pipeline, Input: Speech → AM, Lex, LM → NLP / MT → TTS → Output: Speech & Text]
Step 3:
• Define mapping between ML set and new language
• Bootstrap acoustic model of unseen language
Acoustic Model Building
o Acoustic Model Building requires:
o Recorded Read Speech Data
(since data is read, we have the transcripts!)
o Phone set definition
o Pronunciation Lexicon
o Two step process:
1. Configuration
2. Model Training
Acoustic Model Building - Configuration
o Checks dependencies and errors
o Lexicon and phone set correspond
o Words in recorded prompts are covered by the lexicon
o Divides the recorded data into training and test sets
o Performance evaluation
o Few data: K-fold cross-validation, with K = #speakers
o More data: Data split into 90% (train) and 10% (test)
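The two evaluation set-ups can be sketched as follows. This is an illustration under assumptions: utterances are (speaker, utterance) pairs, the 90/10 split is done over shuffled utterances with a fixed seed, and the decision between "few" and "more" data is left to the caller because the slides give no threshold.

```python
import random


def leave_one_speaker_out(utterances):
    """Yield (held_out_speaker, train, test) folds; K equals the number of speakers."""
    speakers = sorted({speaker for speaker, _ in utterances})
    for held_out in speakers:
        train = [u for u in utterances if u[0] != held_out]
        test = [u for u in utterances if u[0] == held_out]
        yield held_out, train, test


def split_90_10(utterances, seed=0):
    """Shuffle reproducibly, then split into 90% train and 10% test."""
    items = list(utterances)
    random.Random(seed).shuffle(items)
    cut = len(items) * 9 // 10
    return items[:cut], items[cut:]


data = [(f"spk{s}", f"utt{u}") for s in range(4) for u in range(5)]
folds = list(leave_one_speaker_out(data))
train, test = split_90_10(data)
print(len(folds), "folds;", len(train), "train /", len(test), "test utterances")
```

Holding out one speaker per fold keeps test speakers unseen in training, which matters when only a handful of speakers have been recorded.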
Acoustic Model Building - Training
o Requires successful configuration
o Creates log files and displays them to the user
o All steps of training
o EM Training for Context Independent Models
o 3-state HMM
o Number of Gaussians per Model depends on data
o EM Training for Context Dependent Models
o Number of models depends on data
o MFCC front-end, LDA
o Progress of training procedure
o Results of performance evaluation
100/145
Acoustic Model Building - Training
101/145
Unsupervised Adaptation
• Goal:
– Build Automatic Speech Recognition (ASR) for unseen
Language/Accent/Dialect with minimal human effort
• Challenge:
– No or Few Data, i.e. no transcribed Data!!!
• Solution:
– No transcriptions
-> apply unsupervised training approaches
– Lack of Linguistic Knowledge
-> transfer knowledge from other languages
– Here:
• Use several languages
• … of the same language family
• … and combine knowledge
102/145
Experimental Setup
• Given: Data, Transcripts, ASR for several languages
• Recognition Systems for 4 Slavic Languages
– Croatian (South-Slavic, 7M spks)
– Russian (East-Slavic, 165M spks)
– Bulgarian (South-Slavic, 12M spks)
– Polish (West-Slavic, 56M spks)
• Wanted: ASR for Czech: (West-Slavic, 12M spks)
• GlobalPhone (ELRA), read speech, 100 spks, ~20 hrs per language
103/145
Cross-Language Transfer
• Benefit: Source Acoustic Model not touched, apply CD models
• Benefit: Faster Decoding
• Drawback: If applied iteratively (see later), no adaptation of target AM
[Figure: each SOURCE language (Russian, Bulgarian, Croatian, Polish) provides a Phone Set and an AM; the TARGET language (Czech) provides Dict and LM; Cross-Language Transfer uses the source AM together with the mapped Dict and the LM]
104/145
Manual Phone Mapping
105/145
Cross-Language Transfer (C-T)
o Given: ASR in 4 languages (Bulgarian, Croatian, Polish, Russian)
o Audio Data in Czech but NO Transcripts
o Czech Pronunciation Rules (G-2-P Mapping for Dictionary)
o Text Data, Vocabulary Selection (Automatically derived)
o ASR Performance on Czech
o Apply Manual Phone Mapping to Dictionary
o Apply Source Language Acoustic Models (AM) as is
o Recognize Target Language Czech (Dev set)
o Overall performance is depressingly bad
o Word Error Rates (WER) vary with source language
o Automatic Mapping gives slightly better numbers but not worth it
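Applying a manual phone mapping to the target dictionary can be sketched as a simple lookup. The phone symbols and the two dictionary entries below are hypothetical placeholders, not the mapping table used in the experiments.

```python
def map_dictionary(target_dict, phone_map):
    """Rewrite target-language pronunciations with source-language phones.

    target_dict: {word: [target phones]}
    phone_map:   {target phone: source phone}
    Raises KeyError if a target phone has no mapping, so gaps in the
    manual mapping are caught before decoding.
    """
    mapped = {}
    for word, phones in target_dict.items():
        mapped[word] = [phone_map[p] for p in phones]
    return mapped

# Hypothetical symbols for illustration only.
czech_dict = {"ano": ["CZ_a", "CZ_n", "CZ_o"], "ne": ["CZ_n", "CZ_e"]}
czech_to_polish = {"CZ_a": "PL_a", "CZ_n": "PL_n",
                   "CZ_o": "PL_o", "CZ_e": "PL_e"}

mapped_dict = map_dictionary(czech_dict, czech_to_polish)
print(mapped_dict)
```

The mapped dictionary lets the unchanged source-language acoustic model decode target-language words.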
106/145
Improvements: Unsupervised Training (UT)
• UT: Assume audio but no transcription
– take recognizer hypothesis as transcription
• Several interesting works on UT (Zavaliagkos, 1998; Lamel
et al., 2002; Wessel/Lööf/Ney, 1999-2009, …)
• UT is effective only in combination with “confidence scores”
to select/weight the correct portions of the hypotheses
• Kemp and Schaaf (1997) proposed “gamma” and “A-stabil”
as confidence measures (JRTk)
• Problem: A-stabil works well for well-trained Acoustic
Models (AM) but not for badly trained AMs,
so it is NO option for C-T
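The slide does not define the confidence measures, so the following is only a simplified agreement-based sketch in the spirit of a stability measure: a word is trusted if it reappears, at an aligned position, in most alternative hypotheses for the same utterance. It is not the JRTk implementation; the function names, threshold and example words are assumptions.

```python
from difflib import SequenceMatcher

def stability_scores(reference, alternatives):
    """Fraction of alternative hypotheses that agree with each reference word.

    reference:    list of words (the recognizer's main hypothesis)
    alternatives: list of word lists (e.g. decoded with other settings)
    """
    if not alternatives:
        raise ValueError("need at least one alternative hypothesis")
    counts = [0] * len(reference)
    for alt in alternatives:
        matcher = SequenceMatcher(a=reference, b=alt, autojunk=False)
        # Each matching block marks reference words also found in alt.
        for block in matcher.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                counts[i] += 1
    return [c / len(alternatives) for c in counts]

def select_confident(reference, alternatives, threshold=0.5):
    """Keep only the words whose agreement score reaches the threshold."""
    scores = stability_scores(reference, alternatives)
    return [(w, s) for w, s in zip(reference, scores) if s >= threshold]

hyp = ["dobry", "den", "vsem"]
alts = [["dobry", "den", "vsem"],
        ["dobry", "ten", "vsem"],
        ["dobry", "den", "sem"]]
scores = stability_scores(hyp, alts)
print(list(zip(hyp, scores)))
```

Only the selected words (or utterances made up of them) would then be used as automatic transcripts for the next training iteration.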
107/145
Multilingual A-Stabil
108/145
Results on Multilingual A-Stabil
109/145
Bootstrap Framework
Performance on Czech Development Set (WER)
Source language   C-T    Iter1   Iter2
Bulgarian         61.0   24.5    23.6
Croatian          57.2   24.6    23.7
Polish            55.8   24.4    24.1
Russian           64.3   24.1    23.8
Best CZ (supervised): 23.1
Poster here at IS2010:
N.T. Vu, T. Schlippe, F. Kraus, T. Schultz,
Rapid Bootstrapping of five Eastern European
languages using the Rapid Language Adaptation
Toolkit, Tuesday 10:00, Room B
110/145
Outline Part 2
14:30 – 15:45 Part 2:
o SPICE – Under the hood
o Text collection & Prompt Selection
o Phone set specification
o Lexical construction
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
o Latest Experiments and Results
o Lessons Learnt from past studies
o Evaluation
o Future
111/145
Language Model Building
Goal:
o Get as much relevant text data as possible
o Use the text data for
o Generating recording prompts
o Generating vocabulary lists
o Build Language Models for ASR
Approach
1. User supplies a URL to SPICE for crawling
2. Crawler retrieves N documents (web-pages)
3. Compute the statistics (TF-IDF) from the N documents
4. Terms with highest TF-IDF score form query terms
5. Query search engine (Google) to get the URLs for the
query terms
6. Crawl the URLs for the data
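Steps 3 and 4 can be sketched as follows, assuming the crawled pages are already tokenized. The exact TF-IDF variant used by SPICE is not given on the slide; this sketch uses relative term frequency times log inverse document frequency, summed over documents, and the toy pages are invented.

```python
import math
from collections import Counter

def top_tfidf_terms(documents, n_terms=3):
    """Return the n_terms terms with the highest summed TF-IDF score.

    documents: list of token lists, one per crawled page.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = Counter()
    for tokens in documents:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]

pages = [
    "the election results the parliament".split(),
    "the football match the results".split(),
    "the weather forecast the rain".split(),
]
query_terms = top_tfidf_terms(pages, n_terms=2)
print(query_terms)
```

Words that occur on every page (here "the") get a score of zero and therefore never become query terms.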
112/145
113/145
RLAT – Snapshot function
o Informative feedback about the quality of the crawled text
o Results show quality (perplexity PP, out-of-vocabulary rate OOV), computed
and displayed periodically (interval to be defined by the user) during the
crawling process
RLAT webpage:
http://csl.ira.uka.de/rlat-dev
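One of the snapshot quantities, the OOV rate, can be computed with a few lines. This is a minimal sketch with invented toy data, assuming the vocabulary is the most frequent words of the crawled text; it does not reproduce the RLAT code, and perplexity is left out.

```python
from collections import Counter

def oov_rate(vocab_counts, test_tokens, vocab_size):
    """Percentage of test tokens not covered by the top vocab_size words."""
    vocabulary = {w for w, _ in vocab_counts.most_common(vocab_size)}
    misses = sum(1 for tok in test_tokens if tok not in vocabulary)
    return 100.0 * misses / len(test_tokens)

crawled = "a b a c a b d".split()   # text crawled so far (toy data)
held_out = "a b e a".split()        # held-out test text (toy data)
counts = Counter(crawled)

for size in (1, 2):
    print("vocabulary size", size, "OOV rate (%)",
          oov_rate(counts, held_out, size))
```

Recomputing this value at regular intervals while the crawl grows gives the kind of curve the snapshot function displays.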
114/145
RLAT – Snapshot function
RLAT webpage: http://csl.ira.uka.de/rlat-dev
o PP
o OOV
o 10-fold
115/145
RLAT – Snapshot function
Example: 18 days of crawling www.leparisien.fr (French)
[Figure: OOV rate (%, axis 0.0 to 7.0) and n-gram coverage, plotted over vocabulary size (50 K to 300 K) and total words (50 Mio to 400 Mio)]
116/145
Text Normalization based on SMT and Internet User Support
o Text extraction, HTML tag removal
o Language independent normalization
o Language-specific normalization
o Common abbreviations, punctuation, numbers, dates, casing
o New approach: (See Interspeech 2010 paper)
o Web-based user interface for language-specific text normalization
o Hybrid approach (rules + SMT)
Figure: Web-based User Interface for Text Normalization
117/145
Text Normalization based on SMT and Internet User Support
o Experiments and Results:
o How well does SMT perform in comparison to LI-rule, LS-rule and
human?
o How does the performance of SMT evolve over the amount of
training data?
o How can we modify our system to get a time and effort reduction?
o Evaluation:
o comparing the quality of 1k output sentences derived from the
systems to text which was normalized by native speakers in our lab
o creating 3-gram LMs from our hypotheses and evaluating their
perplexities on 500 sentences manually normalized by native
speakers
o Detailed Results: Paper here at Interspeech 2010
Tim Schlippe, Chenfei Zhu, Jan Gebhardt, and Tanja Schultz,
Text Normalization based on Statistical Machine Translation and
Internet User Support
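Comparing system output with human-normalized text by edit distance can be sketched as below. The slides do not say whether the distance is computed on words or characters; this sketch assumes word tokens, and the example sentence pair is invented.

```python
def word_edit_distance(hypothesis, reference):
    """Levenshtein distance between two token lists.

    Insertions, deletions and substitutions each cost 1.
    """
    previous = list(range(len(reference) + 1))
    for i, hyp_tok in enumerate(hypothesis, start=1):
        current = [i]
        for j, ref_tok in enumerate(reference, start=1):
            cost = 0 if hyp_tok == ref_tok else 1
            current.append(min(previous[j] + 1,          # delete hyp token
                               current[j - 1] + 1,       # insert ref token
                               previous[j - 1] + cost))  # match / substitute
        previous = current
    return previous[-1]

system_out = "on 3 may he paid twenty euros".split()
human_ref = "on the third of may he paid twenty euros".split()
distance = word_edit_distance(system_out, human_ref)
print("edit distance:", distance)
```

A lower distance to the native-speaker normalization means the system output needs fewer manual corrections.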
118/145
Text Normalization based on SMT and Internet User Support
Figure: Performance (edit dist.) over amount of training data
119/145
Outline Part 2
14:30 – 15:45 Part 2:
o SPICE – Under the hood
o Text collection & Prompt Selection
o Phone set specification
o Lexical construction
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
o Latest Experiments and Results
o Lessons Learnt from past studies
o Evaluation
o Future
120/145
SPICE 2005: Afrikaans – English
o Goal: Build Afrikaans – English Speech Translation System with SPICE
o Cooperation with University Stellenbosch and ARMSCOR
o Bilingual PhD visited CMU for 3 months
o Afrikaans: Related to Dutch and English,
g-2-p very close, regular grammar, simple morphology
o SPICE: all components apply the statistical modeling paradigm
o ASR: HMMs, N-gram LM (JRTk-ISL)
o MT: Statistical MT (SMT-ISL)
o TTS: Unit-Selection (Festival)
o Dictionary: G-2-P rules using CART decision trees
o Text: 39 hansards; 680k words; 43k bilingual aligned sentence pairs;
Audio: 6 hours read speech; 10k utterances, telephone speech (AST)
121/145
o Good results: ASR 20% WER; MT A-E (E-A) BLEU 34.1 (34.7), NIST 7.6 (7.9)
o Shared pronunciation dictionaries (for ASR+TTS) and LM (for ASR+MT)
o Most time-consuming process: data preparation -> reduce amount of data!
o Still too much expert knowledge required (e.g. ASR parameter tuning!)
[Figure: "Time Effort" bar chart in days (axis 0 to 25). Labels: Data, Training, Tuning, Evaluation, Prototype; AM (ASR), Lex, LM (ASR, MT), TM (MT), TTS, S-2-S]
Herman Engelbrecht, Tanja Schultz, Rapid Development of an Afrikaans-English Speech-to-Speech Translator ,
IWSLT 2005, Pittsburgh, PA, October 2005
122/145
SPICE 2007: Field Experiments
o Now targeting more languages in a shorter time frame
o 6-week hands-on course at CMU in Spring 2007
o Adopt native languages of participating students as targets
o Added up to 10 different languages: Bulgarian, English, French,
German, Hindi, Konkani, Mandarin, Telugu, Turkish, Vietnamese
o Teams of two students with different native language
o Course goal was to build a simple S-2-S system and use
this to communicate with each other in their mother tongue
o Solely rely on SPICE tools
o Build speech recognition components in two languages
o Build simple SMT component in two directions
o Build speech synthesis components in two languages
o Report back on problems and system shortfalls
123/145
Field Experiments (2)
o The 10 languages cover broad range of peculiarities
o Writing system:
o Logographic Hanzi (Mandarin);
o Cyrillic (Bulgarian);
o Roman (German, French and English);
o phonographic segmental (Telugu and Hindi);
o phonographic featural (Vietnamese)
o No script: Konkani
o Segmentation: no segmentation (Chinese); white spaces do not
necessarily indicate word boundaries (Vietnamese)
o Morphology: simple, low inflecting (English), compounding (German),
agglutinating (Turkish) …
o Sound System: tonal (Mandarin and Vietnamese), stress (Bulgarian)
o G-2-P: straightforward (Turkish), challenging (Hindi), difficult (English),
no relationship (Chinese), invented (Konkani)
124/145
Lessons Learned
o It is possible to create speech processing components for
10 languages in 6 weeks using SPICE
o Each language brings new challenges
o Many SPICE features turned out to be very helpful, e.g.
only ONE speaker of Konkani in Pittsburgh, web recorder
allowed remote collection of more speakers
o Log: time spent in SPICE interface
o Improve interface using breakdown
o Use feedback
o Interface allows for collaborative work
Task                Time Spent [hh:mm]
Text Collection     8:35
Audio Collection    10:07
Phoneme Selection   4:05
LM building         1:25
G-2-P specs         1:30
125/145
Outline Part 2
14:30 – 15:45 Part 2:
o SPICE – Under the hood
o Text collection & Prompt Selection
o Phone set specification
o Lexical construction
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
o Latest Experiments and Results
o Lessons Learnt from past studies
o Evaluation of time vs expertise
o Future
126/145
Initial evaluations
o Conducted 2 semester-long lab courses
o students use SPICE to create working ASR
and TTS in a language of their choice
o bonus for the ambitious
o train statistical MT system between two
languages to create a speech-to-speech
translation system
o Evaluation includes
o time to complete
o task difficulties
o ASR word error rate
o TTS voice quality
127/145
Evaluation details on TTS
o Research questions:
o Which features contribute most?
o Are language-dependent features critical?
o How does voice quality vary with:
o Amount of recorded data
o Number of lexical entries
o Can objective measures estimate voice quality?
o Can any measures motivate the user?
128/145
TTS from phonemes
[Figure: parametric synthesis (trajectory-based)]
o Symbolics: “welcome” -> W EH L K AH M (lexicon or LTS)
o Clustered 'allophones': W23 EH912 L41 K941 AH553 M85,
predicted from phonetic and linguistic context using CART trees
o Acoustics: units or trajectories
129/145
CART tree features
o Four categories of training features
1. name: phoneme and HMM state context
2. position: e.g., % from start (of phrase,
word, phone, HMM-state)
3. IPA features: based on phoneme set
4. linguistic: e.g. PoS, syllable segmentation
o Increasing difficulty
o 1. and 2. are language-independent
o 3. requires a defined phoneset
o 4. requires a computational linguist
130/145
Effect of feature classes
Feature class      Number   Lang-dep.   Δ MCD
no CART trees      1        no          baseline
name symbolics     16       no          -0.452
position values    7        no          -0.402
IPA symbolics      72       yes         -0.001
linguistic sym.    14       yes         +0.004
o Mel-cepstral distortion (MCD)
o Standard objective measure in TTS/VC
o lower numbers are better
o ~ 0.2 is perceptually noticeable
o ~ 0.08 is statistically significant
o first two feature classes (name and position) matter
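The slides do not spell out the MCD formula. The sketch below assumes the commonly used definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) per frame, averaged over time-aligned frames, with the energy coefficient c0 excluded. The function name and toy frames are illustrative.

```python
import math

def mel_cepstral_distortion(reference_frames, synthesized_frames):
    """Mean MCD in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients c1..cD
    (c0, the energy term, is assumed to be excluded already).
    """
    if len(reference_frames) != len(synthesized_frames):
        raise ValueError("frames must be time-aligned and equal in number")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(reference_frames, synthesized_frames):
        squared = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(reference_frames)

natural = [[1.0, 0.5], [0.8, 0.4]]       # toy frames, 2 coefficients each
synthetic = [[0.9, 0.5], [0.8, 0.3]]
print("MCD:", round(mel_cepstral_distortion(natural, synthetic), 3))
```

Because the measure is a distance between synthesized and natural speech, a negative Δ MCD in the table above is an improvement.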
131/145
Effect of feature classes on MCD
[Figure: "Cumulative Feature Classes" (arctic_slt). x-axis: Wagon Stop Value (10, 100, 1000); y-axis: MCD 1-24 (4.5 to 6.0). Legend: names; names + posn; names + posn + IPA; + linguistic (hidden)]
132/145
Effect of database size
o How does MCD improve with more speech?
o improvement of 0.2 from 3.75->7.5 minutes
o improvement 0.1-0.12 beyond that:
(near-linear increase with a doubling of data)
o 2x is perceptually better with < 10 min of speech
o 4x is perceptually better beyond 10 min of speech
o Where is the asymptote?
o don't know yet!
o maybe 20 hours
133/145
Database size vs MCD
[Figure: "Effect of Database Size on MCD" (arctic_slt). x-axis: Wagon Stop Value (10, 100, 1000); y-axis: MCD 1-24 (4.0 to 6.0). Curves: tested on training data; tested on 10% heldout. Legend: 1/16 hour, 1/8 hour, 1/4 hour, 1/2 hour, 1 hour]
134/145
Effect of a good Lexicon
o Grapheme-based voice
o 26 letters a-z are a substitute 'phone' set
o no IPA and linguistics features
o 3 slides ago showed that these don't matter
o English has highly irregular spelling
o the acoustic classes are impure
o measuring global voice quality – overlooking
mispronounced words
o Results:
o MCD improves by 0.27
o consistent across CART leaf node size
(“stop value”)
135/145
Grapheme vs Phoneme (English)
[Figure: "Grapheme versus Phoneme-based Voices" (arctic_slt). x-axis: Wagon Stop Value (10, 100, 1000); y-axis: MCD 1-24 (4.5 to 6.0). Legend: grapheme based; phoneme based]
136/145
10 non-English test languages
o European
o Bulgarian, French, German, Turkish
o Indian
o Hindi, Konkani, Tamil, Telugu
o East Asian
o Mandarin, Vietnamese
137/145
Evaluating non-English voices
o Need a 'good' and a 'bad' voice
o Phoneme-based English is good
o Grapheme-based English is bad
o Data covers 3m to 1h of speech
o may be extrapolated to about 4h
o Following voices are from student
lab projects
138/145
Non-English languages
[Figure: "Effect of Database Size on MCD - Multi-Lingual". x-axis: Database Size (h), 0.1 to 1; y-axis: MCD 1-24 (4.5 to 6.0). Labels: arctic_slt, Konkani, Mandarin, Bulgarian, Hindi, Tamil, German, French, Vietnamese. Legend: character-based; phoneme-based]
139/145
Characterizing voice quality
o MCD and size permits a quick assessment
o French is in good shape
o German could use lexicon improvements
o Hindi and Tamil are good for their size
o recommendation: collect more speech
o Bulgarian, Konkani and Mandarin need more
speech and a better lexicon
o Vietnamese voice had character set issues
o resulted in only ¼ of the speech being used
140/145
More speech or a better lexicon?
o From the English MCD error curves
o 5x the speech = fixing the phoneset+lexicon
o Which is more time effective?
o assume 3-4 sentence recordings per minute
o assume 2-3 lexicon verifications per minute
o Answer
o If speech DB is small, record more speech
o If speech DB is large, work on the lexicon
o The transition point is language dependent
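The trade-off can be made concrete with a back-of-the-envelope sketch. It assumes that growing the speech database to 5x its size and verifying the whole lexicon yield a comparable MCD gain (the equivalence read off the English curves), and uses mid-range rates of 3.5 recordings and 2.5 verifications per minute. The sentence and lexicon counts are hypothetical.

```python
def minutes_to_record(extra_sentences, per_minute=3.5):
    """Recording time, assuming 3-4 sentence recordings per minute."""
    return extra_sentences / per_minute

def minutes_to_verify(lexicon_entries, per_minute=2.5):
    """Verification time, assuming 2-3 lexicon verifications per minute."""
    return lexicon_entries / per_minute

def cheaper_option(current_sentences, lexicon_entries):
    """Compare growing the speech DB to 5x against verifying the lexicon."""
    record = minutes_to_record(4 * current_sentences)  # 5x total = 4x extra
    verify = minutes_to_verify(lexicon_entries)
    choice = "record more speech" if record <= verify else "work on the lexicon"
    return choice, record, verify

# Hypothetical project sizes: a small and a large speech database,
# both with a 2000-entry lexicon.
for sentences in (100, 2000):
    choice, record, verify = cheaper_option(sentences, 2000)
    print(sentences, "sentences:", choice,
          f"(record {record:.0f} min, verify {verify:.0f} min)")
```

With these toy numbers the small database favors recording and the large one favors lexicon work, matching the answer on the slide; the crossover shifts with lexicon size and language.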
141/145
Evaluation on TTS: Conclusions
o Discovered
o a way to estimate the quality of new voices
o a framework for directing the best course of
actions (recording more data vs lexical
improvements)
o that IPA and our particular language
dependent features are not critical to
success (Phew!)
o Next stage
o detecting bad lexical entries from acoustics
142/145
Outline Part 2
14:30 – 15:45 Part 2:
o SPICE – Under the hood
o Text collection & Prompt Selection
o Phone set specification
o Lexical construction
o TTS Voice Building
o ASR Bootstrap & training
o ASR Language model
o Latest Experiments and Results
o Lessons Learnt from past studies
o Evaluation of time vs expertise
o Future
143/145
Conclusion
o Challenges in Multilingual Speech Processing
o Well defined build processes: ASR, MT, TTS … BUT:
o Every new language brings unseen challenges
o Current (statistical) approaches require lots of data
o … and both native-language and technology expertise
o How to bridge the gap between language and tech expert?
o Proposed solution: SPICE and RLAT
o Learning by interaction from a cooperative (but naïve) user
o Rapid adaptation from language universal models
o Knowledge sharing across components
o Development cycle: Days rather than weeks
o Better automatic identification of problem types
144/145
Future
o Continuous Server Support
o Improve Interface based on user feedback and lessons learned
o Improve Language Robustness: font encoding, …
o Software Engineering, Scaling
o Collaboration
o Multiple people working on the same project
o Leverage from archived projects
o Cross-confirmation
o Multiple views for within and across project confirmation
o Confidence measure to find appropriate combination
o Error-blaming
o End-to-end system Evaluation vs Component Evaluation
o Automatic Generation of Recommendations to improve systems
145/145
Latest Information
o System is online at http://cmuspice.org
o Use system for your own project
o Create new login/passwd and project
o Preloaded Hindi Example
o Login as
o Login: demo
o Passwd: demo
o Choose project # (your birthday)