Centro per la Ricerca Scientifica e Tecnologica
Spoken language technologies: recent advances and future challenges
Gianni Lazzari, Vienna, July 26
Focus on the use of Spoken Language Technologies for multilingual transcription
and reporting tasks
SUMMARY
• Short introduction on SLT
• Where are we today?
• TC-STAR and RAI projects
• Outlook for the future
Typical tasks in Human Language Technologies (HLT)
• speech recognition (voice commands & speech transcription)
• character recognition
• object and gesture recognition
• (spoken and written) language understanding
• spoken dialog systems
• speech synthesis
• text summarization
• document classification and information retrieval
• syntactic analysis of natural language
• speech and text translation
• ...
General Spoken Language System Architecture
[Diagram: input → Recognition → Understanding and dialog → Generation and Synthesis → answer]
MODELS: acoustic, language, semantic, dialog, synthesis
Speech Transcription System Architecture
[Diagram: Input → Recognition → results]
Input audio: noise, speech, music, ...
Results: enriched text
MODELS: acoustic, language, speakers, speech/music/noise
Standard Automatic Speech Recognition Architecture
Word error rate of different speech recognition tasks
Task             WER      Type of speech   Target     Bandwidth
Dictation        7%       well formed      computer   FBW
Broadcast news   12%      various          audience   FBW
Switchboard      20-30%   spontaneous      person     TBW
Voicemail        30%      spontaneous      person     TBW
Meetings         50-60%   spontaneous      person     FF

The features characterizing these tasks are:
• type of speech: well formed vs spontaneous
• target of communication: computer, audience, person
• bandwidth: FBW, full bandwidth; TBW, telephone bandwidth; FF, far field
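The word error rates quoted above are conventionally obtained by aligning the recognizer output with a reference transcript and counting substitutions, deletions and insertions relative to the number of reference words. A minimal sketch of that computation, assuming whitespace tokenization and no text normalization (the function name and the example sentences are illustrative, not taken from the talk):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substitution and one deletion over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```

Because insertions are counted, the value can exceed 100% on very poor output.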
RAI Italian Broadcast news Transcription
Evaluation of the Italian broadcast news transcription task.
Acoustic models are trained through a speaker-adaptive acoustic modelling procedure.
Two sets of acoustic models were trained, one for wideband and one for narrowband speech, each on about 140 hours of speech.
The LM was estimated on a 226M-word corpus consisting mostly of newspaper articles, plus BN transcripts.
The LM is compiled into a static network with a shared-tail topology.
Word error rate on the Italian broadcast news transcription task.
                    Wideband        Narrowband      Overall
                    First  Second   First  Second   First  Second
                    Pass   Pass     Pass   Pass     Pass   Pass
Old                 15.5   14.2     25.2   22.4     17.6   16.0
New                 14.6   11.7     21.0   17.1     16.0   12.9
Relative reduction  5.8%   17.6%    16.7%  23.7%    9.1%   19.4%
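The relative reduction row follows from the Old and New rows as (old - new) / old. A short sketch that recomputes it from the WER figures in the table (the dictionary layout and function name are illustrative):

```python
# WER (%) from the table: (first pass, second pass) per condition.
OLD = {"wideband": (15.5, 14.2), "narrowband": (25.2, 22.4), "overall": (17.6, 16.0)}
NEW = {"wideband": (14.6, 11.7), "narrowband": (21.0, 17.1), "overall": (16.0, 12.9)}


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (old - new) / old."""
    return 100.0 * (old_wer - new_wer) / old_wer


for condition in OLD:
    for pass_name, old_wer, new_wer in zip(("first", "second"), OLD[condition], NEW[condition]):
        reduction = relative_reduction(old_wer, new_wer)
        print(f"{condition:10s} {pass_name:6s} pass: {reduction:.1f}%")
```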
STATISTICAL TRANSLATION BASED ON BAYESIAN DECISION RULE
[Diagram: speech recognition → source language text → transformation → global search → transformation → target language text → speech synthesis]
Models used by the global search: lexicon model, alignment model, language model
Example: "Vorrei prenotare un albergo a Francoforte" → "I want to reserve a hotel room in Frankfurt"
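In the usual source-channel reading of the Bayesian decision rule, the global search picks the target sentence e that maximizes Pr(e) · Pr(f | e) for the source sentence f, where the language model supplies Pr(e) and the lexicon and alignment models together supply Pr(f | e). The toy sketch below assumes that factorization. The candidate list and all probabilities are invented for illustration, and a real system searches a vast hypothesis space instead of scoring three fixed candidates.

```python
import math

# Hypothetical candidates for f = "Vorrei prenotare un albergo a Francoforte".
# "lm" stands for Pr(e) from the language model; "tm" stands for Pr(f | e)
# from the lexicon and alignment models. All values are made up.
CANDIDATES = [
    {"text": "I want to reserve a hotel room in Frankfurt", "lm": 1e-6, "tm": 1e-4},
    {"text": "I want reserve hotel in Frankfurt", "lm": 1e-9, "tm": 2e-4},
    {"text": "I would like to book a flight to Frankfurt", "lm": 2e-6, "tm": 1e-8},
]


def decide(candidates):
    """Return the candidate text maximizing log Pr(e) + log Pr(f | e)."""
    best = max(candidates, key=lambda c: math.log(c["lm"]) + math.log(c["tm"]))
    return best["text"]


print(decide(CANDIDATES))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.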
Statistical Translation System
Experimental findings in HLT research (1973-2004)
• statistical methods most successful, in particular: speech recognition, language translation, parsing, dialog systems, ...
• scientific foundations: methods of computer science, statistical modelling, information theory
• handling huge amounts of data: 200 hours of speech recordings, 100 million running words, ...
• learning from data:
  - fully automatic procedures
  - more data than can be processed by human experts
• efficient algorithms: search/decision algorithms for heuristic search
• ...
Research on HLT: 1973-2004
• speech recognition (1973-2004)
  - most of the progress: by pure statistical modelling
  - some progress: by weak acoustic-phonetic-linguistic knowledge, i.e. domain-specific knowledge
  - virtually no progress: by classical rule-based and AI methods
• similar recent experience (1993-2004): machine translation, information extraction, dialog systems, ...
• expectation for future progress in HLT
  - most important: methodology: computer science, statistical modelling, information theory
  - domain-specific knowledge: acoustics, phonetics, linguistics, ...
• Spoken language translation: joint projects (national, European, international): ATR, C-Star, Verbmobil, Eutrans, Nespole!, Fame, LC-Star, PF-Star, TC-STAR
  - restricted domains: appointment scheduling, conference registration, travelling, tourism information, ...
  - vocabulary size: 3 000 - 10 000 words
  - best performing systems and approaches are data-driven: example-based methods, finite-state transducers, statistical approaches
  - e.g. Verbmobil evaluation [June 2000]: better by a factor of 2
• Written language translation: US Tides project 2001-2004
  - unrestricted domain: press news, vocabulary size ≥ 50 000 words
  - language pairs: Chinese→English, Arabic→English
  - performance [July 2003]: best statistical systems are better than conventional/commercial systems
TC-STAR: Technology and Corpora for Speech to Speech Translation
Contract Nr. FP6 506738
VI FRAMEWORK PROGRAM PRIORITY Multimodal Interfaces
IST-2002-2.3.1.6
The TC-STAR project focuses on advanced research in key technologies for speech to speech translation:
- speech recognition (ASR)
- spoken language translation (SLT)
- speech synthesis (TTS)

- Start: April 2004
- End: March 2007
- Grant: 11 million euro

- Methodology: competitive evaluation, cooperation
Vision
Transcription and Translation of broadcast news, speeches, lectures and interviews
[Figure labels: Vocal access; Web access; Simultaneous Translation; sample utterance "Hi, What do you think about"]
Application Scenario
A selection of unconstrained conversational speech domains:
- Broadcast news
- European Parliament Plenary Session
A few languages important for European society and economy:
- European accented English
- European Spanish
- Chinese
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
The Evaluation Tasks and Databases
Translation tasks:
– English to Spanish: EPPS: European Parliament Plenary Sessions
– Spanish to English: EPPS: European Parliament Plenary Sessions
Three types of input to SLT:
– output of automatic speech recognition
– verbatim manual transcriptions
– final text editions (with punctuation marks)
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
Training data
• Sentence-aligned speeches and their translations
• Final text editions: from April 1996 to Oct. 4th, 2004
• Verbatim transcriptions: from May 2004 to Oct. 4th, 2004
Development data: Oct. 26, 2004
Evaluation data: Nov. 14, 2004
2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK
ASR on EPPS data, word error rate (WER):
- European accented English: 9.5% (best TC-STAR)
- European Spanish: 10.1% (best TC-STAR)
SLT on EPPS data, position-independent word error rate:
- English to Spanish: 49% (best partner result)
- Spanish to English: 46% (best partner result)
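The position-independent word error rate ignores word order and compares the translation and the reference as bags of words. The sketch below uses one common definition, (max(reference length, hypothesis length) - matched words) / reference length. Whether the TC-STAR evaluation used exactly this variant, and how it handled multiple references, is an assumption not confirmed by the slides. The function name and the example strings are illustrative.

```python
from collections import Counter


def position_independent_wer(reference: str, hypothesis: str) -> float:
    """Bag-of-words error rate: word order is ignored, only word counts matter."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Multiset intersection: each word is matched at most as often as it
    # occurs in both the reference and the hypothesis.
    matched = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matched
    return errors / len(ref)


# A reordered but otherwise identical hypothesis scores 0 under this measure.
print(position_independent_wer("a b c d", "d c b a"))
print(position_independent_wer("a b c d", "a b x"))
```

Since reordering is never penalized, this measure is a lower bound on the ordinary word error rate for the same sentence pair.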