
Contact: Chiori Hori

E-mail: [email protected]

National Institute of Information and Communications Technology (NICT), Japan

ASIA-PACIFIC TELECOMMUNITY

2nd APT/ITU Conformance and Interoperability Workshop (C&I-2)

Document: C&I-2/INP-12

26 August 2014, Bangkok, Thailand

"Acceleration of R&D Towards Speech Translation Technologies in the Asia-Pacific Region by U-STAR"

Chiori Hori

Spoken Language Communication Laboratory

Universal Communication Research Center

National Institute of Information and Communications Technology

Kyoto, Japan

Email: [email protected]

NICT audio indexing system

[Figure: public servers provide video data and speech data over the network to the NICT audio indexing system. Labels: Closed caption and indexes.]

Real-time audio indexing

Real-time indexing: speech transcription

Query-based retrieval: audio including queries, event categories, speaker diarization (who spoke what and when)

Video categorization by topics

Speech Interface: Human-to-Human and Human-to-Machine for Natural Communication

Speech translation for people speaking different languages

Spoken dialog system with machine

Speech-to-text communication system

[Figure: modality conversion on a smart phone. Labels: Speech, Speech Recognition, Transcribed speech, Japanese text, Machine Translation (Japanese, English), Speech-to-Speech Translation, Spoken Dialog, Dialog Management, Text-to-Speech, Synthesized Speech. Example utterances: “How can I get to NICT?”, “From Kyoto station”.]

Research Target

"To Create a World Without Language Barriers" by International Research Consortium

Many different languages in the world (http://en.wikipedia.org/wiki/List_of_language_families)

Overcoming the language barriers is a long-held dream of mankind.

Speech translation technology: breaking the language barriers

How to overcome language barriers?

Multilingual Speech Translation

Speech-to-Speech Translation (S2ST): a means of communication between speakers of different languages

English: “I go to school”

Example: Japanese 「ホテルの予約をお願いします．」 ⇒ English “Please make a reservation for a hotel”

Speech Recognition (ASR)

Convert to Japanese phoneme sequence: “h”, “o”, “t”… (h o t e r u n o y o y a k u o o n e g a i ...)

Convert to word sequence using lexicon and grammar: ホテルの予約をお願いします．

Machine Translation (MT)

Convert to English word sequence: 「ホテル」 ⇒ “a hotel”, 「予約」 ⇒ “make a reservation for”, 「お願いします」 ⇒ “please”

Reorder word sequence according to English grammar: “a hotel” “make a reservation for” “please” ⇒ “please” “make a reservation for” “a hotel”

Speech Synthesis (TTS)

Select appropriate waveform for English text: Please make a reservation for a hotel

Corpora
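The MT step of the hotel example can be sketched in a few lines of Python. This is only a toy illustration, not the U-STAR system: the phrase table holds the three mappings shown on the slide, while the greedy segmenter, the "reverse the phrases" reordering rule, and all function names are assumptions made for this example. A real MT module learns both steps from parallel corpora.

```python
# Phrase table copied from the slide's example (three entries only).
PHRASE_TABLE = {
    "ホテル": "a hotel",
    "予約": "make a reservation for",
    "お願いします": "please",
}


def segment(sentence: str) -> list[str]:
    """Greedy longest-match segmentation against the phrase table.

    Characters not covered by the table (particles, punctuation) are skipped.
    This stands in for the lexicon-and-grammar step of a real ASR front end.
    """
    keys = sorted(PHRASE_TABLE, key=len, reverse=True)
    words, i = [], 0
    while i < len(sentence):
        for key in keys:
            if sentence.startswith(key, i):
                words.append(key)
                i += len(key)
                break
        else:
            i += 1  # unknown character: skip it
    return words


def reorder(phrases: list[str]) -> list[str]:
    """Hypothetical reordering rule: reverse the phrase order.

    It happens to be correct for this one sentence; it is not a general rule.
    """
    return list(reversed(phrases))


def translate(sentence: str) -> str:
    """Segment, map each word to English, reorder, and capitalize."""
    english = [PHRASE_TABLE[word] for word in segment(sentence)]
    text = " ".join(reorder(english))
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    print(translate("ホテルの予約をお願いします．"))
```

Running the script on the slide's sentence prints the English sentence given on the slide.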

History of the International Consortium (1)

Network-based S2ST research by the C-STAR and A-STAR consortiums

Years on the timeline: 1991, 1992, 1993, 1999, 2000, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013

C-STAR: Japan, US, Germany (3 countries)

C-STAR network-based S2ST: + Korea, Italy, France, China, U.S., U.K., Switzerland, Sweden, India (9 countries)

A-STAR: Japan, China, Korea, Indonesia, Thailand, India (6 countries)

A-STAR network-based S2ST: + Vietnam, Singapore (2 countries)

Preparation for the U-STAR Research Activity

Speech data for training acoustic models: Japanese, Korean, Thai, Indonesian, Chinese, Hindi, Vietnamese, Malay, French, Portuguese, Turkish, English, German, Dutch, Hungarian, Polish

Parallel corpus and dictionary for training translation models from English to the target language

[Figure label: NICT (JP)]

Member institutes:

Japan: NICT

Korea: ETRI

Thailand: NECTEC

Indonesia: BPPT

China: CASIA

India: CDAC

Vietnam: IOIT

Singapore: I2R

Bhutan: DITT

Pakistan: KICS-UET

Nepal: LTK

Mongolia: MUST

Mongolia: NUM

Sri Lanka: UCSC

Philippines: UPD

France: CNRS-LIMSI

Portugal: INESC-ID

Turkey: TUBITAK

UK: University of Sheffield

Germany: TUM

Germany: UUlm

Belgium: ESAT

Hungary: BME-TMIT

Hungary: PPKE

Speech-to-Speech translation

[Figure: an S2ST client (S2ST application on a smartphone) communicates with the S2ST servers and the U-STAR ASR/MT/TTS servers. Each node carries the label CMLIB.]

CMLIB is implemented for the U-STAR S2ST servers

MCML-based communication libraries (CMLIB)

Network-based Speech-to-Speech Translation (S2ST)

Communication between Different Language Speakers

Japanese speaker (S2ST client) → ASR module, Japanese (S2ST server) → MT module, Japanese → Thai (S2ST server) → TTS module, Thai (S2ST server) → Thai speaker (S2ST client)

Thai speaker (S2ST client) → ASR module, Thai (S2ST server) → MT module, Thai → Japanese (S2ST server) → TTS module, Japanese (S2ST server) → Japanese speaker (S2ST client)

The clients and all S2ST servers are connected through the network.
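The Japanese/Thai diagram amounts to a lookup: for a given speaker and listener, pick one ASR server for the source language, one MT server for the language pair, and one TTS server for the target language. The following sketch shows that selection under stated assumptions. The `Server` record, the registry contents, and `plan_route` are hypothetical names for this example; they are not part of CMLIB or MCML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """One S2ST server module (hypothetical record, not an MCML type)."""
    kind: str    # "ASR", "MT" or "TTS"
    source: str  # input language code
    target: str  # output language code (same as source for ASR and TTS)


# Illustrative registry mirroring the six modules in the diagram.
REGISTRY = [
    Server("ASR", "ja", "ja"),
    Server("MT", "ja", "th"),
    Server("TTS", "th", "th"),
    Server("ASR", "th", "th"),
    Server("MT", "th", "ja"),
    Server("TTS", "ja", "ja"),
]


def plan_route(src: str, tgt: str, registry=REGISTRY) -> list[Server]:
    """Return the ASR -> MT -> TTS chain for one translation direction."""
    wanted = [("ASR", src, src), ("MT", src, tgt), ("TTS", tgt, tgt)]
    route = []
    for kind, source, target in wanted:
        match = next(
            (s for s in registry
             if (s.kind, s.source, s.target) == (kind, source, target)),
            None,
        )
        if match is None:
            raise LookupError(f"no {kind} server for {source} -> {target}")
        route.append(match)
    return route


if __name__ == "__main__":
    for server in plan_route("ja", "th"):
        print(server.kind, server.source, "->", server.target)
```

Keeping the registry separate from the routing function reflects the distributed setup in the diagram, where each module may be hosted on a different server.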

Initiation of Standardization from Asia

A-STAR Speech-to-Speech Translation Demo in 8 Countries (July 2009)

APT ASTAP Meeting (August 2009)

ASTAP 16 Plenary Session: discussion on developing the standardization activity more internationally, not limited to the Asia-Pacific region. ⇒ Approved to raise the standardization draft from APT to ITU-T.

U-STAR MOU (July 2010)

From Asia to the World

A-STAR to U-STAR

The Universal Speech Translation Advanced Research (U-STAR) Consortium is an international research collaboration entity aiming to break language barriers around the world through network-based speech-to-speech translation (S2ST) technologies.

History of the International Consortium (2)

Network-based S2ST research by U-STAR

Years on the timeline: 1991, 1992, 1993, 1999, 2000, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013

C-STAR: Japan, US, Germany (3 countries)

C-STAR network-based S2ST: + Korea, Italy, France, China, U.S., U.K., Switzerland, Sweden, India (9 countries)

A-STAR: Japan, China, Korea, Indonesia, Thailand, India (6 countries)

A-STAR network-based S2ST: + Vietnam, Singapore (2 countries)

U-STAR network-based S2ST: + Bhutan, Mongolia, Nepal, Pakistan, Philippines, Sri Lanka (6 countries); + France, Portugal, Turkey, U.K., Germany, Hungary, Poland, Belgium, Ireland (9 countries)

From ASTAP to ITU-T: Recommendations for network-based speech-to-speech translation were published by HP2/SG16.

Recommendation (2010): F.745, http://www.itu.int/rec/T-REC-F.745-201010-I

Title: Functional Requirements for Network-based S2ST

Recommendation (2010): H.625, http://www.itu.int/rec/T-REC-H.625-201010-I

Title: Architectural Requirements for Network-based S2ST

U-STAR Network-based Speech Translation

The orange-colored areas indicate the countries whose official languages are supported by U-STAR’s apps.

S2ST servers located all over the world are connected through the network.

29 Research institutes from 24 countries/regions


U-STAR members: coverage of the official languages

Example of Hindi

27 MT servers, 17 ASR servers, 14 TTS servers

Chat system using speech translation on a smartphone

Client App

ASR using the Collected Speech

Accuracy improvements using the collected speech

Fig. Evaluation of Model Adaptation: Japanese (left) and Thai (right)

[Figure: two bar charts with WER (%) on the vertical axis. Japanese (JP) conditions: Baseline, AM (USV), AM+LM (USV), AM+LM (SV). Thai (TH) conditions: Baseline, AM (USV), AM (USV)+Web, AM (SV)+Web.]
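The figure reports word error rate (WER). As a reminder of how that metric is usually computed, the sketch below uses the standard definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, expressed as a percentage. The function name and the sample sentences are assumptions for this example; the slides do not say which scoring tool produced the reported numbers.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, via Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = cur
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "please make a reservation for a hotel"
    hyp = "please make reservation for the hotel"
    # One deletion ("a") and one substitution ("a" -> "the") over 7 words.
    print(f"{wer(ref, hyp):.2f}")
```

For languages written without spaces, such as Japanese and Thai, the result depends on how the text is segmented into words before scoring.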

Interoperable Speech Communication Platform for 1) human-to-human and 2) human-to-machine communication

[Figure: a client connects through MCML (ITU-T standardized protocols) to ASR servers, MT servers, TTS servers, and a DM server, which connect to back-end servers.]

Back-end servers:

Online shopping / booking systems, e.g. hotels, stores, etc.

Emergency systems, e.g. hospitals, police departments, etc.

Educational systems, e.g. VoIP lessons, schools

Language and Domain Portability for Speech Communication Tool using ITU-T standardized S2ST protocol

17 languages for ASR, 27 for MT, and 14 for TTS

Chat for up to 5 people
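Because the slide lists fewer ASR and TTS languages than MT languages, a chat client presumably has to fall back to text input or text output for some languages. The sketch below illustrates that capability check. It is an assumption about client behavior, not a description of the U-STAR app, and the language sets are small made-up samples rather than the actual 17/27/14 lists.

```python
# Illustrative capability sets (hypothetical, not the real U-STAR lists).
ASR_LANGS = {"ja", "th", "hi"}
TTS_LANGS = {"ja", "th"}
MT_PAIRS = {("ja", "th"), ("th", "ja"), ("ja", "hi"), ("hi", "ja"), ("ja", "ne")}


def available_modalities(src, tgt, asr=ASR_LANGS, mt=MT_PAIRS, tts=TTS_LANGS):
    """Return the input/output modality for one direction, or None.

    Translation needs an MT server for the pair. Speech input additionally
    needs ASR for the source language, and speech output needs TTS for the
    target language; otherwise the client falls back to text.
    """
    if (src, tgt) not in mt:
        return None
    return {
        "input": "speech" if src in asr else "text",
        "output": "speech" if tgt in tts else "text",
    }


if __name__ == "__main__":
    for pair in [("ja", "th"), ("ja", "hi"), ("ja", "ne"), ("th", "hi")]:
        print(pair, available_modalities(*pair))
```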

Speech-to-Speech Translation

2020 Olympics in Tokyo

Real-Time Indexing

[Figure: video data is processed by the NICT audio indexing system. Index types: Speech, Video, Audio event.]

Searching scenes with the sound of “explosion”: scene of “Riots”

Video A: 20 sec

Video B: 35 sec
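The "explosion" search can be pictured as a lookup in an index of labeled audio events. The sketch below assumes that "20 sec" and "35 sec" are the offsets at which the sound occurs in each video, which the slide does not state explicitly. The `AudioEvent` record, the extra "speech" entry, and the `search` function are hypothetical and do not describe the NICT system's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioEvent:
    """One indexed event (hypothetical record for this example)."""
    video: str
    label: str
    offset_sec: float  # assumed meaning of the slide's "20 sec" / "35 sec"


INDEX = [
    AudioEvent("Video B", "explosion", 35.0),
    AudioEvent("Video A", "explosion", 20.0),
    AudioEvent("Video B", "speech", 5.0),  # made-up entry to show filtering
]


def search(label: str, index=INDEX) -> list[AudioEvent]:
    """Return events whose label matches, ordered by video and offset."""
    hits = [event for event in index if event.label == label.lower()]
    return sorted(hits, key=lambda event: (event.video, event.offset_sec))


if __name__ == "__main__":
    for event in search("explosion"):
        print(f"{event.video}: {event.offset_sec:g} sec")
```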

Spoken Dialog System

“How can I get to the stadium?”

“Which game will you see?”

“From Tokyo station?”

Multilingual Communication Project
