Language and Speech Translation Activities in Thailand · Language Resources - Basic Travel...
ASEAN-NICT Round Table – Feb 2015
Language and Speech Translation Activities in Thailand
Chai Wutiwiwatchai
National Electronics and Computer Technology Center
National Science and Technology Development Agency
THAILAND
Outline
● U-STAR Speech Translation (since 2007)
- Brief history
- System architecture
- Current status
- Future plan
● ASEAN Machine Translation (since 2012)
- Project overview
- System architecture
- Current status
- Future plan
U-STAR Speech Translation (since 2007)
U-STAR History [1]
- Collaboration of 30 Asian and European countries
- Modality Conversion Markup Language (MCML), registered as an ITU-T recommendation standard
U-STAR History [1]
● 2009: A-STAR S2ST Live Demo
- Network-based multilingual S2ST
- 8 Asian languages and English
- Peer-to-peer and multi-party clients
- Portable devices (UMPC)
U-STAR History [1]
● 2012: U-STAR S2ST Public Service
- Network-based multilingual S2ST in the travel and sport domains, launched in Jun 2012
- 23 Asian and European languages supported
- VoiceTra4U-M, an iPhone app available free on the App Store
System Architecture [2]
● ITU-T H.625 – Architecture for network-based speech-to-speech translation services
● ITU-T F.745 – Functional requirements for network-based speech-to-speech translation services
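
A network-based S2ST service chains three stages: speech recognition (ASR), machine translation (MT), and speech synthesis (TTS). The Python sketch below only illustrates that chaining. The `S2STPipeline` class, the stub engines, and the one-word toy lexicon are all hypothetical stand-ins; the sketch does not model MCML messages or the interfaces defined in H.625 and F.745.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical engine signatures; real engines would be remote servers.
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source text -> target text
Synthesizer = Callable[[str], bytes]   # target text -> audio


@dataclass
class S2STPipeline:
    """Chains ASR, MT and TTS engines into one speech-to-speech call."""
    asr: Recognizer
    mt: Translator
    tts: Synthesizer

    def translate_speech(self, audio: bytes) -> bytes:
        text = self.asr(audio)        # stage 1: recognize speech
        translated = self.mt(text)    # stage 2: translate the text
        return self.tts(translated)   # stage 3: synthesize target speech


# Stub engines (assumptions for illustration): "audio" is just UTF-8 text.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_mt(text: str) -> str:
    toy_lexicon = {"hello": "sawasdee"}  # toy entry, not a real MT model
    return " ".join(toy_lexicon.get(w, w) for w in text.lower().split())


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = S2STPipeline(asr=stub_asr, mt=stub_mt, tts=stub_tts)
result = pipeline.translate_speech(b"Hello")
```

Because each stage is just a callable, any one engine can be swapped without touching the others, which is one way to read the idea of members hosting their own engines.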
Current Status
● Language Resources
- The Basic Travel Expression Corpus (BTEC) has been translated into member languages since A-STAR
- To extend the service for users during the London Olympic Games, an Olympic expression corpus from Harbin Institute of Technology (HIT) was acquired and distributed for translation
- A Named Entity (NE) list of words related to Olympic expressions has also been collected from member countries
Current Status
● Examples of Engines [3]

| Language | ASR | MT | TTS |
|---|---|---|---|
| English (En) | HMnet (SSS) | | Concatenative |
| Hindi (Hi) | | SMT (Cleopatra) | HMM |
| Indonesian (Id) | HMnet (SSS) | SMT (Moses) | HMM |
| Japanese (Ja) | HMnet (SSS) | SMT (Cleopatra) | Concatenative |
| Korean (Ko) | FST | RBMT (Parser) | HMM |
| Malay (Ms) | HMM | RBMT (Piramid) | HMM |
| Thai (Th) | HMM | SMT (Moses) | HMM |
| Vietnamese (Vi) | HMM | SMT (Moses) | HMM |
| Chinese (Zh) | HMnet (SSS) | SMT (Cleopatra) | Concatenative |
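
The engine table can also be read as a per-language lookup of which engine type serves each stage. A small Python sketch of that reading follows; the names `EngineSet`, `ENGINES`, and `languages_using` are hypothetical, and entries that appear blank in the table are represented as `None` rather than guessed.

```python
from typing import Dict, List, NamedTuple, Optional


class EngineSet(NamedTuple):
    asr: Optional[str]
    mt: Optional[str]
    tts: Optional[str]


# Values copied from the table; None marks an entry the table leaves blank.
ENGINES: Dict[str, EngineSet] = {
    "En": EngineSet("HMnet (SSS)", None, "Concatenative"),
    "Hi": EngineSet(None, "SMT (Cleopatra)", "HMM"),
    "Id": EngineSet("HMnet (SSS)", "SMT (Moses)", "HMM"),
    "Ja": EngineSet("HMnet (SSS)", "SMT (Cleopatra)", "Concatenative"),
    "Ko": EngineSet("FST", "RBMT (Parser)", "HMM"),
    "Ms": EngineSet("HMM", "RBMT (Piramid)", "HMM"),
    "Th": EngineSet("HMM", "SMT (Moses)", "HMM"),
    "Vi": EngineSet("HMM", "SMT (Moses)", "HMM"),
    "Zh": EngineSet("HMnet (SSS)", "SMT (Cleopatra)", "Concatenative"),
}


def languages_using(stage: str, keyword: str) -> List[str]:
    """Return language codes whose engine for `stage` mentions `keyword`."""
    return [
        code
        for code, engines in ENGINES.items()
        if keyword in (getattr(engines, stage) or "")
    ]


moses_languages = languages_using("mt", "Moses")
concatenative_tts = languages_using("tts", "Concatenative")
```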
Current Status
● No. of App Downloads
- 15,645 downloads (until 2012)
- 5,179 Thai downloads (until 2012) - 2nd rank
- 65,346 downloads (until 2014)
● No. of Service Transactions
- 26,882 transactions (until 2012)
- 5,179 transactions for Thai (until 2012) - 2nd rank
- 514,552 transactions (until 2014)
- 45,056 translations for Thai (until 2014)
Future Plan
● Named Entities (NE)
- NE words are often language-specific
- Using a descriptive or transliterated word for unknown NEs
● Scalability and Extensibility
- Encouraging service maintenance by members
- Improving service performance by using real data
- Extending to new domains and languages
● Service Latency
- Network conditions are the key factor
- Setting up communication mirror servers
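
The named-entity item in this plan implies a fallback order: use a known translation when one exists, otherwise a descriptive word, otherwise a transliteration. The Python sketch below illustrates that order under stated assumptions: the function name, the two toy dictionaries, and the use of diacritic folding as a crude stand-in for transliteration are illustrative choices, not the consortium's method.

```python
import unicodedata
from typing import Callable, Dict, Tuple


def fold_diacritics(text: str) -> str:
    """Crude stand-in for transliteration: drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def render_named_entity(
    entity: str,
    lexicon: Dict[str, str],
    descriptions: Dict[str, str],
    transliterate: Callable[[str], str] = fold_diacritics,
) -> Tuple[str, str]:
    """Return (output text, strategy used) for one named entity."""
    if entity in lexicon:           # 1. known translation
        return lexicon[entity], "lexicon"
    if entity in descriptions:      # 2. descriptive word or phrase
        return descriptions[entity], "descriptive"
    return transliterate(entity), "transliterated"  # 3. last resort


# Toy Vietnamese-to-English data, for illustration only.
toy_lexicon = {"Việt Nam": "Vietnam"}
toy_descriptions = {"Chùa Một Cột": "One Pillar Pagoda"}
names = ["Việt Nam", "Chùa Một Cột", "Huế"]

results = [
    render_named_entity(name, toy_lexicon, toy_descriptions) for name in names
]
```

Returning the strategy alongside the text makes it easy to log how often the system falls back, which is the kind of real usage data the plan mentions for improving the service.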
ASEAN Machine Translation (since 2012)
ASEAN-MT Project (Since 2012)
Participating organizations (labels from the slide): NECTEC, UCSY, MOST, NIDA, IOIT, LINTON, BPPT, DLSU, UBD, I2R
● Translation among ASEAN languages is increasingly important to support the coming AEC 2015
● Endorsed by ASEAN SCMIT
● Approved for ASF partial support (2012-2014)
System Architecture
Statistical MT Approach
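
The diagram for this slide is not preserved. For orientation only, statistical MT is commonly introduced through the noisy-channel formulation below, where $f$ is the source sentence, $e$ a candidate translation, $P(f \mid e)$ the translation model, and $P(e)$ the target language model. Whether the ASEAN-MT system follows exactly this form is an assumption; phrase-based toolkits such as Moses generalize it to a log-linear combination of feature functions.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The second equality follows from Bayes' rule, since $P(f)$ does not depend on $e$ and can be dropped from the maximization.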
Current Status
● Kick-off & the 1st Working Committee Meeting
- July 2012
- Pattaya, Thailand
● The 1st Technology Workshop
- January 2013
- Pathumthani, Thailand
● Progress Demonstration at ASEAN COST Meeting
- May 2013
- Tagaytay City, Philippines
● The 2nd Working Committee Meeting
- December 2013
- Ayudhaya, Thailand
Current Status

| Country | Language | Translation | SMT | NE Tag |
|---|---|---|---|---|
| Brunei | Malay | | | |
| Cambodia | Cambodian | 20,000 | 20,000 | 20,000 |
| Indonesia | Indonesian | 20,000 | 20,000 | 20,000 |
| Laos | Lao | 20,000 | 20,000 | 20,000 |
| Malaysia | Malay | 20,000 | 20,000 | 20,000 |
| Myanmar | Myanmar | 10,000 | 10,000 | 20,000 |
| Philippines | Filipino | 20,000 | 20,000 | 20,000 |
| Singapore | Chinese | 20,000 | 20,000 | 20,000 |
| Thailand | Thai | 20,000 | 20,000 | 20,000 |
| Vietnam | Vietnamese | 20,000 | 20,000 | 20,000 |

Parallel text corpus: 20,000 sentences in travel domain
Current Status
- Size: 20,000
- Domain: Travel
- People: Greeting, Introduction, Communication
- Survival: Transportation, Accommodation, Finance
- Food: Food, Beverage, Restaurant
- Fun: Recreation, Traveling, Shopping, Nightlife
- Resource: Number, Time, Currency
- Special Needs: Emergency, Health
- NE types: 17 types
Demonstration at ASEAN COST Meeting, May 2014
http://www.aseanmt.org/demo [4]
Future Plan
● Evaluation
- Overall system evaluation
● Post-Editing Module
- R&D on a post-editing module
● 2015 Activities
- The 2nd technology workshop
- The 3rd working committee meeting
- Final demonstration at ASEAN COST meeting
● Future of S2ST [5]
References
[1] U-STAR consortium, http://ustar-consortium.com/
[2] ITU-T standard, http://www.itu-t.int/
[3] S. Sakti, M. Paul, A. Finch, S. Sakai, T. T. Vu, N. Kimura, C. Hori, E. Sumita, S. Nakamura, J. Park, C. Wutiwiwatchai, B. Xu, H. Riza, K. Arora, C. M. Luong, H. Li, 2011. Toward translating Asian spoken languages. Computer Speech and Language (2011), doi:10.1016/j.csl.2011.07.001.
[4] ASEAN-MT project, http://www.aseanmt.org/
[5] S. Nakamura, 2009. Overcoming the language barrier with speech translation technology. Science and Technology Trends – Quarterly Review No. 31, Apr 2009, pp. 35-48.