Introduction to Myanmar Text-To-Speech

Introduction to Myanmar Text-To-Speech Engine
Ngwe Tun, Consultant, Yangon Education Center for Blinds

What is a Text-To-Speech Engine?
A text-to-speech (TTS) system converts normal language text into speech.
It is also known as a speech synthesizer.

e.g. Microsoft Sam:
"The quick brown fox jumps over the lazy dog 1,234,567,890 times. soi"

Overview of Text-To-Speech Engine
A text-to-speech system (or "engine") is composed of two parts: a front-end and a back-end.
The front-end converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end.
The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
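The text normalization step of the front-end can be sketched in Python as follows. This is a minimal illustration for English only, not the engine described in these slides: the abbreviation table, the function names, and the choice to write numbers without "and" or hyphens are all assumptions made for the example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion"]

# Hypothetical abbreviation table; a real front-end needs a far larger one.
ABBREVIATIONS = {"e.g.": "for example", "Dr.": "doctor"}

# Matches digit groups with thousands separators (1,234) or plain digits (42).
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def _under_thousand(n):
    """Return the words for 1..999 as a list."""
    parts = []
    if n >= 100:
        parts += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10])
        n %= 10
    if 0 < n < 20:
        parts.append(ONES[n])
    return parts


def number_to_words(n):
    """Spell out a non-negative integer below one trillion."""
    if not 0 <= n < 10 ** 12:
        raise ValueError("number out of supported range")
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n > 0:
        n, chunk = divmod(n, 1000)
        if chunk:
            words = _under_thousand(chunk)
            if SCALES[scale]:
                words.append(SCALES[scale])
            groups.append(" ".join(words))
        scale += 1
    return " ".join(reversed(groups))


def normalize(text):
    """Expand known abbreviations, then replace digit strings with words."""
    for abbr, expansion in ABBREVIATIONS.items():
        text = text.replace(abbr, expansion)
    return NUMBER_RE.sub(
        lambda m: number_to_words(int(m.group().replace(",", ""))), text)
```

Calling normalize on the number from the sample sentence, "1,234,567,890", yields the written-out words that the later stages of the front-end would receive.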

Features and Functions of Text-to-Speech Software
Text-to-speech (TTS) software tools are similar in that they speak text on a computer. However, they vary widely in their functionality.
Formatting text: allows you to format digital text you create, download from the Internet, or scan into your computer, similar to a word processing program.
Speaking what you type: speaks text as you type to give you support in writing. Within this function there may be the ability to set the level of support, such as speaking words, or speaking each letter and then the word.

Speaking the Text
Continuous reading: reads from where you choose to begin and stops when it reaches the end of the text or you use a stop command.
Incremental reading: reads one increment of text, such as a word, sentence, chunk/phrase, or paragraph, then stops and waits for you to request the next increment.
Highlighted text: reads only the text you highlight with the cursor. Some TTS programs read the document from a starting point until the user stops the program; others read only text selected and highlighted by the user.
Voices: TTS software can use one voice or let you choose from a selection of male, female, and even foreign-language voices.
Reading speed: you can choose to read faster or slower, in precise words per minute or in speed increments.
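The incremental reading mode and the words-per-minute setting can be sketched as below. This is only an illustration under simple assumptions: the function names are hypothetical, sentences are split naively on ".", "!" and "?", and the duration estimate ignores pauses and word length.

```python
import re


def split_increments(text, unit):
    """Split text into the increments a reader would speak one at a time."""
    if unit == "word":
        return text.split()
    if unit == "sentence":
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if unit == "paragraph":
        # Paragraphs are separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    raise ValueError("unit must be 'word', 'sentence', or 'paragraph'")


def incremental_reader(text, unit="sentence"):
    """Yield one increment per request; the caller asks for more with next()."""
    yield from split_increments(text, unit)


def estimated_seconds(text, words_per_minute):
    """Rough speaking time for text at a given reading speed."""
    return len(text.split()) * 60 / words_per_minute
```

Each call to next() on the generator corresponds to the user requesting another increment; continuous reading is the same loop run to exhaustion.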

Who will be using speech synthesis?
Users with visual disabilities: screen readers not only read text files but also give the user other audible navigation support, such as reading the user interface, indicating where the user's cursor is on the screen, and indicating when the user's cursor has passed over a folder.
Text readers are commercial TTS software tools for users who read below grade level because of a learning disability, English as a second language, a reading disability, or low vision.
Stephen Hawking is one of the most famous people using speech synthesis to communicate.
Mobile phones can speak incoming text such as SMS, e-mail, and notifications.

Advantages of Text-To-Speech
Listen to class notes, textbooks, and electronic text.
Facilitates education.
Avoids eyestrain from too much reading.
Makes proofreading more effective.
Learn English, Myanmar, or other languages.
Prepare for speeches by hearing your work read aloud.
Listen to e-books or e-material during your commute.
Amuse children by letting your PC read stories to them.
Help seniors or those with vision problems.

Demo

Myanmar Language in the Digital Era
Used on computers, mobile phones, MP3 players, watches, and other electronic devices.
Widely used in social networks and online content.
Gives access to news, information, and knowledge from international and local sources.
Localization and rich Myanmar content in electronic form.

Why do we need Myanmar Text-To-Speech?
Myanmar language users cannot use a screen reader.
Screen, file, and text readers do not support Myanmar.
Text-to-speech features would make it easier for any Myanmar computer user to work in the Myanmar language.
The Open Source Myanmar Text-To-Speech Engine initiative will empower other software vendors who want to develop for it or integrate it with their applications. For example, a mobile phone manufacturer could integrate Myanmar language support easily, without reinventing the wheel.
Learning the Myanmar language will be easier with a text reader.

How shall we develop Myanmar TTS?
Learn
  Define the scope of work.
  Collect digital assets.
  Discover the best tool to build the TTS and plan for the future.
Develop
  Develop the Myanmar language model, tokenization, and grapheme-to-sound conversion.
  Train the TTS engine.
  Test the TTS engine internally and through public review.
Enhance
  Review the work plan, look for improvements, and apply the feedback.
  Invite TTS specialists and improve the engine with third-party opinions.
  Develop tools and apply them in a real environment (e.g. audio books).

Open Source Model of Myanmar TTS
Everyone can participate in the development team.
Anyone can help guide the project.
Anyone can contribute their ideas.

Open Source Model of TTS Engine for Myanmar Language
We realized that one consultant, one project leader, and three developers cannot complete the Myanmar TTS on their own.
We need you, your feedback, and your contribution.

The purpose of a Text-to-Speech system
To convert any text into natural-sounding speech.
First, the text needs to be normalized. Normalization is the process of transforming text into a single canonical form, so that the text can be parsed into individual tokens.
Next, the text-to-speech system assigns the appropriate phonetic transcription to each word, reflecting how the text should be pronounced in the given natural language.
The synthesizer then converts the symbolic linguistic representations into sound.
The last step is to choose the right speech units, which ensures the high quality and natural sound of the generated speech.
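The tokenization and phonetic transcription steps can be sketched as a dictionary lookup. This is a toy example, not the method used by the project: the lexicon entries are illustrative ARPAbet-style symbols chosen for the example, and the letter-by-letter fallback is a placeholder for real grapheme-to-phoneme rules.

```python
import re

# Toy pronunciation lexicon (illustrative symbols, not an authoritative dictionary).
LEXICON = {
    "the": ["DH", "AH"],
    "quick": ["K", "W", "IH", "K"],
    "fox": ["F", "AA", "K", "S"],
}


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def to_phonemes(word):
    """Look the word up; spell unknown words letter by letter as a fallback."""
    if word in LEXICON:
        return LEXICON[word]
    return [ch.upper() for ch in word if ch.isalpha()]


def front_end(text):
    """Return (token, phoneme list) pairs for the synthesizer to consume."""
    return [(word, to_phonemes(word)) for word in tokenize(text)]
```

The list of pairs returned by front_end stands in for the symbolic linguistic representation; a real system would also attach prosody information before unit selection.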

Architecture of Myanmar TTS
The minimal unit of sound will be the syllable or syllable chain.
Word segmentation or tokenization. [Myanmar-script example not recoverable from the source]
Syllable sounds are concatenated to compose the sound of each word. [Myanmar-script example not recoverable from the source]
Speed and intonation need to be adjusted between syllables and words.
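The syllable-based design can be sketched in two steps: split Myanmar text into syllables, then concatenate one recorded unit per syllable. The code below is a simplified sketch under stated assumptions, not the project's implementation. The segmentation rule (start a new syllable at each consonant unless it is stacked or carries an asat) is a common simplification that ignores independent vowels, digits, and punctuation; the unit inventory and the fixed silence gap are hypothetical stand-ins for real recordings and prosody adjustment.

```python
import re

VIRAMA = "\u1039"  # stacking sign: the next consonant is stacked below
ASAT = "\u103a"    # vowel killer: the consonant closes the current syllable

# A consonant (U+1000..U+1021) starts a new syllable unless it follows a
# virama (stacked consonant) or is followed by an asat or virama (final).
SYLLABLE_START = re.compile(
    "(?<!" + VIRAMA + ")([\u1000-\u1021])(?![" + ASAT + VIRAMA + "])")


def segment_syllables(text):
    """Split Myanmar text into syllables using the simplified rule above."""
    marked = SYLLABLE_START.sub(r"|\1", text)
    return [s.strip() for s in marked.split("|") if s.strip()]


def concatenate(syllables, inventory, gap_samples=2):
    """Join per-syllable sample lists, inserting silence between syllables.

    inventory maps a syllable to its audio samples (hypothetical data here);
    the fixed gap is a crude stand-in for speed and intonation adjustment.
    """
    samples = []
    for index, syllable in enumerate(syllables):
        if index > 0:
            samples.extend([0.0] * gap_samples)
        samples.extend(inventory[syllable])
    return samples


# The word "Myanmar" written in Myanmar script, given as code points.
word = "\u1019\u103c\u1014\u103a\u1019\u102c"
syllables = segment_syllables(word)  # two syllables
```

A syllable missing from the inventory raises KeyError in this sketch; a real engine would need a fallback, such as composing the syllable from smaller units.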

Applications of Myanmar TTS
The longest-standing application has been in screen readers for people with visual impairment.
TTS is also commonly used by people with dyslexia and other reading difficulties, as well as by pre-literate children.
Speech synthesis techniques are also used in entertainment productions such as games and animations.
Text-to-speech communication aids for people with disabilities have become widely deployed in mass transit.
Text-to-speech is also used in second language acquisition.

Dream for the Myanmar TTS
A text-to-speech engine integrated with mobile phones, computers, and electronic devices.
Voice commands integrated with mobile phones, computers, and electronic devices.
Everyone can read any Myanmar news, information, and electronic text with a screen reader.
A text-to-speech engine powering public announcements and weather notifications.
Screen reader functions integrated with OCR, so that even images can be read aloud.

Thanks for being here and participating in the Project
Sponsorship of the TTS Project by KBZ Group of Companies.
Great arrangement by Yangon Education Center for Blind.
Contribution of knowledge by several people.
Last but not least, a warm welcome to future contributors.