Atext to Spechconverter
Transcript of Atext to Spechconverter
-
8/3/2019 Atext to Spechconverter
1/33
A Text-to-Speech Synthesis System
Presented By:
Michael Beddaoui
Abdel-Aziz El-Solh
-
Presentation Outline
Introduction
Background
3 Components of TTS System
  Text Pre-processing (Aziz)
  Prosody (Mike)
  Concatenation (Mike)
Summary
  What has been done / Future Work
Conclusion
Questions
-
What is a TTS System?
Definition:
  A system which takes as input a sequence of words and converts them to speech
Applications:
  Services for the hearing impaired
  Reading email aloud
Commercial TTS Systems:
  Festival
  Bell Labs TTS
-
Different TTS Systems
Phoneme-Based TTS System
Phonemes are:
  The minimal distinctive phonetic units
  Relatively small in number (39 phonemes in English)
Disadvantage:
  Phonemes ignore transitional sounds!
-
Different TTS Systems (contd)
Diphone-Based TTS System
Diphones are:
  Made up of 2 phonemes
  Incorporate transitional sound
  Make for better-sounding speech
Disadvantage:
  Over 1500 diphones in the English language!
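As a minimal sketch of how a phoneme sequence maps onto diphones: the code below assumes ARPAbet-style phoneme symbols and a hypothetical `_` symbol for the silence at each word boundary. Neither convention is stated in the slides, and the `A-B` naming of a diphone is also only illustrative.

```python
def phonemes_to_diphones(phonemes, silence="_"):
    """Pair each phoneme with its successor, padding with silence at both ends."""
    padded = [silence] + list(phonemes) + [silence]
    # Each diphone covers the transition from one phoneme into the next.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# "cat" as three phonemes yields four diphones, including the two boundary ones.
print(phonemes_to_diphones(["K", "AE", "T"]))
```

Because every diphone is a pair of phonemes, the inventory grows roughly with the square of the phoneme count, which is why the diphone database is so much larger than a phoneme one.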
-
Fundamental Components
[Block diagram of the TTS System: words -> Text Pre-processing -> Prosody -> Concatenation]
-
Text Pre-Processing
Input
  String of characters (sentence)
Output
  String of diphone symbols
Objective
  Perform sentence-level analysis
    Punctuation marks
    Pauses between words
  Convert all input to corresponding diphones
-
Text Pre-Processing (Block Diagram)
[Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Number Converter]
-
Number Converter
Replace numerals with their textual versions
  100 -> one hundred
Handle fractional and decimal numbers
  0.25 -> point two five
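The two examples above can be reproduced with a small sketch like the following. It is an illustration rather than the presenters' converter: the function names, the supported range (0 to 999999), and the digit-by-digit reading of the fractional part are assumptions chosen to match the slide's examples.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def integer_to_words(n):
    """Spell out an integer in the range 0..999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + integer_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + integer_to_words(rest)
    return integer_to_words(thousands) + " thousand" + tail


def number_to_words(token):
    """Convert a numeral string such as "100" or "0.25" to words."""
    whole, _, fraction = token.partition(".")
    words = []
    # Following the slide's example, a zero before the decimal point is not spoken.
    if whole and not (fraction and int(whole) == 0):
        words.append(integer_to_words(int(whole)))
    if fraction:
        words.append("point")
        words.extend(ONES[int(digit)] for digit in fraction)  # read digit by digit
    return " ".join(words)


print(number_to_words("100"))
print(number_to_words("0.25"))
```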
-
Text Pre-Processing (Block Diagram)
[Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Acronym Converter]
-
Acronym Converter
Replace acronyms with single letter components
  A.B.C. -> A B C
Change abbreviations to full textual format
  Mr. -> Mister
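A minimal sketch of both rules, assuming that acronyms are written with a period after every letter and that abbreviations come from a lookup table. Only the "Mr." entry is taken from the slide; the "Dr." entry, the regular expression, and the function name are hypothetical.

```python
import re

# Hypothetical abbreviation table; only "Mr." appears in the slides.
ABBREVIATIONS = {"Mr.": "Mister", "Dr.": "Doctor"}

# Two or more "letter + period" groups, e.g. "A.B.C."
ACRONYM = re.compile(r"(?:[A-Za-z]\.){2,}")


def expand_token(token):
    """Expand one segment if it is a known abbreviation or a dotted acronym."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if ACRONYM.fullmatch(token):
        # Drop the periods and separate the letters with single spaces.
        return " ".join(token.replace(".", " ").split())
    return token


print(expand_token("A.B.C."))
print(expand_token("Mr."))
```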
-
Text Pre-Processing (Block Diagram)
[Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Word Segmenter]
-
Word Segmenter
Divide sentence into word segments
  Special delimiter to separate segments (i.e. ||)
Segments can be:
  A single word
  An acronym
  A numeral
Identify punctuation marks
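One way to picture the segmenter is as a tokenizer that joins its tokens with the `||` delimiter. The sketch below is an assumption about how that could work; the token pattern, the set of punctuation marks, and the function names are not taken from the slides.

```python
import re

DELIMITER = "||"
PUNCTUATION = set(".,;:!?")

# A segment is a numeral, a dotted acronym, a word, or one punctuation mark.
TOKEN = re.compile(r"\d+(?:\.\d+)?|(?:[A-Za-z]\.){2,}|[A-Za-z']+|[.,;:!?]")


def segment(sentence):
    """Return the sentence as a delimiter-separated string of segments."""
    return DELIMITER.join(TOKEN.findall(sentence))


def is_punctuation(token):
    """True when a segment is a punctuation mark rather than speakable text."""
    return token in PUNCTUATION


segmented = segment("The A.B.C. costs 100 dollars.")
print(segmented)
print([s for s in segmented.split(DELIMITER) if is_punctuation(s)])
```

Keeping punctuation as its own segment lets a later stage turn it into pauses instead of trying to look it up in the dictionary.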
-
Text Pre-Processing (Block Diagram)
[Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Word to Diphone Translator (Phonetization)]
-
Word To Diphone Converter (Phonetization)
Purpose
  Translate words to their diphone representations
Resource
  Dictionary of words and their diphones (derived from CMU phoneme database)
  Over 175,000 words supported
-
W-to-D Converter (contd)
Implementation
  Binary Search Algorithm in C
    Start with the whole dictionary as the search range (start index, end index, middle index)
    If the target word is alphabetically less than the middle word, then ignore the second half (i.e. end index = middle index)
    else ignore the first half (i.e. start index = middle index)
    Repeat until the word is found or the range contains zero words
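The search described above can be sketched as follows. This is written in Python for brevity, not in the C the slide mentions, and the dictionary entries and diphone labels are made-up placeholders. One deliberate departure from the slide's wording: `start` moves to `middle + 1` rather than `middle`, so that the range can actually shrink to zero words when the target is absent.

```python
def lookup(dictionary, target):
    """Binary search over a list of (word, diphones) pairs sorted by word."""
    start, end = 0, len(dictionary)      # search range is [start, end)
    while start < end:                   # stop when the range holds zero words
        middle = (start + end) // 2
        word, diphones = dictionary[middle]
        if target == word:
            return diphones
        if target < word:
            end = middle                 # ignore the second half
        else:
            start = middle + 1           # ignore the first half and the middle word
    return None                          # word not in the dictionary


# Hypothetical entries, sorted alphabetically as binary search requires.
toy_dictionary = [
    ("cat", ["K-AE", "AE-T"]),
    ("dog", ["D-AO", "AO-G"]),
    ("hat", ["HH-AE", "AE-T"]),
]
print(lookup(toy_dictionary, "dog"))
print(lookup(toy_dictionary, "emu"))
```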
-
W-to-D Converter (contd)
Advantages
  Fast search times
    Search range halves with each iteration, so 175,000 words need at most 18 comparisons (max of 1 sec currently)
  Less complicated to implement
    Compared to indexing the dictionary or importing the dictionary into an internal structure
-
Text Pre-Processing (Block Diagram)
[Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: MLDS]
-
The Multi-Level Data Structure
Contains all necessary data for the next sub-system:
  Word
  Diphone representation
  Prosodic parameters for each diphone
    This reflects both word-level and sentence-level prosody
Allows for modularization
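A rough picture of such a structure, assuming one level per sentence, word, and diphone. All class and field names here are hypothetical, and the three scale factors simply mirror the pitch, duration, and amplitude alterations named elsewhere in the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    symbol: str                      # diphone label, e.g. "K-AE" (illustrative)
    pitch_scale: float = 1.0         # prosodic parameters, 1.0 = unchanged
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    word: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class MLDS:
    words: List[WordEntry] = field(default_factory=list)


# The pre-processing stage fills the structure; the next stage only reads it.
mlds = MLDS(words=[
    WordEntry("cat", [DiphoneEntry("K-AE", pitch_scale=1.1), DiphoneEntry("AE-T")]),
])
print(mlds.words[0].diphones[0].pitch_scale)
```

Passing a single structure between stages is what keeps the sub-systems independent: each one depends on the structure's fields, not on how the previous stage computed them.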
-
Prosody
[Block diagram: MLDS, Diphone Retrieval (reads the Diphone Database), Acoustic Manipulation, Concatenation, with a "done" decision (yes / no)]
-
Diphone Retrieval
Database of recorded diphones
  Every diphone matched with a txt file
    Distinguished by type (CC, CV, VC, VV)
    References to specific components within waveform
Store diphone waveform and prosodic parameters in variables
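The four types correspond to whether each half of the diphone is a consonant (C) or a vowel (V). A minimal sketch of that classification, assuming the phonemes use the ARPAbet symbols of the CMU dictionary; the function name is hypothetical, and how the presenters actually stored the type is not shown in the slides.

```python
# The 15 vowel symbols of the 39-phoneme ARPAbet set used by the CMU dictionary.
ARPABET_VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                  "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def diphone_type(first, second):
    """Classify a diphone as CC, CV, VC, or VV from its two phonemes."""
    def kind(phoneme):
        return "V" if phoneme in ARPABET_VOWELS else "C"
    return kind(first) + kind(second)


print(diphone_type("K", "AE"))
print(diphone_type("AE", "T"))
```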
-
Properties of Speech Signals
[Figure: waveform of e.g. cat.wav, with the segments c, a, t labeled Non-Periodic, Periodic, Non-Periodic]
-
Acoustic Manipulation - MATLAB
Recognizes wave files (.WAV)
  load, play, write
Vast array of signal processing tools
  Built-in functions
Ease of debugging
GUI-capable
-
Pitch/Duration/Amplitude Alteration
Pitch: vowels only
  As pitch increases, pitch period shrinks
  As pitch decreases, pitch period expands
Need to alter length between pitch marks in order to alter pitch of speech signal
-
Altering Pitch
[Figure: pitch period extracted from the original diphone (C_A) x Hanning window = Hanned pitch period]
-
Altering Pitch (contd)
PSOLA: Pitch Synchronous Overlap and Add
[Figure: Hanned pitch periods combined by 50% overlap + add]
Pitch up: overlap > 50%
Pitch down: overlap < 50%
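The overlap-and-add step can be sketched in a few lines. This is a simplified illustration in plain Python rather than the presenters' MATLAB code: it assumes one extracted segment is windowed and repeated at a fixed spacing, and the function names, the symmetric Hann formula, and the flat test segment are all assumptions.

```python
import math


def hann_window(n):
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def overlap_add(segment, repeats, overlap=0.5):
    """Overlap-add `repeats` windowed copies of one extracted segment.

    Returns the output samples and the hop (spacing between pitch marks).
    """
    n = len(segment)
    windowed = [s * w for s, w in zip(segment, hann_window(n))]
    hop = max(1, round(n * (1.0 - overlap)))   # more overlap -> smaller hop
    out = [0.0] * (hop * (repeats - 1) + n)
    for r in range(repeats):
        start = r * hop
        for i, value in enumerate(windowed):
            out[start + i] += value            # add the overlapping copies
    return out, hop


segment = [1.0] * 100                          # placeholder for real samples
for overlap in (0.25, 0.5, 0.75):
    out, hop = overlap_add(segment, repeats=12, overlap=overlap)
    print(overlap, hop, len(out))
```

A larger overlap places the pitch marks closer together, which shortens the pitch period and raises the pitch; a smaller overlap does the opposite.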
-
Altering Pitch (contd)
[Figure: overlap-added signal (x 12) x Kaiser window = result]
Naturally spoken vowels contain 12-18 pitch marks
-
Altering Duration
  Increase number of PSOLA iterations (overlaps) to increase duration
  Decrease number of PSOLA iterations (overlaps) to decrease duration
Altering Amplitude
  Multiply the signal by a constant
  If constant > 1, amplitude increases
  If constant < 1, amplitude decreases
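Both rules reduce to simple arithmetic. The sketch below assumes that each PSOLA iteration advances the output by a fixed hop, so that the output length is `segment_len + (repeats - 1) * hop`; the function names and the sample values are hypothetical.

```python
def repeats_for_duration(target_samples, segment_len, hop):
    """Number of overlap-add iterations needed to reach at least target_samples."""
    if target_samples <= segment_len:
        return 1
    # Solve segment_len + (repeats - 1) * hop >= target_samples, rounding up.
    return 1 + -(-(target_samples - segment_len) // hop)


def scale_amplitude(signal, constant):
    """Multiply every sample by a constant (>1 louder, <1 quieter)."""
    return [constant * sample for sample in signal]


print(repeats_for_duration(650, segment_len=100, hop=50))
print(scale_amplitude([0.5, -0.25], 2))
```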
-
Summary
[Block diagram of the TTS System: words -> Text Pre-processing -> Prosody -> Concatenation]
System modularized
-
Progress
Work Completed / Current Status
  Text pre-processing and prosodic manipulation for a multi-syllable word
  Diphone concatenation: 200+ diphones in database
  Fully functional GUI implemented
Work To Be Done
  Sentence-level synthesis
  Expand diphone database
  Fine-tuning and enhancing
  Prepare for Poster Fair
  Write final report
-
Questions?
Contact Information
Michael Beddaoui
Abdel-Aziz El-Solh