A Text-to-Speech Converter

  • 8/3/2019 Atext to Spechconverter

    A Text-to-Speech Synthesis System

    Presented By:

    Michael Beddaoui

    Abdel-Aziz El-Solh

    Presentation Outline

    Introduction

    Background

    3 Components of TTS System

        Text Pre-processing (Aziz)

        Prosody (Mike)

        Concatenation (Mike)

    Summary

        What has been done / Future Work

    Conclusion

    Questions

    What is a TTS System?

    Definition:

    A system which takes a sequence of words as input and converts them to speech

    Applications:

    Services for the hearing impaired

    Reading email aloud

    Existing TTS Systems:

    Festival

    Bell Labs TTS

    Different TTS Systems

    Phoneme-Based TTS System

    Phonemes are:

        The minimal distinctive phonetic units

        Relatively small in number (39 phonemes in English)

    Disadvantage:

        Phonemes ignore the transitional sounds between neighbouring units

    Different TTS Systems (cont'd)

    Diphone-Based TTS System

    Diphones:

        Are made up of 2 phonemes

        Incorporate transitional sound

        Make for better sounding speech

    Disadvantage:

        Over 1500 diphones in the English language
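
    A small sketch of how a phoneme sequence maps to diphones, and why the diphone inventory is so much larger than the phoneme inventory. The function name `to_diphones`, the `-` joiner and the ARPAbet-style symbols for "cat" are assumptions made for illustration; the slides do not show the notation the system uses.

```python
def to_diphones(phonemes):
    """Pair each phoneme with its successor: [K, AE, T] -> [K-AE, AE-T]."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]

NUM_PHONEMES = 39                  # phoneme count quoted in the slides
max_diphones = NUM_PHONEMES ** 2   # every ordered pair of phonemes

cat = ["K", "AE", "T"]             # assumed transcription of "cat"
print(to_diphones(cat))            # ['K-AE', 'AE-T']
print(max_diphones)                # 1521
```

    The upper bound of 39 x 39 = 1521 ordered pairs is consistent with the "over 1500" figure, although a real inventory may add silence transitions and omit pairs that never occur.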

    Fundamental Components

    [Diagram: TTS System with three blocks: Text Pre-processing, Prosody, Concatenation; label: words]

    Text Pre-Processing

    Input

        String of characters (sentence)

    Output

        String of diphone symbols

    Objective

        Perform sentence-level analysis

            Punctuation marks

            Pauses between words

        Convert all input to corresponding diphones

    Text Pre-Processing (Block Diagram)

    [Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Number Converter]

    Number Converter

    Replace numerals with their textual versions

        100 -> one hundred

    Handle fractional and decimal numbers

        0.25 -> point two five
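
    A minimal sketch of such a converter, assuming numerals arrive as strings and that whole parts stay below one million. The names `int_to_words` and `number_to_words` are hypothetical, and dropping a leading zero before the decimal point is inferred only from the "0.25" example above.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def int_to_words(n):
    """Spell out an integer in the range 0..999999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head if rest == 0 else head + " " + int_to_words(rest)
    thousands, rest = divmod(n, 1000)
    head = int_to_words(thousands) + " thousand"
    return head if rest == 0 else head + " " + int_to_words(rest)

def number_to_words(token):
    """Convert a numeral string such as '100' or '0.25' to words."""
    whole, _, frac = token.partition(".")
    words = []
    # Assumption: a zero whole part is not spoken when a fractional
    # part follows, matching the slide's '0.25' example.
    if whole and not (frac and int(whole) == 0):
        words.append(int_to_words(int(whole)))
    if frac:
        words.append("point")
        words.extend(ONES[int(d)] for d in frac)  # digits read one by one
    return " ".join(words)

print(number_to_words("100"))   # one hundred
print(number_to_words("0.25"))  # point two five
```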

    Text Pre-Processing (Block Diagram)

    [Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Acronym Converter]

    Acronym Converter

    Replace acronyms with single-letter components

        A.B.C. -> A B C

    Change abbreviations to full textual format

        Mr. -> Mister
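
    One way the two rules could be sketched, assuming each segment is handled as a separate token. The `ABBREVIATIONS` table and the function name `expand_token` are hypothetical; only the "A.B.C." and "Mr." examples come from the slide.

```python
import re

# Hypothetical abbreviation table; a real system would need a larger one.
ABBREVIATIONS = {"Mr.": "Mister", "Mrs.": "Missus", "Dr.": "Doctor"}

# Two or more "letter + period" groups, e.g. "A.B.C."
ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")

def expand_token(token):
    """Expand an abbreviation, or split a dotted acronym into letters."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if ACRONYM.match(token):
        return " ".join(token.replace(".", ""))
    return token  # ordinary words pass through unchanged

print(expand_token("A.B.C."))  # A B C
print(expand_token("Mr."))     # Mister
```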

    Text Pre-Processing (Block Diagram)

    [Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Word Segmenter]

    Word Segmenter

    Divide sentence into word segments

        Special delimiter to separate segments (i.e. ||)

        Segments can be:

            A single word

            An acronym

            A numeral

    Identify punctuation marks
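
    A rough sketch of a segmenter along these lines, assuming a regular expression is an acceptable stand-in for whatever the original implementation does. The token pattern, the choice to keep punctuation marks as their own segments, and the name `segment` are all assumptions. Abbreviations such as "Mr." are not treated specially here.

```python
import re

DELIMITER = "||"                 # segment delimiter shown on the slide
PUNCTUATION = set(".,;:!?")

# Tried in order: dotted acronym, numeral, plain word, punctuation mark.
TOKEN = re.compile(r"(?:[A-Za-z]\.){2,}|\d+(?:\.\d+)?|[A-Za-z']+|[.,;:!?]")

def segment(sentence):
    """Return the delimited segment string and the punctuation marks found."""
    tokens = TOKEN.findall(sentence)
    punctuation = [t for t in tokens if t in PUNCTUATION]
    return DELIMITER.join(tokens), punctuation

segments, marks = segment("I paid 0.25 to A.B.C. today.")
print(segments)  # I||paid||0.25||to||A.B.C.||today||.
print(marks)     # ['.']
```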

    Text Pre-Processing (Block Diagram)

    [Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: Word to Diphone Translator (Phonetization)]

    Word To Diphone Converter (Phonetization)

    Purpose

        Translate words to their diphone representations

    Resource

        Dictionary of words and their diphones (derived from CMU phoneme database)

        Over 175,000 words supported

    W-to-D Converter (cont'd)

    Implementation

        Binary search algorithm in C

        Start with the whole dictionary as the search range (start index, end index, middle index)

        If the target word is alphabetically less than the middle word, ignore the second half (i.e. end index = middle index)

        Otherwise ignore the first half (i.e. start index = middle index)

        Repeat until the word is found or the range contains zero words
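
    The original C code is not shown, so the following is only a sketch of the same search in Python. The dictionary entries and their diphone strings are made up, and `lookup` is a hypothetical name. One deliberate difference from the steps as written: when the first half is ignored, the start index moves one past the middle word, which guarantees the range shrinks on every iteration.

```python
# Hypothetical entries, sorted alphabetically by word; the diphone
# strings are invented for illustration.
DICTIONARY = [
    ("cat", "K-AE AE-T"),
    ("dog", "D-AO AO-G"),
    ("hat", "HH-AE AE-T"),
    ("sun", "S-AH AH-N"),
]

def lookup(word, entries=DICTIONARY):
    """Binary search over a sorted word list; returns diphones or None."""
    start, end = 0, len(entries)      # search range is entries[start:end]
    while start < end:                # stop when the range holds zero words
        middle = (start + end) // 2
        middle_word, diphones = entries[middle]
        if word == middle_word:
            return diphones
        if word < middle_word:
            end = middle              # ignore the second half
        else:
            start = middle + 1        # ignore the first half and the middle word
    return None

print(lookup("cat"))    # K-AE AE-T
print(lookup("zebra"))  # None
```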

    W-to-D Converter (cont'd)

    Advantages

        Fast search times

            Search range halves with each iteration (lookups currently take at most 1 sec)

        Less complicated to implement

            Compared to indexing the dictionary or importing it into an internal structure
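
    To put a number on the search cost: because the range halves each time, a binary search over n sorted words needs at most floor(log2(n)) + 1 middle-word comparisons. The sketch below applies that bound to the 175,000-word figure quoted for the dictionary; the function name is hypothetical.

```python
import math

def max_comparisons(num_words):
    """Worst-case number of middle-word comparisons in a binary search."""
    return math.floor(math.log2(num_words)) + 1

print(max_comparisons(175_000))  # 18
```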

    Text Pre-Processing (Block Diagram)

    [Block diagram: Word Segmenter, Acronym Converter, Number Converter, Word to Diphone Translator (Phonetization), Diphone Dictionary, MLDS. Highlighted block: MLDS]

    The Multi-Level Data Structure

    Contains all necessary data for the next sub-system:

        Word

        Diphone representation

        Prosodic parameters for each diphone

            These reflect both word-level and sentence-level prosody

    Allows for modularization
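
    The slides do not give the layout of the MLDS, so the following is only one plausible shape for it: a sentence level holding words, each word holding its diphones, each diphone carrying prosodic parameters. The class names and the choice of pitch, duration and amplitude factors as the parameters are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class DiphoneEntry:
    symbol: str               # e.g. "K-AE" (assumed notation)
    pitch: float = 1.0        # assumed: relative pitch factor
    duration: float = 1.0     # assumed: relative duration factor
    amplitude: float = 1.0    # assumed: relative amplitude factor

@dataclass
class WordEntry:
    word: str
    diphones: list = field(default_factory=list)

@dataclass
class MLDS:
    words: list = field(default_factory=list)

    def all_diphones(self):
        """Flatten the structure into the diphone sequence to synthesize."""
        return [d for w in self.words for d in w.diphones]

mlds = MLDS(words=[
    WordEntry("cat", [DiphoneEntry("K-AE"), DiphoneEntry("AE-T", pitch=1.2)]),
])
print([d.symbol for d in mlds.all_diphones()])  # ['K-AE', 'AE-T']
```

    Keeping everything the next stage needs in one structure is what lets the pre-processing and prosody stages be developed separately.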

    Prosody

    [Flowchart: MLDS, Diphone Retrieval (with Diphone Database), Acoustic Manipulation, Concatenation, and a "done" decision with yes/no branches]

    Diphone Retrieval

    Database of recorded diphones

    Every diphone matched with a txt file

        Distinguished by type (CC, CV, VC, VV)

        References to specific components within the waveform

    Store diphone waveform and prosodic parameters in variables
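
    A small sketch of how the CC/CV/VC/VV type could be derived from a diphone symbol, reading C as consonant and V as vowel. It assumes ARPAbet-style phoneme symbols joined by `-`; the slides do not show the actual symbol set or file format, and `diphone_type` is a hypothetical name.

```python
# ARPAbet-style vowel symbols (stress digits omitted); assumed symbol set.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def diphone_type(diphone):
    """Classify a diphone such as 'K-AE' as CC, CV, VC or VV."""
    first, second = diphone.split("-")
    return "".join("V" if p in VOWELS else "C" for p in (first, second))

print(diphone_type("K-AE"))  # CV
print(diphone_type("AE-T"))  # VC
```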

    Properties of Speech Signals

    [Figure: waveform of cat.wav, segmented as c (non-periodic), a (periodic), t (non-periodic)]

    Acoustic Manipulation - MATLAB

    Recognizes wave files (.WAV)

        load, play, write

    Vast array of signal processing tools

        Built-in functions

    Ease of debugging

    GUI-capable

    Pitch/Duration/Amplitude Alteration

    Pitch (vowels only)

        As pitch increases, pitch period shrinks

        As pitch decreases, pitch period expands

    Need to alter the length between pitch marks in order to alter the pitch of the speech signal
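
    The inverse relation between pitch and pitch period can be made concrete: the number of samples between pitch marks is the sample rate divided by the pitch in Hz. The 16 kHz sample rate and the function name below are assumptions; the slides do not state the recording rate.

```python
def pitch_period_samples(f0_hz, sample_rate_hz=16000):
    """Samples between consecutive pitch marks for a given pitch."""
    return round(sample_rate_hz / f0_hz)

print(pitch_period_samples(100))  # 160
print(pitch_period_samples(200))  # 80
```

    Doubling the pitch from 100 Hz to 200 Hz halves the spacing between pitch marks, which is the quantity the system has to alter.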

    Altering Pitch

    [Figure: pitch period extracted from the original diphone (C_A) x Hanning window = Hanned pitch period]
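
    A sketch of the windowing step in plain Python rather than MATLAB. It assumes the extracted segment spans two pitch periods centred on a pitch mark, which is a common PSOLA convention; the slides do not state the window length, and the function names are hypothetical.

```python
import math

def hann(n):
    """Symmetric Hann (Hanning) window of length n, n >= 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_segment(signal, mark, period):
    """Extract samples around a pitch mark and taper them with a Hann window.

    Assumes mark - period >= 0 and mark + period < len(signal).
    """
    segment = signal[mark - period: mark + period + 1]
    window = hann(len(segment))
    return [s * w for s, w in zip(segment, window)]

flat = [1.0] * 21                       # toy signal, not real speech
hanned = windowed_segment(flat, mark=10, period=5)
print(len(hanned))                      # 11
```

    The taper brings both ends of the segment to zero, so segments can later be overlapped and added without audible discontinuities.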

    Altering Pitch (cont'd)

    PSOLA: Pitch Synchronous Overlap and Add

    [Figure: Hanned pitch periods combined by 50% overlap + add]

    Pitch up: overlap > 50%

    Pitch down: overlap < 50%
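
    The overlap-and-add step can be sketched as follows, assuming all windowed segments have the same length and are placed a fixed number of samples (`hop`) apart. The function names are hypothetical and the toy segments stand in for Hanned pitch periods.

```python
def overlap_fraction(hop, length):
    """Fraction of each segment that overlaps the next one."""
    return 1 - hop / length

def overlap_add(segments, hop):
    """Sum equal-length segments placed `hop` samples apart."""
    length = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + length)
    for k, seg in enumerate(segments):
        for i, value in enumerate(seg):
            out[k * hop + i] += value
    return out

toy = [[1.0, 1.0, 1.0, 1.0]] * 3
print(overlap_fraction(2, 4))   # 0.5
print(overlap_add(toy, hop=2))  # [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
print(overlap_add(toy, hop=1))  # [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
```

    A smaller hop means more than 50% overlap, so the pitch marks sit closer together and the pitch goes up; a larger hop has the opposite effect.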

    Altering Pitch (cont'd)

    [Figure: overlap-added pitch periods (x 12) x Kaiser window = windowed result]

    Naturally spoken vowels contain 12-18 pitch marks

    Altering Duration

    Increase the number of PSOLA iterations (overlaps) to increase duration

    Decrease the number of PSOLA iterations (overlaps) to decrease duration

    Altering Amplitude

    Multiply the signal by a constant

        If constant > 1, amplitude increases

        If constant < 1, amplitude decreases
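
    Both alterations reduce to simple arithmetic, sketched here under assumed values: segments of 160 samples placed 80 samples apart (50% overlap), with 12 and 18 iterations taken from the pitch-mark range quoted for natural vowels. The function names are hypothetical.

```python
def output_length(iterations, hop, segment_length):
    """Samples produced by overlap-adding `iterations` segments."""
    return hop * (iterations - 1) + segment_length

def scale_amplitude(signal, gain):
    """Multiply every sample by a constant gain."""
    return [gain * s for s in signal]

print(output_length(12, hop=80, segment_length=160))  # 1040
print(output_length(18, hop=80, segment_length=160))  # 1520
print(scale_amplitude([0.5, -0.5], 2))                # [1.0, -1.0]
```

    More iterations give a longer vowel at the same pitch, and a gain above or below 1 raises or lowers the amplitude.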

    Summary

    [Diagram: TTS System with three blocks: Text Pre-processing, Prosody, Concatenation; label: words]

    System modularized

    Progress

    Work Completed / Current Status

        Text pre-processing and prosodic manipulation for a multi-syllable word

        Diphone concatenation

        200+ diphones in database

        Fully functional GUI implemented

    Work To Be Done

        Sentence-level synthesis

        Expand diphone database

        Fine-tuning and enhancing

        Prepare for Poster Fair

        Write final report

    Questions?

    Contact Information

    Michael Beddaoui

    Abdel-Aziz El-Solh

    [email protected]

    [email protected]
