From Sounds to Language

From Sounds to Language Lecture 2 Spoken Language Processing Prof. Andrew Rosenberg



Linguistic sounds

How does a sound wave become language?

Sounds are continuous waveforms. Linguistic units are categorical.

How is the human perceptual system able to categorize and combine linguistic sounds into language?

Studying Speech

Who studies speech?
  Linguists (phoneticians, phonologists, forensic linguists)
  Speech engineers
    Speech recognition
    Speech synthesis
    etc.
  Speech pathologists
  Language instructors
  Singers
  Marketing experts

Marketing experts?

Studying speech

Major questions in studying speech:
  What is the sound inventory of a language?
  Which variations are linguistically relevant?
    R/L in Asian languages
    [p] vs. aspirated [pʰ] in English
  How are speech sounds produced?
  What sounds are shared by two languages, and which are not?
  How do sounds vary in context?
    "Green banana" vs. "Greem banana"

Representing speech sounds

Why are representations important?
  Translation between sounds and words
    ASR and TTS
  Learning pronunciation
  Having a shared vocabulary to discuss language
How should we represent speech sounds?
  Orthography?
  Special symbols?
  Abstract classes based on sound and/or articulatory similarities

Using orthography to represent sounds

A single orthographic letter is realized in many different ways (in English):
  b: comb, tomb, bomb
  c: court, center, chess
  oo: food, good, blood
  s: reason, sunrise, shy, collision

Using orthography to represent sounds

A single sound can be written in many different ways (in English):
  [i]: sea, see, scene, receive, thief, miss
  [s]: cereal, same, miss
  [u]: true, few, choose, lieu, do
  [ay]: lie, prime, pry, buy

How is orthography looking as a choice in English?

Phonetic Symbol Sets

International Phonetic Alphabet (IPA)
  Single (unique) character for each sound
  Represents all sounds of the world's languages, but is large and requires a special (non-ASCII) font
ARPAbet, TIMIT, etc.
  Multiple characters for each sound
  Language specific: a new symbol set is required for each language
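The mismatch between spelling and sound can be made concrete with a small pronunciation lexicon. The sketch below is only an illustration: the lexicon is hand-written in ARPAbet (stress marks omitted) for a few of the example words above, and the names LEXICON, VOWELS, words_with_phone and vowels_for_spelling are hypothetical, not part of any standard tool.

```python
# Hand-written ARPAbet pronunciations (stress omitted); an illustrative
# lexicon for a few example words, not loaded from a real dictionary file.
LEXICON = {
    "food": ["f", "uw", "d"],
    "good": ["g", "uh", "d"],
    "blood": ["b", "l", "ah", "d"],
    "sea": ["s", "iy"],
    "see": ["s", "iy"],
    "scene": ["s", "iy", "n"],
}

# Vowel phones that occur in the toy lexicon above.
VOWELS = {"uw", "uh", "ah", "iy"}


def words_with_phone(phone):
    """Return the words whose pronunciation contains the given phone."""
    return sorted(w for w, phones in LEXICON.items() if phone in phones)


def vowels_for_spelling(spelling, vowels):
    """Map each word containing `spelling` to the vowel phones it is said with."""
    return {
        w: [p for p in phones if p in vowels]
        for w, phones in LEXICON.items()
        if spelling in w
    }


# One sound, several spellings: "ea", "ee" and "e...e" all give [iy].
print(words_with_phone("iy"))

# One spelling, several sounds: "oo" gives [uw], [uh] or [ah].
print(vowels_for_spelling("oo", VOWELS))
```

A phonetic symbol set removes this ambiguity because each entry in the lexicon is written sound by sound rather than letter by letter.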

Exercise: Write your full name in English orthography and in ARPAbet.

Sound categories

Phone: basic speech sound of a language
  A minimal sound difference between two words
    too vs. zoo
  Not every sound made by a human speaker is phonetic
    Sniffs, laughs, coughs, breaths
Phoneme: class of speech sounds
  A phoneme may include several phones
    /t/ in top, stop, little, butter, winter
Allophones: the set of phonetic variants that make up a phoneme
  {[t], [], }

Speech Production

The articulatory organs
General process:
  Air is expelled from the lungs through the windpipe (trachea), leaving via the mouth (and nose).
  Air passes from the trachea through the larynx, which contains the vocal folds; the space between them is the glottis.
  When the vocal folds vibrate, voiced sounds are produced; otherwise the sound is voiceless (e.g. [v] vs. [f]).

Vocal Fold Vibration

Slow motion video of normal vocal folds

Articulators

"Why did Ken set the net on the soggy deck?"
Queen's University / ATR Labs X-ray Film Database
http://psyc.queensu.ca/~munhallk/05_database.htm

Vocal Organs

Recording Articulatory Data

X-Ray Microbeam Database
  Tracks the motion of small gold pellets on the tongue, jaw, lips and soft palate
Electroglottography
  Runs a high-frequency current through the glottal area of a speaker; resistance is lower when the vocal folds are closed
Electromagnetic articulography (EMMA)
  Three transmitters on a helmet allow for triangulation of 5-15 sensor positions

Classes of Sounds

Consonants and vowels
  Consonants: restricted or blocked airflow (e.g. [s])
    Voiced or unvoiced
  Vowels
    Unrestricted airflow
    Voiced
  Semivowels (approximants): [w], [y]

Consonants: Place of Articulation

What is the point of maximum air restriction?
  Labial: bilabial [b], [p]; labiodental [v], [f]
  Dental: [θ], [ð] (thief vs. them)
  Alveolar: [t], [d], [s], [z]
  Palatal: [ʃ], [tʃ] (shrimp vs. chimp)
  Velar: [k], [g]
  Glottal: [ʔ] (glottal stop)

Consonants: Place of Articulation

What is the point of maximum air restriction?
  Approximant: [w], [y]
    Two articulators come close but don't restrict the airflow much
    Somewhere between vowels and consonants
  Lateral: [l]
  Tap or flap: [ɾ], e.g. butter

Places of Articulation

http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html

[Figure labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]

Consonants: Manner of articulation

How is the airflow restricted?
  Stop (or plosive): [p], [t], [g]
    Airflow is completely blocked (closure) and then released (release)
    Glottal stop, e.g. before word-initial vowels in English after a pause: "three even"
  Nasal: [m], [ng]
    Air is released through the nose
  Fricative: [s], [z], [f]
    Air is forced through a narrow channel, leading to turbulent airflow
  Affricate: [tʃ]
    Begins as a stop, but the release is fricative
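Place, manner and voicing can be treated as a small feature vector for each consonant, which makes the categories above easy to query. The following is a minimal sketch, assuming a partial English inventory written in ARPAbet; the names Consonant, CONSONANTS, differing_features and natural_class are made up for this illustration.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    place: str    # where the airflow is most restricted
    manner: str   # how the airflow is restricted
    voiced: bool  # whether the vocal folds vibrate


# ARPAbet symbol -> features; a partial inventory for illustration only.
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
    "ng": Consonant("velar", "nasal", True),
}


def differing_features(a, b):
    """List the feature names on which two consonants differ."""
    ca, cb = CONSONANTS[a], CONSONANTS[b]
    return [f for f in Consonant._fields if getattr(ca, f) != getattr(cb, f)]


def natural_class(**features):
    """Return the symbols that match every given feature value."""
    return sorted(
        sym
        for sym, c in CONSONANTS.items()
        if all(getattr(c, name) == value for name, value in features.items())
    )


print(differing_features("f", "v"))   # differ only in voicing
print(differing_features("t", "z"))   # the consonants of "too" vs. "zoo"
print(natural_class(manner="nasal"))
print(natural_class(place="alveolar", voiced=False))
```

Representing sounds as abstract classes in this way lets a system generalize over groups such as "voiceless alveolars" instead of listing symbols one by one.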

Articulation map

PLACE OF ARTICULATION (columns) by MANNER OF ARTICULATION (rows)
VOICING: where a cell holds a pair, the first symbol is voiceless and the second is voiced.

          bilabial  labio-dental  inter-dental  alveolar  palatal  velar  glottal
stop      p b                                   t d                k g    q
fric.               f v           th dh         s z       sh zh           h
affric.                                                   ch jh
nasal     m                                     n                  ng
approx    w                                     l/r       y
flap                                            dx

Vowels

All voiced
Vowel height
  How high is the tongue? High or low?
  Where is its highest point? Front or back?
How rounded are the lips?
Monophthong [eh] vs. diphthong [ey]
  One vowel sound vs. two

American English Vowel Space

[Figure: vowel chart with axes FRONT to BACK and HIGH to LOW, showing ey, ow, aw, oy, ay, iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux]

Compare to vowel spaces in other languages:
  British English
  Indian English
  Swedish
  Spanish
  Mandarin Chinese
  Japanese

[iy] vs. [uw]: key vs. coo

(From a lecture given by Rochelle Newman)

[ae] vs. [aa]: cat vs. cot

(From a lecture given by Rochelle Newman)

Acoustic Landmarks

[Figure: acoustic landmarks for the utterance "Patricia and Patsy and Sally", with phone labels [ix], [ih], [ax], [ae], [iy], [l], [p], [t], [sh], [s]]

Coarticulation

The same phone can be produced differently depending on phonetic context.
Articulations overlap as articulators move in different timing patterns to produce consecutive sounds.
  Eight vs. eighth
    Articulation moves forward
  Met vs. men
    Vowel becomes nasalized
  "Green banana" or "greem banana"?

Articulator mistiming

"Probably" is canonically [p r aa b ax b l iy]
  [p r aa b iy]
  [p r aw l uh]
  [p r ah b iy]
  [p r aa l iy]
"Sense" is canonically [s eh n s]
  [s eh n t s]
  [s ih t s]

IPA Consonants

IPA Vowels
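Returning to the pronunciation variants listed under articulator mistiming: once pronunciations are written as phone sequences, how far a variant strays from the canonical form can be measured directly. The sketch below uses plain Levenshtein edit distance with unit costs, which is one common choice and not something the lecture prescribes; the names phone_edit_distance, CANONICAL and VARIANTS are hypothetical.

```python
def phone_edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference phones
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # delete r
                    curr[j - 1] + 1,         # insert h
                    prev[j - 1] + (r != h),  # substitute, or match at cost 0
                )
            )
        prev = curr
    return prev[-1]


# Canonical and reduced forms of "probably", in ARPAbet.
CANONICAL = "p r aa b ax b l iy".split()
VARIANTS = ["p r aa b iy", "p r aw l uh", "p r ah b iy", "p r aa l iy"]

for variant in VARIANTS:
    distance = phone_edit_distance(CANONICAL, variant.split())
    print(f"[{variant}] differs from the canonical form by {distance} edits")
```

Unit costs treat every insertion, deletion and substitution alike; a more realistic measure would charge less for swapping phones that share place or manner of articulation.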

Representations for Sounds

With ways to represent sounds (IPA, ARPAbet, etc.) we can classify and manipulate these units.
  Automatic speech recognition
  Speech synthesis
  Speech pathology
  Language ID
  Speaker ID
But how do we recognize these different sounds automatically from sound data?
  Acoustic analysis (digital signal processing)

Next Class

Overview of Spoken Dialog Systems
Readings: J&M 24.1, 24.2