Lecture 2: Phonetics - Stanford...
![Page 1: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/1.jpg)
CS 224S / LINGUIST 285 Spoken Language Processing
Andrew Maas, Stanford University
Spring 2017
Lecture 2: Phonetics
Original slides by Dan Jurafsky
![Page 2: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/2.jpg)
Homework 1
• Out after lecture today. Due in 1 week.
• PDF handout linked on the website syllabus.
• You’ll need to download PRAAT; details are in the homework.
![Page 3: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/3.jpg)
Phonetics
• ARPAbet
  • An alphabet for transcribing American English phonetic sounds.
• Articulatory Phonetics
  • How speech sounds are made by articulators (moving organs) in the mouth.
• Acoustic Phonetics
  • Acoustic properties of speech sounds
![Page 4: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/4.jpg)
ARPAbet
• http://www.stanford.edu/class/cs224s/arpabet.html
• The CMU Pronouncing Dictionary
  • http://www.speech.cs.cmu.edu/cgi-bin/cmudict
• What about other languages?
  • International Phonetic Alphabet:
  • http://en.wikipedia.org/wiki/International_Phonetic_Alphabet
![Page 5: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/5.jpg)
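ARPAbet Vowels

| # | b_d | ARPA | # | b_d | ARPA |
|---|-----|------|---|-----|------|
| 1 | bead | iy | 9 | bode | ow |
| 2 | bid | ih | 10 | booed | uw |
| 3 | bayed | ey | 11 | bud | ah |
| 4 | bed | eh | 12 | bird | er |
| 5 | bad | ae | 13 | bide | ay |
| 6 | bod(y) | aa | 14 | bowed | aw |
| 7 | bawd | ao | 15 | Boyd | oy |
| 8 | Budd(hist) | uh | | | |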
https://corpus.linguistics.berkeley.edu/acip/
Note: Many speakers pronounce Buddhist with the vowel uw as in booed, so for them [uh] is instead the vowel in “put” or “book”.
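The vowel table can be kept as a small lookup for transcription exercises. The following is a minimal Python sketch, not a course tool: the names `ARPABET_VOWELS`, `vowel_of`, and `keywords_for` are illustrative, and the keywords "body" and "Buddhist" spell out the table's bod(y) and Budd(hist).

```python
# Keyword -> ARPAbet vowel, copied from the b_d table on this slide.
ARPABET_VOWELS = {
    "bead": "iy", "bid": "ih", "bayed": "ey", "bed": "eh",
    "bad": "ae", "body": "aa", "bawd": "ao", "Buddhist": "uh",
    "bode": "ow", "booed": "uw", "bud": "ah", "bird": "er",
    "bide": "ay", "bowed": "aw", "Boyd": "oy",
}


def vowel_of(keyword):
    """Return the ARPAbet vowel for one of the table's keywords."""
    return ARPABET_VOWELS[keyword]


def keywords_for(vowel):
    """Return every keyword in the table that carries the given vowel."""
    return [word for word, v in ARPABET_VOWELS.items() if v == vowel]
```

For example, `vowel_of("bird")` looks up the vowel of "bird", and `keywords_for("uw")` lists the keywords that illustrate [uw].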
![Page 6: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/6.jpg)
The Speech Chain (Denes and Pinson)
SPEAKER HEARER
![Page 7: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/7.jpg)
Speech Production Process
• Respiration:
  • We (normally) speak while breathing out. Respiration provides airflow. “Pulmonic egressive airstream”
• Phonation
  • Airstream sets vocal folds in motion. Vibration of vocal folds produces sounds. Sound is then modulated by:
• Articulation and Resonance
  • Shape of vocal tract, characterized by:
    • Oral tract
      • Teeth, soft palate (velum), hard palate
      • Tongue, lips, uvula
    • Nasal tract
Text adapted from Sharon Rose
![Page 8: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/8.jpg)
Nasal Cavity
Pharynx
Vocal Folds (within the Larynx)
Trachea
Lungs
Text copyright J. J. Ohala, Sept 2001, from Sharon Rose slide
Sagittal section of the vocal tract (Techmer 1880)
![Page 9: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/9.jpg)
From Mark Liberman’s website, from Ultimate Visual Dictionary
![Page 10: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/10.jpg)
From Mark Liberman’s Web Site, from Language Files (7th ed)
![Page 11: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/11.jpg)
Figure of Ken Stevens, labels from Peter Ladefoged’s web site
![Page 12: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/12.jpg)
USC’s SAIL Lab, Shri Narayanan
![Page 13: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/13.jpg)
![Page 14: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/14.jpg)
Tamil
![Page 15: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/15.jpg)
Larynx and Vocal Folds
• The Larynx (voice box)
  • A structure made of cartilage and muscle
  • Located above the trachea (windpipe) and below the pharynx (throat)
  • Contains the vocal folds
  • (adjective for larynx: laryngeal)
• Vocal Folds (older term: vocal cords)
  • Two bands of muscle and tissue in the larynx
  • Can be set in motion to produce sound (voicing)
Text from slides by Sharon Rose UCSD LING 111 handout
![Page 16: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/16.jpg)
The larynx, external structure, from front
Figure thnx to John Coleman!!
![Page 17: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/17.jpg)
Vertical slice through larynx, as seen from back
Figure thnx to John Coleman!!
![Page 18: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/18.jpg)
Voicing:
• Air comes up from lungs
• Forces its way through vocal cords, pushing open (2, 3, 4)
• This causes air pressure in glottis to fall, since:
  • when gas runs through constricted passage, its velocity increases (Venturi tube effect)
  • this increase in velocity results in a drop in pressure (Bernoulli principle)
• Because of drop in pressure, vocal cords snap together again (6-10)
• Single cycle: ~1/100 of a second.
Figure & text from John Coleman’s web site
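The Venturi and Bernoulli steps in the voicing cycle can be put into numbers. This is a rough sketch under strong simplifying assumptions (steady, incompressible, frictionless flow); the air density, areas, and velocities are hypothetical values chosen only for illustration, not measurements of a real glottis.

```python
AIR_DENSITY = 1.2  # kg/m^3, assumed value for air at room temperature


def velocity_in_constriction(v_wide, area_wide, area_narrow):
    """Continuity (Venturi effect): A1 * v1 = A2 * v2, so a narrower passage means faster flow."""
    return v_wide * area_wide / area_narrow


def pressure_drop(v_wide, v_narrow, density=AIR_DENSITY):
    """Bernoulli: p1 - p2 = 0.5 * rho * (v2^2 - v1^2), in pascals."""
    return 0.5 * density * (v_narrow ** 2 - v_wide ** 2)


# Hypothetical numbers: flow at 1 m/s in a wide passage that narrows to 1/20 of its area.
v_trachea = 1.0
v_glottis = velocity_in_constriction(v_trachea, area_wide=2.0, area_narrow=0.1)
drop = pressure_drop(v_trachea, v_glottis)
```

With these made-up inputs the flow speeds up twentyfold in the constriction and the pressure there falls, which is the effect the slide credits with pulling the vocal cords back together.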
![Page 19: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/19.jpg)
Voicelessness
• When vocal cords are open, air passes through unobstructed
• Voiceless sounds: p/t/k/s/f/sh/th/ch
• If the air moves very quickly, the turbulence causes a different kind of phonation: whisper
![Page 20: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/20.jpg)
Vocal folds open during breathing
From Mark Liberman’s web site, from Ultimate Visual Dictionary
![Page 21: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/21.jpg)
Vocal Fold Vibration
UCLA Phonetics Lab Demo
![Page 22: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/22.jpg)
Consonants and Vowels
• Consonants: phonetically, sounds with audible noise produced by a constriction
• Vowels: phonetically, sounds with no audible noise produced by a constriction
• (it’s more complicated than this, since we have to consider syllabic function, but this will do for now)
Text adapted from John Coleman
![Page 23: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/23.jpg)
Place of Articulation
• Consonants are classified according to the location where the airflow is most constricted.
• This is called place of articulation
• Three major kinds of place of articulation:
  • Labial (with lips)
  • Coronal (using tip or blade of tongue)
  • Dorsal (using back of tongue)
![Page 24: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/24.jpg)
Places of articulation
labial
dental
alveolar
post-alveolar/palatal
velar
uvular
pharyngeal
laryngeal/glottal
Figure thanks to Jennifer Venditti
![Page 25: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/25.jpg)
Coronal place
dental, alveolar, post-alveolar/palatal
Figure thanks to Jennifer Venditti
Dental: th/dh
Alveolar: t/d/s/z/l
Post-alveolar: sh/zh/y
![Page 26: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/26.jpg)
Dorsal Place
velar, uvular, pharyngeal
Figure thanks to Jennifer Venditti
Velar: k/g/ng
![Page 27: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/27.jpg)
Manner of Articulation
• Stop: complete closure of articulators, so no air escapes through mouth
  • Oral stop: palate is raised, no air escapes through nose. Air pressure builds up behind closure, explodes when released
    • p, t, k, b, d, g
  • Nasal stop: oral closure, but palate is lowered, air escapes through nose.
    • m, n, ng
![Page 28: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/28.jpg)
Oral vs. Nasal Sounds
Thanks to Jong-bok Kim for this figure!
![Page 29: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/29.jpg)
More on Manner of articulation of consonants
• Fricatives
  • Close approximation of two articulators, resulting in turbulent airflow between them, producing a hissing sound.
  • f, v, s, z, th, dh
• Approximant
  • Not quite-so-close approximation of two articulators, so no turbulence
  • y, r
• Lateral approximant
  • Obstruction of airstream along center of oral tract, with opening around sides of tongue.
  • l
Text from Ladefoged “A Course in Phonetics”
![Page 30: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/30.jpg)
More on manner of articulation of consonants
• Tap or flap
  • Tongue makes a single tap against the alveolar ridge
  • dx in “butter”
• Affricate
  • Stop immediately followed by a fricative
  • ch, jh
![Page 31: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/31.jpg)
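Articulatory parameters for English consonants (in ARPAbet)

| Manner \ Place | bilabial | labio-dental | inter-dental | alveolar | palatal | velar | glottal |
|---|---|---|---|---|---|---|---|
| stop | p b | | | t d | | k g | q |
| fric. | | f v | th dh | s z | sh zh | | h |
| affric. | | | | | ch jh | | |
| nasal | m | | | n | | ng | |
| approx | w | | | l/r | y | | |
| flap | | | | dx | | | |

Rows: MANNER OF ARTICULATION. Columns: PLACE OF ARTICULATION.
VOICING: in each pair, voiceless then voiced.
Table from Jennifer Venditti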
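The consonant chart is a three-way classification (place, manner, voicing), which maps naturally onto a lookup table. The sketch below is one possible encoding, not part of the original slides: the names `CONSONANTS` and `find` are illustrative, and the voicing values for the unpaired symbols (q, h, nasals, approximants, flap) are filled in from standard phonetics as an assumption, since the chart's voicing colours did not survive extraction.

```python
# symbol -> (place, manner, voiced); places and manners follow the chart's labels.
CONSONANTS = {
    "p": ("bilabial", "stop", False), "b": ("bilabial", "stop", True),
    "t": ("alveolar", "stop", False), "d": ("alveolar", "stop", True),
    "k": ("velar", "stop", False), "g": ("velar", "stop", True),
    "q": ("glottal", "stop", False),
    "f": ("labio-dental", "fricative", False), "v": ("labio-dental", "fricative", True),
    "th": ("inter-dental", "fricative", False), "dh": ("inter-dental", "fricative", True),
    "s": ("alveolar", "fricative", False), "z": ("alveolar", "fricative", True),
    "sh": ("palatal", "fricative", False), "zh": ("palatal", "fricative", True),
    "h": ("glottal", "fricative", False),
    "ch": ("palatal", "affricate", False), "jh": ("palatal", "affricate", True),
    "m": ("bilabial", "nasal", True), "n": ("alveolar", "nasal", True),
    "ng": ("velar", "nasal", True),
    "w": ("bilabial", "approximant", True), "l": ("alveolar", "approximant", True),
    "r": ("alveolar", "approximant", True), "y": ("palatal", "approximant", True),
    "dx": ("alveolar", "flap", True),
}


def find(place=None, manner=None, voiced=None):
    """Return the symbols matching every feature that is not None."""
    return [symbol for symbol, (p, m, v) in CONSONANTS.items()
            if (place is None or p == place)
            and (manner is None or m == manner)
            and (voiced is None or v == voiced)]
```

For example, `find(place="velar", manner="stop")` selects the velar stops, and `find(manner="nasal")` selects the nasal stops from the earlier slide.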
![Page 32: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/32.jpg)
Tongue position for vowels
![Page 33: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/33.jpg)
Vowels
IY AA UW
Fig. from Eric Keller
![Page 34: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/34.jpg)
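American English Vowel Space
[Figure: vowel chart with FRONT to BACK on the horizontal axis and HIGH to LOW on the vertical axis; vowels shown: iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux]
Figure from Jennifer Venditti. Red: Vowels, Blue: Diphthongs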
![Page 35: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/35.jpg)
[iy] vs. [uw]
Figure from Jennifer Venditti, from a lecture given by Rochelle Newman
![Page 36: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/36.jpg)
[ae] vs. [aa]
Figure from Jennifer Venditti, from a lecture given by Rochelle Newman
![Page 37: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/37.jpg)
Where to go for more info
• Ladefoged, Peter. 1993. A Course in Phonetics
• Mark Liberman’s site
  • http://www.ling.upenn.edu/courses/Spring_2001/ling001/phonetics.html
• John Coleman’s site
  • http://www.phon.ox.ac.uk/%7Ejcoleman/mst_mphil_phonetics_course_index.html
• Jennifer Smith’s resource page
  • http://www.unc.edu/~jlsmith/pht-url.html
![Page 38: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/38.jpg)
Sound waves are longitudinal waves
Dan Russell Figure
![Page 39: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/39.jpg)
particle displacement
pressure
Dan Russell Figure
![Page 40: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/40.jpg)
Remember High School Physics
Simple Periodic Waves (sine waves)
[Figure: sine wave, amplitude from –0.99 to 0.99, time from 0 to 0.02 s, with 1 cycle marked]
• Characterized by:
  • period: T
  • amplitude: A
  • phase: φ
• Fundamental frequency in cycles per second, or Hz
  • F0 = 1/T
To listen to sine waves: http://www.szynalski.com/tone-generator/
![Page 41: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/41.jpg)
Simple periodic waves
• Computing the frequency of a wave:
  • 5 cycles in .5 seconds = 10 cycles/second = 10 Hz
• Amplitude:
  • 1
• Equation:
  • Y = A sin(2πft)
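The sine-wave equation and the cycle count on this slide can be checked directly. The sketch below samples Y = A sin(2πft) and counts cycles as upward zero crossings; the function names and the 1000-samples-per-second rate are assumptions made for illustration.

```python
import math


def sine_wave(amplitude, freq_hz, duration_s, sample_rate=1000, phase=0.0):
    """Sample y = A * sin(2*pi*f*t + phase) at a fixed sample rate."""
    n_samples = int(round(duration_s * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate + phase)
            for i in range(n_samples)]


def count_cycles(samples):
    """Count cycles as upward zero crossings (from <= 0 to > 0)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= 0.0 < b)


# The slide's example: amplitude 1, 10 Hz, observed for 0.5 seconds.
wave = sine_wave(amplitude=1.0, freq_hz=10, duration_s=0.5)
cycles = count_cycles(wave)
frequency = cycles / 0.5  # cycles per second, i.e. Hz
```

Counting zero crossings only works for a clean single sine like this one; real speech needs the spectral methods covered later in the lecture.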
![Page 42: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/42.jpg)
Speech sound waves
• A little piece from the waveform of the vowel [iy]
• X axis: time.
• Y axis:
  • Amplitude = air pressure at that time
  • +: compression
  • 0: normal air pressure
  • -: rarefaction
![Page 43: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/43.jpg)
Back to waves: Fundamental frequency
• Waveform of the vowel [iy]
• Frequency: 10 repetitions / .03875 seconds = 258 Hz
• This is the rate at which the vocal folds vibrate, hence voicing
• Each peak corresponds to an opening of the vocal folds
• The low frequency of the complex wave is called the fundamental frequency of the wave, or F0
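The arithmetic behind the 258 Hz figure is just repetitions divided by elapsed time, with the period as its inverse. A small worked sketch (the function name is illustrative):

```python
def fundamental_frequency(repetitions, duration_s):
    """F0 in Hz: repetitions of the waveform pattern divided by elapsed seconds."""
    return repetitions / duration_s


f0_iy = fundamental_frequency(10, 0.03875)  # the [iy] waveform on this slide
period_s = 1.0 / f0_iy                      # T = 1 / F0, seconds per glottal cycle
```

Here `f0_iy` comes out just above 258 Hz, matching the slide after rounding.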
![Page 44: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/44.jpg)
She just had a baby
• Note that vowels all have regular amplitude peaks
• Stop consonant
  • Closure followed by release
  • Notice the silence followed by slight bursts of emphasis: very clear for [b] of “baby”
• Fricative: noisy. [sh] of “she” at beginning
![Page 45: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/45.jpg)
Fricative
![Page 46: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/46.jpg)
Back to freshman physics: Waves have different frequencies
[Figure: two sine waves, each with amplitude from –0.99 to 0.99 over 0 to 0.02 s: one at 100 Hz, one at 1000 Hz]
![Page 47: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/47.jpg)
Complex waves: Adding a 100 Hz and 1000 Hz wave together
[Figure: summed waveform, amplitude from –0.9654 to 0.99, time from 0 to 0.05 s]
![Page 48: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/48.jpg)
Spectrum
[Figure: amplitude (y-axis) against frequency in Hz (x-axis), with components at 100 and 1000]
Frequency components (100 and 1000 Hz) on x-axis
![Page 49: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/49.jpg)
Spectra continued
• Fourier analysis: any wave can be represented as the (infinite) sum of sine waves of different frequencies (amplitude, phase)
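Fourier analysis can be tried on the complex wave from the earlier slide, the sum of a 100 Hz and a 1000 Hz sine. The sketch below uses a naive discrete Fourier transform written out by hand so the sum over sine components stays visible; the sample rate and window length are assumptions picked so both components fall exactly on a frequency bin. Real tools use a fast Fourier transform instead.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for this sketch
N = 800             # 0.1 s of signal, so DFT bins are 10 Hz apart

# Complex wave: a 100 Hz sine plus a 1000 Hz sine, both with amplitude 1.
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
          + math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)
          for n in range(N)]


def dft_magnitudes(x):
    """Naive O(N^2) discrete Fourier transform; returns |X[k]| for k < N/2."""
    size = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size)))
            for k in range(size // 2)]


mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N
threshold = 0.5 * max(mags)
# Frequencies whose magnitude is at least half of the largest component.
peaks = [k * bin_hz for k, m in enumerate(mags) if m > threshold]
```

The list `peaks` holds the frequencies that dominate the spectrum, which is what a spectrum plot shows as spikes along the x-axis.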
![Page 50: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/50.jpg)
Spectrum of one instant in an actual soundwave: many components across frequency range
[Figure: spectrum with frequency from 0 to 5000 Hz on the x-axis; y-axis ticks at 0, 20, 40]
![Page 51: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/51.jpg)
Part of [ae] waveform from “had”
• Note complex wave repeating nine times in figure
• Plus smaller waves which repeat 4 times for every large pattern
• Large wave has frequency of 250 Hz (9 times in .036 seconds)
• Small wave roughly 4 times this, or roughly 1000 Hz
• Two little tiny waves on top of peak of 1000 Hz waves
![Page 52: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/52.jpg)
Back to spectrum
• Spectrum represents these freq components
• Computed by Fourier transform
• x-axis shows frequency, y-axis shows magnitude (in decibels)
• Peaks at 930 Hz, 1860 Hz, and 3020 Hz.
![Page 53: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/53.jpg)
Seeing formants: the spectrogram
![Page 54: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/54.jpg)
Formants
• Vowels largely distinguished by 2 characteristic pitches.
• One of them (the higher of the two) goes downward throughout the series iy ih eh ae aa ao ou u
• The other goes up for the first four vowels and then down for the next four.
• These are called "formants" of the vowels, lower is 1st formant, higher is 2nd formant.
![Page 55: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/55.jpg)
Spectrogram: spectrum + time dimension
![Page 56: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/56.jpg)
How to read spectrograms
• bab: closure of lips lowers all formants: so rapid increase in all formants at beginning of "bab"
• dad: first formant increases, but F2 and F3 slight fall
• gag: F2 and F3 come together: this is a characteristic of velars. Formant transitions take longer in velars than in alveolars or labials
From Ladefoged “A Course in Phonetics”
![Page 57: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/57.jpg)
She came back and started again
• 1. lots of high-freq energy
• 3. closure for k
• 4. burst of aspiration for k
• 5. ey vowel; faint 1100 Hz formant is nasalization
• 6. bilabial nasal
• 7. short b closure, voicing barely visible.
• 8. ae; note upward transitions after bilabial stop at beginning
• 9. note F2 and F3 coming together for "k"
From Ladefoged “A Course in Phonetics”
![Page 58: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/58.jpg)
Praat example
• http://www.fon.hum.uva.nl/praat/
![Page 59: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/59.jpg)
Different vowels have different formants
• Every time the vocal cords open and close, the pulse of air from the lungs is a sharp tap on the air in the vocal tract.
• This sets the air in the vocal cavity vibrating, producing different harmonics
![Page 60: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/60.jpg)
Vocal Fold Cycles
![Page 61: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/61.jpg)
The vocal source at 150 Hz
![Page 62: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/62.jpg)
The harmonics
![Page 63: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/63.jpg)
Source filter model of vowels
• Any body of air will vibrate in a way that depends on its size and shape.
• Vocal tract as "amplifier"; amplifies certain harmonics
• Formants are result of different shapes of vocal tract.
![Page 64: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/64.jpg)
The oral cavity amplifies some harmonics
![Page 65: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/65.jpg)
Source-filter model of speech production
Input, Filter, Output
Glottal spectrum; Vocal tract frequency response function
Figures and text from Ratree Wayland slide from his website
Source and filter are independent, so:
• Different vowels can have same pitch
• The same vowel can have different pitch
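The independence of source and filter can be illustrated with a toy model: harmonics at multiples of F0 whose amplitude falls off with harmonic number, multiplied by a filter with one resonance bump per formant. Everything numeric here is an assumption made for illustration (the 1/k source roll-off, the bump shape, the 100 Hz bandwidth, and both formant sets); it is a sketch of the idea, not a model of any real vowel.

```python
def source_harmonics(f0, max_hz=4000):
    """Glottal source: (frequency, amplitude) pairs at multiples of F0, amplitude 1/k."""
    return [(f0 * k, 1.0 / k) for k in range(1, int(max_hz // f0) + 1)]


def filter_gain(freq, formants, bandwidth=100.0):
    """Toy vocal tract response: one resonance bump centred on each formant."""
    return sum(1.0 / (1.0 + ((freq - f) / bandwidth) ** 2) for f in formants)


def output_spectrum(f0, formants):
    """Output = source amplitude times filter gain, per harmonic."""
    return {freq: amp * filter_gain(freq, formants)
            for freq, amp in source_harmonics(f0)}


def strongest_harmonic(f0, formants):
    """Frequency of the harmonic with the largest output amplitude."""
    spectrum = output_spectrum(f0, formants)
    return max(spectrum, key=spectrum.get)


# Hypothetical formant sets standing in for two different vowel shapes.
VOWEL_A = [500, 1500]
VOWEL_B = [300, 2300]
```

Changing `f0` while keeping `VOWEL_A` moves the harmonics but leaves the strongest output near the same formant (same vowel, different pitch), while swapping in `VOWEL_B` at a fixed `f0` moves the peak (different vowel, same pitch).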
![Page 66: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/66.jpg)
![Page 67: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/67.jpg)
From Mark Liberman’s Web site
![Page 68: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/68.jpg)
Resonances of the vocal tract
• The human vocal tract as an open tube
• Air in a tube of a given length will tend to vibrate at resonance frequency of tube.
[Figure: tube with closed end and open end, length 17.5 cm]
Figure from Ladefoged (1996) p 117
![Page 69: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/69.jpg)
Resonances of the vocal tract
• The human vocal tract as an open tube
• Air in a tube of a given length will tend to vibrate at resonance frequency of tube.
[Figure: tube with closed end and open end, length 17.5 cm]
Figure from W. Barry Speech Science slides
![Page 70: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/70.jpg)
Resonances of the vocal tract
• If the vocal tract is a cylindrical tube open at one end
  • Standing waves form in tubes
  • Waves will resonate if their wavelength corresponds to the dimensions of the tube
  • Constraint: pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end.
• The next slide shows what wavelengths can fit into a tube with this constraint
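The constraint above (pressure maximal at the closed end, minimal at the open end) is met when the tube length is an odd multiple of a quarter wavelength, which gives the standard closed-open tube formula F_n = (2n - 1) c / (4L). Below is a minimal Python sketch of that calculation. It assumes a speed of sound of roughly 350 m/s, which is not stated on the slides, and uses the 17.5 cm length from the figure; the function name is illustrative only.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    speed_of_sound (m/s) is an assumed value; the slides do not give one.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]


# 17.5 cm tube, as in the figure
for n, freq in enumerate(tube_resonances(0.175), start=1):
    print(f"resonance {n}: about {freq:.0f} Hz")
```

Under these assumptions the first three resonances fall near 500, 1500, and 2500 Hz. A shorter tube shifts all of them upward.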
![Page 71: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/71.jpg)
1/5/07 From Sundberg
![Page 72: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/72.jpg)
Defining Intonation
• Ladd (1996) “Intonational phonology”
• “The use of suprasegmental phonetic features
  Suprasegmental = above & beyond the segment/phone
  • F0
  • Intensity (energy)
  • Duration
• to convey sentence-level pragmatic meanings”
  • I.e. meanings that apply to phrases or utterances as a whole, not lexical stress, not lexical tone.
![Page 73: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/73.jpg)
Pitch track
![Page 74: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/74.jpg)
Pitch is not Frequency
• Pitch is the mental sensation or perceptual correlate of F0
• Relationship between pitch and F0 is not linear;
  • human pitch perception is most accurate between 100 Hz and 1000 Hz.
  • Linear in this range
  • Logarithmic above 1000 Hz
• Mel scale is one model of this F0-pitch mapping
  • A mel is a unit of pitch defined so that pairs of sounds which are perceptually equidistant in pitch are separated by an equal number of mels
  • Frequency in mels = 1127 ln(1 + f/700), with f in Hz
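The mel formula above can be tried directly. The following is a small Python sketch; the function names are illustrative, and the inverse is obtained simply by solving the same formula for f.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel, found by solving the formula for f."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


for f in (100, 500, 1000, 4000):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

With these constants 1000 Hz maps to very nearly 1000 mels, and equal steps in Hz above that point correspond to progressively smaller steps in mels.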
![Page 75: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/75.jpg)
Plot of Intensity
![Page 76: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/76.jpg)
Three aspects of prosody
• Prominence: some syllables/words are more prominent than others
• Structure/boundaries: sentences have prosodic structure
  • Some words group naturally together
  • Others have a noticeable break or disjuncture between them
• Tune: the intonational melody of an utterance.
From Ladd (1996)
![Page 77: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/77.jpg)
Prosodic Boundaries
I met Mary and Elena’s mother at the mall yesterday.
I met Mary and Elena’s mother at the mall yesterday.
French [bread and cheese]
[French bread] and [cheese]
Slide from Jennifer Venditti
![Page 78: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/78.jpg)
Intonational tunes
![Page 79: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/79.jpg)
Yes-No question tune
are LEGUMES a good source of vitamins
Rise from the main accent to the end of the sentence.
Slide from Jennifer Venditti
![Page 80: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/80.jpg)
Yes-No question tune
are legumes a GOOD source of vitamins
Rise from the main accent to the end of the sentence.
Slide from Jennifer Venditti
![Page 81: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/81.jpg)
Yes-No question tune
are legumes a good source of VITAMINS
Rise from the main accent to the end of the sentence.
Slide from Jennifer Venditti
![Page 82: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/82.jpg)
WH-questions
WHAT are a good source of vitamins
WH-questions typically have falling contours, like statements.
[I know that many natural foods are healthy, but ...]
Slide from Jennifer Venditti
![Page 83: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/83.jpg)
Broad focus
legumes are a good source of vitamins
“Tell me something about the world.”
Slide from Jennifer Venditti
In the absence of narrow focus, English tends to mark the first and last ‘content’ words with perceptually prominent accents.
![Page 84: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/84.jpg)
Rising statements
legumes are a good source of vitamins
High-rising statements can signal that the speaker is seeking approval.
“Tell me something I didn’t already know.”
[... does this statement qualify?]
Slide from Jennifer Venditti
![Page 85: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/85.jpg)
Yes-No question
are legumes a good source of VITAMINS
Rise from the main accent to the end of the sentence.
Slide from Jennifer Venditti
![Page 86: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/86.jpg)
‘Surprise-redundancy’ tune
legumes are a good source of vitamins
Low beginning followed by a gradual rise to a high at the end.
[How many times do I have to tell you ...]
Slide from Jennifer Venditti
![Page 87: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/87.jpg)
‘Contradiction’ tune
linguini isn’t a good source of vitamins
Sharp fall at the beginning, flat and low, then rising at the end.
“I’ve heard that linguini is a good source of vitamins.”
[... how could you think that?]
Slide from Jennifer Venditti
![Page 88: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/88.jpg)
Thinking about F0
![Page 89: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/89.jpg)
Graphic representation of F0
legumes are a good source of VITAMINS
[Figure: F0 (in Hertz) plotted against time]
Slide from Jennifer Venditti
![Page 90: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/90.jpg)
The ‘ripples’
legumes are a good source of VITAMINS
Marked consonants: [ t ] [ s ] [ s ]
F0 is not defined for consonants without vocal fold vibration.
Slide from Jennifer Venditti
![Page 91: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/91.jpg)
The ‘ripples’
legumes are a good source of VITAMINS
Marked consonants: [ v ] [ g ] [ g ] [ z ]
... and F0 can be perturbed by consonants with an extreme constriction in the vocal tract.
Slide from Jennifer Venditti
![Page 92: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/92.jpg)
Abstraction of the F0 contour
legumes are a good source of VITAMINS
Our perception of the intonation contour abstracts away from these perturbations.
Slide from Jennifer Venditti
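One rough way to mimic this abstraction in software is to bridge the unvoiced gaps in an F0 track and then smooth the result. The sketch below is only a signal-processing convenience, not a model of perception and not a method taken from the slides. It assumes the track is a list of per-frame F0 values in Hz with None marking frames without vocal fold vibration; the function names and sample values are made up for illustration.

```python
def interpolate_unvoiced(f0):
    """Fill unvoiced frames (None) by linear interpolation between the
    nearest voiced frames; leading/trailing gaps copy the nearest value."""
    voiced = [i for i, value in enumerate(f0) if value is not None]
    if not voiced:
        return list(f0)
    out = list(f0)
    # Extend the first and last voiced values over the edge gaps.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Interpolate linearly across interior gaps.
    for left, right in zip(voiced, voiced[1:]):
        step = (f0[right] - f0[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = f0[left] + step * (i - left)
    return out


def smooth(values, width=3):
    """Centered moving average that shrinks its window at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up F0 values in Hz; None marks frames with no voicing.
track = [200.0, 210.0, None, None, 240.0, 230.0, None, 210.0]
print(smooth(interpolate_unvoiced(track)))
```

The moving average damps short local perturbations while leaving the larger rises and falls of the contour visible.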
![Page 93: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/93.jpg)
The ‘waves’ and the ‘swells’
legumes are a good source of VITAMINS
‘wave’ = accent
‘swell’ = phrase
Slide from Jennifer Venditti
![Page 94: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/94.jpg)
Prominence: Placement of Pitch Accents
![Page 95: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/95.jpg)
Stress vs. accent
• Stress is a structural property of a word
  • it marks a potential (arbitrary) location for an accent to occur, if there is one.
• Accent is a property of a word in context
  • it is a way to mark intonational prominence in order to ‘highlight’ important words in the discourse.
(x)                      (x)         (accented syll)
 x                        x          stressed syll
 x              x         x          full vowels
 x    x    x    x    x    x    x     syllables
 vi   ta   mins Ca   li   for  nia
Slide from Jennifer Venditti
![Page 96: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/96.jpg)
Stress vs. accent (2)
• The speaker decides to make the word vitamin more prominent by accenting it.
• Lexical stress tells us that this prominence will appear on the first syllable, hence VItamin.
• So prosodic prominence is a function of
  • lexicon
  • context
• I’m a little surPRISED to hear it CHARacterized as upBEAT
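The division of labor described above (the lexicon supplies where prominence can land, the context decides whether it does) can be sketched in a few lines of Python. The mini-lexicon, its syllable splits, and the function name are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical mini-lexicon: syllables plus the index of the
# lexically stressed syllable (this is the "lexicon" part).
LEXICON = {
    "vitamin": (["vi", "ta", "min"], 0),
    "surprised": (["sur", "prised"], 1),
    "upbeat": (["up", "beat"], 1),
}


def render_word(word, accented):
    """Capitalize the stressed syllable only if the word is accented
    in this context (this is the "context" part)."""
    syllables, stressed_index = LEXICON[word]
    if not accented:
        return "".join(syllables)
    return "".join(
        syl.upper() if i == stressed_index else syl
        for i, syl in enumerate(syllables)
    )


print(render_word("vitamin", accented=True))
print(render_word("vitamin", accented=False))
```

The first call prints the word with its first syllable in capitals, as in VItamin above; the second leaves the word unmarked because the context did not accent it.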
![Page 97: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/97.jpg)
Which word receives an accent?
• It depends on the context.
  • The ‘new’ information in the answer to a question is often accented
  • while the ‘old’ information is usually not.
• Q1: What types of foods are a good source of vitamins?
• A1: LEGUMES are a good source of vitamins.
• Q2: Are legumes a source of vitamins?
• A2: Legumes are a GOOD source of vitamins.
• Q3: I’ve heard that legumes are healthy, but what are they a good source of?
• A3: Legumes are a good source of VITAMINS.
Slide from Jennifer Venditti
![Page 98: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/98.jpg)
Same ‘tune’, different alignment
LEGUMES are a good source of vitamins
The main rise-fall accent (= “I assert this”) shifts locations.
Slide from Jennifer Venditti
![Page 99: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/99.jpg)
Same ‘tune’, different alignment
Legumes are a GOOD source of vitamins
The main rise-fall accent (= “I assert this”) shifts locations.
Slide from Jennifer Venditti
![Page 100: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/100.jpg)
Same ‘tune’, different alignment
legumes are a good source of VITAMINS
The main rise-fall accent (= “I assert this”) shifts locations.
Slide from Jennifer Venditti
![Page 101: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/101.jpg)
Levels of prominence
• Most phrases have more than one accent
• The last accent in a phrase is perceived as more prominent
  • Called the Nuclear Accent
• Emphatic accents, like the nuclear accent, are often used for semantic purposes, such as indicating that a word is contrastive, or the semantic focus.
  • The kind of thing you use ***s in IM, or capitalized letters
  • ‘I know SOMETHING interesting is sure to happen,’ she said to herself.
• Can also have words that are less prominent than usual
  • Reduced words, especially function words.
• Often use 4 classes of prominence:
  • Emphatic accent, pitch accent, unaccented, reduced
![Page 102: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/102.jpg)
Intonational phrasing/boundaries
![Page 103: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/103.jpg)
A single intonation phrase
legumes are a good source of vitamins
Broad focus statement consisting of one intonation phrase (that is, one intonation tune spans the whole unit).
Slide from Jennifer Venditti
![Page 104: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/104.jpg)
Multiple phrases
legumes are a good source of vitamins
Utterances can be ‘chunked’ up into smaller phrases in order to signal the importance of information in each unit.
Slide from Jennifer Venditti
![Page 105: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/105.jpg)
Phrasing sometimes helps disambiguate
• Global ambiguity:
The old men and women stayed home.
Sally saw the man with the binoculars.
John doesn’t drink because he’s unhappy.
Slide from Jennifer Venditti
![Page 106: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/106.jpg)
Phrasing can disambiguate
• Global ambiguity:
The old men and women stayed home.
The old men % and women % stayed home.
Sally saw % the man with the binoculars.
Sally saw the man % with the binoculars.
John doesn’t drink because he’s unhappy.
John doesn’t drink % because he’s unhappy.
Slide from Jennifer Venditti
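If an utterance is annotated with % at phrase boundaries, as in the examples above, the boundary placement can be turned into explicit chunks for a downstream parser. This is a minimal Python sketch under that assumption; the function name is illustrative and the annotation format is simply the one shown on the slide.

```python
def intonation_phrases(annotated):
    """Split an utterance on '%' boundary marks into intonation phrases."""
    return [chunk.strip() for chunk in annotated.split("%") if chunk.strip()]


readings = [
    "Sally saw % the man with the binoculars.",
    "Sally saw the man % with the binoculars.",
]
for utterance in readings:
    print(intonation_phrases(utterance))
```

The two readings produce different groupings of the same words, which is the information a parser could use to choose between attaching "with the binoculars" to the man or to the seeing.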
![Page 107: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/107.jpg)
Phrasing sometimes helps disambiguate
• Temporary ambiguity:
When Madonna sings the song ...
Slide from Jennifer Venditti
![Page 108: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/108.jpg)
Phrasing sometimes helps disambiguate
• Temporary ambiguity:
When Madonna sings the song is a hit.
Slide from Jennifer Venditti
![Page 109: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/109.jpg)
Phrasing sometimes helps disambiguate
• Temporary ambiguity:
When Madonna sings % the song is a hit.
When Madonna sings the song % it’s a hit.
[from Speer & Kjelgaard (1992)]
Slide from Jennifer Venditti
![Page 110: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/110.jpg)
Phrasing sometimes helps disambiguate
I met Mary and Elena’s mother at the mall yesterday
Figure labels: Mary & Elena’s mother; mall
One intonation phrase with relatively flat overall pitch range.
Slide from Jennifer Venditti
![Page 111: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/111.jpg)
Phrasing sometimes helps disambiguate
I met Mary and Elena’s mother at the mall yesterday
Figure labels: Mary; Elena’s mother; mall
Separate phrases, with expanded pitch movements.
Slide from Jennifer Venditti
![Page 112: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/112.jpg)
Using Intonation in Spoken Language Processing
1) Prominence/Accent: Tells us about focus of utterance
2) Tune: whether utterance is question/statement, important for affect extraction
3) Boundaries: can help parsing
![Page 113: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/113.jpg)
More phonetic structure
• Syllables
  • Composed of vowels and consonants. Not well defined. Something like a “vowel nucleus with some of its surrounding consonants”.
![Page 114: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/114.jpg)
More phonetic structure
• Stress
  • Some syllables have more energy than others
  • Stressed syllables versus unstressed syllables
    • (an) ‘INsult vs. (to) in’SULT
    • (an) ‘OBject vs. (to) ob’JECT
  • Simple model: every multi-syllabic word has one syllable with:
    • “primary stress”
    • We can represent by using the number “1” on the vowel (and an implicit unmarking on the other vowels)
    • “table”: t ey1 b ax l
    • “machine”: m ax sh iy1 n
  • Also possible: “secondary stress”, marked with a “2”
    • ih2 n f ax r m ey1 sh ax n
  • Third category: reduced: schwa:
    • ax
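The stress notation above is easy to read mechanically. The Python sketch below assumes pronunciations are written as space-separated ARPAbet phones with an optional trailing 1 or 2 on vowels, as in the examples; it treats any phone starting with a, e, i, o, or u as a vowel, which is a simplification, and the function names are illustrative.

```python
def parse_stress(pron):
    """Return (phone, stress) pairs for a space-separated ARPAbet string.

    stress is 1 (primary), 2 (secondary), 0 (vowel with no mark,
    including reduced 'ax'), or None for consonants.
    """
    pairs = []
    for phone in pron.split():
        if phone[-1] in "12":
            pairs.append((phone[:-1], int(phone[-1])))
        elif phone[0] in "aeiou":  # simplification: vowel-initial = vowel
            pairs.append((phone, 0))
        else:
            pairs.append((phone, None))
    return pairs


def vowel_stresses(pron):
    """Stress values of the vowels only, in order."""
    return [s for _, s in parse_stress(pron) if s is not None]


for word, pron in [("table", "t ey1 b ax l"),
                   ("machine", "m ax sh iy1 n"),
                   ("information", "ih2 n f ax r m ey1 sh ax n")]:
    print(word, vowel_stresses(pron))
```

Each word comes out as a list with one entry per vowel, so "table" shows a primary-stressed vowel followed by an unmarked one.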
![Page 115: Lecture 2: Phonetics - Stanford Universityweb.stanford.edu/class/cs224s/lectures/224s.17.lec2.pdf · CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University](https://reader030.fdocuments.us/reader030/viewer/2022020108/5aa457347f8b9ae7438bf1b8/html5/thumbnails/115.jpg)