CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech...
Transcript of CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech...
![Page 1: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/1.jpg)
CS460/626 : Natural Language Processing/Speech, NLP and the Web
Lecture 28, 29:Phonetics, Phonology and Speech; introduce transliterationPhonetics, Phonology and Speech; introduce transliteration
Pushpak BhattacharyyaCSE Dept., IIT Bombay
28th and 29th Oct, 2012
![Page 2: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/2.jpg)
Speech and NLP
� Speech is the “original” language data
� Writing system came much later!
� Word boundary and pause can completely alter the meaning of utterancesutterances� aa jaayenge/aaj aayenge
� I got a plate/I got up late
� When it rains cats and dogs, run for cover/When it rains, cats and dogs run for cover
� Speech to Speech Machine Translation: killer application
![Page 3: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/3.jpg)
A visionText in in L1
Text in L2
Machine
Translation
TTS: Text toSpeech
Utterance in L2Utterance
in L1
ASR: AutomaticSpeech Recognition
Speech
![Page 4: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/4.jpg)
The trinity
NLPProblem
Part of SpeechTagging
Parsing
Semantics NLP
Trinity
Vision Speech
Algorithm
LanguageHindi
Marathi
English
FrenchMorphAnalysis
Statistics and Probability
+Knowledge Based
CRF
HMM
MEMM
![Page 5: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/5.jpg)
NLP Layer and speech
Parsing
Semantics Extraction
Discourse and Co reference
IncreasedComplexity OfProcessing
Morphology
POS tagging
Chunking
Parsing
All thesestagesapply tospoken
utterances
too
![Page 6: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/6.jpg)
Probabilistic Speech Recognition
� Problem Definition : Given a sequence of speech
signals, identify the words.
� 2 steps :
� Segmentation (Word Boundary Detection)� Segmentation (Word Boundary Detection)
� Identify the word
� Isolated Word Recognition :
� Identify W given SS (speech signal)
^
arg max ( | )W
W P W SS=
![Page 7: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/7.jpg)
Speech recognition: Identifying the word
^
arg max ( | )
arg max ( ) ( | )W
W
W P W SS
P W P SS W
=
=
P(SS|W) = likelihood called “phonological model “ � P(SS|W) = likelihood called “phonological model “ � intuitively more tractable!
� P(W) = prior probability called “language model”
# W appears in the corpus( )
# words in the corpusP W =
![Page 8: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/8.jpg)
Pronunciation Dictionary
t o m o
ae
t end
s4
1.0 1.0 1.0 1.01.00.73
Word Pronunciation Automaton
Tomato
� P(SS|W) is maintained in this way.
� P(t o m ae t o |Word is “tomato”) = Product of arc probabilities
t o m ot
aa
end
s1 s2 s3
s5
s6 s7
1.0 1.0 1.0 1.0
1.00.27Tomato
![Page 9: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/9.jpg)
Grapheme to phoneme mapping is not unique
� The plural morpheme:� -s:
� /s/ (cats)
/z/ (dogs)� /z/ (dogs)
� /iz/ (bushes)
� Different sounds
![Page 10: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/10.jpg)
Representing sound can be challenging (as its meaning)� Afrikaans: bromponie a motor scooter (literally, a growling or
muttering pony)
� IsiNdebele: U-Linda mind the village until the father’s return
� Setswana: bitlisisa a sore eye that has been rubbed
� Tshivenda: mmbwe a round pebble taken from a crocodile’s stomach and swallowed by a chief
mvula-tshikole rain with sunshinemvula-tshikole rain with sunshine
� Xitsonga: byatabyata to try to say something but fail for lack of words
kentenga to find oneself suddenly without some vital item (of a man whose only wife has run away, or when the roof of a hut has blown off)
(The above are African languages)
![Page 11: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/11.jpg)
CMU Pronunciation dictionary
� machine-readable pronunciation dictionary for North American English that contains over 125,000 words and that contains over 125,000 words and their transcriptions.
� The current phoneme set contains 39 phonemes
![Page 12: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/12.jpg)
“Parallel” Corpus
Phoneme Example Translation
------- ------- -----------
AA odd AA D
AE at AE T AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D
B be B IY
![Page 13: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/13.jpg)
“Parallel” Corpus cntd
Phoneme Example Translation
------- ------- -----------CH cheese CH IY Z
D dee D IY
DH thee DH IY EH Ed EH D DH thee DH IY EH Ed EH D
ER hurt HH ER T
EY ate EY T
F fee F IY
G green G R IY N
HH he HH IY
IH it IH T
IY eat IY T
JH gee JH IY
![Page 14: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/14.jpg)
A Statistical Machine Translation like task
� First obtain the Carnegie Mellon University's Pronouncing Dictionary
� Train and Test the following Statistical � Train and Test the following Statistical Machine Learning Algorithms
� HMM - For HMM we can use either Natural Language Toolkit or you can use GIZA++ with MOSES
![Page 15: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/15.jpg)
Phonetics and Phonology
� Phonetics: The study of speech sounds� Articulatory
� Acoustic
� Auditory
� Phonology: the structure and patterning of sounds
� Phonetic Transcription:� A writing system for representing speech
sounds
![Page 16: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/16.jpg)
The need for phonetic transcription
� Eccentricity of English Spelling� Put/Putt
� Car/Kite
� Rough/PuffRough/Puff
� ‘Fish’ can be spelt ‘ghoti’; (Bernard Shaw: ‘laugh’, ‘women’, ‘nation’)
� A standardized system for representing sounds in languages
� IPA (International)
� ARPABET (mainly US)
![Page 17: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/17.jpg)
IPA and ARPAbet vowels
![Page 18: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/18.jpg)
IPA and ARPAbet consonents
![Page 19: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/19.jpg)
Text Input Methods: Keyboard
English QWERTY
![Page 20: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/20.jpg)
![Page 21: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/21.jpg)
Classification
� Manner of articulation
� Place of articulation
� Voicedness� Voicedness
![Page 22: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/22.jpg)
Ancient 5 x 5 Indian Classification of ConsonantsGroup
क वग� क ख ग घ ङ Velar
च वग� च छ ज झ ञ Palatal
ट वग� ट ठ ड ढ ण Alveolar
त वग� त थ द ध न Dental
प वग� प फ ब भ म Labialप वग� प फ ब भ म Labial
![Page 23: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/23.jpg)
Stops
� /p/ - voiceless bilabial
� /b/ - voiced bilabial
� /t/ - voiceless alveolar� /t/ - voiceless alveolar
� /d/ - voiced alveolar
� /k/ - voiceless velar
� /g/ - voiced velar
![Page 24: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/24.jpg)
Fricatives
� /f/
� /v/
� /th/� /th/
� /dh/
� /s/
� /sh/
� /zh/
� /h/
![Page 25: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/25.jpg)
Affricates
� /ch/
� /jh/
![Page 26: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/26.jpg)
Nasals
� /m/
� /n/
� /ng/� /ng/
![Page 27: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/27.jpg)
![Page 28: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/28.jpg)
The plural sound
� Cats, racks … /s/
� dogs, rags … /z/
� Bushes, classes … /iz/� Bushes, classes … /iz/
Hypotheses?
![Page 29: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/29.jpg)
Place of Articulation� Labial: Two lips coming together
� [p] as in possum, [b] as in bear
� Dental: Tongue against the teeth
� [th] of thing or the [dh] of though
� Alveolar: Alveolar ridge is the portion of the roof of the mouth just behind the upper teeth; tip of the tongue against the alveolar ridge.
� Phones [s], [z], [t], and [d]
� Palatal: Roof of the mouth; blade of the tongue against this rising back of the alveolar ridge
� sounds [sh] (shrimp), [ch] (china), [zh] (Asian), and [jh] (jar)
� Velar: Movable muscular flap at the back of the roof of the mouth; back of the tongue up against the � Velar: Movable muscular flap at the back of the roof of the mouth; back of the tongue up against the velum
� sounds [k] (cuckoo), [g] (goose), and [N] (kingfisher)
� Glottal: closing the glottis (by bringing the vocal folds together)
� glottal stop [q] (IPA [P]) is made by closing the glotis (Urdu: gam: sadness)
![Page 30: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/30.jpg)
Manner of Articulation: Stops and Nasals
� All consonants are produced by restriction of airflow
� Manner of Articulation; how the restriction is produced:
� complete or partial stoppage
� A stop is a consonant in which airflow is completely blocked for a short time
� English has voiced stops like [b], [d], and [g] as well as unvoiced stops like [p], [t], and [k].
� Stops are also called plosives
Nasal sounds [n], [m], and [ng] are made by lowering the velum and allowing air � Nasal sounds [n], [m], and [ng] are made by lowering the velum and allowing air to pass into the nasal cavity
![Page 31: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/31.jpg)
Fricatives� Fricatives, airflow is constricted but not cut off completely. The turbulent airflow that
results from the constriction produces a characteristic “hissing” sound.
� The English labiodental fricatives [f] and [v] are produced by pressing the lower lip against the upper teeth, allowing a restricted airflow between the upper teeth.
� The dental fricatives [th] and [dh] allow air to flow around the tongue between the teeth.
� The alveolar fricatives [s] and [z] are produced with the tongue against the alveolar ridge, forcing air over the edge of the teeth.
� In the palato-alveolar fricatives [sh] and [zh] the tongue is at the back of the alveolar ridge forcing air through a groove formed in the tongue. ridge forcing air through a groove formed in the tongue.
![Page 32: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/32.jpg)
Affricates, Laterals/Liquids and Taps/Flaps
� Affricates are stops followed immediately by fricatives
� English [ch] (chicken); Marathi chaa (e.g., gharaachaa; of the house)
� Lateral or Liquids: tip of the tongue up against the alveolar ridge or the teeth, with one or both sides of the tongue lowered to allow air to flow over it
� [l] (learn)
� Tap or flap: quick motion of the tongue against the alveolar ridge
� [dx] (IPA [R])
The consonant in the middle of the word lotus ([l ow dx ax s]) is a tap in most dialects � The consonant in the middle of the word lotus ([l ow dx ax s]) is a tap in most dialects of American English
� speakers of many UK dialects would use a [t] instead of a tap in this word.
![Page 33: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/33.jpg)
Articulation of consonants: Larynx action/glottis state (1/2)
� Vocal cords are pulled apart. The air passes freely through the glottis. This is called the voicelessness state and sounds produced with this configuration of the vocal cords are called voiceless: p t k f θ s ʃʃʃʃ tʃʃʃʃ
� Vocal cords are pulled close together. The air passing through the glottis causes the vocal cords to vibrate. This is called the voicing state and sounds produced with this configuration of the vocal cords are called voiced: b d g v ð z ʒ dʒvoiced: b d g v ð z ʒ dʒ
![Page 34: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/34.jpg)
Articulation of consonants: Larynx action/glottis state (2/2)
� Vocal cords are apart at the back and pulled together at the front. This is called the whisper state.
� Vocal cords assume the voicing state but are relaxed. This is called the murmur state.
![Page 35: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/35.jpg)
Vowels (1/2)
![Page 36: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/36.jpg)
Vowels (2/2)
![Page 37: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/37.jpg)
Phonology: Syllables
![Page 38: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/38.jpg)
Basic of syllables
“Syllable is a unit of spoken language
consisting of a single uninterrupted soundformed generally by a Vowel and preceded orfollowed by one or more consonants.”
� Vowels are the heart of a syllable (MostSonorous Element) (svayam raajate itisvaraH)
� Consonants act as sounds attached tovowels.
![Page 39: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/39.jpg)
Syllable structure
� A syllable consists of 3 major parts:-� Onset (C)
� Nucleus (V)Nucleus (V)
� Coda (C)
� Vowels sit in the Nucleus of a syllable
� Consonants may get attached as Onset or Coda.
� Basic structure - CV
![Page 40: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/40.jpg)
Possible syllable structures� The Nucleus is always present
� Onset and Coda may be absentmay be absent
� Possible structures
� V
� CV
� VC
� CVC
![Page 41: CS460/626 : Natural Language Processing/Speech, …pb/cs626-sem1-2012/cs626...Speech and NLP Speech is the “original” language data Writing system came much later! Word boundary](https://reader036.fdocuments.us/reader036/viewer/2022062602/5e8d03e8635eb3572f7cfa22/html5/thumbnails/41.jpg)
syllable theories
� Prominence Theory
� E.g. entertaining /entәteɪnɪŋ/
� The peaks of prominence: vowels /e ә eɪɪ/
Number of syllables: 4� Number of syllables: 4
� Chest Pulse Theory
� Based on muscular activities
� Sonority Theory
� Based on relative soundness of segment within words