From Here to Utility
Melding Phonetic Insight With Speech Technology
Steven Greenberg
International Computer Science Institute
1947 Center Street, Berkeley, CA 94704
http://www.icsi.berkeley.edu/[email protected]
Acknowledgements and Thanks
Automatic Feature Classification and Analysis
• Joy Hollenback, Shawn Chang, Leah Hitchcock
Research Funding
• U.S. National Science Foundation
• U.S. Department of Defense
Road Map of the Presentation
What is Truth?
• The story of Rashomon, a film by Akira Kurosawa
• Its application to spoken language
The Varieties of Scientific Experience
• The Fundamental Duality
• The Eternal Pentangle
• The Inner Triangle
The Importance of Being Phonetically Annotated
• A Corpus-Centric Perspective on Spoken Language
• Phonetic Annotation of Spontaneous American English Discourse
Phonetic Dissection of Automatic Speech Recognition Systems
• Stress Accent and Word Error Rate
• Syllable Structure and Word Error Rate
The Relation Between Stress Accent and Vocalic Identity
• The Relation Between Segmental Duration and Vowel Height
• Durational Differences Between Stressed and Unstressed Vowels
• The Relation Between Vowel Height and Stress Accent
Spoken Language – What is Truth?
• Fundamental Questions Remain Unanswered
Part One
WHAT IS TRUTH?
The Story of Rashomon
Its Moral for the Study of Spoken Language
Rashomon – What is Truth?
It is twelfth-century Japan, and a nobleman has died ….
This we learn from a conversation between a woodcutter, a priest and a peasant under a gate in the ancient city of Kyoto ….
The woodcutter and the priest have just come from a judicial inquest into the death, and are telling the peasant what they have heard.
The woodcutter testified at the inquest, having witnessed the sequence of events resulting in the nobleman’s death.
The story begins with the capture of the notorious bandit, Tajomaru, who is the accused in the nobleman’s death ….
The nobleman and his wife had been traveling through the forest ….
When, all of a sudden, they are confronted by Tajomaru, who halts their progress ….
The nobleman and bandit go off alone into a thicket, where the former winds up being subdued by the latter.
The nobleman is tied to a tree and forced to watch as his wife is violated by the bandit.
The wife, at first, resists ….
But eventually drops the dagger and submits.
So far, all parties concerned agree (roughly) as to the course of events, but from this point on the picture becomes murky, with each participant telling a somewhat different version of the story.
In two versions (Tajomaru’s and the woodcutter’s) the wife insists that her husband and the bandit fight for her honor. The nobleman’s death results from losing the duel.
In the wife’s version, the bandit departs, with the husband still tied to the tree. The husband proceeds to taunt his wife, telling her how ashamed he is – of her!
She cuts the rope binding her husband to the tree and asks to be killed! The wife promptly faints and, when she awakens, finds the dagger in the chest of her (now very dead) husband.
In yet another version (the husband’s, told through a spirit medium) his wife betrays him and tries to convince the bandit to kill him.
However, the bandit is repulsed by this suggestion and quickly departs ….
The nobleman, still tied to the tree, picks up the dagger and plunges it into his chest, thus taking his own life.
Some time later the (now very dead) nobleman is aware of someone (it is not clear who) removing the dagger from his chest.
The film ends as the priest, woodcutter and peasant mull over the significance of the disparate accounts of the nobleman’s death, seeking some kernel of truth in the morass of ambiguity and uncertainty.
It is unclear whether ANY witness has been entirely truthful (probably not).
The story of Rashomon is cited often in philosophical discussions of “truth”.
As nothing is known (or knowable) with absolute certainty, all knowledge is relative (and hence ephemeral).
The concept of truth is a chimera and therefore unworthy of pursuit.
Yet, there is an alternative interpretation, one that questions not the concept of truth itself, but rather whether truth can be grasped from a single vantage point.
Perhaps the “true” message of Rashomon is that deep and everlasting knowledge can only be gained through exposure to a variety of perspectives, no single source providing sufficient depth and detail to comprehend a situation as complex (and as tragic) as the murder of a man.
Spoken Language – What is Truth?
Can an intellectual domain as complex as spoken language be fully understood through the testimony of a single perspective?
Or must orthogonal varieties of evidence be sought with which to reconstruct the “truth”?
How does true insight proceed from “objective” study of spoken language?
Is it possible to fully comprehend the multivocal nature of a scientific domain from the sole vantage point of a laboratory?
Or does the spirit of Rashomon compel us to seek testimony from other sources in the pursuit of objective knowledge?
Part Two
THE VARIETIES OF SCIENTIFIC EXPERIENCE
The Fundamental Duality
The Eternal Pentangle
The Inner Triangle
The Fundamental Duality
Technology and science appear to oppose each other in perspective
• Technology is concerned with what works (and can sell)
• Science is concerned with what is (and can be published)
[Figure labels: The Art of the Workable, The Art of the Sellable (technology); The Art of the Soluble, The Art of the Publishable (science)]
There is an essential “tension” between Science and Technology
• Science is often deemed “pure”
• Technology is usually perceived as “applied” (and therefore not quite as pure)
The Eternal Pentangle
Speech Research Provides an Excellent Example of the Tension between Science and Technology
• “Phonetic insight” is on the side of the angels (a.k.a. “science”)
• While “speech technology” is on the side of the apes (a.k.a. “the real world”)
[Figure labels: Phonetic Insight; The Real World]
The Inner Triangle
The Inner Triangle of the Eternal Pentangle Can Potentially Shed Light on this Philosophical (and Methodological) Conundrum
• Manual annotation provides the empirical foundation with which to train machine algorithms
• Statistical characterization of the annotated material provides the basis for structuring the machine learning regime
• Machine learning provides a method for evaluating phonetic knowledge
• Phonetic knowledge can be used to efficiently train machine algorithms
• Statistical characterization can serve as a “reality check” on phonetic knowledge
The Inner Triangle
Thus, the three apices of the Inner Triangle feed into each other and provide insight and perspective difficult to achieve from a single vantage point
• In a manner analogous to Rashomon, insight may be gained from this multidimensional perspective that deepens our knowledge of spoken language
• And thus enables the development of superior technology that truly works in the “real world”
• The development of sterling technology provides (in principle) a means to fund further basic technology-driven research
• And that, in turn, results in further technological advances
![Page 73: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/73.jpg)
The Inner TriangleThus, the three apices of the Inner Triangle feed into each other and provide insight and perspective difficult to achieve from a single vantage point
• In a manner analogous to Rashomon, insight may be gained from this multi- dimensional perspective that deepens our knowledge of spoken language• And thus enables the development of superior technology that truly works in the “real world”• The development of sterling technology provides (in principle) a means to fund further basic technology-driven research• And that, in turn, results in further technological advances• And so on
![Page 74: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/74.jpg)
The Inner Triangle
Thus, the three apices of the Inner Triangle feed into each other and provide insight and perspective difficult to achieve from a single vantage point
• In a manner analogous to Rashomon, insight may be gained from this multi-dimensional perspective that deepens our knowledge of spoken language
• And thus enables the development of superior technology that truly works in the “real world”
• The development of sterling technology provides (in principle) a means to fund further basic technology-driven research
• And that, in turn, results in further technological advances
• And so on (forever after)
![Page 75: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/75.jpg)
Part Three
THE IMPORTANCE OF BEING PHONETICALLY ANNOTATED
A Corpus-Centric Perspective on Spoken Language
Phonetic Annotation of Spontaneous American English Discourse
![Page 89: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/89.jpg)
Phonetic Annotation is Useful, Because …
Many Properties of Spontaneous Spoken Language Differ from Those of Laboratory and Citation Speech
• There are systematic patterns in “real” speech that potentially reveal underlying principles of linguistic organization
Such Corpora Provide Empirical Material for the Study of Spoken Language
• Such data provide an important basis for scientific insight and understanding
• And facilitate development of new models of spoken language
They Also Provide Training Material for Technology Applications in:
• Automatic speech recognition, particularly pronunciation models
• Speech synthesis, in pronunciation models as well as in cross-linguistic transfer of technology algorithms, etc.
They Promote Development of NOVEL Algorithms for Speech Technology
• Including pronunciation models and lexical representations for automatic speech recognition and speech synthesis, as well as multi-tier representations of spoken language
All of Which Can be Used for Gaining Further Insight into Spoken Language
![Page 94: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/94.jpg)
Corpus-Centric View of Spoken Language
Each Tier of Linguistic Organization Provides a Unique Perspective
However, integrating the annotated material across levels is tricky … And a lot of work!!
Let’s Focus on a Specific Aspect of Linguistic Organization in Order to Exemplify the Concepts Involved
In order to do so, we first consider the nature of the transcription material used
![Page 104: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/104.jpg)
Phonetic Transcription of Spontaneous English
Telephone Dialogues of 5-10 minutes duration, from the SWITCHBOARD CORPUS, have been phonetically annotated (labeled and segmented)
Most of this Material has been Manually Annotated
• 4 hours labeled at the phone level and segmented at the syllabic level
• 1 hour labeled and segmented at the phonetic-segment level
• The remaining material has been segmented at the phonetic-segment level using automatic methods
• 45 minutes of stress-accent-labeled material
• An additional four hours of material automatically labeled with respect to accent (this latter material not used in the current analysis, but will be available soon)
There is a Lot of Diversity in the Material Transcribed
• Spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality
Transcription System
• A variant of Arpabet, with phonetic diacritics such as: _gl, _cr, _fr, _n, _vl, _vd
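As a minimal sketch of how labels in such a scheme might be handled in code, the Python below splits a label into a base phone and its diacritic suffixes. It assumes that diacritics are appended to the base symbol with underscores, as the `_gl`, `_cr` notation suggests. The function name `split_label` and the example labels are illustrative and are not taken from the corpus documentation.

```python
from typing import List, Tuple

# Diacritic suffixes listed on the slide. What each one denotes is not
# spelled out there, so no glosses are attempted here.
KNOWN_DIACRITICS = {"gl", "cr", "fr", "n", "vl", "vd"}


def split_label(label: str) -> Tuple[str, List[str]]:
    """Split a label such as 't_gl' into ('t', ['gl']).

    Assumption: diacritics follow the base phone, each introduced by '_'.
    """
    base, *suffixes = label.split("_")
    unknown = [s for s in suffixes if s not in KNOWN_DIACRITICS]
    if not base or unknown:
        raise ValueError(f"unrecognized label: {label!r}")
    return base, suffixes
```

Rejecting unknown suffixes rather than passing them through is a design choice: it surfaces typos in hand-entered labels early, at the cost of needing the full diacritic inventory up front.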
![Page 113: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/113.jpg)
Phonetic Transcription of Spontaneous English
The Data are Available at …
http://www.icsi.berkeley.edu/real/stp
This Means there is Phonetically Validated Material at the Level of the:
WORD, SYLLABLE, PHONETIC SEGMENT, ARTICULATORY-ACOUSTIC FEATURE and STRESS ACCENT (as well as at the utterance level)
![Page 114: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/114.jpg)
The Eternal Pentangle (Redux)
Let’s re-examine the eternal triangle from the perspective of manual annotation for three linguistic tiers …
![Page 123: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/123.jpg)
Phonetic Transcription
How was the Labeling and Segmentation Performed?
VERY carefully …. by UC-Berkeley linguistics students
Using a display of the signal waveform, spectrogram, word transcription and “forced alignments” (automatic estimates of phones and boundaries) + audio (listening at multiple time scales - phone, word, utterance) on Sun workstations
Additionally, automatic segmentation and labeling of articulatory manner was used as a guide for phonetic labeling and segmentation in the current year
![Page 125: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/125.jpg)
Phonetic Transcription
In addition to phonetic labels and syllabic segmentation, 45 minutes of this material was labeled with respect to stress accent for each syllable
Three levels of stress were marked - FULLY Stressed, Unstressed and Intermediate Stress
![Page 129: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/129.jpg)
Phonetic Transcription
Such material can be used to perform statistical characterization of spontaneous speech as well as to train machine algorithms to label and segment additional material
In addition, the transcription material can be used to evaluate the performance of automatic speech recognition systems
Let’s first consider how this transcription can be used for ASR evaluation
We’ll focus on stress accent, but then relate this to syllable structure
![Page 130: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/130.jpg)
Part Four
PHONETIC DISSECTION OF
AUTOMATIC SPEECH RECOGNITION SYSTEMS
A Case Study
Stress Accent and Word Error Rate
Syllable Structure and Word Error Rate
In Collaboration with Shawn Chang
![Page 131: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/131.jpg)
The Eternal Pentangle (Redux)
Let’s re-examine the eternal triangle from the perspective of automatic speech recognition ….
![Page 132: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/132.jpg)
Generation of Evaluation Data - 1
A complex sequence of data formatting was required to place the speech recognition data of eight separate sites into register with the transcription material (and vice versa)
![Page 134: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/134.jpg)
Generation of Evaluation Data - 2
Let’s not sweat the details during this presentation
Interested parties may consult the relevant papers (Greenberg, Hollenback and Chang, 2000; Greenberg and Chang, 2000) at www.icsi.berkeley.edu/~steveng
![Page 135: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/135.jpg)
Generation of Evaluation Data - 3
Recognition performance was analyzed with reference to ca. 50 separate acoustic, linguistic and structural parameters
![Page 136: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/136.jpg)
Summary of Corpus Acoustic Properties
• LEXICAL PROPERTIES
– Lexical Identity
– Unigram Frequency
– Number of Syllables in Word
– Number of Phones in Word
– Word Duration
– Speaking Rate
– Prosodic Prominence
– Energy Level
– Lexical Compounds
– Non-Words
– Word Position in Utterance
• SYLLABLE PROPERTIES
– Syllable Structure
– Syllable Duration
– Syllable Energy
– Prosodic Prominence
– Prosodic Context
• PHONE PROPERTIES
– Phonetic Identity
– Phone Frequency
– Position within the Word
– Position within the Syllable
– Phone Duration
– Speaking Rate
– Phonetic Context
– Contiguous Phones Correct
– Contiguous Phones Wrong
– Phone Segmentation
– Articulatory Features
– Articulatory Feature Distance
– Phone Confusion Matrices
• OTHER PROPERTIES
– Speaker (Dialect, Gender)
– Utterance Difficulty
– Utterance Energy
– Utterance Duration
![Page 142: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/142.jpg)
What is (usually) Meant by Stress Accent?
Prosody is supposed to pertain to extra-phonetic cues in the acoustic signal
The pattern of variation over a sequence of SYLLABLES pertaining to: syllabic DURATION, AMPLITUDE and PITCH (f0) variation over time
But, the plot thickens (considerably) .… as we’ll shortly see
![Page 144: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/144.jpg)
Stress Accent and Word Error Rate
The effect of stress accent is most discernible among word-deletion errors
There is no essential relation between accent and word-substitution errors
[Figure legend: Unstressed, Intermediate Stress, Fully Stressed]
Data are averaged across all eight sites
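One way to tabulate deletion and substitution errors by stress level is to align each recognizer output with the reference word transcript and count, per stress level, how often a reference word was dropped or replaced. The sketch below is illustrative only: the function names, the word-level stress labels and the toy utterance are assumptions, and it is not the scoring procedure actually used for the eight sites.

```python
from collections import Counter


def align(ref, hyp):
    """Levenshtein-align two word lists.

    Returns a list of (op, ref_index) pairs, where op is 'ok', 'sub' or 'del'
    for reference words and 'ins' (with ref_index None) for inserted words.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            ops.append(("sub" if mismatch else "ok", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", i - 1))
            i -= 1
        else:
            ops.append(("ins", None))
            j -= 1
    return ops[::-1]


def error_rates_by_stress(ref, stress, hyp):
    """Proportion of reference words deleted or substituted, per stress level."""
    totals = Counter(stress)
    errors = Counter()
    for op, idx in align(ref, hyp):
        if op in ("sub", "del"):
            errors[(stress[idx], op)] += 1
    return {(level, op): errors[(level, op)] / totals[level]
            for level in totals for op in ("sub", "del")}


# Toy utterance with hypothetical word-level stress labels.
ref = "i went to the store and bought a book".split()
stress = ["none", "full", "none", "none", "full",
          "none", "intermediate", "none", "full"]
hyp = "i went to store and brought a book".split()
rates = error_rates_by_stress(ref, stress, hyp)
print(rates[("none", "del")], rates[("intermediate", "sub")])
```

In this toy case the only deletion falls on an unstressed word ("the"), which is the kind of pattern the slide describes; real tallies would of course be pooled over many utterances and sites.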
![Page 146: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/146.jpg)
Syllable Structure and Word Error Rate
Let’s now consider syllable structure with respect to ASR word error
There is a certain similarity with the pattern observed for stress accent ….
![Page 150: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/150.jpg)
Syllable Structure and Word Error Rate
Vowel-initial forms show the greatest error, particularly for word deletions
Polysyllabic forms manifest the lowest error, especially for word deletions
The vowel-initial forms tend to be unstressed, so ….
Perhaps the similarity in pattern is not so surprising after all
C = Consonant, V = Vowel
Data are averaged across all eight sites
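To make the C/V notation concrete, here is a minimal sketch of how a syllable's phone sequence could be reduced to a consonant-vowel pattern and how deletion outcomes might then be grouped by that pattern. The ARPAbet-style phone symbols, the vowel inventory and the function names are assumptions made for illustration; the phone set used in the actual transcription may differ.

```python
from collections import defaultdict

# Assumed ARPAbet-style vowel symbols; everything else is treated as a consonant.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
          "ey", "ih", "ix", "iy", "ow", "oy", "uh", "uw"}


def cv_pattern(phones):
    """Map a phone sequence to its C/V skeleton, e.g. ['dh', 'ax'] -> 'CV'."""
    return "".join("V" if p.lower() in VOWELS else "C" for p in phones)


def is_vowel_initial(phones):
    """True when the syllable begins with a vowel (V, VC, VCC, ...)."""
    return cv_pattern(phones).startswith("V")


def deletion_rate_by_pattern(words):
    """words: iterable of (phones, was_deleted) pairs for monosyllabic words."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [deleted, total]
    for phones, was_deleted in words:
        entry = counts[cv_pattern(phones)]
        entry[0] += int(was_deleted)
        entry[1] += 1
    return {pattern: deleted / total
            for pattern, (deleted, total) in counts.items()}


# Hypothetical outcomes for three word tokens.
tokens = [(["ae", "n", "d"], True), (["ae", "n", "d"], False), (["dh", "ax"], False)]
print(deletion_rate_by_pattern(tokens))
```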
![Page 161: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/161.jpg)
The Plot … So Far
The Proportion of Word (Deletion) Errors is Much Higher Among Unstressed Syllables (Relative to Fully and even Partially Stressed Syllables)
The Proportion of Word (Deletion) Errors is Much Higher Among Syllables that Begin with a Vowel
The exception being words composed of more than a single syllable
Polysyllabic Words Exhibit the Lowest Word Deletion Error Rate
Such words usually have at least one syllable that is highly stressed
Suggesting that deletion errors reflect the general stress pattern within the word
Syllable Structure and Stress Accent are not Salient Properties in (most) ASR Systems
As ASR systems know about phones and words, but not syllables and stress (at least in American English)
Could There Therefore be a Link Between Syllable Structure, Stress Accent and Some Other Linguistic Properties that ASR Systems “Know About”?
Let’s Find Out ….
![Page 162: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/162.jpg)
Part Five
The Relation Between Stress Accent and Vocalic Identity
Yet Another Case Study
The Relation Between Segmental Duration and Vowel Height
Durational Differences Between Stressed and Unstressed Vowels
The Relation Between Vowel Height and Stress Accent
In Collaboration with Leah Hitchcock
![Page 164: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/164.jpg)
The Eternal Pentangle (Redux)
Let’s re-examine the eternal triangle from the perspective of statistical characterization of the annotated Switchboard corpus
These data were originally collected to improve the quality of speech recognition systems, but are now being pressed into service for SCIENCE
![Page 165: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/165.jpg)
The Eternal Pentangle (Redux)
But first ….
![Page 171: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/171.jpg)
A Brief Primer on Vocalic Acoustics
Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue
• The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance
• The height parameter is closely linked to the frequency of F1
In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows:
[Figure: vowel triangle, shown on the slide]
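The two acoustic correlates named in the primer can be turned into rough plotting coordinates: F2 - F1 as a proxy for front-back position and F1 (inverted) as a proxy for height. The sketch below assumes illustrative formant values in Hz, roughly in the range often quoted for adult male speakers, and ARPAbet-style vowel labels; neither is taken from the Switchboard data.

```python
# Illustrative (assumed) formant values in Hz: vowel -> (F1, F2).
FORMANTS = {
    "iy": (270, 2290),  # high front, as in "beet"
    "ae": (660, 1720),  # low front, as in "bat"
    "aa": (730, 1090),  # low back, as in "father"
    "uw": (300, 870),   # high back, as in "boot"
}


def vowel_coordinates(f1, f2):
    """Return (frontness, height) proxies for one vowel.

    A larger F2 - F1 suggests a more front vowel; a lower F1 suggests a
    higher vowel, so height is returned as -F1.
    """
    return (f2 - f1, -f1)


def rank_by_height(formants):
    """Vowels ordered from highest (lowest F1) to lowest (highest F1)."""
    return sorted(formants, key=lambda v: vowel_coordinates(*formants[v])[1], reverse=True)


def rank_by_frontness(formants):
    """Vowels ordered from most front (largest F2 - F1) to most back."""
    return sorted(formants, key=lambda v: vowel_coordinates(*formants[v])[0], reverse=True)


print(rank_by_height(FORMANTS))
print(rank_by_frontness(FORMANTS))
```

With these assumed values the high vowels sort ahead of the low ones on the height proxy, which is the ordering the vowel triangle depicts.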
![Page 176: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/176.jpg)
Spatial Patterning of Duration et al.
Let’s return to the vowel triangle and see if it can shed light on certain patterns in the vocalic data
The duration will be plotted on a 2-D grid, where the x-axis will always be in terms of hypothetical front-back tongue position (and hence remain a constant throughout the plots to follow)
The y-axis will serve as the dependent measure expressed in terms of duration or the proportion of fully stressed (or unstressed) nuclei
![Page 181: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/181.jpg)
Vocalic Duration and Vowel Height
[Figure panels: All nuclei, Diphthongs, Monophthongs]
The spatial patterning of vocalic segments is systematic with respect to duration
Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels
Thus, duration appears to be highly correlated with vowel height
But … the situation is a little more complicated than first appearances would suggest
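The comparison on this slide (mean duration grouped by vowel height) can be sketched as below. This is only an illustration of the tabulation, not the analysis behind the slides: the record layout, the vowel labels, the height classes and every duration value are hypothetical and are not taken from the corpus.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (vowel label, height class, duration in ms).
# All values are made up for illustration; they are not corpus measurements.
NUCLEI = [
    ("ih", "high", 60.0),
    ("iy", "high", 90.0),
    ("eh", "mid", 100.0),
    ("ae", "low", 140.0),
    ("aa", "low", 160.0),
]


def mean_duration_by_height(records):
    """Return {height class: mean duration in ms} for the given records."""
    grouped = defaultdict(list)
    for _label, height, duration_ms in records:
        grouped[height].append(duration_ms)
    return {height: mean(values) for height, values in grouped.items()}


if __name__ == "__main__":
    for height, avg in mean_duration_by_height(NUCLEI).items():
        print(f"{height}: {avg:.1f} ms")
```

With real annotations, the same grouping could be run separately for diphthongs and monophthongs to mirror the three figure panels.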
![Page 184: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/184.jpg)
Durational Differences - Stressed/Unstressed
There is a large dynamic range in duration between stressed and unstressed nuclei
Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs
[Figure label: Lax monophthongs]
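One way to put a number on the "dynamic range" mentioned on this slide is the difference between the mean duration of fully stressed and of unstressed nuclei within a vocalic class. That definition is an assumption made here for illustration (a ratio would be an equally plausible choice), and the class names, stress labels and durations below are hypothetical.

```python
from statistics import mean

# Hypothetical records: (vocalic class, stress level, duration in ms).
# All values are made up for illustration; they are not corpus measurements.
SAMPLES = [
    ("diphthong", "full", 200.0),
    ("diphthong", "none", 100.0),
    ("lax_monophthong", "full", 90.0),
    ("lax_monophthong", "none", 60.0),
]


def dynamic_range(records, vocalic_class):
    """Mean fully stressed duration minus mean unstressed duration, in ms."""
    stressed = [d for c, s, d in records if c == vocalic_class and s == "full"]
    unstressed = [d for c, s, d in records if c == vocalic_class and s == "none"]
    if not stressed or not unstressed:
        raise ValueError(f"need both stress levels for class {vocalic_class!r}")
    return mean(stressed) - mean(unstressed)


if __name__ == "__main__":
    for name in ("diphthong", "lax_monophthong"):
        print(f"{name}: {dynamic_range(SAMPLES, name):.1f} ms")
```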
Vocalic Identity Among Unstressed Nuclei
The high, lax monophthongs are almost always unstressed
The low vowels, be they monophthongs or diphthongs, are rarely unstressed
The high diphthongs and high/mid, tense monophthongs occupy an intermediate position
![Page 191: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/191.jpg)
Vocalic Identity Among Fully Stressed Nuclei
The high vowels are rarely fully stressed
The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed
An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here)
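The second dependent measure used in these slides, the proportion of nuclei of a given vowel that are fully stressed (or unstressed), can be computed along the following lines. This is a minimal sketch: the token list, the vowel labels and the stress labels "full", "intermediate" and "none" are hypothetical encodings chosen for the example.

```python
from collections import Counter

# Hypothetical tokens: (vowel label, stress level).
# All values are made up for illustration; they are not corpus counts.
TOKENS = [
    ("ih", "none"), ("ih", "none"), ("ih", "intermediate"), ("ih", "none"),
    ("aa", "full"), ("aa", "full"), ("aa", "intermediate"), ("aa", "none"),
]


def stress_proportion(tokens, level):
    """Return {vowel: fraction of that vowel's tokens labelled `level`}."""
    totals = Counter(vowel for vowel, _ in tokens)
    hits = Counter(vowel for vowel, stress in tokens if stress == level)
    return {vowel: hits[vowel] / totals[vowel] for vowel in totals}


if __name__ == "__main__":
    print("fully stressed:", stress_proportion(TOKENS, "full"))
    print("unstressed:", stress_proportion(TOKENS, "none"))
```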
![Page 203: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/203.jpg)
Is It Stress? Vocalic Identity? Or What?
Duration Appears to Play An Important (but certainly not exclusive) Role in Stress Accent for Spontaneous American English Discourse
• For any given vocalic class, stressed segments are longer (on average)
• The durational disparity is most pronounced among the low vowels and the diphthongs
Low Vowels Tend to be Much Longer in Duration than High Vowels
• This is the case even for diphthongs
Low Vowels are Rarely without Some Measure of Stress Accent
• This is true for monophthongs as well as diphthongs
High Vowels are Fully Stressed Extremely Rarely
• This is particularly so for monophthongs, but also applies to diphthongs
Thus, Stress Accent Appears to Be Intricately Involved with Vocalic Identity
• This relation is likely to have an important impact on pronunciation variation
And Thus Could be Useful for Modeling Pronunciation Variation for BOTH Scientific and Technological Applications
![Page 204: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/204.jpg)
Part Six
SPOKEN LANGUAGE – WHAT IS TRUTH?
Fundamental Questions Remain Unanswered
![Page 205: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/205.jpg)
Spoken Language – What is Truth?
The Current Story Raises More Questions than it Answers …
![Page 211: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/211.jpg)
Spoken Language – What is Truth?
Is It Possible to Dissociate Vocalic Identity from Stress Accent?
Is Duration an Essential Component of Stress Accent and Vowel Height?
How Should Words (and other organizational units) be Represented in ASR Lexicons to Exploit Such Interrelations?
Can Speech Technology Afford to View Language as a Mere Concatenation of Phones and Words (or analogous units)?
Perhaps No Single Perspective Can Truly Capture the Essence of Spoken Language, or Portray It with the Depth and Clarity Required to Produce “Flawless” Technology and Enduring Scientific Insight
![Page 212: From Here to Utility Melding Phonetic Insight With Speech Technology Steven Greenberg](https://reader038.fdocuments.us/reader038/viewer/2022110212/56813bfa550346895da54612/html5/thumbnails/212.jpg)
That’s All, Folks
Many Thanks for Your Time and Attention