![Page 1: The Evolving Quality of Telephonic Speech Richard A. Thompson Emeritus Professor Telecom Program University of Pittsburgh rthompso@pitt.edu Why VoIP's.](https://reader031.fdocuments.us/reader031/viewer/2022032805/56649ee65503460f94bf6e8f/html5/thumbnails/1.jpg)
The Evolving Quality of Telephonic Speech
Richard A. Thompson
Emeritus Professor, Telecom Program
University of Pittsburgh
Why VoIP's speech quality is disappointing,
and why it doesn't have to be.
Outline
1. Introduction
2. Human capacity for aural quality
3. History of evolving & devolving quality
4. Network integration vs app quality
5. High-fidelity Voice-over-IP
1. Introduction
• Telecom technology has benefited the human species.
– Thanks to Morse, Bell, Tesla, and Zworykin, we communicate over distance,
– but their inventions greatly reduced aural & visual quality.
• During the last century, successive technologies …
– raised many aspects of the original audio & video quality,
– but also lowered other aspects of app quality.
• Two examples of lowered quality:
1. Successive technologies reduced audio bandwidth;
2. Pixel-block “dance” after noisy or lost Internet packets.
• This talk discusses the devolution of audio quality,
– and concludes that we don’t have to live with it.
Gucci Family Slogan
“Quality is remembered … long after the price is forgotten”
$895
$1950
2. Human Capacity for Aural Quality
• Anatomy, physics, physiology, & brainware
– of human speech and hearing:
– how we discriminate phonemes & recognize speakers.
• Section Outline:
1. Review of Human Speech
2. Review of Human Hearing
3. Review of Aural Processing
Review of Human Speech
• Speech is a complex acoustic signal that humans emit & receive:
– a sequence of air compressions & rarefactions,
– traveling at about 770 mph.
• Speaking requires a complex structure:
– by modulating an exhaled air stream, we emit sequences of elementary sounds, called phonemes.
• If we partly close our larynx as we exhale,
– our “vocal cords” vibrate at a fundamental pitch, f1 = 80 to 350 Hz,
– depending on the speaker’s size, shape, gender, & age.
• Altering vocal-cord tension changes f1 to any value between half and double its regular pitch,
– for singing and for linguistic cues.
Variable Acoustic Filter
• Acoustic waveform at the larynx resembles a saw-tooth rich in harmonics.
• The mouth is a variable resonant cavity;
– it acts as a tunable acoustic filter.
• By changing our mouth’s internal shape,
– we attenuate different harmonics as they pass through.
• Our two main techniques are to:
– change our tongue position,
– switch our nasal cavity in/out using our uvula.
• Each phoneme has a different “recipe”
– of the weights of the harmonics.
[Figure: harmonic spectra of the phonemes ee, aa, and nn]
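To make the “recipe” idea concrete, here is a minimal Python sketch (not from the talk). It models the larynx output as an ideal sawtooth, whose nth harmonic has amplitude proportional to 1/n, and the mouth as a cascade of simple second-order resonances. The resonance shape, the Q of 5, the 120-Hz pitch, and the formant values used for ee and aa are illustrative assumptions, not measured vocal-tract data.

```python
import math


def sawtooth_harmonics(f1: float, max_hz: float) -> list[tuple[float, float]]:
    """(frequency, amplitude) of each harmonic of an ideal sawtooth at or below max_hz.

    The Fourier series of an ideal sawtooth has nth-harmonic amplitude proportional to 1/n.
    """
    n_max = int(max_hz // f1)
    return [(n * f1, 1.0 / n) for n in range(1, n_max + 1)]


def resonance_gain(f: float, centre: float, q: float) -> float:
    """Magnitude of a second-order resonator: 1 at DC, q at `centre`, falling off above."""
    r = f / centre
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / q) ** 2)


def phoneme_recipe(f1: float, formants: list[float], q: float = 5.0,
                   max_hz: float = 4000.0) -> list[tuple[float, float]]:
    """Harmonic weights after the sawtooth passes through the (hypothetical) mouth filter."""
    recipe = []
    for freq, amp in sawtooth_harmonics(f1, max_hz):
        gain = 1.0
        for formant in formants:
            gain *= resonance_gain(freq, formant, q)
        recipe.append((freq, amp * gain))
    return recipe


# Hypothetical formant positions, roughly those of an adult male "ee" and "aa".
for name, formants in {"ee": [270.0, 2290.0], "aa": [730.0, 1090.0]}.items():
    weights = phoneme_recipe(120.0, formants)
    strongest = sorted(weights, key=lambda fw: fw[1], reverse=True)[:3]
    print(name, "strongest harmonics (Hz):", [round(f) for f, _ in strongest])
```

Same source, different filter settings: the two vowels end up with different sets of emphasized harmonics, which is the sense in which each phoneme is a different “recipe.”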
Taxonomy of English phonemes

| Type | | Unvoiced | Voiced |
|---|---|---|---|
| Vowel-like | mouth | - | vowels, ll, rr |
| | nose | - | mm, nn, ng |
| | diphthongs | - | ow, long-i, … |
| Fricatives (sustained turbulence) | | hh | wh |
| | | ss | zz |
| | | sh | zh |
| | | ff | vv |
| Plosives (burst turbulence) | | ch | j |
| | | k | g |
| | | p | b |
| | | t | d |

• Sustained phonemes: vowels, ll, rr, nasals, and fricatives.
• Dynamic phonemes: diphthongs (which change slowly) and plosives (which change quickly).
• Last eight rows: 8 different mouth positions, with 2 phonemes per position,
– distinguished by vibrating the larynx or not.
Mouth-to-Ear Spectrum
• Runs from f1 to our hearing limit of 14 to 20 kHz,
– depending on the listener’s age, etc.
• Acoustic energy in different phonemes
– is distributed differently over the aural spectrum.
• For example, fricatives like ss
– have significant energy at the high end of the spectrum.
• Hearing accuracy is a nonlinear function
– of how much of this spectrum is actually heard.
Review of Human Hearing
• The eardrum in each ear
– is AC-coupled to the cochlea by tiny linked bones
– (the Eustachian tube maintains the static, or “DC,” pressure).
• The cochlea is a horn, coiled into a snail shell,
– filled with fluid and lined with small hairs.
• The acoustic signal
– causes standing waves inside the cochlea that excite nerves at the base of each hair.
– These nerves transmit a parallel signal to the brain, giving the weights of the signal’s harmonics.
• The cochlea & its driver (in brainware) compute*
– the Fourier-series coefficients of the received acoustic signal.
*Marks what we think happens (color-coded on the original slide).
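As an engineering analogue (not a model of the cochlea), the Fourier-series weights of one period of a periodic signal can be computed with a direct DFT over exactly that period. This minimal Python sketch uses a hypothetical helper, `harmonic_weights`, and a synthetic test signal; it is only meant to show what “the weights of the signal’s harmonics” are.

```python
import cmath
import math


def harmonic_weights(period: list[float], n_harmonics: int) -> list[float]:
    """Amplitudes of harmonics 1..n_harmonics of one sampled period of a signal.

    A DFT taken over exactly one period yields the Fourier-series coefficients;
    scaling by 2/N makes a sinusoid of amplitude A report a weight of A.
    """
    n = len(period)
    if n_harmonics >= n / 2:
        raise ValueError("need more than 2 samples per cycle of the highest harmonic")
    weights = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(period))
        weights.append(2 * abs(coeff) / n)
    return weights


# One period, 64 samples: a fundamental of amplitude 1 plus a 3rd harmonic of amplitude 0.5.
N = 64
signal = [math.sin(2 * math.pi * i / N) + 0.5 * math.sin(2 * math.pi * 3 * i / N)
          for i in range(N)]
print([round(w, 3) for w in harmonic_weights(signal, 4)])  # [1.0, 0.0, 0.5, 0.0]
```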
Hearing Brain-Ware
• Behind this driver, mid-level brainware does more processing:
1. Calculates acoustic directionality,
2. Selects the desired signal out of background noise,
3. Performs phoneme discrimination (independent of the speaker),
4. Identifies who the speaker is (independent of the phoneme).
• The last 3 tasks are supported by high-level syntactic & semantic processing,
– which, at even higher levels of brainware,
– depends on content, context, background, and emotional state.
• This paper deals only with low- and mid-level brainware,
– which performs the last two tasks on the list above.
[Figure: processing blocks labeled AD, NF, PD, SI, ED, and Ear HW]
Review of Aural Processing
• Mid-level brainware identifies speakers
– by comparing the set of weights received from the driver
– against a speaker database.
• Our accuracy at finding a best match is a nonlinear function
– of how many weights the speaker-identifier process receives from the driver.
– This number of coefficients depends on how much acoustic spectrum is heard by the cochlea & its driver.
• We discriminate phonemes more indirectly.
– The spectral envelope of most phonemes has four relative maxima, called “formant frequencies,” F1 to F4.
Formant Frequencies
• F1 and F2 peaks for ee and aa can be seen in the frequency domain.
• Generalized time-domain diagrams show F1 and F2 for 21 phoneme pairs,
– each a dynamic consonant that elides into a vowel.
[Figure: spectra of ee and aa with f1, F1, and F2 marked]
Formants for Vowels
• The spectral position of these formants, especially F1 and F2,
– is the most important cue in phoneme discrimination.
– But it’s complex, because formant positions are speaker-dependent.
• Each point is an [F1, F2] value from 76 speakers of 10 sustained phonemes.
– Clusters show the intended phoneme.
– Nearby clusters indicate potential errors without more spectrum.
• E.g., the upper-left cluster is ee.
– Low F1 & high F2 give a consistent spectrum,
– yet there is a high probability that ee is interpreted as short-i.
[Figure: [F1, F2] scatter plot, with the ee cluster labeled]
Phoneme Discrimination
• We discriminate phonemes in mid-level brainware by:
1. Computing formants from the weights received from the driver,
2. Comparing the formants against a database that works like …
• Our accuracy at finding the best match is a nonlinear function
– of how many formants the phoneme discriminator has available.
– This number of formants depends on how much of the acoustic spectrum is heard by the cochlea & driver.
• We have a mirrored set of multilevel processes in the speaker’s brainware also.
– These communicating processes translate thoughts into language,
– then into a sequence of neural signals that control our mouth parts.
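The two-step discrimination process above can be sketched as a nearest-neighbour search over (F1, F2). This is a minimal Python illustration under loud assumptions: the reference table holds approximate adult-male formant averages chosen for illustration, distance is measured on a log-frequency scale, and only F1 and F2 are used (roughly what a DS0 channel lets through). It is not a model of what brainware actually does.

```python
import math

# Hypothetical reference "database": approximate adult-male average (F1, F2) in Hz.
# Real formant positions vary widely from speaker to speaker.
VOWEL_DB: dict[str, tuple[float, float]] = {
    "ee": (270.0, 2290.0),       # as in "heed"
    "short-i": (390.0, 1990.0),  # as in "hid"
    "eh": (530.0, 1840.0),       # as in "head"
    "ae": (660.0, 1720.0),       # as in "had"
    "aa": (730.0, 1090.0),       # as in "hod"
    "oo": (300.0, 870.0),        # as in "who'd"
}


def classify_vowel(f1: float, f2: float) -> str:
    """Return the database vowel nearest to (f1, f2), measured on a log-frequency scale."""
    def distance(vowel: str) -> float:
        ref1, ref2 = VOWEL_DB[vowel]
        return math.hypot(math.log(f1 / ref1), math.log(f2 / ref2))
    return min(VOWEL_DB, key=distance)


print(classify_vowel(280, 2250))  # ee
print(classify_vowel(330, 2140))  # short-i: this speaker's "ee" lands nearer short-i
```

The second call shows the failure mode described for the [F1, F2] plot: when a speaker’s ee sits between two clusters, a matcher that sees only two formants can pick the wrong one.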
3. Technology’s Impact on Quality
• After listing the components of aural quality,
– we review successive technologies and how they raised some aspects of audio quality and lowered others.
• After discussing their effect on speaker identification and phoneme discrimination,
– we review the history of the complaint that technology should never lower any aspect of application quality.
• Section Outline:
1. Aural Quality and its Impairments
2. Identifying Phonemes and Speakers
3. The History of the Complaint
Aural Quality & its Impairments
• The quality of a natural acoustic signal is measured by its:
– intensity (loudness),
– purity (nothing else added),
– immediacy (undelayed),
– clarity (undistorted), &
– fidelity.
• By definition, fidelity measures an audio signal’s faithfulness to its acoustic analog.
– We’ll defer to the lay definition: that it implies high bandwidth.
Natural Impairments
• Natural acoustic signals suffer 5 impairments:
– loss,
– noise,
– crosstalk,
– delay, &
– echo.
This figure will grow downward on the following slides.
Pros & Cons of Analog Networks
• The role of any network is to eliminate natural loss.
– It usually replaces large acoustic delay by small signal delay,
– and may also reduce crosstalk & echo.
• Analog networks add crosstalk from the loop pair,
– and echo from impedance mismatch and leaky hybrids.
• And they add new impairments, not seen in natural signals:
– amplitude distortion from amplifiers that clip,
– band-restriction & frequency distortion from wire reactance,
– delay distortion, because frequency components have different velocities.
Fidelity in Analog Networks
• 500-sets
– cut f1 off at the low end,
– and had 12 kHz of bandpass.
– (Modern phones have no reason to provide that much BW.)
• If phones are connected in a local call,
– the loop limits end-to-end bandpass to 8-10 kHz, depending on loop length.
• In long-distance calls,
– the network further limits bandpass to 4-6 kHz, depending on distance.
• The 4-kHz analog LD channel had the poorest fidelity, but …
– the Bell System “spun” the term “toll grade” to imply high quality.
• Note: the upper limit of every BW here is given as its “3-dB frequency”;
– there is significant audio power outside these formal limits.
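How much power lies beyond a “3-dB frequency” depends on how steeply the response rolls off. As a rough illustration only, this Python sketch models a band limit as an ideal Butterworth low-pass, |H(f)|² = 1 / (1 + (f/fc)^(2n)); real loops and carrier filters are not Butterworth, and the two filter orders compared are hypothetical.

```python
import math


def butterworth_gain_db(f_hz: float, fc_hz: float, order: int) -> float:
    """Gain (dB) of an ideal Butterworth low-pass: |H|^2 = 1 / (1 + (f/fc)^(2*order))."""
    return -10.0 * math.log10(1.0 + (f_hz / fc_hz) ** (2 * order))


fc = 4000.0  # nominal 3-dB frequency of a long-distance channel
for order in (2, 8):  # a gentle and a steep roll-off (both hypothetical)
    print(order, [round(butterworth_gain_db(f, fc, order), 1) for f in (4000, 5000, 6000)])
# 2 [-3.0, -5.4, -7.8]
# 8 [-3.0, -15.6, -28.2]
```

With a gentle roll-off, a component 50% above the nominal limit is attenuated by less than 8 dB, so it is still clearly audible; a steep filter with the same 3-dB point removes it almost entirely.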
Subsequent Analog improvements
• Analog technology advanced in:
– channels (fiber),
– amplifiers,
– echo cancellers,
– shielding, &
– noise filters;
• but not in band-restriction,
– nor in the other two forms of distortion.
• The biggest improvement came from going digital.
Improved: loss, noise, crosstalk, delay, echo, & amplitude distortion.
Pros & Cons of Digital Networks
• Digitizing an audio signal greatly improves intensity.
• And a digital PSTN is virtually noise-free.
– Even loop noise (assuming the ADC is in the CO) is partially blocked, on the speaker side, by the ADC’s anti-alias filter.
• But new noise is added by:
– quantizing, companding, mu-to-A conversion, & bit errors.
• And echo is worse because digital transport is 4-wire,
– which requires many more hybrids (which can leak) in the network.
Note that a digital network is embedded inside an analog network.
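Companding is why 8-bit samples are usable at all: quiet samples get finer quantization steps than loud ones. This Python sketch uses the continuous μ-law curve with μ = 255 and a simple rounding quantizer; G.711 itself uses a segmented, piecewise-linear 8-bit approximation of this curve, so treat the sketch as illustrative rather than bit-exact.

```python
import math

MU = 255.0  # mu-law parameter (the value used in G.711 mu-law)


def mu_compress(x: float) -> float:
    """Continuous mu-law curve for |x| <= 1: sgn(x) * ln(1 + MU|x|) / ln(1 + MU)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


def quantize(v: float, levels: int = 127) -> float:
    """Round v in [-1, 1] to one of 2*levels + 1 evenly spaced values (about 8 bits)."""
    return round(v * levels) / levels


x = 0.01  # a quiet sample, 1% of full scale
uniform = quantize(x)
companded = mu_expand(quantize(mu_compress(x)))
print(f"uniform error {abs(uniform - x) / x:.1%}, mu-law error {abs(companded - x) / x:.1%}")
```

The price of that finer resolution for quiet sounds is coarser steps for loud ones, which is one source of the “new noise” listed above.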
Fidelity in Digital Networks
• By far, the worst impairment is that
– anti-aliasing filters in the A-to-D converters impair fidelity,
– so all calls are nominally as band-limited as LD analog calls.
• Fidelity is even perceptibly lower than “nominal,”
– because blocking all audio above 4 kHz
– requires a half-power point at 3.7 kHz &
– a high-end drop-off that is much steeper than in analog networks.
• So, digital calls have better SNR than analog calls,
– but a local digital call has perceptibly lower fidelity than even a long-distance analog call.
• For example:
[Figure: pass-bands of analog and digital channels, 0 to 4 kHz]
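Why must the anti-aliasing filter block everything above 4 kHz? Sampling at 8 kHz folds any higher component back into the 0 to 4 kHz band. This small Python sketch, with an illustrative helper named `alias_frequency`, computes where a tone lands after sampling.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a tone at f_hz after sampling at fs_hz (folded into 0..fs/2)."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


FS = 8000.0  # DS0 sample rate
for tone in (3000.0, 5000.0, 7500.0):
    print(tone, "->", alias_frequency(tone, FS))
# 3000.0 -> 3000.0
# 5000.0 -> 3000.0
# 7500.0 -> 500.0
```

For example, 5-kHz energy in an ss that slipped past the filter would reappear as a spurious 3-kHz component, so the filter has to trade away the top of the band instead.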
[Figure: transmission chain from the transmitted to the received acoustic signal, through transducers and successive media]

| Stage | Effect on audio quality |
|---|---|
| Transmitted acoustic signal | Has given intensity, purity, clarity, & fidelity |
| Natural medium | Impaired by natural loss, noise, crosstalk, delay, & echo |
| Analog network | Deletes natural loss & delay; reduces natural noise, crosstalk, & echo. Further impaired by analog loss, noise, crosstalk, delay, echo, band-loss, & 3 distortions |
| Digital network | Reduces analog loss, echo, noise, crosstalk, & 3 distortions. Further impaired by quantization noise, bit-error noise, & more band-loss from the anti-aliasing filter |
| Packet network | Retains all digital-network impairments. Exacerbates bit-error noise; adds more delay, which can increase echo |
| Received acoustic signal | Has poorer intensity, purity, immediacy, clarity, & fidelity |
VoIP’s Cons
• VoIP further impairs digital audio quality.
• Audio purity is further impaired because:
– speech compression exaggerates bit errors,
– lost packets cause noticeable clunks (packet loss: 0, 1%, 5%),
– silence-detecting codecs’ slow start may clip a leading plosive.
Note that a packet network is embedded inside a digital network.
VoIP & Delay
• Immediacy is greatly impaired by delays caused by:
– packetization, jitter buffers, router processing, & multi-hop packet re-transmission.
• VoIP calls often exceed
– user acceptance of conversational interaction delay.
• The user opinions below are my “compromise”
– between Bell System standards & IETF standards:

| Round-Trip Delay | Opinion |
|---|---|
| < 150 ms | good |
| 150-300 ms | noticeable |
| 300-450 ms | annoying |
| > 450 ms | unacceptable |
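A delay budget makes the table easier to apply: add up the one-way contributions, double for the round trip, and look up the opinion band. In this Python sketch every delay figure is a hypothetical placeholder, the path is assumed symmetric, and each band’s lower edge is treated as inclusive because the table’s ranges share their endpoints.

```python
def opinion(round_trip_ms: float) -> str:
    """Map a round-trip delay to the opinion bands in the table above."""
    if round_trip_ms < 150:
        return "good"
    if round_trip_ms < 300:
        return "noticeable"
    if round_trip_ms < 450:
        return "annoying"
    return "unacceptable"


# Hypothetical one-way delay budget (ms) for one VoIP path; real values vary widely.
one_way_ms = {
    "packetization": 20,
    "jitter buffer": 40,
    "codec and router processing": 15,
    "multi-hop transmission": 60,
}
rtt = 2 * sum(one_way_ms.values())  # assumes the same delay in both directions
print(rtt, opinion(rtt))  # 270 noticeable
```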
VoIP & Echo
• Acoustic echo
– is eliminated by wearing a headset.
• Electrical echo is much worse because:
1. VoIP-PSTN gateways are more problematic than D-to-A gateways,
because the echo canceller is far from the echo source (the hybrid);
2. User sensitivity to echo depends on the individual, the echo-to-signal ratio (TELR), & the one-way delay.
• Given a digital conversation’s TELR of about 55 dB,
– one-way delay must be < 200-500 ms, but it’s often > 200 ms.
– Large delay reduces the effectiveness of electronic echo cancellers.
• Summarizing, VoIP-to-POTS and, especially, VoIP-to-cell calls
– are often characterized by annoying echo.
Summarizing…
• Digitizing speech
– improves intensity & purity,
– but noticeably degrades fidelity.
– Overall, digital is perceived as “better than” analog,
– but it could be much better.
• VoIP makes no positive contribution;
– VoIP only lowers the quality.
– The last section proposes how we might change this.
Identifying Phonemes and Speakers
• “Telephone voice” impairs our ability to
– hear what a speaker says & identify who the speaker is.
• The 4-kHz DS0 channel has enough BW for F1 & F2,
– so we have little difficulty identifying vowels, ll, and rr.
• Hearing the 3rd and 4th formants would:
– slightly improve discrimination of these sounds,
– greatly improve discrimination of fricatives & plosives.
• A low F3 passes over a DS0 channel,
– but a high F3 will not, and F4 will not.
More Bandwidth → Better Phoneme Discrimination
• We need a 7-kHz channel to receive all four formants,
• & more than 7 kHz for sounds we typically struggle with:
– nasals (distinguishing mm and nn),
– plosives (distinguishing k and t),
– fricatives (distinguishing ss and ff).
• Experiment: ff was spoken to many listeners over 3 channels, and identified as:

| Channel BW | ff | th | p | other |
|---|---|---|---|---|
| 200-5000 Hz | 194 | 35 | 6 | 9 |
| 200-2500 Hz | 186 | 31 | 6 | 13 |
| 1000-5000 Hz | 162 | 28 | 12 | 50 |
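The raw counts are easier to compare as percentages, since each row has a different total. This Python sketch does only that arithmetic on the counts in the table above.

```python
# Listener responses to a spoken "ff", copied from the table above.
RESPONSES = {
    "200-5000 Hz": {"ff": 194, "th": 35, "p": 6, "other": 9},
    "200-2500 Hz": {"ff": 186, "th": 31, "p": 6, "other": 13},
    "1000-5000 Hz": {"ff": 162, "th": 28, "p": 12, "other": 50},
}


def percent_correct(counts: dict[str, int], target: str = "ff") -> float:
    """Share of responses that identified the target phoneme, in percent."""
    return 100.0 * counts[target] / sum(counts.values())


for band, counts in RESPONSES.items():
    print(f"{band}: {percent_correct(counts):.1f}% identified as ff")
# 200-5000 Hz: 79.5% identified as ff
# 200-2500 Hz: 78.8% identified as ff
# 1000-5000 Hz: 64.3% identified as ff
```

On these numbers, dropping the band below 1 kHz cost about 15 percentage points, while dropping 2.5 to 5 kHz cost less than one.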
More Bandwidth → Better Speaker Identification
• We identify speakers directly by their Fourier weights,
– not by their formant frequencies.
– Success depends on the amount of data: the number of weights received.
• Consider three population groups:
• Consistent with most people’s experience on the phone:
– men are easily recognized, women less easily,
– & we see why “all children sound the same on the phone.”
• A child could be recognized over a 12-kHz channel
– as well as an average male is over a 4-kHz channel.
– At 12 kHz, any woman would be as identifiable as any man at 4 kHz,
– and men could be almost perfectly identified.
| Type | f1 range | # harmonics < 3.7 kHz | Rank |
|---|---|---|---|
| Men | 75-150 Hz | 25-50 | most |
| Women | 140-300 Hz | 12-26 | middle |
| Children | 275-350 Hz | 10-13 | least |
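The harmonic counts in the table follow from dividing the channel’s upper limit by f1. This Python sketch counts harmonics at or below a cutoff; it reproduces the table’s figures to within one harmonic (the slide’s exact cutoff and rounding are not stated) and repeats the count for a 12-kHz channel.

```python
def harmonics_up_to(cutoff_hz: float, f1_hz: float) -> int:
    """Number of harmonics n * f1 (n >= 1) at or below cutoff_hz."""
    return int(cutoff_hz // f1_hz)


groups = {"men": (75, 150), "women": (140, 300), "children": (275, 350)}
for cutoff in (3700, 12000):
    for name, (low, high) in groups.items():
        print(cutoff, name, harmonics_up_to(cutoff, high), "to", harmonics_up_to(cutoff, low))
```

At 12 kHz a child’s voice yields 34 to 43 harmonics, while a man with an assumed typical f1 of 110 Hz yields 33 below 3.7 kHz; that is the arithmetic behind the claim that a child over a 12-kHz channel would be as recognizable as an average man over a 4-kHz one.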
Section 4 discusses how audio quality is impacted by “integrated networks.”
The History of the Complaint
• When T1 was proposed in the 1960s,
– Amos Joel objected to its 8-kHz sample rate.
• T1’s advocates stifled him by saying he was
– a dinosaur who objected to digital voice (he did not).
• Now, some VoIP advocates
– use this tactic to stifle their critics.
• 8-kHz sampling was standardized
– when bandwidth was expensive.
• Now that it isn’t,
– we’re still stuck with the DS0 channel …
– or are we?
4. Network Integration and App-Quality
• We review historical attempts at integrating networks,
– generalize how integration naturally lowers app quality,
– and ask why we have refused to learn this lesson.
• Section Outline:
1. History of Integrated Networks
2. Why Integration Lowers App Quality
3. Why Are We Blind to this Lesson?
History of Integrated Networks
• More than 35 years ago, ISDN …
– was proposed as a global end-to-end network for all data types.
– Today, it’s relegated to the network edge, as an access standard.
• ISDN’s post-mortem shows two reasons it failed:
1. ISDN needed a global digital network,
• an inexpensive user appliance/terminal,
• and a collection of integrated services, all simultaneously.
• AT&T could have done it, but focused on surviving (it didn’t).
2. We learned that the application matters.
• For bursty data, Ethernet’s stat-muxing was more efficient than ISDN circuit switching,
• especially for keystrokes on a LAN.
• And efficiency trumped integration.
The 2nd attempt
• More than 20 years ago, ATM …
– was proposed as a global end-to-end network for all data types.
– Cell relay & virtual circuits avoid congestion from large packets.
– It had limited success in the “core,” where congestion is significant,
• but failed to achieve its main goal, again for two reasons:
1. ATM’s success required that it also be cost-effective as a LAN.
• But Ethernet prevailed because of its embedded base of interface cards, LAN managers’ familiarity with it, & its evolution to higher rates.
2. We saw again that the application matters.
• ATM was compared to a duck:
“Ducks can swim, fly, and walk, but none well.
ATM carries voice, data, and video, but none well.”
The 3rd attempt
• Now, the Internet is proposed
– as a global end-to-end network
– to carry all data types.
• ISDN and ATM each failed
– in part because the application matters.
• What is different now?
Why Integration Lowers App Quality
• Let’s examine an economic explanation.
– A box represents the cost of a basic, un-optimized network.
• Consider four cases, defined by:

| App quality | Separated networks | Integrated network |
|---|---|---|
| Low | Case 1 | Case 2 |
| High | Case 3 | Case 4 |

[Figure: reference square labeled “$ basic network”]
Implementations with Low Quality
1. Separated & low: 2 apps, voice & data, with equal load.
– Boxes represent the cost of two separate networks,
• each dedicated to one app.
– App quality is barely acceptable,
• because neither network has been optimized for its app’s quality.
2. Integrated & low: 2 apps over an un-optimized integrated network.
– This box’s area is much larger than the reference square,
• because the integrated network supports twice as much load.
• But its area is less than the sum of the areas of 2 squares,
• because of economy of scale & a reduced staff of network managers.
– Since apps may interact in the integrated network,
• each app’s quality is worse than in 2 separate networks.
• This is the classic “duck.”
[Figure: cost boxes for a basic voice network, a basic data network, and a basic integrated network]
Implementations with High Quality
3. Separated & high: increasing the quality of both apps, by optimizing each network in Case 1, raises the cost of each.
– The squares become rectangles, elongated in different dimensions,
– because each network is optimized differently for its respective app.
4. Integrated & high: improve the apps’ quality in the integrated network.
– Perform the same optimizations as on the separate networks.
– So the “duck” is elongated, but in both dimensions:
– a significantly larger square than in Case 2,
– a “SWAN” (Superior-service-With-All-apps Network).
[Figure: cost boxes for a good voice network, a good data network, and an integrated network that is good for voice & data]
Integration vs App-Quality
• If we don’t care about app quality,
– Case 2 beats Case 1:
– the integrated network is slightly more economical.
• If we do care about quality, how does Case 3 compare with Case 4?
– It is unclear how the area of the SWAN compares against
– the sum of the areas of the 2 separate rectangles.
• Does the cost of optimizing an integrated network,
– so its apps have good quality,
– cancel the small savings provided by the integration?
• If not, wouldn’t IP-based voice carriers,
– like Qwest long-distance, Skype, and Vonage,
– have dominated the telephone industry by now?
[Figure: cost boxes for Cases 1 to 4: basic and good voice and data networks, a basic integrated network, and an integrated network that is good for voice & data]
Why Are We Blind to this Lesson?
• The prior analysis is admittedly weak,
– but it’s not fundamentally flawed.
• It seems clear from the analysis & the history lesson
– that network integration is a bad idea,
– assuming we don’t want to further degrade app quality.
• Alchemists, half a millennium ago,
– had a goal that is at least easy to appreciate.
• Our determination to keep trying
– to integrate networks is admirable, but puzzling.
5. What Can We Do?
• Ranting about how bad things are
– has become an all-too-familiar form of discourse.
– I want to do more than rant, and make a positive contribution.
• This section makes the transition from how-bad-it-is to how-good-it-could-be
– by discussing the market potential and proposing a solution.
• Section Outline:
1. Market Potential for High-Quality Apps
2. High-fidelity Voice-over-IP
Market Potential for High-Quality Apps
• Is there a significant market niche that cares about voice quality?
– If there is a market, it’s among people who
• appreciate music that sounds better over a hi-fi channel &
• are annoyed by, or have difficulty with, cell-phone audio quality.
– This group is older, and growing rapidly
• as the surge of baby boomers becomes older … and deafer.
• Decreasing ear bandwidth reinforces the adequacy of a 12-kHz channel.
• This is not an accurate marketing study, but it seems likely that,
– if the market isn’t yet large enough to justify products,
– it could become large enough in just a few more years.
High-fidelity Voice-over-IP
• VoIP presents the opportunity to raise voice quality,
– not just to toll grade, but even beyond.
1. A 12-kHz channel would virtually eliminate “telephone voice” &
– improve phoneme discrimination & speaker identification.
– Channel bandwidth = 3× the DS0’s equivalent bandwidth.
• Relative to the G.711 codec, 3× the anti-aliasing filter’s BW & the ADC’s sample rate.
– & the digital stream must be packetized at the speaker end,
• so it’s easily separated for a G.711 at the listener end.
• It should be easily downward compatible with G.711
• and made to work with speech-compressing codecs.
– While this proposal needs to be built & tested,
• two others have been implemented and tested at Pitt.
[Figure: analog signal → 12-kHz AAF → A-to-D converter (24 kHz) → Packetizer]
Note: The paper is incorrect.
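The cost of the 3× channel is easy to estimate. This Python sketch assumes 8 bits per sample (as in G.711), sampling at twice the anti-aliasing filter’s bandwidth (8 kHz for G.711, 24 kHz for the proposal, as in the figure), and a hypothetical 20-ms packet; header overhead is ignored here.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int = 8) -> int:
    """Bit rate of an uncompressed PCM stream, in bit/s."""
    return sample_rate_hz * bits_per_sample


def payload_bytes(sample_rate_hz: int, frame_ms: int, bits_per_sample: int = 8) -> int:
    """Payload carried per packet when frame_ms of audio is packetized."""
    return sample_rate_hz * frame_ms * bits_per_sample // (1000 * 8)


for name, rate in (("G.711 (4-kHz AAF)", 8000), ("proposed (12-kHz AAF)", 24000)):
    print(name, pcm_bitrate(rate), "bit/s,", payload_bytes(rate, 20), "bytes per 20-ms packet")
# G.711 (4-kHz AAF) 64000 bit/s, 160 bytes per 20-ms packet
# proposed (12-kHz AAF) 192000 bit/s, 480 bytes per 20-ms packet
```

At 192 kbit/s, the uncompressed stream is modest next to typical broadband access rates, which is the sense in which the 1960s bandwidth constraint no longer binds.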
Minimizing Delay
2. VoIP delay,
– & echo’s dependence on delay,
– can be reduced by optimal packetization.
• When a network is lightly loaded,
– packetization delay is reduced by generating small packets
– often, perhaps every 10 ms.
• When a network is heavily loaded,
– network queue delays are reduced by generating large packets
– less often, perhaps every 30 ms.
• We have demonstrated this,
– & the necessary signaling has been implemented in RTCP.
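The trade-off behind this policy is header overhead: small packets cut packetization delay but spend more of the link on IP, UDP, and RTP headers. This Python sketch assumes a 64-kbit/s G.711 stream, 40 bytes of IPv4 + UDP + RTP fixed headers, and a hypothetical utilization threshold of 0.7 for switching between 10-ms and 30-ms packets; how load is measured and how the change is signaled over RTCP are not modeled.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers (no options or extensions)


def choose_frame_ms(utilization: float, threshold: float = 0.7) -> int:
    """Hypothetical policy: small, frequent packets when lightly loaded, large ones otherwise."""
    return 10 if utilization < threshold else 30


def wire_bitrate(codec_bps: int, frame_ms: int) -> float:
    """Bit rate on the wire, headers included, for one direction of a call."""
    payload = codec_bps * frame_ms / 8000  # bytes per packet: bps * (ms / 1000) / 8
    packets_per_second = 1000 / frame_ms
    return (payload + HEADER_BYTES) * 8 * packets_per_second


for load in (0.3, 0.9):
    frame = choose_frame_ms(load)
    print(f"load {load:.0%}: {frame}-ms packets, {wire_bitrate(64000, frame) / 1000:.1f} kbit/s")
# load 30%: 10-ms packets, 96.0 kbit/s
# load 90%: 30-ms packets, 74.7 kbit/s
```

Under heavy load, the larger packets cut both the bit rate and the number of packets the routers’ queues must handle, at the cost of 20 ms more packetization delay.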
Maximizing Quality
3. Overall audio quality,
– as defined by the ITU, is a complicated function of
• codec type, end-to-end delay, fidelity, etc.
• If an IP phone has multiple codec types,
– we can optimize overall audio quality
• by changing codec type mid-stream,
• depending on network congestion.
– Control signaling can also use VoIP’s RTCP.
• At Pitt, we are building …
– a prototype system, which we call Ernestine,
– in which such techniques will be built & tested.
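A minimal sketch of mid-stream codec selection, using a heavily simplified form of the ITU’s E-model rating: R = 93.2 − Id − Ie, where Id is a widely used linear approximation of the delay impairment and Ie is the codec’s equipment impairment. The Ie values (0 for G.711, 11 for G.729a), the per-codec delay estimates, and the helper names are assumptions for illustration; packet loss, echo, and the other E-model terms are ignored, and this is not the Ernestine design.

```python
def delay_impairment(one_way_ms: float) -> float:
    """Linear approximation of the E-model delay impairment Id (an assumption, not G.107 itself)."""
    excess = max(0.0, one_way_ms - 177.3)
    return 0.024 * one_way_ms + 0.11 * excess


def r_factor(ie: float, one_way_ms: float) -> float:
    """Simplified rating: R = 93.2 - Id - Ie (all other impairments ignored)."""
    return 93.2 - delay_impairment(one_way_ms) - ie


def best_codec(candidates: dict[str, tuple[float, float]]) -> str:
    """Pick the codec with the highest R; values are (Ie, estimated one-way delay in ms)."""
    return max(candidates, key=lambda name: r_factor(*candidates[name]))


# Hypothetical delay estimates: under congestion the 64-kbit/s G.711 stream queues longer.
light = {"G.711": (0.0, 80.0), "G.729a": (11.0, 95.0)}
heavy = {"G.711": (0.0, 300.0), "G.729a": (11.0, 170.0)}
print(best_codec(light), best_codec(heavy))  # G.711 G.729a
```

The design point is that neither codec wins everywhere: when the network is quiet, the uncompressed codec’s better quality dominates, and when delay grows past roughly 177 ms, the delay penalty outweighs the compressing codec’s equipment impairment.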
6. Conclusion
• Technology has improved net audio quality
– over the last 100 years.
– But some aspects of audio quality, especially fidelity, have devolved.
– Yet this devolution has an ironic solution.
• VoIP’s poor audio quality is not inherent to VoIP,
– but is a function of design choices,
– some of which date back to the 1960s.
• Surprisingly, VoIP gives us the opportunity
– to provide excellent audio quality,
– if the design changes proposed here are implemented.