Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.
![Page 1: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/1.jpg)
Stein VoiceDSP 1.1
Voice DSP Processing I
Yaakov J. Stein
Chief Scientist, RAD Data Communications
![Page 2: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/2.jpg)
Stein VoiceDSP 1.2
Voice DSP
Part 1 Speech biology and what we can learn from it
Part 2 Speech DSP (AGC, VAD, features, echo cancellation)
Part 3 Speech compression techniques
Part 4 Speech recognition
![Page 3: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/3.jpg)
Stein VoiceDSP 1.3
Voice DSP - Part 1a
Speech production mechanisms
– Biology of the vocal tract
– Pitch and formants
– Sonograms
– The basic LPC model
– The cepstrum
– LPC cepstrum
– Line spectral pairs
![Page 4: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/4.jpg)
Stein VoiceDSP 1.4
Voice DSP - Part 1b
Speech perception mechanisms
Biology of the ear
Psychophysical phenomena
– Weber’s law
– Fechner’s law
– Changes
– Masking
![Page 5: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/5.jpg)
Stein VoiceDSP 1.5
Voice DSP - Part 1c
Speech quality measurement
Subjective measurement
– MOS and its variants
Objective measurement
– PSQM, PESQ
![Page 6: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/6.jpg)
Stein VoiceDSP 1.6
Voice DSP - Part 2a
Basic speech processing
Simplest processing
– AGC
– Simplistic VAD
More complex processing
– pitch tracking
– formant tracking
– U/V decision
– computing LPC and other features
![Page 7: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/7.jpg)
Stein VoiceDSP 1.7
Voice DSP - Part 2b
Echo cancellation
– Sources of echo (acoustic vs. line echo)
– Echo suppression and cancellation
– Adaptive noise cancellation
– The LMS algorithm
– Other adaptive algorithms
– The standard LEC
![Page 8: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/8.jpg)
Stein VoiceDSP 1.8
Voice DSP - Part 3
Speech compression techniques
– PCM
– ADPCM
– SBC
– VQ
– ABS-CELP
– MBE
– MELP
– STC
– Waveform Interpolation
![Page 9: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/9.jpg)
Stein VoiceDSP 1.9
Voice DSP - Part 4
Speech Recognition tasks
ASR Engine
Phonetic labeling
DTW
HMM
State-of-the-Art
![Page 10: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/10.jpg)
Stein VoiceDSP 1.10
Voice DSP - Part 1a
Speech production mechanisms
![Page 11: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/11.jpg)
Stein VoiceDSP 1.11
Speech Production Organs
[Figure: speech production organs. Labels: brain, nasal cavity, hard palate, velum, uvula, mouth cavity, tongue, teeth, lips, pharynx, larynx, esophagus, trachea, lungs]
![Page 12: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/12.jpg)
Stein VoiceDSP 1.12
Speech Production Organs - cont.
Air from the lungs is exhaled into the trachea (windpipe)
Vocal cords (folds) in the larynx can produce periodic pulses of air by opening and closing the glottis
Throat (pharynx), mouth, tongue and nasal cavity modify the air flow
Teeth and lips can introduce turbulence
Epiglottis separates esophagus (food pipe) from trachea
![Page 13: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/13.jpg)
Stein VoiceDSP 1.13
Voiced vs. Unvoiced Speech
When the vocal cords are held open, air flows unimpeded
When the laryngeal muscles stretch them, glottal flow comes in bursts
When the glottal flow is periodic, the speech is called voiced
The basic interval/frequency is called the pitch
Pitch period is usually between 2.5 and 20 milliseconds
Pitch frequency is between 50 and 400 Hz
You can feel the vibration of the larynx
Vowels are always voiced (unless whispered)
Consonants come in voiced/unvoiced pairs, for example: B/P G/K D/T V/F J/CH TH/th W/WH Z/S ZH/SH
![Page 14: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/14.jpg)
Stein VoiceDSP 1.14
Excitation spectra
Voiced speech
Pulse train is not sinusoidal - harmonic rich
Unvoiced speech
Common assumption: white noise
[Figure: excitation spectra for voiced and unvoiced speech, each plotted against frequency f]
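The contrast between the two excitation types can be checked numerically. The following is a minimal sketch, not taken from the slides: the 8 kHz sampling rate, the 100 Hz pitch, the 50 ms frame and the helper name `dft_magnitudes` are all assumptions made for illustration. It builds an ideal impulse train and a white noise burst and compares which DFT bins carry energy.

```python
import cmath
import random

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 of x, computed with the plain O(N^2) sum."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

FS = 8000                 # assumed sampling rate in Hz
PITCH_HZ = 100            # assumed pitch, inside the 50-400 Hz range
N = 400                   # assumed frame length: 50 ms
period = FS // PITCH_HZ   # 80 samples, i.e. a 10 ms pitch period

# Voiced excitation: one unit pulse per pitch period
pulse_train = [1.0 if t % period == 0 else 0.0 for t in range(N)]

# Unvoiced excitation: white Gaussian noise (seeded so the run is repeatable)
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

voiced_mag = dft_magnitudes(pulse_train)
unvoiced_mag = dft_magnitudes(noise)

bin_hz = FS / N           # 20 Hz per DFT bin
harmonic_bins = [k for k, m in enumerate(voiced_mag) if m > 1e-6]
harmonic_freqs = [k * bin_hz for k in harmonic_bins]
noisy_bins = sum(1 for m in unvoiced_mag if m > 1e-6)
```

In this idealized setting the pulse train has energy only at DC and at multiples of the pitch frequency, all of equal strength, while the noise puts energy into essentially every bin.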
![Page 15: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/15.jpg)
Stein VoiceDSP 1.15
Effect of vocal tract
Mouth and nasal cavities have resonances
Resonant frequencies depend on geometry
![Page 16: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/16.jpg)
Stein VoiceDSP 1.16
Effect of vocal tract - cont.
Sound energy at these resonant frequencies is amplified
Frequencies of peak amplification are called formants
[Figure: frequency response vs. frequency for voiced speech and unvoiced speech. Labels: F0, F1, F2, F3, F4]
![Page 17: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/17.jpg)
Stein VoiceDSP 1.17
Formant frequencies
Peterson - Barney data (note the “vowel triangle”)
![Page 18: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/18.jpg)
Stein VoiceDSP 1.18
Sonograms
![Page 19: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/19.jpg)
Stein VoiceDSP 1.19
Cylinder model(s)
Rough model of throat and mouth cavity
With nasal cavity
[Figure: two cylinder models, each driven by voice excitation. End labels: open, open, open/closed]
![Page 20: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/20.jpg)
Stein VoiceDSP 1.20
Phonemes
The smallest acoustic unit that can change meaning
Different languages have different phoneme sets
Types: (notations: phonetic, CVC, ARPABET)
– Vowels
• front (heed, hid, head, hat)
• mid (hot, heard, hut, thought)
• back (boot, book, boat)
• diphthongs (buy, boy, down, date)
– Semivowels
• liquids (w, l)
• glides (r, y)
![Page 21: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/21.jpg)
Stein VoiceDSP 1.21
Phonemes - cont.
– Consonants
• nasals (murmurs) (n, m, ng)
• stops (plosives)
– voiced (b,d,g)
– unvoiced (p, t, k)
• fricatives
– voiced (v, that, z, zh)
– unvoiced (f, think, s, sh)
• affricates (j, ch)
• whispers (h, what)
• gutturals ( ע ,ח )
• clicks, etc. etc. etc.
![Page 22: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/22.jpg)
Stein VoiceDSP 1.22
Basic LPC Model
[Figure: block diagram. Blocks: pulse generator, white noise generator, U/V switch, LPC synthesis filter]
![Page 23: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/23.jpg)
Stein VoiceDSP 1.23
Basic LPC Model - cont.
Pulse generator produces a harmonic-rich periodic impulse train (with pitch period and gain)
White noise generator produces a random signal (with gain)
U/V switch chooses between voiced and unvoiced speech
LPC filter amplifies formant frequencies (all-pole or AR IIR filter)
The output will resemble true speech to within residual error
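The model can be written down in a few lines. This is a minimal sketch under stated assumptions, not the author's implementation: the function names, the single two-pole resonator standing in for the LPC filter, and the 500 Hz / 100 Hz formant at an 8 kHz sampling rate are all hypothetical choices for illustration. A real LPC filter would typically have ten or more coefficients estimated from speech.

```python
import math
import random

def all_pole_filter(excitation, a):
    """All-pole (AR) IIR filter: y[n] = x[n] - sum_k a[k-1] * y[n-k], k = 1..p."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Coefficients of a two-pole section resonating near freq_hz (one 'formant')."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius, below 1 so the filter is stable
    theta = 2.0 * math.pi * freq_hz / fs         # pole angle
    return [-2.0 * r * math.cos(theta), r * r]

def lpc_synthesize(voiced, n_samples, pitch_period, gain, a, seed=0):
    """U/V switch: pick the pulse generator or the noise generator, then filter."""
    if voiced:
        excitation = [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        rng = random.Random(seed)
        excitation = [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return all_pole_filter(excitation, a)

FS = 8000                                    # assumed sampling rate in Hz
a = resonator_coeffs(500.0, 100.0, FS)       # hypothetical single formant at 500 Hz
voiced_out = lpc_synthesize(True, 240, 80, 1.0, a)     # 100 Hz pitch, 30 ms
unvoiced_out = lpc_synthesize(False, 240, 80, 0.1, a)  # noise-driven, same filter
```

The same filter shapes both outputs; only the excitation changes, which is exactly the role of the U/V switch in the block diagram.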
![Page 24: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/24.jpg)
Stein VoiceDSP 1.24
Cepstrum
Another way of thinking about the LPC model
The speech spectrum is obtained by multiplication:
spectrum of the (pitch) pulse train times the vocal tract (formant) frequency response
So the log of this spectrum is obtained by addition:
log spectrum of the pitch train plus log of the vocal tract frequency response
Consider this log spectrum to be the spectrum of some new signal, called the cepstrum
The cepstrum is the sum of two components: excitation plus vocal tract
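The additive property can be verified directly. The sketch below is illustrative only and rests on several assumptions: the real cepstrum is taken as the inverse DFT of the log magnitude spectrum, the convolution is circular so that the DFT product holds exactly, the pulse train decays slightly from pulse to pulse so that its spectrum has no exact zeros (the log would otherwise be undefined), and the vocal tract is a single hypothetical two-pole resonator.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(N^2) DFT; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of x."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

def circular_convolve(x, h):
    n = len(x)
    return [sum(x[j] * h[(t - j) % n] for j in range(n)) for t in range(n)]

N = 256
PERIOD = 32   # assumed pitch period in samples

# Excitation: pulse every PERIOD samples, each pulse 0.9 times the previous one
excitation = [0.9 ** (t // PERIOD) if t % PERIOD == 0 else 0.0 for t in range(N)]

# Vocal tract: impulse response of one two-pole resonator (hypothetical formant)
r, theta = 0.9, 2.0 * math.pi * 0.1
tract = [r ** t * math.sin((t + 1) * theta) / math.sin(theta) for t in range(N)]

speech = circular_convolve(excitation, tract)

c_exc = real_cepstrum(excitation)
c_tract = real_cepstrum(tract)
c_speech = real_cepstrum(speech)

# Product of spectra becomes a sum of cepstra
max_sum_error = max(abs(c_speech[n] - (c_exc[n] + c_tract[n])) for n in range(N))

# The excitation shows up as a peak at a quefrency equal to the pitch period
pitch_quefrency = max(range(PERIOD // 2, N // 2), key=lambda n: c_speech[n])
```

The vocal tract contribution sits at low quefrencies and the excitation contribution at multiples of the pitch period, which is why the two can be separated in the cepstral domain.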
![Page 25: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/25.jpg)
Stein VoiceDSP 1.25
Cepstrum - cont.
Cepstral processing has its own language
– Cepstrum (note that this is really a signal in the time domain)
– Quefrency (its units are seconds)
– Liftering (filtering)
– Alanysis
– Saphe
Several variants:
– complex cepstrum
– power cepstrum
– LPC cepstrum
![Page 26: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/26.jpg)
Stein VoiceDSP 1.26
Do we know enough?
Standard speech model (LPC) (used by most speech processing/compression/recognition systems) is a model of speech production
Unfortunately, speech production and speech perception systems are not matched
So next we’ll look at the biology of the hearing (auditory) system and some psychophysics (perception)
![Page 27: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/27.jpg)
Stein VoiceDSP 1.27
Voice DSP - Part 1b
Speech hearing & perception mechanisms
![Page 28: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/28.jpg)
Stein VoiceDSP 1.28
Hearing Organs
![Page 29: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/29.jpg)
Stein VoiceDSP 1.29
Hearing Organs - cont.
Sound waves impinge on the outer ear and enter the auditory canal
Amplified waves cause the eardrum to vibrate
Eardrum separates outer ear from middle ear
The Eustachian tube equalizes air pressure of middle ear
Ossicles (hammer, anvil, stirrup) amplify vibrations
Oval window separates middle ear from inner ear
Stirrup excites oval window which excites liquid in the cochlea
The cochlea is curled up like a snail
The basilar membrane runs along the middle of the cochlea
The organ of Corti transduces vibrations to electric pulses
Pulses are carried by the auditory nerve to the brain
![Page 30: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/30.jpg)
Stein VoiceDSP 1.30
Function of Cochlea
Cochlea has 2 1/2 to 3 turns; were it straightened out it would be 3 cm in length
The basilar membrane runs down the center of the cochlea, as does the organ of Corti
15,000 cilia (hairs) contact the vibrating basilar membrane and release neurotransmitter, stimulating 30,000 auditory neurons
Cochlea is wide (1/2 cm) near the oval window and tapers towards the apex; it is stiff near the oval window and flexible near the apex
Hence high frequencies cause the section near the oval window to vibrate, and low frequencies cause the section near the apex to vibrate
Result: an overlapping bank of filters performing frequency decomposition
![Page 31: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/31.jpg)
Stein VoiceDSP 1.31
Psychophysics - Weber’s law
Ernst Weber: professor of physiology at Leipzig in the early 1800s
Just Noticeable Difference (JND): minimal stimulus change that can be detected by the senses
Discovery: $\Delta I = K I$
Example, tactile sense: place coins in each hand. The subject could discriminate between 10 coins and 11, but not 20/21, but could 20/22!
Similarly for vision (lengths of lines), taste (saltiness), sound (frequency)
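The coin example follows from the law once a Weber fraction is fixed. The sketch below assumes K = 1/10, a value chosen only because it reproduces the three outcomes quoted on the slide; the function name is hypothetical. Exact fractions are used so that the boundary cases are not blurred by floating-point rounding.

```python
from fractions import Fraction

def is_noticeable(reference, changed, weber_fraction):
    """True if the change reaches the JND predicted by Delta I = K * I."""
    jnd = weber_fraction * reference
    return abs(changed - reference) >= jnd

K = Fraction(1, 10)   # assumed Weber fraction, picked to match the coin example

coin_pairs = [(10, 11), (20, 21), (20, 22)]
results = {pair: is_noticeable(pair[0], pair[1], K) for pair in coin_pairs}
```

With this K, one extra coin is detectable on top of 10 but not on top of 20, where the threshold has grown to two coins.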
![Page 32: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/32.jpg)
Stein VoiceDSP 1.32
Weber’s law - cont.
This makes a lot of sense
Bill Gates
![Page 33: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/33.jpg)
Stein VoiceDSP 1.33
Psychophysics - Fechner’s law
Weber’s law is not a true psychophysical law: it relates stimulus threshold to stimulus (both physical entities), not internal representation (feelings) to physical entity
Gustav Theodor Fechner: student of Weber; medicine, physics, philosophy
Simplest assumption: JND is a single internal unit
Using Weber’s law we find: $Y = A \log I + B$
Fechner Day (October 22 1850)
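The step from Weber's law to the logarithm can be spelled out as follows. This is a sketch of the usual argument, assuming that one JND corresponds to one unit of the internal quantity $Y$ and that JND steps are small enough to be treated as differentials.

```latex
\begin{align*}
\Delta Y = 1 \ \text{when}\ \Delta I = K I
  &\;\Longrightarrow\; \frac{\Delta Y}{\Delta I} = \frac{1}{K I} \\
\frac{dY}{dI} \approx \frac{1}{K I}
  &\;\Longrightarrow\; Y = \frac{1}{K}\ln I + B = A \log I + B,
  \qquad A = \frac{1}{K \log e}
\end{align*}
```

Here $B$ is the constant of integration and the value of $A$ depends only on the Weber fraction $K$ and on the base chosen for the logarithm.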
![Page 34: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/34.jpg)
Stein VoiceDSP 1.34
Fechner’s law - cont.
Log is very compressive
Fechner’s law explains the fantastic ranges of our senses
Sight: single photon - direct sunlight $10^{15}$
Hearing: eardrum moves by 1 H atom - jet plane $10^{12}$
Bel defined to be $\log_{10}$ of power ratio
decibel (dB): one tenth of a Bel
$d(\mathrm{dB}) = 10 \log_{10} (P_1 / P_2)$
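The decibel formula turns the ranges quoted above into manageable numbers. A minimal sketch, with hypothetical function names:

```python
import math

def power_ratio_to_db(p1, p2):
    """d(dB) = 10 * log10(P1 / P2)."""
    return 10.0 * math.log10(p1 / p2)

def db_to_power_ratio(db):
    """Inverse mapping: the power ratio that corresponds to a dB value."""
    return 10.0 ** (db / 10.0)

hearing_range_db = power_ratio_to_db(1e12, 1.0)   # the 10^12 hearing range
sight_range_db = power_ratio_to_db(1e15, 1.0)     # the 10^15 sight range
doubling_db = power_ratio_to_db(2.0, 1.0)         # doubling the power
```

A power range of twelve orders of magnitude collapses to about 120 dB, and a doubling of power is only about 3 dB.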
![Page 35: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/35.jpg)
Stein VoiceDSP 1.35
Fechner’s law - sound amplitudes
Companding: adaptation of logarithm to positive/negative signals
μ-law and A-law are piecewise linear approximations
Equivalent to linear sampling at 12-14 bits
(8 bit linear sampling is significantly more noisy)
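The benefit of companding for quiet signals can be illustrated with the continuous μ-law curve. This sketch makes several assumptions: it uses the smooth formula with μ = 255 rather than the piecewise linear segments of the telephony standards, the quantizer is a simple rounding quantizer on [-1, 1], and the test amplitudes are an arbitrary set of small values.

```python
import math

MU = 255.0   # assumed compression parameter

def mu_compress(x):
    """Continuous mu-law curve: sign(x) * ln(1 + mu*|x|) / ln(1 + mu), for |x| <= 1."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(v, bits=8):
    """Uniform rounding quantizer on [-1, 1] with step 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return max(-1.0, min(1.0, round(v / step) * step))

def rms_error(samples, codec):
    return math.sqrt(sum((codec(x) - x) ** 2 for x in samples) / len(samples))

quiet = [k / 1000.0 for k in range(1, 51)]   # amplitudes from 0.001 to 0.05

linear_err = rms_error(quiet, quantize)
companded_err = rms_error(quiet, lambda x: mu_expand(quantize(mu_compress(x))))
```

For these small amplitudes the companded 8-bit path has a clearly smaller RMS error than plain 8-bit linear quantization, because the logarithmic curve spends more of its levels near zero.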
![Page 36: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/36.jpg)
Stein VoiceDSP 1.36
Fechner’s law - sound frequencies
octaves, well tempered scale: semitone ratio $\sqrt[12]{2}$
Critical bands
Frequency warping
Mel (from melody): 1 kHz = 1000 mel, JND afterwards: $M \approx 1000 \log_2 (1 + f_{\mathrm{kHz}})$
Bark (from Barkhausen): tones that can be simultaneously heard excite different basilar membrane regions: $B \approx 25 + 75 \, (1 + 1.4 f_{\mathrm{kHz}}^2)^{0.69}$
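The two warping formulas are easy to evaluate. A minimal sketch with hypothetical function names; the second expression is treated here as a critical bandwidth in Hz, which is how this formula is usually quoted, but that reading is an assumption about the slide.

```python
import math

def mel(f_khz):
    """M = 1000 * log2(1 + f), with f in kHz, so that 1 kHz maps to 1000 mel."""
    return 1000.0 * math.log2(1.0 + f_khz)

def critical_bandwidth(f_khz):
    """B = 25 + 75 * (1 + 1.4 f^2)^0.69, with f in kHz (assumed to give Hz)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

SEMITONE = 2.0 ** (1.0 / 12.0)   # well tempered semitone ratio, twelfth root of 2

mel_at_1khz = mel(1.0)
mel_at_3khz = mel(3.0)
bandwidth_at_0 = critical_bandwidth(0.0)
bandwidth_at_1khz = critical_bandwidth(1.0)
```

Tripling the frequency from 1 kHz to 3 kHz only doubles the mel value, and the critical bandwidth starts at 100 Hz and widens as frequency increases.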
![Page 37: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/37.jpg)
Stein VoiceDSP 1.37
Psychophysics - changes
Our senses respond to changes
[Figure: block diagram. Labels: Inverse, E, Filter]
![Page 38: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/38.jpg)
Stein VoiceDSP 1.38
Psychophysics - masking
Masking: strong tones block weaker ones at nearby frequencies
narrowband noise blocks tones (up to critical band)
[Figure: masking illustrated along the frequency axis f]
![Page 39: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/39.jpg)
Stein VoiceDSP 1.39
Voice DSP - Part 1c
Speech Quality Measurement
![Page 40: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/40.jpg)
Stein VoiceDSP 1.40
Why does it sound the way it sounds?
PSTN
– BW = 0.2-3.8 kHz, SNR > 30 dB
– PCM, ADPCM (BER $10^{-3}$)
– five nines reliability
– line echo cancellation
Voice over packet network
– speech compression
– delay, delay variation, jitter
– packet loss/corruption/priority
– echo cancellation
![Page 41: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/41.jpg)
Stein VoiceDSP 1.41
Subjective Voice Quality
Old Measures
– 5/9
– DRT
– DAM
The modern scale
– MOS
– DMOS
meet neat seat feet Pete beat heat
![Page 42: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/42.jpg)
Stein VoiceDSP 1.42
MOS according to ITU
P.800 Subjective Determination of Transmission Quality
Annex B: Absolute Category Rating (ACR)

| Score | Listening Quality | Listening Effort |
|---|---|---|
| 5 | excellent | relaxed |
| 4 | good | attention needed |
| 3 | fair | moderate effort |
| 2 | poor | considerable effort |
| 1 | bad | no meaning with feasible effort |
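A MOS (mean opinion score) is the average of such category ratings over a listener panel. The sketch below is only an illustration: the function name is hypothetical, the votes are made-up example data, and a real P.800 test adds listener selection, test material and statistical analysis that are not modelled here.

```python
from statistics import mean

# Listening quality labels of the ACR scale
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}

def mean_opinion_score(ratings):
    """Average of ACR votes; rejects anything that is not on the 1..5 scale."""
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"rating {r!r} is not on the ACR scale")
    return mean(ratings)

votes = [4, 4, 3, 5, 4, 3, 4, 4]   # hypothetical votes from eight listeners
mos = mean_opinion_score(votes)
nearest_label = ACR_LABELS[round(mos)]
```

Because the result is an average, a MOS is usually fractional even though each listener votes on an integer scale.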
![Page 43: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/43.jpg)
Stein VoiceDSP 1.43
MOS according to ITU (cont)
Annex D: Degradation Category Rating (DCR)
Annex E: Comparison Category Rating (CCR)
ACR not good at high quality speech

| Score | DCR | CCR |
|---|---|---|
| 5 | inaudible | |
| 4 | not annoying | |
| 3 | slightly annoying | much better |
| 2 | annoying | better |
| 1 | very annoying | slightly better |
| 0 | | the same |
| -1 | | slightly worse |
| -2 | | worse |
| -3 | | much worse |
![Page 44: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/44.jpg)
Stein VoiceDSP 1.44
Some MOS numbers
Effect of Speech Compression:
(from ITU-T Study Group 15)

| Condition | MOS |
|---|---|
| Quiet room, 48 kHz 16 bit linear sampling | 5.0 |
| PCM (A-law/μ-law) 64 Kb/s | 4.1 |
| G.723.1 @ 6.3 Kb/s | 3.9 |
| G.729 @ 8 Kb/s | 3.9 |
| ADPCM G.726 32 Kb/s | 3.8 (toll quality) |
| GSM @ 13 Kb/s | 3.6 |
| VSELP IS54 @ 8 Kb/s | 3.4 |
![Page 45: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/45.jpg)
Stein VoiceDSP 1.45
The Problem(s) with MOS
Accurate MOS tests are the only reliable benchmark
BUT
– MOS tests are off-line
– MOS tests are slow
– MOS tests are expensive
– Different labs give consistently different results
– Most MOS tests only check one aspect of system
![Page 46: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/46.jpg)
Stein VoiceDSP 1.46
The Problem(s) with SNR
Naive question: Isn’t CCR the same as SNR?
SNR does not correlate well with subjective criteria
Squared difference is not an accurate comparator
– Gain
– Delay
– Phase
– Nonlinear processing
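A small experiment shows how badly a squared-difference measure reacts to changes a listener would barely register. The sketch assumes a 1 kHz test tone sampled at 8 kHz for 100 ms; the function names are hypothetical.

```python
import math

def snr_db(reference, degraded):
    """10 * log10(signal energy / energy of the sample-by-sample difference)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    return 10.0 * math.log10(signal / noise)

FS, TONE_HZ, N = 8000, 1000, 800   # assumed test conditions

def tone(delay_samples=0):
    """Sine tone, optionally delayed by a whole number of samples."""
    return [math.sin(2.0 * math.pi * TONE_HZ * (n - delay_samples) / FS) for n in range(N)]

reference = tone()

snr_half_gain = snr_db(reference, [0.5 * v for v in reference])   # gain change
snr_one_sample_delay = snr_db(reference, tone(delay_samples=1))   # 125 microsecond delay
snr_inverted = snr_db(reference, [-v for v in reference])         # 180 degree phase shift
```

Halving the gain yields about 6 dB, a single sample of delay about 2.3 dB, and a polarity flip about -6 dB, even though none of these operations adds any noise or distortion to the tone.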
![Page 47: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/47.jpg)
Stein VoiceDSP 1.47
Speech distance measures
Many objective measures have been proposed:
– Segmental SNR
– Itakura Saito distance
– Euclidean distance in Cepstrum space
– Bark spectral distortion
– Coherence Function
None correlate well with MOS
ITU target - find a quality-measure that does correlate well
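As an example of the first measure in the list, segmental SNR averages per-frame SNR values in dB instead of computing one ratio over the whole signal. The sketch below is one common formulation, not a standardized one: the frame length, the clamping limits of -10 dB and 35 dB, and the handling of error-free frames are all assumptions.

```python
import math

def global_snr_db(ref, deg):
    signal = sum(r * r for r in ref)
    noise = sum((r - d) ** 2 for r, d in zip(ref, deg))
    return 10.0 * math.log10(signal / noise)

def segmental_snr_db(ref, deg, frame_len, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi] (assumed limits)."""
    values = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        r = ref[start:start + frame_len]
        d = deg[start:start + frame_len]
        signal = sum(v * v for v in r)
        noise = sum((a - b) ** 2 for a, b in zip(r, d))
        if signal == 0.0:
            continue                      # skip silent frames
        snr = hi if noise == 0.0 else 10.0 * math.log10(signal / noise)
        values.append(min(hi, max(lo, snr)))
    return sum(values) / len(values)

FRAME = 80                                # 10 ms at an assumed 8 kHz rate
ref = [math.sin(2.0 * math.pi * n / 8.0) for n in range(4 * FRAME)]

# Degrade only the first frame (gain halved); the other three frames are untouched
deg = [0.5 * v if n < FRAME else v for n, v in enumerate(ref)]

seg = segmental_snr_db(ref, deg, FRAME)
glob = global_snr_db(ref, deg)
```

In this case the global SNR is about 12 dB while the segmental SNR is about 28 dB, because the three clean frames each contribute the upper clamp value to the average.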
![Page 48: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/48.jpg)
Stein VoiceDSP 1.48
Some objective methods
Perceptual Speech Quality Measurement (PSQM): ITU-T P.861
Perceptual Analysis Measurement System (PAMS): BT proprietary technique
Perceptual Evaluation of Speech Quality (PESQ): ITU-T P.862
Objective Measurement of Perceived Audio Quality (PEAQ): ITU-R BS.1387
![Page 49: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/49.jpg)
Stein VoiceDSP 1.49
Objective Quality Strategy
[Figure: block diagram. Blocks: speech, channel, QM, QM to MOS, MOS estimate]
![Page 50: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/50.jpg)
Stein VoiceDSP 1.50
PSQM philosophy (from P.861)
[Figure: block diagram. Blocks: perceptual model (x2), internal representation (x2), audible difference, cognitive model]
![Page 51: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/51.jpg)
Stein VoiceDSP 1.51
PSQM philosophy (cont)
Perceptual Modelling (Internal representation)
– Short time Fourier transform
– Frequency warping (telephone-band filtering, Hoth noise)
– Intensity warping
Cognitive Modelling
– Loudness scaling
– Internal cognitive noise
– Asymmetry
– Silent interval processing
PSQM Values
– 0 (no degradation) to 6.5 (maximum degradation)
Conversion to MOS
– PSQM to MOS calibration using known references
– Equivalent Q values
![Page 52: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/52.jpg)
Stein VoiceDSP 1.52
Problems with PSQM
Designed for telephony grade speech codecs
Doesn’t take network effects into account:
– filtering
– variable time delay
– localized distortions
Draft standard P.862 adds:
– transfer function equalization
– time alignment, delay skipping
– distortion averaging
![Page 53: Stein VoiceDSP 1.1 Voice DSP Processing I Yaakov J. Stein Chief Scientist RAD Data Communications.](https://reader035.fdocuments.us/reader035/viewer/2022062301/56649e4b5503460f94b3fd27/html5/thumbnails/53.jpg)
Stein VoiceDSP 1.53
PESQ philosophy (from P.862)
[Figure: block diagram. Blocks: time alignment, perceptual model (x2), internal representation (x2), audible difference, cognitive model]