Stimulus = anything that elicits a response. Stimulus Response
EE225D Spring,1999 Pitch Detection &Vocoders · but we saw in Chapter 16 that pitch would be...
Transcript of EE225D Spring,1999 Pitch Detection &Vocoders · but we saw in Chapter 16 that pitch would be...
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.1
PE Spring,1999
25D LEC
ORGAN / B.GOLD LECTURE 24
University of CaliforniaBerkeley
College of EngineeringDepartment of Electrical Engineering
and Computer Sciences
rofessors : N.Morgan / B.GoldE225D
Pitch Detection &Vocoders
Lecture 24
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.2
)
resentation?
e] and
25D LEC
ORGAN / B.GOLD LECTURE 24
Major Question
How to make a “Perfect” Vocoder? (Can it be done?
What limitations are encountered for low bit rate rep
Today’s Topic
Traditional 2400bps systems [or at least in that rang
Pitch &Voicing detection.
NEXT
Very low rate systems [600bps]
Higher quality more rubust systems at 5-30Kbps
NEXT
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.3
resu lt
the sam e stim u lus.
n ta l frequency estim ato r”
ived” even if
ve descrip tion
25D LEC
ORGAN / B.GOLD LECTURE 24
D ifficu lties E ncountered in P itch D etection
* Purpose o f p itch detection is to au tom atica lly obta in a
that is in agreem ent w ith a psychoacoustic resu lt for
* E arly researchers p referred to use the term “ fundam e
but w e saw in C hap ter 16 that p itch w ou ld be “perce
the stim u lus w as a harm on ic for that frequency.
(exam p le - sh ift o f v irtua l p itch )
* W hat w e’re really after is the N AT U R E and quantita ti
A nd a lso to m ake a vocoder sound natu ra l.
o f the exc ita tion function.
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.4
ive.
bove.
nalysis d iff icu lt.
s
m e adu lts ch ildren 16 :1 range
ting track ing the period .
l tract constric tion .
25D LEC
ORGAN / B.GOLD LECTURE 24
3 . R epresen tation o f the transien t exc ita tion du ring p los
4 . R epresen tation o f the no ise for a w h ispered vow el
5 . R epresen tations o f various com b itnations o f a ll the a
E xpam p les o f speech w aveform s that m akes the above a
* D ynam ic range of quasi-period ic vocal co rd v ib ration
as low as 50H z fo r soas h igh as 800H z fo r
* R ap id varia tion in g lo tta l period* S udden change in vocal treat shape [e.g . nasal]* Transition from unvoiced to vo iced .* E nv ironm enta l transm ission p rob lem s.
* T h is m eans: 1 . D etection o f the tim e w hen the vocal cords are v ibra
in a [perhaps rap id ly vary ing ] quasi-period ic w ay and
2 . R epresen tation o f the fr ic tion no ise caused by a voca
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.5
W ted as the
lter function .
* and E ω( )
ned as fo llow s.
*may be different.
* are time-varying.
25D LEC
ORGAN / B.GOLD LECTURE 24
ith linear assum ptions, the speech w ave can be rep resen
C onvo lu tion of an exc ita tion function w ith a vocal tract f i
In spectra l term s : S ω( ) E ω( )H ω( )=
In a channel vocoder analyzer, m easurem ents , H ω( )S ω( )
are N O T com puted separate ly.
* In the channel vocoder syn thesizer, the spectrum is ob ta i
If E ω( ) is a FLAT SPECTRUM , S ω( ) S ω( ),≅ although the phases
The situation is complicated by the fact that E ω( ) and H ω( )
W hite N o ise
Pu lse G enerato rR outer
e t( )
S ω( ) E ω( )H ω( )=
G enerator
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.6
tion s ignal.
n
th
sate by being
)
25D LEC
ORGAN / B.GOLD LECTURE 24
In L PC , w e start o ff w ith
change o f nom enclatu re is the m odel o f the speech excita
L PC derives an all-pole m odel
It would be nice if Hˆ ω( ) was really a good representatio
Speech can be perfectly reconstructed by carvolving h n( ) wi
the error signal e n( ).
correspondingly different than Ex ω( ) .
if H ω( ) differs greatly from H ω( ), E ω( ) will compen
s n( ) e n( ) h n( )×=
S ω( ) E ω( ) H ω( )⋅ Ex ω( ) H ω(⋅= =
S n( ) Ex ω( ) H ω( )⋅=s ω( ) ex n( ) h n( )⋅=
ex n( )
of H ω( ) , the real vocal tract function.
H ω( ) h n( )→
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.7
f ilter separation
tra l dom ain ,
y, the m odel assum es
on , the exc ita tion
nted and then
ow er
eparation .
25D LEC
ORGAN / B.GOLD LECTURE 24
H om om orph ic analys is has the hypo thesis that source -
T he m odel a lso assum es that these are m u ltip lied in the spec
so that tak ing the log turns the product in to a sum . F inall
that the tw o are separab le w ith lif tering. G iven th is separati
function and the vocal tract f ilter function can be represe
C onvo lved to g ive the syn thesized speech.
H ω( )
* M any L P C system s [m u lti pu lse, ce lp , e tc .] derive their p
by search ing for an erro r s ignal that com pensates for .
is m anifested as spectrum envelope - spectra l f ine structu re s
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.8
), a ll system s re lay on the
b le period pu lse source
c tra and take few b its
0bpss
25D LEC
ORGAN / B.GOLD LECTURE 24
In o rder to ach ieve low transm ission rates (e.g . 2400bps
excita tion m odel consisting o f a no ise source and a varia
- B oth sources are reasonab le approx im ations to f la t spe
buzz-h iss sw itch - 1b it every 10m sec. 10pitch tracker - 6b its every 10m sec. 600bp
to transm it.
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.9
M resent, F u ture .
cam e a standard.
e quality at 2400 bps.
o liferate,
nger the so le criterian .
tradeo ff.
m ethodsPC M ,A D PC Mictive ab ility
2
25D LEC
ORGAN / B.GOLD LECTURE 24
a jor M otiva tion for D orry R esearch on Vocoders : P ast, P
P ast - S ecrecy - W W II - D ata rates w ere lim ited . 2400bps be
N early a ll fund ing cam e from D O D to try to im prov
P resen t - M odem s are m uch better. A s cellu la r phones p r
date rate l im itations sti ll app ly but 2400bps is no lo
M ain d irection is sti ll quality (robustness) - b it ra te
F uture - G reater robustness - e ffic ient sto rage o f
speech (and m usic) - coding -recognition tie-in .
Tw o S ides of the C o in
B asic M odels fo r A nalysis & S yn thesis
channelLP CH om om orph ic
Wave fo rmPC M ,A
S om e p red
1
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.10
ction and
System
l Tract
menting a
.
s.
25D LEC
ORGAN / B.GOLD LECTURE 24
Complete Channel Vocoder
* Synthetic speech is the convolution of an excitation fun
a vocal tract filter function.
* Assumption : Synthesizer is a Time variable Linear
If this assumption was wrong and excitation and Voca
Interacted in some Non-Linear Way, problem of imple
“transparent” system probably becames intractable
Working Hypothesis for 2400bps (and lower) system
Excitation is either buss [variable pulse generator]
Remember basic assumption for all vocoders.
or hiss [white noise generator]
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.11
hat tracks perfectly and
ator at the synthesizer is
ents.
pectral distortionintroduced bypitch jitter.
envelope caused by jitter
ω
25D LEC
ORGAN / B.GOLD LECTURE 24
Now, assume that you have built a great pitch detector t
records
Now, this information is transmitted and the buzz gener
forced to produce pulses based on the above measurem
Most of the time,
T1 T2 T3 etc., , ,
Spitch dose NOT behave this badly.
Consider the spectrum of a jittered pulse train.
E ω( )T4T1 T3T2
time
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.12
20msec.. Analyzer
a period of 20msec.
not as bad as
bove E ω( ) and H ω( ).
ω) E ω( ) H ω( )⋅=
Spectral distortionintroduced bypitch jitter.
25D LEC
ORGAN / B.GOLD LECTURE 24
In real life, lets assume that analysis takes place every
generates a single pitch number, so at synthesizer, for
actual excitation during voicing. ~
20ms. 20msec.
time
time
T1 T2 T3 T4
S ω( ) is the product of the a
S(
e t( ) s t( ) S ω( ) E2 ω( ) H ω( )⋅=
at the synthesizer
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.13
oise.
e
ls
Synthetic speech
) H ω( )⋅
with Spectrally flattened
excitation.
25D LEC
ORGAN / B.GOLD LECTURE 24
Spectral Flattering
Turn the excitation signal into a white signal or white n
xcitation
Modulation signa
Model of S ω( ) E ω(=
+
B.P.Filter
Hard Limitor
B.P.Filter
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.14
elope function
ctrally flattered.
rmine perceived quality.
spectral flattering
sizer.
ants”.
nvelope function?
25D LEC
ORGAN / B.GOLD LECTURE 24
Major Question
Does all-pole synthesizer model the Vocal tract env
- if the former is true, excitation should NOT be spe
- if the latter is true, spectral flattering may help.
* Joe Tierrey and I did an informal experiment to dete
The result was ambiguous.
* In general, existing LPC systems (low rate) do NOT use
It may depend on the ORDER of the pedictor & synthe
a 10th order predictor correspends to five “form
or the complete speech e
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.15
el vocoders &LPC.
hink] been tried but
PC).
pe could be completely
e
lope
ght to be perfect.
25D LEC
ORGAN / B.GOLD LECTURE 24
Homomorphic Vocoder
* Excitation is modelled in the same way as for chann
Spectral flattering of the exciation signal has never [I t
it should work ( in the same ballpark as channel &L
Point C is Cepstrum.
Vocal tract
Low time High time cepstrumNote - if excitation and envelo
tim
High time comjienent of enve
Envelope
Speech Spectrum Envelope
or
cepstrumseparated, synthesis ou
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.16
ion Spectral
Pattern
e Function
Recognition for Pitch
25D LEC
ORGAN / B.GOLD LECTURE 24
512-Point Log Separat
Excitation
FFT Magnitude in Tim512-Point
FFT
(a) (b) (c) (d)
Figure 20.1 : Cepstral analysis.
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.17
orrelation Function
ed Speech.
ections with 15ms
25D LEC
ORGAN / B.GOLD LECTURE 24
TIM
E
L A G 15m s
Figure 30.8 : Autoc
of Spectrally Flatten
Successive 30ms s
overlap.
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.18
O u tpu tΣ
n An⁄
1 A1⁄
peech S ignal.
F la ttened
O rig inal
25D LEC
ORGAN / B.GOLD LECTURE 24
Input
B PF
FW R Sm ooth
D elay ÷F1
Fn
S1
Sn
A1
An
Cn S=
C1 S=
B PF
FW R Sm ooth
D elay ÷
F igure 30.7 : Spectra l F la tten ing and its E ffect on the S
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.19
ntial Decayime)
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 30.3 Extraction of the Period
Variable Blanking Time Variable Expone(Rundown T
Firings
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.20
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 30.6 : Low-Pass filtered speech signal.
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.21
h Detection.
ge of Pitch
ns in Time
ariations in Time
iced Transition
peech
ise Background
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 30.4 : Six Examples of Difficulties in Pitc
Dynamic Ran
Pitch Variatio
Vocal Tract V
Voiced-Unvo
Telephone S
Acoustic No
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.22
D etection.
25D LEC
ORGAN / B.GOLD LECTURE 24
F igu re 30 .10 : C epstra l A nalys is for P itch
40dB
time
time
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.23
el.
h
S
Pitch Detection
Stage 2
Based on Correlation
with Reference Patterns
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 16.9 : Block Diagram of the Periodicity Mod
Figure 16.10 : Block Diagram of the Place Model.
SignalG lobal
Pitc
Filter Bank Elementary
ignal 4096-PT Detection
Pitch Detection
Stage 1
F1
F2
FM
EPD1
EPD2
EPDM
Pitch Detectors
P itch D etec tion A lgo r ithm
FFT Spectral Peaks
Based on Separation
Between Peaks
of
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.24
inner
Freuency
p5 p6 Quanta
ude
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 30.13 :
Frequency
1050Hz
W
p1
p1
p1 p1p2
p2
p2
p2
p2
p3
p3
p3 p3
p3
p3
p3 p3
p4
p5
p5
p6
p1 p2 p3 p4
p n( )
Spectral Magnit
12
345 67
Armonic Pitch Detection Algorithm
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.25
ndidate (P.C.) 1
P.C. 2frequency (Hz)
P.C. 3
P.C. n
lgorithm
ctrum
25D LEC
ORGAN / B.GOLD LECTURE 24
Pitch Ca
Figure 30.14 : Goldstein-Duifhuis Optimum Processor A
Spe
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.26
sizer
Magnitude
Signals
Signals
Encode
Encode
Signals
TRANSMIT
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 31.2 : Channel Vocoder Analyzer and Synthe
V/UV
PitchPitch
Voicing
Detector
Detector
Bandpass Magnitude
Lowpass DecimateFilter N Filter N
x n( )
Bandpass Magnitude
Lowpass DecimateFilter 1 Filter 1
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.27
sizer
Vocoder
Output+
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 31.2 : Channel Vocoder Analyzer and Synthe
Switch
Pulse Generator Noise Generator
Bandpass Filter 1
Bandpass Filter 2
Bandpass Filter N
RECEIVE
V/UV Signals
Pitch Signals
Magnitude Signals
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.28
Estimate
Half-Wave Rectifier.
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 31.4 : Effect of Pitch Ripple in a Spectral
Figure 31.3 : Example of Energy Measurement With a
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.29
r to Modem
Synthesis
C
Synthesized
M
esizer
Parameters
Speech
thm
25D LEC
ORGAN / B.GOLD LECTURE 24
Input Measurement of Computation of Paramete
Pitch and Voicing
from Parameter
Buss Generator
Hiss Generator
LPPitch Voicing
Gain
Autocorrelation Function
Symthesis Parameters Coding
Detection
Decodingodem
SynthBit
Figure 31.11 : Block Diagram of the LPC Algori
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.30
age 0
In
B Front (Lips)
Output
z 1–
1
K0
K0
-
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 31.12 : Lattice Synthesizer for LPC
+
Stage P-1 Stage 1 St
put
ack (Glottal)
... +
z 1–
+
+ + +z 1– z 1–
-
+
-+
+ +
++...
...
Kp 1– K1
Kp 1– K1
- -
-
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.31
verse
nvolveSynthetic
ourier nsform
Speech
25D LEC
ORGAN / B.GOLD LECTURE 24
Schematic of the Homomorphic Vocoder.
Input Speech Samples
Fourier
Log
InMultiplier
Cepstral
Fourier Exponential
Co
Excitation
Excitation Parameters
Analyzer Synthesizer
Transform
Magnitude
FTra
Generator
Pitch Detector
Transform
Inverse Fourier
Transform
Time
1
C
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.32
is
ultiplier
ime
c nT( )
h nT( )
c nT( )
s nT( )
e
e
25D LEC
ORGAN / B.GOLD LECTURE 24
Figure 31.13 : Cepstral Vocoder Analys
Log Inverse MFourier
MagnitudeFourier
TransformTransform
T
s nT( ) S ω( ) s nT( )
S ω( )
s nT( )
Time
FrequencyTim
Tim
EE 2 TURE ON PITCH DETECTION AND VOCODERS
N.M 23.33
scan in
25D LEC
ORGAN / B.GOLD LECTURE 24
Inverse Fourier Exponential
Convolve
Excitation Excitation Parameters
Fourier Transform
Generator
Transform
Figure 31.14 : Cepstral Vocoder Synthesizer
c nT( )v nT( )
e nT( )