Post on 25-Jan-2021
EE269Signal Processing for Machine Learning
Lecture 6
Instructor : Mert Pilanci
Stanford University
Jan 9 2019
Bayes classifiers
Given Pxy joint probability distribution of (x[n], y) what is thebest classifier ?
I signal classifier function f(x[n]) : RN → {1, ..., k}I best classifier depends on the performance measure
I risk (probability of error):
R(f) = Pxy(f(x) 6= y)
Bayes risk R∗
I Bayes risk is R∗ the smallest risk of any classifier
If R(f) = R∗, then f is a Bayes classifier
πk := Py(y = k) prior class probabilities
gk(x) := Px|y=k class conditional distribution
I define posterior class probabilities:
ηk(x) = Py=k|x(y = k|x)
note that∑K
k=1 ηk(x) = 1
Bayes risk R∗
I Bayes risk is R∗ the smallest risk of any classifier
If R(f) = R∗, then f is a Bayes classifier
πk := Py(y = k) prior class probabilities
gk(x) := Px|y=k class conditional distribution
I define posterior class probabilities:
ηk(x) = Py=k|x(y = k|x)
note that∑K
k=1 ηk(x) = 1
Example: x[n] = x ∈ R
I Suppose x|y is Gaussian, y ∈ {−1,+1}
x|y = −1 ∼ N(µ−, σ2−) and x|y = +1 ∼ N(µ+, σ2+)
η1(x) = P (y = 1|x) = P (x|y=1)P (y=1)P (x) (Bayes rule)
Bayes classifier
πk := Py(y = k) prior class probabilities
gk(x) := Px|y=k class conditional distribution
ηk(x) = Py=k|x(y = k|x) posterior class probabilities
Theorem: The classifier
f∗(x) = arg maxk=1,...,K
ηk(x)
= arg maxk=1,...,K
πkgk(x)
is a Bayes classifier
Detecting a constant signal in Gaussian noise
I x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [x1, ...xN ] ∼ N(µ, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−µ)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
µ
N∑i=1
xi > σ2 log(
π1π2
) +Nµ2/2
Detecting a constant signal in Gaussian noise
I x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [x1, ...xN ] ∼ N(µ, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−µ)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
µ
N∑i=1
xi > σ2 log(
π1π2
) +Nµ2/2
Detecting a constant signal in Gaussian noise
I x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [x1, ...xN ] ∼ N(µ, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−µ)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
µ
N∑i=1
xi > σ2 log(
π1π2
) +Nµ2/2
Detecting a change in variance
I x[n] = [x1, ...xN ] ∼ N(0, σ21) i.i.d. for y = 1 (noise)I x[n] = [x1, ...xN ] ∼ N(0, σ22) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume σ1 and σ2 is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ21
e− 1
2σ21(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ22
e− 1
2σ22(xi)
2
Bayes classifier:
Classify as class 2 if π1g1(x) < π2g2(x)(1
2σ21− 1
2σ22
) N∑i=1
x2i > log(π1π2
)− N2
log(σ21σ22
)
Detecting a change in variance
I x[n] = [x1, ...xN ] ∼ N(0, σ21) i.i.d. for y = 1 (noise)I x[n] = [x1, ...xN ] ∼ N(0, σ22) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume σ1 and σ2 is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ21
e− 1
2σ21(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ22
e− 1
2σ22(xi)
2
Bayes classifier:
Classify as class 2 if π1g1(x) < π2g2(x)(1
2σ21− 1
2σ22
) N∑i=1
x2i > log(π1π2
)− N2
log(σ21σ22
)
Detecting a change in variance
I x[n] = [x1, ...xN ] ∼ N(0, σ21) i.i.d. for y = 1 (noise)I x[n] = [x1, ...xN ] ∼ N(0, σ22) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume σ1 and σ2 is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ21
e− 1
2σ21(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ22
e− 1
2σ22(xi)
2
Bayes classifier:
Classify as class 2 if π1g1(x) < π2g2(x)(1
2σ21− 1
2σ22
) N∑i=1
x2i > log(π1π2
)− N2
log(σ21σ22
)
Detecting a known waveformI s[n] ∈ RN known waveformI x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [s1, ...sN ] +N(0, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−si)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
N∑i=1
xisi > σ2 log(
π1π2
) +1
2
N∑i=1
|si|2
〈x, s〉 > σ2 log(π1π2
) +1
2||s||22
I matched filter (e.g., face detector, DFT basis)
Detecting a known waveformI s[n] ∈ RN known waveformI x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [s1, ...sN ] +N(0, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−si)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
N∑i=1
xisi > σ2 log(
π1π2
) +1
2
N∑i=1
|si|2
〈x, s〉 > σ2 log(π1π2
) +1
2||s||22
I matched filter (e.g., face detector, DFT basis)
Detecting a known waveformI s[n] ∈ RN known waveformI x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [s1, ...sN ] +N(0, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−si)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
N∑i=1
xisi > σ2 log(
π1π2
) +1
2
N∑i=1
|si|2
〈x, s〉 > σ2 log(π1π2
) +1
2||s||22
I matched filter (e.g., face detector, DFT basis)
Detecting a known waveformI s[n] ∈ RN known waveformI x[n] = [x1, ...xN ] ∼ N(0, σ2) i.i.d. for y = 1 (noise)I x[n] = [s1, ...sN ] +N(0, σ2) i.i.d. for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume µ is known
g1(x) := Px|y=1 =∏Ni=1
1√2πσ2
e−1
2σ2(xi)
2
g2(x) := Px|y=2 =∏Ni=1
1√2πσ2
e−1
2σ2(xi−si)2
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
N∑i=1
xisi > σ2 log(
π1π2
) +1
2
N∑i=1
|si|2
〈x, s〉 > σ2 log(π1π2
) +1
2||s||22
I matched filter (e.g., face detector, DFT basis)
Multivariate Gaussianx1, ...xn ∼ i.i.d. N(µ,Σ)
P (x1, ...xn;µ,Σ) =1
(2π)N2 |Σ|
12
e−12
(x−µ)TΣ−1(x−µ)
Detecting a correlated signal
I [x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)I [x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume σ and Σ are known
g1(x) := Px|y=1 =1
(2πσ2)N2e−
12xT (σ2In)−1x
g2(x) := Px|y=2 =1
(2π)N2 |Σ|
12e−
12xT (Σ+σ2In)−1x
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
log(π1)−1
2x(σ2In)
−1x < −12x(Σ+σ2In)
−1x+log π2+constant
Detecting a correlated signal
I [x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)I [x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume σ and Σ are known
g1(x) := Px|y=1 =1
(2πσ2)N2e−
12xT (σ2In)−1x
g2(x) := Px|y=2 =1
(2π)N2 |Σ|
12e−
12xT (Σ+σ2In)−1x
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
log(π1)−1
2x(σ2In)
−1x < −12x(Σ+σ2In)
−1x+log π2+constant
Detecting a correlated signal
I [x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)I [x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)I P (y = 1) = π1 and P (y = 2) = π2
Assume σ and Σ are known
g1(x) := Px|y=1 =1
(2πσ2)N2e−
12xT (σ2In)−1x
g2(x) := Px|y=2 =1
(2π)N2 |Σ|
12e−
12xT (Σ+σ2In)−1x
Bayes classifier:
Classify as signal+noise if π1g1(x) < π2g2(x)
log(π1)−1
2x(σ2In)
−1x < −12x(Σ+σ2In)
−1x+log π2+constant
Detecting a correlated signal
I [x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)I [x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)
log(π1)−1
2xT (σ2In)
−1x < −12xT (Σ+σ2In)
−1x+log π2+constant
I Matrix inversion lemma (A invertible)(A+B)−1 = A−1 −A−1(I +BA−1)−1BA−1
I (σ2In)−1 − (Σ + σ2In)−1 = 1σ2 (σ2In + Σ)
−1Σ
I Classify as signal whenxT (σ2In + Σ)
−1Σx > σ2 log π1π2 + constant
Detecting a correlated signal
I [x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)I [x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)
log(π1)−1
2xT (σ2In)
−1x < −12xT (Σ+σ2In)
−1x+log π2+constant
I Matrix inversion lemma (A invertible)(A+B)−1 = A−1 −A−1(I +BA−1)−1BA−1
I (σ2In)−1 − (Σ + σ2In)−1 = 1σ2 (σ2In + Σ)
−1Σ
I Classify as signal whenxT (σ2In + Σ)
−1Σx > σ2 log π1π2 + constant
Detecting a correlated signal
I [x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)I [x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)
log(π1)−1
2xT (σ2In)
−1x < −12xT (Σ+σ2In)
−1x+log π2+constant
I Matrix inversion lemma (A invertible)(A+B)−1 = A−1 −A−1(I +BA−1)−1BA−1
I (σ2In)−1 − (Σ + σ2In)−1 = 1σ2 (σ2In + Σ)
−1Σ
I Classify as signal whenxT (σ2In + Σ)
−1Σx > σ2 log π1π2 + constant
Detecting a correlated signal
[x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)[x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)Classify as signal whenxT (σ2In + Σ)
−1Σx > σ2 log π1π2 + constant
I Eigenvalue Decomposition: Σ = UΛUT =∑n
i=1 λiuiuTi
xT (σ2In + UΛUT )−1UΛUTx
= xTU(σ2In + Λ)−1ΛUTx
I Change of basis: y = UTx
I yT (σ2In + Λ)−1Λy =∑N−1
n=0λn
λn+σ2|y[n]|2
yfiltered[n] =√
λnλn+σ2
y[n]
Detecting a correlated signal
[x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)[x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)Classify as signal whenxT (σ2In + Σ)
−1Σx > σ2 log π1π2 + constant
I Eigenvalue Decomposition: Σ = UΛUT =∑n
i=1 λiuiuTi
xT (σ2In + UΛUT )−1UΛUTx
= xTU(σ2In + Λ)−1ΛUTx
I Change of basis: y = UTx
I yT (σ2In + Λ)−1Λy =∑N−1
n=0λn
λn+σ2|y[n]|2
yfiltered[n] =√
λnλn+σ2
y[n]
Detecting a correlated signal
[x1, ...xN ] ∼ N(0, σ2In) for y = 1 (noise)[x1, ...xN ] ∼ N(0,Σ) +N(0, σ2In) for y = 2 (signal+noise)Classify as signal whenxT (σ2In + Σ)
−1Σx > σ2 log π1π2 + constant
I Eigenvalue Decomposition: Σ = UΛUT =∑n
i=1 λiuiuTi
xT (σ2In + UΛUT )−1UΛUTx
= xTU(σ2In + Λ)−1ΛUTx
I Change of basis: y = UTx
I yT (σ2In + Λ)−1Λy =∑N−1
n=0λn
λn+σ2|y[n]|2
yfiltered[n] =√
λnλn+σ2
y[n]
Optimality of filtering for circulant covariances
xTU(σ2In + Λ)−1ΛUTx > σ2 log
π1π2
I Symmetric circulant covariances
Σ =
s[0] s[1] s[2] . . . s[2] s[1]s[1] s[0] s[1] . . . s[3] s[2]
......
.... . .
......
s[2] s[3] s[4] . . . s[0] s[1]s[1] s[2] s[3] . . . s[1] s[0]
I Σij = ΣjiI s[N − n] = s[n]
Fourier basis: wk[n] =1√Nej
2πNnk
wHk Σ = S[k]wHk
Optimality of filtering for circulant covariances
xTU(σ2In + Λ)−1ΛUTx > σ2 log
π1π2
I Symmetric circulant covariances
Σ =
s[0] s[1] s[2] . . . s[2] s[1]s[1] s[0] s[1] . . . s[3] s[2]
......
.... . .
......
s[2] s[3] s[4] . . . s[0] s[1]s[1] s[2] s[3] . . . s[1] s[0]
I Σij = ΣjiI s[N − n] = s[n]
Fourier basis: wk[n] =1√Nej
2πNnk
wHk Σ = S[k]wHk