ESE 524 Detection and Estimation Theory
Joseph A. O'Sullivan
Samuel C. Sachs Professor
Electronic Systems and Signals Research Laboratory
Electrical and Systems Engineering
Washington University
211 Urbauer Hall
314-935-4173 (Lynda Markham answers)
[email protected]
J. A. O'S. ESE 524, Lecture 7, 02/03/09
Announcements
- Problem Set 1 is due today.
- Problem Set 2 is posted, due 02/13/09.
- No class 02/10, 02/12, and 02/19.
- Make-up classes on Fridays: 02/20, 02/27.
- Class website: http://classes.engineering.wustl.edu/ese524/
- Questions?
Last Class: Information Rate Functions and Performance Bounds
- Motivation
- Chernoff bound
- Binary hypothesis testing
- Tilted distributions
- Relative entropy
- Information rate functions
- Additivity of information
- Examples:
  - Gaussian, same covariance, different means
  - Poisson, different means
  - Gaussian, same mean, different variances
Last Class: Information Rate Functions: Motivation
- The receiver operating characteristic (ROC) is often not easily computable.
- Performance is often a function of a few parameters: SNR, N, other statistics.
- Bounds on performance may be more easily computed.
- The bounds below guarantee a level of performance for optimal tests.
- Bounds are exponential in the rate function.
- The rate function is additive in information (proportional to N).
Last Class: Chernoff Bound
Let x be a random variable with probability density function $p_x(X)$. Define the moment generating function and the log-moment generating function for real-valued s:
$$\Phi_x(s) = E\left[e^{sx}\right] = \int_{-\infty}^{\infty} p_x(X)\,e^{sX}\,dX, \qquad \phi_x(s) = \ln\Phi_x(s).$$
For every $s \ge 0$, $e^{sX} \ge e^{sA}$ whenever $X \ge A$, so
$$\Phi_x(s) \ge \int_A^{\infty} p_x(X)\,e^{sX}\,dX \ge e^{sA}\int_A^{\infty} p_x(X)\,dX = e^{sA}\,P(x \ge A),$$
$$\ln P(x \ge A) \le -sA + \ln\Phi_x(s), \quad \text{for all } s \ge 0.$$
The Chernoff bound is a bound on the tail probability. Taking the tightest such bound,
$$\ln P(x \ge A) \le -I(A), \qquad I(A) = \sup_{s \ge 0}\left[sA - \ln\Phi_x(s)\right].$$
The probability of a rare event is closely approximated, on a logarithmic scale, by the Chernoff bound. The (information) rate function equals the Legendre-Fenchel transform of the log-moment generating function.
The rate function can also be used to bound the probability of a set A in N dimensions (for example, any convex set):
$$\phi_{\mathbf x}(\mathbf s) = \ln E\left[e^{\mathbf s^T\mathbf x}\right], \quad \mathbf s \in \mathbb{R}^N,$$
$$\ln P(\mathbf x \in A) \le -\inf_{\mathbf X \in A} I(\mathbf X), \qquad I(\mathbf X) = \sup_{\mathbf s}\left[\mathbf s^T\mathbf X - \ln\Phi_{\mathbf x}(\mathbf s)\right].$$
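The scalar bound can be checked numerically. The sketch below is a minimal illustration, assuming x ~ N(0, 1), so that $\phi_x(s) = s^2/2$ and $I(A) = A^2/2$; the function names are hypothetical, and the supremum is approximated on a grid of s values rather than solved in closed form.

```python
import math

def log_mgf_std_gaussian(s):
    # phi_x(s) = ln E[e^{s x}] = s^2 / 2 for x ~ N(0, 1)
    return 0.5 * s * s

def rate_function(A, log_mgf, s_max=50.0, steps=50000):
    # I(A) = sup_{s >= 0} [s*A - phi_x(s)], approximated on a uniform grid in s;
    # s = 0 contributes 0 because phi_x(0) = ln 1 = 0
    best = 0.0
    for k in range(steps + 1):
        s = s_max * k / steps
        best = max(best, s * A - log_mgf(s))
    return best

def std_gaussian_tail(A):
    # exact P(x >= A) for x ~ N(0, 1)
    return 0.5 * math.erfc(A / math.sqrt(2.0))

for A in (1.0, 2.0, 3.0):
    I = rate_function(A, log_mgf_std_gaussian)
    print(f"A={A}: I(A)={I:.4f}  bound={math.exp(-I):.5f}  exact={std_gaussian_tail(A):.5f}")
```

Another distribution only needs a different `log_mgf`; the grid limit `s_max` must cover the maximizing s.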
Chernoff Bound Summary
For random variables:
$$\Phi_x(s) = E\left[e^{sx}\right] = \int_{-\infty}^{\infty} p_x(X)\,e^{sX}\,dX, \qquad \phi_x(s) = \ln\Phi_x(s),$$
$$I(A) = \sup_{s \ge 0}\left[sA - \phi_x(s)\right], \qquad \ln P(x \ge A) \le -I(A).$$
For random vectors:
$$\phi_{\mathbf x}(\mathbf s) = \ln E\left[e^{\mathbf s^T\mathbf x}\right], \quad \mathbf s \in \mathbb{R}^N,$$
$$I(\mathbf X) = \sup_{\mathbf s}\left[\mathbf s^T\mathbf X - \ln\Phi_{\mathbf x}(\mathbf s)\right], \qquad \ln P(\mathbf x \in A) \le -\inf_{\mathbf X \in A} I(\mathbf X).$$
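As a worked instance of the vector bound, assume (for illustration only) a standard Gaussian vector $\mathbf x \sim N(\mathbf 0, I_N)$ and the half-space $A = \{\mathbf X : \mathbf a^T\mathbf X \ge b\}$ with $b > 0$; a half-space is convex, so the bound applies.

```latex
\begin{aligned}
\phi_{\mathbf{x}}(\mathbf{s}) &= \tfrac{1}{2}\,\mathbf{s}^T\mathbf{s}, \\
I(\mathbf{X}) &= \sup_{\mathbf{s}}\left[\mathbf{s}^T\mathbf{X} - \tfrac{1}{2}\,\mathbf{s}^T\mathbf{s}\right]
  = \tfrac{1}{2}\,\|\mathbf{X}\|^2 \quad (\text{attained at } \mathbf{s} = \mathbf{X}), \\
\inf_{\mathbf{a}^T\mathbf{X} \ge b} \tfrac{1}{2}\,\|\mathbf{X}\|^2
  &= \frac{b^2}{2\|\mathbf{a}\|^2} \quad (\text{attained at } \mathbf{X} = b\,\mathbf{a}/\|\mathbf{a}\|^2), \\
\ln P(\mathbf{a}^T\mathbf{x} \ge b) &\le -\frac{b^2}{2\|\mathbf{a}\|^2}.
\end{aligned}
```

This agrees with the scalar bound applied to $\mathbf a^T\mathbf x \sim N(0, \|\mathbf a\|^2)$.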
Binary Hypothesis Testing
$$l(\mathbf r) = \ln\frac{p(\mathbf r|H_1)}{p(\mathbf r|H_0)}$$
Probabilities of miss and false alarm are tail probabilities. Upper bound them using the Chernoff bound information rate functions given hypotheses 1 and 0.
$$\Phi_0(s) = E\left[e^{s\,l(\mathbf r)} \mid H_0\right] = \int_{-\infty}^{\infty} p_{l|H_0}(L|H_0)\,e^{sL}\,dL, \qquad \phi_0(s) = \ln\Phi_0(s),$$
$$P_F = P(l(\mathbf r) \ge \gamma \mid H_0), \qquad \ln P_F \le -I_0(\gamma), \qquad I_0(\gamma) = \sup_{s \ge 0}\left[s\gamma - \ln\Phi_0(s)\right],$$
$$\Phi_1(\sigma) = E\left[e^{\sigma l(\mathbf r)} \mid H_1\right] = \int_{-\infty}^{\infty} p_{l|H_1}(L|H_1)\,e^{\sigma L}\,dL, \qquad \phi_1(\sigma) = \ln\Phi_1(\sigma),$$
$$P_M = P(l(\mathbf r) \le \gamma \mid H_1), \qquad \ln P_M \le -I_1(\gamma), \qquad I_1(\gamma) = \sup_{\sigma \le 0}\left[\sigma\gamma - \ln\Phi_1(\sigma)\right].$$
Performance is better than the computed point: the optimal $(P_F, P_D)$ is "up and to the left" of the point $\left(e^{-I_0(\gamma)},\ 1 - e^{-I_1(\gamma)}\right)$.
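For a concrete check of the false-alarm bound, assume (for illustration only) a single observation with $H_0: r \sim N(0,1)$ and $H_1: r \sim N(d,1)$. Then $l(r) = dr - d^2/2$, $\ln\Phi_0(s) = s(s-1)d^2/2$, and the exact $P_F$ is a Gaussian tail. The sketch compares the two; all names are hypothetical and the supremum is taken on a grid.

```python
import math

def phi0(s, d):
    # ln Phi_0(s) when l(r) = d*r - d^2/2 and r ~ N(0, 1) under H0,
    # i.e. l ~ N(-d^2/2, d^2) under H0
    return 0.5 * d * d * s * (s - 1.0)

def I0(gamma, d, s_max=20.0, steps=20000):
    # I_0(gamma) = sup_{s >= 0} [s*gamma - ln Phi_0(s)] on a grid in s
    return max(s * gamma - phi0(s, d)
               for s in (s_max * k / steps for k in range(steps + 1)))

def exact_PF(gamma, d):
    # P_F = P(l >= gamma | H0) = Q((gamma + d^2/2) / d)
    z = (gamma + 0.5 * d * d) / d
    return 0.5 * math.erfc(z / math.sqrt(2.0))

d = 2.0
for gamma in (-1.0, 0.0, 1.0, 2.0):
    print(f"gamma={gamma:+.1f}  P_F={exact_PF(gamma, d):.4f}  exp(-I_0)={math.exp(-I0(gamma, d)):.4f}")
```

For this model the supremum has the closed form $I_0(\gamma) = (\gamma + d^2/2)^2/(2d^2)$ when $\gamma \ge -d^2/2$.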
Aside
Tail probabilities can be on either side. For the bound, the variable in the (log-)moment generating function is a dummy variable. The variables in the log-moment generating functions under the two hypotheses are different.
$$P_M = P(l(\mathbf r) \le \gamma \mid H_1), \qquad \ln P_M \le -I_1(\gamma),$$
$$\Phi_1(\sigma) = E\left[e^{\sigma l(\mathbf r)} \mid H_1\right] = \int_{-\infty}^{\infty} p_{l|H_1}(L|H_1)\,e^{\sigma L}\,dL \ge \int_{-\infty}^{\gamma} p_{l|H_1}(L|H_1)\,e^{\sigma L}\,dL \ge e^{\sigma\gamma}\int_{-\infty}^{\gamma} p_{l|H_1}(L|H_1)\,dL, \quad \text{for all } \sigma \le 0,$$
$$I_1(\gamma) = \sup_{\sigma \le 0}\left[\sigma\gamma - \ln\Phi_1(\sigma)\right].$$
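The same assumed Gaussian-shift model ($H_0: r \sim N(0,1)$, $H_1: r \sim N(d,1)$) illustrates the left tail. Under $H_1$, $l \sim N(d^2/2, d^2)$, so $\ln\Phi_1(\sigma) = \sigma(\sigma+1)d^2/2$, and the supremum is taken over $\sigma \le 0$. A minimal sketch with hypothetical names:

```python
import math

def phi1(sigma, d):
    # ln Phi_1(sigma) when l ~ N(d^2/2, d^2) under H1 (assumed model r ~ N(d, 1))
    return 0.5 * d * d * sigma * (sigma + 1.0)

def I1(gamma, d, sigma_min=-20.0, steps=20000):
    # I_1(gamma) = sup_{sigma <= 0} [sigma*gamma - ln Phi_1(sigma)], grid on [sigma_min, 0]
    return max(sg * gamma - phi1(sg, d)
               for sg in (sigma_min * k / steps for k in range(steps + 1)))

def exact_PM(gamma, d):
    # P_M = P(l <= gamma | H1) = Q((d^2/2 - gamma) / d)
    z = (0.5 * d * d - gamma) / d
    return 0.5 * math.erfc(z / math.sqrt(2.0))

d = 2.0
for gamma in (-1.0, 0.0, 1.0, 2.0):
    print(f"gamma={gamma:+.1f}  P_M={exact_PM(gamma, d):.4f}  exp(-I_1)={math.exp(-I1(gamma, d)):.4f}")
```

Here $I_1(\gamma) = (d^2/2 - \gamma)^2/(2d^2)$ for $\gamma \le d^2/2$, which also equals $I_0(\gamma) - \gamma$ for this model.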
Tilted Distributions
$$l(\mathbf R) = \ln\frac{p(\mathbf R|H_1)}{p(\mathbf R|H_0)}$$
The moment generating functions are for the log-likelihood ratio given the hypotheses:
$$\Phi_0(s) = E\left[e^{s\,l(\mathbf r)} \mid H_0\right] = \int_{-\infty}^{\infty} p(\mathbf R|H_0)\,e^{s\ln\frac{p(\mathbf R|H_1)}{p(\mathbf R|H_0)}}\,d\mathbf R = \int_{-\infty}^{\infty}\left[p(\mathbf R|H_0)\right]^{1-s}\left[p(\mathbf R|H_1)\right]^{s}\,d\mathbf R.$$
If the supremum defining the rate function,
$$I_0(\gamma) = \sup_{s \ge 0}\left[s\gamma - \ln\Phi_0(s)\right],$$
is achieved at an interior point $s^*$, then the derivative is zero:
$$\gamma = \frac{d}{ds}\ln\Phi_0(s)\Big|_{s=s^*} = \frac{\frac{d\Phi_0}{ds}(s^*)}{\Phi_0(s^*)} = E_{s^*}[l(\mathbf r)],$$
where $E_s$ denotes expectation under the tilted density
$$p_s(\mathbf R) = \frac{p(\mathbf R|H_0)\,e^{s\ln\frac{p(\mathbf R|H_1)}{p(\mathbf R|H_0)}}}{\Phi_0(s)}.$$
Tilt the original distribution until the mean of the log-likelihood ratio equals the threshold.
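A numerical check of the tilting identity $\gamma_s = \frac{d}{ds}\ln\Phi_0(s) = E_s[l(\mathbf r)]$, assuming a Gaussian pair $H_0: r \sim N(0,1)$, $H_1: r \sim N(D,1)$ with $D = 2$ chosen arbitrarily. The midpoint-rule integration and central-difference derivative are illustrative choices, not part of the lecture; for this pair the tilted density is $N(sD, 1)$, so both quantities should equal $(s - 1/2)D^2$.

```python
import math

D = 2.0  # assumed mean shift: H0: r ~ N(0, 1), H1: r ~ N(D, 1)

def logp0(r):
    return -0.5 * r * r - 0.5 * math.log(2.0 * math.pi)

def logp1(r):
    return -0.5 * (r - D) ** 2 - 0.5 * math.log(2.0 * math.pi)

def llr(r):
    # l(R) = ln p(R|H1) - ln p(R|H0)
    return logp1(r) - logp0(r)

def integrate(f, a=-15.0, b=17.0, n=8000):
    # midpoint rule; the Gaussian-shaped integrands make this very accurate
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def Phi0(s):
    # Phi_0(s) = integral of p0^{1-s} p1^s
    return integrate(lambda r: math.exp((1.0 - s) * logp0(r) + s * logp1(r)))

def tilted_mean_llr(s):
    # E_s[l] under p_s = p0^{1-s} p1^s / Phi_0(s)
    Z = Phi0(s)
    return integrate(lambda r: math.exp((1.0 - s) * logp0(r) + s * logp1(r)) * llr(r)) / Z

def dlogPhi0(s, h=1e-4):
    # central-difference derivative of ln Phi_0 at s
    return (math.log(Phi0(s + h)) - math.log(Phi0(s - h))) / (2.0 * h)

for s in (0.25, 0.5, 0.75):
    print(s, tilted_mean_llr(s), dlogPhi0(s))
```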
Relationship to Relative Entropy
Relative entropy is a quantitative measure of information, given in bits or nats (depending on the base of the logarithm):
$$D(p\|q) = \sum p\log\frac{p}{q} \ge 0,$$
since $\ln x \ge 1 - 1/x$ gives $\sum p\ln\frac{p}{q} \ge \sum p\left(1 - \frac{q}{p}\right) = \sum p - \sum q = 0$. For densities, $D(p\|q) = \int p(\mathbf R)\ln\frac{p(\mathbf R)}{q(\mathbf R)}\,d\mathbf R \ge 0$.
The information rate function equals the relative entropy between the tilted pdf and the pdf under the hypothesis:
$$p_s(\mathbf R) = \frac{p(\mathbf R|H_0)\,e^{s\,l(\mathbf R)}}{\Phi_0(s)},$$
$$D(p_s\|p_0) = \int p_s(\mathbf R)\ln\frac{p_s(\mathbf R)}{p(\mathbf R|H_0)}\,d\mathbf R = E_s[s\,l(\mathbf r)] - \ln\Phi_0(s) = s\gamma_s - \ln\Phi_0(s) = I_0(\gamma_s), \qquad \gamma_s = E_s[l(\mathbf r)].$$
Duality of the exponential family and its mean: the mean determines the parameter, and the parameter determines the mean ($s \to \gamma_s$, $\gamma \to s^*$).
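To see $D(p_s\|p_0) = s\gamma_s - \ln\Phi_0(s)$ numerically, here is a minimal sketch under the same assumed Gaussian pair ($H_0: r \sim N(0,1)$, $H_1: r \sim N(D,1)$, $D = 2$). The tilted density is $N(sD, 1)$, so both sides should also equal $(sD)^2/2$; names and quadrature settings are illustrative.

```python
import math

D = 2.0  # assumed: H0: r ~ N(0, 1), H1: r ~ N(D, 1)

def logp0(r):
    return -0.5 * r * r - 0.5 * math.log(2.0 * math.pi)

def logp1(r):
    return -0.5 * (r - D) ** 2 - 0.5 * math.log(2.0 * math.pi)

def integrate(f, a=-15.0, b=17.0, n=8000):
    # midpoint rule on [a, b]
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def phi0(s):
    # ln Phi_0(s) = ln of the integral of p0^{1-s} p1^s
    return math.log(integrate(lambda r: math.exp((1.0 - s) * logp0(r) + s * logp1(r))))

def kl_tilted_vs_p0(s):
    # D(p_s || p_0) = integral of p_s ln(p_s / p_0)
    ph = phi0(s)

    def integrand(r):
        log_ps = (1.0 - s) * logp0(r) + s * logp1(r) - ph
        return math.exp(log_ps) * (log_ps - logp0(r))

    return integrate(integrand)

def rate_at_tilt(s, h=1e-4):
    # gamma_s = d/ds ln Phi_0(s); I_0(gamma_s) = s*gamma_s - ln Phi_0(s)
    gamma_s = (phi0(s + h) - phi0(s - h)) / (2.0 * h)
    return s * gamma_s - phi0(s)

for s in (0.2, 0.5, 0.9):
    print(s, kl_tilted_vs_p0(s), rate_at_tilt(s))
```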
Relationship to Relative Entropy, Part 2
The relative entropy between the tilted density and the density under hypothesis 1 is simply related to that under hypothesis 0.
$$\Phi_1(\sigma) = E\left[e^{\sigma l(\mathbf r)} \mid H_1\right], \qquad I_1(\gamma) = \sup_{\sigma \le 0}\left[\sigma\gamma - \ln\Phi_1(\sigma)\right], \qquad \gamma = \frac{d}{d\sigma}\ln\Phi_1(\sigma)\Big|_{\sigma=\sigma^*} = \frac{\frac{d\Phi_1}{d\sigma}(\sigma^*)}{\Phi_1(\sigma^*)}.$$
The tilted density under $H_1$ is the tilted density under $H_0$ with $s = \sigma + 1$:
$$p_\sigma(\mathbf R) = \frac{p(\mathbf R|H_1)\,e^{\sigma\ln\frac{p(\mathbf R|H_1)}{p(\mathbf R|H_0)}}}{\Phi_1(\sigma)} = \frac{p(\mathbf R|H_0)\,e^{(\sigma+1)\ln\frac{p(\mathbf R|H_1)}{p(\mathbf R|H_0)}}}{\Phi_0(\sigma+1)}, \qquad \gamma = E_{\sigma^*+1}[l(\mathbf r)].$$
With $p_s(\mathbf R) = p(\mathbf R|H_0)\,e^{s\,l(\mathbf R)}/\Phi_0(s)$,
$$D(p_s\|p_1) = \int p_s(\mathbf R)\ln\frac{p_s(\mathbf R)}{p(\mathbf R|H_1)}\,d\mathbf R = E_s[(s-1)\,l(\mathbf r)] - \ln\Phi_0(s) = (s-1)\gamma_s - \ln\Phi_0(s) = -\gamma_s + I_0(\gamma_s).$$
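The two identities above, $\Phi_1(\sigma) = \Phi_0(\sigma+1)$ and $D(p_s\|p_1) = I_0(\gamma_s) - \gamma_s$, can be checked by quadrature. This sketch assumes the exponential pair used in the exponential-distribution example of this lecture (rate $\lambda = 1$ under $H_0$, $\alpha = 0.5$ under $H_1$) and uses composite Simpson integration on a truncated interval; all names are hypothetical.

```python
import math

LAM, ALPHA = 1.0, 0.5  # assumed rates: H0 ~ Exp(LAM), H1 ~ Exp(ALPHA), LAM > ALPHA

def logp0(r):
    return math.log(LAM) - LAM * r

def logp1(r):
    return math.log(ALPHA) - ALPHA * r

def llr(r):
    # l(R) = ln p(R|H1) - ln p(R|H0)
    return logp1(r) - logp0(r)

def simpson(f, a=0.0, b=80.0, n=20000):
    # composite Simpson's rule with n (even) subintervals; the integrands used
    # here decay at least like e^{-0.6 R}, so truncating at 80 is harmless
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def Phi0(s):
    # Phi_0(s) = E[e^{s l} | H0]
    return simpson(lambda r: math.exp(logp0(r) + s * llr(r)))

def Phi1(sigma):
    # Phi_1(sigma) = E[e^{sigma l} | H1]
    return simpson(lambda r: math.exp(logp1(r) + sigma * llr(r)))

def part2_sides(s):
    # returns D(p_s || p_1) and I_0(gamma_s) - gamma_s for the tilted density p_s
    log_Z = math.log(Phi0(s))

    def log_ps(r):
        return logp0(r) + s * llr(r) - log_Z

    gamma_s = simpson(lambda r: math.exp(log_ps(r)) * llr(r))           # E_s[l]
    kl_p1 = simpson(lambda r: math.exp(log_ps(r)) * (log_ps(r) - logp1(r)))
    I0 = s * gamma_s - log_Z                                            # = D(p_s || p_0)
    return kl_p1, I0 - gamma_s

for s in (0.25, 0.5, 0.75):
    print(s, part2_sides(s))
```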
Summary of Simple Relationships
$$\Phi_1(\sigma) = E\left[e^{\sigma l(\mathbf r)} \mid H_1\right] = E\left[\exp\left((\sigma+1)\ln\frac{p(\mathbf r|H_1)}{p(\mathbf r|H_0)}\right) \,\Big|\, H_0\right] = \Phi_0(\sigma+1)$$
$$I_0(\gamma) = \gamma + I_1(\gamma)$$
- Find $I_0$ as a function of the threshold; subtract the threshold to get $I_1$.
- $\Phi_0(0) = \Phi_0(1) = 1$, and $\ln\Phi_0(s)$ is a convex function.
- Points on the $(I_0, I_1)$ curve:
  - $\gamma = -D(p_0\|p_1)$: $(0,\ D(p_0\|p_1))$ is a point.
  - $\gamma = D(p_1\|p_0)$: $(D(p_1\|p_0),\ 0)$ is a point.
  - $\gamma = 0 = \frac{d}{ds}\ln\Phi_0(s)$ for some $0 \le s \le 1$: $(D(p_s\|p_0),\ D(p_s\|p_1))$ is a point, with equal coordinates.
- Plot bounds. Vary parameters to gain a better understanding.
- Information is additive: for N i.i.d. observations the rates are $(N I_0, N I_1)$.
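The listed points can be confirmed with closed forms. This sketch assumes the exponential pair from the exponential-distribution example in this lecture ($\lambda = 1$, $\alpha = 0.5$), for which the tilted density is exponential with rate $(1-s)\lambda + s\alpha$ and $D(\mathrm{Exp}(a)\|\mathrm{Exp}(b)) = \ln(a/b) + b/a - 1$. It locates the $\gamma = 0$ point by bisection; names are hypothetical.

```python
import math

LAM, ALPHA = 1.0, 0.5  # assumed example rates (lambda > alpha)

def c(s):
    # rate of the tilted exponential density, (1-s)*lambda + s*alpha
    return (1.0 - s) * LAM + s * ALPHA

def log_Phi0(s):
    return (1.0 - s) * math.log(LAM) + s * math.log(ALPHA) - math.log(c(s))

def gamma_s(s):
    return (LAM - ALPHA) / c(s) + math.log(ALPHA / LAM)

def I0(s):
    x = LAM / c(s)
    return x - 1.0 - math.log(x)

def I1(s):
    x = ALPHA / c(s)
    return x - 1.0 - math.log(x)

def kl_exp(a, b):
    # D(Exp(a) || Exp(b)) = ln(a/b) + b/a - 1
    return math.log(a / b) + b / a - 1.0

def chernoff_s(lo=0.0, hi=1.0, iters=100):
    # bisection for the s in [0, 1] with gamma_s = 0 (gamma_s increases with s here)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_s(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

D01 = kl_exp(LAM, ALPHA)   # D(p0 || p1)
D10 = kl_exp(ALPHA, LAM)   # D(p1 || p0)
s_star = chernoff_s()
print("Phi0(0), Phi0(1):", math.exp(log_Phi0(0.0)), math.exp(log_Phi0(1.0)))
print("s=0:", gamma_s(0.0), I0(0.0), I1(0.0), " D(p0||p1) =", D01)
print("s=1:", gamma_s(1.0), I0(1.0), I1(1.0), " D(p1||p0) =", D10)
print("gamma=0 at s* =", s_star, " I0 = I1 =", I0(s_star))
```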
Information Rate Functions and Performance Bounds
- Example: exponential distributions
- Additivity of information
- Examples:
  - Gaussian, same covariance, different means
  - Poisson, different means
  - Gaussian, same mean, different variances
Example: Exponential Distributions
$$H_0: p(R_i|H_0) = \lambda e^{-\lambda R_i}, \quad R_i \ge 0, \quad i = 1, 2, \dots, N, \text{ i.i.d.}$$
$$H_1: p(R_i|H_1) = \alpha e^{-\alpha R_i}, \quad R_i \ge 0, \quad i = 1, 2, \dots, N, \text{ i.i.d.}, \qquad \lambda > \alpha$$
$$l(\mathbf R) = \sum_{i=1}^{N}\left[(\lambda - \alpha)R_i + \ln\frac{\alpha}{\lambda}\right]$$
$$\Phi_0(s) = E\left[e^{s\,l(\mathbf r)} \mid H_0\right] = \int\left[p(\mathbf R|H_0)\right]^{1-s}\left[p(\mathbf R|H_1)\right]^{s}d\mathbf R = \prod_{i=1}^{N}\int_{-\infty}^{\infty}\left[p(R_i|H_0)\right]^{1-s}\left[p(R_i|H_1)\right]^{s}dR_i = \prod_{i=1}^{N}\Phi_{0,i}(s)$$
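Because $l(\mathbf R)$ depends on the data only through $L = \sum_i R_i$, it is cheap to evaluate. A small sketch with hypothetical observations, comparing the closed form above with the sum of per-sample log density ratios:

```python
import math

def llr_exponential(R, lam, alpha):
    # l(R) = sum_i [(lambda - alpha) R_i + ln(alpha / lambda)]
    return (lam - alpha) * sum(R) + len(R) * math.log(alpha / lam)

def llr_direct(R, lam, alpha):
    # same quantity from the densities: sum_i ln[alpha e^{-alpha R_i} / (lambda e^{-lambda R_i})]
    return sum(math.log(alpha * math.exp(-alpha * r)) - math.log(lam * math.exp(-lam * r))
               for r in R)

R = [0.3, 1.2, 0.7, 2.5]   # hypothetical observations
print(llr_exponential(R, 1.0, 0.5), llr_direct(R, 1.0, 0.5))
```

Any data set with the same sum and length gives the same $l(\mathbf R)$, which is why the exact performance reduces to the distribution of $L$.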
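Ex: Exponential Distributions
Compute the moment generating functions and information rate functions. For one component (for N i.i.d. components, $\Phi_0^{(N)}(s) = [\Phi_0(s)]^N$):
$$\Phi_0(s) = \int_0^{\infty}\lambda^{1-s}\alpha^{s}e^{-((1-s)\lambda + s\alpha)R}\,dR = \frac{\lambda^{1-s}\alpha^{s}}{(1-s)\lambda + s\alpha}, \qquad (1-s)\lambda + s\alpha > 0,$$
$$I_0(\gamma) = \sup_{s \ge 0}\left[s\gamma - \ln\Phi_0(s)\right], \qquad \gamma_s = \frac{d}{ds}\ln\Phi_0(s) = E_s[l] = \frac{\lambda - \alpha}{(1-s)\lambda + s\alpha} + \ln\frac{\alpha}{\lambda} \quad \text{(one component)},$$
$$I_0(\gamma_s) = s\gamma_s - \ln\Phi_0(s) = \frac{s(\lambda - \alpha)}{(1-s)\lambda + s\alpha} - \ln\frac{\lambda}{(1-s)\lambda + s\alpha} = \frac{\lambda}{(1-s)\lambda + s\alpha} - 1 - \ln\frac{\lambda}{(1-s)\lambda + s\alpha},$$
$$I_1(\gamma_s) = -\gamma_s + I_0(\gamma_s) = \frac{\alpha}{(1-s)\lambda + s\alpha} - 1 - \ln\frac{\alpha}{(1-s)\lambda + s\alpha}.$$
Plot parametric curves as functions of s: $\gamma_s$, $I_0(\gamma_s)$, and $I_1(\gamma_s) = -\gamma_s + I_0(\gamma_s)$.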
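A sketch of the parametric curves, mirroring the s grid of the Matlab code ($s = 0, 0.01, \dots, 1$) and assuming $\lambda = 1$, $\alpha = 0.5$. It also checks the closed form for $I_0$ against a direct grid evaluation of $\sup_{s \ge 0}[s\gamma - \ln\Phi_0(s)]$; for $\lambda > \alpha$, $\Phi_0(s)$ is finite only for $s < \lambda/(\lambda - \alpha)$, so the grid stops just short of that point. Names are hypothetical.

```python
import math

def c(s, lam, alpha):
    # tilted rate (1 - s) * lambda + s * alpha
    return (1.0 - s) * lam + s * alpha

def log_Phi0(s, lam, alpha):
    # one component: ln[lambda^{1-s} alpha^s / ((1-s) lambda + s alpha)], needs c(s) > 0
    return (1.0 - s) * math.log(lam) + s * math.log(alpha) - math.log(c(s, lam, alpha))

def gamma_s(s, lam, alpha):
    return (lam - alpha) / c(s, lam, alpha) + math.log(alpha / lam)

def I0_s(s, lam, alpha):
    x = lam / c(s, lam, alpha)
    return x - 1.0 - math.log(x)

def I1_s(s, lam, alpha):
    x = alpha / c(s, lam, alpha)
    return x - 1.0 - math.log(x)

def I0_numeric(gamma, lam, alpha, steps=20000):
    # grid sup of s*gamma - ln Phi_0(s) over s in [0, s_max], s_max just below lambda/(lambda-alpha)
    s_max = 0.999 * lam / (lam - alpha)
    return max(s * gamma - log_Phi0(s, lam, alpha)
               for s in (s_max * k / steps for k in range(steps + 1)))

lam, alpha = 1.0, 0.5
curve = [(gamma_s(s, lam, alpha), I0_s(s, lam, alpha), I1_s(s, lam, alpha))
         for s in (k / 100 for k in range(101))]   # (gamma_s, I_0, I_1) for s = 0, 0.01, ..., 1
print(curve[0], curve[50], curve[100])
```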
Example continued
We know the exact performance. The statistic $L = \sum_{i=1}^{N} R_i$ has a gamma density under each hypothesis:
$$p_{L|H}(L|H) = \frac{\mu^N L^{N-1}e^{-\mu L}}{(N-1)!}, \quad L \ge 0, \qquad \mu = \lambda \text{ under } H_0, \quad \mu = \alpha \text{ under } H_1.$$
The thresholds are related:
$$l(\mathbf R) = (\lambda - \alpha)\sum_{i=1}^{N} R_i + N\ln\frac{\alpha}{\lambda} \ge N\gamma \iff L \ge \gamma' = \frac{N}{\lambda - \alpha}\left(\gamma - \ln\frac{\alpha}{\lambda}\right), \qquad \gamma'' = \lambda\gamma'.$$
$$P_F = \int_{\gamma'}^{\infty} p_{L|H_0}(L|H_0)\,dL = \sum_{k=0}^{N-1}\frac{(\gamma'')^k}{k!}e^{-\gamma''},$$
$$P_D = \int_{\gamma'}^{\infty} p_{L|H_1}(L|H_1)\,dL = \sum_{k=0}^{N-1}\frac{1}{k!}\left(\frac{\alpha\gamma''}{\lambda}\right)^k\exp\left(-\frac{\alpha\gamma''}{\lambda}\right).$$
The log-probabilities can be compared to the bounds $\frac{1}{N}\ln P_F \le -I_0(\gamma)$ and $\frac{1}{N}\ln(1 - P_D) \le -I_1(\gamma)$, where
$$\gamma = \frac{\lambda - \alpha}{N\lambda}\,\gamma'' + \ln\frac{\alpha}{\lambda}.$$
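A sketch of the exact performance formulas, assuming $\lambda = 1$, $\alpha = 0.5$, $N = 10$ and threshold $\gamma = 0$. Each Erlang tail term is evaluated in log space, and the tail is cross-checked against Simpson integration of the gamma density; all names are hypothetical.

```python
import math

def erlang_tail(N, t):
    # P(G >= t) for G ~ Gamma(N, 1), N a positive integer: sum_{k<N} t^k e^{-t} / k!
    if t <= 0.0:
        return 1.0
    return sum(math.exp(-t + k * math.log(t) - math.lgamma(k + 1)) for k in range(N))

def gamma_pdf(L, N, mu):
    # p(L|H) = mu^N L^{N-1} e^{-mu L} / (N-1)!, for L > 0
    return math.exp(N * math.log(mu) + (N - 1) * math.log(L) - mu * L - math.lgamma(N))

def tail_by_quadrature(N, mu, start, width=200.0, n=20000):
    # Simpson's rule for the integral of the gamma pdf over [start, start + width], start > 0
    h = width / n
    total = gamma_pdf(start, N, mu) + gamma_pdf(start + width, N, mu)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * gamma_pdf(start + k * h, N, mu)
    return total * h / 3.0

def exact_PF_PD(gamma, N, lam, alpha):
    # gamma' = N (gamma - ln(alpha/lambda)) / (lambda - alpha) on L = sum R_i; gamma'' = lambda gamma'
    g1 = N * (gamma - math.log(alpha / lam)) / (lam - alpha)
    g2 = lam * g1
    return erlang_tail(N, g2), erlang_tail(N, alpha * g2 / lam), g1

pf, pd, g1 = exact_PF_PD(0.0, 10, 1.0, 0.5)
print(pf, pd, g1)
```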
Matlab Code

```matlab
function [pf,pd]=gammainforate(N,lambda,alpha)
% Exact ROC and Chernoff-bound curves for N >= 2 i.i.d. exponential samples:
% rate lambda under H0, rate alpha under H1.
gamma=0:N/200:(N+5);          % gamma'' = lambda*gamma', threshold on lambda*sum(R)
k=1:(N-1);
kfac=factorial(k);
% P_F = exp(-gamma'')*(1 + sum_{k=1}^{N-1} gamma''^k/k!)
gamexp=gamma;
for kk=2:N-1                  % loop is empty for N = 2, so no size test is needed
    gamexp=[gamexp; gamma.^kk];
end
pf=exp(-gamma).*(1+(1./kfac)*gamexp);
% P_D: the same sum evaluated at alpha*gamma''/lambda
gd=alpha*gamma/lambda;
gamexp=gd;
for kk=2:N-1
    gamexp=[gamexp; gd.^kk];
end
pd=exp(-gd).*(1+(1./kfac)*gamexp);
% One-component information rates as parametric functions of s in [0,1]
s=0:0.01:1;
info0=lambda./((1-s)*lambda+s*alpha)-1-log(lambda./((1-s)*lambda+s*alpha));
gammas=(lambda-alpha)./((1-s)*lambda+s*alpha)+log(alpha/lambda);
info1=alpha./((1-s)*lambda+s*alpha)-1-log(alpha./((1-s)*lambda+s*alpha));
```

[Figure: ROC in blue, bound in red (P_D versus P_F, on linear and logarithmic axes), and information rates I_0 in blue, I_1 in red versus threshold γ; alpha = 0.5, lambda = 1, N = 10 and 100.]

Matlab Code (continued)

```matlab
% Plotting, continuing inside gammainforate
figure1 = figure;
axes('Parent',figure1,'PlotBoxAspectRatio',[1 1.049 2.098],'LineWidth',1.5,...
    'FontSize',16,'DataAspectRatio',[1 1 1]);
ylim([0 1]); box('on'); hold('all');
% Exact ROC (blue) and bound curve (exp(-N*I0), 1-exp(-N*I1)) (red)
plot(pf,pd,'LineWidth',1.5,'Color',[0 0 1]);
plot(exp(-N*info0),1-exp(-N*info1),'LineWidth',1.5,'Color',[1 0 0]);
xlabel('P_F','FontSize',16); ylabel('P_D','FontSize',16);
title('ROC in blue, Bound in red','FontSize',16);
% Information rates versus threshold
figure1 = figure;
axes('Parent',figure1,'LineWidth',1.5,'FontSize',16);
box('on'); hold('all');
plot(gammas,info0,'LineWidth',1.5,'Color',[0 0 1]);
plot(gammas,info1,'LineWidth',1.5,'Color',[1 0 0]);
axis tight
xlabel('Threshold \gamma','FontSize',16); ylabel('Information Rates','FontSize',16);
title('I_0 in blue, I_1 in red','FontSize',16);
```

[Figure: ROC in blue, bound in red (P_D versus P_F); I_0 in blue, I_1 in red versus threshold γ.]
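For readers without Matlab, a hedged Python port of the computational part of `gammainforate` (no plotting). It keeps the Matlab grids, evaluates the tail sums term by term in log space, and returns the red bound curve $(e^{-N I_0}, 1 - e^{-N I_1})$ as pairs; `gammainforate_py` and `erlang_tail` are hypothetical names.

```python
import math

def erlang_tail(N, t):
    # sum_{k=0}^{N-1} t^k e^{-t} / k! = P(Gamma(N, 1) >= t); each term is a Poisson pmf <= 1
    if t <= 0.0:
        return 1.0
    return sum(math.exp(-t + k * math.log(t) - math.lgamma(k + 1)) for k in range(N))

def gammainforate_py(N, lam, alpha, n_s=101):
    # gamma'' grid as in the Matlab code, 0 : N/200 : N+5 (tolerance guards the floor)
    step = N / 200.0
    n_pts = int(math.floor((N + 5) / step + 1e-9)) + 1
    g2 = [k * step for k in range(n_pts)]
    pf = [erlang_tail(N, t) for t in g2]                 # exact P_F
    pd = [erlang_tail(N, alpha * t / lam) for t in g2]   # exact P_D
    # one-component information-rate curves, parametrized by s in [0, 1]
    s = [k / (n_s - 1) for k in range(n_s)]
    c = [(1 - si) * lam + si * alpha for si in s]
    info0 = [lam / ci - 1 - math.log(lam / ci) for ci in c]
    gammas = [(lam - alpha) / ci + math.log(alpha / lam) for ci in c]
    info1 = [alpha / ci - 1 - math.log(alpha / ci) for ci in c]
    # bound curve drawn in red by the Matlab code
    bound = [(math.exp(-N * a), 1 - math.exp(-N * b)) for a, b in zip(info0, info1)]
    return g2, pf, pd, gammas, info0, info1, bound

g2, pf, pd, gammas, info0, info1, bound = gammainforate_py(10, 1.0, 0.5)
print(len(g2), pf[100], pd[100], bound[50])
```

Matching each $\gamma_s$ to $\gamma'' = N\lambda(\gamma_s - \ln(\alpha/\lambda))/(\lambda - \alpha)$ shows the exact ROC point lying up and to the left of the corresponding bound point.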
Relationship of Bounds to Log-Probability

```matlab
% Overlay the exact normalized log-probabilities (green) on the information-rate plot;
% map the gamma'' grid back to the per-component threshold gamma
gammas1=gamma*(lambda-alpha)/(N*lambda)+log(alpha/lambda);
plot(gammas1,-log(pf)/N,'LineWidth',1.5,'Color',[0 1 0]);
plot(gammas1,-log(1-pd)/N,'LineWidth',1.5,'Color',[0 1 0]);
```

$$\gamma'' = \frac{N\lambda}{\lambda - \alpha}\left(\gamma_s - \ln\frac{\alpha}{\lambda}\right), \qquad -\frac{1}{N}\ln P_F(\gamma'') \to I_0(\gamma_s) \quad \text{as } N \to \infty.$$

[Figure: I_0 in blue, I_1 in red, and the exact curves -(1/N) ln P_F and -(1/N) ln(1 - P_D) in green, versus threshold γ.]
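The convergence $-\frac{1}{N}\ln P_F \to I_0(\gamma_s)$ can be seen directly. This sketch fixes $s = 0.5$ with $\lambda = 1$, $\alpha = 0.5$ (assumed values), maps $\gamma_s$ to $\gamma''$ as above, and evaluates $\ln P_F$ with a log-sum-exp so that $N = 1000$ does not underflow.

```python
import math

def log_erlang_tail(N, t):
    # ln P(G >= t), G ~ Gamma(N, 1): log-sum-exp of ln(t^k e^{-t} / k!), k = 0..N-1
    logs = [-t + k * math.log(t) - math.lgamma(k + 1) for k in range(N)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

lam, alpha, s = 1.0, 0.5, 0.5                          # assumed example values
cs = (1 - s) * lam + s * alpha
gamma = (lam - alpha) / cs + math.log(alpha / lam)     # gamma_s
I0 = lam / cs - 1 - math.log(lam / cs)                 # I_0(gamma_s)

gaps = []
for N in (10, 100, 1000):
    g2 = N * lam * (gamma - math.log(alpha / lam)) / (lam - alpha)   # gamma''
    rate = -log_erlang_tail(N, g2) / N
    gaps.append(rate - I0)
    print(f"N={N:5d}  -(1/N) ln P_F = {rate:.5f}  I_0 = {I0:.5f}")
```

The gap shrinks roughly like $(\ln N)/N$, and it never goes negative because the Chernoff bound holds for every N.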
Information is Additive
- Relative entropy of product distributions equals the sum of the relative entropies.
- Log-moment generating functions add.
- Information rate functions add.
- Information is additive: for N i.i.d. observations the rates are $(N I_0, N I_1)$.
- Exponential error bounds: $P_F \le e^{-N I_0(\gamma)}$ and $P_M \le e^{-N I_1(\gamma)}$.
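Additivity can be checked by brute force for N = 2. The sketch integrates the relative entropy between the product densities $p_0(x)p_0(y)$ and $p_1(x)p_1(y)$ on a 2-D Simpson grid and compares it with twice the one-component value. It assumes the exponential pair $\lambda = 1$, $\alpha = 0.5$; the grid size and truncation at 40 are illustrative choices.

```python
import math

LAM, ALPHA = 1.0, 0.5   # assumed per-component rates under H0 and H1

def logp(r, rate):
    # log of the exponential density rate * exp(-rate * r), r >= 0
    return math.log(rate) - rate * r

def simpson_weights(n, h):
    # composite Simpson weights for n (even) subintervals of width h
    return [h / 3.0 * (1 if k in (0, n) else (4 if k % 2 else 2)) for k in range(n + 1)]

def kl_1d(n=400, b=40.0):
    # D(p0 || p1) for one component, Simpson rule on [0, b]
    h = b / n
    w = simpson_weights(n, h)
    return sum(wk * math.exp(logp(k * h, LAM)) * (logp(k * h, LAM) - logp(k * h, ALPHA))
               for k, wk in enumerate(w))

def kl_2d(n=400, b=40.0):
    # D(p0(x)p0(y) || p1(x)p1(y)) by brute-force 2-D Simpson integration on [0, b]^2
    h = b / n
    w = simpson_weights(n, h)
    xs = [k * h for k in range(n + 1)]
    total = 0.0
    for wi, x in zip(w, xs):
        for wj, y in zip(w, xs):
            lp0 = logp(x, LAM) + logp(y, LAM)
            lp1 = logp(x, ALPHA) + logp(y, ALPHA)
            total += wi * wj * math.exp(lp0) * (lp0 - lp1)
    return total

one = kl_1d()
two = kl_2d()
print(one, two, 2 * one)
```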