Introduction to Detection and Estimation, and Mathematical Notations

Mathematical Methods and Algorithms for Signal Processing, Ch10

2002/12/10 DSP Group Meeting

Outline

♦ Detection and estimation theory
♦ Notational conventions
♦ Conditional expectations
♦ Transforms of random variables
♦ Sufficient statistics
♦ Exponential families

Detection and Estimation

♦ Two examples
– Detection
Let
$x(t) = A \cos(2\pi f_c t), \qquad t \in [0, T)$
where $A \in \{-1, +1\}$. The signal is observed in noise,
$y(t) = x(t) + n(t)$
where $n(t)$ is a random process. An example of a detection problem is the choice between the two values of $A$, given $y(t)$.
Detection makes a choice over some countable (often finite) set of options. (A simulation sketch follows below.)
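A minimal simulation sketch of this detection problem, assuming discrete-time samples and additive white Gaussian noise: a correlation receiver decides $\hat{A}$ from the sign of the projection of $y$ onto the known carrier. All numeric values (carrier frequency, sample rate, noise level) are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
fc, T, fs = 10.0, 1.0, 1000.0           # carrier (Hz), duration (s), sample rate (Hz)
t = np.arange(0.0, T, 1.0 / fs)
ref = np.cos(2 * np.pi * fc * t)        # reference signal cos(2*pi*fc*t)

A_true = rng.choice([-1.0, 1.0])        # Nature picks A in {-1, +1}
noise = rng.normal(0.0, 2.0, t.size)    # n(t): white Gaussian noise samples
y = A_true * ref + noise                # observed y(t) = x(t) + n(t)

# Correlation receiver: project y onto the reference and decide by sign.
stat = np.dot(y, ref)
A_hat = 1.0 if stat > 0 else -1.0
print(f"true A = {A_true:+.0f}, decided A = {A_hat:+.0f}")
```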

Detection and Estimation (cont.)

– Estimation
The signal
$y(t) = x_\theta(t) = \cos(2\pi f_c t + \theta) + n(t)$
is measured at a receiver, where $\theta$ is an unknown phase. An example of an estimation problem is the determination of the phase, based upon observation of the signal over some interval of time.
Estimation makes a choice over a continuum of options. (An estimation sketch follows below.)
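A sketch of this estimation problem, assuming discrete samples and white Gaussian noise: the in-phase and quadrature correlations recover the phase via $\hat{\theta} = \operatorname{atan2}(-Q, I)$. The sample rate, noise level, and true phase are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
fc, fs, T = 10.0, 1000.0, 1.0           # integer number of carrier cycles in [0, T)
t = np.arange(0.0, T, 1.0 / fs)

theta_true = 0.7                        # unknown phase (radians)
y = np.cos(2 * np.pi * fc * t + theta_true) + rng.normal(0.0, 0.5, t.size)

# Quadrature correlations: I ~ (N/2) cos(theta), Q ~ -(N/2) sin(theta)
I = np.dot(y, np.cos(2 * np.pi * fc * t))
Q = np.dot(y, np.sin(2 * np.pi * fc * t))
theta_hat = np.arctan2(-Q, I)
print(f"true theta = {theta_true:.3f}, estimated theta = {theta_hat:.3f}")
```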

Game Theory

♦ The component of statistical theory that we are concerned with fits into a larger mathematical construct: that of game theory.
♦ Definition of a mathematical game
– A two-person, zero-sum mathematical game, which we will refer to as a game, consists of three basic components:
• A nonempty set, $\Theta_1$, of possible actions available to Player 1
• A nonempty set, $\Theta_2$, of possible actions available to Player 2
• A loss function, $L: \Theta_1 \times \Theta_2 \to \mathbb{R}$, representing the loss incurred by Player 1

Property of a Game

♦ In a two-person game there are two players, either of whom may be Nature.
♦ Each player attempts to make a choice that helps achieve their goal (e.g., "winning").
♦ In a zero-sum game, one player's loss is the other's gain.

An Odd-or-Even Game

♦ Two players simultaneously put up either 1 or 2 fingers.
♦ Player 1 wins if the sum of the digits showing is odd.
♦ Player 2 wins if the sum of the digits showing is even.
♦ The winner receives in dollars the sum of the digits showing.
♦ Here $\Theta_1 = \Theta_2 = \{1, 2\}$, and the entries give the winnings of Player 1 (equivalently, the loss incurred by Player 2; a negative entry is a loss for Player 1):
$L(1,1) = -2, \quad L(1,2) = 3, \quad L(2,1) = 3, \quad L(2,2) = -4$
arranged as a matrix (rows: Player 1's action, columns: Player 2's action):

    -2    3
     3   -4

(A sketch computing the optimal mixed strategy follows below.)
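The slides do not include this, but a short sketch can recover the optimal mixed strategy numerically by brute force over Player 1's mixing probability; the exact answer for this matrix is the classical result $p = 7/12$ with game value $1/12$ to Player 1.

```python
import numpy as np

# Payoff matrix to Player 1 (rows: Player 1 shows 1 or 2; cols: Player 2 shows 1 or 2).
A = np.array([[-2.0,  3.0],
              [ 3.0, -4.0]])

# Player 1 mixes: show 1 finger with probability p, 2 fingers with probability 1-p.
p = np.linspace(0.0, 1.0, 100001)
# Expected payoff against each pure strategy of Player 2.
vs_col1 = p * A[0, 0] + (1 - p) * A[1, 0]
vs_col2 = p * A[0, 1] + (1 - p) * A[1, 1]
worst_case = np.minimum(vs_col1, vs_col2)   # Player 2 responds optimally

best = np.argmax(worst_case)                # maximin choice for Player 1
print(f"optimal p = {p[best]:.4f}  (exact: 7/12 = {7/12:.4f})")
print(f"value of the game = {worst_case[best]:.4f}  (exact: 1/12 = {1/12:.4f})")
```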

A Statistical Game

♦ An important class of games are those in which one player is able to obtain information relating to the choice made by their opponent before committing to their own choice, but these observations are subject to error.
♦ Decision and estimation theory can be viewed as a two-person game between Nature and a decision-making agent, with:
– The choices available to Nature, represented as elements of a set $\Theta$
– The decisions that the agent makes, represented as elements of a set $\Delta$
– A loss function $L$
– The observed samples of a random variable $X$, defined over a sample space, whose distribution depends on $\theta$

Definition of a Statistical Game

♦ $\Theta \subseteq \mathbb{R}^k$ is a nonempty set of possible states of nature, or parameters. $\Theta$ is sometimes referred to as the parameter space. An element of $\Theta$ is denoted $\theta$.
♦ $\Delta$ is a nonempty set of possible decisions available to the agent, called the decision space. An element of $\Delta$ is denoted $\delta$.
♦ $L: \Theta \times \Delta \to \mathbb{R}$ is a loss function or cost function.

Definition of a Statistical Game (cont.)

♦ $X: \chi \to \mathbb{R}^n$, $n \ge 1$, is a random variable or vector, with cumulative distribution function $F_X(x \mid \theta)$. The distribution of $X$ is governed by the parameter $\theta \in \Theta$.
♦ $\phi: \chi \to \Delta$ is a decision rule, also termed a strategy, decision function, or test, that provides the coupling between the observations and the decisions.

Elements of the Statistical Decision Game

Randomization

♦ Non-randomized decision rule
– The decision rule $\phi$ is a single function mapping observations into the decision space.
♦ Randomized decision rule
– Let $D$ denote the space of all non-randomized decision rules. A randomized rule $\varphi: D \to [0,1]$ is a probability distribution that specifies the probability of selecting the elements of $D$. $D^*$ is the space of all randomized decision rules. (A small sketch follows below.)
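As an illustration (not from the slides), a randomized decision rule can be sketched as drawing one of two hypothetical deterministic threshold rules according to a distribution $\varphi$ over $D$; the thresholds and probabilities here are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two deterministic rules mapping an observation x to a decision in {0, 1}.
def rule_low(x):   # decide 1 if x exceeds a low threshold
    return int(x > 0.5)

def rule_high(x):  # decide 1 if x exceeds a high threshold
    return int(x > 1.5)

D = [rule_low, rule_high]
probs = [0.3, 0.7]            # varphi: probability of selecting each element of D

def randomized_rule(x):
    """Draw a deterministic rule according to varphi, then apply it to x."""
    phi = D[rng.choice(len(D), p=probs)]
    return phi(x)

x_obs = 1.0
print([randomized_rule(x_obs) for _ in range(10)])  # decisions vary across draws
```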

Special Cases

♦ The binary hypothesis testing problem
♦ $\Delta = \{\delta_0, \delta_1\}$. Corresponding to each decision is a hypothesis.
– By choosing $\delta_0$, the agent accepts hypothesis $H_0$, thus rejecting hypothesis $H_1$.
– By choosing $\delta_1$, the agent accepts hypothesis $H_1$, thus rejecting hypothesis $H_0$.

Special Cases (cont.)

♦ Example: a radar signal is examined at the receiver to determine whether a target is present. The observation is $X = \theta + N$.
– $H_0$: no target present, $\theta \le \theta_0$
– $H_1$: target present, $\theta > \theta_0$
♦ $H_0$ is termed the null hypothesis, and $H_1$ the alternative hypothesis.

Special Cases (cont.)

♦ Four possible outcomes:
– Reject $H_0$ when $H_0$ is true: false alarm (Type I error)
– Reject $H_0$ when $H_0$ is false: correct detection
– Fail to reject $H_0$ when $H_0$ is true: correct detection
– Fail to reject $H_0$ when $H_0$ is false: missed detection (Type II error)
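To make the two error types concrete, here is a minimal sketch assuming the radar model $X = \theta + N$ with $N \sim N(0,1)$ and a simple threshold test; the threshold and the values of $\theta$ under each hypothesis are illustrative, not from the slides.

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Threshold test for X = theta + N, N ~ N(0,1): reject H0 when X > gamma.
# Illustrative values: theta = theta0 = 0 under H0, theta = theta1 = 2 under H1.
theta0, theta1, gamma = 0.0, 2.0, 1.0

p_false_alarm = 1.0 - Phi(gamma - theta0)  # Type I error: reject H0 although theta = theta0
p_miss = Phi(gamma - theta1)               # Type II error: accept H0 although theta = theta1
print(f"P(false alarm) = {p_false_alarm:.3f}, P(miss) = {p_miss:.3f}")
```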

Special Cases (cont.)

♦ Multiple decision problems, also named multiple hypothesis testing problems:
$\Delta = \{\delta_1, \delta_2, \ldots, \delta_M\}, \qquad M \ge 3$
♦ Point estimation of a real parameter $\theta$: $\Delta = \mathbb{R}$
– Example: the squared-error loss $L(\theta, \delta) = (\theta - \delta)^2$

Notational Conventions

♦ $F_X(x \mid \theta)$ indicates a probability distribution for the random variable $X$. Here $\theta$ can be viewed as a parameter, or regarded as a random variable.
♦ $F_X(x \mid \theta)$ is also denoted by the abbreviated form $F_\theta(x)$; similarly, expectations are written $E_\theta[X] = \int x f_\theta(x)\, dx$.
♦ $P_\theta[s]$ denotes the probability of the event $s$ under the condition that $\theta$ is the true parameter.

Populations and Statistics

♦ The problem of estimation is to
– Obtain a set of data (observations)
– Use this information to fashion a guess for the value of an unknown parameter
– One way to achieve this goal is random sampling:
• Sampling: repeat a given experiment a number of times.
• Let $X$ be a random variable known as the population random variable.
• The $i$th repetition involves the creation of a copy of the population on which $X_i$ is defined.
• The distribution of $X_i$ is the same as the distribution of $X$.
• The $X_i$ are called sample random variables or, sometimes, the sample values of $X$.
• Sampling is done with replacement.

Populations and Statistics (cont.)

♦ Statistics
– A function of the sample values of a random variable $X$ is called a statistic of $X$.
– Example: the sample mean
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Conditional Expectation

♦ Continuous distributions:
$E[X \mid Y] = \int x\, f_{X|Y}(x \mid y)\, dx$
♦ Discrete distributions:
$E[X \mid Y] = \sum_x x\, f_{X|Y}(x \mid y)$
♦ The conditional density comes from the definition of conditional probability. With the events $A: \{X = x\}$ and $B: \{Y = y\}$,
$P(A \cap B) = P(X = x, Y = y) = f(x, y), \qquad P(B) = P(Y = y) = g(y)$
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad\Longrightarrow\quad f_{X|Y}(x \mid y) = \frac{f(x, y)}{g(y)}$

Conditional Expectation (cont.)

♦ Properties of conditional expectations:
– $E(X \mid Y) = EX$ if $X$ and $Y$ are independent
– $E(X \mid Y)$ is a function of $Y$
– $E[E(X \mid Y)] = EX$
– $E[g(Y) X \mid Y] = g(Y)\, E(X \mid Y)$, where $g(\cdot)$ is a function
– $E[c \mid Y] = c$ for any constant $c$
– $E[g(Y) \mid Y] = g(Y)$
– $E[(cX + dZ) \mid Y] = c\, E[X \mid Y] + d\, E[Z \mid Y]$ for any constants $c$ and $d$
(A numerical check of the tower property appears below.)
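As a small illustration (not from the slides), the tower property $E[E(X \mid Y)] = EX$ can be checked by Monte Carlo for a pair where $E(X \mid Y) = Y$ by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Dependent pair: Y ~ N(0,1), X = Y + independent noise, so E(X | Y) = Y.
Y = rng.normal(0.0, 1.0, n)
X = Y + rng.normal(0.0, 1.0, n)

# Tower property: E[E(X|Y)] should equal EX. Here E(X|Y) = Y exactly,
# so averaging Y estimates E[E(X|Y)].
print(f"E[X]      ~ {X.mean():+.4f}")
print(f"E[E(X|Y)] ~ {Y.mean():+.4f}   (both should be near 0)")
```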

Transformations of Random Variables

♦ Theorem:
– Let $X$ and $Y$ be continuous random variables with $Y = g(X)$. Suppose $g$ is one-to-one, and both $g$ and its inverse function $g^{-1}$ are continuously differentiable. Then
$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d\, g^{-1}(y)}{dy} \right|$
(A Monte Carlo check follows below.)
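A quick numerical sanity check (not in the original) for the concrete choice $g(x) = e^x$ with $X \sim N(0,1)$; the formula then predicts the lognormal density $f_Y(y) = f_X(\log y)/y$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Y = g(X) = exp(X), X ~ N(0,1). Then g^{-1}(y) = log(y), |d g^{-1}/dy| = 1/y,
# so the theorem predicts f_Y(y) = f_X(log y) / y.
X = rng.normal(0.0, 1.0, 1_000_000)
Y = np.exp(X)

y = 1.5
f_X = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal pdf
f_Y_formula = f_X(np.log(y)) / y

# Empirical density near y via a small histogram bin.
h = 0.01
f_Y_empirical = np.mean((Y > y - h) & (Y < y + h)) / (2 * h)
print(f"formula: {f_Y_formula:.4f}, empirical: {f_Y_empirical:.4f}")
```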

Transformations of Random Variables (cont.)

♦ Proof:
Since $g$ is one-to-one, it is either increasing or decreasing; suppose it is increasing. Let $a$ and $b$ be real numbers such that $a < b$; we have
$P[Y \in (a,b)] = P[g(X) \in (a,b)] = P[X \in (g^{-1}(a), g^{-1}(b))]$
Writing each side as an integral,
$P[Y \in (a,b)] = \int_a^b f_Y(y)\, dy$
$P[X \in (g^{-1}(a), g^{-1}(b))] = \int_{g^{-1}(a)}^{g^{-1}(b)} f_X(x)\, dx = \int_a^b f_X(g^{-1}(y))\, \frac{d\, g^{-1}(y)}{dy}\, dy$
after the change of variable $x = g^{-1}(y)$. Subtracting,
$\int_a^b \left[ f_Y(y) - f_X(g^{-1}(y))\, \frac{d\, g^{-1}(y)}{dy} \right] dy = 0 \qquad (*)$

Transformations of Random Variables (cont.)

If the equation stated by the theorem were not true, there would exist some $y^*$ at which equality does not hold. By the continuity of the density functions $f_X$ and $f_Y$, the integrand in $(*)$ would then be nonzero over some open interval containing $y^*$, so the integral over that interval could not vanish. This yields a contradiction; thus the theorem is true if $g$ is increasing. For the case that $g$ is decreasing, the change of variable reverses the limits of integration as well as the sign of the slope; thus the absolute value is required.

Transformations of Random Variables (cont.)

♦ Example: Suppose that a random variable $X$ has the density
$f_X(x) = \frac{1}{2\sigma^2}\, e^{-x/(2\sigma^2)}, \qquad x \ge 0$
Let $R = g(X) = \sqrt{X}$, so that $x = g^{-1}(r) = r^2$ and $\frac{d\, g^{-1}(r)}{dr} = 2r$. Then
$f_R(r) = f_X(r^2)\, |2r| = \frac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)}, \qquad r \ge 0$
which is the Rayleigh density. (A simulation check follows below.)
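Assuming the setup above ($X$ exponential with mean $2\sigma^2$), a Monte Carlo sketch can confirm the derived Rayleigh density; the sample size and evaluation point are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma = 1.0

# X ~ exponential with mean 2*sigma^2, R = sqrt(X); the transformation theorem
# predicts f_R(r) = (r / sigma^2) * exp(-r^2 / (2 sigma^2))  (Rayleigh).
X = rng.exponential(scale=2 * sigma**2, size=1_000_000)
R = np.sqrt(X)

r = 1.0
f_R_formula = (r / sigma**2) * np.exp(-r**2 / (2 * sigma**2))
h = 0.01
f_R_empirical = np.mean((R > r - h) & (R < r + h)) / (2 * h)
print(f"formula: {f_R_formula:.4f}, empirical: {f_R_empirical:.4f}")
```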

Sufficient Statistics

♦ “How much information must be retained from sample data in order to make valid decisions?”

♦ Let $X$ be a random variable whose distribution depends on a parameter $\theta$. A real-valued function $T$ of $X$ is said to be sufficient for $\theta$ if the conditional distribution of $X$, given $T = t$, is independent of $\theta$. That is, $T$ is sufficient for $\theta$ if
$F_{X|T}(x \mid t, \theta) = F_{X|T}(x \mid t)$

Sufficient Statistics (cont.)

♦ Example: a coin with unknown probability of heads $p$ is independently tossed $n$ times.
– Let $X_i$ be zero if the outcome of the $i$th toss is tails and one if the outcome is heads.
– Let $T$ denote the total number of heads, $T = \sum_{i=1}^{n} X_i$.
– $X_1, \ldots, X_n$ are i.i.d. with common pmf
$f_{X_i}(x_i \mid p) = P(X_i = x_i \mid p) = p^{x_i} (1-p)^{1-x_i}$
– We must show that the conditional probability of $\{X_1, \ldots, X_n\}$, given $T = t$, is independent of $p$:
$f_{X_1,\ldots,X_n \mid T}(x_1, \ldots, x_n \mid t, p) = \frac{P(x_1, \ldots, x_n, T = t \mid p)}{P(T = t \mid p)}$

Sufficient Statistics (cont.)

$f_{X_1,\ldots,X_n \mid T}(x_1, \ldots, x_n \mid t, p) = \frac{P(x_1, \ldots, x_n, T = t \mid p)}{P(T = t \mid p)}$
$P(x_1, \ldots, x_n, T = t \mid p) = p^{x_1}(1-p)^{1-x_1} \cdots\, p^{x_n}(1-p)^{1-x_n} = p^{\sum_i x_i}(1-p)^{n - \sum_i x_i} = p^t (1-p)^{n-t}$
$P(T = t \mid p) = \binom{n}{t} p^t (1-p)^{n-t}$
$\Longrightarrow \quad f_{X_1,\ldots,X_n \mid T}(x_1, \ldots, x_n \mid t, p) = \frac{1}{\binom{n}{t}}$
which is independent of $p$. (An empirical check follows below.)
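Though not part of the slides, a quick simulation can exhibit this sufficiency: conditioned on $T = t$, the observed sequences are (approximately) uniform over the $\binom{n}{t}$ possibilities regardless of the coin bias. The values of $n$, $t$, and the biases are illustrative.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(6)
n, t, trials = 4, 2, 200_000

def conditional_sequences(p):
    """Toss n coins repeatedly; keep only sequences with exactly t heads."""
    X = (rng.random((trials, n)) < p).astype(int)
    kept = X[X.sum(axis=1) == t]
    return Counter(map(tuple, kept))

# For both biases, sequences with T = t should be near-uniform over the
# C(4,2) = 6 possibilities: the conditional law does not depend on p.
for p in (0.3, 0.7):
    counts = conditional_sequences(p)
    total = sum(counts.values())
    freqs = {seq: round(c / total, 3) for seq, c in sorted(counts.items())}
    print(f"p = {p}: {freqs}")
```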

Factorization Theorem

♦ Factorization Theorem
– A convenient mechanism for testing the sufficiency of a statistic.
– Let $X = [X_1, X_2, \ldots, X_n]^T$ be a discrete random vector whose pmf $f_X(x \mid \theta)$ depends on a parameter $\theta \in \Theta$. The statistic $T = t(x)$ is sufficient for $\theta$ if and only if the pmf factors into a product of a function of $t(x)$ and $\theta$, and a function of $x$ alone; that is:
$f_X(x \mid \theta) = b(t(x), \theta)\, a(x)$

Factorization Theorem (cont.)

♦ Proof ($\Rightarrow$): Suppose $T$ is sufficient. The joint pmf of $X$ and $T$ is
$f_{X,T}(x, t(x) \mid \theta) = \begin{cases} f_X(x \mid \theta) & T = t(x) \\ 0 & \text{otherwise} \end{cases}$
so that
$f_X(x \mid \theta) = f_{X,T}(x, t(x) \mid \theta) = f_{X|T}(x \mid t(x), \theta)\, f_T(t(x) \mid \theta) = f_{X|T}(x \mid t(x))\, f_T(t(x) \mid \theta)$
where the last step uses the sufficiency of $T$. Taking
$a(x) = f_{X|T}(x \mid t(x)), \qquad b(t(x), \theta) = f_T(t(x) \mid \theta)$
gives the claimed factorization.

Factorization Theorem (cont.)

♦ Proof ($\Leftarrow$): Suppose $f_X(x \mid \theta) = b(t(x), \theta)\, a(x)$. Choose $t_0$ such that $f_T(t_0 \mid \theta) > 0$ for some $\theta \in \Theta$. Then
$f_{X|T}(x \mid t_0, \theta) = \frac{f_{X,T}(x, t_0 \mid \theta)}{f_T(t_0 \mid \theta)}$
where
$f_{X,T}(x, t_0 \mid \theta) = \begin{cases} f_X(x \mid \theta) & t(x) = t_0 \\ 0 & t(x) \ne t_0 \end{cases}$
and
$f_T(t_0 \mid \theta) = \sum_{x:\, t(x) = t_0} f_X(x \mid \theta) = b(t_0, \theta) \sum_{x:\, t(x) = t_0} a(x)$
Therefore
$f_{X|T}(x \mid t_0, \theta) = \begin{cases} \dfrac{a(x)}{\sum_{x':\, t(x') = t_0} a(x')} & t(x) = t_0 \\ 0 & \text{otherwise} \end{cases}$
which is independent of $\theta$.

Examples of Sufficient Statistics (I)

♦ Bernoulli random variables
– Let $X = [X_1, X_2, \ldots, X_n]^T$ be a random vector, where the $X_i$ are independent Bernoulli random variables.
$f_X(x \mid p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^{\sum_i x_i} (1-p)^{n - \sum_i x_i} = p^t (1-p)^{n-t}, \qquad t = \sum_i x_i$
– By the factorization theorem, with $a(x) = 1$ and $b(t, p) = p^t (1-p)^{n-t}$, the statistic $T = \sum_i X_i$ is sufficient for $p$.

Examples of Sufficient Statistics (II)

♦ Gaussian random variables
– Let $X = [X_1, X_2, \ldots, X_n]^T$ be a random vector, where each $X_i$ is drawn from $N(\mu, \sigma^2)$.
– The joint pdf of these random variables is
$f_X(x \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left[ -(2\sigma^2)^{-1} \sum_{i=1}^{n} (x_i - \mu)^2 \right]$
– Three cases arise (see the sketch below):
• Only $\mu$ is unknown: $\sum_i X_i$ is sufficient
• Only $\sigma^2$ is unknown: $\sum_i (X_i - \mu)^2$ is sufficient
• Both $\mu$ and $\sigma^2$ are unknown: $\left( \sum_i X_i,\, \sum_i X_i^2 \right)$ is jointly sufficient
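To illustrate the "both unknown" case, here is a small sketch (not from the slides) showing that the Gaussian likelihood depends on the data only through $n$, $\sum_i x_i$, and $\sum_i x_i^2$: two different datasets sharing these statistics have identical likelihoods at every $(\mu, \sigma^2)$.

```python
import numpy as np

def gaussian_loglik(x, mu, sigma2):
    """N(mu, sigma2) joint log-likelihood; a function of the data only through
    n, sum(x), sum(x^2):  -n/2 log(2 pi s2) - (Sx2 - 2 mu Sx + n mu^2)/(2 s2)."""
    n, Sx, Sx2 = len(x), np.sum(x), np.sum(np.square(x))
    return -0.5 * n * np.log(2 * np.pi * sigma2) \
           - (Sx2 - 2.0 * mu * Sx + n * mu**2) / (2.0 * sigma2)

# Construct x2 != x1 with the same (sum, sum of squares) as x1.
x1 = np.array([0.0, 1.0, 2.0])
a = 1.5                                    # choose one coordinate freely
s, q = x1.sum() - a, np.square(x1).sum() - a**2
disc = np.sqrt(2.0 * q - s**2)             # remaining pair: b + c = s, b^2 + c^2 = q
x2 = np.array([a, (s + disc) / 2.0, (s - disc) / 2.0])

for mu, s2 in [(0.0, 1.0), (1.0, 4.0)]:
    print(f"{gaussian_loglik(x1, mu, s2):.6f}  {gaussian_loglik(x2, mu, s2):.6f}")
# Each row prints two identical numbers: the likelihood "sees" only the statistics.
```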

Examples of Sufficient Statistics (III)

♦ $X_1, \ldots, X_n$ are drawn from the uniform distribution over $[a, b]$.
– The joint density of $X_1, \ldots, X_n$ is
$f_X(x \mid a, b) = (b - a)^{-n} \prod_{i=1}^{n} I_{[a,b]}(x_i)$
where
$I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$
– Since every $x_i$ lies in $[a, b]$ exactly when $a \le \min_i x_i$ and $\max_i x_i \le b$, the density factors as
$f_X(x \mid a, b) = (b - a)^{-n}\, I_{[a, \infty)}\!\left(\min_i x_i\right) I_{(-\infty, b]}\!\left(\max_i x_i\right)$
– Hence (see the sketch below):
• $a$ is unknown: $\min_i X_i$ is sufficient
• $b$ is unknown: $\max_i X_i$ is sufficient
• $a$ and $b$ are unknown: $(\min_i X_i, \max_i X_i)$ is jointly sufficient
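A companion sketch (not from the slides) for the uniform case: the likelihood depends on the data only through $(\min, \max)$, so scrambling the interior points leaves it unchanged. The sample sizes and intervals are illustrative.

```python
import numpy as np

def uniform_loglik(x, a, b):
    """Joint U[a,b] log-likelihood; depends on the data only through
    n, min(x), max(x)."""
    n = len(x)
    if a <= np.min(x) and np.max(x) <= b:
        return -n * np.log(b - a)
    return -np.inf                      # some observation falls outside [a, b]

rng = np.random.default_rng(8)
x1 = rng.uniform(2.0, 5.0, 20)
# Scramble the interior points while keeping min and max fixed:
lo, hi = x1.min(), x1.max()
x2 = np.concatenate([[lo, hi], rng.uniform(lo, hi, 18)])

for a, b in [(1.0, 6.0), (2.0, 5.5)]:
    print(uniform_loglik(x1, a, b), uniform_loglik(x2, a, b))  # identical pairs
```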

Minimal Sufficient Statistics

♦ Sufficient statistics lead to economy in the design of algorithms to compute estimates, and simplify the requirements for data acquisition and storage, since only the sufficient statistic needs to be retained for purposes of estimation.

♦ Is there a minimal sufficient statistic?
– An example: several statistics can all be sufficient for the same parameter, e.g.
$T_1 = (X_1, \ldots, X_n), \qquad T_2 = \bar{X}, \qquad T_3 = \left( \sum_{i=1}^{n} X_i,\; X_n \right)$

Definition of Minimal Sufficient Statistics

♦ A sufficient statistic for a parameter $\theta \in \Theta$ that is a function of every other sufficient statistic for $\theta$ is said to be a minimal sufficient statistic for $\theta$.
♦ Questions about minimal sufficient statistics:
– Does one always exist?
– If so, is it unique?
– If it exists, how do I find it?

Complete Sufficient Statistics

♦ Let $T$ be a sufficient statistic for a parameter $\theta$, and let $\omega(t)$ be any real-valued function of $T$. $T$ is said to be complete if
$E_\theta[\omega(T)] = 0 \text{ for all } \theta \in \Theta$
implies that
$P_\theta[\omega(T) = 0] = 1 \qquad \forall \theta \in \Theta$

An Example for Complete Sufficient Statistics

♦ Let $X_1, \ldots, X_n$ be a sample from the uniform distribution over the interval $[0, \theta]$, $\theta > 0$, and let $T = \max_j X_j$. We compute the distribution of $T$:
$P(T \le t) = P(X_1 \le t, \ldots, X_n \le t) = \begin{cases} 0 & t < 0 \\ (t/\theta)^n & 0 \le t \le \theta \\ 1 & t > \theta \end{cases}$
so the density of $T$ is
$f_T(t \mid \theta) = \frac{n\, t^{n-1}}{\theta^n}, \qquad 0 \le t \le \theta$
Then
$E_\theta[\omega(T)] = \int \omega(t)\, f_T(t \mid \theta)\, dt = \frac{n}{\theta^n} \int_0^\theta \omega(t)\, t^{n-1}\, dt$
If this vanishes for every $\theta > 0$, then $\int_0^\theta \omega(t)\, t^{n-1}\, dt = 0$ for all $\theta$; differentiating with respect to $\theta$ gives $\omega(\theta)\, \theta^{n-1} = 0$, so $\omega \equiv 0$. Hence $T$ is complete.
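As an aside not in the slides, the density just derived gives $E[T] = n\theta/(n+1)$, so $\frac{n+1}{n} T$ is an unbiased estimator of $\theta$. The sketch below checks both facts by simulation with illustrative values.

```python
import numpy as np

rng = np.random.default_rng(9)
theta, n, trials = 3.0, 10, 100_000

# T = max X_j has density n t^(n-1) / theta^n on [0, theta], so
# E[T] = n theta / (n + 1), and (n+1)/n * T is unbiased for theta.
X = rng.uniform(0.0, theta, size=(trials, n))
T = X.max(axis=1)

print(f"E[T] ~ {T.mean():.4f}  (theory: {n * theta / (n + 1):.4f})")
print(f"E[(n+1)/n * T] ~ {((n + 1) / n * T).mean():.4f}  (theta = {theta})")
```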

Minimal Sufficient Statistics

♦ Theorem: a complete sufficient statistic for a parameter $\theta$ is minimal.

Exponential Families

♦ A family of distributions with pmf or pdf $f_X(x \mid \theta)$ is said to be a $k$-parameter exponential family if $f_X(x \mid \theta)$ has the form
$f_X(x \mid \theta) = a(x)\, c(\theta) \exp\left[ \sum_{i=1}^{k} \pi_i(\theta)\, t_i(x) \right]$
In this definition, $\theta$ may be either a scalar or a vector of parameters.
♦ For a $k$-parameter exponential family, the sufficient statistic
$T = \left( \sum_{j=1}^{n} t_1(X_j), \ldots, \sum_{j=1}^{n} t_k(X_j) \right)^T$
is complete, and therefore a minimal sufficient statistic.

Examples of Exponential Family

• Binomial distribution:
$f_X(x \mid \theta) = \binom{m}{x} \theta^x (1-\theta)^{m-x} = \binom{m}{x} (1-\theta)^m \exp\{ x\, [\log\theta - \log(1-\theta)] \}$

• Normal distribution:
$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right] = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{\mu^2}{2\sigma^2} \right) \exp\left( -\frac{x^2}{2\sigma^2} + \frac{\mu}{\sigma^2} x \right)$

(A numerical check of the binomial identity follows below.)
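A small numerical check (not in the slides) of the binomial identity above, which also names the exponential-family pieces $a$, $c$, $\pi$, $t$ for $k = 1$.

```python
from math import comb, exp, log

# Check the binomial exponential-family identity numerically:
#   C(m,x) theta^x (1-theta)^(m-x)
#     = C(m,x) (1-theta)^m exp{ x [log(theta) - log(1-theta)] }
# Here a(x) = C(m,x), c(theta) = (1-theta)^m, pi(theta) = log(theta/(1-theta)),
# and t(x) = x: a 1-parameter exponential family.
m, theta = 10, 0.3
for x in range(m + 1):
    direct = comb(m, x) * theta**x * (1 - theta)**(m - x)
    expfam = comb(m, x) * (1 - theta)**m * exp(x * (log(theta) - log(1 - theta)))
    assert abs(direct - expfam) < 1e-12
print("binomial pmf matches its exponential-family form for all x")
```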