Page 1

Statistical Decision Theory

Abraham Wald (1902 - 1950)

• Wald’s test

• Rigorous proof of the consistency of MLE

Wald, A. (1949). “Note on the consistency of the maximum likelihood estimate”, Ann. Math. Statist., 20, 595–601.

Page 3

Statistical Decision Theory

A major use of statistical inference is its application to decision making under uncertainty, e.g. parameter estimation and hypothesis testing.

Unlike classical statistics, which is directed only towards using sampling information to make inferences about unknown numerical quantities, decision theory attempts to combine the sampling information with knowledge of the consequences of our decisions.

Page 4

Three elements in SDT

• State of Nature θ: the unknown quantities, e.g. parameters.

• Decision Space D: the space of all possible decisions/actions/rules/estimators.

• Loss function L(θ, d(X)):
– a non-negative function on Θ × D;
– a measure of how much we lose by choosing action d when θ is the true state of nature;
– in estimation, a measure of the accuracy of an estimator d of θ.

Page 5

For example, suppose

θ = 0 means “nuclear warhead is NOT headed to UBC”,
θ = 1 means “nuclear warhead is headed to UBC”,

and the decision space is D = {0, 1} = {Stay in Vancouver, Leave}.

The loss function L(θ, d):

L(0,0) = 0
L(0,1) = cost of moving
L(1,0) = loss of belongings + …
L(1,1) = cost of moving + cost of belongings we cannot move
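As a minimal sketch of how such a loss table drives a decision, the snippet below encodes the four losses with made-up dollar figures (all numbers are illustrative, not from the slides) and picks the action that minimizes expected loss under a belief p = P(θ = 1):

```python
import numpy as np

# Loss table for the warhead example: rows are states theta, columns are
# decisions d. The dollar figures are invented purely for illustration.
L = np.array([
    [0.0,       50_000.0],  # theta = 0: staying costs nothing; leaving costs the move
    [500_000.0, 60_000.0],  # theta = 1: staying loses belongings (and more); leaving
                            # costs the move plus the belongings we cannot take
])

# Given a belief p = P(theta = 1), the expected loss of each decision is
# E[L(Theta, d)] = (1 - p) * L[0, d] + p * L[1, d].
p = 0.01
expected_loss = (1 - p) * L[0] + p * L[1]
best = int(expected_loss.argmin())  # 0 = stay, 1 = leave
print(expected_loss, ["stay", "leave"][best])
```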

Page 6

Common loss functions

• Univariate:
– L1 = |θ − d(x)| (absolute error loss)
– L2 = (θ − d(x))² (squared error loss)

• Multivariate:
– (generalized) Euclidean norm [θ − d(x)]ᵀ Q [θ − d(x)], where Q is positive definite

• More generally:
– non-decreasing functions of L1 or of the Euclidean norm
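A small sketch of these loss functions in Python (the function names are ours, not from the slides):

```python
import numpy as np

def l1_loss(theta, d):
    """Absolute error loss |theta - d|."""
    return abs(theta - d)

def l2_loss(theta, d):
    """Squared error loss (theta - d)^2."""
    return (theta - d) ** 2

def quadratic_loss(theta, d, Q):
    """Generalized Euclidean norm (theta - d)^T Q (theta - d), Q positive definite."""
    e = np.asarray(theta) - np.asarray(d)
    return float(e @ Q @ e)

# Example: 2-dimensional parameter with identity weighting (ordinary squared norm).
Q = np.eye(2)
print(quadratic_loss([1.0, 2.0], [0.5, 2.5], Q))  # 0.25 + 0.25 = 0.5
```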

Page 7

The loss L(θ, d(X)) is random, since it depends on the data X. The two paradigms average it differently:

• Frequentist: the risk R(θ, d) = Eθ[L(θ, d(X))], the expectation over X for fixed θ.

• Bayesian: the posterior risk E[L(Θ, d) | X = x], the expectation over Θ given the observed data x.
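For concreteness, here is a Monte Carlo approximation of the frequentist risk under squared error loss, assuming X1, …, Xn iid N(θ, 1) and the sample mean as estimator (our choice of setting, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mean(X):
    return X.mean(axis=1)

def risk_mc(theta, d, n=20, reps=100_000):
    """Monte Carlo approximation of the frequentist risk
    R(theta, d) = E_theta[(theta - d(X))^2], X = (X_1, ..., X_n) iid N(theta, 1)."""
    X = rng.normal(theta, 1.0, size=(reps, n))
    return np.mean((theta - d(X)) ** 2)

print(risk_mc(2.0, sample_mean))  # close to the exact risk 1/n = 0.05
```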

Page 8

Estimator Comparison

The risk principle: the estimator d1(X) is better than another estimator d2(X) in the sense of risk if R(θ, d1) ≤ R(θ, d2) for all θ, with strict inequality for some θ.

The best estimator (uniformly minimum risk estimator) would minimize the risk over the class of all estimators simultaneously for every θ:

d*(X) = argmin_d R(θ, d(X)) for all θ.

However, such a uniformly best estimator does not exist in general.

Page 9

One remedy is to shrink the class of estimators and then find the best estimator within this smaller class. For instance, consider only mean-unbiased estimators. In particular, the UMVUE is the best unbiased estimator when the squared error loss L2 is used.

[Figure: the class of all estimators is too large for a uniformly best estimator to exist; within a suitably restricted smaller class, one estimator can be the best.]

Page 10

Notice that the risk depends on θ, so the risk functions of two estimators often cross each other. This is another reason why a uniformly best estimator may fail to exist.

A second remedy is to weaken the optimality criterion by considering the maximum value of the risk over all θ, and then choose the estimator with the smallest maximum risk. The best estimator according to this minimax principle is called the minimax estimator.

[Figure: risk curves R(θ, d1) and R(θ, d2) plotted against θ; the curves cross, so each estimator is the winner for some values of θ and the loser for others.]
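A numeric illustration of crossing risk curves and the minimax comparison, using the sample mean versus a shrunken mean for a normal location parameter (exact risks; the setting and shrinkage factor are our illustrative choices):

```python
import numpy as np

n = 10
thetas = np.linspace(-3.0, 3.0, 61)

# Exact squared-error risks for X_1..X_n iid N(theta, 1):
#   d1 = sample mean:            R(theta, d1) = 1/n   (constant in theta)
#   d2 = shrunken mean a * Xbar: bias (a-1)*theta, variance a^2/n, so
#        R(theta, d2) = a^2/n + (a - 1)^2 * theta^2
a = 0.8
R1 = np.full_like(thetas, 1.0 / n)
R2 = a**2 / n + (a - 1.0) ** 2 * thetas**2

# The curves cross: d2 wins near theta = 0 and loses for large |theta|.
print("curves closest near theta =", thetas[np.abs(R1 - R2).argmin()])
print("max risk of d1 on the grid:", R1.max())  # 1/n = 0.1
print("max risk of d2 on the grid:", R2.max())
# Over ALL theta, sup R(theta, d2) is infinite while sup R(theta, d1) = 1/n,
# so the sample mean wins under the minimax principle.
```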

Page 11

Alternatively, in the Bayesian framework we can find a best estimator by minimizing the average risk with respect to a prior π for θ.

Given a prior π for θ, the average risk of the estimator d(X),

rπ(d) = ∫Θ R(θ, d) π(θ) dθ,

is called the Bayes risk of d. The estimator having the smallest Bayes risk with a specific prior π is called the Bayes estimator (with respect to π).

Page 12

Under the Bayes risk principle

With a discrete prior, the Bayes risk is the weighted sum rπ(d) = Σi R(θi, d) π(θi).

[Figure: risk curves R(θ, d1) and R(θ, d2) together with the prior density π(θ); the pointwise winner and loser swap as θ varies, but the Bayes winner is decided by averaging the risks over π.]
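A tiny sketch of this weighted-sum comparison with an illustrative risk table (all numbers invented for the example):

```python
import numpy as np

# Discrete prior pi over states theta_1..theta_3, and a table of risks
# R[j, i] = R(theta_j, d_i) for two estimators (illustrative numbers).
pi = np.array([0.2, 0.5, 0.3])
R = np.array([
    [0.10, 0.30],  # risks at theta_1
    [0.20, 0.05],  # risks at theta_2
    [0.40, 0.25],  # risks at theta_3
])

# Bayes risk r_pi(d_i) = sum_j R(theta_j, d_i) * pi(theta_j)
bayes_risk = pi @ R
print(bayes_risk)                            # [0.24, 0.16]
print("winner: d%d" % (bayes_risk.argmin() + 1))  # d2 has the smaller Bayes risk
```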

Page 13

In general, it is not easy to find the Bayes estimator by directly minimizing the Bayes risk. However, if the Bayes risk of the Bayes estimator is finite, then the estimator that minimizes the posterior risk for each observed x and the Bayes estimator are the same.
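The reason, in one line of algebra: the Bayes risk is the posterior risk integrated against the marginal distribution m(x) of the data, so minimizing the inner integrand for each x minimizes the whole integral:

```latex
r_\pi(d) = \int_\Theta R(\theta, d)\,\pi(\theta)\,d\theta
         = \int_{\mathcal{X}} \Big[\underbrace{\int_\Theta L(\theta, d(x))\,\pi(\theta \mid x)\,d\theta}_{\text{posterior risk of } d(x)}\Big]\, m(x)\,dx,
```

using f(x | θ) π(θ) = π(θ | x) m(x) and Fubini's theorem (valid since the loss is non-negative).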

Page 14

Some examples for finding the Bayes estimator

(1) Squared error loss: minimize the posterior risk

min_d E[(Θ − d)² | x].

Writing f(d) = E[(Θ − d)² | x] = E[Θ² | x] − 2d E[Θ | x] + d², setting f′(d) = −2E[Θ | x] + 2d = 0 gives d = E[Θ | x]. So the minimizer of f(d) is E[Θ | x], i.e. the posterior mean.
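A minimal worked instance, assuming a Beta prior with binomial data (a standard conjugate pair; the numbers are illustrative):

```python
# Beta(alpha, beta) prior on a binomial success probability p; observe x
# successes in n trials. The posterior is Beta(alpha + x, beta + n - x), so
# under squared error loss the Bayes estimator is the posterior mean:
def bayes_estimate_sq_loss(x, n, alpha=1.0, beta=1.0):
    return (alpha + x) / (alpha + beta + n)

print(bayes_estimate_sq_loss(x=7, n=10))  # 8/12 ~ 0.667 with a uniform prior
```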

Page 15

Some examples for finding the Bayes estimator

(2) Absolute error loss: minimize

min_d E[|Θ − d| | x].

The minimizer is med[Θ | x], i.e. the posterior median.

(3) Linear error loss:

L(θ, d) = K0(θ − d) if θ − d ≥ 0,
L(θ, d) = K1(d − θ) if θ − d < 0.

The K0/(K0 + K1)-th quantile of the posterior is the Bayes estimator of θ.
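A numerical sanity check of the quantile result, assuming an illustrative Beta(8, 4) posterior and K0 = 3, K1 = 1; the grid minimizer of the posterior risk should agree with the 0.75-quantile:

```python
import numpy as np
from scipy import stats

# Posterior: take Theta | x ~ Beta(8, 4) as a stand-in for any continuous posterior.
post = stats.beta(8, 4)
K0, K1 = 3.0, 1.0  # under-estimation costs K0 per unit, over-estimation K1

grid = np.linspace(0.0, 1.0, 2001)
pdf = post.pdf(grid)
dx = grid[1] - grid[0]

def posterior_linear_risk(d):
    """E[L(Theta, d) | x] for the linear loss, by numerical integration."""
    loss = np.where(grid >= d, K0 * (grid - d), K1 * (d - grid))
    return np.sum(loss * pdf) * dx

risks = np.array([posterior_linear_risk(d) for d in grid])
print(grid[risks.argmin()])      # numerical minimizer of the posterior risk
print(post.ppf(K0 / (K0 + K1)))  # the 0.75-quantile of the posterior: should match
```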

Page 16

Relationship between minimax and Bayes estimators

Denote by dπ the Bayes estimator with respect to π. If the Bayes risk of dπ is equal to the maximum risk of dπ, i.e.

rπ(dπ) = supθ R(θ, dπ),

then the Bayes estimator dπ is minimax. In particular, if the Bayes estimator has a constant risk, then it is minimax.
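A classical instance of this fact: for X ~ Binomial(n, p) under squared error loss, the estimator d(X) = (X + √n/2)/(n + √n) is Bayes with respect to a Beta(√n/2, √n/2) prior and has constant risk n/(4(n + √n)²), hence it is minimax. The sketch below verifies the constant risk by exact summation:

```python
import numpy as np
from scipy import stats

n = 25

def d(x):
    """Constant-risk Bayes estimator (X + sqrt(n)/2) / (n + sqrt(n))."""
    return (x + np.sqrt(n) / 2) / (n + np.sqrt(n))

x = np.arange(n + 1)
for p in [0.1, 0.3, 0.5, 0.9]:
    pmf = stats.binom.pmf(x, n, p)
    risk = np.sum(pmf * (d(x) - p) ** 2)  # exact expectation over X
    print(p, risk)  # the same value, n / (4 * (n + sqrt(n))**2), for every p
```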

Page 17

Problems with the risk measure:

(1) The risk measure is too sensitive to the choice of loss function.

(2) All estimators are assumed to have finite risks, so the risk measure generally cannot be used in problems with heavy tails or outliers, where risks may be infinite.

Page 18

Other measures:

(1) Pitman measure of closeness, PMC:

d1 is Pitman-closer to θ than d2 if

Pθ( ||d1 − θ||Q ≤ ||d2 − θ||Q ) ≥ 1/2

holds for all θ.

[Figure: θ on a line together with realized values of d1 and d2; d1 is the estimator that more often lands nearer to θ.]
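A Monte Carlo sketch of a PMC comparison, using the sample mean versus the sample median for a normal location parameter at one fixed θ (our illustrative choice of estimators and setting):

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate P_theta(|d1 - theta| <= |d2 - theta|) by simulation, with
# d1 = sample mean and d2 = sample median, X_1..X_n iid N(theta, 1).
theta, n, reps = 0.0, 15, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))
d1 = X.mean(axis=1)
d2 = np.median(X, axis=1)

pmc = np.mean(np.abs(d1 - theta) <= np.abs(d2 - theta))
print(pmc)  # typically above 1/2 here: the mean tends to be Pitman-closer
```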

Page 19

Other measures:

(2) Universal domination, u.d.:

d1(X) is said to universally dominate d2(X) if, for all nondecreasing functions h and all θ,

Eθ[h(||d1(X) − θ||Q)] ≤ Eθ[h(||d2(X) − θ||Q)].

(3) Stochastic domination, s.d.:

d1(X) is said to stochastically dominate d2(X) if, for every c > 0 and all θ,

Pθ(||d1(X) − θ||Q ≤ c) ≥ Pθ(||d2(X) − θ||Q ≤ c).
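An empirical check of stochastic domination at one fixed θ, comparing the full-sample mean with the mean of only half the sample (an estimator it should dominate; the setting is our illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)

# Compare P_theta(|d_i(X) - theta| <= c) for d1 = full-sample mean and
# d2 = mean of the first n/2 observations, X_1..X_n iid N(theta, 1).
theta, n, reps = 1.0, 20, 100_000
X = rng.normal(theta, 1.0, size=(reps, n))
err1 = np.abs(X.mean(axis=1) - theta)               # uses all n observations
err2 = np.abs(X[:, : n // 2].mean(axis=1) - theta)  # uses only the first n/2

for c in [0.1, 0.2, 0.4, 0.8]:
    print(c, np.mean(err1 <= c), np.mean(err2 <= c))
# At every c the full-sample mean is at least as likely to be within c of
# theta, consistent with d1 stochastically dominating d2.
```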

Page 20

Problems:

(1) Pitman measure of closeness, PMC:

[Figure: θ together with three estimators d1, d2, d3. With three or more estimators, the pairwise PMC comparisons can be intransitive: d1 may beat d2 and d2 beat d3 while d3 beats d1, so PMC need not single out a best estimator.]

Page 21

Problems:

(2) Universal domination, u.d.:

Recall that d1(X) universally dominates d2(X) if, for all nondecreasing functions h and all θ,

Eθ[h(||d1(X) − θ||Q)] ≤ Eθ[h(||d2(X) − θ||Q)].

Because expectation is a linear operator, the special case h(t) = at + b with a > 0 already reduces this to comparing Eθ||d1(X) − θ||Q with Eθ||d2(X) − θ||Q, so universal domination implies ordinary risk domination under the norm loss; demanding the inequality for every nondecreasing h at once makes the criterion very stringent.

Page 22

For all nondecreasing functions h, one can also replace the expectation in Eθ[h(||d1(X) − θ||Q)] by an operator T with the property

T[h(y)] = h[T(y)],

i.e. an operator that commutes with monotone transformations. Quantiles, such as the median, have this property, whereas the expectation E does not; comparing h(||d(X) − θ||Q) through such a T for all nondecreasing h is what connects universal domination with stochastic domination.

Page 23

Thank You!!