Week 6 Annotated


Transcript of Week 6 Annotated

Page 1: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

ACTL2002/ACTL5101 Probability and Statistics

© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business

University of New South Wales

[email protected]

Week 6

Probability: Week 1 Week 2 Week 3 Week 4

Estimation: Week 5 Review

Hypothesis testing: Week 7 Week 8 Week 9

Linear regression: Week 10 Week 11 Week 12

Video lectures: Week 1 VL Week 2 VL Week 3 VL Week 4 VL Week 5 VL

Page 2: Week 6 Annotated

Last five weeks

Introduction to probability;

Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;

Special univariate (parametric) distributions (discrete & continuous);

Joint distributions;

Moments & distribution for sample mean and variance;

Convergence; with applications LLN & CLT;

Estimators (MME, MLE, and Bayesian).

1101/1175

Page 3: Week 6 Annotated

This week

Evaluation of estimators:

- UMVUE (unbiased, lowest variance);
- Cramer-Rao lower bound;
- Rao-Blackwell Theorem.

Interval estimation (vs. point estimates last week):

- Pivotal quantity method;
- Confidence interval for: mean, difference between two means, proportions, variance, ratio of two variances, paired difference, and MLE estimates.

Many examples: we are not going to cover all of them in the lecture. Know and be able to apply the method; do not memorize the examples!

1102/1175

Page 4: Week 6 Annotated

Evaluating estimators

Fisher (1922) on good estimators

Evaluating estimators & Interval estimation using CIs

Evaluating estimators
- Fisher (1922) on good estimators
- UMVUE's
- Cramer-Rao Lower Bound (CRLB)
- Consistency
- Sufficient Statistics

Interval estimation using confidence intervals
- Introduction
- The Pivotal Quantity Method
- Examples & Exercises

Maximum Likelihood estimate
- Important properties of MLE estimates
- CI for Maximum Likelihood Estimates

Summary

Page 5: Week 6 Annotated

Evaluating estimators

Fisher (1922) on good estimators

Fisher (1922) on good estimators

Last week we saw three estimators.

There are infinitely many different estimators.

How can we tell whether an estimator is good, or better than another?

Fisher (1922) came up with three conditions for good estimators:

- Efficiency: a good estimator has smaller variance than others;
- Consistency: a good estimator converges to the true value of the parameter;
- Sufficiency: a good estimator contains/uses all the information about our parameter of interest that is present in the data.

1103/1175

Page 6: Week 6 Annotated

Evaluating estimators

Fisher (1922) on good estimators

Methods to Evaluate Estimators

How good is an estimator?

One can compare them using:

i. The Best Unbiased Minimum Variance Estimator;
   - Lowest mean squared error and unbiased;
   - Proved using the Cramer-Rao Lower Bound;
ii. Consistency;
iii. Sufficient Statistics.

In the next slides we will discuss all three.

1104/1175


Page 8: Week 6 Annotated

Evaluating estimators

UMVUE's

Mean Squared Error and Bias

The mean squared error (MSE) of an estimator T(X₁, X₂, …, Xₙ) of a parameter θ is defined as:

MSE = E[(T − θ)²].

The MSE gives the average squared difference between the estimator T(X₁, X₂, …, Xₙ) and θ and is given by:

MSE = E[(T − θ)²] = E[T² + θ² − 2·T·θ] + E[T]² − E[T]²
    = (E[T²] − E[T]²) + (E[T]² + θ² − 2·θ·E[T])
    *= Var(T) + (E[T] − θ)² = Var(T) + (Bias(T))²,

where Bias(T) = E[T] − θ; * note θ is a constant.

An unbiased estimator has: E[T] = θ.

1105/1175

Page 9: Week 6 Annotated

Evaluating estimators

UMVUE's

Example: estimation of Poisson λ

Recall (last week's lecture) the MLE of a Poisson is λ̂_ML = X̄.

We know E[X] = λ, thus the MME estimator is λ̂_MM = X̄.

These are thus both unbiased:

Bias(T) = E[T] − λ = E[X̄] − λ = (1/n)·Σ_{k=1}^{n} E[X_k] − λ = n·λ/n − λ = 0.

The MSE is given by (using unbiasedness):

MSE = Var(T) + (Bias(T))² = Var((1/n)·Σ_{k=1}^{n} X_k) = (1/n²)·Σ_{k=1}^{n} Var(X_k) = (1/n²)·Σ_{k=1}^{n} λ = λ/n.

1106/1175

Page 10: Week 6 Annotated

Evaluating estimators

UMVUE's

Approximation

Note that the variance of the estimator is a function of the parameter we are estimating.

Hence, we do not know Var(T); we approximate it by plugging in the estimator:

V̂ar(T) = λ̂/n = X̄/n.

The square root of this is called the standard error of the estimate:

s(λ̂) = √(X̄/n).

1107/1175
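The estimate λ̂ = x̄, its standard error √(x̄/n), and the result MSE = λ/n can all be checked by simulation; a minimal sketch, with hypothetical values λ = 4 and n = 200:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
lam, n = 4.0, 200                       # hypothetical true lambda and sample size

x = rng.poisson(lam, size=n)
lam_hat = x.mean()                      # MLE/MME of lambda: the sample mean
se = np.sqrt(lam_hat / n)               # standard error s(lambda_hat) = sqrt(x_bar / n)

# Empirical MSE over many replications should be close to lambda / n:
est = rng.poisson(lam, size=(5000, n)).mean(axis=1)
mse = ((est - lam) ** 2).mean()
print(lam_hat, se, mse, lam / n)
```

With these values the empirical MSE sits close to λ/n = 0.02, as the derivation on the previous slide predicts.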

Page 11: Week 6 Annotated

Evaluating estimators

UMVUE's

Functions of θ

Note that we defined τ(θ) as a function of the unknown parameters.

Question: Why might we be interested in determining an estimate of a function of the parameters instead of an estimate of the parameters?

Solution: We might be interested in an estimate of a non-linear transformation of the parameters.

Example: consider Pr(X = 0), where X ∼ Poi(λ):

Pr(X = 0) = e^{−λ}·λ⁰/0!.

We know that E[λ̂] = λ, however:

E[P̂r(X = 0|λ̂)] = E[e^{−λ̂}·λ̂⁰/0!] ≠ e^{−λ}·λ⁰/0!.

1108/1175
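The plug-in bias can be seen numerically: since e^{−λ} is convex, Jensen's inequality gives E[e^{−λ̂}] > e^{−λ} when λ̂ is the Poisson sample mean. A small simulation sketch (hypothetical λ = 1 and n = 10):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
lam, n = 1.0, 10                        # hypothetical true lambda, small sample size

lam_hat = rng.poisson(lam, size=(20000, n)).mean(axis=1)
plug_in = np.exp(-lam_hat)              # plug-in estimate of Pr(X = 0) = exp(-lambda)
print(plug_in.mean(), np.exp(-lam))     # the plug-in estimator is biased upwards here
```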

Page 12: Week 6 Annotated

Evaluating estimators

UMVUE's

UMVUE's

Consider two unbiased estimators, say T₁ and T₂. We define the efficiency of T₁ relative to T₂ as:

eff(T₁, T₂) = Var(T₂)/Var(T₁).

It is clear that if this is larger than 1, then:

Var(T₂) > Var(T₁),

i.e., estimator T₁ has lower variance than estimator T₂.

Thus a high value of eff(T₁, T₂) implies we prefer T₁ over T₂.

1109/1175

Page 13: Week 6 Annotated

Evaluating estimators

UMVUE's

Unbiased Estimators with Minimum Variance (UMVUE's)

An estimator T is said to be a best unbiased estimator of τ(θ) if it satisfies two conditions:

- The estimator T is unbiased, i.e., E[T] = τ(θ);
- The estimator T has the smallest variance, i.e., Var(T) ≤ Var(T*), for any other unbiased estimator T*.

Note that the best unbiased estimator T is often called the uniform minimum variance unbiased estimator (UMVUE) of τ(θ).

1110/1175


Page 15: Week 6 Annotated

Evaluating estimators

Cramer-Rao Lower Bound (CRLB)

Cramer-Rao Lower Bound (CRLB)

How do we prove that T(X₁, X₂, …, Xₙ) has the lowest variance of all unbiased estimators?

Calculate the efficiency relative to all other unbiased estimators? That would take some time; and what is "all"?

Let X₁, X₂, …, Xₙ be a random sample from f_X(x|θ) and let T(X₁, X₂, …, Xₙ) be an unbiased estimator of θ.

The lower bound of the variance (called the Cramer-Rao Lower Bound (CRLB)) for unbiased estimators is:

Var(T(X₁, X₂, …, Xₙ)) ≥ 1/(n·I_{f*}(θ)),

where I_{f*}(θ) is the Fisher information of the parameter θ (see next slide).

1111/1175

Page 16: Week 6 Annotated

Evaluating estimators

Cramer-Rao Lower Bound (CRLB)

Cramer-Rao Lower Bound (CRLB)

Score: S = ∂ℓ(x; θ)/∂θ. The MLE satisfies the FOC ⇒ E[S] = 0.

The Fisher information of the parameter θ is defined to be the function:

I_{f*}(θ) = E[(∂log(f_X(x|θ))/∂θ)²] *= −E[∂²log(f_X(x|θ))/∂θ²]
        **= E[(∂ℓ(x; θ)/∂θ)²]/n **= −E[∂²ℓ(x; θ)/∂θ²]/n,

* see also slides 1166-1168 (we do not need to prove it in this course). The Fisher information is the variance of the score (using a mean of zero). ** using i.i.d. samples.

Note: asymptotically, as n→∞, the MLE attains the CRLB ⇒ the MLE is asymptotically UMVUE.

1112/1175

Page 17: Week 6 Annotated

Evaluating estimators

Cramer-Rao Lower Bound (CRLB)

Exercise: Cramer-Rao Lower Bound (CRLB)

Consider n draws from a Bin(m, p) r.v.:

f_X(x; p) = (m choose x)·p^x·(1 − p)^{m−x}

log(f_X(x; p)) = log((m choose x)) + x·log(p) + (m − x)·log(1 − p).

Question: Find the CRLB.

Solution: First, the Fisher information (* using Var(X) = m·p·(1 − p)):

∂log(f_X(x; p))/∂p = x/p − (m − x)/(1 − p) = (x − m·p)/(p·(1 − p))

(∂log(f_X(x; p))/∂p)² = (x − m·p)²/(p²·(1 − p)²)

I_{f*}(p) = E[(∂log(f_X(x; p))/∂p)²] = E[(X − m·p)²]/(p²·(1 − p)²) = Var(X)/(p²·(1 − p)²) *= m/(p·(1 − p)).

1113/1175

Page 18: Week 6 Annotated

Evaluating estimators

Cramer-Rao Lower Bound (CRLB)

Exercise: Cramer-Rao Lower Bound (CRLB)

Alternatively, we can find the Fisher information by:

∂²log(f_X(x; p))/∂p² = −x/p² − (m − x)/(1 − p)²

I_{f*}(p) = −E[∂²log(f_X(x; p))/∂p²] = −(−E[X]/p² − (m − E[X])/(1 − p)²) = m/(p·(1 − p)).

Thus, the Cramer-Rao Lower Bound is given by:

Var(T(X₁, …, Xₙ)) ≥ 1/(n·m/(p·(1 − p))) = p·(1 − p)/(m·n).

Hence, the minimum variance of the estimator of p decreases if the number of trials per r.v. (i.e., m) increases or the sample size (i.e., n) increases.

1114/1175
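The bound p·(1 − p)/(m·n) can be checked against the variance of the unbiased estimator p̂ = X̄/m, which attains it in this model. A simulation sketch with hypothetical values m = 10, p = 0.3, n = 50:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
m, p, n = 10, 0.3, 50                   # hypothetical Bin(m, p) and sample size

# Unbiased estimator of p from n binomial draws: sample mean divided by m.
p_hat = rng.binomial(m, p, size=(20000, n)).mean(axis=1) / m

crlb = p * (1 - p) / (m * n)            # CRLB = p(1 - p) / (m n)
print(p_hat.var(), crlb)                # the two numbers should be close
```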


Page 20: Week 6 Annotated

Evaluating estimators

Consistency

Consistency

A sequence of estimators {Tₙ} is a consistent sequence of estimators of the parameter θ if for every ε > 0 we have:

lim_{n→∞} Pr(|Tₙ − θ| < ε) = 1,

i.e., Tₙ converges to θ in probability.

A sufficient condition: if Tₙ is a sequence of estimators of a parameter θ that satisfies the following two conditions:

i) lim_{n→∞} Var(Tₙ) = 0 (the uncertainty in the estimate vanishes as n→∞);
ii) lim_{n→∞} Bias(Tₙ) = 0 (the estimator is asymptotically unbiased);

then it is a sequence of consistent estimators of θ (proof using Chebyshev's inequality: Pr(|X − µ| > ε) ≤ σ²/ε²).

1115/1175
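Both conditions can be illustrated for the sample mean of a normal sample, where Var(X̄ₙ) = σ²/n → 0 and the bias is 0. A simulation sketch (hypothetical µ = 2, σ = 3):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
mu, sigma = 2.0, 3.0                    # hypothetical population mean and sd

# Var(X_bar_n) = sigma^2 / n -> 0 and Bias(X_bar_n) = 0, so X_bar_n is consistent.
for n in (10, 100, 1000, 10000):
    xbar = rng.normal(mu, sigma, size=(500, n)).mean(axis=1)
    print(n, xbar.var(), abs(xbar.mean() - mu))
```

The printed variance shrinks roughly by a factor of 10 at each step, matching σ²/n.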

Page 21: Week 6 Annotated

Evaluating estimators

Consistency

Example: consistency of MLEs

Suppose X₁, X₂, …, Xₙ is a random sample from f_X(x|θ).

Let θ̂ be the MLE of θ, so that τ(θ̂) is the MLE of any continuous function τ(θ).

Under certain regularity conditions (e.g., continuous, differentiable, no parameter on the boundaries of x, etc.) on f_X(x|θ), τ(θ̂) is a consistent estimator of τ(θ).

Due to:

√n·(θ̂_n − θ) →d N(0, 1/I_{f*}(θ)).

Proof: See slide 1166.

1116/1175


Page 23: Week 6 Annotated

Evaluating estimators

Sufficient Statistics

Sufficient Statistics

Let (X₁, X₂, …, Xₙ) have joint p.d.f. f(x; θ). A statistic S is said to be sufficient for θ if for any other statistic T the conditional p.d.f. of T given S = s, denoted by f_{T|S}(t), does not depend on θ, for any value of t.

Idea: if S is observed, additional information about θ cannot be obtained from T if the conditional distribution of T given S = s is free of θ.

Factorization Theorem. A necessary and sufficient condition for T(X₁, …, Xₙ) to be a sufficient statistic for θ is that the joint probability function (density function or frequency function) factors in the form:

f_X(x₁, …, xₙ|θ) = g(T(x₁, …, xₙ), θ)·h(x₁, …, xₙ).

1117/1175

Page 24: Week 6 Annotated

Evaluating estimators

Sufficient Statistics

The Rao-Blackwell Theorem

Let Θ̂ be an estimator of θ with E[Θ̂²] < ∞ (i.e., finite) for all θ. Suppose that T is sufficient for θ. Define a new estimator as:

Θ̃ = E[Θ̂|T].

Then for all θ, this new estimator has a smaller (or equal) MSE. We have that:

MSE(Θ̃) ≤ MSE(Θ̂),

or, equivalently:

E[(Θ̃ − θ)²] ≤ E[(Θ̂ − θ)²].

Thus, we see from the Rao-Blackwell theorem that if an estimator is not a function of a sufficient statistic, it can be improved in terms of MSE (proof: see next slides).

1118/1175

Page 25: Week 6 Annotated

Evaluating estimators

Sufficient Statistics

Proof: From * the law of iterated expectations (see week 4):

E[Θ̃] = E[E[Θ̂|T]] *= E[Θ̂],

so to compare the two estimators, we need only compare their variances. Using the conditional variance identity, we have:

Var(Θ̂) = Var(E[Θ̂|T]) + E[Var(Θ̂|T)] = Var(Θ̃) + E[Var(Θ̂|T)],

where Var(Θ̂|T) ≥ 0. Thus, Var(Θ̂) > Var(Θ̃), unless Var(Θ̂|T) = 0. This is the case only if Θ̂ is a function of T, which would imply Θ̃ = Θ̂.

1119/1175

Page 26: Week 6 Annotated

Evaluating estimators

Sufficient Statistics

The Rao-Blackwell Theorem

How do we explain this last clause? Well,

Var(Θ̂|T) = 0 ⇔ ∫ (θ̂ − E[Θ̂|T = t])²·f_{Θ̂|T}(θ̂|t) dθ̂ = 0,

for all possible realizations t of T, and so:

Var(Θ̂|T) = 0 ⇔ θ̂ = E[Θ̂|T = t],

which implies θ̂ is a function of t, and thus:

Var(Θ̂|T) = 0 ⇔ Θ̂ = E[Θ̂|T] = Θ̃,

as stated above.

1120/1175

Page 27: Week 6 Annotated

Evaluating estimators

Sufficient Statistics

Example: Sufficient Statistic for the Exponential distribution

Consider a random sample Xᵢ ∼ EXP(λ) for i = 1, …, n. The joint p.d.f. is:

f_X(x₁, …, xₙ; λ) = λⁿ·exp(−λ·Σ_{i=1}^{n} xᵢ).

This suggests checking the statistic S = Σ_{i=1}^{n} Xᵢ. We know S ∼ Gamma(n, λ), so that:

f_S(s; λ) = λⁿ/Γ(n)·s^{n−1}·exp(−λ·s).

The conditional density given S = s is:

f_X(x₁, …, xₙ; λ)/f_S(s; λ) = λⁿ·exp(−λ·Σ_{i=1}^{n} xᵢ)/(λⁿ/Γ(n)·s^{n−1}·exp(−λ·s)) = Γ(n)/s^{n−1},

which is free of λ, thus S is sufficient for λ.

1121/1175
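The ratio Γ(n)/s^{n−1} being free of λ can be verified numerically: evaluating the joint density over the Gamma density at two different values of λ gives the same answer. A minimal sketch with a hypothetical sample of n = 4 observations:

```python
import math
import numpy as np

x = np.array([0.5, 1.2, 0.3, 2.0])      # hypothetical exponential observations
n, s = len(x), x.sum()

def ratio(lam):
    """Joint density of the sample over the Gamma(n, lam) density of S = sum(x)."""
    joint = lam**n * math.exp(-lam * s)
    f_s = lam**n / math.gamma(n) * s**(n - 1) * math.exp(-lam * s)
    return joint / f_s

# The ratio equals Gamma(n) / s^(n-1), whatever lambda is:
print(ratio(0.5), ratio(3.0), math.gamma(n) / s**(n - 1))
```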


Page 29: Week 6 Annotated

Interval estimation using confidence intervals

Introduction

Introduction

Last week we saw point estimators;

Point estimators: using a sample, we try to describe the distribution of a population;

However, the sample itself is a random variable;

This implies that parameters estimated using a sample are uncertain!

You should take that into account, especially when you are interested in tail risk (example for an insurer: probability of ruin).

Using a point estimate would underestimate the "true" risk.

1122/1175

Page 30: Week 6 Annotated

Interval estimation using confidence intervals

Introduction

Application: parameter risk

See Excel file.

We have 25 samples of 100 simulated observations of a N(8, 12²) random variable.

For each sample we can estimate the parameters of the normal distribution.

Using the parameters we estimate the 99.5% percentile (VaR required capital) for each sample, or the expected shortfall E[Y|Y > b] where b = µ_Y + σ_Y·Φ⁻¹(0.99).

Large variation in required capital between samples: between ±35 and ±43.

The parameters themselves are a source of uncertainty!

1123/1175

Page 31: Week 6 Annotated

Interval estimation using confidence intervals

Introduction

Parametric Interval Estimation

An interval estimate of a parameter θ has the form θ̂₁ < θ < θ̂₂, where θ̂₁ and θ̂₂ are realized values of suitable random variables θ̂₁(X₁, …, Xₙ) and θ̂₂(X₁, …, Xₙ), which are functions of the random sample X₁, …, Xₙ.

Construct the interval:

Pr(θ̂₁(X₁, …, Xₙ) < θ < θ̂₂(X₁, …, Xₙ)) = 1 − α,

for some specified 0 ≤ α ≤ 1, and then we define:

(θ̂₁(X₁, …, Xₙ), θ̂₂(X₁, …, Xₙ))

as the 100(1 − α)% confidence interval for θ.

1124/1175

Page 32: Week 6 Annotated

Interval estimation using confidence intervals

Introduction

Example

Consider an i.i.d. sample of size 4, X₁, X₂, X₃, X₄ from N(µ, 1). Recall that we can estimate the population mean µ by X̄. The probability that µ will be in the range (X̄ − 1, X̄ + 1) is:

Pr(X̄ − 1 < µ < X̄ + 1) = Pr(−1 < X̄ − µ < 1) *= Pr(−√4 < Z < √4) = Φ(2) − (1 − Φ(2)) = 0.9544.

* using the m.g.f. technique we have X̄ ∼ N(µ, σ²/n).

Thus, µ is in the range (X̄ − 1, X̄ + 1) with probability 0.9544.

Use: Φ(4) = 0.999968, Φ(2) = 0.97725, Φ(1) = 0.8413, Φ(0.25) = 0.5987.

1125/1175


Page 34: Week 6 Annotated

Interval estimation using confidence intervals

The Pivotal Quantity Method

The Pivotal Quantity Method

The general method for constructing confidence intervals is the pivotal quantity method.

1. Find a pivot: i.e., a function of X₁, …, Xₙ and θ whose distribution does not depend on θ.

2. Find the function g(X₁, …, Xₙ; θ):

The pivotal quantity method requires finding a function of the form g(X₁, …, Xₙ; θ), so that it is known that for quantiles q₁ and q₂ we have:

Pr(q₁ < g(X₁, …, Xₙ; θ) < q₂) = 1 − α,

with q₁ ≤ q₂.

Continues on the next slide.

1126/1175

Page 35: Week 6 Annotated

Interval estimation using confidence intervals

The Pivotal Quantity Method

The Pivotal Quantity Method

Thus, let g(X₁, …, Xₙ; θ) be a monotonic function of θ and let it have a unique inverse θ = g⁻¹(X₁, …, Xₙ; q).

3. The 100(1 − α)% confidence interval of θ is given by:

g⁻¹(X₁, …, Xₙ; q₁) < θ < g⁻¹(X₁, …, Xₙ; q₂),

if g(X₁, …, Xₙ; θ) is an increasing function of θ, and

g⁻¹(X₁, …, Xₙ; q₂) < θ < g⁻¹(X₁, …, Xₙ; q₁),

if g(X₁, …, Xₙ; θ) is a decreasing function of θ.

See graph on slide 1128.

1127/1175

Page 36: Week 6 Annotated

Interval estimation using confidence intervals

The Pivotal Quantity Method

Confidence intervals

[Figure: p.d.f. of Y = 2·X̄·θ·n ∼ χ²(2n), with quantiles q₁ and q₂ marked so that the probability between them is 1 − α; the shaded tails have probabilities at the χ² quantile points. It illustrates Pr(q₁ < g(X₁, …, Xₙ; θ) < q₂) = 1 − α.]

1128/1175


Page 38: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: Pivotal quantity method and the Exponential

Suppose X₁, X₂, …, Xₙ is a random sample from the Exp(λ) distribution (with M_{Xᵢ}(t) = (1 − t/λ)⁻¹). We know that (week 2):

n·X̄ = Σ_{k=1}^{n} X_k ∼ Gamma(n, λ).

We know that the m.g.f. of n·X̄ is:

M_{n·X̄}(t) = E[exp(Σ_{k=1}^{n} X_k·t)] = (M_{Xᵢ}(t))ⁿ = (1 − t/λ)⁻ⁿ,

and the m.g.f. of the random variable 2·n·λ·X̄ is:

M_{2·n·λ·X̄}(t) = M_{n·X̄}(2·λ·t) = (1 − 2·λ·t/λ)⁻ⁿ = (1/(1 − 2·t))^{2·n/2}.

1129/1175

Page 39: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

1. The pivot is (recall from week 5):

2·n·λ·X̄ ∼ Gamma(n, 1/2) = χ²(2·n).

Its distribution is free of the parameter value λ, thus it is a pivot. We therefore denote the quantiles from the χ²-distribution as (F&T pages 164-166 & survival: 168, 169; see graph on slide 1128):

q₁ = χ²_{α/2}(2·n) and q₂ = χ²_{1−α/2}(2·n).

2. The function g(X₁, …, Xₙ; λ) = 2·n·λ·X̄ (increasing in λ):

Pr(χ²_{α/2}(2·n) < 2·n·λ·X̄ < χ²_{1−α/2}(2·n)) = 1 − α.

3. Hence, a 100(1 − α)% confidence interval for λ is:

χ²_{α/2}(2·n)/(2·n·x̄) < λ < χ²_{1−α/2}(2·n)/(2·n·x̄).

1130/1175
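The three steps can be sketched in code, using scipy's χ² quantile function in place of the F&T tables; λ = 2, n = 40 and α = 0.05 are hypothetical choices:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(seed=5)
lam, n, alpha = 2.0, 40, 0.05           # hypothetical true rate, sample size, level
x = rng.exponential(scale=1 / lam, size=n)

# Pivot: 2 n lambda X_bar ~ chi2(2n), increasing in lambda, so invert the quantiles.
lo = chi2.ppf(alpha / 2, 2 * n) / (2 * n * x.mean())
hi = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * n * x.mean())
print(lo, hi)                           # 95% confidence interval for lambda
```

By construction the interval always contains the point estimate λ̂ = 1/x̄, since the two χ² quantiles straddle the mean 2n of the χ²(2n) distribution.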

Page 40: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: Confidence Interval for the Mean

Recall from week 5.

Suppose X₁, X₂, …, Xₙ are independent, identically distributed random variables with finite mean µ and finite variance σ². As before, denote the sample mean by X̄ₙ.

Then, the central limit theorem states:

(X̄ₙ − µ)/(σ/√n) →d N(0, 1), as n→∞.

This holds for all r.v.s with finite mean and variance, not only normal r.v.s!

Suppose X₁, …, Xₙ is a random sample from a population with mean µ and known variance σ².

Question: Find the CI for µ.

1131/1175

Page 41: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Solution: By the central limit theorem, X̄ is approximately normally distributed with mean µ_{X̄} = µ and (population) variance σ²_{X̄} = σ²/n.

1. Our pivot is Z = (X̄ − µ_{X̄})/σ_{X̄} = (X̄ − µ)/(σ/√n) ∼ N(0, 1).

2. The function g(X₁, …, Xₙ; µ) = (X̄ − µ)/(σ/√n) (decreasing in µ). Using:

Pr(z_{α/2} < Z < z_{1−α/2}) = 1 − α,

we then have:

Pr(z_{α/2} < (X̄ − µ)/(σ/√n) < z_{1−α/2}) = 1 − α
Pr(z_{α/2}·σ/√n − X̄ < −µ < z_{1−α/2}·σ/√n − X̄) = 1 − α
Pr(X̄ − σ/√n·z_{1−α/2} < µ < X̄ − σ/√n·z_{α/2}) = 1 − α.

1132/1175

Page 42: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: Confidence Interval for the Mean

3. Thus we have:

x̄ − σ/√n·z_{1−α/2} < µ < x̄ + σ/√n·z_{1−α/2},

where z_{1−α/2} is the point on the standard normal for which the probability above it is α/2 (note the symmetry of the standard normal distribution). This is an approximate 100(1 − α)% confidence interval for µ (given known population variance σ²).

Question: Why an approximate 100(1 − α)% confidence interval?

Solution: Recall, X̄ is only asymptotically normally distributed, using the CLT (except when the Xᵢ are i.i.d. normally distributed, in which case the interval is exact).

1133/1175
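A minimal sketch of the known-variance interval x̄ ± z_{1−α/2}·σ/√n; the inputs (x̄ = 8, σ = 12, n = 100) are hypothetical, echoing the N(8, 12²) illustration earlier:

```python
import math

def mean_ci(x_bar, sigma, n, z=1.96):   # z = z_{0.975} for a 95% interval
    """Known-variance CI for the mean: x_bar +/- z * sigma / sqrt(n)."""
    half = z * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

print(mean_ci(8.0, 12.0, 100))          # half-width: 1.96 * 12 / 10 = 2.352
```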

Page 43: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Confidence Interval for the Mean

For the standard normal distribution, we have the following (often used) quantiles:

        two-sided    one-sided
α       z_{1−α/2}    z_{1−α}
1%      2.576        2.326
5%      1.96         1.645
10%     1.645        1.282

Note that the above gives the confidence interval for the mean both when the population variance is known and, as an approximation, when it is not; the approximation improves with increasing sample size. The same confidence interval formula for the mean holds even if the population variance is replaced by the sample variance, provided the sample is large (generally, n > 30 is a rule of thumb for large samples).

1134/1175

Page 44: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the mean, unknown Variance, Small Sample

Let X₁, …, Xₙ be a random sample from a population with mean µ and unknown variance σ² (but with known sample variance s²).

a. Question: What is the pivot? See week 5 online lecture.

b. Question: Find an (approximate) 100(1 − α)% confidence interval for µ.

a. Solution: The pivot is:

T = (X̄ − µ)/(S/√n) = Z/√[((n − 1)·S²/σ²)/(n − 1)] ∼ t_{n−1},

where Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1) and (n − 1)·S²/σ² ∼ χ²(n − 1).

The function g(X₁, …, Xₙ; µ) = (X̄ − µ)/(s/√n) (decreasing in µ).

1135/1175

Page 45: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the mean, unknown Variance, Small Sample

b. Solution: An approximate 100(1 − α)% confidence interval for µ is given by:

x̄ − s/√n·t_{1−α/2,n−1} < µ < x̄ + s/√n·t_{1−α/2,n−1},

where t_{1−α/2,n−1} is the point on the t-distribution with n − 1 degrees of freedom for which the probability above it is α/2.

Tables of percentiles (quantiles) of the t-distribution are given in F&T page 163 (note the symmetry of the distribution).

Note: t_{n−1} →d N(0, 1) as n→∞; this is often used for large samples.

Interpretation: as n→∞ we have s → σ.

1136/1175
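The same interval with s in place of σ and the t quantile in place of z can be sketched as follows, using scipy's t quantile function instead of the F&T tables (the sample is hypothetical):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(seed=6)
x = rng.normal(5.0, 2.0, size=15)       # hypothetical small sample, variance unknown

n, alpha = len(x), 0.05
q = t.ppf(1 - alpha / 2, df=n - 1)      # t_{1-alpha/2, n-1}
half = q * x.std(ddof=1) / np.sqrt(n)   # t quantile times s / sqrt(n)
lo, hi = x.mean() - half, x.mean() + half
print(lo, hi)                           # wider than the z-interval, since q > 1.96
```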

Page 46: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the variance

Let X₁, …, Xₙ be a random sample from N(µ, σ²).

We suppose that µ is not known and we wish to construct a 100(1 − α)% confidence interval for σ².

a. Question: What is the pivot? See week 5 online lecture.

b. Question: Find an (approximate) 100(1 − α)% confidence interval for σ².

1137/1175

Page 47: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the variance

Define the quantities χ²_{α/2}(n − 1) and χ²_{1−α/2}(n − 1) by:

Pr(X ≤ χ²_{α/2}(n − 1)) = α/2
Pr(X ≤ χ²_{1−α/2}(n − 1)) = 1 − α/2,

where X ∼ χ²(n − 1). See F&T tables pages 164-169.

a. Solution: We know from week 5 that the pivot is:

(n − 1)·S²/σ² ∼ χ²(n − 1).

The function g(X₁, …, Xₙ; σ²) = (n − 1)·s²/σ² (decreasing in σ²).

1138/1175

Page 48: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the variance

b. Solution:

Pr(χ²_{α/2}(n − 1) < (n − 1)·S²/σ² < χ²_{1−α/2}(n − 1)) = 1 − α.

Rewriting, we obtain:

Pr((n − 1)·S²/χ²_{1−α/2}(n − 1) < σ² < (n − 1)·S²/χ²_{α/2}(n − 1)) = 1 − α.

A 100(1 − α)% confidence interval estimate for σ² is:

(n − 1)·s²/χ²_{1−α/2}(n − 1) < σ² < (n − 1)·s²/χ²_{α/2}(n − 1),

where s² is the observed sample variance.

1139/1175
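The interval can be sketched in code, with scipy's χ² quantiles standing in for the F&T tables (σ = 2 and n = 30 are hypothetical):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(seed=7)
n, alpha = 30, 0.05
x = rng.normal(0.0, 2.0, size=n)        # hypothetical N(0, 4) sample

s2 = x.var(ddof=1)                      # sample variance
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)
print(lo, s2, hi)                       # note: the interval is not symmetric around s2
```

The asymmetry around s² comes from the skewness of the χ²(n − 1) distribution.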

Page 49: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for ratios of two variances

When comparing the variances of two populations, the ratio of the variances (rather than the difference) is considered, because there is a pivotal quantity available for the ratio of the variances that has an F-distribution.

Assume that we have two sets of samples:

X₁₁, X₁₂, …, X₁n₁, from N(µ₁, σ₁²),
and
X₂₁, X₂₂, …, X₂n₂, from N(µ₂, σ₂²).

Denote the respective sample variances by S₁² and S₂².

Application: Is one portfolio riskier than another?

1140/1175

Page 50: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for ratios of two variances

Recall that:

(n₁ − 1)·S₁²/σ₁² ∼ χ²(n₁ − 1) and (n₂ − 1)·S₂²/σ₂² ∼ χ²(n₂ − 1).

1. The pivot is:

[(n₁ − 1)·S₁²/σ₁²]/(n₁ − 1) divided by [(n₂ − 1)·S₂²/σ₂²]/(n₂ − 1) = [χ²(n₁ − 1)/(n₁ − 1)]/[χ²(n₂ − 1)/(n₂ − 1)] ∼ F(n₁ − 1, n₂ − 1),

which simplifies to:

(S₁²/σ₁²)/(S₂²/σ₂²) = (σ₂²/σ₁²)·(S₁²/S₂²) ∼ F(n₁ − 1, n₂ − 1).

2. The function g(X₁, …, Xₙ; σ₁²/σ₂²) = (σ₂²/σ₁²)·(s₁²/s₂²) (decreasing in σ₁²/σ₂²).

1141/1175

Page 51: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for ratios of two variances

So that:

Pr(F_{α/2}(n₁ − 1, n₂ − 1) < (σ₂²/σ₁²)·(S₁²/S₂²) < F_{1−α/2}(n₁ − 1, n₂ − 1)) = 1 − α

Pr((S₂²/S₁²)·F_{α/2}(n₁ − 1, n₂ − 1) < σ₂²/σ₁² < (S₂²/S₁²)·F_{1−α/2}(n₁ − 1, n₂ − 1)) = 1 − α

Pr((S₁²/S₂²)·1/F_{1−α/2}(n₁ − 1, n₂ − 1) < σ₁²/σ₂² < (S₁²/S₂²)·1/F_{α/2}(n₁ − 1, n₂ − 1)) = 1 − α,

where F_{α/2}(n₁ − 1, n₂ − 1) and F_{1−α/2}(n₁ − 1, n₂ − 1) are determined from the table of the F-distribution (see F&T pages 170-174).

1142/1175

Page 52: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

[Figure: Snedecor's F p.d.f. and c.d.f. for (n₁ = 3, n₂ = 15), with quantiles 0.23, 0.83 and 2.25 at probabilities 1/8, 1/2 and 7/8, and for (n₁ = 15, n₂ = 3), with quantiles 0.45, 1.21 and 4.37; illustrating F_{1−α/2}(n₂ − 1, n₁ − 1) = 1/F_{α/2}(n₁ − 1, n₂ − 1).]

1143/1175

Page 53: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for ratios of two variances

Note that we have:

F_{1−α/2}(n₂ − 1, n₁ − 1) = 1/F_{α/2}(n₁ − 1, n₂ − 1).

Note: the F&T tables only give tables for 1 − α = 0.1, 1 − α = 0.05, 1 − α = 0.025, or 1 − α = 0.01.

3. A 100(1 − α)% confidence interval estimate for σ₁²/σ₂² is given by:

(s₁²/s₂²)·1/F_{1−α/2}(n₁ − 1, n₂ − 1) < σ₁²/σ₂² < (s₁²/s₂²)·F_{1−α/2}(n₂ − 1, n₁ − 1),

where s₁² and s₂² are the observed sample variances from the two populations.

1144/1175

Page 54: Week 6 Annotated

Interval estimation using confidence intervals

Examples & Exercises

Application: CI for Ratios of Two Variances

ABC Manufacturing Company makes computer chips in the Asia Pacific region. It has been alleged that the price of its computer chip is less variable in Asia than in the Pacific. A total of 230 random purchases of ABC's computer chips were made in the region, and the following sample statistics were determined:

Asia: n₁ = 179, S₁ = 0.68;
Pacific: n₂ = 51, S₂ = 0.85.

Question: Construct a 95% confidence interval for σ₁²/σ₂².

One may use F_{1−0.025}(178, 50) ≈ 1.56 and F_{1−0.025}(50, 178) ≈ 1.435. These are approximated from the F tables. For degrees of freedom much larger than 120, just use the corresponding value at ∞.

1145/1175

Page 55: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Application: CI for Ratios of Two Variances

Solution: A 95% confidence interval for σ_1^2 / σ_2^2 is:

(0.68/0.85)^2 · 1/F_{1−0.025}(178, 50) < σ_1^2 / σ_2^2 < (0.68/0.85)^2 · F_{1−0.025}(50, 178)

(0.68/0.85)^2 · 1/1.56 < σ_1^2 / σ_2^2 < (0.68/0.85)^2 · 1.435

0.4103 < σ_1^2 / σ_2^2 < 0.9184

1146/1175

Page 56: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the Difference Between Two Means

Consider two sets of independent random samples from two different normal populations:

- X_11, X_12, . . . , X_1n1 from N(μ_1, σ_1^2) (sample size: n_1);
- X_21, X_22, . . . , X_2n2 from N(μ_2, σ_2^2) (sample size: n_2).

a. Question: What is the distribution of X̄_1 − X̄_2?

b. Question: What is the pivot for X̄_1 − X̄_2?

c. Question: What is an (approximate) 100(1 − α)% confidence interval for (μ_1 − μ_2)?

1147/1175

Page 57: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the Difference Between Two Means

a. Solution: Recall (week 4) that the statistic X̄_1 − X̄_2 is normally distributed with mean:

E[X̄_1 − X̄_2] = μ_1 − μ_2,

and variance:

Var(X̄_1 − X̄_2) = Var(X̄_1) + Var(X̄_2) − 2Cov(X̄_1, X̄_2) *= σ_1^2/n_1 + σ_2^2/n_2.

* using Cov(X̄_1, X̄_2) = 0 for independent samples.

b. Solution: To construct a confidence interval for μ_1 − μ_2, we use the pivot:

( (X̄_1 − X̄_2) − (μ_1 − μ_2) ) / √( σ_1^2/n_1 + σ_2^2/n_2 ) ∼ N(0, 1).

1148/1175

Page 58: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for the Difference Between Two Means

We have the (decreasing) function:

g(X_1, . . . , X_n; μ_1 − μ_2) = ( (X̄_1 − X̄_2) − (μ_1 − μ_2) ) / √( σ_1^2/n_1 + σ_2^2/n_2 ).

Pr( (X̄_1 − X̄_2) − √( σ_1^2/n_1 + σ_2^2/n_2 ) · z_{1−α/2} < μ_1 − μ_2 < (X̄_1 − X̄_2) + √( σ_1^2/n_1 + σ_2^2/n_2 ) · z_{1−α/2} ) = 1 − α.

c. Solution: An approximate 100(1 − α)% confidence interval for (μ_1 − μ_2) is given by:

(x̄_1 − x̄_2) − √( σ_1^2/n_1 + σ_2^2/n_2 ) · z_{1−α/2} < μ_1 − μ_2 < (x̄_1 − x̄_2) + √( σ_1^2/n_1 + σ_2^2/n_2 ) · z_{1−α/2},

where z_{1−α/2} is the point on the standard normal for which the probability above it is α/2.

1149/1175
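As a sketch of step c (not from the slides), the interval can be wrapped in a small helper; the name diff_means_ci is illustrative, and scipy supplies the normal quantile:

```python
# Sketch (not from the slides): z-based CI for mu1 - mu2 with known variances.
from math import sqrt
from scipy.stats import norm

def diff_means_ci(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """Approximate 100(1 - alpha)% CI for mu1 - mu2 (known variances)."""
    z = norm.ppf(1 - alpha / 2)               # z_{1 - alpha/2}
    half = z * sqrt(var1 / n1 + var2 / n2)
    return xbar1 - xbar2 - half, xbar1 - xbar2 + half

# Called with the "after" sample first, this reproduces Solution 1 of the
# marine-insurance application later in the slides: about (-19.15, -0.85).
lo, hi = diff_means_ci(140, 150, 21**2, 25**2, 25, 150)
```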

Page 59: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

CI for difference in means when variances are equal/not equal

The previous slides did not assume that the variances in the two samples were equal.

Sometimes only the location should change, not the volatility.

In that case we can gain information by combining the volatility information from the two samples. This leads to better estimates.

Be cautious about when to use it!

1150/1175

Page 60: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for difference in means when variances are equal

Consider the case where σ_1 = σ_2 = σ; then the random variable:

Z = ( (X̄_1 − X̄_2) − (μ_1 − μ_2) ) / ( σ · √(1/n_1 + 1/n_2) )

has a standard normal distribution.

σ^2 can be estimated by pooling the squared deviations from the means of the two samples, with the pooled estimator:

S_p^2 = ( (n_1 − 1) S_1^2 + (n_2 − 1) S_2^2 ) / ( n_1 + n_2 − 2 ).

This is unbiased, that is, E[S_p^2] = σ^2.

1151/1175

Page 61: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for difference in means when variances are equal

Also we have:

(n_1 − 1) S_1^2 / σ^2 ∼ χ^2(n_1 − 1)  and  (n_2 − 1) S_2^2 / σ^2 ∼ χ^2(n_2 − 1).

Hence, the sum:

Y = (n_1 − 1) S_1^2 / σ^2 + (n_2 − 1) S_2^2 / σ^2 = (n_1 + n_2 − 2) S_p^2 / σ^2 ∼ χ^2(n_1 + n_2 − 2),

each term being a sum of squared independent standard normals, since the sum of two independent chi-squared random variables is another chi-squared random variable with degrees of freedom equal to the sum of the d.f.'s.

Question: Find the CI for μ_1 − μ_2.

1152/1175

Page 62: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for difference in means when variances are equal

Solution: Recall the t-distribution definition (week 5).

1. Use as pivot the random variable:

T = Z / √( Y / (n_1 + n_2 − 2) ) = ( (X̄_1 − X̄_2) − (μ_1 − μ_2) ) / ( S_p · √(1/n_1 + 1/n_2) ) ∼ t_{n_1+n_2−2}.

Here S_p is the pooled standard deviation (see slide 1151).

2. We have (decreasing function):

g(X_1, . . . , X_n; μ_1 − μ_2) = ( (X̄_1 − X̄_2) − (μ_1 − μ_2) ) / ( S_p · √(1/n_1 + 1/n_2) ).

1153/1175

Page 63: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for difference in means when variances are equal

We have:

Pr( (X̄_1 − X̄_2) − t_{1−α/2,n_1+n_2−2} · S_p · √(1/n_1 + 1/n_2) < μ_1 − μ_2 < (X̄_1 − X̄_2) + t_{1−α/2,n_1+n_2−2} · S_p · √(1/n_1 + 1/n_2) ) = 1 − α.

3. An approximate 100(1 − α)% confidence interval for (μ_1 − μ_2) is given by:

(x̄_1 − x̄_2) − t_{1−α/2,n_1+n_2−2} · s_p · √(1/n_1 + 1/n_2) < μ_1 − μ_2 < (x̄_1 − x̄_2) + t_{1−α/2,n_1+n_2−2} · s_p · √(1/n_1 + 1/n_2),

where t_{1−α/2,n_1+n_2−2} is the point on the t-distribution (with n_1 + n_2 − 2 degrees of freedom) for which the probability above it is α/2.

1154/1175

Page 64: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Application: CI for difference in means

An insurance company offers marine insurance.

Up to two years ago, the insurer had 150 contracts, with sample mean claim size $150 and sample standard deviation $25.

Last year, the insurer introduced a small deductible in the contract. The number of contracts after the introduction was 25, with sample mean claim size $140 and sample standard deviation $21.

Question: What is the 95% confidence interval for the change in mean claim size due to the introduction of the deductible?

Solution 1 (separate variances): z_{0.025} = 1.96, √(25^2/150 + 21^2/25) = 4.66976. CI: (−19.15, −0.85).

Solution 2 (pooled): t_{0.025,173} ≈ 1.96, s_p · √(1/150 + 1/25) = √( (149 · 25^2 + 24 · 21^2)/173 ) · √(1/150 + 1/25) = 5.289183. CI: (−20.37, 0.37).

1155/1175
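A sketch (not from the slides) reproducing both solutions; following the slide, the t quantile t_{0.025,173} is approximated by the normal value 1.96:

```python
# Sketch (not from the slides): reproducing Solutions 1 and 2 numerically.
from math import sqrt

n1, xbar1, s1 = 150, 150.0, 25.0   # before the deductible
n2, xbar2, s2 = 25, 140.0, 21.0    # after the deductible
z = 1.96                           # approximates t_{0.025,173} as well
d = xbar2 - xbar1                  # change due to the deductible

# Solution 1: separate variances.
se1 = sqrt(s1**2 / n1 + s2**2 / n2)
ci1 = (d - z * se1, d + z * se1)   # about (-19.15, -0.85)

# Solution 2: pooled variance.
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
se2 = sp * sqrt(1 / n1 + 1 / n2)
ci2 = (d - z * se2, d + z * se2)   # about (-20.37, 0.37)
```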

Page 65: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: Confidence interval for proportions

Confidence interval estimates for p, the proportion of successes in a population, can be found using the sampling distribution of proportions.

Let X be the random variable denoting the number of successes in an experiment of n trials.

Then X ∼ Bin(n, p) and an estimator for p is p̂ = X/n.

It is unbiased, because E[p̂] = p.

Its variance is Var(p̂) = p(1 − p)/n.

Application: the probability of a claim being made.

Question: How do we construct a pivotal quantity?

1156/1175

Page 66: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: Confidence interval for proportions

1. Solution: The pivot is (using the CLT):

Z = (p̂ − p) / √( p̂(1 − p̂)/n ) approx∼ N(0, 1).

2. Thus, g(X_1, . . . , X_n; p) = (p̂ − p) / √( p̂(1 − p̂)/n ) and

Pr( p̂ − √( p̂(1 − p̂)/n ) · z_{1−α/2} < p < p̂ + √( p̂(1 − p̂)/n ) · z_{1−α/2} ) = 1 − α.

3. A 100(1 − α)% confidence interval for p is given by:

p̂ − √( p̂(1 − p̂)/n ) · z_{1−α/2} < p < p̂ + √( p̂(1 − p̂)/n ) · z_{1−α/2}.

1157/1175
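As a sketch (not from the slides), the three steps reduce to a few lines; proportion_ci is an illustrative name and scipy supplies the normal quantile:

```python
# Sketch (not from the slides): the CLT-based CI for a proportion.
from math import sqrt
from scipy.stats import norm

def proportion_ci(x, n, alpha=0.05):
    """Approximate 100(1 - alpha)% CI for p, given x successes in n trials."""
    p_hat = x / n
    z = norm.ppf(1 - alpha / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# e.g. 216 claims on 1500 policy-years, as in a later application:
lo, hi = proportion_ci(216, 1500)
```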

Page 67: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for difference of proportions

For two population proportions p_1 and p_2, the statistic (p̂_1 − p̂_2) is the unbiased point estimator for the difference between p_1 and p_2.

The variance of the sampling distribution is given by the sum of the variances:

σ^2_{p̂_1−p̂_2} = Var(p̂_1 − p̂_2) = Var(p̂_1) + Var(p̂_2) − 2Cov(p̂_1, p̂_2) = p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2.

Question: Why is Cov(p̂_1, p̂_2) = 0?

Solution: The two samples are drawn from two different populations, hence they are independent.

Question: Find a CI for p_1 − p_2.

1158/1175

Page 68: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Exercise: CI for difference of proportions

1. Solution: The pivot is (using the CLT):

Z = ( (p̂_1 − p̂_2) − (p_1 − p_2) ) / √( σ^2_{p̂_1−p̂_2} ) approx∼ N(0, 1).

2. Thus, g(X_1, . . . , X_n; p_1 − p_2) = ( (p̂_1 − p̂_2) − (p_1 − p_2) ) / σ_{p̂_1−p̂_2} and

Pr( (p̂_1 − p̂_2) − σ_{p̂_1−p̂_2} · z_{1−α/2} < p_1 − p_2 < (p̂_1 − p̂_2) + σ_{p̂_1−p̂_2} · z_{1−α/2} ) = 1 − α.

3. A 100(1 − α)% confidence interval estimate for p_1 − p_2 is given by:

(p̂_1 − p̂_2) − z_{1−α/2} · σ_{p̂_1−p̂_2} < p_1 − p_2 < (p̂_1 − p̂_2) + z_{1−α/2} · σ_{p̂_1−p̂_2}.

1159/1175
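A sketch (not from the slides) of the resulting z-statistic, using the claim counts from the application on the next slide:

```python
# Sketch (not from the slides): z-statistic for a difference of proportions.
from math import sqrt

xm, nm = 216, 1500   # males: claims, policy-years over five years
xf, nf = 188, 1350   # females
pm, pf = xm / nm, xf / nf
se = sqrt(pm * (1 - pm) / nm + pf * (1 - pf) / nf)
z = (pm - pf) / se   # about 0.363: no significant difference
```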

Page 69: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Application: CI for difference of proportions

A motor vehicle insurer is interested in the difference in claim rates between males and females. The insurer had, in each year, 300 males insured and 270 females insured.

The yearly numbers of claims in the past five years were:

year     2011 2010 2009 2008 2007 total
Males      45   46   31   49   45   216
Females    37   42   41   36   32   188

Question: Is there a difference in the claim rate between males and females?

Solution: p̂_M = 216/1500 = 0.144, p̂_F = 188/1350 = 0.139259, p̂_M − p̂_F = 0.004740741. Note: p̂_M = 0.15 and p̂_F = 0.13!
σ^2_{p̂_M−p̂_F} = 0.144 · (1 − 0.144)/1500 + 0.139259 · (1 − 0.139259)/1350 = 1.71 · 10^{−4} ⇒ σ_{p̂_M−p̂_F} = 0.0131. Z = 0.004740741/0.01307538 = 0.362569852 ⇒ α = 1 − 0.641536883 = 0.358463117.

1160/1175

Page 70: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for paired difference

Sometimes we are interested in the comparison of two samples, but the samples are not independent.

Let us investigate data which come in pairs, i.e.:

(X_11, X_21), (X_12, X_22), . . . , (X_1n, X_2n).

In the case of paired or matched data, we are interested in analysing the differences in the sample, D_i = X_1i − X_2i, and therefore in estimating the difference in the means, μ_D = μ_1 − μ_2.

1161/1175

Page 71: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for paired difference

Define:

D̄ = (1/n) · Σ_{k=1}^{n} D_k = (1/n) · Σ_{k=1}^{n} (X_1k − X_2k),

and define:

S_D = √( Σ_{k=1}^{n} (D_k − D̄)^2 / (n − 1) ),

which are respectively the sample mean and sample standard deviation of the differences in the sample.

Question: How do we construct a pivotal quantity?

1162/1175

Page 72: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Example: CI for paired difference

1. Solution: The pivot is (using the CLT):

(D̄ − μ_D) / (S_D/√n) = [ (D̄ − μ_D)/(σ_D/√n) ] / √( ((n − 1) S_D^2/σ_D^2) / (n − 1) ) approx∼ t(n − 1),

where the numerator is a standard normal Z and the quantity under the square root is a χ^2(n − 1) random variable divided by its degrees of freedom.

2. Thus, g(D_1, . . . , D_n; μ_D) = √n · (D̄ − μ_D)/S_D and

Pr( D̄ − t_{1−α/2,n−1} · (S_D/√n) < μ_D < D̄ + t_{1−α/2,n−1} · (S_D/√n) ) = 1 − α.

3. A 100(1 − α)% confidence interval estimate for μ_D is:

d̄ − t_{1−α/2,n−1} · (s_D/√n) < μ_D < d̄ + t_{1−α/2,n−1} · (s_D/√n),

where d̄ is the observed sample mean of the differences and s_D is the observed sample standard deviation of the differences.

1163/1175
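A sketch (not from the slides) of the paired-difference interval, applied to the D&O claim data on the next slide; scipy supplies the t quantile:

```python
# Sketch (not from the slides): 95% paired-difference CI for mu_D.
from math import sqrt
from scipy.stats import t

aus = [93, 113, 93, 115, 103, 111, 136, 86, 133, 121]
china = [137, 116, 126, 117, 118, 140, 122, 108, 130, 127]
d = [a - c for a, c in zip(aus, china)]
n = len(d)
dbar = sum(d) / n                                     # -13.7
sd = sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))  # about 18.11
half = t.ppf(0.975, n - 1) * sd / sqrt(n)
ci = (dbar - half, dbar + half)
```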

Page 73: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Application: CI for paired difference

An insurance company offers directors and officers liability insurance (D&O) in Australia and China. The yearly claim sizes in the past ten years were:

year    2011 2010 2009 2008 2007 2006 2005 2004 2003 2002
Aus       93  113   93  115  103  111  136   86  133  121
China    137  116  126  117  118  140  122  108  130  127
Differ   −44   −3  −33   −2  −15  −29   14  −22    3   −6

Moreover, the total claim size in Australia is $1,104 and in China $1,241. The sum of the squared yearly claim sizes is $124,444 in Australia and $154,891 in China, and the sum of the products of the Australian and Chinese yearly claim sizes is $137,253.

The total difference in claim size is $−137 and the sum of the squared yearly differences is $4,829.

1164/1175

Page 74: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Interval estimation using confidence intervals

Examples & Exercises

Application: CI for paired difference

Question: Find the correlation coefficient between the claims in Australia and China.

Solution:
Cov(Aus, Ch) = 137,253/10 − (1,104/10) · (1,241/10) = 24.66,
Var(Aus) = (1/9) · (124,444 − 10 · (1,104/10)^2) = 285,
Var(Ch) = (1/9) · (154,891 − 10 · (1,241/10)^2) = 98.
Hence ρ = 24.66 / √(285 · 98) = 0.15.

Question: Find the probability that the claims in China, on average, are $10 larger than in Australia.

Solution: d̄ = −137/10 = −13.7, s_d = √( (1/9) · (4,829 − 13.7^2 · 10) ) = 18.11.
Pr( μ_d < −10 = −13.7 + t_{1−α,9} · (18.11/√10) ) = 1 − α ⇒ t_{1−α,9} = 0.646 ⇒ 1 − α ≈ 0.73.

1165/1175
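The arithmetic on this slide can be verified with a short script (not from the slides); note it mixes an n-divisor covariance with (n − 1)-divisor variances, as the slide does:

```python
# Sketch (not from the slides): verifying the covariance/correlation arithmetic.
from math import sqrt

aus = [93, 113, 93, 115, 103, 111, 136, 86, 133, 121]
china = [137, 116, 126, 117, 118, 140, 122, 108, 130, 127]
n = len(aus)
sxy = sum(a * c for a, c in zip(aus, china))   # 137,253
sxx = sum(a * a for a in aus)                  # 124,444
syy = sum(c * c for c in china)                # 154,891
xbar, ybar = sum(aus) / n, sum(china) / n

cov = sxy / n - xbar * ybar                    # 24.66 (n divisor, as on the slide)
var_aus = (sxx - n * xbar**2) / (n - 1)        # about 285
var_ch = (syy - n * ybar**2) / (n - 1)         # about 98
rho = cov / sqrt(var_aus * var_ch)             # about 0.15
```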

Page 75: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

Important properties of MLE estimates

Evaluating estimators & Interval estimation using CIs

Evaluating estimators: Fisher (1922) on good estimators; UMVUE's; Cramer-Rao Lower Bound (CRLB); Consistency; Sufficient Statistics.

Interval estimation using confidence intervals: Introduction; The Pivotal Quantity Method; Examples & Exercises.

Maximum Likelihood estimate: Important properties of MLE estimates; CI for Maximum Likelihood Estimates.

Summary.

Page 76: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

Important properties of MLE estimates

Important properties of the MLE’s

Suppose the density f_X(x|θ) satisfies certain regularity conditions (e.g., continuous, differentiable, no parameter on the boundaries of x, etc.), and suppose θ̂_n is the MLE of θ for a random sample of size n from f_X(x|θ). Then the θ̂_n are asymptotically normally distributed with mean:

E[θ̂_n] = θ,

and variance:

Var(θ̂_n) = (1/n) · ( E[ (∂/∂θ log f_X(x|θ))^2 ] )^{−1} = 1 / (n · I(θ)),

where I(θ) denotes the Fisher information of a single observation. We write this as:

√n · (θ̂_n − θ) →d N(0, 1/I(θ)).

1166/1175

Page 77: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

Important properties of MLE estimates

Important properties of the MLE’s

Note that:

I(θ) = E[ (∂/∂θ log f_X(x|θ))^2 ].

It can be shown (not required for this course) that:

E[ (∂/∂θ log f_X(x|θ))^2 ] = −E[ ∂^2/∂θ^2 log f_X(x|θ) ].

In evaluating the variance of the MLE, you can therefore use either form of this variance formula.

1167/1175

Page 78: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

Important properties of MLE estimates

Important properties of the MLE’s

For functions of the parameter, say g(θ), we can easily extend the theorem, except there is a delta-method adjustment to the variance.

Thus, assuming g(θ) is a differentiable function of θ:

√n · ( g(θ̂_n) − g(θ) ) →d N( 0, (g′(θ))^2 / I(θ) ),

where g′(θ) is the first derivative of g with respect to the parameter θ.

This follows from the week 4 approximation method with Taylor series.

1168/1175
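A Monte Carlo sketch (not from the slides) of the delta method, using the exponential-rate MLE θ̂ = 1/X̄ and g(θ) = θ^2 as an illustrative choice; the quoted asymptotic variance follows from I(θ) = 1/θ^2 for Exp(θ):

```python
# Sketch (not from the slides): Monte Carlo check of the delta method for
# g(theta) = theta^2 with the exponential-rate MLE theta_hat = 1 / Xbar.
# For Exp(theta), I(theta) = 1/theta^2, so the delta method predicts
# Var(g(theta_hat)) ~ (2*theta)^2 * theta^2 / n = 4 * theta^4 / n.
import random
from statistics import mean, pvariance

random.seed(1)
theta, n, reps = 2.0, 400, 2000
g_hats = []
for _ in range(reps):
    xs = [random.expovariate(theta) for _ in range(n)]
    theta_hat = 1 / mean(xs)        # MLE of the exponential rate
    g_hats.append(theta_hat ** 2)   # g(theta_hat)

mc_var = pvariance(g_hats)          # should be near 4 * theta**4 / n = 0.16
```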

Page 79: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

Important properties of MLE estimates

Important properties of the MLE’s

Asymptotic properties of MLE's work well for N(θ, 1), N(0, θ), Exp(θ), Poisson(θ), Bernoulli(θ).

What about the Cauchy density:

f_Y(y|θ) = 1 / ( π (1 + (y − θ)^2) )?

It is notorious for having no mean (hence no variance). What about the asymptotic behaviour of the MLE's?

The theory above suggests: θ̂_n approx∼ N(θ, 2/n), since I(θ) = 1/2 for the Cauchy.

1169/1175
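A simulation sketch (not from the slides) of this claim, numerically maximizing the Cauchy likelihood with scipy:

```python
# Sketch (not from the slides): simulating the Cauchy location MLE to check
# that theta_hat is approximately N(theta, 2/n).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, reps, theta = 100, 500, 0.0
est = []
for _ in range(reps):
    y = theta + rng.standard_cauchy(n)

    def nll(t):
        # Cauchy negative log-likelihood, up to an additive constant.
        return np.log1p((y - t) ** 2).sum()

    # The Cauchy likelihood can be multimodal; searching near the sample
    # median keeps the optimizer on the main mode.
    med = np.median(y)
    est.append(minimize_scalar(nll, bounds=(med - 2, med + 2),
                               method="bounded").x)

mc_var = float(np.var(est))   # should be close to 2 / n = 0.02
```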

Page 80: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

Important properties of MLE estimates

Important properties of the MLE’s

[Figure] Sample of size 40 drawn from a Cauchy with θ = 0. We compare the "Bayesian posterior" density with the normal approximation, based on the first n observations for n = 3, 15 and 40. The first three simulated values were 5.01, 0.40 and −8.75: pretty spread out. Here, for large n, the posterior density is more concentrated around the mean than the normal, but the normal must necessarily tail off more quickly.

1170/1175

Page 81: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

CI for Maximum Likelihood Estimates

Evaluating estimators & Interval estimation using CIs


Page 82: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

CI for Maximum Likelihood Estimates

CI for Maximum Likelihood Estimates

Suppose we are interested in constructing a confidence interval for θ, and let θ̂ denote its maximum likelihood estimate.

Recall that θ̂ is asymptotically normally distributed with mean:

E[θ̂] = θ,

and variance:

Var(θ̂) = (1/n) · ( E[ (∂/∂θ log f_X(x|θ))^2 ] )^{−1} = 1 / (n · I(θ)),

where I(θ) = E[ (∂/∂θ log f_X(x|θ))^2 ] = −E[ ∂^2/∂θ^2 log f_X(x|θ) ].

1171/1175

Page 83: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

CI for Maximum Likelihood Estimates

CI for Maximum Likelihood Estimates

We then have:

√( n · I(θ) ) · (θ̂ − θ) →d N(0, 1).

Using this as a pivotal quantity, we have approximately:

Pr( −z_{1−α/2} < √( n · I(θ̂) ) · (θ̂ − θ) < z_{1−α/2} ) ≈ 1 − α,

or, equivalently:

θ̂ ± z_{1−α/2} · 1/√( n · I(θ̂) )

is an approximate 100(1 − α)% confidence interval for the parameter θ.

1172/1175
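A sketch (not from the slides) of this interval for the rate of an exponential sample, where I(λ) = 1/λ^2; the claim data below are made up for illustration:

```python
# Sketch (not from the slides): MLE confidence interval for an Exp(lambda)
# rate. Here I(lambda) = 1/lambda^2, so the interval is
# lambda_hat +/- z * lambda_hat / sqrt(n).
from math import sqrt
from statistics import mean

claims = [0.8, 1.9, 0.3, 2.7, 1.1, 0.6, 1.4, 0.9, 2.2, 0.5]  # toy data
n = len(claims)
lam_hat = 1 / mean(claims)          # exponential-rate MLE
z = 1.96                            # z_{0.975}
half = z * lam_hat / sqrt(n)        # z / sqrt(n * I(lam_hat))
ci = (lam_hat - half, lam_hat + half)
```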

Page 84: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Maximum Likelihood estimate

CI for Maximum Likelihood Estimates

CI for Maximum Likelihood Estimates

Note that the variance Var(θ̂) actually depends on the parameter θ, and it is estimated by replacing θ by θ̂, so that:

Var̂(θ̂) = 1 / (n · I(θ̂)).

The standard error is usually defined to be:

s.e.(θ̂) = √( Var̂(θ̂) ).

1173/1175

Page 85: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Summary

Summary

Evaluating estimators & Interval estimation using CIs


Page 86: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Summary

Summary

Evaluating estimators

1. UMVUE estimator: unbiased (E[T] = τ(θ)) and minimum variance (Var(T) ≤ Var(T*) for every unbiased estimator T*). If the estimator T attains the CRLB, then T is UMVUE.

   CRLB: Var(T(X_1, X_2, . . . , X_n)) ≥ 1 / (n · I(θ)).

2. Consistent estimator:

   lim_{n→∞} Pr(|T_n − θ| < ε) = 1, i.e., T_n →p θ (convergence in probability).

3. Sufficient statistic: T is sufficient for θ if the conditional distribution of X_1, X_2, . . . , X_n given T = t does not depend on θ for any value of t.

1174/1175

Page 87: Week 6 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 6

Summary

Summary

Interval estimators

Pivotal quantity method:

1. Find the pivot;
2. Find the function g such that Pr(q_1 < g(X_1, . . . , X_n; θ) < q_2) = 1 − α;
3. The 100(1 − α)% confidence interval of θ:

g^{−1}(X_1, . . . , X_n; q_1) ≶ θ ≶ g^{−1}(X_1, . . . , X_n; q_2).

Properties of MLE: asymptotically normally distributed, with

E[θ̂_n^{ML}] → θ and Var(θ̂_n^{ML}) → (n · I(θ))^{−1} as n → ∞.

1. Asymptotically unbiased and asymptotically attaining the CRLB, hence asymptotically UMVUE;
2. Consistent;
3. Asymptotically sufficient.

1175/1175