Section 02 Review of Probability and Statistics
Transcript of Section 02 Review of Probability and Statistics
7/28/2019 Section 02 Review of Probability and Statistics
http://slidepdf.com/reader/full/section-02-review-of-probability-and-statistics 1/183
Brief Review
Probability and Statistics
Probability distributions
Continuous distributions
Defn (density function)
Let $x$ denote a continuous random variable; then $f(x)$ is called the density function of $x$ if
1) $f(x) \ge 0$
2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3) $\int_a^b f(x)\,dx = P(a \le x \le b)$
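The three defining properties can be checked numerically for a concrete density. The sketch below (an illustration, not part of the slides; the standard normal density and the midpoint rule are arbitrary choices) verifies properties 2) and 3):

```python
import math

# A sketch checking properties (2) and (3) for one concrete density,
# the standard normal, by simple numerical integration (midpoint rule).
def f(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, -10, 10)    # property (2): should be ~1
p = integrate(f, -1.96, 1.96)    # property (3): P(-1.96 <= x <= 1.96)
print(round(total, 4), round(p, 4))  # 1.0 0.95
```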
Defn (Joint density function)
Let x = (x_1, x_2, x_3, ..., x_n) denote a vector of continuous random variables; then
f(x) = f(x_1, x_2, x_3, ..., x_n)
is called the joint density function of x = (x_1, x_2, x_3, ..., x_n) if
1) $f(\mathbf{x}) \ge 0$
2) $\int_{\mathbb{R}^n} f(\mathbf{x})\,d\mathbf{x} = 1$
3) $\int_R f(\mathbf{x})\,d\mathbf{x} = P(\mathbf{x} \in R)$
Note:
$\int f(\mathbf{x})\,d\mathbf{x} = \int \cdots \int f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n$
and
$\int_R f(\mathbf{x})\,d\mathbf{x} = \int \cdots \int_R f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n$
Defn (Marginal density function)
The marginal density of x_1 = (x_1, x_2, x_3, ..., x_p) (p < n) is defined by
$f_1(\mathbf{x}_1) = \int f(\mathbf{x})\,d\mathbf{x}_2 = \int f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2$
where x_2 = (x_{p+1}, x_{p+2}, x_{p+3}, ..., x_n).
The marginal density of x_2 = (x_{p+1}, x_{p+2}, x_{p+3}, ..., x_n) is defined by
$f_2(\mathbf{x}_2) = \int f(\mathbf{x})\,d\mathbf{x}_1 = \int f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_1$
where x_1 = (x_1, x_2, x_3, ..., x_p).
Defn (Conditional density function)
The conditional density of x_1 given x_2 (defined on the previous slide) (p < n) is defined by
$f_{1|2}(\mathbf{x}_1 \mid \mathbf{x}_2) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_2(\mathbf{x}_2)}$
and the conditional density of x_2 given x_1 is defined by
$f_{2|1}(\mathbf{x}_2 \mid \mathbf{x}_1) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_1(\mathbf{x}_1)}$
Marginal densities describe how the subvector x_i behaves ignoring x_j.
Conditional densities describe how the subvector x_i behaves when the subvector x_j is held fixed.
Defn (Independence)
The two sub-vectors x_1 and x_2 are called independent if
f(x) = f(x_1, x_2) = f_1(x_1) f_2(x_2) = product of marginals,
or, equivalently, if the conditional density of x_i given x_j satisfies
f_{i|j}(x_i | x_j) = f_i(x_i) = marginal density of x_i.
Example (p-variate Normal)
The random vector x (p × 1) is said to have the p-variate Normal distribution with
mean vector μ (p × 1) and
covariance matrix Σ (p × p)
(written x ~ N_p(μ, Σ)) if:
$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$
Example (bivariate Normal)
The random vector x = (x_1, x_2)' is said to have the bivariate Normal distribution with mean vector
$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$
and covariance matrix
$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$
if
$f(\mathbf{x}) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$
Expanding the quadratic form:
$f(x_1, x_2) = \frac{1}{2\pi (\sigma_{11}\sigma_{22} - \sigma_{12}^2)^{1/2}} \exp\!\left[-\tfrac{1}{2} Q(x_1, x_2)\right]$
where
$Q(x_1, x_2) = (\mathbf{x} - \boldsymbol{\mu})' \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}^{-1} (\mathbf{x} - \boldsymbol{\mu})$
$= \frac{\sigma_{22}(x_1 - \mu_1)^2 - 2\sigma_{12}(x_1 - \mu_1)(x_2 - \mu_2) + \sigma_{11}(x_2 - \mu_2)^2}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}$
Equivalently, in terms of $\sigma_1$, $\sigma_2$ and $\rho$:
$f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\!\left[-\tfrac{1}{2} Q(x_1, x_2)\right]$
where
$Q(x_1, x_2) = \frac{1}{1 - \rho^2}\left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right]$
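The two parameterizations of the bivariate Normal density should agree. A small sketch (the parameter values are arbitrary, chosen only for illustration) compares the ρ-form with the general matrix form:

```python
import math
import numpy as np

# Sketch: the rho-parameterized bivariate density should agree with the
# general matrix form (2*pi)^{-1} |S|^{-1/2} exp(-Q/2).
m1, m2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, 0.6

def f_rho(x1, x2):
    z1, z2 = (x1 - m1) / s1, (x2 - m2) / s2
    Q = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho ** 2)
    return math.exp(-Q / 2) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

def f_matrix(x1, x2):
    mu = np.array([m1, m2])
    S = np.array([[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]])
    d = np.array([x1, x2]) - mu
    Q = d @ np.linalg.inv(S) @ d
    return math.exp(-Q / 2) / (2 * math.pi * math.sqrt(np.linalg.det(S)))

print(abs(f_rho(0.3, -1.1) - f_matrix(0.3, -1.1)) < 1e-12)  # True
```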
Theorem (Transformations)
Let x = ( x1 , x2 , x3 , ... , xn) denote a vector of
continuous random variables with joint density
function f ( x1 , x2 , x3 , ... , xn) = f (x). Let
y_1 = φ_1(x_1, x_2, x_3, ..., x_n)
y_2 = φ_2(x_1, x_2, x_3, ..., x_n)
...
y_n = φ_n(x_1, x_2, x_3, ..., x_n)
define a 1-1 transformation of x into y.
Then the joint density of y is g(y) given by:
g(y) = f(x)|J|, where
$J = \frac{\partial(x_1, x_2, x_3, \dots, x_n)}{\partial(y_1, y_2, y_3, \dots, y_n)} = \det\begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial x_n}{\partial y_1} & \frac{\partial x_n}{\partial y_2} & \cdots & \frac{\partial x_n}{\partial y_n} \end{pmatrix}$
= the Jacobian of the transformation.
Corollary (Linear Transformations)
Let x = ( x1 , x2 , x3 , ... , xn) denote a vector of
continuous random variables with joint density
function f ( x1 , x2 , x3 , ... , xn) = f (x). Let
y_1 = a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + ... + a_{1n} x_n
y_2 = a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + ... + a_{2n} x_n
...
y_n = a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + ... + a_{nn} x_n
define a 1-1 transformation of x into y.
Then the joint density of y is g(y) given by:
$g(\mathbf{y}) = f(A^{-1}\mathbf{y})\,\frac{1}{|\det(A)|}$
where
$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$
Corollary (Linear Transformations for Normal Random variables)
Let x = (x_1, x_2, x_3, ..., x_n) denote a vector of continuous random variables having an n-variate Normal distribution with mean vector μ and covariance matrix Σ,
i.e. x ~ N_n(μ, Σ).
Let
y_1 = a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + ... + a_{1n} x_n
y_2 = a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + ... + a_{2n} x_n
...
y_n = a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + ... + a_{nn} x_n
define a 1-1 transformation of x into y.
Then y = (y_1, y_2, y_3, ..., y_n) ~ N_n(Aμ, AΣA').
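This corollary can be checked empirically by sampling. A sketch (arbitrary μ, Σ and A, chosen for illustration) verifying that y = Ax has mean Aμ and covariance AΣA':

```python
import numpy as np

# Sketch: empirical check that y = A x has mean A mu and covariance A S A'
# when x ~ N(mu, S); the particular mu, S and A are arbitrary choices.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
S = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])

x = rng.multivariate_normal(mu, S, size=200_000)
y = x @ A.T

print(np.allclose(y.mean(axis=0), A @ mu, atol=0.05))
print(np.allclose(np.cov(y.T), A @ S @ A.T, atol=0.15))
```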
Defn (Expectation)
Let x = ( x1 , x2 , x3 , ... , xn) denote a vector of
continuous random variables with joint density
function
f (x) = f ( x1 , x2 , x3 , ... , xn).
Let U = h(x) = h(x_1, x_2, x_3, ..., x_n). Then
$E(U) = E[h(\mathbf{x})] = \int h(\mathbf{x}) f(\mathbf{x})\,d\mathbf{x}$
Defn (Conditional Expectation)
Let x = ( x1 , x2 , x3 , ... , xn) = (x1 , x2 ) denote a
vector of continuous random variables with joint
density function
f (x) = f ( x1 , x2 , x3 , ... , xn) = f (x1 , x2 ).
Let U = h(x_1) = h(x_1, x_2, x_3, ..., x_p).
Then the conditional expectation of U given x_2 is
$E(U \mid \mathbf{x}_2) = E[h(\mathbf{x}_1) \mid \mathbf{x}_2] = \int h(\mathbf{x}_1)\, f_{1|2}(\mathbf{x}_1 \mid \mathbf{x}_2)\,d\mathbf{x}_1$
Defn (Variance)
Let x = ( x1 , x2 , x3 , ... , xn) denote a vector of
continuous random variables with joint density
function
f (x) = f ( x1 , x2 , x3 , ... , xn).
Let U = h(x) = h(x_1, x_2, x_3, ..., x_n). Then
$\mathrm{Var}(U) = E\{[U - E(U)]^2\} = E(U^2) - [E(U)]^2 = E[h(\mathbf{x})^2] - \{E[h(\mathbf{x})]\}^2$
Defn (Conditional Variance)
Let x = ( x1 , x2 , x3 , ... , xn) = (x1 , x2 ) denote a
vector of continuous random variables with joint
density function
f (x) = f ( x1 , x2 , x3 , ... , xn) = f (x1 , x2 ).
Let U = h(x_1) = h(x_1, x_2, x_3, ..., x_p).
Then the conditional variance of U given x_2 is
$\mathrm{Var}(U \mid \mathbf{x}_2) = E[h(\mathbf{x}_1)^2 \mid \mathbf{x}_2] - \{E[h(\mathbf{x}_1) \mid \mathbf{x}_2]\}^2$
Defn (Covariance, Correlation)
Let x = ( x1 , x2 , x3 , ... , xn) denote a vector of
continuous random variables with joint densityfunction
f (x) = f ( x1 , x2 , x3 , ... , xn).
Let U = h(x) = h(x_1, x_2, x_3, ..., x_n) and
V = g(x) = g(x_1, x_2, x_3, ..., x_n).
Then the covariance of U and V is
$\mathrm{Cov}(U, V) = E\{[U - E(U)][V - E(V)]\} = E\{[h(\mathbf{x}) - E h(\mathbf{x})][g(\mathbf{x}) - E g(\mathbf{x})]\}$
and the correlation of U and V is
$\rho_{UV} = \frac{\mathrm{Cov}(U, V)}{\sqrt{\mathrm{Var}(U)\,\mathrm{Var}(V)}}$
Properties
• Expectation
• Variance
• Covariance
• Correlation
1. E[a_1 x_1 + a_2 x_2 + a_3 x_3 + ... + a_n x_n] = a_1 E[x_1] + a_2 E[x_2] + a_3 E[x_3] + ... + a_n E[x_n],
or E[a'x] = a'E[x].
2. E[UV] = E[h(x_1) g(x_2)] = E[h(x_1)] E[g(x_2)] = E[U] E[V]
if x_1 and x_2 are independent.
3. $\mathrm{Var}[a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n] = \sum_{i=1}^{n} a_i^2 \mathrm{Var}[x_i] + 2\sum_{i<j} a_i a_j \mathrm{Cov}[x_i, x_j]$,
or Var[a'x] = a'Σa, where
$\Sigma = \begin{pmatrix} \mathrm{Var}(x_1) & \mathrm{Cov}(x_1, x_2) & \cdots & \mathrm{Cov}(x_1, x_n) \\ \mathrm{Cov}(x_1, x_2) & \mathrm{Var}(x_2) & \cdots & \mathrm{Cov}(x_2, x_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(x_1, x_n) & \mathrm{Cov}(x_2, x_n) & \cdots & \mathrm{Var}(x_n) \end{pmatrix}$
4. $\mathrm{Cov}[a_1 x_1 + a_2 x_2 + \dots + a_n x_n,\; b_1 x_1 + b_2 x_2 + \dots + b_n x_n] = \sum_{i=1}^{n} a_i b_i \mathrm{Var}[x_i] + \sum_{i \ne j} a_i b_j \mathrm{Cov}[x_i, x_j]$,
or Cov[a'x, b'x] = a'Σb.
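Properties 3 and 4 can be verified numerically. The sketch below (an arbitrary Σ, a and b, chosen for illustration) compares the summation forms with the matrix forms a'Σa and a'Σb:

```python
import numpy as np

# Sketch: the summation forms of properties 3 and 4 agree with the
# matrix forms a'Sa and a'Sb for an arbitrary covariance matrix S.
S = np.array([[4.0, 2.0, 1.0], [2.0, 17.0, 6.0], [1.0, 6.0, 14.0]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.0, 1.0, 3.0])
n = len(a)

var_sum = sum(a[i] ** 2 * S[i, i] for i in range(n)) \
    + 2 * sum(a[i] * a[j] * S[i, j] for i in range(n) for j in range(i + 1, n))
cov_sum = sum(a[i] * b[j] * S[i, j] for i in range(n) for j in range(n))

print(np.isclose(var_sum, a @ S @ a), np.isclose(cov_sum, a @ S @ b))  # True True
```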
5. $E[U] = E\{E[U \mid \mathbf{x}_2]\}$
6. $\mathrm{Var}[U] = E\{\mathrm{Var}[U \mid \mathbf{x}_2]\} + \mathrm{Var}\{E[U \mid \mathbf{x}_2]\}$
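Properties 5 and 6 (the laws of total expectation and total variance) hold exactly. A sketch checking them on a small discrete joint distribution (the values and probabilities are arbitrary):

```python
import math

# Sketch: properties 5 and 6 checked exactly on a small, arbitrary
# discrete joint distribution p(u, v).
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
U_vals, V_vals = {0, 1}, {0, 1}

pV = {v: sum(p[(u, v)] for u in U_vals) for v in V_vals}
E_U = sum(u * pr for (u, v), pr in p.items())
E_U_given_V = {v: sum(u * p[(u, v)] for u in U_vals) / pV[v] for v in V_vals}
Var_U = sum(u * u * pr for (u, v), pr in p.items()) - E_U ** 2
Var_U_given_V = {v: sum(u * u * p[(u, v)] for u in U_vals) / pV[v]
                 - E_U_given_V[v] ** 2 for v in V_vals}

E_of_cond_mean = sum(pV[v] * E_U_given_V[v] for v in V_vals)
E_of_cond_var = sum(pV[v] * Var_U_given_V[v] for v in V_vals)
Var_of_cond_mean = sum(pV[v] * E_U_given_V[v] ** 2 for v in V_vals) - E_of_cond_mean ** 2

print(math.isclose(E_U, E_of_cond_mean))                      # property 5: True
print(math.isclose(Var_U, E_of_cond_var + Var_of_cond_mean))  # property 6: True
```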
Multivariate distributions
The Normal distribution
1. The Normal distribution – parameters μ and σ (or σ²)
Comment: If μ = 0 and σ = 1 the distribution is called the standard normal distribution.
[Figure: two Normal density curves, one with μ = 50 and σ = 15, the other with μ = 70 and σ = 20.]
The probability density of the normal distribution is
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$
If a random variable X has a normal distribution with mean μ and variance σ², then we will write:
$X \sim N(\mu, \sigma^2)$
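A minimal sketch evaluating this density (the particular μ = 50, σ = 15 echo the earlier figure; the check values follow directly from the formula):

```python
import math

# Sketch: evaluating the N(mu, sigma^2) density at a few points.
def norm_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (math.sqrt(2 * math.pi) * sigma)

# At x = mu the density equals 1 / (sqrt(2*pi) * sigma).
print(math.isclose(norm_pdf(50, 50, 15), 1 / (math.sqrt(2 * math.pi) * 15)))  # True
# The density is symmetric about mu.
print(math.isclose(norm_pdf(40, 50, 15), norm_pdf(60, 50, 15)))  # True
```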
The multivariate Normal
distribution
Let
$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}$ = a random vector.
Let
$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}$ = a vector of constants (the mean vector).
Let
$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & & \vdots \\ \sigma_{1p} & \cdots & \sigma_{pp} \end{pmatrix}$ = a p × p positive definite matrix.
Definition
The matrix A is positive semidefinite if $\mathbf{x}' A \mathbf{x} \ge 0$ for all $\mathbf{x}$.
Further, the matrix A is positive definite if $\mathbf{x}' A \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$.
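In practice, positive definiteness of a symmetric matrix is usually checked via its eigenvalues or a Cholesky factorization rather than directly from the definition. A sketch (the test matrices are arbitrary examples):

```python
import numpy as np

# Sketch: for a symmetric matrix, x'Ax > 0 for all x != 0 is equivalent to
# all eigenvalues being positive (or to a Cholesky factorization existing).
def is_positive_definite(A):
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)  # succeeds iff A is (numerically) pos. def.
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite([[4.0, 2.0], [2.0, 17.0]]))  # True
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))   # False (eigenvalues 3, -1)
```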
Suppose that the joint density of the random vector $\mathbf{x} = (x_1, \dots, x_p)'$ is
$f(\mathbf{x}) = f(x_1, \dots, x_p) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$
Then the random vector $(x_1, x_2, \dots, x_p)'$ is said to have a p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.
We will write: $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \Sigma)$
Example: the Bivariate Normal distribution
$f(x_1, x_2) = \frac{1}{2\pi |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$
with
$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}$
Now
$|\Sigma| = \sigma_{11}\sigma_{22} - \sigma_{12}^2 = \sigma_1^2 \sigma_2^2 (1 - \rho^2)$
and
$\Sigma^{-1} = \frac{1}{|\Sigma|} \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}$
so
$(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \begin{pmatrix} x_1 - \mu_1 & x_2 - \mu_2 \end{pmatrix} \frac{1}{|\Sigma|} \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}$
$= \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}\left[\sigma_{22}(x_1 - \mu_1)^2 - 2\sigma_{12}(x_1 - \mu_1)(x_2 - \mu_2) + \sigma_{11}(x_2 - \mu_2)^2\right]$
$= \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)}\left[\sigma_2^2(x_1 - \mu_1)^2 - 2\rho\sigma_1\sigma_2(x_1 - \mu_1)(x_2 - \mu_2) + \sigma_1^2(x_2 - \mu_2)^2\right]$
$= \frac{1}{1 - \rho^2}\left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right]$
Hence
$f(x_1, x_2) = \frac{1}{2\pi |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})} = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}\, e^{-\frac{1}{2} Q(x_1, x_2)}$
where
$Q(x_1, x_2) = \frac{1}{1 - \rho^2}\left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right]$
Note:
$f(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}}\, e^{-\frac{1}{2} Q(x_1, x_2)}$
is constant when $Q(x_1, x_2)$ is constant.
This is true when (x_1, x_2) lie on an ellipse centered at (μ_1, μ_2).
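This can be checked directly: parametrizing the contour by x = μ + √c · L(cos t, sin t)', where Σ = LL' is a Cholesky factorization, gives Q = c at every point. A numerical sketch (arbitrary μ, Σ and level c):

```python
import numpy as np

# Sketch: Q(x1, x2) is constant on the ellipse x = mu + sqrt(c) * L @ (cos t, sin t)',
# where L is the Cholesky factor of Sigma (so Sigma = L L').
mu = np.array([1.0, 2.0])
S = np.array([[2.0, 0.8], [0.8, 1.0]])
L = np.linalg.cholesky(S)
Sinv = np.linalg.inv(S)
c = 3.0

Q_vals = []
for t in np.linspace(0, 2 * np.pi, 50):
    x = mu + np.sqrt(c) * L @ np.array([np.cos(t), np.sin(t)])
    d = x - mu
    Q_vals.append(d @ Sinv @ d)

print(np.allclose(Q_vals, c))  # True: Q is constant on the ellipse
```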
Surface Plots of the bivariate
Normal distribution
Contour Plots of the bivariate
Normal distribution
Scatter Plots of data from the
bivariate Normal distribution
Trivariate Normal distribution - Contour map
[Figure: contour surfaces in x_1, x_2, x_3, centered at the mean vector $\boldsymbol{\mu} = (\mu_1, \mu_2, \mu_3)'$; each contour satisfies $(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \text{const}$.]
Trivariate Normal distribution
[Figure: density contour plots in x_1, x_2, x_3.]
example
In the following study, data were collected for a sample of n = 183 females on the variables
• Age,
• Height (Ht),
• Weight (Wt),
• Birth control pill use (Bpl: 1 = no pill, 2 = pill),
and the following blood chemistry measurements:
• Cholesterol (Chl),
• Albumin (Alb),
• Calcium (Ca), and
• Uric Acid (UA).
The data are tabulated on the next page.
The data (columns Age, Ht, Wt, Bpl, Chl, Alb, Ca, UA; printed three records per row):
[Table: n = 183 records; the first reads Age 22, Ht 67, Wt 144, Bpl 1, Chl 200, Alb 43, Ca 98, UA 54. Full listing as in the original slides.]
[Figure: 3D scatterplot of Wt, Ht, Age.]
[Figure: 3D scatterplot of Wt, Ht, Age.]
[Figure: 3D scatterplot of Alb, Chl, Bpl.]
Marginal and Conditional
distributions
Theorem (Woodbury):
$(A + CBD)^{-1} = A^{-1} - A^{-1}C\left(B^{-1} + DA^{-1}C\right)^{-1}DA^{-1}$
Proof:
$(A + CBD)\left[A^{-1} - A^{-1}C\left(B^{-1} + DA^{-1}C\right)^{-1}DA^{-1}\right]$
$= I + CBDA^{-1} - C\left(B^{-1} + DA^{-1}C\right)^{-1}DA^{-1} - CBDA^{-1}C\left(B^{-1} + DA^{-1}C\right)^{-1}DA^{-1}$
$= I + CBDA^{-1} - CB\left[B^{-1} + DA^{-1}C\right]\left(B^{-1} + DA^{-1}C\right)^{-1}DA^{-1}$
$= I + CBDA^{-1} - CBDA^{-1} = I$
Example: Find $(I_n + b J_n)^{-1}$, where $J_n = \mathbf{1}_n \mathbf{1}_n'$ is the n × n matrix of ones.
Solution:
Use $(A + CBD)^{-1} = A^{-1} - A^{-1}C\left(B^{-1} + DA^{-1}C\right)^{-1}DA^{-1}$
with $A = I_n$, $C = \mathbf{1}_n$, $D = \mathbf{1}_n'$, $B = b$; hence
$(I_n + \mathbf{1}_n b \mathbf{1}_n')^{-1} = I_n - \mathbf{1}_n\left(\tfrac{1}{b} + \mathbf{1}_n' \mathbf{1}_n\right)^{-1} \mathbf{1}_n'$
$= I_n - \frac{1}{\tfrac{1}{b} + n} J_n = I_n - \frac{b}{1 + nb} J_n$
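A quick numerical check of this example (the values of n and b are arbitrary):

```python
import numpy as np

# Sketch: numerical check of the example, (I_n + b J_n)^{-1} = I_n - b/(1+nb) J_n.
n, b = 5, 0.7
J = np.ones((n, n))
lhs = np.linalg.inv(np.eye(n) + b * J)
rhs = np.eye(n) - (b / (1 + n * b)) * J
print(np.allclose(lhs, rhs))  # True
```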
Theorem (Inverse of a partitioned symmetric matrix):
Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix}$ and $B = A^{-1} = \begin{pmatrix} B_{11} & B_{12} \\ B_{12}' & B_{22} \end{pmatrix}$.
Then
$B_{11} = \left(A_{11} - A_{12} A_{22}^{-1} A_{12}'\right)^{-1}$
$B_{22} = \left(A_{22} - A_{12}' A_{11}^{-1} A_{12}\right)^{-1}$
$B_{12} = -A_{11}^{-1} A_{12} \left(A_{22} - A_{12}' A_{11}^{-1} A_{12}\right)^{-1} = -\left(A_{11} - A_{12} A_{22}^{-1} A_{12}'\right)^{-1} A_{12} A_{22}^{-1}$
Also
$B_{11} = A_{11}^{-1} + A_{11}^{-1} A_{12} \left(A_{22} - A_{12}' A_{11}^{-1} A_{12}\right)^{-1} A_{12}' A_{11}^{-1}$
$B_{22} = A_{22}^{-1} + A_{22}^{-1} A_{12}' \left(A_{11} - A_{12} A_{22}^{-1} A_{12}'\right)^{-1} A_{12} A_{22}^{-1}$
Proof:
From $AB = I_p$,
$\begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{12}' & B_{22} \end{pmatrix} = I_p$
or
$A_{11} B_{11} + A_{12} B_{12}' = I_q$, $\quad A_{11} B_{12} + A_{12} B_{22} = 0$,
$A_{12}' B_{11} + A_{22} B_{12}' = 0$, $\quad A_{12}' B_{12} + A_{22} B_{22} = I_{p-q}$.
From $A_{12}' B_{11} + A_{22} B_{12}' = 0$ we get $B_{12}' = -A_{22}^{-1} A_{12}' B_{11}$; substituting into $A_{11} B_{11} + A_{12} B_{12}' = I_q$ gives
$\left(A_{11} - A_{12} A_{22}^{-1} A_{12}'\right) B_{11} = I$,
hence $B_{11} = \left(A_{11} - A_{12} A_{22}^{-1} A_{12}'\right)^{-1}$.
Similarly, $B_{22} = \left(A_{22} - A_{12}' A_{11}^{-1} A_{12}\right)^{-1}$,
and from $A_{11} B_{12} + A_{12} B_{22} = 0$,
$B_{12} = -A_{11}^{-1} A_{12} B_{22} = -A_{11}^{-1} A_{12} \left(A_{22} - A_{12}' A_{11}^{-1} A_{12}\right)^{-1}$.
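The block formulas can be verified numerically on a randomly generated symmetric positive definite matrix (a sketch; the 4 × 4 size and 2 + 2 partition are arbitrary choices):

```python
import numpy as np

# Sketch: checking the block formulas for B = A^{-1} on a random
# symmetric positive definite matrix partitioned into 2x2 blocks.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # symmetric positive definite
A11, A12 = A[:2, :2], A[:2, 2:]
A22 = A[2:, 2:]
B = np.linalg.inv(A)

B11 = np.linalg.inv(A11 - A12 @ np.linalg.inv(A22) @ A12.T)
B22 = np.linalg.inv(A22 - A12.T @ np.linalg.inv(A11) @ A12)
B12 = -np.linalg.inv(A11) @ A12 @ B22

print(np.allclose(B[:2, :2], B11))  # True
print(np.allclose(B[2:, 2:], B22))  # True
print(np.allclose(B[:2, 2:], B12))  # True
```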
Theorem (Determinant of a partitioned symmetric matrix):
Let $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix}$.
Then
$|A| = |A_{22}|\,\left|A_{11} - A_{12} A_{22}^{-1} A_{12}'\right| = |A_{11}|\,\left|A_{22} - A_{12}' A_{11}^{-1} A_{12}\right|$
Proof: Note
$\begin{pmatrix} A_{11} & A_{12} \\ A_{12}' & A_{22} \end{pmatrix} = \begin{pmatrix} I & A_{12} A_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} A_{11} - A_{12} A_{22}^{-1} A_{12}' & 0 \\ A_{12}' & A_{22} \end{pmatrix}$
and
$\left|\begin{pmatrix} B & C \\ 0 & D \end{pmatrix}\right| = |B|\,|D|$ and $\left|\begin{pmatrix} B & 0 \\ C & D \end{pmatrix}\right| = |B|\,|D|$.
Theorem (Marginal distributions for the Multivariate Normal distribution):
Let $\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}$ (with $\mathbf{x}_1$ q × 1 and $\mathbf{x}_2$ (p − q) × 1) have a p-variate Normal distribution
with mean vector $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}$ and covariance matrix $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}$.
Then the marginal distribution of $\mathbf{x}_i$ is $q_i$-variate Normal ($q_1 = q$, $q_2 = p - q$)
with mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\Sigma_{ii}$.
Theorem (Conditional distributions for the Multivariate Normal distribution):
Let $\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}$ (with $\mathbf{x}_1$ q × 1 and $\mathbf{x}_2$ (p − q) × 1) have a p-variate Normal distribution
with mean vector $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}$ and covariance matrix $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}$.
Then the conditional distribution of $\mathbf{x}_i$ given $\mathbf{x}_j$ is $q_i$-variate Normal
with mean vector
$\boldsymbol{\mu}_{i|j} = \boldsymbol{\mu}_i + \Sigma_{ij} \Sigma_{jj}^{-1} (\mathbf{x}_j - \boldsymbol{\mu}_j)$
and covariance matrix
$\Sigma_{i|j} = \Sigma_{ii} - \Sigma_{ij} \Sigma_{jj}^{-1} \Sigma_{ij}'$
Proof (of the previous two theorems):
The joint density of $\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}$ is
$f(\mathbf{x}) = f(\mathbf{x}_1, \mathbf{x}_2) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})} = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2} Q(\mathbf{x}_1, \mathbf{x}_2)}$
where $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}$, $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, and
$Q(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$
Write $\Sigma^{-1} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \Sigma^{11} & \Sigma^{12} \\ \Sigma^{21} & \Sigma^{22} \end{pmatrix}$,
where (by the partitioned-inverse theorem)
$\Sigma^{11} = \Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1} \Sigma_{21} \Sigma_{11}^{-1}$
$\Sigma^{12} = -\Sigma_{11}^{-1} \Sigma_{12} \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1}$
$\Sigma^{22} = \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1}$
Then
$Q(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma^{11} (\mathbf{x}_1 - \boldsymbol{\mu}_1) + 2(\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma^{12} (\mathbf{x}_2 - \boldsymbol{\mu}_2) + (\mathbf{x}_2 - \boldsymbol{\mu}_2)' \Sigma^{22} (\mathbf{x}_2 - \boldsymbol{\mu}_2)$
Substituting the blocks of $\Sigma^{-1}$,
$Q(\mathbf{x}_1, \mathbf{x}_2) = (\mathbf{x}_1 - \boldsymbol{\mu}_1)' \left[\Sigma_{11}^{-1} + \Sigma_{11}^{-1} \Sigma_{12} \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1} \Sigma_{21} \Sigma_{11}^{-1}\right] (\mathbf{x}_1 - \boldsymbol{\mu}_1)$
$\quad - 2(\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma_{11}^{-1} \Sigma_{12} \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2)$
$\quad + (\mathbf{x}_2 - \boldsymbol{\mu}_2)' \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2)$
Completing the square in $\mathbf{x}_2$,
$Q(\mathbf{x}_1, \mathbf{x}_2) = Q_1(\mathbf{x}_1) + Q_2(\mathbf{x}_1, \mathbf{x}_2)$
where
$Q_1(\mathbf{x}_1) = (\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)$
and
$Q_2(\mathbf{x}_1, \mathbf{x}_2) = \left[(\mathbf{x}_2 - \boldsymbol{\mu}_2) - \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)\right]' \left(\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right)^{-1} \left[(\mathbf{x}_2 - \boldsymbol{\mu}_2) - \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)\right]$
$= (\mathbf{x}_2 - \mathbf{b})' A^{-1} (\mathbf{x}_2 - \mathbf{b})$
where $\mathbf{b} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)$ and $A = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$.
Now, using $|\Sigma| = |\Sigma_{11}|\,\left|\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}\right| = |\Sigma_{11}|\,|A|$,
$f(\mathbf{x}) = f(\mathbf{x}_1, \mathbf{x}_2) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2} Q(\mathbf{x}_1, \mathbf{x}_2)} = \frac{1}{(2\pi)^{p/2} |\Sigma_{11}|^{1/2} |A|^{1/2}}\, e^{-\frac{1}{2}\left[Q_1(\mathbf{x}_1) + Q_2(\mathbf{x}_1, \mathbf{x}_2)\right]}$
$= \frac{1}{(2\pi)^{q/2} |\Sigma_{11}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)} \times \frac{1}{(2\pi)^{(p-q)/2} |A|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}_2 - \mathbf{b})' A^{-1} (\mathbf{x}_2 - \mathbf{b})}$
The marginal distribution of $\mathbf{x}_1$ is
$f_1(\mathbf{x}_1) = \int f(\mathbf{x}_1, \mathbf{x}_2)\,d\mathbf{x}_2 = \int \cdots \int f(\mathbf{x}_1, \mathbf{x}_2)\,dx_{q+1} \cdots dx_p$
$= \frac{1}{(2\pi)^{q/2} |\Sigma_{11}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)} \int \frac{1}{(2\pi)^{(p-q)/2} |A|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}_2 - \mathbf{b})' A^{-1} (\mathbf{x}_2 - \mathbf{b})}\,d\mathbf{x}_2$
$= \frac{1}{(2\pi)^{q/2} |\Sigma_{11}|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}_1 - \boldsymbol{\mu}_1)' \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)}$
since the second factor is a density and integrates to 1; i.e. $\mathbf{x}_1 \sim N_q(\boldsymbol{\mu}_1, \Sigma_{11})$.
The conditional distribution of $\mathbf{x}_2$ given $\mathbf{x}_1$ is:
$f_{2|1}(\mathbf{x}_2 \mid \mathbf{x}_1) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_1(\mathbf{x}_1)} = \frac{1}{(2\pi)^{(p-q)/2} |A|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}_2 - \mathbf{b})' A^{-1} (\mathbf{x}_2 - \mathbf{b})}$
where $\mathbf{b} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)$ and $A = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$;
i.e. $\mathbf{x}_2 \mid \mathbf{x}_1 \sim N_{p-q}(\mathbf{b}, A)$.
The matrix
$\Sigma_{2 \cdot 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$
is called the matrix of partial variances and covariances.
The (i, j)th element of the matrix $\Sigma_{2 \cdot 1}$,
$\sigma_{ij \cdot 1, 2, \dots, q}$,
is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \dots, x_q$.
$\rho_{ij \cdot 1, 2, \dots, q} = \frac{\sigma_{ij \cdot 1, 2, \dots, q}}{\sqrt{\sigma_{ii \cdot 1, 2, \dots, q}\;\sigma_{jj \cdot 1, 2, \dots, q}}}$
is called the partial correlation between $x_i$ and $x_j$ given $x_1, \dots, x_q$.
The matrix
$\Sigma_{21} \Sigma_{11}^{-1}$
is called the matrix of regression coefficients for predicting $x_{q+1}, x_{q+2}, \dots, x_p$ from $x_1, \dots, x_q$.
The mean vector of $x_{q+1}, x_{q+2}, \dots, x_p$ given $x_1, \dots, x_q$ is:
$\boldsymbol{\mu}_{2|1} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_1 - \boldsymbol{\mu}_1)$
Example:
Suppose that $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$ is 4-variate normal with
$\boldsymbol{\mu} = \begin{pmatrix} 10 \\ 15 \\ 6 \\ 14 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} 4 & 2 & 4 & 2 \\ 2 & 17 & 6 & 5 \\ 4 & 6 & 14 & 6 \\ 2 & 5 & 6 & 7 \end{pmatrix}$
The marginal distribution of $\mathbf{x}_{(1)} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ is bivariate normal with
$\boldsymbol{\mu}_1 = \begin{pmatrix} 10 \\ 15 \end{pmatrix}$ and $\Sigma_{11} = \begin{pmatrix} 4 & 2 \\ 2 & 17 \end{pmatrix}$
The marginal distribution of $\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ is trivariate normal with
mean vector $\begin{pmatrix} 10 \\ 15 \\ 6 \end{pmatrix}$ and covariance matrix $\begin{pmatrix} 4 & 2 & 4 \\ 2 & 17 & 6 \\ 4 & 6 & 14 \end{pmatrix}$
Find the conditional distribution of
$\mathbf{x}_{(2)} = \begin{pmatrix} x_3 \\ x_4 \end{pmatrix}$ given $\mathbf{x}_{(1)} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 15 \\ 5 \end{pmatrix}$.
Now
$\boldsymbol{\mu}_1 = \begin{pmatrix} 10 \\ 15 \end{pmatrix}$, $\boldsymbol{\mu}_2 = \begin{pmatrix} 6 \\ 14 \end{pmatrix}$,
$\Sigma_{11} = \begin{pmatrix} 4 & 2 \\ 2 & 17 \end{pmatrix}$, $\Sigma_{22} = \begin{pmatrix} 14 & 6 \\ 6 & 7 \end{pmatrix}$, and $\Sigma_{12} = \begin{pmatrix} 4 & 2 \\ 6 & 5 \end{pmatrix}$
$\Sigma_{2 \cdot 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = \begin{pmatrix} 14 & 6 \\ 6 & 7 \end{pmatrix} - \begin{pmatrix} 4 & 6 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 17 \end{pmatrix}^{-1} \begin{pmatrix} 4 & 2 \\ 6 & 5 \end{pmatrix} = \begin{pmatrix} 9 & 3 \\ 3 & 5 \end{pmatrix}$
The matrix of regression coefficients for predicting $x_3, x_4$ from $x_1, x_2$ is
$\Sigma_{21} \Sigma_{11}^{-1} = \begin{pmatrix} 4 & 6 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 2 & 17 \end{pmatrix}^{-1} = \begin{pmatrix} 0.875 & 0.250 \\ 0.375 & 0.250 \end{pmatrix}$
and the conditional mean is
$\boldsymbol{\mu}_{2|1} = \boldsymbol{\mu}_2 + \Sigma_{21} \Sigma_{11}^{-1} (\mathbf{x}_{(1)} - \boldsymbol{\mu}_1)$
As a function of $x_1, x_2$,
$\boldsymbol{\mu}_{2|1} = \begin{pmatrix} 0.875 & 0.250 \\ 0.375 & 0.250 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} -6.5 \\ 6.5 \end{pmatrix}$
since
$\boldsymbol{\mu}_2 - \Sigma_{21} \Sigma_{11}^{-1} \boldsymbol{\mu}_1 = \begin{pmatrix} 6 \\ 14 \end{pmatrix} - \begin{pmatrix} 0.875 & 0.250 \\ 0.375 & 0.250 \end{pmatrix} \begin{pmatrix} 10 \\ 15 \end{pmatrix} = \begin{pmatrix} -6.5 \\ 6.5 \end{pmatrix}$
At $(x_1, x_2) = (15, 5)$:
$\boldsymbol{\mu}_{2|1} = \begin{pmatrix} 0.875(15) + 0.250(5) - 6.5 \\ 0.375(15) + 0.250(5) + 6.5 \end{pmatrix} = \begin{pmatrix} 7.875 \\ 13.375 \end{pmatrix}$
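The whole worked example can be reproduced in a few lines of numpy (a sketch mirroring the computations above):

```python
import numpy as np

# Sketch: reproducing the worked conditional-distribution example.
mu1, mu2 = np.array([10.0, 15.0]), np.array([6.0, 14.0])
S11 = np.array([[4.0, 2.0], [2.0, 17.0]])
S22 = np.array([[14.0, 6.0], [6.0, 7.0]])
S12 = np.array([[4.0, 2.0], [6.0, 5.0]])    # rows: x1, x2; cols: x3, x4

B = S12.T @ np.linalg.inv(S11)              # regression coefficients S21 S11^{-1}
S_2_1 = S22 - S12.T @ np.linalg.inv(S11) @ S12  # partial covariance matrix

x1_obs = np.array([15.0, 5.0])
cond_mean = mu2 + B @ (x1_obs - mu1)

print(B)          # rows (0.875, 0.250) and (0.375, 0.250)
print(S_2_1)      # rows (9, 3) and (3, 5)
print(cond_mean)  # (7.875, 13.375)
```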
The Chi-square distribution
The Chi-square ($\chi^2$) distribution with $\nu$ d.f. has density
$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}$ for $x > 0$, and $f(x) = 0$ for $x \le 0$.
Graph: the $\chi^2$ distribution
0
0.1
0.2
0 4 8 12 16
(n = 4)
(n = 5)
(n = 6)
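The density can be evaluated directly from the formula; a minimal stdlib sketch (the ν = 2 spot check works because χ²₂ is an Exponential with rate 1/2):

```python
import math

def chi2_pdf(x, nu):
    """Central chi-square density with nu degrees of freedom."""
    if x <= 0:
        return 0.0
    return x**(nu / 2 - 1) * math.exp(-x / 2) / (2**(nu / 2) * math.gamma(nu / 2))

# For nu = 2 the density reduces to (1/2) e^(-x/2):
print(chi2_pdf(1.0, 2))   # (1/2) e^(-1/2) ≈ 0.3033
```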
Basic Properties of the Chi-Square distribution

1. If z has a Standard Normal distribution then z² has a χ² distribution with 1 degree of freedom.

2. If z₁, z₂, …, zₙ are independent random variables each having a Standard Normal distribution then

U = z₁² + z₂² + … + zₙ²

has a χ² distribution with n degrees of freedom.

3. Let X and Y be independent random variables having χ² distributions with ν₁ and ν₂ degrees of freedom respectively. Then X + Y has a χ² distribution with ν₁ + ν₂ degrees of freedom.

continued
4. Let x₁, x₂, …, xₙ be independent random variables having χ² distributions with ν₁, ν₂, …, νₙ degrees of freedom respectively. Then x₁ + x₂ + … + xₙ has a χ² distribution with ν₁ + … + νₙ degrees of freedom.

5. Suppose X and Y are independent random variables with X and X + Y having χ² distributions with ν₁ and ν (ν > ν₁) degrees of freedom respectively. Then Y has a χ² distribution with ν − ν₁ degrees of freedom.
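Property 2 can be checked by simulation; a hedged sketch (the sample size and seed are arbitrary choices) using the χ²ₙ moments mean n and variance 2n:

```python
import random

# Sum of n squared standard normals: should have mean n and variance 2n.
random.seed(0)
n, reps = 5, 20000
draws = [sum(random.gauss(0.0, 1.0)**2 for _ in range(n)) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((d - mean)**2 for d in draws) / reps
print(round(mean, 2), round(var, 2))   # close to 5 and 10
```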
The non-central Chi-squared distribution

If z₁, z₂, …, zₙ are independent random variables each having a Normal distribution with mean μᵢ and variance σ² = 1, then

U = z₁² + z₂² + … + zₙ²

has a non-central χ² distribution with n degrees of freedom and non-centrality parameter

λ = ½ Σᵢ₌₁ⁿ μᵢ²
Mean and Variance of the non-central χ² distribution

If U has a non-central χ² distribution with ν degrees of freedom and non-centrality parameter λ = ½ Σᵢ₌₁ⁿ μᵢ², then

E[U] = ν + 2λ = ν + Σᵢ μᵢ²   and   Var[U] = 2ν + 8λ = 2ν + 4Σᵢ μᵢ²

If U has a central χ² distribution with ν degrees of freedom then λ is zero, thus

E[U] = ν   and   Var[U] = 2ν
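These moment formulas can be checked by simulation; a hedged sketch. Note a convention difference: NumPy's `nonc` parameter equals Σμᵢ², i.e. twice the λ defined here, so the mean is df + nonc = ν + 2λ.

```python
import numpy as np

rng = np.random.default_rng(1)
nu = 4
mus = np.array([1.0, 2.0, 0.5, 0.0])     # arbitrary illustrative means
lam = 0.5 * np.sum(mus**2)               # non-centrality parameter (as defined above)
u = rng.noncentral_chisquare(df=nu, nonc=2 * lam, size=200_000)

print(u.mean(), nu + 2 * lam)            # E[U]  = nu + 2*lambda
print(u.var(), 2 * nu + 8 * lam)         # Var[U] = 2*nu + 8*lambda
```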
Distribution of Linear and Quadratic Forms

Suppose y ~ N(μ, Σ). Consider the random variable

U = y′Ay = a₁₁y₁² + a₂₂y₂² + … + aₙₙyₙ² + 2a₁₂y₁y₂ + … + 2aₙ₋₁,ₙyₙ₋₁yₙ

Questions

1. What is the distribution of U? (many statistics have this form)
2. When is this distribution simple?
3. When we have two such statistics, when are they independent?
Simplest Case: y ~ N(0, I)

U = y′Iy = y₁² + y₂² + … + yₙ²

Then the distribution of U is the central χ² distribution with ν = n degrees of freedom.
Now consider the distribution of other quadratic forms.

Theorem: Suppose y ~ N(μ, σ²I). Then

U = (1/σ²) y′Iy = (1/σ²)(y₁² + y₂² + … + yₙ²) ~ χ²[n, λ]

where

λ = (1/(2σ²)) Σᵢ₌₁ⁿ μᵢ²
Proof: Note z = (1/σ)y ~ N((1/σ)μ, I). Then

U = z′z = (1/σ²) y′Iy = (1/σ²)(y₁² + y₂² + … + yₙ²) ~ χ²[n, λ]

with

λ = ½ ((1/σ)μ)′((1/σ)μ) = (1/(2σ²)) μ′μ = (1/(2σ²)) Σᵢ₌₁ⁿ μᵢ²
Theorem: Suppose y ~ N(μ, Σ). Then

U = y′Σ⁻¹y ~ χ²[n, λ]   with   λ = ½ μ′Σ⁻¹μ

Proof: Let Σ = AA′ with A non-singular. Consider z = A⁻¹y. Then

z ~ N(A⁻¹μ, A⁻¹Σ(A⁻¹)′) = N(A⁻¹μ, I)

and z′z ~ χ²[n, λ] with

λ = ½ (A⁻¹μ)′(A⁻¹μ) = ½ μ′(A⁻¹)′A⁻¹μ = ½ μ′(AA′)⁻¹μ = ½ μ′Σ⁻¹μ

Also

z′z = (A⁻¹y)′(A⁻¹y) = y′(A⁻¹)′A⁻¹y = y′(AA′)⁻¹y = y′Σ⁻¹y = U

Hence U = y′Σ⁻¹y ~ χ²[n, λ] with λ = ½ μ′Σ⁻¹μ.
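A hedged Monte Carlo check of this theorem (the particular μ and Σ are arbitrary illustrative values), using the non-central χ² mean E[U] = n + 2λ:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Sinv = np.linalg.inv(Sigma)
lam = 0.5 * mu @ Sinv @ mu                    # lambda = mu' Sigma^{-1} mu / 2

y = rng.multivariate_normal(mu, Sigma, size=100_000)
U = np.einsum('ij,jk,ik->i', y, Sinv, y)      # y' Sigma^{-1} y for each draw
print(U.mean(), 3 + 2 * lam)                  # should agree: E[U] = n + 2*lambda
```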
Theorem: Suppose z ~ N(0, I). Then U = z′Az has a central χ² distribution with r d.f.

if and only if

• A is symmetric idempotent of rank r.

Proof: Since A is symmetric idempotent of rank r, there exists an orthogonal matrix P such that

A = PDP′   or   P′AP = D

Since A is idempotent the eigenvalues of A are 0 or 1 and the number of 1's is r, the rank of A. Thus

D = [Iᵣ 0; 0 0]

Let P = [P₁ P₂]. Then

P′AP = [P₁′AP₁ P₁′AP₂; P₂′AP₁ P₂′AP₂] = [Iᵣ 0; 0 0]

Consider y₁ = P₁′z. Now P′z ~ N(P′0, P′P) = N(0, I), thus y₁ ~ N(0, Iᵣ) and

y₁′y₁ has a χ² distribution with r d.f.

Now

U = z′Az = z′PDP′z = y₁′y₁

Thus z′Az has a χ² distribution with r d.f.
Theorem (a generalization of the previous theorem): Suppose z ~ N(μ, I). Then

U = z′Az has a non-central χ²[r, λ] distribution with non-centrality parameter λ = ½ μ′Aμ

if and only if

• A is symmetric idempotent of rank r.

Proof: Similar to the previous theorem.
Theorem: Suppose y ~ N(μ, Σ). Then

U = y′Ay ~ χ²[r, λ]   with   λ = ½ μ′Aμ

if and only if the following two conditions are satisfied:

1. AΣ is idempotent of rank r.
2. ΣA is idempotent of rank r.

Proof: Let Q be such that Σ = QQ′. Then

z = Q⁻¹y ~ N(Q⁻¹μ, Q⁻¹Σ(Q⁻¹)′) = N(Q⁻¹μ, I)

Let B = Q′AQ, or A = (Q⁻¹)′BQ⁻¹. Then, from the previous theorem,

U = z′Bz = y′(Q⁻¹)′BQ⁻¹y = y′Ay

has a χ²[r, λ] distribution with λ = ½ (Q⁻¹μ)′B(Q⁻¹μ) = ½ μ′Aμ

if and only if B = Q′AQ is idempotent of rank r.

B = Q′AQ is of rank r if and only if A is of rank r (B and A are congruent, hence of equal rank). Also B = Q′AQ idempotent means BB = B, i.e.

Q′AQQ′AQ = Q′AQ,   or   Q′AΣAQ = Q′AQ,   i.e.   AΣA = A

Then AΣAΣ = AΣ and, similarly, it can be shown that ΣAΣA = ΣA.

Thus B is idempotent if and only if AΣ and ΣA are idempotent.
Summarizing:

U = y′Ay ~ χ²[r, λ]   with   λ = ½ μ′Aμ

if and only if the following two conditions are satisfied:

1. AΣ is idempotent of rank r.
2. ΣA is idempotent of rank r.
Application:

Let y₁, y₂, …, yₙ be a sample from the Normal distribution with mean μ and variance σ². Then

U = (n − 1)s²/σ²

has a χ² distribution with ν = n − 1 d.f. and λ = 0 (central).
Proof:

y = (y₁, …, yₙ)′ ~ N(μ1, σ²I), where 1 = (1, 1, …, 1)′ and

ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ = (1/n) 1′y

Then

U = (n − 1)s²/σ² = (1/σ²) Σᵢ₌₁ⁿ (yᵢ − ȳ)²
  = (1/σ²) [ y′y − (1/n)(1′y)² ]
  = (1/σ²) y′(I − (1/n)11′)y = y′Ay

where A = (1/σ²)(I − (1/n)11′).

Now

AΣ = (1/σ²)(I − (1/n)11′) σ²I = I − (1/n)11′

Also

AΣAΣ = (I − (1/n)11′)(I − (1/n)11′) = I − (2/n)11′ + (1/n²)1(1′1)1′ = I − (1/n)11′ = AΣ

since 1′1 = n.

Thus AΣ = I − (1/n)11′ is idempotent of rank r = n − 1.

Hence U = (n − 1)s²/σ² = y′Ay has a χ² distribution with ν = n − 1 d.f. and non-centrality parameter

λ = ½ (μ1)′A(μ1) = (μ²/(2σ²)) 1′(I − (1/n)11′)1 = (μ²/(2σ²)) (1′1 − (1/n)(1′1)²) = (μ²/(2σ²))(n − n) = 0
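A hedged simulation of this application (sample size, mean, and σ are arbitrary choices): U = (n − 1)s²/σ² should behave like χ²ₙ₋₁, with mean n − 1 and variance 2(n − 1).

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, reps = 8, 2.0, 100_000
y = rng.normal(5.0, sigma, size=(reps, n))          # the population mean is irrelevant
U = (n - 1) * y.var(axis=1, ddof=1) / sigma**2      # (n-1) s^2 / sigma^2 per sample
print(U.mean(), U.var())                            # close to 7 and 14
```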
Independence of Linear and Quadratic Forms

Again, let y have a N(μ, Σ) distribution and consider the following R.V.'s:

U₁ = y′Ay,   U₂ = y′By,   v = Cy

1. When is U₁ = y′Ay independent of U₂ = y′By?
2. When is U₁ = y′Ay independent of v = Cy?
Theorem: Let y have a N(μ, I) distribution. Then U = y′Ay is independent of v = Cy if CA = 0.

Proof: Since A is symmetric there exists an orthogonal matrix P such that P′AP = D where D is diagonal.

Note: since CA = 0, rank(A) = r < n and some of the eigenvalues (diagonal elements of D) are zero. Thus

A = P [D₁ 0; 0 0] P′

and CA = 0 gives CP [D₁ 0; 0 0] P′ = 0. Writing

CP = [B₁₁ B₁₂; B₂₁ B₂₂]

we get

[B₁₁D₁ 0; B₂₁D₁ 0] = [0 0; 0 0]

and since D₁⁻¹ exists, B₁₁ = 0 and B₂₁ = 0. Thus CP = [0 B₁₂; 0 B₂₂].

Let z = P′y. Then z ~ N(P′μ, P′P) = N(P′μ, I). Now

U = y′Ay = z′P′APz = z′ [D₁ 0; 0 0] z = z₁′D₁z₁

Also

v = Cy = CPz = [0 B₁₂; 0 B₂₂] z

which depends only on z₂. Since z₁ is independent of z₂, U = z₁′D₁z₁ is independent of v = Cy.
Theorem: Let y have a N(μ, Σ) distribution where Σ is of rank n. Then CΣA = 0 implies that the quadratic form y′Ay is independent of the linear form v = Cy.

Proof: Exercise. Similar to the previous theorem.
Application:

Let y₁, y₂, …, yₙ be a sample from the Normal distribution with mean μ and variance σ². Then

ȳ = (1/n) Σᵢ yᵢ   and   s² = (1/(n − 1)) Σᵢ (yᵢ − ȳ)²

are independent.

Proof:

v = ȳ = (1/n, 1/n, …, 1/n) y = Cy   with   C = (1/n) 1′

s² = (1/(n − 1)) y′(I − (1/n)11′)y = y′Ay   with   A = (1/(n − 1))(I − (1/n)11′)

Now y ~ N(μ1, σ²I), so Σ = σ²I, and

CΣA = (1/n)1′ σ²I (1/(n − 1))(I − (1/n)11′)
    = (σ²/(n(n − 1))) (1′ − (1/n)(1′1)1′)
    = (σ²/(n(n − 1))) (1′ − 1′) = 0

since 1′1 = n. Hence ȳ and s² are independent.

Q.E.D.
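A hedged empirical check of this application (sample shape and parameters are arbitrary choices): across many normal samples, ȳ and s² should be uncorrelated (they are in fact fully independent, per the theorem).

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(10.0, 3.0, size=(100_000, 6))   # 100,000 samples of size n = 6
ybar = y.mean(axis=1)
s2 = y.var(axis=1, ddof=1)
r = np.corrcoef(ybar, s2)[0, 1]
print(round(r, 3))   # near 0
```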
Theorem (Independence of quadratic forms): Let y have a N(μ, Σ) distribution where Σ is of rank n. Then AΣB = 0 implies that the quadratic forms

U₁ = y′Ay   and   U₂ = y′By

are independent.

Proof: Let Σ = WW′ (W is non-singular). Then AΣB = 0 gives AWW′B = 0, i.e. V′W̃ = 0 where V = W′A and W̃ = W′B. Note: both A and B are symmetric.
Expected Value and Variance of quadratic forms

Theorem: Suppose E[y] = μ and Var[y] = Σ. Then

E[U] = E[y′Ay] = tr(AΣ) + μ′Aμ

Proof: Let e = y − μ. Then E[e] = 0 and E[ee′] = Σ, and

E[U] = E[(μ + e)′A(μ + e)]
     = E[μ′Aμ + μ′Ae + e′Aμ + e′Ae]
     = μ′Aμ + E[e′Ae]
     = μ′Aμ + E[tr(Aee′)]
     = μ′Aμ + tr(A E[ee′]) = μ′Aμ + tr(AΣ)

Summary:

E[y′Ay] = tr(AΣ) + μ′Aμ = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ aᵢⱼ (E[eᵢeⱼ] + μᵢμⱼ)

where e = y − μ.
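A hedged Monte Carlo check of E[y′Ay] = tr(AΣ) + μ′Aμ (μ, Σ, and the symmetric A are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.7], [0.7, 1.0]])
A = np.array([[1.0, 0.5], [0.5, 3.0]])          # any symmetric A works

y = rng.multivariate_normal(mu, Sigma, size=200_000)
U = np.einsum('ij,jk,ik->i', y, A, y)           # y' A y for each draw
expected = np.trace(A @ Sigma) + mu @ A @ mu
print(U.mean(), expected)                       # both near 20.7
```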
Example – One-way Anova

y₁₁, y₁₂, y₁₃, …, y₁ₙ a sample from N(μ₁, σ²)
y₂₁, y₂₂, y₂₃, …, y₂ₙ a sample from N(μ₂, σ²)
…
yₖ₁, yₖ₂, yₖ₃, …, yₖₙ a sample from N(μₖ, σ²)

U = SS_Error = Σᵢ₌₁ᵏ Σⱼ₌₁ⁿ (yᵢⱼ − ȳᵢ)²   where   ȳᵢ = (1/n) Σⱼ₌₁ⁿ yᵢⱼ

Now E[yᵢⱼ] = μᵢ. Thus

E[U] = E[ Σᵢ Σⱼ (yᵢⱼ − ȳᵢ)² ] = Σᵢ E[(n − 1)sᵢ²] = Σᵢ (n − 1)σ² = k(n − 1)σ²

where sᵢ² = (1/(n − 1)) Σⱼ (yᵢⱼ − ȳᵢ)² is the sample variance in group i, and E[sᵢ²] = σ².
Now let

U = SS_Treatment = n Σᵢ₌₁ᵏ (ȳᵢ − ȳ)²   where   ȳ = (1/k) Σᵢ₌₁ᵏ ȳᵢ

Now E[ȳᵢ] = μᵢ and E[ȳ] = μ̄ = (1/k) Σᵢ₌₁ᵏ μᵢ. Writing ēᵢ = ȳᵢ − μᵢ (so Var[ēᵢ] = σ²/n),

E[U] = n E[ Σᵢ (ȳᵢ − ȳ)² ] = n E[(k − 1)s_ē²] + n Σᵢ (μᵢ − μ̄)²

where s_ē² is the sample variance calculated from ē₁, ē₂, …, ēₖ, with E[s_ē²] = Var[ēᵢ] = σ²/n. Thus

E[SS_Treatment] = (k − 1)σ² + n Σᵢ₌₁ᵏ (μᵢ − μ̄)²
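Both expected sums of squares can be checked by simulation; a hedged sketch (the group means, n, and σ are arbitrary illustrative values):

```python
import numpy as np

# E[SS_Error] = k(n-1) sigma^2;  E[SS_Treatment] = (k-1) sigma^2 + n * sum (mu_i - mu_bar)^2
rng = np.random.default_rng(6)
mus, n, sigma, reps = np.array([1.0, 2.0, 4.0]), 5, 1.5, 20_000
k = len(mus)

y = rng.normal(mus[None, :, None], sigma, size=(reps, k, n))   # reps x k groups x n obs
group_means = y.mean(axis=2)
grand_means = y.mean(axis=(1, 2))
ss_error = ((y - group_means[:, :, None])**2).sum(axis=(1, 2))
ss_treat = n * ((group_means - grand_means[:, None])**2).sum(axis=1)

print(ss_error.mean(), k * (n - 1) * sigma**2)
print(ss_treat.mean(), (k - 1) * sigma**2 + n * ((mus - mus.mean())**2).sum())
```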
Statistical Inference

Making decisions from data

There are two main areas of Statistical Inference:

• Estimation – deciding on the value of a parameter
  – Point estimation
  – Confidence Interval, Confidence region estimation
• Hypothesis testing
  – Deciding if a statement (hypothesis) about a parameter is True or False

The general statistical model

Most data fits this situation.
Defn (The Classical Statistical Model)

The data vector: x = (x₁, x₂, x₃, …, xₙ)

The model: Let f(x|θ) = f(x₁, x₂, …, xₙ | θ₁, θ₂, …, θₚ) denote the joint density of the data vector x = (x₁, x₂, x₃, …, xₙ) of observations, where the unknown parameter vector θ ∈ Ω (a subset of p-dimensional space).

An Example

The data vector: x = (x₁, x₂, x₃, …, xₙ), a sample from the normal distribution with mean μ and variance σ².

The model: Then f(x|μ, σ²) = f(x₁, x₂, …, xₙ | μ, σ²), the joint density of x = (x₁, x₂, x₃, …, xₙ), takes on the form:

f(x|μ, σ²) = Πᵢ₌₁ⁿ (2πσ²)^(−1/2) e^(−(xᵢ − μ)²/(2σ²)) = (2πσ²)^(−n/2) e^(−(1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − μ)²)

where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | −∞ < x < ∞, 0 ≤ y < ∞}.
Defn (Sufficient Statistics)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S₁(x), S₂(x), S₃(x), …, Sₖ(x)) is called a set of sufficient statistics for the parameter vector θ if the conditional distribution of x given S = (S₁(x), S₂(x), S₃(x), …, Sₖ(x)) is not functionally dependent on the parameter vector θ.

A set of sufficient statistics contains all of the information concerning the unknown parameter vector.

A Simple Example illustrating Sufficiency

Suppose that we observe a Success-Failure experiment n = 3 times. Let θ denote the probability of Success. Suppose that the data collected is x₁, x₂, x₃, where xᵢ takes on the value 1 if the ith trial is a Success and 0 if the ith trial is a Failure.

The following table gives the possible values of (x₁, x₂, x₃).
(x₁, x₂, x₃)   f(x₁, x₂, x₃|θ)   S = Σxᵢ   g(S|θ)        f(x₁, x₂, x₃|S)
(0, 0, 0)      (1 − θ)³          0         (1 − θ)³      1
(1, 0, 0)      (1 − θ)²θ         1                       1/3
(0, 1, 0)      (1 − θ)²θ         1         3(1 − θ)²θ    1/3
(0, 0, 1)      (1 − θ)²θ         1                       1/3
(1, 1, 0)      (1 − θ)θ²         2                       1/3
(1, 0, 1)      (1 − θ)θ²         2         3(1 − θ)θ²    1/3
(0, 1, 1)      (1 − θ)θ²         2                       1/3
(1, 1, 1)      θ³                3         θ³            1

The data can be generated in two equivalent ways:

1. Generating (x₁, x₂, x₃) directly from f(x₁, x₂, x₃|θ), or
2. Generating S from g(S|θ), then generating (x₁, x₂, x₃) from f(x₁, x₂, x₃|S). Since the second step does not involve θ, no additional information will be obtained by knowing (x₁, x₂, x₃) once S is determined.
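The table above can be reproduced for any numeric θ; a minimal stdlib sketch showing that the conditional column f(x₁, x₂, x₃|S) is free of θ:

```python
from itertools import product
from collections import defaultdict

theta = 0.3   # any success probability gives the same conditional column

def joint(x, th):
    """f(x1, x2, x3 | theta) for independent Bernoulli(theta) trials."""
    s = sum(x)
    return th**s * (1 - th)**(len(x) - s)

# g(S | theta): total probability of each value of S = sum of the x's.
gS = defaultdict(float)
for x in product([0, 1], repeat=3):
    gS[sum(x)] += joint(x, theta)

for x in product([0, 1], repeat=3):
    cond = joint(x, theta) / gS[sum(x)]
    print(x, round(cond, 4))   # 1 for S = 0, 3 and 1/3 for S = 1, 2
```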
The Sufficiency Principle

Any decision regarding the parameter θ should be based on a set of sufficient statistics S₁(x), S₂(x), …, Sₖ(x) and not otherwise on the value of x.

A useful approach in developing a statistical procedure:

1. Find sufficient statistics.
2. Develop estimators, tests of hypotheses etc. using only these statistics.
Defn (Minimal Sufficient Statistics)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S₁(x), S₂(x), S₃(x), …, Sₖ(x)) is a set of Minimal Sufficient statistics for the parameter vector θ if S is a set of sufficient statistics and can be calculated from any other set of sufficient statistics.

Theorem (The Factorization Criterion)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S₁(x), S₂(x), S₃(x), …, Sₖ(x)) is a set of sufficient statistics for the parameter vector θ if

f(x|θ) = h(x) g(S, θ) = h(x) g(S₁(x), S₂(x), S₃(x), …, Sₖ(x), θ)

This is useful for finding sufficient statistics: if you can factor out the θ-dependence with a set of statistics, then these statistics are a set of sufficient statistics.
Defn (Completeness)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then S = (S₁(x), S₂(x), S₃(x), …, Sₖ(x)) is a set of Complete Sufficient statistics for the parameter vector θ if S is a set of sufficient statistics and whenever

E[φ(S₁(x), S₂(x), S₃(x), …, Sₖ(x))] = 0

then

P[φ(S₁(x), S₂(x), S₃(x), …, Sₖ(x)) = 0] = 1
Defn (The Exponential Family)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then f(x|θ) is said to be a member of the exponential family of distributions if:

f(x|θ) = h(x) g(θ) exp[ Σᵢ₌₁ᵏ pᵢ(θ) Sᵢ(x) ]   for a < xᵢ < b,   0 otherwise

where θ ∈ Ω and

1) −∞ ≤ a < b ≤ ∞ are not dependent on θ.
2) Ω contains a nondegenerate k-dimensional rectangle.
3) g(θ), a, b and pᵢ(θ) are not dependent on x.
4) h(x), a, b and Sᵢ(x) are not dependent on θ.

If in addition:

5) The Sᵢ(x) are functionally independent for i = 1, 2, …, k.
6) ∂Sᵢ(x)/∂xⱼ exists and is continuous for all i = 1, 2, …, k and j = 1, 2, …, n.
7) pᵢ(θ) is a continuous function of θ for all i = 1, 2, …, k.
8) R = {(p₁(θ), p₂(θ), …, pₖ(θ)) | θ ∈ Ω} contains a nondegenerate k-dimensional rectangle.

Then the set of statistics S₁(x), S₂(x), …, Sₖ(x) forms a Minimal Complete set of Sufficient statistics.
Defn (The Likelihood function)

Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then, for a given value of the observation vector x, the Likelihood function, Lₓ(θ), is defined by:

Lₓ(θ) = f(x|θ)   with θ ∈ Ω

The log-Likelihood function lₓ(θ) is defined by:

lₓ(θ) = ln Lₓ(θ) = ln f(x|θ)   with θ ∈ Ω

The Likelihood Principle

Any decision regarding the parameter θ should be based on the likelihood function Lₓ(θ) and not otherwise on the value of x. If two data sets result in the same likelihood function, the decision regarding θ should be the same.

Some statisticians find it useful to plot the likelihood function Lₓ(θ) given the value of x. It summarizes the information contained in x regarding the parameter vector θ.
An Example

The data vector: x = (x₁, x₂, x₃, …, xₙ), a sample from the normal distribution with mean μ and variance σ².

The joint distribution of x: Then f(x|μ, σ²) = f(x₁, x₂, …, xₙ | μ, σ²), the joint density of x = (x₁, x₂, x₃, …, xₙ), takes on the form:

f(x|μ, σ²) = (2πσ²)^(−n/2) e^(−(1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − μ)²)

where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | −∞ < x < ∞, 0 ≤ y < ∞}.
The Likelihood function

Assume the data vector is known: x = (x₁, x₂, x₃, …, xₙ).

Then the Likelihood function L(μ, σ) = f(x|μ, σ²) = f(x₁, x₂, …, xₙ | μ, σ²) is

L(μ, σ) = Πᵢ₌₁ⁿ (2πσ²)^(−1/2) e^(−(xᵢ − μ)²/(2σ²)) = (2πσ²)^(−n/2) e^(−(1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − μ)²)

Now

Σᵢ₌₁ⁿ (xᵢ − μ)² = Σᵢ₌₁ⁿ (xᵢ − x̄)² + n(x̄ − μ)² = (n − 1)s² + n(x̄ − μ)²

since x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ and s² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)². Hence

L(μ, σ) = (2πσ²)^(−n/2) e^(−[(n − 1)s² + n(x̄ − μ)²]/(2σ²))
[Graph: contour map of the likelihood, n = 100]

Now consider the following data (n = 100):

57.1 72.3 75.0 57.8 50.3 48.0 49.6 53.1 58.5 53.7
77.8 43.0 69.8 65.1 71.1 44.4 64.4 52.9 56.4 43.9
49.0 37.6 65.5 50.4 40.7 66.9 51.5 55.8 49.1 59.5
64.5 67.6 79.9 48.0 68.1 68.0 65.8 61.3 75.0 78.0
61.8 69.0 56.2 77.2 57.5 84.0 45.5 64.4 58.7 77.5
81.9 77.1 58.7 71.2 58.1 50.3 53.2 47.6 53.3 76.4
69.8 57.8 65.9 63.0 43.5 70.7 85.2 57.2 78.9 72.9
78.6 53.9 61.9 75.2 62.2 53.2 73.0 38.9 75.4 69.7
68.8 77.0 51.2 65.6 44.7 40.4 72.1 68.1 82.2 64.7
83.1 71.9 65.4 45.0 51.6 48.3 58.5 65.3 65.9 59.6

mean x̄ = 62.02,   s = 11.8571

For these data

L(μ, σ) = (6.2832σ²)^(−50) e^(−[99(11.8571)² + 100(62.02 − μ)²]/(2σ²))
[Graph: the likelihood surface L(μ, σ), n = 100]

[Graph: contour map of the likelihood, n = 100]
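The likelihood depends on the data only through the sufficient statistics x̄ and s; a hedged sketch that evaluates the normal log-likelihood this way, using the summaries 62.02 and 11.8571 from the data above (the comparison points 60.0 and 12.0 are arbitrary):

```python
import math

def log_lik(mu, sigma, n, xbar, s):
    """Normal log-likelihood written via (n-1)s^2 + n(xbar - mu)^2."""
    ss = (n - 1) * s**2 + n * (xbar - mu)**2
    return -n / 2 * math.log(2 * math.pi * sigma**2) - ss / (2 * sigma**2)

n, xbar, s = 100, 62.02, 11.8571
# The likelihood is maximized at mu = xbar and sigma^2 = (n-1) s^2 / n:
sigma_hat = math.sqrt((n - 1) * s**2 / n)
print(log_lik(xbar, sigma_hat, n, xbar, s) >= log_lik(60.0, 12.0, n, xbar, s))  # True
```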
The Sufficiency Principle

Any decision regarding the parameter θ should be based on a set of sufficient statistics S₁(x), S₂(x), …, Sₖ(x) and not otherwise on the value of x. If two data sets result in the same values for the set of sufficient statistics, the decision regarding θ should be the same.

Theorem (Birnbaum – Equivalency of the Likelihood Principle and Sufficiency Principle)

Lₓ₁(θ) ∝ Lₓ₂(θ)   if and only if   S₁(x₁) = S₁(x₂), …, and Sₖ(x₁) = Sₖ(x₂)
The following table gives the possible values of (x₁, x₂, x₃).

(x₁, x₂, x₃)   f(x₁, x₂, x₃|θ)   S = Σxᵢ   g(S|θ)        f(x₁, x₂, x₃|S)
(0, 0, 0)      (1 − θ)³          0         (1 − θ)³      1
(1, 0, 0)      (1 − θ)²θ         1                       1/3
(0, 1, 0)      (1 − θ)²θ         1         3(1 − θ)²θ    1/3
(0, 0, 1)      (1 − θ)²θ         1                       1/3
(1, 1, 0)      (1 − θ)θ²         2                       1/3
(1, 0, 1)      (1 − θ)θ²         2         3(1 − θ)θ²    1/3
(0, 1, 1)      (1 − θ)θ²         2                       1/3
(1, 1, 1)      θ³                3         θ³            1
The Likelihood function

[Graphs: Lₓ(θ) for 0 ≤ θ ≤ 1, one panel for each of S = 0, 1, 2, 3]
Estimation Theory

Point Estimation

Defn (Estimator)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω.

Then an estimator of the parameter φ(θ) = φ(θ₁, θ₂, …, θₖ) is any function T(x) = T(x₁, x₂, x₃, …, xₙ) of the observation vector.
Defn (Mean Square Error)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter φ(θ). Then the Mean Square Error of T(x) is defined to be:

M.S.E._T(θ) = E[(T(x) − φ(θ))²] = ∫ (T(x) − φ(θ))² f(x|θ) dx

Defn (Uniformly Better)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) and T*(x) be estimators of the parameter φ(θ). Then T(x) is said to be uniformly better than T*(x) if:

M.S.E._T(θ) ≤ M.S.E._T*(θ)   whenever θ ∈ Ω

Defn (Unbiased)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter φ(θ). Then T(x) is said to be an unbiased estimator of the parameter φ(θ) if:

E[T(x)] = ∫ T(x) f(x|θ) dx = φ(θ)
Theorem (Cramer-Rao Lower Bound)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Suppose that:

i) ∂f(x|θ)/∂θ exists for all x and for all θ ∈ Ω.

ii) ∂/∂θ ∫ f(x|θ) dx = ∫ ∂f(x|θ)/∂θ dx

iii) ∂/∂θ ∫ t(x) f(x|θ) dx = ∫ t(x) ∂f(x|θ)/∂θ dx

iv) 0 < E[(∂ ln f(x|θ)/∂θᵢ)²] < ∞   for all θ ∈ Ω.

Let M denote the p × p matrix with ijth element

mᵢⱼ = −E[∂² ln f(x|θ)/∂θᵢ∂θⱼ],   i, j = 1, 2, …, p

Then V = M⁻¹ is the lower bound for the covariance matrix of unbiased estimators of θ. That is, var(c′θ̂) = c′ var(θ̂) c ≥ c′M⁻¹c = c′Vc, where θ̂ is a vector of unbiased estimators of θ.
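For a concrete instance of the bound (a hedged sketch with arbitrary n, μ, σ): for a normal sample with σ known, the information for μ is n/σ², so the bound for unbiased estimators of μ is σ²/n, and the sample mean attains it.

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma, reps = 20, 2.0, 100_000
xbar = rng.normal(3.0, sigma, size=(reps, n)).mean(axis=1)   # many sample means
crlb = sigma**2 / n                                          # Cramer-Rao bound for mu
print(xbar.var(), crlb)                                      # both near 0.2
```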
Defn (Uniformly Minimum Variance Unbiased Estimator)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then T*(x) is said to be the UMVU (Uniformly Minimum Variance Unbiased) estimator of φ(θ) if:

1) E[T*(x)] = φ(θ) for all θ ∈ Ω.
2) Var[T*(x)] ≤ Var[T(x)] for all θ ∈ Ω whenever E[T(x)] = φ(θ).

Theorem (Rao-Blackwell)

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S₁(x), S₂(x), …, Sₖ(x) denote a set of sufficient statistics. Let T(x) be any unbiased estimator of φ(θ). Then T*[S₁(x), S₂(x), …, Sₖ(x)] = E[T(x) | S₁(x), S₂(x), …, Sₖ(x)] is an unbiased estimator of φ(θ) such that:

Var[T*(S₁(x), S₂(x), …, Sₖ(x))] ≤ Var[T(x)]   for all θ ∈ Ω.

Theorem (Lehmann-Scheffe')

Let x = (x₁, x₂, x₃, …, xₙ) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S₁(x), S₂(x), …, Sₖ(x) denote a set of complete sufficient statistics. Let T*[S₁(x), S₂(x), …, Sₖ(x)] be an unbiased estimator of φ(θ). Then:

T*(S₁(x), S₂(x), …, Sₖ(x)) is the UMVU estimator of φ(θ).
Defn (Consistency)
7/28/2019 Section 02 Review of Probability and Statistics
http://slidepdf.com/reader/full/section-02-review-of-probability-and-statistics 164/183
Let x = ( x1 , x2 , x3 , ... , xn) denote the vector of observations having joint density f (x|q) where
the unknown parameter vector q W. Let T n(x)
be an estimator of f (q). Then T n(x) is called aconsistent estimator of f (q) if for any e > 0:
W θθx allfor 0lim e f nn T P
Defn (M. S. E. Consistency)
Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where the
unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of φ(θ). Then Tn(x) is called an M.S.E.
consistent estimator of φ(θ) if:

lim(n→∞) M.S.E.[Tn(x)] = lim(n→∞) E[(Tn(x) − φ(θ))²] = 0 for all θ ∈ Ω.
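For instance (a hypothetical sketch, not from the slides), the sample mean of n Uniform(0, 1) draws has M.S.E. equal to its variance, (1/12)/n, which tends to 0 as n grows, so it is M.S.E. consistent for the true mean 0.5. A quick simulation confirms the decay (the helper name is mine):

```python
import random

def empirical_mse(n, reps=2000, seed=1):
    """Empirical M.S.E. of T_n = mean of n Uniform(0, 1) draws as an
    estimator of the true mean 0.5; theory gives MSE = (1/12)/n -> 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        total += (xbar - 0.5) ** 2
    return total / reps

for n in (10, 100, 1000):
    print(n, round(empirical_mse(n), 6))  # roughly 1/(12*n)
```

Each tenfold increase in n shrinks the empirical M.S.E. by roughly a factor of ten, matching the (1/12)/n rate.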
Methods for Finding Estimators
1. The Method of Moments
2. Maximum Likelihood Estimation
Method of Moments
Let x1, … , xn denote a sample from the density function

f(x; θ1, … , θp) = f(x; θ)

The kth moment of the distribution being sampled is defined to be:

μk = μk(θ1, … , θp) = E[x^k] = ∫ x^k f(x; θ1, … , θp) dx
The kth sample moment is defined to be:

mk = (1/n) Σ(i = 1 to n) xi^k

To find the method of moments estimators of θ1, … , θp we set up the equations:

μ1(θ1, … , θp) = m1
μ2(θ1, … , θp) = m2
...
μp(θ1, … , θp) = mp
We then solve these equations for θ1, … , θp. The solutions θ̂1, … , θ̂p are called the method of
moments estimators.
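As a worked sketch (my own example, not from the slides, assuming a Gamma(shape k, scale s) model, for which μ1 = ks and μ2 = k(k+1)s²): solving μ1 = m1 and μ2 = m2 gives s-hat = (m2 − m1²)/m1 and k-hat = m1²/(m2 − m1²). The helper name below is mine:

```python
import random

def gamma_mom(xs):
    """Method-of-moments estimates for a Gamma(shape k, scale s) sample:
    matching mu1 = k*s and mu2 = k*(k+1)*s^2 to m1, m2 gives
    s-hat = (m2 - m1^2)/m1 and k-hat = m1^2/(m2 - m1^2)."""
    n = len(xs)
    m1 = sum(xs) / n                 # first sample moment
    m2 = sum(x * x for x in xs) / n  # second sample moment
    s_hat = (m2 - m1 * m1) / m1
    k_hat = m1 * m1 / (m2 - m1 * m1)
    return k_hat, s_hat

rng = random.Random(42)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(200000)]
k_hat, s_hat = gamma_mom(sample)
print(f"k-hat = {k_hat:.3f} (true 3), s-hat = {s_hat:.3f} (true 2)")
```

With a large sample, both moment estimates land close to the true shape 3 and scale 2.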
The Method of Maximum Likelihood
Suppose that the data x1, … , xn has joint density function

f(x1, … , xn; θ1, … , θp)

where θ = (θ1, … , θp) are unknown parameters assumed to lie in Ω (a subset of p-dimensional
space). We want to estimate the parameters θ1, … , θp.
Definition: Maximum Likelihood Estimation
Suppose that the data x1, … , xn has joint density function

f(x1, … , xn; θ1, … , θp)

Then the likelihood function is defined to be

L(θ) = L(θ1, … , θp) = f(x1, … , xn; θ1, … , θp)

The maximum likelihood estimators of the parameters θ1, … , θp are the values θ̂1, … , θ̂p
such that

L(θ̂1, … , θ̂p) = max over (θ1, … , θp) in Ω of L(θ1, … , θp)
Note: maximizing L(θ1, … , θp) is equivalent to maximizing

l(θ1, … , θp) = ln L(θ1, … , θp)

the log-likelihood function.
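For example (a hypothetical sketch, not from the slides), for an Exponential(θ) sample with f(x; θ) = θ e^(−θx), the log-likelihood is l(θ) = n ln θ − θ Σ xi, and setting dl/dθ = 0 gives θ̂ = 1/x̄. A crude grid search over the log-likelihood (the function name is mine) recovers the same value:

```python
import math
import random

def exp_mle_grid(xs, lo=0.01, hi=10.0, steps=20000):
    """Maximize the exponential log-likelihood
    l(theta) = n*ln(theta) - theta*sum(x_i) by brute-force grid search."""
    n, sx = len(xs), sum(xs)

    def loglik(t):
        return n * math.log(t) - t * sx

    return max((lo + (hi - lo) * i / steps for i in range(steps + 1)),
               key=loglik)

rng = random.Random(7)
xs = [rng.expovariate(2.0) for _ in range(5000)]  # true theta = 2
grid = exp_mle_grid(xs)
closed = len(xs) / sum(xs)  # dl/dtheta = 0  =>  theta-hat = 1/x-bar
print(f"grid search {grid:.3f}, closed form {closed:.3f}")
```

The numeric maximizer agrees with the calculus answer 1/x̄ to within the grid spacing, and both sit near the true θ = 2.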
Hypothesis Testing
Defn (Test of size α)
Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where
the unknown parameter vector θ ∈ Ω. Let ω be any subset of Ω. Consider testing the null hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω.
Let A denote the acceptance region for the test (all values x = (x1, x2, x3, ..., xn) such that the
decision to accept H0 is made) and let C denote the critical region for the test (all values
x = (x1, x2, x3, ..., xn) such that the decision to reject H0 is made).

Then the test is said to be of size α if

P[x ∈ C|θ] = ∫_C f(x|θ) dx ≤ α for all θ ∈ ω, and

P[x ∈ C|θ] = ∫_C f(x|θ) dx = α for at least one θ0 ∈ ω.
Defn (Power)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where the
unknown parameter vector θ ∈ Ω. Consider testing the null hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Then the power of the test at θ ∉ ω is defined to be:

Power(θ) = P[x ∈ C|θ] = ∫_C f(x|θ) dx

where C is the critical region of the test.
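As a hypothetical illustration (not from the slides): for n N(θ, 1) observations, testing H0: θ = 0 against H1: θ > 0 by rejecting when x̄ > z_α/√n has power 1 − Φ(z_α − θ√n), which can be evaluated with the error function (the function names below are mine):

```python
import math

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(theta, n=25):
    """Power at theta of the size-0.05 test of H0: theta = 0 vs
    H1: theta > 0 that rejects when x-bar > z_0.05 / sqrt(n)."""
    z_alpha = 1.6448536269514722  # Phi(z_alpha) = 0.95
    return 1.0 - std_normal_cdf(z_alpha - theta * math.sqrt(n))

for theta in (0.0, 0.2, 0.5, 1.0):
    print(theta, round(power(theta), 3))
# at theta = 0 the power equals the size, 0.05
```

The power rises from the size α = 0.05 at the null value toward 1 as θ moves away from it, which is the behavior the definition is meant to capture.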
Defn (Uniformly Most Powerful (UMP) test of size α)
Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where the
unknown parameter vector θ ∈ Ω. Consider testing the null hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Let C denote the critical region for the test. Then the test is called
the UMP test of size α if:
P[x ∈ C|θ] = ∫_C f(x|θ) dx ≤ α for all θ ∈ ω, and

P[x ∈ C|θ] = ∫_C f(x|θ) dx = α for at least one θ0 ∈ ω,

and for any other critical region C* such that:

P[x ∈ C*|θ] = ∫_C* f(x|θ) dx ≤ α for all θ ∈ ω, and

P[x ∈ C*|θ] = ∫_C* f(x|θ) dx = α for at least one θ0 ∈ ω,

then

∫_C* f(x|θ) dx ≤ ∫_C f(x|θ) dx for all θ ∉ ω.
Theorem (Neyman-Pearson Lemma)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where the
unknown parameter vector θ ∈ Ω = {θ0, θ1}. Consider testing the null hypothesis

H0: θ = θ0

against the alternative hypothesis

H1: θ = θ1.

Then the UMP test of size α has critical region:

C = { x : f(x|θ0) / f(x|θ1) ≤ K }

where K is chosen so that

P[x ∈ C|θ0] = ∫_C f(x|θ0) dx = α.
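To see the lemma in action (a hypothetical sketch with N(θ, 1) data, not from the slides): for H0: θ0 = 0 vs H1: θ1 = 1 the ratio f(x|θ0)/f(x|θ1) equals exp(n/2 − Σ xi), which is decreasing in x̄, so the region {ratio ≤ K} is exactly {x̄ ≥ c} for some cutoff c. The code below (names are mine) checks the monotonicity numerically:

```python
import math
import random

def np_ratio(xs, th0=0.0, th1=1.0):
    """Neyman-Pearson ratio f(x|theta0)/f(x|theta1) for i.i.d. N(theta, 1)
    data; the (2*pi)^(-n/2) normalizing constants cancel."""
    s0 = sum((x - th0) ** 2 for x in xs)
    s1 = sum((x - th1) ** 2 for x in xs)
    return math.exp(-0.5 * (s0 - s1))

rng = random.Random(3)
samples = [[rng.gauss(0.5, 1.0) for _ in range(20)] for _ in range(200)]
pairs = sorted((sum(xs) / len(xs), np_ratio(xs)) for xs in samples)
ratios = [r for _, r in pairs]
# ratio decreases as x-bar increases: {ratio <= K} is a right tail in x-bar
monotone = all(ratios[i] >= ratios[i + 1] for i in range(len(ratios) - 1))
print(monotone)  # True
```

This monotone-in-x̄ structure is why, in this example, the abstract likelihood-ratio region reduces to the familiar "reject when the sample mean is large" rule.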
Defn (Likelihood Ratio Test of size α)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where the
unknown parameter vector θ ∈ Ω. Consider testing the null hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Then the likelihood ratio (LR) test of size α has critical region:

C = { x : λ(x) = [max over θ in ω of f(x|θ)] / [max over θ in Ω of f(x|θ)] ≤ K }

where K is chosen so that

P[x ∈ C|θ] = ∫_C f(x|θ) dx ≤ α for all θ ∈ ω, and

P[x ∈ C|θ] = ∫_C f(x|θ) dx = α for at least one θ0 ∈ ω.
Theorem (Asymptotic distribution of the likelihood ratio test criterion)

Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ), where the
unknown parameter vector θ ∈ Ω. Consider testing the null hypothesis

H0: θ ∈ ω

against the alternative hypothesis

H1: θ ∉ ω,

where ω is any subset of Ω. Let

λ(x) = [max over θ in ω of f(x|θ)] / [max over θ in Ω of f(x|θ)]

Then, under proper regularity conditions, U = −2 ln λ(x) has asymptotically a chi-square
distribution with degrees of freedom equal to dim(Ω) − dim(ω).
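A simulation sketch of this result (my own hypothetical example, not from the slides): with N(μ, 1) data and H0: μ = 0 against the unrestricted alternative, the criterion simplifies to −2 ln λ(x) = n x̄² and dim(Ω) − dim(ω) = 1, so U should behave like a chi-square with 1 degree of freedom under H0 (the function name is mine):

```python
import random

def minus_two_log_lambda(xs):
    """-2 ln lambda for H0: mu = 0 vs the unrestricted alternative with
    N(mu, 1) data: the maximized-likelihood ratio reduces to n * x-bar^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return n * xbar * xbar

rng = random.Random(11)
stats = [minus_two_log_lambda([rng.gauss(0.0, 1.0) for _ in range(30)])
         for _ in range(10000)]
mean_u = sum(stats) / len(stats)
tail = sum(s > 3.841 for s in stats) / len(stats)
# chi-square(1) has mean 1 and 95th percentile 3.841
print(f"mean of U = {mean_u:.3f}, P[U > 3.841] = {tail:.3f}")
```

In this particular model the chi-square law is in fact exact, since √n x̄ is standard normal under H0, so the simulated mean sits near 1 and the tail probability near 0.05.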