
Page 1: Applied Statistics and Econometrics Review of Probability and Statistical …saullach.weebly.com/uploads/2/4/5/3/2453675/lecture_2... · 2018. 9. 5. · Applied Statistics and Econometrics

Applied Statistics and Econometrics, Lecture 2

Saul Lach

September 2017

Saul Lach () Applied Statistics and Econometrics September 2017 1 / 50

Review of Probability and Statistical Theory

This review covers 4 topics:

1. The probability framework for statistical inference (SW 2.1-2.4)
2. Estimation (SW 2.5-2.6, SW 3.1, SW 3.5)
3. Hypothesis testing (SW 3.2, SW 3.4)
4. Confidence intervals (SW 3.3)


The probability framework for statistical inference (SW 2)

Outline for this part:

1.1 Population and samples, random variable, and distribution (SW 2.1, 2.4)

1.2 Moments of a distribution (mean, variance, standard deviation, covariance, correlation) (SW 2.2)

1.3 Two random variables: Conditional distributions and conditional means (SW 2.3)


Population and samples

Population: The group or collection of all possible entities (units) of interest (school districts in CA, American CEOs). We will think of populations as infinitely large (∞ is an approximation to “very big”).

Sample: A sample is a subset of entities (units) selected from the population. We can draw many samples from a given population.

We now focus on features of random variables in the population.


Random variables and probability distribution

Random variable: A random variable X is a numerical summary of a random outcome (district test score, district STR, CEO salary).

Probability distribution of X: The probabilities of the different values of X that occur in the population, e.g., $\Pr[X = 650]$ (when X is discrete), or the probabilities of sets of these values, e.g., $\Pr[640 \le X \le 660]$ (when X is continuous):

$$\Pr[640 \le X \le 660] = \int_{640}^{660} f(x)\,dx$$

where f(x) is the probability density function (p.d.f.) of X.
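Numerically, the integral above is just an area under the p.d.f. The sketch below approximates it with the trapezoid rule. The slide does not specify f, so the density used here is an assumption for illustration only: a normal p.d.f. with a hypothetical mean of 650 and standard deviation of 10.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def prob_between(pdf, a, b, steps=10_000):
    """Approximate Pr[a <= X <= b], the integral of pdf from a to b, by the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + k * h) for k in range(1, steps))
    return total * h


# Hypothetical density (an assumption, not from the lecture): X ~ N(650, 10^2).
p = prob_between(lambda x: normal_pdf(x, 650.0, 10.0), 640.0, 660.0)
print(f"Pr[640 <= X <= 660] is approximately {p:.4f}")
```

With this assumed density the interval spans one standard deviation on each side of the mean, so the result should be close to the familiar 68%.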


Examples of distributions: Normal

A very important distribution is the normal (or Gaussian) distribution. The normal distribution has a bell-shaped p.d.f., formally given by:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

where µ and σ are parameters that, as we will see, have an important interpretation.

If a random variable X has a normal distribution, we write

$$X \sim N(\mu, \sigma^2)$$


Examples of distributions: Normal

The probability that X takes values between a and b is the area under the bell-shaped p.d.f.: $\Pr[a \le X \le b] = \int_a^b f(x)\,dx$.
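For the normal distribution this area does not have to be integrated numerically: it can be written in terms of the normal c.d.f., which Python's standard library expresses through the error function `math.erf`. A minimal sketch (the N(650, 10²) case is a hypothetical example):

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr[X <= x] for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_prob_between(a, b, mu=0.0, sigma=1.0):
    """Pr[a <= X <= b]: the area under the normal p.d.f. between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


p_std = normal_prob_between(-1.96, 1.96)                 # standard normal, N(0, 1)
p_hyp = normal_prob_between(640.0, 660.0, 650.0, 10.0)   # hypothetical N(650, 10^2)
print(round(p_std, 4), round(p_hyp, 4))
```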


Examples of distributions: Chi-squared

Another important distribution is the chi-squared distribution, with p.d.f.:

$$f(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where

Γ(·) is a complicated function (called the Gamma function);
ν is a parameter (called the “degrees of freedom” of the χ² distribution).

If a random variable X has a chi-squared distribution with ν degrees of freedom, we write

$$X \sim \chi^2_\nu$$


Examples of distributions: Chi-squared

The probability that X takes values between a and b is the area under the p.d.f.: $\Pr[a \le X \le b] = \int_a^b f(x)\,dx$.
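A sketch of the chi-squared p.d.f. above, with a probability computed by the trapezoid rule. The check relies on a fact not stated on the slide: for ν = 2 the p.d.f. reduces to $\frac{1}{2}e^{-x/2}$, so $\Pr[0 \le X \le 2] = 1 - e^{-1}$ exactly.

```python
import math


def chi2_pdf(x, nu):
    """Chi-squared p.d.f. with nu degrees of freedom (zero for x < 0)."""
    if x < 0:
        return 0.0
    if x == 0:
        # At x = 0 the density is infinite for nu < 2, equal to 1/2 for nu = 2, and 0 for nu > 2.
        return math.inf if nu < 2 else (0.5 if nu == 2 else 0.0)
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))


def prob_between(pdf, a, b, steps=10_000):
    """Trapezoid-rule approximation of Pr[a <= X <= b]."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b)) + sum(pdf(a + k * h) for k in range(1, steps))
    return total * h


approx = prob_between(lambda x: chi2_pdf(x, 2), 0.0, 2.0)
exact = 1.0 - math.exp(-1.0)
print(approx, exact)
```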


Examples of distributions: t-student

Another important distribution is the t-student distribution, with p.d.f.:

$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where

Γ(·) is a complicated function (called the Gamma function);
ν is a parameter (called the “degrees of freedom” of the t distribution).

If a random variable X has a t-student distribution with ν degrees of freedom, we write

$$X \sim t(\nu)$$


Examples of distributions: t-student

The probability that X takes values between a and b is the area under the p.d.f.: $\Pr[a \le X \le b] = \int_a^b f(x)\,dx$.
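The t p.d.f. above can be coded the same way. This sketch evaluates the Gamma-function ratio with `math.lgamma` (the log-Gamma function) instead of `math.gamma`, an implementation choice that avoids overflow for large ν. It checks two facts not stated on the slide: with ν = 1 the t distribution is the Cauchy distribution, for which $\Pr[-1 \le X \le 1] = 1/2$, and for large ν the t p.d.f. is close to the standard normal p.d.f.

```python
import math


def t_pdf(x, nu):
    """Student t p.d.f. with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def prob_between(pdf, a, b, steps=10_000):
    """Trapezoid-rule approximation of Pr[a <= X <= b]."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b)) + sum(pdf(a + k * h) for k in range(1, steps))
    return total * h


p_cauchy = prob_between(lambda x: t_pdf(x, 1), -1.0, 1.0)
# Gap between the t(1000) density and the N(0, 1) density at x = 0.
gap = abs(t_pdf(0.0, 1000) - 1 / math.sqrt(2 * math.pi))
print(p_cauchy, gap)
```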


Where are we?

The probability framework for statistical inference (SW 2)

1.1 Population and samples, random variable, and distribution (SW 2.1, 2.4)

1.2 Moments of a distribution (mean, variance, standard deviation, covariance, correlation) (SW 2.2)

1.3 Two random variables: Conditional distributions and conditional means (SW 2.3)


Moments of a population distribution

Mean of X: $E(X) = \int x f(x)\,dx$

The mean of X is the expected value of X: the long-run average value of X over repeated realizations of X.

We often write the mean of X as

$$E(X) = \mu_X$$


Moments of a population distribution

Variance of X:

$$\mathrm{Var}(X) = E\left[(X - E(X))^2\right] = E\left[(X - \mu_X)^2\right] = \int (x - \mu_X)^2 f(x)\,dx$$

The variance of X is the expected value of the squared deviation of X from its mean; it measures the squared spread of the distribution. We often write the variance of X as

$$\mathrm{Var}(X) = \sigma^2_X$$

Standard deviation of X: $\sigma_X = \sqrt{\mathrm{Var}(X)}$
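The integrals that define E(X) and Var(X) can also be approximated numerically. A minimal sketch using the midpoint rule over a finite range; the density (a normal with hypothetical µ = 2 and σ = 3) and the truncation at ±12σ are assumptions for illustration, so the results should come out close to 2, 9 and 3.

```python
import math


def moments_by_integration(pdf, lo, hi, steps=100_000):
    """Approximate E(X), Var(X) and sd(X) by integrating over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    xs = [lo + (k + 0.5) * h for k in range(steps)]
    mean = sum(x * pdf(x) for x in xs) * h                 # E(X) = integral of x f(x) dx
    var = sum((x - mean) ** 2 * pdf(x) for x in xs) * h    # Var(X) = integral of (x - mu)^2 f(x) dx
    return mean, var, math.sqrt(var)


MU, SIGMA = 2.0, 3.0   # hypothetical parameters


def pdf(x):
    """Normal density with the hypothetical parameters above."""
    return math.exp(-((x - MU) ** 2) / (2 * SIGMA**2)) / (math.sqrt(2 * math.pi) * SIGMA)


mean, var, sd = moments_by_integration(pdf, MU - 12 * SIGMA, MU + 12 * SIGMA)
print(mean, var, sd)
```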


Moments of a population distribution

Skewness of X:

$$\mathrm{skew}(X) = \frac{E\left[(X - \mu_X)^3\right]}{\sigma^3_X}$$

Skewness measures the asymmetry of a distribution:

If skewness = 0: the distribution is symmetric

If skewness > (<) 0: the distribution has a long right (left) tail


Moments of a population distribution

Kurtosis of X:

$$\mathrm{kurt}(X) = \frac{E\left[(X - \mu_X)^4\right]}{\sigma^4_X}$$

Kurtosis is a measure of the mass in the tails of a distribution, i.e., of the probability of large values:

kurtosis = 3: normal distribution

kurtosis > 3: heavy tails (“leptokurtic”)
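The sample counterparts of these two formulas replace each expectation with an average over the data. A sketch using the simple plug-in version (dividing by n); statistical packages often apply small-sample corrections, so their numbers can differ slightly. The data lists are hypothetical.

```python
def skewness_kurtosis(data):
    """Plug-in skewness m3 / m2^1.5 and kurtosis m4 / m2^2 of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((y - mean) ** 2 for y in data) / n   # average squared deviation
    m3 = sum((y - mean) ** 3 for y in data) / n
    m4 = sum((y - mean) ** 4 for y in data) / n
    return m3 / m2**1.5, m4 / m2**2


sym_skew, sym_kurt = skewness_kurtosis([1, 2, 3, 4, 5])   # symmetric data
right_skew, _ = skewness_kurtosis([1, 1, 1, 1, 10])       # one large value: long right tail
print(sym_skew, sym_kurt, right_skew)
```

For the symmetric list the skewness is 0 and the kurtosis is 1.7 (below 3, i.e., thinner tails than the normal); the single large value in the second list produces positive skewness.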


[Figure: Skewness and Kurtosis]


Examples of moments: Normal distribution

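When $X \sim N(\mu, \sigma^2)$ we have the following moments:

$$\mu_X = E(X) = \mu$$

$$\sigma^2_X = \mathrm{Var}(X) = \sigma^2$$

$$\sigma_X = \sqrt{\mathrm{Var}(X)} = \sigma$$

$$\mathrm{skew}(X) = \frac{E\left[(X - \mu_X)^3\right]}{\sigma^3_X} = 0$$

$$\mathrm{kurt}(X) = \frac{E\left[(X - \mu_X)^4\right]}{\sigma^4_X} = 3$$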


Examples of moments: Chi-squared distribution

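When $X \sim \chi^2_\nu$ we have the following moments:

$$\mu_X = E(X) = \nu$$

$$\sigma^2_X = \mathrm{Var}(X) = 2\nu$$

$$\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{2\nu}$$

$$\mathrm{skew}(X) = \frac{E\left[(X - \mu_X)^3\right]}{\sigma^3_X} = \sqrt{\frac{8}{\nu}}$$

$$\mathrm{kurt}(X) = \frac{E\left[(X - \mu_X)^4\right]}{\sigma^4_X} = 3 + \frac{12}{\nu}$$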
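A quick Monte Carlo check of the first two moments. It relies on a standard fact that is not on the slide: a $\chi^2_\nu$ variable has the same distribution as the sum of ν squared independent N(0, 1) variables. The seed, ν = 4 and the number of draws are arbitrary choices.

```python
import random


def chi2_draw(nu, rng):
    """One chi-squared(nu) draw, built as a sum of nu squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


rng = random.Random(12345)    # fixed seed for reproducibility
nu, reps = 4, 50_000
draws = [chi2_draw(nu, rng) for _ in range(reps)]
mean = sum(draws) / reps
var = sum((x - mean) ** 2 for x in draws) / (reps - 1)
print(f"simulated mean {mean:.3f} (theory {nu}), variance {var:.3f} (theory {2 * nu})")
```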


Examples of moments: t-student distribution

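When $X \sim t(\nu)$ we have the following moments:

$$\mu_X = E(X) = 0, \quad \text{for } \nu > 1$$

$$\sigma^2_X = \mathrm{Var}(X) = \frac{\nu}{\nu - 2}, \quad \text{for } \nu > 2$$

$$\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{\nu}{\nu - 2}}, \quad \text{for } \nu > 2$$

$$\mathrm{skew}(X) = \frac{E\left[(X - \mu_X)^3\right]}{\sigma^3_X} = 0, \quad \text{for } \nu > 3$$

$$\mathrm{kurt}(X) = \frac{E\left[(X - \mu_X)^4\right]}{\sigma^4_X} = 3 + \frac{6}{\nu - 4}, \quad \text{for } \nu > 4$$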
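The t-student moments can be checked the same way, using another standard fact not on the slide: if Z ~ N(0, 1) is independent of $V \sim \chi^2_\nu$, then $Z/\sqrt{V/\nu} \sim t(\nu)$. The sketch draws V with `random.gammavariate`, since a $\chi^2_\nu$ variable is a Gamma variable with shape ν/2 and scale 2. With ν = 10 (an arbitrary choice) the theory gives mean 0 and variance 10/8 = 1.25.

```python
import math
import random

rng = random.Random(2017)     # fixed seed for reproducibility
nu, reps = 10, 50_000

draws = []
for _ in range(reps):
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2, 2.0)      # chi-squared(nu) as Gamma(shape nu/2, scale 2)
    draws.append(z / math.sqrt(v / nu))    # one t(nu) draw

mean = sum(draws) / reps
var = sum((x - mean) ** 2 for x in draws) / (reps - 1)
print(f"simulated mean {mean:.3f} (theory 0), variance {var:.3f} (theory {nu / (nu - 2)})")
```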


Where are we?

The probability framework for statistical inference (SW 2)

1.1 Population and samples, random variable, and distribution (SW 2.1, 2.4)

1.2 Moments of a distribution (mean, variance, standard deviation, covariance, correlation) (SW 2.2)

1.3 Two random variables: Conditional distributions and conditional means (SW 2.3)


Random variables: joint distributions and covariance

Random variables X and Z have a joint distribution describing the probabilities of different combinations (x, z) of X and Z. The covariance between X and Z is

$$\mathrm{cov}(X, Z) = E\left[(X - \mu_X)(Z - \mu_Z)\right] = \sigma_{XZ}$$

In Lecture 1 we saw the formula for the sample covariance. Above is the population covariance! The covariance is a measure of the linear association between X and Z; its units are the units of X times the units of Z.

cov(X, Z) > 0 means a positive relation between X and Z: X and Z tend to move together in the same direction.
cov(X, Z) < 0 means X and Z tend to move in opposite directions.
cov(X, Z) = 0 means no linear association.

The covariance of a r.v. with itself is its variance:

$$\mathrm{cov}(X, X) = E\left[(X - \mu_X)(X - \mu_X)\right] = E\left[(X - \mu_X)^2\right] = \mathrm{Var}(X)$$
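For a discrete joint distribution, the expectation in the covariance formula is a probability-weighted sum. A sketch with a made-up joint p.m.f. of two 0/1 variables; it also confirms that cov(X, X) equals Var(X).

```python
# Hypothetical joint p.m.f. of (X, Z): {(x, z): probability}. The numbers are invented.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}


def expect(g, pmf):
    """E[g(X, Z)] under a discrete joint distribution."""
    return sum(p * g(x, z) for (x, z), p in pmf.items())


mu_x = expect(lambda x, z: x, joint)
mu_z = expect(lambda x, z: z, joint)
cov_xz = expect(lambda x, z: (x - mu_x) * (z - mu_z), joint)   # population covariance
cov_xx = expect(lambda x, z: (x - mu_x) * (x - mu_x), joint)   # cov(X, X)
var_x = expect(lambda x, z: (x - mu_x) ** 2, joint)            # Var(X)
print(mu_x, mu_z, cov_xz, cov_xx, var_x)
```

Here cov(X, Z) = 0.1 > 0: in this made-up distribution, X = 1 makes Z = 1 more likely.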


Covariance and independence

If X and Z are independently distributed, then

$$\mathrm{cov}(X, Z) = 0$$

(but not vice versa!!)
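The classic counterexample to the converse can be checked with exact arithmetic. In this hypothetical example X takes the values −1, 0 and 1 with probability 1/3 each, and Z = X²: Z is completely determined by X, yet their covariance is exactly zero because the association is nonlinear.

```python
from fractions import Fraction

third = Fraction(1, 3)
pmf_x = {-1: third, 0: third, 1: third}   # X uniform on {-1, 0, 1}; Z = X**2

e_x = sum(p * x for x, p in pmf_x.items())
e_z = sum(p * x**2 for x, p in pmf_x.items())
cov_xz = sum(p * (x - e_x) * (x**2 - e_z) for x, p in pmf_x.items())

# Dependence: Pr[Z = 0 | X = 0] = 1, but the unconditional Pr[Z = 0] is only 1/3.
pr_z0 = sum(p for x, p in pmf_x.items() if x**2 == 0)
print(cov_xz, pr_z0)
```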


Correlation coefficient between random variables

The (Pearson) correlation coefficient is defined by

$$\rho_{XZ} = \frac{\mathrm{cov}(X, Z)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Z)}} = \frac{\sigma_{XZ}}{\sigma_X \sigma_Z}$$

In Lecture 1 we saw the formula for the sample correlation. Above is the population correlation!

It is a standardization of the covariance and does not depend on the units of measurement.

It is a symmetric measure ($\rho_{XZ} = \rho_{ZX}$); no direction of causality is implied.

Always: $-1 \le \rho_{XZ} \le 1$

1. $\rho_{XZ} = 1$ means perfect positive linear association
2. $\rho_{XZ} = -1$ means perfect negative linear association
3. $\rho_{XZ} = 0$ means no linear association
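A sketch of the sample analogue (sample covariance divided by the product of sample standard deviations; the n − 1 divisors cancel in the ratio) that illustrates three of the properties listed above: the value does not change when X is rescaled to different units, the measure is symmetric, and an exact linear relation gives a correlation of 1. The class-size and score numbers are made up.

```python
import math


def corr(xs, zs):
    """Sample (Pearson) correlation: sample covariance / (s_x * s_z)."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sz = math.sqrt(sum((z - mz) ** 2 for z in zs) / (n - 1))
    return sxz / (sx * sz)


# Hypothetical data: students per teacher and average test score in six districts.
class_size = [15, 17, 19, 21, 23, 25]
score = [680, 668, 671, 655, 650, 641]

r = corr(class_size, score)
r_rescaled = corr([100 * s for s in class_size], score)        # X measured in different units
r_swapped = corr(score, class_size)                            # symmetry: corr(Z, X)
r_linear = corr(class_size, [2 * s + 1 for s in class_size])   # exact positive linear relation
print(r, r_rescaled, r_swapped, r_linear)
```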


Correlation coefficient measures linear associations

[Figure not reproduced. In the last panel, the association is nonlinear!]


Conditional distributions

The conditional distribution is the distribution of Y given the value(s) of some other random variable X, denoted by

$$f(y \mid X = x)$$

Examples: the distribution of test scores given that STR < 20, or the distribution of CEO salaries given sales between 10 and 20 million dollars.

The conditional distribution of Y given X is the distribution of Y in the subpopulation defined by the values of X.


Conditional moments

Key concept: the conditional mean. The conditional mean is the mean of the conditional distribution, given by:

$$E(Y \mid X = x) = \int y\, f(y \mid X = x)\,dy$$

We often write

$$E(Y \mid X)$$

omitting the specific value x of X, when we want to emphasize that the conditional mean is a function of X.

$E(\text{Testscore} \mid STR < 20)$ = mean of test scores among schools with small class sizes (less than 20 students per teacher). Schools with small class sizes are the subpopulation on which we focus.


Conditional moments

Conditional variance = variance of the conditional distribution, given by

$$\mathrm{Var}(Y \mid X = x) = \int \left(y - E(Y \mid X = x)\right)^2 f(y \mid X = x)\,dy$$

$\mathrm{Var}(\text{Testscore} \mid STR < 20)$ = variance of test scores among schools with small class sizes (less than 20 students per teacher).
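With data, a conditional mean and variance are computed by keeping only the units in the subpopulation and averaging there. A sketch that treats a small, made-up list of districts as if it were the whole population (so the variance divides by the group size):

```python
# Hypothetical districts: (student-teacher ratio, test score). The values are invented.
districts = [(18.5, 662.0), (19.2, 658.0), (21.0, 651.0), (17.8, 667.0),
             (22.4, 645.0), (20.1, 655.0), (16.9, 670.0), (23.0, 648.0)]


def conditional_mean_var(data, condition):
    """Mean and variance of Y within the subpopulation where condition(X) is true."""
    ys = [y for x, y in data if condition(x)]
    m = sum(ys) / len(ys)
    v = sum((y - m) ** 2 for y in ys) / len(ys)
    return m, v


small_mean, small_var = conditional_mean_var(districts, lambda s: s < 20)
print(small_mean, small_var)   # E(Testscore | STR < 20), Var(Testscore | STR < 20)
```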


Difference in conditional means

The difference in means is the difference between the means of two conditional distributions, and it is usually something of interest.

For example, in the class size debate we are interested in

$$\Delta = E[\text{Testscore} \mid STR < 20] - E[\text{Testscore} \mid STR \ge 20]$$

Other examples of conditional means and their differences:

Wages of workers by gender: we compare the wages of males and females (Y = wages, X = gender).
Mortality rate of a new drug/procedure: we compare the average mortality of individuals given the new drug (treated) to that of individuals not treated (Y = live/die, X = treated/not treated).

The conditional mean is a (possibly new) term for the familiar idea of the group mean.
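In data, a difference in conditional means is a difference in group means. A sketch for the drug example, with invented records in which Y = 1 if the individual died and X = 1 if treated:

```python
# Hypothetical records: (treated, died), both coded 1 = yes, 0 = no. The data are invented.
records = [(1, 0), (1, 0), (1, 1), (1, 0), (1, 0),
           (0, 1), (0, 0), (0, 1), (0, 0), (0, 0)]


def group_mean(rows, group):
    """Average of Y among rows with X == group: the sample analogue of E(Y | X = group)."""
    ys = [y for x, y in rows if x == group]
    return sum(ys) / len(ys)


delta = group_mean(records, 1) - group_mean(records, 0)   # mortality(treated) - mortality(untreated)
print(delta)
```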


Mean independence

Take two random variables, say U and X.

If $E(U \mid X)$ is constant, then the mean of U is not affected by X, and we say that U is mean independent of X. Mean independence implies lack of correlation (but not necessarily vice versa):

$$E(U \mid X = x) = \text{constant for all } x \implies \mathrm{cov}(U, X) = 0$$

Note that $\mathrm{cov}(U, X) = 0$ does not imply that $E(U \mid X)$ is constant.

Mean independence also implies (by the law of iterated expectations, $E(U) = E[E(U \mid X)]$, which equals that same constant)

$$E(U \mid X) = \text{constant} \implies E(U \mid X) = E(U)$$
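Mean independence is weaker than full independence. In the hypothetical construction below, X is 1 or 2 and U = X·S, where S = ±1 with equal probability, independently of X. Then E(U | X) = 0 = E(U) for both values of X and cov(U, X) = 0, yet U is not independent of X, because its conditional variance, E(U² | X = x) = x², changes with x. Exact fractions avoid rounding.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
# Joint p.m.f. of (X, U) with U = X * S; X and S independent, each value with prob. 1/2.
joint = {(x, x * s): quarter for x in (1, 2) for s in (1, -1)}


def cond_moment(k, x0):
    """E[U**k | X = x0], computed from the joint p.m.f."""
    px = sum(p for (x, _), p in joint.items() if x == x0)
    return sum(p * u**k for (x, u), p in joint.items() if x == x0) / px


e_u = sum(p * u for (_, u), p in joint.items())
e_x = sum(p * x for (x, _), p in joint.items())
cov_ux = sum(p * (u - e_u) * (x - e_x) for (x, u), p in joint.items())
print([cond_moment(1, x) for x in (1, 2)], e_u, cov_ux)   # conditional means, E(U), cov
print([cond_moment(2, x) for x in (1, 2)])                # E(U^2 | X) differs with X
```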


Where are we in the Review?

1. The probability framework for statistical inference (SW 2.1-2.4)
2. Estimation (SW 2.5-2.6, SW 3.1, SW 3.4-3.5)
3. Hypothesis testing (SW 3.2)
4. Confidence intervals (SW 3.3)


Sampling from a population (SW 2.5)

Let Y denote a variable of interest in the population, for example,

Y = monthly wage of an Italian full-time employee (FTE)

We draw a sample of n observations from the population of all FTEs in Italy, denoted by

$$\{Y_1, Y_2, \ldots, Y_n\}$$

$Y_1$ is the value of Y for the first unit (the wage of the first worker), and $Y_i$ is the value of Y for the i-th unit.

Prior to sample selection, the wages $Y_1, \ldots, Y_n$ are random variables because worker i is randomly selected.

Once the worker is selected and the value of $Y_i$ is observed, $Y_1, \ldots, Y_n$ is just an array of numbers, not random.

It will be clear from context when we treat the $Y_i$'s as random and when not.


How is the sampling done?

The way observations are sampled will affect the statistics we generate.

We will assume simple random sampling, that is, entities (units) are drawn at random from the same population. Because individuals #1 and #2 are selected at random, the value of $Y_1$ carries no information about $Y_2$. Thus:

$Y_1$ and $Y_2$ are independently distributed;
$Y_1$ and $Y_2$ come from the same distribution, that is, $Y_1$ and $Y_2$ are identically distributed.

That is, under simple random sampling, $Y_1$ and $Y_2$ are independently and identically distributed (i.i.d.). More generally, under simple random sampling, $\{Y_i\}$, $i = 1, \ldots, n$, are i.i.d. random variables.


Population and sample

This framework allows rigorous statistical inferences about moments of population distributions using a sample of data from that population.


The sample mean estimates the population mean

We focus on estimating the population mean $\mu_Y$.

The sample mean $\bar{Y}$ is the “natural” estimator of the expected value of Y, $\mu_Y$.

1. What are the properties of $\bar{Y}$?
2. Do these properties justify using $\bar{Y}$ rather than some other estimator?
   1. For example, the first observation $Y_1$.
   2. Maybe use unequal weights, i.e., not a simple average.
   3. Or use the median of the sample?

The starting point to answer these questions is knowing $\bar{Y}$'s sampling distribution.


The sampling distribution of the sample mean (SW 2.5)

The individuals in the sample are drawn at random ⇒ the values of $(Y_1, Y_2, \ldots, Y_n)$ are random ⇒ functions of $(Y_1, Y_2, \ldots, Y_n)$, such as $\bar{Y}$, are therefore also random.

Had a different sample been drawn, they would have taken on different values.

The sample mean $\bar{Y}$ is therefore a random variable; its properties are determined by its sampling distribution. The distribution of $\bar{Y}$ over different possible samples of size n is called the sampling distribution of $\bar{Y}$.

The mean and variance of $\bar{Y}$ are the mean and variance of its sampling distribution, $E(\bar{Y})$ and $\mathrm{Var}(\bar{Y})$.

The concept of the sampling distribution underpins all of econometrics.


Example: Bernoulli distribution

Suppose Y takes the values 0 and 1 with probabilities

$$\Pr[Y = 1] = p = 0.78 \quad \text{and} \quad \Pr[Y = 0] = 1 - p = 0.22$$

The mean and variance of Y are

$$\mu_Y = E(Y) = p \times 1 + (1 - p) \times 0 = p = 0.78$$

and (remember this?)

$$\sigma^2_Y = E\left[(Y - E(Y))^2\right] = p(1 - p) = 0.78 \times (1 - 0.78) = 0.1716$$

The sampling distribution of $\bar{Y}$ depends on n. Consider n = 2. The sampling distribution of $\bar{Y}$ is:

$\Pr[\bar{Y} = 0] = (1 - p)^2 = 0.22^2 = 0.0484$
$\Pr[\bar{Y} = 0.5] = 2 \times p(1 - p) = 2 \times 0.22 \times 0.78 = 0.3432$
$\Pr[\bar{Y} = 1] = p^2 = 0.78^2 = 0.6084$

What are $E(\bar{Y})$ and $\mathrm{Var}(\bar{Y})$?
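The question can be answered by brute force: list every possible sample of size n together with its probability and its sample mean. The sketch below does this for the Bernoulli example; it should reproduce the three probabilities above and give $E(\bar{Y}) = 0.78 = p$ and $\mathrm{Var}(\bar{Y}) = 0.0858 = 0.1716/2 = \sigma^2_Y/n$.

```python
from itertools import product

p, n = 0.78, 2

dist = {}   # {value of Ybar: probability}
for sample in product((0, 1), repeat=n):         # every possible (Y1, ..., Yn)
    prob = 1.0
    for y in sample:
        prob *= p if y == 1 else 1 - p           # independence: multiply the probabilities
    ybar = sum(sample) / n
    dist[ybar] = dist.get(ybar, 0.0) + prob

mean = sum(v * pr for v, pr in dist.items())
var = sum((v - mean) ** 2 * pr for v, pr in dist.items())
print(dist)
print(mean, var)
```

Changing n shows how the sampling distribution depends on the sample size; the loop has 2^n terms, so n should be kept small.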


[Figure: Sampling distribution of the sample mean when Y is Bernoulli (p = 0.78)]


Things we want to know about the sampling distribution

1. What is the mean of $\bar{Y}$? If $E(\bar{Y}) = \mu_Y = 0.78$, then $\bar{Y}$ is an unbiased estimator of $\mu_Y$.
2. What is the variance of $\bar{Y}$? How does $\mathrm{Var}(\bar{Y})$ depend on the sample size n?
3. Does $\bar{Y}$ become close to $\mu_Y$ when n is large?
4. The distribution of $\bar{Y}$ appears bell-shaped around $\mu_Y$ for large n. Is this generally true?


The mean and variance of the sampling distribution of the sample mean

In the general case (not just the Bernoulli example), with i.i.d. sampling, for each sample size n,

$$E(\bar{Y}) = \mu_Y$$

and

$$\mathrm{Var}(\bar{Y}) = \frac{\sigma^2_Y}{n}$$

Implications:

1. $\bar{Y}$ is an unbiased estimator of $\mu_Y$ (that is, $E(\bar{Y}) = \mu_Y$).
2. $\mathrm{Var}(\bar{Y})$ is inversely proportional to the sample size n.
   1. Thus the sampling uncertainty associated with $\bar{Y}$, measured by its standard deviation, is proportional to $1/\sqrt{n}$ (larger samples mean less uncertainty, but following a square-root law).
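A Monte Carlo sketch of both results, reusing the Bernoulli(0.78) population from the example: draw many samples of size n, compute $\bar{Y}$ for each, and compare the average and variance of those sample means with $\mu_Y$ and $\sigma^2_Y/n$. The seed, the sample sizes and the number of replications are arbitrary choices.

```python
import random


def simulate_ybar(n, reps, p, rng):
    """Sample means of `reps` independent samples of size n from a Bernoulli(p) population."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(42)     # fixed seed for reproducibility
p = 0.78
sigma2 = p * (1 - p)        # population variance, 0.1716

results = {}
for n in (10, 100):
    means = simulate_ybar(n, 20_000, p, rng)
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / (len(means) - 1)
    results[n] = (m, v, sigma2 / n)   # (simulated E(Ybar), simulated Var(Ybar), theory)
    print(f"n={n}: mean {m:.4f}, variance {v:.5f}, sigma^2/n {sigma2 / n:.5f}")
```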


Sampling distribution of the sample mean when n is large (SW 2.6)

For small sample sizes, the distribution of $\bar{Y}$ is complicated, but if n is large, the sampling distribution is simple!

This is what we know:

1. As n increases, the distribution of $\bar{Y}$ becomes more tightly centered around $\mu_Y$ (the Law of Large Numbers).

2. As n increases, the distribution of $\dfrac{\bar{Y} - \mu_Y}{\sqrt{\mathrm{Var}(\bar{Y})}}$ becomes normal (the Central Limit Theorem).


The Law of Large Numbers (LLN)

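Definition: An estimator is consistent if the probability that it falls within any given (arbitrarily small) interval around the true population value tends to one as the sample size increases.

Theorem (LLN)

If $(Y_1, \ldots, Y_n)$ are i.i.d. and $\sigma^2_Y < \infty$, then $\bar{Y}$ is a consistent estimator of $\mu_Y$, that is, for every $\varepsilon > 0$,

$$\Pr\left[|\bar{Y} - \mu_Y| < \varepsilon\right] \to 1 \text{ as } n \to \infty$$

which is written as $\bar{Y} \xrightarrow{p} \mu_Y$.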
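The LLN statement can be illustrated by simulation: for a fixed ε, estimate $\Pr[|\bar{Y} - \mu_Y| < \varepsilon]$ by the share of simulated samples whose mean lands within ε of $\mu_Y$, and watch that share approach one as n grows. The Bernoulli(0.78) population, ε = 0.05 and the seed are assumptions for illustration.

```python
import random

rng = random.Random(7)      # fixed seed for reproducibility
p, eps, reps = 0.78, 0.05, 2_000


def share_within(n):
    """Share of `reps` simulated samples of size n whose mean is within eps of p."""
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(ybar - p) < eps
    return hits / reps


probs = {n: share_within(n) for n in (10, 100, 1000)}
print(probs)
```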


The Central Limit Theorem (CLT)

Theorem (CLT)

If $(Y_1, \ldots, Y_n)$ are i.i.d. and $0 < \sigma^2_Y < \infty$, then when n is large the distribution of $\bar{Y}$ is well approximated by a normal distribution $N(\mu_Y, \sigma^2_Y/n)$ (“normal distribution with mean $\mu_Y$ and variance $\sigma^2_Y/n$”).

1. $\sqrt{n}(\bar{Y} - \mu_Y)/\sigma_Y$ is approximately distributed N(0, 1) (standard normal).
   1. That is, the “standardized” sample mean $\dfrac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{Var}(\bar{Y})}} = \dfrac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}}$ is approximately distributed as N(0, 1).
2. $\sqrt{n}(\bar{Y} - \mu_Y)/s_Y$ is also approximately distributed N(0, 1) (standard normal).
3. The larger n is, the better these approximations are.
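A simulation sketch of the CLT for the Bernoulli(0.78) example: standardize each simulated sample mean as $\sqrt{n}(\bar{Y} - \mu_Y)/\sigma_Y$ and check that about 95% of the values fall in [−1.96, 1.96], as they would for an N(0, 1) variable. Each $Y_i$ is only 0 or 1, far from normal, yet with n = 200 the approximation should already work well here; the sample size, seed and number of replications are arbitrary choices.

```python
import math
import random

rng = random.Random(2024)   # fixed seed for reproducibility
p, n, reps = 0.78, 200, 10_000
mu, sigma = p, math.sqrt(p * (1 - p))

z_values = []
for _ in range(reps):
    ybar = sum(rng.random() < p for _ in range(n)) / n
    z_values.append(math.sqrt(n) * (ybar - mu) / sigma)   # standardized sample mean

share_inside = sum(abs(z) <= 1.96 for z in z_values) / reps
mean_z = sum(z_values) / reps
print(f"share in [-1.96, 1.96]: {share_inside:.3f}, mean of z: {mean_z:.3f}")
```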


Summary for the sampling distribution of the sample mean

For $Y_1, \ldots, Y_n$ i.i.d. with $0 < \sigma^2_Y < \infty$:

The exact (finite-sample) sampling distribution of $\bar{Y}$ has mean $\mu_Y$ (“$\bar{Y}$ is an unbiased estimator of $\mu_Y$”) and variance $\sigma^2_Y/n$. Other than its mean and variance, the exact distribution of $\bar{Y}$ is complicated and depends on the distribution of Y.

But when n is large, the sampling distribution of $\bar{Y}$ simplifies. We know:

$$\bar{Y} \xrightarrow{p} \mu_Y \quad \text{(Law of Large Numbers)}$$

$$\frac{\sqrt{n}(\bar{Y} - \mu_Y)}{\sigma_Y} \text{ is approximately } N(0, 1) \quad \text{(CLT)}$$


So... why use the sample mean to estimate the population mean?

1. $\bar{Y}$ is unbiased: $E(\bar{Y}) = \mu_Y$
2. $\bar{Y}$ is consistent: $\bar{Y} \xrightarrow{p} \mu_Y$

And, in addition,

3. $\bar{Y}$ is the “least squares” estimator of $\mu_Y$; $\bar{Y}$ solves

$$\min_m \sum_{i=1}^{n} (Y_i - m)^2$$

$\bar{Y}$ is the value of m that minimizes the sum of squared “residuals” $(Y_i - m)^2$ in the sample.

4. $\bar{Y}$ has a smaller variance than all other linear unbiased estimators.


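Sample mean is the least squares estimator of the population mean

Find the value of m that minimizes the sum of the squared deviations (residuals) $(Y_i - m)^2$ in the sample. Mathematically, we solve

$$\min_m \sum_{i=1}^{n} (Y_i - m)^2$$

We take the derivative of this function with respect to m, set it equal to zero, and solve for m:

$$\frac{d}{dm} \sum_{i=1}^{n} (Y_i - m)^2 = \sum_{i=1}^{n} \frac{d}{dm} (Y_i - m)^2 = -2 \sum_{i=1}^{n} (Y_i - m) = 0$$

$$\implies m = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}$$

(The second derivative equals 2n > 0, so this is indeed a minimum.)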
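The derivation can be checked numerically on a small made-up sample: a brute-force search over a grid of candidate values of m should pick out $\bar{Y}$, and the first-order condition $-2\sum_i (Y_i - \bar{Y})$ should equal zero.

```python
ys = [3.0, 7.0, 8.0, 2.0, 5.0]      # hypothetical sample


def ssr(m, data):
    """Sum of squared residuals, sum of (Y_i - m)^2."""
    return sum((y - m) ** 2 for y in data)


ybar = sum(ys) / len(ys)
grid = [k / 100 for k in range(0, 1001)]          # candidate m values 0.00, 0.01, ..., 10.00
best_m = min(grid, key=lambda m: ssr(m, ys))      # grid value with the smallest SSR
first_order = -2 * sum(y - ybar for y in ys)      # derivative evaluated at m = Ybar
print(ybar, best_m, first_order)
```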


Sample mean has smaller variance

Consider any other estimator of $\mu_Y$ that is linear in the $Y_i$'s and unbiased.

Such an estimator can be written generically as

$$\tilde{\mu}_Y = \frac{1}{n} \sum_{i=1}^{n} a_i Y_i$$

where the $a_i$'s are such that $\tilde{\mu}_Y$ is unbiased.

Note that the sample mean $\bar{Y}$ has $a_i = 1$ for each i.

Then it can be shown that

$$\mathrm{Var}(\tilde{\mu}_Y) \ge \mathrm{Var}(\bar{Y})$$
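A short version of the argument, assuming i.i.d. sampling with variance $\sigma^2_Y$: $E(\tilde{\mu}_Y) = \frac{\mu_Y}{n}\sum_i a_i$, so unbiasedness requires $\sum_i a_i = n$, and $\mathrm{Var}(\tilde{\mu}_Y) = \frac{\sigma^2_Y}{n^2}\sum_i a_i^2$, which under that constraint is smallest when every $a_i = 1$. The sketch below evaluates this variance formula for a few unbiased weightings, including two alternatives mentioned earlier (using only the first observation, and unequal weights); the specific weights and n = 4 are illustrative choices.

```python
def is_unbiased(weights):
    """E[(1/n) sum a_i Y_i] = mu * sum(a_i) / n, so unbiasedness requires sum(a_i) = n."""
    return abs(sum(weights) - len(weights)) < 1e-12


def var_linear_estimator(weights, sigma2):
    """Var[(1/n) sum a_i Y_i] = sigma2 * sum(a_i^2) / n^2 for i.i.d. Y_i."""
    n = len(weights)
    return sigma2 * sum(a * a for a in weights) / n**2


sigma2 = 0.1716   # the Bernoulli(0.78) variance from the earlier example
candidates = {
    "sample mean": [1, 1, 1, 1],
    "first observation only": [4, 0, 0, 0],
    "unequal weights": [2, 1, 0.5, 0.5],
}
variances = {name: var_linear_estimator(w, sigma2) for name, w in candidates.items()}
for name, v in variances.items():
    print(f"{name}: unbiased={is_unbiased(candidates[name])}, variance={v:.5f}")
```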


Estimator of the variance of the sample mean

A good estimator of $\sigma^2_Y$ is the sample variance of Y:

$$s^2_Y = \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

1. If $(Y_1, \ldots, Y_n)$ are i.i.d. and $E(Y^4) < \infty$, then

$$s^2_Y \xrightarrow{p} \sigma^2_Y \qquad s_Y \xrightarrow{p} \sigma_Y$$

   1. Why does the law of large numbers apply?
      1. Because $s^2_Y$ is also a sample average, namely of $(Y_i - \bar{Y})^2$ (dividing by n − 1 or by n does not matter when n is large).
      2. Technical note: we assume $E(Y^4) < \infty$ because here the average is not of $Y_i$ but of its square.
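A sketch of the sample variance formula on a small made-up sample, together with the estimated variance of the sample mean, $s^2_Y/n$, obtained by plugging $s^2_Y$ into $\mathrm{Var}(\bar{Y}) = \sigma^2_Y/n$.

```python
import math


def sample_variance(ys):
    """s_Y^2 = (1 / (n - 1)) * sum of (Y_i - Ybar)^2."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)


ys = [3.0, 7.0, 8.0, 2.0, 5.0]      # hypothetical sample
s2 = sample_variance(ys)
s = math.sqrt(s2)                   # sample standard deviation s_Y
var_ybar_hat = s2 / len(ys)         # estimated Var(Ybar)
print(s2, s, var_ybar_hat)
```

Python's standard library function `statistics.variance` uses the same n − 1 divisor.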


Estimator of population moments

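Sample analogues are good estimators of population quantities (parameters) in the sense that they are consistent estimators.

| population quantity | alternative notation | sample quantity |
|---|---|---|
| $E(Y)$ | $\mu_Y$ | $\bar{Y}$ |
| $\mathrm{Var}(Y)$ | $\sigma^2_Y$ | $s^2_Y$ |
| $\sqrt{\mathrm{Var}(Y)}$ | $\sigma_Y$ | $s_Y$ |
| $\mathrm{cov}(Y, X)$ | $\sigma_{YX}$ | $s_{YX}$ |
| $\mathrm{corr}(Y, X)$ | $\rho_{YX}$ | $r_{YX}$ |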


Where are we in the Review?

1. The probability framework for statistical inference (SW 2.1-2.4)
2. Estimation (SW 2.5-2.6, SW 3.1, SW 3.5)
3. Hypothesis testing (SW 3.2, SW 3.4)
4. Confidence intervals (SW 3.3)
