Applied Statistics - web.uniroma1.it lezioni/Slides... · Introduction Sampling Variation Random...

Introduction Sampling Variation Random sample Normal distribution Other distributions Association Convergence

Applied Statistics

Lecturer: Cristina Mollica


Statistical models

Statistics concerns what can be learned from data, using statistical models to study the variability of the data.

The key feature of a statistical model is that variability is represented using probability distributions, which form the building blocks from which the model is constructed.

Statistical models must accommodate both systematic variation and random variation.


Statistical models

The purpose of these first lectures is to review the concepts of sample statistics and sampling variation, moments, and convergence, with a focus on the Normal distribution.


Statistics and Sampling Variation

The key idea in statistical modelling is to treat the data as the outcome of a random experiment.

The n i.i.d. random variables $(Y_1, \ldots, Y_i, \ldots, Y_n)$ form the random sample.

Statistical analysis generally deals with n observations $(y_1, \ldots, y_i, \ldots, y_n)$, known as the observed sample.

We say that a quantity $t = t(y_1, \ldots, y_n)$ is a sample statistic.


Data summaries

Statistics summarize some important aspects of the data.

Example:

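$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad \text{(location)}, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{(scale)}$$

$$\mathrm{median}(y) = \begin{cases} y_{((n+1)/2)} & n \text{ odd} \\ \frac{1}{2}\left(y_{(n/2)} + y_{(n/2+1)}\right) & n \text{ even} \end{cases} \quad \text{(center)}$$

where $y_{(k)}$ denotes the k-th smallest observation.

$$\mathrm{ECDF}(t) = \frac{1}{n}\sum_{i=1}^{n} I(y_i \le t) \quad \text{(empirical cumulative distribution function)}$$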
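As a minimal sketch of the summaries above, the following Python snippet computes them with the standard library only. The sample values and the helper name `ecdf` are hypothetical and chosen purely for illustration.

```python
import statistics

def ecdf(sample, t):
    """Proportion of observations that are less than or equal to t."""
    return sum(1 for y in sample if y <= t) / len(sample)

# Hypothetical observed sample (illustrative values only).
y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

y_bar = statistics.mean(y)      # location: sample mean
s2 = statistics.variance(y)     # scale: sample variance with divisor n - 1
med = statistics.median(y)      # center: n is even, so the two middle values are averaged
f_at_4 = ecdf(y, 4.0)           # ECDF evaluated at t = 4

print(y_bar, s2, med, f_at_4)
```

Note that `statistics.variance` uses the divisor n − 1, matching the definition of s² given here, while `statistics.pvariance` would divide by n.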


Graphs

Histogram
Empirical cumulative distribution function
Boxplot
Scatterplot (2 variables)


Graphs

Example (Italian consumption data): income and consumption of a sample of 7927 Italians in 2010.

[Figure: four panels, all on the log scale. Histogram of the income (horizontal axis: Income). ECDF of the income (x against Fn(x)). Consumption by class of income. Income vs consumption (horizontal axis: Income, vertical axis: Consumption).]


Random sample

The fundamental idea of statistical modelling is to treat data as observed values of random variables.

The data available $y_1, y_2, \ldots, y_n$ are the observed values of a random sample of size n, defined to be a collection of n independent and identically distributed random variables $Y_1, Y_2, \ldots, Y_n$.

We suppose that each of the $Y_i$ has the same cumulative distribution function F, representing the population Y from which the sample has been taken.


Random sample

Statistical models ⇔ Random variables

Typically, we are interested in inferring specific features of the population:

mean of Y ∼ F

variance of Y ∼ F

moments of Y ∼ F


Mean of a random variable

Let Y be a random variable with cumulative distribution function F and density function f. The expected value of Y, E[Y], is defined as

$$E[Y] = \int y \, dF(y) = \int y f(y) \, dy.$$


Variance of a random variable

The variance V[Y] of Y is defined as

$$V[Y] = E[(Y - E[Y])^2] = E[Y^2] - E[Y]^2 = \int y^2 \, dF(y) - \left(\int y \, dF(y)\right)^2.$$

The computation of the moments of a random variable is facilitated by the use of the moment generating function.


Moment generating function

The moment generating function of the random variable Y is

$$M_Y(t) = E(e^{tY}), \quad t \in \mathbb{R},$$

provided that $M_Y(t) < \infty$.

Let

$$M'(t) = \frac{dM(t)}{dt}, \quad M''(t) = \frac{d^2M(t)}{dt^2}, \quad M^{(r)}(t) = \frac{d^rM(t)}{dt^r}$$

denote the derivatives of M. Then the r-th moment of Y is

$$\mu_r = E[Y^r] = M^{(r)}(0).$$
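To illustrate how moments are read off the derivatives of M at zero, here is a small numerical sketch. It assumes the Poisson m.g.f. $M(t) = \exp(\lambda(e^t - 1))$, a standard result that these slides leave to the blackboard, and it approximates the derivatives by central finite differences instead of differentiating symbolically. The function names and the value λ = 3 are illustrative choices.

```python
import math

def poisson_mgf(t, lam):
    """M(t) = exp(lam * (exp(t) - 1)), the m.g.f. of a Poisson(lam) variable."""
    return math.exp(lam * (math.exp(t) - 1.0))

def first_two_moments(mgf, h=1e-4):
    """Approximate M'(0) and M''(0) with central finite differences."""
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2

lam = 3.0  # illustrative parameter value
m1, m2 = first_two_moments(lambda t: poisson_mgf(t, lam))
variance = m2 - m1 ** 2  # V[Y] = E[Y^2] - E[Y]^2

print(m1, m2, variance)
```

Up to discretization error, the output should agree with the known Poisson moments E[Y] = λ, E[Y²] = λ + λ² and V[Y] = λ.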


Moment generating function

Some properties:

$Y_1, \ldots, Y_n$ are independent if and only if their joint moment generating function factorizes as

$$E[\exp(Y_1 t_1 + \cdots + Y_n t_n)] = E[\exp(Y_1 t_1)] \cdots E[\exp(Y_n t_n)].$$

Let $Y = a + bX$. The moment generating function of Y is

$$M_Y(t) = e^{at} M_X(bt).$$

Any moment generating function corresponds to a unique probability distribution.


Examples

We will consider some examples of important random variables. In particular, we will focus on the

Poisson distribution (blackboard);
Binomial distribution (at home);
Uniform distribution (at home);
Exponential distribution (at home).


Normal distribution

We say that $X \sim N(\mu, \sigma^2)$ when

$$f_X(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)$$

$\mu \in \mathbb{R}$ → mean, median and mode (location parameter);
$\sigma^2 > 0$ → variance (scale parameter).
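The density can be coded directly from its formula. The sketch below does so and compares the result with `statistics.NormalDist` from the Python standard library, which is parameterized by the standard deviation σ, not the variance. The parameter values are illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, written out from the formula."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma2 = 3.0, 4.0  # illustrative parameter values
peak = normal_pdf(mu, mu, sigma2)                      # density at the mode x = mu
reference = NormalDist(mu, math.sqrt(sigma2)).pdf(mu)  # library value, takes sigma

print(peak, reference)
```

The density is largest at x = µ, where it equals 1/(σ√(2π)), and it is symmetric around µ.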


Normal distribution

[Figure: Normal densities with varying location parameter, for µ = 0, −1, −3, 1 and 3.]


Normal distribution

[Figure: Normal densities with varying scale parameter, for σ² = 1, 4, 9, 0.1 and 0.25.]


Normal distribution

[Figure: Normal densities with varying scale and location parameters (µ = 0, 2 and −2; density on the vertical axis).]


Standardization

The random variable

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

has the standard Normal distribution, with density

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right).$$

Clearly, $X = \sigma Z + \mu$.


Standardization

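The cumulative distribution function is obtained by integrating the density as follows

$$F_Z(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\left(-\frac{1}{2}z^2\right) dz = \Phi(u).$$

If we want to compute $P(X \le 2)$ for $X \sim N(\mu = 3, \sigma^2 = 4)$, standardizing with $(2 - 3)/2 = -0.5$ gives

$$P(X \le 2) = P(Z \le -0.5) = \Phi(-0.5) = 1 - \Phi(0.5),$$

which cannot be computed analytically! Use R!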
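The slides recommend R for this computation, where `pnorm(2, mean = 3, sd = 2)` would return the probability. As an alternative sketch in Python, Φ can be written through the error function, $\Phi(u) = \frac{1}{2}\left(1 + \mathrm{erf}(u/\sqrt{2})\right)$, and cross-checked against `statistics.NormalDist`. The helper name `std_normal_cdf` is hypothetical.

```python
import math
from statistics import NormalDist

def std_normal_cdf(u):
    """Phi(u) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

mu, sigma = 3.0, 2.0          # X ~ N(3, 4), so sigma = 2
z = (2.0 - mu) / sigma        # standardized value, -0.5
p = std_normal_cdf(z)         # P(X <= 2) = Phi(-0.5)
p_check = NormalDist(mu, sigma).cdf(2.0)

print(round(p, 4))  # 0.3085
```

The same value is obtained as 1 − Φ(0.5), by the symmetry of the standard Normal density.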


M.G.F. of Normal distribution

Let $Z \sim N(0, 1)$. The moment generating function is

$$M_Z(t) = E[e^{tZ}] = e^{t^2/2}.$$

Proof (at home).

Let $Y \sim N(\mu, \sigma^2)$; by standardization,

$$Y = \mu + \sigma Z.$$

By the property for linear transformations, the m.g.f. of Y is

$$M_Y(t) = E[e^{tY}] = e^{\mu t} M_Z(\sigma t) = e^{\mu t} e^{\sigma^2 t^2/2}.$$
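One possible route to the m.g.f. of Z, sketched here as a hint for the proof left as homework, is to complete the square in the exponent, using $tz - \frac{1}{2}z^2 = -\frac{1}{2}(z - t)^2 + \frac{1}{2}t^2$:

```latex
\begin{aligned}
M_Z(t) &= \int_{-\infty}^{\infty} e^{tz} \, \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz \\
       &= e^{t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-(z - t)^2/2} \, dz \\
       &= e^{t^2/2},
\end{aligned}
```

where the last integral equals 1 because its integrand is the density of a N(t, 1) variable.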


Chi-squared distribution

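Let $Z \sim N(0, 1)$. The random variable

$$Y = Z^2 \sim \chi^2_{(1)}$$

has

$$F_Y(y) = 2\Phi(\sqrt{y}) - 1, \qquad f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-\frac{1}{2}y}, \quad y \in \mathbb{R}^+.$$

Proof (blackboard)

$$E[Y] = 1, \qquad V[Y] = 2.$$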



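More generally, $Y \sim \chi^2_{(n)}$, where the parameter $n \in \mathbb{N}^+$ is referred to as degrees of freedom, if

$$f_Y(y; n) = \frac{1}{2^{n/2}\Gamma(n/2)} y^{\frac{n}{2}-1} e^{-y/2}, \quad y \in \mathbb{R}^+.$$

Note that

$$Y \sim \chi^2_{(n)} \Leftrightarrow Y \sim \mathrm{Gamma}(\alpha = n/2, \beta = 1/2).$$

It can be easily proven (at home) that

$$E[Y] = n, \qquad V[Y] = 2n.$$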


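Chi-squared distribution

The m.g.f. of a Gamma distribution $X \sim \mathrm{Gamma}(\alpha, \beta)$ is

$$M_X(t) = E[e^{tX}] = \left(\frac{\beta}{\beta - t}\right)^{\alpha}, \quad t < \beta,$$

and hence the m.g.f. of $Y \sim \chi^2_{(1)}$ ($\alpha = \frac{1}{2}$, $\beta = \frac{1}{2}$) is

$$M_Y(t) = E[e^{tY}] = \sqrt{\frac{1}{1 - 2t}}, \quad t < \tfrac{1}{2}.$$

THEOREM Let $(Z_1, Z_2, \ldots, Z_n) \overset{iid}{\sim} N(0, 1)$. Then

$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2 \sim \chi^2_{(n)}.$$

Proof (blackboard)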
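The theorem can be checked empirically with a small simulation: sums of n squared standard Normal draws should have mean close to n and variance close to 2n. This is only a sanity check under a fixed seed, not a proof. The function name, the seed and the number of draws are arbitrary choices.

```python
import random
import statistics

def simulate_chi2(n, draws, seed=0):
    """Simulate sums of n squared N(0, 1) variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(draws)]

n = 5  # degrees of freedom (illustrative)
sample = simulate_chi2(n, draws=20000)
mean_hat = statistics.mean(sample)      # expected to be close to n
var_hat = statistics.variance(sample)   # expected to be close to 2n

print(mean_hat, var_hat)
```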



[Figure: chi-squared densities for ν = 1, 3, 5, 10 and 20 degrees of freedom.]


Student T distribution

Let $Z \sim N(0, 1)$ and $W \sim \chi^2_{(\nu)}$ be independent. The random variable

$$Y = \frac{Z}{\sqrt{W/\nu}} \sim T_{\nu}$$

where the parameter $\nu > 0$ indicates the degrees of freedom.

[Figure: Student T densities for ν = 1, 3, 5, 10 and 20, compared with the N(0,1) density.]


Fisher F distribution

Let $X_1 \sim \chi^2_{(\nu_1)}$ and $X_2 \sim \chi^2_{(\nu_2)}$ (independent). The random variable

$$Y = \frac{X_1/\nu_1}{X_2/\nu_2} \sim F(\nu_1, \nu_2),$$

where the parameters $\nu_1, \nu_2 > 0$ are the degrees of freedom.

[Figure: F distribution densities with nu1 = 5 and nu2 = 1, 5, 10 and 20.]


Association measures

Let Y1 and Y2 be two random variables.

The covariance between Y1 and Y2 is

Cov(Y1,Y2) = E [(Y1 − E [Y1])(Y2 − E [Y2])]

= E [Y1Y2]− E [Y1]E [Y2]



Let Y be a p-dimensional random vector.

The variance-covariance matrix is a p × p matrix defined as

Σ = Cov(Y,Y) = E [(Y − E [Y])(Y − E [Y])′]

The generic element σrs is

σrs = Cov(Yr ,Ys) = E [(Yr − E [Yr ])(Ys − E [Ys ])]



Theorem: Σ is a positive semidefinite matrix.

Proof: Let a = (a1, ..., ap) be a vector of constants. It follows that

0 ≤ V (a′Y) = Cov(a′Y, a′Y) = a′Cov(Y,Y)a = a′Σa.



Let
Y be a p-dimensional random vector;
a be a q-dimensional vector;
B be a p × q matrix.

The q × q variance-covariance matrix of the vector a + B′Y is

Cov(a + B′Y, a + B′Y) = Cov(B′Y, B′Y)

= E [(B′Y − E [B′Y])(B′Y − E [B′Y])′]

= E [B′(Y − E [Y])(Y − E [Y])′B]

= B′E [(Y − E [Y])(Y − E [Y])′]B

= B′ΣB.
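The same identity holds exactly for sample covariance matrices, which gives a simple numerical check: the sample covariance of the transformed observations a + B′y should equal B′SB, where S is the sample covariance of the original observations. The sketch below uses plain Python lists; the data-generating recipe, the vector `a` and the matrix `B` are all hypothetical.

```python
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sample_cov(rows):
    """Sample variance-covariance matrix (divisor n - 1) of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

rng = random.Random(1)
# Hypothetical data: 200 draws of a 3-dimensional vector with dependent components.
ys = []
for _ in range(200):
    u, v, w = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    ys.append([u, u + v, v - 2.0 * w])

a = [1.0, -1.0]                            # q = 2
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]   # p x q = 3 x 2
Bt = transpose(B)

# W = a + B'Y for every observation.
ws = [[a[k] + sum(Bt[k][j] * y[j] for j in range(3)) for k in range(2)] for y in ys]

lhs = sample_cov(ws)                          # covariance of a + B'Y
rhs = matmul(matmul(Bt, sample_cov(ys)), B)   # B' S B
```

The shift by a drops out because the covariance is computed from deviations around the mean.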



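The correlation between two variables Y1 and Y2 is defined as

$$\rho(Y_1, Y_2) = \frac{\mathrm{Cov}(Y_1, Y_2)}{\sqrt{\mathrm{Var}(Y_1)\mathrm{Var}(Y_2)}}$$

It satisfies

$|\mathrm{Cov}(Y_1, Y_2)| \le \sqrt{\mathrm{Var}(Y_1)\mathrm{Var}(Y_2)}$;
$-1 \le \rho(Y_1, Y_2) \le 1$.

The correlation matrix is defined as

$$\Omega = \Sigma^{-\frac{1}{2}} \Sigma \Sigma^{-\frac{1}{2}}$$

where $\Sigma^{-\frac{1}{2}}$ denotes here the diagonal matrix with the reciprocals of the standard deviations of Y on its diagonal.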
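Element by element, the correlation matrix divides each covariance σrs by the product of the two standard deviations, in line with the definition of ρ. A minimal sketch, with a hypothetical 2 × 2 variance-covariance matrix:

```python
import math

def correlation_matrix(sigma):
    """Scale a variance-covariance matrix by the reciprocal standard deviations."""
    sd = [math.sqrt(sigma[r][r]) for r in range(len(sigma))]
    return [[sigma[r][s] / (sd[r] * sd[s]) for s in range(len(sigma))]
            for r in range(len(sigma))]

# Hypothetical variance-covariance matrix: variances 4 and 9, covariance 3.
Sigma = [[4.0, 3.0], [3.0, 9.0]]
Omega = correlation_matrix(Sigma)

print(Omega)  # [[1.0, 0.5], [0.5, 1.0]]
```

Here ρ = 3 / (2 · 3) = 0.5, and the diagonal of Ω is made of ones.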


The multivariate normal distribution

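We say that $Y = (Y_1, \ldots, Y_p) \sim N_p(\mu, \Sigma)$ when its density function is

$$f(y; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(y - \mu)'\Sigma^{-1}(y - \mu)\right\}$$

where $|\Sigma| = \det(\Sigma)$.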



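Example: p = 2 (blackboard).

$$f(y_1, y_2; \mu_1, \mu_2, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2}Q(y_1, y_2)\right\}$$

where

$$Q(y_1, y_2) = \frac{1}{1 - \rho^2}\left[\left(\frac{y_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{y_1 - \mu_1}{\sigma_1}\right)\left(\frac{y_2 - \mu_2}{\sigma_2}\right) + \left(\frac{y_2 - \mu_2}{\sigma_2}\right)^2\right].$$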



[Figure: perspective plots of the bivariate normal density f(x, y) for rho = −0.8, 0, 0.4 and 0.7.]


The multivariate normal distribution

[Figure: contour plots of the bivariate normal density for rho = −0.8, 0, 0.4 and 0.7.]



The moment generating function of $Y \sim N_p(\mu, \Sigma)$ is

$$M_Y(t) = E[e^{t'Y}] = \exp\left(t'\mu + \frac{t'\Sigma t}{2}\right)$$



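Theorem. Let $Y \sim N_p(\mu, \Sigma)$ and let B be a k × p matrix. Then the variable W = BY is distributed as

$$W \sim N_k(B\mu, B\Sigma B')$$

Proof. Compute the m.g.f. of W using the property for linear transformations:

$$M_W(t) = E[\exp(t'BY)] = M_Y(B't) = \exp\left(t'B\mu + \frac{t'B\Sigma B't}{2}\right),$$

which is the m.g.f. of $N_k(B\mu, B\Sigma B')$.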


Marginal and conditional distribution

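Let $Y \sim N_p(\mu, \Sigma)$ and $Y' = (Y_1', Y_2')$, where $Y_1$ is q × 1 and $Y_2$ is (p − q) × 1.

Let

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$

be the decomposition of the mean vector and the variance-covariance matrix, where

$\Sigma_{11}$ is q × q;
$\Sigma_{22}$ is (p − q) × (p − q);
$\Sigma_{12}$ is q × (p − q);
$\Sigma_{21}$ is (p − q) × q;
$\Sigma_{21} = \Sigma_{12}'$.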


Marginal and conditional distribution

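It can be shown that

1. $Y_1 \sim N_q(\mu_1, \Sigma_{11})$;
2. $Y_2 \sim N_{(p-q)}(\mu_2, \Sigma_{22})$;
3. $Y_1 | Y_2 = y \sim N_q\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$.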
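As a worked special case, take p = 2 and q = 1 in the notation of the bivariate example, so that $\Sigma_{11} = \sigma_1^2$, $\Sigma_{22} = \sigma_2^2$ and $\Sigma_{12} = \Sigma_{21} = \rho\sigma_1\sigma_2$. Substituting into the conditional distribution gives:

```latex
\begin{aligned}
E[Y_1 \mid Y_2 = y] &= \mu_1 + \frac{\rho\sigma_1\sigma_2}{\sigma_2^2}(y - \mu_2)
                     = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2), \\
V[Y_1 \mid Y_2 = y] &= \sigma_1^2 - \frac{(\rho\sigma_1\sigma_2)^2}{\sigma_2^2}
                     = \sigma_1^2(1 - \rho^2).
\end{aligned}
```

The conditional mean is linear in y, and the conditional variance does not depend on y and shrinks as |ρ| approaches 1.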



Let $Y \sim N_p(\mu, \Sigma)$, with Σ a full rank matrix. Then

$$Z = (Y - \mu)'\Sigma^{-1}(Y - \mu) \sim \chi^2_{(p)}.$$


Modes of convergence

IDEA: The bigger our sample, the more faith we can have in our inferences, because the sample is more representative of the distribution F from which it comes.

We will study two modes of convergence: convergence in distribution and convergence in probability.


Convergence in distribution

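We say that the sequence $Z_1, Z_2, \ldots, Z_n, \ldots$ converges in distribution to Z, $Z_n \xrightarrow{d} Z$, if

$$F_{Z_n}(z) = \Pr(Z_n \le z) \to F_Z(z) = \Pr(Z \le z) \quad \text{as } n \to \infty$$

at every point z where $F_Z$ is continuous.

This implies that, for large n, one can use $F_Z$ to approximate $F_{Z_n}$:

$$Z_n \overset{\cdot}{\sim} Z$$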


Some examples

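1. A sequence $X_1, X_2, \ldots, X_n, \ldots$ where $X_n \sim T_{(n)}$ converges in distribution to $Z \sim N(0, 1)$:

$$T_{(n)} \overset{\cdot}{\sim} N(0, 1) \quad \text{as } n \to +\infty$$

2. A sequence $X_1, X_2, \ldots, X_n, \ldots$ where $X_n \sim \chi^2_{(n)}$ is approximately $N(n, 2n)$ for large n, in the sense that $(X_n - n)/\sqrt{2n}$ converges in distribution to $N(0, 1)$:

$$\chi^2_{(n)} \overset{\cdot}{\sim} N(n, 2n) \quad \text{as } n \to +\infty$$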


Central Limit Theorem

Another important case is the Central Limit Theorem (CLT): let $Y_1, Y_2, \ldots, Y_n, \ldots$ be a sequence of i.i.d. variables with finite mean $\mu$ and finite variance $\sigma^2 > 0$, and let $\bar{Y}$ denote the mean of the first n of them. Then

$$Z_n = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$$

The CLT implies that, in large samples, the sampling distribution of $\bar{Y}$ can be approximated with the normal density with mean $\mu$ and variance $\sigma^2/n$.
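A quick simulation can make the CLT concrete. The sketch below assumes Uniform(0, 1) data (mean 1/2, variance 1/12), standardizes the mean of n = 30 draws, and checks that roughly 95% of the standardized means fall within ±1.96, as they would under N(0, 1). The distribution, sample size, number of replications and seed are all illustrative choices.

```python
import math
import random

def standardized_means(n, reps, seed=0):
    """Z_n = (Ybar - mu) / (sigma / sqrt(n)) for samples of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    out = []
    for _ in range(reps):
        y_bar = sum(rng.random() for _ in range(n)) / n
        out.append((y_bar - mu) / (sigma / math.sqrt(n)))
    return out

z = standardized_means(n=30, reps=5000)
inside = sum(1 for v in z if abs(v) <= 1.96) / len(z)  # close to 0.95 under N(0, 1)

print(inside)
```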


Central Limit Theorem

[Figure: four panels, for n = 1, 5, 10 and 50, each plotted against x.]


Convergence in probability

We say that the sequence $S_1, S_2, \ldots, S_n, \ldots$ converges in probability to S, $S_n \xrightarrow{p} S$, if for any $\varepsilon > 0$

$$\Pr(|S_n - S| > \varepsilon) \to 0 \quad \text{as } n \to \infty$$

A special case of this is the weak law of large numbers: if $Y_1, Y_2, \ldots$ is a sequence of i.i.d. random variables, each with finite mean $\mu$, then

$$\bar{Y} \xrightarrow{p} \mu.$$
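The weak law can be illustrated by estimating the probability that the sample mean misses µ by more than ε for increasing n. The sketch below assumes Exponential(1) data, so that µ = 1. The tolerance ε = 0.1, the sample sizes, the number of replications and the seed are arbitrary choices.

```python
import random

def exceed_rate(n, eps, reps=1000, seed=0):
    """Estimate Pr(|Ybar_n - mu| > eps) for Exponential(1) samples, where mu = 1."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        y_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(y_bar - 1.0) > eps:
            count += 1
    return count / reps

# Estimated exceedance probabilities for n = 10, 100 and 1000.
rates = [exceed_rate(n, eps=0.1) for n in (10, 100, 1000)]

print(rates)
```

The estimated probabilities should decrease towards zero as n grows, which is what convergence in probability of the sample mean to µ requires.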


Some consequences

Consider the average $\bar{Y}$ of a random sample drawn from Y with mean $E[Y] = \mu$ and variance $V[Y] = \sigma^2$. The weak law of large numbers implies that $\bar{Y}$ is a consistent estimator of $\mu$.

It is also an unbiased estimator of $\mu$ ($E[\bar{Y}] = \mu$).


Some properties

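If $s_0$ and $u_0$ are constants, these modes of convergence are related as follows:

1. $S_n \xrightarrow{p} S \implies S_n \xrightarrow{d} S$;
2. $S_n \xrightarrow{d} s_0 \implies S_n \xrightarrow{p} s_0$;
3. $S_n \xrightarrow{p} s_0 \implies h(S_n) \xrightarrow{p} h(s_0)$, for a function h continuous at $s_0$;
4. $S_n \xrightarrow{d} S$ and $U_n \xrightarrow{p} u_0 \implies S_n + U_n \xrightarrow{d} S + u_0$ and $S_n U_n \xrightarrow{d} S u_0$.

The fourth of these is known as Slutsky's lemma.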
