STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section...

11
1/12/14 1 STAT 341 Format: Lectures Monday and first hour Friday Section Wednesday Problem session second hour Friday (except this week) Midterm on Friday February 7 Final as scheduled Wednesday March 19 Chapter 5-6 in the text More information on class webpage 1 Climate change Climate change [is] changes in long-term averages of daily weather. NASA: Climate and weather web site Climate is what you expect; weather is what you get. Heinlein: Notebooks of Lazarus Long (1978) Climate is the distribution of weather. AMSTAT News (June 2010) 2 Some evidence Sea level rise Arctic ice 3 1950 2000 2050 2100 0 10 20 30 40 50 60 RCP 4.5 Year Sea level anomaly (cm) Global Historical Climatology Network

Transcript of STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section...

Page 1: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

1  

STAT 341!Format:!Lectures Monday and first hour Friday!Section Wednesday!Problem session second hour Friday (except this week)!!Midterm on Friday February 7!Final as scheduled Wednesday March 19!!Chapter 5-6 in the text!!More information on class webpage!

1

Climate change!Climate change [is] changes in long-term averages of daily weather. !NASA: Climate and weather web site!!Climate is what you expect; weather is what you get.!Heinlein: Notebooks of Lazarus Long (1978)!!Climate is the distribution of weather.!AMSTAT News (June 2010)!!

2

Some evidence!Sea level rise!!!!!Arctic ice!!

3

1950 2000 2050 2100

010

2030

4050

60

RCP 4.5

Year

Sea

leve

l ano

mal

y (c

m)

Global Historical Climatology Network!

Page 2: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

2  

Global temperature!Estimate temperature everywhere (by solving a stochastic partial differential equation numerically)!

5

Comparison of global averages!

1850 1900 1950 2000

-1.0

-0.5

0.0

0.5

YearAnomaly

Base period 1970-1989

1850 1900 1950 2000

-1.0

-0.5

0.0

0.5

YearAnomaly

Base period 1970-1989

1850 1900 1950 2000

-1.0

-0.5

0.0

0.5

YearAnomaly

Base period 1970-1989

1850 1900 1950 2000

-1.0

-0.5

0.0

0.5

YearAnomaly

Base period 1970-1989

1850 1900 1950 2000

-1.0

-0.5

0.0

0.5

YearAnomaly

Base period 1970-1989

1850 1900 1950 2000

-1.0

-0.5

0.0

0.5

YearAnomaly

Base period 1970-1989

Ranking temperatures!“Global temperatures in 2003 were 0.56°C (1.01°F) above the long-term (1880-2003) average, ranking 2003 the second warmest year on record, which tied 2002. ” (NOAA 2012)!What is the uncertainty in such a statement?!!

1880 1900 1920 1940 1960 1980 2000

-1.0

-0.5

0.0

0.5

Year

Anomalies

1880 1900 1920 1940 1960 1980 2000

-1.0

-0.5

0.0

0.5

Year

Anomalies

Rank and P(hottest)!1994 0.03

Rank

Frequency

0 40 80 120

010000

1995 0.04

Rank

Frequency

0 40 80 120

06000

1996 0.03

Rank

Frequency

0 40 80 120

010000

1997 0.1

Rank

Frequency

0 40 80 120

015000

1998 0.41

Rank

Frequency

0 40 80 120

030000

1999 0.04

Rank

Frequency

0 40 80 120

010000

2000 0.01

Rank

Frequency

0 40 80 120

04000

2001 0.05

Rank

Frequency

0 40 80 120

06000

2002 0.12

Rank

Frequency

0 40 80 120

010000

2003 0.21

Rank

Frequency

0 40 80 120

015000

2004 0.1

Rank

Frequency

0 40 80 120

08000

2005 0.09

Rank

Frequency

0 40 80 120

08000

2006 0.07

Rank

Frequency

0 40 80 120

06000

2007 0.24

Rank

Frequency

0 40 80 120

015000

2008 0.09

Rank

Frequency

0 40 80 120

010000

2009 0.09

Rank

Frequency

0 40 80 120

08000

Page 3: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

3  

Statistics vs. probability!A probability model is a fixed, fully determined distribution !P(X=k) = 3k e-3 / k!!!A statistical model is not fully determined:!P(X=k) = λk e-λ / k!, λ>0!!We use data from a sample (often independent observations from this distribution) to estimate λ, determine whether λ < 3, etc.!!

9

Probability is the science of randomness!–regularity in irregularity!!Build probability models to !describe observed phenomena!!Statistics is a system for formal inference based on probability models!!Estimate parameters of models!Test hypotheses!Assess uncertainty!

10

Lund, Sweden, annual maximum temperatures!

Address 55.7,13.2

loc: 55.7,13.2 - Google Maps https://maps.google.com/maps?q=loc:55.7,13.2&hl=en&t=m&ie=...

1 of 1 1/2/14, 4:11 PM

11

1960 1970 1980 1990 2000 2010

2022

2426

Year

Ann

ual m

axim

um te

mpe

ratu

re (°

C)

A possible model!The generalized extreme value distribution has cdf The parameters describe location, spread and shape, respectively. If we knew the values of the parameters we could draw the histogram of the data and overlay the corresponding density function to see if it is a good fit.

12

F(x;µ,σ,ξ) = exp − 1+ ξ x − µσ

⎛⎝

⎞⎠

⎡⎣⎢

⎤⎦⎥

⎧⎨⎩

⎫⎬⎭

Page 4: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

4  

13

18 20 22 24 26

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Maximum annual temperature (°C)

Frequency

18 20 22 24 26

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Maximum annual temperature (°C)

Frequency

18 20 22 24 26

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Maximum annual temperature (°C)

Frequency

18 20 22 24 26

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Maximum annual temperature (°C)

Frequency

µ,σ,ξ=20,2,-.25

µ,σ,ξ=23,1.5,-.2

µ,σ,ξ=22.4,1.6,-.28 Concepts!

Theoretical! Practical!

Parameter θ! Parameter value θ0!

Sample X1,X2,…,Xn! Data x1,x2,…,xn!

Statistic g(X1,X2,…,Xn)! Sample statistic! g(x1,x2,…xn)!

Estimator ! Estimate!

pdf f(x;θ)! Estimated pdf f(x; )!

14

θ̂(X1,X2,...,Xn) θ̂(x1,x2,...,xn)θ̂

Sampling distribution!Since! ! ! ! ! is a random variable we can figure out its distribution. !Since it comes from a sample it is called a sampling distribution.!Its cdf is (as usual) P(! ! ! ! ! ≤y),!and we can determine, e.g., its mean!!!!!Note that it is not a number, but a function of the unknown parameter θ.!!

15

θ̂(X1,X2,...,Xn)

θ̂(X1,X2,...,Xn)

Eθ θ̂(X1,X2,...,Xn)⎡⎣ ⎤⎦ =

! θ̂(x1,...,xn)f(x1,...,xn;θ)dx1!dxn∫∫

What estimation is all about!

In 1918 R. A. Fisher proposed estimating parameters by considering !how likely are the data if θ is the true parameter?!Choosing the parameter that makes the observations most likely is formalized using the likelihood function!!!!The data are fixed!The parameter is varying!!! 16

L(θ ) =fX1,...,X n (x1, ...,xn;θ), continuous casepX1,...,X n (x1, ...,xn;θ ),discrete case

# $ %

R.A.Fisher 1890-1962

Page 5: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

5  

The method of maximum likelihood!

Define the mle!We compute it by setting!and checking that ! ! ! , or that L′ has sign change + 0 – about the maximum. Alternatively, plot L′(θ) as a function of θ and find the maximum numerically.!Computational trick: maximize the log likelihood !

17

θ̂ = argmax(L(θ))!L (θ) = 0

!!L (θ) < 0

(θ) = log(L(θ))

The exponential case!Consider x1,x2,…,xn independent observations from an exponential distribution with parameter λ.!The likelihood is!

18

L(λ) = λexp(−λxi)i=1

n

∏ = λn exp −λ xii=1

n

∑⎛⎝⎜

⎞⎠⎟

0.0 0.5 1.0 1.5 2.0

0.0e+00

5.0e-47

1.0e-46

1.5e-46

2.0e-46

Lambda

Likelihood

0.95

Exponential, cont.!

19

L(λ) = λn exp(−λ xii=1

n

∑ )

ℓ(λ) = logL(λ) = nlogλ − λ xi

i=1

n

′ℓ (λ) = n

λ− xi

i=1

n

′ℓ (λ) = 0⇒ λ̂ = n

xi∑= 1x

′′ℓ (λ) = − n

λ2< 0

Normal example!

20

L(µ) = 12π

⎛⎝

⎞⎠

n/2

e− 12 xi−µ( )2

i=1

n

Page 6: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

6  

Stopping!Flip a coin until the first heads. Suppose it takes 6 tries. The likelihood is!!!!Now suppose we were going to flip the coin 6 times, and happened to get one head. The likelihood is!!!!!How are the mles different?!!Fact: Changing the likelihood by a constant does not change the mle.!!However, the standard errors (sd of sampling distribution) would be different.!

21

Approximate standard errors for the mle!

An approximate formula for the estimated standard error of an mle (valid for large samples and for smooth likelihoods) is!

22

ese(ˆ θ ) = −d2

dθ 2 l( ˆ θ )$ % &

' &

( ) &

* &

−1/ 2

The method of moments!

Karl Pearson (1900) proposed this way of estimating parameters. Suppose we have a distribution with Eθ(X)=h(θ ). By the law of large numbers, !!!so we let our estimate solve the equation !

Eθ (X) ≈ x θ(x1,...,xn )

h(θ) = x

Karl Pearson, 1857-1936!

The exponential case!Let X1,...,Xn be iid Exp(λ), so Eλ(X)=1/λ. Below is h(λ) plotted against λ, with three samples of size 50 for λ=2.5 marked.!

25 0 1 2 3 4

010

2030

4050

lambda

1/lambda

x = 0.298 λ = 3.35x = 0.440 λ = 2.27x = 0.551 λ = 1.82

Page 7: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

7  

Bias calculation!We have ! ! ! ! ! , the gamma distribution with density !!!!!!!!!!!!

27

Y = Xi∑ Γ(n,λ)

f(y) = λnyn−1

(n− 1)!e−λy , y > 0

E( λ) = nE(1/ Y) =

bias( λ;λ) =

The hypergeometric

distribution!Consider X ~ Hyp(N,n,w) and assume that we observe X=x. How can we estimate p=w/N?!!E(X) = !

Muon decay!The cosine X of the emission angle of electrons in muon decay has density!!!The mean of the density is! ! h(α) =!!!and the resulting mom estimator is!

fX (x;α) =1+ αx2

, − 1≤ x ≤ 1, − 1≤ α ≤ 1

Some properties!Unbiased!Minimum variance!Minimum variance unbiased!Minimum mean squared error:!!!For unbiased estimators Mse = Var!!

30

Mse(θ*;θ) = Eθ (θ * −θ)2#$ %&

Mse(θ*;θ) = Var(θ*) + Bias(θ*;θ)[ ]2

Page 8: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

8  

Important distinctions!!!!Xi is a random variable!xi is the value of the random variable after we observe it (a number)!!An estimator is a function of random variables. It is a random variable.!!An estimate (the observed value of the estimator) is a function of observed values. It is a number.!

31

fθ̂(y;θ)

Estimator (random variable) Parameter Dummy variable

Friday’s lecture!Maximum likelihood!Method of moments!Comparison of estimators!Mean squared error!!

32

Normal case again!Sample from N(μ,1), but we want to estimate θ=μ2.!

33

Invariance of mle!Suppose we are interested in estimating a (nice) function h(θ), using a sample from f(x;θ). If is the mle of θ, then h( ) is the mle of h(θ). We say that the mle is invariant under reparametrization!

34

θ̂ θ̂

Page 9: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

9  

Standard errors!Sample from Bin(1,p)!Mle?!!Bias?!!Variance?!!!Standard error?!!!Estimated standard error?!

35

Computing mean squared error!

Sample from Exp(λ)!

36

Binomial distribution!

Consider a binomial experiment with four trials, and outcomes x1,…,x4. Here are three possible estimators of the success probability p:!!!!!Which of these are unbiased?!

ˆ p 1 = X ˆ p 2 = X2ˆ p 3 = X1 + 2X2 − 4X3 + 2X4

Sample variance!The mle of the variance for the normal distribution is !!!!A tedious calculation (p. 384) shows that!!!This is true for any distribution with a variance. Often one defines the sample variance as the unbiased estimator!

49

σ̂2 =1n

(xi − x)2i=1

n

E σ̂2( ) = n− 1n σ2

S2 = 1n− 1

(xi − x)2i=1

n

Page 10: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

10  

Efficiency!Define the relative efficiency of two unbiased estimators by the ratio of their variances!!!!!In the binomial case!!!and!!

eff(θ̂1,θ̂2 ) =Var(θ̂2 )Var(θ̂1)

eff(p̂1,p̂2 ) =

eff(p̂1,p̂3 ) =

Uniform case!

52

Mse(θ̂) =

Mse(θ) =

Uniform case!Recall that ! ! ! !, so ! ! ! ! ! is unbiased.!! !

53

E(θ̂) = nθn+ 1

θ* = n+ 1n

θ

eff(θ,θ*) =

Finding the best unbiased estimator!

Cramér-Rao!!!Important contributors to mathematical statistics.!Cramér-Rao inequality:!X1,...,Xn iid from smooth density, unbiased estimator!

54

Harald Cramér 1893-1985 C. R. Rao 1920-

θ* = θ * (X1,...,Xn)

Var(θ*) ≥ −nE ∂2

∂θ2lnf(Y;θ)

%

&'(

)*+

,-

.

/0

−1

Page 11: STAT 341 Climate change · STAT 341! Format:! Lectures Monday and first hour Friday! Section Wednesday! Problem session second hour Friday (except this week)!! Midterm on Friday

1/12/14  

11  

Exponential case!

55

! ! !is unbiased, with variance!!! ! . The lower bound is obtained !

!from !!!!!!!Hence the bound is λ2/n which is smaller than the variance of the biascorrected mle/mome. !

n− 1n

λ̂

λ2

n− 2ln(f(x;λ)) = lnλ − λx∂∂λln(f(x;λ)) = 1

λ− x

∂2

∂λ2ln(f(x;λ)) = − 1

λ2

Parametrization matters!!Let θ=1/λ, so E(X)=θ. Then the mle of θ is !which is unbiased with variance !Var(X) / n = θ2 / n. !

The log density is – ln (θ) – x/θ !with second derivative 1/θ2 – 2x/θ3. !Thus the lower bound is ! (n(2 E(X) / θ3 – 1 / θ2))-1 = θ2/n.!!The λ-parametrization is called the natural parametrization, while the θ-parametrization is the mean-value parametrization.!

56

x

Another best unbiased estimator!

Let X~Po(λ), and try to estimate P(X=0).!If h(x) is unbiased we have!!!!!!!By identifying terms in the power series we must have!!!What is the mle?!Is it unbiased?!

57

h(x)λx

x!x=0

∑ = 1

h(x) = 10

!"#

if x = 0if x ≠ 0

h(x)λx

x!x=0

∑ e−λ = e−λ