Stat 101: Lecture 18

Transcript of Stat 101: Lecture 18, Summer 2006

Page 1

Stat 101: Lecture 18

Summer 2006

Page 2

Outline

Designed Experiments and Descriptive Statistics

Simple Linear Regression

Probability

The normal distribution and the Central Limit Theorem

Confidence Intervals

Significance Tests

Multiple Linear Regression

The Bootstrap

Bayesian Statistics

Page 4

Designed Experiments

- Double-blind, randomized, controlled study versus observational study.
- Causation and association.
- Confounding factors may exist.
- Weighted averages and the chi-square test.
- Summary statistics: mean, median, SD, IQR (see the sketch after this list).
- Plots: histogram, boxplot, scatterplot.
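
As a quick refresher on the summary statistics, here is a minimal Python sketch; the data values are made up for illustration.

```python
import statistics

# Hypothetical sample (values made up for illustration).
data = [12, 15, 11, 19, 14, 22, 13, 16]

print("mean  :", statistics.mean(data))    # arithmetic average
print("median:", statistics.median(data))  # middle value
print("SD    :", statistics.stdev(data))   # sample standard deviation

# IQR: the gap between the 75th and 25th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
print("IQR   :", q3 - q1)
```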

Page 6

Mathematical model for regression

- Each point $(X_i, Y_i)$ in the scatterplot satisfies:

$$Y_i = a + bX_i + \varepsilon_i$$

- $\varepsilon_i \sim N(0, \text{sd} = \sigma)$, where $\sigma$ is usually unknown. The $\varepsilon$'s have nothing to do with one another (they are independent); e.g., a big $\varepsilon_i$ does not imply a big $\varepsilon_j$.

- We know the $X_i$'s exactly. This implies that all error occurs in the vertical direction.

Page 7

Estimating the regression line

The residual $e_i = Y_i - (a + bX_i)$ measures the vertical distance from a point to the regression line. One estimates $a$ and $b$ by minimizing

$$f(a, b) = \sum_{i=1}^{n} \left( Y_i - (a + bX_i) \right)^2$$

Taking the derivatives of $f(a, b)$ with respect to $a$ and $b$ and setting them to 0, we get

$$a = \bar{Y} - b\bar{X}; \qquad b = \frac{\frac{1}{n}\sum_{i=1}^{n} X_i Y_i - \bar{X}\,\bar{Y}}{\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2}$$

$f(a, b)$ is also referred to as the Sum of Squared Errors (SSE). A numerical check of these formulas follows.
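
A minimal Python sketch of the closed-form estimates above; the data values are hypothetical.

```python
import numpy as np

# Hypothetical example data (any small dataset works here).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(X)
Xbar, Ybar = X.mean(), Y.mean()

# b = ((1/n) sum XiYi - Xbar*Ybar) / ((1/n) sum Xi^2 - Xbar^2)
b = ((X * Y).sum() / n - Xbar * Ybar) / ((X**2).sum() / n - Xbar**2)
a = Ybar - b * Xbar

# Residuals and the Sum of Squared Errors f(a, b).
residuals = Y - (a + b * X)
SSE = (residuals**2).sum()
print(f"a = {a:.3f}, b = {b:.3f}, SSE = {SSE:.3f}")
```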

Page 9

Probability

- Definition: frequentist view versus Bayesian view.
- Kolmogorov's axioms.
- Conditional probability:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

- Independence:

$$P(A \mid B) = P(A)$$

- The addition rule:

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

Page 10

- The total probability rule:

$$P(A) = P(A \mid B)P(B) + P(A \mid \text{not } B)P(\text{not } B)$$

- Bayes' rule: let $A_1, \ldots, A_n$ be mutually exclusive and suppose that $P(A_1 \text{ or } A_2 \ldots \text{ or } A_n) = 1$. Then,

$$P(A_1 \mid B) = \frac{P(B \mid A_1) \times P(A_1)}{\sum_{i=1}^{n} P(B \mid A_i) P(A_i)}$$

Page 11

- The binomial formula (both formulas are sketched in code below):

$$P(\text{exactly } r \text{ successes}) = \binom{n}{r} p^r (1 - p)^{n-r}$$

- The Poisson formula:

$$P(\text{exactly } k \text{ events}) = \frac{\lambda^k}{k!} \exp(-\lambda)$$
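
A quick numerical sketch of both formulas in Python; the parameter values ($n = 10$, $r = 3$, $p = 0.25$, $\lambda = 2$, $k = 4$) are arbitrary illustrations.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(exactly r successes in n trials, success probability p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(k, lam):
    """P(exactly k events when the rate is lam)."""
    return lam**k / factorial(k) * exp(-lam)

print(binomial_pmf(3, 10, 0.25))  # ~0.2503
print(poisson_pmf(4, 2.0))        # ~0.0902
```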

Page 13

The Normal Distribution and the Central Limit Theorem

- The normal distribution and use of the normal table:

$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right)$$

- Box model: EV, $\sigma$.
- The Central Limit Theorem for averages (simulated in the sketch below):

$$\frac{\bar{X} - \text{EV}}{\sigma/\sqrt{n}} \sim N(0, 1)$$

- The Central Limit Theorem for sums:

$$\frac{n\bar{X} - n\,\text{EV}}{\sqrt{n}\,\sigma} \sim N(0, 1)$$

- The Central Limit Theorem for proportions:

$$\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1)$$
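
A small simulation sketch of the CLT for averages; the box (uniform on [0, 1]), the sample size, and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Box: uniform on [0, 1], so EV = 0.5 and sigma = sqrt(1/12).
EV, sigma = 0.5, np.sqrt(1 / 12)
n, reps = 50, 10_000

# Standardize each of 10,000 sample averages.
samples = rng.uniform(0, 1, size=(reps, n))
z = (samples.mean(axis=1) - EV) / (sigma / np.sqrt(n))

# If the CLT holds, about 95% of the standardized averages
# should land in (-1.96, 1.96).
print(np.mean(np.abs(z) < 1.96))  # close to 0.95
```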

Page 15

Confidence Intervals

- The formulas (a numerical sketch follows this list):
  - $(L, U)$:

$$L = \text{pe} - \text{se} \times \text{cv}_{(1-C)/2}; \qquad U = \text{pe} + \text{se} \times \text{cv}_{(1-C)/2}$$

  - $(-\infty, L)$: $L = \text{pe} + \text{se} \times \text{cv}_C$.
  - $(U, +\infty)$: $U = \text{pe} + \text{se} \times \text{cv}_{(1-C)}$.
- Confidence intervals for:
  - Average: $\text{pe} = \bar{X}$, $\text{se} = \sigma/\sqrt{n}$.
  - Sum: $\text{pe} = n\bar{X}$, $\text{se} = \sqrt{n}\,\sigma$.
  - Proportion: $\text{pe} = \hat{p}$, $\text{se} = \sqrt{\hat{p}(1 - \hat{p})/n}$.
- Interpretation: what is random, and what is constant.
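
A minimal sketch of the two-sided formula for an average, assuming the population SD $\sigma$ is known; the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary: n observations with average Xbar,
# and a known population SD sigma.
n, Xbar, sigma = 100, 52.3, 8.0
C = 0.95

# cv_{(1-C)/2}: the z critical value with (1-C)/2 in each tail.
cv = NormalDist().inv_cdf(1 - (1 - C) / 2)  # ~1.96 for C = 0.95

pe, se = Xbar, sigma / sqrt(n)
L, U = pe - se * cv, pe + se * cv
print(f"{C:.0%} CI: ({L:.2f}, {U:.2f})")
```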

Page 19

Significance Tests

A significance test requires:

- a null and an alternative hypothesis.
- a test statistic.
- a significance probability (P-value).

Page 20

I: Possible hypotheses

1. $H_0: \theta = \theta_0$; $H_a: \theta \neq \theta_0$.
2. $H_0: \theta \leq \theta_0$; $H_a: \theta > \theta_0$.
3. $H_0: \theta \geq \theta_0$; $H_a: \theta < \theta_0$.

Here $\theta$ represents a generic parameter. It could be a population mean, a population proportion, the difference of two population means, or many other things.

Page 21

II: Possible test statistics

a. For a test about a population mean, we take $\theta$ to be the population mean $\mu$. If you know the population SD, or if $n > 26$ and you use the sample SD as an estimate of the population SD, then you get the significance probability from a z-table and the test statistic is:

$$ts = \frac{\bar{X} - \mu_0}{\text{SD}/\sqrt{n}}$$

b. For the previous case, if you have a sample of size $n \leq 26$ and use the sample SD to estimate the population SD, then the significance probability comes from a $t_{n-1}$ table and the test statistic is:

$$ts = \frac{\bar{X} - \mu_0}{\text{SD}/\sqrt{n - 1}}$$

Page 22

c. For a test about a proportion, $\theta = p$. The significance probability comes from a z-table, and the test statistic is:

$$ts = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

d. For a test of the difference of two means, $\theta = \mu_1 - \mu_2$. Assuming that the sample sizes from each population satisfy $n_1 > 26$ and $n_2 > 26$, the significance probability comes from a z-table and the test statistic is:

$$ts = \frac{\bar{X}_1 - \bar{X}_2 - \theta_0}{\sqrt{\text{SD}_1^2/n_1 + \text{SD}_2^2/n_2}}$$

Page 23

e. For a test of the difference of two proportions, take $\theta = p_1 - p_2$. Use a z-table for the significance probability and the test statistic:

$$ts = \frac{\hat{p}_1 - \hat{p}_2 - \theta_0}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}}$$

f. For $n \leq 26$, with $\theta = \mu_1 - \mu_2$ and $n$ paired differences $X_i - Y_i$, use $t_{n-1}$ for the significance probability and the test statistic:

$$ts = \frac{\bar{X} - \bar{Y} - \theta_0}{\text{SD}_d/\sqrt{n - 1}}$$

Here $\text{SD}_d$ is the sample standard deviation of the $n$ differences.

Page 24

III: The significance probability

- The significance probability of the test statistic depends on the hypothesis chosen in Part I. For that choice, let $W$ be a random variable with the z or $t_{n-1}$ distribution, as indicated in Part II. Then,

1. The significance probability is $P(W \leq -|ts|) + P(W \geq |ts|)$.
2. The significance probability is $P(W \geq ts)$.
3. The significance probability is $P(W \leq ts)$.

- The significance probability is "the chance of observing data that supports the alternative hypothesis as or more strongly than the data you have seen, when the null hypothesis is correct." (A worked example follows.)
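
As an illustration of case (a) with the two-sided hypothesis from Part I, here is a minimal Python sketch; the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary: test H0: mu = 50 vs Ha: mu != 50,
# with n > 26, so the z-table applies.
n, Xbar, SD = 64, 52.1, 8.0
mu0 = 50.0

ts = (Xbar - mu0) / (SD / sqrt(n))

# Two-sided P-value: P(W <= -|ts|) + P(W >= |ts|).
W = NormalDist()
p_value = W.cdf(-abs(ts)) + (1 - W.cdf(abs(ts)))
print(f"ts = {ts:.2f}, P-value = {p_value:.4f}")  # ts = 2.10, P ~ 0.036
```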

Page 25

- Goodness-of-fit tests:

$$H_0: \text{the model holds}; \qquad H_a: \text{the model fails}$$

$$ts = \sum \frac{(O_i - E_i)^2}{E_i}, \qquad k = \#\text{categories} - 1$$

- Contingency tables and tests of independence (see the code sketch below):

$$H_0: \text{the two criteria are independent}$$

$$H_a: \text{some dependence exists}$$

$$ts = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$$E_{ij} = \frac{i\text{th row sum} \times j\text{th column sum}}{\text{total}}, \qquad k = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$
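
A minimal sketch of the independence test on a hypothetical 2×2 table, computing the expected counts and the chi-square statistic directly from the formulas above.

```python
import numpy as np

# Hypothetical 2x2 contingency table of observed counts.
O = np.array([[30.0, 20.0],
              [10.0, 40.0]])

row_sums = O.sum(axis=1, keepdims=True)  # shape (2, 1)
col_sums = O.sum(axis=0, keepdims=True)  # shape (1, 2)
total = O.sum()

# E_ij = (ith row sum) x (jth column sum) / total
E = row_sums @ col_sums / total

ts = ((O - E) ** 2 / E).sum()
k = (O.shape[0] - 1) * (O.shape[1] - 1)  # degrees of freedom
print(f"ts = {ts:.2f} on k = {k} degree(s) of freedom")
```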

Page 28

Multiple Regression

- In multiple regression, there is more than one explanatory variable. The model is:

$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + \varepsilon_i$$

Again, the $\varepsilon_i$ are independent normal random variables with mean 0.
- The null and alternative hypotheses are:

$$H_0: b_1 \geq 0; \qquad H_a: b_1 < 0$$

- The test statistic is:

$$ts = \frac{b_1 - 0}{\text{se}}$$

- This is compared to a t-distribution with $n - p - 1$ degrees of freedom, where $p$ is the number of explanatory variables in our regression model. A worked sketch follows.
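
A sketch of fitting this model and forming the t-statistic for $b_1$ with plain numpy; the data are simulated, so all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate n observations with p = 2 explanatory variables.
n, p = 40, 2
X = rng.normal(size=(n, p))
Y = 1.0 - 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column for a.
D = np.column_stack([np.ones(n), X])
coef, _, _, _ = np.linalg.lstsq(D, Y, rcond=None)  # [a, b1, b2]

# Residual variance estimate on n - p - 1 degrees of freedom.
resid = Y - D @ coef
df = n - p - 1
s2 = resid @ resid / df

# Standard error of b1 from the diagonal of s2 * (D'D)^{-1}.
se_b1 = np.sqrt(s2 * np.linalg.inv(D.T @ D)[1, 1])

ts = (coef[1] - 0) / se_b1
print(f"b1 = {coef[1]:.3f}, se = {se_b1:.3f}, ts = {ts:.2f}, df = {df}")
```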

Page 30

The Bootstrap

The pivot confidence interval assumes that the behavior of $\hat{\theta} - \theta$ is approximately the same as the behavior of $\hat{\theta}^* - \hat{\theta}$. And,

- Suppose we use a computer to draw 1000 bootstrap samples of size $n$. For each such sample, it calculates a new estimate of the parameter of interest.
- Rank these estimates from smallest to largest. We denote the ordered bootstrap estimates by

$$\hat{\theta}^*_{(1)}, \ldots, \hat{\theta}^*_{(1000)}$$

where the number in parentheses shows the order in terms of size. Thus $\hat{\theta}^*_{(1)}$ is the smallest estimate found in one of the 1000 bootstrap samples, and $\hat{\theta}^*_{(1000)}$ is the largest.
- The 95% confidence interval is given by:

$$L = 2\hat{\theta} - \hat{\theta}^*_{(0.975)}; \qquad U = 2\hat{\theta} - \hat{\theta}^*_{(0.025)}$$
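
A minimal sketch of the pivot interval for a population mean; the data are simulated, and with 1000 resamples the 0.975 and 0.025 quantiles are taken as the 975th and 25th ordered estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sample; theta-hat is the sample mean.
data = rng.exponential(scale=3.0, size=50)
theta_hat = data.mean()

# Draw 1000 bootstrap samples of size n; re-estimate each time.
B = 1000
boot = np.sort([rng.choice(data, size=data.size, replace=True).mean()
                for _ in range(B)])

# Pivot 95% interval: L = 2*theta_hat - theta*_(0.975),
#                     U = 2*theta_hat - theta*_(0.025).
L = 2 * theta_hat - boot[int(0.975 * B) - 1]  # 975th ordered estimate
U = 2 * theta_hat - boot[int(0.025 * B) - 1]  # 25th ordered estimate
print(f"estimate = {theta_hat:.2f}, 95% pivot CI = ({L:.2f}, {U:.2f})")
```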

Page 32

Bayesian Statistics

- Recall Bayes' theorem:

$$P(A_1 \mid B) = \frac{P(B \mid A_1) \times P(A_1)}{\sum_{i=1}^{k} P(B \mid A_i) \times P(A_i)}$$

where $A_1, \ldots, A_k$ are mutually exclusive and

$$P(A_1 \text{ or } A_2 \text{ or } \ldots \text{ or } A_k) = 1$$

- Specify a prior distribution. Calculate the likelihood and the posterior.
- Posterior predictive probability: use the posterior probabilities as weights.

Page 33

The Prior, Likelihood, and Posterior

Model   Prior      P(data | model)   Product   Posterior
p       P(model)   P(k = 0 | p)                P(model | data)
-----   --------   ---------------   -------   ---------------
0.1     1/9        0.656             0.0729    0.427
0.2     1/9        0.410             0.0455    0.267
0.3     1/9        0.240             0.0266    0.156
0.4     1/9        0.130             0.0144    0.084
0.5     1/9        0.065             0.0070    0.041
0.6     1/9        0.026             0.0029    0.017
0.7     1/9        0.008             0.0009    0.005
0.8     1/9        0.002             0.0002    0.001
0.9     1/9        0.000             0.0000    0.000
Sum     1                            0.1704    1
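
The likelihood column matches zero successes in four binomial trials, i.e. $P(k = 0 \mid p) = (1 - p)^4$ (e.g. $0.9^4 \approx 0.656$); this is an inference from the printed numbers, not stated on the slide. Assuming it, a short sketch reproduces the table:

```python
# Nine equally likely models for p, prior 1/9 each.
models = [i / 10 for i in range(1, 10)]
prior = 1 / 9

# Likelihood consistent with the table: P(k = 0 | p) for a binomial
# with n = 4 trials, i.e. (1 - p)**4 -- inferred, not given.
product = {p: prior * (1 - p) ** 4 for p in models}
total = sum(product.values())  # the 0.1704 in the table

for p in models:
    posterior = product[p] / total  # P(model | data)
    print(f"p = {p:.1f}  product = {product[p]:.4f}  posterior = {posterior:.3f}")
print(f"sum of products = {total:.4f}")
```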