MIT2 854F10 Stats

7/30/2019 MIT2 854F10 Stats

1/38

1

Statistical Inference

Lecturer: Prof. Duane S. Boning

7/30/2019 MIT2 854F10 Stats

2/38

2

Agenda

1. Review: Probability Distributions & Random Variables2. Sampling: Key distributions arising in sampling

Chi-square, t, and F distributions

3. Estimation:

Reasoning about the population based on a sample4. Some basic confidence intervals

Estimate of mean with variance known

Estimate of mean with variance not known

Estimate of variance

5. Hypothesis tests

7/30/2019 MIT2 854F10 Stats

3/38

3

Discrete Distribution: Bernoulli

Bernoulli trial: an experiment with two outcomes

Probability mass function (pmf):

f(x)

x

p

1 - p

0 1

7/30/2019 MIT2 854F10 Stats

4/38

4

Discrete Distribution: Binomial

Repeated random Bernoulli trials

nis the number of trials pis the probability of success on any one trial xis the number of successes in ntrials

7/30/2019 MIT2 854F10 Stats

5/385

Binomial Distribution

Binomial Distribution

0

0.05

0.1

0.15

0.2

0.25

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Number of "Successes"

0

0.2

0.4

0.6

0.8

1

1.2

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Series1

7/30/2019 MIT2 854F10 Stats

6/386

Discrete Distribution: Poisson

Poisson is a good approximation to Binomial when nislarge and pis small (< 0.1)

Example applications: # misprints on page(s) of a book

# transistors which fail on first day of operation

Mean:

Variance:

7/30/2019 MIT2 854F10 Stats

7/387

Poisson Distribution

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

1 3 5 7 9 11 13 1 5 17 19 21 2 3 25 27 29 3 1 33 35 37 3 9 41Events per unit

c


0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

1 3 5 7 9 11 1 3 15 1 7 19 2 1 23 2 5 27 2 9 31 3 3 35 3 7 39 4 1

Events pe r unit

c


0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

1 3 5 7 9 11 1 3 15 17 1 9 21 2 3 25 2 7 29 31 3 3 35 3 7 39 4 1

Events per unit

=5

=20

=30

Poisson Distributions

7/30/2019 MIT2 854F10 Stats

8/388

Continuous Distributions

Uniform Distribution Normal Distribution

Unit (Standard) Normal Distribution

7/30/2019 MIT2 854F10 Stats

9/389

Continuous Distribution: Uniform

xa

1

b

xa b

cumulative distribution function* (cdf)

probability density function (pdf)

also sometimes called a probability distribution function

*also sometimes called a cumulative density function

7/30/2019 MIT2 854F10 Stats

10/3810

Standard Questions You Should Be Able To Answer(For a Known cdf or pdf)

xa

1

b

xa b Probability x sits within

some range

Probability x less than orequal to some value

7/30/2019 MIT2 854F10 Stats

11/3811

Continuous Distribution: Normal (Gaussian)

0

0.16

0.5

0.84

1

0.977

0.99865

0.0227.00135

pdf

cdf

0

7/30/2019 MIT2 854F10 Stats

12/3812

Continuous Distribution: Unit Normal

Normalization

cdf

Mean

Variance

pdf

7/30/2019 MIT2 854F10 Stats

13/3813

0.1

0.5

0.9

1

Using the Unit Normal pdf and cdf

We often want to talk aboutpercentage points of thedistribution portion in thetails

7/30/2019 MIT2 854F10 Stats

14/3814

Philosophy

The field of statistics is aboutreasoning in the face of

uncertainty, based on evidencefrom observed data

Beliefs: Distribution or model form

Distribution/model parameters

Evidence:

Finite set of observations or data drawn from a population Models:

Seek to explain data

7/30/2019 MIT2 854F10 Stats

15/38

15

Moments of the Populationvs. Sample Statistics

Mean

Variance

Standard

Deviation

Covariance

Correlation

Coefficient

Population Sample

7/30/2019 MIT2 854F10 Stats

16/38

16

Sampling and Estimation

Sampling: act of making observations from populations Random sampling: when each observation is identically

and independently distributed (IID)

Statistic: a function of sample data; a value that can be

computed from data (contains no unknowns) average, median, standard deviation

7/30/2019 MIT2 854F10 Stats

17/38

17

http://stat-www.berkeley.edu/~stark/SticiGui

Sampling Demo

SticiGui: Statistics Tools for Internet and ClassroomInstruction with a Graphical User Interface
http://stat-www.berkeley.edu/~stark/SticiGuihttp://stat-www.berkeley.edu/~stark/SticiGui

7/30/2019 MIT2 854F10 Stats

18/38

18

Population vs. Sampling Distribution

Population

(probability density function)

n = 20

Sample Mean

(statistic) n = 10

n = 2

Sample Mean

(sampling distribution)

n = 1

7/30/2019 MIT2 854F10 Stats

19/38

19

Sampling and Estimation, cont.

Sampling Random sampling

Statistic

A statistic is a random variable, which itself has asampling distribution I.e., if we take multiple random samples, the value for the statistic

will be different for each set of samples, but will be governed bythe same sampling distribution

If we know the appropriate sampling distribution, we canreason about the population based on the observed

value of a statistic E.g. we calculate a sample mean from a random sample; in what

range do we think the actual (population) mean really sits?

7/30/2019 MIT2 854F10 Stats

20/38

20

Sampling and Estimation An Example

Suppose we know that the thickness ofa part is normally distributed with std.dev. of 10:

We sample n= 50 random parts andcompute the mean part thickness:

First question: What is distribution of

Second question: can we useknowledge of distribution to reasonabout the actual (population) meangiven observed (sample) mean?

7/30/2019 MIT2 854F10 Stats

21/38

21

Estimation and Confidence Intervals

Point Estimation: Find best values for parameters of a distribution

Should be

Unbiased: expected value of estimate should be true value

Minimum variance: should be estimator with smallest variance

Interval Estimation:

Give bounds that contain actual value with a

given probability

Must know sampling distribution!

Confidence Interval Demo

7/30/2019 MIT2 854F10 Stats

22/38

22

Confidence Intervals: Variance Known

We know , e.g. from historical data

Estimate mean in some interval to (1- )100% confidence

0.1

0.5

0.9

1

Remember the unit normal

percentage points

Apply to the sampling

distribution for thesample mean

7/30/2019 MIT2 854F10 Stats

23/38

23

95% confidence interval, = 0.05

n = 50

~95% of distribution

lies within +/- 2 of mean

Example, Contd

Second question: can we use knowledge of distribution to

reason about the actual (population) mean given observed

(sample) mean?

7/30/2019 MIT2 854F10 Stats

24/38

24

Reasoning & Sampling Distributions

Example shows that we need to know our sampling

distribution in order to reason about the sample andpopulation parameters

Other important sampling distributions:

Student-t

Use instead of normal distribution when we dont know actual

variation or

Chi-square

Use when we are asking about variances

F

Use when we are asking about ratios of variances

7/30/2019 MIT2 854F10 Stats

25/38

25

Sampling: The Chi-Square Distribution

Typical use: find distribution

of variance when mean isknown

Ex:

So if we calculate s2, we can useknowledge of chi-square distribution

s

0 5 10 15 20 25 300

0.01

0.02

0.03

0.04

0.05

0.06

0.070.08

0.09

0.1

to put bounds on where we believe

the actual (population) variance sit

S li Th S d Di ib i

7/30/2019 MIT2 854F10 Stats

26/38

26

Sampling: The Student-t Distribution

Typical use: Find distribution of average when is NOT known

Fork! 1, tk ! N(0,1)

Considerxi~ N( ,2) . Then

This is just the normalized distance from mean (normalized

to our estimate of the sample variance)

B k E l

7/30/2019 MIT2 854F10 Stats

27/38

27

Back to our Example

Suppose we do not know either the variance or the mean in

our parts population:

We take our sample of size n= 50, and calculate

Best estimate of population mean and variance (std.dev.)?

If had to pick a range where would be 95% of time?

Have to use the appropriate sampling distribution:

In this case the t-distribution (rather than normal

distribution)

C fid I t l V i U k

7/30/2019 MIT2 854F10 Stats

28/38

28

Confidence Intervals: Variance Unknown

Case where we dont know variance a priori Now we have to estimate not only the mean based on

our data, but also estimate the variance

Our estimate of the mean to some interval with

(1- )100% confidence becomes

Note that the t distribution is slightly wider than the normaldistribution, so that our confidence interval on the true mean is

not as tight as when we know the variance.

E l C td

7/30/2019 MIT2 854F10 Stats

29/38

29

Example, Contd

Third question: can we use knowledge of distribution to

reason about the actual (population) mean given observed

(sample) mean even though we werent told ?

n = 50

95% confidence intervalt distribution isslightly wider than

gaussian distribution

Once More to Our Example

7/30/2019 MIT2 854F10 Stats

30/38

30

Once More to Our Example

Fourth question: how about a confidence interval on our

estimate of the variance of the thickness of our parts, based onour 50 observations?

Confidence Intervals: Estimate of Variance

7/30/2019 MIT2 854F10 Stats

31/38

31

Confidence Intervals: Estimate of Variance

The appropriate sampling distribution is the Chi-square

Because2

is asymmetric, c.i. bounds not symmetric.

Example Contd

7/30/2019 MIT2 854F10 Stats

32/38

32

0 10 20 30 40 50 60 70 80 90 1000

0.0050.01

0.015

0.02

0.025

0.03

0.035

0.04

0.045

Example, Contd

Fourth question: for our example (where we observed

sT2 = 102.3) with n= 50 samples, what is the 95%confidence interval for the population variance?

Sampling: The F Distribution

7/30/2019 MIT2 854F10 Stats

33/38

33

Sampling: The F Distribution

Typical use: compare the spread of two populations

Example: x~ N( x,

2x) from which we sample x1, x2, , xn

y~ N( y,2

y) from which we sample y1, y2, , ym

Then

or

Concept of the F Distribution

7/30/2019 MIT2 854F10 Stats

34/38

34

Concept of the F Distribution

Assume we have a normally

distributed population We generate two different

random samples from the

population

In each case, we calculate a

sample variance si2 What range will the ratio of

these two variances take?

) F distribution

Purely by chance (due to

sampling) we get a range ofratios even though drawing

from same population

Example:

Assume x~ N(0,1)

Take samples of size n= 20

Calculate s12 and s2

2 and take ratio

95% confidence interval on ratio

Large range in ratio!

Hypothesis Testing

7/30/2019 MIT2 854F10 Stats

35/38

35

Hypothesis Testing

A statistical hypothesis is a statement about the parameters of a

probability distribution H0 is the null hypothesis

E.g.

Would indicate that the machine is working correctly

H1 is the alternative hypothesis

E.g. Indicates an undesirable change (mean shift) in the machine

operation (perhaps a worn tool)

In general, we formulate our hypothesis, generate a random

sample, compute a statistic, and then seek to reject H0 or fail toreject (accept) H0 based on probabilities associated with thestatistic and level of confidence we select

Which Population is Sample x From?

7/30/2019 MIT2 854F10 Stats

36/38

36

Which Population is Sample x From?

Two error probabilities in decision:

Type I error: false alarm

Type II error: miss

Power of test (correct alarm)

ConsiderH1 analarm condition

Set decision point (andsample size) based on

acceptable ,

ConsiderH0thenormal condition

risks Control charts are hypothesis tests:

Is my process in control or has a significant change occurred?

Summary

7/30/2019 MIT2 854F10 Stats

37/38

37

Summary1. Review: Probability Distributions & Random Variables

2. Sampling: Key distributions arising in sampling

Chi-square, t, and F distributions3. Estimation: Reasoning about the population based on a

sample

4. Some basic confidence intervals Estimate of mean with variance known

Estimate of mean with variance not known Estimate of variance

5. Hypothesis tests

Next Time:

1. Are effects (one or more variables) significant?) ANOVA (Analysis of Variance)2. How do we model the effectof some variable(s)?

) Regression modeling

MIT O C W

7/30/2019 MIT2 854F10 Stats

38/38

MIT OpenCourseWarehttp://ocw.mit.edu

2.854 / 2.853 Introduction to Manufacturing Systems

Fall 2010

For information about citing these materials or our Terms of Use, visit:http://ocw.mit.edu/terms.
http://ocw.mit.edu/http://ocw.mit.edu/termshttp://ocw.mit.edu/termshttp://ocw.mit.edu/termshttp://ocw.mit.edu/

MIT2 854F10 Stats

Documents

Transcript of MIT2 854F10 Stats