MIT2 854F10 Stats
-
Upload
tesfaye-tefera -
Category
Documents
-
view
223 -
download
0
Transcript of MIT2 854F10 Stats
-
7/30/2019 MIT2 854F10 Stats
1/38
1
Statistical Inference
Lecturer: Prof. Duane S. Boning
-
7/30/2019 MIT2 854F10 Stats
2/38
2
Agenda
1. Review: Probability Distributions & Random Variables2. Sampling: Key distributions arising in sampling
Chi-square, t, and F distributions
3. Estimation:
Reasoning about the population based on a sample4. Some basic confidence intervals
Estimate of mean with variance known
Estimate of mean with variance not known
Estimate of variance
5. Hypothesis tests
-
7/30/2019 MIT2 854F10 Stats
3/38
3
Discrete Distribution: Bernoulli
Bernoulli trial: an experiment with two outcomes
Probability mass function (pmf):
f(x)
x
p
1 - p
0 1
-
7/30/2019 MIT2 854F10 Stats
4/38
4
Discrete Distribution: Binomial
Repeated random Bernoulli trials
nis the number of trials pis the probability of success on any one trial xis the number of successes in ntrials
-
7/30/2019 MIT2 854F10 Stats
5/385
Binomial Distribution
Binomial Distribution
0
0.05
0.1
0.15
0.2
0.25
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Number of "Successes"
0
0.2
0.4
0.6
0.8
1
1.2
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Series1
-
7/30/2019 MIT2 854F10 Stats
6/386
Discrete Distribution: Poisson
Poisson is a good approximation to Binomial when nislarge and pis small (< 0.1)
Example applications: # misprints on page(s) of a book
# transistors which fail on first day of operation
Mean:
Variance:
-
7/30/2019 MIT2 854F10 Stats
7/387
Poisson Distribution
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
1 3 5 7 9 11 13 1 5 17 19 21 2 3 25 27 29 3 1 33 35 37 3 9 41Events per unit
c
Poisson Distribution
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
1 3 5 7 9 11 1 3 15 1 7 19 2 1 23 2 5 27 2 9 31 3 3 35 3 7 39 4 1
Events pe r unit
c
Poisson Distribution
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
1 3 5 7 9 11 1 3 15 17 1 9 21 2 3 25 2 7 29 31 3 3 35 3 7 39 4 1
Events per unit
=5
=20
=30
Poisson Distributions
-
7/30/2019 MIT2 854F10 Stats
8/388
Continuous Distributions
Uniform Distribution Normal Distribution
Unit (Standard) Normal Distribution
-
7/30/2019 MIT2 854F10 Stats
9/389
Continuous Distribution: Uniform
xa
1
b
xa b
cumulative distribution function* (cdf)
probability density function (pdf)
also sometimes called a probability distribution function
*also sometimes called a cumulative density function
-
7/30/2019 MIT2 854F10 Stats
10/3810
Standard Questions You Should Be Able To Answer(For a Known cdf or pdf)
xa
1
b
xa b Probability x sits within
some range
Probability x less than orequal to some value
-
7/30/2019 MIT2 854F10 Stats
11/3811
Continuous Distribution: Normal (Gaussian)
0
0.16
0.5
0.84
1
0.977
0.99865
0.0227.00135
pdf
cdf
0
-
7/30/2019 MIT2 854F10 Stats
12/3812
Continuous Distribution: Unit Normal
Normalization
cdf
Mean
Variance
pdf
-
7/30/2019 MIT2 854F10 Stats
13/3813
0.1
0.5
0.9
1
Using the Unit Normal pdf and cdf
We often want to talk aboutpercentage points of thedistribution portion in thetails
-
7/30/2019 MIT2 854F10 Stats
14/3814
Philosophy
The field of statistics is aboutreasoning in the face of
uncertainty, based on evidencefrom observed data
Beliefs: Distribution or model form
Distribution/model parameters
Evidence:
Finite set of observations or data drawn from a population Models:
Seek to explain data
-
7/30/2019 MIT2 854F10 Stats
15/38
15
Moments of the Populationvs. Sample Statistics
Mean
Variance
Standard
Deviation
Covariance
Correlation
Coefficient
Population Sample
-
7/30/2019 MIT2 854F10 Stats
16/38
16
Sampling and Estimation
Sampling: act of making observations from populations Random sampling: when each observation is identically
and independently distributed (IID)
Statistic: a function of sample data; a value that can be
computed from data (contains no unknowns) average, median, standard deviation
-
7/30/2019 MIT2 854F10 Stats
17/38
17
http://stat-www.berkeley.edu/~stark/SticiGui
Sampling Demo
SticiGui: Statistics Tools for Internet and ClassroomInstruction with a Graphical User Interface
http://stat-www.berkeley.edu/~stark/SticiGuihttp://stat-www.berkeley.edu/~stark/SticiGui -
7/30/2019 MIT2 854F10 Stats
18/38
18
Population vs. Sampling Distribution
Population
(probability density function)
n = 20
Sample Mean
(statistic) n = 10
n = 2
Sample Mean
(sampling distribution)
n = 1
-
7/30/2019 MIT2 854F10 Stats
19/38
19
Sampling and Estimation, cont.
Sampling Random sampling
Statistic
A statistic is a random variable, which itself has asampling distribution I.e., if we take multiple random samples, the value for the statistic
will be different for each set of samples, but will be governed bythe same sampling distribution
If we know the appropriate sampling distribution, we canreason about the population based on the observed
value of a statistic E.g. we calculate a sample mean from a random sample; in what
range do we think the actual (population) mean really sits?
-
7/30/2019 MIT2 854F10 Stats
20/38
20
Sampling and Estimation An Example
Suppose we know that the thickness ofa part is normally distributed with std.dev. of 10:
We sample n= 50 random parts andcompute the mean part thickness:
First question: What is distribution of
Second question: can we useknowledge of distribution to reasonabout the actual (population) meangiven observed (sample) mean?
-
7/30/2019 MIT2 854F10 Stats
21/38
21
Estimation and Confidence Intervals
Point Estimation: Find best values for parameters of a distribution
Should be
Unbiased: expected value of estimate should be true value
Minimum variance: should be estimator with smallest variance
Interval Estimation:
Give bounds that contain actual value with a
given probability
Must know sampling distribution!
Confidence Interval Demo
-
7/30/2019 MIT2 854F10 Stats
22/38
22
Confidence Intervals: Variance Known
We know , e.g. from historical data
Estimate mean in some interval to (1- )100% confidence
0.1
0.5
0.9
1
Remember the unit normal
percentage points
Apply to the sampling
distribution for thesample mean
-
7/30/2019 MIT2 854F10 Stats
23/38
23
95% confidence interval, = 0.05
n = 50
~95% of distribution
lies within +/- 2 of mean
Example, Contd
Second question: can we use knowledge of distribution to
reason about the actual (population) mean given observed
(sample) mean?
-
7/30/2019 MIT2 854F10 Stats
24/38
24
Reasoning & Sampling Distributions
Example shows that we need to know our sampling
distribution in order to reason about the sample andpopulation parameters
Other important sampling distributions:
Student-t
Use instead of normal distribution when we dont know actual
variation or
Chi-square
Use when we are asking about variances
F
Use when we are asking about ratios of variances
-
7/30/2019 MIT2 854F10 Stats
25/38
25
Sampling: The Chi-Square Distribution
Typical use: find distribution
of variance when mean isknown
Ex:
So if we calculate s2, we can useknowledge of chi-square distribution
s
0 5 10 15 20 25 300
0.01
0.02
0.03
0.04
0.05
0.06
0.070.08
0.09
0.1
to put bounds on where we believe
the actual (population) variance sit
S li Th S d Di ib i
-
7/30/2019 MIT2 854F10 Stats
26/38
26
Sampling: The Student-t Distribution
Typical use: Find distribution of average when is NOT known
Fork! 1, tk ! N(0,1)
Considerxi~ N( ,2) . Then
This is just the normalized distance from mean (normalized
to our estimate of the sample variance)
B k E l
-
7/30/2019 MIT2 854F10 Stats
27/38
27
Back to our Example
Suppose we do not know either the variance or the mean in
our parts population:
We take our sample of size n= 50, and calculate
Best estimate of population mean and variance (std.dev.)?
If had to pick a range where would be 95% of time?
Have to use the appropriate sampling distribution:
In this case the t-distribution (rather than normal
distribution)
C fid I t l V i U k
-
7/30/2019 MIT2 854F10 Stats
28/38
28
Confidence Intervals: Variance Unknown
Case where we dont know variance a priori Now we have to estimate not only the mean based on
our data, but also estimate the variance
Our estimate of the mean to some interval with
(1- )100% confidence becomes
Note that the t distribution is slightly wider than the normaldistribution, so that our confidence interval on the true mean is
not as tight as when we know the variance.
E l C td
-
7/30/2019 MIT2 854F10 Stats
29/38
29
Example, Contd
Third question: can we use knowledge of distribution to
reason about the actual (population) mean given observed
(sample) mean even though we werent told ?
n = 50
95% confidence intervalt distribution isslightly wider than
gaussian distribution
Once More to Our Example
-
7/30/2019 MIT2 854F10 Stats
30/38
30
Once More to Our Example
Fourth question: how about a confidence interval on our
estimate of the variance of the thickness of our parts, based onour 50 observations?
Confidence Intervals: Estimate of Variance
-
7/30/2019 MIT2 854F10 Stats
31/38
31
Confidence Intervals: Estimate of Variance
The appropriate sampling distribution is the Chi-square
Because2
is asymmetric, c.i. bounds not symmetric.
Example Contd
-
7/30/2019 MIT2 854F10 Stats
32/38
32
0 10 20 30 40 50 60 70 80 90 1000
0.0050.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
Example, Contd
Fourth question: for our example (where we observed
sT2 = 102.3) with n= 50 samples, what is the 95%confidence interval for the population variance?
Sampling: The F Distribution
-
7/30/2019 MIT2 854F10 Stats
33/38
33
Sampling: The F Distribution
Typical use: compare the spread of two populations
Example: x~ N( x,
2x) from which we sample x1, x2, , xn
y~ N( y,2
y) from which we sample y1, y2, , ym
Then
or
Concept of the F Distribution
-
7/30/2019 MIT2 854F10 Stats
34/38
34
Concept of the F Distribution
Assume we have a normally
distributed population We generate two different
random samples from the
population
In each case, we calculate a
sample variance si2 What range will the ratio of
these two variances take?
) F distribution
Purely by chance (due to
sampling) we get a range ofratios even though drawing
from same population
Example:
Assume x~ N(0,1)
Take samples of size n= 20
Calculate s12 and s2
2 and take ratio
95% confidence interval on ratio
Large range in ratio!
Hypothesis Testing
-
7/30/2019 MIT2 854F10 Stats
35/38
35
Hypothesis Testing
A statistical hypothesis is a statement about the parameters of a
probability distribution H0 is the null hypothesis
E.g.
Would indicate that the machine is working correctly
H1 is the alternative hypothesis
E.g. Indicates an undesirable change (mean shift) in the machine
operation (perhaps a worn tool)
In general, we formulate our hypothesis, generate a random
sample, compute a statistic, and then seek to reject H0 or fail toreject (accept) H0 based on probabilities associated with thestatistic and level of confidence we select
Which Population is Sample x From?
-
7/30/2019 MIT2 854F10 Stats
36/38
36
Which Population is Sample x From?
Two error probabilities in decision:
Type I error: false alarm
Type II error: miss
Power of test (correct alarm)
ConsiderH1 analarm condition
Set decision point (andsample size) based on
acceptable ,
ConsiderH0thenormal condition
risks Control charts are hypothesis tests:
Is my process in control or has a significant change occurred?
Summary
-
7/30/2019 MIT2 854F10 Stats
37/38
37
Summary1. Review: Probability Distributions & Random Variables
2. Sampling: Key distributions arising in sampling
Chi-square, t, and F distributions3. Estimation: Reasoning about the population based on a
sample
4. Some basic confidence intervals Estimate of mean with variance known
Estimate of mean with variance not known Estimate of variance
5. Hypothesis tests
Next Time:
1. Are effects (one or more variables) significant?) ANOVA (Analysis of Variance)2. How do we model the effectof some variable(s)?
) Regression modeling
MIT O C W
-
7/30/2019 MIT2 854F10 Stats
38/38
MIT OpenCourseWarehttp://ocw.mit.edu
2.854 / 2.853 Introduction to Manufacturing Systems
Fall 2010
For information about citing these materials or our Terms of Use, visit:http://ocw.mit.edu/terms.
http://ocw.mit.edu/http://ocw.mit.edu/termshttp://ocw.mit.edu/termshttp://ocw.mit.edu/termshttp://ocw.mit.edu/