Quantitative Introduction to Risk and Uncertainty in Business...
Hypothesis Testing
Hoeffding’s Inequalities
K-S Tests for Goodness of Fit
Student t Test
Chi-Squared Test
Quantitative Introduction to Risk and Uncertainty in Business
Module 5: Hypothesis Testing
M. Vidyasagar
Cecil & Ida Green Chair
The University of Texas at Dallas
Email: [email protected]
October 13, 2012
M. Vidyasagar Hypothesis Testing
Outline
1 Hypothesis Testing
2 Hoeffding’s Inequalities
3 K-S Tests for Goodness of Fit
K-S (Kolmogorov-Smirnov) Tests: Objectives
Kolmogorov-Smirnov Tests: Statements
4 Student t Test
5 Chi-Squared Test
Hypothesis Testing: Basic Idea
‘Null’ hypothesis: What we believe in the absence of further evidence, e.g. that a coin is ‘fair’, with heads and tails equally likely.
Think: Null hypothesis = default assumption.
Two kinds of testing:
First kind: There is only the null hypothesis, and we accept or reject it.
Second kind: There is a null as well as an alternative hypothesis, and we choose one or the other.
The second kind of testing is easier: We choose whichever hypothesis makes the observed data more likely.
The first kind of testing is harder.
Choosing Between Alternatives: Example
We are given a coin. The null hypothesis is that the coin is ‘fair’, with equal probabilities of heads and tails. Call it $H_0$.
The alternative hypothesis is that the coin is ‘biased’, with the probability of heads equal to 0.7. Call it $H_1$.
Suppose we toss the coin 20 times and 12 heads result. Which hypothesis should we accept?
Choosing Between Alternatives: Example (Cont’d)
Let $n = 20$ (number of coin tosses), $k = 12$ (number of heads), $p_0 = 0.5$ (probability of heads under hypothesis $H_0$) and $p_1 = 0.7$ (probability of heads under hypothesis $H_1$).
The likelihood of the observed outcome under each hypothesis is computed:
$$L_0 = \binom{20}{12} p_0^{12} (1 - p_0)^{8} = 0.1201,$$
$$L_1 = \binom{20}{12} p_1^{12} (1 - p_1)^{8} = 0.1144.$$
So we accept hypothesis $H_0$, that the coin is fair, but only because the alternative hypothesis is even less likely!
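The two likelihoods above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binomial_likelihood` is chosen here for illustration and does not come from the slides.

```python
from math import comb

def binomial_likelihood(n: int, k: int, p: float) -> float:
    """Probability of exactly k heads in n independent tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 20, 12
L0 = binomial_likelihood(n, k, 0.5)  # likelihood under H0 (fair coin)
L1 = binomial_likelihood(n, k, 0.7)  # likelihood under H1 (biased coin)

# Choose whichever hypothesis makes the observed outcome more likely.
chosen = "H0" if L0 >= L1 else "H1"
print(f"L0 = {L0:.4f}, L1 = {L1:.4f}, choose {chosen}")
```

The margin between the two likelihoods is small, which is the point of the slide: the fair-coin hypothesis wins only narrowly.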
Connection to MLE
We choose the hypothesis that the coin is fair only because the alternative hypothesis is even more unlikely!
So what is the value of $p$ that maximizes
$$L = \binom{20}{12} p^{12} (1 - p)^{8}?$$
Answer: $p_{MLE} = 12/20 = 0.6$, the fraction of heads observed.
With MLE (maximum likelihood estimation), we need not choose between two competing hypotheses: MLE gives the most likely values for the parameters!
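As a rough numerical check of the MLE claim, one can evaluate the likelihood on a grid of candidate values of p and pick the largest. This is only a sketch: the grid search and its resolution of 0.001 are choices made here, not part of the slides, which obtain the answer analytically.

```python
from math import comb

def likelihood(p: float, n: int = 20, k: int = 12) -> float:
    """Likelihood of k heads in n tosses as a function of p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Candidate values 0.000, 0.001, ..., 1.000 (grid resolution is an assumption).
grid = [i / 1000 for i in range(1001)]
p_best = max(grid, key=likelihood)

print(f"grid maximizer = {p_best:.3f}, likelihood there = {likelihood(p_best):.4f}")
```

The maximizer coincides with the observed fraction of heads, and its likelihood exceeds that of either hypothesis considered earlier.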
Estimating Probabilities of Binary Outcomes
Suppose an event has only two outcomes, e.g. a coin toss. Let $p$ equal the true but unknown probability of ‘success’, e.g. that the coin comes up heads.
After $n$ trials, suppose $k$ successes result. Then $\hat{p} := k/n$ is called the empirical probability of success. As we have seen, it is also the maximum likelihood estimate of $p$.
Question: How close is the empirical probability $\hat{p}$ to the true but unknown probability $p$?
Hoeffding’s inequalities answer this question.
Hoeffding’s Inequalities: Statements
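Let $\epsilon > 0$ be any specified accuracy. Then
$$\Pr\{\hat{p} - p \geq \epsilon\} \leq \exp(-2n\epsilon^2),$$
$$\Pr\{\hat{p} - p \leq -\epsilon\} \leq \exp(-2n\epsilon^2),$$
$$\Pr\{|\hat{p} - p| \leq \epsilon\} \geq 1 - 2\exp(-2n\epsilon^2).$$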
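A small simulation can make the first inequality concrete. The sketch below is illustrative only: the parameter values (p = 0.5, n = 100, accuracy 0.1, 2000 repetitions) and the fixed random seed are assumptions chosen here, and the helper `tail_frequency` is a hypothetical name.

```python
import random
from math import exp

def tail_frequency(p, n, eps, trials, rng):
    """Fraction of simulated experiments in which p_hat - p >= eps."""
    hits = 0
    for _ in range(trials):
        k = sum(1 for _ in range(n) if rng.random() < p)  # successes in n trials
        # Small tolerance so that rounding error does not drop the boundary case.
        if k / n - p >= eps - 1e-12:
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed so that the run is repeatable
p, n, eps = 0.5, 100, 0.1
freq = tail_frequency(p, n, eps, trials=2000, rng=rng)
bound = exp(-2 * n * eps**2)
print(f"simulated frequency = {freq:.4f}, Hoeffding bound = {bound:.4f}")
```

The simulated frequency should fall well below the bound, since Hoeffding's inequality holds for every p and is therefore conservative for any particular one.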
Hoeffding’s Inequalities: Interpretation
Interpretations of Hoeffding’s inequalities:
With confidence $1 - 2\exp(-2n\epsilon^2)$, we can say that the true but unknown probability $p$ lies in the interval $[\hat{p} - \epsilon, \hat{p} + \epsilon]$. As we increase $\epsilon$ (or $n$), the term $\delta := 2\exp(-2n\epsilon^2)$ decreases, and we can be more sure of our interval.
The widely used 95% confidence interval corresponds to $\delta = 0.05$.
The one-sided inequalities have similar interpretations.
An Example of Applying Hoeffding’s Inequality
Suppose we toss a coin 1000 times and it comes up heads 552 times. How sure can we be that the coin is biased?
$n = 1000$, $k = 552$, $\hat{p} = 0.552$. If $p > 0.5$ then we can say that the coin is biased. So let $\epsilon = \hat{p} - 0.5 = 0.052$, the gap between the empirical probability and the fair value. Compute
$$\delta = \exp(-2n\epsilon^2) = 0.0045.$$
So with confidence $1 - \delta = 0.9955$, we can say that $p > 0.5$. In other words, we can be 99.55% sure that the coin is biased. Using the two-sided Hoeffding inequality, we can be 99.1% sure that $p \in (0.5, 0.604)$.
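The arithmetic in this example can be reproduced as follows. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from math import exp

n, k = 1000, 552
p_hat = k / n
eps = p_hat - 0.5                       # distance from the fair value 0.5

delta_one = exp(-2 * n * eps**2)        # one-sided Hoeffding bound
delta_two = 2 * delta_one               # two-sided Hoeffding bound
interval = (p_hat - eps, p_hat + eps)   # two-sided interval for p

print(f"one-sided: delta = {delta_one:.4f}, confidence = {1 - delta_one:.4f}")
print(f"two-sided: delta = {delta_two:.4f}, confidence = {1 - delta_two:.4f}")
print(f"interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```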
Another Example
An opinion poll of 750 voters (ignoring ‘don’t know’s) shows that 387 will vote for candidate A and 363 will vote for candidate B. How sure can we be that candidate A will win?
Let $p$ denote the true but unknown fraction of voters who will vote for A, and $\hat{p} = 387/750 = 0.5160$ denote the empirical estimate of $p$. If $p < 0.5$ then A will lose. So the accuracy is $\epsilon = \hat{p} - 0.5 = 0.0160$, and the number of samples is $n = 750$. The one-sided bound is
$$\delta = \exp(-2n\epsilon^2) = 0.6811.$$
So we can be only $1 - \delta \approx 32\%$ sure that A will win. In other words, the election cannot be ‘called’ with any confidence based on such a small margin of preference.
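The same computation for the poll is sketched below. The last step goes slightly beyond the slide: it inverts the one-sided bound to ask how large a poll would be needed to reach 95% confidence, under the assumption (made here only for illustration) that the same 1.6-point margin were observed in the larger poll.

```python
from math import ceil, exp, log

n, votes_a = 750, 387
p_hat = votes_a / n
eps = p_hat - 0.5                  # margin over the 50% line

delta = exp(-2 * n * eps**2)       # one-sided Hoeffding bound
confidence = 1 - delta

# Smallest n with exp(-2 n eps^2) <= target_delta, i.e. n >= log(1/target_delta) / (2 eps^2).
target_delta = 0.05
n_needed = ceil(log(1 / target_delta) / (2 * eps**2))

print(f"delta = {delta:.4f}, confidence = {confidence:.2%}")
print(f"samples needed for 95% confidence at the same margin: {n_needed}")
```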
Relating Confidence, Accuracy and Number of Samples
For the two-sided Hoeffding inequality, the quantity $\delta$ associated with $n$ samples and accuracy $\epsilon$ is given by
$$\delta = 2\exp(-2n\epsilon^2),$$
and the corresponding confidence is $1 - \delta$.
We can turn this around and ask: Given an empirical estimate $\hat{p}$ based on $n$ samples, what is the accuracy corresponding to a given $\delta$?
Solving the above equation for $\epsilon$ in terms of $\delta$ and $n$ gives
$$\epsilon(n, \delta) = \left( \frac{1}{2n} \log \frac{2}{\delta} \right)^{1/2}.$$
So with confidence $1 - \delta$ we can say that the true but unknown probability $p$ is in the interval $[\hat{p} - \epsilon(n, \delta), \hat{p} + \epsilon(n, \delta)]$.
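The accuracy formula is easy to wrap in a helper. The sketch below uses the natural logarithm, as the derivation requires; the function name `accuracy` and the sample values n = 1000, δ = 0.05 are illustrative choices, not taken from the slides.

```python
from math import exp, log, sqrt

def accuracy(n: int, delta: float) -> float:
    """Half-width eps(n, delta) of the two-sided Hoeffding interval."""
    return sqrt(log(2 / delta) / (2 * n))

n, delta = 1000, 0.05
eps = accuracy(n, delta)

# Round trip: plugging eps back into the two-sided bound should recover delta.
delta_back = 2 * exp(-2 * n * eps**2)

print(f"eps({n}, {delta}) = {eps:.4f}, recovered delta = {delta_back:.4f}")
```

Because the half-width shrinks like the inverse square root of n, halving the interval width requires roughly four times as many samples.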
Hoeffding’s Inequalities for More Than Two Outcomes
Suppose a random experiment has more than two possible outcomes (e.g. rolling a six-sided die). Say there are $k$ outcomes, and in $n$ trials, the $i$-th outcome appears $n_i$ times (and of course $\sum_{i=1}^{k} n_i = n$).
We can define
$$\hat{p}_i = \frac{n_i}{n}, \quad i = 1, \dots, k,$$
and as we have seen, these are the maximum likelihood estimates for each probability.
Question: How good are these estimates?
More Than Two Outcomes – 2
Fact: For any sample size $n$ and any accuracy $\epsilon$, it is the case that
$$\Pr\{\max_i |\hat{p}_i - p_i| > \epsilon\} \leq 2k\exp(-2n\epsilon^2).$$
So with confidence $1 - 2k\exp(-2n\epsilon^2)$, we can assert that every empirical probability $\hat{p}_i$ is within $\epsilon$ of the correct value.
More Than Two Outcomes: Example
Suppose we roll a six-sided die 1,000 times and observe the outcomes 1 through 6 with the following empirical frequencies:
$$\hat{p}_1 = 0.169, \ \hat{p}_2 = 0.165, \ \hat{p}_3 = 0.166, \ \hat{p}_4 = 0.165, \ \hat{p}_5 = 0.167, \ \hat{p}_6 = 0.168.$$
With what confidence can we say that the die is not fair, that is, that $p_i \neq 1/6$ for some $i$?
More Than Two Outcomes: Example (Cont’d)
Suppose that indeed the true probability is $p_i = 1/6$ for all $i$. Then
$$\max_i |\hat{p}_i - p_i| = |\hat{p}_1 - 1/6| \approx 0.0023.$$
Take $\epsilon = 0.0023$, $n = 1000$ and compute
$$\delta = 6 \times 2\exp(-2n\epsilon^2) \approx 11.87!$$
How can a ‘probability’ be greater than one?
Note: This $\delta$ is just an upper bound for $\Pr\{\max_i |\hat{p}_i - p_i| > \epsilon\}$; so it can be larger than one.
So we cannot rule out the possibility that the die is fair (which is quite different from saying that it is fair).
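The die example can be reproduced in a few lines. This is a sketch with illustrative variable names; the final line clips the bound at one, since any probability is at most one and a bound above that value carries no information.

```python
from math import exp

p_hat = [0.169, 0.165, 0.166, 0.165, 0.167, 0.168]  # empirical frequencies from the slide
n = 1000
k = len(p_hat)

# Largest deviation from the fair value 1/6.
eps = max(abs(p - 1 / 6) for p in p_hat)

bound = 2 * k * exp(-2 * n * eps**2)   # multi-outcome Hoeffding bound
useful_bound = min(1.0, bound)         # a probability can never exceed one

print(f"eps = {eps:.4f}, bound = {bound:.2f}, clipped bound = {useful_bound:.2f}")
```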
K-S Tests: Problem Formulations
There are two widely used tests. They should be called the Kolmogorov test and the Smirnov test, respectively. Unfortunately the erroneous names ‘one-sample K-S test’ and ‘two-sample K-S test’ have become popular.
Kolmogorov Test, or One-Sample K-S Test: We have a set of samples, and we have a candidate probability distribution. Question: How well does the distribution fit the set of samples?
Smirnov Test, or Two-Sample K-S Test: We have two sets of samples, say $x_1, \dots, x_n$ and $y_1, \dots, y_m$. Question: How sure are we that both sets of samples came from the same (but unknown) distribution?
Empirical Distributions
Suppose $X$ is a random variable for which we have generated $n$ i.i.d. samples, call them $x_1, \dots, x_n$.
Then we define the empirical distribution of $X$, based on these observations, as follows:
$$\hat{\Phi}(a) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i \leq a\}},$$
where $I$ denotes the indicator function: $I = 1$ if the condition in the subscript is satisfied and $I = 0$ otherwise.
So in this case $\hat{\Phi}(a)$ is just the fraction of the $n$ samples that are $\leq a$. The diagram on the next slide illustrates this.
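The definition translates directly into code. The sketch below uses a small made-up sample, given deliberately out of order, to show that the ordering of the samples does not matter; both the data and the function name are hypothetical.

```python
def empirical_cdf(samples, a):
    """Fraction of the samples that are <= a."""
    return sum(1 for x in samples if x <= a) / len(samples)

samples = [2.0, 0.5, 1.5, 3.0]  # hypothetical data, not sorted

for a in (0.0, 1.5, 2.5, 3.0):
    print(f"Phi_hat({a}) = {empirical_cdf(samples, a)}")
```

The resulting function is a staircase that jumps by 1/n at each sample value.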
Empirical Distribution Depicted
Note: The diagram shows the samples occurring in increasing order, but they can be in any order.
Source: http://www.aiaccess.net/English/Glossaries/GlosMod/e gm distribution function.htm
Glivenko-Cantelli Lemma
Theorem: As $n \to \infty$, the empirical distribution $\hat{\Phi}(\cdot)$ approaches the true distribution $\Phi(\cdot)$.
Specifically, if we define the Kolmogorov-Smirnov distance
$$d_n = \max_u |\hat{\Phi}(u) - \Phi(u)|,$$
then $d_n \to 0$ as $n \to \infty$.
At what rate does the convergence take place?
One-Sample Kolmogorov-Smirnov Statistic
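Fix a ‘confidence level’ $\delta > 0$ (usually $\delta$ is taken as 0.05 or 0.02). Define the threshold
$$\theta(n, \delta) = \left( \frac{1}{2n} \log \frac{2}{\delta} \right)^{1/2}.$$
Then with probability at least $1 - \delta$, we can say that
$$\max_u |\hat{\Phi}(u) - \Phi(u)| =: d_n \leq \theta(n, \delta).$$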
One-Sample Kolmogorov-Smirnov Test
Given samples $x_1, \dots, x_n$, fit them with some distribution $F(\cdot)$ (e.g. Gaussian). Compute the K-S statistic
$$d_n = \max_u |\hat{\Phi}(u) - F(u)|.$$
Compare $d_n$ with the threshold $\theta(n, \delta)$. If $d_n > \theta(n, \delta)$, we ‘reject the null hypothesis’ at level $\delta$. In other words, if $d_n > \theta(n, \delta)$, then we are $1 - \delta$ sure that the data were not generated by the distribution $F(\cdot)$.
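The test can be sketched with the standard library alone. Several things here are assumptions made for illustration: the sample values are made up, the candidate distribution is fixed in advance as a standard Gaussian rather than fitted, and the function names are hypothetical. For a continuous candidate distribution, the largest gap occurs at a jump of the empirical distribution, so it suffices to compare just before and just after each sample point.

```python
from math import erf, log, sqrt

def normal_cdf(u, mu=0.0, sigma=1.0):
    """Distribution function of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((u - mu) / (sigma * sqrt(2.0))))

def ks_statistic(samples, cdf):
    """Largest gap between the empirical distribution and cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Empirical value is i/n just after the jump and (i-1)/n just before it.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_threshold(n, delta):
    """Threshold theta(n, delta) from the slides."""
    return sqrt(log(2.0 / delta) / (2.0 * n))

samples = [-1.2, -0.8, -0.3, 0.0, 0.1, 0.4, 0.9, 1.5]  # hypothetical data
d_n = ks_statistic(samples, normal_cdf)
theta = ks_threshold(len(samples), delta=0.05)
reject = d_n > theta

print(f"d_n = {d_n:.4f}, theta = {theta:.4f}, reject null: {reject}")
```

With only eight samples the threshold is large, so the test has little power to reject; the threshold shrinks like the inverse square root of n.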
Student t Test: Motivation
The Student t test is used to test the null hypothesis that two sets of samples have the same mean, assuming that they have the same variance. The test has broad applicability even if the assumption of ‘same variance’ is not satisfied.
Problem: We are given two sets of samples $x_1, \dots, x_{m_1}$ and $x_{m_1+1}, \dots, x_{m_1+m_2}$. Determine whether the two sets of samples arise from distributions with the same mean.
Application: Most commonly used in quality control.
Student t Test: Theory
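Let $\bar{x}_1, \bar{x}_2$ denote the means of the two sample classes, that is,
$$\bar{x}_1 = \frac{1}{m_1} \sum_{i=1}^{m_1} x_i, \quad \bar{x}_2 = \frac{1}{m_2} \sum_{i=1}^{m_2} x_{m_1+i}.$$
Let $S_1^2, S_2^2$ denote the unbiased estimates of the variances of the two samples (so that $S_1, S_2$ estimate the standard deviations), that is,
$$S_1^2 = \frac{1}{m_1 - 1} \sum_{i=1}^{m_1} (x_i - \bar{x}_1)^2,$$
$$S_2^2 = \frac{1}{m_2 - 1} \sum_{i=1}^{m_2} (x_{m_1+i} - \bar{x}_2)^2.$$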
Student t Test: Theory – 2
Now define the ‘pooled’ standard deviation $S_{12}$ by
$$S_{12}^2 = \frac{(m_1 - 1) S_1^2 + (m_2 - 1) S_2^2}{m_1 + m_2 - 2}.$$
Then, under the null hypothesis, the quantity
$$d_t = \frac{\bar{x}_1 - \bar{x}_2}{S_{12} \sqrt{(1/m_1) + (1/m_2)}}$$
follows the t distribution with $m_1 + m_2 - 2$ ‘degrees of freedom.’
As the number of d.o.f. becomes large, the t distribution approaches the normal distribution. The next slide shows the density of the t distribution for various d.o.f.
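The test statistic can be computed as sketched below. The two small samples are made up for illustration, and the function name is hypothetical; `statistics.variance` is used because it divides by the sample size minus one, matching the unbiased estimates defined above.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(first, second):
    """Return (d_t, degrees of freedom) for the pooled two-sample t test."""
    m1, m2 = len(first), len(second)
    s1_sq = variance(first)    # unbiased: divides by m1 - 1
    s2_sq = variance(second)   # unbiased: divides by m2 - 1
    s12 = sqrt(((m1 - 1) * s1_sq + (m2 - 1) * s2_sq) / (m1 + m2 - 2))
    d_t = (mean(first) - mean(second)) / (s12 * sqrt(1 / m1 + 1 / m2))
    return d_t, m1 + m2 - 2

first = [5.1, 4.9, 5.0, 5.2, 4.8]   # hypothetical measurements
second = [5.4, 5.6, 5.5, 5.3]       # hypothetical measurements

d_t, dof = pooled_t_statistic(first, second)
print(f"d_t = {d_t:.4f} with {dof} degrees of freedom")
```

To complete the test, the value of d_t would be compared against quantiles of the t distribution with the stated degrees of freedom, which requires a table or a statistics library.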
Density of the t Distribution
Chi-Squared Test: Motivation
The t test is to determine whether two samples have the same mean. The chi-squared test is to determine whether two samples have the same variance.
The application is again to quality control.
Chi-Squared Test: Theory
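Given two sets of samples, say $x_1, \dots, x_{m_1}$ and $x_{m_1+1}, \dots, x_{m_1+m_2}$ (where usually $m_2 \ll m_1$), compute the unbiased variance estimate $V_1$ of the larger (first) sample
$$V_1 = \frac{1}{m_1 - 1} \sum_{i=1}^{m_1} (x_i - \bar{x}_1)^2,$$
and the sum of squares of the smaller (second) sample
$$S_2 = \sum_{i=1}^{m_2} (x_{m_1+i} - \bar{x}_2)^2 = (m_2 - 1) V_2,$$
where $V_2$ is the unbiased variance estimate of the second sample.
Then, under the null hypothesis, the ratio $S_2 / V_1$ follows (approximately, when $m_1$ is large) the chi-squared (or $\chi^2$) distribution with $m_2 - 1$ degrees of freedom.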
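The ratio can be computed as sketched below. The data are made up for illustration (and the first sample is far smaller than one would want in practice), and the function name is hypothetical.

```python
from statistics import mean, variance

def chi_squared_statistic(large, small):
    """Return (S2 / V1, degrees of freedom) for the variance comparison."""
    v1 = variance(large)                       # unbiased variance of the larger sample
    centre = mean(small)
    s2 = sum((x - centre) ** 2 for x in small)  # sum of squares of the smaller sample
    return s2 / v1, len(small) - 1

large = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8]  # hypothetical data
small = [10.3, 9.7, 10.0, 10.2]                                   # hypothetical data

ratio, dof = chi_squared_statistic(large, small)
print(f"S2 / V1 = {ratio:.3f} with {dof} degrees of freedom")
```

Deciding whether this value is unusually large or small requires quantiles of the chi-squared distribution with the stated degrees of freedom, taken from a table or a statistics library.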
Distribution Function of the Chi-Squared Variable
Density Function of the Chi-Squared Variable
Application of the Chi-Squared Test
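Note that the $\chi^2$ r.v. is always nonnegative. So, given some confidence $\delta$ (usually $\delta = 0.05$), we need to determine a confidence interval
$$x_l = \Phi^{-1}_{\chi^2, m_2 - 1}(\delta), \quad x_u = \Phi^{-1}_{\chi^2, m_2 - 1}(1 - \delta),$$
where $\Phi_{\chi^2, m_2 - 1}$ denotes the distribution function of the $\chi^2$ variable with $m_2 - 1$ degrees of freedom.
If the test statistic $S_2 / V_1$ lies in the interval $[x_l, x_u]$, then we accept the null hypothesis that both samples have the same variance.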