Statistical inference: Hypothesis Testing and t-tests
Uploaded by eugene-yan
Transcript of Statistical inference: Hypothesis Testing and t-tests
Central Limit Theorem
What is the mean height ($\mu$) of all primary school children in Singapore?
Population = All primary school children in SG
Sample = Anderson Primary
Sample = Damai Primary
Sample = Red Swastika Primary
Sample = Zhenghua Primary
$\bar{x}_{\text{Anderson Primary}}$ = Mean height of 100 children from Anderson Primary
$\bar{x}_{\text{Damai Primary}}$ = Mean height of 100 children from Damai Primary
$\bar{x}_{\text{Red Swastika Primary}}$ = Mean height of 100 children from Red Swastika Primary
$\bar{x}_{\text{Zhenghua Primary}}$ = Mean height of 100 children from Zhenghua Primary
Distribution of mean height ~ $N(\text{mean} = \mu,\ \text{standard error} = \sigma/\sqrt{100})$
From the sampling distribution:
- $\text{Mean}(\bar{x}) \approx \mu$
- $\text{SD}(\bar{x}) < \sigma$
- As sample size increases, $\text{SD}(\bar{x})$ decreases
Central Limit Theorem (CLT)
The distribution of sample statistics (e.g., the mean) is approximately normal, regardless of the underlying distribution, with mean = $\mu$ and variance = $\sigma^2/n$
$\bar{x} \sim N(\text{mean} = \mu,\ \text{standard error} = \sigma/\sqrt{n})$
Further experimentation: http://bitly.com/clt_mean
- The distribution of sample means is approximately normal
- Mean of the sample means = population mean
- SD of the sample means (the standard error) = population sd divided by the square root of the sample size
Applet source: Mine Çetinkaya-Rundel, Duke University
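The CLT claims above can also be checked with a small simulation. This is a minimal sketch using only the Python standard library; the skewed exponential population, the seed, and the helper name `simulate_sample_means` are assumptions made for illustration, not part of the original slides or the applet.

```python
import math
import random
import statistics

def simulate_sample_means(population, n, draws, rng):
    """Draw `draws` random samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical, right-skewed population (exponential); not real height data.
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

results = {}
for n in (10, 100):
    means = simulate_sample_means(population, n, draws=2000, rng=rng)
    # (mean of sample means, sd of sample means, theoretical standard error)
    results[n] = (statistics.fmean(means), statistics.stdev(means), sigma / math.sqrt(n))
    print(f"n={n}: mean of sample means={results[n][0]:.3f}, "
          f"sd of sample means={results[n][1]:.3f}, sigma/sqrt(n)={results[n][2]:.3f}")
```

Even though the population is skewed, the mean of the sample means should sit close to the population mean, and their sd should shrink roughly as $\sigma/\sqrt{n}$.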
Conditions for CLT
Independence: Sampled observations must be independent
- Random sample/assignment
- If sampling without replacement, n < 10% of population
Sample Size/Skew:
- Population should be normal
- If not, sample size should be large (rule of thumb: n > 30)
Confidence Interval
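An interval estimate of a population parameter
- Computed as sample mean +/- a margin of error: $\bar{x} \pm z \times SE$, where $SE = \sigma/\sqrt{n}$
- A 95% confidence interval is $\bar{x} \pm 1.96 \times \sigma/\sqrt{n}$ (roughly $\bar{x} \pm 2\,SE$); about 95% of intervals built this way from repeated random samples would contain the true population mean
CLT: $\bar{x} \sim N(\text{mean} = \mu,\ \text{standard error} = \sigma/\sqrt{n})$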
Confidence Interval
You have taken a random sample of 100 primary school children in Singapore. Their heights had mean = 150cm and sd = 10cm. Estimate the true average height of primary school children based on this sample using a 95% confidence interval.
Confidence Interval: $\bar{x} \pm z \times SE$
$n = 100$, $\bar{x} = 150$, $sd = 10$
$SE = \frac{sd}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$
$\bar{x} \pm z \times SE = 150 \pm 1.96 \times 1 = 150 \pm 1.96 = (148.04, 151.96)$
We are 95% confident that the mean height of primary school children is between 148.04cm and 151.96cm
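The same interval can be reproduced in a few lines of code. This is a sketch using the standard library's `statistics.NormalDist`; the function name `z_confidence_interval` is a hypothetical helper, and the sample sd is used in place of $\sigma$, as in the worked example.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (low, high) for mean +/- z * SE, using the normal distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = sd / sqrt(n)                                   # standard error of the mean
    margin = z * se
    return mean - margin, mean + margin

low, high = z_confidence_interval(mean=150, sd=10, n=100)
print(f"{low:.2f}, {high:.2f}")  # 148.04, 151.96
```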
Required sample size for margin of error
Given a target margin of error and confidence level, and information on the standard deviation of sample (or population), we can work backwards to determine the required sample size.
Previous measurements of primary school children's heights show sd = 15cm. What sample size is needed to get a 95% confidence interval with a margin of error less than or equal to 1cm?
Margin of error: $\le 1$cm; Confidence level: 95%, so $z = 1.96$; $sd = 15$
$ME = z \times SE$
$1 = 1.96 \times \frac{15}{\sqrt{n}}$
$n = \left(\frac{1.96 \times 15}{1}\right)^2 = (29.4)^2 = 864.36$
Thus, we need a sample size of at least 865 primary school children
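Working backwards from the margin of error is easy to script. This is a minimal sketch; `required_sample_size` is a hypothetical helper name, and it uses the exact normal quantile rather than the rounded 1.96, which gives the same answer here.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sd, margin_of_error, confidence=0.95):
    """Smallest n such that z * sd / sqrt(n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fractional child cannot be sampled.
    return ceil((z * sd / margin_of_error) ** 2)

print(required_sample_size(sd=15, margin_of_error=1))  # 865
```

Halving the margin of error roughly quadruples the required sample size, because n grows with the square of 1/ME.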
Hypothesis Testing
Null hypothesis ($H_0$)
- The status quo that is assumed to be true
Alternative hypothesis ($H_A$)
- An alternative claim under consideration; statistical evidence is required to accept it and thus reject the null hypothesis
We consider $H_0$ to be true and retain it unless the evidence in favour of $H_A$ is so strong that we reject $H_0$ in favour of $H_A$.
Hypothesis Testing
Earlier, we found the sample of 100 primary school children had mean height = 150cm and sd = 10cm. Based on these statistics, do the data support the hypothesis that primary school children are on average shorter than 151cm?
$H_0: \mu = 151$ # primary school students have mean height = 151
$H_A: \mu < 151$ # primary school students have mean height < 151
P-value
Probability of obtaining the observed result, or results that are more "extreme", given that the null hypothesis is true
- P(observed or more extreme outcome | $H_0$ is true)
- If the p-value is low (i.e., lower than the significance level $\alpha$, usually 5%), we say the observed data would be very unlikely if the null hypothesis were true, and reject $H_0$
- If the p-value is high (i.e., higher than $\alpha$), we say the observed data would be plausible even if the null hypothesis were true, and thus do not reject $H_0$
Hypothesis Testing and P-value
Recall that the sample of 100 primary school children had mean height = 150cm and sd = 10cm. Also take significance level = 0.05
$\bar{x} = 150$cm; $sd = 10$cm; $SE = \frac{10}{\sqrt{100}} = 1$ # what we know from the sample
$\bar{x} \sim N(\mu = 151, SE = 1)$ # null hypothesis of the population
Test statistic: $Z = \frac{150 - 151}{1} = -1$
P-value: $P(Z < -1) = 1 - 0.8413 = 0.1587$
Since the p-value is higher than 0.05, we do not reject $H_0$
[Figure: normal curve centred at $\mu = 151$; the shaded area below 150 is 0.1587]
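The test statistic and p-value can be computed directly instead of being read from a z-table. This is a sketch with the standard library; `one_sided_z_test` is a hypothetical helper for the lower-tailed alternative ($H_A: \mu < \mu_0$) used in this example.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, null_mean, sd, n):
    """Lower-tailed z-test: returns (z, p-value) for H_A: mu < null_mean."""
    se = sd / sqrt(n)
    z = (sample_mean - null_mean) / se
    p_value = NormalDist().cdf(z)  # P(Z <= z) under the null hypothesis
    return z, p_value

z, p = one_sided_z_test(sample_mean=150, null_mean=151, sd=10, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = -1.00, p = 0.1587
print("reject H0" if p < 0.05 else "do not reject H0")
```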
Hypothesis Testing and P-value
Interpreting the p-value
- If in fact primary school children have a mean height of 151cm, there is a 15.9% chance that a random sample of 100 children would yield a sample mean of 150cm or lower
- This is a fairly high probability (well above the 5% significance level)
- Thus, a sample mean of 150cm could plausibly have occurred by chance
Two-sided Hypothesis Testing
Do the data support the hypothesis that the children's mean height is different from 151cm?
$H_0: \mu = 151$ # primary school students have mean height = 151
$H_A: \mu \ne 151$ # primary school students have mean height $\ne$ 151
P-value: $P(Z < -1) + P(Z > 1) = 2 \times (1 - 0.8413) = 0.3174$
[Figure: normal curve centred at $\mu = 151$; the shaded areas below 150 and above 152 are 0.1587 each]
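For the two-sided case, the p-value doubles the area in one tail. This is a minimal sketch, with `two_sided_z_test` as a hypothetical helper name. The unrounded result is 0.3173; the 0.3174 on the slide comes from doubling the table value 0.1587.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, null_mean, sd, n):
    """Two-sided z-test: returns (z, p-value) for H_A: mu != null_mean."""
    se = sd / sqrt(n)
    z = (sample_mean - null_mean) / se
    # Area in both tails beyond |z|, using the symmetry of the normal curve.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z2, p2 = two_sided_z_test(sample_mean=150, null_mean=151, sd=10, n=100)
print(f"z = {z2:.2f}, p = {p2:.4f}")  # z = -1.00, p = 0.3173
```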
Hypothesis Testing and Confidence Intervals
If the confidence interval contains the null value, don't reject $H_0$. If the confidence interval does not contain the null value, reject $H_0$.
- Previously, we found the 95% confidence interval for the mean height of primary school children to be (148.04, 151.96), or roughly (148, 152). Given that our null value ($\mu = 151$cm) falls within this 95% CI, we do not reject $H_0$.
A two-sided hypothesis test with significance level $\alpha$ is equivalent to a confidence interval with $CL = 1 - \alpha$
A one-sided hypothesis test with significance level $\alpha$ is equivalent to a confidence interval with $CL = 1 - 2\alpha$
[Figure: interval from 148cm to 152cm; we are 95% confident that the average height is between 148 and 152cm]
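The equivalence between a two-sided test at level $\alpha$ and a $1 - \alpha$ confidence interval can be checked numerically. This sketch uses a hypothetical helper, `ci_and_test`, and tries the slide's null value (151) plus a second, made-up null value (153) that lies outside the interval.

```python
from math import sqrt
from statistics import NormalDist

def ci_and_test(sample_mean, null_mean, sd, n, alpha=0.05):
    """Return (null value inside the (1 - alpha) CI?, two-sided test rejects H0?)."""
    dist = NormalDist()
    se = sd / sqrt(n)
    z_crit = dist.inv_cdf(1 - alpha / 2)
    low, high = sample_mean - z_crit * se, sample_mean + z_crit * se
    p_value = 2 * dist.cdf(-abs((sample_mean - null_mean) / se))
    return low <= null_mean <= high, p_value < alpha

for null_mean in (151, 153):  # 153 is an illustrative value, not from the slides
    inside, reject = ci_and_test(150, null_mean, sd=10, n=100)
    print(f"null = {null_mean}: inside CI = {inside}, reject H0 = {reject}")
```

In both cases the two answers are opposites: the test rejects exactly when the null value falls outside the interval.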
Decision Errors
Which error is worse to commit (in a research/business context)?
- Type II: Declaring the defendant innocent when they are actually guilty
- Type I: Declaring the defendant guilty when they are actually innocent
"Better that ten guilty persons escape than that one innocent suffer" - William Blackstone

                 Fail to reject $H_0$    Reject $H_0$
$H_0$ is True    Correct decision        Type I error
$H_0$ is False   Type II error           Correct decision
Type I Error rate
We reject $H_0$ when the p-value is less than 0.05 ($\alpha = 0.05$)
- I.e., should $H_0$ actually be true, we do not want to incorrectly reject it more than 5% of the time
- Thus, using a 0.05 significance level is equivalent to having a 5% chance of making a Type I error when $H_0$ is true
Choosing significance levels
- If a Type I error is costly, we choose a lower significance level (e.g., 0.01)
  - E.g., spam filtering
- If a Type II error is costly, we choose a higher significance level (e.g., 0.10)
  - E.g., airport baggage screening

                 Fail to reject $H_0$       Reject $H_0$
$H_0$ is True    Correct decision           Type I error ($\alpha$)
$H_0$ is False   Type II error ($\beta$)    Correct decision
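The link between the significance level and the Type I error rate can be seen by simulation: generate many samples from a population where $H_0$ really is true, and count how often the test rejects. This is a sketch only; the normal population with mean 151 and sd 10, the trial count, the seed, and the name `type_i_error_rate` are all assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(alpha, trials=4000, n=100, mu=151.0, sigma=10.0, seed=1):
    """Share of two-sided z-tests that reject H0: mean = mu when H0 is in fact true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Sample drawn from a population in which the null hypothesis holds.
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        p_value = 2 * std_normal.cdf(-abs((sample_mean - mu) / se))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print(f"alpha = 0.05 -> rejected a true H0 in {rate_05:.1%} of trials")
print(f"alpha = 0.01 -> rejected a true H0 in {rate_01:.1%} of trials")
```

The observed rejection rates should land near 5% and 1% respectively, which is what "a 5% chance of making a Type I error" means in practice.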
Student's t Distribution
According to the CLT, the distribution of sample statistics is approximately normal if either:
- The population is normal, or
- The sample size is large (n > 30)
If so, and the population sd ($\sigma$) is known, we can use it to compute a z-score
However, sample sizes are sometimes small and we often do not know the standard deviation of the population ($\sigma$)
- Thus, the normal distribution may not be appropriate
Thus, we rely on the t distribution
Shape of the t distribution
Bell shaped but with thicker tails than the normal
- Thus, observations are more likely to fall beyond 2 sd from the mean
- The thicker tails help adjust for the less reliable estimate of the standard deviation (when n is small and/or $\sigma$ is unknown)
Shape of the t distribution
Has one parameter, degrees of freedom (df), which determines the thickness of the tails
- df refers to the number of independent observations in the data set
- Number of independent observations = sample size minus 1
- E.g., in a sample of size 8, there are (8 - 1) = 7 degrees of freedom
What happens to the shape of the t distribution when df increases?
- It approaches the normal distribution
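The thicker tails, and how they thin out as df grows, can be illustrated without a t-table. The sketch below simulates the t statistic $(\bar{x} - \mu) / (s/\sqrt{n})$ for samples from a standard normal population and counts how often it falls beyond the normal cutoff of 1.96. The helper name `share_beyond`, the sample sizes, and the seed are assumptions for illustration.

```python
import random
from math import sqrt

def share_beyond(cutoff, n, trials=10_000, seed=2):
    """Fraction of simulated t statistics with |t| > cutoff, for samples of size n."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0
        mean = sum(sample) / n
        # Sample sd with n - 1 in the denominator (n - 1 degrees of freedom).
        s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = mean / (s / sqrt(n))
        if abs(t) > cutoff:
            count += 1
    return count / trials

share_small = share_beyond(1.96, n=5)   # df = 4
share_large = share_beyond(1.96, n=50)  # df = 49
print(f"n = 5:  {share_small:.1%} of t statistics beyond +/-1.96")
print(f"n = 50: {share_large:.1%} of t statistics beyond +/-1.96")
```

For a normal distribution about 5% of values lie beyond 1.96. With n = 5 the share should be clearly larger, and with n = 50 it should be much closer to 5%.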
When to use the t distribution
In general, we use the t distribution when:
- n is small (n < 30) and/or;
- $\sigma$ is unknown
However, nowadays, our sample sizes are usually above 30
- Thus, why bother with the t distribution?
- Because 95% of the world prefers the t distribution to the normal and you'll definitely encounter it eventually
- If you're unsure, use the t distribution, since it approaches the normal distribution as the sample size grows
Independent and Dependent t-tests
When to use independent and dependent t-tests?
- Dependent: when evaluating the effect between two related samples
  - You feed a group of 100 people fast food every day
  - Did they gain weight after 30 days?
- Independent: when evaluating the effect between two independent samples
  - You feed 50 males and 50 females fast food every day
  - Did males or females gain more weight after 30 days?
You conduct a study with two groups and have them exercise three times a day for 30 days (group A = crossfit, group B = yoga).
- How would you test the difference between crossfit and yoga participants?
- How would you test the difference in weight between day 0 and day 30 for yoga participants?
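To make the distinction concrete, here is a sketch of both test statistics using only the standard library. The weights are made-up numbers, the function names `paired_t` and `independent_t` are hypothetical, and the independent version assumes equal variances (the pooled Student's t-test). It returns the t statistic and degrees of freedom only; a p-value would need a t distribution, for example from a statistics package such as SciPy.

```python
from math import sqrt
from statistics import fmean, stdev, variance

def paired_t(before, after):
    """Dependent (paired) t-test: one-sample t on the per-person differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # (t statistic, degrees of freedom)

def independent_t(group_a, group_b):
    """Independent t-test with pooled variance (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (fmean(group_a) - fmean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical weights in kg, invented for illustration only.
yoga_day0 = [70.0, 82.5, 64.0, 91.0, 77.5]
yoga_day30 = [69.0, 81.0, 64.5, 88.5, 76.0]
crossfit_change = [-2.5, -1.0, -3.0, -0.5, -2.0]
yoga_change = [after - before for before, after in zip(yoga_day0, yoga_day30)]

print("yoga, day 0 vs day 30 (paired):", paired_t(yoga_day0, yoga_day30))
print("crossfit vs yoga change (independent):", independent_t(crossfit_change, yoga_change))
```

The paired test suits the day 0 versus day 30 question because the same people are measured twice. The independent test suits the crossfit versus yoga question because the two groups contain different people.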
Effect Size
When samples become large enough, you often get statistically significant results
- However, is the difference practically significant?
Effect size is a simple way to quantify the difference between two groups
- Emphasizes the size of the difference (without the effect of sample size)
- Cohen's d is one of the most common ways to measure effect size
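Effect size: $d = \frac{\bar{x}_1 - \bar{x}_2}{SD_{pooled}}$
Proper calculation for $SD_{pooled}$: $\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
Simple calculation for $SD_{pooled}$: $\sqrt{\frac{s_1^2 + s_2^2}{2}}$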
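A sketch of Cohen's d in code, assuming the standard definitions: d is the difference in group means divided by a pooled standard deviation. The "proper" version weights each group's variance by its degrees of freedom, and the "simple" version averages the two variances. The function names and the sample data are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_sd(a, b):
    """'Proper' pooled sd: variances weighted by degrees of freedom."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))

def simple_pooled_sd(a, b):
    """'Simple' pooled sd: square root of the average of the two variances."""
    return sqrt((variance(a) + variance(b)) / 2)

def cohens_d(a, b, sd_func=pooled_sd):
    """Cohen's d: difference in means expressed in pooled-sd units."""
    return (fmean(a) - fmean(b)) / sd_func(a, b)

# Made-up scores for two groups, for illustration only.
group_1 = [2, 4, 6]
group_2 = [1, 3, 5]
print(cohens_d(group_1, group_2))                    # 0.5
print(cohens_d(group_1, group_2, simple_pooled_sd))  # 0.5
```

The two versions agree whenever the groups are the same size; they differ only when the group sizes are unequal.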
Time for practice
In this lab session we will cover:
- Independent t-tests
- Dependent (paired) t-tests
- Effect size (Cohen's d)
GitHub repository: https://github.com/eugeneyan/Statistical-Inference