• date post

31-Mar-2015
### Transcript of Comparing Two Groups' Means or Proportions: Independent Samples t-tests

• Slide 1

Comparing Two Groups' Means or Proportions: Independent Samples t-tests

• Slide 2

Review

Confidence Interval for a Mean

Slap a sampling distribution* over a sample mean to determine a range in which the population mean has a particular probability of being, such as a 95% CI. If our sample is one of the middle 95%, we know that the mean of the population is within the CI.

95% CI: $\bar{Y} \pm 1.96 \times \text{s.e.}$

[Figure: sampling distribution centered on Y-bar, with 2.5% in each tail outside the 95% CI.]

Significance Test for a Mean

Slap a sampling distribution* over a guess of the population mean to determine if the sample has a very low probability of having come from a population where the guess is true, such as $\alpha$-level = .05. If our sample mean is in the outer 5%, we know to reject the guess: our sample has a low chance of having come from a population with the mean we guessed.

$\mu_0$ = guess; $z$ or $t = (\bar{Y} - \mu_0) / \text{s.e.}$

[Figure: sampling distribution centered on the guess $\mu_0$, with 2.5% in each tail beyond -1.96z and +1.96z. Where does Y-bar fall?]

*Sampling distribution: the way a statistic from samples of a certain size is distributed after all possible random samples with replacement are collected.

• Slide 3

Review

Let's collect some data on educational aspirations and produce a 95% confidence interval to tell us what the population parameter likely is, and then let's do a significance test, guessing that average aspiration will be 16 years.

We have a sample size of 625 kids who reported their educational aspirations (where 12 = high school, 16 = 4 years of college, and so forth). The sample mean is 15 years with a standard deviation of 2 years.

95% confidence interval = Sample Mean $\pm\ z \times \text{s.e.}$

1. Calculate the standard error (s.e.) of the sampling distribution: $\text{s.e.} = s / \sqrt{n} = 2 / \sqrt{625} = 2/25 = 0.08$
2. Build the width of the interval, using s.e. and the z that corresponds with the percent confidence. 95% corresponds with a z of $\pm 1.96$. Interval $= \pm\ z \times \text{s.e.} = \pm 1.96 \times 0.08 = \pm 0.157$
3. Center the interval width on the mean (add to and subtract from the mean): 95% CI = Sample Mean $\pm\ z \times \text{s.e.} = 15 \pm 0.157$

The 95% CI: 14.84 to 15.16

We are 95% confident that the population mean falls between these values. (What does this say about my guess???)

• Slide 4

Review

Let's collect some data on educational aspirations and produce a 95% confidence interval to tell us what the population parameter likely is, and then let's do a significance test, guessing that average aspiration will be 16 years.

We have a sample size of 625 kids who reported their educational aspirations (where 12 = high school, 16 = 4 years of college, and so forth). The sample mean is 15 years with a standard deviation of 2 years.

Significance Test: $z$ or $t = (\bar{Y} - \mu_0) / \text{s.e.}$

1. Decide the $\alpha$-level ($\alpha = .05$) and nature of the test (two-tailed)
2. Set critical z or t: ($\pm 1.96$)
3. Make a guess or null hypothesis: $H_0: \mu = 16$, $H_a: \mu \neq 16$
4. Collect and analyze data
5. Calculate z or t, with $\text{s.e.} = s/\sqrt{n} = 2/\sqrt{625} = 2/25 = .08$: $z$ or $t = (\bar{Y} - \mu_0)/\text{s.e.} = (15 - 16)/.08 = -1/.08 = -12.5$
6. Make a decision about the null hypothesis (reject the null: $-12.5 < -1.96$)
7. Find the P-value (look up 12.5 in a z or t table).

The reason for using t is that we use the sample standard deviation ($s$) rather than the population standard deviation ($\sigma$) to calculate the standard error. Since $s$ will vary from sample to sample, the variability in the sampling distribution ought to be greater than in the normal curve. t has a larger spread, more accurately reflecting the likelihood of extreme samples, especially when sample size is small.

The larger the degrees of freedom ($n - 1$ when estimating the mean), the closer the t curve is to the normal curve. This reflects the fact that the standard deviation $s$ approaches $\sigma$ for large sample size $n$.

Even though z-scores based on the normal curve will work for larger samples ($n > 120$), SPSS uses t for all tests because it works for small samples and large samples alike.

(df = the number of scores that are free to vary when calculating a statistic... $n - ?$)

Tea Tests?
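The review arithmetic from Slides 3 and 4 can be checked with a short Python sketch. The function name `mean_ci_and_z` and its parameters are made up for illustration, and the P-value uses the normal curve (z) rather than t, which is an assumption that is reasonable only because n = 625 is large.

```python
import math

def mean_ci_and_z(sample_mean, sample_sd, n, null_mean, z_crit=1.96):
    """Hypothetical helper: CI and two-tailed z test for one mean."""
    se = sample_sd / math.sqrt(n)              # standard error: s / sqrt(n)
    margin = z_crit * se                       # half-width of the interval
    ci = (sample_mean - margin, sample_mean + margin)
    z = (sample_mean - null_mean) / se         # test statistic against the guess
    # Two-tailed P-value under the normal curve (assumption: large n).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    reject = abs(z) > z_crit                   # compare with the critical value
    return se, ci, z, p_value, reject

# Educational aspirations example: n = 625, mean = 15, s = 2, guess = 16.
se, ci, z, p_value, reject = mean_ci_and_z(15, 2, 625, 16)
print(f"s.e. = {se:.3f}")
print(f"95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
print(f"z = {z:.1f}, reject null: {reject}")
print(f"two-tailed P-value = {p_value:.3g}")
```

The guess of 16 falls outside the interval, which is the same conclusion the significance test reaches by rejecting the null.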
• Slide 8

Comparing Two Groups

We're going to move forward to more sophisticated statistics, building on what we have learned about confidence intervals and significance tests.

Social scientists look for relationships between concepts in the social world. For example:

Does one's sex affect income? Focus on the relationship between the concepts: Sex and Income

Does one's race affect educational attainment? Focus on the relationship between the concepts: Race and Educational Attainment

I love sophisticated statistics!

• Slide 9

Comparing Two Groups

In this section of the course, you will learn ways to infer from a sample whether two concepts are related in a population.

Independent variable (X): That which causes another variable to change when it changes.

Dependent variable (Y): That which changes in response to change in another variable.

X → Y (X = Sex or Race) (Y = Income or Education)

The statistical technique you use will depend on the level of measurement of your independent and dependent variables. The statistical test must match the variables!

Levels of Measurement: Nominal, Ordinal, Interval-Ratio

Causality: Three necessary conditions:

1. Association
2. Time order
3. Nonspuriousness

• Slide 10

Comparing Two Groups

The test you choose depends on level of measurement:

| Independent | Dependent | Statistical Test |
| --- | --- | --- |
| Dichotomous | Interval-ratio, Dichotomous | Independent Samples t-test |
| Nominal, Ordinal | Nominal, Dichotomous | Cross Tabs |
| Nominal, Ordinal, Dichotomous | Interval-ratio, Dichotomous | ANOVA |
| Interval-ratio, Dichotomous | Interval-ratio | Correlation and OLS Regression |

• Slide 11

Comparing Two Groups

| Independent | Dependent | Statistical Test |
| --- | --- | --- |
| Dichotomous | Interval-ratio, Dichotomous | Independent Samples t-test |

An independent samples t-test is concerned with whether a mean or proportion is equal between two groups. For example, does sex affect income?

Women's mean income = Men's mean income ???

• Slide 12

Comparing Two Groups

Independent Samples t-tests: Earlier, our focus was on the mean.
We used the mean of the sample (statistic) to infer a range for what our population mean (parameter) might be (confidence interval) or whether it was like some guess or not (significance test).

Now, our focus is on the difference in the mean for two groups. We will use the difference in the sample means (statistic) to infer a range for what our population difference in means (parameter) might be (confidence interval) or whether it is like some guess (significance test).

• Slide 13

Comparing Two Groups

The difference will be calculated as such:

$$\bar{D} = \bar{Y}_2 - \bar{Y}_1$$

For example: Average Difference in Income by Sex = Male Average Income - Female Average Income

(What would it mean if men's income minus women's income equaled zero?)

• Slide 14

Comparing Two Groups

Like the mean, if one were to take random sample after random sample from two groups, with normal population distributions, and calculate and record the difference between groups each time, one would see the formation of a Sampling Distribution for D-bar that was normal and centered on the two populations' difference.

[Figure: Sampling Distribution of D-bar, the average difference between two groups' samples; z axis from -3 to 3 with the middle 95% range marked.]

• Slide 15

Comparing Two Groups

So the rules and techniques we learned for means apply to the differences in groups' means. One creates sampling distributions to create confidence intervals and do significance tests in the same ways.

However, the standard error of D-bar has to be calculated slightly differently.

For Means (assumes equal sample size):

$$\text{s.e. (s.d. of the sampling distribution)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

For Proportions:

$$\text{s.e.} = \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_2}}$$

$df = n_1 + n_2 - 2$

Variance Sum Law: the variance of the difference between two independent variables is the sum of their variances.

• Slide 16

Comparing Two Groups

When variances are assumed to be equal, and sample sizes differ, we use the pooled estimate of variance for the standard error.
Estimated standard error, pooled: Start with a pooled variance. Then:

For Means (assumes equal variance):

$$\text{s.e.} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$

$df = n_1 + n_2 - 2$

• Slide 17

Comparing Two Groups

Calculating a Confidence Interval for the Difference between Two Groups' Means

By slapping the sampling distribution for the difference over our sample's difference between groups, D-bar, we can find the values between which the population difference is likely to be.

95% C.I. $= \bar{D} \pm 1.96 \times \text{s.e.} = (\bar{Y}_2 - \bar{Y}_1) \pm 1.96 \times \text{s.e.}$

Or $= (\hat{\pi}_2 - \hat{\pi}_1) \pm 1.96 \times \text{s.e.}$

99% C.I. $= \bar{D} \pm 2.58 \times \text{s.e.} = (\bar{Y}_2 - \bar{Y}_1) \pm 2.58 \times \text{s.e.}$

Or $= (\hat{\pi}_2 - \hat{\pi}_1) \pm 2.58 \times \text{s.e.}$

Remember: When sample sizes are small, $t \neq z$, and $\pm 1.96$ may not be appropriate.

• Slide 18

Comparing Two Groups

Confidence Interval Example: We want to know what the likely difference is between male and female GPAs in a population of college students with 95% confidence.

Sample: 50 men, average GPA = 2.9, s.d. = 0.5; 50 women, average GPA = 3.1, s.d. = 0.4. (To confuse you: equal sample sizes, ergo standard error formula not pooled.)

95% C.I. $= (\bar{Y}_2 - \bar{Y}_1) \pm 1.96 \times \text{s.e.}$

1. Find the standard error of the sampling distribution: $\text{s.e.} = \sqrt{(.5)^2/50 + (.4)^2/50} = \sqrt{.0050 + .0032} = \sqrt{.0082} \approx 0.091$
2. Build the width of the interval. 95% corresponds with a z or t of $\pm 1.96$: $\pm\ z \times \text{s.e.} = \pm 1.96 \times 0.091 \approx \pm 0.178$
3. Insert the mean difference to build the interval: 95% C.I. $= (\bar{Y}_2 - \bar{Y}_1) \pm 1.96 \times \text{s.e.} = (3.1 - 2.9) \pm 0.178 = 0.2 \pm 0.178$

The interval: 0.022 to 0.378

We are 95% confident that the difference between men's and women's GPAs in the population is between 0.022 and 0.378.

If we had guessed zero difference, would the difference be a significant difference?

• Slide 19

Comparing Two Groups

We can also use the standard error (standard deviation of the sampling distribution for differences between means) to conduct a t-test.
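The GPA example from Slide 18 can be sketched in Python, together with the test statistic for a guess of zero difference. The function names are hypothetical. The pooled variance inside `se_pooled` is the usual df-weighted average of the two sample variances; the slides do not spell that formula out, so treat it as an assumption. Values are computed without rounding intermediate terms, so the last digits can differ slightly from hand-rounded figures.

```python
import math

def se_unpooled(s1, n1, s2, n2):
    """Standard error of the difference in means, separate variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_pooled(s1, n1, s2, n2):
    """Standard error of the difference in means, pooled variance.

    Assumption: pooled variance = df-weighted average of the two
    sample variances (not stated on the slides).
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 / n1 + sp2 / n2)

# Slide 18 data: group 1 = men, group 2 = women.
n1, mean1, sd1 = 50, 2.9, 0.5
n2, mean2, sd2 = 50, 3.1, 0.4

diff = mean2 - mean1                      # D-bar = Y-bar2 - Y-bar1
se = se_unpooled(sd1, n1, sd2, n2)
margin = 1.96 * se                        # 95% half-width
ci = (diff - margin, diff + margin)

t_stat = (diff - 0) / se                  # null guess: zero difference
df = n1 + n2 - 2

print(f"s.e. = {se:.4f}")
print(f"95% CI = {ci[0]:.3f} to {ci[1]:.3f}")
print(f"t = {t_stat:.2f} on {df} df")
# With equal sample sizes the pooled and unpooled forms agree.
print(f"pooled s.e. = {se_pooled(sd1, n1, sd2, n2):.4f}")
```

Because the interval excludes zero, the two-tailed test at the .05 level points the same way: the statistic exceeds 1.96 in absolute value.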
Independent Samples t-test, for means:

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}, \qquad df = n_1 + n_2 - 2$$

• Slide 20

Comparing Two Groups

Conducting a Te