Sociology 601: Midterm review, October 15, 2009
Transcript of Sociology 601: Midterm review, October 15, 2009
• Basic information for the midterm
  – Date: Tuesday, October 20, 2009
  – Start time: 2 pm
  – Place: usual classroom, Art/Sociology 3221
  – Bring a sheet of notes, a calculator, and two pens or pencils
  – Notify me if you anticipate any timing problems
• Review for midterm
  – terms
  – symbols
  – steps in a significance test
  – testing differences in groups
  – contingency tables and measures of association
  – equations
Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic

Key idea: You use a sample to make inferences about a population.
Important terms from chapter 2

2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable

2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error

Key idea: Statistical inferences depend on measurement and sampling.
Important terms from chapter 3

3.1) Tabular and graphic description
• frequency distribution
• relative frequency distribution
• histogram
• bar graph

3.2-3.4) Measures of central tendency and variation
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.
Important terms from Chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.
Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your best guess (in part 3.)) will typically be to the parameter.
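Key ideas 3.) and 4.) can be checked numerically. A minimal Python sketch (not course software; the summary numbers are hypothetical) of a large-sample 95% confidence interval for a mean:

```python
# Large-sample confidence interval for a population mean, built from
# summary statistics (hypothetical values).
from math import sqrt
from statistics import NormalDist

def ci_mean(ybar, s, n, level=0.95):
    """Ybar +/- z * (s / sqrt(n)) for the given confidence level."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # critical value, about 1.96 for 95%
    se = s / sqrt(n)                           # estimated standard error of Ybar
    return ybar - z * se, ybar + z * se

lo, hi = ci_mean(ybar=508, s=100, n=100)
print(round(lo, 1), round(hi, 1))
```

The width of the interval (point 4) comes entirely from the standard error and the critical value.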
Important terms from chapter 6

6.1-6.3) Statistical inference: Significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic
Key Idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions.
2.) Hypothesize a value for a population parameter.
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
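The five steps above, sketched in Python (the course uses Stata; the numbers here are made up):

```python
# One-sample large-sample z-test following the five steps
# (hypothetical data: H0: mu = 100, Ybar = 104, s = 15, n = 36).
from math import sqrt
from statistics import NormalDist

# Step 1: assume a random sample and n large enough for a normal
#         sampling distribution.
mu0, ybar, s, n = 100, 104, 15, 36      # Step 2: hypothesized value and sample summary
se = s / sqrt(n)
z = (ybar - mu0) / se                   # Step 3: test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))  # Step 4: two-sided p-value
reject = p < 0.05                       # Step 5: decision at alpha = 0.05
print(round(z, 2), round(p, 4), reject)
```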
More important terms from chapter 6

6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power

6.5-6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test

Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
Symbols

Be able to read and use: α, Y, Ȳ, Yᵢ, H₀, Hₐ, df, n, z, t, s, s², σ, σ², σ̂, P, π, π̂, π₀, μ₀
Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
  – This is pretty much a universal assumption for all significance tests.
• An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test.
  – Some tests assume a normal population distribution.
  – Other tests assume different minimum sample sizes.
  – Some tests do not make this assumption.
• Declare the α level at the start, if you use one.
Significance Tests, Step 2: Hypothesis
• State the hypothesis as a null hypothesis.
  – Remember that the null hypothesis is about the population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided test.
  – Be sure the statement and equation are consistent.
Significance Tests, Step 3: Test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
– Full disclosure maximizes partial credit.
– I recommend four significant digits at each computational step, but present three as the answer.
Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
– Use the correct table for the type of test;
– Use the correct degrees of freedom if applicable;
– Use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
Significance Tests, Step 5: Conclusion
Write a conclusion:
– state the p-value and your decision to reject H0 or not;
– state what your decision means;
– discuss the substantive importance of your sample statistic.
Useful STATA outputs
• Immediate test for a sample mean using TTESTI:

. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)
One-sample t test
------------------------------------------------------------------------------
| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
x | 100 508 10 100 488.1578 527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99
Ho: mean(x) = 500
Ha: mean < 500               Ha: mean != 500              Ha: mean > 500
    t =  0.8000                  t =  0.8000                  t =  0.8000
P < t =  0.7872              P > |t| =  0.4256            P > t =  0.2128
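The pieces of this output can be rebuilt from the summary statistics alone (the exact p-values come from a t table with df = n − 1); a Python sketch:

```python
# Rebuild the ttesti ingredients for n=100, Ybar=508, sd=100, mu0=500.
from math import sqrt

n, ybar, sd, mu0 = 100, 508, 100, 500
se = sd / sqrt(n)      # Std. Err. column: 10
t = (ybar - mu0) / se  # t = 0.8000
df = n - 1             # Degrees of freedom: 99
print(se, round(t, 4), df)
```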
Useful STATA outputs
• Immediate test for a sample proportion using PRTESTI:

. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion x: Number of obs = 832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                     .4960864    .5639136
------------------------------------------------------------------------------
Ho: proportion(x) = .5

Ha: x < .5                   Ha: x != .5                  Ha: x > .5
    z =  1.731                   z =  1.731                   z =  1.731
P < z = 0.9582               P > |z| = 0.0835             P > z = 0.0418
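Note that the z statistic above uses the null proportion in the standard error; reproduced in Python:

```python
# Rebuild the prtesti test statistic for n=832, p_hat=.53, p0=.50.
from math import sqrt
from statistics import NormalDist

n, p_hat, p0 = 832, 0.53, 0.50
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # z = 1.731
p_upper = 1 - NormalDist().cdf(z)           # P > z, about 0.0418
print(round(z, 3), round(p_upper, 4))
```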
Useful STATA outputs
• Comparison of two means using ttesti:

. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |              -14.5    .2968297                -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 10858.6

Ho: mean(x) - mean(y) = diff = 0

Ha: diff < 0                 Ha: diff != 0                Ha: diff > 0
    t = -48.8496                 t = -48.8496                 t = -48.8496
P < t =  0.0000              P > |t| =  0.0000            P > t =  1.0000
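The test statistic and the standard error of the difference can be rebuilt from the summary statistics (equation 7.1, no equal-variance assumption); a Python sketch:

```python
# Rebuild the two-sample test statistic from the summary statistics above.
from math import sqrt

n_x, mean_x, sd_x = 4252, 18.1, 12.9
n_y, mean_y, sd_y = 6764, 32.6, 18.2
se = sqrt(sd_x**2 / n_x + sd_y**2 / n_y)  # Std. Err. of diff: about .29683
t = (mean_x - mean_y - 0) / se            # about -48.85
print(round(se, 5), round(t, 2))
```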
Chapter 6: Significance Tests for a Single Sample

parameter    sample size    best test
mean         large          z-test for Ȳ − μ₀
proportion   large          z-test for π̂ − π₀
mean         small          t-test for Ȳ − μ₀
proportion   small          binomial test
Equations for tests of statistical significance

Large-sample mean:

    z = (Ȳ − μ₀) / σ̂_Ȳ ,   where σ̂_Ȳ = s / √n

Large-sample proportion:

    z = (π̂ − π₀) / σ_π̂ ,   where σ_π̂ = √( π₀(1 − π₀) / n )

Small-sample mean:

    t = (Ȳ − μ₀) / σ̂_Ȳ ,   with df = n − 1
Chapter 7: Comparing scores for two groups

parameter    sample size    sampling scheme    best test
mean         large          independent        z-test for Ȳ₂ − Ȳ₁
proportion   large          independent        z-test for π̂₂ − π̂₁
mean         small          independent        t-test for Ȳ₂ − Ȳ₁
proportion   small          independent        Fisher's exact test
mean         large          dependent          z-test for D̄
proportion   large          dependent          McNemar test
mean         small          dependent          t-test for D̄
proportion   small          dependent          binomial test
Two Independent Groups: Large Samples, Means
7.1) Difference of two large-sample means:

    z = ( (Ȳ₂ − Ȳ₁) − 0 ) / √( s₁²/n₁ + s₂²/n₂ )
• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.
• Equal variance assumption? NO
Two Independent Groups: Large Samples, Proportions
7.2) Difference of two large-sample proportions:

    z = ( (π̂₂ − π̂₁) − 0 ) / √( π̂(1 − π̂)/n₁ + π̂(1 − π̂)/n₂ )

    where π̂ is the proportion pooled across both samples.
• Equal variance assumption? YES (if proportions are equal then so are variances).
• df = N1 + N2 - 2
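A worked example of equation 7.2 with hypothetical counts (45/100 successes in group 1, 60/100 in group 2), showing the pooled proportion:

```python
# Two-sample z-test for proportions with a pooled proportion under Ho
# (hypothetical counts).
from math import sqrt

x1, n1 = 45, 100                    # successes / n in group 1
x2, n2 = 60, 100                    # successes / n in group 2
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)      # pooled proportion: 0.525
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1 - 0) / se
print(round(z, 2))
```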
Two Independent Groups: Small Samples, Means
7.3) Difference of two small-sample means:

    t (or z) = ( (Ȳ₂ − Ȳ₁) − 0 ) / σ̂_(Ȳ₂−Ȳ₁)

    where σ̂_(Ȳ₂−Ȳ₁) = √( [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2) · ( 1/n₁ + 1/n₂ ) )

Equal variance assumption? SOMETIMES (for ease); NO (in computer programs)
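Equation 7.3 worked through in Python with hypothetical summary statistics (this is the pooled, equal-variance version):

```python
# Pooled-variance t-test for two small samples (hypothetical summaries).
from math import sqrt

n1, ybar1, s1 = 10, 5.0, 2.0    # group 1 summary
n2, ybar2, s2 = 12, 7.0, 2.5    # group 2 summary
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (ybar2 - ybar1 - 0) / se
df = n1 + n2 - 2
print(round(t, 3), df)
```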
Two Independent Groups: Small Samples, Proportions
Fisher’s exact test
• via Stata, SAS, or SPSS
• calculates the exact probability of all possible occurrences
Dependent Samples

• Means:

    t (or z) = D̄ / σ̂_D̄ = D̄ / ( s_D / √n )
• Proportions (McNemar test):

    z = ( n₁₂ − n₂₁ ) / √( n₁₂ + n₂₁ )
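A quick McNemar check in Python, with hypothetical discordant-pair counts (n12 and n21 are the off-diagonal cells of the 2×2 table of paired outcomes):

```python
# McNemar z statistic from the two discordant-pair counts (hypothetical).
from math import sqrt

n12, n21 = 30, 14
z = (n12 - n21) / sqrt(n12 + n21)
print(round(z, 2))
```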
Chapter 8: Analyzing associations
• Contingency tables and their terminologies:
  – marginal distributions and joint distributions
  – conditional distribution of R, given a value of E (as counts or percentages in A & F)
  – marginal, joint, and conditional probabilities (as proportions in A & F)
• “Are two variables statistically independent?”
Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms
  – marginal
  – conditional
  – joint
• Measures of relationships:
  – odds, odds ratios
  – gamma and tau-b
Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a given cell.
• fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
• fe = row total × column total / N
  – the equation for fe corrects for rows or columns with small totals.
Chi-squared test of independence

• Assumptions: 2 categorical variables, random sampling, fe ≥ 5
• Ho: variables are statistically independent (crudely, the score for one variable is independent of the score for the other)
• Test statistic: χ² = Σ (fo − fe)² / fe
• p-value from the χ² table, df = (r − 1)(c − 1)
• Conclusion: reject or do not reject based on the p-value and prior α-level, if necessary. Then, describe your conclusion.
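The whole chi-squared recipe fits in a few lines of Python; a sketch for a hypothetical 2×2 table, including the fe = row total × column total / N step:

```python
# Chi-squared statistic for a hypothetical 2x2 contingency table.
table = [[30, 20], [10, 40]]
rows = [sum(r) for r in table]          # row totals
cols = [sum(c) for c in zip(*table)]    # column totals
N = sum(rows)
chi2 = 0.0
for i, row in enumerate(table):
    for j, fo in enumerate(row):
        fe = rows[i] * cols[j] / N      # expected count under independence
        chi2 += (fo - fe) ** 2 / fe
df = (len(rows) - 1) * (len(cols) - 1)
print(round(chi2, 3), df)
```

The p-value would then come from a χ² table with the computed df.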
Probabilities, odds, and odds ratios

• Given a probability, you can calculate an odds and a log odds.
  – odds = p / (1 − p)
    • 50/50 gives odds = 1.0
    • range: 0 to ∞
  – log odds = log( p / (1 − p) ) = log(p) − log(1 − p)
    • 50/50 gives log odds = 0.0
    • range: −∞ to +∞
  – odds ratio = [ p₁ / (1 − p₁) ] / [ p₂ / (1 − p₂) ]
• Given an odds, you can calculate a probability: p = odds / (1 + odds)
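These conversions are one-liners; a Python sketch:

```python
# Probability <-> odds conversions and the odds ratio.
from math import log

def odds(p):
    return p / (1 - p)              # odds = p / (1 - p)

def log_odds(p):
    return log(p) - log(1 - p)      # log odds = log(p) - log(1 - p)

def prob(o):
    return o / (1 + o)              # back from odds: p = odds / (1 + odds)

def odds_ratio(p1, p2):
    return odds(p1) / odds(p2)

print(odds(0.5), log_odds(0.5), prob(1.0))  # 1.0 0.0 0.5
print(odds_ratio(0.75, 0.5))                # 3.0 / 1.0 = 3.0
```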
Measures of association with ordinal data

• concordant observations C:
  – in a pair, one observation is higher on both x and y
• discordant observations D:
  – in a pair, one observation is higher on x and lower on y
• ties:
  – in a pair, same on x or same on y

• gamma (ignores ties):

    γ = (C − D) / (C + D)

• tau-b is a gamma that adjusts for ties
  – gamma often increases with more collapsed tables
  – tau-b and gamma both have standard errors in computer output
  – tau-b can be interpreted as a correlation coefficient
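Gamma can be computed directly from its definition by scanning all pairs; a Python sketch (fine for small data, while real software uses table-based formulas):

```python
# Goodman-Kruskal gamma = (C - D) / (C + D), ties ignored.
from itertools import combinations

def gamma(xs, ys):
    C = D = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if (x1 - x2) * (y1 - y2) > 0:
            C += 1   # concordant pair: same ordering on x and y
        elif (x1 - x2) * (y1 - y2) < 0:
            D += 1   # discordant pair: opposite ordering
    return (C - D) / (C + D)

print(gamma([1, 2, 3, 4], [1, 2, 3, 4]))   # perfect agreement -> 1.0
print(gamma([1, 2, 3, 4], [4, 3, 2, 1]))   # perfect reversal  -> -1.0
```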