Final Exam Review Class
8/10/2019 Final Exam Review Class
BS704 Review for Final Exam
-
Final Exam
You may bring 5 pages of notes
You MUST bring full copies of statistical tables (on Blackboard)
You MUST bring a calculator
-
Topics Covered Since Midterm:
Hypothesis Testing for a single mean and proportion, and for two means
One-way ANOVA
Chi-square Tests
Power and Sample size
Regression and Correlation
Logistic Regression
Survival analysis
-
Hypothesis Tests for...
Single mean $\mu$
Single proportion $p$
Comparing two means $\mu_1 - \mu_2$
Paired (or matched) data $\mu_d$
-
Conducting a Hypothesis Test
Define null and research hypotheses
Define test statistic, level of significance and decision rule
Calculate test statistic based upon sample data.
Use decision rule or p-value to decide whether to reject or not reject the null hypothesis.
-
Conducting a Hypothesis Test
For a single mean $\mu$
If $n \ge 30$, use z-test statistic
If $n < 30$, use t-test statistic
For a single proportion $p$
Use z-test statistic
Check assumptions
-
Conducting a Hypothesis Test
For comparing two means $\mu_1 - \mu_2$
If $n_1$ and $n_2$ are both $\ge 30$, use z-test statistic
If $n_1$ and/or $n_2 < 30$, use t-test statistic
For comparing two proportions $p_1 - p_2$
Use chi-square test
-
Type I and II errors
Type I error occurs when we reject the null hypothesis when we shouldn't.
Pr(Type I error) = $\alpha$
Type II error occurs when we don't reject the null hypothesis when we should have.
Pr(Type II error) = $\beta$
-
One-Way ANOVA
Used when we want to compare the means of three or more groups from independent populations.
Continuous outcome measured on each subject.
We set up an analysis of variance table and compare the variance between groups with the variance within groups.
An F-test is used with two different degrees of freedom terms.
-
Chi-Square Test
Chi-square goodness of fit test
Assess whether responses fit a specified distribution for one sample of people
Chi-square test of independence
Test if two discrete variables are associated in some way for a sample of people
Chi-square test comparing distributions
Compare distributions of proportions among two or more independent groups
-
Calculating a Sample Size for a Study
Need a large enough sample to ensure you have the pre-specified amount of precision in analysis
Sample size determined based on type of planned analysis:
Confidence interval
Hypothesis test
-
Calculating a Sample Size for a Study
We always round up our calculation.
Need to account for possible dropout from study. This always increases the required sample size.
-
Power
Linked up with Type II error
Power = $1 - \beta$
= P(Reject $H_0$ | $H_0$ false)
= Probability of correctly rejecting $H_0$ when $H_0$ is false.
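The link between power, $\beta$ and sample size can be illustrated with a minimal sketch. It assumes a two-sided one-sample z-test with known standard deviation, which the slide does not specify; the helper `z_test_power` and all input values are hypothetical, and the formula is the usual normal approximation that ignores the far rejection tail.

```python
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (hypothetical helper)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Distance between the null and alternative means in standard-error units.
    shift = abs(mu1 - mu0) / (sigma / n ** 0.5)
    # Probability of landing in the nearer rejection region when H0 is false;
    # the tiny contribution of the opposite tail is ignored.
    return NormalDist().cdf(shift - z_crit)


# Hypothetical inputs: same effect and spread, two sample sizes.
power_small = z_test_power(mu0=100, mu1=105, sigma=10, n=16)
power_large = z_test_power(mu0=100, mu1=105, sigma=10, n=64)
beta_large = 1 - power_large  # Pr(Type II error) at the larger sample size

print(f"n=16: power={power_small:.3f}; n=64: power={power_large:.3f}")
```

Under these assumptions, a larger sample raises power and lowers $\beta$ for the same effect size.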
-
Correlation
Correlation measures the nature and strength of linear association between two variables at a time.
Regression: equation that best describes relationship between variables.
-
Correlation Coefficient
Population correlation is $\rho$ (rho)
Sample correlation is $r$, where $-1 \le r \le +1$
Sign indicates nature of relationship (positive or direct, negative or inverse)
Magnitude indicates strength
-
Linear Regression
A very popular method for describing the linear relationship between two variables (usually continuous variables).
We use a scatterplot to display the data graphically
A line to show the association between the two variables.
-
Simple Linear Regression
Y = Dependent, Outcome variable
X = Independent, Covariate, Predictor variable
$\hat{y} = b_0 + b_1 x$
$b_0$ is the Y-intercept, $b_1$ is the slope
-
Multiple Linear Regression
Useful when we want to jointly examine the effect of several X variables on the outcome Y variable.
Y = continuous outcome variable
$X_1, X_2, \dots, X_p$ = set of independent or predictor variables
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
-
Linear Regression
Predictors can be continuous, indicator variables (0/1) or a set of dummy variables
Confounding: the effect of a risk factor on an outcome is somehow changed due to the effect of another factor.
Effect Modification: a different relationship between the risk factor and an outcome depending on the level of another variable.
-
Logistic Regression
Used when the outcome is dichotomous (binary), e.g. diseased, not diseased.
Our goals remain the same as for linear regression:
Is there an association between a variable X and our outcome variable Y? If so, what type?
-
Simple Logistic Regression
We model the probability $p$ of having the disease.
$p = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$
$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x$
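The two forms of the simple logistic model are inverses of each other, which a short sketch can confirm. The coefficient values `b0` and `b1` and the predictor value are hypothetical, chosen only for illustration.

```python
import math


def logistic(b0, b1, x):
    """Probability form: p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))


def logit(p):
    """Log-odds form: ln(p / (1 - p))."""
    return math.log(p / (1 - p))


b0, b1 = -2.0, 0.5  # hypothetical coefficients, not from the slides
x = 3.0             # hypothetical predictor value

p = logistic(b0, b1, x)
log_odds = logit(p)  # recovers b0 + b1*x

print(f"p = {p:.4f}, logit(p) = {log_odds:.4f}")
```

Applying the logit to the modeled probability returns the linear predictor $b_0 + b_1 x$, which is why the model is linear on the log-odds scale.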
-
Multiple Logistic Regression
Outcome is dichotomous (1 = event, 0 = non-event) and p = P(event)
Outcome is modeled as log odds
$\exp(b_i)$ = OR
$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
-
Survival Analysis
Outcome is the time to an event.
An event could be time to heart attack, cancer remission or death.
Measure whether person has event or not (Yes/No) and if so, their time to event.
Determine factors associated with longer survival.
-
Survival Analysis
Incomplete follow-up information
Censoring
Measure follow-up time and not time to event
We know survival time > follow-up time
Log rank test to compare survival in two or more independent groups
-
Cox Proportional Hazards Model
Model:
$\ln\left(h(t)/h_0(t)\right) = b_1 X_1 + b_2 X_2 + \dots + b_p X_p$
$\exp(b_i)$ = hazard ratio
Model used to jointly assess effects of independent variables on outcome (time to an event).
-
BS704 Practice Problems for Final Exam
-
Suppose a cross-sectional study is conducted to investigate cardiovascular risk factors among a sample of patients seeking medical care at one of three local hospitals. A total of 300 patients are enrolled. Using the following data, test if there is an association between enrollment site (i.e., hospital) and family history of CVD. Run the appropriate test at a 5% level of significance.
Problem 1.
-
Family Hx   Hosp 1   Hosp 2   Hosp 3
Definite    24       14       22
Probable    8        14       8
No          68       72       70
Total       100      100      100
Problem 1.
-
$H_0$: Site and family history are independent
$H_1$: $H_0$ is false, $\alpha = 0.05$
Df = (r-1)(c-1) = (3-1)(3-1) = 4.
Reject $H_0$ if $\chi^2 > 9.49$
Problem 1.
-
Observed counts, with expected counts in parentheses:
Family Hx   Hosp 1    Hosp 2    Hosp 3
Definite    24 (20)   14 (20)   22 (20)
Probable    8 (10)    14 (10)   8 (10)
No          68 (70)   72 (70)   70 (70)
Total       100       100       100
Problem 1.
-
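$\chi^2 = \sum \frac{(O - E)^2}{E}$ = 0.8 + 1.8 + 0.2 + 0.4 + 1.6 + 0.4 + 0.06 + 0.06 + 0 = 5.32
Do not reject $H_0$ because 5.32 < 9.49. We do not have significant evidence, $\alpha = 0.05$, to show that enrollment site and family history are associated.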
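The Problem 1 calculation can be checked with a short sketch that derives the expected counts from the row and column totals. The critical value 9.49 is taken from the slide rather than computed; variable names are illustrative.

```python
# Observed counts from Problem 1: rows are family history, columns are hospitals.
observed = [
    [24, 14, 22],  # Definite
    [8, 14, 8],    # Probable
    [68, 72, 70],  # No
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence: (row total * column total) / N.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 9.49  # chi-square table value for df = 4, alpha = 0.05 (from the slide)
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

The unrounded statistic is about 5.31; the slide's 5.32 comes from rounding two of the terms to 0.06 before summing. The decision is the same either way.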
-
The following table summarizes data collected in the study described in problem 1. The variable summarized below is body mass index (BMI) computed as the ratio of weight in kilograms to height in meters squared.
BMI       Overall   Hosp 1   Hosp 2   Hosp 3
N         300       100      100      100
Mean      24.8      21.6     24.8     27.9
Std Dev   2.5       2.1      1.8      1.3
Problem 2.
-
Test if there is a significant difference in the mean BMI scores among hospitals. Show all parts of the test and use a 5% level of significance. (HINT: MSE = 3.1).
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_1$: means not all equal, $\alpha = 0.05$
$SS_b = \sum n_j (\bar{X}_j - \bar{X})^2$
$= 100\left((21.6-24.8)^2 + (24.8-24.8)^2 + (27.9-24.8)^2\right)$
$= 100(10.24 + 0 + 9.61) = 1985$
Problem 2.
-
Source    SS       Df    MS      F
Between   1985     2     992.5   320.2
Error     920.7    297   3.1
Total     2905.7   299
Reject $H_0$ if F > 3.09
F = 320.2
Reject $H_0$ since 320.2 > 3.09. We have significant evidence, $\alpha = 0.05$, to show that the means are not all equal.
Problem 2.
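The ANOVA table for Problem 2 can be rebuilt from the summary statistics alone. This is a minimal sketch that takes MSE = 3.1 from the hint and the critical value 3.09 from the slide; variable names are illustrative.

```python
group_n = [100, 100, 100]
group_means = [21.6, 24.8, 27.9]
overall_mean = 24.8
mse = 3.1  # given in the problem hint

# Between-groups sum of squares: sum of n_j * (group mean - overall mean)^2.
ss_between = sum(n * (m - overall_mean) ** 2 for n, m in zip(group_n, group_means))

df_between = len(group_means) - 1
df_error = sum(group_n) - len(group_means)

ms_between = ss_between / df_between
ss_error = mse * df_error
ss_total = ss_between + ss_error

f_stat = ms_between / mse
critical_f = 3.09  # table value used on the slide
reject_h0 = f_stat > critical_f

print(f"SSb = {ss_between:.1f}, SSE = {ss_error:.1f}, F = {f_stat:.1f}")
```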
-
Suppose each participant in the study described in problem 1 is assigned a cardiovascular risk (a value between 0 and 100 with higher scores indicative of more risk of cardiovascular disease). The mean cardiovascular risk is 21.7 with a standard deviation of 5.6. Suppose that the covariance between BMI and cardiovascular risk is 4.5.
Problem 3.
-
Compute the sample correlation coefficient between BMI and cardiovascular risk.
Var(BMI) = $s_x^2 = 2.5^2$
Var(Risk) = $s_y^2 = 5.6^2$
$r = \frac{\text{Cov}(X, Y)}{\sqrt{s_x^2 s_y^2}} = \frac{4.5}{\sqrt{(2.5)^2 (5.6)^2}} = 0.3$
Problem 3.
-
Is this correlation statistically significant? Run the appropriate test at a 5% level of significance.
$H_0$: $\rho = 0$
$H_1$: $\rho \ne 0$, $\alpha = 0.05$
$Z = r\sqrt{\frac{n-2}{1-r^2}} = 0.3\sqrt{\frac{298}{1-(0.3)^2}} = 5.4$
Reject $H_0$ if Z < -1.96 or if Z > 1.96
Reject $H_0$ since 5.4 > 1.96. We have significant evidence, $\alpha = 0.05$, to show that $\rho \ne 0$.
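The Problem 3 correlation and its test statistic can be reproduced with a few lines. This sketch uses n = 300 from Problem 1 and computes Z twice: once with r rounded to 0.3 as on the slides, and once with the unrounded r. The helper name `correlation_z` is illustrative.

```python
import math

cov_xy = 4.5         # covariance between BMI and cardiovascular risk
s_x, s_y = 2.5, 5.6  # standard deviations of BMI and risk
n = 300              # patients enrolled (Problem 1)

r = cov_xy / math.sqrt(s_x**2 * s_y**2)


def correlation_z(r, n):
    """Test statistic for H0: rho = 0, in the form used on the slides."""
    return r * math.sqrt((n - 2) / (1 - r**2))


z_slides = correlation_z(0.3, n)  # r rounded to one decimal, as on the slide
z_exact = correlation_z(r, n)     # unrounded r

print(f"r = {r:.3f}, Z (r = 0.3) = {z_slides:.2f}, Z (unrounded r) = {z_exact:.2f}")
```

Rounding r before testing changes the statistic noticeably, but both values are far beyond 1.96, so the conclusion does not depend on it.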
-
Suppose we restrict our attention to the subgroup of patients at high risk for cardiovascular disease (cardiovascular risk score of 30 or more).
Using the following data, test if BMI is significantly different in men versus women. Use a 5% level of significance.
Problem 5.
-
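BMI       Men    Women
N         20     10
Mean      31.6   28.1
Std Dev   1.7    2.1
$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$H_0$: $\mu_1 = \mu_2$
$H_1$: $\mu_1 \ne \mu_2$, $\alpha = 0.05$
Df = 20 + 10 - 2 = 28
Reject $H_0$ if t < -2.048 or if t > 2.048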
Problem 5.
-
$S_p = \sqrt{\frac{19(1.7)^2 + 9(2.1)^2}{20 + 10 - 2}} = 1.84$
$t = \frac{31.6 - 28.1}{1.84\sqrt{\frac{1}{20} + \frac{1}{10}}} = 4.91$
Reject $H_0$ since 4.91 > 2.048. We have significant evidence, $\alpha = 0.05$, to show there is a difference in mean BMI between men and women.
Problem 5.
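The pooled two-sample t-test in Problem 5 can be verified from the summary statistics. In this sketch the critical value 2.048 is taken from the slide, and variable names are illustrative.

```python
import math

n1, mean1, sd1 = 20, 31.6, 1.7  # men
n2, mean2, sd2 = 10, 28.1, 2.1  # women

df = n1 + n2 - 2

# Pooled standard deviation: weighted average of the two sample variances.
sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)

t_stat = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

critical_t = 2.048  # t table value for df = 28, two-sided alpha = 0.05 (from the slide)
reject_h0 = abs(t_stat) > critical_t

print(f"Sp = {sp:.2f}, t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

Carrying the unrounded pooled standard deviation gives t of about 4.92 rather than the slide's 4.91, which uses Sp rounded to 1.84.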
-
Problem 6.
How many men and women would be required to estimate a difference in mean BMI with a 95% confidence interval and a margin of error not exceeding 1 unit? (Use data from Problem 5 as needed.)
$n_i = 2\left(\frac{Z s}{E}\right)^2 = 2\left(\frac{1.96(1.84)}{1}\right)^2 = 26.0$
Use $S_p$ = 1.84 from Problem 5.
Need 27 men and 27 women.
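The sample size calculation, including the round-up rule from the earlier slide, can be sketched as follows. The pooled standard deviation 1.84 is the value computed for the men versus women comparison; variable names are illustrative.

```python
import math

z = 1.96      # for a 95% confidence interval
sp = 1.84     # pooled standard deviation from the men vs. women comparison
margin = 1.0  # desired margin of error, in BMI units

# Per-group sample size for estimating a difference in two independent means.
n_per_group = 2 * (z * sp / margin) ** 2

# Always round up so the margin of error is not exceeded.
n_required = math.ceil(n_per_group)

print(f"n per group = {n_per_group:.1f}, rounded up to {n_required}")
```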
-
Problem 7.
The following table was constructed based on a comparison of various sociodemographic characteristics between men and women enrolled in the study of cardiovascular risk factors.
Which, if any, of the characteristics shown above are significantly different between men and women? Justify.
-
Problem 9.
Two different scales are used in a particular laboratory. There is some concern that one scale gives different readings than the other. Ten specimens are randomly selected and weighed on each scale. The data are shown below.
Test if there is a significant difference in weights between the two scales at $\alpha = 0.05$.
-
Specimen Scale 1 Scale 2
1 1.2 2.1
2 3.5 3.6
3 1.8 1.9
4 4.0 4.0
5 5.0 4.9
6 1.9 2.0
7 2.7 2.7
8 2.2 2.3
9 2.8 2.9
10 3.5 3.7
Problem 9.
-
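$H_0$: $\mu_d = 0$
$H_1$: $\mu_d \ne 0$, $\alpha = 0.05$
$\bar{X}_d = \frac{\sum \text{diff}}{n} = \frac{1.5}{10} = 0.15$
$s_d = \sqrt{\frac{\sum \text{diff}^2 - (\sum \text{diff})^2/n}{n-1}} = \sqrt{\frac{0.91 - (1.5)^2/10}{9}} = 0.276$
$t = \frac{\bar{X}_d}{s_d/\sqrt{n}}$, $df = n - 1 = 9$
Reject $H_0$ if t < -2.262 or if t > 2.262
$t = \frac{0.15}{0.276/\sqrt{10}} = 1.72$
Do not reject $H_0$ because -2.262 < 1.72 < 2.262. We do not have significant evidence at $\alpha = 0.05$ to show that $\mu_d \ne 0$.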
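The paired t-test for Problem 9 can be run directly on the ten specimen weights. This sketch takes differences as Scale 2 minus Scale 1, which matches the positive sum of 1.5 in the solution, and uses the slide's critical value 2.262.

```python
import math

scale1 = [1.2, 3.5, 1.8, 4.0, 5.0, 1.9, 2.7, 2.2, 2.8, 3.5]
scale2 = [2.1, 3.6, 1.9, 4.0, 4.9, 2.0, 2.7, 2.3, 2.9, 3.7]

# Differences taken as Scale 2 minus Scale 1.
diffs = [b - a for a, b in zip(scale1, scale2)]
n = len(diffs)

mean_d = sum(diffs) / n
sum_sq = sum(d * d for d in diffs)

# Computational form of the sample standard deviation of the differences.
sd_d = math.sqrt((sum_sq - sum(diffs) ** 2 / n) / (n - 1))

t_stat = mean_d / (sd_d / math.sqrt(n))
critical_t = 2.262  # t table value for df = 9, two-sided alpha = 0.05 (from the slide)
reject_h0 = abs(t_stat) > critical_t

print(f"mean diff = {mean_d:.2f}, sd = {sd_d:.3f}, t = {t_stat:.2f}")
```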
-
Patients with hypertension are generally recommended to follow a low salt diet. Surveys report that approximately 75% of patients adhere to these diets. In a random sample of 100 patients with hypertension, 70% report following a low-salt diet. Are these patients significantly low in terms of adherence? Run the test at $\alpha = 0.05$.
Problem 10.
-
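$H_0$: p = 0.75
$H_1$: p < 0.75, $\alpha = 0.05$
Reject $H_0$ if Z < -1.645
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.70 - 0.75}{\sqrt{0.75(0.25)/100}} = -1.15$
Do not reject $H_0$ because -1.15 > -1.645. We do not have significant evidence at $\alpha = 0.05$ to show that p < 0.75.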
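The one-sample test for a proportion in Problem 10 can be checked as follows. The sketch uses the standard z statistic for a proportion and a common large-sample rule of thumb (at least 5 expected successes and failures) as the assumption check; the slides do not state which rule they use.

```python
import math

p0 = 0.75     # adherence rate reported in surveys (null value)
p_hat = 0.70  # sample proportion
n = 100

# Large-sample assumption check (rule of thumb assumed here).
assumption_ok = min(n * p0, n * (1 - p0)) >= 5

z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

critical_z = -1.645  # lower-tailed test at alpha = 0.05 (from the slide)
reject_h0 = z_stat < critical_z

print(f"Z = {z_stat:.2f}, reject H0: {reject_h0}")
```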
-
The following table was presented in a journal and describes the associations between demographic and clinical risk factors and systolic blood pressure.
Outcome = Systolic Blood Pressure
Risk Factors                  Regression Coefficient   p
Intercept                     105.3                    0.0001
Age                           1.2                      0.0042
Male Sex                      4.5                      0.0956
Current Smoker                -0.5                     0.2354
Number of Hrs Exercise/Week   -2.4                     0.0003
Problem 11.
-
a) What type of analysis generated the results summarized above?
Multiple linear regression analysis because the outcome (systolic blood pressure) is continuous.
b) Which of the risk factors are significantly associated with systolic blood pressure?
Age and number of hours of exercise are statistically significant at the 5% level (both have p-values < 0.05). Male sex is marginally significant with a p-value of 0.0956.
Problem 11.
-
c) What is the relative importance of the risk factors?
The most important (statistically significant) risk factor is number of hours of exercise per week, followed by age and then male sex. Current smoking status is not statistically significant.
d) How would you interpret the regression coefficient associated with male sex? With number of hours of exercise per week?
Men's systolic blood pressure is 4.5 units higher than women's, holding age, smoking status and number of hours of exercise constant. Each additional hour of exercise per week is associated with a reduction of 2.4 units of systolic blood pressure, holding age, sex and current smoking status constant.
Problem 11.
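The "holding other variables constant" interpretation in part d) can be made concrete with a small sketch that plugs values into the fitted equation from Problem 11. The example patient (age 50, non-smoker, 3 hours of exercise per week) is hypothetical, and the coding of age in years and of sex and smoking as 0/1 indicators is an assumption the table does not spell out.

```python
# Coefficients from the Problem 11 table.
coefficients = {
    "intercept": 105.3,
    "age": 1.2,
    "male": 4.5,
    "smoker": -0.5,
    "exercise_hours": -2.4,
}


def predicted_sbp(age, male, smoker, exercise_hours):
    """Predicted systolic blood pressure from the fitted equation."""
    return (
        coefficients["intercept"]
        + coefficients["age"] * age
        + coefficients["male"] * male
        + coefficients["smoker"] * smoker
        + coefficients["exercise_hours"] * exercise_hours
    )


# Hypothetical patients who differ only in sex.
man = predicted_sbp(age=50, male=1, smoker=0, exercise_hours=3)
woman = predicted_sbp(age=50, male=0, smoker=0, exercise_hours=3)
sex_difference = man - woman

# Same man with one additional hour of exercise per week.
extra_hour_effect = predicted_sbp(age=50, male=1, smoker=0, exercise_hours=4) - man

print(f"male vs. female: {sex_difference:+.1f}, one extra hour: {extra_hour_effect:+.1f}")
```

Whatever values are chosen for the other predictors, the two differences equal the male sex and exercise coefficients, which is what the interpretation in part d) states.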
-
The following table was presented in a journal and describes the associations between demographic and clinical risk factors and hypertension.
Outcome = Hypertension
Risk Factors                  Regression Coefficient   p
Intercept                     3.5                      0.0001
Age                           0.02                     0.0357
Male Sex                      0.27                     0.0264
Current Smoker                -0.005                   0.7564
Number of Hrs Exercise/Week   -0.36                    0.0111
Problem 12.
-
a) What type of analysis generated the results summarized above?
Multiple logistic regression analysis because the outcome (hypertension) is dichotomous.
b) Which of the risk factors are significantly associated with hypertension?
Age, male sex and number of hours of exercise are statistically significant at the 5% level (all have p-values < 0.05).
c) What is the relative importance of the risk factors?
The most important (statistically significant) risk factor is number of hours of exercise per week, followed by male sex and then age. Current smoking status is not statistically significant.
Problem 12.
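Since the review slides give Exp(bi) = OR for logistic regression, the Problem 12 coefficients can be converted to odds ratios. This is an illustrative sketch; the odds ratios are not part of the original solution.

```python
import math

# Logistic regression coefficients from the Problem 12 table (intercept omitted).
coefficients = {
    "Age": 0.02,
    "Male Sex": 0.27,
    "Current Smoker": -0.005,
    "Number of Hrs Exercise/Week": -0.36,
}

# Exponentiating a log-odds coefficient gives the odds ratio for a one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, odds_ratio in odds_ratios.items():
    print(f"{name}: OR = {odds_ratio:.2f}")
```

An odds ratio above 1 (male sex, age) indicates higher odds of hypertension, and one below 1 (exercise) indicates lower odds, each holding the other predictors constant.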
-
[Table: treatment recommendation (Radiation, Surgery, Neither, Total) by age group; cell values not recovered]
-
$H_0$: Age and treatment recommendation are independent
$H_1$: $H_0$ is false, $\alpha = 0.05$
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Df = (r-1)(c-1) = (3-1)(3-1) = 4.
Reject $H_0$ if $\chi^2 > 9.49$
Problem 13.
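$\chi^2 = \frac{(35-34.1)^2}{34.1} + \frac{(15-28.9)^2}{28.9} + \frac{(50-37)^2}{37} + \frac{(29-34.1)^2}{34.1} + \frac{(30-28.9)^2}{28.9} + \frac{(41-37)^2}{37} + \dots$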
-
Radiation Surgery Neither Total
Reject $H_0$ because $\chi^2 > 9.49$. We have significant evidence, $\alpha = 0.05$, to show that age and treatment recommendation are not independent.
-
Problem 14.
For each of the following scenarios, indicate which test would be used. Use the letters below to indicate the test in the space provided. Note that the same test might be used for more than one scenario.
-
Problem 14.
a) Compare mean to historical/external control
b) Compare proportion to historical/external control
c) Compare two independent means
d) Compare two matched/paired means
e) Analysis of variance
f) Chi-square goodness of fit test
g) Chi-square test of independence
h) Correlation analysis
i) Linear regression analysis
j) Logistic regression analysis
k) Survival analysis
-
Problem 14.
Scenarios and tests
1. We want to test if there is a significant association between BMI (kg/m²) and incident myocardial infarction adjusting for age, sex, systolic blood pressure and smoking.
Test: j
2. We want to test if a new environmental intervention is effective in reducing exposure to second-hand smoke. Each participant in the study has levels of exposure measured before and after the intervention is implemented.
Test: d
3. We wish to test if there is a significant association between GRE scores and first year GPA in MPH students who matriculated in fall 2011.
Test: h or i
4. We want to determine if there are significant differences in ages of participants enrolled in a study comparing those with a family history of cardiovascular disease to those without.
Test: c
5. A study reports that 15% of college freshmen smoke. We want to test if significantly more BU freshmen smoke.
Test: b
6. We want to test if there is a difference in preterm versus term deliveries among women of black, Hispanic and white race.
Test: g
7. We want to test if nutritional supplements prolong life (minimize time to death) in persons over 65 years of age, adjusted for sex and other comorbid conditions.
Test: k
8. A clinical trial is run to assess the safety of a new drug compared to a standard drug and the outcome is development of skin rash or not.
Test: g or j
9. We want to test if there is a difference in mean time to complete a physical task when comparing 12, 13, 14 and 15 year olds.
Test: e
10. We want to test whether smoking in pregnancy increases the risk of infection in newborns.
Test: g or k