Final Exam Review Class

  • 8/10/2019 Final Exam Review Class

    BS704 Review for Final Exam

    Final Exam

    You may bring 5 pages of notes

    You MUST bring full copies of the statistical tables (on Blackboard)

    You MUST bring a calculator

    Topics Covered Since Midterm:

    Hypothesis Testing for a single mean and proportion, and for two means

    One-way ANOVA

    Chi-square Tests

    Power and Sample size

    Regression and Correlation

    Logistic Regression

    Survival analysis

    Hypothesis Tests for...

    Single mean $\mu$

    Single proportion $p$

    Comparing two means $\mu_1 - \mu_2$

    Paired (or matched) data $\mu_d$

    Conducting a Hypothesis Test

    Define null and research hypotheses

    Define test statistic, level of significance and decision rule

    Calculate test statistic based upon sample data.

    Use decision rule or p-value to decide whether to reject or not reject the null hypothesis.

    Conducting a Hypothesis Test

    For a single mean $\mu$

    If $n \ge 30$, use z-test statistic

    If $n < 30$, use t-test statistic

    For a single proportion $p$

    Use z-test statistic

    Check assumptions

    Conducting a Hypothesis Test

    For comparing two means $\mu_1 - \mu_2$

    If $n_1$ and $n_2$ are both $\ge 30$, use z-test statistic

    If $n_1$ and/or $n_2 < 30$, use t-test statistic

    For comparing two proportions $p_1 - p_2$

    Use chi-square test

    Type I and II errors

    Type I error occurs when we reject the null hypothesis when we shouldn't have (H0 is true).

    Pr(Type I error) = $\alpha$

    Type II error occurs when we don't reject the null hypothesis when we should have (H0 is false).

    Pr(Type II error) = $\beta$

    One-Way ANOVA

    Used when we want to compare the means of three or more groups from independent populations.

    Continuous outcome measured on each subject.

    We set up an analysis of variance table and compare the variation between groups to the variation within groups.

    An F-test is used with two different degrees of freedom terms ($k - 1$ and $N - k$, for $k$ groups and $N$ total subjects).

    Chi-Square Test

    Chi-square goodness of fit test

    Assess whether responses fit a specified distribution for one sample of people

    Chi-square test of independence

    Test if two discrete variables are associated in some way for a sample of people

    Chi-square test comparing distributions

    Compare distributions of proportions among two or more independent groups

    Calculating a Sample Size for a Study

    Need a large enough sample to ensure you have the pre-specified amount of precision in the analysis.

    Sample size is determined based on the type of planned analysis:

    Confidence interval

    Hypothesis test

    Calculating a Sample Size for a Study

    We always round up our calculation.

    Need to account for possible dropout from the study. This always increases the required sample size.

    Power

    Linked up with Type II error

    Power = $1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ false})$

    = Probability of correctly rejecting H0 when H0 is false.

    Correlation

    Correlation measures the nature and strength of linear association between two variables at a time.

    Regression: the equation that best describes the relationship between variables.

    Correlation Coefficient

    Population correlation is $\rho$ (rho)

    Sample correlation is $r$, where $-1 \le r \le +1$

    Sign indicates nature of relationship (positive or direct, negative or inverse)

    Magnitude indicates strength

    Linear Regression

    A very popular method for describing the linear relationship between two variables (usually continuous variables).

    We use a scatterplot to display the data graphically and a line to show the association between the two variables.

    Simple Linear Regression

    Y = Dependent, Outcome variable

    X = Independent, Covariate, Predictor variable

    $\hat{y} = b_0 + b_1 x$

    $b_0$ is the Y-intercept, $b_1$ is the slope

    Multiple Linear Regression

    Useful when we want to jointly examine the effect of several X variables on the outcome Y variable.

    Y = continuous outcome variable

    $X_1, X_2, \dots, X_p$ = set of independent or predictor variables

    $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

    Linear Regression

    Predictors can be continuous, indicator variables (0/1) or a set of dummy variables

    Confounding: the effect of a risk factor on an outcome is changed due to the effect of another factor.

    Effect modification: a different relationship between the risk factor and an outcome depending on the level of another variable.

    Logistic Regression

    Used when the outcome is dichotomous (binary), e.g., diseased vs. not diseased.

    Our goals remain the same as for linear regression:

    Is there an association between a variable X and our outcome variable Y? If so, what type?

    Simple Logistic Regression

    We model the probability $p$ of having the disease.

    $p = \dfrac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$

    $\text{logit}(p) = \ln\left(\dfrac{p}{1 - p}\right) = b_0 + b_1 x$
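The two formulas above are inverses of each other: the logistic form turns $b_0 + b_1 x$ into a probability, and the logit turns that probability back into log odds. A minimal Python sketch, using hypothetical coefficients $b_0 = -2$ and $b_1 = 0.5$ chosen only for illustration:

```python
import math


def logistic_probability(b0: float, b1: float, x: float) -> float:
    """Probability of disease: p = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))


def logit(p: float) -> float:
    """Log odds of p: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only (not from any dataset in this review).
b0, b1 = -2.0, 0.5
x = 3.0

p = logistic_probability(b0, b1, x)
print(f"p = {p:.4f}")
# Taking the logit undoes the logistic transform, recovering b0 + b1*x.
print(f"logit(p) = {logit(p):.4f}")
```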

    Multiple Logistic Regression

    Outcome is dichotomous (1 = event, 0 = non-event) and $p = P(\text{event})$

    Outcome is modeled as log odds

    $\exp(b_i)$ = OR

    $\ln\left(\dfrac{p}{1 - p}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$
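Because $\exp(b_i)$ is the odds ratio, the coefficients in the hypertension table of Problem 12 can be converted directly. A short Python sketch, assuming Male Sex and Current Smoker are 0/1 indicators (the table does not state the coding):

```python
import math

# Logistic regression coefficients from the Problem 12 table (outcome = hypertension).
# Assumption: Male Sex and Current Smoker are coded 1 = yes, 0 = no.
coefficients = {
    "Age": 0.02,
    "Male Sex": 0.27,
    "Current Smoker": -0.005,
    "Number of Hrs Exercise/Week": -0.36,
}

# OR_i = exp(b_i): the multiplicative change in the odds of the event for a
# one-unit increase in X_i, holding the other predictors constant.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, odds_ratio in odds_ratios.items():
    print(f"{name}: OR = {odds_ratio:.2f}")
```

For example, $\exp(-0.36) \approx 0.70$, so each additional hour of exercise per week is associated with roughly 30% lower odds of hypertension, holding the other factors constant.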

    Survival Analysis

    Outcome is the time to an event.

    An event could be a heart attack, cancer remission or death.

    Measure whether each person has the event or not (Yes/No) and, if so, their time to event.

    Determine factors associated with longer survival.

    Survival Analysis

    Incomplete follow-up information

    Censoring

    Measure follow-up time and not time to event

    We know survival time > follow-up time

    Log rank test to compare survival in two or more independent groups

    Cox Proportional Hazards Model

    Model:

    $\ln\left(\dfrac{h(t)}{h_0(t)}\right) = b_1 X_1 + b_2 X_2 + \dots + b_p X_p$

    $\exp(b_i)$ = hazard ratio

    Model used to jointly assess effects of independent variables on outcome (time to an event).

    BS704 Practice Problems for Final Exam

    Problem 1.

    Suppose a cross-sectional study is conducted to investigate cardiovascular risk factors among a sample of patients seeking medical care at one of three local hospitals. A total of 300 patients are enrolled. Using the following data, test if there is an association between enrollment site (i.e., hospital) and family history of CVD. Run the appropriate test at a 5% level of significance.

    Family Hx    Hosp 1    Hosp 2    Hosp 3
    Definite         24        14        22
    Probable          8        14         8
    No               68        72        70
    Total           100       100       100

    $H_0$: Site and family history are independent

    $H_1$: $H_0$ is false, $\alpha = 0.05$

    df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4.

    Reject H0 if $\chi^2 > 9.49$

    Observed counts, with expected counts in parentheses:

    Family Hx    Hosp 1     Hosp 2     Hosp 3
    Definite     24 (20)    14 (20)    22 (20)
    Probable      8 (10)    14 (10)     8 (10)
    No           68 (70)    72 (70)    70 (70)
    Total           100        100        100

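    $\chi^2 = \sum \dfrac{(O - E)^2}{E} = 0.8 + 1.8 + 0.2 + 0.4 + 1.6 + 0.4 + 0.06 + 0.06 + 0 = 5.32$

    Do not reject H0 because 5.32 < 9.49. We do not have significant evidence, $\alpha = 0.05$, to show that site and family history are associated.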
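A short Python sketch that reproduces the Problem 1 calculation from the observed counts. It derives each expected count as (row total × column total) / n and sums the terms. At full precision the statistic is about 5.31; the slide's 5.32 comes from rounding the two 4/70 terms up to 0.06. The critical value 9.49 is copied from the slide rather than computed.

```python
# Observed counts from Problem 1: rows = family history, columns = hospital.
observed = [
    [24, 14, 22],  # Definite
    [8, 14, 8],    # Probable
    [68, 72, 70],  # No
]


def chi_square_independence(table):
    """Return (chi-square statistic, expected counts, df) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: E = (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, expected, df


chi2, expected, df = chi_square_independence(observed)
CRITICAL_VALUE = 9.49  # df = 4, alpha = 0.05 (from the slide)
print(f"chi2 = {chi2:.2f}, df = {df}")
print("Reject H0" if chi2 > CRITICAL_VALUE else "Do not reject H0")
```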

    Problem 2.

    The following table summarizes data collected in the study described in Problem 1. The variable summarized below is body mass index (BMI), computed as the ratio of weight in kilograms to height in meters squared.

    BMI         Overall    Hosp 1    Hosp 2    Hosp 3
    N               300       100       100       100
    Mean           24.8      21.6      24.8      27.9
    Std Dev         2.5       2.1       1.8       1.3

    Test if there is a significant difference in the mean BMI scores among hospitals. Show all parts of the test and use a 5% level of significance. (HINT: MSE = 3.1.)

    $H_0: \mu_1 = \mu_2 = \mu_3$, $H_1$: means not all equal, $\alpha = 0.05$

    $SS_B = \sum n_j (\bar{X}_j - \bar{X})^2 = 100\left[(21.6 - 24.8)^2 + (24.8 - 24.8)^2 + (27.9 - 24.8)^2\right]$

    = 100(10.24 + 0 + 9.61) = 1985

    Source     SS        df     MS       F
    Between    1985      2      992.5    320.2
    Error      920.7     297    3.1
    Total      2905.7    299

    Reject H0 if F > 3.09

    F = 320.2

    Reject H0 since 320.2 > 3.09. We have significant evidence, $\alpha = 0.05$, to show that the means are not all equal.
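A Python sketch that checks the ANOVA arithmetic from the summary table. As an addition beyond the slide, it also pools the group standard deviations, $\sum (n_j - 1)s_j^2 / (N - k)$, to recompute MSE; this gives about 3.11, consistent with the HINT value of 3.1 that the slide uses for F = 320.2.

```python
# Summary statistics for BMI by hospital (Problem 2 table).
n = [100, 100, 100]
means = [21.6, 24.8, 27.9]
sds = [2.1, 1.8, 1.3]
grand_mean = 24.8  # "Overall" mean reported in the table

k = len(n)   # number of groups
N = sum(n)   # total sample size

# Between-group sum of squares: SSB = sum n_j (Xbar_j - Xbar)^2
ss_between = sum(nj * (xj - grand_mean) ** 2 for nj, xj in zip(n, means))
ms_between = ss_between / (k - 1)

# Within-group (error) mean square pooled from the group SDs:
# MSE = sum (n_j - 1) s_j^2 / (N - k)
ss_within = sum((nj - 1) * sj ** 2 for nj, sj in zip(n, sds))
mse = ss_within / (N - k)

f_from_hint = ms_between / 3.1   # using the HINT value MSE = 3.1
f_from_sds = ms_between / mse    # using MSE recomputed from the SDs

print(f"SSB = {ss_between:.1f}, MSB = {ms_between:.1f}")
print(f"MSE from SDs = {mse:.3f}")
print(f"F (MSE = 3.1) = {f_from_hint:.1f}, F (MSE from SDs) = {f_from_sds:.1f}")
```

Either version of F is far above the critical value of 3.09, so the decision does not depend on which MSE is used.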

    Problem 3.

    Suppose each participant in the study described in Problem 1 is assigned a cardiovascular risk score (a value between 0 and 100, with higher scores indicating more risk of cardiovascular disease). The mean cardiovascular risk is 21.7 with a standard deviation of 5.6. Suppose that the covariance between BMI and cardiovascular risk is 4.5.

    Compute the sample correlation coefficient between BMI and cardiovascular risk.

    Var(BMI) = $s_x^2 = 2.5^2$, Var(Risk) = $s_y^2 = 5.6^2$

    $r = \dfrac{\text{Cov}(X, Y)}{\sqrt{s_x^2 s_y^2}} = \dfrac{4.5}{(2.5)(5.6)} \approx 0.32$, rounded to 0.3

    Is this correlation statistically significant? Run the appropriate test at a 5% level of significance.

    $H_0: \rho = 0$, $H_1: \rho \neq 0$, $\alpha = 0.05$

    Reject H0 if Z < -1.96 or if Z > 1.96

    $Z = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \dfrac{0.3\sqrt{298}}{\sqrt{1 - (0.3)^2}} = 5.4$

    Reject H0 since 5.4 > 1.96. We have significant evidence, $\alpha = 0.05$, to show that $\rho \neq 0$.
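The correlation and its test statistic for Problem 3 can be reproduced in a few lines of Python. The slide rounds $r$ to 0.3 before computing Z; this sketch shows both the rounded and the unrounded versions.

```python
import math

cov_xy = 4.5   # covariance between BMI and cardiovascular risk
s_x = 2.5      # standard deviation of BMI
s_y = 5.6      # standard deviation of cardiovascular risk
n = 300        # participants in the study


def correlation_test_statistic(r: float, n: int) -> float:
    """Statistic used on the slide: r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = cov_xy / math.sqrt(s_x ** 2 * s_y ** 2)
z_slide = correlation_test_statistic(0.3, n)   # r rounded to 0.3, as on the slide
z_exact = correlation_test_statistic(r, n)     # r kept at full precision

print(f"r = {r:.3f}")
print(f"Z with r = 0.3: {z_slide:.2f}; Z with unrounded r: {z_exact:.2f}")
```

Either way the statistic is well above 1.96, so the conclusion does not depend on the rounding.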

    Problem 5.

    Suppose we restrict our attention to the subgroup of patients at high risk for cardiovascular disease (cardiovascular risk score of 30 or more). Using the following data, test if BMI is significantly different in men versus women. Use a 5% level of significance.

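    BMI        Men     Women
    N          20      10
    Mean       31.6    28.1
    Std Dev    1.7     2.1

    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$, $\alpha = 0.05$

    $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

    df = 20 + 10 - 2 = 28. Reject H0 if t < -2.048 or if t > 2.048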

    $S_p = \sqrt{\dfrac{19(1.7)^2 + 9(2.1)^2}{20 + 10 - 2}} = 1.84$

    $t = \dfrac{31.6 - 28.1}{1.84\sqrt{\dfrac{1}{20} + \dfrac{1}{10}}} = 4.91$

    Reject H0 since 4.91 > 2.048. We have significant evidence, $\alpha = 0.05$, to show there is a difference in mean BMI between men and women.
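A Python sketch of the pooled two-sample t-test computed from the Problem 5 summary statistics. The slide rounds $S_p$ to 1.84 before dividing, which gives 4.91; carrying full precision gives about 4.92, and the decision is the same. The critical value is copied from the slide.

```python
import math

# BMI summary statistics from Problem 5 (high-risk subgroup).
n1, mean1, sd1 = 20, 31.6, 1.7   # men
n2, mean2, sd2 = 10, 28.1, 2.1   # women

df = n1 + n2 - 2
# Pooled standard deviation: sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2))
sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

T_CRIT = 2.048  # two-sided critical value for df = 28, alpha = 0.05 (from the slide)
reject = abs(t) > T_CRIT
print(f"Sp = {sp:.3f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
```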

    Problem 6.

    How many men and women would be required to estimate a difference in mean BMI with a 95% confidence interval and a margin of error not exceeding 1 unit? (Use data from Problem 5 as needed.)

    $n_i = 2\left(\dfrac{Z s}{E}\right)^2 = 2\left(\dfrac{1.96(1.84)}{1}\right)^2 = 26.01$

    Use $S_p$ from Problem 5 as $s$.

    Round up: need 27 men and 27 women.
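The rule that a calculated sample size is always rounded up maps naturally onto `math.ceil`. A minimal Python sketch of the Problem 6 formula (the function name is illustrative, not from any library):

```python
import math


def n_per_group_for_mean_difference(z: float, s: float, margin: float) -> int:
    """n_i = 2 * (z * s / E)^2, always rounded UP to a whole participant."""
    return math.ceil(2 * (z * s / margin) ** 2)


# z = 1.96 for 95% confidence, s = Sp = 1.84 from Problem 5, E = 1 BMI unit.
n_i = n_per_group_for_mean_difference(z=1.96, s=1.84, margin=1.0)
print(f"{n_i} men and {n_i} women")
```

Rounding to the nearest integer instead would give 26 here and miss the target precision, which is why the ceiling is used.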

    Problem 7.

    The following table was constructed based on a comparison of various sociodemographic characteristics between men and women enrolled in the study of cardiovascular risk factors.

    Which, if any, of the characteristics shown above are significantly different between men and women? Justify.

    Problem 9.

    Two different scales are used in a particular laboratory. There is some concern that one scale gives different readings than the other. Ten specimens are randomly selected and weighed on each scale. The data are shown below.

    Test if there is a significant difference in weights between the two scales at $\alpha = 0.05$.

    Specimen    Scale 1    Scale 2
    1           1.2        2.1
    2           3.5        3.6
    3           1.8        1.9
    4           4.0        4.0
    5           5.0        4.9
    6           1.9        2.0
    7           2.7        2.7
    8           2.2        2.3
    9           2.8        2.9
    10          3.5        3.7

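    $H_0: \mu_d = 0$, $H_1: \mu_d \neq 0$, $\alpha = 0.05$

    $t = \dfrac{\bar{X}_d}{s_d / \sqrt{n}}$, df = n - 1 = 9. Reject H0 if t < -2.262 or if t > 2.262

    With differences taken as Scale 2 minus Scale 1:

    $\bar{X}_d = \dfrac{\sum \text{diff}}{n} = \dfrac{1.5}{10} = 0.15$

    $s_d = \sqrt{\dfrac{\sum \text{diff}^2 - (\sum \text{diff})^2 / n}{n - 1}} = \sqrt{\dfrac{0.91 - (1.5)^2/10}{9}} = 0.276$

    $t = \dfrac{0.15}{0.276 / \sqrt{10}} = 1.72$

    Do not reject H0 because -2.262 < 1.72 < 2.262. We do not have significant evidence at $\alpha = 0.05$ to show that $\mu_d \neq 0$.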
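A Python sketch of the Problem 9 paired t-test computed directly from the specimen data, using the same computational formula for $s_d$. The direction Scale 2 minus Scale 1 is inferred from the slide's positive mean difference of 0.15.

```python
import math

scale1 = [1.2, 3.5, 1.8, 4.0, 5.0, 1.9, 2.7, 2.2, 2.8, 3.5]
scale2 = [2.1, 3.6, 1.9, 4.0, 4.9, 2.0, 2.7, 2.3, 2.9, 3.7]

# Differences taken as Scale 2 - Scale 1 (this matches the slide's mean of +0.15).
diffs = [s2 - s1 for s1, s2 in zip(scale1, scale2)]
n = len(diffs)

mean_d = sum(diffs) / n
# Computational formula: s_d = sqrt((sum d^2 - (sum d)^2 / n) / (n - 1))
s_d = math.sqrt((sum(d ** 2 for d in diffs) - sum(diffs) ** 2 / n) / (n - 1))
t = mean_d / (s_d / math.sqrt(n))

T_CRIT = 2.262  # two-sided critical value for df = 9, alpha = 0.05 (from the slide)
reject = abs(t) > T_CRIT
print(f"mean diff = {mean_d:.2f}, s_d = {s_d:.3f}, t = {t:.2f}, reject H0: {reject}")
```

Treating the two columns as independent samples would ignore the pairing by specimen; the paired analysis works only with the within-specimen differences.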

    Problem 10.

    Patients with hypertension are generally recommended to follow a low-salt diet. Surveys report that approximately 75% of patients adhere to these diets. In a random sample of 100 patients with hypertension, 70% report following a low-salt diet. Are these patients significantly low in terms of adherence? Run the test at $\alpha = 0.05$.

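    $H_0: p = 0.75$, $H_1: p < 0.75$, $\alpha = 0.05$

    Reject H0 if Z < -1.645

    $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{0.70 - 0.75}{\sqrt{0.75(0.25)/100}} = -1.15$

    Do not reject H0 because -1.15 > -1.645. We do not have significant evidence at $\alpha = 0.05$ to show that $p < 0.75$.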
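A minimal Python sketch of the one-sample z-test for a proportion in Problem 10. The assumption check uses the rule of thumb that $np_0$ and $n(1 - p_0)$ should both be at least 5; treat that threshold as an assumption, since courses differ on the exact cutoff.

```python
import math

p0 = 0.75     # adherence reported in surveys (null value)
p_hat = 0.70  # observed proportion in the sample
n = 100

# Normal-approximation check (rule of thumb assumed here): n*p0 and n*(1 - p0) both >= 5.
assumption_ok = min(n * p0, n * (1 - p0)) >= 5

# One-sample z statistic for a proportion.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

Z_CRIT = -1.645  # lower-tailed test, alpha = 0.05
reject = z < Z_CRIT
print(f"z = {z:.2f}, reject H0: {reject}")
```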

    Problem 11.

    The following table was presented in a journal and describes the associations between demographic and clinical risk factors and systolic blood pressure.

    Outcome = Systolic Blood Pressure

    Risk Factors                   Regression Coefficient    p
    Intercept                      105.3                     0.0001
    Age                            1.2                       0.0042
    Male Sex                       4.5                       0.0956
    Current Smoker                 -0.5                      0.2354
    Number of Hrs Exercise/Week    -2.4                      0.0003

    a) What type of analysis generated the results summarized above?

    Multiple linear regression analysis, because the outcome (systolic blood pressure) is continuous.

    b) Which of the risk factors are significantly associated with systolic blood pressure?

    Age and number of hours of exercise are statistically significant at the 5% level (both have p values < 0.05). Male sex is marginally significant with a p value of 0.0956.

    c) What is the relative importance of the risk factors?

    The most important (statistically significant) risk factor is number of hours of exercise per week, followed by age and then male sex. Current smoking status is not statistically significant.

    d) How would you interpret the regression coefficient associated with male sex? With number of hours of exercise per week?

    Men's systolic blood pressure is 4.5 units higher than women's, holding age, smoking status and number of hours of exercise constant. Each additional hour of exercise per week is associated with a reduction of 2.4 units of systolic blood pressure, holding age, sex and current smoking status constant.
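To see what "holding the other factors constant" means numerically, here is a Python sketch that plugs two hypothetical patients into the fitted equation from the Problem 11 table. The 0/1 coding of Male Sex and Current Smoker, the age unit (years) and the patient values are all assumptions for illustration.

```python
# Multiple linear regression coefficients from the Problem 11 table
# (outcome = systolic blood pressure).
# Assumptions: Male Sex and Current Smoker are 0/1 indicators; age is in years.
COEF = {
    "intercept": 105.3,
    "age": 1.2,
    "male": 4.5,
    "smoker": -0.5,
    "exercise_hrs": -2.4,
}


def predicted_sbp(age: float, male: int, smoker: int, exercise_hrs: float) -> float:
    """Predicted SBP = b0 + b1*age + b2*male + b3*smoker + b4*exercise_hrs."""
    return (
        COEF["intercept"]
        + COEF["age"] * age
        + COEF["male"] * male
        + COEF["smoker"] * smoker
        + COEF["exercise_hrs"] * exercise_hrs
    )


# Two hypothetical patients who differ only in sex.
man = predicted_sbp(age=50, male=1, smoker=0, exercise_hrs=3)
woman = predicted_sbp(age=50, male=0, smoker=0, exercise_hrs=3)
print(f"Difference (man - woman): {man - woman:.1f}")  # equals the Male Sex coefficient
```

Because the model is additive, the difference between the two predictions is exactly the Male Sex coefficient, whatever common values are chosen for the other predictors.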

    Problem 12.

    The following table was presented in a journal and describes the associations between demographic and clinical risk factors and hypertension.

    Outcome = Hypertension

    Risk Factors                   Regression Coefficient    p
    Intercept                      3.5                       0.0001
    Age                            0.02                      0.0357
    Male Sex                       0.27                      0.0264
    Current Smoker                 -0.005                    0.7564
    Number of Hrs Exercise/Week    -0.36                     0.0111

    a) What type of analysis generated the results summarized above?

    Multiple logistic regression analysis, because the outcome (hypertension) is dichotomous.

    b) Which of the risk factors are significantly associated with hypertension?

    Age, male sex and number of hours of exercise are statistically significant at the 5% level (all three have p values < 0.05).

    c) What is the relative importance of the risk factors?

    The most important (statistically significant) risk factor is number of hours of exercise per week, followed by male sex and then age. Current smoking status is not statistically significant.

    [Table not recoverable from this transcript; column headings: Radiation, Surgery, Neither, Total]

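    Problem 13.

    $H_0$: Age and treatment recommendation are independent

    $H_1$: $H_0$ is false, $\alpha = 0.05$

    $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

    df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4.

    Reject H0 if $\chi^2 > 9.49$

    $\chi^2 = \dfrac{(35 - 34.1)^2}{34.1} + \dfrac{(15 - 28.9)^2}{28.9} + \dfrac{(50 - 37)^2}{37} + \dfrac{(29 - 34.1)^2}{34.1} + \dfrac{(30 - 28.9)^2}{28.9} + \dfrac{(41 - 37)^2}{37} + \dots$ (remaining terms not recoverable)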

    Reject H0 since $\chi^2 > 9.49$. We have significant evidence, $\alpha = 0.05$, to show that age and treatment recommendation are not independent.

    Problem 14.

    For each of the following scenarios, indicate which test would be used. Use the letters below to indicate the test in the space provided. Note that the same test might be used for more than one scenario.

    a) Compare mean to historical/external control

    b) Compare proportion to historical/external control

    c) Compare two independent means

    d) Compare two matched/paired means

    e) Analysis of variance

    f) Chi-square goodness of fit test

    g) Chi-square test of independence

    h) Correlation analysis

    i) Linear regression analysis

    j) Logistic regression analysis

    k) Survival analysis

    Scenario (followed by the letter of the test used)

    1. We want to test if there is a significant association between BMI (kg/m²) and incident myocardial infarction adjusting for age, sex, systolic blood pressure and smoking.

    j

    2. We want to test if a new environmental intervention is effective in reducing exposure to second-hand smoke. Each participant in the study has levels of exposure measured before and after the intervention is implemented.

    d

    3. We wish to test if there is a significant association between GRE scores and first-year GPA in MPH students who matriculated in fall 2011.

    h or i

    4. We want to determine if there are significant differences in ages of participants enrolled in a study comparing those with a family history of cardiovascular disease to those without.

    c

    5. A study reports that 15% of college freshmen smoke. We want to test if significantly more BU freshmen smoke.

    b

    6. We want to test if there is a difference in preterm versus term deliveries among women of black, Hispanic and white race.

    g

    7. We want to test if nutritional supplements prolong life (minimize time to death) in persons over 65 years of age, adjusted for sex and other comorbid conditions.

    k

    8. A clinical trial is run to assess the safety of a new drug compared to a standard drug, and the outcome is development of skin rash or not.

    g or j

    9. We want to test if there is a difference in mean time to complete a physical task when comparing 12, 13, 14 and 15 year olds.

    e

    10. We want to test whether smoking in pregnancy increases the risk of infection in newborns.

    g or k