Lecture

Transcript of Lecture

  • Statistics for Medical Researchers
    Hongshik Ahn
    Professor, Department of Applied Math and Statistics, Stony Brook University
    Biostatistician, Stony Brook GCRC

  • Contents
    1. Experimental Design
    2. Descriptive Statistics and Distributions
    3. Comparison of Means
    4. Comparison of Proportions
    5. Power Analysis/Sample Size Calculation
    6. Correlation and Regression

  • 1. Experimental Design
    Experiment
      Treatment: something that researchers administer to experimental units
      Factor: controlled independent variable whose levels are set by the experimenter
    Experimental design
      Control
      Treatment
      Placebo effect
      Blind: single blind, double blind, triple blind

  • 1. Experimental Design
    Randomization
      Completely randomized design
      Randomized block design: used if there are specific differences among groups of subjects
      Permuted block randomization: used for small studies to maintain reasonably good balance among groups
      Stratified block randomization: matching

  • 1. Experimental Design
    Completely randomized design
      Computer-generated sequence: 4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0, ...
      Two groups (criterion: even → A, odd → B): AABABAAABAABAAA
      Three groups (criterion: {1,2,3} → A, {4,5,6} → B, {7,8,9} → C; ignore 0s): BCAACABBABAABA
      Two groups with a different randomization ratio (e.g., 2:3; criterion: {0,1,2,3} → A, {4,5,6,7,8,9} → B): BBAABABBABAABAA
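The three allocation rules on this slide can be reproduced with a small lookup. This is a minimal sketch in Python; the function name `assign_groups` and the dictionary names are illustrative and not part of the lecture.

```python
def assign_groups(digits, mapping):
    """Map each random digit to a group label; digits missing from the mapping are skipped."""
    return "".join(mapping[d] for d in digits if d in mapping)


# Random digit sequence quoted on the slide.
sequence = [4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0]

# Two groups: even digits -> A, odd digits -> B.
even_odd = {d: ("A" if d % 2 == 0 else "B") for d in range(10)}

# Three groups: {1,2,3} -> A, {4,5,6} -> B, {7,8,9} -> C; 0 is ignored.
three_groups = {
    **dict.fromkeys([1, 2, 3], "A"),
    **dict.fromkeys([4, 5, 6], "B"),
    **dict.fromkeys([7, 8, 9], "C"),
}

# Two groups in a 2:3 ratio: {0,1,2,3} -> A, {4,...,9} -> B.
ratio_2_3 = {d: ("A" if d <= 3 else "B") for d in range(10)}

two = assign_groups(sequence, even_odd)
three = assign_groups(sequence, three_groups)
unequal = assign_groups(sequence, ratio_2_3)
```

With the slide's sequence, `two`, `three` and `unequal` match the three allocation strings listed on the slide.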

  • 1. Experimental Design
    Permuted block randomization
      With a block size of 4 for two groups (A, B), there are 6 possible permutations, which can be coded as:
        1=AABB, 2=ABAB, 3=ABBA, 4=BAAB, 5=BABA, 6=BBAA
      Each number in the random number sequence in turn selects the next block, determining the next four participant allocations (numbers 0, 7, 8 and 9 are ignored).
      e.g., the sequence 67126814... will produce BBAA AABB ABAB BBAA AABB BAAB.
      In practice, a block size of four is too small, since researchers may crack the code and risk selection bias. Mixing block sizes of 4 and 6 is better, with the size kept unknown to the investigator.
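A short Python sketch of this block-selection scheme follows. The names `BLOCKS` and `permuted_blocks` are illustrative. Sorting the distinct arrangements of two As and two Bs alphabetically happens to give the same 1 to 6 coding as the slide.

```python
from itertools import permutations

# The six distinct orderings of AABB, numbered 1..6 in alphabetical order.
BLOCKS = {
    code: "".join(block)
    for code, block in enumerate(sorted(set(permutations("AABB"))), start=1)
}


def permuted_blocks(digits):
    """Turn random digits into allocation blocks, ignoring digits without a block (0, 7, 8, 9)."""
    return [BLOCKS[d] for d in digits if d in BLOCKS]


# Digit sequence 67126814 from the slide.
allocation = permuted_blocks([6, 7, 1, 2, 6, 8, 1, 4])
```

Every block contains two As and two Bs, so the groups are balanced after every fourth participant.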

  • 1. Experimental Design
    Methods of Sampling
      Random sampling
      Systematic sampling
      Convenience sampling
      Stratified sampling

  • 1. Experimental Design
    Random Sampling
      Selection so that each individual member has an equal chance of being selected
    Systematic Sampling
      Select some starting point and then select every k-th element in the population

  • 1. Experimental Design
    Convenience Sampling
      Use results that are easy to get

  • 1. Experimental Design
    Stratified Sampling
      Draw a sample from each stratum
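As a rough illustration of random, systematic and stratified sampling, the sketch below draws from a hypothetical population of 100 numbered subjects. The population, the two strata and the sample sizes are assumptions made up for the example.

```python
import random


def systematic_sample(population, k, start=0):
    """Take every k-th element, beginning at index `start`."""
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


rng = random.Random(0)               # fixed seed so the example is repeatable
population = list(range(1, 101))     # hypothetical subject IDs 1..100

simple = rng.sample(population, 10)  # random sampling: every subject equally likely
systematic = systematic_sample(population, k=10, start=rng.randrange(10))
strata = {"stratum_1": population[:50], "stratum_2": population[50:]}  # hypothetical strata
stratified = stratified_sample(strata, 5, rng)
```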

  • 2. Descriptive Statistics & Distributions
    Parameter: population quantity
    Statistic: summary of the sample
    Inference for parameters: use the sample
    Central tendency
      Mean (average)
      Median (middle value)
    Variability
      Variance: measure of variation
      Standard deviation (sd): square root of variance
      Standard error (se): sd of the estimate
    Median, quartiles, min., max., range, boxplot
    Proportion
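The summaries listed on this slide can be computed with Python's standard `statistics` module. The data below are hypothetical, and the standard error shown is that of the sample mean, sd divided by the square root of n.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)        # sample variance (divides by n - 1)
sd = statistics.stdev(values)                 # square root of the sample variance
se = sd / math.sqrt(n)                        # standard error of the sample mean
quartiles = statistics.quantiles(values, n=4) # three cut points: Q1, Q2, Q3
value_range = max(values) - min(values)
```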

  • 2. Descriptive Statistics & Distributions
    Normal distribution

  • 2. Descriptive Statistics & Distributions
    Standard normal distribution: mean 0, variance 1

  • 2. Descriptive Statistics & Distributions
    Z-test for means
    T-test for means if the sd is unknown

  • 3. Inference for Means
    Two-sample t-test
      Two independent groups: control and treatment
      Continuous variables
      Assumption: populations are normally distributed
    Checking normality
      Histogram
      Normal probability plot (Q-Q plot): is it straight?
      Shapiro-Wilk test, Kolmogorov-Smirnov test, Anderson-Darling test
    If the normality assumption is violated
      T-test is not appropriate.
      Possible transformation
      Use a non-parametric alternative: Mann-Whitney U-test (Wilcoxon rank-sum test)

  • 3. Inference for Means
    A clinical trial on the effectiveness of drug A in preventing premature birth
    30 pregnant women are randomly assigned to control and treatment groups of size 15 each
    Primary endpoint: weight of the babies at birth

            Treatment   Control
    n       15          15
    mean    7.08        6.26
    sd      0.90        0.96

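  • 3. Inference for Means
    Hypothesis: The group means are different
      Null hypothesis ($H_0$): $\mu_1 = \mu_2$
      Alternative hypothesis ($H_1$): $\mu_1 \ne \mu_2$
    Significance level: $\alpha = 0.05$
    Assumption: equal variance
    Degrees of freedom (df): $n_1 + n_2 - 2$
    Calculate the T-value (test statistic): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

    P-value: probability, under $H_0$, of a test statistic at least as extreme as the one observed. Rejecting only when it is below $\alpha$ holds the Type I error rate (false positive rate) at $\alpha$.
    Reject $H_0$ if p-value < $\alpha$
    Do not reject $H_0$ if p-value > $\alpha$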

  • 3. Inference for Means

    Previous example: test at $\alpha = 0.05$

    P-value: 0.026 < 0.05
    Reject the null hypothesis that there is no drug effect.
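As a rough check of this example, the pooled (equal-variance) t statistic can be computed from the summary table alone. The sketch below is not the lecture's own calculation. It assumes the usual pooled-variance formula and compares the statistic with the tabulated two-sided 5% critical value for 28 degrees of freedom (2.048) rather than reproducing the p-value. The function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic under the equal-variance assumption, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (mean1 - mean2) / se, df, se


# Treatment vs. control birth weights from the clinical trial slide.
t_value, df, se = pooled_t(7.08, 0.90, 15, 6.26, 0.96, 15)

T_CRIT_28 = 2.048  # two-sided 5% critical value of t with 28 df (standard t table)
reject_h0 = abs(t_value) > T_CRIT_28
```

Because the inputs are rounded summary statistics, the result may differ slightly from a calculation based on the raw data.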

  • 3. Inference for Means
    Confidence interval (CI): an interval of values used to estimate the true value of a population parameter.
    The confidence level $1 - \alpha$ is the proportion of times that the CI actually contains the population parameter, assuming that the estimation process is repeated a large number of times.
    Common choices: 90% CI ($\alpha$ = 10%), 95% CI ($\alpha$ = 5%), 99% CI ($\alpha$ = 1%)

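  • 3. Inference for Means
    CI for a comparison of two means: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\; s_p \sqrt{1/n_1 + 1/n_2}$

    where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $df = n_1 + n_2 - 2$
    A 95% CI for the previous example: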
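The numerical interval for the birth-weight example is not preserved in this transcript. Assuming the usual pooled-variance interval (difference in means plus or minus the t critical value times the standard error), it can be recomputed from the summary statistics as sketched below. The critical value 2.048 for 28 degrees of freedom comes from a standard t table, and `pooled_ci` is an illustrative name.

```python
import math


def pooled_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Confidence interval for mean1 - mean2 under the equal-variance assumption."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    margin = t_crit * math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin


# 95% CI for (treatment mean - control mean); t_crit is the 28-df table value.
low, high = pooled_ci(7.08, 0.90, 15, 6.26, 0.96, 15, t_crit=2.048)
```

Under these assumptions the interval is roughly 0.12 to 1.52. It excludes 0, which agrees with rejecting the null hypothesis at the 5% level.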

  • 3. Inference for Means
    SAS programming for Two-Sample T-test
    Data steps: Click File → Click Import Data → Select a data source → Click Browse and find the path of the data file → Click Next → Fill in the Member field with the name of the SAS data set → Click Finish
    Procedure steps: Click Solutions → Click Analysis → Click Analyst → Click File → Click Open By SAS Name → Select the SAS data set and Click OK → Click Statistics → Click Hypothesis Tests → Click Two-Sample T-test for Means → Select the independent variable as Group and the dependent variable as Dependent → Choose the hypothesis of interest and Click OK

  • 3. Inference for Means
    Click Statistics to select the statistical procedure.
    Click File to open the SAS data set.
    Click File to import data and create the SAS data set.
    Click Solutions to create a project to run the statistical test.

  • 3. Inference for Means
    Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
      Nonparametric alternative to the two-sample t-test
      The populations don't need to be normal
      $H_0$: The two samples come from populations with equal medians
      $H_1$: The two samples come from populations with different medians

  • 3. Inference for Means
    Mann-Whitney U-Test Procedure
      Temporarily combine the two samples into one big sample, then replace each sample value with its rank
      Find the sum of the ranks for either one of the two samples
      Calculate the value of the z test statistic
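The three steps can be sketched in Python as follows. The slide's own formula for z is not preserved in the transcript, so this uses the common large-sample normal approximation: with rank sum $R$ for sample 1, $\mu_R = n_1(n_1 + n_2 + 1)/2$ and $\sigma_R = \sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}$. No tie correction is applied to $\sigma_R$. The function names and the data are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_z(sample1, sample2):
    """Rank sum of sample1, its z statistic, and a two-sided normal-approximation p-value."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # step 1: combine and rank
    r1 = sum(ranks[:n1])                                  # step 2: rank sum of sample 1
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma                                 # step 3: z statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, z, p_value


# Hypothetical measurements for two independent groups of 11 subjects each.
group_a = [20.1, 22.4, 23.0, 24.8, 25.5, 26.1, 27.3, 28.0, 29.6, 30.2, 31.7]
group_b = [19.5, 21.0, 22.9, 23.7, 24.1, 25.0, 26.8, 27.9, 28.4, 29.0, 33.1]
r1, z, p_value = rank_sum_z(group_a, group_b)
```

The normal approximation is usually recommended only when both samples are reasonably large. Exact tables or software are preferable for small samples.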

  • 3. Inference for Means
    Mann-Whitney U-Test, Example
      Numbers in parentheses are their ranks, beginning with a rank of 1 assigned to the lowest value of 17.7.
      $R_1$ and $R_2$: sums of ranks

  • 3. Inference for Means
    Hypothesis: The group medians are different
      $H_0$: Men and women have the same median BMI
      $H_1$: Men and women have different median BMIs

    p-value = 0.33, thus we do not reject $H_0$ at $\alpha = 0.05$. There is no significant difference in BMI between men and women.

  • 3. Inference for Means
    SAS Programming for Mann-Whitney U-Test Procedure
    Data steps: The same as slide 21.
    Procedure steps: Click Solutions → Click Analysis → Click Analyst → Click File → Click Open By SAS Name → Select the SAS data set and Click OK → Click Statistics → Click ANOVA → Click Nonparametric One-Way ANOVA → Select the Dependent and Independent variables respectively and choose the test of interest → Click OK

  • 3. Inference for Means
    Click Statistics to select the statistical procedure.
    Click File to open the SAS data set.
    Select the dependent and independent variables:

  • 3. Inference for Means
    Paired t-test
      Mean difference of matched pairs
      Test for changes (e.g., before & after)
      The measures in each pair are correlated.
      Assumption: population is normally distributed
      Take the difference in each pair and perform a one-sample t-test.
    Check normality
    If the normality assumption is violated
      T-test is not appropriate.
      Use a non-parametric alternative: Wilcoxon signed rank test

  • 3. Inference for Means
    Notation for paired t-test
      $d$ = individual difference between the two values of a single matched pair
      $\mu_d$ = mean value of the differences $d$ for the population of paired data
      $\bar{d}$ = mean value of the differences $d$ for the paired sample data
      $s_d$ = standard deviation of the differences $d$ for the paired sample data
      $n$ = number of pairs

  • 3. Inference for Means
    Example: Systolic Blood Pressure

    OC: Oral contraceptive

    ID   Without OCs   With OCs   Difference
    1    115           128        13
    2    112           115        3
    3    107           106        -1
    4    119           128        9
    5    115           122        7
    6    138           145        7
    7    126           132        6
    8    105           109        4
    9    104           102        -2
    10   115           117        2

  • 3. Inference for Means
    Hypothesis: The group means are different
      $H_0$: $\mu_d = 0$ vs. $H_1$: $\mu_d \ne 0$
    Significance level: $\alpha = 0.05$
    Degrees of freedom (df): $n - 1 = 9$
    Test statistic: $t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$

    P-value: 0.009, thus reject $H_0$ at $\alpha = 0.05$.
    The data support the claim that oral contraceptives affect the systolic bp.

  • 3. Inference for Means
    Confidence interval for matched pairs
    $100(1 - \alpha)\%$ CI: $\bar{d} \pm t_{\alpha/2,\,n-1}\, \dfrac{s_d}{\sqrt{n}}$

    95% CI for the mean difference of the systolic bp:

    (1.53, 8.07)
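The paired analysis can be retraced from the ten blood-pressure pairs in the example. The sketch below uses Python's standard library. The critical value 2.262 (two-sided 5%, 9 degrees of freedom) is taken from a standard t table, and the variable names are illustrative.

```python
import math
import statistics

# Systolic blood pressure for the 10 women in the example.
without_oc = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
with_oc = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# Work with the within-pair differences (with OCs minus without OCs).
diffs = [after - before for before, after in zip(without_oc, with_oc)]
n = len(diffs)
d_bar = statistics.mean(diffs)      # mean difference
s_d = statistics.stdev(diffs)       # sample sd of the differences
se = s_d / math.sqrt(n)

t_value = d_bar / se                # one-sample t statistic on the differences, df = n - 1

T_CRIT_9 = 2.262                    # two-sided 5% critical value of t with 9 df (t table)
ci_95 = (d_bar - T_CRIT_9 * se, d_bar + T_CRIT_9 * se)
```

The mean difference is 4.8 and the t statistic is about 3.32, which exceeds the critical value. The interval rounds to the (1.53, 8.07) quoted on the slide.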

  • 3. Inference for Means
    SAS Programming for Paired T-test
    Data steps: The same as slide 21.
    Procedure steps: Click Solutions → Click Analysis → Click Analyst → Click File → Click Open By SAS Name → Select the SAS data set and Click OK → Click Statistics → Click Hypothesis Tests → Click Two-Sample Paired T-test for Means → Select the Group1 and Group2 variables respectively → Click OK

    (Note: You can also calculate the difference and use it as the dependent variable to run the one-sample t-test.)

  • 3. Inference for Means
    Click Statistics to select the statistical procedure.
    Click File to open the SAS data set.
    Put the two group variables into Group 1 and Group 2.

  • 3. Inference for Means
    Comparison of more than two means: ANOVA (Analysis of Variance)
      One-way ANOVA: one factor, e.g., control, drug 1, drug 2
      Two-way ANOVA: two factors, e.g., drugs, age groups
      Repeated measures: if there are repeated measures within subjects, such as time points

  • 3. Inference for Means
    Example: Pulmonary disease
    Endpoint: Mid-expiratory flow (FEF) in L/s
    6 groups: nonsmokers (NS), passive smokers (PS), noninhaling smokers (NI), light smokers (LS), moderate smokers (MS) and heavy smokers (HS)

    Group name   Mean FEF   SD FEF   n
    NS           3.78       0.79     200
    PS           3.30       0.77     200
    NI           3.32       0.86     50
    LS           3.23       0.78     200
    MS           2.73       0.81     200
    HS           2.59       0.82     200

  • 3. Inference for Means
    Example: Pulmonary disease
      $H_0$: the group means are the same
      $H_1$: not all the group means are the same

    P-value
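The one-way ANOVA F statistic for this example can be computed from the group means, standard deviations and sizes alone. The sketch below assumes the standard between-group and within-group sums of squares. It is an illustration rather than the lecture's own calculation, and `anova_from_summary` is an illustrative name.

```python
# (mean FEF, sd FEF, n) for each smoking group, from the summary table.
groups = {
    "NS": (3.78, 0.79, 200),
    "PS": (3.30, 0.77, 200),
    "NI": (3.32, 0.86, 50),
    "LS": (3.23, 0.78, 200),
    "MS": (2.73, 0.81, 200),
    "HS": (2.59, 0.82, 200),
}


def anova_from_summary(groups):
    """One-way ANOVA F statistic and degrees of freedom from per-group summaries."""
    total_n = sum(n for _, _, n in groups.values())
    grand_mean = sum(mean * n for mean, _, n in groups.values()) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups.values())
    ss_within = sum((n - 1) * sd**2 for _, sd, n in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_between, df_within = anova_from_summary(groups)
```

The F statistic comes out near 58 on 5 and 1044 degrees of freedom, far above the 5% critical value of an F distribution with those degrees of freedom (about 2.2).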

  • 3. Inference for Means
    SAS Programming for One-Way ANOVA
    Data steps: The same as slide 21.
    Procedure steps: Click Solutions → Click Analysis → Click Analyst → Click File → Click Open By SAS Name → Select the SAS data set and Click OK → Click Statistics → Click ANOVA → Click One-Way ANOVA → Select the Independent and Dependent variables respectively → Click OK

  • 3. Inference for Means
    Click Solutions to select the statistical procedure.
    Click File to open the SAS data set.
    Select the dependent and independent variables:

  • 4. Inference for Proportions
    Chi-square test
      Testing the difference of two proportions
      n: #successes, p: success rate
      Requirement: [conditions missing]
      $H_0$: $p_1 = p_2$
      $H_1$: $p_1 \ne p_2$ (for two-sided test)
      If the requirement is not satisfied, use Fisher's exact test.
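A minimal sketch of the chi-square test for two proportions is shown below, using hypothetical counts (30 successes out of 100 versus 45 out of 100). It computes the Pearson statistic for the 2x2 table without continuity correction. For 1 degree of freedom the upper-tail probability equals `erfc(sqrt(chi2 / 2))`, so no external library is needed.

```python
import math


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square test of p1 = p2 from successes x and group sizes n (no continuity correction)."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical data: 30/100 successes in group 1, 45/100 in group 2.
chi2, p_value = chi_square_2x2(30, 100, 45, 100)
```

With these made-up counts the statistic is 4.8 and the two-sided p-value is a little under 0.03.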

  • 5. Power/Sample Size Calculation
    Decide significance level (e.g., 0.05)
    Decide desired power (e.g., 80%)
    One-sided or two-sided test
    Comparison of means: two-sample t-test
      Need to know sample means in each group
      Need to know sample sds in each group
      Calculation: use software (nQuery, power, etc.)
    Comparison of proportions: Chi-square test
      Need to know sample proportions in each group
      Continuity correction
      Small sample size: Fisher's exact test
      Calculation: use software
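To give a feel for what such software computes, the sketch below applies the textbook normal-approximation formula for a two-sided comparison of two means with equal group sizes, $n = 2\,\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \Delta^2$ per group. This is an approximation and not necessarily the method of any particular package. Exact t-based calculations give slightly larger numbers. The inputs (difference 0.82, common sd 0.93) are borrowed from the birth-weight example purely for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)


# Assumed inputs: detect a mean difference of 0.82 when the common sd is 0.93.
n = n_per_group(delta=0.82, sd=0.93)
```

Smaller differences or larger standard deviations drive the required sample size up quadratically.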

  • 6. Correlation and Regression
    Correlation
      Pearson correlation for continuous variables
      Spearman correlation for ranked variables
      Chi-square test for categorical variables
    Pearson correlation
      Correlation coefficient (r): $-1 \le r \le 1$

  • 6. Correlation and Regression
    Regression
    Objective
      Find out whether a significant linear relationship exists between the response and independent variables
      Use it to predict a future value
    Notation
      X: independent (predictor) variable
      Y: dependent (response) variable
    Multiple linear regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

    where $\varepsilon$ is the random error
    Checking the model (assumptions)
      Normality: Q-Q plot, histogram, Shapiro-Wilk test
      Equal variance: the plot of predicted y vs. error shows a band shape
      Linear relationship: predicted y vs. each x

  • 6. Correlation and Regression

    Weight (x1) in lb   Age (x2)   Blood pressure (y)
    152                 50         120
    183                 20         141
    171                 20         124
    165                 30         126
    158                 30         117
    161                 50         129
    149                 60         123
    158                 50         125
    170                 40         132
    153                 55         123
    164                 40         132
    190                 40         155
    185                 20         147

  • 6. Correlation and Regression
    The regression equation is $\hat{y} = -65.1 + 1.08\,x_1 + 0.425\,x_2$

    The mean blood pressure increases by 1.08 if weight (x1) increases by one pound and age (x2) remains fixed. Similarly, a 1-year increase in age with the weight held fixed will increase the mean blood pressure by 0.425.

    s = 2.509, R² = 95.8%
    Error sd is estimated as 2.509 with df = 13 - 3 = 10.
    95.8% of the variation in y can be explained by the regression.

    Predictor   Coefficient   se       T-ratio   P-value
    Constant    -65.10        14.94    -4.36     0.001
    x1          1.077         0.077    13.98     0.000
    x2          0.425         0.073    5.82      0.000
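The fitted coefficients, s and R² can be retraced from the 13 observations with ordinary least squares. The sketch below solves the normal equations in plain Python. It illustrates the method, not the SAS procedure used in the lecture, and the helper names `solve` and `fit_linear` are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# (weight in lb, age) and blood pressure for the 13 subjects in the example.
predictors = [(152, 50), (183, 20), (171, 20), (165, 30), (158, 30), (161, 50), (149, 60),
              (158, 50), (170, 40), (153, 55), (164, 40), (190, 40), (185, 20)]
bp = [120, 141, 124, 126, 117, 129, 123, 125, 132, 123, 132, 155, 147]

b0, b1, b2 = fit_linear(predictors, bp)

fitted = [b0 + b1 * w + b2 * a for w, a in predictors]
sse = sum((yi - fi) ** 2 for yi, fi in zip(bp, fitted))  # residual sum of squares
y_bar = sum(bp) / len(bp)
sst = sum((yi - y_bar) ** 2 for yi in bp)                # total sum of squares
r_squared = 1 - sse / sst
s = math.sqrt(sse / (len(bp) - 3))                       # error sd with df = 13 - 3
```

The coefficients, `s` and `r_squared` should agree with the table on this slide up to rounding.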

  • 6. Correlation and Regression
    SAS Programming for Linear Regression
    Data steps: The same as slide 21.
    Procedure steps: Click Solutions → Click Analysis → Click Analyst → Click File → Click Open By SAS Name → Select the SAS data set and Click OK → Click Statistics → Click Regression → Click Linear → Select the Dependent (Response) variable and the Explanatory (Predictor) variable respectively → Click OK

  • 6. Correlation and Regression
    Click Solutions to select the statistical procedure.
    Click File to open the SAS data set.
    Select the dependent and explanatory variables:

  • 6. Correlation and Regression
    Other regression models
      Polynomial regression
      Transformation
      Logistic regression
