Review of Methods from Prerequisite Course


Page 1: Review of Methods from Prerequisite Course

Review of Methods from Prerequisite Course

Assuming exposure to all of the content from STAT 601 – Statistical Methods for Healthcare Research

Page 2: Review of Methods from Prerequisite Course

Presentation Outline
• Review of variable types
• Review will cover both descriptive and inferential methods
• Methods for numeric (or possibly ordinal) response variables
• Methods for categorical (or possibly ordinal) response variables

* Before viewing this presentation, download and print the supplements!

Page 3: Review of Methods from Prerequisite Course

Brief Review of Data Types
There are three main data types, some of which have further subclasses.

• Continuous – measurements or counts. Important subclasses: discrete, continuous, ratio scale, & interval scale (Wiki these scales).

• Ordinal – ordered categories. May be coded numerically and could be treated as such.

• Nominal – unordered categories. May also be coded numerically, BUT cannot be treated as such.

Page 4: Review of Methods from Prerequisite Course

Brief Review of Data Types
In JMP (and SPSS) these are the three classifications. In JMP (which we’ll use):
• Continuous variables are denoted: (icon shown on slide)
• Ordinal variables are denoted: (icon shown on slide)
• Nominal variables are denoted: (icon shown on slide)

Page 5: Review of Methods from Prerequisite Course

ICU Study – used in most examples
• This study consists of 200 subjects who were admitted to an adult intensive care unit (ICU). A major goal of this study was to predict the probability of survival to hospital discharge of these patients (Lemeshow, Teres, Avrunin & Pastides, 1988).

• Several measurements were taken at the time of admission, and the ultimate survival of the patients was recorded.

Page 6: Review of Methods from Prerequisite Course

ICU Study – used in most examples

The variable descriptions and coding are found in this table.

Comments: Notice that most of the information has been coded numerically, although only Age, Systolic BP, and Heart Rate are continuous.

Some of the dichotomous variables have been created using continuous measurements (e.g. PO2, PH, PCO, etc.)

The Level of Consciousness variable (LOC) could be treated as ordinal as the levels indicate increasing states of unresponsiveness.

Page 7: Review of Methods from Prerequisite Course

Methods for a Numeric Response

Print this flowchart for reference (see website)

• One population inference

• Two population inference

• More than two population inference

Covers both parametric and nonparametric methods.

Page 8: Review of Methods from Prerequisite Course

One or Two or More Populations?

• Is the study comparative in nature or are we making an inference about a single population?

• Most studies are certainly comparative (i.e. multivariable) in nature!

• However, we will review methods for a single numeric variable first.

Page 9: Review of Methods from Prerequisite Course

Methods for a Single Numeric Variable

Descriptive Methods

Visual Descriptions
• Histogram
• Boxplots
• Stem-and-Leaf Plots (archaic)
• Cumulative Distribution Plots (CDF)
• Normal Quantile Plots

Numeric Descriptions
• Measures of central tendency
• Measures of variation
• Measures of relative standing
• Measures of distributional shape

Page 10: Review of Methods from Prerequisite Course

Plots for a Single Numeric Variable

Visual Summaries of Heart Rate @ Admission (ICU Study)

• Histogram
• Boxplots (outlier and quantile)
• Normal quantile plot
• CDF plot

CDF Plot – shows P(X ≤ x) vs. x

e.g. P(X ≤ 100) = .60, i.e. there is a 60% chance that a patient’s heart rate at admission to the ICU is less than or equal to 100 bpm.
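The height of a CDF plot at x is simply the proportion of observations at or below x. Below is a minimal sketch of that empirical CDF. The heart rates in it are hypothetical values, not the ICU study data.

```python
from bisect import bisect_right

def ecdf(sample, x):
    """Empirical P(X <= x): the proportion of observations less than or equal to x."""
    ordered = sorted(sample)
    # bisect_right returns how many sorted values are <= x
    return bisect_right(ordered, x) / len(ordered)

# Hypothetical heart rates (bpm), for illustration only
heart_rates = [72, 88, 95, 100, 100, 104, 110, 118, 126, 140]
print(ecdf(heart_rates, 100))  # 0.5
```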

Page 11: Review of Methods from Prerequisite Course

Summary Statistics for a Numeric Variable

Measures of Central Tendency
• Mean, Median, Mode (3 M’s) – the mode need not be unique!
• Trimmed Mean (5%) – the mean computed after 5% of the observations are trimmed off each tail.
• Geometric Mean – the mean in the log scale, transformed back to the original scale. A good measure of center for right-skewed data!
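The measures of central tendency above can be computed with Python's standard `statistics` module. This is a minimal sketch. It assumes the trimmed mean removes floor(proportion × n) observations from each tail, and JMP's exact trimming convention may differ. The data are hypothetical.

```python
import math
import statistics

def trimmed_mean(data, proportion=0.05):
    """Mean after dropping floor(proportion * n) observations from EACH tail."""
    ordered = sorted(data)
    k = int(proportion * len(ordered))
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical right-skewed sample
data = [1, 2, 2, 3, 4, 5, 6, 8, 12, 40]
print(statistics.mean(data))            # 8.3
print(statistics.median(data))          # 4.5
print(statistics.multimode(data))       # [2]  (a list, since the mode need not be unique)
print(trimmed_mean(data, 0.10))         # 5.25 (one value trimmed from each tail)
print(statistics.geometric_mean(data))  # exp(mean of log values), pulled down less by the 40
```

The single large value (40) pulls the mean well above the median. The trimmed and geometric means are less affected by it.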

 

Page 12: Review of Methods from Prerequisite Course

Summary Statistics for a Numeric Variable

Measures of Relative Standing
• Quantiles/Percentiles – values such that k% of the observations are less and (100 − k)% are greater.

• Quartiles – specific percentiles: Q1 – first quartile (25th percentile), Q2 – second quartile (median), Q3 – third quartile (75th percentile).

Measures of Shape
• Skewness – measures the degree of asymmetry of the distribution. If the distribution is symmetric (e.g. normal), then skewness is 0. If skewness > 0, the distribution is skewed to the right; if skewness < 0, the distribution is skewed to the left.

• Kurtosis – measures the heaviness of the tails relative to a normal distribution. If the distribution is approximately normal, the kurtosis is about zero. If it is positive, the distribution has heavier tails than a normal distribution (outliers on each end); if it is negative, the distribution has thinner tails than a normal distribution and more observations near the mean. (Wiki kurtosis for pictures)
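The sketch below computes the simple moment-based versions of these two shape measures: skewness g1 and excess kurtosis g2. Excess kurtosis subtracts 3 so that a normal distribution scores 0. Statistical packages often apply small-sample bias adjustments, so the values they report may differ somewhat from these.

```python
import statistics

def moment_skew_kurt(data):
    """Moment-based sample skewness (g1) and excess kurtosis (g2), no bias adjustment."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0  # subtract 3 so that a normal distribution gives 0
    return g1, g2

print(moment_skew_kurt([1, 2, 3, 4, 5]))   # symmetric: skewness 0, light tails: negative g2
print(moment_skew_kurt([1, 1, 1, 2, 10]))  # long right tail: positive skewness
```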

Page 13: Review of Methods from Prerequisite Course

Parametric Inference for the Population Mean (μ)
Assuming either the outcome comes from a normally distributed population or the sample size is sufficiently “large”.

Test Statistic
$$t = \frac{\bar{x} - \mu_o}{s/\sqrt{n}} \sim t\text{-distribution with } df = n - 1$$

Confidence Interval for μ
$$\bar{x} \pm t\,\frac{s}{\sqrt{n}}$$

Sample size required for margin of error (E) with 95% confidence
$$n = \left(\frac{1.96\,\sigma}{E}\right)^2$$
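Here is a minimal sketch of the one-sample t statistic, the confidence interval, and the sample-size formula above. The standard library has no t-distribution, so the sketch takes the critical value `t_crit` as an input from a t table or from software. The function names and the small data set are hypothetical.

```python
import math
import statistics

def one_sample_t(data, mu0, t_crit):
    """t statistic for Ho: mu = mu0, and the CI xbar +/- t_crit * s / sqrt(n).

    t_crit must be the t critical value with n - 1 df (from a table or software).
    """
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # s / sqrt(n)
    t = (xbar - mu0) / se
    return t, (xbar - t_crit * se, xbar + t_crit * se)

def sample_size_mean(sigma, margin, z=1.96):
    """n = (z * sigma / E)^2, rounded up to the next whole subject."""
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical heart rates; 2.776 is the 95% two-sided t value for df = 4
t, ci = one_sample_t([92, 98, 104, 96, 110], mu0=90, t_crit=2.776)
print(t, ci)
print(sample_size_mean(sigma=20, margin=5))  # 62
```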

Page 14: Review of Methods from Prerequisite Course

Example: Heart Rate of ICU patients

Page 15: Review of Methods from Prerequisite Course

Example: Heart Rate of ICU patients – Output from JMP

The upper-tail test p-value = .00000238 (p < .0001), thus we have strong evidence that patients admitted to the adult ICU have a mean heart rate that would be considered high (i.e. μ > 90 bpm).

Furthermore, we estimate with 95% confidence that the mean resting heart rate of adults admitted to the ICU is between 95.18 bpm and 102.67 bpm.

Page 16: Review of Methods from Prerequisite Course

Nonparametric Inference for a Single Numeric Variable

If the outcome/response does NOT come from a normally distributed population or if the sample size is NOT sufficiently “large”.

To test the general hypothesis that patients admitted to the adult ICU have elevated/high resting heart rates, we could use the Wilcoxon Signed-Rank Test as an alternative to the t-Test.

1) Form the differences from the hypothesized value (here 90 bpm) and drop any that are 0.
2) Compute the signed rank statistics, i.e. the sums of the ranks of the positive and of the negative differences.
3) Compare the smaller of these to the critical values from a Wilcoxon Signed-Rank Test table.
4) Better yet, use statistical software!
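Steps 1 and 2 above can be sketched as follows. The sketch assumes the common convention of giving tied absolute differences their average rank. It only produces the two rank sums, so the p-value still comes from a table or software. The data are hypothetical.

```python
def signed_rank_statistics(data, hypothesized):
    """Return (T_plus, T_minus) for the Wilcoxon signed-rank test.

    Differences equal to zero are dropped; tied |differences| get average ranks.
    """
    diffs = [x - hypothesized for x in data if x != hypothesized]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j + 1) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return t_plus, t_minus

# Hypothetical heart rates tested against 90 bpm (the value 90 is dropped)
print(signed_rank_statistics([95, 88, 100, 90, 104, 110], 90))  # (14.0, 1.0)
```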

Page 17: Review of Methods from Prerequisite Course

Nonparametric Inference for a Single Numeric Variable

The upper-tail p-value from the Wilcoxon Signed-Rank Test is p < .0001, thus we conclude that the median heart rate of the population of patients admitted to the adult ICU is high (above 90 bpm).

The Wilcoxon Signed-Rank Test is used to make inferences about the population median rather than the mean.

Page 18: Review of Methods from Prerequisite Course

Comparing a Continuous Response Between Two Populations
• When comparing a numeric response between two populations, we must first consider the sampling scheme or experiment that generated the data, namely: were the two samples drawn independently or dependently?

• For dependent samples, there is a one-to-one correspondence between an individual in one population and an individual in the other, e.g. pre-test vs. post-test situations.

Page 19: Review of Methods from Prerequisite Course

More on Dependent Samples
• Pre-test vs. Post-test, e.g. before treatment vs. after treatment (i.e. subjects = blocks)

• Comparing different treatments using the same subjects, e.g. pain relievers used on the same subjects (again subjects = blocks)

• Matching subjects in the two populations according to some criteria, e.g. patients matched on the basis of age, race, gender, socioeconomic status, weight, height, existing health conditions, etc. (Note: Need to be careful here!)

Page 20: Review of Methods from Prerequisite Course

• Research Question: Is there evidence that patients will experience a mean decrease in systolic blood pressure of more than 10 mmHg?

• Experiment: Measure the blood pressure of 15 patients before and after taking Captopril. Our interest is on the measured changes in blood pressure and whether or not we believe that those changes have a mean greater than 10 mmHg.

Example 1: Captopril & Systolic Blood Pressure

Page 21: Review of Methods from Prerequisite Course

Once the paired differences have been formed we simply treat them as a single numeric response and make inferences accordingly.

Example 1: Captopril & Systolic Blood Pressure

Summary Statistics

Page 22: Review of Methods from Prerequisite Course

Parametric Inference for the Mean Paired Difference (μd)
Assuming either the paired differences come from a normally distributed population or the sample size (i.e. # of pairs) is sufficiently “large”.

Test Statistic
$$t = \frac{\bar{d} - \Delta}{s_d/\sqrt{n}} \sim t\text{-distribution with } df = n - 1$$

Confidence Interval for μd
$$\bar{d} \pm t\,\frac{s_d}{\sqrt{n}}$$

where $\bar{d}$ and $s_d$ are the mean and standard deviation of the n paired differences.

Note: These formulae are the same as those for a single population mean (μ)!

Δ is the hypothesized difference under the null hypothesis. Typically this will be 0!
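Once the differences are formed, the paired analysis is just a one-sample analysis of them. The sketch below assumes d = before − after, so a positive d is a decrease, and tests against Δ = 10 as in the Captopril question. The blood pressure values are hypothetical, and the t critical value must be supplied by the user.

```python
import math
import statistics

def paired_t(before, after, delta0=0.0, t_crit=None):
    """Paired t-test on d = before - after (a positive d is a decrease).

    Returns (t, df, ci); ci is None unless a t critical value with n - 1 df is supplied.
    """
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    dbar = statistics.fmean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # s_d / sqrt(n)
    t = (dbar - delta0) / se
    ci = None if t_crit is None else (dbar - t_crit * se, dbar + t_crit * se)
    return t, n - 1, ci

# Hypothetical systolic BP (mmHg); 3.182 is the 95% two-sided t value for df = 3
print(paired_t([150, 160, 170, 180], [135, 140, 150, 165], delta0=10, t_crit=3.182))
```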

Page 23: Review of Methods from Prerequisite Course

• Research Question: Is there evidence that patients will experience a mean decrease in systolic blood pressure of more than 10 mmHg?

• HYPOTHESES (with d = before − after, so a positive difference is a decrease)
Ho: μd ≤ 10 mmHg, i.e. the mean decrease in systolic blood pressure 30 minutes after taking Captopril is not greater than 10 mmHg.
Ha: μd > 10 mmHg, i.e. the mean decrease in systolic blood pressure 30 minutes after taking Captopril is greater than 10 mmHg.

Example 1: Captopril & Systolic Blood Pressure

Page 24: Review of Methods from Prerequisite Course

Example 1: Captopril & Systolic Blood Pressure

We have evidence to suggest that the mean decrease in systolic blood pressure 30 minutes after taking Captopril is more than 10 mmHg (p = .0009). Furthermore, we estimate the mean decrease is between 13.93 mmHg and 23.93 mmHg with 95% confidence.

Page 25: Review of Methods from Prerequisite Course

Nonparametric Inference for Paired Differences

Use if the paired differences do NOT come from a normally distributed population or if the sample size (# of pairs) is NOT sufficiently “large”.

To test the general hypothesis that the change in systolic blood pressure is more than 10 mmHg, we could use the Wilcoxon Signed-Rank Test as an alternative to the paired t-Test.

1) Form the paired differences and subtract 10, dropping any that are 0. If simply testing for a difference, we would not subtract 10.
2) Compute the signed rank statistics (the sums of the ranks of the positive and of the negative differences).
3) Compare the smaller of these to the critical values from a Wilcoxon Signed-Rank Test table.
4) Better yet, use statistical software!

Page 26: Review of Methods from Prerequisite Course

Nonparametric Inference for Paired Differences

We have evidence to suggest the median change in systolic blood pressure 30 minutes following taking Captopril is more than 10 mmHg (p = .0010).

Page 27: Review of Methods from Prerequisite Course

• Another nonparametric option is to use the Sign Test.

• For the Sign Test we simply look at the number of positive and negative paired differences and compute the p-value using a binomial distribution with n = # of pairs and p = .50.

• This should only be used if the response is difficult to measure or is ordinal !

Nonparametric Inference for Paired Differences
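The sign test needs only the signs of the differences and binomial probabilities. This sketch drops zero differences, a common convention, so n counts only the non-zero pairs. It computes the exact upper-tail p-value for Ha: positive differences are more likely. The differences are hypothetical.

```python
from math import comb

def sign_test_upper(diffs):
    """One-sided sign test p-value for Ha: more positive than negative differences.

    Zero differences are dropped; under Ho the number of positive signs is
    Binomial(n, 0.5), where n is the number of non-zero differences.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

# Hypothetical paired differences: 6 positive, 1 negative, 1 zero
print(sign_test_upper([3, 5, -1, 2, 4, 0, 6, 1]))  # 0.0625
```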

Page 28: Review of Methods from Prerequisite Course

Independent Samples Comparison of Two Population Means

• For independent samples we are either:

- drawing samples from two existing populations (i.e. observational study), e.g. males & females, smokers & non-smokers.

- randomly allocating subjects into two populations (i.e. experiment), e.g. treatment vs. placebo, therapy A vs. therapy B, etc.

Page 29: Review of Methods from Prerequisite Course

Independent Samples Comparison of Two Population Means
• Analysis of these two situations is the same, although the conclusions reached may differ (i.e. association vs. causation).

• This is an example of a bivariate analysis: Y = response (continuous, possibly ordinal), X = population identifier (nominal).

• If the response is normally distributed or if both sample sizes are “large”, we can use a parametric approach.

Page 30: Review of Methods from Prerequisite Course

Example: Heart Rate and Type of Admission

Type of admission (TYP): 1 = ER, 0 = non-ER

The heart rate at admission appears higher for those admitted through the ER, about 10 bpm higher on average.

This apparent difference could be due to chance variation, however!

Heart rate is approximately normally distributed for both samples.

Variation in the heart rates appears to be similar.

Page 31: Review of Methods from Prerequisite Course

Example: Heart Rate and Type of Admission

Type of admission (TYP): 1 = ER, 0 = non-ER

The separation between the CDF plots suggests a potential difference in the heart rate distributions for patients admitted to the adult ICU through the ER and those who were not.

In particular, patients admitted through the ER appear to have a tendency toward higher heart rates.

Page 32: Review of Methods from Prerequisite Course

Independent Samples Comparison of Two Population Means

For testing equality of means:
Ho: μ1 = μ2 or (μ1 – μ2) = 0
The possible alternatives are:
Ha: μ1 > μ2 or (μ1 – μ2) > 0 (upper-tailed)
Ha: μ1 < μ2 or (μ1 – μ2) < 0 (lower-tailed)
Ha: μ1 ≠ μ2 or (μ1 – μ2) ≠ 0 (two-tailed)
Note: If we wanted to establish that one mean was, e.g., at least 10 units larger than the other, we could replace 0 in these statements by 10. In general, to establish a difference of at least D units we replace 0 by D.

Page 33: Review of Methods from Prerequisite Course

Independent Samples Comparison of Two Population Means

Test statistic ~ t-distribution (df)

The standard error of the difference in the sample means and the degrees of freedom (df) are calculated two different ways depending on whether or not we assume the population variances are equal.

Rule O’ Thumb: Assume variances are equal only if neither sample variance is more than twice the other sample variance.

Page 34: Review of Methods from Prerequisite Course

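Independent Samples Comparison of Two Population Means – Pooled t-Test

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \quad \text{where} \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$s_p^2$ is the pooled estimate of the variance common to both populations; it is essentially a weighted average of the two sample variances. It is called pooled because both samples are combined (or pooled) to estimate the variance common to both populations.

The degrees of freedom for the associated test statistic are $df = n_1 + n_2 - 2$.

Test statistic
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)} \sim t\text{-distribution}(df)$$

Confidence Interval for (μ1 − μ2)
$$(\bar{y}_1 - \bar{y}_2) \pm t \cdot SE(\bar{y}_1 - \bar{y}_2)$$

Assuming common variance ($\sigma_1^2 = \sigma_2^2$)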
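The sketch below implements the pooled formulas above: the pooled variance, the standard error, df = n1 + n2 − 2, and the t statistic for a hypothesized difference Δ. The p-value would still come from a t table or software. The samples are hypothetical.

```python
import math
import statistics

def pooled_t(y1, y2, delta=0.0):
    """Pooled (equal-variance) two-sample t statistic, its df, and SE(ybar1 - ybar2)."""
    n1, n2 = len(y1), len(y2)
    s1_sq, s2_sq = statistics.variance(y1), statistics.variance(y2)
    # weighted average of the two sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t = (statistics.fmean(y1) - statistics.fmean(y2) - delta) / se
    return t, n1 + n2 - 2, se

print(pooled_t([1, 2, 3], [4, 5, 6]))  # t is about -3.67 with df = 4
```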

Page 35: Review of Methods from Prerequisite Course

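Independent Samples Comparison of Two Population Means – Welch’s t-Test

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The degrees of freedom for the associated test statistic are
$$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \qquad \text{Always round down!}$$

Test statistic
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{SE(\bar{y}_1 - \bar{y}_2)} \sim t\text{-distribution}(df)$$

Confidence Interval for (μ1 − μ2)
$$(\bar{y}_1 - \bar{y}_2) \pm t \cdot SE(\bar{y}_1 - \bar{y}_2)$$

Assuming $\sigma_1^2 \neq \sigma_2^2$, i.e. unequal variances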
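Here is the same calculation under the unequal-variance formulas above. The df approximation is rounded down as the slide instructs. This is a sketch with hypothetical samples.

```python
import math
import statistics

def welch_t(y1, y2, delta=0.0):
    """Welch (unequal-variance) t statistic, approximate df rounded down, and SE."""
    n1, n2 = len(y1), len(y2)
    v1 = statistics.variance(y1) / n1  # s1^2 / n1
    v2 = statistics.variance(y2) / n2  # s2^2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = (statistics.fmean(y1) - statistics.fmean(y2) - delta) / se
    return t, math.floor(df), se  # always round the df down

print(welch_t([1, 2, 3, 4, 5], [2, 4, 6]))  # df of about 3.53 is rounded down to 3
```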

Page 36: Review of Methods from Prerequisite Course

• We can formally test the equality of the population variances rather than use the Rule O’ Thumb.

• In some situations it may also be of interest to compare the population variances in addition to the population means.

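• HYPOTHESES: Ho: σ1² = σ2² vs. Ha: σ1² ≠ σ2² (or we could use a one-tailed alternative).
Test Statistic (for comparing two population variances):
$$F = \frac{s_1^2}{s_2^2} \sim F\text{-distribution with } n_1 - 1 \text{ and } n_2 - 1 \text{ numerator and denominator degrees of freedom, respectively}$$
Large F statistic value → small p-value (Reject Ho).
• There are several other tests for equality of variance.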

Independent Samples Comparison of Two Population Means – Formally Testing Equality of the Population Variances Assumption
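The F statistic and the Rule O’ Thumb check are both simple functions of the two sample variances, as this sketch shows. The p-value for F would come from an F table or software. The samples are hypothetical.

```python
import statistics

def variance_ratio(y1, y2):
    """F = s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    return statistics.variance(y1) / statistics.variance(y2), len(y1) - 1, len(y2) - 1

def rule_of_thumb_equal(y1, y2):
    """Rule O' Thumb: treat variances as equal if neither is more than twice the other."""
    v1, v2 = statistics.variance(y1), statistics.variance(y2)
    return max(v1, v2) <= 2 * min(v1, v2)

print(variance_ratio([1, 2, 3, 4, 5], [2, 4, 6]))       # (0.625, 4, 2)
print(rule_of_thumb_equal([1, 2, 3, 4, 5], [2, 4, 6]))  # True: 4 is less than 2 * 2.5
```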

Page 37: Review of Methods from Prerequisite Course

Example: Heart Rate and Type of Admission

Type of admission (TYP): 1 = ER, 0 = non-ER

The F-test for comparing population variances does not provide evidence of a significant difference in heart rate variation between the two groups of patients (p = .3992).

None of the other tests (O’Brien, Brown-Forsythe, Levene, Bartlett) have significant p-values either.

Given these results, we could conduct a pooled t-Test to compare the mean heart rates.

Page 38: Review of Methods from Prerequisite Course

Example: Heart Rate and Type of Admission

Type of admission (TYP): 1 = ER, 0 = non-ER

The two-tailed p-value = .0131, thus we conclude there is a statistically significant difference in the population mean heart rates between these two populations of patients admitted to the adult ICU.

Furthermore, we estimate that the mean heart rate for patients admitted to the adult ICU through the emergency room is anywhere from 2.26 bpm to 19 bpm larger than the mean for those who were not admitted to the ICU through the emergency room. Note: the order of subtraction is 1 − 0, i.e. ER mean − non-ER mean.

The results from the confidence interval lend themselves to a brief discussion of the concept of practical significance and/or effect size (ES). While a difference in the means of 19 bpm seems physiologically meaningful, the same could not be said for the lower confidence limit, which is roughly 2 bpm. We will examine the concepts of practical significance and effect size in more detail later in the course.

The output from the non-pooled option (Welch’s t-Test) is presented in exactly the same format.

Page 39: Review of Methods from Prerequisite Course

Nonparametric Testing for Two Independent Samples
• If the population distributions do not appear to be normally distributed or if the sample sizes are “small”, we may choose to use a nonparametric test to compare the size of the values from the two populations.

• There are a few options available, but by far the most frequently used nonparametric test for comparing a numeric response across two populations is the Wilcoxon Rank Sum Test (also known as the Mann-Whitney Test).

• The test utilizes the sum of the ranks assigned to observations from the two populations when the two samples are combined. Essentially, the larger the difference in the rank sums (taking the sample sizes into account), the more evidence we have against equality of the two distributions in terms of the size of the values.
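The rank sums described above can be sketched as follows. The two samples are pooled and ranked, tied values get their average rank (a common convention), and the ranks are summed by group. Under Ho, the expected rank sum for sample A is n_A(N + 1)/2, where N = n_A + n_B. The data are hypothetical, and the p-value itself is left to tables or software.

```python
def rank_sums(sample_a, sample_b):
    """Rank the combined samples (average ranks for ties) and return (W_a, W_b)."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        rank_of[combined[i][0]] = (i + 1 + j + 1) / 2  # average rank for the tied run
        i = j + 1
    w_a = sum(rank_of[v] for v in sample_a)
    w_b = sum(rank_of[v] for v in sample_b)
    return w_a, w_b

def expected_rank_sum(n_a, n_b):
    """Mean of W_a under Ho: n_a * (N + 1) / 2."""
    return n_a * (n_a + n_b + 1) / 2

print(rank_sums([80, 95, 110], [70, 75, 95]), expected_rank_sum(3, 3))  # (13.5, 7.5) 10.5
```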

Page 40: Review of Methods from Prerequisite Course

Nonparametric Testing for Two Independent Samples
HYPOTHESES
Ho: the distributions of the two populations are essentially the same, particularly in terms of the size of the values.
Ha: the distributions of the two populations are different; specifically, we believe one distribution is shifted to the right or left of the other.
Note: One-tailed alternatives are fine also, meaning we can specify in the alternative which population has larger values than the other.

Here the alternative hypothesis states population A is shifted to the right of population B, i.e. population A has larger values than population B.

Page 41: Review of Methods from Prerequisite Course

Example: Heart Rate and Type of Admission

Type of admission (TYP): 1 = ER, 0 = non-ER

The Wilcoxon Rank Sum Test p-value = .0137, thus we conclude the two populations of patients differ in terms of their heart rate at admission to the adult ICU. In particular, we conclude those that were admitted to the adult ICU via the ER had higher heart rates in general than those not admitted through the ER.

Page 42: Review of Methods from Prerequisite Course

Comparing a Continuous Response Between Three or More Populations
• As with two-population comparisons, there are independent and dependent sampling schemes when comparing several populations.

• Assuming normality and equality of population variances across populations, both situations use a form of Analysis of Variance (ANOVA) to compare the means of the populations.

Page 43: Review of Methods from Prerequisite Course

Comparing a Continuous Response Between Three or More Populations
• We will cover ANOVA in more detail later in the course and review both one-way ANOVA and randomized block designs as part of that discussion.

• For now we will look at a quick example of each.

Page 44: Review of Methods from Prerequisite Course

Example: Age and Race (Descriptive Summaries)

Race of Patient: 1 = White, 2 = Black, 3 = Other

Although this may not be of interest in this study, here we compare the ages of patients in this study across race classified as white, black, or other.

White patients in the sample were the oldest with a mean age of 59, while the other two race groups have a mean age of around 47.

The age distributions do appear to be left-skewed or kurtotic (i.e. non-normal) and the standard deviations differ enough that equality of variances may be suspect.

Page 45: Review of Methods from Prerequisite Course

Example: Age & Race (Comparing Variances)

Race of Patient: 1 = White, 2 = Black, 3 = Other

None of the four tests for equality of variance provides statistically significant evidence of unequal population variances (all p > .05).

If these tests did suggest a problem with the equality of population variances assumption, we could use Welch’s ANOVA (like the non-pooled t-Test) to determine whether the mean ages differ across race.

Page 46: Review of Methods from Prerequisite Course

Example: Age and Race (One-way ANOVA)

Race of Patient: 1 = White, 2 = Black, 3 = Other

From the one-way ANOVA F-test we conclude that at least two population means differ (p = .0222).

With only three populations, controlling for the experiment-wise error rate using Tukey’s HSD is not vital, as there are only three possible pairwise comparisons (white vs. black, white vs. other, and black vs. other).

Page 47: Review of Methods from Prerequisite Course

Example: Age & Race (Multiple Comparisons)
Race of Patient: 1 = White, 2 = Black, 3 = Other

Using Tukey’s HSD we see that none of the pairwise comparisons suggest a difference between the population means (all p > .05).

Two-sample t-Tests (pooled) not controlling for experiment-wise error rate (EER)

Without controlling for EER, we see that the mean ages of white and black patients differ significantly (p = .0283). However, the estimated difference in means covers a wide range, from 1.26 years to 22.24 years. At the low end of the confidence interval this difference is certainly inconsequential.

Page 48: Review of Methods from Prerequisite Course

Example: Age and Race (Nonparametric Test and Multiple Comparisons)

Race of Patient: 1 = White, 2 = Black, 3 = Other

The nonparametric alternative to the one-way ANOVA F-test is the Kruskal-Wallis test. We conclude the populations differ in terms of the age distributions (p = .0110).

The nonparametric alternative to Tukey’s HSD is the Steel-Dwass Method which suggests that the age distributions between white and black patients significantly differ (p = .0268). Again the CI for the difference in typical ages is wide, from 1 year to 25 years, with the low end representing a very small difference.
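The Kruskal-Wallis statistic is computed from the rank sums of the groups after all observations are pooled and ranked. The sketch below uses average ranks for ties but omits the tie-correction factor that software usually applies, so its H can differ slightly when ties are present. H is compared with a chi-square distribution on k − 1 df. The groups are hypothetical.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H without tie correction; compare to chi-square with k - 1 df."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # average rank for each distinct value (handles ties)
    first, last = {}, {}
    for idx, v in enumerate(pooled, start=1):
        first.setdefault(v, idx)
        last[v] = idx
    rank = {v: (first[v] + last[v]) / 2 for v in first}
    total = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # about 7.2
```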

Page 49: Review of Methods from Prerequisite Course

Methods for a Numeric Response

We have just reviewed the following:
• One population inference
• Two population inference
• More than two population inference

We covered both parametric and nonparametric methods.

We will cover block designs and their analysis when we cover ANOVA in more detail later in the course.

Page 50: Review of Methods from Prerequisite Course

Methods for a Categorical Response
For a dichotomous categorical response, we covered many of the methods in the flowchart to the left in the prerequisite course.

A dichotomous response has two levels, which we can generically classify as “success” or “failure” or “yes” or “no”.

We will cover more advanced methods for the analysis of categorical data later in the course.

We will briefly review some of these methods from the prerequisite course using the ICU study data and data from other studies.

Page 51: Review of Methods from Prerequisite Course

ICU Study – variables & coding
The variable descriptions and coding are found in this table.

Comments: There are numerous dichotomous variables in this study; vital status (STA) is the primary outcome of interest.

Some of the dichotomous variables have been created using continuous measurements (e.g. PO2, PH, PCO, etc.)

The Level of Consciousness variable (LOC) could be treated as ordinal as the levels indicate increasing states of unresponsiveness.

Page 52: Review of Methods from Prerequisite Course

Summary of Inference for a Single Proportion (p)
Assuming the sample size n is sufficiently “large”.

Test Statistic
$$z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o(1 - p_o)}{n}}} \sim \text{standard normal}$$

Confidence Interval for p
$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Sample size required for margin of error (E) with 95% confidence, assuming a prior value for p
$$n = \frac{1.96^2\, p(1 - p)}{E^2}$$

Conservative approach
$$n = \frac{1.96^2}{4E^2}$$
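The sketch below applies the large-sample formulas above, using `statistics.NormalDist` for the z critical value. The function names are hypothetical. Run on the ICU gender counts given in the slides (124 men out of 200), it gives a 95% interval of about (.5527, .6873).

```python
import math
from statistics import NormalDist

def one_proportion(x, n, p0, conf=0.95):
    """Large-sample z statistic for Ho: p = p0 and the Wald CI for p."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE uses p0 under Ho
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses p_hat for the CI
    return z, (p_hat - half, p_hat + half)

def sample_size_proportion(margin, p_prior=None, z=1.96):
    """n = z^2 p(1-p) / E^2, or the conservative z^2 / (4 E^2) when no prior p is given."""
    pq = 0.25 if p_prior is None else p_prior * (1 - p_prior)
    return math.ceil(z ** 2 * pq / margin ** 2)

# ICU gender example: 124 men out of n = 200 admissions, Ho: p = .50
z, ci = one_proportion(124, 200, 0.50)
print(z, ci, 1 - NormalDist().cdf(z))   # z, 95% CI, upper-tail p-value
print(sample_size_proportion(0.05))     # 385 (conservative)
```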

Page 53: Review of Methods from Prerequisite Course

Exact inferential methods using the binomial distribution

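Binomial Exact Test (one-sided)

Ho: p = p_o versus Ha: p > p_o (p-value = P(X ≥ x)) or Ha: p < p_o (p-value = P(X ≤ x)), where X ~ Binomial(n, p_o) and x is the observed number of successes.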

Find the probability of observing the number of successes as extreme or more extreme than those observed assuming the null is true.

Use a binomial table to calculate the p-value

A two-sided alternative would have p-value equal to the smaller of the probabilities above multiplied by 2.

Summary of Inference for Single Proportion (p)
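Software can compute these binomial tail sums exactly, in place of a binomial table. In this sketch the two-sided p-value is twice the smaller tail, as on the slide, capped at 1.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_binomial_test(x, n, p0):
    """Exact p-values: (upper P(X >= x), lower P(X <= x), two-sided = 2 * smaller, capped at 1)."""
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))
    return upper, lower, min(1.0, 2 * min(upper, lower))

# ICU gender example: 124 men out of 200, Ho: p = .50 vs. Ha: p > .50
print(exact_binomial_test(124, 200, 0.5)[0])
```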

Page 54: Review of Methods from Prerequisite Course

Exact inferential methods using the binomial distribution

Binomial Exact 95% Confidence Interval for p

Use a binomial table to find the proportions p_L and p_U that make the following probability statements true:
$$P(X \ge x \mid p = p_L) = .025 \qquad P(X \le x \mid p = p_U) = .025$$

The Exact 95% Confidence Interval for p is given by (p_L, p_U).

Summary of Inference for Single Proportion (p)
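The "find the proportions" step can be done by bisection instead of a table. This construction is usually called the Clopper-Pearson interval. The sketch relies on the fact that P(X ≥ x) increases with p and P(X ≤ x) decreases with p. For 124 of 200 it lands near the exact interval quoted in the example (about 55% to 69%).

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(0, x + 1))

def solve(f, target, lo=0.0, hi=1.0, increasing=True, iters=100):
    """Bisection for f(p) = target on [lo, hi], where f is monotone in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        too_small = f(mid) < target
        if too_small == increasing:  # the root lies above mid
            lo = mid
        else:                        # the root lies below mid
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, conf=0.95):
    alpha = 1 - conf
    # p_L solves P(X >= x | p_L) = alpha/2 (this tail increases with p)
    p_low = 0.0 if x == 0 else solve(lambda p: upper_tail(x, n, p), alpha / 2, increasing=True)
    # p_U solves P(X <= x | p_U) = alpha/2 (this tail decreases with p)
    p_high = 1.0 if x == n else solve(lambda p: lower_tail(x, n, p), alpha / 2, increasing=False)
    return p_low, p_high

print(exact_ci(124, 200))
```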

Page 55: Review of Methods from Prerequisite Course

Example: Gender of ICU Patients
Research Question: Is there evidence that a majority of adult ICU admissions are men?

Here the parameter of interest is:

p = proportion of adult ICU admissions that are men.

In our sample of n = 200 patients, 124 or 62% were men, which certainly represents a majority. However, this could be due to sampling variation, and in actuality there could be an equal balance of ICU admissions based on gender.

Page 56: Review of Methods from Prerequisite Course

Example: Gender of ICU Patients
Research Question: Is there evidence that a majority of adult ICU admissions are men?

Thus we have evidence that a majority of patients admitted to the adult ICU are male.

The large-sample 95% CI for p is (.5527, .6873) or (55.27%, 68.73%).

Page 57: Review of Methods from Prerequisite Course

Example: Gender of ICU Patients
Research Question: Is there evidence that a majority of adult ICU admissions are men?

Thus we have evidence that a majority of patients admitted to the adult ICU are male.

Thus an exact 95% CI for p is (54.9%, 68.7%).

Page 58: Review of Methods from Prerequisite Course

Independent Samples Comparison of Two Population Proportions

For testing equality of two proportions:
Ho: p1 = p2 or (p1 – p2) = 0
The possible alternatives are:
Ha: p1 > p2 or (p1 – p2) > 0 (upper-tailed)
Ha: p1 < p2 or (p1 – p2) < 0 (lower-tailed)
Ha: p1 ≠ p2 or (p1 – p2) ≠ 0 (two-tailed)
Note: If we wanted to establish that one proportion was, e.g., at least .10 (10 percentage points) larger than the other, we could replace 0 in these statements by .10. In general, to establish a difference of at least D, we replace 0 by D.

Page 59: Review of Methods from Prerequisite Course

Test Statistic for Large Independent Samples

For testing to see if the difference is at least D:
Ho: (p1 – p2) = D
HA: (p1 – p2) > D (upper-tail) or (p1 – p2) < D (lower-tail)

Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}} \sim N(0, 1), \text{ i.e. standard normal}$$
where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$ (D ≠ 0).

Most important case

Independent Samples Comparison of Two Population Proportions

Page 60: Review of Methods from Prerequisite Course

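Provided n1p1 > 10 & n1q1 > 10 and n2p2 > 10 & n2q2 > 10

The confidence interval for (p1 – p2) has the general form:
$$\text{estimate} \pm (z\text{-value}) \cdot SE(\text{estimate})$$
$$(\hat{p}_1 - \hat{p}_2) \pm (z\text{-value})\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$.

z-values: 90% z = 1.645; 95% z = 1.960; 99% z = 2.576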

Independent Samples Comparison of Two Population Proportions
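The sketch below implements the unpooled z statistic and the interval above. The example counts come from the ICU service-at-admission table in the slides: 93 of 160 survivors and 14 of 40 non-survivors came from the surgical unit. With these counts the sketch gives roughly (.065, .398). Software output may not match exactly, because packages sometimes use an adjusted interval.

```python
import math

def two_proportion_z(x1, n1, x2, n2, D=0.0, z_crit=1.96):
    """Unpooled large-sample z for Ho: p1 - p2 = D, plus the CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = ((p1 - p2) - D) / se
    return z, (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)

# Surgical admissions among survivors (93 of 160) vs. non-survivors (14 of 40)
z, ci = two_proportion_z(93, 160, 14, 40)
print(z, ci)
```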

Page 61: Review of Methods from Prerequisite Course

Example: Comparing Service at Admission Across Survival Status
• The ICU study is a case-control study, that is, 40 patients who died and 160 who did not die were sampled and the admission-related variables were collected.

• Because of this we cannot calculate the probability of patient death using these data.

• To identify variables related to survival we use vital status (STA) as the population identifier, i.e. as the X variable in JMP.

Page 62: Review of Methods from Prerequisite Course

Example: Comparing Service at Admission Across Survival Status

Amongst the patients in the study who died in the ICU, 65% were admitted from the Medical unit and 35% from the Surgical unit. For patients that did not die 58.1% were admitted from the Surgical unit and 41.9% were admitted from the Medical unit.

These percentages are used to construct the mosaic plot and are displayed in the cells of the plot. The 2 X 2 contingency table below the plot gives frequencies and row percentages (i.e. a percentage breakdown of the column variable within each row). You can see the row %’s are the same as those discussed above.

Page 63: Review of Methods from Prerequisite Course

Example: Comparing Service at Admission Across Survival Status

The large sample test p-values and confidence interval for the difference in the proportions are given under the heading Two Sample Test for Proportions. The proportion of patients admitted to the ICU from the surgical unit is significantly higher for those that survived (p = .0038). This finding is certainly expected.

We estimate that the percentage of patients coming from the surgical unit is between 5.9 and 38.7 percentage points higher for ICU survivors. The difference in proportions is also known as the attributable risk (AR).

Page 64: Review of Methods from Prerequisite Course

Example: Comparing Service at Admission Across Survival Status

Another large sample test for 2 X 2 tables is the chi-square test (either Pearson’s or Likelihood Ratio), which suggests that the proportion of patients coming from the surgical unit differs for survivors and non-survivors (p = .0087 or .0085).

The Fisher’s Exact Test p-values do not rely on the large sample assumption. This test is preferable to either of the large sample procedures. The alternative hypothesis is communicated along with the associated p-values. The Left p-value = .0071, which leads us to conclude that the proportion of patients coming from the surgical unit is higher for the survivor group.
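Both tests can be sketched for a 2 × 2 table [[a, b], [c, d]]. With 1 df, the chi-square tail probability equals twice a standard normal tail at √χ², so the standard library is enough. Fisher’s one-sided p-value is a hypergeometric tail sum given the table margins. The sketch uses the upper tail P(A ≥ a) for the died/medical cell. The tail JMP labels “Left” depends on how it orders rows and columns, so compare with care. For the ICU counts, the Pearson part gives χ² ≈ 6.88 and p ≈ .0087.

```python
import math
from math import comb
from statistics import NormalDist

def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p-value."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    # with 1 df: P(chi2 >= x) = 2 * P(Z >= sqrt(x))
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p

def fisher_one_sided_upper(a, b, c, d):
    """Fisher's exact P(A >= a) given the margins (hypergeometric upper tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    top = min(row1, col1)
    ways = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, top + 1))
    return ways / comb(n, row1)

# ICU table: rows = died / survived, columns = medical / surgical
table = (26, 14, 67, 93)
print(pearson_chi_square_2x2(*table), fisher_one_sided_upper(*table))
```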

Page 65: Review of Methods from Prerequisite Course

Example: Comparing Service at Admission Across Survival Status

The Odds Ratio (OR) is used to quantify risk when a case-control study was used. The easiest way to calculate the OR is the formula:
$$\widehat{OR} = \frac{ad}{bc}$$
The a cell in the table corresponds to those who have the adverse outcome (in this case death) and have the risk factor present, which in this case is coming from the medical unit (vs. the surgical unit). Thus a = 26 and subsequently b = 14, c = 67, and d = 93.

Thus the estimated OR is
$$\widehat{OR} = \frac{(26)(93)}{(14)(67)} = \frac{2418}{938} \approx 2.58$$

Page 66: Review of Methods from Prerequisite Course

Example: Comparing Service at Admission Across Survival Status

From the previous slide the estimated odds ratio is $\widehat{OR} \approx 2.58$.

However, JMP reports a different OR because it orders the levels alphanumerically, which essentially reverses the roles of 0 and 1. If JMP gives an OR that is inconsistent with your calculation, simply take the reciprocal of the OR from JMP.

OR = 1/.388 = 2.58, giving the result we want.

Likewise, the 95% CI for the OR is found by taking reciprocals of JMP’s limits (which also reverses their order): (1/.79828, 1/.188511) = (1.25, 5.30).

Patients admitted to the ICU from the medical unit have at least a 25% increase in the odds of death. We estimate the odds ratio is between 1.25 and 5.30.
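The calculation can be checked with a short sketch. It assumes the usual log-scale (Woolf) interval, $\exp\{\ln\widehat{OR} \pm 1.96\sqrt{1/a + 1/b + 1/c + 1/d}\}$, which by our arithmetic reproduces (1.25, 5.30); the helper name `odds_ratio_ci` is hypothetical. The last line mimics the reversed level ordering by taking reciprocals.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Sample odds ratio ad/bc with a Woolf (log-scale) confidence interval."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

or_hat, lo, hi = odds_ratio_ci(26, 14, 67, 93)
print(f"OR = {or_hat:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")

# Reversed level order: reciprocal OR, and the CI limits swap places
print(f"reversed: OR = {1 / or_hat:.3f}, CI = ({1 / hi:.4f}, {1 / lo:.4f})")
```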

Page 67: Review of Methods from Prerequisite Course

Quick Recap

o We have just compared the proportion of patients coming from each service at admission across survival status (p1 vs. p2) using the z-test, a CI for (p1 − p2), and Fisher’s Exact Test.

o Computed the Odds Ratio (OR) and found a CI for the population OR.

Page 68: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of Independence

One way to generalize the question of interest to the researchers is as follows:

Q: Is there an association between the service at admission and the survival status of patients admitted to the adult ICU?

Page 69: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of Independence

If there is not an association, we say that these variables are independent.

In probability, two events A and B are said to be independent if P(A|B) = P(A).

Page 70: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of Independence

In the context of our study this would mean, for example, P(Medical | Patient Survived) = P(Medical), i.e. knowing that the patient survived tells you nothing about the chance that they came to the ICU from the medical unit vs. the surgical unit.

Page 71: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of Independence

In this study 46.5% of the patients admitted to the adult ICU came from the medical unit:

P(Medical) = 93/200 = .465

When we condition on survival status, we see that the relationship required for independence does not hold for these data:

P(Medical | Died) = 26/40 = .650
P(Medical | Survived) = 67/160 = .419

Under independence, both of these conditional probabilities should equal .465.
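The same comparison of conditional and unconditional probabilities can be sketched in a few lines; the dictionary layout is only an illustrative choice, not JMP output.

```python
# Counts from the 2 X 2 table (rows: survival status; columns: service at admission)
table = {
    "Died": {"Medical": 26, "Surgical": 14},
    "Survived": {"Medical": 67, "Surgical": 93},
}

n = sum(sum(row.values()) for row in table.values())
p_medical = sum(row["Medical"] for row in table.values()) / n  # unconditional P(Medical)

# Conditional P(Medical | status); under independence each would equal p_medical
cond = {status: row["Medical"] / sum(row.values()) for status, row in table.items()}

print(f"P(Medical) = {p_medical:.3f}")
for status, p in cond.items():
    print(f"P(Medical | {status}) = {p:.3f}")
```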

Page 72: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of Independence

o Of course, the observed differences could be due to random variation, and in truth it may be the case that service at admission and survival status are independent.

o Therefore we need a means of assessing how different the observed results are from what we would expect to see if these two factors were independent.

Page 73: Review of Methods from Prerequisite Course

2 X 2 Example: Case-Control Study, Survival Status and Service at Admission

Survival Status | Serviced in Medical Unit | Serviced in Surgical Unit | Row Totals
Died (Case) | a = 26 | b = 14 | R1 = 40
Survived (Control) | c = 67 | d = 93 | R2 = 160
Column Totals | C1 = 93 | C2 = 107 | n = 200

Page 74: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of Independence

From this table we can calculate the conditional probability of admission from the medical unit given that the patient died as follows:

$$P(\text{Risk} \mid \text{Disease}) = \frac{a}{R_1}$$

The unconditional probability of risk presence (admission from the medical unit) for these data is given by:

$$P(\text{Risk}) = \frac{C_1}{n}$$

and setting these equal we have

$$\frac{a}{R_1} = \frac{C_1}{n} \quad\Longrightarrow\quad a = \frac{R_1 C_1}{n}$$

Page 75: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of

IndependenceThus we expect the frequency in the a cell to

be equal to:

Similarly we find the following expected frequencies for the cells making up the 2 X 2 table

nCR

a 11

nCR

dnCR

c

nCR

bnCR

a

2212

2111

Page 76: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of

IndependenceIn general we denote the observed

frequency in the ith row and jth column as or just O for short.

We denote the expected frequency for the ith row and jth column as

ijO

nCR

E jiij or just E for

short.𝑅𝑖=𝑟𝑜𝑤𝑡𝑜𝑡𝑎𝑙 𝑓𝑜𝑟 𝑟𝑜𝑤𝑖𝑎𝑛𝑑𝐶 𝑗=𝑐𝑜𝑙𝑢𝑚𝑛𝑡𝑜𝑡𝑎𝑙 𝑓𝑜𝑟 𝑐𝑜𝑙𝑢𝑚𝑛 𝑗
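The rule $E_{ij} = R_i C_j / n$ works for a table of any size. A minimal sketch (the helper name `expected_counts` is hypothetical), applied to the 2 X 2 survival table, should return the expected frequencies 18.6, 21.4, 74.4 and 85.6:

```python
def expected_counts(table):
    """E_ij = R_i * C_j / n for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) iterates columns
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[26, 14],   # Died: medical, surgical
            [67, 93]]   # Survived: medical, surgical
expected = expected_counts(observed)
print(expected)
```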

Page 77: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of

Independenceo To measure how different our

observed results are from what we expected to see if the two variables in question were independent we intuitively should look at the difference between the observed (O) and expected (E) frequencies, i.e. O – E or more specifically

o However this will give too much weight to differences where these frequencies are both large in size.

ijij EO

Page 78: Review of Methods from Prerequisite Course

Development of a Test Statistic to Measure Lack of

Independenceo One test statistic that addresses the

“size” of the frequencies issue is Pearson’s Chi-Square (c2)

( )

( )

)1()1(h wit

on distributi squared-chi~~ 2

1 1

2

cells all

22

crdf

EEO

EEO

r

i

c

j ij

ijij c

cNotice this test statistic still uses (O – E) as the basic building block. This statistic will be large when the observed frequencies do NOT match the expected values for independence.
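The statistic and its degrees of freedom translate directly into code. In this sketch (the helper name `pearson_chi_square` is hypothetical) the expected count for each cell is formed from the row and column totals, and each cell contributes $(O - E)^2/E$:

```python
def pearson_chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = pearson_chi_square([[26, 14], [67, 93]])
print(f"chi-square = {stat:.3f} on {df} df")
```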

Page 79: Review of Methods from Prerequisite Course

Chi-square Distribution (χ²)

This is a graph of the chi-square distribution with 4 degrees of freedom. The area to the right of Pearson’s chi-square statistic gives the p-value. The p-value is always the area to the right!

[Figure: chi-square density curve with the area to the right of the observed χ² shaded and labeled p-value]
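Python’s standard library has no chi-square CDF, so this sketch computes the right-tail area directly for integer df (in practice a library routine such as SciPy’s `scipy.stats.chi2.sf` would be used). It relies on two closed forms: a finite Poisson-type sum for even df, and a normal-tail expansion for odd df. The helper name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: 2*Q(sqrt(x)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))          # equals 2*Q(sqrt(x))
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    term = root                                       # r = 1 term: sqrt(x) / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += 2.0 * phi * term
        term *= x / (2 * r + 1)
    return total

print(f"{chi2_sf(6.879, 1):.4f}")   # 2 X 2 survival/service example
print(f"{chi2_sf(9.488, 4):.4f}")   # near the 5% critical value for 4 df
```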

Page 80: Review of Methods from Prerequisite Course

2 X 2 Example: Case-Control Study, Survival Status and Service at Admission

Survival Status | Served by Medical Unit | Served by Surgical Unit | Row Totals
Died (Case) | O11 = 26 | O12 = 14 | R1 = 40
Survived (Control) | O21 = 67 | O22 = 93 | R2 = 160
Column Totals | C1 = 93 | C2 = 107 | n = 200

Page 81: Review of Methods from Prerequisite Course

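Calculating Expected Frequencies

Survival Status and Service at Admission (expected frequencies in parentheses)

Survival Status | Served by Medical Unit | Served by Surgical Unit | Row Totals
Died (Case) | 26 (18.6) | 14 (21.4) | R1 = 40
Survived (Control) | 67 (74.4) | 93 (85.6) | R2 = 160
Column Totals | C1 = 93 | C2 = 107 | n = 200

$$E_{11} = \frac{R_1 C_1}{n} = \frac{(40)(93)}{200} = 18.6, \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{(40)(107)}{200} = 21.4$$

$$E_{21} = \frac{R_2 C_1}{n} = \frac{(160)(93)}{200} = 74.4, \qquad E_{22} = \frac{R_2 C_2}{n} = \frac{(160)(107)}{200} = 85.6$$

where $E_{ij}$ = expected frequency for the ith row and jth column cell.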

Page 82: Review of Methods from Prerequisite Course

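Calculating the Pearson Chi-square

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \frac{(26 - 18.6)^2}{18.6} + \frac{(14 - 21.4)^2}{21.4} + \frac{(67 - 74.4)^2}{74.4} + \frac{(93 - 85.6)^2}{85.6}$$

$$= 2.944 + 2.559 + 0.736 + 0.640 = 6.879$$

$$df = (2 - 1)(2 - 1) = 1, \qquad p\text{-value} = .0087$$

http://www.stat.tamu.edu/~west/applets/chisqdemo.html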

Page 83: Review of Methods from Prerequisite Course

Chi-square Probability Calculator in JMP

Enter the test statistic value and df, and the p-value is calculated automatically: p-value = P(χ² ≥ 6.879) ≈ .0087.

Page 84: Review of Methods from Prerequisite Course

2 X 2 Example: Case-Control Study

Service at Admission and Survival Status

Conclusion: We have strong evidence to suggest that service at admission and survival status are NOT independent, and thus conclude they are associated or related (p = .0087). In particular, we found that the proportion of patients admitted to the adult ICU from the medical unit was higher amongst patients who did not survive.

Page 85: Review of Methods from Prerequisite Course

Dependent Samples Comparison of Two Population Proportions

• The test used to compare two proportions using dependent samples is called McNemar’s test.

• As with most tests, there are both a “large” sample and a “small” sample version of the test.

• The small sample version uses the binomial distribution and is an exact test, so technically there is no reason to use the large sample version, though many do.

Page 86: Review of Methods from Prerequisite Course

Example: Low pH and Elevated PCO2 Levels

• For each patient in the ICU study the pH levels and PCO2 levels found in their blood gases were measured. If pH levels were below 7.5 they were coded as being low (1), otherwise not (0). If PCO2 levels were above 45 mmHg they were coded as being high (1), otherwise not (0).

• pH < 7.5: Low pH (bad)
• PCO2 > 45 mmHg: Elevated PCO2 (bad)

Page 87: Review of Methods from Prerequisite Course

Example: Low pH and Elevated PCO2 Levels

• If we wish to compare the proportion of patients with low/“bad” pH levels to the proportion of patients with elevated/“bad” PCO2 levels, we cannot compare them using the independent samples approach because these measurements are made on the same patients. Thus we have dependent samples.

Page 88: Review of Methods from Prerequisite Course

Example: Low pH and Elevated PCO2 Levels

The mosaic plot shows the relationship between pH and PCO2 levels. Patients with low pH levels are more likely to also have high PCO2 levels: amongst those with low pH, 61.5% have high PCO2 levels, whereas amongst those with normal pH levels only 6.4% have high PCO2 levels.

Fisher’s Exact test confirms that the difference in the percentages discussed above is statistically significant (p < .0001). We can conclude that patients with low pH levels are more likely to have high PCO2 levels.

This analysis however, does not compare the incidence of these two conditions to one another, it only suggests that the two conditions are significantly related.

Page 89: Review of Methods from Prerequisite Course

Example: Low pH and Elevated PCO2 Levels

The 2 X 2 contingency table constructed by cross-tabulating these levels vs. one another is shown to the left.

We can see that of the patients have high PCO2 levels.

Also of the same patients have low pH levels.

So in our sample of ICU patients we see that a higher percentage of them have elevated PCO2 levels, but is this difference statistically significant?

McNemar’s test is used to determine this. The results of this test from JMP (using the large sample test) are shown to the left.

The McNemar’s test p-value is not significant (p = .0896). Therefore we cannot conclude that the difference between these two percentages is statistically significant. This is a two-sided p-value!
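The large-sample McNemar statistic uses only the two discordant cells: $\chi^2 = (b - c)^2/(b + c)$ on 1 df. The 2 X 2 table itself is not reproduced in this transcript, so the counts below are an assumption: b = 12 (elevated PCO2, normal pH) and c = 5 (low pH, normal PCO2) are consistent with the 61.5% and 6.4% quoted above for n = 200 and reproduce p = .0896, but check them against your own JMP table.

```python
import math

# HYPOTHETICAL discordant counts (not shown in this transcript):
# b = elevated PCO2 but normal pH, c = low pH but normal PCO2
b, c = 12, 5

stat = (b - c) ** 2 / (b + c)                  # large-sample McNemar statistic, df = 1
p_two_sided = math.erfc(math.sqrt(stat / 2))   # right-tail area of chi-square with 1 df
print(f"McNemar chi-square = {stat:.3f}, two-sided p = {p_two_sided:.4f}")
```

No continuity correction is applied here; a corrected version would use $(|b - c| - 1)^2/(b + c)$ and give a larger p-value.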

Page 90: Review of Methods from Prerequisite Course

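Exact McNemar’s Test: p-values (uses the binomial distribution)

Here b and c are the counts in the two discordant cells, and X is a binomial random variable with n = b + c trials and p = .50.

Ha: p1 > p2: p-value = $P(X \ge b \mid n = b + c,\ p = .50)$

Ha: p1 < p2: p-value = $P(X \ge c \mid n = b + c,\ p = .50)$

Ha: p1 ≠ p2: p-value = $2\,P(X \ge \max(b, c) \mid n = b + c,\ p = .50)$

Use either binomial probability tables or computer software to find these probabilities.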

Page 91: Review of Methods from Prerequisite Course

Example: Low pH and Elevated PCO2 Levels Here and , therefore we have discordant pairs.

If our research hypothesis was that a greater proportion of patients had elevated PCO2 levels than had low pH levels the p-value is found using the binomial distribution as:

If our research hypothesis was that a greater proportion of patients had low pH levels than had elevated PCO2 levels, the p-value is found using the binomial distribution as:

Finally the two-tailed p-value

Exact McNemar’s Test using the binomial distribution.

Notice the difference in the two-tailed p-values from the exact vs. large sample approximation.
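These binomial tail probabilities are easy to compute exactly. The discordant counts are not reproduced in this transcript, so the sketch assumes hypothetical values b = 12 (elevated PCO2, normal pH) and c = 5 (low pH, normal PCO2), which are consistent with the percentages quoted earlier; `binom_upper_tail` is a hypothetical helper name. Under these counts the exact two-sided p-value (about .14) is noticeably larger than the large-sample .0896.

```python
from math import comb

def binom_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# HYPOTHETICAL discordant counts: b = elevated PCO2 only, c = low pH only
b, c = 12, 5
n_disc = b + c

p_greater = binom_upper_tail(b, n_disc)   # Ha: more patients have elevated PCO2
p_less = binom_upper_tail(c, n_disc)      # Ha: more patients have low pH
p_two = min(1.0, 2 * binom_upper_tail(max(b, c), n_disc))
print(f"one-sided (PCO2) = {p_greater:.4f}, one-sided (pH) = {p_less:.4f}, two-sided = {p_two:.4f}")
```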

Page 92: Review of Methods from Prerequisite Course

Methods for a Categorical Response

For a dichotomous categorical response we have just reviewed the following:

• One population inference

• Two population inference

• Covered both large sample and exact methods.

We will cover the Cox-Stuart (or Cochran-Armitage) test for trend later in the course when we cover more advanced methods for analyzing categorical data.

Page 93: Review of Methods from Prerequisite Course

Methods for a Categorical Response

• Data in 2 X 2 Tables (covered above)
  o Comparing two population proportions using independent samples (Fisher’s Exact Test)
  o Comparing two population proportions using dependent samples (McNemar’s Test)
  o Relative Risk (RR), Odds Ratios (OR), Risk Difference, Attributable Risk (AR), & NNT/NNH

• Data in r X c Tables
  o Tests of Independence & Association and Homogeneity

Page 94: Review of Methods from Prerequisite Course

Example: Response to Treatment and Histological Type of Hodgkin’s Disease

In this study a random sample of 538 patients diagnosed with some form of Hodgkin’s Disease was taken and the histological type: nodular sclerosis (NS), mixed cellularity (MC), lymphocyte predominance (LP), or lymphocyte depletion (LD) was recorded along with the outcome from standard treatment which was recorded as being none, partial, or complete remission.

Q: Is there an association between type of Hodgkin’s and response to treatment? If so, what is the nature of the relationship?

Page 95: Review of Methods from Prerequisite Course

Example: Response to Treatment and Histological Type of Hodgkin’s Disease

Disease Type | None | Partial | Positive | Row Totals
LD | 44 | 10 | 18 | 72
LP | 12 | 18 | 74 | 104
MC | 58 | 54 | 154 | 266
NS | 12 | 16 | 68 | 96
Column Totals | 126 | 98 | 314 | n = 538

Some Probabilities of Potential Interest

Probability of Positive Response to Treatment:
P(positive) = 314/538 = .5836

Probability of Positive Response to Treatment Given Disease Type:
P(positive | LD) = 18/72 = .2500
P(positive | LP) = 74/104 = .7115
P(positive | MC) = 154/266 = .5789
P(positive | NS) = 68/96 = .7083

Notice the conditional probabilities are not equal to the unconditional probability!

Page 96: Review of Methods from Prerequisite Course

Mosaic plot of the results: Response to Treatment vs. Histological Type

Clearly we see that LP and NS respond most favorably to treatment, with over 70% of those sampled experiencing complete remission, whereas lymphocyte depletion (LD) has a majority (61.1%) of patients having no response to treatment. A statistical test at this point seems unnecessary, as it seems clear that there is an association between the type of Hodgkin’s disease and the response to treatment; nonetheless we will proceed…

Page 97: Review of Methods from Prerequisite Course

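Example: Response to Treatment and Histological Type of Hodgkin’s Disease

Observed frequencies (expected frequencies under independence in parentheses):

Disease Type | None | Partial | Positive | Row Totals
LD | 44 (16.86) | 10 (13.11) | 18 (42.02) | 72
LP | 12 (24.36) | 18 (18.94) | 74 (60.70) | 104
MC | 58 (62.30) | 54 (48.45) | 154 (155.25) | 266
NS | 12 (22.48) | 16 (17.49) | 68 (56.03) | 96
Column Totals | 126 | 98 | 314 | n = 538

$$E_{11} = \frac{R_1 C_1}{n} = \frac{(72)(126)}{538} = 16.86, \qquad E_{12} = \frac{R_1 C_2}{n} = \frac{(72)(98)}{538} = 13.11,$$

$$E_{13} = \frac{R_1 C_3}{n} = \frac{(72)(314)}{538} = 42.02, \qquad \ldots, \qquad E_{43} = \frac{R_4 C_3}{n} = \frac{(96)(314)}{538} = 56.03$$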

Page 98: Review of Methods from Prerequisite Course

Pearson’s Chi-Square Test of Independence

Pearson’s Chi-Square (χ²):

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2 \text{ distribution with } df = (r - 1)(c - 1)$$

Notice this test statistic still uses (O – E) as the basic building block. This statistic will be large when the observed frequencies do NOT match the expected values for independence.

Page 99: Review of Methods from Prerequisite Course

Chi-square Distribution (χ²)

This is a graph of the chi-square distribution with 4 degrees of freedom. The area to the right of Pearson’s chi-square statistic gives the p-value. The p-value is always the area to the right!

[Figure: chi-square density curve with the area to the right of the observed χ² shaded and labeled p-value]

Page 100: Review of Methods from Prerequisite Course

Example: Response to Treatment and Histological Type of Hodgkin’s Disease

Observed frequencies (expected frequencies in parentheses):

Disease Type | None | Partial | Positive | Row Totals
LD | 44 (16.86) | 10 (13.11) | 18 (42.02) | 72
LP | 12 (24.36) | 18 (18.94) | 74 (60.70) | 104
MC | 58 (62.30) | 54 (48.45) | 154 (155.25) | 266
NS | 12 (22.48) | 16 (17.49) | 68 (56.03) | 96
Column Totals | 126 | 98 | 314 | n = 538

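$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(44 - 16.86)^2}{16.86} + \frac{(10 - 13.11)^2}{13.11} + \cdots + \frac{(68 - 56.03)^2}{56.03} = 75.89$$

$$df = (4 - 1)(3 - 1) = 6, \qquad p\text{-value} < .0001$$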

We have strong evidence of an association between the type of Hodgkin’s and response to treatment (p < .0001).
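The full r X c calculation can be sketched end to end with the standard library. Because df = 6 is even, the right-tail area has the closed form $e^{-x/2}\sum_{k=0}^{df/2-1} (x/2)^k/k!$, so no statistics package is needed; this is an illustrative check, not JMP’s implementation.

```python
import math

# Observed counts: rows LD, LP, MC, NS; columns None, Partial, Positive
observed = [[44, 10, 18],
            [12, 18, 74],
            [58, 54, 154],
            [12, 16, 68]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson statistic: sum over cells of (O - E)^2 / E with E = R_i * C_j / n
stat = sum((o - r * c / n) ** 2 / (r * c / n)
           for row, r in zip(observed, row_totals)
           for o, c in zip(row, col_totals))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail area for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
half = stat / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.2e}")
```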

Page 101: Review of Methods from Prerequisite Course

Summary of Review

• We have reviewed most of the methods covered in the prerequisite course that were organized in the flow charts for a numeric response and for a dichotomous categorical response.

• Additionally we reviewed the chi-square test of independence for r x c contingency tables.

• The other major topics covered in the prerequisite course that were not reviewed are basic study design, correlation, and regression modeling. We will review and extend our coverage of these topics later in the course.