Introduction to Biostatistics for Clinical and Translational Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
FRONTIERS: The Heartland Institute of Clinical and Translational Research
Course Information
Jo A. Wick, PhD
Office Location: 5028 Robinson
Email: [email protected]
Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Events and Opportunities’
Inferences: Hypothesis Testing
Experiment
An experiment is a process whose results are not known until after it has been performed. The range of possible outcomes is known in advance; we do not know the exact outcome, but would like to know the chances of its occurrence. The probability of an outcome E, denoted P(E), is a numerical measure of the chances of E occurring: 0 ≤ P(E) ≤ 1.
Probability
The most common definition of probability is the relative frequency view:
P(x = a) = (# of times x = a) / (total # of observations of x)
Probabilities for the outcomes of a random variable x are represented through a probability distribution:
[Figure: two example probability distributions: a discrete distribution of hospital length of stay, with the bar at 6 days highlighting the probability that length of stay = 6 days, and a continuous bell-shaped density curve.]
Population Parameters
Most often our research questions involve unknown population parameters:
What is the average BMI among 5th graders?
What proportion of hospital patients acquire a hospital-based infection?
To determine these values exactly would require a census.
However, due to a prohibitively large population (or other considerations) a sample is taken instead.
Sample Statistics
Statistics describe or summarize sample observations.
They vary from sample to sample, making them random variables.
We use statistics generated from samples to make inferences about the parameters that describe populations.
Sampling Variability
[Figure: a population with mean μ and standard deviation σ; each of several samples drawn from it yields its own x̄ and s, and together the sample means form the sampling distribution of x̄.]
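This is easy to see by simulation. A minimal sketch (the population parameters and sample size here are arbitrary illustrative choices):

```python
# Simulate the sampling distribution of the mean: draw many samples from the
# same population and record each sample's mean.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 0.0, 1.0, 25   # illustrative population parameters and sample size

means = [rng.normal(mu, sigma, n).mean() for _ in range(1000)]
print(f"mean of the sample means = {np.mean(means):.3f}")  # close to mu
print(f"SD of the sample means   = {np.std(means):.3f}")   # close to sigma/sqrt(n) = 0.2
```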
Recall: Hypotheses
Null hypothesis “H0”: statement of no difference or association between variables. This is the hypothesis we test; the first step in the ‘recipe’ for hypothesis testing is to assume H0 is true.
Alternative hypothesis “H1”: statement of a difference or association between variables. This is what we are (usually) trying to prove.
Hypothesis Testing
One-tailed hypothesis: the outcome is expected in a single direction (e.g., administration of the experimental drug will result in a decrease in systolic BP); H1 includes ‘<’ or ‘>’.
Two-tailed hypothesis: the direction of the effect is unknown (e.g., the experimental therapy will result in a different response rate than that of the current standard of care); H1 includes ‘≠’.
Hypothesis Testing
The statistical hypotheses are statements concerning characteristics of the population(s) of interest:
Population mean: μ
Population variability: σ
Population rate (or proportion): π
Population correlation: ρ
Example: It is hypothesized that the response rate for the experimental therapy is greater than that of the current standard of care.πExp > πSOC ← This is H1.
Recall: Decisions
Type I Error (α): a true H0 is incorrectly rejected. “An innocent man is proven GUILTY in a court of law.” Commonly accepted rate is α = 0.05.
Type II Error (β): failing to reject a false H0. “A guilty man is proven NOT GUILTY in a court of law.” Commonly accepted rate is β = 0.2.
Power (1 – β): correctly rejecting a false H0. “Justice has been served.” Commonly accepted rate is 1 – β = 0.8.
Decisions
Conclusion \ Truth | H1 | H0
Conclude H1 | Correct (Power) | Type I Error
Conclude H0 | Type II Error | Correct
Basic Recipe for Hypothesis Testing
1. State H0 and H1
2. Assume H0 is true ← Fundamental assumption!!
3. Collect the evidence—from the sample data, compute the appropriate sample statistic and the test statistic
Test statistics quantify the level of evidence within the sample—they also provide us with the information for computing a p-value (e.g., t, chi-square, F)
4. Determine if the test statistic is large enough to meet the a priori determined level of evidence necessary to reject H0 (. . . or, is p < α?)
Example: Carbon Monoxide
An experiment is undertaken to determine the concentration of carbon monoxide in air.
It is a concern that the actual concentration is significantly greater than 10 mg/m3.
Eighteen air samples are obtained and the concentration for each sample is measured. The outcome x is the carbon monoxide concentration in the samples. The characteristic (parameter) of interest is μ, the true average concentration of carbon monoxide in air.
Step 1: State H0 & H1
H1: μ > 10 mg/m3 ← We suspect!
H0: μ ≤ 10 mg/m3 ← We assume in order to test!
Step 2: Assume μ = 10
[Figure: the null sampling distribution of x̄, centered at μ = 10.]
Step 3: Evidence
10.25 10.37 10.66
10.47 10.56 10.22
10.44 10.38 10.63
10.40 10.39 10.26
10.32 10.35 10.54
10.33 10.48 10.68
Sample statistic: x̄ = 10.43
Test statistic: t = (x̄ − μ0) / (s/√n) = (10.43 − 10) / (1.02/√18) = 1.79
What does 1.79 mean? How do we use it?
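One way to see where 1.79 comes from: a minimal sketch of the computation in Python, using the summary statistics quoted above (x̄ = 10.43, s = 1.02, n = 18):

```python
# One-sample, one-tailed t-test for the carbon monoxide example, computed
# from the slide's summary statistics.
import math
from scipy import stats

x_bar, s, n, mu0 = 10.43, 1.02, 18, 10.0

t = (x_bar - mu0) / (s / math.sqrt(n))  # test statistic
p = stats.t.sf(t, df=n - 1)             # upper-tail p-value, P(T >= t)
print(f"t = {t:.2f}, p = {p:.4f}")      # t ≈ 1.79, p ≈ 0.046
```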
Student’s t Distribution
Remember when we assumed H0 was true?
[Figure: the null sampling distribution again, centered at μ = 10 (Step 2: assume μ = 10).]
Student’s t Distribution
What we were actually doing was setting up this theoretical Student’s t distribution from which the p-value can be calculated:
[Figure: the theoretical Student’s t distribution, centered at t = 0; under H0, t = (x̄ − μ0) / (s/√n) = (10 − 10) / (1.02/√18) = 0.]
Student’s t Distribution
Assuming the true air concentration of carbon monoxide is actually 10 mg/m3, how likely is it that we should get evidence in the form of a sample mean equal to 10.43?
[Figure: the null distribution centered at μ = 10, with the observed x̄ = 10.43 marked. P(x̄ ≥ 10.43) = ?]
Student’s t Distribution
We can say how likely by framing the statement in terms of the probability of an outcome:
[Figure: the Student’s t distribution centered at t = 0, with the observed value t = 1.79 marked in the upper tail.]
t = (x̄ − μ0) / (s/√n) = (10.43 − 10) / (1.02/√18) = 1.79
p = P(t ≥ 1.79) = 0.0456
Step 4: Make a Decision
Decision rule: if p ≤ α, the chances of getting the actual collected evidence from our sample, given that the null hypothesis is true, are very small. The observed data conflict with the null ‘theory’ and support the alternative ‘theory.’ Since the evidence (data) was actually observed and our theory (H0) is unobservable, we choose to believe that our evidence is the more accurate portrayal of reality and reject H0 in favor of H1.
Step 4: Make a Decision
What if our evidence had not been in as great a degree of conflict with our theory? If p > α, the chances of getting the actual collected evidence from our sample, given that the null hypothesis is true, are pretty high.
We fail to reject H0.
[Figure: the null distribution centered at 10, with x̄ = 10.1 marked near the center. P(x̄ ≥ 10.1) = ?]
Decision
How do we know if the decision we made was the correct one? We don’t! If α = 0.05, the chances of our decision being an incorrect rejection of a true H0 are no greater than 5%. We have no way of knowing whether we made this kind of error; we only know that our chances of making it in this setting are relatively small.
Which test do I use?
What kind of outcome do you have? Nominal? Ordinal? Interval? Ratio?
How many samples do you have? Are they related or independent?
Types of Tests
One Sample

Measurement Level | Population Parameter | Hypotheses | Sample Statistic | Inferential Method(s)
Nominal | Proportion π | H0: π = π0 vs. H1: π ≠ π0 | p = x/n | Binomial test or z test (if np > 10 and nq > 10)
Ordinal | Median M | H0: M = M0 vs. H1: M ≠ M0 | m = p50 | Wilcoxon signed-rank test
Interval | Mean μ | H0: μ = μ0 vs. H1: μ ≠ μ0 | x̄ | Student’s t or Wilcoxon (if non-normal or small n)
Ratio | Mean μ | H0: μ = μ0 vs. H1: μ ≠ μ0 | x̄ | Student’s t or Wilcoxon (if non-normal or small n)
Types of Tests
Parametric methods: make assumptions about the distribution of the data (e.g., normally distributed) and are suited for sample sizes large enough to assess whether the distributional assumption is met.
Nonparametric methods: make no assumptions about the distribution of the data and are suitable for small sample sizes or large samples where parametric assumptions are violated. They use the ranks of the data values rather than the actual data values themselves, at the cost of some power when the parametric test is appropriate.
Types of Tests
Two Independent Samples

Measurement Level | Population Parameters | Hypotheses | Sample Statistics | Inferential Method(s)
Nominal | π1, π2 | H0: π1 = π2 vs. H1: π1 ≠ π2 | p1 = x1/n1, p2 = x2/n2 | Fisher’s exact or chi-square (if cell counts > 5)
Ordinal | M1, M2 | H0: M1 = M2 vs. H1: M1 ≠ M2 | m1, m2 | Median test
Interval | μ1, μ2 | H0: μ1 = μ2 vs. H1: μ1 ≠ μ2 | x̄1, x̄2 | Student’s t or Mann-Whitney (if non-normal, unequal variances, or small n)
Ratio | μ1, μ2 | H0: μ1 = μ2 vs. H1: μ1 ≠ μ2 | x̄1, x̄2 | Student’s t or Mann-Whitney (if non-normal, unequal variances, or small n)
# Groups = 2
  Normal or large n
    Independent samples: 2-sample t
    Dependent samples: Paired t
  Non-normal or small n
    Independent samples: Wilcoxon rank-sum (Mann-Whitney)
    Dependent samples: Wilcoxon signed-rank
# Groups > 2
  Normal or large n
    Independent samples: ANOVA
    Dependent samples: 2-way ANOVA
  Non-normal or small n
    Independent samples: Kruskal-Wallis
    Dependent samples: Friedman’s
Comparing Central Tendency
Two-Sample Test of Means
Clotting times (minutes) of blood for subjects given one of two different drugs:
It is hypothesized that the two drugs will result in different blood-clotting times.
H1: μB ≠ μG
H0: μB = μG
Drug B: 8.8, 8.4, 7.9, 8.7, 9.1, 9.6 (x̄1 = 8.75)
Drug G: 9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5 (x̄2 = 9.74)
Two-Sample Test of Means
What we’re actually hypothesizing: H0: μB - μG = 0
[Figure: the null distribution of x̄1 − x̄2, centered at 0. The observed difference x̄1 − x̄2 = −0.99 is the evidence: what are P(x̄1 − x̄2 ≤ −0.99) and P(x̄1 − x̄2 ≥ +0.99)?]
Two-Sample Test of Means
What we’re actually hypothesizing: H0: μB - μG = 0
[Figure: the t distribution centered at t = 0, with the observed value marked in both tails at t = −2.48 and t = +2.48.]
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) = (8.75 − 9.74) / 0.40 = −2.475
p = P(|t| ≥ 2.475) = 0.03
***Two-sided tests detect ANY evidence in EITHER direction that the null difference is unlikely!
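A minimal sketch of this two-sample test with SciPy (the pooled-variance version, which reproduces the slide’s rounded figures):

```python
# Two-sample (pooled-variance) t-test on the clotting-time data.
from scipy import stats

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

# Pass equal_var=False instead for Welch's test when variances look unequal.
t, p = stats.ttest_ind(drug_b, drug_g)
print(f"t = {t:.3f}, p = {p:.3f}")  # t ≈ -2.48, p ≈ 0.03 (two-sided)
```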
Assumptions of t
In order to use the parametric Student’s t test, we have a few assumptions that need to be met:Approximate normality of the observationsIn the case of two samples, approximate equality of the
sample variances
Assumption Checking
To assess the assumption of normality, a simple histogram would show any issues with skewness or outliers:
Assumption Checking
[Figure: histograms illustrating skewness.]
Assumption Checking
Other graphical assessments include the QQ plot:
Assumption Checking
Violation of normality:
Assumption Checking
To assess the assumption of equal variances (when groups = 2), simple boxplots would show any issues with heteroscedasticity:
Assumption Checking
Rule of thumb: if the larger variance is more than 2 times the smaller, the assumption has been violated
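A minimal sketch of these checks in Python, reusing the clotting-time data as the illustration (the values are pooled here for brevity; with larger samples you would check each group separately):

```python
# Assumption checks: histogram and QQ plot for normality, plus the 2x
# variance rule of thumb for two groups.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

drug_b = np.array([8.8, 8.4, 7.9, 8.7, 9.1, 9.6])
drug_g = np.array([9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5])
pooled = np.concatenate([drug_b, drug_g])

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(pooled)                  # histogram: look for skewness, outliers
stats.probplot(pooled, plot=axes[1])  # QQ plot: look for departures from the line
plt.show()

ratio = max(drug_b.var(ddof=1), drug_g.var(ddof=1)) / \
        min(drug_b.var(ddof=1), drug_g.var(ddof=1))
print(f"variance ratio = {ratio:.2f}")  # rule of thumb: > 2 suggests a violation
```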
Now what?
If you have enough observations (20? 30?) to be able to determine that the assumptions are feasible, check them.If violated:
• Try a transformation to correct the violated assumptions (natural log) and reassess; proceed with the t-test if fixed
• If a transformation doesn’t work, proceed with a non-parametric test• Skip the transformation altogether and proceed to the non-
parametric test
If okay, proceed with t-test.
Now what?
If you have too small a sample to adequately assess the assumptions, perform the non-parametric test instead.For the one-sample t, we typically substitute the Wilcoxon
signed-rank testFor the two-sample t, we typically substitute the Mann-
Whitney test
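A minimal sketch of both substitutes, reusing earlier data for illustration:

```python
# Nonparametric substitutes: Mann-Whitney for two independent samples,
# Wilcoxon signed-rank for the one-sample (or paired) case.
from scipy import stats

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

u, p = stats.mannwhitneyu(drug_b, drug_g)   # substitute for the two-sample t
print(f"Mann-Whitney U = {u}, p = {p:.3f}")

co = [10.25, 10.37, 10.66, 10.47, 10.56, 10.22]  # a few CO values, tested vs. mu0 = 10
w, p = stats.wilcoxon([x - 10 for x in co])      # substitute for the one-sample t
print(f"Wilcoxon signed-rank W = {w}, p = {p:.3f}")
```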
Consequences of Nonparametric Testing
Robust! But less powerful, because they are based on ranks, which do not contain the full level of information contained in the raw data.
When in doubt, use the nonparametric test—it will be less likely to give you a ‘false positive’ result.
Speaking of Power
“How many subjects do we need?” Statistical methods can be used to determine the required number of patients to meet the trial’s principal scientific objectives.
Other considerations that must be accounted for include the availability of patients and resources and the ethical need to prevent any patient from receiving inferior treatment. We want the minimum number of patients required to achieve our principal scientific objective.
The Size of a Clinical Trial
For the chosen level of significance (type I error rate, α), a clinically meaningful difference (Δ) between two groups can be detected with a minimally acceptable power (1 – β) with n subjects.
Example: Detecting a Difference
Primary objective: To compare pain improvement in knee OA for new treatment A compared to standard treatment S.
Primary outcome: Change in pain score from baseline to 24 weeks (continuous).
Data analysis: Comparison of mean change in pain score of patients on treatment A (μ1) versus standard (μ2) using a two-sided t-test at the α = 0.05 level of significance.
Example: Detecting a Difference
Difference to detect (Δ): It has been determined that a difference of 10 on this pain scale is clinically meaningful. If standard therapy results in a 5-point decrease, our new therapy would need to show a decrease of at least 15 (5 + 10) to be declared clinically different from the standard.
We would like to be 80% sure that we detect this difference as statistically significant.
Example: Detecting a Difference
What usually occurs on the standard? This is important information because it tells us about the behavior of the outcome (pain scale) in these patients. If the pain scale has great variability, it may be difficult to detect small to moderate changes (signal-to-noise)!
‘Signal-to-Noise’
[Figure: two plots of change in pain from baseline for groups S and A; each shows a between-group difference of 20, but with very different amounts of within-group variability.]
Example: Detecting a Difference
We have: H0: μ1 = μ2 versus H1: μ1 ≠ μ2 (Δ ≠ 0); α = 0.05; 1 – β = 0.80; Δ = 10.
For continuous outcomes we need to determine what difference would be clinically meaningful, but specified in the form of an effect size which takes into account the variability of the data.
Example: Detecting a Difference
Effect size is the difference in the means divided by the standard deviation, usually of the control or comparison group, or the pooled standard deviation of the two groups
d = (μ1 − μ2) / σ
where, in the pooled case, σ = √{[(n1 − 1)σ1² + (n2 − 1)σ2²] / (n1 + n2 − 2)}
Example: Detecting a Difference
Interactive web-based power calculators can show the relationship between power and the sample size, variability, and difference to detect.
A decrease in the variability of the data results in an increase in power for a given sample size.
An increase in the effect size results in a decrease in the required sample size to achieve a given power.
Decreasing α results in an increase in the required sample size to achieve a given power.
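A minimal sketch of the sample-size calculation for the knee OA example, assuming (hypothetically) a standard deviation of 20 points on the pain scale, so that the effect size is d = 10/20 = 0.5:

```python
# Sample size for a two-sided, two-sample t-test with alpha = 0.05 and
# power = 0.80; sigma = 20 is an assumed value for illustration only.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # d = delta / sigma = 10 / 20
    alpha=0.05,               # two-sided type I error rate
    power=0.80,               # 1 - beta
    alternative="two-sided",
)
print(f"n per group = {n_per_group:.1f}")  # about 64 per group
```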
Inferences on Two Means
Example: Smoking cessation
Two types of therapy: x = {behavioral therapy, literature}
Dependent variable: y = % decrease in number of cigarettes smoked per day after six months of therapy
Behavioral Therapy Literature Only
10 6
20 2
65 0
0 12
30 4
Smoking Cessation
Research question: Is behavioral therapy in addition to education better than education alone in getting smokers to quit?
H0: μ1 = μ2 versus H1: μ1 ≠ μ2
Two independent samples t-test IF:
the change is approximately normal OR can be transformed to an approximate normal distribution (e.g., natural log)
the variability within each group is approximately the same (ROT: no more than a 2x difference)
Smoking Cessation
Conclusion: Adding behavioral therapy to cessation education results in—on average—a greater reduction in cigarettes smoked per day at six months post-therapy when compared to education alone (t30.9 = -2.87, p < 0.01).
Reject H0: μ1 = μ2
Smoking Cessation
The 95% confidence interval is:
−8.39 ≤ μ1 − μ2 ≤ −1.42
Interpretation: On average, behavioral therapy resulted in an additional reduction of 4.9% (95% CI: 1.42%, 8.39%) relative to control.
Confidence Intervals
What exactly do confidence intervals represent? Remember that theoretical sampling distribution concept? It doesn’t actually exist; it’s only mathematical. What would we see if we took sample after sample after sample and did the same test on each . . .
Confidence Intervals
Suppose we actually took sample after sample . . . 100 of them, to be exact
Every time we take a different sample and compute the confidence interval, we will likely get a slightly different result simply due to sampling variability.
Confidence Intervals
Suppose we actually took sample after sample . . . 100 of them, to be exact
95% confident means: “In 95 of the 100 samples, our interval will contain the true unknown value of the parameter. However, in 5 of the 100 it will not.”
Confidence Intervals
Suppose we actually took sample after sample . . . 100 of them, to be exact
Our “confidence” is in the procedure that produces the interval—i.e., it performs well most of the time.
Our “confidence” is not directly related to our particular interval—we cannot say “The probability that the mean difference is between (1.4,8.4) is 0.95.”
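A minimal simulation of the repeated-sampling idea: draw 100 samples, build a 95% confidence interval from each, and count how many cover the true mean (the population used here is an arbitrary illustrative choice):

```python
# Coverage of 95% t-based confidence intervals over 100 repeated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 0.0, 1.0, 30
t_crit = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(100):
    sample = rng.normal(mu, sigma, n)
    x_bar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    if x_bar - t_crit * se <= mu <= x_bar + t_crit * se:
        covered += 1
print(f"{covered} of 100 intervals contain the true mean")  # about 95
```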
Inferences on More Than Two Means
Example: Smoking cessation
Three types of therapy: x = {pharmaceutical therapy, behavioral therapy, literature}
Dependent variable: y = % decrease in number of cigarettes smoked per day after six months of therapy
Pharmaceutical Therapy Behavioral Therapy Literature Only
10 10 6
30 0 20
60 6 0
32 0 12
65 30 4
Smoking Cessation
Research question: Is therapy in addition to education better than education alone in getting smokers to quit? If so, is one therapy more effective?
H0: μ1 = μ2 = μ3 versus H1: at least one μ is different.
More than 2 independent samples requires an ANOVA:
the change is approximately normal OR can be transformed to an approximate normal distribution (e.g., natural log)
the variability within each group is approximately the same (ROT: no more than a 2x difference)
Smoking Cessation
ANOVA produces a table:
One-way ANOVA indicates you have a single categorical factor x (e.g., treatment) and a single continuous response y and your interest is in comparing the mean response μ across the levels of the categorical factor.
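A minimal sketch of the overall F test on the five observations per group shown above (the slide’s own table and reported p-values may come from a larger dataset):

```python
# One-way ANOVA on the three-group smoking-cessation data.
from scipy import stats

pharm = [10, 30, 60, 32, 65]
behav = [10, 0, 6, 0, 30]
lit   = [6, 20, 0, 12, 4]

f, p = stats.f_oneway(pharm, behav, lit)
print(f"F = {f:.2f}, p = {p:.3f}")  # F is MS_between / MS_within
```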
Wait . . .
Why is ANOVA using variances when we’re hypothesizing about means?
Between-groups mean square: a variance
Within-groups mean square: also a variance
F: a ratio of variances, F = MSBG/MSWG
What’s the Rationale?
In the simplest case of the one-way ANOVA, the variation in the response y is broken down into parts: variation in response attributed to the treatment (group/sample) and variation in response attributed to error (subject characteristics + everything else not controlled for). The variation in the treatment (group/sample) means is compared to the variation within a treatment (group/sample) using a ratio: this is the F test statistic!
If the between-treatment variation is a lot bigger than the within-treatment variation, that suggests there are some different effects among the treatments.
Rationale
[Figure: boxplots of three scenarios, labeled 1, 2, and 3.]
Rationale
There is an obvious difference between scenarios 1 and 2. What is it?
Just looking at the boxplots, which of the two scenarios (1 or 2) do you think would provide more evidence that at least one of the populations is different from the others? Why?
F Statistic
Case A: If all the sample means were exactly the same, what would be the value of the numerator of the F statistic?
Case B: If all the sample means were spread out and very different, how would the variation between sample means compare to the value in A?
F = (variation between the sample means) / (natural variation within the samples)
F Statistic
So what values could the F statistic take on? Could you get an F that is negative? What type of values of F would lead you to believe the null hypothesis (that there is no difference in group means) is not accurate?
F = (variation between the sample means) / (natural variation within the samples)
Smoking Cessation
ANOVA produces a table:
Conclusion: Reject H0: μ1 = μ2 = μ3. Some difference in the number of cigarettes smoked per day exists between subjects receiving the three types of therapy.
Smoking Cessation
ANOVA produces a table:
But where is the difference? Are the two experimental therapies different? Or is each different from the control?
Reject H0: μ1 = μ3 and H0: μ2 = μ3. Both pharmaceutical and behavioral therapy are significantly different from the literature-only control group, but the two therapies are not different from each other.
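The slide does not name the post-hoc procedure used; one standard choice is Tukey’s HSD, sketched here on the illustrative data (results for these five observations per group need not match the slide’s reported p-values):

```python
# Post-hoc pairwise comparisons with Tukey's HSD after a significant ANOVA.
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = [10, 30, 60, 32, 65,   # pharmaceutical therapy
          10, 0, 6, 0, 30,      # behavioral therapy
          6, 20, 0, 12, 4]      # literature only
groups = ["pharm"] * 5 + ["behav"] * 5 + ["lit"] * 5

print(pairwise_tukeyhsd(values, groups, alpha=0.05))
```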
Smoking Cessation
Conclusion: Adding either behavioral (p = 0.015) or pharmaceutical therapy (p < 0.01) to cessation education results in—on average—significantly greater decreases in cigarettes smoked per day at six months post-therapy when compared to education alone.
Inferences on Means
Concerns a continuous response y. One or two groups: t. More than two groups: ANOVA.
Remember, this (and the two-sample case) is essentially looking at the association between an x and a y, where x is categorical (nominal or ordinal) and y is continuous (interval or ratio).
Check assumptions! Normality of y; equal group variances.
ANOVA Models
There are many . . .
I. Randomized designs with one treatment
  A. Subjects not subdivided on any basis other than randomization prior to assignment to treatment levels; no restriction on random assignment other than the option of assigning the same number of subjects to each treatment level
    1. Completely randomized (one-factor) design
  B. Subjects subdivided on some nonrandom basis, or one or more restrictions on random assignment other than assigning the same number of subjects to each treatment level
    1. Balanced incomplete block design
    2. Crossover design
    3. Generalized randomized block design
    4. Graeco-Latin square design
    5. Hyper-Graeco-Latin square design
    6. Latin square design
    7. Partially balanced incomplete block design
    8. Randomized block design
    9. Youden square design
II. Randomized designs with two or more treatments
  A. Factorial experiments: designs in which all treatment levels are crossed
    1. Designs without confounding
      a. Completely randomized factorial design
      b. Generalized randomized factorial design
      c. Randomized block factorial design
    2. Designs with group-treatment confounding
      a. Split-plot factorial design
    3. Designs with group-interaction confounding
      a. Latin square confounded factorial design
      b. Randomized block completely confounded factorial design
Inferences on Proportions (k = 2)
Example: plant genetics
Two phenotypes: x = {yellow-flowered plants, green-flowered plants}
Dependent variable: y = proportion of plants out of 100 progeny that express each phenotype, estimated by p = x/n
What the data look like in the dataset (one row per plant): Phenotype = Yellow, Yellow, Green, Yellow, Green, . . .
Plant Genetics
The plant geneticist hypothesizes that his crossed progeny will result in a 3:1 phenotypic ratio of yellow-flowered to green-flowered plants.
H0: The population contains 75% yellow-flowered plants versus H1: The population does not contain 75% yellow-flowered plants.
H0: πy = 0.75 versus H1: πy ≠ 0.75
This particular type of test is referred to as the chi-square goodness of fit test for k = 2.
Plant Genetics
Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:
χ² = Σ (O − E)² / E, with DF = k − 1, where k is the number of categories of x
Plant Genetics
Suppose the researcher actually observed in his sample of 100 plants this breakdown of phenotype:
Does it appear that this type of sample could have come from a population where the true proportion of yellow-flowered plants is 75%?
Phenotype f (%)
Yellow-flowered 84 (84%)
Green-flowered 16 (16%)
Plant Genetics
Conclusion: Reject H0: πy = 0.75—it does not appear that the geneticist’s hypothesis about the population phenotypic ratio is correct (p = 0.038).
Phenotype f (%)
Yellow-flowered 84 (84%)
Green-flowered 16 (16%)
χ²₁ = (84 − 75)²/75 + (16 − 25)²/25 = 4.32
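A minimal sketch of this goodness-of-fit test in SciPy:

```python
# Chi-square goodness-of-fit test for the 3:1 phenotypic ratio (k = 2).
from scipy import stats

observed = [84, 16]
expected = [75, 25]   # 75% and 25% of the 100 plants under H0

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 4.32, p ≈ 0.038
```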
Inferences on Proportions (k > 2)
Example: plant genetics
Four phenotypes: x = {yellow-smooth flowered, yellow-wrinkled flowered, green-smooth flowered, green-wrinkled flowered}
Dependent variable: y = proportion of plants out of 250 progeny that express each phenotype
What the data look like in the dataset (one row per plant): Phenotype = Yellow smooth, Yellow smooth, Green wrinkled, Yellow wrinkled, . . .; each proportion is estimated by p = x/n.
Plant Genetics
The plant geneticist hypothesizes that his crossed progeny will result in a 9:3:3:1 phenotypic ratio of YS:YW:GS:GW plants.
Actual numeric hypothesis is H0: π1 = 0.5625, π2 = 0.1875, π3 = 0.1875, π4 = 0.0625
This particular type of test is referred to as the chi-square goodness of fit test for k = 4.
Plant Genetics
Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:
χ² = Σ (O − E)² / E, with DF = k − 1, where k is the number of categories of x
Plant Genetics
Suppose the researcher actually observed in his sample of 250 plants this breakdown of phenotype:
Does it appear that this type of sample could have come from a population where the true phenotypic ratio is as the geneticist hypothesized?
Phenotype f (%)
YS 152 (60.8%)
YW 39 (15.6%)
GS 53 (21.2%)
GW 6 (2.4%)
Plant Genetics
Conclusion: Reject H0—it does not appear that the geneticist’s hypothesis about the population phenotypic ratio is correct (p = 0.03).
Phenotype f (%)
YS 152 (60.8%)
YW 39 (15.6%)
GS 53 (21.2%)
GW 6 (2.4%)
χ²₃ = 8.97
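The same computation for k = 4, with the expected counts derived from the 9:3:3:1 ratio:

```python
# Chi-square goodness-of-fit test for the 9:3:3:1 phenotypic ratio (k = 4).
from scipy import stats

observed = [152, 39, 53, 6]                       # YS, YW, GS, GW
expected = [250 * r / 16 for r in (9, 3, 3, 1)]   # 140.625, 46.875, 46.875, 15.625

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")          # chi2 ≈ 8.97, p ≈ 0.030
```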
Inferences on Proportions
Concerns a categorical response y. Regardless of the number of groups, a chi-square test may be used. Remember, this is essentially looking at the association between an x and a y, where x is categorical (nominal or ordinal) and y is categorical (nominal or ordinal).
Assumptions? ROT: no expected frequency should be less than 5 (i.e., each nπ ≥ 5). If not met, use the binomial (k = 2) or multinomial (k > 2) test.
Inferences on Proportions
What do we do when we have nominal data on more than one factor x? Gender and hair color; menopausal status and disease stage at diagnosis; ‘handedness’ and gender.
We still use chi-square! These types of tests are looking at whether two categorical variables are independent of one another; thus, tests of this type are often referred to as chi-square tests of independence.
Inferences on Proportions
Example: Hair color and gender
Gender: x1 = {M, F}
Hair color: x2 = {Black, Brown, Blonde, Red}
Black Brown Blonde Red Total
Male 32 (32%) 43 (43%) 16 (16%) 9 (9%) 100
Female 55 (27.5%) 65 (32.5%) 64 (32%) 16 (8%) 200
Total 87 108 80 25 N = 300
What the data should look like in the actual dataset (one row per subject):
Gender | Hair Color
Male | Black
Female | Red
Female | Blonde
. . .
Hair Color and Gender
The researcher hypothesizes that hair color is not independent of sex.
H0: Hair color is independent of gender (i.e., the phenotypic ratio is the same within each gender).
H1: Hair color is not independent of gender (i.e., the phenotypic ratio is different between genders).
Hair Color and Gender
Chi-square statistics compute deviations between what is expected (under H0) and what is actually observed in the data:
χ² = Σ (O − E)² / E, with DF = (r − 1)(c − 1), where r is the number of rows and c is the number of columns
Hair Color and Gender
Does it appear that this type of sample could have come from a population where the different hair colors occur with the same frequency within each gender?
OR does it appear that the distribution of hair color is different between men and women?
Black Brown Blonde Red Total
Male 32 (32%) 43 (43%) 16 (16%) 9 (9%) 100
Female 55 (27.5%) 65 (32.5%) 64 (32%) 16 (8%) 200
Total 87 108 80 25 N = 300
Hair Color and Gender
Conclusion: Reject H0: Gender and Hair Color are independent. It appears that the researcher’s hypothesis that the population phenotypic ratio is different between genders is correct (p = 0.029).
Black Brown Blonde Red Total
Male 32 (32%) 43 (43%) 16 (16%) 9 (9%) 100
Female 55 (27.5%) 65 (32.5%) 64 (32%) 16 (8%) 200
Total 87 108 80 25 N = 300
Critical value (α = 0.05, 3 DF): χ² = 7.815
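A minimal sketch of the test of independence on the observed counts:

```python
# Chi-square test of independence for gender and hair color.
import numpy as np
from scipy import stats

table = np.array([[32, 43, 16, 9],    # males:   black, brown, blonde, red
                  [55, 65, 64, 16]])  # females: black, brown, blonde, red

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")  # df = 3, p ≈ 0.029
```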
Inferences on Proportions
Special case: when you have a 2X2 contingency table, you are actually testing a hypothesis concerning two population proportions: H0: π1 = π2
(i.e., the proportion of males who are blonde is the same as the proportion of females who are blonde).
Blonde Non-blonde Total
Male 16 (16%) 84 (84%) 100
Female 64 (32%) 136 (68%) 200
Total 80 (26.7%) 220 (73.3%) N = 300
Inferences on Proportions
When you have a single proportion and have a small sample, substitute the Binomial test which provides exact results.
The nonparametric Fisher exact test can always be used in place of the chi-square test when you have contingency-table-like data (i.e., two categorical factors whose association is of interest); it should be substituted for the chi-square test of independence when ‘cell’ sizes are small.
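A minimal sketch of Fisher’s exact test on the 2×2 blonde/non-blonde table above:

```python
# Fisher's exact test: H0 is that the proportion of blondes is the same
# for males and females.
from scipy import stats

table = [[16, 84],    # males:   blonde, non-blonde
         [64, 136]]   # females: blonde, non-blonde

odds_ratio, p = stats.fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, p = {p:.4f}")  # a small p suggests the proportions differ
```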
Next Time
Linear Regression and Correlation
Survival Analysis
Final Thoughts