Post on 27-Jun-2020

www.CDNetwork.org www.NACHC.com

www.icommunityhealth.org www.aapcho.org www.SCPHCA.org www.accesscommunityhealth.net

Enhancing Community Health Center PCORI Engagement (EnCoRE)

This work was partially supported through a Patient-Centered Outcomes Research Institute (PCORI) Program Award (NCHR 1000-30-10-10 EA-0001).

With support from: N2 PBRN

funded by:

Project Partners

Clinical Directors Network (CDN), New York, NY

National Association of Community Health Centers (NACHC), Washington, D.C.

The Association of Asian Pacific Community Health Organizations (AAPCHO), Oakland, CA

Access Community Health Network, Chicago, IL

Institute for Community Health (ICH), a Harvard Affiliated Institute, Cambridge, MA

The South Carolina Primary Health Care Association (SCPHCA), Columbia, South Carolina

Jonathan N. Tobin, PhD JNTobin@CDNetwork.org

Michelle Proser, MPP MProser@NACHC.org

Michelle Jester, MA MJester@NACHC.org

Rosy Chang Weir, PhD rcweir@aapcho.org

Danielle Lazar Danielle.Lazar@accesscommunityhealth.net

Shalini A. Tendulkar, ScM, ScD stendulkar@challiance.org

Leah Zallman lzallman@challiance.org

Vicki Young, PhD vickiy@scphca.org

EnCoRE Partners’ Geography, 2014-2015

EnCoRE: Enhancing Community Health Center PCORI Engagement

AIM: To build health center capacity to engage in patient-centered outcomes research through an interactive 12-month training curriculum that walks health centers through the steps and skills needed to develop a patient-centered research proposal.

EnCoRE

Goal: To adapt, enhance, and implement an existing year-long training curriculum designed to educate and engage Health Center teams, including patients and clinical and administrative staff, in Patient-Centered Outcomes Research (PCOR).

Objectives:

• Build infrastructure to strengthen the patient-centered comparative effectiveness research (CER) capacity of Health Centers as they develop or expand their own research infrastructure

• Develop, implement, and disseminate an innovative online training, targeted to and accessible at no cost to all Health Centers and other primary care practices

• Prepare Health Center patients, staff, and researchers to conduct community-led PCOR

Chat

During this live training, you may ask questions at any time in the Chat Window.

This area is located in the lower left hand corner of your screen.

These questions will be answered at the end of the presentation

Audio Setup

Configure your PC for audio: click the Microphone/Gears icon, or go to Tools > Audio > Audio Setup Wizard

This program has been reviewed and approved for up to 1.5 Prescribed CME credits by the American Academy of Family Physicians (AAFP).

Please complete the CE Evaluation launched at the end of the presentation and email eLearning@CDNetwork.org with a request for credits.

Presenters

Mary Ann McBurnie, PhD

Senior Investigator, Kaiser Permanente Center for Health Research

Steering Committee Chair, Community Health Applied Research Network (CHARN)

Session 7: Basic Concepts in Biostatistics

May 21, 2015

Mary Ann McBurnie, PhD

Senior Investigator, Kaiser Permanente Center for Health Research

Steering Committee Chair, Community Health Applied Research Network (CHARN)

Acknowledgment

Material in this presentation was developed as part of the curriculum for the international Methods in Epidemiologic, Clinical and Operations Research (MECOR) program sponsored by the American Thoracic Society (ATS)

• Designed for physicians and health care professionals

• Intended to strengthen capacity and leadership in research related to respiratory conditions, critical care and sleep medicine in middle- and low-income countries

Objectives

• To be able to identify different data types and appropriate statistical tests

• To understand when non-parametric methods are preferred over parametric methods

• To be able to interpret results (summary measures) of statistical tests

Entire Population -> Study Design -> Study Sample -> Data Collection & Analysis -> Results (e.g. Mean FEV1 in pa)

But how good is our estimate from the sample?

Data Analysis

• Using data to answer questions

• Evaluate the association between exposure measure(s) and outcome measure(s)

• i.e., we have a hypothesis we want to test

• Use data to estimate measures of interest and make inferences (test our hypothesis) about these measures

• E.g., does prevalence of COPD vary by geographic region?

Data Analysis

• Descriptive statistics

• Characterize the distributions of variables of interest

• Inferential statistics

• Formally evaluate the role of chance in explaining the findings of a study: is the observed difference due to chance alone?

Types of Data

• Continuous measures

• E.g., age, weight, blood pressure, FEV1

• Discrete measures

• Binary

• Yes/no, present/not present, high/low

• Categorical

• Ordered: education level, income level

• Unordered: race/ethnicity categories, blood type

Getting Started - Descriptive Statistics

• What do the data look like?

• Understand/confirm types of data

• Assess quality and completeness of the data

• Describe the distributions of variables of interest

Getting Started - Descriptive Statistics

• Categorical variables

• simple frequencies, minimum and maximum values, proportions/rates, crosstabs - esp. for nested questions (i.e., if “yes” to Q1, then ask Q2) and recoded variables

• Continuous variables

• mean, SD, median/percentiles, minimum and maximum values, range

• Listings of selected variables

• Graphics – histograms, bar charts, scatter plots

=> Essential to describe/understand data before testing/modeling
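As a rough illustration of the summaries listed above, the following Python sketch computes them with the standard library only. The `ages` and `smoker` values and the `describe` helper are made up for the example; they are not from the study data.

```python
from collections import Counter
from statistics import mean, median, stdev

def describe(values):
    """Basic descriptive statistics for a continuous variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),      # sample standard deviation
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical data, for illustration only
ages = [40, 45, 52, 58, 61, 67, 98]
smoker = ["no", "yes", "no", "no", "yes", "no", "no"]

summary = describe(ages)          # continuous variable
freq = Counter(smoker)            # simple frequencies for a categorical variable
proportions = {k: v / len(smoker) for k, v in freq.items()}

print(summary)
print(freq, proportions)
```

Checking the minimum and maximum first is a cheap way to catch impossible values before any testing or modeling.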

Variable        Obs      Mean        Std. Dev.   Min     Max
ID              10712    137334.6    91395.18    1       452001
age             10712    56.45622    11.5369     40      98
female          10712    0.521378    0.499566    0       1
weight_kg       10712    74.15711    19.05139    0       181
height_cm       10712    166.1451    10.47751    115     203
bmi             10578    27.10881    5.388997    12.5    70.9968
gold_nhanes     10001    0.317268    0.72707     0       4
stage_nhanes    0
smokestat       10711    2.186724    0.8042      1       3
cursmok         10711    0.247409    0.431527    0       1
eversmoking     10711    0.565867    0.495666    0       1
packyrs         10712    13.54834    23.05431    -9      614.25
asthma          10712    0.120426    0.325474    0       1
country         10712    1471.688    874.0461    101     2901

Frequency listing to check format, content of specific variables

Stata command: tab gold_nhanes

gold_nhanes    Freq.     Percent    Cum.
0              8,111     81.1       81.1
1              858       8.58       89.68
2              807       8.07       97.75
3              199       1.99       99.74
4              26        0.26       100
Total          10,001    100

Crosstabs to check coding of new variables

Stata command: tab gold_nhanes Stage3plus

               Stage3plus
gold_nhanes    0        1      Total
0              8,111    0      8,111
1              858      0      858
2              807      0      807
3              0        199    199
4              0        26     26
Total          9,776    225    10,001
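The same check can be sketched outside Stata. The Python snippet below rebuilds the `gold_nhanes` values from the frequency counts shown in the listing, derives `Stage3plus` with the recode the crosstab implies (1 when the stage is 3 or 4), and tabulates the two variables against each other. The variable layout is an assumption made for illustration; only the counts come from the slides.

```python
from collections import Counter

# Frequency counts taken from the gold_nhanes listing
gold_counts = {0: 8111, 1: 858, 2: 807, 3: 199, 4: 26}

# Expand the counts into one value per participant
gold_nhanes = [stage for stage, n in gold_counts.items() for _ in range(n)]

# Recode implied by the crosstab: Stage3plus = 1 if stage >= 3, else 0
stage3plus = [1 if stage >= 3 else 0 for stage in gold_nhanes]

# Cross-tabulate the original and recoded variables
crosstab = Counter(zip(gold_nhanes, stage3plus))

for stage in sorted(gold_counts):
    print(stage, crosstab[(stage, 0)], crosstab[(stage, 1)])
print("Total", stage3plus.count(0), stage3plus.count(1))
```

A correct recode leaves every row with a count in exactly one column; a count in both columns of a row would signal a coding error.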

[Figure: density histograms of bmi, f_100_derived (number of cigarette packs smoke per year), and weight_kg]

Normal Distribution – has some nice properties

Mean = 30, SD = 4

Mean = 30, SD = 7

• Symmetric

• Bell-shaped

• Mean = median

Interval Estimate

±1 SD, coverage = 68.27%

±1.96 SDs, coverage = 95%

±2 SDs, coverage = 95.45%

±3 SDs, coverage = 99.73%
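The coverage figures for a normal distribution can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `coverage` is chosen for the example.

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a normal variable falls within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"+/-{k} SD: {coverage(k):.4%}")
```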

Non-Normal Distribution

Mean = 7.5

Median = 5.8

Skewed Right

Non-Normal Data

• Extreme values in skewed distributions have a bigger effect on means than on medians.

• Mean gets “pulled” out toward the tail

• Median more robust to skewing

• Mean and median can be very different for a very skewed distribution

• Important to look at both measures when data are skewed

• Median may be more informative/appropriate

• Normality (or lack thereof) is important – impacts analytic approach

• “parametric” tests assume an underlying normal distribution
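A tiny numeric illustration of the mean being pulled toward the tail follows. The values are hypothetical, chosen only to show the effect of one extreme observation.

```python
from statistics import mean, median

# Hypothetical, roughly symmetric values
symmetric = [2, 3, 4, 5, 6]

# Same values plus one extreme observation in the right tail
skewed = symmetric + [40]

print(mean(symmetric), median(symmetric))  # mean and median agree
print(mean(skewed), median(skewed))        # mean moves much more than the median
```

Adding the single value 40 moves the mean from 4 to 10, while the median only moves from 4 to 4.5.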

Inferential Statistics: Summary Measures

• Point estimate

• Estimate of a population parameter (e.g., mean FEV1, prevalence of TB)

• P-value

• Probability (under the assumptions of the test statistic) of obtaining a result (i.e., the point estimate) equal to or more extreme than the one we observed.

• Is our observed value (point estimate) consistent with the expected value?

• Interval estimate (confidence interval)

• Range of values with a stated confidence level (certainty) that the interval covers the true population value (95% confidence intervals are very common)

A simple example

Population A
N                        100
Mean FEV1                1.70
SD (FEV1)                .4
SE(mean) = SD/Sqrt(N)    .04

Does Population A have a different mean FEV1 (1.70 L) than that of the reference population (1.60 L, say)? That is, is the sample mean FEV1 (1.70 L) really different from 1.60 L, or is the difference due to chance?

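Compare sample mean FEV1 = 1.70 to population mean (= 1.60)

Distance in SEs: how many SEs between the observed mean (1.70) and the expected mean (1.60)?

(1.70 - 1.60)/.04 = 2.5 SEs

The area between the expected mean and 2.5 SEs above it covers ~49.4%, leaving a 0.6% chance of getting a mean of at least 1.70 L.

P-value = 0.006

[Figure: sampling distribution of mean FEV1 (L), x-axis from 1.40 to 1.80, with the 49.4% area marked]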

P-Value

• A probability between 0 and 1

• Interpretation: probability of observing a difference that is at least as extreme as the one we observed when there is really no difference.

• The smaller the p-value, the stronger the evidence for a difference

• Commonly use a significance level of 0.05 (5%)

P-Value = 0.006

• Assuming the mean FEV1 is not different from 1.60 L:

• If random samples (of 100) are taken repeatedly, mean FEV1 will be at least as great as 1.70 L only 0.6% of the time.

• i.e., it is very unlikely we would observe this value (1.70 L) if the true mean FEV1 for this population were 1.60 L

=> We conclude that mean FEV1 in this population is different from 1.60 L.
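The 0.006 figure can be reproduced from the example's summary numbers (N = 100, mean 1.70, SD 0.4, reference mean 1.60). The sketch below uses a normal approximation for the one-sided tail area; the function name `one_sided_p` is an assumption made for the example.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p(sample_mean, ref_mean, sd, n):
    """Distance from the reference mean in SEs, and the upper-tail probability."""
    se = sd / sqrt(n)                    # standard error of the mean
    z = (sample_mean - ref_mean) / se    # number of SEs above the reference mean
    p = 1 - NormalDist().cdf(z)          # chance of a mean at least this large
    return z, p

z, p = one_sided_p(1.70, 1.60, 0.4, 100)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

This is a one-sided probability; a two-sided test would double it.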

Confidence Interval for the Mean

Standard Error of the mean = Standard Deviation / sqrt(Sample Size) = SD/sqrt(N)

68.3% CI: Mean ± 1.0 x SD/sqrt(N)

95% CI: Mean ± 1.96 x SD/sqrt(N)

99.7% CI: Mean ± 3.0 x SD/sqrt(N)

One Sample T-test

Population A
N                        100
Mean FEV1                1.70
SD (FEV1)                .4
SE(mean) = SD/Sqrt(N)    .04
95% CI for mean FEV1     1.70 ± 1.96 x .04 = (1.62, 1.78)

95% CI: Mean ± 1.96 x SD/sqrt(N)

Impact of Sample Size: As sample size increases, confidence intervals get tighter

                                   “Small” sample       “Larger” sample
N                                  100                  400
Mean FEV1                          1.70                 1.70
SD (FEV1)                          .40                  .40
SE(mean) = SD/Sqrt(N)              .04                  .02
95% CI for mean FEV1               1.70 ± 1.96 x .04    1.70 ± 1.96 x .02
Lower & Upper Confidence Limits    (1.62, 1.78)         (1.66, 1.74)
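The effect of sample size on the interval width can be verified directly. This sketch applies the Mean ± 1.96 x SD/sqrt(N) formula to the two sample sizes in the example; the helper name `mean_ci` is illustrative.

```python
from math import sqrt

def mean_ci(mean, sd, n, z=1.96):
    """Confidence interval for a mean: mean +/- z * SD / sqrt(N)."""
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

small = mean_ci(1.70, 0.40, 100)   # N = 100
larger = mean_ci(1.70, 0.40, 400)  # N = 400

print("N=100:", tuple(round(x, 2) for x in small))
print("N=400:", tuple(round(x, 2) for x in larger))
```

Quadrupling the sample size halves the standard error, so the interval is half as wide.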

Hypothesis Testing

• Test H0: FEV1 = 1.60 vs Ha: FEV1 > 1.60

Test statistic is:

T = (FEV1 difference) / se(mean FEV1) = (1.70 - 1.60) / .04 = 2.5

• The properties of the statistic, T, are known if assumptions hold, and a p-value can be calculated

Comparing Two Means – Two Sample T-test:

Do males have a different mean FEV1 than females?

Females, mean FEV1 (sd)       1.61 (.44)
Males, mean FEV1 (sd)         2.28 (.66)
Difference, mean FEV1 (se)    .67 (.04)

Test H0: FEV1(males) – FEV1(females) = FEV1 diff = 0, vs
Ha: FEV1 diff ≠ 0

T = Mean FEV1 Difference / se(Mean FEV1 Difference)

• Properties of T are known if assumptions hold

=> Can compute p-value. (In this case, T = 16.14 and p<.001)

Key Assumptions for the Two-Sample T-test

• Each sampled observation is random and independent of all other observations

• If pairs of observations are correlated (e.g., blood pressure before and after a challenge test) a paired t-test can be used

• Data randomly sampled from normally distributed populations

• If distribution is not normal, one can use a non-parametric test – e.g., the Wilcoxon Rank Sum (still requires independent obs.)

• The larger the sample size, the less serious the departure from normality

• The two populations have “equal” variances

• Test for “homogeneity” of variance (F-test)

• There is an “adjusted” version of the two-sample t-test (Welch’s) if variances aren’t homogeneous
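For the unequal-variance case, a minimal sketch of Welch's statistic computed from group summaries is shown below. The means and SDs are the ones quoted for the FEV1 example, but the group sizes (400 each) are NOT given in the slides and are purely hypothetical, so the resulting T will not necessarily match the reported value.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic, its standard error, and approximate degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2      # per-group variance of the mean
    se = sqrt(v1 + v2)                     # SE of the difference, no pooling
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, df

# Means and SDs from the example; n = 400 per group is a hypothetical assumption
t, se, df = welch_t(2.28, 0.66, 400, 1.61, 0.44, 400)
print(f"T = {t:.2f}, se = {se:.3f}, df = {df:.0f}")
```

Unlike the pooled test, the degrees of freedom here shrink as the two group variances become more unequal.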

Paired T test

Based on the differences between the values of pairs of observations

Patient    Tech 1    Tech 2    Difference
1          3.96      3.88      0.08
2          2.80      2.85      -0.05
3          3.91      3.86      0.05
4          3.17      3.14      0.03
5          2.95      2.90      0.05
6          2.55      2.63      -0.08
7          3.29      3.22      0.07
8          4.30      4.23      0.07
Mean       3.366     3.339     0.0275
SD         0.624     0.579     0.060

T = mean(diff) / (sd/sqrt(n)) = .0275 / (.06/sqrt(8)) = 1.30

p-value = .234

Under the null hypothesis (of no difference), the probability that we would observe a difference at least as large as .0275 L by chance is .23
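The paired statistic can be recomputed from the eight pairs of readings in the table. This sketch uses only the standard library, which has no t distribution, so it stops at the test statistic and leaves the p-value lookup (7 degrees of freedom) to a table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

# Readings for the same 8 patients by two technicians (from the table)
tech1 = [3.96, 2.80, 3.91, 3.17, 2.95, 2.55, 3.29, 4.30]
tech2 = [3.88, 2.85, 3.86, 3.14, 2.90, 2.63, 3.22, 4.23]

# The paired test works on the within-patient differences
diffs = [a - b for a, b in zip(tech1, tech2)]

n = len(diffs)
mean_diff = mean(diffs)
sd_diff = stdev(diffs)                 # sample SD of the differences
t = mean_diff / (sd_diff / sqrt(n))    # T = mean(diff) / (sd / sqrt(n))

print(f"mean diff = {mean_diff:.4f}, sd = {sd_diff:.3f}, T = {t:.2f}, df = {n - 1}")
```

Working from the unrounded differences avoids the rounding error that creeps in when the statistic is computed from rounded summary values.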

Analysis of Variance (ANOVA)

• Extension of T-test for evaluating differences in means. Generalizes T-test to > 2 groups, e.g.,

H0: μ1 = μ2 = μ3, vs

Ha: at least one μ differs from one of the others

• Assumptions

• Independent observations

• Normality

• Homogeneity of variances

Non-Parametric Tests

• Alternatives to parametric tests when assumptions don’t hold

• Don’t require assumptions about shape of distributions or variances

• Do require that observations/pairs be randomly and independently chosen.

Non-Parametric Tests

• Wilcoxon Rank Sum (WRS)

• Alternative to the two-sample t-test

• Based on the ranks: the order in which the observations (of both groups combined) fall

• WRS test statistic is the sum of the ranks for observations from one of the samples. “Large” or “small” rank sums constitute evidence against the null hypothesis.

• For smaller sample sizes, tables for WRS exist to look up p-values. For larger sample sizes a normal approximation can be used.
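To make the rank-sum idea concrete, here is a small sketch that ranks the combined observations, sums the ranks for one group, and standardizes the sum for the normal approximation. The two samples are hypothetical, and the variance formula omits the tie correction, so it is only a rough guide when many values are tied.

```python
from math import sqrt

def rank_sum(sample_a, sample_b):
    """Wilcoxon rank sum for sample_a and its normal-approximation z score."""
    combined = sorted(sample_a + sample_b)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # tied values share the average of ranks i+1 .. j
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    w = sum(ranks[v] for v in sample_a)          # rank sum for sample_a
    n1, n2 = len(sample_a), len(sample_b)
    mean_w = n1 * (n1 + n2 + 1) / 2              # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # SD under H0 (no tie correction)
    return w, (w - mean_w) / sd_w

# Hypothetical data, for illustration only
group_a = [1.2, 2.5, 3.1]
group_b = [2.0, 4.4, 5.0, 6.3]
w, z = rank_sum(group_a, group_b)
print(f"W = {w}, z = {z:.3f}")
```

With samples this small, a lookup table would be used rather than the normal approximation; the z score is shown only to illustrate the calculation.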

Non-Parametric Tests

• Wilcoxon Signed Rank test (WSR)

• Alternative to the paired t-test

• Paired differences are ranked

• Kruskal-Wallis

• Alternative to ANOVA

• Also based on ranks

Chi-square test for categorical data

Is smoking (y/n) associated with CVD (y/n)?

              CVD
Smoker    No          Yes         Total
No        140         50          190
Yes       60 (30%)    50 (50%)    110
Total     200         100         300

Compare proportions of smokers in each CVD group

Test H0: p1 = p2, or p1-p2 = 0, vs
Ha: p1-p2 ≠ 0

Computation of Chi-square statistic

Compute the expected value for each cell from the marginal totals

              CVD
Smoker    No                     Yes                   Total
No        190*200/300 = 126.7    190*100/300 = 63.3    190
Yes       110*200/300 = 73.3     110*100/300 = 36.7    110
Total     200                    100                   300

Computation of Chi-square statistic

              CVD
Smoker    No             Yes          Total
No        140 - 126.7    50 - 63.3    190
Yes       60 - 73.3      50 - 36.7    110
Total     200            100          300

• Compute the differences between Observed and Expected values

• Square the difference and divide by the expected value: (O-E)^2 / E

• Add these up to compute the chi-square statistic: Σ {(O-E)^2 / E}

• Our test statistic, X^2 = 11.48

• Properties of X^2 are known if assumptions hold

• => Can compute p-value
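The steps above can be sketched in a few lines of Python using the observed smoking-by-CVD counts. The p-value line relies on the identity that, for 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(X^2/2)); it does not generalize to larger tables. No continuity correction is applied.

```python
from math import erfc, sqrt

# Observed counts: rows = smoker (No, Yes), columns = CVD (No, Yes)
observed = [[140, 50],
            [60, 50]]

def chi_square(table):
    """Expected counts from the marginal totals and the Pearson chi-square statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, stat

expected, stat = chi_square(observed)
p_value = erfc(sqrt(stat / 2))   # upper tail for 1 degree of freedom only

print([[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```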

Chi-square test for categorical data

              CVD
Smoker    No          Yes         Total
No        140         50          190
Yes       60 (30%)    50 (50%)    110
Total     200         100         300

50% of CVD patients vs 30% of non-CVD patients are smokers

Chi-Square Test gives a P-Value <0.001.

=> Very unlikely that we would see this difference (30% vs 50%) if there really were no difference between the 2 groups.

=> Very strong evidence for a real difference.

Note: This test can be used for exposure variables with >2 categories

Key Assumptions for the Chi-Square Test

• Each observation is sampled randomly and independently of all other observations

• McNemar’s test assesses paired observations

• “Sufficient” sample size

• E.g., all cells have counts > 5 for 2x2 tables, or 80% of cells have counts > 5 for larger tables and no cells have zeros

• Apply Yates’ correction if this assumption isn’t met

Recap: Parametric Analyses

                              Type of Outcome Variable
Type of Predictor Variable    Binary               Continuous
Binary                        Pearson’s χ2 test    Two-sample t-test
                              McNemar’s χ2 test    Paired t-test
Categorical (K-levels)        Pearson’s χ2 test    ANOVA
Continuous
Multivariate

Recap: Non-Parametric Analyses

Type of Outcome Variable: Binary, Continuous

Type of Predictor Variable:

Binary: Wilcoxon Signed Rank test, Wilcoxon Rank Sum test

Categorical (K-levels): Kruskal-Wallis

Continuous

Multivariate

Simple Linear Regression

• E{Y|X=x} = b0 + b1 * x

• E{Y| X} = mean response

• X = predictor

• b0 = ??

• b1 = ??

• What do b0 and b1 mean?

Interpretation of Coefficients - Example

• E{FEV1| Gender}= b0 + b1 * Gender

• Y = response = FEV1

• X = predictor = Gender (dichotomous): = 0 if female, = 1 if male

Interpretation of Coefficients

Example 2: E{FEV1|Gender}= b0 + b1 * gender

[Figure: FEV1 plotted against gender using “jittered” data: gender = gender + N(0, 0.1)]

E{FEV1|Gender} = b0 + b1 * Gender

Mean (sd) FEV1

Females 1.61 (.44)

Males 2.28 (.66)

E{FEV1|Gender} = b0 + b1 * Gender

b0 (se): 1.61 (.024)

b1 (se): .67 (.038)

Interpretation of Coefficients

Example 2: E{FEV1|Gender} = b0 + b1 * gender

E{FEV1 | Gender} = 1.61 + .67 * Gender

1. X = 0 (Females)

• E{FEV1| Gender = 0} = 1.61 + .67 * 0 = 1.61

2. X = 1 (Males)

• E{FEV1| Gender = 1} = 1.61 + .67 * 1 = 2.28
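With a 0/1 predictor, the least-squares intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The sketch below fits the line by hand on a small hypothetical data set (the FEV1 values are invented for illustration) and confirms both facts.

```python
from statistics import mean

# Hypothetical FEV1 values; gender coded 0 = female, 1 = male
gender = [0, 0, 0, 1, 1, 1]
fev1 = [1.5, 1.6, 1.7, 2.2, 2.3, 2.4]

def least_squares(x, y):
    """Ordinary least-squares intercept (b0) and slope (b1) for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(gender, fev1)

mean_f = mean(v for g, v in zip(gender, fev1) if g == 0)
mean_m = mean(v for g, v in zip(gender, fev1) if g == 1)

print(f"b0 = {b0:.2f} (female mean {mean_f:.2f})")
print(f"b1 = {b1:.2f} (male - female difference {mean_m - mean_f:.2f})")
```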

Regression and ANOVA: The t-test as a regression model

FEV1 = b0 + b1 * gender

[Figure: FEV1 (y-axis, 0 to 6) by gender (0 = F, 1 = M), with horizontal lines at b0 and b0 + b1 and the gap between them marked b1]

b0 = mean FEV1 in women

b1 = mean difference

test of b1 = 0 is equivalent to the t-test

Relationship between T-test and Regression (2-sample problem)

T-Test

Females (x=0) Mean FEV1 (sd): 1.61 (.44)
Males (x=1) Mean FEV1 (sd): 2.28 (.66)
Mean Difference (se): 0.67 (.04)
T-stat = 16.140*, p<.001

Linear Regression

b0 (se): 1.61 (.024)
b1 (se): 0.67 (.04)
b0 + b1: 2.28
T-stat for b1 = 17.526*, p<.001

*T-statistics from the T-test and the regression coefficient match exactly when the variances for both groups (males and females) are equal

Hypothesis Testing for Regression Coefficients

• Test H0: b1 = 0 vs Ha: b1 ≠ 0

• Test statistic for coefficient estimate is:

T = b1 / se(b1)

• Properties of T are known if assumptions hold

• Linear regression intercept and slope estimates, b0 and b1, are asymptotically normally distributed

=> Can compute p-value

Interpretation of Regression Coefficients

What if we don’t reject the hypothesis that b1 = 0?

• There may, in fact, be no association

• Zero slope doesn’t prove there is no association

• May be an association but not in the parameter we looked at (multiplicative model?)

• May be an association but it may not be linear (curvilinear assoc.)

• May be a linear trend but we lack statistical precision to be confident that it truly exists (type II error: we didn’t have a big enough sample or we were simply unlucky)

Interpretation of Regression Coefficients

What if we do reject the hypothesis that b1 = 0?

• Non-zero slope suggests an association is present between the mean response and the predictor

• Reject the hypothesis that there is no linear trend in the average response (e.g., FEV1) across predictor groups (e.g., age)

• Does NOT imply causality

Simple Linear Regression Model Assumptions

• Linearity of the regression function, i.e., relationship is linear in the modeled predictors

• Independence of observations

• Equal variance across predictor groups

• Normality of error terms

Consider “robust” regression methods if assumptions don’t hold.

Classical methods are more precise than robust methods if assumptions hold.

Robust Regression Model Assumptions

Robust methods require fewer assumptions than classical methods:

• Allow correlated observations within identified clusters

• Allow unequal variances across groups

• Still correct if classical assumptions hold but may be less precise

• Avoid need to check model assumptions

Parametric analyses

                              Type of Outcome Variable
Type of Predictor Variable    Binary                           Continuous
Binary                        χ2 test, logistic regression     T-test, ANOVA, linear regression
K-levels                      chi-square, logistic regression  ANOVA, linear regression
Continuous                    logistic regression              correlation; linear, non-linear regression
Multivariate                  logistic regression              linear/non-linear regression

Non-parametric analyses

                              Type of Outcome Variable
Type of Predictor Variable    Binary                        Continuous
Binary                        sign test                     Mann-Whitney, Kruskal-Wallis
K-levels                      Robust logistic regression    Kruskal-Wallis
Continuous                    Robust logistic regression    Robust regression
Multivariate                  Robust logistic regression    Robust regression


Correlation

• Measures how closely the largest values of one variable are associated with the largest values of a second variable and vice versa.

• Sample correlation coefficient, R, is an estimate of the population correlation ρ (rho).

• Ranges from –1 to +1

• –1 (perfect negative correlation)

• +1 (perfect positive correlation)

• R=0 indicates no linear association
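
The three reference values above can be checked with a short calculation. This is a minimal sketch of the Pearson sample correlation; the function name pearson_r and the data are hypothetical, chosen only to reproduce R = +1, R = –1, and R = 0.

```python
import math


def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient R between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Perfect positive linear relationship.
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Perfect negative linear relationship.
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])

# Symmetric U-shape: y depends strongly on x, but not linearly.
r_zero = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_pos, r_neg, r_zero)
```

The U-shaped example shows why R = 0 means only "no linear association": y is fully determined by x there, yet the correlation is zero.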


Interpretation of Coefficients

• E{FEV1 | Age} = b0 + b1 * Age

• Y = response = FEV1

• X = predictor = Age (continuous)

We’ve estimated that

• b0 = 3.12

• b1 = -.016


Interpretation of Coefficients for E{FEV1 | Age} = b0 + b1 * Age

1. Age = 0

• E{Y|X=0} = b0 + b1*0 = b0

• E{FEV1|Age = 0} = 3.12 - .016*0 = 3.12

2. Age = x

• E{Y|X=x} = b0 + b1*x

• E{FEV1|Age = x} = 3.12 - .016*x

3. Age = x+1

• E{Y|X=x+1} = b0 + b1*(x+1) = b0 + b1*x + b1

• E{FEV1|Age = x+1} = 3.12 - .016*(x+1) = 3.12 - .016*x - .016


Interpretation of Coefficients for E{FEV1 | Age} = b0 + b1 * Age

Mean FEV1 at age = x+1:    (b0 + b1*x + b1)  =    (3.12 - .016*x - .016)

Mean FEV1 at age = x:    - (b0 + b1*x)       =  - (3.12 - .016*x)

Difference:                b1                =    -.016

b1 (= -.016) is the average difference in FEV1 when age increases from x to x+1

OR

On average, a one-year difference in age is associated with a difference of b1 (= -.016) in mean FEV1
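
The subtraction above can be verified numerically with the estimates from the slides (b0 = 3.12, b1 = -.016). This is a minimal sketch; the function name mean_fev1 and the ages used are chosen only for illustration.

```python
# Estimated coefficients from the slides.
B0 = 3.12     # intercept: estimated mean FEV1 at age 0
B1 = -0.016   # slope: difference in mean FEV1 per one-year difference in age


def mean_fev1(age):
    """Estimated mean FEV1 at a given age: E{FEV1 | Age} = b0 + b1 * Age."""
    return B0 + B1 * age


# The difference between adjacent ages equals b1 regardless of the starting age.
for age in (30, 50, 70):
    diff = mean_fev1(age + 1) - mean_fev1(age)
    print(f"age {age} -> {age + 1}: difference in mean FEV1 = {diff:.3f}")

# A ten-year difference in age corresponds to 10 * b1.
ten_year_diff = mean_fev1(50) - mean_fev1(40)
print(f"age 40 -> 50: difference in mean FEV1 = {ten_year_diff:.3f}")
```

Because the model is linear, the same per-year difference applies at every age. The intercept (mean FEV1 at age 0) is an extrapolation if no one that young was included in the data.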


Additional Questions?

And Discussion


Upcoming Webinars

Webinar Date | Content | Activities | Presenters
May 19, 2:00 – 3:30 pm EST | Bioinformatics | Data management and EHR data collection methods | Michelle Proser and Mickey Eder
June 16, 2:00 – 3:30 pm EST | Research Ethics, IRB, and Good Clinical Practices | Creation of informed consent forms, plans for fair compensation of patient participants | Leah Zallman and Rosy Chang Weir


Available Resources

• EnCoRE Website for Past Webinars and Materials

• https://cdnencore.wordpress.com/live-session-library/

• Additional resources to build research capacity at health centers

• www.CDNetwork.org/NACHC


Future Funding Opportunities from PCORI

• Visit http://www.pcori.org/funding/opportunities for more information

Opportunity | Letter of Intent Due | Application Due
Addressing Disparities | March 3, 2015 | May 5, 2015
Improving Healthcare Systems | March 3, 2015 | May 5, 2015
Assessment of Prevention, Diagnosis, and Treatment Options | March 3, 2015 | May 5, 2015
Communication and Dissemination Research | March 3, 2015 | May 5, 2015
Clinical Management of Hepatitis C Infection | March 3, 2015 | May 5, 2015
Improving Methods for Conducting PCOR | March 3, 2015 | May 5, 2015
Engagement Award: Knowledge, Training and Development, and Dissemination Awards | April 1, 2015 | April 1, 2015
Engagement Award: Research Meeting and Conference Support | April 1, 2015
