Basics of Biostatistics for Health Research

Session 4 – February 28, 2013

Dr. Scott Patten, Professor of Epidemiology

Department of Community Health Sciences & Department of Psychiatry

Generate Commands Using Logic

generate obese2 = .

recode obese2 .=0 if bmi <= 30

recode obese2 .=1 if bmi > 30

tab obese obese2

prtest obese2, by(sex)

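This code classifies people with a missing BMI as obese, which is strange. Stata treats a missing value as larger than any number, so bmi > 30 is true when bmi is missing.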

Missing Values and Logical Operators

• http://www.stata.com/support/faqs/data-management/logical-expressions-and-missing-values/



Generate Commands Using Logic

generate obese2 = .

recode obese2 .=0 if bmi <= 30

recode obese2 .=1 if bmi > 30 & bmi !=.

tab obese obese2, missing

prtest obese2, by(sex)

This code works: the condition bmi !=. stops missing BMI values from being counted as obese.
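A small sketch of why the extra condition matters, using three made-up BMI values rather than the course dataset. The variable names wrong and right are hypothetical and exist only for this illustration.

```stata
* Toy data (hypothetical values): one row has a missing BMI
clear
input bmi
22
31
.
end

* Missing (.) sorts above every number, so the missing row is coded 1 here
generate byte wrong = bmi > 30

* Restricting to nonmissing BMI leaves the missing row missing
generate byte right = bmi > 30 if !missing(bmi)

list
```

The one-line form with if !missing(bmi) is an alternative to the generate and recode sequence on the slide; either way, a missing BMI should stay missing.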

Statistical Errors

Sample Size Simulation

• P (non-exposed): 0.1

• P (exposed): 0.3

• Alternative hypothesis: 0.2 (difference between the two proportions)

• N (exposed): 30

• N (non-exposed): 30 (set equal to exposed)

• Alpha: 0.05

• Power: 0.5095

[Figure: curves labelled Null Hypothesis, Alternative Hypothesis and Reject Indicator, plotted over differences from -0.5 to 0.5, with a separate bar showing Power on a 0 to 1 scale. Controls: Increase Sample Size, Increase Effect Size, Increase Alpha, Reset.]

Sample Size Calculation in STATA

Sample Size Dialogue Boxes

Let’s do a calculation!

• You are planning a parallel group RCT – with treatment and control groups.

• Normally, 20% of people die with disease X, but you expect to cut this in half with a new treatment.

• How many do you need in each group to achieve 95% power at alpha = 5%?

Output (sampsi)

. sampsi .2 .1, alpha(0.05) power(.95)

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9500
            p1 =   0.2000
            p2 =   0.1000
         n2/n1 =   1.00

Estimated required sample sizes:

            n1 =      349
            n2 =      349
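The same calculation can be typed at the command line instead of using the dialogue box. This sketch assumes a Stata version in which sampsi is available (newer releases also offer the power command). The nocontinuity run is added only for comparison and is not part of the slide.

```stata
* Immediate command: no dataset needs to be loaded
sampsi 0.2 0.1, alpha(0.05) power(0.95)

* sampsi leaves its results in r(); list them to see what is stored
return list

* Same design without the continuity correction (gives a smaller n)
sampsi 0.2 0.1, alpha(0.05) power(0.95) nocontinuity
```

By default sampsi applies a continuity correction for proportions, which is why its answer is larger than the figure from the uncorrected textbook formula.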

Another Calculation

• A QoL scale in a particular disease has a mean score of 20 and a standard deviation of 5.

• You are conducting a placebo controlled trial to evaluate a treatment that is expected to improve the QoL by 2 points on this scale.

• You recruit n=50 into each group – what power will you achieve?

Output (sampsi)

Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =       20
            m2 =       22
           sd1 =        5
           sd2 =        5
sample size n1 =       50
            n2 =       50
         n2/n1 =     1.00

Estimated power:

         power =   0.5160
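The slide shows the output but not the command. Assuming the dialogue box was filled in with the values listed under Assumptions, a command along these lines should give the same power. The second command is an illustrative follow-up, and its 80% target is an assumption, not a value from the slide.

```stata
* Power for a two-sample comparison of means with n = 50 per group
sampsi 20 22, sd1(5) sd2(5) n1(50) n2(50) alpha(0.05)

* Turning the question around: sample size per group for 80% power
* (80% is an illustrative target, not taken from the slide)
sampsi 20 22, sd1(5) sd2(5) power(0.80) alpha(0.05)
```

Supplying n1() and n2() makes sampsi report power; supplying power() instead makes it report the required sample sizes.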

• Go to http://www.ucalgary.ca/~patten

• Scroll to the bottom.

• Right click to download the files described as being “for PGME Students”
– One is a dataset
– One is a data dictionary

• Save them on your desktop

Review: Comparing Proportions

• We’ve looked at several procedures for comparing proportions (e.g. for obesity in men vs. women):

generate obese = .

recode obese .=0 if bmi <= 30

recode obese .=1 if bmi > 30 & bmi !=.

tab obese, missing

prtest obese, by(sex)

Epitab Commands

recode sex 2=1 1=0

cs obese sex

The output…

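. cs obese sex

                 | sex                    |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |       961         599  |       1560
        Noncases |      5610        4405  |      10015
-----------------+------------------------+-----------
           Total |      6571        5004  |      11575
                 |                        |
            Risk |  .1462487    .1197042  |   .1347732
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0265444       |    .0141393    .0389496
      Risk ratio |          1.22175       |    1.110833    1.343743
 Attr. frac. ex. |          .181502       |    .0997744      .25581
 Attr. frac. pop |         .1118099       |
                 +-------------------------------------------------
                               chi2(1) =    17.16  Pr>chi2 = 0.0000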
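The point estimates in the cs obese sex output can be checked from the four cell counts alone. A minimal sketch using the immediate command csi, which takes the counts directly and does not need the dataset:

```stata
* Immediate form of cs. Order of arguments: exposed cases,
* unexposed cases, exposed noncases, unexposed noncases
csi 961 599 5610 4405

* Risk ratio by hand: risk in the exposed divided by risk in the unexposed
display (961/6571) / (599/5004)

* Risk difference by hand
display (961/6571) - (599/5004)
```

The hand calculations should agree with the Risk ratio and Risk difference rows that cs reports.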

A “non-significant” association

generate highgluc = .

recode highgluc .=0 if glucose <= 140

recode highgluc .=1 if glucose > 140 & glucose !=.

generate female=sex

recode female (1=0) (2=1)

tab highgluc female, exact

How does this look with cs?

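. cs highgluc female

                 | female                 |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |       108         110  |        218
        Noncases |      5574        4395  |       9969
-----------------+------------------------+-----------
           Total |      5682        4505  |      10187
                 |                        |
            Risk |  .0190074    .0244173  |   .0213998
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0054099       |   -.0111474    .0003276
      Risk ratio |         .7784391       |    .5986537    1.012217
 Prev. frac. ex. |         .2215609       |   -.0122169    .4013463
 Prev. frac. pop |           .12358       |
                 +-------------------------------------------------
                               chi2(1) =     3.51  Pr>chi2 = 0.0609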

Review: Try the cci command to obtain the OR

Check your work with the cc command.
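One way to do this review exercise, sketched with the cell counts from the cs highgluc female output (108 and 110 cases, 5574 and 4395 noncases). The immediate command cci takes the counts directly:

```stata
* Immediate form of cc. Order of arguments: exposed cases,
* unexposed cases, exposed noncases, unexposed noncases
cci 108 110 5574 4395

* Odds ratio by hand: cross-product of the 2x2 table
display (108*4395) / (110*5574)
```

With the course dataset loaded, cc highgluc female should report the same odds ratio. Because high glucose is rare in both groups, the odds ratio (about 0.774) lands close to the risk ratio from cs.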

Comparing Proportions?
– Yes: Fisher’s Exact Test
– No: Parametric Assumptions?
   – Yes: Multiple Groups?
      – Yes: ANOVA
      – No: t-test
   – No: Multiple Groups?
      – Yes: Kruskal-Wallis
      – No: Wilcoxon’s Rank-Sum

Two situations we haven’t covered…

• Severely skewed distributions

• Two continuous variables

Severely Skewed Variables

Solution: Make Some Categories

• For example:
– Non-smokers
– Light smokers (< 20)
– Moderate (20-40)
– Heavy (> 40)

• Your task: Make a variable with these categories and do a statistical test to compare men to women.

E.g. for the recoding…

generate smoke = .

recode smoke .=1 if cigpday==0

recode smoke .=2 if cigpday > 0 & cigpday < 20

recode smoke .=3 if cigpday >=20 & cigpday <= 40

recode smoke .=4 if cigpday > 40 & cigpday !=.

tab smoke, missing
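Before running a recode like this on real data, it can help to try it on a few boundary values. The sketch below uses made-up cigpday values (0, 10, 20, 40, 41 and missing), not the course dataset, and writes the recode rules with parentheses, which is the form the current Stata documentation describes.

```stata
* Toy data (hypothetical values) covering each category boundary
clear
input cigpday
0
10
20
40
41
.
end

generate smoke = .
recode smoke (.=1) if cigpday == 0
recode smoke (.=2) if cigpday > 0 & cigpday < 20
recode smoke (.=3) if cigpday >= 20 & cigpday <= 40
recode smoke (.=4) if cigpday > 40 & cigpday != .

* Each value should land in exactly one category; missing stays missing
list cigpday smoke
```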

Some output…

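. tab smoke sex, exact

Enumerating sample-space combinations:
stage 1:  enumerations = 0

           |          sex
     smoke |         1          2 |     Total
-----------+----------------------+----------
         1 |     2,428      4,170 |     6,598
         2 |       686      1,292 |     1,978
         3 |     1,754      1,073 |     2,827
         4 |       122         23 |       145
-----------+----------------------+----------
     Total |     4,990      6,558 |    11,548

           Fisher's exact =                 0.000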
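Fisher's exact test can take a while to enumerate a table of this size. A hedged alternative, not used on the slide, is the chi-squared test, which is usually considered reasonable when the cell counts are this large. The immediate command tabi can run it straight from the counts in the tab smoke sex output:

```stata
* Rows are smoke categories 1 to 4; columns are sex 1 and 2
tabi 2428 4170 \ 686 1292 \ 1754 1073 \ 122 23, chi2

* Column percentages make the direction of the difference easier to read
tabi 2428 4170 \ 686 1292 \ 1754 1073 \ 122 23, column
```

With the dataset loaded, tab smoke sex, chi2 column would be the equivalent command.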

Two continuous variables

• E.g. diastolic blood pressure and BMI

• The place to start is always a scatter plot

• STATA calls this a “two way” graph

Start with Create

Select the two variables

Submit

The command produced…

• Produced by our dialogue box…

twoway (scatter diabp sysbp)

• The same dialogue box can fit a line…

twoway (lfit diabp sysbp)

This time select “line”

You can combine the two…

• Try it!

twoway (scatter diabp sysbp) (lfit diabp sysbp)

• To assess significance, use the regress command (can you find the menu option?)

regress diabp sysbp

Note: the linear output

• Line: y = mx + b

• diabp = 33.42 + 0.364(sysbp)

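. regress diabp sysbp

      Source |       SS       df       MS              Number of obs =   11627
-------------+------------------------------           F(  1, 11625) =11928.05
       Model |  800498.474     1  800498.474           Prob > F      =  0.0000
    Residual |  780160.451 11625  67.1105764           R-squared     =  0.5064
-------------+------------------------------           Adj R-squared =  0.5064
       Total |  1580658.92 11626  135.958965           Root MSE      =  8.1921

------------------------------------------------------------------------------
       diabp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sysbp |   .3639623   .0033325   109.22   0.000     .3574301    .3704946
       _cons |   33.42091   .4606105    72.56   0.000     32.51804    34.32379
------------------------------------------------------------------------------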
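In the regress diabp sysbp output, the sysbp coefficient is the slope m and _cons is the intercept b. A short sketch of using the fitted line, where the systolic value of 120 mm Hg is an arbitrary illustration and not a value from the slides:

```stata
* Predicted diastolic pressure at a systolic pressure of 120 mm Hg
display 33.42091 + 0.3639623*120

* R-squared by hand: model sum of squares over total sum of squares
display 800498.474 / 1580658.92
```

The first line gives about 77.1 mm Hg, and the second reproduces the R-squared of 0.5064 shown in the output. After running regress on the dataset, predict can generate fitted values for every observation.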

(In Class) Assignment for Today

• Assess whether there is an association between systolic blood pressure and death (you need to decide how)

• We’ll define elevated systolic blood pressure as being > 140 mm Hg.
– What is the risk ratio for death for people with elevated systolic blood pressure?
– Is the risk ratio statistically significant?