Basics of Biostatistics for Health Research
Session 4 – February 28, 2013
Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry
Generate Commands Using Logic
generate obese2 = .
recode obese2 .=0 if bmi <= 30
recode obese2 .=1 if bmi > 30
tab obese obese2
prtest obese2, by(sex)
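Result: observations with missing bmi are coded as obese, which is strange. STATA treats a missing value as larger than any number, so bmi > 30 is true when bmi is missing.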
Missing Values and Logical Operators
• http://www.stata.com/support/faqs/data-management/logical-expressions-and-missing-values/
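A minimal sketch of the behaviour described in the FAQ, using three made-up bmi values (the data are hypothetical, not from the course dataset). It is meant to be run as a do-file.

```stata
* Toy data: two observed bmi values and one missing value
clear
input bmi
25
35
.
end

* Missing (.) sorts above every number, so this counts the missing row too
count if bmi > 30

* Excluding missing values explicitly counts only the observed bmi of 35
count if bmi > 30 & !missing(bmi)
```

The first count includes the missing observation; the second does not.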
Generate Commands Using Logic
generate obese2 = .
recode obese2 .=0 if bmi <= 30
recode obese2 .=1 if bmi > 30 & bmi !=.
tab obese obese2, missing
prtest obese2, by(sex)
This code works.
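The same coding can also be written in a single generate statement. This is a sketch on made-up bmi values; the variable name obese3 is hypothetical and chosen only to avoid clashing with obese and obese2.

```stata
* Toy data: two observed bmi values and one missing value
clear
input bmi
25
35
.
end

* The logical expression evaluates to 0 or 1; the if qualifier
* leaves obese3 missing wherever bmi is missing
generate obese3 = (bmi > 30) if !missing(bmi)

tab obese3, missing
```

The if !missing(bmi) qualifier plays the same role as & bmi !=. in the recode version.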
Statistical Errors
• P (non-exposed): 0.1
• P (exposed): 0.3
• Alternative hypothesis: 0.2 (difference between the two proportions)
• N (exposed): 30
• N (non-exposed): 30 (set equal to exposed)
• Alpha: 0.05
• Power: 0.5095
[Figure: sampling distributions of the difference in proportions (x-axis from -0.5 to 0.5) under the null hypothesis and the alternative hypothesis, with a reject indicator]
[Interactive controls: Increase Sample Size, Increase Effect Size, Increase Alpha, Reset; power is shown on a scale from 0 to 1]
Sample Size Simulation
Sample Size Calculation in STATA
Sample Size Dialogue Boxes
Let’s do a calculation!
• You are planning a parallel group RCT – with treatment and control groups.
• Normally, 20% of people die with disease X, but you expect to cut this in half with a new treatment.
• How many do you need in each group to achieve 95% power at alpha = 5%?
Output (sampsi)
. sampsi .2 .1, alpha(0.05) power(.95)

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
                    and p2 is the proportion in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.9500
            p1 =   0.2000
            p2 =   0.1000
         n2/n1 =   1.00

Estimated required sample sizes:

            n1 =      349
            n2 =      349
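The sampsi result can be checked by hand. The sketch below assumes the usual normal-approximation formula for two proportions with a continuity correction, which is what sampsi applies by default. The scalar names are hypothetical.

```stata
* Inputs from the example: 20% vs 10% mortality, two-sided alpha 0.05, power 0.95
scalar pa   = 0.2
scalar pb   = 0.1
scalar pbar = (pa + pb) / 2
scalar za   = invnormal(1 - 0.05/2)
scalar zb   = invnormal(0.95)

* Uncorrected sample size per group
scalar n_unc = (za*sqrt(2*pbar*(1 - pbar)) + zb*sqrt(pa*(1 - pa) + pb*(1 - pb)))^2 / (pa - pb)^2

* Continuity-corrected sample size per group
scalar n_corr = (n_unc/4) * (1 + sqrt(1 + 4/(n_unc*abs(pa - pb))))^2

display "uncorrected n per group = " ceil(scalar(n_unc))
display "corrected n per group   = " ceil(scalar(n_corr))
```

The corrected value rounds up to 349, matching the sampsi output; without the correction the formula gives 329. Newer STATA releases provide a power twoproportions command for the same calculation; whether it needs an option to apply the continuity correction should be checked in its help file.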
Another Calculation
• A QoL scale in a particular disease has a mean score of 20 and a standard deviation of 5.
• You are conducting a placebo controlled trial to evaluate a treatment that is expected to improve the QoL by 2 points on this scale.
• You recruit n=50 into each group – what power will you achieve?
Output (sampsi)
Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =       20
            m2 =       22
           sd1 =        5
           sd2 =        5
sample size n1 =       50
            n2 =       50
         n2/n1 =     1.00

Estimated power:

         power =   0.5160
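The slide does not show the command behind this output. The sketch below first approximates the power by hand with a two-sample z-test, then gives a sampsi call that would fit the listed assumptions. Both are reconstructions, and the scalar names are hypothetical.

```stata
* Inputs from the example: means 20 and 22, SD 5 in both groups, 50 per group
scalar delta = 22 - 20
scalar sdev  = 5
scalar npg   = 50

* Standard error of the difference in means
scalar se = scalar(sdev) * sqrt(2/scalar(npg))

* Power of a two-sided z-test at alpha = 0.05 (the far tail is ignored)
scalar pw = normal(scalar(delta)/scalar(se) - invnormal(0.975))
display "approximate power = " %6.4f scalar(pw)

* A sampsi call consistent with the assumptions shown in the output
sampsi 20 22, sd1(5) sd2(5) n1(50) n2(50)
```

With only about 52% power, a true 2-point improvement would be missed nearly half the time at this sample size.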
• Go to www.ucalgary.ca/~patten
• Scroll to the bottom.
• Right click to download the files described as being “for PGME Students”
– One is a dataset
– One is a data dictionary
• Save them on your desktop
Review: Comparing Proportions
• We’ve looked at several procedures for comparing proportions (e.g. for obesity in men vs. women):
generate obese = .
recode obese .=0 if bmi <= 30
recode obese .=1 if bmi > 30 & bmi !=.
tab obese, missing
prtest obese, by(sex)
Epitab Commands
recode sex 2=1 1=0
cs obese sex
The output…
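. cs obese sex

                 | sex                    |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |       961         599  |       1560
        Noncases |      5610        4405  |      10015
-----------------+------------------------+-----------
           Total |      6571        5004  |      11575
                 |                        |
            Risk |  .1462487    .1197042  |   .1347732
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0265444       |    .0141393    .0389496
      Risk ratio |          1.22175       |    1.110833    1.343743
 Attr. frac. ex. |          .181502       |    .0997744      .25581
 Attr. frac. pop |         .1118099       |
                 +-------------------------------------------------
                               chi2(1) =    17.16  Pr>chi2 = 0.0000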
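The risk ratio in this output can be reproduced from the four cell counts alone. This sketch uses the immediate form csi, which takes exposed cases, unexposed cases, exposed noncases and unexposed noncases in that order, so no dataset is needed.

```stata
* Cell counts copied from the cs obese sex output
csi 961 599 5610 4405

* The same risk ratio by hand: risk in exposed divided by risk in unexposed
display (961/6571) / (599/5004)
```

Both give a risk ratio of about 1.22, as in the table.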
A “non-significant” association
generate highgluc = .
recode highgluc .=0 if glucose <= 140
recode highgluc .=1 if glucose > 140 & glucose !=.
generate female=sex
recode female (1=0) (2=1)
tab highgluc female, exact
How does this look with cs?
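. cs highgluc female

                 | female                 |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |       108         110  |        218
        Noncases |      5574        4395  |       9969
-----------------+------------------------+-----------
           Total |      5682        4505  |      10187
                 |                        |
            Risk |  .0190074    .0244173  |   .0213998
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.0054099       |   -.0111474    .0003276
      Risk ratio |         .7784391       |    .5986537    1.012217
 Prev. frac. ex. |         .2215609       |   -.0122169    .4013463
 Prev. frac. pop |           .12358       |
                 +-------------------------------------------------
                               chi2(1) =     3.51  Pr>chi2 = 0.0609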
Review: Try the cci command to obtain the OR
• Use the cell counts from the cs highgluc female output: 108 exposed cases, 110 unexposed cases, 5574 exposed noncases and 4395 unexposed noncases.
Check your work with the cc command.
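One way to approach the exercise, as a sketch: cci takes the four cell counts directly (exposed cases, unexposed cases, exposed noncases, unexposed noncases), and the odds ratio can also be computed by hand as a cross-product.

```stata
* Cell counts from the cs highgluc female output
cci 108 110 5574 4395

* Cross-product odds ratio by hand
display (108*4395) / (110*5574)
```

The odds ratio is about 0.77, very close to the risk ratio of 0.78, as expected when the outcome is rare. With the dataset loaded, cc highgluc female should report the same estimate.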
Comparing Proportions?
• Yes: Fisher’s Exact Test
• No: Parametric Assumptions?
– Yes: Multiple Groups?
   Yes: ANOVA
   No: t-test
– No: Multiple Groups?
   Yes: Kruskal-Wallis
   No: Wilcoxon Rank-Sum
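Each branch of the chart corresponds to a STATA command. The sketch below runs one example of each on the auto dataset that ships with STATA, rather than the course dataset. The variable highmpg is hypothetical and created only for the illustration.

```stata
* Example dataset shipped with STATA
sysuse auto, clear

* Comparing proportions: Fisher's exact test on a 2 x 2 table
generate highmpg = (mpg > 25) if !missing(mpg)
tabulate highmpg foreign, exact

* Parametric, two groups: t-test
ttest mpg, by(foreign)

* Parametric, multiple groups: one-way ANOVA
oneway mpg rep78

* Non-parametric, two groups: Wilcoxon rank-sum test
ranksum mpg, by(foreign)

* Non-parametric, multiple groups: Kruskal-Wallis test
kwallis mpg, by(rep78)
```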
Two situations we haven’t covered…
• Severely skewed distributions
• Two continuous variables
Severely Skewed Variables
Solution: Make Some Categories
• For example:
– Non-smokers
– Light smokers (< 20 cigarettes per day)
– Moderate smokers (20 to 40)
– Heavy smokers (> 40)
• Your task: Make a variable with these categories and do a statistical test to compare men to women.
E.g. for the recoding…
generate smoke = .
recode smoke .=1 if cigpday==0
recode smoke .=2 if cigpday > 0 & cigpday < 20
recode smoke .=3 if cigpday >=20 & cigpday <= 40
recode smoke .=4 if cigpday > 40 & cigpday !=.
tab smoke, missing
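The same four categories can be built in one statement by adding up logical expressions, each of which evaluates to 0 or 1. This is a sketch on made-up cigpday values; the names smoke2 and smokelbl are hypothetical.

```stata
* Toy data covering each category boundary, plus one missing value
clear
input cigpday
0
10
20
40
41
.
end

* Start at 1 and add 1 for each threshold passed; missing stays missing
generate smoke2 = 1 + (cigpday > 0) + (cigpday >= 20) + (cigpday > 40) if !missing(cigpday)

* Value labels make the tabulation easier to read
label define smokelbl 1 "Non-smoker" 2 "Light (<20)" 3 "Moderate (20-40)" 4 "Heavy (>40)"
label values smoke2 smokelbl

tab smoke2, missing
```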
Some output…
. tab smoke sex, exact

Enumerating sample-space combinations:
stage 4:  enumerations = 1
stage 3:  enumerations = 146
stage 2:  enumerations = 142603
stage 1:  enumerations = 0

           |          sex
     smoke |         1          2 |     Total
-----------+----------------------+----------
         1 |     2,428      4,170 |     6,598
         2 |       686      1,292 |     1,978
         3 |     1,754      1,073 |     2,827
         4 |       122         23 |       145
-----------+----------------------+----------
     Total |     4,990      6,558 |    11,548

           Fisher's exact =                 0.000
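With more than 11,000 observations the exact test has to enumerate a large number of tables, and a Pearson chi-squared test is a common alternative at this sample size. The sketch below re-enters the cell counts from the output with the immediate command tabi, so it runs without the dataset. Choosing chi2 here is a suggestion, not part of the original exercise.

```stata
* Rows are smoke categories 1 to 4; columns are sex 1 and sex 2
tabi 2428 4170 \ 686 1292 \ 1754 1073 \ 122 23, chi2
```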
Two continuous variables
• E.g. diastolic blood pressure and BMI, or diastolic and systolic blood pressure (used in the commands that follow)
• The place to start is always a scatter plot
• STATA calls this a “twoway” graph
Start with Create
Select the two variables
Submit
The command produced…
• Produced by our dialogue box…
twoway (scatter diabp sysbp)
• The same dialogue box can fit a line…
twoway (lfit diabp sysbp)
This time select “line”
You can combine the two...
• Try it!
twoway (scatter diabp sysbp) (lfit diabp sysbp)
• To assess significance, use the regress command (can you find the menu option?)
regress diabp sysbp
Note: the linear output
• Line: y = mx + b
• diabp = 33.42 + 0.364(sysbp)
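. regress diabp sysbp

      Source |       SS       df       MS              Number of obs =   11627
-------------+------------------------------           F(  1, 11625) =11928.05
       Model |  800498.474     1  800498.474           Prob > F      =  0.0000
    Residual |  780160.451 11625  67.1105764           R-squared     =  0.5064
-------------+------------------------------           Adj R-squared =  0.5064
       Total |  1580658.92 11626  135.958965           Root MSE      =  8.1921

------------------------------------------------------------------------------
       diabp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sysbp |   .3639623   .0033325   109.22   0.000     .3574301    .3704946
       _cons |   33.42091   .4606105    72.56   0.000     32.51804    34.32379
------------------------------------------------------------------------------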
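The fitted line can be used to predict diastolic pressure at a chosen systolic pressure. The sketch below plugs the coefficients from the output into y = mx + b. The value 120 mm Hg is an arbitrary illustration and the scalar names are hypothetical.

```stata
* Coefficients copied from the regress output
scalar b0 = 33.42091
scalar b1 = 0.3639623

* Predicted diastolic pressure at a systolic pressure of 120
scalar pred120 = scalar(b0) + scalar(b1)*120
display "predicted diabp at sysbp = 120: " scalar(pred120)

* With one predictor, the correlation is the square root of R-squared
* (positive here because the slope is positive)
display "implied correlation: " sqrt(0.5064)
```

The prediction is about 77 mm Hg. After running regress on the dataset, the same coefficients are available as _b[_cons] and _b[sysbp].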
(In Class) Assignment for Today
• Assess whether there is an association between systolic blood pressure and death (you need to decide how)
• We’ll define elevated systolic blood pressure as being > 140 mm Hg.
– What is the risk ratio for death for people with elevated systolic blood pressure?
– Is the risk ratio statistically significant?
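One possible approach, sketched on a small made-up dataset so that it runs on its own. The variable name death and all values are hypothetical; check the data dictionary for the real variable name. The name highsbp is also an assumption.

```stata
* Toy data: systolic blood pressure and a 0/1 death indicator
clear
input sysbp death
120 0
150 1
135 0
160 0
.   1
145 1
130 1
110 0
end

* Elevated systolic blood pressure, leaving missing values as missing
generate highsbp = (sysbp > 140) if !missing(sysbp)

* Risk ratio, confidence interval and chi-squared test
cs death highsbp
```

On the real dataset, the risk ratio is statistically significant at the 5% level if its 95% confidence interval excludes 1.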