
Summary of the measures covered in this chapter (nominal data, 2 by 2 table)

Basic quantities:

• Proportion (bounded 0-1): Pcontrol = a/(a+b), Ptreatment = c/(c+d)
• Rate/risk: a proportion over a time period, e.g. Pcontrol = a/(a+b)
• Odds (not bounded): a 'to' b = Ocontrol = a/b; c 'to' d = Otreatment = c/d

Effect size measures, absolute (difference):

• Absolute Risk Difference (ARD) = |Pcontrol - Ptreatment|
• Number Needed to Treat (NNT) = reciprocal of the ARD = 1/|Pcontrol - Ptreatment|
• Problem with the NNT: comparing many treatments is only sensible when the baseline risks are similar (i.e. PtreatmentA = PtreatmentB etc.). A 10% reduction from 100% to 90% is more important than one from 30% to 20%, but this is not picked up by the NNT.

Effect size measures, relative (ratio, e.g. a/b):

• Relative Risk (RR) = Ptreatment/Pcontrol = exposed/non-exposed, also called the risk ratio or Incidence Rate Ratio (IRR)
• Odds Ratio (OR), treatment to control = (Ptreatment/(1 - Ptreatment))/(Pcontrol/(1 - Pcontrol)) = bc/ad; the control to treatment OR is its reciprocal, ad/bc
• The two are related: OR = (RR(1 - Pcontrol))/(1 - RR x Pcontrol)
• If RR < 1 (i.e. treatment better than placebo) consider the Relative Risk Reduction (RRR) = (Pcontrol - Ptreatment)/Pcontrol = 1 - RR, often expressed in %
• Also let treatment group = exposed and control/placebo = unexposed; then RRR = prevented fraction in the exposed

An Introduction to statistics

Risk, rates and odds

Written by: Robin Beaumont

Wednesday, 03 October 2012 Version: 2

This document is part of a series see: http://www.robin-beaumont.co.uk/virtualclassroom/contents.html

Accompanying YouTube videos at http://youtu.be/nFHL54yOniI

Who this document is aimed at: This document is aimed at those people who wish to learn more about statistics in a practical way. It is the thirteenth in the series. I hope you enjoy working through this document. Robin Beaumont

Acknowledgment My sincere thanks go to Claire Nickerson for not only proofreading several drafts but also providing additional material and technical advice.

| | Event | No event | Total |
|---|---|---|---|
| Control | a | b | a+b |
| Treatment | c | d | c+d |


Contents

1. Introduction
2. Odds
2.1 Odds = proportion of events to non-events for a binary variable
2.2 Odds against and horse racing
2.3 Odds and medical statistics
2.4 Equations and probability
3. Absolute Risk (AR) and odds
4. Effect size measures
4.1 Difference measures - ARD and NNT
4.2 Ratio measures – RR, RRR and OR
4.3 Odds Ratio (OR)
4.4 Relative risk versus odds ratios – study design
4.5 Interpreting the Odds ratio
5. Calculating Risks and odds
6. Carrying out the analysis
6.1 OpenEpi
6.2 Directly in R
6.2.1 Writing up the results
6.2.2 Tips and Tricks
6.3 In PASW
7. Number needed to treat (NNT) and harm (NNTH) confidence intervals
8. Exercise
9. Summary
10. References



1. Introduction

In the previous chapter, proportions and chi square, we discussed proportions and counts, and I suggest that you reread the first two sections of that chapter if you can't remember them. In that chapter we investigated a number of situations concerning nominal data and the chi square distribution but did not consider effect size measures. A chi square value that we considered significant did not tell us how much our sample deviated from the null distribution. To get some idea of this we produced various plots, ranging from simple bar charts to association and mosaic plots. Another approach is to consider various effect size measures, which is the focus of this chapter.

In both medical and epidemiological studies a very common method of analysis is the 2 by 2 table of two binary variables, and its analysis has been done to death. In this chapter I will try to present, in a logical sequence, the various statistics, usually types of effect size measure, that have been devised. Interestingly, most books give the impression that many of these measures have been around for ever, but both Fleiss 2003 p103 and Agresti 2002 p44 describe the development of the measures, beginning with Yule in 1900 and then Cornfield in 1951.

2. Odds

Odds are rather a strange concept unless one is a gambler, and even then they are used in a slightly different way compared to their use in statistics. I'll present three explanations of odds, in the hope that you will be happy with at least one of them.

Odds relate to binary variables, that is, variables that can take only two values, e.g. alive/dead.

2.1 Odds = proportion of events to non-events for a binary variable

So an odds is just another proportion (i.e. ratio), but importantly it is not the same as a probability, as we can see from the equations below:

$$\text{probability} = \frac{\text{events}}{\text{all outcomes}} \qquad \text{odds}_{\text{for event}} = \frac{\text{events}}{\text{non-events}} \qquad \text{odds}_{\text{against event}} = \frac{\text{non-events}}{\text{events}}$$

Firstly, although I have divided up the outcomes for the binary variable into non-events and events, I could as easily have called these event A and event B. An odds takes either the events or the non-events as the denominator, depending on whether you are talking about the odds for or against the event. This is in contrast to probabilities, where the denominator is the total number of all outcomes. An example should make this clearer.

Imagine you have ten patients on whom you try out a new treatment, of whom nine show no improvement and eventually die but one miraculously recovers completely. We say that the probability of death is therefore 9/10 = .9 and the probability of surviving is 1/10 = .1. The important thing to note here is that the denominator in both of these probabilities is the total number of outcomes.

Now what are the odds for (i.e. in favour of) death? This is expressed as deaths to non-deaths; deaths : non-deaths; 9:1 = 9/1 = 9. In contrast the odds against death (i.e. odds in favour of surviving) is non-deaths to deaths; 1:9 = 1/9 = 0.1111. Notice here that the odds for and the odds against are just the same proportion inverted, neither having the total number of outcomes in the denominator.



The table below summarises our discussion so far.

| | probability | Odds for | Odds against |
|---|---|---|---|
| death (high) | 9/10 = .9 | 9/1 = 9 | 1/9 = .1111 |
| surviving (low) | 1/10 = .1 | 1/9 = .1111 | 9/1 = 9 |

Notice that the probabilities add up to one, while the odds do not. However, the product of the odds for and the odds against is one (i.e. 9 x 1/9 = 1).
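The ten-patient example can be checked with a few lines of Python. This is only a minimal sketch; the variable names are my own and are not part of the original example.

```python
# Counts from the worked example: 9 deaths and 1 survivor out of 10 patients.
deaths = 9
survivors = 1
total = deaths + survivors

# Probabilities use the total number of outcomes as the denominator.
p_death = deaths / total
p_survive = survivors / total

# Odds use only the other category as the denominator.
odds_for_death = deaths / survivors
odds_against_death = survivors / deaths

print(f"P(death) = {p_death:.1f}, P(surviving) = {p_survive:.1f}")
print(f"odds for death = {odds_for_death:.4f}")
print(f"odds against death = {odds_against_death:.4f}")
print(f"sum of probabilities = {p_death + p_survive:.1f}")
print(f"product of odds = {odds_for_death * odds_against_death:.1f}")
```

The probabilities sum to one and the two odds multiply to one, as in the table.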

For event A the odds for equal the odds against event B, and likewise for event B the odds for equal the odds against event A.

From the above we can surmise:

• Low probability = low odds in favour of = high odds against
• High probability = high odds in favour of = low odds against

Exercise - 1.

1. Odds can be calculated from the following types of data (one correct answer):

a. interval data
b. ratio data
c. ordinal data
d. nominal (2 categories = dichotomous)
e. nominal (more than 2 categories = multinomial)

2. When you throw two dice there are 36 possible outcomes (6 x 6), six of which have a total score of 7 {1:6; 6:1; 2:5; 5:2; 3:4; 4:3}. Calculate the following:

• The odds in favour of obtaining a total score of 7
• The odds against obtaining a total score of 7 (i.e. odds in favour of not obtaining a 7)
• The probability of obtaining a total score of 7
• The probability against obtaining a total score of 7 (i.e. probability of not obtaining a 7)

2.2 Odds against and horse racing

Most books present odds in the context of horse racing, but unfortunately this often confuses because in gambling the odds against the event are used, which will become clear with a few examples.

The horse called Red riding hood has odds of 20 to one

Naively calculating this we have an odds of 20/1 = 20, which from the information above would make one think that it has a high chance of winning. Unfortunately this is not the case, because what the bookmaker has given us is actually the odds against winning. So what we actually have here is the odds of losing, 20/1 = 20, and therefore the odds against losing (i.e. of winning) are 1/20 = 0.05, in other words one win for every twenty losses. So we should really read the betting odds as:

The horse called Red riding hood has odds against winning of 20 to one

Or

The horse called Red riding hood has odds of losing of 20 to one



Some more examples:

| Horse racing terminology | Odds against winning | Odds for winning | Probability of winning |
|---|---|---|---|
| 30 to 1 | 30/1 = 30 | One to thirty: 1/30 = .0333 | 1/31 = 0.0323 |
| 50 to 1 | 50/1 = 50 | One to fifty: 1/50 = .02 | 1/51 = 0.0196 |
| 100 to 1 | 100/1 = 100 | One to one hundred: 1/100 = .01 | 1/101 = 0.0099 |

In conclusion, when you see betting odds read them as odds against the event, with higher odds meaning less chance of the event occurring.
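The conversion in the table can be sketched in Python as follows. The function name `betting_odds_to_probability` is a hypothetical helper of mine, and the sketch treats the quoted odds as pure odds against winning, ignoring any bookmaker's margin.

```python
def betting_odds_to_probability(against: float, to: float = 1.0) -> float:
    """Probability of winning implied by betting odds of `against` to `to`.

    Odds of 'against to 1' mean `against` losses for every `to` wins,
    so the total number of outcomes is against + to.
    """
    return to / (against + to)


for against in (20, 30, 50, 100):
    odds_for_winning = 1 / against
    p_win = betting_odds_to_probability(against)
    print(f"{against} to 1: odds for winning = {odds_for_winning:.4f}, "
          f"probability of winning = {p_win:.4f}")
```

Note that the odds for winning (1/30) and the probability of winning (1/31) are close but not the same, because the denominators differ.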

2.3 Odds and medical statistics

In contrast to the betting fraternity, in statistics:

• the odds are for an event rather than against the event.
• the 'to one' is dropped (Campbell & Swinscow 2009 p.25).

Again some examples should help.

Consider once again the miraculous cure we have discovered that works for 1 in ten patients. We stated the odds for (i.e. in favour of) death as "9 to 1" = 9 : 1 = 9/1 = 9; statisticians would just say an odds of '9'. Unfortunately, just because the 'to one' is not mentioned does not mean that it should be forgotten.

Because in statistics odds are converted to the 'to one' standard, we need to think about how to express the odds in favour of surviving, which at present is non-deaths to deaths = 1 to nine = 1:9 = 1/9 = 0.1111. To convert this to the 'to one' standard we just divide each side by the second value (i.e. 9), so now we have the ratio of non-deaths to deaths as 1/9 : 9/9, an odds of one ninth to 1, which produces the same result as before: (1/9)/1 = 0.1111.

2.4 Equations and probability

While odds are not the same as probabilities, there is a relationship between them, given in the table below. The yellow highlighted row is the important one.

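| | probability | odds |
|---|---|---|
| for | $P_{\text{for}} = \dfrac{\text{odds}_{\text{for}}}{1 + \text{odds}_{\text{for}}}$ | $\text{odds}_{\text{for event}} = \dfrac{p_{\text{for}}}{1 - p_{\text{for}}}$ |
| against | $P_{\text{against}} = \dfrac{\text{odds}_{\text{against}}}{1 + \text{odds}_{\text{against}}} = \dfrac{1}{1 + \text{odds}_{\text{for}}}$ | $\text{odds}_{\text{against}} = \dfrac{1 - p_{\text{for}}}{p_{\text{for}}}$ |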

From the above equations you can see that the equation for the odds for an event is simply the equation for the odds against the event turned upside down; technically we say it is the reciprocal. Using the above equations it is possible to plot the relationship between odds (in favour of an event) and probability.
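The relationships between probability and odds can be written as two small Python functions. This is a sketch only; the function names are my own choice.

```python
def odds_from_probability(p: float) -> float:
    """Odds in favour of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)


def probability_from_odds(odds: float) -> float:
    """Probability of an event with the given odds in favour of it."""
    return odds / (1 + odds)


# The ten-patient example: probability of death .9, so odds for death of 9.
p_death = 0.9
odds_for = odds_from_probability(p_death)
odds_against = 1 / odds_for  # the reciprocal relationship

print(f"odds for = {odds_for:.4f}, odds against = {odds_against:.4f}")
print(f"back to probability: {probability_from_odds(odds_for):.4f}")
print(f"odds of 1 gives probability {probability_from_odds(1.0):.2f}")
```

Converting a probability to odds and back again returns the original probability, and an odds of 1 corresponds to a probability of .5.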



[Figure: Relationship between odds (for) and probability. Axes: odds (in favour of) against proportion/probability.]

Some things to note about the above graphs:

• Odds (in favour of) values range from zero to > 10 (in fact there is no upper limit)
• The probability varies between zero and 1
• As the probability gets smaller the odds (in favour of) also get smaller
• When the odds (in favour of) = 1 the probability = .5

You will notice that in the above I have specified the odds as being 'in favour of', as used in statistics. If I had created a graph for the odds against the event, as in betting, the values would have been reflected in the line p=0.5, with the probability getting smaller as the odds (against) get larger. Odds are very important in statistics for several reasons, two being that they do not have an upper value of 1, unlike probability, and that when you take the logarithm of the odds it produces a curve which is close to a straight line for the middle probability values. This is very useful if you are trying to predict a binary outcome from one or more predictors, a situation we will investigate in a later chapter on logistic regression. After this diversion, let's now return to defining our first set of statistics for binary variables.
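To get a feel for the logarithm of the odds, the short Python sketch below tabulates it for a few probabilities. The function name `log_odds` is my own, and I have used the natural logarithm as an assumption; any base gives the same shape.

```python
import math


def log_odds(p: float) -> float:
    """Natural logarithm of the odds in favour of an event (0 < p < 1)."""
    return math.log(p / (1 - p))


for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    odds = p / (1 - p)
    print(f"p = {p:.2f}  odds = {odds:7.4f}  log(odds) = {log_odds(p):7.4f}")
```

The log odds are zero when p = .5 (odds of 1) and are symmetric about that point, so p = .1 and p = .9 give the same value with opposite signs.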

Exercise - 2.

1. An odds value of 0.5 indicates a probability of (one correct answer):

a = 0; b = .25; c = .5; d = .75; e = 1.0; f = infinity

3. Absolute Risk (AR) and odds

You will soon realise that the absolute risk and odds statistics are relatively simple. Consider the Physicians’ Health Study data for the 10,919 physicians who had never smoked, discussed in the chi square chapter and reproduced below.

The proportion of those who received the placebo who had a heart attack was 96/5488 = 0.01749271, just under 2 percent, in contrast to 55/5431 = 0.01012705, or 1 percent, of those who received aspirin. We could express this algebraically: Ptreatment=c/(c+d), Pcontrol=a/(a+b). These proportions are also known as the absolute risks (AR). Strictly, a risk is related to a time period and it is always a good idea to say what the period is. For our data, looking back, I can say that the risk is for the period "from the late 1980’s to the early 1990’s", possibly not the best time interval! It would be sensible to go back to the original paper and specify it at least in years. Note that the risk measure uses both categories in the denominator, so it is a probability.

In contrast the odds are worked out using just the no event category, instead of the total, as the denominator, so we have for the placebo group Oplacebo = 96/5392 = 0.0178 and for the treatment group Otreatment = 55/5376 = 0.0102; algebraically these are simply a/b and c/d. If b and d are large in comparison to a and c, we would expect the proportions and odds to be pretty similar, as they are here. In other words, situations with rare diseases (= low prevalence rates) produce small absolute risks with similar values to the odds of the disease. (Taken from Agresti & Finlay 2009).

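| Never smoked group | Heart attack | No heart attack | Total |
|---|---|---|---|
| Placebo | 96 | 5392 | 5488 |
| Aspirin | 55 | 5376 | 5431 |

| | Event | No event | Total |
|---|---|---|---|
| Control | a | b | a+b |
| Treatment | c | d | c+d |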

[Figure: Relationship between Log(odds (for)) and probability. Axes: log(odds (in favour of)), on a scale from 0.001 to 100, against probability from 0 to 1; annotation: odds of 1 at p = .5.]

[Figure: Relationship between odds (for) and probability. Axes: odds (in favour of), from 0 to 10, against probability from 0 to 1.]

ARtreat = ptreat = c/(c+d) = 55/5431 = 0.0101

ARcontrol = pcontrol = a/(a+b) = 96/5488 = 0.0175
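The absolute risks and odds for the aspirin data can be reproduced with the sketch below. The cell labels a, b, c and d follow the 2 by 2 table above; the other variable names are my own.

```python
# Physicians' Health Study, never smoked group (counts from the table above).
a, b = 96, 5392   # placebo (control): heart attack, no heart attack
c, d = 55, 5376   # aspirin (treatment): heart attack, no heart attack

# Absolute risk: events over the group total.
ar_control = a / (a + b)
ar_treatment = c / (c + d)

# Odds: events over non-events.
odds_control = a / b
odds_treatment = c / d

print(f"AR control   = {ar_control:.8f}, odds control   = {odds_control:.4f}")
print(f"AR treatment = {ar_treatment:.8f}, odds treatment = {odds_treatment:.4f}")
```

Because heart attacks are rare in both groups, each odds value is only slightly larger than the corresponding absolute risk.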



4. Effect size measures

You may remember that when we discussed effect size measures in the past we considered how much a value such as the mean deviates, in standard deviations, either from zero or from the control/placebo group. With nominal data this is clearly not possible; instead we can take two different approaches:

• Consider the difference in proportions (i.e. risks) between the two groups, that is the control and treatment groups

• Consider the ratio (divide one by the other) of proportions (i.e. risks) between the two groups

Let's take each approach in turn.

4.1 Difference measures - ARD and NNT

The simplest of this type of measure is the Absolute Risk Difference (ARD), which is simply the risk difference between the two groups, ignoring the possibility of a negative value (i.e. taking the absolute value): |p1-p2| = |pplacebo-ptreatment|. For our control and treatment groups we have |0.01749271 - 0.01012705| = 0.00736566. What does this tell us? Basically it is the reduction in risk in the treatment group. Expressed as a percentage it is 0.736566%, so for every 100 patients treated, 0.736566 of a patient avoided a heart attack compared to the placebo group. In this case, where the treatment has reduced the risk, it is also called the Absolute Risk Reduction (ARR).

Dividing 100 by this percentage value gives us the number of patients we need to treat to prevent a single heart attack, 100/0.736566 = 135.7652, so we would need to treat 136 patients to prevent a single heart attack. This value is called the Number Needed to Treat (NNT).

Mathematically the NNT is 1/(|p1-p2|), where p1 = placebo group and p2 = treatment group. This is 1 over the ARD, which is called the reciprocal of the ARD. Using this equation for our data gives 1/0.00736566 = 135.7652; in other words we need to treat 136 patients to prevent one heart attack. The seemingly immediate attractiveness of the NNT means that it has rapidly gained widespread popularity; however there are dangers associated with it and we will discuss some of these later.
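A minimal Python sketch of the ARD and NNT calculation for the aspirin data is given below. The variable names are my own, and rounding the NNT up to a whole patient is the convention used in the text above.

```python
import math

# Absolute risks from the aspirin data: placebo 96/5488, aspirin 55/5431.
p_placebo = 96 / 5488
p_treatment = 55 / 5431

# Absolute Risk Difference: the absolute value of the difference in risks.
ard = abs(p_placebo - p_treatment)

# Number Needed to Treat: the reciprocal of the ARD.
nnt = 1 / ard

print(f"ARD = {ard:.8f} ({100 * ard:.4f}%)")
print(f"NNT = {nnt:.4f}, i.e. treat {math.ceil(nnt)} patients")
```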

4.2 Ratio measures – RR, RRR and OR

A ratio is just one value divided by another, and the simplest such measure we can obtain from our data is found by dividing the treatment risk by the control risk, which for our aspirin heart attack data is Ptreatment/Pcontrol = 0.01012705/0.01749271 = 0.5789297. This is called the Relative Risk (RR), also called the Incidence Rate Ratio (IRR). Most texts present the numerator as the treatment (i.e. exposed) group but one (Howells, 2007 p.154) uses it as the denominator.

To calculate a valid Relative Risk you usually need some estimate of the population at risk; if your control group is either biased or too small the relative risk will be useless.

If the relative risk is less than 1, that is when the treatment is better than the placebo, it is sensible to consider another measure called the Relative Risk Reduction (RRR):

RRR = (Pcontrol - Ptreatment)/Pcontrol = 1 - RR

For our data the RRR is 1 - 0.5789297 = 0.4210703, and multiplying this value by 100 to express it as a percentage gives 42.107, telling us that there is a 42% reduction in heart attacks in the treatment group for the period "from the late 1980’s to the early 1990’s". But you must remember that this is a reduction from just under 2% to 1%, and one would need to take the actual risk into consideration when discussing this value. For example, for a deadly condition such as heart attacks the intervention is probably worthwhile, but for a simple non life threatening condition would an intervention with this RRR be worthwhile? Neither the RR nor the RRR takes the baseline risk into account, which in contrast the next measure does.
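The RR and RRR for the aspirin data can be checked with the following sketch; again the variable names are mine.

```python
# Absolute risks from the aspirin data.
p_control = 96 / 5488     # placebo group
p_treatment = 55 / 5431   # aspirin group

# Relative Risk: treatment risk divided by control risk.
rr = p_treatment / p_control

# Relative Risk Reduction, calculated both ways to show they agree.
rrr = (p_control - p_treatment) / p_control
rrr_from_rr = 1 - rr

print(f"RR  = {rr:.7f}")
print(f"RRR = {rrr:.7f} = {100 * rrr:.1f}%")
```

Both forms of the RRR equation give the same value, about 42%.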



Exercise - 3.

1. In a prospective cohort study, researchers reported that the risk of fatal coronary heart disease was increased for women with diagnosed diabetes compared with women without (relative risk adjusted for age 3.50, 95% confidence interval 2.70 to 4.53). Taken from Peter Sedgwick's excellent statistical questions BMJ series, BMJ 2009;339:b3007

Which of the following statements accurately describes the reported risk of fatal coronary heart disease (select two)?

a. For women with diagnosed diabetes, the risk of having fatal coronary heart disease is 3.5 times that of women without diagnosed diabetes
b. The reported risk of fatal coronary heart disease is statistically significant at the 5% level
c. In women, diabetes causes fatal coronary heart disease
d. 95% of women with diabetes have an increased risk of fatal coronary heart disease between 2.70 and 4.53 times that of women without diabetes
e. Because the confidence interval does not contain zero the result is statistically insignificant

2. Researchers assessed the effects of β lactam antibiotics prescribed in the community for acute respiratory tract infection on the prevalence of antibiotic resistant bacteria in an individual child. A total of 119 children with acute respiratory tract infection were recruited in primary care, of whom 71 received a β lactam antibiotic. A prospective cohort study design was used with follow-up at two and 12 weeks. Antibiotic resistance was assessed by the presence of the ICEHin1056 resistance element in up to four isolates of Haemophilus species, recovered from throat swabs at recruitment and follow-up.

At two weeks, 67% of children prescribed antibiotics had isolation of Haemophilus isolates possessing homologues of ICEHin1056, compared with 36% of those not prescribed antibiotics (relative risk = 1.9; 95% confidence interval: 1.2 to 2.9). Taken from Peter Sedgwick's excellent statistical questions BMJ series, BMJ 2010;341:c3983

Which of the following is not true (select one)?

a. The relative risk is the ratio of probability of antibiotic resistance in children prescribed antibiotics relative to those not prescribed antibiotics

b. At two weeks, those children prescribed antibiotics were 190% more likely to exhibit antibiotic resistance relative to children not prescribed antibiotics

c. At two weeks, those children prescribed antibiotics had a 90% greater risk of antibiotic resistance relative to children not prescribed antibiotics

d. Relative risk should only be calculated if we can estimate the population at risk

3. In a clinical trial of 734 subjects treated with Viagra, 117 reported headaches. In a control group of 735 subjects not treated with Viagra, 29 reported headaches (adapted from: http://www.cramster.com/).

| | Headache | No headache | Total |
|---|---|---|---|
| Control | 29 | 706 | 735 |
| Viagra | 117 | 617 | 734 |

Which of the following represents the absolute risk of having a headache for the Viagra group (select one)?

a. 29/706
b. 117/(117+617)
c. 29/(117+617)
d. |(29/735) - (117/734)|
e. 1/|(29/735 - 117/734)|



4. In a clinical trial of 734 subjects treated with Viagra, 117 reported headaches. In a control group of 735 subjects not treated with Viagra, 29 reported headaches (adapted from: http://www.cramster.com/).

Which of the following represents the absolute risk reduction (ARR) between the two groups (select one)?

a. 29/706
b. 117/(117+617)
c. 29/(117+617)
d. |(29/735) - (117/734)|
e. 1/|(29/735 - 117/734)|

5. In a clinical trial of 734 subjects treated with Viagra, 117 reported headaches. In a control group of 735 subjects not treated with Viagra, 29 reported headaches (adapted from: http://www.cramster.com/).

Which of the following represents the Number Needed to Treat (NNT) (select one)?

a. 29/706
b. 117/(117+617)
c. 29/(117+617)
d. |(29/735) - (117/734)|
e. 1/|(29/735 - 117/734)|

4.3 Odds Ratio (OR)

The odds ratio (OR), here taken as placebo to treatment, is

OR = (Pplacebo/(1 - Pplacebo))/(Ptreatment/(1 - Ptreatment)) = Oplacebo/Otreatment = (a/b)/(c/d) = (a/b) x (d/c) = ad/bc

"In computing an odds ratio there is no rule about which odds go in the numerator or the denominator, where reasonable I prefer to put the larger value in the numerator to make the ratio come out greater than one." (Howells, 2007 p.154) It should be noted that one value is simply the inverse of the other, which I will demonstrate below.

For our data we calculated the odds before, Oplacebo = 0.0178 and Otreatment = 0.0102, so the odds ratio is simply 0.0178/0.0102 = 1.745098 (placebo to treatment) or 0.0102/0.0178 = 0.5730337 (treatment to placebo). This latter value is very slightly less than the RR of 0.5789297 calculated above. However, if the baseline (i.e. placebo) risk had not been so low the difference would have been more noticeable. Campbell & Swinscow 2009 (p.28) provide a nice table demonstrating how as the baseline risk increases the RR stays the same but the OR increases to reflect the change. See Sistrom & Garvan 2004 for a more detailed discussion along with a chart indicating the inflation factor.
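The effect of the baseline risk can be illustrated with a short sketch that holds the RR fixed and varies the control risk. The function name and the list of baseline risks are my own illustrative choices, not values from Campbell & Swinscow's table. Here the OR is taken as treatment to control, so for an RR below 1 it moves further below 1 as the baseline rises (equivalently, the placebo to treatment OR increases).

```python
def odds_ratio_from_rr(rr: float, p_control: float) -> float:
    """Treatment to control odds ratio implied by a relative risk and a baseline risk."""
    p_treatment = rr * p_control
    odds_treatment = p_treatment / (1 - p_treatment)
    odds_control = p_control / (1 - p_control)
    return odds_treatment / odds_control


rr = 0.5789297  # the relative risk from the aspirin data
for p_control in (0.0175, 0.1, 0.3, 0.5, 0.8):  # illustrative baseline risks
    or_value = odds_ratio_from_rr(rr, p_control)
    print(f"baseline risk = {p_control:.4f}  RR = {rr:.4f}  "
          f"OR = {or_value:.4f}  1/OR = {1 / or_value:.4f}")
```

At the low baseline risk of the aspirin data the OR is close to the RR; at higher baseline risks the two diverge.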

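It is interesting to note that the shortcut formula ad/bc gives (96 x 5376)/(5392 x 55) = 1.740275, which is the placebo to treatment odds ratio, and bc/ad gives 0.5746218, the treatment to placebo odds ratio. These differ slightly from the values in the previous paragraph only because those were calculated from odds rounded to four decimal places.

There is a simple relationship between the two odds ratio values: each is the reciprocal of the other, 1/1.745098 = 0.5730337 (or, using the unrounded values, 1/1.740275 = 0.5746218). The relative risk does not share all of this symmetry: swapping event and no event inverts the odds ratio but does not invert the relative risk. The odds ratios have a reciprocal relationship, just like the odds for/against.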
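The two odds ratios can be calculated directly from the cell counts, which avoids the rounding of the intermediate odds. This is a sketch using the a, b, c, d labels of the 2 by 2 table; the other names are my own.

```python
# Aspirin data: placebo row (a, b) and treatment row (c, d).
a, b = 96, 5392
c, d = 55, 5376

# Shortcut formulas for the odds ratio in each direction.
or_placebo_to_treatment = (a * d) / (b * c)
or_treatment_to_placebo = (b * c) / (a * d)

print(f"ad/bc = {or_placebo_to_treatment:.6f}")
print(f"bc/ad = {or_treatment_to_placebo:.6f}")
print(f"product = {or_placebo_to_treatment * or_treatment_to_placebo:.6f}")
```

The product of the two values is one, confirming that each is the reciprocal of the other.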



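| | Event | No event | Total |
|---|---|---|---|
| Placebo | a | b | a+b |
| Treatment | c | d | c+d |

| Never smoked group | Heart attack | No heart attack | Total |
|---|---|---|---|
| Placebo | 96 | 5392 | 5488 |
| Aspirin | 55 | 5376 | 5431 |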





The odds of having a heart attack in the placebo group are 1.74 times those in the aspirin group, or conversely the odds of having a heart attack in the aspirin group are 0.57 times those in the placebo group; that is, the odds of having a heart attack in the aspirin group are around half those of the control group.

Using software it is possible to obtain a confidence interval. The parameter value under the null hypothesis is an odds ratio of 1 (both groups equivalent), so if the 100(1 - alpha)% confidence interval (CI) does not include 1 the result will also be statistically significant at the alpha level. For example, if we have a 95% CI and it does not contain 1 within its range, the p-value will be less than 0.05.
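As a rough illustration of what such software does, the sketch below calculates an approximate 95% CI for the placebo to treatment odds ratio using the common large sample method based on the standard error of the log odds ratio (often attributed to Woolf). This is an assumption on my part: the packages discussed later may use a different, for example exact, method and so give slightly different limits. The function name is hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio ad/bc with an approximate CI from the log odds ratio.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Aspirin data: placebo row (96, 5392) and aspirin row (55, 5376).
or_value, ci_lower, ci_upper = odds_ratio_wald_ci(96, 5392, 55, 5376)
print(f"OR = {or_value:.4f}, approximate 95% CI {ci_lower:.4f} to {ci_upper:.4f}")
print("CI excludes 1:", not (ci_lower <= 1 <= ci_upper))
```

For these data the interval lies entirely above 1, so by the reasoning above the result is statistically significant at the 5% level.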

Exercise - 4.

Researchers used a case-control study to assess the risk of thrombosis associated with the use of oral contraceptives. Participants were premenopausal women aged less than 50 years, not pregnant, not within four weeks postpartum, and not using a hormone excreting intrauterine device or depot contraceptive.

Women were identified as a case if they had a first objectively diagnosed episode of deep venous thrombosis or pulmonary embolism. A total of 1524 cases and 1760 otherwise healthy controls were identified. The women were then asked about use of oral contraceptives in the past year. The odds ratio of venous thrombosis for current users of oral contraceptives compared with non-users was 5.0 (95% CI 4.2 to 5.8). Taken from Peter Sedgwick's excellent statistical questions BMJ series, BMJ 2010;341:c4414

Which of the following is not true (select one)?

a. It was possible to estimate the population at risk in this case-control study
b. The odds ratio is an estimate of the population relative risk
c. Recent users of oral contraceptives were five times as likely to experience deep venous thrombosis or pulmonary embolism than non-users
d. The result was statistically significant at the 5% critical level of significance

4.4 Relative risk versus odds ratios – study design

I mentioned above that the relative risk and odds ratio are nearly equal when the baseline risks are low. Besides this there is another factor that needs to be taken into consideration: the manner in which the data were obtained. We can divide studies into those that look into the past (retrospective), where we know the outcome but not the exposure/treatment, and prospective studies, where we control who receives the treatment and wait to see the outcome. We can really only consider 'risk' in prospective studies, as looking back in retrospective studies usually only provides us with the number of those who have the disease/outcome, not those who did not. Usually we don't know the risk in the population, but this is not always the case; see Langholz 2010 for details. Particular designs are applied to these retrospective/prospective studies. The case control approach is often a retrospective study where the outcome is measured before the exposure (treatment) and the controls are selected on the basis of not having the outcome. In contrast the cohort study approach is usually prospective, where the outcome is measured after exposure/treatment, and here we will have a truer reflection of the risk in the population.

Bland 2000 p.239 (2nd ed.), p.242 (3rd ed.) provides a good example of the two types of approach, presenting the research by Doll & Hill concerning lung cancer and smoking. The early research used a retrospective case control design comparing the smoking habits of those who were admitted to hospital with lung cancer and those admitted with some other condition. Six years later they published results from a prospective cohort study consisting of 60% of all UK GPs, following up the effect of smoking on their mortality after 53 months. I'm sure you can think of many reasons why the so called 'control' group in the 1950 research would not reflect the true incidence of lung cancer in all non smokers; Bland 2000 discusses this at length and is well worth a read.

Retrospective case control design: smokers and non-smokers among male cancer patients and controls (Doll & Hill 1950, quoted in Bland 2000)

| | Smokers | Non-smokers | Total |
|---|---|---|---|
| Lung cancer (admitted to hospital for ca lung) | 647 | 2 | 649 |
| Controls (admitted to hospital not for ca lung) | 622 | 27 | 649 |

Smoker/non-smoker odds ratio = (647/2)/(622/27) = 14.04

A third type of study is the cross sectional study, which obtains information at a single point in time, usually the present. This has basically similar problems to the retrospective design. Campbell & Swinscow 2009 (p.28) also call these association studies and demonstrate how odds ratios for two different conditions can be analysed in these studies.

Cross sectional design ('association study'): hay fever and eczema in 11 year old children (Campbell & Swinscow 2009 p.30)

| | Hay fever present | No hay fever | Total |
|---|---|---|---|
| Eczema present | 141 | 420 | 561 |
| No eczema | 928 | 13525 | 14453 |

Odds of hay fever given eczema = 141/420 = .3357

Odds of hay fever given no eczema = 928/13525 = .06861

Odds ratio hay fever/eczema = .3357/.06861 = 4.893 = eczema/hay fever

'A child with hay fever has approximately 5 times the odds of having eczema' and also, because odds ratios are reflective, 'A child with eczema has approximately 5 times the odds of having hay fever'.

The 'given' term in the above needs some care in interpreting, as we are using it in the context of association. Normally we use it with regard to causation, which is not reflective: A causes B but B does not cause A.
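The reflective (symmetric) nature of the odds ratio in this table can be verified with the sketch below, which calculates it once by rows and once by columns. The variable names are my own.

```python
# Hay fever and eczema counts from the table above.
both = 141            # eczema and hay fever
eczema_only = 420     # eczema, no hay fever
hayfever_only = 928   # hay fever, no eczema
neither = 13525       # neither condition

# Odds of hay fever given eczema status, and their ratio.
odds_hf_given_eczema = both / eczema_only
odds_hf_given_no_eczema = hayfever_only / neither
or_hayfever = odds_hf_given_eczema / odds_hf_given_no_eczema

# Odds of eczema given hay fever status, and their ratio.
odds_ecz_given_hf = both / hayfever_only
odds_ecz_given_no_hf = eczema_only / neither
or_eczema = odds_ecz_given_hf / odds_ecz_given_no_hf

print(f"OR (hay fever given eczema) = {or_hayfever:.3f}")
print(f"OR (eczema given hay fever) = {or_eczema:.3f}")
```

Both calculations reduce to (141 x 13525)/(420 x 928), which is why the two values agree.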

From the above we can conclude that:

• Odds ratios should be quoted for all studies
• Relative risks should usually only be quoted for Prospective studies (but see Langholz 2010)

Prospective cohort study (Doll & Hill 1956, quoted in Bland 2000): standardised death rate per 1,000 men (actually GPs) aged 35, 53 months follow-up

               Smokers   Non-smokers
Lung cancer      .90         .07

Smoker/non-smoker rate ratio (relative risk) = .90/.07 = 12.9. Because the rates are so small, the odds ratio is almost identical.



4.5 Interpreting the Odds ratio

When OR = (Oplacebo/Otreatment) approaches zero

In this situation Oplacebo is very small compared to Otreatment. This means a small proportion in the placebo group have an event and, in contrast, a large proportion in the treatment group have the event. Editing our aspirin and heart attack data to reflect such a change gives a placebo to treatment odds ratio of (1/5487)/(5430/1) = 3.356334e-08 = .00000003356, which is a very small odds ratio value and one we would be very unhappy about obtaining!

When OR = (Oplacebo/Otreatment) = 1 (the confused icon!)

In this situation Otreatment is the same as Oplacebo. This means the same proportion of patients in both the control and treatment groups have the event. Editing once again our aspirin and heart attack data to reflect such a change, we can see that as long as we keep the proportions for both the treatment and control groups the same we get the same result, see below.

When OR = (Oplacebo/Otreatment) is much greater than 1

In this situation Otreatment is very small compared to Oplacebo, indicating that a small proportion in the treatment group have an event and a large proportion in the control group have the event. Editing our aspirin and heart attack data to reflect such a change gives a placebo to treatment odds ratio of (5487/1)/(1/5430) = 29794410, a very large odds ratio value and one we would be very happy about obtaining! So, concerning the placebo to treatment odds ratio, overall we can indicate the feelings we have towards its value, from one of unhappiness through to one of great happiness!
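The three situations can be reproduced with a small helper function in base R. This is a sketch only; the function name placebo_treatment_or and its argument names are made up for the illustration.

```r
# Placebo/treatment odds ratio from the four cell counts.
# The function and argument names are illustrative only.
placebo_treatment_or <- function(placebo_event, placebo_none,
                                 treat_event, treat_none) {
  (placebo_event / placebo_none) / (treat_event / treat_none)
}

# Almost no events on placebo, almost everyone on treatment has one
or_near_zero <- placebo_treatment_or(1, 5487, 5430, 1)      # about 3.36e-08

# Same proportion with an event in both groups (40% of 500 in each)
or_equal <- placebo_treatment_or(200, 300, 200, 300)        # 1

# Almost everyone on placebo has an event, almost no one on treatment
or_very_large <- placebo_treatment_or(5487, 1, 1, 5430)     # about 29.8 million

c(or_near_zero, or_equal, or_very_large)
```

Swapping the placebo and treatment arguments gives the treatment/placebo odds ratio, which is simply the reciprocal of each value.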

Conversely for the treatment/placebo odds ratio the feelings are reversed:

Exercise - 5.

Visit Steve Simon's website (most of the material was developed while he was working at Children's Mercy Hospital): http://www.pmean.com/definitions/or.htm and http://www.pmean.com/webinars/20100421/OddsRatio.pdf

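[Figure: the aspirin table for the never smoked group (heart attack / no heart attack by placebo / aspirin) with edited counts, together with two number lines for the odds ratio running from zero through 1, 2, 3 ... to infinity, captioned 'Interpreting the treatment/placebo odds ratio' and 'Interpreting the placebo/treatment odds ratio']

Equal proportions in both groups always give an odds ratio of 1:

40% have event = ((200)x(300))/((200)x(300)) = 1
20% have event = ((100)x(400))/((100)x(400)) = 1
60% have event = ((300)x(200))/((300)x(200)) = 1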




According to Fleiss 2003 p103 the odds ratio was proposed by Cornfield in 1951 as a measure of the degree of association between an antecedent factor and an outcome event such as morbidity or mortality, but only because it provided a good approximation to another measure he proposed, the relative risk, also called the rate ratio.

5. Calculating Risks and odds

It is very easy to calculate the various risk and odds values. In contrast, obtaining the various confidence intervals is more complex (and there appears to be little agreement concerning which one is the best), and this is where statistical software comes into its own. Because analysis of the types of data and research designs discussed above is the bread and butter of epidemiologists, an excellent free specialist package called OpenEpi has been developed, which I will demonstrate below. There are also several epidemiological packages for R that allow a similar analysis (epibasix, Epi, Epicalc, epiR, epitools). Many of these have been co-developed or sponsored by WHO/UNICEF etc. Lastly there is our old faithful SPSS.

The main epidemiological packages in R are listed below (package name, website, details):

epitools
  Website: http://www.medepi.com/epitools
  Details: Developed by Tomás Aragón, MD, DrPH, Director and Medical Epidemiologist, UC Berkeley Center for Infectious Diseases & Emergency Readiness. Aragón has published a book, Applied Epidemiology Using R; the first three chapters are available from his web site.

epibasix
  Website: http://cran.r-project.org/web/packages/epibasix/
  Details: Developed by Michael Rotondi, PhD student at University of Western Ontario. Elementary Epidemiological Functions for a Graduate Epidemiology/Biostatistics Course.

epiR
  Website: http://epicentre.massey.ac.nz/Default.aspx?tabid=195
  Details: Developed by Mark Stevenson, Associate Professor in Veterinary Epidemiology, Massey University, New Zealand.

Epicalc
  Website: detailed manual available from http://apps.who.int/tdr/svc/publications/training-guideline-publications/analysis-epidemiological-data
  Details: The Special Programme for Research and Training in Tropical Diseases (TDR), sponsored by UNICEF/UNDP/World Bank/WHO. Written by Virasakdi Chongsuvivatwong of Prince of Songkla University, Hat Yai, Thailand.

Epi
  Website: http://staff.pubhealth.ku.dk/~bxc/SPE/
  Details: Developed by Bendix Carstensen, University of Copenhagen, http://staff.pubhealth.ku.dk/~bxc/

When using the above applications remember that generally:

• Treatment group = exposed
• Control/placebo = unexposed
• Diseased (+) = dead or have condition
• Not diseased = alive or do not have condition

Let's take a look at some of the above applications using the heart attack aspirin data presented earlier.




6. Carrying out the analysis

6.1 OpenEpi

Here I have used an example from Harris & Taylor 2008 (page 41) describing the effect skiing (exposure) had upon the incidence of knee injuries (event/disease). The cases and controls (non skiers) were matched for age and sex.

R commander is no help here, so instead we will use an excellent online application called OpenEpi. If you wish you can download it for free.

Open your web browser and go to the url:

http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm

Select the Two by Two table option on the left of the screen. You will then see a number of tabs across the top of the screen, including Start and Enter.

Select the Enter tab. Now you can enter the data and change the names of the columns and rows. Enter the data and edit the row and column names, as shown opposite.

Click on the Calculate button.

The results are shown below.

Notice that we do not get the actual odds for each group, only the odds ratio.

[Figure: OpenEpi output, annotated as follows]

• Chi square test (see the chi square chapter): are the proportions of knee injuries in the skiers/non-skiers groups statistically significantly different?
• Risk calculations
• Odds calculations with confidence intervals
• OpenEpi checks for us the danger of invalid results if the expected cell counts are <5




6.2 Directly in R

In R you can obtain the result using various epidemiology packages; I will look at epitools. First install and load the package:

install.packages("epitools", dependencies=TRUE)

library(epitools)

We now need to get the data into R, and I will create a matrix object with two columns and two rows:

thedata <- matrix(c(40, 20, 60, 80 ), nrow = 2 , ncol = 2)

Now give the rows then the columns names, the opposite way round to how we entered the data in the matrix command:

dimnames(thedata) <- list("group" = c("case", "control"), "outcome" = c("injury", "no_injury"))

Using the command epitab() in epitools, produces the odds ratio with a confidence interval:

epitab(thedata)

You can use additional commands to compare various methods of obtaining the confidence interval for the odds ratio, using any one of these five:

oddsratio(thedata)
oddsratio.midp(thedata)
oddsratio.fisher(thedata)
oddsratio.wald(thedata)
oddsratio.small(thedata)

Similarly you can use the riskratio command to find the risk ratios, including a bootstrap estimate. However, before doing that, you can see in the above output that the case rather than the control group has been used as the reference group (i.e. has an odds ratio of 1), and it would be better when reporting risk to have the control group as the reference group. This is achieved by using the rev = "both" argument:

riskratio(thedata, rev = "both")
riskratio.wald(thedata, rev = "both")
riskratio.small(thedata, rev = "both")
riskratio.boot(thedata, rev = "both")

or riskratio(thedata, method = "boot", rev = "both") etc.

Finally, the epibasix package makes this easier by providing a single command, summary(epi2x2(thedata)), which gives the odds ratio, relative risk, and variations for both cohort (prospective) and case control (retrospective) studies, all with confidence intervals. Remember to install and load epibasix first, in the same way as for epitools.

Data for the example:

            Injured knees   Healthy
cases             40           60
controls          20           80

Notes on the matrix() command:

• thedata <- matrix(...) creates a data object called thedata, which is a matrix
• c(...) joins the values together; the data are entered columnwise, because matrix() defaults to filling by column then row
• nrow and ncol specify how the data is divided into rows and columns, i.e. here 2 rows and 2 columns

Notes on the dimnames() command:

• dimnames(thedata) <- ... adds row and column names to thedata
• list(...) creates a list consisting of the row heading and row names, followed by the column heading and column names
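It can be reassuring to check the odds ratio and one of its confidence intervals by hand. The base R sketch below uses the knee injury matrix and the usual normal approximation (Wald) interval on the log odds ratio. I believe this is the interval oddsratio.wald() reports, but treat that as an assumption and check the epitools documentation. The variable names are my own.

```r
# Knee injury data, entered columnwise as in the text
thedata <- matrix(c(40, 20, 60, 80), nrow = 2, ncol = 2,
                  dimnames = list(group   = c("case", "control"),
                                  outcome = c("injury", "no_injury")))

# Pull out the four cells (names are illustrative only)
inj_case <- thedata["case", "injury"]
ok_case  <- thedata["case", "no_injury"]
inj_ctrl <- thedata["control", "injury"]
ok_ctrl  <- thedata["control", "no_injury"]

# Case/control odds ratio
odds_ratio <- (inj_case / ok_case) / (inj_ctrl / ok_ctrl)

# Wald 95% confidence interval, worked out on the log scale
se_log_or <- sqrt(1 / inj_case + 1 / ok_case + 1 / inj_ctrl + 1 / ok_ctrl)
z         <- qnorm(0.975)
ci        <- exp(log(odds_ratio) + c(-1, 1) * z * se_log_or)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3)
```

The other methods (mid-p, Fisher, small sample adjusted) will give somewhat different limits, which is the lack of agreement mentioned earlier.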



Exercise - 6.

Repeat the above analysis description but with the data opposite using openEpi and epitools.

6.2.1 Writing up the results

Taking the example, we have an odds ratio of 2.667 for cases/controls, whereas 1/2.667 = 0.375 gives us the odds ratio for controls/cases. This is because the two odds ratios are reciprocals of each other.

So we can say that skiers have a 167% increase in the odds of a knee injury compared to those who do not ski, calculated from (2.667 - 1)x100.

The above sentence is equivalent to saying that those who do not ski have a reduction in the odds of a knee injury of 62.5%, calculated from (1 - .375)x100.

A report would also include confidence intervals and emphasise the clinical importance of these differences, if any.
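The conversion from an odds ratio to a percentage change in odds is simple arithmetic, sketched here in base R with the knee injury counts. The variable names are my own.

```r
# Odds ratios in both directions for the knee injury data
or_case_control <- (40 / 60) / (20 / 80)   # cases/controls
or_control_case <- 1 / or_case_control     # controls/cases, the reciprocal

# Percentage change in odds implied by each ratio
pct_increase  <- (or_case_control - 1) * 100   # increase for cases
pct_reduction <- (1 - or_control_case) * 100   # reduction for controls

round(c(or_case_control, or_control_case, pct_increase, pct_reduction), 3)
```

Note that the two percentages are not mirror images of each other, because an increase is measured from 1 upwards without limit while a reduction can never exceed 100%.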

6.2.2 Tips and Tricks

When calculating odds and risk ratios you need to be very clear which is the control and which is the intervention group. Before entering my data I always work through the openEpi example to remind myself.

If you feel that you have the odds ratio the wrong way round, take the value you have and divide one by it. If the result looks more like the value you were expecting, you may have the groups the wrong way round.

6.3 In PASW

In SPSS (PASW) you set the data up as described in the chi square chapter, including if necessary the weighting variable. Then from the main menu select Analyze -> Descriptive Statistics -> Crosstabs -> Statistics and choose the Risk option (see below).

Reverting back to the aspirin example.

To obtain the aspirin/placebo odds ratio rather than the placebo/aspirin odds ratio you need to code aspirin=1 and placebo=2. It also matters which variable you select for the columns and which for the rows in the analysis. PASW calculates the odds ratio slightly differently, using the Mantel-Haenszel Common Odds Ratio Estimate. Notice that you also get a CI for the value. I would suggest that if you use this technique to calculate the odds ratio for your own dataset you carry out the analysis described here first, to ensure you set the data up correctly.

Never smoked group

            Heart attack    No heart attack   Total
Placebo     96 (1.749%)          5392         5488
Aspirin     55 (1.012%)          5376         5431

.575 = .73/1.27
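As a cross-check on which way round the ratio has come out, the same table can be worked through in base R. This is a sketch with my own object names; it computes the plain cross-product odds ratio rather than reproducing the SPSS procedure.

```r
# Aspirin data for the never smoked group (names are illustrative only)
aspirin_data <- matrix(c(96, 5392,
                         55, 5376),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(group   = c("placebo", "aspirin"),
                                       outcome = c("heart_attack", "no_heart_attack")))

# Odds of a heart attack in each group
odds <- aspirin_data[, "heart_attack"] / aspirin_data[, "no_heart_attack"]

# The two possible odds ratios are reciprocals of each other
or_aspirin_placebo <- unname(odds["aspirin"] / odds["placebo"])
or_placebo_aspirin <- unname(odds["placebo"] / odds["aspirin"])

round(c(aspirin_placebo = or_aspirin_placebo,
        placebo_aspirin = or_placebo_aspirin), 3)
```

If the software reports a value near 1.74 when you were expecting one near .575, the group coding or the row/column assignment is the other way round.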



7. Number needed to treat (NNT) and harm (NNTH) confidence intervals

You will have noticed that the output from the various software applications does not provide a Number needed to treat value or confidence interval. Remember, however, that the NNT is the reciprocal of the ARD (Absolute Risk Difference), also called the absolute risk reduction (ARR), which is provided in all the outputs. Poulos & Kam 2005 (p176) provide a simple method of obtaining a confidence interval for the NNT:

[start of quote]

A means of reporting the precision of study results is necessary when using NNT. This is usually expressed as the 95% confidence interval (95% CI) for the ARR and is given by

ARR ± 1.96 × SE(ARR),

where SE(ARR) is the standard error of the absolute risk reduction. The 95% confidence interval for NNT is the reciprocal of the values defining the confidence interval for ARR. For example, if in a trial the ARR is 10% and the 95% CI is 5–20%, the NNT becomes 10 (1/0.1) with 95% CI of 5–20 (1/0.2–1/0.05). The 95% confidence interval of NNT indicates that 19 out of 20 times the 'true' value will be in the specified range.

[end of quote]

Applying the information given in the above paragraph to our heart attack / aspirin data, we have an ARR of .00736 (called a risk difference in the openEpi output and provided in %, and also called a risk difference in the epibasix package in R, but not as a percentage this time). Luckily both applications provide confidence intervals for the value. Taking the confidence interval from the openEpi output and converting it from percentages gives .002993 to .01174, so the lower and upper bounds for the NNT are 1/.01174 = 85.2 and 1/.002993 = 334.1. We also know the NNT estimate for our sample, 1/.00736566 = 135.7652. So we would report this as NNT=135.8 (95% CI 85.2 to 334.1), a very wide range.
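The whole calculation can be sketched in base R using the method quoted from Poulos & Kam. This assumes the simple (Wald) standard error for a difference between two proportions, which may differ slightly from the method a particular package uses, and it only makes sense when the confidence interval for the ARR does not include zero. The variable names are my own.

```r
# Heart attack counts for the never smoked group
events_placebo <- 96; n_placebo <- 5488
events_aspirin <- 55; n_aspirin <- 5431

p_placebo <- events_placebo / n_placebo
p_aspirin <- events_aspirin / n_aspirin

# Absolute risk reduction and its (assumed Wald) standard error
arr    <- p_placebo - p_aspirin
se_arr <- sqrt(p_placebo * (1 - p_placebo) / n_placebo +
               p_aspirin * (1 - p_aspirin) / n_aspirin)

# 95% confidence interval for the ARR: ARR +/- 1.96 x SE(ARR)
arr_ci <- arr + c(-1, 1) * 1.96 * se_arr

# NNT is the reciprocal of the ARR; taking reciprocals of the ARR
# limits swaps their order, so reverse them to get lower then upper
nnt    <- 1 / arr
nnt_ci <- rev(1 / arr_ci)

round(c(ARR = arr, ARR_lower = arr_ci[1], ARR_upper = arr_ci[2]), 5)
round(c(NNT = nnt, NNT_lower = nnt_ci[1], NNT_upper = nnt_ci[2]), 1)
```

Because the NNT is a reciprocal, a small absolute uncertainty in the ARR turns into a very wide, asymmetric interval for the NNT.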

Treatment may not always improve the patient's outcome; sometimes it is better to just do nothing! Looking at the expression for NNT, 1/|pplacebo-ptreatment|, if we remove the '||' absolute value constraint we end up with a negative value whenever the treatment group has a higher risk than the placebo group. When this happens the value is called the Number Needed to Harm (NNH). This is often considered in the context of epidemiological studies when we have exposure to some type of harm, where the difference is (pexposure-pnon-exposure). Campbell & Swinscow (p.29) provide an example, and Kowalska, Mocroft et al 2010 provide an alternative example from the academic literature.



8. Exercise

1. Self reported genitourinary infections during the month before conception to the end of the first trimester, for mothers of offspring born with gastroschisis (cases) and mothers of healthy live born babies (controls). Taken from Peter Sedgwick's excellent statistical questions BMJ series (BMJ 2012;344:e2853).

Calculate the various measures described in this chapter.

2. The following exercise was kindly suggested by Dr Alan Worsely, professor of Pharmacy, Hong Kong University, demonstrating his propensity, like mine, for using historical articles to demonstrate important concepts.

One of the first RCTs was conducted in the 1940s, investigating the use of Streptomycin in tuberculosis. A link to the report is provided below. Using the tables presented in the article, calculate the various risks and odds values discussed in this chapter. Does this approach add anything more to the original presentation of the results?

Article freely available at:

www.jameslindlibrary.org/trial_records/20th_Century/1940s/MRC_bmj/MRC_bmj_kp.html

Higher quality version of the article available at:

http://www.bmj.com/content/2/4582/769.full.pdf [use your university login if you do not wish to register]

9. Summary

This chapter has considered a range of effect size measures when analysing two binary variables, a common situation in medicine. The usefulness of odds compared to risks and probability was discussed throughout the chapter.

To investigate this topic further I would suggest that you read the relevant chapter of Understanding Clinical Papers (Bowers, House and Owens 2006) or the article by Sistrom & Garvan 2004.

10. References

Agresti A, Finlay B, 2009 (4th ed) Statistical Methods for the Social Sciences, Prentice Hall

Bland M 2000 (3rd ed) An introduction to Medical statistics. Oxford University Press.

Bowers D, House A and Owens D. 2006 (2nd ed) Understanding Clinical papers, ISBN 13-978-0-470091302

Campbell M J, Swinscow T D V 2009 (11th ed) Statistics at square one. Wiley-Blackwell and BMJ books

Crawley M J 2005 Statistics: An introduction using R. Wiley

Daniel W W 2006 (8th ed) Biostatistics: A foundation for analysis in the health sciences. Wiley

Field A 2009 Discovering Statistics Using SPSS. Sage

Firooz A, 2007 Reporting of number needed to treat and its difficulties. [letter] J Am Acad Dermatol 57 (4) 729-730.

Howell D C 2006 (6th ed) Statistical Methods for Psychology. Thomson Wadsworth

Data for Exercise 1 in section 8:

Genitourinary infection   Cases   Controls
Yes                          81       425
No                          424      4499
Total                       505      4924




Kowalska J D, Kirk O, Mocroft A, Høj L, Friis-Møller, Reiss P, Weller I, Lundgren J D. 2010 Implementing the number needed to harm in clinical practice: risk of myocardial infarction in HIV-1-infected patients treated with abacavir. HIV Medicine, 11, 200–208.

Langholz B 2010 Case-control studies=odds ratios, blame the retrospective model. Epidemiology 21 [January] 10-12 Open access at: http://journals.lww.com/epidem/Fulltext/2010/01000/Case_Control_Studies___Odds_Ratios__Blame_the.3.aspx

Manrique J J, Villouta M F, Williams H C, 2007 Evidence-based dermatology: number needed to treat and its relation to other risk measures. J Am Acad Dermatol. [Apr] 56(4):664-71.

Norman G R, Streiner D L. 2008 (3rd ed) Biostatistics: The bare Essentials.

Poulos J, Kam P C A, 2005 Number needed to treat: A tool for summarizing treatment effect, and its application in anaesthesia and pain management. Current Anaesthesia & Critical Care 16, 173–179

Sistrom C L, Garvan C W 2004 Proportions, odds, and risk. Radiology. 230 12-19

Walter S D. 2001 Number needed to treat (NNT): estimation of a measure of clinical benefit. Statistics in medicine. 20:3947–3962

Advanced reading:

Agresti A 2002 (2nd ed) Categorical Data Analysis. Wiley

Fleiss J L, Levin B, Paik M C 2003 (3rd ed) Statistical Methods for Rates and Proportions. New York: Wiley.

Answer to exercise

i Odds in favour = 6/(36-6); odds against = (36-6)/6. Probability of obtaining 7 = 6/36; probability of not obtaining 7 = (36-6)/36.
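These values can be confirmed by listing every outcome, assuming the exercise concerns the total shown by two fair dice (which is what the 6 out of 36 implies). A base R sketch, with my own variable names:

```r
# All 36 equally likely outcomes of throwing two fair dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)

n_total <- nrow(rolls)                          # number of outcomes
n_seven <- sum(rolls$die1 + rolls$die2 == 7)    # outcomes totalling 7

# Probability, and odds in both directions
p_seven        <- n_seven / n_total
odds_in_favour <- n_seven / (n_total - n_seven)
odds_against   <- (n_total - n_seven) / n_seven

c(n_total = n_total, n_seven = n_seven)
c(p_seven = p_seven, odds_in_favour = odds_in_favour, odds_against = odds_against)
```

The odds in favour (6 to 30, or 1 to 5) and the probability (6 in 36, or 1 in 6) describe the same event, which is the distinction between odds and proportions that runs through this chapter.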
