
Bread and butter statistics:
RCGP Curriculum Statement 3.5: Evidence-Based Practice
VTS 17/10/07


Topics for today

Audit – definition
Research – definition
Bias
Blinding
Confidence intervals
Forest plot
L'Abbé plot
Hypothesis
Null hypothesis
Incidence
Prevalence
Normal distribution
Parameter
Statistic
Variable
P-value
Number needed to treat
Number needed to harm
Odds ratio
Statistical power
Sensitivity
Positive predictive value
Specificity
Reliability
Validity


Topics for today - 1

Audit – definition
Research – definition
Bias
Blinding
Confidence intervals


Topics for today - 2

Forest plot
L'Abbé plot
Hypothesis
Null hypothesis
Incidence
Prevalence


Topics for today - 3

Normal distribution
Parameter
Statistic
Variable
P-value
Number needed to treat
Number needed to harm


Topics for today - 4

Odds ratio
Statistical power
Sensitivity
Positive predictive value
Specificity
Reliability
Validity


Useful Websites

http://www.jr2.ox.ac.uk/bandolier/

http://www.cebm.net

http://www.cas.lancs.ac.uk/glossary_v1.1/Alphabet.html


Audit – definition

Clinical audit is a quality improvement process. It seeks to improve patient care and outcomes through systematic review of care against explicit criteria and the implementation of change.

Aspects of the structure, processes and outcomes of care are selected and systematically evaluated against explicit criteria.

Where indicated, changes are implemented at an individual, team or service level, and further monitoring is used to confirm improvement in healthcare delivery.

NICE


The Audit cycle

Identify the need for change – problems can be identified in 3 areas: structure, process, outcome
Set criteria and standards – what should be happening
Collect data on performance
Assess performance against criteria and standards
Identify changes needed


The Audit cycle (diagram)


Research – definition

Research is an ORGANISED and SYSTEMATIC way of FINDING ANSWERS to QUESTIONS.

SYSTEMATIC – certain things in the research process are always done in order to get the most accurate results.

ORGANISED – there is a structure or method in doing research. It is a planned procedure, not a spontaneous one. It is focused and limited to a specific scope.

FINDING ANSWERS is the aim of all research. Whether it is the answer to a hypothesis or even a simple question, research is successful when answers are found, even if the answer is negative.

QUESTIONS are central to research. If there is no question, then the answer is of no use. Research is focused on relevant, useful and important questions. Without a question, research has no purpose.


Bias

Dictionary definition: 'a one-sided inclination of the mind'. It defines a systematic tendency of certain trial designs to produce results consistently better or worse than other designs.

In studies of the effects of health care, bias can arise from:
systematic differences in the groups that are compared (selection bias)
the care that is provided, or exposure to other factors apart from the intervention of interest (performance bias)
withdrawals or exclusions of people entered into the study (attrition bias)
how outcomes are assessed (detection bias)

This use of bias does not necessarily imply any prejudice, such as the investigators' desire for particular results; this differs from the conventional use of the word, in which bias refers to a partisan point of view.


Blinding

Participants, investigators and/or assessors remain ignorant of the treatments that participants are receiving. The aim is to minimise observer bias, in which the assessor (the person making a measurement) has a prior interest or belief that one treatment is better than another, and therefore scores one better than the other just because of that belief.

In a single-blind study it may be the participants who are blind to their allocations, or those who are making the measurements of interest (the assessors).

In a double-blind study, at a minimum both participants and assessors are blind to their allocations.

In some circumstances much more complicated designs can be used, where blinding is described at different levels.

To achieve a double-blind state, it is usual to use matching treatment and control treatments. For instance, the tablets can be made to look the same, or if one treatment uses a single pill once a day but the other uses three pills at various times, all patients will have to take pills throughout the day to maintain blinding.

If treatments are radically different (tablets compared with injection), a double-dummy technique may be used, where all patients receive both an injection and a tablet, in order to maintain blinding.

Lack of blinding is a potent source of bias, and open or single-blind studies are potential problems for interpreting the results of trials.

Concealment of allocation

The process used to prevent foreknowledge of group assignment in a randomised controlled trial, which should be seen as distinct from blinding. The allocation process should be impervious to any influence by the individual making the allocation, and should be administered by someone who is not responsible for recruiting participants: for example, a hospital pharmacy or a central office. Methods of assignment such as date of birth or case record number (see quasi-random allocation) are open to manipulation.

Adequate methods of allocation concealment include: centralised randomisation schemes; randomisation schemes controlled by a pharmacy; numbered or coded containers in which capsules from identical-looking, numbered bottles are administered sequentially; on-site computer systems, where allocations are kept in a locked unreadable file; and sequentially numbered, opaque, sealed envelopes.


Confidence intervals

Quantifies the uncertainty in a measurement. It is usually reported as the 95% CI, which is the range of values within which we can be 95% sure that the true value for the whole population lies. For example, for an NNT of 10 with a 95% CI of 5 to 15, we would have 95% confidence that the true NNT value was between 5 and 15.
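The NNT example above can be reproduced with a short calculation. The sketch below is a minimal illustration rather than anything from the presentation: the event rates and group sizes are invented, the 95% confidence interval for the absolute risk reduction uses the usual normal approximation, and the interval for the NNT comes from inverting those limits.

```python
# Minimal sketch (illustrative numbers, not from the slides): 95% CI for an
# absolute risk reduction, inverted to give a confidence interval for the NNT.
from math import sqrt

cer, n_control = 0.20, 400    # hypothetical control event rate and group size
eer, n_treated = 0.10, 400    # hypothetical experimental event rate and group size

arr = cer - eer                                   # absolute risk reduction
se = sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
low, high = arr - 1.96 * se, arr + 1.96 * se      # 95% CI for the ARR

nnt = 1 / arr
# The CI for the NNT comes from inverting the ARR limits (the order swaps).
nnt_low, nnt_high = 1 / high, 1 / low

print(f"ARR = {arr:.3f} (95% CI {low:.3f} to {high:.3f})")
print(f"NNT = {nnt:.1f} (95% CI {nnt_low:.1f} to {nnt_high:.1f})")
```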


Confidence intervals

A confidence interval calculated for a measure of treatment effect shows a range within which the true treatment effect is likely to lie.

Confidence intervals are preferable to p-values, as they tell us the range of possible effect sizes compatible with the data.

A confidence interval that embraces the value of no difference indicates that the treatment under investigation is not significantly different from the control.

Confidence intervals aid interpretation of clinical trial data by putting upper and lower bounds on the likely size of any true effect.

Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies.

Non-significance does not mean ‘no effect’. Small studies will often report non-significance even when there are important, real effects.

Statistical significance does not necessarily mean that the effect is real: by chance alone about one in 20 significant findings will be spurious.

Statistical significance does not necessarily mean clinically important. It is the size of the effect that determines the importance, not the presence of statistical significance.


Forest plot

In a typical forest plot, the results of the component studies are shown as squares centred on the point estimate of the result of each study. A horizontal line runs through each square to show its confidence interval (usually, but not always, a 95% confidence interval). The overall estimate from the meta-analysis and its confidence interval are put at the bottom, represented as a diamond. The centre of the diamond represents the pooled point estimate, and its horizontal tips represent the confidence interval. Significance is achieved at the set level if the diamond is clear of the line of no effect.

The plot allows readers to see the information from the individual studies that went into the meta-analysis at a glance. It provides a simple visual representation of the amount of variation between the results of the studies, as well as an estimate of the overall result of all the studies together.
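As a rough illustration of the layout described above, the following matplotlib sketch draws a forest plot. The study names, odds ratios and pooled estimate are all invented for illustration; this is not the beta-blocker meta-analysis shown on the next slide.

```python
# Minimal forest plot sketch with made-up study results (hypothetical numbers).
import matplotlib.pyplot as plt
import numpy as np

studies = ["Trial A", "Trial B", "Trial C", "Trial D"]   # hypothetical studies
or_point = np.array([0.85, 0.70, 1.05, 0.78])            # point estimates (odds ratios)
ci_low   = np.array([0.60, 0.50, 0.75, 0.55])            # lower 95% CI limits
ci_high  = np.array([1.20, 0.98, 1.45, 1.10])            # upper 95% CI limits
pooled, pooled_low, pooled_high = 0.82, 0.70, 0.96       # invented pooled estimate

fig, ax = plt.subplots()
y = np.arange(len(studies), 0, -1)                       # one row per study

# Squares centred on each point estimate, horizontal lines for the 95% CIs
ax.hlines(y, ci_low, ci_high, color="black")
ax.plot(or_point, y, "s", color="black")

# Diamond for the pooled estimate: tips at the CI limits, centre at the estimate
ax.fill([pooled_low, pooled, pooled_high, pooled],
        [0, 0.2, 0, -0.2], color="black")

ax.axvline(1.0, linestyle="--", color="grey")            # line of no effect
ax.set_xscale("log")
ax.set_yticks(list(y) + [0])
ax.set_yticklabels(studies + ["Pooled"])
ax.set_xlabel("Odds ratio (log scale)")
plt.show()
```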


Meta-analysis of effect of beta blockers on mortality after myocardial infarction

Lewis and Ellis, 1982


In the modern format (figure)


L'Abbé plot

A first stage in any review is to look at a simple scatter plot, which can yield a surprisingly comprehensive qualitative view of the data. Even if the review does not show the data in this way, you can do it from information on individual trials presented in the review tables.

Trials in which the experimental treatment proves better than the control (EER > CER) will be in the upper left of the plot, between the y axis and the line of equality (Figure 1). If experimental is no better than control then the point will fall on the line of equality (EER = CER), and if control is better than experimental then the point will be in the lower right of the plot, between the x axis and the line of equality (EER < CER).
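A L'Abbé plot of this kind takes only a few lines to draw. The sketch below uses invented control and experimental event rates for a handful of hypothetical trials, with the line of equality marking EER = CER.

```python
# Minimal L'Abbé plot sketch with made-up trial data (not from any review):
# experimental event rate (EER) on the y axis, control event rate (CER) on the x axis.
import matplotlib.pyplot as plt

cer = [0.20, 0.35, 0.10, 0.50, 0.30]     # hypothetical control event rates
eer = [0.45, 0.55, 0.30, 0.60, 0.25]     # hypothetical experimental event rates
sizes = [40, 120, 60, 200, 80]           # symbol area scaled to trial size

fig, ax = plt.subplots()
ax.scatter(cer, eer, s=sizes)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")   # line of equality (EER = CER)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_xlabel("Control event rate (CER)")
ax.set_ylabel("Experimental event rate (EER)")
ax.set_title("L'Abbé plot: points above the line favour the experimental treatment")
plt.show()
```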


L'Abbé plot

Visual inspection gives a quick and easy indication of the level of agreement among trials. Heterogeneity is often assumed to be due to variation in the experimental and control event rates, but that variation is often due to the small size of trials.

L'Abbé plots are becoming widely used, probably because people can understand them. They have several benefits: the simple visual presentation is easy to assimilate; they make us think about the reasons why there can be such wide variation in (especially) placebo responses, and about other factors in the overall package of care that can contribute to effectiveness; they explain the need for placebo controls if ethical issues about future trials arise; and they keep us sceptical about overly good or bad results for an intervention in a single trial, where the major influence may be how good or bad the response with placebo was.

Ideally a L'Abbé plot should have symbols proportional to the size of the trials. In Figure 2 there is an inset for the symbol size, and the two colours show trazodone used for erectile dysfunction in two different conditions (with clear clinical heterogeneity, Bandolier 116).

Figure 2: Trazodone for erectile dysfunction in psychogenic erectile dysfunction (dark symbols) and with physiological or mixed aetiology (light symbols)


Hypothesis

A tentative supposition with regard to an unknown state of affairs, the truth of which is thereupon subject to investigation by any available method, either by logical deduction of consequences which may be checked against what is known, or by direct experimental investigation or discovery of facts not hitherto known and suggested by the hypothesis.

A proposition put forward as a supposition rather than asserted. A hypothesis may be put forward for testing or for discussion, possibly as a prelude to acceptance or rejection.

“It is a truth universally acknowledged that a man in possession of a good fortune must be in search of a good wife.”


Null hypothesis

The statistical hypothesis that one variable (e.g. whether or not a study participant was allocated to receive an intervention) has no association with another variable or set of variables (e.g. whether or not a study participant died), or that two or more population distributions do not differ from one another. In simplest terms, the null hypothesis states that the results observed in a study are no different from what might have occurred as a result of the play of chance.


Incidence

The proportion of new cases of the target disorder in the population at risk during a specified time interval. It is usual to define the disorder, and the population, and the time, and report the incidence as a rate.

For some examples of incidence studies, two of the best relate to how Parkinson's disease incidence varies with latitude, and how Perthes' disease (a developmental problem of the hip joint affecting younger children) varies with deprivation.


Prevalence

This is a measure of the proportion of people in a population who have a disease at a point in time, or over some period of time. There are several examples of prevalence worth looking at:

Geographic variation in multiple sclerosis prevalence

Prevalence of Atrial Fibrillation
COPD prevalence
Prevalence of schizophrenic disorders
Body piercing - prevalence and risks
Prevalence of migraine
Prevalence and incidence of gout


Normal distribution

Normal distributions are a family of distributions that have the same general shape. They are symmetric with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped.

The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).

All normal density curves satisfy the following property, which is often referred to as the Empirical Rule:

68% of the observations fall within 1 standard deviation of the mean, that is, between μ − σ and μ + σ.

95% of the observations fall within 2 standard deviations of the mean, that is, between μ − 2σ and μ + 2σ.

99.7% of the observations fall within 3 standard deviations of the mean, that is, between μ − 3σ and μ + 3σ.

Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean.
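The 68/95/99.7 figures can be checked directly from the standard normal cumulative distribution function; the snippet below is a quick check using scipy, not part of the original slides.

```python
# Quick numerical check of the Empirical Rule using the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)   # P(mu - k*sigma < X < mu + k*sigma)
    print(f"within {k} SD of the mean: {coverage:.3%}")
# Prints roughly 68.3%, 95.4% and 99.7%.
```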


Parameter

A parameter is a number computed from a population. Contrast this with the definition of a statistic. A parameter is a constant, unchanging value; there is no random variation in a parameter. If the size of the population is large (as is typically the case), then you may find that a parameter is difficult or even impossible to compute. An example of a parameter would be the average length of stay in the birth hospital for all infants born in the United States.


Statistic

A statistic is a number computed from a sample. Contrast this with the definition of a parameter. If a statistic is computed from a random sample (as is typically the case), then it has random variation or sampling error. An example of a statistic would be the average length of stay in the birth hospital for a random sample of 387 infants born in Johnson County, Kansas.


Variable

A measurement that can vary within a study, e.g. the age of participants.

Variability is present when differences can be seen between different people or within the same person over time, with respect to any characteristic or feature that can be assessed or measured.


P-value

The probability (ranging from zero to one) that the results observed in a study (or results more extreme) could have occurred by chance. Convention is that we accept a p-value of 0.05 or below as being statistically significant. That means a chance of 1 in 20, which is not very unlikely. This convention has no solid basis, other than being the number chosen many years ago. When many comparisons are being made, statistical significance can occur just by chance. A more stringent rule is to use a p-value of 0.01 (1 in 100) or below as statistically significant, though some folk get hot under the collar when you do it.
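To make the null hypothesis and the p-value concrete, the sketch below runs a chi-squared test on an invented 2×2 table of trial counts using scipy. Under the null hypothesis that the event rate is the same in both groups, the p-value is the probability of seeing a difference at least as large as the one observed.

```python
# Hypothetical 2x2 trial result: events / no-events in treated and control groups.
# The p-value tests the null hypothesis that the event rate is the same in both.
from scipy.stats import chi2_contingency

table = [[18, 82],    # treated: 18 events out of 100 (made-up numbers)
         [30, 70]]    # control: 30 events out of 100 (made-up numbers)

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p = {p_value:.3f}")   # p of 0.05 or below would conventionally be called 'significant'
```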


Number needed to treat

The inverse of the absolute risk reduction and the number of patients that need to be treated for one to benefit compared with a control

The ideal NNT is 1, where everyone has improved with treatment and no-one has with control. The higher the NNT, the less effective is the treatment

The value of an NNT is not just numeric - NNTs of 2-5 are indicative of effective therapies, like analgesics for acute pain

NNTs of about 1 might be seen in treating sensitive bacterial infections with antibiotics, while an NNT of 40 or more might be useful e.g. when using aspirin after a heart attack


Calculating NNTs

NNT = 1/ARR

ARR = (CER – EER), where CER = control group event rate and EER = experimental group event rate

Sample calculation: The results of the Diabetes Control and Complications Trial into the effect of intensive diabetes therapy on the development and progression of neuropathy indicated that neuropathy occurred in 9.6% of patients randomised to usual care and 2.8% of patients randomised to intensive therapy. The NNT with intensive diabetes therapy to prevent one additional occurrence of neuropathy can be determined by calculating the absolute risk reduction as follows:

ARR = (CER – EER) = (9.6% – 2.8%) = 6.8%
NNT = 1/ARR = 1/0.068 = 14.7, or 15

Therefore we need to treat 15 diabetic patients with intensive therapy to prevent one from developing neuropathy.
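The same arithmetic is easy to wrap in code. The sketch below simply implements the NNT = 1/ARR formula from this slide and applies it to the DCCT event rates quoted above.

```python
# NNT from control and experimental event rates: NNT = 1 / (CER - EER).
import math

def nnt(cer: float, eer: float) -> float:
    """Number needed to treat, given control (CER) and experimental (EER) event rates."""
    arr = cer - eer                  # absolute risk reduction
    return 1 / arr

# DCCT example from the slide: neuropathy in 9.6% (usual care) vs 2.8% (intensive therapy)
value = nnt(0.096, 0.028)
print(f"NNT = {value:.1f}, round up to {math.ceil(value)}")   # ~14.7 -> 15
```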


Number needed to treat – examples

Response to antibiotics of women with symptoms of UTI but negative dipstick urine test results: double blind RCT (Richards et al, BMJ 2005;331:143-6). Reduce duration of symptoms by 2 days? NNT = 4

Antibiotic prescribing in general practice and hospital admissions for peritonsillar abscess, mastoiditis and rheumatic fever in children: time trend analysis (Sharland et al, BMJ 2005;331:328-9). Prevent one case of mastoiditis? NNT = at least 2500

Trigeminal neuralgia treated with anticonvulsants. To obtain 50% pain relief? NNT = 2.5

Arthritis treated with glucosamine for 3-8 weeks compared with placebo. To improve symptoms? NNT = 5

MRC trial of treatment of mild hypertension: principal results (BMJ 1985;291:97-104). 17,354 individuals aged 36-64 years with diastolic BP 90-109 mmHg treated with bendrofluazide or propranolol for 5.5 years compared with placebo. Primary prevention of one stroke at one year? NNT = 850


Number needed to harm

This is calculated in the same way as for NNT, but used to describe adverse events. For NNH, large numbers are good, because they mean that adverse events are rare. Small values for NNH are bad, because they mean adverse events are common.

An example of how NNH values can be calculated along with NNT is that of inhaled corticosteroids used for asthma, where increasing the dose made a small improvement in efficacy but a large worsening for dysphonia and oral candidiasis.


Odds ratio

The ratio of the odds of having the target disorder in the experimental group relative to the odds in favour of having the target disorder in the control group (in cohort studies or systematic reviews) or the odds in favour of being exposed in subjects with the target disorder divided by the odds in favour of being exposed in control subjects (without the target disorder).
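As a small worked illustration (with invented counts, not data from any study), the odds ratio can be computed directly from a 2×2 table of outcome counts:

```python
# Odds ratio from a hypothetical 2x2 table.
#                 disorder   no disorder
# experimental        a            b
# control             c            d
a, b = 20, 80     # made-up counts: experimental group
c, d = 40, 60     # made-up counts: control group

odds_experimental = a / b          # odds of the disorder in the experimental group
odds_control = c / d               # odds of the disorder in the control group
odds_ratio = odds_experimental / odds_control

print(f"OR = {odds_ratio:.2f}")    # (20/80) / (40/60) = 0.375
```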


Statistical power

The ability of a study to demonstrate an association or causal relationship between two variables, given that an association exists. For example, 80% power in a clinical trial means that the study has an 80% chance of ending up with a p-value of less than 5% in a statistical test (i.e. a statistically significant treatment effect) if there really was an important difference (e.g. 10% versus 5% mortality) between treatments. If the statistical power of a study is low, the study results will be questionable (the study might have been too small to detect any differences). By convention, 80% is an acceptable level of power.
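The 10% versus 5% mortality example can be turned into a rough sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two independent proportions, with 80% power and a two-sided 5% significance level; it is an illustration of the idea, not a calculation taken from the slides.

```python
# Approximate sample size per group to detect p1 vs p2 with given power and alpha,
# using the usual normal-approximation formula for two independent proportions.
from math import sqrt, ceil
from scipy.stats import norm

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = norm.ppf(1 - alpha / 2)        # e.g. 1.96 for a two-sided 5% level
    z_beta = norm.ppf(power)                 # e.g. 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Slide example: 10% vs 5% mortality, 80% power, two-sided 5% alpha
print(n_per_group(0.10, 0.05))   # roughly 430-440 patients per group
```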


Sensitivity

Proportion of people with the target disorder who have a positive test. It is used to assist in assessing and selecting a diagnostic test/sign/symptom.

A seNsitive test keeps false-Negatives down – a 100% sensitive test picks up everyone with the condition, so anyone with a negative test does not have it.

SnNout: when a sign/test/symptom has a high Sensitivity, a Negative result rules out the diagnosis. For example, the sensitivity of a history of ankle swelling for diagnosing ascites is 93%; therefore if a person does not have a history of ankle swelling, it is highly unlikely that the person has ascites.


Positive predictive value

Proportion of people with a positive test who have the target disorder

Not the same as sensitivity: sensitivity is the proportion of people with the disorder who test positive, whereas the positive predictive value is the proportion of positive tests that are correct, and it falls as the disorder becomes less common in the population tested.


Specificity

Proportion of people without the target disorder who have a negative test. It is used to assist in assessing and selecting a diagnostic test/sign/symptom.

A sPecific test keeps false-Positives down – a 100% specific test is negative in everyone without the condition, so anyone with a positive test has it.

SpPin: when a sign/test/symptom has a high Specificity, a Positive result rules in the diagnosis. For example, the specificity of a fluid wave for diagnosing ascites is 92%; therefore if a person does have a fluid wave, it rules in the diagnosis of ascites.
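Sensitivity, specificity and positive predictive value can all be read off a single 2×2 table. The sketch below uses invented counts; note that the PPV depends on how common the disorder is in the population tested, which is why it is not the same as sensitivity.

```python
# Sensitivity, specificity and PPV from a hypothetical 2x2 diagnostic table.
#                 disorder present   disorder absent
# test positive          tp                 fp
# test negative          fn                 tn
tp, fp = 90, 50      # made-up counts
fn, tn = 10, 850     # made-up counts

sensitivity = tp / (tp + fn)   # proportion with the disorder who test positive
specificity = tn / (tn + fp)   # proportion without the disorder who test negative
ppv = tp / (tp + fp)           # proportion of positive tests that are true positives

print(f"sensitivity = {sensitivity:.2f}")   # 0.90
print(f"specificity = {specificity:.2f}")   # ~0.94
print(f"PPV         = {ppv:.2f}")           # ~0.64: depends on how common the disorder is
```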


Reliability

Reproducibility
Stability over time and place
Ease of replication
Observer variation
Confirmation of results


Validity

Validity is a difficult concept in clinical trials, but it refers to a trial being able to measure what it sets out to measure. A trial that set out to measure the analgesic effect of a procedure might be in trouble if the patients had no pain. Or, in a condition where treatment is life-long, evaluating an intervention for 10 minutes might be seen as silly. Assessing validity is not always easy, but a good worked example is that on acupuncture and stroke.