Biostatistics – A Tool for Improving Clinical Decisions

Apollo Medicine, Vol. 2, No. 2, June 2005

Quality in Health

Umesh Gupta*, Hema Lal**, Pritindira*** and Anupam Sibal#

From the: Head of Quality Improvement and Decision Support Services, Consultant Vascular Surgery*, Resident, DICE**, Executive DICE***, Director Medical Services & Senior Consultant Pediatric Gastroenterology#, Indraprastha Apollo Hospitals, New Delhi 110 044, India.

Correspondence to: Dr. Umesh Gupta, Head of Quality Improvement and Decision Support Services, Indraprastha Apollo Hospitals, Mathura Road, New Delhi 110 044, India.

The article discusses how the application of the theories of measurement, statistical inference, and decision trees contributes to better clinical decisions and ultimately to improved patient care and outcomes. It aims to discuss the tools and procedures that are used in statistics.

INTRODUCTION

Medicine is an evolving art full of unknowns; however, patients appreciate and occasionally demand the reassurance of solid answers and scientific explanations. The mathematical discipline of statistics, when employed judiciously, helps quantify the uncertainty inherent in the imperfect profession of medicine.

Strong designs and compelling results do not require complex statistical manipulation to be influential. In fact, physicians should remain skeptical when a study employs obscure analyses and multiple layers of stratification. Excessive analytical complexity should be avoided.

Active controversy always will exist in medical literature. Therefore, a rudimentary understanding of statistical theory is essential for physicians who contribute to ongoing debates.

Statistical evidence and patient communication

During the continuous search for better treatments and diagnostic tests, clinicians repeatedly confront the question of how much evidence is enough. Despite pressure from patients, most doctors wisely avoid such terminology as truth, absolute certainty, and 100% guarantee, even when they wish to convey that no further investigation is warranted.

For example, hypochondriac patient X is certain that his headache is caused by a brain tumor. All physicians would like to say, “Mr. X, you do not have a brain tumor”. Unfortunately, when pressed, the prudent clinician must acknowledge that even the enhanced MRI performed last week has a small, but finite, false-negative rate. Although such language rarely appeases hypersensitive somatizers, clinicians must aim to use honest terminology. Unfortunately, communicating the inherent imperfection of medicine reduces the impact of your intervention. Patients will realize that you are qualifying your advice, and this reduces “adjunct placebo benefit”, but ethics require you to use language such as “the evidence supports...” or “our test is unable to detect....” Managing this frustrating situation requires excellent communication skills, a strong knowledge of the medical literature, professional integrity, and patience.

REVIEWING THE LITERATURE AND CRITIQUING STUDIES

When addressing a patient’s emergency problems in the real world, one or more online resources may be accessed if time permits. Familiarity with the medical databases in advance increases effectiveness when time is limited. Examples of available medical databases are MEDLINE, eMedicine.com, and the Cochrane Library.

Maximizing efficiency for article review

1. When a list of potentially relevant articles has been reduced to manageable proportions, scan the titles to further narrow the search. As the list shrinks, proceed by reading 3 items in every candidate article: the title, the abstract, and the methods section.

2. Keep an author’s conclusions in mind while critically examining the methods section. Methods analysis is the most important step, and, in many cases, it is the last step. Understanding the methods assists in determining the strength and the ability to generalize the conclusions. If the methods are inadequate, proceed to the next article.

3. Ask the fundamental question, “Can these methods provide results that justify the conclusions?” Keep the hypothesis and the conclusions in mind while reviewing the methods.

4. Methods analysis can be divided into 3 steps. First, analyze the design of the study. Then, characterize the study population and review inclusion and exclusion criteria. Finally, seek a basic understanding of the statistical techniques used to summarize and digest the endpoints (i.e., raw data) that eventually constitute the results section.

5. Ask, “Is this a cross-sectional study or a longitudinal study?” A cross-sectional study implies a snapshot in time. A longitudinal study, synonymous with a cohort study, implies that a group was monitored over time. Either type of study can be observational, but only a longitudinal study can be truly interventional.

Determining study design

Observations from cross-sectional studies yield prevalence data, and observations from longitudinal studies yield incidence data. If an author is trying to prove a causal relationship, an interventional longitudinal design is required.
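As a rough illustration of the distinction, the sketch below computes a prevalence from a cross-sectional count and a cumulative incidence from longitudinal follow-up. The function names and all counts are hypothetical and were chosen only for illustration.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of a population that has the disease at one point in time."""
    return existing_cases / population


def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of initially disease-free people who develop disease during follow-up."""
    return new_cases / at_risk_at_start


# Hypothetical cross-sectional survey: 120 of 2,000 people have the disease today.
point_prevalence = prevalence(120, 2000)

# Hypothetical cohort: 1,880 disease-free people followed for one year, 47 new cases.
one_year_incidence = cumulative_incidence(47, 1880)

print(f"Prevalence: {point_prevalence:.1%}")
print(f"One-year cumulative incidence: {one_year_incidence:.1%}")
```

Prevalence counts everyone who has the disease at the snapshot, whereas incidence counts only new cases among people who were disease free when observation began.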

A longitudinal interventional study is, by definition, a clinical trial. These studies can be randomized or nonrandomized, controlled or uncontrolled. Results generated from clinical trials should receive the highest weight and carry the greatest influence of all investigations.

Types of study

Observational studies

Much clinical evidence comes from observations, and these types of data are occasionally dismissed because they stem from less rigorous testing. In many situations, there are no other ethical methods for gathering information. For example, studying toxic exposures (e.g., cigarette smoking) via a clinical trial would be unethical, so most data are derived from observation and analogy.

The two major observational designs are cohort design and case-control design.

(a) Cohort studies

The observational study design most similar to a clinical trial is the cohort study. Researchers identify eligible patients at the beginning of the study and then passively observe who is exposed and who remains free of exposure. Each cohort of patients is observed over time, and investigators determine which patients become diseased or disabled and which patients remain free of disease.

(b) Case-control studies

A more efficient, but less rigorous, observational design is a case-control study. In this technique, participants are enrolled on the basis of presence or absence of disease. This study contrasts with a cohort study in which disease status is unknown during the enrollment phase. These case-control subsets (diseased and free of disease) are examined retrospectively for history of exposure to various risk factors.

INFERENTIAL STATISTICS: EXTRAPOLATION FROM SAMPLES TO POPULATIONS

Populations and samples

Populations are the people from which the sample is drawn. Samples are the subsets from which data are gathered. Summary data about samples yield statistics. In turn, investigators make inferences from the sample and then generalize the inferences to the population. Unknowable and theoretical numerical facts about populations are called parameters.

Interval and continuous data

Interval data (e.g., pulse rate, leukocyte count) have fixed intervals between data points. Numerical fractions are not reported for interval data.

If a theoretical limit to how small the interval can be is not set, the data are classified as continuous data (e.g., blood pressure, temperature). With continuous data, the precision of the measurements is limited only by the resolution of the measuring device.

Central tendency

All biologic data have certain random fluctuation; therefore, clinicians require reasonable reassurance that differences noted and effects observed are greater than differences explained by chance alone.

Several measures of central tendency, including the mean, the median, and the mode, commonly are used.
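The three measures can be compared on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the pulse rates are invented for illustration and include one deliberately high value.

```python
import statistics

# Hypothetical pulse rates (beats per minute) from ten patients.
pulse_rates = [62, 68, 70, 70, 72, 74, 76, 80, 88, 120]

mean_rate = statistics.mean(pulse_rates)      # arithmetic average
median_rate = statistics.median(pulse_rates)  # middle value of the sorted data
mode_rate = statistics.mode(pulse_rates)      # most frequently occurring value

print(f"mean={mean_rate}, median={median_rate}, mode={mode_rate}")
```

In this made-up sample the single reading of 120 pulls the mean (78) above the median (73), which is one reason the median is often preferred for skewed data.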

Measures of variability for continuous data

Measures of the spread or fluctuation of data points around the mean are measures of variability; they are important components of the results section and must be acknowledged when extrapolating sample results to populations. These are:

Variance is an average squared deviation for all data points, whereas deviation is defined as the difference between a given data point and the sample mean.

Standard deviation is the square root of variance.

Standard error is an inferential statistics term that estimates the variability of sample means if they could be calculated repeatedly on multiple different samples drawn from the same population.

Terms such as decile and percentile identify specified levels above or below a known fraction of data points.
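These definitions can be sketched with Python's standard `statistics` module. The blood pressure values are hypothetical. Note one assumption: `statistics.variance` divides the summed squared deviations by n - 1 (the usual sample variance) rather than by n.

```python
import statistics

# Hypothetical diastolic blood pressures (mm Hg) from eight patients.
dbp = [78, 82, 85, 88, 90, 92, 95, 110]

sample_mean = statistics.mean(dbp)

# Deviation: difference between each data point and the sample mean.
deviations = [x - sample_mean for x in dbp]

# Sample variance: summed squared deviations divided by n - 1.
sample_variance = statistics.variance(dbp)

# Standard deviation: square root of the variance.
sample_sd = statistics.stdev(dbp)

# Deciles: the nine cut points that split the sorted data into ten equal parts.
deciles = statistics.quantiles(dbp, n=10)

print(f"mean={sample_mean}, variance={sample_variance:.2f}, sd={sample_sd:.2f}")
```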


Standard error

The standard deviation of the sample is termed s and can be calculated. Of greater utility is the standard deviation of sample means.

The standard deviation of sample means represents the variation of possible test statistics obtained if the trial is repeated numerous times with different samples each time. It is defined as sigma divided by the square root of the sample size. Sigma is the parameter describing the population’s standard deviation.

Standard error is useful when reporting results. An example based on a hypothetical blood pressure study is shown below. Actual reported values of diastolic blood pressure (DBP) might be as follows:

• Initial DBP: 102 +/– 12 mm Hg
• Follow-up visit DBP: 89 +/– 11 mm Hg
• Mean difference of 13 +/– 7 mm Hg

The value after the +/– quantifies the stability of the point estimate of the parameter.
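The formula for the standard error can be sketched as follows. Because sigma is unknowable in practice, this sketch substitutes the sample standard deviation s for it, which is an assumption of the example rather than something stated in the article. The per-patient reductions are invented and are not the data behind the figures quoted above.

```python
import math
import statistics
from typing import Sequence


def standard_error(sample: Sequence[float]) -> float:
    """Estimate the standard error of the mean as s / sqrt(n).

    The sample standard deviation s stands in for the unknown population sigma.
    """
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical reductions in DBP (mm Hg) for nine patients.
reductions = [5, 8, 10, 12, 13, 14, 16, 18, 21]

mean_reduction = statistics.mean(reductions)
se = standard_error(reductions)

print(f"Mean reduction: {mean_reduction} +/- {se:.2f} mm Hg (standard error)")
```

Because the sample size appears under a square root in the denominator, quadrupling the number of patients halves the standard error, all else being equal.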

Using confidence intervals

A confidence interval (CI) is an interval constructed by a procedure that has a specified probability of capturing the true value of an unknown parameter. For a 95% CI, about 95% of the intervals generated by repeating the study many times would contain the parameter being estimated.

PIs define the actual intervals that most readers visualize when reading CIs. PIs are superior to CIs and may be on the horizon of statistical theory; however, the ambiguities involved in assigning a priori probabilities limit the utility of PIs in current practice.
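A 95% CI for a mean can be sketched as the point estimate plus or minus a critical value times the standard error. The sketch below assumes a normal approximation (critical value of about 1.96); for a sample this small, a t-based critical value would give a somewhat wider interval. The function name and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se


# Hypothetical reductions in DBP (mm Hg) for nine patients.
reductions = [5, 8, 10, 12, 13, 14, 16, 18, 21]

lower, upper = mean_confidence_interval(reductions)
print(f"95% CI for the mean reduction: {lower:.2f} to {upper:.2f} mm Hg")
```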

SIGNIFICANCE TESTING

Minimize bias before significance testing

Observations can be explained by 3 different possibilities: bias, chance, or truth. Biases are systematic factors that may be known or unknown and preferentially affect an arm of an investigation. Selection, measurement, and confounding are the major sources of bias. Bias may influence results in any phase of a study, from recruitment through data collection to data analysis. A strong study design helps to reduce the influence of bias.

When clinical trials are unethical or impractical, less objective research must suffice. Confounders, defined as recognized or unrecognized intervening variables that could account for observed differences, often trouble observational studies.

Demonstrating significance - Avoiding the type I error

A type I error occurs if authors conclude a significant difference or effect exists when it does not, similar to a false-positive result. This error sometimes is called alpha error; alpha is the maximum P value considered statistically significant. The P value, in turn, is meant to quantify the likelihood of making a type I error.

A P value of less than 0.03 means that the probability of obtaining results at least as impressive as those observed by chance alone is less than 3%.
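One way to make "by chance alone" concrete is an exact permutation test. This technique is not named in the article; it is used here only because it maps directly onto the definition. The sketch relabels the pooled measurements into two groups in every possible way and counts how often the difference in means is at least as large as the one observed. All data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation P value for a difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way the pooled values could be relabeled into two groups.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding in the comparison.
        if difference >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical follow-up DBP values (mm Hg) in treated and control patients.
treated = [88, 85, 90, 84, 86]
control = [95, 99, 92, 101, 97]

p_value = permutation_p_value(treated, control)
print(f"P = {p_value:.4f}")
```

In this invented example every treated value is lower than every control value, so only 2 of the 252 possible relabelings are as extreme as the observed split, giving P of about 0.008.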

Individual statistical tests

Statistical tests are the tools used to generate the P values and CIs required to quantify the likelihood of making a type I error. The 3 major tests are the Student t test, the Wilcoxon rank sum test, and the chi-square test.
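As an example of one of these tests, the sketch below applies a chi-square test to a 2 x 2 table using only the Python standard library. It assumes the Pearson form without continuity correction, and it relies on the fact that with 1 degree of freedom the P value has a closed form in terms of the complementary error function. The table counts and function name are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Table layout:        outcome present   outcome absent
        group 1                a                 b
        group 2                c                 d
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical trial: 30 of 100 treated and 15 of 100 control patients improved.
statistic, p_value = chi_square_2x2(30, 70, 15, 85)
print(f"chi-square = {statistic:.2f}, P = {p_value:.4f}")
```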

Type II errors and power

A type II error is a type of false-negative conclusion. The mistake of concluding that no difference exists when, in fact, a difference does exist is a type II error. If a study yields negative results, the power of the study is investigated. The probability that a clinical trial finds a statistically significant difference, provided that it really exists, is called statistical power. Power reporting is mandatory for studies with negative results.

A priori power calculations should be made prior to data gathering to determine how many patients are needed in the study. The sample size is a major determinant of the power of a study, and sample size calculations are an integral part of a power analysis.
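An a priori sample size calculation can be sketched for the simple case of comparing two group means. The formula below is the common normal-approximation version and assumes a two-sided test, equal group sizes, and a shared standard deviation; the function name, the difference to detect, and the standard deviation are all hypothetical inputs.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Patients needed per arm to detect a mean difference `delta`
    when both groups share standard deviation `sigma`
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the type I error rate
    z_beta = NormalDist().inv_cdf(power)           # controls the type II error rate
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical goal: detect a 5 mm Hg difference in DBP when the SD is 12 mm Hg.
n_80 = sample_size_per_group(delta=5, sigma=12, power=0.80)
n_90 = sample_size_per_group(delta=5, sigma=12, power=0.90)
print(f"Per group: {n_80} patients for 80% power, {n_90} for 90% power")
```

Under these assumptions, raising the desired power from 80% to 90% increases the requirement from 91 to 122 patients per group, which illustrates how closely sample size and power are tied.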

GENERALIZABILITY: USING ONLINE EVIDENCE

In clinical trials, inclusion and exclusion criteria and acquisition methods help confirm that a sample is representative of its population; the better the selection techniques, the less chance of sampling bias. Statistical hypothesis testing then determines how likely it is that the observed results were caused by chance alone.

When a strong study design yields statistically significant results, the study has internal validity. However, internal validity does not fully address applicability to a particular practice.

The concept of applicability to a particular practice is called generalizability or external validity. A step that often is forgotten is comparing the population from which the sample was drawn to the population in the clinical practice. Consider differences between the study’s population and the clinical practice’s patients prior to extrapolating conclusions. Comparability of populations is independent of the strength of the authors’ conclusions and independent of the mathematical exercise of inferential statistics.


MEASURING THE PERFORMANCE OF DIAGNOSTIC TESTS

Diagnostic impressions are syntheses of imprecisely characterized input from the history, the physical exam, the lab values, the radiological data, and countless other clinical tests. The essential factors for quantifying the predictive accuracy of diagnostic data are discussed below.

Sensitivity

Sensitivity is defined as the true-positive rate (i.e., the fraction of people known to have a disease who test positive for the disease). Mathematically, sensitivity is expressed as follows:

Sensitivity = true positives / (true positives + false negatives)

True positives are defined as individuals with disease who test positive; false negatives are individuals with disease who test negative. The false-negative rate is equal to 1 minus the sensitivity. The more sensitive a test is, the less likely it is to miss true disease.

An example is the enzyme-linked immunosorbent assay (ELISA) test that is used to determine if a person is HIV positive. If the test is negative, the result is reported as negative (clinically useful information). The sensitive ELISA test is less clinically useful when positive. If the lab finds a positive ELISA result, the specimen must be processed further with a specific test, such as the Western blot, prior to patient notification.

Specificity

Specificity is the fraction of individuals who are disease free and test negative for the disease. Mathematically, specificity is expressed as follows:

Specificity = true negatives / (true negatives + false positives)

True negatives are those people who are disease free and test negative. False positives are those people who are disease free but test positive. The false-positive rate is equal to 1 minus the specificity.

For example, serious emotional and financial consequences exist when a person is notified incorrectly of an HIV-positive test result when, in fact, that person is disease free. Therefore, delaying patient notification for positive HIV ELISA results until confirmation is made by the highly specific Western blot assay is good policy.
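The definitions of sensitivity and specificity translate directly into code. The counts below are hypothetical and do not describe any real assay; they are chosen only to make the arithmetic easy to follow.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of people with the disease who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of disease-free people who test negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical screening results: 100 people with disease, 1,000 without.
tp, fn = 95, 5      # diseased people who test positive / negative
tn, fp = 900, 100   # disease-free people who test negative / positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
false_negative_rate = 1 - sens
false_positive_rate = 1 - spec

print(f"Sensitivity: {sens:.0%}, false-negative rate: {false_negative_rate:.0%}")
print(f"Specificity: {spec:.0%}, false-positive rate: {false_positive_rate:.0%}")
```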

Accuracy

Sometimes, an alternate test is simply superior. A term that encompasses specificity and sensitivity is accuracy. Accuracy is the proportion of all test results (positive and negative) that are correct. For example, a CT scan of the cervical spine is more sensitive and more specific than plain x-rays for detecting fractures. Therefore, stating that a CT scan is more accurate than an x-ray for this diagnosis is correct.
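Accuracy can be computed from the same four counts used for sensitivity and specificity. The sketch below uses a hypothetical function name and invented counts.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Proportion of all test results, positive and negative, that are correct."""
    correct = tp + tn
    total = tp + fp + tn + fn
    return correct / total


# Hypothetical results: 95 true positives, 100 false positives,
# 900 true negatives, and 5 false negatives.
test_accuracy = accuracy(tp=95, fp=100, tn=900, fn=5)
print(f"Accuracy: {test_accuracy:.1%}")
```

Because accuracy pools diseased and disease-free individuals, its value depends on how common the disease is in the group tested, so it is best read alongside sensitivity and specificity rather than instead of them.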

Conclusion

A critical distinction between the scientific approach and other methods of inquiry lies in the emphasis placed on real world validation. Clinicians are then called upon to defend decisions on the basis of empirical research evidence. Consequently, the clinician must be an intelligent consumer of medical research outcomes, able to understand, interpret, critically evaluate and apply valid results from the latest medical research.

Critical analysis permits allocation of appropriate weight upon evidence based on the strength of the research design and findings. No perfect studies exist, and nearly every study can provide some useful information despite design limitations or small sample sizes. The merit of some research simply lies in the hypotheses that are inspired. In the future, more rigorous testing with a priori hypotheses will support or discredit theories spawned in hypothesis-generating studies.
