

Health Care Management Science 7, 43–49, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

Use of Quality Adjusted Life Years and Life Years Gained as Benchmarks in Economic Evaluations: A Critical Appraisal

CHRISTOPHER EVANS ∗
Mapi Values, 15 Court Square, Suite 620, Boston, MA 02108, USA

E-mail: [email protected]

MANOUCHE TAVAKOLI
University of St. Andrews, St. Katherine’s West, The Scores, St. Andrews, FIFE KY16 9AL Scotland, UK

BRUCE CRAWFORD
Mapi Values, 15 Court Square, Suite 620, Boston, MA 02108, USA and University of St. Andrews, Scotland, UK

Abstract. Researchers have grappled with various ways of placing the results of an economic evaluation in the appropriate context. One of the most common methods is to relate the results of a study to an appropriate benchmark (commonly, $50,000 per QALY in the US or £30,000 per QALY in the UK). This paper examines the foundation for these cut-off points and critiques their use by researchers. Although it is difficult to establish an appropriate benchmark, this paper notes that reference points may be too low based on published data. Further, the inconsistent application of benchmarks, and differences in the calculation of a value of a statistical life, will lead to an inefficient allocation of health care resources.

Keywords: economic evaluation, cost benefit, methodology, benchmarks

1. Introduction

The rising cost of health care provision, coupled with a heavy emphasis on quality and value for money, has undoubtedly caused a steady increase in interest in pharmacoeconomic research in recent years. Academics and other researchers have conducted numerous economic evaluations on the value of pharmaceutical products, devices and other interventions for the purpose of providing information to health care decision makers in order to allocate scarce resources efficiently within the health sector. Early inconsistencies in study methodologies and reporting of such evaluations led researchers to become interested in establishing a framework for economic analyses. The initial structure of this framework was developed in Canada and Australia and took the form of formal guidelines. Following this work, researchers in the United States and the United Kingdom (UK) became interested in establishing a basis for pharmacoeconomic studies. This foundation set forth the principles of a reference case which researchers are encouraged to follow.

This methodological standard maintains that all pharmacoeconomic evaluations should include the following elements: the societal perspective should be adopted; comparisons should be made with existing practice; the boundaries of the study should be broad; the time horizon should be long enough to capture relevant future effects; where possible, outcome measures (including morbidity and mortality) should be incorporated into a single measure, the quality adjusted life year (QALY); uncertainty should be assessed using appropriate sensitivity analyses; discounting should be conducted; and undominated options should be reported as incremental ratios [1]. However, the authors of these standards would acknowledge that pharmacoeconomics is not an exact science, that there is wide variation in analyses, and that the recommendations do not include everything that is important to decision makers (for example, issues of equity and distributive justice are not addressed in the standards). Given the degree of uncertainty attached to both clinical and cost data, and the methodologies used, the results of most cost-effectiveness studies cannot be said to be an exact measurement of the interventions under consideration. Results should also be weighed against other issues, including ethical and political concerns, access, and future research and development. Nonetheless, it is still important to specify a framework for cost-effectiveness analyses so that valid comparisons may be made.

∗ Corresponding author.

Cost-effectiveness analysis (CEA) is used as a methodology to help policy decision makers maximize total health gain from scarce resources from the societal point of view [2,3]. Thus, standardization is important as researchers and policy makers contemplate the use of two widely used rules to determine whether a new intervention demonstrates value for money: the league table approach and the threshold approach. However, for cost-effectiveness studies to be of any use to decision makers it is important to establish not only that the methods and the data used are homogeneous among all studies, but also the threshold above which a new or existing technology ceases to be cost-effective. Although methodological issues have been addressed elsewhere [4,5], the threshold issue has largely remained unresolved and elusive. Furthermore, the use of the QALY as an outcome measure, and the methods used for calculating QALYs [6], have been questioned, as the QALY may not capture all the relevant benefits of a health care intervention given the short time frame of the clinical and economic data on which it is based. It has been argued elsewhere [7] that a single measure such as the QALY can be misleading and may result in over-expansion of the health care system. This is because QALYs are used as society’s willingness to pay and thus represent shadow prices; however, these shadow prices will depend on the scale of total health care expenditure, and therefore the true opportunity cost can only be assessed by examining what is forgone in other sectors of the economy.

It is important to consider whether these standards, when they are adhered to by researchers, actually influence decision-making. This paper examines the problems associated with conducting studies that utilize the recommended endpoints, QALYs or life years gained, and how placing these studies in an appropriate decision making process may be difficult. This paper first examines current recommendations for benchmarks in reporting cost per QALY with the aim of identifying technologies that provide value for money and permit more efficient resource allocation, and then vets whether or not these recommendations are appropriate from a scientific stance.

2. Cost per quality adjusted life year/life year gained

Results presented in the form of incremental cost-effectiveness ratios form the core of cost-effectiveness and utility analyses. Although many measures of effectiveness can be used, life expectancy (in the form of life years gained or saved) and especially QALYs are endpoints that are frequently employed. However, the conventional utility-based QALY approach has been criticized on the grounds of perspective (individual or society), the issue of illness severity and inadequate consideration of equity [8,9].
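For reference, the incremental ratio mentioned above can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: C denotes cost and E denotes effectiveness (life years or QALYs), for the new intervention and for the existing practice it is compared with.

```latex
\[
\mathrm{ICER} = \frac{C_{\text{new}} - C_{\text{existing}}}{E_{\text{new}} - E_{\text{existing}}}
\]
```

Under this notation, a benchmark such as $50,000 per QALY is a ceiling on the ratio when the new intervention is both more effective and more costly.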

Of the two possible endpoints, quality adjusted life years are promoted as the most appropriate endpoint in economic evaluations as they incorporate both quality and quantity improvements in life and, when used across a wide range of diseases, allow comparisons to be made between diseases so that rational decisions about health care allocation may be made. Ultimately, studies that use QALYs or life years saved as an endpoint can be placed on a league table and ranked. Examples of general league tables have been reported in the literature in pounds sterling [10,11], in US dollars [12,13], and in US dollars specific to coronary heart disease [14]. The league table allows researchers the opportunity to place their findings in a broader context and to promote appropriate resource allocation within health care systems [10]. Decision-makers then compare and consider selecting those interventions with the lowest cost per QALY until the budget for health care is exhausted. However, the problem with league tables is that CEAs are carried out for only a small number of interventions, which may not allow decision-makers to maximize health gain with the limited financial resources at their disposal.
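The league table rule described above (rank by cost per QALY, then fund in order until the budget is exhausted) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the paper: the `Intervention` class, the function name and every figure in the table are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    cost: float   # total programme cost (hypothetical figure)
    qalys: float  # total QALYs gained (hypothetical figure)

    @property
    def cost_per_qaly(self) -> float:
        return self.cost / self.qalys


def fund_from_league_table(interventions, budget):
    """Rank by cost per QALY and fund in that order until the budget runs out."""
    funded = []
    remaining = budget
    for item in sorted(interventions, key=lambda i: i.cost_per_qaly):
        if item.cost > remaining:
            break  # strict rule: stop at the first programme that cannot be afforded
        funded.append(item.name)
        remaining -= item.cost
    return funded, remaining


# Hypothetical league table entries, for illustration only.
table = [
    Intervention("A", cost=200_000, qalys=40),  # 5,000 per QALY
    Intervention("B", cost=900_000, qalys=30),  # 30,000 per QALY
    Intervention("C", cost=300_000, qalys=20),  # 15,000 per QALY
    Intervention("D", cost=600_000, qalys=5),   # 120,000 per QALY
]
funded, left_over = fund_from_league_table(table, budget=1_000_000)
print(funded, left_over)
```

The sketch also shows the weakness noted in the text: the ranking only covers interventions that happen to have been evaluated, so part of the budget can remain unspent while unevaluated programmes are never considered.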

In reality, it is unlikely that any health authority or government agency would allocate health care resources in this manner. For a variety of reasons, which include incomplete sets of incremental cost-effectiveness ratios (ICERs) and differences in the methodologies, time periods and perspectives used in the various studies included on the league tables, actual rankings of conditions might be inaccurate and in some cases confusing. Due to these problems, decision-makers have sought guidance on how to interpret the results of economic evaluations. This has led researchers to provide recommendations as to appropriate benchmarks (thresholds) in pharmacoeconomic studies.

3. Benchmarks

Benchmarks are standard cut-off points or points of reference for researchers who are interested in determining if a particular intervention is “cost-effective”. Several benchmarks have been recommended in the literature and, in some cases, arbitrary threshold levels have been used by researchers [15]. Researchers have made recommendations in three articles as to an appropriate cut-off point for cost utility studies [16–18]. Various government agencies in the US and the UK have also developed a broad standard. As a result many authors [19,20] have cited these articles as a justification for the adoption of a particular technology. A description of the suggested benchmarks and a discussion of the weaknesses of the approaches follow in the next sections.

3.1. Round number benchmark

The aim of economic evaluation is to provide additional information to improve decisions regarding resource allocation within the health care system. The application of economic evaluations requires that the techniques used are appropriate in order for the results to be valid. In the US and Europe it has become common practice to make comparisons between health care interventions in terms of their relative cost-effectiveness (cost per life year or cost per quality-adjusted life-year gained).

Some researchers, without any analytic foundation, use a “rule of thumb” of $50,000 per QALY as the threshold level. Any interventions that cost less than this amount are considered “cost-effective”. This cut-off point is essentially arbitrary, but some justification for the round number has been made by linking the $50,000 per QALY to the annual cost of caring for a dialysis patient in the US by Medicare [15]. The support for the number comes from the belief that since renal dialysis is covered under the public financing of health care (i.e., Medicare) it represents society’s willingness to pay for such interventions.

Two problems arise from this approach to creating a benchmark. First, the $50,000 per QALY figure has essentially remained static. As Hirth and others note [15], a more reasonable range for the cost per QALY for dialysis is in the region of $74,000 to $95,000 in 1997 dollars, which implies that the $50,000 threshold level is too low. Second, linking the $50,000 per QALY to the Medicare benefit does not necessarily reflect society’s willingness to pay for dialysis: it may merely reflect the particular ability of a group of individuals to extend coverage to their members. Further, coverage decisions for Medicare are a product of political and policy concerns.

Comparison against a dialysis standard implies that dialysis was somehow considered a benchmark against which other interventions should be judged, so that anything below that level must be “cost-effective”. It was believed when the Medicare entitlement was passed that the beneficiaries would be relatively young and employed, and that they would be able to return to work after rehabilitation. However, the treatment population is now quite different from the original population: it is larger and composed of older patients with comorbid conditions beyond their renal disease [21]. This suggests that the overall economic implications at the programme’s creation were much different from those of today, and if society’s willingness to pay for interventions is to be accounted for, it is appropriate to consider the original programme scope. Further, the economics of the end stage renal disease programme was never of primary concern. Deliberations on the matter of coverage concluded: “patient acceptance criteria should be based on the medical assessment of benefits and burdens of treatment and on the best interests of individual patients, not economic objectives . . .” [21]. Thus, if the economics of coverage are not of primary concern it is difficult to make the case for renal dialysis as an appropriate benchmark.

3.2. Academic benchmarks

The earliest standard, dating from 1982, by Kaplan and Bush [16], makes three levels of recommendation (table 1). The authors suggest that interventions may be broken into three levels of “cost-effectiveness”: cost-effective, justifiable in most cases and questionable. The analytic foundation for these three levels is subject to scrutiny. The authors note that although it is impossible to develop definitive rules, their recommendations are based on several uncited previous studies. Further, they state that the recommendations are “justified by many current expenditures for tertiary medical care, and also by analyses on the economic value of human labour and consumption (human capital)” [16].

Table 1
Guidelines for the adoption of medical technologies.

Cost per Well-Year                    Policy implication
Less than $20,000 per Well-Year       Cost-effective by current standards
$20,000 to $100,000 per Well-Year     Possibly controversial, but justifiable by many current examples
Greater than $100,000 per Well-Year   Questionable in comparison with other health care expenditures

Source: [16].

Quite apart from the serious lack of documentation to support them, the guidelines are many years out of date. Simply inflating the guideline numbers to 2001 dollars yields recommendations well in excess of those in table 1. For instance, taking the upper estimate of $100,000 in 1982 dollars and converting it to 2001 dollars yields a new upper bound of $187,055 [22].
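The adjustment is simple arithmetic and can be sketched as follows. The inflation factor here is an assumption: it is backed out of the two figures quoted in the text ($100,000 in 1982 dollars and $187,055 in 2001 dollars) rather than taken from a price index, and the function name is hypothetical.

```python
# Inflation factor implied by the figures quoted in the text:
# $100,000 in 1982 dollars corresponds to $187,055 in 2001 dollars.
FACTOR_1982_TO_2001 = 187_055 / 100_000


def to_2001_dollars(amount_1982: float) -> float:
    """Convert a 1982 dollar amount to 2001 dollars using the implied factor."""
    return amount_1982 * FACTOR_1982_TO_2001


lower = to_2001_dollars(20_000)   # lower guideline threshold from table 1
upper = to_2001_dollars(100_000)  # upper guideline threshold from table 1
print(round(lower), round(upper))
```

On this basis the $20,000 lower threshold becomes roughly $37,400 in 2001 dollars, so any study quoting the original figures unadjusted is applying a much stricter standard in real terms.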

This is not an academic issue: it appears that some researchers are currently still using this standard, and fail to acknowledge the dollar figures should be adjusted for inflation. This is the case of a recent analysis of adjuvant interferon (IFN)-α-2b in melanoma patients that cited the Kaplan and Bush recommendations without adjusting them to maintain a constant value in real terms [20]. In addition, the currencies of the two studies are different: the reference article is in US dollars and the study article is in Euros. This error is compounded to the extent the results are quoted in other venues. For instance, the results for IFN-2b therapy were later reported under the heading of “below threshold of 20,000 Euros/LYG” [23]. The heading seems to suggest that there is some agreed upon threshold “used by researchers to define a favourable economic outcome”, when in fact this is not the case.

In an article from 1992, Canadian researchers [17] made broadly similar tentative recommendations. Technologies were divided into three main levels: those that cost less than C$20,000/QALY, those between C$20,000 and C$100,000/QALY and those that cost more than C$100,000/QALY (table 2). As with the earlier guidance, a strong analytic foundation for the recommendations is absent, and in an effort to support this standard the authors cited the unsupported Kaplan and Bush [16] guidelines.

Table 2
Grades of recommendation for the adoption and appropriate utilization of new technologies.

Grade  Recommendation

A. Compelling evidence for adoption and appropriate utilization.
   The new technology is as effective as or more effective than the existing one and is less costly.

B. Strong evidence for adoption and appropriate utilization.
   (a) The new technology is more effective than the existing one and costs less than $20,000 per quality adjusted life year (QALY) gained.
   (b) The new technology is less effective than the existing one, but its introduction would save more than $100,000/QALY gained.

C. Moderate evidence for adoption and appropriate utilization.
   (a) The new technology is more effective than the existing one and costs $20,000 to $100,000/QALY gained.
   (b) The new technology is less effective than the existing one, but its introduction would save $20,000 to $100,000/QALY gained.

D. Weak evidence for adoption and appropriate utilization.
   (a) The new technology is more effective than the existing one and costs more than $100,000/QALY gained.
   (b) The new technology is less effective than the existing one, but its introduction would save less than $20,000/QALY gained.

E. Compelling evidence for rejection.
   The new technology is less effective than or as effective as the existing one and is more costly.

Source: [17].
Note: in Canadian dollars.

Table 3
Recommendations based on the quality of the evidence and economic evaluations.

                          Cost-utility categories
Quality of the evidence   < £3,000 per life year   £3,000–£20,000 per life year   > £20,000 per life year   Negative life years
I                         Strongly recommended     Strongly recommended           Limited support           Not supported
II                        Strongly recommended     Supported                      Limited support           Not supported
III                       Supported                Limited support                Limited support           Not supported
IV                        Not proven               Not proven                     Not proven                Not proven

Source: [18].
Notes: I – strong evidence from a RCT, II – evidence from well designed trials, III – opinions from respected authorities, IV – evidence is inadequate.
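The grades in table 2 amount to a decision rule on the incremental cost and incremental effect of a new technology, which can be sketched as below. This is an illustrative reading of the table, not code from the cited guidelines: the function name and arguments are hypothetical, the handling of the exact boundary values ($20,000 and $100,000 are placed in grade C) is an assumption, and the case of identical cost and identical effect is left ungraded because the table does not cover it.

```python
def grade(delta_cost: float, delta_qalys: float) -> str:
    """Grade a new technology against the existing one using the table 2 bands.

    Both arguments are new minus existing (Canadian dollars and QALYs).
    """
    if delta_qalys >= 0 and delta_cost < 0:
        return "A"  # as effective or more effective, and less costly
    if delta_qalys <= 0 and delta_cost > 0:
        return "E"  # as effective or less effective, and more costly
    if delta_qalys == 0:
        return "ungraded"  # same cost and same effect: not covered by table 2

    # Remaining cases share the sign of cost and effect, so the ratio is non-negative.
    ratio = delta_cost / delta_qalys

    if delta_qalys > 0:  # more effective and not cheaper: cost per QALY gained
        if ratio < 20_000:
            return "B"
        if ratio <= 100_000:
            return "C"
        return "D"

    # Less effective and not more costly: savings per QALY forgone
    if ratio > 100_000:
        return "B"
    if ratio >= 20_000:
        return "C"
    return "D"


print(grade(30_000, 0.5))     # more effective at 60,000 per QALY gained
print(grade(-100_000, -0.5))  # less effective, saves 200,000 per QALY forgone
```

Writing the rule out makes the dependence on the two round-number cut-offs explicit: changing either constant reclassifies technologies without any change in the underlying evidence.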

Weinstein notes that it is interesting that both the Canadian and the American researchers came up with essentially the same guidelines [24]. What is particularly interesting is that the thresholds developed by Kaplan and Bush, after adjusting for currency exchange and inflation, had twice the real value of the 1992 Canadian dollar thresholds [24]. As Weinstein states: “in real terms the thresholds have changed, but the appeal of round numbers is long lasting” [24].

In the United Kingdom, researchers have also created recommendations for the adoption of medical technologies [18]. These recommendations, however, fall below the US and Canadian benchmarks, and it is also not clear what the theoretical reasoning is for their justification. The UK recommendations, however, went beyond those of the US and Canada and explicitly incorporated the quality of the evidence in the decision making process: not only should researchers examine the cost, but also the underlying quality of the evidence on which the conclusion is based (table 3). Unfortunately, although it offers an improvement by linking the recommendation to the quality of evidence, this guidance also suffers from a lack of a theoretical underpinning.

In reality, the upper bound of £20,000 was never adhered to in the UK. In a recent article in The Financial Times [25], it was suggested the National Institute for Clinical Excellence (NICE) is moving towards a cut-off point for treatments of approximately £30,000 per QALY. This article suggests, “in effect, a league table of cost-effectiveness is starting to emerge. Below the £30,000 line, the NHS will fund it; above the line it will not”. Although it appears that there may be a consensus of £30,000 per QALY in the UK among health professionals, academics and managers, the now familiar pattern is repeated: it is not clear how this £30,000 per QALY was established.

3.3. US governmental agencies

Various agencies of the US government have examined the impact that regulations would have in terms of the costs of implementing a programme (e.g., nutritional labeling) against the benefits of the programme. These benefits may be measured in several ways, but one method is to measure life saving benefits in terms of the value of a statistical life or life year. As an example, the Food and Drug Administration (FDA) has examined the health benefits that arise from preventing salmonellosis due to poor refrigeration of eggs. As part of the proposed rule analyzing the regulatory impact, the FDA calculated a quality adjusted life day at $630 in 1997 [26]. This figure was based on the value of a statistical life of $5 million with an average of 21.8 discounted life years (($5 million ÷ 21.8) ÷ 365). A lower estimate was calculated at $80 per day based on average gross domestic product per person (($8 trillion ÷ 268 million Americans) ÷ 365). An upper bound estimate was calculated using what the regulators thought was the most plausible upper bound estimate of $8.4 million per statistical life. This yields approximately $1,000 per life day (($8.4 million ÷ 21.8) ÷ 365). The lower bound, base case and upper bound per life year work out to $29,200, $229,358 and $385,321, respectively. In another FDA study, of the costs and benefits of amending regulations on food labeling, $100,000 per life year was used [27].
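The arithmetic behind these figures can be reproduced directly from the inputs quoted in the text. The sketch below is illustrative only; the constant and function names are hypothetical, and the inputs are those stated above (value of a statistical life, 21.8 discounted life years, GDP of $8 trillion and a population of 268 million).

```python
DAYS_PER_YEAR = 365
DISCOUNTED_LIFE_YEARS = 21.8  # average discounted life years quoted in the text


def per_life_year_and_day(value_of_statistical_life: float):
    """Spread a value of a statistical life over discounted life years and days."""
    per_year = value_of_statistical_life / DISCOUNTED_LIFE_YEARS
    return per_year, per_year / DAYS_PER_YEAR


base_year, base_day = per_life_year_and_day(5_000_000)    # base case
upper_year, upper_day = per_life_year_and_day(8_400_000)  # upper bound

# Lower bound: gross domestic product per person, spread over a year.
gdp_per_person = 8e12 / 268e6
lower_day = gdp_per_person / DAYS_PER_YEAR

print(f"base:  ${base_year:,.0f} per life year, ${base_day:,.0f} per life day")
print(f"upper: ${upper_year:,.0f} per life year, ${upper_day:,.0f} per life day")
print(f"lower: ${gdp_per_person:,.0f} per life year, ${lower_day:,.0f} per life day")
```

The unrounded daily values differ slightly from the rounded figures in the text ($630, $1,000 and $80), and the $29,200 lower bound per life year corresponds to the rounded $80 per day multiplied by 365 rather than to GDP per person itself.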

This suggests wide variation in, and considerable uncertainty about, these costs per life year gained. Furthermore, the main and obvious problem is that there is little or no consensus even within the FDA as to the appropriate value of a year of life. This problem is further complicated to the extent other agencies use different estimates. The Environmental Protection Agency has suggested benefits be estimated based on the value of a statistical life of $4.8 million, whereas the US Department of Agriculture uses a much lower figure of $721,418 [28].

3.4. Discussion

Some researchers have questioned the use of a threshold cost-effectiveness ratio for efficient resource allocation, arguing that decision makers also need information on opportunity costs [2,29,30] and on measures of utility values and equity [8,9,31–34]. Furthermore, cut-off points were never meant to be definitive and permanent, as their authors acknowledge [16], and it was envisaged that other analyses would be completed and other considerations taken into account before reaching the final decision [17]. However, there are several problems associated with the current application of benchmarks in economic evaluations. The most important issue is the lack of consensus as to where the current threshold should be set. As noted above there is substantial deviation in the numbers used. Hirth and others [15] found a “tremendous” variation in their review of QALYs. In their study of the cost per QALY, they found median estimates that ranged from $25,000 (human capital approach), to $93,400 (revealed preference studies), to $161,305 (contingent valuation studies) per QALY. Of note is that two of these estimates are substantially higher than the ones recommended by academic researchers. A comparison of possible “appropriate” benchmarks is found in table 4. As can be seen there is over a $300,000 difference between the lowest and the highest benchmark. Further, three of the recommendations are at or exceed the level that is typically viewed as not cost-effective. In fact, depending on the source of the estimate, $100,000 may be the lower boundary for benchmarks, not the upper extreme.

Table 4
Potential “cost-effective” benchmarks for economic evaluations.

Recommended benchmark            Source
$20,000 per QALY/Well-Year (a)   Kaplan and Bush [16] and Laupacis et al. [17]
$29,200 per life year            Food and Drug Administration low estimate (in the salmonellosis regulation)
$37,000 per QALY/Well-Year       Kaplan and Bush estimate of $20,000 inflated to 2001 dollars
$50,000 per QALY                 Arbitrary or as a comparison to Medicare renal dialysis
$85,000 per QALY (b)             Adjusted estimate of $50,000 per QALY from Hirth et al. [15]
$100,000 per life year           Food and Drug Administration (regulations on food labeling)
$229,358 per life year           Food and Drug Administration midpoint estimate (in the salmonellosis regulation)
$385,321 per life year           Food and Drug Administration high estimate (in the salmonellosis regulation)

(a) In Canadian or US dollars.
(b) $85,000 is the mean of the proposed range of $74,000 to $95,000.

The variability in numbers used by governmental bodies in the United States has important implications. Since agencies must compete with other governmental bodies for funding, and compete internally for financial resources, it is imperative to ensure that they are all following the same procedures when examining the economic impact of new programmes. The choice of a value for a statistical life or life year can greatly influence the results of these economic analyses. The use of different values creates a situation where the likelihood of misallocating financial resources to maximize health outcomes increases.

One might be tempted to say interventions that are at the extremes are easy to identify as cost-effective or not cost-effective. However, even using a benchmark in this area may not be fruitful. Many preventative health care interventions are not reimbursed by insurers even though preventative strategies tend to have more favourable incremental cost-effectiveness ratios compared to curative interventions. Indeed, even extremely expensive medications to treat rare diseases, which would have unfavourable incremental cost-effectiveness ratios, are being reimbursed by private insurers in the US (e.g., the treatment of Gaucher’s Disease at approximately $200,000 per year per patient).

4. Conclusion

Due to the large variation in estimates it may be impossible to provide guidance to researchers as to the appropriate benchmark in pharmacoeconomic studies. One tentative conclusion is that these “rules of thumb” are likely too low. Estimates by Hirth and colleagues [15] and numbers used by US governmental agencies suggest that the benchmarks should be substantially higher than are currently recommended. However, how much higher is a matter subject to considerable conjecture given the huge range in estimates for the value of a statistical life, a life year or a quality adjusted life year. In fact, it is quite difficult to justify the use of thresholds in economic evaluations at all. The simplicity provided by nice round numbers disguises the fact that the analytic foundation of benchmarks is extremely weak.

The concepts of net health benefits (NHB) [35] and cost-effectiveness acceptability curves [36] have recently been proposed to reduce the technical shortcomings of CE ratios and to help interpret the results. Interpretability is improved by relying not on whether the results of a study fall below a single benchmark, but on the probability that a particular study’s finding falls below (or above) a benchmark. In this instance a cost acceptability criterion may be introduced [36]. Acceptability curves indicate the probability that a treatment under investigation is cost-effective, with the researcher establishing the value placed on the unit of effectiveness (e.g., $50,000 per life year gained or $100,000 per life year gained). If we accept, based on the above review, that the average (or even maximum) willingness to pay is unknown, we could instead examine the probability that a treatment under consideration is cost-effective based on a predetermined benchmark (or ceiling ratio if we examine the maximum amount).

For example, if the findings of a study indicate an estimated incremental cost-effectiveness ratio of $15,000 per life year saved we can examine the probability that this result is cost-effective as defined by a benchmark of $50,000 per life year saved. Further, we could examine the probability that the finding is cost-effective over a range of benchmarks. The advantage to this approach is that it makes no assumption about the appropriate benchmark. For instance, an intervention may have a 20% chance of being cost-effective at $25,000 per life year, a 40% chance at $50,000 per life year and a 90% chance at $100,000 per life year. In this way we have solved the problem of using a single number as a benchmark. However, at the margin, the issue of interpretation remains unresolved. For example, does an intervention with a 50% probability of being cost-effective at $100,000 per life year saved represent a good value? In such situations, the decision maker would require some additional information before the final decision can be made, such as the impact of a new intervention in a wider setting, which raises the issue of opportunity costs as advocated by others [30].
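The calculation behind such a set of probabilities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of [36]: the function name `acceptability_curve` is hypothetical, and the incremental costs and effects are simulated from arbitrary normal distributions purely to produce replicates with a point-estimate ratio near $15,000 per life year. In a real study the replicates would come from bootstrapped trial data or a probabilistic model.

```python
import random


def acceptability_curve(delta_costs, delta_effects, benchmarks):
    """For each benchmark, return the share of replicates that are cost-effective.

    A replicate counts as cost-effective at benchmark lam when its net
    monetary benefit, lam * delta_effect - delta_cost, is positive.
    """
    pairs = list(zip(delta_costs, delta_effects))
    curve = {}
    for lam in benchmarks:
        favourable = sum(1 for dc, de in pairs if lam * de - dc > 0)
        curve[lam] = favourable / len(pairs)
    return curve


# Hypothetical replicates (assumed distributions, for illustration only):
# mean incremental cost $15,000 and mean incremental effect 1.0 life year.
rng = random.Random(0)
delta_costs = [rng.gauss(15_000, 6_000) for _ in range(5_000)]
delta_effects = [rng.gauss(1.0, 0.6) for _ in range(5_000)]

curve = acceptability_curve(delta_costs, delta_effects, [25_000, 50_000, 100_000])
for lam, prob in curve.items():
    print(f"${lam:,} per life year: {prob:.0%} probability of being cost-effective")
```

The comparison uses net monetary benefit rather than the ratio itself, so replicates with a negative incremental effect are classified sensibly instead of producing a misleading negative ratio. The output is still a probability at each benchmark, so the question of what probability represents good value remains with the decision maker.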

If it is impossible to derive an appropriate endpoint, then the interpretation of pharmacoeconomic studies defaults to rankings on league tables. However, the broader issue of the relevance of league tables and how pharmacoeconomic data is communicated has to be questioned [6]. Due to inherent problems with league tables there appears to be no widespread adoption of league tables in any country and guidelines for the conduct of pharmacoeconomic studies do not appear to support their use, and when they have been tried in the American context their technical implementation has failed [37,38]. As one researcher noted, cost-effectiveness analysis as applied in the Oregon experiment was fundamentally and conceptually flawed as it does poorly in dealing with the friction between statistical and identifiable lives [39]. Moreover, even if league tables were used in pharmacoeconomic studies in such a way as to maximize health outcomes, this would ignore the importance that the general population places on equity concerns and access. For example, it has been shown that the general public places an emphasis on equity considerations in rationing health care compared to efficiency [40], a view also shared by many health economists [41]. Sir Michael Rawlins, chairman of NICE, echoes this sentiment: although the QALY is crucial in making decisions regarding the cost-effectiveness of interventions, other factors such as patients’ views, clinical judgment and a whole range of other issues including price must be taken into consideration.

The use of endpoints such as QALYs is also questionable given the current knowledge of decision makers. Reports have found a reluctance of decision makers in Europe to accept QALYs and arbitrary cut-off points in pharmacoeconomic studies [6,42]. In the United States, survey work [43] has noted that decision makers are unfamiliar with the terms utility and quality adjusted life years. This calls into question why researchers create studies with these endpoints if the end audience does not have an adequate understanding or ability to interpret studies based on these types of endpoints. Of course, one must also remember that any ranking of interventions is likely to be inaccurate due to differences in the methodology employed by researchers.

However, despite the problems with QALYs and league tables, at a fundamental level they offer a considerable advantage over traditional clinical assessments that only consider extensions in life years. Also, it is important, as Eddy [37] counsels, to separate out the technical and conceptual issues in using QALYs. Technical issues about measurement in the past (such as a failure to account for severity of the initial condition, patients’ utility vs society utility values, and equity [44]) can be corrected by better study design in the future or the development of new techniques. Various methods have been proposed where the scores from multiattribute utility instruments can be transformed to include concern by society to give priority to the worst-off [8]. However, the use of multiattribute utility models has been questioned by others [6].

Conceptual issues are more difficult to deal with. If we believe as Hadorn [39] does that the “rule of rescue”, where massive resources are deployed to save an identifiable life (e.g., sailors lost at sea or a boy with leukemia on the news) at the expense of sacrificing resources for other statistical lives (e.g., flu vaccinations for at risk individuals), cannot be made compatible with utility elicitation then the quest for QALYs in health care decision-making should be abandoned. However, this is not the case and preferences for rescue (and allocating resources to identifiable lives) may be included in utility studies. It may be done either through technical changes to the measurement method by upper end compression of non-severe health states [38] or by directly collecting data on the desirability of a service and an outcome to an individual as well as others [37].

In conclusion, it is recognized that currently recommended benchmarks may not be appropriate for pharmacoeconomic studies, and the huge variation in estimates makes it impossible to assess what is a correct cut-off point. This task is also complicated to the extent that benchmarks are likely to vary by country depending on how the health care delivery and finance system is organized (e.g., socialized vs. free market). The use of cost acceptability curves offers some improvement in interpretation, but problems remain in determining appropriate values. Defaulting to league tables to aid decision-making is not likely to be fruitful due to legal challenges (as occurred with the Oregon experiment), a lack of understanding of QALYs and methodological shortcomings [6]. Future use of league tables may be warranted as the process for collecting preferences improves technically and is expanded to incorporate issues of access, equity and distribution. Nonetheless, the current concerns of decision makers and researchers regarding the use of benchmarks are warranted. Pharmacoeconomic studies should be communicated in a way that reimbursement, regulatory and policy makers understand. Studies that highlight the actual budgetary impact of implementing a new intervention are a more practical solution to presenting pharmacoeconomic data compared to using benchmarks.

Acknowledgements

We are most grateful to anonymous referees for their valuable and helpful advice and comments.

References

[1] J. Siegel, L. Russel and M. Weinstein, Reporting cost effectiveness studies and results, in: Cost-Effectiveness in Health and Medicine, eds. M. Gold et al. (Oxford University Press, New York, NY, 1996).

[2] A. Gafni and S. Birch, NICE methodological guidelines and decision making in the National Health Services in England and Wales, Pharmacoeconomics 21 (2003) 149–157.

[3] H.C. Weinstein and W.B. Stason, Foundation of cost-effectiveness analysis for health and medical practices, New England Journal of Medicine 296 (1977) 716–721.

[4] M.F. Drummond, G.W. Torrance and J.M. Mason, Cost-effectiveness league tables: More harm than good? Social Science and Medicine 37 (1993) 33–40.

[5] M.F. Drummond, B. O’Brien, G.L. Stoddart and G.W. Torrance, Methods for Economic Evaluation of Health Care Programmes (Oxford University Press, Oxford, 2000).

[6] G. Duru, J.P. Muray, A. Beresniak, M. Lamure, A. Paine and N. Nicoloyannis, Limitations of the methods used for calculating quality-adjusted life-year values, PharmacoEconomics 20 (2002) 463–473.

[7] A. Gafni and S. Birch, Guidelines for the adoption of new technologies: A prescription for uncontrolled growth in expenditures and how to avoid the problem, Canadian Medical Association Journal 148 (1993) 913–917.

[8] E. Nord, Health state values from multiattribute utility instruments need correction, Ann. Med. 33 (2001) 371–374.

[9] E. Nord, J.L. Pinto, J. Richardson, P. Menzel and P. Ubel, Incorporating societal concerns for fairness in numerical valuations of health programmes, Health Economics 8 (1999) 25–39.

[10] A. Williams, Economics of coronary artery bypass grafting, British Medical Journal 291 (1985) 326–329.

[11] G.T. Smith, The economics of hypertension and stroke, American Heart Journal 119 (1990) 725–728.

[12] G. Torrance and A. Zipursky, Cost-effectiveness of antepartum prevention of Rh immunization, Clinics in Perinatology 11 (1984) 267–281.

[13] K. Schulman, L. Lynn, H. Glick and J. Eisenberg, Cost-effectiveness of low-dose Zidovudine therapy of asymptomatic patients with human immunodeficiency virus (HIV) infection, Annals of Internal Medicine 114 (1991) 798–802.

[14] J. Tsevat, D. Duke and L. Golman, Cost-effectiveness of captopril therapy after myocardial infarction, J. Am. Coll. Cardiol. 26 (1995) 914–919.

[15] R. Hirth, M. Chernew, E. Miller et al., Willingness to pay for a quality-adjusted life year: In search of a standard, Medical Decision Making 20 (2000) 332–342.

[16] R. Kaplan and J. Bush, Health related quality of life measurement for evaluation research and policy analysis, Health Psychology 1 (1982) 61–80.

[17] A. Laupacis, D. Feeny, A. Detsky and A. Tugwell, How attractive does a new technology have to be to warrant adoption and utilization? Tentative guidelines for using clinical and economic evaluations, Canadian Medical Association Journal 146 (1992) 473–481.

[18] A. Stevens, D. Colin-Jones and J. Gabbay, “Quick and clean”: Authoritative health technology assessment for local health care contracting, Health Trends 27 (1995) 37–42.

[19] P. Salzmann, K. Kerlikowske and K. Phillips, Cost-effectiveness of extending mammography guidelines to include women of 40–49 years of age, Annals of Internal Medicine 127 (1997) 955–965.

[20] J.L. Larriba-Gonzalez, S. Serrano, M. Alvarez-Mon et al., Cost-effectiveness analysis of interferon as adjuvant therapy in high-risk melanoma patients in Spain, European Journal of Cancer 36 (2000) 2344–2352.

[21] R. Rettig and N. Levinsky, Kidney Failure and the Federal Government (National Academy Press, Washington, DC, 1991).

[22] Inflation calculator, www.westegg.com/inflation/. Accessed on 6/1/02.

[23] Below threshold of 20 000 Euros/LYG, PharmacoEconomics and Outcomes News Weekly 298 (2001).

[24] M. Weinstein, From cost-effectiveness ratios to resource allocation: Where to draw the line? in: Valuing Health Care: Costs, Benefits, and Effectiveness of Pharmaceuticals and Other Medical Technologies, ed. F. Sloan (Cambridge University Press, New York, 1996).

[25] Drugs and the NHS’s £30,000 question, Financial Times (2001).

[26] Federal Register, Proposed Rules, 64FR36523, 06/07/99 (1999).

[27] Federal Register, Proposed Rules, 64FR627772, 17/11/99 (1999).

[28] D. Kenkel, Using estimates of the value of a statistical life in evaluating regulatory effects (2002).

[29] S. Birch and A. Gafni, Cost effectiveness utility analyses. Do current decision rules lead us to where we want to be? Journal of Health Economics 11 (1992) 279–296.

[30] P. Sendi, A. Gafni and S. Birch, Opportunity costs and uncertainty in the economic evaluation of health care interventions, Health Economics 11 (2002) 23–31.

[31] A. Wagstaff, QALYs and the equity-efficiency trade-off, Journal of Health Economics 10 (1991) 21–41.

[32] P.A. Ubel, G. Loewenstein, D. Scanlon and M. Kamlet, Individual utilities are inconsistent with rationing choices, Medical Decision Making 16 (1996) 108–116.

[33] A. Williams, QALYs and ethics: A health economist’s perspective, Social Science and Medicine 43 (1996) 1795–1804.

[34] P. Dolan, The measurement of individual utility and social welfare, Journal of Health Economics 17 (1998) 39–52.

[35] A.A. Stinnett and J. Mullahy, Net Health Benefits: A new framework for the analysis of uncertainty in cost-effectiveness analysis, Medical Decision Making 18 (1998) S68–S80.

[36] B.A. van Hout, M.J. Al, G.S. Gordon and F.F.H. Rutten, Costs, effects and C/E ratios alongside a clinical trial, Health Economics 3 (1994) 309–319.

[37] D. Eddy, Oregon’s methods: Did cost-effectiveness analysis fail? JAMA 266 (1991) 2135–2141.

[38] E. Nord, Unjustified use of the Quality of Well-Being scale in priority setting in Oregon, Health Policy 24 (1993) 45–53.

[39] D. Hadorn, Setting health care priorities in Oregon: Cost-effectiveness meets the rule of rescue, JAMA 265 (1991) 2218–2225.

[40] P. Ubel and G. Lowenstein, Distributing scarce livers: The moral reasoning of the general public, Social Science and Medicine 42 (1996) 1049–1055.

[41] P. Ubel and G. Lowenstein, Public perceptions of the importance of prognosis in allocating transplantable livers to children, Medical Decision Making 16 (1996) 234–241.

[42] R.J. van Zwart, H. Lukens, J. Busschbach et al., Differences in attitudes, knowledge and use of economic evaluations in decision making in The Netherlands: The Dutch results from the EUROMET project, PharmacoEconomics 18 (2000) 149–160.

[43] C. Evans, E. Dukes and B. Crawford, The role of pharmacoeconomic information in the formulary decision-making process, JMCP 6 (2000) 108–121.

[44] E. Nord, Severity of illness versus expected benefit in societal evaluation of healthcare interventions, Expert Rev. Pharmacoeconomics Outcomes Res. 1 (2001) 85–92.