
Measure for measure

Using outcome measures to raise standards in the NHS

Richard Hamblin and Janan Ganesh

Edited by Gavin Lockhart


£10.00

ISBN: 978-1-906097-07-3

Policy Exchange
Clutha House
10 Storey’s Gate
London SW1P 3AY

www.policyexchange.org.uk Think Tank of the Year 2006/2007

Modern enterprises see information on their performance as vital – what is extraordinary about the NHS is that it spends so much on hospital care and knows so little about what this spending achieves. Outcome data currently focuses on mortality and readmission. This excludes around 90 per cent of hospital admissions. Whenever information has been collected on healthcare, it has revealed serious failings that require corrective action, and has identified high performers from whom others can learn.

This report examines measurement schemes in the UK and US that provide essential lessons for policy makers and incorporates groundbreaking polling that reveals how patients would react to information on quality. The research shows that measures of quality must have both visibility and credibility for clinicians and that publishing measures, rather than just reporting confidentially back to providers, increases the likelihood of driving up quality. The authors conclude that the effects of linking formal incentives (payment) to quality are ambiguous and that although patients want quality of care information, it is not clear that patient choice is stimulated by the publication of performance data.


Measure for measure Using outcome measures to raise standards in the NHS

Richard Hamblin and Janan Ganesh

Edited by Gavin Lockhart

Policy Exchange is an independent think tank whose mission is to develop and promote new policy ideas which will foster a free society based on strong communities, personal freedom, limited government, national self-confidence and an enterprise culture. Registered charity no: 1096300.

Policy Exchange is committed to an evidence-based approach to policy development. We work in partnership with academics and other experts and commission major studies involving thorough empirical research of alternative policy outcomes. We believe that the policy experience of other countries offers important lessons for government in the UK. We also believe that government has much to learn from business and the voluntary sector.

Trustees

Charles Moore (Chairman of the Board), Theodore Agnew, Richard Briance, Camilla Cavendish, Richard Ehrman, Robin Edwards, George Robinson, Tim Steel, Alice Thomson, Rachel Whetstone.


About the authors

Richard Hamblin
Harkness Fellow, Group Health Co-operative of Puget Sound
Richard is a 2006-07 Commonwealth Fund Harkness Fellow in Healthcare Policy and Practice, based in Seattle at the Centre for Health Studies. His work there concentrates on the way that publicising information about the quality of health services may improve them. Previously he was head of analytic support in the Healthcare Commission, the English health services regulator. His primary responsibility there was ensuring a rigorous analytic and evidential base for regulatory activities. Richard’s previous career encompasses NHS management and health services research, including a spell as a research officer at the King’s Fund where he produced studies on London’s healthcare system, the effect of the introduction of the internal market into the NHS in the 1990s and methods of modelling waiting-list dynamics. His particular interest is in health informatics and the use of rigorous analysis of performance data as a tool for improvements in quality and efficiency of health services. He is a graduate of University College London (1992). He writes here in a personal capacity and views expressed are not necessarily those of the Commonwealth Fund, its directors, officers or staff.

Gavin Lockhart
Research Director, Policy Exchange
Gavin is Research Director at Policy Exchange with responsibility for healthcare and police research. He studied Sociology at Edinburgh University. After graduating in 2002 with a first-class degree, Gavin worked as a management consultant before joining Policy Exchange in August 2006.

Janan Ganesh
Research Fellow, Policy Exchange
Janan is a Researcher at Policy Exchange. He was born in 1982 and graduated top of his year from Warwick University (BA Politics) before studying public policy at UCL, where he focused on health. In 2006, he co-authored Compassionate Conservatism for Policy Exchange and won the T.E. Utley Memorial Prize for young journalists. He has also worked for Zac Goldsmith.


© Policy Exchange 2007

Published by
Policy Exchange, Clutha House, 10 Storey’s Gate, London SW1P 3AY
www.policyexchange.org.uk

ISBN: 978-1-906097-07-3

Printed by Heron, Dawson and Sawyer
Designed by John Schwartz, [email protected]


Contents

Executive Summary 5

1 Introduction 9
2 The current situation 13
3 From information to action: five case studies 20

Cardiac Surgery Reporting System in New York State 20
Clinical Resource and Audit Group Outcome Indicators Reports 23
Veterans Health Administration National Surgical Quality Improvement Program 25
Healthcare Commission Star Ratings 30
The London Patient Choice Project 32

4 Patients, payment and providers 35
5 Technical difficulties 42
6 Patient Reported Outcome Measures (PROMs) 47
7 Outcome Measurement in Primary Care 51
8 Recommendations 57

Appendix one: explaining why US comparisons are valid 67
Appendix two: information for accountability 69
Appendix three: information for patient choice and activation 71

Glossary 72


Acknowledgements

Policy Exchange would like to thank John Nash, Hugh Osmond, Henry Pitman and Mike Sherwood who supported this research project. We would also like to thank the other donors who generously and anonymously supported the project.

Special thanks go to Professor Gwyn Bevan, Mohammed Mohammed, Philippa Ingram and all those who commented on early versions of this report. The Policy Exchange Health and Social Care Steering Group also provided invaluable guidance.

The views expressed are not necessarily those of members of the Policy Exchange Health and Social Care Steering Group or of those who commented on drafts of the report.


Executive Summary

From information to action: context
The last two decades have been characterised by a revolution in the availability of information. This has included a transformation of the capacity to store, retrieve and disseminate data (including, but not limited to, the growth of the internet), the questioning of previously accepted intellectual authority,1 and more coherent theorising about the value of information to society and the uses of information as a resource.2 The effects of this revolution are as noticeable in healthcare as anywhere else.

There are, broadly, two different types of data about health and healthcare that are increasingly available to the public. The first is about diseases and remedies,3 and the second is about the quality and efficiency of health services. Although clearly different in focus, both assume a view of patients that transcends the traditional perception of them as passive recipients of care and information, seeing them instead as more pro-active consumers of services, motivated to learn more about their illnesses, their treatments and most importantly about the quality of the carers and care providers.

At its foundation, the NHS agreed a ‘bargain’ with doctors in which they accepted budgetary constraints in return for clinical autonomy and freedom from outside scrutiny. The key to this deal was its assumption of altruism and professionalism on the part of clinicians. Until the late 1980s, the image of the NHS was that its resources were limited and directed at ensuring good clinical care, and that its outcomes were as good as, if not better than, those achieved in comparable countries.

This impression was shattered in the 1990s by evidence from international comparisons which showed that the UK had poor outcomes for cancer, heart disease and stroke – the country’s principal burden of disease.

In addition, a series of scandals involving incompetent (and, in some cases, malign) doctors attracted dreadful publicity. Particularly shocking was that these abuses were allowed to continue unchecked for long periods.

The Government’s response was to develop policies that put quality at the heart of the programme of NHS reform. Using information has been an essential part of this strategy. For example, a range of national clinical audits was introduced to strengthen clinical governance, a Modernisation Agency was created and the NHS Information Centre was established. The NHS has a huge amount of data. But the Government and Department of Health currently focus on the ‘wrong’ measurements of performance.

Modern enterprises see information on their performance as vital – what is extraordinary about the NHS is that it spends so much on hospital care and knows so little about what this spending achieves. Outcome data currently focuses on mortality and readmission. This excludes around 90 per cent of hospital admissions. Whenever information has been collected on healthcare, it has revealed serious failings that require corrective action, and has identified high performers from whom others can learn.

Measure for measure focuses on information about the quality of health services and explores ways in which it can lead to better quality healthcare for all. We argue that the information we need is not usually available and that, even if it were, it would not by itself lead to the improvement of health services. If information alone were enough, none of the scandals identified below would have been allowed to go on for so long.

“The Government and Department of Health currently focus on the ‘wrong’ measurements of performance”

1. Giles J, "Internet encyclopaedias go head to head", Nature, 2005
2. Such as Cleveland H, "Information as Resource", The Futurist, December 1982
3. Sometimes described as 'information therapy', this is increasingly available through various media (http://www.information-therapy.org/) and has led to the concept of the 'expert patient'

In chapter two we explain that there are different ways in which the publication of data may lead to quality improvement. One is that providers themselves actively respond, encouraged by incentives to improve their performance.4 Another is that the changed behaviour of patients can drive up standards. Patients may choose better providers, but may also become more confident in seeking improved services from their existing provider. But using information to measure quality is problematic. It is more difficult to assess the quality of primary care5 than secondary care.6 Chronic care, which accounts for much of primary care, often involves multiple agencies (not just the GP), multiple interventions (not just a one-off operation) and, by definition, sees declining health outcomes for patients over time. There is an understandable enthusiasm for the measurement of primary and chronic care to be developed further, and the quality and outcomes framework (QOF) is a strong foundation on which to build. This does not mean, however, that the issue of hospital measurement has been resolved.

There remain obstacles to using information in helpful ways. One of these is getting the level of information correct. The practice of giving overall ratings for individual hospital trusts, followed by the Government since 2000, has led to simplistic assessments and has left gaps in the information available below the level of the hospital. For example, to know that a hospital has low mortality rates for medical and surgical admissions tells us nothing about sub-specialties within surgery.7

Given the present moves towards transparency and external quality measurement, the publication of data is no longer controversial. Indeed, it has come to be seen as inevitable. The question remains, however: how should it be done to ensure the maximum benefit for patients?

Learning from previous measurement schemes
There have been a number of measurement schemes in the UK and US that provide useful lessons for the future. In chapters three and four we consider case studies of attempts to use information to improve quality (three from the United States, two from England and one from Scotland) and conclude as follows:

• to be successful, the measures themselves must be both visible and credible for clinicians; those that are not will be ignored;

• information that is primarily used internally can be used to improve care (as in the case of the Veterans’ Administration in the US) when there is a robust improvement programme that includes on-the-ground investigation to identify and correct problems, and sharing of good practice;

• publishing the measures, rather than just reporting confidentially back to providers, seems to increase the likelihood of changing their behaviour, as was noticed in the QualityCounts initiative in Wisconsin, US, although, as the New York case study shows, there is increased concern about gaming when measures are made public;

• the effects of linking formal incentives to quality measurement, such as pay for performance, are ambiguous. Incentives encourage unintended perverse responses such as gaming and falsification of data, and it is not clear that a financial incentive improves performance enough to justify its extra cost;

• research, including our own survey, shows that patients want quality of care information, yet it is not clear that patient choice is necessarily stimulated by publication of information. Very well studied publication schemes in the US, such as QualityCounts and cardiac surgery mortality in New York State, showed that publishing data did not lead to changes in hospital market share. The London Patient Choice Project, by contrast, appears to have been successful in encouraging choice, although this may reflect both the large infrastructure put in place to support choice, and the relatively simple metrics needed to understand how long one waits (a somewhat easier proposition than comparing the quality of two services).

4. Formal incentives here would be payment mechanisms or other sanctions and rewards; informal incentives would be kudos and censure for individuals and organisations.
5. The frontline of the NHS is officially called primary care. The initial contact for many people when they develop a health problem is with a member of the primary care team, usually their GP. Many other health professionals work as part of this frontline team: nurses, health visitors, dentists, opticians, pharmacists and a range of specialist therapists. NHS Direct and NHS walk-in centres are also primary care services.
6. Specialised medical services and commonplace hospital care (outpatient and inpatient services). Access is often via referral from primary healthcare services.
7. There are also organisational issues that need to be addressed. For example, the primary role of the Healthcare Commission is to assess how well NHS hospitals perform against Government targets and standards, and not to provide information below that level, although lower level assessments of individual services are available; for example, information about diabetes services has recently been released. And despite the emergence of other providers of information, gaps in data remain.

Technical issues
Given these examples of apparently successful use of measurement, one may wonder why the practice is not more widespread. Part of the answer is that it is hard to put into action. There are difficulties in establishing measurement regimes that stimulate improvements without also producing unintended consequences. These include the design of the incentive system around the measurement system; ensuring adequate data quality; and designing analysis that is sufficiently sophisticated to allow for the complexities of healthcare, and is easily understood by its target audience. There are also problems in deciding where and how to publish the results. These difficulties are technical rather than ideological, and are discussed in chapter five.

Lessons for reform
If we are serious about improving the quality of care, then we need much more precise information than the limited amount that is available at present. Nothing is known about the outcomes of most patients discharged from NHS hospitals. It has been argued that patient-reported measures offer an important adjunct to clinicians in the care of their patients.8 Self-completed questionnaires with adequate measurement properties offer a quick way for patients to provide evidence of how they view their health – evidence that can complement existing clinical data. Instruments applied in this context can be used to screen for health problems and to monitor progress of health problems identified, as well as the outcomes of any treatment.

As we argue in chapter six, patient-based outcome measures may also help change the culture of the NHS; an organisation which is far from universally patient-focused. It is a politically-led organisation, where the Government, not the patient, is the paymaster.

At the end of this report we present a vision of the potential future landscape of healthcare quality information that reforms could facilitate and we make eighteen recommendations. We believe that such a landscape would provide accountability, promote increased patient trust, and improve performance.

The information needed falls into three categories:

Information for accountability
Accountability measures are designed to assure taxpayers that their local health services (including NHS providers, Independent Treatment Centres and private providers doing publicly funded work) do two things:

• provide at least a minimum acceptable level of care, including safety, access, clinical competence, and compassion;

• spend public money with due care and consideration and achieve the goals which the Government decides are appropriate.

9. Tarlov AR et al, "The Medical Outcomes Study. An application of methods for monitoring the results of medical care", Journal of the American Medical Association, 1989; Nelson EC, Berwick DM, "The measurement of health status in clinical practice", Medical Care, 1989

“Patient-based outcome measures may also help change the culture of the NHS; an organisation which is far from universally patient-focused. It is a politically-led organisation, where the Government, not the patient, is the paymaster”

Information for patient choice and activation
Information is often seen as facilitating patient choice, but it also serves activation, the patients’ willingness to be involved and assertive in the decision-making in their own care. Moreover, choice can only really apply to predictable elective care, and especially surgery. It is not generally relevant to emergency care or to management of long-term conditions, where it will tend to conflict with continuity of care. Whether for choice or activation, releasing small amounts of information about the quality of specific services to patients with specific conditions, rather than consumers generally, will provide invaluable support.

Information for providers
This information is used to activate providers’ intrinsic motivations of professionalism and altruism. It does not need to be published but does require a mechanism to share the information with providers and hospitals, and to encourage and monitor improvement.


1 Introduction

This chapter sets out the recent historical context of reform in the NHS, the different ways in which information can be used to drive quality improvement and the information that is currently available.

Context: information and governance
At the start of the 1990s the NHS was still recognisable as the organisation that was created after the Second World War. This was despite the introduction of general managers in the 1980s and an internal market from 1991. As an institution, it enjoyed public esteem; doctors were trusted by their patients to do the best they could within the resources that were available. The image of the NHS was one whose limited resources were directed at ensuring good clinical care, and whose outcomes were as good as, if not better than, those achieved in comparable countries. This image was shattered in the 1990s, as evidence emerged of relatively poor outcomes for cancer, heart disease and strokes in England and Wales – the principal causes of the country’s burden of disease.

In addition, a series of scandals involving patients suffering at the hands of incompetent (and, in some cases, malign) doctors attracted dreadful publicity. Particularly shocking was that these abuses were allowed to continue unchecked for so long. One of the most notorious cases was at the Bristol Royal Infirmary, where an exceptionally high number of babies died after heart surgery. The ‘Kennedy Report’ of the public inquiry into the Bristol case identified the root cause of such inaction as the absence of any system to govern and regulate quality of care in the NHS.9 The report observed that individuals who were aware of the problem did not believe it was their duty to act.

We give evidence in this report of serious lapses in the quality of care in the NHS that came to light in the 1990s and consider the policies that have been introduced over the past ten years to remedy them. Before considering what we can learn from experience in the UK and elsewhere, it is necessary to understand why the NHS did so little to analyse, monitor and govern the quality of its care.

Like many public services in North America and Western Europe, the NHS has traditionally been a “network” model of public service delivery. This is a system where the State funds and provides a service, and trusts frontline workers to perform their roles with altruism and professional competence.10

The lesson much of the British elite and electorate took from the nation’s successful struggle against Nazi Germany was that the liberal individualism of the Victorian era should be replaced by the kind of high-minded collectivism that won the war. This collectivism, embodied by the Beveridge Report of 1942, was reinforced by the experience of the emergency medical service during the war: many from voluntary hospitals worked outside London and were shocked by what they found.

This faith in public sector workers led to the notion that scrutiny of their performance and incentives for improvement were unnecessary. It was enough to rely on the public service ethos of those who staffed the welfare state to provide the best possible performance. And nowhere did this assumption run deeper than in the NHS.

9. Learning from Bristol - Report of the Public Inquiry into Children's Heart Surgery at the Bristol Royal Infirmary [Kennedy Report], The Stationery Office, 2001
10. Le Grand J, Motivation, Agency and Public Policy: Of knights and knaves, pawns and queens, Oxford University Press, 2003

The “deal” between NHS workers and the Government whereby the latter grants the former a greater or lesser degree of independence in discharging their duties is an example of what Christopher Hood has called “public sector bargains”. These are “explicit or implicit agreements between public servants – the civil or uniformed services of the state – and those they serve. The other partners in such bargains consist of politicians, political parties, clients, and the public at large.”11 Public sector bargains are part of the “living constitution” of a state. As Hood remarks: “Any meaningful democracy can exist only if at least the police and armed forces show some allegiance to the elected Government”.12 But these bargains extend to other professions as well, and include health workers.

Hood distinguishes between two broad categories of public sector bargain. One is the “principal-agent bargain”, in which public servants are seen as rational actors who are willing to subvert or ignore the interests of the Government or the general public if their own interests so require. The NHS, however, was organised on the basis of the second type, the “trustee-type bargain”. This is derived from the legal concept of property being put under the charge of a “trust” for the benefit of others. Trustees are not controlled by their beneficiaries, whose powers are limited to being able to take legal action to ensure that the trustees comply with the terms of the trust. A trustee bargain, therefore, is one in which public servants (the trustees) are afforded wide latitude to determine what is in the interests of the general public (the beneficiaries).

Under such a bargain, “the tenure and rewards of public servants are not under the direct control of those for whom they act, the skills and competencies they are expected to show are not determined by the instrumental interests of elected politicians, and loyalty lies to an entity that is broader than the elected Government of the day.”13 The NHS has many of the characteristics of a trustee-type bargain.14

Rudolf Klein has suggested that, from the start of the NHS, there was an implicit concordat between the state and the medical profession, in which clinicians accepted budgetary constraint in return for clinical autonomy.15 In Julian Le Grand’s words, “politicians and civil servants (allocated) resources at macro level and gave medical professionals almost complete clinical freedom to make ground-level decisions as to which patients should receive what treatment”.16 They were, to employ Le Grand’s famous distinction, “knights” rather than “knaves”. A corollary was that it was unnecessary to scrutinise the performance of those who worked in public services. The caring professions were assumed to be a model of “knightly” behaviour.

So, from its creation in 1948, the NHS has essentially remained a network organisation, with clinical autonomy over care integral to its organisational DNA. There are many advantages to this, not least high morale for professionals and low administrative costs.

However, there are two bold assumptions behind such an approach to healthcare: that doctors are all “knights” who put their patients’ interests before their own and that they all also routinely perform to acceptable levels of competence. The experiences of the 1990s challenged these assumptions profoundly.

The crisis in quality
Whatever the organisational changes that took place over the first 50 years of the NHS, there was no challenge to the theory of professional self-regulation and the exclusion of outside scrutiny of quality of care. It is important to recognise that this organisational pattern, reflecting as it does the trust of the public in the caring professions, is the explanation for the crisis of quality that threatened the NHS in the 1990s.

11. Hood C and Lodge M, The Politics of Public Service Bargains: Reward, Competency, Loyalty - and Blame, Oxford University Press, 2006
12. Ibid
13. Ibid
14. Such as the broad discretion afforded to clinicians, in the absence of systematic scrutiny and monitoring of performance until relatively recently, and in its ongoing failure to attempt to measure the extent to which the health of patients improves as a result of treatment.
15. Klein R, The New Politics of the National Health Service, Radcliffe Press: Oxford, 2006
16. Le Grand, 2003

This pattern was so deeply rooted in the NHS that the three fundamental reorganisations in the 1970s, 1980s and 1990s simply entrenched it further. The first significant reorganisation of the NHS, in 1974,17 was designed on the principle of consensus management, which held that no decision could be made without the unanimous agreement of all members of the management teams. This meant that medical representatives could veto change. Mrs Thatcher’s Government sought to end consensus management in the 1980s with the introduction of general managers.18 However, these managers had no responsibility for the quality of care, which continued to be seen as a professional prerogative. Then, alongside the creation of the internal market from 1991, the Thatcher Government introduced medical audit. However, medical audit proved singularly ineffective in the Bristol case, where the ‘Kennedy Report’ found it was outside the normal managerial processes and lacked a presence in the area of children’s heart surgery.

Table 1 gives five examples of scandals that came to light in the 1990s and that were the subject of inquiries commissioned by the

www.policyexchange.org.uk • 11


Table 1: Delays in responding to a series of scandals

Failures of children’s heart surgery at the Bristol Royal Infirmary
  Nature: 1991-1995, at least 30 more children died than would have been expected in a typical unit
  First concerns raised: 1986, by South Glamorgan Health Authority
  Action taken: 1995, DoH advises that an outside independent inquiry is essential; 1997, inquiry reports

Actions of gynaecologist Rodney Ledward
  Nature: found guilty of bungling 13 operations and allegedly damaging hundreds more women
  First concerns raised: 1986, senior management is made aware of Mr Ledward’s complication rate and his cavalier manner
  Action taken: 1996, suspended and later dismissed by South Kent Hospitals NHS Trust; 1998, struck off medical register; 1999, Ritchie inquiry set up; 2000, Ritchie inquiry reports

Organ-stripping at Alder Hey Children’s Hospital, Liverpool, by senior pathologist Dick van Velzen
  Nature: large number of hearts from deceased children retained without the consent of their parents
  First concerns raised: 1989 onwards, Alder Hey and the university missed numerous opportunities to discipline Professor van Velzen
  Action taken: 1999, Redfern inquiry announced; 2001, Redfern inquiry reports

Yorkshire GP Harold Shipman administered lethal drugs to his patients
  Nature: 1975-1998, he may have killed more than 200 patients
  First concerns raised: 1998, a GP in a neighbouring practice reported her concerns to the local Coroner
  Action taken: 1998, charged with murder following forgery of a will; 2000, convicted of 15 counts of murder

Indecent assaults by Loughborough GP Peter Green
  Nature: 1980s-90s, described as preying on young male patients for 17 years
  First concerns raised: 1985, patients tried to raise concerns with the family health service authority
  Action taken: 2000, jailed for eight years for nine counts of indecent assault

17. Management Arrangements for the Reorganised Health Service, Department of Health and Social Security, HMSO, 1972

18. Pollitt C, Harrison S, Hunter D, Marnoch G, "General Management in the NHS: The Initial Impact 1983-88", Public Administration, 1991


Secretary of State for Health in the period 2000-2002. In each case, individual doctors abused the trust that patients had placed in them, and in each case there was an extraordinary reluctance from other parties to act on the information supplied. In the Shipman case, the failure of the police inquiry to act on suspicion of murder was fortunately followed soon after by the arrest of Shipman, who drew attention to himself by incompetently forging a will.19 But that failure illustrates the general problem of disbelief in what the evidence suggested to be the case: that a popular GP was murdering his patients. In each case medical behaviour was an anomaly at variance with the assumption that it is right to trust all doctors to act in the best interests of their patients.

In addition to these scandals over lapses by a few individuals, there was evidence from international comparisons of two different kinds of systemic failing. First, the UK population had relatively poor outcomes for cancers, heart disease and strokes. Secondly, they waited much longer for hospital care than patients in other countries with national health services.

The Labour Government’s response to these twin failings was to develop policies that put quality at the heart of its programme of NHS reforms. In 2003, two experts characterised the quality agenda for the NHS in England as being the “most ambitious, comprehensive, systemic and intentionally funded effort to create predictable and sustainable capacity for improving the quality of a nation’s healthcare system”.20 Using information has been an essential part of this strategy. Developments have included a range of national clinical audits to strengthen clinical governance, as well as the creation of the Modernisation Agency and the NHS Information Centre.21

Over the past decade and a half, the assumption that doctors will inevitably be altruistic and competent has been disproved by events. As a result the principle of external measurement of quality in UK health services is no longer controversial, and to many seems inevitable. But the question remains: how should quality measurement be carried out to ensure the maximum benefit for patients?


19. The Shipman Inquiry, First Report: Death Disguised, Secretary of State for Health, 2002

20. Leatherman S, Sutherland K, The Quest for Quality in the NHS: A Mid-Term Evaluation of the Ten-Year Quality Agenda, The Stationery Office, 2003; and Leatherman S, Sutherland K, "Quality of care in the NHS of England", 2004

21. http://www.ic.nhs.uk/ (Last accessed 03.05.07)


2 The current situation

Summary

This section of the report describes what information we currently have, what information we need, what effect the publishing of information might have, and the obstacles that must be overcome if information is to drive greater accountability and quality.

How information can be used to improve quality of care

There are two reasons for measuring performance and putting the resulting information in the public domain. The first is to increase the accountability of healthcare organisations and professionals. The second is to stimulate improvements in the quality of care provided.22 The fact that publication leads to increased accountability is self-evident, yet the way in which improvement occurs is complex and is explored below.

There are at least four different mechanisms whereby making information public may effect improvement:

• Information may help patients to “shop around” for the best provider or institution;

• Patients may use the information to give themselves greater confidence and power in negotiating the healthcare system without necessarily changing the provider or institution they go to;

• Purchasers of healthcare may make better and more targeted decisions about where to purchase care;

• Providers and institutions may themselves be motivated to improve regardless of any other incentive, by virtue of the kudos that a good report imparts.

However, whatever theoretical ‘model’ is employed to explain improvements in quality, there are two reactions that proponents of greater use of information focus on: changes to provider behaviour and changes to patient behaviour.

Changing provider behaviour

An increasing number of schemes further seek to encourage these changes by linking incentives with measurement of quality and publication of the results. In some cases this is quite explicit, as in pay-for-performance schemes that link some portion of physicians’ income to achievement of specific targets. The new GP contract is a prominent example of this.23,24 But responses from healthcare providers to several publication schemes, where no financial incentive was offered, suggest an alternative model of “kudos and censure”. In this model, public disclosure of performance information places pressure on individual providers to improve their performance, regardless of explicit financial incentives or any response from patients. After all, no-one wishes to be seen to perform badly.

22. Marshall M, Shekelle P, Davies H, Smith P, "Public Reporting on Quality in the United States and the United Kingdom", Health Affairs, 2003

23. Doran T, Fullwood C, Gravelle H, "Pay-for-Performance Programs in Family Practices in the UK", New England Journal of Medicine, 2006

24. Lindenauer P, Remus D, Roman S, Rothberg M, Benjamin E, Ma A, Bratzler D, "Public Reporting and Pay for Performance in Hospital Quality Improvement", New England Journal of Medicine, 2007

Changing patient behaviour

A great hope of many who advocate publishing information about healthcare quality is that patients will choose the highest quality services, thus increasing the market share of high-quality providers and driving the poorest performers “up or out” (encouraging improvement in the face of lost business or forcing them to leave the market). Therefore, publication of performance information has often been expressed in terms of choice between different

providers.25 However, as our review of patient responses to information demonstrates, the majority of patients do not identify “choice” as their preferred use of performance information, so this expectation may be optimistic.

An alternative model is one in which patients use performance information to strengthen their position in regard to health services, either by boosting their ability to challenge elements of their care that they do not understand or agree with, or by having a better understanding of how well their doctor or hospital provides care. Proponents argue that, for personal and practical reasons, patients will not readily change their provider. They suggest that the information should be used to improve the existing doctor-patient relationship rather than as a tool for seeking new providers.

In order to understand better how patients react to information – and to test the concept that the publication of data encourages patients to change their healthcare provider – we commissioned a survey earlier this year.26 The results provide an insight into the ways that the public and patients react to the publication of data.

Patients want information to challenge or seek reassurance from their doctor (38 per cent). Some simply want to measure performance (17 per cent) but only a minority would use it to choose or change their provider (16 per cent and 13 per cent respectively). In other words, choice is not the primary use to which respondents would put information about quality. This finding challenges the assumption that the primary result of publishing such information will be to stimulate consumerism through choice and thereby improve quality. Rather, “better understanding” and “boosting confidence” are more commonly cited uses for information, suggesting that empowering the individual is more valued than helping them choose between providers. Strikingly, the most deprived socio-economic groups, and the oldest age group, most commonly cite these two uses.27

Unsurprisingly, we also found that there is a strong desire for information: 79 per cent of respondents – especially women and disadvantaged groups – want a ‘scorecard’ outlining their local surgery/hospital’s performance.

Respondents generally wanted information presented in aggregated forms (overall ratings for a service rather than individual measures, and for hospitals/practices rather than individual doctors) but they also wanted to compare hospitals with an expected level of performance, rather than have them ranked in a league table against each other.28

The information we have

The NHS collects a huge amount of data. But the Government and Department of Health currently focus on the ‘wrong’ measurements of performance, which fall into three categories:

• Failure: such as mortality, re-admission and re-operation. These measurements may be useful for some operations, such as heart surgery (although even here most operations do not end in such failure), but they are largely irrelevant to measuring the impact of the majority of the health service’s day-to-day work (e.g. treating chronic conditions such as diabetes or asthma).

• Throughput (predominantly in acute care): such as waiting times. The measurement of waiting times is important, not least because shorter waits for treatment may improve outcomes. But measurements of throughput do not account for the quality of care provided.

• Proxy, aggregated quality measures: such as the Healthcare Commission’s rating. The health service does produce ‘high level’ information on quality, but as we explain below, this ‘level’ of information is insufficient to drive necessary improvements to performance.29

25. http://www.nhs.uk/England/Choice/CompareHospitals.aspx?id=RQM01&id=RJ122&id=RJZ01&id=RJ701&cache=RQM01RJ122RJZ01RJ701st=cyh&sSearch=5LG (Last accessed 03.05.07)

26. Populus interviewed a random sample of 1,004 adults in Britain aged 18+ by telephone between May 22nd and 23rd 2007. Interviews were conducted across the country and the results have been weighted to be representative of all British adults.

27. Lower social groups would find relative information about healthcare providers useful (71 per cent for groups DE; 65 per cent for AB) and it would "boost their confidence" more (29 per cent for social groups DE; 16 per cent for AB).

28. Patients want an overall rating (57 per cent) more than detailed information (37 per cent) about providers, and want information about practices and hospitals (67 per cent), not individual doctors (29 per cent).
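The survey finding that respondents prefer comparing a hospital with an expected level of performance, rather than with a league-table rank, can be made concrete with a simple statistical sketch. This is an illustration only: the figures are hypothetical, the function name `outside_expected_range` is ours, and the normal approximation to the binomial is one common way of defining an “expected range”, not a method attributed to the NHS.

```python
import math

def outside_expected_range(deaths, cases, national_rate, z=1.96):
    """Flag a hospital only if its observed mortality rate falls outside
    a ~95% band around the national rate, given its caseload.
    Uses a normal approximation to the binomial distribution."""
    observed_rate = deaths / cases
    standard_error = math.sqrt(national_rate * (1 - national_rate) / cases)
    return abs(observed_rate - national_rate) > z * standard_error

# The same 3% mortality rate is judged differently at different caseloads:
print(outside_expected_range(30, 1000, 0.02))  # True: 3% on 1,000 cases is outside the band
print(outside_expected_range(3, 100, 0.02))    # False: 3% on 100 cases is within expected variation
```

The design point is the one respondents seem to intuit: a league table always produces a “worst” hospital even when every hospital is performing within expected random variation, whereas a comparison against an expected range only flags genuine outliers.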

There is some information on quality of care that is easily available on websites covering selected conditions and procedures, and the following analysis is based primarily on three major websites covering the UK:

• The NHS in England (www.nhs.uk/England) is provided by NHS Connecting for Health, which was formed as an agency of the Department of Health on 1 April 2005 with the primary role of delivering the National Programme for IT. Its website gives information, by specialty and sub-specialty, on the availability of services and waiting times.

• Dr Foster Intelligence (www.drfoster.co.uk) is a public-private partnership (between the Information Centre for Health and Social Care and Dr Foster Holdings LLP) that aims to improve the quality and efficiency of health and social care through better use of information. Its website was launched in February 2006 and gives information by trust and hospital, by specialty and sub-specialty, on indicators of structure, process and outcome.

• The Healthcare Commission (www.healthcarecommission.org.uk) is the independent inspection body for the NHS and the independent healthcare sector. Its website provides a hospital “health check”, a complex summary assessment of quality of services, the main elements of which are: core standards;30 national targets;31 and new national targets.32 The Healthcare Commission also gives information from national surveys of staff and patients, at the level of the hospital; and national clinical audits by hospitals at sub-specialty level. (The Healthcare Commission was created as the successor to the Commission for Health Improvement (CHI) in 2004.)

A fourth website, www.health.org.uk/qquip, is a good source for international comparisons on quality.

There is a danger, after ten years of data collection and analysis, of (prematurely) concluding that the NHS has largely provided what is required for assessing quality of care in hospitals, and that it is now time to move on to the more challenging areas of primary and chronic care, and to the development of the electronic patient record. These are indeed important areas, and chronic care consumes a large (and increasing) share of healthcare resources. There is also evidence that management in the community – measured through a range of process and outcome measures – both prevents expensive hospital admission and keeps patients healthier. The quality and outcomes framework (QOF) has given us both some reasonable measures of success and a consistent and high-quality data flow. Despite much progress and effort we are still far from resolving the weaknesses in measuring the quality of hospital care.

29. For example, the practice of giving overall ratings for individual hospital trusts, which has been followed by the Government since 2000, has led to simplistic methods of assessment and a lack of information about lower organisational levels.

30. For safety; clinical and cost effectiveness; governance; patient focus; accessible and responsive care; care environment and amenities; and public health.

31. These are dominated by waiting times for various services, but also include the proportion of people suffering from a heart attack who receive thrombolysis within 60 minutes of calling for professional help; booking and choice of hospital admissions; and reorganising cancelled operations.

32. This heterogeneous mix of Government priorities includes: mortality rates from heart disease, stroke and suicide; methicillin-resistant Staphylococcus aureus (MRSA) levels; participation of problem drug users in drug treatment programmes; health inequalities; adult smoking rates; the under-18 conception rate; NHS patient experience; obesity among children under 11; the 18-week wait from GP referral to hospital treatment; health outcomes for people with long-term conditions; and reducing emergency bed days.

Table 2, below, summarises the information we have on hospital admissions. This shows massive gaps in the information that is available once we go below the level of the hospital. Dr Foster Intelligence has helped to fill this gap by giving details for different hospital services. The best source of information on the clinical quality of care is provided by national clinical audits like those of cardiac surgery and stroke. However, information from national clinical audits is not easily accessible, and Dr Foster Intelligence does not report the details covered by these audits. These are, moreover, incomplete in that they omit the patient’s experience.

The national patient survey programme of the NHS gives information on patients’ experience of the processes of care, and indeed is the largest survey of its kind globally, but still cannot give information at specialty level. Perhaps most alarmingly, there is virtually no analysis of patients’ experience of the outcomes of care, yet a tried and tested tool exists to allow this to be done, and BUPA has shown how we can collect this data routinely and act upon it.

Table 2: Available information on quality of hospital care from the NHS

Dimension: Hospital | Specialty | Condition
Waiting times: Yes | Subspecialty | None
Structure: Some | Some | Selected
Patient experience, process: Some | Some | None
Patient experience, outcomes: None | None | None
Clinical process: Health check | Health check; some national clinical audits | Health check; some national clinical audits
Clinical outcomes: Mortality and readmission rates | Mortality and readmission rates; some national clinical audits | Mortality and readmission rates; some national clinical audits

The information we need

We consider information on four dimensions and from two perspectives. The four dimensions are: waiting times, structure, process, and outcome. Clearly, other dimensions could have been chosen, but these represent the overriding NHS priorities of the last ten years, and the most common areas for performance assessment and improvement. The two perspectives are the clinician’s and the patient’s. There is no need to explain what we mean by waiting times, but the other terms do require clarification. (The elements of structure, process, and outcome were identified as being fundamental to understanding quality of care in the seminal work of Donabedian.)33

• Measurements of structure. These include, for example, the adequacy of facilities and equipment, the qualifications of medical staff, and administrative structure. These measurements need to be used sparingly. They have limitations as they can encourage a “tick-box” mentality. That is, providers can become preoccupied with getting the right committees in place without changing the way anything is provided. However, evidence-based structural measures can be invaluable. For example, there is good evidence that patients treated in specialist stroke units fare better than in non-specialist hospital units.

• Measurements of clinical process. These use evidence of relationships between outcomes and the processes of care. The Government’s national service frameworks (NSF) and guidelines recommend what processes of care ought to be used for each condition.

• Measurements of clinical outcomes. These include mortality and hospital readmission rates. Indeed, these are the only data that the NHS routinely collects following discharge from hospital.

• Measurements of process from the patient’s perspective. These include quality of food, accommodation, cleanliness and being treated with dignity and respect.

• Measurements of outcome from the patient’s perspective. These capture data other than whether the patient died or was readmitted: in particular, whether that patient’s state of health has improved following discharge from hospital.

Table 3 summarises these perspectives and dimensions.


33. Donabedian A, "Evaluating the Quality of Medical Care", The Milbank Quarterly, 2005

Table 3: Information the NHS needs

Dimension: Clinical perspective | Patient perspective
Waiting times: Self-explanatory (both perspectives)
Structure: The adequacy of facilities and equipment, the qualifications of medical staff, administrative structure (both perspectives)
Process: Measures of clinical process, i.e. application of evidence-based care, e.g. following NSF and NICE guidelines | Quality of food, accommodation, cleanliness and being treated with dignity and respect
Outcome: Mortality and hospital readmission rates | Whether that patient’s state of health has improved following discharge from hospital


Obstacles to the use of healthcare outcome measurements

Despite enthusiasm and effort the NHS, like most health services globally, has made little progress in measuring its performance. The barriers to successful measurement (complexity, credibility, level, technology, technical expertise, organisational cost and ‘primary care’) are described in detail below:

• Complexity: Measuring outcomes in a sophisticated and accurate manner is difficult, while using simple output measurements like case fatality can be misleading. Many different measurements are currently available and no consensus has been reached on which should be preferred, or about which should take priority when they conflict.34 Conceptual problems, explained in further detail in chapter five, include being able to adjust for the case-mix (the different level of risk presented by each individual) of the patients being treated. This is similar to the arguments about the need to estimate the value added by schools to take account of the different mix of children on entry. Highly skilled surgeons who operate on difficult cases may have worse outcomes than less skilled surgeons who operate on simple cases. There are also problems in deciding the period during which outcomes ought to be measured.

• Credibility: Outcome measurements that exist are often not seen as credible by health professionals. This may on occasion be due to a fear of having their professional independence challenged or being held accountable for their performance. In contrast, where measures are sufficiently sensitive and credible, for example the approach taken to the publication of cardiac surgery mortality figures, measurement gets relatively high levels of support from clinicians.

• Level: A further problem is getting the ‘organisational level’ of the information right. There are two ways of simplifying the assessment of the quality of a hospital, and each aims to give a global indicator of hospital performance and hence of all its services. First, one can assume that the part of a hospital’s services that we can measure will act as a good proxy for all services in the hospital. For example, the outcome measurements we have for mortality and readmission rates may be assumed to give a good indication of the quality of services generally. Second, one can make estimates of the quality of the services provided at hospital level by measuring the satisfaction of patients with the processes of care, such as catering, cleaning, and the extent to which they were treated with dignity and respect.

Unfortunately, these methods are too blunt. A general acute hospital is a highly complex organisation. The CHI, in its inspections of the implementation of systems and processes to assure and improve the quality of care in acute hospitals, found that single-specialty hospitals tended to do best. Performance in multi-specialty hospitals varied greatly and there was often a dysfunctional clinical team.

In the case of multi-specialty hospitals, generic indicators are valid for those services that are organised at hospital level (catering, cleaning, diagnostic services), but not for those that vary between specialties and individual doctors. Thus, to know that a hospital has low mortality rates for medical and surgical admissions tells us nothing about mortality rates in surgical sub-specialties. In the Bristol case, the number of excess deaths from paediatric cardiac surgery would have been too small to have had an impact on its total rates of mortality; and there were important differences in outcomes between the two surgeons for different procedures.

34. Such as service costs, end-to-end response times, capacity levels, demand, referral rates, numbers of missed appointments ('DNAs'), clinical outcomes, waiting times/numbers by health professional/referring doctor specialty, length of stay, and patient satisfaction.

• Technology: Poor technological infrastructure has hindered the use of appropriate outcome measures. This is an issue that the Connecting for Health programme has attempted to address.35 For cancer care, there are registries that provide a history of treatment for individuals. There is hope that electronic records will do this in the future for all patients routinely, but the record of successful implementation of such massive IT projects cautions against this being a practical reality in the next few years.

• Technical expertise: Despite the creation of the Health and Social Care Information Centre in April 2005,36 there remains a shortage of staff, particularly at a local level, who are able to collate and analyse outcomes data. CHI consistently found that use of information within NHS trusts was the least developed of the various components of clinical governance.

• Cost: A key challenge for outcomes measurement is to ensure that the costs of collecting data and ensuring completeness, accuracy and standardisation are justified by the benefits derived. This assurance has not been proven.

• Primary care: It is much more complex to assess the quality of primary care. Chronic care, which accounts for much of primary care, usually involves multiple providers, multiple interventions and, by definition, sees the patient’s health outcome decline over time.
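The case-mix problem described under “Complexity” above can be made concrete with a minimal sketch. The figures and the function name `risk_adjusted_ratio` are hypothetical; the observed-to-expected comparison shown is the general idea underpinning risk adjustment, not any specific NHS or CSRS model.

```python
def risk_adjusted_ratio(patients):
    """Observed/expected mortality ratio for a list of (died, predicted_risk)
    pairs, where predicted_risk is the probability of death implied by a
    risk model for that patient. A ratio near 1 means outcomes are about
    what the case-mix predicts; well above 1 means worse than expected."""
    observed = sum(1 for died, _ in patients if died)
    expected = sum(risk for _, risk in patients)
    return observed / expected

# Surgeon A takes difficult cases; Surgeon B takes simple ones.
surgeon_a = [(True, 0.20), (False, 0.15), (False, 0.25), (False, 0.10)]
surgeon_b = [(True, 0.02), (False, 0.01), (False, 0.02), (False, 0.01)]

# Crude mortality is identical for both: 1 death in 4 cases (25%).
# Risk adjustment reverses the picture:
print(risk_adjusted_ratio(surgeon_a))  # ~1.43: close to what case-mix predicts
print(risk_adjusted_ratio(surgeon_b))  # ~16.7: far worse than expected
```

This is why, as the text notes, a highly skilled surgeon operating on difficult cases can look worse on crude rates than a less skilled surgeon operating on simple ones: the comparison only becomes meaningful once each surgeon’s expected deaths are computed from the risk profile of the patients actually treated.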

As we show in the case studies that follow, evidence indicates that outcome data can drive improvement in the quality of services, such as in cardiac care, where it has helped improve the quality of work performed by surgeons.

The next chapter gives examples of when information has been instrumental in improving services, and when it has been ineffective.


35. Formally the National Programme for Information Technology.

36. http://www.ic.nhs.uk/services/ (Last accessed 03.05.07)


3 From information to action

Five case studies

Plausible models, supported by good research, suggest that measuring quality of care and publishing the resulting information can improve healthcare. But it is clear that large gaps in the available data remain, and that there are considerable technical challenges to moving forward. To identify some potential ways of solving these problems we now consider five case studies of attempts to use information to improve quality (two from the US,37 two from England and one from Scotland):

• The publication of mortality rates following coronary artery bypass grafting (CABG) surgery in the Cardiac Surgery Reporting System, which began in New York state in 1989;

• The publication of annual reports of clinical outcome indicators by Scotland’s Clinical Resource and Audit Group (CRAG), 1994-2002, for use by the public and doctors;

• The Veterans Health Administration National Surgical Quality Improvement Program (NSQIP), which began in 1994;

• The annual star rating of the NHS in England (2001-2005), a single aggregate score (zero to three) linked to sanctions and rewards;

• The London Patient Choice Project (LPCP), which developed systems to give patients who had been waiting for more than six months the option of choosing another hospital to have their operation sooner.

Each of these uses of information is examined for purpose, selection of indicators, methods, presentation, dissemination and impact. Analyses typically find large variations in quality, whatever the measure (processes or outcomes). It is wrong to assume that, where clinical care has not yet been measured, there is no variation in quality. Ignorance ought not to be bliss. The significant gaps in the information that is available about NHS hospitals are a serious cause for alarm.

Cardiac surgery reporting system in New York State

Summary

• In New York, outcome measurement and publication encouraged both quality improvement and appropriate changes to the supply side of the market. Publication was followed by improvements in outcome, and stimulated some quality improvement activity. Whether from “knightly” or “knavish” considerations, poorly performing surgeons improved performance or ceased operating altogether.

• There is also some evidence of gaming, such as outmigration and upcoding of risk, at least at the margins. The noted changes in behaviour were by providers rather than patients.


37. Appendix one provides an overview of the American healthcare system, and why, despite its flaws, we believe that we can learn from the US in this particular case.


• As there is no effective 'control' in the New York studies, we do not know the relative importance of measurement and publication. Northern New England states achieved, through private reporting, similar results to New York, so it may be that measurement alone is sometimes enough.

Purpose

Perhaps the most famous and studied attempt to use information to improve quality of care began in 1989, when the New York State Department of Health developed methods to collect and analyse data to reduce mortality after CABG.

Selection

The focus on CABG surgery represented a conscious break from previous attempts at performance measurement, such as the Healthcare Financing Administration annual reports on mortality among hospitalised Medicare38 patients. The differences were twofold: the scheme was focused, and it was, or attempted to be, risk-adjusted. Cardiac surgery is a definite, singular intervention (an operation), for which there is an outcome (death or survival), which has a clear, if not necessarily causal, relationship with the intervention. This contrasts with mortality rates for a whole hospital, where a range of interventions, confounding factors and patient case-mix influence the results.

Much of this work was led by the cardiac advisory committee of the state's Department of Health: a group of cardiac surgeons, general physicians and consumers who guide the department on cardiac care. To adjust for the relative risk that different patients presented, the department developed the Cardiac Surgery Reporting System (CSRS), whereby clinical data was collected on all patients undergoing cardiac surgery in New York, such as their risk factors and demographic details. Risk factors included unstable angina, low ejection fraction, stenosis of the left main coronary artery, congestive heart failure and chronic obstructive pulmonary disease.

Hospitals provided the data on a quarterly basis to the department, which then, under the guidance of the committee, compared the mortality rates of the providers. This comparative data, which included crude, expected and risk-adjusted 30-day mortality rates by hospital and surgeon, was given to the providers and became the basis for efforts at improving quality of care. As even a critical study of the New York experiment observed: "With these important improvements, CSRS became the first profiling system with sufficient clinical detail to generate credible comparisons of providers' outcomes. For this reason, CSRS has been recognized by many states and purchasers of care as the gold standard among systems of its kind."39
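In reporting systems of this kind, the published risk-adjusted figure is conventionally the provider's observed-to-expected ratio scaled by the statewide rate. The sketch below illustrates that convention with hypothetical numbers; the function name and figures are ours, not CSRS's.

```python
def risk_adjusted_rate(observed_deaths, expected_deaths, statewide_rate):
    """Risk-adjusted mortality rate: (observed / expected) x statewide rate.

    'expected_deaths' is what the risk model predicts for this provider's
    case-mix; a ratio above 1 scales the provider above the statewide rate.
    """
    return (observed_deaths / expected_deaths) * statewide_rate

# Hypothetical provider: 12 deaths where the risk model expected 8,
# against a statewide rate of 2.45 per cent.
print(risk_adjusted_rate(12, 8, 2.45))  # 1.5 times the statewide rate
```

The point of the construction is that a hospital taking sicker patients is judged against a higher expected figure, not against the crude statewide average.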

Dissemination

Initially, the department released only the data on hospitals to the general public, reserving the data on specific surgeons solely for the providers. However, a lawsuit by the newspaper Newsday under the state's Freedom of Information laws led to the general publication of data on surgeons in December 1991. This provoked tremendous hostility from physicians, including the committee, which felt that the number of operations performed by each surgeon was too low to yield statistically meaningful comparisons. The committee responded by recommending that hospitals submit data that would make it impossible to detect the performance of individual surgeons. A compromise was eventually reached: data would only be released about surgeons who had performed at least 200 operations during the preceding three-year period.

Model of improvement

As originally envisaged (that is, with confidential reporting), the scheme would have made an appeal to the professionalism and altruism of individual surgeons. Following the intervention of Newsday and the publication of the data, the model of improvement became an implicit incentive for providers, for there was now a threat to their reputation and market share.

From information to action

38. A federal health insurance programme for people aged 65 and over and for individuals with disabilities.

39. Green J and Wintfield N, "Report Cards on Cardiac Surgeons – assessing New York state's approach", 1995

Results

Mark Chassin, commissioner of the department from 1992 to 1994, argues that "the registry produces reliable and valid measures of quality, and hospitals and cardiac-surgery programs throughout New York use this information to improve outcomes for their patients."40 The evidence is that a remarkable improvement in outcomes for cardiac surgery followed the introduction of the reporting system. From the beginning of 1989 to the end of 1992, risk-adjusted mortality for CABG in New York fell by 41 per cent, from 4.17 per cent to 2.45 per cent.41 This was a far more dramatic decline than was achieved elsewhere: Medicare patients undergoing CABG, for example, enjoyed only an 18 per cent reduction in mortality nationwide between 1987 and 1990.42 Only in northern New England states were similar results achieved during this period – and perhaps significantly, these states were subject to their own outcomes reporting process, if only a confidential one.

Impact

Despite the impressive results achieved in New York, publication of CABG mortality rates had no impact on the flows of patients to hospitals: improvements reflected changes in the behaviour of providers rather than patients. The reported improvements followed from three kinds of actions by providers:

• Stopping surgeons with poor outcomes from undertaking CABG. Individual surgeons with bad mortality rates left the market.43 More generally, the results showed that surgeons who performed fewer than 50 operations per year had higher mortality rates than high-volume surgeons. Some hospitals, therefore, began to restrict the operating privileges of low-volume surgeons: 27 stopped performing bypass surgery altogether between 1989 and 1992. All of these had had mortality rates in their final year of practice that were between 2.5 and 5 times higher than the state average.44 Other surgeons whose performance was unsatisfactory (many of whom had received their principal training in fields other than adult cardiac surgery, such as paediatric surgery) chose to retire from cardiac surgery. As poorly performing surgeons were moved away from the operating theatre, high-performers were given more high-risk patients.

• Poorly performing providers implemented specific quality improvement reforms. The committee intervened to provide expert advice to poor performers. In one case, a hospital suspended cardiac surgery until a new chief could be installed to reform the entire department.45

• Poorly performing providers also sought to understand the precise causes of their high mortality rates. Where these causes were not obvious in the initial data, they undertook research. Dziuban et al cite the example of St Peter's Hospital in Albany, the state capital, which struggled to understand why its mortality rate was significantly higher than the state average.46 Initial studies of mortality and morbidity showed no problems with quality of care, and reviews of risk-factor coding exposed no faults in the data. Eventually, with the help of the department, the hospital pinpointed its emergency provision of bypass surgery as the source of the problem: the mortality rate there was 26 per cent, compared to a state average of 7 per cent. A further review concluded that the problem was the failure to stabilise patients adequately before surgery (not using intra-aortic balloon pumping as much as required, for example). Once this specific failing had been addressed, St Peter's mortality rate improved to normal levels.

40. Chassin M et al, "Benefits and Hazards of Reporting Medical Outcomes Publicly", New England Journal of Medicine, 1996

41. Hannan E et al, "Improving the outcomes of coronary artery bypass surgery in New York State", 1994

42. Peterson E et al, "Changes in mortality after myocardial revascularization in the elderly: the national Medicare experience", Annals of Internal Medicine, 1994

43. Chassin et al, 1996

44. Ibid

45. Ibid

46. Dziuban S et al, "How a New York cardiac surgery program uses outcomes data", Annals of Thoracic Surgery, 1996

Omoigui et al raised questions over the various types of gaming behaviour undermining the reported successes. They suggested that the 41 per cent reduction in risk-adjusted mortality from 1989 to 1992 did not mean a genuine improvement in outcomes. First, a number of high-risk patients were treated in a neighbouring state (Ohio).47 Other studies have, however, questioned the scale of this outmigration.48

Moreover, as Omoigui et al recognised, it is not possible to ensure that all providers accept high-risk patients in a system where outcome measures are used nationwide. Their second criticism was that, despite outmigration, there was a dramatic increase in the reporting of risk factors of patients undergoing CABG after 1989: the expected mortality using the reported risk rose from 2.62 per cent in 1989 to 3.54 per cent in 1992. The authors stated that "since surgeons and hospitals are not only aware that they are under surveillance but that they are also responsible for primary data collection, artificial increases in patient severity scores could result from the selective emphasis of clinical characteristics".

If a system is undermined by gaming this does not necessarily mean that it is fatally flawed. Given the discussion on "knights and knaves", any system that aims to drive improvement will be likely to produce the "knavish" response of gaming. Indeed, we suggest that an absence of gaming indicates an absence of incentives for improvement. The above account shows that the New York reporting system produced data that surgeons and providers found credible. They were very much aware of the information, as this was in the public domain, even though it had no discernible impact on patients' behaviour. The information was published so that it was easy to interpret. It was used in benchmarking, and was reported at the organisational level where corrective action could be taken.

Clinical Resource and Audit Group Outcome Indicators Reports

Summary

• The Clinical Resource and Audit Group (CRAG) carried out a pioneering study that attempted, largely unsuccessfully, to change provider behaviour.

• The two evaluations of the CRAG Outcome Indicators reports provided criteria on which those reports failed: credibility, awareness, ease of interpretation, reporting results at the levels where corrective action can be taken, providing information in a form that can be used for benchmarking, timeliness, and use of incentives for providers.

• As Florence Nightingale constantly reminded herself, recommendations have to be carried out: there must be incentives for providers to act on what has been reported.

Purpose In 1994, Scotland’s Clinical Resource andAudit Group (CRAG) pioneered inEurope the publication of annual reportsof clinical outcome indicators for use bythe public and doctors.49 The group’sreports aimed to provide a benchmarkingservice for clinical staff by publishing com-parative clinical outcome indicators acrossScotland. Performance management wasseen as a matter for local action: there wereno external national pressures to act onCRAG’s indicators other than publication.

Selection

CRAG reports included two kinds of hospital clinical indicators: emergency readmission rates for medical and surgical patients; and mortality (or survival) after hospital treatment for hip fracture, acute myocardial infarction, stroke and selected elective surgery.

47. After reviewing 9,442 CABG operations performed in neighbouring Ohio between 1989 and 1993, the authors found that patients referred from New York were more likely to have had prior open heart surgery than patients from Ohio, other states or other countries (44 per cent compared with 21.5 per cent, 37.4 per cent and 17.3 per cent respectively). They were also more likely to be in the New York Heart Association's functional class III or IV, another sign of high risk. Both the expected and the observed mortality rates for the patients from New York were higher than for the other referral cohorts, a pattern that was not evident in the control period 1980-1988.

48. Chassin M, Hannan E and DeBuono B, "Benefits and Hazards of Reporting Medical Outcomes Publicly", New England Journal of Medicine, 1996

49. "Clinical Outcome Indicators. A report from the Clinical Outcomes Working Group", CRAG, 2002

Methods

An important development was the linking of data on hospital admissions and discharges to mortality data collected by the Registrar General, so that indicators of mortality (or survival) took account of deaths both inside and outside hospital. These indicators were standardised for age, sex and deprivation (based on the patient's area of residence) but not for risk (unlike New York's CSRS). The reports emphasised that variations in reported performance could have been due to varying proportions of higher and lower risk procedures, differences in the condition of patients on admission and differing diagnostic thresholds. CRAG reported performance on each indicator using methods deemed appropriate for each.
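Standardisation of this kind is typically indirect: reference (e.g. Scotland-wide) death rates for each age/sex/deprivation stratum are applied to a hospital's own mix of patients to give expected deaths, which are compared with observed deaths. A minimal sketch, with made-up strata and rates rather than CRAG's actual data:

```python
def standardised_mortality_ratio(strata):
    """Indirectly standardised mortality ratio: observed deaths divided by
    the deaths expected if reference rates applied to this hospital's
    own case-mix. A ratio above 1 means more deaths than expected."""
    observed = sum(s["deaths"] for s in strata)
    expected = sum(s["patients"] * s["reference_rate"] for s in strata)
    return observed / expected

# Hypothetical hospital with two age/sex/deprivation strata.
strata = [
    {"patients": 200, "deaths": 30, "reference_rate": 0.10},  # expect 20
    {"patients": 100, "deaths": 10, "reference_rate": 0.10},  # expect 10
]
print(standardised_mortality_ratio(strata))  # 40 observed vs 30 expected
```

Note what this does and does not adjust for: the strata capture demography, not clinical risk, which is exactly the gap the CRAG reports acknowledged.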

Presentation

The last report presented information on the following:

• survival (at 30 days and 120 days) after hospital admission following hip fractures, acute myocardial infarction and stroke: total numbers of patients and those who survived, as well as crude and standardised mortality rates, and ten-year trends in survival for single years (without confidence interval estimates).

• surgical mortality (at 30 days) over three three-year periods: numbers of operations and deaths, and crude and standardised mortality rates (with 95 per cent confidence interval estimates).

• readmission rates (at seven and 28 days): total numbers of discharges and readmissions, and crude and standardised readmission rates.
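The 95 per cent confidence intervals attached to the surgical mortality figures are, in the simplest treatment, binomial intervals around the crude rate. The sketch below uses the ordinary Wald approximation as an illustration; the reports themselves do not specify their exact method.

```python
import math

def mortality_rate_ci(deaths, operations, z=1.96):
    """Crude mortality rate with an approximate 95% binomial (Wald) interval.

    z=1.96 is the normal quantile for a two-sided 95% interval.
    """
    p = deaths / operations
    se = math.sqrt(p * (1 - p) / operations)
    return p, max(0.0, p - z * se), p + z * se

# Hypothetical unit: 25 deaths in 1,000 operations over three years.
rate, low, high = mortality_rate_ci(25, 1000)
print(f"{rate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Wide intervals on small caseloads are precisely why low-volume surgeons and units resist being compared on raw rates, as the New York dispute over the 200-operation threshold showed.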

Dissemination

Initially, publication attracted the attention of the media (results were sometimes presented as "death league tables"). Over time, however, media interest waned and information was disseminated exclusively to chief executives and senior clinicians. The overriding concern of the last report in 2002 was to highlight the limitations of these indicators. The report included the results of an investigation into the wide discrepancy found in survival rates after admission for stroke between two hospitals in Edinburgh. The explanation offered was that the closure of the Accident and Emergency unit at one of the hospitals meant that severe emergency cases of stroke were more likely to be taken to the other hospital, where the patient's first milder stroke may not have been recorded.50 An outsider is struck by the note of satisfaction which is evident in the way this negative conclusion was reported.51

Model of improvement

Given the lack of widespread dissemination, and the markedly lower enthusiasm for encouraging choice in provision of healthcare in Scotland than in England, the CRAG reports have primarily operated by changing provider behaviour; they have not used any incentives beyond appeals to the intrinsic motivations of professionalism and altruism.

Impact

There have been two evaluations of CRAG reports. One was by independent academics based in England,52 who explored their effects on NHS trusts in Scotland. The other was by a CRAG-funded clinical indicators support team,53 which investigated the requirements of health boards and trusts for clinical performance information. They checked whether the indicators met those requirements, and how and why they had been used. The main conclusion of the academics, Mannion and Goddard, was that in Scottish trusts these indicators "had a low profile…and were rarely cited as informing internal quality improvement or used externally to identify best practice".54 The clinical indicators support team came to similarly depressing conclusions.

50. The report's concluding observation on this investigation is: "This is just one example (albeit perhaps an extreme one) of the kind of issues relating to case mix which may affect apparent outcome in the context of this indicator of survival after admission with stroke."

51. Ibid

52. Mannion R and Goddard M, "Impact of published clinical outcomes data: case study in NHS hospital trusts", British Medical Journal, 2001

53. "Clinical Outcome Indicators. A report from the Clinical Outcomes Working Group", CRAG, 2002

Unlike New York’s cardiac surgeryreporting, clinical staff did not considerthe information published to be crediblebecause of the limited accuracy of codingand the lack of an adjustment for risk. Alsothe information was seen as being out ofdate (at least 18 months old at the time ofpublication). There was low awareness ofthe reports because of lack of media atten-tion and limited dissemination withintrusts. The indicators were neither linkedto a formal system of performance assess-ment nor used by health boards in holdingtrusts to account.

The two evaluations emphasise what the reports did not do. They did not identify: good or poor performance; key messages; where action could be required; how services were organised where outcomes were good; or possible reasons for variations in outcomes. Clinicians did not find the information useful for benchmarking because it was presented at the level of the hospital rather than the clinical team.

In many ways, the CRAG study illustrates how measurement alone will not improve quality. The next case study we investigate shows how measurement allied to a comprehensive quality improvement programme can have significant impact.

Veterans Health Administration National Surgical Quality Improvement Program

Summary

• The Veterans Health Administration (VA) is widely recognised as having transformed itself and become a health service provider of very high quality in the last 15 years. This has been achieved, in part, through a quality improvement programme founded on the collection and internal reporting of process and outcome measurements.

• In many senses, the VA is more like the NHS than many other parts of the US healthcare system: it provides care for a defined population, it is a system that integrates primary and secondary care, and it is a government-provided service. As such, its approach eschewed market mechanisms (its population of patients is "captive") and may be more easily transferable to the UK than others in the US.

• The VA has powerful lessons to teach about how to use information to improve services. The following is required to ensure that information is acted upon:

  • formalised, non-punitive feedback on performance;

  • sharing of best practice between different organisations;

  • using outlying performance as a spur to investigate, identify and correct underlying problems.

• This approach is one that relies on internal rather than public reporting. It is the approach we label "information for improvement" in our vision for 2010 (see chapter 7).

Purpose

The Veterans Health Administration, a part of the Department of Veterans Affairs (commonly known as the VA), is the largest provider of healthcare in the US, and is aimed at citizens who have served in its armed forces. As of 2002, it had 159 medical centres, of which 128 performed major surgery, 165 long-term care facilities and 376 outpatient clinics.55 The VA is the closest thing in US healthcare to the National Health Service. For many years, the VA was held up as an awful example of the perils of socialised medicine, as its quality of care was notoriously poor. Studies in the 1980s and early 1990s showed that the VA provided lower standards of care than private providers funded through the federal Medicare programme.56

54. Mannion R and Goddard M, 2001

55. Khuri S, Daley J and Henderson W, "The comparative assessment and improvement of quality of surgical care in the department of veterans affairs", Archives of Surgery, 2002

In 1986, Congress passed legislation requiring the VA to report annually the outcomes of its surgical operations compared to the national average, and to adjust those outcomes to take account of patients' conditions. The VA surgeons who were commissioned to respond to the congressional mandate reported that there were neither any known national averages for surgical outcomes nor any known risk-adjustment techniques that could be used for the various types of surgery.

There are a number of advantages for the VA in being an NHS-type system. It has:

• centralised authority;
• capacity for sophisticated medical informatics to develop both national averages and risk-adjustment techniques, and hence measure in a systematic and reliable way the quality of care it provides;
• a defined and captive population;
• the ability to drive change through internal processes, such as private reporting and intensive quality improvement efforts (rather than open publication of outcomes that threatens the loss of market share through competition).

The purpose of the VA programme was to collect the information necessary for a comprehensive system of quality improvement.

Selection

The VA began by collecting data on outcomes of surgery, but then extended this to different types of care:

• Preventive: mammography, influenza vaccination, cervical cancer screening;

• Inpatient: providing aspirin and beta blockers at discharge for acute myocardial infarction;

• Outpatient: screening for diabetes by glycosylated haemoglobin measurement and annual eye examination, and screening for depression.

Methods The VA’s first step was to launch a study ofsurgical risk. This used data from 117,000major operations collected by a nurse ineach of the 44 VA medical centres between1991 and 1993. The study developed andvalidated risk-adjustment systems for themeasurement of mortality and morbidityin nine different types of surgery (includ-ing eight non-cardiac ones) 30 days afterthe operation. It found that health out-comes were determined jointly by the riskfactor of patients before surgery, randomvariation and quality of care. If the firsttwo could be accounted for by appropriatestatistical methods, it would be possible toregard health outcome as a measure ofquality of care.

Satisfied with the validity of these techniques, the VA was ready to proceed with a programme to monitor the quality of care in all VA medical centres performing major surgery. This programme (NSQIP) began in 1994. A designated nurse at each centre collected data about 30-day mortality and morbidity and sent it for analysis to one of two centres: one for cardiac surgery (the Centre for Continuous Improvement in Cardiac Surgery), the other for non-cardiac surgery (the Hines VA Co-operative Studies Program Coordinating Centre). By the end of fiscal year 2000, there were 727,447 operations recorded in the NSQIP database.57

These centres edit the data and refer any potential errors back to the reporting centre for resolution.

These outcomes data were supplemented by process data that measured what proportion of interventions were the correct ones for patients' circumstances. For example, in the area of preventive care, were influenza vaccinations provided? In inpatient care, was aspirin provided to a patient at discharge after acute myocardial infarction? Using outcomes and process data together would serve as a meaningful measure of the quality of care, and allow quality improvement to be targeted at the precise elements of care that are failing.

56. Gardner J, "VA on the spot: care quality to be probed by Congress", Modern Healthcare, 1998. Also Kent C, "Perspectives: VA under fire for quality of care", Faulkner & Gray Medical Health, 1991, and Zook C et al, "Repeated hospitalisation for the same disease: a multiplier of national health costs", Milbank Memorial Fund Quarterly Health Soc, 1998

57. Khuri et al, 2002

NSQIP does not gather data about specific surgeons. There are two reasons for this emphasis on systems rather than individuals.58 The manifest justification is that "if systems of care are poor at a specific institution, even the most competent surgeon can have poor outcomes. Likewise, a mediocre surgeon can have excellent outcomes if he or she is functioning in an environment with excellent systems of care".59 A latent justification was the need to win the support and trust of surgeons in the whole reform process. There were concerns that too much scrutiny of individual surgeons would alienate them, when their co-operation was vital to the collection of data and the processes of quality improvement.

Dissemination

The guiding principle of the programme is to effect quality improvement by providing feedback to VA centres (and not by publication). As Khuri, Daley and Henderson observe, "the NSQIP is not a punitive program for the purpose of identifying 'bad apples'. On the contrary, its primary focus is to provide the surgeons and managers in the field with reliable information, benchmarks, and consultative advice that will guide them in assessing and continually improving their local processes and structures of care".60 The feedback takes the form of:61

• an annual report to each centre. The report includes risk-adjusted outcomes data for all participating centres, though they are given anonymous codes rather than identified by name. Each centre knows only its own code, which allows comparison of its performance with the other centres.

• assessment of "high and low outlier" centres. At an annual two-day meeting the executive committee reviews the performance of each centre over the previous four years. The committee reports concerns over "high outliers" (centres whose rates of deaths or complications are much higher than expected) and communicates praise and certificates of commendation to "low outliers" (centres whose observed rates of deaths and complications are much lower than expected). This can be done not only for whole centres but, thanks to the detail of the information, also for surgical specialties within each centre.

• spreading best practice: centres that have been consistently low outliers, or which have made dramatic improvements, are encouraged to reveal their methods. This information is included in the programme's annual reports to the centres.

• site visits: providers can invite the programme to conduct structured site visits to identify and propose solutions to problems with care provision. The visit has two parts. First, an NSQIP nurse checks whether the data collected at the site is reliable. If it is, then a team consisting of a surgeon, a critical care nurse, an anaesthetist and a health services specialist arrives and provides senior surgeons and managers at the centre with a confidential report.

• provision of self-assessment tools: providers and managers are given instruments developed by NSQIP to assess the strengths and weaknesses of their work, especially when the reports show them to be high outliers.
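The outlier assessment described above turns on each centre's observed-to-expected (O/E) ratio for deaths or complications. The sketch below shows the shape of such a classification with an arbitrary 20 per cent band standing in for NSQIP's actual statistical criteria, which are more sophisticated.

```python
def classify_centre(observed, expected, band=0.2):
    """Flag a centre by its O/E ratio: 'high outlier' if observed deaths or
    complications run well above the risk-adjusted expectation, 'low
    outlier' if well below. The 20% band is purely illustrative."""
    ratio = observed / expected
    if ratio > 1 + band:
        return "high outlier"
    if ratio < 1 - band:
        return "low outlier"
    return "as expected"

# Hypothetical centres: (observed deaths, risk-adjusted expected deaths).
for name, observed, expected in [("A", 30, 20), ("B", 14, 20), ("C", 19, 20)]:
    print(name, classify_centre(observed, expected))
```

In the VA's usage, a "high outlier" flag triggers scrutiny and support rather than sanction, while "low outliers" are commended and asked to share their methods.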

Model of improvement

The VA is perhaps the finest example of information being used as part of an internal improvement agenda, yet there was no risk of loss of reputation (at least in public) or market share, and no hint of consumerism (the VA has a captive market).

58. Young G, Charns M, Daley J et al, "Best practice for managing surgical services: the role of co-ordination", Journal of Healthcare Management Review, 22, 1997. Also Young G and Charns M, "Patterns of co-ordination and clinical outcomes: a study of surgical services", Health Services Research, 1998

59. Khuri et al, 2002

60. Ibid

61. Ibid

Impact

The VA achieved the following improvements in quality of care after implementing the systems detailed above:62

• Since the National Surgical Quality Improvement Program began collecting data in 1991, 30-day mortality in major surgery at VA centres has fallen by 27 per cent, while 30-day morbidity has fallen by 45 per cent.63 And, unlike in New York State, there was no change in the risk profiles of patients during this period.

• A study by Jha et al showed that, before the implementation of the programme, in the baseline period (1994-95), the VA's performance was poor in nearly all the areas assessed. Measurements taken after its implementation (1997 to 2000) showed significant improvement in preventive, inpatient and outpatient care (Table 4). Of the 13 types of care for which multi-year data was available, there were significant improvements in 12 of them.64

l A comparison of the VA’s results withthose of Medicare for the years 1997 to2001 showed that the VA outperformedMedicare in every type of care in 1997-99, and in 12 of 13 types of care in 2000-01 (the exception was annual eye exami-nations for diabetes patients). Jha et alconcluded that the only plausible expla-nation is the implementation of NSQIP,as nothing else had changed. They alsoargued that these comparisons underesti-mated the VA’s achievements, since theVA’s patients are generally sicker andmore likely to be physically disabled,mentally unwell, poor, uneducated andfrom a disadvantaged ethnic group thanthe general population.65


62. In April 2007 the McClatchy Newspaper Group challenged the VA success story. The heart of the story is that some VA press releases may have oversold the studies' impact and there is some anecdotal evidence of marginal gaming, particularly on waiting times. In our view there is nothing of any substance to challenge the VA results.

63. Khuri et al, 2002

64. Jha et al, "Effect of the Transformation of the Veterans Affairs Healthcare System on the Quality of Care", New England Journal of Medicine, 2003

65. Ibid

Table 4: improvements by the VA in preventive, inpatient and outpatient care (%)

Type of care                                                           1994-95  1997  1998  1999  2000

Preventive
Mammography                                                                 64    87    89    91    90
Influenza vaccination                                                       28    61    67    75    78
Cervical cancer screening                                                   62    90    93    94    93

Inpatient
Providing aspirin at discharge for acute myocardial infarction              89    92    95    97    98
Providing beta blockers at discharge for acute myocardial infarction        70    83    93    94    95

Outpatient screening
Glycosylated haemoglobin measurement (diabetes)                             51    84    90    94    94
Annual eye examination (diabetes)                                           48    69    72    73    67
Depression                                                                   –     –    44    62    73



Another advantage of NSQIP is the evidence that it provides on outcomes. The New York programme found a strong link between volume and outcomes for bypass surgery, and the VA had a policy of closing surgeries that handled a low volume of cases. But one of its studies found that what applied to CABG surgery could not be generalised to other types of surgery. The VA consequently reversed the policy, which would otherwise have reduced access.66

The VA programme of surgical improvement started at about the same time as Scotland's CRAG reports. While the former has had a dramatic impact, the latter seems to have had none. Jha et al, in their explanation of the VA's success, make it clear how it differed from CRAG in awareness, credibility, relevance and importance:

• performance contracts held managers accountable for meeting improvement goals;

• whenever possible, quality indicators were designed to be similar to performance measures commonly used in the private sector;

• data gathering and monitoring were performed by an independent agency – the external peer review program;

• critical process improvements, such as an integrated, comprehensive electronic medical record system, were instituted at all VA medical centres;

• performance data was widely distributed within the VA, among key stakeholders such as veterans' service organisations, and among members of Congress.67

A transferable model?
The success of the VA's programme is clear, but can it be applied in England and Wales? The parallels between the organisational arrangements of the VA and the NHS – such as their highly centralised structures of decision making and their ability to run a unified information system – are greater than between the NHS and any other element of US healthcare. This may mean that the UK is uniquely well placed to copy and benefit from the VA's reforms. The model used by the VA relies heavily on internal kudos and censure, with the information held within the system rather than being more publicly reported. While VA beneficiaries may be able to get access to such information, the general public cannot. This may be reasonable given the limited population it serves, but the NHS cannot be so parsimonious with its data.

Given the plurality of healthcare provision in the US, transferability can to some extent be judged by how successfully the VA model has been adopted by US private sector providers.68 The signs here are encouraging.

In 1999, a private sector initiative involving three non-VA medical centres (Emory University Hospital, the University of Kentucky Chandler Medical Center, and the University of Michigan Medical Center) was launched to discover whether the VA's data-gathering methods could be used by non-VA providers, and whether the models it used to predict outcomes from surgery could be applied to non-VA patients.69 A nurse from each of these non-VA centres collected and provided data to the NSQIP Data Coordinating Centre at Hines. Analysis of the first year's worth of data showed that NSQIP data collection methods were applicable to all three non-VA centres, and that the VA risk adjustment methods were applicable to non-VA patients. Indeed, the results were so encouraging that the NSQIP executive committee aimed to make NSQIP available to the surgical community across the US.
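Underlying these comparisons is NSQIP's risk-adjusted observed-to-expected (O/E) ratio: a logistic regression on preoperative variables gives each patient a predicted probability of death or complication within 30 days, and a centre's observed event count is divided by the sum of those predictions. The sketch below is a minimal illustration using invented patients and made-up predicted risks; the regression step that produces the risks is omitted.

```python
# Simplified sketch of NSQIP-style risk-adjusted comparison.
# Hypothetical data: each patient carries a model-predicted 30-day
# mortality risk (in NSQIP this comes from a logistic regression on
# preoperative variables) and an observed outcome (1 = died, 0 = survived).

def observed_to_expected(patients):
    """Return the O/E ratio: observed deaths over risk-model expected deaths."""
    observed = sum(p["died"] for p in patients)
    expected = sum(p["predicted_risk"] for p in patients)
    return observed / expected

centre_a = [  # low-risk case mix, one death
    {"predicted_risk": 0.02, "died": 0},
    {"predicted_risk": 0.05, "died": 1},
    {"predicted_risk": 0.03, "died": 0},
    {"predicted_risk": 0.04, "died": 0},
]
centre_b = [  # much sicker case mix, also one death
    {"predicted_risk": 0.20, "died": 1},
    {"predicted_risk": 0.30, "died": 0},
    {"predicted_risk": 0.25, "died": 0},
    {"predicted_risk": 0.25, "died": 0},
]

print(round(observed_to_expected(centre_a), 2))  # 7.14 - far worse than expected
print(round(observed_to_expected(centre_b), 2))  # 1.0  - in line with expectation
```

Both centres in the example lose one patient in four, but only the centre with the low-risk case mix stands out once expectation is taken into account, which is why the method can travel across different case mixes, VA or non-VA.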

If the VA model can be adopted by other providers in the US, there is no

www.policyexchange.org.uk • 29

From information to action

66. Khuri S, Daley J, Henderson W et al, "The relationship of surgical volume to outcome in eight common operations: results from the VA National Surgical Quality Improvement Program", Annals of Surgery, 1999

67. Jha et al, "Effect of the Transformation of the Veterans Affairs Healthcare System on the Quality of Care", New England Journal of Medicine, 2003

68. Khuri et al, 2002

69. Khuri et al, 2002



reason to suppose that the NHS – which in many ways is closer to the VA health system – cannot do likewise.

Criticisms
Yet, there are still reasons for caution. In the VA programme the more useful information at doctor or hospital level is not made available in a simple report card for public use. This lack of openness is quickly becoming politically unacceptable in the NHS.

In addition, the range of measurements is relatively limited. The findings of the Jha study were striking, but the indicators it looked at were process measures rather than outcome measures. Whether a patient has been vaccinated, for example, is a different question from whether, and to what extent, a patient's health has actually improved as the result of a medical intervention. There is a strong link between process and outcome (a patient who has received all the appropriate types of care during a course of treatment is likely to enjoy better health than one who hasn't) but, as the authors note, "A full assessment would require the measurement of outcomes such as…patient satisfaction."70

Khuri et al make the same point. NSQIP uses two very basic outcome indicators: mortality and morbidity. In that sense, it is not vastly different from the measuring currently done in the NHS, which shows whether a patient has died or been readmitted (both of which account for a minority of cases) but not the extent to which their health has actually improved. "There are other dimensions of surgical outcome that could be incorporated into the NSQIP to more thoroughly assess the quality of surgical care", the authors stress. "The most important of these are long-term survival, functional outcomes, quality of life, and patients' satisfaction."71

Healthcare Commission Star Ratings

Summary
• Where information does not credibly relate to quality, its impact on both providers and patients is degraded even if it is easy to understand.

• Star ratings were linked to the 'Kudos and Censure' model by financial incentives and by the opportunity to apply for Foundation Trust status.

• When compared to performance in Wales, star ratings did improve performance in England.

Purpose
The manifest rationale for publishing star ratings was that they gave a rounded assessment of performance of NHS organisations for the public. The latent purpose was to put pressure on NHS chief executives to deliver on the Government's priorities. Star ratings amalgamated the scores from a variety of different targets to produce a single summary score that could be mapped into one of four rankings from zero to three stars. In this way ministers defined "failure" (zero rated) and "success" (three stars).72

Selection
Two sets of indicators were used in star ratings of acute hospital trusts from 2001 to 2005.73 The first comprised nine key targets, of which six refer to waiting times. The other three were achieving a financial balance, hospital cleanliness, and improving the working lives of staff. The second batch of indicators comprised about 40 targets organised into a "balanced scorecard". These reflected ministerial priorities, included a subset of the large number of targets in the priorities and planning framework for 2003-2006, and satisfied the technical criteria of being applicable nationally, measurable, capable of being captured by indicators, and stable over time.


70. Jha et al, 2003

71. Khuri et al, 2002

72. Bevan G, "Setting Targets for Healthcare Performance: lessons from a case study of the English NHS", National Institute Economic Review, 2006; Bevan G, Hood C, "What's Measured is What Matters: Targets and Gaming in the English Public Healthcare System", Public Administration, 2006; and Bevan G, Hood C, "Have targets improved performance in the English NHS?", British Medical Journal, 2006

73. From 2001 to 2003, star ratings also included assessments from clinical governance reviews by the CHI on the seven technical components of clinical governance: public, service user, carer and patient involvement; risk management; clinical audit; clinical effectiveness; staffing and staff management; education, training and continuing personal and professional development; and use of information.



Methods
The principal methodological innovation of star ratings was the combining of different kinds of information on performance to produce a summary score. The methods used were criticised for being arcane74 in requiring levels of measurement that the data did not justify.75 The scoring system was designed to use each year's data to register improvements in performance (for example, more three-star hospital trusts) and to avoid volatility in the star ratings of individual trusts. This resulted in what appeared to be arbitrary selections of thresholds in measuring performance against key targets.76

Presentation
The public presentation of the star ratings was designed to make an impact by giving a single summary score and by reporting targets in a way that was easy to understand:77

• "key targets" were reported symbolically: ✓ achieved, – underachieved, ✗ significantly underachieved;

• each focus area of the "balanced scorecard" was reported as high, low or medium;

• each indicator within each focus area was given a ranking: 5 significantly above average, 4 above average, 3 average, 2 below average, 1 significantly below average.
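The combination of key-target gating and scorecard averaging described above can be sketched in a few lines. The scoring rules, focus-area names and thresholds below are hypothetical simplifications for illustration only; the actual algorithm, set out in the supporting technical papers, was considerably more involved and, as noted, criticised as arcane.

```python
# Hypothetical, simplified sketch of star-rating aggregation. Key targets
# gate the result; the balanced scorecard (indicators ranked 1-5 within
# focus areas) refines it. All rules here are illustrative assumptions.

def star_rating(key_targets, scorecard_areas):
    """key_targets: list of 'achieved' / 'underachieved' /
    'significantly underachieved'.
    scorecard_areas: dict mapping focus area -> list of rankings (1-5)."""
    failed = key_targets.count("significantly underachieved")
    slipped = key_targets.count("underachieved")

    # Significant failure against key targets caps the rating low.
    if failed >= 2:
        return 0
    if failed == 1 or slipped >= 3:
        return 1

    # Each focus area is banded high / medium / low from its mean ranking.
    def area_band(rankings):
        mean = sum(rankings) / len(rankings)
        return "high" if mean >= 3.5 else "low" if mean < 2.5 else "medium"

    bands = [area_band(r) for r in scorecard_areas.values()]
    if bands.count("low") == 0 and bands.count("high") >= len(bands) / 2:
        return 3  # key targets broadly met, strong scorecard
    return 2

trust = {  # hypothetical focus areas and indicator rankings
    "clinical focus": [4, 3, 5],
    "patient focus": [4, 4, 3],
    "capacity and capability": [3, 4, 4],
}
print(star_rating(["achieved"] * 9, trust))  # 3
```

Even this toy version shows why threshold choices were contested: a single extra "significantly underachieved" verdict moves a trust across a whole rating band, regardless of how the other forty-odd indicators scored.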

Dissemination
Star ratings were published on the internet and as paper reports. They were also covered by national daily newspapers, local newspapers, and national and local television. Two professional journals also reported the results (the British Medical Journal for physicians and the Health Service Journal for managers). Star ratings were published within months of the end of the financial year on which they were based, and before Parliament went into summer recess.

Model of improvement
Star ratings explicitly gave trusts incentives through a range of rewards and sanctions, such as an extra £1m to high performing trusts and the possibility of applying for foundation hospital status. However, reviews of star ratings also identified that "kudos and censure" played a very important part in trusts' responses to them.78

Impact
There is evidence of two kinds of impacts of star ratings: reported improvements against 'key targets'; and evidence of gaming,79 which is known to have been endemic when targets were used in centrally planned economies80 and in the public sector.81 Improvements on three 'key targets' for three different types of hospital waiting times, when compared with Wales82 (where targets were not linked to sanctions and rewards), were as follows:

• Time spent waiting in A&E. The key target in England from January 2005 for accident and emergency departments was that 98 per cent of patients were to be seen within four hours, which was achieved. The comparable performance in Wales was 89.4 per cent and 91.9 per cent in June and September 2005 respectively. The National Audit Office examined performance in accident and emergency in England in 2002,83 and reported that improved performance and increased patient satisfaction were achieved despite increasing use of emergency services.

• First elective hospital admission. The key target for the maximum wait was that no one should be waiting more than nine months by the end of March in 2004 (and 2005). This was achieved in England in each of those years. In Wales the percentages of patients still waiting longer than that were 22 per cent in March 2004 and ten per cent in March 2005.


74. Klein R, "Mr Milburn's good hospital guide", British Medical Journal, 2002

75. One problem being the derivation of mean values across indicators at the ordinal level of measurement.

76. In the star ratings for 2003, for example, the methods of scoring 'key targets' implied that for patients waiting longer than 12 months, going from 10 to 11 was twice as bad as going from two to three.

77. Details were available in supporting technical papers.

78. Mannion R, Davies H, Marshall M, "Impact of star performance ratings in English acute hospital trusts", 2005

79. Bevan G, Hood C, "Have targets improved performance in the English NHS?", British Medical Journal, 2006; Bevan G, Hood C, "What's Measured is What Matters: Targets and Gaming in the English Public Health Care System", Public Administration, 2006; and Bevan G, "Setting Targets for Health Care Performance: lessons from a case study of the English NHS", National Institute Economic Review, 2006

80. Berliner J, Soviet industry from Stalin to Gorbachev, 1998; and Kornai J, Overcentralisation in economic administration, Oxford University Press, 1994

81. Smith P, "On the unintended consequences of publishing performance data in the public sector", International Journal of Public Administration, 1995

82. Auditor General for Wales, NHS waiting times in Wales, Volumes 1 and 2, Cardiff: The Stationery Office, 2005; and Auditor General for Wales, NHS Waiting Times: follow-up report, Cardiff: The Stationery Office, 2006

83. National Audit Office, Improving Emergency Care in England, London: The Stationery Office, 2004, http://www.nao.org.uk/publications/nao_reports/03-04/03041075.pdf (last accessed 03/05/07)


• First outpatient attendance.84 The key target for the maximum wait was that no one should be waiting more than 17 weeks by the end of March 2004 and throughout 2004/05. In 2005, Wales, with a population only six per cent of England's, had nearly four times as many outpatients waiting more than three months.

However, these successes in delivering political priorities were countered by concerns that clinical priorities were distorted, that gaming was encouraged, and that many aspects of healthcare not covered explicitly by star ratings were ignored. In 2004, the Healthcare Commission chairman, Sir Ian Kennedy, stated that the Commission would "move...away from ratings only about specific targets to painting a much richer picture, having to do with performance across a number of domains".85 A Consumers' Association survey earlier that year found that "less than half of those surveyed were aware of star ratings and almost half of these were unlikely to use the ratings to help them choose a health service".86 Arguably, the ratings did not reflect the quality of care (placing an equal value on cleanliness and mortality rates) but rather showed the effectiveness of the trust's management team.

The London Patient Choice Project

Summary
• The LPCP is very different from the other four case studies, as the information that it produced was published primarily to allow patients to choose providers.

• The LPCP demonstrated that although patients wanted and made use of information on quality and performance, it was no coincidence that they also liked reduced waiting times for their operations, free transport, and support in making choices. Additionally, the number of hospitals per square mile in the London area is unique in England and Wales. It is not clear that the LPCP results would be replicable elsewhere in the country. In addition, the simple comparison of waiting times may not be analogous to the more complex judgements of quality of care.

Purpose
The LPCP was established to improve choices for patients who were clinically eligible for treatment and who had been waiting for treatment at an NHS London hospital beyond a target waiting time (initially six months, though this was later reduced). As the end of the target waiting time approached, patients were given an opportunity to choose from a range of alternative providers who had the capacity to offer earlier treatment.87 The project had four overall objectives:88

• to develop the necessary capacity to treat patients expected to exercise choice;

• to develop a working patient choice system;

• to learn how to improve the design of the system and feed lessons into future London and national programmes;

• to improve patient waiting times and patient satisfaction.

The LPCP was intended to change the perception of the NHS as a system that limited patient choice, as Coulter et al point out.89 A study in eight European countries, including the UK, had found strong support for the notion of free choice of provider: 92 per cent for primary care doctors, 85 per cent for specialists, and 86 per cent for hospitals.90 However, British people were among the most dissatisfied with the actual opportunities for exercising choice. Only 30 per cent said that these were "good" or "very good", compared to 73 per cent in Spain and 70 per cent in Switzerland. This dissatisfaction was confirmed by a MORI poll. More than a third (38 per cent) of those interviewed said that NHS patients did not have any choices, and a further third said they didn't know what choices they could have. Two thirds (66 per cent) said that they felt NHS patients should be able to choose where to have their operations.91

84. Details of ratings: http://www.performance.doh.gov.uk/performanceratings/index.htm (last accessed 03.05.07)

85. Hansard: Select Committee on Public Administration, 29 January 2004, http://www.publications.parliament.uk/pa/cm200304/cmselect/cmpubadm/41/4012910.htm (last accessed 03.05.07)

86. Q&A: NHS star ratings 2005, Guardian Unlimited, http://society.guardian.co.uk/health/story/0,,1536445,00.html (last accessed 03.05.07)

87. Burge P, Devlin N, et al, "London Patient Choice Project Evaluation", RAND Corporation: Santa Monica, 2005

88. Dawson D, Jacobs R, Martin S, Smith P, "Evaluation of the London Patient Choice Project: System Wide Impacts. Final Report", Centre for Health Economics, University of York, 2004

89. Coulter A, et al, "Patients' Experience of Choosing Where to Undergo Surgical Treatment: Evaluation of London Patient Choice Scheme", Picker Institute Europe, Oxford, 2005

90. Coulter A, Magee H, The European Patient of the Future, Open University Press, 2003

Selection
The project was extended from cataract surgery to include patients awaiting specific procedures in ear, nose and throat surgery (ENT), general surgery, ophthalmology, orthopaedics, and urology. Gynaecology and plastic surgery were introduced on a pilot basis in southeast London only. The selected conditions or procedures in each specialty were those for which patients were most likely to have to wait for more than six months. For each procedure a patient care pathway (clinical plan) was developed and agreed between the trusts and the LPCP team. Patients were excluded from the scheme if their "originating trust" (OT) could guarantee them treatment within eight months of going on the waiting list (later reduced to seven months), or if there were agreed clinical reasons why treatment by an alternative "receiving trust" (RT) was inappropriate. Clinical exclusion criteria were specified by the LPCP team after wide consultation with clinicians in the relevant specialties in both originating and receiving trusts and after reviewing the literature to determine best practice.92

Methods
The LPCP was co-ordinated by a central team. If patients on the in-patient waiting list of a London trust were to be offered the choice of an alternative provider, the OT had to agree to co-operate with the scheme and was linked up with two RTs, which could be NHS or private, or new treatment centres with spare capacity. An eligible patient was then offered the choice of remaining with the OT or obtaining more rapid treatment at either of the two alternatives.

Presentation and Dissemination
Independent patient care advisers (PCAs) were responsible for liaising with patients throughout the process. Once placed on the waiting list for surgery, patients attending outpatient clinics at OTs were to be informed in general terms about the choice scheme. Those patients whose medical conditions were too complex to be considered for a transfer to an RT were to be identified and screened out of the LPCP system. Staff at OTs were responsible for validating the waiting lists and sending names of eligible patients to the advisers. Validation was supposed to take place when patients had been on the waiting list for about four months. If patients were considered ineligible for the scheme at this stage, staff were required to inform them of the reasons for this.

Eligible patients were sent a letter giving them advance warning that a PCA would telephone them to discuss their options, and an information booklet outlining the scheme in more detail. When patients reached a specific date on the waiting list, an adviser would offer them the option of going to an alternative hospital and would answer their questions.

If the patient accepted the offer of an alternative, the adviser was responsible for booking an "operative appointment" – a combination of a clinical validation, an outpatient appointment and a pre-operative assessment. The adviser would offer continued telephone contact with the patient, and support if any problems occurred. The PCA was also responsible for keeping the patient's GP informed if the operation had been scheduled to take place at an RT. Finally, a date would be specified for the operation. The intention was that the treatment would be completed within eight-and-a-half months of joining the waiting list.93
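The screening and offer rules described above amount to a simple eligibility filter. The sketch below is an illustrative reconstruction of the published criteria (clinical exclusion, the originating trust's ability to treat within eight months, and validation at about four months on the list), with hypothetical field names; it is not the project's actual operational logic.

```python
# Illustrative sketch of the LPCP eligibility filter described above.
# Field names are hypothetical; the month thresholds are the scheme's
# initial values (both were later reduced).

def eligible_for_choice(patient):
    """Return True if the patient should be offered an alternative provider."""
    if patient["clinically_excluded"]:               # agreed clinical exclusion criteria
        return False
    if patient["ot_can_treat_within_months"] <= 8:   # OT guarantees timely treatment
        return False
    return patient["months_on_waiting_list"] >= 4    # lists validated at ~4 months

patients = [
    {"clinically_excluded": False, "ot_can_treat_within_months": 12, "months_on_waiting_list": 5},
    {"clinically_excluded": True,  "ot_can_treat_within_months": 12, "months_on_waiting_list": 5},
    {"clinically_excluded": False, "ot_can_treat_within_months": 6,  "months_on_waiting_list": 5},
]
print([eligible_for_choice(p) for p in patients])  # [True, False, False]
```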


91. Patient Choice, MORI study conducted for BUPA Health Debate, 2003

92. Coulter A, et al, "Patients' Experience of Choosing Where to Undergo Surgical Treatment: Evaluation of London Patient Choice Scheme", Picker Institute Europe, Oxford, 2005



Model of improvement
As the title of the project suggests, the expectation was that patients would choose providers who were able to offer earlier treatment. Information was essential to enable them to make this choice.

Impact
Coulter et al (2005) reported on a study of patients from five OTs, who were sent postal questionnaires before they had been offered a choice of hospital and after they had been discharged from hospital. There were also in-depth interviews with sub-samples in each group. They found that:

• less than a third (32 per cent) of patients apparently eligible for the scheme were actually offered a choice of hospital. Of those who were offered the opportunity to go to an alternative hospital, two thirds (67 per cent) chose to do so;94

• those who were in more pain and who felt that their home hospital had a poor, or only fair, reputation were significantly more likely to choose to undergo treatment elsewhere;

• most patients who opted for an alternative hospital were treated in NHS treatment centres (82 per cent) and were more positive about their experience in those centres (or private hospitals) than in an ordinary NHS surgical department. Patients treated at alternative hospitals were significantly more satisfied with their hospital experience than those treated at their home hospital;

• patients valued the provision of free transport and the support they received from PCAs in guiding them through the process. This included helping them to make a decision, and co-ordinating arrangements between the hospitals if they had opted to go to an alternative hospital;95

• when deciding where to undergo treatment, patients tended to place greater emphasis on issues such as the location of the hospital, length of wait for the operation, travel arrangements and convenience for family and friends;

• one in three survey respondents expressed dissatisfaction with the amount of information received about the various hospitals: they wanted to know more about arrangements for follow-up care, quality of care, the qualifications and experience of surgeons, operation success rates, standards of hygiene and safety records;

• an overwhelming majority (97 per cent) of patients who had opted to go to an alternative hospital said that they would recommend the scheme to others.

The LPCP achieved its principal objective of providing faster access to good quality care.


93. In the event this procedure was not followed in each of the OTs exactly as specified. In some cases patients were not given the preliminary information about the scheme, to avoid raising expectations among those subsequently deemed ineligible. Although the LPCP team recommended early 'fitness for surgery' assessment, this was not common practice during the life of the project. The process of validating the waiting lists and determining eligibility also varied between trusts (Coulter et al, 2005).

94. There was no evidence of inequalities in access to, or uptake of, alternative hospitals by social class, educational attainment, income or ethnic group; but people in paid employment were more likely to opt for an alternative hospital than those not in paid employment.

95. Others went to other NHS hospitals in London (13 per cent) or private hospitals (5 per cent). Only one patient in the study received treatment at an overseas hospital.



4. Patients, payment and providers

Our five case studies all operated in different ways, in different contexts and with different results. What can we learn from them? And can this learning be applied today in the NHS?

The Labour Government's NHS policies have gone through three dramatic shifts in emphasis.96 In 1997, The New NHS97 laid out policies that sought a "third way" between centralised direction and markets. The end of that search was marked in 2000 by The NHS Plan,98 which introduced centralised direction in the form of targets that eventually became star ratings. Delivering the NHS Plan99 in 2002 marked not continuity but another radical shift in direction, this time to an emphasis on markets and incentives. There are three key differences in the design of this market from that introduced in the early 1990s:

• an explicit recognition that the response to failure ought to be managed;

• hospitals are paid a fixed tariff by type of case, so they compete on other grounds (such as convenience, waiting times and quality);

• patient choice is used as a means of improving quality.

This raises two obvious questions. Do patients act on information about quality? Do providers respond to financial incentives?

Do patients act on information on quality?
Although, when asked, consumers say they want more information about the outcomes attained by providers,100 a review by Marshall et al concluded that "most experts do not believe that consumer pressure will be an important mechanism to stimulate quality improvements for the foreseeable future".101 A study into consumer use of the Pennsylvania report card system showed that, of 474 patients surveyed, only 56 were aware of the report cards at the time of their surgery, and only a quarter of these said that the system had a significant impact on their choice of surgeon. Another, more recent, review observed that "when this information is published only a minority are aware of it, of those most do not understand it (including whether high or low rates of an indicator reflect good performance), trust it or use it (with problems with timely access and lack of genuine choice); and evaluations of later developments that addressed many of these potential barriers failed to demonstrate significant or sustained public interest."102

These reviews are an important corrective to the simplistic view that merely publishing information will result in patients switching providers. However, awareness of the availability of information and ease of interpretation both matter. In the Pennsylvania case, more than half of all respondents claimed that the report cards would have influenced their decision had they been aware of them, indicating that the problem was ignorance of, rather than lack of enthusiasm for, the data.

96. Stevens S, "Reform Strategies for the English NHS", 2004

97. The New NHS: Modern, Dependable, The Stationery Office, 1997

98. The NHS Plan, The Stationery Office, 2000

99. Delivering the NHS Plan, The Stationery Office, 2002

100. Edgman-Levitan S, and Cleary P, "What information do consumers want and need?", Health Affairs, 1996

101. Marshall et al, 2003

102. Bevan G, "Impacts of reporting healthcare performance", in press

Paying for performance – do providers respond to financial incentives?
Redesigning payment systems to pay doctors and institutions by results is increasingly fashionable. The new GP contract, backed by the Quality and Outcomes Framework (QOF), is internationally recognised as one of the most ambitious of such schemes.

Less discriminating funding mechanisms that pay for activity regardless of how necessary it is create a perverse incentive to over-treat the well-insured. As shown in the US, this drives up costs without adding value.103 There is similar evidence to be found in Canada, Denmark and Scotland.104 This problem has spurred attempts in the US to reward improvements in quality of care (rather than quantity) with extra payments:

• several health insurance plans in California award bonuses to physician groups for patient satisfaction and chronic care performance;

• an employers' programme called Bridges to Excellence offers doctors $50 per patient for patient education and installing clinical information systems in their offices or surgeries;

• hospitals scoring highly on a range of measures (including heart attack, CABG, and hip and knee replacements) receive a percentage of their Medicare payments.

Critics fear also that, like central Government targets in the NHS, pay-for-performance schemes may distort clinical priorities and have other unintended consequences. The resistance of doctors themselves to the idea is another problem. A 2002 survey found that, while 80 per cent of doctors wanted clinics to receive more funding for quality improvement, only 38 per cent supported the idea of making direct payments to group or individual physicians in return for achieving such improvements.105

Literature assessing the success of financial incentive systems is very mixed, with some achieving desired changes, others proving ineffectual and still others creating perverse and unanticipated effects:106

• a literature review of 89 studies from a range of countries concluded that financial incentives had a positive impact on many performance variables, such as admission rates to hospitals, duration of hospital stays and, crucially for our purposes, compliance with clinical practice guidelines;107

• several studies from the 1990s of financial incentives for screening and immunisation showed that these had little effect.108,109 Yet a study of targeted financial incentives for influenza immunisation in a Medicare population showed that even a small-scale incentive could achieve a statistically significant improvement.110

It is unclear to what extent financial incentives add to quality improvement beyond what can be achieved through measurement and reporting alone. A comparison was made between the same measures of chronic illness management in California, where a financial incentive was offered, and the Pacific North West,111 where feedback and reporting alone were used. In only one instance was there a significantly greater improvement in performance in California than in the Pacific North West. By contrast, a very recent study of the Premier/Medicare pay-for-performance scheme, which compared hospitals that merely reported information about their quality of care with hospitals that combined this with performance-related pay, found that the latter group registered "modestly greater improvements in quality."112

Measure for measure


103. Porter M and Teisberg E, Redefining Health Care: Creating Value-Based Competition on Results, Harvard Business School Press, 2006
104. Gosden T, Forland F et al, "Impact of Payment Method on Behaviour of Primary Care Physicians: A Systematic Review", Journal of Health Services Research and Policy, 2001
105. Rolnick et al, 2002
106. Doran T et al, "Pay-for-Performance Programs in Family Practices in the United Kingdom", New England Journal of Medicine, 2006
107. Chaix-Couturier C, Durand-Zaleski I, Jolly D and Durieux P, "Effects of Financial Incentives on Medical Practice: Results from a Systematic Review of the Literature and Methodological Issues", International Journal for Quality in Health Care, 2000
108. Hillman et al, 1998
109. Ibid
110. Kouides et al, 1998
111. Rosenthal M, Frank R, Li Z and Epstein A, "Early Experience with Pay for Performance: From Concept to Practice", JAMA 294: 1788-1793, 2005


There is also conflicting evidence about financial incentives in other public and private sector arenas:

• an evaluation of the impact of a performance-related pay system for teachers in England concluded that "the scheme did improve test score gains, on average by about half a grade per pupil."113 The caveats were that this did not apply to all subject teachers and that the researchers were unable to determine whether the improvement represented extra effort or effort diverted from other professional activities;

• in executive compensation, "direct evidence of the responsiveness of executive performance to financial incentives is minimal";114

• pay-for-performance registered some improvements in the arena of in-job training but gave rise to gaming, such as providers targeting their efforts at clients for whom training provided the least added value, which reduced cost-effectiveness.115

In their wide-ranging review of evidence, Rosenthal and Frank concluded that "the current enthusiasm for pay for performance in healthcare rests more on conceptual than empirical foundations". However, their research suffered from at least three flaws. First, it was not clear "that the findings from the literature are indeed comparable to the broader efforts now envisioned, which would systematically identify and reward the best providers using multi-dimensional quality measures". Second, many of the studies they examined involved very narrow sets of measures, such as compliance with guidelines on preventive healthcare. Third, the authors themselves suggest that the financial incentives offered in the studies may have been too small.

The mixed evidence on pay-for-performance is hardly surprising. If it were straightforward, value-based competition would have been introduced already. Although the issue of financial incentives is no longer uncharted territory, simple solutions are unlikely to be adequate given the complexity of healthcare.116

We now consider a controlled experiment that illuminates the quandaries about patient choice, financial incentives and publication of information.

Does publishing add any value?
The State of Wisconsin has often been a venue for innovation in public services. Advocates of school vouchers continue to cite the famous piloting scheme in Milwaukee in the 1990s. Equally significant, if less well known, are its pioneering efforts in healthcare. This case study will examine the efforts of a major purchasing group in the state to measure quality of care, assessing its results and determining the lessons to be drawn for the NHS.

The Employer Healthcare Alliance Cooperative is a not-for-profit co-operative of employer purchasers of healthcare. Set up in 1990 by seven local employers, it is based in Madison, Wisconsin and has contracts with a range of hospitals in the area. Its membership now includes 157 employers and 73,000 individual employees and family members. What distinguishes the Alliance from many employer-purchasing groups is its determination to base its purchasing decisions on value rather than simply on cost. While many purchasers seek the cheapest care available, the Alliance tries to contract only with hospitals that provide care of a good quality.

As its website puts it: "We practise value-based purchasing. We're committed to improving the quality of the healthcare system overall, not just getting discounts. Discounts alone aren't the answer – managing the system to prevent over-use, under-use and misuse of care, and to reward the best performing provider, is."

www.policyexchange.org.uk • 37

Patients, payment and providers

112. Lindenauer P, "Public Reporting and Pay for Performance in Hospital Quality Improvement", New England Journal of Medicine, 2007
113. Atkinson A, Burgess S, Croxson B, Gregg P, Propper C, Slater H and Wilson D, "Evaluating the Impact of Performance Related Pay for Teachers in England", The Centre for Market and Public Organisation, University of Bristol, 2004
114. Rosenthal et al, 2005
115. Ibid
116. There are two other arguments that can be made to support this, both of which relate to the incentivisation of specific process measures. First, Rodney Hayward has argued that most pay-for-performance schemes target ill-evidenced gold standards that require disproportionately greater efforts to achieve than lower, well-evidenced targets. This means that they have effectively become tools for advocacy groups for specific diseases to demand a bigger share of the healthcare cake. Consequently pay-for-performance in this manner achieves precisely the same effect as fee-for-service. Secondly, Porter and Teisberg are themselves profoundly critical of pay-for-performance schemes as not achieving the value-based competition that they advocate. They argue for real payment by outcome, but are unrealistically optimistic about the achievement of this.


The Alliance endeavours to make the employees it serves "better healthcare consumers" through education, which earned it the 2001 national healthcare purchaser award. The most notable attempt it has made to drive up quality in local hospitals was a systematic reporting of their performance, QualityCounts, launched in 2001. QualityCounts consisted of two indices of adverse events (deaths and complications) across three areas of care (cardiac, obstetric, and hip and knee). It classified hospitals as "better than expected" (denoting fewer adverse events than expected), "as expected", or "worse than expected". The data was obtained from the Wisconsin Bureau of Health Information.117 Exceptional efforts were undertaken to make the report publicly accessible: it was mailed to employees covered by the Alliance, included as a supplement in the local newspaper in Madison, made available online, and also distributed in hard copy to libraries and community groups.
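The mechanics of an "expected" classification of this kind can be sketched in a few lines. The figures and the method below are purely illustrative, not the Alliance's actual methodology: they simply show how an observed count of adverse events might be compared with a risk-adjusted expectation using a simple normal approximation.

```python
import math

def classify_hospital(observed, expected, n, z_crit=1.96):
    """Classify a hospital's adverse-event count against expectation.

    observed: adverse events actually seen; expected: events predicted
    by a risk-adjustment model; n: eligible cases. Uses a normal
    approximation to the binomial; a real scheme would likely use
    exact or hierarchical methods.
    """
    p = expected / n                  # risk-adjusted expected event rate
    se = math.sqrt(p * (1 - p) * n)   # SD of the event count under expectation
    z = (observed - expected) / se
    if z <= -z_crit:
        return "better than expected"  # significantly fewer adverse events
    if z >= z_crit:
        return "worse than expected"
    return "as expected"

# Hypothetical example: 30 complications where risk adjustment
# predicted 20 in 400 cases
print(classify_hospital(30, 20, 400))
```

A hospital is flagged only when its count falls outside the statistical noise expected for its volume, which is what distinguishes this style of reporting from a raw league table.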

The impacts of information on providers
There were 24 hospitals in south central Wisconsin that were in the Alliance service area and that received a "public report". Hibbard compared them with the remaining 91 hospitals in Wisconsin, which were divided into two control groups of similar size, similar baseline levels of performance and similar characteristics such as average total inpatient days. One group received no report, the other a "private, confidential" report.

The study is unusual in that it examined the nature of changes in both outcomes and processes, asking hospitals how many of seven specified reforms to reduce complications in obstetric care they were implementing. An earlier study by the same authors found that in the nine months after the release of the public and private reports, "public report" hospitals had undertaken significantly greater quality improvement efforts than "private report" or "no report" hospitals. This study also revealed that hospitals expected the report to affect their public image but not their market share.118

The later study largely confirmed these findings. For obstetric care:

• In the public report group (whose information was given to them and was also published locally), a third of hospitals achieved statistically significant improvements in obstetric care in the two years following the report, and only 5 per cent showed a significant decline. Of the eight hospitals with baseline performances "worse than expected", only one was still at this level two years later. The average score of this group for implementing the seven specified reforms to reduce complications was 4.1 (and 5.7 for those which showed an improvement two years later);

• In the private report group (whose information was given to them but not published locally), 25 per cent of hospitals improved, and 14 per cent declined. Two thirds of the hospitals with baseline performances that were "worse than expected" were still at this level two years later;

• In the no report group (whose information was neither given to them nor published locally), about 12 per cent improved. Nearly two thirds of the hospitals with baseline performances that were "worse than expected" were still at this level two years later.

Results for cardiac care echoed those for obstetric care, but did not reach statistical significance, which the authors explained by noting that there were far fewer hospitals with poor scores in cardiac care. There was not enough baseline variation in hip and knee care for the authors to study it for post-report changes.


117. Hibbard J, Stockard J and Tusler M, "Does publicizing hospital performance stimulate quality improvement efforts?", Health Affairs, 2003
118. Ibid


These results demonstrate the additional power of public reporting to change the behaviour of providers. However, its effects on local patients are interesting and at first sight contradictory. The QualityCounts report was well known among both employees covered by the Alliance and the general public. Of employees covered by the Alliance, 57 per cent had seen the report, and 61 per cent had been exposed to it to some degree, such as by seeing the report, reading about it in the press or hearing about it from another person. Immediately after the report was released, 39 per cent of the general public had been exposed to the report. Two years later this fell to 24 per cent.119

In addition, those who had seen the report gained a good understanding of the quality of local health services. Immediately after the report was published, about 35 per cent of respondents correctly identified the high-performing hospitals, and 63 per cent the poorly-performing hospitals. Of those that had not seen the report, the proportions were 21 per cent and 39 per cent respectively. Some two years later there was little change in identification of the high performers, but a decline in identification of the low performers.

There was, however, no significant change in market shares during the period from just before release of the report to two years after. There were no shifts away from low-rated hospitals or toward higher-rated hospitals in overall discharges, or in obstetric or cardiac care cases, during any of the examined post-report time periods.120

This raises interesting questions about the motivations of both patients and providers. Given robust and easily understood data about the quality of care, patients did not use it to change providers, echoing the results of our survey and the experience of those investigating cardiac surgery in New York. Early on, local hospitals expected their report to affect their public image but not their market share, and they were correct in this expectation. The main driver of change seems to have been a desire to maintain a good reputation, rather than fear of an actual decline in revenue.

That said, all hospitals were "slightly negative" about the general idea of publicly reported outcome measures. When asked to rate the validity of these measures from 1 (denoting not valid at all) to 5 (denoting very valid), the "public report" group was the most sceptical (mean rating of 2.1), the "private report" group was the most positive (mean rating of 2.6), and the "no report" group was somewhere in between. There were similar results when the groups were asked how appropriate the QualityCounts report was for use by consumers, and how effective it was as a way of driving up quality of care. Within the "public report" hospitals, those with the lowest scores were the most sceptical. Nevertheless, the most impressive improvements were achieved by precisely those hospitals that were most hostile to the report, the poorly-performing "public report" group.

The conclusion to be drawn from the QualityCounts experiment is that public reporting of outcomes data is an optimal way of securing gains in the quality of care. However, as Hibbard et al observe, there are three conditions that must be met if such reports are to enjoy similar success elsewhere. The report must be widely distributed throughout the target community, the information itself must be presented in a way that is easy to understand, and the hospitals themselves must know that another public report will be published in the near future, so that they have an incentive to address any shortcomings revealed in the first report.

All of these seem to be common sense, and the first in particular would be relatively simple to achieve in Britain. The methods used in Wisconsin to ensure a wide distribution of the report – local newspapers, direct mailing, websites and the provision of hard copies in libraries and other public facilities – can also be employed in this country. Indeed, the fact that the primary care sector is so much more widely and frequently accessed by patients in Britain than by their American counterparts is an advantage in disseminating information. GP surgeries, already highly visible features of most communities in this country, can become the main point of call for information about the quality of local healthcare providers.

119. Ibid
120. However, the failure of the report to impact on market share may be at least partly due to the particular circumstances of Wisconsin. An unusual feature of the state's health market is the close alignment between hospitals and physician groups. Physicians tend to practise only at hospitals aligned with their own physician hospital organisation (PHO). Over 85 per cent of physicians in the area served by the Alliance operate in such a system. Whereas only 30 per cent of hospitals nationally have either formal or informal alignments with PHOs, almost all hospitals in south central Wisconsin do. The significance of this for patients is that if they decide to change hospitals they are likely also to have to change physicians, something many patients are understandably reluctant to do. The implication is clear: "that in markets where this high degree of alignment is not present, a public report could raise concerns about both reputation and market share, motivating improvements through both of these pathways."

The second condition is not without hazards. Making a report easy to evaluate may sometimes mean simplifying the data to a degree that would meet with objections from providers. League tables, in particular, are likely to be criticised as crudely and baselessly presenting some hospitals as simply "better" than others, when the reality may be much more complicated and ambiguous. League tables tend to emphasise differences in rank which may correspond to no practical difference in quality, thus creating false criteria for choices. In Wisconsin's case, going to the first or fifth ranked hospital would have made very little difference to the patient. A league table approach is likely to increase gaming and arguments over the numbers, as very small changes in performance can make a big difference to ranking.
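The rank-instability point can be made concrete with a toy simulation (hypothetical figures throughout): ten hospitals with identical underlying mortality still produce a league table with an apparent "best" and "worst", purely through sampling noise.

```python
import random

def simulated_league_table(n_hospitals=10, cases=500, true_rate=0.05, seed=1):
    """Build a mortality 'league table' for hospitals whose TRUE quality
    is identical. Any ordering that emerges is sampling noise alone."""
    rng = random.Random(seed)
    rates = []
    for h in range(1, n_hospitals + 1):
        # Each patient dies with the same probability at every hospital
        deaths = sum(rng.random() < true_rate for _ in range(cases))
        rates.append((deaths / cases, f"Hospital {h}"))
    return sorted(rates)  # "best" first, "worst" last

table = simulated_league_table()
print("top of table:   ", table[0])
print("bottom of table:", table[-1])
```

Rerunning with a different seed reshuffles the ranks, which is the league-table critique in miniature: differences in rank need not correspond to differences in quality.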

Wisconsin succeeded in creating data that was easy to evaluate because it eschewed league tables in favour of measuring against expected standards. Anecdotal discussions about unpublished polling data in the United States suggest that patients understand this, with three quarters of citizens wanting comparison against expected performance rather than between physicians.

Conclusions
Before going on to consider practical examples of how to use information about quality as a more effective tool, let us recapitulate the key lessons from our review of experience to date, our survey of patients and our review of currently available data.

• Measuring quality of care and publishing the resulting data makes health services more accountable to patients and the public. It may also lead to improvement in these services by encouraging patients, providers or both to behave in different ways.

• When it is done well, as in the case of the VA or the QualityCounts initiative in Wisconsin, measurement and publication successfully improve the quality of care.

• It is, however, hard to do well. There are two distinct difficulties: the technical challenge of measuring quality, and presenting the information in a way that improves quality.

• The effects of linking formal incentives, such as pay for performance, to quality measurement are ambiguous. Such incentives encourage providers to take the quality agenda seriously, but also encourage unintended perverse responses such as gaming and falsification of data. It is not clear that the extra costs of providing a financial incentive buy sufficiently better performance.

• Publication is important. As the QualityCounts initiative shows, publishing quality measurements, rather than just reporting confidentially back to providers, seems to increase the likelihood of changing their behaviour.

• However, evidence from the US and the UK is at best ambiguous about whether patients want this information to enable them to choose between different providers. Our own survey confirms unpublished results from the US suggesting that this is not a priority. In addition, very well studied publication schemes in the US, such as QualityCounts and cardiac surgery mortality in New York State, showed that publishing data did not lead to changes in hospital market share. The LPCP, by contrast, appears to have been successful, although this may reflect both the large infrastructure put in place to support choice and the relatively simple metrics needed to understand how long one waits (a somewhat simpler proposition than comparing the quality of two services).

• There are currently many gaps in available data, both in the UK and globally. This is partly due to a lack of sub-hospital information (i.e. information at the level of an individual service), but there is also a lack of outcome data other than measures of failure. There is not an internationally accepted list of performance measures that the UK can just pick up and use.

In the following chapters we consider some of the technical problems of measurement and then suggest ways of filling some of the gaps in the information that is currently available.


5. Technical difficulties

Introduction
The previous chapters have sought to demonstrate the results that have been achieved by efforts to measure and improve the quality of healthcare. The case studies of the VA, Wisconsin and New York show that these measurement systems, when combined with active programmes to drive up the standards of poorly performing providers, have led to dramatic improvements in the quality of care received by patients.

Given these examples of apparently successful use of measurement, one may wonder why the practice is not more widespread. At least part of the answer is that it is hard to do. There are difficulties in establishing measurement regimes that stimulate improvements without also producing unintended consequences. These include the design of the incentive system around the measurement system; ensuring adequate data quality; and designing analysis that is sufficiently sophisticated to allow for the complexities of healthcare, yet easily understood by its target audience. There are also problems in deciding where and how to publish the results. These difficulties are technical rather than ideological.

No credible attempt to argue the case for measurement can fail to consider the obstacles faced by such an approach. This chapter examines three of the most significant: the question of whether it is better to measure outcomes or processes; the distorted clinical priorities that can result from the rigid application of certain measures; and the fundamental problem of whether what is intended to be measured is what is actually being measured.

What to measure: outcomes or process?
Most health experts caution against using outcomes to provide a judgement of the quality of care. For them, outcomes are determined by a range of variables, of which quality of care is only one. The most obvious of these is case-mix, and the complexity and intensity of the care that patients receive as a result. "Hard cases" – patients in poor health and/or suffering from complications – are less likely to attain good outcomes than healthier patients receiving the same quality of treatment. Of course, almost all outcome measures go to considerable lengths to adjust for case-mix, but, as Lilford warns, this "can lead to the erroneous conclusion that an unbiased comparison between providers then follows… Making judgements about quality of care on the basis of risk-adjusted comparisons cannot guarantee that like is being compared with like".121 This is the "case-mix fallacy".
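The case-mix point can be illustrated with a toy simulation (all risks hypothetical): two hospitals deliver identical quality of care, yet the one with the harder case-mix records a visibly worse crude mortality rate.

```python
import random

def crude_mortality(share_severe, n_patients=2000, seed=7):
    """Crude mortality for a hospital with FIXED quality of care.

    Illustrative risks: 'hard cases' die at 15%, mild cases at 2%,
    regardless of which hospital treats them -- so any difference in
    crude rates is driven by case-mix alone."""
    rng = random.Random(seed)
    deaths = 0
    for _ in range(n_patients):
        severe = rng.random() < share_severe
        risk = 0.15 if severe else 0.02
        deaths += rng.random() < risk
    return deaths / n_patients

tertiary = crude_mortality(0.60)   # referral centre: 60% hard cases
district = crude_mortality(0.20)   # district hospital: 20% hard cases
print(f"tertiary {tertiary:.1%} vs district {district:.1%}")
```

Risk adjustment attempts to correct for exactly this, but, as the Lilford quotation above warns, adjustment can never guarantee that like is being compared with like.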

However, the variables affecting health outcomes do not stop at the case-mix of patients. There is, for example, a problem of definition: whether or not a live birth or fractured neck of femur has taken place is self-evident, but what counts as heart failure, infertility or myocardial infarction?122 Much depends on the discretion and the subjective judgement of the clinicians on the scene. Variation in definition can affect recorded health outcome data as much as, if not more than, variations in the quality of care.

These problems are not merely theoretical. Apparent variations in outcome may be so influenced by case-mix variability and definition that they do not relate closely enough to quality of care to be cited as a measure of how well a hospital performed. Several studies have shown examples of poor correlation between quality of care and outcome in practice. One review concluded that most hospitals in the top 5 per cent for mortality rates (the "worst" hospitals) will not be in the lowest 5 per cent for quality of care.123 Three separate studies found no association between outcome and quality of care in acute medical care,124 myocardial infarction125 and congestive heart failure or pneumonia.126 Other work has discovered relatively weak associations for stroke,127 hip fracture128 and myocardial infarction.129

121. Lilford et al, "Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma", 2004
122. Lilford et al, 2004

This problem is exacerbated when a league table ranking is used to judge the relative performance of different hospitals, as the different rankings of hospitals on the table may not reflect any meaningful differences between their performances. Lilford et al's conclusion, typical of the views of many health policy specialists and clinicians, is that although subjecting healthcare to some kind of performance measurement is valuable, "league tables of outcomes are not a valid instrument for day-to-day performance management by external agencies."130

This prompts the question of what is an appropriate way of measuring the quality of care. The response offered by Lilford et al is that, although outcomes should still be measured and used by hospitals themselves for their own performance management and quality improvement, measuring adherence to clinical process has more value.131 In other words, clinical care should be judged according to whether appropriate medical interventions are made during a patient's care, such as providing beta blockers after acute myocardial infarction, using lower tidal volume in acute respiratory distress syndrome, or even just providing influenza vaccinations to those at risk. Process measures are already used in both the US and the UK. The American Medicare system and our own GP contract include measures such as whether the correct antibiotics are prescribed for pneumonia, and the influenza vaccination rate.132

Process measures can claim two major advantages over outcome measures. The first is clarity: failing to meet agreed, evidence-based standards of care, such as providing beta blockers, can be regarded as actual bad care, rather than just an indicator of bad care. Unlike process data, a high mortality rate does not tell a hospital precisely what the cause of the problem is. Lilford et al ask us to consider three of the most infamous cases of clinical failure in the NHS in recent years: the Bristol Royal Infirmary, Dr Harold Shipman, and a cardiac surgeon with exceptionally high mortality rates. These were, respectively, the products of poor systems of care and scrutiny, a murderous personality, and a surgeon operating while suffering from an undiagnosed brain tumour. "Even in these most extreme situations", note the authors, "we are unable to reliably use outcome data to judge where the quality of care was deficient."133

Secondly, there is efficiency, as the measurement can be taken at or near the moment care is delivered, rather than waiting for 30 days (or more, depending on the measure) to assess the health outcome. The information costs attached to process measures are generally lower. One study estimated that variations in quality of care may cause a 10 per cent difference in mortality across hospitals and that, although 3,619 patients from each hospital would have to be assessed to detect this, it would take only 48 from each hospital to detect the corresponding difference in process measures.134
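The mechanics behind this kind of sample-size gap can be sketched with the standard normal-approximation formula for comparing two proportions. The parameters below are illustrative only, not those used by Mant and Hicks: the point is simply that a small absolute gap in a rare outcome such as mortality demands thousands of patients, while a large gap in a common process measure demands only dozens.

```python
import math

def n_per_group(p1, p2):
    """Approximate patients per arm to detect a difference between
    proportions p1 and p2 (two-sided alpha = 0.05, power = 0.80),
    using the usual normal-approximation formula."""
    z_alpha = 1.96  # critical value for two-sided alpha = 0.05
    z_beta = 0.84   # critical value for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# A small absolute gap in mortality (e.g. 10% vs 12%, hypothetical)
# needs thousands of patients per hospital...
print(n_per_group(0.10, 0.12))
# ...while a large gap in a process measure (e.g. 40% vs 60%
# guideline compliance, hypothetical) needs well under a hundred.
print(n_per_group(0.40, 0.60))
```

The contrast between the two printed figures is the efficiency argument in miniature: the signal in process data is simply much larger relative to its noise.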


123. Thomas J and Hofer T, "Research evidence on the validity of risk-adjusted mortality rate as a measure of hospital quality of care", 1998
124. Best W and Cowper D, "The ratio of observed-to-expected mortality as a quality of care indicator in non-surgical VA patients", Medical Care, 1994
125. Jencks S et al, "Interpreting hospital mortality data: the role of clinical risk adjustment", JAMA, 1988
126. Park R et al, "Explaining variations in hospital death rates: randomness, severity of illness, quality of care", JAMA, 1990
127. McNaughton H et al, "Relationship between process and outcome in stroke care", Stroke, 2003
128. Freeman C et al, "Quality improvement for patients with hip fracture: experience from a multi-site audit", 2002
129. Krumholz H et al, "Evaluation of a consumer-oriented internet healthcare report card: the risk of quality ratings based on mortality data", JAMA, 2002
130. Lilford et al, 2004
131. Ibid
132. Ibid
133. Ibid
134. Mant J and Hicks N, "Detecting differences in quality of care: the sensitivity of measures of process and outcome in treating acute myocardial infarction", British Medical Journal, 1995



Does this mean that we should abandon publication of outcome measures in favour of process measures? Arguments against this view cite the fundamental purpose of healthcare (it is ultimately about improvement of health and alleviation of suffering, not adherence to protocols), and some of the weaknesses of process measures. Process measures may restrict innovation by requiring providers of care to perform certain clinical interventions, whereas outcome measures essentially leave it up to them to decide how to achieve good outcomes. We consider this in greater detail in the next section. Process measures may also be prone to being unimaginatively followed in ways that create unintended or perverse consequences. As Smith and others observe, "many measures of process are highly vulnerable to misrepresentation by care providers, and may induce seriously dysfunctional behaviour such as gaming."135

Ultimately, there seems little reason to make a choice between process and outcome measures, as they can complement each other. The idea that different types of measures serve as "tin openers and dials" is apposite here. Outcome measures, as long as they are designed and applied with sophistication, can act as tin openers, exposing the existence of a problem that needs to be investigated further. Process measures, if they can be credibly linked to the outcomes, then identify whether there really is a problem and where exactly it is located. Among the advantages secured by combining both types of measure is that it may reassure providers that they are not being judged by incomplete and unhelpful data, but by well-rounded information that identifies exactly what, if anything, they are doing wrong.

The problem of rigidity: measures are not the same as guidelines

Another criticism of process measures is that they are an overly rigid and inflexible version of practice guidelines. They can encourage clinicians to make interventions in order to “tick a box” rather than because they judge them necessary for the health of the patient.

Walter et al examined the particular case of the San Francisco VA Medical Center, where clinicians were told that failing to raise the centre’s rate of colorectal cancer screening from 58 per cent to the VA national target rate of 65 per cent could incur financial penalties. The implication was that low screening rates indicated poor quality of care, when in fact in some cases it meant the opposite. For many patients, such as those already suffering from severe conditions or with strong objections to screening, more harm than good may be done by subjecting them to screening. In such cases, good quality of care will take the form of doing the very opposite of what the San Francisco doctors were instructed, and given incentives, to do.

The authors identified that the problem lay in the blurring of the line between two forms of external monitoring that should be kept distinct: practice guidelines and process measures. The VA’s 65 per cent target was derived from practice guidelines, which by themselves are useful and evidence-based. (Trials show that patients who have been screened do generally suffer less colorectal cancer mortality.) However, as the authors stressed, “performance measures are not the same as practice guidelines.”136

Guidelines are to be applied with discretion by clinicians themselves. Inherent in their name is the acceptance of a grey area in many clinical decisions: interventions


135. Smith P, "Some principles of performance measurement and performance improvement", Report prepared for the CHI, 2002
136. Walter et al, "Pitfalls of converting practice guidelines into quality measures", JAMA, 2004



that are generally desirable may be inappropriate in particular cases. Process measures, by contrast, set standards that indicate failure if not met. They specify what is “good” and “bad” medical care, often assigning rewards and penalties respectively. In short, practice guidelines are advisory, while process measures are mandatory.

As with the criticism of outcome measures made by Lilford et al (that outcomes do not serve as accurate signifiers of quality of care), this problem is not just theoretical. The Walter study of the San Francisco case uncovered the practical results of the perverse incentives produced by rigid process measures. Although patients with severe conditions associated with life expectancies of only five to ten years are unlikely to benefit from screening (and may be harmed by it), a significant percentage of the 229 patients audited for colorectal cancer screening at the San Francisco centre in 2002 had these characteristics.137 Thirty-five per cent were 75 years old or more, while 24 per cent had severe disease. These included a 94-year-old man with metastatic prostate cancer, an 89-year-old woman with severe dementia, and a 76-year-old man with end-stage renal disease who died two months before the screening date.

However, as the authors maintain, these problems are not proof that process measures are inherently flawed but that they should be carefully designed. In particular, measures should only apply to the care of patients for whom evidence shows that the interventions in question would do more good than harm.138 Process measures are often scored as the number of patients who receive an intervention (the numerator), divided by the number of patients who were eligible for it (the denominator). The problem with the VA measure is that the denominator included patients who did not want to be screened or for whom screening would not have been beneficial. The fact that they were not in the numerator counted as a failure. Allowing the denominator to reflect patient preferences and clinical judgements would only require doctors to document systematically the discussions they have with patients and the recommendations they make to them.
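The point about eligible denominators can be sketched in a few lines of Python. The records and field names below are invented for illustration; the adjusted score simply removes documented opt-outs and patients unlikely to benefit from the denominator before computing the rate.

```python
# Hypothetical patient records; field names are invented for illustration.
patients = [
    {"screened": True,  "declined": False, "unlikely_to_benefit": False},
    {"screened": False, "declined": True,  "unlikely_to_benefit": False},  # opted out
    {"screened": False, "declined": False, "unlikely_to_benefit": True},   # e.g. severe disease
    {"screened": False, "declined": False, "unlikely_to_benefit": False},  # a genuine miss
]

def process_measure_score(patients):
    """Screened / eligible, where the denominator excludes documented
    opt-outs and patients the intervention would likely harm."""
    eligible = [p for p in patients
                if not p["declined"] and not p["unlikely_to_benefit"]]
    if not eligible:
        return None  # the measure is undefined with no eligible patients
    return sum(p["screened"] for p in eligible) / len(eligible)

naive = sum(p["screened"] for p in patients) / len(patients)
print(naive)                            # 0.25: a VA-style rate counts everyone
print(process_measure_score(patients))  # 0.5: only the genuine miss counts against
```

On this toy data the naive rate makes the practice look twice as bad as the adjusted rate does, which is exactly the distortion the authors describe.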

Construct validity: do the same measures give different results?

The most important requirement for an indicator of performance is “construct validity”: that it actually measures what it intends to measure. If two different measures purporting to measure the same thing produce different results, at least one of them can be said to lack construct validity.

Little work has been done on examining the construct validity of existing measures of healthcare, but some of it raises concerns. Brown and Lilford conducted a study of performance indicators in English primary care trusts (PCTs) to determine their construct validity. They compared pairs of indicators that they expected, or hypothesised, would measure the same thing, to see if there was a correlation between the results that they produced. Their findings were not encouraging. Analysis of four indicators purporting to measure access to services (the QOF’s “access bonus”, the star ratings’ “access to quality services” category, Dr Foster’s “equity” rating, and the patient satisfaction survey’s “access and waiting” section) found “insufficient evidence to suggest that these indicators are measuring the same underlying concept…It is impossible to say whether any of the access measures are ‘better’ than the others.”139
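Brown and Lilford's basic test can be illustrated with a short Python sketch: compute the correlation between two series of indicator scores for the same set of trusts and see whether it is high. The scores below are invented; only the method, a Pearson correlation over paired indicators, reflects the study's approach.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length indicator series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented scores for six trusts on two indicators that purport to
# measure the same thing (access to services).
access_bonus = [3, 5, 2, 8, 6, 4]
star_access  = [2, 1, 4, 2, 3, 1]

r = pearson(access_bonus, star_access)
print(round(r, 2))  # about -0.29: no sign the two track the same concept
```

A pair of valid measures of the same construct should correlate strongly and positively; a weak or negative correlation, as here, is evidence that at least one indicator lacks construct validity.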

The same was found of performance indicators that notionally measure different healthcare concepts, but which the authors hypothesise are sufficiently related for results to correlate strongly.140 They hypothesised, for instance, that PCTs with high star ratings or National Health Service Litigation


Technical difficulties

137. Ibid
138. Ibid
139. Brown C and Lilford R, "Cross sectional study of performance indicators for English Primary Care Trusts: testing construct validity and identifying explanatory variables", BMC Health Services Research, 2006
140. Ibid


Authority (NHSLA) ratings would have low hospital mortality, since star ratings measure the overall quality of a primary care trust (including hospital care), while the NHSLA measures safety procedures that can reasonably be considered to form a part of quality of care. Their analysis found, however, that trusts with high star ratings did not necessarily have low mortality rates. PCTs with no stars had a mean mortality ratio of 102.7, those with one star had 99.9, those with two stars had 100.2, and those with three stars had 101.5. Similar results were found for the NHSLA ratings.

Evidence of construct validity was found only for QOF and star ratings measures of screening and preventive healthcare. Their study, Brown and Lilford conclude, “casts doubt on whether any of the available performance indicators help the public to accurately assess the level of care received at their PCT.”

However, the Brown and Lilford study was a very limited one, as they concede. The range of indicators they looked at was small and pertained only to those to which English PCTs are subjected. Moreover, there were some positive findings, such as the correlation to be found between indicators in screening and preventive care. We conclude that the Brown and Lilford work should not be seen as casting doubt on most performance measures in healthcare.

Conclusion

This chapter has considered some of the more significant technical problems that need to be overcome in order to measure the quality of healthcare in a reliable way. Of themselves, these are not ideological objections to using measurement as an accountability or improvement strategy, but they do emphasise the need for technical excellence, clinical expertise and managerial skill in their implementation.

Many of the authors whose work has highlighted these technical difficulties continue to maintain that the basic principle of measuring quality of care is sound and should be maintained. Walter et al141 are typical: “Despite the pitfalls that may occur in converting a practice guideline into a performance measure, the potential benefits of performance measures derived from evidence-based guidelines should not be ignored. The quality of medical care can be improved in many areas, and performance measures are tools that can help achieve selected goals.” Their call is for better designed measures, not for the abandonment or restriction of measures.

It is to answer this call – by identifying the best possible healthcare performance measures currently in existence – that much of this report is devoted.


141. Walter et al, "Pitfalls of converting practice guidelines into quality measures", JAMA, 2004


6. Patient Reported Outcome Measures (PROMs)

We have noted that the biggest gap in the reporting of quality in the NHS is in the area of outcomes of care where nothing disastrous happened. We know how many people died, but we know little about how successful care was for the majority who did not. The QOF has a few measures which judge how well a chronic condition, such as diabetes, is being managed. For the majority of hospital care, however, there is very little such measurement. This chapter suggests that asking patients directly how their everyday life is being affected gives data of real power.

It has been argued that data provided by patients offers an important adjunct to clinicians in the care of their patients.142 Self-completed questionnaires with adequate measurement properties offer a quick way for patients to provide evidence of how they view their health – evidence that can complement existing clinical data. This information can be used to screen for health problems and to monitor the progress of these problems, once identified, as well as the outcomes of any treatment.

Patient-based outcome measures may also help change the culture of the NHS, an organisation which is far from universally patient-focused. It is a politically-led organisation, where the Government, not the patient, is the paymaster. Government targets have often had the perverse effect of taking focus away from patient-centred professionalism.143,144 As the NHS Confederation wrote in 2006, “One way of ensuring that NHS organisations focus on providing services that patients want is by including in their measures of performance a measure to encompass patient outcomes and experience”.145

The results of a 2005 Patient-Reported Health Instruments Group report provide encouraging signs that PROMs may be effective in improving patient involvement and ultimately might improve some important longer-term outcomes.146

There are a number of relevant studies thathave been completed since 2000:

• The London School of Hygiene and Tropical Medicine (LSHTM) carried out research for the Department of Health on the routine use of PROMs in treatment centres between 2005 and the end of 2006. The aim was to identify which PROMs could best be used to measure accurately health changes in treatment centres for five specified surgical procedures, which account for 15 per cent of NHS elective surgery cost.147

• The Orthopaedics and Trauma Unit at York Hospitals Trust has been collecting health outcomes data for patients receiving hip and knee replacements since March 2001, to determine how much patients’ health improved following surgery.

• BUPA has pioneered data collection of patient reported outcomes in hospitals. BUPA has reported particular value from using PROMs to generate ‘safety engineering charts’ which identify outliers from normal practice.


142. Tarlov AR et al, "The Medical Outcomes Study. An application of methods for monitoring the results of medical care", Journal of the American Medical Association, 1989; Nelson EC, Berwick DM, "The measurement of health status in clinical practice", Medical Care, 1989
143. In primary care, Government targets for prevention could generate a tension between doctor and patient: overweight patients, for example, could well feel stigmatised by Government policy and be less trustful of, and willing to seek help from, their GPs. At the secondary care level, Government targets on the time patients waited to be seen in A&E, for example, could lead to a processing mentality by staff whose primary objective was to ensure targets were met, at the expense of patient-centred professionalism.
144. This has raised the fundamental question of the doctor's role: was the doctor's responsibility to do his or her best for the individual patient, or to help meet political targets even in local circumstances that might be incompatible with truly patient-centred professionalism?
145. Edwards N, "Lost in translation: why are patients more satisfied with the NHS than the public?", NHS Confederation, 2006
146. National Centre for Health Outcomes Development, Patient Involvement and Collaboration in Shared Decision-making: A review. Report to the Department of Health, 2005
147. The procedures are cataract surgery; hip replacement; knee replacement; varicose vein procedures; hernia repairs.


It is this final example that we examine in this section of the report.

Background

BUPA is the largest and best known private health insurance company in the UK. 3.1 million Britons have BUPA insurance, as well as another 2.9 million people internationally. It provides care through 35 acute hospitals and 245 care homes.

During the 1990s, BUPA became frustrated with the limits of existing measures of health outcomes used in the UK.148 The NHS tended to do no more than measure various forms of failure, such as readmission and mortality, which account for a relatively small number of all patients, as Figures 1 and 2 show. The independent sector, with its more predictable elective workload, had developed more useful process measures, which defined standard care pathways and monitored deviations from these pathways. However, neither the state nor the independent sector was measuring what is most important: the extent to which patients themselves feel they have gained in health and well-being after receiving healthcare.

To fill this gap, BUPA began the search for a patient-reported outcome measure. As a journal article by a group of BUPA employees notes, “Only the patient understands their own health post-discharge with any degree of richness, so the use of a tool that assesses outcome from the patient perspective is essential.”149

SF-36

In 1998, BUPA eventually settled on the Short Form 36 (otherwise known as SF-36)150 as the measuring instrument it would use. SF-36, developed by the Medical Outcomes Study in the US, contains eight scores for dimensions of well-being, such as vitality and bodily pain, and two summary scores for physical and mental health. Unlike measures for specific conditions, such as the Oxford knee measure, it is a generic instrument suitable for the full range of medical interventions. It has been widely recognised as valid and reliable,151 and its use in areas of care as diverse as orthopaedic surgery,152 hysterectomy153 and coronary artery bypass graft154 has been documented.

BUPA asked patients to complete and return the SF-36 form pre-operatively (before admission) and post-operatively (three months later). The response rate achieved so far for the baseline assessment across all BUPA hospitals is, on average, 10–15 per cent, but much higher in some individual hospitals.

The first form is collected by the hospital, while the second is mailed out to the patient with a freepost envelope. The


148. Vallance-Owen A et al, "Outcome monitoring to facilitate clinical governance: experience from a national programme in the independent sector", 2004
149. Ibid
150. Following concern over the performance of the SF-36 with respect to cataracts, this was dropped in favour of a visual function-specific measure, the VF-14, originally designed for cataract but now validated for other conditions.
151. Ziebland S, "The short form 36 health status questionnaire: clues from the Oxford region's normative data about its usefulness in measuring health gain in population surveys", Journal of Epidemiology and Community Health, 2005
152. Nilsdotter A et al, "Comparative responsiveness of measures of pain and function after total hip replacement", Arthritis & Rheumatism, 2001

[Figures 1 and 2: BUPA hospital rates for clinical indicators. Bar charts plotting % surgical deaths, % transfers and % re-operations as a percentage of total discharges, by hospital]


change in scores from the first to the second form is the measure of health outcome. The results are fed back to hospitals every three months and to consultants every 12 months. Each provider can see their own results. The identity of the individual patients, however, is not revealed to hospitals or consultants. All of BUPA’s hospitals participated in the scheme and, by 2004, over 100,000 patient cases of elective surgery had been recorded.
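The outcome calculation itself is simple: subtract each patient's pre-operative score from the post-operative one. A minimal sketch, using invented SF-36 summary scores:

```python
# Invented SF-36 summary scores (0-100 scale) for three patients,
# recorded pre-operatively and three months post-operatively.
pre  = {"p1": 42, "p2": 55, "p3": 61}
post = {"p1": 68, "p2": 63, "p3": 60}

changes = {p: post[p] - pre[p] for p in pre}      # per-patient health gain
mean_gain = sum(changes.values()) / len(changes)  # a hospital-level summary
print(changes)              # {'p1': 26, 'p2': 8, 'p3': -1}
print(round(mean_gain, 1))  # 11.0
```

Note that a negative change, as for the third patient, records a reported deterioration rather than a data error; it is the distribution of such gains across a provider's patients that carries the signal.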

Problems and solutions

As Andrew Vallance-Owen, Medical Director at BUPA, noted, their original vision for outcome measurement proved to be a “little optimistic”: problems began to emerge.

The single most significant problem was the question of how to present the data. Initially, league tables and histograms were used. However, these were criticised by providers for a range of reasons, including being difficult to interpret and providing only a snapshot view rather than presenting trends.

League tables were especially unpopular. As Vallance-Owen et al observe: “Outcomes are influenced by random variation as well as case mix, which can significantly alter a hospital’s position in a league table. Nonetheless, there is an implicit assumption that hospitals located towards the bottom of a league table provide a worse service. This may be inaccurate and, if so, is demoralising for staff”.155 It was felt that too many hospitals (about 1 in 20) were identified as outliers, a problem which meant that hospitals were wasting staff time on additional audits when their performance was actually within the normal range, while rewards were given to the “top” hospitals when their performance was actually due to chance.

To address these concerns, BUPA started presenting the SF-36 data in the form of Shewhart charts, otherwise known as control charts. Control charts plot data in relation to bands, or “audit lines”, which are three standard deviations above and below the mean: any variation within these bands is attributed to common causes, while variations which exceed the bands are attributed to special causes.

There are several advantages to control charts. First, they are relatively easy to interpret. Second, looking into special causes is quicker and simpler than investigating common causes. It “enables one to examine clinical practice, to identify the factors behind apparently exceptional or potentially poor performance and hence facilitate clinical governance”, which, after all, was the purpose of BUPA’s efforts at measuring outcomes.156 Third, they allow trends in performance to be identified, as opposed to just a snapshot view.
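A minimal Python sketch of the audit-line logic, using invented hospital-level mean health gains. Real control-chart practice estimates sigma more carefully (for example from moving ranges, or excluding known special causes), so the plain standard deviation used here is a simplifying assumption.

```python
import statistics

def audit_lines(hospital_gains):
    """Three-sigma 'audit lines' around the mean health gain; hospitals
    outside the bands are flagged as special-cause variation."""
    values = list(hospital_gains.values())
    centre = statistics.mean(values)
    sigma = statistics.pstdev(values)  # simplification: plain SD over all units
    lower, upper = centre - 3 * sigma, centre + 3 * sigma
    flagged = {h: g for h, g in hospital_gains.items()
               if g < lower or g > upper}
    return (lower, upper), flagged

# Invented mean SF-36 health gains (post minus pre) per hospital.
gains = {"A": 9.8, "B": 10.2, "C": 10.0, "D": 9.9, "E": 10.1, "F": 10.3,
         "G": 9.7, "H": 10.0, "I": 10.2, "J": 9.8, "K": 4.0}

(lower, upper), flagged = audit_lines(gains)
print(flagged)  # {'K': 4.0}: only hospital K breaches an audit line
```

Unlike a league table, which always places someone last, this approach flags only the one hospital whose result cannot plausibly be explained by common-cause variation.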

Another problem encountered by BUPA was the scepticism with which consultants initially responded to the measuring process. Many were hostile to the idea of being judged on the basis of the subjective opinion of patients, as opposed to more technical and quantitative measures of well-being, such as the range of movement of a joint in degrees or visual acuity as gauged by a Snellen chart.157 From the spring of 2002, steps were taken to address these concerns. For example, following complaints from ophthalmologists, SF-36 was replaced by a specific questionnaire (the Visual Function 14) for phacoemulsification of the lens.158 In addition, reports were not sent to consultants until at least ten of their patients had been followed up.

It should be noted that some consultants, such as orthopaedic specialists, were very happy with SF-36. Andrew Vallance-Owen and Brian Matthews, the project manager in charge of the PROMs project at BUPA, note that the specialties which were most positive about SF-36 were those that registered impressive health gains after three months.159

A related problem was the frustration of other hospital staff, who felt that the


153. Coulter A, Peto V and Jenkinson C, "Quality of life and patient satisfaction following treatment for menorrhagia", 2004
154. Connerney I et al, "Relation between depression and coronary artery bypass surgery and 12 month outcome: a prospective study", 2001
155. Vallance-Owen et al, 2001
156. Ibid
157. Ibid
158. Ibid
159. Vallance-Owen A and Matthews B, "Measuring clinical outcomes through patient surveys in independent hospitals. A presentation", 2001


anonymity afforded to individual consultants prevented them from identifying poor performers and protecting patients (and their hospital’s reputation).160 Here, a balance had to be struck between clinicians, who opposed greater openness of information, and managers, who wanted it. BUPA compromised by maintaining consultant confidentiality and developing a system to alert managers to any serious performance issues in their hospitals. The managers are then asked to investigate such issues further through other sources of audit data.

Another obstacle encountered by BUPA was that “questionnaire return rates were not sufficiently high to allow much reliance to be placed on the output”.161

Three years after BUPA’s introduction of SF-36, records showed that 61 per cent of eligible patients received and completed the baseline form. There was considerable variation between hospitals, with more than a quarter showing a rate of over 80 per cent. The failure of hospital staff to enthuse their patients to complete the form (due in part to the frustrations noted above) has been cited as a major reason for the low completion rates. To improve the situation, a postal reminder was sent to patients who had not returned the questionnaire after three months. This raised the return rate to 75 per cent.162

Finally, some of the procedures being subjected to the measures did not contain sufficient cases to yield meaningful results.163 While data was being collected on over 1,000 procedures, some of these only occurred once or twice in each three-month cycle. BUPA initially attempted to solve this problem by grouping together procedures which could be expected to produce similar effects, but no such grouping ever met with enough confidence from some stakeholders. Consultants in particular worried that their results would be distorted by mixing procedures that could have very different rates of recovery. The solution eventually agreed upon was a narrowing of the measurement programme to 20 common procedures which covered the main surgical specialties. These included total hip replacement, CABG, adult tonsillectomy, hysterectomy and surgical removal of impacted teeth.

Impact

There are specific examples of the data leading to problems of care being identified and addressed.164 One BUPA hospital, alerted by the data to poor outcomes in one specialty, conducted further investigation and found the specific area of weakness to be post-operative pain relief for hysteroscopy. Targeted improvements were made. In another example, a consultant discussed his low ratings with his hospital’s head of nursing and learned that the root of the problem was that he gave overly optimistic information to his patients about their recovery. He adjusted his approach accordingly.

“The programme”, Vallance-Owen and his BUPA colleagues conclude, “has shown that clinical outcomes can be measured objectively and collected systematically using a comparatively simple process. It has also demonstrated that feedback must be presented in an understandable and user-friendly manner if it is to influence clinicians.”165


160. Vallance-Owen et al, 2001
161. Ibid
162. Ibid
163. Ibid
164. Ibid
165. Ibid


7. Outcome Measurement in Primary Care

Summary

The NHS has pioneered quality measurement in primary care: it introduced a pay-for-performance scheme in 2004 and has invested £8 billion into primary care services over the past three years.

The introduction of the General Medical Services (GMS) contract was influenced by the ebb of trust from healthcare professionals and the move to active monitoring of performance, which were discussed in Chapter one. The contract made headlines at the end of its first year because GP response exceeded that anticipated: practitioners claimed an average of 83 per cent of the available incentives for carrying out various treatments, whereas the Government had expected a figure of 75 per cent. As a result, GPs’ pay increased by 30 per cent, from an average of £80,000 to £106,000 a year, with some said to be earning as much as £250,000. The public have begun to question whether these generous rewards were matched by the gains in quality of healthcare. In January 2007, the Health Secretary, Patricia Hewitt, admitted that, with hindsight, the Government should have capped the amount that GPs could make from their new contracts.166

This chapter evaluates the criticisms that have been levelled against the GMS contract: that the evidence that it has improved quality is inconclusive; that the wrong things are being measured and rewarded; and that the design of the payment system encourages gaming and penalises practices serving deprived populations.

Background

Before 2004, only a few measures of quality of care were routinely available. These were for services that attracted an additional fee, such as cervical smears, vaccination rates, child health surveillance, minor surgery and contraceptive surgery.167 The GMS contract provides direct monetary incentives to the majority of general practitioners in the UK who are self-employed partners (or principals) and who share in the profit from their practices (including those opting for personal medical services agreements, the permanent local alternative to the national GMS contract). It awards payments, accounting for 25-30 per cent of income, for a wide range of services set out in the QOF, which was developed from a number of different evidence-based schemes. The contract contains approximately 150 quality indicators across four broad areas: clinical, organisational, additional services and patient experience.

The relative payment for each indicator depends on a points system designed to reflect the likely workload involved. Each quality indicator is allocated a maximum payment, and the monetary value of a point in turn depends on practice list size and demographics. For example, over half the maximum points were allocated to clinical performance (550 out of 1,050), and for an average practice with a patient population of 5,550 and three full-time partners the maximum payment for the clinical area alone was £66,000 per annum at 2005-06 rates.
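The arithmetic of this worked example implies a point value of roughly £120 (£66,000 divided by 550 clinical points at 2005-06 rates). The sketch below uses that implied figure; the proportional list-size scaling is a simplifying assumption for illustration, not the exact contractual formula.

```python
# Hypothetical QOF payment sketch. POUNDS_PER_POINT is implied by the
# text's worked example; the list-size scaling is an assumption.
AVERAGE_LIST = 5550
POUNDS_PER_POINT = 66000 / 550  # 120.0 at 2005-06 rates

def qof_payment(points_achieved, list_size):
    """Points earned times the point value, scaled by practice list
    size relative to the average practice (a simplification)."""
    return points_achieved * POUNDS_PER_POINT * (list_size / AVERAGE_LIST)

# An average-sized practice achieving all 550 clinical points:
print(round(qof_payment(550, 5550)))  # 66000
# A practice half the average size achieving the same points:
print(round(qof_payment(550, 2775)))  # 33000
```

The sketch makes visible why the scheme's cost was so sensitive to achievement rates: every extra percentage point of indicators met translates directly into pounds across every practice list in the country.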


166. "GP pay rise 'was mistake'", interview with Patricia Hewitt, 19th January 2007, see http://news.bbc.co.uk/player/nol/newsid_6280000/6280077.stm (last accessed 03/05/07)
167. Ashworth M and Armstrong D, "The Relationship Between General Practice Characteristics and Quality of Care: a national survey of quality indicators used in the UK Quality and Outcomes Framework 2004-5", BMC Family Practice, vol 7, p 68, 2006


The impact of the GMS contract on quality is unclear

A key question is whether the high levels of QOF interventions attained after the introduction of pay-for-performance reflect improvements that were already underway – an acceleration of existing trends – or a distinct response to a new incentive. Even if the latter were true, a subsidiary question remains: was it payment or measurement that encouraged the improvement?

The quality of primary care had already begun to improve before 2004 in response to a wide range of initiatives, such as setting national standards for the treatment of coronary heart disease (introduced in 1999) and diabetes (introduced in 2003), and a national system of inspection.

The most up-to-date evidence, from Martin Roland’s study of the quality of care for three chronic conditions – coronary heart disease, Type 2 diabetes and asthma169 – suggests that the introduction of pay-for-performance was associated with a “modest acceleration in improvement” for diabetes and asthma. These results, set out in Figure 3, are based on care reported in the medical records and do not necessarily represent the care provided. A common criticism of pay-for-performance programmes is that their main effect is to promote better record-keeping rather than better care.170

The GMS contract looks at the “wrong” measures

Too much process, not enough outcome

Despite its name, the Quality and Outcomes Framework is heavily biased towards measures of clinical process; there are only 20 individual outcome measures in the QOF dataset of 150. A balanced mix of outcome and process indicators is essential.


168. Roland M et al, "Quality of Primary Care in England with the Introduction of Pay-For-Performance", The New England Journal of Medicine, pp 181-190, 12th July 2007
169. Ibid
170. Rosenthal M and Frank R, "What is the Empirical Basis for Paying for Quality in Healthcare?", Medical Care Research and Review, vol 63 (12), pp 135-157, 2006

Table 5: performance initiatives introduced before 2004168

• National standards for the treatment of major chronic diseases, such as the National Service Frameworks for coronary heart disease (1999) and diabetes (2003)
• Contractual requirement for practitioners to undertake a clinical audit (initially a requirement in the 1990 contract)
• Financial incentives for cervical cytologic testing and immunisation (early 1990s)
• Widespread use of audit and feedback by the primary care trusts
• Release of comparative data for quality of care to practitioners (common) and the public (rare) by the primary care trusts
• Annual appraisal of all primary care physicians (by the primary care trusts and including discussion of some audit data)
• National system of inspection and monitoring of performance (by the Healthcare Commission)

Figure 3: mean scores for clinical quality at the practice level for coronary heart disease, Type 2 diabetes and asthma, 1998-2005 [line chart of mean practice quality score (%) by year; series: CHD, asthma, diabetes]


Looking at the wrong interventions
The principle underlying the points system is that it rewards GPs and their staff for the quantity of work done. The weighting of points was determined by two small groups of GPs who estimated the work required to achieve the different interventions. This approach – basing the rewards on perceptions of likely workload – has the advantage that it encourages GPs to give equal weight to all quality indicators, rather than prioritising the less burdensome ones. But likely workload may not reflect likely health gain.

Last year, Robert Fleetcroft and Richard Cookson assessed whether there is an association between improvements in health and the financial incentives of the GMS contract.171 To do so, they examined eight preventive treatments covering 38 of the 81 clinical indicators in the quality and outcomes framework (Figure 4). The maximum payment for each service was calculated and compared with the likely benefit in terms of lives saved per 100,000 population, based on evidence from a widely endorsed study by McColl et al. Maximum payments for the eight interventions make up 57 per cent of the total maximum payment for all QOF clinical interventions.

Figure 4: potential quality payments against potential lives saved for the eight interventions

Intervention (number relates to graph) | Maximum lives saved per unit of time | Maximum lives saved per 100,000 per year (% of total) | Maximum payment for typical practice (% of total)
1. ACE in heart failure | 76 per 90 days | 308.0 (41%) | £2,400 (6%)
2. Influenza immunisation in over-65s | 146 per year | 146.0 (20%) | £3,600 (10%)
3. Stop smoking advice and nicotine replacement | 120 per year | 120.0 (16%) | £10,440 (28%)
4. Screening and treatment of hypertension | 286 per 4 years | 71.0 (10%) | £17,280 (46%)
5. Aspirin in ischaemic heart disease | 48 per year | 48.0 (6%) | £1,320 (4%)
6. Warfarin in atrial fibrillation | 33 per year | 33.0 (4%) | £0 (0%)
7. Statins in ischaemic heart disease | 69 per 5 years | 13.8 (2%) | £2,750 (7%)
8. Statins in primary prevention | 14 per 5 years | 2.8 (0%) | £0 (0%)
Total | – | 742.6 (100%) | £37,800 (100%)

[Scatter plot: maximum quality payment against lives saved per year per 100,000 population, points numbered 1-8 as in the table]

Potential lives saved for the different interventions ranged from 2.8 to 308 per 100,000 population per year; potential quality payments ranged from zero to £17,280 per annum (at the 2005-06 rates). Fleetcroft and Cookson concluded: "There appears to be no relationship between pay and health gain across these eight interventions… There is a danger that clinical activity may be skewed towards high workload activities that are only marginally effective, to the detriment of more cost-effective activities."172

171. Fleetcroft R and Cookson R, "Do the Incentive Payments in the New NHS Contract Reflect Likely Population Health Gains?", Journal of Health Services Research and Policy, vol 11, pp 27-31, January 2006
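Fleetcroft and Cookson's conclusion can be cross-checked against the figures in Figure 4 itself. The sketch below is illustrative only – a plain Pearson correlation on the eight table rows, not the authors' own method:

```python
# Illustrative re-analysis of the Figure 4 data: does the maximum QOF
# payment for an intervention track its likely health gain?
# Values (lives saved per 100,000 per year, maximum payment in pounds)
# are taken from the table above.
interventions = {
    "ACE in heart failure": (308.0, 2400),
    "Influenza immunisation in over-65s": (146.0, 3600),
    "Stop smoking advice and NRT": (120.0, 10440),
    "Hypertension screening and treatment": (71.0, 17280),
    "Aspirin in ischaemic heart disease": (48.0, 1320),
    "Warfarin in atrial fibrillation": (33.0, 0),
    "Statins in ischaemic heart disease": (13.8, 2750),
    "Statins in primary prevention": (2.8, 0),
}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

lives, pay = zip(*interventions.values())
r = pearson(lives, pay)
# r comes out close to zero (around 0.08) -- consistent with the
# authors' finding of no relationship between pay and health gain.
print(f"correlation between health gain and payment: r = {r:.2f}")
```

A correlation this close to zero across the eight interventions is what "no relationship between pay and health gain" looks like numerically.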

Importantly, two of the interventions that have proven effectiveness – the use of warfarin in atrial fibrillation and statins in primary prevention of heart disease – received no quality incentive payment at all. This is in contrast with other areas, such as personal learning plans, that receive incentives but whose benefits are unsupported by robust evidence.

Distorting priorities
Rodney Haywood has also argued that guidelines for treatment and outcomes, when used for performance measures, may encourage increased intervention and expenditure with little evidence to support their effectiveness. Such measures often become bargaining chips for powerful special interest groups. "Influential parties often have strong incentives to advocate that these measures be aligned with idealised goals... even the most pure-hearted persons and groups with vested interest in issues related to diabetes... have a natural and justifiable tendency to want more attention and resources for their cause."173

At the same time other disease areas – such as mental health, learning disabilities, palliative care of the elderly, osteoarthritis, osteoporosis and multiple sclerosis, where successful outcomes are harder to measure, or whose advocacy groups have less power to press for "idealised" measures of success – are neglected. What is not incentivised is marginalised.

Local versus national priorities
A further criticism of the GMS contract is that it lacks the flexibility to address local health issues, especially pockets of deprivation.

Ethnic minorities often suffer more than the majority population from conditions such as diabetes. Christopher Millett et al analysed outcome measures that indicate high-quality diabetes care, such as patient levels of glycated haemoglobin (HbA1c, which measures the extent to which diabetes is under control), in the London Borough of Wandsworth. They concluded that the introduction of pay-for-performance had not addressed disparities in the management and control of diabetes in ethnic groups.174 Although this particular study somewhat overstates its case, it is nevertheless true that the easiest way of scoring high may be to target the easiest-to-reach groups, thus increasing inequity. For example, if 90 per cent is the cut-off point for maximum payment, there is no incentive to seek out the 10 per cent hardest to reach – and most expensive – patients.
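The cut-off effect can be made concrete with a stylised payment schedule. The thresholds and points below are illustrative assumptions, not the actual QOF values:

```python
# Stylised QOF-style payment: points accrue linearly between a lower
# and an upper achievement threshold, then cap. The 25% / 90% thresholds
# and 1,000-point maximum are illustrative, not real QOF parameters.

def qof_payment(coverage, lower=0.25, upper=0.90, max_points=1000.0):
    """Points earned for covering a given fraction of eligible patients."""
    if coverage <= lower:
        return 0.0
    frac = min(coverage, upper)
    return max_points * (frac - lower) / (upper - lower)

# Marginal reward for reaching the last, hardest-to-reach 10 per cent:
print(qof_payment(1.00) - qof_payment(0.90))  # prints 0.0 -- no incentive
```

Above the upper threshold the schedule is flat, so the marginal payment for the hardest-to-reach patients is exactly zero – which is the inequity the text describes.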

Of course, reducing these types of health inequality would not be a priority in areas with smaller minority ethnic populations and, for this reason, the possibility of some local discretion in setting a measure of "success" is attractive. Multi-ethnic Wandsworth might want to include measures of equality of access and outcome; other local health services could address age or socioeconomic inequality; and so forth.

Unintended consequences
A high level of gaming?
As Tim Doran has written: "Evidence-based quality indicators should not be applied unthinkingly, since patients have co-existing conditions that affect their optimal care. It is inappropriate, for example, to strive to control the cholesterol level of someone terminally ill with cancer."175

172. Ibid
173. Haywood R, "Performance Measurement in Search of a Path", The New England Journal of Medicine, p 951, 1st March 2007
174. Millett C et al, "Ethnic Disparities in Diabetes Management and Pay-for-Performance in the UK: the Wandsworth prospective diabetes study", PLoS Medicine, June 2007

Clearly, intrusive tests on dying patients are both inefficient and unethical. The GMS contract recognised this and allowed GPs to exclude patients from specific treatments ("exception reporting"). However, this provided an opportunity for practitioners to increase their income by excluding patients inappropriately.

A small number of practices appear to have achieved their high scores in the first year of the contract by excluding large numbers of patients. More than 1 per cent excluded over 15 per cent of their patients.176 It is possible that those practices which were better at identifying and treating patients were also better at identifying patients for whom the targets were inappropriate, and the low level of exception reporting suggests that gaming on a large scale was rare.

The rate of exception reporting varied considerably by disease group; this may reflect variation in the ease of meeting individual targets or the amount of financial reward available for each. For example, there were low levels of exception reporting for hypothyroidism (worth about £456 in the first year of the contract) and high levels for mental health problems (worth about £1,748), coronary heart disease and chronic obstructive pulmonary disease.177

Has the new contract created widespread gaming throughout primary care in the NHS? Probably not. But the very high levels of exclusions in a minority of practices suggest that the capacity to game exists and is being exploited. There is also the danger of double counting – by including care provided in hospitals or from district nurses employed by primary care trusts.

The GMS contract and deprivation
The option to exclude patients is not the only reason why the contract does not adequately reward practices that seek out the neediest patients. During the final negotiations on the contract in 2004, it was decided to take into account the number of patients in a practice with a particular disease, the better to match payment to workload. But rather than use true prevalence of disease, an approximation, the adjusted disease prevalence factor (ADPF), was formulated. Although the calculations under ADPF are complex, the main aims, wrote Bruce Guthrie, were to "reduce variation [in payment] and relatively protect the losers, while at the same time providing fair rewards to those who have the highest prevalence".178

Evidence suggests that the ADPF does not reduce variation in total practice income: there is a forty-four-fold variation in payment for practices treating the same number of patients to the same level of quality.179 The aim of fair pay for workload is therefore not met. Moreover, the ADPF institutionalises what has been termed the "inverse care law": the tendency for more funding to go to areas that have less need of it than others. The ADPF thus fails to achieve its main goals, and illustrates the problems that occur when complex adjustments are introduced too rapidly and without adequate pilot trials.

Variability in the quality of care offered by different practices has long been a cause for concern, and overall QOF scores have already been found to be lower in areas of social deprivation. Could this be explained by the difficulty of providing good quality care to the neediest populations? Could a link between social deprivation and poor primary care quality scores arise because such practices have larger, more unmanageable lists, or because they have a higher turnover of patients, making it difficult to accumulate sufficient clinical success? Mark Ashworth and David Armstrong investigated and found that "a high proportion of patients aged 75 years or over or a high proportion of the local population being born in a


175. Doran T et al, "Pay-for-Performance Programs in Family Practices in the United Kingdom", The New England Journal of Medicine, pp 375-384, 27th July 2006
176. Ibid
177. Ibid
178. Guthrie B et al, "Workload and Reward in the Quality and Outcomes Framework of the 2004 General Practice Contract", British Journal of General Practice, November 2006
179. Ibid


developing country did not have an independent effect on the QOF score." They also found no evidence of lower QOF scores with very high list sizes. Similarly, "high turnover of the registered practice population appeared not to make it more difficult to achieve higher QOF scores".180 In other words, the researchers did not find that deprivation of itself caused poor quality primary care.


180. Ashworth M and Armstrong D, "The Relationship Between General Practice Characteristics and Quality of Care: a national survey of quality indicators used in the UK Quality and Outcomes Framework 2004-5", BMC Family Practice, vol 7, p 68, 2006


8. Recommendations

Summary: what can and should we do?
We began this report by noting that NHS professionals have traditionally been self-regulated and have had professional autonomy. This model was rejected in the 1990s owing to poor health outcomes, long waiting times, and a series of scandals involving incompetent or malign clinicians. The Government's response was the development of national standards and performance management – including clinical governance and monitoring by the CHI – and a massive increase in public spending on the NHS. Some improvements in care have been achieved; for example, the NHS now has shorter waiting times for virtually all points of access to healthcare. In other areas, however, significant problems persist: England still suffers from relatively poor outcomes in many areas of care, and the regulatory framework and systems of monitoring, accountability and regulation are confusing and underdeveloped, and hence often ineffective.

In England, between 1997 and 2002, the Government sought to ensure that patients would have access to good local services. A system of star ratings held chief executives of NHS organisations to account for the delivery of Government priorities, while CHI undertook a comprehensive rolling programme of inspections. Since 2002, the Government has sought to ensure better access in a pluralistic system, including NHS trusts, foundation trusts and private providers. The Government has tried to facilitate patient choice between competing providers on the basis of quality. However, the arrangements for, and underpinning of, this pluralistic system have caused problems:

• The Healthcare Commission is tasked (and funded) to undertake inspection of healthcare services on a targeted and proportionate basis. It is therefore not sufficiently large to undertake a comprehensive rolling programme of inspections (including visits) of the complete variety of publicly financed providers – a population that has increased massively with the encouragement of new entrants into the NHS 'market'.

• Given the limitations on resources, the majority of Healthcare Commission activity in relation to the NHS has been focused on its statutory obligation to produce an annual healthcheck that assesses the quality of NHS providers. However, this is limited in two ways. First, the parameters of the healthcheck are largely determined by Government priority (targets new and existing, and minimum standards for NHS care); second, the system of assessment must, by statute, apply to the whole organisation. This creates a weakness: important dimensions of quality that lie outside Government targets, or cannot be related directly to minimum standards, cannot be reflected adequately in the healthcheck.

• A second limitation that current statutes place on the Healthcare Commission is a different legal set of requirements, and hence different regulatory and assessment regimes, for the NHS and the independent sector. In a system which is supposed to be pluralistic and decentralised, it makes no sense to judge the same service provided by the NHS and independent sectors in different ways.

• It is unclear how two different policies to improve quality relate: inspections by the Healthcare Commission and patient choice. To drive up quality through patient choice, patients ought to have specific and timely information on services – not just information at the aggregated level of the hospital. While the Healthcare Commission does undertake service-specific assessments, these have been of services such as mental health or chronic conditions which are less susceptible to improvement through competition (and where, therefore, regulatory intervention is a more pressing need). There is therefore a paradox: the areas on which a regulator should properly concentrate its resources are precisely those where publication of performance is least likely to stimulate patient choice.

• We lack evidence to support the idea that patient choice is a driver of quality of care. Indeed, the evidence from the US is that patients do not always use this information when it is available. Moreover, our own survey shows that patients want information at the aggregated level of a hospital (which is a weak indicator of quality), primarily for the purposes of reassurance rather than choice. Research by the Picker Institute181 also shows that patients have little interest in being offered a choice about which hospital they would like to be admitted to. It may be that over time patient choice could become a key spur to quality improvement in healthcare. However, it has not yet reached that stage.

A future information 'landscape'
In this study we have made a case for collecting more information. We have focused on practical examples of good use of information and explored international examples of best practice. In order to improve the quality of care we suggest that information should be used in three ways:

1. Information for accountability
Accountability measures are designed to assure taxpayers that their local health services (including NHS providers, Independent Treatment Centres and private providers doing publicly funded work) do two things:

• provide at least a minimum acceptable level of care, including safety, access, clinical competence, and compassion;

• spend public money with due care and consideration, and in order to achieve the goals which the Government decides are appropriate.

2. Information for patient choice and activation
Information is often seen as facilitating patient choice, but it also serves activation: the patient's willingness to be involved and assertive in decision-making about their own care.

3. Information for providers (for quality improvement)
This information is used to activate providers' intrinsic motivations of professionalism and altruism. It does not necessarily need to be published, but it does require a mechanism to share the information with providers and hospitals, and to encourage and monitor improvement.

181. Available at http://www.pickereurope.org/ (Last accessed 03.05.07)

Information for accountability
Organisational level
Accountability does not mean that the hospital guarantees that nothing will ever go wrong anywhere within its walls, but that it will do all it reasonably can to ensure a basic level of care. Therefore, the measure should be at the level of the organisation, such as the hospital trust, rather than sub-organisational. This is the only one of the three types of measures that should be at this high level of aggregation.

Nature of data
The data will consist of a mixture of judgements made by competent professionals following a rigorous and consistent process of assessment. An example would be the Healthcare Commission's judgement of core standards, which utilises pre-existing data and inspection by qualified professionals.

Who would gather, analyse and publish the data?
The Healthcare Commission is ideally placed to undertake this role. It already collects, analyses and publishes much of the data required for the purpose of accountability through its annual healthcheck. The current responsibility of the Healthcare Commission to create an individual rating for each institution obscures some of the value of the data, but better organisation and presentation of the information should be relatively straightforward. Another advantage of using the Healthcare Commission is that proper accountability requires independence: the NHS performance management process cannot substitute for outside scrutiny. Equally, private companies which also provide consultancy services for NHS trusts cannot play this role, because of a clear conflict of interest.

Presentation and dissemination
The information should be disseminated through Healthcare Commission publications and the mass and local media. The idea of requiring hospitals to display their current assessment within their premises should also be considered. Presentation should be targeted to the specific questions identified below and, to make the results easily understood, should be presented in a way that shows "better than expected/high quality", "as expected", or "below expectations/some concerns".

The accountability matrix in appendix two illustrates some measures that could be used.

Information for patient choice and activation
Purpose
Information is often seen as facilitating patient choice, but it also serves activation: the patient's willingness to be involved and assertive in decision-making about their own care. Moreover, choice can only really apply to predictable elective care, and especially surgery. It is not generally relevant to emergency care or to the management of long-term conditions, where it will tend to conflict with continuity of care.

Whether for choice or activation, releasing small amounts of information about the quality of specific services to patients with specific conditions, rather than to consumers generally, will provide invaluable support.

Organisational level
Information aimed at supporting choice and activation needs to be at the level of the service: practice, not PCT; department, not hospital.

Nature of data
The data will generally be individual measures of performance which have been sufficiently tested for robustness and usefulness. This type of data needs greater specificity than information for accountability, so judgements are less appropriate than measures.

Earlier in this report, we proposed a classification of measures which covered structures, processes from the point of view of doctors and patients, and outcomes both from the point of view of doctors (measures of mortality and morbidity) and patients (such as the self-reported return of physical function). All of these, except structural measures (vital for organisations to diagnose the causes of their level of performance, but of less immediate value for patients), are suited to providing information for patient choice and activation.

The data required to support these measures are generally available (and are likely to become more so in future), with the exception of PROMs, which are a major omission from current NHS datasets. BUPA have shown that collection can be done, through their extensive use of the SF36. The cost of this data collection is estimated at between approximately £3 and £15 per patient, which would equate to no more than 2 per cent of the 2007/08 Payment by Results tariff for a cataract operation and no more than 0.25 per cent of the same tariff for a knee replacement. The gains to be made from using such measures, and the low cost of doing so, make the case for their introduction into the NHS very persuasive.
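As a back-of-envelope check, the quoted percentage shares imply minimum tariff levels. The sketch below uses only the figures in this paragraph; the implied tariffs are a derivation for illustration, not quoted NHS prices:

```python
# Upper-bound PROM collection cost per patient, from the report
proms_cost = 15.0

# For "<= 2 per cent of the cataract tariff" to hold, the tariff must be
# at least cost / 0.02 (i.e. cost * 50); for "<= 0.25 per cent of the
# knee replacement tariff", at least cost / 0.0025 (i.e. cost * 400).
min_cataract_tariff = proms_cost * 50
min_knee_tariff = proms_cost * 400

print(min_cataract_tariff, min_knee_tariff)  # prints 750.0 6000.0
```

In other words, the report's percentages are consistent with a cataract tariff of at least £750 and a knee replacement tariff of at least £6,000 – plausible orders of magnitude for 2007/08 elective tariffs.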

In terms of the presentation and design of the data, US-based research suggests that:

• information should be focused to allow the patient to make judgements about specific elements of care;
• comparison should be made with norms and acceptable levels of performance, rather than trying to create spurious league tables of providers.

Who would gather, analyse and publish the data?
There are multiple roles to be played in delivering these types of data, and different players from the voluntary, private and professional sectors are able to play them. As with accountability measures, trust is essential for the measures to be valued but, unlike them, this need not require the impersonal formality of a regulator. For one thing, the gaps in what measures are currently available are simply too great for one regulatory body to fill quickly. We propose that as many organisations as meet required standards of technical expertise to produce measures to a high standard of quality should be encouraged to use national data sources to do so.

The issues to be considered when developing these measures are as follows:

• Determining what to measure. Our classification of information types identifies two points of view: that of the patient and that of the clinician. Two groups of bodies are particularly well placed to determine what is measured: disease-specific patient groups (e.g. Diabetes UK) and clinical specialty groups (e.g. the Royal College of Cardiac Surgeons).

• Determining how to measure and analyse the data. This often requires considerable analytical skill and technical expertise, which is located in a range of public and private sector organisations, such as the NHS information centre, the Healthcare Commission, several university departments, the King's Fund, CHKS, and Dr Foster Intelligence, to name just a few.

• Data collection and management. This divides between quasi-administrative data such as HES, which is the responsibility of the NHS information centre; quasi-regulatory data, which is collected and managed by the Healthcare Commission; and clinical audit-type information, which is collected and managed by the Royal Colleges.

• Publication and dissemination. Patient groups are ideally placed for this role: they know who the likely beneficiaries of such information are through their networks of membership, they are a trusted source of information, their dissemination channels are already established, and they have a specific skill and credibility in communicating to their target audiences.

There are many potential models of how this information could be provided. One would be for the Government to support, through "seed money" and guaranteed access to publicly collected data, a set number of projects each year, led by patient groups, to supply information about performance for their specific audience. These groups would be allowed to commission the analysis and data management from a range of approved provider organisations, such as those listed above. Clearly, some entry requirements for providers of analysis would be necessary in order to ensure information of sufficient quality, but this decentralised approach is likely to ensure quicker and less bureaucratic production, easier targeting of information at the right audience, information that comes via a trusted "brand" for the audience, and quicker filling of the gaps in knowledge about healthcare.

The patient choice and activation matrix
Appendix three illustrates an example information matrix for diabetes, drawing, wherever possible, on existing data sources. These measures should be summarised into assessments in the four areas at the "above/as/below expected" levels.

Information for improvement
Organisational level
Information aimed at improvement needs to be collected and presented at the level at which changes need to be made (if necessary, the clinical team or practice).

Nature of data
There are two potential models of the type of data which could be used in this context: the surveillance model or the comprehensive model.

Surveillance model
This model uses a range of outcome measures derived from routinely collected datasets which can be used to identify potential areas of weakness (for example, mortality rates derived from HES) and to stimulate internal analysis and improvement activities. These may either be a wide range of related measures, where a consistent pattern of relatively poor performance would raise concerns, or sentinel measures of serious failure (e.g. death following day-case surgery).

Typically these will be time-series data, analysed with statistical techniques such as CUSUM and control charts, which can spot quite small but significant changes in performance over time. At their best they can run in almost real time, allowing a quick response to problems before they become crises. This is clearly a cheaper option than the comprehensive model, but it carries the risk of missing failing services, or of not picking up weak processes or structural failure before bad outcomes result.
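A minimal sketch of the surveillance idea, assuming a one-sided CUSUM on standardised monthly mortality; the baseline, slack (k) and threshold (h) values are illustrative, not parameters used by any NHS scheme:

```python
# One-sided CUSUM: accumulate deviations above (baseline + slack) and
# raise an alarm when the cumulative drift exceeds a decision threshold.

def cusum_upper(observations, baseline, k=0.5, h=4.0):
    """Return the index at which the upper CUSUM first crosses h, or None."""
    s = 0.0
    for i, x in enumerate(observations):
        # only deviations above baseline + slack contribute
        s = max(0.0, s + (x - baseline - k))
        if s > h:
            return i
    return None

# Standardised monthly mortality (0 = expected level); a sustained
# upward shift begins at month 6 of this illustrative series.
rates = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, 1.5, 1.8, 1.6, 2.0, 1.7]
alarm = cusum_upper(rates, baseline=0.0)
print("alarm at month:", alarm)  # prints: alarm at month: 9
```

The slack parameter k suppresses noise around the baseline, while the threshold h trades off detection speed against false alarms – the same trade-off the text describes between quick response and missed failures.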

Comprehensive model
The alternative model is to create more complete networks covering all aspects of the structure, process and outcome measures from both clinical and patient perspectives, and to use these effectively for internal performance management and diagnosis of issues. At their best, comprehensive models would combine the best elements of good clinical audits, patient experience surveys and PROMs. So this is a gold-standard, but expensive, option.

A rough estimate of the cost of implementing this model nationally, based on the Healthcare Commission's improvement reviews, which are somewhat similar, would be a minimum of £4 million, of which the most significant costs are those of local data collection (this figure excludes costs associated with improvement activities). More fundamentally, the staff needed to undertake this sort of work are not available in the NHS, unless there is a fundamental restructuring of the job role of NHS informatics staff away from unnecessary data collection for central reporting and business management purposes.

Who would gather, analyse and publish the data?
There are a range of different players here, all of whom have slightly different roles, and all of whom, we argue, are needed:

Internal performance management
The failures in quality which were discovered in the 1990s would have been taken more seriously if they had been concerned with financial management. Performance management in the NHS, despite the commendable intentions behind the introduction of clinical governance and core and developmental standards, still concentrates on money and access rather than clinical quality.

Judging quality is not straightforward, and performance management against such judgements is still more complex. But NHS management is now reasonably well structured to take on this role, particularly if the comprehensive model is to be used. Strategic Health Authorities (SHAs) are large enough bodies to collect, analyse and, critically, compare data to allow identification of outliers of good and bad performance. They have established management relationships which allow both accountability for performance and the capacity to share good practice between different providers (in the manner of the Veterans Administration in the US). The local element is also important. The priorities for improvement are likely to differ in different parts of the country, and since SHAs operate below national level, they have the capacity to set quality improvement priorities which more closely reflect local conditions.

There is an emerging consensus that a multi-provider NHS (at least in some sectors) will develop over the next five to ten years. In this context the commissioning role of PCTs is essential. Given that the payment-by-results tariff limits choice on price, commissioning decisions should in theory be made on quality. At the moment, the information to support PCTs in doing this is not available. Measurement of quality by SHAs would create a source for this information. It would also further strengthen the incentives to improve inside the SHA performance management process.

The external regulator

The Healthcare Commission has two distinct programmes which relate respectively to the surveillance and comprehensive models. The first is a screening and surveillance function; the second is "improvement reviews", which look in detail at individual services.

The Commission is able to do this because, in addition to its analytic functions, it has a large field force of inspectors (known as assessment managers) who can act as translators of this information for the services being inspected.

There are, however, limits to what a regulator can legitimately do in this field. These relate to its size, role and function. At its current level of funding (equivalent to 0.07 per cent of total health expenditure in the UK) it is simply too small to

Measure for measure

62


undertake comprehensive assessments of all areas which need to be covered. If we expect competition to improve quality, the regulator should limit its improvement-focused activities (as opposed to those ensuring a minimum standard or providing information to facilitate improvement by others) to those areas where "market"-type solutions are less likely to work.

Finally, there are considerable philosophical problems in involving a regulator (or indeed a quasi-regulatory body) in improvement activities. The result could be a confused relationship with the bodies being regulated. A regulating body could find itself criticising its own activities, which would obviously be difficult.

All this points to the surveillance model as a more attractive general approach for a regulator to use: it is cheaper, more locally responsive and more focused, quickly identifying and correcting poor performance. To be used properly, however, it needs to link with the general performance management framework of the NHS. For example, the regulator should use its field force to address a problem which exists at only one institution; but where the problem is endemic across an area, broader performance management processes for organisations across that area are more appropriate.

The external consultant

Two types of consultant can be involved. The first is the private sector company with a strong analytic, information management and presentation focus; obvious examples include Dr Foster and CHKS. Dr Foster, for example, have developed and used a surveillance-type system, and their "intelligent board" reports specify data sets which bear some resemblance to the comprehensive model described above. However, there is almost certainly room in the market for analytic specialists working on specific local problems, and indeed very often such small,

targeted, quick and cheap responses would be far more appropriate to a local situation than more general approaches from larger companies.

The other type of consultant is expert in performance improvement methodologies such as "Lean". Again, interventions by such consultants can be entirely appropriate in certain situations. One would, however, expect them to work with analytic experts and to become involved in response to the data.

We believe that the involvement of both groups should be encouraged, as the whole range of potential issues in healthcare is too wide to rely on performance management and regulation alone. In addition, such organisations can respond more quickly to local problems. They have neither other statutory responsibilities, nor the broad societal/utilitarian remit that must apply to public bodies and which militates against analysing unusual issues. We believe that the impetus for using their services should come from individual providers seeking to solve their own issues, rather than being imposed by central government or regional management.

Recommendations

1 The Government should explicitly recognise three different uses for information: accountability to the public, choice and assertiveness for patients, and improvement for providers.

Accountability

2 The Government should explicitly reconfirm the role of the Healthcare Commission to publish information about minimum acceptable standards of care and value for money for general public consumption.

3 To assist this, the Government should remove the requirement that the Healthcare Commission must give an


overall rating for every trust every year, and replace this with a constantly updated "accountability balance sheet", as described in appendix one.

4 Hospitals should be required to display their current performance as it relates to safety, accessibility, competence and compassion.

5 The importance of public health is now recognised across the political spectrum, and the new national targets' focus on this is to be welcomed. However, holding individual healthcare organisations to account for achieving them makes little sense, and has led to a situation where the indicators used to judge performance have only a tangential relationship with the ultimate goal. We suggest that, rather than holding individual healthcare providers to account, measuring performance at SHA level would allow more meaningful measures to be used.

6 QOF data is currently released on the internet by the Information Centre for Health and Social Care, but its presentation as an abstract point-scoring system makes it difficult to understand and evaluate. There should be more transparent release of data.

Practices should provide aggregated QOF data for key performance categories (cardiovascular, respiratory, mental health, patient satisfaction and management). The measures in these groups should be decided by independent representatives after public reports of outcomes evidence. Data should be updated every month.

Targeted patient information

7 Information should be collected and published at the appropriate specialty or disease group level.

8 The Department of Health should expand funding for National Clinical Audits, but mandate that the information should be made available (at a suitably aggregated level) for publication and use by interested parties. As a minimum, three further disease/procedure-specific websites (in the manner of the heart surgery website) should be created within five years.182

9 The Department of Health should run a pilot scheme providing seed money to patient groups (e.g. Diabetes UK) to develop and publish a "patient-focused" report for individual services, using suitably qualified analysts and immediately available data, with a view to introducing this as an ongoing commitment to openness.

10 The long-term aim should be to have high quality data collected, expert analysts available and procedures established so that in five years' time any patient interest group can use nationally collected data to produce information for its members which is accurate, helpful, authoritative, scientifically meaningful and easily understandable.

Information for performance management and quality improvement

11 SHAs have a critical role in using NHS performance management processes to increase the quality of care. We advocate allowing SHAs to set local goals and incentives to achieve this. This would require Government to acknowledge that variation across areas of the UK in the priorities (and quality) of different services is a legitimate response to variation in population health needs and priorities.

12 SHAs should be required to develop and implement "comprehensive model"


182. http://heartsurgery.healthcarecommission.org.uk/ (last accessed 03.05.07). We believe that responsibility for the publication of such information could lie with any one of a number of organisations: patient groups, royal colleges, private sector information providers or the health services regulator.


reviews of the quality of one of their services each year, which should generate organisation-specific improvement goals for the next year. These should focus on areas of specific local concern.

13 The Healthcare Commission should continue to develop its use of "surveillance models" in order to minimise the burden of data collection and to target its attention on poorly performing services. It should, in particular, focus on areas where patient choice mechanisms and plurality of provision cannot easily be used.

14 The high levels of achievement during the first year of the GMS contract suggest that its targets were too easy and that the thresholds for point scoring should be raised. The Government attempted to address this issue by raising the payment thresholds, modifying or dropping 30 indicators and adding an additional 18 for 2006-07.183 We welcome the increases in the payment thresholds and believe that further rises should follow in 2007-08. A system in which 83 per cent of available points are claimed is scarcely discriminating; progressively tougher thresholds should allow greater discrimination between high and low performers, as well as encouraging improvement.
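The threshold mechanism being discussed can be sketched as follows: an indicator earns no points below its lower threshold, full points at or above its upper threshold, and points pro-rata in between, so raising the lower threshold makes the same achievement worth less. The specific threshold and point values below are illustrative, not taken from the actual QOF indicator set.

```python
# Illustrative sketch of threshold-based point scoring. Achievement and
# thresholds are fractions (e.g. 0.70 = 70% of eligible patients).
def qof_points(achievement, lower, upper, max_points):
    """Zero points below the lower threshold, full points at or above
    the upper threshold, and linear pro-rata scoring in between."""
    if achievement <= lower:
        return 0.0
    if achievement >= upper:
        return float(max_points)
    return max_points * (achievement - lower) / (upper - lower)

# Raising the lower threshold from 25% to 40% (hypothetical values)
# reduces the points earned for an unchanged 70% achievement:
before = qof_points(0.70, 0.25, 0.90, 10)  # roughly 6.9 points
after = qof_points(0.70, 0.40, 0.90, 10)   # 6.0 points
```

This is why tougher thresholds discriminate better: once most practices sit above the upper threshold, the scale is saturated and the scoring conveys no information about relative performance.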

Information collection and analysis – the supporting infrastructure

15 In order to fill the gap in knowledge about the outcomes of most medical interventions, the Government should invest in SF-36 for at least two surgical specialties across the whole NHS in the next year, as an interim measure. In the longer term, involvement with the still more robust PROMIS project should be sought.

More outcome rather than process measures should also be introduced into the GMS and PMS contracts. This would help to quell the criticism that these contracts have given GPs a large pay rise for better recording of activities they were already undertaking. Options for doing this include creating patient reported outcome measures (PROMs) for those undergoing management of chronic conditions, and measuring unplanned, preventable use of hospital services.

16 The adjusted disease prevalence factor has not succeeded in focusing resources on areas of high morbidity. The ADPF should be replaced by the TDPF, under which, for practices with the same number of patients on the register, payment additionally increases with the size of the disease list.

17 NHS Information Centre data should be made easily available (including easy-to-use analytic tools) to a wider range of users, including NHS information departments and private sector analytic consultants, in order to encourage a market in analytic services for the NHS.

18 Government should ensure that service level agreements between commissioners and providers include timely discharge data prepared in accordance with national standards set by the Information Centre.

19 In order to make this successful, the NHS Information Centre should design criteria for minimum standards to be maintained by users of its information. These should concentrate on knowledge of the data, analytical skill and organisational capacity (for


183. Negotiators agreed that all lower thresholds for existing indicators should be raised to 40%. The upper threshold will remain at 90% for the majority of indicators. For those with an upper threshold of less than 90%, it will be raised in line with UK average achievement in 2005. New clinical areas agreed include: heart failure, dementia, depression, chronic kidney disease, atrial fibrillation, obesity and learning disabilities. The following clinical indicator sets were amended: mental health, asthma, stroke and transient ischaemic attack, diabetes, chronic obstructive pulmonary disease, epilepsy, cancer and smoking.


example, to respect data protection legislation).

20 In the long term, the national NHS IT system (Connecting for Health) "spine" provides excellent opportunities for enhanced data collection and use. To start planning for the more effective use of this resource, we endorse the recent Conservative party white paper's call for the establishment of a "reference group of academics, economists and the professions", and recommend that once the practicalities of this group's working are finalised, item one on its agenda should be to consider how best to exploit the data that the NHS IT system will contain for better measurement and improvement of quality of care.

21 We recommend caution in using any further financial incentive systems to increase quality. Any future developments should be piloted and properly studied to understand potential perverse effects and potential for manipulation before being run nationally. The pilot should include a control group of organisations receiving information about the incentivised performance measures but no financial incentive.


Appendix one

Explaining why US comparisons are valid

Summary

US healthcare is structured very differently from the NHS. It is characterised by a multiplicity of providers and purchasers; greater emphasis on individual choice and less on universality and equity; greater fragmentation of services; and substantially greater cost. Whatever its shortcomings as a system, publication of outcome data is more entrenched in the US.

American case studies are extremely valuable in any attempt to make the case for outcomes data in this country. The US is the world leader in publishing information about clinical outcomes, a status that reflects its culture of individualism, choice and mistrust of paternalist authority – characteristic of a young country with no experience of aristocracy or feudalism.

Another explanation is that the rapidly increasing cost of healthcare has led to greater efforts to ensure value for money, of which measures of quality are an important part. The US has by some margin the most expensive healthcare system in the developed world, accounting for some 15 per cent of GDP.

Part of the problem is a payment system that, in general, pays doctors by the procedure. This, combined with an increasing perception of the likelihood of litigation resulting from any error or adverse outcome, encourages clinical padding: doctors performing more tests and interventions than are strictly necessary. Rewarding doctors on the basis of actual improvements in health as a result of their care, rather than paying for all the interventions they make, can restrain the growth of these needless costs. Employers, major purchasers of healthcare for their workers, are also keen to maximise value. Rather than focus exclusively on premium costs, they want improved health outcomes for their workers in order to reduce health-related absence.

This is not to suggest that it is only supply-side concerns that have driven the growing interest in outcome measures. Facilitating consumer choice by allowing patients to be better informed about healthcare providers is still the ultimate justification for measuring quality. However, as we will see later in this section, the evidence that patients pay attention to such data is limited. The more realistic expectation is that providers themselves will be moved to improve their performance, almost regardless of whether patients drive the process through their purchasing decisions.

The complex and fragmented nature of the US health system can be confusing for British observers used to the simpler monopoly of the NHS, and makes it difficult to transfer practices between the two countries. There is a plurality of providers and purchasers, allowing for theoretically wider choice and more immediate access, at least to specialists. Primary care, very much the poor relation in the US, does not play the same gate-keeping role that it does in the NHS. Indeed, attempts to create a more integrated, coherent and cheaper system, such as the introduction of Health Maintenance Organisations, have incurred resistance precisely because they are seen as rationing care and limiting choice. At the same time, the wider range of providers leads to greater fragmentation of services, which increases transaction costs. This is often met with micro-management of clinical decision-making to reduce costs, whilst


clinicians often need pre-approval from insurance companies for certain procedures and referrals.

The US model is easily caricatured as a private health insurance system, with insurance typically linked to employment, but this picture is somewhat misleading. The true nature of the American system is more complex, with massive state intervention in the form of Medicare and Medicaid. Indeed, public spending on healthcare as a share of GDP is as high in the US as it is in the UK. At the same time, employment-linked health benefits are starting to crumble, with fewer employees receiving coverage as part of the package, and many of those that do suffering reduced benefits that often require high out-of-pocket expenditure.

Finally, there are estimated to be 45 million Americans without health coverage. Some of these either choose to make out-of-pocket payments for their healthcare or are only uninsured for a brief period, but many can only receive treatment by turning up at emergency wards, which are legally prevented from turning them away but may still charge them for part of their treatment. In all, despite its progress in publishing high quality information, the American health system may be the least attractive in the developed world, combining extravagantly high costs with serious problems of coverage. This report should not be read as advocating a general shift towards the American way of organising healthcare.

As Marshall et al observe, "Public reporting in the United States is now much like healthcare delivery in that country. It is diverse, is primarily market-based, and lacks an overarching organizational structure or strategic plan. Public reporting systems vary in what they measure, how they measure it, and how (and to whom) it is reported."184

Organisations involved in reporting include the National Committee for Quality Assurance (NCQA), the Centers for Medicare and Medicaid Services, the Leapfrog Group, the Consumer Assessment of Health Plans, and Healthgrades.

Conclusion

This brief overview has demonstrated that, for reasons of culture and cost limitation, the US has led the way in measuring outcomes in healthcare. It should be said that there is no reason why the lead enjoyed by the US over the UK in this field should be permanent. A drawback of America's fragmented system is the absence of a single, rationalist system for collecting information. The data is, as a result, patchy and often based on samples of patients. It is also difficult to refine, as different organisations measure in different ways. The UK, with perhaps the most unified health structure in the world, does not face these difficulties.


184. Marshall M, Shekelle P, Davies H, Smith P, "Public Reporting on Quality in the United States and the United Kingdom", Health Affairs, 2003.


Appendix two

Information for accountability


(a) Quality of care matrix

Columns: Area; Data sources; Data type; Data available/analysis undertaken; Additional cost.

Safety
1. Judgement of meeting safety domain of core standards. Sources: pre-existing large data sets (e.g. Hospital Episode Statistics (HES)) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.
2. Measures of infection control, e.g. MRSA and Clostridium difficile exposures. Sources: special collections and Health Protection Agency data. Type: outcome measures. Available/analysis: collected in part, analysis somewhat limited. Additional cost: high.
3. Investigation (HC, Royal Colleges, GMC). Sources: special collection. Type: an "over-riding" judgement; organisational failure should over-ride other findings on safety. Additional cost: low.

Accessibility
1. Judgement of meeting accessible and responsive domain of core standards. Sources: pre-existing large data sets (e.g. HES) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.
2. Waiting times. Sources: currently collected data. Type: (1) measurements based on breaches of targets; (2) progress over time in reducing waits; (3) consideration of the whole distribution of waits (e.g. percentage of patients seen in 3 months, in 1 month etc). Available/analysis: the data exist to produce a wider range of measures which give a fuller picture of performance on waits, and indeed are publicly available, but are poorly presented and not publicised. Additional cost: low.
3. Avoidance of waiting time perverse effects. Sources: HES and "suspension" rates, used to target inspections where there are concerns that either reported figures do not reflect reality, or there may be perverse consequences elsewhere in the delivery of care. Type: judgement based on inspections targeted by analysis of routinely collected data. Available/analysis: covers gaming, poor data quality and unintended consequences. Additional cost: moderate.

Competence
1. Judgement of meeting clinical (and cost) effectiveness domain of core standards. Sources: pre-existing large data sets (e.g. HES) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.
2. Patient survey assessment of, for example, confidence and trust in doctors treating patients. Sources: patient survey. Type: outcome from patient perspective. Available/analysis: there needs to be additional work either to identify the key few sentinel measures that represent a broader view of performance, or to find a method of combining and summarising these data. Additional cost: moderate.

Compassion
1. Judgement of meeting patient focus domain of core standards. Sources: pre-existing large data sets (e.g. HES) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.


Appropriate spending

This can be subdivided into three areas: measuring of general good governance, measuring of best practice and measuring of societal value. We suggest that this could be achieved relatively cheaply and straightforwardly by building on the work that the Healthcare Commission is already doing, but this means that the Healthcare Commission should be given greater freedom to judge the manner in which it presents information, and should not be compelled to create an annual rating for each healthcare organisation.

Measure for measure

70

(b) Appropriate spending matrix

Columns: Area; Data sources; Data type; Data available/analysis undertaken; Additional cost.

Good governance
1. Audit Commission/Monitor assessment of financial management etc. Sources: Audit Commission/Monitor. Type: judgement based on formal audit process. Available/analysis: yes. Additional cost: low.
2. Judgement of meeting governance domain of core standards. Sources: pre-existing large data sets (e.g. HES) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.
3. Evidence of poor use of data. Sources: analysis of routine data sets for poor data quality and anomalies that suggest gaming of data collection and recording, to target inspection. Type: judgement based upon structure, process and outcome data; should have an overriding effect if dishonesty can be demonstrated. Available/analysis: requires development of analysis and investigation techniques. Additional cost: moderate.

Use of best practice
1. Acute hospital portfolio. Sources: specially collected data reflecting use of most cost-effective approaches. Type: largely structure and process data. Available/analysis: only available for the hospital sector; each review is run rather as an individual project, so any new areas have specific development costs, and pre-existing measures have collection costs. Additional cost: moderate.
2. Judgement of meeting (clinical and) cost-effectiveness domain of core standards. Sources: pre-existing large data sets (e.g. HES) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.

Societal value
1. Judgement of meeting public health domain of core standards. Sources: pre-existing large data sets (e.g. HES) to target inspection. Type: judgements based upon structure, process and outcome data. Available/analysis: yes. Additional cost: low.
2. Performance against new national standards. Sources: special collection. Type: largely structural data. Available/analysis: we believe that public health measures should be collected at a locality level (e.g. an SHA area) where meaningful action can be pursued. Additional cost: high.
3. Performance against existing national targets (e.g. waiting times in A&E departments). Sources: pre-existing collection, minimal analysis. Type: primarily process data. Available/analysis: these are now well established, but inside this model the relative importance of these measures is reduced substantially, reducing the capacity to distort management priorities noted under targets; as such it is a natural development of the move from star ratings to the annual healthcheck. Additional cost: low.


Appendix three

Information for patient choice and activation


Columns: Area; Data sources; Data type; Data available/analysis undertaken.

Clinical process
Indicators: % of patients with BMI recorded; % of patients given smoking cessation advice; % of patients given an HbA1c test; % of patients given retinal screening; % of patients with blood pressure tested; % of patients with microalbuminuria testing; % of patients with microalbuminuria treated with ACE inhibitors; % of patients with measured cholesterol.
Sources: Quality and Outcomes Framework. Type: measures of process; use relative measures to identify and judge on consistent outlier status. Available/analysis: all available at practice level. The set of measures should be agreed through international consensus, drawing on, for example, the NQF in the US (which has a smaller set for diabetes) and Structured Dialog from Germany (which will probably have some data in this area). Alternatively, QOF weightings could be used, although these are not particularly discriminatory.

Patient process
Indicators: % of patients who knew their HbA1c value; % of patients who knew when to take medication; % of patients who knew how much medication to take; % of patients who received the right amount of verbal information about their diabetes; % of patients who received the right amount of written information about their diabetes.
Sources: Healthcare Commission patient survey. Type: relative measures of variance. Available/analysis: all these data are currently available; note this is currently at PCT level while QOF data is at practice level. We select five from approximately 80 indicators which are good sentinel measures. These were collected as a one-off exercise; an 80-measure survey could not be repeated every year, but collection of 10% of the data each year would be a justifiable exercise.

Clinical outcomes
Indicators: % of patients with last HbA1c less than 7.4, and between 7.4 and 10; % of patients with blood pressure of 145/85 or less; % of patients with cholesterol of 5 mmol/l or less.
Sources: QOF. Type: outcome measures; judgement criteria as for clinical process above. Available/analysis: all available at practice level.

Patient outcomes
Indicators: % of patients admitted to hospital because of their diabetes; % of patients saying that diabetes affects their day-to-day living.
Sources: HC patient survey. Type: relative measures of variance. Available/analysis: all these data are currently available; note this is currently at PCT level while QOF data is at practice level.


Glossary

Access: availability of a required medical service. In the UK context this in nearly all cases refers to the time spent waiting to receive the service.

Activation: the patient's willingness to be involved and assertive in the decision-making in their own care.

Case-mix: the relative complexity of the cases being seen by a particular doctor, department or hospital.

Clinical governance: a system through which NHS organisations are accountable for continuously improving the quality of their services and safeguarding high standards of care, by creating an environment in which clinical excellence will flourish.

Clinical practice guidelines: evidence-based guidelines which cover the appropriate treatment and care of patients with specific diseases and conditions.

Compassion: in our classification of measurement of quality this encompasses the treatment of patients as human beings, ensuring dignity and respect, concern for their suffering, and care taking place in decent conditions.

Construct validity: how well an indicator actually measures what it purports to.

Consumer: one who consumes healthcare. In this context more or less synonymous with patient, but implying a more active approach to seeking healthcare, and particularly a willingness to "shop around" between different healthcare providers.

Core standards: the 24 core standards determined by the DH with which all NHS healthcare providers must comply. These cover clinical and cost effectiveness, governance, patient focus, accessible and responsive care, care environment and amenities, and public health.

Domain: these are the categories of judge-ment of care. We propose four foraccountability purposes, safety, access,compassion and competence.

Elective care: non-emergency care, typically admissions into hospital from the waiting list.

Gaming: altering practice in order to give the impression of improved performance rather than actually improving performance. An example of gaming would be to redesignate a corridor as a ward in order to achieve an A&E throughput target.

Indicator: a measure of some aspect of care which allows a judgement to be made about performance. For example, this might be the percentage of patients receiving care in accordance with guidelines, or the percentage of patients experiencing a good or poor outcome as a result of a particular procedure.

Informatic(s): collection, handling, analysis and presentation of information – covering all information-related disciplines including information and communication technology, data management, analysis and statistics, and data presentation.

Institution: a healthcare-providing or commissioning organisation.

Measure: a numeric summary of the general experience of, in this instance, healthcare.

National Clinical Audit: clinical audit is a quality improvement process that seeks to improve patient care and outcomes through systematic review of care against explicit criteria. National Clinical Audits are schemes carried out nationally to uniform standards, allowing national comparison of performance.

Outcome - clinical perspective: hard measurements of success or failure which include mortality and hospital readmission rates.

Outcome - patient perspective: the patient’s perception of whether their state of health has improved following discharge from hospital.

Outmigration: patients being forced to go outside of their local area for their treatment, particularly because their local hospitals are unwilling to treat them.

Outlier: a data point so extreme as to be statistically significantly different from the average. The data point may refer to an organisation or an individual. For example, Dr Shipman was an outlier among GPs for mortality of patients.

Patient: a person requiring, undergoing, or seeking healthcare.

Performance management: the system of hierarchical accountability and control in all private and public sector organisations. This typically relates to the achievement of set objectives, which may or may not be subject to formal incentives.

Process - clinical perspective: measures of how well the correct clinical procedures were followed, such as following NSF and NICE guidelines.

Process - patient perspective: measures of the patient’s experience of the process of care. May include being treated with dignity and respect, cleanliness, and quality of food and accommodation.

Provider: an organisation providing healthcare. Typically an NHS trust, independent sector hospital, or treatment centre.

Purchaser: an organisation purchasing healthcare: Primary Care Trusts in the UK, private sector insurance companies in the US.

Quality and outcomes framework (QOF): nearly 150 indicators of the quality of management and care of patients with a range of diseases, used to assess performance of GP practices and support the new GP contract.

Safety: ensuring that care will “first do no harm”, that known risks are minimised and approved safety procedures are followed.

Sentinel measures: measures which identify something so unusual, or where the message sent is so unequivocal, that further investigation is required. For example, mortality following a day case admission.

Specialty: a particular branch of the medical profession, e.g. surgery.

Societal value: something which has importance for society as a whole, rather than just the individual patient. Often this can be linked to a political manifesto, e.g. Labour’s pledge to reduce waiting times for non-emergency admissions represented a popular desire for these to be reduced, and thus reducing waiting times may have a societal value.

Structure: the adequacy of facilities and equipment, the qualifications of medical staff, and administrative structure.

Sub-specialty: a sub-branch of a medical ‘specialty’ (see above), e.g. heart surgery.


Measure for measure

Using outcome measures to raise standards in the NHS

Richard Hamblin and Janan Ganesh
Edited by Gavin Lockhart


£10.00

ISBN: 978-1-906097-07-3

Policy Exchange
Clutha House

10 Storey’s Gate
London SW1P 3AY

www.policyexchange.org.uk Think Tank of the Year 2006/2007

Modern enterprises see information on their performance as vital – what is extraordinary about the NHS is that it spends so much on hospital care and knows so little about what this spending achieves. Outcome data currently focuses on mortality and readmission. This excludes around 90 per cent of hospital admissions. Whenever information has been collected on healthcare, it has revealed serious failings that require corrective action, and has identified high performers from whom others can learn.

This report examines measurement schemes in the UK and US that provide essential lessons for policy makers, and incorporates groundbreaking polling that reveals how patients would react to information on quality. The research shows that measures of quality must have both visibility and credibility for clinicians, and that publishing measures, rather than just reporting confidentially back to providers, increases the likelihood of driving up quality. The authors conclude that the effects of linking formal incentives (payment) to quality are ambiguous, and that although patients want quality of care information, it is not clear that patient choice is stimulated by the publication of performance data.
