Interactive systems need safety locks

Interactive systems need safety locks Harold Thimbleby Swansea University

Transcript of Interactive systems need safety locks

Page 1: Interactive systems need safety locks

Interactive systems need safety locks
Harold Thimbleby
Swansea University

Page 2: Interactive systems need safety locks
Page 3: Interactive systems need safety locks

Panama incident, 2000–2001

18 patients died

2 radiologists imprisoned for manslaughter

whether you write software in small or large teams; and whether you operate domestically or in multiple nations in a rapidly globalizing economy. You are at risk if you place your product in conditions where human lives are at stake. Indeed, it’s not the first time that software has been a suspect in a series of unexpected fatalities.

• In the mid-1980s, poor software design in another radiation machine, known as the Therac-25, contributed to the deaths of three cancer patients. The Therac-25 was built by Atomic Energy of Canada Ltd., which is a Crown corporation of the government of Canada. In 1988, the company incorporated and sold its radiation-systems assets under the Theratronics brand. There does not appear to be any formal investigation of the Therac-25 accidents, but according to an in-depth examination by Nancy Leveson, now a professor at the Massachusetts Institute of Technology, and the accounts of other software experts, the design flaws included the inability of the software to handle some of the data it was given; and the delivery of hard-to-decipher user messages. In a twist of fate, Theratronics, which was ultimately acquired by the Canadian life-sciences company MDS, manufactured the radiation-therapy machine used at the cancer institute in Panama.

• In February 1991, during Operation Desert Storm, an Iraqi SCUD missile hit a U.S. Army barracks in Saudi Arabia, killing 28 Americans. The approach of the SCUD should have been noticed by a Patriot missile battery. A subsequent government investigation found a flaw in the Patriot’s weapons-control software, however, that prevented the system from properly tracking the missile. More recently, during Operation Iraqi Freedom, the Patriot missile system mistakenly downed a British Tornado fighter and, according to the Los Angeles Times and other reports, an American F/A-18c Hornet. The pilot in the single-seat Hornet and the two crew members aboard the British jet were killed. The incidents are still under investigation, but Pentagon sources familiar with the Hornet incident told the L.A. Times that investigators were looking at a glitch in the missile’s radar system that made it incapable of properly distinguishing between a friendly plane and an enemy missile. Raytheon, the maker of the Patriot missile system, did not want to comment on the 1991 incident. It also said the government was still investigating the more recent incidents and that reports the software may be at fault were “off base.”

• A software glitch was cited in a Dec. 11, 2000, crash of a U.S. Marine Corps Osprey tilt-rotor aircraft, in which all four Marines on board were killed. According to Marine Corps Maj. Gen. Martin Berndt, who presented the finding from a Judge Advocate General investigation, “the mishap resulted from a hydraulic-system failure compounded by a computer-software anomaly.” A hydraulic line broke in one of the craft’s two engine casings as the pods were being moved from airplane mode to helicopter mode in preparation for landing. When the flight-control computer realized the problem, it stopped the rotation of the engine pods. The pilots, trained to respond, tried to reset the pods by pressing the primary reset button, but the finding stated that a glitch caused “significant pitch and thrust changes in both prop rotors,” which led to a stall. The plane crashed in a marsh. The craft is made by a partnership of Boeing and Bell Helicopter. A Boeing spokesman said changes were made in the software but referred requests for details about the software anomaly to the government. A spokesman for the Navy’s Air Systems Command, which investigated the inci-

[Magazine page furniture: Case 109, “Panama’s Cancer Institute”, Baseline, March 2004]

PANAMANIANS
Victor Garcia, retired businessman. Garcia is one of seven cancer patients who survived the radiation overdoses at the institute in 2000. Overall, 21 patients have died. He is a party to lawsuits against Multidata International Systems and MDS, the owner of the Cobalt-60 therapy machine, in both Panama and the U.S.

Dr. Juan Pablo Bares, Director, NCI. Bares sought international help to understand the causes of the overdoses after the hospital realized in March of 2001 they had occurred. Bares offered to resign after the overdoses became public in 2001, but the hospital board refused to accept his resignation.

Camilo Jorge, Services manager, ProMed. ProMed solicited a bid from Multidata on a referral from General Electric Medical Systems because the NCI couldn’t afford the treatment-planning software offered with the Cobalt-60 teletherapy machine. Jorge says it is the only time ProMed has done business with Multidata.

Cristobal Arboleda, Special Superior Deputy Prosecutor. Arboleda led the investigation into the causes of the overdoses for Panama’s Ministry of Health and is now prosecuting the physicists. He says his office had little experience with software and his staff has had to learn on the job.

MULTIDATA SYSTEMS INTERNATIONAL
Mick Conley, General business manager. Conley, a 13-year veteran of the radiation-therapy systems company, oversees product sales and marketing. He’s unflappable when pressed about the role of the company’s software in the radiation accidents in Panama and maintains that they would not have happened if the staff at the NCI had followed the manual and verified the software’s results before treating patients.

Arne Roestel, President. Roestel runs the privately held company, which he founded in 1979. He’s been working in the radiation-treatment software industry since the late 1960s. Business manager Conley calls him a pioneer in the field and says he was one of the first people in the country to work on computerized radiation-treatment systems.

OFFICE OF COMPLIANCE, CENTER FOR DEVICES AND RADIOLOGICAL HEALTH (CDRH), FOOD AND DRUG ADMINISTRATION
Timothy Ulatowski, Director. A 30-year veteran of the FDA, and one of the few people to wear a tie in the CDRH office, Ulatowski is a direct, to-the-point manager who oversees a number of FDA operations, including enforcement of medical-device and radiological-health laws and regulations. He holds a B.S. in microbiology and an M.S. in biomedical engineering.

John Murray, Software and Part 11 compliance expert. Murray is the CDRH’s primary advisor on all aspects of software, including validation, policy, and classification. He holds an undergraduate degree in electrical engineering and a graduate degree in computer science; he can explain the complexities of medical devices and their software in lay terms.

THE PLAYER ROSTER

Olivia Saldaña, Physicist, National Cancer Institute (NCI), Panama. Saldaña is one of three physicists charged with second-degree murder in Panama for entering data into Multidata’s software that produced inaccurate amounts of time for patients to be treated with a Cobalt-60 beam. She continues to work at the hospital because, she says, “If we did not work, the patients would die.”


Page 4: Interactive systems need safety locks

The user must CAREFULLY check if results are correct BEFORE using in treatment.

A USER SHOULD VERIFY THE RESULTS THROUGH INDEPENDENT MEANS until the USER’S PROFESSIONAL CRITERIA IS SATISFIED.

We make dangerous things.

There are no designed-in safety locks.

It’s your fault if anything goes wrong.

Paraphrased…

Page 5: Interactive systems need safety locks

Interactive systems need safety locks
Harold Thimbleby
Swansea University

Why safety locks?

1. People make slips

2. Safety locks stop some slips causing harm

3. Bad design allows slips to cause harm

“We do not address error-handling”

1292 pages

“beautifully written introduction to design of algorithms”

“the bible of the field”

“best textbook ever seen”

Page 6: Interactive systems need safety locks

System.out.println(12345678 + 8765432l)

21111110

System.out.println(12345678 + 8765432)

Java

System.out.println(1111111111 + 1111111111)

-2072745074

Java

happens everywhere

Learned?

• Human slips get incorrect results

• Slips go undetected

• Applications do not provide “safety locks”

• Bad design causes errors
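The Java slips above can be blocked rather than left to wrap around silently. A minimal sketch of such a safety lock, using the standard `Math.addExact` from `java.lang.Math` (the `SafeSum` class and its printed messages are illustrative, not from the slides):

```java
public class SafeSum {
    // Safety lock for int addition: Math.addExact throws ArithmeticException
    // on overflow instead of silently wrapping around like the + operator.
    static int safeAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        // Plain + wraps silently, exactly as on the slide:
        System.out.println(1111111111 + 1111111111); // prints -2072745074
        try {
            System.out.println(safeAdd(1111111111, 1111111111));
        } catch (ArithmeticException e) {
            // The slip still happens, but it is detected, not turned into a wrong answer.
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

The point matches the slides: the human slip is unavoidable, but a well-designed system detects it instead of printing an incorrect result.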

Page 7: Interactive systems need safety locks

• Errors alwayshappen

• Safety locksreduce errors and their consequences

Page 8: Interactive systems need safety locks

93% of nurses make numerical errors

I D Kapborg, “Calculation and administration of drug dosage by Swedish nurses, student nurses and physicians,” Int J Quality in Healthcare, 6(4):389-395, 1994.

If we know that, why aren’t there safety locks?

Denise Melanson, 22 August 2006

Page 9: Interactive systems need safety locks

Fluorouracil Incident Root Cause Analysis

Final Draft: April 5, 2007. Final Formatted Report: April 30, 2007. Formatted for Web Posting: May 22, 2007

Institute for Safe Medication Practices Canada® / Institut pour l’utilisation sécuritaire des médicaments du Canada®

5250 mg

45.57 mg/mL

4 days

Page 10: Interactive systems need safety locks

45.57 mg per mL

5,250 mg ÷ (4 days × 24 hours per day)

Calculator keystrokes: AC MRC MRC 4 × 2 4 M+ AC 5 2 5 0 ÷ 4 5 . 5 7 ÷ MRC =

22 keystrokes

Four problems

• Calculators are different

• People make slips

• Calculators don’t detect or block errors

• Things will go wrong
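The calculation behind those keystrokes is 5,250 mg ÷ 45.57 mg/mL ÷ (4 days × 24 hours per day), about 1.2 mL per hour. A hedged sketch of the same arithmetic with a plausibility check acting as a safety lock (the class, the method names, and the 10 mL/h limit are invented for illustration; the limit is not a clinical value):

```java
public class InfusionRate {
    // Illustrative upper bound only; NOT a real clinical limit.
    static final double MAX_PLAUSIBLE_ML_PER_HOUR = 10.0;

    // Rate to infuse doseMg at concentration mgPerMl, spread over `days` days.
    static double ratePerHour(double doseMg, double mgPerMl, double days) {
        double volumeMl = doseMg / mgPerMl;   // total volume to infuse
        return volumeMl / (days * 24.0);      // mL per hour over the whole period
    }

    public static void main(String[] args) {
        double ok = ratePerHour(5250.0, 45.57, 4.0);
        System.out.printf("%.1f mL/h%n", ok); // about 1.2 mL/h

        // The slip: treating "4 days" as 4 hours multiplies the rate by 24.
        double slipped = ratePerHour(5250.0, 45.57, 4.0 / 24.0);
        if (slipped > MAX_PLAUSIBLE_ML_PER_HOUR) {
            // Safety lock: an implausible rate is blocked, not silently delivered.
            System.out.println("safety lock: rate " + slipped + " mL/h blocked");
        }
    }
}
```

A bare calculator offers no such check; the slide’s four problems are exactly why the check belongs in the device, not in the user’s head.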

Page 11: Interactive systems need safety locks

only blocks 2 errors

Safety locks block slips


Press 1.2.3

Abbott 999, 123, 1.2, 1:23AM, 1:23PM

Graseby 3400 1.3

Casio HS8V, HS85 1.23

Mathematica 0.36

Excel 0

Word 1.5

Alpha 6
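Every device above turns the same malformed key sequence into a different number. One safety lock is to refuse any sequence that is not a well-formed number, instead of silently guessing what it means. A minimal sketch (the class name and the regular expression defining “well-formed” are assumptions for illustration):

```java
import java.util.regex.Pattern;

public class NumberEntry {
    // A well-formed entry: digits, with at most one decimal point.
    private static final Pattern VALID = Pattern.compile("\\d+(\\.\\d+)?");

    // Safety lock: malformed input is rejected, never reinterpreted.
    static boolean accept(String keys) {
        return VALID.matcher(keys).matches();
    }

    public static void main(String[] args) {
        System.out.println(accept("5.5"));   // true
        System.out.println(accept("1.2.3")); // false: blocked, unlike the devices above
        System.out.println(accept("5..5")); // false
    }
}
```

The value of the lock is its predictability: "1.2.3" means nothing on every device, rather than eight different things on eight devices.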

Page 12: Interactive systems need safety locks


Patient dies

Log shows 55 mg

Should be 5.5 mg

Nurse at fault

Task — enter 5.5 mg

5

Page 13: Interactive systems need safety locks

5• 5••

5• 5•5

Page 14: Interactive systems need safety locks

5

5• 5•

Page 15: Interactive systems need safety locks

5 55

Nurse thinks 5•5

Log shows 55
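The divergence above, where the display, the log, and the nurse’s intention all differ, arises because the device drops an “impossible” keypress silently. A sketch of the alternative, where the slip is blocked loudly at the moment it happens (the `KeyPad` class and its exception are illustrative assumptions; a real device would beep and refuse the key):

```java
public class KeyPad {
    private final StringBuilder entry = new StringBuilder();

    // Safety lock: a key that cannot be part of a valid number is rejected
    // with an error, never silently dropped from the entry and the log.
    void press(char key) {
        if (key == '.' && entry.indexOf(".") >= 0) {
            throw new IllegalArgumentException("second decimal point blocked");
        }
        entry.append(key);
    }

    String display() {
        return entry.toString();
    }

    public static void main(String[] args) {
        KeyPad pad = new KeyPad();
        pad.press('5'); pad.press('.'); pad.press('5');
        System.out.println(pad.display()); // 5.5
        try {
            pad.press('.'); // the slip: some devices would silently ignore this key
        } catch (IllegalArgumentException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

With this design the log can only ever contain what the user actually, validly entered, so the log and the nurse cannot disagree about whether 5.5 or 55 was keyed.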

what can we achieve with safety locks?

Page 16: Interactive systems need safety locks

blocks 35 types of error


Page 17: Interactive systems need safety locks

• Where’s thesafety lock?

• You cannot make data entry errors

Page 18: Interactive systems need safety locks

is it any good?

halve the death rate

Reducing number entry errors:solving a widespread, serious problem

Harold Thimbleby1,* and Paul Cairns2

1 Future Interaction Technology Laboratory, Swansea University, Swansea SA2 8PP, UK
2 Department of Computer Science, University of York, York YO10 5DD, UK

Number entry is ubiquitous: it is required in many fields including science, healthcare, education, government, mathematics and finance. People entering numbers are to be expected to make errors, but shockingly few systems make any effort to detect, block or otherwise manage errors. Worse, errors may be ignored but processed in arbitrary ways, with unintended results. A standard class of error (defined in the paper) is an ‘out by 10 error’, which is easily made by miskeying a decimal point or a zero. In safety-critical domains, such as drug delivery, out by 10 errors generally have adverse consequences. Here, we expose the extent of the problem of numeric errors in a very wide range of systems. An analysis of better error management is presented: under reasonable assumptions, we show that the probability of out by 10 errors can be halved by better user interface design. We provide a demonstration user interface to show that the approach is practical.

To kill an error is as good a service as, and sometimes even better than, the establishing ofa new truth or fact.

(Charles Darwin 1879 [2008], p. 229)

Keywords: number entry; human error; dependable systems; user interfaces

1. INTRODUCTION

At first sight, typing numbers is such a mundane task that it seems not to merit a second glance. Naturally, when it comes to entering numbers, humans are prone to make errors, but—astonishingly—many systems make no effort to detect or manage possible errors, causing incorrect and unpredictable results. This paper exposes the extent of this problem in a wide range of systems. We show that the problem cannot be dismissed merely by blaming the user: indeed, we show that some system logs, which might otherwise be thought of as a formal record of user actions, cannot be relied on to assign blame.

Systems should be designed to manage errors, as errors will always eventually occur regardless of user skill or training. We therefore show how better designs for number entry may be approached; we present a new, improved user interface for preventing many number entry errors, and we argue that the new approach can approximately halve the probability of an important class of adverse events arising from number entry error.

We note that problems with complex software are widely recognized (Leveson 1995; Fox et al. 2009; Hoare 2009; Jackson 2009), but, to our knowledge, this article is the first to report the extent of serious problems with the seemingly trivial issue of processing number entry.

2. WIDESPREAD PROBLEMS WITH REAL SYSTEMS

Entering numbers seems like an apparently routine task, but it is in fact less dependable than it appears. Figure 1a shows an everyday example, here taken from Microsoft Excel (or Apple Numbers; the two applications behave in essentially the same way for the purposes of this paper). Two columns of numbers are supposed to be added up. In figure 1, the column totals should be the same, but small typing errors make the totals incorrect without any warning, even though no user is likely to want things that look like numbers (e.g. ‘3.1’) to be treated as anything but the numbers they seem to be. Using Excel’s ‘show precedents’ feature, there is no indication that there is a problem (see figure 1b). And with frankly devious use of the formatting functions, even greater errors are possible, as in figure 1c—though we note that it is very easy to lose track of formatting, and the type of error illustrated here could arise by accident and be very hard to track down.

The examples in figure 1 illustrate the problems: the errors, whether caused intentionally or through accidental slips, are not immediately obvious to a casual glance, though for illustrative purposes the examples are not so

*Author for correspondence ([email protected]).

Electronic supplementary material is available at http://dx.doi.org/10.1098/rsif.2010.0112 or via http://rsif.royalsocietypublishing.org.

J. R. Soc. Interface, doi:10.1098/rsif.2010.0112

Published online

Received 27 February 2010. Accepted 15 March 2010. This journal is © 2010 The Royal Society


Interactive systems need safety locks
Harold Thimbleby
Swansea University

Why safety locks?

1. People make slips

2. Safety locks stop (some) slips causing harm

3. Bad design allows slips to cause harm

Page 19: Interactive systems need safety locks

Ideas

1. Safety locks work

2. They aren’t difficult to program

3. They save lives

4. Go and put them in!

Press On: Principles of interaction programming

MIT Press, 2007

Only 200K hb / 150K pb; £25 hb / £18 pb

mitpress.com/presson

…Think of a dose

…Say, 5•5

…Sometimes make slips

…Enter 5•5 58 5••5 etc

…Classify out-by-ten errors

[Chart residue: probability axis ticks 0.0 to 0.7 and 0.2 to 1.0]