
HUMAN-CENTERED COMPUTING

Where’s the Rigor in the Field of Usability Analysis?

Randolph G. Bias, University of Texas at Austin
Robert Hoffman, Florida Institute for Human and Machine Cognition

Editors: Robert R. Hoffman, Jeffrey M. Bradshaw, and Ken Ford, Florida Institute for Human and Machine Cognition, [email protected]

Usability is a property of devices (interfaces, work methods, software, and so on), but it’s also a field, and some would say it’s a profession. Here, we examine the concept of usability from the standpoint of human-centered computing. Manifestly, the two should be quite closely related. The principles of HCC that have been explored in this department dictate a prime goal for intelligent technology: it must be useful, understandable … and usable.

Is Usability Analysis at Risk?
At the 2008 South by Southwest Interactive Conference in Austin, Texas, consultant Jared Spool announced to a roomful of more than a thousand attendees that the era of user-centered design (UCD) was dead, UCD having been proven ineffective.1 He supported this claim with the assertion that Apple, despite its broadly acknowledged success in user experience, doesn’t use UCD processes, and with the additional claim that there had been no empirical report of a successful implementation of UCD.

In stark contradiction to Spool’s claim of UCD being proven ineffective, hundreds of published (and likely thousands of unpublished) studies every year demonstrate an improvement to user interfaces thanks to various types of human factors studies (usability testing, usability inspections, or user requirements analysis). Usually, it’s easy for individuals with a background in human factors, cognitive psychology, or business ethnography to examine software tools or interfaces and design something better, because the products of technologists (however smart and well-intentioned) were based on designer-centered design, with the naive view that the designer-technologist can easily walk in the shoes of the workers who would use the tools.

We might be witnessing a perfect storm of converging forces that make usability more important than ever before. The Internet’s growth and the spread of intelligent technologies across work environments underscore the claim that the pool of usability analysts needs to be “scaled up by a factor of 100.”2 But there are also forces that put usability analysis at risk. One such force is the widespread belief in the myths of automation,3 which foster the belief that simply injecting more technology into the workplace will solve the problems of cognitive and adaptive complexity.

Also putting usability analysis at risk are the problems with procurement processes,4 experienced most recently in the problems with website development in support of the Affordable Care Act. When a software developer does a poor job of coding, this gets discovered (usually) at system test time. When someone does a poor job of usability engineering, it doesn’t get discovered until the product is out the door and the customer support phones start ringing off the hook, or until the site is live and visitors are leaving it in droves. And then, the manager who paid for that usability work doesn’t say, “We received poor usability support”; he or she says, “Usability isn’t worth it.”

The pace of change is yet another force that puts usability analysis at risk: the endless parade of upgrades and releases. How long does it take you to relearn how to “mail merge” when sending out holiday cards each year? Any of a plethora of quotations could be used to name the problem. The response is the abrogation of responsibility and the default reliance on mere satisficing: “Let’s get something out there and rely on user feedback to steer the design of release 2.”

And thus we explore the question: If usability is to be a valuable, empirical methodology ... then where’s the science in usability analysis?

Where’s the Science?
We conducted a Google search on “usability testing,” looking for links to, say, articles published in peer-reviewed scientific journals or links to scientific books.





We had to go many clicks deep to find a link to anything with a modicum of empirical respectability, and that link led to a book about UCD. The prior 51 links were primarily commercial sites, touting “many happy customers,” offering software that does quick-fix tests of webpage usability. Next, we did a similar search on “performance measurement,” hopeful that this search phrase would certainly yield copious links to actual scientific work. Our hopes were quickly dashed. Twenty links deep we found a link to the journal Performance Measurement and Metrics. The prior 19 were all commercial.

There seem to be many people out there whose stock-in-trade is convincing other people that they can help them make better decisions. Most of the sites seem to have a business pitch that essentially says that people without any particular experience in the application of an established empirical methodology can follow some simple steps and do fast usability testing themselves. And thus, the savvy business leader can avoid having to pay the “expensive professionals.” Ironically, the commercial sites tout their own cadre of experts.

In our search, we were particularly struck by the ISO definition of usability (and its derivatives, such as the Wikipedia definition): “The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.”5 Effectiveness, efficiency, and satisfaction are certainly good goals, from a certain perspective.6 But how are these things to be evaluated or studied? The ISO specification of the concepts of effectiveness, efficiency, and satisfaction entails measures of affect and measures of performance (including accuracy and completeness). Too often, the activity called usability analysis consists of questionnaires, interviews, or “expert review,” which can tap the former but not the latter. Where the rubber meets the road, there isn’t much rubber. It’s hard to find science, even where you might expect to see some.

The Training and Certification Gap
Our Google search notwithstanding, there are authentic and highly competent usability researchers and practitioners, and they should be given due credit for their excellent contributions. We are, however, skeptical about the extent to which usability professionals put themselves under a sufficiently powerful empirical microscope. Too much usability analysis is, bluntly put, amateurish. Worse, the lack of requisite skill sets and of an agreed-upon list of best practices is actively promoted in the field. The theme of the Usability Professionals’ Association annual meeting in 2006 was “Usability through Storytelling.” Storytelling has value, as does any method. Learning from stories is arguably an expression of the empiricist philosophy that knowledge comes from experience. But usability analysis must be more than storytelling. The theme of the 2008 Usability Professionals’ Association annual meeting was “The Many Faces of User Experience: Usability through Holistic Practice,” as the society splayed open its arms: “Marketing specialists, graphic designers, computer scientists, business analysts, psychologists, information architects, technical writers and others bring valuable perspectives to usability and user experience” (see www.usabilityprofessionals.org/conference/2008). Welcoming everyone into the tent isn’t likely to produce an identifiable discipline, nor robust results.

There’s no well-accepted or agreed-upon certification in usability engineering. There’s no standard curriculum for usability professionals. Across the range of fields that span psychology to computer science, we provide insufficient education in the history of cognitive task analysis and insufficient training in the conduct of experiments, which is what usability analyses really are, or should be (at least more often). The lack of certification to ensure a level of quality has undermined usability’s recognition as a scientifically grounded discipline, and has fostered an understandable reluctance to give usability analysts a seat at the software development table.

The individuals who conduct usability analysis should be demonstrably proficient at cognitive task analysis. And in addition to the need for properly trained personnel and the need for some sort of certification system, we need to advance the methodology.

The Methodology Gap
Of course, usability as a discipline is itself influenced by emerging technology. Tools such as WebEx or GoToMeeting have enabled remote usability evaluations,7 allowing easier access to representative test participants around the globe. Crowdsourcing is also a possibility for usability testing.8 But to achieve rigor, there must be more than just increasing the sample size by leveraging the Internet.

There has recently been interest in using psychophysiological measurements such as eye movements, galvanic skin response, heart rate, and fMRI. These approaches offer only a narrow window onto the mind. What a person is looking at doesn’t tell you what they’re thinking, what they know, or what they’ve learned. Hence, none of the physiological methods alone establishes the kind of rigor that’s called for, no matter how much they might pander to a quantitative quest for objective purity.

Usability analysis using the method of professional judgment is almost always the analysis of some single design, the one presented by the customer. Rarely are independent variables manipulated or controlled within a formal experimental design. Anyone happy with “3–5 users”9 is unlikely to conduct an actual experiment and apply statistical tests. It should be admitted that the irrational limitations on formal experimentation and on the number of participants necessary for rigorous usability analysis often are imposed by shortsighted managers and programmatic constraints rather than chosen voluntarily by usability professionals. Recognize-and-fix usability evaluations are imposed in the same way.

Nevertheless, regrettably few resources are devoted to evaluating the choice of methods used,10 including their effectiveness and sufficiency. Very little research has been conducted to determine which usability methods are best applied in which situations for a given purpose. Thus, a creative series of comparative usability evaluations by Rolf Molich11 is noteworthy. A number of professional usability teams independently and simultaneously evaluated websites, Web applications, and Microsoft Windows programs. These studies shed light on usability practice, though they didn’t systematically compare usability analysis methods.

A further limitation is that researchers who have empirically investigated usability analysis methods12 have focused on single, simple measures (that is, the number of usability problems identified), which is an inadequate measure, even of productivity. A review of five experimental studies comparing usability evaluation methods found that “small problems in the way these experiments were designed and conducted call into serious question what we thought we knew about the efficacy of various methods.”13

Kasper Hornbaek14 conducted an excellent review of usability measures from 180 studies published in core human-computer interaction journals and proceedings. We note some of the issues of measurement validity and reliability that he identified:

• Approximately one quarter of the studies don’t utilize a measure of user performance effectiveness (for example, success and accuracy at achieving task goals, and error rate), and some studies confuse a measure of efficiency (such as resources used and time spent) with a measure of effectiveness.

• Measures of the quality of interaction are rarely utilized.

• Approximately one quarter of the studies don’t assess the outcome of the users’ interaction (that is, the quality of the users’ work products).

• Measures are often interpreted directly when they actually have no simple or straightforward interpretation. (For example, how is the number of webpages visited a measure of usability? Is completion time an indicator of efficiency or an indicator of motivation?)

• Measures of learning and retention are rarely employed, despite being recommended in prominent textbooks.

• Measures of user satisfaction reinvent questions that can be found in available, validated questionnaires: “... the diversity of words used in rating scales is simply astonishing” (p. 90).14

• Some studies conflate users’ perceptions (that is, judgments of the learnability of an interface) with objective measures (such as time needed to master an interface to a certain performance criterion).

We also note that Hornbaek selected the 180 studies from a larger set of 587 journal articles and conference proceedings papers. Some studies were deselected because they didn’t involve any systematic comparison (for example, of alternative interface designs); also, some didn’t report any quantitative measures and only recounted selected comments from informal user experiences.

While usability studies can have a positive impact on product and service delivery, little controlled experimentation has been done to determine which usability methods provide the most significant impact.10,14

Toward Standards for Scientific Usability Analysis
Several methods developed by researchers in human factors, cognitive ergonomics, cognitive psychology, and cognitive systems engineering have a track record of success in eliciting requirements, analyzing tasks and goals, and evaluating performance. Indeed, rigorous methods for evaluating human performance and learning are more than 150 years old. Methods for what is today called contextual inquiry can be dated to the earliest days of industrial psychology.15 These venerable methods have been invigorated and adapted for task analysis for modern work systems and work contexts.16,17

All of these cognitive task analysis methods involve studying what happens when “users” interact with technology. The users can be engaged in a task walkthrough and can be asked questions. Measurements can be made of what they do and how they perform. Conclusions can be reached about what they learn and how they reason. This is called “end-user testing.”18 But there’s another path taken in the field of usability analysis. This is the path in which experienced usability professionals make their own evaluations and judgments, most commonly called heuristic evaluation.19 Based on the experience of cognitive systems engineers and experimental psychologists, who have developed and applied methods for cognitive task analysis and performance measurement,20 what might be a good first stab at a description of best practice for scientific usability analysis?

We can take it as a foregone conclusion that no single method of usability analysis will be “best” for all circumstances. Methods must be adapted to the goals and focus of the analysis (for example, the software tool, interface, or webpage). It’s also a foregone conclusion that usability analysis projects should employ multiple methods and measures. Both of these aspects of methodology were recognized in the early days of expert systems, when attempts were made to compare methods for eliciting experts’ knowledge.

We believe an important step toward increased rigor in usability practice would be the systematic categorization and application of usability methods. Usability analysis methods can be described in terms of a handful of basic categories:

• Retrospective methods engage users (or an analyst) in the use of a tool and then, after some period of use, pause to have them reconstruct their reasoning while using (or learning to use) the tool or work method.

• Concurrent methods engage the user (or analyst) in the use of a tool, and at certain times they’re briefly interrupted and presented with probe questions that elicit judgments concerning the tool or work method, or statements about what they’re thinking. While there may be some concern that interrupting the user might disrupt the flow of reasoning, research by experimental psychologists suggests that brief interruptions don’t make much difference.21

• Performance analysis methods are focused on measuring actual performance, using such measures as time-on-task, goal achievement, and so forth. Ultimately, the goal for usability analysis is to reach conclusions about the quality of performance using some tool and the method it instills.

The retrospective, concurrent, and performance analysis methods are all focused on creating a cognitive decomposition. This is a description of user activity with reference to what users know, what they learn, and how they reason. Many different questions can be boiled out from the general question of “Is this tool any good?”

Questions that are formative of retrospective methods include:

• What knowledge or previous experience did you rely on to use this tool? Did you rely on knowledge of similar tools (interfaces and so on)?

• At what points did you find yourself asking questions? Did you look for answers to those questions?

• Were you reminded of any previous experience?

• Does this tool (or interface) fit a standard or typical task you were trained to deal with?

• If your decisions and actions weren’t the best, what training, knowledge, or information could have helped?

• How would you describe the tool (or interface) to someone else?

Questions that are formative of the concurrent methods include:

• What are you looking for? Why are you looking for it?

• Why has this caught your attention, and what will you do with it?

• What were your specific goals and objectives?

• What do you need to proceed further?

• Is there anything you are concerned, confused, or frustrated about?

• What did you decide to do next?

• What did you expect to happen next? To what degree were your expectations met?

• Did you imagine the events that would unfold?

• What mistakes might have occurred here?

• What were the alternative choices that you could have made?

• Were any alternatives rejected?

• At this point, what might be done by someone having less experience than you?

A primary methodological question regarding concurrent methods is when to interrupt and ask questions. On this, too, there are a number of alternatives. There’s no a priori methodological cookbook. A probe question might be asked whenever a user shows visible signs of frustration. A probe might be asked every other time the user enters information into a field, or clicks the mouse. Certainly, not all of the above questions are to be asked every time. Nor are these lists exhaustive.6
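As a purely illustrative sketch of one such scheduling rule (not a rule the column prescribes), the code below fires a probe on every other field entry and whenever a frustration event is logged; the event names and the every-other-entry rule are assumptions made for the example.

```python
# Minimal sketch of a concurrent-probe scheduler, assuming a stream of
# interaction events. Event names ("field_entry", "frustration") and the
# every-other-entry rule are illustrative assumptions, not a standard.

from typing import Iterator, Iterable

def probe_points(events: Iterable[str]) -> Iterator[int]:
    """Yield the indices of events at which a probe question should be asked."""
    field_entries = 0
    for i, event in enumerate(events):
        if event == "frustration":
            yield i                      # probe whenever frustration is observed
        elif event == "field_entry":
            field_entries += 1
            if field_entries % 2 == 0:   # probe on every other field entry
                yield i

# Hypothetical session log.
session = ["field_entry", "click", "field_entry", "frustration", "field_entry"]
print(list(probe_points(session)))  # -> [2, 3]
```

Any other triggering rule (elapsed time, subtask boundaries, visible confusion) could be substituted in the same structure; the point is only that the interruption policy should be stated explicitly before the session begins.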

The third class of methods is performance analysis. Many attempts at performance analysis in the field of usability employ single performance measures, such as the time taken to achieve specific primary task goals. There are many possibilities that can be adapted from the field of training and performance analysis.22 One of the more powerful performance analysis methods, particularly suited to the analysis of software tools, focuses on learnability.23

Step 1. Users perform until they reach some criterion of performance on a goal or a conjoint set of goals. A criterion can be based on the time to achieve a goal or subgoal, on the number of attempts that must be made before a goal or subgoal is achieved, or even on the number of attempts that must be made before a goal or subgoal is achieved twice in a row.

Step 2. There is a hiatus period of perhaps a few days.

Step 3. Users are once again placed into a task performance context, and again the researcher determines the number of attempts or the time it takes for the user to reachieve the criterion. Comparison of the number of attempts needed to reachieve criterion performance can be used to estimate the original learning strength, to derive learning curves, and to determine the extent to which the skill is retained (a minimal computational sketch of this comparison follows the steps).
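To make the relearning comparison in Step 3 concrete, here is a minimal sketch, not the authors’ protocol: the attempt counts are fabricated for illustration, and the Ebbinghaus-style savings formula is one reasonable choice among several for summarizing retention.

```python
# Hypothetical sketch: summarizing a relearning (savings) analysis for a
# learnability study. Attempt counts are illustrative, not from the article.

def savings_score(original_attempts: int, relearning_attempts: int) -> float:
    """Proportion of the original learning effort that did not have to be
    repeated after the hiatus (higher = better retention)."""
    if original_attempts <= 0:
        raise ValueError("original_attempts must be positive")
    return (original_attempts - relearning_attempts) / original_attempts

# One record per participant: attempts to reach criterion before the hiatus
# (Step 1) and after it (Step 3).
participants = [
    {"id": "P01", "original": 12, "relearning": 4},
    {"id": "P02", "original": 9,  "relearning": 3},
    {"id": "P03", "original": 15, "relearning": 7},
]

for p in participants:
    print(f'{p["id"]}: savings = {savings_score(p["original"], p["relearning"]):.2f}')

mean_savings = sum(
    savings_score(p["original"], p["relearning"]) for p in participants
) / len(participants)
print(f"Mean savings across participants: {mean_savings:.2f}")
```

The same per-participant records could also feed a learning-curve fit, or a comparison of savings scores across two candidate interface designs.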

We call out the study of learnability because it should be emphasized in usability studies. As Hornbaek noted, out of 180 studies “only five studies measured changes in efficiency over time” (p. 87).14 Table 1 summarizes our characterization of methods.

This scheme helps us clarify the often-asked question of how many participants to involve in a usability study.24 There are really two questions here: one about the number of “usability experts” to be involved in an analysis, and the other about the number of “representative users” to be involved.

If the participant-analysts are usability “experts,” the analysis should employ no fewer than five participant-experts but need employ no more than seven. The rationale for 5–7 is that studies of the diminishing return on cognitive task analysis have shown that after about five to seven task decompositions by domain experts, there’s usually little more to be learned.25 Ideally, the “experts” who are doing the evaluation would be individuals who are proficient in the content domain (of the tool, website, interface, and so on) and are genuine experts in usability analysis. But for now, all we really have to go on is the finding about diminishing returns for experts doing task decomposition.

The low end accords with the “five users” benchmark set by Jakob Nielsen9 and Robert Virzi.26 They found that the percentage of usability problems that participants identified asymptoted at about 80 to 85 percent for about five participants. This was for a single task and measure: the identification of usability problems. We wonder if the same asymptote obtains for other tasks and measures. But more to the point, typical usability analyses use fewer than five participants. Even more to the point, Nielsen’s study involved “representative users,” and not usability experts. It’s hard to avoid seeing this as a rigor gap. What’s needed are more experiments, using certified experts, showing that you get the diminishing return at three (or some number fewer than five). Since no one has yet shown that, we choose to err on the side that would be taken by mainstream experimental psychologists, and advocate for no fewer than five. We’ve never seen five usability experts tackle the same problem (except in the Molich 2012 studies11). Given the evaluator effects that Rolf Molich and his colleagues found, how would usability professionals justify conducting studies with fewer than five usability professionals?
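The 80 to 85 percent asymptote is usually justified with a simple discovery model that the column does not spell out: if each participant detects any given problem with probability p, then n participants are expected to uncover 1 - (1 - p)^n of the problems. A minimal sketch follows; the value p = 0.31 is the often-cited figure commonly attributed to Nielsen’s data, not a number taken from this column.

```python
# Hypothetical sketch of the problem-discovery model often used to justify
# the "five users" benchmark. The column does not present this formula; it
# is the model commonly attributed to Nielsen and Virzi.

def proportion_found(p: float, n: int) -> float:
    """Expected fraction of usability problems detected by n independent
    evaluators, each detecting any given problem with probability p."""
    return 1.0 - (1.0 - p) ** n

# With the often-assumed p = 0.31, five users land near the 80-85 percent
# asymptote mentioned in the text; fifteen approach the "100 percent" baseline.
for n in (1, 3, 5, 7, 15):
    print(f"n = {n:2d}: {proportion_found(0.31, n):.0%} of problems expected")
```

Note that the shape of this curve depends entirely on p, which varies across products, tasks, and evaluators; that sensitivity is one reason the evaluator effects Molich observed matter.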

If the participant-analysts are representatives of some user population (“Layperson” in Table 1), then the usability analysis should be thought of as a proper psychological experiment, in which case a sample size of about seven is certainly too small; it puts someone at risk of confounding any salient effects with individual differences. When conceived in reference to the traditional psychology laboratory, a sample size of at least 15 would be preferred. Interestingly, the Nielsen study used 15 participants as the benchmark for deciding what 100 percent meant in their analysis of diminishing returns. But we would take this a step further. As Kasper Hornbaek argued,14 there needs to be some sort of comparison, some manipulated independent variable.

Table 1. A categorization of usability evaluation methods.

Question                  Analyst             Perspective    Method
What do people desire?    “Expert” analyst    Self, Other    Retrospective; Concurrent; Performance analysis; Combinations of methods
                          Layperson           Self, Other    Retrospective; Concurrent; Performance analysis; Combinations of methods
Is it any good?           “Expert” analyst    Self, Other    Retrospective; Concurrent; Performance analysis; Combinations of methods
                          Layperson           Self, Other    Retrospective; Concurrent; Performance analysis; Combinations of methods


In the case of participant characteristics, there should be some specific sampling plan that makes sample type an independent variable, to ensure empirical rigor. For instance, one group of 15 participants might be individuals having little or no experience using the given type of software or interface, and a second group might be individuals having some or considerable experience. The excellent work of Jeff Sauro and James Lewis27 is a great resource for those interested in making informed decisions about sample size.
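As one hedged illustration of treating sample type as an independent variable, the sketch below compares task-completion times for a hypothetical novice group and a hypothetical experienced group using Welch’s t statistic; the data, group sizes, and choice of measure are assumptions made for the example, not a recommended analysis plan.

```python
# Illustrative sketch: comparing two participant groups (novice vs. experienced)
# on a single performance measure. The times (in seconds) are fabricated; a
# real study would follow a planned sampling design (e.g., 15 per group).

from statistics import mean, variance
from math import sqrt

novice      = [182, 240, 199, 221, 265, 210, 188, 232, 254, 205, 219, 243, 198, 226, 211]
experienced = [121, 134, 150, 118, 142, 127, 139, 131, 148, 125, 136, 129, 144, 122, 138]

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

print(f"Novice mean:      {mean(novice):.1f} s")
print(f"Experienced mean: {mean(experienced):.1f} s")
print(f"Welch's t:        {welch_t(novice, experienced):.2f}")
```

The design choice, not the particular statistic, is the point: once sample type is a planned independent variable, the usability study becomes a comparison that can be analyzed, replicated, and criticized like any other experiment.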

Keeping Hope Alive
The concerns about the methodology of usability study expressed by Hornbaek in 2006 remain pertinent. Rigor in usability studies could be improved if the following ideas were implemented:

1. Individuals who tout themselves as usability experts should provide up-front convincing empirical evidence that they are, in fact, experts.

2. Developers of software (tools, interfaces, and webpages) should conduct thorough analyses of user desirements28 prior to or during the initial prototyping, and thereby avoid needing to have a group of “representative users” rattle off design problems that the developers should have been able to anticipate and avoid in the first place.

3. Usability methodology should focus on the analysis of actual performance and learnability23 rather than relying exclusively on satisficing through the use of questionnaires or surveys.

In 1988, Donald Norman characterized usability as “the next competitive frontier” (p. vi).29 With the commoditization of hardware and, increasingly, software, what might differentiate products would be how usable they are. We believe that this is true, but the lack of rigor in the application of usability practices has retarded acceptance of this truth; we advocate for usability analysis as an applied science motivated by the concepts and laws of human-centered computing.

Acknowledgments
Randolph Bias gratefully acknowledges financial support from the University of Texas at Austin IC2 Institute.

References
1. J. Spool, “Magic and Mental Models: Using Illusion to Simplify Designs,” presentation, SXSW Interactive Conf., 2008.
2. J. Nielsen, “Usability for the Masses,” J. Usability Studies, vol. 1, no. 1, 2005, pp. 2–3.
3. J.M. Bradshaw et al., “The Seven Deadly Myths of ‘Autonomous Systems’,” IEEE Intelligent Systems, vol. 28, no. 3, 2013, pp. 54–61.
4. K. Neville et al., “The Procurement Woes Revisited,” IEEE Intelligent Systems, vol. 23, no. 1, 2008, pp. 72–75.
5. ISO 9241-11, Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs), Part 11: Guidance on Usability, 1998.
6. R.R. Hoffman, M. Marx, and P.A. Hancock, “Metrics, Metrics, Metrics: Negative Hedonicity,” IEEE Intelligent Systems, Mar.-Apr. 2008, pp. 69–73.
7. R.G. Bias and S.C. Huang, “Remote, Remote, Remote, Remote Usability Testing,” Proc. Society for Technical Comm. Summit, 2010.
8. D. Liu et al., “Crowdsourcing for Usability Testing,” Proc. Ann. Meeting of the Am. Soc. Information Science and Technology, Baltimore, Oct. 2012.
9. J. Nielsen, “Why You Only Need to Test with 5 Users,” Nielsen Norman Group, 2000; www.useit.com/alertbox/20000319.html.
10. R.G. Bias et al., “Clothing the Naked Emperor: The Unfulfilled Promise of the Science of Usability,” interactions, vol. XX, no. 6, Nov./Dec. 2013, p. 72.
11. R. Molich, CUE: Comparative Usability Evaluation, user studies; www.dialogdesign.dk/CUE.html.
12. R. Jeffries et al., “User Interface Evaluation in the Real World: A Comparison of Four Techniques,” Proc. Conf. Human Factors in Computing Systems, 1991, pp. 119–124.
13. W.D. Gray and M.C. Salzman, “Damaged Merchandise? A Review of Experiments That Compare Usability Evaluation Methods,” Human-Computer Interaction, vol. 13, no. 3, 1998, pp. 203–261.
14. K. Hornbaek, “Current Practice in Measuring Usability: Challenges to Usability Studies and Research,” Int’l J. Human-Computer Studies, vol. 64, no. 2, 2006, pp. 79–102.
15. R.R. Hoffman and L.G. Militello, Perspectives on Cognitive Task Analysis: Historical Origins and Modern Communities of Practice, CRC Press/Taylor and Francis, 2008.
16. H. Beyer and K. Holtzblatt, Contextual Design: Defining Customer-Centered Systems, Morgan Kaufmann, 2007.
17. J.L. Fath and R.G. Bias, “Taking the ‘Task’ out of Task Analysis,” Proc. 36th Ann. Meeting of the Human Factors Society, 1992, pp. 379–383.
18. J. Rubin and D. Chisnell, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests, John Wiley, 2008.
19. J. Nielsen, “How to Conduct a Heuristic Evaluation,” 1995; www.nngroup.com/articles/how-to-conduct-a-heuristic-evaluation.
20. B. Crandall, G. Klein, and R.R. Hoffman, Working Minds: A Practitioner’s Guide to Cognitive Task Analysis, MIT Press, 2006.
21. M.R. Endsley, “Measurement of Situation Awareness in Dynamic Systems,” Human Factors, vol. 37, 1995, pp. 65–84.
22. R.R. Hoffman et al., Accelerated Expertise: Training for High Proficiency in a Complex World, Taylor and Francis/CRC Press, 2014.
23. R.R. Hoffman et al., “Measurement for Evaluating the Learnability and Resilience of Methods of Cognitive Work,” Theoretical Issues in Ergonomics Science, vol. 11, no. 6, 2010, pp. 561–575.
24. K.M. Dresel, “Testing Usability Goals with Small Samples: A Binomial Analysis Methodology,” Proc. Human Factors and Ergonomics Soc. 57th Ann. Meeting, 2013, pp. 1303–1307.
25. C.J. Chao and G. Salvendy, “Percentage of Procedural Knowledge Acquired as a Function of the Number of Experts from Whom Knowledge Is Acquired for Diagnosis, Debugging, and Interpretation Tasks,” Int’l J. Human-Computer Interaction, vol. 6, 1994, pp. 221–233.
26. R.A. Virzi, “Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?” Human Factors, vol. 34, no. 4, 1992, pp. 457–471.
27. J. Sauro and J.R. Lewis, Quantifying the User Experience: Practical Statistics for User Research, Morgan Kaufmann, 2012.
28. R.R. Hoffman and M.J. McCloskey, “Envisioning Desirements,” IEEE Intelligent Systems, vol. 28, no. 4, 2013, pp. 82–89.
29. D.A. Norman, The Design of Everyday Things, Basic Books, 1998.

Randolph G. Bias is a professor in the School of Information and director of the Information eXperience (IX) Lab at the University of Texas at Austin. Contact him at [email protected].

Robert Hoffman is a senior research scientist at the Institute for Human and Machine Cognition. Contact him at [email protected].
