
Transcript of Usability Testing

Page 1:

Usability Testing

I214, 11 Sept 08

Prof. Van House

Usability lab at Oracle

http://flickr.com/photos/52056151@N00/1846245768/

Page 2:

Rooted in Experimental Design

• Principles
  – Validity
    • Internal (to the test)
    • External (related to the real world)
  – Reliability
• Methods
  – Designed to control for the influence of factors other than what's being tested

Page 3:

The canonical usability test

• Participants: are or represent real users
• Tasks:
  – typical but controlled
  – controlled setting, short time
• Observation:
  – Observe and record
    • Keystroke logging, videotaping, screen capture, manual note‐taking… without interference
  – Measure performance, e.g. keystrokes, elapsed time, errors (see sketch below)
• Afterward:
  – (optional) Debrief:
    • Via questionnaire, interview, and/or focus group, ask about perceptions and opinions, e.g. ease of use
  – Analyze data, diagnose problems, and recommend changes
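To make the performance measures concrete, here is a minimal, hypothetical sketch in Python of how elapsed time and observed errors might be recorded per task. The `TaskRecord` class and its fields are illustrative assumptions, not part of the test protocol described above.

```python
# Minimal, hypothetical sketch: recording elapsed time and observed errors
# for one task in a usability session. Names are illustrative only.
import time

class TaskRecord:
    def __init__(self, task_name):
        self.task_name = task_name
        self.errors = []        # free-text notes, one per observed error
        self.start = None
        self.elapsed = None

    def begin(self):
        self.start = time.monotonic()

    def log_error(self, note):
        self.errors.append(note)

    def finish(self):
        self.elapsed = time.monotonic() - self.start

# Example use while observing a participant
record = TaskRecord("Turn on vibrating alert")
record.begin()
record.log_error("opened ringtone menu instead of alerts menu")
record.finish()
print(f"{record.task_name}: {record.elapsed:.1f}s, {len(record.errors)} error(s)")
```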

Page 4:

Early Usability Testing

• Concerned with individual person/interface interaction
• Enterprise‐level software – user has no choices; large numbers of transactions
• Rooted in cognitive psychology, design of psychological experiments
• Emphasis on efficiency
• Measures: keystrokes, time, error rates

Page 5:

Current usability tests

• Quantitative:
  – Measure system and user performance
    • E.g., # of errors, # of keystrokes needed, time required, recall/precision
  – Compare performance
    • Design changes, alternatives
• Qualitative:
  – In‐depth examination of individuals' actions, problems, confusions, concerns
  – Possibly controlled tasks
  – Somewhat controlled setting
  – Develop recommendations

Page 6:

Types of Usability Testing (Rubin)

• Exploratory: Testing preliminary design concepts
• Assessment: How well does this design work?
• Validation: Certify usability late in development – does it meet standards, benchmarks?
• Comparison: One design against another

Page 7:

Steps in usability testing

• Plan
  – Define goals!
  – Decide on scope of study
  – Design method
    • Data collection
    • Measures
    • Users
• Set up study
  – Recruit users
  – Develop materials
• Conduct study
• Record
• Analyze findings
• Report findings

Page 8:

Rooted in Experimental Design

• Principles
  – Validity
    • Internal (to the test)
    • External (related to the real world)
  – Reliability
• Methods
  – Control for the influence of factors other than what's being tested

Page 9:

Ecological validity

Page 10:

Planning

• Know your purpose
• Have a solid plan
• Be flexible
• Prepare test materials and setting
  – Scripts for testers, monitors
  – Tasks, scenarios, etc.
  – Prototypes if needed
  – Equipment
• Pre‐test your testing plan
• Pre‐test your analysis plan

Page 11:

Pre‐test

Pre-test

Pre-test!

Page 12:

EXAMPLES

• http://boltpeters.com/greenpeace/
• http://boltpeters.com/aaa

Page 13:

Some testing methods

• Simple testing with experimenter present

  – Probing

Page 14:

Some testing methods

• Simple testing with experimenter present

• Ditto but with unobtrusive observation
  – 1‐way mirrors (expensive; own/rent facilities)
  – Computer conferencing tools

Page 15:

Some testing methods

• Simple testing with experimenter present

• Ditto but with unobtrusive observation
  – 1‐way mirrors (expensive; own/rent facilities)
• Think aloud
  – More talking during the test
  – Paired think aloud

Page 16:

Some testing methods

• Simple testing with experimenter present

• Ditto but with unobtrusive observation
  – 1‐way mirrors (expensive; own/rent facilities)
• Think aloud
  – More talking during the test
  – Paired think aloud
• More relaxed observation and encouragement of participant comments during activity

Page 17:

Usability testing: sources of “error”

• History
• Selection: Participants
• Maturation
• Testing, test conditions
• Experimenter
• Task
• Instrumentation
• Mortality

Page 18:

Usability testing: sources of “error” and how to control for them

Testing, test conditions: controlled and replicated; be careful of the experience participants gain during the tests

Experimenter: stay neutral! Work from a script.

Task: controlled tasks, matched tasks, change order of tasks

Instrumentation: e.g., always use the same interfaces, browsers, survey forms, etc.

Mortality: avoid dropouts

Page 19:

Usability testing: sources of “error” and how to control for them

History: experiences, events unrelated to the treatment – try to minimize

Selection: Participants
  – Representative of users
  – Randomly assigned to treatment groups (if multiple)
  – NOT professional testers
  – Consider age, sex, experience
  – People who are not easily intimidated

Maturation: time and learning – people learn more about the task, develop better strategies
  – Caution about re‐using subjects
  – Vary the order of activities, tasks, and system designs tested (if >1)
  – Short‐term studies
  – Longer‐term studies
  – Beware of fatigue, discouragement, boredom…

Page 20:

Cell Phone Test Participants

The participants were selected from the 41 subjects responding to the open online survey…. Nineteen participants were selected for the comparison test and assigned to one of two groups. We generated matched groups that had approximately equal numbers of each gender and of varying experience with a Nokia™ cellular phone.
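A sketch of how such matched groups could be formed programmatically. This is illustrative only: the field names and the simple "sort into strata, then deal alternately" strategy are assumptions, not the procedure the study reports.

```python
# Hypothetical sketch: split respondents into two matched groups, balanced on
# gender and prior experience with the phone. Field names are illustrative.
from itertools import cycle

respondents = [
    {"id": 1, "gender": "F", "experience": "high"},
    {"id": 2, "gender": "M", "experience": "low"},
    {"id": 3, "gender": "F", "experience": "low"},
    # ... the remaining participants selected from the survey
]

groups = {"Group 1": [], "Group 2": []}
turn = cycle(groups)   # alternate between the two group labels

# Sorting by the matching variables puts similar people next to each other,
# so dealing them out alternately splits each stratum roughly evenly.
for person in sorted(respondents, key=lambda p: (p["gender"], p["experience"])):
    groups[next(turn)].append(person)
```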

Page 21:

Travel Sites Test Participants

• 20 people – randomly assigned to one of 3 groups, one for each travel site
• Relevant characteristics:
  – Age (80% 19‐25)
  – Sex: 60% male, 40% female
  – Ethnicity: 55% Caucasian (sic)
  – Internet experience: some (25%), significant (75%)
  – How frequently they visit different sites:
    • Expedia: 75% visit sometimes or often
    • Travelocity: 60%
    • Orbitz: 60%

Page 22:

Cellphone test: repeated measures design

• From a cellphone study:
  – 2 designs
  – Participants randomly divided into 2 groups
  – 5 tasks

Group 1: cellphone A, then cellphone B
Group 2: cellphone B, then cellphone A

Page 23:

Compensate for maturation effect, familiarity with tasks

Group   1st set of tasks   2nd set of tasks
1       cellphone A        cellphone B
2       cellphone B        cellphone A

Page 24:

Compensate for maturation effect, familiarity with tasks

Group   Tasks X        Tasks Y
1       cellphone A    cellphone B
2       cellphone B    cellphone A

Group   Tasks Y        Tasks X
3       cellphone A    cellphone B
4       cellphone B    cellphone A
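The same counterbalancing can be expressed in a few lines of code. A hypothetical sketch: the four groups are the cross of task-set order and phone order, and participants are dealt into them at random; the participant count and IDs are illustrative.

```python
# Hypothetical sketch of the counterbalanced design in the tables above:
# 4 groups = (task-set order) x (phone order), participants assigned at random.
import random
from itertools import product

task_orders  = [("Tasks X", "Tasks Y"), ("Tasks Y", "Tasks X")]
phone_orders = [("cellphone A", "cellphone B"), ("cellphone B", "cellphone A")]
conditions = list(product(task_orders, phone_orders))   # Groups 1-4

participant_ids = list(range(1, 21))    # illustrative: 20 participants
random.shuffle(participant_ids)

assignments = {}
for i, pid in enumerate(participant_ids):
    tasks, phones = conditions[i % len(conditions)]
    assignments[pid] = {"group": i % len(conditions) + 1,
                        "task_order": tasks,
                        "phone_order": phones}
```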

Page 25:

Test Conditions

Page 26:

Tasks

Page 27:

Cell Phone Test

Tasks:
1. Check received calls.
2. Find the wireless Internet access.
3. Find the option “Welcome Note”.
4. Turn on vibrating alert.
5. Set the phone to silent mode.

Measures (see sketch below):
1. Time to complete each task.
2. Number of attempts to complete each task.
3. Task success rate.
4. Number and types of errors.

Observations and Comments: note when participants had difficulty, when an unusual behavior occurred, or when a cause of error became obvious.
Noncritical error: the participant made a mistake but was able to recover during the task in the allotted attempts.
Critical error: the participant made a mistake and was unable to recover and complete the task successfully.
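A hypothetical sketch of how these measures might be tallied after the sessions; the record layout and numbers are assumed for illustration only.

```python
# Hypothetical sketch: summarizing the measures above (time, attempts, success
# rate, error counts) per task for one phone design. Data layout is assumed.
trials = [
    # one record per participant x task
    {"task": 1, "time_s": 34.2, "attempts": 1, "completed": True,
     "critical_errors": 0, "noncritical_errors": 1},
    {"task": 1, "time_s": 80.5, "attempts": 3, "completed": False,
     "critical_errors": 1, "noncritical_errors": 0},
    # ...
]

by_task = {}
for t in trials:
    by_task.setdefault(t["task"], []).append(t)

for task, rows in sorted(by_task.items()):
    n = len(rows)
    success_rate = sum(r["completed"] for r in rows) / n
    mean_time = sum(r["time_s"] for r in rows) / n
    total_critical = sum(r["critical_errors"] for r in rows)
    print(f"Task {task}: {success_rate:.0%} success, "
          f"mean time {mean_time:.1f}s, {total_critical} critical error(s)")
```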

Page 28:

Travel Site Test

Task: Identify a round‐trip flight on each site.
Measure: Time required to identify a round‐trip flight on each site.

Page 29:

Sources of measures, cont.

• Heuristics
  – E.g. “easy to use”:
    • How many times did users refer to “help” or act confused? (see sketch below)
    • How many screens did the user have to look at to do X?
    • Ask users at the end of the test: “How easy…?”
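One way to operationalize the first question is to count "help" references in an activity log. A hypothetical sketch; the event format is assumed for illustration, not taken from the slides.

```python
# Hypothetical sketch: counting how many times each participant opened "help",
# given a list of logged interface events. The event format is assumed.
from collections import Counter

events = [
    {"participant": 3, "event": "click", "target": "Help"},
    {"participant": 3, "event": "click", "target": "Search"},
    {"participant": 7, "event": "click", "target": "Help"},
    {"participant": 3, "event": "click", "target": "Help"},
]

help_counts = Counter(e["participant"] for e in events
                      if e["event"] == "click" and e["target"] == "Help")
print(help_counts)   # Counter({3: 2, 7: 1})
```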

Page 30:

Sources of measures

• Goals, concerns
  – The organization's
  – The users', discovered through exploratory methods
  – Developers'/evaluators', discovered in usability inspection
• Perceived problems
  – e.g., users report “too slow”
• Perceived strengths
  – e.g., faster than our competitors, fewer/more hits than our competitors…

Page 31:

Recording during test

• Tester observations and interpretations
  – Notes
  – User activity logging
• User activity recording or logging
  – Audio
  – Video
  – Activity logging
  – Eye‐tracking

Page 32:

Data collection

• Observation and logging
  – Real time – video, screen capture
  – From tape
    • Transcribe and index videos
• Automated logging
• User interview, focus group
• Review of tape with user
• User questionnaire (before and/or after)
• Collect data from users as soon as possible after the test

Page 33:

What You Record

• Measures of performance
  – Participants
  – System
• Participants' activities
• Participants' comments and opinions
• Areas, activities where people have problems
• Your observations, thoughts
  – E.g., “several people seem to be confused at this point; they hesitate at this screen and then seem tentative in their choices”
  – Value of having the same people conduct multiple tests: the experimenter as part of the data collection assemblage

Page 34:

Automated data collection

• Logging clicks and other activities
• Logging time
• Eye‐tracking
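A minimal sketch of what such automated logging can look like. The event names and file format here are assumptions for illustration, not a description of any particular logging tool.

```python
# Hypothetical sketch of automated data collection: append a timestamped
# record for every click or task boundary, for later counting and timing.
import json
import time

LOG_PATH = "session_log.jsonl"

def log_event(participant_id, event, target=None):
    entry = {"t": time.time(), "participant": participant_id,
             "event": event, "target": target}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Example use during a session
log_event(7, "task_start", "Check received calls")
log_event(7, "click", "Menu > Call register")
log_event(7, "task_end", "Check received calls")
```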

Page 35:

Eye‐tracking

Page 36:

• http://www.useit.com/alertbox/video_talkinghead_eyetrack.wmv

Page 37:

Efficiency: Eye‐tracking and web page

Performance measures such as click accuracy and time on task were supplemented with eye movements, which allowed for an assessment of the processes that led to both the failures and the successes. Eye tracking helped diagnose errors and identify the better of the two designs (and the reasons for its superiority) when both were equally highly successful. The number of times the target was looked at and the number of fixations prior to the first fixation on the target provide information about the attention deployment stage of search (Did users see the target? Did they have trouble locating it?) and about the target processing stage (Did users have difficulties comprehending the target?).

Page 38:

Cell Phone Test Results

Page 39:

Cell Phone Test : Comparing errors

Page 40:

Travel sites time per task

Page 41:

Time x frequency of visit

Page 42:

Combining methods: some examples

• Interviews, focus groups, questionnaires (to find areas of concern) followed by testing (to explore areas of concern)

• Testing followed by questionnaires, interviews, or focus groups (to improve understanding of test findings)

• Testing based on heuristics (source of measures)

• Testing based on results of inspections and walk‐throughs (once possible problems are uncovered, use testing to examine more rigorously)

Page 43:

Benefits of Formal Usability Testing

Page 44:

Benefits of Formal Usability Testing

• Clear results
• Clear feedback to designers
• Added credibility in quantitatively‐oriented organizations and with quantitative professions (e.g., engineers)
• Added credibility when the client and designers can see and hear users

Page 45:

Limits to Usability Testing

Page 46:

Limits to Usability Testing

• Unrepresentative conditions
• Unrepresentative tasks?
  – Limited kinds of tasks amenable to testing
  – Short time period
• Unrepresentative users?
  – Limited number, range of users
  – Often novice users (e.g., for a new interface)
  – Testing effects: people do their 'best' when being observed
• Limited observation opportunity
• Can get at certain kinds of information and not others

Page 47:

Remote Testing

Page 48:

Remote Testing

• Experimenter and subjects not in the same place
• Benefits:
  – Access to people who could not travel to your site
  – Cost
  – No need for special facilities
  – Often results in an easy record of the test
• Problems:
  – Difficult observation and data collection
  – Lack of access to non‐verbal cues
  – Less interaction with participants
  – Less control over conditions
  – Technology isn't as flexible as one might want

Page 49:

Software used for remote testing

• WebEx (web conferencing software)
• Ovo Studios http://www.ovostudios.com/ – Ovo Logger freeware http://www.ovostudios.com/ovologger.asp
• Morae http://www.techsmith.com/morae.asp
• Noldus http://www.noldus.com/site/nav10000
• Ethnio http://ethnio.com/

Page 50:

International Remote Testing 

• Access to users more varied than is possible otherwise

• All problems of remote testing in general PLUS the problems of crossing cultures

• Do with local experts!

• New: off‐shoring usability testing

Page 51:

Discount usability testing

• Very few users – to try to find big problems fast
• Expert evaluation – experts instead of users
  – Heuristics
  – Walk‐throughs

Page 52:

Page 53:

Usability testing: comparing kayak.com to expedia.com

• Users?
  – Frequent travelers, rare travelers
  – Heavy, moderate, light internet users
  – Previous use of online travel sites
• Tasks?
  – Searching for flights: priority to dates; to cost; with/without flexible dates
  – Ditto hotels
  – Packages
• Design?
  – Equal and random distribution of user groups to groups 1 and 2
• Implementation
  – Group 1 searches site A then B
  – Group 2 searches B then A
• Interview or survey uses:
  – Pre‐"test"? Attitudes toward, satisfaction with travel sites?
  – Interview/questionnaire about each site after its use?
  – Final interview/questionnaire/focus group?

Page 54:

Usability testing: comparing kayak.com to expedia.com

• Measures
  – Objective, quantitative:
    • Characteristics of the sites:
      – Number of different kinds of transactions supported (e.g., flexible dates), number of airlines (hotels, etc.) searched, etc.
      – Added functionality (e.g., accepts city name OR airport code)
    • Performance of the sites:
      – Speed with which the display appears
      – Precision and recall of alternatives returned (see sketch below)
      – Speed with which the user receives and reviews results and makes a choice
      – User error rate
  – Qualitative; users' subjective assessment of:
    • Quality of alternatives returned
    • Ease in performing a search
    • Ease in understanding results
    • Ease in making reservations
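Precision and recall of the returned alternatives can be computed against a judged set of relevant results. A hypothetical worked example; the flight identifiers and relevance judgments are made up for illustration.

```python
# Hypothetical sketch: precision and recall of the alternatives a travel site
# returns for one query, against a hand-judged set of relevant flights.
returned = {"UA 210", "AA 77", "DL 1490", "WN 333"}   # results shown to the user
relevant = {"UA 210", "DL 1490", "AS 18"}             # flights judged relevant

hits = returned & relevant
precision = len(hits) / len(returned)   # share of returned results that are relevant
recall    = len(hits) / len(relevant)   # share of relevant flights that were returned
print(f"precision = {precision:.2f}, recall = {recall:.2f}")   # 0.50, 0.67
```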

Page 55:

Some usability testing measures

• EFFECTIVENESS:
  – Study of voting machines: http://www.upassoc.org/upa_publications/jus/2007august/voting-machine.html
    • The machine reliability appeared to be 100%, indicating that, within our sample, all votes that had been cast were correctly represented in the output of the voting machine.
    • Regarding usability, 1.4% of the participants had cast the wrong vote using the voting machine. This percentage was similar to that of the paper ballot.

Page 56:

Efficiency: Eye‐tracking and web page

Twelve people completed three search tasks using each design. Performance measures such as click accuracy and time on task were supplemented with eye movements, which allowed for an assessment of the processes that led to both the failures and the successes. Eye tracking helped diagnose errors and identify the better of the two designs (and the reasons for its superiority) when both were equally highly successful.

The number of times the target was looked at and the number of fixations prior to the first fixation on the target provide information about the attention deployment stage of search (Did users see the target? Did they have trouble locating it?) and about the target processing stage (Did users have difficulties comprehending the target?).
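The two measures in the last sentence are straightforward to compute once each fixation has been mapped to a screen region. A hypothetical sketch; the region names and fixation sequence are invented for illustration.

```python
# Hypothetical sketch: fixation-based measures for one trial, assuming each
# fixation has already been mapped to a named screen region.
fixations = ["nav bar", "hero image", "sidebar", "search box",
             "hero image", "search box"]        # ordered fixation sequence
target = "search box"

times_target_fixated = fixations.count(target)              # looked at twice here
fixations_before_first_hit = (fixations.index(target)
                              if target in fixations else None)   # 3 here
print(times_target_fixated, fixations_before_first_hit)     # 2 3
```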

Page 57:

http://www.upassoc.org/upa_publications/jus/2006_february/cellular_empirical_evaluation.pdf

Page 58:

Steps in usability testing

• Plan
• Set up
• Conduct
• Record
• Analyze
• Report