Karrie Karahalios, Eric Gilbert 6 April 2007 some slides courtesy of Brian Bailey and John Hart...


Karrie Karahalios, Eric Gilbert

6 April 2007

some slides courtesy of Brian Bailey and John Hart

cs414: empirical user studies

• Conduct user study to gain more precise measure of the usability of an interface or system

• Complements low-fidelity techniques

• Requires a larger investment than low-fi prototyping

• Provide positive experience for users!

Messages

In Context of Task-Centered UI Design

• Measure performance, error rate, learnability and retention, satisfaction, tolerable network delay…

• adapt to your particular interface and context

• Compare results to usability goals

• Identify usability issues and resolve them

Empirical User Studies

• Develop materials

• Prepare for the study

• Conduct the study

• Analyze results and iterate

• Learn from the experience

Overview of Doing Empirical User Studies

• Identify usability goals

• Develop experimental tasks and design

• Recruit users

• Instrument software/hardware

Prepare for the Study

• Identify questions you want answered

• questions should be specific and measurable

• Examples:

• can a user perform each task in < 30s?

• after only five minutes of instruction, can a user perform each task with < 2 errors?

• are users rating the interface at least a ‘3’ for overall satisfaction on a 5-point scale?

Identify Usability Goals
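Goals phrased this way can be checked mechanically once data is in. A minimal sketch, assuming one measurement per user for a single task; every number and name below is hypothetical, not data from a real study:

```python
# Hypothetical measurements for one task (not data from a real study).
task_times_s = [22.1, 27.4, 31.0, 18.9, 25.3]   # seconds per user
error_counts = [0, 1, 3, 0, 1]                   # errors per user
satisfaction = [4, 3, 5, 3, 4]                   # ratings on a 5-point scale

def fraction_meeting(values, predicate):
    """Return the fraction of users whose measurement satisfies the goal."""
    return sum(1 for v in values if predicate(v)) / len(values)

# One entry per usability goal from the examples above.
goals = {
    "task time < 30 s": fraction_meeting(task_times_s, lambda t: t < 30),
    "errors < 2": fraction_meeting(error_counts, lambda e: e < 2),
    "satisfaction >= 3": fraction_meeting(satisfaction, lambda r: r >= 3),
}

for goal, share in goals.items():
    print(f"{goal}: met by {share:.0%} of users")
```

Reporting the share of users who meet each goal, rather than a single yes/no, shows how far from the goal you are.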

• Structure of experiment

• what will users do, in what order, where, etc.

• Between groups (randomly assigned to treatment groups)

• Control group

• Experimental group

• Within groups

• Each user performs under all conditions

• Order randomized

• Cheaper because it uses fewer participants

Develop Experimental Design
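The two designs differ mainly in how users are mapped to conditions. A minimal sketch, assuming two conditions and eight hypothetical participants; the function names and the seed are illustrative only:

```python
import random

def assign_between_groups(users, conditions, rng):
    """Between groups: shuffle users, then deal them out so each user
    gets exactly one condition and group sizes stay balanced."""
    shuffled = users[:]
    rng.shuffle(shuffled)
    return {user: conditions[i % len(conditions)]
            for i, user in enumerate(shuffled)}

def assign_within_groups(users, conditions, rng):
    """Within groups: every user gets all conditions, in a random order."""
    return {user: rng.sample(conditions, k=len(conditions)) for user in users}

rng = random.Random(414)                     # fixed seed: reproducible assignment
users = [f"P{i}" for i in range(1, 9)]       # 8 hypothetical participants
conditions = ["control", "experimental"]

between = assign_between_groups(users, conditions, rng)
within = assign_within_groups(users, conditions, rng)

for user in users:
    print(user, "| between:", between[user], "| within:", within[user])
```

In the within-groups case all eight users contribute data to both conditions, which is why that design needs fewer participants.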

• What gets changed and what is its effect?

• Independent variables

• the variables you manipulate, e.g., # of menu items, lighting conditions, mouse vs. keys

• Dependent variables

• the variables you measure, e.g., speed of menu choice, reaction time to stimuli

• Variable type matters

• discrete vs. continuous

Experimental Variables

• Typically want about 8 – 12 users

• depends on desired confidence in the results

• 12 is the magic number for the ANOVA test (more later)

• This could be the most challenging aspect of the study

• expect about a 0.1% to 10% response rate

• may need IRB approval, especially if you want to publish

• Give users a compelling reason to participate

Recruit Users
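The response-rate range above translates directly into how many invitations to send. A small worked sketch, assuming each invitation is answered independently at the stated rate; `invitations_needed` is a hypothetical helper name:

```python
from fractions import Fraction
from math import ceil

def invitations_needed(target_users, response_rate):
    """Invitations to send so the expected number of responses
    reaches target_users at the given response rate."""
    return ceil(target_users / response_rate)

# Exact fractions avoid floating-point surprises in the division.
for rate in (Fraction(1, 1000), Fraction(1, 100), Fraction(1, 10)):
    print(f"{float(rate):.1%} response rate: "
          f"{invitations_needed(12, rate)} invitations for 12 users")
```

At the low end of the range, recruiting 12 users means reaching thousands of people, which is why recruiting can be the hardest part of the study.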

Demographic Diversity

• It is important to target your user population.

• example: if you are developing for Firefox, make sure that you use people already familiar with Firefox.

• Beyond that, it is also important to gain a diversity of different types of users:

• age

• sex

• education

• occupation

• ...

• can tell you important things about your system, and help you generalize

• Log performance and errors (if possible)

• Determine media capture needs

• ensure that you have access to equipment

• manage physical layout of the testing space

• Anything else that you need?

Instrument Software/Hardware
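Logging performance and errors can be as simple as timestamping task boundaries and counting error events. A minimal sketch, assuming your interface can call these hooks; `TaskLogger` and its method names are hypothetical, not part of any toolkit:

```python
import time

class TaskLogger:
    """Records duration and error count for each task a user performs."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the logger can be tested
        self._start = None
        self._errors = 0
        self.records = []        # one dict per completed task

    def start_task(self):
        """Call when the user begins a task."""
        self._start = self._clock()
        self._errors = 0

    def log_error(self):
        """Call each time the interface detects a user error."""
        self._errors += 1

    def end_task(self, user_id, task_id):
        """Call when the task is completed; stores time and errors."""
        duration = self._clock() - self._start
        self.records.append({"user": user_id, "task": task_id,
                             "seconds": duration, "errors": self._errors})

logger = TaskLogger()
logger.start_task()
logger.log_error()
logger.end_task(user_id="P1", task_id="find-menu-item")
print(logger.records)
```

A monotonic clock is used so that system clock adjustments during a session cannot distort the measured times.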

• Give user an overview of the study

• Introduce your system, allow for practice

• Have users work through the tasks

• Collect experimental measures (e.g., performance and error data)

• Fill out questionnaire, if any

• Debrief the user

• Entire session should last less than 60 minutes

Conduct the Study

• Purpose of the study, but not necessarily details of what you are testing

• What they will be doing (the tasks)

• They are not being tested, the interface/system is

• They can quit at any time, and doing so will not affect their relationship with you, the university, the company, etc.

• About the equipment in the room

• Whether their face and/or actions will be recorded

• How to think aloud (if you are collecting verbal data)

• If you will or will not be available to answer questions

• Their data will be viewed only in aggregate form

• How long the session will take

Tell the User At Least:

• Offer breaks at boundary points

• Offer to send results in aggregate form or allow users to see the improved interface

• Develop understandable instructions

• Do not “defend” your interface

• Do not make subjective comments about users, ease or difficulty of tasks, etc.

Make Users Feel Comfortable

• Analyze data using statistical methods (ANOVAs and Chi-Squared tests common)

• take a stats course, e.g., Stat 320, for more detail

• did you meet the goals? How far from the goals are you?

Analyze Results and Iterate

t-tests and ANOVAs

• t-tests compare two random samples and determine if the samples are statistically significantly different

• e.g., are dynamic menus better than static menus?

• ANOVAs (analysis of variance) compare n random samples and determine if the samples are statistically significantly different

• e.g., which is best: dynamic, static or radial menus?

• Both assume the samples come from normal distributions and both produce p-values.
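The test statistics behind both methods are short to compute by hand. A minimal sketch using only the standard library, with hypothetical menu selection times; it uses Welch's form of the t statistic and a one-way ANOVA, and stops at the statistic itself. Turning t or F into a p-value needs the t or F distribution, which a stats package provides (for example `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`).

```python
from statistics import mean, variance
from math import sqrt

def t_statistic(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def f_statistic(*groups):
    """One-way ANOVA F statistic for any number of independent samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical selection times in seconds (not data from a real study).
static = [2.9, 3.1, 3.4, 2.8, 3.0, 3.3]
dynamic = [2.5, 2.7, 2.4, 2.9, 2.6, 2.8]
radial = [2.2, 2.6, 2.3, 2.1, 2.5, 2.4]

t = t_statistic(static, dynamic)
f = f_statistic(static, dynamic, radial)
print(f"t (static vs. dynamic) = {t:.2f}")
print(f"F (all three menus)    = {f:.2f}")
```

With exactly two groups of equal size, the F statistic equals the square of the t statistic, which is one way to see the ANOVA as a generalization of the t-test.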


• Bell curve

• $y = \exp(-x^2)$

• Occurs from sum of independent events

• e.g. sum of dice rolls

• Total time = t-find + t-home + t-click

• Total # of errors

Normal Distributions

[Figure: bell curve, labeled with the normalizing constant $\frac{1}{\sigma\sqrt{2\pi}}$]
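The dice example is easy to simulate. A minimal sketch, assuming 10 fair dice per trial and a fixed seed (both arbitrary choices), that prints a crude text histogram of the sums:

```python
import random
from collections import Counter

rng = random.Random(2007)   # fixed seed for reproducibility

def roll_sum(n_dice):
    """Sum of n_dice independent fair six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

trials = 20_000
counts = Counter(roll_sum(10) for _ in range(trials))

# Crude text histogram: one '#' per 100 occurrences.
for total in range(10, 61):
    print(f"{total:2d} {'#' * (counts[total] // 100)}")
```

The bars should pile up around 35 (10 dice times an average face of 3.5) and thin out symmetrically on both sides, which is the bell shape. Total task time behaves similarly when it is a sum of independent component times.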

p-values

• probability value

• The probability of seeing a difference at least as large as the one you observed if only random chance were at work (i.e., if there were no real difference)

• An expression of the confidence of your result

• Typically, a difference is called statistically significant when p < 0.05.
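One way to make "due to random chance" concrete is to shuffle the group labels many times and count how often chance alone produces a difference as large as the observed one. A minimal sketch of this permutation test, which is a different technique from the t-test and ANOVA and is used here only for illustration; the data, function name, and defaults are hypothetical:

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)   # pretend the group labels carry no meaning
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            hits += 1
    return hits / n_shuffles

# Hypothetical selection times in seconds (not data from a real study).
static = [2.9, 3.1, 3.4, 2.8, 3.0, 3.3]
dynamic = [2.5, 2.7, 2.4, 2.9, 2.6, 2.8]

p = permutation_p_value(static, dynamic)
print(f"p = {p:.4f}", "(significant)" if p < 0.05 else "(not significant)")
```

A small p means random relabeling rarely reproduces the observed gap, so chance is an unlikely explanation for it.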

Partial eta-squared

• Some ANOVAs produce partial eta-squared values in addition to p-values.

• They are becoming widespread in HCI literature.

• You may see them soon in a usability report.

• Partial eta-squared values offer a measure of practical significance: the size of an effect, not just whether it exists.
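Partial eta-squared is the ratio SS_effect / (SS_effect + SS_error), and it can be recovered from a reported F value and its degrees of freedom. A minimal sketch; the function name and the example numbers are hypothetical:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F statistic and its degrees
    of freedom; equivalent to SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Hypothetical ANOVA result: F(2, 30) = 4.0.
eta = partial_eta_squared(4.0, df_effect=2, df_error=30)
print(f"partial eta-squared = {eta:.3f}")
```

The value always falls between 0 and 1 and reads as the share of variance attributable to the effect once other effects are set aside. A result can be statistically significant with a very small partial eta-squared, especially with many observations.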

• Measure performance (time, error rate)

• Measure user satisfaction

• Give realistic experience of the interface

• realistic system response

• move among tasks seamlessly

• designers not in control, the user is

• Focus will be on the details

• most big issues should already be resolved

Advantages of Empirical User Studies

• Users typically must come to the lab

• makes it more difficult to recruit them

• users may have anxiety

• Large setup effort involved

• software instrumentation, hardware setup, questionnaire design, IRB approval, etc.

• Prototype may crash

Disadvantages of Empirical User Studies

An Example of How This Gets Used in Practice

• “The Impact of Delayed Visual Feedback on Collaborative Performance” by Darren Gergle, presented at CHI 06.

• What is the relationship between delayed visual feedback and collaboration? How much network delay can be tolerated?

• e.g., architectural planning, telesurgery and remote repair

The Collaborative Puzzle Task

• The experimental task was for a helper to guide a worker through a visual puzzle over a network connection

Independent Variables

• Only one: visual delay in the helper’s view window

• Delay sampled from this distribution [60 - 3300ms]:

• $f(n) = T_n = T_{n-1} \cdot e^{0.05}$ with $T_1 = 60$
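The recurrence can be unrolled to list the delay levels. A minimal sketch, assuming the exponent is 0.05 and that the sequence simply stops once it would pass the 3300 ms upper bound; `delay_schedule` is a hypothetical name, not taken from the paper:

```python
from math import exp

def delay_schedule(first_ms=60.0, growth=0.05, max_ms=3300.0):
    """Delays T_1, T_2, ... with T_n = T_{n-1} * e^growth, up to max_ms."""
    delays = []
    t = first_ms
    while t <= max_ms:
        delays.append(t)
        t *= exp(growth)   # each level is about 5.1% longer than the last
    return delays

delays = delay_schedule()
print(len(delays), "levels from",
      round(delays[0]), "ms to", round(delays[-1]), "ms")
```

Under these assumptions the schedule includes levels that round to 939 ms and 1798 ms, the breakpoints quoted in the analysis. The multiplicative spacing puts more levels at short delays, where small differences are more likely to matter.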

Dependent Variables

• Only one: task performance time

• Participants were asked to perform the puzzle task as quickly and accurately as possible.

Quantitative Analysis Using ANOVA

• “For delays between 60ms and 939ms, we found no evidence to indicate any impact of delayed visual feedback on task performance (SE = (2.87), $F_{1,610}$ = .028, p = .87).”

• p > 0.05, so the samples are not significantly different

• “However, for delay rates between 939ms and 1798ms there is a significant impact on task performance ($F_{1,610}$ = 13.57, p < .001).”

• Since p < 0.001, this result is highly significant

Graph of Delay vs. Performance