I213: User Interface Design & Development Marti Hearst March 5, 2007.


i213: User Interface Design & Development

Marti Hearst
March 5, 2007

Example Study

Sample study by White et al. 2002
– Studying query-biased search results summaries

First did an informal assessment to determine responses to the state of the art
– 6 participants
– Compared AltaVista & Google

• Versions from 2001 or maybe 2000
• Google’s summaries were query-biased, AltaVista’s weren’t
• Ranking wasn’t as good then

– Findings:
  • Summaries were ambiguous and too short
  • First thing they saw was the hit count – discouraging
  • Had to scroll to see more than a few results
  • Main conclusion: the document summaries were not descriptive enough

Study Goals

Evaluate a new form of query-biased summaries for web search
Hypothesis:
– The presence of query-biased summaries will improve search effectiveness

Experiment Design

Independent Variables:
– Search interface
  • Levels: Google, Google + summaries, AV, AV + summaries
– Task type
  • Levels: 4 different tasks

Dependent Variables:
– Participant satisfaction
– Task completion success
– Task completion time

Blocking

Number of participants: 24
Within-participants design:
– They each use all 4 interfaces
– They each do 4 tasks

They control for:
– Effects of task
  • (some harder than others)
– Effects of order of exposure to system
  • (seeing one can influence the effects of seeing the next)

They do not control for:
– Order of task

Latin-Square Design

Start with an ordering
Rotate the order, moving one position per line

Latin Square Design

        T1    T2    T3    T4
Row 1:  G     G+    A     A+
Row 2:  G+    A     A+    G
Row 3:  A     A+    G     G+
Row 4:  A+    G     G+    A

6 participants per condition (row); a 24-person within-participants design.

Latin Square Design

Start with an ordering
Rotate the order, moving one position per line
Note that this doesn’t give you every possible ordering!
– (e.g., don’t see AV right after G)
– The hope is the outcome isn’t that sensitive to ordering
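The rotate-by-one construction can be sketched in a few lines of Python (an illustrative helper of our own, not code from the study):

```python
def latin_square(order):
    """Build a Latin square by rotating the starting ordering
    one position to the left on each successive row."""
    n = len(order)
    return [order[i:] + order[:i] for i in range(n)]

# The four interface conditions: Google, Google + summaries,
# AltaVista, AltaVista + summaries
sq = latin_square(["G", "G+", "A", "A+"])
for row in sq:
    print(row)
```

Every row and every column contains each interface exactly once, yet some orderings never occur: with this starting order, A never appears immediately after G, which is exactly the limitation the slide points out.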

Study Procedure

Participants came by, one at a time. Session lasted about 1.5 hours
Procedure:
– Introductory orientation session
– Background questionnaire
– The following 4 steps for each task:
  • Short training session with the new system
  • Receive a hard-copy task description
  • 10-minute search session
  • Post-search questionnaire
– Final questionnaire
– Informal discussion (optional)

Study Procedure

Data collection:
– Questionnaires
  • 5-point Likert scales
  • 3 on task, 4 on search process, 4 on summaries
– Think-aloud
– Automatic logging
  • # docs returned
  • # summaries requested and returned
  • # results pages viewed
  • Time for each session

Questions Asked

Subjective Results

No interaction effects for task
All groups preferred the enhanced summaries
All groups felt they benefited from the summaries
G + enhanced summaries was significantly different from the rest on 3 out of 4 scales (relaxing, interesting, restful)
– Except for easy/difficult, where there was no difference

System ranking in the final questionnaire:
– 23 out of 24 chose AV+ or G+ as first or second
– 19 chose AV+ and G+ as the top two

More Qualitative Results

Participants liked both styles of results summaries
Participants disliked:
– Scrolling
– Moving the mouse to see enhanced summaries
– Hiding the URL in enhanced summaries
– Not seeing query terms in context (AV)

Quantitative Results

Task time:
– Artificial cutoff of 10 minutes assigned even if task not completed
– Participants significantly faster with the enhanced summaries
– There was a slight correlation between system and task completion, but not strong

Experiment Design Example: Marking Menus

Based on Kurtenbach, Sellen, and Buxton, “Some Articulatory and Cognitive Aspects of Marking Menus”, Graphics Interface ’94, http://reality.sgi.com/gordo_tor/papers

Experiment Design Example: Marking Menus

Pie marking menus can reveal:
– the available options
– the relationship between mark and command

1. User presses down with stylus
2. Menu appears
3. User marks the choice; an ink trail follows

Why Marking Menus?

Same movement for selecting a command as for executing it
Supporting markings with pie menus should help the transition between novice and expert
Useful for keyboardless devices
Useful for large screens
Pie menus have been shown to be faster than linear menus in certain situations

What do we want to know?

Are marking menus better than pie menus?
– Do users have to see the menu?
– Does leaving an “ink trail” make a difference?
– Do people improve on these new menus as they practice?

Related questions:
– What, if any, are the effects of different input devices?
– What, if any, are the effects of different size menus?

Experiment Factors

Isolate the following factors (independent variables):
– Menu condition
  • exposed, hidden, hidden w/marks (E, H, M)
– Input device
  • mouse, stylus, trackball (M, S, T)
– Number of items in menu
  • 4, 5, 7, 8, 11, 12 (note: both odd and even)

Response variables (dependent variables):
– Response time
– Number of errors
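Crossing the three factors gives the full set of design cells; a quick sketch (variable names are ours) using Python’s itertools.product:

```python
from itertools import product

menu_conditions = ["E", "H", "M"]   # exposed, hidden, hidden w/marks
devices = ["M", "S", "T"]           # mouse, stylus, trackball
menu_sizes = [4, 5, 7, 8, 11, 12]

# Every menu condition x device x menu size combination
conditions = list(product(menu_conditions, devices, menu_sizes))
print(len(conditions))  # 3 * 3 * 6 = 54 cells
```

Each cell is one combination under which response time and errors are measured; the design question that follows is how to distribute these cells across participants.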

Experiment Hypotheses

Note these are stated in terms of the factors (independent variables)

1. Exposed menus will yield faster response times and lower error rates, but not when menu size is small

2. Response variables will monotonically increase with menu size for exposed menus

3. Response time will be sensitive to number of menu choices for hidden menus (familiar ones will be easier, e.g., 8 and 12)

4. Stylus better than Mouse better than Track ball

Experiment Hypotheses

5. Device performance is independent of menu type

6. Performance on hidden menus (both marking and hidden) will improve steadily across trials. Performance on exposed menus will remain constant.

Experiment Design

Participants:
– 36 right-handed people
  • usually gender distribution is stated
– considerable mouse experience
– (almost) no trackball or stylus experience

Experiment Design

Task:
– Select target “slices” from a series of different pie menus as quickly and accurately as possible
  • (a) exposed (b) hidden
  • Can move mouse to select, as long as button held down
– Menus were simply numbered segments
  • (meaningful items would have longer learning times)
– Participants saw running scores
  • Shown grayed-out feedback about which was selected
  • Lose points for wrong selection

Experiment Design

36 participants
One between-subjects factor:
– Menu View Type
  • Three levels: E, H, or M
  • (Exposed, Hidden, Marking)

Two within-subjects factors:
– Device Type
  • Three levels: M, T, or S
  • (Mouse, Trackball, Stylus)
– Number of Menu Items
  • Six levels: 4, 5, 7, 8, 11, 12

How should we arrange these?

Experiment Design

        E     H     M
        12    12    12

Between-subjects design: 12 participants per menu-view group.

How to arrange the devices?

Experiment Design

        E     H     M
Row 1:  M     T     S
Row 2:  T     S     M
Row 3:  S     M     T
        12    12    12

A Latin Square: no row or column shares labels.

(Note: each of 12 participants does everything in one column)

Experiment Design

        E     H     M
Row 1:  M     T     S
Row 2:  T     S     M
Row 3:  S     M     T

How to arrange the menu sizes?
Block by size, then randomize the blocks.

Experiment Design

        E     H     M
Row 1:  M     T     S
Row 2:  T     S     M
Row 3:  S     M     T

One randomized ordering of the six size blocks: 5, 11, 12, 8, 7, 4

Block by size, then randomize the blocks.

(Note: the order of each set of menu size blocks will differ for each participant in each square)
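The “block by size, then randomize the blocks” scheme can be sketched as follows (a hypothetical helper, not the authors’ code): trials within a block all share one menu size, and only the sequence of blocks is shuffled, independently for each participant.

```python
import random

def size_block_order(menu_sizes, rng):
    """Return the menu-size blocks in a randomized order.
    All trials inside a block use the same menu size; only
    the sequence of blocks varies across participants."""
    blocks = list(menu_sizes)
    rng.shuffle(blocks)
    return blocks

rng = random.Random(0)  # seeded only to make this sketch reproducible
order = size_block_order([4, 5, 7, 8, 11, 12], rng)
print(order)
```

Blocking keeps menu size constant within a run of trials, so size effects are not confounded with fatigue or practice within a block, while randomizing block order spreads those effects across sizes.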

Experiment Design

        E     H     M
Row 1:  M     T     S
Row 2:  T     S     M
Row 3:  S     M     T

Two example randomized size-block orders:
– 5, 11, 12, 8, 7, 4
– 7, 8, 12, 5, 4, 11

40 trials per block

(Note: these blocks will look different for each participant.)

Experiment Overall Results

Group     Mean RT (s.d.)    Mean Errors (s.d.)    Mean % Errors
Exposed   0.98 (.23)        0.64 (1.0)            1.6%
Hidden    1.10 (.31)        3.27 (3.57)           8.2%
Marking   1.10 (.31)        3.76 (3.67)           9.4%
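One way to sanity-check the table: if mean errors are counted out of the 40 trials per block mentioned earlier, the percentage column follows directly from the mean error counts. (This is our reading of the numbers, not something the slides state explicitly.)

```python
# Dividing mean errors by 40 trials reproduces the reported
# % error column (our interpretation of the table).
trials = 40
groups = [("Exposed", 0.64, 1.6), ("Hidden", 3.27, 8.2), ("Marking", 3.76, 9.4)]
for name, mean_errors, reported_pct in groups:
    print(name, round(100 * mean_errors / trials, 1), reported_pct)
```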

So exposing menus is faster … or is it?
Let’s factor things out more.

A Learning Effect

When we graph over the number of trials, we find a difference between exposed and hidden menus. This suggests that participants may eventually become faster using marking menus (was hypothesized). A later study verified this.

Factoring to Expose Interactions

Increasing menu size increases selection time and number of errors (was hypothesized).
No differences across menu groups in terms of response time.
That is, until we factor by menu size AND menu group:
– Then we see that menu size has interaction effects on Hidden groups not seen in the Exposed group
– This was hypothesized (12 easier than 11)

Factoring to Expose Interactions

Stylus and mouse outperformed trackball (hypothesized)
Stylus and mouse the same (not hypothesized)
Initially, effect of input device did not interact with menu type
– this is when comparing globally
– BUT ...

More detailed analysis:
– Compare both by menu type and device type
– Stylus significantly faster with Marking group
– Trackball significantly slower with Exposed group
– Not hypothesized!

Average response time and errors as a function of device, menu size, and menu type.

Potential explanations:

Markings provide feedback for when the stylus is pressed properly.
The ink trail is consistent with the metaphor of using a pen.

Experiment Design

        E     H     M
Row 1:  M     T     S
Row 2:  T     S     M
Row 3:  S     M     T

How can we tell if the order in which the devices appear has an effect on the final outcome?

Some evidence:
– There is no significant difference among devices in the Hidden group.
– Trackball was slowest and most error prone in all three cases.
Still, there may be some hidden interactions, but they are unlikely to be strong given the previous graph.

Statistical Tests

Need to test for statistical significance
– This is a big area
– Assuming a normal distribution:
  • Student’s t-test to compare two groups
  • ANOVA to compare more than two groups
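As a concrete reminder of what the t-test computes, here is a pure-Python sketch of the two-sample Student’s t statistic (pooled, equal-variance form; in practice a statistics package would also report the p-value):

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Student's t statistic for two independent samples,
    using a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

# Made-up task times (minutes) for two interfaces, for illustration only
baseline = [8.1, 7.4, 9.0, 8.6, 7.9]
enhanced = [6.2, 5.9, 7.1, 6.5, 6.8]
t = two_sample_t(baseline, enhanced)
```

Comparing more than two groups at once calls for ANOVA rather than repeated t-tests, since repeated pairwise testing inflates the chance of a spurious “significant” result.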

Summary

Formal studies can reveal detailed information but take extensive time/effort
Human participants entail special requirements
Experiment design involves:
– Factors, levels, participants, tasks, hypotheses
– Important to consider which factors are likely to have real effects on the results, and isolate these
Analysis:
– Often need to involve a statistician to do it right
– Need to determine statistical significance
– Important to make plots and explore the data

Longitudinal Studies

Trace the use of an interface over time
Do people continue to use it, or drop it?
How does people’s use change over time?

Longitudinal Studies

Dumais et al. 2003
– Studied use of desktop search
– Some people had sort by date as default, others had sort by relevance as default
– A number of people switched from relevance to date; few went the other way
Käki 2005
– Studied use of a term-grouping search interface
– People used the groups only for certain types of queries
– People’s queries got shorter, since the interface could disambiguate for them.

Followup Work

Hierarchical Marking Menu study

Followup Work

Results of use of marking menus over an extended period of time:
– two-person extended study
– participants became much faster using gestures without viewing the menus

Followup Work

Results of use of marking menus over an extended period of time:
– participants temporarily returned to “novice” mode when they had been away from the system for a while

Wizard of Oz Studies

(discussed briefly in Nielsen)
Useful for simulating a smart program in order to get participant responses
Examples: test out
– a speech interface
– a question-answering interface

There is a man behind the curtain!

Discuss Jeffries et al.

Compared 4 Evaluation Techniques:
– Heuristic Evaluation
– Software Guidelines
– Cognitive Walkthroughs
– Usability Testing

Findings?