Evaluation

• There are many times throughout the software development lifecycle when a designer needs answers to questions that check whether his or her ideas match those of the user(s). Such evaluation is known as formative evaluation because it (hopefully) helps shape the product. User-centred design places a premium on formative evaluation methods.

• Summative evaluation, in contrast, takes place after the product has been developed.


Context of Formative Evaluation

• Evaluation is concerned with gathering data about the usability of a design or product by a specific group of users for a particular activity within a definite environment or work context.

• Regardless of the type of evaluation, it is important to consider the following (captured as a simple evaluation-context record in the sketch after this list):
– characteristics of the users
– types of activities they will carry out
– environment of the study (controlled laboratory? field study?)
– nature of the artefact or system being evaluated (sketches? prototype? full system?)
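The four considerations above can be thought of as the fields of an evaluation plan. Below is a minimal sketch of recording them as a simple data structure; the class and field names are illustrative assumptions, not terminology from the lecture.

```python
from dataclasses import dataclass

# Hypothetical record of the context an evaluation is planned for.
@dataclass
class EvaluationContext:
    users: str             # characteristics of the user group
    activities: list[str]  # types of activities they will carry out
    environment: str       # e.g. "controlled laboratory" or "field study"
    artefact: str          # e.g. "sketches", "prototype", "full system"

plan = EvaluationContext(
    users="novice call-centre staff",
    activities=["log a customer call", "search the knowledge base"],
    environment="field study",
    artefact="prototype",
)
print(plan)
```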


Reasons for Evaluation

• Understanding the real world
– particularly important during requirements gathering

• Comparing designs
– there are rarely options without alternatives
– valuable throughout the development process

• Engineering towards a target
– often expressed in the form of a metric (see the sketch below)

• Checking conformance to a standard
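When evaluation is used to engineer towards a target, the target is usually stated as a concrete usability metric. A minimal sketch of checking measured results against such targets; the metric names, target values and data are invented for illustration:

```python
# Hypothetical usability targets: at least 90% task success, mean time <= 30 s.
targets = {"task_success_rate": 0.90, "mean_task_time_s": 30.0}

# Invented measurements from an evaluation session.
completed = [True, True, False, True, True, True, True, True]
times_s = [24.1, 31.5, 45.0, 22.8, 27.3, 29.9, 26.4, 33.2]

measured = {
    "task_success_rate": sum(completed) / len(completed),
    "mean_task_time_s": sum(times_s) / len(times_s),
}

for name, target in targets.items():
    value = measured[name]
    # Success rate must meet or exceed its target; task time must not exceed it.
    met = value >= target if name == "task_success_rate" else value <= target
    print(f"{name}: measured {value:.2f}, target {target}, {'met' if met else 'not met'}")
```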


Classification of Evaluation Methods

• Observation and Monitoring
– data collection by note-taking, keyboard logging, video capture

• Experimentation and Benchmarking
– statement of hypothesis, control of variables

• Collecting users' opinions
– surveys, questionnaires, interviews

• Interpreting situated events

• Predicting usability


Observation and Monitoring - Direct Observation Protocol

• Usually informal in a field study, more formal in a controlled laboratory

• data collection by direct observation and note-taking
– users in "natural" surroundings
– "objectivity" may be compromised by the point of view of the observer
– users may behave differently while being watched (Hawthorne effect)
– an ethnographic, participatory approach is an alternative


Observation and Monitoring - Indirect Observation Protocol

• data collection by remote note-taking, keyboard logging, video capture
– users need to be briefed fully; a policy must be decided upon and agreed about what to do if they get "stuck"; tasks must be justified and prioritised (easiest first)
– video capture permits post-event "debriefing" and avoids the Hawthorne effect (however, users may behave differently in an unnatural environment)
– with data-logging, vast amounts of low-level data are collected, which are difficult and expensive to analyse (see the logging sketch below)
– the interaction of variables may be more relevant than any single one (lack of context)
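To see why data-logging produces so much low-level data, consider a minimal instrumentation hook that the evaluated application calls for every user event. This is a sketch under my own assumptions; the function and file names are not from any particular logging tool.

```python
import json
import time

LOG_PATH = "interaction_log.jsonl"  # hypothetical output file

def log_event(event_type: str, detail: str) -> None:
    """Append one timestamped interaction event as a JSON line."""
    record = {"t": time.time(), "event": event_type, "detail": detail}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Even a short session generates many records: every keystroke, click and
# focus change becomes one line, which is why analysis is costly.
for ch in "hello":
    log_event("keypress", ch)
log_event("click", "button:search")
```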


Experimentation and Benchmarking

• "Scientific" and "engineering" approach

• utilises standard scientific investigation techniques (a hypothesis-testing sketch follows below)

• selection of benchmarking criteria is critical… and sometimes difficult (e.g., for OODBMS)

• control of variables, esp. user groups, may lead to "artificial" experimental bases
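For instance, an experiment comparing two designs might state the hypothesis that mean task-completion time differs between them and test it statistically. A minimal sketch, assuming invented timing data and using SciPy's independent-samples t-test:

```python
from scipy import stats

# Hypothetical task-completion times (seconds) for two alternative designs;
# the numbers are illustrative, not experimental results.
design_a = [31.2, 28.4, 35.1, 30.0, 27.8, 33.5, 29.1, 32.2]
design_b = [25.9, 27.3, 24.8, 26.5, 28.0, 23.9, 27.1, 25.4]

# Welch's t-test: does not assume equal variances between the two groups.
t_stat, p_value = stats.ttest_ind(design_a, design_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p-value (conventionally < 0.05) would lead us to reject the null
# hypothesis that the two designs yield the same mean completion time.
```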


Collecting Users' Opinions

• Surveys
– critical mass and breadth of the survey are critical for statistical reliability (a sample-size sketch follows below)
– sampling techniques need to be well grounded in theory and practice
– questions must be consistently formulated, clear, and must not "lead" to specific answers
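One common rule of thumb for the "critical mass" of a survey is the sample size needed to estimate a proportion within a given margin of error. A minimal sketch of that calculation, assuming a 95% confidence level and the worst-case proportion of 0.5 (these figures are my assumptions, not from the lecture):

```python
import math

def sample_size(margin_of_error: float, confidence_z: float = 1.96, p: float = 0.5) -> int:
    """n = z^2 * p * (1 - p) / e^2, the usual formula for estimating a proportion."""
    n = (confidence_z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)

print(sample_size(0.05))  # about 385 respondents for a +/-5% margin of error
print(sample_size(0.10))  # about 97 respondents for a +/-10% margin of error
```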


Collecting Users' Opinions - Verbal Protocol

• (Individual) Interviews
– can be during or after user interaction
  • during: immediate impressions are recorded
  • during: may be distracting during complex tasks
  • after: no distraction from the task at hand
  • after: may lead to misleading results (short-term memory loss, "history rewritten", etc.)
– can be "structured" or not
  • a structured interview is like a personal questionnaire - prepared questions


Collecting Users' Opinions

• Questionnaires
– "open" (free-form reply) or "closed" (answers "yes/no" or chosen from a wider range of possible answers)
  • the latter is better for quantitative analysis (see the tallying sketch below)
– important to use clear, comprehensive and unambiguous terminology, quantified where possible
  • e.g., "daily?", "weekly?", "monthly?" rather than "seldom" and "often", and there should always be a "never"
– needs to allow for "negative" feedback
– all Form Fill-in guidelines apply!
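To illustrate why closed, quantified answer options make quantitative analysis straightforward, here is a minimal sketch that tallies responses to a closed frequency-of-use question; the responses are invented for the example.

```python
from collections import Counter

# Hypothetical answers to the closed question "How often do you use the system?"
# using quantified options (plus "never") rather than vague terms like "often".
responses = [
    "daily", "weekly", "daily", "monthly", "never",
    "weekly", "daily", "weekly", "daily", "monthly",
]

counts = Counter(responses)
total = len(responses)

# Closed options reduce the analysis to a simple frequency table.
for option in ("daily", "weekly", "monthly", "never"):
    n = counts.get(option, 0)
    print(f"{option:>8}: {n:2d}  ({100 * n / total:.0f}%)")
```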


Relationship between Types of Evaluation and Reasons for Evaluation

Reason for evaluation       Observing and   Users'     Experiments   Interpretive   Predictive
                            Monitoring      Opinions   etc.
Understanding Real World         Y              Y           Y             Y              Y
Comparing Designs                Y              Y           Y             Y              Y
Engineering to target            Y              Y           Y             Y              Y
Standards conformance            Y              Y           Y             Y              Y