[PaeLife: Personal Assistant to Enhance the Social Life of the Seniors, Contract nº AAL-2009-2-068]
D1.2 Usability Evaluation Methodology
Contract no AAL-2009-2-068
Start date of project 24/02/2012
Due date of deliverable M13
Completion date of deliverable March 2013
Lead partner for deliverable Université de Technologie de Troyes (UTT)
Type of version xxx v5.0
NATURE OF THE DELIVERABLE
R Report X
P Prototype
D Demonstrator
Project co-funded by the European Commission within the AAL Program, call 2
Dissemination Level
PU Public
PP Restricted to other programme participants (including AALA) X
RE Restricted to a group specified by the consortium (including AALA)
CO Confidential, only for members of the consortium (including AALA)
Document History
Issue Date Version Change Made / Reason for this Issue
19/11/2012 1 General guideline to qualitative usability evaluation
14/12/2012 2 Adaptation of guideline to partners’ research interests
21/02/2013 3 Updated document after discussion and partners’ contribution
03/03/2013 4 Integrated methodology after partners’ input
08/03/2013 5 Integrated methodology document after internal review
Document Main Author Karine Lan (UTT)
Document signed off by Myriam Lewkowicz (UTT)
Table of contents
INTRODUCTION
Scope and structure of the Methodology document
1. IMPROVING USABILITY WHEN DESIGNING FOR THE ELDERLY
What is “usability”?
Usability norm
WP6.1 and WP6.2-6.5: producing insights as part of an iterative design process
2. WP 6.1: PRELIMINARY USABILITY EVALUATION OF PAELIFE TECHNOLOGIES AND PROTOTYPES
Preliminary usability questioning around the prototype
Verbal protocol analysis
User needs elicitation interviews
Focus group
Insights of WP6.1
Precision note concerning roadmap for Deliverable 6.2
3. WP6.2-6.5: FIELD TRIALS AND USABILITY EVALUATION
Phase 1: Usability tests of the first version of the PLA
Usability tests
Focus group
Phase 2: Usability tests of the final version of the PLA
One-month field trials
Focus group
Insights of WP6.2-6.5
CONCLUSIVE REMARKS
BIBLIOGRAPHY
Introduction
Society is getting older. In addition to changes in demographics, there has been an enormous change
in technological capabilities – how products work, look, act, and react to people who use them – and
in the potential of these technologies. The PaeLife project is based on the observation that
technology has an impact on all aspects of our lives, particularly on the way we grow older. The
project’s main goal is to fight isolation and exclusion and to allow the elderly to be more productive
and independent and to have a more social and fulfilling life, by empowering these elderly users with a
Personal (Virtual) Life Assistant (PLA). The PLA would be a virtual presence which supports social
communication, learning and entertainment in an integrated way via an optimized interface.
PaeLife focuses on individuals who are recently retired, are used to some level of technology
usage, and want to keep themselves active, productive and socially engaged. Although today’s
elderly people over 65 may show some resistance to the adoption of technology, tomorrow’s elderly
will have used technology in the last one or two decades of their lives. More and more consumers
and users of technology are therefore becoming part of the category “older adult”. Aging occurs on
several levels: (1) biological, (2) psychological and (3) social. As far as interaction with
technologies is concerned, aging brings changes in perception, cognition and motor control (Fisk et
al., 2009). This demographic change therefore brings important changes in the demand for
products and services that are adapted to older adults, fit into their context of use and meet their
needs. The usability evaluation carried out in the PaeLife project aims to produce feedback and
recommendations as part of an iterative design process (WP6.1 and WP6.2-6.5), to ensure that the
developed PLA meets these requirements.
Motivation
The first aim of this methodology document is to provide general guidelines for a qualitative
approach, supplemented by quantitative performance measures and log file analysis, to evaluate the
usability of the technology and the prototype (WP6.1) and of the developed PLA application
(WP6.2-6.5). The second aim is to plan the timing of the different steps of the usability evaluation as
part of the iterative design process. It presents the protocol that has been planned following the
discussions and contributions of all the partners involved in the evaluation.
It was unanimously agreed, during videoconference discussions and at the consortium meeting, that
each partner involved in the evaluation and field trials will adapt the methodology proposed in WP1.4
to its own context, to the specificities of the field, and to the resources and competencies available
in its team. Based on the partners’ experience and research interests, a protocol has been
elaborated for the evaluation of the prototype and the field trials of the PLA. It proved essential to
distinguish the different phases of the design process, and to adapt the evaluation and testing
methodology to the needs and level of progress of the project.
Apart from the differences in fieldwork techniques, we have selected different participants for
WP6.1 and WP6.2-6.5. The idea is to avoid biasing the experience of the first end users of the PLA,
which could happen if they had previously tested the prototype, whose interaction sequences
unavoidably differ from those of the PLA because of the interface, if not the functions.
Goal of the deliverable
This deliverable is the methodology document presented in WP1.4. It is organized in three
parts. The first part establishes the importance of usability when designing technological
products and services for the elderly. The second part describes the protocol for WP6.1, in which the
prototype will be evaluated in terms of usability and usefulness. The third part describes the protocol
and the schedule of the field trials that the PaeLife project partners plan to carry out in WP6.2-6.5.
1. Improving usability when designing for the elderly
A common stereotype of older adults is that they do not and will not use technology, which is far
from being true. Adults over 65 want to keep up with technology and take advantage of what a
technological world has to offer. Understanding new technologies makes them feel connected to
others and the world in general. When older adults reject technology, it tends to be because they do
not perceive a benefit, not necessarily because the technology is too difficult or time-consuming to
learn. The “not worth it” impression is more likely to be reinforced by an unusable interface¹ (Pak and
McLaughlin, 2011: 4). Two notions are central to the ergonomics of HCI design: usability and
interface. However, developing a usable product involves more than considering the user
interface.
What is “usability”?
A “usable” product is one that is easy to learn, easy and effective to use, and not unpleasant for the
user. Usability is a quality attribute that assesses how easy user interfaces (applications, websites,
home pages, etc.) are to use. Another important quality attribute is utility,
which refers to the design's functionality: Does it do what users need? For Nielsen (1994), usability
and utility are equally important and together determine whether something is useful. The author
gives clear definitions and explanations of these notions.
Utility = whether it provides the features you need.
Usability = how easy & pleasant these features are to use.
Useful = usability + utility.
1 ‘Unusable’ and ‘usable’ are the adjectives corresponding to the concept of usability.
For Nielsen, “It matters little that something is easy if it's not what you want. It's also no good if the
system can hypothetically do what you want, but you can't make it happen because the user
interface is too difficult”. Therefore, concerning products and services designed for the elderly, the
“need of use” – or rather the benefits of use – must be made clear before older adults will voluntarily
adopt technology (Fisk et al., 2009).
To study a design's utility, the same user research methods that improve usability can be used. These
methods will be described in parts 2 and 3 of this report. From an ergonomics perspective, usability
can be measured using different techniques. The insights produced aim at improving users’
well-being (health, security, satisfaction, comfort…) and the global efficiency of systems (Baccino et al.,
2005: 15). For products designed for the elderly, apart from enhancing market
penetration, improved usability will improve quality of life and, with some classes of products, save
lives (Fisk et al., 2009). The word "usability" also refers to methods for improving ease of use
during the design process (Nielsen, 1994). Among the several methods for studying usability, the
most basic and useful is user testing (usability tests), which will be described below
and used mainly in WP6.2-6.5.
Usability norm
Usability has been defined in the ISO 9241² norm as “the extent to which a system, product or
service can be used by specified users to achieve specified goals with effectiveness, efficiency and
satisfaction in a specified context of use” (ISO 9241-11:2011). Effectiveness is the accuracy and
completeness with which users achieve specified goals. It can be measured, for example, by error
rates or by the proportion of goals achieved. Efficiency is the relation between (i) the accuracy and
completeness with which users achieve their goals and (ii) the resources expended in achieving them.
Indicators of efficiency include task completion time and learning time. Satisfaction is the users'
comfort with, and positive attitudes towards, the use of the system. Users' (subjective) satisfaction
can be measured by attitude rating scales.
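Once test sessions have been logged, the three components can be turned into simple summary measures. The following Python sketch is illustrative only: the `TaskAttempt` record, the 1 to 5 rating scale and the use of completed goals per minute as a time-based efficiency measure are our assumptions, not measures prescribed by the ISO norm or fixed in the PaeLife protocol, and the example values are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one task (hypothetical record format)."""
    participant: str
    task: str
    completed: bool      # goal achieved or not
    errors: int          # number of errors observed during the attempt
    duration_s: float    # time on task, in seconds
    rating: int          # post-task attitude rating, assumed 1 (worst) to 5 (best)


def effectiveness(attempts):
    """Effectiveness: share of attempts in which the goal was achieved (0.0 to 1.0)."""
    return sum(a.completed for a in attempts) / len(attempts)


def error_rate(attempts):
    """Effectiveness indicator: mean number of errors per attempt."""
    return mean(a.errors for a in attempts)


def time_based_efficiency(attempts):
    """Efficiency (assumed formula): completed goals per minute, averaged over attempts."""
    return mean((1 if a.completed else 0) / (a.duration_s / 60) for a in attempts)


def satisfaction(attempts):
    """Satisfaction: mean attitude rating over all attempts."""
    return mean(a.rating for a in attempts)


if __name__ == "__main__":
    # Made-up example values for one hypothetical task.
    log = [
        TaskAttempt("P1", "send message", True, 0, 120.0, 5),
        TaskAttempt("P2", "send message", True, 2, 240.0, 3),
        TaskAttempt("P3", "send message", False, 4, 300.0, 2),
    ]
    print(f"effectiveness: {effectiveness(log):.2f}")
    print(f"errors per attempt: {error_rate(log):.2f}")
    print(f"goals per minute: {time_based_efficiency(log):.3f}")
    print(f"satisfaction: {satisfaction(log):.2f}")
```

Keeping one record per attempt means the same log can later be broken down per task or per participant, which fits the small samples planned in WP6.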
However, as the ISO 9241 norm specifies, the usability of an object is not a fixed property.
Indeed, an object can only be described as usable in relation to a precise type of user (the main
target it addresses) and to a specific context (Baccino et al., 2005). In some cases, increasing
usability for some users may reduce it for others. For example, making an interface usable for people
with visual impairments could involve forcing unnecessary audio on all users³ (Pak and McLaughlin,
2011). Measuring usability when designing for the elderly therefore requires taking two parameters
into account:
• the context of use;
• the type of user.
2 This norm is the reference for contributing to standardization in WP7.1
3 Although the audio may be unnecessary, it does not necessarily make the user interface less usable.
The usability evaluation planned for the PaeLife project meets these requirements of adapting the
design. First, the methodology addresses the PaeLife objectives, in which users’ limitations and needs
are studied in their context (Human, Environment and Application). Second, while contributing
to the state of the art in Speech Recognition (SR) systems, the usability evaluation and testing of the
PLA will take into account state-of-the-art guidelines for designing for older adults⁴, that is,
guidelines that consider normal age-related declines in abilities. The qualitative and observational
analysis described in this document, mainly based on ethnography and video recordings, which will be
carried out in WP6.1 and WP6.2-6.5, is expected to enable the identification of:
• information needs;
• visual and auditory requirements;
• demands for focused attention and for retaining information in memory;
• the time necessary to react to signals;
• physical requirements.
Identifying these specific needs and requirements, together with general usability “problems”, should
ideally provide the basis for improving the usability of the interface. Interviews will supplement these
insights in order to improve the usefulness of the functions and services of the PLA.
WP6.1 and WP6.2-6.5: producing insights as part of an iterative design process
WP6.1 is the preliminary usability evaluation of PaeLife technologies and prototypes. At this stage,
based on the discussions among the project partners concerning the services that will be provided by
the developed PLA, the current LHC⁵ prototype and the developed application will differ significantly.
These differences concern the interface, the devices and the functions. Since evaluating a prototype
and evaluating an implemented device serve different objectives, we have selected the techniques that
are the most coherent at this stage, depending on the number of prototypes available and the time
remaining before the implemented PLA (WP5) is ready for the field trials in WP6.2-6.5. Provided that
the translated text of the graphical interface has been implemented in French, Hungarian and Polish,
the preliminary usability evaluation will start in April 2013. The first insights to inform the
development in WP5 for the second iteration should be available by the end of May 2013. Testing the
LHC prototype will provide preliminary user feedback about the quality and usability of the speech and
gesture modalities, the proposed mixed HCI interface and the PLA services. This feedback will be used
in WP2, WP3 and WP4 to adjust the proposed techniques. In this methodology we focus on the
preliminary usability evaluation of the prototype, which relates to D6.4; the exact methodologies for
voice talent selection and technology testing (D6.1-D6.3) must be adapted during the execution of
WP6 in the light of the present document.
4 We draw on two books published in the “Human Factors & Aging Series” in 2009 and 2011:
- Fisk, A.D., et al. (2009), Designing for Older Adults: Principles and Creative Human Factors Approaches (second edition), London and New York, CRC Press.
- Pak, R. and McLaughlin, A. (2011), Designing Displays for Older Adults, Boca Raton, London and New York, CRC Press.
5 LHC is Living Home Center, an application developed by Microsoft prior to the PaeLife project, which will
evolve and be improved as part of the PaeLife project. The application developed as part of the PaeLife project
will be referred to in this document as the PLA.
WP6.2-6.5 will focus on the field trials and usability evaluation of the pilot application. The consortium
has agreed on the following schedule for the availability of the application produced by WP5:
• September 2013: availability of the first version of the application;
• October 2013: availability of the final application, after iterative improvement informed by WP6.
The evaluation tasks in WP6.2-6.5 will start as soon as each version becomes available. The first stage
will follow the availability of the first version of the application (September 2013) and will consist of
usability tests and a focus group. The second stage is the evaluation of the final application, which
will be available in October 2013, and will consist of a one-month field trial at the elderly
participants’ homes and a focus group.
Figure 1: Timing and organization of the usability evaluation in WP6
The methods for each stage have been chosen for their coherence with the level of
progress in the iterative design process. Together with the preliminary usability questioning
carried out in WP6.1, the two stages of WP6.2-6.5 bring the total number of tests to three, which is
considered a good number of iterations (Nielsen, 1993). The planning and protocol of the different
tasks of WP6 are described in parts 2 and 3 of this deliverable.
2. WP 6.1: Preliminary usability evaluation of PaeLife technologies and prototypes
An important question, both methodological and organisational, has been raised and discussed by
the project partners: how will the insights from WP6.1 inform the development and
implementation of the technologies (WP2-4) and the application (WP5), so that there is a
progression between WP6.1 and WP6.2-6.5? It proved essential to create clear links
between the evaluation WPs (WP6.1 and WP6.2-6.5) and the implementation WPs (WP2-5) in terms of
planning and coordination, so that the insights of the former could efficiently inform the latter.
Through this coordination and collaboration between WPs, it will be possible to make the most of
(i) the prototype as an artefact for reflective practice and (ii) user participation in the iterative
design approach. To achieve this, the partners doing the evaluation and the partners doing the
development have defined a working method and will jointly define how the insights are formalized,
so as to transfer the knowledge gained from the first prototype tests and usefully inform the
development in WP2-5.
Preliminary usability questioning around the prototype
Adopting the view that a technology’s adoption and success rely on its usefulness and not just on
the usability of its interface, WP6.1 will consist of what we call “preliminary usability
questioning” and qualitative interviews. It will involve three to five users, depending on the time
available. In other words, though the form may resemble simple usability tests, the focus will not
be on testing the usability of the prototype’s interface specifically. The objective is to use the
prototype to question usability issues in a general way, and as a basis for contextual interviews to
elicit user needs.
Verbal protocol analysis
The preliminary usability questioning will, like usability tests, involve users experiencing certain
aspects of the prototype and solving tasks with it. It will be organized around a technique called
think aloud testing or verbal protocol analysis. The slight difference between the two terms is
that the first is used in an experimental setting, where the interface of a developed product or
prototype is being tested, and the second in a more “natural” activity situation. However, the
principle is the same: participants perform a task or set of tasks and verbalize their thoughts
(“talk aloud”) while doing so. The technique is based on users' spoken comments, in which they
verbalise how they use the system, explaining what they are trying to do and the problems they
experience.
The basic assumption of verbal protocol analysis is that when people talk aloud while performing a
task, the verbal stream can be taken as a reflection of the cognitive processes in use. Concerning
usability and usefulness issues, verbal protocols can reveal information about misconceptions and
conceptual change, strategy acquisition, use, and mastery, task performance, affective response, and
the like. Verbal protocols are useful at any stage of the development lifecycle. During evaluation,
they make it possible to:
1. determine how effective the device (in this case, the prototype) is as an integrated social
communication tool;
2. assess the user’s learning and performance. Indeed, verbal protocols can provide valuable
information about the user’s cognitive processes, beyond simple measures of accuracy and
time on task.
In WP6.1, however, we will approach the protocols only in a general way, without implementing a
formal coding scheme that maps to the cognitive processes of interest.
The type of verbal protocol planned for WP6.1 is the concurrent verbal protocol, that is, one
delivered while the participant performs the task, ideally unprompted by the experimenter. It
should be distinguished from retrospective verbal protocols, which will be used mainly at a later
stage, in WP6.2-6.5 (cf. part 3 of this document). However, depending on the difficulty of the task
(a difficulty linked mainly to the users’ competencies and abilities to use the LHC prototype, which
will only become apparent in situ), we might prefer to use retrospective verbal protocols. Indeed,
thinking aloud while performing the task can impose an additional load on the user, which can alter
the way he/she performs the task.
When a retrospective protocol is used, however, it relies on the user's memory of the task. To avoid
the resulting bias, we will prefer the self-confrontation interview, in which users are asked to
comment on their actions while watching the video recording of their system use. Thus, for
retrospective protocols, video recording will be preferred, whereas for concurrent protocols, the
verbal protocols may be collected by direct note taking, by simple audio recording or by video
recording. Whenever video recording is used, the participants will be asked to sign a declaration of
consent.
Video recording has a major advantage: the phenomena are preserved, including the user’s non-verbal
behaviour and visual cues about the different states of the interface. This allows the researcher to
relate the user’s verbalisations to the actions that he/she has performed much more easily. For
internet applications, it is also possible to save the browsing logs. Data collection is quick, because
very few special arrangements are needed on site, and data analysis will usually be conducted off
site. The analysis itself takes much longer: one disadvantage of this method is that analysing audio
and video recordings afterwards is time-consuming.
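Relating verbalisations to actions amounts to aligning two time-stamped streams. As a minimal sketch, assuming that transcribed utterances and logged interface events both carry a time offset in seconds from the start of the recording, each utterance can be attached to the most recent event at or before it. The `Event` and `Utterance` types, the event labels and this "latest preceding event" rule are hypothetical simplifications; in practice the analyst checks each pairing against the video.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Event:
    t: float        # seconds from the start of the recording
    action: str     # hypothetical interface event label, e.g. "open Messages"


@dataclass
class Utterance:
    t: float        # seconds from the start of the recording
    text: str       # transcribed think-aloud segment


def align(utterances, events):
    """Pair each utterance with the latest event at or before its timestamp.

    `events` must be sorted by time. Utterances spoken before the first
    event are paired with None.
    """
    times = [e.t for e in events]
    pairs = []
    for u in utterances:
        i = bisect_right(times, u.t)  # number of events with t <= u.t
        pairs.append((u, events[i - 1] if i > 0 else None))
    return pairs


if __name__ == "__main__":
    events = [Event(3.0, "open Messages"), Event(12.5, "tap: New message"), Event(40.0, "tap: Send")]
    talk = [
        Utterance(1.0, "Where do I start?"),
        Utterance(15.0, "I'm looking for the contact list."),
        Utterance(41.0, "Did it go?"),
    ]
    for u, e in align(talk, events):
        print(f"{u.t:>5.1f}s  {u.text!r:40} <- {e.action if e else '(no action yet)'}")
```

Using `bisect_right` makes the "at or before" rule explicit: an utterance with exactly the same timestamp as an event is paired with that event.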
This type of think aloud testing will make it possible to identify usability problems with the
prototype’s interface. The main focus, however, is not the usability of the interface but usefulness in
a broader sense. The task analysis, especially of authentic user tasks, thus aims at (i) gaining
an understanding of the users' perception of the usefulness of the functionalities, and (ii) identifying
user needs. Such tasks make it possible to observe the user's actions in a given situation and
context. This is why, even though the user's activity is to some extent “artificially provoked”,
the tests and interviews will take place in the users' own environment. For both practical and
methodological⁶ reasons, the usability tests and interviews will be conducted at the users' homes or
at the elderly association to which they belong, i.e., a familiar place where they carry out ordinary
day-to-day activities. Care must nevertheless be taken to avoid interruptions (phone calls, people
coming into the room, etc.). The think aloud testing will last a maximum of one hour. These think
aloud sessions around the prototype will lead into the interviews, which will take place during the
same morning or afternoon session, after a short break. The objective of these qualitative interviews
is to elicit and understand user needs and the users’ context of use of the technologies they already
use at home as part of their daily life.
6 The practical reason is that none of the universities in the Consortium is equipped with a usability lab.
The methodological reason is that we do not favour experimental situations completely disconnected
from ordinary practice (though we are aware that task assignments are somewhat artificial), since our
research interest lies not only in usability issues but in “usage” in a broader sense.
User needs elicitation interviews
The individual in-depth interviews will be semi-structured, lasting from one to two hours, and should
be recorded and partially transcribed⁷. These interviews will focus on the current uses of ICTs, the
users' technology histories (the different technologies they have used so far), how they learnt to
use these technologies, and how their use of technology is embedded in their way of life, particularly
the way in which they relate to their friends and family. The interviews will also include a session of
practical observation of the real use of the ICTs that the users own and currently use in their
day-to-day home environment. For example, if users describe the way in which they use their
smartphone, they will be asked to demonstrate it using verbal protocol analysis.
Interviews and observations are both qualitative methods that will provide precise descriptions of
current practices, and thus a full picture of the context in which users could use the PLA developed
in the project. This understanding of the context of use of ICTs will be extremely useful for the
analysis of users’ needs, which will be supplemented by (i) the validation of the functionalities
(following the first focus group) and (ii) information from previously published studies on the use
of ICTs by the elderly.
The objective of these interviews is to gain a broad understanding of the elderly users’ way of life,
so that the utility of the prototype and its functionalities can be examined as part of a research
interest in social usage. These interviews will make it possible to grasp the context of use and thus
supplement the insights about user requirements provided by WP1 (cf. D1.1), addressing a gap
pointed out by WP1: “This discrepancy between the observation results and questionnaire data
may suggest that the users rated how satisfied they were with themselves when completing the task,
rather than how satisfying the task itself was. There is not enough data, however, to verify the
suspicion, hence it would need further investigation.” The argument is based on the presumption
that, although completing a particular task may be fun, users may not be very pleased with their own
results when completing it, which affects the final rating. This suggests asking two different
questions during the future interviews:
1. How satisfying was the task?
2. How pleased are you with completing the task?
7 The transcriptions constitute a first level of analysis of what has been said during the interview. As
documents, they can easily be shared between members of the research team, and verbatim
quotations are an efficient means of integrating the user’s perspective into research reports.
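Recording the answers to the two questions separately makes the suspected discrepancy directly visible. The sketch below assumes, hypothetically, that both questions are answered on the same 1 to 5 scale, and flags participants whose rating of the task exceeds their rating of their own performance by at least a chosen threshold; the field names and the default threshold of 2 points are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskRatings:
    participant: str
    task_satisfaction: int   # Q1 "How satisfying was the task?" (assumed 1-5 scale)
    self_satisfaction: int   # Q2 pleased with completing the task (assumed 1-5 scale)


def gap(r: TaskRatings) -> int:
    """Positive when the task is rated higher than the participant's own completion."""
    return r.task_satisfaction - r.self_satisfaction


def flag_discrepancies(ratings, threshold=2):
    """Participants whose task rating exceeds their self rating by at least `threshold`."""
    return [r.participant for r in ratings if gap(r) >= threshold]


if __name__ == "__main__":
    answers = [
        TaskRatings("P1", 5, 5),
        TaskRatings("P2", 4, 2),
        TaskRatings("P3", 3, 4),
    ]
    print(flag_discrepancies(answers))  # -> ['P2']
```

Flagged participants would then be natural candidates for follow-up questions in the interview, rather than being treated as a finding in themselves.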
The interviews will also iteratively validate the consortium's choice of devices, which was based on
the insights of WP1 concerning users’ preferences. The qualitative approach proposed in WP6 will focus
on the details and the context of use, and investigate, in real time, the degree of satisfaction when
using the prototype. These individual in-depth interviews will be supplemented by focus groups.
Focus group
Focus groups are planned at different moments of the project to explore different research and
design questions, which will be clearly identified once the different versions of the developed PLA
are available. Three focus groups are planned in total, but this number may vary according to the
research questions that emerge reflexively from the preliminary usability questioning (WP6.1), or to
the time remaining between the availability of the developed PLA for phases 1 and 2 of WP6.2-6.5
and the end date of the project. For WP6.1, there will be one focus group.
As the name suggests, a focus group is a group activity whose objective is to focus on specific
aspects of a study matter. The technique consists of bringing together small groups of users from the
product's target population, who are made to interact around a relevant subject. This technique can
be used both in the design phase and in the evaluation phase. In the design phase, the objective of
the focus group is to collect information about user characteristics and needs. In the evaluation
phase, participants can be brought to talk about the prototype and give feedback, whether positive
or negative. Unlike other techniques such as usability tests, focus groups do not provide objective
information on efficiency. They collect the users’ subjective impressions of the device's ease of use,
learnability, etc.
Regardless of the schedule of the focus groups, the aim is to have 10 participants in each focus
group, 2 or 3 of whom will systematically be end users. When taking part in the focus group, these end
users will already have done the preliminary usability questioning (WP6.1), the usability tests
(WP6.2-6.5 phase 1) or the field trials (WP6.2-6.5 phase 2). From this point of view, they will have a
different status and role to play compared to the 7 or 8 other participants, since it is the evaluation
of their own experience that will have made it possible (i) to identify a range of existing usages, and
to refine (ii) the list of usability problems with the LHC prototype (identified in WP6.1) and (iii) the
list of potential functionalities of the system. These users will effectively become mediators between
the researchers and the other users, and it is notably this characteristic, or ability, that has been
taken into account when selecting them: these users will already have been involved in the
questionnaires and the workshops (WP1) and will be involved in the field trials.
The topic of the focus group for WP6.1 will be the usability of different devices, focusing on the LHC prototype, and the perceived usefulness of these existing technologies in participants’ daily lives. The idea is that users first discuss the devices in general (devices they will have tested during the WP1 workshop and will be able to handle), then try the LHC and discuss it. The objective is
to test and confirm the insights collected during the preliminary usability questioning based on the LHC prototype and during the individual interviews.
Insights of WP6.1
WP6.1 is expected to produce useful insights concerning:
the usefulness of the functions and modalities present in the LHC prototype;
usability issues and guidelines for the design of interfaces for older adults;
the communication habits and way of life of older people, and how the PLA may support them;
the prioritization of features in a prototype.
These insights will be formalized as very precise requirements (see footnote 8) so that they can readily be used for development in WP5.
Note concerning the roadmap for Deliverable D6.2
Deliverable D6.2 “Preliminary Usability Evaluation of Multimodal HCI” cannot be produced at this stage of the project. Our aim is to carry out this evaluation with multimodal interaction built on technologies developed within the PaeLife project, not on existing ones. As some of the modalities (e.g. touch) have been delayed, and a first, very simple proof-of-concept prototype was created only recently, this evaluation will be carried out in the near future.
3. WP6.2-6.5: Field trials and usability evaluation
WP6.2 aims to plan the end-user field trials in the consortium countries, so that the usability evaluation can be carried out in each of them: WP6.3 – Hungary, WP6.4 – France, WP6.5 – Poland.
8 Partners have agreed to discuss collaboratively the best way to formalize these insights once they have been produced.
Phase 1: Usability tests of the first version of the PLA
Usability tests
Usability testing is a technique used to evaluate a product by testing it with representative users. The goal is to identify usability problems, collect quantitative data on participants’ performance (e.g. time on task, error rates), and determine participants’ satisfaction with the product. Simple usability tests in which users think aloud are cheap, robust, flexible and easy to learn; they are qualitative in nature. Thinking aloud may be the single most valuable usability engineering method (Nielsen, 1993). During the test, users try to complete typical tasks. While doing so, they identify and describe the usability problems they encounter with the system being tested; that is, they comment on their actions and on what they see on the screen, in real time. A researcher-observer is present, prompts the user to keep talking, watches, listens and takes notes. Some PaeLife partners will video-record the activities, the screen content and the users’ comments. These data will be used in later analysis to identify usability problems. The earlier these problems are found and fixed, the less expensive they are. The identified problems will be compiled in a list, ordered by degree of severity, including a detailed description of each usability problem.
Studies on usability tests (Lindgaard and Chattratichart, 2007) show no significant correlation between the number of users and the number of severe problems identified. Nielsen recommends testing with no more than 5 users and summarizes his position as follows: “Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford.” We adopt this position, as it allows a more in-depth qualitative analysis of each test. Indeed, data collected from a single test user already yield a lot of insight: almost a third of all there is to know about the usability of the design. With a second user, there is some overlap with what was learned from the first, but since people are different, the second user adds some new insight. The third user will do many things that were already observed with the first or the second user, some of them already observed twice, but will still generate a small amount of new data, although less than the first and second users did. After the fifth user, one wastes time observing the same findings repeatedly without learning much that is new (Nielsen, 2000). Therefore, for both methodological and practical reasons, the usability tests will involve 3 or 4 participants (see footnote 9) when the research interest lies in understanding the usability problems in detail, taking into account the context in which they occur. However, when the research interest is focused on measuring the number of errors and the task completion rate, by gathering metrics for statistical analysis, a larger number of users may be needed. France will adopt the first approach, and Poland and Hungary the second.
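The diminishing returns described above can be made explicit with the model used by Nielsen (2000): with n test users, the expected share of problems found is 1 - (1 - L)^n, where L is the share of problems a single user reveals. Nielsen reports L of about 31% as a typical value; it is an assumption here, since L for PaeLife's older users is not known. The sketch simply evaluates the formula and the marginal gain of each additional user.

```python
def proportion_found(n_users: int, l: float = 0.31) -> float:
    """Expected share of usability problems found with n_users test users.

    Computes 1 - (1 - l) ** n_users, where l is the share of problems a
    single user reveals (0.31 is the average reported by Nielsen, 2000;
    it is an assumption for the PaeLife population).
    """
    if not 0.0 <= l <= 1.0:
        raise ValueError("l must be between 0 and 1")
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1.0 - (1.0 - l) ** n_users


if __name__ == "__main__":
    previous = 0.0
    for n in range(1, 8):
        found = proportion_found(n)
        # Marginal gain: what the n-th user adds on top of the previous ones.
        print(f"{n} users: {found:.0%} found (+{found - previous:.0%})")
        previous = found
```

With L = 0.31, five users are expected to reveal about 84% of the problems; the fourth user adds about 10 percentage points and the fifth about 7.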
Even though there is no correlation between the number of users and the number of severe problems identified, there is a significant correlation between the number of tasks and the number of problems identified: higher task coverage leads to more usability problems being identified. Therefore, knowledge of the usage domain will be used to identify as many tasks as possible, with a clear goal analysis so as to minimize variation among the usability problems identified. The tasks will be defined by the research team after installing the first version of the application on the tablet and going through and testing the different functions and modalities, including, if possible, speech recognition.
9 The total number of participants in WP6 is 10 in each country (the same participants who took part in the WP1 workshops). The final evaluation of the application in WP6.2-6.5, based on field trials, will involve 6 participants (see below).
The usability tests are based on:
1. predefined task assignments
2. users' own authentic tasks.
It is useful to combine both types of tasks, because each yields different insights. On the one hand, predefined tasks allow the research teams to address specific usability problems and thus to collect useful information about the interface and about users’ difficulties or preferred ways of interacting with the device. These exploratory tests keep users within specific areas of interest that have been clearly identified in the interface. On the other hand, authentic use of the system, without predefined tasks, reveals a more varied set of problems, since other parts of the system are explored, and the findings are not restricted to usability issues.
The tasks, whether predefined or authentic, are performed after the different functions have been explained to users. The instructions given to users about the usability issues they have to report will be both deductive (conceptually explaining what a usability problem is before the activity) and inductive (the user discovers issues by himself/herself after being given examples). This combination of deductive and inductive instructions should allow users to identify more problems than deductive instructions alone, as it caters for differences in users’ preferences.
Apart from allowing the identification of usability problems, task assignments – especially the
authentic user tasks – aim to:
measure the efficiency and performance through task times;
gain an understanding of the users' perception of the utility/usefulness of the functionalities;
identify user needs.
Our conception of usability tests is therefore more qualitative than quantitative: the aim is not just to see whether the user has succeeded in a given task and to measure how much time it took. Usability tests allow the user’s actions to be observed in a given situation and context, even though this situation is somewhat “artificially provoked”. The video recordings will be used to analyse (i) the metric data concerning the tasks and, perhaps more importantly from our research perspective, (ii) the situated action of the user, drawing on a qualitative approach.
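As an illustration of how the metric data coded from the video recordings might be aggregated, the sketch below computes, per task, the completion rate, the median time on task and the mean number of errors. The record fields and task names are hypothetical, and reporting time on task for successful attempts only is a common convention rather than a choice made in this methodology.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TaskAttempt:
    participant: str  # pseudonymous participant code (hypothetical)
    task: str         # task name, e.g. "send e-mail" (hypothetical)
    completed: bool   # did the participant achieve the task?
    seconds: float    # time on task, coded from the video recording
    errors: int       # number of errors observed during the attempt


def task_metrics(attempts):
    """Aggregate completion rate, time on task and errors per task."""
    by_task = {}
    for a in attempts:
        by_task.setdefault(a.task, []).append(a)
    summary = {}
    for task, rows in by_task.items():
        done = [r for r in rows if r.completed]
        summary[task] = {
            "attempts": len(rows),
            "completion_rate": len(done) / len(rows),
            # Time on task is reported here for successful attempts only.
            "median_seconds": median(r.seconds for r in done) if done else None,
            "mean_errors": mean(r.errors for r in rows),
        }
    return summary


if __name__ == "__main__":
    attempts = [  # hypothetical coded observations
        TaskAttempt("U1", "send e-mail", True, 120.0, 1),
        TaskAttempt("U2", "send e-mail", False, 300.0, 4),
        TaskAttempt("U3", "send e-mail", True, 180.0, 0),
    ]
    for task, metrics in task_metrics(attempts).items():
        print(task, metrics)
```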
Focus group
A focus group will be organized after the usability tests, involving the 3-4 users per country who took part in the usability tests, out of a total of 10 participants. The aim will be to discuss the usability problems identified (why they are ‘problems’) and to generate ideas for fixing them. It will be organized in two complementary ways:
Group interviewing – every participant is invited to speak, with a moderator facilitating
Small group activities – e.g. testing, or at least trying out, the application, either independently or in the presence of a moderator, before discussing it in the group interview
This subjective evaluation of participants’ perception will supplement the metric and qualitative data
collected during the usability tests.
Phase 2: Usability tests of the final version of the PLA
One-month field trials
The field trials will be based mainly on observational methods. These methods involve an investigator observing users as they work in the field and taking notes on, or video-recording, the activity that takes place. Observation may be either direct, where the investigator is actually present during the task (as in ethnography), or indirect, where the task is viewed a posteriori by other means, such as a video recording. The method is useful early in user requirements specification for obtaining qualitative data, and also for studying tasks and processes as they are currently carried out. As explained above, observational methods in field studies are qualitative in nature. However, not all partners share the same scientific interests and background regarding qualitative approaches. After fruitful discussions, we have therefore agreed that France (UTT, which leads these tasks) will adopt observational-qualitative methods, based mainly on ethnography and video analysis, and that Hungary and Poland will adopt a more quantitative approach, using log file analysis. Despite these differences in methods of analysis, the timing and protocol of the field trials will be the same for all partners involved in this WP, so that the results can be cross-referenced more efficiently. We are convinced of the value of combining qualitative and quantitative analyses of the field trials to obtain global insights.
Based on (i) the resources available, both human (1 field researcher in each country) and material (2 video recording devices for France and, most probably for cost reasons, 2 devices for each country), and (ii) the amount of log data that we think will be necessary, we are planning to run the field trials with two end users in parallel. The field trial with each end user will last one month, as specified in the proposal. Assuming that the final application is actually available in October 2013, three months will be available, i.e. three successive rounds of two users, making a total of 6 users participating in the field trials.
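The figure of 6 users follows from running two trials in parallel over three successive one-month rounds. The sketch below makes this arithmetic explicit for one country, assuming the resources above are per country; the participant codes, the country prefix and the choice to start every round on the first day of the month are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrialSlot:
    participant: str  # hypothetical participant code, e.g. "FR-01"
    start: date       # first day of the one-month trial


def plan_field_trials(first_month: date, months_available: int,
                      parallel_users: int, country: str):
    """Assign participants to consecutive one-month rounds.

    Each round runs `parallel_users` trials at the same time, a limit set by
    the field researchers and recording devices available.
    """
    slots = []
    for round_index in range(months_available):
        # Advance by whole months from the first month (day 1 of each month).
        month0 = first_month.month - 1 + round_index
        start = date(first_month.year + month0 // 12, month0 % 12 + 1, 1)
        for k in range(parallel_users):
            code = f"{country}-{round_index * parallel_users + k + 1:02d}"
            slots.append(TrialSlot(code, start))
    return slots


if __name__ == "__main__":
    # Assumptions taken from the text: PLA available in October 2013,
    # three months available, two users in parallel.
    plan = plan_field_trials(date(2013, 10, 1), 3, 2, "FR")
    for slot in plan:
        print(slot.participant, slot.start.isoformat())
    print("total participants:", len(plan))  # 3 rounds x 2 users = 6
```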
During this month, the users will keep a media diary. The idea initially considered was a written or audio media diary for self-reporting, in which users would record
their comments, suggestions and ideas. Instead, we have chosen a “videoconference media diary”, in which, rather than writing, the user discusses his/her use of the PLA and any usability problems with the researcher during online sessions held precisely with the PLA. A videoconference media diary has a double advantage. First, data collection is not compromised if users are unwilling to write or forget to keep the diary. Second, since the research object is precisely ICT-mediated social interaction, users will actually use this technology to communicate, in a motivated but not “artificial” way. Since this type of diary keeping may be intrusive, the frequency with which the researcher contacts the user needs to be agreed with the users themselves. However, if some users are categorically against regular videoconference sessions with the researcher, the written media diary will be used instead.
Before each one-month field trial begins, we plan to organize a meeting with the two test participants. The objective is to inform them about the objectives of the field trials and the overall scenario. This is also an opportunity to show them the PLA, to collect opinions in a less formal setting than a focus group and, if needed, to train them in using the PLA. The users will also be given a presentation of the functions of the PLA and of the tasks they will be asked to perform during the field trials. These scenarios will be defined once the final application is ready; for example, the prescribed tasks might be to write x emails and to schedule a meeting and enter it in the calendar in week 1, to organize a Skype teleconference in week 2, and so on.
This one-month field trial will be organized around weekly milestones.
Visit 1: the PLA is installed at the end user’s home. Comments of appreciation or difficulties are recorded (either by note-taking or video recording), as in a usability test, but less formally and with no predefined tasks. The researcher can assist the user, depending on his/her personality, level of competence with technologies, etc., to build the user’s confidence in using the PLA.
Week 1: the end user is left with the device and can become familiar with it, using it at his/her own pace.
Visit 2: at the beginning of the second week, the researchers go to the user’s home. A short interview is conducted to collect comments on the user’s experience during the first week (to supplement the impressions, difficulties, satisfaction, etc., when a written media diary is kept instead of a Skype media diary). In France, a simple-to-use video recording device is installed in the user’s home. The researcher explains to the user how to switch on the recording and why the activity is being recorded. In Poland and Hungary, the users are informed that log data will be saved, even though data acquisition is transparent to them and the data are anonymised.
Week 2: as in week 1, the user is left with the device and can use it freely in his/her usual home environment and context. He/she continues to keep the media diary, either in writing or through regular Skype sessions with the researcher. The difference from week 1 (for French users) is that whenever the user uses the PLA, he/she video-records his/her activity, switching off the recording when he/she has finished using the PLA. An important methodological point here is that there should be no “mise-en-scène” (staging): the user is not supposed to comment on what he/she is doing in the video, or to perform special tasks for the recording. He/she simply turns the recording on and acts “normally”, i.e. in a usual and natural way. The video recordings will then be analysed by the French social scientists and used for self-confrontation interviews.
The log files tracking the end users’ activities will be collected automatically by the system and will serve as the basis for analysing the tasks carried out by the Hungarian and Polish users. We will also collect information from internet service providers about data transmission and the dates on which the prototype was used. On that basis, we will build a usage profile for each end user and present statistics for all users. Log file analysis (i.e. the analysis of the files that track the user’s activity through the interface) is used to study behaviour, to identify the strategies most often used to browse, and to identify the mistakes users most frequently make. Log files are a list of the tasks the server actually completed, corresponding to the server requests and responses.
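To make the idea of a per-user usage profile concrete, the sketch below reads a log written in a hypothetical semicolon-separated format (timestamp; pseudonymous user ID; action; status) and derives, for each user, the number of days of use, the most used functions and the actions that produced errors. The format and action names are assumptions; the actual log format is the one agreed between WP5 and WP6.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log format (not the agreed PLA format):
# timestamp;user_id;action;status
SAMPLE_LOG = """\
2013-11-04T10:15:02;U03;open_email;OK
2013-11-04T10:17:40;U03;send_email;ERROR
2013-11-04T10:18:05;U03;send_email;OK
2013-11-06T16:02:11;U03;open_calendar;OK
2013-11-06T16:05:30;U07;start_videocall;OK
"""


def usage_profiles(log_text):
    """Build a simple usage profile per (pseudonymous) user from the log."""
    profiles = defaultdict(lambda: {"days": set(), "actions": Counter(),
                                    "errors": Counter()})
    reader = csv.reader(io.StringIO(log_text), delimiter=";")
    for timestamp, user, action, status in reader:
        when = datetime.fromisoformat(timestamp)
        profile = profiles[user]
        profile["days"].add(when.date())
        profile["actions"][action] += 1
        if status != "OK":
            # Repeated errors on the same action may point to a usability problem.
            profile["errors"][action] += 1
    return {
        user: {
            "days_of_use": len(p["days"]),
            "most_used": p["actions"].most_common(3),
            "errors": dict(p["errors"]),
        }
        for user, p in profiles.items()
    }


if __name__ == "__main__":
    for user, profile in usage_profiles(SAMPLE_LOG).items():
        print(user, profile)
```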
All access and error information from the PLA application will be logged. Partners from WP5 and WP6 have agreed on the level of detail at which the system will log activity. This information will concern several usability aspects:
Response time: the time taken to display a page and to process requests (from when a user presses a button until the action/function is completed). This information will show whether everything is working and whether response times are reasonable.
Problems experienced while using the portal. This information will be used to track and fix problems. All abnormal/error situations will be logged in the application, and an appropriate message will be displayed.
Errors in forms: empty fields, missing mandatory information.
Data transfer: this information allows the evaluation of whether requests/responses have the appropriate format.
Data transmission statistics will be anonymous because of regulations on personal data protection and privacy.
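The logged usability aspects listed above could be represented as typed events. The sketch below is a minimal illustration under several assumptions: the event categories mirror the list above, user identifiers are replaced by a keyed hash (HMAC-SHA-256) before analysis, and a 2-second threshold flags slow actions. Keyed hashing is pseudonymization rather than full anonymization, so whether it meets the data protection requirements would need to be checked; the key, threshold and field names are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from statistics import median

# Hypothetical secret held by the research team, never stored with the data.
PSEUDONYM_KEY = b"replace-with-a-secret-key"


def pseudonymize(user_id: str) -> str:
    """Replace a user identifier with a truncated keyed hash."""
    digest = hmac.new(PSEUDONYM_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


@dataclass
class LogEvent:
    user: str                 # pseudonymized identifier
    category: str             # "response_time", "problem", "form_error" or "data_transfer"
    action: str               # button or function involved
    duration_ms: float = 0.0  # response time, for "response_time" events
    detail: str = ""          # e.g. missing mandatory field, malformed response


def response_time_report(events, threshold_ms=2000.0):
    """Median response time per action, and actions whose median exceeds the threshold."""
    times = {}
    for e in events:
        if e.category == "response_time":
            times.setdefault(e.action, []).append(e.duration_ms)
    medians = {action: median(values) for action, values in times.items()}
    slow = sorted(a for a, m in medians.items() if m > threshold_ms)
    return medians, slow


if __name__ == "__main__":
    u = pseudonymize("participant-01")
    events = [
        LogEvent(u, "response_time", "open_email", 800.0),
        LogEvent(u, "response_time", "start_videocall", 3500.0),
        LogEvent(u, "form_error", "new_contact", detail="empty name field"),
    ]
    print(response_time_report(events))
```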
The benefit of combining a qualitative approach (France) and a quantitative approach (Hungary and Poland) is that the integrated analysis will provide a holistic understanding of how the PLA is used, giving insights both into the context of use, through observational analysis of usage in the home, and into usage patterns and performance indicators such as average task time.
Visit 3: the researcher (France) collects the recording device and the video data. Hungarian and Polish researchers can collect the first log files at the same time. If a written media diary is being kept, this is also a good opportunity to check that it is kept regularly enough to provide useful information. At this stage, the user may be encouraged to make suggestions concerning usability issues as well as functionalities. Above all, the interviews may focus on usage, that is, aim at understanding the context of use of the PLA by exploring the ways in which the PLA is embedded in the user’s way of life, i.e. how it acts as an “assistant” and not just as a device that is occasionally used.
Week 3: the user continues to use the device, perhaps in a more critical way.
Week 3 (researchers): this week is used by the French researchers to watch the video recordings, select extracts to show the user and prepare the self-confrontation interviews, and by the Hungarian and Polish researchers to carry out a first analysis of the log files collected so far.
Visit 4: self-confrontation interviews (France), drawing on retrospective verbal protocols. The users are shown video extracts of their own activity and asked to explain what happened, what task they were trying to achieve, what difficulties they had, whether they felt satisfaction or frustration, etc. Videos are shown because users cannot remember everything they have done; the recordings therefore elicit richer insights than a posteriori interviews alone. Moreover, instead of a researcher interpreting the user’s actions from an external perspective, the video is used so that the user himself/herself explains and interprets his/her past actions.
Week 4: last week of field trials
The user keeps using the PLA at home, so that the field trial lasts a whole month. This gives the user enough time to feel that he/she has tested all the functionalities and to understand how the PLA can (or cannot) be integrated into his/her daily life and support his/her activities.
Visit 5: final debriefing interview of the one-month trial. It is neither possible nor desirable to determine in advance (a priori) exactly what the content of this final interview will or should be. The topics discussed, and their relative importance, will depend on the phenomena that have emerged from the one-month trial.
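The weekly milestones of the one-month trial can be turned into a per-participant calendar for planning visits. The sketch assumes one visit at the start of each week and the final debriefing at the end of week 4 (day offsets 0, 7, 14, 21 and 28); the exact visit days, like the start date in the example, are assumptions to be adjusted with each participant.

```python
from datetime import date, timedelta

# Weekly milestones of the one-month field trial described above.
# Day offsets are an assumption: one visit at the start of each week,
# final debriefing at the end of week 4.
PROTOCOL = [
    (0, "Visit 1", "install PLA; informal usability observation"),
    (7, "Visit 2", "short interview; set up video (FR) or inform about logging (HU, PL)"),
    (14, "Visit 3", "collect video data or first log files; usage interview"),
    (21, "Visit 4", "self-confrontation interview (FR)"),
    (28, "Visit 5", "final debriefing interview"),
]


def trial_calendar(start: date):
    """Return (date, milestone, activity) tuples for one participant."""
    return [(start + timedelta(days=offset), name, activity)
            for offset, name, activity in PROTOCOL]


if __name__ == "__main__":
    # Hypothetical start date for one participant.
    for day, name, activity in trial_calendar(date(2013, 10, 7)):
        print(day.isoformat(), name, "-", activity)
```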
Focus group
A final focus group will be organized, depending on the time available between the moment insights are produced from the analyses of the field trials and the end of the project. It will act as a debriefing/summing-up session in which researchers will be able to confirm the phenomena that have emerged from WP6 as a whole. All end users who took part in a test will share their insights and comment on the field trials. The results of the meeting will also be included in the final documentation.
Insights of WP6.2-6.5
The usability tests, field trials and focus group will produce analyses that will be formalized according to three aspects (observation, interpretation and recommendation) in D6.5 and D6.6. The proposed recommendations will be argued and based on concrete examples, and the specifications will be written as clearly as possible for the design teams, so that they are useful for the final iteration.
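One possible way to keep the three aspects of each result together, so that every recommendation stays linked to the observation and interpretation that justify it, is a simple record such as the one sketched below. The field names, the evidence references and the text layout are assumptions, not the format of D6.5 and D6.6.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One analysed result, formalized in three aspects."""
    observation: str      # what was observed (video, log, interview, focus group)
    interpretation: str   # what it means for use, and why it matters
    recommendation: str   # concrete, argued change for the design team
    evidence: tuple = ()  # hypothetical references to supporting examples

    def to_text(self) -> str:
        lines = [
            f"Observation: {self.observation}",
            f"Interpretation: {self.interpretation}",
            f"Recommendation: {self.recommendation}",
        ]
        if self.evidence:
            lines.append("Evidence: " + "; ".join(self.evidence))
        return "\n".join(lines)


if __name__ == "__main__":
    finding = Finding(  # hypothetical example
        "Two users could not find how to end a video call.",
        "The end-call control is not recognised as an action.",
        "Use a labelled, high-contrast end-call button.",
        ("FR-01 video, week 2", "FR-03 video, week 2"),
    )
    print(finding.to_text())
```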
Conclusive remarks
This Evaluation Methodology document, deliverable D1.2, aims to provide the basis for an integrated methodology and protocol for WP6.1 and WP6.2-6.5 that is unanimously agreed upon by all the partners. The objective, which we believe is achievable in practice, is to share each other’s results and to collaborate on a global analysis of the qualitative and quantitative data that will be collected for the evaluation part.
Now that the approach adopted in WP6 has been explained, the need for coordination appears even more important. It has been agreed that partners will work efficiently to bridge the different WPs, for example by making use of the insights from the evaluation of the prototype in WP6.1 (in terms of usability of the interface and usefulness of functions) before finalizing the implementation of the device. We believe that the iterative design process adopted by PaeLife will significantly improve the usability and usefulness of the PLA and ensure that it follows the guidelines for designing for older adults, so that it fits the needs and context of use of older people.
Bibliography
Baccino, T. et al. (2005). Mesure de l’utilisabilité des interfaces. Paris: Hermès Science Publisher.
Fisk, A.D. et al. (2009). Designing for Older Adults: Principles and Creative Human Factors Approaches (2nd ed.). London and New York: CRC Press.
ISO 9241-210:2010. Ergonomics of human-system interaction, Part 210: Human-centred design for interactive systems.
Lindgaard, G. and Chattratichart, J. (2007). Usability testing: what have we overlooked? In Proceedings of CHI 2007 (pp. 1415-1424).
Nielsen, J. (1993). Iterative user interface design. Jakob Nielsen’s Alertbox, November 1, 1993.
Nielsen, J. (1994). Usability Engineering. San Francisco, CA: Morgan Kaufmann.
Nielsen, J. (2000). Why you only need to test with 5 users. Jakob Nielsen’s Alertbox, March 19, 2000.
Pak, R. and McLaughlin, A. (2011). Designing Displays for Older Adults. Boca Raton, London and New York: CRC Press.