
Conducting Usability Research through the Internet: Testing Users via the WWW

Carolyn Wei, [email protected]
Jennifer Barrick, [email protected]
Elisabeth Cuddihy, [email protected]
Jan Spyridakis, [email protected]

Department of Technical Communication
University of Washington
Box 352195, Seattle, WA 98195-2195
Tel: +1 (206) 543-2567; Fax: +1 (206) 543-8858
Dept. URL: www.uwtc.washington.edu

Citation: Wei, C., Cuddihy, E., Barrick, J., and J.H. Spyridakis. Conducting Usability Research through the Internet. Proceedings of the Usability Professional Association (June, Montreal), 2005.

ABSTRACT

This article discusses the need for studying Web usability issues in natural environments. We propose that remote, Internet-based studies, testing users on their own technological devices at a time convenient for them, should complement traditional lab studies. We also describe remote methods and a toolkit for supporting the conduct of remote usability studies and empirical experiments.

INTRODUCTION

With the Internet becoming a popular information resource, Web site usability has become increasingly important. Stakeholders in the usability of Web sites know less than ever about potential users, who may be in remote areas or international locations. It is therefore critical to use a variety of methods to identify a broad range of usability problems and to test the efficacy of designs with a diverse mix of users who may be far beyond the reach of a local usability lab. In this paper, we argue that remote usability testing can be a valuable part of the usability professional's toolbox.

The traditional method of testing Web site usability has been to run lab-based studies. These studies are certainly useful for finding critical problems with selected issues, and they can convincingly demonstrate user frustration or success with portions of a Web site. However, there are tradeoffs associated with usability tests conducted in laboratory settings. For example, it is difficult to represent the full spectrum of a Web site's audience with the small samples of users who are willing or even able to come to a usability lab. Of further concern is the artifice associated with the laboratory itself: one-way mirrors, cameras, think-aloud protocols, and institutional sterility. There are no crumbs or coffee spills on the keyboards in a usability lab, no tangled cords or dysfunctional mice.

In contrast, users "in the wild" access the Web using a variety of equipment in a multitude of environments. They browse the Web in the middle of the night or in spurts during their breaks at work, and who knows what they are thinking when they are accessing a tourist information Web site on a mobile device while sitting on a bench in a rail station in Stockholm. Unfortunately, it is virtually impossible to mimic these environments in a usability lab or even to observe such experiences with a field study. Imagine trying to obtain observational data about a user in bed working wirelessly on a laptop in a dimly lit room at 2 a.m. No matter how sociable the usability researcher is, study participants who know they are being watched may never settle in and get as comfortable as they would if they were using their own computing devices unsupervised and unobserved. Without being able to assess the natural behavior of users, usability professionals risk missing potentially significant problems with their Web sites, or failing to understand how different versions of a Web site might function with real users working in their own environments.

After a brief comparison of traditional usability lab studies with remote, Internet-based studies, we discuss remote methods in general and our software toolkit that facilitates remote usability testing and experimentation.

TRADITIONAL USABILITY LAB STUDIES

Traditional usability tests, in which users come to a lab and complete specific tasks, are frequently used in the software industry and are becoming more common with online businesses. Usability tests can support inquiries about specific usability issues and help identify problem areas of a Web site. Typically, only small samples of users participate because of the expense and time required per user, yet the results may be sufficient to address specific problems of interest. Lab studies can provide very rich, human illustrations of what works or does not work on a Web site, and they are extremely useful for investigating specific problems on specific Web sites with select users.

Yet lab studies become problematic when usability researchers attempt to generalize their findings into guidelines that inform Web design practices for the large set of users who comprise the general Internet audience. Lab studies are often designed ad hoc for a specific Web site and are not meant to be generalizable to a wide range of situations. Further, because many of these studies are implemented with multiple design questions in mind, experimental variables can unintentionally become confounded. Some of the most often cited usability research raises methodological concerns for those who seek to extrapolate its results into guidelines for Web design practices in general. For example, Morkes and Nielsen's studies [e.g., 1, 2] can be hard to interpret because they manipulated several variables simultaneously without appropriate controls in a single experiment, and Spool et al. [3], who are frequently cited in Web design guidelines, provide little information about their methodology to aid in judging the validity of their results. In addition, a lab-based test takes users out of their natural environment, reducing the applicability of the study to real-life situations and thus its external validity, even though researchers may be more certain of internal validity (controlling threats to the study from uncontrolled variables) [4].

REMOTE, INTERNET-BASED USABILITY STUDIES

In contrast to lab-based studies, remote Internet-based research allows researchers to examine users in natural environments while gathering behavioral, attitudinal, and performance data concerning their interactions with a Web site. Researchers can assess users who are using their own equipment on their own time, and who are seeking information for their own needs. Further, remote studies can test large samples of a Web site's real audience, providing veridical evidence of the usability of a site. And depending on the toolset used, it is even possible to capture rich, ethnographic information about users through Web cams, audio links, or remote sharing of computer screens (although some of these methods increase the intrusiveness of the researchers). We propose that studying usability issues remotely through the Internet in users' own environments can complement traditional usability lab tests.

The benefits of remote, Internet-based studies can address many of the drawbacks of usability research noted above. Our research group has spent the last few years exploring issues related to Internet-based research methods in various projects in which we have studied the effect of Web design features (e.g., heading frequency, hyperlink wording) on user performance, behavior, and perceptions [5-13]. These studies have assessed volunteer Web site users, participating through the Internet from a location and at a time of their choosing. While conducting these studies, we have pursued a goal of refining Internet-based methods and creating a technical infrastructure to support such work. The sample sizes of our studies have ranged from 85 to 600 subjects, with participants recruited through email lists, flyers, or links on health Web sites. All test Web sites have been adapted from naturally occurring Web sites. All studies have used rigorous experimental designs with controlled variables, operationalized conditions, and random assignment of visitors to test conditions. All studies have presented human subjects consent information and questionnaires that gathered data about demographics and site perceptions; most have also gathered comprehension data about the Web pages or sites, as well as behavioral data.

Despite the advantages offered by such approaches, naturalistic Internet-based studies of Web design issues are few in number. Several recent studies in the business field, for example, have attempted to remotely assess the effect of Web design decisions on e-commerce sites, but technical constraints have forced the researchers to conduct their studies in laboratories and occasionally to hire out the development of the experimental sites [14-16]. Some forays into remote studies from the fields of business and communications have investigated the effect of content organizers and instructional objectives on learner performance [17], and of Web design features on perceptions of trust [18] and credibility [19]. One reason for the paucity of Internet-based studies of Web design may be that few tools exist to support such efforts. One such tool, OpenWebSurvey [20], allows researchers to instrument a Web site to appear within a frame that asks users to perform tasks and answer questions as they browse. The tool records survey answers as well as limited information about browsing behavior. Unfortunately, framing a Web site with a separate survey recreates some of the artifice of lab tests. Another tool, Uzilla [21], is a software suite for remote usability testing that offers a battery of tools for creating user tasks and analyzing data; however, it does not offer the ability to create different experimental conditions if, for example, one wanted to easily create and test variations of a Web design.

As noted above, our research group has conducted several online studies related to Web design issues. With each experiment, we have gained additional experience and insight in conducting online studies, and we are now developing a methodology and prototype toolkit to support online experiments. The methodology we describe next is relevant both to those who wish to conduct generalizable online experiments and to those who want to conduct remote usability testing to assess specific Web sites or situations. The goals of an Internet-based experiment and a remote usability test may differ, but the methods and practices of one can inform the other.

A METHODOLOGY AND A TOOLKIT FOR CONDUCTING REMOTE INTERNET-BASED STUDIES

We recently conducted a remote Internet-based experiment on the effect of hyperlink wording on user behavior, perceptions, and comprehension of an experimental Web site about the natural history of American Samoa [7, 8]. This experiment was supported by a prototype toolkit that we developed to support the research. We specifically sought to test the effects of three versions of hyperlink wording (generic, informative, and intriguing), both in the navigation menu and in links embedded within the body text, on a variety of measures including user comprehension, perceptions, and behavior. The toolkit we developed for this experiment has since been expanded and is currently being used to support two new studies, one about the placement and design of navigation menus, and one about the syntax and semantics of headings in a health information site.

The two programmers on our research team developed the toolkit so that even the non-technical members of the team could manage design changes to the experimental site, create surveys, and access and analyze data. Using the toolkit, researchers can instrument a study, programmatically create multiple versions of a Web site without replicating HTML code for each and every page, assign participants to conditions, deliver the dynamically constructed Web site, record Web site usage data beyond what is possible with standard server logs, and create, deliver, and collect questionnaire responses. The process for designing and running an experiment using this toolkit is depicted in Figure 1.

Figure 1: The process for designing and running an experiment using our toolkit.

Designing a Study

Our toolkit allowed us to create a complete experimental study delivered through the Web, starting with participants encountering introductory pages about the purpose of the study and informed consent, and ending with participants answering questionnaires and, if desired, obtaining study incentives. Given our experimental design, in which we wanted to test alternative Web designs with controlled variations, the toolkit also supported the dynamic generation of multiple versions of the test Web site, random assignment of participants to conditions, survey design, and custom data logging. Although our toolkit does not support the direct observation of live users in action, it does allow for the capture of behavioral, perception, and comprehension measures for later analysis. After the initial setup of the experiment, very little work is required to monitor the progress of participants. Over 600 users participated in the hyperlink wording study over a few weeks' time without requiring supervision time from the researchers. Our experiences with this toolkit can inform future work in the conduct of remote Internet-based studies of the effect of Web design decisions on user interactions. Some of the "lessons learned" are reported here.
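As a purely illustrative sketch, a study with this fixed stage sequence can be driven by a small PHP controller that each stage page submits to on completion. The file names and session keys below are assumptions for illustration, not the toolkit's actual design.

<?php
// Hypothetical flow controller ("next.php"): each stage page posts its
// completion form here, and the participant is forwarded to the next stage.
session_start();

$stages = array('consent.php', 'introduction.php', 'site/index.php',
                'survey.php', 'incentive.php');

if (!isset($_SESSION['stage'])) {
    $_SESSION['stage'] = 0;  // every participant begins with informed consent
} elseif ($_SESSION['stage'] < count($stages) - 1) {
    $_SESSION['stage']++;    // advance one stage per completed form
}

// Send the participant to the page for their current stage.
header('Location: ' . $stages[$_SESSION['stage']]);
exit;

Because the sequence lives in one script, the study runs unattended: participants move themselves from consent to incentive with no researcher supervision.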

Creating Multiple Versions of a Web Site

Designing a single-source Web site that dynamically generates the experimental conditions of interest is crucial to the success of a remote study. By making changes in a single file that affect variables throughout the instrumented Web site, researchers ensure that only the intended characteristics of the site vary and that no accidental variations are introduced between conditions. Further, they can ensure that changes in the base Web site ripple through all versions of an experimental site; for example, the correction of a typographical error or the removal of an extra line of space is made only in the base site, not in multiple separate sites.

In the hyperlink wording study, our toolkit allowed us to create a single experimental base Web site that served as the foundation for producing unique variations of hyperlink wording in the navigation menu and embedded in the body text. PHP, a scripting language designed for use in Web programming, allowed us to generate all experimental conditions for the study from a single code source. PHP scripts controlled variables in the HTML code that stood in for the embedded links and navigation bar links. Variables can control more than text wording: they can also control other aspects of Web page design such as the background color, the displayed images, text passages, or any other design feature that is governed by HTML or cascading style sheets. For example, in our study, we used PHP to randomly vary the order of the thematic sections of the navigation menu, and the order of the links within these sections, for each participant. PHP scripts could present a menu of the variations that researchers wish to test, and the corresponding experimental Web sites would be dynamically generated as participants enter the study.
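The following PHP sketch illustrates the single-source idea; the link IDs, wording table, and helper function are hypothetical, not the toolkit's published code. The sample wordings echo the study's three conditions.

<?php
// Hypothetical sketch of single-source condition generation. One master
// table defines every link wording that varies by condition, so a wording
// change is made in exactly one place and ripples through the whole site.
$LINK_WORDING = array(
    'generic'     => array('coral_reefs' => 'Click here for more'),
    'informative' => array('coral_reefs' => 'Coral reef ecosystems of Samoa'),
    'intriguing'  => array('coral_reefs' => 'Cities beneath the sea'),
);

session_start();
$condition = isset($_SESSION['condition']) ? $_SESSION['condition'] : 'generic';

// Template pages refer to links by ID; the wording is substituted at
// delivery time, so a fix to the base HTML appears in all conditions.
function link_text($linkId) {
    global $LINK_WORDING, $condition;
    return htmlspecialchars($LINK_WORDING[$condition][$linkId]);
}
?>
<p>Learn about the archipelago:
  <a href="reefs.php"><?php echo link_text('coral_reefs'); ?></a></p>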

Assigning Participants to Conditions

Our toolkit supported the assignment of participants to study conditions either sequentially or randomly, and the delivery of condition-appropriate Web sites. Because participants come to a study Web site in an essentially random, self-selected order, both sequential and random methods should result in an equal and random distribution of participants across experimental conditions. The assignment process was transparent to participants; they saw no direct evidence that other versions of the Web site existed. Random assignment of participants is critical to ensure that researchers do not unknowingly bias any condition through participant assignment. Further, each condition is then tested by a wide range of users and their unique operating systems.
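The sketch below shows one way such an assignment might be implemented in PHP; the session keys and ID scheme are assumptions rather than the toolkit's actual design.

<?php
// Hypothetical assignment sketch: a participant receives an opaque random
// ID and a condition exactly once, on first arrival.
session_start();

if (!isset($_SESSION['participant_id'])) {
    // Random ID: links a participant's data together without relying on
    // (or exposing) an IP address.
    $_SESSION['participant_id'] = md5(uniqid(mt_rand(), true));

    // Random assignment; a sequential variant would instead cycle through
    // the conditions in order of arrival.
    $conditions = array('generic', 'informative', 'intriguing');
    $_SESSION['condition'] = $conditions[array_rand($conditions)];
}
// The participant is then served the condition-appropriate version of the
// site; nothing in the URL reveals that other versions exist.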

Constructing and Delivering Surveys

The Web site under study was followed by a survey. Its perceptual, demographic, and comprehension questions helped us contextualize the clicking behaviors of users. For example, data on a user who kept returning to the home page and who reported on the survey that the Web site was confusing might suggest that the information needed to be better organized.

Online survey design and delivery can be challenging. Rather than relying on difficult-to-edit static HTML pages or an inflexible survey generator tool, we developed a method to dynamically generate online surveys from a set of plain text files. If changes needed to be made to the survey, such as rewording an answer choice, reordering questions, or adding a new page of questions, the text files of the survey could be easily edited by someone without any HTML knowledge. This ease of editing is essential for ensuring that survey questions can be iteratively refined to elicit the most useful responses from study participants.
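The paper does not specify the text file format, so the PHP sketch below assumes a simple one-record-per-line layout ("type|name|prompt|choices") purely for illustration.

<?php
// Hypothetical sketch of survey generation from a plain text file.
// survey.txt might contain lines such as:
//   radio|clarity|How clear was the site?|Very clear;Somewhat clear;Unclear
//   text|comments|Any other comments?
$lines = file('survey.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

echo '<form method="post" action="record_survey.php">';
foreach ($lines as $line) {
    list($type, $name, $prompt, $choices) = array_pad(explode('|', $line), 4, '');

    echo '<p>' . htmlspecialchars($prompt) . '</p>';
    if ($type === 'radio') {
        foreach (explode(';', $choices) as $choice) {
            $c = htmlspecialchars($choice);
            echo "<label><input type=\"radio\" name=\"$name\" value=\"$c\"> $c</label><br>";
        }
    } else {  // free-text comment field
        echo "<textarea name=\"$name\" rows=\"4\" cols=\"60\"></textarea>";
    }
}
echo '<p><input type="submit" value="Submit"></p></form>';

Because a non-programmer edits only the text file, rewording a choice or adding a question never touches HTML.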

Recording Data

Our toolkit made it possible to collect specific, customized information about participants' behavior on the experimental Web site in addition to questionnaire responses about demographics, perceptions, and comprehension. The following behavioral measures were most useful for our interpretation of users' reactions to our experimental Web designs (a logging sketch follows the list).

• Specific pages visited. Each participant’s unique navigation path through the Web site was documented to help us understand users’ interactions with the site. Such information is extremely difficult to distill from standard Web server log files, and customized scripts are necessary to capture granular detail that is linked to specific users.

• Page load times. By recording page load times, we could infer the time spent on each page from the load time of the next page. Understanding how long participants spend on a page helps in inferring whether they are reading, clicking through the site too quickly to be reading, or temporarily leaving the experiment to do something else.

• Back button use. Because many users rely heavily on the Back button, capturing its use is important for understanding how users travel through a site. It is also useful when administering a comprehension test that assumes participants are answering questions from memory rather than by backtracking through the site to find answers. Our method allowed us to check whether the page that participants appeared to originate from matched the last page they had actually visited. When there was a discrepancy, we inferred that participants had used their browser's Back or Forward button. This approach is similar to the one detailed in Hong et al. [22], but it is still subject to some browser caching problems, which researchers can manage programmatically.

• Type of link clicked. Capturing in granular detail which links participants click on a page was important in the hyperlink wording study since each page often had two links pointing to the same page. With the toolkit, we could identify whether participants clicked on links tagged as navigation links, embedded text links, introductory page links, or survey page links. Given that the focus of this study included link placement, this information was crucial to our study goals, yet such information would have been impossible to determine from standard log files.

• Survey data. Participants' answers to surveys can be critical in interpreting their behavior on the study Web site. The toolkit recorded answers to survey questions about demographics, preferences, and comprehension, and it also delivered open-ended questions and recorded the resulting comments.

• Passive data. Passive data about the study are also critical to record. Our toolkit recorded information about each participant's interaction with the study, including the participant's assigned study condition, type of Web browser, and type of computer operating system. This information helped capture the context of each user's computing environment and led to a better understanding of user interactions.
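As a concrete illustration, the PHP sketch below logs one page view per request under an assumed database schema; the table and column names are hypothetical, since the toolkit's actual logging code is not published.

<?php
// Hypothetical logging sketch, included at the top of every instrumented
// page. Each row ties a page view to the participant's opaque ID.
session_start();
$pdo = new PDO('mysql:host=localhost;dbname=study', 'study_user', 'secret');

$stmt = $pdo->prepare(
    'INSERT INTO page_log
       (participant_id, study_condition, page, referer, user_agent, loaded_at)
     VALUES (?, ?, ?, ?, ?, NOW())'
);
$stmt->execute(array(
    $_SESSION['participant_id'],
    $_SESSION['condition'],
    $_SERVER['REQUEST_URI'],                          // specific page visited
    isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
    $_SERVER['HTTP_USER_AGENT'],                      // passive browser/OS data
));
// Time on page comes from consecutive loaded_at timestamps; Back/Forward
// use is inferred offline when the referer does not match the participant's
// previously logged page.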

All of the above information was linked together for each individual participant through ID codes uniquely and transparently assigned to each participant, making it easy to connect different types of information, such as pages seen and comprehension, or condition and types of links clicked. ID codes are much more reliable for identifying individual users than IP addresses, which can be dynamically assigned by ISPs or shared by multiple users of a proxy server. The random ID codes also helped to protect user anonymity by separating user data from IP addresses. An additional benefit of the way the scripts recorded data was that the page URLs did not contain lengthy variable strings (such as one sees on Amazon.com), which can alert study participants to the exact nature of the collected data and thereby affect their behavior.

The toolkit directly exported data as a flat-file database that could be imported into a statistical program such as SPSS for analysis. Other tools were developed to query the database for specific information, such as comprehension scores related to pages visited, the number of participants who saw specific pages, or time spent on introductory pages. The results of these queries could be output in tabular form that could be easily viewed in a text editor or imported into a statistical program for further analysis.
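The PHP sketch below shows one plausible form of such an export, writing a tab-delimited flat file that SPSS or a text editor can read; it reuses the same hypothetical schema as the logging sketch above.

<?php
// Hypothetical export sketch: dump the page log as a tab-delimited file.
$pdo = new PDO('mysql:host=localhost;dbname=study', 'study_user', 'secret');
$out = fopen('page_log_export.txt', 'w');

fputcsv($out, array('participant_id', 'condition', 'page', 'loaded_at'), "\t");

$rows = $pdo->query(
    'SELECT participant_id, study_condition, page, loaded_at
       FROM page_log
      ORDER BY participant_id, loaded_at'
);
foreach ($rows as $row) {
    fputcsv($out, array($row['participant_id'], $row['study_condition'],
                        $row['page'], $row['loaded_at']), "\t");
}
fclose($out);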

CONCLUSIONS

Our recent experiences in conducting remote, Internet-based studies to empirically assess the effect of Web design features on users in their natural environments have convinced us that such work is a useful complement to lab-based studies. Large-sample, remotely conducted Internet studies can supplement the rich detail of traditional lab studies with data about a broad set of users in their own idiosyncratic environments, or identify broad statistical patterns to be later scrutinized under the close lens of a lab study. With the refinement of methods for conducting such work and the development of viable tools to support it (like the toolkit we describe here), usability testing will be able to move outside the walls of well-equipped usability labs and into the arena of assessing users on their own computers, including those who access the Web on a variety of mobile computing devices. In the future, it will indeed be possible to naturally and non-invasively study the computing habits of that tourist in the rail station in Stockholm, or the insomniac workaholic computing in his bed in the wee hours.

REFERENCES

1. Morkes, J., and Nielsen, J. (1997). Concise, SCANNABLE, and Objective: How to Write for the Web. Retrieved March 12, 2004, from http://www.useit.com/papers/webwriting/writing.html.

2. Morkes, J., and Nielsen, J. (1998). Applying Writing Guidelines to Web Pages. Retrieved March 14, 2004, from http://www.useit.com/papers/webwriting/rewriting.html. Also in the Proceedings of ACM CHI 98 Conference on Human Factors in Computing Systems, 2, 321-322.

3. Spool, J.M., Scanlon, T., Schroeder, W., Snyder, C., and DeAngelo, T. (1997). Web Site Usability: A Designer's Guide. North Andover, MA: User Interface Engineering.

4. Spyridakis, J. (2000). Guidelines for Authoring Comprehensible Web Pages and Evaluating Their Success. Technical Communication 47 (3), 359-382. http://www.techcomm-online.org/issues/v47n3/pdf/0419.pdf

5. Mobrand, K.A., and Spyridakis, J.H. (2002a). The Effect of Hyperlink Wording on User Performance. In the Proceedings of International Technical Communication Conference, 229–232.

6. Mobrand, K.A., and Spyridakis, J.H. (2002b). A Web-Based Study of User Performance with Enhanced Local Navigational Cues. In the Proceedings of IEEE International Professional Communication Conference, 500–508.

7. Barrick, J., Maust, B., Spyridakis, J.H., Eliot, M., Wei, C., Evans, M., and Mobrand, K. (2004). A Tool for Supporting Web-Based Empirical Research: Providing a Basis for Web Design Guidelines. In the Proceedings of IEEE International Professional Communication Conference, 189-193.


8. Evans, M., Wei, C., Eliot, M., Barrick, J., Maust, B., and Spyridakis, J.H. (2004). The Influence of Link Wording on Browsing Behavior and Comprehension. In the Proceedings of International Conference on Technical Communication, 313-317.

9. Freeman, K., and Spyridakis, J.H. (2004a). An Examination of Factors that Affect the Credibility of Online Health Information. Technical Communication 51 (2), 239-263.

10. Freeman, K., and Spyridakis, J.H. (2005: In Preparation). Web Site Credibility Revisited.

11. Mobrand, K.A., and Spyridakis, J.H. (2005: In Preparation). User Performance with Explicit Local Navigational Links: A Web-based Investigation.

12. Schultz, L., and Spyridakis, J.H. (2004). The Effect of Heading Frequency on Comprehension of Online Information: A Study of Two Populations. Technical Communication 51 (4), 504-516.

13. Schultz, L., and Spyridakis, J.H. (2005: Under Review). Online Medical Information and Patients’ Emotional States.

14. Fiore, A.M., and Jin, H. (2003). Influence of Image Interactivity on Approach Responses Towards an Online Retailer. Internet Research: Electronic Networking Applications and Policy 13 (1), 38-48.

15. Danaher, P., and Mullarkey, G. (2003). Factors Affecting Online Advertising Recall: A Study of Students. Journal of Advertising Research, 252-267.

16. Eroglu, S., Machleit, K., and Davis, L. (2003). Empirical Testing of a Model of Online Store Atmospherics and Shopper Responses. Psychology & Marketing 20 (2), 139-150.

17. Nishikura, H. (2000). The Impact of Content Organizers and Instructional Objectives on Learner Performance in a Web-Based Environment. Unpublished doctoral dissertation, Arizona State University, AZ.

18. Stephens, R.T. (2004). Trust Study. Retrieved May 6, 2004, from http://www.truststudy.com/

19. Flanagin, A.J., and Metzger, M.J. (2003). The Perceived Credibility of Personal Web Page Information as Influenced by the Sex of the Source. Computers in Human Behavior 19, 683-701.

20. Baravalle A., and Lanfranchi, V. (2003). Remote Web Usability Testing. Behavior Research Methods, Instruments, & Computers 35 (3), 364-368.

21. Edmonds, A. (2003). Uzilla: A New Tool for Web Usability Testing. Behavior Research Methods, Instruments, & Computers 35 (2), 194-201.

22. Hong, J.I., Heer, J., Waterson, S., and Landay, J.A. (2001). WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. ACM Transactions on Information Systems 19 (3), 263–285.