Chapter 16 EVALUATION: INSPECTIONS, ANALYTICS, AND MODELS

16.1 Introduction
16.2 Inspections: Heuristic Evaluation and Walk-Throughs
16.3 Analytics and A/B Testing
16.4 Predictive Models

Objectives
The main goals of this chapter are to accomplish the following:
• Describe the key concepts associated with inspection methods.
• Explain how to do heuristic evaluation and walk-throughs.
• Explain the role of analytics in evaluation.
• Describe how A/B testing is used in evaluation.
• Describe how to use Fitts’ law—a predictive model.

16.1 Introduction

The evaluation methods described in this book so far have involved interaction with, or direct observation of, users. In this chapter, we introduce methods that are based on understanding users through one of the following:
• Knowledge codified in heuristics
• Data collected remotely
• Models that predict users’ performance

None of these methods requires users to be present during the evaluation. Inspection methods often involve a researcher, sometimes known as an expert, role-playing the users for whom the product is designed, analyzing aspects of an interface, and identifying potential usability problems. The most well-known methods are heuristic evaluation and walk-throughs. Analytics involves user interaction logging, and A/B testing is an experimental method. Both analytics and A/B testing are usually carried out remotely. Predictive modeling involves analyzing the various physical and mental operations that are needed to perform particular tasks at the interface and operationalizing them as quantitative measures. One of the most commonly used predictive models is Fitts’ law.

16.2 Inspections: Heuristic Evaluation and Walk-Throughs

Sometimes, it is not practical to involve users in an evaluation because they are not available, there is insufficient time, or it is difficult to find people. In such circumstances, other people, often referred to as experts or researchers, can provide feedback. These are people who are knowledgeable about both interaction design and the needs and typical behavior of users. Various inspection methods were developed as alternatives to usability testing in the early 1990s, drawing on software engineering practice where code and other types of inspections are commonly used. Inspection methods for interaction design include heuristic evaluations and walk-throughs, in which researchers examine the interface of an interactive product, often role-playing typical users, and suggest problems that users would likely have when interacting with the product. One of the attractions of these methods is that they can be used at any stage of a design project. They can also be used to complement user testing.

16.2.1 Heuristic Evaluation

In heuristic evaluation, researchers, guided by a set of usability principles known as heuristics, evaluate whether user-interface elements, such as dialog boxes, menus, navigation structure, online help, and so on, conform to tried-and-tested principles. These heuristics closely resemble high-level design principles (such as making designs consistent, reducing memory load, and using terms that users understand).
Heuristic evaluation was developed by Jakob Nielsen and his colleagues (Nielsen and Molich, 1990; Nielsen, 1994a) and later modified by other researchers for evaluating the web and other types of systems (see Hollingshead and Novick, 2007; Budd, 2007; Pinelle et al., 2009; Harley, 2018). In addition, many researchers and practitioners have converted design guidelines into heuristics that are then applied in heuristic evaluation. The original set of heuristics for HCI evaluation was empirically derived from the analysis of 249 usability problems (Nielsen, 1994b); a revised version of these heuristics follows (Nielsen, 2014: useit.com):

Visibility of System Status The system should always keep users informed about what is going on, through appropriate feedback and within reasonable time.

Match Between System and the Real World The system should speak the users’ language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. It should follow real-world conventions, making information appear in a natural and logical order.

User Control and Freedom Users often choose system functions by mistake and will need a clearly marked emergency exit to leave the unwanted state without having to go through an extended dialog. The system should support undo and redo.

Consistency and Standards Users should not have to wonder whether different words, situations, or actions mean the same thing. The system should follow platform conventions.

Error Prevention Rather than just good error messages, the system should incorporate careful design that prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.

Recognition Rather Than Recall Minimize the user’s memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialog to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.

Flexibility and Efficiency of Use Accelerators—unseen by the novice user—may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

Aesthetic and Minimalist Design Dialogs should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialog competes with the relevant units of information and diminishes their relative visibility.

Help Users Recognize, Diagnose, and Recover from Errors Error messages should be expressed in plain language (not codes), precisely indicate the problem, and constructively suggest a solution.

Help and Documentation Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.
More information about heuristic evaluation is provided at www.nngroup.com/articles/ux-expert-reviews/

This site shows how a researcher, Wendy Bravo, used heuristics to evaluate two travel websites, Travelocity and Expedia: https://medium.com/@WendyBravo/heuristic-evaluation-of-two-travelwebsites-13f830cf0111

This video, developed by David Lazarus and published on May 9, 2011, provides insights into Jakob Nielsen’s 10 Usability Heuristics for Interface Design. The video is still useful even though the heuristics have been updated slightly since it was made. http://youtu.be/hWc0Fd2AS3s

Designers and researchers evaluate aspects of the interface against the appropriate heuristics. For example, if a new social media system is being evaluated, the designer might consider how users would add friends to their networks. Those doing the heuristic evaluation go through the interface several times, inspecting the various interaction elements and comparing them with the list of usability heuristics. During each iteration, usability problems will be identified and ways of fixing them may be suggested.

Although many heuristics apply to most products (for example, be consistent and provide meaningful feedback, especially if an error occurs), some of the core heuristics are too general for evaluating products that have come onto the market more recently, such as mobile devices, digital toys, social media, ambient devices, web services, and IoT. Many designers and researchers have therefore developed their own heuristics by tailoring Nielsen’s heuristics with other design guidelines, market research, results from research studies, and requirements documents. The Nielsen Norman Group has also taken a more detailed look at particular heuristics, such as the first heuristic listed above, “visibility of system status” (Harley, 2018a), which focuses on communication and transparency.

Exactly which heuristics are appropriate and how many are needed for different products is debatable and depends on the goals of the evaluation. However, most sets have between 5 and 10 items. This number provides a good range of usability criteria by which to judge the various aspects of a product’s design. More than 10 items become difficult for those doing the evaluation to manage, while fewer than 5 items tend not to be sufficiently discriminating.

Another concern is the number of researchers needed to carry out a thorough heuristic evaluation that identifies the majority of usability problems. Empirical tests suggest that 3–5 researchers can typically identify up to 75 percent of the total usability problems, as shown in Figure 16.1 (Nielsen, 1994a). However, employing several researchers can be resource intensive. Therefore, the overall conclusion is that while more researchers might be better, fewer can be used—especially if the researchers are experienced and knowledgeable about the product and its intended users.

Figure 16.1 Curve showing the proportion of usability problems in an interface found by heuristic evaluation using various numbers of evaluators (0–15 evaluators on the x-axis, 0–100 percent of problems found on the y-axis). Source: Nielsen and Mack (1994). Used courtesy of John Wiley & Sons, Inc.
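The shape of the curve in Figure 16.1 is commonly modeled with a problem-discovery formula of the form found(i) = N(1 − (1 − λ)^i), where N is the total number of problems in the interface and λ is the probability that a single evaluator detects any given problem. The Python sketch below is illustrative only: the detection rate of 0.3 and the problem count are assumed values for illustration, not figures taken from this chapter.

```python
# Illustrative sketch of the problem-discovery curve behind Figure 16.1.
# The per-evaluator detection rate (0.3) is an assumed value, not a figure
# taken from the chapter.

def proportion_found(evaluators: int, detection_rate: float = 0.3) -> float:
    """Expected proportion of usability problems found by a group of
    independent evaluators, each detecting a problem with probability
    `detection_rate`: 1 - (1 - rate)^n."""
    return 1.0 - (1.0 - detection_rate) ** evaluators

if __name__ == "__main__":
    for n in range(0, 16):
        print(f"{n:2d} evaluators -> {proportion_found(n):5.1%} of problems found")
    # With a 0.3 detection rate, 5 evaluators find roughly 83% of problems,
    # while doubling to 10 adds comparatively little (about 97%).
```

Under this assumed rate, the fourth or fifth evaluator pushes the total past the 75 percent mark mentioned earlier, which is one reason that three to five researchers is the commonly recommended range: additional evaluators mostly rediscover problems that have already been found.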
Heuristic Evaluation for Websites

A number of different heuristic sets for evaluating websites have been developed based on Nielsen’s original 10 heuristics. One of these was developed by Andy Budd after discovering that Nielsen’s heuristics did not address the problems of the continuously evolving web. He also found that there was overlap between several of the guidelines and that they varied widely in terms of their scope and specificity, which made them difficult to use. An extract from these heuristics is shown in Box 16.1. Notice that a difference between these and Nielsen’s original heuristics is that they place more emphasis on information content.

BOX 16.1 Extract from the Heuristics Developed by Budd (2007) That Emphasize Web Design Issues

Clarity
Make the system as clear, concise, and meaningful as possible for the intended audience.
• Write clear, concise copy.
• Only use technical language for a technical audience.
• Write clear and meaningful labels.
• Use meaningful icons.

Minimize Unnecessary Complexity and Cognitive Load
Make the system as simple as possible for users to accomplish their tasks.
• Remove unnecessary functionality, process steps, and visual clutter.
• Use progressive disclosure to hide advanced features.
• Break down complicated processes into multiple steps.
• Prioritize using size, shape, color, alignment, and proximity.

Provide Users with Context
Interfaces should provide users with a sense of context in time and space.
• Provide a clear site name and purpose.
• Highlight the current section in the navigation.
• Provide a breadcrumb trail (that is, show where the user has been in a website).
• Use appropriate feedback messages.
• Show the number of steps in a process.
• Reduce perception of latency by providing visual cues (for instance, a progress indicator) or by allowing users to complete other tasks while waiting.

Promote a Pleasurable and Positive User Experience
The user should be treated with respect, and the design should be aesthetically pleasing and promote a pleasurable and rewarding experience.
• Create a pleasurable and attractive design.
• Provide easily attainable goals.
• Provide rewards for usage and progression.

A similar approach to Budd’s is also taken by Leigh Howells in her article entitled “A guide to heuristic website reviews” (Howells, 2011). In this article, and in a more recent one by Toni Granollers (2018), techniques for making the results of heuristic evaluation more objective are proposed. This can be done either to show the occurrence of different heuristics from an evaluation or to compare the results of different researchers’ evaluations, as shown in Figure 16.2. First, a calculation is done to estimate the percentage of usability problems identified by each researcher, which is then displayed around the diagram (in this case there were seven researchers). Then a single value representing the mean of all of the researchers’ individual means is calculated and displayed in the center of the diagram. In addition to being able to compare the relative performance of different researchers and the overall usability of the design, a version of this procedure can be used to compare the usability of different prototypes or for comparisons with competitors’ products.
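The calculation behind this kind of diagram is easy to reproduce. The sketch below is a minimal illustration rather than code from Granollers’ paper: the per-researcher problem sets are hypothetical, and each researcher’s percentage is taken here as their share of all distinct problems found by the whole group, which is one plausible reading of the procedure.

```python
# Minimal sketch of the radar-diagram calculation described above.
# The problem sets per researcher are hypothetical; the percentage for each
# researcher is computed against the union of all distinct problems found,
# which is one plausible reading of the published procedure.
from statistics import mean

findings = {  # researcher -> identifiers of the problems they reported
    "Eval1": {"P1", "P2", "P3", "P5", "P7"},
    "Eval2": {"P1", "P3", "P4", "P6"},
    "Eval3": {"P2", "P3", "P5", "P6", "P7", "P8"},
}

all_problems = set().union(*findings.values())  # every distinct problem found
percentages = {
    name: 100 * len(problems) / len(all_problems)
    for name, problems in findings.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}% of the {len(all_problems)} distinct problems")
print(f"Overall mean (center of the radar diagram): {mean(percentages.values()):.1f}%")
```

Plotting each researcher’s percentage around a radar chart, with the mean in the center, then gives a diagram like Figure 16.2.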
Figure 16.2 Radar diagram showing the percentage of usability problems identified by each of the seven researchers (66.7–87.0 percent) and the overall mean (77.5 percent) shown in the center of the diagram. Source: Granollers (2018). Used courtesy of Springer Nature

Doing Heuristic Evaluations

Doing a heuristic evaluation can be broken down into three main stages (Nielsen and Mack, 1994; Muniz, 2016).

• A briefing session, in which the user researchers are briefed about the goal of the evaluation. If there is more than one researcher, a prepared script may be used to ensure that each person receives the same briefing.

• The evaluation period, in which the user researchers typically spend 1–2 hours independently inspecting the product, using the heuristics for guidance. Typically, the researchers will take at least two passes through the interface. The first pass gives a feel for the flow of the interaction and the product’s scope. The second pass allows them to focus on specific interface elements in the context of the whole product and to identify potential usability problems.

If the evaluation is for a functioning product, the researchers will typically have some specific user tasks in mind so that their exploration is focused. Suggesting tasks may be helpful, but many UX researchers suggest their own tasks. However, this approach is more difficult if the evaluation is done early in design when there are only screen mock-ups or a specification. Therefore, the approach needs to be adapted for the evaluation circumstances. While working through the interface, specification, or mock-ups, a second researcher may record the problems identified, while the other researcher may think aloud, which can be video recorded. Alternatively, each researcher may take notes.

• The debriefing session, in which the researchers come together to discuss their findings with designers, to prioritize the problems they found, and to give suggestions for solutions.

ACTIVITY 16.1
1. Use some of Budd’s heuristics (Box 16.1) to evaluate a website that you visit regularly. Do these heuristics help you to identify important usability and user experience issues? If so, how?
2. How does being aware of the heuristics influence how you interact with the website?
3. Was it difficult to use these heuristics?

Comment
1. The heuristics focus on key usability criteria, such as whether the interface seems unnecessarily complex and how color is used. Budd’s heuristics also encourage consideration of how the user feels about the experience of interacting with a website.
2. Being aware of the heuristics may lead to a stronger focus on the design and interaction, and it can raise awareness of what the user is trying to do and how the website is responding.
3. When applied at a high level, these guidelines can be tricky to use. For example, what exactly does “clarity” mean in regard to a website? Although the detailed list (write clear, concise copy; only use technical language for a technical audience, and so on) provides some guidance, making the evaluation task a bit easier, it may still seem quite difficult, particularly for those not used to doing heuristic evaluations.
The heuristics focus the researchers’ attention on particular issues, so selecting appropriate heuristics is critically important. Even so, sometimes there is disagreement among researchers, as discussed in the next “Dilemma.”

DILEMMA Classic Problems or False Alarms?

Some researchers and designers may have the impression that heuristic evaluation is a panacea that can reveal all that is wrong with a design with little demand on a design team’s resources. However, in addition to being quite difficult to use, as just discussed, heuristic evaluation has other problems, such as sometimes missing key problems that would likely be found by testing the product with real users. Shortly after heuristic evaluation was developed, several independent studies compared it with other methods, particularly user testing. They found that the different approaches often identify different problems and that sometimes heuristic evaluation misses severe problems (Karat, 1994). In addition, its efficacy can be influenced both by the number of experts and by the nature of the problems, as mentioned earlier (Cockton and Woolrych, 2001; Woolrych and Cockton, 2001). Heuristic evaluation, therefore, should not be viewed as a replacement for user testing.

Another issue concerns researchers reporting problems that don’t exist. In other words, some of the researchers’ predictions are wrong. Bill Bailey (2001) cites analyses from three published sources showing that about 33 percent of the problems reported were real usability problems, some of which were serious, while others were trivial. However, the researchers missed about 21 percent of users’ problems. Furthermore, about 43 percent of the problems identified by the researchers were not problems at all; they were false alarms. He points out that this means that only about half of the problems identified were true problems: “More specifically, for every true usability problem identified, there will be a little over one false alarm (1.2) and about one-half of one missed problem (0.6). If this analysis is true, the experts tend to identify more false alarms and miss more problems than they have true hits.”

How can the number of false alarms or missed serious problems be reduced? Checking that researchers really have the expertise that is required could help, particularly that they have a good understanding of the target user population. But how can this be done? One way to overcome these problems is to have several researchers. This helps to reduce the impact of one person’s experience or poor performance. Using heuristic evaluation along with user testing and other methods is also a good idea. Providing support for researchers and designers to use heuristics effectively is yet another way to reduce these shortcomings. For example, Bruce Tognazzini (2014) now includes short case studies to illustrate some of the principles that he advocates using as heuristics. Analyzing the meaning of each heuristic and developing a set of questions can also be helpful, as mentioned previously.

Another important issue when designing and evaluating web pages, mobile apps, and other types of products is their accessibility to a broad range of users, for example, people with sight, hearing, and mobility challenges. Many countries now have web content accessibility guidelines (WCAG) to which designers must pay attention, as discussed in Box 16.2.
BOX 16.2 Evaluating for Accessibility Using the Web Content Accessibility Guidelines

The Web Content Accessibility Guidelines (WCAG) are a detailed set of standards about how to ensure that web page content is accessible for users with various disabilities (Lazar et al., 2015). While heuristics such as Ben Shneiderman’s eight golden rules (Shneiderman et al., 2016) and Nielsen and Molich’s heuristic evaluation are well-known within the HCI community, the WCAG is probably the best-known set of interface guidelines or standards outside of the HCI community. Why? Because many countries around the world have laws that require that government websites, and websites of public accommodations (such as hotels, libraries, and retail stores), are accessible for people with disabilities. A majority of those laws, including the Disability Discrimination Act in Australia, the Stanca Act in Italy, the Equality Act in the United Kingdom, and Section 508 of the Rehabilitation Act in the United States, as well as policies such as Canada’s Policy on Communications and Federal Identity and India’s Guidelines for Indian Government Websites, use WCAG as the benchmark for web accessibility.

For more information about the web accessibility guidelines, laws, and policies, see https://www.w3.org/WAI/

The concept of web accessibility is as old as the web itself. Tim Berners-Lee said, “The power of the Web is in its universality. Access by everyone, regardless of disability, is an essential aspect” (https://www.w3.org/Press/IPO-announce). To fulfill this mission, the WCAG were created, approved, and released in 1999. The WCAG were created by committee members from 475 member organizations, including leading tech companies such as Microsoft, Google, and Apple. The process for developing them was transparent and open, and all of the stakeholders, including many members of the HCI community, were encouraged to contribute and comment.

WCAG 2.0 was released in 2008. WCAG 2.1 was released in 2018, with modifications to further improve accessibility for low-vision users and for web content presented on mobile devices. In addition, when designers follow these guidelines, there are often benefits for all users, such as improved readability and search results that are presented in more meaningful ways. While all of the various WCAG documents online would add up to hundreds of printed pages, the key concepts and core requirements are summarized in “WCAG 2.1 at a Glance” (www.w3.org/WAI/standards-guidelines/wcag/glance), a document that could be considered to be a set of HCI heuristics. The key concepts of web accessibility, according to WCAG, are summarized as POUR—Perceivable, Operable, Understandable, and Robust.

1. Perceivable
1.1 Provide text alternatives for non-text content.
1.2 Provide captions and other alternatives for multimedia.
1.3 Create content that can be presented in different ways, including by assistive technologies, without losing meaning.
1.4 Make it easier for users to see and hear content.

2. Operable
2.1 Make all functionality available from a keyboard.
2.2 Give users enough time to read and use content.
2.3 Do not use content that causes seizures or physical reactions.
2.4 Help users navigate and find content.
2.5 Make it easy to use inputs other than keyboard.

3. Understandable
3.1 Make text readable and understandable.
3.2 Make content appear and operate in predictable ways.
3.3 Help users avoid and correct mistakes.
4. Robust
4.1 Maximize compatibility with current and future user tools.

Source: https://www.w3.org/WAI/standards-guidelines/wcag/glance/

These guidelines can be used as heuristics to evaluate basic web page accessibility. For example, they can be converted into specific questions such as: Is there ALT text on graphics? Is the entire page usable if a pointing device cannot be used? Is there any flashing content that will trigger seizures? Is there captioning on videos? While some of these issues can be addressed directly by designers, captioning is typically contracted out to organizations that specialize in developing and inserting captions. Governments and large organizations have to make their websites accessible to avoid possible legal action in the United States and some other countries. However, tools and advice to enable small companies and individuals to develop appropriate captions help to make captioning more universal.

Some researchers have created heuristics specifically to ensure that websites and other products are accessible to users with disabilities. For example, Jenn Mankoff et al. (2005) discovered that developers who did heuristic evaluation using a screen reader found 50 percent of known usability problems. Although, admirably, much research focuses on accessibility for people with sight problems, research to support other types of disabilities is also needed. An example is the research by Alexandros Yeratziotis and Panayiotis Zaphiris (2018), who created a method comprising 12 heuristics for evaluating deaf users’ experiences with websites.

While automated software testing tools have been developed in an attempt to apply WCAG guidelines to web pages, this approach has had limited success because there are so many accessibility requirements that are not currently machine-testable. Human inspection using the WCAG, or user testing involving people with disabilities, are still the superior methods for evaluating web compliance with WCAG 2.1 standards.
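To see why only a subset of WCAG requirements is machine-testable, consider what an automated check can and cannot do. The short Python sketch below is an illustration written for this discussion, not an official WCAG tool: it uses the BeautifulSoup HTML parser to flag images without ALT text, which is an easy mechanical check, whereas judging whether the ALT text is actually meaningful, or whether a page makes sense when read aloud by a screen reader, still requires human inspection.

```python
# Illustrative accessibility spot-check, not a WCAG conformance tool.
# It flags only what is mechanically detectable: <img> elements with a
# missing or empty alt attribute (the WCAG "text alternatives" guideline).
from bs4 import BeautifulSoup  # pip install beautifulsoup4

SAMPLE_HTML = """
<html><body>
  <img src="logo.png" alt="Company logo">
  <img src="chart.png">              <!-- missing alt text -->
  <img src="spacer.gif" alt="">      <!-- empty alt: decorative or an error? -->
</body></html>
"""

def images_missing_alt(html: str) -> list[str]:
    """Return the src of every image whose alt attribute is absent or empty."""
    soup = BeautifulSoup(html, "html.parser")
    return [img.get("src", "?") for img in soup.find_all("img")
            if not (img.get("alt") or "").strip()]

if __name__ == "__main__":
    for src in images_missing_alt(SAMPLE_HTML):
        print(f"Possible WCAG issue: image '{src}' has no ALT text")
    # Whether an empty alt is acceptable (a purely decorative image) or a real
    # barrier is a judgment call, exactly the kind of decision that keeps
    # human inspection in the loop.
```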
Turning Design Guidelines, Principles, and Golden Rules into Heuristics

An approach to developing heuristics for evaluating the many different types of digital technologies is to convert design guidelines into heuristics. Often this is done by just using guidelines as though they are heuristics, so guidelines and heuristics are assumed to be interchangeable. A more principled approach is for designers and researchers to translate the design guidelines into questions. For example, Kaisa Väänänen-Vainio-Mattila and Minna Wäljas (2009) adopted this approach when developing heuristics for evaluating user experience with a web service. They identified what they called hedonic heuristics, which directly addressed how users feel about their interactions. These were based on design guidelines concerning whether the user feels that the web service provides a lively place where it is enjoyable to spend time and whether it satisfies a user’s curiosity by frequently offering interesting content. When stated as questions, these become: Is the service a lively place where it is enjoyable to spend time? Does the service satisfy users’ curiosity by frequently offering interesting content?

In a critique of Nielsen’s heuristics (1994) and a similar set of heuristics proposed by Bruce Tognazzini, known as the “First Principles of HCI Design and Usability” (Tognazzini, 2014), Toni Granollers points out the need for revising these heuristics. She claims that there is considerable overlap both within each of the two sets of heuristics and between them. Furthermore, she stresses the need for more guidance in using heuristics and advocates for developing questions as a way to provide this support. Granollers suggests first converting the heuristics into principles and then, as was suggested earlier, identifying pertinent questions to ground the principles so that they are useful. For example, consider the heuristic “visibility and system state,” which is a composite of Nielsen’s and Tognazzini’s heuristics. Granollers suggests the following questions:

Does the application include a visible title page, section or site? Does the user always know where they are located? Does the user always know what the system or application is doing? Are the links clearly defined? Can all actions be visualized directly (i.e., no other actions are required)?
Granollers, 2018, p. 62

Each heuristic is therefore decomposed into a set of questions like these, which could be further adapted for evaluating specific products.

Heuristics (some of which may be guidelines or rules) have been created for designing and evaluating a wide range of products, including shared groupware (Baker et al., 2002), video games (Pinelle et al., 2008), multiplayer games (Pinelle et al., 2009), online communities (Preece and Shneiderman, 2009), information visualization (Forsell and Johansson, 2010), captchas (Reynaga et al., 2015), and e-commerce sites (Harley, 2018b). David Travis (2016), a consultant from Userfocus, has compiled 247 guidelines that are used in evaluations. These include 20 guidelines for home page usability, 20 for search usability, 29 for navigation and information architecture, 23 for trust and credibility, and more.

To access more information about these guidelines, check out David Travis’s website at https://www.userfocus.co.uk/resources/guidelines.html

In the mid-1980s, Ben Shneiderman also proposed design guidelines that are frequently used as heuristics for evaluation. These are called the “eight golden rules.” They were slightly revised recently (Shneiderman et al., 2016) and are now stated as follows:

1. Strive for consistency.
2. Seek universal usability.
3. Offer informative feedback.
4. Design dialogs to yield closure.
5. Prevent errors.
6. Permit easy reversal of actions.
7. Keep users in control.
8. Reduce short-term memory load.

ACTIVITY 16.2 COMPARING HEURISTICS
1. Compare Nielsen’s usability heuristics with Shneiderman’s eight golden rules. Which are similar, and which are different?
2. Then select another set of heuristics or guidelines for evaluating a system in which you are particularly interested and add them to the comparison.

Comment
1. Only a few heuristics and golden rules nearly match, for instance, Nielsen’s guidelines for “consistency and standards,” “error prevention,” and “user control and freedom” match up with Shneiderman’s rules of “striving for consistency,” “prevent errors,” and “keep users in control.” Looking deeper, Nielsen’s “help users recognize, diagnose and recover from errors” and “help and documentation” map with Shneiderman’s “offer informative feedback.” It is harder to find heuristics and golden rules that are unique to each researcher’s set; “aesthetic and minimalist design” appears only in Nielsen’s list, whereas “seek universal usability” appears only in Shneiderman’s list.
However, with even deeper analysis, it could be argued that there is considerable overlap between the two sets. Without examining and considering each heuristic and guideline in detail, making comparisons like this is not straightforward. It is therefore difficult to judge when faced with choosing between these and/or other heuristics. In the end, perhaps the best way forward is for researchers to select the set of heuristics that seems most appropriate for their own evaluation context.
2. We selected the web accessibility guidelines listed in Box 16.2. Unlike the Nielsen heuristics and Shneiderman’s eight golden rules, these guidelines specifically target the accessibility of websites for users with disabilities, particularly those who are blind or have limited vision. The ones under “perceivable,” “operable,” and “robust” do not appear in either of the other two lists. The guidelines listed for “understandable” are more like those in Nielsen’s and Shneiderman’s lists. They focus on reminding designers to make content appear in consistent and predictable ways and to help users to avoid making mistakes.

16.2.2 Walk-Throughs

Walk-throughs offer an alternative approach to heuristic evaluation for predicting users’ problems without doing user testing. As the name suggests, walk-throughs involve walking through a task with the product and noting problematic usability features. While most walk-through methods do not involve users, others, such as pluralistic walk-throughs, involve a team that may include users, as well as developers and usability specialists. In this section, we consider cognitive and pluralistic walk-throughs. Both were originally developed for evaluating desktop systems, but, as with heuristic evaluation, they can be adapted for other kinds of interfaces.

Cognitive Walk-Throughs

Cognitive walk-throughs involve simulating how users go about problem-solving at each step in a human-computer interaction. A cognitive walk-through, as the name implies, takes a cognitive perspective in which the focus is on evaluating designs for ease of learning—a focus that is motivated by observations that users learn by exploration. This well-established method (Wharton et al., 1994) is now often integrated with a range of other evaluation and design processes. See, for example, the Jared Spool blog at https://medium.com/@jmspool (Spool, 2018).

The main steps involved in cognitive walk-throughs are as follows:

1. The characteristics of typical users are identified and documented, and sample tasks are developed that focus on the aspects of the design to be evaluated. A description, mock-up, or prototype of the interface to be developed is also produced, along with a clear sequence of the actions needed for the users to complete the task.
2. A designer and one or more UX researchers come together to do the analysis.
3. The UX researchers walk through the action sequences for each task, placing it within the context of a typical scenario. As they do this, they try to answer the following questions:
a. Will the correct action be sufficiently evident to the user? (Will the user know what to do to achieve the task?)
b. Will the user notice that the correct action is available? (Can users see the button or menu item that they should use for the next action? Is it apparent when it is needed?)
c. Will the user associate and interpret the response from the action correctly? (Will users know from the feedback that they have made a correct or incorrect choice of action?)
In other words, will users know what to do, see how to do it, and understand from feedback whether the action was completed correctly or not?
4. As the walk-through is being done, a record of critical information is compiled.
a. The assumptions about what would cause problems and why are identified.
b. Notes about side issues and design changes are made.
c. A summary of the results is compiled.
5. The design is then revised to fix the problems presented. Before making the fix, insights derived from the walk-through are often checked by testing them with real users.

When doing a cognitive walk-through, it is important to document the process, keeping account of what works and what doesn’t. A standardized feedback form can be used in which answers are recorded to each question. Any negative answers are carefully documented on a separate form, along with details of the product, its version number, and the date of the evaluation. It is also useful to document the severity of the problems, for example, how likely a problem is to occur and how serious it will be for users. The form can also be used to record the process details outlined in steps 1–4.

Brad Dalrymple (2017) describes doing a walk-through with himself as the user in three steps. Notice that there are fewer steps and that they are a bit different from those previously listed.
1. Identify the user goal you want to examine.
2. Identify the tasks you must complete to accomplish that goal.
3. Document the experience while completing the tasks.
Dalrymple provides an example of the actions that he needs to go through to create a Spotify playlist (the task) of music for guests who will attend his dinner party (the goal).

Check out this link for the Dalrymple cognitive walk-through to create a Spotify playlist: https://medium.com/user-research/cognitive-walk-throughs-b84c4f0a14d4

Compared with heuristic evaluation, walk-throughs focus more closely on identifying specific user problems at a detailed level.
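The standardized feedback form described earlier can be kept very simple. The following Python sketch is one hypothetical way to structure it; the field names and the severity scale are illustrative choices made for this example, not part of the method as published, but they show how answers to the three walk-through questions and any negative findings can be recorded consistently for each step of a task.

```python
# Hypothetical record-keeping structure for a cognitive walk-through.
# Field names and the 1-4 severity scale are illustrative choices, not
# part of the published method.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class StepRecord:
    action: str                   # the action the user is expected to take
    knows_what_to_do: bool        # Q1: will users know what to do?
    sees_how_to_do_it: bool       # Q2: will they notice the correct control?
    understands_feedback: bool    # Q3: will they interpret the response correctly?
    notes: str = ""               # assumptions, side issues, design-change ideas
    severity: int | None = None   # 1 = cosmetic ... 4 = blocks task completion

@dataclass
class WalkthroughReport:
    product: str
    version: str
    evaluated_on: date
    task: str
    steps: list[StepRecord] = field(default_factory=list)

    def problems(self) -> list[StepRecord]:
        """Steps with at least one negative answer, for the debriefing session."""
        return [s for s in self.steps
                if not (s.knows_what_to_do and s.sees_how_to_do_it
                        and s.understands_feedback)]

# Example use, based loosely on Dalrymple's playlist task (details invented).
report = WalkthroughReport("Music streaming app", "hypothetical build", date.today(),
                           task="Create a playlist of music for dinner-party guests")
report.steps.append(StepRecord(
    action="Tap the new-playlist control in the library screen",
    knows_what_to_do=True, sees_how_to_do_it=False, understands_feedback=True,
    notes="Control may be overlooked on the first pass through the interface",
    severity=2))
print(f"{len(report.problems())} problem step(s) recorded for '{report.task}'")
```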
Another type of walk-through that takes a semiotic engineering perspective is described in Box 16.3.

BOX 16.3 A Semiotic Engineering Inspection Technique

Humans use a variety of signs and symbols to communicate and encode information. These include everyday things like road signs, written or spoken words, mathematical symbols, gestures, and icons. The study of how signs and symbols are constituted, interpreted, and produced is known as semiotics.

UX designs use a variety of signs to communicate meanings to users. Some of these are well established, such as the trashcan for deleting files, while others are created for or used only in particular types of applications, such as a row of birds in a bird identification app (see Figure 16.3). The goal of UX designers is that users of their designs understand what they mean to communicate with familiar and unfamiliar signs alike.

Figure 16.3 (a) Icons for a trashcan and (b) bird symbols. Source: (a) University of Maryland; (b) Merlin Bird ID app, Cornell Lab of Ornithology

An important aspect of UX design is how to get the designers’ message across to the users by means of interaction signs alone. Knowledge of semiotic and engineering concepts—brought together in the semiotic engineering of human interaction with and through digital technologies (de Souza, 2005)—contributes to improving the communication of principles, features, and values of UX design. The primary method used to evaluate the quality of semiotic engineering is SigniFYIng Message, an inspection procedure (de Souza et al., 2016) that focuses on the communicative power of signs that UX designers can choose in order to communicate their message to users. These are the very signs through which users, in turn, will be able to express what they want to do, explore, or experience during interaction. The method is suitable for evaluating small portions of a UX design in detail. When carrying out this kind of semiotic evaluation, inspectors are guided by specific questions about three types of interaction signs.

• Static signs, which communicate what they mean instantly and do not require further interaction for a user to make sense of them.
• Dynamic signs, which only communicate meaning over time and through interaction. In other words, the user can only make sense of them if they engage in interaction.
• Metalinguistic signs, which can be static or dynamic. Their distinctive feature is that their meaning is an explanation, a description, some information, a warning, or a commentary about another interface sign.

Figure 16.4 shows examples of how these signs achieve communication within four screens of a smartphone app for arranging meetings. To help users avoid time zone errors when meeting participants who are in different time zones, UX designers may elect to communicate times using Greenwich mean time (GMT) and also expose their rationale to users.

Figure 16.4 Examples of static, dynamic, and metalinguistic signs used in UX design sketches for a meeting arrangement app. Source: de Souza et al. (2016). Used courtesy of Springer Nature

The outcome of a SigniFYIng Message inspection is an assessment of the quality of the messages and the strategies of communication that a piece of UX design offers to the users. Using this information, designers may choose to modify the signs to clarify the communication.

ACTIVITY 16.3
Conduct a cognitive walk-through for typical users who want to buy a copy of this book as an ebook at www.amazon.com or www.wiley.com. Follow the steps outlined earlier by Cathleen Wharton (Wharton et al., 2009).

Comment
Step 1
Typical users: Students and professional designers who use the web regularly.
Task: To buy an ebook version of this book from www.amazon.com or www.wiley.com.
Step 2
You will play the role of the expert evaluator.
Step 3
(Note that the interface for www.amazon.com or www.wiley.com may have changed since the authors did this evaluation.) The first action will probably be to select the search box on the home page of the website selected and then type in the title or names of the author(s) of the book.
Q: Will users know what to do?
A: Yes. They know that they must find books, and the search box is a good place to start.
Q: Will users see how to do it?
A: Yes. They have seen a search box before, will type in the appropriate term, and will click the Go or Search icon.
Q: Will users understand from the feedback provided whether the action was correct or not?
A: Yes.
Their action should take them to a page that shows them the cover of this book. They need to click this or a Buy icon next to the cover of the book.
Q: Will users understand from the feedback provided whether the action was correct or not?
A: Yes. They have probably done this before, and they will be able to continue to purchase the book.

ACTIVITY 16.4
From your experience of reading about and trying a heuristic evaluation and cognitive walk-through, how do you think they compare for evaluating a website in terms of the following?
1. The time typically needed to do each kind of evaluation
2. The suitability of each method for evaluating a whole website

Comment
1. A cognitive walk-through would typically take longer because it is a more detailed process than a heuristic evaluation.
2. A cognitive walk-through would typically not be used to evaluate a whole website unless it was a small one. A cognitive walk-through is a detailed process, whereas a heuristic evaluation is more holistic.

A variation of a cognitive walk-through was developed by Rick Spencer (2000) to overcome some problems that he encountered when using the original form of a cognitive walk-through with a design team. The first problem was that answering the questions and discussing the answers took too long. Second, the designers tended to be defensive, often invoking long explanations of cognitive theory to justify their designs. This was particularly difficult because it undermined the efficacy of the method and the social relationships of team members. To cope with these problems, he adapted the method by asking fewer detailed questions and curtailing discussion. This meant that the analysis was more coarse-grained but could normally be completed in about 2.5 hours, depending on the task being evaluated by the cognitive walk-through. He also identified a leader and set strong ground rules for the session, including a ban on defending a design, debating cognitive theory, or doing designs on the fly.

More recently, Valentina Grigoreanu and Manal Mohanna (2013) modified the cognitive walk-through so that it could be used effectively within an agile design process in which a quick turnaround in design-evaluate-design cycles is needed. Their method involves an informal, simplified streamlined cognitive walk-through (SSCW) followed by an informal pluralistic walk-through (discussed next). When compared to a traditional user study on the same user interface, they found that approximately 80 percent of the findings from the user study were also revealed by the SSCW.

A discussion of the value of the cognitive walk-through method for evaluating various devices can be found at www.userfocus.co.uk/articles/cogwalk.html

Pluralistic Walk-Throughs

Pluralistic walk-throughs are another type of well-established walk-through in which users, developers, and usability researchers work together to step through a task scenario. As they do this, they discuss usability issues associated with dialog elements involved in the scenario steps (Nielsen and Mack, 1994). In a pluralistic walk-through, each person is asked to assume the role of a typical user. Scenarios of use, consisting of a few prototype screens, are given to each person, who writes down the sequence of actions that they would take to move from one screen to another, without conferring with each other.
Then they all discuss the actions they each suggested before moving on to the next round of screens. This process continues until all of the scenarios have been evaluated (Bias, 1994).

The benefits of pluralistic walk-throughs include a strong focus on users’ tasks at a detailed level, that is, looking at the steps taken. This level of analysis can be invaluable for certain kinds of systems, such as safety-critical ones, where a usability problem identified for a single step could be critical to its safety or efficiency. The approach lends itself well to participatory design practices, as discussed in Chapter 12, “Design, Prototyping, and Construction,” by involving a multidisciplinary team in which users play a key role. Furthermore, the researchers bring a variety of expertise and opinions for interpreting each stage of the interaction. The limitations of this approach include having to get the researchers together at one time and then proceed at the rate of the slowest. Furthermore, only a limited number of scenarios, and hence paths through the interface, can usually be explored because of time constraints.

For an overview of walk-throughs and an example of a cognitive walk-through of iTunes, see the following site: http://team17-cs3240.blogspot.com/2012/03/cognitive-walkthrough-andpluralistic.html
Note: The link to pluralistic walk-throughs may not work correctly on all browsers.

16.3 Analytics and A/B Testing

A variety of users’ actions can be recorded automatically by software, including key presses, mouse or other pointing-device movements, time spent searching a web page, looking at help systems, and task flow through software modules. A key advantage of logging activity automatically is that it is unobtrusive, provided that the system’s performance is not affected, but it also raises ethical concerns about observing participants if this is done without their knowledge, as discussed in Chapter 10, “Data at Scale.” Another advantage is that large volumes of data can be logged automatically and then explored and analyzed using visualization and other tools.

16.3.1 Web Analytics

Web analytics is a form of interaction logging that was created specifically to analyze users’ activity on websites so that designers could modify their designs to attract and retain customers. For example, if a website promises users information about how to plant a wildflower garden but the home page is unattractive and it only shows gardens in arid and tropical regions, then users from more temperate zones will not look any further because the information they see isn’t relevant to them. These users become one-time visitors and leave to look for other websites that contain the information they need to create their gardens. If the website is used by thousands of users and a small number of users do not return, this loss of users may not be noticed by the web designers and web owners unless they track users’ activities.

Using web analytics, web designers and developers can trace the activity of the users who visit their website. They can see how many people came to the site, how many stayed and for how long, and which pages they visited. They can also find out where the users came from and much more. Web analytics is therefore a powerful evaluation tool for web designers that can be used on its own or in conjunction with other types of evaluation, particularly user testing. For instance, web analytics can provide a “big-picture” overview of user interaction on a website, whereas user testing with a few typical users can reveal details about UX design problems that need to be fixed.
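Under the hood, both general interaction logging and web analytics rest on the same idea: each user action is written somewhere as a timestamped event record that can later be aggregated. The sketch below shows one hypothetical way an application might do this; the event names and fields are invented for illustration, and real products use their own instrumentation or third-party analytics libraries rather than this code.

```python
# Hypothetical interaction logger: appends one timestamped JSON record per
# user event. Event names and fields are invented for illustration.
import json
import time
from pathlib import Path

LOG_FILE = Path("interaction_log.jsonl")

def log_event(session_id: str, event: str, **details) -> None:
    """Append a single user-interaction event to the log (JSON Lines format)."""
    record = {"t": time.time(), "session": session_id, "event": event, **details}
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: the kinds of events an evaluator might later analyze.
log_event("sess-42", "page_view", page="/chapters")
log_event("sess-42", "key_press", key="Enter", widget="search_box")
log_event("sess-42", "mouse_click", target="download_slides_button")
log_event("sess-42", "help_opened", topic="citations")
```

Everything a dashboard such as Google Analytics reports (visits, time on site, pages viewed, and so on) is ultimately an aggregation over records like these.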
Because the goal of using web analytics is to help designers optimize how users use a website, web analytics is especially valued by businesses and market research organizations. For example, web analytics can be used to evaluate the effectiveness of a print or media advertising campaign by showing how traffic to a website changes during and after the campaign.

Web analytics are also used in evaluating non-transactional products such as information and entertainment websites, including hobby, music, games, blogs, and personal websites (refer to Sleeper et al., 2014), and for learning. When analytics are used in learning, they are often referred to as learning analytics (for example, Oviatt et al., 2013; Educause, 2016). Learning analytics play a strong role in evaluating learners’ activities in massive open online courses (MOOCs) and with Open Education Resources (OERs). The designers of these systems are interested in questions such as: At what point do learners tend to drop out, and why? Other types of specialist analytics have also been developed that can be used in evaluation studies, such as visual analytics (discussed in Chapter 10, “Data at Scale”), in which thousands and often millions of data points are displayed and can be manipulated visually, as in social network analysis (Hansen et al., 2019).

Box 16.5 and Box 16.6 contain two short case examples of web analytics being used in different evaluation contexts. The first is an early example designed to evaluate visitor traffic to a website for Mountain Wines of California. The second shows the use of Google Analytics for evaluating the use of a community website for air monitoring.

A video of Simon Buckingham Shum’s 2014 keynote presentation at the EdMedia 2014 Conference can be found at http://people.kmi.open.ac.uk/sbs/2014/06/edmedia2014-keynote/ The video introduces learning analytics and how analytics are used to answer key questions in a world where people are dealing with large volumes of digital data.

Using Web Analytics

There are two types of web analytics: on-site and off-site analytics. On-site analytics are used by website owners to measure visitor behavior. Off-site analytics measure a website’s visibility and potential to acquire an audience on the Internet regardless of who owns the website. In recent years, however, the difference between off-site and on-site analytics has blurred, but some people still use these terms. Additional sources may also be used to augment the data collected about a website, such as email, direct mail campaign data, sales, and history data, which can be paired with web traffic data to provide further insights into users’ behavior.

Google Analytics

Even as early as 2012, Google Analytics was the most widely used on-site web analytics and statistics service. More than 50 percent of the 10,000 most popular websites at that time (Empson, 2012) used Google Analytics, and its popularity continues to soar. Figure 16.5 shows parts of the Google Analytics dashboard for the accompanying website for the previous edition of this book, id-book.com, for the week spanning the end of November 2018 to the beginning of December 2018.
The first segment (a) shows information about who accessed the site and how long they stayed, the second segment (b) shows the devices used to view the website and the pages visited, and the third segment (c) shows the languages spoken by the users.

Figure 16.5 Segments of the Google Analytics dashboard for id-book.com in December 2018: (a) audience overview, (b) the devices used to access the site, and (c) the languages of the users

ACTIVITY 16.5
Consider the three screenshot segments shown in Figure 16.5 from the Google Analytics for id-book.com, and then answer the following questions.
1. How many people visited the site during this period?
2. What do you think someone might look at in 2 minutes, 37 seconds (the average time they spent on the site)?
3. Bounce rate refers to the percentage of visitors who view just one page of your site. What is the bounce rate for this book, and why do you think this might be a useful metric to capture for any website?
4. Which devices are being used to access the site?
5. Which were the three largest language groups during the period, and what can you say about the bounce rate for each of them?

Comment
1. 1,723 users visited the site over this period. Notice that some users must have had more than one session, since the number of users is not the same as the number of sessions, which was 2,271.
at http://www.stateofdigital.com/google-analyticsdashboards/ You can also study an online course, developed by FutureLearn, on data science with Google Analytics at www.futurelearn.com/courses/data-science-googleanalytics/ 571 572 16 E V A L U A T I O N : I N S P E C T I O N S , A N A LY T I C S , A N D M O D E L S BOX 16.4 Other Analytics Tools In addition to Google Analytics, other tools continue to emerge that provide additional layers of information, good access control options, and raw and real-time data collection. Moz Analytics Tracks search marketing, social media marketing, brand activity, links, and content marketing, and it is particularly useful for link management and analysis: www.moz.com TruSocialMetrics Tracks social media metrics, and it helps calculate social media marketing return on investment: www.truesocialmetrics.com Clicky Comprehensive and real-time analytics tool that shows individual visitors and the actions they take, and it helps define what people from different demographics find interesting: www.clicky.com KISSmetrics Detailed analytics tool that displays what website visitors are doing on your website before, during, and after they buy: www.kissmetrics.com Crazy Egg Tracks visitor clicks based on where they are specifically clicking, and it creates click heat maps useful for website design, usability, and conversion: www.crazyegg.com ClickTale Records website visitor actions and uses meta-statistics to create visual heat map reports on customer mouse movement, scrolling, and other visitor behaviors: www .clicktale.com There are many sites on the web that provide lists of analytics tools. One of these, which includes some tools in addition to those mentioned in Box 16.4, is the following: https://www.computerworlduk.com/galleries/data/best-web-analytics-toolsalternatives-google-analytics-3628473/ BOX 16.5 Tracking Visitors to Mountain Wines Website In this study, Mountain Wines of California hired VisiStat to do an early study of the traffic to its website. Mountain Wines wanted to find ways to encourage more visitors to come to its website with the hope of enticing them to visit the winery. The first step to achieving this goal was to discover how many visitors were currently visiting the website, what they did there, and where they came from. Obtaining analytics about the website enabled Mountain Wines to start to understand what was happening and how to increase the number of visitors (VisiStat, 2010). Part of the results of this early analysis are shown in Figure 16.6, which provides an overview of the number of page views provided by VisiStat. Figure 16.7 shows where some of the IP addresses are located. 16.3 A N A LY T I C S A N D A / B T E S T I N G Using this and other data provided by VisiStat, the Mountain Wines founders could see visitor totals, traffic averages, traffic sources, visitor activity, and more. They discovered the importance of visibility for their top search words; they could pinpoint where their visitors were going on their website; and they could see where their visitors were geographically located. 
Figure 16.6 A general view of the kind of data provided by VisiStat Source: http://www.visistat.com/tracking/monthly-page-views.php Figure 16.7 Where the 13 visitors to the website are located by the IP address Source: http://www.visistat.com/tracking/monthly-page-views.php BOX 16.6 Using Google Analytics for Air Quality Monitoring Many parts of the world suffer from poor air quality caused by pollution from industry, traffic congestion, and forest fires. More recently, fires in California, the northwest United States, Canada, and parts of Europe have created severe air quality problems. Consequently, communities are developing devices to crowdsource air quality readings for monitoring the quality of the air that they breathe. In one of these community-empowered air quality monitoring project, Yen-Chia Hsu and her colleagues (2017) developed a website that integrates animated smoke images, data from sensors, and crowdsourced smell reports and wind data. 573 574 16 E V A L U A T I O N : I N S P E C T I O N S , A N A LY T I C S , A N D M O D E L S Having enabled the community to monitor its own air quality and to collect reliable