F27ID Testing and Evaluation
Summary
This document provides notes on testing and evaluation in interaction design. It covers the importance of testing, evaluation goals, measuring usability, and different evaluation methods. It also discusses questionnaires, observations, interviews, and experimental research. The document is intended for university-level students in interaction design.
Full Transcript
Testing and Evaluation
F27ID Introduction to Interaction Design

Overview
- Importance of testing and evaluation
- Evaluation goals
- Measuring usability
- Categories and methods of evaluation
- Choosing the right one(s)
- Intro to experimental design – to be continued in F28ED

Why is testing and evaluation important?
Testing and evaluation help you discover improvements in areas such as:
- Performance/Efficiency: How much time, and how many steps, are required for people to complete basic tasks?
- Accuracy: How many mistakes do people make? Are they fatal, or recoverable with the right information?
- Recall: How much does the person remember afterwards, or after periods of non-use?
- Emotional response: How does the person feel about the tasks completed? Is the person confident, stressed? Would the user recommend this system to a friend?

Evaluation goals
- Assess the extent of system functionality
- Assess the effect on the user
- Identify specific problems
Designers need to check whether their ideas are really what users need/want, and whether the final product works as expected. To do that, we need some form of method, or more specifically, empirical methods for HCI.

Usability
Remember usability? Usability refers to the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use (ISO 9241-11).

Usability measures the quality of a user's experience when interacting with a product or system:
- Ease of learning
- Efficiency of use
- Memorability
- Error frequency and severity
- Satisfaction (subjective for each participant)

User evaluation
User-based evaluation is considered to yield the most reliable and valid estimate of an application's usability. In a typical user-based evaluation, participants are asked to perform a set of tasks with the technology. Depending on the primary focus of the evaluator, the users' success at completing the tasks and their speed of performance may be recorded.

Some categories of evaluation
- Lab/controlled studies (experimental research, usability testing)
- Field/in-the-wild studies
- Observation methods (think aloud, video analysis)
- Questionnaires, interviews, focus groups
Some are quantitative, others are qualitative.

Some methods
The evaluation approach influences the methods used and, in turn, how data is collected, analysed and presented.
Usability testing, for example, typically:
- Involves users, interviews, participatory design, questionnaires
- Is conducted in the laboratory or in a natural setting
- Has the primary goal of testing how usable the interface is with the intended populations
Field/in-the-wild studies typically:
- Involve observation and interviews
- Do not involve controlled tests in a laboratory
- Produce mainly qualitative data

Which one should you use?
Should you use a quantitative or a qualitative approach?
- Quantitative refers to methods where you gather numbers, e.g. time of completion, number of errors, Likert scales (e.g. 1-5) – see the short sketch after this section.
- Qualitative refers to words, e.g. interviews, open-ended questions.

Which one should you use?
Data is either qualitative or quantitative, and some methods can collect both. Questionnaires, for example, can be both: quantitative (ratings-based) and qualitative (short open questions).
Selection depends on:
- The overall research or evaluation goal
- Ethical and practical issues
- Resources, cost, logistics
- Availability of participants and sample size
- Skills of the team, background, philosophy
Hence an evaluation proposal/hypothesis/research question(s) is imperative.
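As a concrete illustration of the quantitative measures mentioned above (completion time, error counts, 1-5 Likert ratings), here is a minimal Python sketch that summarises some hypothetical session data. The variable names and all numbers are invented for illustration; they are not part of the lecture material.

# Minimal sketch (Python standard library only): summarising hypothetical
# quantitative usability data - completion time, errors, Likert satisfaction.
from statistics import mean

# One record per participant: seconds to complete the task, number of errors,
# and a 1-5 Likert rating of satisfaction. All values are made up.
sessions = [
    {"time_s": 42.1, "errors": 0, "satisfaction": 4},
    {"time_s": 57.8, "errors": 2, "satisfaction": 3},
    {"time_s": 35.4, "errors": 1, "satisfaction": 5},
    {"time_s": 63.0, "errors": 3, "satisfaction": 2},
]

mean_time = mean(s["time_s"] for s in sessions)                # efficiency
mean_errors = mean(s["errors"] for s in sessions)              # accuracy
mean_satisfaction = mean(s["satisfaction"] for s in sessions)  # subjective rating

print(f"Mean completion time: {mean_time:.1f} s")
print(f"Mean errors per task: {mean_errors:.1f}")
print(f"Mean satisfaction (1-5 Likert): {mean_satisfaction:.1f}")

Qualitative data (interview transcripts, open-ended answers) would instead be analysed by reading and coding the text, not by computing summary numbers like these.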
Other considerations
- Reliability/replicability: repeating the experiment will give you similar results
- Validity: the accuracy of a method in measuring what it is intended to measure
- Bias: the unintentional influence of the experimenter's expectations, beliefs or preconceived notions on the outcome of a study or research experiment
- Scope: the boundaries and extent of a study

Questionnaires
Create a new one, or reuse/adapt an existing one? Established questionnaires will give more reliable and repeatable results than new questionnaires. Some questionnaires for assessing the perceived usability of an interactive system:
- Questionnaire for User Interface Satisfaction (QUIS) (1988)
- Computer System Usability Questionnaire (CSUQ) (1995)
- System Usability Scale (SUS) (1996)
- Godspeed questionnaires (for robot perceptions)
- NASA-TLX (for task load index)

Observations
The researcher observes users using the new/current technology and makes notes. Interaction with participants is kept to a minimum. Simulates the real world.

Interviews
The researcher questions the user on a one-to-one basis, usually based on pre-prepared questions: structured (all questions), semi-structured (driven by questions) or unstructured (driven by topics).
Advantages:
- Can be varied to suit the context
- Issues can be explored more fully
- Can elicit user views and identify unanticipated problems
- Relatively cheap
Disadvantages:
- Can be very subjective
- Time consuming

Focus groups
A discussion-based group interview, often used for market research.
Features:
- Comprises people with a particular set of characteristics
- Moderated (often by the researcher)
- Tends to be relatively informal
- Centres around open questions designed to generate dynamic discussion (participants may also be required to complete a questionnaire)
Not suitable for evaluation/late stages. Why?

Think-aloud protocols
The user is observed performing a task and is asked to describe what they are doing and why, what they think is happening, etc.
Advantages:
- Simplicity: requires little expertise
- Can provide useful insights
- Can show how the system is actually used
Disadvantages:
- Subjective
- Selective
- The act of describing may alter task performance
Remember the photobooth prototype video.

Experimental research
A test of the effect of a single variable by changing it while keeping all other variables the same. A controlled experiment generally compares the results obtained from an experimental sample against a control sample.
General terms:
- Independent variables (test conditions)
- Dependent variables (what you measure)
- Confounding variables (not controlled, but may affect measurements)

Independent/dependent variables
Independent variables (IV): what you as the researcher vary or manipulate, e.g. type of interface/app, type of feedback, type of menu.
Dependent variables (DV): what you measure, e.g. time, performance (accuracy, errors), subjective ratings.
Another example from interaction design: list vs dropdown for data-entry selection.

Confounding variables
A confounding variable provides an alternative explanation for the results that we see and can cloud the effect of the experimental conditions. In a plant-growth experiment, for example, confounding variables could include:
- Temperature at the time of the experiment
- Type of soil
- Plant genetics
What are some possible confounding variables for UI testing? (A small assignment sketch for the list vs dropdown example follows below.)
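To make the independent/dependent variable distinction concrete, here is a small Python sketch of the list vs dropdown example: participants are randomly assigned to one interface condition (the independent variable) and their task completion time is recorded (the dependent variable). Random assignment is one simple way to spread uncontrolled confounding variables (experience, fatigue, device differences) across conditions. The participant IDs and the run_task() timing function are hypothetical placeholders, not real data or a real measurement tool.

# Minimal sketch (Python): between-subjects assignment for "list vs dropdown".
# IV = interface type; DV = task completion time. All values are placeholders.
import random

participants = [f"P{i:02d}" for i in range(1, 13)]  # 12 hypothetical participants
random.shuffle(participants)                        # randomise to reduce confounds

# Split the shuffled list in half: each participant sees only one condition.
half = len(participants) // 2
conditions = {
    "list":     participants[:half],
    "dropdown": participants[half:],
}

def run_task(participant_id, interface):
    """Placeholder: run the data-entry task and return completion time in seconds."""
    return random.uniform(20, 60)  # stand-in for a real measurement

# Record the dependent variable (time) for each participant under their condition.
results = {
    cond: [run_task(p, cond) for p in ps]
    for cond, ps in conditions.items()
}

for cond, times in results.items():
    print(cond, "mean completion time:", round(sum(times) / len(times), 1), "s")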
Experimental design
Within/between subject designs (a short analysis sketch contrasting the two appears at the end of these notes).
Within-subject design:
- Each participant performs the experiment under each condition
- Transfer of learning is possible
- Less costly and less likely to suffer from user variation
Between-subject design:
- Each participant performs under only one condition
- No transfer of learning
- More participants required
- Variation can bias results

Method
- Procedure: the sequence of steps from welcoming the participant to the participant leaving the experiment room; in other words, you provide enough detail on the process of data collection to allow another person to repeat your research
- Materials: describe the equipment/instruments used for data collection (computers, microphones, screens, tangible objects, etc.)
- Setup: where the participant was seated, how far from the screen, etc.
- Measurements: details on data collection (what is being measured and how)
- Participants: identify the participants (demographics, etc.)
We will take a much deeper dive into all of this in F28ED next year!! :D

Overview
- Importance of testing and evaluation
- Evaluation goals
- Measuring usability
- Categories and methods of evaluation
- Choosing the right one(s)
- Intro to experimental design – to be continued in F28ED

Further reading
Chapters 14 & 15: https://discovery.hw.ac.uk/permalink/f/1el5916/44hwa_alma2169950000003206
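Analysis sketch for within vs between subject designs. The Python sketch below analyses the same hypothetical completion times in the two ways discussed above: a paired test when every participant used both interfaces (within-subjects), and an independent-samples test when each participant used only one (between-subjects). It assumes SciPy is installed; all numbers are invented for illustration.

# Minimal sketch (Python, assumes SciPy is available): contrasting the analysis
# of within- and between-subjects designs. Completion times are invented.
from scipy import stats

# Completion times (seconds) under two interface conditions.
times_list     = [41.2, 55.0, 38.7, 62.3, 47.9, 50.1]
times_dropdown = [36.5, 49.8, 35.2, 58.0, 44.1, 46.7]

# Within-subjects: the SAME six participants used both interfaces,
# so the samples are paired and a paired t-test is appropriate.
t_within, p_within = stats.ttest_rel(times_list, times_dropdown)

# Between-subjects: two SEPARATE groups of six participants,
# so an independent-samples t-test is appropriate.
t_between, p_between = stats.ttest_ind(times_list, times_dropdown)

print(f"Within-subjects (paired):       t = {t_within:.2f}, p = {p_within:.3f}")
print(f"Between-subjects (independent): t = {t_between:.2f}, p = {p_between:.3f}")

The choice of test follows directly from the design: pairing uses each participant as their own baseline, which is why within-subject designs typically need fewer participants and suffer less from user variation.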