ISTQB CT-AI Syllabus v1.0 PDF
Document Details
2021
ISTQB
Klaudia Dussa-Zieger, Werner Henschelchen, Vipul Kocher, Qin Liu, Stuart Reid, Kyle Siemens, and Adam Leon Smith
Summary
This document is the syllabus for the Certified Tester AI Testing (CT-AI) certification. It covers topics such as an introduction to AI, quality characteristics of AI-based systems, machine learning, and data preparation. The syllabus also details the learning objectives and the exam format for this certification.
Full Transcript
Certified Tester AI Testing (CT-AI) Syllabus
Version 1.0

International Software Testing Qualifications Board

Provided by Alliance for Qualification, Artificial Intelligence United, Chinese Software Testing Qualifications Board, and Korean Software Testing Qualifications Board

Copyright Notice

© International Software Testing Qualifications Board (hereinafter called ISTQB®). ISTQB® is a registered trademark of the International Software Testing Qualifications Board.

Copyright © 2021, the authors: Klaudia Dussa-Zieger (chair), Werner Henschelchen, Vipul Kocher, Qin Liu, Stuart Reid, Kyle Siemens, and Adam Leon Smith. All rights reserved.

The authors hereby transfer the copyright to the ISTQB®. The authors (as current copyright holders) and ISTQB® (as the future copyright holder) have agreed to the following conditions of use:

- Extracts, for non-commercial use, from this document may be copied if the source is acknowledged.
- Any Accredited Training Provider may use this syllabus as the basis for a training course if the authors and the ISTQB® are acknowledged as the source and copyright owners of the syllabus, and provided that any advertisement of such a training course may mention the syllabus only after official Accreditation of the training materials has been received from an ISTQB®-recognized Member Board.
- Any individual or group of individuals may use this syllabus as the basis for articles and books, if the authors and the ISTQB® are acknowledged as the source and copyright owners of the syllabus.
- Any other use of this syllabus is prohibited without first obtaining the approval in writing of the ISTQB®.
- Any ISTQB®-recognized Member Board may translate this syllabus provided they reproduce the abovementioned Copyright Notice in the translated version of the syllabus.
v1.0 2021-10-01 © International Software Testing Qualifications Board

Revision History

Version  Date        Remarks
1.0      2021/10/01  Release for GA

Table of Contents

Copyright Notice ..... 2
Revision History ..... 3
Table of Contents ..... 4
Acknowledgements ..... 8
0 Introduction ..... 9
  0.1 Purpose of this Syllabus ..... 9
  0.2 The Certified Tester AI Testing ..... 9
  0.3 Examinable Learning Objectives and Cognitive Level of Knowledge ..... 9
  0.4 Hands-on Levels of Competency ..... 10
  0.5 The Certified Tester AI Testing Exam ..... 10
  0.6 Accreditation ..... 11
  0.7 Level of Detail ..... 11
  0.8 How this Syllabus is Organized ..... 11
1 Introduction to AI – 105 minutes ..... 13
  1.1 Definition of AI and AI Effect ..... 14
  1.2 Narrow, General and Super AI ..... 14
  1.3 AI-Based and Conventional Systems ..... 14
  1.4 AI Technologies ..... 15
  1.5 AI Development Frameworks ..... 15
  1.6 Hardware for AI-Based Systems ..... 16
  1.7 AI as a Service (AIaaS) ..... 17
    1.7.1 Contracts for AI as a Service ..... 17
    1.7.2 AIaaS Examples ..... 17
  1.8 Pre-Trained Models ..... 18
    1.8.1 Introduction to Pre-Trained Models ..... 18
    1.8.2 Transfer Learning ..... 18
    1.8.3 Risks of using Pre-Trained Models and Transfer Learning ..... 19
  1.9 Standards, Regulations and AI ..... 19
2 Quality Characteristics for AI-Based Systems – 105 minutes ..... 21
  2.1 Flexibility and Adaptability ..... 22
  2.2 Autonomy ..... 22
  2.3 Evolution ..... 22
  2.4 Bias ..... 23
  2.5 Ethics ..... 23
  2.6 Side Effects and Reward Hacking ..... 24
  2.7 Transparency, Interpretability and Explainability ..... 24
  2.8 Safety and AI ..... 25
3 Machine Learning (ML) – Overview – 145 minutes ..... 26
  3.1 Forms of ML ..... 27
    3.1.1 Supervised Learning ..... 27
    3.1.2 Unsupervised Learning ..... 27
    3.1.3 Reinforcement Learning ..... 28
  3.2 ML Workflow ..... 28
  3.3 Selecting a Form of ML ..... 31
  3.4 Factors Involved in ML Algorithm Selection ..... 31
  3.5 Overfitting and Underfitting ..... 32
    3.5.1 Overfitting ..... 32
    3.5.2 Underfitting ..... 32
    3.5.3 Hands-On Exercise: Demonstrate Overfitting and Underfitting ..... 32
4 ML – Data – 230 minutes ..... 33
  4.1 Data Preparation as Part of the ML Workflow ..... 34
    4.1.1 Challenges in Data Preparation ..... 35
    4.1.2 Hands-On Exercise: Data Preparation for ML ..... 35
  4.2 Training, Validation and Test Datasets in the ML Workflow ..... 36
    4.2.1 Hands-On Exercise: Identify Training and Test Data and Create an ML Model ..... 36
  4.3 Dataset Quality Issues ..... 36
  4.4 Data Quality and its Effect on the ML Model ..... 38
  4.5 Data Labelling for Supervised Learning ..... 38
    4.5.1 Approaches to Data Labelling ..... 38
    4.5.2 Mislabeled Data in Datasets ..... 39
5 ML Functional Performance Metrics – 120 minutes ..... 40
  5.1 Confusion Matrix ..... 41
  5.2 Additional ML Functional Performance Metrics for Classification, Regression and Clustering ..... 42
  5.3 Limitations of ML Functional Performance Metrics ..... 42
  5.4 Selecting ML Functional Performance Metrics ..... 43
    5.4.1 Hands-On Exercise: Evaluate the Created ML Model ..... 44
  5.5 Benchmark Suites for ML ..... 44
6 ML – Neural Networks and Testing – 65 minutes ..... 45
  6.1 Neural Networks ..... 46
    6.1.1 Hands-On Exercise: Implement a Simple Perceptron ..... 47
  6.2 Coverage Measures for Neural Networks ..... 48
7 Testing AI-Based Systems Overview – 115 minutes ..... 50
  7.1 Specification of AI-Based Systems ..... 51
  7.2 Test Levels for AI-Based Systems ..... 51
    7.2.1 Input Data Testing ..... 52
    7.2.2 ML Model Testing ..... 52
    7.2.3 Component Testing ..... 52
    7.2.4 Component Integration Testing ..... 52
    7.2.5 System Testing ..... 52
    7.2.6 Acceptance Testing ..... 53
  7.3 Test Data for Testing AI-based Systems ..... 53
  7.4 Testing for Automation Bias in AI-Based Systems ..... 53
  7.5 Documenting an AI Component ..... 54
  7.6 Testing for Concept Drift ..... 55
  7.7 Selecting a Test Approach for an ML System ..... 55
8 Testing AI-Specific Quality Characteristics – 150 minutes ..... 58
  8.1 Challenges Testing Self-Learning Systems ..... 59
  8.2 Testing Autonomous AI-Based Systems ..... 60
  8.3 Testing for Algorithmic, Sample and Inappropriate Bias ..... 60
  8.4 Challenges Testing Probabilistic and Non-Deterministic AI-Based Systems ..... 61
  8.5 Challenges Testing Complex AI-Based Systems ..... 61
  8.6 Testing the Transparency, Interpretability and Explainability of AI-Based Systems ..... 62
    8.6.1 Hands-On Exercise: Model Explainability ..... 63
  8.7 Test Oracles for AI-Based Systems ..... 63
  8.8 Test Objectives and Acceptance Criteria ..... 63
9 Methods and Techniques for the Testing of AI-Based Systems – 245 minutes ..... 66
  9.1 Adversarial Attacks and Data Poisoning ..... 67
    9.1.1 Adversarial Attacks ..... 67
    9.1.2 Data Poisoning ..... 67
  9.2 Pairwise Testing ..... 68
    9.2.1 Hands-On Exercise: Pairwise Testing ..... 68
  9.3 Back-to-Back Testing ..... 68
  9.4 A/B Testing ..... 69
  9.5 Metamorphic Testing (MT) ..... 69
    9.5.1 Hands-On Exercise: Metamorphic Testing ..... 71
  9.6 Experience-Based Testing of AI-Based Systems ..... 71
    9.6.1 Hands-On Exercise: Exploratory Testing and Exploratory Data Analysis (EDA) ..... 72
  9.7 Selecting Test Techniques for AI-Based Systems ..... 73
10 Test Environments for AI-Based Systems – 30 minutes ..... 74
  10.1 Test Environments for AI-Based Systems ..... 75
  10.2 Virtual Test Environments for Testing AI-Based Systems ..... 75
11 Using AI for Testing – 195 minutes ..... 77
  11.1 AI Technologies for Testing ..... 78
    11.1.1 Hands-On Exercise: The Use of AI in Testing ..... 78
  11.2 Using AI to Analyze Reported Defects ..... 78
  11.3 Using AI for Test Case Generation ..... 79
  11.4 Using AI for the Optimization of Regression Test Suites ..... 79
  11.5 Using AI for Defect Prediction ..... 79
    11.5.1 Hands-On Exercise: Build a Defect Prediction System ..... 80
  11.6 Using AI for Testing User Interfaces ..... 80
    11.6.1 Using AI to Test Through the Graphical User Interface (GUI) ..... 80
    11.6.2 Using AI to Test the GUI ..... 81
12 References ..... 82
  12.1 Standards [S] ..... 82
  12.2 ISTQB® Documents [I] ..... 82
  12.3 Books and Articles [B] ..... 83
  12.4 Other References [R] ..... 86
13 Appendix A – Abbreviations ..... 88
14 Appendix B – AI Specific and other Terms ..... 89
15 Index ..... 99

Acknowledgements

This document was formally released by the General Assembly of the ISTQB® on October 1st, 2021.

It was produced by a team from the International Software Testing Qualifications Board: Klaudia Dussa-Zieger (chair), Werner Henschelchen, Vipul Kocher, Qin Liu, Stuart Reid, Kyle Siemens, and Adam Leon Smith.

The team thanks the authors of the three contributing syllabi:
- A4Q: Rex Black, Bruno Legeard, Jeremias Rößler, Adam Leon Smith, Stephan Goericke, Werner Henschelchen
- AiU: Main authors Vipul Kocher, Saurabh Bansal, Srinivas Padmanabhuni and Sonika Bengani, and co-authors Rik Marselis, José M. Diaz Delgado
- CSTQB/KSTQB: Qin Liu, Stuart Reid

The team thanks the Exam, Glossary and Marketing Working Groups for their support throughout the development of the syllabus, Graham Bath for his technical editing, and the Member Boards for their suggestions and input.

The following persons participated in the reviewing and commenting of this syllabus: Laura Albert, Reto Armuzzi, Árpád Beszédes, Armin Born, Géza Bujdosó, Renzo Cerquozzi, Sudeep Chatterjee, Seunghee Choi, Young-jae Choi, Piet de Roo, Myriam Christener, Jean-Baptiste Crouigneau, Guofu Ding, Erwin Engelsma, Hongfei Fan, Péter Földházi Jr., Tamás Gergely, Ferdinand Gramsamer, Attila Gyúri, Matthias Hamburg, Tobias Horn, Jarosław Hryszko, Beata Karpinska, Joan Killeen, Rik Kochuyt, Thomas Letzkus, Chunhui Li, Haiying Liu, Gary Mogyorodi, Rik Marselis, Imre Mészáros, Tetsu Nagata, Ingvar Nordström, Gábor Péterffy, Tal Pe'er, Ralph Pichler, Nishan Portoyan, Meile Posthuma, Adam Roman, Gerhard Runze, Andrew Rutz, Klaus Skafte, Mike Smith, Payal Sobti, Péter Sótér, Michael Stahl, Chris van Bael, Stephanie van Dijck, Robert Werkhoven, Paul Weymouth, Dong Xin, Ester Zabar, Claude Zhang.
0 Introduction

0.1 Purpose of this Syllabus

This syllabus forms the basis for the ISTQB® Certified Tester AI Testing. The ISTQB® provides this syllabus as follows:

1. To member boards, to translate into their local language and to accredit training providers. Member boards may adapt the syllabus to their particular language needs and modify the references to adapt to their local publications.
2. To certification bodies, to derive examination questions in their local language adapted to the learning objectives for this syllabus.
3. To training providers, to produce courseware and determine appropriate teaching methods.
4. To certification candidates, to prepare for the certification exam (either as part of a training course or independently).
5. To the international software and systems engineering community, to advance the profession of software and systems testing, and as a basis for books and articles.

0.2 The Certified Tester AI Testing

The Certified Tester AI Testing is aimed at anyone involved in testing AI-based systems and/or using AI for testing. This includes people in roles such as testers, test analysts, data analysts, test engineers, test consultants, test managers, user acceptance testers and software developers. This certification is also appropriate for anyone who wants a basic understanding of testing AI-based systems and/or AI for testing, such as project managers, quality managers, software development managers, business analysts, operations team members, IT directors and management consultants.
The Certified Tester AI Testing Overview [I03] is a separate document which includes the following information:
- Business outcomes for the syllabus
- Matrix of business outcomes and connection with learning objectives
- Summary of the syllabus
- Relationships among the syllabi

0.3 Examinable Learning Objectives and Cognitive Level of Knowledge

Learning objectives support the business outcomes and are used to create the Certified Tester AI Testing exams. Candidates may be asked to recognize, remember, or recall a keyword or concept mentioned in any of the eleven chapters. The specific learning objective levels are shown at the beginning of each chapter and are classified as follows:
- K1: Remember
- K2: Understand
- K3: Apply
- K4: Analyze

All terms listed as keywords just below chapter headings shall be remembered (K1), even if not explicitly mentioned in the learning objectives.

0.4 Hands-on Levels of Competency

The Certified Tester AI Testing includes hands-on objectives which focus on practical skills and competencies. The following levels apply to hands-on objectives:
- H0: Live demo of an exercise or recorded video.
- H1: Guided exercise. The students follow a sequence of steps performed by the trainer.
- H2: Exercise with hints. The student is given an exercise with relevant hints so the exercise can be solved within the given timeframe, or students take part in a discussion.

Competencies are achieved by performing hands-on exercises, such as those shown in the following list:
- Demonstrate underfitting and overfitting (H0).
- Perform data preparation in support of the creation of an ML model (H2).
- Identify training and test datasets and create an ML model (H2).
- Evaluate the created ML model using selected ML functional performance metrics (H2).
- Experience the implementation of a perceptron (H1).
- Use a tool to show how explainability can be used by testers (H2).
- Apply pairwise testing to derive and execute test cases for an AI-based system (H2).
- Apply metamorphic testing to derive and execute test cases for a given scenario (H2).
- Apply exploratory testing to an AI-based system (H2).
- Discuss, using examples, those activities in testing where AI is less likely to be used (H2).
- Implement a simple AI-based defect prediction system (H2).

0.5 The Certified Tester AI Testing Exam

The Certified Tester AI Testing exam will be based on this syllabus. Answers to exam questions may require the use of material based on more than one section of this syllabus. All sections of the syllabus are examinable, except for the Introduction and Appendices. Standards and books are included as references, but their content is not examinable beyond what is summarized in the syllabus itself from such standards and books. Refer to the "Exam Structure" section of the Certified Tester AI Testing Overview document for further details.

Entry Requirement Note: The ISTQB® Foundation Level certificate shall be obtained before taking the Certified Tester AI Testing exam.

0.6 Accreditation

An ISTQB® Member Board may accredit training providers whose course material follows this syllabus. Training providers should obtain accreditation guidelines from the Member Board or body that performs the accreditation. An accredited course is recognized as conforming to this syllabus and is allowed to have an ISTQB® exam as part of the course. The accreditation guidelines for this syllabus follow the general Accreditation Guidelines published by the Processes Management and Compliance Working Group.

0.7 Level of Detail

The level of detail in this syllabus allows internationally consistent courses and exams.
In order to achieve this goal, the syllabus consists of:
- General instructional objectives describing the intention of the Certified Tester AI Testing.
- A list of terms that students must be able to recall.
- Learning and hands-on objectives for each knowledge area, describing the learning outcomes to be achieved.
- A description of the key concepts, including references to sources such as accepted literature or standards.

The syllabus content is not a description of the entire knowledge area for the testing of AI-based systems; it reflects the level of detail to be covered in Certified Tester AI Testing training courses. It focuses on introducing the basic concepts of artificial intelligence (AI), and machine learning in particular, and on how systems based on these technologies can be tested.

0.8 How this Syllabus is Organized

There are eleven chapters with examinable content. The top-level heading for each chapter specifies the time for the chapter; timing is not provided below chapter level.
For accredited training courses, the syllabus requires a minimum of 25.1 hours of instruction, distributed across the eleven chapters as follows:
- Chapter 1: 105 minutes – Introduction to AI
- Chapter 2: 105 minutes – Quality Characteristics for AI-Based Systems
- Chapter 3: 145 minutes – Machine Learning (ML) – Overview
- Chapter 4: 230 minutes – ML – Data
- Chapter 5: 120 minutes – ML Functional Performance Metrics
- Chapter 6: 65 minutes – ML – Neural Networks and Testing
- Chapter 7: 115 minutes – Testing AI-Based Systems Overview
- Chapter 8: 150 minutes – Testing AI-Specific Quality Characteristics
- Chapter 9: 245 minutes – Methods and Techniques for the Testing of AI-Based Systems
- Chapter 10: 30 minutes – Test Environments for AI-Based Systems
- Chapter 11: 195 minutes – Using AI for Testing

1 Introduction to AI – 105 minutes

Testing Keywords
None

AI-Specific Keywords
AI as a Service (AIaaS), AI development framework, AI effect, AI-based system, artificial intelligence (AI), neural network, deep learning (DL), deep neural network, general AI, General Data Protection Regulation (GDPR), machine learning (ML), narrow AI, pre-trained model, super AI, technological singularity, transfer learning

Learning Objectives for Chapter 1:

1.1 Definition of AI and AI Effect
AI-1.1.1 K2 Describe the AI effect and how it influences the definition of AI.

1.2 Narrow, General and Super AI
AI-1.2.1 K2 Distinguish between narrow AI, general AI, and super AI.

1.3 AI-Based and Conventional Systems
AI-1.3.1 K2 Differentiate between AI-based systems and conventional systems.

1.4 AI Technologies
AI-1.4.1 K1 Recognize the different technologies used to implement AI.

1.5 AI Development Frameworks
AI-1.5.1 K1 Identify popular AI development frameworks.
1.6 Hardware for AI-Based Systems
AI-1.6.1 K2 Compare the choices available for hardware to implement AI-based systems.

1.7 AI as a Service (AIaaS)
AI-1.7.1 K2 Explain the concept of AI as a Service (AIaaS).

1.8 Pre-Trained Models
AI-1.8.1 K2 Explain the use of pre-trained AI models and the risks associated with them.

1.9 Standards, Regulations and AI
AI-1.9.1 K2 Describe how standards apply to AI-based systems.

1.1 Definition of AI and AI Effect

The term artificial intelligence (AI) dates back to the 1950s and refers to the objective of building and programming "intelligent" machines capable of imitating human beings. The definition has evolved significantly since then, and the following definition captures the concept today [S01]:

The capability of an engineered system to acquire, process and apply knowledge and skills.

The way in which people understand the meaning of AI depends on their current perception. In the 1970s, the idea of a computer system that could beat a human at chess was somewhere in the future, and most considered this to be AI. Now, over twenty years after the computer-based system Deep Blue beat world chess champion Garry Kasparov, the "brute force" approach implemented in that system is not considered by many to be true artificial intelligence (i.e., the system did not learn from data and was not capable of self-learning). Similarly, the expert systems of the 1970s and 1980s incorporated human expertise as rules which could be run repeatedly without the expert being present. These were considered to be AI then, but are not considered as such now. The changing perception of what constitutes AI is known as the "AI effect" [R01]. As the perception of AI in society changes, so does its definition. As a result, any definition made today is likely to change in the future and may not match those from the past.
1.2 Narrow, General and Super AI

At a high level, AI can be broken into three categories:

- Narrow AI (also known as weak AI) systems are programmed to carry out a specific task with limited context. This form of AI is currently widely available; examples include game-playing systems, spam filters, test case generators and voice assistants.
- General AI (also known as strong AI) systems have general (wide-ranging) cognitive abilities similar to humans. These AI-based systems can reason and understand their environment as humans do, and act accordingly. As of 2021, no general AI systems have been realized.
- Super AI systems are capable of replicating human cognition (general AI) while making use of massive processing power, practically unlimited memory and access to all human knowledge (e.g., through access to the web). It is thought that super AI systems will quickly become wiser than humans.

The point at which AI-based systems transition from general AI to super AI is commonly known as the technological singularity [B01].

1.3 AI-Based and Conventional Systems

In a typical conventional computer system, the software is programmed by humans using an imperative language, which includes constructs such as if-then-else and loops. It is relatively easy for humans to understand how the system transforms inputs into outputs.

In an AI-based system using machine learning (ML), patterns in data are used by the system to determine how it should react in the future to new data (see Chapter 3 for a detailed explanation of ML). For example, an AI-based image processor designed to identify images of cats is trained with a set of images known to contain cats. The AI determines on its own what patterns or features in the data can be used to identify cats. These patterns and rules are then applied to new images in order to determine if they contain cats. In many AI-based systems, this results in the prediction-making procedure being less easy for humans to understand (see Section 2.7).
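The distinction between explicitly programmed logic and behavior derived from data can be illustrated with a minimal sketch (hypothetical spam-filter example, not from the syllabus; the feature, threshold and data are invented for illustration):

```python
# Illustrative contrast: a conventional, hand-written rule versus a trivially
# "learned" rule whose decision boundary is derived from labeled training data.

# Conventional system: a human encodes the decision logic explicitly.
def is_spam_conventional(num_links: int) -> bool:
    return num_links > 5  # threshold chosen by a programmer

# "ML-style" system: the threshold is inferred from labeled examples instead.
training_data = [(1, False), (2, False), (3, False), (8, True), (9, True), (12, True)]

def learn_threshold(data):
    spam_values = [x for x, label in data if label]
    ham_values = [x for x, label in data if not label]
    # Midpoint between the two class means acts as the learned decision boundary.
    return (sum(spam_values) / len(spam_values) + sum(ham_values) / len(ham_values)) / 2

threshold = learn_threshold(training_data)

def is_spam_learned(num_links: int) -> bool:
    return num_links > threshold

print(threshold)           # boundary inferred from the data, not hard-coded
print(is_spam_learned(7))  # applied to new, unseen input
```

The point of the sketch is only the difference in where the decision logic comes from: in the first function a human wrote it; in the second it was derived from data, which is why real ML systems can be harder for humans to inspect.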
In practice, AI-based systems can be implemented by a variety of technologies (see Section 1.4), and the “AI effect” (see Section 1.1) may determine what is currently considered to be an AI-based system and what is considered to be a conventional system.

1.4 AI Technologies

AI can be implemented using a wide range of technologies (see [B02] for more details), such as:

- Fuzzy logic
- Search algorithms
- Reasoning techniques
  - Rule engines
  - Deductive classifiers
  - Case-based reasoning
  - Procedural reasoning
- Machine learning techniques
  - Neural networks
  - Bayesian models
  - Decision trees
  - Random forest
  - Linear regression
  - Logistic regression
  - Clustering algorithms
  - Genetic algorithms
  - Support vector machine (SVM)

AI-based systems typically implement one or more of these technologies.

1.5 AI Development Frameworks

There are many AI development frameworks available, some of which are focused on specific domains. These frameworks support a range of activities, such as data preparation, algorithm selection, and compilation of models to run on various processors, such as central processing units (CPUs), graphical processing units (GPUs) or Cloud Tensor Processing Units (TPUs). The selection of a particular framework may also depend on aspects such as the programming language used for the implementation and its ease of use. The following frameworks are some of the most popular (as of April 2021):

- Apache MxNet: A deep learning open-source framework used by Amazon for Amazon Web Services (AWS) [R02].
- CNTK: The Microsoft Cognitive Toolkit (CNTK) is an open-source deep-learning toolkit [R03].
- IBM Watson Studio: A suite of tools that support the development of AI solutions [R04].
- Keras: A high-level open-source API, written in the Python language, capable of running on top of TensorFlow and CNTK [R06].
- PyTorch: An open-source ML library operated by Facebook and used for apps applying image processing and natural language processing (NLP). Support is provided for both Python and C++ interfaces [R07].
- Scikit-learn: An open-source ML library for the Python programming language [R08].
- TensorFlow: An open-source ML framework based on data flow graphs for scalable machine learning, provided by Google [R05].

Note that these development frameworks are constantly evolving, sometimes combining, and sometimes being replaced by new frameworks.

1.6 Hardware for AI-Based Systems

A variety of hardware is used for ML model training (see Chapter 3) and model implementation. For example, a model that performs speech recognition may run on a low-end smartphone, although access to the power of cloud computing may be needed to train it. A common approach used when the host device is not connected to the internet is to train the model in the cloud and then deploy it to the host device.

ML typically benefits from hardware that supports the following attributes:

- Low-precision arithmetic: This uses fewer bits for computation (e.g., 8 instead of 32 bits), which is usually all that is needed for ML.
- The ability to work with large data structures (e.g., to support matrix multiplication).
- Massively parallel (concurrent) processing.

General-purpose CPUs provide support for complex operations that are not typically required for ML applications and provide only a few cores. As a result, their architecture is less efficient for training and running ML models when compared to GPUs, which have thousands of cores and which are designed to perform the massively parallel but relatively simple processing of images.
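As a hypothetical illustration of the low-precision arithmetic mentioned above, the following sketch (invented values, not from the syllabus) quantizes floating-point model weights to 8-bit integers and back, trading a small, bounded amount of precision for a fourfold reduction in storage:

```python
# Toy example: linear quantization of float values to 8-bit integers, as
# commonly used to shrink and speed up ML models on low-precision hardware.

def quantize(values, num_bits=8):
    """Map floats onto the signed integer range [-(2^(b-1) - 1), 2^(b-1) - 1]."""
    q_max = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(v) for v in values) / q_max
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    return [q * scale for q in q_values]

weights = [0.91, -0.43, 0.002, 0.67, -0.88]   # hypothetical model weights
q_weights, scale = quantize(weights)
restored = dequantize(q_weights, scale)

# The quantized weights are small integers; the restored values are close to
# the originals, with an error bounded by half the quantization step (scale/2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights)
print(max_error <= scale / 2)
```

Real frameworks perform this per tensor (often per channel) with calibrated scales, but the precision-versus-storage trade-off is the same one that makes 8-bit hardware support attractive for ML.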
As a consequence, GPUs typically outperform CPUs for ML applications, even though CPUs typically have faster clock speeds. For small-scale ML work, GPUs generally offer the best option.

Some hardware is specially intended for AI, such as purpose-built Application-Specific Integrated Circuits (ASICs) and System on a Chip (SoC) devices. These AI-specific solutions have features such as multiple cores, special data management and the ability to perform in-memory processing. They are most suitable for edge computing, while the training of the ML model is done in the cloud.

Hardware with specific AI architectures is currently (as of April 2021) under development. This includes neuromorphic processors [B03], which do not use the traditional von Neumann architecture, but rather an architecture that loosely mimics brain neurons.

Examples of AI hardware providers and their processors include (as of April 2021):

- NVIDIA: They provide a range of GPUs and AI-specific processors, such as the Volta [R09].
- Google: They have developed application-specific integrated circuits for both training and inferencing. Google TPUs (Cloud Tensor Processing Units) [R10] can be accessed by users on the Google Cloud, whereas the Edge TPU [R11] is a purpose-built ASIC designed to run AI on individual devices.
- Intel: They provide Nervana neural network processors [R12] for deep learning (both training and inferencing) and Movidius Myriad vision processing units for inferencing in computer vision and neural network applications.
- Mobileye: They produce the EyeQ family of SoC devices [R13] to support complex and computationally intense vision processing. These have low power consumption for use in vehicles.
- Apple: They produce the Bionic chip for on-device AI in iPhones [B04].
- Huawei: Their Kirin 970 chip for smartphones has built-in neural network processing for AI [B05].
1.7 AI as a Service (AIaaS)

AI components, such as ML models, can be created within an organization, downloaded from a third party, or used as a service on the web (AIaaS). A hybrid approach is also possible, in which some of the AI functionality is provided from within the system and some is provided as a service.

When ML is used as a service, access is provided to an ML model over the web, and support can also be provided for data preparation and storage, model training, evaluation, tuning, testing and deployment. Third-party providers (e.g., AWS, Microsoft) offer specific AI services, such as facial and speech recognition. This allows individuals and organizations to implement AI using cloud-based services even when they have insufficient resources and expertise to build their own AI services. In addition, ML models provided as part of a third-party service are likely to have been trained on a larger, more diverse training dataset than is readily available to many stakeholders, such as those who have recently moved into the AI market.

1.7.1 Contracts for AI as a Service

These AI services are typically provided with contracts similar to those for non-AI cloud-based Software as a Service (SaaS). A contract for AIaaS typically includes a service-level agreement (SLA) that defines availability and security commitments. Such SLAs typically cover an uptime for the service (e.g., 99.99% uptime) and a response time to fix defects, but rarely define ML functional performance metrics (such as accuracy) in a similar manner (see Chapter 5). AIaaS is often paid for on a subscription basis, and if the contracted availability and/or response times are not met, then the service provider typically provides credits for future services.
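To make the uptime figures above concrete, a quick calculation (illustrative only; a 30-day month is assumed) shows how little downtime an SLA such as 99.99% actually permits:

```python
# Illustrative only: converting an SLA uptime percentage into the maximum
# downtime it permits over a 30-day month.

def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)

for uptime in (99.0, 99.9, 99.99):
    print(f"{uptime}% uptime -> {allowed_downtime_minutes(uptime):.2f} minutes/month")
```

A 99.99% commitment allows only about 4.3 minutes of downtime per month, which is why breaching it typically triggers the service credits described above.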
Beyond these credits, most AIaaS contracts limit liability to the fees paid, meaning that AI-based systems that depend on AIaaS are typically limited to relatively low-risk applications, where loss of service would not be too damaging. Services often come with an initial free trial period in lieu of an acceptance period. During this period, the consumer of the AIaaS is expected to test whether the provided service meets their needs in terms of required functionality and performance (e.g., accuracy). This is generally necessary to cover any lack of transparency on the provided service (see Section 7.5).

1.7.2 AIaaS Examples

The following are examples of AIaaS (as of April 2021):

- IBM Watson Assistant: This is an AI chatbot which is priced according to the number of monthly active users.
- Google Cloud AI and ML Products: These provide document-based AI that includes a form parser and document OCR. Prices are based on the number of pages sent for processing.
- Amazon CodeGuru: This provides an ML-based review of Java code that supplies developers with recommendations for improving their code quality. Prices are based on the number of lines of source code analyzed.
- Microsoft Azure Cognitive Search: This provides AI cloud search. Prices are based on search units (defined in terms of the storage and throughput used).

1.8 Pre-Trained Models

1.8.1 Introduction to Pre-Trained Models

It can be expensive to train ML models (see Chapter 3). First, the data has to be prepared, and then the model must be trained. The first activity can consume large amounts of human resources, while the latter can consume a lot of computing resources. Many organizations do not have access to these resources. A cheaper and often more effective alternative is to use a pre-trained model.
This provides similar functionality to the required model and is used as the basis for creating a new model that extends and/or focuses the functionality of the pre-trained model. Such models are only available for a limited number of technologies, such as neural networks and random forests.

For example, if an image classifier is needed, it could be trained from scratch using the publicly available ImageNet dataset, which contains over 14 million images classified into over 1000 categories. However, this could consume significant resources with no guarantee of success. Alternatively, an existing model that has already been trained on this dataset could be reused. By using such a pre-trained model, training costs are saved and the risk of it not working is largely eliminated. When a pre-trained model is used without modification, it can simply be embedded in the AI-based system, or it can be used as a service (see Section 1.7).

1.8.2 Transfer Learning

It is also possible to take a pre-trained model and modify it to perform a second, different task. This is known as transfer learning and is used on deep neural networks, in which the early layers (see Chapter 6) of the neural network typically perform quite basic tasks (e.g., identifying the difference between straight and curved lines in an image classifier), whereas the later layers perform more specialized tasks (e.g., differentiating between building architectural types). In this example, all but the later layers of an image classifier can be reused, eliminating the need to train the early layers. The later layers are then retrained to handle the unique requirements of the new classifier. In practice, the pre-trained model may be fine-tuned with additional training on new, problem-specific data.

The effectiveness of this approach largely depends on the similarity between the function performed by the original model and the function required by the new model.
For example, modifying an image classifier that identifies cat species to instead identify dog breeds would be far more effective than modifying it to identify people’s accents.

There are many pre-trained models available, especially from academic researchers. Some examples are ImageNet models [R14] such as Inception, VGG, AlexNet, and MobileNet for image classification, and pre-trained NLP models like Google’s BERT [R15].

1.8.3 Risks of using Pre-Trained Models and Transfer Learning

Using pre-trained models and transfer learning are both common approaches to building AI-based systems, but there are some associated risks. These include:

- A pre-trained model may lack transparency compared to an internally generated model.
- The level of similarity between the function performed by the pre-trained model and the required functionality may be insufficient. Also, this difference may not be understood by the data scientist.
- Differences between the data preparation steps (see Section 4.1) used when the pre-trained model was originally developed and the data preparation steps used when this model is applied in a new system may impact the resulting functional performance.
- The shortcomings of a pre-trained model are likely to be inherited by those who reuse it and may not be documented. For example, inherited biases (see Section 2.4) may not be apparent if there is a lack of documentation about the data used to train the model. Also, if the pre-trained model is not widely used, there are likely to be more unknown (or undocumented) defects, and more rigorous testing may be needed to mitigate this risk.
- Models created through transfer learning are highly likely to be sensitive to the same vulnerabilities as the pre-trained model on which they are based (e.g., adversarial attacks, as explained in Section 9.1.1).
In addition, if an AI-based system is known to contain a specific pre-trained model (or is based on a specific pre-trained model), then vulnerabilities associated with it may already be known by potential attackers.

Note that several of the above risks can be more easily mitigated by having thorough documentation available for the pre-trained model (see Section 7.5).

1.9 Standards, Regulations and AI

The Joint Technical Committee of IEC and ISO on information technology (ISO/IEC JTC 1) prepares international standards which contribute towards AI. For example, a subcommittee on AI (ISO/IEC JTC 1/SC 42) was set up in 2017. In addition, ISO/IEC JTC 1/SC 7, which covers software and system engineering, has published a technical report on the “Testing of AI-based systems” [S01]. Standards on AI are also published at the regional level (e.g., European standards) and the national level.

The EU-wide General Data Protection Regulation (GDPR) came into effect in May 2018 and sets obligations for data controllers with regard to personal data and automated decision-making [B06]. It includes requirements to assess and improve AI system functional performance, including the mitigation of potential discrimination, and for ensuring individuals’ rights not to be subjected to automated decision-making. The most important aspect of the GDPR from a testing perspective is that personal data (including predictions) should be accurate. This does not mean that every single prediction made by the system must be accurate, but that the system should be accurate enough for the purposes for which it is used.

The German national standards body (DIN) has also developed the AI Quality Metamodel ([S02], [S03]).

Standards on AI are also published by industry bodies. For example, the Institute of Electrical and Electronics Engineers (IEEE) is working on a range of standards on ethics and AI (The IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems).
Many of these standards are still in development at the time of writing.

Where AI is used in safety-related systems, the relevant regulatory standards are applicable, such as ISO 26262 [S04] and ISO/PAS 21448 (SOTIF) [S05] for automotive systems. Such regulatory standards are typically mandated by government bodies, and it would be illegal to sell a car in some countries if the included software did not comply with ISO 26262.

Standards in isolation are voluntary documents, and their use is normally only made mandatory by legislation or contract. However, many users of standards apply them to benefit from the expertise of the authors and to create products of higher quality.

2 Quality Characteristics for AI-Based Systems – 105 minutes

Keywords
None

AI-Specific Keywords
Adaptability, algorithmic bias, autonomy, bias, evolution, explainability, explainable AI (XAI), flexibility, inappropriate bias, interpretability, ML system, machine learning, reward hacking, robustness, sample bias, self-learning system, side effects, transparency

Learning Objectives for Chapter 2:

2.1 Flexibility and Adaptability
AI-2.1.1 K2 Explain the importance of flexibility and adaptability as characteristics of AI-based systems.

2.2 Autonomy
AI-2.2.1 K2 Explain the relationship between autonomy and AI-based systems.

2.3 Evolution
AI-2.3.1 K2 Explain the importance of managing evolution for AI-based systems.

2.4 Bias
AI-2.4.1 K2 Describe the different causes and types of bias found in AI-based systems.

2.5 Ethics
AI-2.5.1 K2 Discuss the ethical principles that should be respected in the development, deployment and use of AI-based systems.
2.6 Side Effects and Reward Hacking
AI-2.6.1 K2 Explain the occurrence of side effects and reward hacking in AI-based systems.

2.7 Transparency, Interpretability and Explainability
AI-2.7.1 K2 Explain how transparency, interpretability and explainability apply to AI-based systems.

2.8 Safety and AI
AI-2.8.1 K1 Recall the characteristics that make it difficult to use AI-based systems in safety-related applications.

2.1 Flexibility and Adaptability

Flexibility and adaptability are closely related quality characteristics. In this syllabus, flexibility is considered to be the ability of the system to be used in situations that were not part of the original system requirements, while adaptability is considered to be the ease with which the system can be modified for new situations, such as different hardware and changing operational environments.

Both flexibility and adaptability are useful if:

- the operational environment is not fully known when the system is deployed.
- the system is expected to cope with new operational environments.
- the system is expected to adapt to new situations.
- the system must determine when it should change its behavior.

Self-learning AI-based systems are expected to demonstrate all of the above characteristics. As a consequence, they must be adaptable and have the potential to be flexible. The flexibility and adaptability requirements of an AI-based system should include details of any environment changes to which the system is expected to adapt. These requirements should also specify constraints on the time and resources that the system can use to adapt itself (e.g., how long it can take to adapt to recognizing a new type of object).

2.2 Autonomy

When defining autonomy, it is important to first recognize that a fully autonomous system would be completely independent of human oversight and control.
In practice, full autonomy is often not desired. For example, fully self-driving cars, which are popularly referred to as “autonomous”, are officially classified as having “full driving automation” [B07].

Many consider autonomous systems to be “smart” or “intelligent”, which suggests they would include AI-based components to perform certain functions. For example, autonomous vehicles that need to be situationally aware typically use several sensors and image processing to gather information about the vehicle’s immediate environment. Machine learning, and especially deep learning (see Section 6.1), has been found to be the most effective approach to performing this function. Autonomous systems may also include decision-making and control functions. Both of these can be effectively performed using AI-based components.

Even though some AI-based systems are considered to be autonomous, this does not apply to all AI-based systems. In this syllabus, autonomy is considered to be the ability of the system to work independently of human oversight and control for prolonged periods of time. This can help with identifying the characteristics of an autonomous system that need to be specified and tested. For example, the length of time an autonomous system is expected to perform satisfactorily without human intervention needs to be known. In addition, it is important to identify the events for which the autonomous system must give control back to its human controllers.

2.3 Evolution

In this syllabus, evolution is considered to be the ability of the system to improve itself in response to changing external constraints. Some AI systems can be described as self-learning, and successful self-learning AI-based systems need to incorporate this form of evolution.

AI-based systems often operate in an evolving environment.
As with other forms of IT systems, an AI-based system needs to be flexible and adaptable enough to cope with changes in its operational environment. Self-learning AI-based systems typically need to manage two forms of change:

- One form of change is where the system learns from its own decisions and its interactions with its environment.
- The other form of change is where the system learns from changes made to the system’s operational environment.

In both cases the system will ideally evolve to improve its effectiveness and efficiency. However, this evolution must be constrained to prevent the system from developing any unwanted characteristics. Any evolution must continue to meet the original system requirements and constraints. Where these are lacking, the system must be managed to ensure that any evolution remains within limits and that it always stays aligned with human values. Section 2.6 provides examples relating to the impact of side effects and reward hacking on self-learning AI-based systems.

2.4 Bias

In the context of AI-based systems, bias is a statistical measure of the distance between the outputs provided by the system and what are considered to be “fair outputs”, which show no favoritism to a particular group. Inappropriate biases can be linked to attributes such as gender, race, ethnicity, sexual orientation, income level, and age. Cases of inappropriate bias in AI-based systems have been reported, for example, in systems used for making recommendations for bank lending, in recruitment systems, and in judicial monitoring systems.

Bias can be introduced into many types of AI-based systems. For example, it is difficult to prevent the bias of experts being built into the rules applied by an expert system. However, the prevalence of ML systems means that much of the discussion relating to bias takes place in the context of these systems.
ML systems are used to make decisions and predictions using algorithms which make use of collected data, and both of these components can introduce bias into the results:

- Algorithmic bias can occur when the learning algorithm is incorrectly configured, for example, when it overvalues some data compared to other data. This source of bias can be introduced, and also managed, through the hyperparameter tuning of the ML algorithms (see Section 3.2).
- Sample bias can occur when the training data is not fully representative of the data space to which ML is applied.

Inappropriate bias is often caused by sample bias, but occasionally it can also be caused by algorithmic bias.

2.5 Ethics

Ethics is defined in the Cambridge Dictionary as:

a system of accepted beliefs that control behavior, especially such a system based on morals

AI-based systems with enhanced capabilities are having a largely positive effect on people’s lives. As these systems have become more widespread, concerns have been raised as to whether they are used in an ethical manner. What is considered ethical can change over time and can also change among localities and cultures. Care must be taken that the deployment of an AI-based system from one location to another considers differences in stakeholder values.

National and international policies on the ethics of AI can be found in many countries and regions. The Organisation for Economic Co-operation and Development issued its principles for AI, the first international standards agreed by governments for the responsible development of AI, in 2019 [B08]. These principles were adopted by forty-two countries when they were issued and are also backed by the European Commission. They include practical policy recommendations as well as value-based principles for the “responsible stewardship of trustworthy AI”.
These are summarized as:

- AI should benefit people and the planet by driving inclusive growth, sustainable development and well-being.
- AI systems should respect the rule of law, human rights, democratic values and diversity, and should include appropriate safeguards to ensure a fair society.
- There should be transparency around AI to ensure that people understand outcomes and can challenge them.
- AI systems must function in a robust, secure and safe way throughout their life cycles, and risks should be continually assessed.
- Organizations and individuals developing, deploying or operating AI systems should be held accountable.

2.6 Side Effects and Reward Hacking

Side effects and reward hacking can result in AI-based systems generating unexpected, and even harmful, results when the system attempts to meet its goals [B09].

Negative side effects can result when the designer of an AI-based system specifies a goal that “focuses on accomplishing some specific tasks in the environment but ignores other aspects of the (potentially very large) environment, and thus implicitly expresses indifference over environmental variables that might actually be harmful to change” [B09]. For example, a self-driving car with a goal of travelling to its destination in “as fuel-efficient and safe a manner as possible” may achieve the goal, but with the side effect of the passengers becoming extremely annoyed at the excessive time taken.

Reward hacking can result from an AI-based system achieving a specified goal by using a “clever” or “easy” solution that “perverts the spirit of the designer’s intent”. Effectively, the goal can be gamed. A widely used example of reward hacking is where an AI-based system is teaching itself to play an arcade computer game. It is presented with the goal of achieving the “highest score”, and to do so it simply hacks the data record that stores the highest score, rather than playing the game to achieve it.
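The arcade-game example can be caricatured in a few lines of code (a hypothetical toy, not from the syllabus): if the reward signal is simply whatever is stored in the high-score record, an agent that writes to that record directly "wins" without ever playing:

```python
# Toy illustration of reward hacking (hypothetical, not from the syllabus).
# The reward signal is naively defined as the value in the high-score record,
# so an agent with write access to that record can game the goal.

scoreboard = {"high_score": 0}

def reward() -> int:
    return scoreboard["high_score"]  # the (flawed) reward signal

def honest_agent():
    # Plays the game: each round of play earns a modest number of points.
    for _ in range(10):
        scoreboard["high_score"] += 50

def hacking_agent():
    # Never plays: directly overwrites the record the reward is read from.
    scoreboard["high_score"] = 10**9

honest_agent()
print(reward())   # score earned through play: 500
hacking_agent()
print(reward())   # "perfect" reward obtained by gaming the goal: 1000000000
```

The flaw is in the goal specification, not the agent: the reward measures the stored score rather than the behavior the designer actually intended, which is exactly the gap reward hacking exploits.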
2.7 Transparency, Interpretability and Explainability

AI-based systems are typically applied in areas where users need to trust those systems. This may be for safety reasons, but also where privacy is needed and where they might provide potentially life-changing predictions and decisions. Most users are presented with AI-based systems as “black boxes” and have little awareness of how these systems arrive at their results. In some cases, this ignorance may even apply to the data scientists who built the systems. Occasionally, users may not even be aware that they are interacting with an AI-based system.

The inherent complexity of AI-based systems has led to the field of “explainable AI” (XAI). The aim of XAI is for users to be able to understand how AI-based systems come up with their results, thus increasing users’ trust in them. According to The Royal Society [B10], there are several reasons for wanting XAI, including:

- giving users confidence in the system
- safeguarding against bias
- meeting regulatory standards or policy requirements
- improving system design
- assessing risk, robustness, and vulnerability
- understanding and verifying the outputs from a system
- autonomy, agency (making the user feel empowered), and meeting social values

This leads to the following three basic desirable XAI characteristics for AI-based systems from the perspective of a stakeholder (see also Section 8.6):

- Transparency: This is considered to be the ease with which the algorithm and the training data used to generate the model can be determined.
- Interpretability: This is considered to be the understandability of the AI technology by various stakeholders, including the users.
- Explainability: This is considered to be the ease with which users can determine how the AI-based system comes up with a particular result.
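For a simple model class, explainability can be made concrete. The sketch below (hypothetical feature names, weights and inputs, not from the syllabus) decomposes a linear model's score into per-feature contributions, which is a basic form of explanation for an individual prediction:

```python
# Hypothetical illustration: explaining one prediction of a linear scoring
# model by listing how much each input feature contributed to the score.

weights = {"income": 0.6, "debt": -0.8, "years_employed": 0.3}   # assumed model
applicant = {"income": 1.2, "debt": 0.5, "years_employed": 2.0}  # normalized inputs

contributions = {name: weights[name] * applicant[name] for name in weights}
score = sum(contributions.values())

# The explanation: each feature's share of the final score, largest first.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>15}: {value:+.2f}")
print(f"{'total score':>15}: {score:+.2f}")
```

For deep neural networks no such direct decomposition exists, which is why dedicated XAI techniques are needed and why explainability is harder to achieve there than for inherently interpretable models like this one.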
2.8 Safety and AI

In this syllabus, safety is considered to be the expectancy that an AI-based system will not cause harm to people, property or the environment. AI-based systems may be used to make decisions that affect safety. For example, AI-based systems working in the fields of medicine, manufacturing, defense, security, and transportation have the potential to affect safety.

The characteristics of AI-based systems that make it more difficult to ensure they are safe (e.g., do not harm humans) include:

- complexity
- non-determinism
- probabilistic nature
- self-learning
- lack of transparency, interpretability and explainability
- lack of robustness

The challenges of testing several of these characteristics are covered in Chapter 8.

3 Machine Learning (ML) – Overview – 145 minutes

Keywords
None

AI-Specific Keywords
Association, classification, clustering, data preparation, ML algorithm, ML framework, ML functional performance criteria, ML model, ML training data, ML workflow, model evaluation, model tuning, outlier, overfitting, regression, reinforcement learning, supervised learning, underfitting, unsupervised learning

Learning Objectives for Chapter 3:

3.1 Forms of ML
AI-3.1.1 K2 Describe classification and regression as part of supervised learning.
AI-3.1.2 K2 Describe clustering and association as part of unsupervised learning.
AI-3.1.3 K2 Describe reinforcement learning.

3.2 ML Workflow
AI-3.2.1 K2 Summarize the workflow used to create an ML system.

3.3 Selecting a Form of ML
AI-3.3.1 K3 Given a project scenario, identify an appropriate form of ML (from classification, regression, clustering, association, or reinforcement learning).

3.4 Factors involved in ML Algorithm Selection
AI-3.4.1 K2 Explain the factors involved in the selection of ML algorithms.
3.5 Overfitting and Underfitting
AI-3.5.1 K2 Summarize the concepts of underfitting and overfitting.
HO-3.5.1 H0 Demonstrate underfitting and overfitting.

3.1 Forms of ML

ML algorithms can be categorized as: supervised learning, unsupervised learning, and reinforcement learning.

3.1.1 Supervised Learning

In this kind of learning, the algorithm creates the ML model from labeled data during the training phase. The labeled data, which typically comprises pairs of inputs and outputs (e.g., an image of a dog and the label “dog”), is used by the algorithm to infer the relationship between the input data (e.g., images of dogs) and the output labels (e.g., “dog” and “cat”) during the training. During the ML model testing phase, a new set of unseen data is applied to the trained model to predict the output. The model is deployed once the output accuracy level is satisfactory.

Problems solved by supervised learning are divided into two categories:

- Classification: This is used when the problem requires an input to be classified into one of a few pre-defined classes. Face recognition or object detection in an image are examples of problems that use classification.
- Regression: This is used when the problem requires the ML model to predict a numeric output. Predicting the age of a person based on input data about their habits or predicting the future prices of stocks are examples of problems that use regression.

Note that the term regression, as used in the context of an ML problem, is different to its use in other ISTQB® syllabi, such as [I01], where regression is used to describe the problem of software modifications causing change-related defects.

3.1.2 Unsupervised Learning

In this kind of learning, the algorithm creates the ML model from unlabeled data during the training phase.
The unlabeled data is used by the algorithm to infer patterns in the input data during the training and to assign inputs to different classes, based on their commonalities. During the testing phase, the trained model is applied to a new set of unseen data to predict which classes the input data should be assigned to. The model is deployed once the output accuracy level is considered to be satisfactory.

Problems solved by unsupervised learning are divided into two categories:

- Clustering: This is when the problem requires the identification of similarities in input data points that allow them to be grouped based on common characteristics or attributes. For example, clustering is used to categorize different types of customers for the purpose of marketing.
- Association: This is when the problem requires interesting relationships or dependencies to be identified among data attributes. For example, a product recommendation system may identify associations based on customers’ shopping behavior.

3.1.3 Reinforcement Learning

Reinforcement learning is an approach where the system (an “intelligent agent”) learns by interacting with the environment in an iterative manner and thereby learns from experience. Reinforcement learning does not use training data. The agent is rewarded when it makes a correct decision and penalized when it makes an incorrect decision. Setting up the environment, choosing the right strategy for the agent to meet the desired goal, and designing a reward function are key challenges when implementing reinforcement learning. Robotics, autonomous vehicles, and chatbots are examples of applications that use reinforcement learning.
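The contrast between supervised and unsupervised learning described above can be sketched in a few lines of Python (an illustrative toy, not a syllabus requirement): the same data is first classified using labels and then clustered without them. The nearest-centroid classifier and the miniature k-means below are deliberately simplified stand-ins for real ML algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two groups of 2-D data points, e.g. measurements of two kinds of objects
group_a = rng.normal(0.0, 0.5, (20, 2))
group_b = rng.normal(5.0, 0.5, (20, 2))
X = np.vstack([group_a, group_b])

# Supervised (classification): labels are available during training
y = np.array(["A"] * 20 + ["B"] * 20)
centroids = {label: X[y == label].mean(axis=0) for label in ("A", "B")}

def classify(sample):
    """Predict the label whose class centroid is nearest to the sample."""
    return min(centroids, key=lambda lab: np.linalg.norm(sample - centroids[lab]))

print(classify(np.array([4.8, 5.1])))    # -> B

# Unsupervised (clustering): no labels; structure is inferred from the data
def kmeans(points, k, iters=10):
    """Tiny k-means: repeatedly assign points to the nearest centroid and
    move each centroid to the mean of its assigned points."""
    cents = points[[0, -1]].astype(float)  # deterministic init (sketch assumes k == 2)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - cents[None], axis=2)
        labels = dists.argmin(axis=1)
        cents = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(X, k=2)
# The two discovered clusters match the two generated groups
print(bool((labels[:20] == labels[0]).all() and (labels[20:] == labels[20]).all()))  # -> True
```

Note that the clustering step recovers the same grouping as the classifier without ever seeing the labels, which is the essential difference between the two forms of ML.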
3.2 ML Workflow

The activities in the machine learning workflow are:

Understand the Objectives
The purpose of the ML model to be deployed needs to be understood and agreed with the stakeholders to ensure alignment with business priorities. Acceptance criteria (including ML functional performance metrics – see Chapter 5) should be defined for the developed model.

Select a Framework
A suitable AI development framework should be selected based on the objectives, acceptance criteria, and business priorities (see Section 1.5).

Select & Build the Algorithm
An ML algorithm is selected based on various factors, including the objectives, acceptance criteria, and the available data (see Section 3.4). The algorithm may be manually coded, but it is often retrieved from a library of pre-written code. The algorithm is then compiled to prepare for training the model, if required.

Prepare & Test Data
Data preparation (see Section 4.1) comprises data acquisition, data pre-processing and feature engineering. Exploratory data analysis (EDA) may be performed alongside these activities. The data used by the algorithm and model will be based on the objectives and is used by all the activities in the “model generation and test” activity shown in Figure 1. For example, if the system is a real-time trading system, the data will come from the trading market. The data used to train, tune and test the model must be representative of the operational data that will be used by the model. In some cases, it is possible to use pre-gathered datasets for the initial training of the model (e.g., see Kaggle datasets [R16]). Otherwise, raw data typically needs some pre-processing and feature engineering. Testing of the data and any automated data preparation steps needs to be performed. See Section 7.2.1 for more details on input data testing.

Train the Model
The selected ML algorithm uses training data to train the model.
Some algorithms, such as those generating a neural network, read the training dataset several times. Each iteration of training on the training dataset is referred to as an epoch. Parameters defining the model structure (e.g., the number of layers of a neural network or the depth of a decision tree) are passed to the algorithm. These parameters are known as model hyperparameters. Parameters that control the training (e.g., how many epochs to use when training a neural network) are also passed to the algorithm. These parameters are known as algorithm hyperparameters.

Evaluate the Model
The model is evaluated against the agreed ML functional performance metrics, using the validation dataset, and the results are then used to improve (tune) the model. Model evaluation and tuning should resemble a scientific experiment that needs to be carefully conducted under controlled conditions with clear documentation. In practice, several models are typically created and trained using different algorithms (e.g., random forests, SVM, and neural networks), and the best one is chosen based on the results of the evaluation and tuning.

Tune the Model
The results from evaluating the model against the agreed ML functional performance metrics are used to adjust the model settings to fit the data and thereby improve its performance. The model may be tuned by hyperparameter tuning, where the training activity is modified (e.g., by changing the number of training steps or by changing the amount of data used for training) or attributes of the model are updated (e.g., the number of neurons in a neural network or the depth of a decision tree).

The three activities of training, evaluation and tuning can be considered as comprising model generation, as shown in Figure 1.
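The training, evaluation and tuning loop described above can be sketched with a toy example (illustrative only; real projects would use an ML framework rather than polynomial fitting). Here the polynomial degree plays the role of a model hyperparameter, and candidate models are compared on a validation dataset using mean squared error as the functional performance metric.

```python
import numpy as np

rng = np.random.default_rng(1)
# Noisy samples from an underlying quadratic relationship
x = rng.uniform(-3, 3, 90)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.3, 90)
x_train, y_train = x[:60], y[:60]        # training dataset
x_val, y_val = x[60:], y[60:]            # validation dataset

def train(degree):
    """'Train' a model by fitting polynomial coefficients to the training
    data; here the degree plays the role of a model hyperparameter."""
    return np.polyfit(x_train, y_train, degree)

def mse(coeffs, xs, ys):
    """Functional performance metric: mean squared error."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Tuning loop: train one candidate model per hyperparameter value, evaluate
# each on the validation dataset, and keep the best-performing model
scores = {d: mse(train(d), x_val, y_val) for d in (1, 2, 6)}
best_degree = min(scores, key=scores.get)
print(scores[1] > scores[2])   # -> True (the too-simple linear model scores worst)
```

In a real workflow the candidates would typically be different algorithms and hyperparameter combinations, but the principle of selecting on validation-set performance is the same.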
Test the Model
Once a model has been generated (i.e., it has been trained, evaluated and tuned), it should be tested against an independent test dataset to ensure that the agreed ML functional performance criteria are met (see Section 7.2.2). The functional performance measures from testing are also compared with those from evaluation, and if the performance of the model with independent data is significantly lower than during evaluation, it may be necessary to select a different model. In addition to functional performance tests, non-functional tests, such as for the time to train the model and the time and resource usage taken to provide a prediction, also need to be performed. Typically, these tests are performed by the data engineer/scientist, but testers with sufficient knowledge of the domain and access to the relevant resources can also perform these tests.

Deploy the Model
Once model development is complete, as shown in Figure 1, the tuned model typically needs to be re-engineered for deployment along with its related resources, including the relevant data pipeline. This is normally achieved through the framework. Targets might include embedded systems and the cloud, where the model can be accessed via a web API.

Figure 1: ML Workflow

Use the Model
Once deployed, the model is typically part of a larger AI-based system and can be used operationally. Models may perform scheduled batch predictions at set time intervals or may run on request in real time.

Monitor and Tune the Model
While the model is being used, its situation may evolve and the model may drift away from its intended performance (see Sections 2.3 and 7.6). To ensure that any drift is identified and managed, the operational model should be regularly evaluated against its acceptance criteria.
It may be deemed necessary to update the model settings to address the problem of drift, or it may be decided that re-training with new data is needed to create a more accurate or more robust model. In this case a new model may be created and trained with updated training data. The new model may then be compared against the existing model using a form of A/B testing (see Section 9.4).

The ML workflow shown in Figure 1 is a logical sequence. In practice, the workflow is applied in a manner where the steps are repeated iteratively (e.g., when the model is evaluated, it is often necessary to return to the training step, and sometimes to data preparation).

The steps shown in Figure 1 do not include the integration of the ML model with the non-ML parts of the overall system. Typically, ML models cannot be deployed in isolation and need to be integrated with the non-ML parts. For example, in vision applications, there is a data pipeline that cleans and modifies data before submitting it to the ML model. Where the model is part of a larger AI-based system, it will need to be integrated into this system prior to deployment. In this case, integration, system and acceptance test levels may be performed, as described in Section 7.2.

3.3 Selecting a Form of ML

When selecting an appropriate ML approach, the following guidelines apply:

- There should be sufficient training and test data available for the selected ML approach.
- For supervised learning, it is necessary to have properly labeled data.
- If there is an output label, it may be supervised learning.
- If the output is discrete and categorical, it may be classification.
- If the output is numeric and continuous in nature, it may be regression.
- If no output is provided in the given dataset, it may be unsupervised learning.
- If the problem involves grouping similar data, it may be clustering.
- If the problem involves finding co-occurring data items, it may be association.
- Reinforcement learning is better suited to contexts in which there is interaction with the environment. If the problem involves the notion of multiple states, and involves decisions at each state, then reinforcement learning may be applicable.

3.4 Factors Involved in ML Algorithm Selection

There is no definitive approach to selecting the optimal ML algorithm, ML model settings and ML model hyperparameters. In practice, this set is chosen based on a mix of the following factors:

- The required functionality (e.g., whether the functionality is classification or prediction of a discrete value)
- The required quality characteristics, such as:
  o accuracy (e.g., some models may be more accurate, but slower)
  o constraints on available memory (e.g., for an embedded system)
  o the speed of training (and retraining) the model
  o the speed of prediction (e.g., for real-time systems)
  o transparency, interpretability and explainability requirements
- The type of data available for training the model (e.g., some models might only work with image data)
- The amount of data available for training and testing the model (e.g., some models may have a greater tendency than others to overfit when only limited data is available)
- The number of features in the input data expected to be used by the model (e.g., other factors, such as speed and accuracy, are likely to be directly affected by the number of features)
- The expected number of classes for clustering (e.g., some models may be unsuitable for problems with more than one class)
- Previous experience
- Trial and error

3.5 Overfitting and Underfitting

3.5.1 Overfitting

Overfitting occurs when the model fits too closely to a set of data points and fails to properly generalize.
Such a model works very well with the data used to train it but can struggle to provide accurate predictions for new data. Overfitting can occur when the model tries to fit to every data point, including those data points that may be described as noise or outliers. It can also occur when insufficient data is provided in the training dataset.

3.5.2 Underfitting

Underfitting occurs when the model is not sophisticated enough to accurately fit the patterns in the training data. Underfitting models tend to be too simplistic and can struggle to provide accurate predictions both for new data and for data very similar to the training data. One cause of underfitting can be a training dataset that does not contain features reflecting important relationships between inputs and outputs. It can also occur when the algorithm does not correctly fit the data (e.g., creating a linear model for non-linear data).

3.5.3 Hands-On Exercise: Demonstrate Overfitting and Underfitting

Demonstrate the concepts of overfitting and underfitting on a model. This could be demonstrated by using a dataset that contains very little data (overfitting) and a dataset with poor feature correlations (underfitting).

4 ML – Data – 230 minutes

Keywords
None

AI-Specific Keywords
Annotation, augmentation, classification model, data labelling, data preparation, ML training data, supervised learning, test dataset, validation dataset

Learning Objectives for Chapter 4:

4.1 Data Preparation as part of the ML Workflow
AI-4.1.1 K2 Describe the activities and challenges related to data preparation.
HO-4.1.1 H2 Perform data preparation in support of the creation of an ML model.

4.2 Training, Validation and Test Datasets in the ML Workflow
AI-4.2.1 K2 Contrast the use of training, validation and test datasets in the development of an ML model.
HO-4.2.1 H2 Identify training and test datasets and create an ML model.

4.3 Dataset Quality Issues
AI-4.3.1 K2 Describe typical dataset quality issues.

4.4 Data Quality and its Effect on the ML Model
AI-4.4.1 K2 Recognize how poor data quality can cause problems with the resultant ML model.

4.5 Data Labelling for Supervised Learning
AI-4.5.1 K1 Recall the different approaches to the labelling of data in datasets for supervised learning.
AI-4.5.2 K1 Recall reasons for the data in datasets being mislabeled.

4.1 Data Preparation as Part of the ML Workflow

Data preparation uses an average of 43% of the ML workflow effort and is probably the most resource-intensive activity in the ML workflow. In comparison, model selection and building uses only 17% [R17]. Data preparation forms part of the data pipeline, which takes in raw data and outputs data in a form that can be used both to train an ML model and for prediction by a trained ML model.

Data preparation can be considered to comprise the following activities:

Data acquisition
- Identification: The types of data to be used for training and predictions are identified. For example, for a self-driving car, this could include the identification of the need for radar, video, and laser imaging, detection, and ranging (LiDAR) data.
- Gathering: The source of the data is identified and the means for collecting the data are determined. For example, this could include the identification of the International Monetary Fund (IMF) as a source for financial data and the channels that will be used to submit the data into the AI-based system.
- Labelling: See Section 4.5.

The acquired data can be in various forms (e.g., numerical, categorical, image, tabular, text, time-series, sensor, geospatial, video, and audio).
Data pre-processing
- Cleaning: Where incorrect data, duplicate data or outliers are identified, they are either removed or corrected. In addition, data imputation may be used to replace missing data values with estimated or guessed values (e.g., using the mean, median or mode). The removal or anonymization of personal information may also be performed.
- Transformation: The format of the given data is changed (e.g., breaking an address held as a string into its constituent parts, dropping a field holding a random identifier, converting categorical data into numerical data, changing image formats). Some of the transformations applied to numerical data include scaling to ensure that the same range is used. Standardization, for example, rescales data so that it has a mean of zero and a standard deviation of one, while normalization rescales data to a range between zero and one.
- Augmentation: This is used to increase the number of samples in a dataset. Augmentation can also be used to include adversarial examples in the training data, providing robustness against adversarial attacks (see Section 9.1).
- Sampling: This involves the selection of some part of the total available dataset so that patterns in the larger dataset can be observed. This is typically done to reduce costs and the time needed to create the ML model.

Note that all pre-processing carries a risk that it may change useful valid data or add invalid data.

Feature engineering
- Feature selection: A feature is an attribute/property reflected in the data. Feature selection involves the selection of those features which are most likely to contribute to model training and prediction. In practice, it often includes the removal of features that are not expected (or that are not wanted) to have any effect on the resultant model. By removing irrelevant information (noise), feature selection can reduce overall training times, prevent overfitting (see Section 3.5.1), increase accuracy and make models more generalizable.
- Feature extraction: This involves the derivation of informative and non-redundant features from the existing features. The resulting dataset is typically smaller and can be used to generate an ML model of equivalent accuracy more cheaply and more quickly.

In parallel to these data preparation activities, exploratory data analysis (EDA) is also typically carried out to support the overall data preparation task. This includes performing data analysis to discover trends inherent in the data and using data visualization to represent data in a visual format by plotting trends in the data.

Although the above data preparation activities and sub-activities have been shown in a logical order, different projects may re-order them or only use a subset of them. Some of the data preparation steps, such as the identification of the data source, are performed just once and can be performed manually. Other steps may be part of the operational data pipeline and normally work on live data. These tasks should be automated.

4.1.1 Challenges in Data Preparation

Some of the challenges related to data preparation include:

- The need for knowledge of:
  o the application domain
  o the data and its properties
  o the various techniques associated with data preparation
- The difficulty of getting high quality data from multiple sources.
- The difficulty of automating the data pipeline, and ensuring that the production data pipeline is both scalable and has reasonable performance efficiency (e.g., the time needed to complete the processing of a data item).
- The costs associated with data preparation.
- Not giving sufficient priority to checking for defects introduced into the data pipeline during data preparation.
- The introduction of sample bias (see Section 2.4).
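A few of the pre-processing steps described in Section 4.1 (cleaning with imputation, standardization, and normalization) can be sketched as follows; the age values are invented for the example, and real pipelines would typically use a data preparation library rather than raw NumPy.

```python
import numpy as np

# Hypothetical raw feature column with a missing value and an invalid outlier
ages = np.array([22.0, 25.0, np.nan, 30.0, 27.0, 150.0])

# Cleaning: treat the impossible value as missing, then impute missing
# values using the median of the valid data
ages[ages > 120] = np.nan
ages = np.where(np.isnan(ages), np.nanmedian(ages), ages)
print(ages)                              # -> [22. 25. 26. 30. 27. 26.]

# Transformation: standardization rescales to mean 0 and standard deviation 1;
# normalization rescales to the range [0, 1]
standardized = (ages - ages.mean()) / ages.std()
normalized = (ages - ages.min()) / (ages.max() - ages.min())
```

Note how each step silently changes the data: the outlier threshold (120) and the choice of median imputation are modeling decisions, which is why testing of the data preparation steps themselves is needed.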
4.1.2 Hands-On Exercise: Data Preparation for ML

For a given set of raw data, perform the applicable data preparation steps as outlined in Section 4.1 to produce a dataset that will be used to create a classification model using supervised learning. This activity forms the first step in creating an ML model that will be used for future exercises. To perform this activity, students will be provided with appropriate (and language-specific) materials, including:

- Libraries
- ML frameworks
- Tools
- A development environment

4.2 Training, Validation and Test Datasets in the ML Workflow

Logically, three sets of equivalent data (i.e., randomly selected from a single initial dataset) are required to develop an ML model:

- A training dataset, which is used to train the model.
- A validation dataset, which is used for evaluating and subsequently tuning the model.
- A test dataset (also known as the holdout dataset), which is used for testing the tuned model.

If unlimited suitable data is available, the amount of data used in the ML workflow for training, evaluation and testing typically depends on the following factors:

- The algorithm used to train the model.
- The availability of resources, such as RAM, disk space, computing power, network bandwidth and the available time.

In practice, due to the challenge of acquiring sufficient suitable data, the training and validation datasets are often derived from a single combined dataset. The test dataset is kept separate and is not used during training. This is to ensure that the developed model is not influenced by the test data and that the test results give a true reflection of the model’s quality.

There is no optimal ratio for splitting the combined dataset into the three individual datasets, but typical ratios, which may be used as a guideline, range from 60:20:20 to 80:10:10 (training:validation:test).
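The three-way split can be sketched as follows (toy index data; the 60:20:20 ratio is one of the guideline ratios mentioned above, and the K-fold loop illustrates one of the multiple-split methods discussed in this section):

```python
import random

# Toy dataset of 100 sample indices
dataset = list(range(100))
random.seed(42)
random.shuffle(dataset)

# A 60:20:20 split into training, validation and test datasets; the test
# (holdout) dataset is kept separate and never used during training
train_set = dataset[:60]
val_set = dataset[60:80]
test_set = dataset[80:]
print(len(train_set), len(val_set), len(test_set))     # -> 60 20 20

# K-fold cross validation over the combined training + validation data:
# each of the k folds serves once as the validation set
def k_fold(data, k):
    fold_size = len(data) // k
    for i in range(k):
        val = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, val

folds = list(k_fold(train_set + val_set, k=4))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # -> 4 60 20
```

In practice, an ML framework's own splitting utilities would be used, but the principle is the same: the test dataset is excluded before any resampling of the training and validation data takes place.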
Splitting the data into these datasets is often done randomly, unless the dataset is small or there is a risk of the resultant datasets not being representative of the expected operational data. If limited data is available, then splitting the available data into three datasets may result in insufficient data being available for effective training. To overcome this issue, the training and validation datasets may be combined (keeping the test dataset separate) and then used to create multiple split combinations of this dataset (e.g., 80% training / 20% validation). Data is then randomly assigned to the training and validation datasets. Training, validation and tuning are performed using these multiple split combinations to create multiple tuned models, and the overall model performance may be calculated as the average across all runs. There are various methods used for creating multiple split combinations, including split-test, bootstrap, K-fold cross validation and leave-one-out cross validation (see [B02] for more details).

4.2.1 Hands-On Exercise: Identify Training and Test Data and Create an ML Model

Split the pr