Domain 2 Lesson 1: Artificial Intelligence PDF

Summary

This document is a lesson on Data Collection for Artificial Intelligence. It includes information about determining data types, and whether to use an existing dataset or generate one. It also touches on balancing data and examples of data collection methods.

Full Transcript

Domain 2 Lesson 1

Artificial Intelligence Project Workbook, First Edition

Data Collection

Project Details
Project file: N/A
Estimated completion time: 15 minutes

When working with data for an AI model, one must determine the type and characteristics of the data needed to use AI to solve a problem. Once data types and characteristics are defined, the next step in the data collection process is to determine whether an existing data set is a good starting point for an AI model or if data needs to be generated.

Video reference
Domain 2
Topic: Choose the Way to Collect Data
Subtopic: Determine Data Types Needed; Decide if Needed Data Exists; Automated vs. User Input

Purpose
Upon completing this project, you will better understand the data collection process for an AI model.

Objectives covered
2 Data Collection, Processing, and Engineering
2.1 Choose the way to collect data
2.1.1 Determine type/characteristics of data needed
2.1.2 Decide if there is an existing data set or if you need to generate your own
2.1.3 When generating your own data set, decide whether collection can be automated or requires user input

Notes for the teacher
Ensure students understand that subject matter experts and AI solution developers should work together to identify data types and characteristics needed for an AI solution.

Steps for Completion
1. Determine whether each statement is true or false.
   a. T  The more complex an AI problem is, the more data is needed.
   b. F  A data set must be small to produce accurate results.
   c. T  Users should not sacrifice the quality of data to save time.
   d. T  Automated data processes only collect what they are programmed to collect.
2. Explain why data must be balanced.
   a. To avoid bias.
3. Describe one example of an AI problem that is well-suited for automatically collected data.
   a. Restocking
4. Describe one example of an AI problem that is well-suited for data collected from user input.
   a. Amusement park quality

Quality of Data

Project Details
Project file: N/A
Estimated completion time: 5 minutes

After data has been collected, AI developers must determine the quality of the data. Determining the quality of data can help AI developers understand whether the data they collect will meet the needs of an AI project.

Video reference
Domain 2
Topic: Assess Data Quality
Subtopic: Determine if Dataset Meets Needs

Purpose
Upon completing this project, you will better understand how to determine the quality of collected data.

Objectives covered
2 Data Collection, Processing, and Engineering
2.2 Assess data quality
2.2.1 Determine if data set meets needs of task

Notes for the teacher
If time permits, you may choose to show students examples of descriptive statistics, correlation analysis, and outlier detection methods.

Steps for Completion
1. How can descriptive statistics help evaluate data used for an AI model?
   a. They summarize the data using measures of central tendency, dispersion, and distribution.
2. How can correlation analysis help evaluate data used for an AI model?
   a. It can show the relationships between variables in the data used for the AI model.
3. How can outlier detection methods help evaluate data used for an AI model?
   a. Outlier detection methods identify data points that deviate significantly from the norm, which could otherwise skew the model's results.

Clean Data

Project Details
Project file: N/A
Estimated completion time: 5 minutes

After AI developers determine that the data they have collected will meet the needs of the AI project they are working on, the collected data must be cleaned. Cleaning data involves identifying missing, misaligned, fake, and corrupted data and then implementing techniques that help prevent flawed data from skewing AI results.
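As an illustration of one cleaning technique, the sketch below fills in missing values with the mean of the values that were observed (often called mean imputation). It is only a minimal example under stated assumptions: the function name `mean_impute`, the use of `None` to mark a missing value, and the sample `daily_sales` figures are all hypothetical and do not come from the workbook.

```python
from statistics import mean


def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        # With no observed values there is nothing to base an estimate on.
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical daily sales figures; None marks a missing reading.
daily_sales = [12.0, None, 15.0, 9.0, None]
cleaned = mean_impute(daily_sales)
print(cleaned)
```

Mean imputation keeps the data set the same size, but it also flattens variation, so it may not suit every data set. Other approaches, such as removing incomplete records, may be more appropriate in some cases.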
Video reference
Domain 2
Topic: Assess Data Quality
Subtopic: Look for Missing or Corrupt Data

Purpose
Upon completing this project, you will better understand how to clean collected data.

Objectives covered
2 Data Collection, Processing, and Engineering
2.2 Assess data quality
2.2.2 Look for missing or corrupt data elements

Notes for the teacher
If time permits, you may choose to show students some of the software available for data cleaning, such as Power Query, found in Excel and in Power BI.

Steps for Completion
1. How can users handle missing data?
   a. Mean imputation
2. How can users handle misaligned data?
   a. Data verification
3. How can users handle fake or corrupt data?
   a. Verification

Domain 2 Lesson 2

Bias in Data

Project Details
Project file: N/A
Estimated completion time: 10 minutes

AI developers must ensure that the data they collect represents what they are trying to accomplish. One of the biggest detriments to having quality data is bias. Data collection tools can be prone to bias in different ways.

Video reference
Domain 2
Topic: Ensure Representation
Subtopic: Examine Collection Techniques; Ensure Enough Data is Unbiased

Purpose
Upon completing this project, you will better understand how to evaluate data for bias.

Objectives covered
2 Data Collection, Processing, and Engineering
2.3 Ensure that data are representative
2.3.1 Examine collection techniques for potential sources of bias
2.3.2 Make sure the amount of data is enough to build an unbiased model

Notes for the teacher
Ensure students understand that eliminating all bias is often not possible, but removing as much bias as possible will lead to more accurate AI results.

Steps for Completion
1. What is selection bias?
   a.
2. What is historical bias?
   a.
3. Describe one way in which survey tools can be biased.
   a.
4. Why might online data collection tools be biased?
   a.
5. Describe one way in which observational data collection tools can be biased.
   a.
6. What can AI developers do if they determine a data set is too small?
   a.
7. A data set should be large enough to produce a(n) ________ AI model.

Buy or Develop

Project Details
Project file: N/A
Estimated completion time: 10 minutes

Developing an AI solution requires extensive programming knowledge in languages such as Python, R, or Java. However, creating an AI solution from scratch is not always necessary. There are many tools available to help build AI solutions. Whether buying or developing an AI model, organizations should consider how the AI model will be stored and what costs are associated with the AI project.

Video reference
Domain 2
Topic: Identify Resource Requirements
Subtopic: Assess Existing Resources; Consider Budget and Resources

Purpose
Upon completing this project, you will better understand the storage options and costs associated with an AI project.

Objectives covered
2 Data Collection, Processing, and Engineering
2.4 Identify resource requirements (e.g., computing, time complexity)
2.4.1 Assess whether problem is solvable with available computing resources
2.4.2 Consider the budget of the project and resources that are available

Notes for the teacher
Ensure students understand that there are benefits and risks to both local and cloud storage. Also, ensure students understand when it is more appropriate to buy an AI solution and when it is more appropriate to develop an AI solution.

Steps for Completion
1. When should AI developers consider hosting an AI model locally?
   a.
2. When should AI developers consider hosting an AI model in the cloud?
   a.
3. List one technical resource that should be considered when budgeting for an AI project.
   a.
4. List one human resource that should be considered when budgeting for an AI project.
   a.
Domain 2 Lesson 3

Convert Images to Binary Format

Project Details
Project file: N/A
Estimated completion time: 5 minutes

In many AI systems, data needs to be converted from whatever format it arrives in into binary (zeros and ones) to be useful for an AI model. This conversion especially applies to images. Images must be converted to binary format for an AI model to be able to process and use the data from the images in results.

Video reference
Domain 2
Topic: Convert Data
Subtopic: Convert to Binary

Purpose
Upon completing this project, you will better understand how to convert images to binary format.

Objectives covered
2 Data Collection, Processing, and Engineering
2.5 Convert data into suitable formats (e.g., numerical, image, time series)
2.5.1 Convert data to binary (e.g., images become pixels)

Notes for the teacher
If time permits, you may choose to show students an example of an image being converted to binary format.

Steps for Completion
1. What does it mean to transform an image’s data into binary format?
   a.
2. What is one-hot encoding?
   a.
3. What are word embeddings?
   a.

Convert Sentences to Tokens

Project Details
Project file: N/A
Estimated completion time: 5 minutes

In addition to converting images to binary format, AI developers may need to convert text to tokens. Tokens are smaller units of text, such as words or parts of words. AI developers should remember that, to process data and make predictions, an AI model needs data, whether from images or text, to be converted into a form it can understand.
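To make the idea of tokens more concrete, here is a minimal sketch of one possible way to turn sentences into word tokens and then into numbers. It assumes a very simple approach (lowercasing and splitting on anything that is not a letter, digit, or apostrophe). The helper names `tokenize` and `build_vocabulary` and the sample sentences are hypothetical, and this is not necessarily the sequence of steps taught in the course video.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocabulary(token_lists):
    """Give each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            # setdefault only adds the token if it has not been seen yet.
            vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical example sentences.
sentences = ["The park was fun.", "The rides were fun!"]
tokenized = [tokenize(s) for s in sentences]
vocab = build_vocabulary(tokenized)
encoded = [[vocab[t] for t in tokens] for tokens in tokenized]

print(tokenized)  # [['the', 'park', 'was', 'fun'], ['the', 'rides', 'were', 'fun']]
print(encoded)    # [[0, 1, 2, 3], [0, 4, 5, 3]]
```

Real AI tools often use more advanced tokenizers that split words into smaller pieces, but the basic idea is the same: text becomes a sequence of numbers that a model can work with.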
Video reference
Domain 2
Topic: Convert Data
Subtopic: Convert to Features Suitable for AI

Purpose
Upon completing this project, you will better understand how to convert sentences to tokens.

Objectives covered
2 Data Collection, Processing, and Engineering
2.5 Convert data into suitable formats (e.g., numerical, image, time series)
2.5.2 Convert computer data into features suitable for AI (e.g., sentences become tokens)

Notes for the teacher
If time permits, you may choose to show students an example of converting sentences to tokens.

Steps for Completion
1. List the steps of converting sentences to tokens in the correct order.
   a.
   b.
   c.
   d.
2. Why might AI developers convert sentences to tokens?
   a.

Domain 2 Lesson 4

Features

Project Details
Project file: N/A
Estimated completion time: 5 minutes

A feature is a characteristic of the data used in machine learning. An AI model using incorrect features will not produce a usable outcome, thus wasting time and money in an organization’s quest to make AI work for it. Including only relevant features can help ensure an AI model performs well and produces likely results.

Video reference
Domain 2
Topic: Select Features for the AI Model
Subtopic: Determine Features to Include

Purpose
Upon completing this project, you will better understand how to determine what features to include in an AI model.

Objectives covered
2 Data Collection, Processing, and Engineering
2.6 Select features for the AI model
2.6.1 Determine which features of data to include

Notes for the teacher
Ensure students understand that they should also consider the availability of a feature before including it in an AI model.

Steps for Completion
1. What two questions should AI developers ask when determining what features to include in an AI model?
   a.
   b.
2. How can a correlation between two data points be used to determine whether a feature should be included in an AI model?
   a.

Feature Vectors

Project Details
Project file: N/A
Estimated completion time: 15 minutes

In addition to selecting appropriate features for a testing or training data set for an AI model, turning some of those features into feature vectors may also be necessary. Building initial feature vector data sets to train or test an AI model is a crucial step in developing AI solutions. Following the process for developing initial feature vectors is essential to developing robust and reliable AI models.

Video reference
Domain 2
Topic: Select Features for the AI Model
Subtopic: Build Initial Feature Vectors; Consult with SMEs on Selection

Purpose
Upon completing this project, you will better understand what feature vectors are and why they are important for an AI model.

Objectives covered
2 Data Collection, Processing, and Engineering
2.6 Select features for the AI model
2.6.2 Build initial feature vectors for test/train data set
2.6.3 Consult with subject-matter experts to confirm feature selection

Notes for the teacher
Ensure students understand that feature vectors represent the measurable properties or characteristics of the subject or phenomenon being observed.

Steps for Completion
1. What is a feature vector?
   a.
2. Why are feature vectors needed when developing an AI model?
   a.
3. What does it mean to scale feature vectors?
   a.
4. What is encoding?
   a.
5. Determine whether each statement is true or false.
   a. An AI model's performance on random data or a small, unrepresentative data set may not be indicative of its performance on real-world data.
   b. The data the model has access to usually comes in two forms: the training data set and the archived data set.
   c. It is crucial that the testing data is mixed with the training data to get an unbiased estimate of an AI model's performance.
   d. In many instances, AI developers will consult with subject matter experts to help ensure appropriate features are used in an AI model.
   e. AI developers should not consult subject matter experts about risky features.

Domain 2 Lesson 5

Transform and Manipulate Data

Project Details
Project file: N/A
Estimated completion time: 10 minutes

Once features are selected for an AI model, AI developers should engage in feature engineering, which ensures the features will work properly within the context of an AI model. A data set used for AI is rarely ready to be used in an AI model without transformation and manipulation.

Video reference
Domain 2
Topic: Engage in Feature Engineering
Subtopic: Determine Needed Transformations; Create Processed Datasets

Purpose
Upon completing this project, you will better understand how to transform and manipulate data.

Objectives covered
2 Data Collection, Processing, and Engineering
2.7 Engage in feature engineering
2.7.1 Review features and determine what standard transformations are needed
2.7.2 Create processed data sets

Notes for the teacher
If time permits, you may choose to show students a data set and how to transform and manipulate that data set to prepare it for an AI model.

Steps for Completion
1. What occurs during the process of feature engineering?
   a.
2. What three methods can AI developers use to transform and manipulate data to prepare it for an AI model?
   a.
3. Determine whether each statement is true or false.
   a. Once data is transformed, AI developers should validate the transformation.
   b. If missing values in data sets are critical for an AI model to be accurate, AI developers may consider estimating the values and entering them into the model.
   c. AI developers can add keywords to mark existing words in data that help categorize data for a model.

Training and Test Data Sets

Project Details
Project file: N/A
Estimated completion time: 15 minutes

Once we have a data set and decide what features we will use in an AI model, the next step is to separate the available data into two sets: training and test sets. Test sets are sometimes also called validation sets.

Video reference
Domain 2
Topic: Identify Data Sets
Subtopic: Training and Test Sets; Ensure Test Set is Representative

Purpose
Upon completing this project, you will better understand the purpose of using training and test data sets on an AI model.

Objectives covered
2 Data Collection, Processing, and Engineering
2.8 Identify training and test data sets
2.8.1 Separate available data into training and test sets
2.8.2 Ensure test set is representative

Notes for the teacher
Ensure students understand that training data and test data should always be stored and used separately.

Steps for Completion
1. What is a test data set used for?
   a.
2. What is a training data set used for?
   a.
3. What is the goal of using test data sets and training data sets on an AI model?
   a.
4. How can AI developers accurately evaluate a model’s performance using test data sets?
   a.
5. Describe two ways in which random sampling can be beneficial to creating accurate test data sets.
   a.

Documentation

Project Details
Project file: N/A
Estimated completion time: 15 minutes

There are three areas of importance in documenting data decisions: listing assumptions, documenting predicates, and specifying constraints. These three areas impact not only the data decisions made within documents but also how AI models are built to ensure that the models fit within the assumptions and constraints associated with AI projects. Creating and maintaining white papers is crucial for effective documentation of AI projects.

Video reference
Domain 2
Topic: Document Data Decisions
Subtopic: Assumptions, Predicates, and Constraints; Make Information Available

Purpose
Upon completing this project, you will better understand how to document various aspects of AI models.

Objectives covered
2 Data Collection, Processing, and Engineering
2.9 Document data decisions
2.9.1 List assumptions, predicates, and constraints upon which design choices have been reasoned
2.9.2 Make this information available to regulators and end users who demand deep transparency

Notes for the teacher
Ensure students understand that access to white papers should be managed, as they often contain sensitive data.

Steps for Completion
1. What are assumptions?
   a.
2. What are predicates?
   a.
3. What are constraints?
   a.
4. What is the purpose of white papers?
   a.
5. Where are two places white papers should be stored?
   a.
6. What is deep transparency?
   a.
