Summary

This document provides review questions and answers covering Artificial Intelligence, Machine Learning, Neural Networks, and Deep Learning, along with related topics such as the distinctions between AI, AGI, and Narrow AI and the fundamentals of model training. It is intended as undergraduate-level review material.

Full Transcript

Review Questions + Our Answers (Lecture and Review)

1. What is Artificial Intelligence?
   ○ Artificial Intelligence refers to programs that carry out tasks which normally require intelligence when done by humans. A key point here is that there is no generally agreed definition of AI, but the most commonly cited definitions involve programs that think, reason, and learn like humans.
2. What is the difference between Artificial General Intelligence (AGI) and Narrow Artificial Intelligence?
   ○ AGI can solve any problem that a human can, even problems it has not been specifically trained on. AGI has understanding and learning capability comparable to humans.
   ○ Narrow AI can only solve a specific type of problem and can NOT generalize beyond its field of expertise.
3. What is the difference between using a model in Training Mode vs Inference Mode?
   ○ In training mode, during the forward pass we calculate and store intermediate results which will be needed for backpropagation. In inference mode we don't → this is the fundamental difference between training mode and inference mode (see the sketch below).
   ○ Training mode: the model is fed training instances and the output is used to update the model.
   ○ Inference mode: the model is being used for actual predictions (the model is NOT updated based on the outputs; the outputs are actually used).
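As a concrete illustration of the two modes, here is a minimal sketch assuming PyTorch (the lecture does not prescribe a framework, and `nn.Linear` is just a stand-in for a real model):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # toy stand-in for a real model: 4 features -> 2 outputs
x = torch.randn(8, 4)     # a batch of 8 samples

# Training mode: the forward pass stores intermediate results (the
# computation graph) so that backpropagation can use them.
model.train()
loss = model(x).sum()     # forward pass
loss.backward()           # backward pass consumes the stored intermediates

# Inference mode: we only want the outputs, so we tell the framework
# not to store anything for backpropagation.
model.eval()
with torch.no_grad():
    predictions = model(x)
```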
4. What is the difference between Training and Production?
   ○ During the training process we are working to improve the model:
      - the model uses training mode when we are updating it
      - it uses inference mode to validate and test
   ○ During production use the model is fully trained and we are using it for real work (all use of the model after training has been completed):
      - the model is always in inference mode
      - training during production use is possible but rare, and it can lead to instability as the model might overreact to new inputs
5. What is the relationship between Artificial Intelligence, Machine Learning, Neural Networks, and Deep Learning?
   ○ Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning.
   ○ In other words, Artificial Intelligence encompasses them all and is the term for the overall field.
   ○ Most modern AI work is actually done using Machine Learning.
   ○ Neural networks are ALL based on Machine Learning.
   ○ While some basic neural networks can carry out useful tasks, the majority of modern neural networks fall into the category of Deep Learning.
6. What is the difference between Symbolic AI and Subsymbolic AI?
   ○ In Symbolic AI, systems represent information as symbols (or rules):
      - much easier to understand
   ○ In Subsymbolic AI, the understanding of a sentence is stored as a distributed set of numeric values across a neural network (a mathematical model):
      - the key point is that in subsymbolic AI we cannot point to a specific location where a particular concept exists in our neural network
      - generally very opaque
7. Why is Machine Learning critical to modern AI?
   ○ Machine learning programs learn and improve their behavior WITHOUT direct modification by humans.
   ○ Almost all modern AI programs depend on some level of Machine Learning.
8. What is the difference between predictive, descriptive, and generative AI systems?
   ○ Predictive → Supervised Learning; samples in the dataset are labeled with the expected output.
      - Designed to make predictions from data.
      - Learn statistical patterns from the training dataset.
      - Apply learned patterns to previously unseen data.
      - Within predictive systems: there are regression systems, which return continuous values, and classification systems, which place items in pre-existing classes.
      - Examples: weather prediction system (regression), house price prediction system, animal recognition system (classification).
   ○ Descriptive → Unsupervised Learning; no labels (expected outputs) provided.
      - Designed to identify patterns or structures in data.
      - Developers can steer the patterns identified by choosing the input features used in samples or by using feature engineering to combine or modify input characteristics.
      - Groups are entirely created by the AI itself.
      - Example: a streaming service placing customers into groups.
   ○ Generative AI systems → Self-Supervised Learning; the system generates its own labels using clearly defined algorithms. Widely used for LLMs. The model learns patterns from the input samples provided and generates new content based on input (e.g., ChatGPT).
9. What is the difference between Supervised Learning and Unsupervised Learning?
   ○ In supervised learning, samples in a dataset are labeled with the expected output; in unsupervised learning, no labels (expected outputs) are provided.
   ○ Supervised learning examples: a dataset of labeled images, information on houses.
   ○ Unsupervised learning example: streaming-service data on customers.
10. What is Self-Supervised Learning, and why do we use it?
   ○ When a system generates its own labels using a clearly defined algorithm.
   ○ Widely used for LLMs (e.g., ChatGPT).
   ○ Very useful for large datasets when there is far too much data for a human to label.
   ○ Examples (see the sketch below):
      - Next-word prediction: 1. create labeled samples from a sentence; 2. feed in part of the sentence and predict the next word.
      - Masked-word prediction: 1. create samples by blocking out words in sentences.
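To make the self-supervised idea concrete, here is a small sketch that generates next-word-prediction samples from a sentence. The whitespace tokenization and the (context, label) format are illustrative assumptions, not the exact algorithm from lecture:

```python
# Generate (context, label) pairs for next-word prediction.
def next_word_samples(sentence: str):
    words = sentence.split()          # naive whitespace tokenization
    samples = []
    for i in range(1, len(words)):
        context = words[:i]           # input: the sentence so far
        label = words[i]              # label: the actual next word
        samples.append((context, label))
    return samples

for context, label in next_word_samples("the cat sat on the mat"):
    print(context, "->", label)
# ['the'] -> cat
# ['the', 'cat'] -> sat
# ... and so on: the labels come from the data itself, no human labeling needed.
```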
11. What is the relationship between Datasets, Samples, Features, and Labels?
12. What are the advantages of Classical Machine Learning methods over Neural Network-based methods?
   ○ Models are easier to understand, less computationally expensive, work with less data, and are less prone to overfitting.
13. What three mathematical specialties form the foundation for Machine Learning?
   ○ Statistics – modern subsymbolic AI is really all statistics; neural networks are attempting to find statistical relationships between the inputs and the desired outputs.
   ○ Calculus – the study of how things change: the rate of change and the cumulative effects of change.
   ○ Linear Algebra – the study of linear systems in a multidimensional space.
14. What is Linear Algebra?
   ○ The study of linear systems in a multidimensional space (in 2-space a linear system is a line, in 3-space a plane, in n-space a hyperplane).
   ○ y = mx + b is linear.
   ○ y = ax² + … is not.
   ○ Main relevant points:
      - Tensors.
      - N-dimensional space: 1. provides a theoretical framework for understanding our data.
      - Linear algebraic operations: 1. used commonly in actual neural network operations; 2. can also be used to modify data and come up with representations that are useful; 3. used to carry out much of the operation of neural networks.
15. What is a Tensor? (See the sketch below.)
   ○ A tensor is a multi-dimensional array used to store data, like a list, but with more than one dimension (GPT answer, I didn't know it).
   ○ The basic data structure used to store our information (lecture).
   ○ Tensors are used to store information in Machine Learning:
      - input data
      - output results
      - the neural networks themselves
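A quick NumPy sketch of tensors of increasing rank (NumPy is used purely for illustration; frameworks such as PyTorch and TensorFlow provide equivalent tensor types):

```python
import numpy as np

scalar = np.array(5.0)                # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])    # rank 1: e.g., one sample's features
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])       # rank 2: e.g., a weight matrix
image = np.zeros((28, 28, 3))         # rank 3: e.g., a 28x28 RGB image

for t in (scalar, vector, matrix, image):
    print(t.ndim, t.shape)            # number of dimensions and size of each
```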
16. Suppose I'm working on a housing machine learning model. Each sample consists of four features (square footage, yard footage, number of bedrooms, number of bathrooms) and two labels (selling price, time on market). If I wanted to represent each sample as a point in n-dimensional space (representing both inputs and outputs in the same space), what does that mean, and what would be the value of n?
   ○ Hint: In the term n-dimensional space, the symbol n is used generically. It simply represents the number of dimensions and can refer to entirely unrelated things, such as the weights for a node in a neural network. While it is common to associate n with input and m with output when discussing input and output spaces, you should also become familiar with n-dimensional as just the generic name for a multidimensional space.
   ○ 6-dimensional space (4 inputs + 2 outputs): each sample becomes a single point whose six coordinates are its four feature values and two label values.
17. If I say that a model maps a point in 3-dimensional space to a point in 2-dimensional space, what does that tell me about how the number of dimensions relates to features and labels in your dataset?
   ○ In this case the input space is 3-dimensional because each sample has 3 features, and the output space has 2 dimensions, which corresponds to 2 labels per sample. In other words, the model transforms data from a 3-feature input space to a 2-label output space.
18. What is a Jupyter Notebook and why is it used?
   ○ Jupyter Notebook is an interactive notebook system and is widely used both in education and by data scientists.
   ○ It's particularly valued as a good way for data scientists to present their findings to others.
   ○ It can also provide a good prototyping environment when exploring new ideas.
   ○ Jupyter supports several other languages; in fact, the name Jupyter comes from the names of three programming languages: Julia, Python, and R.
19. What is Google Colab and why is it used?
20. What are some of the advantages of neural-network-based machine learning over classical machine learning?
21. What does the term "parameters" refer to when discussing Neural Networks—for example, what does "ChatGPT 3.5 has 175 billion parameters" mean?
22. What are weights and biases and how are they used to determine the output of a node in a neural network?
23. Using the diagram below showing values for inputs, weights, and biases on a neuron, walk through how we would calculate the output of the neuron. Don't worry about actually calculating the correct value out of the Activation Function, but you should be prepared to explain how we use an Activation Function to determine the final output. (A worked numeric sketch follows.)
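Since the diagram is not reproduced in these notes, here is a worked sketch with made-up input, weight, and bias values, showing the standard computation output = activation(sum of weight × input + bias), with sigmoid standing in for whatever activation the diagram uses:

```python
import math

inputs  = [0.5, -1.0, 2.0]   # hypothetical input values
weights = [0.8,  0.2, -0.5]  # hypothetical weights, one per input
bias    = 0.1                # hypothetical bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weighted sum of inputs plus bias:
# 0.5*0.8 + (-1.0)*0.2 + 2.0*(-0.5) + 0.1 = -0.7
z = sum(w * x for w, x in zip(weights, inputs)) + bias

# The activation function maps z to the neuron's final output.
output = sigmoid(z)
print(z, output)   # -0.7, then roughly 0.33
```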
24. What is the primary purpose of Activation Functions in neural networks? What happens if we remove them and just use a network without Activation Functions?
25. What does it mean to have a Deep Neural Network? What benefits does having a Deep Neural Network provide?
26. What is the key characteristic that defines a Dense or Fully-Connected Neural Network?
27. What kind of functions can a neural network simulate?
28. What is the meaning and relationship between the following terms when used with Neural Networks: Deeper, Higher, Lower?
   ○ Deeper → means closer to the output → deeper into the network.
   ○ Higher → means closer to the output → more abstract/conceptual.
   ○ Lower → means closer to the input.
29. What happens when our network has a high capacity, but we have a small dataset?
30. How do the number of features in each sample and the number of samples in a dataset affect a network's ability to model the data? What happens when we increase the number of features but not the number of samples?
31. What do I mean when I say: "input order is an illusion for dense neural networks"?
32. What situations is One-Hot Encoding used for and what benefit does it provide? (A small sketch follows.)
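A minimal sketch of one-hot encoding a categorical feature; the category list is made up, and real projects would typically use a library encoder such as scikit-learn's `OneHotEncoder`:

```python
categories = ["cat", "dog", "bird"]   # hypothetical categorical feature values

def one_hot(value, categories):
    vec = [0] * len(categories)       # start with all zeros
    vec[categories.index(value)] = 1  # set a single 1 for the matching category
    return vec

print(one_hot("dog", categories))     # [0, 1, 0]
# Benefit: unlike encoding cat=0, dog=1, bird=2, one-hot vectors do not
# impose a false ordering or distance between unrelated categories.
```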
33. Why are activation functions (such as Sigmoid and Softmax) sometimes placed on output nodes in neural networks?
34. Describe what's going on in this diagram. Provide a walkthrough of what happens each time a batch of data is passed through the training process.
   ○ The blue section is the dataset, and it consists of individual samples that are labeled with known values. It may be historic data or data labeled by human experts.
   ○ The green model box is the mathematical model designed to predict outputs from input data.
   ○ Here we get the predicted (generated) output, which in a basic machine learning model is where processing completes.
   ○ Then, for most machine learning models, the cost function compares the actual output to the expected output and measures how well the model works.
   ○ The optimizer updates the model based on how well the previous prediction worked.
35. What is an Epoch and what is its relationship to a Batch? (See the training-loop sketch below.)
   ○ Epoch = a full pass of the entire contents of the training dataset through the neural network.
   ○ Batch = a pass of some of the data through the neural network, followed by parameter updates.
      - Full-batch / minibatch / stochastic gradient descent.
   ○ We're looking at how the parameters are updated after each batch.
   ○ Remember the update process continues for multiple epochs, changing the parameter values after processing each batch.
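Here is a minimal sketch of the epoch/batch structure. The dataset and batch size are placeholders, and the print statement stands in for the forward pass, loss calculation, backpropagation, and parameter update:

```python
dataset = list(range(100))   # stand-in for 100 training samples
batch_size = 10              # 10 -> minibatch; 1 -> stochastic; 100 -> full batch
epochs = 3

for epoch in range(epochs):  # one epoch = one full pass over the dataset
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        # Forward pass on `batch`, compute loss, backpropagate, then update
        # parameters: one update per batch, so 10 updates per epoch here.
        print(f"epoch {epoch}: parameter update on samples {batch[0]}..{batch[-1]}")
```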
36. What is the difference between Stochastic Gradient Descent, Minibatch Gradient Descent, and Full-Batch Gradient Descent? Consider how different batch sizes and update frequencies might change training efficiency and stability.
37. What does Stochastic refer to in the context of Machine Learning batches?
38. What are common causes of Overfitting?
39. Why do we need a Validation Dataset when we have already split our data into Training and Testing Datasets? What would be the consequences of only using Training and Testing Datasets?
40. How does our Validation Dataset help prevent Overfitting?
41. In lecture, I talked about how the relationship between input and output is the most important during production, and yet for real-world systems we generally don't plot the input vs. output graph. Why is that? What is the point of using them during teaching if they aren't used for real-world systems?
42. What is the relationship between the following graphs: input vs. output, parameter vs. output, parameter vs. loss? Why do we want to use the parameter vs. loss graph instead of the input vs. output graph when determining how to update our parameters?
43. What is the difference between gradient and gradient descent? How is the gradient used within the gradient descent process? (A one-parameter sketch follows.)
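A one-parameter gradient descent sketch. The loss L(p) = (p − 3)² and the learning rate are illustrative choices; the gradient dL/dp = 2(p − 3) is the slope that tells us which direction increases the loss, and gradient descent steps the opposite way:

```python
p = 0.0                  # initial parameter value
learning_rate = 0.1

for step in range(25):
    gradient = 2 * (p - 3)            # slope of the loss at the current p
    p -= learning_rate * gradient     # step against the slope to reduce loss

print(p)   # close to 3, the parameter value where the loss is minimized
```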
44. What is the relationship between epochs and batches? What is the relationship between batches and the gradient descent process? We actually covered epochs and batches in the last lecture, but I want you to think about how they tie together with the gradient descent process.
45. What ultimately causes the gradient descent process to stop?
   ○ Hint: Even if the parameter reaches the optimal position where the loss is 0, the process doesn't inherently stop. There are other stop criteria, and keep in mind that we aren't optimizing a single parameter; we're optimizing all parameters in the network, which all interact with each other in a complex manner.
46. What is the difference between Loss and Accuracy?
47. What is the relationship between the Forward Pass and the Backward Pass used in a Neural Network? What is each used for? Are they both used during model training? What about during production use of the model?
48. During the Training Process, what is the difference between how a Training Sample would be processed and how a Validation Sample would be processed?
49. Walk through the training process starting with what happens for each Epoch, then drilling down to what happens for each Batch in the Epoch, and then finally what happens for each Sample in the Batch. Consider both what happens at the level of the entire model (i.e., the entire Neural Network) and at the level of each parameter in the model.
50. What causes Vanishing Gradients? What are the implications of having Vanishing Gradients occur?
51. List at least seven different steps that should be taken by developers of AI systems and categorize each step as pre-development, development, or deployment/maintenance.
52. There is a wide range of data that might be used for AI system training. The data characteristics have a profound impact on the suitability of the problem for AI and the likely costs of development. Discuss some of the key ways in which data might vary and the consequences for development.
   ○ Hints: What are the consequences of using structured vs. unstructured data? If our initial data is not labelled, how might we try to label it, and what are the likely costs of such an operation? How closely does the available data align with our intended production use, and what challenges might arise if there is a mismatch?
53. What are some of the ethical issues we should consider when evaluating a potential data source?
54. What are some things we might want to keep in mind when developing an evaluation protocol for our system?
55. How can an unscrupulous developer use top-k accuracy measurements to provide an inflated view of how well their model performs? (A small sketch follows.)
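A sketch of how top-k accuracy is scored, with made-up class scores. A prediction counts as correct whenever the true class appears anywhere in the model's k highest-scoring classes, so reporting only a generous k can flatter a weak model:

```python
def top_k_correct(scores, true_label, k):
    # Indices of the k highest-scoring classes.
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return true_label in ranked[:k]

scores = [0.10, 0.30, 0.25, 0.20, 0.15]           # model scores for classes 0..4
print(top_k_correct(scores, true_label=2, k=1))   # False: class 1 ranks first
print(top_k_correct(scores, true_label=2, k=3))   # True: class 2 is in the top 3
```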
56. What are two steps we might need to perform to prepare our original data for use in a neural network?
57. What are some of the steps we might take to go from having a list of potential models to settling on the specific model to use?
58. What are some steps we can take after we've completed our model training to improve our model performance (and reduce resource needs) during deployment?
   ○ What is the difference between how a CPU and a GPU execute instructions?
59. Why are GPUs and AI Accelerators advantageous for handling Machine Learning?
60. Give an example of irreducible error that might prevent making a prediction. Feel free to use any problem domain.
61. What is the relationship between chaotic effects and our ability to make individual predictions? What about predictions for large populations?
62. Explain why the Fragile Families Challenge largely failed, at least from a Machine Learning perspective. (2-3 sentences)
63. Discuss why Geopolitical Events are difficult for AI to predict. (1-2 paragraphs, 6-8 sentences total)
64. Are there national or international events that might be solvable with AI?
65. What do we mean when we say that a Neural-Network-based AI system is opaque?
66. If opponents know an AI system is in place, they may try to game the system. Describe a situation (not described in lecture) in which this might occur.
67. What are the benefits of purpose-built neural networks over dense neural networks? What do I mean by purpose-built? Describe at least one specific change to the architecture of CNNs which is there because CNNs are built specifically for handling image data.
68. We discussed a number of Computer Vision tasks used by real-world systems; describe any three of them.
69. We discussed several problematic Computer Vision uses. Discuss one of them and why you believe it is or is not an appropriate use of Computer Vision.
70. How does convolution solve the problem of translational invariance when trying to identify objects in an image? (A small sketch follows.)
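A NumPy sketch of the idea behind convolution: one small filter is slid across every position of a toy image, so the same detector fires wherever the pattern appears. The image, the kernel, and the valid-positions-only sliding scheme are all illustrative:

```python
import numpy as np

image = np.zeros((5, 5))
image[1, 1] = image[3, 3] = 1.0   # the same "object" at two different positions

kernel = np.array([[1.0, 0.0],
                   [0.0, 0.0]])   # toy 2x2 filter that responds to a bright pixel

out = np.zeros((4, 4))            # feature map from valid filter positions
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+2, j:j+2] * kernel)

print(out)  # strong responses at (1, 1) and (3, 3): the identical weights
            # detect the pattern no matter where in the image it sits
```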
71. What determines the meaning of the different filters/kernels in the CNN? Is this set by the human programmer?
72. Describe several of the benefits of the hierarchical structure and use of pooling in CNNs.
73. What are the benefits of using Fine Tuning or Transfer Learning with a CNN? Provide an example of when you might be able to use one of these techniques. (A sketch follows.)
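A hedged PyTorch/torchvision sketch of transfer learning with a CNN; the choice of `resnet18`, the ImageNet weights, and the 10-class replacement head are illustrative assumptions rather than anything specified in lecture:

```python
import torch.nn as nn
from torchvision import models

# Start from a CNN pretrained on ImageNet (transfer learning).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the existing feature-extraction layers.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 10-class task.
model.fc = nn.Linear(model.fc.in_features, 10)

# Train only the new head on task-specific data; optionally unfreeze some
# deeper layers afterward and train with a small learning rate (fine tuning).
```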
Key Terms

- Artificial General Intelligence (AGI)
- Machine Learning
- Inference
- Training
- Production
- Predictive AI
- Descriptive AI
- Generative AI
- Classification
- Dataset
- Model
- Cost Function / Objective Function
- Optimizer
- Narrow Artificial Intelligence
- Turing Test
- Artificial Super Intelligence
- Supervised Learning
- Unsupervised Learning
- Self-Supervised Learning
- Dataset
- Sample = Observation or Instance
- Features = commonly refers to just the input features, but sometimes can refer to outputs as well
- Label = generally refers to the actual label on a sample, not the output of the model
- Prediction
- Classical Machine Learning
- Linear Algebra
- Vectors, Matrices, and Tensors
- N-Dimensional Space
- Google Colab
- Jupyter Notebook
- Graphics Processing Unit (GPU)
- AI Accelerator
- Neural Processing Unit (NPU)
- Ground Truth
- MNIST
- Node / Neuron
- Weight, Bias, Parameter
- Edge / Connection
- Neural Network
- Input Layer
- Output Layer
- Hidden Layer
- Deep Learning
- Deep Neural Network
- Dense Neural Network / Fully-Connected Neural Network
- Network Capacity
- Hyperparameter
- Overfitting
- Underfitting
- One-Hot Encoding
- Epoch
- Batch
- Overfitting
- Training, Validation, and Testing Datasets
   ○ You should be able to describe the distinct roles these three datasets play in machine learning development and why we need all three of them.
- Underfitting
- Gradient Descent
- Sigmoid, Tanh, ReLU, Leaky ReLU
- Softmax / Activation Function
- Minibatch, Stochastic Gradient Descent, Full-Batch Gradient Descent
- Gradients
   ○ For our class, it's okay to think of the gradient as the equivalent of a slope. However, for more advanced students, keep in mind that we typically aren't working in a 2D space. Instead, we're operating in an n-dimensional space, where each parameter on the node we're working on adds an extra dimension. The gradient is an n-dimensional vector that points in the direction of the steepest increase in the loss value for a given set of parameter values in this n-dimensional space (technically, the gradient is the normal to the tangent hyperplane at the point defined by the current parameter values). If we hold all other parameters constant and focus on the effect of changing just one parameter, we simplify the problem to a 2D space that represents the changing parameter on one axis and the loss value on the other. This is the graph we've been studying in class, where we analyze the slope of the tangent line to guide parameter updates.
- Gradient Descent
- Backpropagation
- Loss
- Forward Pass / Forward Propagation
- Backward Pass / Backpropagation
   ○ For our class, don't worry about the nuances between "Pass" and "Propagation"; if you're going to remember just two terms, I recommend Forward Pass and Backpropagation.
- Vanishing Gradients
- Exploding Gradients
- The Curse of Dimensionality
   ○ Just having a general understanding that increasing features causes problems is sufficient for most students. Those actually planning to go into the field would do well to focus on the note on the slide introducing The Curse of Dimensionality, which provides further information. We'll see some examples of feature/sample-size mismatches in the next lecture.
- Concept Drift
   ○ This was actually briefly touched on near the end of Lecture 1, where examples of situations it occurs in were briefly considered. We'll see some potential consequences of assuming no concept drift occurs next lecture. Logically, though, it belongs as part of something that should be considered during development and deployment.
- Graphics Processing Unit (GPU)
- Structured and Unstructured Data
- Missing Data
   ○ For our class, you don't need to memorize these specific characteristics, but you should recognize them and, once seen, be able to recall what they mean. In addition, you should be prepared to come up with a short list of different data characteristics and what their consequences might be for development or deployment.
- Top-K Choice Accuracy
- Feature Engineering
   ○ Feature Engineering is particularly common for Classical Machine Learning techniques. Students planning to work in environments where such techniques might be used should definitely have this term memorized. The rest of you should recognize it and have some sense of what it means.
- Central Processing Unit (CPU)
- AI Accelerator
- Neural Processing Unit (NPU)
- Automation Bias
- Algorithmic Aversion
- Convolutional Neural Network (CNN)
- Filter / Kernel
- Receptive Field
- Feature Map
- Convolution
   ○ You don't need to remember the technical definition of Convolution. Just remember that it's the basic operation we perform when applying the kernel/filter to an input map, generating a new feature map.
- Data Augmentation
- Fine Tuning
   ○ Taking an existing network and tweaking it by training it with data specific to your task.
- Transfer Learning
- RGB
- Greyscale (also spelled Grayscale)
- Color Channel
   ○ RGB, Greyscale, and Color Channel aren't actually artificial intelligence terms; they are general computer science terms. You should definitely learn RGB and Greyscale if you intend to work with images on computers. Color Channel is a bit more technical, but it does show up in both machine learning and general image-manipulation software such as Photoshop.
- Translational Invariance, Rotational Invariance, Scaling Invariance
- Pooling
Homework Review

- Overfitting
- Fill in more…

ChatGPT answers

1. What is Artificial Intelligence?
   AI is the simulation of human intelligence processes by machines, particularly computers.
2. What is the difference between Artificial General Intelligence (AGI) and Narrow Artificial Intelligence?
   AGI is a system capable of performing any intellectual task a human can, whereas Narrow AI specializes in specific tasks.
3. What is the difference between using a model in Training Mode vs Inference Mode?
   Training Mode involves learning from data and adjusting model parameters, while Inference Mode uses a trained model to make predictions on new data.
4. What is the difference between Training and Production?
   Training is the process of teaching a model using data, while Production involves using the trained model in real-world applications.
5. What is the relationship between Artificial Intelligence, Machine Learning, Neural Networks, and Deep Learning?
   AI is the broader field; Machine Learning is a subset of AI, Neural Networks are a type of model used in ML, and Deep Learning is a subset of ML focused on large neural networks.
6. What is the difference between Symbolic AI and Subsymbolic AI?
   Symbolic AI uses explicit rules and logic, while Subsymbolic AI (e.g., neural networks) learns patterns from data without explicit rules.
7. Why is Machine Learning critical to modern AI?
   Machine Learning enables AI systems to automatically improve from experience, making them adaptable and powerful.
8. What is the difference between predictive, descriptive, and generative AI systems?
   Predictive AI forecasts future outcomes, descriptive AI analyzes and interprets data, and generative AI creates new data or content.
9. What is the difference between Supervised Learning and Unsupervised Learning?
   Supervised Learning uses labeled data to train models, while Unsupervised Learning identifies patterns in unlabeled data.
10. What is Self-Supervised Learning, and why do we use it?
   Self-Supervised Learning generates its own labels from the data, reducing the need for manual labeling.
11. What is the relationship between Datasets, Samples, Features, and Labels?
   A dataset consists of samples, each containing features (input variables) and labels (output variables).
12. What are the advantages of Classical Machine Learning methods over Neural Network-based methods?
   Classical ML methods are often simpler, easier to interpret, and require less data than Neural Networks.
13. What three mathematical specialties form the foundation for Machine Learning?
   Linear Algebra, Calculus, and Probability & Statistics are the core mathematical foundations for Machine Learning.
14. What is Linear Algebra?
   Linear Algebra is the branch of mathematics dealing with vectors, matrices, and systems of linear equations.
15. What is a Tensor?
   A Tensor is a multi-dimensional array, generalizing matrices to higher dimensions, commonly used in deep learning.
16. If I'm working on a housing machine learning model, what does representing each sample as a point in n-dimensional space mean, and what would be the value of n?
   It means each sample is represented by a vector with n dimensions, where n is the sum of features and labels (6 in this case).
17. If I say that a model maps a point in 3-dimensional space to a point in 2-dimensional space, what does that mean?
   It means the model reduces three features into two labels, so each sample has 3 features and 2 labels.
18. What is a Jupyter Notebook and why is it used?
   A Jupyter Notebook is an interactive document for running code and visualizing data, widely used in data science and machine learning.
19. What is Google Colab and why is it used?
   Google Colab is a cloud-based platform for running Jupyter notebooks, offering free access to GPUs and TPUs for machine learning tasks.
20. What are some of the advantages of neural-network-based machine learning over classical machine learning?
   Neural networks can automatically extract complex features from raw data, especially in tasks like image recognition and natural language processing.
21. What does the term "parameters" refer to when discussing Neural Networks—for example, what does "ChatGPT 3.5 has 175 billion parameters" mean?
   Parameters are the weights and biases in a neural network that are learned during training to optimize the model's predictions.
22. What are weights and biases and how are they used to determine the output of a node in a neural network?
   Weights are values that adjust the input to each node, and biases are constants added to the weighted sum to help the model learn.
23. What is the primary purpose of Activation Functions in neural networks? What happens if we remove them and just use a network without Activation Functions?
   Activation functions introduce non-linearity, enabling neural networks to model complex patterns; removing them would make the network equivalent to a linear model.
24. What does it mean to have a Deep Neural Network? What benefits does having a Deep Neural Network provide?
   A Deep Neural Network has many hidden layers, allowing it to learn hierarchical representations of data and perform complex tasks.
25. What is the key characteristic that defines a Dense or Fully-Connected Neural Network?
   Every node in a layer is connected to every node in the subsequent layer.
26. What kind of functions can a neural network simulate?
   A neural network can simulate any continuous function, including complex non-linear relationships.
27. What is the meaning and relationship between the following terms when used with Neural Networks: Deeper, Higher, Lower?
   Deeper refers to more layers in the network, while Higher and Lower refer to the positions of layers within the network (higher is closer to the output, lower is closer to the input).
28. What happens when our network has a high capacity, but we have a small dataset?
   The model may overfit, memorizing the training data rather than generalizing to new data.
29. How do the number of features in each sample and the number of samples in a dataset affect a network's ability to model the data?
   More features can improve model performance, but insufficient samples may lead to overfitting, and more samples help the model generalize better.
30. What do I mean when I say: "input order is an illusion for dense neural networks"?
   Dense neural networks do not rely on the order of inputs because each input is processed independently by each node.
31. What situations is One-Hot Encoding used for and what benefit does it provide?
   One-Hot Encoding is used for categorical data, converting categories into binary vectors to be used in machine learning models.
32. Why are activation functions (such as Sigmoid and Softmax) sometimes placed on output nodes in neural networks?
   They are used to scale the output values to a desired range, like probabilities for classification tasks.
33. What's going on in this diagram: what happens each time a batch of data is passed through the training process?
   The model processes the batch, computes the loss, adjusts weights through backpropagation, and updates parameters.
34. What is an Epoch and what is its relationship to a Batch?
   An epoch is one complete pass through the entire training dataset, and each batch is a subset of data processed in one step during an epoch.
35. What is the difference between Stochastic Gradient Descent, Minibatch Gradient Descent, and Full Batch Gradient Descent?
   SGD updates parameters after each sample, Minibatch uses small batches, and Full Batch uses the entire dataset for each update.
36. What does Stochastic refer to in the context of Machine Learning batches?
   Stochastic refers to using a single data sample at a time for parameter updates, introducing randomness to the training process.
37. What are common causes of Overfitting?
   Overfitting occurs when a model learns noise or irrelevant patterns in the training data due to too much complexity or insufficient data.
38. Why do we need a Validation Dataset when we have already split our data into Training and Testing Datasets?
   The validation dataset helps tune hyperparameters and prevent overfitting, ensuring the model generalizes well to unseen data.
39. How does our Validation Dataset help prevent Overfitting?
   It provides an intermediate check for the model's performance, allowing adjustments before testing on unseen data.
40. Why don't we plot input vs. output graphs for real-world systems in production?
   In real-world systems, data relationships are complex, and we focus more on optimizing model performance through metrics rather than visualizing raw inputs and outputs.
41. What is the relationship between the following graphs: input vs. output, parameter vs. output, parameter vs. loss?
   The parameter vs. loss graph helps visualize model training and parameter optimization, whereas input vs. output is less useful for fine-tuning.
42. What is the difference between gradient and gradient descent?
   The gradient is the vector pointing in the direction of the steepest increase in loss, while gradient descent is the optimization process that moves in the opposite direction to minimize loss.
43. What is the relationship between epochs and batches? What is the relationship between batches and the gradient descent process?
   Epochs represent complete passes through the dataset, and batches are subsets of the dataset processed during each update in gradient descent.
44. What ultimately causes the gradient descent process to stop?
   Gradient descent stops when the loss no longer improves or when a predefined number of epochs or convergence criteria is met.
45. What is the difference between Loss and Accuracy?
   Loss measures how far the model's predictions are from actual values, while accuracy measures the percentage of correct predictions.
46. What is the relationship between the Forward Pass and the Backward Pass used in a Neural Network?
   The Forward Pass computes the output, and the Backward Pass updates the weights based on the error calculated during the forward pass.
47. During the Training Process, what is the difference between how a Training Sample would be processed and how a Validation Sample would be processed?
   Training samples are used to update model parameters, while validation samples help evaluate the model's performance without influencing parameters.
48. Walk through the training process: what happens for each Epoch, Batch, and Sample?
   In each epoch, batches are processed; for each sample, the model's predictions are compared to the true values, gradients are computed, and weights are updated.
49. What causes Vanishing Gradients? What are the implications of having Vanishing Gradients occur?
   Vanishing gradients occur when gradients become too small for effective weight updates, leading to slow or stalled learning in deep networks.
50. List seven steps for AI system development and categorize each.
   Pre-development: define problem, gather data; Development: train model, validate model; Deployment: monitor performance, update model, maintain system.
51. Discuss data variations and their consequences for development.
   Structured data is easier to work with, unstructured data is more complex, and labeling unstructured data is costly but necessary for training.
52. What ethical issues should be considered when evaluating a potential data source?
   Issues include bias, privacy, consent, and the potential harm to individuals or communities.
53. What should be kept in mind when developing an evaluation protocol?
   The evaluation should be fair, unbiased, and reflect the real-world performance of the model.
54. How can unscrupulous developers inflate top-k accuracy?
   By choosing top-k thresholds that exaggerate the model's true performance, misleading stakeholders about the model's capability.
55. What two steps might we perform to prepare data for neural networks?
   Data normalization and handling missing values are common preprocessing steps for neural networks.
56. What steps do we take to choose a model?
   Evaluate various models, test their performance, and select based on accuracy, efficiency, and suitability for the problem.
57. What steps improve model performance during deployment?
   Optimize model efficiency, monitor its performance, and retrain when necessary to adapt to new data.
58. What is the difference between how a CPU and a GPU execute instructions?
   CPUs are optimized for single-threaded performance, while GPUs are designed for parallel processing tasks like matrix calculations.
59. Why are GPUs and AI Accelerators advantageous for Machine Learning?
   They provide massive parallel processing capabilities, speeding up the training of large models significantly.
60. Give an example of irreducible error.
   Weather prediction might have irreducible error due to unpredictable atmospheric variables.
61. What is the relationship between chaotic effects and predictions?
   Chaos theory suggests that small changes in input can lead to unpredictable outputs, making accurate individual predictions difficult but still feasible for large populations.
62. Why did the Fragile Families Challenge fail from an ML perspective?
   The challenge faced difficulties due to complex and noisy data with many variables, making predictions unreliable.
63. Why are Geopolitical Events difficult for AI to predict?
   Geopolitical events involve complex, unpredictable human behavior and a wide range of influencing factors, making them difficult to model effectively.
64. Are there national or international events that might be solvable with AI?
   Certain events, like natural disasters or economic trends, could be modeled with AI based on historical data and patterns.
65. What does it mean when we say a Neural-Network-based AI system is opaque?
   It means the model's decision-making process is not easily interpretable, making it difficult to understand how it reaches conclusions.
66. Describe a situation where opponents might game an AI system.
   In online gaming, opponents might exploit weaknesses in a cheat-detection AI to gain an unfair advantage.
67. What are the benefits of purpose-built neural networks over dense neural networks?
   Purpose-built networks are optimized for specific tasks (e.g., CNNs for images), leading to better performance and efficiency than general-purpose networks.
68. What are three Computer Vision tasks?
   Object detection, image classification, and facial recognition are common Computer Vision tasks.
69. Discuss a problematic use of Computer Vision.
   Surveillance for facial recognition can be problematic due to privacy concerns and the potential for discrimination or bias.
70. How does convolution solve the problem of translational invariance?
   Convolution applies the same filter across the entire image, allowing the network to recognize objects regardless of position.
71. What determines the meaning of the different filters/kernels in CNNs?
   Filters in CNNs are learned automatically from data, rather than being set by human programmers.
72. What are the benefits of pooling in CNNs?
   Pooling reduces dimensionality, making the network more computationally efficient while maintaining important features.
73. What are the benefits of Fine-Tuning or Transfer Learning?
   Fine-tuning adjusts pre-trained models for new tasks, saving time and resources, and Transfer Learning uses knowledge from one domain to improve performance in another.
74. Here are the definitions for each term:

### **Core AI and Machine Learning Concepts**

1. **Artificial General Intelligence (AGI)** – A type of AI that can understand, learn, and apply knowledge across a wide range of tasks at a human-like level. Unlike narrow AI, AGI can generalize learning to new, unseen situations.
2. **Machine Learning** – A subset of AI that enables computers to learn from and make predictions or decisions based on data, without being explicitly programmed.
3. **Inference** – The process of using a trained machine learning model to make predictions or classifications on new data.
4. **Training** – The process of teaching a machine learning model by feeding it data and adjusting its parameters to minimize error.
5. **Production** – The stage in machine learning where a trained model is deployed and used for real-world inference tasks.
6. **Predictive AI** – AI that uses historical data to predict future outcomes (e.g., sales forecasting).
7. **Descriptive AI** – AI that analyzes past data to provide insights and explanations (e.g., customer segmentation).
8. **Generative AI** – AI that creates new content, such as text, images, or audio, based on learned patterns (e.g., ChatGPT, DALL·E).
9. **Classification** – A type of machine learning task where data is assigned to predefined categories (e.g., spam vs. non-spam emails).
10. **Dataset** – A structured collection of data used for training, validating, and testing machine learning models.
11. **Model** – A mathematical structure trained on data to make predictions or decisions.

### **Mathematical and Optimization Concepts**

12. **Cost Function / Objective Function** – A function that measures the error between predicted and actual values, guiding optimization.
13. **Optimizer** – An algorithm that adjusts a model's parameters to minimize the cost function (e.g., Adam, SGD).
14. **Narrow Artificial Intelligence** – AI specialized in a single task or domain (e.g., chess-playing AI).
15. **Turing Test** – A test to determine if a machine can exhibit human-like intelligence through conversation.
16. **Artificial Super Intelligence (ASI)** – A hypothetical AI surpassing human intelligence across all domains.

### **Machine Learning Approaches**

17. **Supervised Learning** – A learning approach where models are trained on labeled data (input-output pairs).
18. **Unsupervised Learning** – A learning approach where models identify patterns in data without labeled outputs.
19. **Self-Supervised Learning** – A variant of supervised learning where models generate their own labels from raw data.

### **Data Terminology**

20. **Sample (Observation or Instance)** – A single data point in a dataset.
21. **Features** – Input variables used to make predictions.
22. **Label** – The actual output or classification assigned to a sample.
23. **Prediction** – The output generated by a model when given an input.

### **Machine Learning Fundamentals**

24. **Classical Machine Learning** – Traditional statistical and algorithmic approaches to ML (e.g., decision trees, SVMs).
25. **Linear Algebra** – The mathematical foundation of machine learning involving vectors, matrices, and tensors.
26. **Vectors, Matrices, and Tensors** – Multi-dimensional arrays used for storing and manipulating data in ML models.
27. **N-Dimensional Space** – A space with multiple dimensions where data and model parameters exist.

### **Tools & Hardware**

28. **Google Colab** – A cloud-based Jupyter Notebook environment for running ML experiments.
29. **Jupyter Notebook** – An open-source interactive computing environment for writing and executing code.
30. **Graphics Processing Unit (GPU)** – Specialized hardware that accelerates machine learning computations.
31. **AI Accelerator** – Specialized hardware designed to speed up AI computations (e.g., TPUs, NPUs).
32. **Neural Processing Unit (NPU)** – A processor optimized for neural network computations.

### **Data Concepts**

33. **Ground Truth** – The actual, correct labels used to evaluate model performance.
34. **MNIST** – A famous dataset of handwritten digits used for image classification.
35. **Structured Data** – Data organized in a defined schema (e.g., spreadsheets, databases).
36. **Unstructured Data** – Data without a predefined format (e.g., images, videos, raw text).
37. **Missing Data** – Data points that are absent or incomplete in a dataset.

### **Neural Network Components**

38. **Node / Neuron** – A fundamental unit in a neural network that processes input and produces output.
39. **Weight, Bias, Parameter** – The learnable factors in a neural network that determine its output.
40. **Edge / Connection** – The link between neurons, carrying weighted information.
41. **Neural Network** – A network of interconnected neurons used to model complex data patterns.
42. **Input Layer** – The first layer of a neural network, where data is fed in.
43. **Output Layer** – The final layer of a neural network, producing the model's prediction.
44. **Hidden Layer** – Layers between the input and output layers that transform data using learned parameters.
45. **Deep Learning** – A subset of ML using deep neural networks with multiple hidden layers.
46. **Deep Neural Network (DNN)** – A neural network with many hidden layers.
47. **Dense Neural Network / Fully-Connected Neural Network** – A neural network where each neuron is connected to all neurons in the next layer.
48. **Network Capacity** – The complexity of a neural network, determined by the number of neurons and layers.
49. **Hyperparameter** – A model setting that must be tuned manually (e.g., learning rate, batch size).

### **Training & Evaluation**

50. **Overfitting** – When a model learns noise from training data, reducing generalization to new data.
51. **Underfitting** – When a model is too simple and fails to capture patterns in data.
52. **One-Hot Encoding** – A technique for representing categorical data as binary vectors.
53. **Epoch** – One complete pass through the training dataset during training.
54. **Batch** – A subset of data used for updating model parameters in an iteration.
55. **Training, Validation, and Testing Datasets** –
    - **Training Set** – Used to train the model.
    - **Validation Set** – Used to tune hyperparameters and prevent overfitting.
    - **Testing Set** – Used to evaluate model performance on unseen data.

### **Optimization & Learning Algorithms**

56. **Gradient Descent** – An optimization algorithm that adjusts model parameters to minimize loss.
57. **Sigmoid, Tanh, ReLU, Leaky ReLU** – Activation functions that introduce non-linearity in neural networks.
58. **Softmax / Activation Function** – A function that converts model outputs into probabilities.
59. **Minibatch, Stochastic Gradient Descent (SGD), Full-Batch Gradient Descent** – Variants of gradient descent with different batch sizes.
60. **Gradients** – The rate of change of the loss function with respect to model parameters.
61. **Backpropagation** – A method for updating model weights using gradients.
62. **Loss** – The error measure used to update model parameters.
63. **Forward Pass / Forward Propagation** – The process of passing input through a network to generate predictions.
64. **Backward Pass / Backpropagation** – The process of computing gradients and updating weights.
65. **Vanishing Gradients** – When gradients become too small, preventing deep networks from learning.
66. **Exploding Gradients** – When gradients become too large, leading to instability in training.

### **Challenges in Machine Learning**

67. **The Curse of Dimensionality** – The problem where increasing features makes learning harder.
68. **Concept Drift** – When the relationship between input data and output labels changes over time.

### **Evaluation Metrics & Engineering**

69. **Top-K Choice Accuracy** – A metric that measures if the correct answer is in the top K predictions.
70. **Feature Engineering** – The process of creating new input features to improve model performance.

### **Advanced Neural Networks**

71. **Convolutional Neural Network (CNN)** – A deep learning model designed for processing image data.
72. **Filter / Kernel** – A small matrix applied to an image to detect patterns.
73. **Receptive Field** – The area of an input image that a neuron in a CNN responds to.
74. **Feature Map** – The output of applying filters to an input image.
75. **Convolution** – The process of applying a kernel to extract features from an image.
76. **Data Augmentation** – Techniques for artificially expanding a dataset (e.g., flipping images).
77. **Fine-Tuning** – Adapting a pre-trained model to a new task.
78. **Transfer Learning** – Using a pre-trained model for a new, related task.
79. **Translational, Rotational, Scaling Invariance** – Properties that make CNNs robust to transformations.
80. **Pooling** – A downsampling technique used in CNNs to reduce dimensionality.
