Flashcards: Deep Learning (PDF)
Summary
This document is a set of flashcards about deep learning. It's organized around questions and answers, with the first part focusing on general concepts of neural networks and training methods. It also details convolutional layers and their usage.
Flashcards: Deep learning (NeuraleNetworksDeepLearning)

General Concepts of Neural Networks

Question: What is the inspiration behind artificial neural networks?
Answer: Artificial neural networks are inspired by how some cells in the human brain function and connect, though they are a highly simplified model.

Question: What is the function of weights in a neuron?
Answer: Weights determine the influence of each input on the neuron's output. Adjusting the weights changes the function the neuron performs.

Question: What is an activation function, and why is it used?
Answer: An activation function normalizes the weighted sum of inputs to a range, often between 0 and 1, to ensure the output is interpretable and bounded.

Training Neural Networks

Question: What does supervised training of neural networks involve?
Answer: It involves using a large dataset of input-output pairs to adjust weights such that the network approximates the desired outputs for given inputs.

Question: Define "epoch" in the context of training neural networks.
Answer: An epoch is one complete pass of the training dataset through the network.

Question: What is overtraining in neural networks?
Answer: Overtraining occurs when the model learns to perform very well on the training data but fails to generalize to new, unseen data.

Question: What is the learning rate, and why is it important in training?
Answer: The learning rate is a hyperparameter that controls how much the weights are updated during training. Too high a learning rate can cause divergence, while too low a rate can slow down training.

Question: What is the difference between training, validation, and test data?
Answer:
Training data: used to adjust the model's weights during training.
Validation data: used to monitor performance and decide when to stop training.
Test data: used to evaluate the model's final performance on unseen data.

Question: What is a batch in neural network training?
Answer: A batch is a subset of the training data used to calculate updates to the model's weights before applying them.

Question: What is backpropagation?
Answer: Backpropagation is an algorithm that propagates the error gradient backward through the network to update weights and minimize the loss.

Question: What is the loss function?
Answer: The loss function quantifies the difference between the predicted output and the true output. The goal of training is to minimize this value.

Convolutional Layers

Question: What is the purpose of convolutional layers in deep learning?
Answer: Convolutional layers are designed to extract features such as edges, textures, and patterns from input data, particularly images, by applying filters over small regions of the data.

Question: What is a convolution filter?
Answer: A convolution filter is a small matrix (e.g., 3×3 or 5×5) that scans over an image to detect specific features, like edges or textures, by performing element-wise multiplication with pixel values.

Question: What is a feature map in a convolutional layer?
Answer: A feature map is the output produced by applying a filter to an input image, highlighting regions of interest or specific features detected by the filter.

Question: What happens to convolutional filters during training?
Answer: During training, the weights of the filters are adjusted via backpropagation to minimize the loss function, allowing the filters to detect increasingly relevant features.
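To make the filter-and-feature-map idea concrete, here is a minimal numpy sketch of a 3×3 edge-detecting filter sliding over a tiny grayscale image. The image and filter values are invented for illustration, not taken from the flashcards.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image (stride 1, no padding) and
    return the resulting feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Element-wise multiply the image patch with the filter and sum.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny image with a vertical edge (dark left half, bright right half).
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# A classic vertical-edge filter (Sobel-like); values are illustrative.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

print(convolve2d(image, kernel))  # High responses where the edge sits.
```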
Question: How does the learning rate affect training in convolutional layers?
Answer: The learning rate determines the size of updates made to filter weights. A small learning rate may lead to slow convergence, while a large learning rate can cause instability or failure to converge.

Question: What is stride in the context of convolutional layers?
Answer: Stride refers to the step size with which the filter moves across the input image. A larger stride reduces the output size and computational cost but may lose fine details.

Question: What is padding, and why is it used in convolutional layers?
Answer: Padding adds extra pixels (usually zeros) around the edges of an image to preserve the spatial dimensions of the input after convolution.

Question: How does pooling complement convolutional layers?
Answer: Pooling layers reduce the spatial dimensions of feature maps, retaining the most important information while reducing computational complexity and the risk of overfitting. (A Keras sketch of stride, padding, and pooling follows this deck's keyword list.)

Advanced Concepts

Question: What is transfer learning in convolutional neural networks?
Answer: Transfer learning reuses convolutional layers from a pre-trained model, allowing only task-specific layers to be trained, which drastically reduces training time.

Question: How is segmentation performed using convolutional networks?
Answer: Segmentation involves training a network to produce masks from images, where desired objects are highlighted (e.g., identifying tumors in medical imaging).

Question: What is the role of hyperparameters in training neural networks?
Answer: Hyperparameters, such as batch size, learning rate, and number of epochs, control how the training process is conducted and influence the performance of the model.

Keywords
Batch size: Number of training examples processed at once.
Epoch: A complete pass through the training dataset.
Loss function: A measure of the error between predicted and true labels.
Backpropagation: Algorithm for propagating weight adjustments backward through the network.
Gradient descent: Optimization algorithm for minimizing the loss function.
Learning rate: A hyperparameter controlling the size of weight updates.
Validation data: Data used to monitor and evaluate the model during training.
Test data: Data used for final model evaluation on unseen examples.
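As referenced above, a minimal Keras sketch showing how stride, padding, and pooling appear as layer parameters. The filter counts and input shape are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 64, 3))           # assumed 64×64 RGB input
x = layers.Conv2D(16, kernel_size=3, strides=1,   # stride 1: scan every position
                  padding="same",                  # zero-padding keeps 64×64 output
                  activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)            # pooling halves spatial size: 32×32
x = layers.Conv2D(32, kernel_size=3, strides=2,    # stride 2: halves the size again
                  padding="valid",                 # no padding: borders are trimmed
                  activation="relu")(x)
model = keras.Model(inputs, x)
model.summary()  # shows how stride, padding, and pooling change the shapes
```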
Intro to deep learning: Data, populations and samples

General Concepts of Data

Question: What is the starting point for machine learning?
Answer: Data is the starting point for machine learning, as it emerges from reality and provides insights into the underlying processes or populations.

Question: What are the three main types of data?
Answer:
1. Structured data: organized in relational databases or single relations.
2. Unstructured data: includes images, sound files, and plain text.
3. Semi-structured data: includes formats like HTML documents or Wikipedia pages.

Question: How can unstructured data be used in machine learning?
Answer: Unstructured data can be pre-processed into structured forms, making it usable in machine learning tasks.

Populations and Samples

Question: What is a population in machine learning?
Answer: A population is a potentially infinite set of tuples (or vectors) representing data, with fixed attributes or features.

Question: What is a sample in machine learning?
Answer: A sample is a finite subset of a population used for analysis, and it must be representative of the population to ensure reliable learning outcomes.

Question: What does it mean for a sample to be representative?
Answer: A representative sample respects the inherent probability distributions of the population, ensuring that observed attributes and relationships reflect those of the population.

Question: What is a biased sample?
Answer: A sample is biased if certain attribute values or combinations are overrepresented, distorting the true probabilities of the population.

Question: When is a biased sample desirable?
Answer: In tasks like cancer detection, where balanced samples (e.g., equal numbers of "cancer" and "no cancer" images) can improve the model's ability to distinguish between classes, even though such samples are biased.

Attributes and Features

Question: What are attributes or features in machine learning?
Answer: Attributes or features are the components of a data tuple, representing specific characteristics or values (e.g., pixel colors, labels, or categorical data).

Question: Why are features important in machine learning?
Answer: Features summarize raw data into meaningful properties, improving the efficiency and accuracy of learning algorithms.

Question: What are some ways features can be identified?
Answer:
Using domain-specific algorithms for feature extraction (e.g., identifying facial features in face recognition).
Learning features directly from data using machine learning, especially deep learning.

Examples

Question: Give an example of a population in image processing.
Answer: Photos of 1000 × 1000 pixels in RGB format, such as:
Portraits taken with smartphones.
Portraits labeled with the identity of the portrayed person.
Skin photos labeled with "cancer" or "no cancer."

Question: How can features help in text analysis?
Answer: Features can encode text as vectors, with attributes representing the presence or absence of words (e.g., one-hot encoding) or concepts (e.g., "means of transportation"). (See the encoding sketch after this deck.)

Question: What is an example of adding sentiment as a feature in text analysis?
Answer: A message like "that d*** car broke down again" might have a feature value of 1 for anger, while "have a nice day" would have a value of 0.

Key Terms
Structured data: Data in organized forms like tables.
Unstructured data: Data such as images or text with no inherent structure.
Semi-structured data: Data with some organizational framework, like HTML.
Population: The entire set of data tuples representing the reality of interest.
Sample: A finite subset of the population used for learning or analysis.
Bias: An overrepresentation of certain features or values in a sample.
Features: Extracted or derived data properties relevant to the learning task.
One-hot encoding: A representation where each attribute is binary (0 or 1), indicating presence or absence.
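As referenced above, a minimal sketch of the one-hot word encoding described in this deck; the vocabulary and messages are made-up examples.

```python
# Encode each message as a binary vector: one component per vocabulary word,
# 1 if the word occurs in the message, 0 otherwise.
vocabulary = ["car", "broke", "nice", "day"]   # assumed toy vocabulary

def one_hot_encode(message, vocabulary):
    words = message.lower().split()
    return [1 if word in words else 0 for word in vocabulary]

print(one_hot_encode("that car broke down again", vocabulary))  # [1, 1, 0, 0]
print(one_hot_encode("have a nice day", vocabulary))            # [0, 0, 1, 1]
```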
Models and errors

General Concepts of Models

Question: What is a model in machine learning?
Answer: A model is a description of data, specifically a population, extracted from sample data. It approximates the "true" properties of the population and is often referred to as a hypothesis.

Question: Why are models considered approximations of "true" models?
Answer: Models are based on finite samples, which are only fractional subsets of the entire population, and may include noise, bias, or other limitations.

Question: What is the hypothesis space?
Answer: The hypothesis space (or model space) is the set of all possible models defined by a specific parameterization. For example, in a linear model y = ax + b, the hypothesis space includes all possible combinations of a and b.

Linear Models and Hypothesis Testing

Question: How is a linear model defined for a population?
Answer: A linear model expresses a functional dependency, such as y = ax + b, where a and b are parameters that determine the relationship between x and y.

Question: What is the role of parameters a and b in a linear model?
Answer: Parameters a and b define the slope and intercept of the linear relationship, determining how y changes with x.

Question: What is the purpose of linear regression?
Answer: Linear regression finds the optimal values of a and b that minimize the error between the model's predictions and the observed data.

Measuring Model Fit

Question: How is the error of a model calculated for a dataset?
Answer: The error is often calculated as the sum of squared differences between the observed y values and the predicted y values: Error = Σ (yᵢ − (a·xᵢ + b))², summed over all n data points.

Question: What is the sum of squared differences, and why is it used in machine learning?
Answer: The sum of squared differences is a way to measure how well a model fits the data. It calculates the total error by taking the difference between the observed values (the actual data) and the predicted values (what the model estimates). Each difference is squared so that negative and positive errors do not cancel each other out. The result is a single number that represents the total error across all data points. This method is used because it emphasizes larger errors, making it easier to identify and reduce significant discrepancies during model optimization.

Question: What does a smaller error indicate about a model?
Answer: A smaller error indicates a better fit between the model and the observed data in the sample.

Question: What is the difference between the sum of errors and the average error?
Answer: The sum of errors represents the total discrepancy, while the average error divides this by the number of data points to give a per-sample measure.

Question: Why is a perfect error (zero) not always ideal?
Answer: A perfect error of zero may indicate overfitting, where the model captures noise or biases specific to the sample rather than generalizing to the population.

Optimal Models

Question: What does argmin mean in the context of model optimization?
Answer: argmin finds the parameters a* and b* that minimize the error function, yielding the optimal model.

Question: What are optimal parameter values for linear regression?
Answer: Optimal values of a and b are those that minimize the sum of squared errors for the dataset.

Question: How does the complexity of a model influence the required sample size?
Answer: More complex models with more parameters require larger sample sizes to accurately fit the data and avoid overfitting.

Advanced Concepts

Question: Can linear regression be extended to higher dimensions?
Answer: Yes, linear regression can generalize to higher dimensions, fitting planes (e.g., y = ax + bx′ + c) or other hyperplanes in multidimensional spaces.

Question: How can non-linear functions be addressed with linear regression?
Answer: Non-linear functions can sometimes be transformed into linear ones, such as using log y for exponential relationships.

Examples

Question: In a linear model y = 0.5x + 0.5, what happens if the data points deviate from the line?
Answer: Deviations are considered noise or errors, and the sum of squared differences measures the overall fit.

Question: Why might a sample of 100 points be sufficient for a simple linear model?
Answer: Simple linear models with few parameters (e.g., a and b) require fewer samples to achieve accurate approximations of the population.

Key Terms
Model: A mathematical representation of data relationships.
Hypothesis space: The set of all possible models defined by a given parameterization.
Parameters: Constants (e.g., a and b) in the model that define its behavior.
Error function: A measure of how well a model fits the observed data.
Linear regression: A method for finding the best-fitting straight line in a dataset.
Noise: Random variations in data that do not follow the underlying trend.
Overfitting: A model that performs well on the training data but fails to generalize to unseen data.
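A short numpy sketch of the procedure just described: fit y = ax + b by least squares and report the sum of squared errors. The data points are invented for illustration.

```python
import numpy as np

# Invented sample: points scattered around the line y = 0.5x + 0.5.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.4, 1.1, 1.4, 2.1, 2.4])

# Least-squares fit of a line: np.polyfit returns the a* and b* that
# minimize the sum of squared differences.
a, b = np.polyfit(x, y, deg=1)

predictions = a * x + b
sse = np.sum((y - predictions) ** 2)  # Error = Σ (yᵢ − (a·xᵢ + b))²

print(f"a* = {a:.3f}, b* = {b:.3f}, sum of squared errors = {sse:.4f}")
```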
Application of Models Produced by Machine Learning

General Uses of Models

Question: Why do we extract models from data in machine learning?
Answer: Models allow us to make decisions and predictions based on data. They are used for tasks like determining membership in a population, predicting missing values, identifying unusual data points (outliers), and generating new data.

Membership Test and Classification

Question: What is a membership test in machine learning?
Answer: A membership test checks if a given data point fits the description of a population as defined by a model. It determines how closely the data point matches the patterns or characteristics the model has learned.

Question: How is membership testing related to classification?
Answer: Membership testing often involves categorizing a data point into one of two groups: member or non-member. In classification, this concept is extended to multiple categories.

Question: How might a membership test apply in medical diagnostics?
Answer: For example, a model trained on images labeled as "cancer" or "no cancer" might assess whether a new image fits into either category. The result helps decide the most likely label or whether the data is unsuitable for classification.

Question: What do small or large deviations from a model's expectations indicate?
Answer:
Small deviation: the data likely belongs to the population.
Large deviation: the data is less likely to belong, or the recording conditions or labeling may be inconsistent with the model.

Prediction

Question: What does prediction mean in machine learning?
Answer: Prediction involves estimating missing parts of data based on patterns learned by the model. For example, if part of a tuple is known, the model predicts the missing values to fit the expected pattern.

Question: How is prediction different from classification?
Answer: Prediction typically involves estimating continuous or numeric values, while classification assigns data to specific categories or labels.

Question: Give an example of prediction in practice.
Answer: In genomics, a model might predict which parts of a DNA sequence correspond to genes, providing probabilities for each subsequence.

Outlier Analysis

Question: What is an outlier in machine learning?
Answer: An outlier is a data point that significantly deviates from the general pattern of the dataset. It appears unusual or unexpected compared to the population.

Question: What are two ways outliers can be interpreted?
Answer:
1. Noise: the outlier may represent irrelevant data or errors, which should be removed to refine the model.
2. Interesting phenomena: the outlier might indicate a significant discovery, such as a new scientific finding or an unusual transaction.

Question: How can outliers be handled in regression models?
Answer: By first identifying and removing data points with high deviations, then re-calculating the model to better fit the remaining data. (A sketch of this two-pass procedure follows.)
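A minimal numpy sketch, on invented data, of the two-pass approach just described: fit, drop the points with the largest residuals, refit. The 2-standard-deviation threshold is an assumption chosen for illustration.

```python
import numpy as np

# Invented data: points on y = 2x, plus one planted extreme outlier.
x = np.arange(10, dtype=float)
y = 2 * x
y[5] = 40.0  # the outlier (the true value would be 10)

# First pass: fit a line and measure each point's deviation (residual).
a, b = np.polyfit(x, y, deg=1)
residuals = np.abs(y - (a * x + b))

# Keep points whose deviation is below the threshold.
keep = residuals < 2 * residuals.std()

# Second pass: re-calculate the model on the remaining data.
a2, b2 = np.polyfit(x[keep], y[keep], deg=1)
print(f"before: y = {a:.2f}x + {b:.2f}; after: y = {a2:.2f}x + {b2:.2f}")
```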
Generative Models

Question: What is a generative model in machine learning?
Answer: A generative model creates new data points that mimic the patterns of the original data. These models are useful for generating synthetic training or testing data.

Question: How can generative models help when data is scarce?
Answer: Generative models can produce artificial data to supplement small training or testing datasets, ensuring the model has enough information to learn effectively.

Question: Provide an example of a generative model in practice.
Answer: A model trained on genomic sequences could generate artificial DNA sequences to test how well different gene-prediction algorithms perform on unseen data.

Question: What advanced applications do generative models enable?
Answer: Generative models can create entirely new images, transform existing ones, or simulate realistic scenarios, which are widely used in deep learning applications.

Advanced Concepts

Question: What happens if both possible classifications for a data point seem plausible?
Answer: This may indicate that the model is insufficient or that the data point is particularly challenging to classify. Further investigation or alternative tests may be required.

Question: Why is it important to distinguish between false positives and false negatives?
Answer: In critical applications like medical diagnosis, false positives can lead to unnecessary treatments, while false negatives can miss serious conditions. The different consequences make careful evaluation essential.

Question: How do generative models relate to deep learning?
Answer: In deep learning, generative models are used to create or transform data, such as generating realistic images or enhancing datasets for specific tasks.

Key Terms
Membership test: Determines whether a data point fits the population described by a model.
Classification: Assigns a data point to a specific category or label.
Prediction: Estimates missing data points based on patterns in the model.
Outlier: A data point that significantly deviates from the dataset's pattern.
Generative model: A model used to create new data points by mimicking the original data.

Terminology: Machine Learning Problems, Tasks, Algorithms, and Methodologies

Machine Learning Problems

Question: What is a machine learning problem?
Answer: A machine learning problem involves using a sample of data and a model class (with parameters) to determine parameter values that produce a model satisfying specific quality criteria.

Question: Why might we need to change the model class during training?
Answer: If the chosen model class cannot meet the quality criteria or generalize well to new data, we may need to try a different class of models.

Learning Algorithms

Question: What is a learning algorithm?
Answer: A learning algorithm takes a dataset and a model class and produces a model with parameter values that aim to minimize error on the training data.

Question: Why can't learning algorithms always guarantee the best model?
Answer: For complex models, the algorithm may only find an approximation rather than the parameters that produce the smallest error, due to factors like model complexity or computational limits.
Machine Learning Tasks

Question: What are the main components of a machine learning task?
Answer:
1. Identify the domain and collect a sufficiently large sample.
2. Ensure the sample is unbiased or has controlled bias for the task.
3. Define quality criteria (e.g., accuracy, generalizability).
4. Select and train a model class.
5. Choose the best model for the task.

Question: What is controlled bias, and when is it desirable?
Answer: Controlled bias occurs when a sample is intentionally adjusted for specific purposes, such as ensuring equal numbers of examples for different categories (e.g., balanced "cancer" and "no cancer" labels).

Quality Criteria for Models

Question: How do we measure the quality of a model?
Answer: By splitting the dataset into training and validation sets, we measure the error on both parts. A good model has low training and validation errors of similar size, indicating it can generalize well to unseen data.

Question: What does it mean if the validation error is much higher than the training error?
Answer: This indicates overfitting, where the model memorizes the training data but fails to generalize to new data.

Question: What is underfitting?
Answer: Underfitting occurs when the model performs poorly even on the training data, often because the model class is too simple, unsuitable, or lacks sufficient data to learn from.

Overfitting

Question: What causes overfitting?
Answer: Overfitting can occur when:
1. The model class has too many parameters.
2. The training dataset is too small.
3. Irrelevant features in the data dominate the learning process.

Question: How can overfitting be addressed?
Answer:
1. Use a simpler model with fewer parameters.
2. Identify and remove irrelevant or redundant features.
3. Collect more training data.

Question: Provide an example of an overfitted model.
Answer: A model that memorizes every training example, such as returning the specific label for each input in the training set, will perform perfectly on training data but fail on new inputs.

Underfitting

Question: What causes underfitting?
Answer: Underfitting happens when:
1. The model class is too simple (e.g., linear models for non-linear problems).
2. The dataset lacks sufficient information to learn the task.
3. The task is poorly defined.

Question: How can underfitting be addressed?
Answer:
1. Use a more complex or appropriate model class.
2. Ensure the dataset contains enough relevant information for the task.

Training and Validation

Question: What is the purpose of splitting data into training and validation sets?
Answer: Splitting data helps assess whether the model generalizes well. The training set is used to fit the model, while the validation set checks performance on unseen data. (A Keras sketch of such a split follows this section.)

Question: Why is it useful to try multiple training-validation splits?
Answer: Repeated splits ensure that the model's performance is consistent and not dependent on a specific split of the data.

Examples and Practices

Question: Why is balancing training data important for tasks like classification?
Answer: Imbalanced data, such as far more "no cancer" than "cancer" examples, can lead to biased models that fail to detect minority cases effectively.

Question: What is a simple way to identify overfitting or underfitting?
Answer:
Overfitting: low training error but high validation error.
Underfitting: high training error and high validation error.
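As referenced above, a minimal Keras sketch of a training/validation split used to watch for overfitting. The model, the random data, and the 20% split fraction are illustrative assumptions; random data will not learn anything meaningful, but the mechanics are the same.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Invented data: 1000 samples with 20 features, binary labels.
x = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hold out 20% of the data as a validation set. Comparing loss to val_loss
# per epoch reveals overfitting (val_loss rising while loss keeps falling).
history = model.fit(x, y, epochs=5, batch_size=32, validation_split=0.2)
print(history.history["loss"], history.history["val_loss"])
```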
Question: Why might a high training error indicate a poorly chosen model class?
Answer: The model class may lack the complexity needed to capture the underlying patterns in the data (e.g., using linear models for non-linear relationships).

Key Terms
Learning algorithm: An algorithm that finds the best parameters for a model given a dataset.
Training data: Data used to fit the model.
Validation data: Data used to test the model's ability to generalize.
Overfitting: When a model learns the training data too well, failing to generalize to new data.
Underfitting: When a model fails to capture the patterns even in the training data.
Quality criteria: Measures like accuracy or error rates that determine how well a model fits the data.

Measuring the Quality of a Trained Model for Classification (focusing on explanations instead of math)

Precision

Question: What does precision measure in a classification model?
Answer: Precision measures how trustworthy the model's predictions are for a specific class. It tells you the proportion of predictions for a class that are actually correct.

Question: What does a low precision score indicate?
Answer: A low precision score indicates that the model often misclassifies other classes as the target class (many false positives).

Recall

Question: What does recall measure in a classification model?
Answer: Recall measures the model's ability to identify all the actual instances of a specific class. It tells you the proportion of true instances that the model correctly detects.

Question: What does a low recall score indicate?
Answer: A low recall score means the model fails to identify many true instances of the class (many false negatives).

F-Measure

Question: What is the F-measure, and why is it used?
Answer: The F-measure combines precision and recall into a single metric. It provides a balanced score that reflects both how accurate and how comprehensive the model is for a class. It ranges from 0 to 1, with 1 being perfect.

Question: Why does the F-measure "punish" low precision or recall scores?
Answer: Because it uses the harmonic mean, a low value in either precision or recall has a significant impact on the overall F-measure, emphasizing the importance of both metrics.

Confusion Matrix

Question: What is a confusion matrix?
Answer: A confusion matrix summarizes the model's performance by showing how many instances of each class were correctly or incorrectly classified. The diagonal represents correct predictions, while other cells show errors.

Question: How do you interpret a confusion matrix?
Answer: Larger numbers along the diagonal indicate better performance. Non-diagonal cells represent misclassifications, with larger numbers showing specific areas where the model struggles.

Question: Why is the confusion matrix useful in classification tasks?
Answer: It provides a detailed breakdown of errors for each class, helping to identify specific weaknesses in the model. (A worked sketch of these metrics follows.)
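A small numpy sketch computing precision, recall, and F-measure from an invented binary confusion matrix; the counts are made up for illustration.

```python
import numpy as np

# Invented binary confusion matrix: rows = true class, columns = predicted.
#                  predicted: positive  negative
confusion = np.array([[40, 10],    # true positive class: 40 TP, 10 FN
                      [ 5, 45]])   # true negative class:  5 FP, 45 TN

tp, fn = confusion[0, 0], confusion[0, 1]
fp, tn = confusion[1, 0], confusion[1, 1]

precision = tp / (tp + fp)   # how trustworthy "positive" predictions are
recall    = tp / (tp + fn)   # how many true positives were found
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
# precision=0.89 recall=0.80 F=0.84
```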
Medical Applications: Sensitivity and Specificity

Question: What is sensitivity, and why is it important in medical diagnosis?
Answer: Sensitivity, also called the true positive rate, measures the proportion of actual positive cases correctly identified by the model. It is crucial for ensuring the model detects all cases of a disease.

Question: What is specificity, and why is it important in medical diagnosis?
Answer: Specificity measures the proportion of actual negative cases correctly identified. It ensures the model avoids misclassifying healthy individuals as diseased.

Question: Why is sensitivity often prioritized in critical medical applications?
Answer: Missing a true positive (a false negative) can have severe consequences, such as failing to diagnose a life-threatening condition.

Accuracy

Question: What does accuracy measure in classification?
Answer: Accuracy measures the proportion of all instances (across all classes) that the model correctly classifies.

Question: Why can accuracy be misleading in imbalanced datasets?
Answer: In datasets where one class dominates, a high accuracy might simply reflect the model's ability to predict the majority class correctly, ignoring minority classes.

Limitations of Quality Metrics

Question: Why can metrics like precision, recall, and accuracy be problematic?
Answer: These metrics depend on the test sample size, which is often chosen arbitrarily. Without robust statistical methods, the reported metrics may not reliably represent the model's true performance.

Question: Why does the lack of statistical rigor pose a challenge in machine learning?
Answer: Unlike traditional statistical modeling, machine learning often relies on intuition to determine if results are "good enough" or if datasets are "large enough," leading to uncertainty about the reliability of reported metrics.

Examples

Question: In a medical test for a disease, what do false positives and false negatives represent?
Answer:
False positives: healthy individuals misclassified as having the disease.
False negatives: diseased individuals misclassified as healthy.

Question: What does a perfect confusion matrix look like?
Answer: A perfect confusion matrix has non-zero values only along the diagonal, with zeros in all other cells, indicating no misclassifications.

Key Terms
Precision: Trustworthiness of a class prediction.
Recall: Proportion of actual class instances identified.
F-measure: Balance between precision and recall.
Confusion matrix: A table showing correct and incorrect predictions for each class.
Sensitivity: The model's ability to identify true positives.
Specificity: The model's ability to identify true negatives.
Accuracy: Overall proportion of correct predictions.

A Sketch of Iterative Learning Methods

General Concepts

Question: What is iterative learning in machine learning?
Answer: Iterative learning is a step-by-step process to find the best parameter values for a model by gradually reducing the error. It is essential when there is no direct formula to compute the solution, as is common in complex models like neural networks.

Question: Why is iterative learning necessary for complex models?
Answer: Complex models have many parameters, and their error surfaces can't be fully calculated due to high computational costs. Iterative methods allow us to approximate the best parameter values without directly solving for them.

Error Surface

Question: What is an error surface in machine learning?
Answer: An error surface represents how the model's error changes as parameters are adjusted. It is like a 3D landscape where the height represents error: peaks are high error, and valleys are low error. The goal is to find the deepest valley (minimum error). (A sketch of computing such a surface follows.)

Question: What does the global minimum on an error surface represent?
Answer: The global minimum is the point on the error surface where the model has the smallest possible error, meaning the parameter values at this point make the model perform its best.
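A minimal numpy sketch of an error surface for the two-parameter linear model used earlier: evaluate the sum of squared errors over a grid of (a, b) values and locate the lowest point. The data and grid ranges are invented.

```python
import numpy as np

# Invented data near y = 0.5x + 0.5.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.6, 0.9, 1.6, 1.9])

# Evaluate the error surface: one SSE value per (a, b) grid point.
a_values = np.linspace(-1, 2, 61)
b_values = np.linspace(-1, 2, 61)
surface = np.array([[np.sum((y - (a * x + b)) ** 2) for b in b_values]
                    for a in a_values])

# The grid point with the lowest error approximates the global minimum.
i, j = np.unravel_index(np.argmin(surface), surface.shape)
print(f"lowest error {surface[i, j]:.3f} at a≈{a_values[i]:.2f}, b≈{b_values[j]:.2f}")
```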
Question: What challenges arise with error surfaces for complex models?
Answer: The surface can have:
Local minima: small valleys that aren't the global minimum, leading to suboptimal solutions.
Flat plateaus: regions where the error hardly changes, making it difficult for the algorithm to decide how to move forward.

Iterative Algorithm

Question: How does an iterative algorithm mimic finding the minimum error?
Answer: It is like rolling a ball down the error surface. The algorithm evaluates the error at nearby points and moves in the direction where the error decreases the most. It repeats this process until it reaches a point where no further improvement is possible.

Question: What happens if the step size (ε) is too large or too small?
Answer:
Too large: the algorithm jumps over good solutions, missing narrow valleys where the error is low.
Too small: the algorithm moves very slowly, taking a long time to converge to the minimum error.

Question: How can iterative algorithms handle local minima?
Answer: By:
Running the algorithm multiple times with different random starting points, increasing the chances of finding the global minimum (see the sketch after this deck's key terms).
Dynamically adjusting the step size to explore the surface more effectively.

Refinements in Neural Networks

Question: How is backpropagation in neural networks different from basic iterative learning?
Answer: Backpropagation doesn't calculate the total error for the entire dataset in every step. Instead, it updates the parameters based on one data point (or a small batch) at a time, making it faster and more efficient for large datasets.

Question: What is the role of Δp in backpropagation?
Answer: Δp is the adjustment to the current parameter values. It is calculated to reduce the error for a specific data point. By making small, targeted adjustments, the algorithm gradually improves the model.

Challenges with High-Dimensional Spaces

Question: Why is high-dimensional parameter space challenging in iterative learning?
Answer: In high dimensions, there are exponentially more potential solutions, and the error surface can have countless local minima. This makes it harder to find the global minimum, as the algorithm might get stuck in suboptimal regions.

Question: How does dimensionality affect the need for starting points?
Answer: The more dimensions (parameters) a model has, the more starting points are needed to ensure the algorithm explores the surface adequately and finds the global minimum.

Practical Considerations

Question: How does dynamically adjusting the step size help in iterative learning?
Answer: Adjusting the step size allows the algorithm to take large steps in flat areas to speed up progress and smaller steps in steep regions to avoid overshooting the minimum.

Question: Why are stopping criteria important in iterative algorithms?
Answer: Stopping criteria ensure the algorithm stops when further improvements are negligible. This prevents wasting time on unnecessary computations when the solution is already good enough.

Key Terms
Error surface: A representation of how error changes as model parameters are adjusted.
Global minimum: The best possible parameter values, where the error is lowest.
Local minimum: Suboptimal parameter values where error is low but not the lowest.
Step size (ε): The amount the parameters are adjusted in each iteration; it affects the algorithm's speed and accuracy.
Backpropagation: An iterative learning method in neural networks that updates parameters based on individual data points to improve efficiency.
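As referenced above, a numpy sketch of gradient descent with a fixed step size ε and multiple random starting points, on an invented one-dimensional error function that has more than one minimum.

```python
import numpy as np

# Invented bumpy error function with two local minima.
def error(p):
    return p**4 - 3 * p**2 + p

def gradient(p):
    return 4 * p**3 - 6 * p + 1

def gradient_descent(start, step_size=0.01, steps=500):
    p = start
    for _ in range(steps):
        p -= step_size * gradient(p)  # move downhill by ε times the slope
    return p

# Restart from several random points and keep the best result found.
rng = np.random.default_rng(0)
candidates = [gradient_descent(s) for s in rng.uniform(-3, 3, size=10)]
best = min(candidates, key=error)
print(f"best parameter ≈ {best:.3f}, error ≈ {error(best):.3f}")
```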
Supervised Learning

Question: What is supervised learning in machine learning?
Answer: Supervised learning involves training a model with labeled data, where each data point includes an input and a known output (e.g., a category or a prediction target). The model learns to map inputs to outputs based on these examples.

Question: Why is supervised learning like having a teacher?
Answer: In supervised learning, the system (student) learns from a dataset (provided by a teacher) that explicitly tells it the correct answer for each data point. This feedback allows the model to improve its predictions.

Question: Provide an example of supervised learning.
Answer: A dataset of annotated photos with labels like "cow" or "grand piano" teaches a model to classify images based on features such as the number of legs. For instance:
4 legs: likely a cow.
3 legs: likely a grand piano.

Unsupervised Learning

Question: What is unsupervised learning in machine learning?
Answer: Unsupervised learning involves analyzing data without explicit labels. The goal is to identify hidden patterns or groupings (clusters) that naturally exist in the data.

Question: What is clustering in unsupervised learning?
Answer: Clustering groups data points into clusters based on similarity. For example, Martian researchers might group humans into clusters like "men," "women," and "children" based on body shape and size, without knowing these labels.

Question: How does k-means clustering work?
Answer: k-means clustering assigns data points to k clusters, each represented by a centroid. The algorithm minimizes the distance between data points and their respective centroids, iteratively refining the clusters. (See the sketch following the supervised/unsupervised comparison below.)

Question: What are the limitations of k-means clustering?
Answer:
The developer must choose k (the number of clusters) in advance.
It works best with evenly distributed, compact groups of data and may fail with irregularly shaped clusters.

Feature Detection

Question: How does feature detection improve data analysis?
Answer: Feature detection identifies key attributes or patterns in data that are more informative than the raw input. These features can simplify the data while retaining essential information.

Question: How can clustering be used to detect features?
Answer: Clustering can identify groups in data, such as regions with high text message traffic in a city. The region number becomes a feature that replaces raw location data, simplifying further analysis.

Question: Provide an example of feature detection in text message analysis.
Answer: A police dataset initially includes GPS coordinates and raw text. By clustering, regions are identified, and GPS data is replaced with region numbers. Later, features like "language," "mentions of drugs," or "mentions of weapons" are added for more targeted analysis.

Differences Between Supervised and Unsupervised Learning

Question: What is the key difference between supervised and unsupervised learning?
Answer:
Supervised learning: uses labeled data to train models for prediction or classification.
Unsupervised learning: analyzes unlabeled data to discover patterns or structures, like clusters.

Question: Can unsupervised learning support supervised learning?
Answer: Yes, unsupervised learning can generate features (like cluster labels) that can be used as inputs for supervised learning tasks.
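As referenced above, a minimal numpy implementation of the k-means loop: assign points to the nearest centroid, then move each centroid to the mean of its points. The two-dimensional points and k = 2 are invented for illustration.

```python
import numpy as np

def k_means(points, k, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start with k randomly chosen data points as centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[labels == c].mean(axis=0) for c in range(k)])
    return labels, centroids

# Two invented compact groups of 2D points.
points = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                   [5.0, 5.1], [5.2, 4.9], [4.9, 5.2]])
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```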
Deep Learning and Automatic Feature Detection

Question: How does deep learning handle feature detection differently?
Answer: Deep learning automates feature detection, extracting increasingly abstract features from raw data through multiple layers, without requiring manual intervention.

Question: Provide an example of deep learning in image analysis.
Answer: Deep learning models can process raw pixel values to identify edges, shapes, and objects in images, gradually building higher-level features like "faces" or "cars" through layered processing.

Applications of Feature Detection

Question: How is feature detection used in law enforcement?
Answer: Law enforcement can use feature detection to identify suspicious behaviors or locations, such as clustering phone messages to find high-activity regions or analyzing messages for mentions of drugs or weapons.

Question: Why are features important for machine learning models?
Answer: Features simplify data, retain essential information, and make it easier for models to detect patterns and make accurate predictions.

Key Terms
Supervised learning: Training a model with labeled data.
Unsupervised learning: Discovering patterns in unlabeled data.
Clustering: Grouping data points based on similarity.
Feature detection: Identifying important attributes in data.
k-means clustering: A method to group data into k clusters by minimizing distances to cluster centroids.
Deep learning: Automatically extracts features through layers of abstraction.

Powerpoints: Flashcards for the Slideshow Content

Overview of Deep Learning

Question: What are some key goals of deep learning projects?
Answer: Key goals include classification, prediction, and generation of data or models.

Question: What is a typical workflow for a deep learning project?
Answer: A typical workflow includes defining the task, collecting data, preprocessing data, selecting a model type, training the model, and evaluating performance.

Key Concepts in Machine Learning

Question: What is the difference between a population and a sample?
Answer: A population is the imagined superset of all possible data related to a task, while a sample is a manageable subset of data taken from the population.

Question: Why is data regularity important in machine learning?
Answer: Regularity ensures that data has inherent patterns and properties that models can learn to generalize for tasks like classification or prediction.

Linear Regression and Models

Question: What is the goal of linear regression in machine learning?
Answer: The goal is to find the best parameter values for a linear model that minimize the error between predicted and actual values.

Question: How are outliers interpreted in machine learning?
Answer: Outliers can be noise (errors in data) or significant anomalies, such as new phenomena or potential issues requiring further investigation.

Supervised Learning

Question: What is supervised learning?
Answer: Supervised learning uses labeled data where each data point has an input and a known output. The model learns to map inputs to outputs based on this training data.

Question: What are the main steps in a supervised learning workflow?
Answer: Steps include defining the task, collecting and preprocessing data, splitting data into training/validation/test sets, selecting a model, training the model, and evaluating its performance.
Convolutional Neural Networks (CNNs)

Question: What are convolutional layers in CNNs?
Answer: Convolutional layers use filters (small neural networks) to scan input data (e.g., images) for features like edges or textures, producing feature maps.

Question: What are the typical applications of CNNs?
Answer: CNNs are commonly used for classification (e.g., "this is a dog") and segmentation (e.g., "here is the dog in the image").

Question: How does transfer learning improve CNN training?
Answer: Transfer learning reuses pre-trained models (e.g., VGG16) to save time and resources by fine-tuning them for specific tasks.

Neural Network Basics

Question: What is an artificial neuron?
Answer: An artificial neuron is a small computational unit that combines inputs with weights, applies an activation function, and produces an output.

Question: What is the role of weights in neural networks?
Answer: Weights determine how much influence each input has on the neuron's output. Adjusting weights during training helps the network learn.

Question: What are some key terms in neural network training?
Answer:
Epoch: one pass of the entire training dataset through the network.
Learning rate: how much weights are adjusted during training.
Batch: a subset of data used for weight updates in each iteration.

Deep Neural Networks

Question: What makes a neural network "deep"?
Answer: A deep neural network has multiple layers of neurons, enabling it to learn hierarchical and abstract features from data.

Question: Why are modern deep networks so powerful?
Answer: Modern networks leverage faster computers (e.g., GPUs), large datasets, and improved architectures and algorithms to solve complex problems.

Keras/TensorFlow Workflow

Question: How do you define a convolutional layer in Keras?
Answer: Use the Conv2D layer, specifying parameters like the number of filters, kernel size, and activation function. Example:

```python
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
```

Question: What is the role of MaxPooling and Flattening in CNNs?
Answer:
MaxPooling: reduces the size of feature maps while retaining important features.
Flattening: converts feature maps into a 1D array for input to fully connected layers.

Question: What is the function of a dense (fully connected) layer in Keras?
Answer: A dense layer connects every neuron from the previous layer to every neuron in the current layer, making the final predictions. (A full model sketch combining these layers appears after the overfitting questions below.)

Practical Exercises and Tools

Question: What is "Teachable Machine," and how is it used?
Answer: Teachable Machine is an online tool for training simple machine learning models using images, audio, or poses. It provides an easy way to experiment with classification tasks.

Question: Why is practical experimentation important in deep learning?
Answer: Hands-on experimentation allows you to understand how models work, fine-tune them for specific tasks, and identify challenges like overfitting or underfitting.

Challenges and Limitations

Question: What is overfitting in machine learning?
Answer: Overfitting occurs when a model learns the training data too well, including its noise, and fails to generalize to new, unseen data.

Question: What are common strategies to prevent overfitting?
Answer: Use techniques like:
Reducing model complexity (fewer parameters).
Increasing the amount of training data.
Using regularization methods (e.g., dropout).
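As referenced above, a minimal sketch combining the pieces from this deck (Conv2D, MaxPooling, Flatten, Dropout, Dense) into a complete Keras classifier. The input shape and class count are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3))                      # assumed 32×32 RGB input
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)                      # shrink feature maps
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Flatten()(x)                                      # feature maps → 1D vector
x = layers.Dropout(0.5)(x)                                   # regularization against overfitting
outputs = layers.Dense(10, activation="softmax")(x)          # assumed 10 classes

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```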
Question: How do local minima and plateaus affect training?
Answer: Local minima can trap the training process in suboptimal solutions, while plateaus slow down training due to minimal error variation.

Segmentation.pdf

Model Building in Keras

Question: What are the different ways to define models in Keras?
Answer: Models can be defined using the Sequential API, the functional API, or a mix of both for greater flexibility.

Question: What advantage does the functional API offer in Keras?
Answer: The functional API allows for more complex architectures, such as combining multiple layers or creating branching models, enabling flexibility in model design.

Data Augmentation

Question: When should you avoid using data augmentation?
Answer: Avoid using data augmentation for standardized datasets, such as CT scans, where transformations could distort the information.

Question: What are potential benefits and drawbacks of data augmentation?
Answer:
Benefits: it can introduce variations, helping models generalize better.
Drawbacks: it does not add new information and may introduce noise, potentially degrading performance.

Convolutional Layers in Autoencoders

Question: What is the purpose of using convolutional layers in autoencoders?
Answer: Convolutional layers in autoencoders help learn compressed representations of data and reconstruct outputs such as:
Denoising images.
Adding artistic effects.
Colorizing black-and-white images.

Question: What additional layer is unique to autoencoders?
Answer: Conv2DTranspose, which is used to reconstruct the original input from a compressed representation, effectively upsampling the data.

Segmentation

Question: What is image segmentation?
Answer: Image segmentation involves identifying and isolating relevant parts of an image, often by creating a pixel-by-pixel mask to highlight areas of interest.

Question: Why is segmentation important for medical imaging?
Answer: It highlights areas where medical experts should focus, aiding in disease diagnosis and improving classification accuracy by preprocessing images.

Applications of Autoencoders and Segmentation

Question: How can segmentation improve CNN classification?
Answer: By segmenting and cropping relevant areas of an image before passing them to a CNN, the model can focus on critical regions, improving accuracy.

Question: Give an example of an autoencoder application outside of segmentation.
Answer: Autoencoders can be used to add artistic effects to images, such as film grain or stylized transformations.

Terminology

Question: What is a mask in segmentation?
Answer: A mask is a binary or grayscale image where specific areas are marked to indicate regions of interest in the original image.

Exercises and Considerations

Question: How does having three categories in a classification task differ from two?
Answer: The model structure and loss function must adapt to handle the additional category, such as using softmax for multiple classes instead of sigmoid for binary classification.

Question: Why is it important to modify segmentation models for tasks without outline pixels?
Answer: Many real-world segmentation tasks do not include predefined outlines, so models need to be robust enough to handle raw image data.

Flashcards: What is an Autoencoder?

Definition

Question: What is an autoencoder?
Answer: An autoencoder is a type of neural network designed to learn efficient, compressed representations of input data by reconstructing it. (A minimal sketch follows.)
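A minimal Keras sketch of a convolutional autoencoder, assuming 28×28 grayscale inputs; the layer sizes are illustrative, not taken from the flashcards.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))

# Encoder: convolutions + downsampling compress the input.
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)                       # 28×28 → 14×14
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2)(x)                 # 14×14 → 7×7 latent representation

# Decoder: transposed convolutions upsample back to the input size.
x = layers.Conv2DTranspose(8, 3, strides=2, activation="relu", padding="same")(encoded)
x = layers.Conv2DTranspose(16, 3, strides=2, activation="relu", padding="same")(x)
decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(inputs, decoded)
# The reconstruction loss compares the output to the original input.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.summary()
```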
Structure

Question: What are the main components of an autoencoder?
Answer:
Encoder: compresses the input into a latent-space representation.
Latent space: the compressed, low-dimensional representation of the input.
Decoder: reconstructs the input from the latent-space representation.

Question: What layers are typically used in autoencoders?
Answer: Convolutional layers for feature extraction, pooling layers for downsampling, and transposed convolutional layers (e.g., Conv2DTranspose) for reconstruction.

Purpose

Question: What is the primary purpose of an autoencoder?
Answer: To compress data into a smaller representation while retaining essential information, and then reconstruct it as closely as possible to the original input.

Question: Why is the latent space important in an autoencoder?
Answer: It represents the most critical features of the data in a compact form, which can be used for tasks like denoising or feature extraction.

Applications

Question: What are some common applications of autoencoders?
Answer:
Image denoising: removing noise from images.
Dimensionality reduction: reducing features for data analysis.
Anomaly detection: identifying deviations by analyzing reconstruction errors.
Data generation: creating new, similar data points (e.g., images).

Question: How are autoencoders used in segmentation tasks?
Answer: In segmentation, autoencoders take an input image and output a segmented version by reconstructing only the regions of interest, often as a mask.

Variants

Question: What is a variational autoencoder (VAE)?
Answer: A VAE is a type of autoencoder that imposes a probabilistic structure on the latent space, enabling data generation and interpolation.

Question: What makes a denoising autoencoder unique?
Answer: It is trained to reconstruct clean data from noisy input, improving robustness to noise in data.

Limitations

Question: What are the limitations of autoencoders?
Answer:
They may struggle to produce completely accurate reconstructions of complex data.
They cannot generalize well to unseen data if not properly trained.
The latent-space representation may lack interpretability without careful design.

Flashcards: Residual Connections, U-Net, and Segmentation Techniques

Residual Connections

Question: What are residual connections in neural networks?
Answer: Residual connections allow layers to learn residual mappings instead of direct mappings by adding the input of a layer to its output. This helps preserve original information for later stages of the network. (See the sketch after the batch normalization questions below.)

Question: Why are residual connections important?
Answer: They mitigate the vanishing gradient problem, make training deeper networks feasible, and improve performance by maintaining information flow.

Question: What is an intuitive analogy for residual connections?
Answer: They are like adding a hazy version of an image to its sharp version in photography, enhancing both details and overall context.

Question: What is a key requirement for .add() in Keras?
Answer: The arguments passed to .add() must have the same dimensions.

Batch Normalization

Question: What does batch normalization do?
Answer: It normalizes the output of a layer to have a mean of 0 and a standard deviation of 1 for each batch, helping to stabilize and accelerate training.

Question: Where should batch normalization be applied in a network?
Answer: Before the activation function.

Question: How does batch normalization compare to activation functions?
Answer: While both help improve training stability, batch normalization focuses on standardizing outputs, whereas activation functions introduce non-linearity.
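As referenced above, a minimal Keras sketch of a residual block that also follows the "batch normalization before the activation" advice; the filter counts and input shape are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 64))

# Residual branch: conv → batch norm → activation, twice.
x = layers.Conv2D(64, 3, padding="same")(inputs)
x = layers.BatchNormalization()(x)        # normalize before the activation
x = layers.Activation("relu")(x)
x = layers.Conv2D(64, 3, padding="same")(x)
x = layers.BatchNormalization()(x)

# Residual connection: add the block's input to its output.
# Both tensors must have the same dimensions for add() to work.
x = layers.add([inputs, x])
outputs = layers.Activation("relu")(x)

model = keras.Model(inputs, outputs)
model.summary()
```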
Depthwise Separable Convolutions

Question: What are depthwise separable convolutions?
Answer: They are a more efficient version of standard convolutions that separate spatial filtering from depth filtering, reducing computation while maintaining accuracy.

Question: Why might depthwise separable convolutions be used?
Answer: They are computationally lighter and faster, especially for models deployed on resource-constrained devices.

U-Net Architecture

Question: What is a U-Net?
Answer: A neural network architecture designed for image segmentation, particularly in biomedical applications, using an encoder-decoder structure with skip connections.

Question: What role do residual-like principles play in U-Nets?
Answer: Skip connections (shown as gray arrows in the usual U-Net diagram) directly link encoder layers to decoder layers, preserving spatial information and improving segmentation accuracy.

Question: Why is U-Net considered a strong choice for segmentation?
Answer: Its design allows for precise localization and reconstruction, which is critical in tasks like medical image analysis.

Segmentation Techniques

Question: How is segmentation different from classification?
Answer: Segmentation identifies specific regions or pixels of interest in an image, while classification labels the entire image as belonging to a single category.

Question: What are common use cases for segmentation?
Answer:
Medical imaging (e.g., highlighting areas of disease).
Object detection (e.g., segmenting animals in images).
Improving classification by preprocessing segmented regions.

Question: How do U-Nets use convolutional and transposed convolutional layers?
Answer:
Convolutional layers: extract features during encoding.
Transposed convolutional layers: reconstruct spatial dimensions during decoding.

Practical Applications

Question: What are residual connections particularly useful for?
Answer: Enhancing performance in segmentation tasks and improving results when combined with architectures like U-Net.

Question: What result differences were observed with and without residual connections?
Answer: The best results were achieved with residual connections when paired with U-Net.

Question: What is the advantage of U-Nets in segmentation compared to standard CNNs?
Answer: U-Nets preserve spatial information through skip connections, allowing more accurate and context-aware segmentation.

Flashcards: Segmentation Networks and Generative Deep Learning

Segmentation Networks

Question: What is the typical architecture of segmentation networks?
Answer: Segmentation networks often use an encoder-decoder structure, where the encoder extracts features and the decoder reconstructs the segmented image.

Question: Which segmentation network is widely used and successful?
Answer: U-Net and its variations are the most successful segmentation networks, especially for biomedical applications.

Question: What is the difference between add(...) and concatenate(...) in U-Net implementations?
Answer: The original U-Net uses concatenate(...) to combine feature maps, while some implementations, like Chollet's example, use add(...). (A mini U-Net sketch using concatenate follows.)

Question: How can GANs assist segmentation?
Answer: GANs can generate "fake" annotated test images, which are useful for training segmentation models when real annotated data is limited.
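As referenced above, a minimal Keras sketch of a U-Net-style encoder-decoder with one concatenate skip connection. The depth, filter counts, and input shape are illustrative assumptions, far smaller than a real U-Net.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(128, 128, 1))

# Encoder: extract features and downsample.
e1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
p1 = layers.MaxPooling2D(2)(e1)                         # 128 → 64
bottleneck = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)

# Decoder: upsample and merge with the matching encoder features
# via the skip connection (concatenate, as in the original U-Net).
u1 = layers.Conv2DTranspose(16, 3, strides=2, activation="relu",
                            padding="same")(bottleneck)  # 64 → 128
merged = layers.concatenate([u1, e1])                    # skip connection
d1 = layers.Conv2D(16, 3, activation="relu", padding="same")(merged)

# One sigmoid output per pixel: the segmentation mask.
mask = layers.Conv2D(1, 1, activation="sigmoid")(d1)

model = keras.Model(inputs, mask)
model.summary()
```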
Generative Deep Learning

Question: What is generative deep learning?
Answer: Generative deep learning goes beyond classification, prediction, and segmentation by creating new outputs, such as text, images, or translations, often with elements of randomness.

Question: What are common applications of generative deep learning?
Answer:
Language translation (e.g., transformers like ChatGPT).
Image generation (e.g., style transfer, GANs).
Creative outputs with elements learned from training data.

Simplistic Text Generation (n-grams)

Question: How does n-gram-based text generation work?
Answer: It builds a database of word sequences and their probabilities from training text. During generation, it selects the next word based on these probabilities.

Question: What is the role of "strictness" in n-gram generation?
Answer: Strictness modifies the word probabilities: higher strictness favors the most probable words, while lower strictness allows more randomness.

Transformer Models

Question: What are transformer models, and how are they used?
Answer: Transformers, like those in ChatGPT, use contextual word embeddings to generate text or translate language by understanding the relationships between all parts of the input data.

Question: How do transformer models differ from n-gram models?
Answer: Transformers use advanced architectures to understand context and relationships in the input, whereas n-gram models rely on simpler probability-based sequences.

Style Transfer

Question: What is style transfer?
Answer: Style transfer applies the "style" of one image to the "content" of another, creating a hybrid image.

Question: How is style transfer implemented?
Answer: Using convolutional neural networks, it minimizes the difference between the style features of one image and the content features of another.

Question: What algorithm is commonly used for style transfer?
Answer: The algorithm proposed by Gatys, Ecker, and Bethge (2016), often using networks like VGG19.

GANs (Generative Adversarial Networks)

Question: What are GANs, and how do they work?
Answer: GANs involve two models, a generator and a discriminator, working in opposition. The generator creates fake data, and the discriminator learns to distinguish between real and fake data.

Question: What is the goal of the generator in a GAN?
Answer: The generator aims to produce data that the discriminator classifies as real.

Question: What is the discriminator's role in a GAN?
Answer: The discriminator evaluates data and classifies it as real or fake, helping improve the generator.

Summary of Deep Learning Applications

Question: What are the key components of deep learning architectures?
Answer:
Encoding: convolutions and pooling layers.
Decoding: transposed convolutions and reconstruction layers.
Classification: fully connected layers for final decisions.

Question: What are examples of generative tasks in deep learning?
Answer:
Segmentation (e.g., U-Net).
Image improvement (e.g., noise reduction).
Style transfer.
Text/image generation (e.g., ChatGPT, DALL-E).

Flashcards on Deep Learning for Text Processing

Word Embeddings

Question: What is a word embedding?
Answer: A word embedding maps words to vectors where each component represents a feature. These features are machine-generated and not directly interpretable by humans.

Question: Why are word embeddings used instead of one-hot encodings?
Answer: Word embeddings capture relationships between words, reduce the number of parameters, and encode semantic information, unlike one-hot encodings, which lack contextual meaning.
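A small numpy sketch of how distances between word embeddings reflect similarity, using made-up 3-dimensional vectors (real embeddings have hundreds of machine-learned dimensions).

```python
import numpy as np

# Made-up 3-dimensional embeddings; real ones are learned from data.
embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.9, 0.8]),
}

def cosine_similarity(u, v):
    # 1.0 means same direction (similar meaning), 0.0 means unrelated.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: ~0.98
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low:  ~0.08
```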
Transformer Models
Question: What are transformer models, and how are they used?
○ Answer: Transformers, like those in ChatGPT, use contextual word embeddings to generate text or translate language by understanding the relationships between all parts of the input data.
Question: How do transformer models differ from n-gram models?
○ Answer: Transformers use advanced architectures to understand context and relationships in the input, whereas n-gram models rely on simpler probability-based sequences.
Style Transfer
Question: What is style transfer?
○ Answer: Style transfer applies the "style" of one image to the "content" of another, creating a hybrid image.
Question: How is style transfer implemented?
○ Answer: Using convolutional neural networks, it minimizes the difference between the style features of one image and the content features of another.
Question: What algorithm is commonly used for style transfer?
○ Answer: The algorithm proposed by Gatys, Ecker, and Bethge (2016), often using networks like VGG19.
GANs (Generative Adversarial Networks)
Question: What are GANs, and how do they work?
○ Answer: GANs involve two models, a generator and a discriminator, working in opposition. The generator creates fake data, and the discriminator learns to distinguish between real and fake data.
Question: What is the goal of the generator in a GAN?
○ Answer: The generator aims to produce data that the discriminator classifies as real.
Question: What is the discriminator's role in a GAN?
○ Answer: The discriminator evaluates data and classifies it as real or fake, helping improve the generator.
Summary of Deep Learning Applications
Question: What are the key components of deep learning architectures?
○ Answer: Encoding (convolutions and pooling layers), decoding (transposed convolutions and reconstruction layers), and classification (fully connected layers for final decisions).
Question: What are examples of generative tasks in deep learning?
○ Answer: Segmentation (e.g., U-Net), image improvement (e.g., noise reduction), style transfer, and text/image generation (e.g., ChatGPT, DALL-E).
Flashcards on Deep Learning for Text Processing
Word Embeddings
Question: What is a word embedding?
○ Answer: A word embedding maps words to vectors where each component represents a feature. These features are machine-generated and not directly interpretable by humans.
Question: Why are word embeddings used instead of one-hot encodings?
○ Answer: Word embeddings capture relationships between words, reduce the number of parameters, and encode semantic information, unlike one-hot encodings, which lack contextual meaning.
Question: How are word embeddings created?
○ Answer: They are learned by analyzing co-occurrences of words in large datasets or by using pretrained models like GloVe or Word2Vec.
Question: What do the distances between word embeddings signify?
○ Answer: Distances reflect conceptual relationships. Closer embeddings indicate stronger semantic similarity, though context plays a significant role.
Self-Attention Mechanism
Question: What is self-attention?
○ Answer: Self-attention is a mechanism that adjusts word embeddings based on the relationships between words in a given context. It emphasizes relevant words and de-emphasizes irrelevant ones.
Question: How does self-attention work?
○ Answer: Each word is compared with every other word in the sequence to calculate attention scores. These scores are normalized and used to weight the embeddings, producing context-aware representations (see the sketch below).
Question: What is the result of applying self-attention?
○ Answer: The result is a new embedding for each word that reflects its meaning within the specific context of the sentence or document.
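The compare-normalize-reweight steps can be written out in a few lines of NumPy. This is a minimal sketch of scaled dot-product self-attention with random, untrained projection matrices; real transformers learn these projections and add multiple heads, positional information, and further layers.

    # Minimal sketch of (scaled dot-product) self-attention over embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, dim = 4, 8                  # four "words", 8-dim embeddings
    X = rng.normal(size=(seq_len, dim))  # one embedding per word

    W_q = rng.normal(size=(dim, dim))    # query/key/value projections
    W_k = rng.normal(size=(dim, dim))    # (learned in a real model,
    W_v = rng.normal(size=(dim, dim))    #  random here for illustration)

    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Compare every word with every other word: attention scores.
    scores = Q @ K.T / np.sqrt(dim)

    # Normalize the scores to weights with a softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each output is a weighted mix of all value vectors: a context-aware
    # representation of the corresponding word.
    context_aware = weights @ V
    print(context_aware.shape)  # (4, 8): one new embedding per word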
Transformers
Question: What is a transformer model?
○ Answer: A transformer is a deep learning architecture designed for sequence data. It uses self-attention to process input data and is foundational for models like ChatGPT and BERT.
Question: How do transformers generate text?
○ Answer: Transformers generate text one token at a time. They use the context of previously generated tokens and input queries to predict the next token.
Question: Why are transformers significant for NLP?
○ Answer: Transformers handle long-range dependencies, adapt embeddings to context, and outperform traditional RNNs and LSTMs in tasks like translation, summarization, and text generation.
Bias in Word Embeddings and Models
Question: How does bias arise in word embeddings?
○ Answer: Bias arises from the training data, which may reflect societal, cultural, or political biases. These biases can propagate to models like ChatGPT.
Question: How can bias in word embeddings be detected?
○ Answer: Bias can be detected by analyzing embeddings for words or phrases that reveal stereotypes, political leanings, or cultural biases, for example with visualization tools.
Applications of Transformers
Question: What are common applications of transformers?
○ Answer: Applications include language translation, text summarization, sentiment analysis, chatbots and conversational agents, and question-answering systems.
Question: How does translation work with transformers?
○ Answer: Transformers trained on parallel corpora (sentence pairs in two languages) generate translations by understanding and rearranging word order and syntax based on context.
Generative Deep Learning
Question: What is generative deep learning?
○ Answer: Generative deep learning involves creating new data instances, such as text or images, that resemble the training data but are unique.
Question: What is an example of generative deep learning?
○ Answer: Examples include ChatGPT for text generation, GANs for image synthesis, and style transfer for applying the style of one image to another.
Style Transfer
Question: What is style transfer?
○ Answer: Style transfer involves applying the artistic style of one image to the content of another using deep learning techniques.
Question: How is style transfer achieved?
○ Answer: A pretrained convolutional neural network (e.g., VGG19) extracts style and content features. The model optimizes a new image to match the style of one image and the content of another.
GANs (Generative Adversarial Networks)
Question: What is a GAN?
○ Answer: A GAN consists of two models: a generator that creates data and a discriminator that evaluates whether the data is real or generated. They train together to improve data realism.
Question: What are GANs used for?
○ Answer: GANs are used for tasks like image generation, data augmentation, and creating realistic synthetic data for training.
LLMs (Large Language Models)
Question: What are LLMs?
○ Answer: LLMs are large-scale transformer models trained on vast text corpora to generate human-like text, answer questions, and perform complex language tasks.
Question: What are examples of LLMs?
○ Answer: Examples include ChatGPT, Bard, GPT-3, and Azure's language models.
Question: What are some challenges with LLMs?
○ Answer: Challenges include bias in training data, hallucination of facts, and high computational costs.
Q: What is the purpose of a loss function in training?
○ A loss function guides the optimization process by quantifying the difference between predictions and true values. The goal of training is to minimize the loss and improve model performance.
Q: Why are loss functions application-specific?
○ Some inaccuracies are worse than others, depending on the context:
○ False positives: can be critical, e.g., detecting infectious diseases (COVID-19).
○ False negatives: can be disastrous, e.g., missing cancer cases.
○ The right loss function reflects the priorities of the application.
Q: What are common loss functions for semantic segmentation?
○ Textbook recommendation: sparse categorical cross-entropy.
○ Literature favorites:
○ Jaccard index (Intersection over Union): measures overlap between predictions and ground truth.
○ Dice coefficient: balances precision and recall, especially useful for imbalanced datasets.
Q: How is the Jaccard index calculated?
○ Formula (as a loss): J = 1 − Intersection / Union.
○ Example: with 4 overlapping pixels, 8 pixels only in the prediction, and 11 only in the ground truth, J = 1 − 4 / (8 + 4 + 11) ≈ 0.83.
Q: How is the Dice coefficient calculated?
○ Formula (as a loss): D = 1 − (2 × Intersection) / (|A| + |B|), where the denominator counts the intersection twice, i.e., |A| + |B| = Union + Intersection.
○ Example: D = 1 − (2 × 4) / (8 + 2 × 4 + 11) = 1 − 8 / 27 ≈ 0.70.
Q: What should be considered when designing a loss function for semantic segmentation?
○ Whether false negatives (e.g., missed objects) should be penalized more than false positives (e.g., extra objects). The loss function must align with the specific goals of the application.
Q: What was the exercise proposed for loss functions?
○ Adjust the math of a loss function to penalize some errors (e.g., large false negatives) more heavily than others, based on application requirements; one possible approach is sketched below.
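The two worked examples above, and the proposed exercise, can be checked with a short NumPy sketch. The Tversky-style weighting in weighted_loss is one possible way to penalize false negatives more heavily; it is an assumption for illustration, not the exercise's required solution.

    # Minimal sketch of Jaccard and Dice losses on binary masks, plus a
    # weighted variant that penalizes false negatives more heavily.
    import numpy as np

    def jaccard_loss(pred, truth):
        intersection = np.sum(pred * truth)
        union = np.sum(pred) + np.sum(truth) - intersection
        return 1.0 - intersection / union

    def dice_loss(pred, truth):
        intersection = np.sum(pred * truth)
        # Denominator |A| + |B| counts the intersection twice.
        return 1.0 - 2.0 * intersection / (np.sum(pred) + np.sum(truth))

    def weighted_loss(pred, truth, fn_weight=2.0, fp_weight=1.0):
        # Tversky-style variant: false negatives cost more than positives.
        intersection = np.sum(pred * truth)
        false_neg = np.sum((1 - pred) * truth)  # missed object pixels
        false_pos = np.sum(pred * (1 - truth))  # extra predicted pixels
        return 1.0 - intersection / (
            intersection + fn_weight * false_neg + fp_weight * false_pos)

    # Reproduce the flashcard example: 4 overlapping pixels, 8 pixels only
    # in the prediction, 11 pixels only in the ground truth.
    pred = np.array([1] * 12 + [0] * 11)   # |A| = 12  (8 + 4)
    truth = np.array([0] * 8 + [1] * 15)   # |B| = 15  (4 + 11)
    print(round(jaccard_loss(pred, truth), 2))  # ~0.83
    print(round(dice_loss(pred, truth), 2))     # ~0.70

Setting fn_weight above fp_weight makes a missed object pixel cost more than an extra predicted one, which is exactly the kind of application-driven adjustment the exercise asks for.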