Exam Questions ML ENG
Summary
This document contains exam questions on machine learning, covering probability, Bayes' theorem, regression, classification, neural networks, and model evaluation. The questions address core concepts, their applications, and evaluation metrics, with recurring reference to a speech emotion recognition project built on the RAVDESS dataset.
Exam Questions

Probability and Statistics

1. What is the role of probability in machine learning, and how was it used in your project?
○ Probability is used to describe uncertainty in a model's predictions, such as the likelihood of a sound belonging to a particular emotion (happy, angry, sad).
○ In your project, it is implicitly used in classification models like Neural Networks and Random Forest, where probability distributions help make decisions.

2. How is Bayes' theorem used in machine learning? Can you give an example from your project or theory?
○ Bayes' theorem updates probabilities based on new evidence.
○ In your project, it could be applied to adjust the likelihood of an emotion based on prior data, such as if a particular speaker tends to sound sad more often.

3. What does it mean for data to be independent and identically distributed (i.i.d.)? Why is it important?
○ i.i.d. means that data points are independent of each other and follow the same probability distribution.
○ It's important because it ensures the model learns patterns that generalize well. If your dataset (RAVDESS) were not i.i.d., the model could become biased toward specific speakers or emotions.

Regression vs. Classification

4. What is the difference between regression and classification?
○ Regression predicts continuous values (e.g., temperature).
○ Classification predicts categories (e.g., whether a sound is "happy" or "angry"). Your project is a classification problem since it involves emotion classes.

5. Can you explain why regression isn't used for emotion recognition and how classification is used instead?
○ Emotion recognition requires categorizing data (e.g., happy vs. sad), not predicting a numerical value.
○ Classification models like Neural Networks and SVM can predict probabilities for each emotion class.

Neural Networks

6. How does a neural network work, and what are its key components?
○ A neural network consists of:
○ Input layer: Receives features like MFCC.
○ Hidden layers: Process the data and learn patterns.
○ Output layer: Outputs probabilities for each emotion class.

7. Why did you use a feedforward architecture in your neural network?
○ Feedforward architecture means data flows in one direction (input → output).
○ It's suitable for your project because emotion recognition doesn't require memory of previous inputs (unlike RNNs).

8. How can a neural network have multiple inputs and outputs, and why is this important?
○ Multiple inputs: For example, using both MFCC and pitch as features.
○ Multiple outputs: For example, classifying both emotion and intensity.
○ In your project, multiple outputs could classify both the emotion and the speaker's intensity.

Model Training and Evaluation

9. What is an epoch, and why is it important in the training process?
○ An epoch is one complete pass through the entire dataset during training.
○ It's important because multiple epochs allow the model to learn patterns thoroughly.

10. Why do we stop training when validation error stops decreasing?
○ Validation error shows how the model performs on unseen data.
○ If validation error increases while training error decreases, it indicates overfitting.

11. What is the difference between training, validation, and test data? (A minimal splitting sketch follows this section.)
○ Training data: Used to train the model.
○ Validation data: Used to tune the model during training.
○ Test data: Used only after training to evaluate the model's generalization.

12. How do we measure if a model is overfitting?
○ When validation error is significantly higher than training error, or when the model performs poorly on test data.
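As a concrete illustration of these three subsets, here is a minimal sketch using scikit-learn's train_test_split. The library choice, the 70/15/15 ratios, and the synthetic data are all illustrative assumptions, not something the source specifies.

```python
# Minimal sketch: splitting a dataset into training, validation, and test
# subsets with scikit-learn (an assumed library; ratios are illustrative).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 40)            # e.g., 40 audio features per clip
y = np.random.randint(0, 8, size=1000)  # e.g., 8 emotion labels (RAVDESS has 8)

# First split off the test set (15%), then carve a validation set
# (15% of the total) out of the remainder.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.15 / 0.85, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 700 / 150 / 150
```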
Overfitting and Generalization

13. What techniques can be used to prevent overfitting in machine learning models?
○ Regularization, dropout, data augmentation, and early stopping.

14. How do we ensure a model can generalize to new data?
○ By using a large and diverse dataset and avoiding overfitting.

Inference

15. What is inference, and how is it used in your project?
○ Inference is when the model makes predictions based on new input data.
○ In your project, it is used to classify emotions such as happy, angry, etc., from new audio recordings.

Feature Engineering

16. What is MFCC, and why is it an important feature for emotion recognition?
○ MFCC summarizes the most important characteristics of audio, especially those relevant to human speech.
○ It reduces data complexity, making it easier for the model to learn.

Model Evaluation Metrics

17. What is a confusion matrix, and how is it used to evaluate a model?
○ A table showing correct and incorrect predictions for each class.
○ It can help identify specific errors in emotion recognition (e.g., sad being misclassified as angry).

18. How do you calculate precision, recall, and F1-score?
○ Precision: Of the predicted positives, how many were correct?
○ Recall: Of the actual positives, how many were correctly predicted?
○ F1-score: A balance (the harmonic mean) of precision and recall.

Loss Functions

19. What is cross-entropy, and why is it used as a loss function in classification?
○ Cross-entropy measures the difference between the predicted probability distribution and the true distribution.
○ It's used because it penalizes confident, incorrect predictions heavily.

Gradient-Based Optimization

20. What is gradient descent, and how is it used to train neural networks? (See the sketch after this section.)
○ Gradient descent adjusts model weights to minimize the loss.
○ It does this by taking small steps in the direction of steepest descent, i.e., along the negative gradient.

21. What is the learning rate, and why is it important to set it correctly?
○ The learning rate controls the size of the steps taken during gradient descent.
○ If it's too high, the model might overshoot the optimal solution. If it's too low, training becomes very slow.
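To make questions 20 and 21 concrete, here is a minimal gradient-descent sketch in plain NumPy (an assumed dependency), fitting a one-parameter linear model. The data, loss, learning rate, and epoch count are all illustrative.

```python
# Minimal sketch: gradient descent on a one-parameter model y ≈ w * x,
# minimizing mean squared error. NumPy and all values are illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])   # roughly y = 2x

w = 0.0               # initial weight
learning_rate = 0.01  # step size: too high overshoots, too low is slow

for epoch in range(200):                   # each full pass over the data is one epoch
    y_pred = w * x
    loss = np.mean((y_pred - y) ** 2)      # mean squared error
    grad = np.mean(2 * (y_pred - y) * x)   # dLoss/dw
    w -= learning_rate * grad              # step along the negative gradient

print(round(w, 3))  # converges near 2.0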
Chumur Questions

Question 1: Regression outputs a set of probabilities that predict the likelihood that the inputs belong to each possible class or label.
A. True
B. False
Explanation of the question: This question asks whether regression outputs probabilities for categories (labels) like classification does.
Why is the answer B (False)? Regression predicts a continuous value (e.g., a temperature of 23.5 degrees). Classification predicts a probability for each category (e.g., 70% cat, 30% dog).
Example:
Regression: "How tall is this person based on their age and weight?"
Classification: "Is this image a dog or a cat?"
This question tests your ability to distinguish between regression and classification.

Question 2: Neural networks must have one input and one output.
A. True
B. False
Explanation of the question: This question examines whether neural networks are restricted to having only one input and one output.
Why is the answer B (False)? Neural networks are flexible: they can have multiple inputs, such as data from different sensors or images, and multiple outputs, such as predicting both temperature and humidity simultaneously.
Example:
Input: Two sensor data points (temperature and humidity).
Output: Two values (predicted temperature and humidity in an hour).
This question tests your understanding of the structure and flexibility of neural networks.

Question 3: When should you stop training a model?
A. For as long as possible
B. For a particular number of epochs
C. Until the training error stops getting smaller
D. Until the validation error stops getting smaller
Explanation of the question: This question focuses on determining the right point to stop training to avoid overfitting.
Why is the answer D (Until the validation error stops getting smaller)? Training error shows how well the model performs on training data, and focusing only on it can lead to overfitting. Validation error measures how well the model performs on unseen data. When validation error stops decreasing, further training no longer improves the model's generalization.
This question tests your understanding of overfitting and early stopping.

Question 4: The test data set is used to evaluate the model after each epoch.
A. True
B. False
Explanation of the question: This question examines when the test data is used in the training process.
Why is the answer B (False)? The validation set is used to evaluate the model after each epoch during training. The test set is only used after training is complete, to evaluate the model's performance on unseen data.
This question tests your understanding of the difference between validation and test data.

Question 5: An overfit model will perform poorly on which of the following?
A. Training data
B. Validation data
C. Test data
D. Data encountered in the real world after deployment
Explanation of the question: This question addresses the consequences of overfitting.
Why are B, C, and D correct? An overfit model memorizes the training data too well, so it performs well on training data (not A), but performs poorly on validation data, test data, and real-world data because it cannot generalize to unseen examples.
This question tests your understanding of how overfitting impacts a model's performance.

Question 6: Inference is when a model is used to make a prediction with some given input data.
A. True
B. False
Explanation of the question: This question defines inference in machine learning.
Why is the answer A (True)? Inference occurs when the model takes new input data and makes a prediction. Example: a trained model predicts that an image shows a cat.
This question tests your understanding of what inference means in machine learning.

Others

What is Regression?
Regression is a method in machine learning that helps us predict numbers, or values. Think of it as a smart calculator that tries to guess a number based on what it has learned from past examples.
How does regression work? Imagine you want to figure out how tall someone might be based on their age and weight. Regression looks at past data (age, weight, and height) and tries to find a formula that fits the patterns. Then, it uses that formula to predict the height of new people it hasn't seen before.
A fun example: Let's say you want to know how many ice creams will be sold on a hot summer day:
Input (what we know): The temperature outside.
Output (what we want to find): How many ice creams will be sold.
Regression learns from past data (like how many ice creams were sold on previous hot days) and makes a prediction for today; a minimal sketch of exactly this example follows below.
Key points:
Regression predicts numbers like temperature, height, or sales figures.
It's not like "classification," which guesses categories (like dog or cat).
It's used for problems where we're looking for a precise numerical result.
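Here is a minimal sketch of the ice-cream example using scikit-learn's LinearRegression. The library choice and every number in the data are illustrative assumptions.

```python
# Minimal sketch: regression predicting ice cream sales from temperature.
# scikit-learn is an assumed library; the data points are made up.
import numpy as np
from sklearn.linear_model import LinearRegression

temps = np.array([[18.0], [22.0], [26.0], [30.0], [34.0]])  # °C
sales = np.array([40, 70, 105, 135, 170])                   # ice creams sold

model = LinearRegression().fit(temps, sales)   # learn a formula from past days
predicted = model.predict([[28.0]])            # today's temperature
print(round(float(predicted[0])))              # a single number, not a category
```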
What is an epoch?

An epoch in machine learning is one complete pass through the entire dataset during training. Think of it as the model "reading" all the data once to learn patterns from it.

How does it work? If your dataset is very large, it's often split into smaller parts called batches to make the training process more manageable. An epoch means the model has gone through all the batches and seen the entire dataset once.

Training through epochs:
During each epoch, the model processes the data batch by batch, updating its parameters (weights and biases) to improve its performance based on what it has learned.
Training usually requires multiple epochs so the model can refine its understanding and improve its predictions.

A simple analogy: Imagine you're reading a book to learn about animals. One epoch is like reading the entire book once. If you read it again (another epoch), you might remember more details about the animals. The more times you read the book (multiple epochs), the better you understand the information.

Why are epochs important? Multiple epochs allow the model to learn more thoroughly. However, training for too many epochs can lead to overfitting, where the model memorizes the training data instead of learning general patterns.

What's the difference between Neural Networks and Deep Learning?

A neural network is a broader concept in machine learning that mimics how the human brain works. It consists of layers of interconnected "neurons" (nodes) that process and learn patterns in data. Deep learning, on the other hand, is a specialized type of neural network with many layers (hence "deep") that can learn very complex patterns from data.

Neural Networks: Neural networks are a general family of models. They can be simple (with just one or a few layers) or complex. Example: A shallow neural network with only one hidden layer can classify basic patterns, like whether an email is spam or not.

Deep Learning: Deep learning is a subcategory of neural networks. It uses many hidden layers (a deep architecture) to learn complex patterns. Each layer extracts increasingly detailed features from the data. Example: A deep learning model can identify objects in an image by recognizing edges in the first layer, shapes in the next, and entire objects in the final layers.

Simple analogy: Think of neural networks as a family of cars. A shallow neural network is like a simple car – it works well for short, easy trips. Deep learning is like a high-performance race car – it can handle long and complicated journeys, but it needs more fuel (data) and a stronger engine (computational power).

Key differences:

Parameter         | Neural Networks                                        | Deep Learning
Definition        | A general concept for algorithms inspired by the brain. | A specific type of neural network with many layers.
Complexity        | Can be simple or shallow.                              | Always complex, with multiple layers.
Data Requirements | Works with smaller datasets.                           | Requires large datasets to perform well.
Computation       | Faster and less resource-intensive.                    | Requires high computational power (e.g., GPUs).
Use Cases         | Simple tasks like basic classification.                | Complex tasks like image recognition or natural language processing.

Confusion Matrix

A confusion matrix is a tool used to evaluate the performance of a classification model. It shows how well the model is predicting each class by comparing its predictions to the actual values.

How does it work? A confusion matrix is a table with four main values:
True Positives (TP): The model correctly predicted the positive class.
True Negatives (TN): The model correctly predicted the negative class.
False Positives (FP): The model predicted the positive class when it was actually negative (a "false alarm").
False Negatives (FN): The model predicted the negative class when it was actually positive (a "miss").

Why is it useful? A confusion matrix helps you understand where the model is making mistakes, and you can calculate important metrics from it:
Accuracy: The percentage of correct predictions.
Precision: Of all the positive predictions, how many were correct?
Recall (Sensitivity): Of all the actual positives, how many did the model correctly predict?
F1-Score: A balance between precision and recall.

Example: If you have a model predicting whether a sound is "happy" or "sad," the confusion matrix could look like this (a sketch computing these metrics follows):

             | Predicted Happy | Predicted Sad
Actual Happy | 80 (TP)         | 20 (FN)
Actual Sad   | 10 (FP)         | 90 (TN)
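A minimal sketch computing the four metrics from the example table above, in plain Python (the counts come straight from the table):

```python
# Minimal sketch: metrics derived from the example confusion matrix above.
tp, fn = 80, 20   # actual "happy" row
fp, tn = 10, 90   # actual "sad" row

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 170/200 = 0.85
precision = tp / (tp + fp)                                  # 80/90  ≈ 0.889
recall    = tp / (tp + fn)                                  # 80/100 = 0.8
f1        = 2 * precision * recall / (precision + recall)   # ≈ 0.842

print(accuracy, round(precision, 3), recall, round(f1, 3))
```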
Training Subset: The training subset is the data used to teach the model. The model learns patterns and relationships in this data by adjusting its parameters (weights and biases). Example: If your dataset is sounds of different emotions, the training subset contains most of these sounds with their labels (e.g., "happy," "sad").

Testing Subset: The testing subset is the data used to evaluate the model after training is complete. This data is completely separate from the training data, so the model has never seen it before. It helps determine how well the model can generalize to new, unseen data.

Why is this distinction important?
Training subset: The model gets better at recognizing patterns by adjusting its parameters based on this data.
Testing subset: Ensures the model isn't just memorizing the training data (overfitting) but can also perform well on new, real-world data.

A simple analogy: Training data is like studying for a test using a textbook. Testing data is like taking the exam: you don't get to use the textbook, and the questions might be different from what you studied.

What is MFCC (Mel-Frequency Cepstral Coefficients)?

MFCC stands for Mel-Frequency Cepstral Coefficients, and it's a feature extraction technique widely used in audio and speech processing. It transforms raw audio signals into a compact and meaningful representation that highlights the characteristics of human speech or sounds.

How does MFCC work? (A minimal extraction sketch follows this list.)
1. Breaking audio into frames: The audio signal is divided into small time segments, called frames, to analyze the sound over time.
2. Fourier Transform: Each frame is transformed into its frequency components using the Fourier Transform, turning the time-domain signal into a frequency-domain representation.
3. Mel scale filter bank: Frequencies are mapped onto the Mel scale, which mimics how humans perceive sound. Lower frequencies are given more weight, since the human ear is more sensitive to them.
4. Logarithmic scaling: The amplitude of each frequency band is converted to a logarithmic scale, similar to how the human ear perceives loudness.
5. Discrete Cosine Transform (DCT): The final step applies the DCT to compress the information into a smaller set of coefficients, called the MFCCs, which represent the most important features of the sound.
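Here is a minimal extraction sketch using the librosa library (an assumption; the source does not name a tool), which wraps all five steps above in a single call. The file path, sample rate, and coefficient count are illustrative.

```python
# Minimal sketch: extracting MFCCs from an audio clip with librosa
# (an assumed library; the path and parameters are illustrative).
import librosa

y, sr = librosa.load("ravdess_clip.wav", sr=22050)  # waveform + sample rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# Averaging over time gives one compact 13-value feature vector per clip,
# which can be fed to a classifier (Neural Network, Random Forest, SVM).
features = mfcc.mean(axis=1)
print(features.shape)  # (13,)
```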
How is MFCC used in your project?

In your project on emotion recognition, MFCC plays a key role in preprocessing the audio data to extract the most relevant features for classification.

Input representation for machine learning models: The raw audio signal is first processed to generate MFCCs. These coefficients summarize the important characteristics of the audio, such as pitch, tone, and rhythm.

Emotion-related patterns: Different emotions (e.g., happy, sad, angry) have unique vocal characteristics, like changes in pitch and intensity. MFCC captures these subtle variations, making it easier for your machine learning models (e.g., Neural Networks, Random Forest, SVM) to distinguish between emotions.

Dimensionality reduction: Instead of feeding the entire raw audio signal into the model, MFCC reduces the data's complexity, focusing on the features most relevant to emotion classification. This helps the model train faster and perform better.

Why is MFCC effective in your project?
Human perception mimicry: MFCC is designed to reflect how humans perceive sound, making it particularly suitable for tasks like emotion recognition, where a human-like understanding of speech is critical.
Noise robustness: By focusing on key speech features and ignoring less important frequencies, MFCC helps reduce the impact of noise in the audio signal.

Edge AI and IoT Questions

Question: Hundreds of small, embedded IoT devices working to gather and process data is known as "cloud computing."
True / False
Explanation: This is False because the scenario describes edge computing, not cloud computing. In edge computing, data is processed locally, close to where it is collected (e.g., on IoT devices like sensors or cameras). In cloud computing, data is sent to distant data centers for processing.
Kid-friendly explanation: Edge computing is like doing your homework at home instead of sending it far away to a teacher to get it checked.

Question: The "far edge" describes computing equipment like regional servers and cell towers.
True / False
Explanation: This is True because the far edge refers to computing infrastructure that is closer to the user than centralized cloud data centers, but not as close as the IoT devices themselves. Examples of the far edge include regional servers and cell towers.
Kid-friendly explanation: Think of the far edge as a post office between your house and the main post office.

Question: Which of the following is NOT an advantage of edge computing?
A. Reliability with little or no internet connection
B. Additional services and throughput can be easily scaled by purchasing compute time
C. Reduced network latency
D. Lower network bandwidth usage
Explanation: The answer is B. The ability to easily scale services by purchasing additional compute time is an advantage of cloud computing, not edge computing. Key advantages of edge computing include:
○ Functioning without a reliable internet connection.
○ Faster response times (reduced latency).
○ Reducing the amount of data sent over the network (lower bandwidth usage).
Kid-friendly explanation: Edge computing is about doing things close to you, while cloud computing is like asking for help from a big computer far away.

Question: Machine learning (ML) is a subset of artificial intelligence (AI).
True / False
Explanation: This is True. Machine learning (ML) is a part of artificial intelligence (AI). ML focuses on teaching a computer to find patterns in data and make predictions; AI is a broader field that also includes other techniques like rules and logic.
Kid-friendly explanation: ML is like one specific class in a school, while AI is the entire school.

Question: You are designing a system to detect anomalies (outliers). No labels will be assigned to incoming data: you just want to know if a new piece of data is similar to known-good groupings of data. Which type of machine learning would be a good fit to tackle this problem?
A. Supervised learning
B. Regression
C. Unsupervised learning
D. Reinforcement learning
Explanation: The answer is C. This problem fits unsupervised learning, as there are no labels and you are looking for patterns or groupings in the data. Supervised learning requires labels; regression predicts numbers, not groupings; reinforcement learning is about rewards and penalties, which isn't relevant here. A minimal anomaly-detection sketch follows below.
Kid-friendly explanation: Unsupervised learning is like letting the computer figure out the patterns on its own, like finding a shortcut in a maze without help.
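Here is a minimal sketch of this kind of unsupervised anomaly detection using scikit-learn's IsolationForest. The algorithm choice and synthetic data are illustrative assumptions; any clustering- or density-based method would serve the same purpose.

```python
# Minimal sketch: unsupervised anomaly detection with an Isolation Forest.
# scikit-learn is an assumed library; the data is synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
known_good = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # unlabeled "normal" data

detector = IsolationForest(random_state=0).fit(known_good)

new_points = np.array([[0.1, -0.2],   # similar to the known-good grouping
                       [8.0, 9.0]])   # far outside it
print(detector.predict(new_points))   # [ 1 -1 ]: 1 = inlier, -1 = anomaly
```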
Question: You are creating a system that can monitor the health of a conveyor belt and predict the remaining life based on various sensor data: sound, vibration, electrical current. What would be the best approach to solving this problem?
A. Traditional algorithm
B. Machine learning algorithm
C. Cryptographic algorithm
Explanation: The answer is B. Since you have sensor data and matching time-to-failure data, a machine learning algorithm is the best fit. ML can learn complex relationships in the data and predict the remaining life of the conveyor belt.
Kid-friendly explanation: ML is like a super detective that can find patterns and make predictions based on clues.

Question: The process of updating a machine learning model's internal parameters by exposing it to collected, known-good data is known as "prediction serving."
True / False
Explanation: This is False. Updating a model's internal parameters is called training (or retraining). Prediction serving refers to using a trained model to make predictions, not updating its parameters.
Kid-friendly explanation: Training is like learning something new, while prediction serving is like showing what you already know.

Question: A machine learning algorithm running on a smartphone is an example of edge AI.
True / False
Explanation: This is True because edge AI refers to running a model on the device itself (e.g., a smartphone) instead of sending data to the cloud for processing.
Kid-friendly explanation: Edge AI is like doing your homework on your phone instead of asking a computer far away for help.

Question: You are designing a safety device with a camera that looks for human limbs in a work area and automatically shuts off a dangerous machine if such items are identified. Which benefit is the main reason for choosing edge AI for this project?
A. Increased user data privacy
B. Lower network bandwidth usage
C. Less energy usage
D. Reduced network latency
Explanation: The answer is D. The device needs to act in under 10 milliseconds, so reduced network latency is the key benefit. Edge AI processes data locally, allowing for immediate responses.
Kid-friendly explanation: It's like pressing a stop button that works instantly because it doesn't need to ask for permission from a faraway computer.

Question: The smart camera that identifies limbs in a work area needs to locate the position of each limb in the area and count the number of limbs in each image. Which edge AI use case does this fall into?
A. Time-series sensor data
B. Audio time-series data
C. Image classification
D. Object detection
Explanation: The answer is D. This is object detection, as it identifies, locates, and counts objects (limbs) in an image.
Kid-friendly explanation: Object detection is like finding both what and where something is in a picture.

Question: In data processing and cleaning, what does "ETL" stand for?
A. Extract, Transform, Lock
B. Enhance, Train, Lock
C. Extract, Transform, Load
D. Extract, Train, Load
Explanation: The answer is C. ETL stands for the following (a minimal sketch follows below):
1. Extract: Get data from sources.
2. Transform: Clean or adjust the data.
3. Load: Store it in a database.
Kid-friendly explanation: ETL is like making juice: you gather the fruit, squeeze it, and pour it into bottles.
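A minimal ETL sketch using pandas and Python's built-in sqlite3 module. Both tools, the file names, and the cleaning rule are illustrative assumptions.

```python
# Minimal ETL sketch: Extract from a CSV, Transform (clean), Load into SQLite.
# pandas is an assumed library; file names and columns are illustrative.
import sqlite3
import pandas as pd

# Extract: get data from a source file.
df = pd.read_csv("sensor_readings.csv")

# Transform: clean or adjust the data (drop missing rows, convert a unit).
df = df.dropna()
df["temperature_c"] = (df["temperature_f"] - 32) * 5 / 9

# Load: store the result in a database.
with sqlite3.connect("readings.db") as conn:
    df.to_sql("readings", conn, if_exists="replace", index=False)
```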