Questions and Answers
Consider a classification problem with imbalanced data. A classifier that always predicts the majority class achieves 90% accuracy. Which of the following statements is most accurate?
- The classifier is an acceptable starting point and should be further optimized for better performance.
- The classifier is not useful because it does not provide any insight into the minority class. (correct)
- The classifier is performing well, as it achieves high accuracy.
- The classifier is overfitting to the majority class.
Assume you are building a model to predict fraudulent transactions. The dataset is highly imbalanced, with only 2% of transactions being fraudulent. Which evaluation metric is the most appropriate to use?
- F1 Score (correct)
- Precision
- Accuracy
- Recall
How does increasing model complexity typically affect bias and variance?
- Increases both bias and variance
- Decreases both bias and variance
- Decreases bias and increases variance (correct)
- Increases bias and decreases variance
You're using PCA to reduce the dimensionality of your dataset. You notice that the first two principal components explain 95% of the variance. Which of the following is the most reasonable conclusion?
In PCA, what is the significance of the eigenvectors derived from the covariance matrix of the data?
Consider a dataset where you want to predict whether a customer will click on an ad (binary classification). You have a Naive Bayes' classifier. Under what condition is the Naive Bayes' assumption most likely to be problematic?
In the context of Bayesian learning, differentiate between Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation.
Which statement best describes the role of the sigmoid function in logistic regression?
How does L1 regularization differ from L2 regularization in the context of linear regression?
What is the primary purpose of dropout in neural networks?
In the context of Convolutional Neural Networks (CNNs), what is the role of a pooling layer?
A CNN has multiple convolutional layers. How does the complexity of features detected typically change from earlier to later layers?
In the context of ensemble learning, what is the primary difference between bagging and boosting?
What is the role of a meta-learner in a stacking ensemble method?
How does the encoder in an autoencoder contribute to data compression?
What is the primary difference between an autoencoder (AE) and a variational autoencoder (VAE)?
In data structures, what is the key distinction between a List and a Linked List regarding memory usage and access?
What is the difference between a Stack and a Queue in terms of accessing elements?
In a hash table, what is the main reason for collisions, and how are collisions typically resolved?
In tree data structures, how does a 'leaf node' differ from a 'root node'?
How do graphs differ from trees in data structure characteristics?
When preparing for coding interviews, why is it recommended to focus on easy to medium difficulty questions first?
During a coding interview, what is the benefit of 'thinking out loud' while coding?
During a behavioral interview, what is the STAR method primarily used for?
When preparing stories for a behavioral interview, why is it recommended to assign keywords to each story?
What does the 'Action' component of the STAR method involve?
During PCA, if the original data isn't scaled, what is the implication for the identified principal components?
Suppose that you are building an autoencoder for data compression. During evaluation, you notice that the autoencoder struggles with a specific subset of rare inputs in the dataset. What adjustments can you perform?
Given a scenario where the bias is high and the variance is low, what course of action would you recommend?
What modifications could be made to a Loss function?
What is the importance of convolutional layers in CNNs?
AlexNet and ResNet both sought to improve results; what was a key difference between the approaches each of them took?
What would be the advantages of bagging?
Which statement best describes the benefit of ensembling?
Which data structure is not appropriate for coding during interviews?
What is an appropriate strategy to prepare for the interview?
What is the first strategic step in answering a behavioral interview question?
Think about what the Task component of your stories is useful for, and pick the best option.
Flashcards
What is Bias?
Error between average model prediction and ground truth. It tells us the capacity of the underlying model to predict the values.
What is Variance?
Average variability in the model prediction for the given dataset. It tells you how much the function can adjust to the change in the dataset
High Bias
Occurs when the model is too simple, leading to under-fitting. It also leads to high error on both test and train data
High Variance
Occurs when the model is overly complex, leading to over-fitting. It also leads to Low error on train data and high on test
Bias variance Trade-off
Increasing bias reduces variance and vice-versa. The best model is where the error is reduced. You must compromise between bias and variance
Precision
Correctly predicted positives over total predicted positives.
Recall
Correctly detected positives over total actual positives.
F1 Score
Harmonic mean of Precision and Recall
Data Replication
Addresses class imbalance by replicating minority class data.
Synthetic Data
Creates synthetic data via transformations/noise to balance classes
Modified Loss
Modifies the loss to reflect greater error when misclassifying the smaller sample set; loss = a·loss_green + b·loss_blue
Change the Algorithm
Increasing model complexity so that the two classes are perfectly separable (con: overfitting)
What is PCA?
Finds orthogonal feature vectors maximizing data spread, rates them by variance.
Steps for PCA
Standardize data, find covariance matrix, eigenvalue decomposition, sort eigenvalues.
Dimensionality Reduction with PCA
Keep top feature vectors by PCA to preserve maximum information.
Bayes' Theorem
Describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
Maximum A Posteriori (MAP) Estimation
The MAP estimate of a random variable y accommodates prior knowledge when estimating: ŷ = argmax_y P(y) ∏ᵢ P(xᵢ|y)
Maximum Likelihood Estimation (MLE)
The MAP estimate of the random variable y, assuming we don't have any prior knowledge of the quantity being estimated: ŷ = argmax_y ∏ᵢ P(xᵢ|y)
Naïve Bayes' Classifier
Applies Bayes' theorem while assuming the features are i.i.d., which simplifies the calculations.
Regression Analysis
Fits a function f(.) to datapoints yᵢ=f(xᵢ) under some error function.
L2 Regularization
Prevents weights from getting too large (defined by the L2 norm). The larger the weights, the more complex the model and the higher the chance of overfitting; loss = error(y, ŷ) + λ Σ β²
L1 Regularization
Prevents weights from getting too large (defined by the L1 norm). The larger the weights, the more complex the model and the higher the chance of overfitting; loss = error(y, ŷ) + λ Σ |β|
Entropy Regularization
Forces the probability distribution towards the uniform distribution; loss = error(p, p̂) − λ Σᵢ pᵢ log(pᵢ)
Data augmentation
Creating more data from available data by randomly cropping, dilating, rotating, adding small amount of noise etc.
K-fold Cross-validation
Divide the data into k groups. Train on (k-1) groups and test on 1 group. Try all k possible combinations.
Injecting noise
Add random noise to the weights when they are being learned to be relatively insensitive to small variations, regularizing the model.
Dropout
Used for neural networks; Connections between consecutive layers are randomly dropped based on a dropout-ratio and the remaining network is trained in the current iteration.
Layer function
A basic transformation applied to the data as it flows through the CNN, such as a convolutional or fully connected layer.
Fully Connected
Linear functions between the input and the output units.
Convolutional Layers
Applied to 2D (3D) input feature maps. Trainable weights are a 2D (3D) kernel/filter that moves across the input feature map.
Transposed Convolutional (DeConvolutional) Layer
Usually used to increase the size of the output feature map (upsampling). The idea behind the transposed convolutional layer is to 'undo' (though not exactly) the convolutional layer.
Max/Average Pooling
A non-trainable layer used to change the size of the feature map based on selecting the maximum/average value in receptive field defined by the kernel.
Normalization
Usually used just before the activation functions to keep unbounded activations from driving the output layer values too high.
Batch Normalization
A trainable approach to normalizing the data by learning scale and shift variable during training.
Activation
Introduces non-linearity so the CNN can efficiently model complex non-linear mappings. Examples: linear, ReLU.
Loss function
Quantifies how far off the CNN prediction is from the actual labels, i.e., the error in the prediction.
AlexNet - 2012
Consists of 5 Convolutional (CONV) layers and 3 Fully Connected (FC) layers. The activation used is the Rectified Linear Unit (ReLU). Data augmentation is carried out to reduce over-fitting. Uses Local Response Normalization.
VGGNet - 2014
Born out of the need to reduce the number of parameters in the CONV layers and improve training time. There are multiple variants of VGGNet (VGG16, VGG19, etc.); all the conv kernels are of size 3x3 and the maxpool kernels are of size 2x2 with a stride of two.
ResNet - 2015
Neural networks are notorious for not being able to find a simpler mapping when one exists, and ResNet addresses this. The ResNet architecture makes use of shortcut connections to solve the vanishing gradient problem.
Inception (GoogLeNet) - 2014
Consists of several inception modules. Each inception module consists of four operations in parallel: 1x1 conv layer, 3x3 conv layer, 5x5 conv layer, max pooling
Study Notes
- This document contains cheat sheets for Machine Learning and Data Science topics asked during interviews.
Bias-Variance Tradeoff
- Bias measures the error between average prediction and ground truth, indicating the model's capacity to predict values.
- Variance represents the variability in model predictions for a given dataset and how the model adjusts to dataset changes.
- High bias indicates an overly simplified model (under-fitting), while high variance signifies an overly complex model (over-fitting).
- Increasing bias can reduce variance and vice-versa.
- Error is calculated by: Error = bias² + variance + irreducible error.
- The best model reduces error through a compromise between bias and variance.
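To make the trade-off concrete, here is a minimal numpy sketch (the synthetic data and polynomial degrees are illustrative assumptions, not from the notes): an underfit low-degree model shows high error on both splits (high bias), while an overfit high-degree model shows low train error but higher test error (high variance).

```python
# Illustrative bias-variance sketch with synthetic data (assumed example).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # noisy ground truth

x_train, y_train = x[::2], y[::2]   # even indices for training
x_test, y_test = x[1::2], y[1::2]   # odd indices for testing

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)         # fit a polynomial model
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
# Expected pattern: degree 1 -> high error on both sets (high bias, under-fitting);
# degree 15 -> low train error but higher test error (high variance, over-fitting).
```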
Imbalanced Data in Classification
- Accuracy, while a common metric, does not always provide correct insights for trained models.
- Precision measures the exactness of the model and Recall its completeness.
- The F1 Score combines Precision and Recall.
- In class 1, Precision = TP / (TP + FP) or True Positives / (True Positives + False Positives)
- In class 1, Recall (Sensitivity) = TP / (TP + FN) or True Positives / (True Positives + False Negatives)
- Specificity = TN / (TN + FP) or True Negatives / (True Negatives + False Positives)
- False Positive Rate = FP / (TN+FP) or False Positives / (True Negatives + False Positives)
- Accuracy is (TP+TN) / (TP+TN+FP+FN) or (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
- Data replication, synthetic data generation, modified loss functions, and algorithm adjustments can address imbalanced data.
- Synthetic Data involves image rotation, dilation, cropping, and noise addition to create new data.
- A modified loss reflects greater error when misclassifying the smaller sample set.
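The metrics above can be computed directly from confusion-matrix counts. The sketch below uses hypothetical counts for an imbalanced dataset to show why accuracy can look good while recall on the minority class stays poor.

```python
# Minimal sketch of the metrics defined above, using assumed example counts.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # exactness
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # completeness (sensitivity)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}

# Imbalanced example: 980 negatives, 20 positives; the model finds half the positives.
print(classification_metrics(tp=10, tn=975, fp=5, fn=10))
# Accuracy is ~0.985 even though recall on the minority class is only 0.5,
# which is why F1 (harmonic mean of precision and recall) is the better summary here.
```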
PCA Dimensionality Reduction
- PCA finds a new set of orthogonal feature vectors in a dataset to maximize data spread in the feature vector direction.
- Feature vectors are ranked in decreasing order of data spread (variance).
- Datapoints show maximum variance in the first feature vector and minimum variance in the last.
- Variance of datapoints in the feature vector direction indicates information measure.
- Steps: Standardize datapoints, find the covariance matrix, perform eigenvalue decomposition, and sort.
- Dimensionality reduction steps: apply the steps above, keep the first m feature vectors from the sorted eigenvector matrix, and transform the data to the new basis; the importance of a feature vector is proportional to the magnitude of its eigenvalue.
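A minimal numpy sketch of these PCA steps (the function and variable names are illustrative assumptions):

```python
# Standardize, compute covariance, eigen-decompose, sort, and project.
import numpy as np

def pca(X: np.ndarray, m: int) -> np.ndarray:
    # 1. Standardize the data (zero mean, unit variance per feature).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvalue decomposition (symmetric matrix -> eigh).
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by decreasing eigenvalue (variance explained).
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    print("variance explained per component:", np.round(eigvals / eigvals.sum(), 3))
    # 5. Keep the first m feature vectors and project onto the new basis.
    return X_std @ eigvecs[:, :m]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X_reduced = pca(X, m=2)   # 100 x 2 projection onto the top two components
```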
Bayes Theorem and Classifier
- Bayes' Theorem describes event probability based on prior knowledge of related conditions.
- Bayes' Theorem is: P(A|B) = (P(B|A) * P(A)) / P(B).
- MAP Estimation: The MAP estimate of a random variable y, given observed data, involves accommodating prior knowledge during estimation.
- MLE (Maximum Likelihood Estimation) is a special case of MAP where the prior is uniform.
- Naive Bayes' assumes the features are independent.
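A small worked example of the MLE/MAP distinction, using hypothetical coin-flip data and a Beta prior (the closed-form MAP expression below assumes that prior):

```python
# MLE maximizes the likelihood alone; MAP additionally weights in a prior.
heads, tails = 7, 3                # assumed observed data
a, b = 2, 2                        # Beta prior pseudo-counts (prior belief: fair coin)

theta_mle = heads / (heads + tails)                        # argmax of the likelihood
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)  # argmax of the posterior

print(f"MLE estimate: {theta_mle:.3f}")   # 0.700 -- data only
print(f"MAP estimate: {theta_map:.3f}")   # 0.667 -- pulled toward the prior mean 0.5
# With a uniform prior (a = b = 1) the MAP estimate reduces to the MLE,
# matching the note that MLE is MAP with a uniform prior.
```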
Regression Analysis
- Regression analysis involves fitting a function f(·) to data points yᵢ = f(xᵢ) under an error function.
- Linear Regression fits a line minimizing the sum of mean-squared error for each data point.
- Polynomial Regression fits a polynomial of order k minimizing the sum of mean-squared error.
- Bayesian Regression fits a Gaussian distribution by minimizing the mean-squared error for each data point.
- Ridge Regression fits a line or polynomial minimizing the sum of mean-squared error and the weighted L2 norm of the function parameters.
- LASSO Regression fits a line or polynomial minimizing the mean-squared error and the weighted L1 norm.
- Logistic Regression fits a line or polynomial with sigmoid activation, minimizing binary cross-entropy loss.
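A short scikit-learn sketch (synthetic data and regularization strengths are illustrative assumptions) contrasting ordinary least squares with the Ridge (L2) and LASSO (L1) variants; the sparsity effect of L1 shows up as coefficients driven exactly to zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only 2 informative features
y = X @ true_w + rng.normal(0, 0.5, size=200)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge (L2)", Ridge(alpha=1.0)),
                    ("LASSO (L1)", Lasso(alpha=0.1))]:
    model.fit(X, y)
    n_zero = int(np.sum(np.abs(model.coef_) < 1e-3))
    print(f"{name:12s} near-zero coefficients: {n_zero}/10")
# Expected: LASSO pushes most uninformative coefficients exactly to zero (sparsity),
# while Ridge only shrinks them, matching the L1 vs L2 distinction above.
```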
Regularization in ML
- Regularization addresses over-fitting in ML, reducing model variance.
- L2 Regularization prevents weights from becoming too large. Larger weights increase model complexity and chances of overfitting.
- L1 Regularization prevents weights from becoming too large (defined by L1 norm) and introduces sparsity.
- Entropy regularization is used for probability output models, pushing distribution towards uniformity.
- Data augmentation creates more data from the available data (e.g., cropping, rotating, adding small amounts of noise).
- K-fold Cross-validation divides the data into k groups, training on (k-1) and testing on 1.
- Injecting noise, adding random noise to the weights during learning.
- Dropout involves randomly dropping connections between consecutive layers in neural networks.
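A minimal numpy sketch of k-fold cross-validation as described above; the closed-form L2-regularized linear regression used inside the loop is an illustrative choice, not part of the notes:

```python
import numpy as np

def kfold_mse(X, y, k=5, lam=1.0):
    n = len(y)
    idx = np.random.default_rng(0).permutation(n)
    folds = np.array_split(idx, k)          # divide the data into k groups
    errors = []
    for i in range(k):
        test = folds[i]                      # test on 1 group
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # train on k-1
        # Closed-form L2-regularized linear regression on the training folds.
        A = X[train].T @ X[train] + lam * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        errors.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errors))            # average error over all k combinations

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(0, 0.1, size=120)
print("5-fold CV mean squared error:", round(kfold_mse(X, y), 4))
```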
Convolutional Neural Network
- Data enters CNN via the input layer, passing through hidden layers to the output, and backpropagation updates the weights.
- CNN layers commonly include convolutional, pooling, normalization, activation, and loss calculation.
- Convolutional layers apply filters across feature maps generating dot products.
- Transposed Convolutional Layers increase output feature map size (Up-sampling).
- Pooling layers are non-trainable and change feature map size.
- Max/Average Pooling decreases spatial size, selecting max/average values defined by the kernel.
- UnPooling increases spatial size by placing input pixels in defined receptive fields.
- Various normalization approaches limit unbounded activation.
- Regression and classification losses are the two main types of loss functions.
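A minimal PyTorch sketch (layer sizes are hypothetical) wiring the listed layer types together for 28x28 single-channel inputs:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.BatchNorm2d(16),                          # normalization
            nn.ReLU(),                                   # activation (non-linearity)
            nn.MaxPool2d(2),                             # pooling: 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = TinyCNN()
logits = model(torch.randn(8, 1, 28, 28))                # a batch of 8 images
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))  # classification loss
loss.backward()                                          # backpropagation of the error
```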
Famous CNNs
- AlexNet (2012): Consists of 5 CONV layers and 3 FC layers using ReLU activation and Local Response Normalization.
- VGGNet (2014): Reduces parameters in CONV layers and improves training time using small conv kernels (3x3) and maxpool kernels (2x2).
- ResNet (2015): Uses shortcut connections between layers to solve the vanishing gradient problem, with multiple versions (ResNet50, ResNet101).
- Inception (2014): Uses larger kernels for more global features and smaller kernels for area-specific features, so kernels of different sizes are needed (applied in parallel within each inception module).
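A simplified sketch of the ResNet idea, assuming an identity shortcut and equal channel counts; the block outputs F(x) + x, so gradients can flow directly through the shortcut:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # shortcut connection: add the input back

block = ResidualBlock(16)
y = block(torch.randn(2, 16, 32, 32))   # same shape in and out: (2, 16, 32, 32)
```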
Ensemble Learning in ML
- Ensemble Learning combines multiple weak models to improve bias, variance, and/or accuracy.
- Bagging trains N weak models with subsets in parallel. It reduces the variance in the prediction.
- Boosting trains N weak models sequentially, weighting misclassified points and decreasing bias.
- Stacking trains N models of different types on one subset, using a meta-learner for the final prediction, and improves accuracy.
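A short scikit-learn sketch (synthetic data and estimator counts are illustrative assumptions) of the three ensemble styles: bagging in parallel, boosting sequentially, and stacking with a meta-learner on top of heterogeneous base models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25)   # reduces variance
boosting = AdaBoostClassifier(n_estimators=25)                           # reduces bias
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression())                                # meta-learner

for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```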
Autoencoder & Variational Autoencoder
- Autoencoders learn to find efficient embeddings of unlabeled data, using an encoder and a decoder.
- Autoencoders compress data from a higher to a lower dimension and back, and are trained with a reconstruction loss.
- A VAE (variational autoencoder) addresses the non-regularized latent space of an AE: the encoder outputs the parameters of a pre-defined distribution, and a constraint forces that distribution to be a normal distribution.
- The latent space is smooth and continuous; the training loss is the sum of the reconstruction loss and the KL divergence between the encoder's output distribution and a unit Gaussian.
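A minimal PyTorch autoencoder sketch (the layer dimensions are hypothetical); the final comment notes how a VAE would differ:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))       # compression
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))        # reconstruction

    def forward(self, x):
        z = self.encoder(x)            # low-dimensional embedding
        return self.decoder(z)

model = AutoEncoder()
x = torch.rand(16, 784)                # a batch of flattened inputs
recon = model(x)
loss = nn.MSELoss()(recon, x)          # reconstruction loss
loss.backward()
# A VAE would instead have the encoder output a mean and log-variance, sample z
# from that Gaussian, and add a KL-divergence term to this loss.
```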
Data Structures
- List: Ordered collection of elements accessed by index, in any order.
- Linked List: Elements with values and pointers, traversed sequentially.
- Stack: A sequential data structure accessed in LIFO order.
- Queue: A sequential data structure accessed in FIFO order.
- HashTable: Paired assignments accessed in constant time.
- Tree: Hierarchical relation between root, parent, child, and leaf nodes.
- Graph: A pair of sets (V, E) of vertices and edges, which can be cyclic.
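A small Python sketch mapping these structures to standard-library types (the example values are arbitrary):

```python
from collections import deque

items = [10, 20, 30]
print(items[1])                 # List: O(1) access by index -> 20

stack = []
stack.append("a"); stack.append("b")
print(stack.pop())              # Stack: LIFO order -> "b"

queue = deque()
queue.append("a"); queue.append("b")
print(queue.popleft())          # Queue: FIFO order -> "a"

table = {"alice": 3, "bob": 7}  # Hash table: keys are hashed to buckets;
table["alice"] = 4              # CPython resolves collisions with open addressing
print(table["alice"])           # average O(1) lookup -> 4
```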
Coding Interview Preparation
- Timeline: start preparing 2-3 months in advance; it gets easier after some experience.
- Review: Lists/Arrays, Linked List, Hash Table/Dictionary, Tree, Graph, Heap, Queue.
- Practice: on LeetCode.com, InterviewBit.com, HackerRank.com
- Listen, Talk, Discuss, Start Coding, Discuss, Optimize and Repeat
Behavioral Interview Preparation
- The STAR method can be used to organize the stories one tells during an interview.
- Situation: provide the necessary context.
- Task: explain what you were responsible for.
- Action: provide the steps you took to address the issue.
- Result: state the outcome of your actions.