Data Science Fundamentals Quiz

Questions and Answers

Which of the following best describes the primary goal of data science?

  • Applying computational and statistical techniques to gain insights into real-world problems. (correct)
  • Efficiently storing and retrieving large datasets.
  • Creating visually appealing data presentations.
  • Developing new hardware and software for data storage.

Data analysis in data science solely involves choosing a model without any prior exploration of the data.

False (B)

Name the three macro-steps involved in a typical data science task.

Data collection, data analysis, and data presentation

A crucial aspect of statistical analysis is choosing a good __________, as selection __________ can lead to non-representative conclusions.

sample; bias

Match the following concepts with their descriptions:

  • Data Management = Support for data storage, retrieval, and infrastructure.
  • Correlation Coefficient = Measures the linear relationship between two variables.
  • Simpson's Paradox = A trend observable in groups is unobservable when they are combined.
  • Bonferroni's Principle = Addresses the risk of discovering meaningless patterns.

In the context of correlation coefficients, what does a value of 0 indicate?

No linear correlation. (D)

What does Simpson's paradox primarily point to?

The presence of a confounding variable influencing the other variables. (B)

Descriptive statistics involves making conclusions that extend beyond the given dataset.

False (B)

What is the primary drawback of a greedy policy in reinforcement learning?

It may lead to suboptimal solutions by not exploring alternatives. (B)

Temporal difference learning requires prior knowledge of the transition function and reward function.

False (B)

Describe the impact of a very small learning rate ($\alpha \rightarrow 0$) on the learning process.

slow but stable learning

A common assumption for proving the convergence of many RL algorithms is ______ exploration, ensuring every state-action pair is experienced infinitely many times.

infinite

Match the following Multi-Agent Reinforcement Learning (MARL) challenges with their descriptions:

  • Non-stationarity = The environment changes continuously due to simultaneous actions of multiple agents.
  • Credit assignment = Difficulty in correctly attributing rewards to the appropriate agent.
  • Equilibrium selection = Finding a globally optimal policy that balances the interests of all agents.

What characterizes a zero-sum game in the context of Multi-Agent Reinforcement Learning (MARL)?

The sum of all agents’ rewards is zero. (A)

A Nash equilibrium guarantees that all agents achieve the highest possible reward.

False (B)

Which learning mode in MARL balances independence and coordination by sharing information during training but allowing independent action during deployment?

CTDE (Centralized Training, Decentralized Execution) (A)

What are the key advantages of using deep learning in reinforcement learning?

Generalization and scalability of value and policy functions, efficient representation of complex environments, better sample efficiency.

In reinforcement learning, the absence of a fixed ground truth and the presence of non-i.i.d. data, which violate assumptions of supervised learning, lead to the ______ target problem and potential overfitting.

moving

Which of the following ensemble methods learns sequentially, with each model correcting the previous one?

Boosting (A)

Linear classifiers are generally unsuitable for large datasets due to their computational complexity.

False (B)

What is the primary goal of Support Vector Machines (SVMs) in classification tasks?

to find the optimal separating hyperplane that best divides data points into different classes

In gradient descent, a learning rate that is too ______ can cause training to become unstable and may prevent convergence.

high

What is the role of the sigmoid function in logistic regression?

To map predictions to the range [0, 1], representing probabilities (C)

The perceptron can only be used for classification problems and cannot be adapted for regression tasks.

False (B)

Explain how the C hyper-parameter in SVM affects the bias-variance trade-off.

A large C leads to a smaller margin and potential overfitting (low bias, high variance), while a small C allows misclassifications, leading to a larger margin and better generalization (high bias, low variance).

The data points closest to the hyperplane in SVM, which directly influence its position, are called ______.

support vectors

Which of the following statements best describes the purpose of a loss function in machine learning?

To quantify the difference between model predictions and actual target values (A)

In logistic regression, a feature weight close to zero indicates that the feature is highly predictive of the positive class.

False (B)

Describe the process of gradient descent and its goal in optimizing model parameters.

Gradient descent is an iterative optimization algorithm that minimizes the error function by adjusting model parameters in the direction opposite to the gradient until the error is minimized.

The C hyper-parameter in SVM controls the trade-off between achieving a larger ______ and minimizing classification errors.

margin

Which of the following is a characteristic of ensemble methods that use boosting?

Using homogeneous weak learners learned sequentially (B)

Regularization techniques in machine learning aim to increase the complexity of models to better fit the training data.

False (B)

In frequentist hypothesis testing, which condition leads to the rejection of the null hypothesis (H0)?

p-value < T (D)

According to the Central Limit Theorem, the sample mean of any random variable always follows a normal distribution, regardless of sample size.

False (B)

Bayes' theorem provides a method to update the probability of an event given new information. Write the formula for Bayes' Theorem.

p(A|B) = p(A) * p(B|A) / p(B)

__________ statistics aims to generalize results from a sample to an entire population, perform hypothesis testing, and build data models to draw conclusions.

Inferential

Match the following Gestalt principles with their descriptions:

  • Closure = Experiencing separate elements as a complete figure
  • Proximity = Perceiving objects near each other as a group
  • Similarity = Grouping similar elements into collective entities
  • Common Fate = Perceiving objects moving in the same direction as a collective unit

Which type of chart is most suitable for visualizing the distribution of a single variable and identifying whether the distribution is symmetrical?

Box plot (D)

Anscombe's quartet demonstrates that descriptive statistics are always sufficient for understanding the underlying patterns in a dataset.

False (B)

Explain the purpose of a heatmap and in what kind of analysis it would be most useful.

A heatmap represents the values in a matrix using colors and is useful for visualizing correlations or distances between entities.

In machine learning, finding a mapping between input and output variables directly from labeled data observations is known as __________ learning.

supervised

What is the primary difference between classification and regression in supervised learning?

Classification predicts discrete outputs; regression predicts continuous outputs. (C)

Unsupervised learning requires labeled data to train a model.

False (B)

Define what a confusion matrix is and what its purpose is in evaluating the performance of a machine learning model.

A confusion matrix measures the performance of a prediction model (binary or multi-class) by reporting true negatives, true positives, false negatives, and false positives.

__________ is calculated as the ratio of true positives to the sum of true positives and false positives, indicating the accuracy of the positive predictions.

Precision

When should a recall-precision curve be preferred over a receiver operating characteristic (ROC) curve for evaluating a classification model?

When the dataset is imbalanced. (B)

Lindley's paradox suggests that the Bayesian and frequentist inference approaches will always arrive at the same conclusions, regardless of the prior distribution.

False (B)

What is the primary purpose of using kernel functions in Kernel Machines?

To efficiently compute dot products in a higher-dimensional space without explicitly transforming the data. (B)

The One-vs-One (OvO) approach for multi-class SVM classification requires training fewer models than the One-vs-All (OvA) approach.

False (B)

In the context of neural networks, what is function approximation?

learning a parametric function that maps inputs to outputs

In a feed-forward neural network, each neuron applies a function: $f(x;\theta) = g(w \cdot x + b)$, where $g$ is a non-linear ________ function.

activation

Match the following activation functions with their primary use case:

  • Sigmoid = Binary classification
  • Softmax = Multi-class classification
  • ReLU = Deep networks

During the training of a neural network, what is the role of the loss function?

To measure the error between the network's predictions and the ground truth. (D)

Backpropagation is used to calculate the loss function in a neural network.

False (B)

What problem does early stopping address during neural network training?

overfitting

Convolutional Neural Networks (CNNs) are particularly well-suited for tasks involving ________ classification due to their ability to automatically extract hierarchical features.

image

Which of the following is a disadvantage of the One-vs-All (OvA) approach in multi-class SVM classification?

It struggles when the classes are highly imbalanced. (B)

Without activation functions, a neural network can effectively model non-linear dependencies in data.

False (B)

What are the trainable parameters in each layer of a feedforward neural network?

weights and biases

The ________ kernel captures complex decision boundaries by mapping data into an infinite-dimensional space.

Gaussian (Radial Basis Function, RBF)

What is the purpose of the 'patience' parameter in the early stopping technique?

To wait for a few epochs to see if the validation error decreases again before stopping the training. (A)

Match each name to the process it describes:

  • Convolutional Neural Networks (CNNs) = automatic hierarchical feature extraction from input data
  • Backpropagation = compute gradients for every parameter in a neural network
  • Kernel Machines = extend SVMs to handle non-linear classification problems

Which characteristic distinguishes Recurrent Neural Networks (RNNs) from feed-forward networks?

RNNs have loops in their architecture, allowing them to maintain a memory of previous inputs. (A)

Regression is used to predict discrete categories or labels.

False (B)

What key assumption differentiates parametric regression from non-parametric regression?

Parametric regression requires pre-defining the shape of the function.

In K-Nearest Neighbors Regression, the target value is predicted as a weighted combination of the K ________’ values.

nearest neighbors

Which loss function is commonly used in neural networks designed for regression tasks?

Mean Squared Error (MSE) (B)

Match each clustering performance measure with its description:

  • Dunn Index = A higher value indicates better clustering performance.
  • Silhouette Score = Ranges from -1 to 1; higher values suggest better clustering.
  • Homogeneity = Measures if each cluster contains only points of a single label; best value is 1.

Which of the following is a primary goal of clustering?

To have high intra-cluster similarity and high inter-cluster dissimilarity. (A)

What is a major limitation of the K-means algorithm regarding cluster shape?

It assumes linear separation and works best when clusters are linearly separable. (C)

Since clustering is unsupervised, there is a direct way to evaluate its performance.

False (B)

What is the role of the Moore-Penrose pseudo-inverse in linear regression?

To solve for the coefficients when there are more equations than variables. (B)

In regression trees, what criterion is used to evaluate and select attributes during training?

Mean Squared Error (MSE)

Which of the following adjustments is needed to adapt a Multi-Layer Perceptron (MLP) for regression tasks compared to classification?

Using one neuron with a linear activation function in the output layer. (C)

Support Vector Regression (SVR) separates classes by finding a margin.

False (B)

In the K-means algorithm, the number of clusters (K) must be _________.

set in advance

What is the significance of high intra-cluster similarity and high inter-cluster dissimilarity in clustering?

Indicates well-defined clusters

Which of the following is the MOST accurate description of the difference between data mining and machine learning?

Data mining aims to find frequent patterns and rules, while machine learning learns a model from the data. (C)

A high lift value (e.g., lift > 1) for an association rule indicates a negative dependence between the antecedent and the consequent.

False (B)

In the context of association rule mining, define the 'interest' of a rule and explain what it means when the interest is zero.

Interest is the difference between the confidence of a rule and the support of the consequent. An interest of zero means that the antecedent has no influence on the consequent.

The Apriori algorithm generates candidate itemsets of length k, given all frequent itemsets of length ______.

k-1

Match the linkage methods with their descriptions in hierarchical clustering:

  • Single Linkage = Minimum distance between any two points from each cluster.
  • Complete Linkage = Maximum distance between any two points from each cluster.
  • Average Linkage = Average distance between all pairs of points between two clusters.
  • Ward's Method = Minimizes the increase in variance when merging clusters.

What is the primary purpose of Principal Component Analysis (PCA)?

To project high-dimensional data into a lower-dimensional space while preserving variance. (D)

In Reinforcement Learning (RL), a sparse reward system provides the agent with a reward after each action.

False (B)

Define a Markov Decision Process (MDP) and briefly explain the significance of the Markov property within this framework.

An MDP is a mathematical model describing sequential decision-making with states, actions, rewards, and transition probabilities. The Markov property states that the future depends only on the current state and action, not on past history.

In Reinforcement Learning, the learning goal is to maximize the expected ______, which is the sum of future rewards an agent can collect from a given state.

return

What impact does a discount factor ($\gamma$) close to 0 have on the agent's decision-making in Reinforcement Learning?

The agent prioritizes immediate rewards over future rewards. (C)

The completeness measure in clustering evaluates whether clusters are internally homogenous.

False (B)

Given an association rule A -> B, what does the support of the rule represent?

The fraction of transactions that contain both itemsets A and B. (C)

A ______ is a tree-like diagram that represents the hierarchical structure of clusters in hierarchical clustering.

dendrogram

Explain the purpose of state value function, $V(s)$, and state-action value function, $Q(s, a)$, in Reinforcement Learning.

$V(s)$ measures the expected value of being in state $s$ while following a policy $\pi$, and $Q(s, a)$ measures the expected value of taking action $a$ in state $s$, and then following policy $\pi$. They both estimate how 'good' a state or state-action pair is for decision-making.

What is a greedy policy in Reinforcement Learning?

A policy that selects the action with the highest estimated state-action value. (B)

When training a machine learning model, what is the primary purpose of splitting data into training, validation, and test sets?

To choose appropriate hyper-parameters, select the best model, and measure performance on unseen data, respectively. (A)

Maintaining class proportions when splitting data for model training is generally recommended.

True (A)

In K-fold cross-validation, the data is split into k folds, and in each turn, one fold acts as the __________ set while the others form the training and validation sets.

test

What is the key difference between micro and macro averaging in the context of evaluating model performance?

Micro averaging sums all true positives, false positives, and false negatives before computing the metric, while macro averaging first computes the metric per fold and then averages. (A)

Describe overfitting in the context of machine learning.

Overfitting occurs when a model learns the training data too well, including its noise and specific details, leading to poor generalization on unseen data.

Match the following data preprocessing tasks with their descriptions:

  • Aggregation = Grouping data together.
  • Cleaning = Handling missing or inconsistent data.
  • Discretization = Converting numerical features into discrete intervals.
  • Normalization = Scaling data to a specific range.

In the context of the KNN classifier, how is the target variable for a new instance typically determined?

By finding the most frequent target value among the k nearest neighbors. (B)

Hamming distance can be used as a metric in KNN.

True (A)

How can the KNN algorithm be adapted for regression tasks?

By evaluating the target variable as a weighted combination of the target values of the k nearest neighbors. (A)

Name one requirement and one drawback of using the KNN algorithm.

Requirement: Numeric features. Drawback: High computational cost in high-dimensional spaces.

In a decision tree, what does each internal node represent?

An attribute used for splitting the data. (C)

The C4.5 algorithm uses __________ __________ __________ to select the best attribute for splitting the data.

information gain ratio

Which of the following statements best describes the Gini Index in the context of decision trees:

It measures the impurity of a node; lower values indicate purer nodes. (C)

Pruning is used to increase the complexity of decision trees.

False (B)

Give two pros and one con of decision trees.

Pros: Simple to understand, able to handle both numerical and categorical data. Con: Prone to overfitting.

Flashcards

Data Science

Applying computational and statistical techniques to gain insight into real-world problems.

Data Science Application Ingredients

Raw data, a problem statement, a model, and an evaluation metric.

Data Science Macro-Steps

Data collection, data analysis, and data presentation.

Data Management

Efficient storing, retrieving, and managing data with appropriate resources.

Correlation Coefficient

Measures the linear relationship between two variables, ranging from -1 to 1.

Bonferroni's principle

The risk in data analysis of discovering meaningless patterns, i.e., false positives.

Sample

Subset of observations from a larger group.

Selection Bias

Choosing a non-representative sample, leading to skewed results.

Data Splitting

Splitting data into training, validation, and test sets to train, select, and evaluate.

K-Fold Cross-Validation

Splits data into k folds, using each fold once as a test set.

Micro Average

Calculate metric summing all TPs, FPs, FNs before averaging.

Macro Average

Calculate metrics individually and average.

Overfitting

Model learns training data perfectly but fails to generalize.

Data Pre-processing

Preparing data through aggregation, cleaning, discretization, normalization, etc.

KNN Classifier

Classifies based on the most frequent target value of the k nearest neighbors.

KNN Distance Metric

Distance measures like Euclidean or Hamming used to find the nearest neighbors.

KNN for Regression

Evaluates target as a weighted combination of neighbors' targets.

Decision Tree

Classifies data based on attribute-value representations.

C4.5 Algorithm

Builds decision trees, using Information Gain Ratio for feature selection.

Feature Selection Metrics

Evaluates effectiveness of attribute splits: Gini Index, Information Gain Ratio

Gini Index

Measure the impurity of a node.

Information Gain

Expected reduction in entropy after splitting on an attribute.

Pruning

Reduces overfitting by removing unnecessary branches.

Inferential Statistics

Generalizing results from data to the whole population, hypothesis testing, and building data models for drawing conclusions.

Goal of Hypothesis testing

To evaluate if a hypothesis is likely to be true, given the available data.

Null Hypothesis Rejection

If the p-value < T, reject the null hypothesis. If p-value > T, there's not enough evidence to reject it.

Central Limit Theorem

Given a random variable, the sample mean follows a normal distribution if the sample size is large enough.

Bayes' Theorem

Provides a way to update the probability of an event given new information.

Lindley's Paradox

Bayesian and frequentist approaches can yield different results for the same hypothesis.

Anscombe's Quartet

Datasets with identical descriptive statistics but very different graphical representations.

Gestalt Principles Goal

The whole is other than the sum of its parts.

Histogram

Chart that plots counts or frequencies of a variable.

Bar Plot

Chart comparing the same variable across different categories (bars can be reordered).

Pie Chart

Chart representing parts/proportions of a whole.

Scatter Plot

Chart to analyze the relationship between two continuous variables, revealing trends or correlations.

Supervised Learning

Finding a mapping between input/output variables directly from labelled data.

Classification vs. Regression

Discrete output (classification) versus continuous output (regression).

Unsupervised Learning

Given data without labels, to cluster data, find patterns, and identify important features.

Kernel Machines

Extend SVMs to handle non-linear data by mapping it to higher dimensions.

Kernel Trick

Functions that compute dot products in transformed space without explicit transformation.

Polynomial Kernel

Captures polynomial relationships between features.

Gaussian (RBF) Kernel

Maps data into an infinite-dimensional space to capture complex boundaries.

Multi-class SVM

Train one SVM per class (One-vs-All) or per pair of classes (One-vs-One).

Neural Networks task

Learn a function that maps inputs to outputs by minimizing a loss function.

Feed-forward NN

Input, hidden, and output layers with weights, biases, and activation functions.

Activation Function Role

Introduce non-linearity, enabling NNs to learn complex patterns.

Common Activation Functions

Sigmoid, Softmax, ReLU - introduce non-linearity into neural networks.

NN Training

Supervised learning using labeled data, loss function, and gradient descent.

Backpropagation

Algorithm to compute gradients for every parameter, enabling efficient training.

Early Stopping

Stops training when validation error increases to prevent overfitting.

Convolutional NN

Designed for hierarchical feature extraction from input data.

One-vs-All (OvA)

One SVM classifier is trained for each class, distinguishing it from all others.

One-vs-One (OvO)

Train a binary classifier for each pair of classes.

Linear Classifiers

Models that separate data using a linear decision boundary. Simple, efficient, and good for large datasets.

Perceptron

A simple neural network model with a single neuron, classifying linearly separable examples.

Loss Function

Measures how well a model's predictions match actual values. Used to adjust the model.

Gradient Descent

Algorithm to minimize the loss function by iteratively adjusting model parameters until convergence.

Learning Rate

Parameter in gradient descent that controls the step size during parameter updates.

Logistic Regression

Statistical model estimating the probability of an input belonging to a particular class (0 or 1).

Logistic Regression Output

Probability close to 1: strongly positive class. Close to 0: strongly negative class.

Support Vector Machines (SVM)

Finds the optimal hyperplane to best divide data points from different classes.

Hyperplane (in SVM)

Decision boundary separating classes in SVM.

Margin (in SVM)

Distance between the hyperplane and the closest data points (support vectors).

Support Vectors

Data points closest to the hyperplane, influencing its position.

SVM Hyper-parameter C

SVM hyper-parameter controlling the trade-off between margin size and misclassification errors.

Large C (in SVM)

A high C leads to a smaller margin and can cause overfitting; the model tries to classify all training examples correctly.

Regularization

A penalty on complexity to improve generalization. The C parameter affects the strength of this penalty.

Small C (in SVM)

Allows some misclassifications to occur, leading to a larger margin.

Greedy Policy

Always selecting the action with the highest expected value without exploring alternatives.

Dynamic Programming

Algorithms for optimal policies, assuming full knowledge of the MDP.

Temporal Difference Learning

Learning optimal policies without knowing the transition or reward functions.

Learning Rate (α)

Determines how much the model updates its estimates.

Infinite Exploration

Every state-action pair must be experienced infinitely many times.

Non-Stationarity in MARL

Environment changes continuously due to multiple agents acting simultaneously.

Zero-Sum Game

A game where one agent’s gain is exactly another agent’s loss.

Joint Policy

Specifies the actions of all agents in every state.

Best Response Policy

Maximizes expected return given that other agents' policies are fixed.

Nash Equilibrium

No agent can improve its return by unilaterally changing its policy.

Completeness (clustering)

Assigning all points of a label to the same cluster.

V-Measure

Harmonic mean of homogeneity and completeness in clustering (ideal value: 1).

Data Mining

Transforming data into intelligible knowledge by finding patterns.

Confidence (rule)

Ratio of rule support to antecedent support.

Association Rule

Implication: X -> Ik (X is a subset of items, Ik is not in X).

Support (itemset)

Fraction of transactions containing the itemset.

Support (rule)

Fraction of transactions with both antecedent and consequent.

Lift (rule)

Ratio of rule support to product of antecedent and consequent supports.

Interest (rule)

Difference between rule confidence and consequent support.

Apriori Algorithm

Identifies frequent items, generates candidates, prunes infrequent sets, creates rules.

Dendrogram

Diagram showing hierarchical cluster structure.

Single Linkage

Minimum distance between any two points from each cluster.

Principal Component Analysis (PCA)

Projects high-dimensional data into a lower-dimensional space.

Reinforcement Learning (RL)

Agent interacts with environment to maximize rewards.

Learning Goal (RL)

Maximizing the sum of future rewards from a state.

Recurrent Neural Networks (RNNs)

Neural networks that process sequential data where order matters; they have loops to maintain memory of previous inputs.

Regression

Used for real-valued continuous target variables; predicts quantities.

Classification

Used for discrete category or label target variables; predicts categories.

Parametric Regression

Requires choosing the function's shape in advance; learns a fixed number of parameters.

Non-Parametric Regression

Does not assume a specific function shape; more flexible but needs more data.

K-Nearest Neighbors Regression

Predicts a target value based on a weighted combination of the K nearest neighbors’ values.

Linear Regression

Minimizes the Residual Sum of Squares (RSS) to find the best-fitting linear function.

Regression Trees

Extends decision trees to regression, using Mean Squared Error (MSE) to evaluate attributes.

Neural Network Architecture

Uses one neuron in the output layer with a linear activation function and MSE loss.

Support Vector Regression (SVR)

Finds a hyperplane that keeps all points within an epsilon-tube, allowing small errors.

Clustering

Partitions data points into groups based on similarity, aiming for high intra-cluster similarity and high inter-cluster dissimilarity.

K-means Algorithm process

  1. Choose K. 2. Initialize centroids. 3. Assign points. 4. Compute new centroids. 5. Repeat until convergence.

K-means: Random Initialization Dependence

The positions of the initial centroids greatly affect the results.

K-means Algorithm: Assumption of Linear Separation

It only works effectively if the clusters of the data are linearly separable.

Dunn Index for clustering

Higher values indicate better clustering performance.

Study Notes

Data Science

  • Data science applies computational and statistical methods to gain insights into real-world problems.
  • The core components include raw data and a well-defined problem statement.
  • Model choice is determined by maximizing performance on a specific evaluation metric.
  • The macro-steps are data collection, data analysis, and data presentation.
  • Data analysis includes exploration, model selection, and testing.
  • Data presentation aims to maximize information while minimizing unnecessary detail.
  • Data management provides essential support, including efficient storage, retrieval, and suitable infrastructure.

Statistics

  • A correlation coefficient measures the linear relationship between two variables, ranging from -1 to 1.
  • A value of 1 indicates a positive correlation, 0 indicates no correlation, and -1 indicates a negative correlation.
  • Pearson’s correlation coefficient, corr(X, Y), is the ratio of the covariance to the product of the standard deviations (see the sketch after this list).
  • The Bonferroni principle warns against discovering meaningless patterns in data analysis.
  • A sample is a subset of a population, and statistical analysis aims to draw conclusions about the entire population from the sample.
  • Selection bias occurs when a non-representative sample is chosen for analysis.
  • Simpson's paradox shows trends observable in data groups that disappear when the groups are combined, indicating confounding variables.
  • Descriptive statistics summarizes observed phenomena, useful for quick data analysis (e.g., mean).
  • Inferential statistics generalizes results from data to the entire population through hypothesis testing and model building.
  • A frequentist approach formulates a hypothesis, estimates a confidence level, repeats experiments, and computes test statistics.
  • A Bayesian approach applies probability to statistical problems, finding the probability of a hypothesis using Bayes’ theorem.
  • Hypothesis testing evaluates the likelihood of a hypothesis being true, given the data.
  • The common approach to hypothesis testing involves computing the probability of observing the data given the null hypothesis (p-value).
  • The null hypothesis is rejected if the p-value is less than a threshold (T).
  • The central limit theorem states that the sample mean of any random variable approaches a normal distribution as sample size increases.
  • Bayes' theorem updates the probability of an event given new information, expressed as p(A|B) = p(A) p(B|A)/p(B).
  • Lindley’s paradox indicates that Bayesian and frequentist approaches can yield different results based on prior distribution choices.
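
To make the correlation-coefficient definition concrete, here is a minimal NumPy sketch; the data values are made up for illustration.

```python
import numpy as np

# Toy data (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# corr(X, Y) = cov(X, Y) / (sigma_X * sigma_Y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
corr = cov_xy / (x.std() * y.std())
print(round(corr, 4))            # close to +1: strong positive linear relationship

# Cross-check against NumPy's built-in estimator.
print(np.corrcoef(x, y)[0, 1])
```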

Data Visualization

  • Anscombe’s quartet highlights the limitations of descriptive statistics by showing datasets with identical stats but different graphs.
  • The Gestalt principles demonstrate that the whole is different from the sum of its parts.
  • Closure: Perceiving separate elements as a complete figure.
  • Common fate: Grouping objects moving in the same direction as a unit.
  • Continuity: Perceiving disconnected figures as a continuous form.
  • Proximity: Perceiving objects near each other as a group.
  • Similarity: Grouping similar elements into collective entities.
  • Symmetry: Perceiving symmetry in objects regardless of distance.
  • A histogram plots counts or frequencies of a variable (see the plotting sketch after this list).
  • A bar plot compares the same metric across different categories, where bars can be reordered.
  • A pie chart represents parts/proportions of a whole.
  • A scatter plot analyzes the relationship between two continuous variables, revealing trends or correlations.
  • A box plot visualizes the distribution of values, showing the median, spread, and symmetry of the dataset.
  • A heatmap represents matrix entries with colors and is useful for correlations/distances between entities.
  • A bubble chart is a scatter plot variation that can represent more than two dimensions using color and point diameter.
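
As a sketch of how these chart types map to code, the following uses matplotlib with synthetic data; the layout and variable names are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=500)              # one continuous variable
x, y = rng.random(100), rng.random(100)    # two continuous variables

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(values, bins=30)              # histogram: counts/frequencies
axes[0].set_title("Histogram")
axes[1].boxplot(values)                    # box plot: distribution and symmetry
axes[1].set_title("Box plot")
axes[2].scatter(x, y)                      # scatter plot: relation of two variables
axes[2].set_title("Scatter plot")
plt.tight_layout()
plt.show()
```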

ML Introduction

  • The three main machine learning paradigms are supervised, unsupervised, and reinforcement learning.
  • Supervised learning finds a mapping between input and output variables using labeled data.
  • Classification involves a discrete output, while regression involves a continuous output.
  • Unsupervised learning uses unlabeled data to cluster, find patterns, and identify important features.
  • The confusion matrix measures prediction model performance, reporting true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
  • Accuracy is the ratio of correct predictions (TP+TN) to all predictions (TP+TN+FP+FN).
  • Precision is the ratio of true positives (TP) to all positive predictions (TP+FP).
  • Recall is the ratio of true positives (TP) to all actual positives (TP+FN).
  • The F1 score is the harmonic mean of precision and recall, 2PR/(P+R) (see the metrics sketch after this list).
  • A recall-precision curve evaluates classification model performance, particularly for imbalanced datasets.
  • A receiver operating characteristic (ROC) curve assesses binary classification model performance, particularly for balanced datasets.
  • Regression metrics include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Deviation (MAD).
  • Model selection chooses the best hyperparameters, splitting data into training, validation, and test sets.
  • K-fold cross-validation splits data into k folds, using one as a test set and the rest for training and validation over k turns.
  • Micro-averaging sums TPs, FPs, and FNs before computing metrics.
  • Macro-averaging computes per-fold metrics and then averages them.
  • Overfitting occurs when a model learns the training set too well but fails to generalize to new examples.
  • Data pre-processing tasks include aggregation, cleaning, discretization, normalization, conversion, and integration.
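
A minimal sketch of the classification metrics above, computed directly from confusion-matrix counts; the counts are made up.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)                    # accuracy of positive predictions
    recall = tp / (tp + fn)                       # coverage of actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Made-up counts for illustration.
print(classification_metrics(tp=40, tn=45, fp=10, fn=5))
```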

ML KNN

  • KNN classifies a new instance based on the most frequent target value of its k nearest neighbors, using a distance function for similarity (see the sketch after this list).
  • Distance metrics include Euclidean and Hamming distance.
  • KNN can be used for regression when the variable is continuous by averaging the target values of the k nearest neighbors.
  • KNN requires numeric features.
  • KNN has high computational complexity in high-dimensional spaces.
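
A minimal NumPy sketch of KNN classification by majority vote; the toy data is made up.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest neighbors (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(distances)[:k]               # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["a", "a", "b", "b"])
print(knn_predict(X, y, np.array([0.95, 0.9])))       # -> "b"
```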

ML Decision Trees

  • Decision Trees are widely used ML models that classify data based on attribute-value representations.
  • Each internal node represents an attribute.
  • Each branch corresponds to a value of that attribute.
  • Each leaf node represents a class.
  • A decision tree can be viewed as a set of classification rules, with each path from the root to a leaf representing one such rule.
  • The C4.5 algorithm builds decision trees by choosing attributes, partitioning the dataset, applying C4.5 recursively, and stopping.
  • C4.5 handles continuous attributes and uses information gain ratio.
  • Metrics that aid feature selection include the Gini index and information gain ratio.
  • The Gini index measures the impurity of a node, with a best value of 0 (see the sketch after this list).
  • Information gain measures the expected reduction in entropy, with higher values indicating better splits.
  • Information gain ratio normalizes information gain, solving the bias issue of information gain and the Gini Index.
  • Pruning reduces overfitting by removing unnecessary branches using pre-pruning (stopping tree growth early) and post-pruning (removing subtrees).
  • Decision trees are interpretable and handle mixed data types but are prone to overfitting.
  • Random forests are ensembles of decision trees trained on random feature subsets.
  • Ensemble models combine multiple weak classifiers, including bagging, boosting, and stacking.
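
The split criteria above are easy to compute by hand; here is a minimal NumPy sketch with made-up labels.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2); 0 means the node is pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    """Expected reduction in entropy after a split."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

labels = np.array([1, 1, 1, 0, 0, 0])
split = [np.array([1, 1, 1]), np.array([0, 0, 0])]     # a perfect split
print(gini(labels), information_gain(labels, split))   # 0.5 and 1.0
```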

Linear classifiers

  • Linear classifiers separate data using a linear decision boundary.
  • They are useful for large datasets where simplicity and low computational cost are crucial.
  • The perceptron is a simple neural network model with a single neuron, capable of classifying linearly separable examples.
  • A loss function measures the difference between model predictions and actual target values.
  • Gradient descent minimizes the loss function by iteratively adjusting model parameters (see the sketch after this list).
  • The learning rate determines the step size in parameter updates, requiring careful tuning to avoid slow convergence or instability.
  • The perceptron can be used for regression by removing the threshold activation function and using a different loss function.
  • Logistic regression estimates the probability that an input belongs to a class using the sigmoid function.
  • The output of logistic regression is the probability, and feature weights indicate the importance and direction of the feature's influence.
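
A minimal sketch of logistic regression trained by gradient descent, assuming NumPy; the tiny dataset and hyper-parameter values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit a logistic-regression model by gradient descent on the cross-entropy loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)             # predicted probabilities in [0, 1]
        w -= lr * X.T @ (p - y) / len(y)   # step opposite to the gradient
        b -= lr * np.mean(p - y)
    return w, b

# Tiny, made-up, linearly separable data.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logistic(X, y)
print(sigmoid(X @ w + b).round(2))         # probabilities increase with x
```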

Support Vector Machines

  • SVMs find an optimal separating hyperplane to divide data points into classes.
  • SVMs aim to maximize the margin, which is the distance between the hyperplane and the closest data points.
  • A soft-margin SVM allows some misclassification using slack variables.
  • The C hyper-parameter balances margin size and classification errors (see the sketch after this list).
  • Large C values try to classify all points correctly, which can lead to overfitting; small C values allow misclassifications, improving generalization.
  • Regularization prevents overfitting by adding a penalty term to the loss function.
  • Kernel machines extend SVMs to non-linear problems by transforming the input space.
  • Kernel machines use the "kernel trick" to compute dot products in a higher-dimensional space efficiently.
  • Polynomial and Gaussian (RBF) kernels are commonly used.
  • Multi-class classification can be handled with one-vs-all (OvA) or one-vs-one (OvO) approaches.
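
A short sketch of the C trade-off with an RBF kernel, assuming scikit-learn is available; the make_moons data and C values are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Non-linearly separable toy data.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(C=C, kernel="rbf")   # Gaussian (RBF) kernel; C trades margin size vs. errors
    clf.fit(X_tr, y_tr)
    print(f"C={C}: accuracy={clf.score(X_te, y_te):.2f}, "
          f"support vectors={clf.n_support_.sum()}")
```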

Neural Networks

  • Neural Networks (NNs) are used for function approximation.
  • NNs minimize a loss function, such as binary or multinomial cross-entropy.
  • A feed-forward neural network (FNN) consists of an input layer, hidden layers, and an output layer.
  • Each neuron applies a function $f(x;\theta) = g(w \cdot x + b)$, where $g$ is a non-linear activation function (see the sketch after this list).
  • Activation functions introduce non-linearity, enabling the network to learn complex relationships.
  • Common activation functions include Sigmoid, Softmax, and ReLU.
  • Neural networks are trained using supervised learning with labeled data.
  • Backpropagation computes gradients for all parameters, enabling efficient training through gradient descent.
  • Early stopping prevents overfitting by monitoring validation error.
  • Convolutional Neural Networks (CNNs) are used for image classification, employing convolutional layers to learn spatial hierarchies.
  • Recurrent Neural Networks (RNNs) are used for sequential data processing, such as time series forecasting and NLP.
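
A minimal NumPy sketch of a feed-forward pass through one hidden layer; the layer sizes and random weights are illustrative only (no training is shown).

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

def forward(x, params):
    """Each layer computes g(W @ x + b); the weights and biases are the trainable parameters."""
    h = relu(params["W1"] @ x + params["b1"])          # hidden layer (ReLU)
    return softmax(params["W2"] @ h + params["b2"])    # output layer (class probabilities)

rng = np.random.default_rng(0)
params = {"W1": rng.normal(size=(4, 3)), "b1": np.zeros(4),
          "W2": rng.normal(size=(2, 4)), "b2": np.zeros(2)}
print(forward(np.array([0.5, -1.0, 2.0]), params))     # two probabilities summing to 1
```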

Regression

  • Regression predicts real-valued continuous variables, unlike classification, which predicts discrete categories.
  • Parametric regression requires assuming a function shape in advance.
  • Linear regression is an example of parametric regression (see the sketch after this list).
  • Non-parametric regression doesn't assume a function shape but requires more data.
  • K-Nearest Neighbors (KNN) regression is an example of Non-parametric regression.
  • K-nearest neighbors, linear regression, decision trees, neural networks, and support vector machines can all be adapted for regression.
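
A minimal sketch of linear regression solved with the Moore-Penrose pseudo-inverse (the least-squares solution); the data is made up.

```python
import numpy as np

# Toy data (made up): y is roughly 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.2, 4.1, 5.9, 8.1])

# Add a column of ones so the intercept is learned as an ordinary coefficient,
# then solve the least-squares problem with the pseudo-inverse.
X_aug = np.hstack([X, np.ones((len(X), 1))])
w = np.linalg.pinv(X_aug) @ y
print(w)   # [slope, intercept], approximately [1.95, 0.2]
```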

Unsupervised learning

  • Clustering groups data points based on similarity, maximizing intra-cluster and minimizing inter-cluster similarity.
  • The K-means algorithm assigns each data point to the closest centroid (see the sketch after this list).
  • The K-means algorithm computes new centroids based on the mean of assigned points.
  • K-means is dependent on random initialization.
  • K-means requires choosing the number of clusters (K) in advance.
  • K-means assumes linear separation.
  • Clustering performance measures include the Dunn Index, Silhouette Score, Homogeneity, Completeness, and V-Measure.
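
A minimal NumPy sketch of the K-means loop described above; it ignores the empty-cluster edge case, and the two-blob data is synthetic.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm sketch: assign points, recompute centroids, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(iters):
        # Assign every point to its closest centroid.
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # converged
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5.0])
labels, centroids = kmeans(X, k=2)
print(centroids.round(1))   # one centroid near (0, 0), one near (5, 5)
```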

Association Rule Mining

  • Data mining is the process of transforming data into intelligible knowledge.
  • The confidence of a rule is the probability of the consequent occurring, given the antecedent.
  • An association rule implies X -> Ik, where X is a subset of items and Ik is an item not in X.
  • The support of an itemset is the fraction of transactions containing the itemset.
  • The support of a rule is the fraction of transactions containing both the antecedent and consequent.
  • The lift of a rule measures the support of the rule relative to the expected support if the items were independent (see the sketch after this list).
  • Rule with Lift > 1 indicates positive dependence.
  • Interest measures the difference between the confidence of the rule and the support of the consequent.
  • Apriori identifies frequent items, generates candidate itemsets, prunes infrequent itemsets, and generates rules.
  • A dendrogram is a tree-like diagram that visualizes the hierarchical structure of clusters.
  • Single Linkage calculates the minimum distance between any two points from each cluster, able to capture elongated structures.
  • Principal Component Analysis (PCA) reduces dimensionality by projecting data into a lower space, preserving main characteristics.
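
A minimal sketch of support, confidence, and lift computed over made-up market baskets.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    return support(transactions, set(antecedent) | set(consequent)) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Lift > 1 indicates positive dependence between antecedent and consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Made-up market baskets.
T = [{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread"}, {"milk", "butter"}]
print(support(T, {"milk", "bread"}))       # 0.5
print(confidence(T, {"milk"}, {"bread"}))  # 2/3
print(lift(T, {"milk"}, {"bread"}))        # (2/3) / (3/4), slightly below 1
```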

Reinforcement Learning (RL)

  • RL involves an agent learning to achieve goals by interacting with an environment and receiving rewards for its actions.
  • A Markov Decision Process (MDP) models sequential decision-making with states, actions, rewards, and transition functions.
  • The learning goal in single-agent RL is to maximize the expected return, often discounted to prioritize immediate rewards.
  • The discount factor determines the significance of future rewards: a value near 0 prioritizes immediate rewards, and one near 1 values future rewards almost equally.
  • State value function measures the expected value of being in a state.
  • State-action value function measures the expected value of taking an action in a state.
  • A greedy policy selects the action with the highest expected value.
  • Dynamic programming computes optimal policies, assuming full knowledge of the MDP.
  • Temporal difference learning learns optimal policies without knowing the transition or reward functions beforehand (see the sketch after this list).
  • Learning rate determines how much the model updates its estimates.
  • The common assumption made to prove convergence of RL algorithms is infinite exploration.
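
A minimal sketch of one temporal-difference (Q-learning) update, assuming NumPy; the table sizes, transition, and hyper-parameter values are made up.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update from a sampled transition.
    No transition or reward function is needed, only the observed experience."""
    td_target = r + gamma * np.max(Q[s_next])   # reward plus discounted best next value
    Q[s, a] += alpha * (td_target - Q[s, a])    # alpha near 0: slow but stable learning
    return Q

Q = np.zeros((3, 2))                            # toy table: 3 states, 2 actions
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q)
```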

Multi-Agent Reinforcement Learning (MARL)

  • MARL introduces challenges over RL through Non-stationarity, credit assignment, and equilibrium selection.
  • A zero-sum game is one where the sum of the agents’ rewards is always zero.
  • A joint policy specifies the actions of all agents in every state.
  • A best response policy maximizes its expected return given fixed policies of other agents.
  • Nash equilibrium means no agent can improve its expected return by unilaterally changing its policy.
  • Independent learning is simple, but suffers from non-stationarity issues.
  • Central learning avoids non-stationarity, with increased computational complexity.
  • CTDE shares information during training but acts independently.
  • Deep learning in RL enables generalization and efficient representation of complex environments.
  • In RL, the training target changes continuously (a moving target problem) unlike supervised learning.
  • RL data isn't i.i.d., which can cause overfitting to recent experiences.
