BMAN73701 Week 5: Advanced Machine Learning

Questions and Answers

Machine Learning with scikit-learn does not require any libraries such as NumPy or Matplotlib.

False (B)

What is the primary output format of data processed using scikit-learn?

NumPy arrays

Machine Learning requires _______ data preprocessing before analysis.

raw

Match the following data sources to their types:

Databases = Raw Data Sources
Excel = Raw Data Sources
Numerical = Tabular Data Types
Categorical = Tabular Data Types

What type of data does scikit-learn accept as input?

NumPy array or pandas DataFrame (C)
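As a hedged illustration of this answer, the sketch below fits the same estimator once on a pandas DataFrame and once on the equivalent NumPy array. The column name `hours` and the toy values are invented for this example; `LinearRegression` is used only because it is a small, familiar estimator.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following y = 2 * hours + 1 (values invented for illustration).
df = pd.DataFrame({"hours": [1.0, 2.0, 3.0, 4.0]})
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fit one model on the DataFrame and one on the underlying NumPy array.
model_df = LinearRegression().fit(df, y)
model_np = LinearRegression().fit(df.to_numpy(), y)

# Both input types lead to the same fitted line; predict() returns a NumPy array.
pred_df = model_df.predict(df)
pred_np = model_np.predict(df.to_numpy())
print(type(pred_df).__name__, pred_df)
```

Either input type works for fitting, and the predictions come back as a NumPy array in both cases.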

Who is the professor for the Programming in Python course?

Prof. Manuel López-Ibáñez

What is the formula for calculating Accuracy?

$(TP + TN) / (TP + TN + FP + FN)$ (A)
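A minimal sketch of the accuracy calculation follows. The numerator TP + TN is summed before dividing by the total number of predictions. The function name `accuracy` and the counts passed to it are hypothetical and chosen only for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    # Correct predictions (true positives and true negatives) over all predictions.
    return (tp + tn) / total


# Hypothetical confusion-matrix counts: 85 correct predictions out of 100.
example_accuracy = accuracy(tp=40, tn=45, fp=5, fn=10)
print(example_accuracy)
```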

The Recall is calculated as the ratio of true positives to the total actual positives.

True (A)

What is the F1 score for the given model?

0.62

The ratio of predicted positives that are actual positives is called __________.

Precision

Match the terms with their descriptions:

Precision = Ratio of true positives to predicted positives
Recall = Ratio of true positives to actual positives
Accuracy = Ratio of correct predictions to total predictions
F1 Score = Harmonic mean of Precision and Recall

What does the value '0.57' represent in this context?

Recall (A)

How many instances are classified correctly in this model?

4

The F1 score in this case is higher than both precision and recall.

False (B)
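The sketch below shows why the F1 score cannot exceed both precision and recall: as a harmonic mean, it always lies between the two. The confusion matrix behind this quiz is not shown, so the counts used here (TP = 4, FP = 2, FN = 3) are an assumption, picked only because they reproduce the recall of 0.57 and the F1 score of 0.62 quoted in the questions.

```python
def precision(tp: int, fp: int) -> float:
    """True positives divided by everything predicted positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """True positives divided by everything that is actually positive."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Assumed counts (not taken from the original figure).
tp, fp, fn = 4, 2, 3
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)

print(round(p, 2), round(r, 2), round(f1, 2))  # 0.67 0.57 0.62
```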

What happens if K is set too small in K-fold Cross-validation?

It results in faster computations but poorer generalization. (C)

Increasing K in K-fold Cross-validation always improves model accuracy.

False (B)

What method should be used in K-fold Cross-validation when classes are unbalanced?

Stratified K-fold Cross-validation

In Stratified K-fold Cross-validation, the proportion of __________ labels is maintained within each fold.

class
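To make the idea of preserved class proportions concrete, here is a small sketch using scikit-learn's `StratifiedKFold`. The label vector (80 negatives, 20 positives) and the placeholder feature column are invented for this example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical unbalanced labels: 80 negatives and 20 positives.
y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

skf = StratifiedKFold(n_splits=5)
positive_rates = []
for train_idx, val_idx in skf.split(X, y):
    # Fraction of positive labels in this validation fold.
    positive_rates.append(float(y[val_idx].mean()))

print(positive_rates)
```

Each validation fold should contain the same 20% share of positives as the full dataset.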

Match the following terms with their definitions:

K-fold Cross-validation = Division of dataset into K subsets for training and validation.
Stratified K-fold = A method ensuring each fold has the same proportion of class labels.
Training fold = Subset of data used to train the model.
Validation fold = Subset of data used to evaluate model performance.
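The following hand-rolled sketch shows how a dataset can be divided into K folds, with each fold serving once as the validation fold while the rest form the training folds. The helper `k_fold_indices` is hypothetical and does not shuffle the data; in practice a library splitter would normally be used instead.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for training, validation in folds:
    print("train:", training, "validate:", validation)
```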

Which library function can be used to implement stratified cross-validation in Python?

cross_val_score() (B)
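A short sketch of `cross_val_score()` follows. When `cv` is an integer and the estimator is a classifier with a binary or multiclass target, scikit-learn uses stratified folds for the split. The choice of the Iris dataset and of `DecisionTreeClassifier` is an assumption made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# cv=5 requests five folds; one accuracy score is returned per fold.
scores = cross_val_score(clf, X, y, cv=5)
print(scores, scores.mean())
```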

K-fold Cross-validation is only applicable to classification problems.

False (B)

Identify one potential disadvantage of using a very large K value in K-fold Cross-validation.

Increased computation time.

What is the purpose of the `model.fit()` function in supervised learning?

To build the model (C)
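As a minimal sketch of what `model.fit()` does in practice, the example below builds a decision tree from four labeled examples and then uses it to predict labels for unseen inputs. The toy data and the choice of `DecisionTreeClassifier` are assumptions made for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: one feature, two classes.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # builds the model from the labeled examples

# The fitted model can now predict labels for new inputs.
predictions = model.predict([[0.2], [2.8]])
print(predictions)
```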

Neural networks learn decision points and branches when modeling.

False (B)

What are the two types of predictions made by classifiers and regression models?

Classifiers predict labels, and regression predicts numerical outputs.

Random forests are a type of __________ learning model.

ensemble

What scoring metric is commonly used for regression models?

R-squared ($R^2$) (B)
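For reference, R-squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical helper `r_squared` and invented values; a perfect prediction scores 1.0, and always predicting the mean of the targets scores 0.0.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and two sets of predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])
print(perfect, mean_only)
```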

Confusion matrices are used to assess the performance of regression models.

False (B)

What is the main goal of supervised machine learning?

Given examples, learn to classify or predict answers (D)

Classification tasks in machine learning predict a real-valued number.

False (B)

What is the purpose of the train/test random split in machine learning?

To separate data into training and testing sets for model evaluation.

In machine learning, __________ is used to validate a model's performance by dividing the training data into K subsets.

K-fold cross-validation

Match the following machine learning terms with their descriptions:

Supervised ML = Learning with labeled data
Unsupervised ML = Learning without labeled data
Classification = Assigning labels to data
Regression = Predicting continuous values

What does K represent in K-fold cross-validation?

The number of splits of the training data (B)

In supervised machine learning, the terms 'X' and 'y' typically represent the input and output data respectively.

True (A)

Define classification in the context of machine learning.

Classification is the process of assigning categories or labels to data.

The output layer of a neural network is where __________ are generated.

predictions

Match the machine learning stages with their correct sequence:

Train/test random split = 1
Train ML model = 2
Score on the test set = 3
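The three stages can be sketched with scikit-learn as follows. The dataset (Iris), the 25% test share, and the choice of `RandomForestClassifier` are assumptions made for illustration rather than details taken from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 1. Train/test random split (the 25% test share is an arbitrary choice here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Train the ML model on the training set only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# 3. Score on the held-out test set (mean accuracy for classifiers).
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```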

What does the term 'hidden layer' refer to in a neural network?

The layers that perform intermediate computations (B)
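A rough sketch of a single forward pass may help separate the roles of the layers: the hidden layer computes intermediate values, and the output layer turns them into a prediction. The layer sizes, the random untrained weights, and the ReLU and softmax activations are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input with three features.
x = np.array([0.5, -1.0, 2.0])

# Untrained, randomly initialised weights (3 inputs -> 4 hidden units -> 2 classes).
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

# Hidden layer: intermediate computation with a ReLU activation.
hidden = np.maximum(0.0, x @ W1 + b1)

# Output layer: softmax turns the scores into class probabilities.
logits = hidden @ W2 + b2
exp = np.exp(logits - logits.max())
probabilities = exp / exp.sum()
predicted_class = int(np.argmax(probabilities))

print(probabilities, predicted_class)
```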

Regression tasks involve assigning discrete labels to data.

False (B)

Explain the main difference between supervised and unsupervised machine learning.

Supervised learning uses labeled data for training, while unsupervised learning works with unlabeled data.

The training dataset in machine learning is commonly denoted as __________.

Xtrain

What does the 'Random' in Random Forests refer to?

Random decisions made during tree construction (B)

A Random Forest consists of a single decision tree for classification and regression.

False (B)

What is the primary purpose of an ensemble of decision trees in Random Forests?

To improve accuracy and reduce overfitting.

In Random Forests, classification is based on __________ and regression is based on __________.

vote, average
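The vote and average rules can be illustrated with a few lines of plain Python. The per-tree outputs below are invented, and the helper `majority_vote` is hypothetical; this shows only the aggregation idea, not how any particular library implements it.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs from three trees in a forest.
tree_labels = ["spam", "ham", "spam"]  # classification: each tree votes
tree_values = [2.0, 3.0, 4.0]          # regression: each tree predicts a number

forest_label = majority_vote(tree_labels)
forest_value = mean(tree_values)
print(forest_label, forest_value)
```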

Match the following terms related to Random Forests with their meanings:

Ensemble = Combination of multiple models
Decision Tree = A model that makes decisions based on features
Feature Split = Dividing the data at a particular point based on feature values
Information Gain = A measure used to determine the effectiveness of a feature in splitting data

Which of the following statements about Random Forests is true?

They can handle both classification and regression tasks. (D)

Random Forests can only be used with continuous data.

False (B)

Name one advantage of using Random Forests over a single decision tree.

Reduced risk of overfitting.

Flashcards

Precision

The ratio of correctly predicted positive instances to the total number of instances predicted as positive.

Recall

The ratio of correctly predicted positive instances to the total number of actual positive instances.