Questions and Answers
Machine Learning with scikit-learn does not require any libraries such as NumPy or Matplotlib.
False
What is the primary output format of data processed using scikit-learn?
NumPy arrays
Machine Learning requires _______ data to be preprocessed before analysis.
raw
Match the following data sources to their types:
What type of data does scikit-learn accept as input?
Who is the professor for the Programming in Python course?
What is the formula for calculating Accuracy?
Recall is calculated as the ratio of true positives to the total actual positives.
What is the F1 score for the given model?
The ratio of predicted positives that are actual positives is called __________.
Match the terms with their descriptions:
What does the value '0.57' represent in this context?
How many instances are classified correctly in this model?
The F1 score in this case is higher than both precision and recall.
What happens if K is set too small in K-fold Cross-validation?
Increasing K in K-fold Cross-validation always improves model accuracy.
What method should be used in K-fold Cross-validation when classes are unbalanced?
In Stratified K-fold Cross-validation, the proportion of __________ labels is maintained within each fold.
Match the following terms with their definitions:
Which library function can be used to implement stratified cross-validation in Python?
K-fold Cross-validation is only applicable to classification problems.
Identify one potential disadvantage of using a very large K value in K-fold Cross-validation.
What is the purpose of the model.fit() function in supervised learning?
Neural networks learn decision points and branches when modeling.
What are the two types of predictions made by classifiers and regression models?
Random forests are a type of __________ learning model.
What scoring metric is commonly used for regression models?
Confusion matrices are used to assess the performance of regression models.
What is the main goal of supervised machine learning?
Classification tasks in machine learning predict a real-valued number.
What is the purpose of the train/test random split in machine learning?
In machine learning, __________ is used to validate a model's performance by dividing the training data into K subsets.
Match the following machine learning terms with their descriptions:
What does K represent in K-fold cross-validation?
In supervised machine learning, the terms 'X' and 'y' typically represent the input and output data respectively.
Define classification in the context of machine learning.
The output layer of a neural network is where __________ are generated.
Match the machine learning stages with their correct sequence:
What does the term 'hidden layer' refer to in a neural network?
Regression tasks involve assigning discrete labels to data.
Explain the main difference between supervised and unsupervised machine learning.
The training dataset in machine learning is commonly denoted as __________.
What does the 'Random' in Random Forests refer to?
A Random Forest consists of a single decision tree for classification and regression.
What is the primary purpose of an ensemble of decision trees in Random Forests?
In Random Forests, classification is based on __________ and regression is based on _________.
Match the following terms related to Random Forests with their meanings:
Which of the following statements about Random Forests is true?
Random Forests can only be used with continuous data.
Name one advantage of using Random Forests over a single decision tree.
Study Notes
Course Information
- Course Title: Programming in Python for Business Analytics
- Course Code: BMAN73701
- Week: 5, Lecture 2
- Topic: Advanced Machine Learning
Data Analysis Process
- Data acquisition from raw sources (databases, web, Excel, APIs)
- Raw data tidied and organized into tabular data (numerical, categorical, ordinal)
- Data analysis through summary statistics, analysis, and visualizations.
Machine Learning with scikit-learn
- Built on top of NumPy and Matplotlib
- Input data can be NumPy arrays or pandas DataFrames
- Output is typically NumPy arrays
- Open-source, constantly improving, and object-oriented
- Used to fit (train) or transform data
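To illustrate the fit/transform pattern and the DataFrame-in, NumPy-out behaviour listed above, here is a minimal sketch. The DataFrame values are hypothetical, and StandardScaler is just one example of a scikit-learn transformer; the default output type is shown (scikit-learn can be configured otherwise).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy input: a pandas DataFrame with two numeric columns
df = pd.DataFrame({"age": [25, 32, 47, 51], "income": [30000, 42000, 58000, 61000]})

scaler = StandardScaler()      # a transformer object (object-oriented API)
scaler.fit(df)                 # fit: learn each column's mean and standard deviation
scaled = scaler.transform(df)  # transform: apply those statistics to the data

print(type(scaled))            # <class 'numpy.ndarray'>
print(scaled.mean(axis=0))     # close to zero for each column after standardization
```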
Supervised Machine Learning
- Learning from labeled examples, where each input comes with its known answer
- Classification: assigning discrete categories or labels
- Regression: predicting continuous real-valued numbers
Supervised ML Workflow
- Randomly split data into training and testing sets
- Train a machine learning model using the training data
- Evaluate the model's performance on the test set
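The three workflow steps above can be sketched as follows. This assumes scikit-learn's bundled iris dataset as a stand-in for real course data and a decision tree as an arbitrary choice of classifier; the 25% test fraction is likewise just an illustrative setting.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X holds the input features, y the known answers (class labels)
X, y = load_iris(return_X_y=True)

# 1. Randomly split into training and test sets (25% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Train the model on the training data only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# 3. Evaluate on the unseen test set (score() returns accuracy for classifiers)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```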
K-fold Cross-validation
- Divides the training data into K folds
- Iterates over the K folds, using each fold once as validation data while training on the remaining K-1 folds
- Scores the model on the validation fold in each iteration
- Gives a more reliable estimate of how well the model generalizes to unseen data than a single random train/test split, especially when the training data is small
- The best value for K is situational: too small, and each model is trained on a smaller share of the data, so the validation scores may not reflect how well the model generalizes; too large, and evaluation takes longer because the model is trained K times
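The loop described above can be written out by hand and compared with cross_val_score(), which runs the same procedure in one call. This is a sketch under assumptions: the iris dataset stands in for the training data, K = 5 is an arbitrary choice, and the folds are shuffled because iris is sorted by class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# For brevity the whole toy dataset plays the role of the training data
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# K = 5 folds, shuffled with a fixed seed so the splits are reproducible
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in kfold.split(X):
    # Train on K-1 folds, then score on the held-out validation fold
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))

# cross_val_score performs the same loop (on a fresh copy of the model)
scores = cross_val_score(model, X, y, cv=kfold)
print(np.round(scores, 3), round(scores.mean(), 3))
```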
Stratified K-fold Cross-validation
- Maintains the proportion of class labels in the training and validation folds during K-fold Cross-validation
- Improves the handling of unbalanced data sets
- Used automatically by cross_val_score() when the estimator is a classifier and cv is an integer (or left at its default)
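A small sketch of why stratification matters, assuming hypothetical, deliberately unbalanced labels (90 negatives followed by 10 positives). Plain KFold without shuffling puts every positive into the last fold, whereas StratifiedKFold keeps the 10% positive rate in every validation fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical unbalanced labels: 90 negatives (0) then 10 positives (1)
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not affect how the folds are formed

# Plain K-fold on sorted labels: positive share in each validation fold
plain = [float(y[val].mean()) for _, val in KFold(n_splits=5).split(X)]
print(plain)       # [0.0, 0.0, 0.0, 0.0, 0.5]

# Stratified K-fold: every validation fold keeps the overall 10% positive rate
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
stratified = [float(y[val].mean()) for _, val in skf.split(X, y)]
print(stratified)  # [0.1, 0.1, 0.1, 0.1, 0.1]
```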
Supervised ML Model Building
- Decision trees: learning decision points/branches
- Neural networks (MLP): learning weights of neurons
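To make "decision points" versus "weights" concrete, the sketch below fits both model types and inspects what each one learned. It assumes the iris dataset, and the tree depth and hidden-layer size are illustrative choices rather than recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A decision tree learns decision points (feature thresholds) and branches
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print("tree depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())

# An MLP learns weight matrices connecting successive layers of neurons:
# 4 inputs -> 10 hidden neurons -> 3 outputs (one per iris class)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)
print("weight matrix shapes:", [w.shape for w in mlp.coefs_])
```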
Supervised ML Model Evaluation
- Classifiers: accuracy (the default score() metric)
- Regression: R² (coefficient of determination; the default score() metric)
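The sketch below checks that score() on a classifier matches accuracy_score() and that score() on a regressor matches r2_score(). It assumes the bundled iris and diabetes datasets, with a decision tree and a linear regression as arbitrary example models.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Classification: score() returns accuracy = correct predictions / all predictions
Xc, yc = load_iris(return_X_y=True)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(Xc_tr, yc_tr)
acc = clf.score(Xc_te, yc_te)
acc_manual = accuracy_score(yc_te, clf.predict(Xc_te))

# Regression: score() returns R^2 = 1 - (residual sum of squares / total sum of squares)
Xr, yr = load_diabetes(return_X_y=True)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
r2 = reg.score(Xr_te, yr_te)
r2_manual = r2_score(yr_te, reg.predict(Xr_te))

print(f"accuracy: {acc:.3f} (score) vs {acc_manual:.3f} (accuracy_score)")
print(f"R^2:      {r2:.3f} (score) vs {r2_manual:.3f} (r2_score)")
```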
Random Forests
- Ensemble of decision trees
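- Randomness when building the trees: each tree is trained on a bootstrap sample of the rows and considers a random subset of features at each split
- Many trees combined
- Reduces overfitting by averaging predictions from multiple trees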
- Measures feature importance
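A minimal comparison of a single decision tree and a random forest under 5-fold cross-validation, assuming the bundled breast cancer dataset and 100 trees as an illustrative ensemble size. Note that for classification, scikit-learn's forest averages the trees' predicted class probabilities rather than taking a strict majority vote.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
# 100 trees; by default each is grown on a bootstrap sample and considers a
# random subset of features (square root of the total) at each split
forest = RandomForestClassifier(n_estimators=100, random_state=0)

tree_scores = cross_val_score(single_tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)
print(f"single tree: {tree_scores.mean():.3f}   random forest: {forest_scores.mean():.3f}")

forest.fit(X, y)
print(len(forest.estimators_))  # number of fitted trees inside the ensemble
```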
Credit Card Default Example
- Dataset used for demonstration purposes, with 30,000 rows (unbalanced)
- value_counts() gives the breakdown of the default variable; this class imbalance should be considered before modeling
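A sketch of checking class balance with pandas. The series below is hypothetical (ten made-up rows, 1 = default), not the 30,000-row credit card data; normalize=True turns the counts into proportions.

```python
import pandas as pd

# Hypothetical stand-in for the default column: 0 = no default, 1 = default
default = pd.Series([0, 0, 0, 1, 0, 0, 0, 0, 1, 0], name="default")

counts = default.value_counts()                # number of rows per class
shares = default.value_counts(normalize=True)  # proportion of rows per class
print(counts)
print(shares)
```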
Feature Importance in Random Forests
- Important features have a higher impact on the model's predictions
- Read from forest.feature_importances_ (available only after training the model)
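A minimal sketch of reading feature importances, assuming the bundled breast cancer dataset loaded as a DataFrame so the importances can be labeled with column names. In scikit-learn these are impurity-based importances and they sum to 1.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)  # feature_importances_ exists only after fitting

# Label each importance with its feature name and show the five largest
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(5))
```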
Hyper-parameter Optimization
- Parameters are learned from the training data, whereas hyperparameters are set before training and need separate tuning
- Methods: Grid Search and Random Search
- Optimization algorithms can search for the combination of hyperparameters that maximizes the cross-validation score
- Tools such as SMAC, irace, and Skopt (scikit-optimize)
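Grid Search and Random Search are both available in scikit-learn; the sketch below runs each on a random forest. The dataset, the hyperparameter ranges, and the number of random candidates are illustrative assumptions kept small so the example runs quickly; each candidate is scored by 5-fold cross-validation.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(random_state=0)

# Grid Search: tries every combination (2 x 3 = 6 candidates)
grid = GridSearchCV(
    forest,
    param_grid={"n_estimators": [25, 50], "max_depth": [3, 5, None]},
    cv=5,
)
grid.fit(X, y)
print("grid best:", grid.best_params_, round(grid.best_score_, 3))

# Random Search: samples a fixed number of candidates from distributions
rand = RandomizedSearchCV(
    forest,
    param_distributions={"n_estimators": randint(20, 60), "max_depth": randint(2, 10)},
    n_iter=4,
    cv=5,
    random_state=0,
)
rand.fit(X, y)
print("random best:", rand.best_params_, round(rand.best_score_, 3))
```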
Preprocessing with Cross-validation
- Data transformations should be performed within the model-building step for each k-fold in Cross-validation (not before)
- Avoids data leakage, where information from the validation fold (e.g., the mean and variance used for scaling) influences model building. This matters because the resulting evaluation would be overly optimistic.
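Done by hand, leakage-free preprocessing means fitting the transformation inside each fold. The sketch below assumes the breast cancer dataset, standard scaling as the preprocessing step, and logistic regression as the model. The scaler learns its statistics from the training folds only; the validation fold is transformed but never used for fitting.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in skf.split(X, y):
    # Fit the scaler on the training folds only, so no statistics leak from validation
    scaler = StandardScaler().fit(X[train_idx])
    X_train = scaler.transform(X[train_idx])
    X_val = scaler.transform(X[val_idx])  # validation fold is only transformed

    model = LogisticRegression(max_iter=1000).fit(X_train, y[train_idx])
    scores.append(model.score(X_val, y[val_idx]))

print(f"mean CV accuracy: {np.mean(scores):.3f}")
```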
Pipelines
- Combining preprocessing steps and machine learning models into a single object
- Helps with data transformations and avoiding data leakage during model evaluation.
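A Pipeline packages the per-fold preprocessing into one estimator, so cross_val_score() refits the scaler on each training split automatically. This sketch assumes the same dataset, scaler, and model choices as a typical example, not the course's exact code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaler and model combined into a single estimator; the whole pipeline is
# refit on each training split, so scaling statistics are learned per fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

The same pipeline can be passed to GridSearchCV; its hyperparameters are then addressed with the step-name prefix that make_pipeline generates, for example "logisticregression__C".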
Description
Explore advanced concepts in machine learning tailored for business analytics. This quiz covers data acquisition, organization, analysis, and the application of scikit-learn for supervised learning techniques. Test your knowledge on classification, regression, and the workflow of machine learning models.