Questions and Answers
What are explanatory models built for?
test causal hypotheses
What type of models are based on underlying causal relationships between theoretical constructs?
Supervised learning allows us to make predictions about unseen data.
True
Feature Scaling is a method used to transform the range of independent variables or features of data. It leads to quicker convergence of optimization algorithms such as __________ descent.
Match the libraries with their descriptions:
What does Dimensionality Reduction refer to?
Which method of Dimensionality Reduction involves imposing penalties on parameter values for feature selection?
Principal Component Analysis (PCA) finds uncorrelated features that explain most of the variance in high-dimensional data.
________ is an extension of Principal Component Analysis (PCA) that allows for nonlinear dimensionality reduction.
Match the Dimensionality Reduction method with its description:
What is information gain in decision trees?
What are the measures of impurity used in decision trees?
Tree pruning is a technique used to ___ the complexity of the final model and help prevent overfitting.
Random Forests are ensembles of decision trees.
Match the following hyperparameters with their descriptions:
What does KNN stand for?
What is the key difference between L1 and L2 regularizations?
What is the purpose of an n-gram model?
What does a 2-gram model represent?
In binary classification, what is required for a decision to be made by majority voting?
What are regular expressions used for?
Match the topic modeling term with its description:
What is the main difference between Hard Voting and Soft Voting in ensemble learning?
What does AdaBoost stand for?
DBSCAN is a clustering method based on the density of data points.
_______ is a preliminary step before applying more formal statistical techniques/analytics and can be crucial for understanding data sets.
Which type of regression analysis uses several independent variables?
Mean Squared Error (MSE) is heavily influenced by outliers.
What does R-squared (R2) measure in regression analysis?
An observation or data point that falls within the expected range of a dataset is called an ________.
Match the regression regularization method with its description:
Which ensemble learning algorithm uses multiple decision trees to make predictions?
Decision tree regression models require feature scaling.
Study Notes
Introduction to Analytics
- Explanatory models: built to test causal hypotheses, based on underlying causal relationships between theoretical constructs
- Predictive models: generate accurate predictions of new observations, integrate knowledge from existing theoretical models in a less formal way
Supervised vs Unsupervised Learning
- Supervised Learning:
- Build a model from labeled training data to make predictions about unseen or future data
- Examples: classification, regression
- Unsupervised Learning:
- Discover structure in unlabeled dataset
- Examples: clustering, topic modeling
Classification Algorithms 1
- Types of Classification:
- Binary Classification: predict categorical class labels with two classes (e.g., true/false)
- Multi-class Classification: predict categorical class labels with more than two classes (e.g., buy/sell/hold)
- Perceptron Learning Algorithm:
- Single-layer linear classifier
- Operates as a single-layer neural network
- Weights are updated using the errors computed from the output of the linear activation function and the true class labels (see the sketch after this list)
- Hyperparameters:
- Set by the analyst, not optimized from the data (e.g., learning rate, number of epochs)
- Feature Scaling:
- Method used to transform the range of independent variables or features of data
- Examples: standardization
- Python Libraries:
- Pandas: data manipulation and analysis
- NumPy: working with arrays
- Scikit-learn: machine learning library with various algorithms and utilities
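To tie these pieces together, here is a minimal sketch that trains scikit-learn's Perceptron on standardized features; the dataset, learning rate, and epoch count are illustrative assumptions rather than recommended settings.

```python
# Illustrative sketch: perceptron training with feature scaling (values are assumptions)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

scaler = StandardScaler().fit(X_train)      # learn mean/std on training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

ppn = Perceptron(eta0=0.1, max_iter=40, random_state=1)  # eta0 = learning rate (hyperparameter)
ppn.fit(X_train_std, y_train)
print("Test accuracy:", ppn.score(X_test_std, y_test))
```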
Classification Algorithms 2
- Overfitting and Underfitting:
- Overfitting: model captures "patterns" in the training data that do not repeat in new data
- Underfitting: model cannot capture the underlying trend of the data
- Bias-Variance Tradeoff:
- Balancing model complexity against generalization: overly simple models risk high bias (underfitting), overly complex models risk high variance (overfitting)
- Regularization: technique to prevent overfitting by adding a penalty on the larger magnitudes of model parameters
- Examples: L1 and L2 regularization
- Cost Function:
- Formed by taking the negative of the log likelihood, so that maximising the likelihood becomes minimising the cost
- Used to optimize model parameters
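Written out for a binary classifier with label y⁽ⁱ⁾ and predicted probability ŷ⁽ⁱ⁾, the negative log likelihood cost takes the following standard form; the L2 term shown is one common choice of regularization penalty, not the only one.

```latex
J(\mathbf{w}) = -\sum_{i=1}^{n} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right] + \lambda \lVert \mathbf{w} \rVert_2^2
```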
Classification Algorithms 3
- Logistic Regression:
- Classification algorithm for binary classification tasks
- Models the probability that a given input belongs to a particular class
- Maximum Likelihood Estimation:
- Estimating parameters of a probability distribution by maximizing a likelihood function
- Used to optimize model parameters
- Support Vector Machine (SVM):
- Aims to maximize the margin between the decision boundary and the closest data points from each class
- Less sensitive to outliers than other classification algorithms
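As a minimal sketch of both classifiers, the snippet below fits scikit-learn's LogisticRegression and a linear SVC on synthetic data; the dataset and the C value are assumptions for illustration.

```python
# Illustrative sketch: logistic regression vs. a linear SVM (dataset and parameters are assumptions)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

logreg = LogisticRegression().fit(X, y)
print(logreg.predict_proba(X[:3]))           # class membership probabilities

svm = SVC(kernel="linear", C=1.0).fit(X, y)  # C controls the softness of the margin
print(svm.predict(X[:3]))
```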
Classification Algorithms 4
- Decision Tree Learning:
- Supervised learning algorithm that models decisions and their possible consequences as a tree-like structure
- Decision trees can be prone to overfitting
- Maximising Information Gain:
- Measure of the difference between the impurity of the parent node and the sum of the child node impurities
- Used to evaluate the quality of a split in the decision tree
- Measures of Impurity:
- Entropy: quantifies the amount of uncertainty or disorder in a system
- Classification error: measures the proportion of misclassified instances in a dataset
- Gini impurity: calculates the probability of incorrect classification by randomly assigning a label to a randomly chosen sample
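The three impurity measures can be written directly as functions of the class probabilities at a node; the sketch below is a from-scratch illustration, not scikit-learn's internal implementation.

```python
# Illustrative sketch: impurity measures for a decision tree node
import numpy as np

def entropy(p):
    """Entropy of class probabilities p (0 log 0 treated as 0)."""
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gini(p):
    """Gini impurity: probability of misclassifying a randomly labelled sample."""
    return 1.0 - np.sum(np.asarray(p) ** 2)

def classification_error(p):
    """Proportion misclassified when always predicting the majority class."""
    return 1.0 - np.max(p)

probs = [0.5, 0.5]  # a maximally impure binary node
print(entropy(probs), gini(probs), classification_error(probs))  # 1.0 0.5 0.5
```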
Random Forest
- Ensemble method that combines multiple decision tree models
- Uses bootstrap sampling and random feature selection to reduce correlation between trees
- Output prediction is the class selected by most trees
- Hyperparameters: number of trees, maximum depth, bootstrap, criterion
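A minimal sketch of a random forest configured with the hyperparameters listed above; all values are illustrative assumptions.

```python
# Illustrative sketch: a random forest with the listed hyperparameters (values are assumptions)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,   # number of trees
    max_depth=4,        # maximum depth of each tree
    bootstrap=True,     # sample training data with replacement
    criterion="gini",   # impurity measure used for splits
    random_state=1,
)
forest.fit(X, y)
print(forest.predict(X[:3]))  # prediction is the class selected by most trees
```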
Data Preprocessing
- Dealing with Missing Data:
- Deleting rows with missing values
- Imputing missing values
- Categorical Data:
- Nominal features: no ordering possible (e.g., color)
- Ordinal features: categorical values that can be ordered or sorted (e.g., shirt size)
- Encoding Categorical Variables:
- Nominal categorical variables: one-hot encoding
- Ordinal categorical variables: mapping into integers
- Feature Scaling:
- Normalization: scales features to a specified range (usually [0, 1])
- Standardization: scales features to have a mean of 0 and a standard deviation of 1
Transformation Formula
- The transformation formula is z = (X − µ) / σ, where µ is the mean and σ is the standard deviation of the feature.
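The steps above can be combined in a few lines of pandas and scikit-learn; the toy DataFrame and the size mapping below are assumptions for illustration.

```python
# Illustrative sketch: missing values, categorical encoding, and scaling (toy data is an assumption)
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "color": ["red", "blue", None],   # nominal feature with a missing value
    "size": ["S", "M", "L"],          # ordinal feature
    "price": [10.0, 12.0, 14.0],
})

df = df.dropna()                                          # or impute, e.g. df.fillna(...)
df["size"] = df["size"].map({"S": 1, "M": 2, "L": 3})     # ordinal -> integers
df = pd.get_dummies(df, columns=["color"])                # nominal -> one-hot encoding

df[["price"]] = StandardScaler().fit_transform(df[["price"]])  # z = (X - mu) / sigma
print(df)
```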
Feature Selection
- L1 regularisation:
- Produces sparse models (only a subset of coefficients are non-zero)
- Robust to outliers (penalises absolute value of coefficients)
- Can lead to multiple solutions
- L2 regularisation:
- Does not produce sparse models (all coefficients are shrunk towards zero)
- Not robust to outliers
- Leads to one solution
- Key differences between L1 and L2 regularisation:
- Sparsity
- Number of solutions
- Robustness to outliers
- Computational difficulty
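The sparsity difference is easy to see empirically; the sketch below fits L1- and L2-penalised logistic regressions on synthetic data (the C value and dataset are assumptions) and counts non-zero coefficients.

```python
# Illustrative sketch: L1 vs. L2 penalties and coefficient sparsity (parameters are assumptions)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=4, random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

# L1 drives many coefficients exactly to zero (a sparse model); L2 only shrinks them
print("L1 non-zero coefficients:", (l1.coef_ != 0).sum())
print("L2 non-zero coefficients:", (l2.coef_ != 0).sum())
```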
Sequential Feature Selection
- Greedy algorithms:
- Make locally optimal choices at each step
- Do not guarantee global optimum
- Sequential Backward Selection (SBS):
- Start with the full feature set
- Evaluate the model's performance after temporarily removing each feature in turn
- Determine the feature whose removal costs the least performance
- Permanently remove that feature
- Repeat until the desired number of features is reached or the model's performance stops improving
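scikit-learn ships a greedy selector in this spirit; note that its SequentialFeatureSelector scores candidates with cross-validation rather than a single validation score, so it is a close variant of the SBS procedure above rather than an exact match. The estimator and feature count below are assumptions.

```python
# Illustrative sketch: greedy backward feature selection (estimator and counts are assumptions)
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
sbs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="backward",   # start from the full set and remove greedily
)
sbs.fit(X, y)
print("Selected feature mask:", sbs.get_support())
```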
Feature Importance with Random Forests
- Feature importance can be measured as the averaged impurity decrease from all decision trees in the forest
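A minimal sketch: after fitting, scikit-learn exposes the averaged impurity decrease per feature as feature_importances_.

```python
# Illustrative sketch: impurity-based feature importance from a fitted random forest
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(data.data, data.target)
for name, score in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```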
Dimensionality Reduction
- High dimensionality:
- Requires large amounts of data
- Computationally expensive
- Methods of dimensionality reduction:
- Regularisation (L1 and L2 penalties)
- Sequential feature selection
- Feature extraction (PCA, LDA, etc.)
- Principal Component Analysis (PCA):
- Finds uncorrelated features that explain most of the variance in high-dimensional data
- Used for dimensionality reduction
- Can be used for linearly separable data
- Linear Discriminant Analysis (LDA):
- Supervised dimensionality reduction technique
- Finds features that optimize class separability
- Kernel Principal Component Analysis (KPCA):
- Extension of PCA for nonlinear dimensionality reduction
- Uses kernel methods to transform data onto a lower-dimensional subspace
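A minimal sketch contrasting linear PCA with kernel PCA on data that is not linearly separable; the RBF kernel and the gamma value are illustrative assumptions.

```python
# Illustrative sketch: linear PCA vs. kernel PCA (kernel and gamma are assumptions)
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_moons(n_samples=100, random_state=1)   # data that is not linearly separable

pca = PCA(n_components=2).fit(X)
print("Variance explained:", pca.explained_variance_ratio_)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)  # RBF kernel for a nonlinear mapping
X_kpca = kpca.fit_transform(X)
print(X_kpca[:3])
```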
Model Evaluation and Hyperparameter Tuning
- Pipelines:
- Combine multiple processing steps into a single estimator
- Advantages: simplicity, reproducibility, code maintenance, and parameter tuning (see the sketch after this list)
- Hyperparameter tuning:
- Grid search:
- Define a grid of hyperparameter values
- Set up a model and grid search tool
- Fit the grid search to the data
- Evaluate results
- Holdout cross-validation:
- Divide dataset into training, validation, and test sets
- Use validation set to tune hyperparameters
- Evaluate final model on test set
- K-fold cross-validation:
- Divide dataset into k folds
- Use k-1 folds for training and 1 fold for validation
- Repeat for each fold
- Average performance metrics
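The pieces above compose naturally: a pipeline bundles scaling and the classifier, and grid search with k-fold cross-validation tunes the hyperparameters. The grid values below are assumptions for illustration.

```python
# Illustrative sketch: a pipeline tuned with grid search over k-fold cross-validation
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])

param_grid = {"clf__C": [0.1, 1, 10], "clf__gamma": [0.01, 0.1, 1]}  # illustrative grid
search = GridSearchCV(pipe, param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_, search.best_score_)
```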
Model Evaluation Metrics
- Learning curves:
- Plot performance metrics against sample size
- Identify overfitting or underfitting
- Validation curves:
- Plot performance metrics against a hyperparameter value
- Identify optimal hyperparameter value
- Confusion matrix:
- Evaluate classification model performance
- Calculate precision, recall, F1 score, etc.
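A minimal sketch computing a confusion matrix and the derived metrics; the label vectors are made up for illustration.

```python
# Illustrative sketch: confusion matrix and derived metrics (labels are assumptions)
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
```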
Ensemble Methods
- Ensemble methods:
- Combine multiple models to improve performance
- Reduce overfitting and bias
- Techniques:
- Bagging (bootstrap aggregating)
- Boosting (AdaBoost)
- Voting (hard and soft)
- Bagging:
- Reduce overfitting
- Create bootstrap samples
- Combine multiple models
- Adaptive Boosting (AdaBoost):
- Reduce bias and variance
- Focus on hard-to-classify instances
- Update instance weights and learner weights
Clustering Techniques
- K-Means: partitions n items into k clusters, where each item belongs to the cluster with the nearest mean
- K-Means++: an algorithm for choosing the initial cluster centres for the K-Means clustering algorithm
- Hierarchical Trees: a method of cluster analysis that seeks to build a hierarchy of clusters
- Agglomerative (Bottom-Up) Method: starts by assuming each example is a single cluster, merges closest pairs of clusters iteratively until only one cluster remains
- Divisive (Top-Down) Method: starts with one cluster, splits the cluster into smaller clusters iteratively until each cluster contains only one example
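A minimal sketch of k-means with k-means++ initialisation; the blob data and the cluster count are assumptions.

```python
# Illustrative sketch: k-means with k-means++ initialisation (data and k are assumptions)
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print("Cluster labels:", km.labels_[:10])
print("Cluster centres:", km.cluster_centers_)
```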
Measuring Distance
- Single Linkage Approach: computes distances between the most similar members for each pair of clusters and merges the two clusters for which the distance between the most similar members is the smallest
- Complete Linkage Approach: computes the distance between the most dissimilar members for each pair of clusters and merges the two clusters for which the distance between the most dissimilar members is the smallest
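Both linkage criteria are available in scikit-learn's AgglomerativeClustering; a minimal sketch with assumed blob data:

```python
# Illustrative sketch: single vs. complete linkage in agglomerative clustering
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

single = AgglomerativeClustering(n_clusters=3, linkage="single").fit(X)      # most similar members
complete = AgglomerativeClustering(n_clusters=3, linkage="complete").fit(X)  # most dissimilar members
print(single.labels_[:10], complete.labels_[:10])
```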
DBSCAN
- identifies clusters in datasets based on the density of data points
- classifies clusters based on the idea that a cluster in a dataset is a high-density area surrounded by a low-density area
- key concepts:
- Core Points: a point is considered a core point if it has a minimum number of points (MinPts) within a given radius (ϵ)
- Border Points: a point that is not a core point but falls within the radius of a core point
- Noise Points: any point that is not a core point or a border point is considered noise or an outlier, not belonging to any cluster
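A minimal sketch; in scikit-learn's DBSCAN, eps plays the role of the radius ϵ and min_samples the role of MinPts, and both values below are illustrative assumptions.

```python
# Illustrative sketch: density-based clustering with DBSCAN (eps/min_samples are assumptions)
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)   # eps ~ radius, min_samples ~ MinPts
print("Cluster labels:", set(db.labels_))    # -1 marks noise points
```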
Regression Analysis
- Simple Regression: models a linear relationship between a target/dependent and one independent variable/feature
- Multiple Regression: models a linear relationship between a target/dependent and more than one independent variable/feature
- Ordinary Least Squares (OLS): a method for estimating the parameters of the linear regression line that minimises the sum of squared vertical distances from the estimated line to the training examples
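A minimal sketch of multiple regression fitted by OLS on synthetic data; the true coefficients (3 and −2) are assumptions chosen so the recovered estimates are easy to check.

```python
# Illustrative sketch: OLS with several independent variables (data is an assumption)
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                   # two independent variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)              # minimises the sum of squared residuals
print("Coefficients:", ols.coef_, "Intercept:", ols.intercept_)
```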
Evaluating Regression Performance
- Mean Squared Error (MSE): the average of the squared errors, that is, the average squared difference between the predicted and actual values
- R-Squared (R²): a statistical measure of how close the data are to the fitted regression line
- Residual Plot: a graph that shows the residuals on the vertical axis and the predicted values, or an independent variable, on the horizontal axis
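A minimal sketch computing both metrics; the value vectors are made up for illustration.

```python
# Illustrative sketch: MSE and R-squared for regression predictions (values are assumptions)
from sklearn.metrics import mean_squared_error, r2_score

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.3, 8.9]

print("MSE:", mean_squared_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))
```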
Regularisation Methods
- L2 Regularisation - Ridge Regression: adds a penalty equal to the square of the magnitude of coefficients to the loss function, reducing the size of coefficients but keeping all variables in the model
- L1 Regularisation - Lasso Regression: introduces a penalty that is the absolute value of the magnitude of coefficients, which can shrink some coefficients to zero, effectively performing feature selection
- L1 + L2 Regularisation - Elastic Net: combines penalties from both Ridge and Lasso, integrating the benefits of both
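A minimal sketch fitting all three regularised regressions on the same synthetic data; the alpha values are illustrative assumptions.

```python
# Illustrative sketch: Ridge, Lasso, and Elastic Net (alphas and data are assumptions)
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only two features matter

ridge = Ridge(alpha=1.0).fit(X, y)                     # L2: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)                     # L1: zeros out irrelevant coefficients
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # L1 + L2 combined

print("Lasso non-zero coefficients:", (lasso.coef_ != 0).sum())
```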
Non-Linear Regression Models
- Polynomial Regression: models complex relationships that do not follow simple linear patterns
- Decision Tree Regression: employed to predict a continuous outcome by learning decision rules inferred from the data features
- Random Forest Regression: an ensemble learning algorithm that utilises multiple decision trees to make predictions by averaging the outputs of the individual trees
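A minimal sketch of polynomial and random forest regression on a quadratic target; note that the tree-based model needs no feature scaling. The data and the polynomial degree are assumptions.

```python
# Illustrative sketch: polynomial and tree-based regression on a nonlinear target
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.2, size=200)   # quadratic relationship

poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)  # no scaling required

print(poly.predict([[2.0]]), forest.predict([[2.0]]))  # both should be near 4
```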