BUSA3020 Revision Notes - Generated Questions
Questions and Answers

What are explanatory models built for?

test causal hypotheses

What type of models are based on underlying causal relationships between theoretical constructs?

  • Explanatory Models (correct)
  • Unsupervised Models
  • Predictive Models
  • Supervised Models

Supervised learning allows us to make predictions about unseen data.

True

Feature Scaling is a method used to transform the range of independent variables or features of data. It leads to quicker convergence of optimization algorithms such as __________ descent.

gradient

Match the libraries with their descriptions:

  • Pandas = Python library used for data manipulation and analysis
  • Numpy = Python library used for working with arrays
  • Scikit-learn = Versatile machine-learning library in Python with various algorithms

What does Dimensionality Reduction refer to?

Dimensionality reduction refers to the transformation of the features in the dataset from a high-dimensionality space to a low-dimensionality space while attempting to retain meaningful properties of the original data.

Which method of Dimensionality Reduction involves imposing penalties on parameter values for feature selection?

Regularisation

Principal Component Analysis (PCA) finds uncorrelated features that explain most of the variance in high-dimensional data.

True

________ is an extension of Principal Component Analysis (PCA) that allows for nonlinear dimensionality reduction.

Kernel Principal Component Analysis (KPCA)

Match the Dimensionality Reduction method with its description:

  • Regularisation = Imposing penalties on parameter values for feature selection
  • Sequential Feature Selection = Selecting a small number of relevant features from a larger set
  • Feature Extraction = Summarizing the information content of the dataset by transforming the feature space

What is information gain in decision trees?

Information Gain is the difference between the impurity of the parent node and the sum of the child node impurities.

What are the measures of impurity used in decision trees?

Entropy, Gini impurity, and classification error

Tree pruning is a technique used to ___ the complexity of the final model and help prevent overfitting.

reduce

Random Forests are ensembles of decision trees.

True

Match the following hyperparameters with their descriptions:

  • Number of Trees (n_estimators) = Number of trees in the forest
  • Maximum Depth of the Trees (max_depth) = Limits how deep the trees can grow
  • Bootstrap (bootstrap) = Determines if bootstrap samples are used
  • Criterion (criterion) = Function used to measure the quality of a split

What does KNN stand for?

K-Nearest Neighbours

What is the key difference between L1 and L2 regularizations?

Sparsity; L1 regularization can zero out coefficients, leading to sparse models.

What is the purpose of an n-gram model?

to represent sequences of words from text or speech

What does a 2-gram model represent?

a sequence of two items from a text
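
As a quick illustration of the n-gram idea, a minimal Python sketch (the sentence and the naive whitespace tokenizer are invented for the example):

```python
# Build 2-grams by sliding a window of size n over the tokens
text = "machine learning models learn from data"
tokens = text.split()  # naive whitespace tokenization

n = 2
ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
print(ngrams)
# [('machine', 'learning'), ('learning', 'models'), ('models', 'learn'),
#  ('learn', 'from'), ('from', 'data')]
```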

In binary classification, what is required for a decision to be made by majority voting?

more than 50% of the classifiers must agree on the same class

What are regular expressions used for?

defining search patterns with sequences of characters
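
A minimal sketch of a regular expression in Python's re module (the pattern and text are invented and deliberately simplistic):

```python
import re

# A regex defines a search pattern as a sequence of characters;
# this one loosely matches email-like strings.
pattern = r"[\w.+-]+@[\w-]+\.[\w.]+"
text = "Contact help@example.com or sales@example.org for details."
print(re.findall(pattern, text))
# ['help@example.com', 'sales@example.org']
```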

Match the topic modeling term with its description:

  • Topic Modeling = assigning topics to unlabeled text documents
  • Latent Dirichlet Allocation (LDA) = finding groups of words that appear together across documents

What is the main difference between Hard Voting and Soft Voting in ensemble learning?

Hard voting considers the most frequent class label, while Soft voting combines probability estimates.

What does AdaBoost stand for?

Adaptive Boosting

DBSCAN is a clustering method based on the density of data points.

True

_______ is a preliminary step before applying more formal statistical techniques/analytics and can be crucial for understanding data sets.

Exploratory Data Analysis

Which type of regression analysis uses several independent variables?

Multiple linear regression

Mean Squared Error (MSE) is heavily influenced by outliers.

True

What does R-squared (R²) measure in regression analysis?

R-squared measures the proportion of total variation of outcomes explained by the model.

An observation or data point that falls within the expected range of a dataset is called an ________.

inlier

Match the regression regularization method with its description:

  • L2 Regularisation - Ridge Regression = Adds a penalty equal to the square of the magnitude of coefficients to reduce the size of coefficients
  • L1 Regularisation - LASSO = Introduces a penalty that is the absolute value of the magnitude of coefficients, shrinking some coefficients to zero
  • L1 + L2 Regularisation - Elastic Net = Combines penalties from both Ridge and LASSO, maintaining a balance between feature selection and coefficient shrinkage

Which ensemble learning algorithm uses multiple decision trees to make predictions?

Random Forest Regression

Decision tree regression models require feature scaling.

False

    Study Notes

    Introduction to Analytics

    • Explanatory models: built to test causal hypotheses, based on underlying causal relationships between theoretical constructs
    • Predictive models: generate accurate predictions of new observations, integrate knowledge from existing theoretical models in a less formal way

    Supervised vs Unsupervised Learning

    • Supervised Learning:
      • Build a model from labeled training data to make predictions about unseen or future data
      • Examples: classification, regression
    • Unsupervised Learning:
      • Discover structure in unlabeled dataset
      • Examples: clustering, topic modeling

    Classification Algorithms 1

    • Types of Classification:
      • Binary Classification: predict categorical class labels with two classes (e.g., true/false)
      • Multi-class Classification: predict categorical class labels with more than two classes (e.g., buy/sell/hold)
    • Perceptron Learning Algorithm:
      • Single-layer linear classifier
      • Operates as a single-layer neural network
      • Weights are updated using the error between the output of the linear activation function and the true class label (see the sketch at the end of this section)
    • Hyperparameters:
      • Set by the analyst, not optimized from the data (e.g., learning rate, number of epochs)
    • Feature Scaling:
      • Method used to transform the range of independent variables or features of data
      • Examples: standardization
    • Python Libraries:
      • Pandas: data manipulation and analysis
      • NumPy: working with arrays
      • Scikit-learn: machine learning library with various algorithms and utilities
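
A minimal sketch pulling these pieces together, assuming scikit-learn; the dataset and hyperparameter values (learning rate eta0, epochs max_iter) are illustrative choices, not values from the unit:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Feature scaling: standardize so the weight updates converge faster
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Hyperparameters (eta0 = learning rate, max_iter = epochs) are set by the
# analyst, not optimized from the data
ppn = Perceptron(eta0=0.1, max_iter=40, random_state=1)
ppn.fit(X_train_std, y_train)
print("Test accuracy:", ppn.score(X_test_std, y_test))
```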

    Classification Algorithms 2

    • Overfitting and Underfitting:
      • Overfitting: model captures "patterns" in the training data that do not repeat in new data
      • Underfitting: model cannot capture the underlying trend of the data
    • Bias-Variance Tradeoff:
      • Balancing model complexity and accuracy
      • Regularization: technique to prevent overfitting by adding a penalty on the larger magnitudes of model parameters
      • Examples: L1 and L2 regularization
    • Cost Function:
      • Formed by taking the negative of the log likelihood, so that maximizing the likelihood becomes minimizing the cost
      • Used to optimize model parameters
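
For logistic regression this cost can be written out; a standard form of the negative log likelihood, with sigmoid σ and net input z = wᵀx, is:

```latex
J(\mathbf{w}) = -\sum_{i=1}^{n} \left[\, y^{(i)} \log \sigma\big(z^{(i)}\big)
  + \big(1 - y^{(i)}\big) \log\big(1 - \sigma(z^{(i)})\big) \right]
```

An L2 penalty such as (λ/2)‖w‖² can be added to this cost to penalize the larger parameter magnitudes mentioned above.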

    Classification Algorithms 3

    • Logistic Regression:
      • Classification algorithm for binary classification tasks
      • Models the probability that a given input belongs to a particular class
    • Maximum Likelihood Estimation:
      • Estimating parameters of a probability distribution by maximizing a likelihood function
      • Used to optimize model parameters
    • Support Vector Machine (SVM):
      • Aims to maximize the margin between the decision boundary and the closest data points from each class
      • The margin depends only on the closest points (the support vectors), making SVMs less sensitive than many classifiers to observations far from the decision boundary
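
A minimal sketch of both classifiers in scikit-learn (toy data; the C values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Logistic regression: models P(class | input); C is inverse regularization
logreg = LogisticRegression(C=1.0).fit(X, y)
print("Predicted probabilities:", logreg.predict_proba(X[:2]))

# Linear SVM: maximizes the margin around the decision boundary
svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("Support vectors per class:", svm.n_support_)
```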

    Classification Algorithms 4

    • Decision Tree Learning:
      • Supervised learning algorithm that models decisions and their possible consequences as a tree-like structure
      • Decision trees can be prone to overfitting
    • Maximising Information Gain:
      • Measure of the difference between the impurity of the parent node and the sum of the child node impurities
      • Used to evaluate the quality of a split in the decision tree
    • Measures of Impurity:
      • Entropy: quantifies the amount of uncertainty or disorder in a system
      • Classification error: measures the proportion of misclassified instances in a dataset
      • Gini impurity: calculates the probability of incorrect classification by randomly assigning a label to a randomly chosen sample
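
The three measures are simple to compute; a sketch for a binary node, where p is the proportion of one class (information gain then compares the parent's impurity with the weighted child impurities):

```python
import numpy as np

def entropy(p):
    """Uncertainty/disorder at a node; maximal at p = 0.5."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def gini(p):
    """Probability of misclassifying a randomly labeled sample."""
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2 for two classes

def classification_error(p):
    """Proportion of misclassified instances under majority voting."""
    return 1 - max(p, 1 - p)

# Information gain = parent impurity - weighted sum of child impurities
for p in (0.25, 0.5, 0.75):
    print(p, entropy(p), gini(p), classification_error(p))
```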

    Random Forest

    • Ensemble method that combines multiple decision tree models
    • Uses bootstrap sampling and random feature selection to reduce correlation between trees
    • Output prediction is the class selected by most trees
    • Hyperparameters: number of trees, maximum depth, bootstrap, criterion
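
A sketch of these hyperparameters in scikit-learn's RandomForestClassifier (the values shown are arbitrary illustrations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest
    max_depth=5,        # limits how deep each tree can grow
    bootstrap=True,     # fit each tree on a bootstrap sample
    criterion="gini",   # impurity measure used to evaluate splits
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:3]))  # class selected by most trees (majority vote)
```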

    Data Preprocessing

    • Dealing with Missing Data:
      • Deleting rows with missing values
      • Imputing missing values
    • Categorical Data:
      • Nominal features: no ordering possible (e.g., color)
      • Ordinal features: categorical values that can be ordered or sorted (e.g., shirt size)
    • Encoding Categorical Variables:
      • Nominal categorical variables: one-hot encoding
      • Ordinal categorical variables: mapping into integers
    • Feature Scaling:
      • Normalization: scales features to a specified range (usually [0, 1])
      • Standardization: scales features to have a mean of 0 and a standard deviation of 1

    Transformation Formula

    • The transformation formula for standardization is z = (X − µ) / σ, where µ is the mean and σ is the standard deviation of the feature.
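
A sketch tying these preprocessing steps together with pandas and scikit-learn (the toy DataFrame and category ordering are invented):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "color": ["red", "blue", "green"],   # nominal: no natural order
    "size": ["S", "M", "L"],             # ordinal: can be ordered
    "price": [10.0, 12.5, 99.0],
})

# Ordinal categorical variable: map categories to integers
df["size"] = df["size"].map({"S": 1, "M": 2, "L": 3})

# Nominal categorical variable: one-hot encode
df = pd.get_dummies(df, columns=["color"])

# Standardization: z = (x - mean) / std for each numeric feature
df[["price", "size"]] = StandardScaler().fit_transform(df[["price", "size"]])
print(df)
```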

    Feature Selection

    • L1 regularisation:
      • Produces sparse models (only a subset of coefficients are non-zero)
      • Robust to outliers (penalises absolute value of coefficients)
      • Can lead to multiple solutions
    • L2 regularisation:
      • Does not produce sparse models (all coefficients are shrunk towards zero)
      • Not robust to outliers
      • Leads to one solution
    • Key differences between L1 and L2 regularisation:
      • Sparsity
      • Number of solutions
      • Robustness to outliers
      • Computational difficulty
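
A sketch of the sparsity difference, assuming scikit-learn's logistic regression (the solver and C value are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=4, random_state=0)

# L1 penalty: many coefficients driven exactly to zero (sparse model)
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
# L2 penalty: coefficients shrunk towards zero but rarely exactly zero
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

print("L1 nonzero coefficients:", (l1.coef_ != 0).sum())
print("L2 nonzero coefficients:", (l2.coef_ != 0).sum())
```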

    Sequential Feature Selection

    • Greedy algorithms:
      • Make locally optimal choices at each step
      • Do not guarantee global optimum
    • Sequential Backward Selection (SBS):
      1. Start with the full feature set
      2. For each remaining feature, evaluate the model's performance with that feature temporarily removed
      3. Determine the feature whose removal reduces performance the least
      4. Permanently remove that feature
      5. Repeat steps 2-4 until the desired number of features is reached or the performance of the model no longer improves
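
scikit-learn ships a greedy selector implementing this kind of procedure; a minimal backward-selection sketch (the estimator and feature counts are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Greedy backward selection: repeatedly drop the least useful feature
sbs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=4,
    direction="backward",
    cv=5,
)
sbs.fit(X, y)
print("Selected feature mask:", sbs.get_support())
```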

    Feature Importance with Random Forests

    • Feature importance can be measured as the averaged impurity decrease from all decision trees in the forest
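
A short sketch, assuming `forest` is a fitted RandomForestClassifier such as the one in the Random Forest sketch above (the feature names are invented):

```python
import numpy as np

# Assumes `forest` was fitted earlier; importances are averaged impurity
# decreases across all trees in the forest
names = np.array([f"feature_{i}" for i in range(len(forest.feature_importances_))])
order = np.argsort(forest.feature_importances_)[::-1]
for name, score in zip(names[order], forest.feature_importances_[order]):
    print(f"{name}: {score:.3f}")
```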

    Dimensionality Reduction

    • High dimensionality:
      • Requires large amounts of data
      • Computationally expensive
    • Methods of dimensionality reduction:
      1. Regularisation (L1 and L2 penalties)
      2. Sequential feature selection
      3. Feature extraction (PCA, LDA, etc.)
    • Principal Component Analysis (PCA):
      • Finds uncorrelated features that explain most of the variance in high-dimensional data
      • Used for dimensionality reduction
      • Can be used for linearly separable data
    • Linear Discriminant Analysis (LDA):
      • Supervised dimensionality reduction technique
      • Finds features that optimize class separability
    • Kernel Principal Component Analysis (KPCA):
      • Extension of PCA for nonlinear dimensionality reduction
      • Uses a kernel function to implicitly map the data into a higher-dimensional feature space where it becomes linearly separable, then applies PCA there
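
A sketch of PCA and KPCA in scikit-learn (the component counts and the RBF kernel's gamma are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA

X, y = make_moons(n_samples=200, random_state=0)

# Linear PCA: uncorrelated components ordered by explained variance
pca = PCA(n_components=2).fit(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Kernel PCA: nonlinear extension via the kernel trick
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)
```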

    Model Evaluation and Hyperparameter Tuning

    • Pipelines:
      • Combine multiple processing steps into a single estimator
      • Advantages: simplicity, reproducibility, code maintenance, and parameter tuning
    • Hyperparameter tuning:
      • Grid search:
        • Define a grid of hyperparameter values
        • Set up a model and grid search tool
        • Fit the grid search to the data
        • Evaluate results
      • Holdout cross-validation:
        • Divide dataset into training, validation, and test sets
        • Use validation set to tune hyperparameters
        • Evaluate final model on test set
    • K-fold cross-validation:
      • Divide dataset into k folds
      • Use k-1 folds for training and 1 fold for validation
      • Repeat for each fold
      • Average performance metrics
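
A sketch combining a pipeline, k-fold cross-validation, and a grid search in scikit-learn (the grid values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Pipeline: scaling and classification behave as a single estimator
pipe = make_pipeline(StandardScaler(), SVC())

# K-fold cross-validation: average performance over k train/validation splits
print("CV accuracy:", cross_val_score(pipe, X, y, cv=10).mean())

# Grid search: try every hyperparameter combination, scored with inner CV
grid = GridSearchCV(pipe, param_grid={"svc__C": [0.1, 1.0, 10.0],
                                      "svc__kernel": ["linear", "rbf"]}, cv=10)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```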

    Model Evaluation Metrics

    • Learning curves:
      • Plot performance metrics against sample size
      • Identify overfitting or underfitting
    • Validation curves:
      • Plot performance metrics against a hyperparameter value
      • Identify optimal hyperparameter value
    • Confusion matrix:
      • Evaluate classification model performance
      • Calculate precision, recall, F1 score, etc.
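
A sketch of the confusion matrix and the metrics derived from it (the labels are invented):

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
```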

    Ensemble Methods

    • Ensemble methods:
      • Combine multiple models to improve performance
      • Reduce overfitting and bias
    • Techniques:
      • Bagging (bootstrap aggregating)
      • Boosting (AdaBoost)
      • Voting (hard and soft)
    • Bagging:
      • Reduce overfitting
      • Create bootstrap samples
      • Combine multiple models
    • Adaptive Boosting (AdaBoost):
      • Reduce bias and variance
      • Focus on hard-to-classify instances
      • Update instance weights and learner weights
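
A sketch of the three techniques in scikit-learn (base learners and ensemble sizes are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Bagging: many trees on bootstrap samples, combined to reduce overfitting
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50).fit(X, y)

# AdaBoost: weak learners added sequentially, reweighting hard examples
ada = AdaBoostClassifier(n_estimators=50).fit(X, y)

# Voting: hard = most frequent class label, soft = averaged probabilities
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression()), ("dt", DecisionTreeClassifier())],
    voting="soft",
).fit(X, y)
print(bag.score(X, y), ada.score(X, y), vote.score(X, y))
```
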
    Clustering Techniques

    • K-Means: partitions n items into k clusters; each item belongs to the cluster with the nearest mean
    • K-Means++: an algorithm for choosing the initial values for the K-Means clustering algorithm
    • Hierarchical Trees: a method of cluster analysis that seeks to build a hierarchy of clusters
      • Agglomerative (Bottom-Up) Method: starts by assuming each example is a single cluster, merges closest pairs of clusters iteratively until only one cluster remains
      • Divisive (Top-Down) Method: starts with one cluster, splits the cluster into smaller clusters iteratively until each cluster contains only one example
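
A sketch of both clustering families in scikit-learn; KMeans defaults to the k-means++ initialization, and the agglomerative model here uses complete linkage as described in the next section (cluster counts are illustrative):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# K-Means with k-means++ initialization of the centroids
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
print(km.fit_predict(X)[:10])

# Agglomerative (bottom-up) hierarchical clustering with complete linkage
agg = AgglomerativeClustering(n_clusters=3, linkage="complete")
print(agg.fit_predict(X)[:10])
```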

    Measuring Distance

    • Single Linkage Approach: computes distances between the most similar members for each pair of clusters and merges the two clusters for which the distance between the most similar members is the smallest
    • Complete Linkage Approach: computes the distance between the most dissimilar members for each pair of clusters and merges the two clusters for which the distance between the most dissimilar members is the smallest

    DBSCAN

    • identifies clusters in datasets based on the density of data points
    • classifies clusters based on the idea that a cluster in a dataset is a high-density area surrounded by a low-density area
    • key concepts:
      • Core Points: a point is considered a core point if it has a minimum number of points (MinPts) within a given radius (ϵ)
      • Border Points: a point that is not a core point but falls within the radius of a core point
      • Noise Points: any point that is not a core point or a border point is considered noise or an outlier, not belonging to any cluster
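
A sketch of DBSCAN in scikit-learn, where eps plays the role of the radius ϵ and min_samples the role of MinPts (the values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db = DBSCAN(eps=0.25, min_samples=5).fit(X)
labels = db.labels_  # cluster index per point; -1 marks noise points
print("Clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("Noise points:", np.sum(labels == -1))
```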

    Regression Analysis

    • Simple Regression: models a linear relationship between a target/dependent and one independent variable/feature
    • Multiple Regression: models a linear relationship between a target/dependent and more than one independent variable/feature
    • Ordinary Least Squares (OLS): a method for estimating the parameters of the linear regression line that minimises the sum of squared vertical distances from the estimated line to the training examples

    Evaluating Regression Performance

    • Mean Squared Error (MSE): the average of the squares of the errors—that is, the average squared difference between the predicted values and the actual value
    • R-Squared (R²): a statistical measure of how close the data are to the fitted regression line
    • Residual Plot: a graph that shows the residuals on the vertical axis and the predicted values, or an independent variable, on the horizontal axis
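
A sketch fitting OLS and computing both metrics on invented data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# OLS: minimizes the sum of squared vertical distances to the fitted line
ols = LinearRegression().fit(X, y)
y_pred = ols.predict(X)

print("MSE:", mean_squared_error(y, y_pred))  # squares errors, so outliers dominate
print("R^2:", r2_score(y, y_pred))            # share of total variation explained
```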

    Regularisation Methods

    • L2 Regularisation - Ridge Regression: adds a penalty equal to the square of the magnitude of coefficients to the loss function, reducing the size of coefficients but keeping all variables in the model
    • L1 Regularisation - Lasso Regression: introduces a penalty that is the absolute value of the magnitude of coefficients, which can shrink some coefficients to zero, effectively performing feature selection
    • L1 + L2 Regularisation - Elastic Net: combines penalties from both Ridge and Lasso, integrating the benefits of both
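
A sketch of the three penalized regressions in scikit-learn (the alpha values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1: can zero out coefficients entirely
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # L1 + L2 combined

print("Lasso coefficients set to zero:", (lasso.coef_ == 0).sum())
```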

    Non-Linear Regression Models

    • Polynomial Regression: models complex relationships that do not follow simple linear patterns
    • Decision Tree Regression: employed to predict a continuous outcome by learning decision rules inferred from the data features
    • Random Forest Regression: an ensemble learning algorithm that utilises multiple decision trees to make predictions by averaging the outputs of the individual trees
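
A sketch of the three approaches in scikit-learn (degree, depth, and tree count are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=5.0, random_state=0)

# Polynomial regression: linear model fit on polynomial-expanded features
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

# Decision tree regression: piecewise-constant predictions, no scaling needed
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# Random forest regression: averages the outputs of the individual trees
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print(poly.score(X, y), tree.score(X, y), forest.score(X, y))
```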
