Untitled Quiz
49 Questions

Questions and Answers

What is the significance of having two hidden layers in an artificial neural network (ANN)?

  • They can represent any decision boundary with high accuracy. (correct)
  • They increase the overall computational speed.
  • They improve the interpretability of the model.
  • They simplify the training process.

How is the optimal size of the hidden layer(s) in a multi-layer ANN typically determined?

  • By following pre-set standard sizes for specific tasks.
  • Based on theoretical analysis of network performance.
  • Through extensive simulations on training data.
  • By using a trial-and-error heuristic approach. (correct)

What happens during the training of a multi-layer ANN when an error is detected in the output?

  • New input patterns are generated.
  • The hidden layers are removed.
  • The entire network resets to its initial state.
  • Weights are adjusted to reduce the error. (correct)

Why might more layers be added to an ANN structure?

    To handle more complex data and numerous predictors.

    Which of the following is true about the learning process in a multi-layer ANN?

    It involves presenting input patterns and adjusting weights based on errors.

    What is the purpose of the backward pass in an artificial neural network (ANN)?

    To calculate and propagate the error backwards

    Which factor can contribute to the overfitting of an ANN?

    An increasing number of hidden neurons

    What does the back propagation algorithm primarily aim to achieve?

    Minimize the total error of the ANN

    What is indicated by a higher $R^2$ value in relation to an ANN?

    A better fit to the data

    What does momentum help achieve when training an ANN?

    Avoiding getting stuck in local minima

    Which of the following is NOT a parameter that can affect the performance of an ANN?

    Output data type

    What does the local gradient of a neuron during back propagation represent?

    The change in the neuron's output relative to changes in input

    What is one potential consequence of a high learning rate in an ANN?

    Divergence from optimal weights

    What is the primary purpose of the least squares method in linear regression?

    To minimize the sum of the squared differences between actual and predicted values

    Which of the following indicates that a linear regression model may not adequately fit the data?

    A non-linear pattern in residuals

    When analyzing residuals to check for homoscedasticity, what does constant variance imply?

    The model is correctly specified

    In a confusion matrix, what does a True Positive (TP) represent?

    The model predicted YES, and the actual answer was YES

    What does a non-independent residual analysis indicate?

    Residuals are correlated and may indicate model misspecification

    The linear function is represented mathematically as which of the following?

    Y = β0 + β1X + ε

    Which statement is true regarding the slope in a linear regression model?

    It represents the rate of change of Y with respect to X

    How does increasing the number of data points affect the least squares regression line?

    It could potentially stabilize the coefficients and reduce variability

    What does it mean if the residuals have a fan-shaped pattern when plotted?

    The variance of the residuals is not constant (heteroscedasticity)

    What is represented by the term 'error' in a linear regression model?

    The difference between predicted and actual values

    Which logical operation results in 1 only when the inputs are different?

    XOR

    In a single-layer perceptron, which type of problems can it not solve?

    Non-linearly separable problems

    What is true about multi-layer artificial neural networks (ANNs)?

    They can learn both linearly and non-linearly separable problems.

    Which of the following accurately describes the layers in a multi-layer ANN?

    At least one layer must be hidden.

    What does the AND operator output when both inputs are 0?

    0

    What classification task is suited for a multi-layer ANN but not for a single-layer perceptron?

    Multi-class classification

    How do multi-layer ANNs propagate input signals?

    Layer-by-layer in a forward direction

    Which result does the OR logical operation yield for inputs 0 and 0?

    0

    How does changing the mean (μ) of a normal distribution affect its graph?

    It shifts the distribution left or right.

    What does the standard deviation (σ) determine in a normal distribution?

    The width or spread of the distribution.

    In a normal distribution defined by its mean and standard deviation, what does E(X) represent?

    The expected value of the random variable.

    What mathematical function describes a normal distribution?

    A probability density function (pdf).

    How is the variance (Var(X)) of a normal distribution calculated?

    By squaring the standard deviation.

    According to the central limit theorem, how does the mean of a sample (𝑥̅) vary around the population mean (μ)?

    It varies around μ with a standard deviation of σ/√n.

    In a normal distribution, if the mean (μ) is increased while the standard deviation (σ) remains unchanged, what happens to the distribution?

    The distribution shifts to the right.

    Which of the following statements is true about the total area under a normal distribution curve?

    It always equals 1.

    What happens to the sampling distribution of 𝑥̅ as sample size n increases?

    It becomes a Gaussian distribution.

    What does a p-value greater than 0.1 indicate?

    No presumption against the null hypothesis.

    What does a positive covariance between two variables indicate?

    The variables are positively correlated.

    Which of the following p-value ranges indicates a low presumption against the null hypothesis?

    0.05 < 𝑝 ≤ 0.1

    In the context of A/B testing, what does Fisher's exact test evaluate?

    Non-random associations between two categorical variables.

    In terms of linear correlation, what does it mean when the covariance is equal to zero?

    The two variables have no linear relationship (they are uncorrelated, though not necessarily independent).

    Which statement describes the 68-95-99.7 Rule?

    68% of data falls within one standard deviation from the mean.

    Which scenario best demonstrates anomaly detection in machine learning?

    Finding potential scams in an online retail shop.

    Which of the following represents a weak linear relationship in terms of correlation?

    cov(X,Y) = 0.01

    What is the interpretation of a p-value that falls within the range of 0.01 to 0.05?

    Strong presumption against the null hypothesis.

    Study Notes

    Machine Learning for Business Analytics

    • The presentation is about machine learning for business analytics, focusing on different aspects of data analysis.
    • The instructors are Dr. Marc Hilbert and Dr. Andrii Kleshchonok.
    • The presentation language is English.
    • The date of the presentation is October 20, 2024.

    Part 1: Introduction to Business Analytics

    • This section introduces the core concepts of business analytics.
    • Define the task (e.g., prediction, clustering, classification, anomaly detection).
    • Define objectives, error metrics, and performance standards.
    • Data collection: set up data streams, storage, input, parallelisation, and Hadoop.
    • Preprocessing: noise and outlier filtering, completing missing data using histograms and interpolation, normalization to scale data.
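
As a small illustration of the normalization step, the sketch below rescales one feature to zero mean and unit variance (z-score scaling). The notes do not prescribe a specific scaling method; the function name `standardize` and the sample values are made up for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]

# Hypothetical order values from an online shop
orders = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = standardize(orders)
```

After scaling, features measured in different units become comparable, which matters for distance-based methods and for training neural networks.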

    Dimensionality Reduction/Feature Selection

    • Choose features to use and extract data from.
    • Explore methods such as PCA, LDA, LLE, GDA.
    • Consider goals, questions related to tractability.
    • Design experiments, including train/validate/test data sets and cross-validation.
    • Perform deployment.

    Classification vs. Clustering

    • Classification: uses labeled data, requires training phases, and is domain sensitive. Easy to measure performance. Includes methods like Naive Bayes, KNN, SVM, Decision Trees, Random Forests.
    • Clustering: uses unlabeled data, organizes patterns based on similarity, difficult to evaluate, and includes methods like K-means, Fuzzy C-means, Hierarchical Clustering, DBScan.

    Examples of ML Problems

    • Predict how much customers spend in online retail.
    • Explore different types of online retail customers.
    • Find categories for items in an online store.
    • Suggest items users might want to buy online.

    Part 2: Elements of Statistics

    • Discusses fundamental statistical concepts.
    • Random variable descriptions, discrete and continuous.
    • Probability function mapping.
    • Probability function area always equals 1.

    Description of Random Variables

    • A random variable takes on a range of values with specific probabilities.
    • The probability is how often we expect different outcomes in repeated experiments.

    Discrete vs. Continuous Random Variables

    • Discrete: countable number of outcomes. Examples: dead/alive, treatment/placebo, dice rolls.
    • Continuous: infinite continuum of values. Examples: blood pressure, weight, speed of a car, real numbers from 1 to 6.

    Probability Function

    • A probability function maps possible values of a variable against the probability of their occurrence. This value is between 0 and 1.
    • The area under the probability function is equal to 1.0.

    Continuous Case

    • For continuous variables, the probability function is a continuous mathematical function that integrates to 1.
    • Example: the negative exponential function (exponential distribution) integrates to 1.

    Continuous Case (cont.)

    • The probability function for continuous random variables is called the probability density function (PDF).
    • Probabilities of continuous variables are associated with ranges, not single values.

    All Probability Distributions

    • All probability distributions are characterized by an expected value (mean) and a variance (standard deviation squared).

    Mean or Expectation Value

    • Discrete case mean (expected value): E(X) = Σ xᵢ p(xᵢ), summed over all xᵢ.
    • Continuous case mean (expected value): E(X) = ∫ x p(x) dx, integrated over all x.

    Variance

    • σ² = Var(X) = E[(X - μ)²]
    • Variance is the expected squared distance from the mean.

    Variance (cont.)

    • Discrete case: Var(X) = Σ (xᵢ - μ)² p(xᵢ), summed over all xᵢ.
    • Continuous case: Var(X) = ∫ (x - μ)² p(x) dx, integrated over all x.
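
The discrete formulas for the mean and the variance can be checked on a simple case. The sketch below uses a fair six-sided die, an example chosen here for illustration, and exact fractions so that no rounding is involved.

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6
outcomes = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# E(X) = sum of x_i * p(x_i)
expected = sum(x * p for x, p in zip(outcomes, probs))

# Var(X) = sum of (x_i - mu)^2 * p(x_i)
variance = sum((x - expected) ** 2 * p for x, p in zip(outcomes, probs))

print(expected)   # 7/2
print(variance)   # 35/12
```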

    Normal Distribution

    • A bell-shaped curve.
    • Defined by the mean (μ) and standard deviation (σ).
    • Changing μ shifts the distribution left or right. Changing σ increases or decreases the distribution spread.

    The Normal Distribution: Mathematical Function

    • f(x) = 1/(σ√(2π)) · e^(-(x-μ)²/(2σ²)).

    The Normal PDF

    • The area under the PDF curve always integrates to 1.
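
A quick numerical check of this property, as a sketch: the code evaluates the normal density and integrates it with the trapezoidal rule over ±10 standard deviations. The helper names `normal_pdf` and `integrate`, the chosen μ and σ, and the integration range are assumptions made for the example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 2.0, 1.5
area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
```

The result is 1 up to numerical error, whatever values of μ and σ are chosen.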

    Normal Distribution Definition

    • Mean = E(X) = μ
    • Standard deviation (Std Dev) = √Var(X) = σ

    Central Limit Theorem

    • The mean of many random samples will be normally distributed around the true mean of the population, as the sample size increases.
    • Standard deviation of the sampling distribution decreases as the sample size increases.
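
The theorem can be illustrated with a small simulation. The sketch below draws repeated samples from a uniform distribution (a choice made for this example, since it is clearly not bell-shaped) and compares the spread of the sample means with σ/√n, the standard result for the standard deviation of a sample mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, trials=2000):
    """Means of `trials` samples of size n drawn from Uniform(0, 1)."""
    return [
        statistics.fmean([random.random() for _ in range(n)])
        for _ in range(trials)
    ]

# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12)
sigma = (1 / 12) ** 0.5

for n in (5, 50):
    means = sample_means(n)
    observed = statistics.stdev(means)   # spread of the sample means
    predicted = sigma / n ** 0.5         # sigma / sqrt(n)
    print(n, round(observed, 3), round(predicted, 3))
```

The observed spread should sit close to the predicted one and shrink as n grows.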

    68-95-99.7 Rule

    • 68% of the data falls within one standard deviation of the mean.
    • 95% within two standard deviations.
    • 99.7% within three standard deviations.
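
These percentages follow from the normal distribution itself. As a sketch, the probability of falling within k standard deviations of the mean can be computed with the error function from Python's standard library; the helper name `prob_within` is made up for the example.

```python
import math

def prob_within(k):
    """P(|X - mu| <= k * sigma) for a normal random variable."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```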

    Confidence Interval

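    • μ = x̄ ± t · (s/√n), where x̄ is the sample mean, s the sample standard deviation, and n the sample size.
    • Uses the Student's t value, which depends on the sample size and the confidence level.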
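
A minimal sketch of the interval calculation follows. The measurements are hypothetical, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a standard t table rather than computed, to keep the example free of third-party libraries.

```python
import math
import statistics

def confidence_interval(sample, t_critical):
    """Return (lower, upper) for the mean: x_bar ± t * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1)
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements; n = 10, so 9 degrees of freedom
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
T_95_DF9 = 2.262  # two-sided 95% critical value of Student's t, df = 9

low, high = confidence_interval(data, T_95_DF9)
```

A larger sample narrows the interval in two ways: √n grows, and the t value for the same confidence level gets smaller.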

    Testing Hypothesis

    • Comparison between population distribution vs. sampling distribution.
    • Test on the sample mean to either reject or fail to reject the null hypothesis.

    Deterministic vs. Statistical Testing

    • Deterministic: observe the event and decide (reject/don't reject null).
    • Statistical: observe the event and decide, with a chance of error (reject/don't reject with chance p%).

    Types of Errors in Hypothesis Testing

    • Type I error (α): Rejecting the null hypothesis when it's true (false positive).
    • Type II error (β): Failing to reject the null hypothesis when it's false (false negative).

    p-value

    • Probability of observing a result at least as extreme as the one observed, by pure chance, assuming the null hypothesis is true.
    • Informal significance levels help us interpret the results.

    Anomaly Detection

    • Techniques used to isolate and identify data points or values that are considered unusual or don't align with the rest of the data.

    Examples of ML problems (cont.)

    • Identify potential scams in online retail outlets.

    A/B Testing

    • A method for testing two different design options by comparing their success rate.
    • Used to quantitatively assess whether there is a statistically significant difference between the two.
    • Assess the effect of underlining links on click-through rate.
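
One way to test such a difference is Fisher's exact test on the 2x2 table of variant versus outcome. The sketch below is a plain-Python version of the two-sided test that sums the probabilities of all tables no more likely than the observed one; the function name and the click counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(k):
        # Hypergeometric probability of k first-column counts in the first row
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = table_prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(
        p for p in (table_prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-9)
    )

# Hypothetical A/B test: rows are variants, columns are (clicked, did not click)
p_value = fisher_exact_two_sided(30, 70, 18, 82)
```

A small p-value suggests that the click-through rates of the two variants differ by more than chance alone would explain.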

    Correlation

    • Describes a linear relationship between two variables.
    • Positive correlation (increasing X → increasing Y).
    • Negative correlation (increasing X → decreasing Y).
    • No correlation.

    Correlation (cont.)

    • cov(X,Y) = Σ((xᵢ - X̄)(yᵢ - Ȳ))/(n - 1).
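
The formula translates directly into code. The sketch below also computes the Pearson correlation, which is the covariance divided by the product of the two standard deviations; it is added here because covariance depends on the units of X and Y, while correlation always lies between -1 and 1. The data are made up.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum((x - x_bar) * (y - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to the range [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # exactly y = 2x

print(covariance(xs, ys))   # 5.0
print(correlation(xs, ys))  # 1.0
```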

    Linear Correlation

    • Linear relationships are visualized and evaluated on scatterplots.
    • Assess the strength of the relationship between variables.
    • Visual assessment of the relationship: Strong, weak, no relationship.

    Linear Regression Model

    • Assumes a linear relationship between variables.
    • Defines the relationship using an equation (Y = β₀ + β₁X + ε).
    • The dependent variable is Y, and the independent variable is X.
    • Random error (ε) accounts for the fact that the linear relation is only an approximation.

    Estimating Parameters: Least Squares Method

    • The best fit is when the differences between prediction values and the actual values are minimal.
    • Least squares minimizes the sum of the squared differences.
    • The method is used for parameter estimation in linear regression, and can also be applied to other models.
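
For a single predictor, the least squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept follows from the means. The sketch below applies it to made-up data; `fit_line` is a name chosen for the example.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing sum((y - (b0 + b1 * x))^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)   # the quantity least squares minimizes
```

Any other choice of intercept or slope gives a larger `sse` on the same data, and the residuals of the fitted line sum to zero.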

    Least Squares Graphically

    • Visual representation of the error minimisation using a line graph and the points.

    Residual Analysis for Linearity, Homoscedasticity, and Independence

    • Residual analysis assesses the validity of the assumptions of the linear model.
    • Linearity: The errors should be randomly distributed around the line.
    • Homoscedasticity: The variance of the errors should be constant across the independent variable.
    • Independence: The residual values at one point should not be correlated with the residual values at a different point.

    Estimating Parameters: Classification

    • This section covers estimation techniques specific to classification problems.

    Comparing LP and Logit Models

    • Comparison of the linear probability (LP) model and the logistic (logit) model, focusing on the difference in the shape of their fitted curves.

    Confusion Matrix/Crosstabs

    • Calculates the performance of a classification model.
    • True positives (TP), True negatives (TN), False Positives (FP), False Negatives (FN).

    Confusion Matrix

    • A table that records the counts of the classifications.
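
The four counts can be tallied directly from paired labels. This is a sketch for the binary case, with 1 standing for YES and 0 for NO; the labels are made up, and accuracy is added as one common summary of the table.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = YES, 0 = NO)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1      # predicted YES, actually YES
        elif p == 0 and a == 0:
            counts["TN"] += 1      # predicted NO, actually NO
        elif p == 1 and a == 0:
            counts["FP"] += 1      # predicted YES, actually NO
        else:
            counts["FN"] += 1      # predicted NO, actually YES
    return counts

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts)    # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(accuracy)  # 0.75
```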

    Underfitting and Overfitting

    • Underfitting: The model is too simple to capture the true relationship (e.g., a flat line through data with a curve).
    • Overfitting: The model is too complex, fitting the training set too closely and losing generalisability (e.g., following the noise in data, which does not reflect the underlying pattern)

    Overfitting

    • The model fits the training data very well.
    • The model does not generalise well for the test data.
    • There is a gap between training and test error.

    Overfitting (cont.)

    • Overfitting is a problem in machine learning.
    • It occurs when a model is too complex.
    • Good fit on training data but poor on unseen (test) data.

    Overfitting of ANNs

    • Parameters (e.g., number of neurons, initial weights).
    • Activation functions (sigmoid, etc.).
    • Learning rate, momentum (increase in flexibility due to increase in neurons).

    Training and test data set

    • Training data is used to train or learn a model.
    • Test data is used to evaluate the performance.

    Goodness-of-fit of ANN

    • Measure of how well the ANN models the data.
    • Similar to R-squared for linear regression, but applied to ANNs.
    • Close to 1.0 = better fit.
    • Plotting the training and test MSE against the number of epochs helps choose the optimal model.

    Advantages of ANNs

    • Efficient for massively parallel processing.
    • Robust, tolerant to missing or noisy data.
    • User-friendly programming.

    Disadvantages of ANNs

    • Difficult to design models for arbitrary applications.
    • Difficult to assess internal operation of the ANN.
    • Not easy to know which variable is influential (black box).

    Part 5: Python Implementation

    • This section outlines components for designing and training dense artificial neural networks using the Keras Python library.
    • The components are: data (housing dataset); problem statement (regression or classification); preprocessing (standard scaling, one-hot encoding); architecture (number of layers/neurons, activation functions, dropout layers); training parameters (Adam optimizer, batch size, epochs); evaluation metrics and learning curves; and analysis of errors and residuals.

    Dropout Layers

    • Randomly remove nodes within the NN during a forward pass to train an ensemble of subnetworks.
    • Effectively improves generalisation ability.
    • Leads to improved uncertainty estimation of predictions.
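
The mechanism can be sketched in a few lines of plain Python. This version uses "inverted" dropout, a common variant that rescales the surviving activations so their expected sum stays the same. It is an illustration under that assumption, not the Keras implementation, and the activation values are made up.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1 / (1 - rate) to keep the expected sum unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
hidden = [0.5, 1.2, 0.3, 0.9, 0.7, 1.1]

# Each forward pass during training sees a different random subnetwork
pass_1 = dropout(hidden, rate=0.5, rng=rng)
pass_2 = dropout(hidden, rate=0.5, rng=rng)
```

At prediction time dropout is normally switched off, which corresponds to `rate=0.0` here.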

    Classification Implementation in Keras

    • Steps to use Keras for classification models include defining a sequential model, defining the layers (input, hidden, output), specifying activation functions, compiling the model, and training it.

    High-level Language Model Overview

    • Gives an overview of large language models and shows how their parameter counts have grown by year.

    Intuition behind LLM training

    • Autoregressive models predict future tokens given the past history. Autoencoding models predict tokens given the rest of the context.

    LLM Capabilities

    • LLMs cover various tasks such as text classification, entity recognition, summarization, paraphrase, translation, and data generation.

    GPT (Generative Pre-trained Transformer)

    • Generative Large Language Model (LLM). Zero-shot and few-shot learning on diverse tasks. Includes a Chat Functionality and Human Feedback Loop.

    Part 6: Exam Information

    • This section contains exam-related details (e.g., dates, topics).
