Introduction to ML for Business Applications

Questions and Answers

What is Machine Learning?

Algorithmic decisions or predictions that are based on data

What is the main difference between Machine Learning and Artificial Intelligence?

  • Machine Learning is a broader concept that encompasses Artificial Intelligence, as it deals with the creation of machines that can mimic human cognitive functions.
  • Machine Learning is a subset of Artificial Intelligence that focuses on building algorithms that can learn from data to perform specific tasks. (correct)
  • Machine Learning focuses on analyzing data to make predictions, while Artificial Intelligence is concerned with creating machines that can reason and think independently.
  • Machine Learning and Artificial Intelligence are essentially the same thing, with no real distinction between them.

    Deep Learning is a type of machine learning that relies on artificial neural networks to perform complex tasks.

    True

    What are some of the major applications of Machine Learning in the business context?

    Fraud detection, recommendations, chatbots, image generation, customer segmentation, and demand/load prediction

    Which of the following is NOT a basic machine learning problem?

    Optimization

    Which type of machine learning problem deals with predicting a continuous output based on historical data?

    Regression

    Which type of machine learning problem deals with identifying groups or patterns in unlabeled data?

    Clustering

    What are the three types of features used in Machine Learning?

    Categorical, ordinal, and numerical

    Categorical features have a natural ordering, allowing for comparisons between different categories.

    False

    Ordinal features are typically encoded as numbers to represent their ordering, and comparisons between these numbers (<, >, =) are meaningful

    True

    Normalizing numerical features helps ensure that all features have similar scales, preventing one feature from dominating during training and improving the model's performance.

    True

    Which type of machine learning problem is particularly relevant when dealing with predicting whether a patient has a particular disease based on their characteristics?

    Classification

    Which type of machine learning algorithm is often used in recommendation systems to suggest items that similar users have liked?

    K-Nearest Neighbors

    Which type of machine learning problem is most relevant in identifying communities or groups of users with similar interests or connections within a social network?

    Clustering

    Which type of machine learning algorithm is particularly effective in reducing the dimensionality of high-dimensional data for pattern recognition?

    Self-Organizing Maps (SOM)

    Overfitting occurs when a model is too complex and learns the training data too well, resulting in poor performance on new data.

    True

    What are some ways to mitigate overfitting in machine learning models?

    Model simplification, early stopping, regularization, pruning, and ensemble methods

    What is the purpose of a validation set in machine learning?

    To tune model parameters and prevent overfitting by evaluating the model's performance on unseen data.

    What is the purpose of a test set in machine learning?

    To provide an unbiased evaluation of the model's ability to generalize to new data after training is complete.

    What is the main idea behind k-fold cross-validation, and why is it important in Machine Learning?

    K-fold cross-validation evaluates a machine learning model by splitting the dataset into k folds. The model is trained and tested k times, each time holding out a different fold as the test set and training on the remaining k − 1 folds. Averaging the k results gives a more robust estimate of the model's performance than a single split and makes overfitting easier to detect.

    When dealing with unbalanced classes, accuracy is a robust metric for evaluating the performance of a classifier.

    False

    Which of the following metrics is best suited for evaluating the performance of a model in cases where the costs of false positives and false negatives are unequal or when the classes are unbalanced?

    F1-Score

    Outliers are data points that have unusual values compared to other data points and can significantly affect a model's performance.

    True

    What is 'Survivor Bias'?

    A type of cognitive bias that occurs when we focus on observations that survive a process, overlooking those that did not survive.

    Feature Engineering involves transforming raw data into a set of meaningful and informative features that can be used as input for machine learning models.

    True

    What is the main goal of Feature Engineering?

    To improve the predictive performance of machine learning models by creating features that are meaningful and informative for the task.

    What is the main idea behind machine learning?

    Algorithmic decisions or predictions that are based on data.

    Define artificial intelligence (AI) and its relationship with machine learning (ML).

    Artificial intelligence (AI) is an umbrella term for computer software that mimics human cognition to perform complex tasks and learns from them. Machine learning (ML) is a subfield of AI that uses algorithms trained on data to build adaptable models for specific tasks.

    What is the term used for the set of data used to train the model in machine learning?

    Training set

    Which of the following is NOT a core ML problem?

    Deep Learning

    What type of ML is used to predict continuous-valued outputs based on historical data?

    Regression

    Which of the following is NOT a common application of ML in a business context?

    Sentiment Analysis

    What is the difference between descriptive analytics and predictive analytics?

    Descriptive analytics focuses on analyzing historical data to uncover trends and patterns, while predictive analytics utilizes statistical models to forecast future outcomes based on past data.

    Define the term 'feature' and 'target variable' in ML.

    A feature represents an independent variable or characteristic of a data point, while the target variable is the dependent variable we aim to predict.

    Which type of feature represents instances that fall into one category of a set of categories?

    Categorical

    Which type of feature has a natural ordering among its categories?

    Ordinal

    Why is normalization often applied to numerical features in ML?

    Normalization rescales numerical features to a common scale, typically zero mean and unit variance or the range [0, 1], so that no single feature dominates during training.

    What is 'overfitting' in ML?

    Overfitting occurs when a model becomes too closely tailored to the training data and performs poorly on unseen data.

    Which of the following is NOT a solution to mitigate overfitting?

    Feature Engineering

    What is the main role of the 'validation set' in ML?

    It is used during training to tune model parameters and helps prevent overfitting.

    What is the main difference between accuracy and precision in ML?

    Accuracy measures the overall percentage of correct predictions, while precision focuses on the proportion of correct positive predictions out of all positive predictions.

    Which metric is used to measure the proportion of actual positive instances correctly identified as positive?

    Recall

    What are 'outliers' in ML data?

    Outliers are data points that deviate significantly from the typical patterns observed in a dataset.

    Which of the following is NOT a common method to handle outliers in ML data?

    Ignore outliers

    Define 'Survivor Bias' in ML.

    Survivor bias occurs when we focus on observations that survive a process while overlooking those that did not survive, as they are no longer visible. This can lead to misleading conclusions.

    How are 'features' used in machine learning?

    Features are used to represent data points in a way that is meaningful for the ML algorithm to learn patterns. They can be categorical, ordinal, or numerical.

    Explain the main difference in the independence assumption between Naive Bayes and Bayesian networks.

    Naive Bayes assumes that features are conditionally independent given the class, while Bayesian networks explicitly model conditional dependencies among features.

    The chain rule of probability allows us to express a joint distribution as a product of local distributions in Bayesian networks.

    True
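
As a worked illustration of this statement, the factorizations can be written out as follows. The symbols x_1, …, x_K (features), y (class), and pa(x_k) (the parents of x_k in the network graph) are notation assumed here for the sketch, and the variables are taken to be ordered so that parents come before their children.

```latex
% Chain rule of probability (always valid, for any ordering of the variables):
P(x_1, \dots, x_K) = \prod_{k=1}^{K} P(x_k \mid x_1, \dots, x_{k-1})

% Bayesian network: each variable depends only on its parents in the graph,
% so every factor shrinks to a local conditional distribution:
P(x_1, \dots, x_K) = \prod_{k=1}^{K} P\bigl(x_k \mid \mathrm{pa}(x_k)\bigr)

% Naive Bayes as a special case: the class y is the only parent of every feature:
P(y, x_1, \dots, x_K) = P(y) \prod_{k=1}^{K} P(x_k \mid y)
```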

    What are the key components of Bayesian networks?

    They consist of nodes representing random variables or features, edges representing conditional dependencies between variables, and conditional probability tables quantifying the relationships between connected nodes.

    What are the key benefits of Bayesian networks over Naive Bayes models?

    They are less restrictive and more flexible, capable of handling complex relationships and dependencies between features, and they can incorporate prior knowledge about variable dependencies.

    What is the primary challenge associated with learning Bayesian networks?

    The process of learning Bayesian networks is computationally complex, which makes it a challenging task and a subject of ongoing research.

    Which of the following is NOT a key takeaway from the summary of the lecture on Naïve Bayes and Bayesian networks?

    The concept of a feature vector in machine learning was introduced.

    What is the main difference between Naive Bayes and Bayesian networks in terms of handling dependencies?

    Naive Bayes assumes that all features are conditionally independent given the class, while Bayesian networks allow for complex dependencies between features.

    Which of the following ML families is best suited for dealing with complex relationships and dependencies between features?

    Bayesian Networks

    Explain the role of data preparation in machine learning.

    Data preparation is crucial for ensuring that the data is processed and transformed into a format that is usable and suitable for the ML algorithms to learn patterns and make predictions.

    Study Notes

    Course Information

    • Course title: Machine Learning for Business Applications
    • Lecture: Introduction to ML for BA – Lecture A.0
    • Instructor: Prof. Dr. Maximilian Schiffer
    • Department: Professorship of Business Analytics & Intelligent Systems, TUM School of Management
    • Institute: Munich Data Science Institute
    • Semester: Winter 2024/25

    Agenda

    • Motivation
    • Basics of Machine Learning
    • Essentials & Training Strategies

    What is Machine Learning?

    • Machine Learning = algorithmic decisions or predictions based on data
    • Training phase: based on historic data
    • Application/Inference phase: based on new data

    Artificial Intelligence & Machine Learning

    • Artificial Intelligence (AI): umbrella term for computer software mimicking human cognition
    • Machine Learning (ML): a subfield of AI using algorithms trained on data to create adaptable models performing specific tasks

    Introduction to ML - History

    • 1940-1950: Early Days (Boolean circuit model of brain, Turing's "Computing Machinery and Intelligence")
    • 1950-1970: Excitement (Early AI programs, Dartmouth meeting, algorithms for logical reasoning)
    • 1970-1990: Knowledge-based approaches (AI winter, expert systems)
    • 2000-2020: High Performance Computing (Big Data & Deep Learning)

    Introduction to ML - Overview

    • Supervised Learning: Classification, Regression
    • Unsupervised Learning: Clustering
    • Reinforcement Learning

    Machine Learning in the Business Context

    • Fraud Detection
    • Recommendations
    • Chatbots
    • Image Generation
    • Customer Segmentation
    • Image Recognition
    • Demand/Load Prediction
    • Predictive Maintenance
    • Predictive Supply Chain Management
    • Personalized Marketing Campaigns

    Scope of the Course

    • Introduction to Machine Learning for Business Applications
    • Naive Bayes & Bayesian Networks
    • Decision Trees
    • Clustering
    • Regression
    • Neural Networks
    • Data Preparation, Generalization & Evaluation
    • Recap & Exam Preparation

    From Data to Information

    • Data Consolidation
    • Selection and Preprocessing
    • Prediction
    • Interpretation & Evaluation

    Focus of this Course

    • Descriptive Analytics (Analysis of historical data)
    • Predictive Analytics (Use statistical models to forecast)
    • Prescriptive Analytics (Recommend actions to optimize)

    Datasets: Features and Target Variables

    • Dataset D = {(xᵢ, yᵢ)}, i = 1, …, N
    • xᵢ: K-dimensional feature vector (independent variable)
    • yᵢ: respective target variable (dependent variable)

    Feature Types

    • Categorical Features
    • Ordinal Features
    • Numerical Features

    Excursion: Normalization

    • Rescaling numerical features to similar scales so that no single feature dominates during training
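
A minimal sketch of two common rescaling schemes, min-max scaling to [0, 1] and z-score standardization (zero mean, unit variance). The function names and the loan amounts are illustrative assumptions, not taken from the lecture.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

loan_amounts = [1000, 5000, 9000]  # hypothetical values
scaled = min_max_scale(loan_amounts)
standardized = z_score(loan_amounts)
```

In practice the scaling parameters (minimum, maximum, mean, standard deviation) would usually be computed on the training set only and then reused for new data.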

    Credit Scoring - Features and Target Variables

    • Numerical Features (Loan Amount, Disposable Income)
    • Ordinal Features (Savings, Employment)
    • Categorical Features (Purpose of loan, housing)
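
One way these three feature types might be turned into numbers for a model is sketched below: numerical features pass through, ordinal features map to ordered integers, and categorical features are one-hot encoded. The category levels, the record fields, and the `encode` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical category levels for the credit-scoring example.
SAVINGS_LEVELS = ["low", "medium", "high"]      # ordinal: the order matters
PURPOSES = ["car", "education", "furniture"]    # categorical: no natural order

def encode(record):
    """Turn one applicant record (a dict) into a flat numeric feature vector."""
    numeric = [float(record["loan_amount"])]
    # Ordinal: the list index preserves the ordering, so <, > comparisons are meaningful.
    ordinal = [float(SAVINGS_LEVELS.index(record["savings"]))]
    # Categorical: one-hot encoding avoids implying an order between categories.
    one_hot = [1.0 if record["purpose"] == p else 0.0 for p in PURPOSES]
    return numeric + ordinal + one_hot

x = encode({"loan_amount": 5000, "savings": "medium", "purpose": "education"})
```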

    Three basic ML Problems

    • Classification (Predicting categories from existing data)
    • Regression (Predicting continuous values from historic data)
    • Clustering (Finding patterns in data without associated labels)

    Classification - Example

    • Creating statistical models to determine label for new observations

    Regression - Example

    • Creating statistical models that allow predictions for numerical labels based on existing data

    Clustering - Example

    • Grouping data into clusters

    Classification - Notation

    • A mapping y = f(x) from input vectors x to outputs y ∈ {1, …, C}; C = 2 is binary classification
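
To make the mapping concrete, here is a minimal sketch of one possible classifier f, a k-nearest-neighbours majority vote. This is an illustrative choice rather than the method used in the lecture, and the toy data and labels are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda pair: dist(pair[0], x_new))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data (C = 2).
train = [((0.0, 0.0), "reject"), ((0.0, 1.0), "reject"), ((1.0, 0.0), "reject"),
         ((5.0, 5.0), "accept"), ((5.0, 6.0), "accept"), ((6.0, 5.0), "accept")]
label = knn_predict(train, (5.5, 5.5))
```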

    Regression - Notation

    • A mapping y = f(x) where the output y is continuous-valued
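
As an illustration of such a mapping, the sketch below fits a straight line to a single feature by ordinary least squares. The choice of a linear model, the helper names, and the data points are assumptions made for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y ≈ slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the fitted mapping f(x) to a new input."""
    return slope * x + intercept

# Hypothetical historic data: the points lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
forecast = predict(4, slope, intercept)
```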

    Clustering - Notation

    • Dataset D = {xᵢ}, i = 1, …, N; a set of K clusters; zᵢ ∈ {1, …, K} is the cluster assigned to xᵢ
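
A minimal sketch of how the assignments zᵢ could be computed, using plain k-means as one possible clustering algorithm (an assumption; the notes do not name a specific method here). Cluster indices in the code run from 0 to K − 1 rather than 1 to K, and the points and starting centroids are hypothetical.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids.

    Returns (z, centroids), where z[i] in {0, ..., K-1} is the cluster of points[i].
    """
    centroids = [tuple(c) for c in centroids]
    z = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        z = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
             for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, zi in zip(points, z) if zi == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return z, centroids

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]  # hypothetical data
z, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```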

    Feature Engineering - Example

    • Describing states using a vector of features (properties), example: Distance to closest ghost

    Feature Engineering

    • Transforming raw data into meaningful features to improve ML algorithm performance

    Overfitting

    • Model performs well on training data but poorly on new data
    • Model is fitted very closely to the training data and therefore generalizes poorly
    • Mitigations: model simplification, regularization, early stopping, pruning, ensemble methods

    Training, Validation, and Testing

    • Training Set: data used to train the model
    • Validation Set: separate subset for tuning parameters and preventing overfitting
    • Test Set: for assessing model performance on new data
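
The three-way split can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the `split_dataset` helper are assumptions for illustration; the notes do not prescribe particular fractions.

```python
import random

def split_dataset(data, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of data and cut it into train, validation, and test parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
```

The test part is set aside and touched only once, after all tuning on the validation part is finished.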

    Excursion: Cross Validation

    • Technique used for evaluating model performance with multiple training and test sets
    • Usually 10-fold cross validation (Data split into 10 subsets)
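
A minimal sketch of how the folds might be generated. The generator name `k_fold_indices` is hypothetical, and the folds are taken as contiguous index blocks, so ordered data should be shuffled beforehand.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=25, k=10))
```

Each observation lands in exactly one test fold, and the model would be trained and scored once per fold, with the k scores averaged afterwards.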

    Accuracy and Un-balanced Classes

    • Accuracy/error rate is not a good measure for imbalanced classes
    • Better metrics include precision, recall, and F1-score
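
The sketch below shows why accuracy can mislead on imbalanced data. The 95/5 class split and the classifier that always predicts the majority class are hypothetical, as is the helper name `precision_recall_f1`.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a classifier that never predicts the positive class
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, always_negative)
```

Here accuracy looks high even though the classifier never finds a single positive case, which precision, recall, and F1 expose immediately.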

    Outliers in the Data

    • Isolated instances that are unlike other instances
    • Dealing with outliers is very important
    • Methods of handling outliers: removal, or identification and correction
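
One common convention for identifying outliers, assumed here and not taken from the lecture, is to flag values lying more than 1.5 interquartile ranges outside the quartiles. The income values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, factor=1.5):
    """Return the values lying more than factor * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical monthly incomes; the last value looks suspicious.
incomes = [2100, 2300, 2250, 2400, 2200, 2350, 25000]
flagged = iqr_outliers(incomes)
```

Whether a flagged value is removed or corrected remains a judgment call that depends on why it is unusual.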

    Survivor Bias

    • Cognitive bias overlooking observations that did not survive a process

    Recap Introduction

    • Key takeaways: machine learning, its historical development, and its types
    • Classification, clustering and regression problems and their implementations
    • Next topic: basic probabilities, conditional probabilities, and classification of new observations


