Machine Learning Fundamentals Quiz
46 Questions

Questions and Answers

What is the primary purpose of feature selection in the data preprocessing phase?

  • To improve the accuracy of the model
  • To reduce the training time of the algorithm
  • To eliminate irrelevant features
  • All of the above (correct)

    Untrained algorithms are used during the deployment phase.

    False

    What are the outputs of a supervised machine learning algorithm?

    Labels

    During the prediction phase, new inputs are provided to a __________ machine learning algorithm.

    trained

    Match the phases of machine learning with their corresponding activities:

    • Data Preprocessing = Feature Selection
    • Training Phase = Model Training
    • Deployment Phase = Model Prediction
    • Input Phase = Providing Features

    What is the primary purpose of PCA in data analysis?

    To transform data into fewer uncorrelated components

    PCA can require the number of components to be specified in advance.

    True

    Name one challenge associated with interpreting PCA components.

    It is often hard to understand what the components represent.

    PCA primarily helps in visualizing __________ data.

    high-dimensional

    Match the following terms related to PCA with their descriptions:

    • Principal Component = A direction in the feature space along which the data varies the most
    • Variance = The measure of how much values differ from the mean
    • Dimensionality Reduction = The process of reducing the number of features while retaining essential information
    • Uncorrelated Features = Features that have no linear relationship with one another

    What does PCA aim to achieve by transforming data?

    To better capture the relationships between original features

    PCA is guaranteed to provide a clear interpretation of the resulting components.

    False

    What kind of features does PCA produce?

    Uncorrelated features

    What does K-fold cross-validation do?

    It validates the model on multiple splits of the data, which gives a more reliable performance estimate and helps detect overfitting.

    The output layer of a neural network has no influence on the predictions made by the model.

    False

    What is the purpose of the 'MLPRegressor' in the provided content?

    To create a multi-layer perceptron regressor for training a machine learning model.

    Match the following terms with their descriptions:

    • K-fold cross-validation = A method to validate a model by splitting data into K subsets
    • MLPRegressor = A neural network model used for regression tasks
    • Training Data = Data used to fit the machine learning model
    • Validation Data = Data used to assess the model's performance

    Which parameter was set to 500 in MLPRegressor?

    Max iterations

    Using a single fold for validation can give a more accurate performance score than K-fold cross-validation.

    False

    What is the effect of increasing the number of hidden layers in an MLPRegressor?

    It can improve the model's ability to learn complex patterns, but may also lead to overfitting.

    What is a dataset?

    A collection of numerical and/or categorical values

    An observation groups values from different variables for multiple items.

    False

    What programming libraries is scikit-learn built on top of?

    NumPy, SciPy, and Matplotlib

    Scikit-learn is ___-source, free to use and contribute.

    open

    Which of the following describes an observation?

    Values of several variables for the same object

    Scikit-learn requires data input to be in the form of a Pandas DataFrame or Numpy array.

    True

    What type of programming paradigm does scikit-learn follow?

    Object-oriented

    The score of the decision tree model on the test set is lower than its cross-validation score.

    False

    The actual classes of the test set were: [2, 1, 0, 1, 0]. The predicted values for these classes are [____, ____, ____, ____, ____].

    'C', 'C', 'A', 'B', 'A'

    Match the following classes with their corresponding predicted values:

    • Class A = Predicted: 'A'
    • Class B = Predicted: 'B'
    • Class C = Predicted: 'C'

    Which class had the highest predicted value?

    Class C

    How many samples were used for the analysis?

    40

    The value corresponding to Class A is the highest among the values provided.

    False

    What is the primary goal of supervised machine learning?

    To predict outcomes based on labeled training data

    Unsupervised machine learning relies on labeled training data.

    False

    What is the purpose of cross-validation in machine learning?

    To assess how the results of a statistical analysis will generalize to an independent data set.

    In supervised learning, we use _______ data for training the model.

    labeled

    Match the machine learning techniques with their definitions:

    • Supervised Learning = Learning from labeled data
    • Unsupervised Learning = Learning from unlabeled data
    • Cross-validation = Method to validate model performance
    • Overfitting = Model is too complex and fits noise

    Which of the following actions is NOT part of data preprocessing?

    Training the ML model

    Underfitting occurs when a model is too complex for the given data.

    False

    What is overfitting in machine learning?

    When a model learns noise from the training data rather than the underlying pattern.

    The _______ data is used to evaluate the performance of the trained model.

    test

    Match the components of a machine learning model with their roles:

    • Training Data = Data used to train the model
    • Test Data = Data used to evaluate model performance
    • Model = The algorithm that makes predictions
    • Features = Input variables for the model

    Which of the following best describes the bias-variance trade-off?

    It is the balance between errors due to bias and variance in models

    Feature selection can help improve the performance of a machine learning model.

    True

    What does the process of standardization refer to in data preprocessing?

    Transforming data to have a mean of zero and a standard deviation of one.

    Study Notes

    Introduction to Machine Learning

    • Machine learning is about building models from data to identify patterns or predict future samples
    • Machine learning is similar to predictive analytics, statistical learning etc.
    • Machine learning is not the same as artificial intelligence

    What is Data?

    • A dataset is a collection of numerical or categorical values
    • A variable is an attribute, criterion, feature, or dimension measured consistently
    • An observation is the values of several variables for a single item, person, unit, etc.

    Machine Learning with Scikit-learn

    • Built on NumPy, SciPy, and Matplotlib
    • Input can be a NumPy array or a pandas DataFrame; output is a NumPy array
    • Open-source, free to use and contribute
    • Continuously updated
    • Object-oriented approach: create objects, call methods to fit (train) or transform data
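The create-fit-transform pattern can be sketched with a standardization step. This is a minimal illustration, assuming `StandardScaler` as the example object and a small made-up array; neither comes from the original lesson.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up): 4 observations (rows) x 2 variables (columns).
X = np.array([[1.0, 200.0],
              [2.0, 220.0],
              [3.0, 260.0],
              [4.0, 320.0]])

scaler = StandardScaler()        # 1. create the object
scaler.fit(X)                    # 2. fit: learn each column's mean and standard deviation
X_scaled = scaler.transform(X)   # 3. transform: apply what was learned

print(type(X_scaled))            # the output is a NumPy array
print(X_scaled.mean(axis=0))     # approximately 0 for each column
print(X_scaled.std(axis=0))      # approximately 1 for each column
```

The same object can later transform new data using the statistics learned during `fit`.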

    Unsupervised ML

    • Goal is to learn something from data without knowing answers
    • Data preprocessing and feature selection are crucial steps
    • Algorithm examples: K-means, hierarchical clustering (unsupervised classification), Principal Components Analysis (dimensionality reduction), and some neural networks

    K-Means Clustering

    • Divides data into k disjoint clusters, each with a center (centroid) that minimizes distance to its members
    • Very well-known algorithm
    • High-quality implementations
    • Handles large datasets well
    • Assumes clusters are convex and isotropic
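A small sketch of k-means on synthetic data may help here. The two blobs, the choice of `n_clusters=2`, and the random seeds are assumptions made for illustration; they are not the store data from the lesson.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic data: two well-separated, roughly round blobs of 50 points each.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# k must be chosen by the user; here we assume k=2 because we built two blobs.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)   # cluster index (0 or 1) for every point

print(kmeans.cluster_centers_)   # one centroid per cluster
print(np.bincount(labels))       # number of points assigned to each cluster
```

Because the blobs are convex and similar in spread, they match the assumptions listed above; elongated or nested clusters would likely be split less cleanly.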

    Clustering example

    • Data is shown for stores grouped by type, size and mean sales

    Principal Components Analysis (PCA)

    • Transforms data to have fewer uncorrelated features that explain most data variance
    • Useful for visualizing high-dimensional data and reducing features
    • Number of components must be specified
    • Components can be hard to interpret
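The points above can be sketched on synthetic data. The three-feature dataset and `n_components=2` are assumptions chosen so that two of the features are strongly correlated; this is an illustration, not the lesson's example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 200 observations, 3 features; feature 2 is nearly 2 x feature 1.
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    2.0 * base + rng.normal(scale=0.1, size=(200, 1)),
    rng.normal(size=(200, 1)),
])

pca = PCA(n_components=2)            # number of components chosen in advance
X_reduced = pca.fit_transform(X)     # 3 original features -> 2 components

print(X_reduced.shape)
print(pca.explained_variance_ratio_)          # share of variance per component
print(np.corrcoef(X_reduced, rowvar=False))   # off-diagonal entries are close to 0
```

Each component is a weighted mix of all original features (see `pca.components_`), which is why a component rarely maps onto a single named variable.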

    Unsupervised Methodology

    • Fit the model to the training data
    • Transform the test data using fitted model
    • The model predicts a representation of the test data
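As a hedged sketch of this workflow, the model below is fitted on the training portion only and then applied to held-out data. PCA, the random data, and the 80/20 split are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))   # synthetic data: 100 observations, 5 features

# No labels are involved, so only X is split.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

pca = PCA(n_components=2)
pca.fit(X_train)                 # learn the components from the training data only
Z_test = pca.transform(X_test)   # represent unseen data with the fitted model

print(X_train.shape, X_test.shape, Z_test.shape)
```

Fitting on the training data alone keeps information from the test set out of the learned representation.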

    Supervised ML

    • Goal is to learn the relationship between input and output data from labeled examples
    • Models: Multi-layer perceptron (neural network, regression); Decision trees (classification)

    Deep (Artificial) Neural Networks

    • More layers mean higher capacity (prediction power)
    • Harder to train
    • Deep learning is a form of this

    Supervised: fit, transform, predict

    • Train the model by learning the relationships between x and y where x is input and y is output
    • Build the model
    • Predict values for unseen data (test)

    MLP Regression

    • The lecture's Python code shows how to fit and score an MLP regression model (MLPRegressor, with max iterations set to 500)
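The lecture code itself is not reproduced on this page, so the following is only a sketch of what fitting and scoring an `MLPRegressor` can look like. The quiz mentions max iterations set to 500; the synthetic sine data, the hidden layer sizes, and the train/test split are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic regression problem (assumed): y = sin(x) plus a little noise.
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# hidden_layer_sizes is an illustrative choice; max_iter=500 follows the quiz.
model = MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=500, random_state=0)
model.fit(X_train, y_train)        # learn the relationship between x and y
y_pred = model.predict(X_test)     # predict values for unseen data

# score() returns the coefficient of determination (R^2); 1.0 is a perfect fit.
print("R^2 on training data:", model.score(X_train, y_train))
print("R^2 on test data:", model.score(X_test, y_test))
```

A large gap between the training and test scores would suggest overfitting; scikit-learn may also warn if the optimizer has not converged within 500 iterations.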

    Underfitting / Overfitting

    • Underfitting and overfitting are both cases where the model does not represent the data well: an underfit model is too simple to capture the underlying pattern, while an overfit model is too complex and learns noise in the training data
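One way to see both problems is to fit polynomials of increasing degree to a small noisy sample. This sketch uses only NumPy; the sine-shaped data, the noise level, and the degrees 1, 3, and 12 are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n noisy samples from an assumed underlying curve y = sin(3x)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # larger held-out set

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

The training error can only go down as the degree rises. A straight line (degree 1) is too simple for this curve and tends to do poorly on both sets, while a very flexible polynomial typically fits the training points closely but does worse on the held-out data.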

    K-fold Cross-validation

    • Used to get a more accurate estimate of model performance
    • The data is split into training and testing data
    • The training data is then further split into k folds
    • Trains on k-1 folds and validates on the remaining fold, repeating so that each fold is used once for validation
    • The scores are averaged to create a more accurate assessment
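A minimal sketch of the procedure, assuming the usual formulation in which each fold serves once as the validation set while the model trains on the remaining folds. The iris dataset, the decision tree model, and `n_splits=5` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # 150 observations, 4 features, 3 classes

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Show which part of the data is used for training and validation in each round.
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(cv.split(X)):
    fold_sizes.append(len(val_idx))
    print(f"fold {fold}: train on {len(train_idx)} samples, validate on {len(val_idx)}")

# One score per fold; the mean is the cross-validated estimate.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(scores)
print("mean score:", scores.mean())
```

Averaging over five validation folds makes the estimate less dependent on one lucky or unlucky split.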

    Stratified K-fold Cross-validation

    • Stratified K-fold is a modification of K-fold used for classification problems
    • Ensures that the proportion of class labels is roughly the same within the training, validation, and test sets
    • Useful when the training data has imbalanced classes
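The effect of stratification can be checked on a tiny imbalanced label vector. The 8-to-4 class split and `n_splits=4` are assumptions chosen so the proportions are easy to read.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced toy labels: 8 samples of class 0 and 4 samples of class 1.
y = np.array([0] * 8 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder features; only y drives the split

skf = StratifiedKFold(n_splits=4)

fold_counts = []
for train_idx, val_idx in skf.split(X, y):
    counts = np.bincount(y[val_idx], minlength=2)   # samples per class in this fold
    fold_counts.append(counts.tolist())
    print("validation fold class counts:", counts)
```

Every validation fold keeps the overall 2:1 ratio, whereas a plain K-fold split of these sorted labels would produce folds containing only one class.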

    Decision Trees

    • Learn a hierarchy of if/else questions to classify outputs
    • Starts at root node and answers questions that eventually reach a leaf node with an output label
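The learned if/else hierarchy can be printed directly. This sketch assumes the iris dataset and a depth limit of 2 to keep the output short; neither is taken from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 is an illustrative limit so the printed tree stays readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Text view of the tree: each indented line is a question, each "class:" a leaf.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)

# Classify one observation by following the questions from root to leaf.
first_prediction = tree.predict(iris.data[:1])[0]
print("predicted class index for the first sample:", first_prediction)
```

Reading the printed rules from top to bottom mirrors how a prediction is made: start at the root, answer each threshold question, and stop at a leaf.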

    Next lecture

    • More advanced models, including score metrics and confusion matrices
    • Optimization of hyperparameters

    Quiz Team

    Description

    Test your knowledge on the fundamental concepts of machine learning, including feature selection, outputs of supervised algorithms, and the phases of machine learning. This quiz covers essential topics crucial for understanding the data preprocessing phase and deployment activities.
