Machine Learning 600 - Introduction

Questions and Answers

What is a significant advantage of using decision trees in machine learning?

Decision trees are white-box models, enabling users to observe and interpret their internal decision logic.

What is one of the main disadvantages of decision trees?

They can create overly complex models, which may lead to overfitting the training data.

How does the ID3 algorithm determine which attribute to split on?

It uses information gain, calculating the difference in entropy before and after the split.
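As an illustration, the entropy and information-gain computation can be sketched in a few lines of Python. This is a minimal sketch of the split criterion only, not the full ID3 algorithm; the function names and toy labels are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    """Entropy before a split minus the weighted entropy after it."""
    n = len(parent_labels)
    after = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - after

# Toy split: four labels separated into two pure child nodes
labels = ["yes", "yes", "no", "no"]
gain = information_gain(labels, [["yes", "yes"], ["no", "no"]])
print(gain)  # 1.0 — a perfectly separating split removes all uncertainty
```

ID3 evaluates this gain for every candidate attribute and splits on the one with the highest value.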

What improvement does the C4.5 algorithm offer over the ID3 algorithm?

C4.5 can handle continuous attributes and missing values more effectively.

What is overfitting in the context of decision trees?

Overfitting occurs when a decision tree model fits the training data too closely, capturing noise.

What technique can be used to avoid overfitting in decision trees?

Pruning the decision tree can reduce its complexity and help avoid overfitting.

Why might a larger training set help decision tree learning?

A larger training set provides more data points, allowing the tree to learn more patterns.

What does it mean when a model is underfitting?

Underfitting occurs when a model is too simplistic to capture the underlying patterns in data.

What is the main purpose of the K-Means algorithm in clustering?

The K-Means algorithm partitions data into K distinct clusters by minimizing the variance within each cluster.
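A minimal sketch of the K-Means iteration (Lloyd's algorithm) using only the standard library; the 2-D points are invented, and a real implementation would add multiple restarts and better initialisation:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal Lloyd's algorithm for 2-D points (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        new = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # assignments stopped changing: converged
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))  # one centroid per blob
```

Each iteration alternates between assigning points to their nearest centroid and moving each centroid to the mean of its cluster, which steadily reduces the within-cluster variance.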

How do decision trees make predictions?

Decision trees make predictions by splitting data based on feature values, following branches until a leaf (final decision node) is reached.
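That branching process can be pictured as nested feature tests; the features, thresholds, and labels below are invented purely for illustration.

```python
def predict(sample):
    """A tiny hand-built decision tree for a loan decision (illustrative only)."""
    if sample["income"] >= 50000:        # root split on a numerical feature
        if sample["credit"] == "good":   # branch split on a categorical feature
            return "approve"             # leaf: final decision node
        return "review"
    return "deny"

print(predict({"income": 60000, "credit": "good"}))  # approve
print(predict({"income": 20000, "credit": "good"}))  # deny
```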

What is overfitting in machine learning?

Overfitting occurs when a model learns the training data too well, capturing noise instead of the underlying patterns, which harms its performance on new data.

What are some common strategies to avoid overfitting in decision trees?

Common strategies include pruning the tree, limiting the maximum depth, and setting a minimum number of samples required to split a node.
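Those pre-pruning controls amount to simple guards checked before each split. A hypothetical sketch — the function and parameter names are mine, not from any particular library:

```python
def should_split(depth, n_samples, labels, max_depth=3, min_samples_split=4):
    """Pre-pruning check: refuse to grow the tree past simple limits."""
    if depth >= max_depth:             # tree is already deep enough
        return False
    if n_samples < min_samples_split:  # too few samples to split reliably
        return False
    if len(set(labels)) == 1:          # node is pure; splitting gains nothing
        return False
    return True

print(should_split(depth=1, n_samples=10, labels=["a", "b", "a"]))  # True
print(should_split(depth=3, n_samples=10, labels=["a", "b"]))       # False: depth limit hit
```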

Why is the silhouette method important in clustering?

The silhouette method measures how similar an object is to its own cluster compared to other clusters, helping to evaluate the quality of clustering.
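For a single point, the silhouette coefficient is s = (b - a) / max(a, b), where a is the mean distance to the point's own cluster and b the mean distance to the nearest other cluster. A minimal 1-D sketch with invented values:

```python
def silhouette(point, own_cluster, other_clusters, dist):
    """Silhouette coefficient s = (b - a) / max(a, b) for one point."""
    others = [p for p in own_cluster if p != point]
    a = sum(dist(point, p) for p in others) / len(others)  # mean intra-cluster distance
    b = min(sum(dist(point, p) for p in c) / len(c)        # mean distance to the
            for c in other_clusters)                       # nearest other cluster
    return (b - a) / max(a, b)

dist = lambda p, q: abs(p - q)  # 1-D distance for the toy example
s = silhouette(1.0, [0.0, 1.0, 2.0], [[10.0, 11.0]], dist)
print(round(s, 3))  # 0.895 — close to 1, so the point sits well inside its own cluster
```

Values near 1 indicate well-separated clusters, near 0 indicate overlapping clusters, and negative values suggest the point may be in the wrong cluster.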

Name three advantages of using decision trees.

Decision trees are easy to interpret, require little data preprocessing, and can handle both categorical and numerical data effectively.

What are the limitations of decision trees?

Decision trees can be prone to overfitting, are sensitive to small data variations, and may not perform well with imbalanced datasets.

What is the primary advantage of using decision trees in data analysis?

The primary advantage of using decision trees is their ease of interpretation and ability to handle both numerical and categorical data.

In the context of clustering, how does the K-means algorithm generally function?

The K-means algorithm clusters data by partitioning it into K distinct groups based on feature similarity, using the mean of the data points in each cluster.

What is overfitting in machine learning and how can it affect decision trees?

Overfitting occurs when a model learns the training data too well, capturing noise rather than the underlying distribution, which can lead decision trees to make poor predictions on unseen data.

What is underfitting in decision trees, and what causes it?

Underfitting occurs when a decision tree is too simple to capture the underlying structure of the data, often due to insufficient depth or splits.

How can one avoid overfitting when utilizing decision trees?

To avoid overfitting, one can limit the maximum depth of the tree, require a minimum number of samples to split, or use techniques like pruning.

What factors influence the choice of K in the K-means algorithm?

The choice of K is influenced by the dataset's structure, the desired level of granularity in clustering, and methods like the elbow method for optimization.
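The elbow method plots the within-cluster sum of squares (WCSS) against K and picks the K where the curve bends. A 1-D sketch with invented points:

```python
def wcss(clusters):
    """Within-cluster sum of squared distances to each cluster's mean (1-D)."""
    total = 0.0
    for cl in clusters:
        centre = sum(cl) / len(cl)
        total += sum((x - centre) ** 2 for x in cl)
    return total

# Two obvious blobs: WCSS collapses going from K=1 to K=2, then barely
# improves at K=3 — so the "elbow" suggests K=2.
print(wcss([[1.0, 1.2, 0.8, 9.0, 9.2, 8.8]]))        # K=1: large
print(wcss([[1.0, 1.2, 0.8], [9.0, 9.2, 8.8]]))      # K=2: small
print(wcss([[1.0, 1.2], [0.8], [9.0, 9.2, 8.8]]))    # K=3: barely smaller
```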

What role does entropy play in decision tree algorithms?

Entropy measures the impurity or randomness of a dataset; it is used to determine the best splits at each node in a decision tree.

Can decision trees handle both numerical and categorical data simultaneously? Explain.

Yes. Decision trees can handle both data types in the same tree: numerical features are split with threshold tests (e.g., age ≤ 30), while categorical features are split by category membership, making them versatile for various types of datasets.

Study Notes

Student Guide - Bachelor of Science in Information Technology - Machine Learning 600

  • Subject: Machine Learning 600
  • Year: 2
  • Institution: RICHFIELD
  • Website: richfield.ac.za

Table of Contents

  • Chapter 1: Introduction to Machine Learning (page 2)
    • 1.1 History of Machine Learning (page 2)
      • Machine learning (ML) is a sub-field of AI that uses computing capabilities to learn from structured or unstructured data.
      • Aims to predict or classify various features. Example: recommender systems, image recognition, medical diagnosis, speech recognition, language translation
      • Key figures in ML development: Alan Turing (1950), Arthur Samuel (defined ML as the field of study that gives computers the ability to learn without being explicitly programmed), Tom Mitchell (defined ML in terms of improvement in performance with experience).
    • 1.2 Type of Machine Learning (page 4)
      • 1.2.1 Supervised Learning:
        • Works with labelled features (input-output).
        • Every observation in the training dataset has an input-output object.
        • Purpose is to predict or classify unknown features in the testing phase.
        • Problems: Bias-variance dilemma, size of training dataset
      • 1.2.2 Unsupervised Learning:
        • Aims to detect hidden patterns in a given dataset.
        • No wrong/right answer; the aim is to understand patterns in the data based on similarity/dissimilarity of hidden patterns.
    • 1.3 Applicability of Machine Learning Techniques (page 6)
      • Widely used in software engineering to improve user experience.
      • Examples: Spam detection, voice recognition, stock trading, robotics, medicine, healthcare, advertising, e-commerce, and gaming.
    • 1.4 Types of Machine Learning Languages & Data Repositories (page 11)
      • Various programming languages for implementing ML systems: Java, Clojure, Python, R, MATLAB, Scala, Ruby.
      • Software tools that use ML algorithms: WEKA, Kafka, Spark & Hadoop, DeepLearning4J.
    • 1.5 Data Repositories (page 13)
      • Different data sources: public data from the internet, datasets from institutions.
      • Data formats: CSV, XML, JSON.
      • Example: Predicting student performance based on the modules they register for.
      • Importance of ML algorithm learning experience to improve features and experimentation results.

Second Chapter

  • Chapter 2: The Machine Learning Project Cycle & Data Acquisition Technique (page 12)
    • 2.1 ML Cycle (page 17)

      • Cycle of actions: acquisition, preparation, processing, reporting.
      • Data from various sources (internal or external).
    • 2.1.1 Transfer Learning (page 18)

      • Utilizing existing models with adjustments to fit new data (images, video, large text corpora).
    • 2.1.2 One solution fits all (page 18)

      • No one-size-fits-all solution; data projects are complex
    • 2.2 Defining the Process (page 19)

      • Importance of planning in ML projects (use A5/A4 notepads/whiteboards).
      • Steps in the process from planning phase, development, testing, reporting, to refining.
    • 2.2.1 Data Processing (page 20)

      • Ensuring data is suitable for use by ML algorithms.
    • 2.2.2 Data Storage (page 20)

      • Data storage options: physical disk or cloud-based.
    • 2.2.3 Data Privacy (page 20)

      • Important considerations in data governance according to the General Data Protection Regulations (GDPR) in Europe, and how personal data is protected and processed.
    • 2.3 Data Quality and Cleaning (page 21)

      • Importance of data quality checks (presence, type, length checks, range checks).
    • 2.4 Experiments (page 22)

    • 2.5 Planning (page 23)

    • 2.6 Scraping Data (page 23)

      • Steps in data extraction/scraping from webpages and other sources:
        • Source of data
        • Data extraction process (e.g., tools)
        • Data format (e.g., readable format)
        • Data variable values
        • Data storage locations
    • 2.6.1 Copy and Paste (page 24)

      • Web scraping example using Wikipedia to extract airport data.
    • 2.6.2 Using an API (page 25)

      • Using APIs for retrieving data (e.g., weather information).
    • 2.7 Data Migration (page 26)

      • Acquiring data and methods for data/file format conversion.
    • 2.8 Summary (page 26)

Third Chapter

  • Chapter 3: Statistics, Randomness, and Linear Regression in ML (page 25)
    • 3.1 Working with ML Datasets (page 30)

      • Loading ML datasets using Java, converting text to integer format.
    • 3.1.1 Using Java to load ML datasets (page 30)

      • Loading ML datasets using Java
    • 3.1.2 Basic Statistics (page 31)

      • Sum, minimum, maximum, mean, mode, median, range, variance, standard deviation.
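The guide computes these in Java; for reference, each of these statistics is available in (or one line away from) Python's standard-library `statistics` module. The sample data below is invented:

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

print(sum(data), min(data), max(data))   # 52 2 9
print(statistics.mean(data))             # 5.2
print(statistics.mode(data))             # 8 (first of the most common values)
print(statistics.median(data))           # 5.0
print(max(data) - min(data))             # 7 (range)
print(statistics.pvariance(data))        # 5.76 (population variance)
print(statistics.pstdev(data))           # 2.4 (population standard deviation)
```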
    • 3.2 Linear Regression (page 32)

      • Simple linear regression
    • 3.2.1 Scatter Plots (page 35)

      • Graphing data to visualise linear relationship.
    • 3.2.2 Trendline (page 35)

      • Graphing the data trend.
    • 3.2.3 Prediction (page 36)

      • Calculating the predicted score/value based on the linear regression equation.
    • 3.3 Programming (page 36)

      • Use of programming libraries for implementing linear regression.
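Simple linear regression fits a trendline y = mx + b by least squares and then predicts from that equation. A standard-library sketch; the hours/scores data is made up for illustration:

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hours studied vs. test score (made-up illustrative data)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 65, 72]
m, b = fit_line(hours, scores)
print(m, b)          # 5.0 46.0 — trendline: score = 5 * hours + 46
print(m * 6 + b)     # 76.0 — predicted score for 6 hours of study
```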
    • 3.4 Randomness (page 37)

    • 3.5 Summary (page 39)

Fourth Chapter

  • Chapter 4: Decision Trees & Clustering (page 37)

    • 4.1 Introduction to Decision Trees (page 42)

      • Decision trees as a method to select options.
      • Applications in many industry areas.
    • 4.1.1 Why use Decision Trees (page 43)

    • 4.1.2 Disadvantages of Decision Trees (page 43)

      • Creation of complex models based on data.
    • 4.1.3 Decision Tree Types (page 43)

    • 4.1.4 The intuition behind Decision Trees (page 44)

    • 4.2 Entropy Computation (page 46)

    • 4.2.1 Information Gain Computation (page 46)

    • 4.3 Clustering (page 47)

    • 4.3.1 Why use Clustering? (page 47)

    • 4.3.2 Clustering Models (page 48)

    • 4.3.3 K-Means Algorithms (page 48)

    • 4.4 Cross Validation (page 50)

    • 4.4.1 Silhouette method (page 50)

    • 4.4.2 Data visualization (page 51)

    • 4.5 Summary (page 51)

    • 4.6 Review Questions (page 52)

    • 4.7 MCQs (Quick Quiz) (page 54)

Fifth Chapter

  • Chapter 5: Association Rules Learning - Support Vectors Machine & Neural Networks (page 55)
    • 5.1 Association Rules (page 55)

    • 5.2 Support Vectors Machine (page 58)

      • 5.3.1 SVM classification principle (page 59)
    • 5.4 Neural Networks (page 59)

      • Activation Functions (page 62)
    • 5.5 Summary (page 64)

    • 5.6 Review Questions (page 65)

    • 5.7 MCQs (Quick Quiz) (page 66)

Sixth Chapter

  • Chapter 6: Machine Learning with Text Documents, Sentiment Analysis & Image Processing (page 67)
    • 6.1 Preparing Text for Analysis (page 72)

    • 6.2 Stopwords (page 72)

    • 6.3 Stemming (page 73)

    • 6.4 N-grams (page 74)
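The n-grams of section 6.4 are just contiguous token windows; a one-function sketch:

```python
def ngrams(tokens, n):
    """All contiguous n-token windows (n-grams) in a sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("machine learning is fun".split(), 2))
# [('machine', 'learning'), ('learning', 'is'), ('is', 'fun')]
```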

    • 6.5 TF/IDF (page 74)
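TF/IDF (section 6.5) weights a term by how often it appears in a document, discounted by how many documents contain it. A minimal sketch of the unsmoothed variant — real libraries add smoothing to avoid division by zero for unseen terms, and the toy corpus is invented:

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cats and the dogs".split(),
]

def tf_idf(term, doc, corpus):
    """Term frequency times inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)  # documents containing the term
    return tf * math.log(len(corpus) / df)

print(tf_idf("cat", docs[0], docs))  # positive: "cat" is distinctive to this document
print(tf_idf("the", docs[0], docs))  # 0.0: "the" appears in every document
```

Stopword-like terms that occur in every document get an IDF of log(1) = 0, so their weight vanishes.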

    • 6.6 Image Processing in ML (page 76)

    • 6.6.1 Color Depth (page 77)

    • 6.6.2 Images in ML (page 78)

    • 6.7 Convolutional Neural Network (CNN)

      • 6.7.1 Feature Extraction (page 79)

      • 6.7.2 Classification (page 80)
    • 6.8 CNN & Transfer Learning (page 82)

      • Performance assessment aspects in ML algorithms.
    • 6.9 Assessing the performance of ML Algorithms (page 82)

      • 6.9.1 Classification and Confusion Matrix (page 84)

      • 6.9.2 Regularization (page 86)
    • 6.10 Summary (page 88)
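The confusion-matrix metrics of section 6.9.1 reduce to a few ratios over the four cell counts. A minimal sketch with invented counts:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    return accuracy, precision, recall

# Invented results for a 100-sample test set
acc, prec, rec = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
print(acc, prec, round(rec, 2))  # 0.85 0.8 0.89
```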

    • 6.11 Review Questions (page 88)

    • 6.12 MCQs (Quick Quiz) (page 89)


Description

This quiz covers the essential concepts from the Machine Learning 600 course, specifically focusing on the introduction chapter. It highlights the history of machine learning, key figures in the field, and various types of machine learning, including supervised learning. Test your understanding of these foundational topics in this exciting and rapidly evolving area of computer science.
