Machine Learning Foundations - Unit 1

Questions and Answers

What effect does the choice of kernel function have in kernel density estimation?

  • The variance of the data
  • The number of bins used in the histogram
  • The smoothness of the estimated density (correct)
  • The mean of the distribution

Which kernel function is commonly used in kernel density estimation?

  • Exponential kernel
  • Poisson kernel
  • Gaussian kernel (correct)
  • Binomial kernel

In k-nearest neighbor density estimation, what does the parameter 'k' represent?

  • The width of the kernel function
  • The number of bins in the histogram
  • The number of nearest neighbors considered for each point (correct)
  • The number of data points used for density estimation

What is a key advantage of k-nearest neighbor density estimation?

    It is simple and non-parametric

    Which nonparametric method is best suited for handling large datasets with unknown distribution shapes?

    Kernel density estimator

    What is the primary difference between histogram estimators and kernel estimators?

    Histograms count the points falling in fixed bins, while kernel estimators center a smooth kernel on each data point

    What limitation is commonly associated with kernel density estimation techniques?

    They require large sample sizes to be accurate

    What does a histogram estimator primarily rely on for its structure?

    The number of bins and their width
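The contrast between histogram and kernel estimators described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the sample values, the bandwidths, and the function names `gaussian_kde` and `histogram_density` are all made up for this example.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # One Gaussian bump per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def histogram_density(samples, x, bin_width, origin=0.0):
    """Histogram estimate at x: share of samples in x's bin, divided by bin width."""
    bin_index = math.floor((x - origin) / bin_width)
    count = sum(
        1 for s in samples if math.floor((s - origin) / bin_width) == bin_index
    )
    return count / (len(samples) * bin_width)

samples = [1.0, 1.2, 1.9, 2.1, 2.4, 3.8]    # hypothetical data
narrow = gaussian_kde(samples, bandwidth=0.2)  # small bandwidth: spiky estimate
wide = gaussian_kde(samples, bandwidth=1.0)    # large bandwidth: smooth estimate

for x in (1.0, 2.0, 3.0):
    print(x, narrow(x), wide(x), histogram_density(samples, x, bin_width=1.0))
```

The histogram estimate jumps at bin edges, while the kernel estimate varies continuously with x. The bandwidth plays the role that bin width plays for the histogram.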

    Which clustering algorithm can handle clusters of varying shapes and sizes?

    DBSCAN

    Which clustering algorithm does not require the assumption of equal-sized clusters?

    DBSCAN

    Which clustering algorithm is based on the concept of nearest neighbors?

    K-Nearest Neighbors (strictly speaking, a classification method rather than a clustering algorithm)

    Which assumption does the Naïve Bayes classifier make about features?

    Features are independent given the class label.

    Which probability is calculated in the Naïve Bayes algorithm to classify a new data point?

    Posterior probability

    What is the key equation used in Bayes' Theorem?

    P(A|B) = P(B|A) * P(A) / P(B)

    In a Naïve Bayes classifier, which class is chosen as the predicted class?

    Class with the highest posterior probability
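As a rough illustration of the answers above, the sketch below applies Bayes' Theorem with the conditional-independence assumption and picks the class with the highest posterior. The class names, priors, and likelihoods are invented for the example, and only features that are present are multiplied in, which is a simplification.

```python
def naive_bayes_posteriors(priors, likelihoods, observed):
    """
    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature present | class)}}
    observed:    list of features present in the new data point
    """
    unnormalized = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed:
            # Naive assumption: features are independent given the class,
            # so their likelihoods simply multiply.
            score *= likelihoods[label][feature]
        unnormalized[label] = score

    # P(B) in Bayes' Theorem: the marginal probability of the observation.
    evidence = sum(unnormalized.values())
    return {label: score / evidence for label, score in unnormalized.items()}

# Hypothetical numbers, chosen only for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.5, "meeting": 0.1},
    "ham": {"offer": 0.1, "meeting": 0.4},
}

posteriors = naive_bayes_posteriors(priors, likelihoods, ["offer"])
predicted = max(posteriors, key=posteriors.get)  # highest posterior wins
print(posteriors, predicted)
```

Because the evidence term is the same for every class, the predicted class could also be found from the unnormalized scores alone.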

    What is the main purpose of the 'kernel trick' in SVM?

    To implicitly map data into a higher-dimensional space where a linear separator may exist

    Which kernel function is commonly used in Support Vector Machines (SVM) for non-linearly separable data?

    Radial Basis Function (RBF) kernel

    Which of the following is NOT a commonly used kernel in SVM?

    Logistic

    In Bayes' Theorem, what does the term P(B) represent?

    Marginal probability

    Which statement about Naïve Bayes classifiers is accurate?

    It is robust to noise and irrelevant features.

    What is a 'support vector' in the context of SVM?

    A data point that is closest to the decision boundary

    Which activation function is most commonly used in the output layer of a binary classification neural network?

    Sigmoid

    What is the primary role of an activation function in a neural network?

    To introduce non-linearity into the model

    What is a Perceptron in the context of machine learning?

    The simplest form of a neural network
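To make the perceptron concrete, here is a minimal sketch of a single perceptron with a step activation, trained with the classic perceptron update rule on the logical AND function. The training data, learning rate, epoch count, and function names are assumptions chosen for this example.

```python
def step(z):
    """Step activation: fires (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights by lr * error * input."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so a single perceptron can learn it.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_data)

for x, target in and_data:
    print(x, predict(weights, bias, x), target)
```

A single perceptron of this kind can only represent linear decision boundaries. Non-linear activation functions in multi-layer networks are what lift that restriction.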

    What does the Simple Matching Coefficient measure?

    The proportion of matching attributes in binary data

    Which metric is used to calculate the correlation between two attributes?

    Pearson Correlation Coefficient

    What does the Cosine Similarity measure?

    The angle between two vectors

    Which of the following measures similarity between binary vectors?

    Simple Matching Coefficient

    What is a key advantage of using Euclidean Distance?

    It is easy to compute and interpret in a continuous space

    In what type of data is the Cosine Similarity particularly useful?

    Text data
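The two similarity measures quizzed above can be sketched as follows. The vectors are invented for illustration, and the helper names are not taken from any particular library.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def simple_matching_coefficient(u, v):
    """Share of positions where two binary vectors agree (both 0 or both 1)."""
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

# Hypothetical word-count vectors: the second document is the first repeated twice.
doc_a = [1, 2, 0]
doc_b = [2, 4, 0]
print(cosine_similarity(doc_a, doc_b))   # close to 1: same direction, different length

# Hypothetical binary attribute vectors agreeing in 2 of 4 positions.
print(simple_matching_coefficient([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```

Cosine similarity ignores vector length, which is one reason it suits text data, where documents of different sizes can share the same word proportions.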

    What does a decision tree model do?

    Divides data into branches to make predictions based on feature values

    Which algorithm is commonly used to create a decision tree?

    ID3
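ID3 is usually described as choosing, at each node, the feature with the highest information gain, that is, the largest reduction in label entropy. The sketch below computes that quantity for a tiny invented dataset; it shows only the split criterion, not a full tree builder, and the feature values and labels are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, labels, feature_index):
    """Entropy before the split minus the weighted entropy after it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder

# Hypothetical data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]

for index, name in enumerate(["outlook", "temperature"]):
    print(name, information_gain(rows, labels, index))
```

In this toy data the first feature separates the classes perfectly and the second tells nothing about them, so a gain-based algorithm would split on the first.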

    What is a primary difference between histograms and kernel density estimators?

    Histograms depend on bin width and bin placement, whereas kernel estimators have no bins and depend on a bandwidth instead.

    How does the choice of bandwidth affect kernel density estimation?

    It controls the smoothness of the estimate and therefore the bias-variance tradeoff.

    Which statement accurately describes nonparametric methods?

    They do not rely on assumptions about the distribution's form.

    In nonparametric density estimation, what does 'smoothing' signify?

    Adjusting the bandwidth to control the smoothness of the density estimate.

    What is commonly observed when increasing the number of bins in a histogram?

    It produces a more jagged density estimate.

    Which metric is generally the most useful when handling an imbalanced dataset?

    Precision

    What effect does kernel smoothing have compared to histograms in density estimation?

    It generally produces a more continuous estimate.

    When using histograms, what happens if the bin size is too large?

    Details of the data distribution are lost.

    How does the k-Means algorithm initialize cluster centroids?

    Randomly

    What is the role of the 'k' parameter in the k-Means algorithm?

    Number of clusters to be formed

    How does the k-Means algorithm update cluster centroids during each iteration?

    By calculating the mean of all data points in each cluster

    What is a major limitation of the k-Means algorithm?

    It is sensitive to initial centroid positions

    How does the k-Means algorithm determine convergence?

    When the centroids stop moving significantly between iterations

    Which distance metric is commonly used in the k-Means algorithm?

    Euclidean distance

    What is the computational complexity of the k-Means algorithm?

    O(n*k) per iteration, for a fixed number of dimensions

    Which of the following methods can help improve the performance of the k-Means algorithm?

    Scaling the data to have equal variance
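The k-Means answers above (random initialization, mean-based centroid updates, Euclidean distance, convergence once the centroids stop moving) can be tied together in one short sketch. This is an illustrative version of the standard assign-and-update loop under stated assumptions: the `initial_centroids`, `tol`, and `seed` parameters and the sample points are inventions for this example.

```python
import math
import random

def k_means(points, k, initial_centroids=None, max_iters=100, tol=1e-9, seed=0):
    """Plain k-Means on tuples of floats. Returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Random initialization unless the caller supplies starting centroids.
    centroids = list(initial_centroids) if initial_centroids else rng.sample(points, k)

    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep old centroid

        # Convergence check: stop once no centroid moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, clusters

# Two hypothetical, well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Different seeds can produce different starting centroids, which is the sensitivity to initialization mentioned above. Running the algorithm several times and keeping the best result is a common remedy.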

    Study Notes

    Machine Learning Foundations - Unit 1

    • Machine learning is primarily focused on building algorithms that allow computers to learn and improve from data.
    • Examples of machine learning applications include predicting stock prices.
    • Data in machine learning is the information used to train and test models.
    • Supervised learning uses labeled data for learning.
    • Unsupervised learning aims to discover patterns or structures in unlabeled data.
    • Reinforcement learning is a type of learning where an agent learns by interacting with an environment to maximize cumulative reward.
    • Data accuracy, computational speed, and model complexity are challenges in machine learning.
    • The purpose of training in machine learning is to build a model that can predict or classify data.
    • Generalization addresses the model's ability to perform well on unseen data.

    Machine Learning Foundations - Unit 1 Additional Topics (cont'd)

    • Feasibility of learning: The ability to learn effectively from available data.
    • Model complexity: The intricacy of the model; crucial to prevent overfitting.
    • Cross-validation: A resampling technique that repeatedly splits the data into training and validation folds to estimate how well a model performs on unseen data.
    • Underfitting: A model that is too simplistic and fails to capture the underlying patterns in the data.
    • Overfitting: A model that is too complex and fits the noise in the training data but doesn't generalize to new data.
    • Bias-variance tradeoff: Balance between underfitting from an overly simple model (high bias) and overfitting from an overly flexible model (high variance).
    • Distance Metrics: Various measures for quantifying how far apart data points are, including Euclidean distance, Manhattan distance, and Minkowski distance.
    • Cosine similarity: A measure, useful for text data, of the angle between two vectors.
    • Jaccard coefficient: Measures the similarity between sets, common in text data.
    • Simple Matching Coefficient: Measures the proportion of matching attributes in binary data.
    • Pearson Correlation Coefficient: Used to calculate the correlation between two variables.
    • Distance and similarity measures covered: Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, and Jaccard coefficient.
    • K-Nearest Neighbors (KNN): An algorithm where a new data point is classified based on the categories of the most similar nearby points.
    • KNN Challenges: A complex decision boundary and a high computational cost at prediction time.
    • Decision Trees: Algorithm that creates a tree-like structure, effectively partitioning data based on feature values leading to predictions.
    • Decision Tree Algorithms: ID3 and C4.5 are widely known. Apriori, by contrast, is an association-rule mining algorithm rather than a decision tree algorithm.
    • Rule-Based Classifiers: Employ pre-defined if-then rules to classify data.
    • Polynomial Regression: Uses polynomial terms to capture non-linear relationships between variables.
    • Multicollinearity: Occurs when independent variables in a linear regression model are highly correlated.
    • Regularization Techniques: Methods, such as Ridge Regression and Lasso Regression, that prevent overfitting and improve generalization to new data.
    • Model Evaluation Metrics for Regression: MSE (Mean Squared Error), MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), RMSLE (Root Mean Squared Logarithmic Error).
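The four regression metrics listed above can be computed side by side. This is a small sketch using their standard definitions; the target and prediction values are made up, and the RMSLE line assumes all values are greater than -1 so that the logarithm is defined.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE, and RMSLE for paired targets and predictions."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n    # mean of squared errors
    mae = sum(abs(e) for e in errors) / n   # mean of absolute errors
    rmse = math.sqrt(mse)                   # same units as the target
    # RMSLE compares log(1 + value), so it penalizes relative differences.
    rmsle = math.sqrt(
        sum((math.log1p(pred) - math.log1p(true)) ** 2
            for true, pred in zip(y_true, y_pred)) / n
    )
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "RMSLE": rmsle}

# Hypothetical targets and predictions.
metrics = regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
for name, value in metrics.items():
    print(name, value)
```

Squaring the errors makes MSE and RMSE more sensitive to large individual errors than MAE.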

    Clustering Analysis - Unit 2

    • k-Means Clustering: Aims to partition data into 'k' clusters based on minimizing the within-cluster variance.
    • Hierarchical Clustering: Creates clusters in a hierarchical structure (either agglomerative or divisive).
    • Agglomerative Clustering: Starts with each data point as a separate cluster and repeatedly merges the two closest clusters.
    • Divisive Clustering: Starts with all data points in one cluster and recursively splits them into smaller clusters.
    • Ward's Method: A hierarchical clustering method that, at each step, merges the pair of clusters giving the smallest increase in the within-cluster sum of squares.
    • Dendrogram: A tree-like diagram showing the hierarchical relationship between clusters.
    • Silhouette Coefficient: A measure of how similar a data point is to its own cluster compared to other clusters.
    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Clusters data points based on density, identifying outliers as noise.
    • Core Points: Data points that have at least a minimum number of points within a specified radius.
    • K-nearest neighbors (KNN): Nearest-neighbor distance computations are also used in clustering algorithms like DBSCAN.
    • Feature scaling/data normalization: Improves the performance of some algorithms like k-Means.
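The notion of a DBSCAN core point from the notes above can be sketched as a simple neighborhood count. This covers only the core-point test, not the full clustering procedure. The names `eps` and `min_pts`, the sample points, and the convention of counting a point as its own neighbor are assumptions made for this example.

```python
import math

def core_points(points, eps, min_pts):
    """Return the points that have at least min_pts points within radius eps."""
    core = []
    for p in points:
        # Count every point within eps of p, including p itself
        # (an assumed convention; some descriptions exclude the point).
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_pts:
            core.append(p)
    return core

# Hypothetical data: a dense square of four points plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
print(core_points(points, eps=1.5, min_pts=3))
```

The isolated point has no neighbors other than itself, so it is not a core point. In the full algorithm, a point like that, not reachable from any core point, would be labeled as noise.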

    Naïve Bayes and Support Vector Machines (SVM) - Unit 3

    • Naïve Bayes: A probabilistic classifier based on Bayes' Theorem, assuming features are conditionally independent given the class label.
    • Support Vector Machines (SVM): A supervised learning method that finds the optimal hyperplane to separate data points of different classes.
    • Kernel Trick: Implicitly maps the data into a higher-dimensional space, by computing inner products there through a kernel function, to solve non-linearly separable problems.
    • Linear Kernel: Used for linearly separable data.
    • Polynomial Kernel: Used for non-linearly separable data.
    • Gaussian Kernel: Also known as RBF (Radial Basis Function) kernel. Used for non-linearly separable data.
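The three kernels listed above can be written as plain functions, and a small check shows what the kernel trick buys. This is a sketch under stated assumptions: the parameter names `degree`, `coef0`, and `gamma`, the default values, and the helper `explicit_quadratic_features` are choices made for this example rather than a fixed convention.

```python
import math

def linear_kernel(x, y):
    """Ordinary dot product."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def explicit_quadratic_features(x):
    """Explicit feature map matching the degree-2 polynomial kernel with coef0=0 (2-D input)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 4.0)

# Kernel trick: the kernel value equals a dot product in the mapped space,
# without ever constructing the mapped vectors.
via_kernel = polynomial_kernel(x, y, degree=2, coef0=0.0)
via_mapping = linear_kernel(explicit_quadratic_features(x),
                            explicit_quadratic_features(y))
print(via_kernel, via_mapping)
print(rbf_kernel(x, y), rbf_kernel(x, x))
```

For the RBF kernel the corresponding feature space is infinite-dimensional, so evaluating the kernel directly is the only practical route.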
