Foundations of Machine Learning

UNIT 1 1. What is machine learning primarily concerned with? - A) Building algorithms that allow computers to learn from data - B) Programming computers to perform predefined tasks - C) Using computers to manually analyze data - D) Creating physical machines that can learn autonomously **Correct Answer:** A 2. Which of the following is an application of machine learning? - A) Predicting stock prices - B) Building bridges - C) Writing software code - D) Designing hardware components **Correct Answer:** A 3. In the context of machine learning, what is data? - A) Information that is used to train and test models - B) Raw hardware components - C) The result of a computation - D) A programming language **Correct Answer:** A 4. Which type of machine learning involves learning from labeled data? - A) Supervised Learning - B) Unsupervised Learning - C) Reinforcement Learning - D) Semi-supervised Learning **Correct Answer:** A 5. What does unsupervised learning aim to discover? - A) The best policy for decision making - B) The best model parameters in labeled data - C) Hidden patterns or structures in unlabeled data - D) The optimal value for an objective function **Correct Answer:** C 6. In reinforcement learning, what does an agent seek to maximize? - A) Computational speed - B) Data accuracy - C) Labelled datasets - D) Cumulative reward **Correct Answer:** D 7. What is meant by the feasibility of learning? - A) The amount of data required - B) The speed of the learning algorithm - C) The ability to learn effectively from available data - D) The complexity of the model **Correct Answer:** C 8. Which of the following represents a challenge in machine learning? - A) Increasing computational power - B) Managing error and noise in data - C) Designing hardware - D) Writing software code **Correct Answer:** B 9. 
**What is the primary purpose of training in machine learning?** - A) To interpret results - B) To collect data - C) To test the model - D) To build a model that can predict or classify data **Correct Answer:** D 10. What does the theory of generalization address? - A) How well a model performs on unseen data - B) The complexity of the learning algorithm - C) The speed of data processing - D) The amount of training data **Correct Answer:** A 11. Which term describes the discrepancy between training and testing performance? - A) Cross-validation - B) Overfitting - C) Underfitting - D) Bias-Variance Tradeoff **Correct Answer:** D 12. What is the learning curve in machine learning? - A) A measure of model complexity - B) A mathematical formula for learning rates - C) A graphical representation of model performance over time - D) A method for data transformation **Correct Answer:** C 13. Which distance metric calculates the straight-line distance between two points in Euclidean space? - A) Manhattan Distance - B) Euclidean Distance - C) Minkowski Distance - D) Cosine Similarity **Correct Answer:** B 14. What is the Minkowski distance formula used for? - A) Measuring distance in a generalized way that includes Euclidean and Manhattan distances as special cases - B) Measuring similarity between binary data - C) Calculating correlation between attributes - D) Determining the similarity of categorical data **Correct Answer:** A 15. How does the Jaccard coefficient measure similarity? - A) By calculating the average distance of data points - B) By calculating the angle between vectors - C) By measuring the straight-line distance between points - D) By comparing the proportion of matching positive attributes **Correct Answer:** D 16. Which similarity measure is most appropriate for comparing text documents of different lengths? - A) Jaccard Coefficient - B) Euclidean Distance - C) Cosine Similarity - D) Minkowski Distance **Correct Answer:** C 17. 
What does the Simple Matching Coefficient measure? - A) The proportion of matching attributes in binary data - B) The distance between two data points - C) The angle between two vectors - D) The correlation between two variables **Correct Answer:** A 18. Which metric is used to calculate the correlation between two attributes? - A) Pearson Correlation Coefficient - B) Jaccard Coefficient - C) Euclidean Distance - D) Cosine Similarity **Correct Answer:** A 19. What does the Cosine Similarity measure? - A) The straight-line distance between two points - B) The angle between two vectors - C) The number of matching attributes - D) The proportion of dissimilar attributes **Correct Answer:** B 20. Which of the following measures similarity between binary vectors? - A) Minkowski Distance - B) Simple Matching Coefficient - C) Euclidean Distance - D) Pearson Correlation Coefficient **Correct Answer:** B 21. What is a key advantage of using Euclidean Distance? - A) It scales well with high-dimensional data - B) It handles categorical data well - C) It measures similarity rather than dissimilarity - D) It is easy to compute and interpret in a continuous space **Correct Answer:** D 22. In what type of data is the Cosine Similarity particularly useful? - A) Categorical data - B) Numerical data - C) Text data - D) Time-series data **Correct Answer:** C 23. What does a decision tree model do? - A) Divides data into branches to make predictions based on feature values - B) Groups similar data points into clusters - C) Reduces dimensionality of the data - D) Calculates the distance between data points **Correct Answer:** A 24. Which algorithm is commonly used to create a decision tree? - A) Apriori - B) K-means - C) ID3 - D) SVM **Correct Answer:** C 25. What does a rule-based classifier use to make decisions? 
- A) The angle between vectors - B) The distance between data points - C) A set of if-then rules based on feature values - D) A set of clustered data points **Correct Answer:** C 26. In the K-Nearest Neighbors (KNN) algorithm, what does the parameter 'K' represent? - A) The distance metric used - B) The number of features in the dataset - C) The number of classes in the dataset - D) The number of nearest neighbors to consider **Correct Answer:** D 27. Which distance metric is commonly used in KNN? - A) Cosine Similarity - B) Manhattan Distance - C) Euclidean Distance - D) Jaccard Coefficient **Correct Answer:** C 28. What is a key consideration when choosing the value of K in KNN? - A) The number of features - B) A balance between underfitting and overfitting - C) The distance metric - D) The size of the training dataset **Correct Answer:** B 29. Which of the following is NOT an advantage of decision trees? - A) They handle both numerical and categorical data - B) They are easy to interpret - C) They always provide the best accuracy - D) They can be used for both classification and regression **Correct Answer:** C 30. What is the purpose of pruning a decision tree? - A) To reduce overfitting by removing branches that contribute little to the model - B) To increase the depth of the tree - C) To split nodes into more branches - D) To add more features to the model **Correct Answer:** A 31. How does a nearest-neighbor classifier make predictions? - A) By splitting data into clusters - B) By calculating the average of feature values - C) By finding the most common class among the nearest neighbors - D) By applying a set of decision rules **Correct Answer:** C 32. Which of the following is a limitation of KNN? - A) Complexity of the decision boundary - B) Difficulty in interpreting the model - C) Inability to handle categorical data - D) High computational cost during prediction **Correct Answer:** D 33. What does a rule-based classifier use to determine class labels? 
- A) A distance measure between data points - B) A set of predefined rules based on feature values - C) A clustering algorithm - D) A regression model **Correct Answer:** B 34. In decision trees, what is entropy used to measure? - A) The correlation between features - B) The distance between nodes - C) The impurity or randomness in a dataset - D) The precision of the model **Correct Answer:** C 35. Which criterion is often used to split nodes in decision trees? - A) Gini impurity or information gain - B) Euclidean distance or Manhattan distance - C) Cosine similarity or Jaccard coefficient - D) Mean squared error or mean absolute error **Correct Answer:** A 36. What is the goal of linear regression? - A) To find the line that best fits the data points - B) To classify data into categories - C) To cluster similar data points - D) To transform data into a different format **Correct Answer:** A 37. In simple linear regression, what does the slope of the line represent? - A) The mean of the residuals - B) The intercept of the line - C) The correlation between variables - D) The change in the dependent variable for a one-unit change in the independent variable **Correct Answer:** D 38. What does multiple linear regression allow you to do? - A) Reduce dimensionality of the data - B) Model non-linear relationships between variables - C) Predict binary outcomes - D) Model the relationship between a dependent variable and multiple independent variables **Correct Answer:** D 39. Which term refers to the error introduced when a model is too complex and fits the noise in the training data? - A) Overfitting - B) Underfitting - C) Bias - D) Variance **Correct Answer:** A 40. What is polynomial regression used for? - A) Reducing the number of features - B) Modeling non-linear relationships by including higher-degree terms - C) Classifying data into distinct categories - D) Clustering similar data points **Correct Answer:** B 41. 
In polynomial regression, what effect does increasing the polynomial degree have on the model? - A) It decreases the model's accuracy - B) It simplifies the model - C) It increases the model's flexibility, potentially leading to overfitting - D) It reduces the number of features **Correct Answer:** C 42. What is a common technique to assess the performance of a regression model? - A) Cross-validation - B) Clustering - C) Principal Component Analysis - D) Feature scaling **Correct Answer:** A 43. What is the purpose of adding an intercept term in a linear regression model? - A) To account for the baseline level of the dependent variable when all independent variables are zero - B) To increase the number of features - C) To reduce the error in predictions - D) To scale the features **Correct Answer:** A 44. Which of the following methods can be used to prevent overfitting in polynomial regression? - A) Reducing the training dataset size - B) Increasing the polynomial degree - C) Regularization techniques - D) Using more features **Correct Answer:** C 45. What does the R-squared value indicate in a regression model? - A) The proportion of variance in the dependent variable explained by the independent variables - B) The mean of the residuals - C) The slope of the regression line - D) The correlation between the dependent and independent variables **Correct Answer:** A 46. What does a residual represent in a regression model? - A) The correlation between variables - B) The model's complexity - C) The difference between the observed and predicted values - D) The mean value of the dependent variable **Correct Answer:** C 47. Which regression technique is suitable for handling high-dimensional datasets? - A) KNN Regression - B) Polynomial Regression - C) Decision Tree Regression - D) Ridge Regression **Correct Answer:** D 48. What is the main advantage of using regularization in regression models? 
- A) It simplifies the model - B) It helps to prevent overfitting by penalizing large coefficients - C) It increases the model's accuracy - D) It reduces the number of features **Correct Answer:** B 49. Which type of regression is used when the relationship between variables is non-linear but can be approximated with polynomial terms? - A) Linear Regression - B) Polynomial Regression - C) Logistic Regression - D) Ridge Regression **Correct Answer:** B 50. In multiple linear regression, what does multicollinearity refer to? - A) The presence of outliers in the data - B) High correlation between the dependent and independent variables - C) High correlation between independent variables - D) The complexity of the model **Correct Answer:** C 51. Which metric is commonly used to evaluate regression model performance? - A) Jaccard Index - B) Gini Index - C) Silhouette Score - D) Mean Squared Error (MSE) **Correct Answer:** D 52. What is the effect of increasing the number of features in a regression model? - A) It may lead to overfitting if not managed properly - B) It always improves model performance - C) It simplifies the model - D) It eliminates the need for regularization **Correct Answer:** A 53. What does a high p-value in regression analysis indicate about a predictor variable? - A) The variable is highly correlated with other predictors - B) The variable has a strong effect on the outcome - C) The variable may not be a significant predictor of the outcome - D) The variable improves the model's fit **Correct Answer:** C 54. Which type of regression can be used when the dependent variable is categorical? - A) Logistic Regression - B) Linear Regression - C) Polynomial Regression - D) Ridge Regression **Correct Answer:** A 55. What is the main purpose of using cross-validation in regression? 
- A) To increase the model's complexity - B) To assess the model's performance on unseen data - C) To transform the data - D) To reduce the dimensionality of the data **Correct Answer:** B 56. What does an increase in the variance of a regression model indicate? - A) The model may be overfitting the training data - B) The model is underfitting the data - C) The model is generalizing well - D) The model is not complex enough **Correct Answer:** A 57. Which of the following techniques can be used to handle multicollinearity in multiple linear regression? - A) Using non-linear regression models - B) Adding more features - C) Regularization (e.g., Lasso, Ridge) - D) Increasing the sample size **Correct Answer:** C 58. What is the main difference between simple linear regression and multiple linear regression? - A) Simple linear regression uses one independent variable, while multiple linear regression uses multiple independent variables - B) Simple linear regression is used for classification tasks - C) Multiple linear regression assumes a non-linear relationship - D) Simple linear regression uses polynomial terms **Correct Answer:** A 59. What is the purpose of residual plots in regression analysis? - A) To assess the fit of the regression model by plotting residuals against predicted values or other variables - B) To calculate the R-squared value - C) To determine the significance of predictors - D) To normalize the data **Correct Answer:** A 60. Which method is used to estimate the parameters in a linear regression model? - A) Support Vector Machines (SVM) - B) K-Means Clustering - C) Decision Trees - D) Ordinary Least Squares (OLS) **Correct Answer:** D 61. 
**In polynomial regression, what effect does adding more polynomial terms have?** - A) It simplifies the model - B) It can increase the model's ability to capture complex patterns, but may also lead to overfitting - C) It reduces the model's complexity - D) It eliminates the need for regularization **Correct Answer:** B 62. Which of the following is a common method for regularization in regression models? - A) Principal Component Analysis - B) K-Means Clustering - C) Lasso Regression - D) Hierarchical Clustering **Correct Answer:** C 63. What is the purpose of feature scaling in regression? - A) To reduce the model's complexity - B) To increase the number of features - C) To standardize the range of independent variables - D) To improve the accuracy of predictions **Correct Answer:** C 64. What does the term "model interpretability" refer to in regression analysis? - A) The ability to understand and explain the model's predictions and decision-making process - B) The accuracy of the model - C) The computational efficiency of the model - D) The ability to handle large datasets **Correct Answer:** A 65. Which of the following can be used to visualize the fit of a regression model? - A) Scatter plot with regression line - B) Heatmap - C) Bar chart - D) Pie chart **Correct Answer:** A 66. In multiple linear regression, what does a high variance in the residuals suggest? - A) The model has low bias - B) The model is underfitting the data - C) The model is generalizing well - D) The model may be overfitting the training data **Correct Answer:** D 67. What is the primary goal of regression analysis? - A) To model the relationship between a dependent variable and one or more independent variables - B) To cluster similar data points - C) To classify data into categories - D) To transform the data into a different format **Correct Answer:** A 68. Which of the following is a benefit of using polynomial regression over simple linear regression? 
- A) Ability to model non-linear relationships - B) Simplicity of the model - C) Lower computational cost - D) Better performance with categorical data **Correct Answer:** A 69. Which regression method is used when you want to account for collinearity among predictors? - A) KNN Regression - B) Polynomial Regression - C) Logistic Regression - D) Ridge Regression **Correct Answer:** D 70. In linear regression, what does the term "least squares" refer to? - A) Minimizing the sum of the squared residuals - B) Maximizing the likelihood of the data - C) Reducing the number of features - D) Measuring the correlation between variables **Correct Answer:** A 71. Which regression technique is used when the response variable is binary? - A) Polynomial Regression - B) Linear Regression - C) Logistic Regression - D) Ridge Regression **Correct Answer:** C 72. What is the purpose of the coefficient of determination (R-squared)? - A) To explain the proportion of variance in the dependent variable that is predictable from the independent variables - B) To calculate the mean error of predictions - C) To determine the best-fit line for the data - D) To select the optimal number of features **Correct Answer:** A 73. Which of the following is an advantage of polynomial regression? - A) It always produces a linear model - B) It can capture non-linear relationships between variables - C) It reduces the complexity of the model - D) It eliminates the need for feature scaling **Correct Answer:** B 74. What does the term "heteroscedasticity" refer to in regression analysis? - A) The accuracy of predictions - B) The model's performance on different datasets - C) The correlation between independent variables - D) The variance of the residuals changes across levels of an independent variable **Correct Answer:** D 75. What technique is used to evaluate the robustness of a regression model? 
- A) Data transformation - B) Feature selection - C) Cross-validation - D) Dimensionality reduction **Correct Answer:** C UNIT-2 Clustering Analysis 1. What is the primary objective of k-Means clustering in data mining? a) Classification b) Regression c) Clustering d) Association rule mining 2. How does the k-Means algorithm initialize cluster centroids? a) Randomly b) Using the mean of all data points c) Based on the median data point d) By choosing the farthest data points 3. What is the role of the ‘k’ parameter in the k-Means algorithm? a) Number of clusters to be formed b) Distance metric used for clustering c) Learning rate for centroid updates d) Number of iterations for convergence 4. How does the k-Means algorithm update cluster centroids during each iteration? a) By calculating the mean of all data points in each cluster b) By choosing the data point closest to the centroid c) By merging clusters with similar centroids d) By selecting the most central data point 5. What is a major limitation of the k-Means algorithm? a) It cannot handle large datasets b) It requires a large number of clusters c) It is sensitive to initial centroid positions d) It cannot handle categorical data 6. How does the k-Means algorithm determine convergence? a) When the centroids stop moving significantly between iterations b) When all data points are assigned to a cluster c) After a fixed number of iterations d) When the number of clusters equals ‘k’ 7. Which distance metric is commonly used in the k-Means algorithm? a) Manhattan distance b) Hamming distance c) Euclidean distance d) Cosine similarity 8. What is the computational complexity of the k-Means algorithm? a) O(n) b) O(n log n) c) O(n^2) d) O(n*k) 9. Which of the following methods can help improve the performance of the k-Means algorithm? a) Using a larger value of ‘k’ b) Reducing the number of iterations c) Scaling the data to have equal variance d) Initializing centroids close to the mean of the data 10. 
What is the main advantage of the k-Means algorithm? a) It is robust to outliers b) It guarantees convergence to the global optimum c) It can handle non-linear data d) It is computationally efficient Answer: d) It is computationally efficient 13. Which is needed by K-means clustering? (A). defined distance metric (B). number of clusters (C). initial guess as to cluster centroids (D). all of these 14. Which function is used for k-means clustering? (A). k-means (B). k-mean (C). heatmap (D). none of the mentioned 15. Which is conclusively produced by Hierarchical Clustering? (A). final estimation of cluster centroids (B). tree showing how nearby things are to each other (C). assignment of each point to clusters (D). all of these 16. Which clustering technique requires a merging approach? (A). Partitional (B). Hierarchical (C). Naive Bayes (D). None of the mentioned 17. What is the primary objective of Hierarchical Clustering in data mining? a) To classify data points into predefined clusters b) To find the optimal number of clusters automatically c) To reduce the dimensionality of the data d) To perform regression analysis 18. How does Hierarchical Clustering initially treat each data point? a) As a separate cluster b) By assigning it to the nearest centroid c) By randomly assigning it to a cluster d) By calculating its distance to all other points 19. What is the main difference between agglomerative and divisive hierarchical clustering? a) Agglomerative starts with all data points in one cluster, while divisive starts with each point as a separate cluster b) Agglomerative merges clusters, while divisive splits clusters c) Agglomerative uses Euclidean distance, while divisive uses Manhattan distance d) Agglomerative is faster than divisive 20. Which linkage method in hierarchical clustering merges clusters based on the minimum distance between points in each cluster? a) Single linkage b) Complete linkage c) Average linkage d) Ward’s linkage 21. 
What is the dendrogram used for in hierarchical clustering? a) To visualize the data points b) To display the distance between clusters at each merge step c) To select the optimal number of clusters d) To measure the silhouette coefficient 22. Which method is used to determine the number of clusters in hierarchical clustering? a) Silhouette coefficient b) Elbow method c) Dendrogram d) F-measure 23. What does the “linkage distance” measure in hierarchical clustering? a) The distance between data points within a cluster b) The distance between centroids of clusters c) The distance between clusters at each merge step d) The number of iterations until convergence 24. In hierarchical clustering, what does “Ward’s method” prioritize during cluster merging? a) Minimizing the maximum variance within clusters b) Minimizing the sum of squared differences within clusters c) Minimizing the sum of squared differences between clusters d) Maximizing the silhouette coefficient 25. Which of the following is an advantage of hierarchical clustering? a) It requires less computational resources than k-means clustering b) It guarantees convergence to the global optimum c) It is scalable to large datasets d) It does not require specifying the number of clusters in advance 26. What is a potential drawback of hierarchical clustering? a) It is computationally expensive for large datasets b) It cannot handle categorical data c) It is sensitive to the initial placement of centroids d) It requires normalization of data 27. What is DBSCAN primarily used for in data mining? a) Regression analysis b) Clustering spatial data c) Dimensionality reduction d) Classification 28. How does DBSCAN determine the core points in a dataset? 
a) By calculating the mean of all data points b) By measuring the distance to the nearest centroid c) By counting the number of points within a specified radius (ε) d) By applying principal component analysis Answer: c) By counting the number of points within a specified radius (ε) 29. What does the ε parameter control in DBSCAN? a) The number of clusters to form b) The minimum number of points required to form a cluster c) The maximum distance between two points to be considered neighbours d) The number of iterations for convergence 30. What is the significance of the MinPts parameter in DBSCAN? a) It defines the maximum radius for clustering b) It determines the number of clusters c) It specifies the minimum number of points required to form a dense region d) It measures the density of each point 31. How does DBSCAN handle noise points in a dataset? a) By assigning noise points to the nearest cluster b) By removing noise points from the dataset c) By ignoring noise points during clustering d) By treating noise points as a separate cluster 32. Which of the following statements about DBSCAN is true? a) It requires the number of clusters to be known in advance b) It is sensitive to the order of data points c) It can only handle numerical data d) It uses a hierarchical approach for clustering 33. What is the primary advantage of DBSCAN compared to k-means clustering? a) It is faster and more scalable for large datasets b) It guarantees convergence to the global optimum c) It does not require the number of clusters to be specified beforehand d) It handles non-linear relationships between data points 34. What is the computational complexity of DBSCAN? a) O(n log n) b) O(n^2) c) O(n) d) O(n*k) 35. Which type of clusters can DBSCAN effectively identify? a) Clusters of equal sizes b) Clusters with arbitrary shapes and sizes c) Clusters with high-dimensional data d) Clusters with categorical data 36. What is a key limitation of DBSCAN? 
a) It is sensitive to noise and outliers b) It requires a large number of iterations for convergence c) It cannot handle high-dimensional data d) It is computationally expensive for large datasets 37. What is the primary purpose of the Apriori algorithm in data mining? a) Classification b) Regression c) Association rule mining d) Clustering 38. Which of the following is the first step in the Apriori algorithm? a) Generate candidate itemsets b) Calculate confidence c) Prune non-frequent itemsets d) Generate frequent itemsets 39. In the context of the Apriori algorithm, what is ‘support’? a) The ratio of the number of transactions that contain an itemset to the total number of transactions b) The probability of an itemset occurring given another itemset c) The strength of an association rule d) The total number of items in a transaction 40. How does the Apriori algorithm generate candidate itemsets? a) By randomly selecting items b) By using frequent itemsets from the previous iteration c) By clustering similar items d) By sorting items based on their frequency 41. What is the purpose of the pruning step in the Apriori algorithm? a) To generate new candidate itemsets b) To remove infrequent itemsets c) To calculate the confidence of rules d) To sort the itemsets 42. In the Apriori algorithm, which property helps reduce the number of candidate itemsets? a) Monotonicity property b) Anti-monotonicity property c) Transitivity property d) Symmetry property 43. What does the ‘confidence’ measure in the context of association rules? a) The frequency of the rule in the dataset b) The accuracy of the rule c) The probability that the rule’s consequent is true given the antecedent d) The number of items in the rule 44. If an itemset is frequent, what can be said about its subsets in the context of the Apriori algorithm? a) They must also be frequent b) They can be infrequent c) They are irrelevant d) They are less important 45. 
What does the term ‘lift’ indicate in association rule mining? a) The number of items in a transaction b) The strength of a rule over random occurrence c) The total number of transactions d) The sum of item frequencies 46. Which of the following is a limitation of the Apriori algorithm? a) It cannot handle large datasets b) It requires multiple database scans c) It does not support numeric data d) It cannot generate rules with more than two items 47. What does the “FP” in FP-Growth stand for? a) Frequent Pattern b) Fast Processing c) Fixed Pattern d) Frequent Path 48. What is the primary purpose of the FP-Growth algorithm in data mining? a) Clustering b) Regression c) Association rule mining d) Classification 49. Which data structure is central to the FP-Growth algorithm? a) Decision tree b) Hash table c) FP-Tree (Frequent Pattern Tree) d) Graph 50. How does the FP-Growth algorithm differ from the Apriori algorithm in finding frequent itemsets? a) FP-Growth uses candidate generation b) FP-Growth uses depth-first search c) FP-Growth does not generate candidate itemsets d) FP-Growth uses a horizontal data format 51. What is the first step in the FP-Growth algorithm? a) Construct the FP-Tree b) Generate candidate itemsets c) Sort the items by frequency d) Prune infrequent itemsets 52. In the FP-Growth algorithm, what does the “conditional FP-Tree” represent? a) A subtree with items that meet a minimum support threshold b) A tree that shows all possible item combinations c) A tree built for each frequent item with conditional transactions d) A pruned version of the original FP-Tree 53. What is a major advantage of the FP-Growth algorithm over the Apriori algorithm? a) It uses a breadth-first search b) It avoids multiple database scans c) It generates fewer association rules d) It is easier to implement 54. What is the key purpose of the header table in the FP-Growth algorithm? 
a) To store the frequency of each item b) To link nodes in the FP-Tree with the same item c) To sort items in the FP-Tree d) To merge identical transactions 55. Which operation is critical for constructing the FP-Tree in the FP-Growth algorithm? a) Sorting transactions b) Calculating confidence c) Intersecting itemsets d) Pruning the tree 56. What is a key limitation of the FP-Growth algorithm? a) High computational cost for candidate generation b) Difficulty in handling large datasets c) Large memory requirement for constructing the FP-Tree d) Slow processing speed 57. Which of the following is a goal of clustering algorithms? a. Classification b. Regression c. Dimensionality reduction d. Grouping similar data points together 58. Which clustering algorithm is based on the concept of centroids? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 59. Which clustering algorithm does not require specifying the number of clusters in advance? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 60. Which clustering algorithm is sensitive to the order of the data points? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 61. Which clustering algorithm is based on a density-based approach? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 62. Which clustering algorithm uses a hierarchical approach to create clusters? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 63. Which clustering algorithm is based on the concept of minimizing the within-cluster variance? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 64. Which clustering algorithm is capable of detecting outliers as noise points? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 65. Which clustering algorithm is suitable for non-linearly separable data? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 66. Which clustering algorithm assigns data points to the nearest cluster centroid? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 67. 
Which clustering algorithm is computationally efficient for large datasets? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 68. Which clustering algorithm can handle clusters of varying shapes and sizes? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 69. Which clustering algorithm does not require the assumption of equal-sized clusters? a. K-Means b. DBSCAN c. Agglomerative d. Mean-Shift 70. Which clustering algorithm is based on the concept of nearest neighbors? a. K-Means b. DBSCAN c. Agglomerative d. K-Nearest Neighbors 1. **Which of the following assumptions does the Naïve Bayes classifier make about features?** - A) Features are dependent on each other. - B) Features are independent given the class label. - C) Features are mutually exclusive. - D) Features are independent and mutually exclusive. - **Correct Option:** B 2. **Which probability is calculated in the Naïve Bayes algorithm to classify a new data point?** - A) Joint probability - B) Conditional probability - C) Marginal probability - D) Posterior probability - **Correct Option:** D 3. **What is the key equation used in Bayes' Theorem?** - **Correct Option:** C 4. **In a Naïve Bayes classifier, the class with the highest _____ is chosen as the predicted class.** - A) Prior probability - B) Posterior probability - C) Likelihood - D) Joint probability - **Correct Option:** B 5. **Which of the following is an advantage of the Naïve Bayes classifier?** - A) It works well with small datasets only. - B) It can handle continuous and categorical data well. - C) It does not require independence of features. - D) It is highly computationally expensive. - **Correct Option:** B 6. **Which kernel function is commonly used in Support Vector Machines (SVM) to handle non-linearly separable data?** - A) Linear kernel - B) Polynomial kernel - C) Radial Basis Function (RBF) kernel - D) Sigmoid kernel - **Correct Option:** C 7. 
**What is the objective of the Support Vector Machine (SVM) algorithm?** - A) Maximize the number of support vectors - B) Maximize the distance between the decision boundary and support vectors - C) Minimize the distance between all data points - D) Minimize the number of support vectors - **Correct Option:** B 8. **In SVM, what is the 'kernel trick' used for?** - A) To convert SVM into a linear classifier - B) To increase the size of the dataset - C) To transform data into a higher-dimensional space - D) To reduce computational complexity - **Correct Option:** C 9. **Which of the following is NOT a commonly used kernel in SVM?** - A) Linear - B) Gaussian - C) Logistic - D) Polynomial - **Correct Option:** C 10. **In Bayes' Theorem, the term \( P(B) \) in the formula \( P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(B)} \) is known as what?** - A) Posterior probability - B) Likelihood - C) Marginal probability - D) Prior probability - **Correct Option:** C 11. **Which of the following statements about Naïve Bayes is true?** - A) It performs well on small datasets with complex structures. - B) It assumes that features are conditionally dependent given the class. - C) It is highly sensitive to irrelevant features. - D) It is robust to noise and irrelevant features. - **Correct Option:** D 12. **In SVM, what is a 'support vector'?** - A) A vector that defines the direction of the decision boundary - B) A data point that is closest to the decision boundary - C) A vector that maximizes the margin between classes - D) A vector that reduces the margin between classes - **Correct Option:** B These questions cover the fundamentals of Naïve Bayes classifiers, Bayes' Theorem, and Support Vector Machines (SVM), including key concepts, advantages, and mechanisms. UNIT-3 MCQ 1. Which of the following is NOT a component of a basic artificial neuron? A. Input weights B. Activation function C. Loss function D. Bias term 2. 
Which activation function is commonly used in the output layer of a binary classification neural network? A. ReLU B. Sigmoid C. Tanh D. Softmax 3. What is the main purpose of the activation function in a neural network? A. To scale the input data B. To introduce non-linearity into the model C. To adjust the learning rate D. To compute the loss 4. What is a Perceptron? A. A type of support vector machine B. The simplest form of a neural network C. A deep learning model D. A clustering algorithm 5. Which of the following is true about the Perceptron algorithm? A. It can only solve non-linear problems B. It uses a gradient descent approach for weight updates C. It is guaranteed to converge if the data is D. It requires many hidden layers to linearly separable work effectively 6. What role does the bias term play in a neural network? A. It adjusts the learning rate of the network B. It controls the magnitude of the input signals C. It allows the model to better fit the data by D. It reduces the overfitting of the shifting the activation function model 7. Which of the following best describes the backpropagation algorithm? A. It is used to initialize the weights in a neural B. It is a method for optimizing network the weights using the chain rule C. It is a method for increasing the complexity of the D. It is a regularization technique network used in training 8. What is the purpose of using the gradient descent method in training neural networks? A. To select the optimal architecture for the network B. To minimize the loss function by iteratively adjusting weights C. To initialize the weights of the network D. To prevent overfitting 9. What happens if the learning rate is set too high during training? A. The model will learn very slowly B. The model may oscillate and fail to converge C. The model will be overfit to the training data D. The model will become too simple and underfit 10. In a neural network, what is the purpose of using a loss function? A. 
To determine the learning rate B. To measure the difference between the predicted and actual output C. To regularize the model D. To add non-linearity to the network 11. Which of the following is true about multi-layer perceptrons (MLPs)? A. They can solve both linear and non-linear B. They do not require an activation problems function C. They are the same as a single-layer perceptron D. They cannot be used for regression tasks 12. Which technique is used to prevent overfitting in neural networks? A. Increasing the number of epochs B. Reducing the number of layers C. Using dropout during training D. Decreasing the learning rate 13. What is the primary purpose of ensemble learning? A. To increase the accuracy of predictions by B. To reduce the complexity of combining multiple models individual models C. To simplify the model training process D. To reduce the amount of data required 14. In a Random Forest algorithm, how are individual trees typically trained? A. On the entire dataset B. On different subsets of the dataset with replacement (bootstrapping) C. On the same subset of the dataset D. On a random selection of features 15. Which of the following is a key advantage of the Random Forest algorithm? A. High interpretability of the model B. Reduced risk of overfitting compared to individual decision trees C. Reduced computational complexity D. Simplicity in understanding the decision-making process 16. What does the term 'bagging' stand for in the context of ensemble learning? A. Boosting aggregated generalized estimators B. Bootstrap aggregating C. Binary aggregation D. Base aggregating 17. Random Forest can handle which of the following types of data? A. Only numerical data B. Only categorical data C. Both numerical and categorical data D. Only text data 18. Bagging helps to reduce variance by A. Training on different subsets of the data B. Increasing the complexity of individual models C. Combining different types of models D. 
Using only one model type repeatedly 19. Boosting works by A. Training multiple models independently and B. Sequentially training models combining their predictions where each model tries to correct the errors of the previous one C. Using a single decision tree to predict outcomes D. Aggregating the predictions of a single model with varying parameters 20. Which of the following algorithms is a popular example of boosting? A. K-Nearest Neighbors B. AdaBoost C. Support Vector Machine D. Linear Regression Which statement accurately describes the AdaBoost algorithm? A. It builds a single, complex model to achieve high B. It combines predictions from accuracy multiple models using simple 21. average C. It adjusts the weights of misclassified D. It performs random sampling of samples to focus learning on difficult cases features to train models Which of the following is a key advantage of AdaBoost? A. It can significantly improve the performance B. It reduces the need for extensive 22. of weak learners data preprocessing C. It simplifies the model interpretation process D. It eliminates the need for hyperparameter tuning Which of the following best describes nonparametric methods in statistics? A. Methods that assume a specific form for the B. Methods that do not assume a underlying distribution specific form for the underlying 23. distribution C. Methods that require parameter estimation D. Methods that use a fixed number of parameters In nonparametric density estimation, the primary goal is to A. Estimate the mean of a population B. Determine the exact distribution 24. of the data C. Estimate the probability density function of D. Identify the variance of the data the data Which of the following is a common nonparametric density estimation method? 25. A. Linear regression B. Kernel density estimation C. Logistic regression D. ANOVA 26. A histogram estimator is based on A. Fitting a parametric model to the data B. 
Dividing the data into intervals and counting the number of observations in each interval C. Calculating the mean and variance of the data D. Applying a smoothing function to the data 27. Which of the following is a drawback of using histograms for density estimation? A. They are computationally expensive B. They are highly sensitive to the choice of bin width C. They require large sample sizes to be accurate D. They cannot handle large datasets 28. In kernel density estimation, the choice of kernel function affects A. The mean of the distribution B. The variance of the data C. The smoothness of the estimated density D. The number of bins used in the histogram 29. Which of the following is a common kernel function used in kernel density estimation? A. Exponential kernel B. Gaussian kernel C. Poisson kernel D. Binomial kernel 30. In k-nearest neighbor density estimation, the parameter 'k' represents A. The number of data points used for density B. The number of nearest estimation neighbors considered for each point C. The width of the kernel function D. The number of bins in the histogram 31. The main advantage of k-nearest neighbor density estimation is A. It requires many parameters B. It provides a smooth density estimate with a small number of neighbors C. It is simple and non-parametric D. It assumes a specific distribution shape 32. Which nonparametric method is most suitable for handling large datasets with unknown distribution shapes? A. Histogram estimator B. Kernel density estimator C. Parametric regression D. Logistic regression 33. What is the primary difference between histogram estimators and kernel estimators? A. Histograms use bins while kernels use B. Histograms require kernel smoothing functions functions while kernel estimators use bins C. Kernels are more sensitive to bin width than D. Histograms can handle large histograms datasets better than kernels 34. The choice of bandwidth in kernel density estimation affects A. 
The number of nearest neighbors used B. The size of the bins in histograms C. The smoothness versus the bias-variance D. The computational complexity of tradeoff of the density estimate the estimator 35. Which of the following is true about nonparametric methods in general? A. They always require more data than parametric B. They do not rely on methods assumptions about the form of the distribution C. They cannot handle large datasets efficiently D. They are only applicable to discrete data 36. In the context of nonparametric density estimation, what does the term 'smoothing' refer to? A. Reducing the number of data points used B. Adjusting the bandwidth or window size to control the density estimate’s smoothness C. Increasing the number of bins in a histogram D. Using a parametric model to fit the data 37. For a given dataset, increasing the number of bins in a histogram estimator typically results in A. A smoother density estimate B. A more jagged density estimate C. No change in the density estimate D. A less accurate estimate of the density 38. Which metric is most useful when dealing with an imbalanced dataset? A. Accuracy B. Precision C. Recall D. F1 Score 39. What does the precision of a model indicate? A. The proportion of true positive predictions B. The proportion of true positive out of all positive predictions predictions out of all actual positives C. The proportion of true negative predictions out of D. The proportion of false positive all negative predictions predictions out of all predictions 40. Which metric is calculated as the harmonic mean of precision and recall? A. Accuracy B. F1 Score C. AUC D. Precision 41. If a model has high recall but low precision, what does this imply? A. The model is good at predicting positives but B. The model is very accurate in its also predicts many false positives predictions C. The model only predicts negatives D. The model has equal performance in all classes 42. 
Which metric would be most useful to evaluate a model’s ability to rank positive instances higher than negative instances? A. Accuracy B. AUC (Area Under the ROC Curve) C. Precision D. Recall 43. What does the recall metric measure? A. The proportion of true positives among all B. The proportion of false positives actual positives among all predicted positives C. The proportion of true negatives among all actual D. The proportion of false negatives negatives among all actual positives 44. What does a higher AUC value indicate about a model? A. The model has higher precision B. The model has a higher recall C. The model performs better at distinguishing D. The model has higher accuracy between classes 45. Which of the following metrics is affected the most by class imbalance? A. Accuracy B. Recall C. Precision D. F1 Score What does 'false negative' refer to in a confusion matrix? A. An instance where the model incorrectly B. An instance where the model predicts the negative class when it is positive correctly predicts the negative 46. class C. An instance where the model incorrectly predicts D. An instance where the model the positive class when it is negative correctly predicts the positive class In which case is the F1 Score being beneficial? A. When false positives and false negatives B. When accuracy is sufficient for 47. have the same cost evaluation C. When only precision is important D. When the model’s speed is the primary concern If the model’s AUC score is 0.75, what does this indicate? A. The model has a random performance B. The model is better than 48. random but not excellent C. The model has perfect performance D. The model is performing poorly How do you interpret a precision of 0.8? A. 80% of the predicted positives are true B. 80% of the actual positives are 49. positives correctly predicted C. 80% of the actual negatives are correctly D. 80% of the predictions are correct predicted What does a high false positive rate indicate? A. 
Many negative instances are incorrectly B. Many positive instances are 50. classified as positive incorrectly classified as negative C. The model has high precision D. The model has a high recall 51. What does the Mean Squared Error (MSE) measure? A) The average of the squared differences B) The square root of the sum of between actual and predicted values squared differences between actual and predicted values C) The average of absolute differences between D) The square of the mean of actual and predicted values absolute differences between actual and predicted values 52. Which regression metric is more appropriate for predicting a model where the target variable has a skewed distribution? A) MSE B) MAE C) RMSE D) RMSLE 53. Which metric measures how much predictions deviate from actual values on average, ignoring the direction of the error? A) MSE B) MAE C) RMSE D) RMSLE 54. If the Mean Absolute Error (MAE) for a regression model is 5, what is the interpretation? A) On average, the predictions are off by 5 units. B) The square root of average squared errors is 5 units. C) The logarithmic error is 5. D) The squared error is 5. 55. Which metric penalizes larger errors more heavily? A) MSE B) MAE C) RMSE D) RMSLE 56. What does a lower RMSE indicate in a regression model? A) Better model performance B) Worse model performance C) More variance in the data D) Higher sensitivity to outliers 57. When comparing models with different error distributions, which metric is least likely to be influenced by outliers? A) MSE B) MAE C) RMSE D) RMSLE 58. Which metric is most appropriate when dealing with large-scale data where the target variable varies exponentially? A) MSE B) MAE C) RMSE D) RMSLE 59. What happens to the MAE if all the predictions are off by a constant amount in the same direction? A) It remains the same. B) It increases by a constant amount. C) It decreases by a constant amount. D) It becomes zero. 60. 
If the MAE for a model is 0, what can be inferred about the predictions? A) Predictions are exactly equal to the actual B) Predictions are perfectly values. uncorrelated with the actual values. C) Predictions are exponentially related to the actual D) Predictions are entirely random. values.
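
As a study aid for the regression-metric questions above, the following is a minimal sketch of the standard definitions of MSE, RMSE, MAE, and RMSLE in plain Python. The function names and the sample numbers are illustrative assumptions, not part of the question bank, and the RMSLE version assumes non-negative values and uses the common log(1 + x) form.

```python
import math

def mse(actual, predicted):
    """Mean Squared Error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean Absolute Error: average error size, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error (assumes values >= 0)."""
    squared_log_errors = [
        (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))

# Hypothetical data: the last prediction is a large miss (error of 10).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 30.0]

print("MAE  :", mae(actual, predicted))
print("MSE  :", mse(actual, predicted))
print("RMSE :", rmse(actual, predicted))
print("RMSLE:", rmsle(actual, predicted))
```

In this sample the single large miss contributes 100 of the 117 total squared error but only 10 of the 17 total absolute error, which illustrates why squared-error metrics penalize large errors more heavily than MAE does.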
