Machine Learning MCQ PDF
Summary
This document contains multiple-choice questions (MCQs) on machine learning topics, including decision trees, clustering, and regression algorithms. The questions cover these techniques from basic definitions to more advanced concepts and application scenarios.
Full Transcript
1. What is a decision tree in machine learning?
a) A clustering algorithm b) A regression and classification model c) A dimensionality reduction technique d) A data cleaning tool
Answer: b

2. Which of the following is a key component of a decision tree?
a) Branches and Leaves b) Principal components c) Clusters d) Hidden layers
Answer: a

3. What is a leaf node in a decision tree?
a) The starting point of the tree b) A node that contains features c) A terminal node with an output value d) A node that splits data into branches
Answer: c

4. Which of the following metrics is used to measure impurity in a decision tree?
a) R-squared b) Entropy c) Learning rate d) Precision
Answer: b

5. What is the splitting criterion in a decision tree?
a) Method for pruning the tree b) Rule for dividing data at each node c) Algorithm to find the shortest path d) Way to calculate prediction accuracy
Answer: b

6. Which algorithm is commonly used to build a decision tree?
a) K-means b) CART c) Gradient Descent d) Naive Bayes
Answer: b

7. What does Gini Impurity represent in a decision tree?
a) The squared error for splits b) The probability of incorrect classification at a node c) The total entropy in the system d) The number of leaf nodes in the tree
Answer: b

8. Which of the following is true for decision tree pruning?
a) It increases the depth of the tree b) It is used to reduce overfitting c) It reduces the training accuracy d) It adds more branches to the tree
Answer: b

9. In a decision tree, what is the purpose of the "information gain"?
a) To measure data noise b) To evaluate the importance of features c) To determine the best attribute to split d) To minimize classification errors
Answer: c

10. Which of these is NOT an advantage of decision trees?
a) Easy to interpret b) Handles both categorical and numerical data c) High computational efficiency for deep trees d) Requires little data preprocessing
Answer: c

11. How does Random Forest improve upon decision trees?
a) By using bagging and multiple trees b) By reducing the number of features c) By applying boosting to a single tree d) By using unsupervised learning
Answer: a

12. What is "overfitting" in the context of decision trees?
a) A tree that is too small and generalizes poorly b) A tree that is too complex and memorizes the training data c) A tree with high test accuracy and low training accuracy d) A tree with too many categorical features
Answer: b

13. What does "entropy" measure in a decision tree?
a) Homogeneity of the data b) The size of the dataset c) The accuracy of the model d) The number of leaf nodes
Answer: a

14. What is the CART algorithm?
a) Classification and Regression Trees b) Clustered Attribute Reduction Technique c) Categorical and Random Thresholds d) Computational Attribute Ranking Tool
Answer: a

15. How does increasing the depth of a decision tree affect its performance?
a) Reduces overfitting b) Reduces variance c) Increases training accuracy but risks overfitting d) Reduces complexity
Answer: c

16. Which of these is a method for preventing overfitting in decision trees?
a) Increasing the number of features b) Early stopping during tree construction c) Using a single tree with all features d) Ignoring numerical data
Answer: b

17. What is the difference between classification and regression trees?
a) Classification trees are used for numerical predictions b) Regression trees predict continuous values, classification trees predict categories c) Classification trees use boosting, regression trees do not d) Regression trees require more features than classification trees
Answer: b

18. Which parameter controls the maximum depth of a decision tree in scikit-learn?
a) max_depth b) min_samples_split c) criterion d) max_features
Answer: a

19. What is the role of the min_samples_split parameter in decision tree construction?
a) Controls the minimum number of samples per branch b) Determines the maximum tree depth c) Specifies the minimum number of samples required to split a node d) Reduces the computational time by pruning
Answer: c

20. What is clustering in machine learning?
a) A supervised learning technique b) A dimensionality reduction method c) A method to group similar data points d) A type of regression analysis
Answer: c

21. Which of the following is a clustering algorithm?
a) Logistic Regression b) K-means c) Random Forest d) Decision Tree
Answer: b

22. In clustering, what is the primary goal?
a) To predict future values b) To group data into clusters based on similarity c) To classify data into predefined categories d) To minimize classification errors
Answer: b

23. Which type of learning does clustering fall under?
a) Supervised learning b) Unsupervised learning c) Semi-supervised learning d) Reinforcement learning
Answer: b

24. Which of the following is NOT a clustering algorithm?
a) DBSCAN b) Hierarchical clustering c) Linear Regression d) Gaussian Mixture Model
Answer: c

25. What is the role of the "centroid" in K-means clustering?
a) It determines the distance between points b) It is the center of a cluster c) It evaluates the model's accuracy d) It reduces dimensionality
Answer: b

26. Which of the following is a density-based clustering algorithm?
a) K-means b) DBSCAN c) Hierarchical clustering d) PCA
Answer: b

27. Which of the following metrics can be used to calculate the distance between points in clustering?
a) Mean squared error b) Euclidean distance c) Gini impurity d) Precision
Answer: b

28. What does the term "elbow method" refer to in clustering?
a) A technique for reducing noise b) A method for determining the optimal number of clusters c) A way to split clusters in hierarchical clustering d) A metric for measuring density in DBSCAN
Answer: b

29. Which of these problems might occur in K-means clustering?
a) Difficulty with linearly separable data b) Misclassification due to labeled data c) Sensitivity to initial centroid placement d) High computational complexity for shallow trees
Answer: c

30. What is a primary limitation of K-means clustering?
a) It cannot handle high-dimensional data b) It assumes clusters are spherical and equal in size c) It is too slow for small datasets d) It does not require a pre-specified number of clusters
Answer: b

31. In clustering, what does a dendrogram represent?
a) A graphical representation of hierarchical clustering b) A chart to find the optimal number of clusters c) A plot of the cluster centroids d) A metric for cluster density
Answer: a

32. What is the role of pre-processing in clustering analysis?
a) To normalize or standardize data for better distance measurement b) To assign labels to data points c) To train a supervised model d) To eliminate the need for distance calculations
Answer: a

33. What is the primary difference between regression and classification?
a) Regression predicts discrete values, while classification predicts continuous values b) Regression predicts continuous values, while classification predicts discrete values c) Both predict discrete values but differ in algorithmic approach d) Regression is used for clustering, while classification is used for reinforcement learning
Answer: b

34. Which of the following is a regression algorithm?
a) Logistic Regression b) Linear Regression c) Decision Trees for classification d) K-means
Answer: b

35. Which of these problems is best solved using classification?
a) Predicting house prices b) Forecasting stock prices c) Diagnosing whether an email is spam or not d) Estimating fuel consumption based on engine size
Answer: c

36. What is the output of a regression model?
a) Categories or classes b) A probability distribution c) A continuous numerical value d) Cluster centroids
Answer: c

37. What type of data does a classification algorithm typically work with?
a) Numerical values only b) Categorical values only c) Both numerical and categorical values d) Binary data only
Answer: c

38. Which metric is commonly used to evaluate regression models?
a) Confusion matrix b) Mean Squared Error (MSE) c) Precision and Recall d) Silhouette score
Answer: b

39. Which of the following is NOT a common evaluation metric for classification?
a) Accuracy b) Recall c) R-squared d) F1-score
Answer: c

40. What is multicollinearity in the context of regression?
a) When independent variables are highly correlated with each other b) When the dependent variable is categorical c) When a model predicts multiple output variables d) When the model overfits the training data
Answer: a

41. Which algorithm is typically used for both regression and classification tasks?
a) K-means clustering b) Support Vector Machine (SVM) c) Principal Component Analysis (PCA) d) Decision Trees
Answer: d

42. Which of the following techniques is specifically used in regression but not classification?
a) Lasso and Ridge Regression b) Confusion Matrix c) Cross-entropy loss d) Precision-Recall tradeoff
Answer: a

43. What is overfitting in classification or regression models?
a) When the model performs well on the test data but poorly on the training data b) When the model performs well on the training data but poorly on the test data c) When the model uses too few features d) When the model's performance is consistent across datasets
Answer: b

44. Which of these is a key assumption of linear regression?
a) Features are independent and identically distributed b) The dependent variable must be categorical c) There is a linear relationship between features and the target variable d) Data must be clustered before training
Answer: c
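Several of the decision-tree questions above turn on Gini impurity, entropy, and information gain. A minimal pure-Python sketch of those three formulas (function names are mine, not from scikit-learn or any library):

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: probability of misclassifying a randomly drawn sample."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# A 50/50 node is maximally impure; a split into two pure children
# recovers all of that entropy (gain = 1 bit).
print(gini(["spam", "spam", "ham", "ham"]))      # 0.5
print(entropy(["spam", "spam", "ham", "ham"]))   # 1.0
print(information_gain(["spam", "spam", "ham", "ham"],
                       [["spam", "spam"], ["ham", "ham"]]))  # 1.0
```

A tree builder evaluates candidate splits at each node and keeps the one with the highest information gain (or the lowest weighted Gini impurity), which is what "splitting criterion" refers to in questions 5 and 9.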
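The K-means questions (25, 27, 29, 30) reference centroids, Euclidean distance, and sensitivity to initialization. A bare-bones sketch of Lloyd's algorithm, the standard K-means iteration (all names are mine; this is illustrative, not a production implementation):

```python
import math

def kmeans(points, centroids, iters=10):
    """Plain K-means: assign each point to its nearest centroid (Euclidean
    distance), then move each centroid to the mean of its assigned cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the coordinate-wise mean of its cluster;
        # an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids

pts = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
       (9.0, 9.0), (9.2, 8.8), (8.8, 9.2)]
# With well-placed seeds the centroids converge near (1, 1) and (9, 9);
# badly placed seeds can converge elsewhere, which is question 29's point.
print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

The elbow method from question 28 would rerun this for k = 1, 2, 3, … and plot the within-cluster sum of squared distances against k, looking for the bend where adding clusters stops helping much.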
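The regression questions (36, 38, 44) concern continuous outputs, Mean Squared Error, and the linear-relationship assumption. A minimal single-feature ordinary-least-squares sketch in pure Python (helper names are mine):

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: slope = cov(x, y) / var(x),
    intercept chosen so the line passes through the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx   # (slope, intercept)

def mse(ys, preds):
    """Mean Squared Error, the regression metric from question 38."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)                                # 2.0 1.0
print(mse(ys, [slope * x + intercept for x in xs]))    # 0.0
```

The continuous numerical output here contrasts with classification (questions 33 and 36): a classifier would instead emit a discrete label such as spam/not-spam, evaluated with accuracy, recall, or F1-score rather than MSE.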