Summary

This document contains sample questions for an AI exam (Exam 2), covering machine learning concepts such as supervised, unsupervised, and reinforcement learning, regression, neural networks, backpropagation, and clustering.

Full Transcript

College of Engineering
Special Topics (AI)
Sample Questions – Exam 2

Q1. Define the following machine-learning terms in your own words.
(a) Training set
Answer: A set of input–output pair examples, used as input to a machine learning program to create a hypothesis.
(b) Hypothesis
Answer: In machine learning, a hypothesis is a function, learned from the training data and a member of the hypothesis space, that maps inputs to outputs.

Q2. Describe the differences between supervised, unsupervised, and reinforcement learning.
Answer: In supervised learning, the training data consists of input–output pairs, where the labeled outputs are what we are trying to learn. In unsupervised learning, there is no labeled output, and the goal is to find patterns or clusters in the input. In reinforcement learning, the learner is given positive or negative rewards at certain points, and the goal is to maximize the total reward.

Q3. Similar questions, asking you to define a term or method, may be posed for the topics covered in the exam: intro to ML, linear regression, logistic regression, neural networks, backpropagation, gradient descent, clustering, and centroids.

Q4. You train a linear classifier on 1,000 training points and discover that the training accuracy is only 50%. Which of the following, if done in isolation, has a good chance of improving your training accuracy?
a) Add new features
b) Train on more data
c) Train on less data

Q5. Consider a large data set of the medical profiles of cancer patients with no labels. The model has to learn whether there are different groups of such patients for which separate treatments may be tailored.
a) Unsupervised learning
b) Supervised learning

Q6. You have past data on the performance of two cricket teams; based on different parameters and the match results, you have to predict which team will win.
a) Unsupervised learning
b) Supervised learning

Q7. Machine learning is a field of AI consisting of learning algorithms that ..............
a) At executing some task
b) Over time with experience
c) Improve their performance
d) All of the above

Q8. Machine learning algorithms build a model based on sample data, known as .................
a) Training Data
b) Transfer Data
c) Data Training
d) None of the above

Q9. .................... algorithms enable computers to learn from data, and even improve themselves, without being explicitly programmed.
a) Deep Learning
b) Machine Learning
c) Artificial Intelligence
d) None of the above

Q10. Linear regression is a machine learning algorithm based on _____.
a) unsupervised learning
b) supervised learning
c) reinforcement learning
d) none of these

Q11. Regression models a target prediction value based on _____.
a) dependent variable
b) independent variables
c) independent value
d) dependent value

Q12. In linear regression, RMSE stands for _________.
a) Root Mean Squared Error
b) Read Mean Squared Error
c) Root Mode Squared Error
d) none of these

Q13. The correlation coefficient is used to determine:
a) A specific value of the y-variable given a specific value of the x-variable
b) A specific value of the x-variable given a specific value of the y-variable
c) The strength of the relationship between the x and y variables
d) None of these

Q14. In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is:
a) the correlation model
b) the regression model
c) used to compute the correlation coefficient
d) None of these alternatives is correct

Q15. What is the name for the average squared difference between a classifier's predicted output and the actual output?
a) Mean relative error
b) Mean squared error
c) Mean absolute error
d) Root mean squared error
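As a concrete illustration of the error metrics behind Q12 and Q15, the short sketch below (not part of the original exam) computes the mean squared error and root mean squared error for a handful of made-up predictions using NumPy; the array values are purely illustrative.

import numpy as np

# Made-up ground-truth values and model predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Mean squared error (Q15): the average squared difference between
# predicted and actual outputs.
mse = np.mean((y_pred - y_true) ** 2)

# Root mean squared error (Q12): the square root of the MSE.
rmse = np.sqrt(mse)

print(f"MSE  = {mse:.3f}")   # 0.875
print(f"RMSE = {rmse:.3f}")  # 0.935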
Q16. If a linear regression model fits the training data perfectly, i.e., the training error is zero, then _____________________
a) Test error is also always zero
b) Test error is non-zero
c) We cannot comment on the test error
d) Test error is equal to the training error

Q17. Assume we train a model on a given dataset. If we were to remove 50% of the samples from the dataset and re-train the model from scratch, the new model would be more likely to overfit its training data than the old one.
a) True
b) False

Q18. If {v1, v2, ..., vn} and {w1, w2, ..., wn} are linearly independent, then {v1 + w1, v2 + w2, ..., vn + wn} are linearly independent.
a) True
b) False

Figure 1 (graphic not reproduced in this transcript): The graphic shows four targets, subplots I–IV, used as a representation of bias and variance. Imagine that a true/correct model is one that always predicts a location at the center of each target (being farther from the center of the target indicates that a model's predictions are worse). We retrain a model multiple times and make a prediction with each trained model. For each of the targets, determine whether the bias and variance are low or high with respect to the true model.

Q19. In Figure 1, subplot I, how are bias and variance related to the true model?
a) High bias, high variance
b) High bias, low variance
c) Low bias, high variance
d) Low bias, low variance

Q20. In Figure 1, subplot II, how are bias and variance related to the true model?
a) High bias, high variance
b) High bias, low variance
c) Low bias, high variance
d) Low bias, low variance

Q21. In Figure 1, subplot III, how are bias and variance related to the true model?
a) High bias, high variance
b) High bias, low variance
c) Low bias, high variance
d) Low bias, low variance

Q22. In Figure 1, subplot IV, how are bias and variance related to the true model?
a) High bias, high variance
b) High bias, low variance
c) Low bias, high variance
d) Low bias, low variance

Q23. K-fold cross-validation can be used for hyperparameter tuning.
a) True
b) False

Q24. True/False: We use stochastic gradient descent instead of gradient descent in order to speed up per-iteration computation at the expense of more variance.
a) True
b) False

Q25. The cross-validation error is a better estimate of the true error than the training error.
a) True
b) False

Q26. Solving the k-means objective is an unsupervised learning problem.
a) True
b) False

Q27. Which of the following does not increase the complexity of a neural network?
a) Adding more layers
b) Increasing the hidden layer size
c) Reducing the strength of the regularizer
d) Reducing the learning rate

Q28. Select the valid reasons that ReLUs may be preferred over sigmoids (logistic functions) as activation functions for the hidden layers of a neural network.
a) The forward and backward passes are computationally cheaper with ReLUs than with sigmoids.
b) The cost function of a ReLU-based neural network, trained with the squared-error loss, will be convex because the ramp (ReLU) function is convex.
c) ReLUs are less vulnerable to the vanishing gradient problem than sigmoids.
d) The cost function of a ReLU-based neural network is smooth with a gradient defined everywhere in weight space, whereas the cost function of a sigmoid-based neural network is not.
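To make the vanishing-gradient contrast in Q28 (and in Q30 below) concrete, here is a minimal NumPy sketch, not taken from the original exam, that compares the derivative of the sigmoid with the derivative of the ReLU at a few arbitrary pre-activation values.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)), which never exceeds 0.25
    # and shrinks toward zero as |z| grows.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of the ReLU: 1 for positive inputs, 0 otherwise.
    return (z > 0).astype(float)

# Arbitrary pre-activation values, including large magnitudes.
z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print("sigmoid'(z):", np.round(sigmoid_grad(z), 4))  # [0.0025 0.105  0.25   0.105  0.0025]
print("relu'(z):   ", relu_grad(z))                  # [0. 0. 0. 1. 1.]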
Q29. Which steps are customarily (usually) part of training a neural network's weights with backpropagation?
a) Computing the partial derivatives of each weight with respect to each weight in the previous layer.
b) Computing the partial derivatives of a cost function or loss function with respect to each weight.
c) Computing the partial derivatives of a cost function or loss function with respect to each hidden unit value.
d) Computing the partial derivatives of a cost function or loss function with respect to each input feature.
Answers:
a) False. Weights do not depend on each other.
b) True. That is what we need for gradient descent.
c) True. These values are needed as intermediate results to obtain the gradients with respect to the weights.
d) False. We cannot change the training points, so these derivatives are not useful.

Q30. You are training a neural network with sigmoid activation functions. You discover that you are suffering from the vanishing gradient problem: for most of the training points, most of the components of the gradients are close to zero. This is causing training to be very slow. How could you combat this problem?
a) Make the network deeper (more layers).
b) Make the network shallower (fewer layers).
c) Initialize the weights with larger values.
d) Use ReLU activations instead of sigmoids.
Answers:
a) False. Making sigmoid-based neural networks deeper without other mitigations can exacerbate the vanishing gradient problem as a natural byproduct of the chain rule (the derivative of a sigmoid is between 0 and 0.25).
b) True. Reducing the number of layers is likely to increase the norm of the gradient, by the same reasoning as above.
c) False. Using larger initial weights will not necessarily increase the norm of the gradients. The gradient of a sigmoid vanishes as the input deviates from zero, so larger initial weights could lead to smaller gradients.
d) True. Although ReLU gradients can be zero, it is rarely true in practice that most of the components are zero for most of the training points. A substantial number of components of the gradients will be 1, thereby helping gradient descent to make progress.

Q31. Select the true statements about the running time of k-means clustering of n sample points with d features each.
a) The step that updates the cluster means μ_i, given fixed cluster assignments y_j, can be implemented to run in at most O(nd) time.
b) Increasing k always increases the running time.
c) The step that updates the cluster assignments y_j, given fixed cluster means μ_i, can be implemented to run in at most O(nkd) time.
d) The k-means algorithm runs in at most O(nkd) time.
Answers:
a) True. Each of the n sample points contributes to one cluster, and it takes O(d) time to add it to its cluster's sum.
b) False. Consider k = n; the algorithm will terminate after the first iteration.
c) True. The main expense is calculating the distance from n sample points to k centroids in d-dimensional space.
d) False. The algorithm may not converge in a constant number of steps; sometimes it needs exponentially many steps.

*Credit: These questions are carefully gathered from different AI/ML courses: UC Berkeley CS 189/289A, Stanford CS229, and University of Washington CSE 546.
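As a closing illustration of the k-means running-time claims in Q31, the sketch below (a minimal example, not part of the original questions) implements one assignment step and one mean-update step with NumPy; the comments note where the O(nkd) and O(nd) costs come from. The tiny dataset and first-k-points initialization are assumptions made only for the demo.

import numpy as np

def assign_clusters(X, centroids):
    # One assignment step of k-means: for n points with d features and k
    # centroids, computing all pairwise squared distances costs O(nkd).
    # X: (n, d) data matrix; centroids: (k, d) matrix.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return dists.argmin(axis=1)  # index of the nearest centroid for each point

def update_centroids(X, labels, k):
    # One update step: adding each point into its cluster's sum is O(nd).
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

# Tiny made-up dataset for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
centroids = X[:3].copy()            # naive initialization: first k points
labels = assign_clusters(X, centroids)
centroids = update_centroids(X, labels, 3)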
