1.1regressionANDclassification.pdf

Full Transcript

▪ In classification, the set of possible outputs is finite, and the developed model must assign a single class to each new input.
▪ The dataset is a collection of labeled-data records in the form: {independent variables as inputs, and the associated classes (i.e., labels) as outputs}. The objective is to develop a machine learning model to relate the inputs to the outputs, and to predict the class of new inputs.
▪ In practice, the dataset is divided into two sets: the training set and the testing set.
▪ The training set is used to develop the classification model, while the testing set is used afterwards to evaluate the accuracy of the developed model.
▪ For binary classification models, the outputs of the model can be presented in the form of a confusion matrix.
▪ The confusion matrix shows how many instances are correctly and incorrectly classified by the developed model, as seen in the following figure for a two-class problem.

                              Predicted (model output)
                              0        1
Actual (desired output)   0   TN       FP
                          1   FN       TP

Here TN, FP, FN, and TP denote the numbers of true negatives, false positives, false negatives, and true positives.

Exercise: compute the confusion matrix for the following outputs.

True   Predicted
0      1
1      1
1      0
0      0
1      1
0      0
0      1
1      1
0      0
1      1

▪ Precision: the number of true positives (i.e., the number of instances correctly labeled as belonging to a specific class), divided by the total number of instances assigned to this class (i.e., the total number of positive predictions). The precision is a measure of the model’s exactness.
▪ Recall: the number of true positives for a specific class, divided by the total number of instances actually in this class. The recall is a measure of the model’s completeness.
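The confusion-matrix counts and the precision and recall definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `confusion_counts` and `precision_recall_f1` are hypothetical, the label lists assume the exercise data is read as (true, predicted) pairs in the order it appears, and the F1-score (the harmonic mean of precision and recall, one common form of the F-score) is included for convenience.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TN, FP, FN, TP for a two-class problem."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # positive instance, predicted positive
        elif a == positive:
            fn += 1          # positive instance, predicted negative
        elif p == positive:
            fp += 1          # negative instance, predicted positive
        else:
            tn += 1          # negative instance, predicted negative
    return tn, fp, fn, tp


def precision_recall_f1(tn, fp, fn, tp):
    """Precision, recall, and F1 for the positive class (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Assumed reading of the exercise data as (true, predicted) pairs.
actual = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tn, fp, fn, tp)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under that reading of the data, the counts are TN = 3, FP = 2, FN = 1, and TP = 4, so precision is 4/6 and recall is 4/5.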
▪ F-score: a single measure that combines precision and recall as a weighted average; the commonly used F1-score is their harmonic mean.
▪ One example of a regression problem is the historical dataset of real estate values. If the characteristics and corresponding prices for many houses within a certain city are provided, can the price of a different house in this area be predicted from its characteristics?
▪ The evaluation metric that is routinely calculated to judge the model’s performance is the mean squared error (MSE), as given by the following equation:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (d_i - y_i)^2$$

where y_i is the output of the developed regression model for instance i, d_i is the corresponding desired output, and n is the total number of instances.
▪ In general, when exposed to more observations, the model improves its predictive performance.
▪ However, too much adaptability will force the model to learn the noise within the training data rather than the underlying input/output relations.
▪ As a result, the model overfits the training set and will not perform equally well on the testing set.
▪ Overfitting happens when the model adapts so closely to the training set that it fails on the testing set. Croteau et al. (2017) explain, “overfitting will impact negatively on the degree of generalization to new data and thus must be avoided in order for solutions to be useful for practical application” (p. 306).
▪ An efficient machine learning model attempts to decrease generalization errors and thus make good predictions on data that the model was not trained on.
▪ On the other hand, underfitting happens when the model does not capture enough of the inherent structure in the training data, which results in poor performance with both training and testing sets.
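The MSE and the contrast between underfitting and overfitting can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration rather than part of the original slides: the toy data (d = 2x plus a fixed ±1 disturbance), the helper names, and the choice of three simple models, namely a constant predictor (too rigid), a least-squares line, and a nearest-neighbour lookup that memorizes the training set (too adaptable).

```python
def mse(y, d):
    """Mean squared error between model outputs y and desired outputs d."""
    return sum((di - yi) ** 2 for yi, di in zip(y, d)) / len(d)


# Hypothetical toy data: d = 2x with a fixed +/-1 disturbance acting as noise.
x_train, d_train = [0, 2, 4, 6, 8], [1, 3, 9, 11, 17]
x_test, d_test = [0.5, 2.5, 4.5, 6.5, 8.5], [0, 6, 8, 14, 16]


def fit_constant(xs, ds):
    """Always predict the training mean (ignores the input entirely)."""
    mean = sum(ds) / len(ds)
    return lambda x: mean


def fit_line(xs, ds):
    """Ordinary least-squares straight line d = slope * x + intercept."""
    x_bar, d_bar = sum(xs) / len(xs), sum(ds) / len(ds)
    sxy = sum((x - x_bar) * (d - d_bar) for x, d in zip(xs, ds))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = d_bar - slope * x_bar
    return lambda x: slope * x + intercept


def fit_nearest(xs, ds):
    """Memorize the training set; predict the output of the closest input."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ds[i]
    return predict


results = {}
for name, fit in [("constant", fit_constant),
                  ("line", fit_line),
                  ("nearest", fit_nearest)]:
    model = fit(x_train, d_train)
    train_err = mse([model(x) for x in x_train], d_train)
    test_err = mse([model(x) for x in x_test], d_test)
    results[name] = (train_err, test_err)
    print(f"{name:8s} train MSE={train_err:.2f}  test MSE={test_err:.2f}")
```

On this toy data the constant model has a high error on both sets (underfitting), while the memorizing model reaches zero training error yet a larger testing error than the straight line, because it has reproduced the disturbance in the training outputs (overfitting).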
