Evaluation - Keypoints PDF
Summary
This document discusses model evaluation, covering different evaluation techniques and the concepts of underfitting, perfect fit, and overfitting. It explains Prediction and Reality using a confusion matrix example, and outlines the importance of metrics such as Precision, Recall, and the F1 Score.
Full Transcript
**[Evaluation]**

Model Evaluation is an integral part of the model development process. It helps to find the best model that represents our data and to understand how well the chosen model will work in the future. **[Evaluation is the process of understanding the reliability of any AI model, based on outputs obtained by feeding a test dataset into the model and comparing them with the actual answers.]** There can be different evaluation techniques, depending on the type and purpose of the model.

Observe the following graphs, in which the blue line shows the model's output while the green one shows the actual output (the true function) along with the data samples.

(Figure 1, Figure 2 and Figure 3: the model's output plotted against the true function and the data samples.)

- Figure 1: In the first diagram, the model's output does not match the true function at all. Hence the model is said to be **[underfitting]** and its accuracy is lower.
- Figure 2: In the second one, the model's performance matches the true function well, which means the model has optimum accuracy and is called a **[perfect fit]**.
- Figure 3: In the third case, the model tries to cover all the data samples, even those that are out of alignment with the true function. This model is said to be **[overfitting]**, and it too has lower accuracy.

Remember that it is not recommended to use the data we used to build the model to evaluate it. This is because our model will simply remember the whole training set, and will therefore always predict the correct label for any point in the training set. This is known as overfitting.

To understand the efficiency of any model, we need to check whether the predictions it makes are correct or not. Thus, there exist two conditions which we need to ponder upon: **[Prediction and Reality]**. The prediction is the output given by the machine, and the reality is the real scenario about which the prediction has been made. Once the model is evaluated thoroughly, it is deployed in the form of an app which people can use easily.

The result of the comparison between prediction and reality can be recorded in what we call the confusion matrix. The confusion matrix allows us to understand the prediction results. Note that it is not an evaluation metric but a record which can help in evaluation.

![](media/image2.png)

Prediction and Reality can be easily mapped together with the help of this confusion matrix.

**[Evaluation Methods]**

![](media/image4.png)

**[Which Metric is Important?]**

Even with a fairly high accuracy value, the accuracy parameter can be useless for us if the actual positive cases are the ones being missed. Choosing between Precision and Recall depends on the condition in which the model has been deployed. In a case like a Forest Fire, a **False Negative** can cost us a lot and is risky too. Imagine no alert being given even when there is a **Forest Fire**; the whole forest might burn down. Another case where a **False Negative** can be dangerous is a **Viral Outbreak**. Imagine a deadly virus has started spreading and the model which is supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people.

On the other hand, there can be cases in which the **False Positive condition** costs us more than False Negatives. One such case is **Mining**. Imagine a model telling you that treasure exists at a point and you keep on digging there, but it turns out to be a false alarm. Here, the False Positive case (predicting there is treasure when there is no treasure) can be very costly. A small sketch of how these metrics follow from the confusion matrix is given below.
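The sketch below is a minimal, illustrative Python example (not part of the original document) of how the confusion matrix cells and the metrics derived from them could be computed. The lists `reality` and `prediction` and their values are assumptions made up purely for illustration, with 1 standing for the positive case (for example, "there is a forest fire").

```python
# Illustrative sketch: building a confusion matrix and deriving metrics.
# "reality" holds the true labels, "prediction" holds the model's outputs
# (1 = positive case, 0 = negative case). Values are invented for this example.
reality    = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
prediction = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

# Count the four cells of the confusion matrix.
TP = sum(1 for r, p in zip(reality, prediction) if r == 1 and p == 1)  # True Positives
TN = sum(1 for r, p in zip(reality, prediction) if r == 0 and p == 0)  # True Negatives
FP = sum(1 for r, p in zip(reality, prediction) if r == 0 and p == 1)  # False Positives
FN = sum(1 for r, p in zip(reality, prediction) if r == 1 and p == 0)  # False Negatives

# Standard metric definitions.
accuracy  = (TP + TN) / (TP + TN + FP + FN)  # fraction of all predictions that are correct
precision = TP / (TP + FP)                   # of everything predicted positive, how much really was
recall    = TP / (TP + FN)                   # of everything that really was positive, how much was caught

print(f"TP={TP} TN={TN} FP={FP} FN={FN}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f}")
```

In the Forest Fire setting, the single False Negative counted here would be the costly error; in the Mining example, it would be the False Positive.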
A similar high cost of False Positives appears with spam filtering: consider a model that predicts whether a **mail is spam** or not. If the model always predicts that the mail is spam, people would not look at their mail and might eventually lose important information. Here too, the False Positive condition (predicting the mail as spam while the mail is not spam) would have a high cost.

![](media/image6.png)

To conclude the argument: if we want to know whether our model's performance is good, we need both measures, Recall and Precision. For some cases you might have high Precision but low Recall, or low Precision but high Recall. But since both measures are important, there is a need for a parameter which takes both Precision and Recall into account.

**[F1 Score]**

The F1 score can be defined as the measure of balance between Precision and Recall. An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision and Recall. In that case, the F1 score would also be an ideal 1 (100%). It is known as the **[perfect value for the F1 Score]**. As the values of both Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1. In conclusion, we can say that **[a model has good performance if the F1 Score for that model is high.]**
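As a companion to the F1 discussion, the usual formula behind this "measure of balance" is the harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall). The minimal sketch below (not from the document; the numeric values are invented for illustration) shows how a model with high Precision but low Recall still ends up with a low F1 score.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall (defined as 0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High Precision but low Recall: F1 stays low, exposing the imbalance.
print(f1_score(precision=0.95, recall=0.20))  # ~0.33
# Both reasonably high: F1 is high as well.
print(f1_score(precision=0.90, recall=0.85))  # ~0.87
# The ideal case mentioned in the text: both equal to 1.
print(f1_score(precision=1.0, recall=1.0))    # 1.0
```

Because the harmonic mean is pulled down by the smaller of the two values, a high F1 score is only possible when Precision and Recall are both high, which matches the conclusion above.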