Why is accuracy not a good metric for scoring classification models?
The accuracy of a machine learning model trained to classify data into discrete categories is the proportion of samples the model classifies correctly. For example, in data that contains two categories, if the model correctly predicts 45 out of 50 samples, its accuracy is 90%.
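As a quick illustration, here is a minimal sketch of computing accuracy both by hand and with scikit-learn; the label arrays are made up for the example:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions for 10 samples
y_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]

# Accuracy = correct predictions / total predictions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                           # 0.8
print(accuracy_score(y_true, y_pred))   # 0.8
```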
This scoring metric can be misleading when the data is skewed, i.e., when one category has very few samples compared to the other. Consider a dataset of 200 samples, of which 20 belong to category A and 180 to category B. Even if the classification model predicts that every sample belongs to category B, its accuracy is 90%, because it correctly predicts the 180 samples that actually belong to category B, while misclassifying every sample in category A.
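To see this concretely, here is a small sketch using synthetic labels chosen to match the 20/180 split above: a degenerate "model" that always predicts category B still scores 90% accuracy.

```python
from sklearn.metrics import accuracy_score

# 20 samples of category A, 180 of category B (encoding A=0, B=1)
y_true = [0] * 20 + [1] * 180

# A degenerate "classifier" that predicts B for every sample
y_pred = [1] * 200

print(accuracy_score(y_true, y_pred))  # 0.9 -- looks good, but misses every A
```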
Therefore, when one class is represented by only a small minority of the samples, accuracy is not the scoring metric we should use. Instead, we should compute precision and recall, and also inspect the confusion matrix. A separate post covering these will follow.
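For completeness, here is a sketch of how the same skewed example looks through precision, recall, and the confusion matrix in scikit-learn; these metrics expose the failure that accuracy hides. The data is the same synthetic 20/180 split used above.

```python
from sklearn.metrics import precision_score, recall_score, confusion_matrix

y_true = [0] * 20 + [1] * 180   # same skewed data as above (A=0, B=1)
y_pred = [1] * 200              # always predict B

# Treat the minority class A (label 0) as the class of interest
print(precision_score(y_true, y_pred, pos_label=0, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred, pos_label=0))                      # 0.0

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# [[  0  20]    <- all 20 A samples misclassified as B
#  [  0 180]]   <- all 180 B samples correctly classified
```

Both precision and recall for the minority class drop to zero, immediately revealing that the 90% accuracy is an artifact of the class imbalance.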