Mistake 7: Only reporting accuracy

Only reporting accuracy when assessing the performance of a model.


The main purpose of assessing a model's performance is to understand its generalization capabilities. This evaluation is done mainly through performance metrics (accuracy, recall, precision, etc.), each of which captures a different aspect of the problem at hand. However, sometimes only accuracy is reported, which can lead to misleading interpretations. Say your test set has \(99\) positive instances and \(1\) negative instance. A simple model that always predicts positive, regardless of the input, will have an accuracy of \(99\%\). However, it will never detect the negative class, which may be the one you are most interested in. By computing only the accuracy, you may think that the model is very good; yet the recall for the negative class is \(0\%\). Computing other metrics besides accuracy not only gives you a more complete view of the model's behavior but also helps you identify possible issues in the data (e.g., imbalanced classes).
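This effect is easy to reproduce. The following sketch builds the \(99\)-to-\(1\) test set described above (the variable names are illustrative) and contrasts accuracy with the recall of the negative class using scikit-learn:

```python
from sklearn.metrics import accuracy_score, recall_score

# Imbalanced test set: 99 positive instances (1) and 1 negative instance (0).
y_true = [1] * 99 + [0]

# A trivial "model" that always predicts the positive class.
y_pred = [1] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, pos_label=0)  # recall of the negative class

print(acc)  # 0.99 -- looks excellent
print(rec)  # 0.0  -- the negative class is never detected
```

Accuracy alone hides the failure completely; the recall of the minority class exposes it immediately.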

The following code shows the use of classification_report(), which computes several metrics for each class (\(0\), \(1\), \(2\)).

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load the wine dataset.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.5,
                                                    random_state=123)

# Train a decision tree and evaluate it on the test set.
dt = DecisionTreeClassifier(random_state=123)
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
print(classification_report(y_test, y_pred))
#>>               precision    recall  f1-score   support
#>> 
#>>            0       0.83      0.97      0.89        30
#>>            1       0.88      0.77      0.82        30
#>>            2       0.89      0.86      0.88        29
#>> 
#>>     accuracy                           0.87        89
#>>    macro avg       0.87      0.87      0.86        89
#>> weighted avg       0.87      0.87      0.86        89

Computing several metrics gives you a better understanding of the model's performance.
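A confusion matrix is a useful complement to these per-class metrics because it shows not only how often each class is misclassified but also which classes it is confused with. A minimal sketch, reusing the same wine split and decision tree as above:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Same split and model as in the classification_report() example.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.5,
                                                    random_state=123)
dt = DecisionTreeClassifier(random_state=123)
dt.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes;
# off-diagonal entries reveal which classes get confused.
cm = confusion_matrix(y_test, dt.predict(X_test))
print(cm)
```

Each row of the matrix sums to the corresponding class support shown in the report (\(30\), \(30\), \(29\)), so the two views are consistent with each other.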