Logistic regression is a valuable classifier because of its interpretability. This code snippet provides a cut-and-paste function that displays the metrics that matter when logistic regression is used for binary classification problems. Everything here is already provided by scikit-learn, but calling and visualizing it all manually is time-consuming and repetitive without this helper function.
evalBinaryClassifier() takes a fitted model, test features, and test labels as input. It returns the F1 score and also prints dense output that includes a confusion matrix, the distribution of predicted probabilities, and the ROC curve.
For an explanation of how to interpret these outputs, skip to after the code block.
The Confusion Matrix sorts the model's predictions by whether they were True (correct) or False (wrong) and by the class that was predicted (Positive or Negative). A perfect model would have only True Positives and True Negatives. A purely random model on balanced classes will have all four categories in similar quantities. If you have a class imbalance problem, you'll typically see many Negatives (both True and False) and few Positives, or vice versa. For a great example of this, see my project on Predicting Disruption.
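As a small illustration of the four categories, here is a minimal sketch using scikit-learn's confusion_matrix and f1_score. The labels and predictions are made up for the example and are not output from any real model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Made-up true labels and predictions, for illustration only.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() flattens the 2x2 matrix in the order
# True Negatives, False Positives, False Negatives, True Positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
f1 = f1_score(y_true, y_pred)

print(f"TN={tn} FP={fp} FN={fn} TP={tp} F1={f1:.3f}")
```

The F1 score is computed from the same counts: 2*TP / (2*TP + FP + FN).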
The Center Graph is the distribution of predicted probabilities of a Positive Outcome. For example, if your model is 100% sure a sample is positive, it will be in the far-right bin. The two different colors indicate the TRUE class, not the predicted class. A perfect model would show no overlap at all between the green and red distributions. A purely random model will see them overlap each other entirely.
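The binning behind a chart like this can be sketched with plain NumPy. This is not the helper function's plotting code; the probabilities below are invented, and the choice of ten equal-width bins is an assumption made for the example.

```python
import numpy as np

# Invented true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.15, 0.35, 0.62, 0.45, 0.71, 0.88, 0.97])

# Ten equal-width bins from 0 to 1 (an assumption for this sketch).
bins = np.linspace(0, 1, 11)

# One histogram per TRUE class, regardless of what the model predicted.
neg_counts, _ = np.histogram(proba[y_true == 0], bins=bins)
pos_counts, _ = np.histogram(proba[y_true == 1], bins=bins)

print("true negatives per bin:", neg_counts)
print("true positives per bin:", pos_counts)
```

Here the highest-scoring true negative (0.62) sits above the lowest-scoring true positive (0.45), so the two distributions overlap in range and no single boundary separates them cleanly.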
The decision boundary decides the model's final predictions. In scikit-learn, the default decision boundary is .5; that is, anything above .5 is predicted as a 1 (positive) and anything below .5 is predicted as a 0 (negative). This is an important detail for understanding your model.
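To check this behaviour yourself, here is a minimal sketch on synthetic data. The dataset from make_classification and the variable names are assumptions for the example. It compares predict() with a manual .5 cut on predict_proba().

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# Column 1 of predict_proba holds the probability of the positive class (1).
proba_pos = model.predict_proba(X)[:, 1]
manual = (proba_pos > 0.5).astype(int)

# The manual cut at .5 should reproduce predict() exactly.
same = np.array_equal(manual, model.predict(X))
print("predict() matches a .5 cut on predict_proba():", same)
```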
The Receiver Operating Characteristic curve describes all possible decision boundaries. The green curve represents those possibilities and the trade-off between the True Positive Rate and the False Positive Rate at different decision points. The extremes are easy to understand: your model could lazily predict 1 for ALL samples and achieve a perfect True Positive Rate, but it would also have a False Positive Rate of 1. Similarly, you could reduce your False Positive Rate to zero by predicting everything as Negative, but your True Positive Rate would also be zero. The value in your model is its ability to increase the True Positive Rate faster than it increases the False Positive Rate.
A perfect model would run straight up the y-axis and then across the top of the plot, passing through the top-left corner (100% True Positives, 0% False Positives). A purely random model would lie right on the blue dotted line (finding more True Positives means accepting a proportional increase in False Positives).
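The two extremes and the perfect-model case can be checked numerically with scikit-learn's roc_curve and roc_auc_score. This is a minimal sketch; the labels and probabilities are invented for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Invented true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.15, 0.35, 0.62, 0.45, 0.71, 0.88, 0.97])

# Each (fpr, tpr) pair is the result of one candidate decision boundary.
fpr, tpr, thresholds = roc_curve(y_true, proba)

# First point: predict nothing as positive. Last point: predict everything.
print("start of curve:", (fpr[0], tpr[0]))
print("end of curve:  ", (fpr[-1], tpr[-1]))

# Area under the curve: 1.0 for a perfect ranking, about .5 for random.
auc = roc_auc_score(y_true, proba)
perfect_auc = roc_auc_score(y_true, y_true)
print(f"AUC={auc:.4f}, perfect AUC={perfect_auc:.1f}")
```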
The blue dot represents the .5 decision boundary that is currently determining the Confusion Matrix. Changing the boundary is a useful way to adjust the sensitivity of your model when one error type is worse than another.
Healthcare is full of these decisions: incorrectly diagnosing cancer (a False Positive) is WAY better than incorrectly diagnosing good health (a False Negative). In that case, we'd want a very low decision boundary, which is to say, only predict a negative result (no cancer) if we're VERY sure about it.
If you chose a different boundary using this same model (e.g., .3 instead of .5), the blue dot would move up and to the right along the green curve. The new boundary means we'd capture more True Positives, and also more False Positives. This is also easily visualized as the blue line in the center chart moving to the left until it's on .3: there would be more "green" bins to the right of the boundary, but also more "red" bins.
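scikit-learn's predict() method does not take a decision boundary as an argument, but you can adjust the boundary easily by calling the predict_proba() method on your data and then manually coding a decision based on the boundary of your choice. (Recent releases, 1.5 and later, also include a FixedThresholdClassifier wrapper in sklearn.model_selection for this purpose.)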
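A minimal sketch of the manual approach follows. The helper name predict_with_boundary is hypothetical (it is not part of scikit-learn or of the helper function above), and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)


def predict_with_boundary(model, X, boundary=0.5):
    """Hypothetical helper: predict 1 when P(positive) exceeds the boundary."""
    return (model.predict_proba(X)[:, 1] > boundary).astype(int)


# Compare the default boundary with a lower one.
results = {}
for boundary in (0.5, 0.3):
    y_pred = predict_with_boundary(model, X_test, boundary)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results[boundary] = {"tp": int(tp), "fp": int(fp)}
    print(f"boundary={boundary}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Lowering the boundary can only turn predicted negatives into predicted positives, so the True Positive and False Positive counts at .3 are never lower than at .5.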
Did this help you understand your model? Could it be improved? Let me know, I'd love to hear from you!