In this section, we'll define the primary building blocks of the metrics we'll use to evaluate classification models. But first, a fable:
An Aesop's Fable: The Boy Who Cried Wolf (compressed)
A shepherd boy gets bored tending the town's flock. To have some fun, he cries out, "Wolf!" even though no wolf is in sight. The villagers run to protect the flock, but then get really mad when they realize the boy was playing a joke on them.
[Iterate previous paragraph N times.]
One night, the shepherd boy sees a real wolf approaching the flock and calls out, "Wolf!" The villagers refuse to be fooled again and stay in their houses. The hungry wolf turns the flock into lamb chops. The town goes hungry. Panic ensues.
Let's make the following definitions:
- "Wolf" is the positive class.
- "No wolf" is the negative class.
We can summarize our "wolf-prediction" model using a 2x2 confusion matrix that depicts all four possible outcomes:
- True Positive (TP): Reality: a wolf threatened. The boy said "Wolf." Outcome: the flock is saved; the boy is a hero.
- False Positive (FP): Reality: no wolf threatened. The boy said "Wolf." Outcome: the villagers are angry at being fooled.
- False Negative (FN): Reality: a wolf threatened. The boy said nothing. Outcome: the wolf ate the flock.
- True Negative (TN): Reality: no wolf threatened. The boy said nothing. Outcome: everyone is fine.
A true positive is an outcome where the model correctly predicts the positive class; a true negative is an outcome where the model correctly predicts the negative class. A false positive is an outcome where the model incorrectly predicts the positive class, and a false negative is an outcome where the model incorrectly predicts the negative class.
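These definitions can be sketched in code. The following is a minimal illustration, assuming a binary encoding where 1 means "wolf" (positive class) and 0 means "no wolf" (negative class); the function name and the example data are invented for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels, where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # wolf, boy cried wolf
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # no wolf, boy cried wolf
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # wolf, boy stayed silent
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # no wolf, boy stayed silent
    return tp, fp, fn, tn

# Hypothetical nights: two false alarms, one missed wolf, one correct call, one quiet night.
actual    = [0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1]
print(confusion_counts(actual, predicted))  # (1, 2, 1, 1)
```

Note that the four counts always sum to the total number of predictions, which is a handy sanity check when populating a confusion matrix.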
In the following sections, we'll look at how to evaluate classification models using metrics derived from these four outcomes.