Many problems require a probability estimate as output. Logistic regression is an extremely efficient mechanism for calculating probabilities. Practically speaking, you can use the returned probability in either of the following two ways:
- "As is"
- Converted to a binary category.
Let's consider how we might use the probability "as is." Suppose we create a logistic regression model to predict the probability that a dog will bark during the middle of the night. We'll call that probability:
p(bark | night)
If the logistic regression model predicts a
p(bark | night) of 0.05,
then over a year, the dog's owners should be startled awake approximately 18 times:

startled = p(bark | night) * nights = 0.05 * 365 ~= 18
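To make that arithmetic concrete, here is a minimal Python sketch of the expected-value calculation (the variable names are illustrative, not from any particular library):

```python
# Expected nighttime awakenings over a year, assuming a fixed
# per-night bark probability predicted by the model.
p_bark_given_night = 0.05  # model's predicted probability
nights = 365

startled = p_bark_given_night * nights
print(startled)  # 18.25, i.e. roughly 18 times per year
```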
In many cases, you'll map the logistic regression output into the solution to a binary classification problem, in which the goal is to correctly predict one of two possible labels (e.g., "spam" or "not spam"). A later module focuses on that.
You might be wondering how a logistic regression model can ensure output that always falls between 0 and 1. As it happens, a sigmoid function, defined as follows, produces output having those same characteristics:

f(x) = 1 / (1 + e^(-x))
The sigmoid function yields the following plot:
Figure 1: Sigmoid function.
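To see the squashing behavior numerically, here is a short Python sketch of the sigmoid (a hand-rolled helper for illustration, not a specific library API):

```python
import math

def sigmoid(x: float) -> float:
    """f(x) = 1 / (1 + e^(-x)); output always falls in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Even extreme inputs stay strictly between 0 and 1,
# which is why sigmoid output can be read as a probability.
for x in (-10, -1, 0, 1, 10):
    print(f"sigmoid({x:+d}) = {sigmoid(x):.4f}")
```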
If z represents the output of the linear layer of a model trained with logistic regression, then sigmoid(z) will yield a value (a probability) between 0 and 1. In mathematical terms:

y' = 1 / (1 + e^(-z))

where:
- y' is the output of the logistic regression model for a particular example.
- z is b + w1x1 + w2x2 + ... + wNxN
- The w values are the model's learned weights, and b is the bias.
- The x values are the feature values for a particular example.
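Putting those pieces together, here is a minimal inference sketch in Python (the function name and signature are illustrative assumptions, not a library API):

```python
import math

def predict_probability(bias: float, weights: list[float],
                        features: list[float]) -> float:
    """Logistic regression inference.

    Computes z = b + w1*x1 + w2*x2 + ... + wN*xN, then squashes
    z through the sigmoid to yield a probability y' in (0, 1).
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))
```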
Note that z is also referred to as the log-odds because the inverse of the sigmoid states that z can be defined as the log of the probability of the "1" label (e.g., "dog barks") divided by the probability of the "0" label (e.g., "dog doesn't bark"):

z = log(y / (1 - y))
Here is the sigmoid function with ML labels:
Figure 2: Logistic regression output.
Here is a sample logistic regression inference calculation.
Suppose we had a logistic regression model with three features that learned the following bias and weights:
- b = 1
- w1 = 2
- w2 = -1
- w3 = 5
Further suppose the following feature values for a given example:
- x1 = 0
- x2 = 10
- x3 = 2
Therefore, the log-odds is:

z = (1) + (2)(0) + (-1)(10) + (5)(2) = 1
Consequently, the logistic regression prediction for this particular example will be 0.731:

y' = 1 / (1 + e^(-1)) = 0.731
Figure 3: 73.1% probability.
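As a check, here is a small Python sketch that reproduces the calculation above (variable names are illustrative):

```python
import math

b = 1
w = [2, -1, 5]   # w1, w2, w3
x = [0, 10, 2]   # x1, x2, x3

# Log-odds: z = b + w1*x1 + w2*x2 + w3*x3
z = b + sum(wi * xi for wi, xi in zip(w, x))
print(z)  # 1

# Prediction: y' = sigmoid(z)
y_prime = 1 / (1 + math.exp(-z))
print(round(y_prime, 3))  # 0.731
```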