Logistic Regression For Beginners using Python

SHUBHAM YADAV
6 min read · Jan 3, 2021

When an email lands in your inbox, how does your email service know whether it’s a real email or spam? This evaluation is made millions of times per day, and one of the ways it can be done is with Logistic Regression.

Logistic Regression is a supervised machine learning algorithm that uses regression to predict the continuous probability, ranging from 0 to 1, that a data sample belongs to a specific category or class. Based on that probability, the sample is classified as belonging to the most probable class.

In our spam filtering example, a Logistic Regression algorithm predicts the probability that an incoming email is spam. If the predicted probability is greater than or equal to 0.5, the email is classified as spam (the positive class, with label 1). If the predicted probability is less than 0.5, the email is classified as ham (a real email). We call ham the negative class, with label 0. Classifying data into one of two classes like this is called binary classification.

Some other examples of what we can classify with Logistic Regression include:

  • Disease survival — Will a patient, 5 years after treatment for a disease, still be alive?
  • Customer conversion — Will a customer arriving on a sign-up page enroll in a service?

Suppose we want to predict whether a student will pass or fail an exam. The first step in making that prediction is to predict the probability of each student passing. Why not use Linear Regression for this prediction, you might ask? Let’s try it.

Recall that in Linear Regression, we fit the regression line of the following form to the data:

y = b_0 + b_1*x_1 + b_2*x_2 + … + b_n*x_n

where:

  • y is the value we are trying to predict
  • b_0 is the intercept of the regression line
  • b_1, b_2, … b_n are the coefficients of the features x_1, x_2, … x_n of the regression line

For our data points, y is either 1 (passing) or 0 (failing), and we have one feature, num_hours_studied. Below we fit a Linear Regression model to our data and plot the results, with the line of best fit in red.

A problem quickly arises. For low values of num_hours_studied the regression line predicts negative probabilities of passing, and for high values of num_hours_studied the regression line predicts probabilities of passing greater than 1. These probabilities are meaningless! We get these meaningless probabilities since the output of a Linear Regression model ranges from -∞ to +∞.
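To see the problem in numbers, here is a minimal sketch using a small made-up dataset (hypothetical hours_studied and passed arrays, not the course data) and a least-squares line fit with numpy:

import numpy as np

# Hypothetical data: hours studied and whether each student passed (1) or failed (0)
hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

# Fit a straight line y = b_0 + b_1 * x by least squares
b_1, b_0 = np.polyfit(hours_studied, passed, deg=1)

# "Probabilities" predicted by the line for very low and very high study hours
print(b_0 + b_1 * 0)   # below 0: a negative "probability"
print(b_0 + b_1 * 15)  # above 1: a "probability" greater than 1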

We saw that the output of a Linear Regression model does not provide the probabilities we need to predict whether a student passes the final exam. Enter Logistic Regression!

In Logistic Regression we are also looking to find coefficients for our features, but this time we are fitting a logistic curve to the data so that we can predict probabilities. Described below is an overview of how Logistic Regression works. Don’t worry if something does not make complete sense right away; we will dig into each of these steps in further detail in the remaining sections!

To predict the probability of a data sample belonging to a class, we:

  1. initialize all feature coefficients and the intercept to 0
  2. multiply each feature coefficient by its respective feature value, sum the results, and add the intercept to get what is known as the log-odds
  3. plug the log-odds into the sigmoid function to map the output to the range [0,1], giving us a probability

By comparing the predicted probabilities to the actual classes of our data points, we can evaluate how well our model makes predictions and use gradient descent to update the coefficients and find the best ones for our model.

To then make a final classification, we use a classification threshold to determine whether the data sample belongs to the positive class or the negative class.
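To make the whole loop concrete before we break it down, here is a minimal sketch of this procedure, assuming a made-up features matrix and labels array (not the Codecademy University data) and a fixed learning rate; the update rule shown is the standard gradient of the log-loss, not necessarily the exact implementation used in the exercises below:

import numpy as np

# Hypothetical data: one feature (hours studied) and binary pass/fail labels
features = np.array([[1.0], [2.0], [3.0], [4.0], [5.0],
                     [6.0], [7.0], [8.0], [9.0], [10.0]])
labels = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 1: initialize all feature coefficients and the intercept to 0
coefficients = np.zeros(features.shape[1])
intercept = 0.0
learning_rate = 0.01

for _ in range(10000):
    # Steps 2 and 3: compute the log-odds, then map them to probabilities
    probabilities = sigmoid(np.dot(features, coefficients) + intercept)
    # Gradient descent: nudge the coefficients and intercept using the log-loss gradient
    errors = probabilities - labels
    coefficients -= learning_rate * np.dot(features.T, errors) / len(labels)
    intercept -= learning_rate * errors.mean()

# Final classification with a 0.5 threshold
predictions = (sigmoid(np.dot(features, coefficients) + intercept) >= 0.5).astype(int)
print(predictions)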

Log-Odds

In Linear Regression we multiply the coefficients of our features by their respective feature values and add the intercept, resulting in our prediction, which can range from -∞ to +∞. In Logistic Regression, we make the same multiplication of feature coefficients and feature values and add the intercept, but instead of the prediction, we get what is called the log-odds.

The log-odds are another way of expressing the probability of a sample belonging to the positive class, or a student passing the exam. In probability, we calculate the odds of an event occurring as follows:

odds = p(event occurring)/p(event not occurring)

The odds tell us how many times more likely an event is to occur than not to occur. If a student will pass the exam with probability 0.7, they will fail with probability 1 - 0.7 = 0.3. We can then calculate the odds of passing as

odds of passing = 0.7/0.3 = 2.33

The log-odds are then simply the natural logarithm of the odds!

Log odds of passing = log(2.33) = 0.847
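The same calculation in numpy (np.log is the natural logarithm), just to confirm the arithmetic:

import numpy as np

p_pass = 0.7
odds = p_pass / (1 - p_pass)   # 2.33...
log_odds = np.log(odds)        # 0.847...
print(odds, log_odds)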

For our Logistic Regression model, however, we calculate the log-odds, represented by z below, by summing the products of each feature value and its respective coefficient and then adding the intercept. This allows us to map our feature values to a measure of how likely it is that a data sample belongs to the positive class.

z = b_0 + b_1*x_1 + … + b_n*x_n

  • b_0 is the intercept
  • b_1, b_2, … b_n are the coefficients of the features x_1, x_2, … x_n

This kind of multiplication and summing is known as a dot product.

We can perform a dot product using numpy's np.dot() method! Given a feature matrix features, a coefficient vector coefficients, and an intercept, we can calculate the log-odds in numpy as follows:

log_odds = np.dot(features, coefficients) + intercept

np.dot() will take each row, or student, in features and multiply each individual feature value by its respective coefficient in coefficients, summing the result, as shown below.

We then add in the intercept to get the log-odds!

import numpy as np
from exam import hours_studied, calculated_coefficients, intercept

# Create your log_odds() function here
def log_odds(features, coefficients, intercept):
    return np.dot(features, coefficients) + intercept

# Calculate the log-odds for the Codecademy University data here
calculated_log_odds = log_odds(hours_studied, calculated_coefficients, intercept)
print(calculated_log_odds)

OUTPUT

[[-1.76125712]
[-1.55447221]
[-1.3476873 ]
[-1.14090239]
[-0.93411748]
[-0.72733257]
[-0.52054766]
[-0.31376275]
[-0.10697784]
[ 0.09980707]
[ 0.30659198]
[ 0.51337689]
[ 0.7201618 ]
[ 0.92694671]
[ 1.13373162]
[ 1.34051653]
[ 1.54730144]
[ 1.75408635]
[ 1.96087126]
[ 2.16765617]]

Sigmoid Function

How does our Logistic Regression model produce its characteristic S-shaped curve? The answer is the Sigmoid Function.

The Sigmoid Function is a special case of the more general Logistic Function, which is where Logistic Regression gets its name. Why is the Sigmoid Function so important? By plugging the log-odds into the Sigmoid Function, defined below, we map the log-odds z to the range [0,1].

h(z) = 1 / (1+e^−z)
  • e^(-z) is the exponential function evaluated at -z, which can be written in numpy as np.exp(-z)

This enables our Logistic Regression model to output the probability of a sample belonging to the positive class, or in our case, a student passing the final exam!

import numpy as np
from exam import calculated_log_odds

# Create your sigmoid function here
def sigmoid(z):
    # 1 / (1 + e^(-z)) maps the log-odds to the range [0, 1]
    return 1 / (1 + np.exp(-z))

# Calculate the sigmoid of the log-odds here
probabilities = sigmoid(calculated_log_odds)
print(probabilities)

OUTPUT

[[0.14663296]
[0.17444128]
[0.20624873]
[0.24215472]
[0.28209011]
[0.32578035]
[0.37272418]
[0.42219656]
[0.47328102]
[0.52493108]
[0.57605318]
[0.62559776]
[0.67264265]
[0.71645543]
[0.7565269 ]
[0.79257487]
[0.82452363]
[0.85246747]
[0.87662721]
[0.89730719]]
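To turn these probabilities into final pass/fail predictions, we can apply the classification threshold mentioned earlier; a minimal sketch, assuming probabilities is the array printed above:

# Probability >= 0.5 -> positive class (pass, label 1); otherwise negative class (fail, label 0)
predictions = (probabilities >= 0.5).astype(int)
print(predictions)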
