Everything You Need To Know About Logistic Regression — Part 1
Logistic regression is used to estimate the probability of occurrence of an event based on multiple factors.
Need for Logistic Regression:
Linear regression is used to predict the values of a dependent variable from the independent variables, so we can explain the impact of a change in the independent variables on the dependent variable.
Problem : In some cases the dependent variable is binary [0 & 1], limited to just 2 classes, whereas the independent variables may take any number of values. Linear regression is designed to solve its problem by minimizing MSE, which is not a good fit for this problem.
Why Logistic Regression and not Logistic Classification ?
Logistic Regression returns a continuous value as output for the input values, which is later converted into 0 or 1 based on a threshold value.
Prerequisite Terminology for Logistic Regression:
Probability :
It is the ratio of the favourable outcomes to the total outcomes and returns the chances of occurrence of the favourable outcomes out of 1.
Probability = Favourable Outcomes / Total Outcomes
Probabilities range only from 0 to 1, but a linear model's predictions can fall outside that range.
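As a quick sketch of the probability formula above (the die is just an illustrative example):

```python
# Probability = favourable outcomes / total outcomes.
# Example: probability of rolling an even number on a fair six-sided die.
favourable = 3          # outcomes 2, 4, 6
total = 6
probability = favourable / total
print(probability)      # 0.5
```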
Odds :
It is the ratio of probability of occurrence of an event to the probability of non occurrence of an event.
Odds = p(occurrence) / (1 - p(occurrence))
The range of odds is awkward, as it goes from 0 to infinity: any value below 1 means odds against the event, while values from 1 towards infinity mean better and better odds in favour of it.
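A small sketch of the conversion from probability to odds:

```python
# Odds = p / (1 - p): how likely the event is relative to it not happening.
def odds(p: float) -> float:
    return p / (1 - p)

print(odds(0.5))    # 1.0 -> "even" odds
print(odds(0.8))    # roughly 4-to-1 in favour
print(odds(0.2))    # roughly 1-to-4 against
```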
Log Odds :
Log odds is the natural log of the odds: log odds = ln(p / (1 - p)). It solves the problem of fitting a linear model to probabilities. A related quantity is the odds ratio, the ratio of two odds being compared:
Odds Ratio = Odds Y / Odds X
Example -
If OR > 1, i.e. O(Y) > O(X) → group Y has better odds than group X.
If OR < 1, i.e. O(Y) < O(X) → group X has better odds than group Y.
If OR = 1, i.e. O(Y) = O(X) → both are the same.
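The comparison above can be sketched with hypothetical group probabilities (the 0.8 and 0.5 values are made up for illustration):

```python
# Odds ratio between two groups: OR = odds(Y) / odds(X).
def odds(p: float) -> float:
    return p / (1 - p)

p_y = 0.8   # hypothetical probability of the event in group Y
p_x = 0.5   # hypothetical probability of the event in group X

OR = odds(p_y) / odds(p_x)   # OR > 1 here, so group Y has the better odds
print(OR)
```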
Why we use Log odds instead of Probability or Odds?
The problem with using probability is that it does not represent a constant effect of X (the independent variables), so we move to odds; but the range of odds is asymmetric, with 0 to 1 covering odds against and 1 to infinity covering odds in favour. Hence we use log odds, whose range is symmetric and linear and can easily represent the effect of X.
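The symmetry argument can be checked numerically: odds squeeze "against" into (0, 1) but stretch "in favour" over (1, infinity), while log odds are symmetric about 0.

```python
import math

# Log odds (the logit): ln(p / (1 - p)).
def log_odds(p: float) -> float:
    return math.log(p / (1 - p))

print(log_odds(0.5))                 # 0.0: fifty-fifty sits at the centre
print(log_odds(0.9), log_odds(0.1))  # same magnitude, opposite signs
```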
LOGISTIC REGRESSION :
It is used to analyze the relationship between a dichotomous dependent variable and categorical or numeric independent variables. Logistic Regression combines all the independent variables to estimate the probability that an event will occur.
Logistic Regression follows the Sigmoid function:
sigmoid(z) = 1 / (1 + e^(-z))
For a single feature:
z = b0 + b1 * x
For multiple features:
z = b0 + b1*x1 + b2*x2 + ... + bn*xn
which can also be written as:
z = W^T X + b
where W^T X is the weight vector multiplied by all the independent variables, added to the intercept b.
This is then passed to the sigmoid function, which gives the final prediction as a probability.
Then we set a threshold: data above it is classified as 1 and data below it as 0.
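The whole prediction pipeline can be sketched end to end; the weights, intercept, and feature values below are hypothetical, chosen only to show the mechanics.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

w = [0.8, -0.4]   # hypothetical weights
b = 0.1           # hypothetical intercept
x = [2.0, 1.0]    # hypothetical feature values

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # z = W^T X + b
p = sigmoid(z)                                 # probability between 0 and 1
label = 1 if p >= 0.5 else 0                   # apply the 0.5 threshold
```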
COST FUNCTION OF LOGISTIC REGRESSION
- It represents the error of a model in machine learning.
- It shows how well the model's predictions match the given dataset.
As the cost function decreases, accuracy increases.
Minimizing MSE works for linear regression, as the output values are continuous, but in logistic regression the output variable can only be 0 or 1.
If we plug the sigmoid into the MSE cost, the resulting function is non-convex: trying to minimize it, we can end up in a local minimum and never reach the global minimum.
Therefore we use the log loss (also called binary cross-entropy) to calculate the cost for logistic regression:
Cost(p, y) = -[y * log(p) + (1 - y) * log(1 - p)]
where y is the true label (0 or 1) and p is the predicted probability.
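Assuming the cost meant here is the standard log loss (binary cross-entropy), a minimal sketch for a single example:

```python
import math

# Log loss for one example: -[y*log(p) + (1-y)*log(1-p)].
def log_loss(y: int, p: float) -> float:
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction costs little;
# a confident wrong prediction costs a lot.
print(log_loss(1, 0.9))   # small cost
print(log_loss(1, 0.1))   # large cost
```

This asymmetry is what drives the optimizer toward probabilities that match the true labels, and unlike MSE with a sigmoid, this cost is convex in the weights.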