What is Logistic Regression in Statistics for Data Science?

Dr Wajid Khan
Mar 28, 2025 · 24 mins read

Imagine you’re trying to guess something simple—like whether your friend will bring sweets to school or if it’ll rain this afternoon. Numbers can help, but not with a straight answer like “50 sweets.” You need a “yes” or “no”! That’s where logistic regression swoops in—a brilliant tool in statistics that predicts binary outcomes by giving you a chance between 0 and 1. It’s like flipping a magic coin that tells you the odds of something happening!

In data science, logistic regression is a star for answering yes-or-no questions. Will a toy sell? Will a plant grow? It’s perfect when you want a clear choice, not a number. This guide is for students just starting out—full of easy explanations, fun examples like guessing picnic weather, and questions to spark your curiosity. Let’s dive into logistic regression and see how it turns clues into clever guesses!

Definition

Logistic regression is a way to predict a yes-or-no answer (or other categories) using numbers you already know. Unlike linear regression, which guesses numbers like sales or height, logistic regression tackles binary outcomes—things with just two choices, like “yes” or “no”. It gives you a probability (a number between 0 and 1) that something will happen, then helps you decide which side it falls on.

Imagine you’re wondering if a friend will come to your party. You might use clues like how far they live or if it’s raining.

Logistic regression takes those clues (called independent variables) and predicts the dependent variable—will they come (1) or not (0)?

It uses a special curvy line called the sigmoid function to turn any number into a probability between 0 and 1.

In data science, it’s super useful. Shops might predict if a customer will buy, doctors if a patient will recover, or you if a kite will fly. It’s different from counting because it deals with chances, not amounts. Computers make it exciting—tools like Python or R can crunch the numbers fast.

You spot logistic regression all around—guessing if it’ll be sunny or if a game ends in a win. It’s like a yes-or-no detective in your pocket!

Logistic regression predicts binary outcomes (yes/no) using input variables. Here’s how it works:

Probability (p)
1.0 |               ______-------
    |            __--
    |          _/
0.5 |--------*------------------ Decision Boundary
    |      _/
    |    _/
    |__--
0.0 |___________________________► Predictor (X)
     Low       Medium       High

Key Elements:

  1. Sigmoid Curve: The S-shaped line that maps any input to 0-1 probability
  2. Decision Boundary:
    • At p=0.5 (where the * is located)
    • Above: Predict “Yes” (class 1)
    • Below: Predict “No” (class 0)
  3. X-axis: Your predictor (e.g., distance to party)
  4. Y-axis: Probability of outcome (e.g., friend attending)

Example Interpretation:

  • At Low X: p ≈ 0.2 → “No” (friend unlikely to come)
  • At Medium X: p ≈ 0.5 → Uncertain
  • At High X: p ≈ 0.8 → “Yes” (friend likely to come)

Equation Behind the Curve:

p = 1 / (1 + e^(-z))  
where z = b₀ + b₁X
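
To see this equation in action, here is a minimal Python sketch of the sigmoid and the p = 0.5 decision rule. The coefficients b₀ = -3 and b₁ = 1.5 are made-up values, chosen only so the numbers are easy to follow.

import numpy as np

def sigmoid(z):
    """Squash any raw score z into a probability between 0 and 1."""
    return 1 / (1 + np.exp(-z))

b0, b1 = -3.0, 1.5           # illustrative coefficients (not fitted to real data)

for x in [0.5, 2.0, 4.0]:    # low, medium, high values of the predictor
    z = b0 + b1 * x          # raw score: z = b0 + b1 * X
    p = sigmoid(z)           # probability of "yes"
    label = "Yes" if p >= 0.5 else "No"   # decision boundary at p = 0.5
    print(f"X = {x}: z = {z:+.2f}, p = {p:.2f} -> {label}")

# X = 0.5: z = -2.25, p = 0.10 -> No
# X = 2.0: z = +0.00, p = 0.50 -> Yes (right on the boundary)
# X = 4.0: z = +3.00, p = 0.95 -> Yes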

Questions to Think About

  1. What makes logistic regression different from linear regression?
    It predicts yes-or-no answers, not numbers like height or sales.
  2. Why is probability important here?
    It gives you a chance between 0 and 1, so you can pick “yes” or “no” smartly.

How Logistic Regression Works

Logistic regression starts with data and turns it into a yes-or-no guess. Let’s break it down!

Experiments and Data Requirements

To perform logistic regression, you need two key components:

  1. Dependent Variable (Outcome)
    • This is the yes-or-no question you’re trying to answer, represented as binary data (1 = “yes”, 0 = “no”).
    • Examples:
      • “Did the customer buy the product?” (1/0)
      • “Will it rain tomorrow?” (1/0)
      • “Did the student pass the exam?” (1/0)
    • In statistics, this is called a binary response variable.
  2. Independent Variables (Predictors)
    • These are the measurable factors you believe might influence the outcome.
    • They can be:
      • Continuous (e.g., temperature, spending amount)
      • Categorical (e.g., cloud type, customer gender)
      • Ordinal (e.g., rating scales)
    • Example predictors for “Will it rain?”:
      • Temperature (°C)
      • Humidity (%)
      • Cloud cover (oktas)
      • Barometric pressure (hPa)

Data Structure Example: For a store analyzing toy purchases:

| Spending (£) | Purchase (Y/N) |
|--------------|----------------|
| 10           | 0              |
| 20           | 1              |
| 30           | 1              |
| 15           | 0              |
| 25           | 1              |

Visualization 1: Raw Data Plot

Purchase? ▲
          |
        1 |           ● (25,1)  ● (30,1)
          |       ● (20,1)
          |
        0 | ● (10,0)  ● (15,0)
          |
          +-----+-----+-----+-----► Spending (£)
            10   15   20   25   30
  • ● = Actual observed purchases (Y=1) or non-purchases (Y=0)
  • Shows the binary nature of the outcome at different spending levels

Visualization 2: Probability Curve

Probability ▲
of Purchase |
        1.0 |              ● (30,0.85)
            |            /
            |          ● (25,0.65)
        0.5 |-------*---------------- [Decision Boundary]
            |     ● (20,0.5) 
            |   /
            | ● (15,0.2)
        0.0 +-----------------------► Spending (£)
            10   15   20   25   30

Key Features:

  • S-curve shows how probability changes with spending
  • ‘*’ = Decision boundary at p=0.5
  • ● = Predicted probabilities at key spending levels
  • Dashed line shows the automatic classification threshold

Interpretation Guide:

  1. Below £20: Low probability (<50%) → Predict “No Purchase”
  2. Around £20: Transition zone (p ≈ 0.5) → Uncertain
  3. Above £20: High probability (>50%) → Predict “Purchase”
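
If you want to reproduce this example on a computer, here is a minimal scikit-learn sketch fitted to the five spending observations from the table above. With only five data points the fitted probabilities will not match the illustrative curve exactly, so treat the printed numbers as approximate.

import numpy as np
from sklearn.linear_model import LogisticRegression

# The five observations from the spending table above
spending = np.array([10, 20, 30, 15, 25]).reshape(-1, 1)   # Spending (£)
purchase = np.array([0, 1, 1, 0, 1])                        # Purchase? (1 = yes, 0 = no)

model = LogisticRegression()
model.fit(spending, purchase)

# Predicted probability of a purchase at a few spending levels
for amount in [15, 20, 25, 30]:
    p = model.predict_proba([[amount]])[0, 1]
    decision = "Purchase" if p >= 0.5 else "No Purchase"
    print(f"£{amount}: P(purchase) = {p:.2f} -> {decision}")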

Real-World Application

A weather app uses similar logic with:

  • Dependent: “Rain tomorrow?” (1/0)
  • Independent:
    • Today’s rainfall (mm)
    • Wind speed (mph)
    • Temperature difference from seasonal average (°C)

Rain Probability ▲
                 |
              1.0|                 ● (High rain+wind)
                 |                /
                 |               /
              0.8|              ● "78% chance" prediction
                 |            _/
              0.5|---------~*~---------------- [Alert Threshold]
                 |      _/  
              0.2|    ● (Normal conditions)
                 |  _/
              0.0|_/________________________► Weather Factors
                  Low    Medium    High

How to Read This:

  1. Each ● represents a weather condition combination
  2. X-axis combines all predictors (rain+wind+temp)
  3. Curve shows how probability changes with conditions
  4. ~*~ marks the app’s alert threshold (drawn at p = 0.5 here)
  5. The labelled point near p = 0.8 is the “78% chance” prediction

Decision Process:

  • Conditions → Model → Probability → Alert
  • Example: 15mm rain + 20mph wind + -5°C temp → 78% → “High chance of rain”

Data Quality Notes:

  1. Ensure enough “yes” and “no” cases (balanced data)
  2. Remove irrelevant predictors (e.g., customer shoe size for toy purchases)
  3. Check for multicollinearity if using multiple predictors
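
The first and third notes are easy to check in code. Here is a small sketch, using made-up weather numbers, that counts the “yes” and “no” cases and looks at the correlation between two predictors; a correlation close to 1 or -1 would hint at multicollinearity.

import numpy as np

# Hypothetical data: each row is [temperature °C, humidity %], outcome is rain (1) or not (0)
X = np.array([[18, 60], [22, 80], [25, 85], [30, 40], [15, 90], [28, 35]])
y = np.array([0, 1, 1, 0, 1, 0])

# 1. Class balance: roughly equal numbers of "yes" and "no" cases is healthiest
print("Class counts (no, yes):", np.bincount(y))        # [3 3] here, nicely balanced

# 3. Multicollinearity hint: correlation between the two predictors
corr = np.corrcoef(X, rowvar=False)[0, 1]
print(f"Correlation between temperature and humidity: {corr:.2f}")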

Turning Numbers into Probabilities

What’s Linear Regression in Plain English?

Linear regression is like drawing a straight line through a scatter of points to make predictions. Imagine you’re guessing someone’s house price based on its size. You might notice that bigger houses cost more, so you draw a line to show that pattern. The equation for this line is:

y = b₀ + b₁x

  • y (the prediction): This is what you’re trying to figure out—like the house price or a test score. It’s the number you get at the end.
  • x (the clue or input): This is the thing you already know—like the house size in square feet or hours studied. It’s your starting point.
  • b₀ (the starting point or intercept): This is where the line begins when x is zero. Think of it as the base price of a house with no size (maybe the land value alone) or the score you’d get without studying.
  • b₁ (the slope or rate of change): This tells you how much y changes when x changes by 1. If b₁ = 2, then for every extra square foot, the house price goes up by $2. It’s the steepness of your line.

For example, if b₀ = 50 (base price in thousands) and b₁ = 0.1 (price per square foot), then for a 1,000-square-foot house (x = 1000):

  • y = 50 + 0.1 × 1000 = 50 + 100 = 150 (a $150,000 house).

But here’s the catch: y can be anything—negative, huge, whatever. If x = -500 (impossible size, but math doesn’t care), y = 50 + 0.1 × -500 = 50 - 50 = 0, and an even smaller x would push y below zero. For yes-or-no questions (e.g., “Will it rain?”), numbers like -5 or 150 don’t work as probabilities—they need to stay between 0 and 1.
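
Here is a quick sketch of that problem in Python: the straight line happily produces values well outside the 0-to-1 range a probability needs (b₀ = 50 and b₁ = 0.1 are the same illustrative numbers as above).

b0, b1 = 50, 0.1                      # intercept and slope from the house-price example

for x in [-1000, 1000, 5000]:
    y = b0 + b1 * x                   # straight-line prediction
    print(f"x = {x}: y = {y}")

# x = -1000: y = -50.0   (negative, impossible as a probability)
# x = 1000:  y = 150.0   (fine as a house price, useless as a probability)
# x = 5000:  y = 550.0   (way above 1)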

Logistic Regression: Fixing the Problem

Logistic regression takes that straight-line idea and bends it into something better for yes-or-no predictions. It starts with the same setup: z = b₀ + b₁x. Here’s what happens:

  • z (the raw score): This is like a “score” based on your clue x. Bigger z means more likely “yes”; smaller z means more likely “no.” It’s still a straight-line combo of b₀ (starting score) and b₁ × x (how much the clue matters).
  • Then, the sigmoid function steps in: p = 1 / (1 + e⁻ᶻ).

This “sigmoid” is a fancy S-shaped curve that squashes z into a probability (p) between 0 and 1:

  • e is a special number (about 2.718) that makes the math work smoothly.
  • -z flips the score so positive z pushes p toward 1, negative z toward 0.
  • The whole 1 / (1 + e⁻ᶻ) formula ensures p never goes below 0 or above 1.

Imagine x is “hours of clouds” and we’re predicting rain. If b₀ = -2 (a low base chance) and b₁ = 1 (each cloudy hour matters a lot), then:

  • 2 hours of clouds: z = -2 + 1 × 2 = 0, so p = 1 / (1 + e⁰) = 1 / 2 = 0.5 (50% chance).
  • 4 hours: z = -2 + 1 × 4 = 2, so p = 1 / (1 + e⁻²) ≈ 0.88 (88% chance).

If p = 0.7, there’s a 70% chance of “yes” (rain!). You pick a cutoff: above 0.5 is “yes,” below is “no.” It’s like a weather app turning cloud data into a rain forecast.
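
You can check the cloudy-hours arithmetic yourself with a few lines of Python, using the same b₀ = -2 and b₁ = 1 from the example above:

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

b0, b1 = -2, 1                       # coefficients from the cloudy-hours example

for hours in [2, 4]:
    z = b0 + b1 * hours              # raw score
    p = sigmoid(z)                   # probability of rain
    print(f"{hours} cloudy hours: z = {z}, p = {p:.2f}")

# 2 cloudy hours: z = 0, p = 0.50
# 4 cloudy hours: z = 2, p = 0.88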

Questions to Think About

  1. Why can’t we use a straight line like linear regression?
    The line y = b₀ + b₁x can spit out wild numbers—like -10 or 200—because it’s just a slope and a starting point with no limits. Probabilities need to be between 0 and 1 (0% to 100%). A straight line doesn’t care about that, but the sigmoid curve in p = 1 / (1 + e⁻ᶻ) forces the answer to fit.

  2. What does the sigmoid function do?
    It takes the raw score z (from b₀ + b₁x) and bends it into a probability. Big positive z (e.g., 5) gets you close to 1 (almost certain “yes”), big negative z (e.g., -5) gets you close to 0 (almost certain “no”), and z = 0 lands at 0.5 (a coin flip). It’s the perfect tool for yes-or-no answers.

Basic Principles of Logistic Regression

Let’s explore the simple ideas that power logistic regression, with examples to try!

From Clues to Chances

You use clues to guess the chance of “yes”. Imagine predicting if a toy breaks based on how often it’s used. If it’s used 5 times, and your model says z = -2 + 0.8 × 5 = 2, the sigmoid function gives p = 1 / (1 + e⁻²) ≈ 0.88. That’s an 88% chance it breaks—probably “yes”!

Odds and Logits

Logistic regression works with odds—the chance of “yes” divided by the chance of “no”. If p = 0.75, odds are 0.75 / 0.25 = 3 (3-to-1). It then takes the natural log of the odds (called the logit) to make a straight-line equation possible [4]. This logit score gets turned back into a probability with the sigmoid function.
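
Here is the same p = 0.75 example as a tiny Python sketch, going from probability to odds to logit and back again with the sigmoid:

import math

p = 0.75                               # chance of "yes"
odds = p / (1 - p)                     # 0.75 / 0.25 = 3.0, i.e. 3-to-1
logit = math.log(odds)                 # natural log of the odds, about 1.10

print(f"p = {p}, odds = {odds:.1f}, logit = {logit:.2f}")

# The sigmoid undoes the logit and gets the probability back
p_back = 1 / (1 + math.exp(-logit))
print(f"sigmoid(logit) = {p_back:.2f}")   # 0.75 again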

Decision Boundary

The decision boundary splits “yes” from “no”. At p = 0.5, the logit is 0. If your clue makes z ≥ 0, it’s “yes”; if z < 0, it’s “no”. It’s like drawing a line on a treasure map—cross it, and you’re in “yes” land!

Questions to Think About

  1. What are odds in simple terms?
    It’s how many times “yes” is likely compared to “no”—like 3-to-1 for a win.
  2. Why use a decision boundary?
    It’s the line where you switch from “no” to “yes” based on the chance.

Finding the Best Fit

How does logistic regression pick the right curve? Let’s dig in!

Maximum Likelihood

Unlike linear regression’s least squares, logistic regression uses maximum likelihood to find the best fit. It tweaks b₀ and b₁ to make the observed yeses and nos most likely under the model [4]. Think of it as tuning a radio until the signal’s clear—here, it’s the probability signal!
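
To make “most likely under the model” concrete, here is a rough sketch that scores a few candidate (b₀, b₁) pairs on a tiny made-up dataset by their log-likelihood. Real software searches for the best pair automatically with an optimiser; this hand-picked comparison is just for intuition.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def log_likelihood(b0, b1, x, y):
    """Total log-probability the model gives to the yeses and nos we actually saw."""
    p = sigmoid(b0 + b1 * x)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Tiny illustrative dataset: hours studied vs pass (1) / fail (0)
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 0, 0, 1, 1, 1])

# Compare some hand-picked candidates; the best fit has the highest (least negative) score
for b0, b1 in [(0.0, 0.0), (-2.0, 0.5), (-5.0, 1.4)]:
    ll = log_likelihood(b0, b1, x, y)
    print(f"b0 = {b0:+.1f}, b1 = {b1:+.1f} -> log-likelihood = {ll:.2f}")

# The third pair follows the data most closely, so it gets the highest score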

Sigmoid in Action

The sigmoid function ensures every guess stays between 0 and 1. For a clue like “hours studied”, it curves the line so big numbers don’t go wild—they just get closer to 1 (yes) or 0 (no). Computers test lots of curves to find the best one.

Questions to Think About

  1. Why not use least squares like linear regression?
    Yes-or-no data doesn’t fit straight lines—maximum likelihood suits probabilities better.
  2. How does the sigmoid help?
    It keeps guesses sensible, always between 0 and 1, no matter the clue.

Assumptions of Logistic Regression

Logistic regression is a powerful tool for yes-or-no predictions, but it works best when certain rules—or assumptions—are followed. Think of these as the recipe for a perfect cake: skip a step, and it might flop! Let’s break them down.

Binary Outcome

The dependent variable (what you’re predicting) must be binary—only two options, like “yes” or “no,” “pass” or “fail,” “rain” or “no rain.” We call it binary because it’s a 0-or-1 choice. For example, predicting if a student passes (1) or fails (0) works great. But if you’ve got three choices—like “pass,” “fail,” or “maybe”—you’d need a fancier version called multinomial logistic regression.

Why it matters: The equation p = 1 / (1 + e⁻ᶻ) (say “pee equals one over one plus ‘e’ to the minus ‘z’”) is like a light switch with only two settings: on (1) or off (0).

It’s built to tell you the chance of “yes” (like “Will it rain?”) versus “no,” giving a number between 0 (no way) and 1 (definitely). For example, if z (the “zee” score from clues like cloudiness) is big and positive, p might be 0.9—90% chance of “yes.”

If z is negative, p might be 0.2—20% chance. But if you ask it something with three options—like “rain,” “sun,” or “clouds”—it’s like asking a two-way switch to pick a third setting.

It can’t! The math gets confused because it’s only designed to split the chance between two things, not more. For extra options, you’d need a different tool, like multinomial logistic regression, which is like a switch with more positions.

Linear Logits

The logit—a fancy term for the log of the odds—needs to form a straight line when plotted against the independent variables (your clues, like hours studied). The odds are just the chance of “yes” divided by the chance of “no” (e.g., 3-to-1 means three times more likely “yes”). The equation starts as z = b₀ + b₁x (“zee equals bee-zero plus bee-one times ex”):

  • b₀ (“bee-zero”): The starting point—what z is when x is zero.
  • b₁ (“bee-one”): The slope—how much z changes per unit of x.
  • x (“ex”): The clue—your input, like study time.

Then, the logit is log(p / (1 - p)) = b₀ + b₁x (“log of pee over one minus pee”). The raw data (like test scores) might wiggle, but this logit stays straight. Imagine a seesaw: as study time (x) goes up, the logit tilts predictably.

Example: If b₀ = -2 and b₁ = 1, then 2 hours of study gives z = -2 + 1 × 2 = 0. The logit is balanced—equal odds.

No Big Outliers

Outliers are wild data points—like a kid studying 100 hours for a quiz when everyone else does 1–5 hours. These can yank the sigmoid curve (that S-shape from p = 1 / (1 + e⁻ᶻ)) off track, making predictions messy. Your data should be mostly reasonable, like typical study times or temperatures.

Why it matters: One crazy number can trick the model into overreacting. If someone with a £10,000 credit card balance defaults but most balances are under £3,000, the curve might overpredict defaults.

If these assumptions break, your predictions could go haywire—like a cake with salt instead of sugar!

Questions to Think About

  1. What happens if the outcome isn’t yes-or-no?
    Logistic regression is like a two-slot toaster—it’s built for bread (0 or 1), not bagels or muffins. For more options, switch to multinomial logistic regression.
  2. Why care about outliers?
    They’re like a loud kid in class—everyone notices them, and the teacher (the model) gets distracted, skewing the lesson (predictions).

Applying Logistic Regression

Let’s dive into a fun example: predicting whether a customer defaults on their credit card bill (0 = no default, 1 = default) based on their balance!

Step-by-Step Example

Data: £500 balance, no default (0); £1,500, yes (1); £2,500, yes (1).

  1. Set Up: Use balance (x) to predict default (the dependent variable, 0 or 1).
  2. Fit the Model: A computer crunches the numbers and might spit out z = -10.57 + 0.0055 × x (“zee equals minus ten-point-five-seven plus zero-point-zero-zero-five-five times ex”).
    • b₀ = -10.57 (“bee-zero”): A low starting score—default is unlikely with no balance.
    • b₁ = 0.0055 (“bee-one”): Each £1 of balance nudges z up by 0.0055.
  3. Predict: For a £2,000 balance:
    • z = -10.57 + 0.0055 × 2000 = -10.57 + 11 = 0.43.
    • p = 1 / (1 + e⁻⁰·⁴³) ≈ 0.61 (“pee equals one over one plus ee to the minus zero-point-four-three”).
    • Result: 61% chance of default. Since p > 0.5, predict “yes” (default).
  4. Check: Test it on real data. If £2,000 customers usually default and the model says “yes,” it’s working!

Try it yourself: Will a friend share sweets based on how many they have? If 5 sweets = no (0), 10 = yes (1), 15 = yes (1), fit a line like z = -5 + 0.5 × sweets. Test 8 sweets: z = -5 + 0.5 × 8 = -1, p ≈ 0.27—probably “no”!
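
Both of these worked examples are easy to verify in Python, using the exact coefficients given above:

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Credit card model from the step-by-step example: z = -10.57 + 0.0055 * balance
balance = 2000
z = -10.57 + 0.0055 * balance
p = sigmoid(z)
print(f"£{balance}: z = {z:.2f}, p = {p:.2f}")          # z = 0.43, p = 0.61 -> predict "yes"

# Sweet-sharing model from the try-it-yourself exercise: z = -5 + 0.5 * sweets
sweets = 8
z = -5 + 0.5 * sweets
print(f"{sweets} sweets: z = {z}, p = {sigmoid(z):.2f}")   # z = -1.0, p = 0.27 -> probably "no"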

Questions to Think About

  1. What’s the probability for £3,000?
    • z = -10.57 + 0.0055 × 3000 = -10.57 + 16.5 = 5.93.
    • p = 1 / (1 + e⁻⁵·⁹³) ≈ 0.997 (“pee equals one over one plus ee to the minus five-point-nine-three”).
    • Result: 99.7% chance of “yes”—almost certain default!
  2. How do you know it’s good?
    Compare predictions to reality. If “yes” guesses match actual defaults (e.g., £2,500 was 1), it’s a keeper!

Importance of Logistic Regression

Logistic regression is a data science superstar! It’s simple, fast, and nails binary questions—like “Will they buy?” or “Are they sick?” Shops use it to predict purchases (e.g., £50 cart = 80% chance), doctors spot risks (e.g., high fever = 90% chance of illness), and you could guess game wins (e.g., 3 goals = 70% victory). It sticks to data, not gut feelings, as stats guru Edwin Jaynes praised in 2003 [4].

Jaynes said tools like this cut through guesswork mess [4]. From health to fun, logistic regression turns numbers into sharp yes-or-no calls, guiding smart choices daily.

Questions to Think About

  1. Why is it so popular?
    It’s easy to use and rocks at binary puzzles—two-option answers are everywhere!
  2. How does it help daily life?
    It predicts rain (60% = umbrella?), wins (80% = bet?), or risks (95% = see a doc!), making decisions clearer.

Conclusion

Logistic regression is your yes-or-no buddy in data science, guessing outcomes with a curvy sigmoid twist. It takes clues, turns them into probabilities, and picks “yes” or “no” using maximum likelihood. From bill payments to party plans, it’s a fun way to crack binary mysteries. Grab some data, curve a line, and become a prediction whiz!


References

  1. Navarro, D. (2019). Learning Statistics with R. Retrieved from https://learningstatisticswithr.com/.
  2. Khan Academy. (2025). Statistics and Probability. Retrieved from https://www.khanacademy.org/math/statistics-probability/.
  3. W3Schools. (2025). Python Tutorial. Retrieved from https://www.w3schools.com/python/.
  4. Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.

Activities for Students

Try these fun games to learn logistic regression:

  1. Rain Guess Game
    Task: Track temperature and if it rains for 5 days.
    Steps: Guess rain (1) or no (0) based on temp, make a rule (e.g., >20°C = no).
    Outcome: See if your rule works!

  2. Sweet Sharing Test
    Task: Note how many sweets friends have and if they share (1 or 0).
    Steps: Predict sharing with a 0.5 cut-off, check real shares.
    Outcome: Test your yes-or-no skill!

  3. Toy Break Challenge
    Task: Use play times (e.g., 3, 5, 10) and breaks (0 or 1).
    Steps: Guess breaks with a simple curve, see if it fits.
    Outcome: Play with probabilities!

  4. Credit Card Play
    Task: Use the example (£500, 0; £1500, 1; £2500, 1).
    Steps: Predict for £1000 with z = -10.57 + 0.0055x.
    Outcome: Practise sigmoid maths!

  5. Yes-or-No Story
    Task: Write 150 words about predicting something fun (e.g., picnic rain).
    Steps: Imagine clues, explain your guess.
    Outcome: Get creative with stats!

Additional Reading

  • Khan Academy, Logistic Regression
    Link: khanacademy.org/math/statistics
    Description: Free, fun lessons on yes-or-no guesses.

  • Math is Fun, Probability
    Link: mathsisfun.com/data/probability
    Description: Simple chance ideas for beginners.

  • W3Schools, Python Basics
    Link: w3schools.com/python
    Description: Code logistic regression easily!

Visualisation

P=1   |              ______-------
      |           __--
      |         _/
P=0.5 |-------*------------------ [Decision Boundary]
      |     _/
      |   _/
      |__--
P=0   |__________________________► X

Logistic regression’s S-curve (sigmoid) maps X to probabilities (0-1). The * marks where P(Y=1)=0.5. Steeper curves mean stronger predictor effects. Dashes show how probabilities change as X increases.

Python

Here’s a simple, self-contained logistic regression example in Python with clear comments and sample output:

# Import essential libraries
from sklearn.linear_model import LogisticRegression
import numpy as np

# Sample data: Hours studied vs Pass (1) or Fail (0)
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)  # Study hours
y = np.array([0, 0, 0, 1, 1, 1])                 # Exam results

# Create and train the model
model = LogisticRegression()
model.fit(X, y)

# Make predictions
hours_to_predict = np.array([2.5, 4.5]).reshape(-1, 1)
predictions = model.predict(hours_to_predict)
probabilities = model.predict_proba(hours_to_predict)

# Output results
print(f"Predictions: {predictions}")  # [0 1] (Fail, Pass)
print(f"Probabilities:\n{probabilities}")

# Predictions: [0 1]
# Probabilities:
# [[0.734 0.266]  # 73.4% fail, 26.6% pass chance for 2.5 hours
# [0.261 0.739]] # 26.1% fail, 73.9% pass chance for 4.5 hours

Code Explanation

  1. Data Preparation:
    • X: Study hours (1-6)
    • y: Binary outcomes (0=Fail, 1=Pass)
  2. Model Training:
    • Creates a logistic regression model
    • Fits to the study hours vs results data
  3. Predictions:
    • Predicts outcomes for 2.5 and 4.5 study hours
    • Shows both class predictions (0/1) and probabilities
  4. Key Insight:
    • The “decision boundary” is around 3.5 hours
    • Each extra hour increases pass probability

Key Features:

  • Minimal dependencies (only numpy/scikit-learn)
  • Clear binary classification example
  • Shows both hard predictions and probabilities
  • Complete from data to results in <15 lines
  • Works in any Python environment

This example demonstrates:

  1. A realistic student exam scenario
  2. The probability outputs that make logistic regression special
  3. How to interpret the results
  4. All in a compact, reproducible format