Logistic Regression — Binary Classification Explained for Engineering Students




Last Updated: March 2026

📌 Key Takeaways

  • Definition: Logistic regression is a supervised ML algorithm for binary classification — predicting one of two outcomes.
  • Core function: Sigmoid — σ(z) = 1 / (1 + e⁻ᶻ) — maps output to a probability between 0 and 1.
  • Decision boundary: If predicted probability ≥ 0.5 → class 1; else → class 0.
  • Cost function: Binary Cross-Entropy (Log Loss) — not MSE.
  • Despite the name, logistic regression is a classification algorithm, not a regression algorithm.

1. Definition & Analogy

Logistic regression is a supervised machine learning algorithm used for binary classification — predicting whether an input belongs to one of exactly two categories.

Examples of binary classification problems: Is this email spam or not spam? Will this patient develop diabetes (yes/no)? Will this loan applicant default (yes/no)? Will a student pass or fail?

Despite its name, logistic regression is a classification algorithm. It does not predict a continuous value — it predicts the probability that an input belongs to the positive class (class 1), and then assigns a class label based on that probability.

Analogy — The Exam Pass/Fail Predictor

Imagine you want to predict whether a student will pass (1) or fail (0) based on their study hours. You cannot use linear regression directly — it could predict values like 1.8 or -0.3, which have no meaning as a pass/fail label. Logistic regression solves this by squashing the output through the sigmoid function, always producing a probability between 0 and 1. If probability ≥ 0.5 → predict pass; if < 0.5 → predict fail.

2. The Sigmoid Function

The sigmoid function is the mathematical heart of logistic regression. It maps any real number to a value strictly between 0 and 1:

σ(z) = 1 / (1 + e⁻ᶻ)

  • Very large positive z (e.g., +10): σ(z) ≈ 1.0 → almost certainly class 1
  • z = 0: σ(z) = 0.5 → the decision boundary (equal probability for both classes)
  • Very large negative z (e.g., −10): σ(z) ≈ 0.0 → almost certainly class 0

Key properties of the sigmoid: output is always between 0 and 1; it is differentiable everywhere (essential for gradient descent); it has a characteristic S-shaped curve; σ(0) = 0.5 exactly.
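These properties are easy to verify numerically. A minimal sigmoid in NumPy:

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 exactly — the decision boundary
print(sigmoid(10))   # ≈ 0.99995 — almost certainly class 1
print(sigmoid(-10))  # ≈ 0.0000454 — almost certainly class 0
```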

3. Hypothesis & Decision Boundary

The logistic regression model first computes a linear combination of inputs (exactly like linear regression):

z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ

This value z is then passed through the sigmoid function to produce a probability:

ŷ = P(y=1 | x) = σ(z) = 1 / (1 + e⁻ᶻ)

The decision boundary is the threshold at which we switch our prediction from class 0 to class 1. The standard threshold is 0.5:

  • If ŷ ≥ 0.5 → predict class 1 (positive)
  • If ŷ < 0.5 → predict class 0 (negative)

The decision boundary corresponds to z = 0, or: β₀ + β₁x₁ + β₂x₂ + … = 0. This is a straight line (or hyperplane in multiple dimensions). The threshold can be adjusted — for medical diagnosis, you might use 0.3 to reduce false negatives (missing a disease is worse than a false alarm).
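Thresholding takes only a couple of lines; the 0.3 value below is just an illustration of a lowered, medical-style threshold:

```python
import numpy as np

def predict(probs, threshold=0.5):
    """Convert predicted probabilities to class labels at a given threshold."""
    return (np.asarray(probs) >= threshold).astype(int)

probs = np.array([0.15, 0.40, 0.62, 0.91])
print(predict(probs))                 # default 0.5 threshold -> [0 0 1 1]
print(predict(probs, threshold=0.3))  # lowered threshold -> [0 1 1 1]
```

Note how lowering the threshold flips the 0.40 case to class 1: fewer false negatives, at the cost of more false alarms.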

4. Cost Function — Binary Cross-Entropy

Logistic regression cannot use Mean Squared Error (MSE) as its cost function — when the sigmoid is applied, MSE becomes non-convex (multiple local minima), making gradient descent unreliable.

Instead, logistic regression uses Binary Cross-Entropy (also called Log Loss):

J(β) = −(1/m) × Σ [ yᵢ log(ŷᵢ) + (1−yᵢ) log(1−ŷᵢ) ]

  • m: number of training examples
  • yᵢ: true label (0 or 1)
  • ŷᵢ: predicted probability (between 0 and 1)
  • log: natural logarithm

Intuition: When y=1 and ŷ→1 (correct), −log(ŷ)→0 (low cost). When y=1 and ŷ→0 (wrong), −log(ŷ)→∞ (very high cost). The function heavily penalises confident wrong predictions.
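A minimal sketch of this cost function (the `eps` clipping is a standard numerical guard against log(0), not part of the formula itself):

```python
import numpy as np

def binary_cross_entropy(y_true, y_hat, eps=1e-12):
    """J = -(1/m) * sum[ y*log(yhat) + (1-y)*log(1-yhat) ]"""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))

y = np.array([1, 0, 1])
confident_right = np.array([0.9, 0.1, 0.8])  # close to the true labels
confident_wrong = np.array([0.1, 0.9, 0.2])  # far from the true labels

print(binary_cross_entropy(y, confident_right))  # low cost
print(binary_cross_entropy(y, confident_wrong))  # much higher cost
```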

5. Training — Gradient Descent

The parameter update rules for logistic regression using gradient descent are identical in form to linear regression:

βⱼ := βⱼ − (α/m) × Σ (ŷᵢ − yᵢ) × xᵢⱼ

Where α is the learning rate, m is the number of training examples, and the sum is over all examples. The key difference is that ŷᵢ here is the sigmoid output — not a raw linear prediction. These updates are applied simultaneously for all parameters at each iteration.
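The update rule can be sketched from scratch; the learning rate and iteration count below are arbitrary choices for illustration, and the data reuses the study-hours example from this article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent. X must include a leading column of 1s for β₀."""
    m = X.shape[0]
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        y_hat = sigmoid(X @ beta)       # sigmoid output, not a raw linear prediction
        grad = X.T @ (y_hat - y) / m    # (1/m) * Σ (ŷᵢ − yᵢ) * xᵢⱼ
        beta -= alpha * grad            # simultaneous update of all parameters
    return beta

hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X = np.column_stack([np.ones_like(hours), hours])  # intercept column + feature
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

beta = fit_logistic(X, y)
print("beta:", beta)
print("decision boundary at", -beta[0] / beta[1], "hours")  # near 3.5
```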

6. Worked Example

Problem: A student studied 3 hours. The trained model has β₀ = −4 and β₁ = 1.5. Will the student pass?

Step 1 — Compute z:

z = β₀ + β₁x = −4 + 1.5 × 3 = −4 + 4.5 = 0.5

Step 2 — Apply sigmoid:

σ(0.5) = 1 / (1 + e⁻⁰·⁵) = 1 / (1 + 0.6065) = 1 / 1.6065 ≈ 0.622

Step 3 — Apply decision boundary:

0.622 ≥ 0.5 → Predict: PASS (class 1)

Interpretation: The model predicts a 62.2% probability of passing for a student who studies 3 hours.
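The three steps can be verified directly in Python:

```python
import math

beta0, beta1, hours = -4.0, 1.5, 3.0

z = beta0 + beta1 * hours      # Step 1: -4 + 4.5 = 0.5
prob = 1 / (1 + math.exp(-z))  # Step 2: sigmoid(0.5) ≈ 0.622

print(f"z = {z}, P(pass) = {prob:.3f}")
print("PASS" if prob >= 0.5 else "FAIL")  # Step 3: 0.622 >= 0.5 -> PASS
```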

7. Logistic Regression vs Linear Regression

  • Task: linear regression performs regression (continuous output); logistic regression performs classification (categorical output)
  • Output: linear produces any real number; logistic produces a probability between 0 and 1
  • Output function: linear uses the identity (ŷ = z); logistic uses the sigmoid (ŷ = σ(z))
  • Cost function: linear uses Mean Squared Error (MSE); logistic uses Binary Cross-Entropy (Log Loss)
  • Example problem: linear predicts a house price; logistic predicts spam or not spam

8. Python Code


import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Training data: hours studied vs pass(1)/fail(0)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

# Train logistic regression model
model = LogisticRegression()
model.fit(X, y)

# Predict probability for 3.5 hours
prob = model.predict_proba([[3.5]])[0][1]
print(f"Probability of passing (3.5 hours): {prob:.3f}")

# Predict class label
prediction = model.predict([[3.5]])
print(f"Prediction: {'Pass' if prediction[0] == 1 else 'Fail'}")

# Model evaluation
y_pred = model.predict(X)
print(f"\nAccuracy: {accuracy_score(y, y_pred):.2f}")
print(classification_report(y, y_pred, target_names=['Fail', 'Pass']))

9. Common Mistakes Students Make

  • Using logistic regression for multi-class problems without modification: Standard logistic regression is binary only. For 3+ classes, use Softmax Regression (Multinomial Logistic Regression) or a One-vs-Rest approach.
  • Using MSE as the cost function: MSE with sigmoid creates a non-convex surface. Always use binary cross-entropy for logistic regression.
  • Forgetting to scale features: Logistic regression is sensitive to feature scale. Always normalise or standardise inputs before training.
  • Treating the probability output as the final answer: The 0.5 threshold is a default — in imbalanced datasets or high-stakes applications, the threshold must be tuned based on the cost of false positives vs false negatives.
  • Assuming logistic regression cannot handle non-linear boundaries: It can — by adding polynomial features (e.g., x², x₁x₂) to the input, logistic regression can learn non-linear decision boundaries.
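To illustrate the last point, here is a sketch using scikit-learn's PolynomialFeatures on synthetic data with a circular true boundary (the dataset is made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: class 1 inside the unit circle, class 0 outside
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)

# Degree-2 features (x1^2, x1*x2, x2^2) let the "linear" boundary become a circle
model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(f"Training accuracy: {model.score(X, y):.2f}")
```

A plain logistic regression on the raw (x₁, x₂) inputs could only draw a straight line here and would do far worse.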

10. Frequently Asked Questions

Why is it called logistic regression if it is a classification algorithm?

The name comes from the logistic function (sigmoid) used to transform the output. Historically, the term “regression” referred to fitting a model to data — the classification aspect comes from thresholding the probability output. The name stuck despite being misleading to beginners.

Can logistic regression handle more than two classes?

Not directly in its binary form. For multi-class problems, use Multinomial Logistic Regression (Softmax Regression) or the One-vs-Rest strategy, which trains one binary classifier per class. Scikit-learn’s LogisticRegression handles multi-class targets automatically, fitting a multinomial (softmax) model by default in recent versions.
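A quick illustration with made-up three-class data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Three well-separated clusters on a single feature, labelled 0, 1, 2
X = np.array([[1], [2], [3], [10], [11], [12], [20], [21], [22]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

# scikit-learn detects three classes and fits a multinomial model
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[2.0], [11.0], [21.0]]))  # one label per cluster
print(clf.predict_proba([[11.0]]).sum())     # class probabilities sum to 1
```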

What is a good log loss score?

Lower log loss is better. A perfect model has log loss = 0. A random classifier on a balanced binary problem has log loss ≈ 0.693 (−log(0.5)). Values below 0.3 are generally considered good, but the benchmark depends on the specific problem and dataset.
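The 0.693 baseline follows directly from the cost formula, since a know-nothing classifier on balanced data predicts ŷ = 0.5 for every example:

```python
import math

# Every term in the log-loss sum becomes -log(0.5) when y_hat = 0.5
print(-math.log(0.5))  # ≈ 0.693
```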

Next Steps