Overfitting vs Underfitting — How to Detect and Fix Both

For Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Overfitting: Model is too complex — memorises training data, fails on new data. Low train error, high test error.
  • Underfitting: Model is too simple — misses patterns in data. High train error AND high test error.
  • Good fit: Low train error AND low test error — small gap between them.
  • Fix overfitting: More data, regularisation, simpler model, dropout, early stopping, cross-validation.
  • Fix underfitting: More complex model, more features, less regularisation, more training epochs.
  • Diagnosis tool: Learning curves — plot train and validation error vs training set size or model complexity.

1. Definitions

Overfitting

Overfitting occurs when a model learns the training data too well — including its noise, outliers, and random fluctuations — and consequently fails to generalise to new, unseen data.

An overfitted model has memorised the training set rather than learned the underlying pattern. It performs excellently on data it has seen but poorly on data it has not. This is the ML equivalent of a student who memorises every past exam question verbatim but cannot answer a rephrased version of the same question.

Symptom: Very low training error, high validation/test error. Large gap between the two.

Underfitting

Underfitting occurs when a model is too simple to capture the underlying structure of the data. It fails to learn the true patterns even from the training data, resulting in poor performance on both training and test sets.

This happens when the model has too few parameters relative to the complexity of the problem — like trying to fit a curved relationship with a straight line.

Symptom: High training error AND high validation/test error. Small gap between the two (both are high).

2. Analogy — The Student and the Exam

  • Underfitting student: Never studied the material properly. Scores low on practice tests AND the real exam. The model is too simple.
  • Overfitting student: Memorised every past exam paper word-for-word without understanding the concepts. Scores 100% on all past papers but fails the real exam (which has slightly different questions). The model memorised instead of learning.
  • Well-fitted student: Studied the concepts deeply, understood the underlying principles, practised with varied examples. Scores well on both practice tests and the real exam. This is the goal.

3. How to Detect — Learning Curves

The primary diagnostic tool is the learning curve — a plot of training and validation error against a variable (training set size or model complexity).

Learning Curve vs Training Set Size:

  • Overfitting pattern: Training error remains very low; validation error is much higher. The gap between them is large. Adding more data gradually closes the gap.
  • Underfitting pattern: Both training and validation errors are high. The gap between them is small. Adding more data does not help much — the model lacks the capacity to benefit from it.
  • Good fit pattern: Both errors are low and converge to similar values as training size increases.

Learning Curve vs Model Complexity:

  • As complexity increases (e.g., increasing polynomial degree): training error decreases continuously.
  • Validation error first decreases, then increases (as overfitting begins).
  • The optimal complexity is where validation error is minimised — the lowest point of the validation curve.
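This complexity sweep can be reproduced with scikit-learn's validation_curve. The noisy sine target and the degree grid below are illustrative choices, not taken from the text:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

# Noisy sine data: a straight line underfits it, a very high degree overfits.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)

degrees = [1, 2, 3, 5, 10, 15]
model = make_pipeline(PolynomialFeatures(), LinearRegression())
train_scores, val_scores = validation_curve(
    model, X, y, param_name='polynomialfeatures__degree',
    param_range=degrees, cv=5, scoring='neg_mean_squared_error')

train_mse = -train_scores.mean(axis=1)   # decreases as degree grows
val_mse = -val_scores.mean(axis=1)       # U-shaped: dips, then rises again
best_degree = degrees[int(np.argmin(val_mse))]
print('optimal degree:', best_degree)
```

Plotting train_mse and val_mse against degrees reproduces the U-shaped validation curve described above; the minimum of val_mse marks the optimal complexity.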

4. Side-by-Side Comparison

Feature                 | Underfitting                | Good Fit                           | Overfitting
------------------------|-----------------------------|------------------------------------|----------------------------------
Training Error          | High                        | Low                                | Very Low
Validation/Test Error   | High                        | Low                                | High
Gap (Train vs Test)     | Small                       | Small                              | Large
Bias                    | High                        | Low                                | Low
Variance                | Low                         | Low                                | High
Model Complexity        | Too simple                  | Just right                         | Too complex
Example                 | Linear model on curved data | Degree-3 polynomial on curved data | Degree-20 polynomial on 10 points

5. How to Fix Overfitting

  1. Get more training data — the most reliable fix. More data makes it harder for the model to memorise all examples and forces it to learn general patterns.
  2. Reduce model complexity — use a simpler model with fewer parameters. Reduce polynomial degree, decrease number of layers/neurons, reduce max_depth of trees.
  3. Add regularisation — penalise model complexity during training (see Regularisation section below).
  4. Use dropout (neural networks) — randomly deactivate neurons during training, forcing the network to learn redundant representations.
  5. Early stopping (neural networks) — stop training when validation error starts increasing, before the model begins to overfit.
  6. Use ensemble methods — Random Forest, Gradient Boosting. These average many models, dramatically reducing variance.
  7. Apply cross-validation — use k-fold cross-validation instead of a single train/test split to get a more reliable estimate of generalisation error.
  8. Feature selection — remove irrelevant features. Every unnecessary feature gives the model more opportunity to overfit.
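Most of these fixes are one-liners in a library, but early stopping (item 5) is worth seeing end-to-end. A minimal sketch on a toy linear model trained by gradient descent; the patience and tolerance values here are illustrative, not prescriptive:

```python
import numpy as np

# Toy setup: fit y = Xw by gradient descent while monitoring a validation split.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 30))
true_w = np.zeros(30)
true_w[:5] = 1.0                                  # only 5 informative features
y = X @ true_w + rng.normal(0, 0.5, 200)

X_tr, y_tr = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

w = np.zeros(30)
lr, patience, best_val, wait = 0.01, 10, np.inf, 0
for epoch in range(2000):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val_mse = np.mean((X_val @ w - y_val) ** 2)
    if val_mse < best_val - 1e-6:                 # still improving: keep a snapshot
        best_val, best_w, wait = val_mse, w.copy(), 0
    else:
        wait += 1
        if wait >= patience:                      # validation error stopped improving
            break
w = best_w                                        # roll back to the best checkpoint
```

The key idea is that training error would keep falling if we continued, but we keep the weights from the epoch where validation error was lowest.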

6. How to Fix Underfitting

  1. Use a more complex model — add polynomial features, increase network depth, increase tree depth.
  2. Add more relevant features — feature engineering to provide the model with more informative inputs.
  3. Reduce regularisation — if regularisation is too strong, the model becomes too constrained. Lower the regularisation parameter (λ or C).
  4. Train for longer (neural networks) — increase the number of epochs. An underfitted neural network may simply need more iterations to converge.
  5. Use a different model — if a linear model consistently underfits, switch to a non-linear model (Decision Tree, SVM with RBF kernel, Neural Network).
  6. Check data quality — underfitting can also be caused by noisy or incorrectly labelled training data. Clean your data and re-examine labels.
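Fix 1 (adding polynomial features) is simple to demonstrate. A sketch on a synthetic quadratic target, with illustrative data, showing the jump in cross-validated R² when the model gains the capacity it was missing:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Quadratic target: a plain linear model cannot capture it.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)

linear = LinearRegression()
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

r2_linear = cross_val_score(linear, X, y, cv=5, scoring='r2').mean()
r2_quad = cross_val_score(quadratic, X, y, cv=5, scoring='r2').mean()
print(f'linear R^2: {r2_linear:.2f}, quadratic R^2: {r2_quad:.2f}')
```

The linear model scores near zero R² (it underfits), while the degree-2 pipeline captures the curvature.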

7. Regularisation — L1, L2, Dropout

Regularisation adds a penalty term to the cost function to discourage large parameter values, which are associated with model complexity.

L2 Regularisation (Ridge)

J(β) = Loss + λ × Σβⱼ²

Penalises large parameter values by adding the sum of squared weights. Shrinks all weights toward zero but rarely to exactly zero. λ controls regularisation strength — higher λ = more regularisation = simpler model.

L1 Regularisation (Lasso)

J(β) = Loss + λ × Σ|βⱼ|

Penalises the absolute value of weights. Tends to drive some weights to exactly zero, performing automatic feature selection — sparse models. Useful when many features are irrelevant.
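The sparsity contrast between L1 and L2 is easy to check directly. A sketch on synthetic data where only 2 of 20 features matter; the alpha values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Only features 0 and 1 are relevant; the other 18 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

ridge_zeros = int((np.abs(ridge.coef_) < 1e-6).sum())  # shrunk, but not to zero
lasso_zeros = int((np.abs(lasso.coef_) < 1e-6).sum())  # many exactly-zero weights
print('Ridge zero coefficients:', ridge_zeros)
print('Lasso zero coefficients:', lasso_zeros)
```

Ridge leaves small non-zero weights on the irrelevant features, while Lasso zeroes most of them out: automatic feature selection in action.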

Dropout (Neural Networks)

During each training iteration, randomly set a fraction p of neuron outputs to zero. This prevents neurons from co-adapting — forces the network to learn multiple independent representations of the same feature. At inference time, all neurons are active and outputs are scaled by (1−p).
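A minimal NumPy sketch of the classic (non-inverted) dropout variant described above; the layer shape and p value are illustrative:

```python
import numpy as np

def dropout_forward(a, p, training, rng):
    """Classic dropout: zero a fraction p of activations during training;
    at inference, keep all activations and scale by (1 - p)."""
    if training:
        mask = rng.random(a.shape) >= p    # keep each unit with probability 1 - p
        return a * mask
    return a * (1.0 - p)                   # match the expected training-time scale

rng = np.random.default_rng(0)
a = np.ones((4, 8))                        # a toy batch of activations
train_out = dropout_forward(a, p=0.5, training=True, rng=rng)
test_out = dropout_forward(a, p=0.5, training=False, rng=rng)
```

Note that many modern frameworks instead use "inverted" dropout, scaling by 1/(1−p) during training so that inference needs no adjustment; the expected activations are the same either way.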

Method      | Effect on Weights                | Feature Selection | Use When
------------|----------------------------------|-------------------|-----------------------------------
L2 (Ridge)  | Shrinks all weights toward 0     | No                | All features potentially relevant
L1 (Lasso)  | Drives some weights to exactly 0 | Yes               | Many irrelevant features suspected
Elastic Net | Combination of L1 and L2         | Partial           | Many features, some correlated
Dropout     | Randomly zeros activations       | N/A               | Deep neural networks

8. Python Code — Plotting Learning Curves


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import learning_curve
from sklearn.datasets import make_regression

# Generate data
X, y = make_regression(n_samples=200, n_features=1, noise=20, random_state=42)

def plot_learning_curve(model, title, X, y):
    train_sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, scoring='neg_mean_squared_error',
        train_sizes=np.linspace(0.1, 1.0, 10)
    )
    train_rmse = np.sqrt(-train_scores.mean(axis=1))
    val_rmse   = np.sqrt(-val_scores.mean(axis=1))

    plt.figure(figsize=(8, 5))
    plt.plot(train_sizes, train_rmse, 'b-o', label='Training error')
    plt.plot(train_sizes, val_rmse,   'r-o', label='Validation error')
    plt.xlabel('Training set size')
    plt.ylabel('RMSE')
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# Underfitting — linear model on a non-linear target (y squared makes the
# relationship curved, so a straight line cannot capture it)
linear_model = Ridge(alpha=1.0)
plot_learning_curve(linear_model, 'Underfitting — Linear Model', X, y**2)

# Overfitting — high-degree polynomial on small dataset
overfit_model = Pipeline([
    ('poly', PolynomialFeatures(degree=15)),
    ('ridge', Ridge(alpha=0.001))
])
plot_learning_curve(overfit_model, 'Overfitting — Degree 15 Polynomial', X[:50], y[:50])

# Good fit — moderate polynomial with regularisation
good_model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=1.0))
])
plot_learning_curve(good_model, 'Good Fit — Degree 3 + Regularisation', X, y)

9. Common Mistakes Students Make

  • Evaluating only on training data: Always evaluate on a separate test set. Training accuracy alone tells you nothing about generalisation. Split data into train/validation/test before any modelling begins.
  • Data leakage: If test data influences preprocessing (e.g., computing the mean for normalisation using both train and test data), you get an overly optimistic test score. Always fit preprocessing on training data only and transform test data using those fitted parameters.
  • Confusing high training accuracy with a good model: A model that achieves 100% training accuracy is almost certainly overfitting. The goal is low test error, not low training error.
  • Fixing overfitting by removing test data: Never adjust your model based on test set performance. The test set must remain completely unseen until final evaluation. Use a validation set (or cross-validation) for model selection and tuning.

10. Frequently Asked Questions

How do I know if my model is overfitting or underfitting?

Check training error vs validation/test error. If training error is very low but test error is much higher → overfitting. If both training and test errors are high → underfitting. If both are low and similar → good fit. Plot learning curves for the clearest diagnosis.
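The decision rule in the answer above can be written as a small helper. The error thresholds here are illustrative assumptions; reasonable values depend on your metric and problem:

```python
def diagnose(train_err, val_err, low=0.1, gap=0.1):
    """Heuristic fit diagnosis from train/validation error.

    low: what counts as a "low" error (illustrative threshold).
    gap: what counts as a "large" train-validation gap.
    """
    if train_err <= low and val_err - train_err > gap:
        return "overfitting"     # low train error, much higher validation error
    if train_err > low and val_err - train_err <= gap:
        return "underfitting"    # both errors high, small gap
    if train_err <= low and val_err <= low + gap:
        return "good fit"        # both errors low and close together
    return "inconclusive"

print(diagnose(0.02, 0.30))  # → overfitting
print(diagnose(0.40, 0.45))  # → underfitting
print(diagnose(0.05, 0.08))  # → good fit
```

In practice a learning curve gives a richer picture than two numbers, but this captures the basic triage.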

Does more data always fix overfitting?

More data is the most reliable fix for overfitting and rarely makes things worse. However, if the model architecture itself is fundamentally mismatched to the problem, more data alone may not be enough. Combine more data with appropriate regularisation for best results.

What is the difference between overfitting and high variance?

They describe the same phenomenon from different angles. Overfitting is the practical observation that a model performs well on training data but poorly on test data. High variance is the mathematical explanation — the model’s predictions vary greatly depending on which training examples it saw. Overfitting is the symptom; high variance is the cause.

Next Steps