Bias-Variance Tradeoff
The Core Concept of Machine Learning — Explained for Engineering Students
Last Updated: March 2026
📌 Key Takeaways
- Bias: Error from wrong assumptions — causes underfitting. High-bias models are too simple.
- Variance: Error from over-sensitivity to training data — causes overfitting. High-variance models are too complex.
- Total Error = Bias² + Variance + Irreducible Noise
- The goal is to minimise total error — not just bias or variance individually.
- Fix high bias: Use a more complex model, add more features, reduce regularisation.
- Fix high variance: Get more data, add regularisation, use simpler model, use ensemble methods.
1. What is Bias? What is Variance?
Bias
Bias is the error introduced by approximating a real-world problem with a model that is too simple. A high-bias model makes strong, incorrect assumptions — for example, assuming the relationship is always linear when it is actually curved.
High bias leads to underfitting — the model fails to learn the underlying pattern, and its predictions are consistently off, even on training data.
Variance
Variance is the error introduced by a model that is overly sensitive to small fluctuations in the training data. A high-variance model learns not just the true pattern, but also the random noise in the training set.
High variance leads to overfitting — the model performs very well on training data but poorly on new, unseen data.
| | High Bias | High Variance |
|---|---|---|
| Training Error | High | Low |
| Test/Validation Error | High | High |
| Problem | Underfitting | Overfitting |
| Model Complexity | Too simple | Too complex |
| Example Algorithm | Linear Regression on complex data | Deep Decision Tree on small dataset |
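The error pattern in the table can be reproduced in a few lines of NumPy. The quadratic data, sample sizes, and polynomial degrees below are illustrative assumptions, not from any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy data: a quadratic relationship plus noise.
def make_data(n):
    x = rng.uniform(-3, 3, n)
    return x, x**2 + rng.normal(0, 1, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(x, y, coeffs):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# High bias: a straight line cannot capture the curve -> high error everywhere.
linear = np.polyfit(x_train, y_train, 1)
# High variance: a degree-15 polynomial also fits the noise -> low train error,
# much higher test error.
wiggly = np.polyfit(x_train, y_train, 15)

print("linear: train", mse(x_train, y_train, linear), "test", mse(x_test, y_test, linear))
print("deg-15: train", mse(x_train, y_train, wiggly), "test", mse(x_test, y_test, wiggly))
```

The linear fit shows high error on both splits (underfitting); the degree-15 fit shows a large train-test gap (overfitting) — exactly the rows of the table.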
2. Analogy — The Archer
Imagine four archers shooting at a target:
- Low Bias, Low Variance: Shots clustered tightly around the bullseye — accurate and consistent. This is the ideal ML model.
- High Bias, Low Variance: Shots clustered tightly but all far from the bullseye — consistently wrong. This is underfitting.
- Low Bias, High Variance: Shots scattered all over but centred around the bullseye on average — inconsistent. This is overfitting.
- High Bias, High Variance: Shots scattered widely AND away from the bullseye — the worst case.
3. Error Decomposition Formula
The expected prediction error of a model can be mathematically decomposed as:
Expected Error = Bias² + Variance + Irreducible Noise
| Term | Meaning | Can We Reduce It? |
|---|---|---|
| Bias² | Squared difference between average model prediction and true value | Yes — use better model |
| Variance | How much model predictions vary across different training datasets | Yes — regularise/simplify |
| Irreducible Noise | Inherent randomness in the data itself | No — cannot be removed |
Even a perfect model cannot drive total error to zero — the irreducible noise sets a floor. The ML engineer's job is to push each of bias and variance as low as possible without unduly inflating the other.
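The decomposition can be verified numerically: draw many training sets from the same process, fit the same model to each, and compare the three terms against the measured error at one query point. The sine target, noise level, degree-3 model, and sample sizes below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

f = np.sin                 # true function (illustrative choice)
noise_sd = 0.3             # irreducible noise: variance 0.09
x0 = 1.0                   # fixed query point

preds, errors = [], []
for _ in range(2000):
    # a fresh training set from the same data-generating process each time
    x = rng.uniform(-np.pi, np.pi, 25)
    y = f(x) + rng.normal(0, noise_sd, x.size)
    coeffs = np.polyfit(x, y, 3)
    pred = float(np.polyval(coeffs, x0))
    preds.append(pred)
    # squared error against a fresh noisy observation at x0
    errors.append((pred - (f(x0) + rng.normal(0, noise_sd))) ** 2)

preds = np.array(preds)
bias_sq = (preds.mean() - f(x0)) ** 2   # (average prediction - truth)^2
variance = preds.var()                   # spread of predictions across datasets
noise = noise_sd ** 2                    # known by construction here

print(f"bias^2 + variance + noise = {bias_sq + variance + noise:.4f}")
print(f"measured expected error   = {np.mean(errors):.4f}")
```

The two printed numbers agree up to Monte Carlo error, confirming Expected Error = Bias² + Variance + Irreducible Noise.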
4. Connection to Underfitting & Overfitting
Underfitting (High Bias)
Occurs when the model is not complex enough to capture the data structure. Signs: high training error AND high test error; adding more data does not help; predictions are consistently off in the same direction.
Example: Fitting a straight line through data that follows a quadratic curve.
Overfitting (High Variance)
Occurs when the model is too complex and learns the noise in training data. Signs: very low training error BUT high test error; large gap between training and validation performance.
Example: A degree-15 polynomial that perfectly passes through all 10 training points but oscillates wildly between them.
5. The Tradeoff — Why You Cannot Minimise Both at Once
As model complexity increases, bias decreases and variance increases. There is an optimal point of complexity that minimises total error. This is the core tension — the bias-variance tradeoff.
- Simple models (high bias, low variance): Linear Regression, Naive Bayes, Linear SVM
- Complex models (low bias, high variance): Deep Decision Trees, K-Nearest Neighbours (K=1), large Neural Networks without regularisation
- Balanced models: Random Forest, Gradient Boosting, Ridge/Lasso Regression
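One practical way to locate the optimal complexity is a sweep: fit models of increasing complexity and keep the one with the lowest validation error. A minimal sketch using polynomial degree as the complexity knob (the cubic data and the split sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a cubic with noise.
x = rng.uniform(-2, 2, 60)
y = x**3 - x + rng.normal(0, 0.5, x.size)

# Simple split: first 40 points for training, last 20 for validation.
x_tr, y_tr = x[:40], y[:40]
x_va, y_va = x[40:], y[40:]

val_err = {}
for deg in range(1, 13):
    coeffs = np.polyfit(x_tr, y_tr, deg)
    val_err[deg] = float(np.mean((np.polyval(coeffs, x_va) - y_va) ** 2))

best = min(val_err, key=val_err.get)
print("validation MSE by degree:", {d: round(e, 3) for d, e in val_err.items()})
print("chosen degree:", best)
```

Validation error traces the classic U-shape: it falls as bias drops, then rises again as variance takes over; the minimum marks the sweet spot.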
6. How to Fix High Bias and High Variance
To Fix High Bias (Underfitting):
- Use a more complex model (switch from linear to polynomial, or add more layers)
- Add more relevant features (feature engineering)
- Reduce regularisation strength (lower λ in Ridge/Lasso)
- Train for more epochs (for neural networks)
- Ensure data quality — noisy labels create artificial bias
To Fix High Variance (Overfitting):
- Get more training data — the most reliable fix
- Add regularisation (L1/Lasso, L2/Ridge, Dropout for neural networks)
- Use a simpler model (reduce depth of decision tree, fewer layers in NN)
- Use ensemble methods (Random Forest averages many high-variance trees)
- Apply cross-validation to better estimate true model performance
- Use early stopping in neural network training
7. Common Mistakes Students Make
- Thinking high training accuracy means a good model: A model can memorise training data and still fail completely on new inputs. Always evaluate on a held-out test set.
- Only focusing on reducing bias: Students often add complexity until training error drops, without checking if test error also drops.
- Confusing irreducible noise with bias: Even the best model will have some error from the data itself — this is not a model flaw.
- Not using validation sets: Without a separate validation set, you cannot detect overfitting during training.
8. Frequently Asked Questions
What is the ideal bias-variance balance?
The ideal balance is the model complexity that minimises total generalisation error. In practice, train multiple models of different complexities, evaluate each on a validation set, and choose the one with the lowest validation error.
Does more data reduce bias or variance?
More data primarily reduces variance. It does not significantly reduce bias. If a model is underfitting, adding more data will not fix it — you need a better model.
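This can be seen directly by fitting the (deliberately too simple) straight line to quadratic data at two sample sizes: the fitted coefficients stabilise as n grows, but the systematic misfit stays. The data-generating details are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Wrong model family on purpose: a line fitted to quadratic data.
def fit_line(n):
    x = rng.uniform(-2, 2, n)
    y = x**2 + rng.normal(0, 0.5, n)
    return np.polyfit(x, y, 1)  # returns [slope, intercept]

results = {}
for n in (20, 2000):
    results[n] = np.array([fit_line(n) for _ in range(300)])
    print(f"n={n:>4}: slope std (variance) = {results[n][:, 0].std():.3f}, "
          f"mean intercept = {results[n][:, 1].mean():.3f}")
```

The slope's spread shrinks roughly tenfold from n=20 to n=2000 (variance down), while the average fitted line converges to the flat line y ≈ 4/3 — the best any straight line can do on this curve. The bias of the wrong model family does not budge, no matter how much data arrives.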
How does regularisation affect bias and variance?
Regularisation reduces variance by penalising model complexity. However, too much regularisation increases bias. The regularisation strength (λ) is a hyperparameter that controls this balance.
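The λ knob can be watched doing exactly this. Below, a degree-8 polynomial ridge regression (closed form, NumPy only) is refit on many fresh training sets at three λ values, tracking bias² and variance at one query point. The target function, noise level, and λ grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def features(x, deg=8):
    # polynomial feature map: [1, x, x^2, ..., x^deg]
    return np.vander(x, deg + 1, increasing=True)

def ridge(X, y, lam):
    # closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

x0 = 0.5
phi0 = features(np.array([x0]))[0]
true_y = np.sin(3 * x0)

stats = {}
for lam in (1e-6, 1.0, 100.0):
    preds = []
    for _ in range(400):
        x = rng.uniform(-1, 1, 30)
        y = np.sin(3 * x) + rng.normal(0, 0.3, 30)
        preds.append(float(phi0 @ ridge(features(x), y, lam)))
    preds = np.array(preds)
    stats[lam] = ((preds.mean() - true_y) ** 2, preds.var())
    print(f"lambda={lam:>7}: bias^2={stats[lam][0]:.4f}, variance={stats[lam][1]:.4f}")
```

As λ grows, variance collapses (the heavily shrunk weights barely react to the particular training sample) while bias² climbs (predictions are pulled towards zero, away from the true value) — the tradeoff made visible in two columns of numbers.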