Random Forest Algorithm

Explained Simply for Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Definition: Random Forest is an ensemble of decision trees — it trains many trees and combines their predictions to get a more accurate, stable result.
  • Two sources of randomness: (1) Each tree trains on a random bootstrap sample of data. (2) Each split considers only a random subset of features.
  • Prediction: Classification → majority vote of all trees. Regression → average of all tree outputs.
  • Key strength: Much less prone to overfitting than a single decision tree. Excellent out-of-the-box performance.
  • Key hyperparameters: n_estimators (number of trees), max_depth, max_features.
  • Bonus: Provides feature importance scores — tells you which features matter most.

1. What is Random Forest?

Random Forest is an ensemble machine learning algorithm that builds a large collection of decision trees during training and combines their predictions to produce a single, more accurate and stable output.

It was introduced by Leo Breiman in 2001 and remains one of the most powerful and widely used ML algorithms for structured/tabular data. It works for both classification and regression tasks.

Analogy — Wisdom of the Crowd

Imagine asking 500 doctors to independently diagnose a patient, each seeing slightly different information and having slightly different training. Then you take the majority vote. The combined opinion of 500 doctors is far more reliable than the opinion of any single doctor — even if each individual doctor makes some mistakes. This is exactly the intuition behind Random Forest. Each tree is a “doctor.” The forest is the committee that votes.

The key insight is that while each tree may be wrong in some cases, if the trees make different mistakes (uncorrelated errors), the majority vote will still be correct most of the time.
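This "uncorrelated errors" intuition can be checked with a toy simulation — not a Random Forest, just 500 independent voters who are each right 70% of the time (both numbers are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_voters, n_cases, p_correct = 500, 10_000, 0.7  # each "doctor" is right 70% of the time

# Each voter independently gets each case right with probability p_correct
votes = rng.random((n_cases, n_voters)) < p_correct

# The committee is correct whenever more than half of the voters are correct
majority_correct = votes.sum(axis=1) > n_voters / 2

print(f"Single voter accuracy:  {p_correct:.2f}")
print(f"Majority-vote accuracy: {majority_correct.mean():.4f}")
```

With independent errors, the majority vote is right essentially all the time even though each individual voter fails 30% of the time. The catch — and the reason Random Forest injects randomness — is that this only works when the voters' errors are not strongly correlated.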

2. How Random Forest Works — Step by Step

  1. Choose the number of trees (n_estimators) — typically 100 to 500.
  2. For each tree:
    1. Draw a random bootstrap sample from the training data (with replacement).
    2. Grow a full decision tree on this sample, but at each split, consider only a random subset of features (not all features).
    3. Do not prune the tree — let it grow deep.
  3. To predict new data:
    1. Pass the input through every tree in the forest.
    2. Each tree produces a prediction.
    3. Combine: majority vote (classification) or average (regression).

The two sources of randomness — bootstrap sampling and feature subsampling — are what make the trees different from each other, which is essential for the ensemble to work well.
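The training loop above can be sketched in a few lines using scikit-learn's DecisionTreeClassifier as the base learner — a minimal illustration of the recipe, not a substitute for RandomForestClassifier (25 trees and the iris dataset are arbitrary choices):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

# Step 2: train each tree on a bootstrap sample, with feature subsampling per split
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier(
        max_features='sqrt',                            # random feature subset at each split
        random_state=int(rng.integers(1_000_000)),
    )
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: majority vote across all trees
def forest_predict(x):
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return Counter(votes).most_common(1)[0][0]

pred = forest_predict(X[0])
print("Prediction for first sample:", pred, "| true label:", y[0])
```

Note that the trees are grown without pruning (no max_depth set), exactly as the recipe prescribes.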

3. Bagging — Bootstrap Aggregating

Bagging (Bootstrap Aggregating) is the technique of training each tree on a different random sample of the training data, drawn with replacement.

For a dataset of m examples, each bootstrap sample also contains m examples — but since sampling is with replacement, only about 63% of the distinct original examples appear (the probability that a given example is drawn at least once is 1 − (1 − 1/m)^m ≈ 1 − 1/e ≈ 0.632; some examples appear multiple times, others not at all). The remaining ~37% of examples that were not sampled are called the Out-of-Bag (OOB) samples for that tree.
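The 63%/37% split is easy to confirm empirically by drawing one bootstrap sample and counting distinct examples (m = 10,000 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000  # dataset size

# Draw one bootstrap sample and count how many distinct originals it contains
sample = rng.integers(0, m, size=m)        # sampling with replacement
in_bag = np.unique(sample).size / m        # fraction of originals that appear at least once

print(f"In-bag fraction:      {in_bag:.3f}   (theory: 1 - 1/e = 0.632...)")
print(f"Out-of-bag fraction:  {1 - in_bag:.3f}   (theory: 1/e = 0.368...)")
```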

Out-of-Bag (OOB) Error

Each tree can be evaluated on its OOB samples — examples it never saw during training. The average OOB error across all trees gives a free estimate of the model’s generalisation error, without needing a separate validation set. This is one of Random Forest’s most convenient properties.

4. Feature Randomness — The Second Source of Randomness

In a standard decision tree, every split considers all available features and picks the best one. In Random Forest, each split considers only a random subset of features — typically:

  • Classification: √n features (square root of total features)
  • Regression: n/3 features (one-third of total features)

Why? If one feature is very strong (e.g., it is the best predictor), every tree would use it at the root — making all trees very similar and correlated. Feature randomness forces each tree to explore different combinations of features, creating diversity in the forest. Diverse trees make different errors, and diverse errors cancel out in the vote.
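As a quick sketch of what these heuristics mean in practice — how many features each split actually sees for a few example feature counts (the counts are arbitrary):

```python
import math

for n_features in (4, 16, 100):
    k_clf = max(1, round(math.sqrt(n_features)))  # classification heuristic: sqrt(n)
    k_reg = max(1, n_features // 3)               # regression heuristic: n/3
    print(f"{n_features:>3} features -> classification: {k_clf}, regression: {k_reg}")
```

So with 100 features, each split in a classification forest compares only 10 candidate features — plenty of room for different trees to pick different splits.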

5. Making Predictions

Classification — Majority Vote

Each tree votes for a class. The class with the most votes wins. Example: 500 trees vote — 380 say “Approve”, 120 say “Reject” → Final prediction: Approve.

Some implementations use soft voting — each tree outputs a probability, and the probabilities are averaged across trees. This is more nuanced and usually more accurate than hard majority voting.
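Scikit-learn's RandomForestClassifier takes the soft-voting approach: predict_proba returns the per-class probabilities averaged over all trees. The raw hard-vote counts can be reconstructed from the individual trees for comparison (the iris dataset is an arbitrary choice here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Soft voting: predict_proba averages per-tree class probabilities
probs = model.predict_proba(X[:1])[0]

# Hard voting: count each individual tree's predicted class
votes = np.bincount(
    [int(t.predict(X[:1])[0]) for t in model.estimators_], minlength=3
)

print("Averaged probabilities:", probs.round(3))
print("Raw vote counts:       ", votes)
```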

Regression — Average

Each tree predicts a numerical value. The final prediction is the average of all tree predictions. Example: 100 trees predict house prices; the final prediction is the mean of all 100 values.
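This averaging can be verified directly: a forest's regression output equals the mean of its individual trees' outputs. A small sketch on a synthetic dataset (the make_regression parameters are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The forest's prediction is exactly the mean of the individual trees' predictions
per_tree = np.array([t.predict(X[:1])[0] for t in model.estimators_])

print(f"Forest prediction: {model.predict(X[:1])[0]:.4f}")
print(f"Mean of trees:     {per_tree.mean():.4f}")
```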

6. Key Hyperparameters

| Hyperparameter | What it controls | Default (Scikit-learn) | Tuning tip |
| --- | --- | --- | --- |
| n_estimators | Number of trees in the forest | 100 | More is generally better; 100–500 covers most cases |
| max_features | Features considered per split | 'sqrt' (classification) | Try 'sqrt', 'log2', or a fraction like 0.3 |
| max_depth | Maximum depth of each tree | None (fully grown) | Limit to 10–20 to speed up training |
| min_samples_leaf | Minimum samples at leaf node | 1 | Increase (e.g., 5–10) to smooth predictions |
| bootstrap | Whether to use bagging | True | Keep True for standard Random Forest |
| oob_score | Use OOB samples for evaluation | False | Set True to get a free validation score |
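These hyperparameters are usually tuned together rather than one at a time, for example with a grid search. A minimal sketch (the grid values here are illustrative, not recommendations):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Small illustrative grid over three of the key hyperparameters
param_grid = {
    'n_estimators': [100, 300],
    'max_features': ['sqrt', 'log2'],
    'min_samples_leaf': [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```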

7. Feature Importance

Random Forest provides a feature importance score for every input feature — measuring how much each feature contributes to reducing impurity across all trees. This is one of the most useful properties of Random Forest for real-world data analysis.

Feature importance is calculated as the total reduction in Gini impurity (or MSE for regression) attributable to each feature, averaged across all trees and normalised to sum to 1.

Uses of feature importance: identify which variables matter most in your dataset; remove irrelevant features to speed up training; understand the problem domain better; use as a feature selection step before training other models.
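As a sketch of the feature-selection use case, scikit-learn's SelectFromModel can keep only the features whose impurity-based importance exceeds a threshold (the iris dataset and the 'mean' threshold are illustrative choices):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
names = load_iris().feature_names

# Keep only features whose importance exceeds the mean importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0), threshold='mean'
)
X_reduced = selector.fit_transform(X, y)

kept = [n for n, keep in zip(names, selector.get_support()) if keep]
print("Selected features:", kept)
print("Reduced shape:", X_reduced.shape)
```

The reduced feature matrix can then be fed to any other model as a cheap pre-filtering step.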

8. Random Forest vs Single Decision Tree

| Feature | Single Decision Tree | Random Forest |
| --- | --- | --- |
| Overfitting | High — must be pruned carefully | Low — averaging reduces variance significantly |
| Interpretability | High — rules can be read directly | Low — 500 trees cannot be interpreted manually |
| Accuracy | Moderate | High — consistently among the top performers |
| Training speed | Fast | Slower (proportional to n_estimators) |
| Feature importance | Available but noisy | Available and more reliable |
| Stability | Low — sensitive to data changes | High — robust to noise and outliers |

9. Python Code


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest
model = RandomForestClassifier(
    n_estimators=100,
    max_features='sqrt',
    oob_score=True,
    random_state=42
)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(f"Test Accuracy:     {accuracy_score(y_test, y_pred):.3f}")
print(f"OOB Score:         {model.oob_score_:.3f}")
print(classification_report(y_test, y_pred, target_names=load_iris().target_names))

# Feature importance
importances = model.feature_importances_
for name, importance in sorted(zip(feature_names, importances), key=lambda x: -x[1]):
    print(f"{name}: {importance:.4f}")
    

10. Common Mistakes Students Make

  • Using too few trees: With fewer than 50 trees, the forest is unstable — predictions change significantly with different random seeds. Always use at least 100 trees.
  • Expecting full interpretability: Unlike a single decision tree, a Random Forest cannot be interpreted directly. If interpretability is essential, use a decision tree or SHAP values to explain Random Forest predictions.
  • Not using OOB score: The OOB score is a free, reliable validation metric that uses no extra data. Always set oob_score=True during development to track model performance without a separate validation set.
  • Ignoring feature importance for feature selection: Random Forest’s feature importance is one of the most practical tools for understanding which features matter. Always check it before deploying a model.

11. Frequently Asked Questions

Is Random Forest always better than a single decision tree?

In terms of accuracy and generalisation, yes — almost always. Random Forest consistently outperforms single decision trees on real datasets. The tradeoff is interpretability and training speed. If you need to explain exactly why a prediction was made (e.g., for regulatory compliance), a single tree is preferable. For accuracy, use Random Forest.

What is the difference between Random Forest and Gradient Boosting?

Both are ensemble methods using decision trees, but they work differently. Random Forest trains trees independently in parallel (bagging) and averages their predictions. Gradient Boosting trains trees sequentially — each new tree corrects the errors of the previous one. Gradient Boosting (XGBoost, LightGBM) often achieves higher accuracy but is more prone to overfitting and requires more careful tuning. Random Forest is more robust and easier to use out of the box.
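A quick side-by-side comparison with scikit-learn's defaults illustrates the point (the breast-cancer dataset is an arbitrary choice; relative results will vary by dataset):

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Compare bagging (parallel trees) vs boosting (sequential trees) out of the box
scores = {}
for name, model in [
    ("Random Forest    ", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("Gradient Boosting", GradientBoostingClassifier(random_state=0)),
]:
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Both perform well here with no tuning at all; the differences only become pronounced on harder problems and after hyperparameter tuning.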

Does Random Forest require feature scaling?

No. Because Random Forest is based on decision trees, which make splits based on thresholds rather than distances, feature scaling (normalisation or standardisation) is not required. This is one of the practical advantages of tree-based methods.
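This is easy to check: fitting the same forest on raw and standardised features should give (near-)identical predictions, because split thresholds simply shift with the scale (iris is used for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same seed, same data order: only the feature scale differs between the two fits
raw = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y).predict(X)
scaled = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_scaled, y).predict(X_scaled)

print(f"Agreement between raw and scaled fits: {(raw == scaled).mean():.3f}")
```

Contrast this with distance-based methods like k-NN or SVM, where scaling routinely changes the predictions.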

Next Steps