Random Forest Algorithm

Explained Simply for Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Definition: Random Forest is an ensemble of decision trees — it trains many trees and combines their predictions to get a more accurate, stable result.
  • Two sources of randomness: (1) Each tree trains on a random bootstrap sample of data. (2) Each split considers only a random subset of features.
  • Prediction: Classification → majority vote of all trees. Regression → average of all tree outputs.
  • Key strength: Much less prone to overfitting than a single decision tree. Excellent out-of-the-box performance.
  • Key hyperparameters: n_estimators (number of trees), max_depth, max_features.
  • Bonus: Provides feature importance scores — tells you which features matter most.

1. What is Random Forest?

Random Forest is an ensemble machine learning algorithm that builds a large collection of decision trees during training and combines their predictions to produce a single, more accurate and stable output.

It was introduced by Leo Breiman in 2001 and remains one of the most powerful and widely used ML algorithms for structured/tabular data. It works for both classification and regression tasks.

Analogy — Wisdom of the Crowd

Imagine asking 500 doctors to independently diagnose a patient, each seeing slightly different information and having slightly different training. Then you take the majority vote. The combined opinion of 500 doctors is far more reliable than the opinion of any single doctor — even if each individual doctor makes some mistakes. This is exactly the intuition behind Random Forest. Each tree is a “doctor.” The forest is the committee that votes.

The key insight is that while each tree may be wrong in some cases, if the trees make different mistakes (uncorrelated errors), the majority vote will still be correct most of the time.
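This "uncorrelated errors" intuition can be checked with a toy simulation — not a Random Forest, just 500 independent voters who are each right 70% of the time (both numbers are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_voters, n_cases, p_correct = 500, 10_000, 0.7  # each "doctor" is right 70% of the time

# Each voter independently gets each case right with probability p_correct
votes = rng.random((n_cases, n_voters)) < p_correct

# The committee is correct whenever more than half of the voters are correct
majority_correct = votes.sum(axis=1) > n_voters / 2

print(f"Single voter accuracy:  {p_correct:.2f}")
print(f"Majority-vote accuracy: {majority_correct.mean():.4f}")
```

With independent errors, the majority vote is right essentially all the time even though each individual voter fails 30% of the time. The catch — and the reason Random Forest injects randomness — is that this only works when the voters' errors are not strongly correlated.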

2. How Random Forest Works — Step by Step

  1. Choose the number of trees (n_estimators) — typically 100 to 500.
  2. For each tree:
    1. Draw a random bootstrap sample from the training data (with replacement).
    2. Grow a full decision tree on this sample, but at each split, consider only a random subset of features (not all features).
    3. Do not prune the tree — let it grow deep.
  3. To predict new data:
    1. Pass the input through every tree in the forest.
    2. Each tree produces a prediction.
    3. Combine: majority vote (classification) or average (regression).

The two sources of randomness — bootstrap sampling and feature subsampling — are what make the trees different from each other, which is essential for the ensemble to work well.
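The training loop above can be sketched in a few lines using scikit-learn's DecisionTreeClassifier as the base learner — a minimal illustration of the recipe, not a substitute for RandomForestClassifier (25 trees and the iris dataset are arbitrary choices):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

# Step 2: train each tree on a bootstrap sample, with feature subsampling per split
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier(
        max_features='sqrt',                            # random feature subset at each split
        random_state=int(rng.integers(1_000_000)),
    )
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: majority vote across all trees
def forest_predict(x):
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return Counter(votes).most_common(1)[0][0]

pred = forest_predict(X[0])
print("Prediction for first sample:", pred, "| true label:", y[0])
```

Note that the trees are grown without pruning (no max_depth set), exactly as the recipe prescribes.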

3. Bagging — Bootstrap Aggregating

Bagging (Bootstrap Aggregating) is the technique of training each tree on a different random sample of the training data, drawn with replacement.

For a dataset of m examples, each bootstrap sample also contains m examples — but since sampling is with replacement, only about 63% of the distinct original examples appear (the probability that a given example is drawn at least once is 1 − (1 − 1/m)^m ≈ 1 − 1/e ≈ 0.632; some examples appear multiple times, others not at all). The remaining ~37% of examples that were not sampled are called the Out-of-Bag (OOB) samples for that tree.
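The 63%/37% split is easy to confirm empirically by drawing one bootstrap sample and counting distinct examples (m = 10,000 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000  # dataset size

# Draw one bootstrap sample and count how many distinct originals it contains
sample = rng.integers(0, m, size=m)        # sampling with replacement
in_bag = np.unique(sample).size / m        # fraction of originals that appear at least once

print(f"In-bag fraction:      {in_bag:.3f}   (theory: 1 - 1/e = 0.632...)")
print(f"Out-of-bag fraction:  {1 - in_bag:.3f}   (theory: 1/e = 0.368...)")
```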

Out-of-Bag (OOB) Error

Each tree can be evaluated on its OOB samples — examples it never saw during training. The average OOB error across all trees gives a free estimate of the model’s generalisation error, without needing a separate validation set. This is one of Random Forest’s most convenient properties.

4. Feature Randomness — The Second Source of Randomness

In a standard decision tree, every split considers all available features and picks the best one. In Random Forest, each split considers only a random subset of features — typically:

  • Classification: √n features (square root of total features)
  • Regression: n/3 features (one-third of total features)

Why? If one feature is very strong (e.g., it is the best predictor), every tree would use it at the root — making all trees very similar and correlated. Feature randomness forces each tree to explore different combinations of features, creating diversity in the forest. Diverse trees make different errors, and diverse errors cancel out in the vote.
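As a quick sketch of what these heuristics mean in practice — how many features each split actually sees for a few example feature counts (the counts are arbitrary):

```python
import math

for n_features in (4, 16, 100):
    k_clf = max(1, round(math.sqrt(n_features)))  # classification heuristic: sqrt(n)
    k_reg = max(1, n_features // 3)               # regression heuristic: n/3
    print(f"{n_features:>3} features -> classification: {k_clf}, regression: {k_reg}")
```

So with 100 features, each split in a classification forest compares only 10 candidate features — plenty of room for different trees to pick different splits.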

5. Making Predictions

Classification — Majority Vote

Each tree votes for a class. The class with the most votes wins. Example: 500 trees vote — 380 say “Approve”, 120 say “Reject” → Final prediction: Approve.

Some implementations use soft voting — each tree outputs a probability, and the probabilities are averaged across trees. This is more nuanced and usually more accurate than hard majority voting.
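Scikit-learn's RandomForestClassifier takes the soft-voting approach: predict_proba returns the per-class probabilities averaged over all trees. The raw hard-vote counts can be reconstructed from the individual trees for comparison (the iris dataset is an arbitrary choice here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Soft voting: predict_proba averages per-tree class probabilities
probs = model.predict_proba(X[:1])[0]

# Hard voting: count each individual tree's predicted class
votes = np.bincount(
    [int(t.predict(X[:1])[0]) for t in model.estimators_], minlength=3
)

print("Averaged probabilities:", probs.round(3))
print("Raw vote counts:       ", votes)
```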

Regression — Average

Each tree predicts a numerical value. The final prediction is the average of all tree predictions. Example: 100 trees predict house prices; the final prediction is the mean of all 100 values.
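This averaging can be verified directly: a forest's regression output equals the mean of its individual trees' outputs. A small sketch on a synthetic dataset (the make_regression parameters are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# The forest's prediction is exactly the mean of the individual trees' predictions
per_tree = np.array([t.predict(X[:1])[0] for t in model.estimators_])

print(f"Forest prediction: {model.predict(X[:1])[0]:.4f}")
print(f"Mean of trees:     {per_tree.mean():.4f}")
```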

6. Key Hyperparameters

| Hyperparameter | What it controls | Default (Scikit-learn) | Tuning tip |
| --- | --- | --- | --- |
| n_estimators | Number of trees in the forest | 100 | More is generally better; 100–500 covers most cases |
| max_features | Features considered per split | 'sqrt' (classification) | Try 'sqrt', 'log2', or a fraction like 0.3 |
| max_depth | Maximum depth of each tree | None (fully grown) | Limit to 10–20 to speed up training |
| min_samples_leaf | Minimum samples at leaf node | 1 | Increase (e.g., 5–10) to smooth predictions |
| bootstrap | Whether to use bagging | True | Keep True for standard Random Forest |
| oob_score | Use OOB samples for evaluation | False | Set True to get a free validation score |
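These hyperparameters are usually tuned together rather than one at a time, for example with a grid search. A minimal sketch (the grid values here are illustrative, not recommendations):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Small illustrative grid over three of the key hyperparameters
param_grid = {
    'n_estimators': [100, 300],
    'max_features': ['sqrt', 'log2'],
    'min_samples_leaf': [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```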

7. Feature Importance

Random Forest provides a feature importance score for every input feature — measuring how much each feature contributes to reducing impurity across all trees. This is one of the most useful properties of Random Forest for real-world data analysis.

Feature importance is calculated as the total reduction in Gini impurity (or MSE for regression) attributable to each feature, averaged across all trees and normalised to sum to 1.

Uses of feature importance: identify which variables matter most in your dataset; remove irrelevant features to speed up training; understand the problem domain better; use as a feature selection step before training other models.
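As a sketch of the feature-selection use case, scikit-learn's SelectFromModel can keep only the features whose impurity-based importance exceeds a threshold (the iris dataset and the 'mean' threshold are illustrative choices):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
names = load_iris().feature_names

# Keep only features whose importance exceeds the mean importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0), threshold='mean'
)
X_reduced = selector.fit_transform(X, y)

kept = [n for n, keep in zip(names, selector.get_support()) if keep]
print("Selected features:", kept)
print("Reduced shape:", X_reduced.shape)
```

The reduced feature matrix can then be fed to any other model as a cheap pre-filtering step.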

8. Random Forest vs Single Decision Tree

| Feature | Single Decision Tree | Random Forest |
| --- | --- | --- |
| Overfitting | High — must be pruned carefully | Low — averaging reduces variance significantly |
| Interpretability | High — rules can be read directly | Low — 500 trees cannot be interpreted manually |
| Accuracy | Moderate | High — consistently among the top performers |
| Training speed | Fast | Slower (proportional to n_estimators) |
| Feature importance | Available but noisy | Available and more reliable |
| Stability | Low — sensitive to data changes | High — robust to noise and outliers |

9. Python Code


from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest
model = RandomForestClassifier(
    n_estimators=100,
    max_features='sqrt',
    oob_score=True,
    random_state=42
)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(f"Test Accuracy:     {accuracy_score(y_test, y_pred):.3f}")
print(f"OOB Score:         {model.oob_score_:.3f}")
print(classification_report(y_test, y_pred, target_names=load_iris().target_names))

# Feature importance
importances = model.feature_importances_
for name, importance in sorted(zip(feature_names, importances), key=lambda x: -x[1]):
    print(f"{name}: {importance:.4f}")
    

10. Common Mistakes Students Make

  • Using too few trees: With fewer than 50 trees, the forest is unstable — predictions change significantly with different random seeds. Always use at least 100 trees.
  • Expecting full interpretability: Unlike a single decision tree, a Random Forest cannot be interpreted directly. If interpretability is essential, use a decision tree or SHAP values to explain Random Forest predictions.
  • Not using OOB score: The OOB score is a free, reliable validation metric that uses no extra data. Always set oob_score=True during development to track model performance without a separate validation set.
  • Ignoring feature importance for feature selection: Random Forest’s feature importance is one of the most practical tools for understanding which features matter. Always check it before deploying a model.

11. Frequently Asked Questions

Is Random Forest always better than a single decision tree?

In terms of accuracy and generalisation, yes — almost always. Random Forest consistently outperforms single decision trees on real datasets. The tradeoff is interpretability and training speed. If you need to explain exactly why a prediction was made (e.g., for regulatory compliance), a single tree is preferable. For accuracy, use Random Forest.

What is the difference between Random Forest and Gradient Boosting?

Both are ensemble methods using decision trees, but they work differently. Random Forest trains trees independently in parallel (bagging) and averages their predictions. Gradient Boosting trains trees sequentially — each new tree corrects the errors of the previous one. Gradient Boosting (XGBoost, LightGBM) often achieves higher accuracy but is more prone to overfitting and requires more careful tuning. Random Forest is more robust and easier to use out of the box.
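A quick side-by-side comparison with scikit-learn's defaults illustrates the point (the breast-cancer dataset is an arbitrary choice; relative results will vary by dataset):

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Compare bagging (parallel trees) vs boosting (sequential trees) out of the box
scores = {}
for name, model in [
    ("Random Forest    ", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("Gradient Boosting", GradientBoostingClassifier(random_state=0)),
]:
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Both perform well here with no tuning at all; the differences only become pronounced on harder problems and after hyperparameter tuning.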

Does Random Forest require feature scaling?

No. Because Random Forest is based on decision trees, which make splits based on thresholds rather than distances, feature scaling (normalisation or standardisation) is not required. This is one of the practical advantages of tree-based methods.
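This is easy to check: fitting the same forest on raw and standardised features should give (near-)identical predictions, because split thresholds simply shift with the scale (iris is used for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same seed, same data order: only the feature scale differs between the two fits
raw = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y).predict(X)
scaled = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_scaled, y).predict(X_scaled)

print(f"Agreement between raw and scaled fits: {(raw == scaled).mean():.3f}")
```

Contrast this with distance-based methods like k-NN or SVM, where scaling routinely changes the predictions.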

Next Steps