K-Nearest Neighbours (KNN)

Algorithm Explained for Engineering Students

Last Updated: March 2026

📌 Key Takeaways

  • Definition: KNN classifies a new point by majority vote of its K nearest training examples in feature space.
  • Lazy learner: No explicit training phase — all computation happens at prediction time.
  • Distance metric: Euclidean distance (default). Always scale features first.
  • Choosing K: Use cross-validation. Small K = high variance; Large K = high bias. Start with K = √n.
  • Works for: Both classification (majority vote) and regression (average of K neighbours).
  • Main weakness: Slow at prediction time for large datasets — must compute distance to every training point.

1. What is KNN?

K-Nearest Neighbours (KNN) is a supervised ML algorithm that makes predictions based on the K most similar examples in the training dataset. For classification, it assigns the majority class among the K neighbours. For regression, it predicts the average value of the K neighbours.

KNN is a non-parametric, lazy learning algorithm. Non-parametric means it makes no assumptions about the underlying data distribution. Lazy means it does not learn an explicit model during training — it memorises all training examples and defers all computation to prediction time.

Analogy — New Student in Class

Imagine a new student joins your engineering class and you want to predict whether they will score above 80% in exams. You look at the 5 students most similar to them in terms of study hours, attendance, and assignment scores. If 4 out of those 5 students scored above 80%, you predict the new student will too. KNN works exactly this way — “tell me who your neighbours are and I’ll tell you who you are.”

2. How KNN Works — Step by Step

  1. Choose K — the number of neighbours to consider.
  2. Scale features — normalise or standardise all input features (critical for distance-based methods).
  3. Store all training examples — no model parameters to learn.
  4. For a new query point:
    1. Compute the distance from the query point to every training example.
    2. Sort by distance and select the K smallest.
    3. Classification: assign the majority class among K neighbours.
    4. Regression: assign the average value of K neighbours.

The entire “training” of KNN is just storing the data — O(1) time. All computational cost is at prediction time — O(n) per query (must compute distance to all n training examples), which makes KNN slow for large datasets.
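
To make these steps concrete, here is a minimal from-scratch sketch of the prediction loop in NumPy. The function name knn_predict and the toy data are illustrative only, not part of any library:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote of its k nearest neighbours."""
    # Step 4a: Euclidean distance from the query to every training example
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step 4b: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4c: majority class among those k neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data (assumed already scaled): two features, two classes
X_train = np.array([[1.0, 1.2], [0.8, 0.9], [3.1, 3.0], [2.9, 3.3], [1.1, 0.8]])
y_train = np.array([0, 0, 1, 1, 0])

print(knn_predict(X_train, y_train, np.array([2.5, 2.7]), k=3))  # -> 1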

3. Distance Metrics

  • Euclidean: d = √Σ(xᵢ − yᵢ)². Best for continuous features and general use (the default).
  • Manhattan: d = Σ|xᵢ − yᵢ|. Best for high-dimensional data; more robust to outliers.
  • Minkowski: d = (Σ|xᵢ − yᵢ|ᵖ)^(1/p). Generalisation of the above: p = 1 gives Manhattan, p = 2 gives Euclidean.
  • Hamming: fraction of positions that differ. Best for categorical or binary features.
  • Cosine similarity: cos(θ) = (x · y) / (||x|| × ||y||), used as the distance 1 − cos(θ). Best for text data and high-dimensional sparse vectors.

Critical rule: Always scale features before computing distances. A feature with values in thousands will completely dominate Euclidean distance calculations, making the other features irrelevant. Use StandardScaler or MinMaxScaler.
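
As a quick illustration, the snippet below evaluates several of these metrics on the same pair of vectors using scipy.spatial.distance; the example vectors are arbitrary:

import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

print("Euclidean:", distance.euclidean(x, y))           # sqrt(9 + 4 + 0) ≈ 3.606
print("Manhattan:", distance.cityblock(x, y))           # 3 + 2 + 0 = 5
print("Minkowski p=3:", distance.minkowski(x, y, p=3))  # (27 + 8)^(1/3) ≈ 3.271
print("Cosine distance:", distance.cosine(x, y))        # 1 - cosine similarity

# Hamming is defined on categorical/binary vectors
a = np.array([1, 0, 1, 1])
b = np.array([1, 1, 0, 1])
print("Hamming:", distance.hamming(a, b))                # 2 of 4 positions differ -> 0.5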

4. Choosing the Right K

K is the most important hyperparameter in KNN:

  • K = 1: Very sensitive to noise — high variance, low bias (overfitting).
  • Large K (e.g., K = 50): Smooth boundary, robust to noise — low variance, higher bias (underfitting).
  • Optimal K: Found using cross-validation. Plot validation accuracy vs K and choose the best (a minimal sweep is sketched after the table below).

Practical starting point: K = √n, where n is the number of training examples. Always use an odd K for binary classification to avoid ties.

K Value | Decision Boundary | Bias | Variance
K = 1 | Very jagged, irregular | Low | High (overfitting)
K = 5–15 | Moderately smooth | Moderate | Moderate
K = 50+ | Very smooth, nearly linear | High | Low (underfitting)
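
A minimal sketch of the cross-validation sweep described above, using the Iris dataset purely as a placeholder and scaling inside the pipeline so no information leaks across folds:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Evaluate odd K values with 5-fold cross-validation
scores = {}
for k in range(1, 22, 2):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    print(f"K={k:2d}  CV accuracy={scores[k]:.3f}")

print("Best K:", max(scores, key=scores.get))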

5. Worked Example

Problem: Classify point P = (5, 4) using K=3. Training data:

Point | x₁ | x₂ | Class
A | 2 | 3 | Red
B | 4 | 5 | Blue
C | 6 | 3 | Blue
D | 3 | 6 | Red
E | 7 | 5 | Blue

Distances from P=(5,4): A≈3.16 (Red), B≈1.41 (Blue), C≈1.41 (Blue), D≈2.83 (Red), E≈2.24 (Blue)

K=3 nearest: B (1.41), C (1.41), E (2.24) — all Blue

Prediction: Blue (3 Blue, 0 Red)
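
The same calculation can be checked in a few lines with scikit-learn, using the table above as the training data (a sketch; the features here are already on comparable scales, so no scaling step is needed):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[2, 3], [4, 5], [6, 3], [3, 6], [7, 5]])  # points A-E
y = np.array(['Red', 'Blue', 'Blue', 'Red', 'Blue'])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Distances and indices of the 3 nearest neighbours of P = (5, 4)
dist, idx = knn.kneighbors([[5, 4]])
print(dist.round(2), idx)      # ≈ [[1.41 1.41 2.24]] -> neighbours B, C and E
print(knn.predict([[5, 4]]))   # ['Blue']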

6. KNN for Classification vs Regression

Aspect | KNN Classification | KNN Regression
Output | Class label (majority vote) | Continuous value (average of K neighbours)
Prediction rule | ŷ = mode(y₁, y₂, …, yₖ) | ŷ = (1/K) × Σyᵢ
Weighted variant | Weight votes by 1/distance | Weight the average by 1/distance
Scikit-learn class | KNeighborsClassifier | KNeighborsRegressor

Weighted KNN gives closer neighbours more influence and often improves performance. Set weights='distance' in Scikit-learn.
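
A short sketch of the regression variant and the weights parameter on synthetic data (the data and hyperparameters are illustrative only):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression problem: y ≈ x² with a little noise
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 50).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=50)

# Uniform weights: plain average of the K neighbour targets
knn_uniform = KNeighborsRegressor(n_neighbors=5, weights='uniform').fit(X, y)
# Distance weights: closer neighbours contribute more (proportional to 1/distance)
knn_weighted = KNeighborsRegressor(n_neighbors=5, weights='distance').fit(X, y)

x_new = [[2.5]]
print(knn_uniform.predict(x_new), knn_weighted.predict(x_new))  # both close to 2.5² = 6.25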

7. Advantages & Limitations

Advantages | Limitations
Simple to understand and implement | Slow prediction — O(n) per query for large datasets
No training phase — instantly adapts to new data | High memory usage — must store all training data
Naturally handles multi-class problems | Sensitive to irrelevant features and feature scale
No assumptions about data distribution | Struggles with high-dimensional data (curse of dimensionality)
Can model complex, non-linear boundaries | Requires careful tuning of K and the distance metric

Curse of Dimensionality

As the number of features increases, the concept of “nearest neighbour” breaks down: in high dimensions, all points tend to be approximately equidistant, so the nearest neighbours are barely closer than everything else. As a rough rule of thumb, KNN works well with fewer than ~20 informative features; beyond that, reduce dimensionality first (e.g., with PCA).
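
A common workaround is to put the dimensionality reduction directly in the pipeline before the neighbour search. The sketch below uses the 64-feature digits dataset; the choice of 20 principal components is illustrative, not a recommendation:

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 64 pixel features

# Scale, project onto the first 20 principal components, then run KNN
model = make_pipeline(StandardScaler(),
                      PCA(n_components=20),
                      KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(model, X, y, cv=5).mean().round(3))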

8. Python Code


import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# KNN Pipeline with scaling (ALWAYS scale for KNN)
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5, weights='distance', metric='euclidean'))
])

# Find optimal K using cross-validation
param_grid = {'knn__n_neighbors': list(range(1, 21, 2))}  # Odd values only
grid_search = GridSearchCV(knn_pipeline, param_grid, cv=10, scoring='accuracy')
grid_search.fit(X, y)

print(f"Best K:        {grid_search.best_params_['knn__n_neighbors']}")
print(f"Best CV Score: {grid_search.best_score_:.3f}")

# Evaluate best model
best_model = grid_search.best_estimator_
cv_scores = cross_val_score(best_model, X, y, cv=10, scoring='accuracy')
print(f"10-Fold CV Accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
    

9. Common Mistakes Students Make

  • Not scaling features: The single most common mistake with KNN. Always use StandardScaler before KNN.
  • Using K=1 without cross-validation: K=1 always achieves 0% training error but typically overfits badly.
  • Applying KNN to high-dimensional data: With 100+ features, KNN degrades badly. Apply PCA first.
  • Using even K for binary classification: Use odd values (K=3, 5, 7…) to avoid ties.
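
The first mistake is easy to demonstrate. The snippet below compares cross-validated accuracy with and without scaling on the Wine dataset, chosen here only because its features span very different ranges:

from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # 13 features on very different scales

unscaled = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

print("No scaling:  ", cross_val_score(unscaled, X, y, cv=5).mean().round(3))
print("With scaling:", cross_val_score(scaled, X, y, cv=5).mean().round(3))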

10. Frequently Asked Questions

Is KNN a supervised or unsupervised algorithm?

KNN is a supervised algorithm — it requires labelled training data. Do not confuse it with K-Means, which is unsupervised clustering. Both use K and distance, but they solve completely different problems.

Why is KNN called a lazy learner?

KNN is called a lazy learner because it does not build an explicit model during training — it simply memorises all training data. All computation happens at prediction time. This is the opposite of eager learners like Logistic Regression or Neural Networks.

When should I use KNN over other algorithms?

KNN is a good choice when: the dataset is small (under ~10,000 examples), the number of features is low (under ~20), you need a simple baseline model quickly, or the decision boundary is highly non-linear. For large datasets or high-dimensional data, prefer SVM, Random Forest, or Neural Networks.
