K-Nearest Neighbours (KNN)
Algorithm Explained for Engineering Students
Last Updated: March 2026
📌 Key Takeaways
- Definition: KNN classifies a new point by majority vote of its K nearest training examples in feature space.
- Lazy learner: No explicit training phase — all computation happens at prediction time.
- Distance metric: Euclidean distance (default). Always scale features first.
- Choosing K: Use cross-validation. Small K = high variance; Large K = high bias. Start with K = √n.
- Works for: Both classification (majority vote) and regression (average of K neighbours).
- Main weakness: Slow at prediction time for large datasets — must compute distance to every training point.
1. What is KNN?
K-Nearest Neighbours (KNN) is a supervised ML algorithm that makes predictions based on the K most similar examples in the training dataset. For classification, it assigns the majority class among the K neighbours. For regression, it predicts the average value of the K neighbours.
KNN is a non-parametric, lazy learning algorithm. Non-parametric means it makes no assumptions about the underlying data distribution. Lazy means it does not learn an explicit model during training — it memorises all training examples and defers all computation to prediction time.
Analogy — New Student in Class
Imagine a new student joins your engineering class and you want to predict whether they will score above 80% in exams. You look at the 5 students most similar to them in terms of study hours, attendance, and assignment scores. If 4 out of those 5 students scored above 80%, you predict the new student will too. KNN works exactly this way — “tell me who your neighbours are and I’ll tell you who you are.”
2. How KNN Works — Step by Step
- Choose K: the number of neighbours to consider.
- Scale features: normalise or standardise all input features (critical for distance-based methods).
- Store all training examples: there are no model parameters to learn.
- For a new query point:
  - Compute the distance from the query point to every training example.
  - Sort by distance and select the K smallest.
  - Classification: assign the majority class among the K neighbours.
  - Regression: assign the average value of the K neighbours.
The entire “training” of KNN is just storing the data, so it costs effectively nothing. All of the computational cost falls on prediction: each query must compute a distance to all n training examples, which is O(n·d) time for d features and makes KNN slow for large datasets.
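To make the recipe concrete, here is a minimal from-scratch sketch of the prediction step (a NumPy illustration, not a production implementation; it assumes features are already scaled, and the helper name knn_predict is ours):

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify one query point by majority vote of its k nearest neighbours."""
    # 1. Euclidean distance from the query to every training example
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # 2. Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the k neighbour labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Scikit-learn's KNeighborsClassifier does the same thing, but with optimised neighbour search (KD-trees and ball trees) instead of the brute-force loop above.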
3. Distance Metrics
| Metric | Formula | Best For |
|---|---|---|
| Euclidean | d = √Σ(xᵢ − yᵢ)² | Continuous features, general use (default) |
| Manhattan | d = Σ|xᵢ − yᵢ| | High-dimensional data, robust to outliers |
| Minkowski | d = (Σ|xᵢ − yᵢ|ᵖ)^(1/p) | Generalisation: p=1 → Manhattan, p=2 → Euclidean |
| Hamming | Fraction of positions that differ | Categorical or binary features |
| Cosine distance | d = 1 − cos(θ), where cos(θ) = (x · y) / (||x|| × ||y||) | Text data, high-dimensional sparse vectors |
Critical rule: Always scale features before computing distances. A feature with values in thousands will completely dominate Euclidean distance calculations, making the other features irrelevant. Use StandardScaler or MinMaxScaler.
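To see the problem in numbers, here is a small sketch (the age/salary figures are invented purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (age, salary) rows: salary is in the tens of thousands
X = np.array([[25.0, 50_000.0],
              [60.0, 51_000.0],
              [30.0, 90_000.0],
              [45.0, 52_000.0]])

raw = np.linalg.norm(X[0] - X[1])        # ~1000.6: almost entirely the salary gap
X_std = StandardScaler().fit_transform(X)
scaled = np.linalg.norm(X_std[0] - X_std[1])  # age and salary now contribute comparably
print(f"raw: {raw:.1f}, scaled: {scaled:.2f}")
```

Without scaling, the 35-year age gap is invisible next to the 1000-unit salary gap; after standardisation, both features matter.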
4. Choosing the Right K
K is the most important hyperparameter in KNN:
- K = 1: Very sensitive to noise — high variance, low bias (overfitting).
- Large K (e.g., K = 50): Smooth boundary, robust to noise — low variance, higher bias (underfitting).
- Optimal K: Found using cross-validation. Plot validation accuracy vs K and choose the best.
Practical starting point: K = √n, where n is the number of training examples. Always use an odd K for binary classification to avoid ties.
| K Value | Decision Boundary | Bias | Variance |
|---|---|---|---|
| K = 1 | Very jagged, irregular | Low | High (overfitting) |
| K = 5–15 | Moderately smooth | Moderate | Moderate |
| K = 50+ | Very smooth, nearly linear | High | Low (underfitting) |
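Here is a sketch of that cross-validation search over K (Iris is used as a stand-in dataset; swap in your own data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
for k in range(1, 22, 2):  # odd values of K only, to avoid voting ties
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    print(f"K={k:2d}  CV accuracy = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```

Plot these scores against K and pick the value where validation accuracy peaks.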
5. Worked Example
Problem: Classify point P = (5, 4) using K=3. Training data:
| Point | x₁ | x₂ | Class |
|---|---|---|---|
| A | 2 | 3 | Red |
| B | 4 | 5 | Blue |
| C | 6 | 3 | Blue |
| D | 3 | 6 | Red |
| E | 7 | 5 | Blue |
Distances from P=(5,4): A≈3.16 (Red), B≈1.41 (Blue), C≈1.41 (Blue), D≈2.83 (Red), E≈2.24 (Blue)
K=3 nearest: B (1.41), C (1.41), E (2.24) — all Blue
Prediction: Blue (3 Blue, 0 Red)
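You can check these numbers with a few lines of NumPy (a verification sketch; the rows follow the table order A–E):

```python
import numpy as np

points = np.array([[2, 3], [4, 5], [6, 3], [3, 6], [7, 5]], dtype=float)
labels = np.array(['Red', 'Blue', 'Blue', 'Red', 'Blue'])
P = np.array([5, 4], dtype=float)

dists = np.linalg.norm(points - P, axis=1)
nearest = np.argsort(dists)[:3]   # indices of B, C, E
print(np.round(dists, 2))         # [3.16 1.41 1.41 2.83 2.24]
print(labels[nearest])            # ['Blue' 'Blue' 'Blue'] -> predict Blue
```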
6. KNN for Classification vs Regression
| Aspect | KNN Classification | KNN Regression |
|---|---|---|
| Output | Class label (majority vote) | Continuous value (average of K neighbours) |
| Prediction rule | ŷ = mode(y₁, y₂, …, yₖ) | ŷ = (1/K) × Σyᵢ |
| Weighted variant | Weight vote by 1/distance | Weight average by 1/distance |
| Sklearn class | KNeighborsClassifier | KNeighborsRegressor |
Weighted KNN gives closer neighbours more influence, which often improves performance. Set weights='distance' in Scikit-learn.
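A small sketch contrasting the two variants on invented data (the noisy y = x² toy problem is ours, purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = np.linspace(0, 5, 40).reshape(-1, 1)      # single feature, so scaling is moot here
y = X.ravel() ** 2 + rng.normal(0, 0.5, 40)   # noisy y = x^2

uniform = KNeighborsRegressor(n_neighbors=5).fit(X, y)
weighted = KNeighborsRegressor(n_neighbors=5, weights='distance').fit(X, y)
print(uniform.predict([[2.5]]), weighted.predict([[2.5]]))  # both near 6.25
```

With weights='distance', the prediction leans towards the closest neighbours instead of averaging all five equally.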
7. Advantages & Limitations
| Advantages | Limitations |
|---|---|
| Simple to understand and implement | Slow prediction — O(n) per query for large datasets |
| No training phase — instantly adapts to new data | High memory usage — must store all training data |
| Naturally handles multi-class problems | Sensitive to irrelevant features and feature scale |
| No assumptions about data distribution | Struggles with high-dimensional data (curse of dimensionality) |
| Can model complex, non-linear boundaries | Requires careful tuning of K and distance metric |
Curse of Dimensionality
As the number of features increases, the notion of a “nearest” neighbour breaks down: in high dimensions, all points tend to be approximately equidistant from one another. As a rule of thumb, KNN works well with fewer than ~20 features; beyond that, apply dimensionality reduction such as PCA first.
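One way to follow that advice is to put PCA inside the pipeline, so the projection is learned only from the training folds (a sketch; the 95% variance threshold is a common heuristic, not a rule):

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pca_knn = Pipeline([
    ('scaler', StandardScaler()),          # scale before PCA and KNN
    ('pca', PCA(n_components=0.95)),       # keep components explaining 95% of variance
    ('knn', KNeighborsClassifier(n_neighbors=5)),
])
# pca_knn.fit(X_train, y_train)  # X_train, y_train: your own labelled data
```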
8. Python Code
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# KNN pipeline with scaling (ALWAYS scale for KNN)
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5, weights='distance', metric='euclidean'))
])

# Find optimal K using cross-validation
param_grid = {'knn__n_neighbors': list(range(1, 21, 2))}  # odd values only
grid_search = GridSearchCV(knn_pipeline, param_grid, cv=10, scoring='accuracy')
grid_search.fit(X, y)

print(f"Best K: {grid_search.best_params_['knn__n_neighbors']}")
print(f"Best CV Score: {grid_search.best_score_:.3f}")

# Evaluate best model (note: re-scoring on the same data that tuned K gives a
# slightly optimistic estimate; a held-out test set or nested CV is stricter)
best_model = grid_search.best_estimator_
cv_scores = cross_val_score(best_model, X, y, cv=10, scoring='accuracy')
print(f"10-Fold CV Accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```
9. Common Mistakes Students Make
- Not scaling features: The single most common mistake with KNN. Always use StandardScaler before KNN.
- Using K=1 without cross-validation: K=1 achieves 0% training error (every point is its own nearest neighbour) but typically overfits badly.
- Applying KNN to high-dimensional data: With 100+ features, KNN degrades badly. Apply PCA first.
- Using even K for binary classification: Use odd values (K=3, 5, 7…) to avoid ties.
10. Frequently Asked Questions
Is KNN a supervised or unsupervised algorithm?
KNN is a supervised algorithm — it requires labelled training data. Do not confuse it with K-Means, which is unsupervised clustering. Both use K and distance, but they solve completely different problems.
Why is KNN called a lazy learner?
KNN is called a lazy learner because it does not build an explicit model during training — it simply memorises all training data. All computation happens at prediction time. This is the opposite of eager learners like Logistic Regression or Neural Networks.
When should I use KNN over other algorithms?
KNN is a good choice when: the dataset is small (under ~10,000 examples), the number of features is low (under ~20), you need a simple baseline model quickly, or the decision boundary is highly non-linear. For large datasets or high-dimensional data, prefer SVM, Random Forest, or Neural Networks.