K-Nearest Neighbours (KNN)
Algorithm Explained for Engineering Students
Last Updated: March 2026
📌 Key Takeaways
- Definition: KNN classifies a new point by majority vote of its K nearest training examples in feature space.
- Lazy learner: No explicit training phase — all computation happens at prediction time.
- Distance metric: Euclidean distance (default). Always scale features first.
- Choosing K: Use cross-validation. Small K = high variance; Large K = high bias. Start with K = √n.
- Works for: Both classification (majority vote) and regression (average of K neighbours).
- Main weakness: Slow at prediction time for large datasets — must compute distance to every training point.
1. What is KNN?
K-Nearest Neighbours (KNN) is a supervised ML algorithm that makes predictions based on the K most similar examples in the training dataset. For classification, it assigns the majority class among the K neighbours. For regression, it predicts the average value of the K neighbours.
KNN is a non-parametric, lazy learning algorithm. Non-parametric means it makes no assumptions about the underlying data distribution. Lazy means it does not learn an explicit model during training — it memorises all training examples and defers all computation to prediction time.
Analogy — New Student in Class
Imagine a new student joins your engineering class and you want to predict whether they will score above 80% in exams. You look at the 5 students most similar to them in terms of study hours, attendance, and assignment scores. If 4 out of those 5 students scored above 80%, you predict the new student will too. KNN works exactly this way — “tell me who your neighbours are and I’ll tell you who you are.”
2. How KNN Works — Step by Step
- Choose K: the number of neighbours to consider.
- Scale features: normalise or standardise all input features (critical for distance-based methods).
- Store all training examples: there are no model parameters to learn.
- For a new query point:
  - Compute the distance from the query point to every training example.
  - Sort by distance and select the K smallest.
  - Classification: assign the majority class among the K neighbours.
  - Regression: assign the average value of the K neighbours.
The entire “training” of KNN is just storing the data, so it costs effectively nothing. All of the computational cost falls on prediction: each query must compute a distance to all n training examples, which is O(n·d) time for d features and makes KNN slow for large datasets.
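To make the recipe concrete, here is a minimal from-scratch sketch of the prediction step (a NumPy illustration, not a production implementation; it assumes features are already scaled, and the helper name knn_predict is ours):

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=3):
    """Classify one query point by majority vote of its k nearest neighbours."""
    # 1. Euclidean distance from the query to every training example
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # 2. Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among the k neighbour labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Scikit-learn's KNeighborsClassifier does the same thing, but with optimised neighbour search (KD-trees and ball trees) instead of the brute-force loop above.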
3. Distance Metrics
| Metric | Formula | Best For |
|---|---|---|
| Euclidean | d = √Σ(xᵢ − yᵢ)² | Continuous features, general use (default) |
| Manhattan | d = Σ|xᵢ − yᵢ| | High-dimensional data, robust to outliers |
| Minkowski | d = (Σ|xᵢ − yᵢ|ᵖ)^(1/p) | Generalisation: p=1 → Manhattan, p=2 → Euclidean |
| Hamming | Fraction of positions that differ | Categorical or binary features |
| Cosine distance | d = 1 − cos(θ), where cos(θ) = (x · y) / (||x|| × ||y||) | Text data, high-dimensional sparse vectors |
Critical rule: Always scale features before computing distances. A feature with values in thousands will completely dominate Euclidean distance calculations, making the other features irrelevant. Use StandardScaler or MinMaxScaler.
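To see the problem in numbers, here is a small sketch (the age/salary figures are invented purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (age, salary) rows: salary is in the tens of thousands
X = np.array([[25.0, 50_000.0],
              [60.0, 51_000.0],
              [30.0, 90_000.0],
              [45.0, 52_000.0]])

raw = np.linalg.norm(X[0] - X[1])        # ~1000.6: almost entirely the salary gap
X_std = StandardScaler().fit_transform(X)
scaled = np.linalg.norm(X_std[0] - X_std[1])  # age and salary now contribute comparably
print(f"raw: {raw:.1f}, scaled: {scaled:.2f}")
```

Without scaling, the 35-year age gap is invisible next to the 1000-unit salary gap; after standardisation, both features matter.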
4. Choosing the Right K
K is the most important hyperparameter in KNN:
- K = 1: Very sensitive to noise — high variance, low bias (overfitting).
- Large K (e.g., K = 50): Smooth boundary, robust to noise — low variance, higher bias (underfitting).
- Optimal K: Found using cross-validation. Plot validation accuracy vs K and choose the best.
Practical starting point: K = √n, where n is the number of training examples. Always use an odd K for binary classification to avoid ties.
| K Value | Decision Boundary | Bias | Variance |
|---|---|---|---|
| K = 1 | Very jagged, irregular | Low | High (overfitting) |
| K = 5–15 | Moderately smooth | Moderate | Moderate |
| K = 50+ | Very smooth, nearly linear | High | Low (underfitting) |
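Here is a sketch of that cross-validation search over K (Iris is used as a stand-in dataset; swap in your own data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
for k in range(1, 22, 2):  # odd values of K only, to avoid voting ties
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    print(f"K={k:2d}  CV accuracy = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```

Plot these scores against K and pick the value where validation accuracy peaks.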
5. Worked Example
Problem: Classify point P = (5, 4) using K=3. Training data:
| Point | x₁ | x₂ | Class |
|---|---|---|---|
| A | 2 | 3 | Red |
| B | 4 | 5 | Blue |
| C | 6 | 3 | Blue |
| D | 3 | 6 | Red |
| E | 7 | 5 | Blue |
Distances from P=(5,4): A≈3.16 (Red), B≈1.41 (Blue), C≈1.41 (Blue), D≈2.83 (Red), E≈2.24 (Blue)
K=3 nearest: B (1.41), C (1.41), E (2.24) — all Blue
Prediction: Blue (3 Blue, 0 Red)
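You can check these numbers with a few lines of NumPy (a verification sketch; the rows follow the table order A–E):

```python
import numpy as np

points = np.array([[2, 3], [4, 5], [6, 3], [3, 6], [7, 5]], dtype=float)
labels = np.array(['Red', 'Blue', 'Blue', 'Red', 'Blue'])
P = np.array([5, 4], dtype=float)

dists = np.linalg.norm(points - P, axis=1)
nearest = np.argsort(dists)[:3]   # indices of B, C, E
print(np.round(dists, 2))         # [3.16 1.41 1.41 2.83 2.24]
print(labels[nearest])            # ['Blue' 'Blue' 'Blue'] -> predict Blue
```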
6. KNN for Classification vs Regression
| Aspect | KNN Classification | KNN Regression |
|---|---|---|
| Output | Class label (majority vote) | Continuous value (average of K neighbours) |
| Prediction rule | ŷ = mode(y₁, y₂, …, yₖ) | ŷ = (1/K) × Σyᵢ |
| Weighted variant | Weight vote by 1/distance | Weight average by 1/distance |
| Sklearn class | KNeighborsClassifier | KNeighborsRegressor |
Weighted KNN gives closer neighbours more influence, which often improves performance. Set weights='distance' in Scikit-learn.
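A small sketch contrasting the two variants on invented data (the noisy y = x² toy problem is ours, purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = np.linspace(0, 5, 40).reshape(-1, 1)      # single feature, so scaling is moot here
y = X.ravel() ** 2 + rng.normal(0, 0.5, 40)   # noisy y = x^2

uniform = KNeighborsRegressor(n_neighbors=5).fit(X, y)
weighted = KNeighborsRegressor(n_neighbors=5, weights='distance').fit(X, y)
print(uniform.predict([[2.5]]), weighted.predict([[2.5]]))  # both near 6.25
```

With weights='distance', the prediction leans towards the closest neighbours instead of averaging all five equally.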
7. Advantages & Limitations
| Advantages | Limitations |
|---|---|
| Simple to understand and implement | Slow prediction — O(n) per query for large datasets |
| No training phase — instantly adapts to new data | High memory usage — must store all training data |
| Naturally handles multi-class problems | Sensitive to irrelevant features and feature scale |
| No assumptions about data distribution | Struggles with high-dimensional data (curse of dimensionality) |
| Can model complex, non-linear boundaries | Requires careful tuning of K and distance metric |
Curse of Dimensionality
As the number of features increases, the notion of a “nearest” neighbour breaks down: in high dimensions, all points tend to be approximately equidistant from one another. As a rule of thumb, KNN works well with fewer than ~20 features; beyond that, apply dimensionality reduction such as PCA first.
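One way to follow that advice is to put PCA inside the pipeline, so the projection is learned only from the training folds (a sketch; the 95% variance threshold is a common heuristic, not a rule):

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pca_knn = Pipeline([
    ('scaler', StandardScaler()),          # scale before PCA and KNN
    ('pca', PCA(n_components=0.95)),       # keep components explaining 95% of variance
    ('knn', KNeighborsClassifier(n_neighbors=5)),
])
# pca_knn.fit(X_train, y_train)  # X_train, y_train: your own labelled data
```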
8. Python Code
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.datasets import load_iris

# Load data
X, y = load_iris(return_X_y=True)

# KNN pipeline with scaling (ALWAYS scale for KNN)
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5, weights='distance', metric='euclidean'))
])

# Find optimal K using cross-validation
param_grid = {'knn__n_neighbors': list(range(1, 21, 2))}  # odd values only
grid_search = GridSearchCV(knn_pipeline, param_grid, cv=10, scoring='accuracy')
grid_search.fit(X, y)

print(f"Best K: {grid_search.best_params_['knn__n_neighbors']}")
print(f"Best CV Score: {grid_search.best_score_:.3f}")

# Evaluate best model (note: re-scoring on the same data that tuned K gives a
# slightly optimistic estimate; a held-out test set or nested CV is stricter)
best_model = grid_search.best_estimator_
cv_scores = cross_val_score(best_model, X, y, cv=10, scoring='accuracy')
print(f"10-Fold CV Accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```
9. Common Mistakes Students Make
- Not scaling features: The single most common mistake with KNN. Always use StandardScaler before KNN.
- Using K=1 without cross-validation: K=1 achieves 0% training error (every point is its own nearest neighbour) but typically overfits badly.
- Applying KNN to high-dimensional data: With 100+ features, KNN degrades badly. Apply PCA first.
- Using even K for binary classification: Use odd values (K=3, 5, 7…) to avoid ties.
10. Frequently Asked Questions
Is KNN a supervised or unsupervised algorithm?
KNN is a supervised algorithm — it requires labelled training data. Do not confuse it with K-Means, which is unsupervised clustering. Both use K and distance, but they solve completely different problems.
Why is KNN called a lazy learner?
KNN is called a lazy learner because it does not build an explicit model during training — it simply memorises all training data. All computation happens at prediction time. This is the opposite of eager learners like Logistic Regression or Neural Networks.
When should I use KNN over other algorithms?
KNN is a good choice when: the dataset is small (under ~10,000 examples), the number of features is low (under ~20), you need a simple baseline model quickly, or the decision boundary is highly non-linear. For large datasets or high-dimensional data, prefer SVM, Random Forest, or Neural Networks.