Hashing

Q: What is the expected number of comparisons for a successful search in chaining?

With chaining and load factor alpha = n/m (n keys, m slots), the expected comparisons for a successful search is 1 + alpha/2 (approximately). For an unsuccessful search: 1 + alpha. Both assume simple uniform hashing.

Q: What is the difference between linear probing and quadratic probing?

Linear probing: probe sequence h(k), h(k)+1, h(k)+2, ... (mod m). Simple but causes primary clustering (long runs of occupied slots). Quadratic probing: h(k), h(k)+1^2, h(k)+2^2, ... (mod m). Avoids primary clustering but can cause secondary clustering (all keys with same h(k) follow same probe sequence). Double hashing avoids both clustering types.

Q: What is double hashing and why is it preferred?

Double hashing uses two hash functions: probe sequence = (h1(k) + i*h2(k)) mod m. Each key has a unique probe increment h2(k), so keys with the same h1(k) follow different probe sequences. This eliminates both primary and secondary clustering. h2(k) must be non-zero and coprime with m (choose m prime and h2(k) = 1 + (k mod (m-1))).

Q: What is a perfect hash function?

A perfect hash function maps n keys to n slots with no collisions. It can be built when the key set is known in advance. A minimal perfect hash function uses exactly n slots (no wasted space). Used in static lookup tables like compiler keyword tables, read-only databases.

Hash functions, collision resolution, load factor analysis — O(1) average lookup. A consistent 1–2 mark topic in GATE CS with collision-tracing and expected-cost questions.

Last updated: April 2026 | GATE CS 2024–2026 syllabus

Key Takeaways

Hash function h(k) maps key k to a slot in a hash table of size m. Ideal: uniform distribution.
Load factor α = n/m (n = keys, m = slots). Performance degrades as α increases.
Chaining: each slot holds a linked list of colliding keys. Search O(1+α) average.
Open addressing: all keys in table; on collision probe next slot. Table must not be full.
Linear probing: simple but causes primary clustering. Quadratic: secondary clustering. Double hashing: best probe distribution.
Expected comparisons (chaining, successful): 1 + α/2. Unsuccessful: 1 + α.
Open addressing: expected probes (successful) ≈ (1/α)ln(1/(1−α)). Unsuccessful ≈ 1/(1−α).

1. Hash Functions

Division method: h(k) = k mod m. Choose m = prime (not close to 2^p or 10^p).
Multiplication method: h(k) = ⌊m(kA mod 1)⌋ where A ≈ 0.618 (golden ratio). Works for any m.
Universal hashing: h_a,b(k) = ((ak + b) mod p) mod m where p = prime > m. Minimises worst-case via randomisation.

Properties of a good hash function: (1) Deterministic. (2) Uniform distribution across slots. (3) Fast to compute. (4) Avalanche effect — small key change → large hash change.

2. Collision Resolution: Chaining

Each slot contains a linked list. On collision, append to the list. On search, scan the list at h(k).

Insert: O(1) (prepend to list)
Search (successful): O(1 + α/2) expected
Search (unsuccessful): O(1 + α) expected
Space: O(n + m) for n keys in m slots

Chaining works well for α > 1 (more keys than slots). The table never “fills up.”

3. Open Addressing (Overview)

All keys stored in the table itself. On collision, probe alternative slots using a probe sequence. Table load factor must stay < 1.

General probe sequence: h(k,0), h(k,1), h(k,2), …, h(k,m−1)
Must be a permutation of 0 to m−1 so all slots are checked.

Delete: Cannot simply remove an element (breaks probe chains). Use a “deleted” marker (tombstone) — treated as empty for insertion but not for search.

4. Linear Probing

Probe sequence: h(k, i) = (h(k) + i) mod m for i = 0, 1, 2, …

Advantage: Simple, good cache performance (contiguous memory access).
Disadvantage: Primary clustering — long contiguous runs of occupied slots form, slowing future insertions and lookups.

If slots h(k), h(k)+1, …, h(k)+j are occupied, any key hashing to any of these slots extends the cluster. Clusters grow faster than linearly.

5. Quadratic Probing

Probe sequence: h(k, i) = (h(k) + c₁i + c₂i²) mod m
Common: h(k, i) = (h(k) + i + i²) mod m / 2 (for m = power of 2 or prime).

Advantage: Eliminates primary clustering.
Disadvantage: Secondary clustering — keys with same h(k) follow the same probe sequence. May not visit all slots (guaranteed only if m is prime and α ≤ 0.5, or specific c₁, c₂).

6. Double Hashing

Probe sequence: h(k, i) = (h₁(k) + i × h₂(k)) mod m

Requirements: h₂(k) must be coprime with m. Common: m = prime, h₂(k) = 1 + (k mod (m−1)).

Advantage: Each key has unique probe increment → eliminates both primary and secondary clustering.
Closest to random probing in practice.

7. Load Factor and Expected Cost Analysis

α = n/m (n = keys inserted, m = table size). Assumes simple uniform hashing.

Method	Successful Search	Unsuccessful Search
Chaining	1 + α/2	1 + α
Linear probing	½(1 + 1/(1−α))	½(1 + 1/(1−α)²)
Double hashing	(1/α)ln(1/(1−α))	1/(1−α)

GATE rule of thumb (chaining):
α = 0.5 → successful: 1.25, unsuccessful: 1.5
α = 1.0 → successful: 1.5, unsuccessful: 2.0
α = 2.0 → successful: 2.0, unsuccessful: 3.0

8. GATE CS Solved Examples

Example 1 — Linear probing insertion (GATE CS 2019)

Q: Insert keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of size 11 using h(k) = k mod 11 and linear probing.

Key	h(k)	Probe slots	Final slot
10	10	10	10
22	0	0	0
31	9	9	9
4	4	4	4
15	4	4(occ), 5	5
28	6	6	6
17	6	6(occ), 7	7
88	0	0(occ), 1	1
59	4	4,5,6,7(occ), 8	8

Table: [22, 88, _, _, 4, 15, 28, 17, 59, 31, 10]

Example 2 — Expected comparisons (GATE CS 2016)

Q: A hash table uses chaining. n = 100 keys, m = 10 slots. Expected successful search comparisons?

α = 100/10 = 10. Expected = 1 + α/2 = 1 + 5 = 6.

Answer: 6 comparisons on average.

Example 3 — Double hashing sequence (GATE CS 2020)

Q: Table size m=7. h₁(k)=k mod 7, h₂(k)=1+(k mod 6). Insert key k=27 when slots 6, 0, 1 are occupied.

h₁(27) = 27 mod 7 = 6 (occupied). h₂(27) = 1 + (27 mod 6) = 1+3 = 4.

Probe 1: (6 + 1×4) mod 7 = 10 mod 7 = 3 (empty).

Key 27 inserted at slot 3.

9. Common Mistakes

Mistake 1 — Using simple deletion in open addressing

Deleting by marking a slot empty breaks probe chains. Subsequent searches for keys that probed through this slot will fail prematurely. Always use a tombstone (DELETED marker).

Mistake 2 — Load factor α > 1 in open addressing

Open addressing requires α < 1 (n < m). The table fills up and insertion becomes impossible. Chaining has no such restriction.

Mistake 3 — Confusing successful and unsuccessful search formulas

Successful (chaining): 1 + α/2. Unsuccessful: 1 + α. The unsuccessful case is higher because you traverse the entire chain before concluding absence.

Mistake 4 — Quadratic probing visits all slots

Quadratic probing only guarantees visiting all m slots if m is prime AND the table is less than half full (α < 0.5). Otherwise, some slots may never be probed.

Mistake 5 — h₂(k) = 0 in double hashing

If h₂(k) = 0 for some key, the probe sequence reduces to h₁(k) repeated infinitely — infinite loop. Always ensure h₂(k) ≥ 1. Standard: h₂(k) = 1 + (k mod (m−1)).

10. FAQ

What is the expected number of comparisons for a successful search in chaining?

1 + α/2 where α = n/m is the load factor. Unsuccessful: 1 + α. Both assume simple uniform hashing.

What is the difference between linear probing and quadratic probing?

Linear: probe h(k)+i — primary clustering. Quadratic: probe h(k)+i² — secondary clustering, may miss slots. Both are open addressing variants.

What is double hashing and why is it preferred?

Probe sequence = (h₁(k) + i×h₂(k)) mod m. Unique increment per key eliminates both clustering types. Closest to uniform probing in practice.

What is a perfect hash function?

Maps n keys to n slots with no collisions. Built when key set is known in advance. Minimal perfect hash uses exactly n slots. Used in compiler keyword tables.

Hashing – Data Structures | GATE CS

Hashing

Key Takeaways

1. Hash Functions

2. Collision Resolution: Chaining

3. Open Addressing (Overview)

4. Linear Probing

5. Quadratic Probing

6. Double Hashing

7. Load Factor and Expected Cost Analysis

8. GATE CS Solved Examples

Example 1 — Linear probing insertion (GATE CS 2019)

Example 2 — Expected comparisons (GATE CS 2016)

Example 3 — Double hashing sequence (GATE CS 2020)

9. Common Mistakes

Mistake 1 — Using simple deletion in open addressing

Mistake 2 — Load factor α > 1 in open addressing

Mistake 3 — Confusing successful and unsuccessful search formulas

Mistake 4 — Quadratic probing visits all slots

Mistake 5 — h₂(k) = 0 in double hashing

10. FAQ

What is the expected number of comparisons for a successful search in chaining?

What is the difference between linear probing and quadratic probing?

What is double hashing and why is it preferred?

What is a perfect hash function?

Next in Data Structures

Key Takeaways

1. Hash Functions

2. Collision Resolution: Chaining

3. Open Addressing (Overview)

4. Linear Probing

5. Quadratic Probing

6. Double Hashing

7. Load Factor and Expected Cost Analysis

8. GATE CS Solved Examples

Example 1 — Linear probing insertion (GATE CS 2019)

Example 2 — Expected comparisons (GATE CS 2016)

Example 3 — Double hashing sequence (GATE CS 2020)

9. Common Mistakes

Mistake 1 — Using simple deletion in open addressing

Mistake 2 — Load factor α > 1 in open addressing

Mistake 3 — Confusing successful and unsuccessful search formulas

Mistake 4 — Quadratic probing visits all slots

Mistake 5 — h2(k) = 0 in double hashing

10. FAQ

What is the expected number of comparisons for a successful search in chaining?

What is the difference between linear probing and quadratic probing?

What is double hashing and why is it preferred?

What is a perfect hash function?

Next in Data Structures

Mistake 5 — h₂(k) = 0 in double hashing