Hashing
Hash functions, collision resolution, load factor analysis — O(1) average lookup. A consistent 1–2 mark topic in GATE CS with collision-tracing and expected-cost questions.
Last updated: April 2026 | GATE CS 2024–2026 syllabus
Key Takeaways
- Hash function h(k) maps key k to a slot in a hash table of size m. Ideal: uniform distribution.
- Load factor α = n/m (n = keys, m = slots). Performance degrades as α increases.
- Chaining: each slot holds a linked list of colliding keys. Search O(1+α) average.
- Open addressing: all keys in table; on collision probe next slot. Table must not be full.
- Linear probing: simple but causes primary clustering. Quadratic: secondary clustering. Double hashing: best probe distribution.
- Expected comparisons (chaining, successful): 1 + α/2. Unsuccessful: 1 + α.
- Open addressing: expected probes (successful) ≈ (1/α)ln(1/(1−α)). Unsuccessful ≈ 1/(1−α).
1. Hash Functions
Multiplication method: h(k) = ⌊m(kA mod 1)⌋ where A ≈ 0.618 (golden ratio). Works for any m.
Universal hashing: ha,b(k) = ((ak + b) mod p) mod m where p = prime > m. Minimises worst-case via randomisation.
Properties of a good hash function: (1) Deterministic. (2) Uniform distribution across slots. (3) Fast to compute. (4) Avalanche effect — small key change → large hash change.
2. Collision Resolution: Chaining
Each slot contains a linked list. On collision, append to the list. On search, scan the list at h(k).
Search (successful): O(1 + α/2) expected
Search (unsuccessful): O(1 + α) expected
Space: O(n + m) for n keys in m slots
Chaining works well for α > 1 (more keys than slots). The table never “fills up.”
3. Open Addressing (Overview)
All keys stored in the table itself. On collision, probe alternative slots using a probe sequence. Table load factor must stay < 1.
Must be a permutation of 0 to m−1 so all slots are checked.
Delete: Cannot simply remove an element (breaks probe chains). Use a “deleted” marker (tombstone) — treated as empty for insertion but not for search.
4. Linear Probing
Advantage: Simple, good cache performance (contiguous memory access).
Disadvantage: Primary clustering — long contiguous runs of occupied slots form, slowing future insertions and lookups.
If slots h(k), h(k)+1, …, h(k)+j are occupied, any key hashing to any of these slots extends the cluster. Clusters grow faster than linearly.
5. Quadratic Probing
Common: h(k, i) = (h(k) + i + i²) mod m / 2 (for m = power of 2 or prime).
Advantage: Eliminates primary clustering.
Disadvantage: Secondary clustering — keys with same h(k) follow the same probe sequence. May not visit all slots (guaranteed only if m is prime and α ≤ 0.5, or specific c1, c2).
6. Double Hashing
Requirements: h2(k) must be coprime with m. Common: m = prime, h2(k) = 1 + (k mod (m−1)).
Advantage: Each key has unique probe increment → eliminates both primary and secondary clustering.
Closest to random probing in practice.
7. Load Factor and Expected Cost Analysis
α = n/m (n = keys inserted, m = table size). Assumes simple uniform hashing.
| Method | Successful Search | Unsuccessful Search |
|---|---|---|
| Chaining | 1 + α/2 | 1 + α |
| Linear probing | ½(1 + 1/(1−α)) | ½(1 + 1/(1−α)²) |
| Double hashing | (1/α)ln(1/(1−α)) | 1/(1−α) |
α = 0.5 → successful: 1.25, unsuccessful: 1.5
α = 1.0 → successful: 1.5, unsuccessful: 2.0
α = 2.0 → successful: 2.0, unsuccessful: 3.0
8. GATE CS Solved Examples
Example 1 — Linear probing insertion (GATE CS 2019)
Q: Insert keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of size 11 using h(k) = k mod 11 and linear probing.
| Key | h(k) | Probe slots | Final slot |
|---|---|---|---|
| 10 | 10 | 10 | 10 |
| 22 | 0 | 0 | 0 |
| 31 | 9 | 9 | 9 |
| 4 | 4 | 4 | 4 |
| 15 | 4 | 4(occ), 5 | 5 |
| 28 | 6 | 6 | 6 |
| 17 | 6 | 6(occ), 7 | 7 |
| 88 | 0 | 0(occ), 1 | 1 |
| 59 | 4 | 4,5,6,7(occ), 8 | 8 |
Example 2 — Expected comparisons (GATE CS 2016)
Q: A hash table uses chaining. n = 100 keys, m = 10 slots. Expected successful search comparisons?
α = 100/10 = 10. Expected = 1 + α/2 = 1 + 5 = 6.
Example 3 — Double hashing sequence (GATE CS 2020)
Q: Table size m=7. h1(k)=k mod 7, h2(k)=1+(k mod 6). Insert key k=27 when slots 6, 0, 1 are occupied.
h1(27) = 27 mod 7 = 6 (occupied). h2(27) = 1 + (27 mod 6) = 1+3 = 4.
Probe 1: (6 + 1×4) mod 7 = 10 mod 7 = 3 (empty).
9. Common Mistakes
Mistake 1 — Using simple deletion in open addressing
Deleting by marking a slot empty breaks probe chains. Subsequent searches for keys that probed through this slot will fail prematurely. Always use a tombstone (DELETED marker).
Mistake 2 — Load factor α > 1 in open addressing
Open addressing requires α < 1 (n < m). The table fills up and insertion becomes impossible. Chaining has no such restriction.
Mistake 3 — Confusing successful and unsuccessful search formulas
Successful (chaining): 1 + α/2. Unsuccessful: 1 + α. The unsuccessful case is higher because you traverse the entire chain before concluding absence.
Mistake 4 — Quadratic probing visits all slots
Quadratic probing only guarantees visiting all m slots if m is prime AND the table is less than half full (α < 0.5). Otherwise, some slots may never be probed.
Mistake 5 — h2(k) = 0 in double hashing
If h2(k) = 0 for some key, the probe sequence reduces to h1(k) repeated infinitely — infinite loop. Always ensure h2(k) ≥ 1. Standard: h2(k) = 1 + (k mod (m−1)).
10. FAQ
What is the expected number of comparisons for a successful search in chaining?
1 + α/2 where α = n/m is the load factor. Unsuccessful: 1 + α. Both assume simple uniform hashing.
What is the difference between linear probing and quadratic probing?
Linear: probe h(k)+i — primary clustering. Quadratic: probe h(k)+i² — secondary clustering, may miss slots. Both are open addressing variants.
What is double hashing and why is it preferred?
Probe sequence = (h1(k) + i×h2(k)) mod m. Unique increment per key eliminates both clustering types. Closest to uniform probing in practice.
What is a perfect hash function?
Maps n keys to n slots with no collisions. Built when key set is known in advance. Minimal perfect hash uses exactly n slots. Used in compiler keyword tables.