Hashing – Data Structures | GATE CS


Hashing

Hash functions, collision resolution, load factor analysis — O(1) average lookup. A consistent 1–2 mark topic in GATE CS with collision-tracing and expected-cost questions.

Last updated: April 2026  |  GATE CS 2024–2026 syllabus

Key Takeaways

  • Hash function h(k) maps key k to a slot in a hash table of size m. Ideal: uniform distribution.
  • Load factor α = n/m (n = keys, m = slots). Performance degrades as α increases.
  • Chaining: each slot holds a linked list of colliding keys. Search O(1+α) average.
  • Open addressing: all keys in table; on collision probe next slot. Table must not be full.
  • Linear probing: simple but causes primary clustering. Quadratic: secondary clustering. Double hashing: best probe distribution.
  • Expected comparisons (chaining, successful): 1 + α/2. Unsuccessful: 1 + α.
  • Open addressing: expected probes (successful) ≈ (1/α)ln(1/(1−α)). Unsuccessful ≈ 1/(1−α).

1. Hash Functions

Division method: h(k) = k mod m. Choose m = prime (not close to 2p or 10p).
Multiplication method: h(k) = ⌊m(kA mod 1)⌋ where A ≈ 0.618 (golden ratio). Works for any m.
Universal hashing: ha,b(k) = ((ak + b) mod p) mod m where p = prime > m. Minimises worst-case via randomisation.

Properties of a good hash function: (1) Deterministic. (2) Uniform distribution across slots. (3) Fast to compute. (4) Avalanche effect — small key change → large hash change.

2. Collision Resolution: Chaining

Each slot contains a linked list. On collision, append to the list. On search, scan the list at h(k).

Insert: O(1) (prepend to list)
Search (successful): O(1 + α/2) expected
Search (unsuccessful): O(1 + α) expected
Space: O(n + m) for n keys in m slots

Chaining works well for α > 1 (more keys than slots). The table never “fills up.”

3. Open Addressing (Overview)

All keys stored in the table itself. On collision, probe alternative slots using a probe sequence. Table load factor must stay < 1.

General probe sequence: h(k,0), h(k,1), h(k,2), …, h(k,m−1)
Must be a permutation of 0 to m−1 so all slots are checked.

Delete: Cannot simply remove an element (breaks probe chains). Use a “deleted” marker (tombstone) — treated as empty for insertion but not for search.

4. Linear Probing

Probe sequence: h(k, i) = (h(k) + i) mod m  for i = 0, 1, 2, …

Advantage: Simple, good cache performance (contiguous memory access).
Disadvantage: Primary clustering — long contiguous runs of occupied slots form, slowing future insertions and lookups.

If slots h(k), h(k)+1, …, h(k)+j are occupied, any key hashing to any of these slots extends the cluster. Clusters grow faster than linearly.

5. Quadratic Probing

Probe sequence: h(k, i) = (h(k) + c1i + c2i²) mod m
Common: h(k, i) = (h(k) + i + i²) mod m / 2 (for m = power of 2 or prime).

Advantage: Eliminates primary clustering.
Disadvantage: Secondary clustering — keys with same h(k) follow the same probe sequence. May not visit all slots (guaranteed only if m is prime and α ≤ 0.5, or specific c1, c2).

6. Double Hashing

Probe sequence: h(k, i) = (h1(k) + i × h2(k)) mod m

Requirements: h2(k) must be coprime with m. Common: m = prime, h2(k) = 1 + (k mod (m−1)).

Advantage: Each key has unique probe increment → eliminates both primary and secondary clustering.
Closest to random probing in practice.

7. Load Factor and Expected Cost Analysis

α = n/m (n = keys inserted, m = table size). Assumes simple uniform hashing.

MethodSuccessful SearchUnsuccessful Search
Chaining1 + α/21 + α
Linear probing½(1 + 1/(1−α))½(1 + 1/(1−α)²)
Double hashing(1/α)ln(1/(1−α))1/(1−α)
GATE rule of thumb (chaining):
α = 0.5 → successful: 1.25, unsuccessful: 1.5
α = 1.0 → successful: 1.5, unsuccessful: 2.0
α = 2.0 → successful: 2.0, unsuccessful: 3.0

8. GATE CS Solved Examples

Example 1 — Linear probing insertion (GATE CS 2019)

Q: Insert keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of size 11 using h(k) = k mod 11 and linear probing.

Keyh(k)Probe slotsFinal slot
10101010
22000
31999
4444
1544(occ), 55
28666
1766(occ), 77
8800(occ), 11
5944,5,6,7(occ), 88
Table: [22, 88, _, _, 4, 15, 28, 17, 59, 31, 10]

Example 2 — Expected comparisons (GATE CS 2016)

Q: A hash table uses chaining. n = 100 keys, m = 10 slots. Expected successful search comparisons?

α = 100/10 = 10. Expected = 1 + α/2 = 1 + 5 = 6.

Answer: 6 comparisons on average.

Example 3 — Double hashing sequence (GATE CS 2020)

Q: Table size m=7. h1(k)=k mod 7, h2(k)=1+(k mod 6). Insert key k=27 when slots 6, 0, 1 are occupied.

h1(27) = 27 mod 7 = 6 (occupied). h2(27) = 1 + (27 mod 6) = 1+3 = 4.

Probe 1: (6 + 1×4) mod 7 = 10 mod 7 = 3 (empty).

Key 27 inserted at slot 3.

9. Common Mistakes

Mistake 1 — Using simple deletion in open addressing

Deleting by marking a slot empty breaks probe chains. Subsequent searches for keys that probed through this slot will fail prematurely. Always use a tombstone (DELETED marker).

Mistake 2 — Load factor α > 1 in open addressing

Open addressing requires α < 1 (n < m). The table fills up and insertion becomes impossible. Chaining has no such restriction.

Mistake 3 — Confusing successful and unsuccessful search formulas

Successful (chaining): 1 + α/2. Unsuccessful: 1 + α. The unsuccessful case is higher because you traverse the entire chain before concluding absence.

Mistake 4 — Quadratic probing visits all slots

Quadratic probing only guarantees visiting all m slots if m is prime AND the table is less than half full (α < 0.5). Otherwise, some slots may never be probed.

Mistake 5 — h2(k) = 0 in double hashing

If h2(k) = 0 for some key, the probe sequence reduces to h1(k) repeated infinitely — infinite loop. Always ensure h2(k) ≥ 1. Standard: h2(k) = 1 + (k mod (m−1)).

10. FAQ

What is the expected number of comparisons for a successful search in chaining?

1 + α/2 where α = n/m is the load factor. Unsuccessful: 1 + α. Both assume simple uniform hashing.

What is the difference between linear probing and quadratic probing?

Linear: probe h(k)+i — primary clustering. Quadratic: probe h(k)+i² — secondary clustering, may miss slots. Both are open addressing variants.

What is double hashing and why is it preferred?

Probe sequence = (h1(k) + i×h2(k)) mod m. Unique increment per key eliminates both clustering types. Closest to uniform probing in practice.

What is a perfect hash function?

Maps n keys to n slots with no collisions. Built when key set is known in advance. Minimal perfect hash uses exactly n slots. Used in compiler keyword tables.