Arrays & Strings
Memory addressing, 2D arrays, string operations, KMP and Rabin-Karp pattern matching — the foundation of every Data Structures question in GATE CS.
Last updated: April 2026 | GATE CS 2024–2026 syllabus
Key Takeaways
- Array elements are stored in contiguous memory; random access is O(1), insertion/deletion at arbitrary position is O(n).
- Row-major address: B + w × (i × c + j). Column-major: B + w × (j × r + i). GATE tests both.
- Strings are character arrays; most string operations (search, compare) are O(n) naive, O(n+m) with KMP.
- KMP failure function avoids backtracking: preprocessing O(m), search O(n). Total O(n+m).
- Rabin-Karp uses rolling hash for O(n+m) average; handles multiple patterns efficiently.
- Sparse matrix: store only non-zero elements using triplet or CSR representation.
- Dynamic arrays (like ArrayList) amortise insertion to O(1) average by doubling capacity.
1. Arrays — Basics and Memory
An array stores elements of the same type in contiguous memory. The base address B is the address of A[0] (or A[0][0] for 2D). Each element occupies w bytes (word size).
Access time: O(1) — computed directly from index.
Insertion at index i: O(n) — shift elements i+1 to n-1 one position right.
Deletion at index i: O(n) — shift elements i+1 to n-1 one position left.
2. 2D Arrays — Address Calculation
A 2D array A[R][C] has R rows and C columns. Two storage orders:
Address of A[i][j] = B + w × (i × C + j)
Column-major order (Fortran, MATLAB):
Address of A[i][j] = B + w × (j × R + i)
For lower-bound Lr, Lc (array indexed from Lr to Ur, Lc to Uc):
Row-major: B + w × [(i – Lr) × C + (j – Lc)]
where C = Uc – Lc + 1
3D Array Address (Row-major)
= B + w × (i × R × C + j × C + k)
3. Array Operations and Complexity
| Operation | Time | Note |
|---|---|---|
| Access A[i] | O(1) | Direct computation |
| Search (unsorted) | O(n) | Linear scan |
| Search (sorted) | O(log n) | Binary search |
| Insert at end | O(1) amortised | Dynamic array doubling |
| Insert at position i | O(n) | Shift right |
| Delete at position i | O(n) | Shift left |
| Delete last element | O(1) | Decrement size |
Dynamic array amortised analysis: Doubling strategy — when full, allocate 2× capacity and copy. Cost of n insertions = n + (1+2+4+…+n) ≤ 2n. Amortised cost per insertion = O(1).
4. Strings
A string is an array of characters, typically null-terminated (\0 in C). Key operations:
| Operation | Time |
|---|---|
| Length (strlen) | O(n) |
| Compare (strcmp) | O(min(m,n)) |
| Copy (strcpy) | O(n) |
| Naive pattern search | O(nm) worst case |
| KMP search | O(n+m) |
| Rabin-Karp search | O(n+m) average |
5. KMP Pattern Matching
KMP avoids re-examining text characters after a mismatch by precomputing how far to shift the pattern.
Failure Function (Partial Match Table)
For pattern P of length m, the failure function f[i] = length of the longest proper prefix of P[0..i] that is also a suffix.
i: 0 1 2 3 4 5 6 7 8
P: A B A B C A B A B
f[i]: 0 0 1 2 0 1 2 3 4
Algorithm
- Compute failure function f[] for pattern P in O(m).
- Use two pointers i (text) and j (pattern). On match, advance both. On mismatch with j>0, set j = f[j-1] (don’t advance i). On mismatch with j=0, advance i.
- When j == m, pattern found at text position i-m.
6. Rabin-Karp Algorithm
Uses a rolling hash to slide a window of size m over the text. If hashes match, verify character by character.
where d = alphabet size (26 or 256), q = prime.
Time: O(n+m) average, O(nm) worst (all spurious hits)
Advantage: Easily extended to multiple patterns (one hash comparison per window).
7. Sparse Matrix Representation
| Representation | Space | Access | Best For |
|---|---|---|---|
| Full 2D array | O(m×n) | O(1) | Dense matrices |
| Triplet (row, col, val) | O(k) | O(k) search | Simple, easy to implement |
| CSR (Compressed Sparse Row) | O(k+m) | O(nnz in row) | Row-wise operations |
| Linked list of rows | O(k) | O(k/m) per row | Dynamic matrices |
k = number of non-zero elements. Sparse when k << m×n.
8. GATE CS Solved Examples
Example 1 — 2D Array Address (GATE CS 2018)
Q: An array A[-3..4][-2..5] of integers (4 bytes each) is stored in row-major order. Base address = 1000. Find the address of A[2][3].
Solution:
- Lr = −3, Lc = −2, C = 5−(−2)+1 = 8, w = 4
- Address = 1000 + 4 × [(2−(−3)) × 8 + (3−(−2))]
- = 1000 + 4 × [5 × 8 + 5] = 1000 + 4 × 45 = 1000 + 180
Example 2 — KMP Failure Function (GATE CS 2021)
Q: For pattern P = ABCABDABCABC, what is the failure function value f[11]?
Solution: Build the table:
| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P[i] | A | B | C | A | B | D | A | B | C | A | B | C |
| f[i] | 0 | 0 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 | 5 | 3 |
Example 3 — Dynamic Array Amortised Cost (GATE CS 2020)
Q: A dynamic array doubles when full. Starting with capacity 1, what is the total copy cost for n = 8 insertions?
Solution: Doublings at insertions 2, 3, 5: copy costs 1, 2, 4 = 7. Plus 8 insertions themselves = 15. Or: sum of geometric series = 2n − 1 = 15.
9. Common Mistakes
Mistake 1 — Swapping row-major and column-major formulae
Error: Using j × R + i for row-major.
Fix: Row-major: multiply row index by number of columns (C). Column-major: multiply column index by number of rows (R). Mnemonic: in row-major, rows are contiguous — full row width C separates rows.
Mistake 2 — Forgetting lower bound in address formula
Error: Using A[i][j] address = B + w(i×C + j) when array starts from a non-zero lower bound.
Fix: Always subtract lower bounds: (i − Lr) and (j − Lc). And recompute C = Uc − Lc + 1.
Mistake 3 — Confusing KMP failure function with pattern position
Error: Setting f[i] = i instead of computing the longest proper prefix-suffix.
Fix: f[0] is always 0. For each i, compare P[i] with P[f[i-1]]; if match, f[i] = f[i-1]+1; else fall back using f[f[i-1]-1].
Mistake 4 — Naive pattern matching time as O(nm) always
Error: Stating naive search is always O(nm).
Fix: O(nm) is the worst case (e.g., text = “AAAA…A”, pattern = “AA…AB”). Average case for random text is O(n).
Mistake 5 — Dynamic array insertion as O(n) always
Error: Saying appending to a dynamic array is O(n) because of occasional copying.
Fix: Amortised analysis shows the average cost per append is O(1). Only the rare doubling event costs O(n), and its cost is spread across n/2 previous free inserts.
10. FAQ
What is the address formula for a 2D array in row-major order?
For A[i][j] with R rows, C columns, base B, element size w: B + w×(i×C + j). With lower bounds Lr, Lc: B + w×((i−Lr)×C + (j−Lc)).
What is the time complexity of KMP pattern matching?
O(n+m) — O(m) to build the failure function, O(n) for the search. The text pointer never backtracks, guaranteeing linear time.
How does Rabin-Karp differ from KMP?
KMP uses structural pattern information (failure function). Rabin-Karp uses a rolling hash. Average O(n+m); worst O(nm) if many hash collisions. Rabin-Karp excels at multi-pattern search.
What is a sparse matrix and how is it stored efficiently?
A matrix where most entries are zero. Efficient storage: triplet (row, col, val) for k non-zeros uses O(k) space vs O(m×n) for full storage. CSR adds fast row-wise access.