Arrays and Strings – Data Structures | GATE CS


Arrays & Strings

Memory addressing, 2D arrays, string operations, KMP and Rabin-Karp pattern matching — the foundation of every Data Structures question in GATE CS.

Last updated: April 2026  |  GATE CS 2024–2026 syllabus

Key Takeaways

  • Array elements are stored in contiguous memory; random access is O(1), insertion/deletion at arbitrary position is O(n).
  • Row-major address: B + w × (i × c + j). Column-major: B + w × (j × r + i). GATE tests both.
  • Strings are character arrays; most string operations (search, compare) are O(n) naive, O(n+m) with KMP.
  • KMP failure function avoids backtracking: preprocessing O(m), search O(n). Total O(n+m).
  • Rabin-Karp uses rolling hash for O(n+m) average; handles multiple patterns efficiently.
  • Sparse matrix: store only non-zero elements using triplet or CSR representation.
  • Dynamic arrays (like ArrayList) amortise insertion to O(1) average by doubling capacity.

1. Arrays — Basics and Memory

An array stores elements of the same type in contiguous memory. The base address B is the address of A[0] (or A[0][0] for 2D). Each element occupies w bytes (word size).

1D Array address: Address of A[i] = B + i × w
Access time: O(1) — computed directly from index.
Insertion at index i: O(n) — shift elements i to n-1 one position right to make room.
Deletion at index i: O(n) — shift elements i+1 to n-1 one position left.
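The 1D address formula can be checked with a short sketch (the helper name addr_1d is illustrative):

```python
# Address of A[i] in a 1D array: base address B, element width w bytes.
def addr_1d(B, w, i):
    return B + i * w

# Example: 4-byte integers starting at base address 1000.
print(addr_1d(1000, 4, 5))  # 1000 + 5*4 = 1020
```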

2. 2D Arrays — Address Calculation

A 2D array A[R][C] has R rows and C columns. Two storage orders:

Row-major order (C, Java, most languages):
Address of A[i][j] = B + w × (i × C + j)

Column-major order (Fortran, MATLAB):
Address of A[i][j] = B + w × (j × R + i)

For lower-bound Lr, Lc (array indexed from Lr to Ur, Lc to Uc):
Row-major: B + w × [(i – Lr) × C + (j – Lc)]
where C = Uc – Lc + 1
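Both orders, with lower bounds included, can be sketched as follows (function names are illustrative; zero lower bounds recover the simple formulas):

```python
# B = base address, w = element width, Lr/Lc = lower bounds,
# C = number of columns, R = number of rows.
def addr_row_major(B, w, i, j, Lr, Lc, C):
    return B + w * ((i - Lr) * C + (j - Lc))

def addr_col_major(B, w, i, j, Lr, Lc, R):
    return B + w * ((j - Lc) * R + (i - Lr))

# A[0..2][0..3] of 4-byte ints, base 100: address of A[1][2].
print(addr_row_major(100, 4, 1, 2, 0, 0, 4))  # 100 + 4*(1*4 + 2) = 124
print(addr_col_major(100, 4, 1, 2, 0, 0, 3))  # 100 + 4*(2*3 + 1) = 128
```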

3D Array Address (Row-major)

Array A[P][R][C], address of A[i][j][k]:
= B + w × (i × R × C + j × C + k)
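The 3D formula extends the same idea — each step inward multiplies by the sizes of the remaining dimensions (a sketch with zero-based indices):

```python
# Row-major address of A[i][j][k] in A[P][R][C] (zero-based, illustrative).
def addr_3d(B, w, i, j, k, R, C):
    return B + w * (i * R * C + j * C + k)

# A[3][4][5], w = 1, base 0: address of A[1][2][3].
print(addr_3d(0, 1, 1, 2, 3, 4, 5))  # 1*20 + 2*5 + 3 = 33
```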

3. Array Operations and Complexity

Operation            | Time           | Note
Access A[i]          | O(1)           | Direct computation
Search (unsorted)    | O(n)           | Linear scan
Search (sorted)      | O(log n)       | Binary search
Insert at end        | O(1) amortised | Dynamic array doubling
Insert at position i | O(n)           | Shift right
Delete at position i | O(n)           | Shift left
Delete last element  | O(1)           | Decrement size

Dynamic array amortised analysis: Doubling strategy — when full, allocate 2× capacity and copy. Over n insertions the copy costs form the geometric series 1 + 2 + 4 + … + n/2 < n, so total cost ≤ n + (n − 1) < 2n. Amortised cost per insertion = O(1).

4. Strings

A string is an array of characters, typically null-terminated (\0 in C). Key operations:

Operation            | Time
Length (strlen)      | O(n)
Compare (strcmp)     | O(min(m, n))
Copy (strcpy)        | O(n)
Naive pattern search | O(nm) worst case
KMP search           | O(n+m)
Rabin-Karp search    | O(n+m) average
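For reference, the naive O(nm) search simply tries every alignment of the pattern (a minimal sketch):

```python
# Naive pattern search: check the pattern at every text offset.
def naive_search(text, pat):
    n, m = len(text), len(pat)
    return [i for i in range(n - m + 1) if text[i:i + m] == pat]

print(naive_search("ABABABC", "ABC"))  # [4]
```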

5. KMP Pattern Matching

KMP avoids re-examining text characters after a mismatch by precomputing how far to shift the pattern.

Failure Function (Partial Match Table)

For pattern P of length m, the failure function f[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i].

Example: Pattern = ABABCABAB
i: 0 1 2 3 4 5 6 7 8
P: A B A B C A B A B
f[i]: 0 0 1 2 0 1 2 3 4

Algorithm

  1. Compute failure function f[] for pattern P in O(m).
  2. Use two pointers i (text) and j (pattern). On match, advance both. On mismatch with j>0, set j = f[j-1] (don’t advance i). On mismatch with j=0, advance i.
  3. When j == m, pattern found at text position i-m.
Time: O(n + m)    Space: O(m) for failure function
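Steps 1-3 can be sketched in full (names are illustrative; the failure function below reproduces the ABABCABAB table above):

```python
# Build the failure (partial match) table for pattern P.
def failure(P):
    f, j = [0] * len(P), 0
    for i in range(1, len(P)):
        while j > 0 and P[i] != P[j]:
            j = f[j - 1]          # fall back; i never moves backwards
        if P[i] == P[j]:
            j += 1
        f[i] = j
    return f

# KMP search: return all starting positions of P in text.
def kmp_search(text, P):
    f, j, hits = failure(P), 0, []
    for i, ch in enumerate(text):
        while j > 0 and ch != P[j]:
            j = f[j - 1]          # mismatch with j > 0: shift pattern
        if ch == P[j]:
            j += 1                # match: advance pattern pointer
        if j == len(P):           # full match ending at index i
            hits.append(i - len(P) + 1)
            j = f[j - 1]          # continue searching for overlaps
    return hits

print(failure("ABABCABAB"))              # [0, 0, 1, 2, 0, 1, 2, 3, 4]
print(kmp_search("ABABCABABA", "ABAB"))  # [0, 5]
```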

6. Rabin-Karp Algorithm

Uses a rolling hash to slide a window of size m over the text. If hashes match, verify character by character.

Rolling hash: h(s[i+1..i+m]) = ((h(s[i..i+m-1]) − s[i] × d^(m−1)) × d + s[i+m]) mod q
where d = radix (alphabet size, e.g. 26 or 256) and q is a large prime.

Time: O(n+m) average, O(nm) worst (all spurious hits)
Advantage: Easily extended to multiple patterns (one hash comparison per window).
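A sketch of the algorithm using the rolling hash above (the choices d = 256 and q = 101 are illustrative; any radix and large prime work):

```python
# Rabin-Karp: slide an m-length window, compare hashes, verify on a hit.
def rabin_karp(text, pat, d=256, q=101):
    n, m = len(text), len(pat)
    if m > n:
        return []
    h = pow(d, m - 1, q)                       # d^(m-1) mod q
    hp = ht = 0
    for i in range(m):                         # hash pattern & first window
        hp = (hp * d + ord(pat[i])) % q
        ht = (ht * d + ord(text[i])) % q
    hits = []
    for i in range(n - m + 1):
        if hp == ht and text[i:i + m] == pat:  # verify to reject spurious hits
            hits.append(i)
        if i < n - m:                          # roll: drop s[i], add s[i+m]
            ht = ((ht - ord(text[i]) * h) * d + ord(text[i + m])) % q
    return hits

print(rabin_karp("ABABCABAB", "ABC"))  # [2]
```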

7. Sparse Matrix Representation

Representation              | Space  | Access         | Best For
Full 2D array               | O(m×n) | O(1)           | Dense matrices
Triplet (row, col, val)     | O(k)   | O(k) search    | Simple, easy to implement
CSR (Compressed Sparse Row) | O(k+m) | O(nnz in row)  | Row-wise operations
Linked list of rows         | O(k)   | O(k/m) per row | Dynamic matrices

k = number of non-zero elements. Sparse when k << m×n.
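The triplet and CSR forms can be sketched as conversions from a full matrix (helper names are illustrative):

```python
# Triplet (COO): one (row, col, value) tuple per non-zero element.
def to_triplets(M):
    return [(i, j, v) for i, row in enumerate(M)
                      for j, v in enumerate(row) if v != 0]

# CSR: values and column indices in row order, plus row-pointer offsets.
def to_csr(M):
    vals, cols, rowptr = [], [], [0]
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(v)
                cols.append(j)
        rowptr.append(len(vals))   # row i occupies vals[rowptr[i]:rowptr[i+1]]
    return vals, cols, rowptr

M = [[0, 0, 3],
     [0, 5, 0],
     [0, 0, 0]]
print(to_triplets(M))  # [(0, 2, 3), (1, 1, 5)]
print(to_csr(M))       # ([3, 5], [2, 1], [0, 1, 2, 2])
```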

8. GATE CS Solved Examples

Example 1 — 2D Array Address (GATE CS 2018)

Q: An array A[-3..4][-2..5] of integers (4 bytes each) is stored in row-major order. Base address = 1000. Find the address of A[2][3].

Solution:

  • Lr = −3, Lc = −2, C = 5−(−2)+1 = 8, w = 4
  • Address = 1000 + 4 × [(2−(−3)) × 8 + (3−(−2))]
  • = 1000 + 4 × [5 × 8 + 5] = 1000 + 4 × 45 = 1000 + 180
Answer: 1180
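The arithmetic can be double-checked directly with the row-major formula:

```python
# Example 1: A[-3..4][-2..5], 4-byte elements, base 1000, row-major.
B, w, Lr, Lc, Uc = 1000, 4, -3, -2, 5
C = Uc - Lc + 1                         # 8 columns
addr = B + w * ((2 - Lr) * C + (3 - Lc))
print(addr)  # 1180
```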

Example 2 — KMP Failure Function (GATE CS 2021)

Q: For pattern P = ABCABDABCABC, what is the failure function value f[11]?

Solution: Build the table:

i:    0 1 2 3 4 5 6 7 8 9 10 11
P[i]: A B C A B D A B C A B  C
f[i]: 0 0 0 1 2 0 1 2 3 4 5  3
Answer: f[11] = 3 — the longest proper prefix of P[0..11] that is also a suffix is “ABC”.
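The table can be verified with the standard prefix-function computation:

```python
# Compute the failure function for P = ABCABDABCABC and read off f[11].
P = "ABCABDABCABC"
f, j = [0] * len(P), 0
for i in range(1, len(P)):
    while j > 0 and P[i] != P[j]:
        j = f[j - 1]              # fall back on mismatch
    if P[i] == P[j]:
        j += 1
    f[i] = j
print(f[11])  # 3
```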

Example 3 — Dynamic Array Amortised Cost (GATE CS 2020)

Q: A dynamic array doubles when full. Starting with capacity 1, what is the total copy cost for n = 8 insertions?

Solution: Doublings at insertions 2, 3, 5: copy costs 1, 2, 4 = 7. Plus 8 insertions themselves = 15. Or: sum of geometric series = 2n − 1 = 15.

Total cost = 15 = O(n). Amortised per insertion = O(1).

9. Common Mistakes

Mistake 1 — Swapping row-major and column-major formulae

Error: Using j × R + i for row-major.
Fix: Row-major: multiply row index by number of columns (C). Column-major: multiply column index by number of rows (R). Mnemonic: in row-major, rows are contiguous — full row width C separates rows.

Mistake 2 — Forgetting lower bound in address formula

Error: Using A[i][j] address = B + w(i×C + j) when array starts from a non-zero lower bound.
Fix: Always subtract lower bounds: (i − Lr) and (j − Lc). And recompute C = Uc − Lc + 1.

Mistake 3 — Confusing KMP failure function with pattern position

Error: Setting f[i] = i instead of computing the longest proper prefix-suffix.
Fix: f[0] is always 0. For each i, compare P[i] with P[f[i-1]]; on a match, f[i] = f[i-1] + 1; on a mismatch, keep falling back (replace the candidate length j with f[j-1]) until a match is found or j reaches 0.

Mistake 4 — Naive pattern matching time as O(nm) always

Error: Stating naive search is always O(nm).
Fix: O(nm) is the worst case (e.g., text = “AAAA…A”, pattern = “AA…AB”). Average case for random text is O(n).

Mistake 5 — Dynamic array insertion as O(n) always

Error: Saying appending to a dynamic array is O(n) because of occasional copying.
Fix: Amortised analysis shows the average cost per append is O(1). Only the rare doubling event costs O(n), and its cost is spread across n/2 previous free inserts.

10. FAQ

What is the address formula for a 2D array in row-major order?

For A[i][j] with R rows, C columns, base B, element size w: B + w×(i×C + j). With lower bounds Lr, Lc: B + w×((i−Lr)×C + (j−Lc)).

What is the time complexity of KMP pattern matching?

O(n+m) — O(m) to build the failure function, O(n) for the search. The text pointer never backtracks, guaranteeing linear time.

How does Rabin-Karp differ from KMP?

KMP uses structural pattern information (failure function). Rabin-Karp uses a rolling hash. Average O(n+m); worst O(nm) if many hash collisions. Rabin-Karp excels at multi-pattern search.

What is a sparse matrix and how is it stored efficiently?

A matrix where most entries are zero. Efficient storage: triplet (row, col, val) for k non-zeros uses O(k) space vs O(m×n) for full storage. CSR adds fast row-wise access.