Arrays and Strings – Data Structures | GATE CS


Arrays & Strings

Memory addressing, 2D arrays, string operations, KMP and Rabin-Karp pattern matching — the foundation of every Data Structures question in GATE CS.

Last updated: April 2026  |  GATE CS 2024–2026 syllabus

Key Takeaways

  • Array elements are stored in contiguous memory; random access is O(1), insertion/deletion at arbitrary position is O(n).
  • Row-major address: B + w × (i × c + j). Column-major: B + w × (j × r + i). GATE tests both.
  • Strings are character arrays; most string operations (search, compare) are O(n) naive, O(n+m) with KMP.
  • KMP failure function avoids backtracking: preprocessing O(m), search O(n). Total O(n+m).
  • Rabin-Karp uses rolling hash for O(n+m) average; handles multiple patterns efficiently.
  • Sparse matrix: store only non-zero elements using triplet or CSR representation.
  • Dynamic arrays (like ArrayList) amortise insertion to O(1) average by doubling capacity.

1. Arrays — Basics and Memory

An array stores elements of the same type in contiguous memory. The base address B is the address of A[0] (or A[0][0] for 2D). Each element occupies w bytes (word size).

1D Array address: Address of A[i] = B + i × w
Access time: O(1) — computed directly from index.
Insertion at index i: O(n) — shift elements i to n-1 one position right to make room.
Deletion at index i: O(n) — shift elements i+1 to n-1 one position left.
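The 1D address formula can be checked with a short sketch (the helper name addr_1d is illustrative):

```python
# Address of A[i] in a 1D array: base address B, element width w bytes.
def addr_1d(B, w, i):
    return B + i * w

# Example: 4-byte integers starting at base address 1000.
print(addr_1d(1000, 4, 5))  # 1000 + 5*4 = 1020
```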

2. 2D Arrays — Address Calculation

A 2D array A[R][C] has R rows and C columns. Two storage orders:

Row-major order (C, Java, most languages):
Address of A[i][j] = B + w × (i × C + j)

Column-major order (Fortran, MATLAB):
Address of A[i][j] = B + w × (j × R + i)

For lower-bound Lr, Lc (array indexed from Lr to Ur, Lc to Uc):
Row-major: B + w × [(i – Lr) × C + (j – Lc)]
where C = Uc – Lc + 1
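Both orders, with lower bounds included, can be sketched as follows (function names are illustrative; zero lower bounds recover the simple formulas):

```python
# B = base address, w = element width, Lr/Lc = lower bounds,
# C = number of columns, R = number of rows.
def addr_row_major(B, w, i, j, Lr, Lc, C):
    return B + w * ((i - Lr) * C + (j - Lc))

def addr_col_major(B, w, i, j, Lr, Lc, R):
    return B + w * ((j - Lc) * R + (i - Lr))

# A[0..2][0..3] of 4-byte ints, base 100: address of A[1][2].
print(addr_row_major(100, 4, 1, 2, 0, 0, 4))  # 100 + 4*(1*4 + 2) = 124
print(addr_col_major(100, 4, 1, 2, 0, 0, 3))  # 100 + 4*(2*3 + 1) = 128
```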

3D Array Address (Row-major)

Array A[P][R][C], address of A[i][j][k]:
= B + w × (i × R × C + j × C + k)
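The 3D formula extends the same idea — each step inward multiplies by the sizes of the remaining dimensions (a sketch with zero-based indices):

```python
# Row-major address of A[i][j][k] in A[P][R][C] (zero-based, illustrative).
def addr_3d(B, w, i, j, k, R, C):
    return B + w * (i * R * C + j * C + k)

# A[3][4][5], w = 1, base 0: address of A[1][2][3].
print(addr_3d(0, 1, 1, 2, 3, 4, 5))  # 1*20 + 2*5 + 3 = 33
```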

3. Array Operations and Complexity

Operation            | Time           | Note
Access A[i]          | O(1)           | Direct computation
Search (unsorted)    | O(n)           | Linear scan
Search (sorted)      | O(log n)       | Binary search
Insert at end        | O(1) amortised | Dynamic array doubling
Insert at position i | O(n)           | Shift right
Delete at position i | O(n)           | Shift left
Delete last element  | O(1)           | Decrement size

Dynamic array amortised analysis: Doubling strategy — when full, allocate 2× capacity and copy. Over n insertions the copy costs form the geometric series 1 + 2 + 4 + … + n/2 < n, so total cost ≤ n + (n − 1) < 2n. Amortised cost per insertion = O(1).

4. Strings

A string is an array of characters, typically null-terminated (\0 in C). Key operations:

Operation            | Time
Length (strlen)      | O(n)
Compare (strcmp)     | O(min(m, n))
Copy (strcpy)        | O(n)
Naive pattern search | O(nm) worst case
KMP search           | O(n+m)
Rabin-Karp search    | O(n+m) average
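For reference, the naive O(nm) search simply tries every alignment of the pattern (a minimal sketch):

```python
# Naive pattern search: check the pattern at every text offset.
def naive_search(text, pat):
    n, m = len(text), len(pat)
    return [i for i in range(n - m + 1) if text[i:i + m] == pat]

print(naive_search("ABABABC", "ABC"))  # [4]
```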

5. KMP Pattern Matching

KMP avoids re-examining text characters after a mismatch by precomputing how far to shift the pattern.

Failure Function (Partial Match Table)

For pattern P of length m, the failure function f[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i].

Example: Pattern = ABABCABAB
i: 0 1 2 3 4 5 6 7 8
P: A B A B C A B A B
f[i]: 0 0 1 2 0 1 2 3 4

Algorithm

  1. Compute failure function f[] for pattern P in O(m).
  2. Use two pointers i (text) and j (pattern). On match, advance both. On mismatch with j>0, set j = f[j-1] (don’t advance i). On mismatch with j=0, advance i.
  3. When j == m, pattern found at text position i-m.
Time: O(n + m)    Space: O(m) for failure function
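Steps 1-3 can be sketched in full (names are illustrative; the failure function below reproduces the ABABCABAB table above):

```python
# Build the failure (partial match) table for pattern P.
def failure(P):
    f, j = [0] * len(P), 0
    for i in range(1, len(P)):
        while j > 0 and P[i] != P[j]:
            j = f[j - 1]          # fall back; i never moves backwards
        if P[i] == P[j]:
            j += 1
        f[i] = j
    return f

# KMP search: return all starting positions of P in text.
def kmp_search(text, P):
    f, j, hits = failure(P), 0, []
    for i, ch in enumerate(text):
        while j > 0 and ch != P[j]:
            j = f[j - 1]          # mismatch with j > 0: shift pattern
        if ch == P[j]:
            j += 1                # match: advance pattern pointer
        if j == len(P):           # full match ending at index i
            hits.append(i - len(P) + 1)
            j = f[j - 1]          # continue searching for overlaps
    return hits

print(failure("ABABCABAB"))              # [0, 0, 1, 2, 0, 1, 2, 3, 4]
print(kmp_search("ABABCABABA", "ABAB"))  # [0, 5]
```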

6. Rabin-Karp Algorithm

Uses a rolling hash to slide a window of size m over the text. If hashes match, verify character by character.

Rolling hash: h(s[i+1..i+m]) = ((h(s[i..i+m-1]) − s[i] × d^(m−1)) × d + s[i+m]) mod q
where d = radix (alphabet size, e.g. 26 or 256) and q is a large prime.

Time: O(n+m) average, O(nm) worst (all spurious hits)
Advantage: Easily extended to multiple patterns (one hash comparison per window).
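A sketch of the algorithm using the rolling hash above (the choices d = 256 and q = 101 are illustrative; any radix and large prime work):

```python
# Rabin-Karp: slide an m-length window, compare hashes, verify on a hit.
def rabin_karp(text, pat, d=256, q=101):
    n, m = len(text), len(pat)
    if m > n:
        return []
    h = pow(d, m - 1, q)                       # d^(m-1) mod q
    hp = ht = 0
    for i in range(m):                         # hash pattern & first window
        hp = (hp * d + ord(pat[i])) % q
        ht = (ht * d + ord(text[i])) % q
    hits = []
    for i in range(n - m + 1):
        if hp == ht and text[i:i + m] == pat:  # verify to reject spurious hits
            hits.append(i)
        if i < n - m:                          # roll: drop s[i], add s[i+m]
            ht = ((ht - ord(text[i]) * h) * d + ord(text[i + m])) % q
    return hits

print(rabin_karp("ABABCABAB", "ABC"))  # [2]
```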

7. Sparse Matrix Representation

Representation              | Space  | Access         | Best For
Full 2D array               | O(m×n) | O(1)           | Dense matrices
Triplet (row, col, val)     | O(k)   | O(k) search    | Simple, easy to implement
CSR (Compressed Sparse Row) | O(k+m) | O(nnz in row)  | Row-wise operations
Linked list of rows         | O(k)   | O(k/m) per row | Dynamic matrices

k = number of non-zero elements. Sparse when k << m×n.
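The triplet and CSR forms can be sketched as conversions from a full matrix (helper names are illustrative):

```python
# Triplet (COO): one (row, col, value) tuple per non-zero element.
def to_triplets(M):
    return [(i, j, v) for i, row in enumerate(M)
                      for j, v in enumerate(row) if v != 0]

# CSR: values and column indices in row order, plus row-pointer offsets.
def to_csr(M):
    vals, cols, rowptr = [], [], [0]
    for row in M:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(v)
                cols.append(j)
        rowptr.append(len(vals))   # row i occupies vals[rowptr[i]:rowptr[i+1]]
    return vals, cols, rowptr

M = [[0, 0, 3],
     [0, 5, 0],
     [0, 0, 0]]
print(to_triplets(M))  # [(0, 2, 3), (1, 1, 5)]
print(to_csr(M))       # ([3, 5], [2, 1], [0, 1, 2, 2])
```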

8. GATE CS Solved Examples

Example 1 — 2D Array Address (GATE CS 2018)

Q: An array A[-3..4][-2..5] of integers (4 bytes each) is stored in row-major order. Base address = 1000. Find the address of A[2][3].

Solution:

  • Lr = −3, Lc = −2, C = 5−(−2)+1 = 8, w = 4
  • Address = 1000 + 4 × [(2−(−3)) × 8 + (3−(−2))]
  • = 1000 + 4 × [5 × 8 + 5] = 1000 + 4 × 45 = 1000 + 180
Answer: 1180
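The arithmetic can be double-checked directly with the row-major formula:

```python
# Example 1: A[-3..4][-2..5], 4-byte elements, base 1000, row-major.
B, w, Lr, Lc, Uc = 1000, 4, -3, -2, 5
C = Uc - Lc + 1                         # 8 columns
addr = B + w * ((2 - Lr) * C + (3 - Lc))
print(addr)  # 1180
```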

Example 2 — KMP Failure Function (GATE CS 2021)

Q: For pattern P = ABCABDABCABC, what is the failure function value f[11]?

Solution: Build the table:

i:    0 1 2 3 4 5 6 7 8 9 10 11
P[i]: A B C A B D A B C A B  C
f[i]: 0 0 0 1 2 0 1 2 3 4 5  3
Answer: f[11] = 3 — the longest proper prefix of P[0..11] that is also a suffix is “ABC”.
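The table can be verified with the standard prefix-function computation:

```python
# Compute the failure function for P = ABCABDABCABC and read off f[11].
P = "ABCABDABCABC"
f, j = [0] * len(P), 0
for i in range(1, len(P)):
    while j > 0 and P[i] != P[j]:
        j = f[j - 1]              # fall back on mismatch
    if P[i] == P[j]:
        j += 1
    f[i] = j
print(f[11])  # 3
```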

Example 3 — Dynamic Array Amortised Cost (GATE CS 2020)

Q: A dynamic array doubles when full. Starting with capacity 1, what is the total copy cost for n = 8 insertions?

Solution: Doublings at insertions 2, 3, 5: copy costs 1, 2, 4 = 7. Plus 8 insertions themselves = 15. Or: sum of geometric series = 2n − 1 = 15.

Total cost = 15 = O(n). Amortised per insertion = O(1).

9. Common Mistakes

Mistake 1 — Swapping row-major and column-major formulae

Error: Using j × R + i for row-major.
Fix: Row-major: multiply row index by number of columns (C). Column-major: multiply column index by number of rows (R). Mnemonic: in row-major, rows are contiguous — full row width C separates rows.

Mistake 2 — Forgetting lower bound in address formula

Error: Using A[i][j] address = B + w(i×C + j) when array starts from a non-zero lower bound.
Fix: Always subtract lower bounds: (i − Lr) and (j − Lc). And recompute C = Uc − Lc + 1.

Mistake 3 — Confusing KMP failure function with pattern position

Error: Setting f[i] = i instead of computing the longest proper prefix-suffix.
Fix: f[0] is always 0. For each i, compare P[i] with P[f[i-1]]; on a match, f[i] = f[i-1] + 1; on a mismatch, keep falling back (replace the candidate length j with f[j-1]) until a match is found or j reaches 0.

Mistake 4 — Naive pattern matching time as O(nm) always

Error: Stating naive search is always O(nm).
Fix: O(nm) is the worst case (e.g., text = “AAAA…A”, pattern = “AA…AB”). Average case for random text is O(n).

Mistake 5 — Dynamic array insertion as O(n) always

Error: Saying appending to a dynamic array is O(n) because of occasional copying.
Fix: Amortised analysis shows the average cost per append is O(1). Only the rare doubling event costs O(n), and its cost is spread across n/2 previous free inserts.

10. FAQ

What is the address formula for a 2D array in row-major order?

For A[i][j] with R rows, C columns, base B, element size w: B + w×(i×C + j). With lower bounds Lr, Lc: B + w×((i−Lr)×C + (j−Lc)).

What is the time complexity of KMP pattern matching?

O(n+m) — O(m) to build the failure function, O(n) for the search. The text pointer never backtracks, guaranteeing linear time.

How does Rabin-Karp differ from KMP?

KMP uses structural pattern information (failure function). Rabin-Karp uses a rolling hash. Average O(n+m); worst O(nm) if many hash collisions. Rabin-Karp excels at multi-pattern search.

What is a sparse matrix and how is it stored efficiently?

A matrix where most entries are zero. Efficient storage: triplet (row, col, val) for k non-zeros uses O(k) space vs O(m×n) for full storage. CSR adds fast row-wise access.