
Cache Memory in Computer Organisation

Direct Mapping, Set-Associative, Fully-Associative, AMAT & Replacement Policies — Complete GATE CS Notes

Last updated: April 2026  |  Highest-yield COA topic for GATE CS

Key Takeaways

  • Cache is a small, fast memory between the CPU and RAM — it exploits temporal and spatial locality to reduce average access time
  • Address breakdown: Offset (block offset) + Index (set number) + Tag — always verify these add up to total address bits
  • Direct mapped: 1 line per set; Set-associative: k lines per set; Fully associative: all lines form a single set
  • AMAT = Hit time + (Miss rate × Miss penalty) — the central formula for all cache performance questions
  • Replacement policies (LRU, FIFO, Random) decide which line to evict on a miss in set-associative/fully-associative caches
  • Write-through always writes to main memory; Write-back defers writes until eviction (uses dirty bit)
  • Three types of cache misses: Compulsory (cold), Capacity, Conflict — higher associativity reduces conflict misses

1. Why Cache Memory Exists

Modern CPUs execute instructions in nanoseconds. Main memory (DRAM) takes 60–100 nanoseconds to respond. Without cache, a CPU that can execute an instruction in 1 ns would spend 99 of every 100 nanoseconds waiting for memory. The CPU would be idle almost 99% of the time — a catastrophic waste of silicon.

Cache solves this by storing frequently accessed data in a small but extremely fast SRAM (Static RAM) that responds in 1–5 nanoseconds. The challenge: cache can only hold a small fraction of main memory. Cache design is about predicting which data the CPU will need next, and keeping that data close.

| Memory Type | Typical Size | Access Time | Cost per GB | Technology |
|---|---|---|---|---|
| Registers | 32 × 64-bit = 256 bytes | ~0.5 ns | Extremely high | Flip-flops |
| L1 Cache | 32–64 KB | 1–4 cycles (~1 ns) | Very high | SRAM |
| L2 Cache | 256 KB – 1 MB | 10–20 cycles (~5 ns) | High | SRAM |
| L3 Cache | 4–32 MB | 30–50 cycles (~15 ns) | Moderate | SRAM |
| Main Memory (DRAM) | 4–64 GB | 100–200 cycles (~60 ns) | Low | DRAM |
| SSD | 256 GB – 4 TB | ~50 μs | Very low | NAND Flash |

2. Temporal and Spatial Locality

Cache works because real programs exhibit two forms of locality — patterns in how they access memory.

| Type | Definition | Real Example | What Cache Does |
|---|---|---|---|
| Temporal Locality | A location accessed recently is likely to be accessed again soon | A loop variable accessed in every iteration | Keeps recently accessed blocks in cache (replacement policies) |
| Spatial Locality | Locations near a recently accessed location are likely to be accessed soon | Iterating through an array sequentially | Loads entire blocks (multiple words) on a miss, not just one word |

Block size (also called cache line size) directly exploits spatial locality — when any word in a block is accessed, the entire block is loaded into cache, bringing nearby words that are likely to be needed soon.

3. Cache Address Breakdown

Every memory access generates a physical address. The cache hardware splits this address into three fields to locate the relevant cache line.

For a set-associative cache:
| Tag | Index | Block Offset |

Block Offset bits = log₂(block size in bytes)
Index bits = log₂(number of sets)
Tag bits = total address bits − index bits − offset bits

Number of sets = (Cache size) / (Block size × Associativity)

For direct mapped (k = 1):
Number of sets = Cache size / Block size

For fully associative (k = all lines):
No index field — address = Tag + Offset
Tag bits = address bits − offset bits

What each field does:

  • Block Offset: selects the specific byte within the cache block
  • Index: selects which set to look in
  • Tag: compared against the stored tag to confirm it is the right block

The cache also stores a valid bit (1 = line contains valid data) and optionally a dirty bit (1 = line modified, must write back on eviction).
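The three field widths can be computed straight from the cache parameters. A minimal Python sketch (assuming all sizes are powers of two; `split_address` is an illustrative helper, not a standard API):

```python
from math import log2

def split_address(addr_bits, cache_bytes, block_bytes, ways):
    # Field widths for a set-associative cache of the given geometry.
    sets = cache_bytes // (block_bytes * ways)
    offset = int(log2(block_bytes))    # log2(block size in bytes)
    index = int(log2(sets))            # log2(number of sets)
    tag = addr_bits - index - offset   # remaining high-order bits
    return tag, index, offset

# 8 KB cache, 2-way, 32-byte blocks, 32-bit addresses:
print(split_address(32, 8 * 1024, 32, 2))  # → (20, 7, 5)
```

Passing `ways` equal to the total number of lines leaves a single set, so the index field collapses to 0 bits — exactly the fully associative case above.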

4. Cache Mapping Types

4.1 Direct Mapped Cache

In a direct mapped cache, each memory block maps to exactly one specific cache line. The cache line is determined by: (block address) mod (number of cache lines).

Advantage: Simple, fast — checking one tag is enough. No replacement decision needed.
Disadvantage: Conflict misses — two frequently used blocks that map to the same line keep evicting each other, even if other cache lines are empty.

| Memory Block | Maps to Cache Line (8-line cache) |
|---|---|
| Block 0, 8, 16, 24… | Line 0 |
| Block 1, 9, 17, 25… | Line 1 |
| Block 7, 15, 23, 31… | Line 7 |

4.2 Set-Associative Cache (k-way)

A k-way set-associative cache divides the cache into sets, each containing k lines. A memory block maps to one specific set, but can occupy any of the k lines in that set.

k = 1: Direct mapped
k = total lines: Fully associative
k = 2, 4, 8: Common in real CPUs (L1 is usually 4-way or 8-way)

Set number for a memory block = (block number) mod (number of sets)

Cache size = number of sets × associativity × block size
Number of sets = Cache size / (block size × k)
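The same mod rule shows how associativity changes where a block can live. A small sketch (16 cache lines in total; the block number is arbitrary):

```python
def set_for(block, total_lines, k):
    # Set number = block mod (number of sets), where sets = lines / k.
    return block % (total_lines // k)

# The same 16-line cache viewed at increasing associativity:
for k in (1, 2, 4, 16):   # direct-mapped ... fully associative
    print(f"{k}-way: block 37 -> set {set_for(37, 16, k)}")
```

At k = 16 (fully associative) there is only one set, so every block maps to set 0 and placement is unrestricted.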

4.3 Fully Associative Cache

A memory block can be placed in any cache line. No index bits in the address — just tag and offset. Eliminates all conflict misses, but requires checking all tag entries simultaneously, which is expensive in hardware.

Used in small, specialised structures like TLBs (Translation Lookaside Buffers) where the number of entries is small enough to check in parallel.

| Property | Direct Mapped | k-way Set-Associative | Fully Associative |
|---|---|---|---|
| Placement rule | One fixed line | Any line in a set | Any line in cache |
| Conflict misses | High | Moderate | Zero |
| Hit time | Fastest | Moderate | Slowest (all tags checked) |
| Hardware complexity | Lowest | Moderate | Highest |
| Replacement needed? | No | Yes (within the set) | Yes (any line) |

5. Replacement Policies — LRU, FIFO, Random

When a miss occurs and the target set is full, one of the existing lines must be evicted to make room. The replacement policy decides which one.

| Policy | Rule | Performance | Implementation |
|---|---|---|---|
| LRU (Least Recently Used) | Evict the line that was accessed least recently | Best — exploits temporal locality directly | Complex — requires tracking access order (counters or stack) |
| FIFO (First In, First Out) | Evict the line that was loaded earliest (a queue) | Good — simple approximation of LRU | Simple — just a pointer to the oldest entry |
| Random | Evict a randomly chosen line | Surprisingly good on average | Very simple — random number generator |
| Optimal (OPT) | Evict the line that will be accessed furthest in the future | Best possible — but requires future knowledge | Impossible in real hardware; used as a theoretical benchmark |

GATE tip: LRU trace questions are common. Always update the access order on every hit, not just misses. The line evicted is the one at the “back” of the LRU queue.
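LRU traces are easy to check mechanically. A minimal single-set simulator (an illustrative sketch; the list is kept ordered from least to most recently used):

```python
def lru_misses(accesses, ways):
    # Simulate one cache set under LRU; 'cache' is ordered LRU -> MRU.
    cache, misses = [], 0
    for block in accesses:
        if block in cache:
            cache.remove(block)   # hit: will re-append as MRU below
        else:
            misses += 1           # miss: block must be loaded
            if len(cache) == ways:
                cache.pop(0)      # evict the LRU line (front of list)
        cache.append(block)       # block is now the MRU line
    return misses

print(lru_misses([1, 2, 1, 3, 2], ways=2))  # → 4
```

Note that a hit also moves the block to the MRU position, which is exactly the rule in the tip above.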

6. Write Policies — Write-Through vs Write-Back

| Property | Write-Through | Write-Back |
|---|---|---|
| On a write hit | Write to cache AND main memory simultaneously | Write to cache only; set dirty bit |
| On a write miss | Write directly to main memory (write-around) or allocate + write | Load block into cache, then write (write-allocate) |
| Main memory consistency | Always consistent with cache | May be stale — updated only on eviction |
| Memory bandwidth | High — every write goes to memory | Low — writes only on dirty eviction |
| Dirty bit needed? | No | Yes |
| Write buffer needed? | Yes — to absorb bursts of writes | No (usually) |
| Used in | L1 caches in some designs, caches for I/O devices | L2 and L3 caches, most modern CPU designs |

Write policies on a write miss:

  • Write-Allocate: load the block into cache, then write — pairs with write-back
  • No-Write-Allocate (Write-Around): write directly to memory, do not load into cache — pairs with write-through
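The write-hit behaviour of both policies fits in a few lines. A hypothetical sketch (a plain dict stands in for main memory; `write_hit` and `evict` are illustrative names, not a real cache API):

```python
def write_hit(line, data, policy, memory):
    line["data"] = data
    if policy == "write-through":
        memory[line["addr"]] = data   # memory updated on the spot
    else:                             # write-back
        line["dirty"] = True          # defer; just mark line modified

def evict(line, memory):
    if line.get("dirty"):             # only dirty lines touch memory
        memory[line["addr"]] = line["data"]
    line["dirty"] = False

memory = {}
line = {"addr": 0x40, "data": 0, "dirty": False}
write_hit(line, 7, "write-back", memory)  # memory still stale here
evict(line, memory)                       # now memory sees the write
```

Under write-back, memory stays stale until `evict` runs; under write-through it is updated immediately and the dirty bit is never needed.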

7. AMAT Formula & Multi-Level Cache

Single-level cache AMAT:
AMAT = Hit time + (Miss rate × Miss penalty)

Two-level cache AMAT:
AMAT = L1 Hit time + (L1 Miss rate × (L2 Hit time + L2 Miss rate × Memory access time))

Alternative form with local vs global miss rate:
Global L2 miss rate = L1 miss rate × L2 local miss rate
AMAT = L1 Hit time + (L1 Miss rate × L2 Hit time) + (Global L2 Miss rate × Memory time)

Important distinction:

  • Local miss rate: misses seen by this cache level ÷ accesses to this cache level
  • Global miss rate: misses that reach main memory ÷ total accesses

GATE questions on AMAT almost always use the single-level formula. Two-level AMAT appears in harder questions — read carefully whether “miss penalty” means time to reach L2 or time to reach memory.
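Both formulas are one-liners, and the two-level form is simply the single-level formula applied recursively: the L1 miss penalty is itself an AMAT over L2 and memory. A sketch with illustrative numbers (not from any specific GATE question):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # AMAT = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

# L1 miss penalty = AMAT of the L2 + memory subsystem:
l2_penalty = amat(hit_time=10, miss_rate=0.2, miss_penalty=100)   # 30 ns
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_penalty))  # → 2.5 ns
```

Here 0.05 is L1's miss rate and 0.2 is L2's local miss rate, so the memory term works out to 0.05 × 0.2 × 100 ns — the global-miss-rate form of the same calculation.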

8. Types of Cache Misses — The 3 Cs

| Miss Type | Also Called | Cause | How to Reduce |
|---|---|---|---|
| Compulsory | Cold-start miss | First access to a block — it has never been in cache | Larger block size; prefetching |
| Capacity | (none) | Working set is larger than the cache — even a fully associative cache would miss | Increase cache size |
| Conflict | Collision miss | Multiple blocks map to the same set and evict each other | Increase associativity; better cache indexing |

9. GATE-Level Worked Examples

Example 1 — Address Breakdown (GATE 2021)

Problem: A cache has 128 sets, 2-way set-associative, block size = 32 bytes. The physical address is 32 bits. Find the number of tag, index, and offset bits.

Solution:
Block size = 32 bytes → Offset bits = log₂(32) = 5 bits
Number of sets = 128 → Index bits = log₂(128) = 7 bits
Tag bits = 32 − 7 − 5 = 20 bits

Verification: 20 + 7 + 5 = 32 ✓

Total cache size = 128 sets × 2 ways × 32 bytes = 8 KB

Example 2 — AMAT Calculation (GATE 2022)

Problem: A system has an L1 cache with 2 ns hit time and 5% miss rate. On a miss, it accesses main memory in 100 ns. A write buffer reduces effective miss penalty by 20 ns. What is the AMAT?

Solution:
Hit time = 2 ns
Miss rate = 0.05
Miss penalty = 100 − 20 = 80 ns (after write buffer benefit)

AMAT = 2 + (0.05 × 80) = 2 + 4 = 6 ns

Example 3 — LRU Replacement Trace (GATE 2019)

Problem: A 2-way set-associative cache has 4 sets. Consider Set 0. Using LRU replacement, trace the following accesses and find the number of misses: Blocks 0, 4, 0, 8, 4, 0, 4

Solution:
Blocks mapping to Set 0 (4 sets): 0, 4, 8, 12… (blocks where block# mod 4 = 0)
Cache Set 0 capacity: 2 ways

Access 0: MISS — cache: [0, —] (load 0)
Access 4: MISS — cache: [0, 4] (load 4)
Access 0: HIT — cache: [4, 0] (4 is now the LRU line)
Access 8: MISS — evict LRU=4, cache: [0, 8]
Access 4: MISS — evict LRU=0, cache: [8, 4]
Access 0: MISS — evict LRU=8, cache: [4, 0]
Access 4: HIT — cache: [0, 4] (0 is now the LRU line)

Total misses = 5, Hits = 2

10. Common Mistakes

  1. Tag + Index + Offset not summing to address width
    This is the most common calculation error. Always verify at the end. If you get Tag + Index + Offset ≠ 32 (or whatever the address width is), re-examine your log₂ calculations.
  2. Confusing local and global miss rate in multi-level cache
    The global miss rate reaching memory = L1 miss rate × L2 miss rate. The AMAT formula uses global miss rate for the memory penalty term, not L2’s local miss rate alone.
  3. Updating LRU order only on misses, not on hits
    In LRU, every access — hit or miss — updates the recency order. A line that was hit becomes the most recently used. Only tracking misses gives wrong eviction choices.
  4. Applying write-back’s dirty bit logic to write-through caches
    Write-through caches do not need a dirty bit — main memory is always up to date. Dirty bits are exclusive to write-back caches and signal that a line must be written to memory before eviction.
  5. Forgetting that block size affects compulsory miss rate
    Larger blocks reduce compulsory misses (more data fetched on each miss) but can increase miss penalty (more bytes to transfer). There is a diminishing return — the optimal block size balances these two effects. GATE occasionally asks about the effect of doubling block size.

11. Frequently Asked Questions

What is AMAT in cache memory?

AMAT (Average Memory Access Time) is the weighted average time to access memory, accounting for both cache hits and misses: AMAT = Hit time + (Miss rate × Miss penalty). It is the primary metric for cache performance in GATE COA questions. For a two-level cache, the formula extends to include L2 hit time and L2 miss penalty.

What is the difference between direct mapped and set-associative cache?

A direct mapped cache has exactly one cache line per set — each memory block maps to exactly one location, making conflict misses likely. A k-way set-associative cache has k lines per set — a memory block maps to one set but can occupy any of the k positions, which dramatically reduces conflict misses at the cost of slightly more hardware to check k tags simultaneously.

How do you break down a memory address for cache?

Split the address into three fields (right to left): Block Offset = log₂(block size in bytes) bits, Index = log₂(number of sets) bits, Tag = remaining bits. For fully associative, there is no index field. Always verify that the three fields sum to the total physical address width.

What is the difference between write-through and write-back cache?

Write-through: every cache write is immediately propagated to main memory, keeping the two always consistent, at the cost of more memory bandwidth. Write-back: writes go only to cache; a dirty bit marks modified lines; main memory is updated only when a dirty line is evicted. Write-back uses less bandwidth but adds complexity, and typically performs better under heavy write traffic because it avoids a memory access on every write.
