Cache Memory in Computer Organisation
Direct Mapping, Set-Associative, Fully-Associative, AMAT & Replacement Policies — Complete GATE CS Notes
Last updated: April 2026 | Highest-yield COA topic for GATE CS
Key Takeaways
- Cache is a small, fast memory between the CPU and RAM — it exploits temporal and spatial locality to reduce average access time
- Address breakdown: Offset (block offset) + Index (set number) + Tag — always verify these add up to total address bits
- Direct mapped: 1 line per set; Set-associative: k lines per set; Fully associative: all lines form a single set
- AMAT = Hit time + (Miss rate × Miss penalty) — the central formula for all cache performance questions
- Replacement policies (LRU, FIFO, Random) decide which line to evict on a miss in set-associative/fully-associative caches
- Write-through always writes to main memory; Write-back defers writes until eviction (uses dirty bit)
- Three types of cache misses: Compulsory (cold), Capacity, Conflict — higher associativity reduces conflict misses (fully associative eliminates them)
1. Why Cache Memory Exists
Modern CPUs execute instructions in nanoseconds. Main memory (DRAM) takes 60–100 nanoseconds to respond. Without cache, a CPU that can execute an instruction in 1 ns would spend 99 of every 100 nanoseconds waiting for memory. The CPU would be idle almost 99% of the time — a catastrophic waste of silicon.
Cache solves this by storing frequently accessed data in a small but extremely fast SRAM (Static RAM) that responds in 1–5 nanoseconds. The challenge: cache can only hold a small fraction of main memory. Cache design is about predicting which data the CPU will need next, and keeping that data close.
| Memory Type | Typical Size | Access Time | Cost per GB | Technology |
|---|---|---|---|---|
| Registers | 32 × 64-bit = 256 bytes | ~0.5 ns | Extremely high | Flip-flops |
| L1 Cache | 32–64 KB | 1–4 cycles (~1 ns) | Very high | SRAM |
| L2 Cache | 256 KB – 1 MB | 10–20 cycles (~5 ns) | High | SRAM |
| L3 Cache | 4–32 MB | 30–50 cycles (~15 ns) | Moderate | SRAM |
| Main Memory (DRAM) | 4–64 GB | 100–200 cycles (~60 ns) | Low | DRAM |
| SSD | 256 GB – 4 TB | ~50 μs | Very low | NAND Flash |
2. Temporal and Spatial Locality
Cache works because real programs exhibit two forms of locality — patterns in how they access memory.
| Type | Definition | Real Example | What Cache Does |
|---|---|---|---|
| Temporal Locality | A location accessed recently is likely to be accessed again soon | A loop variable accessed in every iteration | Keeps recently accessed blocks in cache (replacement policies) |
| Spatial Locality | Locations near a recently accessed location are likely to be accessed soon | Iterating through an array sequentially | Loads entire blocks (multiple words) on a miss, not just one word |
Block size (also called cache line size) directly exploits spatial locality — when any word in a block is accessed, the entire block is loaded into cache, bringing nearby words that are likely to be needed soon.
3. Cache Address Breakdown
Every memory access generates a physical address. The cache hardware splits this address into three fields to locate the relevant cache line.
| Tag | Index | Block Offset |
Block Offset bits = log₂(block size in bytes)
Index bits = log₂(number of sets)
Tag bits = total address bits − index bits − offset bits
Number of sets = (Cache size) / (Block size × Associativity)
For direct mapped (k = 1):
Number of sets = Cache size / Block size
For fully associative (k = all lines):
No index field — address = Tag + Offset
Tag bits = address bits − offset bits
What each field does:
- Block Offset: selects the specific byte within the cache block
- Index: selects which set to look in
- Tag: compared against the stored tag to confirm it is the right block
The cache also stores a valid bit (1 = line contains valid data) and optionally a dirty bit (1 = line modified, must write back on eviction).
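The field formulas above can be turned into a quick checker. This is a minimal sketch; `cache_fields` is a hypothetical helper name, not a standard API:

```python
from math import log2

def cache_fields(addr_bits, cache_size, block_size, assoc):
    """Return (tag, index, offset) bit widths for a cache.

    assoc = 1 for direct mapped; assoc = cache_size // block_size
    (all lines in a single set) for fully associative.
    """
    num_sets = cache_size // (block_size * assoc)
    offset = int(log2(block_size))
    index = int(log2(num_sets))        # 0 for fully associative
    tag = addr_bits - index - offset
    assert tag + index + offset == addr_bits   # always verify the sum
    return tag, index, offset

# 32-bit address, 8 KB cache, 32-byte blocks, 2-way set-associative
print(cache_fields(32, 8 * 1024, 32, 2))   # (20, 7, 5)
```

Note the built-in verification step: the three fields must always sum to the address width.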
4. Cache Mapping Types
4.1 Direct Mapped Cache
In a direct mapped cache, each memory block maps to exactly one specific cache line. The cache line is determined by: (block address) mod (number of cache lines).
Advantage: Simple, fast — checking one tag is enough. No replacement decision needed.
Disadvantage: Conflict misses — two frequently used blocks that map to the same line keep evicting each other, even if other cache lines are empty.
| Memory Block | Maps to Cache Line (8-line cache) |
|---|---|
| Block 0, 8, 16, 24… | Line 0 |
| Block 1, 9, 17, 25… | Line 1 |
| Block 7, 15, 23, 31… | Line 7 |
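The mapping rule in the table reduces to a single modulo operation. A one-line sketch (the function name is illustrative):

```python
def direct_mapped_line(block_number, num_lines):
    # Each memory block maps to exactly one line:
    # (block address) mod (number of cache lines)
    return block_number % num_lines

# 8-line cache: blocks 0, 8, 16, 24 all collide on line 0
print([direct_mapped_line(b, 8) for b in (0, 8, 16, 24)])   # [0, 0, 0, 0]
print([direct_mapped_line(b, 8) for b in (1, 9, 17, 25)])   # [1, 1, 1, 1]
```

The collision in the first print is exactly the conflict-miss scenario: blocks 0 and 8 evict each other even if lines 1–7 sit empty.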
4.2 Set-Associative Cache (k-way)
A k-way set-associative cache divides the cache into sets, each containing k lines. A memory block maps to one specific set, but can occupy any of the k lines in that set.
k = 1: Direct mapped
k = total lines: Fully associative
k = 2, 4, 8: Common in real CPUs (L1 is usually 4-way or 8-way)
Cache size = number of sets × associativity × block size
Number of sets = Cache size / (block size × k)
4.3 Fully Associative Cache
A memory block can be placed in any cache line. No index bits in the address — just tag and offset. Eliminates all conflict misses, but requires checking all tag entries simultaneously, which is expensive in hardware.
Used in small, specialised structures like TLBs (Translation Lookaside Buffers) where the number of entries is small enough to check in parallel.
| Property | Direct Mapped | k-way Set-Associative | Fully Associative |
|---|---|---|---|
| Placement rule | One fixed line | Any line in a set | Any line in cache |
| Conflict misses | High | Moderate | Zero |
| Hit time | Fastest | Moderate | Slowest (all tags checked) |
| Hardware complexity | Lowest | Moderate | Highest |
| Replacement needed? | No | Yes (within the set) | Yes (any line) |
5. Replacement Policies — LRU, FIFO, Random
When a miss occurs and the target set is full, one of the existing lines must be evicted to make room. The replacement policy decides which one.
| Policy | Rule | Performance | Implementation |
|---|---|---|---|
| LRU (Least Recently Used) | Evict the line that was accessed least recently | Best — exploits temporal locality directly | Complex — requires tracking access order (counters or stack) |
| FIFO (First In, First Out) | Evict the line that was loaded earliest (a queue) | Good — simple approximation of LRU | Simple — just a pointer to oldest entry |
| Random | Evict a randomly chosen line | Surprisingly good on average | Very simple — random number generator |
| Optimal (OPT) | Evict the line that will be accessed furthest in the future | Best possible — but requires future knowledge | Impossible in real hardware; used as a theoretical benchmark |
GATE tip: LRU trace questions are common. Always update the access order on every hit, not just misses. The line evicted is the one at the “back” of the LRU queue.
6. Write Policies — Write-Through vs Write-Back
| Property | Write-Through | Write-Back |
|---|---|---|
| On a write hit | Write to cache AND main memory simultaneously | Write to cache only; set dirty bit |
| On a write miss | Write directly to main memory (write-around) or allocate+write | Load block into cache, then write (write-allocate) |
| Main memory consistency | Always consistent with cache | May be stale — updated only on eviction |
| Memory bandwidth | High — every write goes to memory | Low — writes only on dirty eviction |
| Dirty bit needed? | No | Yes |
| Write buffer needed? | Yes — to absorb bursts of writes | No (usually) |
| Used in | L1 caches in some designs, caches for I/O devices | L2 and L3 caches, most modern CPU designs |
Write-Allocate: load the block into cache, then write — pairs with write-back
No-Write-Allocate (Write-Around): write directly to memory, do not load into cache — pairs with write-through
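The bandwidth difference between the two policies can be made concrete with a toy model. A sketch under simplifying assumptions (one line, hits only; `WriteBackLine` is an invented class, not real hardware behaviour in full):

```python
class WriteBackLine:
    """One cache line under write-back: writes deferred until eviction."""
    def __init__(self):
        self.dirty = False
        self.mem_writes = 0       # main-memory write transactions

    def write_hit(self):
        self.dirty = True         # cache only; memory is now stale

    def evict(self):
        if self.dirty:
            self.mem_writes += 1  # write back once, on eviction
        self.dirty = False

line = WriteBackLine()
for _ in range(100):
    line.write_hit()              # 100 write hits to the same line...
line.evict()
print(line.mem_writes)            # 1 -- write-through would have done 100
```

One memory transaction versus one hundred is why write-back dominates in L2/L3 and most modern designs.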
7. AMAT Formula & Multi-Level Cache
AMAT = Hit time + (Miss rate × Miss penalty)
Two-level cache AMAT:
AMAT = L1 Hit time + (L1 Miss rate × (L2 Hit time + L2 Miss rate × Memory access time))
Alternative form with local vs global miss rate:
Global L2 miss rate = L1 miss rate × L2 local miss rate
AMAT = L1 Hit time + (L1 Miss rate × L2 Hit time) + (Global L2 Miss rate × Memory time)
Important distinction:
- Local miss rate: misses seen by this cache level ÷ accesses to this cache level
- Global miss rate: misses that reach main memory ÷ total accesses
GATE questions on AMAT almost always use the single-level formula. Two-level AMAT appears in harder questions — read carefully whether “miss penalty” means time to reach L2 or time to reach memory.
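Both forms of the formula are short enough to encode directly. A sketch with hypothetical numbers in the two-level call (the 10 ns L2 hit time, 50% L2 local miss rate, and 100 ns memory time are illustrative, not from the text):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Single-level: AMAT = Hit time + Miss rate * Miss penalty
    return hit_time + miss_rate * miss_penalty

def amat_two_level(l1_hit, l1_mr, l2_hit, l2_local_mr, mem_time):
    # Nested form: the L1 miss penalty is itself an AMAT into L2
    return l1_hit + l1_mr * (l2_hit + l2_local_mr * mem_time)

print(amat(2, 0.05, 80))                      # 6.0 ns
print(amat_two_level(1, 0.10, 10, 0.5, 100))  # 1 + 0.1*(10 + 50) = 7.0 ns
```

Note that the two-level function takes the L2 *local* miss rate; the global rate (L1 mr × L2 local mr) is what multiplies the memory term once the expression is expanded.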
8. Types of Cache Misses — The 3 Cs
| Miss Type | Also Called | Cause | How to Reduce |
|---|---|---|---|
| Compulsory | Cold-start miss | First access to a block — it has never been in cache | Larger block size (each miss fetches more data); hardware prefetching |
| Capacity | — | Working set is larger than the cache — even a fully associative cache would miss | Increase cache size |
| Conflict | Collision miss | Multiple blocks map to the same set and evict each other | Increase associativity; better cache indexing |
9. GATE-Level Worked Examples
Example 1 — Address Breakdown (GATE 2021)
Problem: A cache has 128 sets, 2-way set-associative, block size = 32 bytes. The physical address is 32 bits. Find the number of tag, index, and offset bits.
Block size = 32 bytes → Offset bits = log₂(32) = 5 bits
Number of sets = 128 → Index bits = log₂(128) = 7 bits
Tag bits = 32 − 7 − 5 = 20 bits
Verification: 20 + 7 + 5 = 32 ✓
Total cache size = 128 sets × 2 ways × 32 bytes = 8 KB
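The worked numbers above can be double-checked in a few lines of Python (a verification sketch, mirroring the steps of the solution):

```python
from math import log2

offset = int(log2(32))           # 32-byte block  -> 5 offset bits
index  = int(log2(128))          # 128 sets       -> 7 index bits
tag    = 32 - index - offset     # remaining bits -> 20 tag bits
size   = 128 * 2 * 32            # sets x ways x block size
print(offset, index, tag, size)  # 5 7 20 8192  (8 KB)
```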
Example 2 — AMAT Calculation (GATE 2022)
Problem: A system has an L1 cache with 2 ns hit time and 5% miss rate. On a miss, it accesses main memory in 100 ns. A write buffer reduces effective miss penalty by 20 ns. What is the AMAT?
Hit time = 2 ns
Miss rate = 0.05
Miss penalty = 100 − 20 = 80 ns (after write buffer benefit)
AMAT = 2 + (0.05 × 80) = 2 + 4 = 6 ns
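The same arithmetic as a quick sanity check in Python:

```python
miss_penalty = 100 - 20          # write buffer shaves 20 ns off the penalty
amat = 2 + 0.05 * miss_penalty   # Hit time + Miss rate x Miss penalty
print(amat)                      # 6.0
```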
Example 3 — LRU Replacement Trace (GATE 2019)
Problem: A 2-way set-associative cache has 4 sets. Consider Set 0. Using LRU replacement, trace the following accesses and find the number of misses: Blocks 0, 4, 0, 8, 4, 0, 4
Blocks mapping to Set 0 (4 sets): 0, 4, 8, 12… (blocks where block# mod 4 = 0)
Cache Set 0 capacity: 2 ways
Access 0: MISS — cache: [0, —] (load 0)
Access 4: MISS — cache: [0, 4] (load 4)
Access 0: HIT — cache: [4, 0] (LRU: 4 is now oldest)
Access 8: MISS — evict LRU=4, cache: [0, 8]
Access 4: MISS — evict LRU=0, cache: [8, 4]
Access 0: MISS — evict LRU=8, cache: [4, 0]
Access 4: HIT — cache: [0, 4] (LRU: 0 is now oldest)
Total misses = 5, Hits = 2
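The trace above can be replayed mechanically, which is a reliable way to verify an LRU answer under exam conditions. A minimal sketch for this one set:

```python
lines, misses = [], 0
for block in [0, 4, 0, 8, 4, 0, 4]:
    if block in lines:
        lines.remove(block)          # LRU: a hit refreshes recency
        lines.append(block)
        result = "HIT "
    else:
        misses += 1
        if len(lines) == 2:          # 2-way set is full: evict front (LRU)
            lines.pop(0)
        lines.append(block)
        result = "MISS"
    print(f"access {block}: {result} set = {lines}")
print("misses:", misses)             # misses: 5
```

The printed states match the trace line by line; the front of the list is always the next eviction victim.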
10. Common Mistakes
- Tag + Index + Offset not summing to address width: This is the most common calculation error. Always verify at the end. If Tag + Index + Offset ≠ 32 (or whatever the address width is), re-examine your log₂ calculations.
- Confusing local and global miss rate in a multi-level cache: The global miss rate reaching memory = L1 miss rate × L2 local miss rate. The AMAT formula uses the global miss rate for the memory penalty term, not L2's local miss rate alone.
- Updating LRU order only on misses, not on hits: In LRU, every access, hit or miss, updates the recency order. A line that was hit becomes the most recently used. Tracking only misses gives wrong eviction choices.
- Applying write-back's dirty-bit logic to write-through caches: Write-through caches do not need a dirty bit, because main memory is always up to date. Dirty bits are exclusive to write-back caches and signal that a line must be written to memory before eviction.
- Forgetting that block size affects compulsory miss rate: Larger blocks reduce compulsory misses (more data fetched on each miss) but can increase miss penalty (more bytes to transfer). There is a diminishing return — the optimal block size balances these two effects. GATE occasionally asks about the effect of doubling block size.
11. Frequently Asked Questions
What is AMAT in cache memory?
AMAT (Average Memory Access Time) is the weighted average time to access memory, accounting for both cache hits and misses: AMAT = Hit time + (Miss rate × Miss penalty). It is the primary metric for cache performance in GATE COA questions. For a two-level cache, the formula extends to include L2 hit time and L2 miss penalty.
What is the difference between direct mapped and set-associative cache?
A direct mapped cache has exactly one cache line per set — each memory block maps to exactly one location, making conflict misses likely. A k-way set-associative cache has k lines per set — a memory block maps to one set but can occupy any of the k positions, which dramatically reduces conflict misses at the cost of slightly more hardware to check k tags simultaneously.
How do you break down a memory address for cache?
Split the address into three fields (right to left): Block Offset = log₂(block size in bytes) bits, Index = log₂(number of sets) bits, Tag = remaining bits. For fully associative, there is no index field. Always verify that the three fields sum to the total physical address width.
What is the difference between write-through and write-back cache?
Write-through: every cache write is immediately propagated to main memory, keeping them always consistent. Uses more memory bandwidth. Write-back: writes go only to cache; a dirty bit marks modified lines; main memory is updated only when the dirty line is evicted. Uses less bandwidth but adds complexity. Write-back typically delivers better performance in high-bandwidth scenarios.