
Cache Memory in Computer Organisation

Direct Mapping, Set-Associative, Fully-Associative, AMAT & Replacement Policies — Complete GATE CS Notes

Last updated: April 2026  |  Highest-yield COA topic for GATE CS

Key Takeaways

  • Cache is a small, fast memory between the CPU and RAM — it exploits temporal and spatial locality to reduce average access time
  • Address breakdown: Offset (block offset) + Index (set number) + Tag — always verify these add up to total address bits
  • Direct mapped: 1 line per set; Set-associative: k lines per set; Fully associative: all lines form a single set
  • AMAT = Hit time + (Miss rate × Miss penalty) — the central formula for all cache performance questions
  • Replacement policies (LRU, FIFO, Random) decide which line to evict on a miss in set-associative/fully-associative caches
  • Write-through always writes to main memory; Write-back defers writes until eviction (uses dirty bit)
  • Three types of cache misses: Compulsory (cold), Capacity, Conflict — higher associativity reduces conflict misses

1. Why Cache Memory Exists

Modern CPUs execute instructions in nanoseconds. Main memory (DRAM) takes 60–100 nanoseconds to respond. Without cache, a CPU that can execute an instruction in 1 ns would spend 99 of every 100 nanoseconds waiting for memory. The CPU would be idle almost 99% of the time — a catastrophic waste of silicon.

Cache solves this by storing frequently accessed data in a small but extremely fast SRAM (Static RAM) that responds in 1–5 nanoseconds. The challenge: cache can only hold a small fraction of main memory. Cache design is about predicting which data the CPU will need next, and keeping that data close.

| Memory Type | Typical Size | Access Time | Cost per GB | Technology |
|---|---|---|---|---|
| Registers | 32 × 64-bit = 256 bytes | ~0.5 ns | Extremely high | Flip-flops |
| L1 Cache | 32–64 KB | 1–4 cycles (~1 ns) | Very high | SRAM |
| L2 Cache | 256 KB – 1 MB | 10–20 cycles (~5 ns) | High | SRAM |
| L3 Cache | 4–32 MB | 30–50 cycles (~15 ns) | Moderate | SRAM |
| Main Memory (DRAM) | 4–64 GB | 100–200 cycles (~60 ns) | Low | DRAM |
| SSD | 256 GB – 4 TB | ~50 μs | Very low | NAND Flash |

2. Temporal and Spatial Locality

Cache works because real programs exhibit two forms of locality — patterns in how they access memory.

| Type | Definition | Real Example | What Cache Does |
|---|---|---|---|
| Temporal Locality | A location accessed recently is likely to be accessed again soon | A loop variable accessed in every iteration | Keeps recently accessed blocks in cache (replacement policies) |
| Spatial Locality | Locations near a recently accessed location are likely to be accessed soon | Iterating through an array sequentially | Loads entire blocks (multiple words) on a miss, not just one word |

Block size (also called cache line size) directly exploits spatial locality — when any word in a block is accessed, the entire block is loaded into cache, bringing nearby words that are likely to be needed soon.

3. Cache Address Breakdown

Every memory access generates a physical address. The cache hardware splits this address into three fields to locate the relevant cache line.

For a set-associative cache:
| Tag | Index | Block Offset |

Block Offset bits = log₂(block size in bytes)
Index bits = log₂(number of sets)
Tag bits = total address bits − index bits − offset bits

Number of sets = (Cache size) / (Block size × Associativity)

For direct mapped (k = 1):
Number of sets = Cache size / Block size

For fully associative (k = all lines):
No index field — address = Tag + Offset
Tag bits = address bits − offset bits

What each field does:

  • Block Offset: selects the specific byte within the cache block
  • Index: selects which set to look in
  • Tag: compared against the stored tag to confirm it is the right block

The cache also stores a valid bit (1 = line contains valid data) and optionally a dirty bit (1 = line modified, must write back on eviction).
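The three field widths can be computed straight from the cache parameters. A minimal Python sketch (assuming all sizes are powers of two; `split_address` is an illustrative helper, not a standard API):

```python
from math import log2

def split_address(addr_bits, cache_bytes, block_bytes, ways):
    # Field widths for a set-associative cache of the given geometry.
    sets = cache_bytes // (block_bytes * ways)
    offset = int(log2(block_bytes))    # log2(block size in bytes)
    index = int(log2(sets))            # log2(number of sets)
    tag = addr_bits - index - offset   # remaining high-order bits
    return tag, index, offset

# 8 KB cache, 2-way, 32-byte blocks, 32-bit addresses:
print(split_address(32, 8 * 1024, 32, 2))  # → (20, 7, 5)
```

Passing `ways` equal to the total number of lines leaves a single set, so the index field collapses to 0 bits — exactly the fully associative case above.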

4. Cache Mapping Types

4.1 Direct Mapped Cache

In a direct mapped cache, each memory block maps to exactly one specific cache line. The cache line is determined by: (block address) mod (number of cache lines).

Advantage: Simple, fast — checking one tag is enough. No replacement decision needed.
Disadvantage: Conflict misses — two frequently used blocks that map to the same line keep evicting each other, even if other cache lines are empty.

| Memory Block | Maps to Cache Line (8-line cache) |
|---|---|
| Block 0, 8, 16, 24… | Line 0 |
| Block 1, 9, 17, 25… | Line 1 |
| Block 7, 15, 23, 31… | Line 7 |

4.2 Set-Associative Cache (k-way)

A k-way set-associative cache divides the cache into sets, each containing k lines. A memory block maps to one specific set, but can occupy any of the k lines in that set.

k = 1: Direct mapped
k = total lines: Fully associative
k = 2, 4, 8: Common in real CPUs (L1 is usually 4-way or 8-way)

Set number for a memory block = (block number) mod (number of sets)

Cache size = number of sets × associativity × block size
Number of sets = Cache size / (block size × k)
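The same mod rule shows how associativity changes where a block can live. A small sketch (16 cache lines in total; the block number is arbitrary):

```python
def set_for(block, total_lines, k):
    # Set number = block mod (number of sets), where sets = lines / k.
    return block % (total_lines // k)

# The same 16-line cache viewed at increasing associativity:
for k in (1, 2, 4, 16):   # direct-mapped ... fully associative
    print(f"{k}-way: block 37 -> set {set_for(37, 16, k)}")
```

At k = 16 (fully associative) there is only one set, so every block maps to set 0 and placement is unrestricted.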

4.3 Fully Associative Cache

A memory block can be placed in any cache line. No index bits in the address — just tag and offset. Eliminates all conflict misses, but requires checking all tag entries simultaneously, which is expensive in hardware.

Used in small, specialised structures like TLBs (Translation Lookaside Buffers) where the number of entries is small enough to check in parallel.

| Property | Direct Mapped | k-way Set-Associative | Fully Associative |
|---|---|---|---|
| Placement rule | One fixed line | Any line in a set | Any line in cache |
| Conflict misses | High | Moderate | Zero |
| Hit time | Fastest | Moderate | Slowest (all tags checked) |
| Hardware complexity | Lowest | Moderate | Highest |
| Replacement needed? | No | Yes (within the set) | Yes (any line) |

5. Replacement Policies — LRU, FIFO, Random

When a miss occurs and the target set is full, one of the existing lines must be evicted to make room. The replacement policy decides which one.

| Policy | Rule | Performance | Implementation |
|---|---|---|---|
| LRU (Least Recently Used) | Evict the line that was accessed least recently | Best — exploits temporal locality directly | Complex — requires tracking access order (counters or stack) |
| FIFO (First In, First Out) | Evict the line that was loaded earliest (a queue) | Good — simple approximation of LRU | Simple — just a pointer to the oldest entry |
| Random | Evict a randomly chosen line | Surprisingly good on average | Very simple — random number generator |
| Optimal (OPT) | Evict the line that will be accessed furthest in the future | Best possible — but requires future knowledge | Impossible in real hardware; used as a theoretical benchmark |

GATE tip: LRU trace questions are common. Always update the access order on every hit, not just misses. The line evicted is the one at the “back” of the LRU queue.
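LRU traces are easy to check mechanically. A minimal single-set simulator (an illustrative sketch; the list is kept ordered from least to most recently used):

```python
def lru_misses(accesses, ways):
    # Simulate one cache set under LRU; 'cache' is ordered LRU -> MRU.
    cache, misses = [], 0
    for block in accesses:
        if block in cache:
            cache.remove(block)   # hit: will re-append as MRU below
        else:
            misses += 1           # miss: block must be loaded
            if len(cache) == ways:
                cache.pop(0)      # evict the LRU line (front of list)
        cache.append(block)       # block is now the MRU line
    return misses

print(lru_misses([1, 2, 1, 3, 2], ways=2))  # → 4
```

Note that a hit also moves the block to the MRU position, which is exactly the rule in the tip above.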

6. Write Policies — Write-Through vs Write-Back

| Property | Write-Through | Write-Back |
|---|---|---|
| On a write hit | Write to cache AND main memory simultaneously | Write to cache only; set dirty bit |
| On a write miss | Write directly to main memory (write-around) or allocate + write | Load block into cache, then write (write-allocate) |
| Main memory consistency | Always consistent with cache | May be stale — updated only on eviction |
| Memory bandwidth | High — every write goes to memory | Low — writes only on dirty eviction |
| Dirty bit needed? | No | Yes |
| Write buffer needed? | Yes — to absorb bursts of writes | No (usually) |
| Used in | L1 caches in some designs, caches for I/O devices | L2 and L3 caches, most modern CPU designs |

Write policies on a write miss:

  • Write-Allocate: load the block into cache, then write — pairs with write-back
  • No-Write-Allocate (Write-Around): write directly to memory, do not load into cache — pairs with write-through
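The write-hit behaviour of both policies fits in a few lines. A hypothetical sketch (a plain dict stands in for main memory; `write_hit` and `evict` are illustrative names, not a real cache API):

```python
def write_hit(line, data, policy, memory):
    line["data"] = data
    if policy == "write-through":
        memory[line["addr"]] = data   # memory updated on the spot
    else:                             # write-back
        line["dirty"] = True          # defer; just mark line modified

def evict(line, memory):
    if line.get("dirty"):             # only dirty lines touch memory
        memory[line["addr"]] = line["data"]
    line["dirty"] = False

memory = {}
line = {"addr": 0x40, "data": 0, "dirty": False}
write_hit(line, 7, "write-back", memory)  # memory still stale here
evict(line, memory)                       # now memory sees the write
```

Under write-back, memory stays stale until `evict` runs; under write-through it is updated immediately and the dirty bit is never needed.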

7. AMAT Formula & Multi-Level Cache

Single-level cache AMAT:
AMAT = Hit time + (Miss rate × Miss penalty)

Two-level cache AMAT:
AMAT = L1 Hit time + (L1 Miss rate × (L2 Hit time + L2 Miss rate × Memory access time))

Alternative form with local vs global miss rate:
Global L2 miss rate = L1 miss rate × L2 local miss rate
AMAT = L1 Hit time + (L1 Miss rate × L2 Hit time) + (Global L2 Miss rate × Memory time)

Important distinction:

  • Local miss rate: misses seen by this cache level ÷ accesses to this cache level
  • Global miss rate: misses that reach main memory ÷ total accesses

GATE questions on AMAT almost always use the single-level formula. Two-level AMAT appears in harder questions — read carefully whether “miss penalty” means time to reach L2 or time to reach memory.
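Both formulas are one-liners, and the two-level form is simply the single-level formula applied recursively: the L1 miss penalty is itself an AMAT over L2 and memory. A sketch with illustrative numbers (not from any specific GATE question):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # AMAT = hit time + miss rate * miss penalty
    return hit_time + miss_rate * miss_penalty

# L1 miss penalty = AMAT of the L2 + memory subsystem:
l2_penalty = amat(hit_time=10, miss_rate=0.2, miss_penalty=100)   # 30 ns
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=l2_penalty))  # → 2.5 ns
```

Here 0.05 is L1's miss rate and 0.2 is L2's local miss rate, so the memory term works out to 0.05 × 0.2 × 100 ns — the global-miss-rate form of the same calculation.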

8. Types of Cache Misses — The 3 Cs

| Miss Type | Also Called | Cause | How to Reduce |
|---|---|---|---|
| Compulsory | Cold-start miss | First access to a block — it has never been in cache | Larger block size; prefetching |
| Capacity | (none) | Working set is larger than the cache — even a fully associative cache would miss | Increase cache size |
| Conflict | Collision miss | Multiple blocks map to the same set and evict each other | Increase associativity; better cache indexing |

9. GATE-Level Worked Examples

Example 1 — Address Breakdown (GATE 2021)

Problem: A cache has 128 sets, 2-way set-associative, block size = 32 bytes. The physical address is 32 bits. Find the number of tag, index, and offset bits.

Solution:
Block size = 32 bytes → Offset bits = log₂(32) = 5 bits
Number of sets = 128 → Index bits = log₂(128) = 7 bits
Tag bits = 32 − 7 − 5 = 20 bits

Verification: 20 + 7 + 5 = 32 ✓

Total cache size = 128 sets × 2 ways × 32 bytes = 8 KB

Example 2 — AMAT Calculation (GATE 2022)

Problem: A system has an L1 cache with 2 ns hit time and 5% miss rate. On a miss, it accesses main memory in 100 ns. A write buffer reduces effective miss penalty by 20 ns. What is the AMAT?

Solution:
Hit time = 2 ns
Miss rate = 0.05
Miss penalty = 100 − 20 = 80 ns (after write buffer benefit)

AMAT = 2 + (0.05 × 80) = 2 + 4 = 6 ns

Example 3 — LRU Replacement Trace (GATE 2019)

Problem: A 2-way set-associative cache has 4 sets. Consider Set 0. Using LRU replacement, trace the following accesses and find the number of misses: Blocks 0, 4, 0, 8, 4, 0, 4

Solution:
Blocks mapping to Set 0 (4 sets): 0, 4, 8, 12… (blocks where block# mod 4 = 0)
Cache Set 0 capacity: 2 ways

Access 0: MISS — cache: [0, —] (load 0)
Access 4: MISS — cache: [0, 4] (load 4)
Access 0: HIT — cache: [4, 0] (4 is now the LRU line)
Access 8: MISS — evict LRU=4, cache: [0, 8]
Access 4: MISS — evict LRU=0, cache: [8, 4]
Access 0: MISS — evict LRU=8, cache: [4, 0]
Access 4: HIT — cache: [0, 4] (0 is now the LRU line)

Total misses = 5, Hits = 2

10. Common Mistakes

  1. Tag + Index + Offset not summing to address width
    This is the most common calculation error. Always verify at the end. If you get Tag + Index + Offset ≠ 32 (or whatever the address width is), re-examine your log₂ calculations.
  2. Confusing local and global miss rate in multi-level cache
    The global miss rate reaching memory = L1 miss rate × L2 miss rate. The AMAT formula uses global miss rate for the memory penalty term, not L2’s local miss rate alone.
  3. Updating LRU order only on misses, not on hits
    In LRU, every access — hit or miss — updates the recency order. A line that was hit becomes the most recently used. Only tracking misses gives wrong eviction choices.
  4. Applying write-back’s dirty bit logic to write-through caches
    Write-through caches do not need a dirty bit — main memory is always up to date. Dirty bits are exclusive to write-back caches and signal that a line must be written to memory before eviction.
  5. Forgetting that block size affects compulsory miss rate
    Larger blocks reduce compulsory misses (more data fetched on each miss) but can increase miss penalty (more bytes to transfer). There is a diminishing return — the optimal block size balances these two effects. GATE occasionally asks about the effect of doubling block size.

11. Frequently Asked Questions

What is AMAT in cache memory?

AMAT (Average Memory Access Time) is the weighted average time to access memory, accounting for both cache hits and misses: AMAT = Hit time + (Miss rate × Miss penalty). It is the primary metric for cache performance in GATE COA questions. For a two-level cache, the formula extends to include L2 hit time and L2 miss penalty.

What is the difference between direct mapped and set-associative cache?

A direct mapped cache has exactly one cache line per set — each memory block maps to exactly one location, making conflict misses likely. A k-way set-associative cache has k lines per set — a memory block maps to one set but can occupy any of the k positions, which dramatically reduces conflict misses at the cost of slightly more hardware to check k tags simultaneously.

How do you break down a memory address for cache?

Split the address into three fields (right to left): Block Offset = log₂(block size in bytes) bits, Index = log₂(number of sets) bits, Tag = remaining bits. For fully associative, there is no index field. Always verify that the three fields sum to the total physical address width.

What is the difference between write-through and write-back cache?

Write-through: every cache write is immediately propagated to main memory, keeping the two always consistent, at the cost of more memory bandwidth. Write-back: writes go only to cache; a dirty bit marks modified lines; main memory is updated only when a dirty line is evicted. Write-back uses less bandwidth but adds complexity, and typically performs better under heavy write traffic because it avoids a memory access on every write.
