Memory Hierarchy in Computer Organisation
Registers, Cache, DRAM, ROM, Virtual Memory, TLB & Memory Organisation — Complete GATE CS Notes
Last updated: April 2026 | GATE CS syllabus aligned
Key Takeaways
- Memory hierarchy trades off speed, size, and cost — faster memory is smaller and more expensive per bit
- SRAM (flip-flop based) is used for cache — fast, no refresh, expensive. DRAM (capacitor based) is used for main memory — slower, needs refresh, cheap
- Memory organisation: to expand address space or data width, chips can be connected in a bank (word extension) or interleaved (bandwidth improvement)
- Virtual memory allows processes to use more address space than physical RAM — the OS manages page tables and handles page faults
- TLB (Translation Lookaside Buffer) caches recent address translations — a TLB hit avoids page table memory accesses
- Effective memory access time with TLB = TLB time + (TLB miss rate × page table access time) + memory access time
- Thrashing occurs when page fault rate is so high that the CPU barely executes program instructions
1. The Memory Hierarchy Pyramid
Every computer manages a fundamental tension: the memory closest to the CPU (registers) is blazing fast but can only hold a handful of values; the memory that can store your entire operating system and applications (hard disk) is enormous but agonisingly slow. The memory hierarchy bridges this gap by using multiple layers, each faster and more expensive than the layer below it.
| Level | Technology | Typical Size | Access Time | Managed By |
|---|---|---|---|---|
| Registers | Flip-flops (SRAM) | ~256 bytes (32 × 64-bit) | 0.3–0.5 ns | Compiler |
| L1 Cache | SRAM | 32–64 KB per core | 1–4 cycles (≈1 ns) | Hardware |
| L2 Cache | SRAM | 256 KB – 1 MB | 10–20 cycles (≈5 ns) | Hardware |
| L3 Cache | SRAM | 4–32 MB (shared) | 30–50 cycles (≈15 ns) | Hardware |
| Main Memory | DRAM | 4–128 GB | 100–200 cycles (≈60 ns) | OS + Hardware |
| SSD | NAND Flash | 128 GB – 8 TB | ~50–100 μs | OS |
| HDD | Magnetic disk | 500 GB – 20 TB | ~5–10 ms | OS |
| Optical / Tape | Various | Effectively unlimited | Seconds to minutes | Operator |
The principle that makes this hierarchy work: most programs access a small subset of their data most of the time (locality of reference). If the cache holds the 1% of data that accounts for 99% of accesses, average access time is close to cache speed despite main memory being 100× slower.
2. SRAM vs DRAM
| Property | SRAM (Static RAM) | DRAM (Dynamic RAM) |
|---|---|---|
| Storage element | 6-transistor flip-flop (bistable latch) | 1 capacitor + 1 transistor |
| Data retention | Holds data as long as power is on (no refresh) | Capacitor leaks — must refresh every ~64 ms |
| Speed | Very fast: 1–5 ns | Slower: 50–100 ns |
| Density | Low — 6 transistors per bit | High — 1 transistor + capacitor per bit |
| Power consumption | Higher (always active) | Lower when not accessed |
| Cost | Expensive per MB | Cheap per GB |
| Use | L1/L2/L3 cache, register files, TLB | Main memory (RAM) |
DRAM must be refreshed every T_refresh period (typically 64 ms).
During refresh, memory is unavailable for access.
Overhead % = (Refresh time per row × Number of rows) / T_refresh × 100%
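The overhead formula can be sanity-checked with a quick sketch. The row count and per-row refresh time below are illustrative assumptions, not figures for any specific DRAM part:

```python
# Hypothetical DRAM: 8192 rows, 50 ns to refresh one row,
# full refresh of every row required each 64 ms (T_refresh).
refresh_time_per_row_ns = 50
num_rows = 8192
t_refresh_ns = 64e6  # 64 ms expressed in ns

# Overhead % = (refresh time per row x number of rows) / T_refresh x 100
overhead_pct = (refresh_time_per_row_ns * num_rows) / t_refresh_ns * 100
print(f"Refresh overhead: {overhead_pct:.2f}%")  # 0.64%
```

Even with thousands of rows, refresh typically steals well under 1% of available memory bandwidth — which is why DRAM's refresh cost is acceptable for main memory.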
3. ROM Types — ROM, PROM, EPROM, EEPROM, Flash
Read-Only Memory is non-volatile — it retains data without power. Different ROM types offer different trade-offs between reprogrammability and complexity.
| Type | Full Name | Programmed By | Erasable? | Use |
|---|---|---|---|---|
| ROM | Read-Only Memory | Manufacturer (mask ROM) | No | Fixed firmware in consumer devices |
| PROM | Programmable ROM | User (one-time, by burning fuses) | No | Small-run production firmware |
| EPROM | Erasable PROM | User (electrically) | Yes — UV light (entire chip) | Development and prototyping |
| EEPROM | Electrically Erasable PROM | User (electrically) | Yes — byte by byte, electrically | BIOS chips, smart cards |
| Flash | Flash EEPROM | User (electrically) | Yes — in blocks, electrically | SSDs, USB drives, BIOS, phones |
4. Memory Organisation — Chips, Banks & Interleaving
Real systems build their memory from multiple chips. Two configurations matter for GATE:
Word Extension (Increasing Address Space)
Connect chips to cover a larger address range — each chip covers a portion of the address space.
Total memory needed = N bytes
Each chip holds = C bytes
Number of chips = N / C
Address lines needed from CPU = log₂(N / word_size_in_bytes)
Address lines handled by chip select = log₂(N / C)
Address lines going into each chip = log₂(C / word_size)
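These formulas can be worked through in code. The numbers here are an assumed example (64 KB of byte-addressable memory built from 16 KB chips), not from the text:

```python
import math

# Assumed example: 64 KB total, 16 KB per chip, 1-byte words
N = 64 * 1024        # total memory in bytes
C = 16 * 1024        # bytes per chip
word_size = 1        # bytes per word

num_chips = N // C                                 # chips in the bank
cpu_addr_lines = int(math.log2(N // word_size))    # total address lines from CPU
chip_select_lines = int(math.log2(N // C))         # decoded into chip-select signals
per_chip_lines = int(math.log2(C // word_size))    # lines wired to every chip

print(num_chips, cpu_addr_lines, chip_select_lines, per_chip_lines)  # 4 16 2 14
```

Note that chip-select lines plus per-chip lines always equal the CPU's total address lines — the high-order bits pick the chip, the low-order bits pick the location within it.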
Bit Extension (Wider Data Word)
Connect multiple chips in parallel to increase data width — each chip provides some bits of each word.
If CPU has 32-bit data bus and each chip is 8 bits wide:
Chips needed = 32 / 8 = 4 chips (connected in parallel)
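Word extension and bit extension combine multiplicatively: rows of chips provide depth, columns provide width. A minimal sketch (function name is ours, for illustration):

```python
def chips_needed(total_words, word_bits, chip_words, chip_bits):
    """Chip count = rows (word extension) x columns (bit extension)."""
    rows = total_words // chip_words   # word extension: cover the address range
    cols = word_bits // chip_bits      # bit extension: cover the data width
    return rows * cols

# 32-bit data bus from 8-bit-wide chips of the same depth: 4 chips in parallel
print(chips_needed(1024, 32, 1024, 8))       # 4
# GATE-style: 32K x 8-bit memory from 1K x 4-bit chips
print(chips_needed(32 * 1024, 8, 1024, 4))   # 64
```

The second call matches the kind of chip-organisation problem worked through in the examples section.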
Memory Interleaving
Interleaving divides memory into banks that can be accessed independently. While one bank is being read, the others are already preparing their next access — increasing effective memory bandwidth.
| Property | Low-Order Interleaving | High-Order Interleaving (Banking) |
|---|---|---|
| Bank selection bits | Low-order address bits → consecutive addresses in different banks | High-order bits → consecutive addresses in same bank |
| Best for | Sequential access (streaming) — exploits spatial locality | Independent bank access by multiple processors |
| Conflict | Stride-k access causes conflicts when k is a multiple of the bank count | Sequential access serialised within one bank |
5. Virtual Memory
Virtual memory is an abstraction that gives each process the illusion of having a large, private address space, even if physical RAM is limited. The OS and hardware collaborate to map virtual addresses to physical addresses, swapping pages between RAM and disk as needed.
Why virtual memory?
- Isolation: each process has its own virtual address space — process A cannot accidentally overwrite process B’s memory
- Size: a process’s virtual address space can be larger than physical RAM (64-bit processes can address 16 exabytes)
- Simplicity: programmers do not need to manage physical memory locations
Virtual page number (VPN) = Virtual address / Page size
Page offset = Virtual address mod Page size (same in physical address)
Physical Frame Number (PFN) = lookup VPN in page table
Physical Address = PFN × Page size + Page offset
Number of pages in virtual space:
= Virtual address space / Page size
= 2^(virtual address bits) / 2^(page offset bits)
= 2^(VPN bits)
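The translation arithmetic above can be sketched directly. The page size and page-table mappings below are assumptions chosen for illustration:

```python
PAGE_SIZE = 4096  # assumed 4 KB pages

# Toy page table: VPN -> PFN (mappings invented for this sketch)
page_table = {0: 7, 1: 3, 2: 9}

def translate(vaddr):
    vpn = vaddr // PAGE_SIZE
    offset = vaddr % PAGE_SIZE       # offset is unchanged in the physical address
    pfn = page_table[vpn]            # a missing key stands in for a page fault here
    return pfn * PAGE_SIZE + offset

# VA 0x1234 -> VPN 1, offset 0x234 -> PFN 3 -> PA = 3*4096 + 0x234
print(hex(translate(0x1234)))  # 0x3234
```

Note how only the high-order bits (the VPN) change during translation — the low-order offset bits pass straight through.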
6. Page Tables & Address Translation
The page table is an array (typically in main memory) where each entry maps one virtual page to one physical frame. Each entry (PTE — Page Table Entry) contains:
- Valid bit: 1 = page is in physical memory; 0 = page is on disk (triggers a page fault)
- Physical Frame Number (PFN): the location in RAM
- Protection bits: read/write/execute permissions
- Dirty bit: page has been written (must save to disk on eviction)
- Reference/Access bit: page has been accessed recently (used by replacement algorithms)
Page table size = Number of virtual pages × Size of each PTE
                = (Virtual address space / Page size) × PTE size
Example: 32-bit virtual address, 4 KB pages, 4-byte PTE:
Number of pages = 2³² / 2¹² = 2²⁰ = 1,048,576 pages
Page table size = 1,048,576 × 4 bytes = 4 MB (per process — huge!)
This 4 MB per-process overhead led to multi-level page tables: the page table itself is paged, so only the portions needed are kept in memory. A 2-level page table splits the VPN into two fields — a first-level index and a second-level index. Modern processors use 4-level page tables for 64-bit addressing.
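The 2-level split can be shown with bit masks. The 10 + 10 + 12 split below is the classic 32-bit/4 KB layout assumed for illustration:

```python
OFFSET_BITS = 12   # 4 KB pages
L2_BITS = 10       # second-level (page table) index
L1_BITS = 10       # first-level (page directory) index

def split_vaddr(vaddr):
    """Split a 32-bit virtual address into (L1 index, L2 index, offset)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    l2 = (vaddr >> OFFSET_BITS) & ((1 << L2_BITS) - 1)
    l1 = vaddr >> (OFFSET_BITS + L2_BITS)
    return l1, l2, offset

print(split_vaddr(0xDEADBEEF))  # (890, 731, 3823)
```

With this split, a process touching only a few MB of memory needs the first-level table plus a handful of second-level tables — far less than the flat 4 MB.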
7. TLB — Translation Lookaside Buffer
Every memory access in a virtual memory system requires at least two physical memory accesses: one to read the page table, and one for the actual data. This doubles memory access time — unacceptable for performance. The TLB solves this.
The TLB is a small (16–512 entries), fully-associative, on-chip cache that stores recent VPN→PFN mappings. On most accesses, the TLB provides the physical address in 1–2 cycles, avoiding the page table lookup entirely.
EMAT = TLB hit rate × (TLB time + Memory time)
+ TLB miss rate × (TLB time + Page table time + Memory time)
Simplified (if TLB access is part of every translation):
EMAT = TLB time + (TLB miss rate × Page table walk time) + Memory time
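Both forms of the formula reduce to the same weighted average. A minimal sketch, assuming a single-level page table and ignoring page faults:

```python
def emat(tlb_hit_rate, tlb_ns, mem_ns, page_table_ns):
    """Weighted EMAT: hit path vs miss path (single-level page table, no page faults)."""
    hit_time = tlb_ns + mem_ns                   # TLB lookup + data access
    miss_time = tlb_ns + page_table_ns + mem_ns  # + one page table access
    return tlb_hit_rate * hit_time + (1 - tlb_hit_rate) * miss_time

# 90% hit rate, 10 ns TLB, 100 ns memory, 100 ns page table access
print(emat(0.90, 10, 100, 100))
```

Plugging in these numbers gives 120 ns, matching the GATE-style worked example later in these notes.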
TLB hit: Physical address found in TLB → access data in memory
(Total: TLB lookup + 1 memory access)
TLB miss, page in memory: TLB lookup fails → walk page table (1+ memory accesses) → update TLB → access data
(Total: TLB lookup + k page table accesses + 1 memory access, where k = page table levels)
TLB miss, page fault: Page not in RAM → OS handles page fault → load page from disk → retry
(Disk access: millions of cycles — extremely expensive)
| Event | TLB | Page Table | Physical Memory | Disk |
|---|---|---|---|---|
| TLB Hit | Hit | Not needed | Accessed | Not needed |
| TLB Miss, Page In RAM | Miss | Accessed (to get PFN) | Accessed | Not needed |
| TLB Miss, Page Fault | Miss | Accessed (valid bit = 0) | Page loaded into RAM | Page read from disk |
8. Thrashing
Thrashing is a catastrophic performance failure in virtual memory systems. It happens when the total working set of all running processes exceeds available physical memory. Pages that are constantly needed keep getting evicted to make room for other pages that are also constantly needed — causing a continuous, high-rate stream of page faults.
The symptom: CPU utilisation drops sharply (often below 10%) even as I/O activity spikes, because the CPU is idle waiting for pages to be loaded from disk.
Causes of thrashing:
- Too many processes running simultaneously (high degree of multiprogramming)
- A process whose working set exceeds available frames
Solutions:
- Reduce the degree of multiprogramming (swap out a process entirely)
- Increase physical memory
- Use working set model — allocate enough frames for each process to hold its working set
- Use page fault frequency (PFF) algorithm — add frames to processes with high fault rates; reclaim from processes with low fault rates
9. GATE-Level Worked Examples
Example 1 — Memory Chip Organisation (GATE 2020)
Problem: A memory system requires 32 KB of total memory with 8-bit wide access. Available chips are 1K × 4-bit. How many chips are needed?
Required: 32 KB = 32 × 1024 = 32,768 locations × 8 bits
Each chip: 1K locations × 4 bits = 1024 × 4
Chips for bit extension (to get 8 bits wide): 8 / 4 = 2 chips per row
Chips for word extension (to get 32K locations): 32K / 1K = 32 rows
Total chips = 2 × 32 = 64 chips
Example 2 — Page Table Size (GATE 2021)
Problem: A 32-bit virtual address system uses pages of size 8 KB. Each page table entry is 4 bytes. What is the size of the page table for a single process?
Page size = 8 KB = 2¹³ bytes → Page offset = 13 bits
VPN bits = 32 − 13 = 19 bits
Number of virtual pages = 2¹⁹ = 524,288 pages
Page table size = 524,288 × 4 bytes = 2,097,152 bytes = 2 MB
Example 3 — EMAT with TLB (GATE 2022)
Problem: A system has a TLB with 90% hit rate. TLB access time = 10 ns. Main memory access time = 100 ns. On a TLB miss, one additional page table access is needed. What is the EMAT?
TLB hit: 10 ns (TLB) + 100 ns (memory) = 110 ns
TLB miss: 10 ns (TLB) + 100 ns (page table in memory) + 100 ns (data in memory) = 210 ns
EMAT = 0.90 × 110 + 0.10 × 210
= 99 + 21 = 120 ns
10. Common Mistakes
- Confusing SRAM and DRAM refresh requirements: SRAM does not need refresh — it is a bistable circuit that holds its state indefinitely while powered. DRAM capacitors leak and must be refreshed every ~64 ms. Saying “SRAM needs periodic refresh” is a common incorrect choice in GATE MCQs.
- Forgetting the page table access in the EMAT calculation: on a TLB miss, the CPU must access the page table in main memory before accessing the data. Students often count only the data memory access after a TLB miss, missing the page table lookup. For a single-level page table, a TLB miss adds 1 extra memory access.
- Mixing virtual address bits and physical address bits: virtual and physical address spaces can be different sizes. Page table size is determined by the virtual address space (number of virtual pages); physical frame addresses are determined by the physical address space. Always distinguish which address space the question refers to.
- Computing page table size using physical frames, not virtual pages: the page table has one entry per virtual page, not per physical frame. Even if physical RAM is only 512 MB (fewer frames), the page table still needs entries for all 2^VPN virtual pages.
- Assuming thrashing is caused by a single process: thrashing is a system-level phenomenon caused by the aggregate working set of all running processes exceeding physical memory. A single process with a small working set will not thrash — but it becomes a victim when other processes push total demand over the limit.
11. Frequently Asked Questions
What is the memory hierarchy in computer organisation?
The memory hierarchy is a layered arrangement of storage, ordered from fastest/smallest/most expensive (registers) to slowest/largest/cheapest (tape or optical). Each level acts as a cache for the level below it. The hierarchy works because most programs exhibit locality — they reuse a small, changing subset of their data most of the time.
What is the difference between SRAM and DRAM?
SRAM uses a 6-transistor bistable flip-flop to store each bit. It is fast (1–5 ns), does not require refreshing, but is large and expensive. Used for CPU caches. DRAM stores each bit in a tiny capacitor that leaks charge — it requires periodic refresh every ~64 ms to retain data, but offers very high density and low cost per gigabyte. Used for main memory.
What is a TLB in computer organisation?
The Translation Lookaside Buffer (TLB) is a small, fully-associative, on-chip cache that stores recent virtual-to-physical address translations. Without the TLB, every memory access would require at least two physical memory reads — one to look up the page table, one for the actual data. The TLB reduces most translations to a single on-chip lookup, dramatically reducing average memory access time.
What is thrashing in memory management?
Thrashing is a condition where the system’s page fault rate is so high that it spends more time handling page faults (loading pages from disk) than executing program instructions. CPU utilisation collapses, I/O spikes, and the system appears frozen. It is caused when total working set size exceeds physical memory. The solution is to reduce the number of concurrent processes or add physical memory.