Pipelining in Computer Organisation — Stages, Hazards, Speedup & GATE Numericals


Pipeline Stages, Structural/Data/Control Hazards, Speedup & Efficiency — Complete GATE CS Notes

Last updated: April 2026  |  Highest-yield COA topic for GATE CS

Key Takeaways

  • Pipelining overlaps execution of multiple instructions — like an assembly line — improving throughput without changing the time per instruction
  • A k-stage pipeline with n instructions takes (k + n − 1) clock cycles total
  • Speedup = (n × k) / (k + n − 1); as n → ∞, Speedup → k
  • Hazards disrupt the ideal pipeline: Structural (resource conflict), Data (dependency), Control (branches)
  • RAW (Read After Write) is the only true dependency and the most commonly tested hazard in simple pipelines — it causes stall cycles
  • Forwarding (bypassing) resolves most RAW hazards without stalls by routing output directly to the next stage
  • Control hazards are handled by branch prediction; a misprediction flushes the pipeline and wastes cycles

1. What Is Pipelining?

Imagine a car wash with four stations: soap, rinse, wax, dry. Without pipelining, car 2 cannot enter until car 1 has gone through all four stations. With pipelining, as soon as car 1 moves to rinse, car 2 enters soap — all four stations work on different cars simultaneously.

A CPU pipeline works the same way. Each instruction passes through a series of stages. As soon as instruction 1 moves from stage IF to stage ID, instruction 2 can enter stage IF. At steady state, every clock cycle completes one instruction — something impossible without pipelining.

Two critical points that GATE exploits in questions:

  • Pipelining improves throughput, not latency. Each individual instruction still takes k clock cycles to complete (where k is the number of stages). Pipelining does not make any one instruction faster — it completes more instructions per unit time.
  • All stages must take the same time. The clock cycle is set by the slowest pipeline stage. If one stage takes 20 ns and all others take 5 ns, the entire pipeline runs at 20 ns per cycle — a severe bottleneck.

2. The Classic 5-Stage Pipeline

The MIPS 5-stage pipeline is the standard reference for GATE COA. Every numerical problem in GATE assumes this structure unless told otherwise.

Stage | Name                               | What Happens                                                                        | Key Registers Updated
IF    | Instruction Fetch                  | Fetch instruction from memory at address in PC; PC ← PC + 4                         | IR (Instruction Register), PC
ID    | Instruction Decode / Register Read | Decode opcode; read source registers from register file; sign-extend immediate      | A, B (register values), Imm
EX    | Execute / Address Calculate        | ALU performs operation (arithmetic, logic, or address calculation for load/store)   | ALUout
MEM   | Memory Access                      | Load: read data memory; Store: write to data memory; ALU instructions: pass through | MDR (Memory Data Register)
WB    | Write Back                         | Write result into destination register in register file                             | Destination register

Pipeline Diagram (Space-Time Diagram)

The standard way to visualise a pipeline is a diagram where rows are instructions and columns are clock cycles:

Cycle:  1    2    3    4    5    6    7    8    9
I1:     IF   ID   EX   MEM  WB
I2:          IF   ID   EX   MEM  WB
I3:               IF   ID   EX   MEM  WB
I4:                    IF   ID   EX   MEM  WB
I5:                         IF   ID   EX   MEM  WB

5 instructions, 5 stages → completes in 5 + 5 − 1 = 9 cycles
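The fill-and-drain arithmetic above is easy to check programmatically. A minimal sketch (the function names and output layout are ours, not any standard API):

```python
# Space-time diagram and total-cycle count for an ideal k-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def total_cycles(n, k):
    """Ideal k-stage pipeline, n instructions: (k - 1) cycles to fill,
    then one instruction completes per cycle -> k + n - 1 total."""
    return k + n - 1

def diagram(n, stages=STAGES):
    """Rows are instructions, columns are clock cycles; instruction i
    (0-indexed) enters the first stage in cycle i + 1."""
    rows = []
    for i in range(n):
        cells = ["    "] * i + [s.ljust(4) for s in stages]
        rows.append(f"I{i+1}: " + "".join(cells).rstrip())
    return "\n".join(rows)

print(diagram(5))
print("total cycles:", total_cycles(5, 5))  # 5 + 5 - 1 = 9
```

Running this reproduces the staircase diagram and confirms the 9-cycle total for 5 instructions in 5 stages.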

3. Speedup, Efficiency & Throughput Formulas

Time without pipelining (sequential):
T_seq = n × k × t_c
(n instructions × k stages × t_c clock cycle time)

Time with pipelining:
T_pipe = (k + n − 1) × t_c
(k cycles to fill pipeline + (n−1) additional cycles for remaining instructions)

Speedup:
S = T_seq / T_pipe = (n × k) / (k + n − 1)

As n → ∞ (many instructions):
S → k (ideal speedup equals number of stages)

Pipeline Efficiency:
E = S / k = n / (k + n − 1)

Throughput (instructions per second):
TP = n / T_pipe = n / ((k + n − 1) × t_c)
As n → ∞: TP → 1 / t_c (one instruction per clock cycle)

Effective CPI with stalls:
CPI_eff = 1 + (stall cycles per instruction)

Clock cycle time — set by bottleneck stage:
t_c = max(t_stage1, t_stage2, …, t_stagek) + t_latch
(t_latch = pipeline register overhead, typically 1–2 ns)
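These formulas are easy to mis-apply under exam pressure. The sketch below wraps them for quick checking (function names are ours; t_latch defaults to zero unless the question gives it):

```python
def speedup(n, k):
    """S = (n * k) / (k + n - 1); approaches k as n grows."""
    return (n * k) / (k + n - 1)

def efficiency(n, k):
    """E = S / k = n / (k + n - 1)."""
    return n / (k + n - 1)

def throughput(n, k, t_c):
    """Instructions completed per unit time: n / ((k + n - 1) * t_c)."""
    return n / ((k + n - 1) * t_c)

def cycle_time(stage_times, t_latch=0.0):
    """Clock period is set by the slowest stage plus latch overhead."""
    return max(stage_times) + t_latch

print(speedup(10, 5))       # 50/14, about 3.57 -- far from the ideal 5
print(speedup(10_000, 5))   # very close to 5
print(cycle_time([5, 5, 20, 5, 5], t_latch=1))  # 21 -- bottleneck dominates
```

Note how a single 20 ns stage drags the whole clock to 21 ns even though the other stages take 5 ns, exactly the bottleneck effect described in Section 1.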

4. Pipeline Hazards — All Three Types

A hazard is any condition that prevents the next instruction from executing in its designated clock cycle. Hazards are the single most tested aspect of pipelining in GATE.

4.1 Structural Hazards

A structural hazard occurs when two instructions in the pipeline need the same hardware resource at the same time, and the hardware is not duplicated.

Classic example: A processor with a single unified memory (no separate instruction and data caches). In the same cycle, IF stage fetches an instruction (reads memory) while MEM stage accesses data (reads/writes memory). Both cannot use the same memory port simultaneously.

Solutions:

  • Split the cache into separate instruction cache and data cache (Harvard-style internally)
  • Insert a pipeline stall (bubble) — MEM gets priority, IF waits one cycle
  • Duplicate the resource if cost allows

Stall insertion for structural hazard:
Insert a “bubble” (NOP — No Operation) into the pipeline. The bubble keeps one stage idle and wastes one cycle per hazard.

4.2 Data Hazards — RAW, WAR, WAW

A data hazard occurs when an instruction depends on a result that has not yet been produced by an earlier instruction still in the pipeline.

Type | Full Name         | Description                                                       | Example                                                                   | Causes Stalls?
RAW  | Read After Write  | J reads a register before I has written its result                | I: ADD R1, R2, R3 → J: SUB R4, R1, R5 (J needs R1)                        | Yes — true dependency
WAR  | Write After Read  | J writes a register before I reads the old value                  | I: SUB R4, R1, R5 → J: ADD R1, R2, R3 (J overwrites R1 before I reads it) | Not in simple in-order pipelines
WAW  | Write After Write | J writes a register before I writes it, causing wrong final value | I: ADD R1, R2, R3 → J: SUB R1, R4, R5 (both write R1)                     | Not in simple in-order pipelines

In a simple 5-stage in-order pipeline like MIPS, only RAW hazards cause stalls. WAR and WAW are only problematic in out-of-order processors, where instructions execute out of program order.

How Many Stall Cycles Does a RAW Hazard Insert?

In a 5-stage pipeline (IF, ID, EX, MEM, WB), a value produced in EX is written back in WB. The instruction after the producer reads registers in ID. Assume, as GATE problems conventionally do, that the register file is written in the first half of WB and read in the second half of ID, so a consumer's ID may share a cycle with the producer's WB. If the instructions are consecutive:

Without forwarding:
Producer (I1) writes result in WB = cycle 5
Consumer (I2, immediately after I1) reads in ID = cycle 3
Gap = 5 − 3 = 2 stall cycles needed

If consumer is 2 instructions after producer (I3):
I3 reads in ID = cycle 4 → 5 − 4 = 1 stall cycle

If consumer is 3+ instructions after producer: no stall
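The gap arithmetic above reduces to one formula. A sketch (the function name is ours; it assumes the same-cycle WB-write/ID-read convention used in this section):

```python
def raw_stalls_no_forwarding(distance, wb_stage=5, id_stage=2):
    """Stall cycles for one RAW hazard, no forwarding.

    distance = 1 means the consumer immediately follows the producer.
    Producer reaches WB in cycle wb_stage; the consumer's hazard-free ID
    is cycle id_stage + distance (1-indexed). Since WB writes in the first
    half-cycle and ID reads in the second, ID may share the WB cycle.
    """
    return max(0, wb_stage - (id_stage + distance))

print(raw_stalls_no_forwarding(1))  # consumer right after producer -> 2
print(raw_stalls_no_forwarding(2))  # one instruction in between     -> 1
print(raw_stalls_no_forwarding(3))  # two or more in between         -> 0
```

The three printed cases match the three scenarios worked out above.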

4.3 Control Hazards (Branch Hazards)

A branch instruction (like BEQ, BNE, JMP) changes the program counter. The problem: the next instruction has already been fetched (and possibly decoded) before the CPU knows whether the branch is taken. If the branch is taken, the fetched instructions are wrong and must be discarded — wasting pipeline cycles.

Solutions to control hazards:

Solution                    | How It Works                                                                                  | Cost
Flush pipeline              | Discard instructions fetched after the branch; wait for branch resolution                     | (k − 1) stall cycles — worst case
Branch prediction (static)  | Always predict branch not taken (continue fetching); flush on misprediction                   | 0 cycles if correct; penalty cycles if wrong
Branch prediction (dynamic) | Use history table to predict taken/not-taken based on past behaviour                          | Misprediction penalty × misprediction rate
Delayed branch              | Always execute the instruction(s) after the branch (delay slot); compiler fills delay slot with useful work | 0 cycles if compiler fills slot well

Branch penalty cycles:
In a 5-stage pipeline where branch resolution happens at end of EX (stage 3):
Instructions fetched before branch resolved = 2 (IF and ID stages)
Branch penalty = 2 cycles (if branch is taken and not predicted)

CPI with branches:
CPI = 1 + (branch_frequency × branch_penalty × misprediction_rate)
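The CPI formula can be wrapped in a one-line helper (a sketch; the name and sample numbers are illustrative, matching the worked example in Section 6):

```python
def effective_cpi(branch_freq, mispredict_rate, penalty, base_cpi=1.0):
    """CPI = base + branch_freq * mispredict_rate * penalty.
    Only mispredicted branches pay the flush penalty."""
    return base_cpi + branch_freq * mispredict_rate * penalty

# 20% branches, 90%-accurate predictor, 2-cycle misprediction penalty:
print(effective_cpi(0.20, 0.10, 2))  # about 1.04
```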

5. Data Forwarding (Bypassing)

Forwarding is the most important solution to RAW hazards. Instead of waiting for a result to be written back to the register file, the result is forwarded directly from the pipeline register where it sits to the stage that needs it.

Without forwarding:
I1: ADD R1, R2, R3 → result available in WB (cycle 5)
I2: SUB R4, R1, R5 → needs R1 in ID (cycle 3) → 2 stalls

With forwarding (EX→EX path):
I1 produces result at end of EX (cycle 3)
I2 needs result at start of EX (cycle 4)
Forward EX/MEM pipeline register → EX ALU input → 0 stalls

Load-use hazard (cannot be fully forwarded):
I1: LW R1, 0(R2) → result available after MEM (cycle 4)
I2: SUB R4, R1, R5 → needs R1 in EX (cycle 4) → impossible without a stall
Solution: Insert 1 stall + forward MEM/WB → EX

The load-use hazard (a LOAD followed immediately by an instruction that uses the loaded value) is the one case where forwarding alone cannot eliminate the stall. This is why compilers try to schedule a useful instruction between a load and its consumer — a technique called load delay slot scheduling.
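With full forwarding, the stall rule collapses to a two-case check. A sketch (assuming the EX→EX and MEM→EX forwarding paths described above; the function name is ours):

```python
def raw_stalls_with_forwarding(producer_is_load, distance):
    """Stall cycles for one RAW hazard WITH full forwarding.

    producer_is_load: True if the producer is a load (value ready after MEM).
    distance: 1 means the consumer immediately follows the producer.
    ALU results forward EX->EX with no stall; only a load followed
    immediately by a use of the loaded value needs one bubble.
    """
    if producer_is_load and distance == 1:
        return 1  # value ready after MEM -- one cycle too late for EX
    return 0

print(raw_stalls_with_forwarding(False, 1))  # ALU producer        -> 0
print(raw_stalls_with_forwarding(True, 1))   # load-use pair       -> 1
print(raw_stalls_with_forwarding(True, 2))   # one insn in between -> 0
```

This is exactly why compilers try to hoist an independent instruction into the slot after a load: moving the consumer one position later turns the 1 into a 0.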

6. GATE-Level Worked Examples

Example 1 — Basic Speedup Calculation (GATE 2020)

Problem: A non-pipelined processor has a clock cycle of 40 ns. A 5-stage pipelined version of the same processor has a clock cycle of 10 ns. What is the speedup for a program with 100 instructions?

Solution:
The non-pipelined processor completes one instruction per 40 ns clock cycle, so:
T_sequential = 100 × 40 ns = 4,000 ns

T_pipelined = (5 + 100 − 1) × 10 ns = 104 × 10 = 1,040 ns

Speedup = 4,000 / 1,040 ≈ 3.85

Note: Speedup is less than 5 (the number of stages) because n = 100 is not infinite, and the pipelined clock (10 ns) is slower than the ideal 40/5 = 8 ns because of stage imbalance and latch overhead.
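The arithmetic can be double-checked in a couple of lines (numbers taken from this example, with the non-pipelined machine completing one instruction per 40 ns cycle):

```python
n, k = 100, 5
t_seq = n * 40                   # one 40 ns cycle per instruction
t_pipe = (k + n - 1) * 10        # (k + n - 1) cycles of 10 ns each
print(round(t_seq / t_pipe, 2))  # 3.85
```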

Example 2 — Stall Cycles with RAW Hazards (GATE 2022 style)

Problem: In a 5-stage pipeline without forwarding, the following instruction sequence is executed. How many stall cycles are inserted?

I1: ADD R1, R2, R3
I2: SUB R4, R1, R5    ← needs R1 (produced by I1)
I3: MUL R6, R4, R7    ← needs R4 (produced by I2)
I4: DIV R8, R6, R9    ← needs R6 (produced by I3)
Solution (no forwarding; register file written in first half of WB, read in second half of ID):

Each consumer's ID must wait for its producer's WB. Track the cycles:
I1: IF 1, ID 2, EX 3, MEM 4, WB 5
I2: needs R1 → ID in cycle 5 (instead of 3) → 2 stalls; WB in cycle 8
I3: needs R4 → ID in cycle 8 (instead of 6) → 2 stalls; WB in cycle 11
I4: needs R6 → ID in cycle 11 (instead of 9) → 2 stalls

Note that inserting stalls shifts the producer and its consumer equally, so every dependent pair remains adjacent and each pair costs the full 2 stalls.

Total stalls = 2 + 2 + 2 = 6 stall cycles
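Chained-dependency bookkeeping like this is easy to get wrong by hand, so it is worth cross-checking with a tiny in-order scheduler. A sketch under the same WB-write/ID-read same-cycle assumption (`deps` maps each consumer's 1-based index to its producer's index; all names are ours):

```python
def total_stalls(deps, n, wb_stage=5, id_stage=2):
    """Total stall cycles for n in-order instructions, no forwarding.

    An instruction's ID may happen no earlier than (a) its ideal slot,
    (b) one cycle after the previous instruction's ID (in-order issue),
    and (c) its producer's WB cycle (same-cycle write/read allowed).
    """
    id_cycle = {0: id_stage - 1}  # sentinel so instruction 1 gets its ideal slot
    stalls = 0
    for i in range(1, n + 1):
        no_hazard = max(id_stage + (i - 1), id_cycle[i - 1] + 1)
        actual = no_hazard
        if i in deps:
            producer_wb = id_cycle[deps[i]] + (wb_stage - id_stage)
            actual = max(actual, producer_wb)
        stalls += actual - no_hazard
        id_cycle[i] = actual
    return stalls

# Example 2 chain: I2 reads I1, I3 reads I2, I4 reads I3
print(total_stalls({2: 1, 3: 2, 4: 3}, 4))  # 6
```

The scheduler also reproduces the single-hazard cases from Section 4.2: an adjacent pair costs 2 stalls, and a producer/consumer one apart costs 1.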

Example 3 — Effective CPI with Branches (GATE 2023 style)

Problem: A 5-stage pipeline resolves branches at the end of EX stage. 20% of instructions are branches. The branch predictor is correct 90% of the time. Branch misprediction penalty is 2 cycles. What is the effective CPI?

Solution:
Base CPI = 1
Branch penalty per misprediction = 2 cycles
Misprediction rate = 1 − 0.90 = 0.10
Branch frequency = 0.20

Stall cycles per instruction = branch_freq × misprediction_rate × penalty
= 0.20 × 0.10 × 2 = 0.04

Effective CPI = 1 + 0.04 = 1.04

7. Common Mistakes

  1. Using k cycles (not k + n − 1) for total pipeline time
    The total time is (k + n − 1) clock cycles, not k. The k + n − 1 accounts for (k − 1) cycles to fill the pipeline initially, then 1 cycle per instruction. Using just k gives the time for one instruction, not n instructions.
  2. Confusing latency and throughput
    Pipelining increases throughput (instructions per second) but does NOT reduce latency (time for one instruction). A 5-stage pipeline still takes 5 clock cycles per instruction. GATE sometimes asks which pipelining improves — the answer is always throughput.
  3. Missing the load-use hazard as a special case
    Students apply forwarding and claim zero stalls for all RAW hazards. The load-use case (LW followed immediately by a use of the loaded register) always requires 1 stall even with forwarding, because the data is only available after the MEM stage — one cycle too late for EX forwarding.
  4. Forgetting the latch overhead in clock cycle calculation
    The pipeline clock cycle = max(stage time) + latch overhead. If a question gives stage times and latch delay separately, add them. Forgetting the latch overhead underestimates the clock cycle time.
  5. Claiming WAR and WAW cause stalls in a simple pipeline
    In a simple 5-stage in-order pipeline, WAR and WAW hazards do not cause stalls because instructions complete in order. These hazards matter in superscalar or out-of-order processors. GATE usually tests in simple in-order pipelines — be precise about which model the question assumes.

8. Frequently Asked Questions

What is pipelining in computer organisation?

Pipelining is a performance technique where the execution of multiple instructions overlaps in time. Just like an assembly line processes multiple cars at different stations simultaneously, a CPU pipeline processes multiple instructions in different stages simultaneously. A k-stage pipeline can have up to k instructions in flight at once, ideally completing one instruction per clock cycle at steady state.

What are the three types of pipeline hazards?

Structural hazards arise when two instructions compete for the same hardware resource at the same time. Data hazards arise when an instruction needs a result that a previous instruction has not yet produced — the most important type is RAW (Read After Write). Control hazards arise from branch instructions, where the CPU fetches the wrong instruction if it does not know whether the branch will be taken.

What is the speedup formula for pipelining?

Speedup = (n × k) / (k + n − 1), where n is the number of instructions and k is the number of stages. For infinite instructions, speedup approaches k. The efficiency = Speedup / k = n / (k + n − 1), which shows how fully the pipeline is utilised.

What is the difference between RAW, WAR, and WAW hazards?

RAW (Read After Write) is a true data dependency — a consumer reads before the producer writes. It causes stalls in in-order pipelines and is the most common hazard. WAR (Write After Read) is an anti-dependency — an instruction writes before a previous one finishes reading. WAW (Write After Write) is an output dependency — two instructions write the same register and order matters. WAR and WAW only cause problems in out-of-order processors; in-order pipelines are immune.
