Pipelining in Computer Organisation — Stages, Hazards, Speedup & GATE Numericals


Pipeline Stages, Structural/Data/Control Hazards, Speedup & Efficiency — Complete GATE CS Notes

Last updated: April 2026  |  Highest-yield COA topic for GATE CS

Key Takeaways

  • Pipelining overlaps execution of multiple instructions — like an assembly line — improving throughput without changing the time per instruction
  • A k-stage pipeline with n instructions takes (k + n − 1) clock cycles total
  • Speedup = (n × k) / (k + n − 1); as n → ∞, Speedup → k
  • Hazards disrupt the ideal pipeline: Structural (resource conflict), Data (dependency), Control (branches)
  • RAW (Read After Write) is the only true dependency and the most commonly tested hazard in simple pipelines — it causes stall cycles
  • Forwarding (bypassing) resolves most RAW hazards without stalls by routing output directly to the next stage
  • Control hazards are handled by branch prediction; a misprediction flushes the pipeline and wastes cycles

1. What Is Pipelining?

Imagine a car wash with four stations: soap, rinse, wax, dry. Without pipelining, car 2 cannot enter until car 1 has gone through all four stations. With pipelining, as soon as car 1 moves to rinse, car 2 enters soap — all four stations work on different cars simultaneously.

A CPU pipeline works the same way. Each instruction passes through a series of stages. As soon as instruction 1 moves from stage IF to stage ID, instruction 2 can enter stage IF. At steady state, every clock cycle completes one instruction — something impossible without pipelining.

Two critical points that GATE exploits in questions:

  • Pipelining improves throughput, not latency. Each individual instruction still takes k clock cycles to complete (where k is the number of stages). Pipelining does not make any one instruction faster — it completes more instructions per unit time.
  • All stages must take the same time. The clock cycle is set by the slowest pipeline stage. If one stage takes 20 ns and all others take 5 ns, the entire pipeline runs at 20 ns per cycle — a severe bottleneck.

2. The Classic 5-Stage Pipeline

The MIPS 5-stage pipeline is the standard reference for GATE COA. Every numerical problem in GATE assumes this structure unless told otherwise.

Stage | Name                               | What Happens                                                                        | Key Registers Updated
IF    | Instruction Fetch                  | Fetch instruction from memory at address in PC; PC ← PC + 4                         | IR (Instruction Register), PC
ID    | Instruction Decode / Register Read | Decode opcode; read source registers from register file; sign-extend immediate      | A, B (register values), Imm
EX    | Execute / Address Calculate        | ALU performs operation (arithmetic, logic, or address calculation for load/store)   | ALUout
MEM   | Memory Access                      | Load: read data memory; Store: write to data memory; ALU instructions: pass through | MDR (Memory Data Register)
WB    | Write Back                         | Write result into destination register in register file                             | Destination register

Pipeline Diagram (Space-Time Diagram)

The standard way to visualise a pipeline is a diagram where rows are instructions and columns are clock cycles:

Cycle:  1    2    3    4    5    6    7    8    9
I1:     IF   ID   EX   MEM  WB
I2:          IF   ID   EX   MEM  WB
I3:               IF   ID   EX   MEM  WB
I4:                    IF   ID   EX   MEM  WB
I5:                         IF   ID   EX   MEM  WB

5 instructions, 5 stages → completes in 5 + 5 − 1 = 9 cycles
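The fill-and-drain arithmetic above is easy to check programmatically. A minimal sketch (the function names and output layout are ours, not any standard API):

```python
# Space-time diagram and total-cycle count for an ideal k-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def total_cycles(n, k):
    """Ideal k-stage pipeline, n instructions: (k - 1) cycles to fill,
    then one instruction completes per cycle -> k + n - 1 total."""
    return k + n - 1

def diagram(n, stages=STAGES):
    """Rows are instructions, columns are clock cycles; instruction i
    (0-indexed) enters the first stage in cycle i + 1."""
    rows = []
    for i in range(n):
        cells = ["    "] * i + [s.ljust(4) for s in stages]
        rows.append(f"I{i+1}: " + "".join(cells).rstrip())
    return "\n".join(rows)

print(diagram(5))
print("total cycles:", total_cycles(5, 5))  # 5 + 5 - 1 = 9
```

Running this reproduces the staircase diagram and confirms the 9-cycle total for 5 instructions in 5 stages.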

3. Speedup, Efficiency & Throughput Formulas

Time without pipelining (sequential):
T_seq = n × k × t_c
(n instructions × k stages × t_c clock cycle time)

Time with pipelining:
T_pipe = (k + n − 1) × t_c
(k cycles to fill pipeline + (n−1) additional cycles for remaining instructions)

Speedup:
S = T_seq / T_pipe = (n × k) / (k + n − 1)

As n → ∞ (many instructions):
S → k (ideal speedup equals number of stages)

Pipeline Efficiency:
E = S / k = n / (k + n − 1)

Throughput (instructions per second):
TP = n / T_pipe = n / ((k + n − 1) × t_c)
As n → ∞: TP → 1 / t_c (one instruction per clock cycle)

Effective CPI with stalls:
CPI_eff = 1 + (stall cycles per instruction)

Clock cycle time — set by bottleneck stage:
t_c = max(t_stage1, t_stage2, …, t_stagek) + t_latch
(t_latch = pipeline register overhead, typically 1–2 ns)
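These formulas are easy to mis-apply under exam pressure. The sketch below wraps them for quick checking (function names are ours; t_latch defaults to zero unless the question gives it):

```python
def speedup(n, k):
    """S = (n * k) / (k + n - 1); approaches k as n grows."""
    return (n * k) / (k + n - 1)

def efficiency(n, k):
    """E = S / k = n / (k + n - 1)."""
    return n / (k + n - 1)

def throughput(n, k, t_c):
    """Instructions completed per unit time: n / ((k + n - 1) * t_c)."""
    return n / ((k + n - 1) * t_c)

def cycle_time(stage_times, t_latch=0.0):
    """Clock period is set by the slowest stage plus latch overhead."""
    return max(stage_times) + t_latch

print(speedup(10, 5))       # 50/14, about 3.57 -- far from the ideal 5
print(speedup(10_000, 5))   # very close to 5
print(cycle_time([5, 5, 20, 5, 5], t_latch=1))  # 21 -- bottleneck dominates
```

Note how a single 20 ns stage drags the whole clock to 21 ns even though the other stages take 5 ns, exactly the bottleneck effect described in Section 1.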

4. Pipeline Hazards — All Three Types

A hazard is any condition that prevents the next instruction from executing in its designated clock cycle. Hazards are the single most tested aspect of pipelining in GATE.

4.1 Structural Hazards

A structural hazard occurs when two instructions in the pipeline need the same hardware resource at the same time, and the hardware is not duplicated.

Classic example: A processor with a single unified memory (no separate instruction and data caches). In the same cycle, IF stage fetches an instruction (reads memory) while MEM stage accesses data (reads/writes memory). Both cannot use the same memory port simultaneously.

Solutions:

  • Split the cache into separate instruction cache and data cache (Harvard-style internally)
  • Insert a pipeline stall (bubble) — MEM gets priority, IF waits one cycle
  • Duplicate the resource if cost allows

Stall insertion for structural hazard:
Insert a “bubble” (NOP — No Operation) into the pipeline. The bubble keeps one stage idle and wastes one cycle per hazard.

4.2 Data Hazards — RAW, WAR, WAW

A data hazard occurs when an instruction depends on a result that has not yet been produced by an earlier instruction still in the pipeline.

Type | Full Name         | Description                                                       | Example                                                                   | Causes Stalls?
RAW  | Read After Write  | J reads a register before I has written its result                | I: ADD R1, R2, R3 → J: SUB R4, R1, R5 (J needs R1)                        | Yes — true dependency
WAR  | Write After Read  | J writes a register before I reads the old value                  | I: SUB R4, R1, R5 → J: ADD R1, R2, R3 (J overwrites R1 before I reads it) | Not in simple in-order pipelines
WAW  | Write After Write | J writes a register before I writes it, causing wrong final value | I: ADD R1, R2, R3 → J: SUB R1, R4, R5 (both write R1)                     | Not in simple in-order pipelines

In a simple 5-stage in-order pipeline like MIPS, only RAW hazards cause stalls. WAR and WAW are only problematic in out-of-order processors, where instructions execute out of program order.

How Many Stall Cycles Does a RAW Hazard Insert?

In a 5-stage pipeline (IF, ID, EX, MEM, WB), a value produced in EX is written back in WB. The instruction after the producer reads registers in ID. Assume, as GATE problems conventionally do, that the register file is written in the first half of WB and read in the second half of ID, so a consumer's ID may share a cycle with the producer's WB. If the instructions are consecutive:

Without forwarding:
Producer (I1) writes result in WB = cycle 5
Consumer (I2, immediately after I1) reads in ID = cycle 3
Gap = 5 − 3 = 2 stall cycles needed

If consumer is 2 instructions after producer (I3):
I3 reads in ID = cycle 4 → 5 − 4 = 1 stall cycle

If consumer is 3+ instructions after producer: no stall
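The gap arithmetic above reduces to one formula. A sketch (the function name is ours; it assumes the same-cycle WB-write/ID-read convention used in this section):

```python
def raw_stalls_no_forwarding(distance, wb_stage=5, id_stage=2):
    """Stall cycles for one RAW hazard, no forwarding.

    distance = 1 means the consumer immediately follows the producer.
    Producer reaches WB in cycle wb_stage; the consumer's hazard-free ID
    is cycle id_stage + distance (1-indexed). Since WB writes in the first
    half-cycle and ID reads in the second, ID may share the WB cycle.
    """
    return max(0, wb_stage - (id_stage + distance))

print(raw_stalls_no_forwarding(1))  # consumer right after producer -> 2
print(raw_stalls_no_forwarding(2))  # one instruction in between     -> 1
print(raw_stalls_no_forwarding(3))  # two or more in between         -> 0
```

The three printed cases match the three scenarios worked out above.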

4.3 Control Hazards (Branch Hazards)

A branch instruction (like BEQ, BNE, JMP) changes the program counter. The problem: the next instruction has already been fetched (and possibly decoded) before the CPU knows whether the branch is taken. If the branch is taken, the fetched instructions are wrong and must be discarded — wasting pipeline cycles.

Solutions to control hazards:

Solution                    | How It Works                                                                                  | Cost
Flush pipeline              | Discard instructions fetched after the branch; wait for branch resolution                     | (k − 1) stall cycles — worst case
Branch prediction (static)  | Always predict branch not taken (continue fetching); flush on misprediction                   | 0 cycles if correct; penalty cycles if wrong
Branch prediction (dynamic) | Use history table to predict taken/not-taken based on past behaviour                          | Misprediction penalty × misprediction rate
Delayed branch              | Always execute the instruction(s) after the branch (delay slot); compiler fills delay slot with useful work | 0 cycles if compiler fills slot well

Branch penalty cycles:
In a 5-stage pipeline where branch resolution happens at end of EX (stage 3):
Instructions fetched before branch resolved = 2 (IF and ID stages)
Branch penalty = 2 cycles (if branch is taken and not predicted)

CPI with branches:
CPI = 1 + (branch_frequency × branch_penalty × misprediction_rate)
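The CPI formula can be wrapped in a one-line helper (a sketch; the name and sample numbers are illustrative, matching the worked example in Section 6):

```python
def effective_cpi(branch_freq, mispredict_rate, penalty, base_cpi=1.0):
    """CPI = base + branch_freq * mispredict_rate * penalty.
    Only mispredicted branches pay the flush penalty."""
    return base_cpi + branch_freq * mispredict_rate * penalty

# 20% branches, 90%-accurate predictor, 2-cycle misprediction penalty:
print(effective_cpi(0.20, 0.10, 2))  # about 1.04
```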

5. Data Forwarding (Bypassing)

Forwarding is the most important solution to RAW hazards. Instead of waiting for a result to be written back to the register file, the result is forwarded directly from the pipeline register where it sits to the stage that needs it.

Without forwarding:
I1: ADD R1, R2, R3 → result available in WB (cycle 5)
I2: SUB R4, R1, R5 → needs R1 in ID (cycle 3) → 2 stalls

With forwarding (EX→EX path):
I1 produces result at end of EX (cycle 3)
I2 needs result at start of EX (cycle 4)
Forward EX/MEM pipeline register → EX ALU input → 0 stalls

Load-use hazard (cannot be fully forwarded):
I1: LW R1, 0(R2) → result available after MEM (cycle 4)
I2: SUB R4, R1, R5 → needs R1 in EX (cycle 4) → impossible without a stall
Solution: Insert 1 stall + forward MEM/WB → EX

The load-use hazard (a LOAD followed immediately by an instruction that uses the loaded value) is the one case where forwarding alone cannot eliminate the stall. This is why compilers try to schedule a useful instruction between a load and its consumer — a technique called load delay slot scheduling.
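With full forwarding, the stall rule collapses to a two-case check. A sketch (assuming the EX→EX and MEM→EX forwarding paths described above; the function name is ours):

```python
def raw_stalls_with_forwarding(producer_is_load, distance):
    """Stall cycles for one RAW hazard WITH full forwarding.

    producer_is_load: True if the producer is a load (value ready after MEM).
    distance: 1 means the consumer immediately follows the producer.
    ALU results forward EX->EX with no stall; only a load followed
    immediately by a use of the loaded value needs one bubble.
    """
    if producer_is_load and distance == 1:
        return 1  # value ready after MEM -- one cycle too late for EX
    return 0

print(raw_stalls_with_forwarding(False, 1))  # ALU producer        -> 0
print(raw_stalls_with_forwarding(True, 1))   # load-use pair       -> 1
print(raw_stalls_with_forwarding(True, 2))   # one insn in between -> 0
```

This is exactly why compilers try to hoist an independent instruction into the slot after a load: moving the consumer one position later turns the 1 into a 0.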

6. GATE-Level Worked Examples

Example 1 — Basic Speedup Calculation (GATE 2020)

Problem: A non-pipelined processor has a clock cycle of 40 ns. A 5-stage pipelined version of the same processor has a clock cycle of 10 ns. What is the speedup for a program with 100 instructions?

Solution:
The non-pipelined processor completes one instruction per 40 ns clock cycle, so:
T_sequential = 100 × 40 ns = 4,000 ns

T_pipelined = (5 + 100 − 1) × 10 ns = 104 × 10 = 1,040 ns

Speedup = 4,000 / 1,040 ≈ 3.85

Note: Speedup is less than 5 (the number of stages) because n = 100 is not infinite, and the pipelined clock (10 ns) is slower than the ideal 40/5 = 8 ns because of stage imbalance and latch overhead.
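The arithmetic can be double-checked in a couple of lines (numbers taken from this example, with the non-pipelined machine completing one instruction per 40 ns cycle):

```python
n, k = 100, 5
t_seq = n * 40                   # one 40 ns cycle per instruction
t_pipe = (k + n - 1) * 10        # (k + n - 1) cycles of 10 ns each
print(round(t_seq / t_pipe, 2))  # 3.85
```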

Example 2 — Stall Cycles with RAW Hazards (GATE 2022 style)

Problem: In a 5-stage pipeline without forwarding, the following instruction sequence is executed. How many stall cycles are inserted?

I1: ADD R1, R2, R3
I2: SUB R4, R1, R5    ← needs R1 (produced by I1)
I3: MUL R6, R4, R7    ← needs R4 (produced by I2)
I4: DIV R8, R6, R9    ← needs R6 (produced by I3)
Solution (no forwarding; register file written in first half of WB, read in second half of ID):

Each consumer's ID must wait for its producer's WB. Track the cycles:
I1: IF 1, ID 2, EX 3, MEM 4, WB 5
I2: needs R1 → ID in cycle 5 (instead of 3) → 2 stalls; WB in cycle 8
I3: needs R4 → ID in cycle 8 (instead of 6) → 2 stalls; WB in cycle 11
I4: needs R6 → ID in cycle 11 (instead of 9) → 2 stalls

Note that inserting stalls shifts the producer and its consumer equally, so every dependent pair remains adjacent and each pair costs the full 2 stalls.

Total stalls = 2 + 2 + 2 = 6 stall cycles
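Chained-dependency bookkeeping like this is easy to get wrong by hand, so it is worth cross-checking with a tiny in-order scheduler. A sketch under the same WB-write/ID-read same-cycle assumption (`deps` maps each consumer's 1-based index to its producer's index; all names are ours):

```python
def total_stalls(deps, n, wb_stage=5, id_stage=2):
    """Total stall cycles for n in-order instructions, no forwarding.

    An instruction's ID may happen no earlier than (a) its ideal slot,
    (b) one cycle after the previous instruction's ID (in-order issue),
    and (c) its producer's WB cycle (same-cycle write/read allowed).
    """
    id_cycle = {0: id_stage - 1}  # sentinel so instruction 1 gets its ideal slot
    stalls = 0
    for i in range(1, n + 1):
        no_hazard = max(id_stage + (i - 1), id_cycle[i - 1] + 1)
        actual = no_hazard
        if i in deps:
            producer_wb = id_cycle[deps[i]] + (wb_stage - id_stage)
            actual = max(actual, producer_wb)
        stalls += actual - no_hazard
        id_cycle[i] = actual
    return stalls

# Example 2 chain: I2 reads I1, I3 reads I2, I4 reads I3
print(total_stalls({2: 1, 3: 2, 4: 3}, 4))  # 6
```

The scheduler also reproduces the single-hazard cases from Section 4.2: an adjacent pair costs 2 stalls, and a producer/consumer one apart costs 1.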

Example 3 — Effective CPI with Branches (GATE 2023 style)

Problem: A 5-stage pipeline resolves branches at the end of EX stage. 20% of instructions are branches. The branch predictor is correct 90% of the time. Branch misprediction penalty is 2 cycles. What is the effective CPI?

Solution:
Base CPI = 1
Branch penalty per misprediction = 2 cycles
Misprediction rate = 1 − 0.90 = 0.10
Branch frequency = 0.20

Stall cycles per instruction = branch_freq × misprediction_rate × penalty
= 0.20 × 0.10 × 2 = 0.04

Effective CPI = 1 + 0.04 = 1.04

7. Common Mistakes

  1. Using k cycles (not k + n − 1) for total pipeline time
    The total time is (k + n − 1) clock cycles, not k. The k + n − 1 accounts for (k − 1) cycles to fill the pipeline initially, then 1 cycle per instruction. Using just k gives the time for one instruction, not n instructions.
  2. Confusing latency and throughput
    Pipelining increases throughput (instructions per second) but does NOT reduce latency (time for one instruction). A 5-stage pipeline still takes 5 clock cycles per instruction. GATE sometimes asks which pipelining improves — the answer is always throughput.
  3. Missing the load-use hazard as a special case
    Students apply forwarding and claim zero stalls for all RAW hazards. The load-use case (LW followed immediately by a use of the loaded register) always requires 1 stall even with forwarding, because the data is only available after the MEM stage — one cycle too late for EX forwarding.
  4. Forgetting the latch overhead in clock cycle calculation
    The pipeline clock cycle = max(stage time) + latch overhead. If a question gives stage times and latch delay separately, add them. Forgetting the latch overhead underestimates the clock cycle time.
  5. Claiming WAR and WAW cause stalls in a simple pipeline
    In a simple 5-stage in-order pipeline, WAR and WAW hazards do not cause stalls because instructions complete in order. These hazards matter in superscalar or out-of-order processors. GATE usually tests in simple in-order pipelines — be precise about which model the question assumes.

8. Frequently Asked Questions

What is pipelining in computer organisation?

Pipelining is a performance technique where the execution of multiple instructions overlaps in time. Just like an assembly line processes multiple cars at different stations simultaneously, a CPU pipeline processes multiple instructions in different stages simultaneously. A k-stage pipeline can have up to k instructions in flight at once, ideally completing one instruction per clock cycle at steady state.

What are the three types of pipeline hazards?

Structural hazards arise when two instructions compete for the same hardware resource at the same time. Data hazards arise when an instruction needs a result that a previous instruction has not yet produced — the most important type is RAW (Read After Write). Control hazards arise from branch instructions, where the CPU fetches the wrong instruction if it does not know whether the branch will be taken.

What is the speedup formula for pipelining?

Speedup = (n × k) / (k + n − 1), where n is the number of instructions and k is the number of stages. For infinite instructions, speedup approaches k. The efficiency = Speedup / k = n / (k + n − 1), which shows how fully the pipeline is utilised.

What is the difference between RAW, WAR, and WAW hazards?

RAW (Read After Write) is a true data dependency — a consumer reads before the producer writes. It causes stalls in in-order pipelines and is the most common hazard. WAR (Write After Read) is an anti-dependency — an instruction writes before a previous one finishes reading. WAW (Write After Write) is an output dependency — two instructions write the same register and order matters. WAR and WAW only cause problems in out-of-order processors; in-order pipelines are immune.
