- Code optimisation improves program performance (speed, memory, power) without changing what the program computes.
- Local optimisation works within a single basic block. Global optimisation works across basic blocks using the Control Flow Graph.
- Common Subexpression Elimination (CSE) avoids recomputing the same expression by reusing its previously computed value.
- Constant folding evaluates constant expressions at compile time. Constant propagation replaces a variable with its known constant value at each use.
- Loop optimisations (invariant code motion, strength reduction, induction variable elimination) are the most impactful for programs with tight loops.
- Data flow analysis (reaching definitions, live variable analysis) computes global information about values and variable lifetimes across the CFG.
1. Optimisation Taxonomy
Optimisations are classified by scope:
| Level | Scope | Examples |
|---|---|---|
| Peephole | Few consecutive instructions | Remove redundant loads, replace multiply-by-2 with shift |
| Local | Within a single basic block | CSE, constant folding, dead code elimination |
| Global | Across basic blocks within a function | Global CSE, constant propagation, live variable analysis |
| Loop | Within and around loops | Code motion, strength reduction, loop unrolling |
| Interprocedural | Across function boundaries | Inlining, alias analysis, whole-program optimisation |
The fundamental constraint: An optimisation must be safe — it must never change the observable behaviour of a correct program. Compilers prove safety using data flow analysis before applying transformations.
2. Peephole Optimisation
Peephole optimisation examines a small sliding window of instructions (the “peephole”) and replaces pattern sequences with more efficient equivalents. It is simple, machine-specific, and applied after code generation or on intermediate code.
Common Peephole Transformations
| Pattern (Before) | Replacement (After) | Reason |
|---|---|---|
| x = x + 0 | (delete) | Adding 0 is identity |
| x = x * 1 | (delete) | Multiplying by 1 is identity |
| x = x * 0 | x = 0 | Zero property |
| x = x * 2 | x = x << 1 | Shift is cheaper than multiply |
| x = x / 4 | x = x >> 2 | Shift is cheaper than divide (unsigned) |
| STORE x; LOAD x | STORE x | Redundant load after store to same location |
| JUMP L; L: | L: | Useless jump to next instruction |
| if (2 > 3) goto L | (delete) | Always-false branch |
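The table above can be sketched as a tiny peephole pass. This is a hypothetical illustration (the `Instr` three-address representation and rule set are invented for this example, not any particular compiler's IR); it slides over an instruction list and rewrites two of the patterns in place:

```c
#include <assert.h>

/* Hypothetical three-address instruction: dst = src1 op imm */
typedef enum { OP_ADD, OP_MUL, OP_SHL, OP_NOP } Op;
typedef struct { Op op; int dst, src1, imm; } Instr;

/* Return k if v == 2^k, else -1. Used to detect power-of-two multiplies. */
static int log2_exact(int v) {
    int k = 0;
    if (v <= 0) return -1;
    while ((v & 1) == 0) { v >>= 1; k++; }
    return v == 1 ? k : -1;
}

void peephole(Instr *code, int n) {
    for (int i = 0; i < n; i++) {
        /* Rule: x = y * 2^k  ->  x = y << k (shift is cheaper than multiply) */
        if (code[i].op == OP_MUL) {
            int k = log2_exact(code[i].imm);
            if (k >= 0) { code[i].op = OP_SHL; code[i].imm = k; }
        }
        /* Rule: x = x + 0  ->  delete (adding 0 is the identity) */
        if (code[i].op == OP_ADD && code[i].imm == 0)
            code[i].op = OP_NOP;
    }
}
```

A production peephole optimiser keeps a table of such rules and reapplies the pass until no rule fires, since one rewrite can expose another.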
3. Local Optimisations (Within a Basic Block)
Constant Folding
Evaluate expressions with constant operands at compile time.
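As an implementation sketch (the `Expr` node type here is hypothetical, assuming a simple expression tree with only constants, addition, and multiplication), folding works bottom-up: once both children of a binary node are constant leaves, the node itself collapses into a constant:

```c
#include <assert.h>

/* Hypothetical expression node: either a constant leaf or a binary operator. */
typedef enum { K_CONST, K_ADD, K_MUL } Kind;
typedef struct Expr { Kind kind; int value; struct Expr *lhs, *rhs; } Expr;

/* Fold constant subtrees bottom-up; a node whose children are both
 * constants is replaced in place by a constant leaf. */
void fold(Expr *e) {
    if (e->kind == K_CONST) return;
    fold(e->lhs);
    fold(e->rhs);
    if (e->lhs->kind == K_CONST && e->rhs->kind == K_CONST) {
        int l = e->lhs->value, r = e->rhs->value;
        e->value = (e->kind == K_ADD) ? l + r : l * r;
        e->kind = K_CONST;   /* node is now a literal; children become dead */
    }
}
```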
t = 3 * 4 + 2 → t = 14
if (3 > 5) goto L → (deleted — always false)
Constant Propagation
Replace a variable with its known constant value at all subsequent uses within the block.
x = 5; y = x + 3 → x = 5; y = 8
Common Subexpression Elimination (CSE)
If the same expression is computed multiple times and its operands have not changed between computations, replace the second computation with a reference to the first result.
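A minimal sketch of local CSE over a block of three-address quads (the `Quad` format is invented for this example; real compilers use local value numbering with a hash table rather than this quadratic scan). For each quad it reports which earlier temp, if any, already holds the same value:

```c
#include <assert.h>

/* Hypothetical three-address quad: dst = src1 op src2 (operands are temp ids). */
typedef struct { char op; int dst, src1, src2; } Quad;

/* For each quad i, out_temp[i] is the temp holding its value: either its own
 * dst, or an earlier quad's result if the same expression is still valid. */
void local_cse(Quad *q, int n, int *out_temp) {
    for (int i = 0; i < n; i++) {
        out_temp[i] = q[i].dst;
        for (int j = 0; j < i; j++) {
            if (q[j].op == q[i].op && q[j].src1 == q[i].src1
                                   && q[j].src2 == q[i].src2) {
                /* reuse is safe only if neither operand was redefined between */
                int killed = 0;
                for (int k = j + 1; k < i; k++)
                    if (q[k].dst == q[i].src1 || q[k].dst == q[i].src2)
                        killed = 1;
                if (!killed) out_temp[i] = out_temp[j];
            }
        }
    }
}
```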
t1 = a * b; t2 = a * b; c = t1 + t2 → t1 = a * b; c = t1 + t1
Dead Code Elimination
Remove instructions whose results are never used. An assignment x = expr is dead if x is never read after this point.
t = a + b; /* t is never used */ → (deleted)
Copy Propagation
After a copy x = y, replace subsequent uses of x with y (while y is not redefined).
x = y; z = x + 1 → x = y; z = y + 1 (then dead code elimination may remove x = y if x is unused)
4. Loop Optimisations
Loops are the hotspot of most programs — a statement executed 1000 times in a loop contributes 1000 times as much to runtime as a statement executed once. Loop optimisations therefore yield the greatest performance gains.
Loop Invariant Code Motion (LICM)
Move computations that produce the same result on every iteration out of the loop, into a pre-header block that runs once before the loop is entered.
for (i=0; i<n; i++) { t = a * b; x[i] = t + i; }
→ t = a * b; for (i=0; i<n; i++) { x[i] = t + i; }
Strength Reduction
Replace expensive operations with cheaper equivalent ones. Most commonly: replace multiplication by an induction variable with addition.
for (i=0; i<n; i++) { t = i * 4; a[t] = 0; }
→ t = 0; for (i=0; i<n; i++) { a[t] = 0; t = t + 4; }
Induction Variable Elimination
After strength reduction, the original loop counter (induction variable) may no longer be needed — it can be eliminated if the loop exit condition can be reformulated in terms of the new variable.
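The combined effect can be seen in C. In the first version the counter `i` drives both the exit test and the address computation; in the second, the pointer `p` is the strength-reduced induction variable and the exit test has been reformulated as `p < end`, so `i` disappears entirely (a sketch of what the compiler produces, written as source):

```c
#include <assert.h>

/* Before: counter i drives the exit test and the (hidden) multiply a + i*4. */
void zero_with_counter(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = 0;
}

/* After strength reduction + induction variable elimination: p advances by
 * addition each iteration, the exit test uses p, and i is gone. */
void zero_with_pointer(int *a, int n) {
    for (int *p = a, *end = a + n; p < end; p++)
        *p = 0;
}
```

Both functions compute the same result; only the induction variable differs.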
Loop Unrolling
Replicate the loop body k times and divide the iteration count by k, eliminating k−1 out of every k branch-back-to-loop-start instructions (at the cost of larger code).
for (i=0; i<8; i++) a[i] = 0;
→ a[0]=0; a[1]=0; a[2]=0; a[3]=0; a[4]=0; a[5]=0; a[6]=0; a[7]=0;
Loop Fusion
Combine two adjacent loops with the same iteration range into one loop, reducing loop overhead and improving cache locality.
Loop Interchange
Swap the order of nested loops to improve cache performance (accessing array memory in row-major order).
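A concrete illustration of interchange (array sizes chosen arbitrarily for the example). C stores arrays row-major, so the second version's inner loop touches consecutive memory addresses and reuses cache lines; the interchange is legal here because the iterations are independent:

```c
#include <assert.h>
#define ROWS 64
#define COLS 64

/* Before: inner loop walks down a column, striding COLS ints per access. */
long sum_column_major(int m[ROWS][COLS]) {
    long s = 0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += m[i][j];
    return s;
}

/* After interchange: inner loop walks along a row, so consecutive accesses
 * are adjacent in memory and hit the same cache lines. */
long sum_row_major(int m[ROWS][COLS]) {
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}
```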
5. Data Flow Analysis
Data flow analysis computes information about values and variable uses across the entire CFG — not just within one basic block. It uses equations defined for each basic block (gen and kill sets) and propagates solutions iteratively until a fixed point is reached.
Each analysis defines a direction (forward or backward) and a meet operator (∪ or ∩).
| Analysis | Direction | Meet | Purpose |
|---|---|---|---|
| Reaching Definitions | Forward | ∪ | Which definitions reach each point? |
| Live Variables | Backward | ∪ | Which variables are live (needed) at each point? |
| Available Expressions | Forward | ∩ | Which expressions have already been computed? |
| Very Busy Expressions | Backward | ∩ | Which expressions are computed on every path? |
6. Reaching Definitions
A definition d of variable x reaches point p if there exists a path from d to p along which x is not redefined. Used for constant propagation and identifying uses of uninitialised variables.
Transfer Functions
OUT[B] = gen[B] ∪ (IN[B] − kill[B])
IN[B] = ∪ OUT[P] for all predecessors P of B
- gen[B]: Definitions generated in B (last definition of each variable in B)
- kill[B]: All other definitions of variables defined in B (from anywhere in the program)
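These equations can be solved by the standard iterative algorithm. The sketch below (a hypothetical three-block CFG, with definitions encoded as bit positions in an unsigned word) repeats the transfer functions until nothing changes — the fixed point:

```c
#include <assert.h>

enum { NBLOCKS = 3 };

/* Iterative forward solver: IN[B] = union of OUT[P] over predecessors P,
 * OUT[B] = gen[B] | (IN[B] & ~kill[B]), repeated to a fixed point. */
void reaching_definitions(const unsigned gen[], const unsigned kill[],
                          int pred[][NBLOCKS], const int npred[],
                          unsigned in[], unsigned out[]) {
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = gen[b]; }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            unsigned newin = 0;
            for (int p = 0; p < npred[b]; p++)
                newin |= out[pred[b][p]];
            unsigned newout = gen[b] | (newin & ~kill[b]);
            if (newin != in[b] || newout != out[b]) changed = 1;
            in[b] = newin; out[b] = newout;
        }
    }
}
```

In the test below, B0 and B1 each define x (definitions d0 = bit 0 and d1 = bit 1, which kill each other), with edges B0→B1, B0→B2, B1→B2 — so both definitions reach B2.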
7. Live Variable Analysis
A variable x is live at point p if there is a path from p to some use of x along which x is not redefined. Used for dead code elimination and register allocation (a register holding a live variable's value must not be reused until the variable is dead).
Transfer Functions (Backward Analysis)
IN[B] = use[B] ∪ (OUT[B] − def[B])
OUT[B] = ∪ IN[S] for all successors S of B
- use[B]: Variables used in B before being defined in B
- def[B]: Variables defined in B before being used in B
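The backward solver mirrors the forward one, propagating from successors instead of predecessors. Again a sketch on a hypothetical three-block straight-line CFG, with variables as bit positions:

```c
#include <assert.h>

enum { NB = 3 };

/* Iterative backward solver: OUT[B] = union of IN[S] over successors S,
 * IN[B] = use[B] | (OUT[B] & ~def[B]), repeated to a fixed point. */
void live_variables(const unsigned use[], const unsigned def[],
                    int succ[][NB], const int nsucc[],
                    unsigned in[], unsigned out[]) {
    for (int b = 0; b < NB; b++) { out[b] = 0; in[b] = use[b]; }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = NB - 1; b >= 0; b--) {   /* reverse order converges faster */
            unsigned newout = 0;
            for (int s = 0; s < nsucc[b]; s++)
                newout |= in[succ[b][s]];
            unsigned newin = use[b] | (newout & ~def[b]);
            if (newout != out[b] || newin != in[b]) changed = 1;
            out[b] = newout; in[b] = newin;
        }
    }
}
```

The test models B0: `x = 1`; B1: `y = x + 1`; B2: `return y` (x = bit 0, y = bit 1), chained B0→B1→B2: x is live across the B0→B1 edge, y across B1→B2, and nothing is live at entry.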
8. Available Expressions
An expression x op y is available at point p if on every path from entry to p, x op y has been computed and neither x nor y has been subsequently modified. Used for global CSE.
Transfer Functions (Forward Analysis)
OUT[B] = e_gen[B] ∪ (IN[B] − e_kill[B])
IN[B] = ∩ OUT[P] for all predecessors P of B
Note: the meet operator is intersection — an expression is only available if it was computed on all preceding paths (conservative: it must be available everywhere to safely reuse it).
9. Common Misconceptions
Misconception 1: Code optimisation can always make programs faster.
Optimisation is a best-effort process. Some transformations trade speed for memory (loop unrolling increases code size). Some optimisations help on one architecture but hurt on another (cache effects vary). Compilers offer optimisation levels (O0–O3) precisely because aggressive optimisation is not always beneficial.
Misconception 2: Constant folding and constant propagation are the same thing.
Constant folding evaluates constant expressions at compile time (3*4 becomes 12). Constant propagation replaces a variable with its known constant value (if x=5, replace later uses of x with 5). Folding applies to literal expressions; propagation tracks variables. They are often applied together iteratively.
Misconception 3: Dead code elimination removes unreachable code.
Unreachable code (e.g., code after a return statement) is one form of dead code. Dead code in the data-flow sense is reachable code that produces results nobody reads — assignments to variables never subsequently used. Both are eliminated, but they are detected differently: unreachable code by CFG analysis, dead assignments by live variable analysis.
Misconception 4: Loop invariant code motion is always safe.
Moving code out of a loop is safe only if the instruction would have been executed at least once (the loop runs at least once) and has no side effects (or the side effect can safely happen earlier). Moving an expression that could throw an exception changes the program’s observable behaviour. Compilers check these conditions carefully.
Misconception 5: Data flow analysis gives exact answers.
Data flow analysis gives conservative (safe) approximations. Reaching definitions analysis may report that a definition “might reach” a point even if it never does at runtime (due to infeasible paths). The analysis is correct in the sense that it never misses a real dependency — but it may report false dependencies. This conservatism ensures transformations based on it are always safe.
10. Frequently Asked Questions
What is the difference between local and global optimisation?
Local optimisation works within a single basic block — a straight-line code segment with no branches. It can see all values computed in that block and apply transformations like CSE, constant folding, and dead code elimination. Global optimisation works across multiple basic blocks using the Control Flow Graph and data flow analysis. It can move code across branches (like hoisting loop-invariant computations out of a loop), but requires more complex analysis to prove safety.
How does a compiler perform common subexpression elimination globally?
Global CSE uses available expressions analysis. The compiler computes which expressions are available at the entry of each basic block (have been computed on all preceding paths and not subsequently killed). If expression x op y is available at a point where it is computed again, the second computation is replaced by the previously computed value. This requires inserting a temporary variable to hold the value across basic blocks.
What is live variable analysis and why is it important for register allocation?
A variable is live at a point if its current value will be used later on some execution path. Live variable analysis computes the set of live variables at every program point. For register allocation: a register holding a variable’s value must not be reused for another variable as long as the first variable is still live. Two variables that are never simultaneously live can share a register. This analysis is the foundation of graph-colouring register allocation.
What is strength reduction and when is it used?
Strength reduction replaces a computationally expensive operation with a cheaper one that produces the same result. The most common example is replacing multiplication by an induction variable (i * k) with repeated addition (add k each iteration). It is applied during loop optimisation. On modern CPUs, multiplication is fast, so strength reduction matters mainly for memory address calculations and embedded/RISC systems where multiply is expensive.