Code Generation – Register Allocation, Graph Colouring and Instruction Scheduling Explained

Key Takeaways:

  • Code generation translates optimised intermediate code into target machine code — the final phase of the compiler.
  • Instruction selection maps IR operations to machine instructions, often using tree-pattern matching on the IR.
  • Register allocation assigns IR temporaries to the finite set of physical machine registers. Variables that don’t fit must be spilled to memory.
  • Graph colouring register allocation models conflicts between live variables as an interference graph and colours it with k colours (k = number of registers).
  • Instruction scheduling reorders instructions to avoid pipeline stalls — maximising CPU throughput without changing semantics.
  • A simple code generator works basic block by basic block, maintaining a register descriptor and address descriptor for each variable.

1. Role of the Code Generator

The code generator is the final phase of the compiler. It takes the optimised intermediate representation (typically 3AC or SSA) and produces assembly language or machine code for a specific target architecture (x86, ARM, RISC-V, etc.).

The code generator must solve three interrelated problems simultaneously:

  1. Instruction selection: Which machine instructions implement each IR operation?
  2. Register allocation: Which values live in registers vs memory?
  3. Instruction ordering: In what order to emit instructions to maximise performance?

These problems are interdependent and collectively NP-Hard in general, so compilers use heuristics.

2. Instruction Selection

Instruction selection maps each IR operation to one or more machine instructions. For a RISC-like machine, a 3AC instruction often maps to one machine instruction. For CISC machines (like x86), more complex patterns may map to a single instruction.

Simple Mapping (RISC)

3AC: t = a + b
RISC: LOAD R1, a; LOAD R2, b; ADD R3, R1, R2; STORE t, R3

Tree-Pattern Matching

Modern code generators represent IR as expression trees and cover the tree with tiles — patterns corresponding to machine instructions. The goal is to cover the tree with tiles that minimise total cost (fewer instructions, cheaper instructions). This is solved optimally using dynamic programming (tree DP) in O(n) time.
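The tiling idea can be sketched with a toy dynamic program. The tile set below is illustrative, not a real ISA: a single two-node tile (add-with-immediate) shows how covering two IR nodes with one instruction beats tiling them separately.

```python
# A minimal sketch of optimal tree tiling via dynamic programming.
# Node shapes and tile costs are illustrative, not a real ISA.

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                      # e.g. '+', '*', 'const', 'var'
    kids: list = field(default_factory=list)

# Each tile: (pattern, cost). A one-element pattern matches a single node;
# a two-element pattern also covers one child with the given op
# (e.g. add-with-immediate fuses '+' over a 'const' child).
TILES = [
    (('const',), 1),             # load immediate
    (('var',), 1),               # load from memory
    (('+',), 1),                 # ADD  r, r
    (('+', 'const'), 1),         # ADDI r, imm  (covers '+' and its const child)
    (('*',), 3),                 # MUL  r, r
]

def best_cost(node):
    """Minimum total tile cost to cover the subtree rooted at node."""
    best = float('inf')
    for pattern, cost in TILES:
        if pattern[0] != node.op:
            continue
        if len(pattern) == 1:
            # Tile covers just this node; children are covered recursively.
            best = min(best, cost + sum(best_cost(k) for k in node.kids))
        else:
            # Two-node tile: also covers one child with the matching op.
            for i, k in enumerate(node.kids):
                if k.op == pattern[1]:
                    rest = sum(best_cost(c) for j, c in enumerate(node.kids) if j != i)
                    rest += sum(best_cost(c) for c in k.kids)
                    best = min(best, cost + rest)
    return best

# t = a + 4: the ADDI tile covers '+' and the constant in one instruction
tree = Node('+', [Node('var'), Node('const')])
print(best_cost(tree))           # 2: load 'a', then add-immediate
```

Each subtree's cost is computed once per tile attempt, which is why tree DP stays linear in practice (real implementations memoise per node).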

Macro Expansion

A simpler approach: expand each IR instruction into a fixed sequence of machine instructions using templates. Fast but produces non-optimal code. Often used for initial code generation, with peephole optimisation applied afterward.
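A macro expander can be sketched as a template table keyed by opcode; the register names and mnemonics below are illustrative placeholders, matching the RISC example above.

```python
# Macro expansion sketch: each 3AC opcode expands to a fixed template.
# Register names and mnemonics are illustrative.

TEMPLATES = {
    '+': ['LOAD R1, {a}', 'LOAD R2, {b}', 'ADD R3, R1, R2', 'STORE {t}, R3'],
    '-': ['LOAD R1, {a}', 'LOAD R2, {b}', 'SUB R3, R1, R2', 'STORE {t}, R3'],
}

def expand(t, op, a, b):
    """Expand a 3AC instruction `t = a op b` via its fixed template."""
    return [line.format(t=t, a=a, b=b) for line in TEMPLATES[op]]

for line in expand('t1', '+', 'a', 'b'):
    print(line)
```

Note the redundancy this produces: every operand is loaded and every result stored, even when a value is already in a register — exactly the pattern a later peephole pass cleans up.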

3. Simple Code Generation Algorithm

A classic simple code generator processes one basic block at a time, maintaining two data structures:

  • Register descriptor: For each register, which variable’s current value it holds
  • Address descriptor: For each variable, where its current value is (register, memory, both)

getReg(instruction)

For each instruction, decide which registers to use:

  1. If an operand is already in a register, use that register
  2. If a free register is available, use it
  3. If no free register: find the register holding a variable whose next use is furthest in the future. Spill it (save to memory) and reuse that register.

At the end of a basic block, write back to memory all variables that are live (needed after the block) and whose value only exists in a register.
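The three-step getReg decision can be sketched with plain dictionaries standing in for the two descriptors. The data structures and the `'mem'` marker are simplifications for illustration; a real generator would also emit the STORE for the evicted variable.

```python
# Sketch of getReg's choice: reuse, then a free register, then evict the
# variable whose next use is farthest away. Descriptors are simple dicts.

def get_reg(var, reg_desc, addr_desc, next_use, free_regs):
    """Return a register for `var`, spilling if necessary.
    reg_desc:  register -> variable it currently holds
    addr_desc: variable -> set of locations ('mem' and/or a register)
    next_use:  variable -> index of its next use (inf if none)
    """
    # 1. Operand already in a register: reuse it.
    for r, v in reg_desc.items():
        if v == var:
            return r
    # 2. A free register is available: take it.
    if free_regs:
        r = free_regs.pop()
        reg_desc[r] = var
        addr_desc.setdefault(var, set()).add(r)
        return r
    # 3. Spill: evict the occupant whose next use is farthest away.
    victim_reg = max(reg_desc, key=lambda r: next_use.get(reg_desc[r], float('inf')))
    victim = reg_desc[victim_reg]
    addr_desc[victim].discard(victim_reg)
    addr_desc[victim].add('mem')          # a real generator emits STORE here
    reg_desc[victim_reg] = var
    addr_desc.setdefault(var, set()).add(victim_reg)
    return victim_reg

regs = {'R1': 'a', 'R2': 'b'}
addrs = {'a': {'R1'}, 'b': {'R2'}}
uses = {'a': 2, 'b': 9}                   # b's next use is farther away
r = get_reg('c', regs, addrs, uses, free_regs=[])
print(r, regs[r])                         # b is evicted; its register goes to c
```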

4. Register Allocation

Real machines have a small, fixed number of registers (typically 8–32). IR temporaries are unlimited. Register allocation assigns IR values to physical registers, deciding which values to keep in registers and which to spill to memory.

This is one of the most critical optimisations — memory accesses are 10–100× slower than register operations on modern hardware. Minimising spills is essential for performance.

Allocation vs Assignment

  • Register allocation: Deciding which variables get a register at all
  • Register assignment: Deciding which physical register each variable gets

5. Graph Colouring Register Allocation

The standard model for register allocation is graph colouring, introduced by Chaitin et al. (1981). Graph-colouring allocators and their descendants are used in GCC and many other optimising compilers, and the interference-graph model underlies most alternative allocators as well.

Interference Graph

Build a graph where:

  • Each node represents an IR value (variable or temporary)
  • An edge connects two nodes if the corresponding values are simultaneously live at any program point — they cannot share a register

Graph Colouring

Assign colours (physical registers) to graph nodes such that no two adjacent nodes share a colour. If the graph can be coloured with k colours (k = number of available registers), then all values fit in registers simultaneously. The colouring directly gives the register assignment.

Chaitin’s Algorithm

  1. Build the interference graph
  2. Simplify: Repeatedly remove nodes with fewer than k neighbours (they can always be coloured). Push them onto a stack.
  3. Spill: If all remaining nodes have k or more neighbours, select one to spill (mark it for memory). Remove it and continue simplifying.
  4. Select (colour): Pop nodes from the stack and assign colours — always possible for nodes removed in step 2.
  5. Rewrite: For spilled variables, insert load/store instructions around each use/definition.
  6. Repeat until no more spills occur.

Graph colouring is NP-Complete in general, but Chaitin’s simplification heuristic works well in practice.
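The simplify/select phases can be sketched on a tiny interference graph. This is a minimal illustration of steps 2–4 above (it stops at nominating a spill candidate rather than rewriting and retrying); the highest-degree spill choice is one simple heuristic among many.

```python
# Chaitin-style simplify/select sketch on a tiny interference graph.
# With k registers, a node of degree < k can always be coloured last.

def colour(graph, k):
    """graph: var -> set of interfering vars. Returns var -> colour (0..k-1),
    or the name of a spill candidate (highest degree) if simplification sticks."""
    g = {v: set(ns) for v, ns in graph.items()}   # work on a copy
    stack = []
    while g:
        # Simplify: remove any node with fewer than k neighbours.
        low = next((v for v in g if len(g[v]) < k), None)
        if low is None:
            # Stuck: nominate the highest-degree node for spilling.
            return max(g, key=lambda v: len(g[v]))
        stack.append((low, g.pop(low)))
        for ns in g.values():
            ns.discard(low)
    # Select: pop nodes and give each the lowest colour unused by neighbours.
    colours = {}
    for v, neighbours in reversed(stack):
        taken = {colours[n] for n in neighbours if n in colours}
        colours[v] = next(c for c in range(k) if c not in taken)
    return colours

# a-b-c form a triangle (all pairwise live); d interferes only with a.
graph = {'a': {'b', 'c', 'd'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}, 'd': {'a'}}
print(colour(graph, 3))   # 3-colourable: the triangle uses all three registers
print(colour(graph, 2))   # stuck: returns a spill candidate instead
```

The triangle is why two registers are not enough here: three pairwise-interfering values need three colours no matter how cleverly they are assigned.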

6. Register Spilling

When there are not enough registers to hold all live values simultaneously, some values must be spilled — stored in memory (typically the stack frame) and reloaded when needed.

Spill cost heuristics — which variable to spill:

  • Spill the variable whose next use is farthest in the future (a Belady-style heuristic)
  • Spill the variable with the cheapest spill cost (infrequently accessed variables)
  • Prefer to spill variables outside loops rather than those inside tight loops
  • Do not spill a variable that is used in the next instruction

Spilling inserts LOAD (before use) and STORE (after definition) instructions. Too many spills significantly degrade performance — good register allocation minimises spills.
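The loop-nesting and frequency heuristics above are often combined into a single metric: estimated spill cost divided by interference degree (a Chaitin/Briggs-style priority). The sketch below assumes a conventional weight of 10 per loop-nesting level; the exact factor varies by compiler.

```python
# Sketch of a classic spill metric: estimated memory-access cost of
# spilling, divided by interference degree. Accesses inside loops are
# weighted by 10 per nesting level (a conventional, not fixed, factor).

def spill_priority(uses_defs, degree):
    """uses_defs: list of loop-nesting depths, one per use or definition.
    degree: number of interference-graph neighbours.
    Lower priority = cheaper to spill (few accesses, frees many conflicts)."""
    cost = sum(10 ** depth for depth in uses_defs)
    return cost / max(degree, 1)

# x: used twice at top level; y: used once inside a doubly nested loop.
# Both interfere with 4 other values.
print(spill_priority([0, 0], 4))   # 0.5  -> cheap, spill this one
print(spill_priority([2], 4))      # 25.0 -> expensive, keep in a register
```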

7. Instruction Scheduling

Modern CPUs are pipelined — multiple instructions execute simultaneously in different pipeline stages. Some instructions have latency: a result is not available for several cycles after the instruction is issued. If the next instruction depends on the result, the CPU stalls.

Instruction scheduling reorders instructions to fill stall slots, improving pipeline throughput without changing semantics. Two types:

  • Local (basic block) scheduling: Reorder instructions within a basic block
  • Global scheduling: Move instructions across basic blocks (e.g., trace scheduling, software pipelining)

List Scheduling Algorithm

  1. Build a dependency graph for the basic block (edges = data dependencies)
  2. Topologically sort the graph (the scheduler must respect data dependencies)
  3. At each cycle, choose a ready instruction (all its dependencies satisfied) using a priority heuristic (e.g., prefer instructions with longest critical path remaining)
  4. Issue the chosen instruction
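The steps above can be sketched for a single-issue pipeline. The instructions, latencies, and dependency graph below are illustrative; the priority function is the latency-weighted critical path mentioned in step 3.

```python
# List-scheduling sketch for a single-issue pipeline. Each instruction
# has a latency: cycles before its result is available to dependants.
# The example instructions, latencies, and dependencies are illustrative.

def list_schedule(latency, deps):
    """latency: instr -> cycles until its result is ready.
    deps: instr -> set of instrs whose results it needs.
    Returns [(instr, issue_cycle)] honouring all dependencies."""
    # Priority: latency-weighted longest path to the end (critical path).
    succs = {i: [] for i in latency}
    for i, preds in deps.items():
        for p in preds:
            succs[p].append(i)
    prio = {}
    def cp(i):
        if i not in prio:
            prio[i] = latency[i] + max((cp(s) for s in succs[i]), default=0)
        return prio[i]

    issued, schedule, cycle = {}, [], 0
    pending = set(latency)
    while pending:
        # Ready = every predecessor issued AND its result already available.
        ready = [i for i in pending
                 if all(p in issued and issued[p] + latency[p] <= cycle
                        for p in deps.get(i, set()))]
        if ready:
            i = max(ready, key=cp)          # longest critical path first
            issued[i] = cycle
            schedule.append((i, cycle))
            pending.remove(i)
        cycle += 1                          # no ready instruction => stall
    return schedule

# Two 3-cycle loads feed an add; mul is independent and fills a stall slot.
lat = {'load1': 3, 'load2': 3, 'mul': 1, 'add': 1}
deps = {'add': {'load1', 'load2'}}
print(list_schedule(lat, deps))
```

On this example the scheduler issues the loads first (longest critical path), slips the independent `mul` into cycle 2 while the loads complete, and issues `add` at cycle 4 once both results are ready — one stall instead of the two a naive in-order emission would incur.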

8. Runtime Storage Organisation

The code generator also manages the runtime memory layout — how variables are allocated during execution.

Activation Records (Stack Frames)

Each function call creates an activation record (stack frame) on the call stack, containing:

  • Return address
  • Saved register values (caller-saved or callee-saved)
  • Local variables and temporaries
  • Parameters passed from the caller
  • Frame pointer (FP) and stack pointer (SP)

Calling Convention

A calling convention defines who (caller or callee) saves which registers, how parameters are passed (registers vs stack), and how return values are communicated. Common examples: x86-64 System V ABI (first 6 integer args in RDI, RSI, RDX, RCX, R8, R9), ARM AAPCS (first 4 args in R0–R3).

9. Common Misconceptions

Misconception 1: Register allocation is just about assigning variables to registers.
Register allocation also determines which variables get spilled to memory, inserts load and store instructions for spills, and handles special registers (like the stack pointer or return value register). It is a complex global optimisation involving liveness analysis, interference graph construction, colouring, and code rewriting.

Misconception 2: More registers always means faster code.
More registers generally help (fewer spills), but with diminishing returns. Beyond 16–32 registers, the benefit is minimal because most programs have limited live variables at any point. Encoding register numbers in instruction bits also has costs (more bits per instruction = larger code = worse cache utilisation).

Misconception 3: Instruction scheduling changes what the program computes.
It does not, when done correctly. The scheduler respects all data dependencies (a read of x must come after the write to x that produces the value it uses); only independent instructions can be freely reordered. Correct scheduling never changes observable behaviour.

Misconception 4: Graph colouring register allocation finds the optimal solution.
Graph colouring of an interference graph is NP-Complete. Compilers use heuristics (Chaitin’s simplification + spill selection) that work well in practice but are not guaranteed optimal. Research into better spill selection heuristics and alternative models continues.

Misconception 5: The code generator is the simplest phase of the compiler.
The code generator solves NP-Hard problems (register allocation, instruction scheduling) simultaneously. It must also handle calling conventions, memory alignment, instruction encoding, and architecture-specific quirks. It is arguably the most complex phase for a real production compiler.

10. Frequently Asked Questions

What is register allocation and why is it important?

Register allocation assigns IR values (variables, temporaries) to the limited set of physical CPU registers. Values that don’t fit in registers must be loaded and stored from memory at each use (spilling). Since registers are 10–100× faster to access than memory, good register allocation dramatically improves performance. It is one of the most impactful compiler optimisations for numerical and systems code.

What is an interference graph and how is it used in register allocation?

An interference graph has one node per variable (IR value). Two nodes are connected by an edge if they are simultaneously live at any point — meaning they cannot share a register. Allocating registers = colouring the interference graph with k colours (k = number of registers). If k-colourable, every variable gets a register; otherwise some must be spilled to memory. Graph colouring is NP-Complete, so compilers use Chaitin’s simplification heuristic.

What is instruction scheduling and why does it matter on modern CPUs?

Modern CPUs execute instructions in a pipeline — multiple instructions are processed simultaneously at different stages. If instruction B needs the result of instruction A but A is still in the pipeline, the CPU stalls (wastes cycles). Instruction scheduling reorders independent instructions to fill these stall slots. For memory-intensive or floating-point heavy code, good scheduling can double throughput without changing what the code computes.

What is a calling convention and why does it matter for compilers?

A calling convention is a standardised agreement between caller and callee about how function calls work: which registers hold arguments, which register holds the return value, which registers the callee must save and restore, and how the stack is managed. The compiler must follow the platform’s calling convention so that code from different compilers (or hand-written assembly) can interoperate. Common conventions: System V ABI (Linux x86-64), Microsoft x64 ABI (Windows), ARM AAPCS.