1. Compiler Phases Pipeline
| Phase | Input | Output | Key Tool/Theory |
|---|---|---|---|
| Lexical Analysis | Character stream | Token stream | Regex, DFA, LEX/FLEX |
| Syntax Analysis | Token stream | Parse tree / AST | CFG, LL(1)/LR, YACC/Bison |
| Semantic Analysis | AST | Annotated AST | Symbol table, type system, SDT |
| Intermediate Code Gen | Annotated AST | 3AC / IR | DAG, quadruples, SSA |
| Code Optimisation | IR | Optimised IR | Data flow, CFG, loop analysis |
| Code Generation | Optimised IR | Assembly / machine code | Register allocation, scheduling |
Support structures used across all phases: Symbol Table, Error Handler
2. FIRST and FOLLOW — Rules
FIRST(X)
- If X is a terminal: FIRST(X) = {X}
- If X → ε: add ε to FIRST(X)
- If X → Y1Y2...Yk:
  - Add FIRST(Y1) - {ε} to FIRST(X)
  - If ε ∈ FIRST(Y1): also add FIRST(Y2) - {ε}, and so on
  - If ε ∈ FIRST(Yi) for all i: add ε to FIRST(X)
FOLLOW(A)
- $ ∈ FOLLOW(start symbol)
- For B → αAβ: FIRST(β) - {ε} ⊆ FOLLOW(A)
- For B → αA, or B → αAβ where ε ∈ FIRST(β): FOLLOW(B) ⊆ FOLLOW(A)
3. LL(1) Parse Table Construction
For each production A → α:
1. For each terminal a ∈ FIRST(α) - {ε}: set M[A, a] = A → α
2. If ε ∈ FIRST(α):
   - For each terminal b ∈ FOLLOW(A): set M[A, b] = A → α
   - If $ ∈ FOLLOW(A): set M[A, $] = A → α
Grammar is LL(1) iff no cell in M has more than one production.
Not LL(1) if left recursion exists (eliminate it first) or productions share a common prefix (apply left factoring).
Conditions for LL(1)
For any two productions A → α | β:
- FIRST(α) ∩ FIRST(β) = ∅
- If ε ∈ FIRST(α): FIRST(β) ∩ FOLLOW(A) = ∅
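The FIRST computation and the table-filling rules can be sketched together. The grammar below (E → T E', E' → + T E' | ε, T → id) is a standard textbook example, not taken from this sheet, and the FOLLOW sets are hardcoded for brevity rather than computed:

```python
EPS = "eps"
grammar = {                      # toy grammar: E -> T E', E' -> + T E' | eps, T -> id
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T":  [("id",)],
}
FOLLOW = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}   # hardcoded for brevity

def first_of_seq(seq, first):
    """FIRST of a sequence Y1...Yk, per the rules in section 2."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})   # terminal a: FIRST(a) = {a}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                            # every Yi can derive eps
    return out

# fixed-point iteration: repeat until no FIRST set grows
first = {nt: set() for nt in grammar}
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            new = first_of_seq(prod, first)
            if not new <= first[nt]:
                first[nt] |= new
                changed = True

# fill the LL(1) table; a duplicate cell would mean the grammar is not LL(1)
M = {}
for nt, prods in grammar.items():
    for prod in prods:
        f = first_of_seq(prod, first)
        for a in f - {EPS}:
            assert (nt, a) not in M, "not LL(1)"
            M[(nt, a)] = prod
        if EPS in f:
            for b in FOLLOW[nt]:            # FOLLOW already includes '$'
                assert (nt, b) not in M, "not LL(1)"
                M[(nt, b)] = prod
```

The `assert` on each cell is exactly the LL(1) condition: the construction succeeds iff no cell receives two productions.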
4. LR Parsing — Steps and Conflicts
LR Parsing Actions
| Action | What Happens |
|---|---|
| Shift sn | Push next input token and state n onto stack; advance input |
| Reduce A → β | Pop |β| symbols from stack; push A; goto state GOTO[top, A] |
| Accept | Input fully parsed; done |
| Error | No valid action; report syntax error |
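A table-driven driver makes these four actions concrete. The sketch below uses hand-built SLR tables for the tiny grammar S' → S, S → a S | a (an assumed example, not from this sheet); it pushes only states onto the stack, which is the usual simplification:

```python
# ACTION/GOTO hand-built for S' -> S, S -> a S | a (FOLLOW(S) = {$})
ACTION = {
    (0, "a"): ("shift", 2),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 2),
    (2, "$"): ("reduce", ("S", 1)),   # S -> a
    (3, "$"): ("reduce", ("S", 2)),   # S -> a S
}
GOTO = {(0, "S"): 1, (2, "S"): 3}

def parse(tokens):
    stack = [0]                       # stack of states only
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False              # error: no valid action
        kind, arg = act
        if kind == "shift":
            stack.append(arg)         # push state, advance input
            i += 1
        elif kind == "reduce":
            head, n = arg
            del stack[len(stack) - n:]            # pop |beta| states
            stack.append(GOTO[(stack[-1], head)]) # then take the goto
        else:
            return True               # accept
```

Note that a reduce never consumes input: it only reshapes the stack, which is why the same lookahead token can drive a chain of reductions.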
Conflict Types
| Conflict | Meaning | Cause |
|---|---|---|
| Shift-Reduce | Unclear whether to shift or reduce on lookahead | Ambiguous grammar or insufficient lookahead (SLR vs LALR) |
| Reduce-Reduce | Two different productions could be reduced | Grammar ambiguity or LALR state merging |
SLR Table Construction
- Shift: if [A → α•aβ] ∈ I and goto(I, a) = J: ACTION[I, a] = shift J
- Reduce: if [A → α•] ∈ I: ACTION[I, a] = reduce A → α for all a ∈ FOLLOW(A)
- Accept: if [S' → S•] ∈ I: ACTION[I, $] = accept
- Goto: if goto(I, A) = J: GOTO[I, A] = J
5. Parser Power Comparison
| Parser | Lookahead | Grammar Class | States | Conflicts |
|---|---|---|---|---|
| LR(0) | 0 | LR(0) | Fewest | Most |
| SLR(1) | FOLLOW sets | SLR(1) ⊃ LR(0) | Same as LR(0) | Fewer |
| LALR(1) | Merged lookaheads | LALR(1) ⊃ SLR(1) | Same as LR(0) | Fewest (practical) |
| CLR(1) / LR(1) | Full lookaheads | LR(1) ⊃ LALR(1) | Most | None for LR(1) grammars |
| LL(1) | FIRST/FOLLOW | LL(1) ⊂ LALR(1) | N/A | Fail on left recursion |
Containment: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1). Also LL(1) ⊂ LALR(1), but LL(1) and SLR(1) are incomparable — an LL(1) grammar need not be SLR(1).
6. Three-Address Code Patterns
| Construct | Three-Address Code |
|---|---|
| a = b op c | t = b op c; a = t |
| a = b | a = b |
| if E goto L | [eval E → t]; if t goto L |
| if a relop b goto L | if a relop b goto L |
| x = a[i] | t = a[i]; x = t |
| a[i] = x | a[i] = x |
| call f(a,b) | param a; param b; call f, 2 |
| return x | return x |
| x = *p (deref) | x = *p |
| *p = x | *p = x |
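The first pattern in the table (a = b op c with temporaries) falls out of a simple post-order walk over the expression tree. A minimal sketch, assuming a hypothetical nested-tuple AST shape like ("+", "a", ("*", "b", "c")):

```python
import itertools

def gen(node, code, fresh):
    """Emit 3AC for a nested-tuple expression; returns the name holding its value."""
    if isinstance(node, str):
        return node                      # leaf: a variable name
    op, lhs, rhs = node
    l = gen(lhs, code, fresh)            # post-order: operands first
    r = gen(rhs, code, fresh)
    t = f"t{next(fresh)}"                # fresh temporary
    code.append(f"{t} = {l} {op} {r}")   # the 't = b op c' form from the table
    return t

code = []
res = gen(("+", "a", ("*", "b", "c")), code, itertools.count(1))
# emits: t1 = b * c, then t2 = a + t1
```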
If-Else Translation
if (E) S1 else S2:
    [code for E, result in t]
    if t == 0 goto L_else
    [code for S1]
    goto L_end
L_else: [code for S2]
L_end:
While Loop Translation
while (E) S:
L_test: [code for E, result in t]
    if t == 0 goto L_end
    [code for S]
    goto L_test
L_end:
7. Data Flow Analysis Equations
| Analysis | Direction | IN equation | OUT equation | Meet |
|---|---|---|---|---|
| Reaching Definitions | Forward | IN[B] = ∪ OUT[pred] | OUT[B] = gen[B] ∪ (IN[B] − kill[B]) | ∪ |
| Live Variables | Backward | IN[B] = use[B] ∪ (OUT[B] − def[B]) | OUT[B] = ∪ IN[succ] | ∪ |
| Available Expressions | Forward | IN[B] = ∩ OUT[pred] | OUT[B] = e_gen[B] ∪ (IN[B] − e_kill[B]) | ∩ |
| Very Busy Expressions | Backward | IN[B] = use[B] ∪ (OUT[B] − kill[B]) | OUT[B] = ∩ IN[succ] | ∩ |
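The live-variables row of the table translates directly into a fixed-point loop. A sketch on a small hand-made three-block CFG (the blocks and their use/def sets are illustrative, not from the sheet):

```python
# Each block: use set, def set, successor list (B2 loops back to B1).
blocks = {
    "B1": {"use": {"a"}, "def": {"b"}, "succ": ["B2"]},
    "B2": {"use": {"b"}, "def": {"c"}, "succ": ["B3", "B1"]},
    "B3": {"use": {"c"}, "def": set(), "succ": []},
}

IN = {b: set() for b in blocks}
OUT = {b: set() for b in blocks}
changed = True
while changed:                           # iterate to a fixed point
    changed = False
    for b, info in blocks.items():
        # OUT[B] = union of IN over successors (backward analysis)
        OUT[b] = set().union(*(IN[s] for s in info["succ"]))
        # IN[B] = use[B] ∪ (OUT[B] − def[B])
        new_in = info["use"] | (OUT[b] - info["def"])
        if new_in != IN[b]:
            IN[b] = new_in
            changed = True
```

The loop terminates because the sets only grow and the variable universe is finite; swapping the equations and the union for an intersection gives the other three analyses in the table.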
gen/kill/use/def Sets
| Analysis | gen[B] | kill[B] |
|---|---|---|
| Reaching Definitions | Definitions in B not overwritten later in B | All other definitions of same variable (anywhere in program) |
| Live Variables (use/def) | use[B]: vars used in B before being defined in B | def[B]: vars defined in B before being used in B |
| Available Expressions (e_gen/e_kill) | Expressions computed in B, operands not later modified in B | Expressions using any variable defined in B |
8. Optimisation Quick Reference
| Optimisation | What It Does | Enabled By |
|---|---|---|
| Constant Folding | Evaluate constant expressions at compile time | Local analysis |
| Constant Propagation | Replace variables with their known constant values | Reaching definitions |
| Copy Propagation | Replace x = y; use of x with direct use of y | Reaching definitions |
| Dead Code Elimination | Remove instructions whose results are never used | Live variable analysis |
| Common Subexpression Elimination (CSE) | Reuse previously computed expression instead of recomputing | Available expressions |
| Loop Invariant Code Motion | Move loop-invariant computations outside the loop | Reaching definitions + dominance |
| Strength Reduction | Replace expensive ops with cheaper ones (mul → add) | Induction variable analysis |
| Loop Unrolling | Replicate loop body, reduce iterations | Loop analysis |
| Inlining | Replace function call with function body | Call graph analysis |
| Peephole | Replace small instruction patterns with more efficient ones | Local pattern matching |
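The first two rows of the table often run together as one local pass. A minimal sketch over straight-line 3AC, assuming a hypothetical tuple format (dest, op, arg1, arg2) with integer literals as Python ints:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_and_propagate(code):
    """One local pass of constant propagation + folding over straight-line 3AC."""
    consts, out = {}, []
    for dest, op, a, b in code:
        a = consts.get(a, a)                 # constant propagation
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            consts[dest] = OPS[op](a, b)     # constant folding: evaluate now
        else:
            consts.pop(dest, None)           # dest no longer a known constant
            out.append((dest, op, a, b))
    return out, consts

code = [("t1", "+", 2, 3), ("t2", "*", "t1", "x"), ("y", "+", "t2", 1)]
out, consts = fold_and_propagate(code)
# t1 = 2 + 3 disappears entirely; its value 5 is propagated into t2's operand
```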
9. Register Allocation Summary
| Step | Action |
|---|---|
| 1. Liveness Analysis | Compute live-in and live-out sets for each basic block |
| 2. Build Interference Graph | Add edge between two variables if they are simultaneously live |
| 3. Coalesce (optional) | Merge copy-related variables (x = y: prefer same register) |
| 4. Simplify | Remove nodes with degree < k; push onto stack |
| 5. Spill (if needed) | Select variable to spill; remove it and continue |
| 6. Colour (Select) | Pop nodes, assign colours (registers) greedily |
| 7. Rewrite | Insert LOAD/STORE for spilled variables; repeat from step 1 |
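Steps 4–6 (simplify, spill test, select) can be sketched as a Chaitin-style colouring pass. The interference graphs below are hand-made examples (hypothetical variables a–d); spilling itself is not implemented, the function just reports that a spill would be needed:

```python
def colour(graph, k):
    """Simplify/select colouring; returns {var: register} or None if a spill is needed."""
    stack, work = [], {v: set(ns) for v, ns in graph.items()}
    while work:
        # simplify: any node with degree < k is guaranteed colourable
        node = next((v for v in work if len(work[v] & work.keys()) < k), None)
        if node is None:
            return None                       # all degrees >= k: spill needed
        stack.append(node)
        del work[node]
    colours = {}
    while stack:                              # select: pop and colour greedily
        v = stack.pop()
        used = {colours[n] for n in graph[v] if n in colours}
        colours[v] = min(set(range(k)) - used)
    return colours

# chain a-b-c-d: 2 registers suffice
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
regs = colour(chain, 2)
# triangle: every node has degree 2, so no node can be simplified with k = 2
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

Select never gets stuck: each popped node had fewer than k neighbours when removed, so at most k − 1 of its neighbours are already coloured.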
Key Formulae
- Graph is k-colourable ⇒ no spills needed
- Node with degree < k can always be coloured (Chaitin's key insight)
- Spill cost heuristic: prefer to spill variables used outside loops
10. Frequently Asked Questions
What is the difference between S-attributed and L-attributed grammars?
An S-attributed grammar uses only synthesised attributes — values flow bottom-up from children to parent. Every S-attributed grammar can be evaluated during LR parsing (YACC semantic actions). An L-attributed grammar additionally allows inherited attributes that flow from a node’s parent or left siblings — but not right siblings. L-attributed grammars can be evaluated in a single left-to-right pass compatible with LL parsing and one-pass compilers. Every S-attributed grammar is also L-attributed; the reverse is not true.
What is the difference between quadruples and triples in intermediate code?
Both represent three-address instructions. Quadruples are 4-field records (op, arg1, arg2, result) — the result is explicitly named as a temporary variable. This makes reordering easy since references use variable names, not positions. Triples are 3-field records (op, arg1, arg2) — the result is the instruction’s array index. Triples are more compact but harder to reorder since moving an instruction changes its index and breaks all references to it.
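The positional-reference idea is easy to see in code. A small sketch converting one to the other (the tuple layouts are illustrative, and it assumes each temporary is defined before it is used):

```python
def quads_to_triples(quads):
    """Convert quadruples (op, arg1, arg2, result) to triples (op, arg1, arg2),
    replacing temporary names with positional references."""
    pos, triples = {}, []
    for i, (op, a1, a2, res) in enumerate(quads):
        a1 = pos.get(a1, a1)           # swap temp name for its index, if any
        a2 = pos.get(a2, a2)
        triples.append((op, a1, a2))
        pos[res] = ("ref", i)          # later uses point at instruction i
    return triples

# a + a*b in both forms:
quads = [("*", "a", "b", "t1"), ("+", "a", "t1", "t2")]
triples = quads_to_triples(quads)      # second triple references index 0
```

The fragility is visible here: reordering `quads` is harmless, but reordering `triples` would invalidate every ("ref", i) entry.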
What is the difference between reaching definitions and available expressions?
Reaching definitions tracks which assignments (definitions) can reach a given point. It uses union (any definition that reaches along any path). Available expressions tracks which expressions have been computed and not invalidated on all paths leading to a point — it uses intersection (must be available on every path). Reaching definitions is used for constant propagation; available expressions is used for global CSE.
How is lexical analysis related to regular languages?
Token patterns (like identifiers, numbers, keywords) are regular languages — each pattern is a regular expression. Since every regular expression can be converted to a DFA, and DFAs run in O(n) linear time, lexers are extremely fast. The pipeline is: write token regex → Thompson’s construction gives NFA → subset construction gives DFA → minimise DFA → implement as transition table. Tools like LEX/FLEX automate this entire pipeline.
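A minimal lexer in this style fits in a few lines using Python's `re` (the token names and patterns are illustrative). Note one assumption to flag: Python's alternation picks the first alternative that matches at a position, not the longest match as in LEX, so pattern order matters:

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# one master regex with a named group per token class
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    tokens = []
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":            # drop whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

# tokenize("x = 42 + y") yields ID, OP, NUMBER, OP, ID tokens in order
```

Under the hood `re` compiles the pattern once, playing the role of the regex → automaton step; generated lexers like FLEX go further and emit an explicit minimised DFA transition table.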