Compiler Design Formulas – Complete Quick Reference Sheet

Home
CS & IT
Compiler Design
Formulas & Quick Reference
What You Will Find Here: Every key rule, algorithm, table-construction procedure, and formula in Compiler Design — FIRST/FOLLOW computation, LL(1) table rules, LR parsing steps, 3AC patterns, data flow equations, and the compiler phases pipeline — all in one quick-reference page.

1. Compiler Phases Pipeline

PhaseInputOutputKey Tool/Theory
Lexical AnalysisCharacter streamToken streamRegex, DFA, LEX/FLEX
Syntax AnalysisToken streamParse tree / ASTCFG, LL(1)/LR, YACC/Bison
Semantic AnalysisASTAnnotated ASTSymbol table, type system, SDT
Intermediate Code GenAnnotated AST3AC / IRDAG, quadruples, SSA
Code OptimisationIROptimised IRData flow, CFG, loop analysis
Code GenerationOptimised IRAssembly / machine codeRegister allocation, scheduling

Support structures used across all phases: Symbol Table, Error Handler

2. FIRST and FOLLOW — Rules

FIRST(X)

If X is a terminal: FIRST(X) = {X}
If X → ε: add ε to FIRST(X)
If X → Y1Y2...Yk:
  Add FIRST(Y1) - {ε} to FIRST(X)
  If ε ∈ FIRST(Y1): add FIRST(Y2) - {ε}, etc.
  If ε ∈ FIRST(Yi) for all i: add ε to FIRST(X)

FOLLOW(A)

$ ∈ FOLLOW(start symbol)
For B → αAβ: FIRST(β) - {ε} ⊆ FOLLOW(A)
For B → αA or B → αAβ where ε ∈ FIRST(β): FOLLOW(B) ⊆ FOLLOW(A)

3. LL(1) Parse Table Construction

For each production A → α:

1. For each terminal a ∈ FIRST(α) - {ε}: M[A, a] = A → α
2. If ε ∈ FIRST(α):
   For each terminal b ∈ FOLLOW(A): M[A, b] = A → α
   If $ ∈ FOLLOW(A): M[A, $] = A → α

Grammar is LL(1) iff no cell in M has more than one production.

Not LL(1) if: Left recursion exists, or common prefixes exist (needs left factoring).

Conditions for LL(1)

For any two productions A → α | β:

  • FIRST(α) ∩ FIRST(β) = ∅
  • If ε ∈ FIRST(α): FIRST(β) ∩ FOLLOW(A) = ∅

4. LR Parsing — Steps and Conflicts

LR Parsing Actions

ActionWhat Happens
Shift snPush next input token and state n onto stack; advance input
Reduce A → βPop |β| symbols from stack; push A; goto state GOTO[top, A]
AcceptInput fully parsed; done
ErrorNo valid action; report syntax error

Conflict Types

ConflictMeaningCause
Shift-ReduceUnclear whether to shift or reduce on lookaheadAmbiguous grammar or insufficient lookahead (SLR vs LALR)
Reduce-ReduceTwo different productions could be reducedGrammar ambiguity or LALR state merging

SLR Table Construction

Shift: [A → α•aβ] ∈ I and goto(I,a)=J: ACTION[I,a] = shift J
Reduce: [A → α•] ∈ I: ACTION[I,a] = reduce A→α for all a ∈ FOLLOW(A)
Accept: [S' → S•] ∈ I: ACTION[I,$] = accept
Goto: goto(I, A) = J: GOTO[I, A] = J

5. Parser Power Comparison

ParserLookaheadGrammar ClassStatesConflicts
LR(0)0LR(0)FewestMost
SLR(1)FOLLOW setsSLR(1) ⊃ LR(0)Same as LR(0)Fewer
LALR(1)Merged lookaheadsLALR(1) ⊃ SLR(1)Same as LR(0)Fewest (practical)
CLR(1) / LR(1)Full lookaheadsLR(1) ⊃ LALR(1)MostNone for LR(1) grammars
LL(1)FIRST/FOLLOWLL(1) ⊂ LALR(1)N/AFail on left recursion

Containment: LL(1) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1)

6. Three-Address Code Patterns

ConstructThree-Address Code
a = b op ct = b op c; a = t
a = ba = b
if E goto L[eval E → t]; if t goto L
if a relop b goto Lif a relop b goto L
x = a[i]t = a[i]; x = t
a[i] = xa[i] = x
call f(a,b)param a; param b; call f, 2
return xreturn x
x = *p (deref)x = *p
*p = x*p = x

If-Else Translation

if (E) S1 else S2:
  [code for E, result in t]
  if t == 0 goto L_else
  [code for S1]
  goto L_end
L_else: [code for S2]
L_end:

While Loop Translation

while (E) S:
L_test: [code for E, result in t]
  if t == 0 goto L_end
  [code for S]
  goto L_test
L_end:

7. Data Flow Analysis Equations

AnalysisDirectionIN equationOUT equationMeet
Reaching DefinitionsForwardIN[B] = ∪ OUT[pred]OUT[B] = gen[B] ∪ (IN[B] − kill[B])
Live VariablesBackwardIN[B] = use[B] ∪ (OUT[B] − def[B])OUT[B] = ∪ IN[succ]
Available ExpressionsForwardIN[B] = ∩ OUT[pred]OUT[B] = e_gen[B] ∪ (IN[B] − e_kill[B])
Very Busy ExpressionsBackwardIN[B] = use[B] ∪ (OUT[B] − kill[B])OUT[B] = ∩ IN[succ]

gen/kill/use/def Sets

Analysisgen[B]kill[B]
Reaching DefinitionsDefinitions in B not overwritten later in BAll other definitions of same variable (anywhere in program)
Live Variables (use/def)use[B]: vars used in B before being defined in Bdef[B]: vars defined in B before being used in B
Available Expressions (e_gen/e_kill)Expressions computed in B, operands not later modified in BExpressions using any variable defined in B

8. Optimisation Quick Reference

OptimisationWhat It DoesEnabled By
Constant FoldingEvaluate constant expressions at compile timeLocal analysis
Constant PropagationReplace variables with their known constant valuesReaching definitions
Copy PropagationReplace x = y; use of x with direct use of yReaching definitions
Dead Code EliminationRemove instructions whose results are never usedLive variable analysis
Common Subexpression Elimination (CSE)Reuse previously computed expression instead of recomputingAvailable expressions
Loop Invariant Code MotionMove loop-invariant computations outside the loopReaching definitions + dominance
Strength ReductionReplace expensive ops with cheaper ones (mul → add)Induction variable analysis
Loop UnrollingReplicate loop body, reduce iterationsLoop analysis
InliningReplace function call with function bodyCall graph analysis
PeepholeReplace small instruction patterns with more efficient onesLocal pattern matching

9. Register Allocation Summary

StepAction
1. Liveness AnalysisCompute live-in and live-out sets for each basic block
2. Build Interference GraphAdd edge between two variables if they are simultaneously live
3. Coalesce (optional)Merge copy-related variables (x = y: prefer same register)
4. SimplifyRemove nodes with degree < k; push onto stack
5. Spill (if needed)Select variable to spill; remove it and continue
6. Colour (Select)Pop nodes, assign colours (registers) greedily
7. RewriteInsert LOAD/STORE for spilled variables; repeat from step 1

Key Formulae

Graph is k-colourable ⇒ no spills needed
Node with degree < k can always be coloured (Chaitin's key insight)
Spill cost heuristic: prefer to spill variables used outside loops

10. Frequently Asked Questions

What is the difference between S-attributed and L-attributed grammars?

An S-attributed grammar uses only synthesised attributes — values flow bottom-up from children to parent. Every S-attributed grammar can be evaluated during LR parsing (YACC semantic actions). An L-attributed grammar additionally allows inherited attributes that flow from a node’s parent or left siblings — but not right siblings. L-attributed grammars can be evaluated in a single left-to-right pass compatible with LL parsing and one-pass compilers. Every S-attributed grammar is also L-attributed; the reverse is not true.

What is the difference between quadruples and triples in intermediate code?

Both represent three-address instructions. Quadruples are 4-field records (op, arg1, arg2, result) — the result is explicitly named as a temporary variable. This makes reordering easy since references use variable names, not positions. Triples are 3-field records (op, arg1, arg2) — the result is the instruction’s array index. Triples are more compact but harder to reorder since moving an instruction changes its index and breaks all references to it.

What is the difference between reaching definitions and available expressions?

Reaching definitions tracks which assignments (definitions) can reach a given point. It uses union (any definition that reaches along any path). Available expressions tracks which expressions have been computed and not invalidated on all paths leading to a point — it uses intersection (must be available on every path). Reaching definitions is used for constant propagation; available expressions is used for global CSE.

How is lexical analysis related to regular languages?

Token patterns (like identifiers, numbers, keywords) are regular languages — each pattern is a regular expression. Since every regular expression can be converted to a DFA, and DFAs run in O(n) linear time, lexers are extremely fast. The pipeline is: write token regex → Thompson’s construction gives NFA → subset construction gives DFA → minimise DFA → implement as transition table. Tools like LEX/FLEX automate this entire pipeline.

Leave a Comment