1. Compiler Phases Pipeline
| Phase | Input | Output | Key Tool/Theory |
|---|---|---|---|
| Lexical Analysis | Character stream | Token stream | Regex, DFA, LEX/FLEX |
| Syntax Analysis | Token stream | Parse tree / AST | CFG, LL(1)/LR, YACC/Bison |
| Semantic Analysis | AST | Annotated AST | Symbol table, type system, SDT |
| Intermediate Code Gen | Annotated AST | 3AC / IR | DAG, quadruples, SSA |
| Code Optimisation | IR | Optimised IR | Data flow, CFG, loop analysis |
| Code Generation | Optimised IR | Assembly / machine code | Register allocation, scheduling |
Support structures used across all phases: Symbol Table, Error Handler
2. FIRST and FOLLOW — Rules
FIRST(X)
- If X is a terminal: FIRST(X) = {X}
- If X → ε: add ε to FIRST(X)
- If X → Y1Y2...Yk:
  - Add FIRST(Y1) - {ε} to FIRST(X)
  - If ε ∈ FIRST(Y1): also add FIRST(Y2) - {ε}, and so on
  - If ε ∈ FIRST(Yi) for all i: add ε to FIRST(X)
FOLLOW(A)
- $ ∈ FOLLOW(start symbol)
- For B → αAβ: FIRST(β) - {ε} ⊆ FOLLOW(A)
- For B → αA, or B → αAβ where ε ∈ FIRST(β): FOLLOW(B) ⊆ FOLLOW(A)
3. LL(1) Parse Table Construction
For each production A → α:
1. For each terminal a ∈ FIRST(α) - {ε}: set M[A, a] = A → α
2. If ε ∈ FIRST(α):
   - For each terminal b ∈ FOLLOW(A): set M[A, b] = A → α
   - If $ ∈ FOLLOW(A): set M[A, $] = A → α
Grammar is LL(1) iff no cell in M has more than one production.
Not LL(1) if left recursion exists (eliminate it first) or productions share a common prefix (apply left factoring).
Conditions for LL(1)
For any two productions A → α | β:
- FIRST(α) ∩ FIRST(β) = ∅
- If ε ∈ FIRST(α): FIRST(β) ∩ FOLLOW(A) = ∅
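The FIRST computation and the table-filling rules can be sketched together. The grammar below (E → T E', E' → + T E' | ε, T → id) is a standard textbook example, not taken from this sheet, and the FOLLOW sets are hardcoded for brevity rather than computed:

```python
EPS = "eps"
grammar = {                      # toy grammar: E -> T E', E' -> + T E' | eps, T -> id
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), (EPS,)],
    "T":  [("id",)],
}
FOLLOW = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}   # hardcoded for brevity

def first_of_seq(seq, first):
    """FIRST of a sequence Y1...Yk, per the rules in section 2."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})   # terminal a: FIRST(a) = {a}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                            # every Yi can derive eps
    return out

# fixed-point iteration: repeat until no FIRST set grows
first = {nt: set() for nt in grammar}
changed = True
while changed:
    changed = False
    for nt, prods in grammar.items():
        for prod in prods:
            new = first_of_seq(prod, first)
            if not new <= first[nt]:
                first[nt] |= new
                changed = True

# fill the LL(1) table; a duplicate cell would mean the grammar is not LL(1)
M = {}
for nt, prods in grammar.items():
    for prod in prods:
        f = first_of_seq(prod, first)
        for a in f - {EPS}:
            assert (nt, a) not in M, "not LL(1)"
            M[(nt, a)] = prod
        if EPS in f:
            for b in FOLLOW[nt]:            # FOLLOW already includes '$'
                assert (nt, b) not in M, "not LL(1)"
                M[(nt, b)] = prod
```

The `assert` on each cell is exactly the LL(1) condition: the construction succeeds iff no cell receives two productions.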
4. LR Parsing — Steps and Conflicts
LR Parsing Actions
| Action | What Happens |
|---|---|
| Shift sn | Push next input token and state n onto stack; advance input |
| Reduce A → β | Pop |β| symbols from stack; push A; goto state GOTO[top, A] |
| Accept | Input fully parsed; done |
| Error | No valid action; report syntax error |
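A table-driven driver makes these four actions concrete. The sketch below uses hand-built SLR tables for the tiny grammar S' → S, S → a S | a (an assumed example, not from this sheet); it pushes only states onto the stack, which is the usual simplification:

```python
# ACTION/GOTO hand-built for S' -> S, S -> a S | a (FOLLOW(S) = {$})
ACTION = {
    (0, "a"): ("shift", 2),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 2),
    (2, "$"): ("reduce", ("S", 1)),   # S -> a
    (3, "$"): ("reduce", ("S", 2)),   # S -> a S
}
GOTO = {(0, "S"): 1, (2, "S"): 3}

def parse(tokens):
    stack = [0]                       # stack of states only
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False              # error: no valid action
        kind, arg = act
        if kind == "shift":
            stack.append(arg)         # push state, advance input
            i += 1
        elif kind == "reduce":
            head, n = arg
            del stack[len(stack) - n:]            # pop |beta| states
            stack.append(GOTO[(stack[-1], head)]) # then take the goto
        else:
            return True               # accept
```

Note that a reduce never consumes input: it only reshapes the stack, which is why the same lookahead token can drive a chain of reductions.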
Conflict Types
| Conflict | Meaning | Cause |
|---|---|---|
| Shift-Reduce | Unclear whether to shift or reduce on lookahead | Ambiguous grammar or insufficient lookahead (SLR vs LALR) |
| Reduce-Reduce | Two different productions could be reduced | Grammar ambiguity or LALR state merging |
SLR Table Construction
- Shift: if [A → α•aβ] ∈ I and goto(I, a) = J: ACTION[I, a] = shift J
- Reduce: if [A → α•] ∈ I: ACTION[I, a] = reduce A → α for all a ∈ FOLLOW(A)
- Accept: if [S' → S•] ∈ I: ACTION[I, $] = accept
- Goto: if goto(I, A) = J: GOTO[I, A] = J
5. Parser Power Comparison
| Parser | Lookahead | Grammar Class | States | Conflicts |
|---|---|---|---|---|
| LR(0) | 0 | LR(0) | Fewest | Most |
| SLR(1) | FOLLOW sets | SLR(1) ⊃ LR(0) | Same as LR(0) | Fewer |
| LALR(1) | Merged lookaheads | LALR(1) ⊃ SLR(1) | Same as LR(0) | Fewest (practical) |
| CLR(1) / LR(1) | Full lookaheads | LR(1) ⊃ LALR(1) | Most | None for LR(1) grammars |
| LL(1) | FIRST/FOLLOW | LL(1) ⊂ LALR(1) | N/A | Fail on left recursion |
Containment: LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1). Also LL(1) ⊂ LALR(1), but LL(1) and SLR(1) are incomparable — an LL(1) grammar need not be SLR(1).
6. Three-Address Code Patterns
| Construct | Three-Address Code |
|---|---|
| a = b op c | t = b op c; a = t |
| a = b | a = b |
| if E goto L | [eval E → t]; if t goto L |
| if a relop b goto L | if a relop b goto L |
| x = a[i] | t = a[i]; x = t |
| a[i] = x | a[i] = x |
| call f(a,b) | param a; param b; call f, 2 |
| return x | return x |
| x = *p (deref) | x = *p |
| *p = x | *p = x |
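The first pattern in the table (a = b op c with temporaries) falls out of a simple post-order walk over the expression tree. A minimal sketch, assuming a hypothetical nested-tuple AST shape like ("+", "a", ("*", "b", "c")):

```python
import itertools

def gen(node, code, fresh):
    """Emit 3AC for a nested-tuple expression; returns the name holding its value."""
    if isinstance(node, str):
        return node                      # leaf: a variable name
    op, lhs, rhs = node
    l = gen(lhs, code, fresh)            # post-order: operands first
    r = gen(rhs, code, fresh)
    t = f"t{next(fresh)}"                # fresh temporary
    code.append(f"{t} = {l} {op} {r}")   # the 't = b op c' form from the table
    return t

code = []
res = gen(("+", "a", ("*", "b", "c")), code, itertools.count(1))
# emits: t1 = b * c, then t2 = a + t1
```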
If-Else Translation
if (E) S1 else S2:
    [code for E, result in t]
    if t == 0 goto L_else
    [code for S1]
    goto L_end
L_else: [code for S2]
L_end:
While Loop Translation
while (E) S:
L_test: [code for E, result in t]
    if t == 0 goto L_end
    [code for S]
    goto L_test
L_end:
7. Data Flow Analysis Equations
| Analysis | Direction | IN equation | OUT equation | Meet |
|---|---|---|---|---|
| Reaching Definitions | Forward | IN[B] = ∪ OUT[pred] | OUT[B] = gen[B] ∪ (IN[B] − kill[B]) | ∪ |
| Live Variables | Backward | IN[B] = use[B] ∪ (OUT[B] − def[B]) | OUT[B] = ∪ IN[succ] | ∪ |
| Available Expressions | Forward | IN[B] = ∩ OUT[pred] | OUT[B] = e_gen[B] ∪ (IN[B] − e_kill[B]) | ∩ |
| Very Busy Expressions | Backward | IN[B] = use[B] ∪ (OUT[B] − kill[B]) | OUT[B] = ∩ IN[succ] | ∩ |
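The live-variables row of the table translates directly into a fixed-point loop. A sketch on a small hand-made three-block CFG (the blocks and their use/def sets are illustrative, not from the sheet):

```python
# Each block: use set, def set, successor list (B2 loops back to B1).
blocks = {
    "B1": {"use": {"a"}, "def": {"b"}, "succ": ["B2"]},
    "B2": {"use": {"b"}, "def": {"c"}, "succ": ["B3", "B1"]},
    "B3": {"use": {"c"}, "def": set(), "succ": []},
}

IN = {b: set() for b in blocks}
OUT = {b: set() for b in blocks}
changed = True
while changed:                           # iterate to a fixed point
    changed = False
    for b, info in blocks.items():
        # OUT[B] = union of IN over successors (backward analysis)
        OUT[b] = set().union(*(IN[s] for s in info["succ"]))
        # IN[B] = use[B] ∪ (OUT[B] − def[B])
        new_in = info["use"] | (OUT[b] - info["def"])
        if new_in != IN[b]:
            IN[b] = new_in
            changed = True
```

The loop terminates because the sets only grow and the variable universe is finite; swapping the equations and the union for an intersection gives the other three analyses in the table.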
gen/kill/use/def Sets
| Analysis | gen[B] | kill[B] |
|---|---|---|
| Reaching Definitions | Definitions in B not overwritten later in B | All other definitions of same variable (anywhere in program) |
| Live Variables (use/def) | use[B]: vars used in B before being defined in B | def[B]: vars defined in B before being used in B |
| Available Expressions (e_gen/e_kill) | Expressions computed in B, operands not later modified in B | Expressions using any variable defined in B |
8. Optimisation Quick Reference
| Optimisation | What It Does | Enabled By |
|---|---|---|
| Constant Folding | Evaluate constant expressions at compile time | Local analysis |
| Constant Propagation | Replace variables with their known constant values | Reaching definitions |
| Copy Propagation | Replace x = y; use of x with direct use of y | Reaching definitions |
| Dead Code Elimination | Remove instructions whose results are never used | Live variable analysis |
| Common Subexpression Elimination (CSE) | Reuse previously computed expression instead of recomputing | Available expressions |
| Loop Invariant Code Motion | Move loop-invariant computations outside the loop | Reaching definitions + dominance |
| Strength Reduction | Replace expensive ops with cheaper ones (mul → add) | Induction variable analysis |
| Loop Unrolling | Replicate loop body, reduce iterations | Loop analysis |
| Inlining | Replace function call with function body | Call graph analysis |
| Peephole | Replace small instruction patterns with more efficient ones | Local pattern matching |
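The first two rows of the table often run together as one local pass. A minimal sketch over straight-line 3AC, assuming a hypothetical tuple format (dest, op, arg1, arg2) with integer literals as Python ints:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_and_propagate(code):
    """One local pass of constant propagation + folding over straight-line 3AC."""
    consts, out = {}, []
    for dest, op, a, b in code:
        a = consts.get(a, a)                 # constant propagation
        b = consts.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            consts[dest] = OPS[op](a, b)     # constant folding: evaluate now
        else:
            consts.pop(dest, None)           # dest no longer a known constant
            out.append((dest, op, a, b))
    return out, consts

code = [("t1", "+", 2, 3), ("t2", "*", "t1", "x"), ("y", "+", "t2", 1)]
out, consts = fold_and_propagate(code)
# t1 = 2 + 3 disappears entirely; its value 5 is propagated into t2's operand
```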
9. Register Allocation Summary
| Step | Action |
|---|---|
| 1. Liveness Analysis | Compute live-in and live-out sets for each basic block |
| 2. Build Interference Graph | Add edge between two variables if they are simultaneously live |
| 3. Coalesce (optional) | Merge copy-related variables (x = y: prefer same register) |
| 4. Simplify | Remove nodes with degree < k; push onto stack |
| 5. Spill (if needed) | Select variable to spill; remove it and continue |
| 6. Colour (Select) | Pop nodes, assign colours (registers) greedily |
| 7. Rewrite | Insert LOAD/STORE for spilled variables; repeat from step 1 |
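Steps 4–6 (simplify, spill test, select) can be sketched as a Chaitin-style colouring pass. The interference graphs below are hand-made examples (hypothetical variables a–d); spilling itself is not implemented, the function just reports that a spill would be needed:

```python
def colour(graph, k):
    """Simplify/select colouring; returns {var: register} or None if a spill is needed."""
    stack, work = [], {v: set(ns) for v, ns in graph.items()}
    while work:
        # simplify: any node with degree < k is guaranteed colourable
        node = next((v for v in work if len(work[v] & work.keys()) < k), None)
        if node is None:
            return None                       # all degrees >= k: spill needed
        stack.append(node)
        del work[node]
    colours = {}
    while stack:                              # select: pop and colour greedily
        v = stack.pop()
        used = {colours[n] for n in graph[v] if n in colours}
        colours[v] = min(set(range(k)) - used)
    return colours

# chain a-b-c-d: 2 registers suffice
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
regs = colour(chain, 2)
# triangle: every node has degree 2, so no node can be simplified with k = 2
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

Select never gets stuck: each popped node had fewer than k neighbours when removed, so at most k − 1 of its neighbours are already coloured.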
Key Formulae
- Graph is k-colourable ⇒ no spills needed
- Node with degree < k can always be coloured (Chaitin's key insight)
- Spill cost heuristic: prefer to spill variables used outside loops
10. Frequently Asked Questions
What is the difference between S-attributed and L-attributed grammars?
An S-attributed grammar uses only synthesised attributes — values flow bottom-up from children to parent. Every S-attributed grammar can be evaluated during LR parsing (YACC semantic actions). An L-attributed grammar additionally allows inherited attributes that flow from a node’s parent or left siblings — but not right siblings. L-attributed grammars can be evaluated in a single left-to-right pass compatible with LL parsing and one-pass compilers. Every S-attributed grammar is also L-attributed; the reverse is not true.
What is the difference between quadruples and triples in intermediate code?
Both represent three-address instructions. Quadruples are 4-field records (op, arg1, arg2, result) — the result is explicitly named as a temporary variable. This makes reordering easy since references use variable names, not positions. Triples are 3-field records (op, arg1, arg2) — the result is the instruction’s array index. Triples are more compact but harder to reorder since moving an instruction changes its index and breaks all references to it.
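The positional-reference idea is easy to see in code. A small sketch converting one to the other (the tuple layouts are illustrative, and it assumes each temporary is defined before it is used):

```python
def quads_to_triples(quads):
    """Convert quadruples (op, arg1, arg2, result) to triples (op, arg1, arg2),
    replacing temporary names with positional references."""
    pos, triples = {}, []
    for i, (op, a1, a2, res) in enumerate(quads):
        a1 = pos.get(a1, a1)           # swap temp name for its index, if any
        a2 = pos.get(a2, a2)
        triples.append((op, a1, a2))
        pos[res] = ("ref", i)          # later uses point at instruction i
    return triples

# a + a*b in both forms:
quads = [("*", "a", "b", "t1"), ("+", "a", "t1", "t2")]
triples = quads_to_triples(quads)      # second triple references index 0
```

The fragility is visible here: reordering `quads` is harmless, but reordering `triples` would invalidate every ("ref", i) entry.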
What is the difference between reaching definitions and available expressions?
Reaching definitions tracks which assignments (definitions) can reach a given point. It uses union (any definition that reaches along any path). Available expressions tracks which expressions have been computed and not invalidated on all paths leading to a point — it uses intersection (must be available on every path). Reaching definitions is used for constant propagation; available expressions is used for global CSE.
How is lexical analysis related to regular languages?
Token patterns (like identifiers, numbers, keywords) are regular languages — each pattern is a regular expression. Since every regular expression can be converted to a DFA, and DFAs run in O(n) linear time, lexers are extremely fast. The pipeline is: write token regex → Thompson’s construction gives NFA → subset construction gives DFA → minimise DFA → implement as transition table. Tools like LEX/FLEX automate this entire pipeline.
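A minimal lexer in this style fits in a few lines using Python's `re` (the token names and patterns are illustrative). Note one assumption to flag: Python's alternation picks the first alternative that matches at a position, not the longest match as in LEX, so pattern order matters:

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# one master regex with a named group per token class
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    tokens = []
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":            # drop whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

# tokenize("x = 42 + y") yields ID, OP, NUMBER, OP, ID tokens in order
```

Under the hood `re` compiles the pattern once, playing the role of the regex → automaton step; generated lexers like FLEX go further and emit an explicit minimised DFA transition table.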