Semantic Analysis – Symbol Tables, Type Checking and Syntax-Directed Translation Explained

Key Takeaways:

  • Semantic analysis checks that a syntactically correct program is also meaningful — variables are declared before use, types match, function signatures are respected.
  • The symbol table stores information about every identifier: its name, type, scope, and memory location. It is consulted and updated throughout all compiler phases.
  • Syntax-Directed Translation (SDT) attaches semantic rules to grammar productions, allowing the compiler to compute attributes (type, value, code) during parsing.
  • Synthesised attributes flow bottom-up (from children to parent in the parse tree). Inherited attributes flow top-down (from parent or sibling to child).
  • An S-attributed grammar uses only synthesised attributes and can be evaluated during LR parsing. An L-attributed grammar also allows inherited attributes, compatible with LL and one-pass parsing.
  • Type checking verifies that operations are applied to compatible types and inserts implicit coercions (widening) where allowed.

1. What Semantic Analysis Checks

A program can be syntactically correct (it has valid grammar) but semantically wrong (it doesn’t make sense). Semantic analysis catches errors that the parser cannot:

  • Undeclared identifiers: Using a variable that was never declared
  • Redeclared identifiers: Declaring the same variable twice in the same scope
  • Type errors: Adding a string to an integer, assigning a float to a boolean variable
  • Wrong number of arguments: Calling a function with too many or too few parameters
  • Return type mismatches: A function declared to return int but returning a string
  • Array bounds: Static analysis of array index types (dynamic bounds checking happens at runtime)
  • Control flow: break or continue outside a loop, missing return in all paths
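The first two checks above can be sketched with a minimal pass over a toy statement list (the mini-language and error messages here are illustrative, not any real compiler's):

```python
# Minimal sketch: detecting undeclared and redeclared identifiers in a toy
# mini-language where each statement is ("decl", name) or ("use", name).

def check(stmts):
    declared = set()
    errors = []
    for kind, name in stmts:
        if kind == "decl":
            if name in declared:
                errors.append(f"redeclared identifier: {name}")
            declared.add(name)
        elif kind == "use":
            if name not in declared:
                errors.append(f"undeclared identifier: {name}")
    return errors

print(check([("decl", "x"), ("use", "x"), ("use", "y"), ("decl", "x")]))
# reports 'y' as undeclared and the second 'x' as redeclared
```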

2. Symbol Tables

The symbol table is the compiler’s central repository of information about every identifier in the program. Every phase reads from or writes to it.

Information Stored per Entry

  • Name: count, printResult, MAX_SIZE
  • Category: variable, function, type, constant, label
  • Data type: int, float, char*, struct Point
  • Scope level: global (0), function (1), block (2)
  • Memory location: address, offset from base pointer, register
  • For functions: return type, parameter types and count
  • For arrays: element type, dimensions, sizes
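One possible shape for such an entry is sketched below (the field names and the `Symbol` class are illustrative; real compilers vary the layout per category):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative symbol-table entry holding the fields listed above.
@dataclass
class Symbol:
    name: str
    category: str                  # "variable", "function", "type", ...
    data_type: str                 # "int", "float", "char*", ...
    scope_level: int               # 0 = global, 1 = function, 2 = block
    offset: Optional[int] = None   # memory location, once assigned
    params: Optional[list] = None  # parameter types, for functions

# A plain dict gives the hash-table behaviour described below.
table = {}
table["count"] = Symbol("count", "variable", "int", 1, offset=8)
table["printResult"] = Symbol("printResult", "function", "void", 0,
                              params=["int"])
```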

Implementation

Symbol tables are typically implemented as hash tables for O(1) average-case lookup. For scoped lookup (nested blocks), a chained hash table or a stack of hash tables is used — the innermost scope’s table is on top.

3. Scope Rules

Scope determines the region of code in which a name is visible. Two common scoping models:

Static (Lexical) Scoping

A name refers to the declaration in the nearest enclosing scope at the point where the name appears in the source text. Most modern languages (C, Java, Python) use lexical scoping. The compiler can determine all bindings at compile time.

Dynamic Scoping

A name refers to the most recently activated declaration on the call stack at runtime. Early Lisp dialects and some shell languages use dynamic scoping. It requires runtime symbol-table lookup and is harder to understand and analyse statically.

Block-Structured Scope (C, Java)

Each new block ({ }) creates a new scope. Names in inner scopes shadow (hide) names in outer scopes with the same name. When the block exits, its scope is destroyed. The compiler maintains a scope stack:

  • On block entry: push a new scope table
  • On name declaration: add to current (top) scope table
  • On name lookup: search from innermost scope outward
  • On block exit: pop the scope table
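The four operations above can be sketched as a stack of dictionaries, innermost scope last (a minimal illustration, not a production design):

```python
# Sketch of the scope stack: a list of dicts, with the innermost scope on top.
class ScopeStack:
    def __init__(self):
        self.scopes = [{}]            # scope 0: global

    def enter_block(self):
        self.scopes.append({})        # block entry: push a new scope table

    def exit_block(self):
        self.scopes.pop()             # block exit: pop the scope table

    def declare(self, name, info):
        self.scopes[-1][name] = info  # add to current (top) scope

    def lookup(self, name):
        for scope in reversed(self.scopes):  # search innermost outward
            if name in scope:
                return scope[name]
        return None                   # undeclared

s = ScopeStack()
s.declare("x", "int")
s.enter_block()
s.declare("x", "float")    # inner x shadows outer x
print(s.lookup("x"))       # float
s.exit_block()
print(s.lookup("x"))       # int
```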

4. Type Checking

Type checking verifies that operations are applied to operands of compatible types. It can be done:

  • Statically (at compile time): C, Java, Go — type errors are caught before running the program
  • Dynamically (at runtime): Python, JavaScript — type errors cause runtime exceptions
  • Hybrid (gradual typing): TypeScript, Python with type hints — optional static annotations layered on a dynamically typed runtime

Type Coercion and Widening

Coercion is the implicit conversion of a value from one type to another. Most languages allow safe widening conversions automatically:

int → long → float → double   (widening — no information lost)
double → int   (narrowing — possible loss, usually explicit cast required)

The type checker inserts an implicit widening coercion node in the AST when an int is used in a float expression.
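A minimal sketch of that insertion, assuming a toy tuple-based AST and the widening chain shown above (node shapes are illustrative):

```python
# Safe widening order from the text: int -> long -> float -> double.
WIDENING = ["int", "long", "float", "double"]

def coerce(node, node_type, target):
    """Return node unchanged, or wrap it in an implicit coercion node."""
    if node_type == target:
        return node
    if WIDENING.index(node_type) < WIDENING.index(target):
        return ("coerce", target, node)    # implicit widening: allowed
    raise TypeError(f"narrowing {node_type} -> {target} "
                    "requires an explicit cast")

# An int literal used in a float expression gets a coercion node:
print(coerce(("lit", 3), "int", "float"))
```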

Type Equivalence

  • Structural equivalence: Two types are equal if they have the same structure — the same fields, in the same order, with equivalent types. ML-family languages use this for record types.
  • Name equivalence: Two types are equal only if they have the same declared name. C uses this for structs (two structs with identical fields but different tags are distinct types), and Java uses it for classes.
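The contrast can be shown on two toy record types with identical layouts but different names (the `(name, fields)` representation is an assumption for illustration):

```python
# A toy type is (declared_name, tuple of (field_name, field_type) pairs).

def structurally_equal(t1, t2):
    return t1[1] == t2[1]       # compare field layout only

def name_equal(t1, t2):
    return t1[0] == t2[0]       # compare declared names only

point = ("Point", (("x", "int"), ("y", "int")))
pair  = ("Pair",  (("x", "int"), ("y", "int")))

print(structurally_equal(point, pair))   # True  — same layout
print(name_equal(point, pair))           # False — different names
```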

5. Syntax-Directed Translation (SDT)

Syntax-Directed Translation (SDT) is a technique that integrates semantic computation directly into the parsing process. Instead of building a full parse tree and then traversing it, the compiler computes semantic attributes during parsing by attaching rules to grammar productions.

SDT is the mechanism that allows the compiler to perform type checking, generate symbol table entries, and emit intermediate code — all in a single pass (or close to it).

6. Synthesised and Inherited Attributes

Each grammar symbol (terminal or non-terminal) can have attributes — values associated with a node in the parse tree. Attributes flow in one of two directions:

Synthesised Attributes

A synthesised attribute of node N is computed from N’s children’s attributes. Information flows bottom-up. The value of a non-terminal is determined by reducing its children.

Example: In E → E1 + T, the type of E is determined from the types of E1 and T:

E.type = type_check(E1.type, '+', T.type)

Inherited Attributes

An inherited attribute of node N is computed from N’s parent’s or siblings’ attributes. Information flows top-down. Used when context from above is needed — for example, passing the declared type down to sub-expressions in a declaration.

Example: In int a, b, c; — the type “int” is an inherited attribute passed to each variable in the declaration list.
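A small sketch of that flow: the base type, computed once from the parent, is handed down to every name in the list (the function and table shapes are illustrative):

```python
# The declared type is an inherited attribute: it flows from the parent
# declaration node down to each identifier in "int a, b, c;".

def process_declaration(base_type, names, symbol_table):
    for name in names:                 # each child inherits base_type
        symbol_table[name] = base_type
    return symbol_table

table = process_declaration("int", ["a", "b", "c"], {})
print(table)   # {'a': 'int', 'b': 'int', 'c': 'int'}
```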

Attribute Grammar Evaluation Order

  • S-attributed grammar: Only synthesised attributes. Can be evaluated bottom-up during LR parsing (this is how semantic actions in YACC work).
  • L-attributed grammar: Inherited attributes from left siblings or parent only. Can be evaluated top-down during LL parsing or with one extra traversal. Every S-attributed grammar is also L-attributed.

7. Syntax-Directed Definitions (SDD)

An SDD is a CFG with attributes and semantic rules attached to each production. Each semantic rule computes an attribute value. The rules are declarative — they describe what to compute, not in what order.

Example SDD for Simple Expressions

Production      Semantic Rule
E → E1 + T      E.val = E1.val + T.val
E → T           E.val = T.val
T → T1 * F      T.val = T1.val * F.val
T → F           T.val = F.val
F → ( E )       F.val = E.val
F → digit       F.val = digit.lexval
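One way to realise this SDD is a small recursive-descent evaluator in which each function returns the synthesised `.val` attribute of its non-terminal (a sketch for single-digit operands; the left recursion is rewritten as iteration):

```python
# Evaluator for the expression SDD: each parse function returns the
# synthesised .val attribute of its non-terminal.

def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok}, got {peek()}"
        pos += 1

    def E():                     # E -> T (+ T)*
        val = T()
        while peek() == "+":
            eat("+")
            val = val + T()      # E.val = E1.val + T.val
        return val

    def T():                     # T -> F (* F)*
        val = F()
        while peek() == "*":
            eat("*")
            val = val * F()      # T.val = T1.val * F.val
        return val

    def F():                     # F -> ( E ) | digit
        nonlocal pos
        if peek() == "(":
            eat("(")
            val = E()            # F.val = E.val
            eat(")")
            return val
        val = int(peek())        # F.val = digit.lexval
        pos += 1
        return val

    return E()

print(evaluate(list("2*(3+4)")))   # 14
```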

8. Translation Schemes

A translation scheme embeds program fragments (actions) directly into the right-hand sides of grammar productions, specifying when each action runs. Unlike SDDs (declarative), translation schemes are procedural — the order of actions matters.

E → E1 + T { print('+'); }
E → T
T → T1 * F { print('*'); }
T → F
F → digit { print(digit.lexval); }

This translation scheme prints a postfix expression for the arithmetic input. Translation schemes form the basis of code generation embedded in YACC/Bison grammars.
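The scheme above can be sketched as a recursive-descent translator in which each embedded action runs exactly where it appears in the production (a minimal illustration for single-digit operands, collecting output in a list rather than printing directly):

```python
# The postfix translation scheme: each embedded action runs at the point
# where it appears in the production's right-hand side.

def to_postfix(tokens):
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def E():                     # E -> T (+ T { print('+') })*
        T()
        while peek() == "+":
            advance()
            T()
            out.append("+")      # action runs after both operands

    def T():                     # T -> F (* F { print('*') })*
        F()
        while peek() == "*":
            advance()
            F()
            out.append("*")

    def F():                     # F -> digit { print(digit.lexval) }
        out.append(peek())
        advance()

    E()
    return " ".join(out)

print(to_postfix(list("9+5*2")))   # 9 5 2 * +
```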

9. Common Misconceptions

Misconception 1: Semantic analysis is the same as type checking.
Type checking is the most well-known part of semantic analysis, but semantic analysis also covers scope resolution, variable declaration checks, control-flow validation, function signature checking, and more. Type checking is one component.

Misconception 2: The symbol table is only used during semantic analysis.
The symbol table is used throughout: the lexer enters identifiers during tokenisation, the parser refines entries, semantic analysis adds type information, and code generation reads memory offsets and register assignments. It is a global compiler data structure.

Misconception 3: Inherited attributes cannot be used in LR parsers.
Standard LR parsers only naturally support S-attributed grammars (synthesised attributes). L-attributed grammars (with restricted inherited attributes) can also be implemented in LR parsers using special stack manipulation tricks. Pure arbitrary inherited attributes are harder and require post-parse traversal.

Misconception 4: Static typing and strong typing mean the same thing.
Static typing means types are checked at compile time (C, Java). Strong typing means no implicit type conversions that lose information (Python is dynamically typed but strongly typed — you can’t add an int and a string). C is statically typed but weakly typed (allows unsafe pointer casts). These are orthogonal properties.

Misconception 5: SDDs directly execute semantic rules in order.
SDDs are declarative — they define dependencies between attributes, not an execution order. The compiler infers a valid evaluation order from the dependency graph. Translation schemes (not SDDs) specify execution order by embedding actions in grammar rules.

10. Frequently Asked Questions

What is the difference between a syntax error and a semantic error?

A syntax error violates the grammar — the program cannot even be parsed into a valid parse tree. Example: missing semicolon, unmatched brace. A semantic error is syntactically valid but meaningless — the parse tree is built, but the semantic checks fail. Example: int x = "hello"; parses correctly but fails type checking. Syntax errors are caught by the parser; semantic errors are caught by the semantic analyser.

What is a syntax-directed translation and how is it different from a regular grammar?

A regular grammar only defines what strings are in the language — the structure. An SDT (Syntax-Directed Translation) adds semantic meaning to that structure by attaching attribute rules to each production. These rules compute values (types, code, evaluation results) during or after parsing. SDT transforms the grammar from a pure language description into a computation mechanism, enabling the compiler to do type checking and code generation in one or a few passes over the parse tree.

What is the difference between synthesised and inherited attributes?

A synthesised attribute’s value is computed from the attributes of the node’s children (bottom-up). A node’s synthesised attribute is “synthesised” from below. An inherited attribute’s value comes from the node’s parent or left siblings (top-down). Inherited attributes carry context downward — for example, passing a declared variable’s type down to sub-expressions. Synthesised attributes are simpler and compatible with LR parsing; inherited attributes require more careful implementation.

How does type coercion work in compilers?

When the type checker finds a type mismatch that is safe to resolve (like using an int where a float is expected), it inserts an implicit conversion node (a coercion node) in the AST. The code generator then emits the appropriate conversion instruction (like int-to-float instruction on the CPU). Unsafe coercions (like narrowing a double to an int) require an explicit cast in the source code — the programmer must acknowledge the potential data loss.
