- A Deterministic Finite Automaton (DFA) has exactly one transition per state per input symbol — no ambiguity.
- An NFA allows multiple transitions (or none) per symbol, but recognises the same class of languages as DFAs.
- Every regular expression corresponds to a finite automaton, and vice versa — they describe exactly the same languages.
- The Pumping Lemma is the main tool for proving a language is not regular.
- Regular languages are closed under union, intersection, complement, concatenation, and Kleene star.
- Minimisation of a DFA removes redundant states while preserving the recognised language.
1. Deterministic Finite Automaton (DFA)
A DFA is the simplest model of computation. Think of it as a machine reading a string one character at a time, moving between states, and deciding at the end whether to accept or reject. A vending machine is a perfect real-world analogy — it moves between states (waiting, received coin, item selected) based on inputs, with no memory of what happened multiple steps ago.
Formal Definition
A DFA is a 5-tuple M = (Q, Σ, δ, q0, F) where:
- Q — finite set of states
- Σ — finite input alphabet
- δ : Q × Σ → Q — transition function (always defined, exactly one next state)
- q0 ∈ Q — start state
- F ⊆ Q — set of accept states
A string w is accepted if, after reading all of w, the machine is in an accept state. The language of M, written L(M), is the set of all strings it accepts.
Example: DFA for strings ending in “01” over {0,1}
States: q0 (start), q1 (last symbol was 0), q2 (last two symbols were 01 — accept). The transition table:
| State | Input 0 | Input 1 |
|---|---|---|
| q0 (start) | q1 | q0 |
| q1 | q1 | q2 |
| q2 (accept) | q1 | q0 |
On input “1001”: q0 →(1) q0 →(0) q1 →(0) q1 →(1) q2. Ends in accept state — accepted.
2. Nondeterministic Finite Automaton (NFA)
An NFA relaxes the DFA constraint: from any state on any input, there can be zero, one, or multiple transitions. There can also be epsilon (ε) transitions — moves that consume no input. An NFA accepts a string if any path through the machine leads to an accept state.
NFAs are not more powerful than DFAs — they recognise exactly the same languages. But they are often easier to design, because you can branch on uncertainty and “guess” the right path.
Formal Definition
An NFA is a 5-tuple M = (Q, Σ, δ, q0, F) where:
- δ : Q × (Σ ∪ {ε}) → 2Q — transition function returns a set of states (including the empty set)
Epsilon Closure
The epsilon closure of a state q, written ε-closure(q), is the set of all states reachable from q using only ε transitions (including q itself). This is needed when converting an NFA to a DFA.
3. NFA to DFA Conversion (Subset Construction)
Every NFA can be converted to an equivalent DFA using the subset construction (also called powerset construction). Each DFA state represents a set of NFA states that could be active simultaneously.
Algorithm
- Start state of DFA = ε-closure({q0})
- For each DFA state S and each input symbol a, compute: δ'(S, a) = ε-closure(⋃q ∈ S δ(q, a))
- A DFA state is an accept state if it contains any NFA accept state
- Repeat until no new states are discovered
Worst case: An NFA with n states can produce a DFA with up to 2n states. In practice, many subsets are unreachable and the DFA is much smaller.
4. Regular Expressions
A regular expression (regex) is a compact notation for describing a regular language. The three fundamental operations are:
| Operation | Notation | Meaning |
|---|---|---|
| Union | R | S or R + S | Strings in L(R) or L(S) |
| Concatenation | RS | Strings formed by an R-string followed by an S-string |
| Kleene Star | R* | Zero or more repetitions of R-strings (including empty string ε) |
Operator Precedence (high to low)
Star (*) > Concatenation > Union (|). So ab*|c means (a(b*)) | c, not a(b*|c).
Equivalence with Finite Automata
- Regex → NFA: Thompson’s construction — build NFA inductively from the regex structure.
- DFA → Regex: State elimination — remove states one by one, annotating transitions with partial regular expressions.
These conversions prove that Regular Expressions, NFAs, and DFAs all define exactly the same class of languages — the regular languages.
Common Regex Patterns
| Pattern | Regex |
|---|---|
| Any string over {a,b} | (a|b)* |
| Strings with at least one ‘a’ | (a|b)*a(a|b)* |
| Strings of even length over {a,b} | ((a|b)(a|b))* |
| Strings not containing ‘ab’ | b*a*(b+a*)* — requires careful construction |
| Strings ending in 01 | (0|1)*01 |
5. Pumping Lemma for Regular Languages
The Pumping Lemma is the standard tool for proving that a language is not regular. The idea: any sufficiently long string in a regular language must contain a repeatable section (the “pump”). If pumping always leads outside the language, the language cannot be regular.
Statement
If L is a regular language, then there exists a pumping length p such that for any string w ∈ L with |w| ≥ p, we can write w = xyz where:
- |xy| ≤ p
- |y| ≥ 1 (y is non-empty)
- For all i ≥ 0, xyiz ∈ L
Proof Strategy (by contradiction)
- Assume L is regular (so a pumping length p exists)
- Choose a specific string w ∈ L with |w| ≥ p (you choose this cleverly)
- Show that for every way to split w = xyz satisfying the conditions, there exists some i where xyiz ∉ L
- This contradicts the pumping lemma, so L is not regular
Example: Proving {0n1n | n ≥ 0} is not regular
Assume it is regular with pumping length p. Choose w = 0p1p. Any split w = xyz with |xy| ≤ p means y consists only of 0s (y = 0k for some k ≥ 1). Pump to i=2: xy2z = 0p+k1p. This has more 0s than 1s, so it ∉ L. Contradiction — the language is not regular.
6. Closure Properties
Regular languages are closed under the following operations — the result is always another regular language:
| Operation | Closed? | Construction |
|---|---|---|
| Union (L1 ∪ L2) | Yes | NFA: start state with ε to both machines |
| Intersection (L1 ∩ L2) | Yes | Product construction on DFAs |
| Complement (Σ* − L) | Yes | Swap accept/non-accept in DFA |
| Concatenation (L1 · L2) | Yes | NFA: ε from L1 accepts to L2 start |
| Kleene Star (L*) | Yes | NFA: new start/accept + ε loops |
| Difference (L1 − L2) | Yes | L1 ∩ complement(L2) |
| Reversal (LR) | Yes | Reverse all transitions in DFA |
| Homomorphism | Yes | Apply substitution to each symbol |
7. DFA Minimisation
A minimal DFA for a given language has the fewest possible states. Minimisation removes states that are “equivalent” — they behave identically on all future inputs.
Table-Filling Algorithm (Myhill-Nerode)
- Mark all pairs (accept, non-accept) as distinguishable
- For each unmarked pair (p, q) and each input symbol a: if (δ(p,a), δ(q,a)) is already marked distinguishable, mark (p,q) as distinguishable
- Repeat until no new pairs are marked
- Merge all remaining unmarked pairs — they are equivalent states
The Myhill-Nerode theorem provides a deeper result: a language L is regular if and only if it has a finite number of equivalence classes under the relation “x ∼ y iff for all z, xz ∈ L ⇔ yz ∈ L.” The number of classes equals the number of states in the minimal DFA.
8. Common Misconceptions
Misconception 1: NFAs are more powerful than DFAs.
False. They recognise exactly the same class of languages. NFAs are more convenient to design but not more expressive. Any NFA can be converted to an equivalent DFA.
Misconception 2: ε transitions make NFAs accept more languages.
No. Epsilon transitions are just a convenience. You can always eliminate them (epsilon-closure computation) and get an NFA without ε transitions that recognises the same language.
Misconception 3: The Pumping Lemma can prove a language IS regular.
No. The Pumping Lemma can only disprove regularity. If a language satisfies the pumping condition, it does not necessarily mean the language is regular (some non-regular languages also satisfy it). Regularity must be proved by constructing a DFA or regex.
Misconception 4: In subset construction, you always get 2n DFA states from an n-state NFA.
2n is the worst case (upper bound). Most reachable subsets are never reached, and the actual DFA is often much smaller.
Misconception 5: The start state of a DFA can also be an accept state.
This is actually allowed — and it means the empty string ε is in the language. It is not an error; it is a deliberate design choice when ε should be accepted.
9. Frequently Asked Questions
What is the difference between a DFA and an NFA?
In a DFA, from each state on each input symbol, there is exactly one next state — no ambiguity. In an NFA, there can be zero, one, or many next states for each input, plus epsilon (no-input) transitions. Both recognise the same class of languages (regular languages). DFAs are easier to implement; NFAs are easier to design. You can always convert one to the other.
Why does NFA to DFA conversion sometimes produce exponentially more states?
Each DFA state represents a subset of NFA states. With n NFA states, there are 2n possible subsets, giving a worst-case DFA of 2n states. This worst case actually occurs for some NFAs (e.g., the NFA for “strings where the n-th symbol from the end is 1” over {0,1}). However, in practice most subsets are unreachable from the start state, so the DFA is usually much smaller.
How do I know when to use the Pumping Lemma?
Use the Pumping Lemma when you suspect a language is not regular — typically when the language requires counting unbounded quantities (like matching equal numbers of a’s and b’s) or comparing lengths. If you can construct a DFA or regex for it, the language is regular. The Pumping Lemma is only for proving non-regularity, not for proving regularity.
What is the relationship between regular expressions and finite automata?
They are equivalent — every regular expression has a corresponding NFA (built by Thompson’s construction), and every DFA can be converted back to a regular expression (by state elimination). This three-way equivalence (regex = NFA = DFA) defines the class of regular languages. When you use Python’s re module or a UNIX grep pattern, the regex engine is internally building and running a finite automaton.
Is the language of all palindromes over {a,b} regular?
No. You can prove this with the Pumping Lemma. Choose w = apbap. Any split with |xy| ≤ p means y = ak for k ≥ 1. Pumping gives ap+kbap, which is not a palindrome. So the language of all palindromes over {a,b} is not regular — it requires a pushdown automaton (context-free language).