Basic Parsing Techniques
Parsers (Syntax Analyzer)
A parser is a component of the compiler that analyzes the syntactic structure of a program using a context-free grammar (CFG).
Role of Parser
- Takes tokens from lexical analyzer
- Checks grammar correctness
- Constructs parse tree / syntax tree
- Reports syntax errors
Types of Parsers
| Basis | Classification |
|---|---|
| Direction of parsing | Top-down, Bottom-up |
| Parsing method | Predictive, Shift-Reduce |
| Automation | Manual, Automatic (LR) |
Shift-Reduce Parsing (Bottom-Up Parsing)
Shift-Reduce Parsing is a bottom-up parsing technique that builds the parse tree from leaves to root.
Core Idea
- Reduce input string to start symbol
- Uses a stack and input buffer
Basic Operations
| Operation | Description |
|---|---|
| Shift | Push input symbol to stack |
| Reduce | Replace RHS of production with LHS |
| Accept | Successful parsing |
| Error | Invalid syntax |
Example (Conceptual)
Grammar:
Input:
Conflicts in Shift-Reduce Parsing
| Conflict | Meaning |
|---|---|
| Shift-Reduce Conflict | Unclear whether to shift or reduce |
| Reduce-Reduce Conflict | Multiple reductions possible |
Operator Precedence Parsing
Operator Precedence Parsing is a special form of shift-reduce parsing that handles expressions with operators.
Key Characteristics
- Works only with operator grammars
- Uses precedence relations
Operator Precedence Relations
| Relation | Meaning |
|---|---|
< | Lower precedence |
> | Higher precedence |
= | Equal precedence |
Advantages
- Simple to implement
- Efficient for arithmetic expressions
Limitation
-
Cannot parse general CFGs
Top-Down Parsing
Top-Down Parsing constructs the parse tree from root to leaves.
Characteristics
- Starts with start symbol
- Tries to derive input string
- Uses leftmost derivation
Problem
- Left recursion
- Backtracking
Predictive Parsing
Predictive Parsing is a top-down parsing technique without backtracking.
Requirements
- Grammar must be LL(1)
- Uses parsing table
Components
| Component | Description |
|---|---|
| Stack | Stores grammar symbols |
| Input buffer | Tokens |
| Parsing table | Decision guide |
Advantages
- Fast
- Deterministic
- Easy error detection
Automatic Construction of Efficient Parsers
Manual construction of parsers is complex. Hence automatic parser generators are used.
Examples
- YACC
- Bison
These tools generate LR parsers automatically.
LR Parsers
LR Parser is a bottom-up parser that reads input Left-to-right and produces Rightmost derivation in reverse.
Why LR Parsers Are Powerful
- Handle large class of grammars
- Detect syntax errors early
- Used in real compilers
Types of LR Parsers
| Type | Description |
|---|---|
| LR(0) | Simplest |
| SLR(1) | Uses FOLLOW sets |
| CLR(1) | Canonical LR |
| LALR(1) | Efficient and compact |
Canonical Collection of LR(0) Items
LR(0) Item
An LR(0) item is a production with a dot (.) indicating parsing position.
Example:
Purpose
- Represents parser states
- Used to construct ACTION and GOTO tables
Steps to Construct Canonical Collection
- Augment the grammar
- Create initial item with dot at start
- Apply Closure()
- Apply Goto()
- Repeat until no new items
Closure Operation
Adds all productions for a non-terminal appearing after dot.
GOTO Operation
Moves dot over a grammar symbol to generate new state.
Canonical Collection Table (Conceptual)
| State | Items |
|---|---|
| I₀ | S' → .S |
| I₁ | S → a.S |
| I₂ | S → a |
Comparison of Parsing Techniques
| Technique | Direction | Grammar Type | Complexity |
|---|---|---|---|
| Predictive | Top-Down | LL(1) | Low |
| Shift-Reduce | Bottom-Up | CFG | Medium |
| Operator Precedence | Bottom-Up | Operator Grammar | Medium |
| LR | Bottom-Up | Most CFGs | High |
Exam-Oriented Summary
| Topic | Key Point |
|---|---|
| Parser | Syntax checking |
| Shift-Reduce | Stack-based bottom-up |
| Operator Precedence | Expression parsing |
| Top-Down | Root to leaves |
| Predictive | No backtracking |
| LR Parser | Powerful bottom-up |
| LR(0) Items | Parser states |
Constructing SLR (Simple LR) Parsing Tables
SLR parsing is an improvement over LR(0) parsing. It uses FOLLOW sets to resolve reduce actions and eliminate some conflicts.
Steps to Construct an SLR Parsing Table
-
Augment the grammar
Add a new start symbol: -
Construct the Canonical Collection of LR(0) items
-
Compute FOLLOW sets for all non-terminals
-
Fill ACTION and GOTO tables
-
If
A → α . a βandais a terminal → Shift -
If
A → α .→ Reduce using FOLLOW(A) -
If
S' → S .→ Accept
-
Characteristics of SLR
| Feature | Description |
|---|---|
| Grammar power | More than LR(0) |
| Conflict handling | Uses FOLLOW sets |
| Table size | Smaller |
| Limitation | Cannot handle all CFGs |
Constructing Canonical LR (CLR / LR(1)) Parsing Tables
Canonical LR parsing is the most powerful LR technique. It uses LR(1) items with lookahead symbols.
LR(1) Item Format
Where:
-
ais a lookahead terminal
Steps for CLR Table Construction
- Augment grammar
- Construct Canonical Collection of LR(1) items
- Apply Closure and GOTO operations
- Build ACTION and GOTO tables using lookaheads
Characteristics of CLR
| Feature | Description |
|---|---|
| Grammar power | Maximum |
| Conflict resolution | Very strong |
| Table size | Very large |
| Practical use | Rare (memory heavy) |
Constructing LALR (Look-Ahead LR) Parsing Tables
LALR parsing combines the power of CLR with the compact size of SLR.
Construction Method
- Construct Canonical LR(1) collection
- Merge states having same LR(0) core
- Combine lookahead sets
- Construct ACTION and GOTO tables
Why LALR Is Popular
| Feature | LALR | CLR |
|---|---|---|
| Grammar power | High | Very High |
| Table size | Small | Large |
| Used in practice | Yes | Rare |
Most compiler tools (YACC) use LALR parsing.
Using Ambiguous Grammars in LR Parsing
An ambiguous grammar produces multiple parse trees for the same input string.
Example
Expression:
Problems in Parsing Tables
- Shift-Reduce conflicts
- Reduce-Reduce conflicts
Handling Ambiguity
- Rewrite grammar (preferred)
- Use precedence and associativity rules
- Use parser-generator directives
Automatic Parser Generator
An automatic parser generator is a tool that automatically constructs parsing tables and parser code from grammar rules.
Examples
| Tool | Parsing Technique |
|---|---|
| YACC | LALR(1) |
| Bison | LALR / GLR |
| ANTLR | LL(*) |
Advantages
- Reduces manual effort
- Reliable and efficient
- Used in real-world compilers
Implementation of LR Parsing Tables
LR parsing tables consist of two main parts:
(a) ACTION Table
| Input | Action |
|---|---|
| Terminal | Shift / Reduce / Accept |
(b) GOTO Table
| Input | Transition |
|---|---|
| Non-terminal | Next State |
LR Parsing Algorithm (Conceptual)
- Initialize stack with state 0
- Read input symbol
- Consult ACTION table
- Perform shift or reduce
- Update stack using GOTO
- Continue until accept or error
Comparison of LR Parsing Techniques
| Feature | SLR | LALR | CLR |
|---|---|---|---|
| Grammar power | Medium | High | Very High |
| Table size | Small | Small | Large |
| Conflict handling | Limited | Good | Best |
| Practical use | Low | Very High | Low |
Exam-Oriented Summary Table
| Topic | Key Points |
|---|---|
| SLR Parsing | FOLLOW-based reduce actions |
| CLR Parsing | LR(1) lookaheads |
| LALR Parsing | Merged LR(1) states |
| Ambiguous Grammar | Causes conflicts |
| Parser Generator | Automates parser creation |
| LR Tables | ACTION + GOTO |