Basic Parsing Techniques



Parsers (Syntax Analyzer)

A parser is a component of the compiler that analyzes the syntactic structure of a program using a context-free grammar (CFG).

Role of Parser

  • Takes tokens from lexical analyzer
  • Checks grammar correctness
  • Constructs parse tree / syntax tree
  • Reports syntax errors

Types of Parsers

BasisClassification
Direction of parsingTop-down, Bottom-up
Parsing methodPredictive, Shift-Reduce
AutomationManual, Automatic (LR)

Shift-Reduce Parsing (Bottom-Up Parsing)

Shift-Reduce Parsing is a bottom-up parsing technique that builds the parse tree from leaves to root.

Core Idea

  • Reduce input string to start symbol
  • Uses a stack and input buffer

Basic Operations

OperationDescription
ShiftPush input symbol to stack
ReduceReplace RHS of production with LHS
AcceptSuccessful parsing
ErrorInvalid syntax

Example (Conceptual)

Grammar:

E → E + T | T T → id

Input:

id + id

Conflicts in Shift-Reduce Parsing

ConflictMeaning
Shift-Reduce ConflictUnclear whether to shift or reduce
Reduce-Reduce ConflictMultiple reductions possible

Operator Precedence Parsing

Operator Precedence Parsing is a special form of shift-reduce parsing that handles expressions with operators.

Key Characteristics

  • Works only with operator grammars
  • Uses precedence relations

Operator Precedence Relations

RelationMeaning
<Lower precedence
>Higher precedence
=Equal precedence

Advantages

  • Simple to implement
  • Efficient for arithmetic expressions

Limitation

  • Cannot parse general CFGs

Top-Down Parsing

Top-Down Parsing constructs the parse tree from root to leaves.

Characteristics

  • Starts with start symbol
  • Tries to derive input string
  • Uses leftmost derivation

Problem

  • Left recursion
  • Backtracking

Predictive Parsing

Predictive Parsing is a top-down parsing technique without backtracking.

Requirements

  • Grammar must be LL(1)
  • Uses parsing table

Components

ComponentDescription
StackStores grammar symbols
Input bufferTokens
Parsing tableDecision guide

Advantages

  • Fast
  • Deterministic
  • Easy error detection

Automatic Construction of Efficient Parsers

Manual construction of parsers is complex. Hence automatic parser generators are used.

Examples

  • YACC
  • Bison

These tools generate LR parsers automatically.

LR Parsers

LR Parser is a bottom-up parser that reads input Left-to-right and produces Rightmost derivation in reverse.

Why LR Parsers Are Powerful

  • Handle large class of grammars
  • Detect syntax errors early
  • Used in real compilers

Types of LR Parsers

TypeDescription
LR(0)Simplest
SLR(1)Uses FOLLOW sets
CLR(1)Canonical LR
LALR(1)Efficient and compact


Canonical Collection of LR(0) Items

LR(0) Item

An LR(0) item is a production with a dot (.) indicating parsing position.

Example:

A → α . β

Purpose

  • Represents parser states
  • Used to construct ACTION and GOTO tables

Steps to Construct Canonical Collection

  • Augment the grammar
  • Create initial item with dot at start
  • Apply Closure()
  • Apply Goto()
  • Repeat until no new items

Closure Operation

Adds all productions for a non-terminal appearing after dot.

GOTO Operation

Moves dot over a grammar symbol to generate new state.

Canonical Collection Table (Conceptual)

StateItems
I₀S' → .S
I₁S → a.S
I₂S → a

Comparison of Parsing Techniques

TechniqueDirectionGrammar TypeComplexity
PredictiveTop-DownLL(1)Low
Shift-ReduceBottom-UpCFGMedium
Operator PrecedenceBottom-UpOperator GrammarMedium
LRBottom-UpMost CFGsHigh

Exam-Oriented Summary

TopicKey Point
ParserSyntax checking
Shift-ReduceStack-based bottom-up
Operator PrecedenceExpression parsing
Top-DownRoot to leaves
PredictiveNo backtracking
LR ParserPowerful bottom-up
LR(0) ItemsParser states

Constructing SLR (Simple LR) Parsing Tables

SLR parsing is an improvement over LR(0) parsing. It uses FOLLOW sets to resolve reduce actions and eliminate some conflicts.

Steps to Construct an SLR Parsing Table

  1. Augment the grammar
    Add a new start symbol:

    S' → S
  2. Construct the Canonical Collection of LR(0) items

  3. Compute FOLLOW sets for all non-terminals

  4. Fill ACTION and GOTO tables

    • If A → α . a β and a is a terminal → Shift

    • If A → α .Reduce using FOLLOW(A)

    • If S' → S .Accept

Characteristics of SLR

FeatureDescription
Grammar powerMore than LR(0)
Conflict handlingUses FOLLOW sets
Table sizeSmaller
LimitationCannot handle all CFGs

Constructing Canonical LR (CLR / LR(1)) Parsing Tables

Canonical LR parsing is the most powerful LR technique. It uses LR(1) items with lookahead symbols.

LR(1) Item Format

[A → α . β , a]

Where:

  • a is a lookahead terminal

Steps for CLR Table Construction

  • Augment grammar
  • Construct Canonical Collection of LR(1) items
  • Apply Closure and GOTO operations
  • Build ACTION and GOTO tables using lookaheads

Characteristics of CLR

FeatureDescription
Grammar powerMaximum
Conflict resolutionVery strong
Table sizeVery large
Practical useRare (memory heavy)

Constructing LALR (Look-Ahead LR) Parsing Tables

LALR parsing combines the power of CLR with the compact size of SLR.

Construction Method

  • Construct Canonical LR(1) collection
  • Merge states having same LR(0) core
  • Combine lookahead sets
  • Construct ACTION and GOTO tables

Why LALR Is Popular

FeatureLALRCLR
Grammar powerHighVery High
Table sizeSmallLarge
Used in practiceYesRare

Most compiler tools (YACC) use LALR parsing.

Using Ambiguous Grammars in LR Parsing

An ambiguous grammar produces multiple parse trees for the same input string.

Example

EE + E | E * E | id

Expression:

id + id * id

Problems in Parsing Tables

  • Shift-Reduce conflicts
  • Reduce-Reduce conflicts

Handling Ambiguity

  • Rewrite grammar (preferred)
  • Use precedence and associativity rules
  • Use parser-generator directives

Automatic Parser Generator

An automatic parser generator is a tool that automatically constructs parsing tables and parser code from grammar rules.

Examples

ToolParsing Technique
YACCLALR(1)
BisonLALR / GLR
ANTLRLL(*)

Advantages

  • Reduces manual effort
  • Reliable and efficient
  • Used in real-world compilers

Implementation of LR Parsing Tables

LR parsing tables consist of two main parts:

(a) ACTION Table

InputAction
TerminalShift / Reduce / Accept

(b) GOTO Table

InputTransition
Non-terminalNext State

LR Parsing Algorithm (Conceptual)

  • Initialize stack with state 0
  • Read input symbol
  • Consult ACTION table
  • Perform shift or reduce
  • Update stack using GOTO
  • Continue until accept or error

Comparison of LR Parsing Techniques

FeatureSLRLALRCLR
Grammar powerMediumHighVery High
Table sizeSmallSmallLarge
Conflict handlingLimitedGoodBest
Practical useLowVery HighLow

Exam-Oriented Summary Table

TopicKey Points
SLR ParsingFOLLOW-based reduce actions
CLR ParsingLR(1) lookaheads
LALR ParsingMerged LR(1) states
Ambiguous GrammarCauses conflicts
Parser GeneratorAutomates parser creation
LR TablesACTION + GOTO