Grammars and Parsing
Grammars and Sentence Structure
In Natural Language Processing, a grammar is a set of rules that defines how words combine to form valid sentences in a language.
Grammar helps a computer determine whether a sentence is syntactically correct.
Grammar rules are derived from Linguistics.
Components of Grammar
| Component | Description | Example |
|---|---|---|
| Terminal Symbols | Actual words in the sentence | boy, apple |
| Non-Terminal Symbols | Grammatical categories | NP, VP |
| Production Rules | Rules to generate sentences | S → NP + VP |
| Start Symbol | Starting point of grammar | S |
Example Grammar Rules
| Rule | Meaning |
|---|---|
| S → NP VP | Sentence consists of Noun Phrase + Verb Phrase |
| NP → Det N | Noun phrase |
| VP → V NP | Verb phrase |
| Det → the, a | Determiner |
| N → boy, apple | Noun |
| V → eats | Verb |
Example Sentence Structure
Sentence: “The boy eats an apple.”
Parse representation:
S
├── NP
│ ├── Det → The
│ └── N → boy
└── VP
├── V → eats
└── NP
├── Det → an
└── N → apple
This hierarchical structure helps computers understand sentence syntax.
Parsing in NLP
Parsing is the process of analyzing a sentence according to grammar rules to determine its syntactic structure.
Parsing identifies:
- sentence components
- grammatical relationships
- hierarchical structure
Example sentence: “Ram reads a book.”
Parser identifies:
| Component | Role |
|---|---|
| Ram | Subject |
| reads | Verb |
| book | Object |
Top-Down Parser
A Top-Down Parser starts parsing from the start symbol (S) and tries to generate the input sentence using grammar rules. It expands grammar rules until it matches the input sentence.
Steps of Top-Down Parsing
- Start with S
- Expand grammar rules
- Match terminal symbols with input words
- Continue until full sentence is generated
Example sentence: “The boy eats apple.”
Parsing process:
S
→ NP VP
→ Det N VP
→ The boy VP
→ The boy V NP
→ The boy eats apple
Advantages
| Advantage | Explanation |
|---|---|
| Simple design | Easy to implement |
| Clear derivation | Shows sentence generation |
Disadvantages
| Disadvantage | Explanation |
|---|---|
| Backtracking required | Wrong rule selection |
| Inefficient | Many unnecessary expansions |
Bottom-Up Parser
A Bottom-Up Parser starts from the input words and gradually builds the parse tree upwards toward the start symbol (S). It reduces groups of words into grammar symbols.
Parsing Process
Example: Sentence: “The boy eats apple.”
Step process:
| Step | Reduction |
|---|---|
| The | Det |
| boy | N |
| Det + N | NP |
| eats | V |
| apple | N |
| V + N | VP |
| NP + VP | S |
Advantages
| Advantage | Explanation |
|---|---|
| Efficient for many grammars | Less backtracking |
| Works well with ambiguous grammar |
Disadvantages
| Disadvantage | Explanation |
|---|---|
| Complex implementation | Harder than top-down |
| Large search space | Many reductions possible |
Difference Between Top-Down and Bottom-Up Parsing
| Feature | Top-Down Parsing | Bottom-Up Parsing |
|---|---|---|
| Starting Point | Start symbol (S) | Input words |
| Direction | Root → Leaves | Leaves → Root |
| Strategy | Predict structure | Build structure |
| Efficiency | Less efficient | More efficient |
Transition Network Grammars
A Transition Network Grammar represents grammar rules using a network of states and transitions. It is based on finite state machines used in Artificial Intelligence.
Components
| Component | Meaning |
|---|---|
| Node | State |
| Arc | Transition between states |
| Label | Word or category |
Example - Sentence structure:
Start → NP → VP → End
Graph representation:
(Start) → NP → VP → (End)
Transitions represent grammatical categories.
Augmented Transition Network (ATN)
An advanced version of transition network grammar.
Features:
- recursion
- memory registers
- conditions
Used for complex sentence parsing.
Top-Down Chart Parsing
Top-Down Chart Parsing is an improved version of top-down parsing that uses a chart data structure to store intermediate results and avoid repeated computations. It is widely used in syntactic parsing systems.
Components of Chart Parser
| Component | Description |
|---|---|
| Chart | Data structure storing partial parses |
| Edge | Partial grammar rule |
| Agenda | List of operations to perform |
Example Chart Entry
Sentence: “The boy eats apple.”
Chart entries may include:
| Position | Grammar Rule |
|---|---|
| 0–2 | NP |
| 2–3 | V |
| 3–4 | N |
| 2–4 | VP |
Finally:
NP + VP → S
Advantages
| Advantage | Explanation |
|---|---|
| Avoids repeated work | Stores partial results |
| Efficient parsing | Handles ambiguity |
| Works with large grammars | Suitable for NLP systems |
Summary Table
| Topic | Key Idea |
|---|---|
| Grammar | Rules to form valid sentences |
| Sentence Structure | Hierarchical representation |
| Top-Down Parser | Start from root symbol |
| Bottom-Up Parser | Start from input words |
| Transition Network Grammar | Grammar as state network |
| Chart Parsing | Efficient parsing using charts |
Short
Grammars and Parsing: Grammar is a set of rules that define the structure of sentences in a language. Parsing is the process of analyzing a sentence according to these grammatical rules to determine its syntactic structure. Major parsing techniques include Top-Down parsing, which starts from the start symbol and expands grammar rules, and Bottom-Up parsing, which begins with input words and builds the structure upward. Transition Network Grammars represent grammar using states and transitions, while Top-Down Chart Parsing improves efficiency by storing intermediate parsing results in a chart.
Feature Systems and Augmented Grammars
In Natural Language Processing, feature systems are used to represent grammatical properties of words such as number, gender, tense, and person.
Feature systems help parsers enforce grammatical agreement rules in sentences.
Example sentence: “She eats apples.”
The system checks:
| Word | Feature |
|---|---|
| She | Person = 3rd |
| eats | Verb agreement = 3rd person |
| apples | Number = plural |
These grammatical properties are represented using feature structures.
Basic Feature System for English
A feature system represents grammatical information using attribute–value pairs.
Example feature structure:
Noun
[
Number = Singular
Gender = Masculine
Person = Third
]
Common Features in English Grammar
| Feature | Meaning | Example |
|---|---|---|
| Number | Singular or plural | boy / boys |
| Person | First, second, third | I, you, he |
| Tense | Time of action | past, present |
| Gender | Masculine/feminine | actor / actress |
| Case | Grammatical role | subject/object |
Example: Agreement Checking
Sentence: “The boy runs.”
Feature matching:
| Word | Feature |
|---|---|
| boy | Number = singular |
| runs | Verb form = singular |
Correct agreement → sentence valid.
Incorrect example: “The boy run.”
Feature mismatch → grammar error.
Morphological Analysis and the Lexicon
Morphological Analysis
Morphology studies the internal structure of words.
Morphological analysis identifies:
- root word
- prefixes
- suffixes
- grammatical properties
Example word: “unhappiness”
Structure:
| Component | Meaning |
|---|---|
| un | prefix |
| happy | root |
| ness | suffix |
Types of Morphology
| Type | Description | Example |
|---|---|---|
| Inflectional Morphology | Changes grammatical form | play → played |
| Derivational Morphology | Creates new words | happy → happiness |
Lexicon in NLP
The lexicon is a dictionary of words used by NLP systems.
Each word entry contains:
| Field | Description |
|---|---|
| Word | actual word |
| Part of Speech | noun, verb, adjective |
| Features | number, tense |
| Meaning | semantic information |
Example Lexicon Entry
| Word | POS | Feature |
|---|---|---|
| boy | noun | singular |
| eat | verb | base form |
| eats | verb | present, 3rd person |
Lexicons help the system identify word properties during parsing.
Parsing with Features
Feature-based parsing ensures grammatical agreement using feature matching. The parser checks whether features of words match grammar rules.
Example Grammar Rule with Features
Traditional rule:
S → NP VP
Feature-based rule:
S → NP[NUM=?n] VP[NUM=?n]
Meaning:
- The number feature of NP and VP must match.
Example Parsing
Sentence: “The boys play football.”
Feature analysis:
| Word | Feature |
|---|---|
| boys | plural |
| play | plural verb |
Feature match → sentence accepted.
Incorrect example: “The boys plays football.”
Plural subject + singular verb → parser rejects.
Augmented Transition Networks (ATN)
An Augmented Transition Network (ATN) is an advanced form of Transition Network Grammar used in Artificial Intelligence for parsing natural language.ATN extends finite state networks with additional capabilities.
Components of ATN
| Component | Description |
|---|---|
| Nodes | states of parsing |
| Arcs | transitions between states |
| Registers | memory variables |
| Tests | conditions to check grammar |
ATN Parsing Example
Sentence structure:
Start → NP → VP → End
Network representation:
(Start)
|
NP
|
VP
|
(End)
During parsing, the system:
- Moves through states
- Checks grammatical conditions
- Stores information in registers
Advantages of ATN
| Advantage | Explanation |
|---|---|
| Handles complex grammar | Supports recursion |
| Efficient parsing | Reduces ambiguity |
| Supports feature checking | Works with feature systems |
Relationship Between Feature Systems and Augmented Grammars
| Concept | Role |
|---|---|
| Feature System | Stores grammatical properties |
| Lexicon | Stores word information |
| Morphological Analysis | Breaks words into components |
| Parsing with Features | Checks agreement |
| ATN | Uses features during parsing |
Together they allow NLP systems to process language more accurately.
Key Points
| Topic | Important Idea |
|---|---|
| Feature Systems | Attribute–value grammatical representation |
| Morphological Analysis | Study of word structure |
| Lexicon | NLP dictionary of words |
| Parsing with Features | Agreement checking during parsing |
| ATN | Advanced parsing network with memory |
Short
Feature systems represent grammatical properties of words using attribute–value pairs such as number, tense, and gender. Morphological analysis studies the internal structure of words and uses a lexicon that stores word information. Parsing with features ensures grammatical agreement between words during sentence analysis. Augmented Transition Networks are advanced parsing models that extend transition networks by adding memory registers, conditions, and recursive capabilities to process complex natural language sentences.