Code Generation
Code Generation
Code Generation is the final phase of a compiler, where the intermediate code (IC) is translated into target machine code or assembly code.
Objectives of Code Generation
- Produce correct target code
- Make code efficient in time and space
- Use machine resources effectively
Design Issues in Code Generation
The code generator must address several practical and architectural issues.
Major Design Issues
| Issue | Description |
|---|---|
| Input to Code Generator | Intermediate code (TAC, syntax tree) |
| Target Language | Assembly or machine code |
| Instruction Selection | Choose efficient machine instructions |
| Register Allocation | Decide which variables go into registers |
| Memory Management | Access to variables and temporaries |
| Evaluation Order | Order of expression evaluation |
| Code Optimization | Improve performance and reduce size |
The Target Language
The target language is usually:
- Assembly language, or
- Machine code
Characteristics of a Good Target Language
- Simple instruction set
- Support for registers
- Fast execution
- Easy translation from IC
Example (Target Assembly Style)
Addresses in the Target Code
During code generation, variables must be mapped to addresses.
Types of Addressing
| Address Type | Description |
|---|---|
| Absolute Address | Fixed memory location |
| Relative Address | Offset from base or stack pointer |
| Register Address | Value stored in register |
| Indexed Address | Base + index (used for arrays) |
Example
Target-style code:
Basic Blocks
A Basic Block is a maximal sequence of consecutive statements with:
- Single entry point
- Single exit point
- No jumps except at the end
Identification of Basic Blocks
A statement is a leader if:
- It is the first statement
- It is the target of a jump
- It follows a jump instruction
Example
Intermediate Code:
Basic Blocks
| Block | Statements |
|---|---|
| B1 | 1, 2, 3 |
| B2 | 4 |
| B3 | 5 |
Flow Graphs
A Flow Graph is a directed graph where:
- Nodes = Basic Blocks
- Edges = Possible flow of control
Purpose
- Used for optimization
- Helps in data-flow analysis
Optimization of Basic Blocks
Optimizations applied within a basic block are called local optimizations.
Common Local Optimizations
| Optimization | Description |
|---|---|
| Common Subexpression Elimination | Avoid repeated computation |
| Dead Code Elimination | Remove unused statements |
| Constant Folding | Compute constants at compile time |
| Strength Reduction | Replace costly ops with cheaper ones |
| Copy Propagation | Replace variables with their values |
Example
Before Optimization:
After Optimization:
Code Generator
The Code Generator converts optimized IC into target code.
Functions of Code Generator
| Function | Purpose |
|---|---|
| Instruction Selection | Map IC → machine instructions |
| Register Allocation | Assign variables to registers |
| Address Calculation | Generate correct memory references |
| Instruction Scheduling | Reduce stalls and delays |
Code Optimization
Code Optimization improves the program without changing its meaning.
Goals
- Reduce execution time
- Reduce memory usage
- Improve power efficiency (modern systems)
Types of Code Optimization
(a) Machine-Independent Optimization
| Type | Example |
|---|---|
| Constant Folding | 3*4 → 12 |
| Dead Code Elimination | Remove unused code |
| Loop Invariant Code Motion | Move constant expressions out of loop |
| Algebraic Simplification | x*1 → x |
(b) Machine-Dependent Optimization
| Type | Example |
|---|---|
| Register Allocation | Use registers instead of memory |
| Instruction Scheduling | Avoid pipeline stalls |
| Peephole Optimization | Replace instruction sequences |
Peephole Optimization Example
Before:
After:
Comparison: Code Generation vs Code Optimization
| Aspect | Code Generation | Code Optimization |
|---|---|---|
| Purpose | Produce target code | Improve target/intermediate code |
| Phase | Final phase | Multiple phases |
| Output | Assembly / machine code | Faster & smaller code |
| Mandatory | Yes | Optional but desirable |
Exam-Oriented Summary Table
| Topic | Key Points |
|---|---|
| Code Generation | Final compiler phase |
| Design Issues | Instruction, registers, memory |
| Target Language | Assembly/machine code |
| Addresses | Register, memory, indexed |
| Basic Blocks | Single entry & exit |
| Flow Graphs | Control-flow representation |
| Optimization | Improve speed & space |
| Code Generator | IC → Target code |
Machine-Independent Optimizations
Machine-independent optimizations are optimizations that do not depend on the target machine architecture.
They are applied mainly on intermediate code (IR) and aim to improve execution time and memory usage.
Key Characteristics
- Independent of registers and instruction sets
- Mostly based on program semantics and data flow
- Applied before code generation
Common Machine-Independent Optimizations
| Optimization | Description |
|---|---|
| Constant Folding | Evaluate constant expressions at compile time |
| Constant Propagation | Replace variables with known constant values |
| Common Subexpression Elimination | Avoid recomputation of same expression |
| Dead Code Elimination | Remove statements with no effect |
| Copy Propagation | Replace copies by original values |
| Loop Optimization | Improve loop performance |
Loop Optimization
Loops often execute many times, so optimizing them yields significant performance gains.
Why Loop Optimization Is Important
- Reduces repeated computations
- Minimizes loop overhead
- Improves cache and memory behavior
Common Loop Optimizations
(a) Loop Invariant Code Motion
Move computations outside the loop if their values do not change.
Before:
After:
(b) Strength Reduction
Replace expensive operations with cheaper ones.
Before:
After:
(c) Induction Variable Elimination
Remove unnecessary loop variables derived from others.
Summary of Loop Optimizations
| Technique | Benefit |
|---|---|
| Invariant Code Motion | Fewer computations |
| Strength Reduction | Faster arithmetic |
| Induction Elimination | Reduced variables |
| Loop Unrolling (advanced) | Reduced loop overhead |
DAG Representation of Basic Blocks
A Directed Acyclic Graph (DAG) is used to represent computations inside a basic block.
Purpose of DAG
- Identify common subexpressions
- Detect dead code
- Enable local optimizations
- Leaves → operands (variables/constants)
- Internal nodes → operators
- Edges → flow of computation
Example
Basic Block:
DAG Optimization:
a + bcomputed oncet2eliminated
Benefits of DAG
| Benefit | Explanation |
|---|---|
| Common Subexpression Elimination | Single node per expression |
| Dead Code Detection | Unused nodes removed |
| Efficient Code Generation | Optimized evaluation order |
Value Numbers and Algebraic Laws
Value Numbering
Value numbering assigns a unique number to each distinct computed value.
Purpose
- Detect equivalence of expressions
- Eliminate redundant computations
Example of Value Numbering
Here, t2 does not need recomputation.
Algebraic Laws Used in Optimization
| Law | Example |
|---|---|
| Commutative | a + b = b + a |
| Associative | (a + b) + c = a + (b + c) |
| Identity | x + 0 = x |
| Zero | x * 0 = 0 |
| Idempotent | x - x = 0 |
Application
Used along with value numbering to simplify expressions.
Global Data-Flow Analysis
Global Data-Flow Analysis studies the flow of data across basic blocks in a program.
Why Global Analysis?
- Local optimization works within a block
- Global optimization works across blocks
Data-Flow Analysis Framework
- Based on Control Flow Graph (CFG)
- Uses equations to compute information
Components
| Component | Description |
|---|---|
| CFG | Nodes = basic blocks |
| IN[B] | Data before block |
| OUT[B] | Data after block |
| Transfer Function | Effect of block |
Common Global Data-Flow Problems
| Analysis | Purpose |
|---|---|
| Reaching Definitions | Which definitions reach a point |
| Live Variable Analysis | Variables needed in future |
| Available Expressions | Expressions already computed |
| Very Busy Expressions | Expressions needed on all paths |
Example: Live Variable Analysis
A variable is live if its value is used later.
Here, x is live after first statement.
Role of Data-Flow Analysis in Optimization
| Optimization | Data-Flow Used |
|---|---|
| Dead Code Elimination | Live variable analysis |
| Common Subexpression | Available expressions |
| Register Allocation | Liveness information |
| Code Motion | Reaching definitions |
Combined Exam-Oriented Summary Table
| Topic | Key Points |
|---|---|
| Machine-Independent Optimization | IR-level improvements |
| Loop Optimization | Reduce repeated work |
| DAG | Local optimization tool |
| Value Numbering | Detect equal expressions |
| Algebraic Laws | Expression simplification |
| Global Data-Flow | Cross-block analysis |