Code Generation



Code Generation

Code Generation is the final phase of a compiler, where the intermediate code (IC) is translated into target machine code or assembly code.

Objectives of Code Generation

  • Produce correct target code
  • Make code efficient in time and space
  • Use machine resources effectively

Design Issues in Code Generation

The code generator must address several practical and architectural issues.

Major Design Issues

IssueDescription
Input to Code GeneratorIntermediate code (TAC, syntax tree)
Target LanguageAssembly or machine code
Instruction SelectionChoose efficient machine instructions
Register AllocationDecide which variables go into registers
Memory ManagementAccess to variables and temporaries
Evaluation OrderOrder of expression evaluation
Code OptimizationImprove performance and reduce size

The Target Language

The target language is usually:

  • Assembly language, or
  • Machine code

Characteristics of a Good Target Language

  • Simple instruction set
  • Support for registers
  • Fast execution
  • Easy translation from IC

Example (Target Assembly Style)

LOAD R1, A ADD R1, B STORE R1, C

Addresses in the Target Code

During code generation, variables must be mapped to addresses.

Types of Addressing

Address TypeDescription
Absolute AddressFixed memory location
Relative AddressOffset from base or stack pointer
Register AddressValue stored in register
Indexed AddressBase + index (used for arrays)

Example

x = y + z

Target-style code:

LOAD R1, y ADD R1, z STORE R1, x

Basic Blocks

A Basic Block is a maximal sequence of consecutive statements with:

  • Single entry point
  • Single exit point
  • No jumps except at the end

Identification of Basic Blocks

A statement is a leader if:

  • It is the first statement
  • It is the target of a jump
  • It follows a jump instruction

Example

Intermediate Code:

1. a = b + c 2. d = a * e 3. if d > f goto L1 4. g = d - h L1: i = g + 1

Basic Blocks

BlockStatements
B11, 2, 3
B24
B35

Flow Graphs

A Flow Graph is a directed graph where:

  • Nodes = Basic Blocks
  • Edges = Possible flow of control

Purpose

  • Used for optimization
  • Helps in data-flow analysis

Optimization of Basic Blocks

Optimizations applied within a basic block are called local optimizations.

Common Local Optimizations

OptimizationDescription
Common Subexpression EliminationAvoid repeated computation
Dead Code EliminationRemove unused statements
Constant FoldingCompute constants at compile time
Strength ReductionReplace costly ops with cheaper ones
Copy PropagationReplace variables with their values

Example

Before Optimization:

t1 = a + b t2 = a + b c = t2

After Optimization:

t1 = a + b c = t1

Code Generator

The Code Generator converts optimized IC into target code.

Functions of Code Generator

FunctionPurpose
Instruction SelectionMap IC → machine instructions
Register AllocationAssign variables to registers
Address CalculationGenerate correct memory references
Instruction SchedulingReduce stalls and delays

Code Optimization

Code Optimization improves the program without changing its meaning.

Goals

  • Reduce execution time
  • Reduce memory usage
  • Improve power efficiency (modern systems)

Types of Code Optimization

(a) Machine-Independent Optimization

TypeExample
Constant Folding3*4 → 12
Dead Code EliminationRemove unused code
Loop Invariant Code MotionMove constant expressions out of loop
Algebraic Simplificationx*1 → x

(b) Machine-Dependent Optimization

TypeExample
Register AllocationUse registers instead of memory
Instruction SchedulingAvoid pipeline stalls
Peephole OptimizationReplace instruction sequences

Peephole Optimization Example

Before:

LOAD R1, x STORE R1, x

After:

(no code)

Comparison: Code Generation vs Code Optimization

AspectCode GenerationCode Optimization
PurposeProduce target codeImprove target/intermediate code
PhaseFinal phaseMultiple phases
OutputAssembly / machine codeFaster & smaller code
MandatoryYesOptional but desirable

Exam-Oriented Summary Table

TopicKey Points
Code GenerationFinal compiler phase
Design IssuesInstruction, registers, memory
Target LanguageAssembly/machine code
AddressesRegister, memory, indexed
Basic BlocksSingle entry & exit
Flow GraphsControl-flow representation
OptimizationImprove speed & space
Code GeneratorIC → Target code

Machine-Independent Optimizations

Machine-independent optimizations are optimizations that do not depend on the target machine architecture.
They are applied mainly on intermediate code (IR) and aim to improve execution time and memory usage.

Key Characteristics

  • Independent of registers and instruction sets
  • Mostly based on program semantics and data flow
  • Applied before code generation

Common Machine-Independent Optimizations

OptimizationDescription
Constant FoldingEvaluate constant expressions at compile time
Constant PropagationReplace variables with known constant values
Common Subexpression EliminationAvoid recomputation of same expression
Dead Code EliminationRemove statements with no effect
Copy PropagationReplace copies by original values
Loop OptimizationImprove loop performance

Loop Optimization

Loops often execute many times, so optimizing them yields significant performance gains.

Why Loop Optimization Is Important

  • Reduces repeated computations
  • Minimizes loop overhead
  • Improves cache and memory behavior

Common Loop Optimizations

(a) Loop Invariant Code Motion

Move computations outside the loop if their values do not change.

Before:

while (i < n) { x = a * b; y = x + i; }

After:

x = a * b; while (i < n) { y = x + i; }

(b) Strength Reduction

Replace expensive operations with cheaper ones.

Before:

x = i * 4;

After:

x = x + 4;

(c) Induction Variable Elimination

Remove unnecessary loop variables derived from others.

Summary of Loop Optimizations

TechniqueBenefit
Invariant Code MotionFewer computations
Strength ReductionFaster arithmetic
Induction EliminationReduced variables
Loop Unrolling (advanced)Reduced loop overhead

DAG Representation of Basic Blocks

A Directed Acyclic Graph (DAG) is used to represent computations inside a basic block.

Purpose of DAG

  • Identify common subexpressions
  • Detect dead code
  • Enable local optimizations

Construction of DAG

  • Leaves → operands (variables/constants)
  • Internal nodes → operators
  • Edges → flow of computation

Example

Basic Block:

t1 = a + b t2 = a + b c = t2

DAG Optimization:

  • a + b computed once
  • t2 eliminated

Benefits of DAG

BenefitExplanation
Common Subexpression EliminationSingle node per expression
Dead Code DetectionUnused nodes removed
Efficient Code GenerationOptimized evaluation order

Value Numbers and Algebraic Laws

Value Numbering

Value numbering assigns a unique number to each distinct computed value.

Purpose

  • Detect equivalence of expressions
  • Eliminate redundant computations

Example of Value Numbering

t1 = a + b → VN = 1 t2 = a + b → VN = 1 (same value) t3 = t1 + c → VN = 2

Here, t2 does not need recomputation.

Algebraic Laws Used in Optimization

LawExample
Commutativea + b = b + a
Associative(a + b) + c = a + (b + c)
Identityx + 0 = x
Zerox * 0 = 0
Idempotentx - x = 0

Application

Used along with value numbering to simplify expressions.

Global Data-Flow Analysis

Global Data-Flow Analysis studies the flow of data across basic blocks in a program.

Why Global Analysis?

  • Local optimization works within a block
  • Global optimization works across blocks

Data-Flow Analysis Framework

  • Based on Control Flow Graph (CFG)
  • Uses equations to compute information

Components

ComponentDescription
CFGNodes = basic blocks
IN[B]Data before block
OUT[B]Data after block
Transfer FunctionEffect of block

Common Global Data-Flow Problems

AnalysisPurpose
Reaching DefinitionsWhich definitions reach a point
Live Variable AnalysisVariables needed in future
Available ExpressionsExpressions already computed
Very Busy ExpressionsExpressions needed on all paths

Example: Live Variable Analysis

A variable is live if its value is used later.

x = a + b y = x + c

Here, x is live after first statement.

Role of Data-Flow Analysis in Optimization

OptimizationData-Flow Used
Dead Code EliminationLive variable analysis
Common SubexpressionAvailable expressions
Register AllocationLiveness information
Code MotionReaching definitions

Combined Exam-Oriented Summary Table

TopicKey Points
Machine-Independent OptimizationIR-level improvements
Loop OptimizationReduce repeated work
DAGLocal optimization tool
Value NumberingDetect equal expressions
Algebraic LawsExpression simplification
Global Data-FlowCross-block analysis