COMPILER DESIGN

Unit 5: Code Generation

Prakash C O
Department of Computer Science and Engineering
Introduction


➢The final phase in our compiler model is the code generator.

➢The code generator takes as input the intermediate representation (IR) produced by the front end of the compiler, along with relevant symbol table information, and produces as output a semantically equivalent target program, as shown in the figure.

Figure: Source Program → Front End → Intermediate Code → Code Optimizer → Intermediate Code → Code Generator → Target Program
Intermediate Representation
• Linear representations - 3AC/SSA (Quadruples / Triples / Indirect triples)
• VM instructions (Bytecodes / Stack machine codes)
• Graphical representations (Syntax trees / DAGs)

Code Generator

Target Machine Code
• RISC (many registers, 3AC, simple addressing modes, simple ISA)
• CISC (few registers, 2AC, variety of addressing modes, variable-length instructions; an instruction may take more than a single clock cycle to execute)
• Stack machine (push/pop, stack top uses registers, used in JVM, JIT compilation)
Intermediate Representation:
t0 = y
t0 = t0 + z
x = t0

Code Generator

Target Machine Code:
LD R0, y
ADD R0, R0, z
ST x, R0

➢The requirements imposed on a code generator are severe.

1. The target program must preserve the semantic meaning of the source program.
o The meaning intended by the programmer in the original source program should carry forward through each compilation stage up to code generation.

2. The target program must be of high quality.
o Execution time or space or energy or …

3. The code generator itself must run efficiently.
o Instruction Selection, Register Allocation and Instruction Ordering.

➢The challenges in code generation are:

1. Mathematically, the problem of generating an optimal target program for a given source program is undecidable.

2. Many of the subproblems encountered in code generation, such as register allocation, are computationally intractable (NP-hard).
➢A code generator has three primary tasks:
1. Instruction Selection
o It involves choosing appropriate target-machine instructions to implement the IR statements.

2. Register Allocation & Assignment
o It involves deciding what values to keep in which registers.

3. Instruction Ordering
o It involves deciding in what order to schedule the execution of instructions.

IR Code:
t0 = y
t0 = t0 + z
x = t0

Target Code:
LD R0, y
ADD R0, R0, z
ST x, R0
Issues in the Design of a Code Generator


➢The details of code generator design depend on
1. the specifics of the Intermediate Representation,
2. the Target Language and the Run-time system, and
3. tasks such as Instruction Selection, Register Allocation and Assignment, and Instruction Ordering.

➢The most important criterion for a code generator is that it produce correct target code.

Issues in the Design of a Code Generator are:

1. Input to the Code Generator

2. The Target Program

3. Instruction Selection

4. Register Allocation

5. Evaluation Order
1. Input to the Code Generator

➢The input to the code generator is
1. The intermediate representation (IR) of the source program produced by the front end, and
2. Information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.

➢The many choices for the intermediate representation (IR) include
1. Three-address representations such as
o Quadruples, Triples, and Indirect triples
2. Virtual machine representations such as
o Bytecodes and stack-machine code
3. Linear representations such as
o Postfix notation
4. Graphical representations such as
o Syntax trees and DAGs

➢Assumptions:

o The front end has scanned, parsed, and translated the source program into a relatively low-level Intermediate Representation.

o All syntactic and static semantic errors have been detected.

o The necessary type checking has taken place, and type conversion operators have been inserted wherever necessary.

o The code generator can therefore proceed on the assumption that its input is free of these kinds of errors.
2. The Target Program

➢The instruction-set architecture of the target machine has a significant impact on the difficulty of constructing a good code generator that produces high-quality machine code.

➢The most common target-machine architectures are
a) RISC (Reduced Instruction Set Computer),
b) CISC (Complex Instruction Set Computer), and
c) Stack-based architecture.



a) A RISC machine typically has
o many registers,
o three-address instructions,
o simple addressing modes, and
o a relatively simple instruction-set architecture.
Examples of RISC microprocessors are Alpha, ARC, ARM, AVR, MIPS, PA-RISC, PIC, Power Architecture, and SPARC.

b) A CISC machine typically has
o few registers,
o two-address instructions,
o a variety of addressing modes, and
o variable-length instructions.
Examples of CISC processors are the System/360, VAX, PDP-11, Motorola 68000 family, and AMD and Intel x86 CPUs.

c) Stack-based Architecture

o Stack-based architectures were revived with the introduction of the JVM.

o Example: In a stack-based virtual machine, the operation of adding two numbers would usually be carried out in the following manner (where 20, 7, and ‘result’ are the operands):

1. POP 20
2. POP 7
3. ADD 20, 7, result
4. PUSH result
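
The four steps above can be sketched as a tiny stack machine in Python. This is only an illustration: the function name `stack_add` and the use of a Python list as the operand stack are assumptions made here, not part of the JVM or of any real virtual machine.

```python
def stack_add(stack):
    """Pop two operands, add them, and push the result (hypothetical helper)."""
    first = stack.pop()        # step 1: POP the top operand
    second = stack.pop()       # step 2: POP the next operand
    result = first + second    # step 3: ADD the two popped values
    stack.append(result)       # step 4: PUSH the result
    return stack

# 20 is on top of the stack, 7 is below it.
operands = [7, 20]
stack_add(operands)
```

After the call, the stack holds only the sum, so the instruction itself never names its operands; they are implied by the stack top.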

The target program is the output of the code generator.

The output may be absolute machine-language code, relocatable machine-language code, or assembly-language code.

➢ Producing an absolute machine-language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed.

➢ Producing a relocatable machine-language program (often called an object module) as output allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader.

➢ Producing an assembly-language program as output makes the process of code generation somewhat easier. We can generate symbolic instructions and use the macro facilities of the assembler to help generate code. The price paid is the assembly step after code generation.



➢For readability, we use assembly code as the target language. As long as addresses can be calculated from offsets and other information stored in the symbol table, the code generator can produce relocatable or absolute addresses for names just as easily as symbolic addresses.
3. Instruction Selection
➢The code generator must map the IR program into a code sequence that can be executed by the target machine.

IR Code:
t0 = y
t0 = t0 + z
x = t0

Target Code:
LD R0, y
ADD R0, R0, z
ST x, R0

➢The complexity of Instruction Selection depends upon
a) Level of the IR
b) Nature of the Instruction-Set Architecture (ISA)
c) Desired quality of the generated code.

a) Level of the IR
➢ If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates. Such statement-by-statement code generation, however, often produces poor code that needs further optimization.

➢ If the IR reflects some of the low-level details of the underlying machine, then the code generator can use this information to generate more efficient code sequences.
o e.g., intsize versus 4.



b) Nature of the Instruction-Set Architecture (ISA)

➢ The nature of the instruction set of the target machine has a strong effect on the difficulty of instruction selection. For example, the uniformity and completeness of the instruction set are important factors.

▪ If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling. On some machines, for example, floating-point operations are done using separate registers.

▪ The instruction set is said to be complete if the target machine includes a sufficient number of instructions in each category (arithmetic, logical, shift, move, conditional and unconditional jumps, and I/O instructions).

c) Desired quality of the generated code.
➢ If we do not care about the efficiency of the target program, instruction selection is straightforward.
▪ For each type of three-address statement, we can design a code skeleton that defines the target code to be generated for that construct.
▪ For example, every three-address statement of the form x = y + z, where x, y, and z are statically allocated, can be translated into the code sequence

LD R0, y // R0 = y (load y into register R0)
ADD R0, R0, z // R0 = R0 + z (add z to R0)
ST x, R0 // x = R0 (store R0 into x)

This strategy often produces redundant loads and stores. For example, the sequence of three-address statements

a = b + c
d = a + e

would be translated into

LD R0, b // R0 = b
ADD R0, R0, c // R0 = R0 + c
ST a, R0 // a = R0
LD R0, a // R0 = a
ADD R0, R0, e // R0 = R0 + e
ST d, R0 // d = R0

Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used.
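
The statement-by-statement strategy, and the redundant load it produces, can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names `gen_naive` and `drop_redundant_loads`, the tuple encoding of three-address statements, and the use of R0 as the only register are choices made for this sketch.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def gen_naive(statements):
    """Apply the LD / op / ST code skeleton to each statement x = y op z."""
    code = []
    for x, y, op, z in statements:
        code.append(f"LD R0, {y}")
        code.append(f"{OPCODES[op]} R0, R0, {z}")
        code.append(f"ST {x}, R0")
    return code

def drop_redundant_loads(code):
    """Drop 'LD R0, v' when the previous instruction is 'ST v, R0'."""
    out = []
    for instr in code:
        prefix = "LD R0, "
        if out and instr.startswith(prefix) and out[-1] == f"ST {instr[len(prefix):]}, R0":
            continue  # R0 already holds the value that was just stored
        out.append(instr)
    return out

# a = b + c ; d = a + e
tac = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
naive = gen_naive(tac)
improved = drop_redundant_loads(naive)
```

Here `naive` contains six instructions and `improved` five. Removing the store to a as well would require knowing that a is not used later, which this sketch does not attempt.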

➢ The quality of the generated target code is usually determined by its speed and size.
▪ A given IR program can be implemented by many different target code sequences, with significant cost differences between the different implementations.

▪ A naive translation of the intermediate code may therefore lead to correct but unacceptably inefficient target code. For example, if the target machine has an INC instruction, then a = a + 1 may be implemented more efficiently by the single instruction INC a, rather than by the more obvious sequence

LD R0, a // R0 = a
ADD R0, R0, #1 // R0 = R0 + 1
ST a, R0 // a = R0
4. Register Allocation

➢ Registers are the fastest computational unit on the target machine, but we usually do not
have enough of them to hold all values.

➢ Instructions involving register operands are invariably shorter and faster than those
involving operands in memory, so efficient utilization of registers is particularly important.

➢ A key problem in code generation is deciding what values to hold in what registers.

➢ Keep values in registers as long as possible to minimize the number of load / store
statements executed.

➢ Values not held in registers need to reside in memory.



➢The use of registers is often subdivided into two subproblems:

▪ Register allocation, during which we select the set of variables that will reside in registers at each point in the program.

▪ Register assignment, during which we pick the specific register that a variable will reside in.

➢ Finding an optimal assignment of registers to variables is difficult, even with single-register machines. Mathematically, the problem is NP-complete.
➢ The register allocation problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register-usage conventions be observed.

➢ Example 8.1: Certain machines require register pairs (an even and the next odd-numbered register) for some operands and results. For example, on some machines, integer multiplication and integer division involve register pairs.

▪ The multiplication instruction is of the form M x, y where x, the multiplicand, is the even register of an even/odd register pair and y, the multiplier, is the odd register. The product occupies the entire even/odd register pair.

(Figure: register contents before the operation, multiplicand and multiplier; after the operation, the product occupies the register pair.)

➢ Example 8.1 (cont.):

• The division instruction is of the form D x, y where the dividend occupies an even/odd register pair whose even register is x; the divisor is y. After division, the even register holds the remainder and the odd register the quotient.

(Figure: register contents before the operation, dividend and divisor; after the operation, remainder and quotient.)
5. Evaluation Order

➢ The order in which computations are performed can affect the efficiency of the target code.

• Some computation orders require fewer registers to hold intermediate results than others.

• Picking the best computation order in the general case is a difficult, NP-complete problem.

➢ Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they were produced by the intermediate code generator.
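
One way to see that the evaluation order changes the number of registers needed is to label an expression tree with Ershov numbers, a technique from the referenced textbook that these slides do not cover. The sketch below is illustrative only: the function name `ershov` and the tuple encoding of trees are assumptions, and it applies to single expression trees where every leaf must first be loaded into a register.

```python
def ershov(node):
    """Registers needed to evaluate an expression tree with no stores of temporaries."""
    if isinstance(node, str):      # leaf: a variable that must be loaded
        return 1
    _, left, right = node          # interior node: (operator, left, right)
    l, r = ershov(left), ershov(right)
    # Evaluate the more demanding subtree first; a tie costs one extra register.
    return l + 1 if l == r else max(l, r)

balanced = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))   # (a-b) + e*(c+d)
chain = ("+", "a", ("+", "b", ("+", "c", "d")))                  # a + (b + (c+d))
```

For the chain, evaluating the deeper right subtree first keeps the requirement at two registers, whereas loading a, b, and c from left to right before any addition would tie up more registers.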
Target Machine Model (Hypothetical Model)


➢ Our target computer models a three-address machine with
1. Load and Store operations,
2. Computation operations,
3. Jump operations, and
4. Conditional jumps.
➢ The underlying computer is a byte-addressable machine with n general-purpose registers, R0, R1, . . . , Rn-1.
➢ To avoid hiding the concepts in a myriad of details, we shall use a very limited set of instructions and assume that all operands are integers.
➢ Most instructions consist of an operator, followed by a target, followed by a list of source operands. Ex:
op dest, src1, src2
op dest, src

➢ We assume the following kinds of instructions are available:

1. Load (from memory): LD reg, (memloc) // dest, src
2. Store (to memory): ST (memloc), reg // dest, src
3. Move (between registers): MOV reg1, reg2 // dest, src
4. Computations: op dest, src1, src2
a) ADD
b) SUB
c) MUL
d) DIV

Note: a source operand may be a register, a memory location, or an immediate constant.

5. Unconditional jumps: BR L

6. Conditional jumps: Bcond R, L
cond: LTZ, GTZ, EZ, LTEZ, GTEZ

For example, BLTZ R, L causes a jump to label L if the value in register R is less than zero, and allows control to pass to the next machine instruction if not.
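
The instruction kinds listed so far can be exercised with a small interpreter. This is a hedged sketch, not a definition of the model: the function `run`, the `(label, opcode, operands)` tuple encoding, the dictionary used for memory, and the use of floor division for DIV are all assumptions made for illustration. Variables are addressed by name, and only register, named-variable, and `#constant` operands are handled.

```python
def run(program, memory):
    """Interpret (label, opcode, operands) tuples on the hypothetical machine."""
    regs = {}
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}

    def is_reg(name):
        return name[0] == "R" and name[1:].isdigit()

    def value(operand):
        if operand.startswith("#"):
            return int(operand[1:])          # immediate constant
        if is_reg(operand):
            return regs[operand]             # register contents
        return memory[operand]               # variable addressed by name

    arith = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b,
             "MUL": lambda a, b: a * b, "DIV": lambda a, b: a // b}
    conds = {"BLTZ": lambda v: v < 0, "BGTZ": lambda v: v > 0,
             "BEZ": lambda v: v == 0, "BLTEZ": lambda v: v <= 0,
             "BGTEZ": lambda v: v >= 0}

    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        pc += 1                              # default: fall through
        if op in ("LD", "MOV"):
            regs[args[0]] = value(args[1])
        elif op == "ST":
            memory[args[0]] = value(args[1])
        elif op in arith:
            regs[args[0]] = arith[op](value(args[1]), value(args[2]))
        elif op == "BR":
            pc = labels[args[0]]
        elif op in conds:
            if conds[op](value(args[0])):
                pc = labels[args[1]]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory

# sum = n + (n-1) + ... + 1, written for this sketch
program = [
    (None, "LD", ["R1", "n"]),
    (None, "LD", ["R2", "#0"]),
    ("L1", "BLTEZ", ["R1", "L2"]),
    (None, "ADD", ["R2", "R2", "R1"]),
    (None, "SUB", ["R1", "R1", "#1"]),
    (None, "BR", ["L1"]),
    ("L2", "ST", ["sum", "R2"]),
]
result = run(program, {"n": 4})
```

With n = 4 the loop body runs four times and the final store leaves 10 in sum.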

Addressing Modes
1. Direct addressing mode:
• In direct addressing mode, the address field contains the address of the operand.

• Example: Add the content of R1 and the content of memory address 1001, and store the result back in R1.

ADD R1, (1001) // 1001 is the address where the operand is stored

Example (for the hypothetical model): ADD R1, a

Note: In our hypothetical model, for simplicity, we use the variable name in the instruction itself instead of the memory address of the variable.

2. Index addressing mode: Index addressing mode is used to access an array whose elements are in successive memory locations.

Example: x = a[i]

TAC:
t1 = 4 * i
t2 = a[t1] // r-value
x = t2

Target code:
op dest src1 src2
LD R1 i
MUL R1 R1 #4
MOV R2 R1(a)
ST x R2

In the above MOV instruction:
R2 ← contents(contents of R1 + a)
R2 ← contents(offset + base address)

3. Indirect addressing mode:

Here two references are required: the first reference gets the effective address, and the second reference accesses the data.

Example: x = *p

TAC:
t1 = *p // r-value
x = t1

Target code:
op dest src1 src2
LD R1 p
MOV R2 0(R1)
ST x R2

In the above MOV instruction: R2 ← contents(0 + contents of R1), that is, loading into R2 the value in the memory location obtained by adding 0 to the contents of register R1.

4. Immediate addressing mode: In this mode, the data is present in the address field of the instruction.

Example: a = 100

op dest src1 src2
LD R1 #100
ST a R1

Note: A limitation of the immediate mode is that the range of constants is restricted by the size of the address field.
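
The four addressing modes differ only in how a source operand is turned into a value. The following Python sketch makes that explicit. Everything in it is an assumption for illustration: the function `operand_value`, the `addr_of` table mapping variable names to addresses, and the concrete addresses 1000 and 2000.

```python
def operand_value(operand, regs, memory, addr_of):
    """Resolve one source operand under the addressing modes described above."""
    if operand.startswith("#"):                       # immediate: #100
        return int(operand[1:])
    if operand.endswith(")"):
        outer, inner = operand[:-1].split("(")
        if outer in regs:                             # indexed: R1(a)
            return memory[regs[outer] + addr_of[inner]]
        return memory[int(outer) + regs[inner]]       # indirect: 0(R1)
    if operand in regs:                               # plain register
        return regs[operand]
    return memory[addr_of[operand]]                   # direct: variable name

# Hypothetical layout: array a starts at 1000 (4-byte elements), pointer p at 2000.
addr_of = {"a": 1000, "p": 2000}
memory = {1000: 10, 1004: 20, 1008: 30, 2000: 1004}

indexed = operand_value("R1(a)", {"R1": 8}, memory, addr_of)      # a[2]
pointer = operand_value("p", {}, memory, addr_of)                 # value of p
indirect = operand_value("0(R1)", {"R1": pointer}, memory, addr_of)  # *p
immediate = operand_value("#100", {}, memory, addr_of)
```

Note how the indirect case needs two memory references in total: one to load p into R1 and one to fetch the value that p points to.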

Note:
1. The machine is byte-addressable.
2. n general-purpose registers are available: R0, R1, R2, …, Rn-1.
3. Assume all operands are integers.
4. Comments are preceded by //


Generate Three address code and Target code for

1) a[i] = c

TAC:
t1 = 4 * i
a[t1] = c // l-value

Target code:
op dest src1 src2
LD R1 i
MUL R1 R1 #4
LD R2 c
ST R1(a) R2

In the above ST instruction: contents(contents of R1 + a) ← R2
Generate Target code for

2) if x < y goto L

op dest src1 src2
LD R1 x // R1 has x
LD R2 y // R2 has y
SUB R1 R1 R2 // R1 has x-y
BLTZ R1 M

M is the label of the first machine instruction generated for the three-address statement labeled L.
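
The same pattern, subtract and then branch on the sign of the difference, works for the other relational operators that map onto the available Bcond instructions. The sketch below is illustrative: the table `BRANCH_FOR`, the function `gen_cond_jump`, and the fixed use of R1 and R2 are assumptions. The operator != is left out because the model lists no matching condition.

```python
BRANCH_FOR = {
    "<": "BLTZ",    # x - y < 0
    ">": "BGTZ",    # x - y > 0
    "==": "BEZ",    # x - y == 0
    "<=": "BLTEZ",  # x - y <= 0
    ">=": "BGTEZ",  # x - y >= 0
}

def gen_cond_jump(x, relop, y, label):
    """Generate target code for 'if x relop y goto label'."""
    return [
        f"LD R1, {x}",
        f"LD R2, {y}",
        "SUB R1, R1, R2",
        f"{BRANCH_FOR[relop]} R1, {label}",
    ]

code = gen_cond_jump("x", "<", "y", "M")
```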


Generate Target code for

3) TAC:
i = 0
s = 0
L1: if i >= n goto L2
s = s + i
i = i + 1
goto L1
L2:

Target code:
op dest src1 src2
LD R1 #0 // R1 represents i, and its initial value is 0
MOV R2 R1 // R2 represents s, and its initial value is 0
LD R3 n // R3 has value n
L1: SUB R4 R3 R1 // R4 has value n-i
BLTEZ R4 L2 // jump to L2 if n-i <= 0, i.e., i >= n
ADD R2 R2 R1 // R2 has value s, i.e., s=s+i
ADD R1 R1 #1 // R1 has value i, i.e., i=i+1
BR L1
L2: ST i R1
ST s R2
Generate Target code for

4) F = 1;
while(N > 0)
{
    F = F * N;
    N = N - 1;
}

TAC:
F = 1
L1: if(N <= 0) goto L2
F = F * N
N = N - 1
goto L1
L2:

Target code:
op dest src1 src2
LD R1 #1 // R1 represents F, and its initial value is 1
LD R2 N // R2 has value N
L1: BLTEZ R2 L2
MUL R1 R1 R2
SUB R2 R2 #1 // R2 has value N, i.e., N=N-1
BR L1
L2: ST F R1
ST N R2

How do we generate Target code when procedures are involved?

Need mechanisms for:
• Passing arguments
• Local storage
• Returning results
• Linking control

int findFact(int n){
    int i, fact=1;
    for(i=1;i<=n;i++)
        fact=fact*i;
    return fact;
}

Many questions to answer:


1. What does the dynamic execution of functions look like?
2. Where is the executable code for functions located?
3. How are parameters passed in and out of functions?
4. Where are local variables stored?
Memory Layout of an executable program


Low address
Code: Machine code of the program.
Static: Global constants (fixed-size and static data).
(The sizes of Code and Static are determined at compile time.)
Heap: Dynamic data objects (class objects or objects created by malloc/calloc).
Free Memory
Stack: Runtime stack or control stack; stores data structures called activation records for the called procedures.
(Heap and Stack are both dynamic; their sizes cannot be determined at compile time.)
High address

Figure: Subdivision of run-time memory into code and data areas
Code generation for procedures (Static allocation)


Example 1: Assuming static allocation for procedures.

IC for Procedure c():
action
action
call p
action
halt

IC for Procedure p():
action
return

The code for procedures c() and p() is kept at memory locations 100 and 200 respectively. Also, the activation records for procedures c() and p() are kept at memory locations 300 and 364 respectively.

Note: CPU registers occupy zero bytes in target code instructions. Opcodes, immediate constants, memory addresses and memory operands (variables) each need 4 bytes in target code instructions.

Note: action denotes a set of three-address statements; the target code of action (i.e., ACTION) takes 20 bytes of memory.

Target code for c():
100: ACTION
120: ACTION
140: ST 364, 160 // save return address 160 in location 364
152: BR 200 // call p
160: ACTION
180: HALT

Target code for p():
200: ACTION
220: BR *364 // return to the address saved in location 364

Activation record of c():
300: …

Activation record of p():
364: 160

Exercise 1: Generate target code for the following TAC, assuming static allocation for procedures. The code for procedures p() and q() is kept at memory locations 100 and 300 respectively. Also, the activation records for procedures p() and q() are kept at memory locations 400 and 600 respectively. Assume the names m, n and x represent addresses.

Note: CPU registers occupy zero bytes in target code instructions. Opcodes, immediate constants, memory addresses and memory operands (variables) each need 4 bytes in target code instructions.

IC for Procedure p():
m = 5
n = m * 2
call q
halt

IC for Procedure q():
x = 2 * x
return


Exercise 1: Solution

IC for Procedure p():
m = 5
n = m * 2
call q
halt

Target code for p():
100: LD R1, #5
108: ST m, R1
116: MUL R1, R1, #2
124: ST n, R1
132: ST 600, 152 // save return address 152 in location 600
144: BR 300 // call q
152: HALT

IC for Procedure q():
x = 2 * x
return

Target code for q():
300: LD R1, x
308: MUL R1, R1, #2
316: ST x, R1
324: BR *600 // return to the address saved in location 600

Activation record of p():
400: …

Activation record of q():
600: 152
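
The addresses in the target code for p() follow mechanically from the size rule stated in the exercise (4 bytes for the opcode, 0 for each register operand, 4 for every other operand). The Python sketch below recomputes them. The helper names `instruction_size` and `layout`, and the regular expression used to recognise register names, are assumptions made for this illustration.

```python
import re

def instruction_size(instr):
    """Bytes occupied by one instruction under the exercise's cost rule."""
    parts = instr.replace(",", " ").split()
    operands = parts[1:]                                  # parts[0] is the opcode
    non_register = [o for o in operands if not re.fullmatch(r"R\d+", o)]
    return 4 + 4 * len(non_register)

def layout(start, instrs):
    """Assign consecutive addresses to instructions placed from `start`."""
    placed, addr = [], start
    for instr in instrs:
        placed.append((addr, instr))
        addr += instruction_size(instr)
    return placed

p_code = ["LD R1, #5", "ST m, R1", "MUL R1, R1, #2", "ST n, R1",
          "ST 600, 152", "BR 300", "HALT"]
addresses = [addr for addr, _ in layout(100, p_code)]
```

The return address stored in q()'s activation record is simply the address of the instruction that follows BR 300, which is why the store instruction needs to know the sizes of the two call-sequence instructions.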
References

➢ Compilers: Principles, Techniques, and Tools, Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, 2nd Edition

➢ https://markfaction.wordpress.com/2012/07/15/stack-based-vs-register-based-virtual-machine-architecture-and-the-dalvik-vm/
4. Register Allocation
➢Consider the two three-address code sequences in Fig. 8.2, in which the only difference between (a) and (b) is the operator in the second statement. The shortest assembly-code sequences for (a) and (b) are given in Fig. 8.3.

Ri stands for register i. SRDA stands for Shift-Right-Double-Arithmetic, and SRDA R0,32 shifts the dividend into R1 and clears R0 so all bits equal its sign bit.
