Unit-5 - CD - Code Generation
Prakash C O
Department of Computer Science and Engineering
COMPILER DESIGN
Unit 5: Code Generation
Introduction
Intermediate Representation → Code Generator → Target Machine Code
Intermediate Representation:
• Linear representations: 3AC/SSA (quadruples / triples / indirect triples)
• VM instructions (bytecodes / stack-machine code)
• Graphical representations (syntax trees / DAGs)
Target Machine Code:
• RISC (many registers, three-address instructions, simple addressing modes, simple ISA)
• CISC (few registers, two-address instructions, a variety of addressing modes, variable-length instructions; an instruction may take more than one clock cycle to execute)
• Stack machine (push/pop operations, top of the stack kept in registers; used in the JVM, with JIT compilation)
Intermediate Representation     Target Machine Code (output of the Code Generator)
t0 = y                          LD R0, y
t0 = t0 + z                     ADD R0, R0, z
x = t0                          ST x, R0
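➢ A code generator has three primary tasks:
1. Instruction Selection
o It involves choosing appropriate target-machine instructions to implement the IR statements.
2. Register Allocation and Assignment
o It involves deciding what values to keep in which registers.
3. Instruction Ordering
o It involves deciding in what order to schedule the execution of instructions.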
Issues in the Design of a Code Generator
1. Input to the Code Generator
2. The Target Program
3. Instruction Selection
4. Register Allocation
5. Evaluation Order
1. Input to the Code Generator
o The front end has scanned, parsed, and translated the source program into a relatively low-level Intermediate Representation.
o All syntactic and static semantic errors have been detected.
o The necessary type checking has taken place, and type-conversion operators have been inserted wherever necessary.
o The code generator can therefore proceed on the assumption that its input is free of these kinds of errors.
o Example: In a stack-based virtual machine, the operation of adding two numbers would usually be carried out in the following manner (where 20 and 7 are the operands on top of the stack, and 'result' is their sum):
1. POP 20
2. POP 7
3. ADD 20, 7, result
4. PUSH result
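In practice, a stack-machine ADD names no operands at all: the four steps above happen inside that single instruction. The toy interpreter below is a minimal sketch of this idea; the PUSH/ADD opcode names and the tuple encoding of instructions are assumptions for illustration, not the bytecode of any real VM. Operands are pushed first, and ADD implicitly pops the two topmost values and pushes their sum.

```python
def run_stack_machine(program):
    """Execute a list of (opcode, *operands) tuples on an operand stack (toy VM)."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])      # operand comes from the instruction itself
        elif op == "ADD":
            right = stack.pop()         # POP the two topmost operands ...
            left = stack.pop()
            stack.append(left + right)  # ... add them and PUSH the result
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# 20 + 7: both operands are pushed, then ADD consumes them.
print(run_stack_machine([("PUSH", 20), ("PUSH", 7), ("ADD",)]))  # [27]
```

Because ADD finds its operands implicitly on the stack, stack-machine instructions are short; the price is the extra PUSH/POP traffic compared with a register machine.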
Producing an assembly-language program as output makes code generation somewhat easier: we can generate symbolic instructions and use the macro facilities of the assembler to help generate code.
3. Instruction Selection
➢ The code generator must map the IR program into a code sequence that can be executed by the target machine.
IR Code:
t0 = y
t0 = t0 + z
x = t0
a) Level of the IR
➢ If the IR is high level, the code generator may translate each IR statement into a
sequence of machine instructions using code templates.
Such statement-by-statement code generation, however, often produces poor code
that needs further optimization.
➢ If the IR reflects some of the low-level details of the underlying machine, then the
code generator can use this information to generate more efficient code sequences.
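A minimal sketch of statement-by-statement generation with code templates, assuming statements of the form x = y op z are stored as quadruple-like tuples (dest, src1, op, src2) and using the LD / op / ST notation of these notes. The tuple layout and the use of a single register R0 are assumptions for illustration. Each statement is translated in isolation, which is why the output can contain redundant loads and stores.

```python
# Hypothetical code templates for x = y op z in the notes' target notation.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def translate(stmt):
    """Translate one statement (dest, src1, op, src2) using a fixed template."""
    dest, src1, op, src2 = stmt
    return [
        f"LD R0, {src1}",                 # load the first operand
        f"{OPCODES[op]} R0, R0, {src2}",  # apply the operator
        f"ST {dest}, R0",                 # store the result
    ]


def translate_program(stmts):
    """Statement-by-statement generation: no knowledge of neighbouring statements."""
    code = []
    for stmt in stmts:
        code.extend(translate(stmt))
    return code


for line in translate_program([("a", "b", "+", "c"), ("d", "a", "+", "e")]):
    print(line)
```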
b) Nature of the Instruction Set
➢ The uniformity and completeness of the target machine's instruction set strongly affect the difficulty of instruction selection.
▪ If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling. On some machines, for example, floating-point operations are done using separate registers.
▪ An instruction set is said to be complete if the target machine provides a sufficient number of instructions in each category (arithmetic, logical, shift, move, conditional and unconditional jumps, and I/O instructions).
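➢ If we do not care about the efficiency of the target program, instruction selection is straightforward: for each type of three-address statement, we can design a code skeleton that defines the target code to be generated. For example, every statement of the form x = y + z can be translated into
LD R0, y
ADD R0, R0, z
ST x, R0
This strategy often produces redundant loads and stores. For example, the sequence of three-address statements
a = b + c
d = a + e
would be translated into
LD R0, b
ADD R0, R0, c
ST a, R0
LD R0, a
ADD R0, R0, e
ST d, R0
Here, the fourth statement is redundant since it loads a value that has just been stored, and so is the third if a is not subsequently used.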
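The redundant fourth statement can be caught by a simple peephole pass. The sketch below assumes instructions are plain strings in the notes' notation and that the load carries no label (so no jump can land on it). It deliberately leaves the third statement (the store) alone, because deleting it safely would need liveness information about a.

```python
def remove_redundant_loads(code):
    """Peephole pass: delete 'LD r, x' that immediately follows 'ST x, r'."""
    result = []
    for instr in code:
        if instr.startswith("LD ") and result and result[-1].startswith("ST "):
            reg, loc = (s.strip() for s in instr[3:].split(","))
            st_loc, st_reg = (s.strip() for s in result[-1][3:].split(","))
            if reg == st_reg and loc == st_loc:
                continue  # the register still holds loc's value: skip the load
        result.append(instr)
    return result


naive = ["LD R0, b", "ADD R0, R0, c", "ST a, R0",
         "LD R0, a", "ADD R0, R0, e", "ST d, R0"]
for line in remove_redundant_loads(naive):
    print(line)
```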
➢ The quality of the generated target code is usually determined by its speed and size.
▪ A given IR program can be implemented by many different target code sequences, with
significant cost differences between the different implementations.
▪ A naive translation of the intermediate code may therefore lead to correct but
unacceptably inefficient target code.
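For example, if the target machine has an INC instruction, then a = a + 1 may be implemented more efficiently by the single instruction INC a, rather than by the more obvious sequence that loads a into a register, adds one to the register, and stores the result back into a:
LD R0, a
ADD R0, R0, #1
ST a, R0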
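A minimal sketch of instruction selection that prefers INC when the statement matches a = a + 1. It assumes a hypothetical INC that increments a memory operand in place, and the same (dest, src1, op, src2) tuples and LD / op / ST template as elsewhere in these notes; integer literals become immediate operands (#n).

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def operand(x):
    """Integer literals become immediate operands (#n); names stay as they are."""
    return f"#{x}" if x.isdigit() else x


def select(stmt):
    """Choose target code for (dest, src1, op, src2), preferring INC when it applies."""
    dest, src1, op, src2 = stmt
    if op == "+" and dest == src1 and src2 == "1":
        return [f"INC {dest}"]                     # a = a + 1
    if op == "+" and dest == src2 and src1 == "1":
        return [f"INC {dest}"]                     # a = 1 + a (addition commutes)
    return [f"LD R0, {operand(src1)}",              # otherwise: generic template
            f"{OPCODES[op]} R0, R0, {operand(src2)}",
            f"ST {dest}, R0"]


print(select(("a", "a", "+", "1")))   # ['INC a']
print(select(("a", "b", "+", "1")))   # ['LD R0, b', 'ADD R0, R0, #1', 'ST a, R0']
```

Real instruction selectors generalize this by matching tree patterns against a table of machine instructions with costs; the special case here only shows why a smarter choice can shrink three instructions to one.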
4. Register Allocation
➢ Registers are the fastest computational unit on the target machine, but we usually do not
have enough of them to hold all values.
➢ Instructions involving register operands are invariably shorter and faster than those
involving operands in memory, so efficient utilization of registers is particularly important.
➢ A key problem in code generation is deciding what values to hold in what registers.
➢ Keep values in registers as long as possible to minimize the number of load/store instructions executed.
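➢ The use of registers is often subdivided into two subproblems:
▪ Register allocation, during which we select the set of variables that will reside in registers at each point in the program.
▪ Register assignment, during which we pick the specific register that a variable will reside in.
In short: register allocation decides which values to keep in registers; register assignment chooses the specific register for each value.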
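To make the allocation/assignment split concrete, here is a simplified sketch of linear-scan register allocation. Linear scan is one common technique and is not covered in these slides; the sketch assumes each variable's live range is a single interval of statement indices. Deciding whether a variable gets a register at all is allocation; picking which register it gets is assignment.

```python
def linear_scan(intervals, num_regs):
    """Simplified linear-scan register allocation.

    intervals: dict mapping each variable to its live range (start, end),
               given as inclusive statement indices.
    Returns (assignment, spilled): assignment maps variables kept in registers
    to register names; spilled is the set of variables left in memory.
    """
    free = [f"R{i}" for i in range(num_regs)]
    active = []                       # variables currently occupying a register
    assignment, spilled = {}, set()
    for var in sorted(intervals, key=lambda v: intervals[v][0]):
        start, end = intervals[var]
        # Expire ranges that ended before this one starts, freeing their registers.
        for old in list(active):
            if intervals[old][1] < start:
                active.remove(old)
                free.append(assignment[old])
        if free:                                  # allocation succeeds ...
            assignment[var] = free.pop(0)         # ... and assignment picks a register
            active.append(var)
            continue
        # No register is free: spill whichever live range ends last.
        victim = max(active, key=lambda v: intervals[v][1])
        if intervals[victim][1] > end:
            assignment[var] = assignment.pop(victim)  # take over the victim's register
            active.remove(victim)
            active.append(var)
            spilled.add(victim)
        else:
            spilled.add(var)
    return assignment, spilled


print(linear_scan({"a": (0, 4), "b": (1, 2), "c": (2, 5), "d": (3, 6)}, 2))
```

With two registers, c cannot be kept in a register while a and b are live, so it is spilled; d later reuses b's register R1 once b's live range has ended.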
➢ The register allocation problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register-usage conventions be observed.
➢ Example 8.1: Certain machines require register pairs (an even and the next odd-numbered register) for some operands and results. For example, on some machines, integer multiplication and integer division involve register pairs.
▪ The multiplication instruction is of the form
M x, y
where x, the multiplicand, is the even register of an even/odd register pair and y, the multiplier, is the odd register. The product occupies the entire even/odd register pair.
[Figure: the register pair before M x, y (even register x = multiplicand, odd register y = multiplier) and after it (the pair holds the product).]
▪ The division instruction is of the form
D x, y
where the dividend occupies an even/odd register pair whose even register is x; the divisor is y. After division, the even register holds the remainder and the odd register the quotient.
[Figure: the register pair after D x, y (even register = remainder, odd register = quotient).]
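The sketch below simulates these two register-pair conventions to show why the allocator must reserve both registers of a pair. It is a hypothetical model: registers are assumed to be 32 bits wide, arithmetic is unsigned, and placing the high half of the 64-bit product or dividend in the even register is an assumption, since the description above only says the pair as a whole is used.

```python
MASK32 = (1 << 32) - 1   # registers are assumed to be 32 bits wide


def mul_pair(regs, x, y):
    """M x, y: multiplicand in even register x, multiplier in odd register y = x + 1.

    The 64-bit product fills the whole pair (assumed: high half in x, low half in x + 1).
    """
    if x % 2 != 0 or y != x + 1:
        raise ValueError("M needs an even register x and its odd partner y = x + 1")
    product = regs[x] * regs[y]
    regs[x], regs[x + 1] = (product >> 32) & MASK32, product & MASK32


def div_pair(regs, x, y):
    """D x, y: dividend in pair (x, x + 1) (assumed: high half in x), divisor in y.

    Afterwards the even register x holds the remainder and x + 1 the quotient.
    Unsigned arithmetic; the quotient is assumed to fit in 32 bits.
    """
    if x % 2 != 0:
        raise ValueError("D needs the dividend to start in an even register")
    dividend = (regs[x] << 32) | regs[x + 1]
    quotient, remainder = divmod(dividend, regs[y])
    regs[x], regs[x + 1] = remainder, quotient


regs = [0] * 8
regs[2], regs[3] = 6, 7
mul_pair(regs, 2, 3)
print(regs[2], regs[3])      # 0 42

regs[2], regs[3], regs[5] = 0, 43, 5
div_pair(regs, 2, 5)
print(regs[2], regs[3])      # 3 8
```

Both instructions overwrite the odd register x + 1 even though only x is named for the result, so a value the allocator had kept in that register would be silently destroyed.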
5. Evaluation Order
➢ The order in which computations are performed can affect the efficiency of the
target code.
➢ Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they were produced by the intermediate code generator.
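As an illustration of why evaluation order matters, the sketch below uses Sethi-Ullman labelling, a classic technique not developed in these slides. It compares the number of registers an expression tree needs when the left operand is always evaluated first with the number needed when the subtree requiring more registers is evaluated first. It assumes every leaf must be loaded into a register and every operator works register-to-register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str                           # operator, or a variable name at a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_leaf(n):
    return n.left is None and n.right is None


def need_best_order(n):
    """Sethi-Ullman label: registers needed when the costlier subtree goes first."""
    if is_leaf(n):
        return 1                      # a leaf is loaded into one register
    l, r = need_best_order(n.left), need_best_order(n.right)
    return max(l, r) if l != r else l + 1


def need_left_first(n):
    """Registers needed if the left operand is always evaluated first."""
    if is_leaf(n):
        return 1
    l, r = need_left_first(n.left), need_left_first(n.right)
    return max(l, r + 1)              # left result stays live while right is computed


# a + (b - c) * (d + e)
tree = Node("+", Node("a"),
            Node("*", Node("-", Node("b"), Node("c")),
                      Node("+", Node("d"), Node("e"))))
print(need_left_first(tree), need_best_order(tree))   # 4 3
```

Evaluating the product first saves a register here: a only needs to be loaded after the costly right subtree is done.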
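1. Load (from memory) ( LD reg, (memloc) )   dest = reg, src = memloc
2. Store (to memory) ( ST (memloc), reg )   dest = memloc, src = reg
3. Move (between registers) ( MOV reg1, reg2 )   dest = reg1, src = reg2
4. Computations ( op dest, src1, src2 ), where a source may be a register, a memory location, or an immediate constant:
a) ADD
b) SUB
c) MUL
d) DIV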
5. Unconditional jumps ( BR L )
Example (direct addressing): ADD R1, (1001), where 1001 is the address at which the operand is stored.
Note: In our hypothetical model, for simplicity, we use the variable name in the instruction itself instead of the variable's memory address.
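A small interpreter makes the instruction forms above concrete. This is a minimal sketch of a hypothetical toy machine, not a real ISA: 'R<n>' names a register, '#<n>' an immediate constant, and any other operand a variable in a memory dictionary, following the note above. It also supports BLTZ (branch if a register is less than zero), which these notes use for conditional jumps, and an optional 'label:' prefix as a jump target.

```python
import operator
import re

# Floor division is used for DIV; real hardware may truncate toward zero instead.
ARITH = {"ADD": operator.add, "SUB": operator.sub,
         "MUL": operator.mul, "DIV": operator.floordiv}


def run(program, memory):
    """Interpret instructions written in the notes' notation; returns updated memory."""
    instrs, labels = [], {}
    for line in program:
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = len(instrs)
        if not line.strip():
            continue                                  # label with no instruction
        op, _, rest = line.strip().partition(" ")
        instrs.append((op, [a.strip() for a in rest.split(",")]))

    regs = {}

    def value(opnd):
        if opnd.startswith("#"):
            return int(opnd[1:])                      # immediate constant
        if re.fullmatch(r"R\d+", opnd):
            return regs[opnd]                         # register
        return memory[opnd]                           # variable in memory

    pc = 0
    while pc < len(instrs):
        op, args = instrs[pc]
        pc += 1
        if op in ("LD", "MOV"):                       # LD reg, src / MOV reg1, reg2
            regs[args[0]] = value(args[1])
        elif op == "ST":                              # ST memloc, reg
            memory[args[0]] = regs[args[1]]
        elif op in ARITH:                             # op dest, src1, src2
            regs[args[0]] = ARITH[op](value(args[1]), value(args[2]))
        elif op == "BR":                              # unconditional jump
            pc = labels[args[0]]
        elif op == "BLTZ":                            # jump if register < 0
            if regs[args[0]] < 0:
                pc = labels[args[1]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


print(run(["LD R0, y", "ADD R0, R0, z", "ST x, R0"], {"y": 5, "z": 3}))
# {'y': 5, 'z': 3, 'x': 8}
```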
Addressing Modes
4. Immediate addressing mode: In this mode, the operand (data) itself is present in the address field of the instruction.
Example: a = 100
Note: A limitation of immediate mode is that the range of constants is restricted by the size of the address field.
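The range limitation can be checked mechanically. This sketch assumes a two's-complement address field whose width (16 bits by default) is purely illustrative; it translates a = 100 with an immediate operand and rejects constants that do not fit, which a real code generator would instead load from memory.

```python
def fits_immediate(value, field_bits):
    """True if value fits a two's-complement field of field_bits bits."""
    return -(1 << (field_bits - 1)) <= value < (1 << (field_bits - 1))


def translate_const_assign(dest, value, field_bits=16):
    """Translate dest = value using immediate mode when the constant fits."""
    if not fits_immediate(value, field_bits):
        raise ValueError(f"{value} does not fit in a {field_bits}-bit address field; "
                         "it would have to be loaded from memory instead")
    return [f"LD R1, #{value}", f"ST {dest}, R1"]


print(translate_const_assign("a", 100))   # ['LD R1, #100', 'ST a, R1']
```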
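Note:
1. Byte-addressable machine.

1) a[i] = c, i.e., a[t1] = c with t1 = i * 4 (4-byte elements)
LD R1, i          R1 has i
MUL R1, R1, #4    R1 has the byte offset i * 4
LD R2, c          R2 has c
ST R1(a), R2      contents(R1 + a) ← R2; R1(a) is the l-value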
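The indexed store above can be mimicked on a byte-addressable memory modelled as a Python bytearray. This is a sketch under stated assumptions: array elements are 4-byte little-endian signed integers, and the array a starts at an arbitrary byte address chosen for illustration.

```python
import struct

ELEM = struct.calcsize("<i")          # 4-byte little-endian signed int (assumed element type)


def store_indexed(memory, a, i, c):
    """Mimic the code for a[i] = c; memory is a bytearray, a the array's start address."""
    r1 = i * ELEM                     # LD R1, i ; MUL R1, R1, #4
    r2 = c                            # LD R2, c
    struct.pack_into("<i", memory, a + r1, r2)   # ST R1(a), R2: contents(R1 + a) <- R2


mem = bytearray(64)
store_indexed(mem, 16, 3, 42)         # a starts at byte 16, so a[3] is at byte 28
print(struct.unpack_from("<i", mem, 28)[0])   # 42
```

Because the machine is byte-addressable, the index must be scaled by the element size before it is added to the base address; forgetting the MUL would write into the middle of a[0].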
2) if x < y goto L
LD R1, x          R1 has x
LD R2, y          R2 has y
SUB R1, R1, R2    R1 has x - y
BLTZ R1, L        jump to L if R1 < 0, i.e., if x < y
BR L1
L2: ST i R1
ST s R2
IC for procedure c()    Target code for c()
action                  100: ACTION
action                  120: ACTION
call p                  140: ST 364, #160
                        152: BR 200
action                  160: ACTION
halt                    180: HALT

IC for procedure p()    Target code for p()
action                  200: ACTION
return                  220: BR *364

Activation record of c(): starts at 300 (300: …)
Activation record of p(): starts at 364 (364: 160, the return address saved by the call)
Note: action denotes set of three-address statements, target code of action (i.e., ACTION) takes 20 bytes memory.
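The addresses in this table follow from instruction sizes. The sketch below lays out static-allocation code for one procedure under these assumptions: ACTION takes 20 bytes (from the note above); opcodes, immediate constants and memory addresses take 4 bytes each (the rule from Exercise 1), so ST 364, #160 takes 12 bytes and BR 200 takes 8; HALT is assumed to be 4 bytes, which does not affect any address in the table. The function name and input format are hypothetical.

```python
# Instruction sizes in bytes (see the assumptions above).
SIZE = {"ACTION": 20, "ST": 12, "BR": 8, "HALT": 4}


def gen_static(proc, ic, code_addr, ar_addr):
    """Lay out target code for one procedure under static allocation.

    ic: list of 'action', 'call <name>', 'return' or 'halt'.
    code_addr / ar_addr: start addresses of each procedure's code and activation
    record; the return address is kept in the first word of the callee's record.
    """
    out, pc = [], code_addr[proc]
    for stmt in ic:
        if stmt == "action":
            out.append(f"{pc}: ACTION")
            pc += SIZE["ACTION"]
        elif stmt.startswith("call "):
            callee = stmt.split()[1]
            ret = pc + SIZE["ST"] + SIZE["BR"]          # address just after the BR
            out.append(f"{pc}: ST {ar_addr[callee]}, #{ret}")
            pc += SIZE["ST"]
            out.append(f"{pc}: BR {code_addr[callee]}")
            pc += SIZE["BR"]
        elif stmt == "return":
            out.append(f"{pc}: BR *{ar_addr[proc]}")    # indirect jump via saved address
            pc += SIZE["BR"]
        elif stmt == "halt":
            out.append(f"{pc}: HALT")
            pc += SIZE["HALT"]
        else:
            raise ValueError(f"unsupported statement {stmt!r}")
    return out


code_addr, ar_addr = {"c": 100, "p": 200}, {"c": 300, "p": 364}
for line in gen_static("c", ["action", "action", "call p", "action", "halt"],
                       code_addr, ar_addr):
    print(line)
```

The # marks 160 as an immediate constant: the call stores the return address itself (not the contents of location 160) into p's activation record, and p's return jumps indirectly through that saved word.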
Exercise 1: Generate target code for the following TAC, assuming static allocation for procedures.
The code for procedures p() and q() is kept at memory locations 100 and 300 respectively.
The activation records for procedures p() and q() are kept at memory locations 400 and 600
respectively. Assume the names m, n and x represent addresses.
Note: CPU registers occupy zero bytes in target-code instructions. Opcodes, immediate constants, memory
addresses and memory operands (variables) need 4 bytes each in target-code instructions.
➢ https://markfaction.wordpress.com/2012/07/15/stack-based-vs-register-based-virtual-machine-architecture-and-the-dalvik-vm/
4. Register Allocation
➢ Consider the two three-address code sequences in Fig. 8.2, in which the only
difference between (a) and (b) is the operator in the second statement. The shortest
assembly-code sequences for (a) and (b) are given in Fig. 8.3.