Unit 5 – Intermediate Code Generation

1. Role of intermediate code generation. OR
What is intermediate code? What are its advantages?
• In the analysis-synthesis model of a compiler, the front end translates a source program into an intermediate representation, from which the back end generates target code.
• Generating an intermediate language first simplifies the production of efficient target code.
[Figure: Parser → Static type checker → Intermediate code generator → Code generator]
Fig.5.1 Position of intermediate code generator in compiler


• Generating machine-independent intermediate code has the following advantages:
1. A compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate code to improve the generated code.

2. Explain different intermediate forms.


There are three types of intermediate representation:
1. Abstract syntax tree
2. Postfix notation
3. Three address code
Abstract syntax tree
• A syntax tree depicts the natural hierarchical structure of a source program.
• A DAG (Directed Acyclic Graph) gives the same information in a more compact way, because common sub-expressions are identified and represented only once.
• The syntax tree and the DAG for the assignment statement a = b*-c + b*-c are given in Fig. 5.2.

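[Figure, left (syntax tree): assign has children a and +; each operand of + is a separate * node with children b and uminus, and each uminus has the child c.
Figure, right (DAG): the same structure, except that a single * node for b * uminus c is shared by both operands of +.]
Fig.5.2 Syntax tree & DAG for a = b*-c + b*-c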
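One common way to obtain the DAG instead of the tree is to look up each (operator, left child, right child) combination in a table before creating a node, so a repeated sub-expression such as b*-c is found rather than rebuilt. The Python sketch below assumes this hashing approach; the class DAG and its methods are hypothetical names used only for illustration.

```python
class DAG:
    """Hypothetical DAG builder: identical sub-expressions share one node."""

    def __init__(self):
        self.nodes = []  # node id -> (op, left, right)
        self.table = {}  # (op, left, right) -> node id

    def node(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self.table:        # create a node only if none exists yet
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def leaf(self, name):
        return self.node("id", name)


dag = DAG()
b, c = dag.leaf("b"), dag.leaf("c")
left = dag.node("*", b, dag.node("uminus", c))
right = dag.node("*", b, dag.node("uminus", c))  # found in the table, not rebuilt
root = dag.node("assign", dag.leaf("a"), dag.node("+", left, right))
print(left == right, len(dag.nodes))  # True 7
```

Both operands of + end up as the same node, so the DAG for a = b*-c + b*-c needs only 7 nodes.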

Dixita Kagathara, CE Department | 170701 – Compiler Design 53



Postfix notation
• Postfix notation is a linearization of a syntax tree.
• In postfix notation, the operands of an operator appear first, immediately followed by the operator itself.
• The postfix notation for the syntax tree in Fig. 5.2 is,
a b c uminus * b c uminus * + assign.
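A post-order traversal of the syntax tree (children first, then the node) produces exactly this linearization. The Python sketch below assumes the tree is given as nested tuples with the operator first; postfix is a hypothetical helper name.

```python
def postfix(tree):
    """Post-order traversal: operands first, then the operator."""
    if isinstance(tree, str):          # a leaf: a name
        return [tree]
    op, *children = tree
    out = []
    for child in children:
        out.extend(postfix(child))
    out.append(op)
    return out


# Syntax tree of Fig. 5.2 as nested tuples (operator first).
tree = ("assign", "a",
        ("+", ("*", "b", ("uminus", "c")),
              ("*", "b", ("uminus", "c"))))
print(" ".join(postfix(tree)))  # a b c uminus * b c uminus * + assign
```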
Three address code
• Three address code is a sequence of statements of the general form,
a := b op c
• where a, b and c are operands that can be names, constants or compiler-generated temporaries, and op stands for any operator.
• For example, an expression like a = b + c + d might be translated into the sequence,
t1=b+c
t2=t1+d
a=t2
• Here t1 and t2 are temporary names generated by the compiler.
• At most three addresses are allowed in each statement (two for the operands and one for the result). Hence, this representation is called three-address code.

3. Implementations of three address code.


• There are three types of representation used for three address code:
1. Quadruples
2. Triples
3. Indirect triples
• Consider the input statement x := -a*b + -a*b.
• The three address code for the above statement is given in Table 5.1.
t1 := - a
t2 := t1 * b
t3 := - a
t4 := t3 * b
t5 := t2 + t4
x := t5
Table 5.1 Three address code
Quadruple representation
• A quadruple is a record structure with four fields: op, arg1, arg2 and result.
• The op field holds the internal code for the operator.
• The fields arg1 and arg2 hold the two operands.
• The result field holds the result of the expression.
• Statements with unary operators, such as x := -y, do not use arg2.
• Conditional and unconditional jumps put the target label in result.
Number  Op      Arg1  Arg2  Result
(0)     uminus  a           t1
(1)     *       t1    b     t2
(2)     uminus  a           t3
(3)     *       t3    b     t4
(4)     +       t2    t4    t5
(5)     :=      t5          x
Table 5.2 Quadruple representation
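To make the role of each field concrete, the Python sketch below stores Table 5.2 as records and executes them with a tiny interpreter. The Quad record and the run_quads helper are hypothetical illustrations; only the operators used in this example are supported, and jumps are omitted.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """Hypothetical quadruple record: op, arg1, arg2, result."""
    op: str
    arg1: str
    arg2: Optional[str]  # None for unary operators and copies
    result: str          # name that receives the value


def run_quads(quads, env):
    """Execute straight-line quadruples; env maps names to values."""
    for q in quads:
        if q.op == "uminus":
            env[q.result] = -env[q.arg1]
        elif q.op == ":=":
            env[q.result] = env[q.arg1]
        elif q.op == "+":
            env[q.result] = env[q.arg1] + env[q.arg2]
        elif q.op == "*":
            env[q.result] = env[q.arg1] * env[q.arg2]
        else:
            raise ValueError(f"unsupported operator {q.op}")
    return env


# Table 5.2: x := -a*b + -a*b
quads = [
    Quad("uminus", "a", None, "t1"),
    Quad("*", "t1", "b", "t2"),
    Quad("uminus", "a", None, "t3"),
    Quad("*", "t3", "b", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "x"),
]
print(run_quads(quads, {"a": 2, "b": 3})["x"])  # -12
```

Note that every temporary (t1 to t5) becomes a named entry in env; this is the symbol-table cost that triples avoid.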
Triples
• To avoid entering temporary names into the symbol table, we can refer to a temporary value by the position of the statement that computes it.
• If we do so, three address statements can be represented by records with only three fields: op, arg1 and arg2.
Number  Op      Arg1  Arg2
(0)     uminus  a
(1)     *       (0)   b
(2)     uminus  a
(3)     *       (2)   b
(4)     +       (1)   (3)
(5)     :=      x     (4)
Table 5.3 Triple representation
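The step from quadruples to triples can be sketched as a single pass that remembers where each temporary was computed and replaces later uses by that position. The Python sketch below assumes every non-copy quadruple writes a temporary; to_triples is a hypothetical helper name.

```python
def to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples into (op, arg1, arg2) triples.

    Assumption: every non-copy quadruple writes a temporary, which is replaced
    by the position "(i)" of the triple that computes it. A copy 'x := y'
    becomes (':=', x, y), as in Table 5.3.
    """
    defined_at = {}  # temporary name -> index of the triple computing it

    def ref(arg):
        return f"({defined_at[arg]})" if arg in defined_at else arg

    triples = []
    for i, (op, arg1, arg2, result) in enumerate(quads):
        if op == ":=":
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[result] = i
    return triples


quads = [
    ("uminus", "a", None, "t1"),
    ("*", "t1", "b", "t2"),
    ("uminus", "a", None, "t3"),
    ("*", "t3", "b", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "x"),
]
for i, t in enumerate(to_triples(quads)):
    print(f"({i})", *[a for a in t if a is not None])
```

The printed rows reproduce Table 5.3.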
Indirect Triples
• In the indirect triple representation, the triples are listed separately, and a list of pointers to the triples, rather than the triples themselves, gives the order of the statements.
• This implementation is called indirect triples.
Statement  Triple
(0)        (11)
(1)        (12)
(2)        (13)
(3)        (14)
(4)        (15)
(5)        (16)

Number  Op      Arg1  Arg2
(11)    uminus  a
(12)    *       (11)  b
(13)    uminus  a
(14)    *       (13)  b
(15)    +       (12)  (14)
(16)    :=      x     (15)
Table 5.4 Indirect triple representation
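A benefit of the extra pointer list is that statements can be reordered, for example by an optimizer, by permuting only the statement list while the triples and their numbered references stay untouched. The Python sketch below executes Table 5.4 through the statement list; run_indirect is a hypothetical helper, integer operands stand for triple numbers and strings for names.

```python
# Table 5.4: triples keyed by their own numbers; int operands are triple references.
triples = {
    11: ("uminus", "a", None),
    12: ("*", 11, "b"),
    13: ("uminus", "a", None),
    14: ("*", 13, "b"),
    15: ("+", 12, 14),
    16: (":=", "x", 15),
}


def run_indirect(statements, triples, env):
    """Hypothetical helper: execute triples in the order of the pointer list."""
    values = {}  # triple number -> computed value

    def val(arg):
        return values[arg] if isinstance(arg, int) else env[arg]

    for n in statements:
        op, a1, a2 = triples[n]
        if op == "uminus":
            values[n] = -val(a1)
        elif op == "*":
            values[n] = val(a1) * val(a2)
        elif op == "+":
            values[n] = val(a1) + val(a2)
        elif op == ":=":
            env[a1] = val(a2)  # a1 names the target variable
        else:
            raise ValueError(f"unsupported operator {op}")
    return env


original = [11, 12, 13, 14, 15, 16]
reordered = [11, 13, 12, 14, 15, 16]  # only the pointer list changes
print(run_indirect(original, triples, {"a": 2, "b": 3})["x"],
      run_indirect(reordered, triples, {"a": 2, "b": 3})["x"])  # -12 -12
```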

4. Syntax directed translation mechanism.


• To obtain three address code, a syntax directed translation scheme (a set of semantic rules) must be written for each kind of source statement.
• There are various programming constructs for which the semantic rules can be defined.
• Using these rules, the corresponding intermediate code in the form of three address code can be generated.
• The programming constructs considered here are,
1. Declarative statement
2. Assignment statement
3. Arrays
4. Boolean expressions
5. Control statement
6. Switch case
7. Procedure call
Declarative Statement
• In declarative statements, data items are declared along with their data types.
Example:
Production                 Semantic actions
S → D                      {offset := 0}
D → id : T                 {enter(id.name, T.type, offset);
                            offset := offset + T.width}
T → integer                {T.type := integer;
                            T.width := 4}
T → real                   {T.type := real;
                            T.width := 8}
T → array [num] of T1      {T.type := array(num.val, T1.type);
                            T.width := num.val × T1.width}
T → *T1                    {T.type := pointer(T1.type);
                            T.width := 4}
Table 5.5 Syntax directed translation for Declarative statement
• Initially, the value of offset is set to zero. Each declaration then advances it using the formula offset = offset + width.
• In the above translation scheme, T.type and T.width are synthesized attributes.
• The type indicates the data type of the corresponding identifier, and width indicates the number of memory units taken by an identifier of that type.
• The rule D → id : T is a declarative statement that declares the identifier id.
• The function enter creates the symbol table entry for an identifier, along with its type and offset.
• The width of an array is obtained by multiplying the width of each element by the number of elements in the array.
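The Python sketch below applies the rules of Table 5.5 to an already parsed list of declarations, so no parser is involved. Type expressions are written as hypothetical tuples such as ("array", 10, "integer") and ("pointer", "real"), and a dictionary stands in for the symbol table that enter writes to.

```python
def type_and_width(t):
    """Return (T.type, T.width) for a type expression, following Table 5.5."""
    if t == "integer":
        return "integer", 4
    if t == "real":
        return "real", 8
    if t[0] == "array":                      # hypothetical ("array", num, element type)
        elem_type, elem_width = type_and_width(t[2])
        return ("array", t[1], elem_type), t[1] * elem_width
    if t[0] == "pointer":                    # hypothetical ("pointer", target type)
        target_type, _ = type_and_width(t[1])
        return ("pointer", target_type), 4
    raise ValueError(f"unknown type {t!r}")


def declare(decls):
    """Process D -> id : T for each (name, type) pair; return (symtab, offset)."""
    symtab = {}                              # stand-in for the symbol table
    offset = 0                               # S -> D { offset := 0 }
    for name, t in decls:
        typ, width = type_and_width(t)
        symtab[name] = (typ, offset)         # enter(id.name, T.type, offset)
        offset += width                      # offset := offset + T.width
    return symtab, offset


symtab, total = declare([("a", "integer"), ("b", "real"),
                         ("c", ("array", 10, "integer")), ("p", ("pointer", "real"))])
print(symtab["c"], symtab["p"], total)  # (('array', 10, 'integer'), 12) (('pointer', 'real'), 52) 56
```

The array c starts at offset 4 + 8 = 12 and occupies 10 × 4 = 40 units, so p starts at 52.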
Assignment statement
• The assignment statement mainly deals with expressions.
• The expressions can be of type integer, real, array and record.
• Consider the following grammar,
S → id := E
E → E1 + E2
E → E1 * E2
E → -E1
E → (E1)
E → id
The translation scheme for the above grammar is given in Table 5.6.
Production Rule     Semantic actions
S → id := E         { p := look_up(id.name);
                      if p ≠ nil then
                         emit(p ':=' E.place)
                      else error }
E → E1 + E2         { E.place := newtemp();
                      emit(E.place ':=' E1.place '+' E2.place) }
E → E1 * E2         { E.place := newtemp();
                      emit(E.place ':=' E1.place '*' E2.place) }
E → -E1             { E.place := newtemp();
                      emit(E.place ':=' 'uminus' E1.place) }
E → (E1)            { E.place := E1.place }
E → id              { p := look_up(id.name);
                      if p ≠ nil then
                         E.place := p
                      else error }
Table 5.6. Translation scheme to produce three address code for assignments
• The function look_up returns the entry for id.name in the symbol table if it exists there; otherwise an error is reported.
• The function emit appends a three address statement to the output file.
• newtemp() is the function for generating new temporary variables.
• E.place holds the name (a program variable or a temporary) that holds the value of E.
• Consider the assignment statement x := (a+b)*(c+d),
Production Rule   Semantic action for attribute evaluation   Output
E → id            E.place := a
E → id            E.place := b
E → E1 + E2       E.place := t1                              t1 := a+b
E → id            E.place := c
E → id            E.place := d
E → E1 + E2       E.place := t2                              t2 := c+d
E → E1 * E2       E.place := t3                              t3 := t1*t2
S → id := E                                                  x := t3
Table 5.7 Three address code for Assignment statement
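The semantic actions of Table 5.6 can be sketched by walking an already built expression tree bottom-up, so that each E.place is known before its parent's statement is emitted. In the Python sketch below, the Translator class, the tuple tree shape and the use of a set of declared names in place of a real symbol table are all hypothetical simplifications; an undeclared name raises NameError as the "error" case.

```python
class Translator:
    """Hypothetical sketch of Table 5.6 applied to a pre-built expression tree."""

    def __init__(self, declared):
        self.declared = declared  # stand-in for the symbol table
        self.code = []            # the "output file" that emit appends to
        self.ntemps = 0

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def emit(self, line):
        self.code.append(line)

    def look_up(self, name):
        if name not in self.declared:
            raise NameError(f"undeclared identifier {name}")  # the 'error' case
        return name

    def expr(self, e):
        """Return E.place for expression e, emitting code for it first."""
        if isinstance(e, str):                       # E -> id
            return self.look_up(e)
        if e[0] == "paren":                          # E -> (E1)
            return self.expr(e[1])
        if e[0] == "uminus":                         # E -> -E1
            p1 = self.expr(e[1])
            place = self.newtemp()
            self.emit(f"{place} := uminus {p1}")
            return place
        op, left, right = e                          # E -> E1 + E2 | E1 * E2
        p1, p2 = self.expr(left), self.expr(right)
        place = self.newtemp()
        self.emit(f"{place} := {p1}{op}{p2}")
        return place

    def assign(self, name, e):                       # S -> id := E
        place = self.expr(e)
        p = self.look_up(name)
        self.emit(f"{p} := {place}")


tr = Translator({"x", "a", "b", "c", "d"})
tr.assign("x", ("*", ("paren", ("+", "a", "b")), ("paren", ("+", "c", "d"))))
print("\n".join(tr.code))
```

For x := (a+b)*(c+d) this prints the Output column of Table 5.7: t1 := a+b, t2 := c+d, t3 := t1*t2 and x := t3.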
Arrays
• An array is a contiguous storage of elements.
• Elements of an array can be accessed quickly if they are stored in a block of consecutive locations. If the width of each array element is w, then the ith element of array A begins in location,
base + (i – low) × w
• where low is the lower bound on the subscript and base is the relative address of the storage allocated for the array; that is, base is the relative address of A[low].
• The expression can be partially evaluated at compile time if it is rewritten as,
i × w + (base – low × w)
• The sub-expression c = base – low × w can be evaluated when the declaration of the array is seen. We assume that c is saved in the symbol table entry for A, so the relative address of A[i] is obtained by simply adding i × w to c.
• There are two representations of an array:
1. Row major representation.
2. Column major representation.
• In the row-major form, the relative address of A[i1, i2] can be calculated by the formula:
base + ((i1 – low1) × n2 + i2 – low2) × w
• where low1 and low2 are the lower bounds on the values of i1 and i2, and n2 is the number of values that i2 can take. That is, if high2 is the upper bound on the value of i2, then n2 = high2 – low2 + 1.
• Assuming that i1 and i2 are the only values that are not known at compile time, we can rewrite the above expression as
((i1 × n2) + i2) × w + (base – ((low1 × n2) + low2) × w)
• Generalized formula: the expression generalizes to the following expression for the relative address of A[i1, i2, …, ik]:
$$\Big(\big(\cdots\big((i_1 n_2 + i_2)\,n_3 + i_3\big)\cdots\big)\,n_k + i_k\Big) \times w \;+\; \mathit{base} \;-\; \Big(\big(\cdots\big((\mathit{low}_1 n_2 + \mathit{low}_2)\,n_3 + \mathit{low}_3\big)\cdots\big)\,n_k + \mathit{low}_k\Big) \times w$$
$$n_j = \mathit{high}_j - \mathit{low}_j + 1 \quad \text{for all } j$$
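The generalized formula can be evaluated with the same nested multiply-and-add loop for both the run-time part (over the subscripts) and the compile-time constant (over the lower bounds). The Python sketch below, with hypothetical helper names, checks it against the two-dimensional formula and the one-dimensional case.

```python
def row_major_address(base, w, bounds, idx):
    """Relative address of A[i1, ..., ik] in row-major order.

    bounds = [(low1, high1), ..., (lowk, highk)], idx = [i1, ..., ik].
    """
    n = [high - low + 1 for low, high in bounds]  # n_j = high_j - low_j + 1
    var = idx[0]              # ((i1*n2 + i2)*n3 + ...) : known only at run time
    const = bounds[0][0]      # same nesting over the lower bounds
    for j in range(1, len(idx)):
        var = var * n[j] + idx[j]
        const = const * n[j] + bounds[j][0]
    c = base - const * w      # computable when the declaration is seen
    return var * w + c


def direct_2d(base, w, bounds, i1, i2):
    """base + ((i1 - low1) * n2 + i2 - low2) * w"""
    (low1, _), (low2, high2) = bounds
    n2 = high2 - low2 + 1
    return base + ((i1 - low1) * n2 + i2 - low2) * w


bounds = [(1, 10), (1, 20)]
print(row_major_address(0, 4, bounds, [2, 3]), direct_2d(0, 4, bounds, 2, 3))  # 88 88
print(row_major_address(100, 4, [(1, 10)], [5]))  # 116, i.e. base + (i - low) * w
```

The bounds, base and width here are made-up values for illustration only.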
• The translation scheme for addressing array elements is based on the following grammar:
S → L := E
E → E + E
E → ( E )
E → L
L → Elist ]
L → id
Elist → Elist , E
Elist → id [ E
• The translation scheme for generating three address code is given by the following semantic actions.
Production Rule      Semantic Rule
S → L := E           { if L.offset = null then
                          emit(L.place ':=' E.place)
                       else
                          emit(L.place '[' L.offset ']' ':=' E.place) }
E → E1 + E2          { E.place := newtemp;
                       emit(E.place ':=' E1.place '+' E2.place) }
E → ( E1 )           { E.place := E1.place }
E → L                { if L.offset = null then
                          E.place := L.place
                       else begin
                          E.place := newtemp;
                          emit(E.place ':=' L.place '[' L.offset ']')
                       end }
L → Elist ]          { L.place := newtemp;
                       L.offset := newtemp;
                       emit(L.place ':=' c(Elist.array));
                       emit(L.offset ':=' Elist.place '*' width(Elist.array)) }
L → id               { L.place := id.place;
                       L.offset := null }
Elist → Elist1 , E   { t := newtemp;
                       dim := Elist1.ndim + 1;
                       emit(t ':=' Elist1.place '*' limit(Elist1.array, dim));
                       emit(t ':=' t '+' E.place);
                       Elist.array := Elist1.array;
                       Elist.place := t;
                       Elist.ndim := dim }
Elist → id [ E       { Elist.array := id.place;
                       Elist.place := E.place;
                       Elist.ndim := 1 }
Table 5.8 Syntax directed translation scheme to generate three address code for Array
• Here c(array) returns the compile-time constant c of the array, width(array) returns the element width w, and limit(array, j) returns nj, the number of elements along dimension j.
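• The annotated parse tree for x := A[y, z] is given in Fig. 5.3.
[Figure: annotated parse tree. Left of :=, L.place = x and L.offset = null; right of :=, E.place = t4. For A[y: Elist.array = A, Elist.ndim = 1, Elist.place = y (from E.place = y, L.place = y, L.offset = null). After , z: Elist.array = A, Elist.ndim = 2, Elist.place = t1 (from E.place = z, L.place = z, L.offset = null). For L → Elist ]: L.place = t2, L.offset = t3.]
Fig 5.3. Annotated parse tree for x:=A[y, z]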
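The Python sketch below follows the semantic rules of Table 5.8 for a statement of the form x := A[e1, ..., ek], reproducing the temporaries t1 to t4 of Fig. 5.3. It assumes the subscripts are already simple names (their E.place), and it uses a hypothetical declaration of A with bounds 1..10 and 1..20, width 4 and base 1000, so that c = 1000 – (1 × 20 + 1) × 4 = 916. The class ArrayTranslator and its methods are illustrative names only.

```python
class ArrayTranslator:
    """Hypothetical sketch of Table 5.8 for x := A[e1, ..., ek] (reads only)."""

    def __init__(self, arrays):
        # name -> {"base": ..., "width": w, "bounds": [(low1, high1), ...]}
        self.arrays = arrays
        self.code = []
        self.ntemps = 0

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def emit(self, line):
        self.code.append(line)

    def limit(self, name, j):                 # n_j for dimension j (1-based)
        low, high = self.arrays[name]["bounds"][j - 1]
        return high - low + 1

    def c(self, name):                        # base - ((low1*n2 + low2)...)*w
        info = self.arrays[name]
        const = info["bounds"][0][0]
        for j in range(2, len(info["bounds"]) + 1):
            const = const * self.limit(name, j) + info["bounds"][j - 1][0]
        return info["base"] - const * info["width"]

    def array_ref(self, name, places):
        place, ndim = places[0], 1            # Elist -> id [ E
        for e_place in places[1:]:            # Elist -> Elist1 , E
            t = self.newtemp()
            ndim += 1
            self.emit(f"{t} := {place} * {self.limit(name, ndim)}")
            self.emit(f"{t} := {t} + {e_place}")
            place = t
        l_place, l_offset = self.newtemp(), self.newtemp()  # L -> Elist ]
        self.emit(f"{l_place} := {self.c(name)}")
        self.emit(f"{l_offset} := {place} * {self.arrays[name]['width']}")
        return l_place, l_offset

    def assign_from_array(self, target, name, places):
        l_place, l_offset = self.array_ref(name, places)
        e_place = self.newtemp()              # E -> L with L.offset != null
        self.emit(f"{e_place} := {l_place}[{l_offset}]")
        self.emit(f"{target} := {e_place}")   # S -> L := E with L.offset = null


tr = ArrayTranslator({"A": {"base": 1000, "width": 4, "bounds": [(1, 10), (1, 20)]}})
tr.assign_from_array("x", "A", ["y", "z"])
print("\n".join(tr.code))
```

The emitted code is t1 := y * 20, t1 := t1 + z, t2 := 916, t3 := t1 * 4, t4 := t2[t3] and x := t4.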
Boolean expressions
• Boolean expressions are normally used for two purposes:
1. To compute logical values.
2. As conditional expressions in statements such as if-then-else or while-do.
• Consider the Boolean expressions generated by the following grammar:
E → E OR E
E → E AND E
E → NOT E
E → (E)
E → id relop id
E → TRUE
E → FALSE
• relop denotes a relational operator such as <=, >=, < or >. OR and AND are left-associative.
• NOT has the highest precedence, then AND, and lastly OR.
Production           Semantic actions
E → E1 OR E2         { E.place := newtemp();
                       emit(E.place ':=' E1.place 'OR' E2.place) }
E → E1 AND E2        { E.place := newtemp();
                       emit(E.place ':=' E1.place 'AND' E2.place) }
E → NOT E1           { E.place := newtemp();
                       emit(E.place ':=' 'NOT' E1.place) }
E → (E1)             { E.place := E1.place }
E → id1 relop id2    { E.place := newtemp();
                       emit('if' id1.place relop.op id2.place 'goto' next_state + 3);
                       emit(E.place ':=' '0');
                       emit('goto' next_state + 2);
                       emit(E.place ':=' '1') }
E → TRUE             { E.place := newtemp();
                       emit(E.place ':=' '1') }
E → FALSE            { E.place := newtemp();
                       emit(E.place ':=' '0') }
Table 5.9 Syntax directed translation scheme to generate three address code for Boolean expression
• The function emit generates the three address code and newtemp() generates temporary variables.
• The semantic action for the rule E → id1 relop id2 uses next_state, which gives the index of the next three address statement in the output sequence.
• Let us take an example and generate the three address code using the above translation scheme. Since AND has higher precedence than OR, the reduction by E → E1 AND E2 (statement 108) happens before u > v is translated:
p > q AND r < s OR u > v
100: if p > q goto 103
101: t1 := 0
102: goto 104
103: t1 := 1
104: if r < s goto 107
105: t2 := 0
106: goto 108
107: t2 := 1
108: t3 := t1 AND t2
109: if u > v goto 112
110: t4 := 0
111: goto 113
112: t4 := 1
113: t5 := t3 OR t4
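Translating the expression tree children-first reproduces the order in which a bottom-up parser performs the reductions, and computing next_state as "start index plus number of statements emitted so far" gives the jump targets of Table 5.9. The Python sketch below assumes the tree has already been built as nested tuples respecting the precedence of NOT, AND and OR; BoolTranslator is a hypothetical class name.

```python
class BoolTranslator:
    """Hypothetical sketch: numerical translation of Boolean expressions (Table 5.9)."""

    def __init__(self, start=100):
        self.start = start
        self.code = []        # list of (statement index, text)
        self.ntemps = 0

    @property
    def next_state(self):     # index of the next statement to be emitted
        return self.start + len(self.code)

    def newtemp(self):
        self.ntemps += 1
        return f"t{self.ntemps}"

    def emit(self, text):
        self.code.append((self.next_state, text))

    def expr(self, e):
        """Translate e bottom-up (children first) and return E.place."""
        kind = e[0]
        if kind in ("AND", "OR"):
            p1, p2 = self.expr(e[1]), self.expr(e[2])
            place = self.newtemp()
            self.emit(f"{place} := {p1} {kind} {p2}")
        elif kind == "NOT":
            p1 = self.expr(e[1])
            place = self.newtemp()
            self.emit(f"{place} := NOT {p1}")
        elif kind == "relop":
            _, op, a, b = e
            place = self.newtemp()
            self.emit(f"if {a} {op} {b} goto {self.next_state + 3}")
            self.emit(f"{place} := 0")
            self.emit(f"goto {self.next_state + 2}")
            self.emit(f"{place} := 1")
        elif kind in ("TRUE", "FALSE"):
            place = self.newtemp()
            self.emit(f"{place} := {1 if kind == 'TRUE' else 0}")
        else:
            raise ValueError(f"unknown node {kind}")
        return place


# p > q AND r < s OR u > v, with AND binding tighter than OR
tree = ("OR", ("AND", ("relop", ">", "p", "q"), ("relop", "<", "r", "s")),
        ("relop", ">", "u", "v"))
tr = BoolTranslator()
tr.expr(tree)
for index, text in tr.code:
    print(f"{index}: {text}")
```

The printed listing matches statements 100 to 113 above.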
Control statement
• The control statements considered here are if-then, if-then-else and while-do.
• The grammar and translation scheme for such statements are given in Table 5.10.
• S → if E then S1 | if E then S1 else S2 | while E do S1
Production                    Semantic rules
S → if E then S1              { E.true := new_label();
                                E.false := S.next;
                                S1.next := S.next;
                                S.code := E.code || gen_code(E.true ':') || S1.code }
S → if E then S1 else S2      { E.true := new_label();
                                E.false := new_label();
                                S1.next := S.next;
                                S2.next := S.next;
                                S.code := E.code || gen_code(E.true ':') || S1.code
                                          || gen_code('goto', S.next)
                                          || gen_code(E.false ':') || S2.code }
S → while E do S1             { S.begin := new_label();
                                E.true := new_label();
                                E.false := S.next;
                                S1.next := S.begin;
                                S.code := gen_code(S.begin ':') || E.code
                                          || gen_code(E.true ':') || S1.code
                                          || gen_code('goto', S.begin) }
Table 5.10 Syntax directed translation scheme to generate three address code for Control statement
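• Consider the statement: if a<b then a=a+5 else a=a+7
• Using the semantic rules above (with L1 = E.true, L2 = E.false and Lnext = S.next), the three address code for this statement is,
100: if a<b goto L1
101: goto L2
102: L1: a := a+5
103: goto Lnext
104: L2: a := a+7
105: Lnext: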
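The rules of Table 5.10 amount to concatenating code fragments around freshly created labels. The Python sketch below assumes that the condition E is a single relation translated into jumping code ("if rel goto E.true" followed by "goto E.false") and that S1 and S2 are already translated straight-line code; the class ControlTranslator and its method names are hypothetical.

```python
import itertools


class ControlTranslator:
    """Hypothetical sketch of Table 5.10; code fragments are lists of strings."""

    def __init__(self):
        self._counter = itertools.count(1)

    def new_label(self):
        return f"L{next(self._counter)}"

    @staticmethod
    def cond(rel, true_label, false_label):
        # Assumption: E is one relation, translated into jumping code.
        return [f"if {rel} goto {true_label}", f"goto {false_label}"]

    def if_then(self, rel, s1, s_next):
        e_true = self.new_label()                 # E.false = S1.next = S.next
        return self.cond(rel, e_true, s_next) + [f"{e_true}:"] + s1

    def if_then_else(self, rel, s1, s2, s_next):
        e_true, e_false = self.new_label(), self.new_label()
        return (self.cond(rel, e_true, e_false) + [f"{e_true}:"] + s1
                + [f"goto {s_next}", f"{e_false}:"] + s2)

    def while_do(self, rel, s1, s_next):
        begin, e_true = self.new_label(), self.new_label()
        return ([f"{begin}:"] + self.cond(rel, e_true, s_next)
                + [f"{e_true}:"] + s1 + [f"goto {begin}"])


code = ControlTranslator().if_then_else("a<b", ["a := a+5"], ["a := a+7"], "Lnext")
print("\n".join(code + ["Lnext:"]))
```

The output has the same shape as the listing above: the then-part ends with goto Lnext so that the else-part is skipped.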
Switch case
• Consider the following switch statement:
switch E
begin
case V1: S1
case V2: S2
….
case Vn-1: Sn-1
default: Sn
end
• The syntax directed translation of this case statement into intermediate code is given in Table 5.11.
      Code to evaluate E into t
      goto test
L1:   Code for S1
      goto next
L2:   Code for S2
      goto next
      …
Ln-1: Code for Sn-1
      goto next
Ln:   Code for Sn
      goto next
test: if t = V1 goto L1
      if t = V2 goto L2
      …
      if t = Vn-1 goto Ln-1
      goto Ln
next:
Table 5.11 Syntax directed translation scheme to generate three address code for switch case
• When we see the keyword switch, we generate two new labels, test and next, and a new temporary t.
• After processing E, we generate the jump goto test.
• We process each statement case Vi : Si by emitting the newly created label Li, followed by the code for Si, followed by the jump goto next. A pointer to Li and the value Vi are pushed onto a case stack.
• When the keyword end terminating the body of the switch is found, we are ready to generate the code for the n-way branch.
• Reading the pointer-value pairs on the case stack from bottom to top, we can generate a sequence of three address statements of the form,
case V1 L1
case V2 L2
……
case Vn-1 Ln-1
case t Ln
label next

• where t is the name holding the value of the selector expression E, and Ln is the label for the default statement.
• The three address statement case Vi Li is a synonym for if t = Vi goto Li.
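The layout of Table 5.11 can be produced in one pass: each case body is emitted as soon as it is seen, while its (value, label) pair is saved on a case stack that is read bottom to top once end is reached. The Python sketch below assumes the selector code and the case bodies are already translated; translate_switch and its parameters are hypothetical, and it writes each case Vi Li statement in its if t = Vi goto Li form.

```python
def translate_switch(selector_code, t, cases, default_code):
    """cases: list of (value, body_lines); returns TAC laid out as in Table 5.11."""
    code = list(selector_code)        # code to evaluate E into t
    code.append("goto test")
    case_stack = []                   # (value, label) pairs, pushed in order
    for i, (value, body) in enumerate(cases, start=1):
        label = f"L{i}"
        case_stack.append((value, label))
        code.append(f"{label}:")
        code.extend(body)
        code.append("goto next")
    default_label = f"L{len(cases) + 1}"  # Ln, the default label
    code.append(f"{default_label}:")
    code.extend(default_code)
    code.append("goto next")
    code.append("test:")
    for value, label in case_stack:   # read bottom to top
        code.append(f"if {t} = {value} goto {label}")
    code.append(f"goto {default_label}")
    code.append("next:")
    return code


code = translate_switch(["t := x"], "t",
                        [("1", ["y := 10"]), ("2", ["y := 20"])], ["y := 0"])
print("\n".join(code))
```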
Procedure call
• A procedure or function is an important programming construct used to obtain modularity in the user program.
• Consider a grammar for a simple procedure call,
S → call id (L)
L → L, E
L → E
• Here S denotes the statement, L denotes the list of parameters and E denotes an expression.
• The translation scheme can be as given below,
Production rule     Semantic Action
S → call id (L)     { for each item p in queue do
                         append('param' p);
                      append('call' id.place) }
L → L, E            { insert E.place in the queue }
L → E               { initialize the queue and insert E.place in the queue }
Table 5.12 Syntax directed translation scheme to generate three address code for procedure call
• The queue data structure holds the parameters of the procedure.
• The keyword param marks each parameter passed to the procedure.
• The call to the procedure is given by 'call id', where id denotes the name of the procedure.
• E.place gives the name holding the value of the parameter, which is inserted in the queue.
• For L → E, the queue is initialized to empty and then receives a single pointer to the symbol table entry that denotes the value of E.
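Because the queue defers the param statements, the code that evaluates every argument is emitted before the first param, and the param statements stay contiguous just before the call. The Python sketch below assumes each argument arrives already translated as a pair (code that computes it, E.place); translate_call is a hypothetical helper name.

```python
from collections import deque


def translate_call(name, args):
    """args: list of (code_lines, place) pairs, one per actual parameter E."""
    code = []
    queue = deque()
    for arg_code, place in args:      # L -> E and L -> L , E
        code.extend(arg_code)         # code that computes E.place
        queue.append(place)           # insert E.place in the queue
    for p in queue:                   # S -> call id (L)
        code.append(f"param {p}")
    code.append(f"call {name}")
    return code


# call p(a+b, c): the first argument needs a temporary, the second does not
print("\n".join(translate_call("p", [(["t1 := a+b"], "t1"), ([], "c")])))
```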
