UNIT IV
Run Time Storage Allocation and Code Generation
Source Language Issues - Storage Organization- Storage Allocation-Parameter Passing,
Symbol Tables - Dynamic Storage Allocation- Code Generation -Issues in Design of a Code
Generator
Introduction
Run Time Environment
• Run Time Environment establishes relationships between names and data objects.
• The allocation and de-allocation of data objects are managed by the Run Time
Environment
• Each execution of a procedure is referred to as an activation of the procedure.
• If the procedure is recursive, several of its activations may be alive at the same time. Each
call of a procedure leads to an activation that may manipulate data objects allocated for
its use.
• The representation of a data object at run time is determined by its type.
• Elementary data types, such as characters, integers, and reals can be represented by
equivalent data objects in the target machine.
• However, aggregates, such as arrays, strings, and structures, are usually represented by
collections of primitive objects.
Source Language Issues
• Procedure
• Activation Trees
• Control Stack
• The Scope of a Declaration
• Bindings of Names
Procedure
• A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name and the statement is the procedure body.
• A procedure that returns a value is called a function.
• A complete program will also be treated as a procedure.
• When a procedure name appears within an executable statement, we say that the
procedure is called at that point.
• The basic idea is that a procedure call executes the procedure body.
• Some of the identifiers appearing in a procedure definition are special, and are called
formal parameters of the procedure.
• Actual parameters may be passed to a called procedure.
• Procedures can contain local variables and can refer to global variables.
Activation Trees
• Each execution of a procedure is referred to as an activation of the procedure.
• The lifetime of an activation is the sequence of steps between the first and last steps in the
execution of the procedure body, including time spent in procedures that it calls.
• If a and b are procedure activations, then their lifetimes are either non-overlapping (one
ends before the other begins) or nested (one begins and ends within the other).
• A procedure is recursive if a new activation begins before an earlier activation of the
same procedure has ended.
• An activation tree shows the way control enters and leaves activations.
Rules to Construct an Activation Tree
• Each node represents an activation of a procedure.
• The root node represents the activation of the main program.
• The node for a is the parent of the node for b if and only if control flows from activation a
to b.
• The node for a is to the left of the node for b if and only if the lifetime of a occurs before
the lifetime of b.
Sample Code for Quicksort
main()
{
    int n;
    readarray();
    quicksort(1, n);
}
quicksort(int m, int n)
{
    /* Recurse only while the range holds at least two elements. */
    if (m < n) {
        int i = partition(m, n);
        quicksort(m, i - 1);
        quicksort(i + 1, n);
    }
}
Control Stack
• We can use a stack, called a control stack, to keep track of live procedure activations.
• The idea is to push the node for activation onto the control stack as the activation begins
and to pop the node when the activation ends.
• Then the contents of the control stack are related to paths to the root of the activation tree.
• When node n is at the top of the control stack, the stack contains the nodes along the path
from n to the root.
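The push and pop discipline described above can be sketched in Python. This is only an illustration: the helper names enter and leave, the snapshots list, and the Lomuto-style partition are assumptions made for the sketch, not part of the notes. Each snapshot records the control stack at the moment an activation begins, so it lists the path from the root of the activation tree to the new node.

```python
# Trace quicksort activations with an explicit control stack (illustrative sketch).
control_stack = []   # labels of live activations, root at index 0
snapshots = []       # copy of the control stack taken at each activation entry

def enter(label):
    # Push the node for an activation as the activation begins.
    control_stack.append(label)
    snapshots.append(list(control_stack))

def leave():
    # Pop the node when the activation ends.
    control_stack.pop()

def partition(a, m, n):
    # Lomuto partition around a[n]; an assumed choice, the notes do not fix one.
    enter(f"p({m},{n})")
    pivot, i = a[n], m
    for j in range(m, n):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[n] = a[n], a[i]
    leave()
    return i

def quicksort(a, m, n):
    enter(f"q({m},{n})")
    if m < n:
        i = partition(a, m, n)
        quicksort(a, m, i - 1)
        quicksort(a, i + 1, n)
    leave()

def main(data):
    enter("main")
    quicksort(data, 0, len(data) - 1)
    leave()
    return data

result = main([3, 1, 2])
```

Reading the snapshots in order gives a preorder walk of the activation tree, and the stack is empty again once main returns.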
Example
• Consider the activation tree at the point when control enters the activation represented
by q(2, 3). Activations with labels r, p(1, 9), p(1, 3), and q(1, 0) have executed to
completion, so the figure contains dashed lines to their nodes. The solid lines mark the
path from q(2, 3) to the root.
Scope of Declaration
• A declaration in a language is a syntactic construct that associates information with a
name. Declarations may be explicit, as in the Pascal fragment var i : integer; or they may
be implicit.
• For example, any variable name starting with I is assumed to denote an integer in a
Fortran program, unless otherwise declared.
• The scope rules of a language determine which declaration of a name applies when the
name appears in the text of a program.
• The portion of the program to which a declaration applies is called the scope of that
declaration. An occurrence of a name in a procedure is said to be local to the procedure if
it is in the scope of a declaration within the procedure; otherwise, the occurrence is said
to be nonlocal.
• At compile time, the symbol table can be used to find the declaration that applies to an
occurrence of a name.
• Special keywords such as static, global, volatile, and final are also used when declaring
variables.
Binding of Names
• Even if each name is declared once in a program, the same name may denote different data
objects at run time. The informal term "data object" corresponds to a storage location that
can hold values.
• In programming language semantics, the term environment refers to a function that maps
a name to a storage location, and the term state refers to a function that maps a storage
location to the value held there.
• Environments and states are different; an assignment changes the state, but not the
environment. For example, suppose that storage address 100, associated with variable pi,
holds 0. After the assignment pi := 3.14, the same storage address is associated with pi,
but the value held there is 3.14.
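As a minimal sketch, the two mappings can be modelled with Python dictionaries. The names environment, state, assign, and lookup are chosen for this illustration only.

```python
# Environment: name -> storage location. State: storage location -> value.
environment = {"pi": 100}
state = {100: 0}

def assign(name, value):
    # An assignment changes the state only; the binding in the environment is untouched.
    state[environment[name]] = value

def lookup(name):
    # Two-step mapping: name -> location -> value.
    return state[environment[name]]

environment_before = dict(environment)
assign("pi", 3.14)
```

After the assignment, pi is still bound to address 100, and only the value stored at that address has changed.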
Activation Records
• Procedure calls and returns are usually managed by a run-time stack called the control
stack. Each live activation has an activation record (sometimes called a frame) on the
control stack. The contents of activation records vary with the language being
implemented.
• An activation record typically contains the following:
o Temporary values, such as those arising from the evaluation of expressions, in
cases where those temporaries cannot be held in registers.
o Local data belonging to the procedure whose activation record this is.
o A saved machine status, with information about the state of the machine just
before the call to the procedure. This information typically includes the return
address and the contents of registers that were used by the calling procedure and
that must be restored when the return occurs.
o An "access link" may be needed to locate data needed by the called procedure but
found elsewhere, e.g., in another activation record.
o A control link, pointing to the activation record of the caller.
o Space for the return value of the called function, if any. Again, not all called
procedures return a value, and if one does, we may prefer to place that value in a
register for efficiency.
o The actual parameters used by the calling procedure. Commonly, these values are
not placed in the activation record but rather in registers.
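The fields listed above can be pictured as a record type. The following Python dataclass is a sketch only: the field names and the call_chain helper are assumptions for illustration, and a real compiler lays these fields out as raw storage at fixed offsets rather than as objects.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    procedure: str
    actual_params: dict = field(default_factory=dict)
    return_value: Any = None
    control_link: Optional["ActivationRecord"] = None  # record of the caller
    access_link: Optional["ActivationRecord"] = None   # record holding nonlocal data
    saved_status: dict = field(default_factory=dict)   # e.g. return address, registers
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

def call_chain(record):
    """Follow control links from a record back to the root activation."""
    names = []
    while record is not None:
        names.append(record.procedure)
        record = record.control_link
    return names

main_ar = ActivationRecord("main")
q_ar = ActivationRecord("quicksort", actual_params={"m": 1, "n": 9},
                        control_link=main_ar,
                        saved_status={"return_address": 0x40})
```

Following the control links from the newest record reproduces the contents of the control stack from top to bottom.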
Storage Organization
• The executing target program runs in its own logical address space in which each program
value has a location. The management and organization of this logical address space is
shared between the compiler, operating system, and target machine. The operating
system maps the logical addresses into physical addresses, which are usually spread
throughout memory.
• The run-time representation of an object program in the logical address space consists of
data and program areas.
• The run time storage is subdivided to hold code and data as follows:
o The generated target code
o Data objects
o Control stack (which keeps track of information about procedure activations)
• The size of the generated target code is fixed at compile time, so the compiler can place
the executable target code in a statically determined area Code, usually in the low end of
memory.
• The size of some program data objects, such as global constants, and data generated by
the compiler, such as information to support garbage collection, may be known at compile
time, and these data objects can be placed in another statically determined area called
Static.
• One reason for statically allocating as many data objects as possible is that the addresses
of these objects can be compiled into the target code.
• In early versions of Fortran, all data objects could be allocated statically.
Storage Allocation
• A different storage-allocation strategy is used in each of the three data areas in the
organization:
o Static allocation lays out storage for all data objects at compile time.
o Stack allocation manages the run-time storage as a stack.
o Heap allocation allocates and de-allocates storage as needed at run time from a
data area known as a heap.
Static Allocation
• In static allocation, names are bound to storage as the program is compiled, so there is no
need for a run-time support package.
• Since the bindings do not change at run time, every time a procedure is activated, its
names are bound to the same storage locations.
• The above property allows the values of local names to be retained across activations of a
procedure. That is, when control returns to a procedure, the values of the locals are the
same as they were when control left the last time.
• From the type of a name, the compiler determines the amount of storage to set aside for
that name.
• The address of this storage consists of an offset from an end of the activation record for
the procedure.
• The compiler must eventually decide where the activation records go, relative to
the target code and to one another.
Limitation of Static Allocation
• The size of a data object and constraints on its position in memory must be known at
compile time.
• Recursive procedures are restricted, because all activations of a procedure use the same
bindings for local names.
• Dynamic allocation is not allowed, so data structures cannot be created dynamically.
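The restriction on recursion can be demonstrated with a small Python sketch. The dictionary static_storage stands in for a statically allocated slot, and both function names are hypothetical. Because every activation of fact_static shares one slot for its local n, an inner call overwrites the value that the outer call still needs.

```python
# One fixed slot per local name, shared by every activation (static allocation).
static_storage = {"fact.n": 0}

def fact_static(n):
    static_storage["fact.n"] = n                       # the local n lives in the single slot
    if static_storage["fact.n"] <= 1:
        return 1
    sub = fact_static(static_storage["fact.n"] - 1)    # the inner activation overwrites the slot
    return static_storage["fact.n"] * sub              # reads the clobbered value, not this call's n

def fact_stack(n):
    # With stack allocation each activation gets fresh storage for n.
    return 1 if n <= 1 else n * fact_stack(n - 1)

wrong = fact_static(4)
right = fact_stack(4)
```

Here fact_static(4) yields 1 instead of 24, because by the time each multiplication runs the shared slot already holds the innermost value of n.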
Stack Allocation
• Stack allocation is based on the idea of a control stack.
• A stack is a Last In First Out (LIFO) storage device where new storage is allocated and
deallocated at only one "end", called the top of the stack.
• Storage is organized as a stack, and activation records are pushed and popped as
activations begin and end, respectively.
• Storage for the locals in each call of a procedure is contained in the activation record for
that call. Thus locals are bound to fresh storage in each activation, because a new
activation record is pushed onto the stack when a call is made.
• Furthermore, the values of locals are deleted when the activation ends; that is, the values
are lost because the storage for locals disappears when the activation record is popped.
• At run time, an activation record can be allocated and de-allocated by incrementing and
decrementing the top of the stack, respectively.
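A minimal sketch of this bookkeeping, assuming a stack that grows toward higher indices and record sizes that the caller supplies (the class RuntimeStack and its method names are made up for this example):

```python
class RuntimeStack:
    def __init__(self, size):
        self.memory = [None] * size
        self.top = 0                      # index of the first free cell

    def push_record(self, n_cells):
        """Allocate an activation record by incrementing top; return its base."""
        if self.top + n_cells > len(self.memory):
            raise MemoryError("stack overflow")
        base = self.top
        self.top += n_cells
        return base

    def pop_record(self, n_cells):
        """Deallocate the record on top of the stack by decrementing top."""
        self.top -= n_cells

stack = RuntimeStack(16)
base_main = stack.push_record(3)   # record for main
base_q1 = stack.push_record(4)     # first call made by main
stack.pop_record(4)                # that call returns
base_q2 = stack.push_record(4)     # the next call reuses the same cells
```

The second call receives the same base address as the first, which is why the values of locals do not survive between activations.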
Heap Allocation
• Heap allocation is needed when the deallocation of activation records need not occur in a
last-in first-out fashion, so that storage cannot be organized as a stack.
• Heap allocation parcels out pieces of contiguous storage, as needed for activation
records or other objects. Pieces may be deallocated in any order, so over time the heap
will consist of alternate areas that are free and in use.
• The heap is an alternative to the stack.
Parameter Passing
• The communication medium among procedures is known as parameter passing. The
values of the variables from a calling procedure are transferred to the called procedure
by some mechanism.
R-value
• The value of an expression is called its r-value. The value contained in a single variable
also becomes an r-value if it appears on the right side of the assignment operator.
• An r-value can always be assigned to some other variable.
L-value
• The memory location (address) where the value of an expression is stored is known as the
l-value of that expression.
• An expression that appears on the left side of the assignment operator must have an l-value.
Different ways of passing the parameters to the procedure
• Call by Value
• Call by reference
• Copy restore
• Call by name
Call by Value
• In call by value, the calling procedure passes the r-values of the actual parameters and the
compiler puts them into the called procedure's activation record.
• Formal parameters hold the values passed by the calling procedure, so any changes
made to the formal parameters do not affect the actual parameters.
Call by Reference
• In call by reference, the formal and actual parameters refer to the same memory location.
• The l-value of each actual parameter is copied to the activation record of the called function.
Thus the called function has the addresses of the actual parameters.
• If an actual parameter does not have an l-value (e.g., i+3), then it is evaluated in a
new temporary location and the address of that location is passed.
• Any changes made to a formal parameter are reflected in the actual parameter (because
the changes are made at its address).
Call by Copy Restore
• In call by copy restore, the compiler copies the values of the actual parameters into the
formal parameters when the procedure is called and copies them back into the actual
parameters when control returns to the calling function.
• The r-values of the actuals are passed, and on return the r-values of the formals are copied
into the l-values of the actuals.
Call by Name
• In call by name the actual parameters are substituted for formals in all the places formals
occur in the procedure.
• It is closely related to lazy evaluation, because an actual parameter is evaluated only when
the corresponding formal is used.
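The four mechanisms can be compared with a small simulation. This is a sketch under stated assumptions: store models the caller's variables, frame models the callee's activation record, and the procedure body (x := x + 10, followed by a read of the caller's a) is invented for the comparison. Python itself does not offer these modes; the getter and setter functions only imitate them.

```python
def run(mode):
    store = {"a": 1}          # the caller's variable a
    seen = {}

    def body(get_x, set_x):
        # Procedure body: x := x + 10, then look at the caller's variable a.
        set_x(get_x() + 10)
        seen["a_inside"] = store["a"]

    if mode == "value":
        frame = {"x": store["a"]}                 # r-value copied into the callee's record
        body(lambda: frame["x"], lambda v: frame.update(x=v))
    elif mode == "reference":
        # The formal is an alias for the location of the actual.
        body(lambda: store["a"], lambda v: store.update(a=v))
    elif mode == "copy_restore":
        frame = {"x": store["a"]}                 # copy in
        body(lambda: frame["x"], lambda v: frame.update(x=v))
        store["a"] = frame["x"]                   # copy back on return
    else:
        raise ValueError(mode)
    return store["a"], seen["a_inside"]

# Call by name: the actual is passed unevaluated and re-evaluated at every use.
evaluations = []

def actual_expression():
    evaluations.append("evaluated")
    return 2 + 3

def twice(x_thunk):
    return x_thunk() + x_thunk()

by_name = twice(actual_expression)
```

Each call to run returns the caller's a after the call, followed by the value of a that the procedure body observed: (1, 1) for value, (11, 11) for reference, and (11, 1) for copy restore, which differs from reference only while the procedure is still running. In the call-by-name part, the actual expression is evaluated twice because the formal is used twice.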
Symbol Table
• Symbol tables are data structures that are used by compilers to hold information about
source-program constructs. The information is collected incrementally by the
analysis phases of a compiler and used by the synthesis phases to generate the target
code.
• Entries in the symbol table contain information about an identifier such as its character
string (or lexeme), its type, its position in storage, and any other relevant information.
• The symbol table, which stores information about the entire source program, is used by
all phases of the compiler.
• An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name.
• These attributes may provide information about the storage allocated for a name, its type,
its scope.
• A symbol table can be implemented in one of the following ways:
o Linear (sorted or unsorted) list
o Binary Search Tree
o Hash table
• Of these, hash tables are the most common implementation: the source-code symbol itself
is used as the key for the hash function, and the value returned is the information about
the symbol.
• A symbol table may serve the following purposes depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify if a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).
Symbol-Table Entries
• A compiler uses a symbol table to keep track of scope and binding information about
names. The symbol table is searched every time a name is encountered in the source text.
• Changes to the table occur if a new name or new information about an existing name is
discovered. A linear list is the simplest to implement, but its performance is poor. Hashing
schemes provide better performance.
• The symbol table should be able to grow dynamically at compile time, rather than having
its size fixed in advance.
• Each entry in the symbol table is for the declaration of a name.
• The format of entries need not be uniform.
• The following information about identifiers is stored in the symbol table:
o The name.
o The data type.
o The block level.
o Its scope (local, global).
o Pointer / address
o Its offset from base pointer
o For functions, the function name, its parameters, and its variables.
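A hash-table symbol table with block structure might look like the following sketch. Python dictionaries serve as the hash tables; the class name, method names, and entry fields are assumptions chosen to mirror the list above.

```python
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]               # one hash table (dict) per open block

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        self.scopes.pop()

    def declare(self, name, data_type, offset):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} already declared in this block")
        scope[name] = {"name": name, "type": data_type,
                       "level": len(self.scopes) - 1, "offset": offset}

    def lookup(self, name):
        # Search the innermost block first so the closest declaration wins.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                      # undeclared name

table = SymbolTable()
table.declare("i", "integer", 0)
table.enter_block()
table.declare("i", "real", 0)
inner = table.lookup("i")
table.exit_block()
outer = table.lookup("i")
```

Inside the nested block the name i resolves to the real declaration, and after the block is closed it resolves to the integer declaration again, which is the scope resolution described earlier.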
Dynamic Storage Allocation
• The techniques needed to implement dynamic storage allocation depend mainly on
how the storage is deallocated. If deallocation is implicit, then the run-time support package
is responsible for determining when a storage block is no longer needed. There is less
for a compiler to do if deallocation is done explicitly by the programmer.
Explicit Allocation of Fixed-Sized Blocks
• The simplest form of dynamic allocation involves blocks of a fixed size.
• Allocation and deallocation can be done quickly with little or no storage overhead.
• Suppose that blocks are to be drawn from a contiguous area of storage. Initialization of
the area is done by using a portion of each block for a link to the next block.
• A pointer, available, points to the first block. Allocation consists of taking a block off the list
and deallocation consists of putting the block back on the list.
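The free list can be sketched as follows, with blocks identified by index and a side array standing in for the link stored inside each free block. The class name FixedBlockPool is hypothetical, and the sketch assumes at least one block.

```python
class FixedBlockPool:
    def __init__(self, n_blocks):
        # next_free[i] plays the role of the link stored inside free block i.
        self.next_free = [i + 1 for i in range(n_blocks)]
        self.next_free[-1] = None        # the last block ends the list
        self.available = 0               # pointer to the first free block

    def allocate(self):
        """Take the first block off the free list."""
        if self.available is None:
            raise MemoryError("no free blocks")
        block = self.available
        self.available = self.next_free[block]
        return block

    def free(self, block):
        """Put a block back at the head of the free list."""
        self.next_free[block] = self.available
        self.available = block

pool = FixedBlockPool(3)
a = pool.allocate()
b = pool.allocate()
pool.free(a)
c = pool.allocate()    # gets the block that was just freed
d = pool.allocate()
```

Both operations touch only the head of the list, so each takes constant time regardless of how many blocks exist.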
Explicit Allocation of Variable-Sized Blocks
• When blocks are allocated and deallocated, storage can become fragmented; that is, the
heap may consist of alternate blocks that are free and in use.
• This situation can occur if a program allocates five blocks and then deallocates the second
and fourth.
• Fragmentation is of no consequence if blocks are of fixed size, but if they are of variable
size it is a problem, because we cannot allocate a block larger than any one of the free
blocks, even though the total free space is sufficient.
• First fit, worst fit and best fit are some methods for allocating variable-sized blocks.
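The three placement policies can be sketched as functions that choose a block from a free list. The representation of the free list as (start, length) pairs and the function names are assumptions for this example; splitting and coalescing of blocks are left out.

```python
def first_fit(free_blocks, size):
    """Index of the first free block that is large enough, else None."""
    for k, (_, length) in enumerate(free_blocks):
        if length >= size:
            return k
    return None

def best_fit(free_blocks, size):
    """Index of the smallest free block that is large enough, else None."""
    fits = [(length, k) for k, (_, length) in enumerate(free_blocks) if length >= size]
    return min(fits)[1] if fits else None

def worst_fit(free_blocks, size):
    """Index of the largest free block that is large enough, else None."""
    fits = [(length, k) for k, (_, length) in enumerate(free_blocks) if length >= size]
    return max(fits)[1] if fits else None

# Free list as (start, length) pairs.
free_list = [(0, 20), (40, 12), (60, 30)]

# Five 10-cell blocks were allocated, then the second and fourth were freed.
fragmented = [(10, 10), (30, 10)]
total_free = sum(length for _, length in fragmented)
```

For a request of 10 cells on free_list, the three policies pick blocks 0, 1, and 2 respectively. On the fragmented list a request of 15 cells fails under every policy even though 20 cells are free in total.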
Implicit Deallocation
• Implicit deallocation requires cooperation between the user program and the run-
time package, because the latter needs to know when a storage block is no longer in use.
• This cooperation is implemented by fixing the format of storage blocks.
• Reference counts:
o We keep track of the number of blocks that point directly to the present block. If
this count ever drops to 0, then the block can be deallocated because it cannot be
referred to.
o In other words, the block has become garbage that can be collected. Maintaining
reference counts can be costly in time.
• Marking techniques:
o An alternative approach is to suspend temporarily execution of the user program
and use the frozen pointers to determine which blocks are in use.
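Both techniques can be sketched over a toy heap in which points_to maps each block to the blocks it references, and roots lists the pointers held directly by the program. All names here are illustrative, and unlike the description above the count also includes pointers from the program itself, so that a block in direct use is not mistaken for garbage.

```python
def reference_counts(roots, points_to):
    """Count the pointers (from roots and from other blocks) to each block."""
    counts = {block: 0 for block in points_to}
    targets = list(roots) + [t for refs in points_to.values() for t in refs]
    for target in targets:
        counts[target] += 1
    return counts

def mark(roots, points_to):
    """Follow the frozen pointers from the roots; return the blocks in use."""
    marked, pending = set(), list(roots)
    while pending:
        block = pending.pop()
        if block not in marked:
            marked.add(block)
            pending.extend(points_to[block])
    return marked

def sweep(points_to, marked):
    """Blocks that were not marked can be deallocated."""
    return sorted(block for block in points_to if block not in marked)

points_to = {"A": ["B"], "B": [], "C": ["B"], "D": []}
roots = ["A"]

counts = reference_counts(roots, points_to)
in_use = mark(roots, points_to)
garbage = sweep(points_to, in_use)
```

In this heap, blocks C and D have a count of zero and are also left unmarked, so both techniques identify them as garbage.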
Code Generation
A code generator produces target code for three-address statements. It uses registers
to hold the operands of the three-address statements.
Example:
Consider the three-address statement x := y + z. It can be translated into the following code
sequence:
MOV y, R0
ADD z, R0
MOV R0, x
Register and Address Descriptors:
o A register descriptor keeps track of what is currently in each register. Initially, the register
descriptors show that all the registers are empty.
o An address descriptor keeps track of the location (a register, a memory address, or both)
where the current value of a name can be found at run time.
A code-generation algorithm:
The algorithm takes a sequence of three-address statements as input. For each three-address
statement of the form x := y op z, it performs the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation
y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the value
of y is currently both in memory and in a register, prefer the register for y'. If the value of y
is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is the current location of z. If z is in both a
register and memory, prefer the register. Update the address descriptor of x to
indicate that x is in location L. If L is a register, update its descriptor to indicate that it
contains x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are
in registers, alter the register descriptors to indicate that, after execution of x := y op z,
those registers will no longer contain y or z.
Generating Code for Assignment Statements:
The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following sequence
of three-address code:
1. t := a - b
2. u := a - c
3. v := t + u
4. d := v + u
Code sequence for the example is as follows:
Statement     Code Generated    Register descriptor    Address descriptor
                                Registers empty
t := a - b    MOV a, R0         R0 contains t          t in R0
              SUB b, R0
u := a - c    MOV a, R1         R0 contains t          t in R0
              SUB c, R1         R1 contains u          u in R1
v := t + u    ADD R1, R0        R0 contains v          u in R1
                                R1 contains u          v in R0
d := v + u    ADD R1, R0        R0 contains d          d in R0
              MOV R0, d                                d in R0 and memory
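The code column of the table above can be reproduced by a small Python sketch of the algorithm. Several simplifications are assumptions of this sketch and not part of the notes: statements are given as tuples (x, y, op, z), the next-use test is a conservative scan of the rest of the block, getreg never spills (it raises an error when no register is free), and values that are live on exit are stored back to memory at the end of the block.

```python
def generate(block, live_on_exit, n_regs=2):
    """block: list of (x, y, op, z) meaning x := y op z. Returns the instructions."""
    code = []
    reg_holds = {f"R{k}": set() for k in range(n_regs)}   # register descriptor
    where = {}                                             # address descriptor

    def locations(name):
        # Assumption: a name not yet computed in this block lives in its own memory cell.
        return where.setdefault(name, {name})

    def best_location(name):
        regs = sorted(loc for loc in locations(name) if loc in reg_holds)
        return regs[0] if regs else name          # prefer a register to memory

    def needed_later(name, i):
        # Conservative next-use test: any later operand use, or live on exit.
        return name in live_on_exit or any(name in (y, z) for _, y, _, z in block[i + 1:])

    def getreg(x, y, i):
        loc = best_location(y)
        if loc in reg_holds and reg_holds[loc] == {y} and (x == y or not needed_later(y, i)):
            return loc                            # reuse the register that holds y
        for reg, names in reg_holds.items():
            if not names:
                return reg                        # otherwise take an empty register
        raise RuntimeError("no free register (spilling is not modelled here)")

    for i, (x, y, op, z) in enumerate(block):
        L = getreg(x, y, i)
        y_loc, z_loc = best_location(y), best_location(z)
        if y_loc != L:
            code.append(f"MOV {y_loc}, {L}")
        code.append(f"{op} {z_loc}, {L}")
        for name in reg_holds[L]:                 # the old contents of L are overwritten
            locations(name).discard(L)
        for names in reg_holds.values():
            names.discard(x)                      # x lives only in L from now on
        reg_holds[L] = {x}
        where[x] = {L}
        for name in (y, z):                       # free registers holding dead operands
            if name != x and not needed_later(name, i):
                for reg, names in reg_holds.items():
                    if name in names:
                        names.discard(name)
                        locations(name).discard(reg)

    for name in sorted(live_on_exit):             # store live values held only in registers
        if name not in locations(name):
            code.append(f"MOV {best_location(name)}, {name}")
            locations(name).add(name)
    return code

example = [("t", "a", "SUB", "b"), ("u", "a", "SUB", "c"),
           ("v", "t", "ADD", "u"), ("d", "v", "ADD", "u")]
generated = generate(example, live_on_exit={"d"})
```

With d as the only value live on exit, the sketch emits the seven instructions shown in the Code Generated column, in the same order.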
Issues in Design of a Code Generator
Code generator converts the intermediate representation of source code into a form that can be
readily executed by the machine. A code generator is expected to generate the correct code.
Designing of the code generator should be done in such a way that it can be easily implemented,
tested, and maintained.
The following issues arise during the code generation phase:
• Input to the code generator – The input to the code generator is the intermediate code
generated by the front end, along with information in the symbol table that determines the
run-time addresses of the data objects denoted by the names in the intermediate
representation. Intermediate code may be represented as quadruples, triples, indirect triples,
postfix notation, syntax trees, DAGs, etc. The code generation phase proceeds on the
assumption that the input is free from all syntactic and static semantic errors, that the
necessary type checking has taken place, and that type-conversion operators have been
inserted wherever necessary.
• Target program: The target program is the output of the code generator. The output may
be absolute machine language, relocatable machine language, or assembly language.
• Absolute machine language as output has the advantages that it can be placed in a
fixed memory location and can be immediately executed. For example, WATFIV is
a compiler that produces the absolute machine code as output.
• Relocatable machine language as an output allows subprograms and subroutines
to be compiled separately. Relocatable object modules can be linked together and
loaded by a linking loader. But there is added expense of linking and loading.
• Assembly language as output makes the code generation easier. We can generate
symbolic instructions and use the macro facilities of assemblers in generating
code. The price is an additional assembly step after code generation.
• Memory Management – Mapping the names in the source program to the addresses of
data objects is done by the front end and the code generator. A name in the three address
statements refers to the symbol table entry for the name. Then from the symbol table
entry, a relative address can be determined for the name.
• Instruction selection – Selecting the best instructions improves the efficiency of the program.
The instruction set itself should be complete and uniform. Instruction speeds and machine
idioms also play a major role when efficiency is considered. But if we do not care about the
efficiency of the target program then instruction selection is straightforward. For example, the
three-address statements
P:=Q+R
S:=P+T
would be translated, one statement at a time, into the following code sequence:
MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
Here the fourth statement is redundant, because it reloads the value of P that the previous
statement has just stored. It leads to an inefficient code sequence. A given
intermediate representation can be translated into many code sequences, with significant cost
differences between the different implementations. Prior knowledge of instruction cost is needed
in order to design good sequences, but accurate cost information is difficult to predict.
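The redundant load in the example can be removed by a simple pass over the generated code. The function below is a sketch with a hypothetical name; it assumes instructions are strings of the form "OP source, destination" and that the load is not the target of a jump, since otherwise it could not be deleted safely.

```python
def drop_redundant_loads(code):
    """Remove 'MOV x, R' when it immediately follows 'MOV R, x'."""
    out = []
    for instr in code:
        if out and instr.startswith("MOV ") and out[-1].startswith("MOV "):
            src, dst = [part.strip() for part in instr[4:].split(",")]
            prev_src, prev_dst = [part.strip() for part in out[-1][4:].split(",")]
            if src == prev_dst and dst == prev_src:
                continue                 # the register already holds this value
        out.append(instr)
    return out

naive = ["MOV Q, R0", "ADD R, R0", "MOV R0, P",
         "MOV P, R0", "ADD T, R0", "MOV R0, S"]
improved = drop_redundant_loads(naive)
```

On the sequence above the pass deletes only MOV P, R0 and leaves the other five instructions in their original order.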
• Register allocation issues – Computations on registers are faster than computations on
memory operands, so efficient utilization of registers is important. The use of registers is
subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at
each point in the program.
2. During a subsequent register assignment phase, we pick the specific register in which
each variable will reside.
To understand the concept, consider the following three-address code sequence:
t:=a+b
t:=t*c
t:=t/d
An efficient machine code sequence for it is as follows:
MOV a,R0
ADD b,R0
MUL c,R0
DIV d,R0
MOV R0,t
• Evaluation order – The code generator decides the order in which the instructions will be
executed. The order of computations affects the efficiency of the target code. Among the many
possible orders, some require fewer registers to hold the intermediate
results. However, picking the best order in the general case is a difficult NP-complete
problem.
• Approaches to code generation issues – The code generator must always generate correct
code. Correctness is especially important because of the number of special cases that a code
generator might face. Some of the design goals of a code generator are:
• Correct
• Easily maintainable
• Testable
• Efficient