CD Module 4
Semantics of a language provide meaning to its constructs, like tokens and syntax structure.
Semantics help interpret symbols, their types, and their relations with each other. Semantic
analysis judges whether the syntax structure constructed in the source program derives any
meaning or not.
For example:
int a = "value";
should not issue an error in lexical and syntax analysis phase, as it is lexically and structurally
correct, but it should generate a semantic error as the type of the assignment differs. These
rules are set by the grammar of the language and evaluated in semantic analysis.
Typical tasks performed during semantic analysis include:
Scope resolution
Type checking
Array-bound checking
Semantic Errors
Some of the semantic errors that the semantic analyzer is expected to
recognize:
Type mismatch
Undeclared variable
Semantic analysis is the compiler phase that follows syntax analysis (parsing) and ensures
that the program adheres to the rules of the language, including:
Type checking: Ensuring operations are compatible with the types of data.
Scope resolution: Verifying that identifiers are defined and accessible in the correct
scope.
Name resolution: Ensuring variable, function, and type names are used consistently
and declared before use.
Semantic analysis relies on symbol tables to keep track of identifiers, types, and scope
information and on attribute grammars to manage the properties (attributes) of different
parts of the program.
Attributes
Attributes are values or properties associated with the nodes (symbols) in the syntax tree.
They carry semantic information required for further analysis, code generation, or
optimization. For example, an expression node may carry a val attribute (its computed
value) or a type attribute (its data type).
Attributes help propagate information through the syntax tree, assisting in tasks like type
checking, constant folding, and code generation.
1. Synthesized Attributes:
o These are computed from the values of attributes of a node's children.
o Useful for bottom-up evaluation, such as computing the value or type of an
expression from its subexpressions.
2. Inherited Attributes:
o These are computed based on the values of attributes from a node's parent or
siblings.
o Useful for tasks where a construct depends on the context of its parent or
surrounding elements, such as the type of a variable passed down from a
declaration.
Attribute Grammar
Attribute Grammar (AG) is a formal way of associating attributes with the grammar rules
of a language. An attribute grammar is essentially a context-free grammar enhanced with
rules for computing attributes, defining how the attributes for each symbol in the grammar
are derived.
1. Grammar rules: Context-free grammar production rules defining the structure of the
language.
2. Attributes: Values associated with grammar symbols, either synthesized or inherited.
3. Semantic rules: Rules associated with each production in the grammar to specify
how the attributes of symbols should be computed.
Example of Attribute Grammar for Arithmetic Expressions
E -> E1 + T
E -> T
T -> T1 * F
T -> F
F -> (E)
F -> id
Let's assume we want to compute the value of an arithmetic expression. We will use an
attribute val for this purpose, which will hold the computed value of each expression.
1. For E -> E1 + T:
o E.val = E1.val + T.val
2. For E -> T:
o E.val = T.val
3. For T -> T1 * F:
o T.val = T1.val * F.val
4. For T -> F:
o T.val = F.val
5. For F -> (E):
o F.val = E.val
6. For F -> id:
o F.val = value of id
A subset of attribute grammars is L-attributed grammars, which are compatible with top-
down parsing. In L-attributed grammars, each inherited attribute of a symbol depends only
on:
the attributes of the symbols to its left in the same production, and
the inherited attributes of its parent.
Annotated parse tree for the expression 3 + 5 * 2:

          E
        / | \
       E  +  T
       |    /|\
       T   T * F
       |   |   |
       F   F   2
       |   |
       3   5
1. Attribute Evaluation:
o Left subtree: F.val = 3, so T.val = 3 and E1.val = 3
o Right subtree: F.val = 5 gives T1.val = 5 (T -> F), and F.val = 2
o T.val = T1.val * F.val = 5 * 2 = 10
o E.val = E1.val + T.val = 3 + 10 = 13
In an attribute grammar, three-address code (TAC) can be generated as part of the semantic
actions. Examples of common TAC forms are:
Assignment statements (x = y op z)
For the expression 3 + 5 * 2, the semantic actions emit:
t1 = 5 * 2
t2 = 3 + t1
Bottom-up evaluation is suitable for synthesized attributes, which depend only on the
attributes of child nodes. Postorder traversal is used, where attributes are propagated
from the leaves up to the root.
Steps:
1. Traverse the syntax tree in postorder, starting from the leaf nodes.
2. For each node, compute the synthesized attributes based on the values of its children’s
attributes.
3. Move up the tree, recursively calculating the synthesized attributes until you reach the
root.
Example:
For E -> E1 + T with the rule E.val = E1.val + T.val, the values E1.val and T.val are
computed first, and E.val is then obtained from them.
This approach is efficient for pure synthesized attributes because the dependencies are
straightforward and only require one pass from the leaves to the root.
Top-down evaluation is suitable for inherited attributes, which depend on values from
parent or sibling nodes. Preorder traversal is used, where attributes are propagated from the
root down to the leaves.
Steps:
1. Start at the root node of the syntax tree and initialize inherited attributes based on the
context (e.g., variable scope).
2. Traverse the tree in preorder, visiting each node before its children.
3. For each node, compute inherited attributes based on parent or sibling information
and pass them down to child nodes.
Example:
For a variable declaration, D → type id, if id.type is an inherited attribute representing the
data type, its value is passed from the parent node to the id node to ensure the identifier has
the correct type.
Top-down evaluation works well for grammars with only inherited attributes, as it
propagates information from the root to the leaves.
When both synthesized and inherited attributes are used, a dependency graph can be
constructed to manage complex dependencies. The graph represents how each attribute
depends on others, allowing a systematic way to evaluate all attributes in the correct order.
Steps:
1. Build a dependency graph where each node represents an attribute, and directed
edges indicate dependencies.
2. Perform a topological sort on the dependency graph to determine the evaluation order
of attributes.
3. Evaluate the attributes in the order obtained from the topological sort to ensure that
each attribute is computed only after all its dependencies are resolved.
Example:
For a grammar with both inherited and synthesized attributes in an expression tree, attributes
are calculated based on dependencies defined by the grammar. If A → B C, where
C.inh depends on B.syn, the dependency graph ensures that B.syn is
computed before C.inh.
Summary Table

Strategy | Traversal | Attributes handled | Typical use
Bottom-up | Postorder | Synthesized | S-attributed grammars (e.g., expression values)
Top-down | Preorder | Inherited | L-attributed grammars (e.g., types, scope)
Dependency graph | Topological order | Both | Mixed attribute dependencies
These algorithms ensure efficient and correct composition of attributes based on the
grammar’s requirements, supporting both simple and complex attribute dependency
scenarios.
Symbol table
In compiler design, a symbol table is a data structure used to store information about the
various identifiers (symbols) in a program, such as variable names, function names, object
names, and class names. It plays a crucial role in semantic analysis by helping the compiler
manage information about scope, types, and other properties of identifiers.
A symbol table typically includes information for each identifier such as:
Identifier Name: The name of the variable, function, or any other symbol.
Type: The data type (e.g., int, float, struct) of the identifier.
Scope: Information about the scope in which the identifier is defined (e.g., global,
local).
Memory Location: The address or offset in memory where the identifier is stored.
Attributes: Additional information like size, access modifiers, parameter types (for
functions), and value (for constants).
The symbol table supports semantic analysis in several ways:
1. Type Checking: The symbol table provides the type of each identifier, which is
necessary for verifying type consistency in expressions and assignments.
2. Error Detection: During semantic analysis, the symbol table helps detect errors like
undeclared variables, re-declaration of variables, and misuse of functions or operators.
The basic operations on a symbol table are:
1. Insert: Add a new symbol together with its attributes when its declaration is seen.
2. Lookup: Find a symbol to retrieve its attributes or to verify that it has been declared.
3. Delete: Remove symbols that are no longer in scope (for example, after leaving a
function or block scope).
A symbol table can be implemented using various data structures, depending on the language
and the compiler’s needs:
Hash Table: For efficient lookups, where each identifier’s name hashes to a location.
This is a popular choice due to its average O(1) time complexity for insertions and
lookups.
Linked List: Useful for simple scope management, though it can be slower due to
O(n) time complexity for lookup.
Binary Search Tree (BST): Provides sorted entries but has an average O(log n) lookup
time.
Stack: Often used for scope handling in block-structured languages. Each new scope
(block) pushes a new symbol table onto the stack, which is then popped when the
scope ends.
int x = 5;           // global scope
void function() {
    int y = 10;      // local scope
    x = y + 2;       // x is resolved in the enclosing (global) scope
}
To manage different scopes, the symbol table can be organized as a stack of tables:
When a new scope begins (e.g., a function or block), a new symbol table is pushed
onto the stack.
When the scope ends, the symbol table is popped off, ensuring that variables are no
longer accessible outside their defined scope.
Data Types
Data types define the kind of values that variables can hold and the operations that can be
performed on them. Each programming language has its own set of data types that determine
how memory is allocated, how the values are stored, and how the data interacts with other
types.
o Floating Point (float, double): Represents real numbers with decimal points
(e.g., 3.14, -0.001).
o Structure (struct in C/C++): A custom data type that groups different types
together.
o Union: Similar to structures but stores different types in the same memory
location, using space for the largest member.
o Enumeration (enum): A data type with named integral constants, often used
for state or status flags.
o Abstract Data Types: lists, stacks, queues, and trees, which are defined by the
operations they support rather than by a specific storage format.
Type Checking
Type checking is the process of verifying that the types of values used in expressions,
variables, and function calls are compatible. It is a critical step in the semantic analysis phase
of compilation, as it ensures the correctness of operations according to language rules.
1. Static Type Checking:
o Performed at compile time.
o Helps detect type errors before program execution, making code more reliable
and reducing runtime errors.
2. Dynamic Type Checking:
o Performed at runtime.
o Allows more flexibility but can lead to runtime errors if operations are applied
to incompatible types.
Type Conversion
Type checking often involves type conversion to make compatible data types interact, either
by:
Implicit conversion (coercion): the compiler converts a value automatically, e.g.,
promoting int to double in a mixed expression.
Explicit conversion (casting): the programmer requests the conversion, e.g.,
(double)x in C.
Type Inference: Some languages (e.g., Haskell, Python with type hints) can infer
types based on context, reducing the need for explicit type declarations.
Strong vs. Weak Typing: Strongly typed languages enforce strict type rules, whereas
weakly typed languages allow more flexibility in type conversions, sometimes
automatically converting types even if the result may be unexpected.
Error Prevention: Ensures operations are valid for given data types, helping prevent
common errors.
Code Optimization: Allows the compiler to make assumptions about data types,
improving efficiency.
Code Clarity: Makes code more readable and predictable by enforcing consistent
data usage.
Examples
Static typing (C): the type of x is fixed at compile time.
int x = 10;
Dynamic typing (Python): types are checked at runtime.
x = 5
y = "hello"
Runtime Environment
A program, as source code, is merely a collection of text (code, statements, etc.); to
bring it to life, actions must be performed on the target machine.
A program contains names for procedures, identifiers, etc. that require mapping to
actual memory locations at runtime.
Activation Trees
A procedure has a start and an end delimiter and everything inside it is called the body
of the procedure.
The execution of a procedure is called its activation. An activation record contains all
the necessary information required to call a procedure.
An activation record may contain the following units (depending upon the source language
used):
Temporaries: temporary values arising during expression evaluation.
Local Data: the local data of the procedure.
Saved Machine Status: machine state (registers, program counter) before the call.
Access Link: reference to non-local data held in other activation records.
Control Link: pointer to the activation record of the caller.
Actual Parameters: the arguments passed by the caller.
Return Value: the result to be returned to the caller.
We assume that the program control flows in a sequential manner and when a
procedure is called, its control is transferred to the called procedure.
When a called procedure is executed, it returns the control back to the caller.
Activation Tree
An activation tree depicts the way control enters and leaves activations; each node
represents one activation of a procedure.
The node for procedure ‘x’ is the parent of the node for procedure ‘y’ if and only if the
control flows from procedure x to procedure y.
Example – Consider the following program of Quicksort
main() {
    int n;
    readarray();
    quicksort(1, n);
}

quicksort(int m, int n) {
    int i = partition(m, n);
    quicksort(m, i - 1);
    quicksort(i + 1, n);
}
Code : It is known as the text part of a program that does not change at runtime. Its
memory requirements are known at the compile time.
Procedures : Their text part is static but they are called in a random manner. That is
why, stack storage is used to manage procedure calls and activations.
Variables : Variables are known at the runtime only, unless they are global or
constant. Heap memory allocation scheme is used for managing allocation and de-
allocation of memory for variables in runtime.
Static Allocation
In this allocation scheme, the compilation data is bound to a fixed location in the memory and
it does not change when the program executes.
As the memory requirement and storage locations are known in advance, runtime support
package for memory allocation and de-allocation is not required.
Stack Allocation
Procedure calls and their activations are managed by means of stack memory allocation.
It works in a last-in-first-out (LIFO) manner, and this allocation strategy is very useful for
recursive procedure calls.
Heap Allocation
Variables local to a procedure are allocated and de-allocated only at runtime. Heap allocation
is used to dynamically allocate memory to the variables and reclaim it when the variables
are no longer required.
Apart from the statically allocated memory area, both stack and heap memory can grow and
shrink dynamically and unpredictably. Therefore, they cannot be provided with a fixed
amount of memory in the system.
Parameter Passing
r-value
The value of an expression is called its r-value. The value contained in a single
variable also becomes an r-value if it appears on the right-hand side of the assignment
operator. r-values can always be assigned to some other variable.
l-value
The memory location (address) where an expression is stored is known as its l-value.
It appears on the left-hand side of an assignment operator, and a value can be
assigned to it.
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and variables like
day, week, month and year, all have r-values. Only variables have l-values as they
also represent the memory location assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.
Formal Parameters
Variables that take the information passed by the caller procedure are called formal
parameters. These variables are declared in the definition of the called function.
Actual Parameters
Variables whose values or addresses are being passed to the called procedure are
called actual parameters. These variables are specified in the function call as
arguments.
Example:
fun_one() {
    int actual_parameter = 10;
    fun_two(actual_parameter);    /* actual parameter */
}

fun_two(int formal_parameter) {   /* formal parameter */
    print formal_parameter;
}
Formal parameters hold the information of the actual parameter, depending upon the
parameter passing technique used. It may be a value or an address.
Pass by Value
In pass by value mechanism, the calling procedure passes the r-value of actual
parameters and the compiler puts that into the called procedure’s activation record.
Formal parameters then hold the values passed by the calling procedure. If the values
held by the formal parameters are changed, it should have no impact on the actual
parameters.
Pass by Reference
In pass by reference mechanism, the l-value of the actual parameter is copied to the
activation record of the called procedure.
This way, the called procedure now has the address (memory location) of the actual
parameter and the formal parameter refers to the same memory location. Therefore, if
the value pointed by the formal parameter is changed, the impact should be seen on
the actual parameter as they should also point to the same value.
The compiler generates code to allocate space for local variables in the stack frame and to
push parameters before transferring control.
For example:
PUSH param1
PUSH param2
CALL function_label
3. Recursion Support
Each recursive call gets its own stack frame. For example:
int factorial(int n) {
    if (n == 0) return 1;
    return n * factorial(n - 1);
}
During execution:
Each call factorial(n), factorial(n-1), ..., factorial(0) pushes a new frame holding its
own copy of n and its own return address; the frames are popped in reverse order as
the calls return.
Languages like Pascal or Python, which support nested functions, use access links or
a display table in the stack frame to locate variables in enclosing scopes.
1. Stack Overflow
Excessive recursion or deep call chains can exhaust the stack memory.
Large local variables or excessive function calls can bloat stack usage.
2. Stack Unwinding
During exceptions or errors, the compiler must support stack unwinding to clean up
frames and restore a consistent state.
3. Dynamic Memory: Allocates memory only when needed (on function entry) and
deallocates automatically (on function return).
For example:
int add(int a, int b) {
    int result = a + b;
    return result;
}
int main() {
    int sum = add(2, 3);
    return 0;
}
Stack: main()'s frame only (before the call)
Stack: main()'s frame with the called function's frame on top (during the call)
Advanced Concepts
1. Inline Functions: Avoid stack frame creation by embedding function code directly.
2. Tail Call Optimization: Reuse stack frames for certain tail-recursive calls to prevent
stack overflow.
3. Stack vs. Heap in Runtime: Optimize allocation to balance between stack (fast,
limited) and heap (flexible, slower).
A typical memory layout for a running program consists of the following segments:
1. Code (Text) Segment:
o Stores the compiled machine instructions of the program; typically read-only
to prevent accidental modification.
o Example: The binary code for printf() function calls or program loops.
2. Data Segment:
o Divided into two parts: Initialized Data Segment and Uninitialized Data
Segment (or BSS - Block Started by Symbol).
o Initialized Data: Stores global and static variables that have been initialized
explicitly. For example, int x = 10; (initialized global variable).
o Uninitialized Data (BSS): Stores global and static variables that are declared
but not initialized (they get default values, typically 0).
o Data segment is allocated once during program load and persists throughout
program execution.
3. Heap Segment:
o Used for dynamic memory allocation (e.g., malloc); grows upward toward
higher addresses as the program requests memory at runtime.
4. Stack Segment:
o Stores activation records (parameters, return addresses, local variables);
grows downward and is managed automatically on function calls and returns.
5. Additional Areas:
o Command-line arguments and environment variables, typically placed above
the stack.
|-------------------|  ← high addresses
| Command-line Args |
| Environment Vars  |
|-------------------|
|       Stack       |
|   (grows down)    |
|-------------------|
|        ...        |
|-------------------|
|       Heap        |
|    (grows up)     |
|-------------------|
|   Uninitialized   |
|    Data (BSS)     |
|-------------------|
|  Initialized Data |
|-------------------|
|    Code (Text)    |
|-------------------|  ← low addresses
Isolation of Code and Data: Separating the code, static data, and dynamic data
segments reduces the chance of accidental overwriting of code or constants,
improving program stability.
Flexible Memory Use with Heap: The heap allows for memory to be allocated only
as needed, enabling efficient memory utilization for dynamic data.
Stack Overflow: If the stack grows too large (e.g., due to deep recursion), it can
overwrite other memory areas, causing a crash.
Fragmentation: Over time, the heap may become fragmented if frequent allocations
and deallocations leave gaps, which can reduce memory efficiency.
Fully static runtime environment
A fully static runtime environment is a memory organization approach where all memory
allocations are determined and fixed at compile time. This means that no memory allocation
or deallocation happens dynamically during program execution. In such an environment, the
program has a fixed layout in memory, and only a predetermined amount of memory is
reserved for each variable, function, and data structure.
1. Fixed Memory Locations:
o All variables, data structures, and functions are allocated fixed locations in
memory at compile time.
2. Static Addresses:
o Each element has a static, predefined memory address, which remains constant
throughout the program’s execution.
3. Fixed Sizes:
o Memory for all data must be reserved at compile time, which means data
structures like arrays or lists must have fixed sizes.
4. Efficient Access:
o Memory use is very efficient because each variable and function has a single,
known location in memory.
5. Limited Flexibility:
o Structures that require variable sizes (e.g., dynamic arrays, complex data
structures like linked lists or trees) are impractical in this environment.
Real-Time Systems: Where timing and predictability are critical, static memory
allocation ensures that memory access times remain constant and deterministic.
The layout in a fully static runtime environment is similar to standard memory layouts but
excludes dynamic memory regions like the heap or stack. Here’s how it typically looks:
|-------------------|
|   Code Segment    |
|-------------------|
|   Data Segment    |
|-------------------|
| Static Call Frames| ← Memory for function parameters and local variables, pre-allocated
|-------------------|
Code Segment: Contains all the compiled program code and is read-only.
Data Segment: Holds all global variables, constants, and other initialized and
uninitialized static data.
Static Call Frames: Instead of using a dynamic stack for function calls, memory for
each function’s local variables and parameters is allocated at compile time. Each
function has its own fixed memory space, and calls are handled with pre-set
addresses.
Predictability: Fixed memory addresses ensure that memory access times are
constant, which is especially valuable in time-sensitive applications.
Reduced Complexity: Without the need for a heap or runtime stack, the memory
model is straightforward and easy to manage.
No Recursion: Recursive functions cannot be used because stack frames are not
created dynamically.
Wasted Memory: All memory must be reserved upfront, potentially leading to
unused memory if reserved space exceeds the actual need.
Dynamic Memory
Dynamic memory refers to memory that is allocated at runtime, allowing flexibility for the
program to request and release memory based on its requirements as it runs. This is in
contrast to static memory, where memory is allocated at compile-time, and stack memory,
which is managed automatically by the compiler for local variables and function calls.
1. Heap Memory:
o Unlike the stack, the heap does not have fixed-size allocations and can grow or
shrink as needed during program execution.
2. Challenges:
o Memory Leaks: Failure to release unused memory can lead to memory leaks,
causing excessive memory usage over time.
Parameter passing defines how arguments are passed to functions, influencing whether and
how changes in the function affect the original variables.
1. Pass-by-Value:
o A copy of the actual value is passed to the function. Modifications within the
function do not affect the original variable.
o Pros:
Safe; the caller's data cannot be changed accidentally.
o Cons:
Can be inefficient for large data types since a copy is created.
2. Pass-by-Reference:
o The function receives a reference to the variable (e.g., its memory address),
allowing it to modify the original variable directly.
o Pros:
Efficient for large data; lets the function return results through its
parameters.
o Cons:
The caller's data can be modified unintentionally (side effects).
3. Pass-by-Pointer:
o The address of the variable is passed explicitly as a pointer value.
o Pros:
Allows functions to modify the original data and pass large data types
efficiently.
o Cons:
Requires careful handling; null or dangling pointers cause errors.
4. Pass-by-Result:
o The parameter acts as an output: its final value is copied back to the caller's
argument when the function returns.
o Pros:
Useful for returning multiple results ("out" parameters).
o Cons:
The caller's variable is updated only after the function finishes.
5. Pass-by-Value-Result:
o A copy of the value is passed (like pass-by-value), modified, and then copied
back to the original variable on function return.
o Pros:
Combines the safety of copying with the ability to return a result.
o Cons:
Extra copying cost; behavior can differ from pass-by-reference when the
same variable is aliased.
6. Pass-by-Name:
o Rather than passing a value or reference, the actual expression is passed and
evaluated each time it is used within the function.
o Pros:
Supports lazy evaluation; the expression is evaluated only when, and as
often as, it is needed.
o Cons:
Repeated evaluation can be expensive and can give surprising results if
the expression has side effects.
Mechanism | Description | Modifies caller's variable? | Typical use
Pass-by-Result | Assigns output to the parameter | Yes (after function) | Out parameters
Pass-by-Name | Passes an expression for delayed evaluation | Yes (depends on eval) | Functional languages