Chapter 1: Compilation
1. Introduction

The languages humans use to communicate vary from one culture to another. For two persons
A and B to hold a dialogue, the language must be familiar to both. Otherwise, a third person C,
who masters the languages of both A and B, is needed. Person C acts as an interpreter and
translates the discussion between A and B.

In human-machine interaction, the user writes source code that the machine then runs. This
is not as simple as it sounds, because the source code is written in a high-level language
(C, C++, Java…) while the machine executes only elementary binary instructions. The source
program must therefore be translated into a target program. This translation is called
compilation.

2. What is a Compiler?

A compiler is a program that takes as input a program a, written in a source language A,
and translates it into an equivalent program b in a target language B. The source language is a
high-level language and the target language is a machine language.

Dr. KHERICI Nada

[Figure 1: Compilation. Source program (a) in the high-level language (A) -> Compiler -> target
program (b) in the machine language (B); the compiler also outputs error messages.]

Compilation can only succeed if the source code is well formed and follows the rules of the
source language. An important role of a compiler is therefore to detect the errors made while
writing the source program a.

2.1. Compilation and Execution

The result of the compilation process is an object program that is loaded into memory and
executed by the processor. The compiler detects only the errors that can be found by analyzing
the source text, such as syntax errors. Errors that depend on the values computed while the
program runs appear only at execution. The table below illustrates the difference between
compilation errors and execution errors.

Example:

Compilation errors                              | Execution errors
------------------------------------------------+-------------------------------------------
a = ((b + c) * d;   // a missing parenthesis    | x = 100 / (a - a);   // division by 0
if x < b fin1();    // a semicolon before else  | int tab[5];
else fin2();                                    | tab[5] = 2;          // index out of range
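
The distinction in the table can be observed with a small experiment. The sketch below uses Python as a convenient stand-in, because its built-in compile() and exec() functions separate translation from execution. The helper name classify is hypothetical, and the examples are Python transcriptions of the C-like statements above, so this illustrates the idea rather than the behavior of a C compiler.

```python
# Compile-time vs run-time errors, observed with Python's built-in compile() and exec().

def classify(source, variables):
    """Return 'compilation error', 'execution error', or 'ok' for a source string."""
    try:
        code = compile(source, "<example>", "exec")   # translation only, nothing runs yet
    except SyntaxError:
        return "compilation error"
    try:
        exec(code, dict(variables))                   # run the translated program
    except Exception:
        return "execution error"
    return "ok"

print(classify("a = ((b + c) * d", {"b": 1, "c": 2, "d": 3}))   # missing parenthesis
print(classify("x = 100 / (a - a)", {"a": 4}))                   # division by zero
print(classify("tab = [0] * 5\ntab[5] = 2", {}))                 # index out of range
```

The first statement is rejected before anything runs, whereas the other two are translated without complaint and fail only when executed.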

2.2. Compiler vs Interpreter

A compiler translates the source program without executing it. The final result of the
compilation process is a machine-language program.

An interpreter is another kind of language processor. Its input is not limited to the source
program: it also receives the data the program operates on. The source program is interpreted
while it runs, and the output is the result of the source program's execution.


[Figure 2: Interpreter. Source code and input data -> Interpreter -> output result; the
interpreter also outputs error messages.]

3. Compilation phases

The compilation of a program goes through two phases: the analysis phase and the synthesis
phase.

Figure 3: Compilation phases

3.1. Analysis phase

The analysis phase is known as the front-end [1] of the compiler. It reads the text of the
source program and divides it into its constituent parts, detecting lexical, syntactic, and
semantic errors along the way. This phase produces an intermediate representation of the source
program and the symbol table, and supplies them to the next phase, the synthesis phase (as
shown in Figure 3). It consists of three steps: lexical analysis, syntax analysis, and semantic
analysis.

3.2. Synthesis phase

The synthesis phase is the back-end [2] of the compiler. It generates the target program from
the intermediate representation of the source code and the symbol table. This phase consists
of three steps: the production of the intermediate code, the optimization of the intermediate
code, and the production of the target code.

[1] Front-end: a part of a program responsible only for the user interface, which allows the user to interact with a back-end part.
[2] Back-end: a part of a program that is not directly accessible by users, who must go through the front-end.

4. Compiler structure

The compilation process is a sequence of phases. Each phase works on its own representation
of the source program: it takes the output of the previous phase as input and provides an
output for the next phase. Figure 4 summarizes the compilation steps.

Figure 4: Compiler structure


Let's take the following assignment statement, written in C, and follow it through the
compilation process.

    z = x * y + 5     (1)

4.1. Lexical Analysis

The first phase of compilation is lexical analysis. It is carried out by a lexical analyzer
(also called a scanner). The scanner:

- Reads the source code as a stream of characters.
- Groups the characters into meaningful lexemes [3].
- Represents these lexemes as tokens of the form <token_name, attribute_value> and records
  identifiers in the symbol table.
- Ignores insignificant information (spaces, comments, etc.).
- Passes the tokens to the parser.

In a token, the first component, token_name, is an abstract symbol that is used during
syntax analysis, and the second component, attribute_value, points to an entry in the symbol
table for this token. Lexemes can be reserved words (while, if, else, etc.), identifiers,
operators (+, −, =, ==, etc.), numeric constants, etc.

If we take assignment (1), characters could be grouped into the following lexemes and
mapped into the following tokens passed on to the syntax analyzer:

1. The lexeme z is mapped into the token <id, 1>, where id is an abstract symbol standing
for identifier and 1 refers to the symbol-table entry for z. The symbol-table entry for an
identifier holds information about the identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token <=>. Since this
token needs no attribute value, we have omitted the second component. We could have
used any abstract symbol such as assign for the token name, but for notational
convenience, we have chosen to use the lexeme itself as the name of the abstract symbol.
3. The lexeme x is mapped into the token <id, 2>, where 2 points to the symbol-table
entry for the variable x.
4. The symbol * is a lexeme that is mapped into the token <*>.
5. The lexeme y is mapped into the token <id, 3>, where 3 points to the symbol-table
entry for the variable y.
6. The symbol + is a lexeme that is mapped into the token <+>.
7. The integer 5 is a lexeme that is mapped into the token <5>.

[3] A lexeme is a unit of meaning. The terms lexical unit, lexical entity, token, and word are also found.

Spaces between lexemes are discarded by the lexical analyzer. After lexical analysis, the
assignment statement (1) is represented as:

    <id, 1> <=> <id, 2> <*> <id, 3> <+> <5>     (2)
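
The scanning steps described in this section can be sketched in a few lines of Python. This is only a minimal sketch, not the scanner of a real compiler: the token classes in TOKEN_SPEC and the function name scan are hypothetical, and only identifiers, integer constants, and the operators =, * and + are recognized.

```python
import re

# Hypothetical token classes; the names are illustrative only.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=*+]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def scan(source):
    """Return (tokens, symbols) for a source string."""
    symbols = []            # symbol table: entry number = position in the list + 1
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                      # spaces are discarded
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)    # first occurrence: new symbol-table entry
            tokens.append(("id", symbols.index(lexeme) + 1))
        else:
            tokens.append((lexeme,))      # operators and constants carry no attribute
    return tokens, symbols

tokens, symbols = scan("z = x * y + 5")
print(tokens)    # the token sequence of representation (2)
print(symbols)   # the identifiers in order of first appearance
```

For assignment (1), the token list corresponds to representation (2), and the symbol table holds z, x, and y as entries 1, 2, and 3.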

4.2. Syntax Analysis

The next phase is syntax analysis, or parsing. The parser (syntax analyzer) takes the tokens
produced by the scanner as input and generates a parse tree (or syntax tree). This stage
checks the token sequence against the grammar of the source language: the parser ensures
that the expression formed by these tokens is syntactically correct.

        =
       / \
    id1   +
         / \
        *   5
       / \
    id2   id3

Figure 5: Syntax Tree
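
One common way to build such a tree is recursive descent, where each grammar rule becomes a function. The sketch below assumes a tiny hypothetical grammar (assignment -> id = expr, expr -> term { + term }, term -> factor { * factor }) and represents tokens as plain strings such as "id1". The grammar, the string encoding, and the function name parse_assignment are assumptions made for illustration.

```python
def parse_assignment(tokens):
    """Parse  id = expr  and return the syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def factor():                 # factor -> identifier | number
        token = advance()
        if token in ("=", "+", "*"):
            raise SyntaxError(f"operand expected, found {token!r}")
        return token

    def term():                   # term -> factor { '*' factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                   # expr -> term { '+' term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != "=":
        raise SyntaxError("'=' expected")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment(["id1", "=", "id2", "*", "id3", "+", "5"]))
```

Because term is called from expr, the multiplication ends up deeper in the tree than the addition, which is how the grammar encodes operator precedence. A malformed token sequence, such as one ending in an operator, raises SyntaxError.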

4.3. Semantic Analysis

Semantic analysis checks whether the parse tree follows the semantic rules of the language.
For example, it checks that values are assigned between compatible data types and reports
operations such as adding a string to an integer. The semantic analyzer also keeps track of
identifiers, their types, and expressions, and checks, among other things, whether identifiers
are declared before use. It produces an annotated syntax tree as output.

For assignment (1), the semantic analyzer must check that the variables x, y, and z have the
same type. If z is an integer, the variables x and y must also be integers, or an error is
detected. Assuming that x, y, and z are floats in this example, the analyzer converts the
integer 5 to 5.0.
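
The check described above can be sketched as a recursive walk over the syntax tree. This is a simplified illustration under stated assumptions: the tree is encoded as nested tuples, only the types int and float exist, and the function name annotate and the conversion node name inttofloat are chosen for the example.

```python
def annotate(node, types):
    """Return (typed_tree, type); integer operands mixed with floats get an 'inttofloat' node."""
    if isinstance(node, str):
        if node.isdigit():
            return node, "int"                   # integer constant
        if node not in types:
            raise TypeError(f"undeclared identifier {node!r}")
        return node, types[node]
    op, left, right = node
    left, left_type = annotate(left, types)
    right, right_type = annotate(right, types)
    if op == "=":
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)        # widening the value is allowed
        elif left_type != right_type:
            raise TypeError(f"cannot assign {right_type} to {left_type}")
        return (op, left, right), left_type
    if left_type != right_type:                  # mixed int/float arithmetic
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type

tree = ("=", "id1", ("+", ("*", "id2", "id3"), "5"))
floats = {"id1": "float", "id2": "float", "id3": "float"}
print(annotate(tree, floats)[0])
```

With all three variables declared as floats, the constant 5 is wrapped in a conversion node. Declaring id1 as int while id2 and id3 are floats raises a TypeError, which corresponds to the error case mentioned above.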

4.4. Intermediate Code Generation

After semantic analysis, the compiler generates an intermediate code from the source
program. This code lies between the high-level language and the machine language, and it
should be generated in a form that is easy to translate into the target machine code.

For the last example, the intermediate code is:

    temp1 = inttofloat(5)
    temp2 = id2 * id3
    temp3 = temp2 + temp1
    id1 = temp3
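
Intermediate code of this kind can be produced by a post-order walk of the annotated syntax tree, creating one temporary per operation. The sketch below assumes the tree is given as nested tuples. The function name generate is hypothetical. Since the walk visits the left operand first, the temporaries are numbered in a different order than in the listing above, but the computation is the same.

```python
def generate(tree):
    """Translate an annotated syntax tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):
            return node                                   # identifier or constant
        if node[0] == "inttofloat":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        op, left, right = node
        left_name = walk(left)                            # operands first (post-order)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expression = tree
    code.append(f"{target} = {walk(expression)}")
    return code

tree = ("=", "id1", ("+", ("*", "id2", "id3"), ("inttofloat", "5")))
for instruction in generate(tree):
    print(instruction)
```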

4.5. Code Optimization

The next phase optimizes the intermediate code. Optimization removes unnecessary
instructions and rearranges the sequence of statements so that the program runs faster and
uses fewer resources (CPU, memory).

The last intermediate code can be optimized to the following:

    temp1 = id2 * id3
    id1 = temp1 + 5.0
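
The two simplifications applied here are folding the conversion of a constant at compile time and removing the final copy. The sketch below shows one possible way to do this, assuming instructions are encoded as (dest, op, arg1, arg2) tuples and that each temporary is used only once. The encoding and the function name optimize are hypothetical, and temporaries are not renumbered.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples; op is 'inttofloat', 'copy', '*' or '+'."""
    constants = {}          # temporaries known to hold a compile-time constant
    result = []
    for dest, op, arg1, arg2 in code:
        arg1 = constants.get(arg1, arg1)          # propagate known constants
        arg2 = constants.get(arg2, arg2)
        if op == "inttofloat" and arg1.isdigit():
            constants[dest] = f"{arg1}.0"         # fold the conversion at compile time
            continue
        if op == "copy" and result and result[-1][0] == arg1:
            _, prev_op, a, b = result.pop()       # write the result straight into dest
            result.append((dest, prev_op, a, b))
            continue
        result.append((dest, op, arg1, arg2))
    return result

code = [
    ("temp1", "inttofloat", "5", None),
    ("temp2", "*", "id2", "id3"),
    ("temp3", "+", "temp2", "temp1"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(code):
    print(instruction)
```

The four instructions are reduced to two, a multiplication followed by an addition of the constant 5.0 whose result is stored directly in id1.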

4.6. Code Generation

In this phase, the code generator takes the optimized intermediate code and maps it to the
target machine language. It translates the intermediate code into a sequence of machine
instructions that performs the same tasks as the intermediate code.

The target code generated from the last optimized code is:

    movf x, r2
    movf y, r1
    mulf r1, r2
    addf 5.0, r2
    movf r2, z
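
A very simple code generator can be sketched as follows. It assumes the same (dest, op, arg1, arg2) encoding of three-address instructions, a machine with the movf, mulf, and addf instructions used above in the form "op source, destination", and a left operand that is always an identifier or a temporary. The function name emit, the register pool, and the allocation strategy are assumptions made for illustration, so the register numbers differ from the listing above.

```python
MNEMONIC = {"*": "mulf", "+": "addf"}     # assumed float instructions, as in the listing above

def emit(code, names):
    """code: list of (dest, op, arg1, arg2); names maps id1, id2, ... to variable names."""
    asm = []
    location = {}                          # temporary -> register currently holding it
    free = ["r1", "r2", "r3", "r4"]        # hypothetical register pool

    def is_constant(arg):
        return arg[0].isdigit()

    def load(arg):
        """Return a register holding arg, emitting a load instruction if needed."""
        if arg in location:
            return location.pop(arg)
        reg = free.pop(0)
        asm.append(f"movf {names[arg]}, {reg}")
        return reg

    for dest, op, arg1, arg2 in code:
        left = load(arg1)
        if is_constant(arg2):
            asm.append(f"{MNEMONIC[op]} {arg2}, {left}")       # immediate operand
        else:
            right = load(arg2)
            asm.append(f"{MNEMONIC[op]} {right}, {left}")      # left = left op right
            free.insert(0, right)                              # right register is free again
        if dest in names:
            asm.append(f"movf {left}, {names[dest]}")          # store into the variable
            free.insert(0, left)
        else:
            location[dest] = left                              # keep the temporary in a register
    return asm

optimized = [("temp1", "*", "id2", "id3"), ("id1", "+", "temp1", "5.0")]
for line in emit(optimized, {"id1": "z", "id2": "x", "id3": "y"}):
    print(line)
```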

4.7. Symbol Table

The symbol table is a data structure maintained throughout all the phases of a compiler. It
stores the names of all identifiers along with their types. The symbol table lets the
compiler quickly search for an identifier's record and retrieve it.

The symbol-table manager proceeds as follows:

- During lexical analysis, it inserts a new entry in the symbol table for each identifier
the first time it is scanned.
- During syntax analysis, it associates types with identifiers.
- During semantic analysis, it verifies type compatibility.
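
The three roles listed above can be sketched with a small class built on a dictionary, which gives fast lookup by name. The class name SymbolTable and its method names are hypothetical, and a real symbol table would also handle scopes and further attributes.

```python
class SymbolTable:
    """Minimal symbol table: one record per identifier, numbered from 1."""

    def __init__(self):
        self._index = {}      # name -> entry number, for fast lookup
        self._records = []    # entry number - 1 -> record

    def insert(self, name):
        """Lexical analysis: add the identifier on first sight, return its entry number."""
        if name not in self._index:
            self._records.append({"name": name, "type": None})
            self._index[name] = len(self._records)
        return self._index[name]

    def set_type(self, name, type_name):
        """Syntax analysis: associate a type with a known identifier."""
        self._records[self._index[name] - 1]["type"] = type_name

    def lookup(self, name):
        """Return the record for name, or None if the identifier is unknown."""
        entry = self._index.get(name)
        return None if entry is None else self._records[entry - 1]

    def same_type(self, *names):
        """Semantic analysis: check that all the given identifiers share one type."""
        return len({self.lookup(name)["type"] for name in names}) == 1

table = SymbolTable()
for name in ["z", "x", "y", "x"]:          # x is scanned twice but stored once
    table.insert(name)
for name in ["z", "x", "y"]:
    table.set_type(name, "float")
print(table.lookup("x"), table.same_type("x", "y", "z"))
```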

4.8. Error handler

Whenever an error is detected, the error handler steps in to give a clear diagnosis that helps
locate and correct the error. Depending on the current compilation phase, some possible
errors are:

- Lexical analysis: misspelled or mistyped lexeme.
- Syntax analysis: missing parenthesis, wrong number of operands.
- Semantic analysis: type incompatibility.
- Intermediate code generation: operand incompatibility.
- Code optimization: unreachable statement.
- Code generation: memory limit exceeded when storing a variable.
