J. B. Institute of Engineering and Technology (UGC Autonomous)
AY 2022-23 onwards
B. Tech: AIML, III Year – II Sem
Course Code:
Course: AUTOMATA AND COMPILER DESIGN (Common to AI&ML, IT)
L T P D: 3 0 0 0
Credits: 3
Pre-Requisites: Nil
Course objectives:
The students will:
1. Illustrate different phases of compilation.
2. Describe the steps and algorithms used by language translators and features.
3. Enumerate top down and bottom up parsing techniques used in compilation process.
4. Learn syntax directed translation and type checking, and study the effectiveness of
optimization.
5. Develop algorithms to generate code for a target machine.
Module 1:
Formal Language and Regular Expressions: Languages, definition of languages, regular
expressions, Finite Automata-DFA, NFA
Conversions: Conversion of regular expression to NFA, NFA to DFA, Epsilon NFA to NFA,
Epsilon NFA to DFA, Phases of compilation, Lexical Analyzer generator (LEX).
Module 2:
Top down parsers: Context free grammars, derivation, parse trees, Ambiguity, LL(k) Grammars
and LL(1) parsing.
Bottom up parsers: Bottom up parsing-SR parsing, LR Parsing-SLR, CLR and LALR Parsers,
YACC tool.
Module 3:
Semantic analysis: Syntax directed translation, S-attributed and L-attributed grammars, and
Intermediate code forms-AST, Polish notation, three address codes.
Type checking: Type checking, type conversions, equivalence of type expressions,
Overloading of functions and operations. Context sensitive features- Chomsky hierarchy of languages
and recognizers.
Module 4:
Symbol table: Symbol table format, organization of symbol table-Linear, hashing, tree.
Storage allocation: Activation record, Runtime stacks and heap allocation
Module 5:
Code optimization: Principal sources of optimization, basic blocks, flow graphs, data flow
analysis of flow graphs, peephole optimization.
Code generation: Machine dependent code generation, object code forms, generic code
generation algorithm, register allocation and assignment using DAG representation of Block.
TEXT BOOKS:
1. Compilers: Principles, Techniques & Tools, Alfred V. Aho, Monica S. Lam, Ravi Sethi and
Jeffrey D. Ullman, Pearson Addison Wesley Education, Second Edition.
2. Modern Compiler Implementation in C, Andrew N. Appel, Cambridge University Press.
REFERENCE BOOKS:
1. Lex & yacc, John R. Levine, Tony Mason, Doug Brown, O'Reilly.
2. Modern Compiler Design, Dick Grune, Henri E. Bal, Ceriel J. H. Jacobs, Wiley dreamtech.
3. Engineering a Compiler, Keith D. Cooper & Linda Torczon, Elsevier.
4. Compiler Construction, Louden, Thomson.
5. Systems Programming and Operating Systems, D
E - Resources:
1. [Link]
2. [Link]
3. [Link]
Course outcomes:
The Student will be able to:
1. Analyze phases of compilation, particularly lexical analysis, parsing, semantic
analysis and code generation.
2. Construct parsing tables for different types of parsing techniques.
3. Classify the Semantic Analysis and Intermediate code generation phase.
4. Apply code optimization techniques to different programming languages.
5. Construct object code for natural language representations.
Module 1:
Formal Language and Regular Expressions: Languages, definition of languages, regular expressions,
Finite Automata-DFA, NFA
Conversions: Conversion of regular expression to NFA, NFA to DFA, Epsilon NFA to NFA, Epsilon
NFA to DFA, Phases of compilation, Lexical Analyzer generator (LEX).
Fundamentals
Symbol – An atomic unit, such as a digit, character, lower-case letter, etc.
Sometimes a word. [Formal language theory does not deal with the “meaning”
of the symbols.]
Alphabet – A finite set of symbols, usually denoted by Σ.
Σ = {0, 1}
Σ = {0, a, 9, 4}
Σ = {a, b, c, d}
String – A finite length sequence of symbols, presumably from some
alphabet.
w = 0110
y = 0aa
x = aabcaa
z = 111
Special string: ε (also denoted by λ), the empty string
Concatenation: wz = 0110111
Length: |w| = 4, |ε| = 0, |x| = 6
Reversal: y^R = aa0
Some special sets of strings:
Σ* All strings of symbols from Σ
Σ+ Σ* - {ε}
Example: Σ = {0, 1}
Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, 001, …}
Σ+ = {0, 1, 00, 01, 10, 11, 000, 001, …}
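The sets Σ* and Σ+ are infinite, but their elements can be listed shortest first. The following is a minimal Python sketch of that listing together with the string operations defined above. The function names sigma_star and sigma_plus are illustrative choices, not notation from these notes, and ε is represented by the empty Python string.

```python
from itertools import count, islice, product

def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first ('' plays the role of ε)."""
    for n in count(0):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def sigma_plus(alphabet):
    """Yield every non-empty string over `alphabet`, i.e. Σ* - {ε}."""
    return (w for w in sigma_star(alphabet) if w != "")

# The first few elements, in the same order as the example above.
first_star = list(islice(sigma_star("01"), 9))
first_plus = list(islice(sigma_plus("01"), 8))

# String operations on the sample strings w, z and y.
w, z, y = "0110", "111", "0aa"
concatenation = w + z      # wz
length = len(w)            # |w|
reversal = y[::-1]         # y^R
```

Because both generators are lazy, islice is needed to take a finite prefix of the infinite listing.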
A language is:
A set of strings from some alphabet (finite or infinite). In other words,
any subset L of Σ*.
Some special languages:
{} The empty set/language, containing no string.
{ε} A language containing one string, the empty string.
Examples:
Σ = {0,1}
L = {x | x is in Σ* and x contains an even number of 0's}
Σ = {0, 1, 2, …, 9, .}
L = {x | x is in Σ* and x forms a finite length real number}
= {0, 1.5, 9.326, …}
Σ = {a, b, c, …, z, A, B, …, Z}
L = {x | x is in Σ* and x is a Pascal reserved word}
= {BEGIN, END, IF, …}
Σ = {Pascal reserved words} U {(, ), ., :, ;, …} U {Legal Pascal identifiers}
L = {x | x is in Σ* and x is a syntactically correct Pascal program}
Σ = {English words}
L = {x | x is in Σ* and x is a syntactically correct English sentence}
Regular Expression
• A regular expression is used to specify a language, and it does so precisely.
• Regular expressions are very intuitive.
• Regular expressions are very useful in a variety of contexts.
• Given a regular expression, an NFA-ε can be constructed from it automatically.
• Thus, so can an NFA, a DFA, and a corresponding program, all automatically!
Definition:
Let Σ be an alphabet. The regular expressions over Σ are:
Ø Represents the empty set {}
ε Represents the set {ε}
a Represents the set {a}, for any symbol a in Σ
Let r and s be regular expressions that represent the sets R and S, respectively.
r+s Represents the set R U S (precedence 3, lowest)
rs Represents the set RS (precedence 2)
r* Represents the set R* (highest precedence)
(r) Represents the set R (not an op, provides precedence)
If r is a regular expression, then L(r) is used to denote the corresponding language.
Examples:
Let Σ = {0, 1}
(0 + 1)*                      All strings of 0's and 1's
0(0 + 1)*                     All strings of 0's and 1's, beginning with a 0
(0 + 1)*1                     All strings of 0's and 1's, ending with a 1
(0 + 1)*0(0 + 1)*             All strings of 0's and 1's containing at least one 0
(0 + 1)*0(0 + 1)*0(0 + 1)*    All strings of 0's and 1's containing at least two 0's
(0 + 1)*01*01*                All strings of 0's and 1's containing at least two 0's
(1 + 01*0)*                   All strings of 0's and 1's containing an even number of 0's
1*(01*01*)*                   All strings of 0's and 1's containing an even number of 0's
(1*01*0)*1*                   All strings of 0's and 1's containing an even number of 0's
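The claims in these examples can be spot-checked mechanically. The sketch below uses Python's re module, which writes union as | where these notes write +. It tests three expressions for "an even number of 0's" against a direct count on every string up to a small length. The first expression is taken to be (1 + 01*0)*, and the helper names are illustrative. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

# Expressions for "even number of 0's", with '+' (union) written as '|'.
EVEN_ZEROS = [r"(1|01*0)*", r"1*(01*01*)*", r"(1*01*0)*1*"]

def strings_up_to(max_len, alphabet="01"):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def agrees_with_count(pattern, max_len=8):
    """True if `pattern` matches exactly the strings with an even number of 0's."""
    return all(
        (re.fullmatch(pattern, w) is not None) == (w.count("0") % 2 == 0)
        for w in strings_up_to(max_len)
    )

results = {p: agrees_with_count(p) for p in EVEN_ZEROS}
```

re.fullmatch is used rather than re.search because a regular expression in these notes describes whole strings, not substrings.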
Identities:
1. Øu = uØ = Ø    (multiply by 0)
2. εu = uε = u    (multiply by 1)
3. Ø* = ε
4. ε* = ε
5. u+v = v+u
6. u + Ø = u
7. u + u = u
8. u* = (u*)*
9. u(v+w) = uv+uw
10. (u+v)w = uw+vw
11. (uv)*u = u(vu)*
12. (u+v)* = (u*+v)*
= u*(u+v)*
= (u+vu*)*
= (u*v*)*
= u*(vu*)*
= (u*v)*u*
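An identity holds for every choice of u and v, so one counterexample would refute it, while agreement on samples only makes it plausible. As a rough sketch, the Python code below compares both sides of identity 11 and two forms of identity 12 on all strings up to a fixed length. The sub-expressions chosen for u and v are arbitrary assumptions, and the function name language is illustrative.

```python
import re
from itertools import product

def language(pattern, max_len=7, alphabet="01"):
    """Return the set of strings of length <= max_len that `pattern` matches in full."""
    compiled = re.compile(pattern)
    return {
        "".join(s)
        for n in range(max_len + 1)
        for s in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(s))
    }

# Arbitrary sample sub-expressions (an assumption, not taken from the notes).
u, v = "(01)", "(1|00)"

# Identity 11: (uv)*u = u(vu)*
identity_11 = language(f"({u}{v})*{u}") == language(f"{u}({v}{u})*")

# Identity 12, two of its forms: (u+v)* = u*(vu*)* = (u*v)*u*
all_uv = language(f"({u}|{v})*")
identity_12a = all_uv == language(f"{u}*({v}{u}*)*")
identity_12b = all_uv == language(f"({u}*{v})*{u}*")
```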
Finite State Machines
A finite state machine has a set of states and two functions, called the next-state
function and the output function.
The set of states corresponds to all the possible combinations of the internal storage.
If there are n bits of storage, there are 2^n possible states.
The next-state function is a combinational logic function that, given the inputs and the
current state, determines the next state of the system.
The output function produces a set of outputs from the current state and the inputs.
There are two types of finite state machines.
In a Moore machine, the output only depends on the current state.
In a Mealy machine, the output depends on both the current state and the current input.
We are only going to deal with the Moore machine.
These two types are equivalent in capabilities.
A Finite State Machine consists of:
K states: S = {s1, s2, …, sk}, s1 is the initial state
N inputs: I = {i1, i2, …, in}
M outputs: O = {o1, o2, …, om}
Next-state function T(S, I) mapping each current state and input to the next state
Output function P(S) specifies the output
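To make the roles of T and P concrete, here is a small Python sketch of a Moore machine. The machine itself is a hypothetical example, not one from these notes: it has two states and reports whether the number of 1 inputs seen so far is even or odd. Note that P takes only the state as its argument, which is what makes it a Moore machine.

```python
# Hypothetical Moore machine: the output reports the parity of the 1s read so far.
STATES = ("s1", "s2")             # s1 is the initial state
T = {                             # next-state function T(S, I)
    ("s1", 0): "s1", ("s1", 1): "s2",
    ("s2", 0): "s2", ("s2", 1): "s1",
}
P = {"s1": "even", "s2": "odd"}   # output function P(S): depends on the state only

def run_moore(inputs, state="s1"):
    """Return the outputs: one for the initial state, then one after each input."""
    outputs = [P[state]]
    for i in inputs:
        state = T[(state, i)]
        outputs.append(P[state])
    return outputs

trace = run_moore([1, 0, 1, 1])
```

A Mealy version would instead look up the output from the pair (state, input), so it would produce one output per input and none for the initial state.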
Finite Automata
Two types – both describe what are called regular languages
• Deterministic (DFA) – There is a fixed number of states and we
can only be in one state at a time
• Nondeterministic (NFA) – There is a fixed number of states but we can be in
multiple states at one time
While NFAs are often more convenient to write and more compact than DFAs,
we will see that adding nondeterminism does not let us define any language
that cannot be defined by a DFA.
One way to think of this is we might write a program using an NFA,
but then when it is “compiled” we turn the NFA into an
equivalent DFA.
Formal Definition of a Finite Automaton
• Finite set of states, typically Q.
• One state is the start/initial state, typically q0. // q0 ∈ Q
• Alphabet of input symbols, typically Σ.
• Zero or more final/accepting states; the set is typically F. // F ⊆ Q
• A transition function, typically δ. This function takes a state and an input
symbol as arguments and returns the next state (for an NFA, a set of states).
Deterministic Finite Automata (DFA)
• A DFA is a five-tuple: M = (Q, Σ, δ, q0, F)
Q = A finite set of states
Σ = A finite input alphabet
q0 = The initial/starting state, q0 is in Q
F = A set of final/accepting states, which is a subset of Q
δ = A transition function, which is a total function from Q x Σ to Q
δ: (Q x Σ) -> Q. δ is defined for any q in Q and s in Σ, and δ(q, s) = q' is a
state q' in Q.
Intuitively, δ(q, s) is the state entered by M after reading symbol s while in state q.
For a string w, δ(q, w) denotes the state reached from q after reading all of w, one
symbol at a time.
• Let M = (Q, Σ, δ, q0, F) be a DFA and let w be in Σ*. Then w is accepted by M iff
δ(q0, w) = p for some state p in F.
• Let M = (Q, Σ, δ, q0, F) be a DFA. Then the language accepted by M is the set:
L(M) = {w | w is in Σ* and δ(q0, w) is in F}
• Another equivalent definition:
L(M) = {w | w is in Σ* and w is accepted by M}
• Let L be a language. Then L is a regular language iff there exists a
DFA M such that L = L(M).
Notes:
• A DFA M = (Q, Σ, δ, q0, F) partitions the set Σ* into two sets:
L(M) and Σ* - L(M).
• If L = L(M) then L is a subset of L(M) and L(M) is a subset of L.
• Similarly, if L(M1) = L(M2) then L(M1) is a subset of L(M2) and L(M2) is a
subset of L(M1).
• Some languages are regular, others are not. For example, if
L1 = {x | x is a string of 0's and 1's containing an even number of 1's} and
L2 = {x | x = 0^n 1^n for some n >= 0}
then L1 is regular but L2 is not.
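The five-tuple definition translates almost directly into code. The following Python sketch represents a DFA as a small class and builds one machine for L1, the strings with an even number of 1's. The class layout and the state names "even" and "odd" are illustrative assumptions. The run method plays the role of δ applied to a whole string.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: set      # Q
    alphabet: set    # Σ
    delta: dict      # total map (state, symbol) -> state
    start: str       # q0
    finals: set      # F

    def run(self, w):
        """Return the state reached from the start state after reading all of w."""
        q = self.start
        for s in w:
            q = self.delta[(q, s)]
        return q

    def accepts(self, w):
        """w is accepted iff the state reached on w is in F."""
        return self.run(w) in self.finals

# A DFA for L1: strings of 0's and 1's containing an even number of 1's.
EVEN_ONES = DFA(
    states={"even", "odd"},
    alphabet={"0", "1"},
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    },
    start="even",
    finals={"even"},
)
```

No such table can be written for L2: recognizing 0^n 1^n requires remembering n, which is unbounded, while a DFA has only finitely many states.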
Nondeterministic Finite Automata (NFA)
An NFA is a five-tuple: M = (Q, Σ, δ, q0, F)
Q A finite set of states
Σ A finite input alphabet
q0 The initial/starting state, q0 is in Q
F A set of final/accepting states, which is a subset of Q
δ A transition function, which is a total function from Q x Σ to 2^Q
δ: (Q x Σ) -> 2^Q
2^Q is the power set of Q, the set of all subsets of Q.
δ(q, s) is the set of all states p such that there is a transition
labeled s from q to p.
δ is a function from Q x Σ to 2^Q (but not to Q).
For a set of states P and a string w, δ(P, w) denotes the set of states reachable
from some state in P after reading w.
Let M = (Q, Σ, δ, q0, F) be an NFA and let w be in Σ*. Then w is accepted by
M iff δ({q0}, w) contains at least one state in F.
Let M = (Q, Σ, δ, q0, F) be an NFA. Then the language accepted by M is
the set: L(M) = {w | w is in Σ* and δ({q0}, w) contains at least one state in F}
Another equivalent definition:
L(M) = {w | w is in Σ* and w is accepted by M}
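Acceptance by an NFA can be simulated by tracking the set of states the machine could be in after each symbol. The Python sketch below does this for a hypothetical NFA, not one from these notes, that accepts strings of 0's and 1's ending in 01. A missing entry in the transition table is treated as the empty set.

```python
def nfa_accepts(delta, start, finals, w):
    """Track the set of states the NFA could be in after each symbol of w."""
    current = {start}
    for s in w:
        # Union of delta(q, s) over every state q we might currently be in.
        current = set().union(*(delta.get((q, s), set()) for q in current))
    return bool(current & finals)

# Hypothetical NFA over {0, 1} accepting strings that end in "01".
ENDS_IN_01 = {
    ("q0", "0"): {"q0", "q1"},   # either stay, or guess that the final "01" starts here
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

accepted = [w for w in ["01", "1101", "010", "", "0"]
            if nfa_accepts(ENDS_IN_01, "q0", {"q2"}, w)]
```

The set named current is exactly δ({q0}, x) for the prefix x read so far, which is also the idea behind converting an NFA to a DFA.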
Conversion from NFA to DFA
Suppose there is an NFA N = < Q, ∑, q0, δ, F > which recognizes a language L. Then the DFA D = <
Q', ∑, q0, δ', F' > can be constructed for language L as:
Step 1: Initially Q' = Ø.
Step 2: Add q0 to Q'.
Step 3: For each state in Q', find the possible set of states for each input symbol using the transition
function of the NFA. If this set of states is not in Q', add it to Q'. Repeat until no new set appears.
Step 4: The final states of the DFA are all states in Q' that contain at least one final state of the NFA
(a state in F).
Example
Consider the following NFA shown in Figure 1.
Following are the various parameters for NFA.
Q = { q0, q1, q2 }
∑ = { a, b }
F = { q2 }
δ (Transition Function of NFA)
Step 1: Q' = Ø
Step 2: Q' = {q0}
Step 3: For each state in Q', find the states for each input symbol.
Currently, the only state in Q' is q0. Find the moves from q0 on input symbols a and b using the
transition function of the NFA and update the transition table of the DFA.
δ' (Transition Function of DFA)
Now { q0, q1 } will be considered as a single state. As its entry is not in Q', add it to Q'.
So Q' = { q0, { q0, q1 } }
Now, moves from state { q0, q1 } on different input symbols are not present in the transition table of
the DFA, so we calculate them as:
δ' ( { q0, q1 }, a ) = δ ( q0, a ) ∪ δ ( q1, a ) = { q0, q1 }
δ' ( { q0, q1 }, b ) = δ ( q0, b ) ∪ δ ( q1, b ) = { q0, q2 }
Now we will update the transition table of DFA.
δ' (Transition Function of DFA)
Now { q0, q2 } will be considered as a single state. As its entry is not in Q', add it to Q'.
So Q' = { q0, { q0, q1 }, { q0, q2 } }
Now, moves from state { q0, q2 } on different input symbols are not present in the transition table of
the DFA, so we calculate them as:
δ' ( { q0, q2 }, a ) = δ ( q0, a ) ∪ δ ( q2, a ) = { q0, q1 }
δ' ( { q0, q2 }, b ) = δ ( q0, b ) ∪ δ ( q2, b ) = { q0 }
Now we will update the transition table of DFA.
δ' (Transition Function of DFA)
As no new state is generated, we are done with the conversion. The final state of the DFA is the state
which has q2 as a component, i.e., { q0, q2 }
Following are the various parameters for DFA.
Q' = { q0, { q0, q1 }, { q0, q2 } }
∑ = { a, b }
F' = { { q0, q2 } } and transition function δ' as shown above. The final DFA for the above NFA is
shown in Figure 2.
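Steps 1 to 4 can be sketched in Python as a worklist algorithm in which each DFA state is a frozenset of NFA states. The NFA table below is an assumption: Figure 1 and its transition table are not reproduced in this text, so the entries were chosen to agree with the δ' calculations in the example (missing entries stand for the empty set). The function and variable names are illustrative.

```python
from collections import deque

def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start_set = frozenset({start})                 # Step 2: Q' = {q0}
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:                                   # Step 3: process each state in Q'
        S = queue.popleft()
        for a in alphabet:
            # delta'(S, a) is the union of delta(q, a) over all q in S.
            T = frozenset().union(*(delta.get((q, a), frozenset()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:                      # a new set of states joins Q'
                seen.add(T)
                queue.append(T)
    dfa_finals = {S for S in seen if S & finals}   # Step 4
    return seen, dfa_delta, start_set, dfa_finals

# Assumed NFA table, consistent with the worked example above.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}

states, dfa_delta, dfa_start, dfa_finals = nfa_to_dfa(NFA_DELTA, "q0", {"q2"}, "ab")
```

Under that assumed table the construction yields the three states { q0 }, { q0, q1 } and { q0, q2 }, with { q0, q2 } as the only final state, matching Q' and F' above.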
Note: Sometimes it is not easy to convert a regular expression directly to a DFA. You can first convert
the regular expression to an NFA and then convert the NFA to a DFA.
Application of Finite state machine and regular expression in Lexical analysis: Lexical
analysis is the process of reading the source text of a program and converting it into a
sequence of tokens. Designing a finite state machine from regular expressions is a
useful way to generate tokens from a given source program. Since the lexical structure of more or
less every programming language can be specified by a regular language, a common way to
implement a lexical analyzer is to:
1. Specify regular expressions for all of the kinds of tokens in the language. The disjunction of all
of the regular expressions thus describes any possible token in the language.
2. Convert the overall regular expression specifying all possible tokens into a deterministic
finite automaton (DFA).
3. Translate the DFA into a program that simulates the DFA. This program is the lexical analyzer.
To recognize identifiers, numerals, operators, etc., implement a DFA in code: the state is an integer
variable and δ is a switch statement. Upon recognizing a lexeme, the analyzer returns the lexeme and
its lexical class, then restarts the DFA at the next character in the source code.
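A hand-coded version of such an analyzer might look like the Python sketch below. The token classes, the operator set and the state numbering are assumptions made for illustration, and a chain of if/elif cases stands in for the switch statement. Every state other than START is accepting here, so running the DFA until no move is possible yields the longest lexeme.

```python
START, IN_ID, IN_NUM, IN_OP = 0, 1, 2, 3    # integer DFA states
TOKEN_CLASS = {IN_ID: "identifier", IN_NUM: "numeral", IN_OP: "operator"}
OPERATORS = "+-*/=()"                       # assumed single-character operators

def delta(state, ch):
    """Transition function as a chain of cases; None means no move is possible."""
    if state == START:
        if ch.isalpha() or ch == "_":
            return IN_ID
        if ch.isdigit():
            return IN_NUM
        if ch in OPERATORS:
            return IN_OP
    elif state == IN_ID:
        if ch.isalnum() or ch == "_":
            return IN_ID
    elif state == IN_NUM:
        if ch.isdigit():
            return IN_NUM
    return None                             # IN_OP has no outgoing moves

def tokenize(source):
    """Return (lexeme, lexical class) pairs, restarting the DFA after each lexeme."""
    tokens, i = [], 0
    while i < len(source):
        if source[i].isspace():             # whitespace separates tokens
            i += 1
            continue
        state, start = START, i
        while i < len(source) and (nxt := delta(state, source[i])) is not None:
            state, i = nxt, i + 1
        if state == START:                  # no transition on the first character
            raise ValueError(f"unexpected character {source[i]!r} at position {i}")
        tokens.append((source[start:i], TOKEN_CLASS[state]))
    return tokens

tokens = tokenize("x1 = 42+y")
```

A generator such as LEX produces this kind of table-driven loop automatically from the regular expressions, so the hand-written delta is only needed when no generator is used.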