The document outlines the structure and function of a compiler, focusing on the role of the lexical analyzer (scanner) in processing source code into tokens for further analysis. It explains concepts such as regular expressions, finite automata, and the creation of a symbol table, which are essential for tokenization and syntax analysis. Additionally, it describes operations on strings and languages, providing examples of how tokens are defined and recognized within programming languages.

Uploaded by

Zooz 24

The Structure of a Compiler

[Figure: Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code. The Symbol and Attribute Tables are used by all phases of the compiler.]

Scanner (Lexical Analyzer)

➢ The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
➢ RE (Regular Expression)
➢ NFA (Non-deterministic Finite Automata)
➢ DFA (Deterministic Finite Automata)
➢ LEX

The Role of the Lexical Analyzer
➢ The lexical analyzer is the first phase of a compiler. A program or function that performs lexical analysis is called a lexical analyzer, lexer, or scanner.
➢ The main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.

[Figure: source program → lexical analyzer → (token) → parser. The parser sends "get next token" requests to the lexical analyzer, and both consult the symbol table.]

The Role of the Lexical Analyzer
The Secondary Tasks:
1. Eliminating the following from the source program:
   a. comments
   b. whitespace
      1. tab
      2. newline characters
   Example source containing a comment and extra whitespace:
      // global variables
      a=1 + 4;
      write ( a);
      write (a,
             a*2);
2. Correlating error messages from the compiler with the source program. It may keep track of the number of newline characters seen, so that a line number can be associated with an error message.
3. Making a copy of the source program with errors marked (in some compilers).

What is a Language? (What is going to be analysed)
An alphabet (Σ) is a finite set of symbols, e.g. {a, b, c}.
A symbol is an element of an alphabet, e.g. a.
A word is a finite sequence of symbols drawn from the alphabet Σ, e.g. abcaa.
A language (over alphabet Σ) is a set of words, e.g. {abcaa, abc, b, caa}.
Σ* denotes the set of all words over the alphabet Σ.
|s| denotes the length of string s, e.g. UQU is a string of length 3, so |UQU| = 3.
ε denotes the word of length 0, the empty word.
∅ denotes the empty set, { }. Note that ∅ is not the same as {ε}: {ε} contains one word, the empty word.
Note 1: In language theory the terms sentence and word are often used as synonyms for the term string.
Note 2: A language (over alphabet Σ) is a set of strings (over alphabet Σ).
For example, with Σ = {a}, one possible language is L = {ε, a, aa, aaa}.

Operations on Strings: Terms for parts of a string
(Examples use s = banana.)

TERM: prefix of s
DEFINITION: A string obtained by removing zero or more trailing symbols of string s.
EXAMPLE: ε, b, ba, ban, ..., banana

TERM: suffix of s
DEFINITION: A string formed by deleting zero or more of the leading symbols of s.
EXAMPLE: ε, a, na, ana, ..., banana

TERM: substring of s
DEFINITION: A string obtained by deleting a prefix and a suffix from s. Every prefix and every suffix of s is a substring of s, but not every substring of s is a prefix or a suffix of s.
EXAMPLE: ε, b, a, n, ba, an, na, nan, ..., banana

TERM: subsequence of s
DEFINITION: Any string formed by deleting zero or more not necessarily contiguous symbols from s.
EXAMPLE: ε, b, a, n, ba, bn, an, aa, na, nn, ...

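The four terms above can be checked mechanically. The following is a minimal Python sketch; the function names are illustrative and not part of the slides, and ε is represented by the empty string.

```python
from itertools import combinations

def prefixes(s):
    """Strings obtained by removing zero or more trailing symbols."""
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s):
    """Strings obtained by removing zero or more leading symbols."""
    return [s[i:] for i in range(len(s) + 1)]

def substrings(s):
    """Delete a prefix and a suffix: every contiguous slice, including ε."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

def subsequences(s):
    """Delete zero or more not necessarily contiguous symbols."""
    return {"".join(s[i] for i in idx)
            for k in range(len(s) + 1)
            for idx in combinations(range(len(s)), k)}

s = "banana"
print(prefixes(s))
print(suffixes(s))
print("nan" in substrings(s), "bn" in substrings(s))   # substring vs. not
print("bn" in subsequences(s))                          # but bn is a subsequence
```

Note that bn is a subsequence of banana but not a substring, because the deleted symbols are not contiguous with a prefix or suffix.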
Operations on Strings

Concatenation: Concatenation of words is denoted by juxtaposition.

If x and y are strings, then the concatenation of x and y is xy.
e.g. If x = dog and y = house, then xy = doghouse.

Concatenation is not commutative: in general xy ≠ yx (doghouse ≠ housedog).

Exponentiation
s^0 = ε
s^1 = s
s^2 = ss
In general, s^i = s^(i-1) s for i > 0.

Operations on Languages
Suppose we have two languages L and M. Then:
Union of L and M, L ∪ M:
   L ∪ M = { s | s ∈ L or s ∈ M }
Concatenation of L and M, LM:
   LM = { st | s ∈ L and t ∈ M }
Kleene closure of L, L*:
   L* = ⋃_{i=0}^{∞} L^i
Positive closure of L, L+ (Kleene plus):
   L+ = ⋃_{i=1}^{∞} L^i
Operations on Languages: Example
L is the set {A, B, . . ., Z, a, b, . . . , z}
D is the set {0, 1, . . . , 9}
Since a symbol can be regarded as a string of length one, the sets L and D are each finite languages. The following are some examples of new languages created from L and D:
1. L ∪ D is the set of letters and digits.
2. LD is the set of strings consisting of a letter followed by a digit.
3. L^4 is the set of all four-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.

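The language operations above map directly onto set operations. This is a small Python sketch under two assumptions that are not in the slides: the sets L and D are shrunk to two symbols each so the output stays small, and L* is approximated by a bounded union because the true closure is infinite.

```python
def concat(L, M):
    """LM = { st | s in L and t in M }."""
    return {s + t for s in L for t in M}

def power(L, i):
    """L^i, with L^0 = {ε} (ε is the empty string here)."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

def closure_up_to(L, n, positive=False):
    """Finite approximation of L* (or L+): union of L^i for i up to n."""
    out = set()
    for i in range(1 if positive else 0, n + 1):
        out |= power(L, i)
    return out

L = {"a", "b"}      # stand-in for the set of letters
D = {"0", "1"}      # stand-in for the set of digits
print(sorted(L | D))                 # union
print(sorted(concat(L, D)))          # a letter followed by a digit
print(len(power(L, 4)))              # number of four-letter strings
print("" in closure_up_to(L, 3))     # L* contains ε
print("" in closure_up_to(D, 3, positive=True))  # D+ does not
```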
➢ A token is a string of characters, categorized according to the rules as a
symbol (e.g., Identifier, Number, Comma, and so on).
➢ The process of forming tokens from an input stream of characters is called
tokenization, and the lexer categorizes them according to a symbol type.
➢ Tokenization is frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex.

➢ For each lexeme, the lexical analyzer produces as output a token of the form:
(Token-name, Attribute-value)
➢ Token-name: a symbol that is used during syntax analysis.
➢ Attribute-value: points to an entry in the symbol table for this token.

Pattern: a rule associated with a token that describes the set of strings.
Lexeme: a string matched by the pattern of a token.
Token: a set of strings.

TOKEN      SAMPLE LEXEMES            INFORMAL DESCRIPTION OF PATTERN
const      const                     const
if         if                        if
relation   <, <=, =, <>, >, >=       < or <= or = or <> or >= or >
id         pi, count, D2             letter followed by letters and digits
num        3.1416, 0, 6.02E23        any numeric constant
literal    "core dumped"             any characters between " and " except "

➢ Symbol Table: a data structure used to store information about various source language constructs. The character string or lexeme forming an identifier is saved in a symbol table entry. Later phases of the compiler might add to this entry information such as the type of the identifier, its usage (variable or label) and its position in storage (address).

Attributes are used to distinguish different lexemes in a token.
Example: E = M * C ** 2

LEXEME   TOKEN
E        <id, pointer to symbol-table entry for E>
=        <assign_op, >
M        <id, pointer to symbol-table entry for M>
*        <mult_op, >
C        <id, pointer to symbol-table entry for C>
**       <exp_op, >
2        <num, integer value 2>

Tokens affect syntax analysis, and attributes affect semantic analysis.
Consider this expression in the C++ programming language:
Position = initial + rate * 60

Lexeme     Token type       Symbol
Position   Identifier       (id, 1)
=          Assignment OP    =
initial    Identifier       (id, 2)
+          Addition OP      +
rate       Identifier       (id, 3)
*          Multi OP         *
60         Integer          60

Result of lexical analysis:
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)

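A small tokenizer makes the table above concrete. The sketch below is an illustration in Python, not the lexer from the slides: the token names, the patterns in `TOKEN_SPEC`, and the choice to represent the symbol table as a dictionary from lexeme to index are all assumptions.

```python
import re

# (token name, pattern) pairs; earlier entries take priority.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id",  r"[A-Za-z][A-Za-z0-9]*"),
    ("op",  r"[=+*]"),
    ("ws",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbol_table) for a line of source text."""
    symtab, tokens, pos = {}, [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                      # whitespace is eliminated
        if kind == "id":
            # attribute = index of the symbol-table entry for this lexeme
            tokens.append(("id", symtab.setdefault(lexeme, len(symtab) + 1)))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no attribute
    return tokens, symtab

tokens, symtab = tokenize("Position = initial + rate * 60")
print(tokens)
print(symtab)
```

Each identifier is entered in the symbol table once; a repeated identifier would reuse its existing index.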
➢ The specification of a programming language often includes a set of rules that defines the lexer. These rules usually consist of regular expressions, and they define the set of possible character sequences that are used to form individual tokens or lexemes.

➢ Note that the lexer is usually based on a Finite-State Machine (FSM), applying:

   ➢ Regular Expressions
   ➢ Finite Automata (Deterministic, Non-deterministic)

➢ The FSM has encoded within it information on the possible sequences of characters that can be contained within any of the tokens it handles.

Describing Tokens using Regular Expression (1)
➢ We use regular expressions to describe programming language tokens.
➢ A regular expression is built up out of simpler regular expressions using a set of defining rules.
➢ A regular expression (RE) is defined inductively:
   a      an ordinary character stands for itself
   ε      the empty string
   R|S    either R or S (alternation), where R and S are REs
   RS     R followed by S (concatenation)
   R*     concatenation of R zero or more times
➢ A regular expression R describes a set of strings of characters denoted L(R).
➢ L(R) = the language defined by R:
   L(abc) = { abc }
   L(hello|goodbye) = { hello, goodbye }
   L(1(0|1)*) = all binary numbers that start with a 1
➢ Each token can be defined using a regular expression.

Describing Tokens using Regular Expression (2)
➢ A language denoted by a regular expression is said to be a regular set.
➢ Unnecessary parentheses can be avoided in regular expressions if we adopt the conventions that:

1. The unary operator * has the highest precedence and is left associative,
2. Concatenation has the second highest precedence and is left associative,
3. | has the lowest precedence and is left associative.

(a)|((b)*(c)) is equivalent to a | b*c

Both expressions denote the set of strings that are either a single a or zero or more b's followed by one c.
Describing Tokens using Regular Expression (3)
Examples of regular expressions over the alphabet Σ = {a, b}:
   a|b              {a, b}
   (a | b)(a | b)   {aa, ab, ba, bb}
   a*               {ε, a, aa, aaa, ... }
   (a | b)*         The set of all strings of a's and b's
   a | a*b          The set containing the string a and all strings consisting of zero or more a's followed by a b
Describing Tokens using Regular Expression (5)
Notational Shorthands:
One or more instances
   (r)+ denotes (L(r))+
   r* = r+ | ε
   r+ = r r*
Zero or one instance
   r? = r | ε
Character classes
   [abc] = a | b | c
   [a-z] = a | b | ... | z
   [^a-z] = any character except those in [a-z]

Describing Tokens using Regular Expression (6)

25 12.55 1.4E10

digit → 0 | 1 | … | 9
digits → digit+
op_f → ( . digits)?
op_e → ( E ( + | - ) ? digits )?
num → digits op_f op_e
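
The definitions above translate almost one-to-one into a regular expression for Python's `re` module. This is a sketch only; the constant names `DIGITS`, `OP_F`, `OP_E` and `NUM` mirror the grammar names but are otherwise assumptions, and the extra test strings are added here for illustration.

```python
import re

DIGITS = r"[0-9]+"                    # digits → digit+
OP_F = rf"(\.{DIGITS})?"              # op_f → ( . digits )?
OP_E = rf"(E[+-]?{DIGITS})?"          # op_e → ( E ( + | - )? digits )?
NUM = re.compile(DIGITS + OP_F + OP_E)  # num → digits op_f op_e

for text in ["25", "12.55", "1.4E10", "1.", ".5", "1E"]:
    # fullmatch requires the whole string to be a num
    print(text, bool(NUM.fullmatch(text)))
```

The last three strings are rejected because the fraction and exponent parts, when present, each require at least one digit, and num must begin with digits.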

19
Given a regular expression R, a scanner generator produces a generated program P (a Finite State Machine).
Given a string S, the program P answers:
   Yes, if S ∈ L(R)
   No, if S ∉ L(R)

An FSM is a recognizer program for a language: it takes a string x as input and answers:
YES: if x is a sentence in the language
NO: if x is not a sentence in the language.

Types of FSM:
1. Nondeterministic Finite Automata (NFA)
2. Deterministic Finite Automata (DFA)

21
• A finite automaton can be drawn as a TRANSITION DIAGRAM.
• Positions in a transition diagram are drawn as circles and are called states.
• The states are connected by arrows, called transitions.
• One state is labeled the start state; it is the initial state of the transition diagram, where control resides when we begin to recognize a token.
• One or more states are labeled final states; a final state is where control resides when we stop recognizing a token.

• Finite automaton (FA)
  • can be used to recognize the tokens specified by a regular expression

FA = (Q, Σ, s0, F, move)

• An FA consists of
  • A finite set of states Q
  • A set of input symbols Σ (the input symbol alphabet)
  • A set of transitions (or moves) from one state to another, labeled with characters in Σ
  • A special start state s0 (only one)
  • A set of final, or accepting, states F

• Example
  • This machine accepts (abc+)+
[Figure: transition diagram for (abc+)+ with edges labeled a, b and c]

RE: (a | b)*abb
[Figure: NFA transition diagram: start → 0 -a-> 1 -b-> 2 -b-> 3, with a loop on state 0 labeled a and b]

States: Q = {0, 1, 2, 3}
Input symbols: Σ = {a, b}
Transition function (move):
   move(0, a) = {0, 1}, move(0, b) = {0}
   move(1, b) = {2}, move(2, b) = {3}
Start state: 0
Final states: {3}

Transition Table
   State | a      | b
   0     | {0, 1} | {0}
   1     | -      | {2}
   2     | -      | {3}
   3     | -      | -

Running tokens on the FA for (a | b)*abb:
   abb:  {0} -a-> {0, 1} -b-> {0, 2} -b-> {0, 3}
   aabb: {0} -a-> {0, 1} -a-> {0, 1} -b-> {0, 2} -b-> {0, 3}

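The two traces above can be reproduced by simulating the NFA on sets of states. The following Python sketch encodes the transition table for (a | b)*abb as a dictionary; the names `MOVE`, `run` and `accepts` are illustrative.

```python
# Transition table of the NFA for (a | b)*abb; missing entries mean "no move".
MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINAL = 0, {3}

def run(text):
    """Return the list of state sets visited while reading text."""
    current = {START}
    trace = [current]
    for ch in text:
        # follow every possible move from every current state
        current = set().union(*(MOVE.get((q, ch), set()) for q in current))
        trace.append(current)
    return trace

def accepts(text):
    """Accept if some path ends in a final state."""
    return bool(run(text)[-1] & FINAL)

print(run("abb"))
print(run("aabb"))
print(accepts("abb"), accepts("ab"))
```

Tracking the whole set of reachable states at once is what lets the simulation avoid guessing which choice to take.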
An FA accepts an input string s if there is some path in the transition diagram from the start state to some final state such that the edge labels along this path spell out s.

Example NFA over the alphabet {a}:
[Figure: q0 -a-> q1 -a-> q2 and q0 -a-> q3. In q0 there are two choices on input a; q2 and q3 have no outgoing transition.]

Input aa
First choice: q0 -a-> q1 -a-> q2. All input is consumed: "accept".
Second choice: q0 -a-> q3. The second a cannot be consumed, so the automaton halts: "reject".
aa is accepted by the NFA because the first computation accepts aa; the rejecting computation is ignored.

Rejection example: input a
First choice: q0 -a-> q1. All input is consumed, but q1 is not a final state: "reject".
Second choice: q0 -a-> q3. All input is consumed, but q3 is not a final state: "reject".

Another Rejection example: input aaa
First choice: q0 -a-> q1 -a-> q2. The third a cannot be consumed, so the automaton halts: "reject".
Second choice: q0 -a-> q3. The remaining input cannot be consumed, so the automaton halts: "reject".

Language accepted: L = {aa}

Lambda Transitions (λ, also written ε: the empty transition)

• Note: the λ symbol never appears on the input tape.

[Figure: q0 -a-> q1 -λ-> q2 -a-> q3]

Input aa: q0 -a-> q1, then q1 -λ-> q2 (the input tape head does not move), then q2 -a-> q3. All input is consumed: "accept". String aa is accepted.

Rejection Example, input aaa: q0 -a-> q1, then q1 -λ-> q2 (the read head doesn't move), then q2 -a-> q3. The third a cannot be consumed, so the automaton halts: "reject". String aaa is rejected.

Language accepted: L = {aa}

Another NFA Example

[Figure: q0 -a-> q1 -b-> q2 -λ-> q3, with a λ-transition from q3 back to q0]

Input ab: q0 -a-> q1 -b-> q2 -λ-> q3: "accept".

Another String, abab: q0 -a-> q1 -b-> q2 -λ-> q3 -λ-> q0 -a-> q1 -b-> q2 -λ-> q3: "accept".

[Figure: NFA M with states q0, q1, q2: q0 -1-> q1, q1 -0-> q0, q1 -0,1-> q2, q0 -λ-> q2; q2 is a redundant state]

Language accepted
L(M) = {λ, 10, 1010, 101010, ...} = {10}*

• Simple automata:
   M1: a single state q0 that is not accepting, L(M1) = { }
   M2: a single state q0 that is accepting, L(M2) = {λ}

Transition Function δ
δ(q, x) = {q1, q2, ..., qk}
These are the resulting states when following one transition with symbol x from state q.
Examples of Transition Function δ (for the NFA M with states q0, q1, q2 above)
δ(q0, 1) = {q1}
δ(q1, 0) = {q0, q2}
δ(q0, λ) = {q2}
δ(q2, 1) = ∅

Extended Transition Function δ*
Same as δ, but applied to strings.
[Figure: NFA with states q0 to q5, including q0 -a-> q1 -b-> q2 -λ-> q3 and a-labeled edges involving q4 and q5]
δ*(q0, a) = {q1}
δ*(q0, aa) = {q4, q5}
δ*(q0, ab) = {q2, q3, q0}

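To illustrate how δ* extends δ from symbols to strings, here is a Python sketch on the three-state NFA whose language is {10}*. The table `DELTA` is reconstructed from the transition-function examples; the entry for (q1, 1) is read off the "0, 1" edge label and, like the use of the empty string as the λ label, is an assumption of this sketch.

```python
LAMBDA = ""   # label used here for λ-transitions (an assumption of this sketch)

DELTA = {
    ("q0", "1"): {"q1"},
    ("q1", "0"): {"q0", "q2"},
    ("q1", "1"): {"q2"},       # assumed from the "0, 1" edge label
    ("q0", LAMBDA): {"q2"},
}

def closure(states):
    """All states reachable from `states` using only λ-transitions."""
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, LAMBDA), set()):
            if r not in result:
                result.add(r)
                stack.append(r)
    return result

def delta_star(q, w):
    """Set of states reachable from q after reading the string w."""
    current = closure({q})
    for x in w:
        step = set()
        for p in current:
            step |= DELTA.get((p, x), set())
        current = closure(step)
    return current

print(delta_star("q0", "1"))
print(delta_star("q0", "10"))
print(delta_star("q2", "1"))
```

A string w is accepted when `delta_star(start, w)` contains an accepting state; for this machine that is q0, which is consistent with the stated language {10}*.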
RE → (Thompson's construction) → NFA → (Subset construction) → DFA

* We can construct an NFA from a regular expression
* Thompson’s construction algorithm
1. Build the NFA inductively
2. Define rules for each base RE
3. Combine for more complex RE’s

[Figure: general machine for an RE E, with start state s and final state f]

Base cases:
   Empty string transition: start → i -ε-> f
   Alphabet symbol transition: start → i -a-> f

– Suppose N(s) and N(t) are NFAs for REs s and t
  • for s | t, construct
[Figure: new start state i with ε-transitions to N(s) and N(t), and ε-transitions from N(s) and N(t) to the new final state f]

Alternation: (E1 | E2)
• New start state S with ε-transitions to the start states of E1 and E2
• ε-transitions from the final/accepting states of E1 and E2 to the new final state F
– Suppose N(s) and N(t) are NFAs for REs s and t
  • for st, construct
[Figure: start → i, N(s) followed by N(t), ending in f]

Concatenation: (E1 E2)
• New start state S with an ε-transition to the start state of E1
• ε-transition from the final/accepting state of E1 to A, and an ε-transition from A to the start state of E2
• ε-transition from the final/accepting state of E2 to the new final state F
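  • for s*, construct
[Figure: start → i, N(s), f, connected by ε-transitions that allow N(s) to be skipped or repeated]

Closure: (E*)
[Figure: states S, A and F connected by ε-transitions, with E attached to A by ε-transitions]
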
Develop an NFA for the RE: (x | y)*
First create the NFA for (x | y):
   A -ε-> B -x-> C -ε-> F
   A -ε-> D -y-> E -ε-> F
Then add in the closure operator:
[Figure: the NFA above wrapped with new states S, G and H, connected by ε-transitions]
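The inductive rules can be sketched in a few lines of Python. This is a simplified variant and not the exact machines drawn above: concatenation links the two fragments with a single ε-transition instead of adding states S, A and F, and closure uses the common four-ε-edge form. The syntax-tree encoding (`"sym"`, `"cat"`, `"alt"`, `"star"`) and all function names are assumptions made for the example.

```python
import itertools

EPS = None                       # label for ε-transitions
_ids = itertools.count()         # fresh state numbers

def thompson(node, edges):
    """Build an NFA fragment for an RE syntax tree; return (start, final).

    node is ("sym", c), ("cat", r, s), ("alt", r, s) or ("star", r).
    edges is a list of (source, label, target) triples, extended in place.
    """
    kind = node[0]
    if kind == "sym":
        i, f = next(_ids), next(_ids)
        edges.append((i, node[1], f))
    elif kind == "cat":
        i, mid1 = thompson(node[1], edges)
        mid2, f = thompson(node[2], edges)
        edges.append((mid1, EPS, mid2))
    elif kind == "alt":
        i, f = next(_ids), next(_ids)
        for sub in node[1:]:
            s, t = thompson(sub, edges)
            edges.append((i, EPS, s))
            edges.append((t, EPS, f))
    elif kind == "star":
        i, f = next(_ids), next(_ids)
        s, t = thompson(node[1], edges)
        edges += [(i, EPS, s), (t, EPS, f), (i, EPS, f), (t, EPS, s)]
    else:
        raise ValueError(kind)
    return i, f

def closure(states, edges):
    """ε-closure of a set of states."""
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in result:
                result.add(dst)
                stack.append(dst)
    return result

def matches(node, text):
    """Build the NFA for node and simulate it on text."""
    edges = []
    start, final = thompson(node, edges)
    current = closure({start}, edges)
    for ch in text:
        moved = {d for s, lab, d in edges if s in current and lab == ch}
        current = closure(moved, edges)
    return final in current

x, y = ("sym", "x"), ("sym", "y")
XY_STAR = ("star", ("alt", x, y))          # (x | y)*
print(matches(XY_STAR, ""), matches(XY_STAR, "xyx"), matches(XY_STAR, "xz"))
```

Every fragment has exactly one start and one final state, which is what makes the rules composable.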
RE: aa* | bb*
[Figure: NFA with start state 0; ε-transitions from 0 to 1 and to 3; 1 -a-> 2 with an a-loop on 2; 3 -b-> 4 with a b-loop on 4]

States: {0, 1, 2, 3, 4}
Input symbols: {a, b}
Transition function:
   move(0, ε) = {1, 3}, move(1, a) = {2}, move(2, a) = {2}
   move(3, b) = {4}, move(4, b) = {4}
Start state: 0
Final states: {2, 4}

RE: (a | b)*abb
[Figure: NFA for (a | b)*abb with states 0 to 10 and edges labeled a, b and ε]

RE → (Thompson's construction) → NFA → (Subset construction) → DFA

A DFA is a special case of an NFA in which
1. No state has an ε-transition
2. For each state q and input symbol a, there is at most one edge labeled a leaving q
Formal Definition of DFAs:
M = (Q, Σ, δ, q0, F)
   Q: set of states, e.g. {q0, q1, q2}
   Σ: input alphabet, e.g. {a, b}
   δ: transition function
   q0: initial state
   F: set of accepting states
RE: (a | b)*abb
[Figure: DFA transition diagram with states 0, 1, 2, 3]

States: {0, 1, 2, 3}
Input symbols: {a, b}
Transition function:
   move(0, a) = 1, move(1, a) = 1, move(2, a) = 1, move(3, a) = 1
   move(0, b) = 0, move(1, b) = 2, move(2, b) = 3, move(3, b) = 0
Start state: 0
Final states: {3}
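Because every (state, symbol) pair has at most one successor, a DFA can be run with a simple table lookup. The Python sketch below uses the transition function listed above for (a | b)*abb; the dictionary layout and the name `accepts` are illustrative choices.

```python
# One row per state, one column per input symbol.
MOVE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START, FINAL = 0, {3}

def accepts(text):
    """Run the DFA on text; exactly one state is active at any time."""
    state = START
    for ch in text:
        if ch not in MOVE[state]:
            return False          # symbol outside the alphabet
        state = MOVE[state][ch]
    return state in FINAL

for text in ["abb", "aabb", "babb", "ab", "abba"]:
    print(text, accepts(text))
```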
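Finding NFA States
RE: aa* | b | ab
[Figure: NFA with start state 0 and ε-transitions from 0 to states 1, 2 and 3]

Transition table:
   State | ε         | a | b
   0     | {1, 2, 3} | - | -
   1     | -         | 4 | -
   2     | -         | - | 5
   3     | -         | 2 | -
   4     | -         | 4 | -
   5     | -         | - | -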
Is there an NFA States

RE: aa* | b | ab
[Figure: transition diagram with start state 0 and states 1, 2, 3]

Transition table:
   State | a | b
   0     | 1 | 3
   1     | 2 | 3
   2     | 2 | -
   3     | - | -

DFA
* Action on each input is fully determined
* Implement using table-driven approach
* More states generally required to implement RE
NFA
* May have a choice at each step
* Accepts string if there is any path to an accepting state
* Not obvious how to implement this

91
A set of NFA states corresponds to one DFA state.
• Find the initial state of the DFA
• Find all the states in the DFA
• Construct the transition table
• Find the final states of the DFA
We can do that by removing every non-deterministic case.
* Non-deterministic cases:
   1. States with multiple outgoing edges for the same input
   2. ε-transitions

• Solving 1: Multiple transitions
  – Solve by subset construction
  – Build a new DFA based upon the power set of the states of the NFA
  – move(S, a) is relabeled to target a new state whenever a single input goes to multiple states

Example: a+b*
[Figure: NFA: start → 1 -a-> 2, with an a-loop on 1 and a b-loop on 2]
   (1, a) → 1 or 2, create new state 1/2
   (1/2, a) → 1/2
   (1/2, b) → 2
   (2, a) → -
   (2, b) → 2
Any state with "2" in its name is a final state.
[Figure: resulting DFA: start → 1 -a-> 1/2 -b-> 2, with an a-loop on 1/2 and a b-loop on 2]

• Solving 2: ε-transitions
  – Any state reachable by an ε-transition is "part of the state"
  – ε-closure: any state reachable from S by ε-transitions is in the ε-closure of S; treat the ε-closure as one big state, and always include the ε-closure as part of the state

Example:
[Figure: NFA: start → 1 -a-> 2 -ε-> 3, with an a-loop on 2 and a b-loop on 3]
ε-closure(2) = {2, 3}, so create new state 2/3
   (1, a) → 2/3      (1, b) → -
   (2/3, a) → 2/3    (2/3, b) → 3
   (3, a) → -        (3, b) → 3
[Figure: resulting DFA: start → 1 -a-> 2/3 -b-> 3, with an a-loop on 2/3 and a b-loop on 3]

Example: converting NFA M to an equivalent DFA
[Figure: NFA M: q0 -a-> q1, an a-loop on q1, q1 -λ-> q2, q2 -b-> q0; q1 ∈ F]

1. The DFA starts in state {q0}.
2. δ*(q0, a) = {q1, q2}, so add the DFA transition {q0} -a-> {q1, q2}.
3. δ*(q0, b) = ∅ (empty set), so add {q0} -b-> ∅, where ∅ is a trap state.
4. δ*(q1, a) = {q1, q2} and δ*(q2, a) = ∅; the union is {q1, q2}, so add {q1, q2} -a-> {q1, q2}.
5. δ*(q1, b) = {q0} and δ*(q2, b) = {q0}; the union is {q0}, so add {q1, q2} -b-> {q0}.

END OF CONSTRUCTION
Since q1 ∈ F in the NFA, {q1, q2} ∈ F in the DFA.

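The construction just traced can be automated. The following Python sketch runs subset construction on the same NFA M; the edge list in `NFA` is reconstructed from the δ* values in the example, and the names and the use of the empty string for λ are assumptions of this sketch.

```python
LAMBDA = ""   # label used here for λ-transitions
NFA = {
    ("q0", "a"): {"q1"},
    ("q1", "a"): {"q1"},
    ("q1", LAMBDA): {"q2"},
    ("q2", "b"): {"q0"},
}
NFA_START, NFA_FINAL, ALPHABET = "q0", {"q1"}, "ab"

def closure(states):
    """λ-closure, returned as a frozenset so it can be a dictionary key."""
    result, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, LAMBDA), set()):
            if r not in result:
                result.add(r)
                stack.append(r)
    return frozenset(result)

def subset_construction():
    """Return (start, table, final) of the DFA; DFA states are sets of NFA states."""
    start = closure({NFA_START})
    table, todo = {}, [start]
    while todo:
        S = todo.pop()
        if S in table:
            continue                      # already processed
        table[S] = {}
        for x in ALPHABET:
            moved = set()
            for q in S:
                moved |= NFA.get((q, x), set())
            T = closure(moved)            # may be the empty (trap) state
            table[S][x] = T
            todo.append(T)
    final = {S for S in table if S & NFA_FINAL}
    return start, table, final

start, table, final = subset_construction()
for S, row in table.items():
    print(sorted(S), {x: sorted(T) for x, T in row.items()})
```

The empty frozenset plays the role of the trap state: it is created the first time a move has no target and then loops to itself on every symbol.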
Example 2: Conversion NFA to DFA
[Figure: NFA with start state q0 and states q1, q2, q3; edges labeled 0, 1 and 0,1]

{q0}: call it A
δ(A, 0) = {q1, q3}: call it B
δ(A, 1) = {q2, q3}: call it C
δ(B, 0) = {q1}: call it D
δ(B, 1) = {q3}: call it E
δ(C, 0) = {q3}: it is E
δ(C, 1) = {q2}: call it F
δ(D, 0) = {q1}: it is D
δ(D, 1) = {q3}: it is E
δ(E, 0) = ∅
δ(E, 1) = ∅
δ(F, 0) = {q3}: it is E
δ(F, 1) = {q2}: it is F

Transition table:
   State | 0 | 1
   A     | B | C
   B     | D | E
   C     | E | F
   D     | D | E
   E     | ∅ | ∅
   F     | E | F

[Figure: resulting DFA over states A to F, with ∅ as a trap state]
• Prior to NFA to DFA conversion:
  • Empty cycle removal
    – Combine nodes that comprise the cycle
  • Empty transition removal
[Figure: an NFA with states 1 to 4 containing a cycle of ε-transitions, and the simplified NFA after the nodes of the cycle are combined]
• The resulting DFA can be quite large
  – It may contain redundant or equivalent states
  – Find groups of equivalent states and merge them
[Figure: a DFA with states 1 to 5 and an equivalent DFA with states 1 to 3. Both DFAs accept b*ab*a.]
• Two programs were developed at Bell Labs in the mid 70s
  – Lex: a transducer that transforms an input stream into the alphabet of the grammar processed by yacc
    Flex = fast lex, a later free reimplementation (written by Vern Paxson)
  – Yacc: Yet Another Compiler-Compiler

• Input to lexer generator
  – List of regular expressions in priority order
  – Associated action with each RE
• Output
  – Program that reads the input stream and breaks it up into tokens according to the REs

PART 1: Convert Regular Language to RE
1. Write a regular expression for all strings of 𝟎 and 𝟏 which contains the substring 𝟎𝟏𝟏𝟎
2. Write a regular expression for all strings of 𝒀 and 𝒁 where every 𝒁 is immediately followed by
𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 5 𝒁

3. Write a regular expression for all strings of 𝑷 and 𝑸 which contains an odd number of 𝑸

PART 2: Convert Regular Language to FSM

1. 𝑴𝟏=The set of strings that has exactly 3 b (and any number of a).
2. 𝑴𝟐=The set of strings where the number of B is a multiple of 3 (and there can be any number of A).

116
PART 3: Convert RE to NFA
1. 𝑴𝟑=Construct FA that can read the RE: a (b |c) d (e | f) g (h | i)

PART 4: Convert NFA to RE

1. 𝑴4=

117
PART 5: Convert NFA to DFA
1. 𝑴5=

118
Have a terrific day

119
