
23CS2204

Compiler Design
Dr. Sadu Chiranjeevi
Assistant Professor
Department of Computer Science and Engineering
[email protected]

How to describe tokens?
• Programming language tokens can be described by regular languages
• Regular languages
– are easy to understand
– have a well-understood and useful theory
– have efficient implementations
• Regular languages are discussed in great detail in the “Theory of Computation” course
How to specify tokens
• Regular definitions
– Let ri be a regular expression and di be a distinct name
– A regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
…
dn → rn
– where each ri is a regular expression over Σ ∪ {d1, d2, …, di-1}
Examples
• My fax number
91-(512)-259-7586
• Σ = digit ∪ {-, (, )}
• country → digit+ (here digit²)
• area → ‘(‘ digit+ ‘)’ (here ‘(‘ digit³ ‘)’)
• exchange → digit+ (here digit³)
• phone → digit+ (here digit⁴)
• number → country ‘-’ area ‘-’ exchange ‘-’ phone
Examples
• My email address
[email protected]
• Σ = letter ∪ {@, . }
• letter → a | b | … | z | A | B | … | Z
• name → letter+
• address → name ‘@’ name ‘.’ name ‘.’ name
Examples …
• Identifier
letter → a | b | … | z | A | B | … | Z
digit → 0 | 1 | … | 9
identifier → letter(letter|digit)* | _(letter|digit)*

• Unsigned number in C
digit → 0 | 1 | … | 9
digits → digit+
fraction → ‘.’ digits | ε
exponent → (E (‘+’ | ‘-’ | ε) digits) | ε
number → digits fraction exponent
Regular expressions in specifications
• Regular expressions describe many useful languages
• Regular expressions are only specifications; an implementation is still required
• Given a string s and a regular expression R, does s Є L(R)?
• The solution to this problem is the basis of lexical analyzers
• However, just the yes/no answer is not sufficient
• Goal: partition the input into tokens
1. Write a regular expression for the lexemes of each token
• number → digit+
2. Construct R matching all lexemes of all tokens
• R = R1 + R2 + R3 + …
3. Let the input be x1…xn
• for 1 ≤ i ≤ n check x1…xi Є L(R)
4. x1…xi Є L(R) ⇒ x1…xi Є L(Rj) for some j
• the smallest such j is the token class of x1…xi
5. Remove x1…xi from the input; go to step (3)
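The partitioning scheme above can be sketched in C. This is a simplification in which each Ri is matched directly rather than through a combined R, and the token classes (TK_NUM, TK_ID, TK_WS) are illustrative assumptions:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token classes, in priority order (smallest j wins). */
enum { TK_NUM, TK_ID, TK_WS, TK_ERROR };

/* Length of the longest prefix of s in some L(Rj); sets *tok to that j. */
int longest_match(const char *s, int *tok) {
    int n = 0;
    if (isdigit((unsigned char)s[0])) {                       /* number -> digit+ */
        while (isdigit((unsigned char)s[n])) n++;
        *tok = TK_NUM;
    } else if (isalpha((unsigned char)s[0]) || s[0] == '_') { /* identifier */
        while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
        *tok = TK_ID;
    } else if (isspace((unsigned char)s[0])) {                /* ws -> delim+ */
        while (isspace((unsigned char)s[n])) n++;
        *tok = TK_WS;
    } else {
        *tok = TK_ERROR;                                      /* lexical error */
        n = 1;                                                /* skip the character */
    }
    return n;
}

/* Steps 3-5 of the scheme: repeatedly strip the matched prefix. */
void tokenize(const char *s) {
    while (*s) {
        int tok, len = longest_match(s, &tok);
        if (tok != TK_WS)
            printf("<%d, %.*s>\n", tok, len, s);
        s += len;
    }
}
```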
Transition Diagrams
• Regular expressions are declarative specifications
• A transition diagram is an implementation
• A transition diagram consists of
– An input alphabet Σ
– A set of states S
– A set of transitions: statei → statej on an input symbol
– A set of final states F
– A start state n
• A transition s1 → s2 on input a is read:
in state s1, on input a, go to state s2
• If the end of input is reached in a final state, then accept
• Otherwise, reject
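A transition diagram can be run by table lookup. As a concrete illustration, here is a sketch for the standard example language (a|b)*abb; the language and state numbering are assumptions, not taken from these slides:

```c
/* Transition table for a diagram recognizing (a|b)*abb.
   States 0..3; start state 0; final state 3.
   Column 0 = input 'a', column 1 = input 'b'. */
static const int delta[4][2] = {
    {1, 0},   /* state 0 */
    {1, 2},   /* state 1 */
    {1, 3},   /* state 2 */
    {1, 0},   /* state 3 */
};

int accepts(const char *s) {
    int state = 0;                                /* start state */
    for (; *s; s++) {
        if (*s != 'a' && *s != 'b') return 0;     /* not in Σ: reject */
        state = delta[state][*s == 'b'];          /* one transition per input */
    }
    return state == 3;                            /* accept iff final state at end of input */
}
```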
Pictorial notation
• A state: drawn as a circle
• A final state: drawn as a double circle
• A transition: drawn as a labeled arrow between states
• A transition from state i to state j on input a: an arrow from i to j labeled a
How to recognize tokens
• Consider
relop → < | <= | = | <> | >= | >
id → (letter|_)(letter|digit)*
num → digit+ (‘.’ digit+)? (E(‘+’|‘-’)? digit+)?
delim → blank | tab | newline
ws → delim+

• Construct an analyzer that will return <token, attribute> pairs
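For relop, the transition diagram can be coded directly as a lookahead check. A sketch, assuming illustrative attribute names (LT, LE, EQ, NE, GE, GT) that are not from the slides:

```c
/* Hypothetical attribute values returned for relop lexemes. */
enum relop_attr { LT, LE, EQ, NE, GE, GT, NOT_RELOP };

/* Walk the relop transition diagram on a prefix of s.
   *len receives the number of characters consumed. */
enum relop_attr relop(const char *s, int *len) {
    *len = 1;
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        return LT;                 /* "other": retract, keep *len == 1 */
    case '=':
        return EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        return GT;                 /* "other": retract */
    default:
        *len = 0;
        return NOT_RELOP;
    }
}
```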
Transition diagram for relops
(diagram: branches for <, <=, =, <>, >= and >)
Transition diagram for identifier
(diagram: an edge labeled letter|_ from the start state to a state that loops on letter and digit; an edge labeled other, marked *, leads to the final state, where * means retract one input character)
Transition diagram for white spaces
(diagram: an edge labeled delim from the start state to a state that loops on delim; an edge labeled other, marked *, leads to the final state)
Transition diagram for unsigned numbers
(diagram omitted)
Implementation of transition diagrams

Token nexttoken() {
    while (1) {
        switch (state) {
        ...
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;      /* stay in the identifier loop */
            else if (isdigit(c)) state = 10;
            else state = 11;                  /* "other": go to the final state */
            break;
        ...
        }
    }
}
Lexical analyzer generator
• Input to the generator
– List of regular expressions in priority order
– An associated action for each regular expression (generates the kind of token and other bookkeeping information)

• Output of the generator
– A program that reads the input character stream and breaks it into tokens
– Reports lexical errors (unexpected characters), if any
LEX: A lexical analyzer generator
• Token specifications (lex.l) → LEX compiler → C code for the lexical analyzer (lex.yy.c)
• lex.yy.c → C compiler → object code (a.out)
• Input program → lexical analyzer (a.out) → tokens
Format of Lex file
• A Lex program is separated into three sections by %%
delimiters. The format of a Lex source file is as follows:

{ definitions }
%%
{ rules }
%%
{ user subroutines }
Format of Lex file
• Definitions include declarations of constants, variables
and regular definitions.

• Rules are statements of the form p1 {action1} p2
{action2} … pn {actionn},

• where each pi is a regular expression and actioni
describes what the lexical analyzer should do when
pattern pi matches a lexeme.

• User subroutines are auxiliary procedures needed by
the actions. The subroutines can be compiled
separately and loaded with the lexical analyzer.
Lex Program
/* lex program to count number of words */
%{
#include <stdio.h>
#include <string.h>
int i = 0;
%}

/* Rules Section */
%%
[a-zA-Z0-9]+   { i++; }                    /* rule for counting words */

"\n"           { printf("%d\n", i); i = 0; }
%%

int yywrap(void) { return 1; }

int main()
{
    /* the function that starts the analysis */
    yylex();

    return 0;
}
