study of Lex
Lex is a tool that has been built for constructing lexical analyzers from special-purpose notations
based on regular expressions.
First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex
language. Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c. The
program lex.yy.c consists of a tabular representation of a transition diagram constructed from
the regular expressions of lex.l, together with a standard routine that uses the table to recognize
lexemes. The actions associated with regular expressions in lex.l are pieces of C code and are
carried over directly to lex.yy.c. Finally, lex.yy.c is run through the C compiler to produce an
object program a.out, which is the lexical analyzer that transforms an input stream into a
sequence of tokens.
[Figure: Lex source program lex.l -> Lex compiler -> lex.yy.c; lex.yy.c -> C compiler -> a.out; input stream -> a.out -> sequence of tokens]
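
The "table plus standard routine" design described above can be sketched in plain C. This is only an illustration, not what Lex actually generates: the two-state table, the names next_state, accepting and match_integer, and the choice of pattern (one or more digits) are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical two-state DFA for the pattern [0-9]+ .
   State 0 = start, state 1 = accepting (at least one digit seen).
   Column 0 = input is a digit, column 1 = anything else; -1 = no move. */
static const int next_state[2][2] = {
    { 1, -1 },   /* transitions from state 0 */
    { 1, -1 }    /* transitions from state 1 */
};
static const int accepting[2] = { 0, 1 };

/* Returns the length of the longest prefix of s that matches [0-9]+,
   or 0 if no prefix matches. */
size_t match_integer(const char *s)
{
    assert(s != NULL);
    int state = 0;
    size_t i = 0, last_accept = 0;

    while (s[i] != '\0') {
        int column = isdigit((unsigned char)s[i]) ? 0 : 1;
        int target = next_state[state][column];
        if (target < 0)
            break;               /* no transition: stop scanning */
        state = target;
        i++;
        if (accepting[state])
            last_accept = i;     /* remember the longest match so far */
    }
    return last_accept;
}
```

The routine is generic: only the table changes from one set of patterns to another, which is why the generated scanner can pair a fixed driver with tables built from the regular expressions.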
A Lex program consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures
The declarations section includes declarations of variables, manifest constants and regular
definitions. The translation rules of a Lex program are statements of the form
p1 {action1}
p2 {action2}
...
pn {actionn}
where each pi is a regular expression and each actioni is a program fragment describing what
action the lexical analyzer should take when pattern pi matches a lexeme. The third section holds
whatever auxiliary procedures are needed by the actions. Alternatively, these procedures can be
compiled separately and loaded with the lexical analyzer.
The lexical analyzer returns a single quantity, the token, to the parser. To pass an attribute value
with information about the lexeme, we can set a global variable called yylval.
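
The convention of returning a token code while passing the attribute through a global can be sketched by hand in C. The token code INTEGER, the union member anInt (borrowed from the template later in these notes), the value 258 and the function scan_token are illustrative assumptions; a Lex-generated scanner provides yylex instead, and the type of yylval normally comes from the parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define INTEGER 258          /* hypothetical token code */

/* Hypothetical attribute type; real parsers usually declare their own. */
typedef union {
    int anInt;
} YYSTYPE;

YYSTYPE yylval;              /* global shared by scanner and parser */

/* Scans one token from *input and advances *input past it.
   Returns INTEGER (with the value in yylval.anInt), the character
   itself for any other single character, or 0 at end of input. */
int scan_token(const char **input)
{
    assert(input != NULL && *input != NULL);
    const char *p = *input;

    if (*p == '\0')
        return 0;

    if (isdigit((unsigned char)*p)) {
        int value = 0;
        while (isdigit((unsigned char)*p)) {
            value = value * 10 + (*p - '0');
            p++;
        }
        yylval.anInt = value;   /* attribute travels via the global */
        *input = p;
        return INTEGER;
    }

    *input = p + 1;
    return (unsigned char)*p;   /* single-character token */
}
```

The return value tells the parser what kind of token was found; the parser reads yylval only when the token kind carries an attribute.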
LEX DEFINITIONS
Table with two columns:
1. regular expressions
2. actions
eg:
integer printf("found keyword INT");
If an action has more than one statement, enclose it within { }
REGULAR EXPRESSIONS
text characters: a - z, 0 - 9, space...
\n: newline.
\t: tab.
operators: " \ [ ] ^ - ? . * + | ( ) $ / { } % < >
"...": treat '...' as text characters (useful for spaces).
\: treat next character as text character.
.: match anything.
[...]: match anything within []
?: match zero or one time, eg: ab?c -> ac, abc
*: match zero or more times, eg: ab*c -> ac, abc, abbc...
+: match one or more times, eg: ab+c -> abc, abbc...
(...): group ..., eg: (ab)+ -> ab, abab...
|: alternation, eg: ab|cd -> ab, cd
{n,m}: repetition, eg: a{1,3} -> a, aa, aaa
{defn}: substitute defn (from first section).
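
The operator examples above can be checked outside Lex with the POSIX regex library, whose extended syntax shares ?, *, +, grouping, | and {n,m} with Lex. This is a sketch under that assumption; the helper name full_match is hypothetical, and Lex-specific features such as "..." quoting and {defn} substitution are not covered by it.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stddef.h>

/* Returns 1 if the whole of text matches pattern, 0 if it does not,
   and -1 if the pattern is too long or fails to compile. */
int full_match(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    char anchored[256];
    regex_t re;
    int n, result;

    /* Wrap in ^( )$ so that only a match of the entire string counts. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    result = (regexec(&re, text, 0, NULL, 0) == 0) ? 1 : 0;
    regfree(&re);
    return result;
}
```

For instance, full_match("ab?c", "ac") and full_match("a{1,3}", "aaa") both succeed, while full_match("ab+c", "ac") does not, in line with the list above.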
Actions
; -> Null action.
ECHO; -> printf("%s", yytext);
{...} -> Multi-statement action.
return yytext; -> send contents of yytext to the parser.
yytext : C-string of matched characters (make a copy if necessary!)
yyleng : Length of the matched characters.
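
The advice to copy yytext matters because the scanner reuses its buffer, so the text of one match is overwritten by the next. A minimal sketch of such a copy follows; the helper name copy_lexeme is hypothetical, and in an action it would be called with the scanner's own yytext and yyleng.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated, NUL-terminated copy of the first len
   characters of text, or NULL if allocation fails.
   The caller owns the result and must free() it. */
char *copy_lexeme(const char *text, size_t len)
{
    assert(text != NULL);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

Inside a Lex action this might look like char *name = copy_lexeme(yytext, yyleng); the copy stays valid after later matches, at the cost of one allocation that must eventually be freed.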
Figure 2: LEX Template
[0-9]+  {
            yylval.anInt = atoi((char *) &yytext[0]);
            return INTEGER;
        }
.       return *yytext;