0% found this document useful (0 votes)
138 views20 pages

Introduction To Lex

This document introduces the flex tool for generating scanners. It discusses the structure of flex input files which contain regular expression rules and actions. The definitions, rules, and user code sections are described. Regular expressions and how they are used to match patterns are also covered. Finally, it mentions how flex can cooperate with Yacc to generate parsers and discusses additional flex resources.

Uploaded by

Avi Sharma
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
138 views20 pages

Introduction To Lex

This document introduces the flex tool for generating scanners. It discusses the structure of flex input files which contain regular expression rules and actions. The definitions, rules, and user code sections are described. Regular expressions and how they are used to match patterns are also covered. Finally, it mentions how flex can cooperate with Yacc to generate parsers and discusses additional flex resources.

Uploaded by

Avi Sharma
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 20

Introduction to Lex

Ying-Hung Jiang
[email protected]
Outlines

 Review of scanner
 Introduction to flex
 Regular Expression
 Cooperate with Yacc
 Resources
Review

 Parser -> scanner -> source code


 See teacher’s slide
flex
flex - fast lexical analyzer generator

 Flex is a tool for generating scanners.


 Flex source is a table of regular expressions
and corresponding program fragments.
 Generates lex.yy.c which defines a routine
yylex()
Format of the Input File

 The flex input file consists of three sections,


separated by a line with just %% in it:
definitions
%%
rules
%%
user code
Definitions Section

 The definitions section contains declarations of


simple name definitions to simplify the
scanner specification.
 Name definitions have the form:
name definition
 Example:
DIGIT [0-9]
ID [a-z][a-z0-9]*
Rules Section

 The rules section of the flex input contains a


series of rules of the form:
pattern action
 Example:
{ID} printf( "An identifier: %s\n", yytext );

 The yytext and yylength variable.


 If action is empty, the matched token is
discarded.
Action

 If the action contains a ‘{‘, the action spans


till the balancing ‘}‘ is found, as in C.
 An action consisting only of a vertical bar ('|')
means "same as the action for the next rule.“
 The return statement, as in C.
 In case no rule matches: simply copy the input
to the standard output (A default rule).
Precedence Problem

 For example: a “<“ can be matched by “<“ and


“<=“.
 The one matching most text has higher
precedence.
 If two or more have the same length, the rule
listed first in the flex input has higher
precedence.
User Code Section

 The user code section is simply copied to


lex.yy.c verbatim.
 The presence of this section is optional; if it is
missing, the second %% in the input file may be
skipped.
 In the definitions and rules sections, any
indented text or text enclosed in %{ and %} is
copied verbatim to the output (with the %{}'s
removed).
A Simple Example
%{
int num_lines = 0, num_chars = 0;
%}

%%
\n ++num_lines; ++num_chars;
. ++num_chars;

%%
main() {
yylex();
printf( "# of lines = %d, # of chars = %d\n",
num_lines, num_chars );
}
Any Question So Far?
Regular Expression
Regular Expression (1/3)
x match the character 'x'
. any character (byte) except newline
[xyz] a "character class"; in this case, the pattern
matches either an 'x', a 'y', or a 'z'
[abj-oZ] a "character class" with a range in it; matches
an 'a', a 'b', any letter from 'j' through 'o',
or a 'Z'
[^A-Z] a "negated character class", i.e., any character
but those in the class. In this case, any
character EXCEPT an uppercase letter.
[^A-Z\n] any character EXCEPT an uppercase letter or
a newline
Regular Expression (2/3)
r* zero or more r's, where r is any regular expression
r+ one or more r's
r? zero or one r's (that is, "an optional r")
r{2,5} anywhere from two to five r's
r{2,} two or more r's
r{4} exactly 4 r's
{name} the expansion of the "name" definition
(see above)
"[xyz]\"foo“ the literal string: [xyz]"foo
\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
then the ANSI-C interpretation of \x.
Otherwise, a literal 'X' (used to escape
operators such as '*')
Regular Expression (3/3)
\0 a NUL character (ASCII code 0)
\123 the character with octal value 123
\x2a the character with hexadecimal value 2a
(r) match an r; parentheses are used to override
precedence (see below)
rs the regular expression r followed by the
regular expression s; called "concatenation"
r|s either an r or an s
^r an r, but only at the beginning of a line (i.e.,
which just starting to scan, or right after a
newline has been scanned).
r$ an r, but only at the end of a line (i.e., just
before a newline). Equivalent to "r/\n".
Cooperate with Yacc

 The y.tab.h file (generated by –d option).


 See pascal.l and pascal.y for example.
Resources

 Google directory of lexer and parser


generators.
 Flex homepage:
http://www.gnu.org/software/flex
 Lex/yacc Win32 port: http://www.monmouth
.com/~wstreett/lex-yacc/lex-yacc.html
 Above links are available on course webpage.
 Flex(1)
Any Question?

You might also like