
Lexical Analysis for CS Students

It is all about lexical analysis

Uploaded by

harsh raj chikku

Lexical Analysis

HARSH RAJ
Galgotias University
Greater Noida
Overview
source program → lexical analyzer (scanner) → tokens → syntax analyzer (parser)
The lexical analyzer and the syntax analyzer both consult the symbol table manager.

• Main task: to read input characters and group them into “tokens.”
• Secondary tasks:
   • Skip comments and whitespace;
   • Correlate error messages with the source program (e.g., line number of error).
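
The two secondary tasks can be sketched together in a few lines of Python. This is only an illustration: the function name skip_blanks_and_comments, its parameters, and the choice to support only C-style /* ... */ comments are assumptions, not something the slides specify.

```python
def skip_blanks_and_comments(src, pos, line):
    """Return (pos, line) advanced past whitespace and /* ... */ comments."""
    while pos < len(src):
        if src[pos] == "\n":
            line += 1                      # keep the line count for error messages
            pos += 1
        elif src[pos] in " \t\r":
            pos += 1
        elif src.startswith("/*", pos):
            end = src.find("*/", pos + 2)
            if end == -1:
                raise SyntaxError(f"line {line}: unterminated comment")
            line += src.count("\n", pos, end)  # comments may span several lines
            pos = end + 2
        else:
            break                          # first character of the next token
    return pos, line


src = "/* pgm.c */\nint main"
pos, line = skip_blanks_and_comments(src, 0, 1)
print(pos, line, repr(src[pos:pos + 3]))
# prints: 12 2 'int'
```

Because the line counter travels with the position, any later error can be reported against the correct source line.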

Overview (cont’d)
Input file (character stream):
/* pgm.c */\nint main(int argc, char **argv) {\n\tint x, Y;\n\tfloat w; ...

Token sequence produced by the lexical analyzer:
keywd_int
identifier: “main”
left_paren
keywd_int
identifier: “argc”
comma
keywd_char
star
star
identifier: “argv”
right_paren
left_brace
keywd_int
...

CSc 453: Lexical Analysis 3
Implementing Lexical Analyzers
Different approaches:
• Using a scanner generator, e.g., lex or flex. This automatically generates a lexical analyzer from a high-level description of the tokens. (easiest to implement; least efficient)
• Programming it in a language such as C, using the I/O facilities of the language. (intermediate in ease, efficiency)
• Writing it in assembly language and explicitly managing the input. (hardest to implement, but most efficient)
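
To give a feel for the "high-level description of the tokens" that a scanner generator consumes, here is a rough Python analogue built on the standard re module. It is not lex or flex syntax; the names TOKEN_SPEC, MASTER, and scan are hypothetical, and the integer pattern (one or more digits) is an assumption.

```python
import re

# Hypothetical token table: (token name, pattern) pairs, in priority order.
TOKEN_SPEC = [
    ("integer_const", r"[0-9]+"),
    ("identifier",    r"[A-Za-z][A-Za-z0-9_]*"),
    ("assg_op",       r"="),
    ("skip",          r"[ \t\n]+"),   # whitespace: matched, then dropped
    ("error",         r"."),          # any other single character
]
# One combined pattern with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Yield (token name, matched text) pairs; whitespace is dropped."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "skip":
            yield m.lastgroup, m.group()


print(list(scan("count = 123")))
# prints: [('identifier', 'count'), ('assg_op', '='), ('integer_const', '123')]
```

The table is the only part that changes from language to language, which is why the generator approach is the easiest of the three to implement.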



Lexical Analysis: Terminology
• token: a name for a set of input strings with related structure.
   Example: “identifier,” “integer constant”
• pattern: a rule describing the set of strings associated with a token.
   Example: “a letter followed by zero or more letters, digits, or underscores.”
• lexeme: the actual input string that matches a pattern.
   Example: count
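
The relationship between the three terms can be checked directly. In this small sketch the pattern quoted above for the identifier token is written as a Python regular expression; the names IDENTIFIER_PATTERN and is_identifier_lexeme are hypothetical, and "letter" is assumed to mean an ASCII letter.

```python
import re

# Pattern for the token "identifier":
# a letter followed by zero or more letters, digits, or underscores.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_identifier_lexeme(text):
    """True if the whole string matches the identifier pattern."""
    return IDENTIFIER_PATTERN.fullmatch(text) is not None


for candidate in ["count", "x_1", "9lives"]:
    print(candidate, is_identifier_lexeme(candidate))
# prints:
# count True
# x_1 True
# 9lives False
```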

Examples
Input: count = 123
Tokens:
identifier : Rule: “letter followed by …”
Lexeme: count
assg_op : Rule: =
Lexeme: =
integer_const : Rule: “digit followed by …”
Lexeme: 123

Algorithm / Pseudo code
BEGIN
  Initialize character pointer to the start of the source code
  WHILE not end of source code
    Skip any white space and newlines
    IF character is a letter
      (begin identifier or keyword)
      WHILE character is a letter or digit
        Add character to current token
        Advance character pointer
      END WHILE
      IF current token is a keyword
        Output keyword token
      ELSE
        Output identifier token
      END IF
    ELSE IF character is a digit
      (begin number)
      WHILE character is a digit
        Add character to current token
        Advance character pointer
      END WHILE
      Output number token
    ELSE
      Output error token
      Advance character pointer
    END IF
  END WHILE
END
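
The pseudocode translates almost line for line into Python. The sketch below is one possible rendering, not the only one: the function name tokenize and the keyword set are assumptions (the keywords are borrowed from the earlier input example), and Python's isalpha and isalnum also accept non-ASCII letters.

```python
KEYWORDS = {"int", "char", "float"}  # assumed keyword set, from the earlier example


def tokenize(source):
    """Follow the pseudocode: return a list of (kind, text) tokens."""
    tokens = []
    i = 0                                   # character pointer
    while i < len(source):
        ch = source[i]
        if ch.isspace():                    # skip white space and newlines
            i += 1
        elif ch.isalpha():                  # begin identifier or keyword
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            word = source[start:i]
            kind = "keyword" if word in KEYWORDS else "identifier"
            tokens.append((kind, word))
        elif ch.isdigit():                  # begin number
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        else:                               # anything else is an error token
            tokens.append(("error", ch))
            i += 1
    return tokens


print(tokenize("int x1 = 42"))
# prints: [('keyword', 'int'), ('identifier', 'x1'), ('error', '='), ('number', '42')]
```

Note that "=" comes out as an error token: the pseudocode recognizes only letters and digits, so operators and punctuation would need their own branches.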
Regular Expressions
A pattern notation for describing certain kinds of sets of strings:
Given an alphabet Σ:
• ε is a regular exp. (denotes the language {ε})
• for each a ∈ Σ, a is a regular exp. (denotes the language {a})
• if r and s are regular exps. denoting L(r) and L(s) respectively, then so are:
   • (r) | (s) (denotes the language L(r) ∪ L(s))
   • (r)(s) (denotes the language L(r)L(s))
   • (r)* (denotes the language L(r)*)
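
The three operations can be tried out on small languages represented as Python sets of strings. This is only a sketch: the function names union, concat, and star are hypothetical, and because L(r)* is infinite, star takes an assumed max_len cutoff and returns only the strings up to that length.

```python
import re


def union(l1, l2):
    """L(r) ∪ L(s)"""
    return l1 | l2


def concat(l1, l2):
    """L(r)L(s): every string of l1 followed by every string of l2."""
    return {x + y for x in l1 for y in l2}


def star(lang, max_len):
    """Strings of L* with at most max_len characters."""
    result = {""}              # the empty string ε is always in L*
    frontier = {""}
    while frontier:            # extend by one more string of lang per round
        frontier = {x + y for x in frontier for y in lang
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result


a, b = {"a"}, {"b"}
strings = star(union(a, b), 2)
print(sorted(strings, key=lambda s: (len(s), s)))
# prints: ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# Cross-check against Python's own regular expression engine.
print(all(re.fullmatch(r"(a|b)*", s) for s in strings))
# prints: True
```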

Working of Lexical Analyzer

Conclusion
• Key Takeaways:
   • Definition
   • Purpose
   • Components
• Importance in Compiler Design:
   • Error Detection
   • Efficiency
   • Foundation for Parsing
• Practical Considerations:
   • Token Definitions
   • Handling Errors
   • Tools and Libraries



Thank You!
