Compiler_Construction_Lexical_Analysis
Lecture Slides | Based on CSC412
Module
Unit 1: The Scanner
OBJECTIVES
• Keyword: int
• Identifier: x
• Operator: =
• Number: 10
Summary
• The scanner is the first phase of compilation.
lexical analyser
– State the problems with the hand-implementation method
of constructing lexical analysers
– Transition Diagrams
Input Buffering
• The Input Buffer Approach is a technique used in lexical
analysis to optimize character-by-character scanning by
storing the source code in memory buffers.
disk reads).
memory buffers).
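A minimal sketch of the buffering idea, assuming a single fixed-size buffer that is refilled whenever it runs out. The class name BufferedScannerInput, the helper read_all, and the tiny buffer size are hypothetical and chosen only for illustration.

```python
import io


class BufferedScannerInput:
    """Hands out one character at a time while reading the source in blocks."""

    def __init__(self, stream, buffer_size=4096):
        self.stream = stream
        self.buffer_size = buffer_size
        self.buffer = ""
        self.pos = 0
        self.reads = 0  # how many block reads were issued

    def _refill(self):
        # One block read replaces many single-character reads.
        self.buffer = self.stream.read(self.buffer_size)
        self.pos = 0
        self.reads += 1

    def next_char(self):
        """Return the next character, or None at end of input."""
        if self.pos >= len(self.buffer):
            self._refill()
            if not self.buffer:
                return None
        ch = self.buffer[self.pos]
        self.pos += 1
        return ch


def read_all(scanner_input):
    """Drain the input character by character (illustrative helper)."""
    chars = []
    while (ch := scanner_input.next_char()) is not None:
        chars.append(ch)
    return "".join(chars)


source = BufferedScannerInput(io.StringIO("int x = 10;"), buffer_size=4)
text = read_all(source)
```

Here the scanner still sees one character per call, but the underlying stream is touched only once per block (plus one final read that detects end of input).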
Disadvantages of Input Buffering
• You can construct a single TD for all of them and then you
Definitions
– Symbol: A single character (e.g., a, b, 0, 1).
– Alphabet: A finite set of symbols (e.g., {0, 1}, ASCII,
Unicode).
– String (word, sentence): A finite sequence of symbols drawn from an
alphabet (e.g., "hello").
– Language: A set of valid strings (words) defined over an
alphabet (e.g., { "aa", "ab", "ba", "bb" }).
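The definitions above can be tried out in a few lines of Python. This is only a sketch: the helper names strings_of_length and is_over are made up for illustration, and symbols are modelled as one-character Python strings.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """All strings of exactly n symbols over the given alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=n)}


def is_over(string, alphabet):
    """True if every symbol of the string belongs to the alphabet."""
    return all(symbol in alphabet for symbol in string)


alphabet = {"a", "b"}                  # a finite set of symbols
language = {"aa", "ab", "ba", "bb"}    # a set of strings over that alphabet

# In this particular example the language happens to be every
# string of length 2 over {a, b}.
same = strings_of_length(alphabet, 2) == language
```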
Operations on Strings
Tokens
• A token is a pair: <token_name, optional_attribute>
Example:
• <id, ptr_to_symbol_table>, <number, 42>
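A small sketch of how a scanner might produce such pairs for the input int x = 10. The token names, the keyword set, and the list-based symbol table are assumptions made for this example; here the attribute of an id token is an index into the symbol table, standing in for the pointer.

```python
import re

KEYWORDS = {"int"}  # assumed keyword set for this example

# Each entry is (token_name, pattern); order decides which pattern wins.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("operator", r"="),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (tokens, symbol_table); each token is (token_name, attribute)."""
    symbol_table = []
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace produces no token
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append(("keyword", lexeme))
        elif kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme)))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, symbol_table = tokenize("int x = 10")
```

Keywords are matched by the identifier pattern first and then reclassified, which is one common design choice; a separate keyword pattern would work as well.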
• pattern { action }
• Input Text:
transitions).
input.
Acceptance Condition:
– An NFA accepts a string if it can move from the start state to a final
state along at least one path that consumes the entire string.
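The acceptance condition can be sketched by tracking the set of states the NFA could be in after each symbol. The example NFA below (strings over {a, b} that end in "ab", with one ε-move) and the names EPSILON, epsilon_closure, and nfa_accepts are assumptions made for illustration.

```python
EPSILON = ""  # label used here for ε-moves

# Transition table: (state, symbol) -> set of possible next states.
NFA = {
    (0, EPSILON): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
}
START, FINALS = 0, {3}


def epsilon_closure(states, delta):
    """All states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, EPSILON), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def nfa_accepts(string, delta=NFA, start=START, finals=FINALS):
    """True if at least one path ends in a final state after the whole string."""
    current = epsilon_closure({start}, delta)
    for symbol in string:
        moved = set()
        for state in current:
            moved |= delta.get((state, symbol), set())
        current = epsilon_closure(moved, delta)
    return bool(current & finals)
```

For example, nfa_accepts("aab") holds because one of the possible paths ends in state 3, while nfa_accepts("abb") does not, since no path ends in a final state.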
• Key Differences:
• Key Challenge:
– A DFA can have far more states than the equivalent NFA (up to 2^n
states for an NFA with n states).
– But a DFA is faster in execution because each state has exactly one
transition for each input symbol.
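The trade-off can be seen with a sketch of the subset construction on a textbook-style example: the language "the n-th symbol from the end is a" over {a, b}. The function names are hypothetical, and the example NFA has no ε-moves, so the ε-closure step is left out here.

```python
def build_nfa(n):
    """NFA with n + 1 states for 'the n-th symbol from the end is a'."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for state in range(1, n):
        delta[(state, "a")] = {state + 1}
        delta[(state, "b")] = {state + 1}
    return delta, 0, {n}


def subset_construction(delta, start, finals, alphabet=("a", "b")):
    """Build a DFA whose states are sets of NFA states."""
    start_set = frozenset({start})
    dfa = {}
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in delta.get((s, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_finals = {state for state in seen if state & finals}
    return dfa, start_set, dfa_finals, seen


def dfa_accepts(string, dfa, start_set, dfa_finals):
    """Exactly one table lookup per input symbol."""
    state = start_set
    for symbol in string:
        state = dfa[(state, symbol)]
    return state in dfa_finals


nfa_delta, nfa_start, nfa_finals = build_nfa(3)   # 4 NFA states
dfa, dfa_start, dfa_finals, dfa_states = subset_construction(
    nfa_delta, nfa_start, nfa_finals
)
```

With n = 3 the NFA has 4 states while the constructed DFA has 8, and the gap doubles with each increase of n. In exchange, dfa_accepts follows a single transition per symbol instead of tracking a set of states.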
Conclusion & Summary
• NFA allows multiple paths and ε-moves.
• DFA ensures only one path for each input.
• Regular expressions → NFA → DFA for pattern matching in
compilers.