Presentation on
Compiler Design – Lexical Analysis
Tokens, Lexemes, Patterns, Role, Errors & Recovery
Department of Computer Science & Engineering
Pundra University of Science & Technology, Bogura – 5800
Presenter: Md. Riaz
ID: 0322310105101024
Friday, Sept 05, 2025 Department of Computer Science & Engineering, PUB 2
Outline
1. Introduction
2. Tokens, Lexemes, Patterns
3. Real-Life Examples (C/Java/Python)
4. Role of Lexical Analyzer
5. Functions of Lexical Analyzer
6. Flow Diagram
7. Lexical Errors
8. Error Recovery Strategies
9. Key Takeaways
Introduction to Lexical Analysis
🎯 First phase of a compiler
🎯 Converts source code into tokens
🎯 Also called Scanning
🎯 Reads the source program and provides tokens to the Syntax Analyzer
Lexical Analysis is the process of converting a stream of characters into meaningful tokens.
Lexical Analysis is the first phase of a compiler, also called 🔍 scanning.
Its main job is to read the raw source code (a sequence of characters) and convert it
into tokens that are meaningful units for the next stage, the parser (syntax analyzer).
Steps in Lexical Analysis
1⃣ Scanning the Input – Read characters
2⃣ Lexeme Identification – Recognize sequences
3⃣ Token Generation – Produce tokens
4⃣ Symbol Table Management – Store identifiers
5⃣ Error Detection – Find invalid tokens
6⃣ Output Token Stream – Pass to parser
Steps in Lexical Analysis
Scanning the Input
● Read the source program character by character.
● Example: int a = 5;
Lexeme Identification
● Recognize meaningful sequences of characters.
● Example: int, a, =, 5, ;
Token Generation
● Convert each lexeme into a token with type &
attributes.
● Example: int → <keyword, int>
Symbol Table Management
● Store identifiers (like variables, functions) with
their attributes (type, scope).
Error Detection
● Identify invalid or unrecognized sequences early.
● Example: int @x; (invalid @).
Output Token Stream
● Pass a clean sequence of tokens to the Syntax
Analyzer for parsing.
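The six steps above can be sketched as a minimal scanner. This is a simplified Python sketch, not a full C lexer: the token categories, regular expressions, and symbol-table layout are illustrative assumptions.

```python
import re

# Ordered patterns: the first alternative that matches wins
# (keywords are listed before identifiers on purpose).
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+(?:\.\d+)?"),
    ("OPERATOR",   r"[=+\-*/<>]"),
    ("PUNCT",      r"[;,(){}]"),
    ("SKIP",       r"\s+"),            # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source):
    tokens, symbol_table, pos = [], {}, 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if not m:                       # step 5: error detection
            raise SyntaxError(f"invalid character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":       # step 3: token generation
            tokens.append((m.lastgroup, m.group()))
            if m.lastgroup == "IDENTIFIER":
                # step 4: record identifiers in a (toy) symbol table
                symbol_table.setdefault(m.group(), {"type": None})
        pos = m.end()
    return tokens, symbol_table         # step 6: output token stream

tokens, table = scan("int a = 5;")
print(tokens)
# [('KEYWORD', 'int'), ('IDENTIFIER', 'a'), ('OPERATOR', '='), ('NUMBER', '5'), ('PUNCT', ';')]
```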
Tokens
Tokens are the smallest meaningful units in a program. Each token has a type
(e.g., keyword, identifier) and may carry an attribute (value). For example, the
literal '25' is a NUM token with value 25.
• Token → Category of lexeme
• Examples: keywords, identifiers, operators, literals
• Represented as <type, value> pairs
Lexeme   Token Type   Value
int      Keyword      –
25       Literal      25
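The <type, value> pair representation can be sketched directly; the `Token` name here is an illustrative choice, not a fixed convention:

```python
from collections import namedtuple

# A token as a <type, value> pair, as described above.
Token = namedtuple("Token", ["type", "value"])

print(Token("KEYWORD", "int"))   # Token(type='KEYWORD', value='int')
print(Token("NUM", 25))          # Token(type='NUM', value=25)
```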
Lexemes and Patterns
A lexeme is the actual sequence of characters that matches a token definition.
A pattern is the rule (often a regular expression) that describes the form of a token.
Patterns are usually described using regular expressions (REs).
For example, 'int' is a keyword lexeme, 'x' is an identifier, '=' is an operator, and
'10' is a literal.
• Lexeme → actual string matched
• Pattern → rule/regex describing a token
• Example: int x = 10;
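For the example `int x = 10;`, the patterns might look like the following regular expressions (a sketch; real lexer specifications are larger):

```python
import re

# Pattern (rule) for each token class, and the lexeme it matches
# in 'int x = 10;'.
PATTERNS = {
    "keyword":    r"\b(?:int|float|while|for)\b",  # matches lexeme 'int'
    "identifier": r"[A-Za-z_]\w*",                 # matches lexeme 'x'
    "operator":   r"=",                            # matches lexeme '='
    "number":     r"\d+",                          # matches lexeme '10'
}

assert re.fullmatch(PATTERNS["keyword"], "int")
assert re.fullmatch(PATTERNS["identifier"], "x")
assert re.fullmatch(PATTERNS["number"], "10")
```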
Functions of Lexical Analyzer
👉 Remove whitespace & comments – both are ignored by the compiler.
👉 Error detection – detects invalid tokens early.
👉 Interface with syntax analyzer – outputs tokens as input for the parser.
Flow Diagram
📄 Source Code → 🔍 Lexical Analyzer → 🔑 Tokens → 📐 Syntax Analyzer
This shows the pipeline: raw characters are scanned by the lexical analyzer,
converted to tokens, and then used by the parser to build the program’s
structure.
Example
Input: int sum = a + b;
Lexical Analyzer Output:
● Token (keyword, int)
● Token (identifier, sum)
● Token (operator, =)
● Token (identifier, a)
● Token (operator, +)
● Token (identifier, b)
● Token (punctuation, ;)
In short: Lexical Analysis breaks raw source code into tokens, removes unnecessary elements (spaces,
comments), manages identifiers, and passes clean structured tokens to the syntax analyzer.
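One common way to produce exactly this output is to match a general word pattern first and then consult a keyword table to decide between keyword and identifier. This is a sketch of that implementation style (common in hand-written lexers), with an illustrative keyword set:

```python
import re

KEYWORDS = {"int", "float", "while", "for", "return"}
# One alternation: words, digit runs, or any single non-word symbol.
WORD = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")

def tokenize(source):
    out = []
    for lexeme in WORD.findall(source):
        if lexeme in KEYWORDS:                          # keyword table lookup
            out.append(("keyword", lexeme))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            out.append(("identifier", lexeme))
        elif lexeme.isdigit():
            out.append(("number", lexeme))
        elif lexeme == ";":
            out.append(("punctuation", lexeme))
        else:
            out.append(("operator", lexeme))
    return out

print(tokenize("int sum = a + b;"))
# [('keyword', 'int'), ('identifier', 'sum'), ('operator', '='),
#  ('identifier', 'a'), ('operator', '+'), ('identifier', 'b'),
#  ('punctuation', ';')]
```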
Real-Life Examples (C/Java/Python)
● C: while (i <= 10)
● Java: System.out.println("Hi");
● Python: for i in range(5):
Tokens appear in all languages. In C, 'while' is a keyword token, 'i' is an identifier.
In Java, 'System', 'out', 'println' are identifiers, while in Python, 'for' and 'in' are
keywords. Though syntax differs, the process of tokenization is similar.
Role of Lexical Analyzer
1. Tokenization
● Breaks source code into tokens for the parser.
● Example: float x = 12.5; → tokens: float, x, =, 12.5, ;.
2. Removing Whitespace & Comments
● Compiler ignores spaces, tabs, and comments.
Example: int x; // variable declaration
● Tokens → int, x, ;
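Comment and whitespace removal can be sketched with two substitutions. This is a deliberately naive sketch (it would mishandle comment markers inside string literals, for example):

```python
import re

def strip_comments(source):
    """Remove //-style and /* */ comments, then collapse whitespace (a sketch)."""
    source = re.sub(r"//[^\n]*", "", source)               # line comments
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # block comments
    return " ".join(source.split())                        # collapse whitespace

print(strip_comments("int x; // variable declaration"))    # int x;
```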
Role of Lexical Analyzer
3. Interface with Syntax Analyzer
● Provides tokens with attributes (token name, lexeme, type).
● Acts as a bridge between raw code and parsing stage.
4. Symbol Table Management
● Stores identifiers (e.g., variable names, function names) with attributes (type, scope, memory location).
👉 In short: The lexical analyzer simplifies the input program, prepares
tokens, and passes them on for syntax analysis.
Error Handling in Lexical Analysis
Lexical errors occur when invalid characters or malformed tokens appear.
Types of Lexical Errors
● Invalid Characters: int @x = 5; (@ not allowed)
● Unterminated String: "Hello
● Invalid Number: 123.45.67
● Illegal Symbol: float%value; (% not valid here)
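The malformed-number case can be caught by checking the lexeme against the number pattern as a whole (a sketch; the pattern is the same simplified one used earlier):

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")   # one integer or decimal literal

def check_number(lexeme):
    """Report a lexical error if the lexeme is not one well-formed number."""
    if NUMBER.fullmatch(lexeme):
        return "ok"
    return f"lexical error: malformed number {lexeme!r}"

print(check_number("123.45"))     # ok
print(check_number("123.45.67"))  # lexical error: malformed number '123.45.67'
```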
Error Recovery Strategies
👉Panic Mode Recovery
● Skip invalid input until a valid token is found.
● Example: int #x; → skip #.
👉Error Productions
● Predefine common mistakes in grammar.
● Example: treating "endd" as "end".
👉Insertion/Deletion
● Insert or delete characters to make valid tokens.
● Example: pritn → print.
👉Replacement
● Replace an invalid character with a valid one.
● Example: flot → float.
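Panic mode, the simplest of these strategies, can be sketched as follows. The `valid` predicate (which characters may appear in a token) is an illustrative assumption, not a real language rule:

```python
def panic_mode(source, valid):
    """Panic-mode sketch: drop characters the lexer cannot classify, keep going."""
    kept, skipped = [], []
    for ch in source:
        (kept if valid(ch) else skipped).append(ch)
    return "".join(kept), skipped

clean, dropped = panic_mode("int #x;", lambda c: c.isalnum() or c in " =;_")
print(clean)    # int x;
print(dropped)  # ['#']
```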
Strategy
Error Type            Example       Fix Strategy
Invalid Character     int @x;       Panic Mode (skip @)
Unterminated String   "Hello        Insert missing "
Malformed Number      123.45.67     Delete extra .
Illegal Symbol        float%num;    Replace % → *
Key Takeaways
✅ Lexical Analysis = Scanning + Tokenization
✅ Steps: Scanning → Lexemes → Tokens → Parser
✅ Role: Simplifies code & bridges to parser
✅ Errors: Handled using recovery strategies
✅ Ensures smooth compilation
Any Questions?
Thank You.
