
Compiler Construction: Lexical Analysis
Lecture Slides | Based on CSC412 Module
Unit 1: The Scanner
OBJECTIVES

At the end of this unit, you should be able to:
• state the need for a compiler
• state the role of a compiler
• define the scanner
• state the functions of the scanner.

Introduction to lexical analysis, tokens, and scanning

• The lexical analyzer (scanner) reads source code character by character and converts it into tokens.
• Tokens: the smallest units in a program (keywords, identifiers, operators, etc.).
• The scanner simplifies syntax analysis by grouping meaningful symbols together.
Need for Lexical Analysis
• Simplifies parser design: tokens are easier to analyze than raw characters.
• Removes whitespace, comments, and unnecessary symbols.
• Keeps track of line numbers for error reporting.

Role of Lexical Analyzer
• Acts as an interface between the source code and the parser.
• Two approaches:
– Separate pass (writes tokens to an intermediate file).
– Integrated with the parser (called as needed).
• Example: if the input is x = y + 5;, the scanner generates: id(x) = id(y) + num(5) ;
The Scanner
• Definition: groups characters into meaningful tokens.
• Error handling: detects errors (e.g., incomplete strings, invalid symbols).
• Pattern matching: uses regular expressions to recognize tokens.
• Example: tokenizing int x = 10; (a sketch of such a scanner follows below)
– Keyword: int
– Identifier: x
– Operator: =
– Number: 10
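
As an illustration of this behaviour (not code from the module), a minimal hand-written scanner sketch in C could classify the tokens of int x = 10; roughly as follows; the printed token names and the single-keyword table are simplifying assumptions:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token classifier: prints one token per line for a tiny
   language with the single keyword "int", identifiers, numbers and
   one-character operators/punctuation. */
void scan(const char *src) {
    const char *p = src;
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }   /* skip whitespace */
        if (isalpha((unsigned char)*p)) {                     /* identifier or keyword */
            const char *start = p;
            while (isalnum((unsigned char)*p)) p++;
            int len = (int)(p - start);
            if (len == 3 && strncmp(start, "int", 3) == 0)
                printf("Keyword: %.*s\n", len, start);
            else
                printf("Identifier: %.*s\n", len, start);
        } else if (isdigit((unsigned char)*p)) {              /* number */
            const char *start = p;
            while (isdigit((unsigned char)*p)) p++;
            printf("Number: %.*s\n", (int)(p - start), start);
        } else {                                              /* operator or punctuation */
            printf("Operator: %c\n", *p++);
        }
    }
}

int main(void) {
    scan("int x = 10;");  /* prints Keyword, Identifier, Operator, Number, Operator(;) */
    return 0;
}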
Summary
• The scanner is the first phase of compilation.
• Converts raw input into tokens for further processing.
• Uses pattern-matching techniques (regular expressions, finite automata).
Unit 2: Hand Implementation of Lexical Analyzer
OBJECTIVES

At the end of this unit, you should be able to:
– list the various methods of constructing a lexical analyser
– describe the input buffering method of constructing a lexical analyser
– explain the transition diagram method of constructing a lexical analyser
– state the problems with the hand implementation method of constructing lexical analysers
– construct transition diagrams to handle keywords, identifiers and delimiters.
Introduction
Lexical analyzers can be implemented manually or using tools.
• Manual implementation is based on pattern-matching techniques.
• Two major methods:
– Input Buffering
– Transition Diagrams
Input Buffering
• The input buffer approach is a technique used in lexical analysis to optimize character-by-character scanning by storing the source code in memory buffers.
• Instead of reading one character at a time from disk (which is slow), the scanner loads a block of text into memory and uses two pointers to track tokens efficiently.
Working of Input Buffering
• The lexical analyzer reads input characters into a buffer (usually a two-buffer scheme).
• The forward pointer moves ahead to scan characters and recognize tokens.
• If a token is recognized, the lexeme is extracted and processed.
• If lookahead is needed, the forward pointer moves ahead while lexeme_begin stays.
• If the forward pointer reaches the end of a buffer half, the next half is loaded from the source file (a sketch of this scheme follows below).
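
A minimal sketch (not from the module) of the two-buffer scheme in C; the half size, the '\0' sentinel and the helper names are assumptions, and a real scanner would use a dedicated end-of-buffer character and also copy out the lexeme between lexeme_begin and forward:

#include <stdio.h>

#define BUF_HALF 4096
#define SENTINEL '\0'            /* marker placed after each buffer half */

static char buf[2 * BUF_HALF + 2];
static char *lexeme_begin;       /* start of the lexeme being scanned */
static char *forward;            /* lookahead pointer */
static FILE *src;

/* load one half of the buffer and terminate it with a sentinel */
static void load_half(char *half) {
    size_t n = fread(half, 1, BUF_HALF, src);
    half[n] = SENTINEL;
}

static void init(FILE *f) {
    src = f;
    load_half(buf);              /* fill the first half */
    lexeme_begin = forward = buf;
}

/* advance forward, reloading the other half when a sentinel is reached;
   a sentinel that is not at the end of a half means real end of input */
static int next_char(void) {
    while (*forward == SENTINEL) {
        if (forward == buf + BUF_HALF) {                 /* end of first half */
            load_half(buf + BUF_HALF + 1);
            forward = buf + BUF_HALF + 1;
        } else if (forward == buf + 2 * BUF_HALF + 1) {  /* end of second half */
            load_half(buf);
            forward = buf;
        } else {
            return EOF;
        }
    }
    return (unsigned char)*forward++;
}

int main(void) {
    init(stdin);
    long count = 0;
    while (next_char() != EOF) count++;   /* just count characters as a demo */
    printf("read %ld characters\n", count);
    return 0;
}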
Advantages of Input Buffering
• Speeds up lexical analysis (reduces character-by-character disk reads).
• Efficient handling of lookahead (especially for multi-character tokens like ==).
• Prevents unnecessary disk accesses (since input is loaded into memory buffers).
Disadvantages of Input Buffering
• Requires memory space to store the buffers.
• Complex implementation (managing buffer boundaries).
• Handling lookahead requires extra logic.


Transition Diagrams
• The Transition Diagram (TD) approach is a finite state machine (FSM)-based method used in lexical analysis to recognize tokens.
• A transition diagram consists of:
– States (nodes) → represent different steps in recognizing a token.
– Transitions (edges) → define how the scanner moves between states based on input characters.
– Final (accepting) state → when reached, a token is successfully recognized.
– Start state → the initial state where scanning begins.
– Error state → if an invalid character is encountered, the scanner rejects the input.
Example transition diagrams (written linearly):
(Start) → 'i' → 'f' → (Final) → "IF keyword"
(Start) → (Letter) → (Letter | Digit)* → (Final) → "Identifier"
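
As an illustration (not taken from the module), the identifier diagram above can be encoded directly as a switch on a state variable in C; the state numbers are arbitrary:

#include <ctype.h>
#include <stdio.h>

/* state 0 = start, state 1 = inside an identifier (final state) */
int is_identifier(const char *s) {
    int state = 0;
    for (; *s; s++) {
        switch (state) {
        case 0:                                  /* start state: must see a letter */
            if (isalpha((unsigned char)*s)) state = 1;
            else return 0;                       /* error state: reject */
            break;
        case 1:                                  /* letters or digits may follow */
            if (isalnum((unsigned char)*s)) state = 1;
            else return 0;
            break;
        }
    }
    return state == 1;                           /* accept if we end in the final state */
}

int main(void) {
    printf("%d %d %d\n", is_identifier("x1"), is_identifier("if"), is_identifier("9a"));
    /* prints 1 1 0; note that "if" is accepted as an identifier here --
       distinguishing keywords is the topic of the next slide */
    return 0;
}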


How to Handle Keywords
• There are two ways we can handle keywords.
– We can use the transition diagram for identifiers and, when a delimiter is reached, look up a dictionary (containing all the keywords) to see whether the identifier just scanned is a keyword or not (a sketch of this lookup follows below).
– Another way is to bring all the keywords together in a TD, i.e. construct a TD for each keyword.
• E.g. suppose the following keywords exist in a language: BEGIN, END, IF, THEN, ELSE.
• You can construct a single TD for all of them, giving something like the diagram below.
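
A minimal sketch (not from the module) of the first approach, a dictionary lookup applied to each identifier lexeme; the table contents follow the slide's example keywords:

#include <string.h>

static const char *keywords[] = { "BEGIN", "END", "IF", "THEN", "ELSE" };

/* return 1 if the lexeme is one of the language's keywords */
int is_keyword(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return 1;
    return 0;
}

/* A scanner using this approach emits a keyword token when is_keyword()
   returns 1 for a scanned identifier, and an id token otherwise. */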


Advantages of Transition Diagrams
• Clear visual representation of token recognition.
• Easy to implement using finite state machines (FSMs).
• Handles different token patterns in a structured way.

Disadvantages of Transition Diagrams
• Can become complex for large grammars.
• Hard to manage when multiple tokens have overlapping patterns.
• Does not inherently handle whitespace and comments.


Unit 3: Automatic Generation of Lexical Analyzer
• In the previous unit, we discussed manual (hand) implementation of lexical analysers and the difficulties involved. This unit introduces automatic lexical analyser generation using regular expressions and tools such as Lex and Flex.
• Instead of writing a lexer manually, we can use automated tools that take a description of the tokens and generate a lexer automatically.
Language Theory Background
To understand lexical analysers, we need a foundation in language theory.

Definitions
– Symbol: a single character (e.g., a, b, 0, 1).
– Alphabet: a finite set of symbols (e.g., {0, 1}, ASCII, Unicode).
– String (word, sentence): a sequence of characters from an alphabet (e.g., "hello").
– Language: a set of valid strings (words) defined over an alphabet (e.g., { "aa", "ab", "ba", "bb" }).
Operations on Strings
• Concatenation: combining two strings ("ab" + "cd" = "abcd").
• Exponentiation: repeating a string multiple times ("a"^3 = "aaa").
• Prefix, suffix, substring, subsequence: different ways of extracting parts of a string.
Operations on Languages
• Union (L1 ∪ L2) → combines languages (e.g., { "a", "b" } ∪ { "b", "c" } = { "a", "b", "c" }).
• Concatenation (L1L2) → joins words (e.g., { "a", "b" } { "c", "d" } = { "ac", "ad", "bc", "bd" }).
• Exponentiation (L^n) → repeats elements (for L = { "a", "b" }, L^2 = { "aa", "ab", "ba", "bb" }).
• Kleene closure (L*) → zero or more repetitions, including the empty string ({ ε, "a", "aa", "aaa", ... }).
• The Kleene closure always includes ε (the empty string).


Regular Expressions (REs)
• Regular expressions (REs) are used to describe patterns in text. They define regular languages, which are recognized by finite automata.
• Definition of regular expressions (basic rules):
– ε → represents the empty string.
– A single character (e.g., a, b, 0, 1) represents the language { a }.
– Union (|) → matches either pattern (a | b matches "a" or "b").
– Concatenation (rs) → joins two expressions ("ab" means "a" followed by "b").
– Kleene star (*) → matches zero or more repetitions (a* matches ε, "a", "aa", "aaa", ...).
Example:
The RE (a|b)*aa represents words that:
– contain only "a" and "b";
– end with "aa".
Examples: "aa", "baa", "abaa", "bbaaa", etc. (a quick test of this pattern follows below).

• Regular expressions are used in languages and tools such as: awk, Java, JavaScript, Perl, Python, grep, sed.
Lex is a tool that extends regular expressions for tokenizing input, adding special symbols (operators) to the basic RE notation.

Example of a Lex regular expression:
• "a.*b" matches an "a", followed by any characters, ending in "b".
Tokens, Patterns, Lexemes, and Attributes

Tokens
• A token is a pair: <token_name, optional_attribute>
• Examples: <id, ptr_to_symbol_table>, <number, 42>

Patterns and Lexemes
• Pattern: a description of lexemes (usually given as a regular expression).
• Lexeme: a sequence of characters matching a pattern.
Examples of Patterns and Lexemes
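Typical illustrative examples (in the style of standard textbooks, not taken from the original slide):

Token     Pattern                          Sample lexemes
id        letter (letter | digit)*         count, x1, total
number    digit+                           42, 10, 789
relop     < | <= | = | <> | > | >=         <=, >
if        the characters i f               if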
Specifying a Lexical Analyser with Lex

Structure of a Lex Program
• Lex programs consist of three sections:
Declarations
%%
Translation Rules
%%
Auxiliary Functions
• Each translation rule has the form:
pattern { action }
• Pattern: a regular expression defining a token.
• Action: C code executed when the pattern matches.


Example of a Lex Program
• Lex program to count words, numbers, and lines (a sketch of such a program follows below).
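
A minimal sketch of such a Lex specification (assuming a standard Lex/Flex toolchain; the file name count.l is hypothetical):

%{
#include <stdio.h>
int words = 0, numbers = 0, lines = 0;   /* counters updated by the rules below */
%}
%%
[0-9]+           { numbers++; }          /* a run of digits counts as a number */
[a-zA-Z]+        { words++; }            /* a run of letters counts as a word  */
\n               { lines++; }            /* count line ends                    */
.                { /* ignore everything else */ }
%%
int yywrap(void) { return 1; }
int main(void) {
    yylex();
    printf("Words: %d, Numbers: %d, Lines: %d\n", words, numbers, lines);
    return 0;
}

Building it with lex count.l && cc lex.yy.c -o count (or the flex equivalent) produces the scanner.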
Example Input and Output
• Input text:
Hello world 123
This is Lex 456
It counts words and numbers 789
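
For this input, the sketch above would report Words: 10, Numbers: 3, Lines: 3 (assuming each input line ends with a newline).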


Steps in Lex Implementation
• Read the language (token) specification.
• Construct an NFA (Non-Deterministic Finite Automaton).
• Convert the NFA to a DFA (Deterministic Finite Automaton).
• Optimize the DFA.
• Generate the scanner tables and code.

Key question: why are finite automata important in lexical analysis?
• They efficiently recognize patterns (e.g., detecting keywords like for, while).
• They convert regular expressions (REs) into a structured format that a compiler can process.
Unit 4: Implementing a Lexical Analyzer
Introduction
• The lexical analyser (or scanner) is the first stage of a compiler.
• It reads the source code and converts it into tokens (basic units like keywords, identifiers, operators, etc.).
• In this unit, we discuss finite state machines, the recognisers that recognise regular expressions, and how to convert REs into finite automata and vice versa.
Objectives

By the end of this unit, you should be able to:
– define finite automata (FA)
– convert regular expressions (REs) to NFAs
– convert an NFA to a DFA

Finite Automata
• A finite automaton is a machine that takes an input string X and decides whether it belongs to a specific language L.
• Types of finite automata:
– Nondeterministic Finite Automaton (NFA)
– Deterministic Finite Automaton (DFA)
Nondeterministic Finite Automaton (NFA)
• An NFA is a 5-tuple (S, Σ, δ, s₀, F) consisting of:
• ✔️ a finite non-empty set of states S
• ✔️ a finite non-empty input alphabet Σ
• ✔️ a transition function δ mapping a state and an input symbol (or ε) to a set of states, i.e. δ: S × (Σ ∪ {ε}) → P(S)
• ✔️ an initial state s₀ in S
• ✔️ a set of final states F ⊆ S


Example: How an NFA Works
– The machine can be in multiple states at once (because of multiple transitions on the same input).
– Epsilon (ε) moves allow moving between states without consuming input.

Acceptance condition:
– An NFA accepts a string if it can move from the start state to a final state while reading the string.
• The language defined by an NFA is the set of strings accepted by the NFA (see the simulation sketch below).
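
To make the acceptance condition concrete (an illustration, not taken from the slides), the following C sketch simulates a small hypothetical NFA for (a|b)*aa by keeping the set of live states in a bitmask; the nondeterminism shows up in state 0, which has two transitions on 'a':

#include <stdio.h>

/* 3-state NFA for (a|b)*aa:
   state 0: on 'a' -> {0,1}, on 'b' -> {0}
   state 1: on 'a' -> {2}
   state 2: accepting, no outgoing transitions */
int accepts(const char *s) {
    unsigned cur = 1u << 0;                    /* set of current states as a bitmask */
    for (; *s; s++) {
        unsigned next = 0;
        if (cur & (1u << 0)) {
            if (*s == 'a') next |= (1u << 0) | (1u << 1);
            else if (*s == 'b') next |= (1u << 0);
        }
        if (cur & (1u << 1)) {
            if (*s == 'a') next |= (1u << 2);
        }
        cur = next;
        if (!cur) return 0;                    /* no live states left: reject */
    }
    return (cur & (1u << 2)) != 0;             /* accept if a final state is live */
}

int main(void) {
    const char *tests[] = { "aa", "baa", "abaa", "ab", "b" };
    for (int i = 0; i < 5; i++)
        printf("%-5s %s\n", tests[i], accepts(tests[i]) ? "accepted" : "rejected");
    return 0;
}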
Deterministic Finite Automaton (DFA)
• A DFA is an NFA with no ε-moves and only one possible transition per input character.
• Key differences:

Feature                                    NFA      DFA
Multiple transitions for the same input?   ✅ Yes    ❌ No
ε-moves allowed?                           ✅ Yes    ❌ No
Deterministic behavior?                    ❌ No     ✅ Yes


Converting a Regular Expression to an NFA
• Regular expressions (REs) describe patterns in text. We can convert them to an NFA in five steps.
• Steps to convert RE → NFA:
– For a single character "a" → create two states with a transition on 'a'.
– For concatenation (AB) → link the two NFAs together.
– For alternation (A|B) → create a start state with ε-moves to both NFAs.
– For repetition (A)* → add ε-moves to allow looping.
– For grouping (AB)* → combine the rules above.
Converting NFA to DFA (Subset Construction Algorithm)
To create a DFA from an NFA, we:
– 1. Find ε-closures (all states reachable without consuming input).
– 2. Track all possible states the NFA could be in for a given input.
– 3. Merge states where possible to create a deterministic machine.

• Key challenge:
– The number of DFA states can be much larger than the number of NFA states.
– But a DFA is faster in execution because it has a single, clear transition path for each input (see the sketch below).
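
A minimal sketch (not from the module) of these steps in C, for a small hypothetical NFA recognising a*b with one ε-move; closure() implements step 1 (ε-closure) and step() performs the subset move for one input symbol, so the DFA states are the bitmask values that actually arise:

#include <stdio.h>

#define NSTATES 3

/* NFA for a*b:  0 --a--> 0,  0 --ε--> 1,  1 --b--> 2,  state 2 accepting */
static const unsigned eps[NSTATES]  = { 1u << 1, 0, 0 };  /* ε-moves per state */
static const unsigned on_a[NSTATES] = { 1u << 0, 0, 0 };  /* moves on 'a'      */
static const unsigned on_b[NSTATES] = { 0, 1u << 2, 0 };  /* moves on 'b'      */

/* ε-closure: keep adding states reachable by ε-moves until nothing changes */
unsigned closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NSTATES; s++)
            if (set & (1u << s)) set |= eps[s];
    } while (set != prev);
    return set;
}

/* one subset-construction step: move every live state on c, then take the closure */
unsigned step(unsigned set, char c) {
    unsigned next = 0;
    for (int s = 0; s < NSTATES; s++)
        if (set & (1u << s))
            next |= (c == 'a') ? on_a[s] : (c == 'b') ? on_b[s] : 0;
    return closure(next);
}

int accepts(const char *w) {
    unsigned set = closure(1u << 0);            /* DFA start state = closure({0}) */
    for (; *w; w++) set = step(set, *w);
    return (set & (1u << 2)) != 0;              /* accepting if it contains state 2 */
}

int main(void) {
    const char *tests[] = { "b", "aab", "aaa", "ba" };
    for (int i = 0; i < 4; i++)
        printf("%-4s %s\n", tests[i], accepts(tests[i]) ? "accepted" : "rejected");
    return 0;
}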
Conclusion & Summary
• NFA allows multiple paths and ε-moves.
• DFA ensures only one path for each input.
• Regular expressions → NFA → DFA for pattern matching in compilers.
