
NFA to DFA Conversion Techniques

The document discusses translating regular expressions to nondeterministic finite automata (NFA) and then to deterministic finite automata (DFA). It defines NFA and DFA formally and explains how to represent them using graphs and transition tables. The key steps are: 1) Translate regular expressions to NFA. 2) Use the subset construction algorithm to convert the NFA to a DFA, which involves taking the epsilon-closure of states and transitions. 3) The resulting DFA is deterministic and can efficiently recognize tokens in the regular language.

The majority of the text, diagrams, and tables in these slides are based on the textbook Compilers: Principles, Techniques, and Tools by Aho, Lam, Sethi, and Ullman.
• Translate regular expressions to NFA
• Translate NFA to an efficient DFA

[Figure: regular expressions → NFA → DFA, where the NFA → DFA step is optional. Tokens are recognized either by simulating the NFA or by simulating the DFA.]
Nondeterministic finite automata (NFA) have no restrictions on the labels
of their edges. A symbol can label several edges out of the same state, and
ε, the empty string, is a possible label.

• An NFA is a 5-tuple (S, Σ, δ, s0, F) where
S is a finite set of states
Σ is a finite set of symbols, the alphabet
δ is a mapping from S × (Σ ∪ {ε}) to sets of states
s0 ∈ S is the start state
F ⊆ S is the set of accepting (or final) states
• An NFA can be diagrammatically represented by a labeled directed graph called a
transition graph. This graph is very much like a transition diagram, except:
• The same symbol can label edges from one state to several different states,
and
• An edge may be labeled by ε, the empty string, instead of, or in addition to,
symbols from the input alphabet.
• The transition graph for an NFA recognizing the language of regular expression
(a|b)*abb.
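[Figure: transition graph with states 0, 1, 2, 3. State 0 loops on a and b; edges lead from 0 to 1 on a, from 1 to 2 on b, and from 2 to 3 on b.]
S = {0,1,2,3}
Σ = {a, b}
s0 = 0
F = {3}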
• The mapping δ of an NFA can be represented in a transition table

δ(0,a) = {0,1}
δ(0,b) = {0}
δ(1,b) = {2}
δ(2,b) = {3}
• An NFA accepts an input string x if and only if there is some path with
edges labeled with symbols from x in sequence from the start state to
some accepting state in the transition graph

• A state transition from one state to another on the path is called a move

• The language defined by an NFA is the set of input strings it accepts,
such as (a|b)*abb for the example NFA
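The acceptance condition above can be sketched in Python. This is a minimal illustration, not the slides' own code: it encodes the example transition table for (a|b)*abb as a dictionary and tracks every state reachable after each symbol, which is equivalent to asking whether some path exists. The names DELTA, START, ACCEPTING, and nfa_accepts are hypothetical.

```python
# Transition table of the example NFA for (a|b)*abb (states 0..3, no ε-moves).
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(x: str) -> bool:
    """Return True if some path labeled x leads from START to an accepting state."""
    current = {START}  # all states reachable after the input read so far
    for symbol in x:
        # Follow every edge labeled `symbol` out of every current state.
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    return bool(current & ACCEPTING)


print(nfa_accepts("abb"))     # True
print(nfa_accepts("aababb"))  # True
print(nfa_accepts("abab"))    # False
```

Tracking a set of states avoids backtracking over individual paths; the same idea reappears in the subset construction.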
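Lex specification with regular expressions → NFA
p1 { action1 }
p2 { action2 }
…
pn { actionn }

[Figure: the combined NFA has a new start state s0 with ε-transitions to N(p1), N(p2), …, N(pn); the accepting state of each N(pi) carries actioni. Subset construction then turns this NFA into a DFA.]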

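[Figure: NFA construction templates, each with start state i and accepting state f]
• ε: a single ε-edge from i to f
• a: a single edge labeled a from i to f
• r1|r2: ε-edges from i into N(r1) and N(r2), and ε-edges from both to f
• r1r2: N(r1) followed by N(r2), from i to f
• r*: ε-edges from i into N(r) and from N(r) to f, plus ε-edges that let N(r) repeat or be skipped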
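a { action1 }
abb { action2 }
a*b+ { action3 }

[Figure: the three NFAs. N(a): 1 to 2 on a. N(abb): 3 to 4 on a, 4 to 5 on b, 5 to 6 on b. N(a*b+): 7 loops on a, 7 to 8 on b, 8 loops on b. They are combined with a new start state 0 that has ε-transitions to states 1, 3, and 7.]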
Examples
(a|b)*a(a|b)+
(ab)+(ab)*ab
• A deterministic finite automaton is a special case of an NFA
• No state has an ε-transition
• For each state s and input symbol a there is at most one edge labeled a leaving s
• Each entry in the transition table is a single state
• At most one path exists to accept a string
• Simulation algorithm is simple
• INPUT: An input string x terminated by an end-of-file character eof. A DFA D
with start state s0, accepting states F, and transition function move.
• OUTPUT: Answer “yes” if D accepts x; “no” otherwise.
• METHOD: Apply the algorithm to the input string x. The function move(s,c)
gives the state to which there is an edge from state s on input c. The function
nextChar returns the next character of the input string x.
A DFA that accepts (a|b)*abb

[Figure: transition graph of the DFA, with states 0, 1, 2, 3 and start state 0.]
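The simulation method described above can be sketched as follows. This is an illustrative Python version under stated assumptions: the MOVE table is assumed to be that of the textbook DFA for (a|b)*abb (start state 0, accepting state 3), EOF is a hypothetical end-of-input marker, and next_char stands in for nextChar.

```python
EOF = None  # hypothetical end-of-file marker

# Assumed transition function move(s, c) of a DFA accepting (a|b)*abb.
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}


def simulate_dfa(x, move=MOVE, s0=0, accepting=frozenset({3})):
    """Answer "yes" if the DFA accepts x, "no" otherwise."""
    chars = iter(x)

    def next_char():
        # Returns the next character of x, or EOF when x is exhausted.
        return next(chars, EOF)

    s = s0
    c = next_char()
    while c is not EOF:
        if (s, c) not in move:  # no edge on c: the input cannot be accepted
            return "no"
        s = move[(s, c)]
        c = next_char()
    return "yes" if s in accepting else "no"


print(simulate_dfa("aabb"))  # yes
print(simulate_dfa("abba"))  # no
```

Each input character costs one table lookup, which is why simulating a DFA is simpler than simulating an NFA.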
• The subset construction algorithm converts an NFA into a DFA using:
ε-closure(s) = {s} ∪ {t | s →ε … →ε t}
ε-closure(T) = ∪ (s ∈ T) ε-closure(s)
move(T,a) = {t | s →a t and s ∈ T}
• The algorithm produces:
Dstates is the set of states of the new DFA consisting of sets of states
of the NFA
Dtran is the transition table of the new DFA

[Figure: the combined NFA for a, abb, and a*b+ with states 0 to 8.]
ε-closure({0}) = {0,1,3,7}
move({0,1,3,7},a) = {2,4,7}
ε-closure({2,4,7}) = {2,4,7}
move({2,4,7},a) = {7}
ε-closure({7}) = {7}
move({7},b) = {8}
ε-closure({8}) = {8}
move({8},a) = ∅

Input a a b a: {0,1,3,7} →a {2,4,7} →a {7} →b {8} →a none
Also used to simulate NFAs
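The two helper operations can be sketched in Python as below. This is a minimal sketch, assuming the combined NFA for a, abb, and a*b+ has the edges listed in the NFA dictionary (reconstructed from the worked values above); EPS, NFA, epsilon_closure, and move are names chosen for this illustration.

```python
EPS = ""  # label used in this sketch for ε-edges

# Assumed edges of the combined NFA: state -> label -> set of target states.
NFA = {
    0: {EPS: {1, 3, 7}},
    1: {"a": {2}},
    2: {},
    3: {"a": {4}},
    4: {"b": {5}},
    5: {"b": {6}},
    6: {},
    7: {"a": {7}, "b": {8}},
    8: {"b": {8}},
}


def epsilon_closure(states):
    """All states reachable from `states` using ε-edges only."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """All states reachable from `states` by one edge labeled `symbol`."""
    return frozenset(t for s in states for t in NFA[s].get(symbol, ()))


state = epsilon_closure({0})
print(sorted(state))  # [0, 1, 3, 7]
for c in "aaba":
    state = epsilon_closure(move(state, c))
    print(c, sorted(state))  # a [2, 4, 7], a [7], b [8], a []
```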

S := -closure({s0})
Sprev := 
a := nextchar()
while S   do
Sprev := S
S := -closure(move(S,a))
a := nextchar()
end do
if Sprev  F   then
execute action in Sprev
return “yes”
else return “no” 23
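The loop above can be sketched in Python for the combined NFA of a, abb, and a*b+. The edge table and the ACTIONS map are assumptions reconstructed from the earlier figures, and the tie-breaking rule (prefer the lowest-numbered accepting state, which here corresponds to the pattern listed first) is a choice made for this sketch. Instead of "yes" or "no" the function returns the selected action name, or None.

```python
EPS = ""  # label used in this sketch for ε-edges

NFA = {  # assumed edges: state -> label -> set of target states
    0: {EPS: {1, 3, 7}}, 1: {"a": {2}}, 2: {},
    3: {"a": {4}}, 4: {"b": {5}}, 5: {"b": {6}}, 6: {},
    7: {"a": {7}, "b": {8}}, 8: {"b": {8}},
}
ACTIONS = {2: "action1", 6: "action2", 8: "action3"}  # accepting state -> action


def epsilon_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    return frozenset(t for s in states for t in NFA[s].get(symbol, ()))


def simulate_nfa(x):
    """Run the loop until S is empty, then inspect the last non-empty set."""
    chars = iter(x)
    s = epsilon_closure({0})
    s_prev = frozenset()
    a = next(chars, None)  # None plays the role of end of input
    while s:
        s_prev = s
        s = epsilon_closure(move(s, a)) if a is not None else frozenset()
        a = next(chars, None)
    accepting = sorted(q for q in s_prev if q in ACTIONS)
    return ACTIONS[accepting[0]] if accepting else None


print(simulate_nfa("a"))    # action1
print(simulate_nfa("abb"))  # action2
print(simulate_nfa("aab"))  # action3
```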
Initially, ε-closure(s0) is the only state in Dstates and it is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        U := ε-closure(move(T,a))
        if U is not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T,a] := U
    end do
end do
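The algorithm above can be sketched in Python. The NFA below is assumed to be the standard 11-state NFA for (a|b)*abb used in the worked examples of these slides; its edge list is a reconstruction, and all identifiers are hypothetical. One deviation is labeled in the code: an empty U is skipped rather than added as a dead state.

```python
EPS = ""  # label used in this sketch for ε-edges
ALPHABET = ("a", "b")

NFA = {  # assumed edges of the NFA for (a|b)*abb, states 0..10
    0: {EPS: {1, 7}}, 1: {EPS: {2, 4}}, 2: {"a": {3}}, 3: {EPS: {6}},
    4: {"b": {5}}, 5: {EPS: {6}}, 6: {EPS: {1, 7}},
    7: {"a": {8}}, 8: {"b": {9}}, 9: {"b": {10}}, 10: {},
}


def epsilon_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    return frozenset(t for s in states for t in NFA[s].get(symbol, ()))


def subset_construction(start):
    start_state = epsilon_closure({start})
    dstates = [start_state]   # DFA states in order of discovery
    dtran = {}                # (DFA state, symbol) -> DFA state
    unmarked = [start_state]
    while unmarked:
        t = unmarked.pop(0)   # "mark T"
        for a in ALPHABET:
            u = epsilon_closure(move(t, a))
            if not u:
                continue      # assumption: the empty (dead) state is left out
            if u not in dstates:
                dstates.append(u)
                unmarked.append(u)
            dtran[(t, a)] = u
    return dstates, dtran


dstates, dtran = subset_construction(0)
for name, state in zip("ABCDE", dstates):
    print(name, sorted(state))
```

With this edge list the five discovered states come out in the order A to E used in the slides.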

[Figure: NFA for (a|b)*abb with states 0 to 10, and the DFA obtained from it by subset construction with states A to E.]
Dstates
A = {0,1,2,4,7}
B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
D = {1,2,4,5,6,7,9}
E = {1,2,4,5,6,7,10}
[Figure: the combined NFA for a, abb, and a*b+, whose accepting states 2, 6, and 8 carry actions a1, a2, and a3, and the DFA obtained from it by subset construction.]
Dstates
A = {0,1,3,7}
B = {2,4,7} (a1)
C = {8} (a3)
D = {7}
E = {5,8} (a3)
F = {6,8} (a2, a3)
Find an equivalent DFA of the following NFA with ε-transitions using the subset
construction rule.
ε-closure(0) = {0, 1, 2, 4, 7} = A

TranFunction [A, a] = ε-closure(move({0, 1, 2, 4, 7}, a))
TranFunction [A, a] = ε-closure(move(0, a) ∪ move(1, a) ∪ move(2, a) ∪ move(4, a) ∪ move(7, a))
TranFunction [A, a] = ε-closure({3, 8}) = B (‘B’ is a new state in DFA)

TranFunction [A, b] = ε-closure(move(A, b))
TranFunction [A, b] = ε-closure(move({0, 1, 2, 4, 7}, b))
TranFunction [A, b] = ε-closure(move(0, b) ∪ move(1, b) ∪ move(2, b) ∪ move(4, b) ∪ move(7, b))
TranFunction [A, b] = ε-closure({5}) = C (‘C’ is a new state in DFA)

TranFunction [B, a] = ε-closure(move(B, a)) = ‘B’
TranFunction [B, b] = ε-closure(move(B, b)) = ‘D’ (‘D’ is a new state in DFA)
TranFunction [C, a] = ε-closure(move(C, a)) = ‘B’
TranFunction [C, b] = ε-closure(move(C, b)) = ‘C’
TranFunction [D, a] = ε-closure(move(D, a)) = ‘B’
TranFunction [D, b] = ε-closure(move(D, b)) = ‘E’ (‘E’ is a new state in DFA)
TranFunction [E, a] = ε-closure(move(E, a)) = ‘B’
TranFunction [E, b] = ε-closure(move(E, b)) = ‘C’

NFA States                DFA State   a   b
{0, 1, 2, 4, 7}           A           B   C
{1, 2, 3, 4, 6, 7, 8}     B           B   D
{1, 2, 4, 5, 6, 7}        C           B   C
{1, 2, 4, 5, 6, 7, 9}     D           B   E
{1, 2, 4, 5, 6, 7, 10}    E           B   C
DFA diagram
Convert the following NFA into an equivalent DFA.

|S| = 2^|Q| = 2^5 = 32 states (as the number of states in the NFA is 5)

S = [∅, {0}, {1}, {2}, {3}, {4}, {0, 1}, {0, 2}, {0, 3}, {0, 4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4},
{3, 4}, {0, 1, 2}, {0, 1, 3}, {0, 1, 4}, {0, 2, 3}, {0, 2, 4}, {0, 3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4},
{2, 3, 4}, {0, 1, 2, 3}, {0, 1, 2, 4}, {0, 1, 3, 4}, {0, 2, 3, 4}, {1, 2, 3, 4}, {0, 1, 2, 3, 4}]

Start (→) and final (*) states of the DFA are marked in the first column.

     NFA State          DFA State   a            b
     ∅                  A           ∅            ∅
→    {0}                B           {0, 1}       {0}
     {1}                C           ∅            {2}
     {2}                D           {3}          ∅
     {3}                E           ∅            {4}
*    {4}                F           ∅            ∅
     {0, 1}             G           {0, 1}       {0, 2}
     {0, 2}             H           {0, 1, 3}    {0}
     {0, 3}             I           {0, 1}       {0, 4}
*    {0, 4}             J           {0, 1}       {0}
     {1, 2}             K           {3}          {2}
     {1, 3}             L           ∅            {2, 4}
*    {1, 4}             M           ∅            {2}
     {2, 3}             N           {3}          {4}
*    {2, 4}             O           {3}          ∅
*    {3, 4}             P           ∅            {4}
     {0, 1, 2}          Q           {0, 1, 3}    {0, 2}
     {0, 1, 3}          R           {0, 1}       {0, 2, 4}
*    {0, 1, 4}          S           {0, 1}       {0, 2}
     {0, 2, 3}          T           {0, 1, 3}    {0, 4}
*    {0, 2, 4}          U           {0, 1, 3}    {0}
*    {0, 3, 4}          V           {0, 1}       {0, 4}
     {1, 2, 3}          W           {3}          {2, 4}
*    {1, 2, 4}          X           {3}          {2}
*    {1, 3, 4}          Y           ∅            {2, 4}
*    {2, 3, 4}          Z           {3}          {4}
     {0, 1, 2, 3}       AA          {0, 1, 3}    {0, 2, 4}
*    {0, 1, 2, 4}       BB          {0, 1, 3}    {0, 2}
*    {0, 1, 3, 4}       CC          {0, 1}       {0, 2, 4}
*    {0, 2, 3, 4}       DD          {0, 1, 3}    {0, 4}
*    {1, 2, 3, 4}       EE          {3}          {2, 4}
*    {0, 1, 2, 3, 4}    FF          {0, 1, 3}    {0, 2, 4}

Keeping only the states reachable from the start state B:

     DFA State   a   b
→    B           G   B
     G           G   H
     H           R   B
     R           G   U
*    U           R   B
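Most of the 32 subsets are never reached from the start state. The following Python sketch, which assumes the NFA transitions read off the single-state rows of the table above, explores only the reachable subsets; the names DELTA and reachable_subsets are hypothetical.

```python
# NFA transitions taken from the rows {0}, {1}, {2}, {3}, {4} of the table.
DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {3},
    (3, "b"): {4},
}


def reachable_subsets(start=0, alphabet="ab"):
    """Build the DFA table lazily, visiting only subsets reachable from {start}."""
    table = {}
    worklist = [frozenset({start})]
    while worklist:
        t = worklist.pop()
        if t in table:
            continue  # already expanded
        table[t] = {}
        for a in alphabet:
            u = frozenset(q for s in t for q in DELTA.get((s, a), ()))
            table[t][a] = u
            worklist.append(u)
    return table


table = reachable_subsets()
for subset in sorted(table, key=sorted):
    row = table[subset]
    print(sorted(subset), sorted(row["a"]), sorted(row["b"]))
print(len(table))  # 5
```

The five subsets found are {0}, {0, 1}, {0, 1, 3}, {0, 2}, and {0, 2, 4}, which are the states named B, G, R, H, and U in the table.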
DFA diagram
[Figure: the DFA for (a|b)*abb with states A, B, C, D, E (left) and an equivalent DFA with states A, B, D, E (right).]

Two states A and B are equivalent if, for every input string x, either
δ(A, x) ∈ F and δ(B, x) ∈ F, or
δ(A, x) ∉ F and δ(B, x) ∉ F
(F is the set of accepting states).

If this holds for all x with |x| ≤ 0, then A and B are said to be ‘0’ equivalent
If this holds for all x with |x| ≤ 1, then A and B are said to be ‘1’ equivalent
If this holds for all x with |x| ≤ 2, then A and B are said to be ‘2’ equivalent
If this holds for all x with |x| ≤ 3, then A and B are said to be ‘3’ equivalent
…
If this holds for all x with |x| ≤ n, then A and B are said to be ‘n’ equivalent
DFA State   a   b
A           B   C
B           B   D
C           B   C
D           B   E
E           B   C

‘0’ Equivalence: {A, B, C, D} {E}
‘1’ Equivalence: {A, B, C} {D} {E}
‘2’ Equivalence: {A, C} {B} {D} {E}
‘3’ Equivalence: {A, C} {B} {D} {E}

The partition no longer changes, so A and C are merged:

DFA State   a   b
(A, C)      B   (A, C)
B           B   D
D           B   E
E           B   (A, C)
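The successive equivalence partitions can be computed by repeated refinement. The Python sketch below uses the five-state DFA table above; refine and equivalence_classes are hypothetical names, and the approach (split each block by the blocks its members move to) is one common way to obtain the k-equivalence classes.

```python
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}


def refine(partition):
    """One round: split each block by the blocks its members move to."""
    block_of = {s: i for i, block in enumerate(partition) for s in block}
    new_partition = []
    for block in partition:
        groups = {}
        for s in sorted(block):
            signature = tuple(block_of[DFA[s][a]] for a in "ab")
            groups.setdefault(signature, set()).add(s)
        new_partition.extend(groups.values())
    return new_partition


def equivalence_classes():
    """Return the partitions for 0-, 1-, 2-, ... equivalence until stable."""
    partition = [set(DFA) - ACCEPTING, set(ACCEPTING)]  # '0' equivalence
    history = [partition]
    while True:
        refined = refine(partition)
        if len(refined) == len(partition):  # nothing was split: stable
            return history
        history.append(refined)
        partition = refined


for k, partition in enumerate(equivalence_classes()):
    print(k, [sorted(block) for block in partition])
```

The last partition printed is [['A', 'C'], ['B'], ['D'], ['E']], so only A and C can be merged.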
• The “important states” of an NFA are those with a non-ε out-transition,
that is, if
move({s},a) ≠ ∅ for some a then s is an important state
• The subset construction algorithm uses only the important states
when it determines
ε-closure(move(T,a))
• Augment the regular expression r with a special end symbol # to
make accepting states important: the new expression is r#
• Construct a syntax tree for r#
• Traverse the tree to construct functions nullable, firstpos, lastpos, and
followpos

• Obtain augmented regular expression (r)#
• Construct syntax tree for (r)#
• Compute the following four functions
• Nullable
• Firstpos
• Lastpos
• Followpos
• Construct DFA using followpos

Syntax Tree of (a|b)*abb#

[Figure: the syntax tree, built bottom-up over several slides]
• alternation node | with leaf children a and b, for (a|b)
• closure node * above it, for (a|b)*
• concatenation nodes that append the leaves a, b, b, and # in turn, for (a|b)*a, (a|b)*ab, (a|b)*abb, and (a|b)*abb#

Position numbers (for leaves): a = 1 and b = 2 under the alternation node, then a = 3, b = 4, b = 5, and # = 6.
Traverse the tree to construct functions nullable, firstpos, lastpos, and
followpos.
For a node n, let L(n) be the language generated by the subtree with
root n
• nullable(n): L(n) contains the empty string ε
• firstpos(n): set of positions under n that can match the first symbol of
a string in L(n)
• lastpos(n): the set of positions under n that can match the last symbol
of a string in L(n)
• followpos(i): the set of positions that can follow position i in any
generated string
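Rules for nullable(n), firstpos(n), and lastpos(n), by node type:

• Leaf labeled ε
  nullable(n) = true
  firstpos(n) = ∅
  lastpos(n) = ∅
• Leaf with position i
  nullable(n) = false
  firstpos(n) = {i}
  lastpos(n) = {i}
• Or-node n = c1 | c2
  nullable(n) = nullable(c1) or nullable(c2)
  firstpos(n) = firstpos(c1) ∪ firstpos(c2)
  lastpos(n) = lastpos(c1) ∪ lastpos(c2)
• Cat-node n = c1 • c2
  nullable(n) = nullable(c1) and nullable(c2)
  firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
  lastpos(n) = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
• Star-node n = c1*
  nullable(n) = true
  firstpos(n) = firstpos(c1)
  lastpos(n) = lastpos(c1)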
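These rules translate directly into a recursive function. The Python sketch below is illustrative only: the tuple encoding of the syntax tree ("leaf", "or", "cat", "star") and the names leaf, STAR, TREE, and analyze are assumptions of this example, applied to the tree of (a|b)*abb# with leaf positions 1 to 6.

```python
def leaf(symbol, position):
    return ("leaf", symbol, position)


# Syntax tree of (a|b)*abb# with leaf positions 1..6.
STAR = ("star", ("or", leaf("a", 1), leaf("b", 2)))
TREE = ("cat", ("cat", ("cat", ("cat", STAR, leaf("a", 3)),
                        leaf("b", 4)), leaf("b", 5)), leaf("#", 6))


def analyze(n):
    """Return (nullable, firstpos, lastpos) for the subtree rooted at n."""
    kind = n[0]
    if kind == "leaf":
        _, symbol, position = n
        if symbol == "":  # an ε-leaf, encoded here as the empty string
            return True, frozenset(), frozenset()
        return False, frozenset({position}), frozenset({position})
    if kind == "or":
        n1, f1, l1 = analyze(n[1])
        n2, f2, l2 = analyze(n[2])
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(n[1])
        n2, f2, l2 = analyze(n[2])
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    if kind == "star":
        _, f1, l1 = analyze(n[1])
        return True, f1, l1
    raise ValueError(f"unknown node kind: {kind}")


nullable, firstpos, lastpos = analyze(TREE)
print(nullable, sorted(firstpos), sorted(lastpos))  # False [1, 2, 3] [6]
```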
Applying the rules to the syntax tree of (a|b)*abb#, from the leaves up:

Node              nullable   firstpos     lastpos
a (position 1)    false      {1}          {1}
b (position 2)    false      {2}          {2}
a|b               false      {1, 2}       {1, 2}
(a|b)*            true       {1, 2}       {1, 2}
a (position 3)    false      {3}          {3}
(a|b)*a           false      {1, 2, 3}    {3}
b (position 4)    false      {4}          {4}
(a|b)*ab          false      {1, 2, 3}    {4}
b (position 5)    false      {5}          {5}
(a|b)*abb         false      {1, 2, 3}    {5}
# (position 6)    false      {6}          {6}
(a|b)*abb#        false      {1, 2, 3}    {6}
for each node n in the tree do
    if n is a cat-node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) := followpos(i) ∪ firstpos(c2)
        end do
    else if n is a star-node
        for each i in lastpos(n) do
            followpos(i) := followpos(i) ∪ firstpos(n)
        end do
    end if
end do
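The loop above can be sketched in Python for the tree of (a|b)*abb#. To keep the example short, the firstpos and lastpos sets of the cat-nodes and the star-node are entered as data rather than computed; CAT_NODES, STAR_NODES, and compute_followpos are hypothetical names.

```python
# (lastpos(c1), firstpos(c2)) for each cat-node of (a|b)*abb#, bottom-up.
CAT_NODES = [
    ({1, 2}, {3}),  # (a|b)* followed by a
    ({3}, {4}),     # ... followed by b
    ({4}, {5}),     # ... followed by b
    ({5}, {6}),     # ... followed by #
]
# (lastpos(n), firstpos(n)) for each star-node.
STAR_NODES = [({1, 2}, {1, 2})]


def compute_followpos(positions=range(1, 7)):
    followpos = {i: set() for i in positions}
    for last_c1, first_c2 in CAT_NODES:
        for i in last_c1:
            followpos[i] |= first_c2
    for last_n, first_n in STAR_NODES:
        for i in last_n:
            followpos[i] |= first_n
    return followpos


for position, follow in compute_followpos().items():
    print(position, sorted(follow))
```

Positions 1 and 2 receive {3} from the first cat-node and {1, 2} from the star-node, giving {1, 2, 3}; position 6 has no followers.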
Algorithm to Construct a DFA Directly from a Regular Expression
s0 := firstpos(root) where root is the root of the syntax tree
Dstates := {s0} and is unmarked
while there is an unmarked state T in Dstates do
    mark T
    for each input symbol a ∈ Σ do
        let U be the set of positions that are in followpos(p)
            for some position p in T,
            such that the symbol at position p is a
        if U is not empty and not in Dstates then
            add U as an unmarked state to Dstates
        end if
        Dtran[T,a] := U
    end do
end do
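The direct construction can be sketched in Python for (a|b)*abb#. The followpos values, the symbol at each position, and firstpos(root) are entered as data from the running example; build_dfa and the constant names are hypothetical, and empty target sets are simply left out of Dtran in this sketch.

```python
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
SYMBOL_AT = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FIRSTPOS_ROOT = frozenset({1, 2, 3})
END_POSITION = 6  # position of the end marker #


def build_dfa(alphabet="ab"):
    s0 = FIRSTPOS_ROOT
    dstates = [s0]    # DFA states (sets of positions) in order of discovery
    dtran = {}        # (DFA state, symbol) -> DFA state
    unmarked = [s0]
    while unmarked:
        t = unmarked.pop(0)  # "mark T"
        for a in alphabet:
            # Union of followpos(p) over positions p in T labeled a.
            u = frozenset(q for p in t if SYMBOL_AT[p] == a
                          for q in FOLLOWPOS[p])
            if not u:
                continue     # assumption: empty sets are not recorded
            if u not in dstates:
                dstates.append(u)
                unmarked.append(u)
            dtran[(t, a)] = u
    # A state is accepting when it contains the position of #.
    accepting = [s for s in dstates if END_POSITION in s]
    return dstates, dtran, accepting


dstates, dtran, accepting = build_dfa()
for state in dstates:
    print(sorted(state), "accepting" if state in accepting else "")
```

The four states found are {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 5}, and {1, 2, 3, 6}, with the last one accepting.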
A position of a regular expression can follow another in the following
ways:
• If n is a cat node with left child c1 and right child c2, then for every
position i in lastpos(c1), all positions in firstpos(c2) are in followpos(i).
• If n is a star node and i is a position in lastpos(n), then all positions in
firstpos(n) are in followpos(i).
Node   followpos
1      {1, 2, 3}
2      {1, 2, 3}
3      {4}
4      {5}
5      {6}
6      -

[Figure: followpos drawn as a directed graph over positions 1 to 6.]

[Figure: the resulting DFA with states {1,2,3} (start), {1,2,3,4}, {1,2,3,5}, and {1,2,3,6}.]
Example 2: Construct a DFA from the regular expression by direct
conversion.
Augmented RE: ∗

Annotated syntax tree

Followpos
Node Followpos
1 {1, 2, 3, 4}
2 {2, 3, 4}
3 {2, 3, 4}
4 {4, 5}
5 ----
Determine DFA States
Step 1:

Step 2:

Step 3:
Step 4:

Transition Table for DFA
[Figure: DFA state diagram]


DFA a b
State
A B
B B C
*C D C
D D C

[Figure: NFA with ε-transitions over states 0 to 8.]

ε-closure(0) = {0, 1, 2, 4, 7} = A (Start State in DFA)

[Figures: the same NFA repeated for the derivation of the second and third states in the DFA.]
