Regular Expression Notes
o The languages accepted by finite automata can be described by simple expressions
called Regular Expressions. They are a compact and effective way to represent such languages.
o The languages described by regular expressions are referred to as Regular languages.
o A regular expression can also be described as a pattern that defines a set of strings.
o Regular expressions are used to match character combinations in strings. String-searching
algorithms use these patterns to carry out search operations on strings.
For instance:
In a regular expression, x+ means one or more occurrences of x. It generates {x, xx, xxx,
xxxx, .....}
Union: If L and M are two regular languages then their union L U M is also regular.
L U M = {s | s is in L or s is in M}
Intersection: If L and M are two regular languages then their intersection L ⋂ M is also
regular.
L ⋂ M = {s | s is in L and s is in M}
Concatenation: If L and M are two regular languages then their concatenation LM is also
regular.
LM = {st | s is in L and t is in M}
Kleene closure: If L is a regular language then its Kleene closure L* will also be a regular
language.
Solution:
All combinations of a's means a may appear zero times, once, twice and so on. If a appears zero
times, that gives the null string. That is, we expect the set {ε, a, aa, aaa, ....}. So the
regular expression for this is:
R = a*
Solution:
This set indicates that there is no null string. So we can denote regular expression as:
R = a+
Example 3:
Write the regular expression for the language accepting all the string containing any number
of a's and b's.
Solution:
r.e. = (a + b)*
This will give the set as L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination of a and
b.
The (a + b)* shows any combination with a and b even a null string.
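The sets listed in these examples can be reproduced by brute-force enumeration. The sketch below is not part of the original notes. It assumes Python's `re` module, in which union is written `|` instead of `+` and a trailing `+` means "one or more"; the helper name `language` is hypothetical, and the empty Python string `''` stands for ε.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """Return every string over `alphabet` of length <= max_len that the
    whole pattern matches, shortest strings first."""
    compiled = re.compile(pattern)
    matches = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if compiled.fullmatch(s):
                matches.append(s)
    return matches

star = language(r"a*", "a", 3)          # R = a*, zero or more a's
plus = language(r"a+", "a", 3)          # R = a+, one or more a's
any_ab = language(r"(a|b)*", "ab", 2)   # textbook (a + b)*

print(star)    # ['', 'a', 'aa', 'aaa']
print(plus)    # ['a', 'aa', 'aaa']
print(any_ab)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Only strings up to `max_len` are listed, so the output is a finite prefix of each infinite language.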
Regular Expression         Regular Language
( a ∪ e ∪ i ∪ o ∪ u )      set of vowels {a, e, i, o, u}
(a.b*)                     a followed by 0 or more b: {a, ab, abb, abbb, abbbb, ….}
v*.c* (where v: vowels     any no. of vowels followed by any no. of consonants:
and c: consonants)         { ε, a, aou, aiou, b, abcd, …..} where ε represents the empty
                           string (in case of 0 vowels and 0 consonants)
1. Which one of the following languages over the alphabet {0,1} is described by the
regular expression?
(0+1)*0(0+1)*0(0+1)*
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.
Solution : Option A says that it must have substring 00. But 10101 is also a part of the
language and it does not contain 00 as a substring. So it is not the correct option.
Option B says that it can have at most two 0’s, but 00000 is also a part of the
language. So it is not the correct option.
Option C says that it must contain at least two 0’s. The regular expression requires two 0’s,
with any strings of 0’s and 1’s before, between and after them. So this is the correct option.
Option D says that it contains all strings that begin and end with either 0 or 1, which is
every non-empty string over {0,1}. But strings such as 1 and 01 are not generated by the
expression. So it is not correct.
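The four options can also be checked mechanically. The following sketch is an illustration only, not the method used in the notes: it writes the expression in Python `re` syntax (`|` for union), turns each option into a hypothetical predicate, and compares the two on every binary string up to length 8.

```python
import re
from itertools import product

# Textbook (0+1)*0(0+1)*0(0+1)* in Python syntax, where | denotes union.
PATTERN = re.compile(r"(0|1)*0(0|1)*0(0|1)*")

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def in_language(s):
    return PATTERN.fullmatch(s) is not None

# Each option restated as a predicate on strings (assumed readings).
options = {
    "A": lambda s: "00" in s,            # contains the substring 00
    "B": lambda s: s.count("0") <= 2,    # at most two 0's
    "C": lambda s: s.count("0") >= 2,    # at least two 0's
    "D": lambda s: len(s) >= 1,          # begins and ends with 0 or 1
}

def agrees(predicate, max_len=8):
    """True if the predicate and the expression agree on all short strings."""
    return all(in_language(s) == predicate(s)
               for s in all_strings("01", max_len))

results = {name: agrees(pred) for name, pred in options.items()}
print(results)  # {'A': False, 'B': False, 'C': True, 'D': False}
```

Agreement on short strings is evidence rather than a proof, but a single disagreement is enough to reject an option.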
2. Which of the following languages is generated by the given grammar?
S -> aS | bS | ∊
2. Regular expression for the set of all strings of a’s and b’s that have at least 2 a’s.
(a+b)* a (a+b)* a (a+b)*
Valid strings={aa, aaa, baa,aab, aba, abba,…}
Invalid strings={ε, a, b, ab, ba, abb, abbb,…}
3. Regular expression for the set of all strings whose first symbol from the right end is a 0.
L = (0+1)*0
4. Regular expression for the set of all strings whose second symbol from the right end is a
0.
L = (0+1)*.0.(0+1)
5. Regular expression for the set of all strings whose 3rd symbol from the right end is a 0.
L = (0+1)*.0.(0+1).(0+1)
6. Describe the strings that are represented by the regular expression (0+1)*.0.(0+1).(0+1).
Valid Strings={0000,1000, 1010, and many other similar strings}
7. Regular expression for the strings over {a,b} that do not contain a
(b)*
8. Regular expression for the strings over {a,b} that do not contain a single a, i.e. every
block of a's has even length
(aa+b)*
9. A regular expression for the language of all even-length strings starting with a
and all odd-length strings starting with b
RE = a(aa+bb+ab+ba)*(a+b) + b(aa+bb+ab+ba)*
Some RE Examples
Regular Expression     Regular Set
(a+b)*                 Set of strings of a’s and b’s of any length including the null string. So L = { ε,
                       a, b, aa, ab, bb, ba, aaa, …….}
(a+b)*abb              Set of strings of a’s and b’s ending with the string abb. So L = {abb, aabb,
                       babb, aaabb, ababb, …………..}
(11)*                  Set of strings consisting of an even number of 1’s including the empty string. So L = {ε, 11,
                       1111, 111111, ……….}
(aa)*(bb)*b            Set of strings consisting of an even number of a’s followed by an odd number of
                       b’s, so L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …………..}
(aa + ab + ba + bb)*   Strings of a’s and b’s of even length, obtained by concatenating any
                       combination of the strings aa, ab, ba and bb including null, so L = {ε, aa, ab, ba,
                       bb, aaab, aaba, …………..}
Any set that represents the value of the Regular Expression is called a Regular Set.
Properties of Regular Sets
Property 1. The union of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)
Hence, proved.
Property 2. The intersection of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Hence, proved.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
Complement of L is all the strings over {a} that are not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.
Property 4. The difference of two regular sets is regular.
Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.
Hence, proved.
Property 5. The reversal of a regular set is regular.
Proof −
We have to prove L^R (the reversal of L) is also regular if L is a regular set.
Let, L = {01, 10, 11, 10}
RE (L) = 01 + 10 + 11 + 10
L^R = {10, 01, 11, 01}
RE (L^R) = 10 + 01 + 11 + 01 which is regular
Hence, proved.
Property 6. The closure of a regular set is regular.
Proof −
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {ε, a, aa, aaa, aaaa , aaaaa,……………} (Strings of all lengths
including Null)
RE (L*) = a*
Hence, proved.
Property 7. The concatenation of two regular sets is regular.
Proof −
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented
by an RE − (0 + 1)*001(0 + 1)*
Hence, proved.
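The example sets used for Properties 1, 2, 3, 4 and 7 can be checked on strings of bounded length. The sketch below is an illustration and not a proof: each language is written as a Python set built from a predicate that restates the regular expression in the comment, and `MAX_LEN` and all variable names are assumptions introduced here.

```python
from itertools import product

MAX_LEN = 7  # only strings up to this length are compared

def strings(alphabet):
    """All strings over `alphabet` with length 0..MAX_LEN."""
    return {"".join(t) for n in range(MAX_LEN + 1)
            for t in product(alphabet, repeat=n)}

unary = strings("a")                                 # a*
odd_a = {s for s in unary if len(s) % 2 == 1}        # a(aa)*
even_a = {s for s in unary if len(s) % 2 == 0}       # (aa)*
nonempty_a = unary - {""}                            # a(a*)
even_nonempty_a = even_a - {""}                      # aa(aa)*

binary = strings("01")
ends_in_0 = {s for s in binary if s.endswith("0")}           # (0+1)*0
starts_with_01 = {s for s in binary if s.startswith("01")}   # 01(0+1)*
contains_001 = {s for s in binary if "001" in s}             # (0+1)*001(0+1)*
concatenation = {x + y for x in ends_in_0 for y in starts_with_01
                 if len(x) + len(y) <= MAX_LEN}

properties = {
    "1 union": odd_a | even_a == unary,
    "2 intersection": nonempty_a & even_a == even_nonempty_a,
    "3 complement": unary - even_a == odd_a,
    "4 difference": nonempty_a - even_a == odd_a,
    "7 concatenation": concatenation == contains_001,
}
for name, holds in properties.items():
    print(name, holds)
```

Truncating to `MAX_LEN` loses nothing for these checks, because every piece of a string of length at most `MAX_LEN` is itself at most that long.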
Identities Related to Regular Expressions
Given R, P, L, Q as regular expressions, the following identities hold −
∅* = ε
ε* = ε
RR* = R*R
R*R* = R*
(R*)* = R*
(PQ)*P =P(QP)*
(a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
R + ∅ = ∅ + R = R (The identity for union)
R ε = ε R = R (The identity for concatenation)
∅ L = L ∅ = ∅ (The annihilator for concatenation)
R + R = R (Idempotent law)
L (M + N) = LM + LN (Left distributive law)
(M + N) L = ML + NL (Right distributive law)
ε + RR* = ε + R*R = R*
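Several of these identities can be sanity-checked by treating a language as a finite Python set of strings cut off at a fixed length. This is a minimal sketch under stated assumptions, not a proof: `MAX_LEN`, the helpers `concat` and `star`, and the sample languages R, P and Q are all hypothetical choices made for illustration.

```python
MAX_LEN = 6  # identities are compared only on strings up to this length

def concat(A, B):
    """Concatenation of two languages, truncated to MAX_LEN."""
    return {x + y for x in A for y in B if len(x) + len(y) <= MAX_LEN}

def star(A):
    """Kleene closure of a language, truncated to MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        # extend the newest strings by one more factor from A
        frontier = concat(frontier, A) - result
        result |= frontier
    return result

EMPTY, EPS = set(), {""}      # the empty language and the language {ε}
a, b = {"a"}, {"b"}
R = {"ab", "b"}               # hypothetical sample languages
P, Q = {"a"}, {"b", "ba"}
all_ab = star(a | b)          # (a+b)*

checks = {
    "∅* = ε": star(EMPTY) == EPS,
    "ε* = ε": star(EPS) == EPS,
    "RR* = R*R": concat(R, star(R)) == concat(star(R), R),
    "R*R* = R*": concat(star(R), star(R)) == star(R),
    "(R*)* = R*": star(star(R)) == star(R),
    "(PQ)*P = P(QP)*": concat(star(concat(P, Q)), P)
                       == concat(P, star(concat(Q, P))),
    "(a+b)* = (a*b*)*": all_ab == star(concat(star(a), star(b))),
    "(a+b)* = (a*+b*)*": all_ab == star(star(a) | star(b)),
    "(a+b)* = (a+b*)*": all_ab == star(a | star(b)),
    "(a+b)* = a*(ba*)*": all_ab == concat(star(a), star(concat(b, star(a)))),
    "∅R = R∅ = ∅": concat(EMPTY, R) == EMPTY and concat(R, EMPTY) == EMPTY,
    "R(P+Q) = RP + RQ": concat(R, P | Q) == (concat(R, P) | concat(R, Q)),
    "ε + RR* = R*": (EPS | concat(R, star(R))) == star(R),
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```

A failed comparison would disprove a claimed identity, while a passed one only shows agreement for the chosen sample languages up to `MAX_LEN`.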
Solution:
In a regular expression, the first symbol should be 1, and the last symbol should be 0. The r.e.
is as follows:
R = 1 (0+1)* 0
Example 2:
Write the regular expression for the language starting and ending with a and having any
combination of b's in between.
Solution:
R = a b* a
Example 3:
Write the regular expression for the language starting with a but not having consecutive b's.
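R = (a + ab)(a + ab)*
(The plain closure (a + ab)* would also accept the null string, which does not start with a.)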
Example 4:
Write the regular expression for the language accepting all the string in which any number of
a's is followed by any number of b's is followed by any number of c's.
Solution: As we know, any number of a's means a* any number of b's means b*, any number
of c's means c*. Since as given in problem statement, b's appear after a's and c's appear after
b's. So the regular expression could be:
R = a* b* c*
Example 5:
Write the regular expression for the language over ∑ = {0} having even length of the string.
Solution:
R = (00)*
Example 6:
Write the regular expression for the language of strings that have at least one 0
and at least one 1.
Solution:
Example 7:
Describe the language denoted by following regular expression
Solution:
The language can be predicted from the regular expression by finding the meaning of it. We
will first split the regular expression as:
Example 8:
Write the regular expression for the language L over ∑ = {0, 1} such that all the string do not
contain the substring 01.
Solution:
R = (1* 0*)
Example 9:
Write the regular expression for the language containing the strings over {0, 1} in which there
are at least two occurrences of 1's between any two occurrences of 0's.
Solution: Every 0 that is followed later by another 0 must be followed immediately by at least
two 1's, which can be denoted by 0111*. The string may begin with any number of 1's and may end
with a final 0 followed by any number of 1's. Strings with no 0's or only one 0 are also allowed.
Hence the r.e. for required language is:
R = 1* (0111*)* (0 + ε) 1*
Example 10:
Write the regular expression for the language containing the string in which every 0 is
immediately followed by 11.
Solution:
R = (011 + 1)*
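Answers of this kind can be tested against a direct restatement of the problem. The sketch below, which is an illustration and not part of the original notes, does this for Example 8 and Example 10. It assumes Python's `re` syntax (`|` for union), and the predicate names are hypothetical.

```python
import re
from itertools import product

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def no_substring_01(s):
    # Example 8: the string must not contain 01
    return "01" not in s

def every_0_followed_by_11(s):
    # Example 10: each 0 must be immediately followed by 11
    return all(s[i + 1:i + 3] == "11" for i, ch in enumerate(s) if ch == "0")

checks = [
    (r"1*0*", no_substring_01),             # textbook R = (1* 0*)
    (r"(011|1)*", every_0_followed_by_11),  # textbook R = (011 + 1)*
]

results = {}
for pattern, predicate in checks:
    compiled = re.compile(pattern)
    results[pattern] = all(
        (compiled.fullmatch(s) is not None) == predicate(s)
        for s in all_strings("01", 10)
    )
    print(pattern, results[pattern])
```

Both comparisons cover all 2047 binary strings of length at most 10; a mismatch on any of them would show that the expression and the description differ.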
Conversion of RE to FA
To convert the RE to FA, we are going to use a method called the subset method, which
obtains an FA from the given regular expression. The method is given below:
Method
Step 1 Construct an NFA with Null moves from the given regular expression.
Step 2 Remove Null transitions from the NFA and convert it into its equivalent DFA.
Problem
Convert the following RE into its equivalent DFA − 1 (0 + 1)* 0
Solution
We will concatenate three expressions "1", "(0 + 1)*" and "0"
Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we
get the following −
It is an NDFA corresponding to the RE − 1 (0 + 1)* 0. If you want to convert it into a DFA,
simply apply the method of converting NDFA to DFA
Example 1:
Design a FA from given regular expression 10 + (0 + 11)0* 1.
Solution: First we will construct the transition diagram for a given regular expression.
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Now we have got the NFA without ε. To convert it into the required DFA, we will
first write a transition table for this NFA.
State       0       1
→q0         q3      {q1, q2}
q1          qf      ϕ
q2          ϕ       q3
q3          q3      qf
*qf         ϕ       ϕ
The transition table for the equivalent DFA is:
State       0       1
→[q0]       [q3]    [q1, q2]
[q1]        [qf]    ϕ
[q2]        ϕ       [q3]
[q3]        [q3]    [qf]
[q1, q2]    [qf]    [q3]
*[qf]       ϕ       ϕ
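The NFA to DFA step can be sketched in code. The following is a minimal subset construction under stated assumptions, not the exact procedure of the notes: the NFA table above is encoded as a Python dictionary, missing entries stand for ϕ, and the function names are hypothetical. It builds only the states reachable from [q0], so [q1] and [q2] do not appear on their own, and ϕ shows up as an explicit dead state.

```python
from collections import deque

# NFA transition table from the example; a missing entry means ϕ.
NFA = {
    ("q0", "0"): {"q3"},
    ("q0", "1"): {"q1", "q2"},
    ("q1", "0"): {"qf"},
    ("q2", "1"): {"q3"},
    ("q3", "0"): {"q3"},
    ("q3", "1"): {"qf"},
}
START, FINALS, ALPHABET = "q0", {"qf"}, "01"

def subset_construction(nfa, start, alphabet):
    """Return DFA transitions {state_set: {symbol: state_set}} for all
    subsets reachable from {start}."""
    dfa = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # union of the NFA moves of every state in the current subset
            target = frozenset(t for q in current
                               for t in nfa.get((q, symbol), ()))
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    return dfa

def accepts(dfa, start, finals, word):
    """Run the DFA; a subset is accepting if it contains a final state."""
    state = frozenset([start])
    for symbol in word:
        state = dfa[state][symbol]
    return bool(state & finals)

dfa = subset_construction(NFA, START, ALPHABET)
for state, moves in dfa.items():
    print(sorted(state), {s: sorted(t) for s, t in moves.items()})
print(accepts(dfa, START, FINALS, "10"))     # True
print(accepts(dfa, START, FINALS, "11001"))  # True
print(accepts(dfa, START, FINALS, "100"))    # False
```

The reachable subsets correspond to the rows [q0], [q3], [q1, q2] and [qf] of the DFA table, plus the dead state.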
Example 2:
Design a NFA from given regular expression 1 (1* 01* 01*)*.
Step 1:
Step 2:
Step 3:
Example 3:
Construct the FA for regular expression 0*1 + 10.
Solution:
Step 1:
Step 2:
Step 3:
Step 4:
Conversion of Regular Expression to Finite Automata
As the regular expressions can be constructed from Finite Automata using the State
Elimination Method, the reverse method, state decomposition method can be used to
construct Finite Automata from the given regular expressions.
Note: This method will construct NFA (with or without ε-transitions, depending on
the expression) for the given regular expression, which can be further converted to
DFA using NFA to DFA conversion.
Fig 1
Fig 2
Step 2: Repeat the following rules (state decomposition method), considering the
lowest-precedence regular expression operator first, until no operator is left in the
expression. Precedence of operators in regular expressions is defined as Union <
Concatenation < Kleene’s Closure.
Union operator (+) can be eliminated by introducing parallel edges between the two
states as follows.
Fig 3: Removal of Union Operator
Fig 5
2. Else if there is only one incoming edge at the right-most state, i.e., B in transition A
-> B, then introduce self-loop on state B and label edge A to B as an ε-transition, as
shown in Fig 6.
Fig 6
3. Else introduce a new state between two states having self-loop labeled as the
expression. The new state will have ε-transitions with the previous states as follows,
as shown in Fig 7.
Fig 7
Example:
Fig 8
Step 2:
A. As the lowest-precedence operator in the expression is union (+), we will
introduce parallel edges (parallel self-loops here) for ‘ab’ and ‘ba’, as shown in Fig 9.
Fig 9
B. Now we have two labels with concatenation operators (no operator mentioned
between two variables is concatenation), so we remove them one by one by
introducing new states, q1 and q2 as shown in Fig 10 and Fig 11. (Refer Fig 4 above)
Fig 10
Fig 11
Step 3: As no operators are left, we can say that Fig 11 is the required finite automata
(NFA).
DFA to Regular Expression-
The two popular methods for converting a given DFA to its regular expression are-
1. Arden’s Method
2. State Elimination Method
Arden’s Theorem-
Arden’s Theorem is popularly used to convert a given DFA to its regular expression.
It states that-
Let P and Q be two regular expressions over ∑.
If P does not contain a null string ∈, then-
R = Q + RP has a unique solution i.e. R = QP*
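The statement can be illustrated on languages cut off at a fixed string length. The sketch below is not a proof and is not taken from the notes: Q and P are hypothetical sample languages, the empty Python string stands for ∈, and `MAX_LEN`, `concat`, `star` and `satisfies` are names introduced here. It checks that R = QP* satisfies R = Q + RP, and shows why the condition on P matters: when P contains the null string, a second solution exists.

```python
from itertools import product

MAX_LEN = 6  # only strings up to this length are compared

def concat(A, B):
    """Concatenation of two languages, truncated to MAX_LEN."""
    return {x + y for x in A for y in B if len(x) + len(y) <= MAX_LEN}

def star(A):
    """Kleene closure of a language, truncated to MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A) - result
        result |= frontier
    return result

def satisfies(R, Q, P):
    """Check the equation R = Q + RP on strings up to MAX_LEN."""
    return R == (Q | concat(R, P))

Q = {"b", "aa"}   # hypothetical Q
P = {"a"}         # hypothetical P that does not contain the null string

R = concat(Q, star(P))                             # Arden: R = QP*
print(satisfies(R, Q, P))                          # True
print(sorted(R, key=lambda s: (len(s), s))[:4])    # ['b', 'aa', 'ba', 'aaa']

# If P contains the null string, QP* is still a solution but not the only one.
P_null = {"", "a"}
everything = {"".join(t) for n in range(MAX_LEN + 1)
              for t in product("ab", repeat=n)}
print(satisfies(concat(Q, star(P_null)), Q, P_null))  # True
print(satisfies(everything, Q, P_null))               # True, a second solution
```

With Q = b + aa and P = a, the solution reads R = (b + aa)a*.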
Conditions-
Steps-
To convert a given DFA to its regular expression using Arden’s Theorem, following steps are
followed-
Step-01:
Form an equation for each state considering the transitions which come towards that state.
Add ‘∈’ in the equation of initial state.
Step-02:
Bring final state in the form R = Q + RP to get the required regular expression.
Important Notes-
Note-01:
Arden’s Theorem can be used to find a regular expression for both DFA and NFA.
Note-02:
Problem-01:
Find regular expression for the following DFA using Arden’s Theorem-
Solution-
Step-01:
Step-02:
Problem-02:
Find regular expression for the following DFA using Arden’s Theorem-
Solution-
Step-01:
Form an equation for each state-
q1 = ∈ ……(1)
q2 = q1.a ……(2)
q3 = q1.b + q2.a + q3.a …….(3)
Step-02:
Problem-03:
Find regular expression for the following DFA using Arden’s Theorem-
Solution-
Step-01:
Step-02:
Problem-04:
Find regular expression for the following DFA using Arden’s Theorem-
Solution-
Step-01:
Step-02:
State Elimination Method-
This method involves the following steps in finding the regular expression for any given DFA-
Step-01:
Thumb Rule
The initial state of the DFA must not have any incoming edge.
If there exists any incoming edge to the initial state, then create a new initial state having
no incoming edge to it.
Example-
Step-02:
Thumb Rule
There must exist only one final state in the DFA.
If there exists multiple final states in the DFA, then convert all the final states into non-
final states and create a new single final state.
Example-
Step-03:
Thumb Rule
The final state of the DFA must not have any outgoing edge.
If there exists any outgoing edge from the final state, then create a new final state having
no outgoing edge from it.
Example-
Step-04:
In the end,
Only an initial state going to the final state will be left.
The cost of this transition is the required regular expression.
NOTE
The state elimination method can be applied to any finite automata.
(NFA, ∈-NFA, DFA etc)
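The steps above can be sketched as a short program. This is a minimal illustration under stated assumptions rather than the exact procedure of the notes: it always adds a fresh initial state `qi` and a fresh final state `qf` (which covers Steps 1 to 3 in one go), writes the resulting expression in Python `re` syntax, and uses a hypothetical two-state DFA that accepts binary strings with an odd number of 0's. All function and variable names are introduced here.

```python
import re
from itertools import product

def union(r, s):
    """Union of two edge labels; None means there is no edge."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def star(r):
    """Closure of a self-loop label; no loop contributes nothing."""
    return "" if r in (None, "") else f"(?:{r})*"

def state_elimination(states, delta, start, finals):
    # edge[(p, q)] is the regular expression (the "cost") on edge p -> q
    edge = {}
    for (p, symbol), q in delta.items():
        edge[(p, q)] = union(edge.get((p, q)), symbol)
    edge[("qi", start)] = ""       # new initial state with a null edge
    for f in finals:
        edge[(f, "qf")] = ""       # new single final state with null edges
    remaining = ["qi", "qf"] + list(states)   # assumes no state is named qi or qf
    for k in states:               # eliminate the original states one by one
        remaining.remove(k)
        loop = star(edge.get((k, k)))
        for p in remaining:
            for q in remaining:
                if (p, k) in edge and (k, q) in edge:
                    # path p -> k -> q becomes a direct edge p -> q
                    path = edge[(p, k)] + loop + edge[(k, q)]
                    edge[(p, q)] = union(edge.get((p, q)), path)
        edge = {pq: r for pq, r in edge.items() if k not in pq}
    return edge.get(("qi", "qf"))  # None would mean the empty language

# Hypothetical DFA: start state A, final state B, odd number of 0's accepted.
STATES = ["A", "B"]
DELTA = {("A", "0"): "B", ("A", "1"): "A", ("B", "0"): "A", ("B", "1"): "B"}
regex = state_elimination(STATES, DELTA, "A", {"B"})
print(regex)

def dfa_accepts(word):
    state = "A"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "B"

words = ["".join(t) for n in range(9) for t in product("01", repeat=n)]
print(all(bool(re.fullmatch(regex, w)) == dfa_accepts(w) for w in words))  # True
```

The printed expression is fully parenthesised, so it is longer than a hand-simplified answer such as 1*0(1 + 01*0)*, but the final check compares it with the DFA on every binary string up to length 8.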
Problem-01:
Solution-
Step-01:
Step-02:
Final state B has an outgoing edge.
So, we create a new final state qf.
Step-03:
There is a path going from state qi to state B via state A.
So, after eliminating state A, we put a direct path from state qi to state B having cost ∈.0
= 0
There is a loop on state B using state A.
So, after eliminating state A, we put a direct loop on state B having cost 1.0 = 10.
Step-04:
NOTE-
Problem-02:
Solution-
Step-01:
Step-03:
Step-05:
Problem-03:
Solution-
Step-01:
Step-02:
Step-03:
So, after eliminating state q1, we put a direct path from state qi to state q2 having cost
∈.c*.a = c*a
Step-04:
Problem-04:
Solution-
Step-01:
Step-02:
Step-03:
Step-04:
So, after eliminating state A, we put a direct path from state qi to state qf having cost
∈.b*.(aa*(bb*+∈)+∈) = b*(aa*(bb*+∈)+∈)
From here,
We know, bb* + ∈ = b*
So, we can also write-
Solution-
Step-01:
Since initial state A has an incoming edge, we create a new initial state qi.
Since final state A has an outgoing edge, we create a new final state qf.
Step-03:
There is a path going from state qi to state qf via state A.
So, after eliminating state A, we put a direct path from state qi to state qf having cost ∈.
From here,
Problem-06:
From here,
Regular Expression = a
Problem-07:
Step-01:
Step-03: