Theory of Automata Notes
Definition − The set ∑+ is the infinite set of all possible strings of all possible
lengths over ∑, excluding λ.
Representation − ∑+ = ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …
∑+ = ∑* − { λ }
Example − If ∑ = { a, b }, then ∑+ = { a, b, aa, ab, ba, bb, … }
In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene
closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In
mathematics it is more commonly known as the free monoid construction. The application of the
Kleene star to a set V is written as V*. It is widely used for regular expressions, which is the context in
which it was introduced by Stephen Kleene to characterize certain automata, where it means "zero
or more repetitions".
1. If V is a set of strings, then V* is defined as the smallest superset of V that contains
the empty string ε and is closed under the string concatenation operation.
2. If V is a set of symbols or characters, then V* is the set of all strings over symbols in V,
including the empty string ε.
The set V* can also be described as the set containing the empty string and all finite-length strings that
can be generated by concatenating arbitrary elements of V, allowing the use of the same element
multiple times. If V is either the empty set ∅ or the singleton set {ε}, then V* = {ε}; if V is any other finite
set or countably infinite set, then V* is a countably infinite set.[1] As a consequence, each formal
language over a finite or countably infinite alphabet ∑ is countable, since it is a subset of the countably
infinite set ∑*.
The operators are used in rewrite rules for generative grammars.
Given a set V, define
V^0 = {ε} (the language consisting only of the empty string),
V^1 = V,
and define recursively the set
V^(i+1) = { wv : w ∈ V^i and v ∈ V } for each i > 0.
If V is a formal language, then V^i, the i-th power of the set V, is a shorthand for
the concatenation of set V with itself i times. That is, V^i can be understood to be the set of
all strings that can be represented as the concatenation of i strings in V.
The definition of Kleene star on V is[2]
V* = V^0 ∪ V^1 ∪ V^2 ∪ V^3 ∪ …
This means that the Kleene star operator is an idempotent unary operator: (V*)* = V* for any
set V of strings or characters, as (V*)^i = V* for every i ≥ 1.
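The recursive definition can be tried out on small sets. The following is a minimal Python sketch, not part of the original notes: because V* is usually infinite, the hypothetical helper kleene_star only enumerates strings up to an assumed cut-off max_len, and kleene_plus builds V+ as V followed by V*.

```python
from itertools import product

def kleene_star(V, max_len):
    """Return every string of V* whose length is at most max_len.

    V is a finite set of strings. The cut-off is an assumption of this
    sketch, since V* is infinite unless V is empty or equal to {""}.
    """
    result = {""}      # V^0 = {ε}
    frontier = {""}    # strings discovered in the previous round
    while frontier:
        # Extend each newly found string by one more element of V.
        frontier = {w + v for w, v in product(frontier, V)
                    if len(w + v) <= max_len} - result
        result |= frontier
    return result

def kleene_plus(V, max_len):
    """Return every string of V+ (one or more elements of V) up to max_len."""
    return {v + w for v, w in product(V, kleene_star(V, max_len))
            if len(v + w) <= max_len}

sigma = {"a", "b"}
print(sorted(kleene_star(sigma, 2), key=lambda s: (len(s), s)))
print(sorted(kleene_plus(sigma, 2), key=lambda s: (len(s), s)))
```

Applying kleene_star to its own result with the same cut-off returns the same set, which mirrors the idempotence property within the bounded range.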
Kleene’s Theorem
A language is said to be regular if it can be represented by a finite automaton or if
a regular expression can be written for it. This leads to the general statement that
for every regular expression there is a finite automaton that accepts the
corresponding language.
For simple expressions such as (a+b), ab and (a+b)*, it is fairly easy to construct the
finite automaton by intuition. The problem arises when we are given
a longer regular expression. This calls for a systematic approach
to FA generation, which Kleene put forward in Kleene’s Theorem-I.
Kleene’s Theorem-I:
For any regular expression r that represents the language L(r), there is a finite automaton
that accepts the same language.
To understand Kleene’s Theorem-I, recall the basic definition of a regular
expression: ∅, ε and a single input symbol “a” are regular expressions, and
larger expressions are built from them with the following operations.
Let r1 and r2 be two regular expressions. Then,
1. r1 + r2 is a regular expression too, whose corresponding language is L(r1) ∪ L(r2)
2. r1 . r2 is a regular expression too, whose corresponding language is L(r1).L(r2)
3. r1* is a regular expression too, whose corresponding language is L(r1)*
Using this definition together with null transitions, we can build an FA
by combining two or more smaller finite automata (each corresponding to a
regular expression).
Let S accept L = {a} and T accept L = {b}, then R can be represented as a combination of
S and T using the provided operations as:
R = S + T
We observe that,
1. For the union operation we add a new start state, from which null
transitions lead to the start states of both finite automata.
2. The final states of both finite automata become intermediate
states. They are unified into a single new final state, reached from each of them
by a null transition.
R = S.T
We observe that,
1. For the concatenation operation we keep the start state of S.
The only change is to the final state of S, which becomes an
intermediate state followed by a null transition.
2. The null transition leads to the start state of T, and the final state of T is
used as the final state of R.
R = S*
We observe that,
1. A new start state is added, and S is placed as an intermediate component so
that the looping (repetition) can be incorporated.
2. The start and final states are defined separately so that the looping
is not disturbed.
Now that we know the general operations, let’s see how Kleene’s Theorem-I
can be used to generate an FA for a given regular expression.
Example:
Make a finite automaton for the expression (ab+a)*
The automaton is obtained by applying the constructions in order: concatenation for ab,
union with a, then star on the result. Kleene’s Theorem-I thus gives a systematic approach to
generating a finite automaton for a given regular expression.
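The three constructions can be sketched in code. The Python below is an illustrative sketch, not the notes' own method: it uses the standard Thompson-style construction, which follows the same idea of gluing smaller automata with null transitions. The names Fragment, symbol, union, concat, star and accepts are hypothetical helpers introduced here.

```python
from itertools import count

EPS = None        # label used for null (ε) transitions
_ids = count()    # source of fresh state numbers

class Fragment:
    """An NFA piece with one start state, one final state and a list of edges."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def symbol(a):
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, a, f)])

def union(S, T):
    # New start state with null transitions into S and T; one shared final state.
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, S.start), (s, EPS, T.start),
             (S.final, EPS, f), (T.final, EPS, f)]
    return Fragment(s, f, S.edges + T.edges + extra)

def concat(S, T):
    # The final state of S becomes intermediate and leads to the start of T.
    return Fragment(S.start, T.final,
                    S.edges + T.edges + [(S.final, EPS, T.start)])

def star(S):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, S.start), (S.final, EPS, f),
             (S.final, EPS, S.start),   # loop back for another repetition
             (s, EPS, f)]               # allow zero repetitions
    return Fragment(s, f, S.edges + extra)

def eps_closure(states, edges):
    """All states reachable from `states` using only null transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, word):
    current = eps_closure({nfa.start}, nfa.edges)
    for ch in word:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = eps_closure(moved, nfa.edges)
    return nfa.final in current

# (ab + a)* built bottom-up from the three operations
nfa = star(union(concat(symbol("a"), symbol("b")), symbol("a")))
print(accepts(nfa, "aab"), accepts(nfa, "abb"))
```

The resulting automaton has more states than a hand-drawn one would, which is the usual price of a mechanical construction.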
Consider an automaton in which, from state q0 on input a, there are two next
states q1 and q2, and similarly, from q0 on input b, the next states are q0 and q1. For a
given input it is therefore not fixed or determined which state comes next. Such an FA is
called a non-deterministic finite automaton (NFA).
An NFA is described by a 5-tuple (Q, ∑, δ, q0, F) whose transition function is
δ: Q × ∑ → 2^Q
where,
1. Q: finite set of states
2. ∑: finite set of input symbols
3. q0: initial state
4. F: set of final states
5. δ: transition function
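Graphical Representation of an NFA
An NFA can be represented by a digraph called a state diagram, in which:
1. Each state is represented by a vertex.
2. Each arc labeled with an input symbol shows a transition.
3. The initial state is marked with an incoming arrow.
4. Each final state is drawn as a double circle.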
Example 1:
1. Q = {q0, q1, q2}
2. ∑ = {0, 1}
3. Initial state: q0
4. F = {q2}
Solution:
Transition Table (→ marks the initial state, * marks a final state):
Present State    Next State on 0    Next State on 1
→q0              q0, q1             q1
q1               q2                 q0
*q2              q2                 q1, q2
In the above table, we can see that when the current state is q0, on input 0 the next
state will be q0 or q1, and on input 1 the next state will be q1. When the current state is
q1, on input 0 the next state will be q2, and on input 1 the next state will be q0. When
the current state is q2, on input 0 the next state is q2, and on input 1 the next state will
be q1 or q2.
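The table of Example 1 can be run directly by tracking the set of states the NFA might be in. This is a small Python sketch under the assumption that the table is stored as a dictionary from (state, symbol) to a set of next states; the name nfa_accepts is hypothetical.

```python
# Transition table of Example 1: (state, input) -> set of possible next states
delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q1"},
    ("q1", "0"): {"q2"},       ("q1", "1"): {"q0"},
    ("q2", "0"): {"q2"},       ("q2", "1"): {"q1", "q2"},
}

def nfa_accepts(delta, start, finals, word):
    """Simulate an NFA without null transitions on the given word."""
    current = {start}
    for symbol in word:
        # Union of the next-state sets of every state we might be in.
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & finals)

print(nfa_accepts(delta, "q0", {"q2"}, "00"))
print(nfa_accepts(delta, "q0", {"q2"}, "11"))
```

A word is accepted when at least one of the possible current states after the last symbol is final.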
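Example 2:
NFA with ∑ = {0, 1} that accepts all strings starting with 01.
Solution:
Transition Table (∅ means that no transition is defined):
Present State    Next State on 0    Next State on 1
→q0              q1                 ∅
q1               ∅                  q2
*q2              q2                 q2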
Example 3:
NFA with ∑ = {0, 1} that accepts all strings of length at least 2.
Solution:
Transition Table (∅ means that no transition is defined):
Present State    Next State on 0    Next State on 1
q1               q2                 q2
*q2              ∅                  ∅
In a deterministic finite automaton (DFA), each state has exactly one next state for
each input. For example, from state q0 there is only one path for input a, going to q1,
and similarly only one path for input b, going to q2.
Formal Definition of DFA
A DFA is a 5-tuple (Q, ∑, δ, q0, F), as described in the definition of FA:
1. Q: finite set of states
2. ∑: finite set of input symbols
3. q0: initial state
4. F: set of final states
5. δ: transition function
The transition function is defined as:
δ: Q × ∑ → Q
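Graphical Representation of DFA
A DFA can be represented by a digraph called a state diagram, in which:
1. Each state is represented by a vertex.
2. Each arc labeled with an input symbol shows a transition.
3. The initial state is marked with an incoming arrow.
4. Each final state is drawn as a double circle.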
Example 1:
1. Q = {q0, q1, q2}
2. ∑ = {0, 1}
3. Initial state: q0
4. F = {q2}
Solution:
Transition Table (→ marks the initial state, * marks a final state):
Present State    Next State on 0    Next State on 1
→q0              q0                 q1
q1               q2                 q1
*q2              q2                 q2
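Because a DFA has exactly one next state per input, simulating it only needs a single current state. The Python sketch below encodes the table of Example 1 as a dictionary (an assumed representation) and uses a hypothetical helper dfa_accepts. Tracing the table, this DFA reaches q2 exactly when the input contains a 1 followed later by a 0 with only 1s in between, that is, when it contains the substring 10.

```python
# Transition table of Example 1: (state, input) -> next state
dfa_delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}

def dfa_accepts(delta, start, finals, word):
    """Run the DFA on word and report whether it stops in a final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]   # exactly one successor per input
    return state in finals

for w in ["10", "0110", "0011", ""]:
    print(repr(w), dfa_accepts(dfa_delta, "q0", {"q2"}, w))
```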
Example 2:
DFA with ∑ = {0, 1} that accepts all strings starting with 0.
Solution:
Explanation:
On input 0 in state q0, the DFA changes state to q1, so every string that starts with 0
ends in the final state q1. It can accept 00, 01, 000, 001, etc. It cannot accept any
string that starts with 1, because it never reaches the final state on a string
starting with 1.
Example 3:
DFA with ∑ = {0, 1} that accepts all strings ending with 0.
Solution:
Explanation:
On input 0 in state q0, the DFA changes state to q1, which is the final state. It can
accept any string that ends with 0, such as 00, 10, 110, 100, etc. It cannot accept a
string that ends with 1, because an input of 1 never leads to the final state q1, so
any string ending with 1 is rejected.
Mealy Machine
A Mealy machine is a machine in which the output symbol depends on the present input
symbol and the present state of the machine. In a Mealy machine diagram, the output is
written next to each input symbol on every transition, separated by /. The Mealy machine
can be described by a 6-tuple (Q, q0, ∑, O, δ, λ') where
1. Q: finite set of states
2. q0: initial state of the machine
3. ∑: finite input alphabet
4. O: output alphabet
5. δ: transition function, Q × ∑ → Q
6. λ': output function, Q × ∑ → O
Example 1:
Design a Mealy machine for a binary input sequence such that it outputs A when the
input contains the substring 101, B when it contains the substring 110, and C otherwise.
Solution: To design such a machine, we check for two patterns,
101 and 110. When 101 is recognized, the output is A. When 110 is recognized, the output is
B. For all other inputs the output is C.
We then add the transitions on 0 and 1 for each state to complete the Mealy
machine.
Example 2:
Design a Mealy machine that scans a sequence of 0s and 1s and generates output
'A' if the input string terminates in 00, output 'B' if the string terminates in 11, and
output 'C' otherwise.
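One possible solution to Example 2 can be written as two tables, one for δ and one for λ'. The Python sketch below is an assumed design, not necessarily the one in the missing diagram. State q0 is the start state, q1 means the last symbol was 0, and q2 means the last symbol was 1. The helper run_mealy is hypothetical.

```python
# delta: (state, input) -> next state
mealy_delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
}
# lambda': (state, input) -> output symbol
mealy_out = {
    ("q0", "0"): "C", ("q0", "1"): "C",
    ("q1", "0"): "A", ("q1", "1"): "C",   # 0 after 0: input so far ends in 00
    ("q2", "0"): "C", ("q2", "1"): "B",   # 1 after 1: input so far ends in 11
}

def run_mealy(delta, out, start, word):
    """Return the output string produced while reading word."""
    state, outputs = start, []
    for symbol in word:
        outputs.append(out[(state, symbol)])  # output depends on state and input
        state = delta[(state, symbol)]
    return "".join(outputs)

print(run_mealy(mealy_delta, mealy_out, "q0", "0110"))
```

The machine emits one output symbol per input symbol, so the last symbol of the output tells how the whole input terminates.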
Type 0 Grammar:
Type 0 grammar is known as Unrestricted grammar. There is no restriction on the
production rules of these grammars. The languages they generate are exactly those
recognized by Turing machines.
For example:
1. bAa → aa
2. S → s
Type 1 Grammar:
Type 1 grammar is known as Context Sensitive Grammar. The context sensitive grammar
is used to represent context sensitive languages. The context sensitive grammar follows
the following rules:
o The context sensitive grammar may have more than one symbol on the left-hand side of
its production rules.
o The number of symbols on the left-hand side must not exceed the number of symbols
on the right-hand side.
o A rule of the form A → ε is not allowed unless A is the start symbol and A does not
occur on the right-hand side of any rule.
o Every Type 1 grammar is also a Type 0 grammar. In Type 1, productions are of the form
V → T, where the number of symbols in V is at most the number of symbols in T.
For example:
1. S → AT
2. T → xy
3. A → a
Type 2 Grammar:
Type 2 Grammar is known as Context Free Grammar. Context free languages are the
languages which can be generated by a context free grammar (CFG). Every context free
language is also context sensitive. The production rule is of the form
1. A → α
where A is any single non-terminal and α is any combination of terminals and
non-terminals.
For example:
1. A → aBb
2. A → b
3. B → a
Type 3 Grammar:
Type 3 Grammar is known as Regular Grammar. Regular languages are those languages
which can be described using regular expressions. These languages can be recognized by
an NFA or a DFA.
Type 3 is the most restricted form of grammar. Every regular language is also context
free and context sensitive. Type 3 productions should be in the form of
1. V → T*V / T*
where V is a single non-terminal and T* is a string of zero or more terminals.
For example:
1. A → xy
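The restrictions of the four types can be checked mechanically on the shape of each production. The Python sketch below is a rough illustration under stated assumptions: every symbol is one character, upper-case letters are non-terminals, ε is written as the empty string, and the functions rule_type and grammar_type are hypothetical helpers. It does not check that the start symbol stays off right-hand sides.

```python
def rule_type(lhs, rhs):
    """Most restrictive type (3, 2, 1 or 0) whose rule shape lhs -> rhs fits."""
    if len(lhs) == 1 and lhs.isupper():
        # Strip one trailing non-terminal, then look for any others.
        body = rhs[:-1] if rhs and rhs[-1].isupper() else rhs
        if not any(c.isupper() for c in body):
            return 3                      # V -> T*V or V -> T*
        return 2                          # single non-terminal on the left
    return 1 if len(lhs) <= len(rhs) else 0

def grammar_type(rules, start="S"):
    """Most restrictive type that every rule of the grammar satisfies."""
    level = min(rule_type(lhs, rhs) for lhs, rhs in rules)
    # In Type 1, an ε-rule is only tolerated for the start symbol.
    if level == 1 and any(rhs == "" and lhs != start for lhs, rhs in rules):
        level = 0
    return level

print(grammar_type([("bAa", "aa"), ("S", "s")]))
print(grammar_type([("S", "AT"), ("T", "xy"), ("A", "a")]))
print(grammar_type([("A", "xy")]))
```

Under these assumptions the Type 1 example grammar (S → AT, T → xy, A → a) is reported as Type 2, since each of its rules has a single non-terminal on the left. That is consistent with the hierarchy: a grammar that meets a stricter type also meets the looser ones.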
Ambiguity in Grammar
A grammar is said to be ambiguous if, for some input string, there exists more than one
leftmost derivation, more than one rightmost derivation, or more than one parse
tree. If the grammar is not ambiguous, then it is called unambiguous.
A grammar that has ambiguity is not good for compiler construction. No general method
can automatically detect and remove ambiguity, but it can often be removed by
rewriting the grammar.
Example 1:
Let us consider a grammar G with the production rule
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9
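Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees. One applies
E → E + E first and groups the string as (3 * 2) + 5. The other applies E → E * E first
and groups it as 3 * (2 + 5).
Since there are two parse trees for the single string "3 * 2 + 5", the grammar G is
ambiguous.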
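The number of parse trees can be counted by trying every operator as the top-level split. The Python sketch below is a simplified illustration: it covers only the rules E → E + E, E → E * E and single digits, and leaves out parentheses and ε. The function name count_trees is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(tokens):
    """Count parse trees of a tuple of one-character tokens."""
    if len(tokens) == 1:
        return 1 if tokens[0].isdigit() else 0
    total = 0
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            # tok is the root operator; both sides must parse on their own.
            total += count_trees(tokens[:i]) * count_trees(tokens[i + 1:])
    return total

print(count_trees(tuple("3*2+5")))
```

Any count above 1 shows that the string has more than one parse tree under these rules.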
Example 2:
Check whether the given grammar G is ambiguous or not.
1. E → E + E
2. E → E - E
3. E → id
Solution:
From the above grammar, the string "id + id - id" can be derived in 2 ways:
First leftmost derivation:
1. E → E + E
2. → id + E
3. → id + E - E
4. → id + id - E
5. → id + id - id
Second leftmost derivation:
1. E → E - E
2. → E + E - E
3. → id + E - E
4. → id + id - E
5. → id + id - id
Since there are two leftmost derivations for the single string "id + id - id", the grammar G is
ambiguous.
Example 3:
Check whether the given grammar G is ambiguous or not.
1. S → aSb | SS
2. S → ε
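Solution:
For the string "aabb" the above grammar can generate two parse trees, for example one that
starts with S → aSb and one that starts with S → SS in which one of the two S symbols derives ε.
Since there are two parse trees for the single string "aabb", the grammar G is ambiguous.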
Example 4:
Check whether the given grammar G is ambiguous or not.
1. A → AA
2. A → (A)
3. A → a
Solution:
For the string "a(a)aa" the above grammar can generate more than one parse tree, because
A → AA can group the four factors a, (a), a, a in different ways.
Since there are at least two parse trees for the single string "a(a)aa", the grammar G is ambiguous.
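Unambiguous Grammar
A grammar is unambiguous if it does not contain ambiguity, that is,
if no input string has more than one leftmost derivation, more than one rightmost
derivation, or more than one parse tree.
To convert an ambiguous grammar into an unambiguous one, apply the following rules:
1. If left-associative operators (such as +, -, *, /) are used in the production rule, then
apply left recursion in the production rule. Left recursion means that the leftmost symbol
on the right side is the same as the non-terminal on the left side. For example,
1. X → Xa
2. If the right-associative operator (^) is used in the production rule, then apply right
recursion in the production rule. Right recursion means that the rightmost symbol on
the right side is the same as the non-terminal on the left side. For example,
1. X → aX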
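Example 1:
Consider the grammar G given as follows, and check whether it is ambiguous:
1. S → AB | aaB
2. A → a | Aa
3. B → b
Solution:
The string aab has two parse trees, one starting with S → AB and one starting with
S → aaB, so G is ambiguous. Removing the redundant alternative aaB gives an equivalent
unambiguous grammar:
1. S → AB
2. A → Aa | a
3. B → b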
Example 2:
Show that the given grammar is ambiguous. Also, find an equivalent unambiguous
grammar.
1. S → ABA
2. A → aA | ε
3. B → bB | ε
Solution:
The given grammar is ambiguous because we can derive two different parse trees for
the string aa.
The unambiguous grammar is:
1. S → aXY | bYZ | ε
2. Z → aZ | a
3. X → aXY | a | ε
4. Y → bYZ | b | ε
Example 3:
Show that the given grammar is ambiguous. Also, find an equivalent unambiguous
grammar.
1. E → E + E
2. E → E * E
3. E → id
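Solution:
The string id + id * id has two parse trees, so the given grammar is ambiguous. An equivalent
unambiguous grammar, in which * binds tighter than + and both are left associative, is: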
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → id
Example 4:
Check whether the given grammar is ambiguous. Also, find an equivalent
unambiguous grammar.
1. S → S + S
2. S → S * S
3. S → S ^ S
4. S → a
Solution:
The given grammar is ambiguous because a string such as a + a * a has more than one
parse tree. An equivalent unambiguous grammar is:
1. S → S + A | A
2. A → A * B | B
3. B → C ^ B | C
4. C → a
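The effect of left and right recursion on grouping can be seen with a small parser. The Python sketch below is an illustration only. It takes the grammar as S → S + A | A, A → A * B | B, B → C ^ B | C, C → a, replaces the left-recursive rules by loops (a common implementation choice), and returns a fully parenthesised string. The function parse is a hypothetical helper.

```python
def parse(text):
    """Return text with the grouping implied by the grammar made explicit."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1

    def parse_c():                  # C -> a
        eat("a")
        return "a"

    def parse_b():                  # B -> C ^ B | C  (right recursion)
        left = parse_c()
        if peek() == "^":
            eat("^")
            return f"({left}^{parse_b()})"
        return left

    def parse_a():                  # A -> A * B | B  (left recursion as a loop)
        left = parse_b()
        while peek() == "*":
            eat("*")
            left = f"({left}*{parse_b()})"
        return left

    def parse_s():                  # S -> S + A | A  (left recursion as a loop)
        left = parse_a()
        while peek() == "+":
            eat("+")
            left = f"({left}+{parse_a()})"
        return left

    tree = parse_s()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

print(parse("a + a * a"))
print(parse("a ^ a ^ a"))
```

Each input has exactly one grouping: + and * associate to the left, ^ associates to the right, and ^ binds tighter than *, which binds tighter than +.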
Strings:
A string is a finite ordered sequence of symbols chosen from some
alphabet ∑. For example, ‘aababbbbaa’ is a valid string over the alphabet ∑
= {a, b}; similarly, ‘001111000101’ is a valid string over the alphabet ∑ = {0,
1}.
Empty string:
Every alphabet ∑ has a special string called the empty string, which is the
string with zero occurrences of symbols. This string is represented by λ, e or ε. It
is a string over any alphabet whatsoever.
Length of a string:
The length of a string is the number of occurrences of symbols from ∑ in it. If S
denotes a string over the alphabet ∑, then its length is represented by |S|.
For instance, ‘001110’ is a string over the alphabet ∑ = {0, 1} and has length 6.
Similarly, if ∑ = {a, b} and S = ‘aabbabbba’, then |S| = 9.
Introduction to Grammars
In the literary sense of the term, grammars denote syntactical rules for conversation in
natural languages. Linguists have attempted to define grammars since the inception
of natural languages like English, Sanskrit, Mandarin, etc.
The theory of formal languages is applied extensively in the field of
Computer Science. Noam Chomsky gave a mathematical model of grammar in 1956
which is effective for writing computer languages.
Grammar
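A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
N or VN is a set of variables or non-terminal symbols.
T or ∑ is a set of terminal symbols.
S is a special variable called the start symbol, S ∈ N.
P is a set of production rules of the form α → β, where α and β are strings over
VN ∪ ∑ and at least one symbol of α belongs to VN.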
Example
Grammar G1 −
({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
Here,
S, A, and B are Non-terminal symbols;
a and b are Terminal symbols
S is the Start symbol, S ∈ N
Productions, P : S → AB, A → a, B → b
Example
Grammar G2 −
({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε})
Here,
S and A are Non-terminal symbols.
a and b are Terminal symbols.
ε is an empty string.
S is the Start symbol, S ∈ N
Production P : S → aAb, aA → aaAb, A → ε
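To see which strings such grammars generate, the productions can be applied as string rewriting rules. The Python sketch below is illustrative only. It assumes one-character symbols with upper-case non-terminals, writes ε as the empty string, and searches breadth-first over sentential forms. The name generate and the two pruning bounds are assumptions of this sketch, added only to force termination, so derivations that pass through longer forms would be missed.

```python
from collections import deque

def generate(productions, start, max_len):
    """Return the terminal strings of length <= max_len found by the search."""
    def terminals(form):
        return sum(not c.isupper() for c in form)

    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            words.add(form)               # no non-terminals left
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:                # rewrite every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if (new not in seen and terminals(new) <= max_len
                        and len(new) <= 2 * max_len + 1):
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words

G1 = [("S", "AB"), ("A", "a"), ("B", "b")]
G2 = [("S", "aAb"), ("aA", "aaAb"), ("A", "")]
print(generate(G1, "S", 2))
print(sorted(generate(G2, "S", 6), key=len))
```

For G1 the search finds the single string ab, and for G2 it finds strings with equal numbers of a's followed by b's up to the chosen length.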
Example