0% found this document useful (0 votes)
34 views31 pages

Chapter 4

Regular expressions provide a precise way to define formal languages. They use operations like *, +, and () to describe strings of characters. Any language that can be defined by a regular expression is considered a regular language. All finite languages are regular because they can be expressed as a single regular expression listing all possible strings. Regular expressions allow unambiguous definitions of languages in a way that removes all ambiguity.

Uploaded by

ruba
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views31 pages

Chapter 4

Regular expressions provide a precise way to define formal languages. They use operations like *, +, and () to describe strings of characters. Any language that can be defined by a regular expression is considered a regular language. All finite languages are regular because they can be expressed as a single regular expression listing all possible strings. Regular expressions allow unambiguous definitions of languages in a way that removes all ambiguity.

Uploaded by

ruba
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 31

Regular Expressions

Defining Languages Using


Regular Expressions
 Previously, we defined the languages:
• L1 = {Xn for n = 1, 2, 3, . . .}
• L2 = {x, xxx, xxxxx, . . .}
 But these are not very precise ways of
defining languages.
 So, now we want a very precise way of
defining a languages, and we will do this
using regular expressions
Regular Expressions
 Regular expressions are written in bold face letters and
are a way of specifying the language.
 Formal way to define the lexical specifications of a
language
 Remove ambiguity altogether
 Called expressions on account of similarity with
arithmetic expressions
 Use *, + and ()
 * shows repetition
 + presents choice or disjunction (some time authors used
U for this purpose)
 () used for grouping
Language-Defining Symbols:
Star Sign
 We now introduce the use of the Kleene star, applied not
to a set, but directly to the letter x and written as a
superscript: x*.
 This simple expression indicates some sequence of x’s
(may be none at all):
x* = Λ or x or x2 or x3…
= xn for some n = 0, 1, 2, 3, …

 We can think of the star as an unknownpower.


 That is, x* stands for a string of x’s, but we do not
specify how many, and it may be the null string .

4
 The notation x* can be used to define languages
by writing, say L = language (x*)

 Since x* is any string of x’s, L is then the


language of all possible strings of x’s of any
length (including Λ).

5
 Given the alphabet = {a, b}, suppose we wish to define the
language L that contains all words of the form one a followed by
some number of b’s (maybe no b’s at all); that is
L = {a, ab, abb, abbb, abbbb, …}

 Using the language-defining symbol, we may write


L = language (ab*)

 This equation obviously means that L is the language in which the


words are the concatenation of an initial a with some or no b’s.

6
 We can apply the Kleene star to the whole
string ab if we want:
(ab)* = Λ or ab or abab or ababab…
 Observe that
(ab)* ≠ a*b*
 because the language defined by the
expression on the left contains the word
abab, whereas the language defined by
7
the expression on the right does not.
 If we want to define the language L1 = {x; xx; xxx; …}
using the language-defining symbol, we can write
L1 = language(xx*)
which means that each word of L1 must start with an x
followed by some (or no) x’s.

 Note that we can also define L1 using the notation + (as


an exponent) introduced in previous lecture
L1 = language(x+)

8
Plus Sign

 Let us introduce another use of the plus


sign. By the expression
x+y
where x and y are strings of characters
from an alphabet, we mean either x or y.

 Care should be taken so as not to confuse


this notation with the notation + (as an
9
exponent).
Example

 Consider the language T over the alphabet


Σ = {a; b; c}:
 T = {a; c; ab; cb; abb; cbb; abbb; cbbb; abbbb;
cbbbb; …}
 In other words, all the words in T begin with
either an a or a c and then are followed by some
number of b’s.
 Using the above plus sign notation, we may
write this as
10
T = language((a+ c)b*)
Example

 Consider a finite language L that contains


all the strings of a’s and b’s of length
three exactly:
L = {aaa, aab, aba, abb, baa, bab, bba,
bbb}
 Thus, we may write
L = language((a+ b)(a + b)(a + b))

11
Example

 In general, if we want to refer to the set of all possible


strings of a’s and b’s of any length whatsoever, we could
write
language((a+ b)*)

 This is the set of all possible strings of letters from


the alphabet Σ = {a, b}, including the null string.

12
Regular Expressions
 Given  = {a,b}
 a* = {Λ, a,aa,aaa,aaa,aaaa,aaaaa, …}
 ab* = {a, ab,abb,abbb,abbbb, …}
 a+b = {a,b}
 (ab)* = {Λ, ab, abab, ababab, …}
 (a+b)* = {Λ, any string of as and bs}
Formal Definition of Regular
Expressions
 The set of regular expression is defined by
following rules
1. Every letter of  and Λ is a regular
expression
2. If r1 and r2 are regular expressions, then so
are
 (r1)
 r1r2
 r1+r2
 r1*

3.Nothing else is a regular expression


Regular Expressions
 Whether following are RE if so what
languages do they generate
a (b + a)*
 bb(a+b)
 (a+b)(a+b)(a+b)
 (a+b)*ba
 (a+b)*a(a+b)*
 (a+b)*aa(a+b)*
Regular Expressions
 Write RE for the following languages over
the  ={a,b}.
 All words ending with b
 All words that start with a
 All words that start with a double letter
 All words that contain at least one double letter
 All words that start and end with a double letter
 All words of length >=3
 All words that contain exactly one a or exactly
one b
 All words that don’t end at b
Regular Expressions
 Example: Give a regular expression for each of the
following over the alphabet { 0, 1 }:

 { w | w begins with a ‘1’ and ends with a ‘0’ }

 { w | w contains exactly three 1’s}

 { w | w contains at least three 1’s}

 { w | w is a string that begin with a ‘1’ and contain


exactly two 0’s }

 Regular expression definition of a language is not unique.

17
[Section 1.3]
Regular expressions

Examples: give regular expressions for the


following languages:
- { w ε {0,1}* | w contains the substring 01 }
or
- { w ε ∑* | w contains the substring 01 }
- {w in {0,1}* | second symbol of w is a 1}
- { w ε {0,1}* | |w| < 4 }
Some exercises on regular expressions

 Example: What is L((a  b)*a(a  b)*)?


 Ans: {w in {a, b}* | w contains at least one
a}
 Write regular expressions for:
1. {w in {a,b}* | |w| is odd }.
2. {w in {a,b}* | w does not have ab as a substring}.
3. {w in {a,b,c}* | no b in w can come before any c in w}.

19
Regular Expression
Identities
Regular Expression Identities
1. u = u = u
2. * = 
3. u+v=v+u
4. u+u=u
5. u* = (u*)*
6. u (v + w) = uv + uw
7. (u + v) w = uw + vw

20
Languages Associated
with Regular
Expressions
Definition

 The following rules define the language associated


with any regular expression:

 Rule 1: The language associated with the regular


expression that is just a single letter is that one-letter
word alone, and the language associated with Λ is just
{Λ}, a one-word language.

 Rule 2: If r1 is a regular expression associated with the


language L1 and r2 is a regular expression associated
with the language L2, then:
(i) The regular expression (r1)(r2) is associated with the product
L1L2, that is the language L1 times the language L2:

language(r1r2) = L1L2
22
Definition contd.
 Rule 2 (cont.):

(ii) The regular expression r1 + r2 is associated


with the language formed by the union of L1
and L2:
language(r1 + r2) = L1 + L2

(iii) The language associated with the regular


expression (r1)* is L1*, the Kleene closure of
the set L1 as a set of words:
23 language(r1*) = L1*
Languages associated with REs
 r1= a, r2 = b, r3 = Λ
 IfL1 is associated with r1 and L2 is
associated r2
 Language(r1r2)= L1L2
 Language(r1+r2) = L1+L2 = L1 U L2
 Language(r1*) = L1* (Kleen’s Closure of L1)
Regular Languages
 How to tell whether a language is regular
 Define an RE for it, if it is possible the language
is Regular other wise non-regular
 Definition
 Any language that can be represented by a
regular expression is called a regular
language
 It is to be noted that if r1, r2 are regular
expressions, corresponding to the languages L1
and L2 then the languages generated by r1+
r2, r1r2( or r2r1) and r1*( or r2*) are also
regular languages.
Regular Languages
 Example
 Consider the language L, defined over
Σ = {a,b}, of strings of length 2,
starting with a, then
 L = {aa, ab}, may be expressed by
the regular expression aa+ab. Hence
L, by definition, is a regular language.
Regular Languages
 All finite languages are regular
 Example
 Consider the language L, defined over Σ
= {a,b}, of strings of length 2, starting
with a, then L = {aa, ab}, may be
expressed by the regular expression
aa+ab. Hence L, by definition, is a
regular language.
Theorem

 If L is a finite language (a language with only finitely many


words), then L can be defined by a regular expression. In
other words, all finite languages are regular.

Proof

 Let L be a finite language. To make one regular expression that


defines L, we turn all the words in L into boldface type and insert
plus signs between them.

 For example, the regular expression that defines the language


L = {baa, abbba, bababa} is (baa + abbba + bababa)

 This algorithm only works for finite languages because an infinite


language would become a regular expression that is infinitely long,
which is forbidden.
28
Equivalent Regular Expressions
 Definition
 Two regular expressions are said to be
equivalent if they generate the same
language.
 Example
 Consider the following regular
expressions
 r1 = (a + b)* (aa + bb)
 r2 = (a + b)*aa + ( a + b)*bb then both
regular expressions define the language
of strings ending in aa or bb
Example

 The language of all words that have at least two a’s can
be defined by the expression:
(a + b)*a(a + b)*a(a + b)*

 Another expression that defines all the words with at


least two a’s is
b*ab*a(a + b)*

 Hence, we can write


(a + b)*a(a + b)*a(a + b)* = b*ab*a(a + b)*

where by the equal sign we mean that these two


expressions are equivalent in the sense that they
30
describe the same language.
Example

 Let V be the language of all strings of a’s and b’s in


which either the strings are all b’s, or else an a followed
by some b’s. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, …}
 We can define V by the expression
b* + ab*
where Λ is included in b*.
 Alternatively, we could define V by
(Λ + a)b*
which means that in front of the string of some b’s, we
have
either an a or nothing.
31

You might also like