Theory in Programming Practice
Jayadev Misra
The University of Texas at Austin
Austin, Texas 78712, USA
email: [email protected]
“He who loves practice without theory is like the sailor who boards ship without
a rudder and compass and never knows where he may be cast.”
Leonardo da Vinci (1452–1519)
Contents

Preface

1 Text Compression
1.1 Introduction
1.2 A Very Incomplete Introduction to Information Theory
1.3 Huffman Coding
1.3.1 Uniquely Decodable Codes and Prefix Codes
1.3.2 Constructing An Optimal Prefix Code
1.3.3 Proof of Correctness
1.3.4 Implementation
1.4 Lempel-Ziv Coding

3 Cryptography
3.1 Introduction
3.2 Early Encryption Schemes
3.2.1 Substitution Cyphers
3.2.2 Electronic Transmission
3.3 Public Key Cryptography
3.3.1 Mathematical Preliminaries
3.3.2 The RSA Scheme
3.4 Digital Signatures and Related Topics
3.5 Block Cipher
Text Compression
1.1 Introduction
Data compression is useful and necessary in a variety of applications. These
applications can be broadly divided into two groups: transmission and storage.
Transmission involves sending a file, from a sender to a receiver, over a channel.
Compression reduces the number of bits to be transmitted, thus making the
transmission process more efficient. Storing a file in a compressed form typically
requires fewer bits, thus utilizing storage resources (including main memory
itself) more efficiently.
Data compression can be applied to any kind of data: text, image (such as
fax), audio and video. A 1-second video without compression takes around 20
megabytes (i.e., 160 megabits), and 2 minutes of CD-quality uncompressed music
(44,100 samples per second with 16 bits per sample) requires more than 84
megabits. Impressive gains can be made by compressing video, for instance,
because successive frames are very similar to each other in their contents. In
fact, real-time video transmission would be impossible without considerable
compression. There are several new applications that generate data at prodigious
rates; certain earth-orbiting satellites create around half a terabyte (10^12 bytes)
of data per day. Without compression there is no hope of storing such large files
in spite of the impressive advances made in storage technologies.
Lossy and Lossless Compression Most data types, except text, are com-
pressed in such a way that a very good approximation, but not the exact content,
of the original file can be recovered by the receiver. For instance, even though
the human voice can range up to 20kHz in frequency, telephone transmissions
retain only up to about 5kHz.1 The voice that is reproduced at the receiver’s
end is a close approximation to the real thing, but it is not exact. Try lis-
1 A famous theorem, known as the sampling theorem, states that the signal must be sampled
at twice this rate, i.e., around 10,000 times a second. Typically, 8 to 16 bits are produced for
each point in the sample.
white spaces are squeezed out, or the text is formatted slightly differently.
3 Knuth [30] gives a delightful treatment of a number of popular songs in this vein.
4 http://www.ibiblio.org/gutenberg/cgi-bin/sdb/t9.cgi/t9.cgi?entry=74&full=yes&ftpsite=http://www.ibiblio.org/gutenberg/
[Figure 1.1: The entropy function plotted against p, for 0.0 ≤ p ≤ 1.0]
Let us compute a few values of the entropy function. Suppose we have the
binary alphabet where the two symbols are equiprobable. Then, as is shown in
Figure 1.1,

h = −0.5 × log 0.5 − 0.5 × log 0.5 = 1

That is, you need 1 bit on the average to encode each symbol, so you cannot
compress such strings at all! Next, suppose the two symbols are not equiprob-
able; “0” occurs with probability 0.9 and “1” with 0.1. Then,

h = −0.9 × log 0.9 − 0.1 × log 0.1 ≈ 0.469
The text can be compressed to less than half its size. If the distribution is even
more lop-sided, say 0.99 probability for “0” and 0.01 for “1”, then h = 0.080; it
is possible to compress the file to 8% of its size. Note that Shannon’s theorem
does not say how to achieve this bound; we will see some schemes that
asymptotically achieve this bound.
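These values are easy to reproduce; a small sketch (the helper function is ours, not from the text):

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: h = -sum of p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 4))    # 1.0
print(round(entropy([0.9, 0.1]), 4))    # 0.469
print(round(entropy([0.99, 0.01]), 4))  # 0.0808
```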
Exercise 1
Show that for an alphabet of size m where all symbols are equally probable, the
entropy is log m. 2
Next, consider English text. The source alphabet is usually defined as the
26 letters and the space character. There are then several models for entropy.
The zero-order model assumes that the occurrence of each character is equally
likely. Using the zero-order model, the entropy is h = log 27 = 4.75. That is, a
string of length n cannot be encoded in fewer than 4.75 × n bits.
The zero-order model does not accurately describe English texts: letters
occur with different frequency. Six letters — ‘e’, ‘t’, ‘a’, ‘o’, ‘i’, ‘n’— occur
over half the time; see Tables 1.1 and 1.2. Others occur rarely, such as ‘q’
and ‘z’. In the first-order model, we assume that each symbol is statistically
independent (that is, the symbols are produced independently) but we take
into account the probability distribution. The first-order model is a better
predictor of frequencies and it yields an entropy of 4.219 bits/symbol. For a
source Roman alphabet that also includes the space character, a traditional
value is 4.07 bits/symbol.
Higher order models take into account the statistical dependence among the
letters, such as that ‘q’ is almost always followed by ‘u’, and that there is a
high probability of getting an ‘e’ after an ‘r’. A more accurate model of English
yields lower entropy. The third-order model yields 2.77 bits/symbol. Estimates
by Shannon [46] based on human experiments have yielded values as low as 0.6
to 1.3 bits/symbol.
The Braille code, developed for use by the blind, uses a 2 × 3 matrix of dots
where each dot is either flat or raised. The 6 dots provide 2^6 = 64 possible
combinations. After encoding all the letters, the remaining combinations are
assigned to frequently occurring words, such as “and” and “for”.
Example Let the symbols {a, c, g, t} have the probabilities 0.05, 0.5, 0.4, 0.05
(in the given order). We show three different codes, C1, C2 and C3, and the
associated expected code lengths in Table 1.3.
Exercise 2
Try to construct the best tree for the following values {1, 2, 3, 4, 5, 7, 8}.
The weight of the best tree is 78. 2
5 String s is a prefix of string t if t = s ++ x, for some string x, where ++ denotes concatenation.
[Figure 1.2: A best tree for the example; the codewords read off the tree are c = 1, g = 01, a = 000, t = 001.]
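The expected code length of this code can be checked directly; a quick sketch:

```python
probs = {'a': 0.05, 'c': 0.5, 'g': 0.4, 't': 0.05}
code = {'c': '1', 'g': '01', 'a': '000', 't': '001'}   # codewords from the tree

# expected code length: sum over symbols of probability times codeword length
expected = sum(probs[s] * len(code[s]) for s in probs)
print(round(expected, 3))   # 1.6 bits per symbol on average
```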
Remark: In a best tree, there is no dangling leaf; i.e., each leaf is labeled with
a distinct symbol. Therefore, every internal node (i.e., nonleaf) has exactly two
children. Such a tree is called a full binary tree.
Exercise 3
Show two possible best trees for the alphabet {0, 1, 2, 3, 4} with probabilities
{0.2, 0.4, 0.2, 0.1, 0.1}. The trees should not be mere rearrangements of each
other through reflections of subtrees. 2
Solution One possible solution is shown below.
0.4
0.1 0.1
0.2
0.1 0.1
Figure 1.3: Two different best trees over the same probabilities
leaves by the numbers from the bag so that the weight, i.e., the sum of the
weighted pathlengths to the leaves, is minimized.
The Huffman Algorithm If bag b has a single number, create a tree of one
node, which is both a root and a leaf, and label the node with the number.
Otherwise (the bag has at least two numbers), let u and v be the two smallest
numbers in b, not necessarily distinct. Let b′ = (b − {u, v}) ∪ {u + v}, i.e., b′ is
obtained from b by replacing its two smallest elements by their sum. Construct
a best tree for b′. There is a leaf node in the tree labeled u + v; expand this
node to have two children that are leaves and label them with u and v.
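The algorithm translates directly into code with a priority queue (the data structure of Section 1.3.4). The sketch below computes just the weight of the best tree, using the fact, stated in the exercises of Section 1.3.3, that the weight equals the sum of the combined values:

```python
import heapq

def huffman_weight(bag):
    """Repeatedly replace the two smallest values in the bag by their
    sum; the weight of the best tree is the sum of all the combined
    values created (the internal-node labels)."""
    heap = list(bag)
    heapq.heapify(heap)
    weight = 0
    while len(heap) > 1:
        u = heapq.heappop(heap)
        v = heapq.heappop(heap)
        weight += u + v              # u + v labels a new internal node
        heapq.heappush(heap, u + v)
    return weight

print(huffman_weight([1, 2, 3, 4, 5, 7, 8]))   # 78, as in Exercise 2
```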
[Figure 1.4: Steps in Huffman’s algorithm for the values {0.05, 0.5, 0.4, 0.05}; the final tree has weight 1.6.]
Lemma 1: Let x and y be two values in a bag where x < y. In a best tree
for the bag, Y ≤ X, where X and Y are the pathlengths to x and y.
Proof: Let T be a best tree. Switch the two values x and y in T to obtain a new
tree T′. The weighted pathlengths to x and y in T are xX and yY , respectively.
And, they are xY and yX, respectively, in T′. Weighted pathlengths to all other
nodes in T and T′ are identical. Since T is a best tree, the weight of T is less
than or equal to that of T′. Therefore,
xX + yY ≤ xY + yX
⇒ {arithmetic}
yY − xY ≤ yX − xX
⇒ {arithmetic}
(y − x)Y ≤ (y − x)X
⇒ {since x < y, (y − x) > 0; arithmetic}
Y ≤X
Lemma 2: Let u and v be the two smallest values in bag b, u ≤ v. There is a
best tree for b in which u and v are siblings.

Proof: Let T be a best tree for b. Let U and V be the pathlengths to u and v,
respectively. Let the sibling of u be x and X be the pathlength to x. (In a best
tree, u has a sibling. Otherwise, delete the edge to u, and let the parent of u
become the node corresponding to value u, thus lowering the cost.)
If v = x, the lemma is proven. Otherwise, v < x.
v<x
⇒ {from Lemma 1}
X≤V
⇒ {v < x gives X ≤ V by Lemma 1; u ≤ v gives V ≤ U (by Lemma 1 when u < v; when u = v, name the deeper of the two nodes u)}
X ≤ V and V ≤ U
⇒ {X = U , because x and u are siblings}
X=V
Switch the two values x and v (they may be identical). This does not alter
the weight of the tree, because X = V , and it makes u and v siblings, thus
establishing the lemma.
Lemma 3: Let T be an optimal tree for bag b in which u and v are siblings.
Let T′ be all of T except the two nodes u and v; see Figure 1.5. Then T′ is a
best tree for bag b′ = (b − {u, v}) ∪ {u + v}.
[Figure 1.5: The entire tree is T for b; its upper part, with the sibling leaves u and v removed and their parent labeled u + v, is T′ for b′.]
Let p be the pathlength to the leaf labeled u + v in T′ and q the sum of the
weighted pathlengths to all other leaves (the same in T and T′). Then

W (T )
= {definition of weighted pathlength of T }
q + (p + 1) × u + (p + 1) × v
= {arithmetic}
q + p × (u + v) + (u + v)
= {definition of weighted pathlength of T′}
W (T′) + (u + v)
Since T is a best tree for b, T′ is a best tree for b′. Otherwise, we could
replace T′ by a tree whose weight is lower than W (T′), thus reducing W (T ), a
contradiction since T is a best tree.
We combine Lemmas 2 and 3 to get the following theorem, which says that
Huffman’s algorithm constructs a best tree.
2. T′ is a best tree for b′, where T′ is all of T except the two nodes u and v;
see Figure 1.5.
Exercise 4
2. Show that the tree corresponding to an optimal prefix code is a full binary
tree.
3. In a best tree, consider two nodes labeled x and y, and let the correspond-
ing pathlengths be X and Y , respectively. Show that
x<y ⇒ X≥Y
x ≤ y ⇒ X ≥ Y , and
x=y ⇒ X≥Y
5. Consider the first n Fibonacci numbers (starting at 1). What is the structure
of the tree constructed by Huffman’s algorithm on these values?
6. Prove that the weight of any tree is the sum of the values in the non-leaf
nodes of the tree. For example in Figure 1.4, the weight of the final tree is
1.6, and the sum of the values in the non-leaf nodes is 1.0 + 0.5 + 0.1 = 1.6.
Observe that a tree with a single leaf node and no non-leaf node has weight
0, which is the sum of the values in the non-leaf nodes (vacuously).
Does the result hold for non-binary trees?
8. Combining the results of the last two exercises, give an efficient algorithm
to compute the weight of the optimal tree; see Section 1.3.4.
9. (Research) As we have observed, there may be many best trees for a given
bag. We may wish to find, among all best trees, one in which the maximum
pathlength to any leaf is as small as possible, or in which the sum of the
pathlengths to the leaves is minimized. The following procedure achieves
both of these goals simultaneously: whenever there is a tie in choosing
values, always choose an original value rather than a combined value.
Show the correctness of this method and also that it minimizes the
maximum pathlength as well as the sum of the pathlengths among all best
trees. See Knuth [28], Section 2.3.4.5, page 404. 2
h ≤ H < h + 1
certain phrases, and we can compress much better if we choose those as our
basic symbols.
The Lempel-Ziv code, described in the following section, addresses some of
these issues.
1.3.4 Implementation
During the execution of Huffman’s algorithm, we will have a bag of elements
where each element holds a value and points either to a leaf node (in case
it represents an original value) or to a subtree (if it has been created during
the run of the algorithm). The algorithm needs a data structure on which the
following operations can be performed efficiently: (1) remove the element with
the smallest value and (2) insert a new element. In every step, operation (1)
is performed twice and operation (2) once. The creation of a subtree from two
smaller subtrees is a constant-time operation, and is left out in the following
discussion.
A priority queue supports both operations. Implemented as a heap, the
space requirement is O(n) and each operation takes O(log n) time, where n is the
maximum number of elements. Hence, the O(n) steps of Huffman’s algorithm
can be implemented in O(n log n) time.
If the initial bag is available as a sorted list, the algorithm can be imple-
mented in linear time, as follows. Let leaf be the list of initial values sorted in
ascending order. Let nonleaf be the list of values generated in sequence by the
algorithm (by summing the two smallest values in leaf ∪ nonleaf ).
The important observation is that the values in nonleaf are generated in
nondecreasing order; call this the monotonicity property. You are asked to
prove this in part 7 of the exercises in Section 1.3.3.
This observation implies that the smallest element in leaf ∪ nonleaf at any
point during the execution is the smaller of the two items at the heads of leaf and
nonleaf . That item is removed from the appropriate list, and the monotonicity
property is still preserved. An item is inserted by adding it at the tail end of
nonleaf , which is correct according to monotonicity.
It is clear that leaf is accessed as a list at one end only, and nonleaf at
both ends, one end for insertion and the other for deletion. Therefore, leaf may
be implemented as a stack and nonleaf as a queue. Each operation then takes
constant time, and the whole algorithm runs in O(n) time.
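A sketch of this linear-time scheme (both lists are deques here; as noted above, leaf could equally be a stack):

```python
from collections import deque

def huffman_weight_linear(sorted_values):
    """Linear-time weight of the best tree, for values given in
    ascending order: leaf holds the initial values, nonleaf the sums
    generated by the algorithm; by monotonicity both stay sorted."""
    leaf = deque(sorted_values)
    nonleaf = deque()
    weight = 0

    def pop_smallest():
        # the smallest remaining value is at the head of one of the lists
        if not nonleaf or (leaf and leaf[0] <= nonleaf[0]):
            return leaf.popleft()
        return nonleaf.popleft()

    while len(leaf) + len(nonleaf) > 1:
        u = pop_smallest()
        v = pop_smallest()
        weight += u + v
        nonleaf.append(u + v)    # inserted at the tail: sums are nondecreasing
    return weight

print(huffman_weight_linear([1, 2, 3, 4, 5, 7, 8]))   # 78 again
```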
sender scans the text from left to right identifying certain strings (henceforth,
called words) that it inserts into a dictionary. Let me illustrate the procedure
when the dictionary already contains the following words. Each word in the
dictionary has an index, simply its position in the dictionary.
index  word
  0    ⟨⟩
  1    t
  2    a
  3    ta
Suppose the remaining text to be transmitted is taaattaaa. The sender
scans this text from left until it finds a string that is not in the dictionary. In
this case, t and ta are in the dictionary, but taa is not in the dictionary. The
sender adds this word to the dictionary, and assigns it the next higher index,
4. Also, it transmits this word to the receiver. But it need not transmit
the whole word (otherwise, we would get no compression at all). The prefix of
the word excluding its last symbol, i.e., ta, is a dictionary entry (remember, the
sender scans the text just one symbol beyond a dictionary word). Therefore, it
is sufficient to transmit (3, a), where 3 is the index of ta, the prefix of taa that
is in the dictionary, and a is the last symbol of taa.
The receiver recreates the string taa, by looking up the word with index
3 and appending a to it, and then it appends taa to the text it has created
already; also, it updates the dictionary with the entry
index word
4 taa
Initially, the dictionary has a single word, the empty string ⟨⟩, as its only
(0th) entry. The sender and receiver start with this copy of the dictionary and
the sender continues its transmissions until the text is exhausted. To ensure
that the sender can always find a word which is not in the dictionary, assume
that the end of the file, written as #, occurs nowhere else in the string.
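A minimal sketch of the sender's and receiver's procedures, starting from the one-entry dictionary and assuming the text ends with the sentinel #:

```python
def lz_encode(text):
    """Sender: emit (index, symbol) pairs; the dictionary starts with
    only the empty string, at index 0."""
    dictionary = {'': 0}
    pairs = []
    word = ''
    for ch in text:
        if word + ch in dictionary:
            word += ch                    # still a known word; keep scanning
        else:
            pairs.append((dictionary[word], ch))
            dictionary[word + ch] = len(dictionary)
            word = ''
    return pairs

def lz_decode(pairs):
    """Receiver: look up each index, append the symbol, and rebuild
    both the text and the dictionary."""
    words = ['']
    text = []
    for index, ch in pairs:
        word = words[index] + ch
        words.append(word)
        text.append(word)
    return ''.join(text)

message = 'taaattaaa#'
assert lz_decode(lz_encode(message)) == message
print(lz_encode(message))   # [(0, 't'), (0, 'a'), (2, 'a'), (1, 't'), (3, 'a'), (0, '#')]
```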
that point, 9. Transmission of the remaining portion of the string follows the
same procedure.
[Figure: the dictionary represented as a tree; the root is entry 0, the empty string ⟨⟩, and each path from the root spells a dictionary word over {t, a, c, g, #}.]
Exercise 5
Error Detection and Correction

2.1 Introduction
The following description from The Economist, July 3rd, 2004, captures the essence
of error correction and detection, the subject matter of this chapter. “On July
1st [2004], a spacecraft called Cassini went into orbit around Saturn —the
first probe to visit the planet since 1981. While the rockets that got it there are
surely impressive, just as impressive, and much neglected, is the communications
technology that will allow it to transmit its pictures millions of kilometers back
to Earth with antennae that use little more power than a light-bulb.
To perform this transmission through the noisy vacuum of space, Cassini
employs what are known as error-correcting codes. These contain internal tricks
that allow the receiver to determine whether what has been received is accurate
and, ideally, to reconstruct the correct version if it is not.”
First, we study the logical operator exclusive-or, which plays a central role
in error detection and correction. The operator is written as ⊕ in these notes.
It is a binary operator, and its truth table is shown in Table 2.1. Encoding true
by 1 and false by 0, we get Table 2.2, which shows that the operator is addition
modulo 2, i.e., addition in which you discard the carry.
In all cases, we apply ⊕ to bit strings of equal lengths, which we call words.
The effect is to apply ⊕ to the corresponding bits independently. Thus,
Table 2.1: Truth table for ⊕
 ⊕ | F  T
 F | F  T
 T | T  F
Table 2.2: ⊕ is addition modulo 2
 ⊕ | 0  1
 0 | 0  1
 1 | 1  0
0110 ⊕ 1011 = 1101
• ⊕ is commutative: x ⊕ y = y ⊕ x
• ⊕ is associative: (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z)
• inverse: x ⊕ x = 0, x ⊕ ¬x = 1
• complementation: ¬(x ⊕ y) = ¬x ⊕ y = x ⊕ ¬y
Ŵ = 0
≡ {X, Y is a partition of W ; so Ŵ = X̂ ⊕ Ŷ }
X̂ ⊕ Ŷ = 0
≡ {add Ŷ to both sides of this equation}
X̂ ⊕ Ŷ ⊕ Ŷ = Ŷ
≡ {Ŷ ⊕ Ŷ = 0 and X̂ ⊕ 0 = X̂}
X̂ = Ŷ 2
The proof of the following observation is similar to the one above and is
omitted.
Note: The two observations above say different things. The first one says
that if W is dependent then for all partitions into X and Y we have X̂ = Ŷ ,
and, conversely, if for all partitions into X and Y we have X̂ = Ŷ , then W
is dependent. The second observation implies a stronger result than the latter
part of the first observation: if there exists any (not all) partition into U and V
such that Û = V̂ , then W is dependent. 2
Exercise 6
x     = α 1 β
u     = 0^s 1 γ
x ⊕ u = α 0 (β ⊕ γ)
Exercise 7
Some number of couples attend a party at which a black or white hat is placed
on everyone’s head. No one can see his/her own hat, but each person can see all
the others. Everyone is asked to guess the color of his/her own hat (say, by
writing it on a piece of paper). The persons cannot communicate in any manner
after the hats are placed on their heads. Devise protocols by which:
Solution Let H be the exclusive-or of all hat colors, h the hat color of a
specific person, and s the exclusive-or of all the hat colors he/she can see. Clearly,
H = h ⊕ s, or h = H ⊕ s. Therefore, if a person knows the correct value of
H, then he/she can guess the hat color correctly by first computing s and then
H ⊕ s.
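The identity h = H ⊕ s is easy to check in code; a sketch with a hypothetical party, hat colors encoded as bits (0 for white, 1 for black, an assumed encoding):

```python
from functools import reduce
from operator import xor

def guess_own_hat(H, seen):
    """Given H, the exclusive-or of ALL hat colors, and the list of hats
    a person can see, recover that person's own hat: h = H xor s."""
    s = reduce(xor, seen, 0)
    return H ^ s

hats = [1, 0, 0, 1, 1, 0]       # a hypothetical party of six
H = reduce(xor, hats)
for i, h in enumerate(hats):
    assert guess_own_hat(H, hats[:i] + hats[i + 1:]) == h
print("everyone who knows H guesses correctly")
```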
A more general problem Let there be N persons and let the number of
hat colors be t, 1 ≤ t ≤ N (previously, t = 2). Not every color may appear
on someone’s head. The value of t is told to the group beforehand. Devise a
protocol such that bN/tc persons guess their hat colors correctly.
For a solution, see
http://www.cs.utexas.edu/users/misra/Notes.dir/N-colorHats.pdf
Exercise 8
There are 100 men standing in a line, each with a hat on his head. Each hat is
either black or white. A man can see the hats of all those in front of him, but
not his own hat nor the hats of those behind him. Each man is asked to guess
the color of his hat, in turn from the back of the line to the front. He shouts
his guess, which everyone can hear. Devise a strategy to maximize the number
of correct guesses.
A possible strategy is as follows. Number the men starting at 0 from the
back to the front. Let the guess of 2i be the color of (2i + 1)’s hat, and (2i + 1)’s
guess is what he heard from 2i. So, (2i + 1)’s guess is always correct; thus, half
the guesses are correct. We do considerably better in the solution, below.
G ⊕ S
= {G = G′ ⊕ guess of a0. And, guess of a0 = G′ ⊕ S′}
G′ ⊕ G′ ⊕ S′ ⊕ S
= {G′ ⊕ G′ = 0. And, S′ = S ⊕ h}
S ⊕ h ⊕ S
= {simplifying}
h
Exercise 9
(A Mathematical Curiosity) Let S be a finite set such that if x and y are in S,
so is x ⊕ y. First, show that the size of S is a power of 2. Next, show that if the
size of S exceeds 2 then S is dependent.
Solution See
http://www.cs.utexas.edu/users/misra/Notes.dir/NoteEWD967.pdf
Exercise 10
Let w1, w2, . . ., wN be a set of unknown words. Let Wi be the exclusive-or of
all the words except wi, 1 ≤ i ≤ N. Given W1, W2, . . ., WN, can you determine
the values of w1, w2, . . ., wN? You can only apply ⊕ on the words. You may
prefer to attack the problem without reading the following hint.
Hint:
1. Show that the problem can be solved when N is even.
2. Show that the problem cannot be solved when N is odd.
w1 ⊕ w2 ⊕ w4 = 1 0 0 1 1
w1 ⊕ w3 = 1 0 1 1 0
w2 ⊕ w3 = 0 0 0 0 1
w3 ⊕ w4 = 1 1 0 1 1
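For the even case in the hint, xor-ing all the Wi together gives A, the exclusive-or of all the words (each wj occurs N − 1 times, an odd count when N is even), and then wi = A ⊕ Wi. A sketch with hypothetical words:

```python
from functools import reduce
from operator import xor

def recover(W):
    """Given W[i] = exclusive-or of all words except w[i], recover the
    words when N = len(W) is even: A = xor of all W[i] equals the xor
    of all the words, and w[i] = A xor W[i]."""
    A = reduce(xor, W)
    return [A ^ Wi for Wi in W]

w = [5, 9, 12, 7]    # hypothetical words for N = 4
W = [reduce(xor, (x for j, x in enumerate(w) if j != i)) for i in range(len(w))]
assert recover(W) == w
```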
x := x ⊕ y
From the symmetry of the right side, the resulting value of x is also a complemen-
tation of y by x. If y is a word of all 1s, then x ⊕ y is the complement of (all
bits of) x; this is just an application of the law x ⊕ 1 = ¬x.
Suppose we want to construct a word w from x, y and u as follows. Wherever
u has a 0 bit choose the corresponding bit of x, and wherever it has 1 choose
from y, see the example below.
u = 0 1 0 1
x = 1 1 0 0
y = 0 0 1 1
w = 1 0 0 1
Exercise 11
Prove this result. 2
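The construction of w can be done with exclusive-or and bitwise and; the identity used below, w = x ⊕ (u ∧ (x ⊕ y)), is our own statement of the result (the formula itself fell outside this excerpt):

```python
def merge(u, x, y):
    """Bitwise select: where u has a 0 take x's bit, where u has a 1
    take y's bit, via w = x xor (u and (x xor y))."""
    return x ^ (u & (x ^ y))

assert merge(0b0101, 0b1100, 0b0011) == 0b1001   # the example above
```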
2.2.2 Toggling
Consider a variable x that takes two possible values, m and n. We would like to
toggle its value from time to time: if it is m, it becomes n and vice versa. There
is a neat way to do it using exclusive-or. Define a variable t that is initially set
to m ⊕ n and never changes.
toggle:: x := x ⊕ t
To see why this works, check out the two cases: before the assignment, let
the value of x be m in one case and n in the other. For x = m, the toggle sets
x to m ⊕ t, i.e., m ⊕ m ⊕ n, which is n. The other case is symmetric.
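A sketch of the toggle, with hypothetical values for m and n:

```python
def make_toggle(m, n):
    t = m ^ n                  # t is fixed once m and n are known
    return lambda x: x ^ t     # maps m to n and n to m

toggle = make_toggle(21, 42)
assert toggle(21) == 42
assert toggle(42) == 21
```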
Exercise 12
Variable x assumes the values of p, q and r in cyclic order, starting with p.
Write a code fragment to assign the next value to x, using ⊕ as the primary
operator in your code. You will have to define additional variables and assign
them values along with the assignment to x.
Solution Define two other variables y and z whose values are related to x’s
by the following invariant:

x, y, z = t, t ⊕ t′, t ⊕ t″

where t′ is the next value in cyclic order after t (so, p′ = q, q′ = r and r′ = p),
and t″ is the value following t′. The invariant is established initially by letting

x, y, z = p, p ⊕ q, p ⊕ r

The cyclic assignment is implemented by

x := x ⊕ y;
y := y ⊕ z;
z := y ⊕ z

Show that if x, y, z = t, t ⊕ t′, t ⊕ t″ before these assignments, then x, y, z =
t′, t′ ⊕ t″, t′ ⊕ t after the assignments (note: t‴ = t). 2
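The solution can be run directly; a sketch with hypothetical values for p, q and r:

```python
p, q, r = 3, 14, 15            # hypothetical values
x, y, z = p, p ^ q, p ^ r      # establish the invariant

history = []
for _ in range(6):
    x ^= y                     # x advances to the next value in the cycle
    y ^= z
    z = y ^ z                  # note: uses the new value of y
    history.append(x)

print(history)   # [14, 15, 3, 14, 15, 3], i.e., q, r, p, q, r, p
```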
2.2.3 Exchange
Here is a truly surprising application of ⊕. If you wish to exchange the val-
ues of two variables you usually need a temporary variable to hold one of the
values. You can exchange without using a temporary variable. The following
assignments exchange the values of x and y.
x := x ⊕ y;
y := x ⊕ y;
x := x ⊕ y
To see that this program actually exchanges the values, suppose the values
of x and y are X and Y before the exchange. The following annotated program
shows the values they have at each stage of the computation; I have used back-
ward substitution to construct this annotation. The code is to the left and the
annotation to the right in a line.
y = Y, (x ⊕ y) ⊕ y = X, i.e., x = X, y = Y
x := x ⊕ y;
x ⊕ (x ⊕ y) = Y, (x ⊕ y) = X, i.e., y = Y, (x ⊕ y) = X
y := x ⊕ y;
x ⊕ y = Y, y = X
x := x ⊕ y
x = Y, y = X
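A sketch of the exchange, on two hypothetical words:

```python
x, y = 0b0110, 0b1011          # X and Y
x ^= y                         # x = X xor Y
y ^= x                         # y = Y xor (X xor Y) = X
x ^= y                         # x = (X xor Y) xor X = Y
assert (x, y) == (0b1011, 0b0110)
```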
therefore he approaches the programming task in full humility, and among other things he
avoids clever tricks like the plague”. From “The Humble Programmer” by Edsger W. Dijkstra,
1972 Turing Award lecture [15].
losing if the result is 0, winning otherwise. Thus, the state (1,1) results in 0, a
losing state, whereas (1,2) gives 01 ⊕ 10 = 11, which is a winning state. The
final state, where all piles are empty, is a losing state. The mnemonics, losing
and winning, signify the position of a player: a player who has to make a move
in a winning state has a winning strategy, i.e., if he makes the right moves he
wins no matter what his opponent does; a player in a losing state will definitely
lose provided his opponent makes the right moves. So, one of the players has a
winning strategy based on the initial state. Of course, either player is allowed
to play stupidly and squander a winning position.
The proof of this result is based on the following state diagram. We show
that any possible move in a losing state can only lead to a winning state, thus
a player who has to move in this state cannot do anything but hope that his
opponent makes a mistake! A player in a winning state has at least one move to
transform the state to losing; of course, he can make a wrong move and remain
in the winning state, thus handing his opponent the mistake he was hoping for.
Next, we prove the claims made in this diagram.
[State diagram: from a losing state, all moves lead to a winning state; from a winning state, there is a move to a losing state.]
Proof of (1): x ⊕ u = α 0 (β ⊕ γ). Comparing x and x ⊕ u, we have x ⊕ u < x.
Proof of (2): The exclusive-or of the piles before the move is u; so, the
exclusive-or of the piles except x is x ⊕ u. Hence, the exclusive-or of the piles
after the move is (x ⊕ u) ⊕ (x ⊕ u), which is 0.
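A sketch of the winning-move computation for Nim (pile sizes as integers):

```python
from functools import reduce
from operator import xor

def winning_move(piles):
    """Return (i, new_size): reduce pile i to new_size, moving to a
    losing state for the opponent; return None if the position is
    already losing (the xor u of the pile sizes is 0)."""
    u = reduce(xor, piles)
    if u == 0:
        return None
    for i, x in enumerate(piles):
        if x ^ u < x:          # x has a 1 where u has its leading 1
            return i, x ^ u
    # unreachable: some pile has a 1 at u's leading bit position

print(winning_move([1, 2]))    # (1, 1): reduce the second pile, leaving (1, 1)
print(winning_move([1, 1]))    # None: a losing state
```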
Exercise 13
In a winning state let y be a pile that has a 0 in the same position as the leading
bit of u. Show that removing any number of chips from y leaves a winning state.
u = 0^s 1 γ
y = α 0 β
Exercise 14
The following technique has been suggested for improving the security of trans-
mission. The sender encrypts the first block using the key k. He encrypts
subsequent blocks by using the previous encrypted block as the key. Is this
secure? How about using the plaintext of the previous block as the key? Sup-
pose a single block is deciphered by the eavesdropper; can he then decipher all
blocks, or all subsequent blocks? 2
f_c ⊕ r_d
= m_c ⊕ r_{e⊕c} ⊕ r_d
= m_c ⊕ r_{c⊕d⊕c} ⊕ r_d
= m_c ⊕ r_d ⊕ r_d
= m_c
We claim that Alice does not know c, because she receives e, which tells her
nothing about c. And, Bob does not know m_c̄ from the data he receives from
Alice. All he can do is apply exclusive-or with r_d. If Bob computes f_c̄ ⊕ r_d he
gets
f_c̄ ⊕ r_d
= m_c̄ ⊕ r_{e⊕c̄} ⊕ r_d
= m_c̄ ⊕ r_{c⊕d⊕c̄} ⊕ r_d
= m_c̄ ⊕ r_{1⊕d} ⊕ r_d
= m_c̄ ⊕ r_d̄ ⊕ r_d
= m_c̄ ⊕ r_0 ⊕ r_1
The sender appends a bit at the end of each block so that each 4-bit block
has an even number of 1s. This additional bit is called a parity bit, and each
block is said to have even parity. After addition of parity bits, the input string
shown above becomes,
This string is transmitted. Suppose two bits are flipped during transmission,
as shown below; the flipped bits are underlined.
Note that the flipped bit could be a parity bit or one of the original ones.
Now each erroneous block has odd parity, and the receiver can identify all such
blocks. It then asks for retransmissions of those blocks.
If two bits (or any even number) of a block get flipped, the receiver cannot
detect the error. This is a serious problem, so simple parity check is rarely
used. In practice, the blocks are much longer (than 3, shown here) and many
additional bits are used for error detection.
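A sketch of the parity scheme (strings of '0'/'1' characters stand in for blocks):

```python
def add_parity(block):
    """Append a bit so that the block has an even number of 1s."""
    ones = block.count('1')
    return block + ('1' if ones % 2 else '0')

def has_even_parity(block):
    return block.count('1') % 2 == 0

coded = add_parity('101')             # '1010'
assert has_even_parity(coded)
flipped = '1110'                      # coded with its second bit flipped
assert not has_even_parity(flipped)   # a single error is detected
```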
Is parity coding any good? How much is the error probability reduced if
you add a single parity bit? First, we compute the probability of having one
or more errors in a b-bit block, and then compute the probability of missing
errors even after adding a single parity bit. The analysis here uses elementary
probability theory.
Let p be the probability of error in the transmission of a single bit². The
probability of correct transmission of a single bit is q, where q = 1 − p. The
probability of correct transmission of a b-bit block is q^b. Therefore, without
parity bits the probability that there is an undetected error in the block is
1 − q^b. For p = 10^−4 and b = 12, this probability is around 1.2 × 10^−3.
With the addition of a parity bit, we have to send b + 1 bits. The probability
of n errors in a block of b + 1 bits is

C(b+1, n) × p^n × q^(b+1−n)
2 I am assuming that all errors are independent, a thoroughly false assumption when burst
Table 2.4: Two-dimensional parity; the last bit of each row and each column is a parity bit
1 0 1 1 1
0 1 1 1 1
1 1 1 0 1
0 0 1 1 0
0 0 0 1 1
This can be understood as follows. First, C(b+1, n) is the number of different
ways of choosing n bits out of b + 1 bits (this is a binomial coefficient), p^n is the
probability of all these bits becoming erroneous, and q^(b+1−n) is the probability
of the remaining bits being error-free.
We cannot detect any even number of errors with a single parity bit. So,
the probability of undetected error is the sum of this term over all even values
of n, 0 < n ≤ b + 1. We can simplify calculations by noting that p is typically
very small; so we may ignore all except the first term, i.e., take C(b+1, 2) ×
p^2 × q^(b+1−2) as the probability of undetected error. Setting b, p, q = 12, 10^−4,
1 − 10^−4, this probability is around 7.8 × 10^−7, several orders of magnitude smaller
than 1.2 × 10^−3.
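These numbers are quick to reproduce; a sketch using Python's math.comb:

```python
from math import comb

b, p = 12, 1e-4
q = 1 - p

no_parity = 1 - q ** b                            # any error in b bits
two_errors = comb(b + 1, 2) * p**2 * q**(b - 1)   # dominant undetected term

print(round(no_parity, 6))    # 0.001199, i.e., about 1.2e-3
print(two_errors)             # about 7.8e-07
```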
Exercise 15
Show an error pattern in Table 2.4 that will not be detected by this method. 2
Exercise 16
Develop a RAID architecture based on two-dimensional parity bits. 2
• (d(x, y) = 0) ≡ (x = y)
• d(x, y) ≥ 0
• d(x, y) = d(y, x)
The first two properties are easy to see, by inspection. For the last property,
observe that it is sufficient to prove the result when x, y and z are single bits,
because the distance between bit strings is computed bit by bit. We can prove
d(x, y) + d(y, z) ≥ d(x, z) as follows.3
d(x, y) + d(y, z)
= {x, y and z are single bits. So, d(x, y) = x ⊕ y}
(x ⊕ y) + (y ⊕ z)
≥ {For bits a and b, a + b ≥ a ⊕ b. Let a = x ⊕ y and b = y ⊕ z}
(x ⊕ y) ⊕ (y ⊕ z)
= {simplify}
x⊕y⊕y⊕z
= {simplify}
x⊕z
= {x and z are single bits. So, d(x, z) = x ⊕ z}
d(x, z)
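The distance function and the triangle inequality can be checked directly; a sketch over bit strings:

```python
def d(x, y):
    """Hamming distance between two equal-length bit strings."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

assert d('11011', '11011') == 0
assert d('11011', '11010') == 1
# triangle inequality, on one sample
assert d('110', '011') <= d('110', '101') + d('101', '011')
```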
[Table 2.5: Coding for error correction; parity bits are in bold]
Exercise 17
Let x and y be non-negative integers, count(x) the number of 1s in the binary
representation of x, and even(x) true iff x is even. We say that x has even
parity if count(x) is even; otherwise it has odd parity. Show that two words
of identical parity (both even or both odd) are an even distance apart, and words
of different parity an odd distance apart.
The term even(count(x)) stands for “x has even parity”. Therefore, the first
term in the last line of the above proof, (even(count(x)) ≡ even(count(y))),
denotes that x and y have identical parity. Hence, the conclusion in the above
proof says that the distance between x and y is even iff x and y have identical
parity. 2
For the given example, we can detect two errors and correct one error in
transmission. Suppose 11011 is changed to 11010. The receiver observes that
this is not a codeword, so he has detected an error. He corrects the error
by looking for the nearest codeword, the one that has the smallest Hamming
distance from the received word. The computation is shown in Table 2.6. As
shown there, the receiver concludes that the original transmission is 11011.
Now suppose two bits of the original transmission are altered, so that 11011
is changed to 10010. The computation is shown in Table 2.7. The receiver will
detect that there is an error, but based on distances, he will assume that 10110
was sent. We can show that this particular encoding can correct one error only.
The number of errors that can be detected/corrected depends on the Hamming
distance among the codewords, as given by the following theorem.
Theorem 1 Let h be the Hamming distance between the nearest two code-
words. It is possible to detect any number of errors less than h and correct any
number of errors less than h/2.
Suppose codeword x is transmitted and word y is received. Then:
1. if d(x, y) < h: the receiver can detect if errors have been introduced during
transmission.
2. if d(x, y) < h/2: the receiver can correct the errors, if any. It picks the
closest codeword to y, and that is x.
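Nearest-codeword decoding can be sketched in Python. The full code of the running example is not reproduced in the text, so the codeword set below is an illustrative one, assumed for this sketch; it has minimum Hamming distance 3 and contains the codewords 11011 and 10110 used in the example, so it reproduces the behaviour described above:

```python
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

# An illustrative code with minimum distance 3 (assumed; not the book's
# full table), containing the codewords 11011 and 10110 from the text.
CODEWORDS = ["00000", "11011", "10110", "01101"]

def decode(received):
    """Correct by picking the codeword with the smallest Hamming
    distance from the received word."""
    return min(CODEWORDS, key=lambda c: hamming(c, received))
```

With one error (11011 received as 11010) the decoder recovers 11011; with two errors (11011 received as 10010) it wrongly settles on 10110, as the text describes.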
Proof of (1): The distance between any two distinct codewords is at least
h. The distance between x and y is less than h. So, either x = y or y is
not a codeword. Therefore, the receiver can detect errors as follows: if y is
2.7. ERROR CORRECTION 43
position:  13 12 11 10  9  8  7  6  5  4  3  2  1
bit:        0  0  1  1  1  1  0  1  0  1  1  0  1
d/c:        d  d  d  d  d  c  d  d  d  c  d  c  c
1 bits:           *  *  *  *     *     *  *     *
requires only a logarithmic number of extra bits, called check bits, and it corrects
at most one error in a transmission. The novel idea is to transmit in the check
bits the positions where the data bits are 1. Since it is impractical to actually
transmit all the positions, we will instead transmit an encoding of them, using
exclusive-or. Also, since the check bits can be corrupted as easily as the data
bits, we treat check bits and data bits symmetrically; so, we also send the
positions where the check bits are 1s. More precisely, we regard each position
number in the transmitted string as a word, and encode the check bits in such
a way that the following rule is obeyed:
• HC Rule: the set of position numbers where the data bits and check
bits are 1 form a dependent set, i.e., the exclusive-or of these positions,
regarded as words, is 0 (see Section 2.1.2).
  1011 (= 11)
⊕ 1010 (= 10)
⊕ 1001 (=  9)
⊕ 1000 (=  8)
⊕ 0110 (=  6)
⊕ 0100 (=  4)
⊕ 0011 (=  3)
⊕ 0001 (=  1)
= 0000                                2
The question for the sender is where to store the check bits (we have stored
them in positions 8, 4, 2 and 1 in the example above) and how to assign values
to them so that the set of positions is dependent. The question for the receiver
is how to decode the received string and correct a possible error.
Receiver Let P be the set of positions where the transmitted string has 1s
and P′ the set where the received string has 1s. From the assumption that there is at
most one error, we have either P = P′, P′ = P ∪ {t}, or P = P′ ∪ {t}, for some
position t; the latter two cases arise when the bit at position t is flipped from 0
to 1, and from 1 to 0, respectively. From rule HC, P̂ = 0, where P̂ is the exclusive-or
of the words in P.
The receiver computes P̂′. If P = P′, he gets P̂′ = P̂ = 0. If P′ = P ∪ {t},
he gets P̂′ = P̂ ⊕ {t} = 0 ⊕ {t} = t. If P = P′ ∪ {t}, he gets P̂ = P̂′ ⊕ {t}, or
P̂′ = P̂ ⊕ {t} = 0 ⊕ {t} = t. Thus, in both cases where the bit at t has been
flipped, P̂′ = t. If t ≠ 0, the receiver can distinguish error-free transmission
from erroneous transmission and correct the error in the latter case.
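The receiver's computation can be sketched in Python (names are mine; positions are numbered from 1 at the right, as in the table above):

```python
from functools import reduce

def syndrome(bits):
    """Exclusive-or of the position numbers (1-based, counted from the
    right) at which the received string has a 1 bit. By rule HC the
    result is 0 for an error-free word, and otherwise names the
    flipped position."""
    n = len(bits)
    positions = [n - i for i, b in enumerate(bits) if b == "1"]
    return reduce(lambda a, b: a ^ b, positions, 0)

# The 13-bit example codeword of the text, positions 13 down to 1;
# its 1s are at positions 11, 10, 9, 8, 6, 4, 3 and 1, which XOR to 0.
EXAMPLE = "0011110101101"
```

Flipping any single bit, say at position 5, makes the syndrome equal to 5.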
Sender We have seen from the previous paragraph that there should not be
a position numbered 0, because then error-free transmission cannot be distin-
guished from one where the bit at position 0 has been flipped. Therefore, the
positions in the transmitted string are numbered starting at 1. Each position is
an n-bit word. And, we will employ n check bits.
Check bits are put at every position that is a power of 2 and the remaining
bits are data bits. In the example given earlier, check bits are put at positions
1, 2, 4 and 8, and the remaining nine bits are data bits. So the position of
any check bit as a word has a single 1 in it. Further, no two check bit position
numbers have 1s in the same place.
Let C be the set of positions where the check bits are 1s and D the positions
where the data bits are 1s. We know D, but we don’t know C yet, because
check bits have not been assigned values. We show next that C is uniquely
determined from rule HC.
From rule HC, Ĉ ⊕ D̂ = 0. Therefore, Ĉ = D̂. Since we know D, we can
compute D̂. For the example considered earlier, D̂ = 1101. Therefore, we have
to set the check bits so that Ĉ = 1101. This is done by simply assigning the
bit string Ĉ to the check bits in order from higher to lower positions; for the
example, assign 1 to the check bit at positions 8, 4 and 1, and 0 to the check
bit at position 2. The reason this rule works is that assigning a value v to the
check bit at position 2i , i ≥ 0, in the transmitted string has the effect of setting
the ith bit of Ĉ to v.
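The sender's side can be sketched as well (a Python sketch with names of my choosing, following the scheme just described):

```python
from functools import reduce

def encode(data_bits):
    """Place the data bits at the non-power-of-two positions (highest
    position first, as in the text's example) and set the check bits
    at positions 1, 2, 4, ... so that the positions holding 1s XOR
    to 0 (rule HC)."""
    c = 0
    while len(data_bits) + c > 2 ** c - 1:   # d + c <= 2^c - 1
        c += 1
    n = len(data_bits) + c
    powers = {2 ** i for i in range(c)}
    word = [None] * (n + 1)                  # word[p] = bit at position p
    data = list(data_bits)
    for p in range(n, 0, -1):
        if p not in powers:
            word[p] = data.pop(0)
    # D-hat: exclusive-or of positions holding a 1 data bit.
    d_hat = reduce(lambda a, b: a ^ b,
                   (p for p in range(1, n + 1)
                    if p not in powers and word[p] == "1"), 0)
    # Setting the check bit at position 2^i sets bit i of C-hat.
    for i in range(c):
        word[2 ** i] = "1" if (d_hat >> i) & 1 else "0"
    return "".join(word[p] for p in range(n, 0, -1))
```

For the nine data bits of the running example this reproduces the 13-bit codeword with check bits 1, 1, 0, 1 at positions 8, 4, 2, 1.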
How many check bits do we need for transmitting a given number of data
bits? Let d be the number of data bits and c the number of check bits. With
c check bits, we can encode 2c positions, i.e., 0 through 2c − 1. Since we have
decided not to have a position numbered 0 (see the discussion at the end of
the “Receiver” and the beginning of the “Sender” paragraphs), the number of
positions is at most 2c − 1. We have, d + c ≤ 2c − 1. Therefore, the number of
46 CHAPTER 2. ERROR DETECTION AND CORRECTION
Hadamard Matrix
We will define a family of 0, 1 matrices H, where Hn is a 2^n × 2^n matrix, n ≥ 0.
In the Reed-Muller code, we take each row of the matrix to be a codeword.
The family H is defined recursively.
H0 = [ 1 ]

Hn+1 = | Hn  Hn |
       | Hn  H̄n |

where H̄n is Hn with 0s and 1s interchanged.
H2 = | 1 1 1 1 |
     | 1 0 1 0 |
     | 1 1 0 0 |
     | 1 0 0 1 |
Hadamard matrices have many pleasing properties. The two that are of
interest to us are: (1) Hn is symmetric for all n, and (2) the Hamming distance
between any two distinct rows of Hn , n ≥ 1, is 2^(n−1). Since the matrices have
been defined recursively, it is no surprise that the proofs employ induction. I
will leave the proof of (1) to you. Let us prove (2).
We apply matrix algebra to prove this result. To that end, we replace a 0
by −1 and leave a 1 as 1. Dot product of two words x and y is given by
x · y = Σi (xi × yi )
To show that all pairs of distinct rows of Hn differ in exactly half the
positions, we take the matrix product Hn × Hn^T and show that the off-diagonal
elements, those corresponding to pairs of distinct rows of Hn , are all zero. That
is, Hn × Hn^T is a diagonal matrix. Since Hn is symmetric, Hn = Hn^T. We show,
for the inductive case:
• n + 1, where n ≥ 0 :

  Hn+1 × Hn+1
= {definition of Hn+1}
  | Hn  Hn |   | Hn  Hn |
  | Hn  H̄n | × | Hn  H̄n |
= {matrix multiplication}
  | Hn × Hn + Hn × Hn    Hn × Hn + Hn × H̄n |
  | Hn × Hn + H̄n × Hn    Hn × Hn + H̄n × H̄n |
= {H̄n = −Hn}
  | 2(Hn × Hn)    0          |
  | 0             2(Hn × Hn) |
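The recursive construction and the distance property can both be checked mechanically; a Python sketch (not part of the original text):

```python
def hadamard(n):
    """H0 = [1]; Hn+1 stacks Hn beside itself on top of Hn beside its
    0/1 complement, as in the recursive definition."""
    h = [[1]]
    for _ in range(n):
        h = [row + row for row in h] + \
            [row + [1 - b for b in row] for row in h]
    return h

def distance(r, s):
    """Hamming distance between two rows."""
    return sum(a != b for a, b in zip(r, s))
```

Any two distinct rows of Hn, n ≥ 1, should be at distance 2^(n−1); for H3 (an 8 × 8 matrix) that distance is 4.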
Exercise 20
Compute the Hamming distance of x and y in terms of x · y and the lengths of
the words. 2
Solution Let
m = the length of x (and also of y)
e = number of positions i where xi = yi
d = number of positions i where xi 6= yi
Thus, the Hamming distance is d. We have
e + d = m, and
e − d = x · y, therefore
e + d − (e − d) = m − x · y, or
d = (m − x · y)/2
Chapter 3
Cryptography
3.1 Introduction
A central problem in secure communication is the following: how can two parties,
a sender and a receiver, communicate so that no eavesdropper can deduce the
meaning of the communication?
Suppose Alice has a message to send to Bob. Henceforth, we take a mes-
sage to be a string over a specified alphabet; the string to be transmitted is
usually called plaintext. The plaintext need not be a meaningful sentence. For
instance, it could be a password or a credit card number. Any message could
be intercepted by an eavesdropper, named Eve; so, it is not advisable to send
the message in its plaintext form. Alice will encrypt the plaintext to create a
ciphertext. Alice and Bob agree on a protocol, so that only Bob knows how to
decrypt, i.e., convert the ciphertext to plaintext.
The goal of encryption and decryption is to make it hard (or impossible)
for Eve to decrypt the ciphertext while making it easy for Alice to encrypt and
Bob to decrypt. This means that Bob has some additional information, called
a key, which Eve does not possess. When Alice and Bob share knowledge of the
key, they are using a symmetric key system. Modern public key cryptography
is asymmetric; encrypting uses a public key that is known to every one while
decrypting requires a private key known only to the receiver.
The communication medium is really not important. Alice could write her
message on a piece of paper (or on a clay tablet) and mail it physically; or she
could send the message by email. Alice and Bob could engage in a telephone
conversation in which only Alice speaks. Any communication medium is vul-
nerable, so security is achieved by choosing the encryption (and decryption)
algorithms carefully.
52 CHAPTER 3. CRYPTOGRAPHY
a c d k n t w
d t k w n a c
EaaEtw Ea kEcn
Since the second word is a two-letter word beginning with E, which is uncommon
except for proper names, we decide to abandon the guess that d is E. We try
replacing d by t, the next most common symbol, to get
TaaTtw Ta kTcn
Now, it is natural to assume that a is really o, from the word Ta. This gives:
TOOTtw TO kTcn
It is unlikely that we can make any progress with the first word. So, we start
fresh, with d set to the next most likely letter, a.
A variation of this scheme is to consider the frequencies of pairs of letters in
the ciphertext, in the hope of eliminating certain possibilities. In Table 3.3, we
take the two most common letters in the ciphertext, d and a, and compute the
number of times they are adjacent to certain other letters; check that d and a
are adjacent to each other 3 times and d and k are adjacent just once. We see
that the adjacency of d and a is quite common. We may reasonably guess that
one of d and a is a vowel. Because of the presence of the word da, they cannot
both be vowels.
a c d k n t w
d 3 1 1 1
a 1 3
If the ciphertext is short, the frequencies may not correspond to the letter
frequencies we expect. Consider a short text like quick quiz, which you may send
to a friend regarding the ease of a pop quiz in this class. It will be difficult to
decipher this one through frequency analysis, though the pair qu, which occurs
twice, may help.
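The first step of such frequency analysis is a simple count; a Python sketch (function name is mine):

```python
from collections import Counter

def letter_frequencies(ciphertext):
    """Count the occurrences of each letter in a ciphertext; the most
    common letters are candidate substitutes for e, t, a, ..."""
    return Counter(c for c in ciphertext.lower() if c.isalpha())
```

On the short text quick quiz, the counter shows why the method struggles: q, u and i all tie at the top with two occurrences each.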
Exercise 21
Would it improve security to encrypt the plaintext two times using substitution
cypher instead of just once? 2
It can be shown that one-time pads are unbreakable. However, the major
difficulty is in sharing the pad. How can Alice and Bob agree on a pad to begin
with? In ancient times it was conceivable that they could agree on a common
book —say, the King James version of the Bible— and use successive strings
from the book as keys. However, the need to develop different pads for each
pair of communicants, and distribute the pads efficiently (i.e., electronically)
and securely, makes this scheme impractical.
Many military victories, defeats and political intrigues over the entire course
of human history are directly attributable to security/breakability of codes.
Lively descriptions appear in a delightful book by Singh [47].
can check 1010 messages a second, it will still take around 58 years to check all
possible messages. She could, of course, be lucky, and get a break early in the
search; but, the probability of being lucky is quite low.
The notion of computational intractability was not available to the early
cryptanalysts; it is a product of modern computer science. We now know that
there are certain problems which, though decidable, are computationally in-
tractable in that they take huge amounts of time to solve. Normally, this is
an undesirable situation. We have, however, turned this disadvantage to an
advantage by putting the burden of solving an intractable problem on Eve, the
eavesdropper.
The idea of public key cryptography using one-way functions is due to Diffie
and Hellman [14]. Rivest, Shamir and Adleman [44] were the first to propose a
specific one-way function that has remained unbroken (or, so it is believed). In
the next section, I develop the theory behind this one-way function.
Examples
Relatively Prime Positive integers x and y are relatively prime iff gcd(x, y) =
1. Since gcd(0, x) = x for positive x, it follows that 0 and x are relatively prime
iff x = 1. Note that gcd(0, 0) is undefined.
Exercise 22
Given u ≡ v (mod p) and x ≡ y (mod p), disprove each of the following
conclusions:
• max(u, x) ≡ max(v, y) (mod p)
• u^x ≡ v^y (mod p)
• (P4; due to Fermat) bp−1 mod p = 1, where p is prime, and b and p are
relatively prime.
Exercise 23
With b and p as in (P4), show that for any nonnegative integer m,
b^m ≡ b^(m mod (p−1)) (mod p)
Solution Write b^m as b^(p−1) × b^(p−1) × · · · × b^(m mod (p−1)). Use (P4) to reduce each
b^(p−1) mod p to 1. 2
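The reduction is easy to check numerically; a Python sketch (the function name is mine):

```python
def reduce_exponent(b, m, p):
    """For prime p with b and p relatively prime, b^m is congruent to
    b^(m mod (p-1)) modulo p, by repeatedly stripping off factors
    b^(p-1), each congruent to 1 by property (P4)."""
    return pow(b, m % (p - 1), p)
```

Checking against direct modular exponentiation for a few prime moduli confirms the identity.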
a × x + b × y = gcd(x, y).
This result is known as Bézout’s lemma. For example, let x, y = 12, 32. Then
gcd(x, y) = 4. And, a, b = 3, −1 satisfy the equation.
Note that a and b need not be positive, nor are they unique. In fact, verify
that for any solution (a, b), the pair (a + k × y, b − k × x) is also a solution,
where k is any integer.
a×x+b×y =y
a′ × (x − y) + b′ × y = gcd(x − y, y)
a′ × (x − y) + b′ × y = gcd(x, y)
a′ × x + (b′ − a′) × y = gcd(x, y)
Next, consider the classical Euclid algorithm for computing gcd. We will
modify this algorithm to compute a and b as well.
3.3. PUBLIC KEY CRYPTOGRAPHY 59
u, v := x, y
{u ≥ 0, v ≥ 0, u ≠ 0 ∨ v ≠ 0, gcd(x, y) = gcd(u, v)}
while v ≠ 0 do
u, v := v, u mod v
od
{gcd(x, y) = gcd(u, v), v = 0}
{gcd(x, y) = u}
One way of computing u mod v is to explicitly compute the quotient q,
q = bu/vc, and subtract v × q from u. Thus, u, v := v, u mod v is replaced
by
q := bu/vc;
u, v := v, u − v × q
To compute a and b as required, we augment this program by introducing
variables a, b and another pair of variables c, d, which satisfy the invariant
(a × x + b × y = u) ∧ (c × x + d × y = v)
An outline of the program is shown below.
u, v := x, y; a, b := 1, 0; c, d := 0, 1;
while v ≠ 0 do
q := bu/vc;
α : {(a × x + b × y = u) ∧ (c × x + d × y = v)}
u, v := v, u − v × q;
a, b, c, d := a′, b′, c′, d′
β : {(a × x + b × y = u) ∧ (c × x + d × y = v)}
od
Table 3.4: Computing gcd(17, 2668)

    a    b    u       c     d   v     q
    1    0    17      0     1   2668  0
    0    1    2668    1     0   17    156
    1    0    17     −156   1   16    1
 −156    1    16     157   −1   1     16
  157   −1    1     −2668  17   0
c′, d′ = a − c × q, b − d × q
u, v := x, y; a, b := 1, 0; c, d := 0, 1;
while v ≠ 0 do
q := bu/vc;
α : {(a × x + b × y = u) ∧ (c × x + d × y = v)}
u, v := v, u − v × q;
a, b, c, d := c, d, a − c × q, b − d × q
β : {(a × x + b × y = u) ∧ (c × x + d × y = v)}
od
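The annotated program translates directly to Python (a sketch; the function name is mine):

```python
def extended_gcd(x, y):
    """The program above: maintain the invariant a*x + b*y = u and
    c*x + d*y = v; on termination u = gcd(x, y), so (a, b) are the
    Bezout coefficients."""
    u, v, a, b, c, d = x, y, 1, 0, 0, 1
    while v != 0:
        q = u // v
        u, v = v, u - v * q
        a, b, c, d = c, d, a - c * q, b - d * q
    return u, a, b
```

For x, y = 17, 2668 it reproduces the trace of Table 3.4, ending with a, b = 157, −1; for x, y = 12, 32 it yields a, b = 3, −1 as in the text's earlier example.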
a×x+b×y
= {from the invariant}
u
= {u = gcd(x, y), from the annotation of the first program in page 59}
gcd(x, y)
Example Table 3.4 shows the steps of the algorithm for x, y = 17, 2668. 2
Exercise 24
Show that the algorithm terminates, and that α : {(a × x + b × y = u) ∧ (c ×
x + d × y = v)} is a loop invariant. Use annotations shown in the program. 2
1. 1 ≤ d < n, 1 ≤ e < n,
2. both d and e are relatively prime to φ(n), and
3. d × e ≡ 1 (mod φ(n)).
r|d
⇒ {arithmetic}
r|(d × e)
⇒ {d × e = k × φ(n) + 1}
r|(k × φ(n) + 1)
⇒ {since r is a divisor of φ(n), r|(k × φ(n))}
r|1
⇒ {arithmetic}
|r| = 1
• Publicize (e, n) as the public key. Save (d, n) as the private key.
3.3.2.1 Encryption
To send message M to a principal whose public key is (e, n) and 0 ≤ M < n,
send M′ where M′ = (M^e mod n).
Example; contd. Let us represent each letter of the alphabet by two digits,
with white space = 00, a = 01, b = 02, etc.
Suppose the message to be sent is “bad day”. The representation yields:
02010400040125.
Since n is 2773, we can convert any pair of letters to a value below n, the
largest such pair being zz which is encoded as 2626. Therefore, our block length
is 2 letters. We get the following blocks from the encoded message: 0201 0400
0401 2500; we have appended an extra blank at the end of the last block to
make all blocks have equal size.
Now for encryption of each block. We use the parameters from the previous
example, where e = 17. For the first block, we have to compute 0201^17 mod
2773, for the second 0400^17 mod 2773, etc. 2
There is an efficient way to raise a number to a given exponent. To compute
M^17, we need not multiply M with itself 16 times. Instead, we see that M^17 =
M^16 × M = (M^8)^2 × M = ((M^4)^2)^2 × M = (((M^2)^2)^2)^2 × M. The multiplication
strategy depends on the binary representation of the exponent. Also, at each
stage, we may apply mod n, so that the result is always less than n. Specifically,
with e = ek ek−1 · · · e0 in binary:
C := 1;
for i = k..0 do
if ei = 0
then C := C 2 mod n
else C := ((C 2 mod n) ∗ M ) mod n
fi
od
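The same left-to-right scheme in Python (a sketch, assuming n > 1; the function name is mine):

```python
def modexp_msb(m, e, n):
    """Left-to-right binary exponentiation: square at every exponent
    bit, scanning from the most significant bit, and multiply by m
    where the bit is 1, reducing mod n at each step."""
    c = 1
    for bit in bin(e)[2:]:
        c = (c * c) % n
        if bit == "1":
            c = (c * m) % n
    return c
```

For the example block, modexp_msb(201, 17, 2773) performs four squarings and one extra multiplication, giving 2710.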
Exercise 25
The algorithm to compute M e mod n, given earlier, scans the binary represen-
tation of e from left to right. It is often easier to scan the representation from
right to left, because we can check if e is even or odd easily on a computer. We
use:
M^(2t) = (M^2)^t
M^(2t+1) = (M^2)^t × M
C := 1; h, m := e, M ;
while h ≠ 0 do
if odd(h) then C := C ∗ m fi ;
h := h ÷ 2;
m := m^2
od
{C = M^e}
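The exercise's program computes M^e exactly; for RSA one would apply mod n at every step. A Python sketch of that modular variant (my adaptation, not the text's program):

```python
def modexp_lsb(m, e, n):
    """Right-to-left binary exponentiation, as in the exercise's
    program, with mod n applied at each step so intermediate values
    stay below n."""
    c, h = 1, e
    m = m % n
    while h != 0:
        if h % 2 == 1:        # odd(h)
            c = (c * m) % n
        h = h // 2
        m = (m * m) % n
    return c
```

Testing h's parity is the easy-to-compute check the exercise mentions.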
3.3.2.2 Decryption
M″ = (M′^d mod n)
Example We continue with the previous example. The encryption and de-
cryption steps are identical, except for different exponents. We use the encryp-
tion algorithm with exponent 157 to decrypt. The encryption of 0201 is 2710
and of 0400 is 0017. Computing 2710^157 mod 2773 and 0017^157 mod 2773 yield
the original blocks, 0201 and 0400. 2
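The whole round trip can be checked in a few lines of Python, using the example parameters from the text (n = 2773, e = 17, d = 157):

```python
# The running example's key: n = 2773 (= 47 * 59), e = 17, d = 157.
N, E, D = 2773, 17, 157

def rsa(block, exponent):
    """Encryption and decryption are the same operation, modular
    exponentiation, with different exponents."""
    return pow(block, exponent, N)
```

Encrypting the blocks 0201 and 0400 gives 2710 and 0017, and decrypting with exponent 157 recovers them; indeed decryption inverts encryption for every block below n.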
Lemma 1: For any M , 0 ≤ M < n, M^(d×e) ≡ M (mod p).
Proof:
M^(d×e) mod p
= {d × e ≡ 1 (mod φ(n)), and φ(n) = (p − 1) × (q − 1)}
M^(t×(p−1)+1) mod p, for some t
= {rewriting}
((M^(p−1))^t × M) mod p
= {modular simplification: replace M^(p−1) by M^(p−1) mod p}
((M^(p−1) mod p)^t × M) mod p
= {Consider two cases:
• M and p are not relatively prime:
Since p is prime, M is a multiple of p, i.e.,
(M mod p) = 0. So, M^(p−1) mod p = 0.
The entire expression is 0, thus equal to M mod p.
• M and p are relatively prime:
Then, M^(p−1) mod p = 1, from (P4).
The expression is (1^t × M) mod p = M mod p.
}
M mod p
Lemma 2: For any M , 0 ≤ M < n, (M^(d×e) mod n) = M .
Proof:
M ≡ M^(d×e) (mod p) , from Lemma 1
M ≡ M^(d×e) (mod q) , replacing p by q in Lemma 1
M ≡ M^(d×e) (mod n) , from the above two, using (P3) and n = p × q
(M mod n) = (M^(d×e) mod n) , from above
M = (M^(d×e) mod n) , M < n; so M mod n = M
We are now ready to prove the main theorem, that encryption followed by
decryption yields the original message.
Theorem: M″ = M .
Proof:
M
= {Lemma 2}
M^(d×e) mod n
= {arithmetic}
(M^e)^d mod n
= {modular simplification rule}
((M^e mod n)^d) mod n
= {M′ = M^e mod n, and M″ = (M′^d mod n)}
M″
3.4. DIGITAL SIGNATURES AND RELATED TOPICS 65
Exercise 26
Why is it necessary to choose distinct primes for p and q?
fortunately (or, fortunately for cryptography), the algorithm can run only on a quantum
computer.
Alice encrypts the message using her private key; this is now a signed mes-
sage. More formally, let x be the message, fa and fb the public keys of Alice
and Bob, and fa^(−1) and fb^(−1) their private keys, respectively. Then fa^(−1)(x) is
the message signed by Alice. If the message is intended for Bob’s eyes only, she
encrypts the signed message with Bob’s public key, and sends fb(fa^(−1)(x)). Bob
first decrypts the message using his own private key, then decrypts the signed
message using Alice’s public key. Alice may also include her name in plaintext,
fb(“alice” ++ fa^(−1)(x)), where ++ is concatenation, so that Bob will know whose
public key he should apply to decrypt the signed message.
We show that such signatures satisfy two desirable properties. First,
decrypting any message with Alice’s public key will result in gibberish unless
it has been encrypted with her private key. So, if Bob is able to get a meaningful
message by decryption, he is convinced that Alice sent the message.
Second, Alice cannot deny sending the message, because no one else has ac-
cess to her private key. An impartial judge can determine that Alice’s signature
appears on the document (message) by decrypting it with her public key. Note
that the judge does not need access to any private information.
Note that no one can modify this message while keeping Alice’s signature
affixed to it. Thus, no electronic cutting and pasting of the message/signature
is possible.
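Signing and verifying can be sketched with the small example key from Section 3.3.2 (a toy sketch; real digital signature schemes sign a hash of the message, which is omitted here):

```python
# The text's example key: (E, N) is Alice's public key, (D, N) private.
N, E, D = 2773, 17, 157

def sign(m):
    """Alice 'encrypts' with her private key to sign."""
    return pow(m, D, N)

def verify(m, sig):
    """Anyone can check the signature with Alice's public key; no
    private information is needed."""
    return pow(sig, E, N) == m
```

A signature on one message does not verify against any other, which is why cutting and pasting a signature fails.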
Observe a very important property of the RSA scheme: any message can be
encrypted by the public or the private key and decrypted by its inverse.
Digital signatures are now accepted for electronic documents. A user can
sign a check, or a contract, or even a document that has been signed by other
parties.
Another look at one-time pads We know that one-time pads provide com-
plete security. The only difficulty with them is that both parties to a transmis-
sion must have access to the same pad. We can overcome this difficulty using
RSA, as follows.
Regard each one-time pad as a random number. Both parties to a trans-
mission have access to a pseudo-random number generator which produces a
stream of random numbers. The pseudo-random number generator is public
knowledge, but the seed which the two parties use is a shared secret. Since they
use the same seed, they will create the same stream of random numbers. Then
the encryption can be relatively simple, like taking exclusive or.
This scheme has one drawback, having the seed as a shared secret. RSA
does not have this limitation. We can use RSA to establish such a secret: Bob
generates a random number as the seed and sends it to Alice, encrypted by
Alice’s public key. Then, both Bob and Alice know the seed.
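The shared-seed scheme can be sketched as follows. The generator below is a toy linear congruential generator standing in for the public pseudo-random number generator (an assumption of this sketch, not a generator from the text, and not cryptographically strong):

```python
def keystream(seed, length):
    """A toy linear-congruential generator; both parties derive the
    same stream from the shared secret seed."""
    state, out = seed, []
    for _ in range(length):
        state = (1103515245 * state + 12345) % (2 ** 31)
        out.append(state % 256)
    return out

def crypt(message, seed):
    """Encryption is exclusive-or with the keystream; applying the
    same operation twice restores the message."""
    ks = keystream(seed, len(message))
    return bytes(b ^ k for b, k in zip(message, ks))
```

The seed plays the role of the one-time pad's shared secret; RSA can then be used to establish it, as described above.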
of Bob by her own. Then Alice encrypts a message for Bob by Eve’s public key.
Any message she sends to Bob could be intercepted and decrypted by Eve.
The problem arises because Alice does not know that the message received
from David is not authentic. To establish authenticity, David —often called a
trusted third party— could sign the message. Then, Eve cannot do the substi-
tution as described above.
Trusted third parties play a major role in security protocols, and authenti-
cating such a party is almost always handled by digital signatures.
Blind Signature Alice wants Bob to sign a document without revealing its
contents to Bob. This is useful if Bob is a notary, so that Bob does not need to
know the contents of the document, but merely verify the signature.
Suppose Alice has a document M . She would like to have M^d mod n, where
d is Bob’s decryption key and n is as described in Section 3.3.2 (page 60). Alice
sends M × k^e to Bob, where e is Bob’s encryption key and k is some random
number, 1 ≤ k < n. Bob signs the document with his decryption key d and
returns (M × k^e)^d mod n to Alice. Now,
(M × k^e)^d mod n
= {arithmetic}
(M^d × k^(e×d)) mod n
= {arithmetic}
(M^d × (k^(e×d) mod n)) mod n
= {from Lemma 2, page 64, k^(e×d) mod n = k}
(M^d × k) mod n
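Alice then divides out k to obtain M^d mod n. A Python sketch with the example key (function names are mine; k must be relatively prime to n so that its inverse exists, and pow(k, -1, N) computes that inverse in Python 3.8 and later):

```python
# Bob's example key from Section 3.3.2: n = 2773, e = 17, d = 157.
N, E, D = 2773, 17, 157

def blind(m, k):
    """Alice hides document m by multiplying with k^e mod n."""
    return (m * pow(k, E, N)) % N

def unblind(signed_blind, k):
    """After Bob signs the blinded document, multiplying by k's
    inverse mod n leaves m^d mod n, Bob's signature on m itself."""
    return (signed_blind * pow(k, -1, N)) % N
```

Bob signs the blinded document without ever seeing M.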
We don’t really need that g be a (p−1)th root of 1; all we need is that g^k mod p ≠ 1
for many values of k, k < p − 1. With this relaxed requirement, we choose p to
be a prime of the form 2 × q + 1, where q is a prime. Such primes are known as
safe or Sophie-Germain primes. The first few safe primes are: 5, 7, 11, 23, 47,
59, 83, 107, 167, 179, 227, 263.
Given p = 2 × q + 1, from Fermat’s Theorem (P4 on page 57), g^(2×q) mod p = 1
for any g, 1 ≤ g < p. That is, either g^2 mod p = 1 or g^q mod p = 1; further,
g^k mod p ≠ 1, for all other values of k, 1 ≤ k ≤ q. We look for a g such that
g^2 mod p ≠ 1; then g^k mod p ≠ 1, for all k, 1 ≤ k < q. We find such a g by a
dumb search, simply looking at g = 2, 3, 4, · · · until we find one such that
g^2 mod p ≠ 1. Usually, this is a very short computation.
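The dumb search is a one-liner; a sketch (it assumes p is a safe prime, p = 2q + 1 with q prime, as above):

```python
def find_generator(p):
    """Search g = 2, 3, 4, ... for the first g whose square is not 1
    mod p; assumes p is a safe prime, p = 2*q + 1 with q prime."""
    g = 2
    while pow(g, 2, p) == 1:
        g += 1
    return g
```

For the safe prime p = 23 (q = 11) the search stops immediately at g = 2, whose order is 11: 2^11 mod 23 = 1 and no smaller power is 1.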
Woman in the Middle attack The given protocol fails if Eve intercepts
every communication. She pretends to be Alice to Bob and Bob to Alice. She
substitutes her own keys, e and e0 , in place of a and b. Thus, she establishes a
secret s with Bob and another secret s0 with Alice, thus decoding all communi-
cation between them.
There is no easy way to handle this problem. Each communication has to be
authenticated, i.e., Alice must be sure that any message purported to be from
Bob is actually from Bob (and similarly for Bob), at least for the duration of
the Diffie-Hellman exchange protocol. This can be accomplished by Alice and
Bob signing every message they send (using their private keys).
4.1 Introduction
4.1.1 Wolf-Goat-Cabbage Puzzle
A shepherd arrives at a river bank with a wolf, a goat and a cabbage. There is a
boat there that can carry them to the other bank. However, the boat can carry
the shepherd and at most one other item. The shepherd’s actions are limited
by the following constraints: if the wolf and goat are left alone, the wolf will
devour the goat, and if the goat and the cabbage are left alone, well, you can
imagine. . .
You can get a solution quickly by rejecting certain obvious possibilities. But
let us attack this problem more systematically. The state of affairs at
any point during the passage is: what is on the left bank, what is on the right
bank, and where the boat is (we can deduce the contents of the boat by determining
which items are absent from both banks). The state of the left bank is a subset
of {w,g,c} —w for wolf, g for goat, and c for cabbage— and similarly for the
right bank. The shepherd is assumed to be with the boat (the cabbage cannot
steer the boat :-) ), so the state of the boat is that it is: (1) positioned at the left
bank, (2) positioned at the right bank, (3) in transit from left to right, or (4) in
transit from right to left; let us represent these possibilities by the symbols, L,
R, LR, RL, respectively.
Thus, we represent the initial state by a triple like ⟨{w,g,c}, L, {}⟩. Now
what possible choices are there? The shepherd can row alone, or take one item
with him in the boat, the wolf, the goat or the cabbage. These lead to the
following states respectively.
72 CHAPTER 4. FINITE STATE MACHINES
Observe that all states except ⟨{w,c}, LR, {}⟩ are inadmissible, since some-
one will consume something. So, let us continue the exploration from ⟨{w,c},
LR, {}⟩.
When the shepherd reaches the other bank, the state changes from ⟨{w,c},
LR, {}⟩ to ⟨{w,c}, R, {g}⟩. Next, the shepherd has a choice: he can row back
with the goat to the left bank (an obviously stupid move, because he will then
be at the initial state), or he may row alone. In the first case, we get the state
⟨{w,c}, RL, {}⟩, and in the second case ⟨{w,c}, RL, {g}⟩. We may continue
exploring from each of these possibilities, adding more states to the diagram.
Figure 4.1 shows the initial parts of the exploration more succinctly.
[Figure 4.1: the initial part of the exploration. From ⟨{w,g,c}, L, {}⟩, rowing
alone or taking w, g or c leads to ⟨{w,g,c}, LR, {}⟩, ⟨{g,c}, LR, {}⟩, ⟨{w,c},
LR, {}⟩ and ⟨{w,g}, LR, {}⟩; only the move with g is admissible, leading to
⟨{w,c}, R, {g}⟩, and then, rowing alone, to ⟨{w,c}, RL, {g}⟩ and ⟨{w,c}, L, {g}⟩.]
The important thing to note is that the number of states is finite (prove it).
So, the exploration will terminate sometime.
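The exploration can be automated by breadth-first search over the finite state space. A Python sketch (the representation is mine: a state is the set on the left bank together with the shepherd's bank; the in-transit LR/RL states are collapsed into the crossings themselves):

```python
from collections import deque

def solve():
    """Breadth-first search of the wolf-goat-cabbage state space.
    A bank without the shepherd must not hold wolf+goat or
    goat+cabbage together. Returns the shortest list of crossings:
    each entry is the item taken, or None for rowing alone."""
    ALL = frozenset("wgc")
    def safe(bank):
        return not ({"w", "g"} <= bank or {"g", "c"} <= bank)
    start, goal = (ALL, "L"), (frozenset(), "R")
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (left, boat), path = queue.popleft()
        if (left, boat) == goal:
            return path
        here = left if boat == "L" else ALL - left
        for item in list(here) + [None]:       # take one item, or row alone
            if boat == "L":
                nleft = left - {item}
            else:
                nleft = left | ({item} - {None})
            unattended = nleft if boat == "L" else ALL - nleft
            if safe(unattended):
                nstate = (nleft, "R" if boat == "L" else "L")
                if nstate not in seen:
                    seen.add(nstate)
                    queue.append((nstate, path + [item]))
    return None
```

The search confirms that the only safe first move is to take the goat, and that seven crossings suffice.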
Exercise 27
Complete the diagram. Show that a specific kind of path in the graph corre-
sponds to a solution. How many solutions are there? Can you define states
differently to derive a smaller diagram? 2
Exercise 28
Given is a 2 × 2 board which contains a tile in each of its cells; one is a blank tile
(denoted by —), and the others are numbered 1 through 3. A move exchanges
the blank tile with one of its neighbors, in its row or column. The tiles are
initially placed as shown in Table 4.1.
4.1. INTRODUCTION 73
Table 4.1 (initial state):
1 3
2 —

Table 4.2 (target state):
1 2
3 —
Show that it is impossible to reach the configuration (state) given in Table 4.2
from the initial state, given in Table 4.1.
Proof through enumeration: There are two possible moves from any state:
either the blank tile moves horizontally or vertically. Enumerate the states,
observe which states are reachable from which others and prove the result. It is
best to treat the system as a finite state machine.
A non-enumerative proof: From a given configuration, construct a string by
reading the numbers from left to right along the first row, dropping down to
the next row and reading from right to left. For Table 4.1, we get 132 and for
Table 4.2, we get 123. How does a move affect a string? More precisely, which
property of a string is preserved by a move?
Exercise 29
(A puzzle due to Sam Loyd) This is the same puzzle as in the previous exercise
played on a 4 × 4 board. The board contains a tile in each of its cells; one is a
blank tile (denoted by —), and the others are numbered 1 through 15. A move
exchanges the blank tile with one of its neighbors, in its row or column. The
tiles are initially placed as shown in Table 4.3.
01 02 03 04
05 06 07 08
09 10 11 12
13 15 14 —
Show that a sorted configuration, as shown in Table 4.4, cannot be reached
in a finite sequence of moves from the initial configuration.
You can do a computer search (calculate the search space size before you
start), or prove this result. Create a string (a permutation) of 1 through 15
from each configuration, show that each move preserves a certain property of a
permutation, that the initial configuration has the given property and the final
configuration does not. Consider the number of inversions in a permutation.
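The count of inversions is easy to compute; a Python sketch (the invariance argument itself is left to the exercise; here, since the blank occupies the same cell in Tables 4.3 and 4.4, it is the parity of the inversion count that separates the two configurations):

```python
def inversions(perm):
    """Number of pairs (i, j), i < j, with perm[i] > perm[j]."""
    return sum(perm[i] > perm[j]
               for i in range(len(perm)) for j in range(i + 1, len(perm)))
```

The initial permutation 1..13, 15, 14 has one inversion (odd); the sorted permutation has none (even), so their parities differ.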
01 02 03 04
05 06 07 08
09 10 11 12
13 14 15 —
[Figure: the traffic light’s three states, g, y and r.]
What causes the state transitions? It is usually the passage of time; let us
say that the light changes every 30 seconds. We can imagine that an internal
clock generates a pulse every 30 seconds that causes the light to change state.
Let symbol p denote this pulse.
Suppose that an ambulance arrives along an intersecting road and remotely
sets this light red (so that it may proceed without interference from vehicles
travelling along this road). Then, we have a new state transition, from green
to red and from yellow to red, triggered by the signal from the ambulance; call
this signal a. See Figure 4.3 for the full description.
[Figure 4.3: the traffic-light machine. The pulse p cycles g → y → r → g;
the ambulance signal a sends g and y to r.]
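The machine can be sketched as a table-driven program in Python (the transition for r on the ambulance signal a is assumed to leave the light red; the text leaves it implicit):

```python
# Transition table: pulse p advances the cycle; ambulance signal a
# forces red (r stays red on a, by assumption).
TRANSITIONS = {
    ("g", "p"): "y", ("y", "p"): "r", ("r", "p"): "g",
    ("g", "a"): "r", ("y", "a"): "r", ("r", "a"): "r",
}

def run(state, inputs):
    """Feed a string of input symbols to the machine, one at a time."""
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
    return state
```

Three pulses return the light to green; the signal a reaches red from any state.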
“facetious” and “sacrilegious”. But not “tenacious”, which contains all the
vowels but not in order.
Let us design a program to solve this problem. Our program looks at each
word in the dictionary in turn. For each word it scans it until it finds an “a”, or
fails to find it. In the latter case, it rejects the word and moves on to the next
word. In the first case, it resumes its search from the point where it found “a”
looking for “e”. This process continues until all the vowels in order are found,
or the word is rejected.
i := 0;
while i ≤ N ∧ S[i] ≠ “c” do
i := i + 1
od ;
if i ≤ N then “success” else “failure” fi
A simpler strategy uses a “sentinel”, an item at the end of the list which
guarantees that the search will not fail. It simplifies and speeds up the loop.
S[N + 1] := “c”; i := 0;
while S[i] ≠ “c” do
i := i + 1
od ;
if i ≤ N then “success” else “failure” fi 2
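The sentinel version translates directly to Python (a sketch, with the sentinel appended to a copy of the list rather than stored at position N + 1):

```python
def find(items, target):
    """Sentinel search: append the target so the loop needs no bounds
    test; the search succeeded iff the hit occurred before the
    sentinel position."""
    s = list(items) + [target]
    i = 0
    while s[i] != target:
        i += 1
    return i if i < len(items) else None
```

The loop body performs one comparison per iteration instead of two, which is the speedup the text mentions.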
If you complete the program you will find that its structure is a mess. There
are five loops, each looking for one vowel. They will be nested within a loop. A
failure causes immediate exit from the corresponding loop. (Another possibility
is to employ a procedure which is passed the word, the vowel and the position
in the word where the search is to start.) Modification of this program is messy.
Suppose we are interested in words in which exactly these vowels occur in order,
so “sacrilegious” will be rejected. How will the program be modified? Suppose
we don’t care about the order, but we want all the vowels to be in the word;
so, “tenacious” will make the cut. For each of these modifications, the program
structure will change significantly.
What we are doing in all these cases is to match a pattern against a word.
The pattern could be quite complex. Think about the meaning of pattern if
you are searching a database of music, or a video for a particular scene. Here
are some more examples of “mundane” patterns that arise in text processing.
:=    ;     <     ≤     =     ≠     >     ≥     +
1000  1001  1002  1003  1004  1005  1006  1007  1008
4. Ignore comments (the stuff that appears between braces) and extra white
spaces.
[Figure 4.4: a machine with states 1–6; state 1 is the start state and state 6
the accepting state. Transitions labeled a, e, i, o, u advance 1 → 2 → · · · → 6;
all other symbols (A = Alphabet) loop at the current state.]
If the machine in Figure 4.4 receives the string “abstemious” then its suc-
cessive states are: 1 2 2 2 2 3 3 4 5 6 6. Since its final state is an accepting
state, we say that the string is accepted by the machine. A string that makes
the machine end up in a rejecting state is said to be rejected by the machine.
Which state does the machine end up in for the following strings: aeio,
tenacious, f, aaeeiioouu, ε (ε denotes the empty string)? Convince yourself that
the machine accepts a string iff five vowels appear in order in that string.
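Since the state of this machine is just a count of the vowels matched so far, the whole machine fits in a few lines of Python (the function name is mine):

```python
def vowels_in_order(word, vowels="aeiou"):
    # state k of the machine: the first k vowels have been matched;
    # every other symbol leaves the state unchanged (a self-loop)
    state = 0
    for ch in word:
        if state < len(vowels) and ch == vowels[state]:
            state += 1
    return state == len(vowels)   # state 6 in the figure is accepting
```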
Exercise 30
Draw a machine that accepts strings which contain the five vowels in order, and
no other vowels. So, the machine will accept “abstemious”, but not “sacrile-
gious”. See Figure 4.5. 2
Note There is a more general kind of finite state machine called a nondeter-
ministic machine. The state transitions are not completely determined by the
[Figure 4.5: states 1 through 6 as in Figure 4.4; states 1 through 5 have self-loops labeled A, where A = Alphabet − {a, e, i, o, u}, and advance only on the next vowel in order; state 6 has a self-loop on the whole alphabet. A vowel out of order leads to a permanently rejecting state.]
current state and the input symbol as in the deterministic machines you have
seen so far. The machine is given the power of clairvoyance so that it chooses
the next state, out of a possible set of successor states, which is the “best” state
for processing the remaining unseen portion of the string. 2
In all cases, we deal with strings —a sequence of symbols— drawn from
a fixed alphabet. A string may or may not satisfy a pattern: “abstemious”
satisfies the pattern of having all five vowels in order. Here are some more
examples of patterns.
[Figures: a machine accepting a nonempty string of digits D, and one accepting an optional sign (+ or −) followed by digits.]
Exercise 31
Draw finite state machines that accept the strings in the following problems.
2. Any string in which “(” and “)” are balanced, and the level of parentheses
nesting is at most 3.
3. Any string starting with “b” followed by any number of “a”s and then a
“d”. These strings are: “bd”, “bad”, “baad”, “baaad”, . . . 2
Exercise 32
1. Design finite state machines for the following problems. Assume that the
alphabet is {0, 1}.
3. For the following problems, the alphabet consists of letters (from the Ro-
man alphabet) and digits (Arabic numerals).
4.2. FINITE STATE MACHINE 81
Exercise 33
Let F be a finite state machine.
1. Design a program that accepts a description of F and constructs a Java
program J equivalent to F . That is, J accepts a string as input and prints
“accept” or “reject”. Assume that your alphabet is {0, 1}, and a special
symbol, say #, terminates the input string.
2. Design a program that accepts a description of F and a string s and prints
“accept” or “reject” depending on whether F accepts or rejects s. 2
[Figure 4.11: a machine with states A (start), B, C, D that accepts strings with an even number of 0s and an odd number of 1s. Symbol 0 toggles between A and B and between C and D; symbol 1 toggles between A and D and between B and C.]
The strategy is to guess which string is accepted in each state, and attach
that as a label to that state. This is similar to program proving. Let
[Figure: the machine of Figure 4.11 with each state labeled by the strings that lead to it: A: p ∧ q, B: ¬p ∧ q, C: ¬p ∧ ¬q, D: p ∧ ¬q, where p means the number of 0s is even and q means the number of 1s is even.]
Why Does the Verification Procedure Work? It seems that we are using
some sort of circular argument, but that is not so. In order to convince yourself
that the argument is not circular, construct a proof using induction. The the-
orem we need to prove is as follows: after processing any string x, the machine
state is A, B, C or D iff x satisfies p ∧ q, ¬p ∧ q, ¬p ∧ ¬q or p ∧ ¬q, respectively.
The proof of this statement is by induction on the length of x.
For |x| = 0: x is the empty string and p ∧ q holds for it. The machine state
is A, so the theorem holds.
For |x| = n + 1, n ≥ 0: use the induction hypothesis and the proofs from
Table 4.8.
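The inductive claim can also be checked exhaustively for short strings. The Python sketch below reconstructs the transition table from the figure description (0 toggles the parity of 0s, 1 toggles the parity of 1s; that reading is an assumption on my part, though it is forced by the state labels) and compares the machine state against the labels for every string of length at most 10:

```python
from itertools import product

# Transition table: 0 toggles A<->B and C<->D, 1 toggles A<->D and B<->C.
delta = {('A', '0'): 'B', ('B', '0'): 'A', ('C', '0'): 'D', ('D', '0'): 'C',
         ('A', '1'): 'D', ('D', '1'): 'A', ('B', '1'): 'C', ('C', '1'): 'B'}

def run(x):
    state = 'A'                      # A is the start state
    for ch in x:
        state = delta[(state, ch)]
    return state

def label(x):
    p = x.count('0') % 2 == 0        # p: even number of 0s
    q = x.count('1') % 2 == 0        # q: even number of 1s
    return {(True, True): 'A', (False, True): 'B',
            (False, False): 'C', (True, False): 'D'}[(p, q)]

# the inductive theorem, checked for every string of length at most 10
for n in range(11):
    for bits in product('01', repeat=n):
        x = ''.join(bits)
        assert run(x) == label(x)
```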
[Figure 4.14: Transducer that squeezes each block to a single bit. Two states A (start) and B, with transitions labeled s/t: a bit produces output only when it starts a new block, i.e., when it differs from the previous bit.]
Exercise 34
Design a transducer which replaces each 0 by 01 and 1 by 10 in a string of 0s
and 1s. 2
Exercise 35
The input is a 0-1 string. A 0 that is both preceded and succeeded by at least
three 1s is to be regarded as a 1. The first three symbols are to be reproduced
exactly. The example below shows an input string and its transformation; the
bit that is changed has an overline on it in the input and underline in the output.
0110111011111000111 becomes
0110111111111000111
[Figure: a serial adder as a transducer with states n (no carry) and c (carry). Inputs are bit pairs: from n, 00/0, 01/1 and 10/1 stay in n, while 11/0 moves to c; from c, 01/0, 10/0 and 11/1 stay in c, while 00/1 moves back to n.]
Exercise 36
Suppose that the input for either operand is terminated by a special symbol #.
Thus, a possible input could be (1, 1)(#, 0)(#, 1)(#, #), representing the sum
of 1 and 101. Redesign the serial adder to produce the complete sum.
Note that the flipped bit could be a parity bit or one of the original ones.
Now each erroneous block has odd parity, and the receiver can identify all such
blocks. It then asks for retransmission of those blocks. If two (or any even
number of) bits of a block get flipped, the receiver cannot detect the error. In
practice, the blocks are much longer (than 3, shown here) and many additional
bits are used for error detection.
The logic at the receiver can be depicted by a finite state acceptor, see
Figure 4.17. Here, a block is accepted iff it has even parity. The receiver will
ask for retransmission of a block if it enters a reject state for that block (this is
not part of the diagram).
[Figure 4.17: acceptor; a block is accepted iff it has even parity.]
The sender is a finite state transducer that inserts a bit after every three
input bits; see figure 4.18. The start state is 0. The states have the following
Figure 4.18: Append parity bit to get even parity; block length is 3. [States 0 through 4 track the position within a block and its parity; on the first two bits of a block the machine copies the bit (0/0, 1/1), and on the third bit it outputs the bit followed by the parity bit (0/00, 0/01, 1/10, 1/11).]
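The sender and receiver can also be sketched as ordinary programs. This Python version produces the same output stream as the transducer of Figure 4.18, emitting the parity bit immediately after the third bit of each block (function names are mine):

```python
def add_parity(bits):
    # Transducer: after every third input bit, emit a parity bit
    # that gives the resulting 4-bit block even parity.
    out, count, parity = [], 0, 0
    for b in bits:
        out.append(b)
        count += 1
        parity ^= int(b)
        if count == 3:
            out.append(str(parity))   # block now has even parity
            count, parity = 0, 0
    return ''.join(out)

def accepts(block):
    # Acceptor of Figure 4.17: accept iff the block has even parity.
    return block.count('1') % 2 == 0
```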
Exercise 37
Redesign the machine of Figure 4.17 to accept only 4-bit strings.
Exercise 38
Design a machine that accepts a string of symbols, and outputs the same string
by (1) removing all white spaces in the beginning, (2) reducing all other blocks
of white spaces (consecutive white spaces) to a single white space. Thus, the
string (where - denotes a white space)
----Mary----had--a--little---lamb-
is output as
Mary-had-a-little-lamb-
Modify your design so that a trailing white space is not produced.
Exercise 39
A binary string is valid if all blocks of 0s are of even length and all blocks of
1s are of odd length. Design a machine that reads a string and outputs a Y or
N for each bit. It outputs N if the current bit ends a block (a block is ended
by a bit that differs from the bits in that block) and that block is not valid;
otherwise the output is Y . See Table 4.9 for an example. 2
Input: 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1
Output: Y Y Y Y Y Y N Y N Y Y Y N Y Y
Exercise 40
The device expects the player to press the keys within 30 seconds. If no key
is pressed in this time interval, the machine transits to the initial state (and
rejects the response). Assume that 30 seconds after the last key press the device
receives the symbol p (for pulse) from an internal clock. Modify the machine in
Figure 4.19 to take care of this additional symbol. You may assume that p is
never received during the input of the first 2 bits. 2
Remark Finite state machines are used in many applications where the pas-
sage of time or exceeding a threshold level for temperature, pressure, humidity,
carbon-monoxide, or similar analog measures, causes a state change. A sensor
converts the analog signals to digital signals which are then processed by a finite
state machine. A certain luxury car has rain sensors mounted in its windshield
that detect rain and turn on the wipers. (Be careful when you go to a car wash
with this car.) 2
4.3. SPECIFYING CONTROL LOGIC USING FINITE STATE MACHINES
1. If the user presses the appropriate button —a for A and b for B— after
depositing at least the correct amount —15¢ for A and 20¢ for B— the
machine dispenses the item and returns change, if any, in nickels.
2. If the user inserts additional coins after depositing 20¢ or more, the last
coin is returned.
3. If the user asks for an item before depositing the appropriate amount, a
warning light flashes for 2 seconds.
4. The user may cancel the transaction at any time. The deposit, if any, is
returned in nickels.
The first step in solving the problem is to decide on the input and output
alphabets. I propose the following input alphabet:
{n, d, a, b, c}.
Insertion of a nickel (resp., dime) is represented by n (resp., d), pressing the but-
tons for A (resp., B) is represented by a (resp., b), and pressing the cancellation
button is represented by c.
The output alphabet of the machine is
{n, d, A, B, w}.
Returning a nickel (resp., dime) is represented by n (resp., d). A string like nnn
represents the return of 3 nickels. Dispensing A (resp., B) is represented by A
(resp., B). Flashing the warning light is represented by w.
The machine shown in Figure 4.20 has its states named after the multiples of
5, denoting the total deposit at any point. No other deposit amount is possible:
every deposit is a multiple of 5 (a nickel’s value), and no deposit higher than
25 is accepted by the machine. (Why do we have a state 25
when the product prices do not exceed 20?) The initial state is 0. In Figure 4.20,
all transitions of the form c/nnn . . . are directed to state 0.
[Figure 4.20: the soda machine. States 0, 5, 10, 15, 20, 25 record the deposit; n adds 5 and d adds 10 to the state, while coins offered at 20 or 25 are returned (n/n, d/d); a and b pressed too early flash the warning (a/w, b/w); c from state 5k refunds k nickels (c/, c/n, …, c/nnnnn). All edges labeled c/… are directed to state 0.]
Exercise 41
Design a soda machine that dispenses three products costing 35¢, 55¢ and 75¢.
It operates in the same way as the machine described here. 2
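A sketch of the machine of Figure 4.20 as a Python transducer. The handling of coins offered at a deposit of 20¢ or more follows rule 2 above; the exact details are my reading of the figure, so treat them as an assumption:

```python
def soda_machine(inputs):
    # s: total deposit in cents; out: concatenated output symbols
    s, out = 0, []
    for x in inputs:
        if x == 'n':                               # nickel
            if s < 20: s += 5
            else: out.append('n')                  # rule 2: coin returned
        elif x == 'd':                             # dime
            if s <= 15: s += 10
            else: out.append('d')
        elif x == 'a':                             # request item A (15 cents)
            if s >= 15: out.append('A' + 'n' * ((s - 15) // 5)); s = 0
            else: out.append('w')                  # rule 3: warning light
        elif x == 'b':                             # request item B (20 cents)
            if s >= 20: out.append('B' + 'n' * ((s - 20) // 5)); s = 0
            else: out.append('w')
        elif x == 'c':                             # rule 4: cancel, refund
            out.append('n' * (s // 5)); s = 0
    return ''.join(out)
```

For example, depositing two dimes and pressing a dispenses A with one nickel in change.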
We can define the pattern by a finite state machine. We can also write a definition
using a regular expression:
0 | (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)(0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)∗
ε, φ, α, β, γ
α, αβ, αφ, ((φ))φ
(αβ | α((αβ))) | (α | ε)
((αβ)∗ (α((αβ)))∗ )((α | ε) | αβ)∗
((α(αβ)∗ | (βα)∗ β)∗ αβ) | ((αγ)∗ (γα)∗ ) 2
Exercise 42
1. With the given alphabet what are the strings in α, α, φα, φ, φ?
2. What is the set of strings (αβ | ααβ)(βα | αβ)?
3. What is the set of strings (α | β)∗ ? 2
Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
pDigit = 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
simple integer = 0 | (pDigit Digit ∗ )
Letter = A | B | . . . | Z | a | b | . . . | z
Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
identifier = Letter (Letter | Digit)∗
IncInt = int 0
int 9 = 9
int 8 = 8 | 8 int 9 | int 9 = 8 | (ε | 8)int 9
int 7 = 7 | 7 int 8 | int 8 = 7 | (ε | 7)int 8
int i = i | i int i+1 | int i+1 = i | (ε | i)int i+1 , for 0 ≤ i < 9
Exercise 43
The following solution has been proposed for the increasing integer problem:
0∗ 1∗ 2∗ 3∗ 4∗ 5∗ 6∗ 7∗ 8∗ 9∗ . What is wrong with it?
Solution The given expression generates non-decreasing strings, not just in-
creasing strings. So, 11 is generated.
The following solution almost corrects the problem; each integer is generated
at most once, but it generates the empty string. Write [i] as a shorthand for ε | i.
[0][1][2][3][4][5][6][7][8][9]
3. (φR) = φ, (Rφ) = φ
4. Commutativity of Union (R | S) = (S | R)
94 CHAPTER 4. FINITE STATE MACHINES
8. Idempotence of Union (R | R) = R
9. Closure
φ∗ = ε
RR∗ = R∗ R
R∗ = (ε | RR∗ )
Exercise 44
Write regular expressions for the following sets of binary strings.
Exercise 45
Define the language over the alphabet {0, 1, 2} in which consecutive symbols
are different. 2
Exercise 46
What are the languages defined by
1. (0∗ 1∗ )∗
2. (0∗ | 1∗ )∗
3. ε∗
4. (0∗ )∗
5. (ε | 0∗ )∗ 2
4.4. REGULAR EXPRESSIONS 95
p = 0∗ | 0∗ 1q
q = 0∗ 1p
p = 0∗ | 0∗ 10∗ 1p
b0 = ε | b0 0 | b1 1 (1)
b1 = b0 1 | b2 0 (2)
b2 = b1 0 | b2 1 (3)
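These equations can be read as classifying binary strings by their value modulo 3, with bi the strings whose value leaves remainder i; this interpretation is not stated here, but it is easy to verify against the equations. A three-state machine makes it concrete:

```python
def multiple_of_3(x):
    # state r is the value (mod 3) of the prefix read so far;
    # r = 0, 1, 2 play the roles of b0, b1, b2 in the equations
    r = 0
    for bit in x:
        r = (2 * r + int(bit)) % 3   # appending a bit maps value v to 2v + bit
    return r == 0
```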
a | pα
= {replace p by aα∗ }
a | aα∗ α
= {apply (7) in Section 4.4.3}
a(ε | α∗ α)
= {α∗ = ε | α∗ α, see (9) in Section 4.4.3}
aα∗
= {replace aα∗ by p}
p 2
b2 = b1 01∗
b1 = b0 1 | b1 01∗ 0
b1 = b0 1(01∗ 0)∗
b0 = ε | b0 0 | b0 1(01∗ 0)∗ 1, or
= ε | b0 (0 | 1(01∗ 0)∗ 1)
Exercise 47
The definition of b0 allows the empty string to be regarded as a number. Fix
the definitions so that a number is a non-empty string. Make sure that your fix
does not result in every number starting with 0.
b0 = 0 | b0 0 | b1 1 (1’)
But this has the effect of every number starting with 0. To avoid this problem,
modify equation (2) to include 1 as a possibility for b1 . The equations now
become
b0 = 0 | b0 0 | b1 1 (1’)
b1 = 1 | b0 1 | b2 0 (2’)
b2 = b1 0 | b2 1 (3)
[Figure 4.21: a machine with start state S; on b it moves to accepting state F, and on any symbol in Alph − {b} to rejecting state G; both F and G have self-loops on the whole alphabet Alph.]
Using our convention about permanently rejecting states, see page 77, we
will simplify Figure 4.21 to Figure 4.22.
[Figure 4.22: the simplified machine, retaining only state S and its transition on b. A subsequent figure shows a machine accepting bc∗ .]
You can see that it is a concatenation of a machine that accepts b and one
that accepts c∗ . Next, let us construct a machine that accepts bc∗ | cb∗ . Clearly,
we can build machines for both bc∗ and cb∗ separately. Building their union is
easy, because bc∗ and cb∗ start out with different symbols, so we can decide
which machine should scan the string, as shown in Figure 4.25.
[Figure 4.25: a machine accepting bc∗ | cb∗ ; the first input symbol, b or c, selects the branch that scans the rest of the string.]
Exercise 48
‘[ ... ]’ is a "character set", which begins with ‘[’ and is terminated by a ‘]’.
In the simplest case, the characters between the two brackets are what this set
can match.
Thus, ‘[ad]’ matches either one ‘a’ or one ‘d’, and ‘[ad]*’ matches any string
composed of just ‘a’s and ‘d’s (including the empty string), from which it follows
that ‘c[ad]*r’ matches ‘cr’, ‘car’, ‘cdr’, ‘caddaar’, etc.
You can also include character ranges in a character set, by writing two char-
acters with a ‘-’ between them. Thus, ‘[a-z]’ matches any lower-case letter.
Ranges may be intermixed freely with individual characters, as in ‘[a-z$%.]’,
which matches any lower case letter or ‘$’, ‘%’ or period.
Note that the usual special characters are not special any more inside a char-
acter set. A completely different set of special characters exists inside character
sets: ‘]’, ‘-’ and ‘^’.
To include a ‘]’ in a character set, you must make it the first character. For
example, ‘[]a]’ matches ‘]’ or ‘a’. To include a ‘-’, write ‘-’ at the beginning or
end of a range. To include ‘^’, make it other than the first character in the set.
‘[^ ... ]’ ‘[^’ begins a "complemented character set", which matches any
character except the ones specified. Thus, ‘[^a-z0-9A-Z]’ matches all characters
*except* letters and digits.
‘^’ is not special in a character set unless it is the first character. The
character following the ‘^’ is treated as if it were first (‘-’ and ‘]’ are not special
there).
A complemented character set can match a newline, unless newline is men-
tioned as one of the characters not to match. This is in contrast to the handling
of regexps in programs such as ‘grep’.
‘^’ is a special character that matches the empty string, but only at the
beginning of a line in the text being matched. Otherwise it fails to match
anything. Thus, ‘^foo’ matches a ‘foo’ which occurs at the beginning of a line.
‘$’ is similar to ‘^’ but matches only at the end of a line. Thus, ‘xx*$’
matches a string of one ‘x’ or more at the end of a line.
‘\’ has two functions: it quotes the special characters (including ‘\’), and it
introduces additional special constructs.
Because ‘\’ quotes special characters, ‘\$’ is a regular expression which
matches only ‘$’, and ‘\[’ is a regular expression which matches only ‘[’, etc.
For the most part, ‘\’ followed by any character matches only that character.
However, there are several exceptions: two-character sequences starting with ‘\’
which have special meanings. The second character in the sequence is always
an ordinary character on its own. Here is a table of ‘\’ constructs.
‘\|’ specifies an alternative. Two regular expressions A and B with ‘\|’ in
between form an expression that matches anything that either A or B matches.
Thus, ‘foo\|bar’ matches either ‘foo’ or ‘bar’ but no other string.
‘\|’ applies to the largest possible surrounding expressions. Only a sur-
rounding ‘\( ... \)’ grouping can limit the scope of ‘\|’.
Full backtracking capability exists to handle multiple uses of ‘\|’.
‘\( ... \)’ is a grouping construct that serves three purposes:
string constant begins and ends with a double-quote. ‘\"’ stands for a double-
quote as part of the regexp, ‘\\’ for a backslash as part of the regexp, ‘\t’ for
a tab and ‘\n’ for a newline.
This contains four parts in succession: a character set matching period, ‘?’,
or ‘!’; a character set matching close-brackets, quotes, or parentheses, repeated
any number of times; an alternative in backslash-parentheses that matches end-
of-line, a tab, or two spaces; and a character set matching whitespace characters,
repeated any number of times.
To enter the same regexp interactively, you would type TAB to enter a
tab, and ‘C-q C-j’ to enter a newline. You would also type single slashes as
themselves, instead of doubling them for Lisp syntax.
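Most of this Emacs syntax carries over to Python's re module, with the difference that alternation is written | rather than ‘\|’ and grouping ( ... ) rather than ‘\( ... \)’. A few of the examples above, checked in Python:

```python
import re

# character set with repetition: c[ad]*r
assert re.fullmatch(r'c[ad]*r', 'caddaar')
assert re.fullmatch(r'c[ad]*r', 'cr')
assert not re.fullmatch(r'c[ad]*r', 'cabr')

# ranges mixed freely with individual characters
assert re.fullmatch(r'[a-z$%.]', '%')
assert re.fullmatch(r'[a-z$%.]', 'q')

# complemented set: any character except letters and digits
assert re.fullmatch(r'[^a-z0-9A-Z]', '?')
assert not re.fullmatch(r'[^a-z0-9A-Z]', 'Q')

# anchors: ^ matches at the beginning, $ at the end
assert re.search(r'^foo', 'foo fighters')
assert re.search(r'xx*$', 'relax')
assert not re.search(r'^foo', 'a foo')
```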
Figure 4.26: Accepts strings with even number of 0s and odd number of 1s. [The machine of Figure 4.11: states A (start), B, C, D; 0 toggles between A and B and between C and D, 1 toggles between A and D and between B and C.]
zero := true
one := true
the variables. There is no explicit accepting state; we have to specify that the
string is accepted if zero ∧ ¬one holds. As this example shows, we carry part
of the state information in the variables.
Finite state machines embody both control and data. Using variables to en-
code data leaves us with the problem of encoding control alone, often a simpler
task. In this section, we develop the notations and conventions for manipulat-
ing variable values. Additionally, we describe how a state may have internal
structure; a state can itself be a finite state machine.
Note: We had earlier used the notation s/t in finite state transducers (see
Section 4.2.3) to denote that on reading symbol s, the machine makes a state
transition and outputs string t. We now employ a slightly different notation,
replacing “/” by “ → ”, which is a more common notation in software design.
[Figure: a classical machine for identifiers, with one state per length and transitions L, LD, LD, LD, … ; its size grows with the length of the identifiers.]
In Figure 4.29, we use variable n for the length of the identifier seen so far.
The size of the machine is 3, independent of the length of identifiers.
iff n = 0 and the state is A. A transition with “else” guard is taken if no other
guard is satisfied.
[Figure 4.29: a two-state machine with states A and R; the start transition sets n := 0, and a transition guarded “else” leads to R.]
A Variation Same as above but the string contains parentheses and brackets,
“[” and “]”, and they have to be balanced in the customary manner (as in an
arithmetic expression). String ()[(())] is balanced whereas ([)] is not. Merely
counting for each type of bracket is insufficient, because ([)] will then be ac-
cepted.
[Figure (fragment): transitions ( → push(st, “(”) and [ → push(st, “[”); the matching close symbols pop the stack and are accepted only if the popped symbol matches.]
with the constants, and there are no parentheses. Consider only non-empty
expressions ended by the special symbol #. In Figure 4.32 we show a classical
finite state machine that accepts such strings; here a denotes the next input
constant. Observe that if the symbols + or × are seen in the initial state, the
string is rejected.
[Figure 4.32: the acceptor for sum-of-product expressions ended by #; a denotes the next input constant.]
Next, we enhance the machine of Figure 4.32 in Figure 4.33 to compute the
value of the arithmetic expression. For example, the machine will accept the
string 3×2+4×1×2# and output 14. The machine that outputs the expression
value is shown in Figure 4.33. There are two integer variables, s and p. Variable
p holds the value of the current term and s the value of the expression excluding
the current term. Thus, for 3 × 2 + 4 × 1 × 2#, after we have scanned 3 × 2 + 4×,
values of p and s are 4 and 3×2 = 6, respectively. The transition marked with ×
denotes that the guard is × and the command part is empty (sometimes called
skip).
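The machine of Figure 4.33 is short as a program. A Python sketch, with the syntax checking of Figure 4.32 omitted (tokens are single characters, and * stands for ×):

```python
def eval_terms(tokens):
    # p accumulates the current term, s the value of the terms before it
    s, p = 0, 1
    for t in tokens:
        if t == '+':
            s, p = s + p, 1     # close the current term, start a new one
        elif t == '*':          # guard only; the command part is empty (skip)
            pass
        elif t == '#':
            return s + p        # value of the whole expression
        else:
            p = p * int(t)      # a constant multiplies into the term
```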
handle parentheses, and we handle parentheses using the machine of Figure 4.30,
though we have to store partially computed values in a stack, as in Figure 4.31.
The outline of the scheme is as follows. We start evaluating an expression
like 3+(4×(2×6+8)×3)+7 as before, setting s, p = 3, 1 after scanning 3+. Then
on scanning “(”, we save the pair hs, pi on stack st, and start evaluating the
expression within parentheses as a fresh expression, i.e., by setting s, p := 0, 1.
Again, on encountering “(”, we save s, p = 0, 4 on the stack, and start evaluating
the inner parenthesized expression starting with s, p := 0, 1. On scanning “)”,
we know that evaluation of some parenthesized expression is complete, the value
of the expression is a = s + p and we should resume computation of its outer
expression as if we have seen the constant a. We retrieve the top pair of values
from the stack, assign them to s and p, and simulate the transition that handles
a constant, i.e., we set p := p × a. We store the length of the stack in n,
incrementing it on encountering a “(” and decrementing it for a “)”. A “)” is
accepted only when n > 0 and “#” when n = 0.
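The stack scheme just described can be sketched as follows; this is my rendering of the machine of Figure 4.34, not the book's own code (again * stands for ×, and constants are single digits):

```python
def eval_expr(tokens):
    # s, p as before; '(' suspends the outer expression on the stack st,
    # ')' closes an inner expression and feeds its value back as a constant
    s, p, st = 0, 1, []
    for t in tokens:
        if t == '+':
            s, p = s + p, 1
        elif t == '*':
            pass
        elif t == '(':
            st.append((s, p))      # save the outer s, p; start afresh
            s, p = 0, 1
        elif t == ')':
            a = s + p              # value of the parenthesized expression
            s, p = st.pop()
            p = p * a              # simulate the transition for constant a
        elif t == '#':
            assert not st          # '#' is accepted only at nesting level 0
            return s + p
        else:
            p = p * int(t)
```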
Exercise 49
Run the machine of Figure 4.34 on input string 3 + (4 × (2 × 6 + 8) × 3) + 7# and
show the values of the variables (including contents of the stack) at each step.
Choose several strings that are syntactically incorrect, and run the machine on
each of them.
Exercise 50
(Research) Would it be simpler to specify the machine in Figure 4.34 if we could
call a machine recursively? Explore the possibility of adding recursion to finite
state machines.
[Figure: the dome light as a two-state machine, off and on, with guard door ∨ sw for turning on and door′ ∨ sw ∨ tmout for turning off.]
[Figure 4.36: A finite state machine with tree structure over states. The root is labeled (a ∨ b) ∧ (c ∨ d); its children are a ∨ b and c ∨ d, with leaves a, b and c, d.]
Exercise 51
Consider a machine whose root is labeled (a ∨ b) ∧ (c ∧ d) ∨ h ∨ (g ∨ (e ∧ f )).
Draw the tree corresponding to this machine and enumerate all possible state
combinations in which the machine could be at any time.
[Figure 4.38: an and-state with two components, one over symbols a, b and the other over c, d (states 0 and 1).]
taken together. In Figure 4.38, each symbol occurs in exactly one component;
so, there is never a simultaneous transition. We describe an example next, where
the general transition rule is applied.
Consider opening a car door. This action has effects on several parts of the
system. The dome light comes on, if the car is moving then a “door open” light
comes on, if the headlights are on then warning chime sounds and if the key is
in the ignition, a message is displayed in the dashboard. We can describe all
these effects by having an and-state that includes each part —dome light, “door
open” light, chime and dashboard— as a component. The event of opening a
door causes simultaneous transitions in each component.
Semantics We have been very lax about specifying the behaviors of enhanced
finite state machines. Unlike a classical machine where the moment of a transi-
tion (“when” the transition happens after a symbol is received) and its duration
(“how long” it takes for the transition to complete) are irrelevant, enhanced ma-
chines have to take such factors into account. It will not do to say that the light
comes on in response to a switch press after an arbitrary delay, if we have to
consider time-out in the design.
A transition takes place instantaneously as soon its guard holds, and if sev-
eral guards hold simultaneously for transitions from a single state, any one of
these transitions may fire. Observe that transitions from different states may
fire simultaneously if their guards hold. We assume that it takes no time at all
to evaluate the guard or execute the command part.
There is no guarantee that a transition t will be executed at all if its guard
holds, because another transition may fire and falsify the predicate in the guard
of t. As an example, consider two machines that share a printer. Each machine
may print if it is ready to print and the printer is free. Therefore, potentially
two transitions are ready to fire at some moment. But as soon as one transi-
tion succeeds, i.e., starts printing, the printer is no longer free and the other
machine’s transition may not fire immediately, or ever. Semantics of concurrent
behavior go beyond the scope of this course.
Figure 4.40(c), a different design for “on” state is shown; it is an and-state. The
design in Figure 4.40(c) is more modular; it clearly shows that speed and rotate
switches control different aspects of the fan.
[Figure 4.40: designs for the fan. In (a) and (b) “on” is a superstate with speed states 1, 2 and rotate setting r; in (c) “on” is an and-state whose components control speed and rotation independently. A further design adds power and heat components, with heat states h−on and h−off.]
Exercise 52
Reconsider the problem of the dome light from Section 4.5.2.3 in page 108. The
car state is given by three components: the switch ( the state is off or on), the
door (shut or ajar) and the dome light’s state (off or on). There are two possible
events that can affect the state: (1) opening or closing the door (use event door
[Figure: the fan machine with cross-component guards, e.g., transitions guarded by ¬in(3) and ¬in(h−on).]
to toggle between shut and ajar; that is, door′ is the same as door), and (2) flipping
the switch (event sw). Flipping the switch has a simple effect on the switch
state, toggling it between off and on. However, the effect on the light state is
more elaborate. If the door is shut, flipping the switch toggles the light state
between off and on. If the door is ajar, the switch event has no effect. Similarly,
if the switch is on then the light is on irrespective of the state of the door. First,
describe the system using a simple finite state machine. Next, describe it using
three components, for the switch, door and light.
point won by the receiver causes a transition to the right neighbor along a row
(provided there is a right neighbor); similarly a point won by the server causes
transition downward in a column (provided there is a neighbor below). For the
rightmost column, a receiver’s point causes him/her to win the game provided
the server’s score is below 40; similarly, for the server.
The remaining question is to model the behavior at (40, 40). Here, we
do a simple analysis to conclude that a winning point by the receiver merely
decreases the score of the server, so that the score becomes (30, 40); dually, if
the server wins the point the score becomes (40, 30).
Figure 4.44: Scoring in a tennis game. s/r are points won by server/receiver. [A grid of states: columns are receiver scores 0, 15, 30, 40 and rows are server scores 0, 15, 30, 40; r moves one column right and s one row down; r from the last column leads to gr, and s from the last row leads to gs.]
0: sp, rp := 0, 0
1: s, sp < 40 → sp := sp′
2: r, rp < 40 → rp := rp′
3: s, sp = 40 ∧ rp = 40 → rp := 30
4: r, sp = 40 ∧ rp = 40 → sp := 30
5: s, sp = 40 ∧ rp < 40 → gs := true
6: r, sp < 40 ∧ rp = 40 → gr := true
(Here sp′ denotes the score following sp in the order 0, 15, 30, 40; similarly rp′.)
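Transitions 0 through 6 translate directly into a program; a Python sketch (returning the current score when the game is unfinished is my own convention):

```python
def tennis(points):
    # points: a string of 's'/'r', the points won by server/receiver
    nxt = {0: 15, 15: 30, 30: 40}       # the successor score sp', rp'
    sp, rp = 0, 0                       # rule 0: initial score
    for w in points:
        if sp == 40 and rp == 40:       # deuce
            if w == 's': rp = 30        # rule 3
            else:        sp = 30        # rule 4
        elif w == 's':
            if sp < 40: sp = nxt[sp]    # rule 1
            else:       return 'gs'     # rule 5: game to server
        else:
            if rp < 40: rp = nxt[rp]    # rule 2
            else:       return 'gr'     # rule 6: game to receiver
    return (sp, rp)
```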
Figure 4.47: Scoring in a tennis game, using structured transitions and states. [Two rows of states 0, 15, 30, 40, one for the receiver (transitions r) and one for the server (transitions s); r at rp = 40 with sp < 40 leads to gr, and s at sp = 40 with rp < 40 leads to gs.]
Chapter 5
Recursion and Induction
5.1 Introduction
In this set of lectures, I will talk about recursive programming, a program-
ming technique you have seen before. I will introduce a style of programming,
called Functional Programming, that is especially suited for describing recursion
in computation and data structure. Functional programs are often significantly
more compact, and easier to design and understand, than their imperative coun-
terparts. I will show why induction is an essential tool in designing functional
programs.
5.2 Preliminaries
5.2.1 Running Haskell programs from command line
The Haskell compiler is installed on all Sun and Linux machines in this depart-
ment. To enter an interactive session for Haskell, type
ghci
The machine responds with something like
Prelude>
At this point, whenever it displays zzz> for some zzz, the machine is waiting
for some response from you. You may type an expression and have its value
displayed on your terminal, as follows.
Prelude> 3+4
7
Prelude> 2^15
32768
5.2.3 Comments
Any string following -- in a line is considered a comment. So, you may write in
a command line:
Prelude> 2^15 -- This is 2 raised to 15
or, in the text of a program
-- I am now going to write a function called "power."
-- The function is defined as follows:
-- It has 2 arguments and it returns
-- the first argument raised to the second argument.
I prefer to put the end of the comment symbol, -}, in a line by itself.
and
is not. The line lowup c is taken to be the start of another definition. In the
last case, you will get an error message like
The semicolon (;) plays an important role; it closes off a definition (implic-
itly, even if you have not used it). That is why you see unexpected ‘;’ in the
error message.
5.3.1 Integer
You can use the traditional integer constants and the usual operators for addi-
tion (+), subtraction (-) and multiplication (*). Unary minus sign is the usual
-, but enclose a negative number within parentheses, as in (-2); I will tell you
why later. Division over integers is written in infix as ‘div‘ and it returns only
the integer part of the quotient. (‘ is the backquote symbol; it is usually the
leftmost key in the top row of your keyboard.) For division over negative in-
tegers, the following rules are used: for positive x and y, (-x) ‘div‘ (-y) = x
‘div‘ y, and (-x) ‘div‘ y = x ‘div‘ (-y) = −⌈x/y⌉. Thus, 5 ‘div‘ 3 is 1 and
(-5) ‘div‘ 3 is -2. Exponentiation is the infix operator ^ so that 2^15 is 32768.
The remainder after division is given by the infix operator ‘rem‘ and the infix
operator ‘mod‘ is for modulo; x ‘mod‘ y returns a value between 0 and y − 1,
for positive y. Thus, (-2) ‘rem‘ 3 is −2 and (-2) ‘mod‘ 3 is 1.
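As an aside for readers coming from Python: Python's // and % behave like Haskell's ‘div‘ and ‘mod‘ (both round toward negative infinity), while Haskell's ‘quot‘ and ‘rem‘ truncate toward zero and must be written by hand in Python:

```python
import math

def quot(x, y):
    return math.trunc(x / y)      # truncating quotient; fine for small ints

def rem(x, y):
    return x - y * quot(x, y)     # rem has the sign of x

assert 5 // 3 == 1 and (-5) // 3 == -2     # div: (-5) `div` 3 = -ceil(5/3)
assert (-2) % 3 == 1                       # mod: in 0..y-1 for positive y
assert quot(-5, 3) == -1 and rem(-2, 3) == -2
```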
Two other useful functions are even and odd, which return the appropriate
boolean results about their integer arguments. Functions max and min take two
arguments each and return the maximum and the minimum values, respectively.
It is possible to write a function name followed by its arguments without any
parentheses, as shown below; parentheses are needed only to enforce an evalu-
ation order.
Prelude> max 2 5
5
The arithmetic relations are < <= == /= > >=. Each of these is a binary op-
erator that returns a boolean result, True or False. Note that equality operator
is written as == and inequality as /=. Unlike in C++, 3 + (5 >= 2) is not a valid
expression; Haskell does not specify how to add an integer to a boolean.
5.3.2 Boolean
There are the two boolean constants, written as True and False. The boolean
operators are:
Prelude> 'a'
'a'
Prelude> "a b c"
"a b c"
Prelude> "a, b, c"
"a, b, c"
You can compare characters using arithmetic relations; the letters (charac-
ters in the Roman alphabet) are ordered in the usual fashion with the uppercase
letters smaller than the corresponding lowercase letters. The expected ordering
applies to the digits as well.
There are two functions defined on characters, ord and chr. Function ord(c)
returns the value of the character c in the internal coding table; it is a number
between 0 and 255. Function chr converts a number between 0 and 255 to
the corresponding character. Therefore, chr(ord(c))=c, for all characters c,
and ord(chr(i))=i, for all i, 0 ≤ i < 256. Note that all digits in the order
'0' through '9', all lowercase letters ’a’ through ’z’ and uppercase letters ’A’
through ’Z’ are contiguous in the table. The uppercase letters have smaller ord
values than the lowercase ones.
To use ord and chr you first have to write the following command in the
command line.
import Data.Char
Prelude Data.Char>
A string is a list of characters; all the rules about lists, described later, apply
to strings.
Infix and Prefix Operators I use the term operator to mean a function
of two arguments. An infix operator, such as +, is written between its two
arguments, whereas a prefix operator, such as max, precedes its arguments. You
can convert an infix operator to a prefix operator by putting parentheses around
the function name (or symbol). Thus, (+) x y is the same as x + y. You can
convert from prefix to infix by putting backquotes around an operator, so div
5 3 is the same as 5 ‘div‘ 3.
Most built-in binary operators in Haskell that do not begin with a letter,
such as +, *, &&, and ||, are infix; max, min, rem, div, and mod are prefix. 2
An expression consists of functions, binary infix operators and operands as
in
-2 + sqr 9 + min 2 7 - 3
Here the first minus (called unary minus) is a prefix operator, sqr is a function
of one argument, + is a binary operator, min is a function of two arguments, and
the last minus is a binary infix operator.
Functions bind more tightly than infix operators. Function arguments are
written immediately following the function name, and the right number of ar-
guments are used up for each function, e.g., one for sqr and two for min. No
parentheses are needed unless your arguments are themselves expressions. So,
for a function g of two arguments, g x y z stands for (g x y) z. If you write g
f x y, it will be interpreted as (g f x) y; so, if you have in mind the expression
g(f(x),y), write it as g (f x) y. Now, sqr 9 + min 2 7 - 3 is (sqr
9) + (min 2 7) - 3. As a good programming practice, do not ever write f 1+1;
make your intentions clear by using parentheses, writing (f 1)+1 or f(1+1).
How do we read sqr 9 + min 2 7 * max 2 7? After functions are bound to
their arguments, we get (sqr 9) + (min 2 7) * (max 2 7). That is, we are left
with operators only, and the operators bind according to their binding powers.
Since * has higher binding power than +, the expression is read as (sqr 9) +
((min 2 7) * (max 2 7)).
Operators of equal binding power usually associate to the left; so, 5 - 3 -
2 is (5 - 3) - 2, but this is not always true. Operators in Haskell are either
(1) associative, so that the order does not matter, (2) left associative, as in
binary minus shown above, or (3) right associative, as in 2 ^ 3 ^ 5, which is
2 ^ (3 ^ 5). When in doubt, parenthesize.
In this connection, unary minus, as in -2, is particularly problematic. If you
would like to apply inc to -2, don’t write
inc -2
This will be interpreted as (inc) - (2); you will get an error message. Write
inc (-2). And -5 `div` 3 is -(5 `div` 3), which is -1, but (-5) `div` 3 is -2.
Exercise 53
What is max 2 3 + min 2 3?
Here are some simple function definitions. Note that I do not put any
parentheses around the arguments; they are simply written in order, and
parentheses are used only to avoid ambiguity. We will discuss this matter in
some detail later, in Section 5.4.1 (page 124).
Note: Parameters and arguments. I will use these two terms synonymously.
*Main> :l Teaching.dir/337.dir/HaskellFiles.dir/337.hs
*Main> inc 5
6
*Main> imply True False
False
*Main> digit '6'
True
*Main> digit 'a'
False
*Main> digit(chr(inc (ord '8')))
True
*Main> digit(chr(inc (ord '9')))
False
Unlike a variable in C++, this variable’s value does not change during the
program execution; we are really giving a name to a constant expression so that
we can use this name for easy reference later.
Exercise 54
5. Define a function max3 whose arguments are three integers and whose value
is their maximum.
5.4.3 Conditionals
In traditional imperative programming, we use if-then-else to test some con-
dition (i.e., a predicate) and perform calculations based on the test. Haskell
also provides an if-then-else construct, but it is often more convenient to use
a conditional equation, as shown below. The following function computes the
absolute value of its integer argument.
absolute x
| x >= 0 = x
| x < 0 = -x
Exercise 55
The where construct permits local definitions, i.e., defining variables (and
functions) within a function definition. The variables sqx, sqy and sqz are
undefined outside this definition.
We can do this example by using a local function to define squaring.
pythagoras x y z = sq x + sq y == sq z
where
sq p = p*p
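Repeating the definition, here are a couple of sample evaluations: 3 4 5 is a Pythagorean triple, and 1 2 3 is not.

```haskell
-- Is x^2 + y^2 = z^2? The local function sq is visible only here.
pythagoras :: Int -> Int -> Int -> Bool
pythagoras x y z = sq x + sq y == sq z
  where sq p = p * p

-- pythagoras 3 4 5 is True; pythagoras 1 2 3 is False.
```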
Observe that the equations use constants in the left side; these constants are
called literal parameters. During function evaluation with a specific argument
—say, False True— each of the equations is checked from top to bottom to find
the first one where the given arguments match the pattern of the equation. For
imply False True, the pattern given in the first equation matches, with False
matching False and q matching True.
We can write an even more elaborate definition of imply:
imply False False = True
imply False True = True
imply True False = False
imply True True = True
The function evaluation is simply a table lookup in this case, proceeding se-
quentially from top to bottom.
Pattern matching has two important effects: (1) it is a convenient way of
doing case discrimination without writing a spaghetti of if-then-else statements,
and (2) it binds names to formal parameter values, i.e., assigns names to com-
ponents of the data structure —q in the first example— which may be used in
the function definition in the right side.
Pattern matching on integers can use simple arithmetic expressions, as shown
below in the definition of the successor function.
suc 0 = 1
suc (n+1) = (suc n)+1
Asked to evaluate suc 3, the pattern in the second equation is found to match
—with n = 2— and therefore, evaluation of (suc 3) is reduced to the evaluation
of (suc 2) + 1.
Pattern matching can be applied in elaborate fashions, as we shall see later.
power2 0 = 1
power2 (n+1) = 2 * (power2 n)
How does the computer evaluate a call like power2 3? Here is a very rough
sketch. The interpreter has an expression to evaluate at any time. It picks
an operand (a subexpression) to reduce. If that operand is a constant, there
is nothing to reduce. Otherwise, it has to compute a value by applying the
definitions of the functions (operators) used in that expression. This is how the
evaluation of such an operand proceeds. The evaluator matches the pattern in
each equation of the appropriate function until a matching pattern is found.
Then it replaces the matched portion with the right side of that equation. Let
us see how it evaluates power2 3.
power2 3
= 2 * (power2 2) -- apply function definition on 3
= 2 * (2 * (power2 1)) -- apply function definition on 2
= 2 * (2 * (2 * (power2 0))) -- apply function definition on 1
= 2 * (2 * (2 * (1))) -- apply function definition on 0
= 2 * (2 * (2)) -- apply definition of *
= 2 * (4) -- apply definition of *
= 8 -- apply definition of *
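A caution about the (n+1) patterns used in suc and power2: they are a feature of older Haskell, were removed in Haskell 2010, and GHC now rejects them unless a special extension is enabled. The same functions can be written with guards instead; here is power2 in that style (suc can be rewritten the same way).

```haskell
-- power2 without an (n+1) pattern; its behavior on naturals is unchanged.
power2 :: Integer -> Integer
power2 0 = 1
power2 n | n > 0 = 2 * power2 (n - 1)
```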
count 0 = 0
count n
| even n = count (n `div` 2)
| odd n = count (n `div` 2) + 1
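The function count computes the number of 1s in the binary representation of its argument: the last bit of n is examined through even and odd, and the remaining bits are n `div` 2.

```haskell
count :: Int -> Int
count 0 = 0
count n
  | even n = count (n `div` 2)
  | odd n  = count (n `div` 2) + 1

-- count 6 is 2 (6 is 110 in binary); count 7 is 3 (111).
```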
Note on pattern matching It would have been nice if we could have written
the second equation as follows.
mlt x 0 = 0
mlt x y = (mlt x (y-1)) + x
The recursive call is made to a strictly smaller argument in each case. There
are two arguments which are both numbers, and the second number is strictly
decreasing in each call. The smallest value of the arguments is attained when
the second argument is 0.
The multiplication algorithm has a running time roughly proportional to the
magnitude of y, because each call decreases y by 1. We now present a far better
algorithm. You should study it carefully because it introduces an important
concept, generalizing the function. The idea is that we write a function to
calculate something more general, and then we call this function with a restricted
set of arguments to calculate our desired answer. Let us write a function,
quickMlt, that computes x*y + z over its three arguments. We can then define
mlt x y = quickMlt x y 0
132 CHAPTER 5. RECURSION AND INDUCTION
The reason we define quickMlt is that it is more efficient to compute than mlt
defined earlier. We will use the following result from arithmetic.
x × (2 × t) + z = (2 × x) × t + z
x × (2 × t + 1) + z = (2 × x) × t + (x + z)
quickMlt x 0 z = z
quickMlt x y z
| even y = quickMlt (2 * x ) (y `div` 2) z
| odd y = quickMlt (2 * x ) (y `div` 2) (x + z)
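Repeating the definitions, here is an evaluation of mlt 3 5 through quickMlt; note that the number of calls is proportional to the number of bits of y, not to its magnitude.

```haskell
-- Invariant: quickMlt x y z = x*y + z.
quickMlt :: Integer -> Integer -> Integer -> Integer
quickMlt x 0 z = z
quickMlt x y z
  | even y = quickMlt (2 * x) (y `div` 2) z
  | odd y  = quickMlt (2 * x) (y `div` 2) (x + z)

mlt :: Integer -> Integer -> Integer
mlt x y = quickMlt x y 0

-- mlt 3 5
-- = quickMlt 3 5 0     -- 5 is odd
-- = quickMlt 6 2 3     -- 2 is even
-- = quickMlt 12 1 3    -- 1 is odd
-- = quickMlt 24 0 15
-- = 15
```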
Exercise 56
Extend quickMlt to operate over arbitrary y: positive, negative and zero.
Exercise 57
Use the strategy shown for multiplication to compute x^y. I suggest that you
compute the more general function z * x^y.
fib 0 = 0
fib 1 = 1
fib n = fib(n-1) + fib(n-2)
fib n
| n == 0 = 0
| n == 1 = 1
| otherwise = fib(n-1) + fib(n-2)
The first definition has three equations and the second has one conditional
equation with three (guarded) clauses. Either definition works, but these pro-
grams are quite inefficient. Let us see how many times fib is called in computing
(fib 6), see Figure 5.1. Here each node is labeled with a number, the argument
of a call on fib; the root node is labeled 6. In computing (fib 6), (fib 4)
and (fib 5) have to be computed; so, the two children of 6 are 4 and 5. In
general, the children of node labeled i+2 are i and i+1. As you can see, there
is considerable recomputation; in fact, the computation time is proportional to
the value being computed. (Note that (fib 6), (fib 5), (fib 4), (fib 3),
(fib 2), (fib 1) are called 1, 1, 2, 3, 5, 8 times, respectively, which is a
part of the Fibonacci sequence.) We will see a better strategy for computing Fibonacci
numbers in the next section.
[Figure 5.1: the tree of recursive calls in computing (fib 6). The root is
labeled 6, and the children of a node labeled i+2 are labeled i and i+1.]
egcd m n
| m == 0 = n
| otherwise = egcd (n `mod` m) m
gcd m n
| m == n = m
| m > n = gcd (m - n) n
| n > m = gcd m (n - m)
There is a modern version of the gcd algorithm, known as binary gcd. This
algorithm uses multiplication and division by 2, which are implemented by shifts
on binary numbers. The algorithm uses the following facts: (1) if m == n, then
gcd m n = m; (2) if m and n are both even, say 2s and 2t, then gcd m n = 2 * (gcd
s t); (3) if exactly one of m and n, say m, is even and equal to 2s, then gcd m n =
gcd s n; (4) if m and n are both odd and, say, m > n, then gcd m n = gcd (m-n)
n. In this case, m-n is even whereas n is odd; so, gcd m n = gcd (m-n) n = gcd
((m-n) `div` 2) n.
bgcd m n
| m == n = m
| (even m) && (even n) = 2 * (bgcd s t)
| (even m) && (odd n) = bgcd s n
| (odd m) && (even n) = bgcd m t
| m > n = bgcd ((m-n) `div` 2) n
| n > m = bgcd m ((n-m) `div` 2)
where s = m `div` 2
t = n `div` 2
We can estimate the running time (the number of recursive calls) as a function
of m and n, the arguments of bgcd. Note that the value of log2 m + log2 n
decreases in each recursive call. So the execution time is at most logarithmic
in the argument values.
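Repeating the definition, here are a few sample evaluations of bgcd (my own checks).

```haskell
bgcd :: Integer -> Integer -> Integer
bgcd m n
  | m == n           = m
  | even m && even n = 2 * bgcd s t
  | even m && odd n  = bgcd s n
  | odd m && even n  = bgcd m t
  | m > n            = bgcd ((m - n) `div` 2) n
  | n > m            = bgcd m ((n - m) `div` 2)
  where s = m `div` 2
        t = n `div` 2

-- bgcd 12 18 = 2 * bgcd 6 9 = 2 * bgcd 3 9 = 2 * bgcd 3 3 = 6
```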
Exercise 58
For positive integers m and n prove that
1. gcd m n = gcd n m
5.6 Tuple
We have, so far, seen a few elementary data types. There are two important
ways we can build larger structures using data from the elementary types—
tuple and list. We cover tuple in this section.
A tuple is Haskell's version of a record; we may put several kinds of data
together and give the result a name. In the simplest case, we put together two
pieces of data and form a 2-tuple, also called a pair. Here are some examples.
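The following pairs (illustrative values of my own choosing) make the point.

```haskell
p1 = (3, 5)            -- two numbers
p2 = ("cs337", 59285)  -- a string and a number
p3 = (True, 'j')       -- a boolean and a character
```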
As you can see, the components may be of different types. Note that Haskell
treats (3) and 3 alike, both are treated as numbers. Let us add the following
definitions to our program file:
teacher = ("Misra","Jayadev")
uniqno = 59285
course = ("cs337",uniqno)
There are two predefined functions, fst and snd, that return the first and
the second components of a pair, respectively.
*Main> fst(3,5)
3
*Main> snd teacher
"Jayadev"
*Main> snd course
59285
There is no restriction at all in Haskell about what you can have as the first
and the second component of a tuple. In particular, we can create another tuple
hard = (teacher,course)
Haskell allows you to create tuples with any number of components, but fst
and snd are applicable only to a pair.
fibpair 0 = (0,1)
fibpair n = (y, x+y)
where (x,y) = fibpair (n-1)
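Evaluation of fibpair n takes time linear in n, since each call makes exactly one recursive call; the pairs produced are consecutive Fibonacci numbers.

```haskell
fibpair :: Int -> (Integer, Integer)
fibpair 0 = (0, 1)
fibpair n = (y, x + y)
  where (x, y) = fibpair (n - 1)

-- fibpair 5 is (5,8); the successive pairs are
-- (0,1), (1,1), (1,2), (2,3), (3,5), (5,8).
```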
Exercise 59
1. What is the difference between (3,4,5) and (3,(4,5))?
2. A point in two dimensions is a pair of coordinates; assume that we are
dealing with only integer coordinates. Write a function that takes two
points as arguments and returns True iff either the x-coordinates or the
y-coordinates of the points are equal. Here is what I expect (ray is the
name of the function).
3. A line is given by a pair of distinct points (its end points). Define func-
tion parallel that has two lines as arguments and value True iff they are
parallel. Recall from coordinate geometry that two lines are parallel if
their slopes are equal, and the slope of a line is given by the difference
of the y-coordinates of its two points divided by the difference of their
x-coordinates. In order to avoid division by 0, avoid division altogether.
Here is the result of some evaluations.
4. The function fibpair n returns the pair ((fib n), (fib (n+1))). The
computation of (fib (n+1)) is unnecessary, since we are interested only in
fib n. Redefine fib so that this additional computation is avoided.
Solution
fib 0 = 0
fib n = snd(fibpair (n-1))
5.7 Type
Every expression in Haskell has a type. The type may be specified by the
programmer, or deduced by the interpreter. If you write 3+4, the interpreter
can deduce the type of the operands and the computed value to be integer (not
quite, as you will see). When you define
imply p q = not p || q
digit c = ('0' <= c) && (c <= '9')
the interpreter can figure out that p and q in the first line are booleans (because
|| is applied only to booleans) and the result is also a boolean. In the second
line, it deduces that c is a character because of the two comparisons in the right
side, and that the value is boolean, from the types of the operands.
The type of an expression may be a primitive one: Int, Bool, Char or String,
or a structured type, as explained below. You can ask to see the type of an
expression by giving the command :t, as in the following.
The type of a tuple is a tuple of types, one entry for the type of each operand.
In the following, [Char] denotes a string; I will explain why in the next section.
*Main> :t ("Misra","Jayadev")
("Misra","Jayadev") :: ([Char],[Char])
*Main> :t teacher
teacher :: ([Char],[Char])
*Main> :t course
course :: ([Char],Integer)
*Main> :t (teacher,course)
(teacher,course) :: (([Char],[Char]),([Char],Integer))
Each function has a type, namely, the types of its arguments in order followed
by the type of the result, all separated by ->.
*Main> :t imply
imply :: Bool -> Bool -> Bool
*Main> :t digit
digit :: Char -> Bool
Capitalizations for types Type names (e.g., Int, Bool) are always capital-
ized. The name of a function or parameter should never be capitalized.
5.7.1 Polymorphism
Haskell allows us to write functions whose arguments can be any type, or any
type that satisfies some constraint. Consider the identity function:
identity x = x
*Main> :t identity
identity :: t -> t
That is, for any type t, it accepts an argument of type t and returns a value of
type t.
A less trivial example is a function whose argument is a pair and whose value
tells whether the two components of the pair are equal.
eqpair (x,y) = x == y
It is obvious that eqpair (3,5) makes sense, but not eqpair(3,’j’). We would
expect the type of eqpair to be (t,t) -> Bool, but it is more subtle.
*Main> :t eqpair
eqpair :: (Eq t) => (t, t) -> Bool
This says that the type of eqpair is (t, t) -> Bool, for any type t that belongs
to the Eq class, i.e., types over which == is defined. Otherwise, the test == in
eqpair cannot be performed. Equality is not necessarily defined on all types,
particularly on function types.
Finally, consider a function that sorts two numbers which are given as a pair.
sort (x,y)
| x <= y = (x,y)
| x > y = (y,x)
*Main> :t sort
sort :: (Ord t) => (t, t) -> (t, t)
It says that sort accepts any pair of elements of the same type, provided the
type belongs to the Ord type class, i.e., there is an order relation defined over
that type; sort returns a pair of the same type as its arguments. An order
relation is defined over most of the primitive types. So we can do the following
kinds of sorting. Note, particularly, the last example.
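A few evaluations in this spirit (my own choices) follow; the last one sorts a pair of strings, which are compared lexicographically.

```haskell
sort :: Ord t => (t, t) -> (t, t)
sort (x, y)
  | x <= y = (x, y)
  | x > y  = (y, x)

-- sort (5,2)               is (2,5)
-- sort ('b','a')           is ('a','b')
-- sort ("misra","jayadev") is ("jayadev","misra"), by lexicographic order
```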
Polymorphism means that a function can accept and produce data of many
different types. This allows us to define a single sorting function, for example,
which can be applied in a very general fashion.
*Main> :t 3
3 :: (Num t) => t
*Main> :t (3,5)
(3,5) :: (Num t, Num t1) => (t, t1)
Read the second line to mean 3 has the type t, where t is any type in the
type class Num. The last line says that (3,5) has the type (t, t1), where t and
t1 are arbitrary (and possibly equal) types in the type class Num. So, what is
the type of 3+4? It has the type of any member of the Num class.
*Main> :t 3+4
3 + 4 :: (Num t) => t
*Main> digit 9
<interactive>:1:6:
No instance for (Num Char)
arising from the literal `9' at <interactive>:1:6
Possible fix: add an instance declaration for (Num Char)
In the first argument of `digit', namely `9'
In the expression: digit 9
In the definition of `it': it = digit 9
5.8 List
Each tuple has a bounded number of components —two each for course and
teacher and two in (teacher,course). In order to process larger amounts of
data, where the number of data items may not be known a priori, we use the
data structure list. A list consists of a finite sequence of items, all of the same
type. Here are some lists.
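For instance (illustrative lists of my own choosing):

```haskell
l1 = [2,3,5,7]          -- a list of numbers
l2 = ['a','b','c']      -- a list of characters, the same as "abc"
l3 = [(3,5), (3,8)]     -- a list of pairs
l4 = [[2],[3],[5],[7]]  -- a list of lists
```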
The following are not lists because not all their elements are of the same
type.
[[2],3,5,7]
[(3,5), 8]
[(3,5), (3,8,2)]
['J',"misra"]
Lists are ordered, and repetitions matter:

[2,3] ≠ [3,2]
[2] ≠ [2,2]
2 We deal with only finite lists in this note. Haskell permits definitions of infinite lists and
computations on them, though only a finite portion can be computed in any invocation of a
function.
*Main> :t [True]
[True] :: [Bool]
*Main> :t [True, False]
[True,False] :: [Bool]
*Main> :t [(2,'c'), (3,'d')]
[(2,'c'), (3,'d')] :: (Num t) => [(t, Char)]
*Main> :t [[2],[3],[5],[7]]
[[2],[3],[5],[7]] :: (Num t) => [[t]]
*Main> :t [(3,5), (3,8), (3,5), (3,7)]
[(3,5),(3,8),(3,5),(3,7)] :: (Num t, Num t1) => [(t, t1)]
*Main> :t [[(3,5), (3,8)], [(3,5), (3,7), (2,9)]]
[[(3,5),(3,8)],[(3,5),(3,7),(2,9)]] :: (Num t, Num t1) => [[(t, t1)]]
*Main> :t ['a','b','c']
['a','b','c'] :: [Char]
*Main> :t ["misra"]
["misra"] :: [[Char]]
Empty List A very special case is an empty list, one having no items. We
write it as []. It appears a great deal in programming. What is the type of []?
*Main> :t []
[] :: [a]
This says that [] is a list of a, where a is any type. Therefore, [] can be given
as argument wherever a list is expected.
*Main> 3: [2,1]
[3,2,1]
*Main> 3: []
[3]
*Main> 1: (2: (3: [])) --Study this one carefully.
[1,2,3]
*Main> 'j': "misra"
"jmisra"
*Main> "j": "misra"
<interactive>:1:5:
Couldn't match expected type `[Char]' against inferred type `Char'
Expected type: [[Char]]
Inferred type: [Char]
In the second argument of `(:)', namely `"misra"'
In the expression: "j" : "misra"
len [] = ..
len (x:xs) = ..
The definition of this function spans more than one equation. During func-
tion evaluation with a specific argument —say, [1,2,3]— each of the equations
is checked from top to bottom to find the first one where the given list matches
the pattern of the argument. So, with [1,2,3], the first equation does not
match because the argument is not an empty list. The second equation matches
because x matches with 1 and xs matches with [2,3]. Additionally, pattern
matching assigns names to components of the data structure —x and xs in this
example— which may then be used in the RHS of the function definition.
len [] = 0
len (x:xs) = 1 + (len xs)
suml [] = 0
suml (x:xs) = x + (suml xs)
multl [] = 1
multl (x:xs) = x * (multl xs)
maxl [x] = x
maxl (x:xs) = max x (maxl xs)
Exercise 60
Write a function that takes the conjunction (&&) of the elements of a list of
booleans.
andl [] = True
andl (x:xs) = x && (andl xs)
So, we have
*Main> andl [True, True, 2 == 5]
False
Now consider a function whose value is not just one item but a list. The
following function negates every entry of a list of booleans.
notl [] = []
notl (x:xs) = (not x) : (notl xs)
So,
*Main> notl [True, True, 2 == 5]
[False,False,True]
Footnote: But people do, and they define it to be −∞.
The following function removes all negative numbers from the argument list.
negrem [] = []
negrem (x:xs)
| x < 0 = negrem xs
| otherwise = x : (negrem xs)
So,
*Main> negrem []
[]
*Main> negrem [2,-3,1]
[2,1]
*Main> negrem [-2,-3,-1]
[]
Pattern matching over a list may be quite involved. The following function,
divd, partitions the elements of the argument list between two lists, putting the
elements with even index in the first list and with odd index in the second list
(list elements are numbered starting at 0). So, divd [1,2,3] is ([1,3],[2]) and
divd [1,2,3,4] is ([1,3],[2,4]). See Section 5.8.5 for another solution to this
problem.
We conclude this section with a small example that goes beyond “primitive
recursion”, i.e., recursion is applied not just on the tail of the list. The problem is
to define a function uniq that returns the list of unique items from the argument
list. So, uniq[3, 2] = [3, 2], uniq[3, 2, 2] = [3, 2], uniq[3, 2, 3] = [3,
2].
uniq [] = []
uniq (x:xs) = x: (uniq(minus x xs))
where
minus y [] = []
minus y (z: ys)
| y == z = (minus y ys)
| otherwise = z: (minus y ys)
Note This program does not work if you try to evaluate uniq [] on the com-
mand line. This has to do with type classes; the full explanation is beyond the
scope of these notes.
Exercise 61
1. Define a function unq that takes two lists xs and ys as arguments. Assume
that initially ys contains distinct elements. Function unq returns the list
of unique elements from xs and ys. Define uniq using unq.
unq [] ys = ys
unq (x:xs) ys
| inq x ys = unq xs ys -- inq x ys is: x in ys?
| otherwise = unq xs (x:ys)
where
inq y [] = False
inq y (z: zs) = (y == z) || (inq y zs)
uniq xs = unq xs []
3. The prefix sum of a list of numbers is a list of equal length whose ith
element is the sum of the first i items of the original list. So, the prefix
sum of [3,1,7] is [3,4,11]. Write a linear-time algorithm to compute the
prefix sum.
Hint: Use function generalization.
ps xs = pt xs 0
where pt [] c = []
pt (x:xs) c = (c+x) : (pt xs (c+x))
divide0 [] = ([],[])
divide0 (x: xs) = (x:f, s)
where (f,s) = divide1 xs
divide1 [] = ([],[])
divide1 (x: xs) = (f, x:s)
where (f,s) = divide0 xs
We then get divide0 [1,2,3,4] = ([1,3],[2,4]).

[Figure: a two-state machine; the start state is accepting, each 1 in the
input switches between the two states, and each 0 leaves the state unchanged.]
We encode this machine using function zero and one for the initial state and
the other state. The input string is coded as argument of the functions, and the
function values are boolean, True for acceptance and False for rejection. The
machine is run on input string s by calling zero(s).
zero [] = True
zero ('0':xs) = zero xs
zero ('1':xs) = one xs
one [] = False
one ('0':xs) = one xs
one ('1':xs) = zero xs
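Reading the code, zero s is True exactly when s contains an even number of 1s; a few runs (my own checks):

```haskell
zero, one :: String -> Bool
zero []       = True
zero ('0':xs) = zero xs
zero ('1':xs) = one xs
one []        = False
one ('0':xs)  = one xs
one ('1':xs)  = zero xs

-- zero "0110" is True  (two 1s)
-- zero "101"  is True  (two 1s)
-- zero "1"    is False (one 1)
```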
Next, consider a finite state transducer that accepts a string of symbols, and
outputs the same string by (1) removing all white spaces in the beginning, (2)
reducing all other blocks of white spaces (consecutive white spaces) to a single
white space. Thus, the string (where - denotes a white space)
----Mary----had--a little---lamb-
is output as
Mary-had-a-little-lamb-
A machine to solve this problem is shown in Figure 5.3, where n denotes any
symbol other than a white space.

[Figure 5.3: a two-state transducer with states first and next; arcs are
labeled input/output. In state first, input - (a white space) produces no
output; input n outputs n and moves to next. In state next, input - outputs
- and moves to first; input n outputs n.]
first [] = []
first (' ':xs) = first xs
first (x:xs) = x : next xs
next [] = []
next (' ':xs) = ' ' : first xs
next (x:xs) = x : next xs
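A sample run (my own check); note the trailing white space in the output, which the exercise below asks you to remove.

```haskell
first, next :: String -> String
first []       = []
first (' ':xs) = first xs
first (x:xs)   = x : next xs
next []        = []
next (' ':xs)  = ' ' : first xs
next (x:xs)    = x : next xs

-- first "   Mary   had  a lamb " is "Mary had a lamb "
```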
Exercise 62
Modify your design so that a trailing white space is not produced.
The list constructor cons of Section 5.8.2 (page 141) is used to add an item at
the head of a list. The function snoc, defined below, adds an item at the “end”
of a list.
snoc x [] = [x]
snoc x (y: xs) = y:(snoc x xs)
The execution of snoc takes time proportional to the length of the argument
list, whereas cons takes constant time. So, it is preferable to use cons.
5.9.1.2 concatenate
The following function concatenates two lists in order. Remember that the two
lists need to have the same type in order to be concatenated.
conc [] ys = ys
conc (x:xs) ys = x : (conc xs ys)
There is a built-in operator that does the same job; conc xs ys is written
as xs ++ ys. The execution of conc takes time proportional to the length of the
first argument list.
Exercise 63
5.9.1.3 flatten
The function flatten takes a list of lists, such as
[[1,2,3],[10,20,30]]
and flattens it out by putting all the elements into a single list, like
[1,2,3,10,20,30]
This definition should be studied carefully. Here xs is a list and xss is a list
of lists.
flatten [] = []
flatten (xs : xss) = xs ++ (flatten xss)
Exercise 64
5.9.1.4 reverse
4. Show how to right-rotate a list efficiently (i.e., in linear time in the size of
the argument list). Right-rotation of [1,2,3,4] yields [4,1,2,3].
Hint: use rev.
right_rotate [] = []
right_rotate xs = y: (rev ys)
where
y:ys = (rev xs)
[(1,'a','b'),(2,'a','c'),(1,'b','c'),(3,'a','b'),
(1,'c','a'),(2,'c','b'),(1,'a','b')]
There is an iterative solution for this problem, which goes like this. Disk 1
moves in every alternate step starting with the first step. If n is odd, disk 1
moves cyclically from ’a’ to ’b’ to ’c’ to ’a’ . . ., and if n is even, disk 1 moves
cyclically from ’a’ to ’c’ to ’b’ to ’a’ . . .. In each remaining step, there is
exactly one possible move: ignore the peg of which disk 1 is the top; compare
the tops of the two remaining pegs and move the smaller one to the top of the
other peg (if one peg is empty, move the top of the other peg to its top).
I don’t know an easy proof of this iterative scheme; in fact, the best proof
I know shows that this scheme is equivalent to an obviously correct recursive
scheme.
The recursive scheme is based on the following observations. There is a step
in which the largest disk is moved from ’a’; we show that it is sufficient to move
it only once, from ’a’ to ’b’. At that moment, disk n is the top disk at ’a’ and
there is no other disk at ’b’. So, all other disks are at ’c’, and, according to
the given constraint, they are correctly stacked. Therefore, prior to the move
of disk n, we have the subtask of moving the remaining n − 1 disks, provided
n > 1, from 'a' to 'c'. Following the move of disk n, the subtask is to move the
remaining n − 1 disks from 'c' to 'b'. Each of these subtasks is smaller than the
original task, and may be solved recursively. Note that in solving the subtasks,
disk n may be disregarded, because any disk can be placed on it; hence, its
presence or absence is immaterial. Below, tower n x y z returns a list of steps
to transfer n disks from peg x to y using z as an intermediate peg.
tower n x y z
| n == 0 = []
| otherwise = (tower (n-1) x z y)
++ [(n,x,y)]
++ (tower (n-1) z y x)
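For instance, tower 2 'a' 'b' 'c' yields the three steps below; in general, tower n produces 2^n − 1 steps.

```haskell
-- Each step (d, p, q) moves disk d from peg p to peg q.
tower :: Int -> Char -> Char -> Char -> [(Int, Char, Char)]
tower n x y z
  | n == 0    = []
  | otherwise = tower (n-1) x z y
                ++ [(n, x, y)]
                ++ tower (n-1) z y x

-- tower 2 'a' 'b' 'c' is [(1,'a','c'),(2,'a','b'),(1,'c','b')]
```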
Exercise 66
00 01 11 10
But merely concatenating this sequence to the previous sequence won’t do; 010
and 100 differ in more than one bit position. But concatenating the reverse of
the above sequence works, and we get
*Main> gray 3
["000","001","011","010","110","111","101","100"]
Considering that we will have to reverse this list to compute the function
value for the next higher argument, let us define a more general function,
grayGen, whose argument is a natural number n and whose output is a pair
of lists, (xs,ys), where xs is the Gray code of n and ys is the reverse of xs. We
can compute xs and ys in similar ways, without actually applying the reverse
operation.
First, define a function cons0 whose argument is a list of strings and which
returns a list by prepending a ’0’ to each string in the argument list. Similarly,
define cons1 which prepends ’1’ to each string.
cons0 [] = []
cons0 (x:xs) = ('0':x):(cons0 xs)
cons1 [] = []
cons1 (x:xs) = ('1':x):(cons1 xs)
grayGen 0 = ([""],[""])
grayGen n = ((cons0 a) ++ (cons1 b), (cons1 a) ++ (cons0 b))
where (a,b) = grayGen (n-1)
gray n = fst(grayGen n)
Exercise 67
1. Show another sequence of 3-bit numbers that has the Gray code property.
2. Prove that rev a = b, where (a,b) = grayGen n.
You will have to use the following facts; for arbitrary lists xs and ys
3. Given two strings of equal length, their Hamming distance is the number of
positions in which they differ. Define a function to compute the Hamming
distance of two given strings.
4. In a Gray code sequence, consecutive numbers have a Hamming distance of
1. Write a function that determines if the strings in its argument list have
the Gray code property. Make sure that you compare the first and last
elements of the list.
Exercise 68
This exercise is about computing winning strategies in simple 2-person games.
1. Given are two sequences of integers of equal length. Two players alter-
nately remove the first number from either sequence; if one of the se-
quences becomes empty, numbers are removed from the other sequence
until it becomes empty. The game ends when both sequences are empty,
and then the player with the higher total sum wins the game (assume that the
sum of all the integers is odd, so that there is never a tie). Write a program
to determine if the first player has a winning strategy. For example,
given the lists [2,12] and [10,7], the first player does not have a winning
strategy; if he removes 10, then the second player removes 7, forcing the
first player to remove 2 to gain a sum of 12 and the second player to gain
19; and if the first player removes 2, the second player removes 12, again
forcing the same values for both players.
Solution
Define play(xs,ys), where xs and ys are lists of integers of equal length,
that returns True iff the first player has a winning strategy. Define helper
function wins(b,xs,ys), where xs and ys are lists of integers, not neces-
sarily of the same length, and b is an integer. The function returns True
iff the first player has a winning strategy from the given configuration
assuming that he has already accumulated a sum of b. Then play(xs,ys)
= wins(0,xs,ys). And, function loses(b,xs,ys) is True iff the first player
has no winning strategy, i.e., the second player has a winning strategy.
play(xs,ys) = wins(0,xs,ys)
play(xs) = wins(0,xs,rev(xs),length(xs))
5.9.4 Sorting
Consider a list of items drawn from some totally ordered domain such as the
integers. We develop a number of algorithms for sorting such a list, that is, for
producing a list in which the same set of numbers are arranged in ascending
order. We cannot do in situ exchanges in sorting, as is typically done in
imperative programming, because there is no way to modify the argument list.
isort [] = []
isort (x:xs) = .. (isort xs) .. -- skeleton of a definition
The first line is easy to justify. For the second line, the question is: how
can we get the sorted version of (x:xs) from the sorted version of xs —that
is isort xs— and x? The answer is, insert x at the right place in (isort xs).
So, let us first define a function insert y ys, which produces a sorted list by
appropriately inserting y in the sorted list ys.
insert y [] = [y]
insert y (z:zs)
| y <= z = y:(z:zs)
| y > z = z: (insert y zs)
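One standard way to complete the isort skeleton using insert is the following.

```haskell
insert :: Ord a => a -> [a] -> [a]
insert y []     = [y]
insert y (z:zs)
  | y <= z = y : z : zs
  | y > z  = z : insert y zs

-- Sort the tail recursively, then insert the head at the right place.
isort :: Ord a => [a] -> [a]
isort []     = []
isort (x:xs) = insert x (isort xs)

-- isort [3,1,2] is [1,2,3]
```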
Exercise 69
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
| x <= y = x : (merge xs (y:ys))
| x > y = y : (merge (x:xs) ys)
mergesort [] = []
mergesort [x] = [x]
mergesort xs = merge left right
where
(xsl,xsr) = divide0 xs
left = mergesort xsl
right = mergesort xsr
Exercise 70
1. Why is the equation
merge [] [] = []
unnecessary?
2. Show that mergesort has a running time of O(n × 2^n) where the argument
list has length 2^n.
4. Develop a function similar to merge that has two ascending lists as argu-
ments and creates an ascending list of common elements.
5. Develop a function that has two ascending lists as arguments and cre-
ates the difference, first list minus the second list, as an ascending list of
elements.
5.9.4.3 Quicksort
Function quicksort partitions its input list xs into two lists, ys and zs, so
that every item of ys is at most every item of zs. Then ys and zs are sorted
and concatenated. Note that in mergesort, the initial partitioning is easy and
the final combination is where the work takes place; in quicksort the initial
partitioning is where all the work is.
We develop a version of quicksort that differs slightly from the description
given above. First, we consider the partitioning problem. A list is partitioned
with respect to some value v that is supplied as an argument; all items smaller
than or equal to v are put in ys and all items greater than v are put in zs.
partition v [] = ([],[])
partition v (x:xs)
| x <= v = ((x:ys),zs)
| x > v = (ys,(x:zs))
where (ys,zs) = partition v xs
There are several heuristics for choosing v; let us choose it to be the first
item of the given (nonempty) list. Here is the definition of quicksort.
quicksort [] = []
quicksort (x:xs) = (quicksort ys ) ++ [x] ++ (quicksort zs)
where (ys,zs) = partition x xs
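A few sample evaluations (my own checks); note that duplicates of the chosen item end up in ys and are preserved.

```haskell
partition :: Ord a => a -> [a] -> ([a], [a])
partition v []     = ([], [])
partition v (x:xs)
  | x <= v = (x:ys, zs)
  | x > v  = (ys, x:zs)
  where (ys, zs) = partition v xs

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (x:xs) = quicksort ys ++ [x] ++ quicksort zs
  where (ys, zs) = partition x xs

-- quicksort [3,1,4,1,5] is [1,1,3,4,5]
```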
Exercise 71
2. What is the running time of quicksort if the input file is already sorted?
Exercise 72
1. Define a function that takes two lists of equal length as arguments and
produces a boolean list of the same length as the result; an element of the
boolean list is True iff the corresponding two elements of the argument
lists are identical.
2. Define a function that creates a list of unique elements from a sorted list.
Use this function to redefine function uniq of Section 5.8.4 (page 144).
3. Define function zip that takes a pair of lists of equal lengths as argument
and returns a list of pairs of corresponding elements. So, zip ([1,2,3], [4,5,6])
is [(1,4), (2,5), (3,6)].
5. Define function take where take n xs is a list containing the first n items
of xs in order. If n exceeds the length of xs then the entire list xs is
returned.
7. Define function index where index i xs returns the ith element of xs.
Assume that elements in a list are indexed starting at 0. Also, assume
that the argument list has length greater than i. 2
Exercise 73
A matrix can be represented as a list of lists. Let us adopt the convention that
each outer list is a column of the matrix. Develop an algorithm to compute the
determinant of a matrix of numbers. 2
Exercise 74
It is required to develop a number of functions for processing an employee
database. Each entry in the database has four fields: employee, spouse, salary
and manager. The employee field is a string that is the name of the employee,
the spouse field is the name of his/her spouse (henceforth, “his/her” will be
abbreviated to “its” and “he/she” to “it”), the salary field is the employee’s
annual salary and the manager field is the name of the employee’s manager. Assume
that the database contains all the records of a hierarchical (tree-structured) organization
in which every employee’s spouse is also an employee, and each manager is
also an employee except root, who is the manager of all highest-level managers.
Assume that root does not appear as an employee in the database.
A manager of an employee is also called its direct manager; an indirect
manager is either a direct manager or an indirect manager of a direct manager;
thus, root is every employee’s indirect manager.
Write functions for each of the following tasks. You will find it useful to
define a number of auxiliary functions that you can use in the other functions.
One such function could be salary, which given a name as an argument returns
the corresponding salary.
In the following type expressions, DB is the type of the database, a list of
4-tuples, as described above.
2. List all employees who directly manage their spouses; do the same for
indirect management.
3. List all managers who indirectly manage both an employee and its spouse.
4. Are there employees e and f such that e’s spouse is f ’s manager and f ’s
spouse is e’s manager?
normalize :: DB -> DB
patienceSort(cs) = psort([],cs)
For the design of psort, we consider adding the numbers one at a time to the
piles. To this end, we design yet another (simpler) helper function psort1, where
psort1(ps, c) adds a single number c to the list of piles ps. Then,
psort(ps,[]) = ps
psort(ps, c:cs) = psort(psort1(ps, c), cs)
psort1([],c) = [[c]]
psort1((p:ps):pss,c)
| c < p = (c:(p:ps)):pss
| c >= p = (p:ps):psort1(pss,c)
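The code above runs as written (the functions take tuples as arguments). A self-contained copy with type signatures, as a sanity check; note that each pile keeps its top (smallest-so-far) element first, and a number is placed on the leftmost pile whose top is larger:

```haskell
-- patience sorting: distribute numbers into piles
patienceSort :: Ord a => [a] -> [[a]]
patienceSort cs = psort ([], cs)

-- add the numbers one at a time to the piles
psort :: Ord a => ([[a]], [a]) -> [[a]]
psort (ps, [])   = ps
psort (ps, c:cs) = psort (psort1 (ps, c), cs)

-- place c on the first pile whose top exceeds c; else try later piles
psort1 :: Ord a => ([[a]], a) -> [[a]]
psort1 ([], c) = [[c]]
psort1 ((p:ps):pss, c)
  | c < p     = (c:p:ps) : pss
  | otherwise = (p:ps) : psort1 (pss, c)
```

For example, patienceSort [3,1,2] yields [[1,3],[2]]: two piles, matching the longest ascending subsequence [1,2] of length 2.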
Exercise 75
1. Show that any ascending subsequence of the original sequence has at most
one entry from a pile. Consequently, the length of the longest ascending
subsequence is at most the number of piles.
2. Show that following the links backwards starting from any number gives
an ascending subsequence.
3. Combining the two results above, prove that the proposed algorithm con-
structs a longest ascending subsequence.
4. Can there be many longest ascending subsequences? In that case, can you
enumerate all of them given the piles?
5. (just for fun) Develop a totally different algorithm for computing a longest
ascending subsequence.
5 I am grateful to Jay Hennig, class of Fall 2010, for pointing out an error in the treatment
a0 + a1 × x + a2 × x^2 + · · · + an × x^n
= a0 + x × (a1 + a2 × x + · · · + an × x^(n−1))
ep1 [] x = 0
ep1 (a:as) x = a + x * (ep1 as x)
4 + 0 × x − 3 × x^2 + 2 × x^3
= 4 + x × (0 − 3 × x + 2 × x^2)
= 4 + x × (0 + x × (−3 + 2 × x))
= 4 + x × (0 + x × (−3 + x × (2)))
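Evaluating the example polynomial with ep1 reproduces exactly this nesting; at x = 2 the value is 4 + 0·2 − 3·4 + 2·8 = 8:

```haskell
-- Horner's rule: a polynomial, given by its coefficient list in ascending
-- order of degree, is evaluated with one multiplication and one addition
-- per coefficient
ep1 :: Num a => [a] -> a -> a
ep1 []     x = 0
ep1 (a:as) x = a + x * ep1 as x
```

For instance, ep1 [4,0,-3,2] 2 evaluates 4 + 2×(0 + 2×(−3 + 2×2)) = 8.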
rev' ys = rvc [] ys []
conc xs ys = rvc xs [] ys
rvc [] [] q = q
rvc [] (v:vs) q = rvc [] vs (v:q)
rvc (u:us) vs q = u:(rvc us vs q)
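Before the proof, a quick check that rev' and conc behave as intended on small inputs (the specification (*) says rvc xs ys q equals xs ++ rev ys ++ q):

```haskell
-- rvc xs ys q computes xs ++ reverse ys ++ q in a single pass
rvc :: [a] -> [a] -> [a] -> [a]
rvc []     []     q = q
rvc []     (v:vs) q = rvc [] vs (v:q)
rvc (u:us) vs     q = u : rvc us vs q

rev' ys    = rvc [] ys []   -- reversal
conc xs ys = rvc xs [] ys   -- concatenation
```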
Next, we show a proof that the code of rvc meets the specification (*).
We will need the following facts about rev and ++ for the proof. We use
the associativity of ++ without explicitly referring to it.
(i) ++ is associative, i.e., xs ++ (ys ++ zs) = (xs ++ ys) ++ zs, for any
lists xs, ys and zs
(ii) [] ++ ys = ys and ys ++ [] = ys, for any list ys
(iii) x:xs = [x] ++ xs, for any item x and list xs
(iv) rev [] = []
(v) rev [x] = [x], where [x] is a list with a single item
(vi) rev(xs ++ ys) = (rev ys) ++ (rev xs), for any lists xs and ys
And,
[] ++ rev(v:vs) ++ q
= {using fact (ii)}
rev(v:vs) ++ q
= {using fact(iii)}
rev([v] ++ vs) ++ q
= {using fact(vi)}
rev(vs) ++ rev([v]) ++ q
= {using fact(v)}
rev(vs) ++ [v] ++ q
And,
(u:us) ++ rev(vs) ++ q
= {using fact (iii)}
[u] ++ us ++ rev(vs) ++ q
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
The first two are easy to see. The third one implements the exclusive-or over a
list; it treats True as 1, False as 0, and takes the modulo 2 sum over the list.
That is, it returns the parity of the number of True elements in the list.
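The three functions referred to above are not shown in this excerpt. Plausible definitions, consistent with the surrounding text (suml and andl are named later in the chapter; xorl is my own name for the third):

```haskell
suml :: Num a => [a] -> a
suml xs = foldr (+) 0 xs        -- sum of a list

andl :: [Bool] -> Bool
andl xs = foldr (&&) True xs    -- conjunction of a list

xorl :: [Bool] -> Bool
xorl xs = foldr (/=) False xs   -- parity: True iff an odd number of Trues
```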
We can define flatten of Section 5.9.1.3 (page 148) by
flatten xss = foldr (++) [] xss
Note I have been writing the specific operators, such as (+), within paren-
theses, instead of writing them as just +, for instance, in the definition of suml.
This is because the definition of foldr requires f to be a prefix operator, and +
is an infix operator; (+) is the prefix version of +. 2
Note There is an even nicer way to define functions such as suml and multl;
just omit xs from both sides of the function definition. So, we have
suml = foldr (+) 0
multl = foldr (*) 1
In these notes, I will not describe the justification for this type of definition. 2
Function foldr has an argument that is a function; foldr is called a higher
order function. The rules of Haskell do not restrict the type of argument of a
function; hence, a function, being a typed value, may be supplied as an argu-
ment. Function (and procedure) arguments are rare in imperative programming,
but they are common and very convenient to define and use in functional pro-
gramming. Higher order functions can be defined for any type, not just lists.
What is the type of foldr? It has three arguments, f, z and xs, so its type
is
(type of f) -> (type of z) -> (type of xs) -> (type of result)
The type of z is arbitrary, say a. Then f takes two arguments of type a and
produces a result of type a, so its type is (a -> a -> a). Next, xs is a list of
type a, so its type is [a]. Finally, the result type is a. So, we have for the type
of foldr
(a -> a -> a) -> a -> [a] -> a
However, Haskell assigns foldr a more general type:
*Main> :t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b
This means that the two arguments of f need not be of the same type. Here
is an example; function evenl determines if all integers of a given list are even.
For its definition, we use function ev that takes an integer and a boolean as
arguments and returns a boolean.
ev x b = (even x) && b
evenl xs = foldr ev True xs
fold f [x] = x
fold f (x:xs) = f x (fold f xs)
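Function fold has no equation for the empty list; it suits operators with no natural unit element. For example, the maximum of a nonempty list (maxl is my own name; fold corresponds to the Prelude's foldr1):

```haskell
-- fold an operator over a nonempty list
fold :: (a -> a -> a) -> [a] -> a
fold f [x]    = x
fold f (x:xs) = f x (fold f xs)

-- maximum of a nonempty list
maxl :: Ord a => [a] -> a
maxl = fold max
```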
Function map takes as arguments (1) a function f and (2) a list of elements
on which f can be applied. It returns the list obtained by applying f to each
element of the given list.
map f [] = []
map f (x:xs) = (f x) : (map f xs)
So,
*Main> :t map
map :: (a -> b) -> [a] -> [b]
For example, we can redefine evenl using map:
evenl xs = andl (map even xs)
where andl is defined in Section 5.11.1 (page 163). Here (map even xs) creates a
list of booleans of the same length as the list xs such that the ith boolean is True
iff the ith element of xs is even. The function andl then takes the conjunction
of the booleans in this list.
Exercise 76
Redefine the functions cons0 and cons1 from page 152 using map. 2
Function filter has two arguments, a predicate p and a list xs; it returns the
list containing the elements of xs for which p holds.
filter p [] = []
filter p (x:xs)
| p x = x: (filter p xs)
| otherwise = (filter p xs)
So, we have
*Main> :t filter
filter :: (a -> Bool) -> [a] -> [a]
Exercise 77
What is filter p (filter q xs)? In particular, what is filter p (filter (not
p) xs)? 2
A formula in conjunctive normal form (CNF) is a conjunction of terms; each term
is a disjunction of a number of literals, and each literal is either a variable or its
negation. For example, the formula (p ∨ q) ∧ (¬p ∨ ¬q) ∧ (p ∨ ¬q) is in CNF.
Any boolean formula can be converted to a logically equivalent CNF formula.
Henceforth, we assume that the input formula is in CNF.
• some term in the formula is empty (i.e., it has no literals), in which case
the term (and hence also the formula) is unsatisfiable, because it is a
disjunction.
(Figure: a search tree that branches on the truth values of p, q and r; each leaf
is marked T or F according to whether the corresponding assignment satisfies
the formula. The figure is not reproduced here.)
[
["-p", "+q", "+r"],
["+p", "-r"],
["-q", "+r"],
["-p", "-q", "-r"],
["+p", "+q", "+r"]
]
We should be clear about the types. I define the types explicitly (I have not
told you how to do this in Haskell; just take it at face value).
dp xss
| xss == [] = True
| emptyin xss = False
| otherwise = (dp yss) || (dp zss)
where
v = literal xss
yss = reduce v xss
zss = reduce (neg v) xss
literal xss: returns a literal from xss, where xss is nonempty and does not
contain an empty term,
neg v: returns the string corresponding to the negation of v.
reduce v xss, where v is a literal: returns the formula obtained from xss by
dropping any term containing the literal v and dropping any occurrence
of the literal neg v in each remaining term.
Function literal can return any literal from the formula; it is easy to return
the first literal of the first term of its argument. Since the formula is not empty
and has no empty term, this procedure is valid.
A call to reduce v xss scans through the terms of xss. If xss is empty, the
result is the empty formula. Otherwise, for each term xs,
• if v appears in xs, drop the term,
• if the negation of v appears in xs then modify xs by removing the negation
of v,
• if neither of the above conditions hold, retain the term.
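A sketch of reduce consistent with these three cases. It uses a list comprehension and the Prelude's elem and filter in place of the helper functions inl and remove developed below, so treat it as one possible rendering rather than the book's version (neg is reproduced from the definition that follows):

```haskell
type Literal = String

-- negate a literal
neg :: Literal -> Literal
neg ('+':var) = '-':var
neg ('-':var) = '+':var

-- drop each term containing v; delete occurrences of neg v from the rest
reduce :: Literal -> [[Literal]] -> [[Literal]]
reduce v xss =
  [ filter (/= neg v) xs | xs <- xss, not (v `elem` xs) ]
```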
-- negate a literal
neg :: Literal -> Literal
neg ('+': var) = '-': var
neg ('-': var) = '+': var
Function reduce introduces two new functions, which will be developed next.
inl v xs, where v is a literal and xs is a term: returns True iff v appears in xs,
remove u xs, where u is a literal known to be in term xs: returns the term
obtained by removing u from xs.
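Recursive definitions matching these two specifications might read:

```haskell
-- inl v xs: does literal v appear in term xs?
inl :: Eq a => a -> [a] -> Bool
inl v []     = False
inl v (x:xs) = v == x || inl v xs

-- remove u xs: delete the (known-present) first occurrence of u from xs
remove :: Eq a => a -> [a] -> [a]
remove u (x:xs)
  | u == x    = xs
  | otherwise = x : remove u xs
```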
Exercise 78
2. Function reduce checks if xss is empty. Is this necessary given that reduce
is called from dp with a nonempty argument list? Why doesn’t reduce
check whether xss has an empty term?
term in the formula is ordered and in reduce v xss, v is the smallest literal in
xss, then we can check the first literal of each term to see whether it contains v
or its negation.
Function dp2, given below, does the job of dp, but it needs an extra argument,
a list of variables like [ "p", "q" ,"r" ], which defines the variable ordering.
Here, reduce2 is the counterpart of reduce. Now we no longer need the functions
inl, remove and literal.
reduce3 v [] = ([],[])
reduce3 v ((x:xs):xss)
| '+': v == x = (yss , xs:zss)
| '-': v == x = (xs:yss, zss )
| otherwise = ((x:xs):yss, (x:xs):zss)
where
(yss,zss) = reduce3 v xss
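Running reduce3 on a small formula shows the pair it returns: the first component is, presumably, the formula reduced with v set true, the second with v set false; only each term's first literal is examined, per the ordering assumption above.

```haskell
-- reduce3 v xss returns (formula with v true, formula with v false),
-- assuming v or its negation can occur only as a term's first literal
reduce3 :: String -> [[String]] -> ([[String]], [[String]])
reduce3 v []           = ([], [])
reduce3 v ((x:xs):xss)
  | '+':v == x = (yss,          xs:zss)
  | '-':v == x = (xs:yss,       zss)
  | otherwise  = ((x:xs):yss, (x:xs):zss)
  where (yss, zss) = reduce3 v xss
```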
Exercise 79
Count of URL Access Frequency We are given a large log of web page
requests. We would like to know how often certain web pages are being accessed.
The map function takes a single request and outputs ⟨URL, 1⟩. The reduce
function sums the values for the same URL and emits a total count for each
URL.
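The same computation can be mimicked in miniature in Haskell by grouping equal URLs and counting each group (a toy sequential sketch, not a distributed implementation):

```haskell
import Data.List (group, sort)

-- count how often each URL occurs in a request log
urlCount :: [String] -> [(String, Int)]
urlCount reqs = [ (url, length g) | g@(url:_) <- group (sort reqs) ]
```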
Reverse Web-Link Graph The reduce function concatenates the list of all
source URLs associated with a given target URL and emits the pair:
⟨target, list(source)⟩.
Inverted Index This problem is very similar to the Reverse Web-Link Graph
problem, described above. Instead of URLs, we look for common words in web
documents. The required output is hword, list(source)i. It is easy to augment
this computation to keep track of word positions.
Distributed Sort The map function extracts a key from each document,
and emits a hkey, documenti pair. The reduce function simply emits all pairs
unchanged. But, the implementation imposes an order on the outputs which
results in sorted output.
Relational Database
6.1 Introduction
You can now purchase a music player that stores nearly 10,000 songs. The
storage medium is a tiny hard disk, a marvel of hardware engineering. Equally
impressive is the software which combines many aspects of compression, error
correction and detection, and database manipulation.
First, the compression algorithm manages to store around 300 music CDs,
each with around 600MB of storage, on my 20GB player; this is a compression
of about 10 to 1. Since exact reproduction is not expected, music can be
compressed to any extent, but beyond a point you would not want to listen to
the result. Try listening to a particularly delicate piece over the telephone! The
compression algorithm manages to reproduce music reasonably faithfully.
A music player begins its life expecting harsh treatment, even torture. The
devices are routinely dropped, they are subjected to X-ray scans at airports,
and left outside in very cold or very hot cars. Yet the hardware is reasonably
resilient and, more impressively, the software works around hardware glitches
using error-correcting strategies, some of which we have outlined in an earlier
chapter.
The question that concerns us in this chapter is how to organize a large
number of songs so that we can locate a set of songs quickly. The songs are
first stored on a desktop (being imported from a CD or over the internet from
a music store); they can be organized there and then downloaded to a player.
A naive organization will make it quite frustrating to find that exact song in
your player. And, you may wish to listen to all songs which are either by artist
A or composer B, in the classical genre, and have not been played more than 6
times in the last 3 months. The subject matter of this chapter is organization
of certain kinds of data, like songs, to allow efficient selection of a subset which
meets a given search criterion.
For many database applications a set of tuples, called a table, is often the
appropriate data structure. Let me illustrate it with a small database of movies;
see Table 6.1 (page 180). We store the following information for each movie: its
title, actor, director, genre and the year of release. We list only the two most
prominent actors for a movie, and they appear in different tuples; so each movie
is represented by two tuples in the table. We can now easily
specify a search criterion such as, find all movies released between 1980 and
2003 in which Will Smith was an actor and the genre is SciFi. The result of this
search is a table, shown in Table 6.2 (page 180).
Chapter Outline We introduce the table data structure and some terminol-
ogy in section 6.2. A table resembles a mathematical relation, though there
are some significant differences which we outline in that section. An algebra of
relations is developed in section 6.3. The algebra consists of a set of operations
on relations (section 6.3.1) and a set of identities over relational expressions
(section 6.3.2). The identities are used to process queries efficiently, as shown
in section 6.3.3. A standard query language, SQL, is described in section 6.3.4.
This chapter is a very short introduction to the topic; for more thorough treat-
ment see the relevant chapters in [35] and [2].
A general relation consists of tuples, not necessarily pairs as for binary re-
lations. Consider a family relation which consists of triples (c, f, m), where c is
the name of a child, and f and m are the father and the mother. Or, the relation
Pythagoras which consists of triples (x, y, z) where the components are positive
integers and x^2 + y^2 = z^2. Or, Fermat which consists of quadruples of positive
integers (x, y, z, n), where x^n + y^n = z^n and n > 2. (A recent breakthrough
in mathematics has established that Fermat = ∅.) In databases, the relations
need not be binary; in fact, most often, they are not binary.
A relation, being a set, has all the set operations defined on it. We list some
of the set operations below which are used in relational algebra.
1. Union: R ∪ S = {x| x ∈ R ∨ x ∈ S}
2. Intersection: R ∩ S = {x| x ∈ R ∧ x ∈ S}
3. Difference: R − S = {x| x ∈ R ∧ x ∉ S}
Thus, given R = {(1, 2), (2, 3), (3, 4)} and S = {(2, 3), (3, 4), (4, 5)}, we get
R ∪ S = {(1, 2), (2, 3), (3, 4), (4, 5)}, R ∩ S = {(2, 3), (3, 4)}, and
R − S = {(1, 2)}.
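With relations represented as Haskell lists of tuples (duplicates absent), these three operations are the standard library's union, intersect and (\\); for the R and S above:

```haskell
import Data.List (union, intersect, (\\))

r, s :: [(Int, Int)]
r = [(1,2), (2,3), (3,4)]
s = [(2,3), (3,4), (4,5)]

-- set operations on duplicate-free lists
rUnionS = r `union` s        -- union
rInterS = r `intersect` s    -- intersection
rMinusS = r \\ s             -- difference
```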
Theatre              Address
General Cinema       2901 S 360
Tinseltown USA       5501 S I.H. 35
Dobie Theater        2021 Guadalupe St
Entertainment Film   6700 Middle Fiskville Rd
For database relations, the cross product is commutative:
R × S = S × R
For mathematical relations, this identity does not hold because the components
cannot be permuted.
A relational database is a set of relations with distinct relation names. The
relations in Tables 6.1 (page 180), 6.4 (page 182), and 6.5 (page 183) make up
a relational database. Typically, every relation in a database has a common
attribute with some other relation.
Just as arithmetic expressions satisfy identities such as
x + y = y + x
x × (y + z) = x × y + x × z
relational expressions satisfy identities that are used to rewrite queries into
equivalent, cheaper forms.
Table 6.7: Cross Product of the two relations in Table 6.6 (page 184)
(page 184); they are separated by a vertical line. Their cross product is shown
in Table 6.7 (page 184).
The cross product in Table 6.7 makes no sense. We introduce the join
operator later in this section which takes a more “intelligent” cross product.
If R and S have common attribute names, the names are changed so that
we have disjoint attributes. One strategy is to prefix the attribute name by the
name of the relation. So, if you are computing Prof × Student where both Prof
and Student have an attribute id, an automatic renaming may create Profid and
Studentid . This does not always work, for instance, in Prof × Prof . Manual
aid is then needed. In this chapter, we write R × S only if the attributes of R
and S are disjoint.
Note a subtle difference between mathematical and database relations for
cross product. For tuple (r, s) in R and (u, v) in S, their mathematical cross
product gives a tuple of tuples, ((r, s), (u, v)), whereas the database cross prod-
uct gives a tuple containing all 4 elements, (r, s, u, v).
The number of tuples in R×S is the number of tuples in R times the number
in S. Thus, if R and S have 1,000 tuples each, R × S has a million tuples and
R × (S × S) has a billion. So, cross product is rarely computed in full. It is
often used in conjunction with other operations which can be applied in a clever
sequence to eliminate explicit computations required for a cross product.
Projection The operations we have described so far affect only the rows (tu-
ples) of a table. The next operation, projection, specifies a set of attributes of
a relation that are to be retained to form a relation. Projection removes all
other attributes (columns), and removes any duplicate rows that are created as
a result. We write πu,v (R) to denote the relation which results by retaining
only the attributes u and v of R. Let R be the relation shown in Table 6.1
(page 180). Then, πTitle,Director ,Genre,Year (R) gives Table 6.9 (page 186) and
πTitle,Actor (R) gives Table 6.10 (page 186).
Selection The selection operation chooses the tuples of a relation that sat-
isfy a specified predicate. A predicate uses attribute names as variables, as
in year ≥ 1980 ∧ year ≤ 2003 ∧ actor = “Will Smith” ∧ genre = “SciFi”.
A tuple satisfies a predicate if the predicate is true when the attribute names
are replaced by the corresponding values from the tuple. We
write σp (R) to denote the relation consisting of the subset of tuples of R that
satisfy predicate p. Let R be the relation in Table 6.1 (page 180). Then,
σyear≥1980 ∧ year≤2003 ∧ actor=“Will Smith” ∧ genre=“SciFi” (R) is shown in Table 6.2
(page 180) and σactor=“Will Smith” ∧ genre=“Comedy” (R) is the empty relation.
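With a relation represented as a list of tuples, selection is just filter and projection is map followed by duplicate removal. A toy sketch (the Movie type and its field layout are my own simplification of Table 6.1):

```haskell
import Data.List (nub)

-- (title, actor, genre, year): a simplified movie tuple
type Movie = (String, String, String, Int)

-- selection: keep the tuples satisfying a predicate
select :: (Movie -> Bool) -> [Movie] -> [Movie]
select = filter

-- projection onto Title and Actor, removing duplicates
projectTitleActor :: [Movie] -> [(String, String)]
projectTitleActor rel = nub [ (t, a) | (t, a, _, _) <- rel ]

-- the example predicate from the text
smithSciFi :: [Movie] -> [Movie]
smithSciFi = select (\(_, actor, genre, year) ->
  actor == "Will Smith" && genre == "SciFi" && year >= 1980 && year <= 2003)
```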
Join There are several join operators in relational algebra. We study only one,
called natural join, though we simply call it join in this chapter. The join of R
and S is written as R ⋈ S. Here, R and S need not be compatible; typically,
they will have some common attributes.
The join is a more refined way of taking the cross product. As in the cross
product, take each tuple r of R and s of S. If r and s have no common attributes,
or do not match in their common attributes, then their join produces an empty
tuple. Otherwise, concatenate r and s, keeping only one set of values for the
common attributes (which match). Consider Tables 6.4 (page 182) and 6.5
(page 183). Their join is shown in Table 6.8 (page 185). And, the join of
Tables 6.9 (page 186) and 6.10 (page 186) is Table 6.1 (page 180).
The join operator selects only the tuples which match in certain attributes;
so, join results in a much smaller table than the cross product. Additionally,
the result is usually more meaningful. In many cases, a large table can be
decomposed into two much smaller tables whose join recreates the original table.
See the relations in Tables 6.9 (page 186) and 6.10 (page 186) whose join gives
us the relation in Table 6.1. The storage required for these two relations is much
smaller than that for Table 6.1 (page 180).
Title Actor
Jurassic Park Jeff Goldblum
Jurassic Park Sam Neill
Men in Black Tommy Lee Jones
Men in Black Will Smith
Independence Day Will Smith
Independence Day Bill Pullman
My Fair Lady Audrey Hepburn
My Fair Lady Rex Harrison
The Sound of Music Julie Andrews
The Sound of Music Christopher Plummer
Bad Boys II Martin Lawrence
Bad Boys II Will Smith
Ghostbusters Bill Murray
Ghostbusters Dan Aykroyd
Tootsie Dustin Hoffman
Tootsie Jessica Lange
Table 6.10: Table 6.1 (page 180) arranged by Title and Actor
Exercise 80
Suppose R and S are compatible. Show that R ⋈ S = R ∩ S.
2. (Commutativity of selection)
R∪S = S∪R
(R ∪ S) ∪ T = R ∪ (S ∪ T )
R×S = S×R
(R × S) × T = R × (S × T )
R ⋈ S = S ⋈ R
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T),
provided R and S have common attributes and so do S and T, and
no attribute is common to all three relations.
6. (Selection pushing)
σp (R ∪ S ) = σp (R) ∪ σp (S )
σp (R ∩ S ) = σp (R) ∩ σp (S )
σp (R − S ) = σp (R) − σp (S )
σp (R × S ) = σp (R) × S
σp (R ⋈ S ) = σp (R) ⋈ S
7. (Projection pushing)
πa (R ∪ S ) = πa (R) ∪ πa (S )
πa (R ⋈ S ) = πa (πb (R) ⋈ πc (S )), where
a ⊆ attr(R) ∪ attr(S)
b = (a ∩ attr(R)) ∪ d
c = (a ∩ attr(S)) ∪ d
d = attr(R) ∩ attr(S) 2
The selection splitting law says that evaluations of σp∧q (R) and σp (σq (R)) are
interchangeable; so, apply either of the following procedures: look at each tuple
of R and decide if it satisfies p ∧ q, or first identify the tuples of R which satisfy
q and from those identify the ones which satisfy p. The benefit of one strategy
over another depends on the relative costs of access times to the tuples and
predicate evaluation times. For large databases, which are stored in secondary
storage, access time is the major cost. Then it is preferable to evaluate σp∧q (R).
It is a good heuristic to apply projection and selection to as small a relation
as possible. Therefore, it is almost always better to evaluate σp (R) ⋈ S
instead of σp (R ⋈ S ), i.e., apply selection to R, which tends to be smaller than
R ⋈ S. Similarly, distributivity of projection over join is often used in query
optimizations.
Exercise 81
Suppose predicate p names only the attributes of S. Show that σp (R ⋈ S ) =
R ⋈ σp (S ).
Exercise 82
Show that πa (R ∩ S ) = πa (R) ∩ πa (S ) does not necessarily hold.
We would like to know the answer to: what are the addresses of theatres
where Will Smith is playing on Saturday at 9PM? We write a relational expres-
sion for this query and then transform it in several stages to a form which can
be efficiently evaluated. Let predicates p and q stand for actor = “Will Smith”
and for the condition that the showing is on Saturday at 9PM, respectively.
The query has the form πAddress (σp∧q (x)), where x is a relation yet to be
defined. Since x has to include information about Actor, Time and Address,
we take x to be R ⋈ S ⋈ T. Relation x includes many more attributes than
the ones we desire; we will project away the unneeded attributes. The selection
operation extracts the tuples which satisfy the predicate p ∧ q, and then the
projection operation simply lists the addresses. So, the entire query is
πAddress (σp∧q ⟨R ⋈ S ⋈ T⟩)
Above and in the following expressions, we use brackets of different shapes to
help readability.
We transform this relational expression.
πAddress (σp∧q ⟨R ⋈ S ⋈ T⟩)
≡ {Associativity of join; note that the required conditions are met}
πAddress (σp∧q ⟨(R ⋈ S) ⋈ T⟩)
≡ {Selection pushing over join}
πAddress (σp∧q ⟨R ⋈ S⟩ ⋈ T)
≡ {See lemma below. p names only the attributes of R and q of S}
πAddress (⟨σp (R) ⋈ σq (S)⟩ ⋈ T)
≡ {Distributivity of projection over join; d = {Theatre}}
πAddress (πTheatre ⟨σp (R) ⋈ σq (S)⟩ ⋈ πAddress,Theatre (T))
≡ {πAddress,Theatre (T) = T}
πAddress (πTheatre ⟨σp (R) ⋈ σq (S)⟩ ⋈ T)
≡ {Distributivity of projection over join;
the common attribute of σp (R) and σq (S) is Title}
πAddress (⟨πTitle (σp (R)) ⋈ πTheatre,Title (σq (S))⟩ ⋈ T)
Lemma: If p names only the attributes of R and q only the attributes of S, then
σp∧q (R ⋈ S) = σp (R) ⋈ σq (S)
Proof:
σp∧q (R ⋈ S)
≡ {Selection splitting}
σp ⟨σq (R ⋈ S)⟩
≡ {Commutativity of join}
σp ⟨σq (S ⋈ R)⟩
≡ {Selection pushing over join}
σp ⟨σq (S) ⋈ R⟩
≡ {Commutativity of join}
σp ⟨R ⋈ σq (S)⟩
≡ {Selection pushing over join}
σp (R) ⋈ σq (S)
Student Id    Dept   Q1   Q2   Q3
216285932     CS     61   72   49
228544932     CS     35   47   56
859454261     CS     72   68   75
378246719     EE     70   30   69
719644435     EE     60   70   75
549876321     Bus    56   60   52
Table 6.11: The relation Grades

Avg Q1
59
Table 6.12: Average of Q1 over relation Grades, Table 6.11
Example of Aggregation Consider the Grades relation in Table 6.11 (page 191).
Now AAvg Q1 (Grades) creates Table 6.12 (page 191).
We create Table 6.13 (page 191) by AMin Q1, Min Q2, Min Q3 (Grades).
Table 6.13: Min of each quiz from relation Grades, Table 6.11
Consider the names of the attributes in the result Table 6.13, created by
AMin Q1, Min Q2, Min Q3 (Grades). We have simply concatenated the name of
the aggregation function and the attribute in forming those names. In general,
the user specifies what names to assign to each resulting attribute; we do not
develop the notation for such specification here.
Example of Grades, contd. Compute the average score in each quiz for each
department. We write Dept AAvg Q1, Avg Q2, Avg Q3 (Grades) to get Table 6.14
(page 192). Count the number of students in each department whose total score
exceeds 170: Dept ACount Student Id ⟨σQ1+Q2+Q3>170 (Grades)⟩.
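Grouped aggregation can be sketched in Haskell by sorting on the grouping attribute and aggregating within each group. A toy rendering of Dept AAvg Q1 (Grades), with the relation simplified to (Dept, Q1) pairs:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- average of Q1 per department
avgQ1ByDept :: [(String, Double)] -> [(String, Double)]
avgQ1ByDept rows =
  [ (dept, sum qs / fromIntegral (length qs))
  | grp <- groupBy ((==) `on` fst) (sortOn fst rows)  -- one group per dept
  , let dept = fst (head grp)
        qs   = map snd grp ]
```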
Table 6.14: Avg of each quiz by department from relation Grades, Table 6.11
String Matching
7.1 Introduction
In this chapter, we study a number of algorithms on strings, principally, string
matching algorithms. The problem of string matching is to locate all (or some)
occurrences of a given pattern string within a given text string. There are many
variations of this basic problem. The pattern may be a set of strings, and the
matching algorithm has to locate the occurrence of any pattern in the text.
The pattern may be a regular expression for which the “best” match has to be
found. The text may consist of a set of strings if, for instance, you are trying
to find the occurrence of “to be or not to be” in the works of Shakespeare. In
some situations the text string is fixed, but the pattern changes, as in searching
Shakespeare’s works. Quite often, the goal is not to find an exact match but a
close enough match, as in DNA sequences or Google searches.
The string matching problem is quite different from dictionary or database
search. In dictionary search, you are asked to determine if a given word belongs
to a set of words. Usually, the set of words —the dictionary— is fixed. A
hashing algorithm suffices in most cases for such problems. Database searches
can be more complex than exact matches over strings. The database entries
may be images (say, thumbprints), distances among cities, positions of vehicles
in a fleet or salaries of individuals. A query may involve satisfying a predicate,
e.g., find any “hospital that is within 10 miles of a specific vehicle and determine
the shortest path to it”.
We spend most of this chapter on the exact string matching problem: given
a text string t and a pattern string p over some alphabet, construct a list of
positions where p occurs within t. See Table 7.1 for an example.
The naive algorithm for this problem matches the pattern against the string
starting at every possible position in the text. This may take O(m × n) time
where m and n are the two string lengths. We show three different algorithms
all of which run much faster, and one is an O(m + n) algorithm.
index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
text a g c t t a c g a a c g t a a c g a
pattern a a c g
output * *
Exercise 83
You are given strings x and y of equal length, and asked to determine if x is a
rotation of y. Solve the problem through string matching. 2
Notation Let the text be t and the pattern be p. The symbols in a string
are indexed starting at 0. We write t[i] for the ith symbol of t, and t[i..j] for
the substring of t starting at i and ending just before j, i ≤ j. Therefore, the
length of t[i..j] is j − i; it is empty if i = j. Similar conventions apply to p.
For a string r, write |r| for its length. Henceforth, the length of the pattern p,
|p|, is m; so, its elements are indexed 0 through m − 1. Text t is an infinite string.
This assumption is made so that we do not have to worry about terminations
of algorithms; we simply show that every substring in t that matches p will be
found ultimately.
we may discard string s if val (p) ≠ val (s). We illustrate the procedure for strings
of 5 decimal digits where val returns the sum of the digits in its argument string.
Let p = 27681; so, val (p) = 24. Consider the text given in Table 7.2; the
function values are also shown there (at the position at which a string ends).
There are two strings for which the function value is 24, namely 27681 and
19833. We compare each of these strings against the pattern, 27681, to
find that there is one exact match.
Function val is similar to a hash function. It is used to remove most strings
from consideration. Only when val (s) = val (p) do we have a collision, and we
match s against p. As in hashing, we require that there should be very few col-
lisions on the average; moreover, val should be easily computable incrementally,
i.e., from one string to the next.
text 2 4 1 5 7 2 7 6 8 1 9 8 3 3 7 8 1 4
val (s) 19 19 22 27 30 24 31 32 29 24 30 29 22 23
Minimizing Collisions A function like val partitions the set of strings into
equivalence classes: two strings are in the same equivalence class if their function
values are identical. Strings in the same equivalence class cause collisions, as
in the case of 27681 and 19833, shown above. In order to reduce collisions, we
strive to make the equivalence classes equal in size. Then, the probability of
collision is 1/n, where n is the number of possible values of val .
For the function that sums the digits of a 5-digit number, the possible values
range from 0 (all digits are 0s) to 45 (all digits are 9s). But the 46 equiva-
lence classes are not equal in size. Note that val (s) = 0 iff s = 00000; thus if
you are searching for pattern 00000 you will never have a collision. However,
val (s) = 24 for 5875 different 5-digit strings. So, the probability of collision
is around 0.05875 (since there are 10^5 5-digit strings). If there had been an
even distribution among the 46 equivalence classes, the probability of collision
would have been 1/46, or around 0.0217, nearly three times smaller than the
probability when val (s) = 24.
One way of distributing the numbers evenly is to let val (s) = s mod q, for
some q; we will choose q to be a prime, for efficient computation. Since the
number of 5-digit strings may not be a multiple of q, the distribution may not
be completely even, but no two classes differ by more than 1 in their sizes. So,
this is as good as it gets.
Examples
(20 + 5) mod 3 = ((20 mod 3) + 5) mod 3
((x × y) + g) mod p = (((x mod p) × y) + (g mod p)) mod p
x^n mod p = (x mod p)^n mod p
x^{2n} mod p = (x^2)^n mod p = (x^2 mod p)^n mod p
x^n mod p = x^{n mod p} mod p, is wrong. ∎
We use these rules to compute rb mod q.
rb mod q
=   {rb = (ar − a × 10^{n−1}) × 10 + b}
((ar − a × 10^{n−1}) × 10 + b) mod q
=   {replace ar and 10^{n−1} by their values mod q}
(((ar mod q) − a × (10^{n−1} mod q)) × 10 + b) mod q
=   {let u = ar mod q and f = 10^{n−1} mod q, both already computed}
((u − a × f) × 10 + b) mod q
Example Let q be 47. Suppose we have computed 12768 mod 47, which is
31, and also 10^4 mod 47, which is 36. We compute 27687 mod 47 by
((31 − 1 × 36) × 10 + 7) mod 47 = (−43) mod 47 = 4. ∎
Exercise 84
Show that the equivalence classes under mod q are almost equal in size. ∎
Exercise 85
Derive a general formula for incremental calculation when the alphabet has d
symbols, so that each string can be regarded as a d-ary number. ∎
Note that in the third case, r is not changed; so, none of the symbols in
t[l′..r] will be scanned again.
The question (in the third case) is, what is l′? Abbreviate t[l..r] by v and
t[l′..r] by u. We show below that u is a proper prefix and a proper suffix of v.
Thus, l′ is given by the longest u that is both a proper prefix and a proper suffix
of v.
From the invariant, v is a prefix of p. Also, from the invariant, u is a prefix
of p, and, since l′ > l, u is a shorter prefix than v. Therefore, u is a proper
prefix of v. Next, since their right ends match, t[l′..r] is a proper suffix of t[l..r],
i.e., u is a proper suffix of v.
We describe the algorithm in schematic form in Table 7.3. Here, we have
already matched the prefix “axbcyaxb”, which is v. There is a mismatch in
the next symbol. We shift the pattern so that the prefix “axb”, which is u, is
aligned with a portion of the text that matches it.
index     l         l′    r
text      a x b c y a x b z - - - - - -
pattern   a x b c y a x b t s
newmatch            a x b c y a x b t s
In general, there may be many strings, u, which are both proper prefix and
suffix of v; in particular, the empty string satisfies this condition for any v.
Which u should we choose? Any u we choose could possibly lead to a match,
because we have not scanned beyond t[r]. So, we increment l by the minimum
required amount, i.e., u is the longest string that is both a proper prefix and
suffix of v; we call u the core of v.
The question of computing l′ then reduces to the following problem: given a
string v, find its core. Then l′ = l + (length of v) − (length of core of v). Since v
is a prefix of p, we precompute the cores of all prefixes of the pattern, so that we
may compute l′ whenever there is a failure in the match. In the next subsection
we develop a linear algorithm to compute cores of all prefixes of the pattern.
After the pattern has been completely matched, we record this fact and let
l′ = l + (length of p) − (length of core of p).
We show that KMP runs in linear time. Observe that l + r increases in each
step (in the last case, l′ > l). Both l and r are bounded by the length of the
text string; so the number of steps is bounded by a linear function of the length
of text. The core computation, in the next section, is linear in the size of the
pattern. So, the whole algorithm is linear.
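The scan described above can be sketched as follows. This is our Python rendering, not the book's program; the core lengths are computed here by brute force, while the linear-time computation is the subject of the next subsection.

```python
def core_len(v):
    """Length of the longest string that is both a proper prefix and a proper suffix of v."""
    for k in range(len(v) - 1, -1, -1):
        if v[:k] == v[len(v) - k:]:
            return k
    return 0

def kmp_matches(p, t):
    m = len(p)
    d = [0] + [core_len(p[:k]) for k in range(1, m + 1)]  # d[k] = |core of p[0..k]|
    matches, l, r = [], 0, 0
    while r < len(t):
        k = r - l                      # invariant: t[l..r) = p[0..k), a prefix of p
        if t[r] == p[k]:               # extend the match
            r += 1
            if r - l == m:             # whole pattern matched
                matches.append(l)
                l += m - d[m]          # shift by |p| - |core(p)|
        elif k == 0:                   # mismatch with nothing matched
            l += 1; r += 1
        else:                          # mismatch: align the core of the matched part
            l += k - d[k]
    return matches

print(kmp_matches("abab", "abababab"))   # [0, 2, 4]
```

Note that r never decreases and l increases on every mismatch or full match, which is the linear-time argument made in the text.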
7.3. KNUTH-MORRIS-PRATT ALGORITHM 199
Exercise 86
Find all strings u such that u ≺ ababab. ∎
The following properties of ⊑ follow from the properties of the prefix relation;
you are expected to develop the proofs. Henceforth, u and v denote arbitrary
strings and ε is the empty string.
Exercise 87
1. ε ⊑ u.
2. ⊑ is a partial order. Use the fact that the prefix relation is a partial order.
3. (u ⊑ v ∧ w ⊑ v) ⇒ (u ⊑ w ∨ w ⊑ u) ∎
It follows, by replacing u with c(v), that c(v) ≺ v. In particular, |c(v)| < |v|. Also,
every non-empty string has a unique core. To see this, let r and s be cores of
v. We show that r = s. For any u, u ⊑ c(v) ≡ u ≺ v. Using r and s for c(v),
we get
u ⊑ r ≡ u ≺ v and u ⊑ s ≡ u ≺ v
Instantiating u with s in the first equivalence and with r in the second, and
noting that r ≺ v and s ≺ v, we get s ⊑ r and r ⊑ s; since ⊑ is antisymmetric,
r = s.
Exercise 88
Let u be a longer string than v. Is c(u) necessarily longer than c(v)? ∎
Exercise 89
Show that the core function is monotonic, that is,
u ⊑ v ⇒ c(u) ⊑ c(v) ∎
We write c^i(v) for the i-fold application of c to v, i.e., c^i(v) = c(c(..(c(v))..)),
c applied i times, and c^0(v) = v. Since |c(v)| < |v|, c^i(v) is defined only for
some i, not necessarily all i, in the range 0 ≤ i ≤ |v|. Note that
c^{i+1}(v) ≺ c^i(v) ≺ . . . ≺ c^1(v) ≺ c^0(v) = v.
Exercise 90
Compute c^i(ababab) for all possible i. What is c^i((ab)^n), for any i, i ≤ n? ∎
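Iterated cores are easy to explore with a brute-force sketch (ours): core computes c directly from the definition, and iterated_cores lists the chain c^0(v), c^1(v), . . . down to the empty string.

```python
def core(v):
    """Longest string that is both a proper prefix and a proper suffix of v (v non-empty)."""
    assert v != ""
    for k in range(len(v) - 1, -1, -1):
        if v[:k] == v[len(v) - k:]:
            return v[:k]

def iterated_cores(v):
    """The chain c^0(v), c^1(v), ..., ending at the empty string."""
    chain = [v]
    while chain[-1] != "":
        chain.append(core(chain[-1]))
    return chain

print(iterated_cores("aabaa"))   # ['aabaa', 'aa', 'a', '']
```

By the result proved next, the strings in this chain are exactly the strings u with u ⊑ v.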
• |v| = 0:
u ⊑ v
≡   {|v| = 0, i.e., v = ε}
u = ε ∧ v = ε
≡   {definition of c^0; for v = ε, c^i(v) is defined for i = 0 only}
⟨∃i : 0 ≤ i : u = c^i(v)⟩
• |v| > 0:
u ⊑ v
≡   {definition of ⊑}
u = v ∨ u ≺ v
≡   {definition of core}
u = v ∨ u ⊑ c(v)
≡   {|c(v)| < |v|; apply induction hypothesis on the second term}
u = v ∨ ⟨∃i : 0 ≤ i : u = c^i(c(v))⟩
≡   {rewrite}
u = c^0(v) ∨ ⟨∃i : 0 < i : u = c^i(v)⟩
≡   {rewrite}
⟨∃i : 0 ≤ i : u = c^i(v)⟩ ∎
Example In Figure 7.1, u is a string and s is the symbol following it. The
prefixes of u ending at A, B, C are c^1(u), c^2(u) and c^3(u), and the symbols
following them are a, b, c, respectively. Here C is the empty string. To compute
the core of us, match s against a; in case of failure, match s against b, and again
in case of failure, match s against c.
202 CHAPTER 7. STRING MATCHING
[Figure 7.1: the string u followed by the symbol s; A, B, C mark the right ends
of the prefixes c^1(u), c^2(u), c^3(u), which are followed by the symbols a, b, c.]
We compute the cores of pattern p using the scheme of Section 7.3.3.3. The
core for the empty prefix is undefined. For a prefix of length 1, the core is ε.
Next, suppose we have computed the cores of all prefixes of p up to length j;
we then compute the core of the next longer prefix, of length j + 1, using the
scheme of Section 7.3.3.3.
First, let us decide how to store the cores. Any core of a prefix of p is a prefix
of p. So, we can simply store the length of the core. We store the cores in array
d, where d[k], for k > 0, is the length of the core of p[0..k], i.e., d[k] = |c(p[0..k])|.
(The prefix of length 0 is ε, which does not have a core.)
The following program has the invariant that the cores of all prefixes up to
and including p[0..j] are known; let u be p[0..j]. The goal of the program is to
next compute the core of us where s = p[j]. To this end, we apply the scheme
of Section 7.3.3.3, whereby we successively check if u[|c^1(u)|] = s, u[|c^2(u)|] = s,
· · · . Suppose u[|c^k(u)|] ≠ s for all k, 0 < k < t. We next have to check if
u[|c^t(u)|] = s. Let i = |c^t(u)|; so the check is whether u[i] = s, or, since u is a
prefix of p, whether p[i] = s, or, since p[j] = s, whether p[i] = p[j]. If this check
succeeds, we have found the core of us, i.e., of p[0..j + 1]; it is simply p[0..i + 1],
or d[j + 1] = i + 1. If the check fails, we have to set
i := |c^{t+1}(u)| = |c(c^t(u))| = |c(p[0..i])| = d[i].
However, if i = 0 then d[i] is not defined, and we conclude that the core of
p[0..j + 1] is ε, i.e., d[j + 1] = 0.
In the program below, b → C, where b is a predicate and C a sequence of
statements, is known as a guarded command; b is the guard and C the command.
Command C is executed only if b holds. Exactly one guard is true in any
iteration.
j := 1; d[1] := 0; i := 0;
while j < |p| do
    S1 :: p[i] = p[j] → d[j + 1] := i + 1; j := j + 1; i := d[j]
    S2 :: p[i] ≠ p[j] ∧ i ≠ 0 → i := d[i]
    S3 :: p[i] ≠ p[j] ∧ i = 0 → d[j + 1] := 0; j := j + 1; i := d[j] {i = 0}
endwhile
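The guarded-command program has a direct transcription in Python (our sketch): d[k] is the length of the core of the prefix p[0..k], for 1 ≤ k ≤ |p|, and d[0] is unused since the empty prefix has no core.

```python
def core_lengths(p):
    """d[k] = length of the core of p[0..k] (the first k symbols), 1 <= k <= len(p)."""
    m = len(p)
    d = [0] * (m + 1)            # d[0] unused: the empty prefix has no core
    j, i = 1, 0                  # invariant: i = d[j]; cores known up to p[0..j]
    while j < m:
        if p[i] == p[j]:                       # S1: the core extends by one symbol
            d[j + 1] = i + 1; j += 1; i = d[j]
        elif i != 0:                           # S2: fall back to the core of p[0..i]
            i = d[i]
        else:                                  # S3: nothing extends; core is empty
            d[j + 1] = 0; j += 1; i = d[j]
    return d

print(core_lengths("ababab"))    # [0, 0, 0, 1, 2, 3, 4]
```

For "ababab", d[6] = 4 corresponds to the core "abab" of the full pattern.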
• Proof for S1 :
{2j − i = n} d[j + 1] := i + 1; j := j + 1; i := d[j] {2j − i > n}
, goal
{2j − i = n} d[j + 1] := i + 1; j := j + 1 {2j − d[j] > n}
, axiom of assignment
{2j − i = n} d[j + 1] := i + 1 {2(j + 1) − d[j + 1] > n}
, axiom of assignment
{2j − i = n} {2(j + 1) − (i + 1) > n}
, axiom of assignment
2j − i = n ⇒ 2(j + 1) − (i + 1) > n
, simplify
true , arithmetic 2
• Proof for S2 :
{2j − i = n} i := d[i] {2j − i > n}
, goal
2j − i = n ⇒ 2j − d[i] > n , axiom of assignment
2j − i < 2j − d[i] , arithmetic
d[i] < i , arithmetic
true , d[i] = |c(p[0..i])| and a core is strictly shorter than its string ∎
• Proof for S3 :
{2j − i = n} d[j + 1] := 0; j := j + 1; i := d[j] {2j − i > n}
, goal
{2j − i = n} d[j + 1] := 0; j := j + 1 {2j − d[j] > n}
, axiom of assignment
{2j − i = n} d[j + 1] := 0 {2(j + 1) − d[j + 1] > n}
, axiom of assignment
{2j − i = n} {2(j + 1) − 0 > n}
, axiom of assignment
2j − i = n ⇒ 2(j + 1) > n , simplify
true , arithmetic and i ≥ 0 (invariant) 2
Exercise 91
1. Show that you can match pattern p against text t by computing the cores
of all prefixes of pt (pt is the concatenation of p and t).
2. Define u to be the k-core of string v, where k ≥ 0 and v ≠ ε, if u ≺ v, u's
length is at most k, and u is the longest string with this property. Show
that the k-core is well-defined. Devise an algorithm to compute the k-core
of a string for a given k. ∎
l := 0;
{ Q1 }
loop
{ Q1 }
“increase l while preserving Q1”
{ Q1 }
endloop
Thus, the whole pattern is matched when j = 0, and no part has been matched
when j = m.
We establish Q2 by setting j to m. Then, we match the symbols from right
to left of the pattern (against the corresponding symbols in the alignment) until
we find a mismatch or the whole pattern is matched.
j := m;
{ Q2 }
while j > 0 ∧ p[j − 1] = t[l + j − 1] do j := j − 1 endwhile
{ Q1 ∧ Q2 ∧ (j = 0 ∨ p[j − 1] ≠ t[l + j − 1]) }
if j = 0
then { Q1 ∧ Q2 ∧ j = 0 } record a match at l; l := l′ { Q1 }
else { Q1 ∧ Q2 ∧ j > 0 ∧ p[j − 1] ≠ t[l + j − 1] } l := l″ { Q1 }
endif
{ Q1 }
Next, we show how to compute l′ and l″, l′ > l and l″ > l, so that Q1 is
satisfied. For better performance, l should be increased as much as possible in
each case. We take up the computation of l″ next; the computation of l′ is a
special case of this.
The precondition for the computation of l″ is
Q1 ∧ Q2 ∧ j > 0 ∧ p[j − 1] ≠ t[l + j − 1].
We consider two heuristics, each of which can be used to calculate a value
of l″; the greater value is assigned to l. The first heuristic, called the bad
symbol heuristic, exploits the fact that we have a mismatch at position j − 1
of the pattern. The second heuristic, called the good suffix heuristic, uses the
fact that we have matched a suffix of p with the suffix of the alignment, i.e.,
p[j..m] = t[l + j..l + m] (though the suffix may be empty).
text - - - - - - - h c e
pattern a t t e n d a n c e
align a t t e n d a n c e
The suffix “ce” has been matched; the symbols ’h’ and ’n’ do not match.
We now reason as follows. If symbol ’h’ of the text is part of a full match,
that symbol has to be aligned with an ’h’ of the pattern. There is no ’h’ in
the pattern; so, no match can include this ’h’ of the text. Hence, the pattern
may be shifted to the symbol following ’h’ in the text, as shown under align
in Table 7.5. Since the index of ’h’ in the text is l + j − 1 (that is where the
mismatch occurred), we have to align p[0] with t[l +j], i.e., l should be increased
to l + j. Observe that we have shifted the alignment several positions to the
right without scanning the text symbols shown by dashes, ’-’, in the text; this
is how the algorithm achieves sublinear running time in many cases.
Next, suppose the mismatched symbol in the text is ’t’, as shown in Table 7.6.
text - - - - - - - t c e
pattern a t t e n d a n c e
Unlike ’h’, symbol ’t’ appears in the pattern. We align some occurrence of ’t’
in the pattern with that in the text. There are two possible alignments, which
we show in Table 7.7.
text - - t c e - - - - - -
align1 a t t e n d a n c e
align2 a t t e n d a n c e
Minimum shift rule: Shift the pattern by the minimum allowable amount.
According to this rule, in Table 7.7 we would shift the pattern to get align1.
Justification for the rule: The rule preserves Q1; we never skip over a possible
match by following it, because no smaller shift can yield a match at the current
position, and hence no full match.
Conversely, consider the situation shown in Table 7.8. The first pattern line
shows an alignment where there is a mismatch at the rightmost symbol in the
alignment. The next two lines show two possible alignments that correct the
mismatch. Since the only text symbol we have examined is ’x’, each dash in
Table 7.8 could be any symbol at all; so, in particular, the text could be such
that the pattern matches against the first alignment, align1 . Then, we will
violate invariant Q1 if we shift the pattern as shown in align2 . 2
For each symbol in the alphabet, we precalculate its rightmost position in
the pattern. The rightmost ’t’ in “attendance” is at position 2. To align the
mismatched ’t’ in the text in Table 7.7 that is at position t[l+j −1], we align p[2]
with t[l + j − 1], that is, p[0] with t[l − 2 + j − 1]. In general, if the mismatched
7.4. BOYER-MOORE ALGORITHM 207
text     - - x - -
pattern  x x y
align1     x x y
align2       x x y
symbol’s rightmost occurrence in the pattern is at p[k], then p[0] is aligned with
t[l − k + j − 1], or l is increased by −k + j − 1. For a nonexistent symbol in the
pattern, like ’h’, we set its rightmost occurrence to −1 so that l is increased to
l + j, as required.
The quantity −k + j − 1 is negative if k > j − 1. That is, the rightmost
occurrence of the mismatched symbol in the pattern is to the right of the mis-
match. Fortunately, the good suffix heuristic, which we discuss in Section 7.4.3,
always yields a positive increment for l; so, we ignore this heuristic if it yields a
negative increment.
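The bad-symbol heuristic reduces to a small table. This sketch (ours) precomputes the rightmost position of each symbol in the pattern, with −1 for absent symbols, and returns the resulting increment j − 1 − rt(h) for l, which may be negative.

```python
def rightmost(p):
    """Rightmost position in p of each symbol appearing in p."""
    rt = {}
    for k, c in enumerate(p):
        rt[c] = k                       # later occurrences overwrite earlier ones
    return rt

def bad_symbol_shift(p, j, h, rt):
    """Increment for l when p[j-1] mismatches text symbol h; may be negative."""
    return j - 1 - rt.get(h, -1)        # absent symbols behave as position -1

p = "attendance"
rt = rightmost(p)
print(rt['t'], bad_symbol_shift(p, 8, 'h', rt))   # 2 8
```

With the matched suffix "ce" (j = 8), the absent symbol 'h' gives the increment j = 8, and the present symbol 't' gives 8 − 1 − 2 = 5, matching the shifts discussed above.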
text - - - - - z a b - -
pattern a b x a b y a b
Then, we shift the pattern to the right so that the matched part is occupied
by the same symbols, “ab”; this is possible only if there is another occurrence
of “ab” in the pattern. For the pattern of Table 7.9, we can form the new
alignment in two possible ways, as shown in Table 7.10.
text     - - z a b - - - - - -
align1   a b x a b y a b
align2         a b x a b y a b
text     - - x a b - - -
pattern  a b x a b
align          a b x a b
As shown in the preceding examples, in all cases we shift the pattern to align
the right end of a proper prefix r with the right end of the previous alignment.
Also, r is a suffix of s or s is a suffix of r. In the example in Table 7.10, s is
"ab" and there are two possible r, "abxab" and "ab", for which s is a suffix.
Additionally, ε is a suffix of s. In Table 7.11, s is "xab" and there is exactly one
nonempty r, "ab", which is a suffix of s. Let
R = {r | r is a proper prefix of p ∧
         (r is a suffix of s ∨ s is a suffix of r)}
The good suffix heuristic aligns an r in R with the end of the previous
alignment, i.e., the pattern is shifted to the right by m − |r|. Let b(s) be the
amount by which the pattern should be shifted for a suffix s. According to the
minimum shift rule,
b(s) = min{m − |r| | r ∈ R}
In the rest of this section, we develop an efficient algorithm for computing b(s).
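Before developing the efficient algorithm, b(s) can be computed directly from the definition of R; the following reference implementation (ours, quadratic and for checking only) does exactly that.

```python
def good_suffix_shift(p, s):
    """b(s) = min{ m - |r| : r a proper prefix of p, and
                   r is a suffix of s or s is a suffix of r }."""
    m = len(p)
    R = [p[:k] for k in range(m)                       # proper prefixes of p
         if s.endswith(p[:k]) or p[:k].endswith(s)]
    return min(m - len(r) for r in R)                  # R contains "", so nonempty

print(good_suffix_shift("abxabyab", "ab"))   # 3
```

For the pattern "abxabyab" with matched suffix s = "ab", R = {ε, "ab", "abxab"}, so b(s) = 8 − 5 = 3, the shift of align1 in Table 7.10.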
l := l′ is implemented by
    l := l + b(p), and
l := l″ is implemented by
    l := l + max(b(s), j − 1 − rt(h)),
    where s = p[j..m] and h = t[l + j − 1]
b(s) = min{m − |r| | r ∈ R}
Then,
P2: c(p) ∈ R
Proof: From the definition of core, c(p) ≺ p. Hence, c(p) is a proper prefix of p.
Also, c(p) is a suffix of p, and, since s is a suffix of p, they are totally ordered.
So, either c(p) is a suffix of s or s is a suffix of c(p). Hence, c(p) ∈ R. 2
Recall that
V = {v | v is a suffix of p ∧ c(v) = s}
The program
The goal of the concrete program is to compute an array e, where e[j] is the
amount by which the pattern is to be shifted when the matched suffix is p[j..m],
0 ≤ j ≤ m. That is, e[j] = b(p[j..m]).
We have no need to keep explicit prefixes and suffixes; instead, we keep their
lengths: |s| in i and |v| in j. Let array f hold the lengths of the cores of all
suffixes of p. Summarizing, for suffixes s and v of p,
i = |s|,
j = |v|,
e[m − i] = b(s), using i = |s|,
f[|v|] = |c(v)|, i.e., f[j] = |c(v)|
Since our goal is to compute the lengths of the cores c(v) of the suffixes v of p,
we compute the cores of the reversed strings instead, i.e., the lengths of the
cores of the prefixes of p̂, and store them in f.
Exercise 92
Show that
1. r ⊑ s ≡ r̂ ⊑ ŝ
2. r ≺ s ≡ r̂ ≺ ŝ
3. c(r̂) is the reverse of c(r) ∎
Solution
1. r ⊑ s
≡   {definition of ⊑}
r is a prefix of s and r is a suffix of s
≡   {properties of prefix, suffix and reverse}
r̂ is a suffix of ŝ and r̂ is a prefix of ŝ
≡   {definition of ⊑}
r̂ ⊑ ŝ
2. Similarly.
3. Indirect proof of equality is a powerful method for proving equality. It
can be applied to elements of a set that has a reflexive and antisymmetric
relation, like ⊑. To prove y = z for specific elements y and z, show that
for every element x,
x ⊑ y ≡ x ⊑ z.
For arbitrary s,
s ⊑ c(r̂)
≡   {definition of core}
s ≺ r̂
≡   {second part of this exercise}
ŝ ≺ r
≡   {definition of core}
ŝ ⊑ c(r)
≡   {first part of this exercise}
s ⊑ the reverse of c(r) ∎
Parallel Recursion
8.2 Powerlist
The basic data structure on which recursion is employed (in LISP[37] or ML[38])
is a list. A list is either empty or it is constructed by concatenating an element
to a list. (We restrict ourselves to finite lists throughout this paper.) We call
such a list linear (because the list length grows by 1 as a result of applying
the basic constructor). Such a list structure seems unsuitable for expressing
parallel algorithms succinctly; an algorithm that processes the list elements has
to describe how successive elements of the list are processed.
1. A notable exception is the recursive description of a prefix sum algorithm in [26].
216 CHAPTER 8. PARALLEL RECURSION
⟨0⟩ | ⟨1⟩ = ⟨0 1⟩
⟨0⟩ ./ ⟨1⟩ = ⟨0 1⟩
⟨0 1⟩ | ⟨2 3⟩ = ⟨0 1 2 3⟩
⟨0 1⟩ ./ ⟨2 3⟩ = ⟨0 2 1 3⟩
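The two constructors, tie (|) and zip (./), can be modeled concretely; in this sketch of ours, a powerlist is a Python list of length 2^n, with construction and deconstruction functions (names ours).

```python
def tie(u, v):                   # u | v : concatenation
    return u + v

def zip_(u, v):                  # u ./ v : interleaving
    out = []
    for x, y in zip(u, v):
        out += [x, y]
    return out

def untie(l):                    # deconstruct l as u | v (l of length >= 2)
    n = len(l) // 2
    return l[:n], l[n:]

def unzip(l):                    # deconstruct l as u ./ v
    return l[0::2], l[1::2]

print(zip_([0, 1], [2, 3]))      # [0, 2, 1, 3]
```

Every non-singleton list deconstructs uniquely under either operator, which is the law of unique deconstruction used later.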
8.2.1 Definitions
A data item from the linear list theory will be called a scalar. (Typical scalars
are the items of base types—integer, boolean, etc.—tuples of scalars, functions
from scalars to scalars and linear lists of scalars.) Scalars are uninterpreted in
our theory. We merely assume that scalars can be checked for type compatibility.
We will use several standard operations on scalars for purposes of illustration.
⟨S⟩ or ⟨P⟩ or u | v or u ./ v
Examples
8.2. POWERLIST 217
rev⟨x⟩ = ⟨x⟩
rev(p | q) = (rev q) | (rev p)
The case analysis, as for linear lists, is based on the length of the argument
powerlist. We adopt the pattern matching scheme of ML[38] and Miranda[52]2
to deconstruct the argument list into its components, p and q, in the recursive
case. Deconstruction, in general, uses the operators | and ./ ; see Section 8.3.
In the definition of rev, we have used | for deconstruction; we could have used
./ instead and defined rev in the recursive case by
rev(p ./ q) = (rev q) ./ (rev p)
It can be shown, using the laws in Section 8.3, that the two proposed definitions
of rev are equivalent and that
rev(rev P ) = P
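In the list encoding sketched above (a Python list of length 2^n, our convention), rev transcribes directly from its recursive definition with | :

```python
def rev(p):
    """Reverse a powerlist, following rev(u | v) = (rev v) | (rev u)."""
    if len(p) == 1:
        return p
    n = len(p) // 2
    u, v = p[:n], p[n:]          # deconstruct p as u | v
    return rev(v) + rev(u)       # (rev v) | (rev u)

print(rev([0, 1, 2, 3]))         # [3, 2, 1, 0]
```

The property rev(rev P) = P is immediate to check on examples, and provable by the induction principle of Section 8.3.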
Scalar Functions
Operations on scalars are outside our theory. Some of the examples in this
paper, however, use scalar functions, particularly, addition and multiplication
(over complex numbers) and cons over linear lists. A scalar function, f , has zero
or more scalars as arguments and its value is a scalar. We coerce the application
of f to a powerlist by applying f “pointwise” to the elements of the powerlist.
For a scalar function f of one argument we define
f⟨x⟩ = ⟨f x⟩
f(p | q) = (f p) | (f q)
f(p ./ q) = (f p) ./ (f q)
Similarly, a binary scalar operator ⊕ is applied pointwise to similar powerlists:
⟨x⟩ ⊕ ⟨y⟩ = ⟨x ⊕ y⟩
(p | q) ⊕ (u | v) = (p ⊕ u) | (q ⊕ v)
(p ./ q) ⊕ (u ./ v) = (p ⊕ u) ./ (q ⊕ v)
8.2.3 Discussion
The base case of a powerlist is a singleton list, not an empty list. Empty lists (or,
equivalent data structures) do not arise in the applications we have considered.
For instance, in matrix algorithms the base case is a 1 × 1 matrix rather than an
empty matrix, Fourier transform is defined for a singleton list (not the empty
list) and the smallest hypercube has one node.
The recursive definition of a powerlist says that a powerlist is either of the
form u ./ v or u | v. In fact, every non-singleton powerlist can be written in
either form in a unique manner (see Laws in Section 8.3). A simple way to view
p | q = L is that if the elements of L are indexed by n-bit strings in increasing
numerical order (where the length of L is 2n ) then p is the sublist of elements
whose highest bit of the index is 0 and q is the sublist with 1 in the highest bit
of the index. Similarly, if u ./ v = L then u is the sublist of elements whose
lowest bit of the index is 0 and v’s elements have 1 as the lowest bit of the index.
At first, it may seem strange to allow two different ways for constructing the
same list—using tie or zip. As we see in this paper this causes no difficulty, and
further, this flexibility is essential because many parallel algorithms—the Fast
Fourier Transform being the most prominent—exploit both forms of construc-
tion.
We have restricted u, v in u | v and u ./ v to be similar. This restriction
allows us to process a powerlist by recursive divide and conquer, where each
division yields two halves that can be processed in parallel, by employing the
same algorithm. (Square matrices, for instance, are often processed by quarter-
ing them. We will show how quartering, or quadrupling, can be expressed in
our theory.) The similarity restriction allows us to define complete binary trees,
hypercubes and square matrices that are not “free” structures.
The length of a powerlist is a power of 2. This restricts our theory somewhat.
It is possible to design a more general theory eliminating this constraint; we
sketch an outline in Section 8.6.
8.3 Laws
L0. For singleton powerlists ⟨x⟩, ⟨y⟩
⟨x⟩ | ⟨y⟩ = ⟨x⟩ ./ ⟨y⟩
Inductive Proofs
Most proofs on powerlists are by induction on the length, depth or shape of the
list. The length, len, of a powerlist is the number of elements in it. Since the
length of a powerlist is a power of 2, the logarithmic length, lgl, is a more useful
measure. Formally,
lgl⟨x⟩ = 0
lgl(u | v) = 1 + (lgl u)
The depth of a powerlist is the number of “levels” in it.
depth⟨S⟩ = 0
depth⟨P⟩ = 1 + (depth P)
depth(u | v) = depth u
(In the last case, since u, v are similar powerlists they have the same depth.)
Most inductive proofs on powerlists order them lexicographically on the pair
(depth, logarithmic length). For instance, to prove that a property Π holds for
all powerlists, it is sufficient to prove
Π⟨S⟩, and
Π P ⇒ Π⟨P⟩, and
(Π u) ∧ (Π v) ∧ (u, v) similar ⇒ Π(u | v)
The last proof step could be replaced by
(Π u) ∧ (Π v) ∧ (u, v) similar ⇒ Π(u ./ v)
The shape of a powerlist P is a sequence of natural numbers n0, n1, . . . , nd where
d is the depth of P and
n0 is the logarithmic length of P,
n1 is the logarithmic length of (any) element of P, say r,
n2 is the logarithmic length of any element of r, and so on.
8.4. EXAMPLES 221
8.4 Examples
We show a few small algorithms on powerlists. These include such well-known
examples as the Fast Fourier Transform and Batcher sorting schemes. We re-
strict the discussion in this section to simple (unnested) powerlists (where the
depth is 0); higher dimensional lists (and algorithms for matrices and hyper-
cubes) are taken up in a later section. Since the powerlists are unnested, induc-
tion based on length is sufficient to prove properties of these algorithms.
8.4.1 Permutations
We define a few functions that permute the elements of powerlists. The function
rev, defined in Section 8.2.2, is a permutation function. These functions appear
as components of many parallel algorithms.
Rotate
Function rr rotates a powerlist to the right by one; thus, rr⟨a b c d⟩ =
⟨d a b c⟩. Function rl rotates to the left: rl⟨a b c d⟩ = ⟨b c d a⟩.
For P = ⟨a b c d e f g h⟩ (Figure 8.2):
P’s indices rotated right = (000 100 001 101 010 110 011 111)
rs P = ⟨a c e g b d f h⟩
P’s indices rotated left = (000 010 100 110 001 011 101 111)
ls P = ⟨a e b f c g d h⟩
Rotate Index
A class of permutation functions can be defined by transformations
on the element indices. For a powerlist of 2^n elements we associate an n-bit
index with each element, where the indices are the binary representations of
0, 1, .., 2^n − 1 in sequence. (For a powerlist u | v, indices for the elements in
u have “0” as the highest bit and in v have “1” as the highest bit. In u ./ v,
similar remarks apply for the lowest bit.) Any bijection, h, mapping indices
to indices defines a permutation of the powerlist: The element with index i is
moved to the position where it has index (h i). Below, we consider two simple
index mapping functions; the corresponding permutations of powerlists are use-
ful in describing the shuffle-exchange network. Note that indices are not part
of our theory.
A function that rotates an index to the right (by one position) has the
permutation function rs (for right shuffle) associated with it. The definition
of rs may be understood as follows. The effect of rotating an index to the
right is that the lowest bit of an index becomes the highest bit; therefore, if
rs is applied to u ./ v, the elements of u—those having 0 as the lowest bit—
will occupy the first half of the resulting powerlist (because their indices have
“0” as the highest bit, after rotation); similarly, v will occupy the second half.
Analogously, the function that rotates an index to the left (by one position)
induces the permutation defined by ls (for left shuffle), below. Figure 8.2 shows
the effects of index rotations on an 8-element list.
rs⟨x⟩ = ⟨x⟩ , ls⟨x⟩ = ⟨x⟩
rs(u ./ v) = u | v , ls(u | v) = u ./ v
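In the concrete list encoding (ours), rs and ls transcribe into one-liners over slices: rs deconstructs with zip and reconstructs with tie, and ls does the reverse.

```python
def rs(p):
    """Right shuffle: rs(u ./ v) = u | v."""
    if len(p) == 1:
        return p
    return p[0::2] + p[1::2]         # unzip, then concatenate

def ls(p):
    """Left shuffle: ls(u | v) = u ./ v."""
    if len(p) == 1:
        return p
    n = len(p) // 2
    out = []
    for x, y in zip(p[:n], p[n:]):   # untie, then interleave
        out += [x, y]
    return out

print(rs(list("abcdefgh")))          # ['a', 'c', 'e', 'g', 'b', 'd', 'f', 'h']
```

Since rs undoes a zip and ls undoes a tie, the two permutations are mutual inverses.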
Inversion
The function inv is defined by the following function on indices. An element
with index b in P has index b′ in (inv P), where b′ is the reversal of the bit
string b. Thus,
inv(inv P) = P,
inv(rev P) = rev(inv P),
and for any scalar operator ⊕,
inv(P ⊕ Q) = (inv P) ⊕ (inv Q)
n = 0: ⟨[ ]⟩
n = 1: ⟨[0] [1]⟩
n = 2: ⟨[00] [01] [11] [10]⟩
n = 3: ⟨[000] [001] [011] [010] [110] [111] [101] [100]⟩
8.4.2 Reduction
In the linear list theory [5], reduction is a higher order function of two argu-
ments, an associative binary operator and a list. Reduction applied to ⊕ and
[a0 a1 . . . an] yields (a0 ⊕ a1 ⊕ . . . ⊕ an). This function over powerlists is defined
by
red ⊕ ⟨x⟩ = x
red ⊕ (p | q) = (red ⊕ p) ⊕ (red ⊕ q)
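In the list encoding (ours), red becomes a balanced divide-and-conquer fold; both halves can, in principle, be reduced in parallel.

```python
def red(op, p):
    """Reduce a powerlist with an associative binary operator op."""
    if len(p) == 1:
        return p[0]
    n = len(p) // 2
    return op(red(op, p[:n]), red(op, p[n:]))   # (red op u) op (red op v)

print(red(lambda x, y: x + y, [1, 2, 3, 4]))    # 10
```

With O(N) processors the recursion tree has depth lgl, so the reduction takes O(log N) parallel steps.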
G 0 = ⟨[ ]⟩
G (n + 1) = (0 : P) | (1 : (rev P))
    where P = (G n)
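The function G constructs the Gray-code sequences tabulated above. A sketch in our list encoding, with bit strings as Python strings and (b : P) modeled by prefixing b to every element:

```python
def rev(p):
    """Powerlist reversal; for a flat list this is ordinary reversal."""
    return list(reversed(p))

def G(n):
    """Gray code on n bits: G(n+1) = (0 : P) | (1 : rev P), P = G(n)."""
    if n == 0:
        return ['']
    P = G(n - 1)
    return ['0' + x for x in P] + ['1' + x for x in rev(P)]

print(G(2))     # ['00', '01', '11', '10']
```

Reversing P before prefixing 1 is what makes consecutive codes differ in exactly one bit across the midpoint.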
8.4.4 Polynomial
A polynomial with coefficients pj, 0 ≤ j < 2^n, where n ≥ 0, may be represented
by a powerlist p whose jth element is pj. The polynomial value at some point
ω is
    Σ_{0 ≤ j < 2^n} pj × ω^j.
For n > 0 this quantity is
    Σ_{0 ≤ j < 2^{n−1}} p_{2j} × ω^{2j} + Σ_{0 ≤ j < 2^{n−1}} p_{2j+1} × ω^{2j+1}.
⟨x⟩ ep w = ⟨x⟩
(p ./ q) ep w = (p ep w²) + (w × (q ep w²))
F T p = p ep (W p)
We collect the two equations for FT to define FFT, the Fast Fourier Trans-
form. In the following, (powers p) is the powerlist ⟨ω^0, ω^1, .., ω^{N−1}⟩ where N
is the length of p and ω is the (2 × N)th principal root of 1. This was the value
of u in the previous paragraph. The function powers can be defined similarly
to ep.
FFT⟨x⟩ = ⟨x⟩
FFT(p ./ q) = (P + u × Q) | (P − u × Q)
    where P = FFT p
          Q = FFT q
          u = powers p
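The FFT equations transcribe directly onto the list encoding. In this sketch of ours, even-indexed elements play the role of p, odd-indexed elements the role of q, and the pointwise arithmetic is written out with list comprehensions.

```python
import cmath

def powers(p):
    """<w^0, ..., w^(N-1)>, N = len(p), w a principal (2N)-th root of 1."""
    N = len(p)
    w = cmath.exp(2j * cmath.pi / (2 * N))
    return [w ** k for k in range(N)]

def fft(x):
    """FFT(p ./ q) = (P + u*Q) | (P - u*Q), transcribed onto Python lists."""
    if len(x) == 1:
        return x
    P, Q = fft(x[0::2]), fft(x[1::2])              # x = p ./ q
    u = powers(x[0::2])
    left = [a + b * c for a, b, c in zip(P, u, Q)]
    right = [a - b * c for a, b, c in zip(P, u, Q)]
    return left + right                            # tie the two halves
```

The result agrees (up to floating-point error) with evaluating the polynomial at the Mth roots of unity, M being the length of the input.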
IFT(r | s) = p ./ q
in the unknowns p, q. This form of deconstruction is chosen so that we can
easily solve the equations we generate next. Taking the FFT of both sides,
FFT(IFT(r | s)) = FFT(p ./ q)
The left side is (r | s) because IFT and FFT are inverses. Replacing the right
side by the definition of FFT(p ./ q) yields the following equations.
r | s = (P + u × Q) | (P − u × Q)
P = FFT p
Q = FFT q
u = powers p
These equations are easily solved for the unknowns P, Q, u, p, q. (The law of
unique deconstruction, L2, can be used to deduce from the first equation that
r = P + u × Q and s = P − u × Q. Also, since p and r are of the same length we
may define u using r instead of p.) The solutions of these equations yield the
following definition for IFT. Here, /2 divides each element of the given powerlist
by 2.
IFT⟨x⟩ = ⟨x⟩
IFT(r | s) = p ./ q
    where P = (r + s)/2
          u = powers r
          Q = ((r − s)/2)/u
          p = IFT P
          q = IFT Q
As in the FFT, the definition of IFT includes both constructors, | and ./ .
It can be implemented efficiently on a butterfly network. The complexity of
IFT is the same as that of the FFT.
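The derived IFT equations also transcribe directly onto the list encoding; this sketch (ours) mirrors the definition above, with r and s the two halves of the argument.

```python
import cmath

def powers(r):
    """<w^0, ..., w^(N-1)>, N = len(r), w a principal (2N)-th root of 1."""
    N = len(r)
    w = cmath.exp(2j * cmath.pi / (2 * N))
    return [w ** k for k in range(N)]

def ift(x):
    """IFT(r | s) = p ./ q with P = (r+s)/2, Q = ((r-s)/2)/u, u = powers r."""
    if len(x) == 1:
        return x
    n = len(x) // 2
    r, s = x[:n], x[n:]                            # deconstruct x as r | s
    P = [(a + b) / 2 for a, b in zip(r, s)]
    u = powers(r)
    Q = [(a - b) / 2 / c for a, b, c in zip(r, s, u)]
    p, q = ift(P), ift(Q)
    out = []
    for a, b in zip(p, q):                         # reconstruct as p ./ q
        out += [a, b]
    return out
```

Applying the forward transform (evaluation at the Mth roots of unity) to ift(y) recovers y, which is the inverse property the derivation guarantees.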
sort⟨x⟩ = ⟨x⟩
sort(p ./ q) = (sort p) merge (sort q)
where merge (written as a binary infix operator) creates a single sorted powerlist
out of the elements of its two argument powerlists each of which is sorted. In
this section, we show two different methods for implementing merge. One
scheme is Batcher merge, given by the operator bm. Another scheme is given
by bitonic sort where the sorted lists u, v are merged by applying the function
bi to (u | (rev v)).
A comparison operator, ↕, is used in these algorithms. The operator is
applied to a pair of equal length powerlists, p, q; it creates a single powerlist out
of the elements of p, q by
p ↕ q = (p min q) ./ (p max q)
That is, the (2i)th and (2i + 1)th items of p ↕ q are (p_i min q_i) and (p_i max q_i),
respectively. The powerlist p ↕ q can be computed in constant time using
O(len p) processors.
Bitonic Sort
A sequence of numbers, x_0, x_1, .., x_i, .., x_N, is bitonic if there is an index
i, 0 ≤ i ≤ N, such that x_0, x_1, .., x_i is monotonic (ascending or descending)
and x_i, .., x_N is monotonic. The function bi, given below, applied to a bitonic
powerlist returns a sorted powerlist of the original items.
bi⟨x⟩ = ⟨x⟩
bi(p ./ q) = (bi p) ↕ (bi q)
For sorted powerlists u, v, the powerlist (u | (rev v)) is bitonic; thus u, v can
be merged by applying bi to (u | (rev v)). The form of the recursive definition
suggests that bi can be implemented on O(N ) processors in O(log N ) parallel
steps, where N is the length of the argument powerlist.
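The whole sorting scheme fits in a few lines of our list encoding: the comparison operator, bi for bitonic input, merge via bi(u | rev v), and the top-level sort.

```python
def compare(p, q):
    """p min/max q interleaved: (p min q) ./ (p max q)."""
    out = []
    for a, b in zip(p, q):
        out += [min(a, b), max(a, b)]
    return out

def bi(p):
    """Sort a bitonic powerlist: bi(p ./ q) = (bi p) compare (bi q)."""
    if len(p) == 1:
        return p
    return compare(bi(p[0::2]), bi(p[1::2]))

def merge(u, v):
    """Merge sorted u, v by applying bi to u | rev v (a bitonic list)."""
    return bi(u + v[::-1])

def sort(p):
    if len(p) == 1:
        return p
    return merge(sort(p[0::2]), sort(p[1::2]))

print(sort([3, 7, 1, 4, 8, 5, 2, 6]))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The recursion depth of bi is log N, matching the O(log N) parallel steps claimed above.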
Batcher Merge
Batcher has also proposed a scheme for merging two sorted lists. We define
this scheme, bm, as an infix operator below.
Powerlists containing only 0’s and 1’s have the following properties.
The simplicity of (A2), compared with (A2′), may suggest why ./ is the pri-
mary operator in parallel sorting. ∎
The following results, (B1, B2), are easy to prove. We prove (B3).
B2. z(p ↕ q) = zp + zq ∎
B4. z(bi p) = zp
Induction: Let L = p ./ q
p ./ q bitonic
⇒ {A3}
p bitonic, q bitonic, |zp − zq| ≤ 1
⇒ {induction on p and q}
(bi p) sorted, (bi q) sorted, |zp − zq| ≤ 1
⇒ {from B4: z(bi p) = zp, z(bi q) = zq}
(bi p) sorted, (bi q) sorted, |z(bi p) − z(bi q)| ≤ 1
⇒ {apply B3 with (bi p), (bi q) for p, q}
(bi p) ↕ (bi q) sorted
⇒ {definition of bi}
bi(p ./ q) sorted
B6. p bm q = bi(p | (rev q)), where rev reverses a powerlist (Section 8.2.2).
If p, q are sorted then p | (rev q) is bitonic (a fact that we don’t prove here).
Then, from the correctness of bi it follows that bi(p | (rev q)), and hence
p bm q, is sorted.
bi(⟨x⟩ | rev⟨y⟩)
=   {definition of rev}
bi(⟨x⟩ | ⟨y⟩)
=   {(⟨x⟩ | ⟨y⟩) = (⟨x⟩ ./ ⟨y⟩)}
bi(⟨x⟩ ./ ⟨y⟩)
=   {definition of bi}
⟨x⟩ ↕ ⟨y⟩
=   {definition of bm}
⟨x⟩ bm ⟨y⟩
Induction: Let p, q = r ./ s, u ./ v
Figure 8.4: A network to compute the prefix sum of 8 elements.
that is, in (ps L) the element with index i, i > 0, is obtained by applying ⊕ to
the first (i + 1) elements of L in order. We will give a formal definition of prefix
sum later in this section.
Prefix sum is of fundamental importance in parallel computing. We show
that two known algorithms for this problem can be concisely represented and
proved in our theory. Again, zip turns out to be the primary operator for
describing these algorithms.
A particularly simple scheme for prefix sum of 8 elements is shown in Fig-
ure 8.4. In that figure, the numbered nodes represent processors, though the
same 8 physical processors are used at all levels. Initially, processor i holds the
list element Li , for all i. The connections among the processors at different
levels depict data transmissions. In level 0, each processor, from 0 through 6,
sends its data to its right neighbor. In level i, processor j sends its data
to processor (j + 2^i), if such a processor exists (this means that for j < 2^i,
processor j receives no data in level i’s data transmission). Each processor
updates its own
data, d, to r ⊕d where r is the data it receives; if it receives no data in some level
then d is unchanged. It can be shown that after completion of the computation
at level (log2 (len L)), processor i holds the ith element of (ps L).
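This scheme can be mirrored directly on ordinary Python lists (the function name sps, the parameter op, and the list representation are our own, not the book's): in round i, every position j with j ≥ 2^i combines the value arriving from position j − 2^i with its own.

```python
def sps(xs, op):
    """Simple data-parallel prefix sum: one pass per level.

    In level i, position j (for j >= 2**i) combines the value from
    position j - 2**i with its own; positions j < 2**i are unchanged.
    """
    n = len(xs)
    xs = list(xs)
    d = 1
    while d < n:
        xs = [xs[j] if j < d else op(xs[j - d], xs[j]) for j in range(n)]
        d *= 2
    return xs
```

For example, `sps([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)` yields the running sums of the list.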
Another scheme, due to Ladner and Fischer [32], first applies ⊕ to adjacent
elements x_2i, x_2i+1 to compute the list ⟨x0 ⊕ x1, .., x_2i ⊕ x_2i+1, ..⟩. This list
has half as many elements as the original list; its prefix sum is then computed
recursively. The resulting list is ⟨x0 ⊕ x1, .., x0 ⊕ x1 ⊕ .. ⊕ x_2i ⊕ x_2i+1, . . .⟩.
This list contains half of the elements of the final list; the missing elements, the prefix sums ending at even indices, are each obtained by combining a recursively computed sum with one additional element of the original list.
Specification
As we did for the sorting schemes (Section 8.4.6), we introduce an operator
in terms of which the prefix sum problem can be defined. First, we postulate
that 0 is the left identity element of ⊕, i.e., 0 ⊕ x = x. For a powerlist p, let
p∗ be the powerlist obtained by shifting p to the right by one. The effect of
shifting is to append a 0 to the left and discard the rightmost element of p;
thus, ⟨a b c d⟩∗ = ⟨0 a b c⟩. Formally,

⟨x⟩∗ = ⟨0⟩
(p ./ q)∗ = q∗ ./ p
It is easy to show
S1. (r ⊕ s)∗ = r∗ ⊕ s∗
S2. (p ./ q)∗∗ = p∗ ./ q∗
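The shift operator and zip are easy to mirror on Python lists (the helper names shift and zip_pl are ours); the assertions below spot-check the defining law (p ./ q)∗ = q∗ ./ p and properties S1 and S2 on sample data.

```python
def zip_pl(p, q):
    # p ./ q : perfect interleaving of two equal-length lists
    out = []
    for a, b in zip(p, q):
        out += [a, b]
    return out

def shift(p, zero=0):
    # p* : append the identity element on the left, drop the rightmost element
    return [zero] + p[:-1]
```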
The prefix sum, z = (ps L), is specified as the solution z of the equation
z = z∗ ⊕ L (DE)
z0 = (z ∗ )0 ⊕ L0
= 0 ⊕ L0
= L0 , and
zi+1 = zi ⊕ Li+1 , 0 ≤ i < (len L) − 1
Notes
3. The uniqueness of the solution of (DE) can be proved entirely within the
powerlist algebra, similar to the derivation of the Ladner-Fischer scheme
given later in this section.
lf ⟨x⟩ = ⟨x⟩
lf (p ./ q) = (t∗ ⊕ p) ./ t
  where t = lf (p ⊕ q)
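A direct transcription of lf to Python lists (names and list representation are ours; the length must be a power of two) uses even/odd slices for unzipping:

```python
def lf(xs, op, zero=0):
    """Ladner-Fischer prefix sum; len(xs) must be a power of two."""
    if len(xs) == 1:
        return list(xs)
    p, q = xs[0::2], xs[1::2]                            # xs = p ./ q
    t = lf([op(a, b) for a, b in zip(p, q)], op, zero)   # t = lf(p (+) q)
    tstar = [zero] + t[:-1]                              # t*
    r = [op(a, b) for a, b in zip(tstar, p)]             # t* (+) p
    out = []
    for a, b in zip(r, t):                               # result = r ./ t
        out += [a, b]
    return out
```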
Correctness
We can prove the correctness of sps and lf by showing that the function ps
satisfies the equations defining each of these functions. It is more instructive to
see that both sps and lf can be derived easily from the specification (DE). We
carry out this derivation for the Ladner-Fischer scheme as an illustration of the
power of algebraic manipulations. First, we note, ps⟨x⟩ = ⟨x⟩.
ps⟨x⟩
= {from the defining equation (DE) for ps⟨x⟩}
(ps⟨x⟩)∗ ⊕ ⟨x⟩
= {definition of ∗}
⟨0⟩ ⊕ ⟨x⟩
= {⊕ is a scalar operation}
⟨0 ⊕ x⟩
= {0 is the identity of ⊕}
⟨x⟩
For a non-singleton argument (p ./ q), let ps(p ./ q) = r ./ t, for some r and t. Then
r ./ t
= {r ./ t = ps(p ./ q); using (DE)}
(r ./ t)∗ ⊕ (p ./ q)
= {(r ./ t)∗ = t∗ ./ r}
(t∗ ./ r) ⊕ (p ./ q)
= {⊕, ./ commute}
(t∗ ⊕ p) ./ (r ⊕ q)
Matching the two sides by unique deconstruction,
LF1. r = t∗ ⊕ p, and
LF2. t = r ⊕ q
Substituting for r in (LF2) using (LF1), t = (t∗ ⊕ p) ⊕ q = t∗ ⊕ (p ⊕ q); thus t satisfies (DE) for the powerlist (p ⊕ q), and hence
LF3. t = ps(p ⊕ q)
ps(p ./ q)
= {by definition}
r ./ t
= { Using (LF1) for r}
(t∗ ⊕ p) ./ t
where t is defined by LF3. This is exactly the definition of the function lf for a
non-singleton powerlist. We also note that
r
= {eliminating t from (LF1) using (LF2)}
(r ⊕ q)∗ ⊕ p
= {S1}
r∗ ⊕ q∗ ⊕ p
Thus, r = r∗ ⊕ (q∗ ⊕ p); that is, r satisfies (DE) for the powerlist (q∗ ⊕ p). Hence we obtain LF4, which is used next in proving the correctness of sps.
LF4. r = ps(q∗ ⊕ p)
Correctness of sps
We show that for a non-singleton powerlist L,
yi^n = x0 ⊕ .. ⊕ xi
This description is considerably more difficult to manipulate. The parallelism
in it is harder to see. The proof of correctness requires manipulations of indices:
for this example, we have to show that for all i, j
yi^j = xk ⊕ .. ⊕ xi
where k = max(0, i − 2^j + 1).
Analogous definitions can be given for n-dimensional arrays. Observe that the
length of each dimension is a power of 2. As we had in the case of a powerlist, the same matrix can be constructed in several different ways, say, first
by constructing the rows and then the columns, or vice versa. We will show, in
Section 8.5.2, that
(p | q) |′ (u | v) = (p |′ u) | (q |′ v)
i.e., | , |′ commute.
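With a matrix represented as a Python list of rows (an encoding of ours, not the book's), | places its arguments side by side and |′ stacks them, and the commutativity law can be checked directly:

```python
def tie(p, q):
    # p | q : place q to the right of p (concatenate corresponding rows)
    return [rp + rq for rp, rq in zip(p, q)]

def tie_p(p, q):
    # p |' q : place q below p (concatenate the lists of rows)
    return p + q
```

On sample blocks p, q, u, v, `tie_p(tie(p, q), tie(u, v))` and `tie(tie_p(p, u), tie_p(q, v))` build the same matrix.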
Note: We could have defined a matrix using ./ and ./′ instead of | and
|′. As | and ./ are duals in the sense that either can be used to construct
A = [ 2 4 ]        B = [ 0 1 ]
    [ 3 5 ]            [ 6 7 ]

A | B = [ 2 4 0 1 ]        A ./ B = [ 2 0 4 1 ]
        [ 3 5 6 7 ]                 [ 3 6 5 7 ]

A |′ B = [ 2 4 ]           A ./′ B = [ 2 4 ]
         [ 3 5 ]                     [ 0 1 ]
         [ 0 1 ]                     [ 3 5 ]
         [ 6 7 ]                     [ 6 7 ]
(or uniquely deconstruct) a powerlist, |′ and ./′ are also duals, as we show in
Section 8.5.2. Therefore, we will freely use all four construction operators for
matrices. □
τ⟨⟨x⟩⟩ = ⟨⟨x⟩⟩
τ(p | q) = (τ p) |′ (τ q)
τ(u |′ v) = (τ u) | (τ v)
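This recursive transpose runs directly on the list-of-rows representation (a sketch under our own encoding; each dimension length must be a power of two):

```python
def tau(m):
    """Transpose a 2^a x 2^b matrix, mirroring
    tau(u |' v) = (tau u) | (tau v) and tau(p | q) = (tau p) |' (tau q)."""
    rows, cols = len(m), len(m[0])
    if rows == 1 and cols == 1:
        return [row[:] for row in m]
    if rows > 1:                      # m = u |' v
        u, v = m[:rows // 2], m[rows // 2:]
        return [a + b for a, b in zip(tau(u), tau(v))]   # (tau u) | (tau v)
    # one row, several columns: m = p | q
    p, q = [m[0][:cols // 2]], [m[0][cols // 2:]]
    return tau(p) + tau(q)            # (tau p) |' (tau q)
```

Applying it twice returns the original matrix, as expected of a transpose.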
τ((p | q) |′ (u | v))
= {applying the last rule}
(τ(p | q)) | (τ(u | v))
= {applying the middle rule}
  [ p  q ]     [ σp  σu ]
σ [ u  v ]  =  [ σq  σv ]
From the induction hypothesis, (τ p), (τ q), etc., are well defined. Hence,
Crucial to the above proof is the fact that | and |′ commute; this is reminiscent of the “Church-Rosser Property” [11] in term rewriting systems. Commutativity is so important that we discuss it further in the next subsection.
It is easy to show that
τ(p ./ q) = (τ p) ./′ (τ q) and
τ(u ./′ v) = (τ u) ./ (τ v)
σ⟨⟨x⟩⟩ = ⟨⟨x⟩⟩
σ((p | q) |′ (u | v)) = ((σ p) |′ (σ q)) | ((σ u) |′ (σ v))
g′⟨x⟩ = ⟨g x⟩
g′(r | s) = (g′ r) | (g′ s)
We have defined these two forms explicitly because we use one or the other
in all our examples; f′ for a function f of arbitrary arity is similarly defined.
Observe that f′ applied to a powerlist of length N yields a powerlist of length N.
The number of primes over f determines the dimension at which f is applied
(the outermost dimension is numbered 0; therefore writing ./, for instance,
without primes, simply zips two lists). The operator for pointwise application
also appears in [3] and in [49].
Common special cases for the binary operator, op, are | and ./ and their
pointwise application operators. In particular, writing ./m to denote ./ with m primes (./′′···′), we define ./0 = ./ and, for m > 0,
Theorem 1 f′, ./ commute.
Proof: We prove the result for unary f; the general case is similar. Proof is
by structural induction.
Induction:
f′((p | q) ./ (u | v))
= { | , ./ in the argument commute}
f′((p ./ u) | (q ./ v))
= {f′, | commute}
(f′(p ./ u)) | (f′(q ./ v))
= {induction}
((f′ p) ./ (f′ u)) | ((f′ q) ./ (f′ v))
= { | , ./ commute}
((f′ p) | (f′ q)) ./ ((f′ u) | (f′ v))
= {f′, | commute}
(f′(p | q)) ./ (f′(u | v)) □
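Theorem 1 is easy to spot-check on lists (fprime and zip_pl are our own helper names): applying a scalar function pointwise commutes with interleaving.

```python
def zip_pl(p, q):
    # p ./ q : perfect interleaving of two equal-length lists
    out = []
    for a, b in zip(p, q):
        out += [a, b]
    return out

def fprime(f, xs):
    # f' : apply a scalar function pointwise to a powerlist
    return [f(x) for x in xs]
```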
8.5.2 Deconstruction
In this section we show that any powerlist that can be written as p |m q for
some p, q can also be written as u ./m v for some u, v and vice versa; this is
analogous to Law L1, for dual deconstruction. Analogous to Law L2, we show
that such deconstructions are unique.
Theorem: For any p, q and m ≥ 0, if p |m q is defined then there exist some u, v such that
u ./m v = p |m q
Conversely, for any u, v and m ≥ 0, if u ./m v is defined then there exist some
p, q such that
p |m q = u ./m v □
We do not prove this theorem; its proof is similar to that of the theorem given below.
Theorem: Let ⊗ stand for either | or ./. For any m ≥ 0,
(p ⊗m q = u ⊗m v) ≡ (p = u ∧ q = v)
Induction: p = p0 | p1 , q = q0 | q1 , u = u0 | u1 , v = v0 | v1
(p0 | p1 ) |n+1 (q0 | q1 ) = (u0 | u1 ) |n+1 (v0 | v1 )
≡ {definition of |n+1 }
(p0 |n+1 q0 ) | (p1 |n+1 q1 ) = (u0 |n+1 v0 ) | (u1 |n+1 v1 )
≡ {unique deconstruction using Law L2}
(p0 |n+1 q0 ) = (u0 |n+1 v0 ) ∧ (p1 |n+1 q1 ) = (u1 |n+1 v1 )
≡ {induction on the length of p0 , q0 , p1 , q1 }
(p0 = u0 ) ∧ (q0 = v0 ) ∧ (p1 = u1 ) ∧ (q1 = v1 )
≡ {Law L2}
(p0 | p1 ) = (u0 | u1 ) ∧ (q0 | q1 ) = (v0 | v1 )
em⟨S⟩ = ⟨S⟩
em⟨P⟩ = em P
em(u | v) = ⟨em u⟩ | ⟨em (rev v)⟩
The first line is the rule for embedding a single item in a 0-dimensional hypercube.
The next line simply says that an array having length 1 in a dimension can be
embedded by ignoring that dimension. The last line says that a non-singleton
array can be embedded by embedding the left half of dimension 0 and the reverse
of the right half in the two component hypercubes of a larger hypercube.
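For a one-dimensional array, reading em(u | v) = ⟨em u⟩ | ⟨em (rev v)⟩ as an ordering of elements yields the familiar reflected (Gray-code) layout; the sketch below (function name ours) computes that order, and neighboring array elements then occupy hypercube nodes whose indices differ in exactly one bit.

```python
def em_order(p):
    # one-dimensional reading of em(u | v) = <em u> | <em (rev v)> :
    # the order in which array elements occupy hypercube nodes 0, 1, 2, ...
    if len(p) == 1:
        return list(p)
    u, v = p[:len(p) // 2], p[len(p) // 2:]
    return em_order(u) + em_order(v[::-1])
```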
8.6 Remarks
Related Work
Applying uniform operations on aggregates of data has proved to be extremely
powerful in APL [23]; see [3] and [5] for algebras of such operators. One of the
earliest attempts at representing data-parallel algorithms is in [42]. In their
words, “an algorithm... performs a sequence of basic operations on pairs of
data that are successively 2^(k−1), 2^(k−2), .., 2^0 = 1 locations apart”. An algorithm operating on 2^N pieces of data is described as a sequence of N parallel
steps of the above form where the kth step, 0 < k ≤ N, applies in parallel a
binary operation, OPER, on pairs of data that are 2^(N−k) apart. They show
that this paradigm can be used to describe a large number of known parallel
algorithms, and any such algorithm can be efficiently implemented on the Cube-Connected Cycles connection structure. Their style of programming was imperative. It is not easy to apply algebraic manipulations to such programs. Their
programming paradigm fits in well within our notation. Mou and Hudak [40]
and Mou [41] propose a functional notation to describe divide-and-conquer
parallel algorithms. Their notation is a vast improvement over Preparata and
Vuillemin's in that changing from an imperative style to a functional style of
programming allows more succinct expressions and the possibility of algebraic
manipulations; the effectiveness of this programming style on a scientific problem may be seen in [53]. They have constructs similar to tie and zip, though
they allow unbalanced decompositions of lists. An effective method of programming with vectors has been proposed by Blelloch [7, 8]. He proposes a small set of
“vector-scan” instructions that may be used as primitives in describing parallel
algorithms. Unlike our method, he is able to control the division of the list and
the number of iterations depending on the values of the data items, a necessary
ingredient in many scientific problems. Jones and Sheeran [24] have developed
a relational algebra for describing circuit components. A circuit component is
viewed as a relation and the operators for combining relations are given appropriate interpretations in the circuit domain. Kapur and Subramaniam [25]
have implemented the powerlist notation for the purpose of automatic theorem
proving. They have proved many of the algorithms in this paper using an inductive theorem prover, called RRL (Rewrite Rule Laboratory), that is based
on equality reasoning and rewrite rules. They are now extending their theorem
prover so that the similarity constraints on the powerlist constructors do not
have to be stated explicitly.
One of the fundamental problems with the powerlist notation is to devise
compilation strategies for mapping programs (written in the powerlist notation)
to specific architectures. The architecture that is the closest conceptually is the
hypercube. Kornerup[31] has developed certain strategies whereby each parallel
step in a program is mapped to a constant number of local operations and
communications at a hypercube node.
Combinational circuit verification is an area in which the powerlist notation may be fruitfully employed. Adams [1] has proved the correctness of adder
circuits using this notation. A ripple-carry adder is typically easy to describe
and prove, whereas a carry-lookahead adder is much more difficult. Adams has
described both circuits in our notation and proved their equivalence in a remarkably concise fashion. He obtains a succinct description of the carry-lookahead
circuit by employing the prefix-sum function (see Section 4.7).
The last line of this definition applies to a non-singleton list of odd length; the
list is deconstructed into two lists p, q of equal length and e, the middle element.
(We have abused the notation, applying | to three arguments). Similarly, the
function lf for prefix sum may be defined by
lf ⟨x⟩ = ⟨x⟩
lf (p ./ q) = (t∗ ⊕ p) ./ t
lf (e ./ p ./ q) = e ./ (e ⊕ (t∗ ⊕ p)) ./ (e ⊕ t)
  where t = lf (p ⊕ q)
In this definition, the singleton list and lists of even length are treated as
before. A list of odd length is deconstructed into e, p, q, where e is the first
element of the argument list and p ./ q constitutes the remaining portion of the
list. For this case, the prefix sum is obtained by appending the element e to
the list obtained by applying e⊕ to each element of lf (p ./ q); we have used the
convention that (e ⊕ L) is the list obtained by applying e⊕ to each element of
list L.
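The extended definition runs for a list of any positive length; a Python rendering (function and parameter names are ours) follows it case by case:

```python
def lf(xs, op, zero=0):
    """Prefix sum for a list of any positive length, following the
    extended definition: odd lengths peel off the first element e."""
    if len(xs) == 1:
        return list(xs)
    if len(xs) % 2 == 1:
        e, rest = xs[0], xs[1:]
        # e, followed by (e (+)) applied to each element of lf(rest)
        return [e] + [op(e, y) for y in lf(rest, op, zero)]
    p, q = xs[0::2], xs[1::2]                            # xs = p ./ q
    t = lf([op(a, b) for a, b in zip(p, q)], op, zero)
    tstar = [zero] + t[:-1]                              # t*
    out = []
    for a, b in zip([op(x, y) for x, y in zip(tstar, p)], t):
        out += [a, b]                                    # (t* (+) p) ./ t
    return out
```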
Bibliography

[1] Will Adams. Verifying adder circuits using powerlists. Technical Report
TR 94-02, Dept. of Computer Science, Univ. of Texas at Austin, Austin,
Texas 78712, Mar 1994.
[3] J. Backus. Can programming be liberated from the von Neumann style? A
functional style and its algebra of programs. Communications of the ACM,
21(8):613–641, Aug 1978. Turing Award Lecture (1977).
[4] Kenneth Batcher. Sorting networks and their applications. In Proc. AFIPS
Spring Joint Computer Conference, volume 32, pages 307–314, Reston, VA,
1968. AFIPS Press.
[7] Guy E. Blelloch. Vector Models for Data-Parallel Computing. MIT Press,
1990.
[10] K. Mani Chandy and Jayadev Misra. Parallel Program Design: A Foun-
dation. Addison-Wesley, 1988.
[22] Paul Hudak, Jon Peterson, and Joseph Fasel. A Gentle Introduction to
Haskell, Version 98. Available at http://www.haskell.org/tutorial/, 2000.
[23] K. Iverson. A Programming Language. John Wiley and Sons, 1962.
[24] Geraint Jones and Mary Sheeran. Circuit design in Ruby. In Jørgen
Staunstrup, editor, Formal Methods for VLSI Design. North-Holland, 1990.
[25] D. Kapur and M. Subramaniam. Automated reasoning about parallel algorithms using powerlists. Manuscript in preparation, 1994.
[26] Richard M. Karp and Vijaya Ramachandran. Parallel algorithms for shared
memory machines. In J. van Leeuwen, editor, Handbook of Theoretical
Computer Science. Elsevier and the MIT Press, 1990.
[27] Henry Kautz and Bart Selman. Planning as satisfiability. Proceedings of
the 10th European Conference on Artificial Intelligence (ECAI 92), 1992.
[31] Jacob Kornerup. Mapping a functional notation for parallel programs onto
hypercubes. Information Processing Letters, 53:153–158, 1995.
[33] Tracy Larrabee. Efficient generation of test patterns using boolean satisfi-
ability. PhD thesis, Stanford University, 1990.
[35] Philip M. Lewis, Arthur Bernstein, and Michael Kifer. Databases and
Transaction Processing. Addison-Wesley, 2002.
[38] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard
ML. MIT Press, 1990.
[39] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and
Sharad Malik. Chaff: Engineering an Efficient SAT solver. In Proceedings
of the 39th Design Automation Conference, June 2001.
[41] Z.G. Mou. Divacon: A parallel language for scientific computing based
on divide-and-conquer. In Proc. 3rd Symp. on the Frontiers of Massively
Parallel Computation, pages 451–461, Oct 1991.
[44] R.L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital
signatures and public-key cryptosystems. Communications of the ACM,
21(2):120–126, Feb 1978.