Notes Week7


Lecture notes for

F17SC Discrete Mathematics

Dr Matteo Capoferri¹ and Dr Adrian Turcanu²

Department of Mathematics
Heriot-Watt University

AY 2022–2023

¹ Senior Course Leader, Edinburgh. E-mail: [email protected], Webpage: https://mcapoferri.com.
² Dubai. E-mail: [email protected].
Abstract

This course is in five parts. The first part is concerned with the foundations of
set and function theory, alongside elements of combinatorics. The second part deals
with probability theory and applications. The third part covers graph theory,
indispensable for students of computer science and applied mathematics. The fourth
part is concerned with recurrence relations and, finally, the fifth part provides an
introduction to matrix theory and linear algebra.

Each section is complemented by a list of exercises. Exercises for which a solution
is provided are marked with a (∗).

All the teaching materials for the course are available on Canvas.

All you need for the course is contained in these lecture notes and in the accompanying
Problem Sets, which will gradually become available on Canvas. However,
if you wish to consult a textbook, you may refer to

• P. Grossman, Discrete Mathematics for Computing (has a nice introduction to
set theory, combinatorics and graphs with a range of examples from computer
science)

• A. Chetwynd and P. Diggle, Discrete Mathematics (a short, very friendly textbook,
covering combinatorics and probability)

• K. H. Rosen, Discrete Mathematics and Its Applications (the graph theory chapters
are particularly useful)

• S. B. Maurer and A. Ralston, Discrete Algorithmic Mathematics (covers linear
algebra, probability and applications to algorithms)

• S. Lipschutz, Schaum’s Outline of Theory and Problems of Linear Algebra (covers
matrices and linear equations)

Updated: March 2, 2023.

Acknowledgements: I am grateful to Dr Anatoly Konechny for making his teaching
materials available to me. These notes are largely based on his.
Contents

1 Set Theory and Combinatorics
1.1 Sets
1.1.1 Exercises
1.2 Set Operations
1.2.1 Exercises
1.3 Computer Representation of Sets
1.3.1 Exercises
1.4 Relations
1.4.1 Exercises
1.5 Functions
1.5.1 Exercises
1.6 Mathematical Induction
1.6.1 Exercises
1.7 The Sum and Product Rules
1.7.1 Exercises
1.8 Permutations and Combinations
1.8.1 Exercises
1.9 The Inclusion-Exclusion Principle
1.9.1 Exercises

2 Probability
2.1 Probability Space
2.1.1 Exercises
2.2 Conditional Probability and Independence
2.2.1 Exercises
2.3 Random Variables and Probability Distributions
2.3.1 Exercises
2.4 Expected value and variance
2.4.1 Exercises
2.5 Examples of applications to algorithms
2.5.1 Algorithm MaxNumber [NON EXAMINABLE]
2.5.2 Algorithm SeqSearch [NON EXAMINABLE]
2.5.3 Algorithm Permute [NON EXAMINABLE]

3 Graph Theory
3.1 Introduction to Graphs
3.1.1 Exercises
3.2 Adjacency Matrix
3.2.1 Exercises
Chapter 1

Set Theory and Combinatorics

1.1 Sets
Definition 1.1. A set is an unordered collection of objects.

The notation A = {1, 3, 5, 7} means that the set A contains the four elements
1, 3, 5, 7. We write a ∈ A to denote that a is an element of the set A. We write
a ∉ A to indicate that a does not belong to A. For example, given A = {1, 3, 5, 7}
we have that 5 ∈ A and 6 ∉ A.
There are a number of ways of denoting sets. Thus for the set of positive even
numbers we could write
A = {2, 4, 6, 8, . . .}
or
A = {x | x is a positive even number} ,
where the vertical bar | is to be read “such that”.
A set cannot contain the same object more than once. Also, the elements in a
set are not ordered so the sets {a, b, c} and {c, a, b} are the same. In general, two
sets are equal if they contain the same elements.

Definition 1.2. A set B is a subset of a set A if every element of B is also an
element of A.

If B is a subset of A we write B ⊂ A. For example, if A = {1, 3, 5, 7} and
B = {3, 7}, then B ⊂ A.

Remark 1.1. You should take care to distinguish between ‘B is a subset of A’ (B ⊂ A)
and ‘B is an element of A’ (B ∈ A). For example, if

A = {1, {3, 5}, 3, 6, 7},

then all of the following are true:

3 ∈ A, {3, 5} ∈ A, 5 ∉ A, {3, 6} ⊂ A, {3} ⊂ A.


The set that has no elements is called the empty set and is denoted by ∅ . By
convention ∅ ⊂ A for any set A.
Remark 1.2. Note that the set ∅ and the set B = {∅} are different. The single
element of B is the empty set itself.

Definition 1.3. An ordered pair is a pair (a, b) , where a comes before b.


Two pairs (a, b) and (c, d) are equal whenever a = c and b = d . Note that
(3, 8) and (8, 3) are not equal.
Definition 1.4. If A and B are sets then the Cartesian product A × B is the set
of all ordered pairs (a, b) with a ∈ A and b ∈ B .

Example 1.5. Let A = {1, 2, 3} and B = {x, y} . Find the Cartesian product
A×B.
Solution. The Cartesian product A × B is

A × B = { (1, x), (1, y), (2, x), (2, y), (3, x), (3, y) }.
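
As an illustration, the Cartesian product of Example 1.5 can be computed with a minimal Python sketch. The variable names below are our own; the elements x and y are represented by the strings "x" and "y", which is an assumption made only for this sketch.

```python
from itertools import product

A = [1, 2, 3]
B = ["x", "y"]

# product(A, B) yields every ordered pair (a, b) with a in A and b in B.
A_times_B = set(product(A, B))

# Swapping the factors gives pairs (b, a) instead.
B_times_A = set(product(B, A))

print(sorted(A_times_B))
# The number of pairs is |A| times |B|.
print(len(A_times_B) == len(A) * len(B))
# Ordered pairs depend on order, so the two products differ.
print(A_times_B == B_times_A)
```

Since (1, x) and (x, 1) are different ordered pairs, the sets A × B and B × A are in general not equal, even though they have the same number of elements.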

1.1.1 Exercises

Exercise 1.1.1 (∗). List the elements of the set

{x | x is the square of an integer and x < 25} .

Solution. We have

{x | x is the square of an integer and x < 25} = {0, 1, 4, 9, 16} .

Exercise 1.1.2 (∗). Find B × A for Example 1.5.

Solution. B × A = { (x, 1), (x, 2), (x, 3), (y, 1), (y, 2), (y, 3) } .

1.2 Set Operations


Definition 1.6. If A and B are sets then the union of A and B, denoted A ∪ B, is defined
as
A ∪ B := {x | x ∈ A or x ∈ B}.

The symbol “:=” means “equal by definition”.

Definition 1.7. The intersection of A and B , A ∩ B is defined as

A ∩ B := {x | x ∈ A and x ∈ B}.

Note that the statement x ∈ A or x ∈ B does not exclude the possibility that
x is an element of both sets. For example, we have

{1, 3, 5, 7} ∪ {2, 5, 7} = {1, 2, 3, 5, 7}

and
{1, 3, 5, 7} ∩ {2, 5, 7} = {5, 7} .
Remark 1.3. It is useful to picture combinations of sets with the aid of Venn diagrams.
This will be done in the lectures.

Definition 1.8. If A and B are sets, then the difference of A and B is denoted
by A \ B and defined as

A \ B := {x | x ∈ A and x ∉ B}.

For example, if A = {1, 3, 5, 7} and B = {4, 5, 6, 7} then A \ B = {1, 3} .


Often, all the sets under consideration are subsets of some larger set U called the
universal set.

Definition 1.9. The complement of a set A is $\overline{A} = U \setminus A$.

Thus $\overline{A} = \{x \mid x \notin A\}$.

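Example 1.10. Let U = {1, 2, 3, 4, 5, 6, 7}, A = {1, 7}, B = {1, 2, 4, 6} and
C = {1, 3, 5, 7}. Find the sets
(a) $C \setminus A$, (b) $A \cap \overline{B}$, (c) $(A \cap B) \cup \overline{C}$, (d) $\overline{A \cup B}$, (e) $\overline{(C \setminus A)} \cap B$.
Solution. (a) $C \setminus A = \{3, 5\}$.
(b) $\overline{B} = \{3, 5, 7\}$, so $A \cap \overline{B} = \{7\}$.
(c) $A \cap B = \{1\}$ and $\overline{C} = \{2, 4, 6\}$. Hence $(A \cap B) \cup \overline{C} = \{1, 2, 4, 6\}$.
(d) $A \cup B = \{1, 2, 4, 6, 7\}$, so $\overline{A \cup B} = \{3, 5\}$.
(e) $\overline{(C \setminus A)} = \{1, 2, 4, 6, 7\}$, so $\overline{(C \setminus A)} \cap B = \{1, 2, 4, 6\}$.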
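
The computations of Example 1.10 can be reproduced with Python's built-in set type. This is only a sketch: the helper complement and the names part_a to part_e are our own, and complements are taken relative to U, as in the worked solution.

```python
U = {1, 2, 3, 4, 5, 6, 7}
A = {1, 7}
B = {1, 2, 4, 6}
C = {1, 3, 5, 7}


def complement(S):
    """Complement of S relative to the universal set U."""
    return U - S


part_a = C - A                      # (a) difference C \ A
part_b = A & complement(B)          # (b) A intersected with the complement of B
part_c = (A & B) | complement(C)    # (c) (A ∩ B) united with the complement of C
part_d = complement(A | B)          # (d) complement of A ∪ B
part_e = complement(C - A) & B      # (e) complement of C \ A, intersected with B

for name, value in [("a", part_a), ("b", part_b), ("c", part_c),
                    ("d", part_d), ("e", part_e)]:
    print(name, sorted(value))
```

Here `|`, `&` and `-` are Python's operators for union, intersection and difference of sets.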

Set operations satisfy the following laws.

A ∪ (B ∪ C) = (A ∪ B) ∪ C

A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ B = B ∪ A
A ∩ B = B ∩ A
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
$\overline{A \cup B} = \overline{A} \cap \overline{B}$
$\overline{A \cap B} = \overline{A} \cup \overline{B}$
Note that the empty set ∅ satisfies the following properties A ∪ ∅ = A and
A ∩ ∅ = ∅ for any set A.

1.2.1 Exercises

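Exercise 1.2.1 (∗). Let U = {1, 2, 3, 4, 5, 6, 7, 8}, A = {3, 8}, B = {2, 4, 6, 8} and
C = {1, 3, 5, 7}.
Find the sets
(a) $C \setminus A$, (b) $A \cap \overline{B}$, (c) $(A \cap B) \cup \overline{C}$,
(d) $\overline{A \cap B}$, (e) $\overline{(C \setminus A)} \cap B$.

Solution. (a) $C \setminus A = \{1, 5, 7\}$.
(b) $A \cap \overline{B} = \{3, 8\} \cap \{1, 3, 5, 7\} = \{3\}$.
(c) $A \cap B = \{8\}$ and $\overline{C} = \{2, 4, 6, 8\}$, so $(A \cap B) \cup \overline{C} = \{2, 4, 6, 8\}$.
(d) $\overline{A \cap B} = \{1, 2, 3, 4, 5, 6, 7\}$.
(e) $\overline{(C \setminus A)} = \{2, 3, 4, 6, 8\}$, so $\overline{(C \setminus A)} \cap B = B$.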

Exercise 1.2.2 (∗). Let $C = A \cap B$ and $D = A \cap \overline{B}$. Find $C \cap D$.

Solution. $C \cap D = \emptyset$ since there are no elements that are in both $B$ and $\overline{B}$.

Exercise 1.2.3 (∗). What can you say about sets A and B if
(a) A ∪ B = A? (b) A ∩ B = A? (c) A ∩ B = B ∩ A? (d) A \ B = A?

Solution. (a) If A ∪ B = A then B ⊂ A.
(b) If A ∩ B = A then A ⊂ B.
(c) Since A ∩ B = B ∩ A is always true we cannot say anything about A and B.
(d) Suppose that A \ B = A and x ∈ A ∩ B. Then x ∈ A and x ∈ B, so
x ∉ A \ B. Since A \ B = A, this contradicts x ∈ A. Hence no such x can exist and A ∩ B = ∅.

1.3 Computer Representation of Sets


There are many ways to represent sets on a computer. We could store the elements
of a set in an unordered fashion. If we did this, it would be time-consuming to
compute the union or intersection of two sets since these operations would require a
large amount of searching for elements. We outline below a method for storing the
elements of a set which makes computing set operations easy.

• Step 1. We order a finite universal set U = {u₁, u₂, . . . , uₙ}.

• Step 2. Given a subset A of U, we represent A by a bit string of length n
where the kth bit in the string is 1 if uₖ ∈ A and 0 otherwise.

As an example, let U = {1, 2, 3, 4, 5, 6} , A = {2, 3} and B = {3, 4, 5, 6} . Then


the bit string for A is 0 1 1 0 0 0 and the bit string for B is 0 0 1 1 1 1 .
Given the bit strings for A and B, one can easily compute the bit strings for $\overline{A}$,
A ∪ B and A ∩ B as follows.

− To find the bit string for $\overline{A}$ from the bit string for A we turn each 1 into a 0
and each 0 into a 1.

For the above example, the bit string for A is 0 1 1 0 0 0 so the bit string for $\overline{A}$
is 1 0 0 1 1 1.

− The kth bit in the bit string for A ∪ B is 1 if either (or both) of the bits in
the kth position of A and B is 1, and is 0 if both bits are 0.

− The kth bit in the bit string for A ∩ B is 1 if both of the bits in the kth
position of A and B are 1, and is 0 if either (or both) of the two bits is 0.

As an example, let U = {1, 2, 3, 4, 5} , A = {1, 3, 5} and B = {3, 4} . The bit


string for A is 1 0 1 0 1 whereas the bit string for B is 0 0 1 1 0 . Therefore, the bit
string for A ∪ B is 1 0 1 1 1 and the bit string for A ∩ B is 0 0 1 0 0 .
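
The bit string method can be sketched in a few lines of Python. The function names below are our own choices, and bit strings are stored as ordinary text strings for readability. A real implementation would more likely pack the bits into machine words and use bitwise operations.

```python
def to_bits(subset, universe):
    """Bit string of `subset` with respect to the ordered list `universe`."""
    return "".join("1" if u in subset else "0" for u in universe)


def complement_bits(a):
    """Turn each 1 into a 0 and each 0 into a 1."""
    return "".join("1" if bit == "0" else "0" for bit in a)


def union_bits(a, b):
    """kth bit is 1 if either (or both) of the kth bits of a and b is 1."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def intersection_bits(a, b):
    """kth bit is 1 only if both kth bits of a and b are 1."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


universe = [1, 2, 3, 4, 5]          # Step 1: fix an order on U
a = to_bits({1, 3, 5}, universe)    # Step 2: encode A
b = to_bits({3, 4}, universe)       # Step 2: encode B

print(a, b)
print(union_bits(a, b), intersection_bits(a, b), complement_bits(a))
```

Each operation looks at every position exactly once, so no searching for elements is needed.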

1.3.1 Exercises

Exercise 1.3.1 (∗). Let U = {1, 2, 3, 4, 5, 6, 7}, A = {3, 4, 5, 7} and B = {5, 6}.
Find the bit strings for A, B, $\overline{A}$, A ∪ B, and A ∩ B.

Solution. The bit string for A is 0 0 1 1 1 0 1, and the bit string for B is 0 0 0 0 1 1 0.
The bit strings for $\overline{A}$, A ∪ B and A ∩ B are 1 1 0 0 0 1 0, 0 0 1 1 1 1 1 and
0 0 0 0 1 0 0, respectively.

Exercise 1.3.2 (∗). Which subsets of a finite universal set U do these bit strings
represent?
(a) the string with all zeros, (b) the string with all ones.

Solution. (a) The empty set. (b) The set U .

Exercise 1.3.3 (∗). Explain how to find the bit string corresponding to A \ B.

Solution. For A \ B , the k th bit is 1 if the k th bit of the first string is 1 and the
k th bit of the second string is 0, and is 0 otherwise.

1.4 Relations
Let A and B be sets. A relation between A and B links certain elements of A with
certain elements of B. More formally, a relation from A to B is a subset of A × B.
A relation on the set A is a relation from A to A.

Here are some examples of relations:

1. Let A be the set of students at Heriot-Watt University and let B be the set
of modules. The relation R consists of the pairs (a, b), a ∈ A, b ∈ B, such that
student a is enrolled in module b.

2. Let R be the relation on the set of people consisting of pairs (a, b) where a is
a parent of b.

3. Let A = {1, 2, 3, 4} and B = {x, y} . Then

R = { (1, y), (2, x), (3, x), (4, y) }

is a relation from A to B.

4. The following are relations on the set of integers:

R1 = {(a, b)|a ≥ b},

R2 = {(a, b)|a − b = 4}.

5. Let A = {1, 2, 3, 4} and let R be the relation on A given by

R = {(a, b)|a divides b }.

Then R contains the pairs

(1, 1) (1, 2) (1, 3) (1, 4) (2, 2) (3, 3) (4, 4) (2, 4) .



If S is a (small) finite set, then we can represent a relation R on S by a directed
graph, a network of blobs and arrows. We have a blob for each element of S, and
draw an arrow from x to y if (x, y) ∈ R.
Let S = {a, b, c, d} and let R be the relation defined by the subset
{(a, a), (a, b), (b, b), (b, c), (c, b), (c, d), (d, d)}.
The directed graph encoding this relation is shown below.

Definition 1.11. Let R be a relation on a set A. We say that R is


• reflexive if (a, a) ∈ R for all elements a in A;
• symmetric if whenever (a, b) ∈ R then (b, a) ∈ R;
• antisymmetric if whenever (a, b) ∈ R and (b, a) ∈ R then a = b;
• transitive if whenever (a, b) ∈ R and (b, c) ∈ R then (a, c) ∈ R.
A description of the above properties in terms of graphs was given in class.

Example 1.12. As an example, consider the relation in item 5 above.
It is reflexive since it contains all pairs of the form (a, a), namely,

(1, 1) (2, 2) (3, 3) (4, 4).

It is not symmetric since, for example, it contains (1, 2) but not (2, 1).
It is antisymmetric since there is no pair of elements a and b with a ≠ b such
that (a, b) and (b, a) belong to R.
It is transitive. To check this we have to show that if (a, b) and (b, c) belong to
R, then so does (a, c). This holds because if a divides b and b divides c, then a divides c.
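
For a relation on a small finite set, the four properties of Definition 1.11 can be checked by brute force. The following Python sketch does this for the divisibility relation of item 5. The function names are our own, and the relation is stored as a set of ordered pairs.

```python
def is_reflexive(R, A):
    """(a, a) is in R for every a in A."""
    return all((a, a) in R for a in A)


def is_symmetric(R):
    """Whenever (a, b) is in R, so is (b, a)."""
    return all((b, a) in R for (a, b) in R)


def is_antisymmetric(R):
    """Whenever (a, b) and (b, a) are both in R, a equals b."""
    return all(a == b for (a, b) in R if (b, a) in R)


def is_transitive(R):
    """Whenever (a, b) and (b, c) are in R, so is (a, c)."""
    return all((a, c) in R
               for (a, b) in R
               for (b2, c) in R
               if b == b2)


A = {1, 2, 3, 4}
# a divides b exactly when the remainder of b divided by a is 0.
R = {(a, b) for a in A for b in A if b % a == 0}

print(sorted(R))
print(is_reflexive(R, A), is_symmetric(R), is_antisymmetric(R), is_transitive(R))
```

Such a check examines every pair (or pair of pairs) in R, so it is only practical for small relations. It confirms the conclusions of Example 1.12 but does not replace the general argument.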

Example 1.13. As another example, let R be the relation on the set of people
consisting of pairs (a, b) where a likes b. Unfortunately it is not symmetric. It is
not antisymmetric or transitive. It is probably not even reflexive.

Definition 1.14. A relation that is reflexive, symmetric and transitive is called an
equivalence relation.

Example 1.15. Let R be the relation on the set A = {1, 3, 5, 9, 11, 18} defined by
the pairs (a, b) such that a − b is divisible by 4.

Given a set A with an equivalence relation R on it, we can break up all elements in
A into disjoint groups (subsets) such that within each group all elements are related
to each other but no two elements from different groups are related. Such
groups of elements are called equivalence classes. For Example 1.15 the equivalence
classes are C₁ = {1, 5, 9}, C₂ = {3, 11}, C₃ = {18}.
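
The equivalence classes of Example 1.15 can be found mechanically. The sketch below, with names of our own choosing, compares each element with one representative of every class found so far. This is enough only because R is assumed to be an equivalence relation.

```python
def equivalence_classes(A, related):
    """Group the elements of A into classes, assuming `related` is an
    equivalence relation (so one representative per class suffices)."""
    classes = []
    for x in sorted(A):
        for cls in classes:
            if related(x, cls[0]):   # compare with the class representative
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes


A = {1, 3, 5, 9, 11, 18}
# (a, b) is in R when a - b is divisible by 4.
classes = equivalence_classes(A, lambda a, b: (a - b) % 4 == 0)
print(classes)
```

Every element of A lands in exactly one class, which reflects the fact that the equivalence classes are disjoint and together cover A.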

1.4.1 Exercises

Exercise 1.4.1 (∗). Let A = {2, 3, 4, 5, 6} and let R be the relation on A given by

R = {(a, b)|a divides b }

Write down the ordered pairs in R.


Solution. R = {(2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (2, 4), (2, 6), (3, 6)}.

Exercise 1.4.2 (∗). Let R be the relation on the set of real numbers consisting
of pairs (a, b) with a ≤ b. Determine whether the relation is reflexive, symmetric,
antisymmetric, or transitive.
Solution. This relation is reflexive because a ≤ a. It is not symmetric because, e.g.,
2 ≤ 3 but 3 ≰ 2. It is antisymmetric because a ≤ b and b ≤ a together imply
a = b. It is transitive because a ≤ b and b ≤ c imply a ≤ c.

Exercise 1.4.3. Show that Example 1.15 defines an equivalence relation.

1.5 Functions
Let X and Y be sets.
Definition 1.16. A function from X to Y is a rule that assigns to each element of
X exactly one element of Y . We write f : X → Y . The set X is called the domain
of f and the set Y is called the codomain of f .

Here are some examples of functions.

Example 1.17. f : R → R defined by the formula f(x) = x² is a real-valued
function of the kind studied in calculus courses.

Example 1.18. Let A = {1, 5, 7}, B = {3, 8, 22, 30}. A function f : A → B is
defined as
f(1) = 22, f(5) = 3, f(7) = 22.

Example 1.19. The function f : N → N is defined so that f(1) = 1 and f(n + 1) =
(n + 1)f(n). The latter is called a recurrence relation. It allows one to compute
values of f starting from the one given explicitly. For example f(2) = f(1 + 1) =
2f(1) = 2 · 1 = 2. This function is in fact the factorial: f(n) = n!.

Example 1.20. The next example is called the McCarthy 91 function and was
given by John McCarthy, one of the founders of artificial intelligence.
Here f : N → Z is defined for positive integers n by

f(n) = n − 10 if n ≥ 101,
f(n) = f(f(n + 11)) if n ≤ 100.

If we try to find f (1) directly, we put n = 1 in the definition. This shows that f (1)
depends on f (12) which in turn depends on f (23) etc. It seems hopeless to find
f (1) in this way. The trick is to work backwards. Putting n = 100 in the definition
gives
f (100) = f (f (111)) = f (101) = 91
Putting n = 99 in the definition gives

f (99) = f (f (110)) = f (100) = 91 .

Continuing in this way we can show that f (n) = 91 for all n ≤ 100 .
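
The recursive definition of the McCarthy 91 function translates directly into code, which lets us check the claim for every n from 1 to 100. The following Python sketch uses a function name of our own choosing.

```python
def mccarthy91(n):
    """McCarthy 91 function, following the two cases of the definition."""
    if n >= 101:
        return n - 10
    # For n <= 100 the function calls itself twice, as in f(f(n + 11)).
    return mccarthy91(mccarthy91(n + 11))


# Collect the distinct values taken for n = 1, ..., 100.
values_up_to_100 = {mccarthy91(n) for n in range(1, 101)}
print(values_up_to_100)
print(mccarthy91(101), mccarthy91(150))
```

The computer evaluates f(1) by exactly the chain of dependencies described above, f(1) → f(12) → f(23) → . . . , until the argument exceeds 100 and the first case applies.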

If f : X → Y is any function, the set of images

Ran(f ) := {y ∈ Y | y = f (x) for some x ∈ X}

is called the range of f .



In Example 1.17 the range of f is the set of all non-negative real numbers:
Ran(f ) = {y ∈ R | y ≥ 0}. In Example 1.18 the range is the set Ran(f ) = {3, 22}.
For the McCarthy 91 function (Example 1.20) the range is Ran(f ) = {y ∈ N | y ≥
91}.
Definition 1.21. A function is called surjective if its range is equal to its codomain.
None of the above examples defines a surjective function. However it is easy to
construct functions from the above examples which are surjective by simply modifying
the codomain. For instance f(x) = x² becomes surjective if we consider it as
a function from all real numbers into the non-negative real numbers: f : R → R≥0.
Definition 1.22. A function is called injective if no two distinct elements of the
domain have the same image.
In the above only the factorial function of Example 1.19 is injective.

Given two functions f : A → B, g : B → C we can define their composition
g ◦ f : A → C by the rule (g ◦ f)(x) = g(f(x)). The notation g ◦ f needs to be read
from right to left: we first apply f then g.

Example 1.23. Let f : R → R be defined as f(x) = 2x² and let g : R → R be defined
as g(x) = 3x + 4. Then

(f ◦ g)(x) = f(g(x)) = f(3x + 4) = 2(3x + 4)² = 2(9x² + 24x + 16) = 18x² + 48x + 32

while
(g ◦ f)(x) = g(f(x)) = g(2x²) = 3(2x²) + 4 = 6x² + 4.
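
Composition is easy to experiment with on a computer. In the Python sketch below, the helper compose and the names f_after_g and g_after_f are our own. The two compositions of Example 1.23 are compared with the expanded formulas at a few integer points.

```python
def compose(outer, inner):
    """Return the composition outer ∘ inner, i.e. x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def f(x):
    return 2 * x ** 2


def g(x):
    return 3 * x + 4


f_after_g = compose(f, g)   # (f ∘ g)(x) = f(g(x)): apply g first, then f
g_after_f = compose(g, f)   # (g ∘ f)(x) = g(f(x)): apply f first, then g

for x in range(-2, 3):
    print(x, f_after_g(x), 18 * x ** 2 + 48 * x + 32, g_after_f(x), 6 * x ** 2 + 4)
```

The two compositions give different values in general, so the order in which functions are composed matters.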

The identity function on a set A is the function i : A → A defined so that
i(x) = x.
Definition 1.24. Let f : A → B and g : B → A be functions. If g ◦ f : A → A is
the identity function on A and f ◦ g : B → B is the identity function on B then it
is said that f is the inverse of g (and g is the inverse of f ).

A function f that has an inverse is said to be invertible.


The following theorem is true.
Theorem 1.25. A function f is invertible if and only if it is surjective and injective.
Remark 1.4. A function that is both injective and surjective is often called bijective.

1.5.1 Exercises

Exercise 1.5.1. For each of the following functions find its range. Determine
whether the function is surjective and whether it is injective.

(a)(∗) f : R → R, f(x) = 2x + 1, (b) g : R → R, g(x) = x⁴ + 1,

(c)(∗) h : X → Y, h(s) = number of ones in s,

(d) j : X → Y, j(s) = the first bit of s,
where X is the set of all finite non-empty strings of bits and Y is the set of all
non-negative integers.

Solution. (a) The range is R because for any y ∈ R we always have a solution to
y = f(x) = 2x + 1 given by x = (y − 1)/2. This also gives the inverse function
f⁻¹(y) = (y − 1)/2. Therefore by Theorem 1.25 f is injective and surjective.

(c) The function h is surjective. To show this take any non-negative integer n.
If n > 0 then let sₙ be a string of length n containing only 1’s. Then h(sₙ) = n.
If n = 0 then we have h(0) = 0 where we take the string containing a single 0.
This function is not injective. To demonstrate this it suffices to note that, e.g.,
h(101) = 2 = h(011).

Exercise 1.5.2 (*). Let the functions f, g and h have R as their domain and
codomain, and be defined as

f(x) = 4x − 3, g(x) = x² + 1, h(x) = 1 if x ≥ 0, h(x) = 0 if x < 0.

Find rules for the following functions (with domain and codomain R)

(a) f ◦ f , (b) h ◦ f , (c) h ◦ g .

Solution. (a) We have (f ◦ f)(x) = f(4x − 3) = 4(4x − 3) − 3 = 16x − 15.

(b) To calculate (h ◦ f)(x) we first obtain f(x) = 4x − 3. Then if 4x − 3 ≥ 0
the final result is 1 and if 4x − 3 < 0 the final result is 0. Since 4x − 3 ≥ 0 means
x ≥ 3/4 we can summarise the answer as

(h ◦ f)(x) = 1 if x ≥ 3/4,
(h ◦ f)(x) = 0 if x < 3/4.

(c) Since g(x) = 1 + x² is always greater than or equal to 1, and in particular non-negative, (h ◦ g)(x) = 1 for all x.

Exercise 1.5.3 (*). Find the inverse function for each of the following functions,
or explain why no inverse exists.

(a) g : R → R, g(x) = √(x²),

(b) j : S → S,
where S is the set of non-empty finite strings of lower case letters, and j(s) gives
a string obtained by moving the last character to the beginning of the string, e.g.
j(yncrfm) = myncrf.

Solution. (a) This function is the same as the absolute value function: g(x) = √(x²) =
|x|. Since g(−1) = 1 = g(1) this function is not injective and therefore by Theorem 1.25
the inverse function does not exist.

(b) The inverse function j⁻¹(s) performs the inverse operation on the string: it
takes the first letter and moves it to the end of the string, e.g. j⁻¹(sdfj) = dfjs.

1.6 Mathematical Induction


Induction is one of the most commonly used proof techniques in Discrete Mathematics
and Computer Science. It is used for proving statements which depend on
the integers.

Suppose we want to find a formula for the sum of the first n odd integers:
• For n = 1 the sum is 1 = 1².
• For n = 2 the sum is 1 + 3 = 4 = 2².
• For n = 3 the sum is 1 + 3 + 5 = 9 = 3².
• For n = 4 the sum is 1 + 3 + 5 + 7 = 16 = 4².
Based on the above, it seems reasonable to conjecture that
1 + 3 + 5 + 7 + . . . + (2n − 1) = n².
But how does one rigorously prove this?

To start with, let us look at what it means for a statement (or proposition) to
depend on a positive integer n .

Example 1.26. Let P(n) be the statement

1 + 3 + 5 + 7 + . . . + (2n − 1) = n².

Then:

• P(3) states that 1 + 3 + 5 = 3²;

• P(4) states that 1 + 3 + 5 + 7 = 4².

Both P(3) and P(4) are true.

Example 1.27. Let Q(n) be the statement 3ⁿ > (n + 2)². Then

• Q(2) states that 3² > 4²;

• Q(3) states that 3³ > 5².

Q(2) is false, whereas Q(3) is true.

Example 1.28. Let P(n) be the statement: n² + n is even. Write down P(1),
P(3), P(k) and P(k + 1).
Solution. P(1) states that 1² + 1 is even. P(3) states that 3² + 3 is even. P(k)
states that k² + k is even. P(k + 1) states that (k + 1)² + (k + 1) is even.

Example 1.29. Let P (n) be the statement 50 − 2n ≥ 0 . Write down P (1) , P (5)
and P (30) . Which of them are true?
Solution. P (1) states that 50 − 2 ≥ 0 which is true. P (5) states that 50 − 10 ≥ 0
which is true. P (30) states that 50 − 60 ≥ 0 which is false.

Example 1.30. Let {aₙ} be the sequence defined by a₁ = 4 and

aₙ₊₁ = 2aₙ − 3, n ≥ 1.

Let P(n) be the statement aₙ = 2ⁿ⁻¹ + 3. Write down P(1), P(2) and P(3).
Which of them are true?
Solution. P(1) states that a₁ = 2⁰ + 3, that is, a₁ = 4. Hence P(1) is true. P(2)
states that a₂ = 2¹ + 3, that is, a₂ = 5. P(3) states that a₃ = 2² + 3, that is,
a₃ = 7.
We use aₙ₊₁ = 2aₙ − 3 to find a₂ and a₃: a₂ = 2a₁ − 3 = 2(4) − 3 so a₂ = 5,
and a₃ = 2a₂ − 3 = 2(5) − 3 so a₃ = 7. Hence, both P(2) and P(3) are true.

Let P(n) be a proposition which depends on n, where n is a positive integer.
A proof by mathematical induction that P(n) is true for all positive integers n
consists of two steps.

Mathematical induction

Step 1. Show that P (1) is true.

Step 2. Let k be any positive integer. Show that if P(k) is true
then P(k + 1) is true.

Example 1.31. Use Mathematical Induction to prove that for any positive integer
n
1 + 3 + 5 + 7 + . . . + (2n − 1) = n².
Solution. Let P(n) be the statement

1 + 3 + 5 + 7 + . . . + (2n − 1) = n².

P(1) states that 1 = 1², which is correct. Hence P(1) holds.
Assume P(k) is true: 1 + 3 + 5 + 7 + . . . + (2k − 1) = k². Then

1 + 3 + 5 + 7 + . . . + (2k − 1) + (2k + 1) = [1 + 3 + 5 + 7 + . . . + (2k − 1)] + 2k + 1.

Since P(k) holds,

1 + 3 + 5 + 7 + . . . + (2k − 1) + (2k + 1) = k² + 2k + 1 = (k + 1)².

This shows that P(k + 1) follows from P(k).
By the principle of mathematical induction, P(n) is true for any positive integer
n.
Remark 1.5. In the above example, P(n) is a statement. It is not an algebraic
expression. It is not acceptable to write P(n) = n².

Example 1.32. Let {aₙ} be the sequence defined by a₁ = 1 and

aₙ₊₁ = aₙ + 8n, n ≥ 1.

Prove by induction that aₙ = (2n − 1)² for all positive integers n.

Solution. Let P(n) be the statement: aₙ = (2n − 1)².
P(1) is the statement: a₁ = (2 − 1)². Since a₁ = 1, P(1) is true.
Assume P(k) is true: aₖ = (2k − 1)². Then

aₖ₊₁ = aₖ + 8k
= (2k − 1)² + 8k
= 4k² + 4k + 1
= (2k + 1)².

Since 2(k + 1) − 1 = 2k + 1 we see that P(k + 1) follows from P(k). By the
principle of mathematical induction, P(n) is true for any positive integer n.

To prove things by induction you should:

(a) State P (n) .

(b) Show that P (1) is true.

(c) Do the inductive step : show that if P (k) is true then P (k + 1) is true.

It is worth stressing that one needs to carry out ALL of the above steps. Indeed,
there are a number of pitfalls in using the principle of mathematical induction as
the next three examples show.

I. It does not suffice to prove only that P(1) is true. Indeed, let P(n) be the
statement: n² = n. Then P(1) is true. But P(n) is true only when n = 1
or n = 0 and is false for n ≥ 2.

II. It does not suffice to prove only that if P (k) is true then P (k + 1) is true.
Indeed, let P (n) be the statement: n + 1 = n . Assume that P (k) holds so
that k + 1 = k . Adding 1 to each side gives k + 2 = k + 1 so P (k + 1) is
true. Of course, P (1) is false so we cannot get started.

III. Another typical error is to give a proof of P (k) implies P (k + 1) which only
works for some values of k , for example k ≥ 2 . Let P (n) be the statement:
any n students taking the same examination must all score the same mark.
Clearly P (1) is true.
Assume that P (k) is true and let any group of k + 1 candidates be given.
Let A and B be particular candidates and C the remaining group of k − 1
students. Then A and C form a group of k students so by P (k) they all
have the same mark. The same is true of B and C . Thus A and B and the
group C score the same mark so P (k + 1) is true. By induction, P (n) holds
for all n .
The mistake is that the argument requires the group C to be non-empty. If k = 1 then C is
empty, so nothing links the mark of A to the mark of B.
We have proved that P(k) implies P(k + 1) only for k ≥ 2, but
there is no way of establishing P(2), which is false.

Example 1.33. Let {aₙ} be the sequence defined by a₁ = 1 and

aₙ₊₁ = √(2 + aₙ), n ≥ 1.

Prove by induction that aₙ ≤ 2 for all positive integers n.

Solution. Let P(n) be the statement: aₙ ≤ 2. Since a₁ = 1, P(1) is true.
Assume that P(k) holds. Thus aₖ ≤ 2. Then

aₖ₊₁ = √(2 + aₖ)
≤ √(2 + 2)
= 2,

that is, P(k + 1) is true. By induction P(n) is true for all n.

The next example uses the factorial function n! . The number 5! (read five
factorial or factorial 5) means 5 × 4 × 3 × 2 × 1 . Note that 1! = 1 and that by
convention 0! = 1 . The reader should check by calculation that 4! = 24 and use a
calculator to check that 7! = 5040 .

Example 1.34. Prove by induction that 2ⁿ < n! for n ≥ 4.

Solution. Let P(n) be the statement: 2ⁿ < n!.
We first check that P(4) is true. This is correct since 4! = 24 > 16 = 2⁴.
Assume that k ≥ 4 and that P(k) holds so that 2ᵏ < k!. Then

2ᵏ⁺¹ = 2 · 2ᵏ
< 2 (k!)
< (k + 1) k!
= (k + 1)!,

where the second inequality uses 2 < k + 1. Hence 2ᵏ⁺¹ < (k + 1)! so P(k + 1) is true.
By induction P(n) is true for all n ≥ 4.

1.6.1 Exercises

Exercise 1.6.1 (∗). Let P(n) be the statement 3n − 100 ≥ 0. Write down P(1),
P(4) and P(40). Which of them are true?

Solution. P(1) states that 3(1) − 100 ≥ 0, which is false.
P(4) states that 3(4) − 100 ≥ 0, which is false.
P(40) states that 3(40) − 100 ≥ 0, which is true.

Exercise 1.6.2 (∗). Let {aₙ} be the sequence defined by a₁ = 3 and

aₙ₊₁ = 5aₙ − 8, n ≥ 1.

Let P(n) be the statement aₙ = 5ⁿ⁻¹ + 2. Write down P(1) and P(2). Which of
them are true?

Solution. P(1) states that a₁ = 5⁰ + 2, that is, a₁ = 3. Hence P(1) is true.
P(2) states that a₂ = 5¹ + 2, that is, a₂ = 7. Since a₂ = 5a₁ − 8 = 5(3) − 8 = 7,
P(2) is true.

Exercise 1.6.3 (∗). Use Mathematical Induction to prove that for any positive
integer n
1 + 2 + 2² + 2³ + . . . + 2ⁿ = 2ⁿ⁺¹ − 1.

Solution. Let P(n) be the statement

1 + 2 + 2² + 2³ + . . . + 2ⁿ = 2ⁿ⁺¹ − 1.

P(1) states that 1 + 2 = 2² − 1 = 3, which is correct.
Assume P(k) is true: 1 + 2 + 2² + 2³ + . . . + 2ᵏ = 2ᵏ⁺¹ − 1. Then by P(k),

1 + 2 + 2² + 2³ + . . . + 2ᵏ + 2ᵏ⁺¹ = 2ᵏ⁺¹ − 1 + 2ᵏ⁺¹.

Since 2ᵏ⁺¹ + 2ᵏ⁺¹ = 2ᵏ⁺²,

1 + 2 + 2² + 2³ + . . . + 2ᵏ + 2ᵏ⁺¹ = 2ᵏ⁺² − 1.

This shows that P(k + 1) follows from P(k).
By the principle of mathematical induction, P(n) is true for any positive integer
n.

Exercise 1.6.4 (∗). Prove by induction that 3ⁿ < n! for n ≥ 7.

Solution. Let P(n) be the statement 3ⁿ < n!.
We first check that P(7) is true. This is correct since 7! = 5040 and 3⁷ = 2187.
Assume that k ≥ 7 and that P(k) holds so that 3ᵏ < k!. Then

3ᵏ⁺¹ = 3 · 3ᵏ < 3 (k!) < (k + 1) k! = (k + 1)!.

Hence 3ᵏ⁺¹ < (k + 1)! so P(k + 1) is true. By induction P(n) is true for all n ≥ 7.

Exercise 1.6.5 (∗). Let $\{a_n\}$ be the sequence defined by $a_1 = x$ and

\[ a_{n+1} = \frac{a_n^2 + 12}{7}, \qquad n \ge 1, \]

where $3 < x < 4$. Prove by induction that $3 < a_n < 4$ for all $n$.

Solution. Let $P(n)$ be the statement: $3 < a_n < 4$.

Since $a_1 = x$ and $3 < x < 4$, $P(1)$ is true.
Assume that $P(k)$ holds, so that $3 < a_k < 4$. Then

\[ 7a_{k+1} = a_k^2 + 12 < 16 + 12 = 28, \]

so $a_{k+1} < 4$. Also,

\[ 7a_{k+1} = a_k^2 + 12 > 9 + 12 = 21, \]

so $a_{k+1} > 3$. Hence $P(k+1)$ is true. By induction $P(n)$ is true for all $n$.
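
The bound in Exercise 1.6.5 can be illustrated numerically. The sketch below iterates the recurrence with exact rational arithmetic (to avoid floating-point rounding) for a few starting values strictly between 3 and 4. The function name `sequence` and the chosen starting values are assumptions made for illustration.

```python
from fractions import Fraction

def sequence(x: Fraction, terms: int) -> list[Fraction]:
    """Return a_1, ..., a_terms for a_1 = x and a_{n+1} = (a_n**2 + 12) / 7."""
    values = [x]
    for _ in range(terms - 1):
        values.append((values[-1] ** 2 + 12) / 7)
    return values

# Try a few starting values x with 3 < x < 4 and check the first 10 terms.
for start in (Fraction(31, 10), Fraction(7, 2), Fraction(39, 10)):
    terms = sequence(start, 10)
    print(float(start), all(3 < a < 4 for a in terms))
```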

Exercise 1.6.6 (∗). For $n \ge 2$ the sequence $\{s_n\}$ is given by

\[ s_n = \left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right)\cdots\left(1 - \frac{1}{n}\right) \]

so that $s_2 = \frac{1}{2}$ and $s_3 = \frac{1}{3}$. By computing more values of $s_n$ if needed, guess a
formula for $s_n$ and use induction to prove it.

Solution. Using the formula for $s_n$ we obtain

\[ s_2 = \frac{1}{2}, \qquad s_3 = \frac{1}{3}, \qquad s_4 = \frac{1}{4}, \]

and we conjecture that $s_n = \frac{1}{n}$.
Let $P(n)$ be the statement: $s_n = \frac{1}{n}$.

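By the above calculations, $P(2)$ is true.

Assume that $P(k)$ holds, so that

\[ s_k = \left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right)\cdots\left(1 - \frac{1}{k}\right) = \frac{1}{k}. \]

Then

\[ s_{k+1} = \left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right)\cdots\left(1 - \frac{1}{k}\right)\left(1 - \frac{1}{k+1}\right) = \frac{1}{k}\left(1 - \frac{1}{k+1}\right). \]

Since

\[ \frac{1}{k}\left(1 - \frac{1}{k+1}\right) = \frac{1}{k} \cdot \frac{k}{k+1} = \frac{1}{k+1}, \]

$P(k+1)$ is true. By induction $P(n)$ is true for all $n \ge 2$.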

Exercise 1.6.7 (∗). Let {sn } be the sequence defined by

sn = 1(1!) + 2(2!) + 3(3!) + . . . + n(n!)

Check that s1 = 1 , s2 = 5 and s3 = 23 . By computing more values of sn if needed,


guess a formula for sn and use induction to prove it.

Solution. Using the formula for sn we obtain

s1 = 1 s2 = 5 s3 = 23 s4 = 119.

Since
2! = 2 3! = 6 4! = 24 5! = 120
we conjecture that sn = (n + 1)! − 1 .
Let P (n) be the statement: sn = (n + 1)! − 1 . The above calculations show
that P (n) is true for n = 1, 2, 3, 4 .
Assume that P (k) holds so that sk = (k + 1)! − 1 . Then

sk+1 = 1(1!) + 2(2!) + . . . + k(k!) + (k + 1) (k + 1)! = (k + 1)! − 1 + (k + 1) (k + 1)!.

Since

(k + 1)! − 1 + (k + 1) (k + 1)! = (k + 1)! [1 + k + 1] − 1 = (k + 2)! − 1,

P (k + 1) is true. By induction P (n) is true for all n .
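
The "compute more values, then guess" step of Exercise 1.6.7 is easy to mechanise. The sketch below evaluates s_n directly from its definition and prints it next to (n + 1)! − 1. The function name `s` mirrors the notation of the exercise; the code is only a check of the first few cases and proves nothing by itself.

```python
from math import factorial

def s(n: int) -> int:
    """Compute s_n = 1(1!) + 2(2!) + ... + n(n!) directly from the definition."""
    return sum(k * factorial(k) for k in range(1, n + 1))

# Tabulate n, s_n and the conjectured closed form (n + 1)! - 1.
for n in range(1, 6):
    print(n, s(n), factorial(n + 1) - 1)
```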



1.7 The Sum and Product Rules


Consider the following examples of counting problems.

Example 1.35. The mathematics teaching committee requires one student repre-
sentative from either the first year or the second year or the third year. If there
are 500 first year, 300 second year and 100 third year students, how many possible
different representatives are there?
Solution. There are 500 different choices for picking a representative from
the first year students, 300 for the second and 100 for the third year. Since the
students are all different and we have to pick only one representative, from any one
of the years, we can add the numbers of choices for each year to obtain
500 + 300 + 100 = 900 different possible representatives.

The solution to the above problem follows from a general rule.

The sum rule

If there are n(A) ways to do A and, distinct from them, n(B) ways
to do B, then the number of ways to do A or B is n(A) + n(B).
Similarly, one adds n(A), n(B), n(C) when one must do A or B or C;
and so on.

Consider a different problem.

Example 1.36. The mathematics student committee is composed of three students:
one from each of the first year, the second year and the third year. If there are 500
first year, 300 second year and 100 third year students, how many possible different
committees are there?
Solution. There are 500 choices for a representative from the first year students.
For each of those representatives there are 300 ways to pick a representative from
the second year students. So we have 500 × 300 = 150,000 ways to pick the two
representatives from the first and second year students. For each choice of those (for
each pair of representatives) we have 100 choices for a third year representative. In
total this gives us 150,000 × 100 = 15,000,000 different possible committees. Thus
in this problem we had to multiply the numbers for each respective choice:
500 × 300 × 100.

The solution to the above problem follows from a general rule.



The product rule

If there are n(A) ways to do A and n(B) ways to do B, then the
number of ways to do A and B is n(A) × n(B). It is assumed that
A and B are independent; that is, the number of choices in B is the
same regardless of which choice in A is taken. Similarly, one multiplies
three terms, n(A) × n(B) × n(C), if one has to make choices in A, B,
and C independently; and so on.

To solve the following example we need to use both of the above rules.

Example 1.37. In a certain computer system a file name must be a string of letters
and digits which is 1,2, or 3 symbols long. The first symbol must be a letter and
the system does not make any distinction between uppercase and lowercase letters.
How many legitimate file names are there?
Solution. A file name either has one symbol, or two symbols or three symbols. In
each case we have 26 choices for the first letter (the 26 letters of the alphabet).
The number of one-symbol names is then 26. The number of 2-symbol names is
26 × 36 = 936 because for the second symbol we can have either a letter or a
digit, 36 = 26 + 10 choices. We multiply the number of choices for the first and
for the second symbol because both choices are done independently. Similarly the
number of three-symbol names is 26 × 36 × 36 = 33696. Summing the numbers for
the three different cases (by the sum rule) we obtain that the system at hand can
accommodate 26 + 936 + 33696 = 34658 different files.
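
The count in Example 1.37 can be cross-checked by brute force. The sketch below computes the number of names twice: once with the sum and product rules, and once by listing every admissible name. The function names are illustrative assumptions, and lowercase letters stand in for the case-insensitive alphabet.

```python
from itertools import product
from string import ascii_lowercase, digits

LETTERS = ascii_lowercase            # 26 letters, case is ignored
SYMBOLS = ascii_lowercase + digits   # 36 symbols: a letter or a digit

def count_by_rules() -> int:
    """Product rule within each length, sum rule across lengths 1, 2 and 3."""
    return sum(len(LETTERS) * len(SYMBOLS) ** (length - 1) for length in (1, 2, 3))

def count_by_enumeration() -> int:
    """Brute force: generate every legitimate name and count it."""
    total = 0
    for length in (1, 2, 3):
        for first in LETTERS:
            # The remaining length - 1 symbols may each be a letter or a digit.
            for rest in product(SYMBOLS, repeat=length - 1):
                total += 1
    return total

print(count_by_rules(), count_by_enumeration())
```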

1.7.1 Exercises

Exercise 1.7.1 (∗). A binary word is made of 0’s and 1’s. How many binary words
are there with length exactly 7? How many binary words are there with length up
to 3?

Solution. For each digit we have two choices: 0 or 1. Thus by the product rule
we have $2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 = 2^7 = 128$ different words of length 7. To count the
number of words with length up to 3 we need to add up the numbers of words of
length 1, 2 and 3 (here we use the sum rule). Using the product rule as before we obtain
$2^1 + 2^2 + 2^3 = 14$ binary words with length up to 3.

Exercise 1.7.2. In a restaurant for the main dish there are 10 meat, 5 fowl, and 8
fish choices. How many ways are there to pick a main dish?

Exercise 1.7.3 (∗). A salesman is to visit 5 towns. Assuming that he goes to one
town after another, visiting all of them once, in how many orders can he make his
trip?

Solution. The salesman has 5 choices for the first town in his itinerary. Since each
town is to be visited only once he has 4 choices for the second town in the trip,
and so on. By the product rule we obtain 5 × 4 × 3 × 2 × 1 = 120 different trips.

Exercise 1.7.4 (∗). How many numbers between 1 and 1000 have exactly one 7 in
them?

Solution. The condition of the problem means that we are considering numbers of
up to 3 digits. Hence we can count separately the numbers at hand with length
1,2,3 and then add them up. Clearly we have only one such number of length 1: 7.
For two-digit numbers the digit 7 can either be the first digit or the second one. If
it stands first then we have 9 choices for the second digit: 0, 1, 2, 3, 4, 5, 6, 8, 9. If 7
is the second digit we have only 8 choices for the first digit because 0 is not allowed.
Thus we have 9 + 8 = 17 required numbers of length 2. For length 3 the digit 7 can
be in the 1st, 2nd or 3rd position. As before, taking care of the zero, we obtain
9 × 9 = 81, 8 × 9 = 72 and 8 × 9 = 72 different numbers for each case respectively. Finally, using the
sum rule we add up all the different cases and obtain 1 + 17 + 81 + 72 + 72 = 243.
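
Since the range in Exercise 1.7.4 is small, the answer can be confirmed by direct enumeration. The following sketch simply tests every integer from 1 to 1000; the helper name `has_exactly_one_seven` is hypothetical.

```python
def has_exactly_one_seven(n: int) -> bool:
    """True when the decimal representation of n contains the digit 7 exactly once."""
    return str(n).count("7") == 1

# Count the integers from 1 to 1000 (inclusive) with exactly one digit 7.
count = sum(1 for n in range(1, 1001) if has_exactly_one_seven(n))
print(count)
```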

1.8 Permutations and Combinations


Permutations and combinations are the basic building blocks of combinatorial mathematics.

Definition 1.38. A permutation of n objects taken k at a time is any ordered
selection of k distinct objects from a given set of n objects.
Note that the order in which the selection is made matters. Thus for a set of 5
letters {B, S, W, K, L} the lists (B, W, K) and (W, B, K) are two different permutations
of the 5 given letters taken 3 at a time.
Repetitions are not allowed. Thus for the above example the list (B, B, L) is not
a permutation. We will often say “ordered selection without repetitions” instead of
“permutation” as this is self-descriptive.

Example 1.39. In how many ways can a committee with 10 members select a
chairperson, a treasurer and an administrator?

Solution. Suppose that the first person to be selected is to be the chairperson. There
are 10 different ways to do it. Suppose the second person selected is the treasurer, for
whom we have 9 choices. Finally, suppose the last person chosen is the administrator,
for whom there are 8 possible people available. By the product rule the number of
selections is 10 × 9 × 8 = 720.

In the above example we had to make an ordered selection of 3 distinct people
from a group of 10 people. For general permutations the following theorem holds.

Theorem 1.40. The number of ordered selections of k distinct objects from a set
of n objects is n(n − 1)(n − 2) . . . (n − k + 2)(n − k + 1).

Remark 1.6. The above number can be alternatively written as

\[ n(n-1)(n-2)\cdots(n-k+2)(n-k+1) = \frac{n!}{(n-k)!} \tag{1.1} \]

where we used the factorial notation

\[ n! = n \times (n-1) \times (n-2) \times \dots \times 3 \times 2 \times 1. \]

Note that by convention one defines

0! = 1 .

With this convention the expression on the right hand side in (1.1) gives the same
result for n = k as the expression on the left hand side.

Proof of Theorem 1.40. There are n ways to select the first object. After the first
object is selected there are n − 1 objects left and there are respectively n − 1 choices
for the second object. We continue selecting until the last object for which there
are n − k + 1 choices. By the product rule these selections can be made in n(n −
1)(n − 2) . . . (n − k + 1) ways.
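
Theorem 1.40 and formula (1.1) can be compared on the committee example (n = 10, k = 3). The sketch below computes the count in four ways: as the running product from the theorem, via factorials, with the standard-library function `math.perm` (available from Python 3.8), and by enumerating all ordered selections. The function name `ordered_selections` is an illustrative assumption.

```python
from itertools import permutations
from math import factorial, perm

def ordered_selections(n: int, k: int) -> int:
    """n(n-1)...(n-k+1), computed as a running product of k factors."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

n, k = 10, 3
print(ordered_selections(n, k))                    # product from Theorem 1.40
print(factorial(n) // factorial(n - k))            # right-hand side of (1.1)
print(perm(n, k))                                  # standard-library equivalent
print(sum(1 for _ in permutations(range(n), k)))   # brute-force enumeration
```

For k = 0 the running product is empty and the function returns 1, which agrees with n!/n! = 1.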

Remark 1.7. Observe that using formula (1.1) for k = n we find that the number of
different ways to order n objects is n!.

Consider now a different kind of selection, called a combination.

Definition 1.41. A combination of n objects taken k at a time is any selection of
k distinct objects from a given set of n objects.

In this case the order of objects selected does not matter. We will often say
“unordered selection without repetitions” rather than “combination”.

Theorem 1.42. The number of unordered selections of k objects from a set of n
objects without repetitions is
\[ \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!}. \tag{1.2} \]
The combination of factorials on the right hand side is called a binomial coefficient
and is denoted
\[ \binom{n}{k} := \frac{n!}{(n-k)!\,k!}. \tag{1.3} \]
Proof. We already know that the number of ordered selections is given by formula
(1.1). To pick an ordered selection of k elements one can first make an unordered
selection of k elements and then choose an order. We know that there are k! different
ways to order k elements. Thus by the product rule
\[ \frac{n!}{(n-k)!} = \binom{n}{k} \times k! \]
where $\binom{n}{k}$ stands for the number of unordered selections. Dividing both sides by k!
we obtain formula (1.2).


The binomial coefficients (1.3) appear in the binomial formula that gives the coefficients
in the expansion of a power
\[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}. \tag{1.4} \]

For example
\[ (x + y)^5 = \binom{5}{0} x^0 y^5 + \binom{5}{1} x^1 y^4 + \binom{5}{2} x^2 y^3 + \binom{5}{3} x^3 y^2 + \binom{5}{4} x^4 y^1 + \binom{5}{5} x^5 y^0 \]
\[ = y^5 + 5xy^4 + 10x^2 y^3 + 10x^3 y^2 + 5x^4 y + x^5. \]

Example 1.43. In the card game bridge, a hand has 13 cards. There are 52 cards
in a deck. How many different bridge hands are there?
Solution. We have to make an unordered selection of 13 cards out of 52. Repetitions
are not allowed. By the above theorem there are $\binom{52}{13} = 635013559600$ different
bridge hands.

Example 1.44. A full house in poker is a collection of 5 cards in which 3 of them
are from one denomination and 2 from another. In a pack of cards there are 13
denominations (2,3,...,queen,king,ace) and 4 cards of each. How many different full
houses are there?

Solution. The order in which we pick cards for a full house does not matter so we
can specify a convenient order. Let us first pick a denomination for three cards.
There are 13 choices. Next we have to pick 3 cards out of 4 in this denomination.
Since the order does not matter we have $\binom{4}{3} = 4$ choices. Next let us pick the second
denomination. We have 12 choices for it. To pick 2 cards out of 4 in the chosen
denomination we have $\binom{4}{2} = 6$ choices. Finally by the product rule we multiply
through the choices to obtain 13 × 4 × 12 × 6 = 3744, the number of different full
houses.
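
The two card-counting examples above can be reproduced with the standard-library function `math.comb` (available from Python 3.8), which evaluates the binomial coefficient (1.3). The variable names below are illustrative.

```python
from math import comb

# Example 1.43: unordered selection of 13 cards out of 52.
bridge_hands = comb(52, 13)

# Example 1.44: denomination for the triple, 3 of its 4 cards,
# a different denomination for the pair, 2 of its 4 cards.
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)

print(bridge_hands, full_houses)
```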

1.8.1 Exercises

Exercise 1.8.1. Find in how many ways an ordered selection of 2 elements out of
a set of 5 elements can be made. No repetitions are allowed. How many unordered
selections of 2 elements from the same set are there?

Exercise 1.8.2. There are 14 available ingredients for making perfume. How many
different perfumes can be made if one is to use 4 ingredients for each one?

Exercise 1.8.3. A travelling salesman is to visit any 5 different cities from a list
of 10. How many different itineraries can he have if he visits the cities successively
one after another?

Exercise 1.8.4 (∗). What is the number of ways in which a subset of one or two
elements can be picked up from a set of n elements?

Solution. Clearly there are n ways to pick a subset of one element. To pick a subset
of two elements we have to make an unordered selection with no repetitions of 2
elements out of n. Hence we have $\binom{n}{2} = \frac{n!}{2 \times (n-2)!} = \frac{n(n-1)}{2}$ two-element subsets. By
the sum rule we obtain $n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2}$.

Exercise 1.8.5 (∗). A university department has 12 male professors and 3 female
professors. It is decided that every committee of professors in this department should
have at least one female member. How many different committees of 12 people can
be formed?

Solution. If we temporarily disregard the requirement that there is at least one
female professor in the committee we have $\binom{15}{12} = 455$ different committees. To
ensure the condition of having at least one female professor we need to subtract
from the above number the number of all-male committees, which is just 1 as there
are only 12 male professors. Hence the final answer is 454.

Exercise 1.8.6. The computer science students are entering a team of 5 in the
university relay race. There are 20 undergraduates and 10 postgraduates. How
many teams are possible if
(a)(∗) any student is eligible
(b) there must be 2 postgraduates and 3 undergraduates

Solution. (a) We have an unordered selection without repetitions of 5 students out
of 30 = 10 + 20. This gives us $\binom{30}{5} = 142506$ possible teams.

Exercise 1.8.7 (∗). How many ways are there to pick a combination of k distinct
numbers from {1, 2, . . . , n} if 1 and 2 cannot both be picked?

Solution. We are dealing here with unordered selections. If we temporarily disregard
the condition on 1 and 2 not being picked together we obtain $\binom{n}{k}$ selections. To take
care of the additional condition we must subtract from that result the number of
forbidden combinations. In such combinations both 1 and 2 are picked, and the
remaining k − 2 numbers are chosen from the other n − 2. The number
of all such combinations is $\binom{n-2}{k-2}$. Therefore the final answer is $\binom{n}{k} - \binom{n-2}{k-2}$.
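
The "count everything, then subtract the forbidden cases" argument of Exercise 1.8.7 can be tested against a brute-force enumeration for small n. In the sketch below the function names are hypothetical, and the case k < 2 is handled separately because no selection of fewer than two numbers can contain both 1 and 2.

```python
from itertools import combinations
from math import comb

def count_formula(n: int, k: int) -> int:
    """All k-subsets of {1..n} minus those containing both 1 and 2 (assumes n >= 2)."""
    forbidden = comb(n - 2, k - 2) if k >= 2 else 0
    return comb(n, k) - forbidden

def count_brute_force(n: int, k: int) -> int:
    """Enumerate the k-subsets of {1..n} and keep those without both 1 and 2."""
    return sum(
        1
        for c in combinations(range(1, n + 1), k)
        if not (1 in c and 2 in c)
    )

print(count_formula(6, 3), count_brute_force(6, 3))
```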

1.9 The Inclusion-Exclusion Principle

If a set A contains a finite number of elements then it is called a finite set and we
write |A| for the number of elements in it. An infinite set is a set that is not finite.
The number |A| is called the cardinality of A. Hence |{x, y, z}| = 3 and the set of
positive integers is infinite.

Inclusion-Exclusion Principle

If A and B are finite sets, then

|A ∪ B| = |A| + |B| − |A ∩ B|. (1.5)

For instance, suppose we want to find the number of cards in an ordinary deck
that are clubs or aces. Let A be the set of cards that are aces and let C be the set
of cards that are clubs.
We have |A| = 4 , |C| = 13 and |A ∩ C| = 1 (the ace of clubs). Then the
Inclusion-Exclusion Principle gives us
|A ∪ C| = |A| + |C| − |A ∩ C| = 4 + 13 − 1 = 16.

Example 1.45. A survey in Scotland finds that 86% of the population have heard
of product A while 80% have heard of product B. Also, 70% of the population
have heard about both products. What percentage of the population have not
heard about either product?
Solution. Let A be the set of people who have heard about product A and define
B similarly. Let N be the population of Scotland.
From the given data, |A| = (0.86)N , |B| = (0.8)N and |A ∩ B| = (0.7)N .
Then

|A ∪ B| = |A| + |B| − |A ∩ B| = (0.86)N + (0.8)N − (0.7)N = (0.96)N.

Hence 96% of the population have heard of either product A or product B and 4%
have not heard about either product.

The Inclusion-Exclusion Principle for the union of three sets reads


|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| (1.6)

In the lectures I will indicate why (1.6) holds by means of a Venn diagram.

Example 1.46. How many integers from 1 to 100 are divisible by 2 or 3 or 5?


Solution. Let U = {1, 2, . . . , 100} and
A = {n ∈ U | 2 divides n} , B = {n ∈ U | 3 divides n} , C = {n ∈ U | 5 divides n} .
We need to find |A ∪ B ∪ C|.
We have that |A| = 100/2 = 50 , |C| = 100/5 = 20 , and |B| is the integer part of
100/3 , that is |B| = 33 .

For a number to be in the set A ∩ B it must be divisible by 6. Hence |A ∩ B| = 16 .


Numbers in the set A ∩ B ∩ C are divisible by 30. Hence |A ∩ B ∩ C| = 3 .
Similarly, |A ∩ C| = 10 and |B ∩ C| = 6 so

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|

and
|A ∪ B ∪ C| = 50 + 33 + 20 − 16 − 10 − 6 + 3 = 74.
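
The count in Example 1.46 can be checked in two ways: by testing every integer in the range, and by evaluating formula (1.6) with integer parts. The sketch below does both. The function names are illustrative, and the inclusion-exclusion version assumes the three divisors are pairwise coprime, so that being divisible by two of them means being divisible by their product.

```python
def count_divisible(limit: int, divisors: tuple[int, ...]) -> int:
    """Count n in 1..limit divisible by at least one of the divisors (brute force)."""
    return sum(
        1 for n in range(1, limit + 1) if any(n % d == 0 for d in divisors)
    )

def inclusion_exclusion_three(limit: int, a: int, b: int, c: int) -> int:
    """Formula (1.6) for multiples of a, b, c; assumes a, b, c are pairwise coprime."""
    def multiples(d: int) -> int:
        return limit // d  # integer part of limit / d

    return (
        multiples(a) + multiples(b) + multiples(c)
        - multiples(a * b) - multiples(a * c) - multiples(b * c)
        + multiples(a * b * c)
    )

print(count_divisible(100, (2, 3, 5)), inclusion_exclusion_three(100, 2, 3, 5))
```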

Example 1.47. At the University of Scabia all first year computer science students
have to study at least one of the three modules : accounting, ethics and mathematics.
There are 237 first year computer science students. Also, 150 study maths, 120
study accounting, 100 study ethics, 50 study both accounting and maths, 20 study
both accounting and ethics and 70 study both maths and ethics.

(a) Find the number who study all three subjects.

(b) Find the number who study accounting and maths but not ethics.

Solution. Let A be the set of students studying accounting, M those doing maths
and E those doing ethics. From the given data,

|A| = 120 |E| = 100 |M | = 150

|A ∩ M | = 50 |A ∩ E| = 20 |E ∩ M | = 70
Also, since everyone is studying at least one of the three modules,

|A ∪ E ∪ M | = 237

Using

|A ∪ E ∪ M | = |A| + |E| + |M | − |A ∩ E| − |A ∩ M | − |E ∩ M | + |A ∩ E ∩ M |

237 = 120 + 100 + 150 − 20 − 50 − 70 + |A ∩ E ∩ M |


|A ∩ E ∩ M | = 7
Hence there are 7 students who study all three subjects.
The set of students not studying ethics is the complement Ē .
Hence the number who study accounting and maths but not ethics is |A ∩ M ∩ Ē| .
Let X = A ∩ M ∩ E and Y = A ∩ M ∩ Ē . Then

X ∩ Y = ∅ ,   X ∪ Y = A ∩ M .

By (1.5) , |X ∪ Y | = |X| + |Y | − |X ∩ Y | so

|A ∩ M | = |A ∩ M ∩ E| + |A ∩ M ∩ Ē|

Hence
50 = 7 + |A ∩ M ∩ Ē|
and |A ∩ M ∩ Ē| = 43 .

1.9.1 Exercises

Exercise 1.9.1 (∗). A and B are sets with |A| = 30 and |B| = 20 . Find |A ∪ B|
if

(a) A ∩ B = ∅ ,

(b) |A ∩ B| = 5 ,

(c) B ⊂ A .

Solution. (a) If A ∩ B = ∅ then |A ∪ B| = |A| + |B| = 50 .


(b) |A ∪ B| = |A| + |B| − |A ∩ B| = 45 .
(c) If B ⊂ A then A ∪ B = A and |A ∪ B| = |A| = 30 .

Exercise 1.9.2 (∗). A , B and C are sets with |A| = |B| = |C| = 100 .
Find |A ∪ B ∪ C| in the following cases.
(a) A = B = C .
(b) A ∩ B = A ∩ C = B ∩ C = ∅ .
(c) |A ∩ B| = |A ∩ C| = |B ∩ C| = 50 and A ∩ B ∩ C = ∅ .
(d) |A ∩ B| = |A ∩ C| = |B ∩ C| = 50 and |A ∩ B ∩ C| = 20 .
Solution. (a) If A = B = C then A ∪ B ∪ C = A and |A ∪ B ∪ C| = |A| = 100 .
(b) If A ∩ B = A ∩ C = B ∩ C = ∅ then
|A ∪ B ∪ C| = |A| + |B| + |C| = 300

(c) |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| = 150 .


(d) |A∪B ∪C| = |A|+|B|+|C|−|A∩B|−|A∩C|−|B ∩C|+|A∩B ∩C| = 170 .

Exercise 1.9.3 (∗). A , B and C are sets with |A| = 10, |B| = 100, |C| = 1000 .
Find |A ∪ B ∪ C| if

(a) A ⊂ B and B ⊂ C ,

(b) A ∩ B = A ∩ C = B ∩ C = ∅ ,

(c) |A ∩ B| = |A ∩ C| = |B ∩ C| = 5 and |A ∩ B ∩ C| = 1 .

Solution. (a) If A ⊆ B and B ⊆ C then A ∪ B ∪ C = C and |A ∪ B ∪ C| =


|C| = 1000 .
(b) |A ∪ B ∪ C| = |A| + |B| + |C| = 1110 .
(c) |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C| = 1096 .

Exercise 1.9.4 (∗). How many integers from 1 to 1000 are divisible by 7 or 11?

Solution. For integers between 1 and 1000, let A be the set of those divisible by 7
and B be the set of those divisible by 11.
We need to find |A ∪ B|. Now, |A| is the integer part of 1000/7 , that is
|A| = 142 . Also |B| = 90 . For a number to be in the set A ∩ B it must be divisible
by 77. Hence |A ∩ B| = 12 .

|A ∪ B| = |A| + |B| − |A ∩ B| = 142 + 90 − 12 = 220.

Exercise 1.9.5 (∗). How many integers from 1 to 200 are divisible by 2 or 5 or 7?

Solution. For integers between 1 and 200, let A be the set of those divisible by 2,
B the set of those divisible by 5 and C the set of those divisible by 7. We need to
find |A ∪ B ∪ C|.
We have that |A| = 200/2 = 100 , |B| = 40 and |C| = 28 .
For a number to be in the set A∩B it must be divisible by 10. Hence |A∩B| = 20.
Similarly, |A ∩ C| = 14 and |B ∩ C| = 5.
Numbers in the set A ∩ B ∩ C are divisible by 70. Hence |A ∩ B ∩ C| = 2 .

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|

|A ∪ B ∪ C| = 100 + 40 + 28 − 20 − 14 − 5 + 2 = 131.

Exercise 1.9.6 (∗). There are 73 students in the first year Humanities class at the
University of Scabia. Among them a total of 52 can play the piano, 25 can play
the violin, and 20 can play the flute; 17 can play the piano and violin, 12 can play
the piano and flute, and 7 can play the violin and flute. Only Hamish Ratho can
play all three instruments. How many of the class cannot play any of them?

Solution. Let A be the set of students who can play the piano, B the set of
students who can play the violin and C the set of students who can play the flute.
Then
|A| = 52 |B| = 25 |C| = 20
|A ∩ B| = 17 |A ∩ C| = 12 |B ∩ C| = 7 |A ∩ B ∩ C| = 1.
Using

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|,

we have that

|A ∪ B ∪ C| = 52 + 25 + 20 − 17 − 12 − 7 + 1 = 62.

The number of students who cannot play any of the three instruments is 73 − 62 =
11 .

Exercise 1.9.7 (∗). At the University of Scabia all first year students have to
study at least one of the three modules : computing, physics and mathematics.
There are 1400 first year students. Also, 700 study maths, 600 study physics,
500 study computing, 200 study both physics and maths, 150 study both physics
and computing and 120 study both maths and computing.

(a) Find the number who study all three subjects.

(b) Find the number who study computing and maths but not physics.

Solution. Let A be the set of students studying maths, B those doing physics and
C those doing computing.
From the given data,

|A| = 700 |B| = 600 |C| = 500

|A ∩ B| = 200 |A ∩ C| = 120 |B ∩ C| = 150 |A ∪ B ∪ C| = 1400


Using

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|



we have that

1400 = 700 + 600 + 500 − 200 − 120 − 150 + |A ∩ B ∩ C|,

so that |A ∩ B ∩ C| = 70 . Hence there are 70 students who study all three subjects.
The set of students not studying physics is the complement B̄ . Hence the number who study
computing and maths but not physics is |A ∩ C ∩ B̄| .
Let X = A ∩ C ∩ B and Y = A ∩ C ∩ B̄ . Then

X ∩ Y = ∅ ,   X ∪ Y = A ∩ C.

Using
|X ∪ Y | = |X| + |Y | − |X ∩ Y | = |X| + |Y |
we have that
|A ∩ C| = |A ∩ C ∩ B| + |A ∩ C ∩ B̄|.
Hence |A ∩ C ∩ B̄| = 120 − 70 = 50 .
Chapter 2

Probability

2.1 Probability Space


Probability theory provides mathematical models for understanding uncertain situations.
We call such situations random experiments. Some examples of random
experiments are:

• flipping of a coin;

• rolling a die;

• next week’s lottery draw;

• running a computer program on data supplied by customers: the amount of
input is random, so measuring the time the program needs to process it is a
random experiment;

• the top link in a Google search (it will either lead to the desired information
or it won’t);

• generating a number using the C++ random number function “rand()”.

A probability model, although it cannot be used to determine (in advance) the
outcome of a random experiment, should capture the essential features of the random
experiment. Thus it has to be consistent with any observable “statistical regularity”
displayed by the random phenomenon.

A probability model for a random experiment has two ingredients: a sample
space and a probability function.

Definition 2.1. The sample space S for a random experiment is the set of all
possible outcomes of the experiment.

Here are some examples of sample spaces:


Experiment Sample space

Flip a coin once S = {H, T }

Roll a die once S = {1, 2, 3, 4, 5, 6}

Flip a coin repeatedly until you obtain a “head” S = {H, T H, T T H, T T T H, . . . }


The sample spaces in these examples are all discrete (the elements can be counted)
with the first two spaces being finite and the last one infinite. In applications a sample
space can also be a continuous set. In this course we will only deal with discrete
sample spaces.
Definition 2.2. Any subset of a sample space is called an event.
Consider for example the experiment of rolling a die. The sample space is S =
{1, 2, 3, 4, 5, 6}. The subset A = {2, 4, 6} is the event that the outcome of the roll is
even. The subset B = {1, 2, 3, 4} is the event that the outcome is at most 4. A ∩ B = {2, 4}
is the event that the outcome is even and at most 4. A ∪ B = {1, 2, 3, 4, 6} is the event
that the outcome is even or at most 4. Note that any element of the sample space S
can be viewed as a one-element subset and thus also represents an event. For example,
in the die-rolling model C = {2} is the event that the outcome is 2.

Example 2.3 (*). Suggest suitable sample spaces, and identify the subset corresponding
to the event A, for the following situations:
(a) A coin, which shows heads (H) or tails (T), is tossed three times; A is the event
that the coin shows heads exactly twice.
(b) A game of football is played; A is the event that the match is drawn.
(c) A couple have two children; A is the event that both are girls.
Solution. (a) We can take

S = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }

and
A = {HHT, HT H, T HH}.

(b) One option is to take

S = {(n, m) | n, m are integers ≥ 0}

where n is the first team’s score and m is the second team’s score. In this case

A = {(n, m) | n = m}.

Alternatively we can consider S = {W, D, L}, where W means the first team
wins, D means the match is drawn and L means the first team loses. In this case A = {D}.

(c) We can take S = {GG, GB, BG, BB} where, for example, GB means first child
is a girl, second child is a boy. In this case, A = {GG}. Alternatively we can
take S = {0, 1, 2} with each sample point corresponding to number of girls born.
In this case, A = {2}.

The second ingredient in a probability model is the probability function.

Definition 2.4. The probability function is a function P (E) that assigns a value
between 0 and 1 to any event E ⊂ S. The value 0 ≤ P (E) ≤ 1 is called the
probability of event E.

P (E) indicates, on a scale from 0 to 1, how “likely” it is that the outcome of
the random experiment will be some element of the event E ⊂ S. P (E) = 0 means
that it is impossible that the outcome will be in set E. P (E) = 1 means that it is
certain that the outcome will be in set E.
The values that a probability function takes have to satisfy certain consistency
requirements, formulated below.

Probability axioms

1. For all subsets E ⊂ S of the sample space 0 ≤ P (E) ≤ 1;

2. P (∅) = 0, P (S) = 1;

3. For any two disjoint subsets: A, B ⊂ S, A ∩ B = ∅ we have

P (A ∪ B) = P (A) + P (B) .

Axiom 3 can be extended to the case where one has more than two events: if the
events E1 , E2 , . . . En are mutually exclusive, that is Ei ∩ Ej = ∅ for any i ̸= j, then

P (E1 ∪ E2 ∪ · · · ∪ En ) = P (E1 ) + P (E2 ) + · · · + P (En ) .

In particular for any finite subset E the probability P (E) is given by the sum of
probabilities assigned to its individual points.
This gives a recipe for specifying and computing a probability function. By
Axiom 2, together with the additivity just stated, the sum of the probabilities
assigned to all sample points must be equal to 1:
\[ \text{for } S = \{1, 2, 3, \dots, n\} \text{ one has } \sum_{i=1}^{n} p_i = 1, \tag{2.1} \]
where we used the notation $p_i = P(\{i\})$. To specify a probability space it suffices
to specify the values $p_i \ge 0$ for each of its sample points, satisfying (2.1).

Example 2.5. Consider again the experiment in which we roll a die. The sample
space is S = {1, 2, 3, 4, 5, 6}. If we think that the die is fair, that is all possibilities
are equally likely, then we should set $p_1 = p_2 = \dots = p_6 = \frac{1}{6}$. Then condition (2.1)
is satisfied.

In general for a finite sample space S with N elements one can define an equiprobable
probability function by assigning $p_i = \frac{1}{N}$ for each sample point i ∈ S. This
means that all outcomes are equally likely.

WARNING: not all probability functions on finite sets are equiprobable!

For any event E we call its complement Ē = S \ E the event “not E” (any outcome
other than those in E). The following theorem holds.

Theorem 2.6. For any event E ⊂ S we have

P (Ē) = 1 − P (E) . (2.2)
Proof. The sample space S can be represented as a union of two disjoint sets: S =
E ∪ Ē. Therefore by Axioms 2 and 3: 1 = P (S) = P (E) + P (Ē). Now subtract
P (E) from both sides to obtain (2.2).
Axiom 3 tells us about the probability of a union of two sets when those sets are
disjoint. A natural question arises — what can be said about P (A ∪ B) when A and
B are not disjoint? The answer is given by the following theorem.

Theorem 2.7. For arbitrary events A, B ⊂ S we have


P (A ∪ B) = P (A) + P (B) − P (A ∩ B) . (2.3)
Proof. From the Venn diagram it can be seen that we can represent each set A, B,
A ∪ B in terms of disjoint unions:
A = (A \ B) ∪ (A ∩ B) , B = (B \ A) ∪ (A ∩ B) ,
A ∪ B = (A \ B) ∪ (A ∩ B) ∪ (B \ A) .
By axiom 3 we have
P (A) = P (A \ B) + P (A ∩ B) , P (B) = P (B \ A) + P (A ∩ B) ,
P (A ∪ B) = P (A \ B) + P (A ∩ B) + P (B \ A)
Adding together the first two equations and subtracting the third one we obtain
P (A) + P (B) − P (A ∪ B) = [P (A \ B) + P (A ∩ B)] + [P (B \ A) + P (A ∩ B)]
−[P (A \ B) + P (A ∩ B) + P (B \ A)]
= P (A ∩ B)
and (2.3) follows by rearranging the terms.
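
To see (2.3) at work on a concrete case, the sketch below uses the fair-die sample space with the events A = {2, 4, 6} and B = {1, 2, 3, 4} and the equiprobable probability function. Exact fractions are used so that both sides can be compared without rounding. The helper name `prob` is an illustrative assumption.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die

def prob(event: set[int]) -> Fraction:
    """Equiprobable probability function: |E| / |S| for an event E within S."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}       # outcome is even
B = {1, 2, 3, 4}    # outcome is at most 4

lhs = prob(A | B)                         # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)     # P(A) + P(B) − P(A ∩ B)
print(lhs, rhs)
```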

Note the similarity of the statement of the last theorem to the Inclusion-Exclusion
Principle. The similarity extends even further. We will be using the following
theorem.
Theorem 2.8. For any three events A, B, and C we have

P (A∪B∪C) = P (A)+P (B)+P (C)−P (A∩B)−P (A∩C)−P (B∩C)+P (A∩B∩C) .

The proof uses the same idea as in the proof of Theorem 2.7. You are warmly
encouraged to try and write it down yourself as an exercise!

Example 2.9 (*). If P (A) = 2/3 and P (B) = 3/4 what are the largest and the
smallest values that P (A ∩ B) can take?
Solution. The intersection A ∩ B is a subset of A and also a subset of B. Thus
its probability can be at most P (A) and at most P (B). So $P(A \cap B) \le \frac{2}{3} = P(A)$,
the smaller of the two. Moreover it takes that value if A ⊂ B, in
which case $P(A \cap B) = P(A) = \frac{2}{3}$. Thus the largest P (A ∩ B) can be is 2/3. Next
consider the equality $P(A \cap B) = P(A) + P(B) - P(A \cup B) = \frac{17}{12} - P(A \cup B)$. The
largest P (A ∪ B) can be is 1. Thus the smallest P (A ∩ B) can be is $\frac{17}{12} - 1 = \frac{5}{12}$.

Example 2.10. One often hears that the theory of probability started in the sev-
enteenth century, when a French nobleman, the Chevalier de Méré, proposed the
following problem in 1654 to his friend Pascal: Why is one more likely to obtain a
6 in four throws of a die than to obtain a double 6 in 24 throws of two dice? This
problem is known as de Méré’s paradox. We use the word paradox, because, based
on the fact that there are 6 possible results when we roll a die and 36 possible results
when we roll two dice, some people thought that the two events above should have
the same probability. Indeed, notice that the number of throws, divided by the num-
ber of possible results, is equal to 2/3 in both cases (4/6 = 24/36 = 2/3 = 0.66...).
Nowadays, we can easily compute the probability of each event. We find that the
probability of obtaining at least one 6 in four rolls of a (fair or non-biased) die is
$1 - (5/6)^4 = 671/1296 \approx 0.5177$, while the probability of getting at least one double
6 in throwing two dice 24 times is $1 - (35/36)^{24} \approx 0.4914$. One can conclude that
the Chevalier de Méré must have spent a lot of time throwing dice to discover such
a small difference.
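
The two probabilities in Example 2.10 can be checked numerically. The following is a minimal sketch, assuming independent throws of fair dice; the helper name `at_least_one_success` is ours and is used only for illustration.

```python
from fractions import Fraction

def at_least_one_success(p_single, throws):
    """Probability of at least one success in `throws` independent attempts."""
    return 1 - (1 - p_single) ** throws

# At least one six in four throws of a single die.
p_single_six = at_least_one_success(Fraction(1, 6), 4)
# At least one double six in 24 throws of two dice.
p_double_six = at_least_one_success(Fraction(1, 36), 24)

print(p_single_six, float(p_single_six))
print(float(p_double_six))
```

Using exact fractions avoids rounding until the final conversion to a decimal value.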

2.1.1 Exercises

Exercise 2.1.1 (*). In the following experiments decide what are the outcomes and
state if they are equally likely.

(a) A red, blue and a yellow ball are put into a bag and a ball drawn out at random.

(b) Two yellow balls, one red ball and one blue are in a bag. One ball is drawn at
random.

(c) Same collection of balls. Two balls are drawn at once.

(d) Two children throw a die to see who starts the game. If the die shows a prime
number Joe starts, otherwise Ellen starts.

Solution. (b) We can take the outcomes to be {Y, R, B} — the colour of the ball
drawn. It is natural to assume that drawing each ball is equiprobable, that is 1/4.
As there are two yellow balls in the bag there is a larger probability of drawing a
yellow ball — 1/2 = 1/4 × 2 — versus 1/4 — the probability of drawing the blue
ball — and 1/4 — the probability of drawing the red ball.
(c) The outcomes now are unordered pairs {Y Y, Y B, Y R, BR}. The probability
of drawing the pair Y B is the same as the probability of drawing Y R, and it is twice
as large as the probability of drawing Y Y (which is the same as the probability of
drawing BR). Indeed, of the 6 equally likely pairs of balls, two give Y B, two give
Y R, one gives Y Y and one gives BR.

Exercise 2.1.2 (*). Prove that if A ⊂ B then P (A) ≤ P (B).

Solution. We can represent B as the disjoint union

B = A ∪ (B \ A).

Hence by the third axiom of probability we have P (B) = P (A) + P (B \ A). Since
by the first axiom P (B \ A) ≥ 0, we conclude that P (B) ≥ P (A).

Exercise 2.1.3 (*). A weather forecaster assigns probabilities as follows:

P (A) = P ( rain today ) = 30% , P (B) = P ( rain tomorrow ) = 40% ,

P (C) = P ( rain today and tomorrow ) = 20% .


What is the probability that it will rain tomorrow but not today?

Solution. We are given P (A) = 0.3, P (B) = 0.4, P (A ∩ B) = 0.2. We are asked
to find P (B ∩ Ā). The set B can be represented as a disjoint union: B = (B ∩
A) ∪ (B ∩ Ā). Thus using P (B) = P (B ∩ A) + P (B ∩ Ā) we find P (B ∩ Ā) =
P (B) − P (B ∩ A) = 0.4 − 0.2 = 0.2. This means that there is 20% chance that it
will rain tomorrow but not today.

Exercise 2.1.4 (*). Events A and B are such that P (A) = 0.75, P (B) = 0.65.
Explain why events A and B cannot be mutually exclusive. (Two events A and B
are called mutually exclusive if A ∩ B = ∅.) What can you say about P (A ∪ B) and
P (A ∩ B)?

Solution. If the events A and B were mutually exclusive then by formula (2.3) we
would have P (A ∪ B) = P (A) + P (B) but this is impossible because P (A) + P (B) =
1.4 > 1. As in Example 2.9, the probability P (A∩B) is the largest when the set with
smaller probability, the set B, is a subset of the set with the larger probability, the set
A. In this case P (A ∩ B) = 0.65. In the same situation P (A ∪ B) takes the smallest
possible value: 0.75. The largest P (A ∪ B) can be is 1. By formula (2.3) this is the
situation when P (A ∩ B) = P (A) + P (B) − P (A ∪ B) = 0.75 + 0.65 − 1 = 0.4 is the
smallest. All in all, we found that 0.4 ≤ P (A ∩ B) ≤ 0.65 and 0.75 ≤ P (A ∪ B) ≤ 1.

Exercise 2.1.5. Events A, B and C are such that P (A) = 0.7, P (B) = 0.6,
P (C) = 0.5, P (A∩B) = 0.4, P (A∩C) = 0.3, P (B∩C) = 0.2 and P (A∩B∩C) = 0.1.
Find

(a) P (A ∪ B),

(b) P (A ∪ B ∪ C),

(c) P (Ā ∩ B̄ ∩ C).

Exercise 2.1.6. An experiment consists of three successive throws of a die. Considering
each outcome as equally probable, find

(a) the probability of having 3 ‘sixes’,

(b) the probability of having at least one ‘six’, and

(c) the probability of having two ‘sixes’ one after another.

2.2 Conditional Probability and Independence


Consider the following example.

Example 2.11 (*). A fair coin is tossed twice. Let A be the event that at least one
head is obtained. Suppose at the end of the experiment we are told that A has occurred
(but we have not seen the actual result). What are the probabilities of other events?
Solution. In light of the occurrence of A we should revise the probabilities of other
events. We define a new conditional probability function P (·|A) given that event A
has occurred:
Sample point    P(·)    P(·|A)
HH              1/4     1/3
TH              1/4     1/3
HT              1/4     1/3
TT              1/4     0
How did we get the revised probabilities given that A has occurred? Sample
points outside A (the point TT) are assigned probability 0 (since we know that A has
occurred). Sample points inside event A have their original probabilities multiplied
by a constant in such a way that they still add up to 1. The original probabilities total
3/4, so if we divide each by 3/4 we get revised probabilities which sum to 1.

Note that 3/4 — the factor we had to divide by — is P (A). This is not a
coincidence. Conditional probability is constructed according to the following general
rule:
\[ P(B|A) = \frac{P(B \cap A)}{P(A)} . \tag{2.4} \]
(It is assumed that P (A) ̸= 0.)

Example 2.12 (*). A fair coin is tossed three times. What is the probability that
the first toss was tails given that at least one head is obtained?
Solution. We have that A = {HHH, HHT, HT H, HT T, T HH, T HT, T T H} is the
event that at least one head has occurred and B = {T HH, T HT, T T H, T T T } is the
event that the first toss was tails. Their intersection is A∩B = {T HH, T HT, T T H}.
Since each point has probability 1/8 we have P (A) = 7/8, P (A ∩ B) = 3/8. Hence
by the above general formula
\[ P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{3/8}{7/8} = \frac{3}{7} . \]
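
The computation in Example 2.12 can be reproduced by listing the sample space explicitly. This is a minimal sketch, assuming the equiprobable model of three fair coin tosses; the helper names `prob` and `cond_prob` are ours.

```python
from fractions import Fraction
from itertools import product

# Equiprobable sample space: all strings of three tosses.
sample_space = ["".join(s) for s in product("HT", repeat=3)]

def prob(event):
    """Probability of an event (a set of sample points) under the uniform model."""
    return Fraction(len(event), len(sample_space))

def cond_prob(b, a):
    """P(B|A) = P(B ∩ A) / P(A), assuming P(A) != 0."""
    return prob(b & a) / prob(a)

A = {s for s in sample_space if "H" in s}       # at least one head
B = {s for s in sample_space if s[0] == "T"}    # first toss is tails

print(cond_prob(B, A))
```

Representing events as Python sets makes the intersection in formula (2.4) a literal set intersection.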

Example 2.13 (*). An urn contains six red balls and three blue balls. Three balls
are drawn one at a time from the urn (without replacement). What is the probability
that the first ball is red? How is your answer modified if you are given the
additional information that the last two balls are red?
Solution. First we should define a sample space for this model. Let us enumerate
the balls so we can distinguish them: balls numbered from 1 to 6 are red and the
balls numbered 7,8,9 are blue. Then the sample points are triples of distinct numbers
each from 1 to 9 corresponding to the three draws. The total number of points in the
sample space is |S| = 9×8×7 = 504. There is no reason to suppose that one sample
point is more likely than another so we set the probability of each sample point to
be 1/504. Let A1 be the event that the first ball is red. We have |A1 | = 6 × 8 × 7
(we have 6 choices for the red ball at the first draw, 8 choices for the 2nd draw since
one ball was already picked, and 7 choices for the 3rd draw). Therefore the answer
to the first question is
\[ P(A_1) = \frac{|A_1|}{|S|} = \frac{6 \times 8 \times 7}{9 \times 8 \times 7} = \frac{6}{9} = \frac{2}{3} . \]

Now let B be the event that the last two balls are red. Then A1 ∩ B is the event
that all three balls are red. We have |B| = 6 × 5 × 7 and |A1 ∩ B| = 6 × 5 × 4 = 120.
Therefore
\[ P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)} = \frac{|A_1 \cap B|}{|B|} = \frac{6 \times 5 \times 4}{6 \times 5 \times 7} = \frac{4}{7} . \]
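
The counts used in Example 2.13 can be confirmed by brute-force enumeration of the ordered draws. This is a sketch under the same labelling as in the solution (balls 1 to 6 red, balls 7 to 9 blue); the variable names are ours.

```python
from fractions import Fraction
from itertools import permutations

RED = set(range(1, 7))                          # balls 1..6 are red, 7..9 are blue
draws = list(permutations(range(1, 10), 3))     # ordered triples of distinct balls

first_red = [d for d in draws if d[0] in RED]
last_two_red = [d for d in draws if d[1] in RED and d[2] in RED]
all_red = [d for d in last_two_red if d[0] in RED]

p_first_red = Fraction(len(first_red), len(draws))
p_first_red_given_last_two = Fraction(len(all_red), len(last_two_red))
print(len(draws), p_first_red, p_first_red_given_last_two)
```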

The idea of events being unrelated is expressed in probability theory as independence
of the two events.

Definition 2.14. We say that two events A and B are independent if the occurrence
of B has no effect on the probability of A. So

P (A|B) = P (A) . (2.5)

It follows from formula (2.4) (CHECK THIS!) that the events A and B are
independent if and only if

P (A ∩ B) = P (A)P (B) . (2.6)

The criterion of independence stated in this form is manifestly symmetric with respect
to the roles of A and B. If B has no effect on A then A has no effect on B so that

P (B|A) = P (B)

holds as well.

Example 2.15 (*). A coin is tossed 3 times. Let the event A be that three heads are
obtained and the event B be that the first toss is a head. Are A and B independent
events?
Solution. We have
\[ P(A) = \frac{1}{8} , \qquad P(B) = \frac{4}{8} = \frac{1}{2} , \qquad P(A \cap B) = P(A) = \frac{1}{8} . \]
Then
\[ P(A)P(B) = \frac{1}{16} \neq P(A \cap B) = \frac{1}{8} . \]
Hence, the events are not independent.

Example 2.16 (*). In a class there are 4 left-handed men, 6 left-handed women
and 6 right-handed men. How many right-handed women must be present if sex
and handedness are to be independent when a student is selected at random?
Solution. Let A be the event that the student is a woman, B the event that the student
is left-handed and x the number of right-handed women in the class. So
the number of women in the class is 6 + x, the number of left-handed students is
10 and the total number of students is 16 + x. Using formula (2.6) we obtain the
equation
\[ P(A)P(B) = \left( \frac{6+x}{16+x} \right) \left( \frac{10}{16+x} \right) = P(A \cap B) = \frac{6}{16+x} , \]

which is equivalent to
10(6 + x) = 6(16 + x)
and the solution is x = 9. The class must have 9 right-handed women.

2.2.1 Exercises

Exercise 2.2.1. Find P (A|B) if P (B|A) = 1/6, P (A) = 1/2, P (B) = 1/3.

Exercise 2.2.2. Find P (A|B) if P (B|A) = 0.8, P (B|Ā) = 0.3, and P (A) = 0.2.

Exercise 2.2.3 (*). If P (A) = 2/3 and P (B) = 3/4, what is the largest and smallest
P (B|A) can be? Answer the same question for P (A|B).

Solution. Since P (A) and P (B) are fixed, in view of formula (2.4) it suffices to find
the maximal and minimal values of P (A ∩ B). This was done in Example 2.9. It was
obtained that 5/12 ≤ P (A ∩ B) ≤ 2/3. Dividing by P (A) and P (B) respectively
we obtain 5/8 ≤ P (B|A) ≤ 1 and 5/9 ≤ P (A|B) ≤ 8/9.

Exercise 2.2.4 (*). The following questions may shed some light on why many
people believe that past performance of coin-tossing (or repeatable random events
in general) affects future performance. In the answers define the sample space and
probability on it first.

(a) A fair coin was tossed 6 times and heads came up (exactly) 4 times. If the coin
is tossed again, what is the probability that the seventh toss will be a head?

(b) A fair coin is tossed 6 times and heads came up (exactly) 4 times. What is the
probability that the 6th toss was a head?

(c) A fair coin is tossed 6 times and heads came up (exactly) 4 times. It is also
known that 2 of the first 3 tosses were heads. What is the probability that the
6th toss was a head?

Solution. (a) It may be intuitively clear that the answer is still 1/2 but let us formally
show this. In total we have 7 tosses so the sample space consists of $2^7 = 128$
strings such as HTHHTHT. Each point in the sample space is equiprobable with
probability 1/128. Let A be the event that during the first 6 tosses heads came up 4
times. This means that A includes only strings with two T’s in the first 6 positions.
The number of ways to choose the positions of these T’s is $\binom{6}{2} = 15$, the number
of unordered selections without repetition. In addition we have 2 choices for the outcome of
the 7th toss. Therefore P(A) = (15 × 2)/128 = 15/64. Let B be the event that the
7th toss is a head. The intersection A ∩ B consists of those strings in A that end
with an H. There are half as many of them and their (unconditional) probability is
P(A ∩ B) = 15/128. By the general rule P(B|A) = P(A ∩ B)/P(A) = (15/128)/(15/64) = 1/2,
as we suspected.
(b) We have 6 tosses and the sample space contains $2^6 = 64$ points, each
having probability 1/64. Let A be the event that during the 6 tosses heads
came up 4 times. As was computed in the previous solution, |A| = 15 and therefore
P(A) = 15/64. Let B be the event that the 6th toss was a head. The strings in
A ∩ B have 2 T’s which have to be located in the first 5 tosses. The number of such
strings is $|A \cap B| = \binom{5}{2} = 10$ and therefore P(A ∩ B) = 10/64 = 5/32. By formula
(2.4) P(B|A) = (10/64)/(15/64) = 2/3.
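
Parts (a) and (b) can be double-checked by enumerating all equiprobable strings of tosses. This is only a sketch; the function `cond_prob` and its predicate arguments are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product

def cond_prob(n_tosses, given, target):
    """P(target | given) over all equiprobable H/T strings of length n_tosses."""
    strings = ["".join(s) for s in product("HT", repeat=n_tosses)]
    conditioned = [s for s in strings if given(s)]
    hits = [s for s in conditioned if target(s)]
    return Fraction(len(hits), len(conditioned))

# (a) exactly 4 heads among the first 6 tosses; is the 7th toss a head?
part_a = cond_prob(7, lambda s: s[:6].count("H") == 4, lambda s: s[6] == "H")
# (b) exactly 4 heads in 6 tosses; was the 6th toss a head?
part_b = cond_prob(6, lambda s: s.count("H") == 4, lambda s: s[5] == "H")
print(part_a, part_b)
```

The contrast between the two answers reflects the fact that in (a) the conditioning event says nothing about the toss in question, while in (b) it does.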

Exercise 2.2.5 (*). A box contains 10 items of type I, of which 3 are defective,
and 20 items of type II, of which 5 are defective. An item is chosen at random from
the 30 items in the box. Let A be the event that the item is of type I and B be
the event that the item is defective. Find the following probabilities: P (A), P (Ā),
P (B), P (A ∩ B), P (A ∪ B), P (A|B), P (B|A). Are A and B mutually exclusive?
Are A and B independent?

Solution. Here are some answers: P (A) = 10/30 = 1/3, P (A ∪ B) = (10 + 5)/30 =
1/2, P (A ∩ B) = 3/30 = 1/10.

2.3 Random Variables and Probability Distributions

Definition 2.17. A random variable is any real-valued function X : S → R on a
sample space.

Here are some examples.

Example 2.18. Flip a coin three times. The sample space S consists of all sequences
of 3 elements each of which is either H or T . Define

X(s) = number of heads in the sequence s.

For example X(HT H) = 2. Then X is a random variable.

Example 2.19. Consider the usual sample space for tossing two dice:

S = {(i, j) | 1 ≤ i, j ≤ 6} .

Let X be the sum of the dots when the toss (i, j) occurs. That is X(i, j) = i + j.
X is a random variable.

The value of a random variable X(s) is also “random” in the sense that we can
compute P (X = x) for any possible value x of X according to the rule:

P (X = x) = P ({s | s ∈ S, X(s) = x}) .

The function f(x) = P(X = x) defines a probability on a new sample space, the
set of possible values of X, and is called the probability distribution associated with
X. Probability distributions are often referred to as probability mass functions.

Example 2.20 (*). Let S be the usual equiprobable, ordered-pair sample space
for tossing two dice. Let X(i, j) = i + j. Determine the real-number sample space
associated with X and determine its probability distribution.
Solution. The new sample space is the range of X: {2, 3, 4, . . . , 12}. The probability
distribution on it is
\begin{align*}
f(2) &= P(X = 2) = P(\{(1,1)\}) = \frac{1}{36} , \\
f(3) &= P(X = 3) = P(\{(1,2), (2,1)\}) = \frac{2}{36} , \\
&\;\;\vdots \\
f(7) &= P(X = 7) = P(\{(1,6), (2,5), \dots, (6,1)\}) = \frac{6}{36} , \\
&\;\;\vdots \\
f(12) &= P(X = 12) = P(\{(6,6)\}) = \frac{1}{36} .
\end{align*}
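
The full table of values of f, including the entries abbreviated by dots, can be generated mechanically from the rule P(X = x) = P({s | X(s) = x}). The following is a minimal sketch for equiprobable finite sample spaces; the function name `distribution` is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(sample_space, X):
    """Return {x: P(X = x)} for an equiprobable finite sample space."""
    counts = Counter(X(s) for s in sample_space)
    return {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

two_dice = list(product(range(1, 7), repeat=2))
f = distribution(two_dice, lambda s: s[0] + s[1])
for x, p in f.items():
    print(x, p)
```

Note that `Fraction` prints values in lowest terms, so 2/36 appears as 1/18.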

Example 2.21 (*). The experiment consists of flipping a biased coin three times. A
biased coin has a probability of heads on a flip given by some number p ̸= 1/2. The
probability of tails is q = 1 − p. For the experiment at hand we have the following
probability function

s      HHH   HHT     HTH     HTT     TTT   TTH     THT     THH
P(s)   p^3   p^2 q   p^2 q   p q^2   q^3   p q^2   p q^2   p^2 q

Consider a random variable X : S → R defined so that X(s) =
number of heads in the sequence s. Determine the associated sample space and
probability distribution.
Solution. The associated sample space is the space of values of X: Ran(X) =
{0, 1, 2, 3}. The probability distribution is
\begin{align*}
P(X = 0) &= P(\{TTT\}) = q^3 , \\
P(X = 1) &= P(\{HTT, THT, TTH\}) = 3pq^2 , \\
P(X = 2) &= P(\{HHT, HTH, THH\}) = 3p^2 q , \\
P(X = 3) &= P(\{HHH\}) = p^3 .
\end{align*}

Here are two examples of probability distributions that show up frequently in
applications.

The finite uniform distribution

If S ⊂ R and |S| = n, then a uniform probability distribution f(s)
on S is defined by the condition that for any s ∈ S we have f(s) = 1/n.
This is the distribution for which all values s are equiprobable.

The binomial distribution

An experiment with probability p of success and q := 1 − p of failure
is repeated independently n times. Such an experiment is called a
sequence of Bernoulli trials. We seek the distribution $B_{n,p}(k)$ which
gives the probability that exactly k trials are successes.
The natural sample space for Bernoulli trials consists of n-tuples like
SSFSFFF... where S means success and F means failure. We map
this space over to the real numbers {0, 1, 2, ..., n} by grouping together
all n-tuples that have the same number of successes. Since the trials
are assumed to be independent, the appropriate probability model
involves multiplications of probabilities for each trial. That is, if an
n-tuple contains k S’s and n − k F’s, then the probability of the tuple
is $p^k q^{n-k}$. Finally, the number of n-tuples with k S’s is $\binom{n}{k}$, the
number of unordered selections without repetitions of k objects out
of n (the positions of S’s in the string). Thus the desired distribution
is
\[ B_{n,p}(k) = \binom{n}{k} p^k q^{n-k} . \tag{2.7} \]
Let us check that all probabilities $B_{n,p}(k)$ sum up to one. We have
\[ \sum_{k=0}^{n} B_{n,p}(k) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p+q)^n = 1^n = 1 , \]
where we used formula (1.4) (the latter is often referred to as the
Binomial Theorem).
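
Formula (2.7) translates directly into code. The sketch below uses Python's standard `math.comb` for the binomial coefficient; the function name `binomial_pmf` and the values of n and p are illustrative assumptions of ours.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, k):
    """B_{n,p}(k): probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, Fraction(1, 5)   # illustrative values
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
print(sum(pmf))             # the probabilities add up to 1
```

With p given as a `Fraction` the sum is computed exactly, mirroring the Binomial Theorem check above.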

Example 2.22 (*). You are given a 30 question multiple choice test. For each
answer there are 5 choices. You lose 1/4 as much credit for each wrong answer as
you gain for each right answer. You are time stressed and decide to guess at random
the answers to 10 problems. What are the chances that by doing that you will raise
your total score?
Solution. Suppose you guessed r answers correctly and thus 10 − r incorrectly. The
change in your credit is
\[ r - \frac{10 - r}{4} , \]

where we included the penalties for wrong answers. To improve your score you
need the above number to be positive (an increase in credit). This implies r > 2
that is you must guess correctly more than 2 answers. What are the chances of
that? Since for each problem (each trial) the probability of success is 1/5 = 0.2
we are dealing here with 10 Bernoulli trials with the probability of success of each
trial p = 0.2. Therefore the probability of r successes is given by the binomial
distribution $B_{10,0.2}(r)$. The final answer is then
\[ P(\text{improving the score}) = \sum_{r=3}^{10} B_{10,0.2}(r) . \]

There are tables/computer programs available for calculating the values of binomial
distributions. Using any of those one can obtain a numerical answer: P ≈ 0.3222
that is the chances of success are approximately 1 out of 3.
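
The numerical value quoted in Example 2.22 can be reproduced in a few lines. This is a sketch assuming 10 independent guesses with success probability 0.2; the helper `binomial_tail` is a name of our own.

```python
from math import comb

def binomial_tail(n, p, r_min):
    """P(at least r_min successes in n independent trials with success probability p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(r_min, n + 1))

# Credit change for r correct guesses out of 10 with a 1/4 penalty: r - (10 - r)/4.
r_needed = min(r for r in range(11) if r - (10 - r) / 4 > 0)
p_improve = binomial_tail(10, 0.2, r_needed)
print(r_needed, round(p_improve, 4))
```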

There are lots of ways to generate new distributions from a given set of distri-
butions. For example if X and Y are two random variables so is X + Y , X · Y or
Z = max(X, Y ) simply because each one is also a function on the sample space.
Such new random variables have probability distributions of their own which can be
computed by the usual general rule.

2.3.1 Exercises

Exercise 2.3.1. A fair die is rolled twice. Let X give the maximum value of the
two outcomes. Determine the probability distribution associated with X.

Exercise 2.3.2 (*). A fair coin is tossed 3 times. A random variable X gives
the number of heads minus the number of tails, e.g. X({T HT }) = 1 − 2 = −1.
Determine the probability distribution associated with X.

Solution. The variable X takes only 4 values: −3, −1, 1, 3. The probability distri-
bution for X is

f (−3) = P (X = −3) = 1/8 = P (X = 3) = f (3),

f (−1) = P (X = −1) = P ({T T H, T HT, HT T }) = 3/8,


f (1) = P (X = 1) = P ({HHT, HT H, T HH}) = 3/8.

Exercise 2.3.3. A fair die is rolled twice. Let X1 and X2 be the random variables
giving the outcomes of the first and second roll respectively. Find the probability
distributions for the following random variables which are constructed from X1 and
X2 :

(a) (*) Z = X1 − X2 ;

(b) U = max(X1 , X2 );

(c) V = min(X1 , X2 ).

Solution. (a) The sample space consists of ordered pairs of integers each between
1 and 6 which are equiprobable with probability 1/36. The values of Z are
5, 4, 3, 2, 1, 0, −1, −2, −3, −4, −5. The associated probability distribution is

P (Z = 5) = P ({(6, 1)}) = 1/36 , P (Z = 4) = P ({(6, 2), (5, 1)}) = 2/36 ,


P (Z = 3) = P ({(6, 3), (5, 2), (4, 1)}) = 3/36 ,
P (Z = 2) = P ({(6, 4), (5, 3), (4, 2), (3, 1)}) = 4/36 ,
P (Z = 1) = 5/36 , P (Z = 0) = 6/36 ,

and P (Z = −n) = P (Z = n).

Exercise 2.3.4. A biased coin is flipped 4 times. The probability of the coin showing
‘heads’ is p = 3/4. Establish the probability that

(a) exactly 3 flips show ‘heads’;

(b) (*) at least three flips show ‘heads’;

(c) there have been as many ‘heads’ as ‘tails’.

Solution. (b) The allowed outcomes are

A = {HHHT, HHT H, HT HH, T HHH, HHHH}.

Hence, we have
\[ P(A) = 4p^3 q + p^4 = 4 \times \left( \frac{3}{4} \right)^3 \frac{1}{4} + \left( \frac{3}{4} \right)^4 = \frac{189}{256} \approx 0.74 . \]

Exercise 2.3.5. You are given a multiple choice test. For each answer there are
4 choices. You run out of time and decide to guess the answers to 6 problems at
random. What are the chances that you will guess half the answers correctly?

Exercise 2.3.6 (*). On a multiple choice test each answer has 4 choices. You lose
1/2 as much credit for each wrong answer as you gain for each right answer. What
are the chances that you will raise your total score by guessing at random answers
to 6 problems?

Solution. Suppose you guessed r answers correctly. Then your credit has changed
by
\[ \text{Credit change} = r - \frac{6 - r}{2} , \]
where the penalty for wrong answers is included. For the credit change to be positive
we need r ≥ 3. We are dealing here with 6 Bernoulli trials with p = 1/4 = 0.25.
Hence the probability of having 3 or more successes is
\[ P(r \geq 3) = \sum_{r=3}^{6} \binom{6}{r} \left( \frac{1}{4} \right)^r \left( \frac{3}{4} \right)^{6-r} . \]
The numerical value can be computed using a calculator or Maple: P(r ≥ 3) ≈ 0.17,
that is, the chance of improving the score is close to 17%.

2.4 Expected value and variance


Definition 2.23. The expected value of a random variable X, denoted E(X), is
defined as
\[ E(X) := \sum_{x} x \cdot P(X = x) . \]
The expected value of a random variable X is often called expectation, mean or
average and is often denoted X̄. The expected value is important because it serves
as a predictor for the average outcome of a random experiment.

Example 2.24 (*). Find the expected value for an equiprobable random variable
(uniform distribution).
Solution. By definition we have
\[ E(X) = \sum_{i=1}^{n} x_i P(X = x_i) = \sum_{i=1}^{n} x_i \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]
We see that for an equiprobable variable the expectation is the same as the average
of its values xi .

Example 2.25 (*). Find the expectation E(X) for the random variable described
in example 2.20.
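Solution. Using the probability distribution computed in example 2.20 we obtain
\begin{align*}
E(X) &= \sum_{i=2}^{12} i \cdot P(X = i) \\
&= 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + 4 \cdot \frac{3}{36} + \dots + 7 \cdot \frac{6}{36} + 8 \cdot \frac{5}{36} + \dots + 11 \cdot \frac{2}{36} + 12 \cdot \frac{1}{36} \\
&= \frac{252}{36} = 7 .
\end{align*}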
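
The sum in Example 2.25 can be evaluated without writing out the eleven terms by hand. The following sketch applies Definition 2.23 on an equiprobable sample space; the function name `expected_value` is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def expected_value(sample_space, X):
    """E(X) = sum over x of x * P(X = x), for an equiprobable finite sample space."""
    counts = Counter(X(s) for s in sample_space)
    return sum(x * Fraction(c, len(sample_space)) for x, c in counts.items())

two_dice = list(product(range(1, 7), repeat=2))
E_sum = expected_value(two_dice, lambda s: s[0] + s[1])
print(E_sum)
```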
36

Example 2.26 (*). Find the expectation E(Bn,p ) for the binomial distribution
Bn,p (k).
Solution. [NON EXAMINABLE] By definition the answer is
\[ E(B_{n,p}) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k} . \]

Can this be simplified? Indeed we will show that

E(Bn,p ) = np .

To that end we will use recursion — the technique we will study in a lot more detail
later in the course. In this example it works as follows. Let En = E(Bn,p ). Notice
that after we ran the first trial there are n − 1 remaining trials and we have the
situation of Bn−1,p . With probability q the first trial was a failure and thus the only
successes will be those in the remaining trials. In this case we expect En−1 successes.
On the other hand, if the first trial was a success (the probability for that is p) then
we expect 1 + En−1 successes. In short

En = qEn−1 + p(En−1 + 1) = En−1 + p .

This means that the numbers En form an arithmetic sequence. Evidently E1 =
0 · q + 1 · p = p. And therefore E2 = E1 + p = 2p, E3 = E2 + p = 3p, and so on with
En = np.

At the end of the last subsection we talked about how new random variables can be
constructed in terms of other random variables. We will discuss now how, in some
simple cases, the expectations of such composite random variables can be expressed
via expectations of the original variables.

Theorem 2.27. Let X1 and X2 be random variables on sample space S. Then

E(X1 + X2 ) = E(X1 ) + E(X2 ) .

Similarly for any number of random variables X1 , X2 , . . . , Xn we have

E(X1 + X2 + · · · + Xn ) = E(X1 ) + E(X2 ) + · · · + E(Xn ) . (2.8)

The proof will be given in class.

The above theorem can be applied to Bernoulli trials. Let X be the total number
of successes. For each trial i define a binary random variable Xi so that
\[ X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success} , \\ 0 & \text{if trial } i \text{ is a failure} . \end{cases} \tag{2.9} \]
Now note that
\[ X = \sum_{i=1}^{n} X_i . \tag{2.10} \]

The whole point of introducing Xi is that they are simple to analyse yet they can
be summed to get X. For each i, E(Xi ) = 0 · q + 1 · p = p. Thus by (2.8) we get
E(X) = E(Bn,p ) = np, as we have already established by a different method.

Theorem 2.27 can be easily generalised from sums to linear combinations of
random variables.

Theorem 2.28. Given a collection of random variables X1 , X2 , . . . , Xn defined on
the same sample space and a collection of real numbers C1 , C2 , . . . , Cn we have

E(C1 X1 + C2 X2 + · · · + Cn Xn ) = C1 E(X1 ) + C2 E(X2 ) + · · · + Cn E(Xn ) .

For instance E(2X) = 2E(X) and E(X − Y ) = E(X) − E(Y ).

What about E(X · Y )? As it turns out, the latter equals E(X)E(Y ) whenever X
and Y are independent (in general the equality may fail), where two random variables
X and Y are called independent
if the events {X = x} and {Y = y} are independent for any pair of values (x, y).
Thus for independent variables we have

P (X = x and Y = y) = P (X = x)P (Y = y) . (2.11)

The purpose of expected value is to summarise a lot of information about a
random variable in a single number. But no one number can tell the whole story
about a random variable. Consider the following example.

Example 2.29. Let X and Y be two random variables defined so that
\[ X = 1, 2, 3 \quad \text{each with probability } \frac{1}{3} \]
and
\[ Y = 0, 6 \quad \text{with } P(Y = 0) = \frac{2}{3} , \; P(Y = 6) = \frac{1}{3} . \]
It is easy to compute that the expected values for both variables are equal to 2.
However, the values of X are much closer to 2 than the values of Y are.

In short, expected value is a measure of the center. We also need a measure of
dispersion around the center. Such a measure is provided by the variance Var(X)
which is defined as

\[ \mathrm{Var}(X) = E([X - E(X)]^2) = E(X^2) - [E(X)]^2 . \tag{2.12} \]

(Why is the last equality in the above equation true? )


A related measure of dispersion is the standard deviation of X, denoted by $\sigma_X$
and defined to be the square root of the variance
\[ \sigma_X = \sqrt{\mathrm{Var}(X)} . \]

Coming back to Example 2.29 above let us compute the variances and standard
deviations of X and Y . We have
\[ \mathrm{Var}(X) = \frac{1}{3}(1-2)^2 + \frac{1}{3}(2-2)^2 + \frac{1}{3}(3-2)^2 = \frac{2}{3} \]
and
\[ \mathrm{Var}(Y) = \frac{2}{3}(0-2)^2 + \frac{1}{3}(6-2)^2 = \frac{24}{3} = 8 . \]
The standard deviations are $\sigma_X = \sqrt{2/3} \approx 0.816$, $\sigma_Y = \sqrt{8} \approx 2.828$. We see that
the standard deviation of Y is a lot bigger than that of X, reflecting the qualitative
observation that the values of Y are more spread around the average than those of X.
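
The same numbers can be obtained programmatically from the two distributions of Example 2.29. This is a minimal sketch in which a distribution is stored as a dictionary mapping each value to its probability; this representation and the function names are our own choices.

```python
from fractions import Fraction
from math import sqrt

def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E([X - E(X)]^2)."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

third = Fraction(1, 3)
X = {1: third, 2: third, 3: third}
Y = {0: 2 * third, 6: third}
for name, dist in (("X", X), ("Y", Y)):
    print(name, mean(dist), variance(dist), round(sqrt(variance(dist)), 3))
```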

2.4.1 Exercises

Exercise 2.4.1. Let X be an equiprobable random variable that takes 4 values
X = {1, 2, 3, 4}. Find E(X) and Var(X).

Exercise 2.4.2 (*). Suppose you roll a fair die 3 times. Let the random variable
N1 be the number of times that the outcome on a roll is 1. Find the probability
distribution of N1 and its expected value E(N1 ).

Solution. N1 takes the values 0, 1, 2, 3. The sample space is equiprobable, consisting
of triplets of numbers from 1 to 6, each assigned probability $1/6^3 = 1/216$. The
probability distribution associated with N1 is computed to be $f(0) = 5^3/6^3$,
$f(1) = (3 \times 5 \times 5)/6^3$, $f(2) = (3 \times 5)/6^3$, $f(3) = 1/6^3$. Let us check that the probabilities
sum up to one:
\[ \frac{5^3}{6^3} + \frac{3 \times 5 \times 5}{6^3} + \frac{3 \times 5}{6^3} + \frac{1}{6^3} = 1 . \]
The expected value is now computed as
\[ E(N_1) = \frac{5^3}{6^3} \times 0 + \frac{3 \times 5 \times 5}{6^3} \times 1 + \frac{3 \times 5}{6^3} \times 2 + \frac{1}{6^3} \times 3 = \frac{108}{216} = \frac{1}{2} . \]

Exercise 2.4.3 (*). For the random variables Z, U , V defined in Exercise 2.3.3
find the expected values and variances.
Solution. Here we show how to compute the expected value and variance for the
random variable Z; the calculations for U and V are analogous.
The distribution for this variable was computed in Exercise 2.3.3, part (a). By
the general rule we have
\[ E(Z) = \sum_{n=-5}^{5} n \times P(Z = n) . \]

Now, we established in Exercise 2.3.3, part (a), that the probability distribution is
symmetric with respect to flipping the sign of the value: P (Z = n) = P (Z = −n).
Therefore, the expectation value vanishes.
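Since E(Z) = 0, the variance is computed as
\[ \mathrm{Var}(Z) = \sum_{n=-5}^{5} n^2 \times P(Z = n) = 2 \times \sum_{n=1}^{5} n^2 \times P(Z = n) , \]
where in the last step we used the symmetry of the distribution with respect to
flipping the sign. A straightforward computation now yields
\[ \mathrm{Var}(Z) = 2 \left( 1^2 \times \frac{5}{36} + 2^2 \times \frac{4}{36} + 3^2 \times \frac{3}{36} + 4^2 \times \frac{2}{36} + 5^2 \times \frac{1}{36} \right) = \frac{210}{36} \approx 5.833 . \]
The standard deviation is $\sigma_Z = \sqrt{5.833} \approx 2.415$.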

Exercise 2.4.4. Let X be a random variable. What is E(X − E(X))?

Exercise 2.4.5. Determine whether the variables U and V from Exercise 2.3.3 are
independent.

Exercise 2.4.6 (*). Using representation (2.9), (2.10) find the variance and stan-
dard deviation of the binomial distribution Bn,p (k). You will need to use that Xi
and Xj are independent for i ̸= j. Is this obvious?

Solution. Recall that the expectation value was computed earlier, just after formula
(2.10). To compute the variance we note that by formula (2.12) we have
\[ \mathrm{Var}(B_{n,p}) = E\Big[ \Big( \sum_{i=1}^{n} X_i \Big)^2 \Big] - \Big[ E\Big( \sum_{i=1}^{n} X_i \Big) \Big]^2 . \tag{2.13} \]
Using that E(Xi ) = p and expanding the square in the first term we obtain
\[ \mathrm{Var}(B_{n,p}) = \sum_{i=1}^{n} \sum_{j=1}^{n} E(X_i \cdot X_j) - (pn)^2 . \tag{2.14} \]
In the first sum we can single out the terms with i = j and use that $X_i^2 = X_i$:
\begin{align*}
\mathrm{Var}(B_{n,p}) &= \sum_{i \neq j} E(X_i \cdot X_j) + \sum_{i} E(X_i) - (pn)^2 \\
&= \sum_{i \neq j} E(X_i \cdot X_j) + np - (pn)^2 .
\end{align*}
Assuming that Xi and Xj are statistically independent for i ≠ j we have
\[ E(X_i \cdot X_j) = E(X_i) \cdot E(X_j) = p^2 \]
and substituting this in the previous formula we obtain
\begin{align*}
\mathrm{Var}(B_{n,p}) &= \sum_{i \neq j} p^2 + np - (pn)^2 = n(n-1)p^2 + np - (np)^2 \\
&= np(1-p) = npq . \tag{2.15}
\end{align*}

This method has the advantage of avoiding the complicated combinatorics related
to the binomial coefficients that enter the definition of Bn,p .
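
As a sanity check, the results E(B_{n,p}) = np and Var(B_{n,p}) = npq can be compared with a direct computation from the distribution (2.7). This is only a sketch; the function `binomial_moments` and the values of n and p are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Mean and variance of B_{n,p}, computed directly from the distribution (2.7)."""
    q = 1 - p
    pmf = {k: comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)}
    mean = sum(k * pk for k, pk in pmf.items())
    var = sum(k * k * pk for k, pk in pmf.items()) - mean**2
    return mean, var

n, p = 12, Fraction(1, 4)           # illustrative values
mean, var = binomial_moments(n, p)
print(mean, var)
```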

2.5 Examples of applications to algorithms


2.5.1 Algorithm MaxNumber [NON EXAMINABLE]
Given n numbers {x1 , x2 , . . . , xn } the algorithm MaxNumber finds the maximum
number by sequentially searching the list. It first assigns the variable m = x1 . Then
at each step it picks a number xi and compares it with the current value of m. If xi is larger
than m, m is reassigned to be xi and the algorithm proceeds to xi+1 ; otherwise no
reassignment is made and the algorithm proceeds straight to xi+1 . After the whole
list is done the last value of m is the maximum of all numbers in the list.
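
The description above can be sketched in Python as follows. The function name `max_number` and the decision to return the number of assignments alongside the maximum are our own choices, made to prepare for the question below.

```python
def max_number(xs):
    """Sequential search for the maximum; also counts (re)assignments of m."""
    m = xs[0]            # initial assignment
    assignments = 1
    for x in xs[1:]:
        if x > m:        # strictly larger: reassign m
            m = x
            assignments += 1
    return m, assignments

print(max_number([3, 1, 4, 1, 5, 9, 2, 6]))   # maximum and number of assignments
```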

Question: For randomly ordered input data, how many times on average
will m be assigned/reassigned?

Answer: Before we tackle the question let us ask ourselves: what are the best
and the worst scenarios? Clearly if x1 is the maximum, m will be assigned only
once and never reassigned, while if all the numbers happen to be lined up in
increasing order we will have n (re)assignments. So the average number should be
somewhere in between. Since what matters to the problem is only the order in which
n distinct numbers are arranged, and not the precise magnitudes of these numbers,
we can take our sample space to be the set of all permutations of the numbers
{1, 2, 3, . . . , n}. Furthermore it is reasonable to assume that each permutation is
equiprobable. Let Y be the random variable defined on our sample space which
gives for each permutation the number of times m is assigned by the algorithm.
We want E(Y ). It is convenient to represent Y as a sum of simpler variables Xk .
Specifically, let Xk be the binary variable which is 1 when MaxNumber assigns m
at the k-th spot (while examining xk ) and 0 if it doesn’t. Clearly
\[ Y = \sum_{k=1}^{n} X_k . \]
We further note that
\[ E(X_k) = P(X_k = 1) = \frac{(k-1)!}{k!} = \frac{1}{k} , \]
where we used the fact that there are (k − 1)! different permutations of k numbers in
which the last number is fixed to be the maximum. By the sum theorem for expectations
we find
\[ E(Y) = \sum_{k=1}^{n} E(X_k) = \sum_{k=1}^{n} \frac{1}{k} . \]
For large values of n this sum is well approximated by the value of the natural
logarithm ln(n). Compare for example ln 1000 ≈ 6.908 with the exact value
\[ \sum_{k=1}^{1000} \frac{1}{k} = 7.485\ldots \]
Note how small the average number of assignments is for a fairly large sample.
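
As a sanity check, the result E(Y ) = 1 + 1/2 + · · · + 1/n can be verified by brute force for small n, by averaging the number of assignments over all permutations. The sketch below is illustrative only; the helper names count_assignments, exact_average and harmonic are our own.

```python
from fractions import Fraction
from itertools import permutations


def count_assignments(xs):
    """Number of times MaxNumber assigns m while scanning xs."""
    m, count = xs[0], 1
    for x in xs[1:]:
        if x > m:
            m, count = x, count + 1
    return count


def exact_average(n):
    """Exact average of the assignment count over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_assignments(p) for p in perms), len(perms))


def harmonic(n):
    """Exact value of 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


for n in range(1, 7):
    assert exact_average(n) == harmonic(n)

print(float(harmonic(1000)))  # close to 7.485
```

Exhaustive enumeration is only feasible for small n, since the number of permutations is n!.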

2.5.2 Algorithm SeqSearch [NON EXAMINABLE]


Given an alphabetically ordered list of n words, the algorithm SeqSearch searches the
list sequentially for a given word. At each step it compares the i-th item on the list
with the word to find and, if it does not match, it goes to the (i + 1)-th word on the
list, and so on.

Question: What is the average number of comparisons (when the search word
is guaranteed to be on the list)?

Answer: It is natural to assume that any position of the search word on the list is
equiprobable, that is, the probability of needing 1 comparison, 2 comparisons,
3 comparisons, etc. is 1/n in each case. By the general rule for computing averages we obtain
\[ \text{Average number of comparisons} = \frac{1}{n}\,(1 + 2 + 3 + \cdots + n) = \frac{n+1}{2} , \tag{2.16} \]
where the last equality can be proved by mathematical induction.
Clearly the algorithm SeqSearch is very simplistic. It does not make any use of
the fact that the list is alphabetized. A much more efficient search algorithm is the
algorithm BinSearch (short for binary search), which works as follows. The algo-
rithm starts by looking at the (approximate) middle of the list. If it does not find
the right word it determines, using the fact that the list is alphabetized, in which
half the right word must be. It next discards the other half and looks again by going
to the middle of the remaining list. Clearly this algorithm must perform much faster
than SeqSearch because at each step the list shortens by half (approximately). A
good measure of how fast each algorithm performs is the average number of
comparisons. It can be proved that for BinSearch it is approximately log2 n − 1 for
large n. Thus for a list of 1000 words SeqSearch will make on average 1001/2 ≈ 500
comparisons while BinSearch will make only log2 (1000) − 1 ≈ 9 comparisons.
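
The two averages can be compared empirically. The following Python sketch makes some assumptions that are not in the notes: each look at a list item counts as one comparison, BinSearch probes the lower middle element, and the function names and the artificial word list are purely illustrative.

```python
def seq_search(words, target):
    """Return the number of comparisons SeqSearch makes to find target."""
    for comparisons, w in enumerate(words, start=1):
        if w == target:
            return comparisons
    raise ValueError("target not in list")


def bin_search(words, target):
    """Return the number of items a binary search examines to find target."""
    lo, hi = 0, len(words) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1             # one look at the middle item
        if words[mid] == target:
            return comparisons
        if words[mid] < target:
            lo = mid + 1             # target is in the upper half
        else:
            hi = mid - 1             # target is in the lower half
    raise ValueError("target not in list")


n = 1000
words = [f"w{i:04d}" for i in range(n)]   # alphabetically ordered dummy words
avg_seq = sum(seq_search(words, w) for w in words) / n
avg_bin = sum(bin_search(words, w) for w in words) / n
print(avg_seq, avg_bin)
```

Averaging over every possible target corresponds to the equiprobability assumption used above; the first average equals (n + 1)/2 and the second comes out close to 9.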

2.5.3 Algorithm Permute [NON EXAMINABLE]


This algorithm produces a random permutation of n numbers. The algorithm
assumes that the computer language used for its implementation has a built-in random
number generator Rand[m, n] that on each call picks an integer between m and n
inclusive, so that each choice is equally likely and independent of all previous choices.
(Strictly speaking a computer, being a deterministic device, cannot generate a truly
random number. What it actually generates are pseudorandom numbers, which
look random enough for all essential statistical purposes but are not actually
random.) Since we want to produce a sequence of random numbers from 1 to n, it
seems reasonable to invoke Rand[1, n]. We must however ensure that the numbers in
the sequence are not repeated. The algorithm Permute keeps a record of which numbers
have already been picked, and each time a call to Rand[1, n] produces a repetition,
Permute throws it away and calls Rand again until it gets a new number. It continues
until all numbers between 1 and n have been picked once.
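
A minimal Python sketch of Permute is given below. It assumes that Python's random.randint(1, n) can stand in for Rand[1, n]; the function name permute, the optional rng argument and the call counter are our own additions.

```python
import random


def permute(n, rng=random):
    """Return (permutation of 1..n, number of calls made to the generator)."""
    picked = set()     # record of the numbers already chosen
    order = []
    calls = 0
    while len(order) < n:
        r = rng.randint(1, n)    # plays the role of Rand[1, n]
        calls += 1
        if r not in picked:      # repetitions are simply thrown away
            picked.add(r)
            order.append(r)
    return order, calls


perm, calls = permute(10, random.Random(0))
print(perm, calls)
```

The number of calls is at least n and varies from run to run; its average is the subject of the question that follows.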

Question: What is the average number of times the random number generator
Rand is called?

Answer: Let us first derive the average number of Rand calls needed to pick
the (k + 1)st number. The probability of each outcome of Rand is 1/n. The number
of points in the event ‘a new number is produced’ is n − k (because k numbers have
already been picked). Thus the chance of ‘success’ (in generating a new number) is
(n − k)/n. We see that we are dealing here with Bernoulli trials with p = (n − k)/n, and
the question is: what is the average number of trials needed to achieve success? The
number of trials until success is itself a random variable, call it X. What is the
probability that X = m? Evidently it is $q^{m-1} p$, where q = 1 − p: the probability of
m − 1 failures followed by one success at the end. The desired average number of trials
is then given by an infinite sum
\[ \text{Average number of trials to pick the $(k+1)$st number} = \sum_{m=1}^{\infty} m\, q^{m-1} p , \tag{2.17} \]
where p = (n − k)/n. Using some algebra (the formula for the sum of a geometric
sequence) it can be found that for a finite sum we have
\[ \sum_{m=1}^{M} m\, q^{m-1} p = \frac{1}{p} - \Big( M + \frac{1}{p} \Big) q^{M} . \tag{2.18} \]

For very large M the second term is negligible because $q^{M}$ becomes very small,
overpowering M . Dropping the second term in (2.18) we arrive at the following
result
\[ \text{Average number of trials to pick the $(k+1)$st number} = \frac{1}{p} = \frac{n}{n-k} . \tag{2.19} \]
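
The finite-sum formula (2.18) and its limit (2.19) can be checked numerically with exact rational arithmetic. In this sketch the names partial_sum and closed_form and the sample value p = 3/10 (for instance n = 10, k = 7) are chosen purely for illustration.

```python
from fractions import Fraction


def partial_sum(p, M):
    """Left-hand side of (2.18): sum of m q^(m-1) p for m = 1..M."""
    q = 1 - p
    return sum(m * q ** (m - 1) * p for m in range(1, M + 1))


def closed_form(p, M):
    """Right-hand side of (2.18): 1/p - (M + 1/p) q^M."""
    q = 1 - p
    return 1 / p - (M + 1 / p) * q ** M


p = Fraction(3, 10)
for M in (1, 5, 20):
    assert partial_sum(p, M) == closed_form(p, M)

# For large M the finite sum is already very close to 1/p.
print(float(closed_form(p, 200)), float(1 / p))
```

Using Fraction avoids rounding errors, so the two sides of (2.18) can be compared with exact equality.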
As a final step we have to sum the average numbers of Rand calls over all the numbers
k + 1 in the permutation. We obtain
\[ \text{Average number of Rand calls} = \sum_{k=0}^{n-1} \frac{n}{n-k} = n \sum_{j=1}^{n} \frac{1}{j} . \tag{2.20} \]

Thus in order to generate a random permutation of n = 1000 numbers the function
Rand will be called on average 1000 × 7.485 = 7485 times. For very large n
the average number of Rand calls grows as n log n. The algorithm Permute is not
very efficient: it spends a lot of time picking numbers that get discarded. A more
sophisticated algorithm exists for which the average number of Rand calls is 4n, which
is smaller than n log n for large n.
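
The growth of the average number of Rand calls can be tabulated directly from formula (2.20). This is a small illustrative sketch; the function name expected_rand_calls is our own.

```python
import math
from fractions import Fraction


def expected_rand_calls(n):
    """Exact value of n * (1 + 1/2 + ... + 1/n), as in (2.20)."""
    return n * sum(Fraction(1, j) for j in range(1, n + 1))


for n in (10, 100, 1000):
    exact = float(expected_rand_calls(n))
    # Compare the exact average with n ln n.
    print(n, round(exact, 1), round(n * math.log(n), 1))
```

For n = 1000 the exact average is about 7485, in agreement with the figure quoted above.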
Chapter 3

Graph Theory

3.1 Introduction to Graphs


A graph is a mathematical abstraction that is useful for solving many kinds of
problems. A graph consists of a set of elements called vertices and a list of unordered
pairs of these elements called edges. More formally, a graph G = (V, E) consists of
a nonempty set V of vertices and a list E of unordered pairs of elements from
V.
For example, G with V = {a, b, c, d} and
E : {a, b}, {a, d}, {b, d}, {c, d}
is a graph with four vertices and four edges.
Note that the same edge can appear several times in a list, and edges connecting
a vertex to itself (loops) may also appear.
We can obtain a diagram of a graph G by using dots as the vertices and lines or
curves as the edges connecting the appropriate dots. The diagram in Fig. 1 is one
representation of the graph G described above.

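[Fig. 1: diagram of the graph G above, with vertices a, b, c, d. Fig. 2: diagram of a graph with vertices a, b, c and edges labelled e1 , . . . , e5 .]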

We can present graphs by showing only their diagrams. For example, we can read
off Fig. 2 that the diagram therein represents the graph with vertex set V = {a, b, c}
and edge list
E : {a, b}, {a, c}, {a, a}, {c, b}, {c, b} .


To distinguish the multiple edges they can be equipped with additional labels. Thus
for the graph depicted in Fig. 2 we can label the edges {e1 , e2 , e3 , e4 , e5 } as shown
in the picture. With each label we associate the pair of vertices the corresponding
edge connects. For example e2 is associated with {b, c} and e5 with {a, a}. These
vertices are called the endpoints of the corresponding edge.

Definition 3.1. A graph is called simple if it has no loops and no multiple edges.
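
Definition 3.1 can be tested mechanically once a graph is stored as an edge list. In the sketch below, edges are written as Python tuples whose order is ignored; the names normalise, is_simple, fig1_edges and fig2_edges are illustrative, and the two edge lists are those of Fig. 1 and Fig. 2 above.

```python
from collections import Counter


def normalise(edge):
    """Put the endpoints in a fixed order so that (u, v) and (v, u) coincide."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def is_simple(edges):
    """True if the edge list has no loops and no repeated edges."""
    counts = Counter(normalise(e) for e in edges)
    no_loops = all(u != v for u, v in counts)
    no_multiple = all(c == 1 for c in counts.values())
    return no_loops and no_multiple


fig1_edges = [("a", "b"), ("a", "d"), ("b", "d"), ("c", "d")]
fig2_edges = [("a", "b"), ("a", "c"), ("a", "a"), ("c", "b"), ("c", "b")]

print(is_simple(fig1_edges))  # True
print(is_simple(fig2_edges))  # False: a loop at a and a double edge between b and c
```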

When edges and vertices are removed from a graph, without removing endpoints
of any remaining edges, a smaller graph is obtained. Such a graph is called a subgraph
of the original graph. The vertices of the subgraph form a subset of the vertices of
the original graph and the edges of the subgraph appear in the edge list of the
original graph.
For example a graph with vertices V = {a, b} and edges E : {a, a}, {a, b}, de-
picted below, is a subgraph of the graph from Fig. 2.

[Figure: diagram of this subgraph, with the loop e5 at a and the edge e3 .]
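
The subgraph conditions described above can be checked as follows. This is a sketch under the assumption that edges are stored as tuples with the order of endpoints ignored; the function names norm and is_subgraph are our own.

```python
from collections import Counter


def norm(edge):
    """Sort the endpoints so that the order inside a pair does not matter."""
    return tuple(sorted(edge))


def is_subgraph(v_sub, e_sub, v, e):
    """Check that (v_sub, e_sub) is a subgraph of (v, e)."""
    sub_counts = Counter(norm(x) for x in e_sub)
    counts = Counter(norm(x) for x in e)
    vertices_ok = len(v_sub) > 0 and set(v_sub) <= set(v)
    # Every remaining edge must keep both of its endpoints.
    endpoints_ok = all(a in v_sub and b in v_sub for a, b in e_sub)
    # Each edge may be used at most as often as it appears in the original list.
    edges_ok = all(counts[x] >= c for x, c in sub_counts.items())
    return vertices_ok and endpoints_ok and edges_ok


V2 = {"a", "b", "c"}
E2 = [("a", "b"), ("a", "c"), ("a", "a"), ("c", "b"), ("c", "b")]

print(is_subgraph({"a", "b"}, [("a", "a"), ("a", "b")], V2, E2))  # True
print(is_subgraph({"a"}, [("a", "b")], V2, E2))                   # False: endpoint b removed
```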

3.1.1 Exercises

Exercise 3.1.1 (*). Draw the diagram corresponding to the graph given by V =
{a, b, c, d} and
E : {a, b}, {c, d}, {a, d} .

Solution. See Fig. 1 below.

Exercise 3.1.2 (*). Draw the diagram corresponding to the graph given by V =
{a, b, c, d, e, f } and

E : {a, f }, {a, e}, {b, c}, {b, f }, {c, f }, {d, e}, {e, f } .

Solution. See Fig. 2 below.



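[Fig. 1 and Fig. 2: diagrams of the graphs from Exercises 3.1.1 and 3.1.2, with vertices a, b, c, d and a, b, c, d, e, f respectively.]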

3.2 Adjacency Matrix


We shall often make use of the notation V (G) for the set of vertices of a graph G
and E(G) for the list of edges. If {v, w} is an edge of a graph G then we say
that the vertices v and w are adjacent. We will also say that {v, w} is an edge
connecting the vertices v and w.
Definition 3.2. Let G be a graph with vertex set V = {v1 , v2 , . . . , vn } . The
adjacency matrix of G is the n × n matrix A = (aij ) defined by
aij = the number of edges connecting vi and vj .

Example 3.3. Suppose that
\[ A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} \]
is the adjacency matrix of a graph G with vertex set V (G) = {v1 , v2 , v3 , v4 } .


Since a12 = 1 the vertices v1 and v2 are adjacent. Since a32 = 0 the vertices v3
and v2 are not adjacent. Continuing this analysis we find that

E(G) : {v1 , v2 }, {v1 , v3 }, {v2 , v4 }, {v3 , v4 } .
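
The analysis of Example 3.3 can be automated. In the following sketch the function name edges_from_matrix is our own, and vertices are represented by their indices 1, . . . , n instead of v1 , . . . , vn .

```python
def edges_from_matrix(A):
    """Read the edge list off a symmetric adjacency matrix.

    Only the entries on and above the diagonal are scanned, so that each
    edge is listed once; an entry A[i][j] = m produces m copies of the edge.
    """
    n = len(A)
    edges = []
    for i in range(n):
        for j in range(i, n):
            edges.extend([(i + 1, j + 1)] * A[i][j])
    return edges


A = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
print(edges_from_matrix(A))  # [(1, 2), (1, 3), (2, 4), (3, 4)]
```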

Example 3.4 (*). Find the adjacency matrix for the graph from Fig. 2.
Solution. Ordering the vertices alphabetically we obtain
\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix} . \]
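
Definition 3.2 translates directly into a short routine that builds the adjacency matrix from an edge list. The sketch below follows the convention of Example 3.4, where a loop contributes 1 to the diagonal entry; the function name adjacency_matrix is our own.

```python
def adjacency_matrix(vertices, edges):
    """Build the adjacency matrix for the given vertex ordering.

    Entry (i, j) counts the edges joining the i-th and j-th vertices;
    a loop adds 1 to the corresponding diagonal entry.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        A[i][j] += 1
        if i != j:          # keep the matrix symmetric for non-loop edges
            A[j][i] += 1
    return A


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("a", "a"), ("c", "b"), ("c", "b")]
print(adjacency_matrix(vertices, edges))  # [[1, 1, 1], [1, 0, 2], [1, 2, 0]]
```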

Note that the adjacency matrix of a graph is symmetric, that is, aij = aji (the
matrix is symmetric upon reflecting across the diagonal).

For a simple graph all entries aij are either 0 or 1. Also, since for a simple graph
an edge does not connect a vertex to itself, we have that in this case the diagonal
elements vanish: aii = 0 .

Definition 3.5. Let v be a vertex of a graph G . The degree of v, denoted by
deg(v) , is the number of edges of G having v as an endpoint, with each loop {v, v}
contributing twice.

Example 3.6. For the graph in Fig. 1, deg(a) = 2, deg(b) = 2, deg(c) = 1 and
deg(d) = 3.

By examining the structure of a graph, we notice that each edge has
two ends. Hence, each edge must contribute 2 to the sum of the degrees. This
means that the sum of the degrees of the vertices is twice the number of edges. This
observation is formalised by the following theorem.

Theorem 3.7 (Handshaking Theorem). Let G = (V, E) be a graph with vertex set
V = {v1 , v2 , . . . , vn } . Then

deg(v1 ) + deg(v2 ) + · · · + deg(vn ) = 2|E|

The name “Handshaking Theorem” comes from the fact that a graph can be
used to represent a group of people shaking hands.
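
The Handshaking Theorem is easy to verify on the two graphs from Section 3.1. In this sketch the function name degrees and the variable names are illustrative; a loop is counted twice, as in Definition 3.5.

```python
def degrees(vertices, edges):
    """Return a dictionary mapping each vertex to its degree."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1     # for a loop (u == v) this adds 2 in total
    return deg


fig1 = (["a", "b", "c", "d"], [("a", "b"), ("a", "d"), ("b", "d"), ("c", "d")])
fig2 = (["a", "b", "c"], [("a", "b"), ("a", "c"), ("a", "a"), ("c", "b"), ("c", "b")])

for vertices, edges in (fig1, fig2):
    deg = degrees(vertices, edges)
    # Sum of the degrees equals twice the number of edges.
    assert sum(deg.values()) == 2 * len(edges)
    print(deg)
```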

3.2.1 Exercises

Exercise 3.2.1 (*). Find the adjacency matrix of the graph in Fig. 1.

Solution.
\[ A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix} . \]

Exercise 3.2.2. Find the adjacency matrix of the graph depicted below:

[Figure: diagram of a graph with vertices a, b, c, d.]

Exercise 3.2.3. Verify the Handshaking Theorem for the graph given in the pre-
vious exercise.

Exercise 3.2.4 (*). What can you say about the sum of numbers in any row or
column of an adjacency matrix?

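Solution. Each entry aij in the ith row (or column) of an adjacency matrix counts
the edges that have both vi and vj as endpoints. Hence, for a graph without loops, the
sum of the numbers in the ith row (or column) of the adjacency matrix is equal to deg(vi ).
(A loop at vi adds 1 to aii under Definition 3.2 but 2 to deg(vi ), so each loop at vi
makes the row sum fall short of deg(vi ) by 1.)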

Exercise 3.2.5. A graph has four vertices of degree 3 and three vertices of degree
2. How many edges does it have?

Exercise 3.2.6 (*). A graph G = (V, E) has 23 edges and every vertex of G has
degree four or larger. What is the largest possible number of vertices that G can
have?

Solution. Let x be the number of vertices of G . By the Handshaking Theorem, the
sum of the vertex-degrees is 46. Hence 46 ≥ 4x so 11.5 ≥ x . The largest possible
number of vertices (an integer) is 11.

Exercise 3.2.7. A simple graph has vertices of degree 4, 3, 3, 2, 2. How many
edges does it have? Draw such a graph.

Exercise 3.2.8 (*). Prove that in any graph the number of vertices of odd degree
is even.

Solution. By the Handshaking Theorem, the sum of the vertex-degrees is even. If
there were an odd number of vertices of odd degree then the sum of the vertex-degrees
would be odd, a contradiction. Hence the number of vertices of odd degree is even.

Exercise 3.2.9 (*). Let G be a simple graph with at least two vertices. Prove that
G has two vertices of the same degree.

Solution. Let G be a simple graph with n vertices. The maximum degree of any vertex is
n − 1 so the possible vertex degrees range from 0 to n − 1.
It is not possible for both 0 and n − 1 to be degrees, since a vertex of degree
n − 1 would be adjacent to all the other vertices including the one of degree 0. It
follows that there are at most n − 1 distinct values available for the degrees of the
n vertices, so at least two vertices have the same degree.
