Notes Week7
Department of Mathematics
Heriot-Watt University
AY 2022–2023
This course is in five parts. The first part is concerned with the foundations of
set and function theory, alongside elements of combinatorics. The second part deals
with probability theory and applications. The third part covers graph theory,
indispensable for students of computer science and applied mathematics. The fourth
part is concerned with recurrence relations and, finally, the fifth part provides an
introduction to matrix theory and linear algebra.
All the teaching materials for the course are available on Canvas.
All you need for the course is contained in these lecture notes and in the accompanying
Problem Sets, which will gradually become available on Canvas. However,
if you wish to consult a textbook, you may refer to
• K. H. Rosen, Discrete Mathematics and Its Applications (the graph theory chapters
are particularly useful)
2 Probability
2.1 Probability Space
2.1.1 Exercises
2.2 Conditional Probability and Independence
2.2.1 Exercises
2.3 Random Variables and Probability Distributions
2.3.1 Exercises
2.4 Expected value and variance
2.4.1 Exercises
2.5 Examples of applications to algorithms
2.5.1 Algorithm MaxNumber [NON-EXAMINABLE]
2.5.2 Algorithm SeqSearch [NON-EXAMINABLE]
2.5.3 Algorithm Permute [NON-EXAMINABLE]
Page 3 Dr Matteo Capoferri & Dr Adrian Turcanu
3 Graph Theory
3.1 Introduction to Graphs
3.1.1 Exercises
3.2 Adjacency Matrix
3.2.1 Exercises
Chapter 1
1.1 Sets
Definition 1.1. A set is an unordered collection of objects.
The notation A = {1, 3, 5, 7} means that the set A contains the four elements
1, 3, 5, 7 . We write a ∈ A to denote that a is an element of the set A . We write
a ∉ A to indicate that a does not belong to A . For example, given A = {1, 3, 5, 7}
we have that 5 ∈ A and 6 ∉ A.
There are a number of ways of denoting sets. Thus for the set of positive even
numbers we could write
A = {2, 4, 6, 8, . . .}
or
A = {x | x is a positive even number} ,
where the vertical bar | is to be read “such that”.
A set cannot contain the same object more than once. Also, the elements in a
set are not ordered so the sets {a, b, c} and {c, a, b} are the same. In general, two
sets are equal if they contain the same elements.
3 ∈ A, {3, 5} ∈ A, 5 ∉ A, {3, 6} ⊂ A, {3} ⊂ A.
The set that has no elements is called the empty set and is denoted by ∅ . By
convention ∅ ⊂ A for any set A.
Remark 1.2. Note that the set ∅ and the set B = {∅} are different. The single
element of B is the empty set itself.
Example 1.5. Let A = {1, 2, 3} and B = {x, y} . Find the Cartesian product
A×B.
Solution. The Cartesian product A × B is
A × B = { (1, x), (1, y), (2, x), (2, y), (3, x), (3, y) }.
1.1.1 Exercises
Solution. We have
Solution. B × A = { (x, 1), (x, 2), (x, 3), (y, 1), (y, 2), (y, 3) } .
A ∩ B := {x | x ∈ A and x ∈ B}.
Note that the statement x ∈ A or x ∈ B does not exclude the possibility that
x is an element of both sets. For example, we have
{1, 3, 5, 7} ∪ {2, 5, 7} = {1, 2, 3, 5, 7}
and
{1, 3, 5, 7} ∩ {2, 5, 7} = {5, 7} .
Remark 1.3. It is useful to picture combinations of sets with the aid of Venn diagrams.
This will be done in the lectures.
Definition 1.8. If A and B are sets, then the difference of A and B is denoted
by A \ B and defined as
A \ B := {x | x ∈ A and x ∉ B}.
Thus $\overline{A} = \{x \mid x \notin A\}$ .
A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ B = B ∪ A
A ∩ B = B ∩ A
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
$\overline{A \cup B} = \overline{A} \cap \overline{B}$
$\overline{A \cap B} = \overline{A} \cup \overline{B}$
Note that the empty set ∅ satisfies the following properties A ∪ ∅ = A and
A ∩ ∅ = ∅ for any set A.
1.2.1 Exercises
Exercise 1.2.3 (∗). What can you say about sets A and B if
(a) A∪B = A ? (b) A∩B = A ? (c) A∩B = B ∩A ? (d) A\B = A ?
− To find the bit string for $\overline{A}$ from the bit string for A we turn each 1 into a 0
and each 0 into a 1.
For the above example, the bit string for A is 0 1 1 0 0 0 so the bit string for $\overline{A}$
is 1 0 0 1 1 1 .
− The k th bit in the bit string for A ∪ B is 1 if either (or both) of the bits in
the k th position of A and B is 1, and is 0 if both bits are 0.
− The k th bit in the bit string for A ∩ B is 1 if both of the bits in the k th
position of A and B are 1 , and is 0 if either (or both) of the two bits is 0.
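The three rules above translate directly into bitwise operations on strings. The following is a minimal sketch, not part of the original notes: the function names are illustrative, the string 011000 is the one from the example above, and the second string 010101 is a hypothetical set chosen only for demonstration.

```python
def complement(bits: str) -> str:
    """Flip every bit: each 1 becomes 0 and each 0 becomes 1."""
    return "".join("1" if b == "0" else "0" for b in bits)


def union(a: str, b: str) -> str:
    """k-th bit is 1 if either (or both) of the k-th bits is 1."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def intersection(a: str, b: str) -> str:
    """k-th bit is 1 only if both k-th bits are 1."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


a = "011000"  # bit string for A from the example above
b = "010101"  # hypothetical bit string for a second set B (assumption)

print(complement(a))       # 100111
print(union(a, b))         # 011101
print(intersection(a, b))  # 010000
```

Both strings must have the same length, since each position refers to the same element of the universal set.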
1.3.1 Exercises
Solution. The bit string for A is 0 0 1 1 1 0 1 , and the bit string for B is 0 0 0 0 1 1 0 .
The bit strings for $\overline{A}$ , A ∪ B and A ∩ B are 1 1 0 0 0 1 0 , 0 0 1 1 1 1 1 and
0 0 0 0 1 0 0 .
Exercise 1.3.2 (∗). Which subsets of a finite universal set U do these bit strings
represent?
(a) the string with all zeros, (b) the string with all ones.
Exercise 1.3.3 (∗). Explain how to find the bit string corresponding to A \ B.
Solution. For A \ B , the k th bit is 1 if the k th bit of the first string is 1 and the
k th bit of the second string is 0, and is 0 otherwise.
1.4 Relations
Let A and B be sets. A relation between A and B links certain elements of A with
certain elements of B. More formally, a relation from A to B is a subset of A × B.
A relation on the set A is a relation from A to A.
1. Let A be the set of students at Heriot-Watt University and let B be the set
of modules. The relation R consists of pairs (a, b) , a ∈ A , b ∈ B with the
student a being enrolled in the module b.
2. Let R be the relation on the set of people consisting of pairs (a, b) where a is
a parent of b.
is a relation from A to B.
It is not symmetric since, for example, it contains (1, 2) but not (2, 1) .
It is antisymmetric since there is no pair of elements a and b with a ≠ b such
that (a, b) and (b, a) belong to R.
It is transitive. To check this we have to show that if (a, b) and (b, c) belong to
R, then so does (a, c) .
Example 1.13. As another example, let R be the relation on the set of people
consisting of pairs (a, b) where a likes b. Unfortunately it is not symmetric. It is
not antisymmetric or transitive. It is probably not even reflexive.
Example 1.15. Let R be the relation on the set A = {1, 3, 5, 9, 11, 18} defined by
the pairs (a, b) such that a − b is divisible by 4.
Given a set A with an equivalence relation R on it, we can break up all elements in
A into disjoint groups (subsets) such that within each group all elements are related
between themselves but no two elements from two different groups are related. Such
groups of elements are called equivalence classes. For Example 1.15 the equivalence
classes are C1 = {1, 5, 9}, C2 = {3, 11}, C3 = {18}.
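The splitting into equivalence classes can be carried out mechanically. The sketch below is an illustration only (the helper name `equivalence_classes` is not from the notes); it assumes the relation passed in really is an equivalence relation, so that comparing with one representative of each class is enough.

```python
def equivalence_classes(elements, related):
    """Group elements into classes, comparing each element
    with one representative of every class found so far."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            # x is related to no existing class: start a new one
            classes.append([x])
    return classes


A = [1, 3, 5, 9, 11, 18]
classes = equivalence_classes(A, lambda a, b: (a - b) % 4 == 0)
print(classes)  # [[1, 5, 9], [3, 11], [18]]
```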
1.4.1 Exercises
Exercise 1.4.1 (∗). Let A = {2, 3, 4, 5, 6} and let R be the relation on A given by
Exercise 1.4.2 (∗). Let R be the relation on the set of real numbers consisting
of pairs (a, b) with a ≤ b. Determine whether the relation is reflexive, symmetric,
antisymmetric, or transitive.
Solution. This relation is reflexive because a ≤ a for every a. It is not symmetric because, e.g.,
2 ≤ 3 but 3 ≰ 2. It is antisymmetric because a ≤ b and b ≤ a together imply
a = b. It is transitive because a ≤ b and b ≤ c imply a ≤ c.
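On a finite set the four properties can be tested by exhaustive search. The following sketch restricts the relation ≤ to the hypothetical finite set {1, ..., 5}, chosen only for illustration; a check on a finite sample is of course not a proof for all real numbers.

```python
from itertools import product


def is_reflexive(A, R):
    return all((a, a) in R for a in A)


def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)


def is_antisymmetric(R):
    # whenever both (a, b) and (b, a) are present, a must equal b
    return all(a == b for (a, b) in R if (b, a) in R)


def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


A = range(1, 6)  # finite sample set (assumption, for illustration)
R = {(a, b) for a, b in product(A, repeat=2) if a <= b}

print(is_reflexive(A, R), is_symmetric(R), is_antisymmetric(R), is_transitive(R))
```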
1.5 Functions
Let X and Y be sets.
Definition 1.16. A function from X to Y is a rule that assigns to each element of
X exactly one element of Y . We write f : X → Y . The set X is called the domain
of f and the set Y is called the codomain of f .
Discrete Mathematics Page 12
Example 1.20. The next example is called the McCarthy 91 function and was
given by John McCarthy, one of the founders of artificial intelligence.
Here f : N → Z is defined for positive integers n by
$$f(n) = \begin{cases} n - 10 & \text{if } n \ge 101, \\ f(f(n + 11)) & \text{if } n \le 100. \end{cases}$$
If we try to find f (1) directly, we put n = 1 in the definition. This shows that f (1)
depends on f (12) which in turn depends on f (23) etc. It seems hopeless to find
f (1) in this way. The trick is to work backwards. Putting n = 100 in the definition
gives
f (100) = f (f (111)) = f (101) = 91
Putting n = 99 in the definition gives
f (99) = f (f (110)) = f (100) = 91
Continuing in this way we can show that f (n) = 91 for all n ≤ 100 .
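The recursive definition can be typed in almost verbatim and used to check the claim numerically. This is a small sketch for experimentation (the name `mccarthy91` is illustrative); the nesting of calls stays shallow, so the default recursion limit is not an issue for these inputs.

```python
def mccarthy91(n: int) -> int:
    """McCarthy 91 function, following the two-case definition above."""
    if n >= 101:
        return n - 10
    return mccarthy91(mccarthy91(n + 11))


# f(n) = 91 for every n from 1 to 100
print(all(mccarthy91(n) == 91 for n in range(1, 101)))  # True
print(mccarthy91(150))  # 140
```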
In Example 1.17 the range of f is the set of all non-negative real numbers:
Ran(f ) = {y ∈ R | y ≥ 0}. In Example 1.18 the range is the set Ran(f ) = {3, 22}.
For the McCarthy 91 function (Example 1.20) the range is Ran(f ) = {y ∈ N | y ≥
91}.
Definition 1.21. A function is called surjective if its range is equal to its codomain.
None of the above examples defines a surjective function. However it is easy to
construct functions from the above examples which are surjective by simply modifying
the codomain. For instance $f(x) = x^2$ becomes surjective if we consider it as
a function from all real numbers into the non-negative real numbers: $f : \mathbb{R} \to \mathbb{R}_{\ge 0}$ .
Definition 1.22. A function is called injective if no two distinct elements of the
domain have the same image.
In the above only the factorial function of Example 1.19 is injective.
$(f \circ g)(x) = f(g(x)) = f(3x + 4) = 2(3x + 4)^2 = 2(9x^2 + 24x + 16) = 18x^2 + 48x + 32$
while
$(g \circ f)(x) = g(f(x)) = g(2x^2) = 3(2x^2) + 4 = 6x^2 + 4$
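The computation above shows that the order of composition matters. A quick numerical check, with f(x) = 2x² and g(x) = 3x + 4 read off from the calculation and a hypothetical helper `compose`, might look like this:

```python
def f(x):
    return 2 * x**2


def g(x):
    return 3 * x + 4


def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


f_after_g = compose(f, g)
g_after_f = compose(g, f)

# compare with the expanded formulas on a few integer inputs
for x in range(-3, 4):
    assert f_after_g(x) == 18 * x**2 + 48 * x + 32
    assert g_after_f(x) == 6 * x**2 + 4

print(f_after_g(1), g_after_f(1))  # 98 10
```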
1.5.1 Exercises
Exercise 1.5.1. For each of the following functions find their range. Determine
whether the function is surjective and whether it is injective.
Solution. (a) The range is R because for any y ∈ R we always have a solution to
y = f (x) = 2x + 1 given by x = (y − 1)/2. This also gives the inverse function
$f^{-1}(y) = (y - 1)/2$. Therefore by Theorem 1.25 f is injective and surjective.
(c) The function h is surjective. To show this take any non-negative integer n.
If n > 0 then let $s_n$ be a string of length n containing only 1’s. Then $h(s_n) = n$.
If n = 0 then we have h(0) = 0 where we take the string containing a single 0.
This function is not injective. To demonstrate this it suffices to note that, e.g.,
h(101) = 2 = h(011).
Exercise 1.5.2 (*). Let the functions f , g and h have R as their domain and
codomain, and be defined as
$$f(x) = 4x - 3, \qquad g(x) = x^2 + 1, \qquad h(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$
Find rules for the following functions (with domain and codomain R)
(c) Since $g(x) = 1 + x^2$ is always greater than or equal to 1, and hence non-negative, $(h \circ g)(x) = 1$ for all x.
Exercise 1.5.3 (*). Find the inverse function for each of the following functions,
or explain why no inverse exists.
(a) g : R → R , $g(x) = \sqrt{x^2}$ ,
(b) j : S → S
where S is the set of non-empty finite strings of lower case letters, and j(s) gives
a string obtained by moving the last character to the beginning of the string, e.g.
j(yncrfm) = myncrf.
Solution. (a) This function is the same as the absolute value function $g(x) = \sqrt{x^2} = |x|$.
Since g(−1) = 1 = g(1) this function is not injective and therefore by Theorem 1.25
the inverse function does not exist.
(b) The inverse function $j^{-1}(s)$ performs the inverse operation on the string: it
takes the first letter and moves it to the end of the string, e.g. $j^{-1}(\text{sdfj}) = \text{dfjs}$.
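The function j and its inverse are easy to experiment with using string slicing. The sketch below is illustrative only; the name `j_inverse` is an assumption, and both functions expect a non-empty string, as in the exercise.

```python
def j(s: str) -> str:
    """Move the last character of a non-empty string to the front."""
    return s[-1] + s[:-1]


def j_inverse(s: str) -> str:
    """Move the first character of a non-empty string to the end."""
    return s[1:] + s[0]


print(j("yncrfm"))        # myncrf
print(j_inverse("sdfj"))  # dfjs

# composing in either order gives back the original string
print(j_inverse(j("yncrfm")) == "yncrfm", j(j_inverse("sdfj")) == "sdfj")
```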
Suppose we want to find a formula for the sum of the first n odd integers:
• For n = 1 the sum is $1 = 1^2$ .
• For n = 2 the sum is $1 + 3 = 4 = 2^2$ .
• For n = 3 the sum is $1 + 3 + 5 = 9 = 3^2$ .
• For n = 4 the sum is $1 + 3 + 5 + 7 = 16 = 4^2$ .
Based on the above, it seems reasonable to conjecture that
$1 + 3 + 5 + 7 + \dots + (2n - 1) = n^2$ .
But how does one rigorously prove this?
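Before attempting a proof, one can gather more evidence for the conjecture by computer. The following sketch checks the formula for the first hundred values of n; the helper name `sum_first_odds` is illustrative. Such a check supports the conjecture but proves nothing about n beyond the tested range.

```python
def sum_first_odds(n: int) -> int:
    """Sum of the first n odd positive integers: 1 + 3 + ... + (2n - 1)."""
    return sum(2 * i - 1 for i in range(1, n + 1))


# compare with n^2 for n = 1, ..., 100
print(all(sum_first_odds(n) == n**2 for n in range(1, 101)))  # True
```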
To start with, let us look at what it means for a statement (or proposition) to
depend on a positive integer n .
$1 + 3 + 5 + 7 + \dots + (2n - 1) = n^2$ .
Then:
Example 1.28. Let P (n) be the statement $n^2 + n$ is even. Write down P (1) ,
P (3) , P (k) and P (k + 1) .
Solution. P (1) states that $1^2 + 1$ is even. P (3) states that $3^2 + 3$ is even. P (k)
states that $k^2 + k$ is even. P (k + 1) states that $(k + 1)^2 + k + 1$ is even.
Example 1.29. Let P (n) be the statement 50 − 2n ≥ 0 . Write down P (1) , P (5)
and P (30) . Which of them are true?
Solution. P (1) states that 50 − 2 ≥ 0 which is true. P (5) states that 50 − 10 ≥ 0
which is true. P (30) states that 50 − 60 ≥ 0 which is false.
$a_{n+1} = 2a_n - 3$ , n ≥ 1 .
Let P (n) be the statement $a_n = 2^{n-1} + 3$ . Write down P (1) , P (2) and P (3) .
Which of them are true?
Solution. P (1) states that $a_1 = 2^0 + 3$ , that is, $a_1 = 4$ . Hence P (1) is true. P (2)
states that $a_2 = 2^1 + 3$ , that is, $a_2 = 5$ . P (3) states that $a_3 = 2^2 + 3$ , that is,
$a_3 = 7$ .
We use $a_{n+1} = 2a_n - 3$ to find $a_2$ and $a_3$ : $a_2 = 2a_1 - 3 = 2(4) - 3$ so $a_2 = 5$,
and $a_3 = 2a_2 - 3 = 2(5) - 3$ so $a_3 = 7$. Hence, both P (2) and P (3) are true.
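The same comparison between the recurrence and the proposed closed form can be automated. The sketch below assumes the initial value a_1 = 4, which is what the solution above uses; the function names are illustrative.

```python
def a_recursive(n: int) -> int:
    """a_n computed from a_1 = 4 (assumed initial value) and a_{k+1} = 2 a_k - 3."""
    value = 4
    for _ in range(n - 1):
        value = 2 * value - 3
    return value


def a_closed(n: int) -> int:
    """Closed form claimed by P(n): a_n = 2^(n-1) + 3."""
    return 2 ** (n - 1) + 3


print([a_recursive(n) for n in (1, 2, 3)])  # [4, 5, 7]
print(all(a_recursive(n) == a_closed(n) for n in range(1, 31)))  # True
```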
Mathematical induction
Example 1.31. Use Mathematical Induction to prove that for any positive integer
n
$1 + 3 + 5 + 7 + \dots + (2n - 1) = n^2$
Solution. Let P (n) be the statement
$1 + 3 + 5 + 7 + \dots + (2n - 1) = n^2$ .
$a_{n+1} = a_n + 8n$ , n ≥ 1 .
$$\begin{aligned} a_{k+1} &= a_k + 8k \\ &= (2k - 1)^2 + 8k \\ &= 4k^2 + 4k + 1 \\ &= (2k + 1)^2 . \end{aligned}$$
(c) Do the inductive step: show that if P (k) is true then P (k + 1) is true.
It is worth stressing that one needs to carry out ALL of the above steps. Indeed,
there are a number of pitfalls in using the principle of mathematical induction as
the next three examples show.
I. It does not suffice to prove only that P (1) is true. Indeed, let P (n) be the
statement: $n^2 = n$ . Then P (1) is true. But P (n) is true only when n = 1
or n = 0 and is false for n ≥ 2.
II. It does not suffice to prove only that if P (k) is true then P (k + 1) is true.
Indeed, let P (n) be the statement: n + 1 = n . Assume that P (k) holds so
that k + 1 = k . Adding 1 to each side gives k + 2 = k + 1 so P (k + 1) is
true. Of course, P (1) is false so we cannot get started.
III. Another typical error is to give a proof of P (k) implies P (k + 1) which only
works for some values of k , for example k ≥ 2 . Let P (n) be the statement:
any n students taking the same examination must all score the same mark.
Clearly P (1) is true.
Assume that P (k) is true and let any group of k + 1 candidates be given.
Let A and B be particular candidates and C the remaining group of k − 1
students. Then A and C form a group of k students so by P (k) they all
have the same mark. The same is true of B and C . Thus A and B and the
group C score the same mark so P (k + 1) is true. By induction, P (n) holds
for all n .
The mistake is that the group C must not be empty. If k = 1 then C is
empty and the marks of students from the groups A and C do not coincide.
We have proved that P (k) is true implies that P (k + 1) is true if k ≥ 2 but
there is no way of establishing P (2) which is false.
The next example uses the factorial function n! . The number 5! (read five
factorial or factorial 5) means 5 × 4 × 3 × 2 × 1 . Note that 1! = 1 and that by
convention 0! = 1 . The reader should check by calculation that 4! = 24 and use a
calculator to check that 7! = 5040 .
$$\begin{aligned} 2^{k+1} &= 2 \cdot 2^k \\ &< 2\,(k!) \\ &< (k + 1)\,k! \\ &= (k + 1)!. \end{aligned}$$
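The chain of inequalities above is the inductive step for a statement of the form 2^n < n!. Assuming that this is the statement being proved and that the base case is n = 4 (both are assumptions, since the statement of the example is not visible here), a short numerical check shows where the inequality starts to hold:

```python
from math import factorial

# the inequality 2^n < n! fails for n = 1, 2, 3 ...
print([2**n < factorial(n) for n in (1, 2, 3)])  # [False, False, False]

# ... and holds from n = 4 onwards (checked here up to n = 30)
print(all(2**n < factorial(n) for n in range(4, 31)))  # True
```

This also illustrates why the base case matters: the inductive step uses 2 < k + 1, which says nothing about the small values of n where the statement is false.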
1.6.1 Exercises
Exercise 1.6.1 (∗). Let P (n) be the statement 3n − 100 ≥ 0 . Write down P (1) ,
P (4) and P (40) . Which of them are true?
$a_{n+1} = 5a_n - 8$ , n ≥ 1 .
Let P (n) be the statement $a_n = 5^{n-1} + 2$ . Write down P (1) and P (2) . Which of
them are true?
Exercise 1.6.3 (∗). Use Mathematical Induction to prove that for any positive
integer n
$1 + 2 + 2^2 + 2^3 + \dots + 2^n = 2^{n+1} - 1$.
$1 + 2 + 2^2 + 2^3 + \dots + 2^n = 2^{n+1} - 1$.
$1 + 2 + 2^2 + 2^3 + \dots + 2^k + 2^{k+1} = 2^{k+2} - 1$.
Hence $3^{k+1} < (k + 1)!$ so P (k + 1) is true. By induction P (n) is true for all n .
$$a_{n+1} = \frac{a_n^2 + 12}{7}, \qquad n \ge 1$$
where 3 < x < 4. Prove by induction that 3 < an < 4 for all n.
$$7a_{k+1} = a_k^2 + 12 < 16 + 12 = 28$$
Then
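$$s_{k+1} = \left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right) \cdots \left(1 - \frac{1}{k}\right)\left(1 - \frac{1}{k+1}\right) = \frac{1}{k}\left(1 - \frac{1}{k+1}\right).$$
Since
$$\frac{1}{k}\left(1 - \frac{1}{k+1}\right) = \frac{1}{k+1},$$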
P (k + 1) is true. By induction P (n) is true for all n ≥ 2 .
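The inductive step above suggests that s_n is the product (1 − 1/2)(1 − 1/3)···(1 − 1/n) and that P(n) is the statement s_n = 1/n for n ≥ 2. Under that reading (an assumption, since the statement of the exercise is not shown here), the claim can be checked with exact rational arithmetic:

```python
from fractions import Fraction


def s(n: int) -> Fraction:
    """Product (1 - 1/2)(1 - 1/3)...(1 - 1/n), computed exactly."""
    result = Fraction(1)
    for i in range(2, n + 1):
        result *= 1 - Fraction(1, i)
    return result


print(s(2), s(3), s(4))  # 1/2 1/3 1/4
print(all(s(n) == Fraction(1, n) for n in range(2, 51)))  # True
```

Using `Fraction` rather than floating-point numbers avoids rounding errors, so the equality test is exact.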
$s_1 = 1$, $s_2 = 5$, $s_3 = 23$, $s_4 = 119$.
Since
2! = 2, 3! = 6, 4! = 24, 5! = 120
we conjecture that $s_n = (n + 1)! - 1$ .
Let P (n) be the statement: $s_n = (n + 1)! - 1$ . The above calculations show
that P (n) is true for n = 1, 2, 3, 4 .
Assume that P (k) holds so that $s_k = (k + 1)! - 1$ . Then
Since
Example 1.35. The mathematics teaching committee requires one student repre-
sentative from either the first year or the second year or the third year. If there
are 500 first year, 300 second year and 100 third year students, how many possible
different representatives are there?
Solution. There are obviously 500 different choices for picking a representative from
the first year students, 300 for the second and 100 for the third year. Since the
students are all different and we have to pick only one representative from any one
of the years, we can add the numbers of choices for a representative from each year to obtain
900 different possible representatives.
If there are n(A) ways to do A and, distinct from them, n(B) ways
to do B, then the number of ways to do A or B is n(A) + n(B).
Similarly, one adds n(A), n(B), n(C) when one must do A or B or C;
and so on.
To solve the following example we need to use both of the above rules.
Example 1.37. In a certain computer system a file name must be a string of letters
and digits which is 1,2, or 3 symbols long. The first symbol must be a letter and
the system does not make any distinction between uppercase and lowercase letters.
How many legitimate file names are there?
Solution. A file name either has one symbol, or two symbols or three symbols. In
each case we have 26 choices for the first letter (the 26 letters of the alphabet).
The number of one-symbol names is then 26. The number of 2-symbol names is
26 × 36 = 936 because for the second symbol we can have either a letter or a
digit, 36 = 26 + 10 choices. We multiply the number of choices for the first and
for the second symbol because both choices are made independently. Similarly the
number of three-symbol names is 26 × 36 × 36 = 33696. Summing the numbers for
the three different cases (by the sum rule) we obtain that the system at hand can
accommodate 26 + 936 + 33696 = 34658 different files.
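The count in Example 1.37 is small enough to confirm by listing every legitimate name. The sketch below does this with `itertools.product` and compares the result with the formula obtained from the product and sum rules; the variable names are illustrative.

```python
from itertools import product
import string

letters = string.ascii_lowercase      # 26 letters (case is ignored)
symbols = letters + string.digits     # 36 symbols: letters and digits

# brute force: first symbol is a letter, the rest are arbitrary symbols
brute_force = sum(
    1
    for length in (1, 2, 3)
    for first in letters
    for rest in product(symbols, repeat=length - 1)
)

# product rule for each length, sum rule across the lengths
by_rules = sum(26 * 36 ** (length - 1) for length in (1, 2, 3))

print(brute_force, by_rules)  # 34658 34658
```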
1.7.1 Exercises
Exercise 1.7.1 (∗). A binary word is made of 0’s and 1’s. How many binary words
are there with length exactly 7? How many binary words are there with length up
to 3?
Solution. For each digit we have two choices: 0 or 1. Thus by the product rule
we have $2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 = 2^7 = 128$ different words of length 7. To count the
number of words with length up to 3 we need to add up the numbers of words of
length 1, 2, 3 (here we use the sum rule). Using the product rule as before we obtain
$2^1 + 2^2 + 2^3 = 14$, the number of binary words with length up to 3.
Exercise 1.7.2. In a restaurant for the main dish there are 10 meat, 5 fowl, and 8
fish choices. How many ways are there to pick a main dish?
Exercise 1.7.3 (∗). A salesman is to visit 5 towns. Assuming that he goes to one
town after another, visiting all of them once, in how many orders can he make his
trip?
Solution. The salesman has 5 choices for the first town in his itinerary. Since each
town is to be visited only once he has 4 choices for the second town in the trip,
and so on. By the product rule the number of different trips is
5 × 4 × 3 × 2 × 1 = 120.
Exercise 1.7.4 (∗). How many numbers between 1 and 1000 have exactly one 7 in
them?
Solution. The condition of the problem means that we are considering numbers of
up to 3 digits. Hence we can count separately the numbers at hand with length
1,2,3 and then add them up. Clearly we have only one such number of length 1: 7.
For two-digit numbers the digit 7 can either be the first digit or the second one. If
it stands first then we have 9 choices for the second digit: 0, 1, 2, 3, 4, 5, 6, 8, 9. If 7
is the second digit we have only 8 choices for the first digit because 0 is not allowed.
Thus we have 9 + 8 = 17 required numbers of length 2. For length 3 the digit 7 can
again be in the 1st, 2nd or 3rd position. As before, taking care of the zero, we obtain
9 × 9, 8 × 9 and 8 × 9 different numbers for each case respectively. Finally, using the
sum rule we add up all the cases and obtain 1 + 17 + 81 + 72 + 72 = 243.
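A case analysis like the one above is easy to get wrong, so it is worth cross-checking by brute force. The following sketch simply inspects the decimal digits of every number from 1 to 1000:

```python
# count the integers from 1 to 1000 whose decimal expansion contains exactly one 7
count = sum(1 for n in range(1, 1001) if str(n).count("7") == 1)
print(count)  # 243

# the same count split by number of digits, matching the cases in the solution
by_length = {
    length: sum(
        1
        for n in range(1, 1001)
        if len(str(n)) == length and str(n).count("7") == 1
    )
    for length in (1, 2, 3)
}
print(by_length)  # {1: 1, 2: 17, 3: 225}
```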
Example 1.39. In how many ways can a committee with 10 members select a
chairperson, a treasurer and an administrator?
Solution. Suppose that the first person to be selected is to be the chairperson. There
are 10 different ways to do this. Suppose the second person selected is the treasurer for
whom we have 9 choices. Finally, suppose the last person chosen is the administrator,
for whom there are 8 possible people available. By the product rule the number of
selections is 10 × 9 × 8 = 720.
Theorem 1.40. The number of ordered selections of k distinct objects from a set
of n objects is n(n − 1)(n − 2) . . . (n − k + 2)(n − k + 1).
$$n(n - 1)(n - 2) \cdots (n - k + 2)(n - k + 1) = \frac{n!}{(n - k)!} \tag{1.1}$$
n! = n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1
0! = 1 .
With this convention the expression on the right hand side in (1.1) gives the same
result for n = k as the expression on the left hand side.
Proof of Theorem 1.40. There are n ways to select the first object. After the first
object is selected there are n − 1 objects left, and hence n − 1 choices
for the second object. We continue selecting until the last object, for which there
are n − k + 1 choices. By the product rule these selections can be made in
n(n − 1)(n − 2) . . . (n − k + 1) ways.
Remark 1.7. Observe that using formula (1.1) for k = n we find that the number of
different ways to order n objects is n!.
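Formula (1.1) can be compared with a direct enumeration of ordered selections. The sketch below uses `math.perm` and `itertools.permutations` from the Python standard library (Python 3.8 or later); the helper name `ordered_selections` is illustrative.

```python
from itertools import permutations
from math import factorial, perm


def ordered_selections(n: int, k: int) -> int:
    """Number of ordered selections of k distinct objects out of n: n! / (n - k)!."""
    return factorial(n) // factorial(n - k)


# the committee example: chairperson, treasurer, administrator out of 10 members
print(ordered_selections(10, 3))                # 720
print(perm(10, 3))                              # 720
print(len(list(permutations(range(10), 3))))    # 720

# k = n gives the number of ways to order n objects
print(ordered_selections(5, 5) == factorial(5))  # True
```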
Consider now a different kind of selections called combinations.
In this case the order of objects selected does not matter. We will often say
“unordered selection without repetitions” rather than “combination”.
where $\binom{n}{k}$ stands for the number of unordered selections. Dividing both sides by k!
For example
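$$\begin{aligned} (x + y)^5 &= \binom{5}{0} x^0 y^5 + \binom{5}{1} x^1 y^4 + \binom{5}{2} x^2 y^3 + \binom{5}{3} x^3 y^2 + \binom{5}{4} x^4 y^1 + \binom{5}{5} x^5 y^0 \\ &= y^5 + 5xy^4 + 10x^2 y^3 + 10x^3 y^2 + 5x^4 y + x^5 . \end{aligned}$$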
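The coefficients in this expansion are binomial coefficients, which Python provides as `math.comb` (Python 3.8 or later). The sketch below lists them and checks the expansion at a few integer points; the helper name `binomial_expansion` and the test points are chosen only for illustration.

```python
from math import comb

# coefficients of (x + y)^5, in order of increasing power of x
coefficients = [comb(5, k) for k in range(6)]
print(coefficients)  # [1, 5, 10, 10, 5, 1]


def binomial_expansion(x: int, y: int, n: int = 5) -> int:
    """Evaluate the sum of C(n, k) x^k y^(n-k) for k = 0, ..., n."""
    return sum(comb(n, k) * x**k * y ** (n - k) for k in range(n + 1))


# the expansion agrees with (x + y)^5 at every tested point
print(all(binomial_expansion(x, y) == (x + y) ** 5
          for x in range(-3, 4) for y in range(-3, 4)))  # True
```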
Example 1.43. In the card game bridge, a hand has 13 cards. There are 52 cards
in a deck. How many different bridge hands are there?
Solution. We have to make an unordered selection of 13 cards out of 52. Repetitions
are not allowed. By the above theorem there are $\binom{52}{13} = 635013559600$ different
bridge hands.
Solution. The order in which we pick cards for a full house does not matter so we
can specify a convenient order. Let us first pick a denomination for three cards.
There are 13 choices. Next we have to pick 3 cards out of 4 in this denomination.
Since the order does not matter we have $\binom{4}{3} = 4$ choices. Next let us pick the second
denomination. We have 12 choices for it. To pick 2 cards out of 4 in the chosen
denomination we have $\binom{4}{2} = 6$ choices. Finally by the product rule we multiply
through the choices to obtain 13 × 4 × 12 × 6 = 3744, the number of different full
houses.
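Both card counts above can be reproduced with `math.comb`. This is a small sketch following the same steps as the two solutions; the variable names are illustrative.

```python
from math import comb

# bridge: unordered selection of 13 cards out of 52
bridge_hands = comb(52, 13)
print(bridge_hands)  # 635013559600

# full house: denomination of the triple, 3 of its 4 cards,
# denomination of the pair, 2 of its 4 cards (product rule)
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)
print(full_houses)  # 3744
```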
1.8.1 Exercises
Exercise 1.8.1. Find in how many ways an ordered selection of 2 elements out of
a set of 5 elements can be made. No repetitions are allowed. How many unordered
selections of 2 elements from the same set are there?
Exercise 1.8.2. There are 14 available ingredients for making perfume. How many
different perfumes can be made if one is to use 4 ingredients for each one?
Exercise 1.8.3. A travelling salesman is to visit any 5 different cities from a list
of 10. How many different itineraries can he have if he visits the cities successively,
one after another?
Exercise 1.8.4 (∗). What is the number of ways in which a subset of one or two
elements can be picked up from a set of n elements?
Solution. Clearly there are n ways to pick a subset of one element. To pick a subset
of two elements we have to make an unordered selection with no repetitions of 2
elements out of n. Hence we have
$$\binom{n}{2} = \frac{n!}{2 \times (n - 2)!} = \frac{n(n - 1)}{2}$$
two-element subsets. By the sum rule we obtain
$$n + \frac{n(n - 1)}{2} = \frac{n(n + 1)}{2}.$$
Exercise 1.8.5 (∗). A university department has 12 male professors and 3 female
professors. It is decided that every committee of professors in this department should
have at least one female member. How many different committees of 12 people can
be formed?
Exercise 1.8.6. The computer science students are entering a team of 5 in the
university relay race. There are 20 undergraduates and 10 postgraduates. How
many teams are possible if
(a)(∗) any student is eligible
(b) there must be 2 postgraduates and 3 undergraduates
Exercise 1.8.7 (∗). How many ways are there to pick a combination of k distinct
numbers from {1, 2, . . . , n} if 1 and 2 cannot both be picked?
If a set A contains a finite number of elements then it is called a finite set and we
write |A| for the number of elements in it. An infinite set is a set that is not finite.
The number |A| is called the cardinality of A. Hence |{x, y, z}| = 3 and the set of
positive integers is infinite.
Inclusion-Exclusion Principle
For instance, suppose we want to find the number of cards in an ordinary deck
that are clubs or aces. Let A be the set of cards that are aces and let C be the set
of cards that are clubs.
We have |A| = 4 , |C| = 13 and |A ∩ C| = 1 (the ace of clubs). Then the
Inclusion-Exclusion Principle gives us
|A ∪ C| = |A| + |C| − |A ∩ C| = 4 + 13 − 1 = 16.
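Python's built-in sets make it easy to replay this computation on an explicit deck. In the sketch below the representation of a card as a (rank, suit) pair is an assumption made for illustration.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

aces = {card for card in deck if card[0] == "A"}
clubs = {card for card in deck if card[1] == "clubs"}

# direct count of the union versus the inclusion-exclusion formula
print(len(aces | clubs))                            # 16
print(len(aces) + len(clubs) - len(aces & clubs))   # 16
```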
Example 1.45. A survey in Scotland finds that 86% of the population have heard
of product A while 80% have heard of product B. Also, 70% of the population
have heard about both products. What percentage of the population have not
heard about either product?
Solution. Let A be the set of people who have heard about product A and define
B similarly. Let N be the population of Scotland.
From the given data, |A| = (0.86)N , |B| = (0.8)N and |A ∩ B| = (0.7)N .
Then
|A ∪ B| = |A| + |B| − |A ∩ B| = (0.86)N + (0.8)N − (0.7)N = (0.96)N .
Hence 96% of the population have heard of either product A or product B and 4%
have not heard about either product.
In the lectures I will indicate why (1.6) holds by means of a Venn diagram.
and
|A ∪ B ∪ C| = 50 + 33 + 20 − 16 − 10 − 6 + 3 = 74.
Example 1.47. At the University of Scabia all first year computer science students
have to study at least one of the three modules: accounting, ethics and mathematics.
There are 237 first year computer science students. Also, 150 study maths, 120
study accounting, 100 study ethics, 50 study both accounting and maths, 20 study
both accounting and ethics and 70 study both maths and ethics.
(b) Find the number who study accounting and maths but not ethics.
Solution. Let A be the set of students studying accounting, M those doing maths
and E those doing ethics. From the given data,
|A ∩ M | = 50 , |A ∩ E| = 20 , |E ∩ M | = 70
Also, since everyone is studying at least one of the three modules,
|A ∪ E ∪ M | = 237
Using
|A ∪ E ∪ M | = |A| + |E| + |M | − |A ∩ E| − |A ∩ M | − |E ∩ M | + |A ∩ E ∩ M |
X ∩ Y = ∅ , X ∪ Y = A ∩ M
By (1.5) , |X ∪ Y | = |X| + |Y | − |X ∩ Y | so
$|A \cap M| = |A \cap M \cap E| + |A \cap M \cap \overline{E}|$
Hence
$50 = 7 + |A \cap M \cap \overline{E}|$
and $|A \cap M \cap \overline{E}| = 43$ .
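The arithmetic of Example 1.47 can be laid out step by step as follows. This is only a sketch of the calculation using the figures given in the example; the variable names are illustrative.

```python
total = 237                      # everyone studies at least one module
A, E, M = 120, 100, 150          # accounting, ethics, maths
AE, AM, EM = 20, 50, 70          # pairwise intersections

# three-set inclusion-exclusion, solved for the triple intersection:
# total = A + E + M - AE - AM - EM + AEM
AEM = total - (A + E + M - AE - AM - EM)
print(AEM)  # 7

# accounting and maths but not ethics: split A ∩ M by membership of E
AM_not_E = AM - AEM
print(AM_not_E)  # 43
```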
1.9.1 Exercises
Exercise 1.9.1 (∗). A and B are sets with |A| = 30 and |B| = 20 . Find |A ∪ B|
if
(a) A ∩ B = ∅ ,
(b) |A ∩ B| = 5 ,
(c) B ⊂ A .
Exercise 1.9.2 (∗). A , B and C are sets with |A| = |B| = |C| = 100 .
Find |A ∪ B ∪ C| in the following cases.
(a) A = B = C .
(b) A ∩ B = A ∩ C = B ∩ C = ∅ .
(c) |A ∩ B| = |A ∩ C| = |B ∩ C| = 50 and A ∩ B ∩ C = ∅ .
(d) |A ∩ B| = |A ∩ C| = |B ∩ C| = 50 and |A ∩ B ∩ C| = 20 .
Solution. (a) If A = B = C then A ∪ B ∪ C = A and |A ∪ B ∪ C| = |A| = 100 .
(b) If A ∩ B = A ∩ C = B ∩ C = ∅ then
|A ∪ B ∪ C| = |A| + |B| + |C| = 300
Exercise 1.9.3 (∗). A , B and C are sets with |A| = 10, |B| = 100, |C| = 1000 .
Find |A ∪ B ∪ C| if
(a) A ⊂ B and B ⊂ C ,
(b) A ∩ B = A ∩ C = B ∩ C = ∅ ,
(c) |A ∩ B| = |A ∩ C| = |B ∩ C| = 5 and |A ∩ B ∩ C| = 1 .
Exercise 1.9.4 (∗). How many integers from 1 to 1000 are divisible by 7 or 11?
Solution. For integers between 1 and 1000, let A be the set of those divisible by 7
and B be the set of those divisible by 11.
We need to find |A ∪ B|. Now, |A| is the integer part of 1000/7 , that is
|A| = 142 . Also |B| = 90 . For a number to be in the set A ∩ B it must be divisible
by 77. Hence |A ∩ B| = 12 , and so |A ∪ B| = |A| + |B| − |A ∩ B| = 142 + 90 − 12 = 220 .
Exercise 1.9.5 (∗). How many integers from 1 to 200 are divisible by 2 or 5 or 7?
Solution. For integers between 1 and 200, let A be the set of those divisible by 2,
B the set of those divisible by 5 and C the set of those divisible by 7. We need to
find |A ∪ B ∪ C|.
We have that |A| = 200/2 = 100 , |B| = 40 and |C| = 28 .
For a number to be in the set A∩B it must be divisible by 10. Hence |A∩B| = 20.
Similarly, |A ∩ C| = 14 and |B ∩ C| = 5.
Numbers in the set A ∩ B ∩ C are divisible by 70. Hence |A ∩ B ∩ C| = 2 .
|A ∪ B ∪ C| = 100 + 40 + 28 − 20 − 14 − 5 + 2 = 131.
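Counts obtained by inclusion-exclusion, as in Exercises 1.9.4 and 1.9.5, can be cross-checked by testing every integer in the range directly. The helper `count_divisible` below is a hypothetical name introduced for this sketch.

```python
def count_divisible(limit: int, divisors: list[int]) -> int:
    """Count the integers in 1..limit divisible by at least one of the divisors."""
    return sum(
        1 for n in range(1, limit + 1) if any(n % d == 0 for d in divisors)
    )


# Exercise 1.9.4: divisible by 7 or 11, from 1 to 1000
print(count_divisible(1000, [7, 11]))              # 220
print(1000 // 7 + 1000 // 11 - 1000 // 77)         # 220

# Exercise 1.9.5: divisible by 2 or 5 or 7, from 1 to 200
print(count_divisible(200, [2, 5, 7]))             # 131
```

Integer division `limit // d` gives the integer part of limit/d, which is exactly the number of multiples of d between 1 and limit.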
Exercise 1.9.6 (∗). There are 73 students in the first year Humanities class at the
University of Scabia. Among them a total of 52 can play the piano, 25 can play
the violin, and 20 can play the flute; 17 can play the piano and violin, 12 can play
the piano and flute, and 7 can play the violin and flute. Only Hamish Ratho can
play all three instruments. How many of the class cannot play any of them?
Solution. Let A be the set of students who can play the piano, B the set of
students who can play the violin and C the set of students who can play the flute.
Then
|A| = 52 , |B| = 25 , |C| = 20 ,
|A ∩ B| = 17 , |A ∩ C| = 12 , |B ∩ C| = 7 , |A ∩ B ∩ C| = 1.
Using
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
we have that
|A ∪ B ∪ C| = 52 + 25 + 20 − 17 − 12 − 7 + 1 = 62.
The number of students who cannot play any of the three instruments is 73 − 62 = 11 .
Exercise 1.9.7 (∗). At the University of Scabia all first year students have to
study at least one of the three modules: computing, physics and mathematics.
There are 1400 first year students. Also, 700 study maths, 600 study physics,
500 study computing, 200 study both physics and maths, 150 study both physics
and computing and 120 study both maths and computing.
(b) Find the number who study computing and maths but not physics.
Solution. Let A be the set of students studying maths, B those doing physics and
C those doing computing.
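From the given data,
|A| = 700 , |B| = 600 , |C| = 500 ,
|A ∩ B| = 200 , |B ∩ C| = 150 , |A ∩ C| = 120 , |A ∪ B ∪ C| = 1400 .
Using
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
we have that
1400 = 700 + 600 + 500 − 200 − 120 − 150 + |A ∩ B ∩ C|
so that |A ∩ B ∩ C| = 70 . Hence there are 70 students who study all three subjects.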
The set of students not studying physics is $\overline{B}$ . Hence the number who study
computing and maths but not physics is $|A \cap C \cap \overline{B}|$ .
Let X = A ∩ C ∩ B and $Y = A \cap C \cap \overline{B}$ . Then
X ∩ Y = ∅ , X ∪ Y = A ∩ C.
Using
|X ∪ Y | = |X| + |Y | − |X ∩ Y | = |X| + |Y |
we have that
$|A \cap C| = |A \cap C \cap B| + |A \cap C \cap \overline{B}|$.
Hence $|A \cap C \cap \overline{B}| = 120 - 70 = 50$ .
Chapter 2
Probability
• flipping of a coin;
• rolling a die;
• the top link in a Google search (it will either lead to the desired information
or it won’t);
Definition 2.1. The sample space S for a random experiment is the set of all
possible outcomes of the experiment.
Example 2.3 (*). Suggest suitable sample spaces, and identify the subset corre-
sponding to the event A, for the following situations:
(a) A coin, which shows heads (H) or tails (T), is tossed three times; A is the event
that the coin shows heads exactly twice.
(b) A game of football is played; A is the event that the match is drawn.
(c) A couple have two children; A is the event that both are girls.
Solution. (a) We can take
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
and
A = {HHT, HTH, THH}.
(b) We can take
S = {(n, m) | n, m = 0, 1, 2, . . .}
where n is the first team's score and m is the second team's score. In this case
A = {(n, m) | n = m}.
(c) We can take S = {GG, GB, BG, BB} where, for example, GB means the first child
is a girl and the second child is a boy. In this case, A = {GG}. Alternatively we can
take S = {0, 1, 2} with each sample point corresponding to the number of girls born.
In this case, A = {2}.
Definition 2.4. The probability function is a function P (E) that assigns a value
between 0 and 1 to any event E ⊂ S. The value 0 ≤ P (E) ≤ 1 is called the
probability of event E.
Probability axioms
1. P(E) ≥ 0 for any event E;
2. P(∅) = 0, P(S) = 1;
3. if the events A and B are mutually exclusive, that is A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B).
Axiom 3 can be extended to the case where one has more than two events: if the
events E1, E2, . . . , En are mutually exclusive, that is Ei ∩ Ej = ∅ for any i ≠ j, then
P(E1 ∪ E2 ∪ · · · ∪ En) = P(E1) + P(E2) + · · · + P(En).
In particular, for any finite subset E the probability P(E) is given by the sum of
the probabilities assigned to its individual points.
This gives a recipe for specifying and computing a probability function. By
Axiom 2 the sum of the probabilities assigned to all sample points must be equal to 1:
\[ \text{for } S = \{1, 2, 3, \dots, n\} \text{ one has } \sum_{i=1}^{n} p_i = 1, \tag{2.1} \]
where p_i denotes the probability assigned to the sample point i.
Example 2.5. Consider again the experiment in which we roll a die. The sample
space is S = {1, 2, 3, 4, 5, 6}. If we think that the die is fair, that is all possibilities
are equally likely, then we should set p1 = p2 = ... = p6 = 1/6. Then condition (2.1)
is satisfied.
In general, for a finite sample space S with N elements one can define an equiprobable
probability function by assigning pi = 1/N for each sample point i ∈ S. This
means that all outcomes are equally likely.
For any event E we call its complement Ē = S \ E the event “not E” (any
outcome other than E). The following theorem holds.
The proof uses the same idea as in the proof of Theorem 2.7. You are warmly
encouraged to try and write it down yourself as an exercise!
Example 2.9 (*). If P (A) = 2/3 and P (B) = 3/4 what are the largest and the
smallest values that P (A ∩ B) can take?
Solution. The intersection A ∩ B is a subset of A and also a subset of B. Thus
its probability cannot exceed P(A) or P(B). So P(A ∩ B) ≤ 2/3 = P(A), the smaller
of the two. Moreover, it takes that value if A ⊂ B, in which case
P(A ∩ B) = P(A) = 2/3. Thus the largest P(A ∩ B) can be is 2/3. Next
consider the equality P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 17/12 − P(A ∪ B). The
largest P(A ∪ B) can be is 1. Thus the smallest P(A ∩ B) can be is 17/12 − 1 = 5/12.
Example 2.10. One often hears that the theory of probability started in the sev-
enteenth century, when a French nobleman, the Chevalier de Méré, proposed the
following problem in 1654 to his friend Pascal: Why is one more likely to obtain a
6 in four throws of a die than to obtain a double 6 in 24 throws of two dice? This
problem is known as de Méré’s paradox. We use the word paradox, because, based
on the fact that there are 6 possible results when we roll a die and 36 possible results
when we roll two dice, some people thought that the two events above should have
the same probability. Indeed, notice that the number of throws, divided by the num-
ber of possible results, is equal to 2/3 in both cases (4/6 = 24/36 = 2/3 = 0.66...).
Nowadays, we can easily compute the probability of each event. We find that the
probability of obtaining at least one 6 in four rolls of a (fair or non-biased) die is
1 − (5/6)^4 = 671/1296 ≈ 0.5177, while the probability of getting at least one double
6 in throwing two dice 24 times is 1 − (35/36)^24 ≈ 0.4914. One can conclude that
the Chevalier de Méré must have spent a lot of time throwing dice to discover such
a small difference.
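The two probabilities in de Méré's problem are easy to reproduce numerically. The following is a minimal sketch; the function name at_least_one_success is our own, and exact fractions are used so that 671/1296 appears without rounding.

```python
from fractions import Fraction

def at_least_one_success(p_fail: Fraction, trials: int) -> Fraction:
    """Probability of at least one success in `trials` independent trials,
    where each single trial fails with probability `p_fail`."""
    return 1 - p_fail ** trials

one_six = at_least_one_success(Fraction(5, 6), 4)        # a 6 in four throws
double_six = at_least_one_success(Fraction(35, 36), 24)  # a double 6 in 24 throws

print(one_six, float(one_six))
print(float(double_six))
```

The first probability is slightly above 1/2 and the second slightly below, which is the small difference referred to above.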
2.1.1 Exercises
Exercise 2.1.1 (*). In the following experiments decide what are the outcomes and
state if they are equally likely.
(a) A red, blue and a yellow ball are put into a bag and a ball drawn out at random.
(b) Two yellow balls, one red ball and one blue are in a bag. One ball is drawn at
random.
(d) Two children throw a die to see who starts the game. If the die shows a prime
number Joe starts, otherwise Ellen starts.
Solution. (b) We can take the outcomes to be {Y, R, B}, the colour of the ball
drawn. It is natural to assume that each of the four balls is equally likely to be
drawn, with probability 1/4. As there are two yellow balls in the bag, the probability
of drawing a yellow ball is 2 × 1/4 = 1/2, whereas the probability of drawing the blue
ball is 1/4 and the probability of drawing the red ball is 1/4. Hence the outcomes
are not equally likely.
(c) The outcomes now are unordered pairs {YY, YB, YR, BR}. The probability
of drawing the pair YB is the same as the probability of drawing YR, and it is twice
as large as the probability of drawing YY (which is the same as the probability of
drawing BR).
B = A ∪ (B \ A).
Hence by the third axiom of probability we have P (B) = P (A) + P (B \ A). Since
by the first axiom P (B \ A) ≥ 0, we conclude that P (B) ≥ P (A).
Solution. We are given P (A) = 0.3, P (B) = 0.4, P (A ∩ B) = 0.2. We are asked
to find P (B ∩ Ā). The set B can be represented as a disjoint union: B = (B ∩
A) ∪ (B ∩ Ā). Thus using P (B) = P (B ∩ A) + P (B ∩ Ā) we find P (B ∩ Ā) =
P(B) − P(B ∩ A) = 0.4 − 0.2 = 0.2. This means that there is a 20% chance that it
will rain tomorrow but not today.
Exercise 2.1.4 (*). Events A and B are such that P (A) = 0.75, P (B) = 0.65.
Explain why events A and B cannot be mutually exclusive. (Two events A and B
are called mutually exclusive if A ∩ B = ∅.) What can you say about P (A ∪ B) and
P (A ∩ B)?
Solution. If the events A and B were mutually exclusive then by formula (2.3) we
would have P (A ∪ B) = P (A) + P (B) but this is impossible because P (A) + P (B) =
1.4 > 1. As in Example 2.9, the probability P (A∩B) is the largest when the set with
smaller probability, the set B, is a subset of the set with the larger probability, the set
A. In this case P (A ∩ B) = 0.65. In the same situation P (A ∪ B) takes the smallest
possible value: 0.75. The largest P (A ∪ B) can be is 1. By formula (2.3) this is the
situation when P (A ∩ B) = P (A) + P (B) − P (A ∪ B) = 0.75 + 0.65 − 1 = 0.4 is the
smallest. All in all, we found that 0.4 ≤ P (A ∩ B) ≤ 0.65 and 0.75 ≤ P (A ∪ B) ≤ 1.
Exercise 2.1.5. Events A, B and C are such that P (A) = 0.7, P (B) = 0.6,
P (C) = 0.5, P (A∩B) = 0.4, P (A∩C) = 0.3, P (B∩C) = 0.2 and P (A∩B∩C) = 0.1.
Find
(a) P (A ∪ B),
(b) P (A ∪ B ∪ C),
Example 2.11 (*). A fair coin is tossed twice. Let A be the event that at least one
head is obtained. Suppose at the end of the experiment we are told that A has occurred
(but we have not seen the actual result). What are the probabilities of other events?
Solution. In light of the occurrence of A we should revise the probabilities of other
events. We define a new conditional probability function P (·|A) given that event A
has occurred:
Sample point    P(·)    P(·|A)
HH              1/4     1/3
TH              1/4     1/3
HT              1/4     1/3
TT              1/4     0
How did we get the revised probabilities given that A has occurred? Sample
points outside A (the point TT) are assigned probability 0 (since we know that A has
occurred). Sample points inside event A have their original probabilities multiplied
by a constant in such a way that they still add up to 1. The original probabilities
total 3/4, so if we divide each by 3/4 we get revised probabilities which sum to 1.
Note that 3/4, the factor we had to divide by, is P(A). This is not a
coincidence. The construction of conditional probability follows the general rule
\[ P(B|A) = \frac{P(B \cap A)}{P(A)}. \tag{2.4} \]
(It is assumed that P(A) ≠ 0.)
Example 2.12 (*). A fair coin is tossed three times. What is the probability that
the first toss was tails given that at least one head is obtained?
Solution. We have that A = {HHH, HHT, HTH, HTT, THH, THT, TTH} is the
event that at least one head has occurred and B = {THH, THT, TTH, TTT} is the
event that the first toss was tails. Their intersection is A ∩ B = {THH, THT, TTH}.
Since each point has probability 1/8 we have P(A) = 7/8, P(A ∩ B) = 3/8. Hence
by the above general formula
\[ P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{3/8}{7/8} = \frac{3}{7}. \]
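On a small equiprobable sample space, formula (2.4) can be evaluated by direct enumeration. The sketch below redoes Example 2.12 in this way; the helper names prob and conditional are our own and are not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Equiprobable sample space: all strings of H/T of length 3.
sample_space = ["".join(s) for s in product("HT", repeat=3)]

def prob(event):
    """Probability of an event (a set of sample points) under equiprobability."""
    return Fraction(len(event), len(sample_space))

def conditional(b, a):
    """P(B|A) = P(B ∩ A) / P(A), assuming P(A) != 0."""
    return prob(b & a) / prob(a)

A = {s for s in sample_space if "H" in s}      # at least one head
B = {s for s in sample_space if s[0] == "T"}   # first toss is tails

print(conditional(B, A))  # 3/7
```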
Example 2.13 (*). An urn contains six red balls and three blue balls. Three balls
are drawn one at a time from the urn (without replacement). What is the probability
that the first ball is red? How is your answer modified if you are given the
additional information that the last two balls are red?
Solution. First we should define a sample space for this model. Let us enumerate
the balls so we can distinguish them: balls numbered from 1 to 6 are red and the
balls numbered 7,8,9 are blue. Then the sample points are triples of distinct numbers
each from 1 to 9 corresponding to the three draws. The total number of points in the
sample space is |S| = 9×8×7 = 504. There is no reason to suppose that one sample
point is more likely than another so we set the probability of each sample point to
be 1/504. Let A1 be the event that the first ball is red. We have |A1 | = 6 × 8 × 7
(we have 6 choices for the red ball at the first draw, 8 choices for the 2nd draw since
one ball was already picked, and 7 choices for the 3rd draw). Therefore the answer
to the first question is
\[ P(A_1) = \frac{|A_1|}{|S|} = \frac{6 \times 8 \times 7}{9 \times 8 \times 7} = \frac{6}{9} = \frac{2}{3}. \]
Now let B be the event that the last two balls are red. Then A1 ∩ B is the event
that all three balls are red. We have |B| = 6 × 5 × 7 and |A1 ∩ B| = 6 × 5 × 4 = 120.
Therefore
\[ P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)} = \frac{|A_1 \cap B|}{|B|} = \frac{6 \times 5 \times 4}{6 \times 5 \times 7} = \frac{4}{7}. \]
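The counting in Example 2.13 can be verified by listing all 504 ordered draws. This is a sketch under the labelling used in the solution (balls 1 to 6 red, balls 7, 8, 9 blue); the variable names are our own.

```python
from fractions import Fraction
from itertools import permutations

RED = set(range(1, 7))  # balls 1..6 are red, balls 7, 8, 9 are blue

# Sample space: ordered triples of distinct balls (draws without replacement).
draws = list(permutations(range(1, 10), 3))

first_red = [d for d in draws if d[0] in RED]                     # event A1
last_two_red = [d for d in draws if d[1] in RED and d[2] in RED]  # event B
all_red = [d for d in last_two_red if d[0] in RED]                # A1 ∩ B

p_first_red = Fraction(len(first_red), len(draws))
p_first_red_given_b = Fraction(len(all_red), len(last_two_red))

print(len(draws), p_first_red, p_first_red_given_b)  # 504 2/3 4/7
```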
Definition 2.14. We say that two events A and B are independent if the occurrence
of B has no effect on the probability of A. So
\[ P(A|B) = P(A). \]
It follows from formula (2.4) (CHECK THIS!) that the events A and B are
independent if and only if
\[ P(A \cap B) = P(A)P(B). \tag{2.6} \]
The criterion of independence stated in this form is manifestly symmetric with respect
to the roles of A and B. If B has no effect on A then A has no effect on B, so that
P(B|A) = P(B)
holds as well.
Example 2.15 (*). A coin is tossed 3 times. Let the event A be that three heads are
obtained and the event B be that the first toss is a head. Are A and B independent
events?
Solution. We have
\[ P(A) = \frac{1}{8}, \quad P(B) = \frac{4}{8} = \frac{1}{2}, \quad P(A \cap B) = P(A) = \frac{1}{8}. \]
Then
\[ P(A)P(B) = \frac{1}{16} \neq P(A \cap B) = \frac{1}{8}. \]
Hence, the events are not independent.
Example 2.16 (*). In a class there are 4 left-handed men, 6 left-handed women
and 6 right-handed men. How many right-handed women must be present if sex
and handedness are to be independent when a student is selected at random?
Solution. Let A be the event that a student is a woman, B the event that the student
is left-handed and x the number of right-handed women in the class. So
the number of women in the class is 6 + x, the number of left-handed students is
10 and the total number of students is 16 + x. Using formula (2.6) we obtain the
equation
\[ P(A)P(B) = \frac{6 + x}{16 + x} \cdot \frac{10}{16 + x} = P(A \cap B) = \frac{6}{16 + x} \]
which is equivalent to
10(6 + x) = 6(16 + x)
and the solution is x = 9. The class must have 9 right-handed women.
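The answer to Example 2.16 can also be found by brute force, testing the product criterion for independence for each candidate x. This is only a sketch: the function name is_independent and the search range 0 to 100 are our own choices.

```python
from fractions import Fraction

def is_independent(left_women, left_men, right_women, right_men):
    """Check P(A ∩ B) == P(A) P(B) for A = 'woman', B = 'left-handed'."""
    total = left_women + left_men + right_women + right_men
    p_woman = Fraction(left_women + right_women, total)
    p_left = Fraction(left_women + left_men, total)
    p_both = Fraction(left_women, total)
    return p_both == p_woman * p_left

# Class from Example 2.16: 6 left-handed women, 4 left-handed men,
# 6 right-handed men and x right-handed women (x is the unknown).
solutions = [x for x in range(0, 101) if is_independent(6, 4, x, 6)]
print(solutions)  # [9]
```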
2.2.1 Exercises
Exercise 2.2.1. Find P (A|B) if P (B|A) = 1/6, P (A) = 1/2, P (B) = 1/3.
Exercise 2.2.2. Find P (A|B) if P (B|A) = 0.8, P (B|Ā) = 0.3, and P (A) = 0.2.
Exercise 2.2.3 (*). If P(A) = 2/3 and P(B) = 3/4, what are the largest and smallest
values that P(B|A) can take? Answer the same question for P(A|B).
Solution. Since P (A) and P (B) are fixed, in view of formula (2.4) it suffices to find
the maximal and minimal values of P (A ∩ B). This was done in Example 2.9. It was
obtained that 5/12 ≤ P (A ∩ B) ≤ 2/3. Dividing by P (A) and P (B) respectively
we obtain 5/8 ≤ P (B|A) ≤ 1 and 5/9 ≤ P (A|B) ≤ 8/9.
Exercise 2.2.4 (*). The following questions may shed some light on why many
people believe that past performance of coin-tossing (or repeatable random events
in general) affects future performance. In the answers define the sample space and
probability on it first.
(a) A fair coin was tossed 6 times and heads came up (exactly) 4 times. If the coin
is tossed again, what is the probability that the seventh toss will be a head?
(b) A fair coin is tossed 6 times and heads came up (exactly) 4 times. What is the
probability that the 6th toss was a head?
(c) A fair coin is tossed 6 times and heads came up (exactly) 4 times. It is also
known that 2 of the first 3 tosses were heads. What is the probability that the
6th toss was a head?
Solution. (a) It may be intuitively clear that the answer is still 1/2 but let us formally
show this. In total we have 7 tosses so the probability space consists of 2^7 = 128
strings such as HTHHTHT. Each point in the sample space is equiprobable with
probability 1/128. Let A be the event that during the first 6 tosses heads came up 4
times. This means that A includes only strings with two T's in the first 6 positions.
The number of ways to place them is $\binom{6}{2} = 15$, the number of unordered
selections without repetition (the positions of the T's). In addition we have 2 choices
for the outcome of the 7th toss. Therefore P(A) = (15 × 2)/128 = 15/64. Let B be the
event that the 7th toss is a head. The intersection A ∩ B consists of those strings in A
that end with an H. There are half as many of them and their (unconditional) probability
is P(A ∩ B) = 15/128. By the general rule P(B|A) = P(A ∩ B)/P(A) =
(15/128)/(15/64) = 1/2, as we suspected.
(b) We have 6 tosses and the probability space contains 2^6 = 64 points, each
having probability 1/64. Let A be the event that during the 6 tosses heads
came up 4 times. As was computed in the previous solution, |A| = 15 and therefore
P(A) = 15/64. Let B be the event that the 6th toss was a head. The strings in
A ∩ B have 2 T's which have to be located in the first 5 tosses. The number of such
strings is |A ∩ B| = $\binom{5}{2} = 10$ and therefore P(A ∩ B) = 10/64 = 5/32. By formula
(2.4), P(B|A) = (10/64)/(15/64) = 2/3.
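Parts (a) and (b) can be confirmed by enumerating the toss strings. The sketch below conditions on an event by filtering the sample space; the helper names conditional and four_heads_in_first_six are our own.

```python
from fractions import Fraction
from itertools import product

def conditional(strings, given, event):
    """P(event | given) on an equiprobable space of toss strings."""
    conditioned = [s for s in strings if given(s)]
    favourable = [s for s in conditioned if event(s)]
    return Fraction(len(favourable), len(conditioned))

def four_heads_in_first_six(s):
    return s[:6].count("H") == 4

seven = ["".join(t) for t in product("HT", repeat=7)]
six = ["".join(t) for t in product("HT", repeat=6)]

# (a) the seventh toss is a head, given 4 heads in the first six tosses
part_a = conditional(seven, four_heads_in_first_six, lambda s: s[6] == "H")
# (b) the sixth toss was a head, given 4 heads in the six tosses
part_b = conditional(six, four_heads_in_first_six, lambda s: s[5] == "H")

print(part_a, part_b)  # 1/2 2/3
```

The contrast between the two answers is the point of the exercise: the seventh toss is independent of the first six, whereas the sixth toss is one of the tosses we have information about.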
Exercise 2.2.5 (*). A box contains 10 items of type I, of which 3 are defective,
and 20 items of type II, of which 5 are defective. An item is chosen at random from
the 30 items in the box. Let A be the event that the item is of type I and B be
the event that the item is defective. Find the following probabilities: P (A), P (Ā),
P (B), P (A ∩ B), P (A ∪ B), P (A|B), P (B|A). Are A and B mutually exclusive?
Are A and B independent?
Solution. Here are some answers: P (A) = 10/30 = 1/3, P (A ∪ B) = (10 + 5)/30 =
1/2, P (A ∩ B) = 3/30 = 1/10.
Example 2.18. Flip a coin three times. The sample space S consists of all sequences
of 3 elements each of which is either H or T . Define
Example 2.19. Consider the usual sample space for tossing two dice:
S = {(i, j) | 1 ≤ i, j ≤ 6} .
Let X be the sum of the dots when the toss (i, j) occurs. That is X(i, j) = i + j.
X is a random variable.
The value of a random variable X(s) is also “random” in the sense that we can
compute P(X = x) for any possible value x of X according to the rule:
P(X = x) = P({s ∈ S | X(s) = x}).
Example 2.20 (*). Let S be the usual equiprobable, ordered-pair sample space
for tossing two dice. Let X(i, j) = i + j. Determine the real-number sample space
associated with X and determine its probability distribution.
Solution. The new sample space is the range of X: {2, 3, 4, . . . , 12}. The probability
distribution on it is
\begin{align*}
f(2) &= P(X = 2) = P(\{(1, 1)\}) = \frac{1}{36} \\
f(3) &= P(X = 3) = P(\{(1, 2), (2, 1)\}) = \frac{2}{36} \\
&\;\;\vdots \\
f(7) &= P(X = 7) = P(\{(1, 6), (2, 5), \dots, (6, 1)\}) = \frac{6}{36} \\
&\;\;\vdots \\
f(12) &= P(X = 12) = P(\{(6, 6)\}) = \frac{1}{36}.
\end{align*}
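The full table for Example 2.20, including the rows omitted above, can be generated by counting sample points. This is a minimal sketch; the function distribution is our own helper and assumes an equiprobable sample space.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(random_variable, sample_space):
    """Map each value x to P(X = x) on an equiprobable sample space."""
    counts = Counter(random_variable(s) for s in sample_space)
    return {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

two_dice = list(product(range(1, 7), repeat=2))   # ordered pairs (i, j)
f = distribution(lambda s: s[0] + s[1], two_dice)

for x, p in f.items():
    print(x, p)
```

Note that Fraction reduces automatically, so for instance f(3) is printed as 1/18 rather than 2/36.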
Example 2.21 (*). The experiment consists of flipping a biased coin three times. A
biased coin has a probability of heads on a flip given by some number p ≠ 1/2. The
probability of tails is q = 1 − p. For the experiment at hand, the number of heads X
has the following probability function
\begin{align*}
P(X = 0) &= P(\{TTT\}) = q^3, \\
P(X = 1) &= P(\{HTT, THT, TTH\}) = 3pq^2, \\
P(X = 2) &= P(\{HHT, HTH, THH\}) = 3p^2 q, \\
P(X = 3) &= P(\{HHH\}) = p^3.
\end{align*}
Example 2.22 (*). You are given a 30 question multiple choice test. For each
answer there are 5 choices. You lose 1/4 as much credit for each wrong answer as
you gain for each right answer. You are time stressed and decide to guess at random
the answers to 10 problems. What are the chances that by doing that you will raise
your total score?
Solution. Suppose you guessed r answers correctly and thus 10 − r incorrectly. The
change in your credit is
\[ r - \frac{10 - r}{4} \]
where we included the penalties for wrong answers. To improve your score you
need the above number to be positive (an increase in credit). This implies r > 2,
that is, you must guess more than 2 answers correctly. What are the chances of
that? Since for each problem (each trial) the probability of success is 1/5 = 0.2,
we are dealing here with 10 Bernoulli trials with the probability of success of each
trial p = 0.2. Therefore the probability of r successes is given by the binomial
distribution B_{10,0.2}(r). The final answer is then
\[ P(\text{improving the score}) = \sum_{r=3}^{10} B_{10,0.2}(r). \]
There are tables/computer programs available for calculating the values of binomial
distributions. Using any of those one can obtain a numerical answer: P ≈ 0.3222,
that is, the chances of success are approximately 1 out of 3.
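As an illustration of such a computer program, the sum in Example 2.22 can be evaluated directly from the definition of the binomial distribution. This is a sketch using only the Python standard library; the function names binomial_pmf and tail are our own.

```python
from math import comb

def binomial_pmf(n: int, p: float, r: int) -> float:
    """B_{n,p}(r): probability of exactly r successes in n Bernoulli trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def tail(n: int, p: float, r_min: int) -> float:
    """Probability of at least r_min successes in n Bernoulli trials."""
    return sum(binomial_pmf(n, p, r) for r in range(r_min, n + 1))

p_improve = tail(10, 0.2, 3)
print(round(p_improve, 4))  # 0.3222
```

The same function can be reused for other guessing problems by changing the number of trials, the success probability and the threshold.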
There are lots of ways to generate new distributions from a given set of distri-
butions. For example if X and Y are two random variables so is X + Y , X · Y or
Z = max(X, Y ) simply because each one is also a function on the sample space.
Such new random variables have probability distributions of their own which can be
computed by the usual general rule.
2.3.1 Exercises
Exercise 2.3.1. A fair die is rolled twice. Let X give the maximum value of the
two outcomes. Determine the probability distribution associated with X.
Exercise 2.3.2 (*). A fair coin is tossed 3 times. A random variable X gives
the number of heads minus the number of tails, e.g. X({T HT }) = 1 − 2 = −1.
Determine the probability distribution associated with X.
Solution. The variable X takes only 4 values: −3, −1, 1, 3. The probability
distribution for X is
P(X = −3) = 1/8, P(X = −1) = 3/8, P(X = 1) = 3/8, P(X = 3) = 1/8.
Exercise 2.3.3. A fair die is rolled twice. Let X1 and X2 be the random variables
giving the outcomes of the first and second roll respectively. Find the probability
distributions for the following random variables which are constructed from X1 and
X2 :
(a) (*) Z = X1 − X2 ;
(b) U = max(X1 , X2 );
(c) V = min(X1 , X2 ).
Solution. (a) The sample space consists of ordered pairs of integers each between
1 and 6 which are equiprobable with probability 1/36. The values of Z are
5, 4, 3, 2, 1, 0, −1, −2, −3, −4, −5. The associated probability distribution is
P(Z = n) = (6 − |n|)/36 for n = −5, . . . , 5.
Exercise 2.3.4. A biased coin is flipped 4 times. The probability of the coin showing
‘heads’ is p = 3/4. Establish the probability that
Hence, we have
\[ P(A) = 4p^3 q + p^4 = 4 \times \left(\frac{3}{4}\right)^3 \times \frac{1}{4} + \left(\frac{3}{4}\right)^4 = \frac{189}{256} \approx 0.74. \]
Exercise 2.3.5. You are given a multiple choice test. For each answer there are
4 choices. You ran out of time and decide to guess the answers to 6 problems at
random. What are the chances that you will guess half the answers correctly?
Exercise 2.3.6 (*). On a multiple choice test each answer has 4 choices. You lose
1/2 as much credit for each wrong answer as you gain for each right answer. What
are the chances that you will raise your total score by guessing at random answers
to 6 problems?
Solution. Suppose you guessed r answers correctly. Then your credit has changed
by
\[ \text{Credit change} = r - \frac{6 - r}{2} \]
where the penalty for wrong answers is included. For the credit change to be positive
we need r ≥ 3. We are dealing here with 6 Bernoulli trials with p = 1/4 = 0.25.
Hence the probability of having 3 or more successes is
\[ P(r \ge 3) = \sum_{r=3}^{6} \binom{6}{r} \left(\frac{1}{4}\right)^{r} \left(\frac{3}{4}\right)^{6-r}. \]
Example 2.24 (*). Find the expected value for an equiprobable random variable
(uniform distribution).
Solution. By definition we have
\[ E(X) = \sum_{i=1}^{n} x_i P(X = x_i) = \sum_{i=1}^{n} x_i \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i. \]
We see that for an equiprobable variable the expectation is the same as the average
of its values x_i.
Example 2.25 (*). Find the expectation E(X) for the random variable described
in example 2.20.
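Solution. Using the probability distribution computed in example 2.20 we obtain
\begin{align*}
E(X) &= \sum_{i=2}^{12} i \cdot P(X = i) \\
&= 2 \cdot \frac{1}{36} + 3 \cdot \frac{2}{36} + 4 \cdot \frac{3}{36} + \dots + 7 \cdot \frac{6}{36} + 8 \cdot \frac{5}{36} + \dots + 11 \cdot \frac{2}{36} + 12 \cdot \frac{1}{36} \\
&= \frac{252}{36} = 7.
\end{align*}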
Example 2.26 (*). Find the expectation E(Bn,p ) for the binomial distribution
Bn,p (k).
Solution. [NON EXAMINABLE] By definition the answer is
\[ E(B_{n,p}) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}. \]
We will show that
\[ E(B_{n,p}) = np. \]
To that end we will use recursion, a technique we will study in a lot more detail
later in the course. In this example it works as follows. Let E_n = E(B_{n,p}). Notice
that after we run the first trial there are n − 1 remaining trials and we have the
situation of B_{n−1,p}. With probability q the first trial was a failure and thus the only
successes will be those in the remaining trials. In this case we expect E_{n−1} successes.
On the other hand, if the first trial was a success (the probability for that is p) then
we expect 1 + E_{n−1} successes. In short
\[ E_n = q E_{n-1} + p (1 + E_{n-1}) = E_{n-1} + p. \]
Since E_0 = 0 (with no trials there are no successes), repeated use of this relation
gives E_n = np.
At the end of last subsection we talked about how new random variables can be
constructed in terms of other random variables. We will discuss now how, in some
simple cases, the expectations of such composite random variables can be expressed
The above theorem can be applied to Bernoulli trials. Let X be the total number
of successes. For each trial i define a binary random variable X_i so that
\[ X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success}, \\ 0 & \text{if trial } i \text{ is a failure}. \end{cases} \tag{2.9} \]
The whole point of introducing Xi is that they are simple to analyse yet they can
be summed to get X. For each i, E(Xi ) = 0 · q + 1 · p = p. Thus by (2.8) we get
E(X) = E(Bn,p ) = np, as we have already established by a different method.
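The identity E(B_{n,p}) = np can be checked against the defining sum for specific values of n and p. The sketch below uses exact fractions; the function names and the sample value p = 1/5 are our own choices and serve only as an illustration, not as a proof.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, k):
    """B_{n,p}(k) = C(n, k) p^k q^(n-k) with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean(n, p):
    """Expected value computed directly from the definition."""
    return sum(k * binomial_pmf(n, p, k) for k in range(n + 1))

p = Fraction(1, 5)
for n in range(1, 8):
    print(n, binomial_mean(n, p), n * p)
```

Each printed line shows the same value twice, in agreement with the result obtained by recursion and by summing the variables X_i.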
What about E(X · Y)? As it turns out, the latter equals E(X)E(Y) if X
and Y are independent (but not in general), where two random variables X and Y are
called independent if the events {X = x} and {Y = y} are independent for any pair
of values (x, y). Thus for independent variables we have
E(X · Y) = E(X)E(Y).
Coming back to Example 2.29 above let us compute the variances and standard
deviations of X and Y . We have
\[ \mathrm{Var}(X) = \frac{1}{3}(1 - 2)^2 + \frac{1}{3}(2 - 2)^2 + \frac{1}{3}(3 - 2)^2 = \frac{2}{3} \]
and
\[ \mathrm{Var}(Y) = \frac{2}{3}(0 - 2)^2 + \frac{1}{3}(6 - 2)^2 = \frac{24}{3} = 8. \]
The standard deviations are σ_X = √(2/3) ≈ 0.816, σ_Y = √8 ≈ 2.828. We see that
the standard deviation of Y is a lot bigger than that of X, reflecting the qualitative
observation that the values of Y are more spread around the average than those of X.
2.4.1 Exercises
Exercise 2.4.2 (*). Suppose you roll a fair die 3 times. Let the random variable
N1 be the number of times that the outcome on a roll is 1. Find the probability
distribution of N1 and its expected value E(N1 ).
Exercise 2.4.3 (*). For the random variables Z, U , V defined in Exercise 2.3.3
find the expected values and variances.
Solution. Here we show how to compute the expected value and variance for the
random variable Z; the calculations for U and V are analogous.
The distribution for this variable was computed in Exercise 2.3.3, part (a). By
the general rule we have
\[ E(Z) = \sum_{n=-5}^{5} n \times P(Z = n). \]
Now, we established in Exercise 2.3.3, part (a), that the probability distribution is
symmetric with respect to flipping the sign of the value: P (Z = n) = P (Z = −n).
Therefore, the expectation value vanishes.
The variance is computed as
\[ \mathrm{Var}(Z) = \sum_{n=-5}^{5} n^2 \times P(Z = n) = 2 \times \sum_{n=1}^{5} n^2 \times P(Z = n), \]
where in the last step we used the symmetry of the distribution with respect to
flipping the sign. A straightforward computation now yields
\[ \mathrm{Var}(Z) = 2 \left( 1^2 \times \frac{5}{36} + 2^2 \times \frac{4}{36} + 3^2 \times \frac{3}{36} + 4^2 \times \frac{2}{36} + 5^2 \times \frac{1}{36} \right) = \frac{210}{36} \approx 5.833. \]
The standard deviation is σ_Z = √5.833 ≈ 2.415.
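The values of E(Z) and Var(Z) can be reproduced by building the distribution of Z from the 36 equiprobable rolls. This is a sketch with helper names of our own choosing (distribution, expectation, variance).

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

rolls = list(product(range(1, 7), repeat=2))  # equiprobable ordered pairs

def distribution(random_variable):
    """Map each value v to P(X = v) on the two-dice sample space."""
    counts = Counter(random_variable(r) for r in rolls)
    return {v: Fraction(c, len(rolls)) for v, c in counts.items()}

def expectation(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    mean = expectation(dist)
    return sum((v - mean) ** 2 * p for v, p in dist.items())

z = distribution(lambda r: r[0] - r[1])   # Z = X1 - X2
print(expectation(z), variance(z), sqrt(variance(z)))
```

Replacing the lambda by max(r) or min(r) gives the corresponding quantities for U and V.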
Exercise 2.4.5. Determine whether the variables U and V from Exercise 2.3.3 are
independent.
Exercise 2.4.6 (*). Using representation (2.9), (2.10) find the variance and stan-
dard deviation of the binomial distribution Bn,p (k). You will need to use that Xi
and Xj are independent for i ̸= j. Is this obvious?
Solution. Recall that the expectation value was computed earlier, just after formula
(2.10). To compute the variance we note that by formula (2.12) we have
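\[ \mathrm{Var}(B_{n,p}) = E\Big[\Big(\sum_{i=1}^{n} X_i\Big)^2\Big] - \Big[E\Big(\sum_{i=1}^{n} X_i\Big)\Big]^2. \tag{2.13} \]
Using that E(X_i) = p and expanding the square in the first term we obtain
\[ \mathrm{Var}(B_{n,p}) = \sum_{i=1}^{n} \sum_{j=1}^{n} E(X_i \cdot X_j) - (pn)^2. \tag{2.14} \]
In the first sum we can single out the terms with i = j and use that X_i^2 = X_i:
\begin{align*}
\mathrm{Var}(B_{n,p}) &= \sum_{i \neq j} E(X_i \cdot X_j) + \sum_{i} E(X_i) - (pn)^2 \\
&= \sum_{i \neq j} E(X_i \cdot X_j) + np - (pn)^2.
\end{align*}
Since X_i and X_j are independent for i ≠ j, we have E(X_i · X_j) = E(X_i)E(X_j) = p^2,
and there are n(n − 1) ordered pairs with i ≠ j. Therefore
\[ \mathrm{Var}(B_{n,p}) = n(n - 1)p^2 + np - n^2 p^2 = np(1 - p) = npq, \]
and the standard deviation is √(npq).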
This method has the advantage of avoiding the complicated combinatorics related
to the binomial coefficients that enter the definition of Bn,p .
Question: For randomly distributed initial data, how many times on average
will m be assigned/reassigned?
Answer: Before we tackle the question let us ask ourselves: what are the best
and the worst scenarios? Clearly if x1 is the maximum, m will be assigned only
once and never reassigned. On the other hand, if all the numbers happen to be lined
up in increasing order we will have n (re)assignments. So the average number should be
somewhere in between. Since what matters to the problem is only the order in which
n distinct numbers are arranged, and not the precise magnitudes of these numbers,
we can take our sample space to be the set of all permutations of the numbers
{1, 2, 3, . . . , n}. Furthermore it is reasonable to assume that each permutation is
equiprobable. Let Y be the random variable defined on our sample space which
gives for each permutation the number of times m is assigned by the algorithm.
We want E(Y ). It is convenient to represent Y as a sum of simpler variables Xk .
Specifically, let Xk be the binary variable which is 1 when MaxNumber assigns m
at the k-th spot (while examining xk) and 0 if it does not. Clearly
\[ Y = \sum_{k=1}^{n} X_k. \]
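For small n the average E(Y) can be computed by brute force over all permutations. The sketch below assumes that MaxNumber scans x1, . . . , xn from left to right, assigns m = x1 at the start and reassigns m whenever a strictly larger element is met; this is our reading of the algorithm, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def assignments(xs):
    """Number of times a left-to-right maximum scan assigns m (assumed MaxNumber)."""
    m = xs[0]
    count = 1              # the initial assignment m = x_1
    for x in xs[1:]:
        if x > m:          # a new maximum forces a reassignment
            m = x
            count += 1
    return count

def average_assignments(n):
    """E(Y) over all equiprobable permutations of {1, ..., n}."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(assignments(p) for p in perms), len(perms))

for n in range(1, 7):
    print(n, average_assignments(n))
```

For these small values of n the averages coincide with the sums 1 + 1/2 + · · · + 1/n, and they lie between the best case 1 and the worst case n, as anticipated above.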
Question: What is the average number of comparisons (when the search word
is guaranteed to be on the list)?
Answer: It is natural to assume that any position of the search word on the list is
equiprobable, that is the probability for 1 comparison, 2 comparisons, 3 comparisons,
etc. is 1/n. By the general rule for computing averages we obtain
\[ \text{Average number of comparisons} = \frac{1}{n}(1 + 2 + 3 + \dots + n) = \frac{n + 1}{2} \tag{2.16} \]
where the last equality can be proved by mathematical induction.
Clearly the algorithm SeqSearch is very simplistic. It does not make any use of
the fact that the list is alphabetized. A much more efficient search algorithm is the
algorithm BinSearch (short for binary search), which works as follows. The algo-
rithm starts by looking at the (approximate) middle of the list. If it does not find
the right word it determines, using the fact that the list is alphabetized, in which
half the right word must be. It next discards the other half and looks again by going
to the middle of the remaining list. Clearly this algorithm must perform much faster
than SeqSearch because at each step the list shortens by half (approximately). A
good measure of how fast each algorithm performs is the average number of
comparisons. It can be proved that for BinSearch it is approximately log_2 n − 1. Thus
for a list of 1000 words SeqSearch will make on average 1001/2 ≈ 500 comparisons
while BinSearch will make only about log_2(1000) − 1 ≈ 9 comparisons.
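The two averages can be compared experimentally. The following sketch implements both searches under our own assumptions: the list is a hypothetical alphabetized list of 1000 distinct strings, each word is searched for once, and for the binary search we count one comparison per word examined. The function names are our own and are not the pseudocode of the notes.

```python
def seq_search_comparisons(words, target):
    """Comparisons made by a sequential scan until `target` is found."""
    for count, word in enumerate(words, start=1):
        if word == target:
            return count
    raise ValueError("target not in list")

def bin_search_comparisons(words, target):
    """Words examined by a binary search on a sorted list until `target` is found."""
    lo, hi, count = 0, len(words) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if words[mid] == target:
            return count
        if words[mid] < target:
            lo = mid + 1       # discard the lower half
        else:
            hi = mid - 1       # discard the upper half
    raise ValueError("target not in list")

n = 1000
words = [f"w{i:04d}" for i in range(n)]  # hypothetical alphabetized list
seq_avg = sum(seq_search_comparisons(words, w) for w in words) / n
bin_avg = sum(bin_search_comparisons(words, w) for w in words) / n
print(seq_avg, bin_avg)
```

The sequential average equals (n + 1)/2 exactly, as in (2.16), while the binary search average is close to 9.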
Question: What is the average number of times the random number generator
Rand is called?
Answer: Let us first derive the average number of Rand calls needed to pick
the (k + 1)st number. The probability of each outcome in Rand is 1/n. The number
of points in the event ‘a new number is produced’ is n − k (because k numbers have
already been picked). Thus the chance of ‘success’ (in generating a new number) is
(n − k)/n. Now we see that we are dealing here with Bernoulli trials with
p = (n − k)/n, and the question is: what is the average number of trials needed to
achieve success? The number of trials until success is itself a random variable, call
it X. What is the probability that X = m? Evidently it is $q^{m-1} p$, the probability
of m − 1 failures followed by one success. The desired average number of trials is
then given by an infinite sum
\[ \text{Average number of trials to pick the } (k + 1)\text{st number} = \sum_{m=1}^{\infty} m q^{m-1} p, \tag{2.17} \]
where p = (n − k)/n. Using some algebra (the formula for the sum of a geometric
sequence) it can be found that for a finite sum we have
\[ \sum_{m=1}^{M} m q^{m-1} p = \frac{1}{p} - \left(M + \frac{1}{p}\right) q^{M}. \tag{2.18} \]
For very large M the second piece is negligible because q^M becomes very small,
overpowering M. Dropping the second term in (2.18) we arrive at the following
result
\[ \text{Average number of trials to pick the } (k + 1)\text{st number} = \frac{1}{p} = \frac{n}{n - k}. \tag{2.19} \]
As a final step we have to sum the average numbers of Rand calls for each number
k + 1 in the permutation. We obtain
\[ \text{Average number of Rand calls} = \sum_{k=0}^{n-1} \frac{n}{n - k} = n \sum_{j=1}^{n} \frac{1}{j}. \tag{2.20} \]
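Formulas (2.18) and (2.20) lend themselves to a quick exact check. The sketch below compares both sides of (2.18) for a sample value of p and evaluates (2.20) for n = 10; the function names and the choice p = 3/10 are our own.

```python
from fractions import Fraction

def partial_sum(p, M):
    """Left-hand side of (2.18): sum of m q^(m-1) p for m = 1..M."""
    q = 1 - p
    return sum(m * q ** (m - 1) * p for m in range(1, M + 1))

def closed_form(p, M):
    """Right-hand side of (2.18)."""
    q = 1 - p
    return 1 / p - (M + 1 / p) * q ** M

def expected_rand_calls(n):
    """Formula (2.20): sum over k = 0..n-1 of n / (n - k)."""
    return sum(Fraction(n, n - k) for k in range(n))

p = Fraction(3, 10)
print(all(partial_sum(p, M) == closed_form(p, M) for M in range(1, 30)))
print(expected_rand_calls(10), float(expected_rand_calls(10)))
```

For n = 10 the average is a little over 29 calls, roughly three times the 10 calls that would suffice if Rand never repeated a number.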
Graph Theory
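[Fig. 1 and Fig. 2: diagrams of two graphs. Fig. 1 has vertices a, b, c, d; Fig. 2 has vertices a, b, c and edges labelled e1, e2, e3, e4, e5.]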
We can present graphs by showing only their diagrams. For example, we can read
off Fig. 2 that the diagram therein represents the graph with vertex set V = {a, b, c}
and edge list
E : {a, b}, {a, c}, {a, a}, {c, b}, {c, b} .
To distinguish the multiple edges they can be equipped with additional labels. Thus
for the graph depicted in Fig. 2 we can label the edges {e1 , e2 , e3 , e4 , e5 } as shown
in the picture. With each label we associate the pair of vertices the corresponding
edge connects. For example e2 is associated with {b, c} and e5 with {a, a}. These
vertices are called the end points of the corresponding edge.
Definition 3.1. A graph is called simple if it has no loops and no multiple edges.
When edges and vertices are removed from a graph, without removing endpoints
of any remaining edges, a smaller graph is obtained. Such a graph is called a subgraph
of the original graph. The vertices of the subgraph form a subset of the vertices of
the original graph and the edges of the subgraph appear in the edge list of the
original graph.
For example a graph with vertices V = {a, b} and edges E : {a, a}, {a, b}, de-
picted below, is a subgraph of the graph from Fig. 2.
[Figure: diagram of this subgraph, with edges labelled e3 and e5.]
3.1.1 Exercises
Exercise 3.1.1 (*). Draw the diagram corresponding to the graph given by V =
{a, b, c, d} and
E : {a, b}, {c, d}, {a, d} .
Exercise 3.1.2 (*). Draw the diagram corresponding to the graph given by V =
{a, b, c, d, e, f} and
E : {a, f }, {a, e}, {b, c}, {b, f }, {c, f }, {d, e}, {e, f } .
[Fig. 1 and Fig. 2: diagrams of graphs on the vertices a, b, c, d and a, b, c, d, e, f respectively.]
Example 3.4 (*). Find the adjacency matrix for the graph from Fig. 2.
Solution. Ordering the vertices alphabetically we obtain
\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix}. \]
Note that the adjacency matrix of a graph is symmetric, that is, aij = aji (the
matrix is symmetric upon reflecting across the diagonal).
For a simple graph all entries aij are either 0 or 1. Also, since for a simple graph
an edge does not connect a vertex to itself, we have that in this case the diagonal
elements vanish: aii = 0 .
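Building an adjacency matrix from a vertex set and an edge list is a mechanical process, which the following sketch makes explicit. The function adjacency_matrix is our own, and we assume that a loop {v, v} contributes 1 to the diagonal entry, since this is the convention that reproduces the matrix of Example 3.4.

```python
def adjacency_matrix(vertices, edges):
    """Adjacency matrix a[i][j] = number of edges joining vertices i and j.

    Assumption: a loop {v, v} contributes 1 to the diagonal entry, which is
    the convention that reproduces the matrix in Example 3.4.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        a[i][j] += 1
        if i != j:           # an edge between distinct vertices is recorded twice
            a[j][i] += 1
    return a

# Graph from Fig. 2: V = {a, b, c}, edge list as given in the text.
fig2 = adjacency_matrix("abc", [("a", "b"), ("a", "c"), ("a", "a"), ("c", "b"), ("c", "b")])
print(fig2)  # [[1, 1, 1], [1, 0, 2], [1, 2, 0]]
```

The symmetry a_ij = a_ji is visible in the code: every edge between distinct vertices updates both entries.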
Example 3.6. For the graph in Fig. 1, deg(a) = 2, deg(b) = 2, deg(c) = 1 and
deg(d) = 3.
By examining the structure of a graph, we notice that each edge has
two ends. Hence, each edge must contribute 2 to the sum of the degrees. This
means that the sum of the degrees of the vertices is twice the number of edges. This
observation is formalised by the following theorem.
Theorem 3.7 (Handshaking Theorem). Let G = (V, E) be a graph with vertex set
V = {v1, v2, . . . , vn}. Then
\[ \sum_{i=1}^{n} \deg(v_i) = 2|E|. \]
The name “Handshaking Theorem” comes from the fact that a graph can be
used to represent a group of people shaking hands.
3.2.1 Exercises
Exercise 3.2.1 (*). Find the adjacency matrix of the graph in Fig. 1.
Solution.
\[ A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}. \]
Exercise 3.2.2. Find the adjacency matrix of the graph depicted below:
[Figure: diagram of a graph on the vertices a, b, c, d.]
Exercise 3.2.3. Verify the Handshaking Theorem for the graph given in the pre-
vious exercise.
Exercise 3.2.4 (*). What can you say about the sum of numbers in any row or
column of an adjacency matrix?
Solution. Corresponding to each n in the ith row (or column) of an adjacency matrix,
there are n edges that have the vertex vi as an endpoint. Hence the sum of numbers
in the ith row (or column) of an adjacency matrix is equal to deg(vi ).
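For the simple graph of Fig. 1 the statement about row sums, and the Handshaking Theorem itself, can be checked directly from the adjacency matrix found in Exercise 3.2.1. This is a small sketch; the variable names are our own.

```python
# Adjacency matrix of the simple graph in Fig. 1 (vertices ordered a, b, c, d).
A = [
    [0, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

# Row sums give the degrees deg(a), deg(b), deg(c), deg(d).
degrees = [sum(row) for row in A]

# Each edge of a simple graph appears twice in A (as a_ij and a_ji),
# so we count only the entries above the diagonal.
num_edges = sum(A[i][j] for i in range(4) for j in range(i + 1, 4))

print(degrees, num_edges)  # [2, 2, 1, 3] 4
```

The degrees agree with Example 3.6, and their sum 8 is twice the number of edges.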
Exercise 3.2.5. A graph has four vertices of degree 3 and three vertices of degree
2. How many edges does it have?
Exercise 3.2.6 (*). A graph G = (V, E) has 23 edges and every vertex of G has
degree four or larger. What is the largest possible number of vertices that G can
have?
Exercise 3.2.8 (*). Prove that in any graph the number of vertices of odd degree
is even.
Exercise 3.2.9 (*). Let G be a simple graph with at least two vertices. Prove that
G has two vertices of the same degree.
Solution. Let G be a simple graph with n vertices. The maximum degree of any vertex is
n − 1 so the possible vertex degrees range from 0 to n − 1.
It is not possible for both 0 and n − 1 to be degrees, since a vertex of degree
n − 1 would be adjacent to all the other vertices including the one of degree 0. It
follows that there are at most n − 1 degrees to be attached to n vertices, so at least
two vertices have the same degree.