Greedy Algorithms

(These notes are based on lecture notes by Stephen Cook and Michael Soltys in CSC 364, University of Toronto, Spring 2003.)
For large values of $d$, brute force search is not feasible because there are $2^d$ subsets of $\{1, \dots, d\}$.
We can estimate $M$ using the greedy method: we first sort the weights in decreasing (or rather, nonincreasing) order,
$$w_1 \ge w_2 \ge \dots \ge w_d,$$
and then try the weights one at a time, adding each if there is room. Call the resulting estimate $\hat{M}$.
It is easy to find examples for which this greedy algorithm does not give the optimal solution; for example, weights $\{501, 500, 500\}$ with $C = 1000$. Then $\hat{M} = 501$ but $M = 1000$. However, this is just about the worst case:
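To make the method concrete, here is a minimal runnable Python sketch of this greedy estimate (the function name greedy_knapsack is my own, not from the notes):

def greedy_knapsack(weights, C):
    """Greedy estimate for the simple knapsack problem: try the weights
    in nonincreasing order, adding each one that still fits under C."""
    total = 0
    for w in sorted(weights, reverse=True):
        if total + w <= C:   # add the weight only if there is room
            total += w
    return total

# The example above: greedy is stuck at 501, while M = 500 + 500 = 1000.
print(greedy_knapsack([501, 500, 500], 1000))   # prints 501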
Lemma 1 $\hat{M} \ge \frac{1}{2} M$.
Proof:
We first show that if $\hat{M} \ne M$, then $\hat{M} > \frac{1}{2} C$; this is left as an exercise. Since $C \ge M$, the Lemma follows: $\hat{M} > \frac{1}{2} C \ge \frac{1}{2} M$.
Unfortunately the greedy algorithm does not necessarily yield an optimal
solution. This brings up the question: is there any polynomial-time algorithm that is guaranteed to find an optimal solution to the simple knapsack
problem?
However, a positive answer to this question would show that NP = P, since we can show that the Knapsack problem is NP-hard. Let us consider a decision version of Knapsack, in which there is one more parameter $B$; now, rather than trying to find an $S$ with the maximal possible sum, we are just asking whether there exists an $S \subseteq \{1, \dots, n\}$ with $K(S) \ge B$.
Lemma 2 The decision version of Simple Knapsack is NP-complete.
Proof:
The valid instances are of the form $\{w_1, \dots, w_n, C, B\}$, and an instance is in the language if there exists an $S \subseteq \{1, \dots, n\}$ such that $B \le \sum_{i \in S} w_i \le C$.
To see that this problem is in NP, note that given a certificate $S$, it takes polynomial time to compute the sum of the elements in $S$ and compare it to the two numbers $B$ and $C$.
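As a sketch of such a verifier in Python (the function name verify_certificate and the 0-based index set are assumptions of this sketch):

def verify_certificate(weights, C, B, S):
    """Polynomial-time verifier for decision Simple Knapsack:
    accept iff the certificate S satisfies B <= sum over i in S of w_i <= C."""
    total = sum(weights[i] for i in S)
    return B <= total <= C

print(verify_certificate([501, 500, 500], 1000, 1000, {1, 2}))   # prints True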
To see that it is NP-hard, we will use a reduction from the SubsetSum problem; that is, we will show SubsetSum $\le_p$ DecisionSimpleKnapsack. Recall that instances of SubsetSum are of the form $\{a_1, \dots, a_n, t\}$, and an instance is in the language iff there is an $S \subseteq \{1, \dots, n\}$ such that $\sum_{i \in S} a_i = t$. Now, let
$$f(\{a_1, \dots, a_n, t\}) = \{a_1, \dots, a_n, t, t\}.$$
That is, set $w_i = a_i$ for $i \in \{1, \dots, n\}$ and set both $B$ and $C$ equal to $t$. Now, if there is a set $S'$ such that the sum of its elements is both $\ge t$ and $\le t$, then that sum must equal $t$. Similarly for the other direction, a set $S$ with the sum of its elements equal to $t$ satisfies both the $B$ and $C$ conditions for $B = C = t$.
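The reduction $f$ itself is easy to compute; here is a Python sketch (the function name subsetsum_to_knapsack is my own):

def subsetsum_to_knapsack(a, t):
    """The reduction f: map a SubsetSum instance {a_1, ..., a_n, t} to
    the decision Simple Knapsack instance with w_i = a_i and B = C = t."""
    weights = list(a)
    return weights, t, t   # (w_1, ..., w_n), C = t, B = t

# SubsetSum instance {3, 5, 8} with t = 11 becomes weights (3, 5, 8) and
# C = B = 11; the set {5, 8} witnesses both sides: 11 <= 5 + 8 <= 11.
print(subsetsum_to_knapsack([3, 5, 8], 11))   # prints ([3, 5, 8], 11, 11)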
So unless something drastic (that is, P = NP) happens, we have no hope of getting a polynomial-time algorithm for Knapsack. However, we can change the problem in such a way that the new problem is indeed solvable in polynomial time. Let us consider a variation called FractionalKnapsack, in which we are allowed to take arbitrary fractions of the weights.
Kruskal's Algorithm:
Sort the edges so that $c(e_1) \le c(e_2) \le \dots \le c(e_m)$.

T ← ∅
for i : 1..m
    (*) if T ∪ {e_i} has no cycle then
        T ← T ∪ {e_i}
    end if
end for
But how do we test for a cycle (i.e., execute (*))? After each execution of the loop, the set $T$ of edges divides the vertices $V$ into a collection $V_1, \dots, V_k$ of connected components. Thus $V$ is the disjoint union of $V_1, \dots, V_k$, each $V_i$ forms a connected graph using edges from $T$, and no edge in $T$ connects $V_i$ and $V_j$ if $i \ne j$.
A simple way to keep track of the connected components of $T$ is to use an array $D[1..n]$ where $D[i] = D[j]$ iff vertex $i$ is in the same component as vertex $j$. So our initialization becomes:

T ← ∅
for i : 1..n
    D[i] ← i
end for
To check whether $e_i = [r, s]$ forms a cycle with $T$, check whether $D[r] = D[s]$. If not, and we therefore want to add $e_i$ to $T$, we merge the components containing $r$ and $s$ as follows:

k ← D[r]
l ← D[s]
for j : 1..n
    if D[j] = l then
        D[j] ← k
    end if
end for
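Putting the pieces together, here is a small runnable Python sketch of this version of Kruskal's algorithm, using the array $D$ exactly as above (the function name kruskal and the (cost, r, s) edge representation are assumptions of this sketch, not notation from the notes):

def kruskal(n, edges):
    """Kruskal's algorithm with the simple array-based component test.
    Vertices are 1..n; edges is a list of (cost, r, s) triples."""
    edges = sorted(edges)            # sort so that c(e_1) <= ... <= c(e_m)
    D = list(range(n + 1))           # D[i] = i initially; index 0 unused
    T = []
    for cost, r, s in edges:
        if D[r] != D[s]:             # e = [r, s] does not form a cycle with T
            T.append((r, s))
            k, l = D[r], D[s]        # merge the components containing r and s
            for j in range(1, n + 1):
                if D[j] == l:
                    D[j] = k
    return T

# A small example: a square with one diagonal.
edges = [(1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 4, 1), (5, 1, 3)]
print(kruskal(4, edges))   # prints [(1, 2), (2, 3), (3, 4)]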
With this implementation, the algorithm runs in time $O(n^2 \log n)$. Since it is reasonable to view the size of the input as $n$, this is a polynomial-time algorithm.
This running time can be improved to $O(m \log m)$ (equivalently, $O(m \log n)$) by using a more sophisticated data structure to keep track of the connected components of $T$; this is discussed on page 570 of CLRS (page 505 of CLR).
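For a rough idea of such a structure, here is a minimal Python sketch of a disjoint-set forest with union by rank and path compression, one standard choice (the class name DisjointSets is mine; see CLRS for the full treatment):

class DisjointSets:
    """Disjoint-set forest with union by rank and path compression."""
    def __init__(self, n):
        self.parent = list(range(n + 1))   # vertices 1..n; index 0 unused
        self.rank = [0] * (n + 1)

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])   # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                   # x and y are already connected
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx               # attach the shallower tree
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

In the loop of Kruskal's algorithm, the test $D[r] = D[s]$ then becomes a comparison of find(r) with find(s), and the merge becomes union(r, s).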
Correctness of Kruskal's Algorithm
It is not immediately clear that Kruskal's algorithm yields a spanning tree
at all, let alone a minimum cost spanning tree. We will now prove that it
does in fact produce an optimal spanning tree. To show this, we reason that
after each execution of the loop, the set T of edges can be expanded to an
optimal spanning tree using edges that have not yet been considered. Hence
after termination, since all edges have been considered, T must itself be a
minimum cost spanning tree.
We can formalize this reasoning as follows:
Definition 1 A set $T$ of edges of $G$ is promising after stage $i$ if $T$ can be expanded to an optimal spanning tree for $G$ using edges from $\{e_{i+1}, e_{i+2}, \dots, e_m\}$. That is, $T$ is promising after stage $i$ if there is an optimal spanning tree $T_{opt}$ such that $T \subseteq T_{opt} \subseteq T \cup \{e_{i+1}, e_{i+2}, \dots, e_m\}$.
Lemma 4 For $0 \le i \le m$, let $T_i$ be the value of $T$ after $i$ stages, that is, after examining edges $e_1, \dots, e_i$. Then the following predicate $P(i)$ holds for every $i$, $0 \le i \le m$:

$P(i)$: $T_i$ is promising after stage $i$.
Proof:
We will prove this by induction. $P(0)$ holds because $T$ is initially empty. Since the graph is connected, there exists some optimal spanning tree $T_{opt}$, and
$$T_0 \subseteq T_{opt} \subseteq T_0 \cup \{e_1, e_2, \dots, e_m\}.$$
For the induction step, let $0 \le i < m$, and assume $P(i)$. We want to show $P(i+1)$. Since $T_i$ is promising after stage $i$, let $T_{opt}$ be an optimal spanning tree such that $T_i \subseteq T_{opt} \subseteq T_i \cup \{e_{i+1}, \dots, e_m\}$.
Suppose that the partial solution after stage $i$ is $S_i$ and that the partial solution after stage $i+1$ is $S_{i+1}$, and we know that there is an optimal solution $S_{opt}$ that extends $S_i$; we want to prove that there is an optimal solution $S'_{opt}$ that extends $S_{i+1}$. $S_{i+1}$ extends $S_i$ by making only one decision; if $S_{opt}$ makes the same decision, then it also extends $S_{i+1}$, and we can just let $S'_{opt} = S_{opt}$ and we are done. The hard part of the induction step is if $S_{opt}$ does not extend $S_{i+1}$. In this case, we have to show either that $S_{opt}$ could not have been optimal (implying that this case cannot happen), or we show how to change some parts of $S_{opt}$ to create a solution $S'_{opt}$ such that

- $S'_{opt}$ extends $S_{i+1}$, and
- $S'_{opt}$ has value (cost, profit, or whatever it is we are measuring) at least as good as $S_{opt}$, so the fact that $S_{opt}$ is optimal implies that $S'_{opt}$ is optimal.
For most greedy algorithms, the algorithm ends having constructed a solution that cannot be extended to any solution other than itself. Therefore, if we have proven the above, we know that the solution constructed must be optimal.
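To make the shape of this argument concrete, here is a hypothetical Python skeleton of a generic greedy algorithm; the names generic_greedy, candidates, feasible, extend, and empty are mine, chosen only to mirror the stages $S_0, S_1, \dots$ in the discussion above:

def generic_greedy(candidates, feasible, extend, empty):
    """Generic greedy skeleton: at each stage, the partial solution
    is extended by exactly one decision, as in the induction above."""
    S = empty                     # S_0: the empty partial solution
    for c in candidates:          # candidates in greedy (sorted) order
        S_next = extend(S, c)     # the one decision made at this stage
        if feasible(S_next):      # commit to the decision only if allowed
            S = S_next            # S_{i+1} extends S_i by one decision
    return S

# The greedy simple-knapsack method is one instance of this skeleton:
weights, C = [501, 500, 500], 1000
print(generic_greedy(sorted(weights, reverse=True),
                     lambda S: sum(S) <= C,
                     lambda S, w: S + [w],
                     []))         # prints [501]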