
Design and Analysis of Algorithms

Greedy Method
Optimization Problems
• An Optimization Problem is a problem that involves searching through a set of configurations to find one that minimizes or maximizes an objective function defined on these configurations
• Most of these problems have n inputs and require us to obtain a subset that satisfies some
constraints
• Any subset that satisfies those constraints is called a feasible solution
• We need to find a feasible solution that either maximizes or minimizes a given objective function
• A feasible solution that does this is called an optimal solution

• There are Various Methods for solving optimization problems:


• Greedy Methods
• Dynamic programming Methods
• Branch and Bound Techniques
• Backtracking, etc.
Optimization Problems
• For each instance there are (possibly) multiple valid solutions

• Goal is to find an optimal solution

• Minimization problem:
associate cost to every solution, find min-cost solution
• Maximization problem:
associate profit to every solution, find max-profit solution

• Maximization and minimization are interchangeable, though: negating the objective function turns one into the other


Techniques For Optimization
• Solving optimization problems typically involves making choices
• Brute-Force: Systematically checking all possible candidates for whether or not each candidate
satisfies the problem's statement
§ Intuitive, direct, and straightforward technique
• Backtracking: Just try all solutions systematically
§ Incrementally builds candidates to the solutions, and abandons a candidate ("backtracks") as soon as
it determines that the candidate cannot be completed to a valid solution
§ Try all options for first choice: For each option, recursively make other choices
• Branch and Bound: Solving optimization problems by breaking them down into smaller sub-
problems and using a bounding function to eliminate sub-problems that cannot contain the
optimal solution
§ Consists of a systematic enumeration of candidate solutions by means of state space search
§ Set of candidate solutions is thought of as forming a rooted tree with the full set at the root
§ The algorithm explores branches of this tree, which represent subsets of the solution set
• All these techniques can be applied to almost all optimization problems, but all lead to slow algorithms in general
Techniques For Optimization ...
• Greedy algorithms: Construct solution iteratively, always make choice that seems best locally
§ Can be applied to few problems, but gives fast algorithms
§ Builds up a solution piece by piece, always choosing the next piece that offers the most
obvious and immediate benefit
§ Only try option that seems best for first choice (greedy choice),
• Recursively make other choices
§ Sometimes, a greedy choice can yield locally-optimal solutions that approximate a
globally-optimal solution in a reasonable amount of time -- So, popular choice for
(approximately) solving NP-hard/NP-Complete problems
• Dynamic programming: Mainly an optimization over plain recursion
§ In between: Not as fast as greedy, but works for a larger class of problems
§ Wherever we see a recursive solution that has repeated calls for the same inputs, we can optimize it by simply storing the results of subproblems, so that we do not have to recompute them when needed later (see the sketch below)
§ Often this simple optimization trick reduces time complexities from exponential to polynomial
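
A minimal sketch of this memoization idea (our own Python illustration; the Fibonacci example is not from the slides):

from functools import lru_cache

# Plain recursion recomputes the same subproblems exponentially often.
# Caching each result once makes every subproblem O(1) after its first
# computation, so the total work drops from exponential to O(n).
@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed instantly thanks to memoization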
The Greedy Methods
• Roughly speaking, greedy methods are the methods for solving problems with the simplest possible algorithms!

• Pseudo-definition: An algorithm is Greedy if it builds its solution by adding elements one at a time using a simple rule

• Perhaps the most straightforward algorithm design technique

• The hard part: Showing that something simple actually works!!

• Moral: We are being greedy, optimizing locally in the hope that it will lead to a globally optimal solution. As expected, it may not always work. But in many situations, it really works!!!
The Greedy Methods ...
• The greedy method solves a given Optimization Problem by going through a sequence of feasible choices
• The sequence starts from a well-understood starting configuration, and then iteratively makes the decision that seems best from all those that are currently possible
• It makes the choice that looks best at the moment and adds it to the current
subsolution
• If the inclusion of the next input into the partially-constructed optimal subsolution
will result in an infeasible solution, then this input is not added to the partial
subsolution; otherwise, it is added
• It makes a locally-optimal choice at each step in the hope that these choices
will lead to a globally-optimal solution of the problem
What Makes a Greedy Algorithm?
• Feasible
• Has to satisfy the problem's constraints
• Locally Optimal
• The greedy part
• Has to make the best local choice among all feasible choices available at that step
• If this local choice results in a global optimum then the problem has optimal substructure
• Irrevocable
• Once a choice is made, it cannot be undone in subsequent steps of the algorithm

• Simple examples:
• Playing chess by making the best move without lookahead
• Giving change using the fewest coins (sketched below)
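
A minimal sketch of the greedy coin-change rule (our own Python illustration; with canonical denominations such as 1, 5, 10, 25 the greedy choice is optimal, but for arbitrary denominations it can fail):

def greedy_change(amount, denominations):
    # Greedy choice: repeatedly take the largest coin that still fits.
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            amount -= d
            coins.append(d)
    return coins  # a nonzero remainder would mean no representation was found

print(greedy_change(63, [1, 5, 10, 25]))  # [25, 25, 10, 1, 1, 1] -- optimal
print(greedy_change(6, [1, 3, 4]))        # [4, 1, 1] -- but [3, 3] is better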
Designing A Greedy Algorithm

• Cast the optimization problem as one in which we make a choice and are left with one subproblem to solve

• Prove that there is always an optimal solution to the original problem that makes the greedy choice, so that the greedy choice is always safe

• Demonstrate that, having made the greedy choice, what remains is a subproblem with the property that, if we combine an optimal solution to the subproblem with the greedy choice we have made, we arrive at an optimal solution to the original problem
Designing A Greedy Algorithm
• No general way to tell if a greedy algorithm is optimal;
• However, any Greedy algorithm has two key ingredients:

• Greedy-choice property: A globally optimal solution can be achieved by making a locally optimal (greedy) choice

• Optimal substructure: An optimal solution to the problem contains within it optimal solutions to subproblems
When To Apply Greedy Methods?
• Apply Greedy Methods when

• Problems exhibit optimal substructure

• Problems also exhibit the greedy-choice property

• When we have a choice to make, make the one that looks best right now

• Make a locally optimal choice in hope of getting a globally optimal solution


Apply Greedy Algorithms for Solving
The Knapsack Problem
The Knapsack Problem
• The classic Knapsack Problem (optimization version): choose (fractions of) items with given profits and weights so that the total weight is at most the capacity M and the total profit is maximized

• Decision version of the Problem: given additionally a target profit k, decide whether some selection has total weight at most M and total profit at least k
The Knapsack Problem
• A thief robbing a store finds n items

• The i-th item is worth (or gives a profit of) pi dollars and weighs wi pounds

• Thief’s knapsack can carry at most M pounds

• What items to select to maximize profit?


The Knapsack Problem: Variants

• The fractional knapsack problem:
• Thief can take fractions of items: 0 ≤ xi ≤ 1

• The binary knapsack problem (0-1 knapsack problem):
• Each item is either taken or left entirely: xi = 1 or 0
• pi, wi, and M are integers

Let xi be the fraction of item i which will be put into the knapsack
Fractional Knapsack Problem
The Problem: Given a knapsack with a certain capacity M, and n items to be put into the knapsack, where item i has weight wi (weights w1, w2, …, wn) and profit pi (profits p1, p2, …, pn).

The Goal: Find (x1, x2, …, xn) with 0 ≤ xi ≤ 1 such that

Σ_{i=1..n} pi·xi is maximized, subject to Σ_{i=1..n} wi·xi ≤ M
Example: Fractional Knapsack Problem
n3 (w1 , w2 , w3 )  (18,15,10)
M  20 ( p1 , p2 , p3 )  (25,24,15)

• Greedy Strategy #1: Items are ordered in nonincreasing order of


profits (1,2,3)
2
( x1 , x 2 , x3 )  (1, ,0 )
15
3
2
 p i xi  25 * 1 + 24 *
15
+ 15 * 0  28 .2
i 1
Example: Fractional Knapsack Problem
n3 (w1 , w2 , w3 )  (18,15,10)
M  20 ( p1 , p2 , p3 )  (25,24,15)

• Greedy Strategy #2: Items are ordered in nondecreasing order of


weights (3,2,1)

2
( x1 , x 2 , x 3 )  ( 0 , ,1)
3
3
2
px i i
 25 * 0 + 24 *
3
+ 15 * 1  31
i 1
Example: Fractional Knapsack Problem
n3 (w1 , w2 , w3 )  (18,15,10)
M  20 ( p1 , p2 , p3 )  (25,24,15)

• Greedy Strategy #3: Items are ordered in nonincreasing order of


profit/weight
p1 25
 » 1.4
w1 18
1
Optimal Solution?
p2 24 
( x1 , x2 , x3 ) ( 0, 1, )
  1.6 Þ ( 2,3,1) 2
w2 15 3
1
p3 15  pi xi  25 * 0 + 24 * 1 + 15 * 2
 31.5
  1.5 i 1
w3 10
Fractional Knapsack Problem: Greedy Algorithm
1. Calculate vi = pi / wi for i = 1, 2, …, n
2. Sort items by nonincreasing vi
(all wi, pi are also reordered correspondingly)
3. Let M' be the current weight limit (Initially, M' = M and
xi=0 ). In each iteration, choose item i from the head
of the unselected list
If M' >= wi , set xi = 1, and M' = M'-wi
If M' < wi , set xi = M'/wi and the algorithm is finished

n4 ( w1 , w2 , w3 , w4 )  (5,15,10,12)
M  25 ( p1 , p2 , p3 , p4 )  (25,21,15,6)
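
A runnable sketch of the three steps above (our own Python rendering of the slides' procedure):

def fractional_knapsack(weights, profits, capacity):
    # Steps 1 and 2: order item indices by profit/weight ratio, highest first.
    n = len(weights)
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    x = [0.0] * n          # fraction of each item taken
    remaining = capacity   # M' in the slides
    total_profit = 0.0
    for i in order:        # Step 3: fill greedily from the head of the list
        if remaining >= weights[i]:
            x[i] = 1.0
            remaining -= weights[i]
            total_profit += profits[i]
        else:
            x[i] = remaining / weights[i]   # take a fraction; knapsack is full
            total_profit += profits[i] * x[i]
            break
    return x, total_profit

# The worked example above: expect x = (0, 1, 1/2) and profit 31.5
print(fractional_knapsack([18, 15, 10], [25, 24, 15], 20))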
Time Complexity
1. Calculate vi = pi/wi for i = 1, 2, …, n: O(n)
2. Sort items by nonincreasing vi (all wi, pi are reordered correspondingly): O(n log n)
3. Let M' be the current weight limit (initially, M' = M and xi = 0). In each iteration, choose item i from the head of the unselected list: O(n) iterations, O(1) each
   If M' ≥ wi, set xi = 1 and M' = M' − wi
   If M' < wi, set xi = M'/wi and the algorithm is finished

Total: O(n log n)
Correctness?

Proof: Correctness of Strategy #3
• Proved by the method of contradiction
• Let X be the solution found by greedy strategy #3
• Assume that X is not optimal
• Then there is an optimal solution Y whose profit is greater than the profit of X
• Consider an item j in X but not in Y
• Remove from Y some items with total weight wj (possibly fractional items) and add item j to Y
• The capacity used remains the same
• The total value is not decreased, since item j's profit/weight ratio is at least that of the removed items
• One more item of X is now in Y
• Repeat the process until Y contains exactly the items selected in X
• The total value is never decreased, and the capacity remains the same
• So Y's profit is at most X's profit: X is optimal, too. Contradiction!
0-1 Knapsack Problem (xi can be 0 or 1)
Knapsack capacity: 50

i    1    2    3
pi   60   100  120
wi   10   20   30

Candidate selections: {1, 2} → 60 + 100 = 160, {1, 3} → 60 + 120 = 180, {2, 3} → 100 + 120 = 220

Can 0-1 Knapsack be solved by a greedy algorithm? No: greedy by profit/weight ratio picks items 1 and 2 (profit 160), while the optimal selection is {2, 3} (profit 220).
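
A small sketch (our own code) showing the greedy ratio rule failing on exactly this instance:

def greedy_01_knapsack(weights, profits, capacity):
    # 0-1 greedy by profit/weight ratio -- NOT optimal in general.
    order = sorted(range(len(weights)),
                   key=lambda i: profits[i] / weights[i], reverse=True)
    total, remaining = 0, capacity
    for i in order:
        if weights[i] <= remaining:   # items can only be taken whole
            total += profits[i]
            remaining -= weights[i]
    return total

# Greedy picks items 1 and 2 for profit 160, but {2, 3} gives 220.
print(greedy_01_knapsack([10, 20, 30], [60, 100, 120], 50))  # -> 160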
Apply Greedy Algorithms for Solving
Job Sequencing with Deadlines
JOB SEQUENCING WITH DEADLINES
The problem is stated as below.
• There are n jobs to be processed on a machine.
• Each job i has a deadline di ≥ 0 and profit pi ≥ 0.
• The profit pi is earned iff the job is completed by its deadline.
• The job is completed if it is processed on a machine for unit time.
• Only one machine is available for processing jobs.
• Only one job is processed at a time on the machine.
JOB SEQUENCING WITH DEADLINES
(Contd..)
• A feasible solution is a subset of jobs J such that
each job is completed by its deadline.
• An optimal solution is a feasible solution with
maximum profit value.
Example : Let n = 4,
(p1,p2,p3,p4) = (100,10,15,27),
(d1,d2,d3,d4) = (2,1,2,1)

JOB SEQUENCING WITH DEADLINES
(Contd..)
Sr. No.   Feasible Solution   Processing Sequence   Profit value
(i)       (1, 2)              (2, 1)                110
(ii)      (1, 3)              (1, 3) or (3, 1)      115
(iii)     (1, 4)              (4, 1)                127 ← the optimal one
(iv)      (2, 3)              (2, 3)                25
(v)       (3, 4)              (4, 3)                42
(vi)      (1)                 (1)                   100
(vii)     (2)                 (2)                   10
(viii)    (3)                 (3)                   15
(ix)      (4)                 (4)                   27
GREEDY ALGORITHM TO OBTAIN AN
OPTIMAL SOLUTION
• Consider the jobs in nonincreasing order of profits, subject to the constraint that the resulting job sequence J is a feasible solution.
• In the example considered before, the nonincreasing profit vector is
(p1 p4 p3 p2) = (100 27 15 10), with corresponding deadlines (d1 d4 d3 d2) = (2 1 2 1)
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

J = {1} is a feasible one
J = {1, 4} is a feasible one with processing sequence (4, 1)
J = {1, 3, 4} is not feasible
J = {1, 2, 4} is not feasible
J = {1, 4} is optimal
Job sequencing with deadlines
Problem: n jobs, S = {1, 2, …, n}; each job i has a deadline di ≥ 0 and a profit pi ≥ 0. We need one unit of time to process each job and we can do at most one job at a time. We earn the profit pi if job i is completed by its deadline.

i    1   2   3   4   5
pi   20  15  10  5   1
di   2   2   1   3   3
Algorithm:
Step 1: Sort jobs into nonincreasing order of pi. After sorting, p1 ≥ p2 ≥ p3 ≥ … ≥ pn.
Step 2: Add the next job i to the solution set if i can be completed by its deadline. Assign i to time slot [r−1, r], where r is the largest integer such that 1 ≤ r ≤ di and [r−1, r] is free.
Step 3: Stop if all jobs are examined. Otherwise, go to Step 2.

Time complexity: O(n²)
e.g.
i   pi   di
1   20   2    assign to [1, 2]
2   15   2    assign to [0, 1]
3   10   1    reject
4   5    3    assign to [2, 3]
5   1    3    reject

Solution = {1, 2, 4}
Total Profit = 20 + 15 + 5 = 40
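
A runnable sketch of this slot-assignment algorithm (our own Python, 0-indexed, so the slides' solution {1, 2, 4} appears as {0, 1, 3}):

def job_sequencing(profits, deadlines):
    # Step 1: consider jobs in nonincreasing profit order.
    order = sorted(range(len(profits)), key=lambda i: profits[i], reverse=True)
    max_d = max(deadlines)
    slot = [None] * max_d             # slot[r-1] holds the job done in [r-1, r]
    for i in order:
        # Step 2: assign job i to the latest free slot [r-1, r] with r <= di.
        for r in range(min(deadlines[i], max_d), 0, -1):
            if slot[r - 1] is None:
                slot[r - 1] = i
                break                 # scheduled; if no slot is free, reject
    scheduled = [j for j in slot if j is not None]
    return scheduled, sum(profits[j] for j in scheduled)

# The example above: expect total profit 40 from jobs {0, 1, 3}
print(job_sequencing([20, 15, 10, 5, 1], [2, 2, 1, 3, 3]))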
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

Theorem: Let J be a set of k jobs and σ = (i1, i2, …, ik) be a permutation of the jobs in J such that di1 ≤ di2 ≤ … ≤ dik.
• J is a feasible solution iff the jobs in J can be processed in the order σ without violating any deadline.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)
Proof:
• By definition of a feasible solution, if the jobs in J can be processed in the order σ without violating any deadline, then J is a feasible solution.
• So we only have to prove that if J is feasible, then σ represents a possible order in which the jobs may be processed.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• Suppose J is a feasible solution. Then there exists a schedule σ' = (r1, r2, …, rk) such that
drj ≥ j for 1 ≤ j ≤ k,
i.e. dr1 ≥ 1, dr2 ≥ 2, …, drk ≥ k,
each job requiring one unit of time.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• σ = (i1, i2, …, ik) and σ' = (r1, r2, …, rk)

• Assume σ' ≠ σ. Then let a be the least index at which σ' and σ differ, i.e. a is such that ra ≠ ia.
• Let rb = ia; then b > a (because rj = ij for all indices j less than a).
• In σ', interchange ra and rb.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

σ = (i1, i2, …, ia, …, ik)   [rb = ia occurs before ra in (i1, i2, …, ik)]
σ' = (r1, r2, …, ra, …, rb, …, rk)
i1 = r1, i2 = r2, …, ia−1 = ra−1, ia ≠ ra, but ia = rb
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• We know di1 ≤ di2 ≤ … ≤ dia ≤ … ≤ dik.
• Since ia = rb occurs before ra in σ, drb ≤ dra.
• In the feasible solution σ', dra ≥ a and drb ≥ b.
• So if we interchange ra and rb, the resulting permutation σ'' = (s1, …, sk) still meets every deadline (rb moves to slot a with drb ≥ b > a, and ra moves to slot b with dra ≥ drb ≥ b), and the least index at which σ'' and σ differ is incremented by at least one.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• Also, the jobs in σ'' may be processed without violating a deadline.
• Continuing in this way, σ' can be transformed into σ without violating any deadline.
• Hence the theorem is proved.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)
• Theorem 2: The greedy method obtains an optimal solution to the job sequencing problem.
• Proof: Let (pi, di), 1 ≤ i ≤ n, define any instance of the job sequencing problem.
• Let I be the set of jobs selected by the greedy method.
• Let J be the set of jobs in an optimal solution.
• Let us assume I ≠ J.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• If J ⊂ I then J cannot be optimal, because fewer jobs give less profit, which cannot hold for an optimal solution.
• Also, I ⊂ J is ruled out by the nature of the greedy method. (The greedy method selects jobs (i) in order of maximum profit, and (ii) includes every job that can be finished before its deadline.)
43
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• So there exist jobs a and b such that a ∈ I, a ∉ J and b ∈ J, b ∉ I.
• Let a be a highest-profit job such that a ∈ I, a ∉ J.
• It follows from the greedy method that pa ≥ pb for all jobs b ∈ J, b ∉ I. (If pb > pa then the greedy method would consider job b before job a and include it in I.)
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)
• Let SI and SJ be feasible schedules for job sets I and J respectively.
• Let i be a job such that i ∈ I and i ∈ J.
(i.e. i is a job that belongs to both the schedule generated by the greedy method and the optimal solution.)
• Let i be scheduled from t to t+1 in SI and from t' to t'+1 in SJ.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)
• If t < t', we may interchange the job scheduled in [t', t'+1] in SI with i; if no job is scheduled in [t', t'+1] in SI, then i is simply moved to that interval.
• With this, i will be scheduled at the same time in SI and SJ.
• The resulting schedule is also feasible.
• If t' < t, then a similar transformation may be made in SJ.
• In this way, we can obtain schedules SI' and SJ' with the property that all the jobs common to I and J are scheduled at the same time.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)
• Consider the interval [ta, ta+1] in SI' in which job a is scheduled.
• Let b be the job scheduled in SJ' in this interval.
• As a is the highest-profit job among those in I but not in J, pa ≥ pb.
• Scheduling job a from ta to ta+1 in SJ' and discarding job b gives us a feasible schedule for the job set J' = (J − {b}) ∪ {a}. Clearly J' has a profit value no less than that of J and differs from I in one fewer job than J does.
GREEDY ALGORITHM TO OBTAIN AN OPTIMAL
SOLUTION (Contd..)

• i.e., J' and I differ in m−1 jobs if J and I differ in m jobs.

• By repeatedly applying this transformation, J can be transformed into I with no decrease in profit value.
• Hence I must also be optimal.
GREEDY ALGORITHM FOR JOB SEQUENCING WITH DEADLINES
Procedure Greedy_Job (D, J, n)
// J is the set of jobs to be completed by their deadlines //
  J ← {1}
  FOR i ← 2 to n DO
    IF all jobs in J ∪ {i} can be completed by their deadlines
      THEN J ← J ∪ {i}
    END IF
  END FOR
END Greedy_Job

Note: J may be represented by a one-dimensional array J(1:k) with the deadlines kept sorted, D(J(1)) ≤ D(J(2)) ≤ … ≤ D(J(k)). To test whether J ∪ {i} is feasible, insert i into J and verify that D(J(r)) ≥ r for 1 ≤ r ≤ k+1.
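
A sketch of Greedy_Job in Python (our own rendering; it uses the sorted-insertion feasibility test described in the note above):

def greedy_job(profits, deadlines):
    # Consider jobs in nonincreasing profit order (the greedy order).
    order = sorted(range(len(profits)), key=lambda i: profits[i], reverse=True)
    J = []                            # selected jobs, kept sorted by deadline
    for i in order:
        trial = sorted(J + [i], key=lambda j: deadlines[j])
        # Feasible iff the r-th earliest deadline is at least r: D(J(r)) >= r.
        if all(deadlines[j] >= r for r, j in enumerate(trial, start=1)):
            J = trial
    return sorted(J)

# The earlier example, 0-indexed: profits (100, 10, 15, 27), deadlines (2, 1, 2, 1).
# Expect jobs {0, 3}, i.e. jobs 1 and 4 in the slides' 1-based numbering.
print(greedy_job([100, 10, 15, 27], [2, 1, 2, 1]))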
Apply Greedy Algorithms for Solving
Optimal Code Design Problem
Optimal text encoding: Huffman Code
“bla□bla …”

0100110000010000010011000001 …

Standard text encoding schemes: fixed number of bits per character


• ASCII: 7 bits (extended versions 8 bits)
• UCS-2 (Unicode): 16 bits

Can we do better using variable-length encoding?


Idea: Give characters that occur frequently a short code and
give characters that do not occur frequently a longer code
The Encoding Problem
Input: set C of n characters c1,…cn; for each character ci its frequency f(ci )

Output: binary code for each character


code(c1) = 01001, code (c2) = 010, … not a prefix-code

Variable length encoding: how do we know where characters end ?


text = 0100101100 … Does it start with c1 = 01001 or c2 = 010 or … ??

Use a prefix code: No character code is a prefix of another character code
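
A small decoding sketch (our own code) showing why the prefix property makes variable-length decoding unambiguous; the mini-code used here is a hypothetical example:

def decode(bits, code):
    # Greedily match codewords left to right; by the prefix property,
    # the first codeword that matches is the only one that can.
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "bit string does not end on a codeword boundary"
    return "".join(out)

code = {'a': '0', 'b': '10', 'c': '11'}   # a prefix code
print(decode('01011', code))              # -> 'abc'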


Variable-length Prefix Encoding: Will It Help?
Text: “een□voordeel”
Frequencies: f(e) = 4, f(n) = 1, f(v) = 1, f(o) = 2, f(r) = 1, f(d) = 1, f(l) = 1, f(□) = 1

Fixed-length code:
e = 000  n = 001  v = 010  o = 011  r = 100  d = 101  l = 110  □ = 111
Length of encoded text: 12 × 3 = 36 bits

A possible prefix code:


e = 00 n = 0110 v = 0111 o = 010 r = 100 d = 101 l = 110 □ = 111
Length of encoded text: 4 × 2 + 2 × 4 + 6 × 3 = 34 bits
Huffman Codes
                          a    b    c    d    e    f
Frequency (in thousands)  45   13   12   16   9    5
Fixed-length codeword     000  001  010  011  100  101
Variable-length codeword  0    101  100  111  1101 1100

Code for Data Compression


(45+13+12+16+9+5)*3*1000=300,000 bits --- Fixed length Code

(45*1+13*3+12*3+16*3+9*4+5*4)*1000=224,000 bits --- Variable length Code


A saving of 25%
Constructing a Huffman Code

ALGO: HUFFMAN(C)
1  n ← |C|
2  Q ← C        // min-priority queue keyed on frequency
3  for i ← 1 to n − 1
4      do allocate a new node z
5         left[z] ← x ← EXTRACT-MIN(Q)
6         right[z] ← y ← EXTRACT-MIN(Q)
7         f[z] ← f[x] + f[y]
8         INSERT(Q, z)
9  return EXTRACT-MIN(Q)

Why??
The Steps of Huffman's Algorithm
[Figure: step-by-step merging of the six nodes a, b, c, d, e, f into the Huffman tree]

Resulting Huffman Code:


a = 0 b = 101 c = 100 d = 111 e = 1101 f = 1100
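
A runnable sketch of Huffman's algorithm (our own Python rendering of the pseudocode, using heapq as the min-priority queue):

import heapq

def huffman_codes(freq):
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, c) for i, (c, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)                  # unique tie-breaker for heap ordering
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # EXTRACT-MIN twice
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))  # INSERT merged node
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: 0 = left, 1 = right
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # |C| = 1 edge case
    walk(heap[0][2], "")
    return codes

# The slide example; with this tie-breaking it reproduces
# a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100.
print(huffman_codes({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))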
Representation of Prefix Codes
Text: “een□voordeel”
Frequencies: f(e) = 4, f(n) = 1, f(v) = 1, f(o) = 2, f(r) = 1, f(d) = 1, f(l) = 1, f(□) = 1
Prefix Code: e = 00 n = 0110 v = 0111 o = 010 r = 100 d = 101 l = 110 □ = 111

[Figure: the binary tree for this prefix code, with edges labeled 0 (left) and 1 (right) and leaves e, o, n, v, r, d, l, □]

Representation is a binary tree T:
§ One leaf for each character
§ Internal nodes always have two outgoing edges, labeled 0 and 1
§ Code of a character: follow the path to its leaf and list the bits
Representation of Prefix Codes
Text: “een□voordeel”
Frequencies: f(e) = 4, f(n) = 1, f(v) = 1, f(o) = 2, f(r) = 1, f(d) = 1, f(l) = 1, f(□) = 1
Prefix Code: e = 00 n = 0110 v = 0111 o = 010 r = 100 d = 101 l = 110 □ = 111

[Figure: the same code tree, now annotated with the leaf frequencies e:4, o:2, n:1, v:1, r:1, d:1, l:1, □:1]

Cost of the encoding represented by T:  ∑i f(ci) ∙ depth(ci)
Designing Greedy Algorithms: Recap
1. Try to discover the structure of optimal solutions: what properties do optimal solutions have?
§ What are the choices that need to be made?
§ Do we have optimal substructure?
§ Do we have the greedy-choice property for the first choice?

2. Prove that optimal solutions indeed have these properties
§ Prove optimal substructure and greedy-choice property

3. Use these properties to design an algorithm and prove correctness
§ Proof by induction (possible because of the presence of optimal substructure)
Bottom-up Construction Of Tree
Start with separate leaves, and then “merge” n-1 times until we have the tree

Choices: Which subtrees to merge at every step?

c1  c2  c3  c4  c5  c6  c7  c8
4   2   1   1   1   1   1   1

Note: we do not have to merge adjacent leaves
Bottom-up Construction Of Tree
Start with separate leaves, and then “merge” n-1 times until we have the tree

Choices: Which subtrees to merge at every step?

c1  c2  c3  c4  c5  c6  c7  c8  b
4   2   1   1   1   1   1   1   2

Do we have a greedy-choice property?
Which leaves should we merge first?

Greedy choice: first merge the two leaves with the smallest character frequency
Bottom-up Construction Of Tree
Start with separate leaves, and then “merge” n-1 times until we have the tree

Choices: Which subtrees to merge at every step?

c1  c2  c3  c4  c5  c6  c7  c8  b
4   2   1   1   1   1   1   1   2

Do we have optimal substructure?
Do we even have a problem of the same type?

Yes, we have a subproblem of the same type:
After merging, replace the merged leaves ci, ck by a single leaf b with f(b) = f(ci) + f(ck)
(another way of looking at it: the problem is about merging weighted subtrees)
Lemma: Let ci and ck be siblings in an optimal tree for set C of characters.
Let B = (C \ {ci, ck}) ∪ {b}, where f(b) = f(ci) + f(ck).
Let TB be an optimal tree for B.
Then replacing the leaf for b in TB by an internal node with ci, ck as children results in an optimal tree for C.

Proof. Left as an exercise.
Lemma: Let ci, ck be two characters with the lowest frequency in C.
Then there is an optimal tree TOPT for C where ci, ck are siblings.

Proof. Let TOPT be an optimal tree for C. If ci, ck are siblings in TOPT then the lemma obviously holds, so assume this is not the case.
We will show how to modify TOPT into a tree T* such that
(i) T* is a valid tree
(ii) ci, ck are siblings in T*
(iii) cost(T*) ≤ cost(TOPT), i.e. quality of T* ≥ quality of TOPT
(This is standard text you can basically use in the proof of any greedy-choice property.)

Thus T* is an optimal tree in which ci, ck are siblings, and so the lemma holds. To modify TOPT we proceed as follows.
How to modify TOPT?

[Figure: in TOPT, leaf ci sits at depth d1 while a deepest internal node v has children cs, cm at depth d2; in T*, ci and ck have been swapped into place as the children of v]

§ Take a deepest internal node v
§ Make ci, ck children of v by swapping them with its current children (if necessary)

Change in cost due to swapping ci and cs:
cost(TOPT) − cost(T*) = f(cs) ∙ (d2 − d1) + f(ci) ∙ (d1 − d2)
                      = (f(cs) − f(ci)) ∙ (d2 − d1)
                      ≥ 0

Conclusion: T* is a valid tree where ci, ck are siblings and cost(T*) ≤ cost(TOPT).
Algorithm Construct-Huffman-Tree (C: set of n characters)
1. if |C | = 1
2. then return a tree consisting of single leaf, storing the character in C
3. else ci , ck ← two characters from C with lowest frequency
4. Remove ci , ck from C, and replace them by a new character b
with f(b) = f(ci ) + f(ck ). Let B denote the new set of characters.
5. TB← Construct-Huffman-Tree(B)
6. Replace leaf for b in TB with internal node with ci , ck as children.
7. Let T be the new tree.
8. return T

Correctness: by induction, using optimal substructure and the greedy-choice property

Running time:
• O(n²) ?!
• O(n log n) if implemented smartly (use a heap)
• Sorting + O(n) if implemented even smarter (hint: 2 queues)
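
A sketch of the two-queue trick (our own code; it assumes the characters arrive pre-sorted by frequency): leaves wait in Q1, merged subtrees are appended to Q2, and because merged frequencies are produced in nondecreasing order, Q2 stays sorted for free, so every EXTRACT-MIN is O(1).

from collections import deque

def huffman_two_queues(freq_sorted):
    q1 = deque((f, c) for c, f in freq_sorted)   # (frequency, tree), sorted input
    q2 = deque()                                  # merged subtrees, kept sorted
    def pop_min():
        if not q2 or (q1 and q1[0][0] <= q2[0][0]):
            return q1.popleft()
        return q2.popleft()
    while len(q1) + len(q2) > 1:
        f1, t1 = pop_min()
        f2, t2 = pop_min()
        q2.append((f1 + f2, (t1, t2)))            # appended in nondecreasing order
    return (q1 or q2)[0][1]

# The slide example, pre-sorted by frequency
print(huffman_two_queues([('f', 5), ('e', 9), ('c', 12), ('b', 13), ('d', 16), ('a', 45)]))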

You might also like