unit-3 r23
Greedy Method: General Method, Job Sequencing with deadlines, Knapsack Problem, Minimum
cost spanning trees, Single Source Shortest Paths
Dynamic Programming: General Method, All pairs shortest paths, Single Source Shortest Paths–
General Weights (Bellman Ford Algorithm), Optimal Binary Search Trees, 0/1 Knapsack, String
Editing, Travelling Salesperson problem
The greedy method is used:
1. To find an optimal solution (Activity Selection, Fractional Knapsack, Job Sequencing, Huffman
Coding).
2. To find a solution close to the optimal for NP-Hard problems like TSP.
Advantages and Disadvantages of Greedy Approach
Following are the various advantages and disadvantages of the greedy approach.
Advantages
It is easy to implement.
It typically has a lower time complexity than other approaches.
It can be used for optimization, or for finding solutions close to optimal in the case of NP-Hard
problems.
Disadvantages
A disadvantage of the greedy approach is that a locally optimal solution may not always be
globally optimal.
Example: Suppose we must make an amount of 50 using the minimum number of coins, given coins
of denominations {5, 10, 20, 25}.
Possible Solutions
{coin * count}
{5 * 10} = 50 [10 coins]
{5 * 8 + 10 * 1} = 50 [9 coins], and so on
{10 * 5} = 50 [5 coins]
{20 * 2 + 10 * 1} = 50 [3 coins]
{20 * 2 + 5 * 2} = 50 [4 coins]
{25 * 2} = 50 [2 coins]
Best Solution
Two coins of 25: 25 * 2 = 50, using only two coins.
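The greedy strategy above — repeatedly take the largest coin that still fits — can be sketched in Python (a minimal illustration; the denominations {5, 10, 20, 25} are taken from the example):

```python
def greedy_coin_change(coins, amount):
    """Repeatedly pick the largest denomination that still fits."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result if amount == 0 else None  # None: greedy could not make change

print(greedy_coin_change([5, 10, 20, 25], 50))  # [25, 25]
```

Note that for some coin systems the greedy choice is not optimal (for example, coins {1, 3, 4} and amount 6), which is exactly the local-vs-global pitfall mentioned above.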
PROBLEM BASED ON JOB SEQUENCING WITH DEADLINES
We are given the jobs, their deadlines and associated profits as shown below:

Jobs      J1   J2   J3   J4   J5   J6
Deadlines 5    3    3    2    4    2
Profits   201  181  191  301  121  101
Answer the following questions-
1. Write the optimal schedule that provides us the maximum profit.
2. Can we complete all the jobs in the optimal schedule?
3. What is the maximum earned profit?
Solution:
Step-01:
Firstly, we need to sort all the given jobs in decreasing order of their profit as follows.

Jobs      J4   J1   J3   J2   J5   J6
Deadlines 2    5    3    3    4    2
Profits   301  201  191  181  121  101
Step-02:
Next, we find the value of the maximum deadline.
Here, the maximum deadline is 5, so we draw a Gantt chart with 5 time slots, as shown below.
Now,
We will consider each job one by one, in the order obtained in Step-01.
Each job is placed on the Gantt chart as far away from 0 as possible, i.e., in the latest free slot
before its deadline.
Step-03:
We now consider job4.
Since the deadline for job4 is 2, we will be placing it in the first empty cell before deadline 2 as
follows.
Step-04:
Now, we go with job1.
Since the deadline for job1 is 5, we will be placing it in the first empty cell before deadline 5 as
shown below.
Step-05:
We now consider job3.
Since the deadline for job3 is 3, we will be placing it in the first empty cell before deadline 3 as
shown in the following figure.
Step-06:
Next, we go with job2.
Since the deadline for job2 is 3, we will be placing it in the first empty cell before deadline 3.
Since the second cell and third cell are already filled, we place job2 in the first cell as shown
below.
Step-07:
Now, we consider job5.
Since the deadline for job5 is 4, we will be placing it in the first empty cell before deadline 4 as
shown in the following figure.
Now,
We can observe that the only job left is job6 whose deadline is 2.
Since all the slots before deadline 2 are already occupied, job6 cannot be completed.
Now, the questions given above can be answered as follows:
Part-01:
The optimal schedule is-
Job2, Job4, Job3, Job5, Job1
This is the order in which the jobs must be completed in order to obtain the maximum profit.
Part-02:
As we can observe, not all jobs are completed in the optimal schedule.
This is because job6 could not be completed within its deadline.
Part-03:
Maximum earned profit = Sum of the profit of all the jobs from the optimal schedule
= Profit of job2 + Profit of job4 + Profit of job3 + Profit of job5 + Profit of job1
= 181 + 301 + 191 + 121 + 201
= 995 units
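The whole procedure above (sort by profit, then place each job in the latest free slot before its deadline) can be sketched in Python, using the job names and figures from this example:

```python
def job_sequencing(jobs):
    """jobs: list of (name, deadline, profit). Returns (schedule, total_profit)."""
    ordered = sorted(jobs, key=lambda j: j[2], reverse=True)  # decreasing profit
    max_deadline = max(d for _, d, _ in ordered)
    slots = [None] * max_deadline  # the Gantt chart; slot t covers time t..t+1
    for name, deadline, profit in ordered:
        # place the job as far from 0 as possible, but before its deadline
        for t in range(min(deadline, max_deadline) - 1, -1, -1):
            if slots[t] is None:
                slots[t] = (name, profit)
                break
    schedule = [s[0] for s in slots if s is not None]
    total = sum(s[1] for s in slots if s is not None)
    return schedule, total

jobs = [("J1", 5, 201), ("J2", 3, 181), ("J3", 3, 191),
        ("J4", 2, 301), ("J5", 4, 121), ("J6", 2, 101)]
print(job_sequencing(jobs))  # (['J2', 'J4', 'J3', 'J5', 'J1'], 995)
```

Running this reproduces the schedule Job2, Job4, Job3, Job5, Job1 with profit 995; job6 is dropped because both slots before its deadline are full.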
Fractional Knapsack
In this category, items can be broken into smaller pieces, and the thief can select fractions of
items.
According to the problem scenario, the knapsack capacity is W = 60, and the provided items are
not sorted by the value of pi/wi, so we perform sorting. After sorting, the items are shown in the
following table.

Item     B    A    C    D
Profit   100  280  120  120
Weight   10   40   20   24
(pi/wi)  10   7    6    5
Solution
Once we sort all the items according to the pi/wi, we choose all of B as the weight of B is less
compared to that of the capacity of the knapsack.
Further, we choose item A, as the available capacity of the knapsack is greater than the
weight of A.
Now, we will choose C as the next item. Anyhow, the whole item cannot be chosen as the
remaining capacity of the knapsack is less than the weight of the chosen item – C.
Hence, a fraction of C (i.e. (60 − 50)/20) is chosen.
Now, we reach the stage where the capacity of the Knapsack is equal to the chosen items.
Hence, no more items can be selected.
The total weight of the chosen items is 10 + 40 + 20 * (10/20) = 60,
and the total profit is 100 + 280 + 120 * (10/20) = 380 + 60 = 440 units.
This is the optimal solution. We cannot gain more profit compared to this by selecting any
different combination of items out of the provided items.
Algorithm:
Greedy-Fractional-Knapsack (w[1..n], p[1..n], W)
// assumes the items are sorted in decreasing order of p[i]/w[i]
for i = 1 to n
    do x[i] = 0
weight = 0
for i = 1 to n
    if (weight + w[i]) ≤ W then
        x[i] = 1
        weight = weight + w[i]
    else
        x[i] = (W - weight) / w[i]
        weight = W
        break
return x
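The same greedy fractional knapsack can be sketched in Python (here the sorting by pi/wi is done inside the function; the item data is consistent with the worked totals above — profits 100, 280, 120, 120; weights 10, 40, 20, 24; capacity W = 60):

```python
def fractional_knapsack(items, W):
    """items: list of (name, profit, weight). Returns (fractions taken, total profit)."""
    ordered = sorted(items, key=lambda it: it[1] / it[2], reverse=True)  # by p/w
    fractions, total, remaining = {}, 0.0, W
    for name, profit, weight in ordered:
        if remaining <= 0:
            break
        take = min(1.0, remaining / weight)  # fraction of this item that fits
        fractions[name] = take
        total += take * profit
        remaining -= take * weight
    return fractions, total

items = [("A", 280, 40), ("B", 100, 10), ("C", 120, 20), ("D", 120, 24)]
print(fractional_knapsack(items, 60))  # ({'B': 1.0, 'A': 1.0, 'C': 0.5}, 440.0)
```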
1. Prim’s Algorithm-
Step-1: Randomly choose any vertex (here, vertex 1). From this vertex, the incident edge having
the least weight is selected.
Step-2: Now we are at node / Vertex 6, It has two adjacent edges, one is already selected, select second
one.
Step-3: Now we are at node 5. It has three connected edges; one is already selected, so from the
remaining two we select the minimum cost edge (the one having minimum weight), such that no
loop is formed by adding that edge.
Step-4: Now we are at node 4. Select the minimum cost edge from the edges connected to this node,
such that no loop is formed by adding that edge.
Step-5: Now we are at node 3. Since the minimum cost edge is already selected, to reach node 2
we select the edge of cost 16. The MST so far is shown below.
Step-6: Now we are at node 2. Select the minimum cost edge from the edges attached to this node,
such that no loop is formed by adding that edge.
Time Complexity: O(V²) with an adjacency matrix. If the input graph is represented using an
adjacency list, the time complexity of Prim’s algorithm can be reduced to O(E log V) with the help
of a binary heap. In this implementation, the spanning tree always grows from the chosen root
vertex of the graph.
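A heap-based sketch of Prim's algorithm in Python, giving the O(E log V) behaviour mentioned above. The 4-vertex graph below is a hypothetical example, since the figure's graph is not reproduced in these notes:

```python
import heapq

def prim_mst(graph, start):
    """graph: {u: [(v, w), ...]} undirected adjacency list.
    Returns (mst_edges, total_cost); a binary heap gives O(E log V)."""
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start]]  # candidate edges (w, u, v)
    heapq.heapify(heap)
    mst, total = [], 0
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would form a loop, skip it
        visited.add(v)
        mst.append((u, v, w))
        total += w
        for x, wx in graph[v]:
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return mst, total

# hypothetical 4-vertex weighted graph
g = {1: [(2, 1), (3, 4)], 2: [(1, 1), (3, 2), (4, 6)],
     3: [(1, 4), (2, 2), (4, 3)], 4: [(2, 6), (3, 3)]}
print(prim_mst(g, 1))  # MST of total cost 6
```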
2. Kruskal’s Algorithm-
Step-03:
Keep adding edges until all the vertices are connected and a Minimum Spanning Tree (MST) is
obtained.
Analysis: Where E is the number of edges in the graph and V is the number of vertices,
Kruskal's algorithm can be shown to run in O(E log E) time, or simply O(E log V) time,
all with simple data structures. These running times are equivalent because E ≤ V², so
log E ≤ 2 log V = O(log V).
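Kruskal's algorithm can be sketched in Python with a union-find structure to reject cycle-forming edges; the edge list below is a hypothetical example:

```python
def kruskal_mst(n, edges):
    """n vertices (0..n-1); edges: list of (w, u, v). Runs in O(E log E) = O(E log V)."""
    parent = list(range(n))

    def find(x):  # union-find root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):  # consider edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:               # skip edges that would form a cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

# hypothetical edge list (weight, u, v)
edges = [(1, 0, 1), (2, 1, 2), (4, 0, 2), (3, 2, 3), (6, 1, 3)]
print(kruskal_mst(4, edges))  # MST of total cost 6
```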
Single Source Shortest Paths: Dijkstra’s Algorithm
A subgraph is a part or subset of a graph. The Dijkstra algorithm was published by the Dutch
scientist Edsger Dijkstra in 1959.
Dijkstra’s shortest path algorithm is similar to Prim’s algorithm for MST (Minimum Spanning
Tree). In this algorithm, the shortest path is generated from the starting node to a target node.
This algorithm also maintains two sets of vertices. One set comprises all the vertices included in
the shortest-path graph and another set comprises all the vertices that are not included in the
shortest-path graph yet.
In every step of Dijkstra’s algorithm, we select a vertex (from the set of non-included
vertices) which has the minimum distance from the source. The resulting shortest-path tree is
generally different from a minimum spanning tree, because minimizing each source-to-vertex
distance is not the same as minimizing the total edge weight.
Dijkstra algorithm is applied on each step and follows the following steps:
Create a shortest path tree set, say U; this keeps track of all the vertices included in the
shortest-path tree. Every vertex included in this set has its minimum distance from the source
calculated and finalized. The set is empty at the start.
First, assign a distance value to all the nodes/vertices of the input graph.
Initialize all vertices with an INFINITE(∞) distance value.
Set the source vertex’s distance value to zero.
While U (shortest path tree set) doesn’t include all the vertices.
Choose a vertex u that is not present in U and has a minimum distance value.
Include u in U.
Now, update the distance value of u in all the adjacent vertices.
For updating the distance value, iterate this in all adjacent vertices.
For each adjacent vertex v, if the sum of the distance value of u and the weight of edge u-v is
less than the distance value of v, then update the distance value of v.
This update step can be written as the relaxation formula:
if d(u) + c(u, v) < d(v)
    d(v) = d(u) + c(u, v)
That is, if the sum of the distance of u (the vertex just finalized) and the cost of going from u to
v (an adjacent vertex, initialized to infinity) is less than the current distance value of v,
then the distance of v becomes the sum of the distance of u and the cost of going from u to v.
Example
Consider the following graph, and calculate the shortest path between A and all other vertices.
Solution: Initially, the cost of every vertex from A will be infinite or unknown, and the distance
value from A to A will be 0.
Relax edge A-B: 0 + 7 < ∞ // True, so d(B) = 7
Relax edge A-C: 0 + 9 < ∞ // True, so d(C) = 9
Relax edge A-F: 0 + 14 < ∞ // True, so d(F) = 14

Source   Destination
         B    C    D    E    F
A        ∞    ∞    ∞    ∞    ∞
         7    9    ∞    ∞    14
Now, the minimum tentative cost from A is 7 (vertex B), so B is finalized; there is no need to
update it further.
Next, let’s check the cost of the other adjacent nodes with respect to A via B.
The cost of reaching C from A via B is 7 + 10 = 17, which is more than the previous cost 9, so we
do not update it.
Source   Destination
         B    C    D    E    F
A        ∞    ∞    ∞    ∞    ∞
         7    9    ∞    ∞    14
(A,B)    7    9    ∞    ∞    14
Now, let’s check the cost of reaching vertex D from A via B. The path will be A-B-D, and the cost is
7 + 15 = 22. Updating this in the table, we get
Source   Destination
         B    C    D    E    F
A        ∞    ∞    ∞    ∞    ∞
         7    9    ∞    ∞    14
(A,B)    7    9    ∞    ∞    14
         7    9    22   ∞    14
The minimum cost between A and B is now fixed at 7, so we choose the next minimum tentative
cost, which is 9 (vertex C), and check the cost of the adjacent nodes via C.
The cost of reaching D via C is 9 + 11 = 20, which is smaller than the previous 22.
Also, the cost of reaching F via C is 9 + 2 = 11, which is also smaller than the previous 14.
So, the updated table will be
Source     Destination
           B    C    D    E    F
A          ∞    ∞    ∞    ∞    ∞
           7    9    ∞    ∞    14
(A, B)     7    9    ∞    ∞    14
(A, B, C)  7    9    20   ∞    11
Now, the minimum cost between A and C is fixed at 9, hence we choose the next minimum, which
is 11 (vertex F) among 20, ∞ and 11. Relaxing the edges of F updates E to 11 + 9 = 20.

Source        Destination
              B    C    D    E    F
A             ∞    ∞    ∞    ∞    ∞
              7    9    ∞    ∞    14
(A, B)        7    9    ∞    ∞    14
(A, B, C)     7    9    20   ∞    11
(A, B, C, F)  7    9    20   20   11
Now, 11 is also fixed. Both D and E have cost 20, so we may select either one.
Let’s say we select D. Checking all the nodes with respect to (A, B, C, F, D) produces no smaller
values, so the table stays the same.

Source           Destination
                 B    C    D    E    F
A                ∞    ∞    ∞    ∞    ∞
                 7    9    ∞    ∞    14
(A, B)           7    9    ∞    ∞    14
(A, B, C)        7    9    20   ∞    11
(A, B, C, F)     7    9    20   20   11
(A, B, C, F, D)  7    9    20   20   11
Pseudo Code
DIJKSTRA (G, w, s)
    INITIALIZE-SINGLE-SOURCE(G, s)
    S ← Ø
    Q ← V[G]
    while Q ≠ Ø
        do u ← EXTRACT-MIN(Q)   // find the vertex with minimum distance value
           S ← S ∪ {u}
           for each vertex v ∈ Adj[u]
               do RELAX(u, v, w)
Complexity of Algorithm
Dijkstra’s algorithm takes O(E log V) time to find the shortest paths in a graph, where E is the
number of edges and V is the number of vertices, when implemented with a binary heap.
It requires O(V) space complexity.
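The walkthrough above can be sketched in Python with a binary heap. The adjacency list below is reconstructed from the edge costs used in the example (A-B = 7, A-C = 9, A-F = 14, B-C = 10, B-D = 15, C-D = 11, C-F = 2, and an inferred F-E = 9), since the original figure is not reproduced here:

```python
import heapq

def dijkstra(graph, source):
    """graph: {u: [(v, w), ...]}. Returns shortest distances from source."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, u already finalized with a smaller d
        for v, w in graph[u]:
            if d + w < dist[v]:  # relaxation: d(u) + c(u, v) < d(v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# undirected graph, edges listed in both directions
g = {"A": [("B", 7), ("C", 9), ("F", 14)],
     "B": [("A", 7), ("C", 10), ("D", 15)],
     "C": [("A", 9), ("B", 10), ("D", 11), ("F", 2)],
     "D": [("B", 15), ("C", 11)],
     "E": [("F", 9)],
     "F": [("A", 14), ("C", 2), ("E", 9)]}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 7, 'C': 9, 'D': 20, 'E': 20, 'F': 11}
```

The output matches the final row of the table above: B = 7, C = 9, D = 20, E = 20, F = 11.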
Dynamic Programming
The name dynamic programming was coined by Richard Bellman in 1955. Dynamic
programming, like the greedy method, is a powerful algorithm design technique that can be
used when the solution to a problem may be viewed as the result of a sequence of
decisions. In the greedy method we make irrevocable decisions one at a time, using a
greedy criterion. In dynamic programming, however, we examine the decision
sequence to see whether an optimal decision sequence contains an optimal decision
subsequence.
1. Bottom-Up approach
Start computing result for the subproblem. Using the subproblem result solve another
subproblem and finally solve the whole problem.
Example
Let's find the nth member of a Fibonacci series.
Fibonacci(0) = 0
Fibonacci(1) = 1
Fibonacci(2) = 1 (Fibonacci(0) + Fibonacci(1))
Fibonacci(3) = 2 (Fibonacci(1) + Fibonacci(2))
We can solve the problem step by step.
1. Find the 0th member
2. Find 1st member
3. Calculate the 2nd member using 0th and 1st member
4. Calculate the 3rd member using 1st and 2nd member
5. By doing this we can easily find the nth member.
Algorithm
1. set Fib[0] = 0
2. set Fib[1] = 1
3. From index 2 to n compute result using the below formula
Fib[index] = Fib[index - 1] + Fib[index - 2]
4. The final result will be stored in Fib[n].
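The bottom-up algorithm above translates directly into Python:

```python
def fib_bottom_up(n):
    """nth Fibonacci number, computed iteratively (bottom-up DP)."""
    if n < 2:
        return n
    fib = [0] * (n + 1)  # Fib[0] = 0
    fib[1] = 1
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]  # each result built from smaller ones
    return fib[n]

print(fib_bottom_up(10))  # 55
```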
2. Top-Down approach
Top-Down breaks the large problem into multiple subproblems.
If a subproblem has already been solved, just reuse the answer.
Otherwise, solve the subproblem and store the result.
Top-Down uses memoization to avoid recomputing the same subproblem again.
Let's solve the same Fibonacci problem using the top-down approach.
Unlike bottom-up, top-down starts by breaking the problem down.
Like,
If we want to compute Fibonacci(4), the top-down approach will do the following
Fibonacci(4) -> Go and compute Fibonacci(3) and Fibonacci(2) and return the results.
Fibonacci(3) -> Go and compute Fibonacci(2) and Fibonacci(1) and return the results.
Fibonacci(2) -> Go and compute Fibonacci(1) and Fibonacci(0) and return the results.
Finally, Fibonacci(1) will return 1 and Fibonacci(0) will return 0.
Fib(5)
/ \
Fib(3) Fib(4)
/ \ / \
Fib(2) Fib(1) Fib(3) Fib(2)
/ \ / \ / \
Fib(1) Fib(0) Fib(2) Fib(1) Fib(1) Fib(0)
In the recursion tree of Fib(5) above, the result of Fib(2) is computed three times (and Fib(3) twice).
This can be avoided using memoization.
Algorithm
Fib(n)
    if n == 0 || n == 1 return n
    if memo[n] is already computed, return memo[n]   // reuse the stored result
    memo[n] = Fib(n-1) + Fib(n-2)
    return memo[n]
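The top-down approach with memoization can be written in Python as follows; each Fib(i) is now computed only once:

```python
def fib_top_down(n, memo=None):
    """nth Fibonacci number with memoization (top-down DP)."""
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:  # solve each subproblem only once, then cache it
        memo[n] = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    return memo[n]

print(fib_top_down(10))  # 55
```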
All Pairs Shortest Paths
In the all pairs shortest path problem, we are to find a shortest path
between every pair of vertices in a directed graph G. That is, for every
pair of vertices (i, j), we are to find a shortest path from i to j as well as
one from j to i. These two paths are the same when G is undirected.
When no edge has a negative length, the all-pairs shortest path problem
may be solved by using Dijkstra’s greedy single source algorithm n times,
once with each of the n vertices as the source vertex.
The all pairs shortest path problem is to determine a matrix A such that A
(i, j) is the length of a shortest path from i to j. The matrix A can be
obtained by solving n single-source problems using the algorithm
ShortestPaths. Since each application of this procedure requires O(n²)
time, the matrix A can be obtained in O(n³) time.
The dynamic programming solution, called Floyd’s algorithm, also runs in
O(n³) time, but Floyd’s algorithm works even when the graph has negative
length edges (provided there are no negative length cycles).
The shortest i to j path in G, i ≠ j originates at vertex i and goes through
some intermediate vertices (possibly none) and terminates at vertex j. If k
is an intermediate vertex on this shortest path, then the subpaths from i to
k and from k to j must be shortest paths from i to k and k to j,
respectively. Otherwise, the i to j path is not of minimum length. So, the
principle of optimality holds. Let A^k(i, j) represent the length of a
shortest path from i to j going through no vertex of index greater than k.
We obtain the recurrence:

A^0(i, j) = cost(i, j)
A^k(i, j) = min { A^(k-1)(i, j), A^(k-1)(i, k) + A^(k-1)(k, j) },  k ≥ 1
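This recurrence is exactly what Floyd's algorithm computes, sketched here in Python; the 3-vertex cost matrix is a small hypothetical example:

```python
INF = float("inf")

def floyd_warshall(cost):
    """cost: n x n matrix with cost[i][j] = edge length (INF if no edge),
    cost[i][i] = 0. Returns A with A[i][j] = shortest path length i -> j."""
    n = len(cost)
    A = [row[:] for row in cost]  # A^0 is the cost matrix itself
    for k in range(n):            # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
    return A

cost = [[0, 4, 11],
        [6, 0, 2],
        [3, INF, 0]]
print(floyd_warshall(cost))  # [[0, 4, 6], [5, 0, 2], [3, 7, 0]]
```

The three nested loops make the O(n³) running time stated above immediate.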
Single Source Shortest Paths – General Weights (Bellman Ford Algorithm)
Why would one ever have edges with negative weights in real life?
Negative weight edges might seem useless at first but they can explain a lot of phenomena
like cashflow, the heat released/absorbed in a chemical reaction, etc.
For instance, if there are different ways to reach from one chemical A to another chemical B,
each method will have sub-reactions involving both heat dissipation and absorption.
If we want to find the set of reactions where minimum energy is required, then we will need
to be able to factor in the heat absorption as negative weights and heat dissipation as
positive weights.
Negative weight edges can create negative weight cycles i.e. a cycle that will reduce the total
path distance by coming back to the same point.
Negative weight cycles can give an incorrect result when trying to find out the shortest path.
Shortest path algorithms like Dijkstra’s algorithm that are not able to detect such a cycle can
give an incorrect result because they can go through a negative weight cycle and reduce the
path length.
Bellman Ford algorithm works by overestimating the length of the path from the starting
vertex to all other vertices. Then it iteratively relaxes those estimates by finding new paths
that are shorter than the previously overestimated paths.
By doing this repeatedly for all vertices, we can guarantee that the result is optimized.
Note: To relax the path, an edge(U, V), if distance(U) + edge_weight(U,V) < distance(V),
assign distance(V) = distance(U) + edge_weight(U,V).
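The repeated-relaxation idea above can be sketched in Python: V−1 passes over all edges, plus one extra pass to detect a negative weight cycle. The small edge list (with one negative edge) is a hypothetical example:

```python
def bellman_ford(vertices, edges, source):
    """edges: list of (u, v, w); handles negative edge weights.
    Returns distances from source, or None if a negative cycle is reachable."""
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):  # relax every edge V-1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:               # one more pass: any improvement
        if dist[u] + w < dist[v]:       # means a negative weight cycle
            return None
    return dist

edges = [("A", "B", 4), ("A", "C", 2), ("B", "C", -3), ("C", "D", 2)]
print(bellman_ford(["A", "B", "C", "D"], edges, "A"))
# {'A': 0, 'B': 4, 'C': 1, 'D': 3}
```

Note that C is reached more cheaply through the negative edge B-C (4 − 3 = 1) than directly (2), which Dijkstra's algorithm would miss.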
Optimal Binary Search Tree:
As we know, in a binary search tree the nodes in the left subtree have lesser values than
the root node and the nodes in the right subtree have greater values than the root node.
We know the key value of each node in the tree, and we also know the frequency with which
each node is searched. The frequency and key value together determine the overall cost of
searching a node. The cost of searching is a very important factor in various applications, and
the overall cost of searching should be as small as possible. The time required to search a node
in an ordinary BST is more than in a balanced binary search tree, since a balanced tree contains
fewer levels. One way to further reduce the cost of a binary search tree is known as
an optimal binary search tree.
Now we will see how many binary search trees can be made from the given number of keys.
For example: 10, 20, 30 are the keys, and the following are the binary search trees that can
be made out from these keys.
The number of binary search trees that can be made from n keys is the nth Catalan number,
(2n)! / ((n+1)! · n!). Using this formula with n = 3, we find that a total of 5 trees can be created.
The cost required for searching an element depends on the comparisons to be made to
search an element. Now, we will calculate the average cost of time of the above binary search
trees.
In the above tree, a maximum of 3 comparisons is required. The average number of
comparisons is:
In the above tree, the average number of comparisons is:
In the above tree, the average number of comparisons is:
In the above tree, a maximum of 3 comparisons is required. Therefore, the average
number of comparisons is:
In the above tree, a maximum of 3 comparisons is required. Therefore, the average
number of comparisons is:
In the third case, the number of comparisons is less because the height of the tree is less, so
it's a balanced binary search tree.
Till now, we read about the height-balanced binary search tree. To find the optimal binary
search tree, we will determine the frequency of searching a key.
Let's assume that the frequencies associated with the keys 10, 20, 30 are 3, 2, 5 respectively.
The above trees have different search costs. The tree with the lowest cost would be
considered the optimal binary search tree. The tree with cost 17 is the lowest, so it
would be considered as the optimal binary search tree.
Dynamic Approach
Consider the below table, which contains the keys and frequencies.
First, we will calculate the values where j − i is equal to zero; in this case, c[i, i] = 0 for all i.
Next, where j − i is equal to one, the cost considers only the jth key:
The cost of c[0,1] is 4 (The key is 10, and the cost corresponding to key 10 is 4).
The cost of c[1,2] is 2 (The key is 20, and the cost corresponding to key 20 is 2).
The cost of c[2,3] is 6 (The key is 30, and the cost corresponding to key 30 is 6).
The cost of c[3,4] is 3 (The key is 40, and the cost corresponding to key 40 is 3).
o When i=0 and j=2, we consider keys 10 and 20. There are two possible trees that can be made
from these two keys, shown below:
o When i=1 and j=3, we consider keys 20 and 30. There are two possible trees that can be made
from these two keys, shown below:
o When i=0, j=3 then we will consider three keys, i.e., 10, 20, and 30.
The following are the trees that can be made if 10 is considered as a root node.
In the above tree, 10 is the root node, 20 is the right child of node 10, and 30 is the right child
of node 20.
In the above tree, 10 is the root node, 30 is the right child of node 10, and 20 is the left child
of node 30.
The following are the trees that can be created if 30 is considered as the root node.
In the above tree, 30 is the root node, 20 is the left child of node 30, and 10 is the left child of
node 20.
Therefore, the minimum cost is 20, obtained when the 3rd key (30) is the root. So, c[0,3] is equal to 20.
o When i=1 and j=4, we will consider the keys 20, 30, 40. Here w[1, 4] = 2 + 6 + 3 = 11, and
c[1, 4] = min{12, 5, 10} + 11 = 5 + 11 = 16
In this case, we will consider four keys, i.e., 10, 20, 30 and 40. The frequencies of 10, 20, 30
and 40 are 4, 2, 6 and 3 respectively.
w[0, 4] = 4 + 2 + 6 + 3 = 15
Root 10: c[0,0] + c[1,4] + w[0,4] = 0 + 16 + 15 = 31
Root 20: c[0,1] + c[2,4] + w[0,4] = 4 + 12 + 15 = 31
Root 30: c[0,2] + c[3,4] + w[0,4] = 8 + 3 + 15 = 26
Root 40: c[0,3] + c[4,4] + w[0,4] = 20 + 0 + 15 = 35
In the above cases, we have observed that 26 is the minimum cost; therefore, c[0,4] is equal
to 26.
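The tabular method above can be sketched in Python. With the frequencies 4, 2, 6, 3 from this example it reproduces c[0,4] = 26, and with the frequencies 3, 2, 5 it reproduces the cost 17 found earlier:

```python
def optimal_bst_cost(freq):
    """c[i][j]: minimum cost of a BST over keys i+1..j, where freq[k] is the
    search frequency of the (k+1)th key. Returns c[0][n]."""
    n = len(freq)
    w = [[0] * (n + 1) for _ in range(n + 1)]  # w[i][j] = freq[i] + ... + freq[j-1]
    c = [[0] * (n + 1) for _ in range(n + 1)]  # c[i][i] = 0 (empty tree)
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + freq[j - 1]
    for length in range(1, n + 1):             # solve shorter ranges first
        for i in range(n - length + 1):
            j = i + length
            # try every key k in i+1..j as the root of this subtree
            c[i][j] = min(c[i][k - 1] + c[k][j]
                          for k in range(i + 1, j + 1)) + w[i][j]
    return c[0][n]

print(optimal_bst_cost([4, 2, 6, 3]))  # 26
print(optimal_bst_cost([3, 2, 5]))     # 17
```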
Travelling Salesperson Problem
Mark area H6 because it is the minimum cost area reachable from H1, and then select the minimum
cost area reachable from H6.
Mark area H7 because it is the minimum cost area reachable from H6, and then select the
minimum cost area reachable from H7.
Continuing in the same way, mark H8, H5, H2 and H3.
Mark area H4 and then select the minimum cost area reachable from H4; it is H1. So, using the
greedy strategy, we get the following tour (edge costs shown over the arrows):
H1 →4 H6 →3 H7 →2 H8 →4 H5 →3 H2 →2 H3 →1 H4 →6 H1
Thus the minimum travel cost = 4 + 3 + 2 + 4 + 3 + 2 + 1 + 6 = 25.
Time Complexity
Using the recurrence g(i, S) = min over k in S of { c(i, k) + g(k, S − {k}) }, we can write a
dynamic programming-based solution. There are at most O(n·2^n) subproblems, and each one
takes linear time to solve. The total running time is therefore O(n²·2^n). This is much less
than O(n!) but still exponential. The space required is also exponential, so this approach is
infeasible even for a slightly higher number of vertices. We will soon be discussing
approximate algorithms for the travelling salesman problem. In summary, the dynamic
programming approach breaks the problem into n·2^n subproblems; each subproblem takes
O(n) time, resulting in a time complexity of O(n²·2^n). Here n refers to the number of cities
to be travelled.
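The O(n²·2^n) dynamic programming approach (Held-Karp) can be sketched in Python; the 4-city cost matrix below is a hypothetical example:

```python
from itertools import combinations

def held_karp(dist):
    """Held-Karp DP for TSP; dist is an n x n cost matrix. O(n^2 * 2^n) time."""
    n = len(dist)
    # C[(S, j)] = cost of the cheapest path that starts at city 0, visits
    # every city in set S exactly once, and ends at j (0 is not in S)
    C = {}
    for j in range(1, n):
        C[(frozenset([j]), j)] = dist[0][j]
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in S:
                C[(S, j)] = min(C[(S - {j}, k)] + dist[k][j] for k in S - {j})
    full = frozenset(range(1, n))
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))  # close the tour

dist = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(held_karp(dist))  # 80 (tour 0 -> 1 -> 3 -> 2 -> 0)
```

The dictionary C holds one entry per (subset, end-city) pair, which is where the exponential O(n·2^n) space comes from.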