Analysis of Algorithms
M.SC (C.S)
SEMESTER - I
ANALYSIS OF ALGORITHMS
AND
RESEARCHING COMPUTING
SUBJECT CODE: PSCS101
© UNIVERSITY OF MUMBAI
Published by : Director,
Institute of Distance and Open Learning,
University of Mumbai,
Vidyanagari,Mumbai - 400 098.
CONTENTS
5. Researching Computing....................................................................................... 79
M.SC (C.S) Semester - I
Analysis of Algorithms and Researching Computing
SYLLABUS
Unit IV: Researching Computing
Introduction, purpose and products of research, overview of research process, internet
research, participants and research ethics, reviewing literature, design and creation,
experiments, Quantitative data analysis, presentation of research.
Text book:
Introduction to Algorithms, Third Edition, Thomas H. Cormen, Charles E.
Leiserson, Ronald L. Rivest, Clifford Stein, PHI Learning Pvt. Ltd-New Delhi
(2009).
Researching Information Systems and Computing, Briony J. Oates, Sage
Publications India Pvt Ltd (2006).
References:
Algorithms, Sanjoy Dasgupta , Christos H. Papadimitriou, Umesh Vazirani,
McGraw-Hill Higher Education (2006)
Grokking Algorithms: An illustrated guide for programmers and other curious
people, MEAP, Aditya Bhargava, http://www.manning.com/bhargava
Research Methodology: Methods and Techniques, C.R. Kothari, third
edition, New Age International (2014).
Basics of Qualitative Research (3rd Edition), Juliet Corbin & Anselm Strauss,
Sage Publications (2008).
Chapter 1: Design Strategies - Role of Algorithm
Unit I
1
DESIGN STRATEGIES
ROLE OF ALGORITHM
Unit Structure
1.0 Objective
1.1 Introduction
1.2.1 Algorithms
1.8 Exercises
ANALYSIS OF ALGORITHMS AND RESEARCHING COMPUTING
1.0 Objective
• What is an algorithm?
1.1 Introduction
In this chapter, you will learn what an algorithm is and its importance in computer
technologies, with some suitable examples. We will also get an idea about growth
functions, standard notations, and functions, which are additional interesting
ingredients of algorithm analysis.
1.2.1 Algorithms
[Figure: Input → Algorithm → Output]
For example, if we need to sort numbers in ascending order, then we formulate the
problem as follows: given a sequence of n numbers (a1, a2, ..., an), produce a
reordering (a1', a2', ..., an') such that a1' ≤ a2' ≤ ... ≤ an'.
Now we can take one example to solve this problem with numerical values.
So, the input is (5,2,4,6,1), and after sorting the output should be (1,2,4,5,6). An
input sequence is called an instance of the problem.
An algorithm is correct if, for every input, it halts with the correct output. We
can use a flowchart to represent the sequence of steps in pictorial form.
Characteristics of Algorithm
The algorithm designed is language independent, i.e. it is just a plain set of
instructions that can be implemented in any language, and the output will be
the same, as expected.
• Well-Defined Outputs: The algorithm must clearly define what type
of output will be generated.
• Feasibility: The algorithm must be generic and practical, such that it can
be executed upon available resources.
Consider a scenario in which computers were tremendously fast and computer
memory was free. You would still want to prove that your solution method terminates
with the correct answer. If computers were tremendously fast, any correct method
for solving a problem would give the correct result.
So, computing time is a bounded resource, and so is space in memory. You should
use these resources sensibly or as per your project need, and algorithms that are
efficient in terms of time or space will help you do so.
An algorithm should be effective and correct; selecting the right algorithm is an
art, and for implementation purposes the selection of hardware is also an important
part. Technological changes happen every single day and help us do our work more
effectively. The algorithm deserves similar attention, as a good choice saves you
from reverse engineering or reframing the implementation of a problem. Ultimately,
it helps different technologies, such as web technology, networking, and wired and
wireless networks, to take proper decisions.
The first algorithm is insertion sort. It is just like a card game: you insert a
card between two cards to maintain the sequence. The logic is to arrange a list
in ascending order using insertion sort.
So, firstly we have our numbers in the form of an array of size n, and the numbers
we need to sort are called keys. We can implement it in any language, but in this
chapter we will illustrate it using the C programming language. Insertion sort
proceeds in the following steps –
Pseudo code
Insertion_Sort(Arr)
1 for j = 2 to Arr.length
2     key = Arr[j]
3     // Insert Arr[j] into the sorted sequence Arr[1 .. j-1]
4     i = j - 1
5     while i > 0 and Arr[i] > key
6         Arr[i+1] = Arr[i]
7         i = i - 1
8     Arr[i+1] = key
For example –
Take the array (21, 22, 23, 15, 16). Let us loop for i = 1 (second element of the
array) to 4 (last element of the array).
i = 1. 22 will remain at its position, as 21 is smaller than 22.
21, 22, 23, 15, 16
i = 2. 23 will remain at its position, as all elements in A[0..i-1] are smaller than 23.
21, 22, 23, 15, 16
i = 3. 15 will move to the beginning and all other elements from 21 to 23 will
move one position ahead of their current position.
15, 21, 22, 23, 16
i = 4. 16 will move to position after 15, and elements from 21 to 23 will move
one position ahead of their current position.
15, 16, 21, 22, 23
Program –
int i, key, j;
for (i = 1; i < n; i++)
{
    key = a1[i];
    j = i - 1;
    while (j >= 0 && a1[j] > key)
    {
        a1[j + 1] = a1[j];
        j = j - 1;
    }
    a1[j + 1] = key;
}
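The corrected fragment above can be wrapped into a complete, callable routine. A minimal sketch (the names insertion_sort, a1, and n are assumptions, not from the original text):

```c
/* Sort a1[0..n-1] in ascending order using insertion sort. */
void insertion_sort(int a1[], int n)
{
    int i, key, j;
    for (i = 1; i < n; i++) {
        key = a1[i];      /* the key to insert into the sorted prefix */
        j = i - 1;
        /* shift elements larger than key one position to the right */
        while (j >= 0 && a1[j] > key) {
            a1[j + 1] = a1[j];
            j = j - 1;
        }
        a1[j + 1] = key;  /* drop the key into its final slot */
    }
}
```

Calling insertion_sort on the instance (5,2,4,6,1) from the introduction yields (1,2,4,5,6).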
Pseudocode conventions
1. the body of the for loop that begins on line 1 consists of lines 2–8
2. body of the while loop that begins on line 5 contains lines 6–7 but not line 8.
Indentation indicates block structure; using indentation instead of conventional
indicators of block structure, such as begin and end statements, greatly reduces
clutter while preserving, or even enhancing, clarity. The looping constructs while,
for, and repeat-until and the if-else conditional construct have interpretations
similar to those in C, C++, Java, Python, and Pascal.
Analysing an algorithm means predicting the resources required to solve the
problem statement. Resources can mean computer hardware such as memory,
bandwidth, or the communication channel; occasionally these are the primary
concern, but most often it is computational time that we want to measure.
Generally, we analyse several candidate algorithms and then pick the most efficient
one. It is good practice to check whether we are using the right algorithm. Before
we can analyze an algorithm, we must have a model of the implementation
technology that we will use, including a model for the resources of that technology
and their costs.
So, the following example will give you a detailed idea of how to analyse a
problem and the important role the algorithm plays in solving it.
The following program gives the idea of insertion sort.
int i, key, j;
for (i = 1; i < n; i++)
{
    key = a1[i];
    j = i - 1;
    while (j >= 0 && a1[j] > key)
    {
        a1[j + 1] = a1[j];
        j = j - 1;
    }
    a1[j + 1] = key;
}
Observations are –
1. The total time required depends on the number of inputs. (Sorting 5 numbers
   will take less time as compared to sorting 50 numbers.)
2. If two lists have the same size, but the first list is partially sorted and the
   second is fully unsorted, then the first list will take less time as compared
   to the wholly unsorted list.
3. The time taken by an algorithm grows with the size of the input, so it is
   traditional to describe the running time of a program as a function of the size
   of its input.
4. So, the most important parts of analysis are “running time” and “size of input”.
The main body of the code for bubble sort looks something like this:
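The code itself is lost from the page, but a body consistent with the observations that follow (an outer loop, an inner loop that runs i times, and an O(1) innermost if) can be sketched like this; the array name a and the function wrapper are assumptions:

```c
/* Bubble sort: repeatedly swap adjacent out-of-order elements.
   After each outer pass, the largest remaining element settles at index i. */
void bubble_sort(int a[], int n)
{
    int i, j, temp;
    for (i = n - 1; i > 0; i--) {       /* outer loop */
        for (j = 0; j < i; j++) {       /* inner loop runs i times */
            if (a[j] > a[j + 1]) {      /* O(1) comparison and swap */
                temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}
```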
Observations are -
1. This looks like a double loop. The innermost statement, the if, takes O(1) time.
   It doesn't necessarily take the same time when the condition is true as it does
   when it is false, but both times are bounded by a constant. But there is an
   important difference here.
2. The outer loop executes n times, but the inner loop executes a number of times
   that depends on i. The first time the inner for executes, it runs i = n-1 times.
   The second time it runs n-2 times, etc. The total number of times the inner if
   statement executes is therefore:
   (n-1) + (n-2) + ... + 2 + 1
The value of the sum is n(n-1)/2. So the running time of bubble sort is O(n(n-1)/2),
which is O((n² - n)/2). Using the rules for big-O given earlier, this bound simplifies
to O(n²/2) by ignoring a smaller term, and to O(n²) by ignoring a constant
factor. Thus, bubble sort is an O(n²) algorithm.
3. Transformation
4. Dynamic programming
5. Greedy programming
7. Randomization
For insertion sort, we used an incremental approach: having sorted the subarray
A[1 .. j – 1], we inserted the single element A[j] into its proper place, yielding the
sorted subarray A[1.. j].
Many algorithms follow different design techniques; for example, merge sort and
quick sort have a recursive structure. So, the observations are –
1. Many algorithms are recursive in structure: to solve a given problem, they call
themselves recursively one or more times to deal with closely related
subproblems.
1. Divide the problem into a number of subproblems that are smaller instances of
   the same problem.
2. Conquer the subproblems by solving them recursively.
3. Combine the solutions to the subproblems into the solution for the original
   problem.
For example, merge sort turns the input sequence
5 2 3 1 4 7 9 8 6 10
into the sorted output
1 2 3 4 5 6 7 8 9 10
• Divide: Divide the n-element sequence to be sorted into two subsequences of
  n/2 elements each
• Conquer: Sort the two subsequences recursively using merge sort
• Combine: Merge the two sorted sub-sequences to produce the sorted answer
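The divide, conquer, and combine steps above can be sketched in C as follows (a sketch only; the fixed-size scratch buffer and the helper names are assumptions):

```c
#include <string.h>

/* Combine: merge the two sorted runs a[lo..mid] and a[mid+1..hi]. */
static void merge(int a[], int lo, int mid, int hi)
{
    int tmp[64];                 /* scratch buffer; assumes hi - lo < 64 */
    int i = lo, j = mid + 1, k = 0;
    while (i <= mid && j <= hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi)  tmp[k++] = a[j++];
    memcpy(&a[lo], tmp, k * sizeof(int));
}

/* Divide at the midpoint, conquer each half recursively, then combine. */
void merge_sort(int a[], int lo, int hi)
{
    if (lo >= hi)                /* zero or one element: already sorted */
        return;
    int mid = (lo + hi) / 2;
    merge_sort(a, lo, mid);      /* conquer the left half */
    merge_sort(a, mid + 1, hi);  /* conquer the right half */
    merge(a, lo, mid, hi);       /* combine */
}
```

On the example sequence 5 2 3 1 4 7 9 8 6 10 this produces 1 through 10 in order.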
Growth function gives a simple description of the algorithm’s efficiency and also
allows us to compare the relative performance of alternative algorithms. Following
steps need to consider –
1. Once the input size n becomes large enough, merge sort, with its Θ(n lg n)
   worst-case running time, beats insertion sort, whose worst-case running time is
   Θ(n²).
2. The extra precision is not usually worth the effort of computing it.
3. For large enough inputs, the multiplicative constants and lower-order terms of
an exact running time are dominated by the effects of the input size itself.
4. When we look at input sizes large enough to make only the order of growth of
the running time relevant, we are studying the asymptotic efficiency of
algorithms.
That is, we are concerned with how the running time of an algorithm increases with
the size of the input in the limit, as the size of the input increases without bound.
Usually, an algorithm that is asymptotically more efficient will be the best choice
for all but very small inputs.
Complexity of Algorithms
The complexity of an algorithm M is the function f(n) which gives the running time
and/or storage space requirement of the algorithm in terms of the size n of the input
data. The storage space required by an algorithm is often simply a multiple of the
data size n.
Here, complexity shall refer to the running time of the algorithm. The function f(n),
giving the running time of an algorithm, depends not only on the size n of the input
data but also on the particular data.
1. Best Case : The minimum possible value of f(n) is called the best case.
2. Average Case : The expected value of f(n) over all possible inputs of size n.
3. Worst Case : The maximum value of f(n) for any possible input.
The notations we use to define the asymptotic running time of an algorithm are
defined in terms of functions whose domains are the set of natural numbers
N={0,1,2,…}.
Such notations are suitable for describing the worst-case running-time function
T(n), which frequently is defined only on integer input sizes. We sometimes find it
convenient, however, to abuse asymptotic notation in a variety of ways.
For example, we might extend the notation to the domain of real numbers or,
alternatively, restrict it to a subset of the natural numbers. We should make sure,
however, to understand the precise meaning of the notation so that when we abuse,
we do not misuse it.
This section defines the basic asymptotic notations and also presents some common
abuses. The following notations are commonly used in performance analysis to
characterize the complexity of an algorithm:
1. Big–OH (O),
2. Big–OMEGA (Ω),
3. Big–THETA (θ),
4. Little–OH (o)
1. Big–OH (O)
f(n) = O(g(n)) (pronounced big-oh), says that the growth rate of f(n) is less
than or equal to (≤) that of g(n).
2. Big–OMEGA (Ω)
f(n) = Ω(g(n)) (pronounced omega), says that the growth rate of f(n) is greater
than or equal to (≥) that of g(n).
3. Big–THETA (θ)
f(n) = θ(g(n)) (pronounced theta), says that the growth rate of f(n) equals (=) the
growth rate of g(n) [if f(n) = O(g(n)) and f(n) = Ω(g(n))].
4. Little–OH (o)
f(n) = o(g(n)) (pronounced little oh), says that the growth rate of f(n) is less
than the growth rate of g(n) [if f(n) = O(g(n)) and f(n) ≠ θ(g(n))].
Monotonicity
1. A function f(n) is monotonically increasing if m ≤ n implies f(m) ≤ f(n).
2. A function f(n) is monotonically decreasing if m ≤ n implies f(m) ≥ f(n).
3. A function f(n) is strictly increasing if m < n implies f(m) < f(n) and strictly
   decreasing if m < n implies f(m) > f(n).
For any real number x, we denote the greatest integer less than or equal
to x by ⌊x⌋ (read "the floor of x") and the least integer greater than or equal
to x by ⌈x⌉ (read "the ceiling of x").
Both functions satisfy
x - 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1.
(3.3)
For any integer n,
⌈n/2⌉ + ⌊n/2⌋ = n,
and for any real number n ≥ 0 and integers a, b > 0,
⌈⌈n/a⌉/b⌉ = ⌈n/(ab)⌉,
(3.4)
⌊⌊n/a⌋/b⌋ = ⌊n/(ab)⌋,
(3.5)
⌈a/b⌉ ≤ (a + (b - 1))/b,
(3.6)
⌊a/b⌋ ≥ (a - (b - 1))/b.
(3.7)
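Because C integer division truncates toward zero, the identity ⌈n/2⌉ + ⌊n/2⌋ = n can be checked for non-negative n with two one-line helpers (a sketch; the helper names are assumptions):

```c
/* Floor and ceiling of n/2 for non-negative integers n,
   using the fact that C integer division truncates toward zero. */
int floor_half(int n) { return n / 2; }
int ceil_half(int n)  { return (n + 1) / 2; }
```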
Modular arithmetic
For any integer a and any positive integer n, the value a mod n is
the remainder (or residue) of the quotient a/n:
a mod n = a - n⌊a/n⌋.
(3.8)
Given a precise notion of the remainder of one integer when divided by another, it
is convenient to provide special notation to specify equality of remainders.
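Note that C's % operator yields a remainder with the sign of the dividend, which for negative a differs from the non-negative residue of equation (3.8). A common adjustment (a sketch; the function name is an assumption):

```c
/* Non-negative residue a mod n for n > 0, per a mod n = a - n*floor(a/n).
   C's built-in % truncates toward zero, so negative results are shifted up. */
int mod(int a, int n)
{
    int r = a % n;
    return (r < 0) ? r + n : r;
}
```

For example, -7 mod 3 is 2 under this definition, while -7 % 3 in C is -1.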
Polynomials
Given a nonnegative integer d, a polynomial in n of degree d is a function p(n) of
the form
p(n) = Σ (i = 0 to d) a_i n^i,
• where the constants a0, a1, ..., ad are the coefficients of the polynomial
  and ad ≠ 0. A polynomial is asymptotically positive if and only if ad > 0.
• For any real constant a ≥ 0, the function n^a is monotonically increasing, and for
  any real constant a ≤ 0, the function n^a is monotonically decreasing.
We say that a function f(n) is polynomially bounded if f(n) = O(nk) for some
constant k.
Exponentials
For all real a > 0, m, and n, we have the following identities:
a^0 = 1,
a^1 = a,
a^(-1) = 1/a,
(a^m)^n = a^(mn),
(a^m)^n = (a^n)^m,
a^m a^n = a^(m+n).
For all real constants a > 1 and b, we have
lim (n→∞) n^b / a^n = 0,
from which we can conclude that n^b = o(a^n).
(3.9)
Thus, any exponential function with a base strictly greater than 1 grows faster than
any polynomial function.
Using e to denote 2.71828..., the base of the natural logarithm function, we have
for all real x,
e^x = 1 + x + x²/2! + x³/3! + ··· = Σ (i = 0 to ∞) x^i / i!,
(3.10)
where "!" denotes the factorial function defined later in this section. For all real x,
we have the inequality
e^x ≥ 1 + x,
(3.11)
where equality holds only when x = 0. When |x| ≤ 1, we have the approximation
1 + x ≤ e^x ≤ 1 + x + x².
(3.12)
When x → 0, the approximation of e^x by 1 + x is quite good:
e^x = 1 + x + Θ(x²).
(In this equation, the asymptotic notation is used to describe the limiting behavior
as x → 0 rather than as x → ∞.) We have for all x,
lim (n→∞) (1 + x/n)^n = e^x.
(3.13)
Logarithms
We shall use the following notations:
lg n = log₂ n (binary logarithm),
ln n = log_e n (natural logarithm),
lg^k n = (lg n)^k (exponentiation),
lg lg n = lg(lg n) (composition).
For all real a > 0, b > 0, c > 0, and n, where the logarithm bases are not 1:
a = b^(log_b a),
log_c(ab) = log_c a + log_c b,
log_b a^n = n log_b a,
log_b a = log_c a / log_c b,
(3.14)
log_b(1/a) = -log_b a,
log_b a = 1/(log_a b),
a^(log_b c) = c^(log_b a).
(3.15)
By equation (3.14), changing the base of a logarithm from one constant to another
only changes the value of the logarithm by a constant factor, and so we shall often
use the notation "lg n" when we don't care about constant factors, such as in O-
notation. Computer scientists find 2 to be the most natural base for logarithms
because so many algorithms and data structures involve splitting a problem into
two parts.
For any constant b,
lg^b n = o(n^a)
(3.16)
for any constant a > 0. Thus, any positive polynomial function grows faster than
any polylogarithmic function.
Factorials
The notation n! (read "n factorial") is defined for integers n ≥ 0 as n! = 1 if
n = 0 and n! = n · (n - 1)! if n > 0. Thus, n! = 1 · 2 · 3 ⋯ n.
A weak upper bound on the factorial function is n! ≤ n^n, since each of the n terms
in the factorial product is at most n. Stirling's approximation,
n! = √(2πn) (n/e)^n (1 + Θ(1/n)),
(3.17)
where e is the base of the natural logarithm, gives us a tighter upper bound, and a
lower bound as well. In particular,
n! = o(n^n), n! = ω(2^n), and lg(n!) = Θ(n lg n).
(3.18)
The following equation also holds for all n ≥ 1:
n! = √(2πn) (n/e)^n e^(α_n)
(3.19)
where
1/(12n + 1) < α_n < 1/(12n).
(3.20)
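The recursive definition and the weak bound n! ≤ n^n can be checked directly for small n (a sketch; unsigned long long holds n! exactly only up to n = 20):

```c
/* n! computed iteratively from the definition n! = 1 * 2 * ... * n.
   Exact for n <= 20 in an unsigned long long. */
unsigned long long factorial(unsigned n)
{
    unsigned long long f = 1ULL;
    unsigned i;
    for (i = 2; i <= n; i++)
        f *= i;
    return f;
}
```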
Functional iteration
We use the notation f^(i)(n) to denote the function f(n) iteratively applied i times to
an initial value of n. Formally, let f(n) be a function over the reals. For nonnegative
integers i, we recursively define
f^(0)(n) = n,
f^(i)(n) = f(f^(i-1)(n)) for i > 0.
We use the notation lg* n (read "log star of n") to denote the iterated logarithm,
which is defined as follows. Let lg(i) n be as defined above, with f(n) = lg n. Because
the logarithm of a nonpositive number is undefined, lg^(i) n is defined only if
lg^(i-1) n > 0. Be sure to distinguish lg^(i) n (the logarithm function applied i times in
succession, starting with argument n) from lg^i n (the logarithm of n raised to the ith
power). The iterated logarithm function is defined as
lg* n = min {i ≥ 0 : lg^(i) n ≤ 1}.
The iterated logarithm is a very slowly growing function:
lg* 2 = 1,
lg* 4 = 2,
lg* 16 = 3,
lg* 65536 = 4,
lg*(2^65536) = 5.
Since the number of atoms in the observable universe is estimated to be about 10^80,
which is much less than 2^65536, we rarely encounter an input size n such that
lg* n > 5.
Fibonacci numbers
We define the Fibonacci numbers by the recurrence
F₀ = 0, F₁ = 1, F_i = F_(i-1) + F_(i-2) for i ≥ 2.
(3.21)
Thus, each Fibonacci number is the sum of the two previous ones, yielding the
sequence
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ....
Fibonacci numbers are related to the golden ratio φ and to its conjugate φ̂, which
are given by the following formulas:
φ = (1 + √5)/2 = 1.61803...,
φ̂ = (1 - √5)/2 = -0.61803....
(3.22)
Specifically, we have
F_i = (φ^i - φ̂^i)/√5.
(3.23)
2. https://www.javatpoint.com/algorithms-and-functions
4. http://serverbob.3x.ro/IA/DDU0020.html#!/stealingyourhistory
1.8 Exercises
v[i] = v[i] + 1;
C[i, j] = 0;
4. What is the smallest value of n such that an algorithm whose running time is
   100n² runs faster than an algorithm whose running time is 2^n on the same
   machine?
Chapter 2: Divide and Conquer and Randomized Algorithm
Unit I
2
DESIGN STRATEGIES
DIVIDE AND CONQUER AND
RANDOMIZED ALGORITHM
Unit Structure
2.0 Objective
2.1 Introduction
2.2 Divide-and-Conquer
2.2.1 The maximum-subarray problem
2.2.2 Strassen’s algorithm for matrix multiplication
2.2.3 The substitution method for solving recurrences
2.6 Bibliography
2.7 Exercise
2.0 Objective
2.1 Introduction
2.2 Divide-and-Conquer
Divide: Divide the problem into subproblems; the subproblems are solved
recursively.
Conquer: The solution to the original problem is then formed from the solutions to
the subproblems (patching together the answers).
Combine: Combine the solutions to the subproblems into the solution for the
original problem.
Traditionally, routines in which the text contains at least two recursive calls are
called divide and conquer algorithms, while routines whose text contains only one
recursive call are not.
Recurrences
A recurrence is an equation or inequality that describes a function in
terms of its value on smaller inputs. For example, the worst-case running time T(n)
of the MERGE-SORT procedure can be described by the recurrence
T(n) = Θ(1) if n = 1, and T(n) = 2T(n/2) + Θ(n) if n > 1,
whose solution is T(n) = Θ(n lg n). There are three methods for solving recurrences:
1. In the substitution method, we guess a bound and then use mathematical
   induction to prove our guess correct.
2. The recursion-tree method converts the recurrence into a tree whose nodes
represent the costs incurred at various levels of the recursion. We use
techniques for bounding summations to solve the recurrence.
3. The master method provides bounds for recurrences of the form
   T(n) = aT(n/b) + f(n), where a ≥ 1, b > 1, and f(n) is a given function. It is
   used to determine the running times of the divide-and-conquer algorithms for
   the maximum-subarray problem and for matrix multiplication.
The maximum-subarray problem asks for a contiguous subarray whose sum is
largest. If all the elements are non-negative, the whole array is the answer. But the
problem gets more interesting when some of the elements are negative: then the
subarray whose sum is largest over all subarrays of the array can lie anywhere in
the array.
The brute-force technique to solve the problem is simple: just iterate through every
element of the array and check the sum of all the subarrays that can be made starting
from that element, i.e., check all the subarrays. This can be done in C(n, 2) ways,
i.e., by choosing two different boundaries of the array to make a subarray. Thus, the
brute-force technique takes Θ(n²) time. However, we can solve this in Θ(n log n)
time using divide and conquer.
As we know that the divide and conquer solves a problem by breaking into
subproblems, so let's first break an array into two parts. Now, the subarray with
maximum sum can either lie entirely in the left subarray or entirely in the right
subarray or in a subarray consisting both i.e., crossing the middle element.
The first two cases where the subarray is entirely on right or on the left are
actually the smaller instances of the original problem. So, we can solve them
recursively by calling the function to calculate the maximum sum subarray on
both the parts.
Max_sum_subarray(array, low, high)
{
    if (high == low) // only one element in an array
    {
        return array[high]
    }
    mid = (high + low)/2
    left_sum = Max_sum_subarray(array, low, mid)
    right_sum = Max_sum_subarray(array, mid+1, high)
}
Now, we have to handle the third case, i.e., when the subarray with the maximum
sum contains both the right and the left subarrays (containing the middle
element). At a glance, this could look like a smaller instance of the original
problem as well, but it is not, because it carries a restriction: the subarray must
contain the middle element. This makes the problem much narrower and less
time-consuming.
if (high == low)
    return array[high];
mid = (high + low)/2;
left_sum = Max_sum_subarray(array, low, mid);
right_sum = Max_sum_subarray(array, mid+1, high);
cross_sum = Max_crossing_sum(array, low, mid, high);
return max(left_sum, right_sum, cross_sum);
Here, we are covering all three cases mentioned above and then just returning
the maximum of these three.
left_sum = -infinity
sum = 0
for i = mid downto low
    sum = sum + ar[i]
    if (sum > left_sum)
        left_sum = sum
right_sum = -infinity
sum = 0
for i = mid+1 to high
    sum = sum + ar[i]
    if (sum > right_sum)
        right_sum = sum
return (left_sum + right_sum)
Here, our first loop iterates from the middle element down to the lowest element of
the left subarray to find the maximum sum, and similarly the second loop iterates
from the (middle+1)th element up to the highest element of the subarray to calculate
the maximum sum of the subarray on the right side. Finally, we sum both of them
and return the result, which is the maximum sum over subarrays crossing the
middle element.
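Putting the three cases together, the whole divide-and-conquer algorithm might look like this in C (a sketch; INT_MIN stands in for -infinity, and the function names are assumptions):

```c
#include <limits.h>

/* Maximum sum of a subarray of a[low..high] that crosses the midpoint. */
static int max_crossing_sum(const int a[], int low, int mid, int high)
{
    int left_sum = INT_MIN, right_sum = INT_MIN, sum = 0, i;
    for (i = mid; i >= low; i--) {       /* best run ending at mid */
        sum += a[i];
        if (sum > left_sum) left_sum = sum;
    }
    sum = 0;
    for (i = mid + 1; i <= high; i++) {  /* best run starting at mid+1 */
        sum += a[i];
        if (sum > right_sum) right_sum = sum;
    }
    return left_sum + right_sum;
}

/* Maximum subarray sum of a[low..high]: max of left, right, and crossing cases. */
int max_sum_subarray(const int a[], int low, int high)
{
    int mid, left, right, cross;
    if (high == low)                     /* only one element */
        return a[low];
    mid = (high + low) / 2;
    left  = max_sum_subarray(a, low, mid);
    right = max_sum_subarray(a, mid + 1, high);
    cross = max_crossing_sum(a, low, mid, high);
    if (left >= right && left >= cross) return left;
    if (right >= left && right >= cross) return right;
    return cross;
}
```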
The conventional algorithm multiplies two n × n matrices with three nested loops:
for i := 1 to n do
    for j := 1 to n do
    {
        c[i, j] := 0;
        for k := 1 to n do
            c[i, j] := c[i, j] + a[i, k] * b[k, j];
    }
This leads to T(n) = O(n³), where n is a power of 2. Strassen's insight was to
find an alternative method for calculating the c[i, j], requiring seven (n/2) × (n/2)
matrix multiplications and eighteen (n/2) × (n/2) matrix additions and subtractions.
Since this method is used recursively to perform the seven (n/2) × (n/2) matrix
multiplications, the recurrence equation for the number of scalar multiplications
performed is T(n) = 7T(n/2), T(1) = 1, whose solution is T(n) = n^(log₂ 7) ≈ n^2.81.
So, we conclude that Strassen's algorithm is asymptotically more efficient than the
standard algorithm. In practice, the overhead of managing the many small matrices
does not pay off until n reaches the hundreds.
In the iterative substitution method, we apply the merge-sort recurrence
T(n) = 2T(n/2) + b n (with T(1) = b) to itself repeatedly:
T(n) = 2T(n/2) + b n
     = 2(2T(n/2²) + b(n/2)) + b n = 2² T(n/2²) + 2 b n
     = 2³ T(n/2³) + 3 b n
     ...
The hope in applying the iterative substitution method is that, at some point, we
will see a pattern that can be converted into a general closed-form equation (with
T only appearing on the left-hand side). In the case of merge-sort recurrence
equation, the general form is:
T(n) = 2^i T(n/2^i) + i b n
Note that the general form of this equation shifts to the base case, T(n) = b, where
n = 2^i, that is, when i = log n, which implies:
T(n) = b n + b n log n.
In other words, T(n) is O(n log n). In a general application of the iterative
substitution technique, we hope that we can determine a general pattern for T(n)
and that we can also figure out when the general form of T(n) shifts to the base
case.
Suppose that you need to hire a new office assistant. Your previous attempts at
hiring have been unsuccessful, and you decide to use an employment agency. The
employment agency sends you one candidate each day. You interview that person
and then decide either to hire that person or not. You must pay the employment
agency a small fee to interview an applicant. To actually hire an applicant is more
costly, however, since you must fire your current office assistant and pay a
substantial hiring fee to the employment agency. You are committed to having, at
all times, the best possible person for the job. Therefore, you decide that, after
interviewing each applicant, if that applicant is better qualified than the current
office assistant, you will fire the current office assistant and hire the new applicant.
You are willing to pay the resulting price of this strategy, but you wish to estimate
what that price will be. The procedure HIRE-ASSISTANT, given below, expresses
this strategy for hiring in pseudocode. It assumes that the candidates for the office
assistant job are numbered 1 through n. The procedure assumes that you are able
to, after interviewing candidate i, determine whether candidate i is the best
candidate you have seen so far. To initialize, the procedure creates a dummy
candidate, numbered 0, who is less qualified than each of the other candidates.
HIRE-ASSISTANT(n)
1 best = 0 // candidate 0 is a least-qualified dummy candidate
2 for i = 1 to n
3     interview candidate i
4     if candidate i is better than candidate best
5         best = i
6         hire candidate i
We focus not on the running time of HIRE-ASSISTANT, but instead on the costs
incurred by interviewing and hiring. On the surface, analyzing the cost of this
algorithm may seem very different from analyzing the running time of, say, merge
sort. The analytical techniques used, however, are identical whether we are
analyzing cost or running time. In either case, we are counting the number of times
certain basic operations are executed.
Interviewing has a low cost, say c_i, whereas hiring is expensive, costing c_h. Letting
m be the number of people hired, the total cost associated with this algorithm is
O(c_i n + c_h m). No matter how many people we hire, we always interview n
candidates and thus always incur the cost c_i n associated with interviewing. We
therefore concentrate on analyzing c_h m, the hiring cost. This quantity varies with
each run of the algorithm. This scenario serves as a model for a common
computational paradigm. We often need to find the maximum or minimum value
in a sequence by examining each element of the sequence and maintaining a current
“winner.” The hiring problem models how often we update our notion of which
element is currently winning.
Worst-case analysis
In the worst case, we actually hire every candidate that we interview. This situation
occurs if the candidates come in strictly increasing order of quality, in which case
we hire n times, for a total hiring cost of O(c_h n). Of course, the candidates do not
always come in increasing order of quality. In fact, we have no idea about the order
in which they arrive, nor do we have any control over this order. Therefore, it is
natural to ask what we expect to happen in a typical or average case.
Probabilistic analysis
Randomized algorithms
In order to design a randomized algorithm for the hiring problem, we must have
greater control over the order in which we interview the candidates. We will,
therefore, change the model slightly.
We say that the employment agency has n candidates, and they send us a list of the
candidates in advance. On each day, we choose, randomly, which candidate to
interview. Although we know nothing about the candidates (besides their names),
we have made a significant change. Instead of relying on a guess that the candidates
come to us in a random order, we have instead gained control of the process and
enforced a random order. More generally, we call an algorithm randomized if its
behavior is determined not only by its input but also by values produced by a
random-number generator. We shall assume that we have at our disposal a random-
number generator RANDOM. A call to RANDOM(a,b) returns an integer between
a and b, inclusive, with each such integer being equally likely. For example,
RANDOM(0,1) produces 0 with probability 1/2, and it produces 1 with probability
1/2. A call to RANDOM(3,7) returns either 3, 4, 5, 6, or 7, each with probability
1/5. Each integer returned by RANDOM is independent of the integers returned on
previous calls. You may imagine RANDOM as rolling a (b - a + 1)-sided die to obtain
its output. (In practice, most programming environments offer a pseudorandom-
number generator: a deterministic algorithm returning numbers that “look”
statistically random.) When analyzing the running time of a randomized algorithm,
we take the expectation of the running time over the distribution of values returned
by the random number generator. We distinguish these algorithms from those in
which the input is random by referring to the running time of a randomized
algorithm as an expected running time. In general, we discuss the average-case
running time when the probability distribution is over the inputs to the algorithm,
and we discuss the expected running time when the algorithm itself makes random
choices.
In order to analyze many algorithms, including the hiring problem, we use indicator
random variables. Indicator random variables provide a convenient method for
converting between probabilities and expectations. Suppose we are given a sample
space S and an event A. Then the indicator random variable I{A} associated with
event A is defined as
I{A} = 1 if A occurs, 0 if A does not occur.
As a simple example, let us determine the expected number of heads that we obtain
when flipping a fair coin. Our sample space is S = {H, T}, with Pr{H} = Pr{T}=1/2.
We can then define an indicator random variable X_H, associated with the coin
coming up heads, i.e., with the event H. This variable counts the number of heads
obtained in this flip, and it is 1 if the coin comes up heads and 0 otherwise. We
write
X_H = I{H} = 1 if H occurs, 0 if T occurs.
The expected number of heads obtained in one flip of the coin is simply the
expected value of our indicator variable X_H:
E[X_H] = E[I{H}] = 1 · Pr{H} + 0 · Pr{T} = 1/2.
Thus the expected number of heads obtained by one flip of a fair coin is 1/2. As the
following lemma shows, the expected value of an indicator random variable
associated with an event A is equal to the probability that A occurs.
In the previous section, we showed how knowing a distribution on the inputs can
help us to analyze the average-case behavior of an algorithm. Many times, we do
not have such knowledge, thus precluding an average-case analysis. As mentioned
in Section 5.1, we may be able to use a randomized algorithm. For a problem such
as the hiring problem, in which it is helpful to assume that all permutations of the
input are equally likely, a probabilistic analysis can guide the development of a
randomized algorithm. Instead of assuming a distribution of inputs, we impose a
distribution. In particular, before running the algorithm, we randomly permute the
candidates in order to enforce the property that every permutation is equally likely.
Although we have modified the algorithm, we still expect to hire a new office
assistant approximately ln n times. But now we expect this to be the case for any
input, rather than for inputs drawn from a particular distribution.
With the deterministic HIRE-ASSISTANT, running the algorithm on a given input
always hires the same number of times. Furthermore, the number of times we hire
a new office assistant differs for different inputs, and it depends on the ranks of the
various candidates. Since this number depends only on the ranks of the candidates,
we can represent a particular input by listing, in order, the ranks of the candidates,
i.e., (rank(1), rank(2), …, rank(n)). Given the rank list A1 = (1,2,3,4,5,6,7,8,9,10), a new office
assistant is always hired 10 times, since each successive candidate is better than the
previous one, and lines 5–6 are executed in each iteration. Given the list of ranks
A2 =(10,9,8,7,6,5,4,3,2,1), a new office assistant is hired only once, in the first
iteration. Given a list of ranks A3=(5, 2, 1, 8, 4, 7, 10, 9, 3, 6), a new office assistant
is hired three times, upon interviewing the candidates with ranks 5, 8, and 10.
Recalling that the cost of our algorithm depends on how many times we hire a new
office assistant, we see that there are expensive inputs such as A1, inexpensive
inputs such as A2, and moderately expensive inputs such as A3. Consider, on the
other hand, the randomized algorithm that first permutes the candidates and then
determines the best candidate. In this case, we randomize in the algorithm, not in
the input distribution. Given a particular input, say A3 above, we cannot say how
many times the maximum is updated, because this quantity differs with each run of
the algorithm. The first time we run the algorithm on A3, it may produce the
permutation A1 and perform 10 updates; but the second time we run the algorithm,
we may produce the permutation A2 and perform only one update. The third time
we run it, we may perform some other number of updates. Each time we run the
algorithm, the execution depends on the random choices made and is likely to differ
from the previous execution of the algorithm. For this algorithm and many other
randomized algorithms, no particular input elicits its worst-case behavior. Even
your worst enemy cannot produce a bad input array, since the random permutation
makes the input order irrelevant. The randomized algorithm performs badly only if
the random-number generator produces an “unlucky” permutation. For the hiring
problem, the only change needed in the code is to randomly permute the array.
RANDOMIZED-HIRE-ASSISTANT(n)
  randomly permute the list of candidates
  best = 0    // candidate 0 is a least-qualified dummy candidate
  for i = 1 to n
    interview candidate i
    if candidate i is better than candidate best
      best = i
      hire candidate i
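The pseudocode above can be sketched in Python; the function name, the use of rank values, and returning the hire count (the quantity analysed in the text) are illustrative assumptions:

```python
import random

def randomized_hire_assistant(ranks):
    """Randomly permute the candidates, then hire each candidate who is
    better than every candidate interviewed so far.
    Returns the number of hires."""
    candidates = list(ranks)
    random.shuffle(candidates)       # impose a uniform random permutation
    best = 0                         # rank 0 is the least-qualified dummy
    hires = 0
    for rank in candidates:          # interview candidate after candidate
        if rank > best:
            best = rank              # this candidate is the best seen so far
            hires += 1               # ... so we hire him or her
    return hires

# For n = 10 the expected number of hires is approximately ln 10
print(randomized_hire_assistant([5, 2, 1, 8, 4, 7, 10, 9, 3, 6]))
```

Because of the random permutation, the printed count varies from run to run, but for any input of n distinct ranks it always lies between 1 and n.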
1. https://www.codesdope.com/blog/article/maximum-subarray-sum-using-divide-and-conquer/
2. https://www.javatpoint.com/algorithms-and-functions
2.6 Exercise
3. Describe the concept of a randomized algorithm and write an algorithm for the
same.
Chapter 3: Advanced Design And Analysis Techniques - Role Of Algorithm
Unit II
3
ADVANCED DESIGN AND ANALYSIS
TECHNIQUES
ROLE OF ALGORITHM
Unit Structure
3.0 Objective
3.1 Introduction
3.2 Dynamic Programming
3.2.1 Rod cutting
3.2.2 Elements of dynamic programming
3.2.3 Longest common subsequence
3.3 Greedy algorithm
3.3.1 An activity-selection problem
3.3.2 Elements of the greedy strategy
3.3.3 Huffman codes
3.4 Elementary Graph Algorithms
3.4.1 Representations of graphs
3.4.2 Breadth-first search
3.4.3 Depth-first search
3.5 Minimum Spanning Trees
3.5.1 Growing a minimum spanning tree
3.5.2 Algorithms of Kruskal and Prim
3.6 Single-Source Shortest Paths
3.6.1 The Bellman-Ford algorithm
3.6.2 Single-source shortest paths in directed acyclic graphs
3.6.3 Dijkstra’s algorithm
3.7 Let us Sum Up
3.8 List of References
3.9 Exercises
3.0 Objective
3.1 Introduction
3.2 Dynamic Programming
3.2.1 Rod Cutting
Given a rod of length n inches and a table of prices for all piece sizes from 1 to n,
determine the maximum value obtainable by cutting up the rod and selling the
pieces. For example, if the length of the rod is 8 and the values of the different
pieces are as follows, then the maximum obtainable value is 22 (by cutting the rod
into two pieces of lengths 2 and 6):
length | 1 2 3 4 5 6 7 8
--------------------------------------------
price | 1 5 8 9 10 17 17 20
And if the prices are as follows, then the maximum obtainable value is 24 (by
cutting the rod into eight pieces of length 1):
length | 1 2 3 4 5 6 7 8
--------------------------------------------
price | 3 5 8 9 10 17 17 20
1) Optimal Substructure:
We can get the best price by making a cut at different positions and
comparing the values obtained after a cut. We can recursively call the
same function for a piece obtained after a cut.
Let cutRod(n) be the best possible price obtainable for a rod of
length n, where price[i] is the price of a piece of length i + 1. Then cutRod(n) can be written as follows:
cutRod(n) = max(price[i] + cutRod(n - i - 1)) for all i in {0, 1, …, n - 1}
2) Overlapping Subproblems
A simple recursive implementation of the rod-cutting problem
simply follows this recursive structure.
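The recursive implementation mentioned above can be sketched in Python (the 0-indexed price list is an assumption consistent with the recurrence):

```python
def cut_rod(price, n):
    """Naive recursion: cutRod(n) = max(price[i] + cutRod(n - i - 1)).
    price[i] is the price of a piece of length i + 1.
    Exponential time, because the same subproblems recur many times."""
    if n <= 0:
        return 0
    best = 0
    for i in range(n):                  # first piece has length i + 1
        best = max(best, price[i] + cut_rod(price, n - i - 1))
    return best

# Prices from the text: the best value for a rod of length 8 is 22 (2 + 6)
print(cut_rod([1, 5, 8, 9, 10, 17, 17, 20], 8))   # → 22
```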
2. Table Structure: After solving the sub-problems, store the results of the
sub-problems in a table. This is done because subproblem solutions are
reused many times, and we do not want to repeatedly solve the same
problem over and over again.
Bottom-up dynamic programming means:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of the optimal solution. As in divide and
conquer, divide the problem into two or more optimal parts recursively.
This helps to determine what the solution will look like.
3. Compute the value of the optimal solution from the bottom up (starting with
the smallest subproblems).
4. Construct the optimal solution for the entire problem from the computed
values of smaller subproblems.
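The steps above can be sketched for rod cutting as a bottom-up table computation:

```python
def bottom_up_cut_rod(price, n):
    """Solve subproblems smallest-first, storing each answer in a table
    so every subproblem is computed exactly once (O(n^2) time)."""
    r = [0] * (n + 1)             # r[j] = best revenue for a rod of length j
    for j in range(1, n + 1):
        best = 0
        for i in range(j):        # first piece has length i + 1
            best = max(best, price[i] + r[j - i - 1])
        r[j] = best
    return r[n]

print(bottom_up_cut_rod([1, 5, 8, 9, 10, 17, 17, 20], 8))   # → 22
```

It returns the same values as the naive recursion, but in polynomial rather than exponential time.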
The longest common subsequence (LCS) problem is to find the longest subsequence
that exists in both of the given sequences.
Subsequence
A sequence Z = <z1, z2, z3, z4, …, zm> over S is called a subsequence of S if and
only if it can be derived from S by deleting some elements, without changing the
order of the remaining elements.
Common Subsequence
Suppose, X and Y are two sequences over a finite set of elements. We can say
that Z is a common subsequence of X and Y, if Z is a subsequence of both X and Y.
If a set of sequences are given, the longest common subsequence problem is to find
a common subsequence of all the sequences that is of maximal length.
Algorithm
LCS-LENGTH takes two sequences X = (x1, x2,…, xm) and Y = (y1, y2,…,yn)
as inputs. It stores the c[i, j] values in a table c[0..m, 0..n] and it computes the
entries in row-major order. (That is, the procedure fills in the first row of c from
left to right, then the second row, and so on.) The procedure also maintains the
table b[1..m,1..n] to help us construct an optimal solution. Intuitively, b[i, j] points
to the table entry corresponding to the optimal subproblem solution chosen when
computing c[i, j]. The procedure returns the b and c tables; c[m,n] contains the
length of an LCS of X and Y.
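A sketch of the c-table computation in Python (the b table of arrows is omitted here for brevity, since only the length is returned):

```python
def lcs_length(X, Y):
    """Fill c[0..m][0..n] in row-major order;
    c[i][j] is the length of an LCS of X[:i] and Y[:j]."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1     # extend a common subsequence
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]

print(lcs_length("ABCBDAB", "BDCABA"))   # → 4
```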
We now consider the problem of scheduling several competing activities that require
exclusive use of a common resource, with the goal of selecting a maximum-size set
of mutually compatible activities. Suppose we have a set S = {a1, a2, …, an} of n
proposed activities that wish to use a resource, such as a lecture hall, which can
serve only one activity at a time. Each activity ai has a start time si and a finish
time fi, where 0 ≤ si < fi < ∞. If selected, activity ai takes place during the
half-open time interval [si, fi). Activities ai and aj are compatible if the intervals
[si, fi) and [sj, fj) do not overlap. That is, ai and aj are compatible if si ≥ fj or
sj ≥ fi. In the activity-selection problem, we wish to select a maximum-size subset
of mutually compatible activities. We assume that the activities are sorted in
monotonically increasing order of finish time:
i 1 2 3 4 5 6 7 8 9 10 11
si 1 3 0 5 3 5 6 8 8 2 12
fi 4 5 6 7 9 9 10 11 12 14 16
For this example, the subset {a3, a9, a11} consists of mutually compatible
activities. It is not a maximum subset, however, since the subset {a1, a4, a8, a11}
is larger. In fact, {a1, a4, a8, a11} is a largest subset of mutually compatible
activities; another largest subset is {a2, a4, a9, a11}.
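The greedy strategy of always taking the compatible activity with the earliest finish time can be sketched on the table above (indices are 0-based in the code, 1-based in the text):

```python
def greedy_activity_select(s, f):
    """Activities are assumed sorted by finish time. Repeatedly pick the
    next activity whose start time is >= the last chosen finish time."""
    selected = [0]                    # the first activity is always chosen
    last_finish = f[0]
    for i in range(1, len(s)):
        if s[i] >= last_finish:       # compatible with everything chosen
            selected.append(i)
            last_finish = f[i]
    return selected

s = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
f = [4, 5, 6, 7, 9, 9, 10, 11, 12, 14, 16]
print(greedy_activity_select(s, f))   # → [0, 3, 7, 10]
```

The result [0, 3, 7, 10] is exactly {a1, a4, a8, a11} in the text's 1-based numbering.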
1. Determine the optimal substructure of the problem.
2. Develop a recursive solution.
3. Show that if we make the greedy choice, then only one subproblem remains.
4. Prove that it is always safe to make the greedy choice. (Steps 3 and 4 can
occur in either order.)
The 0-1 knapsack problem is the following. A thief robbing a store finds n items.
The ith item is worth vi dollars and weighs wi pounds, where vi and wi are integers.
The thief wants to take as valuable a load as possible, but he can carry at most W
pounds in his knapsack, for some integer W. Which items should he take? (We call
this the 0-1 knapsack problem because for each item, the thief must either take it or
leave it behind; he cannot take a fractional amount of an item or take an item more
than once.)
In the fractional knapsack problem, the setup is the same, but the thief can take
fractions of items, rather than having to make a binary (0-1) choice for each item.
You can think of an item in the 0-1 knapsack problem as being like a gold ingot
and an item in the fractional knapsack problem as more like gold dust.
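For the fractional variant, a greedy strategy by value-to-weight ratio is optimal (it is not optimal for the 0-1 variant). A sketch, with made-up item values and weights:

```python
def fractional_knapsack(items, W):
    """items: list of (value, weight) pairs. Greedily take whole items in
    decreasing value/weight order, then a fraction of the next item."""
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if W <= 0:
            break
        take = min(weight, W)           # whole item, or the fraction that fits
        total += value * take / weight
        W -= take
    return total

# Hypothetical items: capacity 50 → take (60,10), (100,20), and 2/3 of (120,30)
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # → 240.0
```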
Huffman codes compress data very effectively: savings of 20% to 90% are typical,
depending on the characteristics of the data being compressed. We consider the
data to be a sequence of characters. Huffman’s greedy algorithm uses a table giving
how often each character occurs (i.e., its frequency) to build up an optimal way of
representing each character as a binary string. Suppose we have a 100,000-
character data file that we wish to store compactly. We observe that the characters
in the file occur with the following frequencies (in thousands):

character:   a   b   c   d   e   f
frequency:  45  13  12  16   9   5

That is, only 6 different characters appear, and the character a occurs 45,000 times.
We have many options for how to represent such a file of information. Here, we
consider the problem of designing a binary character code (or code for short) in
which each character is represented by a unique binary string, which we call a
codeword. If we use a fixed-length code, we need 3 bits to represent 6 characters:
a = 000, b = 001, ..., f = 101. This method requires 300,000 bits to code the entire
file. Can we do better? A variable-length code can do considerably better than a
fixed-length code, by giving frequent characters short codewords and infrequent
characters long codewords. Figure shows such a code; here the 1-bit string 0
represents a, and the 4-bit string 1100 represents f.
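Huffman's greedy algorithm can be sketched with a binary min-heap. The tie-breaking counter is an implementation detail; the exact codewords may differ from the figure's, but the total encoded length is still optimal:

```python
import heapq

def huffman_codes(freq):
    """Build a Huffman tree and return a codeword for each character.
    freq maps character -> frequency. A tree is a character (leaf) or a
    (left, right) pair (internal node)."""
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two least-frequent trees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")       # left edge labelled 0
            walk(tree[1], prefix + "1")       # right edge labelled 1
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# Frequencies (in thousands) from the text's example
print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
```

For this input the optimal code uses 224,000 bits for the 100,000-character file, compared with 300,000 bits for the fixed-length code.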
This topic presents methods for representing a graph and for searching a graph.
Searching a graph means systematically following the edges of the graph so as to
visit its vertices. A graph-searching algorithm can discover much
about the structure of a graph. Many algorithms begin by searching their input
graph to obtain this structural information, and several other graph algorithms
elaborate on basic graph searching. Techniques for searching a graph lie at the
heart of the field of graph algorithms.
Graphs are used to represent many real-life applications, such as networks: the
networks may be paths in a city, a telephone network, or a circuit network. Graphs
are also used in social networks such as LinkedIn and Facebook. In Facebook, for
example, each person is represented by a vertex (or node), and each node is a
structure containing information such as person id, name, gender, and locale.
The following two are the most commonly used representations of a graph.
1. Adjacency Matrix
2. Adjacency List
There are other representations as well, such as the incidence matrix and the
incidence list. The choice of graph representation is situation-specific: it depends
on the type of operations to be performed and on ease of use.
Adjacency Matrix:
A 2D array of size V × V, where V is the number of vertices; entry adj[i][j] = 1
indicates that there is an edge from vertex i to vertex j.
Pros: Easy to implement; removing an edge or querying whether an edge (u, v)
exists takes O(1) time.
Cons: Consumes more space, O(V^2). Even if the graph is sparse (contains few
edges), it consumes the same space, and adding a vertex takes O(V^2) time.
Adjacency List:
An array of lists is used. The size of the array is equal to the number of vertices.
Let the array be array[]. An entry array[i] represents the list of vertices
adjacent to the ith vertex. This representation can also be used to represent a
weighted graph, with the weights of the edges stored as lists of pairs.
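An adjacency list can be built in a few lines of Python; the 4-vertex example graph below is an assumption, since the original figure is not reproduced here:

```python
def build_adjacency_list(n, edges):
    """array[i] holds the vertices adjacent to vertex i.
    For a weighted graph, the entries would be (vertex, weight) pairs."""
    array = [[] for _ in range(n)]
    for u, v in edges:
        array[u].append(v)
        array[v].append(u)       # undirected: store the edge in both lists
    return array

# A small assumed example graph on 4 vertices
print(build_adjacency_list(4, [(0, 1), (0, 2), (1, 2), (2, 3)]))
# → [[1, 2], [0, 2], [0, 1, 3], [2]]
```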
Breadth-first search is one of the simplest algorithms for searching a graph and the
archetype for many important graph algorithms. Prim's minimum-spanning-tree
algorithm and Dijkstra's single-source shortest-paths algorithm use ideas similar
to those in breadth-first search. Given a graph G = (V, E) and a distinguished source
vertex s, breadth-first search systematically explores the edges of G to "discover"
every vertex that is reachable from s. It computes the distance (smallest number of
edges) from s to each reachable vertex. It also produces a "breadth-first tree" with
root s that contains all reachable vertices. For any vertex v reachable from s, the
simple path in the breadth-first tree from s to v corresponds to a "shortest path" from
s to v in G, that is, a path containing the smallest number of edges. The algorithm
works on both directed and undirected graphs.
BFS(G, s)
1  for each vertex u ∈ G.V − {s}
2    u.color = WHITE
3    u.d = ∞
4    u.π = NIL
5  s.color = GRAY
6  s.d = 0
7  s.π = NIL
8  Q = ∅
9  ENQUEUE(Q, s)
10 while Q ≠ ∅
11   u = DEQUEUE(Q)
12   for each v ∈ G.Adj[u]
13     if v.color == WHITE
14       v.color = GRAY
15       v.d = u.d + 1
16       v.π = u
17       ENQUEUE(Q, v)
18   u.color = BLACK
The procedure BFS works as follows. With the exception of the source vertex s,
lines 1–4 paint every vertex white, set u.d to be infinity for each vertex u, and set
the parent of every vertex to be NIL. Line 5 paints s gray, since we consider it to
be discovered as the procedure begins. Line 6 initializes s.d to 0, and line 7 sets the
predecessor of the source to be NIL. Lines 8–9 initialize Q to the queue containing
just the vertex s. The while loop of lines 10–18 iterates as long as there remain gray
vertices, which are discovered vertices that have not yet had their adjacency lists
fully examined.
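A compact Python sketch of BFS on an adjacency-list graph; the colors are implicit here (a vertex is "white" exactly while its distance is still infinity):

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search from source s over adjacency lists adj.
    Returns d (edge distance from s) and pi (breadth-first-tree parent)."""
    n = len(adj)
    d = [float("inf")] * n        # undiscovered vertices have d = infinity
    pi = [None] * n
    d[s] = 0
    q = deque([s])                # the queue holds the gray frontier
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] == float("inf"):   # v is still white: discover it
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
    return d, pi

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]   # a small assumed graph
d, pi = bfs(adj, 0)
print(d)    # → [0, 1, 1, 2]
```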
At each step, we determine an edge (u, v) that we can add to A without violating
this invariant, in the sense that A ∪ {(u, v)} is also a subset of a minimum spanning
tree.
We call such an edge a safe edge for A, since we can add it safely to A while
maintaining the invariant.
Kruskal’s algorithm finds a safe edge to add to the growing forest by finding, of all
the edges that connect any two trees in the forest, an edge (u,v) of least weight. Let
C1 and C2 denote the two trees that are connected by (u,v). Since (u,v) must be a
light edge connecting C1 to some other tree, Corollary 23.2 implies that (u,v) is a
safe edge for C1. Kruskal’s algorithm qualifies as a greedy algorithm because at
each step it adds to the forest an edge of least possible weight. Our implementation
of Kruskal’s algorithm is like the algorithm to compute connected components. It
uses a disjoint-set data structure to maintain several disjoint sets of elements. Each
set contains the vertices in one tree of the current forest.
The operation FIND-SET(u) returns a representative element from the set that
contains u. Thus, we can determine whether two vertices u and v belong to the same
tree by testing whether FIND-SET(u) equals FIND-SET(v). To combine trees,
Kruskal's algorithm calls the UNION procedure.
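The next paragraph refers to line numbers of the MST-KRUSKAL pseudocode, which does not appear above; for reference, the standard version from the course text is:

MST-KRUSKAL(G, w)
1 A = ∅
2 for each vertex v ∈ G.V
3   MAKE-SET(v)
4 sort the edges of G.E into nondecreasing order by weight w
5 for each edge (u, v) ∈ G.E, taken in nondecreasing order by weight
6   if FIND-SET(u) ≠ FIND-SET(v)
7     A = A ∪ {(u, v)}
8     UNION(u, v)
9 return A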
Figure shows how Kruskal's algorithm works. Lines 1–3 initialize the set A to the
empty set and create |V| trees, one containing each vertex. The for loop in lines 5–8
examines edges in order of weight, from lowest to highest.
The loop checks, for each edge (u, v), whether the endpoints u and v belong to the
same tree. If they do, then the edge (u, v) cannot be added to the forest without
creating a cycle, and the edge is discarded. Otherwise, the two vertices belong to
different trees. In this case, line 7 adds the edge (u, v) to A, and line 8 merges the
vertices in the two trees.
Prim's algorithm
Prim's algorithm is used to find a minimum spanning tree of a graph. It finds the
subset of edges that includes every vertex of the graph such that the sum of the
weights of the edges is minimized.
Prim's algorithm starts with a single node and, at every step, explores all the
adjacent nodes along with their connecting edges. The edges with minimal weight
that cause no cycles in the graph are selected.
Algorithm
o Step 1: Select a starting vertex
o Step 2: Repeat Steps 3 and 4 until there are fringe vertices
o Step 3: Select an edge e connecting a tree vertex and a fringe vertex that has
minimum weight
o Step 4: Add the selected edge and the vertex to the minimum spanning tree T
[END OF LOOP]
o Step 5: EXIT
Example –
Construct a minimum spanning tree of the graph given in the following figure by
using Prim's algorithm.
Solution –
o Step 2: Add the vertices that are adjacent to A. The edges connecting the
vertices are shown by dotted lines.
o Step 3: Choose the edge with the minimum weight among all, i.e., BD, and add
it to the MST. Add the adjacent vertices of D, i.e., C and E.
o Step 3: Choose the edge with the minimum weight among all. In this case, the
edges DE and CD are such edges. Add them to the MST and explore the
adjacent vertices of C, i.e., E and A.
o Step 4: Choose the edge with the minimum weight, i.e., CA. We cannot choose
CE, as it would cause a cycle in the graph.
The graph produced in step 4 is the minimum spanning tree of the graph
shown in the above figure.
cost(MST) = 4 + 2 + 1 + 3 = 10 units.
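A heap-based sketch of Prim's algorithm in Python. The figure for the example is not reproduced here, so the graph below is a hypothetical one whose MST happens to use edges of weights 4, 2, 1, and 3, matching the example's total of 10:

```python
import heapq

def prim_mst_cost(adj, start=0):
    """adj[u] is a list of (v, weight) pairs for a connected graph.
    Grow the tree from `start`, always taking the cheapest edge that
    reaches a vertex not yet in the tree."""
    n = len(adj)
    in_tree = [False] * n
    total = 0
    heap = [(0, start)]               # (weight of edge into the tree, vertex)
    while heap:
        w, u = heapq.heappop(heap)
        if in_tree[u]:
            continue                  # stale entry: u was already added
        in_tree[u] = True
        total += w
        for v, wv in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (wv, v))
    return total

# Assumed 5-vertex cycle graph; the MST drops the heaviest edge (weight 9)
adj = [[(1, 4), (4, 9)], [(0, 4), (2, 2)], [(1, 2), (3, 1)],
       [(2, 1), (4, 3)], [(3, 3), (0, 9)]]
print(prim_mst_cost(adj))   # → 10
```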
The single-source shortest path algorithm for arbitrary edge weights (positive or
negative), also known as the Bellman-Ford algorithm, is used to find the minimum
distance from a source vertex to every other vertex. In the single-destination
shortest path problem, we have to find shortest paths from all vertices in a directed
graph to a single destination vertex v; this can be reduced to the single-source
shortest path problem by reversing the arcs of the directed graph.
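A sketch of Bellman-Ford in Python; the edge list below is a made-up example containing one negative-weight edge but no negative cycle:

```python
def bellman_ford(n, edges, source):
    """edges: list of (u, v, w) arcs; negative weights are allowed.
    Returns the distance list, or None if a negative cycle is reachable."""
    INF = float("inf")
    d = [INF] * n
    d[source] = 0
    for _ in range(n - 1):            # |V| - 1 rounds of relaxing every edge
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    for u, v, w in edges:             # one extra pass detects negative cycles
        if d[u] + w < d[v]:
            return None
    return d

# Hypothetical 4-vertex graph
print(bellman_ford(4, [(0, 1, 4), (0, 2, 5), (1, 3, -3), (2, 3, 1)], 0))
# → [0, 4, 5, 1]
```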
Example –
Step 1: To topologically sort the vertices, apply DFS (depth-first search) and then
arrange the vertices in linear order by decreasing finish time.
Now, take each vertex in topologically sorted order (here s, t, x, y, r) and relax
each of its outgoing edges:

adj [s] → t, x:
  0 + 3 < ∞, so d [t] ← 3
  0 + 2 < ∞, so d [x] ← 2
adj [t] → r, x:
  3 + 1 < ∞, so d [r] ← 4
  3 + 5 > 2, so d [x] is unchanged
adj [x] → y:
  2 − 3 < ∞, so d [y] ← −1
adj [y] → r:
  −1 + 4 = 3 < 4, so d [r] ← 3

Thus the shortest distances from s are:
1. s to x is 2
2. s to y is −1
3. s to t is 3
4. s to r is 3
1. The set sptSet is initially empty, and the distances assigned to the vertices are
{0, INF, INF, INF, INF, INF, INF, INF, INF}, where INF indicates infinity.
Now pick the vertex with the minimum distance value. Vertex 0 is picked;
include it in sptSet, so sptSet becomes {0}. After including 0 in sptSet, update
the distance values of its adjacent vertices. The adjacent vertices of 0 are 1
and 7, and their distance values are updated to 4 and 8. In the figure, only the
vertices with finite distance values are shown, and the vertices included in the
SPT are shown in green.
2. Pick the vertex with the minimum distance value that is not already included
in the SPT (not in sptSet). Vertex 1 is picked and added to sptSet, so sptSet
becomes {0, 1}. Update the distance values of the vertices adjacent to 1; the
distance value of vertex 2 becomes 12.
3. Pick the vertex with the minimum distance value not already in the SPT.
Vertex 7 is picked, so sptSet becomes {0, 1, 7}. Update the distance values of
the vertices adjacent to 7; the distance values of vertices 6 and 8 become finite
(9 and 15 respectively).
4. Pick the vertex with the minimum distance value not already in the SPT.
Vertex 6 is picked, so sptSet becomes {0, 1, 7, 6}. Update the distance values
of the vertices adjacent to 6; the distance values of vertices 5 and 8 are updated.
5. We repeat the above steps until sptSet includes all vertices of the given
graph. Finally, we obtain the following Shortest Path Tree (SPT).
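Since the nine-vertex example's figure is not reproduced here, the sketch below (a heap-based variant of the procedure described above) runs on a small assumed graph instead:

```python
import heapq

def dijkstra(adj, s):
    """adj[u]: list of (v, weight) pairs; nonnegative weights assumed.
    Repeatedly settle the unsettled vertex with minimum distance."""
    n = len(adj)
    dist = [float("inf")] * n
    dist[s] = 0
    settled = [False] * n             # plays the role of sptSet
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if settled[u]:
            continue                  # stale heap entry
        settled[u] = True
        for v, w in adj[u]:
            if d + w < dist[v]:       # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# A small assumed 4-vertex graph
adj = [[(1, 4), (2, 8)], [(0, 4), (2, 11), (3, 9)],
       [(0, 8), (1, 11), (3, 2)], [(1, 9), (2, 2)]]
print(dijkstra(adj, 0))   # → [0, 4, 8, 10]
```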
3.8 List of References
1. https://www.javatpoint.com/prim-algorithm
3.9 Exercises
2. Prove that the fractional knapsack problem has the greedy-choice property.
Unit III
4
NUMBER-THEORETIC ALGORITHMS
AND NP – COMPLETENESS
Unit Structure: -
4.0 Objective.
4.1 Introduction.
4.9 NP-Completeness.
4.9.1 Decision vs Optimization Problems
4.9.2 What is Reduction?
4.9.3 How to prove that a given problem is NP complete?
4.11 Summary
4.12 References
4.13 Bibliography
4.14 Exercise
4.0 Objective
Many algorithms are heavily based on number theory, and many day-to-day
problems can be solved with simple number-theoretic methods. This chapter
collects such material.
4.1 Introduction
Number theory was once viewed as a beautiful but largely useless subject in pure
mathematics. Today number-theoretic algorithms are used widely, due in large part
to the invention of cryptographic schemes based on large prime numbers. These
schemes are feasible because we can find large primes easily, and they are secure
because we do not know how to factor the product of large primes (or solve related
problems, such as computing discrete logarithms) efficiently. This chapter presents
some of the number theory and related algorithms that underlie such applications.
Let us consider the set Z = {…, −2, −1, 0, 1, 2, …} of integers and the set
N = {0, 1, 2, …} of natural numbers. Prime and composite numbers are defined as
follows: an integer a > 1 whose only divisors are the trivial divisors 1 and a is a
prime number or, more simply, a prime. Primes have many special properties and
play a critical role in number theory. The first 20 primes, in order, are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71
EUCLID(a, b)
  if b == 0
    return a
  else return EUCLID(b, a mod b)
A simple bound on the running time follows from the sum a + b: every pair of
positive integers has 1 as a common divisor, and the sum strictly decreases with
each recursive call, so the number of calls is at most O(a + b). (A tighter analysis
shows that the number of calls is in fact O(log min(a, b)).)
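EUCLID and its extended version (used by the modular-equation solver later in this chapter) can be sketched in Python as:

```python
def euclid(a, b):
    """gcd via EUCLID: gcd(a, b) = gcd(b, a mod b)."""
    if b == 0:
        return a
    return euclid(b, a % b)

def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y."""
    if b == 0:
        return (a, 1, 0)
    d, x1, y1 = extended_euclid(b, a % b)
    return (d, y1, x1 - (a // b) * y1)

print(euclid(30, 21))            # → 3
print(extended_euclid(99, 78))   # → (3, -11, 14), since 99*(-11) + 78*14 = 3
```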
We now consider the problem of finding solutions to the modular linear equation

ax ≡ b (mod n),

where a > 0 and n > 0. This problem has several applications; for example, it arises
in the RSA public-key cryptosystem. Assuming a, b, and n are given, we aim to find
all values of x, modulo n, that satisfy the equation. The equation may have zero,
one, or more than one such solution.
MODULAR-LINEAR-EQUATION-SOLVER(a, b, n)
  (d, x′, y′) = EXTENDED-EUCLID(a, n)
  if d | b
    x0 = x′(b/d) mod n
    for i = 0 to d − 1
      print (x0 + i(n/d)) mod n
  else print “no solutions”
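A self-contained Python sketch of the procedure (with EXTENDED-EUCLID included as a helper); on the instance 14x ≡ 30 (mod 100) it finds the two solutions:

```python
def modular_linear_equation_solver(a, b, n):
    """Return all solutions x (mod n) of a*x ≡ b (mod n); [] if none."""
    def extended_euclid(a, b):
        if b == 0:
            return (a, 1, 0)
        d, x1, y1 = extended_euclid(b, a % b)
        return (d, y1, x1 - (a // b) * y1)

    d, xp, _ = extended_euclid(a, n)   # d = gcd(a, n) = a*xp + n*yp
    if b % d != 0:
        return []                      # "no solutions"
    x0 = (xp * (b // d)) % n
    return [(x0 + i * (n // d)) % n for i in range(d)]

print(modular_linear_equation_solver(14, 30, 100))   # → [95, 45]
```

Indeed 14 × 95 = 1330 ≡ 30 (mod 100) and 14 × 45 = 630 ≡ 30 (mod 100).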
Let us consider a set of pairwise relatively prime integers m1, m2, …, mr. Then the
system of simultaneous congruences

x ≡ a1 (mod m1)
x ≡ a2 (mod m2)
.
.
.
x ≡ ar (mod mr)

has a unique solution modulo M = m1 · m2 ··· mr, for any given integers
a1, a2, …, ar.
Proof of CRT.
Let x be the solution computed for the above system. Since m1, …, mr are pairwise
relatively prime, any two simultaneous solutions to the system must be congruent
modulo M. Thus the solution is a unique congruence class modulo M, and the value
of x computed above is in that class.
Example:-
x ≡ 1 (mod 5)
x ≡ 1 (mod 7)
x ≡ 3 (mod 11)
Here 5, 7, and 11 are pairwise co-prime, so we can find the solution with the
Chinese Remainder Theorem.
Find M:
Here m1 = 5, m2 = 7, m3 = 11, so M = m1 · m2 · m3 = 5 × 7 × 11 = 385.
Therefore,
M1 = M / m1 = 385 / 5 = 77
M2 = M / m2 = 385 / 7 = 55
M3 = M / m3 = 385 / 11 = 35
Let us calculate x1: solve 77 · x1 ≡ 1 (mod 5).
x1 = 3 ………. (77 mod 5 = 2, and (2 × 3) mod 5 = 1)
Let us calculate x2: solve 55 · x2 ≡ 1 (mod 7).
x2 = 6 ………. (55 mod 7 = 6, and (6 × 6) mod 7 = 1)
Let us calculate x3: solve 35 · x3 ≡ 1 (mod 11).
x3 = 6 ………. (35 mod 11 = 2, and (2 × 6) mod 11 = 1)
Finally,
x = (a1 · M1 · x1 + a2 · M2 · x2 + a3 · M3 · x3) mod M
  = (1 × 77 × 3 + 1 × 55 × 6 + 3 × 35 × 6) mod 385
  = 1191 mod 385
  = 36
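The worked example can be checked with a small Python sketch (pow(Mi, -1, m) computes a modular inverse and requires Python 3.8+):

```python
def crt(residues, moduli):
    """Solve x ≡ a_i (mod m_i) for pairwise coprime moduli, using
    x = sum(a_i * M_i * x_i) mod M, where M_i = M/m_i and x_i is the
    inverse of M_i modulo m_i."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for a, m in zip(residues, moduli):
        Mi = M // m
        xi = pow(Mi, -1, m)        # modular inverse of M_i modulo m_i
        x += a * Mi * xi
    return x % M

print(crt([1, 1, 3], [5, 7, 11]))   # → 36
```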
We next consider the powers of an element modulo n. Indexing from 0, the 0th
value in this sequence is a^0 mod n = 1, and the ith value is a^i mod n. For example,
the powers of 3 modulo 7 are

i          0 1 2 3 4 5 6 7 8
3^i mod 7  1 3 2 6 4 5 1 3 2
In the RSA public-key cryptosystem, a participant creates his or her public and
secret keys with the following procedure:
1. Select at random two large prime numbers p and q such that p ≠ q. The
primes p and q might be, say, 1024 bits each.
2. Compute n = pq.
3. Select a small odd integer e that is relatively prime to φ(n) = (p − 1)(q − 1).
4. Compute d as the multiplicative inverse of e, modulo φ(n).
5. Publish the pair P = (e, n) as the participant’s RSA public key.
6. Keep secret the pair S = (d, n) as the participant’s RSA secret key.
4.9 NP-Completeness:-
The P versus NP question is a famous open problem in computer science.
Informally, P is the class of decision problems solvable in polynomial time, and
NP is the class of decision problems whose “yes” answers can be verified in
polynomial time.
NP-completeness applies to the realm of decision problems. It was set up this way
because it’s easier to compare the difficulty of decision problems than that of
optimization problems. In reality, though, being able to solve a decision problem
in polynomial time will often permit us to solve the corresponding optimization
problem in polynomial time (using a polynomial number of calls to the decision
problem). So, discussing the difficulty of decision problems is often really
equivalent to discussing the difficulty of optimization problems. (Source Ref 2).
For example, consider the vertex cover problem (Given a graph, find out the
minimum sized vertex set that covers all edges). It is an optimization problem.
Corresponding decision problem is, given undirected graph G and k, is there a
vertex cover of size k?
Let L1 and L2 be two decision problems. Suppose algorithm A2 solves L2. That
is, if y is an input for L2 then algorithm A2 will answer Yes or No depending upon
whether y belongs to L2 or not.
for any fixed ɛ > 0, the scheme runs in time polynomial in the size n of its input
instance.
APPROX-VERTEX-COVER(G)
  C = ∅
  E′ = G.E
  while E′ ≠ ∅
    let (u, v) be an arbitrary edge of E′
    C = C ∪ {u, v}
    remove from E′ every edge incident on either u or v
  return C
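The procedure can be sketched in Python. The edge list below is an assumption, reconstructed from the figure caption that follows (vertices a–g, with optimal cover {b, d, e}):

```python
def approx_vertex_cover(edges):
    """2-approximation: repeatedly take an arbitrary remaining edge (u, v),
    add both endpoints to C, and delete every edge touching u or v."""
    E = set(edges)
    C = set()
    while E:
        u, v = next(iter(E))          # an arbitrary edge of E'
        C |= {u, v}
        E = {e for e in E if u not in e and v not in e}
    return C

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"),
         ("d", "e"), ("d", "f"), ("d", "g"), ("e", "f")]
print(approx_vertex_cover(edges))   # a cover of size at most twice the optimum
```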
(b) The edge (b, c), shown heavy, is the first edge chosen by APPROX-
VERTEX-COVER. Vertices b and c, shown lightly shaded, are added to the
set C containing the vertex cover being created. Edges (a, b), (c, e), and (c,
d), shown dashed, are removed, since they are now covered by some vertex
in C.
(f) The optimal vertex cover for this problem contains only three vertices: b, d,
and e.
Given a set of cities and distance between every pair of cities, the problem is to find
the shortest possible route that visits every city exactly once and returns to the
starting point.
Note the difference between the Hamiltonian cycle problem and TSP. The Hamiltonian
cycle problem is to find whether there exists a tour that visits every city exactly once.
Here we know that a Hamiltonian tour exists (because the graph is complete), and in
fact many such tours exist; the problem is to find a minimum-weight Hamiltonian cycle.
APPROX-TSP-TOUR(G, c)
  select a vertex r ∈ G.V to be a “root” vertex
  compute a minimum spanning tree T for G from root r using MST-PRIM(G, c, r)
  let H be a list of vertices, ordered according to when they are first visited in a
  preorder tree walk of T
  return the Hamiltonian cycle H
(d) A tour obtained by visiting the vertices in the order given by the preorder
walk, which is the tour H returned by APPROX-TSP-TOUR. Its total cost is
approximately 19.074. (e) An optimal tour H* for the original complete
graph. Its total cost is approximately 14.715.
GREEDY-SET-COVER(X, F)
  U = X
  C = ∅
  while U ≠ ∅
    select an S ∈ F that maximizes |S ∩ U|
    U = U − S
    C = C ∪ {S}
  return C
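A Python sketch of GREEDY-SET-COVER; the instance (12 elements, 6 sets) is the standard example from the course text, included here as an assumption, and the code assumes the sets in F really do cover X:

```python
def greedy_set_cover(X, F):
    """Repeatedly pick the set S in F covering the most still-uncovered
    elements of X, until everything is covered."""
    U = set(X)
    C = []
    while U:
        S = max(F, key=lambda s: len(s & U))   # the greedy choice
        U -= S
        C.append(S)
    return C

X = set(range(1, 13))
F = [{1, 2, 3, 4, 5, 6}, {5, 6, 8, 9}, {1, 4, 7, 10},
     {2, 5, 7, 8, 11}, {3, 6, 9, 12}, {10, 11}]
print(greedy_set_cover(X, F))
```

On this instance the greedy algorithm returns 4 sets, whereas an optimal cover uses only 3, illustrating that the method is an approximation.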
The subset sum problem is a decision problem in computer science. In its most
general formulation, there is a multiset S of integers and a target sum T, and the
question is to decide whether any subset of the integers sums to precisely T. The
problem is known to be NP-complete. Moreover, some restricted variants of it are
NP-complete too, for example the variant in which the input integers may be positive
or negative and T = 0: given the set {7, −3, −2, 9000, 5, 8}, the answer is yes,
because the subset {−3, −2, 5} sums to zero.
Subset sum can also be regarded as an optimization problem: find a subset whose
sum is as close as possible to T. It is NP-hard, but there are several algorithms that
can solve it reasonably fast in practice.
Given the set S = {x1, x2, x3, …, xn} of positive integers and a target t, the decision
question is whether there is a subset of S that adds up to t. Subset sum can also be
stated as an optimization problem: which subset of S adds up to the greatest total
≤ t? Here we use the notation S + x = {s + x : s ∈ S}, and we assume a
MERGE-LISTS procedure that merges two sorted lists in time proportional to the
sum of their lengths.
EXACT-SUBSET-SUM(S, t)
1 n = |S|
2 L0 = <0>
3 for i = 1 to n
4   Li = MERGE-LISTS(Li−1, Li−1 + xi)
5   remove from Li every element that is greater than t
6 return the largest element in Ln
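A Python sketch of the merged-lists idea; using a set instead of sorted lists performs the merge-and-deduplicate step implicitly, and positive integers are assumed:

```python
def exact_subset_sum(S, t):
    """Largest subset sum not exceeding t. L grows as
    L_i = MERGE-LISTS(L_{i-1}, L_{i-1} + x_i), trimming sums above t."""
    L = {0}                            # achievable sums so far
    for x in S:
        L = L | {s + x for s in L}     # L_{i-1} together with L_{i-1} + x_i
        L = {s for s in L if s <= t}   # remove every element greater than t
    return max(L)

print(exact_subset_sum([1, 4, 5], 10))   # → 10
print(exact_subset_sum([2, 3, 7], 9))    # → 9
```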
4.11 Summary:-
4.12 References: -
4.13 Bibliography: -
• https://en.wikipedia.org/wiki/NP_(complexity)
4.14 Exercise: -
• What is NP Problem?
Unit IV
5
RESEARCHING COMPUTING
Unit Structure
5.0 Introduction,
5.7 Experiments,
5.11 Exercise
5.0 Introduction
▪ To solve a problem
▪ A new theory
▪ A critical analysis.
A research problem is based on a general topic and its issues. We need to identify
the correct issue and problem in order to arrive at the right solution.
Literature is the available material that helps decide the direction of the research.
It consists of the findings of experienced researchers, and we can build on their
views in our own research.
o Be neither too broad nor too narrow. See Appendix A for a brief
explanation of the narrowing process and how your research question,
purpose statement, and hypothesis(es) are interconnected.
Internet research is a research method in which information and data are collected
from the Internet. Various qualitative journals and articles are available from which
information can be retrieved, and in practice a variety of journals can be used to
support the research.
One can collect various reviews and surveys through the Internet, and there are
various tools and techniques available to categorize the information. After
searching, you can get thousands of quick results for some topics. You can
subscribe freely to several sites to receive updates regarding your topic, and you
can subscribe to several groups as well. Internet research is not like offline
resources, which are bound by certain limitations and restrictions on availability.
Internet research can provide quick, immediate, and worldwide access to
information, although results may be affected by unrecognized bias, difficulties in
verifying a writer's credentials (and therefore the accuracy or pertinence of the
information obtained), and whether the searcher has sufficient skill to draw
meaningful results from the abundance of material typically available.
Researchers also need to meet their ethical obligations once their research is
published: If authors learn of errors that change the interpretation of research
findings, they are ethically obligated to promptly correct the errors in a
correction, retraction, erratum or by other means.
Perhaps one of the most common multiple roles for researchers is being both
a mentor and lab supervisor to students they also teach in class. Psychologists
need to be especially cautious that they don't abuse the power differential
between themselves and students, say experts. They shouldn't, for example,
Experts also suggest covering the likelihood, magnitude and duration of harm
or benefit of participation, emphasizing that their involvement is voluntary
and discussing treatment alternatives, if relevant to the research. Keep in
mind that the Ethics Code includes specific mandates for researchers who
conduct experimental treatment research. Specifically, they must inform
individuals about the experimental nature of the treatment, services that will
or will not be available to the control groups, how participants will be
assigned to treatments and control groups, available treatment alternatives
and compensation or monetary costs of participation. If research participants
or clients are not competent to evaluate the risks and benefits of participation
themselves (for example, minors or people with cognitive disabilities), then
the person who is giving permission must have access to that same
information, says Koocher.
A researcher may write a literature review as part of a dissertation. In that kind of
review, the researcher must focus on detailed analysis and should write a concise
summary of the given topic.
If you are writing a stand-alone review paper, give some background on the topic
and its importance, discuss the scope of the literature you will review (for example,
the time period of your sources), and state your objective.
• Introduction
• Body
o Give an overview of the main points of each source and combine them
into a coherent whole
o Critically evaluate the strengths and weaknesses of each source
• Timeline
• Measurement of analysis
• Neutrality:
The design should be neutral; it should not be influenced by any
preconception.
• Reliability:
A reliable design yields consistent results when the research is repeated.
• Validity:
A valid design uses measuring tools that actually measure what the research
sets out to measure.
• Generalization:
The outcome of your design should apply to a whole population and not just
a restricted sample.
• After analyzing the results, you can apply your findings to similar ideas or
situations.
• You can identify the cause and effect of a hypothesis. Researchers can further
analyze this relationship to determine more in-depth ideas.
• Experimental research makes an ideal starting point. The data you collect is
a foundation on which to build more ideas and conduct more research.
• Whether you want to know how the public will react to a new product or
whether a certain food increases the chance of disease, experimental research
is a good place to start. You can begin by finding subjects using tools such as
QuestionPro Audience.
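The cause-and-effect comparisons described above are usually backed by a statistical test that compares the treatment and control groups. As a minimal illustrative sketch (the group names and measurements below are invented, not taken from any study), Welch's t statistic for two independent samples can be computed in plain Python:

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with
    possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se = math.sqrt(va / na + vb / nb)                # standard error of the difference
    return (mean(sample_a) - mean(sample_b)) / se

# Invented measurements from a control and a treatment group
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treatment = [4.8, 5.1, 4.9, 5.0, 4.7]

t = welch_t(treatment, control)
print(round(t, 2))  # prints 8.0; a large |t| suggests the group means differ
```

A large absolute t value indicates that the difference between the group means is unlikely to be due to chance alone; in practice the statistic is compared against a t distribution to obtain a significance level.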
Quantitative data is defined as data in the form of counts or numbers, where each
data set has a unique numerical value associated with it.
Projection of data: future values of the data can be projected using algorithms and
other mathematical analysis tools.
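As one simple illustration of such a projection (the quarterly counts below are invented), an ordinary-least-squares trend line can be fitted over the observed values and extended forward in plain Python:

```python
from statistics import mean

def linear_projection(ys, steps_ahead):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1,
    then project the trend `steps_ahead` past the last observation."""
    n = len(ys)
    xs = range(n)
    xbar, ybar = mean(xs), mean(ys)
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    return intercept + slope * (n - 1 + steps_ahead)

# Invented quarterly counts
counts = [120, 135, 149, 166]
projected = linear_projection(counts, 1)  # projected next quarter (about 180.5)
```

A linear trend is only one modelling choice; real projections should be checked against the shape of the data before being relied upon.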
• Accurate results: As the results obtained are objective in nature, they are
extremely accurate.
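For instance, objective numerical summaries of a quantitative data set can be computed directly with Python's standard library (the survey ratings below are hypothetical):

```python
from statistics import mean, median, stdev

# Hypothetical survey ratings on a 1-to-5 scale
ratings = [4, 5, 3, 4, 4, 2, 5, 4]

print("mean:", mean(ratings))      # central tendency
print("median:", median(ratings))  # robust middle value
print("stdev:", stdev(ratings))    # spread of the responses
```

Because these summaries are computed mechanically from the numbers themselves, two analysts working on the same data set will obtain the same results.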
While presenting research, you should clearly state your objective and research
question to the audience. The presenter should arrive with both the research
question and its solution. When considering your audience, it is very important
to know:
• what their needs are and how you can help them
The presentation should include: a short intro, your hypotheses, a brief description
of the methods, tables and/or graphs related to your findings, and an interpretation
of your data.
The trick to giving good presentations is distilling your information down into a
few bulleted lists, diagrams, tables and graphs.
State each result, then display the results in graphical form, reminding the
audience of your hypothesis and stating whether it was supported as you do
so. Use simple, clean, clearly labeled graphs with proper axis labels (no
extraneous 3-D effects, please). Do not use light colors (yellow, light green,
or pink) in your figures; they do not show up well when projected. Indicate
the results of the statistical tests on the slides by including the values (or
asterisks/letters that indicate the significance level) on the same slides as
the graphs.
• Basics of Qualitative Research (3rd Edition), Juliet Corbin & Anselm Strauss,
Sage Publications (2008)
5.10 Exercises
• What is research?