Advanced DS
M.C.A./P.G.D.C.A. - I YEAR
SYLLABUS
UNIT I
Introduction: Mathematics Review - A brief introduction to recursion. Algorithm analysis, Mathematical
background - model - what to analyze - Running time calculations.
Lists, Stacks, Queues: Abstract data types (ADT) - The List ADT - The stack ADT - The Queue ADT
UNIT II
Trees: Implementation of trees, tree traversals with an application - Binary trees - The search tree ADT -
Binary Search Trees
Hashing: General Idea - Hash function - Separate Chaining.
Priority Queues (Heaps): Model - Simple implementations - Binary Heap.
UNIT III
Sorting: Preliminaries - Insertion sort - Shell Sort - Heap Sort - Merge Sort - Quick Sort.
UNIT IV
Graph Algorithms: Definitions - Topological sort - Shortest-Path Algorithms - Network Flow Problems
- Minimum Spanning Tree - Applications of Depth-First Search.
UNIT V
Algorithm Design Techniques: Greedy Algorithms - Divide and Conquer - Running Time of Divide
and Conquer Algorithms - Closest-Points Problems - The Selection Problem - Theoretical Improvements
for Arithmetic Problems.
UNIT I
LESSON
1
FUNDAMENTAL OF DATA STRUCTURES
CONTENTS
1.0 Aims and Objectives
1.1 Introduction
1.2 Recursion
1.3 Algorithm Analysis
1.3.1 Problem Solving using Pseudocode
1.3.2 Problem Solution using Flow Chart Diagram
1.3.3 Mathematic Background
1.3.4 Model: Abstract Data Type (ADT)
1.3.5 What to Analyze?
1.3.6 Running Time Calculations
1.4 Let us Sum up
1.5 Keywords
1.6 Questions for Discussion
1.7 Suggested Readings
1.1 INTRODUCTION
Semantically, data can exist in either of two forms: atomic or structured. In most programming
problems, the data to be read, processed and written are related to each other. Data items can be
related in a variety of different ways. Whereas basic data types such as integers, characters
etc. can be directly created and manipulated in a programming language, the responsibility of creating
structured data items remains with the programmers themselves. Accordingly, programming
languages provide mechanisms to create and manipulate structured data items.
8 Advanced Data Structure
M.S. University - D.D.C.E.
1.2 RECURSION
Recursion is a wonderful, powerful way to solve problems. It is an important concept in computer
science. Many algorithms can be best described in terms of recursion. Recursion defines a function in
terms of itself. That is, in the course of the function definition there is a call to that very same
function. At first this may seem like a never ending loop, or like a dog chasing its tail. It can never
catch it. So too it seems our method will never finish. This might be true in some cases, but in practice
we can check to see if a certain condition is true and in that case exit (return from) our method. The
case in which we end our recursion is called a base case. Additionally, just as in a loop, we must change
some value and incrementally advance closer to our base case.
Consider this function:
void myMethod(int counter)
{
    if (counter == 0)
        return;
    else
    {
        System.out.println("" + counter);
        myMethod(--counter);
        return;
    }
}
This recursion is not infinite, assuming the method is passed
a positive integer value.
Consider this method:
void myMethod(int counter)
{
    if (counter == 0)
        return;
    else
    {
        System.out.println("hello" + counter);
        myMethod(--counter);
    }
}
The above recursion is essentially a loop like a for loop or a while loop. When do we prefer recursion
to an iterative loop? We use recursion when we can see that our problem can be reduced to a simpler
problem that can be solved after further reduction.
Every recursion should have the following characteristics:
- A simple base case which we have a solution for and a return value.
- A way of getting our problem closer to the base case, i.e., a way to chop out part of the problem
  to get a somewhat simpler problem.
- A recursive call which passes the simpler problem back into the method.
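The three characteristics can be seen together in a very small function. The sketch below is only an illustration: the function name sumTo and the choice of problem (adding the integers 1 to n) are assumptions made for this example and do not come from the lesson.

```cpp
#include <cassert>

// Hypothetical example: sum of the integers 1..n, written recursively.
int sumTo(int n)
{
    assert(n >= 0);            // the sketch assumes a non-negative argument
    if (n == 0)                // base case: the answer is known, return it
        return 0;
    // n - 1 is the "somewhat simpler problem"; the recursive call passes
    // it back into the same function.
    return n + sumTo(n - 1);
}
```

Calling sumTo(4) reduces the problem to sumTo(3), then sumTo(2), and so on until the base case sumTo(0) is reached.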
The key to thinking recursively is to see the solution to the problem as a smaller version of the same
problem. The key to solving recursive programming requirements is to imagine that your method
does what its name says it does even before you have actually finished writing it. You must pretend the
method does its job and then use it to solve the more complex cases. Here is how.
Identify the base case(s) and what the base case(s) do. A base case is the simplest possible problem (or
case) your method could be passed. Return the correct value for the base case. Your recursive method
will then be comprised of an if-else statement where the base case returns one value and the non-base
case(s) recursively call(s) the same method with a smaller parameter or set of data.
int myFactorial(int integer)
{
    if (integer == 1)
        return 1;
    else
        return integer * myFactorial(integer - 1);
}
Note that the base case (the factorial of 1) is solved and the return value is given. Now let us imagine
that our method actually works. If it works we can use it to give the result of more complex cases. If
our number is 7 we will simply return 7 * the result of factorial of 6. So we actually have the exact
answer for all cases in the top level recursion. Our problem is getting smaller on each recursive call
because each time we call the method we give it a smaller number. Try to run this program in your
mind with the number 2. Does it give the right value? If it works for 1 then it must work for two, since
2 merely returns 2 * factorial of 1. Now will it work for 3? Well, 3 must return 3 * factorial of 2. Now
since we know that factorial of 2 works, factorial of 3 also works. We can prove that 4 works in the
same way, and so on and so on.
If a recursive method never reaches its base case, the code will not run forever like an infinite loop;
instead, you will eventually run out of stack space (memory) and get a run-time error or exception
called a stack overflow. There are several significant problems with recursion.
Mostly it is hard (especially for inexperienced programmers) to think recursively, though many AI
specialists claim that in reality recursion is closer to basic human thought processes than other
programming methods (such as iteration). There also exists the problem of stack overflow when using
some forms of recursion (head recursion). The other main problem with recursion is that it can be
slower to run than simple iteration. Then why use it? It seems that there is always an iterative solution
to any problem that can be solved recursively. Is there a difference in computational complexity? No.
Is there a difference in the efficiency of execution? Yes, in fact, the recursive version is usually less
efficient because of having to push and pop recursions on and off the run-time stack, so iteration is
quicker. On the other hand, you might notice that the recursive versions use fewer or no local
variables.
So why use recursion? The answer to our question is predominantly because it is easier to code a
recursive solution once one is able to identify that solution. The recursive code is usually smaller,
more concise, more elegant, possibly even easier to understand, though that depends on one's thinking
style. But also, there are some problems that are very difficult to solve without recursion. Those
problems that require backtracking, such as searching a maze for a path to an exit, or tree based
operations are best solved recursively.
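To make the backtracking idea concrete, here is a minimal sketch of a recursive maze search. The maze encoding ('#' for a wall, '.' for an open cell, 'E' for the exit, '+' for a visited cell) and the function name findExit are assumptions chosen for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if the exit 'E' can be reached from cell (row, col).
bool findExit(std::vector<std::string>& maze, int row, int col)
{
    assert(!maze.empty());                  // the sketch assumes a non-empty maze
    if (row < 0 || row >= static_cast<int>(maze.size()) ||
        col < 0 || col >= static_cast<int>(maze[row].size()))
        return false;                       // base case: outside the maze
    if (maze[row][col] == 'E')
        return true;                        // base case: exit found
    if (maze[row][col] != '.')
        return false;                       // base case: wall or visited cell
    maze[row][col] = '+';                   // mark the cell as visited
    // Try each neighbour in turn. When one direction fails, control comes
    // back to this call (the "backtrack") and the next direction is tried.
    return findExit(maze, row - 1, col) || findExit(maze, row + 1, col) ||
           findExit(maze, row, col - 1) || findExit(maze, row, col + 1);
}
```

Writing the same search with loops would require an explicit stack of positions to return to; here the run-time stack of recursive calls plays that role.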
Tail Recursion
Tail recursion is defined as occurring when the recursive call is at the end of the recursive instruction.
This is not the case with my factorial solution above. It is useful to notice when one's algorithm uses
tail recursion because in such a case, the algorithm can usually be rewritten to use iteration instead. In
fact, a compiler that performs tail-call optimization will convert the recursive program into an iterative
one. This eliminates the potential problem of stack overflow.
To convert this to tail recursion we need to get all the multiplication finished and resolved before
recursively calling the function. We need to force the order of operation so that we are not waiting on
multiplication before returning. If we do this the stack frame can be freed up.
The proper way to do a tail-recursive factorial is this:
int factorial(int number) {
    if (number == 0) {
        return 1;
    }
    return factorial_i(number, 1);
}
int factorial_i(int currentNumber, int sum) {
    if (currentNumber == 1) {
        return sum;
    } else {
        return factorial_i(currentNumber - 1, sum * currentNumber);
    }
}
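Because the recursive call in factorial_i is the last thing the function does, its two parameters can simply become loop variables. The sketch below shows one possible iterative form; the name factorialLoop is an assumption made for this illustration.

```cpp
#include <cassert>

// Iterative counterpart of the tail-recursive factorial_i:
// the parameters of the recursive call become loop variables.
int factorialLoop(int number)
{
    assert(number >= 0);        // the sketch assumes a non-negative argument
    int sum = 1;                // plays the role of the accumulator "sum"
    for (int currentNumber = number; currentNumber > 1; --currentNumber)
        sum *= currentNumber;   // same update as (currentNumber - 1, sum * currentNumber)
    return sum;
}
```

No stack frames pile up here, which is exactly the effect of converting tail recursion into iteration.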
This solution typically consists of two parts: data structures and algorithms.
An algorithm is a well-defined list of steps for solving a particular problem. A set of algorithms is
always used for performing operations on the data stored by means of a data structure. Thus algorithms
handle data through data structures. In constructing a solution to a problem, a data structure must be
chosen that allows the data to be operated upon easily in the manner required by the algorithm.
Data may be arranged and managed at many levels. An algorithm has to be designed in such a manner
that it can perform the desired operation on the stored data.
An algorithm may need to put new data into an existing collection of data, remove data from a
collection, or query a collection of data for a specific purpose.
In the design of many types of programs, the choice of data structures is a primary design
consideration. Experience in building large systems has shown that the difficulty of implementation
and the quality and performance of the final result depend heavily on choosing the best data
structure. After the data structures are chosen, the algorithms to be used often become relatively
obvious. Sometimes things work in the opposite direction - data structures are chosen because certain
key tasks have algorithms that work best with particular data structures. In either case, the choice of
appropriate data structures is crucial.
The formal algorithm consists of two parts. The first part is a paragraph describing the purpose of the
algorithm, identifying the variables used in the algorithm and listing the input data. The second part
of the algorithm consists of the list of steps that is to be executed.
Problem
A shop has started a discount scheme. According to that scheme, if the purchased quantity is more than
10, then a 10% discount DISC is given to the customer. The operator has to enter the quantity QTY and
rate RATE of the item; the program will display the total value TOT_VAL. One way to solve the
problem is as follows:
Solution
First initialize DISC with 0 and accept the RATE and QTY from the user. Then check whether the
QTY is more than 10 or not. If QTY > 10 then set DISC = 10. Calculate the total value TOT_VAL
and display it.
A formal Algorithm of the stated problem:
Algorithm: (TotalValueCalculation)
This algorithm accepts the input for QTY and RATE from the user, then checks whether QTY > 10
or not. If QTY > 10 then it assigns DISC = 10. After that VAL = (QTY * RATE) - (RATE * DISC/100)
is calculated and displayed.
[End of If structure.]
Step 4. [Calculation of total value]
Set VAL := (QTY * RATE) - (RATE * DISC/100).
Step 5. Print VAL.
Step 6. Exit.
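The steps can be turned into a small function. The sketch below is an assumption-laden illustration: the function name totalValue and the parameter types are invented here, and the formula is reproduced exactly as printed in Step 4, which subtracts DISC percent of RATE. If the discount is meant to apply to the whole purchase, the last line would use qty * rate * disc / 100.0 as the amount subtracted instead.

```cpp
#include <cassert>

// Total value following the steps of the algorithm (formula as printed).
double totalValue(int qty, double rate)
{
    assert(qty >= 0 && rate >= 0.0);   // the sketch assumes sensible input
    double disc = 0.0;                 // DISC is initialized with 0
    if (qty > 10)
        disc = 10.0;                   // more than 10 items: DISC = 10
    return (qty * rate) - (rate * disc / 100.0);   // Step 4
}
```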
1.3.2 Problem Solution using Flow Chart Diagram
A flow chart is a graphical or symbolic representation of a process. Each step in the process flow is
represented by a different symbol and contains a short text description of the process step in the flow chart
symbol. The flow chart symbols are linked together with arrow connectors (also known as flow lines).
Table 1.1: Program Flow Chart Symbols
[Table of symbols not recoverable; the surviving entry is the arrow symbol, which shows the direction of flow.]
A Flow chart of the previously stated Total Value Calculation problem is given below:
Expressions will be made up of variables and constants connected by means of operators. We shall use
the usual arithmetic operators like +, -, * and /. In addition to these, operators like mod will be used
to mean remainder of integer division. Thus a mod b will mean the remainder of division of a by b.
The operators * and mod will have higher precedence than + and -. Float and integer can be mixed in
expressions but the result will be float. Boolean expressions can be obtained by using the relational
operators like the following:
== (equal to)
!= (not equal to)
< (less than)
Boolean variables or expressions can be connected with the logical operators not (represented by !),
and (represented by &&) and or (represented by ||) to obtain further compound boolean expressions. For
example,
(a >= 10) || (a <= 20)
would mean a should be greater than or equal to 10 or a should be less than or equal
to 20. Among the logical operators, not will have a higher precedence than and, which in turn will have
a precedence higher than or.
As regards associativity of operators, we shall use parentheses to avoid confusion. Implicitly, left to
right associativity will be assumed. Thus, a * b / c would mean (a * b) / c.
The variables used in the programs should be declared before the executable statements in a section
beginning with the keyword var. The format of a variable declaration is as follows:
<data type> <list of variables>;
For example,
int x, y, z;
means that x, y and z are variables of type integer. The data type can be a standard type or user-defined.
The enumerated types available in C are assumed to be included in this pseudo-code. We illustrate
this with the help of the following example:
enum colour {brown, red, green};
This declaration assigns 0 to brown, 1 to red and 2 to green. The enumerated type is defined with a list
of names as shown in the case of colour. These names are the values that a variable of this type can
assume. For example, the statement,
a = brown;
will assign the value of brown to a. Note that brown is a constant of type colour and not a variable.
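The numbering of the enumeration constants can be checked with a few lines of code. This is a minimal sketch; the helper valueOf is a hypothetical name introduced only to expose the integer behind each constant.

```cpp
#include <cassert>

enum colour { brown, red, green };   // brown = 0, red = 1, green = 2

// Returns the integer value behind an enumeration constant.
int valueOf(colour c)
{
    assert(c >= brown && c <= green);   // only the three listed names are valid
    return static_cast<int>(c);
}
```

A variable declared as colour a; can then be given one of the three names, for example a = brown;.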
As in every programming language, the implied sequencing of statements will be assumed in the
pseudo-code. This means that statements will be executed sequentially in a top-to-bottom manner
unless the flow of control is explicitly altered by a control construct such as a loop construct. The
statements will be separated from one another by means of a semicolon (;). A group of statements
placed within begin and end will be a compound statement and will be treated as a single unit.
Thus,
a[i] = x - y;
is an example of the assignment statement.
We shall also use the usual if-then-else statement in the following format:
if <condition> then <statement block> else <statement block>
representing the ADTs in terms of the data types and operators supported by the programming
language itself. To represent the mathematical model underlying an ADT, we use data structures,
which are a collection of variables, possibly of several data types, connected in various ways.
The cell is the basic building block of data structures. We can picture a cell as a box that is capable of
holding a value drawn from some basic or composite data type. Data structures are created by giving
names to aggregates of cells and (optionally) interpreting the values of some cells as representing
relationships or connections (e.g., pointers) among cells.
Suppose we are writing an algorithm for searching the occurrence of a particular letter in a given
word. If the letter occurs at the beginning of the word then f(n) is small. On the other hand, if the
particular letter does not appear in the given word then f(n) is big.
Generally the complexity of an algorithm is measured for three particular cases.
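The letter-search example can be made measurable by counting how many characters are examined. In the sketch below, the function name findLetter and the extra output parameter comparisons are assumptions for illustration; the count stands in for the f(n) of the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Linear search for a letter. "comparisons" reports how many characters
// were examined for this particular input.
int findLetter(const std::string& word, char letter, int& comparisons)
{
    comparisons = 0;
    for (std::size_t i = 0; i < word.size(); ++i) {
        ++comparisons;
        if (word[i] == letter)
            return static_cast<int>(i);   // found at position i
    }
    // Letter absent: every one of the n characters was examined.
    assert(comparisons == static_cast<int>(word.size()));
    return -1;
}
```

Searching "structure" for 's' stops after a single comparison, while searching it for a letter that does not occur examines all nine characters.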
Big-O Notation
It is a theoretical measure of the execution of an algorithm, usually the time or memory needed, given
the problem size n, which is usually the number of items. Informally, saying some equation
f(n) = O(g(n)) means it is less than some constant multiple of g(n). The notation is read, "f of n is big
oh of g of n".
Formal Definition: f(n) = O(g(n)) means there are positive constants c and k, such that 0 ≤ f(n) ≤ c·g(n)
for all n ≥ k. The values of c and k must be fixed for the function f and must not depend on n.
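As a worked illustration of the definition, take f(n) = 3n + 2 and g(n) = n. Since 3n + 2 ≤ 4n exactly when n ≥ 2, the constants c = 4 and k = 2 witness f(n) = O(n). The sketch below checks the inequality over a finite range; the functions f and g, the witnesses and the helper boundHolds are all assumptions chosen for this example, and a finite check illustrates the definition rather than proving it.

```cpp
#include <cassert>

long f(long n) { return 3 * n + 2; }   // example function
long g(long n) { return n; }           // example bound

// Checks 0 <= f(n) <= c * g(n) for every n in [k, limit].
bool boundHolds(long c, long k, long limit)
{
    assert(c > 0 && k > 0 && limit >= k);   // c and k must be positive constants
    for (long n = k; n <= limit; ++n)
        if (f(n) < 0 || f(n) > c * g(n))
            return false;
    return true;
}
```

With c = 3 the check fails for every n, because 3n + 2 is always larger than 3n; the constant has to be chosen large enough.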
However, an algorithm can almost always be developed that balances the available space and execution
time in a given system.
Check Your Progress
1. Fill in the blanks:
(a) Iteration (looping) in functional languages is usually accomplished via .............
(b) Recursion defines a function in terms of .................. .
(c) The case in which we end our recursion is called a ............... case.
2. Define algorithm.
3. How many parts are there in a formal algorithm? Mention the parts.
4. Define pseudocode.
The objective of analyzing an algorithm is to obtain quantitative measures for the resources required
by the algorithm during its execution. Performance of an algorithm is often measured in terms of the
space and time required to execute it. Generally, there is a trade-off between the time and the space
required to execute a program. Accordingly, if the program is required to be executed quickly, the
space it will occupy will be more and vice-versa.
1.5 KEYWORDS
Recursion: It is a method in which a function calls itself.
Tail Recursion: It is defined as occurring when the recursive call is at the end of the recursive
instruction.
Flow Chart: Diagrammatic representation of an algorithm.
Pseudocode: An outline of a program, written in a form that can easily be converted into real
programming statements.
Data Structure: A combination of one or more basic data types to form a single addressable data type
along with operations defined on it.
Algorithms: A finite set of instructions which, when followed, accomplishes a particular task, the
termination of which is guaranteed under all cases.
ADT (Abstract Data Type): A mathematical model with a collection of operations defined on that
model.
1.6 QUESTIONS FOR DISCUSSION
1. What is recursion? How does recursion work?
(a) y = (x1 + x2 + ... + xn)
(b) y = 1 + 2x + 4x^2 + 8x^3 + ... + 2^n x^n
(c) y = (1 + x)^n
4. Given S = 1 + 2^2 + 3^2 + 4^2 + ... + n^2, where S is the sum of the squares of n numbers, write
an algorithm to compute S.
5. Write an algorithm to compute the sums for the first n terms of the following series, where n has
(b) Itself
(c) Base
2
LISTS, STACKS AND QUEUES
CONTENTS
2.0 Aims and Objectives
2.1 Introduction
2.2 Singly Linked List
2.2.1 ADT of Singly Linked Lists
2.2.2 Implementation
2.3 Application: Polynomial Addition
2.4 ADT of Stacks
2.5 Implementation
2.5.1 Implementing a Stack using an Array
2.5.2 Implementing Stacks using Linked Lists
2.6 Analysis of Stack Implementations
2.7 ADT of Queues
2.8 Queue Implementations
2.8.1 Array Implementation of Queues
2.8.2 Linked Implementation of Queues
2.9 Analysis of Queue Implementations
2.10 Let us Sum up
2.11 Keywords
2.12 Questions for Discussion
2.13 Suggested Readings
2.1 INTRODUCTION
Computers help us in solving many real life problems. We know that computers can work with
greater speed and accuracy than human beings. What actually does a computer do? A very simple
answer to this question is that a computer stores data and reproduces it as information as and when
required. Representation of data should be in a proper format so that accurate information can be
produced at high speed. In this lesson, we will study the various ways in which data can be
represented. Efficient storage and retrieval of data is important in computing. In this lesson, we will
study and implement various data structures.
Organized data is known as information. Let us consider a few examples and try to understand what
data structures are. You all must have seen a pile of plates in a restaurant. Whenever a plate is required,
the plate on the top is removed. Similarly, if a plate is added to the pile, it is kept on the top of the
pile. There is a definite process involved in the storage and retrieval of the plates. If the rack is empty
there will be no plates available. Similarly, if the rack is full then there will be no place for more plates.
A similar process is applied with stacks. A stack is a data structure which stores data at the top; this
operation is known as push. It retrieves data from the top; the operation is known as pop. If the stack
is empty then the pop operation raises an error, while the push operation cannot be performed if the stack
is full. Stack is shown in Figure 2.1.
a compartment. Linked lists in data structures follow a similar approach. An element can be inserted and
deleted from any position in the linked list. A list can be shown as in Figure 2.3.
There are other data structures that we will study in this course, for example graphs, trees etc. As we
have seen in the above examples, data structures store a given set of data according to certain rules.
These rules help in organizing the data.
You have studied programming languages before. This example will use some concepts from them to
explain why data structures are essential. Data can be stored in various ways. Usually the
representation chosen depends on the nature of the problem and not on the data. Consider a program
which requires storing the marks of five students for a single subject. The simplest way to store them will
be to use five integer variables. We assume that the roll numbers are from one to five. Now I wish to
write a program which can give me the marks of any roll number given as input. Is it possible to write
an efficient program to do this task if the numbers are stored as variables?
Figure 2.4: Marks Stored in five different Variables
The data, in this case the marks, is stored, but it cannot be reproduced as information efficiently. Now
we take an array of integers with five elements. The above data is stored in the array. The array will
have a name 'a' and each individual element can be accessed using an index. Therefore, the marks for the
first roll number will be stored as a[0], for the second roll number as a[1] and so on.
There is a relation between the roll number and the array index. Now it is much easier to access the
marks according to the roll number. This could not be achieved using variables, as there was no
relation between the data. It was not possible to relate marks and roll number.
The above example can be slightly modified so that now we will store the marks of three subjects per
roll number. We can take three independent arrays to store them. We can access the individual marks
for each roll number from the respective arrays. We find that the arrays contain marks of different
subjects for the same roll number. It would be easier to handle the marks if these three arrays were
grouped together.
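One way of grouping the three arrays is a two-dimensional array indexed by roll number and subject. The sketch below is an illustration only: the constant names, the function marksOf and the marks themselves are made up for this example.

```cpp
#include <cassert>

const int STUDENTS = 5;   // roll numbers 1..5
const int SUBJECTS = 3;   // subjects 1..3

// marks[r][s] holds the marks of roll number r + 1 in subject s + 1
// (the values are invented for illustration).
int marks[STUDENTS][SUBJECTS] = {
    {56, 67, 78}, {45, 81, 60}, {90, 72, 88}, {33, 49, 51}, {70, 64, 59}
};

// The roll number and subject translate directly into array indices.
int marksOf(int rollNumber, int subject)
{
    assert(rollNumber >= 1 && rollNumber <= STUDENTS);
    assert(subject >= 1 && subject <= SUBJECTS);
    return marks[rollNumber - 1][subject - 1];
}
```

The relation between roll number and index is the same as with the single array a; the second index simply selects the subject.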
top and an element can be removed only from the top of the stack. It does not specify how these rules
should be implemented. It specifies what the requirements of a stack are.
The implementation of a data structure is done with the help of algorithms. An algorithm is a logical
sequence of discrete steps that describe a complete solution to a given problem in a finite amount of
time. A task can be carried out in various ways, hence there is more than one algorithm which can be
used to implement a given data structure. This means that we will have to analyze algorithms to select
the one best suited for the application.
Now, this size cannot be changed while running the program. This, as we all know, is static allocation.
When writing the program, we have to decide on the maximum amount of memory that would be
needed. If we run the program on a small collection of data, then much of the space will go to waste. If
the program is run on a bigger collection of data, then we may exhaust the space and encounter an
overflow. Consider the following example:
Example: Suppose we define an array of size 5. If we store 5 elements in it, it is said to be full and no
space is left in it. On the contrary, if we store 2 elements in it, then 3 positions are empty and virtually
useless, resulting in wastage of memory.
Figure 2.7
Dynamic data structures can avoid these difficulties. The idea is to use dynamic memory allocation.
We allocate memory for individual elements as and when they are required. Each memory location
contains a pointer to the location where the successive element is stored. A pointer or a link or a
reference is a variable, which stores the memory address of some other variable. If we use pointers to
locate the data in which we are interested, then we need not worry about where the data is actually
stored, since by using a pointer, we can let the computer system itself locate the data when required.
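The idea of allocating elements one at a time and reaching them through pointers can be sketched as follows. The names Element, prepend, valueAt and release are assumptions introduced for this illustration.

```cpp
#include <cassert>

struct Element {
    int value;
    Element* next;     // address of the successive element, or nullptr
};

// Allocates one element at run time and links it in front of "rest".
Element* prepend(int value, Element* rest)
{
    return new Element{value, rest};
}

// Follows the pointers to reach the element at position index (0-based).
int valueAt(const Element* first, int index)
{
    while (index > 0 && first != nullptr) {
        first = first->next;
        --index;
    }
    assert(first != nullptr);   // index must lie within the list
    return first->value;
}

// Returns every element to the system once it is no longer needed.
void release(Element* first)
{
    while (first != nullptr) {
        Element* next = first->next;
        delete first;
        first = next;
    }
}
```

Memory is requested only when prepend is called, so no space is reserved in advance and none is wasted on unused positions.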
Linked lists use the concept of dynamic memory allocation. In this respect, they are different from
arrays. Every node in a linked list contains a 'link' to the next node as shown below. This link is
achieved by using pointers.
List ADT Specification
Value Definition: The value definition of a linked list contains a data type for storing the value of the
node along with the pointer to the next node. The value can be represented using a simple data type or
a collection of basic data types. However, it must necessarily contain at least one pointer to the next
structure. This can be shown as follows:
struct datatype
{
    int item;
    struct datatype *next;
};
or,
struct datatype
{
    int item;
    float info;
    char str;
    struct datatype *next;
};
Definition clause: The nodes of the list are all of the same type, and have a key field called key. The list
is logically ordered from the smallest unique element of key to the largest value, i.e. at any position the
key of the element is greater than that of its predecessor and smaller than that of its successor.
Operations:
1. Crlist:
Function: creates a list and initializes it as empty.
Preconditions: none.
2. Insert:
Function: inserts new element into the list either at the beginning, in the middle or at the end.
Preconditions: a list already exists.
Postconditions: list is returned with the new element inserted in it.
3. Delete:
Function: searches a list for the element and removes the element from the list.
Preconditions: the list already exists.
Postconditions: list elements are printed in the order they are present in the list. List remains
unchanged.
5. Modify:
Function: searches for an element and replaces it with a new value.
Preconditions: the list already exists.
Postconditions: the element if present is modified by a new value.
These are the basic set of operations that might be needed to create and maintain a list of elements.
Other operations, which can be performed on linked lists, are:
1. Counting the elements in a list.
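A few of these operations can be sketched on a list kept in increasing order of unique keys, as the definition clause requires. The names Node, insertKey, removeKey and countNodes are assumptions for this illustration, and an empty list is represented simply by a null head pointer.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* next;
};

// Insert: places key at the beginning, in the middle or at the end so that
// the list stays sorted. A key that is already present is ignored.
Node* insertKey(Node* head, int key)
{
    if (head == nullptr || key < head->key)
        return new Node{key, head};              // new first node
    Node* cur = head;
    while (cur->next != nullptr && cur->next->key < key)
        cur = cur->next;                         // find the predecessor
    if (cur->key != key && (cur->next == nullptr || cur->next->key != key))
        cur->next = new Node{key, cur->next};
    assert(head != nullptr);                     // the list is non-empty here
    return head;
}

// Delete: searches for key and unlinks its node if it is present.
Node* removeKey(Node* head, int key)
{
    if (head == nullptr)
        return nullptr;
    if (head->key == key) {
        Node* rest = head->next;
        delete head;
        return rest;
    }
    Node* cur = head;
    while (cur->next != nullptr && cur->next->key != key)
        cur = cur->next;
    if (cur->next != nullptr) {
        Node* victim = cur->next;
        cur->next = victim->next;
        delete victim;
    }
    return head;
}

// Counting the elements in a list.
int countNodes(const Node* head)
{
    int n = 0;
    for (; head != nullptr; head = head->next)
        ++n;
    return n;
}
```

Each function returns the (possibly new) head pointer, so a caller writes head = insertKey(head, 5); and so on.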
2.2.2 Implementation
Each element of the list is called a node and consists of two or more members. Some members can
contain the information pertaining to that node and the others may be pointers to other nodes. In case of
a singly linked list, one member consists of such a pointer. A linked list is therefore a collection of
structures ordered not by their physical placement in memory but by logical links that are stored as part
of data in the structure itself. The link is in the form of a pointer to another structure of the same type.
Figure 2.9: Node
Such structures, which contain a member field that points to the same structure type, are called
self-referential structures. A node may be represented in general form as follows:
struct label-name
{
    type member1;
    type member2;
    type member3;
};
The node may contain more than one item with different data types. However, one of the items must
be a pointer of the type label-name. The above node with all its members can be depicted as follows:
Consider a simple example to understand the concept of linking. Suppose we define a structure as
follows:
struct list
{
    int value;
    struct list *next;
};
Assume that the list contains two nodes viz. node1 and node2. They are of type struct list and are
defined as follows:
struct list node1, node2;
This statement creates space for two nodes each containing two empty fields as shown below:
Figure 2.11: Creation of the two Nodes
The next pointer of node1 can be made to point to node2 by the statement
node1.next = &node2;
This statement stores the address of node2 into the field node1.next and thus establishes a "link"
between node1 and node2 as shown below:
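The linking statement can be tried out in a few lines. In the sketch below the wrapper function linkedValue and the value 30 stored in node1 are assumptions made for illustration; the structure and the two nodes follow the declarations given in the text.

```cpp
#include <cassert>
#include <cstddef>

struct list {
    int value;
    struct list *next;
};

// Builds the two-node chain and returns the value reached by following
// the link stored in node1.
int linkedValue()
{
    struct list node1, node2;
    node1.value = 30;              // illustrative value
    node2.value = 40;
    node1.next = &node2;           // link: node1 now points to node2
    node2.next = NULL;             // node2 is the last node
    assert(node1.next == &node2);  // the field holds the address of node2
    return node1.next->value;      // reaches node2.value through the link
}
```

The expression node1.next->value reads the value field of node2 without naming node2 at all, which is the whole point of the link.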
node2.value = 40;
The result is as follows:
int coef;
int expo;
};
Polynomial nodes will be drawn as below:
For example, the polynomial a : 3x1a + 2xtc + 3xa will be stored as:
In order to add two polynomials together, we examine their starting terms. Two pointers are used to
move along the two polynomials. If the exponents of two terms are equal, then the coefficients are
added and a new term created for the result. If the exponents are not equal, then the term with the bigger
exponent is attached to the resultant polynomial.
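The two-pointer scheme just described can be sketched compactly. To keep the example short it stores each polynomial in a std::vector of terms sorted by decreasing exponent rather than in a linked list, which is an assumption of this sketch; the names Term and addPoly are likewise invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Term {
    int coef;
    int expo;
};

// Adds two polynomials whose terms are sorted by decreasing exponent.
std::vector<Term> addPoly(const std::vector<Term>& a, const std::vector<Term>& b)
{
    std::vector<Term> result;
    std::size_t i = 0, j = 0;                 // the two "pointers"
    while (i < a.size() && j < b.size()) {
        if (a[i].expo == b[j].expo) {         // equal exponents: add coefficients
            result.push_back({a[i].coef + b[j].coef, a[i].expo});
            ++i;
            ++j;
        } else if (a[i].expo > b[j].expo) {   // bigger exponent is attached first
            result.push_back(a[i++]);
        } else {
            result.push_back(b[j++]);
        }
    }
    while (i < a.size()) result.push_back(a[i++]);   // leftover terms of a
    while (j < b.size()) result.push_back(b[j++]);   // leftover terms of b
    assert(result.size() <= a.size() + b.size());    // never more terms than the inputs
    return result;
}
```

For example, adding 3x^4 + 2x^2 + 1 and 5x^3 + 4x^2 gives the terms (3, 4), (5, 3), (6, 2) and (1, 0) as coefficient and exponent pairs.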
Following is the program and algorithm for the above problem:
/* this program adds two polynomials using linked lists */
Algorithm
Step 1: Start.
Step 2: Ask the user to enter the first polynomial.
Step 3: Ask the user to enter the value of coefficient and exponent, or '0,0' to terminate the polynomial.
Step 4: Ask the user to enter the second polynomial.
Step 5: Repeat step 3 for the second polynomial.
Step 6: If the user enters a value greater than 0, then match the values of the different exponents entered
by the user and then perform addition of the coefficients with similar exponents.
Step 7: Display the results after the polynomial addition.
Step 8: End.
#include <stdlib.h>
#include <conio.h>
#include <iostream.h>
#include <stdio.h>
#define TRUE 1
#define FALSE 0
#define MAX 10
int array[MAX];
class polyadd
{
protected:
    struct link_list
    {
        int coef;
        int expo;
        struct link_list *next;
    };
    typedef struct link_list node;
public:
    void crpoly(node *list);
    void padd(node *list1, node *list2, node *list3);
    void print(node *list);
};
void polyadd::crpoly(node *list)
{   // read one polynomial, term by term, into the nodes after list
    int x, y;
    cout << endl << "Input a pair of numbers as 'coef, expo':";
    cout << endl << "input '0, 0' to stop entering";
    cout << endl << "input the coefficient ";
    cin >> x;
    cout << endl << "input the exponent ";
    cin >> y;
    if (x == 0 && y == 0)
    {
        list->next = NULL;           // '0, 0' terminates the polynomial
    }
    else
    {   // any other pair is stored as a new term
        list->next = new node();
        list->next->coef = x;
        list->next->expo = y;
        crpoly(list->next);
    }
}
void polyadd::padd(node *list1, node *list2, node *list3)
{   // add two polynomials; each new term is attached after list3
    if (list1 != NULL || list2 != NULL)
    {
        if (list1 == NULL)
        {   // only list2 has terms left
            list3->next = new node();
            list3->next->next = NULL;
            list3->next->coef = list2->coef;
            list3->next->expo = list2->expo;
            padd(list1, list2->next, list3->next);
        }
        else if (list2 == NULL)
        {   // only list1 has terms left
            list3->next = new node();
            list3->next->next = NULL;
            list3->next->coef = list1->coef;
            list3->next->expo = list1->expo;
            padd(list1->next, list2, list3->next);
        }
        else if (list1->expo == list2->expo)
        {   // equal exponents: add the coefficients
            list3->next = new node();
            list3->next->next = NULL;
            list3->next->coef = list1->coef + list2->coef;
            list3->next->expo = list1->expo;
            padd(list1->next, list2->next, list3->next);
        }
        else if (list1->expo > list2->expo)
        {   // term of list1 has the bigger exponent
            list3->next = new node();
            list3->next->next = NULL;
            list3->next->coef = list1->coef;
            list3->next->expo = list1->expo;
            padd(list1->next, list2, list3->next);
        }
        else
        {   // term of list2 has the bigger exponent
            list3->next = new node();
            list3->next->next = NULL;
            list3->next->coef = list2->coef;
            list3->next->expo = list2->expo;
            padd(list1, list2->next, list3->next);
        }
    }
    return;
}
void polyadd::print(node *list)
{
    if (list->next != NULL)
    {
        cout << endl << "Coefficient is " << list->next->coef;
pa->crpoly(head1);
Value definition: A stack can contain anything of the type its implementing data structure is defined for,
i.e. integers, characters, complex records etc.
Definition clause: A stack, as explained, is a list of elements in which the item added most recently is
taken out first, i.e. the Last item In is the First one Out. Therefore, a stack can be defined as a LIFO
list of elements.
Operations:
1. Create:
2. Push:
5. Full:
Function: tests whether stack is full.
Precondition: stack is created.
Postcondition: answer as yes or no depending on the status of the stack; no change in stack
contents.
6. Destroy:
Function: removes all elements from stack, leaving the stack empty.
Precondition: stack is created.
Postcondition: stack is empty.
These are the specifications for some of the operations which are commonly performed on ADT
stacks. The reader can think of more such operations depending on a particular situation and requirement
and can write the ADT for them on the same lines.
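The specifications above can be sketched as a small class whose operations check their preconditions. This is only an illustration: the name BoundedStack, the int element type and the use of std::vector for storage are assumptions made for the example, not part of the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity stack of ints that mirrors the ADT operations.
class BoundedStack {
public:
    // Create: the stack starts empty with room for `capacity` items.
    explicit BoundedStack(std::size_t capacity) : capacity_(capacity) {}

    bool empty() const { return items_.empty(); }

    // Full: answers yes or no; the contents are not changed.
    bool full() const { return items_.size() == capacity_; }

    // Push. Precondition: the stack is not full.
    void push(int value) {
        assert(!full());
        items_.push_back(value);
    }

    // Pop. Precondition: the stack is not empty.
    int pop() {
        assert(!empty());
        int value = items_.back();
        items_.pop_back();
        return value;
    }

    // Destroy. Postcondition: the stack is empty.
    void destroy() { items_.clear(); }

private:
    std::size_t capacity_;
    std::vector<int> items_;
};
```

Stating each precondition as an assertion keeps the ADT contract visible in the code; a production class might report the error to the caller instead.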
2.5 IMPLEMENTATION
A stack can be implemented using both static and dynamic implementations, i.e. the space can be
allocated at compile time itself or at the execution time of the program. Each implementation has its
own advantages and disadvantages, which we will consider later.
Step 5: Start the pop operation and display the status of the stack after
each pop operation.
Step 6: Continue with the pop operation until the stack is empty.
Step 7: End.
Program
To implement a stack using arrays.
#include <iostream>
#include <cstdlib>
using std::cout;
class IntStack
{
protected:
    int *s;         // storage for the elements
    int top;        // index of the next free slot
    int maxelem;    // capacity of the stack
    int count;      // number of elements currently stored
public:
    IntStack(int num)
    {
        top = 0;
        maxelem = num;
        s = new int[maxelem];
        count = 0;
    }
    int push(int t)
    {
        if (top == maxelem)     // stack is full
            return maxelem;
        s[top++] = t;
        count++;
        return count;
    }
    int pop()
    {
        if (top <= 0)           // stack is empty
        {
            return (-1);
        }
        top = top - 1;
        count--;
        cout << "top element is " << s[top];
        return (s[top]);
    }
    void display_pop()
    {
        if (top <= 0)
        {
            cout << "(empty)\n";
            return;
        }
        for (int t = top - 1; t >= 0; t--)
            cout << s[t] << " ";
    }
};
Algorithm
Step 1: Start.
Step 2: Declare the structure of the linked list.
Step 3: Insert elements through the top of the linked list and increment the top position by 1 after each
insertion.
Step 4: Insert the elements into the list until it is full.
Step 5: Perform the pop operation by decrementing the top position by 1 after each pop.
Step 6: Display the list after every pop operation until the list is empty.
Step 7: End.
Program
#include <iostream>
#include <cstdlib>
using std::cout;
using std::endl;
struct node
{
    int data;
    node *link;
};
class stack
{
private:
    node *top;
public:
    stack()
    {
        top = NULL;
    }
    void push(int item);
    int pop();
    ~stack()
    {
        if (top == NULL)
            return;
        node *temp;
        while (top != NULL)
        {
            temp = top;
            top = top->link;
            delete temp;
        }
    }
};
void stack::push(int item)
{
    node *temp = new node;
    temp->data = item;
    temp->link = top;       // the new node points to the old top
    top = temp;
}
int stack::pop()
{
    if (top == NULL)        // nothing to pop
    {
        cout << endl << "Stack is empty";
        return -1;
    }
    node *temp;
    int item;
    temp = top;
    item = temp->data;
    top = top->link;
    delete temp;
    return item;
}
int main()
{
    stack s;
    s.push(11);
    s.push(12);
    s.push(13);
    s.push(14);
    s.push(15);
    int i = s.pop();
    cout << endl;
    cout << "Item popped=" << i << endl;
    i = s.pop();
    cout << "Item popped=" << i << endl;
    i = s.pop();
    cout << "Item popped=" << i << endl;
    i = s.pop();
    cout << "Item popped=" << i << endl;
    i = s.pop();
    i = s.pop();
    return 0;
}
elements actually present in the stack at run time. But the elements are larger, since we must store the
link (the next field) along with the user's data.
Apart from the space requirement, the two implementations can be compared on other criteria also,
for example, programming effort and program complexity. We can compare the efficiency of the two
representations with respect to each other in terms of Big Oh notation.
3. Push or pop operation: In push or pop operations, the number of elements in the stack does not
affect the amount of work done by these operations, because in both operations we directly
access the top of the stack, i.e. only one element. Therefore, push and pop have
measure O(1).
4. Destroy operation: Probably, this is the only operation amongst the basic ones which differs from
one implementation to the other. In the array version, we just have to set the top field to zero, so it is
an O(1) operation. But in the linked version, we must process every node in the stack to free the
node space. Therefore, the operation is O(n), where n is the number of nodes in the stack.
In all, the two implementations are almost equivalent in terms of the amount of work they do. Since the
destroying operation is not widely used, the difference is not significant. Table 2.1 summarizes the
Big Oh measures of various operations on the two implementations.
Table 2.1: Big Oh Measure of Common Stack Operations
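The difference in the destroy operation can be made concrete with a small sketch. The struct names ArrayStack and LinkedStack, the capacity of 100 and the returned count of freed nodes are assumptions introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int data;
    Node *link;
};

// Array version: destroying only resets the top index, whatever the size.
struct ArrayStack {
    static const int MAX = 100;
    int items[MAX];
    int top = 0;

    void push(int v) {
        assert(top < MAX);
        items[top++] = v;
    }
    void destroy() { top = 0; }   // a single assignment: O(1)
};

// Linked version: every node has to be visited and freed.
struct LinkedStack {
    Node *top = nullptr;

    void push(int v) { top = new Node{v, top}; }

    // Returns the number of nodes freed, which grows with the stack: O(n).
    std::size_t destroy() {
        std::size_t freed = 0;
        while (top != nullptr) {
            Node *temp = top;
            top = top->link;
            delete temp;
            ++freed;
        }
        return freed;
    }
};
```

Push and pop touch only the top element in both versions, so the destroy loop is the one place where the amount of work depends on the number of elements.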
Examples: A stack is an appropriate data structure when information must be saved and then later
retrieved in reverse order. Any situation that requires storing a previous step and coming back to it in
future may be a good one to use a stack. Following are some examples using a stack as their data structure.
Reversing an Input Text Line
Brief
As a simple example of using stacks, let us try to make a function that will read a line of input and will
then write it out backward. We can accomplish this task by pushing each character onto the stack as it
is read. When we come to the end of the input, we will pop characters off the stack and they will come off
in the reverse order.
Program
To reverse an input text line.
#include <stdio.h>
#include <stdlib.h>
#define MAX 10 /* defining stack size */
main() /* main starts here */
{
int top = 0;
char stack[MAX], c; /* declaring a character stack */
clrscr();
top--;
}
}
/* main ends here */
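The idea of the program, pushing every character and then popping until the stack is empty, can be sketched in a compact, self-contained form. The function name reverse_line and the use of std::stack and std::string are assumptions made for this illustration; they are not taken from the listing above.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Hypothetical helper: returns `line` written backward using a stack.
std::string reverse_line(const std::string &line) {
    std::stack<char> chars;
    for (char c : line)          // push each character as it is read
        chars.push(c);

    std::string out;
    while (!chars.empty()) {     // pop until empty: last in, first out
        out += chars.top();
        chars.pop();
    }
    assert(out.size() == line.size());
    return out;
}
```

For instance, reverse_line("stack") yields "kcats". Unlike a fixed array of MAX characters, std::stack grows as needed, so the sketch does not have to test for a full stack.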
Validating an Expression by Parenthesis Matching
Brief
Consider a mathematical expression that includes several sets of nested parentheses, e.g.
(a-((b+c(d))))
We want to make sure that the parentheses are nested correctly, i.e.
1. We want to check that there are equal numbers of right and left parentheses.
2. Every right parenthesis is preceded by a matching left parenthesis.
Expressions such as
((a+b)
violate condition 1, and expressions such as
)a+b(c
violate condition 2.
Before writing the actual code for the program, let us write an algorithm for it and try to understand
the logic.
Algorithm
Step 1: Start.
Step 2: Declare a character array to store opening braces.
Step 3: Start accepting the expression.
Step 4: If the character is an opening brace, e.g. '(' or '{' or '[', push it onto the stack. If there are successive
opening braces, keep pushing them on the stack.
Step 5: If the character is a closing brace, e.g. ')' or '}' or ']',
1. Check if the stack is empty.
2. If the stack is empty, it means there was no corresponding opening brace for the closing brace.
Therefore, the expression is invalid.
3. If the stack is not empty, pop an element from the stack.
4. If the popped opening brace corresponds to the closing brace, then the expression
is valid so far.
5. Else the expression is invalid.
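The steps above can be sketched as a single function. The name braces_balanced and the use of std::stack are assumptions made for this illustration; the sketch also treats any opening brace left on the stack at the end of the input as an error, which the steps imply but do not spell out.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Returns true when every closing brace matches the most recent
// unmatched opening brace and no opening brace is left over.
bool braces_balanced(const std::string &expr) {
    std::stack<char> open;
    for (char c : expr) {
        if (c == '(' || c == '{' || c == '[') {
            open.push(c);                       // Step 4
        } else if (c == ')' || c == '}' || c == ']') {
            if (open.empty())                   // Step 5, cases 1 and 2
                return false;
            char o = open.top();                // Step 5, case 3
            open.pop();
            bool match = (o == '(' && c == ')') ||
                         (o == '{' && c == '}') ||
                         (o == '[' && c == ']');
            if (!match)                         // Step 5, case 5
                return false;
        }
    }
    return open.empty();
}
```

With this sketch, "(a-((b+c(d))))" is accepted, "((a+b)" is rejected because an opening brace is left over, and ")a+b(c" is rejected at its first character.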
#define MAX 10
class stk
{
public:
    char stack[MAX];
    char c, ele;
    int top;
    stk()
    {
        top = 0;
    }
    int push(char c)
    {
        stack[top] = c;
        return ++top;
    }
    char pop()
    {
        ele = stack[top-1];
        stack[top-1] = 0;
        top--;
        return ele;
    }
};
void main()
{
    clrscr();
    stk s;
    int flag = 1, top;
    int tFlag = 0;
    char ret;
    char c;
    system("clear");
    else
    {
        flag = 0;
        break;
    }
    }
    }
    }
    if (s.stack[0] != 0)
    {
        flag = 0;
    }
    if (flag == 1)
    { cout << "Expression is valid"; }
    else if (flag == 0)
    { cout << "Expression is invalid."; }
    getche();
}
We have tried to keep the above program as simple as possible, because our aim was to illustrate the
concept of stacks and not to make an extensive expression evaluator. This is why we have made
some assumptions, which are as follows:
Any expression is to be contained in brackets, e.g. the following expression will not work with the
above program:
a+(b+c) - d
Rather, it has to be in the following format:
(a+ (b+c) - d)
The reason for using a stack in this problem should be clear. The last parenthesis to be opened should
be the first one to be closed. This is simulated by a stack in which the last element to arrive is the first
to leave. Each element on the stack represents a parenthesis that has been opened but has not yet been
closed. Pushing an item onto the stack corresponds to an opening brace and popping an item from the
stack corresponds to closing a parenthesis.
Figure 2.14
Figure 2.14 depicts the contents of the stack at various stages of processing the expression:
{a+ (b-c)}
Postfix Expression Evaluation
Brief
The sum of 2 and 3 is normally represented as 2+3. This is called infix notation. The same sum can be
represented as +23, which is called prefix notation, and 23+, which is called postfix notation.
The prefixes "in", "pre" and "post" refer to the relative position of the operator with respect to the
two operands. In prefix notation, the operator is before the two operands. In infix notation, the operator is in
between the two operands. In postfix notation, the operator comes immediately after the two operands.
The reader should gather more information on various notations in the relevant literature.
Postfix notation has some obvious advantages over the most commonly used infix notation.
1. Need for parenthesis is eliminated.
2. Knowledge of operator precedence is not required.
We can try to develop a program for evaluating a postfix expression. Each operator encountered refers
to the previous two operands. Each time we come across an operand we push it on to a stack. When
we reach an operator, its operands will be the two top elements on the stack. We can pop these two
elements, perform the operation on them and push back the result on to the stack.
It is then available for use with the next operator. The following program evaluates a postfix
expression using this method. But let us write an algorithm first.
Algorithm
Step 1: Start.
1. '+', then perform addition between the values and store as result.
2. '-', then perform subtraction between the values and store as result.
3. '*', then perform multiplication between the values and store as result.
4. '/', then perform division between the values and store as result.
Step I: Push the result obtained from the expression and store it on the top position of the stack.
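The method described above, pushing operands and applying each operator to the two top elements, can be sketched as follows. The function name eval_postfix is hypothetical, and the sketch assumes single-digit operands, integer division and a well-formed expression; other characters such as spaces are skipped.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <string>

// Evaluates a postfix expression made of single-digit operands and + - * /.
int eval_postfix(const std::string &expr) {
    std::stack<int> values;
    for (char c : expr) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            values.push(c - '0');              // operand: push it
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            assert(values.size() >= 2);        // an operator needs two operands
            int right = values.top(); values.pop();
            int left = values.top(); values.pop();
            int result = 0;
            switch (c) {
                case '+': result = left + right; break;
                case '-': result = left - right; break;
                case '*': result = left * right; break;
                case '/': result = left / right; break;
            }
            values.push(result);               // available to the next operator
        }
    }
    assert(values.size() == 1);                // exactly one value must remain
    return values.top();
}
```

Note that the element popped first is the right-hand operand, which matters for subtraction and division: "93/" means 9 / 3, not 3 / 9.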
Stack for Sub-programs
This is one of the most important applications of stacks. What happens within the computer when
sub-programs are called? The system (or the program) must remember the place where the call was
made, so that it can return there after the sub-program is complete. It must also remember all the local
variables, CPU registers, and other data, so that information is not lost while the sub-program is
working. We can think of all this information as one large structure, a temporary storage area for each
sub-program.
Suppose that we have 3 sub-programs called A, B, and C, and suppose that A invokes B and B invokes
C. Then B has not completed its work until C has finished and returned. Similarly, A is the first to
start work, but it is the last to be finished, not until after B has finished and returned. Thus the
sequence in which this activity proceeds is summed up as the property last in, first out. If we consider
the machine's task of assigning temporary storage areas for use by sub-programs, then those areas
would be allocated in a list with this same property, that is, in a stack.
The example is represented in the Figure 2.15.
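The order of completion can be observed directly in a small sketch. The three functions stand for the sub-programs A, B and C of the text; the global vector named finished is an assumption introduced only to record the order in which they return.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> finished;   // order in which sub-programs complete

void C() { finished.push_back("C"); }

void B() {
    C();                             // B cannot finish before C returns
    finished.push_back("B");
}

void A() {
    B();                             // A starts first but finishes last
    finished.push_back("A");
}
```

After a call to A(), the vector holds C, B, A: the sub-program entered last is the first to complete, which is the last in, first out property.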
There are various examples of queues in the real world. A line at a railway counter or a bus stop is a
familiar example of a queue. The person first in the queue will be the first to get the ticket. Similarly,
any new passenger will have to stand at the back of the queue.
To add elements to a queue we access the rear of the queue. To remove elements we access the front.
The middle elements are logically inaccessible, even if we physically store the queue elements in a
random access structure such as an array.
As is clear by now, there are two operations that can be applied to a queue. First, new elements are
added to the rear of the queue. We will call this operation enterq. We can also take the elements off the
front of the queue. We will call this operation exitq.
We are also required to check whether the queue contains anything or is empty.
Theoretically, we can always enter a queue, for in principle a queue is not limited in size. But
certain implementations, e.g. an array implementation, require us to check whether the data
structure is full before entering a new element. Therefore, we can also have an operation for this
purpose. Before doing anything, we also need to create a queue and initialize it to an empty state. Also,
we might want to delete all elements of the queue, leaving an empty structure.
Following is the ADT representation of some of the common operations that can be performed on
queues.
Value definition: A queue can contain anything of the type its implementing data structure is
defined for, i.e. integers, characters, complex records etc.
1. Create:
underflow.
front = rear = 0.
Figure 2.15: Various Stages in Array Implementation of Queues
A full queue is shown by rear, which is equal to the last storage section.
The following program shows insertion and deletion operations on a queue using an array. Before writing
the actual code, let us try to write an algorithm for the program.
Algorithm
Step 1: Start.
Step 2: Declare the structure of the queue.
Step 3: Insert the elements into the queue from the rear and display the status of the queue after each
insertion.
Step 4: Continue insertion until the queue is full.
Step 5: Initiate the pop operation by popping the elements from the front.
Step 6: Display the queue after each pop operation.
Step 7: End.
Program
To delete and insert from a queue.
// CREATION OF QUEUES USING ARRAYS
#include <iostream>
using std::cout;
using std::endl;
#define MAX 10
class queue
{
private:
    int arr[MAX];
    int front, rear;
public:
    queue()
    {
        front = -1;
        rear = -1;
    }
    void addq(int item)
    {
        if (rear == MAX-1)
        {
            cout << endl << "Queue is Full";
            return;
        }
        rear++;
        arr[rear] = item;
        cout << endl << "item added " << arr[rear] << endl;
        if (front == -1)
            front = 0;
    }
    int delq()
    {
        int data;
        if (front == -1)
        {
            cout << endl << "Queue is empty";
            return -1;      // no valid item to return
        }
        data = arr[front];
        if (front == rear)
            front = rear = -1;
        else
            front++;
        return data;
    }
};
int main()
{
    queue a;
    a.addq(11);
    a.addq(21);
    a.addq(31);
    a.addq(41);
    a.addq(51);
    int i = a.delq();
    cout << endl << "Item Deleted=" << i << endl;
    i = a.delq();
    cout << endl << "Item Deleted=" << i << endl;
    i = a.delq();
    cout << endl << "Item Deleted=" << i << endl;
    i = a.delq();
    cout << endl << "Item Deleted=" << i << endl;
    i = a.delq();
    cout << endl << "Item Deleted=" << i << endl;
    return 0;
}
The above design has a shortcoming. As we enter and delete elements from the queue, the front and
rear locations also shift forward, i.e. as we delete the first item from the queue, the second location
becomes the front. Therefore, we lose the first storage location for future storage. As we continue
entering and deleting elements, the total storage space available goes on decreasing. Since we are using
arrays as our basic data structure for queues, this can be a serious drawback.
One solution to the above problem can be to shift all remaining elements up every time we remove an
element from the queue. But this increases the overheads. To understand this, take a look at
Figure 2.18.
[Figure 2.18: queue contents with the front and rear positions marked, starting from the initial queue]
One way of rectifying this problem is to shift the rest of the elements of the queue one space up each
time the front element is deleted, as said above. But if a queue is large, this will require a lot of effort.
The decision to use this design depends on the final use to which the queue will be put. If the number
of elements to be stored in the queue is large, there will be a lot of processing required to move up all
the elements. On the other hand, if the queue generally contains only a few elements, this movement
may not be much of an overhead. Thus, the complete evaluation of the design depends on the intended
use of the program. We will see other implementations to rectify this shortcoming as we proceed.
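The shifting design discussed above can be sketched as follows. The class name ShiftQueue, the capacity of 10 and the member names are assumptions made for this illustration; the sketch keeps the front element at index 0 and moves the remaining elements up on every deletion.

```cpp
#include <cassert>

// Hypothetical array queue that keeps the front at index 0
// by shifting the remaining elements on every delete.
class ShiftQueue {
public:
    static const int MAX = 10;

    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == MAX; }

    void addq(int item) {
        assert(!full());
        arr_[count_++] = item;              // the rear is always at count_ - 1
    }

    int delq() {
        assert(!empty());
        int item = arr_[0];
        for (int i = 1; i < count_; ++i)    // move every remaining element up: O(n)
            arr_[i - 1] = arr_[i];
        --count_;
        return item;
    }

private:
    int arr_[MAX];
    int count_ = 0;
};
```

No storage location is lost, so the queue can be used indefinitely, but each delq costs time proportional to the number of elements stored. That is the overhead the text weighs against the size of the queue.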
[Figure: an existing queue, and deleting a node from the front]
The following program shows how to use linked lists to implement queues. Before writing the actual
code, let us write an algorithm for the program.
Algorithm
Step 1: Start.
Step 6: Display the status of the queue after each pop operation.
Step 7: End.
Program
#include <iostream>
#include <cstdlib>
#include <new>
using std::cout;
using std::endl;
struct node
{
    int data;
    node *link;
};
class queue
{
private:
    node *front, *rear;
public:
    queue()
    {
        front = rear = NULL;
    }
    void addq(int item)
    {
        node *temp;
        temp = new (std::nothrow) node;
        if (temp == NULL)       // no memory left for a new node
        {
            cout << endl << "Queue is Full";
            return;
        }
        temp->data = item;
        temp->link = NULL;
        if (front == NULL)
        {
            rear = front = temp;
            return;
        }
        rear->link = temp;
        rear = rear->link;
    }
    int delq()
    {
        if (front == NULL)
        {
            cout << endl << "Queue is Empty";
            return -1;          // no valid item to return
        }
        node *temp;
        int item;
        item = front->data;
        temp = front;
        front = front->link;
        if (front == NULL)      // the queue became empty
            rear = NULL;
        delete temp;
        return item;
    }
    ~queue()
    {
        if (front == NULL)
            return;
        node *temp;
        while (front != NULL)
        {
            temp = front;
            front = front->link;
            delete temp;
        }
    }
};
int main()
{
    queue a;
    a.addq(11);
    a.addq(21);
    a.addq(31);
    int i = a.delq();
    cout << endl << "The Item deleted=" << i;
    i = a.delq();
    cout << endl << "The Item deleted=" << i;
    i = a.delq();
    cout << endl << "The Item deleted=" << i;
    return 0;
}
Set ADT
A set is a collection of bindings. Each binding consists of a key and a value. A key uniquely identifies
its binding; a value is data that is somehow pertinent to its key. Programming systems use sets often.
For example, compilers and assemblers use sets to relate symbols to their meanings.
Set Interface
typedef struct {
    float real;
    float imag;
} COMPLEX;
COMPLEX makecomplex(float, float);
Second implementation has advantages of dynamic allocation of space; modifying a string also may be
more efficient, as we needn't recalculate its size.
Check Your Progress
Define the following:
1. Stack
2. Linked Implementation
A stack is an ordered list in which all insertions and deletions are done at one end, called the top. A
queue is an ordered list in which all insertions take place at one end, the rear, while all deletions take
place at the other end, the front. Unlike an array, the definition of stacks and queues provides for the
insertion and deletion of items. So, stacks and queues are dynamic, constantly changing objects.
Queues provide FIFO storage as opposed to LIFO storage provided by stacks. A stack is a dynamic
structure, i.e. it changes as elements are added to and removed from it. The operation that adds an
element to the top of a stack is usually called PUSH and the operation that takes the top element off
the stack is called POP. Even though the representation of the stack may be a random-access structure
such as an array, the stack itself as a logical entity is not randomly accessed. Stacks and queues can be
implemented using arrays as well as using linked lists. As stated in earlier lessons, an array variable of
MAX stack size will take the same amount of memory, no matter how many array slots are actually
used. Therefore, we need to reserve space for the maximum possible. On the other hand, the linked
implementation using dynamically allocated storage space only requires space for the number of
elements actually present in the stack at run time. But the elements are larger since we must store the
link (the next field) along with the user's data. A stack is an appropriate data structure when information
must be saved and then later retrieved in reverse order. Any situation requiring to backtrack to
some earlier position may be a good one to use a stack. There are 2 operations that can be applied to a
queue. First, new elements are added to the rear of the queue. We will call this operation enterq. We
can also take the elements off the front of the queue. We will call this operation exitq. The result of an
illegal attempt to remove an element from an empty data structure is called underflow.
2.11 KEYWORDS
Stack: Stack is a data structure which stores data at the top.
Jean-Paul Tremblay, Paul G. Sorenson, Introduction to Data Structures with Application, McGraw Hill Book
Company
Richard F. Gilberg, et al., Data Structures - A Pseudocode Approach with C, First Edition, Thomson, 2002
Kutti, N.S., Padhye, P.Y., Data Structures in C++, 2nd ed., Prentice Hall, 2000
Robert Sedgewick, Algorithms in C++, 3rd ed., Addison Wesley, 1999
3
TREES
CONTENTS
3.0 Aims and Objectives
3.1 Introduction
3.2 Trees
3.2.1 Degree of Node of a Tree
3.2.2 Degree of a Tree
3.2.3 Level of a Node
3.3 N-ary Tree
3.3.1 Binary Tree
3.3.2 Full and Complete Binary Tree
3.3.3 Representations in Contiguous Memory
3.4 Linked Tree Representation
3.5 Binary Tree Traversal
3.5.1 Order of Traversal of Binary Tree
3.5.2 Procedure for Inorder Traversal
3.5.3 Preorder Traversal
3.5.4 Postorder Traversal
3.6 Binary Search Tree
3.6.1 Creating a Binary Search Tree
3.6.2 Deletion of a Node from Binary Search Tree
3.6.3 Deletion of a Node with two Children
3.6.4 Deletion of a Node with one Child
3.6.5 Deletion of a Node with no Child
3.6.6 Searching for a Target Key in a Binary Search Tree
3.6.7 An Application of a Binary Search Tree
3.7 AVL Trees
3.8 Let us Sum up
3.9 Keywords
3.10 Questions for Discussion
3.11 Suggested Readings
3.1 INTRODUCTION
While dealing with many problems in computer science, engineering and many other disciplines, it is
necessary to impose a hierarchical structure on a collection of data items. For example, we need to
impose a hierarchical structure on a collection of data items while preparing organisational charts and
genealogies, or to represent the syntactic structure of source programs in compilers. A tree is a data
structure that is used to model such a hierarchical structure of data items, hence the study of the tree as
one of the data structures is important. This module discusses the tree as a data structure.
3.2 TREES
A tree is a set of one or more nodes T such that
This is a tree because it is a set of nodes {A, B, C, D, E, F, G, H, I}, with node A as a root node, and
the remaining nodes are partitioned into three disjoint sets: {B, G, H, I}, {C, E, F} and {D}
respectively. Each of these sets is a tree individually because each of these sets satisfies the above
properties. Shown below in Figure 3.2 is a structure which is not a tree:
Figure 3.2: A Non-tree Structure
This is not a tree because it is a set of nodes {A, B, C, D, E, F, G, H, I}, with node A as a root node,
but the remaining nodes cannot be partitioned into disjoint sets, because the node I is shared.
Given below are some of the important definitions, which are used in connection with trees.
Node   Degree
A      3
B      3
C      2
D      0
E      0
F      0
G      0
H      0
I      0
Node   Level
A      1
B      2
C      2
D      2
E      3
F      3
G      3
H      3
I      3
For example, a complete binary tree with depth k = 3, having the number of nodes n = 5, can be
represented using an array of 5 as shown below in Figure 3.7.
A B C D E
Figure 3.7: An Array Representation of a Complete Binary Tree having 5 Nodes and Depth 3
Shown below in Figure 3.8 is another example of an array representation of a complete binary tree
with depth k = 3, having the number of nodes n = 4.
A B C D
Figure 3.8: An Array Representation of a Complete Binary Tree having 4 Nodes and Depth 3
In general any binary tree can be represented using an array. We see that an array representation of
a complete binary tree does not lead to the wastage of any storage. But if we want to represent a binary
tree which is not a complete binary tree using an array representation, then it leads to the wastage of
storage as shown in Figure 3.9.
[Figure 3.9: array representation of a binary tree that is not complete; the nodes A to I occupy only some of the array positions]
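The reason a complete tree wastes no storage, while other trees do, follows from the index arithmetic of the level-by-level layout. The sketch below assumes the usual convention that the root is stored at position 1 and that the root is at depth 1; the function names are hypothetical.

```cpp
#include <cassert>

// 1-based positions in the array of a binary tree stored level by level.
int left_child(int i)  { return 2 * i; }
int right_child(int i) { return 2 * i + 1; }

int parent(int i) {
    assert(i > 1);          // the root at position 1 has no parent
    return i / 2;
}

// Array slots needed to hold any binary tree of depth k
// (root at depth 1): 2^k - 1.
int slots_for_depth(int k) { return (1 << k) - 1; }
```

In the array A B C D E of Figure 3.7, B is at position 2 and its children D and E are at positions 4 and 5. By contrast, a tree of depth 3 in which every node has only a right child uses positions 1, 3 and 7, so 7 slots are reserved for 3 nodes.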
A tree representation using the above node structure is shown below in Figure 3.10.
Inorder: DBHEIAFCG
preorder: ABDEHICFG
postorder: DHIEBFGCA
Figure 3.11: A Binary Tree along with its Inorder, Preorder and Postorder
If an expression is represented as a binary tree then the inorder traversal of the tree gives us an infix
expression, whereas the postorder traversal gives us a postfix expression as shown below in Figure 3.12.
bd
Inorder:a+bx-c+d+
Postorder: abcx-+de'f +
Figure 3.12: A Binary Tree of an Expression along with its Inorder and Postorder
Given an order of traversal of a tree it is possible to construct a tree. For example consider the
following order:
Inorder - DBEAC
We can construct the binary trees shown below in Figure 3.13 using this order of traversal:
We can construct a unique binary tree shown in Figure 3.14 using these orders of traversal.
Figure 3.14: A Unique Binary Tree Constructed using the Inorder and Postorder
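One way to carry out such a construction from an inorder and a postorder sequence is sketched below. It relies on two facts: the last element of the postorder sequence is the root, and the root splits the inorder sequence into the left and right subtrees. The names TNode, build, build_tree and preorder are assumptions made for this illustration, node labels are assumed to be distinct characters, and the nodes are never freed.

```cpp
#include <cassert>
#include <string>

struct TNode {
    char data;
    TNode *lchild;
    TNode *rchild;
};

// Rebuilds the subtree whose inorder sequence is inorder[lo, hi),
// consuming the postorder sequence from its back.
TNode *build(const std::string &inorder, const std::string &postorder,
             int lo, int hi, int &post_idx) {
    if (lo >= hi)
        return nullptr;
    char root = postorder[post_idx--];      // last unused postorder entry
    int mid = lo;
    while (mid < hi && inorder[mid] != root)
        ++mid;                              // locate the root in the inorder range
    assert(mid < hi);
    TNode *node = new TNode{root, nullptr, nullptr};
    // Reading postorder backward meets the right subtree first.
    node->rchild = build(inorder, postorder, mid + 1, hi, post_idx);
    node->lchild = build(inorder, postorder, lo, mid, post_idx);
    return node;
}

TNode *build_tree(const std::string &inorder, const std::string &postorder) {
    int post_idx = static_cast<int>(postorder.size()) - 1;
    return build(inorder, postorder, 0,
                 static_cast<int>(inorder.size()), post_idx);
}

// Preorder listing, used to check the rebuilt tree.
void preorder(const TNode *p, std::string &out) {
    if (p == nullptr)
        return;
    out += p->data;
    preorder(p->lchild, out);
    preorder(p->rchild, out);
}
```

Applied to the sequences of Figure 3.11 (inorder DBHEIAFCG, postorder DHIEBFGCA), the rebuilt tree has the preorder ABDEHICFG, which agrees with the figure.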
printf(p->data);
inorder(p->rchild);
A non-recursive/iterative procedure for traversing a binary tree in inorder is given below for the
purpose of doing the analysis.
void inorder(tnode *p)
{
    tnode *stack[100];
    int top;
    top = 0;
    if (p != NULL)
    {
        top = top + 1;
        stack[top] = p;
        p = p->lchild;
        while (top > 0)
        {
            while (p != NULL)
            {   /* push the left child onto the stack */
                top = top + 1;
                stack[top] = p;
                p = p->lchild;
            }
            p = stack[top];
            top = top - 1;
            printf(p->data);
            p = p->rchild;
            if (p != NULL)
            {   /* push right child */
                top = top + 1;
                stack[top] = p;
                p = p->lchild;
            }
        }
    }
}
Analysis
Consider the iterative version of the inorder given above. If the binary tree to be traversed has n
nodes, then the number of nil links is n + 1. Since every node is placed on the stack once, the
statements stack[top] = p and p = stack[top] are executed n times. The test for a nil link will be done
exactly n + 1 times. So every step will be executed no more than some small constant times n, hence
the order of the algorithm is O(n). Similar analysis can be done to obtain the estimate of the computation
time for preorder and postorder.
{
    postorder(p->lchild);
    postorder(p->rchild);
    printf(p->data);
}
Consider the following example.
Given the preorder and inorder traversals of a binary tree, draw the tree and write down its postorder
traversal.
Figure 3.15: A Unique Binary Tree Constructed Using the Inorder and Postorder
The post order for this tree is:
Z, A,P, X, B, C, Y, Q
The following function counts the number of leaf node in a binary tree.
int count(tnode *p)
{
    if (p == NULL)
        return 0;
    else if ((p->lchild == NULL) && (p->rchild == NULL))
        return 1;       /* a leaf node */
    else
        return count(p->lchild) + count(p->rchild);
}
The following procedure swaps the left and the right child of every node of a given binary tree.
void swaptree(tnode *p)
{
    tnode *temp;
    if (p != NULL)
    {
        swaptree(p->lchild);
        swaptree(p->rchild);
        temp = p->lchild;
        p->lchild = p->rchild;
        p->rchild = temp;
    }
}
The following function checks whether the two binary trees are equal or not.
bool equal(tnode *p1, tnode *p2)
{
    bool ans;
    if ((p1 == NULL) && (p2 == NULL))
        ans = true;
    else if (((p1 == NULL) && (p2 != NULL)) || ((p1 != NULL) && (p2 == NULL)))
        ans = false;
    else
        /* both nodes exist: the data and both subtrees must agree */
        ans = (p1->data == p2->data)
              && equal(p1->lchild, p2->lchild)
              && equal(p1->rchild, p2->rchild);
    return (ans);
}
tnode *copytree(tnode *p)
{
    tnode *q;
    if (p == NULL)
        return (NULL);
    else
    {
        q = new(tnode);
        q->data = p->data;
        q->lchild = copytree(p->lchild);
        q->rchild = copytree(p->rchild);
        return (q);
    }
}
A binary search tree is basically a binary tree, and therefore it can be traversed in inorder, preorder,
and postorder. If we traverse a binary search tree in inorder and print the identifiers contained in the
nodes of the tree, we get a sorted list of identifiers in ascending order.
A binary search tree is an important search structure. For example, consider the problem of searching
a list. If a list is ordered, then searching becomes faster if we use a contiguous list and binary search.
But if we need to make changes in the list, like inserting new entries and deleting old entries, then it is
much slower to use a contiguous list, because insertion and deletion in a contiguous list require
moving many of the entries every time. So we may think of using a linked list, because it permits
insertions and deletions to be carried out by adjusting only a few pointers; but in a linked list there is no
way to move through the list other than one node at a time, hence permitting only sequential access.
Binary trees provide an excellent solution to this problem. By making the entries of an ordered list
into the nodes of a binary search tree, we find that we can search for a key in O(log n) steps, provided
the tree is reasonably balanced.
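A search in a binary search tree can be sketched as follows. The node layout with an int key and the names TNode and search are assumptions made for this illustration; the steps counter is added only to show how few nodes are examined.

```cpp
#include <cassert>

struct TNode {
    int data;
    TNode *lchild;
    TNode *rchild;
};

// Iterative search: returns the node holding `key`, or nullptr if the
// key is absent. `steps` receives the number of nodes examined.
const TNode *search(const TNode *p, int key, int &steps) {
    steps = 0;
    while (p != nullptr) {
        ++steps;
        if (key == p->data)
            return p;
        // Smaller keys lie in the left subtree, larger ones in the right.
        p = (key < p->data) ? p->lchild : p->rchild;
    }
    return nullptr;
}
```

Each comparison discards one whole subtree, so in a balanced tree of 7 nodes at most 3 nodes are examined. In a degenerate tree that has grown like a linked list, the same search can take n steps.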
To create a binary search tree we use a procedure named insert which creates a new node with the data
value supplied as a parameter to it, and inserts it into an already existing tree whose root pointer is also
passed as a parameter. The procedure accomplishes this by checking whether the tree whose root
pointer is passed as a parameter is empty. If it is empty then the newly created node is inserted as a
root node. If it is not empty then it copies the root pointer into a variable temp1, it then stores the value
of temp1 in another variable temp2, and compares the data value of the node pointed to by temp1 with
the data value supplied as a parameter. If the data value supplied as a parameter is smaller than the data
value of the node pointed to by temp1 then it copies the left link of the node pointed to by temp1 into
temp1 (goes to the left), otherwise it copies the right link of the node pointed to by temp1 into temp1
(goes to the right). It repeats this process till temp1 becomes nil. When temp1 becomes nil, the new
node is inserted as a left child of the node pointed to by temp2 if the data value of the node pointed to by
temp2 is greater than the data value supplied as a parameter. Otherwise the new node is inserted as a right
child of the node pointed to by temp2. Therefore the insert procedure is
void insert(tnode *&p, int val)
{   /* p is passed by reference so that the root of an empty tree can be set */
    tnode *temp1, *temp2;
    if (p == NULL)
    {
        p = new(tnode);
        p->data = val;
        p->lchild = NULL;
        p->rchild = NULL;
    }
    else
    {
        temp1 = p;
        while (temp1 != NULL)
        {
            temp2 = temp1;
            if (temp1->data > val)
                temp1 = temp1->lchild;
            else
                temp1 = temp1->rchild;
        }
        if (temp2->data > val)
        {
            temp2->lchild = new(tnode);
            temp2 = temp2->lchild;
            temp2->data = val;
            temp2->lchild = NULL;
            temp2->rchild = NULL;
        }
        else
        {
            temp2->rchild = new(tnode);
            temp2 = temp2->rchild;
            temp2->data = val;
            temp2->lchild = NULL;
            temp2->rchild = NULL;
        }
    }
}
temp = x->rchild;
y->lchild = x->rchild;
while (temp->lchild != NULL)
    temp = temp->lchild;
temp->lchild = x->lchild;
x->lchild = NULL;
x->rchild = NULL;
delete (x);
Nil
branch at each step, this will allow us to make a 26-way branching according to the first letter, followed
by another branch according to the second letter and so on.
A program to create a binary search tree, given a list of identifiers, is given below:
typedef char key[MAXLEN];
struct tnode
{
    key name;
    tnode *lchild;
    tnode *rchild;
};
void btree()
{
    tnode *root;
    key item;
    int n;
    root = NULL;
    printf("Number of data values: ");
    scanf("%d", &n);
    while (n > 0)
    {
        printf("Enter the data value");
        scanf("%s", item);
        insert(root, item);
        n = n - 1;
    }
    printtree(root);
}
Here, the height of the tree is h. The height of one subtree is h-1 while that of the other subtree of the
same node is h-2, differing from each other by just 1. Therefore, it is an AVL tree.
This insertion causes its height to become 2 greater than node-2's right sub-tree. A right-rotation is
performed to correct the imbalance, as shown below:
Check Your Progress
1. What are the characteristic properties of an AVL tree?
2. Define level of a node.
3.9 KEYWORDS
Tree: A two-dimensional data structure comprising nodes where one node is the root and the rest of the
nodes form two disjoint sets each of which is a tree.
Node: A data structure that holds information and links to other nodes.
Root node: The node in a tree which does not have a parent node.
Degree of a tree: The highest degree of a node appearing in the tree.
Level of a node: The number of nodes that must be traversed to reach the node from the root.
N-ary tree: A tree whose degree is N.
Binary tree: A tree of degree 2.
Inorder: A tree traversing method in which the tree is traversed in the order of left-tree, node and then
right-tree.
Postorder: A tree traversing method in which the tree is traversed in the order of left-tree, right-tree
and then node.
Preorder: A tree traversing method in which the tree is traversed in the order of node, left-tree and
then right-tree.
Search tree: A tree constructed and used in searching algorithms.
AVL tree (Adelson-Velskii and Landis tree): A balanced binary search tree in which the sub-trees of
every node differ in height by at most one level and every sub-tree is an AVL tree.
2. Give the array representation of a complete binary tree with depth k = 3, having the number of
nodes n = 7.
3. How many binary trees are possible with three nodes?
4. Construct a binary tree whose in-order and pre-order traversal is given below:
In-order: 5,L,3 rLL,6,8 14,2,7
If the preorder traversal of a tree gives the following sequence of nodes, draw the tree. Also
traverse it in inorder and postorder.
ABCDEFGH
7. Show the result of deleting node (60) from the following binary search tree.
8. Show the result of inserting node (45) into the above binary search tree.
9. Convert the following graph into a binary tree by removing necessary edges.
4.1 INTRODUCTION
Hashing is a method of directly computing the index of the table by using some suitable mathematical
function called hash function. The hash function operates on the name to be stored in the symbol
table, or whose attributes are to be retrieved from the symbol table. This concept has been discussed in
this lesson in detail.
A priority queue is a collection of elements such that each element has been assigned a priority. We
have discussed priority queues and their implementation in this lesson.
4.2 HASHING
In many applications we need a data object called a symbol table. A symbol table is nothing but
a set of pairs (name, value), where value represents the collection of attributes associated with the name,
and this collection of attributes depends upon the program element identified by the name. For
example, if a name x is used to identify an array in a program, then the attributes associated with x are
the number of dimensions, the lower bound and upper bound of each dimension, and the element type.
Therefore a symbol table can be thought of as a linear list of pairs (name, value), and hence we can use
a list data object for realizing a symbol table. A symbol table is referred to or accessed frequently,
either for adding a name, for storing the attributes of a name, or for retrieving the attributes of
a name. Therefore accessing efficiency is a prime concern while designing a symbol table. Hence, the
most common way of implementing a symbol table is to use a hash table. Hashing is a method
of directly computing the index of the table by using some suitable mathematical function called a hash
function. The hash function operates on the name to be stored in the symbol table, or whose
attributes are to be retrieved from the symbol table. If h is a hash function and x is a name, then h(x)
gives the index of the table where x along with its attributes can be stored. If x is already stored in the
table, then h(x) gives the index of the table where it is stored, so that the attributes of x can be retrieved
from the table.
There are various methods of defining a hash function. One is the division method. In this method we
take the sum of the values of the characters, divide it by the size of the table, and take the remainder.
This gives us an integer value lying in the range 0 to (n-1) if the size of the table is n. Another
method is the mid-square method. In this method, the identifier is first squared and then the appropriate
number of bits from the middle of the square is used as the hash value. Since the middle bits of the square
usually depend on all the characters in the identifier, it is expected that different identifiers will result
in different values. The number of middle bits that we select depends on the table size. Therefore, if r
is the number of middle bits that we use to form the hash value, then the table size will be 2^r. Hence when
we use this method the table size is required to be a power of 2. Another method is folding, in which the
identifier is partitioned into several parts, all but the last part being of the same length. These parts are
then added together to obtain the hash value.
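The division method described above can be sketched in C as follows. This is only an illustration: the function name hash_division is an assumption, and the "value" of a character is taken to be its character code.

```c
#include <stddef.h>

/* Division-method hash (illustrative name): add up the character
   codes of the name, then take the remainder on division by the
   table size. The result always lies in 0 .. table_size - 1. */
unsigned int hash_division(const char *name, unsigned int table_size)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; name[i] != '\0'; i++)
        sum += (unsigned char)name[i];

    return sum % table_size;
}
```

Because only the sum of the characters is used, names that are rearrangements of the same letters (for example "ab" and "ba") always collide under this function.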
1. Modular arithmetic: In this method, first the key is converted to an integer, then it is divided by the
size of the index range, and the remainder is taken to be the hash value. The spread achieved depends
very much on the modulus. If the modulus is a power of a small integer like 2 or 10, then many keys
tend to map into the same index, while other indices remain unused. The best choice for the modulus
is often, but not always, a prime number, which usually has the effect of spreading the keys quite
uniformly.
2. Truncation: This method ignores part of the key, and uses the remaining part directly as the hash value
(considering non-numeric fields as their numerical code). If the keys, for example, are eight-digit
numbers and the hash table has 1000 entries, then the first, second, and fifth digits from the right
might make the hash value. So 62538194 maps to 394. It is a fast method, but often fails to distribute
keys evenly.
3. Folding: In this method, the identifier is partitioned into several parts, all but the last part being of
the same length. These parts are then added together to obtain the hash value. For example, an
eight-digit integer can be divided into groups of three, three, and two digits. The groups are then
added together, and truncated if necessary to be in the proper range of indices. Hence 62538194
maps to 625 + 381 + 94 = 1100, truncated to 100. Since all information in the key can affect the
value of the function, folding often achieves a better spread of indices than truncation.
4. Mid-square method: In this method, the identifier is squared (considering non-numeric fields as
their numerical code), and then the appropriate number of bits from the middle of the square are
used to get the hash value. Since the middle bits of the square usually depend on all the characters
in the identifier, it is expected that different identifiers will result in different values. The number
of middle bits that we select depends on the table size. Therefore, if r is the number of middle bits
used to form the hash value, then the table size will be 2^r. Hence when we use the mid-square method the
table size should be a power of 2.
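The truncation and folding examples above, and the mid-square idea, can be checked with a small C sketch. The function names are assumptions made for illustration; the keys are taken to be eight-digit numbers for truncation and folding, and 16-bit values for mid-square (so that the square fits in 32 bits).

```c
/* Truncation: keep the 5th, 2nd and 1st digits from the right
   of an eight-digit key, giving an index in 0..999. */
unsigned int hash_truncate(unsigned long key)
{
    unsigned int d1 = (unsigned int)(key % 10);            /* 1st digit from right */
    unsigned int d2 = (unsigned int)((key / 10) % 10);     /* 2nd digit from right */
    unsigned int d5 = (unsigned int)((key / 10000) % 10);  /* 5th digit from right */
    return d5 * 100 + d2 * 10 + d1;
}

/* Folding: split an eight-digit key into groups of three, three
   and two digits, add the groups, and keep the last three digits. */
unsigned int hash_fold(unsigned long key)
{
    unsigned long sum = key / 100000        /* leading three digits */
                      + (key / 100) % 1000  /* middle three digits  */
                      + key % 100;          /* last two digits      */
    return (unsigned int)(sum % 1000);
}

/* Mid-square: square a 16-bit key and return the r middle bits of
   the 32-bit square (1 <= r <= 16), an index in 0 .. 2^r - 1. */
unsigned int hash_midsquare(unsigned int key, unsigned int r)
{
    unsigned long k = key & 0xFFFFUL;
    unsigned long sq = k * k;
    return (unsigned int)((sq >> ((32 - r) / 2)) & ((1UL << r) - 1));
}
```

With the key 62538194 used in the text, hash_truncate gives 394 and hash_fold gives 100, matching the worked figures above.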
4.2.2 Hash Collision
To store the name or to add attributes of the name, we compute the hash value of the name, and place the
name or attributes, as the case may be, at that place in the table whose index is the hash value of the
name. For retrieving the attribute values of a name kept in the symbol table, we apply the hash
function to the name to obtain the index of the table where we get the attributes of the name. Hence we
find that no comparisons are required to be done, and the time required for the retrieval is
independent of the table size. Therefore retrieval is possible in a constant amount of time, which will
be the time taken for computing the hash function. Therefore, a hash table seems to be the best for
realization of the symbol table, but there is one problem associated with hashing, and it is that of
collisions. A hash collision occurs when two identifiers are mapped into the same hash value. This
happens because a hash function defines a mapping from a set of valid identifiers to the set of those
integers which are used as indices of the table. The domain of the mapping
defined by the hash function is much larger than the range of the mapping, and hence the mapping is
of many-to-one nature. Therefore, when we implement a hash table, a suitable collision handling
mechanism is to be provided which will be activated when there is a collision.
Collision handling involves finding an alternative location for one of the two colliding symbols.
For example, if x and y are different identifiers and if h(x) = h(y) = i, x and y are the colliding symbols.
If x is encountered before y, then the i-th entry of the table will be used for accommodating symbol x,
but later on when y comes there is a hash collision, and therefore we have to find an alternative
location either for x or for y. This means we find a suitable alternative location and either
accommodate y in that location, or move x to that location and place y in the i-th location of the
table. There are various methods available to obtain an alternative location to handle the collision.
They differ from each other in the way the search is made for an alternative location. The following are
the commonly used collision handling techniques:
1. Linear probing or linear open addressing: In this method, if for an identifier x, h(x) = i, and if the i-th
location is already occupied, then we search for a location close to the i-th location by doing a linear
search starting from the (i+1)-th location to accommodate x. This means we start from the (i+1)-th
location and do the linear search till we get an empty location, and once we get an empty location
we accommodate x there.
2. Rehashing: This is another method of collision handling. In this method, we find an alternative
empty location by modifying the hash function, and applying the modified hash function to the
colliding symbol. For example, if x is a symbol and h(x) = i, and if the i-th location is already
occupied, then we modify the hash function h to h1, and find h1(x). If h1(x) = j and the j-th location
is empty, then we accommodate x in the j-th location. Otherwise, we once again modify h1 to
some h2 and repeat the process till the collision gets handled. Once the collision gets handled we
revert to the original hash function before considering the next symbol.
3. Separate chaining/overflow chaining: This is a method of implementing a hash table in which
collisions get handled automatically. In this method, we use two tables: a symbol table to
accommodate identifiers and their attributes, and a hash table which is an array of pointers pointing to symbol
table entries. Each symbol table entry is made of three fields: the first for holding the identifier, the second for
holding the attributes, and the third for holding the link or pointer which can be made to point to any
symbol table entry. The insertions into the symbol table are done as follows:
If x is the symbol to be inserted, then it will be added to the next available entry of the symbol table.
The hash value of x is then computed. If h(x) = i, then the i-th hash table pointer is made to point
to the symbol table entry in which x is stored, if the i-th hash table pointer is not pointing to any
symbol table entry. If the i-th hash table pointer is already pointing to some symbol table entry,
then the link field of the symbol table entry containing x is made to point to that symbol table entry
to which the i-th hash table pointer is pointing, and the i-th hash table pointer is made to point to the
symbol table entry containing x. This is equivalent to building a linked list on the i-th index of the hash
table. The retrieval of attributes is done as follows:
If x is a symbol, then we obtain h(x), use this value as the index of the hash table, and traverse
the list built on this index to get the entry which contains x. A typical hash table implemented
using this technique is shown below:
Let the symbols to be stored be x1, y1, z1, x2, y2, z2. The hash function that we use is such that:
if h(x1) = i
h(y1) = j
h(z1) = k
then
h(x2) = i
h(y2) = j
h(z2) = k
Therefore the contents of the symbol table will be the one shown in Figure 4.1.
[Figure: symbol table entries with their Link fields, and the hash table pointers i, j and k]
Figure 4.1: Hash Table Implementation using Overflow Chaining for Collision Handling
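The insertion and retrieval steps of separate chaining can be sketched in C as below. This is a minimal illustration, not the text's own program: the names entry, chain_insert and chain_lookup, the table size of 11, the character-sum hash and the single int standing in for the attributes are all assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 11              /* assumed size, for illustration only */

struct entry {
    char name[32];                 /* the identifier */
    int attr;                      /* stand-in for the attributes */
    struct entry *link;            /* next entry on the same chain */
};

static struct entry *hash_table[TABLE_SIZE];   /* all NULL initially */

/* Assumed hash function: sum of character codes mod table size. */
static unsigned int h(const char *name)
{
    unsigned int sum = 0;
    while (*name)
        sum += (unsigned char)*name++;
    return sum % TABLE_SIZE;
}

/* Store the symbol and put it at the head of the chain at index
   h(name), as in the description above. Returns the new entry,
   or NULL if no memory is available. */
struct entry *chain_insert(const char *name, int attr)
{
    unsigned int i = h(name);
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    e->attr = attr;
    e->link = hash_table[i];       /* old head (or NULL) follows the new entry */
    hash_table[i] = e;             /* hash table pointer now points to x */
    return e;
}

/* Traverse the chain at index h(name); NULL if the name is absent. */
struct entry *chain_lookup(const char *name)
{
    struct entry *e;
    for (e = hash_table[h(name)]; e != NULL; e = e->link)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

Since each new symbol goes to the head of its chain, a symbol inserted later is found before an earlier one with the same hash value.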
Consider using the division method of hashing to store the following values in a hash table of size 11:
For 96, h(96) = 96 mod 11 = 8.
Index  Value
1      15
2      101
3      25
4      102
5      201
8      96
9      162
10     197
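Linear probing with the division method on a table of size 11 can be sketched as follows. The names lp_init, lp_insert and lp_search and the EMPTY marker are assumptions; keys are assumed to be non-negative integers, and the search wraps around to index 0 after the last slot.

```c
#define TABLE_SIZE 11
#define EMPTY (-1)                 /* marks an unused slot; keys must be >= 0 */

/* Mark every slot of the table as unused. */
void lp_init(int table[])
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using h(key) = key mod TABLE_SIZE and linear probing.
   Returns the index used, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    int start = key % TABLE_SIZE;
    int k;
    for (k = 0; k < TABLE_SIZE; k++) {
        int pos = (start + k) % TABLE_SIZE;   /* wrap around at the end */
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Return the index holding key, or -1 if it is not in the table. */
int lp_search(const int table[], int key)
{
    int start = key % TABLE_SIZE;
    int k;
    for (k = 0; k < TABLE_SIZE; k++) {
        int pos = (start + k) % TABLE_SIZE;
        if (table[pos] == EMPTY)
            return -1;             /* an empty slot ends the probe sequence */
        if (table[pos] == key)
            return pos;
    }
    return -1;
}
```

For instance, 96 and 162 both hash to 8, so if 96 is inserted first, 162 is placed in the next free location, index 9.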
An element of higher priority is processed before any element of lower priority. Two elements with
the same priority are processed according to the order in which they were inserted into the queue.
We would use a singly linked list to implement the priority queue. Each node of the linked list would
have a type definition as follows.
typedef struct qElement
{
    T item;
    int priority;
    struct qElement *next;
} Pqueue;
Pqueue *front, *rear;
The algorithm for the insertion would change now. Insertion would insert the new element at the
correct position according to the priority of the element. The elements of the priority queue would be
sorted in a non-descending order of the priority with the front of the queue having the element with
the highest priority. The deletion procedure need not change since the element at the front is the one
with the highest priority and that is the one that should be deleted.
void insert(Pqueue *front, Pqueue *rear, T e, int p)
/* this inserts an element having data e and priority p into the priority
queue */
/* the insertion maintains the sorted order of the priority queue */
{
    Pqueue *f, *r;
    Pqueue *x;
    int pr;
    x = new(Pqueue);
    x->item = e; x->priority = p;
    if (front == NULL)
    {
        front = x;
        x->next = NULL;
        rear = x;
    }
    /* x is the first node being added to the priority queue */
    f = f->next; r = f; pr = f->priority;
}
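A complete, self-contained version of the sorted-list insertion can be sketched in C as follows. It is an illustration under stated assumptions rather than the listing above: items are plain int values, a larger priority number means a higher priority, and the functions (pq_insert, pq_delete are assumed names) return the new front instead of updating it through a parameter.

```c
#include <stdlib.h>

struct pq_node {
    int item;
    int priority;                  /* assumption: larger number = higher priority */
    struct pq_node *next;
};

/* Insert (item, priority) so that the list stays ordered with the
   highest priority at the front. A new node is placed after existing
   nodes of equal priority, which preserves insertion order among them.
   Returns the (possibly new) front; the list is unchanged if no
   memory is available. */
struct pq_node *pq_insert(struct pq_node *front, int item, int priority)
{
    struct pq_node *x = malloc(sizeof *x);
    struct pq_node *prev = NULL, *cur = front;

    if (x == NULL)
        return front;
    x->item = item;
    x->priority = priority;

    /* skip every node whose priority is at least as high */
    while (cur != NULL && cur->priority >= priority) {
        prev = cur;
        cur = cur->next;
    }
    x->next = cur;
    if (prev == NULL)
        return x;                  /* x becomes the new front */
    prev->next = x;
    return front;
}

/* Remove the front node, store its item in *item and return the new
   front. An empty queue is returned unchanged. */
struct pq_node *pq_delete(struct pq_node *front, int *item)
{
    struct pq_node *rest;
    if (front == NULL)
        return NULL;
    *item = front->item;
    rest = front->next;
    free(front);
    return rest;
}
```

Insertion takes time proportional to the length of the list in the worst case, while deletion from the front takes constant time.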
Binary Heap
A binary tree that has the following properties (called heap properties) is called a heap tree or binary
heap.
1. Either it is empty
Or
2. The key in the root is larger than that in either child
And
3. Both subtrees have the heap properties.
Thus, a heap tree or binary heap can be used as a priority queue where the highest priority item is at
the root and is trivially extracted. But if the root is deleted, we are left with two sub-trees and we must
efficiently re-create a single tree with the heap property. Insertion and deletion in a heap tree is very
efficient - of the order of O(log n) - as compared to other trees.
if (n < MAX)
{
    n = n + 1;
    heap[n] = v;
    bubble_up(n);
}
else
    report error: out of space;
}
bubble_up(int i)
{
    while (not isroot(i) and heap[i] > heap[parent(i)])
    {
        swap heap[i] and heap[parent(i)];
        i = parent(i);
    }
}
return TRUE;
else
return FALSE;
}
Let us insert a node with data value 49. Since the next position is the left child of the node with value 43,
the new node will be added as shown below:
However, this makes the tree violate the heap property. Therefore, 43 must be swapped with 49.
Even now the heap property is not being fulfilled. Therefore, 45 must be swapped with 49. At this
point the tree possesses the heap property and thus, we stop.
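The insertion just traced can be reproduced with a short C sketch of the pseudocode. The heap is kept in positions 1..n of an array, so the parent of position i is i/2. Only the values 50, 45, 43 and 49 come from the example; the function names and any other values used to exercise it are assumptions.

```c
#define MAX 100

static int heap[MAX + 1];          /* heap[1..n] is used; heap[0] is unused */
static int n = 0;                  /* number of items currently in the heap */

/* Exchange the items at positions i and j. */
static void swap(int i, int j)
{
    int t = heap[i];
    heap[i] = heap[j];
    heap[j] = t;
}

/* Move the item at position i up while it is larger than its parent. */
void bubble_up(int i)
{
    while (i > 1 && heap[i] > heap[i / 2]) {
        swap(i, i / 2);
        i = i / 2;
    }
}

/* Add v to the heap. Returns 1 on success, 0 if there is no space. */
int heap_insert(int v)
{
    if (n >= MAX)
        return 0;
    n = n + 1;
    heap[n] = v;                   /* place at the next free position */
    bubble_up(n);                  /* restore the heap property */
    return 1;
}
```

Each insertion performs at most one swap per level of the tree, which is where the O(log n) bound comes from.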
In a heap-tree, only the node with the highest priority (the one at the root) is deleted. We're then left
with something which isn't a binary tree at all. We can now 'trickle down' the new root by comparing
it to both its children and exchanging it with the larger. This process is then repeated until this element
has found its place.
Again, this takes at most log(n) steps. Note that this algorithm does not try to be fair in the sense that
if two nodes have the same priority, it is not necessary that the one that has been waiting longer is
removed first. A solution to this is to keep some kind of time-stamp on arrivals, or to give them
numbers.
Deleting a node will remove 50 from the root of the heap tree. The empty root must then be filled
with the last element of the heap tree (i.e., 43).
However, in doing so, the heap loses its heap property and therefore it must be rearranged. 43 must
be swapped with 49.
Even this does not conform to the definition of a heap. One more swap is necessary - 43 with 45 -
resulting in the final heap tree.
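The deletion steps above (move the last element to the root, then trickle it down) can be sketched in C as follows. The heap is assumed to occupy positions 1..n of an array, with the children of position i at 2i and 2i+1; the function names are assumptions, and the caller is assumed to ensure the heap is not empty.

```c
/* Restore the heap property below position i in heap[1..n] by
   repeatedly exchanging the item with its larger child. */
void trickle_down(int heap[], int n, int i)
{
    for (;;) {
        int largest = i;
        int l = 2 * i, r = 2 * i + 1;
        int t;

        if (l <= n && heap[l] > heap[largest])
            largest = l;
        if (r <= n && heap[r] > heap[largest])
            largest = r;
        if (largest == i)
            break;                 /* both children are smaller: done */

        t = heap[i];
        heap[i] = heap[largest];
        heap[largest] = t;
        i = largest;
    }
}

/* Remove and return the root (largest item) of heap[1..*n].
   The caller must make sure that *n >= 1. */
int heap_delete_max(int heap[], int *n)
{
    int top = heap[1];
    heap[1] = heap[*n];            /* fill the root with the last element */
    *n = *n - 1;
    trickle_down(heap, *n, 1);
    return top;
}
```

Applied to a heap with 50 at the root and 43 as its last element, this carries out the same two swaps as the example: 43 with 49, then 43 with 45.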
1. Define rehashing.
2. What is mid-square method?
4.5 KEYWORDS
Hashing: Hashing is a method of directly computing the index of the table by using some suitable
mathematical function called a hash function.
Separate Chaining: This is a method of implementing a hash table in which collisions get handled
automatically.
Priority Queue: A queue in which elements are assigned priorities to determine the order in which they
can be retrieved.
SORTING
CONTENTS
5.0 Aims and Objectives
5.1 Introduction: Sorting Preliminaries
5.2 'O' Notation
5.3 Insertion Sort
5.4 Shell Sort
5.5 Heap Sort
5.5.1 Insertion in Heap
5.5.2 Deletion from Heap
5.6 Construction of Heap
5.6.1 Top-down Construction
5.6.2 Bottom-up Construction
5.7 Sorting using Heap
5.8 Merge Sort
5.9 Quick Sort
5.10 Let us Sum up
5.11 Keywords
5.12 Questions for Discussion
5.13 Suggested Reading
Since P has n elements, there are n! ways that the contents can appear in P. These ways correspond
precisely to the n! permutations of 1, 2, ...n. Accordingly, each sorting algorithm must take care of
these n! possibilities.
5.2 'O' NOTATION
Given two functions f(n) and g(n), we say that f(n) is of the order of g(n), or that f(n) is O(g(n)), if there
exist positive integers a and b such that
f(n) <= a * g(n) for all n >= b.
For example, if f(n) = n^2 + 100n and g(n) = n^2, then
f(n) is O(g(n)), since n^2 + 100n is less than or equal to 2n^2 for all n greater than or equal to 100. In this
case a = 2 and b = 100.
The same f(n) is also O(n^3), since n^2 + 100n is less than or equal to 2n^3 for all n greater than or equal to
8. If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)). For example, n^2 + 100n is O(n^2) and n^2 is
O(n^3) (for a = 1, b = 1), so n^2 + 100n is O(n^3). This is called the transitive property.
If the function is c * n, then its order will be O(n^k) for any constants c and k >= 1, since c * n is less than or
equal to c * n^k for any n >= 1 (i.e. a = c, b = 1). If f(n) is n^k, then its order will be O(n^(k+j)) for any j >= 0
(for a = 1, b = 1).
If f(n) and g(n) are both O(h(n)), the new function f(n) + g(n) is also O(h(n)). If f(n) is any polynomial
whose leading power is k, i.e. f(n) = c1 * n^k + c2 * n^(k-1) + ... + ck * n + c(k+1), then
f(n) is O(n^k).
Algorithm Efficiency in Logarithmic Function
Let log_m n and log_k n be two functions. Let xm be log_m n and xk be log_k n. Then
m^xm = n and k^xk = n
so that m^xm = k^xk. Taking log_m of both sides gives
xm = log_m(k^xk)
Now it can easily be shown that log_z(x^y) equals y * log_z x for any x, y and z, so that the last equation
can be written as
log_m n = xk * log_m k
or as
log_m n = (log_m k) * log_k n
If f(n) = c * g(n) then f(n) is O(g(n)); thus log_m n is O(log_k n) and log_k n is O(log_m n) for any m and k.
Thus, to find the correct position, search the list till an item just greater than the target is found; shift
all the items from this point one down the list; insert the target in the vacated slot.
Algorithm to Implement Insertion Sort
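void insertsort(int x[], int n)
{
    int i, k, y;
    /* x[0] alone is a sorted file of one element; insert x[1] .. x[n-1] in turn */
    for (k = 1; k < n; k++)
    {
        y = x[k];
        /* move down one position all elements greater than y */
        for (i = k - 1; i >= 0 && y < x[i]; i--)
            x[i + 1] = x[i];
        /* insert y at its proper position */
        x[i + 1] = y;
    }
}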
Analysis of Insertion Sort
If the initial file is sorted, only one comparison is made on each pass, so that the sort is O(n). If the file is
initially sorted in reverse order, the sort is O(n^2), since the total number of comparisons is:
(n-1) + (n-2) + ... + 3 + 2 + 1 = (n-1) * n/2
which is O(n^2).
The closer the file is to sorted order, the more efficient the simple insertion sort becomes. The space
requirements for the sort consist of only one temporary variable, y. The speed of the sort can be
improved somewhat by using a binary search to find the proper position for x[k] in the sorted file.
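The binary-search improvement mentioned above can be sketched in C as follows. The function name binary_insertsort is an assumption. The binary search cuts the number of comparisons per insertion to O(log n), but the elements still have to be shifted one by one, so the sort as a whole remains O(n^2) in the worst case.

```c
/* Insertion sort that locates the position of x[k] within the
   already sorted part x[0..k-1] by binary search. */
void binary_insertsort(int x[], int n)
{
    int i, k, lo, hi, mid, y;

    for (k = 1; k < n; k++) {
        y = x[k];

        /* find the first position in x[0..k-1] whose item exceeds y */
        lo = 0;
        hi = k;
        while (lo < hi) {
            mid = lo + (hi - lo) / 2;
            if (x[mid] <= y)
                lo = mid + 1;      /* equal keys stay in their original order */
            else
                hi = mid;
        }

        /* shift x[lo..k-1] one place down and insert y at position lo */
        for (i = k - 1; i >= lo; i--)
            x[i + 1] = x[i];
        x[lo] = y;
    }
}
```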
After the k subfiles are sorted (usually by simple insertion), a new smaller value of k is chosen and the
file is again partitioned into a new set of subfiles. Each of these larger subfiles is sorted and the
process is repeated yet again with an even smaller value of k.
Eventually, the value of k is set to 1, so that the subfile consisting of the entire file is sorted.
A decreasing sequence of increments is fixed at the start of the entire process. The last value in this
sequence must be 1.
and the sequence (5, 3, 1) is chosen, the following subfiles are sorted on each iteration:
First iteration (increment = 5)
(x[0], x[5])
(x[1], x[6])
(x[2], x[7])
(x[3])
(x[4])
Second iteration (increment = 3)
[Figure: the file after each pass of the Shell sort, by span, ending with the sorted file]
Algorithm to Implement Shell Sort
void shellsort(int x[], int n, int increments[], int numeric)
{
    int incr, j, k, span, y;
    for (incr = 0; incr < numeric; incr++)
    {
        /* span is the size of the current increment */
        span = increments[incr];
        for (j = span; j < n; j++)
        {
            /* insert element x[j] into    */
            /* its proper position within  */
            /* its subfile                 */
            y = x[j];
            for (k = j - span; k >= 0 && y < x[k]; k -= span)
                x[k + span] = x[k];
            x[k + span] = y;
        } /* end for */
    } /* end for */
} /* end shellsort */
Analysis of Shell Sort
Since the first increment used by the Shell sort is large, the individual subfiles are quite small, so that the
simple insertion sorts on those subfiles are fairly fast. Each sort of a subfile causes the entire file to be
more nearly sorted. Thus, although successive passes of the Shell sort use smaller increments and
therefore deal with larger subfiles, those subfiles are almost sorted due to the actions of previous passes.
Thus the insertion sorts on these subfiles are also quite efficient. The actual time requirement for a
specific sort depends on the number of elements in the array increments and on their actual values.
It has been shown that the order of the Shell sort can be approximated by O(n (log n)^2) if an appropriate
sequence of increments is used. For other series, the running time can be as high as O(n^2).
A complete binary tree is said to satisfy the "Heap Condition" if the key of each node is greater
than or equal to the keys in its children. Thus, the root node will have the largest key value. Trees
can be represented as arrays, by first numbering the nodes (starting from the root) from left to
right.
The key values of the nodes are then assigned to array positions whose index is given by the number of the
node.
The relationships of the nodes can also be determined from the array representation. If a node is at
position j, its children will be at positions 2j and 2j+1. Its parent will be at position j/2 (integer
division). A heap is a complete binary tree in which each node satisfies the heap condition, represented
as an array. An operation on a heap works in two steps:
(i) The required node is inserted, deleted or replaced.
(ii) The first step may cause violation of the heap condition, so the heap is traversed and modified to
rectify any such violations.
1. Initially R is added as the right child of J and given the number 13.
2. But R > J, so the heap condition is violated.
3. Move R up to position 6 and move J to position 13.
4. R > P; therefore, the heap condition is still violated.
5. Swap R and P.
6. The heap condition is now satisfied by all nodes.
Figure 5.2:Heap 2
controls the number of insertions which are to be performed. The integer variable J denotes the index
of the parent of key K[I]. KEY contains the key of the record being inserted into an existing heap.
1. [Build heap]
Repeat through step 7 for Q = 2, 3, ..., N
2. [Initialize construction phase]
I <- Q
KEY <- K[Q]
3. [Obtain parent of new record]
J <- TRUNC(I/2)
4. [Place new record in existing heap]
Repeat through step 6 while I > 1 and KEY > K[J]
5. [Interchange record]
K[I] <- K[J]
6. [Obtain next parent]
I <- J
J <- TRUNC(I/2)
If J < 1
then J <- 1
7. [Copy new record into its proper place]
K[I] <- KEY
8. [Finished]
Return
An efficient sorting method is based on the heap construction and node removal from the heap in
order. This algorithm is guaranteed to sort N elements in N log N steps.
• Insert items into an initially empty heap, keeping the heap condition intact in all steps.
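The two phases of sorting with a heap (build the heap by repeated insertion, then repeatedly remove the root) can be sketched in C as below. This is an illustration under assumptions: the names heap_sort and sift_down are not from the text, the keys are single characters, and the array is indexed from 0, so the children of position i are at 2i+1 and 2i+2 rather than 2i and 2i+1.

```c
/* Restore the heap condition below position i in a[0..n-1]. */
static void sift_down(char a[], int n, int i)
{
    for (;;) {
        int big = i, l = 2 * i + 1, r = 2 * i + 2;
        char t;

        if (l < n && a[l] > a[big]) big = l;
        if (r < n && a[r] > a[big]) big = r;
        if (big == i)
            return;
        t = a[i]; a[i] = a[big]; a[big] = t;
        i = big;
    }
}

/* Sort a[0..n-1] into ascending order using a heap. */
void heap_sort(char a[], int n)
{
    int i, j;
    char t;

    /* Phase 1: insert a[1], a[2], ... one at a time into the heap
       a[0..i-1], moving each new item up past smaller parents. */
    for (i = 1; i < n; i++) {
        j = i;
        while (j > 0 && a[j] > a[(j - 1) / 2]) {
            t = a[j]; a[j] = a[(j - 1) / 2]; a[(j - 1) / 2] = t;
            j = (j - 1) / 2;
        }
    }

    /* Phase 2: move the root (largest item) to the end of the
       unsorted part, then repair the smaller heap. */
    for (i = n - 1; i > 0; i--) {
        t = a[0]; a[0] = a[i]; a[i] = t;
        sift_down(a, i, 0);
    }
}
```

Both phases do at most one exchange per level of the tree for each element, which gives the N log N bound quoted above.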
Bottom-up Heap Construction
Figure 5.4
Figure 5.5
[Figures: contents of array x, positions 1 to 12, after each removal of the root from the heap]
(ix) Similarly, the remaining 3 nodes are removed and the heap modified to get the sorted list
AEFILNOOPRSS.
Example
if (i > m)
{
    for (t = j; t <= n; t++)
        z[k + t - j] = x[t];
}
else
The above algorithm for merge sort has one important property: after pass K, the array A will be
partitioned into sorted subarrays of exactly L = 2^K elements (except possibly the last subarray).
By dividing N (the size of array A) by 2*L, we get the quotient Q, which is the number of pairs of sorted
subarrays of size L, i.e.
Q = INT(N/(2*L))
S = 2*L*Q will be the total number of elements in the Q pairs of subarrays. R = N - S denotes the
number of remaining elements.
Analysis of 'MSORT'
On the ith pass the files being merged are of size 2^(i-1). Consequently, a total of ⌈log2(N)⌉ passes are made
over the data. Since two files can be merged in linear time (algorithm 'MERGE'), each pass of merge
sort takes O(N) time. As there are ⌈log2(N)⌉ passes, the total computing time is O(N log N).
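The pass structure analysed above can be sketched in C as a bottom-up merge sort. This is an illustration, not the MSORT and MERGE algorithms of the text: the names msort and merge_runs, the 0-based indexing and the use of a temporary array obtained from malloc are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs x[lo..mid-1] and x[mid..hi-1] into z[lo..hi-1]. */
static void merge_runs(const int x[], int z[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        z[k++] = (x[j] < x[i]) ? x[j++] : x[i++];
    while (i < mid)                /* copy what is left of the first run */
        z[k++] = x[i++];
    while (j < hi)                 /* copy what is left of the second run */
        z[k++] = x[j++];
}

/* Sort a[0..n-1]. On each pass, adjacent runs of length len are merged
   into runs of length 2*len. Returns 1 on success, 0 if the temporary
   array cannot be allocated. */
int msort(int a[], int n)
{
    int len, lo;
    int *z;

    if (n < 2)
        return 1;
    z = malloc((size_t)n * sizeof *z);
    if (z == NULL)
        return 0;

    for (len = 1; len < n; len *= 2) {
        for (lo = 0; lo < n; lo += 2 * len) {
            int mid = (lo + len < n) ? lo + len : n;
            int hi = (lo + 2 * len < n) ? lo + 2 * len : n;
            merge_runs(a, z, lo, mid, hi);
        }
        memcpy(a, z, (size_t)n * sizeof *z);   /* result of this pass */
    }
    free(z);
    return 1;
}
```

The outer loop runs once per pass and doubles len each time, and each pass touches every element once, which matches the O(N log N) total derived above.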
The purpose of the Quick Sort is to move a data item in the correct direction just enough for it to
reach its final place in the array. The method, therefore, reduces unnecessary swaps, and moves an item
a great distance in one move. A pivotal item near the middle of the array is chosen, and then items on
either side are moved so that the data items on one side of the pivot are smaller than the pivot, whereas
those on the other side are larger; the middle (pivot) item is then in its correct position. The procedure is
then applied recursively to the parts of the array on either side of the pivot, until the whole array is
sorted.
Example:
If an initial array is given as:
25 57 48 37 12 92 86 33
and the first element (25) is placed in its proper position, the resulting array is:
12 25 57 48 37 92 86 33
At this point, 25 is in its proper position in the array (x[1]), each element below that position (12) is
less than or equal to 25, and each element above that position (57, 48, 37, 92, 86 and 33) is greater than
or equal to 25.
Since 25 is in its final position, the original problem has been decomposed into the problem of sorting
the two subarrays
(12) and (57 48 37 92 86 33)
The first of these subarrays has one element, so there is no need to sort it. Repeating the process on the
subarray x[2] through x[7] yields:
i = m;
j = n + 1;
k = x[m];    /* key */
while (1)
{
    /* scan from the left for an element not smaller than the key */
    do
        i = i + 1;
    while (i <= n && x[i] < k);
    /* scan from the right for an element not larger than the key */
    do
        j = j - 1;
    while (x[j] > k);
    if (i < j)
    {
        t = x[i];
        x[i] = x[j];
        x[j] = t;
    }
    else break;
}
/* put the key into its final position j */
t = x[m];
x[m] = x[j];
x[j] = t;
qsort(x, m, j - 1);
qsort(x, j + 1, n);
}
We shall illustrate the mechanics of this method by applying it to an array of numbers. Suppose the
array A initially appears as:
(15, 20, 5, 8, 95, 12, 80, 17, 9, 55)
Figure 5.6 shows a quick sort applied to this array.
Line  A(1) A(2) A(3) A(4) A(5) A(6) A(7) A(8) A(9) A(10)
1     15   20   5    8    95   12   80   17   9    55
2     9    20   5    8    95   12   80   17   ( )  55
3     9    ( )  5    8    95   12   80   17   20   55
4     9    12   5    8    95   ( )  80   17   20   55
5     9    12   5    8    ( )  95   80   17   20   55
6     9    12   5    8    15   95   80   17   20   55
2. Scan line 2 from left to right beginning with position A(2), comparing data item values with 15.
When you find the first value greater than 15, extract it and store it in the position marked by
parentheses in line 2. This is shown in line 3 of Figure 5.6.
3. Begin the right to left scan of line 3 with position A(8), looking for a value smaller than 15. When
you find it, extract it and store it in the position marked by the parentheses in line 3 of Figure 5.6.
Figure 5.7
4. Begin scanning line 4 from left to right at position A(3), find a value greater than 15, remove it,
mark its position, and store it inside the parentheses in line 4. This is shown in line 5 of Figure 5.7.
5. Now, when you scan line 5 from right to left beginning at position A(7), you find no value
smaller than 15. Moreover, you come to a parentheses position, position A(5). This is the location
to put the first data item, 15, as shown in line 6 of Figure 5.7. At this stage 15 is in its correct place
relative to the final sorted array.
Proper position for the pivot always turns out to be the exact middle of the subarray. There will be
approximately n comparisons on the first pass, after which the file will split into two subfiles each of size
n/2 approximately.
For each of these two files there will be approximately n/2 comparisons. So, after halving the subfiles
m times, there are n files of size 1. Thus, the total number of comparisons for the entire sort is
approximately:
n + 2*(n/2) + 4*(n/4) + 8*(n/8) + ... + n*(n/n)
or
n + n + n + n + ... + n (m terms)
There are m terms because the file is divided m times. Thus, the total number of comparisons is:
O(n * m) or O(n log n) [as m = log2 n]
The worst case occurs when the first pivot fails to split the list. This happens when the original file is
already sorted. If, for example, x[b] is in its correct position, the original file is split into subfiles of
sizes 0 and n - 1.
If this process continues, a total of n - 1 subfiles are sorted, the first of size n, the second of size (n - 1)
and so on. The total number of comparisons to sort the entire file is n + (n - 1) + (n - 2) + ... + 2,
which is O(n^2).
Thus, the quick sort works best with completely unsorted files and worst for files that are completely
sorted.
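The worst-case behaviour on an already sorted file can be observed with a small C sketch that counts key comparisons. It assumes, as in the worked example, that the first element of each subarray is taken as the pivot; the names quick_sort, less and comparisons are assumptions made for the illustration.

```c
static long comparisons = 0;       /* number of key comparisons made so far */

/* Compare two keys and count the comparison. */
static int less(int a, int b)
{
    comparisons++;
    return a < b;
}

/* Sort x[m..n] (inclusive bounds) using the first element as the key. */
void quick_sort(int x[], int m, int n)
{
    int i, j, k, t;

    if (m >= n)
        return;                    /* zero or one element: nothing to do */

    k = x[m];
    i = m;
    j = n + 1;
    for (;;) {
        do i++; while (i <= n && less(x[i], k));   /* scan from the left  */
        do j--; while (less(k, x[j]));             /* scan from the right */
        if (i >= j)
            break;
        t = x[i]; x[i] = x[j]; x[j] = t;
    }
    t = x[m]; x[m] = x[j]; x[j] = t;   /* key goes to its final position j */

    quick_sort(x, m, j - 1);
    quick_sort(x, j + 1, n);
}
```

When the input is already in order, every partition leaves an empty left subfile, so the right-to-left scan walks across the whole subfile each time and the count grows roughly as n^2/2, in line with the analysis above.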
heck Your P
T. Define sorting. Name its various categories.
2. Fill in the blanks:
(r) In a heap, the
----- node has the largest key value.
5.11 KEYWORDS
Sorting: The operation of arranging data in some given order, such as increasing or decreasing with numerical data, or alphabetically with character data.
Quick Sort: A divide and conquer algorithm which works by creating two problems of half size, solving them recursively, then combining the solutions to the small problems to get a solution to the original problem.
Insertion Sort: A sorting technique that sorts a set of records by inserting records into an existing sorted file.
Merge Sort: A sorting method that merges two ordered lists to produce a single sorted list.
9. What is insertion sort? Explain with a suitable example. Also explain its time complexity.
10. Write a 'C' function for insertion sort. For searching the smallest element in the array, use binary search. Explain your program's time complexity.
11. What is merging? Write a 'C' program to merge two sorted lists to get another sorted list. Explain its time complexity.
12. What is merge sort? Explain the method with a suitable example.
13. Write a 'C' program for merge sort, which uses the function Merge() to merge two sorted lists to get the third sorted list. Explain its time complexity.
14. Explain Quick Sort. Write a 'C' program for quick sort. Explain its time complexity.
15. Show that the algorithm for quick sort takes $O(n^2)$ time when the input file is already in sorted order.
(i) Internal Sorting
(ii) External Sorting
3. (a) root
(b) $[\log_2(N)]$, $O(N \log_2 N)$
6
GRAPH ALGORITHMS
CONTENTS
6.0 Aims and Objectives
6.1 Introduction
6.2 Definitions
6.3 Topological Sort
6.4 Dijkstra Shortest Path Algorithm
6.5 Warshall Algorithm
6.6 Minimal Algorithm
6.7 Traversing a Graph
6.7.1 Depth-first Traversal
6.7.2 Breadth-first Traversal
6.8 Spanning Trees
6.9 Minimum-cost Spanning Tree
6.9.1 MST Property
6.9.2 Application of Minimum-cost Spanning Tree
6.10 Let us Sum up
6.11 Keywords
6.12 Questions for Discussion
6.13 Suggested Readings
6.1 INTRODUCTION
Graphs are natural models used to represent arbitrary relationships among data objects. We often need to represent such arbitrary relationships among data objects while dealing with many problems in computer science, engineering, and many other disciplines. Therefore the study of graphs as one of the basic data structures is important.
This section presents the definition of a graph (both directed and undirected) and related terms. We will discuss various shortest path algorithms and minimum spanning trees.
6.2 DEFINITIONS
A graph is a structure made of two components, a set of vertices V and a set of edges E. Therefore a graph is G = (V, E), where G is a graph. The graph may be directed or undirected. When the graph is directed, every edge of the graph is an ordered pair of vertices connected by the edge, whereas when the graph is undirected, every edge of the graph is an unordered pair of vertices connected by the edge. Given below in Figure 6.1 are structures which are graphs.
Figure 6.1
Incident edge: If $(v_i, v_j)$ is an edge, then edge $(v_i, v_j)$ is said to be incident on vertices $v_i$ and $v_j$. For example, in the graph G1 shown above in Figure 6.1 the edges incident on vertex 1 are (1,2), (1,4), and (1,3), whereas in G2 the edges incident on vertex 1 are (1,2).
Degree of vertex: It is the number of edges incident on the vertex. For example, in graph G1 shown above the degree of vertex 1 is 3, because 3 edges are incident on it. For a directed graph, we need to define indegree and outdegree.
Indegree of a vertex $v_i$ is the number of edges incident on $v_i$ with $v_i$ as the head. Outdegree of vertex $v_i$ is the number of edges incident on $v_i$ with $v_i$ as the tail. For the graph G2 shown, the indegree of vertex 2 is 1, whereas the outdegree of vertex 2 is 2.
Directed edge: A directed edge between the vertices $v_i$ and $v_j$ is an ordered pair, denoted as $\langle v_i, v_j \rangle$.
Undirected edge: An undirected edge between the vertices $v_i$ and $v_j$ is an unordered pair, denoted as $(v_i, v_j)$.
Path: A path between the vertices $v_p$ and $v_q$ is a sequence of vertices $v_p, v_{i1}, v_{i2}, \ldots, v_{in}, v_q$ such that there exists a sequence of edges $(v_p, v_{i1}), (v_{i1}, v_{i2}), \ldots, (v_{in}, v_q)$. In the case of a directed graph, a path between the vertices $v_p$ and $v_q$ is a sequence of vertices $v_p, v_{i1}, v_{i2}, \ldots, v_{in}, v_q$ such that there exists a sequence of edges $\langle v_p, v_{i1} \rangle, \langle v_{i1}, v_{i2} \rangle, \ldots, \langle v_{in}, v_q \rangle$. If there exists a path from vertex $v_p$ to $v_q$ in an undirected graph, then there always exists a path from $v_q$ to $v_p$ also. But in the case of a directed graph, if there exists a path from vertex $v_p$ to $v_q$, it does not necessarily imply that there exists a path from $v_q$ to $v_p$.
Simple path: A simple path is a path given by a sequence of vertices in which all vertices are distinct, except possibly the first and the last. If the first and the last vertex are the same then the path is a cycle.
Maximum number of edges: The maximum number of edges in an undirected graph with n vertices is n(n - 1)/2, whereas in the case of a directed graph it is n(n - 1).
Subgraph
If E(G1) consists of all edges $(v_i, v_j)$ in E(G) such that both $v_i$ and $v_j$ are in V(G1), then G1 is called an induced subgraph of G.
For example, the graph shown in Figure 6.2 is a subgraph of the graph G shown in Figure 6.1.
Figure 6.4: Induced Subgraph of Graph G of Figure 6.3
In an undirected graph G, two vertices v1 and v2 are said to be connected if there exists a path in G from v1 to v2 (G being an undirected graph, there also exists a path from v2 to v1).
Connected graph: A graph G is said to be connected if for every pair of distinct vertices $(v_i, v_j)$ there is a path from $v_i$ to $v_j$. Given below in Figure 6.5 is a graph which is connected.
Figure 6.5
When there is no cycle, a "topological sort" is an ordering of the vertices such that if there is a path from $v_i$ to $v_j$, then $v_i$ occurs before $v_j$ in the ordering.
Algorithm:
Find a vertex v with zero in-degree (must exist!)
Repeat;
Take O(V^2) time.
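A minimal sketch of this idea is given below, assuming the graph is stored as an adjacency matrix and the vertices are numbered from 0. The function name `topological_sort` and the bound `MAXV` are illustrative assumptions. Each round scans all vertices for one with zero in-degree, so the whole procedure takes O(V^2) time.

```c
#include <assert.h>

#define MAXV 16   /* illustrative upper bound on the number of vertices */

/*
 * adj[u][v] != 0 means there is an edge u -> v.
 * Writes a topological order into order[0..n-1] and returns 1,
 * or returns 0 if the graph contains a cycle.
 */
int topological_sort(int n, int adj[MAXV][MAXV], int order[])
{
    int indegree[MAXV] = {0};
    int removed[MAXV] = {0};
    assert(n >= 0 && n <= MAXV);

    /* Count incoming edges of every vertex. */
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            if (adj[u][v]) indegree[v]++;

    for (int count = 0; count < n; count++) {
        int v = -1;
        /* Find a remaining vertex with zero in-degree. */
        for (int u = 0; u < n; u++)
            if (!removed[u] && indegree[u] == 0) { v = u; break; }
        if (v < 0) return 0;          /* every remaining vertex has a predecessor: cycle */
        removed[v] = 1;
        order[count] = v;
        /* Deleting v lowers the in-degree of its successors. */
        for (int w = 0; w < n; w++)
            if (adj[v][w]) indegree[w]--;
    }
    return 1;
}
```

Keeping the zero in-degree vertices in a queue instead of rescanning would bring the cost down to O(V + E) with adjacency lists; the matrix version is kept here because it mirrors the steps most directly.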
i. Sort the vertices in V - S according to the current best estimate of their distance from the source.
Consider the following example for illustration. Find the shortest path from node X to node Y in the following graph. A label on an edge indicates the distance between the two nodes the edge connects.
Solution:
Initially P = {A} and T = {B, C, E, D, Z}
The lengths of different vertices (with respect to P) are:
L(B) = 1, L(C) = 4, L(D) = L(E) = L(Z) = ∞
7 (a, b, c, e)
10 (a, b, c, e)
4 (a, b, c)
T (a, b, c, e)
(a, b, c, e, d)
4 (a, b, c)
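The labelling procedure can be sketched in C as follows. This is an illustration under stated assumptions, not the text's own listing: the graph is an adjacency matrix of non-negative edge lengths, vertices are numbered from 0, and the names `dijkstra`, `in_p`, `MAXV` and `INF` are introduced here. The array `in_p` plays the role of the permanent set P, and `dist[v]` is the current label L(v).

```c
#include <assert.h>
#include <limits.h>

#define MAXV 16
#define INF  INT_MAX     /* stands for "no edge" and for "label still infinite" */

/* w[u][v] is the non-negative length of edge (u, v), or INF if there is none.
   On return dist[v] is the shortest distance from src to v (INF if unreachable). */
void dijkstra(int n, int w[MAXV][MAXV], int src, int dist[])
{
    int in_p[MAXV] = {0};          /* 1 once a vertex has joined the permanent set P */
    assert(n > 0 && n <= MAXV && src >= 0 && src < n);

    for (int v = 0; v < n; v++) dist[v] = INF;
    dist[src] = 0;

    for (int round = 0; round < n; round++) {
        /* Pick the temporary vertex with the smallest label. */
        int u = -1;
        for (int v = 0; v < n; v++)
            if (!in_p[v] && dist[v] != INF && (u < 0 || dist[v] < dist[u]))
                u = v;
        if (u < 0) break;          /* remaining vertices are unreachable */
        in_p[u] = 1;

        /* Relax every edge leaving u. */
        for (int v = 0; v < n; v++)
            if (!in_p[v] && w[u][v] != INF && dist[u] + w[u][v] < dist[v])
                dist[v] = dist[u] + w[u][v];
    }
}
```

Selecting the minimum label by a linear scan gives O(n^2) time overall; the sketch assumes path lengths stay well below `INT_MAX`.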
6.5 WARSHALL ALGORITHM
Given the adjacency matrix A, this algorithm produces the path matrix P.
1. [Initialization]
P ← A
2. [Perform a pass]
Repeat through step 4 for k = 1(1)n.
3. [Process rows]
Repeat step 4 for i = 1(1)n.
4. [Across columns]
Repeat for j = 1(1)n
$P_{ij} \leftarrow P_{ij} \lor (P_{ik} \land P_{kj})$
5. [Exit]
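The steps above can be sketched in C as three nested loops. The function name `warshall` and the bound `MAXV` are assumptions made for this illustration, and indices run from 0 rather than 1.

```c
#include <assert.h>

#define MAXV 16

/* a is the adjacency matrix (nonzero = edge); p receives the path matrix:
   p[i][j] == 1 exactly when there is a path of one or more edges from i to j. */
void warshall(int n, int a[MAXV][MAXV], int p[MAXV][MAXV])
{
    assert(n >= 0 && n <= MAXV);

    /* Step 1: P <- A */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            p[i][j] = (a[i][j] != 0);

    /* Steps 2 to 4: allow vertex k as an intermediate point, one k at a time. */
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (p[i][k] && p[k][j])
                    p[i][j] = 1;
}
```

The triple loop makes the running time O(n^3) regardless of the number of edges.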
Figure 6.7: Graph G and its Depth First Traversals Starting at Vertex v1
Some of the depth first traversal orders are:
(, v,%Y,v,vr%%yrv,
(ir) vr vs v1 y6yev7v, v, %
The procedure for depth first traversal of a graph is given below. The procedure makes use of a global array visited of n elements, where n is the number of vertices of the graph and the elements are boolean. If visited[i] = true then it means that the i-th vertex has been visited. Initially we set visited[i] = false, therefore:
for (i = 1; i <= n; i++)
    visited[i] = false;
for (i = 1; i <= n; i++)
If the graph G to which the dfs is applied is represented using adjacency lists, then the vertices y adjacent to x can be determined by following the list of adjacent vertices for each vertex. Therefore the loop searching for adjacent vertices has the total cost of $d_1 + d_2 + \cdots + d_n$, where $d_i$ is the degree of vertex $v_i$, because the number of nodes in the adjacency list of vertex $v_i$ is $d_i$. If the graph G has n vertices and e edges then the sum of the degrees of all vertices, i.e., $(d_1 + d_2 + \cdots + d_n)$, is 2e. Therefore there are a total of 2e list nodes in the adjacency lists of G (if G is a directed graph then there are only e list nodes in total). The algorithm examines each node in the adjacency lists at most once. Hence the time required to complete the search is O(e), provided n <= e. If an adjacency matrix is used to represent the graph G instead of adjacency lists, then the time required to determine all adjacent vertices of a vertex is O(n), and since at most n vertices are visited the total time required is $O(n^2)$.
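A compact sketch of depth first traversal over adjacency lists follows. It is an illustration only: the lists are kept in plain arrays rather than linked nodes, vertices are numbered from 0, and the names `graph_init`, `add_edge`, `dfs`, `visit_order` and the bounds `MAXV`, `MAXE` are assumptions introduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 16
#define MAXE 64                 /* room for list nodes: 2 per undirected edge */

static int head[MAXV];          /* head[x]: first list node of x, or -1 if none */
static int to[MAXE];            /* to[p]: the vertex stored in list node p */
static int next_node[MAXE];     /* next_node[p]: following node in the same list */
static int node_count;
static bool visited[MAXV];
static int visit_order[MAXV];   /* vertices in the order dfs reaches them */
static int visit_count;

void graph_init(int n)
{
    assert(n >= 0 && n <= MAXV);
    for (int v = 0; v < n; v++) { head[v] = -1; visited[v] = false; }
    node_count = 0;
    visit_count = 0;
}

static void add_arc(int x, int y)
{
    assert(node_count < MAXE);
    to[node_count] = y;
    next_node[node_count] = head[x];   /* insert at the front of x's list */
    head[x] = node_count++;
}

/* An undirected edge contributes one list node to each endpoint. */
void add_edge(int x, int y)
{
    add_arc(x, y);
    add_arc(y, x);
}

void dfs(int x)
{
    visited[x] = true;
    visit_order[visit_count++] = x;
    for (int p = head[x]; p != -1; p = next_node[p])   /* each list node examined once */
        if (!visited[to[p]])
            dfs(to[p]);
}
```

Because every list node is looked at once, the traversal of a component costs time proportional to the number of list nodes, matching the O(e) bound discussed above.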
When this procedure is applied to the graph of Figure 6.7, one of the orders in which the vertices get visited is shown below:
V1  false true  true  true  true  true  true  true  true  true
V2  false false true  true  true  true  true  true  true  true
V3  false false false true  true  true  true  true  true  true
V4  false false false false false false false false true  true
V5  false false false false false false false false false true
V6  false false false false false false false true  true  true
V7  false false false false true  true  true  true  true  true
V8  false false false false false true  true  true  true  true
V9  false false false false false false true  true  true  true
The procedure for breadth first traversal of a graph is given below. The procedure makes use of a global array visited of n elements, where n is the number of vertices of the graph and the elements are boolean. If visited[i] = true then it means that the i-th vertex has been visited. The procedure also makes use of a queue, and the procedures addqueue and deletequeue are assumed to be available for adding a vertex to the queue and for deleting a vertex from the queue. Initially we set visited[i] = false, therefore:
for (i = 1; i <= n; i++)
    visited[i] = false;
deletequeue(y);
if (visited[y] == false)
{
    visited[y] = true;
    for every adjacent i of y do
        if (visited[i] == false)
            addqueue(i);
}
If the graph G to which the bfs is applied is represented using adjacency lists, then the vertices adjacent to x can be determined by following the list of adjacent vertices for each vertex. Therefore, the loop searching for adjacent vertices has the total cost of $d_1 + d_2 + \cdots + d_n$, where $d_i$ is the degree of vertex $v_i$, because the number of nodes in the adjacency list of vertex $v_i$ is $d_i$. If the graph G has n vertices and e edges then the sum of the degrees of all vertices, i.e., $(d_1 + d_2 + \cdots + d_n)$, is 2e. Therefore there are 2e list nodes in the adjacency lists of G (if G is a directed graph then there are only e list nodes). Each vertex gets added to the queue exactly once, hence the loop "while queue not empty" is iterated at most n times. Hence, the time required to complete the search is O(e), provided n <= e. If an adjacency matrix is used to represent the graph G instead of adjacency lists, then the time required to determine all adjacent vertices of a vertex is O(n), and since every vertex gets added to the queue exactly once the total time required is $O(n^2)$.
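For comparison with the adjacency-list discussion, the sketch below runs breadth first traversal on an adjacency matrix, which is the O(n^2) case. It is a hedged illustration: the array-based queue replaces the addqueue and deletequeue procedures, vertices are numbered from 0, a vertex is marked when it is enqueued so that it enters the queue exactly once, and the names `bfs` and `MAXV` are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXV 16

/* Visits every vertex reachable from start, writing the visiting order into
   order[] and returning how many vertices were visited. */
int bfs(int n, int adj[MAXV][MAXV], int start, int order[])
{
    bool visited[MAXV] = {false};
    int queue[MAXV];                  /* each vertex is enqueued at most once */
    int front = 0, rear = 0, count = 0;
    assert(n > 0 && n <= MAXV && start >= 0 && start < n);

    visited[start] = true;
    queue[rear++] = start;            /* addqueue(start) */

    while (front < rear) {            /* while queue not empty */
        int x = queue[front++];       /* deletequeue(x) */
        order[count++] = x;
        for (int y = 0; y < n; y++)   /* scanning one matrix row costs O(n) */
            if (adj[x][y] && !visited[y]) {
                visited[y] = true;
                queue[rear++] = y;
            }
    }
    return count;
}
```

The outer loop runs at most n times and each pass scans one row of n entries, which gives the quadratic bound stated above.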
When this procedure is applied to the graph of Figure 6.9, one of the orders in which the vertices get visited is shown below:
v1  false true  true  true  true  true  true  true  true  true
v2  false false true  true  true  true  true  true  true  true
v3  false false false false true  true  true  true  true  true
    false false false true  true  true  true  true  true  true
v6  false false false false false false false true  true  true
v7  false false false false false false true  true  true  true
v8  false false false false false false false false true  true
v9  false false false false false false false false false true
Figure 6.9
T = T ∪ {(v, i)};
dfst(i);
}
If a graph G is not connected, then the tree edges, which are precisely those edges followed during the depth-first traversal of the graph G, constitute the depth-first spanning forest. The depth-first spanning forest is made of trees, each of which spans one of the connected components of graph G.
When a graph G is directed, the tree edges, which are precisely those edges followed during the depth-first traversal of the graph G, form a depth-first spanning forest for G. In addition to these, there are three other types of edges, called back edges, forward edges, and cross edges. An edge A → B is called a back edge if B is an ancestor of A in the spanning forest. A non-tree edge that goes from a vertex to a proper descendant is called a forward edge. An edge which goes from a vertex to another vertex that is neither an ancestor nor a descendant is called a cross edge. An edge from a vertex to itself is a back edge.
Figure 6.14: Depth-first Spanning Forest for the Graph G of Figure 7.17
Consider a graph shown below in Figure 6.15.
is we do not add to the set T, because it will form a cycle. For example, consider the graph shown
below in Figure 6.17.
Prim's Algorithm
Let G = (V, E) be a weighted graph, and suppose V = {1, 2, ..., n}. Prim's algorithm begins with a set U initialized to {1}, and at each stage finds the shortest edge (u, v) that connects u in U and v in V - U, and then adds v to U. It repeats this step until U = V.
T = ∅
U = {1}
while U ≠ V do
{
    find the lowest cost edge (u, v)
        such that u is in U
        and v is in V - U
    add (u, v) to T
    add v to U
}
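The loop above can be sketched in C on a cost matrix. This is one possible rendering under stated assumptions: vertices are numbered 0 to n - 1 (so U starts as {0} rather than {1}), a missing edge is marked with `INF`, and the names `prim`, `best`, `parent`, `MAXV` and `INF` are introduced for the illustration.

```c
#include <assert.h>
#include <limits.h>

#define MAXV 16
#define INF  INT_MAX     /* cost[u][v] == INF means there is no edge (u, v) */

/* Builds a minimum-cost spanning tree starting from vertex 0.
   parent[v] receives the tree neighbour through which v joined U
   (parent[0] stays -1). Returns the total tree cost, or -1 if the
   graph is not connected. */
int prim(int n, int cost[MAXV][MAXV], int parent[])
{
    int in_u[MAXV] = {0};    /* 1 once the vertex belongs to U */
    int best[MAXV];          /* best[v]: cheapest known edge from U to v */
    int total = 0;
    assert(n > 0 && n <= MAXV);

    for (int v = 0; v < n; v++) { best[v] = INF; parent[v] = -1; }
    best[0] = 0;

    for (int round = 0; round < n; round++) {
        /* Find the vertex of V - U that is cheapest to attach. */
        int v = -1;
        for (int x = 0; x < n; x++)
            if (!in_u[x] && best[x] != INF && (v < 0 || best[x] < best[v]))
                v = x;
        if (v < 0) return -1;          /* nothing reachable is left */
        in_u[v] = 1;                   /* add v to U, and (parent[v], v) to T */
        total += best[v];

        /* v may offer cheaper connections to the vertices still outside U. */
        for (int x = 0; x < n; x++)
            if (!in_u[x] && cost[v][x] < best[x]) {
                best[x] = cost[v][x];
                parent[x] = v;
            }
    }
    return total;
}
```

Keeping `best[]` up to date avoids rescanning all edges at every stage, so this version runs in O(n^2) time.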
6.11 KEYWORDS
Digraph: A graph in which every edge is directed.
Undirected Graph: A graph in which every edge is undirected.
Spanning Tree: A tree obtained from a graph which covers all its vertices.
Minimum Spanning Tree: A tree from the set of spanning trees which has minimum weight.
8. By considering the complete graph with n vertices, show that the number of spanning trees is at least $2^{n-1} - 1$.
9. Prove that when DFS and BFS are applied to a connected graph the edges of the graph form a tree.
10. What do you understand by the shortest path from one node to another in a weighted graph? Write Dijkstra's algorithm to find the shortest path in a weighted graph. Find the shortest path from 3 to T using Dijkstra's algorithm in the following graphs:
11. Find the minimum distance between the nodes A and F in the following graph.
A B C D E F
A 0 (50) 0 (s3) 0 gi
B 04\ 0 0 (13) (ri (40)
F 0 13) (2 r) (3 1) 0
12. Obtain a spanning tree for the following graph.
13. Obtain the minimum spanning tree for the following graph. The numbers in the parentheses are the costs of the corresponding edges.
A B C D E F
A 0 (60) 0 (s3) 0 (41)
D 0 11 (1e) 0 0t 0
F U (13) (2t\ (3 1) 0 0
7
ALGORITHM DESIGN TECHNIQUES
CONTENTS
7.0 Aims and Objectives
7.1 Introduction
7.2 Greedy Algorithms
7.2.1 A Simple Scheduling Problem
7.2.2 Huffman Codes
7.3 Divide and Conquer
7.3.1 Running Time of Divide and Conquer Algorithms
7.3.2 Closest-points Problem
7.3.3 Selection Problem
7.3.4 Theoretical Improvements for Arithmetic Problems
7.4 Let us Sum up
7.5 Keywords
7.6 Questions for Discussion
7.7 Suggested Readings
7.1 INTRODUCTION
In this lesson, we will discuss the design of algorithms. We will focus on some common types of algorithms used to solve problems. For many problems, it is quite likely that at least one of these methods will work.
The most obvious real-life example of a greedy algorithm is the coin-changing problem. To make change in U.S. currency, we repeatedly hand out the largest denomination. Thus, to give out seventeen dollars and sixty-one cents in change, we give out a ten-dollar bill, a five-dollar bill, two one-dollar bills, two quarters, one dime, and one penny. By doing this, we are guaranteed to minimize the number of bills and coins. This algorithm does not work in all monetary systems, but fortunately, we can prove that it does work in the American monetary system. Indeed, it works even if two-dollar bills and fifty-cent pieces are allowed.
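The change-making rule is short enough to sketch directly. In the illustration below all amounts are expressed in cents, and the function name `make_change` and its parameters are assumptions made for this example.

```c
#include <assert.h>

/*
 * Greedy change-making: denom[] must be sorted from largest to smallest.
 * count[i] receives how many items of denom[i] are handed out.
 * Returns the total number of bills and coins, or -1 if some amount is left over.
 */
int make_change(int amount, const int denom[], int k, int count[])
{
    int total = 0;
    assert(amount >= 0 && k > 0);
    for (int i = 0; i < k; i++) {
        assert(denom[i] > 0);
        count[i] = amount / denom[i];   /* take as many of the largest as fit */
        amount %= denom[i];
        total += count[i];
    }
    return amount == 0 ? total : -1;
}
```

Called with the denominations 1000, 500, 100, 25, 10, 5, 1 and the amount 1761, the routine hands out the eight items listed in the text. With a hypothetical system of denominations 4, 3 and 1, the same rule pays 6 as 4 + 1 + 1 (three coins) although 3 + 3 needs only two, which illustrates why the greedy choice is not optimal in every monetary system.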
Another real-life example is traffic, where making locally optimal choices does not always work. For instance, during certain rush hour times in Miami, it is best to stay off the prime streets even if they look empty, because traffic will come to a standstill a mile down the road, and you will be stuck. Even more surprising, it is better in some cases to make a temporary detour in the direction opposite your destination in order to avoid all traffic bottlenecks.
Now, we will discuss some applications that use greedy algorithms. The first application is a simple scheduling problem. Virtually all scheduling problems are either NP-complete (or of similarly difficult complexity) or are solvable by a greedy algorithm. The second application deals with file compression and is one of the earliest results in computer science. Finally, we will discuss an example of a greedy approximation algorithm.
Job   Time
J1    16
J2    9
J3    3
J4    12
Scheduling
J1   J2   J3   J4
16   25   28   40
J3   J2   J4   J1
3    12   24   40
Average completion time = (3 + 12 + 24 + 40)/4 = 19.75
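• Optimal substructure: If the shortest job is removed from an optimal solution, the remaining solution for the n - 1 jobs is optimal.
Optimality Proof
Total cost of a schedule $j_{i_1}, j_{i_2}, \ldots, j_{i_N}$ is
$$C = \sum_{k=1}^{N} (N - k + 1)\, t_{i_k}$$
which is the sum of the completion times
$$t_{i_1} + (t_{i_1} + t_{i_2}) + (t_{i_1} + t_{i_2} + t_{i_3}) + \cdots + (t_{i_1} + t_{i_2} + \cdots + t_{i_N})$$
and can be rewritten as
$$C = (N + 1) \sum_{k=1}^{N} t_{i_k} - \sum_{k=1}^{N} k \cdot t_{i_k}$$
• The first term is independent of the ordering; as the second term increases, the total cost becomes smaller. Assume that there is a job ordering such that x > y and $t_{i_x} < t_{i_y}$. Swapping the jobs (smaller first) increases the second term, decreasing the total cost.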
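The shortest-job-first rule can be tried out numerically. The sketch below sums completion times for a given order and for the sorted order; the function names are assumptions, and the job times 16, 9, 3, 12 used in the usage note are an assumed data set chosen so that the shortest-first completion times come out as 3, 12, 24 and 40.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending running times. */
static int by_time(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sum of the completion times when the jobs run in the order given. */
long total_completion_time(const int t[], int n)
{
    long finish = 0, total = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        finish += t[i];      /* job i finishes after everything before it */
        total += finish;
    }
    return total;
}

/* Greedy schedule: sort t[] so the shortest job runs first, then evaluate it. */
long shortest_job_first(int t[], int n)
{
    assert(n >= 0);
    qsort(t, (size_t)n, sizeof t[0], by_time);
    return total_completion_time(t, n);
}
```

For the assumed times 16, 9, 3, 12 the given order totals 16 + 25 + 28 + 40 = 109, while the shortest-first order totals 3 + 12 + 24 + 40 = 79, an average of 19.75.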
The normal ASCII character set consists of roughly 100 "printable" characters. To distinguish these characters, 7 bits are needed. Seven bits allow the representation of 128 characters, so the ASCII character set adds some other "nonprintable" characters. An eighth bit is added as a parity check.
Suppose we have a file that contains only the characters a, e, i, s, t, plus blank spaces and newlines. Suppose further that the file has ten a's, fifteen e's, twelve i's, three s's, four t's, thirteen blanks, and one newline. As the table in Figure 7.1 shows, this file requires 174 bits to represent, since there are 58 characters and each character requires three bits.
Total 174
Start with a string of characters you would like to compress. For each character in the string, compute its frequency of occurrence in the string. Then arrange the characters in order from lowest frequency to highest frequency. Take the two characters with the lowest frequencies and make a node with each character (and its frequency) as children of the node. The parent node's data element is the sum of the frequencies of the two child nodes. Insert the node back into the list. Continue this process until every character has been placed in the tree. On completion of this process, you have a full binary tree that can be used to decode the Huffman code. Decoding consists of following a path of 0s and 1s until you reach a leaf node, which will contain a character.
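The merging process can be sketched without building the tree explicitly when only the size of the compressed file is wanted. Every merge puts its two subtrees one level deeper, so each symbol under the new node gains one code bit; the total encoded length is therefore the sum of the weights of all merged nodes. The function name `huffman_bits` and the bound `MAXSYM` are assumptions for this illustration, and the two smallest weights are found by a plain linear scan instead of a priority queue.

```c
#include <assert.h>

#define MAXSYM 64   /* illustrative upper bound on the number of distinct symbols */

/* Total length in bits of the file when encoded with a Huffman code
   built from the given symbol frequencies. */
long huffman_bits(const long freq[], int n)
{
    long w[MAXSYM];
    long bits = 0;
    assert(n >= 1 && n <= MAXSYM);
    for (int i = 0; i < n; i++) w[i] = freq[i];

    while (n > 1) {
        /* a = index of the smallest weight, b = index of the second smallest. */
        int a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) { b = i; }
        }
        long merged = w[a] + w[b];   /* weight of the new parent node */
        bits += merged;              /* every symbol below it gains one bit */
        w[a] = merged;               /* the parent replaces one child ... */
        w[b] = w[n - 1];             /* ... and the last entry fills the other slot */
        n--;
    }
    return bits;
}
```

For the frequencies of the example file (10, 15, 12, 3, 4, 13, 1) the merged weights are 4, 8, 18, 25, 33 and 58, giving 146 bits against the 174 bits of the fixed three-bit code.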
Conventionally, routines whose text contains at least two recursive calls are called divide and conquer algorithms, whereas routines whose text contains only one recursive call are not. We usually insist that the subproblems be disjoint (that is, essentially nonoverlapping). Let us review some of the recursive algorithms that have been covered in this text.
We have already seen several divide and conquer algorithms. In Lesson 3, we saw tree traversal strategies. In Lesson 5, we saw the classic examples of divide and conquer, namely mergesort and quicksort, which have O(n log n) worst-case and average-case bounds, respectively.
Lesson 6 showed routines to recover the shortest path in Dijkstra's algorithm and other procedures to perform depth-first search in graphs. None of these algorithms are really divide and conquer algorithms, because only one recursive call is performed.
Now, we will see more examples of the divide and conquer paradigm. Our first application is a problem in computational geometry. Given n points in a plane, we will show that the closest pair of points can be found in O(n log n) time. The rest of the discussion presents some extremely interesting, but mostly theoretical, results. We give an algorithm which solves the selection problem in O(n) worst-case time. We also show that two n-bit numbers can be multiplied in $o(n^2)$ operations and that two n x n matrices can be multiplied in $o(n^3)$ operations. Unfortunately, even though these algorithms have better worst-case bounds than the conventional algorithms, none are practical except for very large inputs.
• We will develop a divide-and-conquer based O(n log n) algorithm; the dimension is assumed constant.
1-Dimension Problem
• The 1D problem can be solved in O(n log n) time by sorting.
• Sorting, however, does not generalize to higher dimensions. So, let us develop a divide-and-conquer algorithm for 1D.
• Divide the points S into two sets S1, S2 by some x-coordinate so that p < q for all p ∈ S1 and q ∈ S2.
• Recursively compute the closest pair (p1, p2) in S1 and (q1, q2) in S2.
• Let δ = min(|p2 - p1|, |q2 - q1|).
1D Divide & Conquer
• The closest pair is {p1, p2}, or {q1, q2}, or some {p3, q3} where p3 ∈ S1 and q3 ∈ S2.
• In 1D, p3 must be the rightmost point of S1 and q3 the leftmost point of S2, but these ideas do not generalize to higher dimensions.
• How many points of S1 can lie in the interval (m - δ, m]?
• By definition of δ, at most one. The same holds for S2.
1D Divide & Conquer
• Closest-Pair(S).
• If |S| = 1, output δ = infinity. If |S| = 2, output δ = |p2 - p1|. Otherwise, perform the following steps:
1. Let m = median(S).
2. Divide S into S1, S2 at m.
3. δ1 = Closest-Pair(S1).
4. δ2 = Closest-Pair(S2).
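The one-dimensional procedure can be sketched as follows. This is an illustration under assumptions: the points are sorted once up front so that the median split is simply the middle index, infinity is represented by `DBL_MAX`, and the names `closest_pair_1d` and `closest_rec` are introduced here. The combine step compares the rightmost point of S1 with the leftmost point of S2, as described for the 1D case.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending coordinates. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Smallest gap between two points of the sorted range p[lo..hi]. */
static double closest_rec(const double p[], int lo, int hi)
{
    if (hi - lo < 1) return DBL_MAX;          /* |S| = 1: delta = infinity */
    if (hi - lo == 1) return p[hi] - p[lo];   /* |S| = 2 */

    int mid = lo + (hi - lo) / 2;             /* median: S1 = p[lo..mid], S2 = p[mid+1..hi] */
    double d1 = closest_rec(p, lo, mid);
    double d2 = closest_rec(p, mid + 1, hi);
    double d = d1 < d2 ? d1 : d2;

    /* Only the rightmost point of S1 and the leftmost of S2 can beat d. */
    double across = p[mid + 1] - p[mid];
    return across < d ? across : d;
}

/* Distance between the closest pair among n points on a line (sorts p in place). */
double closest_pair_1d(double p[], int n)
{
    assert(n >= 1);
    qsort(p, (size_t)n, sizeof p[0], cmp_double);
    return closest_rec(p, 0, n - 1);
}
```

In one dimension the recursion is not needed once the points are sorted, since a single scan of adjacent gaps suffices; it is written out here only to show the shape of the divide, conquer and combine steps before they are carried over to the plane.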
• Recursively compute the closest pair distances δ1 and δ2. Set δ = min(δ1, δ2).
• Now compute the closest pair with one point each in S1 and S2.
• In each candidate pair (p, q), where p ∈ S1 and q ∈ S2, the points p and q must both lie within δ of the dividing line l.
• At this point, complications arise which were not present in 1D. It is entirely possible that all n/2 points of S1 (and S2) lie within δ of l.
• Naively, this would require $n^2/4$ calculations.
• We show that the points in P1, P2 (the δ strip around l) have a special structure, and solve the conquer step faster.
Specifically, we divide the list into groups of 5 elements each, find the median of each group in constant time (as each group is of constant size), and then find the median of these medians recursively. The main point to observe is that the final step of finding the median of medians applies to a much smaller list, of size n/5, and so we still get a small enough running time.
This was just a rough description and analysis of the algorithm. A more formal analysis is given below:
For simplicity of analysis, we assume that all the list sizes we come across while running the algorithm are divisible by 5.
Algorithm for Selection
One way of solving this recurrence is to guess that the running time is $T(n) = c'n$ and then verify whether the equation is satisfied for some value of $c'$. Substituting this in the equation we get
$c'n = cn + \frac{9}{10} c'n$
which entails $c' = 10c$.
also present the classic divide and conquer algorithm that multiplies two n by n matrices in subcubic time.
• Multiplying Integers
• Matrix Multiplication
Multiplying Integers
Let us consider multiplying two n-digit numbers x and y. If exactly one of x and y is negative, then the answer is negative; otherwise it is positive.
If x = 61,438,521 and y = 94,736,407, xy = 5,820,464,730,934,047. Let us divide x and y into two halves, consisting of the most significant and least significant digits, respectively. Then $x_l = 6{,}143$, $x_r = 8{,}521$, $y_l = 9{,}473$, and $y_r = 6{,}407$. We also have $x = x_l 10^4 + x_r$ and $y = y_l 10^4 + y_r$. It follows that
$xy = x_l y_l 10^8 + (x_l y_r + x_r y_l) 10^4 + x_r y_r$
Observe that this equation consists of four multiplications, $x_l y_l$, $x_l y_r$, $x_r y_l$, and $x_r y_r$, which are each half the size of the original problem (n/2 digits). The multiplications by $10^8$ and $10^4$ amount to the placing of zeros. This and the subsequent additions add only O(n) additional work. If we perform these four multiplications recursively using this algorithm, stopping at an appropriate base case, then we obtain the recurrence
$T(n) = 4T(n/2) + O(n)$
We know that $T(n) = O(n^2)$, so, unfortunately, we have not improved the algorithm. To achieve a subquadratic algorithm, we must use fewer than four recursive calls. The key observation is that
$x_l y_r + x_r y_l = (x_l - x_r)(y_r - y_l) + x_l y_l + x_r y_r$
Thus, instead of using two multiplications to compute the coefficient of $10^4$, we can use one multiplication, plus the result of two multiplications that have already been performed. It is easy to see that the recurrence equation now satisfies
$T(n) = 3T(n/2) + O(n)$,
and so we obtain $T(n) = O(n^{\log_2 3}) = O(n^{1.59})$. To complete the algorithm, we must have a base case, which can be solved without recursion.
When both numbers are one-digit, we can do the multiplication by table lookup. If one number has zero digits, then we return zero. In practice, if we were to use this algorithm, we would choose the base case to be that which is most convenient for the machine.
Although this algorithm has better asymptotic performance than the standard quadratic algorithm, it is rarely used, because for small n the overhead is significant, and for larger n there are even better algorithms. These algorithms also make extensive use of divide and conquer.
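The three-multiplication scheme can be sketched on machine integers to see the recursion at work. This is only an illustration: real uses of the method operate on arrays of digits, whereas the sketch below assumes the product fits in a 64-bit integer, splits the operands by decimal digits, and uses a one-digit base case. The names `multiply` and `pow10_int` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* 10 raised to the power k, for small non-negative k. */
static int64_t pow10_int(int k)
{
    int64_t p = 1;
    while (k-- > 0) p *= 10;
    return p;
}

/* Product of x and y, each having at most `digits` decimal digits in
   absolute value, using three recursive multiplications per level. */
int64_t multiply(int64_t x, int64_t y, int digits)
{
    assert(digits >= 1);
    if (digits == 1) return x * y;            /* base case: one-digit operands */

    int half = digits / 2;                    /* digits in the right halves */
    int rest = digits - half;                 /* digits in the left halves */
    int64_t base = pow10_int(half);

    int64_t xl = x / base, xr = x % base;     /* x = xl * base + xr */
    int64_t yl = y / base, yr = y % base;     /* y = yl * base + yr */

    int64_t p1 = multiply(xl, yl, rest);              /* xl * yl */
    int64_t p2 = multiply(xr, yr, rest);              /* xr * yr */
    int64_t p3 = multiply(xl - xr, yr - yl, rest);    /* (xl - xr)(yr - yl) */

    int64_t middle = p3 + p1 + p2;            /* equals xl*yr + xr*yl */
    return p1 * base * base + middle * base + p2;
}
```

For the numbers of the worked example, `multiply(61438521, 94736407, 8)` splits them into 6143, 8521, 9473 and 6407 at the first level and reproduces the product 5,820,464,730,934,047. The differences xl - xr and yr - yl may be negative, which is why signed arithmetic is used throughout.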
Matrix Multiplication
A fundamental numerical problem is the multiplication of two matrices. Figure 7.2 gives a simple $O(n^3)$ algorithm to compute C = AB, where A, B, and C are n by n matrices. The algorithm follows directly from the definition of matrix multiplication. To compute $C_{i,j}$, we compute the dot product of the i-th row in A with the j-th column in B. As usual, arrays begin at index 0.
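A minimal sketch of the straightforward method is shown below. It is not a reproduction of Figure 7.2; the function name `matrix_multiply` and the choice of storing each matrix row by row in a flat array are assumptions made for this illustration.

```c
#include <assert.h>

/* C = A * B for n-by-n matrices stored row by row in flat arrays,
   so entry (i, j) of a matrix m lives at m[i * n + j]. */
void matrix_multiply(int n, const double a[], const double b[], double c[])
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```

The three nested loops each run n times, which is where the cubic running time comes from.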
For a long time it was assumed that $O(n^3)$ time was required for matrix multiplication. However, in the late sixties Strassen showed how to break the $O(n^3)$ barrier. The basic idea of Strassen's algorithm is to divide each matrix into four quadrants, as shown in Figure 7.3. Then it is easy to show that
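$C_{1,1} = A_{1,1} B_{1,1} + A_{1,2} B_{2,1}$
$C_{1,2} = A_{1,1} B_{1,2} + A_{1,2} B_{2,2}$
$C_{2,1} = A_{2,1} B_{1,1} + A_{2,2} B_{2,1}$
$C_{2,2} = A_{2,1} B_{1,2} + A_{2,2} B_{2,2}$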
$C_{2,2} = M_2 - M_3 + M_5 - M_7$
It is simple to verify that this tricky ordering produces the desired values. The running time now satisfies the recurrence
$T(n) = 7T(n/2) + O(n^2)$.
The solution of this recurrence is $T(n) = O(n^{\log_2 7}) = O(n^{2.81})$.
As usual, there are details to consider, such as the case when n is not a power of two, but these are basically minor nuisances. Strassen's algorithm is worse than the straightforward algorithm until n is fairly large. It does not generalize to the case where the matrices are sparse (contain many zero entries), and it does not easily parallelize. When run with floating-point entries, it is less stable numerically than the classic algorithm. Thus, it has only limited applicability. Nevertheless, it represents an important theoretical milestone and certainly shows that in computer science, as in many other fields, even though a problem seems to have an intrinsic complexity, nothing is certain until proven.
Check Your Progress
1. Define greedy-choice property.
2. What is the simple scheduling problem?
7.5 KEYWORDS
Optimal Substructure: If the shortest job is removed from an optimal solution, the remaining solution for the n - 1 jobs is optimal.
Divide: Smaller problems are solved recursively.
Conquer: The solution to the original problem is then formed from the solutions to the subproblems.
3. Complete the proof that Huffman's algorithm generates an optimal prefix code.
4. Write a program to implement file compression (and uncompression) using Huffman's algorithm.
5. Write a program to implement the closest-pair algorithm.
2. In the simple scheduling problem we are given jobs $j_1, j_2, \ldots, j_N$, all with known running times $t_1, t_2, \ldots, t_N$, respectively, and a single processor.