Chapter 6 - Trees
Chapter 6 - Trees
Chapter 6: Trees
Until now we have studied linear data structures. But there are many applications where
only the linear data structures are not sufficient. For such situations, we use non-linear
data structures like trees and graphs.
In computer science, a tree is an abstract model of a hierarchical structure.
– It is used to represent a hierarchical relationship existing among several data
items.
– A tree consists of nodes with a parent-child relationship.
– It is a non-linear data structure.
– Searching as fast as in ordered array.
– Insertion and deletion as fast as in linked list.
Definition of a Tree
It is a finite set of one or more data items (nodes) such that
- There is a special data item called the root of the tree.
- Its remaining data items are partitioned into number of disjoint subsets, each of
which is itself a tree again, called subtrees.
B C D
E F G H I
J K L M
– Node: Each item in a tree is called a node. A, B, F, G etc. are examples of node.
1
Sharad Chandra Joshi
– Parent: A node that points to (one or more) other nodes is the parent of those nodes while
the nodes pointed to are the children. B is parent of E, F and E,F are children of B.
– Leaf (External Nodes): Nodes with no children are leaf nodes or external nodes or
terminal nodes. In the above tree E, J, G, H, K, L, and M are terminal nodes
– Internal Nodes: Nodes with children are internal nodes or non-terminal nodes.
Note that the root node is also an internal node. Node such as B, C, D, are non-
terminal nodes
– Siblings (Brother): Nodes that have the same parent are siblings.
In the above tree:
• E and F are siblings of parent node B
• K, L, and M are siblings of parent node I.
– Descendents: The descendents of a node consist of its children, and their children,
and so on. All nodes in a tree are descendents of the root node (except the root node
itself).
– Ancestors: The ancestors of a node consist of the parent node and the parent of
that node also. The root node is the ancestor of all nodes and the root does not have
any ancestor.
– Order (Degree): Order (degree) of a node is given by the number of its children.
If a node has two children, its degree is two.
In the above tree.
• degree of node A is 3.
• degree of node B is 2.
• degree of node C is 1.
Similarly, order (degree) of a tree is given by the maximum of the degrees of its
nodes. In other words, if the degree of a tree is n, its nodes can have, at most, n
children. In the above tree the node A has degree 3 which is the maximum degree.
So the degree of the above tree is 3.
– Edge: The link between any two nodes is called an edge. It is directed from the
parent towards its child.
– Path: There is a single, unique path from the root to any node. For e.g. the path
from node A to J is A, B, F, and J.
– Height: A height of a node is equal to the maximum path length from that node
to a leaf node. A leaf node has a height of 0. The height of a tree is equal to the
height of the root. For e.g. the height of node B from leaf node J is 2.
2
Sharad Chandra Joshi
– Depth: A depth of a node is equal to the maximum path length from root to that
node. A root has a depth of 0. The depth of a tree is equal to the depth of the
deepest node in the tree. It, of course, is equal the tree height. For e.g. the depth of
node J is 3.
– Forest: it is a set of disjoint tress. In a given tree, if you remove its node then it
becomes a forest.
– Level: The level of a node is 0, if it is root. Otherwise, it is one more than its
parent. For e.g.
• Node C is at level 1
• Node K is at level 3
Binary Tree
A binary tree is a finite set of data items which is either empty or consists of a single item
called the root and the two disjoint binary trees called the left subtree and right subtree.
In a binary tree the maximum degree of any node is at most two. That means, there may
be a zero degree node (usually an empty tree) or one degree node or two degree node.
Fig: A simple binary tree, in which every node has at most two children.
3
Sharad Chandra Joshi
A Strictly binary tree or Full binary tree, or Proper binary tree, is a tree in which
every node has zero or two children. i.e. Every internal node has nonempty left and right
subtrees. A strictly binary tree with n leaves contains 2n – 1 nodes.
A Complete binary tree or a Perfect binary tree is a strictly binary tree in which all
leaves are at the same depth. It has following properties.
Tree traversals
Visiting the nodes in certain order is called traversing. The reason for visiting the nodes is
to process the contents of the node according to our requirement.
In a linear structure like an array or a linked list, there are two ways of traversal
–from front to end and from end to front
These traversals can be achieved, simply, by using loops.
Tree, being a non-linear data structure, there are three different ways for traversal.
Preorder, Inorder and Postorder traversals.
4
Sharad Chandra Joshi
InOrder traversal
In this traversal, the root node is visited in between the left and the right node in a
subtree. The rule is
1. Visit the nodes in the left subtree in inorder (L).
2. Visit the root node (N).
3. Finally, visit the nodes in the right subtree in inorder(R).
PostOrder traversal (also called breadth first traversal)
In this traversal, the root node is visited after the left and the right node in a subtree. The
rule is
1.visit the nodes in the left subtree in postorder(L)
2.then, visit the nodes in the right subtree in postorder(R).
3.Finally, visit the root node (N).
• Preorder traversal : A, H, G, I, F, E, B, C, D
• Postorder traversal : G, F, E, I, H, D, C, B, A
• In-order traversal : G, H, F, I, E, A, B, D, C
B C
D E F G
H I
5
Sharad Chandra Joshi
Expression Tree
An expression tree is used to represent arithmetic and logical expressions. It is built up from the
simple operands and operators of an expression by placing the simple operands as the leaves of a
binary tree, and the operators as the interior nodes.
For each binary operator, the left subtree contains all the simple operands and operators in the left
operand of the given operator and the right subtree contains everything in the right operand
For a unary operator, one subtree will be empty.
When an expression tree is traversed in inorder, we get an infix expression. When it is traversed
in preorder, we get the equivalent prefix expression. Similarly, when it is traversed in postorder,
we get the equivalent postfix expression.
a+b*c
Evaluation of Expression Tree +
3 2
6
Sharad Chandra Joshi
A
*
B C
As, already said, When an expression tree is traversed in inorder, we get an infix
expression. When it is traversed in preorder, we get the equivalent prefix expression.
Similarly, when it is traversed in postorder, we get the equivalent postfix expression.So,
now let us check:
2. (A + B) * C
+ C
A B
7
Sharad Chandra Joshi
3. (A + B * C) $ ((D + E) * F)
+ *
A + F
*
E
B c D
8
Sharad Chandra Joshi
a b
a b c
a -
b c
a - d
b c
a +
*
- d
+
a
b - d
c 9
b c
Sharad Chandra Joshi
A binary tree is a collection of nodes. Each node in a binary tree has three fields, the data part,
and two pointers that point to the root node of left and right subtree. For a leaf the left and right
pointers will be null. Also for an empty subtree, the corresponding pointers will be null. An
external pointer is used to point to the root node of the tree.
struct bnode
{
int info; //for storing data. Currently it is of type int.
struct bnode *leftchild; //pointer to the root of the left subtree
struct bnode *rightchild; //pointer to the root of the right subtree
};
root
a
b c
b c
d e
e
d
10
Sharad Chandra Joshi
A Binary Tree can be represented by means of a linear array having index ‘k’, 0< k<= n,
where n is the number of nodes.
Siblings(k) : If the left child at index k is given, then its right sibling is at k+1. Similarly,
if right child at index k is given, its left child is at k-1.
11
Sharad Chandra Joshi
Some Binary Trees along with their sequential representation are as follows:
a
0 a
1 b
2 c
3 d b c
4 e
5 f
6 g
7 d
8
f g
9 h e
10 i
11 j
12 k
13 h j k
14
i
A sequential representation consumes more space for representation binary trees. But for
representing complete binary tree, it proves to be efficient as no space is wasted. But, the
amount of spaces has to be determined at first.
A Binary Search Tree is a binary tree which is either empty, or in which every node
contains a key and satisfies the following criteria:
▪ All keys (if any) in the left –subtree is smaller than the key in the root.
▪ All keys (if any) in the right subtree is larger than the key in the root.
▪ The left and right subtrees again BST.
12
Sharad Chandra Joshi
E.g. 18 10 26 5 20 30 4 6 25
10 26
5 20 30
4 6 25
▪ With the above definition, we can also conclude that, no two nodes in a BST
have same value.
▪ The inorder traversal of a BST yields the keys in a sorted order (ascending).
▪ If the tree is reasonably full, searching for a key takes O(log2(n)) time, where
n is the number of nodes.
Insert (x,T): Insert a new node with x in the info part in the tree T such that the
property of BST is maintained.
Delete(x,T): Delete a node x in the info part from the tree T such that the property of
BST is maintained.
13
Sharad Chandra Joshi
Searching
In BST, searching of the desired data item can be performed by branching into left or
right subtree until the desired data item is found. We proceed from the root node.
▪ If the tree is empty, then we failed to find x, so return NULL.
▪ Else, compare x with the key stored in the root of T.
✓ If x is same as the key in root, return address of the root.
✓ If x is smaller than the key in root, search for x in the left subtree of T
recursively.
✓ If x is greater than the key in root, search for x in right subtree recursively.
Insert
In BST, we don’t allow duplicate data items. So, to insert a data item having key ‘K’ in a
binary search tree, we must check that its key is different from those of existing data
items by performing a search for data item with the same key K. If the search is
unsuccessful, the data item is inserted into BST at the point the search is terminated.
Algorithm
Given a new item with key x, then following algorithm will insert it in BST T.
▪ If the tree is empty, create a new node n storing the key x and make this new node
n the root of the tree. Otherwise, compare x to the key stored at root R.
▪ If x is identical to key in R, don’t change the tree.
▪ If x is smaller, insert new item into left subtree of the root recursively.
▪ If x is larger, insert new item into right subtree of the root recursively.
7
7 7
3 8
3 8
7
5 9
5 1
3
7 7
7
3 8 3 8
3
5 5 9
1 1
5
2
14
Sharad Chandra Joshi
Deletion
To delete a given node (key), it is searched first. If search is unsuccessful, the algorithm
terminates. Otherwise, the following possibilities arise:
▪ If node is a leaf, it can be deleted immediately.
▪ If the node has one subtree, the node can be deleted by replacing it with the root
of the non-empty subtree.
If the node has two subtrees, we replace the node with the rightmost node of the left
subtree Tr or the leftmost node of the right subtree Tr
E.g.
10
Fig:1
6 13
3 9 16
2 4 7 15 18
10
Fig:2
6 13
3
9 16
2 4
7 18
15
Sharad Chandra Joshi
Now lets delete node with key 13 from the above tree.
10
Fig:3
6 16
3
9 18
2 4 7
Now let’s see another example of deleting root node.
10
Deleting root
node with key 10
6 16
3
9 15 18
2 4 7
Leftmost node of right subtree takes its place Rightmost node of left subtree takes its place
9
15
6 16
6 16
3 7
3 9 15 18
18
2 4
2 4 7
16
Sharad Chandra Joshi
1. Complexity of BST operations is proportional to the length of the path from the
root to the node being manipulated, i.e., height of the tree
2. In a well balanced tree, the length of the longest path is roughly log n, where n is
the no of nodes in the tree
a. E.g., 1 million entries → longest path is log21,048,576 = 20
3. For a thin, unbalanced tree, operations become O(n):
a. E.g., elements are added to tree in sorted order
4. So, Balancing is important to decrease the computational complexity
Example:
1
4
2
2 6 3
4
1 3 5 7
5
In the left tree, the Search operation In the right tree, the Search operation
requires 3 comparisons to find 7 requires 7 comparisons to find 7
1. We know the complete binary tree has the shortest overall path length for any binary
tree.
2. The longest path in a complete binary tree with n elements is guaranteed to be no
longer than log(n).
3. If we can keep our tree complete, we’re set for fast search times but the cost is very
high to maintain a complete binary tree.
4. Instead, use height-balanced binary trees: for each node, the height difference
between the left and right sub trees is at most one.
A binary tree of height h is completely balanced if all the leaves occur at level h or h-l, and all
nodes at levels lower than h-l have two children.
E.g. of balance tree
17
Sharad Chandra Joshi
Level 2 4
9 15 25
Level 3 1
5
8 14
Definition: An AVL tree is a binary search tree in which the heights of the left and right
subtree of the root differ by, at most, 1 and in which the left and right subtree are again
AVL trees. It is named after their inventors’ G.M. Adelson, Velskii and E.M. Landis.
2 8
1 4 7
18
Sharad Chandra Joshi
AVL trees
Cases of violation
After insertion, violation might occur in four ways.
1) Insertion at left subtree of the left child of node n.
2) Insertion at right subtree of the left child of node n.
3) Insertion at left subtree of the right child of node n.
4) Insertion at right subtree of the right child of node n.
The first and forth cases (left-left and right-right) is rebalanced using Single Rotation.
The second and third cases (right-left and left-right) are fixed using Double Rotation.
This is also known as dogleg pattern.
Single Rotation:
K2
K1
K1 K2
X
Y Y Z
X
5
5
7
8 2
2
1 6 8
1 7 4
4
3
3 6
19
Sharad Chandra Joshi
K1 K2
K2 K1
X
Z
Y X Y
Z
Single Rotation to fix case 4
2
4
1 4
2 5
3 5
1 3 6
6
AVL property violated by insertion of 6, then fixed by single rotation
Insert 3 3 Insert 2 3
Insert 1 3
2
2
Perform single rotation between the
root 3 and its left child 2 1 3
1
AVL property is
violated at root 3
20
Sharad Chandra Joshi
Insert 4
2
1 3
Insert 5 2 2
AVL property is
violated at root 3 5
2
Insert 6
1 4
3 5
AVL property is 6
violated at root 2 4
2 5
21
Sharad Chandra Joshi
4
Insert 7
2 5
1 3 6
AVL property is 7
violated at root 5
4
2 6
Perform single rotation
between the root 5 and 1 3 5 7
its right child 6
Double Rotation:
K2
K1
Z X
Y Z
X Y
K3 K2
K3
K2 K1 K3
K1
D D
K2 K1
A B C D
A C
22
Sharad Chandra Joshi
K1
K1
K3
K2
A
K2
D A K3
B
B C
C D
K2
A B C D
Insert 16 4
2 6
1 3 5 7
16
23
Sharad Chandra Joshi
Insert 15
4 Perform double rotation
at the root 7 4
2 6
K1 2 6
1 3 5 7 K2
K3 1 3 5 15
16
K3
AVL property is K1 7 16
violated at root 7 15 K2
Insert 14 4
K1
Perform double 4
2 6 rotation at the root 6 K2
K3 2 7
1 3 5 15 K1 K3
1 3 6 15
K2 7 16
5 14 16
AVL property is
violated at root 6 14
24
Sharad Chandra Joshi
Insert 13
4
Perform single 7
1 3 6 15
2 6 14
16
5 14 16 1 3 5
13
13
AVL property is
violated at root 4
Insert 12
7 Perform single
rotation at the root 14 7
4 15
4 15
2 6 14 16
2 6 13 16
1 3 5 13
1 3 5 12 14
12
AVL property is
violated at root 14
25
Sharad Chandra Joshi
1. Insert the new node using the same algorithm as for an ordinary binary search tree.
2. Beginning with new node, calculate the difference in heights of the left or right
subtrees of each node on the path leading from the new node back up the tree towards
the root(node).
3. If the height balance is within the range -1 to 1 continue their checks until the root
node is encountered.
4. If height balance is out of the range then an imbalance is found, perform a rotation of
nodes to correct imbalance.
i. If insertion is at left subtree of the left child of node n or insertion at right
subtree of the right child of node n.
Perform single rotation right or single rotation left respectively.
ii. Insertion at right subtree of the left child of node n or insertion at left
subtree of the right child of node n.
Perform double rotation left-right or right-left respectively.
26
Sharad Chandra Joshi
Heap
A binary heap or simply a heap is an almost complete binary tree with the key in each
node, such that
Above definition is of max heap (descending partially ordered tree). For min heap the
root has the smallest key.
8
7
7 9
5 5 8
3
3 9
This is not a Heap since it violates rule 1 This is not a Heap since it violates rule 2
6 7
3 8 5
27
Sharad Chandra Joshi
92 parent(i) = floor(i/2)
left-child(i) = 2i
right-child(i) = 2i + 1
86 48
33 57 22 26
25 12 37 42 18
92 86 48 33 57 22 26 25 12 37 42 18
1 2 3 4 5 6 7 8 9 10 11 12
Note that a heap is definitely not a search tree
Types of Heap:
Max Heap : For max heap, the root has the largest key. The key at the parent node is
larger than the keys at their children and descendents.
Min Heap : The root has the smallest key. The key at the parent node is smaller than the
keys at their children and descendents.
Heaps are represented in array rather than linked structures since it is a special kind of
binary tree that have no gaps in an array implementation.
A Binary tree with only one node satisfies the properties of a heap.
To insert a new element in to the heap, we create a hole in the next available location,
since otherwise the tree will not be complete tree.
If new item can be placed in the hole without violating heap order, then we do insert
there. Otherwise, we slide the element that is in the hole’s parent node into the hole, thus
bubbling the hole up towards the root. The process is continued until the new item can be
placed in the hole. This strategy is known as a percolate up. The new element is
percolated up the heap until the correct location is formed.
28
Sharad Chandra Joshi
Eg: Inserting following numbers in the max Heap: 40, 32, 75, 20, 65, 48, 88.
32 32 75
75
percolate 75 Insert 20 75 Insert 65 75
32 40
32 40
32 40
20
20 65
Percolate 65 75 75
Insert 48
65 40 65 40
20 32 20 32 48
Percolate 48
75 Insert 88
75
65 48
65 48
20 40
32 20 40
32 88
29
Sharad Chandra Joshi
Percolate up 88 75
Percolate up 88 88
65 88
65 75
20 32 40 48 20 32 40 48
Deletions in Heap
Deleting in heap is always applied for the item at the root (i.e minimum in min heap and
maximum in max heap). When the root item is removed, a hole is created at the root.
Since, the heap now becomes one smaller, it follows that the last element ‘x’ in the heap
is placed in the hole. If the heap property is violated, we slide the smaller (larger) hole’s
children into the hole in the case of min (max) heap. This process of pushing up the
smaller item is continued until the item ‘x’ finds its proper place. For deleting in heap lets
take above example:
30
Sharad Chandra Joshi
65 75
65 75
20 32 40 48 20 32 40
75
Percolate down
Del max
65 48
65 48
20 32 40 20 32 40
Del max 40
Percolate down 65
65 48
40 48
20 32 20 32
Del max
Percolate down 48
40 48
40 32
20 32 20
31
Sharad Chandra Joshi
Game Tree
Game tree is most popular algorithm for determining the best move for a player during a
game
• Nodes of game tree represents possible game positions
• Children of a node represent positions that can be arrived at via one move from
the parent position
For example:
General idea is to increase the game winning possibility for one player
70 40
30 9 10 15
45 50 30 20 12
The tree here is known as min-max tree. Here, we search for best move.
The best move for this time is calculated by analyzing the moves in further levels. For
example, in given example, if 7 is our move and our opponent will definitely choose 70,
as it can lead up to 50, the maximum value at 4th level. Now our strategy is that, if
opponent chooses 70, then do not choose 30, because it will lead to 50. Therefore, choose
9. After level 4, we might get chance of wining the game.
32
Sharad Chandra Joshi
Huffman Algorithm
Lots of compression systems use Huffman Algorithm for the compression of text,
images, audio, data etc. The general idea behind the Huffman algorithm is that, use less
number of bits to represent mostly used key.
For example:
AAABCAABBCDDAAAABAAABABABADDAAAABC
If the number of A is 10,000, B is 234, C is 45 and D is 134 then the total storage
required will be (10,000 + 234 + 45 + 134) * 2 = 20,826 bits to store.
Since, the data ‘A’ is repeated ten thousand times, if the storage for A can be reduced,
then the total space required will automatically be reduced. Therefore, in Huffman
algorithm, the data (symbol) that is used most is represented by a single bit.
Now, while assigning the code word, the most frequently used symbol is assigned a code
word of length ‘1’, and the final one always gets code word with all ones.
Now, for the given data string, the binary value will be,
00010110001010110111111000010000100100100111111000010110
Now, if we calculate the total space required to store the data, it will be:
33
Sharad Chandra Joshi
Now the problem is how we recover the original data from compressed string;
00010110001010110111111000010000100100100111111000010110
1/0
1
0
1/0
A-0
1
0
1/0
B - 10
0 1
C - 110 D - 111
Now, to decode the encoded string, read a bit at a time. Compare with root node. If it is 0,
then we are at the end decision point. So, it must be ‘A’. If it was 1, then we have to read
the next bit again and check. If the next bit is 0 then it is B. Otherwise, read next bit and
check until we do not reach a decision point.
34
Sharad Chandra Joshi
a b C d e f
Frequency (in thousands) 45 13 12 16 9 5
Fixed-length codeword 000 001 010 011 100 101
Variable-length codeword 0 101 100 111 1101 1100
f:5 e:9
d:16 a:45
14 25
0 1 0 1
a:45
25 30
0 1 0 1
35
Sharad Chandra Joshi
a:45 55
0 1
25 30
0
0 1
100
1
0
a:45 55
1
0
25 30
1 0
0 1
Each part shows the contents of the queue sorted into increasing order by frequency. At
each step, the two trees with lowest frequencies are merged.
Leaves are shown as rectangles containing a character and its frequency. Internal nodes
are shown as circles containing the sum of the frequencies of its children. An edge
connecting an internal node with its children is labeled 0 if it is an edge to a lift child and
1 if it is an edge to a right child.
The codeword for a letter is the sequence of labels on the edges connecting the root to the
leaf for that letter.
36
Sharad Chandra Joshi
B-Tree
In binary search tree, each node contains a single key and points to two sub-trees.
We may extend this concept to a general search tree in which each node contains one or
more keys.
A multi-way search tree of order n is a general tree in which each node has n or fewer
sub-trees and contains one fewer key than it has sub-trees.
25 50 75 A
B C D E
12 18 40 55 60 70 90
H
F 30 G 56 58 100 120 140
Each node can have at most 3 keys and 4 children, since the order is 4. The nodes A, D
and H are called full nodes since it has the maximum no. of keys.
Definition of B-Tree
A B-tree of order m is an m-way search tree in which
1. All leaves are on the same level
2. All internal nodes except the root have at most m nonempty children, and at least (m/2)
nonempty children
3. The number of keys in each internal node is one less than the number of its children,
and these keys partition the keys in the children in the fashion of a search tree.
37
Sharad Chandra Joshi
4. The root has at most m children, or as few as 2 if it is not a leaf, or none if the tree
consists of the root alone
400
First a search is made to see if the new key is in the tree. This search (if the key is new)
will terminate in failure at a leaf. The new key is then added to the leaf node. If the node
was not previously full, then the insertion is finished.
When a key is added to a full node, then the node splits into two nodes on the same level,
except that the median key is not put into either of the two new nodes but is instead sent
up the tree to be inserted into the parent node.
10 20 60 70
Inserting 110
60
10 20 70 110
38
Sharad Chandra Joshi
60
10 20 40 70 80 110 130
Inserting 100
60 100
10 20 40 70 80 110 130
60 100
Inserting 240
60 100 180
39
Sharad Chandra Joshi
The first four keys will be inserted into one node as shown in the diagram. They are
sorted into proper order as they are inserted. There is no room for fifth key 110, so its
insertion causes the node to split into two, and the medial key, 60 moves up to enter a
new node, which is a new root. Since the split nodes are now only half full, the next three
keys can be inserted without any difficulty. Note these simple insertions can require
rearrangement of the keys within a node. The next insertion 100, again splits a node and
this time it is 100 itself that is the median key, so moves up to join 60.
Next several insertions are similar. The final insertion of 240 splits the node containing
110,130, 180 and190 which is again full. Hence, this node in turn splits and 180 is
promoted into the root.
Try inserting 30, 120, 140, 200, 210, 160 respectively into the above B-Tree.
B+ Tree
One of the popular techniques for implementing indexed sequential organization is to use
a variation on the basic known as B+ tree. In contrast to a B-tree, here all records are
stored at the lowest level of the tree; only keys are stored in interior blocks. All the leaves
are connected to form a linked list of the keys in sequential order. It has two parts: the
index part is the interior nodes; the sequence set is the leaf nodes. The linked leaves are
the excellent aspect of B+ tree; the keys can be accessed both directly and sequentially.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented
storage context. It is used to provide indexed sequential file organization, the key value in
the sequence set are the key values in the record collection.
70
20 43 110 136
7 15 16 18 19 20 26 30 36 40 42 43 50 58 60 62 65 69 70 10 11 12 12 13 13 13 13 14 14 15
0 0 0 8 0 2 6 1 0 5 0
The ReiserFS filesystem (for Unix and Linux), XFS filesystem (for IRIX and Linux),
JFS2 filesystem (for AIX, OS/2 and Linux) and NTFS filesystem (for Microsoft
40
Sharad Chandra Joshi
Windows) all use this type of tree for block indexing. Relational databases also often use
this type of tree for table indices.
41