Digital Search Structures

Chapter 6 discusses Digital Search Trees (DST) and their operations including insertion, searching, and deletion, emphasizing the binary representation of keys. It also introduces related structures like Binary Tries and Patricia, which optimize search efficiency for long or variable-length keys. The chapter outlines algorithms for these operations and provides examples to illustrate the concepts.

Uploaded by

Honey
Copyright
© © All Rights Reserved

Chapter 6

Digital Search Structures


INTRODUCTION
• A Digital Search Tree (DST) is a binary tree in which each node contains one
unique element (key).
• The key stored in a node is treated as a string of bits, i.e., in its binary representation.
• The bits in the binary representation of a key are numbered from left to right, starting at 1.
• For example, in the binary representation 1000, bit one is 1, and bits two, three
and four are 0.
• All the keys in the left subtree of a node at level i have bit i equal to 0,
whereas all the keys in the right subtree of that node have bit i equal to 1.
• In particular, every key in the left subtree of the root starts with bit 0 and every key in the
right subtree of the root starts with bit 1.
Operations on Digital Search Trees

Similar to the Binary Search Tree (BST), the Digital Search Tree
supports three operations:

• Insertion
• Searching
• Deletion
Insertion

• The first key inserted into the DST becomes the root node.
• Each subsequent key to be inserted is first compared with the root
node.
• If the first bit of the key is 0 (the key starts with bit 0), the
key is inserted as the left child at Level 1; if the key starts with
bit 1, it is inserted as the right child at Level 1.
• If a node already exists at that position, the next bit of the key is used
to decide the position of the key at the next level.
• The process is repeated until all the keys are inserted into the
digital search tree.
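• Example: Construct a Digital Search Tree by inserting the
keys in the given order: 0111, 1001, 0101, 0010, 1011, 1000, 0110
• Step 1: insert 0111 (becomes the root)
• Step 2: insert 1001 (first bit 1: right child of 0111)
• Step 3: insert 0101 (first bit 0: left child of 0111)
• Step 4: insert 0010 (bits 0, 0: left child of 0101)
• Step 5: insert 1011 (bits 1, 0: left child of 1001)
• Step 6: insert 1000 (bits 1, 0, 0: left child of 1011)
• Step 7: insert 0110 (bits 0, 1: right child of 0101)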
Algorithm for insertion in a DST
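The insertion rule described above can be sketched in Python as follows. This is only an illustrative sketch, not the chapter's own algorithm: the names Node and dst_insert are hypothetical, keys are assumed to be bit strings of equal length, and bits are indexed from 0 in the code (bit one of the slides is key[0]).

```python
class Node:
    """One DST node; the key is kept as a bit string such as "0111"."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def dst_insert(root, key):
    """Insert key into the DST rooted at root and return the root."""
    if root is None:
        return Node(key)              # the first key becomes the root
    node, i = root, 0                 # i is the 0-based index of the bit to test
    while node.key != key:            # a duplicate key is silently ignored
        if key[i] == "0":
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        i += 1                        # one more bit is used at each level
    return root


root = None
for k in ["0111", "1001", "0101", "0010", "1011", "1000", "0110"]:
    root = dst_insert(root, k)
print(root.key, root.left.key, root.right.key)   # 0111 0101 1001
```

The loop inserts the seven keys of the example in the given order, so the resulting tree can be compared step by step with the example.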
Searching
• Searching for an element in a digital search tree is similar to
inserting into a digital search tree.
• The search for a key starts at the root node.
• If the key is the same as the root node value, the
search stops, as the element is found in the tree.
• If it is not the same, the first bit of the key is
checked.
• If it is 0, the left child of the root is checked,
and if it is 1, the right child of the root is checked.
• The process continues with the next bit at each level until the key is found or a NULL link is reached.
Algorithm for searching an element in a DST
Example: Consider the DST shown. Search for the element
1010 in the tree.
• Solution: Let the tree pointer t point to the root node of Fig. 7.10 and let the search key be 1010. The step-by-step process
for searching the element is given below.
• Step 1: Check if t == NULL.
As t != NULL, go to Step 2.
• Step 2: Check if t->value == key.
As t->value (0111) != key (1010), go to Step 3.
• Step 3: Consider two variables i and temp, where i is an integer variable and temp is a temporary tree pointer.
Let i = 1 and temp = t.
• Step 4:
• Iteration 1: Check if key[1] == 0. As the key is 1010, key[1] == 1. In the “else” part, check if temp->right != NULL
and execute temp = temp->right. Now temp points to 1001. As temp->value != key, go to the else part
and execute i++. Now i = 2. Repeat Step 4.
• Iteration 2: Check if key[2] == 0. As the key is 1010, key[2] == 0. In the “if” part, check if temp->left != NULL
and execute temp = temp->left. Now temp points to 1011. As temp->value != key, go to the else part
and execute i++. Now i = 3. Repeat Step 4.
• Iteration 3: Check if key[3] == 0. As the key is 1010, key[3] == 1. In the “else” part, check if temp->right != NULL
and execute temp = temp->right. Now temp points to 1010. As temp->value == key, write
“Element Found” and Exit.
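The search procedure traced above can be sketched in Python. The names Node and dst_search are hypothetical, keys are assumed to be equal-length bit strings, and the tree is built by hand to match the nodes visited in the trace (0111, 1001, 1011, 1010), since the figure itself is not available here.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def dst_search(root, key):
    """Return the node holding key, or None if key is not in the DST."""
    node, i = root, 0                 # i is the 0-based index of the bit to test
    while node is not None:
        if node.key == key:
            return node               # "Element Found"
        # bit 0 -> left child, bit 1 -> right child
        node = node.left if key[i] == "0" else node.right
        i += 1
    return None                       # fell off the tree: not found


# Only the part of the tree that the trace visits (an assumption about the figure).
tree = Node("0111",
            right=Node("1001",
                       left=Node("1011",
                                 right=Node("1010"))))

print(dst_search(tree, "1010") is not None)   # True
print(dst_search(tree, "1100") is not None)   # False
```

The second call follows the right link of 0111 to 1001, finds no right child there, and reports the key as absent.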
Deletion
Deleting an element from a DST either removes a leaf directly or replaces the
deleted node's value with that of any leaf node in either of the node's subtrees. Therefore deletion
can be processed as two cases.

• Case 1: The element to be deleted is in a leaf node.

• Case 2: The element to be deleted is in a node with one or two children.

• Case 1:

As the element to be removed is in a leaf node, it can be
deleted by just removing the link from its parent. The same
process is followed whether the leaf node is a left
child or a right child.
Example: Consider the DST shown and delete 1011.
[Figure: the DST after deleting 1011]
• Case 2:
When the node to be deleted has one or two child nodes, the deletion is
performed by replacing the deleted node with any leaf node in either of its subtrees.
Example: Consider the DST shown and delete 1001.
Note: 1001 has one child.
[Figure: the DST after deleting 1001]
• The element 1001 is in a node which has one right child. This node can be removed by
just replacing the deleted node 1001 with the child node 1011.
Example: Consider the DST shown and delete 1001.
Note: 1001 has two children.
[Figure: the DST after deleting 1001]
• The element 1001 can be replaced with any leaf node from its subtrees. From
the figure, it can be observed that 1001
can be replaced with either 1010 or 1110.
Selecting 1010 as the replacement, the DST
after deleting the element 1001 is
shown.
Algorithm for Deletion
delete_dst(tree *t, key)
Step 1: IF t == NULL
            Write “Tree is NULL, deletion not possible”
            Go to Step 4
        [END of IF]
Step 2: Use the “dst_search(t, key)” algorithm to find the position of the node to
        be deleted. Let the node to be deleted be “temp” and its parent be
        “previous”. IF the key is not found, Go to Step 4.
Step 3: IF temp->left == NULL && temp->right == NULL
            IF previous == NULL
                SET t = NULL        // temp was the root and the only node
            ELSE IF previous->left == temp
                SET previous->left = NULL
            ELSE
                SET previous->right = NULL
            [END of IF]
            FREE temp
        ELSE IF temp->left != NULL
            Replace temp->value with the value of any leaf node of the left subtree
            Remove that leaf node from the tree and FREE it
        ELSE
            Replace temp->value with the value of any leaf node of the right subtree
            Remove that leaf node from the tree and FREE it
        [END of IF]
Step 4: EXIT
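A possible Python rendering of the deletion algorithm is sketched below. It is an illustration under stated assumptions rather than the chapter's code: Node and dst_delete are hypothetical names, keys are equal-length bit strings, and "any leaf" is taken to be the leaf reached by following left links where they exist and right links otherwise. The sample tree is chosen so that 1001 has the two leaves 1010 and 1110 below it, as in the two-children example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def dst_delete(root, key):
    """Delete key from the DST and return the (possibly new) root."""
    parent, node, i = None, root, 0
    while node is not None and node.key != key:       # locate the key
        parent = node
        node = node.left if key[i] == "0" else node.right
        i += 1
    if node is None:
        return root                                    # key not present
    if node.left is None and node.right is None:       # Case 1: leaf node
        if parent is None:
            return None                                # the tree becomes empty
        if parent.left is node:
            parent.left = None
        else:
            parent.right = None
        return root
    # Case 2: walk down to some leaf below node, preferring left links.
    leaf_parent, leaf = node, node.left or node.right
    while leaf.left is not None or leaf.right is not None:
        leaf_parent, leaf = leaf, leaf.left or leaf.right
    node.key = leaf.key                                # move the leaf's key up
    if leaf_parent.left is leaf:                       # then unlink the leaf
        leaf_parent.left = None
    else:
        leaf_parent.right = None
    return root


tree = Node("0111",
            right=Node("1001",
                       left=Node("1011", right=Node("1010")),
                       right=Node("1110")))
tree = dst_delete(tree, "1001")
print(tree.right.key, tree.right.left.key, tree.right.right.key)  # 1010 1011 1110
```

Any leaf of the deleted node's subtrees is a valid replacement because every key in that subtree shares the bits that lead to the deleted node's position.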
Binary Tries and Patricia
• When the keys are very long or of different
lengths, digital search trees are inefficient, because a full key comparison is made at every node visited.
• To reduce the number of comparisons for searching, an efficient
related structure called PATRICIA (Practical Algorithm To
Retrieve Information Coded In Alphanumeric) is used.
• By using Patricia, we can reduce the number of key comparisons
done during a search to one.

The Patricia can be developed in three steps.

• Step 1: Construct a Binary Trie.
• Step 2: Transform the Binary Trie into a Compressed Binary Trie.
• Step 3: Obtain the Patricia from the Compressed Binary Trie.
Binary Tries
• A Binary Trie is a binary tree with two kinds of nodes, namely Branch
Nodes and Element Nodes.
• A Branch Node has two link parts, Left Child and Right Child, and has no
data part.
• An Element Node has only the data part and has no link parts.
• In a binary trie, branch nodes are drawn as ellipses and element nodes
are drawn as rectangles.
• Searching in the binary trie is based on the bit pattern of the search
key.
• If the bit is zero, the search moves to the left subtree; otherwise it
moves to the right subtree.
• In each branching, the ith bit of the search key is used at level i.
• Once an element node is reached, the key in this node is compared with
the key we are searching for. This is the only key comparison that takes
place.
• Example: Search for the key 0010 in the binary trie.
• Solution: First follow the left child, then again the left child and finally
the right child. This node is now compared with the key.
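The search just described can be sketched in Python. The class names Branch and Element and the function binary_trie_search are hypothetical. Because the figure is not available here, the trie below is an assumed example holding the keys 0000, 0001, 0010, 1000, 1001 and 1100, chosen so that the search for 0010 goes left, left, right as in the solution above.

```python
class Branch:
    """Branch node: two links, no data."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right


class Element:
    """Element node: data only, no links."""
    def __init__(self, key):
        self.key = key


def binary_trie_search(root, key):
    """Return True if key is stored in the binary trie."""
    node, level = root, 0
    while isinstance(node, Branch):
        # the bit for this level decides the direction (0 -> left, 1 -> right)
        node = node.left if key[level] == "0" else node.right
        level += 1
    # single key comparison, made only when an element node is reached
    return node is not None and node.key == key


trie = Branch(
    left=Branch(                                   # keys starting with 0
        left=Branch(                               # keys starting with 00
            left=Branch(Element("0000"), Element("0001")),
            right=Element("0010"),
        ),
    ),
    right=Branch(                                  # keys starting with 1
        left=Branch(                               # keys starting with 10
            left=Branch(Element("1000"), Element("1001")),
        ),
        right=Element("1100"),
    ),
)

print(binary_trie_search(trie, "0010"))   # True
print(binary_trie_search(trie, "0011"))   # False
```

The second search reaches the element node 0010 and fails on the final comparison, which shows why that comparison is still needed.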
Compressed Binary Trie
• A Compressed Binary Trie is a binary trie that is modified
by removing all the branch nodes with degree one,
so that no branch node has degree 1.
• A BitNumber is added to each branch node, so it consists of
three fields: Left child, Right child and BitNumber.
• The BitNumber tells which bit of the key is to be used to
decide whether to move to the left or right subtrie.
• Example: Search for the key 1001 in the Compressed Trie.
Solution:
• Start at the root node (a branch node). It has BitNumber
1, which indicates that the branching is done on the first bit of the
key. Since the first bit of the search key is 1, move to the right child of the
root.
• The right child is a branch node with BitNumber 2, which indicates that the
branching is done on the second bit of the key.
• In the search key, the second bit is 0, so move to the left branch. Observe
that this branch node has BitNumber 4.
• The 4th bit of the search key is 1, so move to the right branch.
• Finally an element node is reached. Compare the data in the element node with
the search key.
• The search key matches the data in the element node. Hence, the search
is successful.
• Note: The number of moves for searching 1001 is reduced from 4 in the binary trie
to 3 in the compressed binary trie.
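The walk-through above can be reproduced with a small Python sketch. Branch, Element and compressed_trie_search are hypothetical names, and the trie is an assumed example (keys 0000, 0001, 0010, 1000, 1001, 1100) whose BitNumbers along the path to 1001 are 1, 2 and 4, as in the solution.

```python
class Branch:
    """Branch node of a compressed binary trie."""
    def __init__(self, bit_number, left, right):
        self.bit_number = bit_number      # which bit (1-based) to test here
        self.left, self.right = left, right


class Element:
    def __init__(self, key):
        self.key = key


def compressed_trie_search(root, key):
    """Return (found, moves), where moves counts the branch decisions made."""
    node, moves = root, 0
    while isinstance(node, Branch):
        bit = key[node.bit_number - 1]    # bits are numbered from 1
        node = node.left if bit == "0" else node.right
        moves += 1
    return (node is not None and node.key == key), moves


trie = Branch(1,
              Branch(3,
                     Branch(4, Element("0000"), Element("0001")),
                     Element("0010")),
              Branch(2,
                     Branch(4, Element("1000"), Element("1001")),
                     Element("1100")))

print(compressed_trie_search(trie, "1001"))   # (True, 3)
```

Because skipped bits are never tested on the way down, the final comparison with the element node is what rejects keys such as 1011, which follows the same path as 1001.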
Patricia
• A compressed binary trie may be restructured using a single type of
node, called an augmented branch node, instead of two
types of node (branch node and element node).
• The new structure formed is called a Patricia.
• An augmented branch node is a compressed trie branch node
augmented by another field, ‘data’.
• The augmented branch node therefore consists of the fields BitNumber, data, Left child and Right child.
• The root pointer is 0 (NULL) iff the Patricia is empty.

The Patricia is obtained from a compressed binary trie by using the rules:

1. Branch nodes are replaced by augmented branch nodes.
2. Remove all the element nodes.
3. Store the data previously held in the element nodes in the ‘data’ part of the augmented
branch nodes.
4. The total number of element nodes in a non-empty compressed binary trie is
one more than the total number of branch nodes in it. So, it is necessary to add one extra
augmented branch node, called the header node, to the Patricia.
5. For the header node, BitNumber is set to zero, the right child link is not used, and
the remaining structure is attached as the left subtree of the header node through its left child link.
6. The assignment of data to augmented branch nodes is done in such a way that the
BitNumber in the augmented branch node is less than or equal to that in the parent of the
element node that contained this data.
7. Replace the pointers to element nodes in the compressed binary trie by pointers to the
respective augmented branch nodes.
[Figure: the Patricia obtained from the compressed trie]

Operations on Patricia:
1) Search
2) Insert
3) Delete
Searching Patricia
• To search for a key in a Patricia, we start at the header node, move to its left child, and
proceed down the tree, using the BitNumber in each node to
tell us which bit to examine in the search key.
• We proceed left if the bit is 0 and right if it is 1.
• The keys in the nodes are not examined at all on the way
down the tree.
• Eventually an element pointer is encountered, i.e., a pointer
to a node whose BitNumber is less than or equal to that of the current node; the value in that node is then compared
with the search key.
• If the key at the node pointed to by this element pointer
is equal to the search key, then the search is successful;
otherwise, it is unsuccessful.
• Example: Search for 1001 in the Patricia
Algorithm for Searching Patricia
Patricia* patricia_search(Patricia *t, key)
// returns a pointer to the node whose data is checked with key
Patricia *p, *y;
Step 1: IF t == NULL
            Return t // empty tree
        [END of IF]
Step 2: SET y = t->left_child; // move to left child
        SET p = t;
Step 3: WHILE y->bit_number > p->bit_number
        do
            SET p = y;
            IF key[y->bit_number] == 0
                SET y = y->left_child;
            ELSE
                SET y = y->right_child;
            [END of IF]
        [END of WHILE]
Step 4: IF key == y->data
            Write “Element Found”
        ELSE
            Write “Element Not Found”
        [END of IF]
Step 5: Return y
Inserting into Patricia
• Let key be the element to be inserted. First, search for the key in the Patricia.
Let q be the node where the patricia_search algorithm terminates. Find the first bit
position j at which the key in q and key differ. The BitNumber for the new node is j. The
position for the new node then has to be decided. The new node will be placed
between A and B, where A and B are obtained as follows.
• Initially A points to the header node and B points to its left child. Move A and
B down the Patricia as long as the condition A->BitNumber < B->BitNumber < j
holds. In each move, set A to B and then update B
based on the bit of the key at the position given by the BitNumber of the old B: if the
bit is 0, B becomes its left child, otherwise its right child.
• When the condition fails, the positions of A and B are fixed and the new node is
placed between A and B. If B is the left child of A, the new node is inserted
as the left child of A; otherwise the new node is inserted as the right child of A.
• If the jth bit of the key is 0, the left link of the new node points to itself
and the right link of the new node points to node B. Otherwise, the right link of the new
node points to itself and the left link points to node B.
Example 7: Construct a Patricia by inserting elements in the order
1000, 0010, 1001, 1100, 0000 and 0001

Step 1: insert 1000
Step 2: insert 0010
Step 3: insert 1001
Step 4: insert 1100
Step 5: insert 0000
Step 6: insert 0001
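The search and insertion procedures for Patricia can be sketched together in Python. This is a minimal sketch under assumptions, not a definitive implementation: PatriciaNode, bit, patricia_search and patricia_insert are hypothetical names, keys are bit strings of equal length, an empty Patricia is represented by None, and a key that is already present is ignored.

```python
class PatriciaNode:
    """Augmented branch node: bit number, data (key) and two links."""
    def __init__(self, key, bit_number):
        self.key = key
        self.bit_number = bit_number
        self.left = None
        self.right = None


def bit(key, i):
    """Bit i of key as an int; bits are numbered from 1, left to right."""
    return int(key[i - 1])


def patricia_search(t, key):
    """Return the node whose key must be compared with key (None if empty)."""
    if t is None:
        return None
    p, y = t, t.left
    while y.bit_number > p.bit_number:        # stop at an element pointer
        p = y
        y = y.right if bit(key, y.bit_number) else y.left
    return y


def patricia_insert(t, key):
    """Insert key and return the header node."""
    if t is None:                             # first key: header, BitNumber 0
        header = PatriciaNode(key, 0)
        header.left = header                  # self-pointer; right link unused
        return header
    q = patricia_search(t, key)
    if q.key == key:
        return t                              # duplicate
    j = 1
    while bit(key, j) == bit(q.key, j):       # first position where they differ
        j += 1
    a, b = t, t.left
    while b.bit_number > a.bit_number and b.bit_number < j:
        a = b
        b = b.right if bit(key, b.bit_number) else b.left
    z = PatriciaNode(key, j)
    if bit(key, j) == 0:
        z.left, z.right = z, b                # self-pointer on the left
    else:
        z.left, z.right = b, z                # self-pointer on the right
    if b is a.left:
        a.left = z
    else:
        a.right = z
    return t


t = None
for k in ["1000", "0010", "1001", "1100", "0000", "0001"]:
    t = patricia_insert(t, k)
print(patricia_search(t, "1001").key)         # 1001
print(patricia_search(t, "1111").key)         # 1100
```

The last line shows an unsuccessful search: the walk ends at the node holding 1100, and the single key comparison at that node reports that 1111 is not present.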
Delete from Patricia
• A node is deleted from a Patricia by
updating forward and backward pointers.
• Let ‘p’ be the node which contains the element to be deleted.
• Deletion in Patricia is handled as two different cases:

Case 1: ‘p’ has a self-pointer
Case 2: ‘p’ has no self-pointer

Case 1: ‘p’ has a self-pointer
There are again two possibilities in this case:
‘p’ is the header node, or ‘p’ is not the header node.
• If ‘p’ is the header node and has a self-pointer, deleting
‘p’ from the Patricia sets the Patricia ‘t’ to NULL.
• If ‘p’ is not the header node and has a self-pointer,
update the forward pointer of the parent of ‘p’ to point to
the child of ‘p’ and free the node ‘p’.
Example: Delete 1100 from the Patricia.

Case 2: ‘p’ has no self-pointer
• If the node ‘p’ has no self-pointer, then identify the nodes
‘q’ and ‘r’ such that q has a back pointer to p and
r has a back pointer to q.
• Copy the key value of node q into p.
• Make the back pointer of r point to p.
• Set the forward pointer that leads from the parent of q to q so that it points to
the child of q instead, and free the node q.
[Figure: the Patricia after deleting 0010]
[Figure: the Patricia after deleting 1100]
MULTI-WAY TRIES
• A multi-way trie (or simply trie) is a tree data structure used to store strings of
varying length.
• The word ‘TRIE’ is extracted from the word ‘RETRIEVAL’.
• A trie is used for efficient retrieval of the data, i.e., for performing efficient
search on the data.
• A trie is a tree of degree m>=2 in which the branching at any level of the tree
is determined not by entire key value, but by only a portion of it.
• The trie consists of two types of nodes: element nodes and branch nodes.
• An element node has only a data field, which holds the key
being stored in the trie.
• A branch node consists of pointers to sub-tries, each of which is either
another branch node or an element node.
• The elements (keys) are stored in the leaf nodes.
• The main advantage of the trie data structure is that strings with a common
prefix share the branch nodes for that prefix, so only their tails are stored
separately.
• Example: Consider a trie which stores English words of different
lengths. In this trie, each branch node contains 27 pointers: 26
pointers, one for each letter of the English alphabet, and one extra pointer
for the blank character that is used to terminate the keys.
• To access a key in the trie, we move down through a series of
branch nodes, following the appropriate branches based on the
characters forming the key.
• Pointer fields that point to neither a branch node nor an element node
are set to NULL.
• Thus the depth of an information node (element node) depends on
how many leading characters (the prefix) its key shares with the other
keys.
• Operations:
• Searching
• Insertion
• Deletion
• Example: Consider a set of records consisting of the name, AadharID, date of joining,
and department name of the employees of an organization. Construct a trie using
AadharID as the key field.

• Solution:
• Let us consider radix 10, so that each branch node has 10 pointers,
for the digits 0 to 9. Examine the digits of the key AadharID from left to right. Using the
first digit of AadharID, partition the records into three groups: the first group, whose
AadharID begins with 5 (i.e., Rajani, Arun, and Sushrut), the second group, whose
AadharID begins with 2 (i.e., Nirmal and Anshul), and the third group, whose AadharID begins
with 9 (i.e., Ram). Groups having more than one element are partitioned further with the
help of the next digit in the key. This process of partitioning is continued until
every group has exactly one element in it.
Searching a Trie
• To search for an element in the trie, we start
from the root node, which is a branch node.
• Let us suppose the key k is made up of the characters k1, k2, k3, ..., kn.
• The first character of the key, k1, is extracted and the respective
child pointer in the root node is identified.
• If it points to an element node, the node's value is compared with the key;
otherwise, if it points to a branch node, the pointer for the next character k2 is
considered in that node.
• If this pointer points to an element node, the key is compared with
the element node value; otherwise, if it points to a branch node, the
pointer for the next character k3 is considered.
• The process is repeated until an element node is reached and compared with
the key (successful if they are equal), or a NULL pointer is reached (unsuccessful).
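The search procedure, together with the matching insertion, can be sketched in Python. This is an illustrative sketch with several assumptions: a branch node is modelled as a dict from character to child rather than as an array of 27 pointers, an element node is modelled as the key string itself, keys are terminated with a blank as described earlier, and the names trie_insert and trie_search are hypothetical.

```python
END = " "   # blank character that terminates every key


def trie_insert(root, key):
    """Insert key into the trie whose root branch node is the dict root."""
    k = key + END
    node, i = root, 0
    while True:
        child = node.get(k[i])
        if child is None:                 # NULL pointer: place the element here
            node[k[i]] = key
            return
        if isinstance(child, dict):       # branch node: use the next character
            node, i = child, i + 1
            continue
        if child == key:                  # element node with the same key
            return
        # Element node with a different key: replace it by branch nodes
        # until the two keys fall into different branches.
        other = child + END
        branch = {}
        node[k[i]] = branch
        node, i = branch, i + 1
        while other[i] == k[i]:
            node[k[i]] = {}
            node, i = node[k[i]], i + 1
        node[other[i]] = child
        node[k[i]] = key
        return


def trie_search(root, key):
    """Return True if key is stored in the trie."""
    node = root
    for c in key + END:
        child = node.get(c)
        if child is None:
            return False                  # NULL pointer: unsuccessful
        if not isinstance(child, dict):
            return child == key           # element node: one key comparison
        node = child
    return False


root = {}
for word in ["to", "torn", "tea", "ten", "telegram"]:
    trie_insert(root, word)
print(trie_search(root, "telegram"), trie_search(root, "te"))   # True False
```

In this sketch, "to" and "torn" share the branch nodes for the prefix "to", and the blank terminator is what separates the shorter key from the longer one.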
Sampling Strategies
In the following, $key_j$ denotes the jth character of the key and n is the number of characters in the key.
• F1 : $sample(key, i) = key_i$
• [Branching at level i is based on the ith character of the key.]
• F2 : $sample(key, i) = key_{n-i+1}$
• [Branching at level i is based on the (n-i+1)th character of the key.]
• F3 : $sample(key, i) = key_{r(key,i)}$, where $r(key, i)$ is a randomization function
• [Branching at level i is based on the randomization function. The
randomization function yields a value from 1 to n.]
• F4 : $sample(key, i) = key_{i/2}$ if i is even, and
$sample(key, i) = key_{n-(i-1)/2}$ if i is odd
• [Branching at level i is based on the (i/2)th character if i is even
and on the (n-(i-1)/2)th character if i is odd.]
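The deterministic sampling functions F1, F2 and F4 can be written out in Python as a quick check of the index arithmetic. The function names are hypothetical, levels and character positions are numbered from 1 as in the definitions, and F3 is left out because it depends on the randomization function chosen.

```python
def sample_f1(key, i):
    """F1: the i-th character from the left."""
    return key[i - 1]


def sample_f2(key, i):
    """F2: character n-i+1, i.e. the i-th character from the right."""
    n = len(key)
    return key[(n - i + 1) - 1]


def sample_f4(key, i):
    """F4: alternate between the right end (odd i) and the left end (even i)."""
    n = len(key)
    if i % 2 == 0:
        return key[i // 2 - 1]            # character i/2
    return key[(n - (i - 1) // 2) - 1]    # character n-(i-1)/2


key = "trie"
print([sample_f1(key, i) for i in range(1, 5)])   # ['t', 'r', 'i', 'e']
print([sample_f2(key, i) for i in range(1, 5)])   # ['e', 'i', 'r', 't']
print([sample_f4(key, i) for i in range(1, 5)])   # ['e', 't', 'i', 'r']
```

For the four-character key "trie", F4 samples positions 4, 1, 3, 2 at levels 1 to 4.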
[Figure: trie obtained by applying sampling function F1]
[Figure: trie obtained by applying sampling function F2]
[Figure: trie obtained by applying sampling function F3; 3rd character at level 1, 2nd character at Level ...]

• If the maximum number of levels is limited to 'L', then all the
keys that are synonyms up to level L-1 are entered in the same
element node.
Example: The trie obtained by restricting the number of levels to 3.
The element nodes can hold more than one key value.
Inserting into a Trie
• Inserting a key into the trie follows the same procedure as
the search. Example: insert telegram into the trie.
[Figure: the trie after inserting telegram]
Deletion from a Trie
• Example: delete ‘torn’ from the previous trie.
[Figure: the resultant trie]
Keys with Different Length
• When the keys are of different lengths, one key may be a prefix
of another key, which violates the no-prefix property.
• There are two ways to handle such a collection of keys:

Case 1: Append a special character, such as a blank or #, to the end
of each key.
Case 2: Attach to each branch node a data field to store the
element whose key is exhausted at that node.
[Figure: trie in which each branch node has a data field]
Compressed Tries
• A compressed trie can be obtained by eliminating all branch
nodes that have only one child in the trie.
• We can improve both time and space performance metrics of a
trie with the compressed trie.
• When branch nodes with a single child are removed from a trie,
we need to keep additional information so that trie operations
may be performed correctly.
• The additional information stored in a compressed trie can be
described by using the structures
• compressed tries with digit numbers,
• compressed tries with skip fields or
• compressed tries with labeled edges.
Compressed Tries with Digit Numbers
• In a compressed trie with digit numbers, an additional field
digitNumber is associated for every branch node that indicates
which digit of the key is used to branch to the next node from
this node.
Compressed Tries with Skip Fields
In a compressed trie with skip fields, each branch node has an additional
field skip which tells us the number of branch nodes that were originally
between the current branch node and its parent.
Compressed Tries with Labeled Edges
• In a compressed trie with labeled edges, each branch node has a
label associated with it that includes an element field and a
skip field. The element field consists of a pointer/reference
name to an element node in the sub-trie. The skip field is the
number of branch nodes eliminated between this branch node
and its parent.
TRIES AND INTERNET PACKET
FORWARDING
• Data packets are transmitted from source to destination in the
Internet through a sequence of routers.
• Each router moves a packet one step closer to the destination.
• Consider a packet which is to be transmitted from Delhi to Chennai.
• First, the router in Delhi processes the packet and forwards it to the
neighbouring router in Mumbai, which forwards it to the neighbouring
router in Hyderabad, and from there the packet is routed to Chennai, the
destination.
• Each router uses a router table, holding information about its neighbouring nodes, to decide
where to forward a packet.
• A router table is a collection of rules of the form (P, NH), where P is a prefix
and NH is the next hop (i.e., the neighbouring node).
• For example, the rule (10*, A1) states that the next hop for packets whose
destination address begins with 10 is A1.
• It is common for more than one rule in the router table to match
a destination address.
• In this situation, the next hop is determined by the matching rule which has the
longest prefix.
• For example, suppose (10*, A1) and (1011*, B1) are the only two rules in the
router table that match a destination address beginning with the
sequence 1011.
• The next hop is then B1, as packet forwarding in the Internet is done by
finding the longest matching prefix.
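The longest-matching-prefix rule can be illustrated with a simple linear scan over the rules, before any trie is used. This is a sketch only: next_hop is a hypothetical name, rules are (prefix, next hop) pairs written as in the examples above, and addresses are assumed to be bit strings.

```python
def next_hop(rules, address):
    """Return the next hop of the longest prefix matching address, or None."""
    best_len, best_hop = -1, None
    for prefix, hop in rules:
        bits = prefix.rstrip("*")                 # "1011*" -> "1011"
        if address.startswith(bits) and len(bits) > best_len:
            best_len, best_hop = len(bits), hop   # keep the longest match so far
    return best_hop


rules = [("10*", "A1"), ("1011*", "B1")]
print(next_hop(rules, "10110000"))   # B1
print(next_hop(rules, "10010000"))   # A1
print(next_hop(rules, "01110000"))   # None
```

The scan examines every rule for every packet, which is the cost that trie-based router tables aim to reduce.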
1-bit Tries
• A 1-bit trie is similar to a binary tree in which each node consists of a left child, left
data, right child, and right data field.
• A prefix of length l is stored at level l of the trie.
• At level l, if the rightmost bit of the prefix of length l is 0, the prefix is
stored in the left data field; otherwise the prefix is stored in the right data field.
• At level i, branching is done using bit i. When bit i = 1, we move to the right
subtree; otherwise we move to the left subtree.
• Example: Construct a 1-bit trie for the prefixes
A1 = 101*,
A2 = 1*,
A3 = 1001*,
A4 = 10*,
A5 = 10000*,
A6 = 100001*.

• The height of the 1-bit trie is O(W), where W is the length of
the longest prefix in the router
table.
• In this example, the height of
the trie is 6, as the longest
prefix, 100001*, has length 6.
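A 1-bit trie for the six prefixes can be sketched in Python as follows. The names TrieNode, insert_prefix and lookup are hypothetical, the root is taken to be level 1 so that a prefix of length l is stored at level l, every prefix is assumed to have at least one bit, and each data field here holds only the prefix name, where a real router table would hold the next hop.

```python
class TrieNode:
    """1-bit trie node: child[0]/data[0] are the left fields, child[1]/data[1] the right."""
    def __init__(self):
        self.child = [None, None]
        self.data = [None, None]


def insert_prefix(root, prefix, name):
    """Store name for a prefix such as "1001*" (at least one bit long)."""
    bits = prefix.rstrip("*")
    node = root
    for b in bits[:-1]:                   # the first l-1 bits lead to level l
        b = int(b)
        if node.child[b] is None:
            node.child[b] = TrieNode()
        node = node.child[b]
    node.data[int(bits[-1])] = name       # the last bit selects left/right data


def lookup(root, address):
    """Return the name stored for the longest prefix matching address."""
    node, best = root, None
    for b in address:
        if node is None:
            break
        b = int(b)
        if node.data[b] is not None:
            best = node.data[b]           # a longer match than any seen before
        node = node.child[b]
    return best


root = TrieNode()
for name, prefix in [("A1", "101*"), ("A2", "1*"), ("A3", "1001*"),
                     ("A4", "10*"), ("A5", "10000*"), ("A6", "100001*")]:
    insert_prefix(root, prefix, name)

print(lookup(root, "10111111"))   # A1
print(lookup(root, "10000111"))   # A6
print(lookup(root, "11000000"))   # A2
```

Each lookup visits at most one node per address bit and remembers the last data field it passed, which is the longest matching prefix.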
Fixed–stride Tries
• A fixed-stride trie (FST) reduces the access time by having a
smaller height than the 1-bit trie.
• The stride of a node is defined to be the number of bits used at the node to
determine which branch to take. A node whose stride is s has 2^s child fields and 2^s
data fields.
• Example: For the previous prefixes A1 to A6, consider an FST with three levels.
Assume that the strides are 2, 3, and 2.
• In this trie, the root stores prefixes of length 2, the level 2 nodes store
prefixes of length 5 (2+3), and the level 3 nodes store prefixes of length 7, i.e.,
2+3+2.
• This poses a problem, as the lengths of some prefixes differ from the storable
lengths.
• Consider the prefix A2 = 1*, which has length 1. To get rid of this problem, a
prefix whose length is not storable is expanded to the next admissible length.
• In the above example, A2 = 1* is expanded to A2a = 10* and A2b = 11*. However, A4
= 10* is an original, longer prefix that coincides with A2a, so A4 is kept and A2a is removed.
• The expanded prefixes are:
• A1a = 10100*
• A1b = 10101*
• A1c = 10110*
• A1d = 10111*
• A2a = 10* (same as A4, so removed)
• A2b = 11*
• A3a = 10010*
• A3b = 10011*
• A4 = 10*
• A5 = 10000*
• A6a = 1000010*
• A6b = 1000011*
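The prefix expansion can be checked with a short Python sketch. The names expand and expand_table are hypothetical, prefixes are written without the trailing *, and ties between an expanded prefix and another prefix are resolved in favour of the longer original prefix, as described for A2a and A4.

```python
from itertools import accumulate, product


def expand(bits, lengths):
    """Expand a prefix to the next admissible length in lengths."""
    target = min(n for n in lengths if n >= len(bits))
    pad = target - len(bits)
    # append every combination of pad bits; pad == 0 leaves the prefix as is
    return [bits + "".join(tail) for tail in product("01", repeat=pad)]


def expand_table(prefixes, strides):
    """Map each expanded prefix to the name of the original prefix that owns it."""
    lengths = list(accumulate(strides))          # [2, 3, 2] -> [2, 5, 7]
    table = {}                                   # expanded bits -> (orig length, name)
    for name, bits in prefixes.items():
        for e in expand(bits, lengths):
            if e not in table or len(bits) > table[e][0]:
                table[e] = (len(bits), name)     # the longer original prefix wins
    return {e: name for e, (_, name) in table.items()}


prefixes = {"A1": "101", "A2": "1", "A3": "1001",
            "A4": "10", "A5": "10000", "A6": "100001"}
table = expand_table(prefixes, [2, 3, 2])

print(expand("1001", [2, 5, 7]))     # ['10010', '10011']
print(table["10"], table["11"])      # A4 A2
```

The table ends up with 11 entries, because the expansion 10* of A2 coincides with A4 and is dropped.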
Variable – stride Tries

• In a variable-stride trie (VST), nodes
at the same level may have different
strides.
• In the example, the stride of the root is 2, that of the left
child of the root is 5, and that of the
root's right child is 3.
• As a result, the memory required by
this VST is 4 (root) + 32 (left child of
root) + 8 (right child of root) = 44.
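The memory figure above follows from counting 2^s entries for a node of stride s. A small Python check, with the hypothetical name trie_memory and with memory measured simply as the number of entries per node:

```python
def trie_memory(strides):
    """Total entries for a set of nodes, given the stride of each node."""
    return sum(2 ** s for s in strides)


# root (stride 2), left child of root (stride 5), right child of root (stride 3)
print(trie_memory([2, 5, 3]))   # 44
```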
