A level Computer Science
1.4.2 Data Structures
Binary Search Trees and Hash Tables
Specification
Today we will look at binary search trees and
hash tables, two data structures whose
primary function is the fast retrieval of data.
Binary Search Tree
In arrays, binary search compares the target value to the middle element of
an ordered array. If they are not equal, the half in which the target cannot
lie is eliminated and the search continues on the remaining half, again
taking the middle element to compare to the target value, and repeating
this until the target value is found. If the search ends with the remaining
half being empty, the target is not in the array.
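The halving process described above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation; the function name `binary_search` and the sample list are assumptions made for the example.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:                  # the remaining half is not empty
        mid = (low + high) // 2         # middle element of the remaining half
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1               # target cannot lie in the lower half
        else:
            high = mid - 1              # target cannot lie in the upper half
    return -1                           # remaining half is empty: not in the array


numbers = [3, 7, 9, 12, 15, 16, 17, 19, 20]   # must already be in order
print(binary_search(numbers, 15))   # 4
print(binary_search(numbers, 8))    # -1
```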
Binary search trees keep their keys in sorted order, so that lookup and
other operations can use the principle of binary search: when looking for a
key in a tree (or a place to insert a new key), they traverse the tree from
root to leaf, making comparisons to keys stored in the nodes of the tree and
deciding, on the basis of the comparison, to continue searching in the left
or right subtrees. The height of the tree determines the search speed.
BSTs are optimised for searching by balancing them: the tree is
rebalanced to keep its height to a minimum.
Balanced BST
Figure: an unbalanced BST and a balanced BST.
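To see why height matters, the sketch below builds two trees from the same seven keys and compares their heights. It does not implement a rebalancing algorithm; it only assumes that inserting keys in sorted order gives the unbalanced case and inserting them "middle first" gives the balanced one. The names `Node`, `insert`, `height` and `build` are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the subtree rooted at root and return the subtree's root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(root):
    """Number of nodes on the longest path from root down to a leaf."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


unbalanced = build([1, 2, 3, 4, 5, 6, 7])   # sorted input: every node branches right
balanced = build([4, 2, 6, 1, 3, 5, 7])     # same keys, middle values inserted first
print(height(unbalanced))   # 7
print(height(balanced))     # 3
```

In the unbalanced tree a search may need up to 7 comparisons, in the balanced one at most 3.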
Constructing a BST
Given an unsorted list of items, insert them into a binary search
tree using a specified key order, e.g. alphabetical or ascending
numerical.
Make the first item the root node.
For each remaining item in the list, visit the root (current node) and
branch left if less than or branch right if greater than or equal to.
Continue down the branch reapplying the rule at each node visited
until the branch to follow is empty, and then add the item as the left
or right child of the last node visited.
16, 9, 15, 17, 12, 3, 19, 16, 7, 20 – insert this list into a BST on the
next slide
BST for 16, 9, 15, 17, 12, 3, 19, 16, 7, 20
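The construction rule can be checked with a short Python sketch. It is one possible implementation under the rule stated earlier (left if less than, right if greater than or equal to); the names `Node`, `insert`, `contains` and `in_order` are illustrative assumptions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Walk down from root and attach key where the branch to follow is empty."""
    current = root
    while True:
        if key < current.key:                 # branch left if less than
            if current.left is None:
                current.left = Node(key)
                return
            current = current.left
        else:                                 # branch right if greater or equal
            if current.right is None:
                current.right = Node(key)
                return
            current = current.right


def contains(root, key):
    """Search from the root, choosing left or right at each node visited."""
    current = root
    while current is not None:
        if key == current.key:
            return True
        current = current.left if key < current.key else current.right
    return False


def in_order(node):
    """Left, root, right: returns the keys in ascending order."""
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


items = [16, 9, 15, 17, 12, 3, 19, 16, 7, 20]
root = Node(items[0])                         # the first item is the root
for item in items[1:]:
    insert(root, item)

print(in_order(root))                         # [3, 7, 9, 12, 15, 16, 16, 17, 19, 20]
print(root.left.key, root.right.key)          # 9 17
print(contains(root, 12), contains(root, 8))  # True False
```

Note that the second 16 ends up as the left child of 17, because it branches right at the root and then left at 17.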
Hash tables
Very large data sets can result in relatively slow retrieval times if the data
has to be accessed sequentially. One of the methods used to improve access
rates is to implement a hash table.
A hash table is a data structure that implements an associative array
(dictionary), a structure that can map keys to values, using a hash
function to compute an index into an array of buckets, from which the
desired value can be found.
A hash function is any function that can be used to map data of arbitrary
size to data of a fixed size, e.g. names mapped to 100 buckets in a phone
book.
Ideally, the hash function will assign each key to a unique bucket, but most
hash table designs employ an imperfect hash function, which might cause
hash collisions where the hash function generates the same index for more
than one key. Such collisions must be accommodated in some way.
Hash function algorithms and collisions
A simple hash function might take a numeric representation of a key (or
part of a key), divide it by the number of available addresses N, and use the
remainder as the address, i.e. key mod N.
Hash functions need to be designed to map keys as uniformly to the available
addresses as possible in order to reduce collisions.
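As a rough sketch of key mod N, the function below turns a string key into a number by summing its character codes and takes the remainder. Summing character codes is only an assumption about what "a numeric representation of a key" might be, and `simple_hash` is a hypothetical name. The last two calls show why this particular choice is not very uniform: any two keys made of the same letters land on the same address.

```python
def simple_hash(key, n):
    """Map a string key to an address in the range 0 to n-1 using key mod n."""
    numeric = sum(ord(ch) for ch in key)   # assumed numeric representation of the key
    return numeric % n                     # remainder after dividing by the address count


N = 100                                    # number of available addresses (buckets)
for name in ["Alice", "Bob"]:
    print(name, simple_hash(name, N))      # Alice 78, then Bob 75

# Anagrams have the same character sum, so they map to the same address.
print(simple_hash("listen", N) == simple_hash("silent", N))   # True
```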
A collision happens when a function maps different keys to the same address.
If a collision happens then rehashing is required to find an empty slot.
For instance, if a collision occurs then the algorithm iterates through the array
until the next available empty slot is found (linear probing).
Other methods could apply a second mapping function to generate a new
address.
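The "iterate to the next empty slot" approach could look something like the sketch below. It assumes integer keys, a fixed-size table and no deletion; the class name `LinearProbingTable` and its methods are hypothetical, chosen only for this example.

```python
class LinearProbingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size        # each slot holds a (key, value) pair or None

    def _address(self, key):
        return key % self.size            # key mod N, for integer keys

    def put(self, key, value):
        """Store the pair and return the slot index actually used."""
        index = self._address(key)
        for _ in range(self.size):        # try each slot at most once
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return index
            index = (index + 1) % self.size   # collision: move to the next slot, wrapping round
        raise RuntimeError("hash table is full")

    def get(self, key):
        """Return the stored value, or None if the key is not present."""
        index = self._address(key)
        for _ in range(self.size):
            slot = self.slots[index]
            if slot is None:
                return None               # reached an empty slot: key was never stored
            if slot[0] == key:
                return slot[1]
            index = (index + 1) % self.size
        return None


table = LinearProbingTable(10)
print(table.put(15, "fifteen"))       # 5
print(table.put(25, "twenty-five"))   # 6  (address 5 is taken, so the next slot is used)
print(table.put(35, "thirty-five"))   # 7
print(table.get(25))                  # twenty-five
print(table.get(45))                  # None
```

Retrieval has to follow the same probing sequence as insertion, which is why long runs of occupied slots slow a hash table down.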
Research challenge – explore one more hash algorithm.
Recap of Trees and Graphs
A tree is a hierarchical (non-linear) data structure consisting of
different levels of connected nodes.
There are one-to-many links between nodes on one level and their
descendants on the next. Links are called edges or branches.
Nodes on the same level are not connected.
Each node has a unique parent node, i.e. it is a child of another node,
the exception being the root node.
The root node is the entry node to the tree. It has no parent.
A terminal or leaf node has no descendants.
Data is held in the tree in an order determined by its traversal
method.
Various traversal algorithms exist, each visiting the nodes of a tree in a different order.
Trees continued
Tree diagram
Parent nodes can have one or more child nodes.
In this example node B has 3 children, D, E & F.
A special type of tree is a binary tree in which each node has a maximum
of two children.
Binary trees are a particularly important data structure for searching for
data held in a sequence.
Binary Tree
Binary tree diagram
Array Implementation
Index  Left  Data  Right
[0]    1     “1”   2
[1]    3     “2”   4
[2]    -     “3”   -
[3]    -     “4”   -
[4]    -     “5”   -
Traversal
Traversal of a tree (or graph) can be classified as depth first or
breadth first.
Depth First Traversals:
(a) In-order (Left, Root, Right): 4 2 5 1 3
(b) Pre-order (Root, Left, Right): 1 2 4 5 3
(c) Post-order (Left, Right, Root): 4 5 2 3 1
Breadth First Traversal: 1 2 3 4 5
Data must be added or removed in order to maintain a specific order
determined by the traversal method. In this example that would typically
be a breadth first method.
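The array implementation and the four traversals can be reproduced with a short sketch. The three parallel lists mirror the Left, Data and Right columns of the table, with `None` standing in for the "-" entries; the function names are illustrative assumptions.

```python
from collections import deque

# Columns of the array implementation; list position is the node's index.
left  = [1, 3, None, None, None]      # index of the left child, None if absent
data  = ["1", "2", "3", "4", "5"]
right = [2, 4, None, None, None]      # index of the right child, None if absent


def in_order(i):
    if i is None:
        return []
    return in_order(left[i]) + [data[i]] + in_order(right[i])    # Left, Root, Right


def pre_order(i):
    if i is None:
        return []
    return [data[i]] + pre_order(left[i]) + pre_order(right[i])  # Root, Left, Right


def post_order(i):
    if i is None:
        return []
    return post_order(left[i]) + post_order(right[i]) + [data[i]]  # Left, Right, Root


def breadth_first(root):
    order, queue = [], deque([root])
    while queue:
        i = queue.popleft()                # visit nodes level by level
        order.append(data[i])
        for child in (left[i], right[i]):
            if child is not None:
                queue.append(child)
    return order


print(" ".join(in_order(0)))        # 4 2 5 1 3
print(" ".join(pre_order(0)))       # 1 2 4 5 3
print(" ".join(post_order(0)))      # 4 5 2 3 1
print(" ".join(breadth_first(0)))   # 1 2 3 4 5
```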
Graphs
Graph diagram
A graph consists of a finite set of nodes or vertices, connected by edges.
Graphs can be undirected, whereby all edges are bidirectional, or directed
(digraph) whereby edges are traversed in the direction indicated by arrows.
Edges can be weighted, indicating the cost to go from one vertex to another.
Implementations
Adjacency matrix and adjacency list
In an adjacency matrix, an unweighted graph would indicate connections
with 1s.
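Both implementations can be sketched for a small example. The graph below is hypothetical (four vertices with made-up weights, not the graph from the slide's diagram), and using 0 to mean "no edge" in the matrix is an assumption of this sketch.

```python
# Hypothetical weighted, undirected graph used only for illustration.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B", 4), ("A", "C", 1), ("B", "D", 2), ("C", "D", 5)]

index = {v: i for i, v in enumerate(vertices)}   # vertex name -> row/column number

# Adjacency matrix: matrix[i][j] is the weight of the edge i-j, 0 meaning no edge.
matrix = [[0] * len(vertices) for _ in vertices]
for u, v, w in edges:
    matrix[index[u]][index[v]] = w
    matrix[index[v]][index[u]] = w               # undirected, so mirror the entry

# Adjacency list: each vertex maps to its neighbours and the edge weights.
adj_list = {v: {} for v in vertices}
for u, v, w in edges:
    adj_list[u][v] = w
    adj_list[v][u] = w

for row in matrix:
    print(row)
# [0, 4, 1, 0]
# [4, 0, 0, 2]
# [1, 0, 0, 5]
# [0, 2, 5, 0]
print(adj_list["A"])                             # {'B': 4, 'C': 1}
```

For an unweighted graph every non-zero entry would simply be 1. A directed graph would fill in only one of the two mirrored entries.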
Graph Traversal
There are two ways to traverse a graph so that every node is visited:
depth first and breadth first.
In depth-first, traverse as far down one route as you can before
backtracking and taking the next route.
In breadth-first, visit all the neighbours of the starting node, then all the
neighbours of the first node visited, then all the neighbours of the
second node visited etc.
Order in which nodes are visited in a DFS: A, B, D, F, E, C, G
Order in which nodes are visited in a BFS: A, B, C, E, D, F, G
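The two traversals can be sketched as below. The original graph diagram is not reproduced in this text, so the adjacency list is a hypothetical graph, chosen only because it yields the two visiting orders listed above when neighbours are taken in alphabetical order. The function names are likewise illustrative.

```python
from collections import deque

# Hypothetical undirected graph (an assumption, not taken from the slide's diagram).
graph = {
    "A": ["B", "C", "E"],
    "B": ["A", "D", "F"],
    "C": ["A", "E", "G"],
    "D": ["B", "F"],
    "E": ["A", "C", "F"],
    "F": ["B", "D", "E"],
    "G": ["C"],
}


def depth_first(graph, start):
    visited = []

    def visit(node):
        visited.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visit(neighbour)          # go as far as possible before backtracking

    visit(start)
    return visited


def breadth_first(graph, start):
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()            # take nodes in the order they were discovered
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


print(", ".join(depth_first(graph, "A")))     # A, B, D, F, E, C, G
print(", ".join(breadth_first(graph, "A")))   # A, B, C, E, D, F, G
```

Depth-first relies on recursion (an implicit stack), while breadth-first uses a queue.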