CSA Lab 10
LAB # 10
HUFFMAN CODES
OBJECTIVE
To implement Huffman codes, a greedy algorithm, and use them to encode data with a prefix code.
THEORY
HUFFMAN CODES:
Huffman coding is a well-known greedy algorithm for lossless data compression, and is also called Huffman encoding. It uses variable-length encoding, meaning each character receives a code whose length depends on how often that character occurs in the text. The most frequent character gets the shortest code. The least frequent character gets the longest code.
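The sketch below illustrates why frequency-dependent code lengths save space. It compares a fixed-length code with a variable-length one, using the character frequencies from this lab's source code (a to f with frequencies 5, 9, 12, 13, 16, 45). The variable-length code table is an assumption supplied for illustration. It is one valid Huffman code for these frequencies.

```python
import math

# Character frequencies (same data as the lab's source code).
freq = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}

# Assumed variable-length prefix code: frequent symbols get short codes.
codes = {'a': '1100', 'b': '1101', 'c': '100', 'd': '101', 'e': '111', 'f': '0'}

# A fixed-length code needs ceil(log2(k)) bits per symbol for k distinct symbols.
fixed_width = math.ceil(math.log2(len(freq)))
fixed_bits = fixed_width * sum(freq.values())

# Variable-length cost: each symbol's frequency times its code length.
variable_bits = sum(freq[s] * len(codes[s]) for s in freq)

print(f"fixed-length:    {fixed_bits} bits")     # 3 bits x 100 symbols = 300
print(f"variable-length: {variable_bits} bits")  # 224
```

Because the most frequent symbol, f, costs 1 bit instead of 3, the variable-length code needs 224 bits instead of 300.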
Prefix Rule:
Huffman coding follows a rule known as the prefix rule, which prevents ambiguity during decoding. The rule ensures that the code assigned to any character is never a prefix of the code assigned to any other character.
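The prefix rule can be checked mechanically. The sketch below tests a code table for the prefix property and decodes a bit string by reading bits until they match a codeword. The helper names `is_prefix_free` and `decode` are hypothetical and do not come from the lab. A code that breaks the rule, such as a = 0, b = 01, c = 1, is ambiguous: the string 01 could mean either b or ac.

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(codes.values())
    for i, w in enumerate(words):
        for j, other in enumerate(words):
            if i != j and other.startswith(w):
                return False
    return True

def decode(bits, codes):
    """Decode a bit string using a prefix-free code table."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ''
    for b in bits:
        current += b
        # With a prefix-free code, the first codeword matched is the only possible one.
        if current in lookup:
            out.append(lookup[current])
            current = ''
    if current:
        raise ValueError(f"leftover bits: {current!r}")
    return ''.join(out)

prefix_code = {'a': '0', 'b': '10', 'c': '11'}
ambiguous = {'a': '0', 'b': '01', 'c': '1'}  # '0' (a) is a prefix of '01' (b)

print(is_prefix_free(prefix_code))   # True
print(is_prefix_free(ambiguous))     # False
print(decode('01011', prefix_code))  # abc
```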
The input is an array of unique characters together with their frequencies of occurrence. The output is the Huffman tree, which is built in the following steps:
1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (The min heap is used as a priority queue. Two nodes are compared by their frequency field, so initially the least frequent character is at the root.)
2. Extract the two nodes with the minimum frequency from the min heap.
3. Create a new internal node whose frequency is the sum of the two nodes' frequencies. Make the first extracted node its left child and the second extracted node its right child. Then add this node to the min heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root, and the tree is complete.
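Steps 1 to 4 can be sketched with Python's `heapq` module as the min heap. This is a minimal sketch, not the lab's listing. The names `Node` and `build_huffman_tree` are hypothetical. It also makes two assumptions: the input is non-empty, and ties on frequency are broken by insertion order through an `itertools.count()` counter. The function returns the root and the sequence of merges, so each step 2/3 round is visible.

```python
import heapq
import itertools

class Node:
    """Huffman tree node; leaves carry a symbol, internal nodes do not."""
    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq = freq
        self.symbol = symbol
        self.left = left
        self.right = right

def build_huffman_tree(symbols, freqs):
    """Build a Huffman tree with a min heap; return (root, list of merges).

    Assumes at least one symbol.
    """
    order = itertools.count()  # tie-breaker: equal frequencies never compare Nodes
    # Step 1: one leaf per symbol, arranged as a min heap keyed on frequency.
    heap = [(f, next(order), Node(f, s)) for s, f in zip(symbols, freqs)]
    heapq.heapify(heap)
    merges = []
    # Step 4: repeat until only the root is left.
    while len(heap) > 1:
        # Step 2: extract the two minimum-frequency nodes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Step 3: new internal node with the summed frequency, pushed back.
        parent = Node(f1 + f2, left=left, right=right)
        heapq.heappush(heap, (parent.freq, next(order), parent))
        merges.append((f1, f2, parent.freq))
    return heap[0][2], merges

root, merges = build_huffman_tree(['a', 'b', 'c', 'd', 'e', 'f'], [5, 9, 12, 13, 16, 45])
for f1, f2, total in merges:
    print(f"{f1} + {f2} -> {total}")
print("root frequency:", root.freq)  # 100
```

The counter sits between the frequency and the node in each heap tuple. Python therefore never has to compare two `Node` objects, which would raise a `TypeError`.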
Time Complexity:
The time complexity of Huffman coding is analysed as follows. With n unique characters, the tree is built in n - 1 merges. Each merge removes two nodes, so extractMin( ) is called 2 x (n - 1) times.
Each extractMin( ) call runs minHeapify( ), which takes O(log n) time.
The overall time complexity of Huffman coding is therefore O(n log n), where n is the number of unique characters in the given text.
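The same count can be written out as a sum. This assumes the initial heap is built bottom-up in O(n) time, and that each extractMin and insert costs O(log n) on a heap that never holds more than n nodes. Under those assumptions, the n - 1 merge rounds give:

```latex
T(n) = \underbrace{O(n)}_{\text{build heap}}
     + \underbrace{(n-1)}_{\text{merges}} \bigl( \underbrace{2\,O(\log n)}_{\text{two extractMin}} + \underbrace{O(\log n)}_{\text{one insert}} \bigr)
     = O(n) + 3(n-1)\,O(\log n)
     = O(n \log n)
```

If the heap is instead built by n separate inserts, that phase costs O(n log n) rather than O(n). The total is still O(n log n).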
EXERCISE
A. You are given a string S of N distinct characters and an array f[] of their frequencies, so that character S[i] has frequency f[i]. Build the Huffman tree and print the Huffman codes of all characters in a preorder traversal of the tree.
NOTE: If two elements have the same frequency, the element that occurs first is placed on the left of the binary tree and the other one on the right.
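Exercise A can also be sketched with a heap instead of re-sorting the list on every round. The function name `huffman_codes_preorder` is hypothetical. The sketch reads "occurs first" in the NOTE as "was inserted into the heap first", which an insertion counter enforces on ties. It assumes a space-separated output format. It also assigns the code '0' to a single-character input, which would otherwise receive an empty code.

```python
import heapq
import itertools

class Node:
    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq, self.symbol, self.left, self.right = freq, symbol, left, right

def huffman_codes_preorder(S, f):
    """Return the Huffman codes of S's characters, in preorder of the tree."""
    order = itertools.count()
    # On equal frequency, the smaller counter (inserted earlier) pops first and goes left.
    heap = [(f[i], next(order), Node(f[i], S[i])) for i in range(len(S))]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), Node(f1 + f2, None, left, right)))
    root = heap[0][2]

    codes = []
    def preorder(node, code):
        if node.left is None and node.right is None:
            codes.append(code or '0')  # assumed convention for a one-node tree
            return
        preorder(node.left, code + '0')   # left edge labelled 0
        preorder(node.right, code + '1')  # right edge labelled 1
    preorder(root, '')
    return codes

print(' '.join(huffman_codes_preorder("abcdef", [5, 9, 12, 13, 16, 45])))
# 0 100 101 1100 1101 111
```

For this input the codes come out in preorder as f, c, d, a, b, e.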
Source Code:
# A Huffman Tree Node
class node:
    def __init__(self, freq, symbol, left=None, right=None):
        # frequency of symbol
        self.freq = freq
        # symbol name (character)
        self.symbol = symbol
        # node left of current node
        self.left = left
        # node right of current node
        self.right = right
        # tree direction (0/1)
        self.huff = ''

# utility function to print huffman codes for all symbols in the newly created Huffman tree
def printNodes(node, val=''):
    # huffman code for current node
    newVal = val + str(node.huff)
    # if node is not a leaf node then traverse inside it
    if(node.left):
        printNodes(node.left, newVal)
    if(node.right):
        printNodes(node.right, newVal)
    # if node is a leaf node then display its huffman code
    if(not node.left and not node.right):
        print(f"{node.symbol} -> {newVal}")

# characters for huffman tree
chars = ['a', 'b', 'c', 'd', 'e', 'f']
# frequency of characters
freq = [5, 9, 12, 13, 16, 45]
# list containing unused nodes
nodes = []
# converting characters and frequencies into huffman tree nodes
for x in range(len(chars)):
    nodes.append(node(freq[x], chars[x]))

while len(nodes) > 1:
    # sort all the nodes in ascending order based on their frequency
    # (sorted() is stable, so among equal frequencies the node that appears first goes left)
    nodes = sorted(nodes, key=lambda x: x.freq)
    # pick 2 smallest nodes
    left = nodes[0]
    right = nodes[1]
    # assign directional value to these nodes
    left.huff = 0
    right.huff = 1
    # combine the 2 smallest nodes to create new node as their parent
    newNode = node(left.freq+right.freq, left.symbol+right.symbol, left, right)
    # remove the 2 nodes and add their parent as new node among others
    nodes.remove(left)
    nodes.remove(right)
    nodes.append(newNode)

# Huffman Tree is ready!
printNodes(nodes[0])

Lab 10: Huffman Codes
Name: Tanzeel Ur Rehman    Roll no: BMCS22S-002
Computer System Algorithm (MCS-205)    SUET/QR/114
Output:
f -> 0
c -> 100
d -> 101
a -> 1100
b -> 1101
e -> 111
B. How many bits are required to encode the message ‘mississippi’ using Huffman coding?
Solution:
Character   Frequency
m           1
p           2
s           4
i           4
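[Figure: Huffman tree for ‘mississippi’. The root has weight 11, left edges are labelled 0 and right edges are labelled 1, and the leaves include m (1) and p (2).]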
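The bit count for ‘mississippi’ can be cross-checked with a short sketch. The helper `huffman_code_lengths` is hypothetical. It repeatedly merges the two lightest subtrees and adds one bit to every symbol inside each merged subtree. With m = 1, p = 2, s = 4, i = 4 it yields one 1-bit code, one 2-bit code and two 3-bit codes. That totals 4 x 1 + 4 x 2 + 1 x 3 + 2 x 3 = 21 bits, compared with 22 bits for a fixed 2-bit code.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built over text."""
    freq = Counter(text)
    # Heap entries: (weight, tie-break index, symbols in this subtree)
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freq}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper: one more bit.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths

msg = "mississippi"
lengths = huffman_code_lengths(msg)
freq = Counter(msg)
total_bits = sum(freq[s] * lengths[s] for s in freq)
print(lengths)
print(f"{total_bits} bits")  # 21 bits
```

The tie between s and i only decides which of the two gets the 1-bit code. The total is 21 bits either way.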