Huffman Codes and Its Implementation: Submitted by Kesarwani Aashita Int. M.Sc. in Applied Mathematics (3 Year)
HUFFMAN CODES AND ITS IMPLEMENTATION
Submitted by
KESARWANI AASHITA
Int. M.Sc. in Applied Mathematics (3RD YEAR)
DEPARTMENT OF MATHEMATICS
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
ROORKEE-247667 (INDIA)
Acknowledgement
-Kesarwani Aashita
ABSTRACT
Huffman Tree
Step 1:- Create a leaf node for each symbol and add it to the priority
queue (i.e. create a min heap of binary trees and heapify it).
Step 2:- While there is more than one node in the queue (i.e. the min heap):
take the two trees of minimum frequency out of the queue, make them the
left and right subtrees of a new node whose frequency is the sum of their
frequencies, and insert the new tree back into the queue.
Step 3:- The remaining node is the root node and the Huffman tree is
complete.
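The three steps above can be sketched compactly with the standard library's priority queue. This is a minimal illustration of my own, independent of the minHeap class implemented later in this report; the Node structure here is a stand-in for the report's node type:

```cpp
#include <queue>
#include <utility>
#include <vector>

// A tree node: leaves hold a symbol, internal nodes hold '\0'.
struct Node {
    char sym;
    long freq;
    Node *left, *right;
};

// Orders the priority queue so the lowest frequency comes out first.
struct ByFreq {
    bool operator()(const Node* a, const Node* b) const {
        return a->freq > b->freq;
    }
};

Node* build_huffman(const std::vector<std::pair<char, long>>& symbols) {
    std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
    // Step 1: one leaf per symbol.
    for (const auto& s : symbols)
        pq.push(new Node{s.first, s.second, nullptr, nullptr});
    // Step 2: repeatedly merge the two lowest-frequency trees.
    while (pq.size() > 1) {
        Node* l = pq.top(); pq.pop();
        Node* r = pq.top(); pq.pop();
        pq.push(new Node{'\0', l->freq + r->freq, l, r});
    }
    // Step 3: the last remaining tree is the Huffman tree.
    return pq.top();
}
```

The root frequency of the finished tree always equals the sum of all symbol frequencies, which is a quick sanity check on the merging loop.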
Joining trees by frequency is the same as merging sequences by length
in optimal merge. Since a node with only one child is not optimal, any
Huffman coding corresponds to a full binary tree.
Definition of optimal merge: Let D = {n1, ..., nk} be the set of lengths of
the sequences to be merged. Take the two shortest sequences ni, nj ∈ D,
i.e. n ≥ ni and n ≥ nj for all n ∈ D, and merge them. The new set is
D' = (D − {ni, nj}) ∪ {ni + nj}. Repeat until only one sequence remains.
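The greedy rule can be exercised on a small instance. The sketch below (illustrative, not from the report) computes the total cost of an optimal merge, where merging sequences of lengths ni and nj costs ni + nj:

```cpp
#include <functional>
#include <queue>
#include <vector>

// Total cost of optimally merging sequences of the given lengths:
// always merge the two shortest, paying the size of the merged result.
long optimal_merge_cost(const std::vector<long>& lengths) {
    std::priority_queue<long, std::vector<long>, std::greater<long>> d(
        lengths.begin(), lengths.end());
    long cost = 0;
    while (d.size() > 1) {
        long ni = d.top(); d.pop();
        long nj = d.top(); d.pop();
        cost += ni + nj;   // merging costs the combined length
        d.push(ni + nj);   // D' = (D - {ni, nj}) U {ni + nj}
    }
    return cost;
}
```

For lengths {2, 3, 4}, the greedy rule merges 2 and 3 first (cost 5), then 5 and 4 (cost 9), for a total of 14; this total is exactly the weighted external path length of the corresponding Huffman tree.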
Since efficient priority queue data structures require O(log n) time per
insertion, and a tree with n leaves has 2n−1 nodes, this algorithm
operates in O(n log n) time.
The worst case for Huffman coding (i.e., the longest Huffman codeword
for a set of symbols) occurs when the distribution of frequencies
follows the Fibonacci numbers.
If the estimated probabilities of occurrence of all the symbols are the
same and the number of symbols is a power of two, Huffman coding is the
same as simple binary block encoding, e.g., ASCII coding.
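Both claims can be checked by simulating the merges while tracking subtree height. The sketch below is a small experiment of my own, not part of the report's program: Fibonacci frequencies force a degenerate chain of depth n − 1, while 2^k equal frequencies give a perfectly balanced tree of depth k:

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Height of the Huffman tree for the given frequencies, found by
// simulating the merges; each pair is (subtree frequency, subtree height).
int huffman_height(const std::vector<long>& freqs) {
    using Sub = std::pair<long, int>;
    std::priority_queue<Sub, std::vector<Sub>, std::greater<Sub>> pq;
    for (long f : freqs) pq.push({f, 0});
    while (pq.size() > 1) {
        Sub l = pq.top(); pq.pop();
        Sub r = pq.top(); pq.pop();
        // Merging two subtrees adds one level above the taller of them.
        pq.push({l.first + r.first, std::max(l.second, r.second) + 1});
    }
    return pq.top().second;
}
```

With six Fibonacci frequencies {1, 2, 3, 5, 8, 13} every merge absorbs the previously built subtree, so the height is 5 (= n − 1 and the longest codeword length); with eight equal frequencies the height is 3 (= log2 8), matching 3-bit block codes.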
Although Huffman's original algorithm is optimal for a symbol-by-
symbol coding (i.e. a stream of unrelated symbols) with a known input
probability distribution, it is not optimal when the symbol-by-symbol
restriction is dropped, or when the probability mass functions are
unknown, not identically distributed, or not independent (e.g., "cat" is
more common than "cta").
class minHeap
{
private:
    BinaryTree *T;              // Array of binary trees
    int n;                      // Number of symbols
public:
    minHeap();
    void heapify(int i);
    BinaryTree dequeue();       // Returns the first binary tree of the min heap
                                // and then heapifies the array of binary trees in
                                // order of the frequencies of their root nodes
    void enqueue(BinaryTree b); // Inserts another binary tree and then
                                // heapifies the array of binary trees
    void print();
    friend class HuffmanCode;
};
class HuffmanCode
{
private:
    BinaryTree HuffmanTree;     // a tree of minimum weighted external path length
public:
    HuffmanCode();
};
HuffmanCode::HuffmanCode()
{
    minHeap Heap;
    // The Huffman tree is built from the bottom up: the symbols with the
    // lowest frequencies end up deepest in the tree, which gives longer
    // codes to rare symbols and shorter codes to frequent symbols, and
    // hence an OPTIMAL expected code length.
    while (Heap.T[0].root->freq > 1)
    {
        // The two trees with minimum priority (i.e. frequency) are dequeued
        BinaryTree l = Heap.dequeue();
        cout << "\nAfter dequeueing " << l.root->freq << endl;
        Heap.print();
        BinaryTree r = Heap.dequeue();
        cout << "\nAfter dequeueing " << r.root->freq << endl;
        Heap.print();
        // and a new tree is constructed with them as left and right subtrees,
        // the frequency of its root being the sum of their root frequencies.
        HuffmanTree.root = new node;
        HuffmanTree.root->info = '\0';
        HuffmanTree.root->freq = l.root->freq + r.root->freq;
        HuffmanTree.root->Llink = l.root;
        HuffmanTree.root->Rlink = r.root;
        // The new tree is inserted into the array and the array is heapified
        // again; the heap supports deletion and insertion at each step.
        Heap.enqueue(HuffmanTree);
        cout << "\nAfter enqueueing " << l.root->freq << "+" << r.root->freq
             << "=" << HuffmanTree.root->freq << endl;
        Heap.print();
    }
    // The process continues until only one tree is left in the heap array.
    cout << "\nThe process is completed and the Huffman tree is obtained\n";
    HuffmanTree = Heap.T[1];            // This tree is the Huffman tree used for coding
    delete []Heap.T;
    cout << "Traversal of the Huffman tree\n\n";
    HuffmanTree.print();
    cout << "\nThe symbols with their codes are as follows\n";
    HuffmanTree.assign_code(0);         // Codes are assigned to the symbols
    cout << "Enter the string to be encoded by Huffman coding: ";
    char *str = new char[30];
    cin >> str;
    HuffmanTree.encode(str);
    cout << "Enter the code to be decoded by Huffman coding: ";
    char *cd = new char[50];
    cin >> cd;
    int length;
    cout << "Enter its code length: ";
    cin >> length;
    HuffmanTree.decode(cd, length);
    delete []cd;
    delete []str;
}
minHeap::minHeap()
{
    cout << "Enter no. of symbols: ";
    cin >> n;
    T = new BinaryTree[n + 1];
    T[0].root = new node;
    T[0].root->freq = n;  // The number of elements currently in the min heap
                          // is stored in the zeroth element of the heap
    for (int i = 1; i <= n; i++)
    {
        T[i].root = new node;
        cout << "Enter a character of the string: ";
        cin >> T[i].root->info;
        cout << "and its frequency of occurrence in the string: ";
        cin >> T[i].root->freq;
        T[i].root->code = NULL;
        T[i].root->Llink = NULL;
        T[i].root->Rlink = NULL;
        // Initially, all the nodes are leaf nodes, stored as an array of trees.
    }
    cout << endl;
    int i = n / 2;  // Heapification starts from the parent of the last
                    // ('n'th) element in the heap.
    cout << "\nAs elements are entered\n";
    print();
    while (i > 0)
    {
        heapify(i);
        i--;
    }
    cout << "\nAfter heapification\n";
    print();
}
BinaryTree minHeap::dequeue()
{
    BinaryTree b = T[1];            // the tree with minimum root frequency
    T[1] = T[T[0].root->freq];      // move the last tree to the front
    T[0].root->freq--;              // shrink the heap
    if (T[0].root->freq != 1)
        heapify(1);                 // restore the min-heap property
    return b;
}
void minHeap::enqueue(BinaryTree b)
{
    T[0].root->freq++;              // grow the heap
    T[T[0].root->freq] = b;         // place the new tree at the end
    int i = T[0].root->freq / 2;
    while (i > 0)                   // re-heapify along the path to the root
    {
        heapify(i);
        i = i / 2;
    }
}
void BinaryTree::assign_code(int i)
{
    if (root == NULL)
        return;
    if (isleaf(root))
    {
        root->code[i] = '\0';       // terminate the code string at a leaf
        cout << root->info << "\t" << root->code << "\n";
        return;
    }
    BinaryTree l, r;
    l.root = root->Llink;
    r.root = root->Rlink;
    // i + 2: room for the bit at position i plus the '\0' written at the leaf
    l.root->code = new char[i + 2];
    r.root->code = new char[i + 2];
    for (int k = 0; k < i; k++)
    {
        l.root->code[k] = root->code[k];
        r.root->code[k] = root->code[k];
    }
    l.root->code[i] = '0';          // a left edge appends a 0
    r.root->code[i] = '1';          // a right edge appends a 1
    i++;
    l.assign_code(i);
    r.assign_code(i);
}
void BinaryTree::print_code(char c)
{
    if (isleaf(root))
    {
        if (c == root->info)
            cout << root->code;
        return;
    }
    // Search both subtrees for the symbol's leaf.
    BinaryTree l, r;
    l.root = root->Llink;
    l.print_code(c);
    r.root = root->Rlink;
    r.print_code(c);
}
void BinaryTree::print()
{
    if (root == NULL)
        return;
    cout << root->info << "\t" << root->freq << "\n";
    if (isleaf(root))
        return;
    BinaryTree l, r;
    l.root = root->Llink;
    r.root = root->Rlink;
    l.print();
    r.print();
}
int ispowerof2(int i)
{
    // Returns 1 for powers of two greater than 1; index 1 is treated
    // specially because a new row of the printed heap starts only at
    // indices 2, 4, 8, ...
    if (i == 1)
        return 0;
    if (i == 0)
        return 1;
    while (i > 2)
    {
        if (i % 2 != 0)
            return 0;
        i = i / 2;
    }
    return 1;
}
int fn(int l)
{
    // Returns 2^(l-1) - 1, the number of padding spaces around a node
    // printed on level l of the heap.
    if (l == 1 || l == 0)
        return 0;
    return 2 * fn(l - 1) + 1;
}
void minHeap::print()
{
    cout << "The heap showing the root frequencies of the binary trees:\n";
    if (T[0].root->freq == 0)
    {
        cout << endl;
        return;
    }
    int level = 1;
    while (T[0].root->freq >= power(2, level)) // 2^n - 1 is the maximum number of
        level++;                               // nodes in a complete tree of n levels
    if (level == 1)
    {
        cout << T[1].root->freq << "\n";
        return;
    }
    for (int i = 1; i <= T[0].root->freq; i++)
    {
        if (ispowerof2(i))
        {
            cout << "\n";
            level--;
        }
        for (int k = 1; k <= fn(level); k++)
            cout << " ";
        cout << T[i].root->freq << " ";
        for (int k = 1; k <= fn(level); k++)
            cout << " ";
    }
    cout << endl;
}
int main()
{
    HuffmanCode c;
    system("pause");
    return 0;
}
Output
4. Variations of Huffman Coding:
a) n-ary Huffman coding
The n-ary Huffman algorithm uses the {0, 1, ..., n − 1} alphabet to
encode messages and builds an n-ary tree.
b) Adaptive Huffman coding
It calculates the probabilities dynamically based on recent actual
frequencies in the source string. This is somewhat related to
the LZ family of algorithms.
c) Huffman template algorithm
The Huffman template algorithm enables one to use any kind of
weights (costs, frequencies, pairs of weights, non-numerical
weights) and one of many combining methods (not just addition).
d) Optimal alphabetic binary trees (Hu-Tucker coding)
In the alphabetic version, the alphabetic order of inputs and
outputs must be identical. This is also known as the Hu-Tucker
problem, after the authors of the paper presenting the first
linearithmic solution to this optimal binary alphabetic problem,
which has some similarities to the Huffman algorithm but is not a
variation of it. These optimal alphabetic binary trees are often
used as binary search trees.
e) The canonical Huffman code
If weights corresponding to the alphabetically ordered inputs are
in numerical order, the Huffman code has the same lengths as the
optimal alphabetic code, which can be found from calculating these
lengths, rendering Hu-Tucker coding unnecessary. The code
resulting from numerically (re-)ordered input is sometimes called
the canonical Huffman code and is often the code used in practice,
due to ease of encoding/decoding. The technique for finding this
code is sometimes called Huffman-Shannon-Fano coding, since it
is optimal like Huffman coding, but alphabetic in weight
probability, like Shannon-Fano coding.
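To illustrate why canonical codes are easy to encode and decode, the sketch below (my own, not part of the report's program) assigns canonical codewords given only the code lengths: symbols are sorted by (length, symbol), the first codeword is all zeros, and each subsequent codeword is the previous one plus one, left-shifted whenever the length increases:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Canonical Huffman: the codewords are determined by the code lengths
// alone, so a decoder only needs the lengths to rebuild the code.
std::vector<std::pair<char, std::string>>
canonical_codes(std::vector<std::pair<char, int>> lengths) {
    // Sort by length, breaking ties alphabetically.
    std::sort(lengths.begin(), lengths.end(),
              [](const auto& a, const auto& b) {
                  return a.second != b.second ? a.second < b.second
                                              : a.first < b.first;
              });
    std::vector<std::pair<char, std::string>> out;
    unsigned code = 0;
    int prev_len = lengths.front().second;
    for (const auto& [sym, len] : lengths) {
        code <<= (len - prev_len);  // extend when codewords get longer
        prev_len = len;
        std::string bits;
        for (int b = len - 1; b >= 0; --b)
            bits += ((code >> b) & 1) ? '1' : '0';
        out.push_back({sym, bits});
        ++code;                     // the next codeword is one greater
    }
    return out;
}
```

For lengths a:1, b:2, c:3, d:3 this yields the prefix-free code a=0, b=10, c=110, d=111; DEFLATE uses the same length-only representation of its Huffman tables.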
5. Applications:
Arithmetic coding can be viewed as a generalization of Huffman
coding; indeed, in practice arithmetic coding is often preceded by
Huffman coding, as it is easier to find an arithmetic code for a binary
input than for a nonbinary input. Also, although arithmetic coding
offers better compression performance than Huffman coding,
Huffman coding is still in wide use because of its simplicity, high
speed and lack of encumbrance by patents.
Huffman coding today is often used as a "back-end" to some other
compression method. DEFLATE (PKZIP's algorithm) and
multimedia codecs such as JPEG and MP3 have a front-end model
and quantization followed by Huffman coding.
6. References