Huffman Coding Algorithm
Every piece of information in computer science is encoded as a string of 1s and 0s. A central objective of information theory is to transmit information using the fewest possible bits, in such a way that every encoding is unambiguous. This tutorial discusses fixed-length and variable-length encoding, along with Huffman encoding, which underlies many data compression schemes.
Fixed-Length encoding - Every character is assigned a binary code using the same number of bits. Thus, a string like "aabacdad" requires 64 bits (8 bytes) for storage or transmission, assuming that each character uses 8 bits.
Variable-Length encoding - Characters are assigned binary codes of varying length. For example:
a : 0
b : 011
c : 111
d : 11
But the real problem lies in the decoding phase. If we try to decode the string 00011011111011, the result is ambiguous, since it can be decoded into multiple strings, a few of which are:
aabacdad
aabbcb
aaadacdad
To prevent such ambiguities during decoding, the encoding phase should satisfy the "prefix rule", which states that no binary code should be a prefix of another code. This produces uniquely decodable codes. The above codes for 'a', 'b', 'c' and 'd' do not follow the prefix rule, since the binary code for a, i.e. 0, is a prefix of the binary code for b, i.e. 011, resulting in ambiguous decodings.
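The ambiguity can be demonstrated with a small backtracking decoder (an illustrative sketch, not part of Huffman coding itself) that enumerates every way the bit string can be split under the non-prefix codes above:

```python
# Enumerate all ways to decode a bit string under a given code table.
# With codes that violate the prefix rule, several decodings exist.
CODES = {"a": "0", "b": "011", "c": "111", "d": "11"}

def all_decodings(bits):
    if not bits:
        return [""]
    results = []
    for symbol, code in CODES.items():
        if bits.startswith(code):
            # Consume this code and decode the remainder recursively.
            results += [symbol + rest for rest in all_decodings(bits[len(code):])]
    return results

decodings = all_decodings("00011011111011")
print(len(decodings))           # more than one decoding -> ambiguous
print("aabacdad" in decodings)  # the intended string is only one of them
```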
Let's reconsider assigning the binary codes to characters 'a', 'b', 'c' and 'd'.
a : 0
b : 11
c : 101
d : 100
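These codes satisfy the prefix rule, so a decoder can walk the bit string left to right and emit a symbol the moment the bits read so far match a code; because no code is a prefix of another, the first match is the only match. A minimal sketch using the code table above:

```python
# Prefix-free code table from the example above.
CODES = {"a": "0", "b": "11", "c": "101", "d": "100"}

def encode(text):
    return "".join(CODES[c] for c in text)

def decode(bits):
    reverse = {code: symbol for symbol, code in CODES.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:  # a full code matched: emit symbol, reset buffer
            out.append(reverse[buffer])
            buffer = ""
    return "".join(out)

bits = encode("aabacdad")
print(len(bits))     # 15 bits instead of 64
print(decode(bits))  # unambiguously recovers "aabacdad"
```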
This encoding scheme saves a lot of storage space, since the generated binary codes are variable in length. It generates shorter binary codes for symbols/characters that appear more frequently in the input string.
Huffman encoding is a relatively slow process, since it uses two passes over the data: one for building the statistical model (the symbol frequencies) and another for encoding. Thus, lossless techniques that use Huffman encoding can be considerably slower than others.
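The two passes can be sketched as follows. This is an illustrative Python implementation using a min-heap; the function and variable names are my own, not from a standard library:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    # Pass 1: build the statistical model (symbol frequencies).
    freq = Counter(text)
    # Min-heap of (frequency, tiebreaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node). The
    # tiebreaker keeps heap entries comparable when frequencies tie.
    ticket = count()
    heap = [(f, next(ticket), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent trees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(ticket), (left, right)))
    # Walk the tree: 0 for a left branch, 1 for a right branch.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # single-symbol edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("aabacdad")
# Pass 2: encode using the generated code table.
encoded = "".join(codes[c] for c in "aabacdad")
print(len(encoded))  # 14 bits for this input
```

Note that the most frequent symbol, 'a' (4 occurrences), receives a 1-bit code, while the rarest symbols, 'b' and 'c' (1 occurrence each), receive 3-bit codes.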
Since the binary codes have different lengths, it is difficult for the decoding software to detect whether the encoded data has been corrupted. This can result in an incorrect decoding and, subsequently, a wrong output.
Huffman coding is a data compression technique that can be used in a variety of applications.
Here are just a few examples:
1. Image compression: By compressing image data, Huffman coding can reduce the
amount of storage space required for digital images. This can be especially beneficial
for large images or images with a lot of detail.
2. Audio compression: Similar to image compression, Huffman coding can also be used
to compress audio data. This can save storage space and bandwidth when streaming or
downloading audio files.
3. Text compression: Text files can also be compressed using Huffman coding. This can
be useful for reducing the size of text documents or email attachments.
4. Data transmission: When data is transmitted over a network, it is often compressed
using Huffman coding to reduce the amount of bandwidth required. This can help to
speed up data transfer and reduce costs associated with transmitting data.