0% found this document useful (0 votes)
2 views3 pages

Huffman Coding Algorithm

The document explains Huffman Coding, a variable-length encoding scheme that efficiently encodes data by assigning shorter binary codes to more frequently occurring characters. It highlights the advantages of Huffman Encoding, such as storage space savings and prefix-free codes, while also noting its disadvantages, including slower processing and challenges in detecting data corruption. Real-life applications include image, audio, and text compression, as well as data transmission over networks.

Uploaded by

naima fatima
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views3 pages

Huffman Coding Algorithm

The document explains Huffman Coding, a variable-length encoding scheme that efficiently encodes data by assigning shorter binary codes to more frequently occurring characters. It highlights the advantages of Huffman Encoding, such as storage space savings and prefix-free codes, while also noting its disadvantages, including slower processing and challenges in detecting data corruption. Real-life applications include image, audio, and text compression, as well as data transmission over networks.

Uploaded by

naima fatima
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Huffman Coding Algorithm

Every information in computer science is encoded as strings of 1s and 0s. The objective of
information theory is to usually transmit information using fewest number of bits in such a
way that every encoding is unambiguous. This tutorial discusses about fixed-length and
variable-length encoding along with Huffman Encoding which is the basis for all data
encoding schemes

Encoding, in computers, can be defined as the process of transmitting or storing sequence of


characters efficiently. Fixed-length and variable length are two types of encoding schemes,
explained as follows-

Fixed-Length encoding - Every character is assigned a binary code using same number of
bits. Thus, a string like “aabacdad” can require 64 bits (8 bytes) for storage or transmission,
assuming that each character uses 8 bits.

Variable- Length encoding - As opposed to Fixed-length encoding, this scheme uses


variable number of bits for encoding the characters depending on their frequency in the given
text. Thus, for a given string like “aabacdad”, frequency of characters „a‟, „b‟, „c‟ and „d‟ is
4,1,1 and 2 respectively. Since „a‟ occurs more frequently than „b‟, „c‟ and „d‟, it uses least
number of bits, followed by „d‟, „b‟ and „c‟. Suppose we randomly assign binary codes to
each character as follows-

a0
b 011
c 111
d 11

Thus, the string “aabacdad” gets encoded to 00011011111011 (0 | 0 | 011 | 0 | 111 | 11 | 0 |


11), using fewer number of bits compared to fixed-length encoding scheme.

But the real problem lies with the decoding phase. If we try and decode the string
00011011111011, it will be quite ambiguous since, it can be decoded to the multiple strings,
few of which are-

aaadacdad (0 | 0 | 0 | 11 | 0 | 111 | 11 | 0 | 11)


aaadbcad (0 | 0 | 0 | 11 | 011 | 111 | 0 | 11)
aabbcb (0 | 0 | 011 | 011 | 111 | 011)
… and so on

To prevent such ambiguities during decoding, the encoding phase should satisfy the “prefix
rule” which states that no binary code should be a prefix of another code. This will produce
uniquely decodable codes. The above codes for „a‟, „b‟, „c‟ and „d‟ do not follow prefix rule
since the binary code for a, i.e. 0, is a prefix of binary code for b i.e 011, resulting in
ambiguous decodable codes.
Let‟s reconsider assigning the binary codes to characters „a‟, „b‟, „c‟ and „d‟.

a0
b 11
c 101
d 100

Using the above codes, string “aabacdad” gets encoded to 001101011000100 (0 | 0 | 11 | 0 |


101 | 100 | 0 | 100). Now, we can decode it back to string “aabacdad”.

Advantages of Huffman Encoding-

 This encoding scheme results in saving lot of storage space, since the binary codes
generated are variable in length

 It generates shorter binary codes for encoding symbols/characters that appear more
frequently in the input string

 The binary codes generated are prefix-free

Disadvantages of Huffman Encoding-

 Lossless data encoding schemes, like Huffman encoding, achieve a lower


compression ratio compared to lossy encoding techniques. Thus, lossless techniques
like Huffman encoding are suitable only for encoding text and program files and are
unsuitable for encoding digital images.

 Huffman encoding is a relatively slower process since it uses two passes- one for
building the statistical model and another for encoding. Thus, the lossless techniques
that use Huffman encoding are considerably slower than others.

 Since length of all the binary codes is different, it becomes difficult for the decoding
software to detect whether the encoded data is corrupt. This can result in an incorrect
decoding and subsequently, a wrong output.

Real-life applications of Huffman Encoding-

Huffman coding is a data compression technique that can be used in a variety of applications.
Here are just a few examples:

1. Image compression: By compressing image data, Huffman coding can reduce the
amount of storage space required for digital images. This can be especially beneficial
for large images or images with a lot of detail.
2. Audio compression: Similar to image compression, Huffman coding can also be used
to compress audio data. This can save storage space and bandwidth when streaming or
downloading audio files.
3. Text compression: Text files can also be compressed using Huffman coding. This can
be useful for reducing the size of text documents or email attachments.
4. Data transmission: When data is transmitted over a network, it is often compressed
using Huffman coding to reduce the amount of bandwidth required. This can help to
speed up data transfer and reduce costs associated with transmitting data.

You might also like