ITC_2020_21_Lecture_6
ECE F344 Information Theory and Coding | Dr. Amit Ranjan Azad | BITS Pilani, Hyderabad Campus
Lecture - 6
Module - 1
Information and Source Coding
Outline
• Shannon-Fano-Elias Coding
• Problem With Prefix Codes
• Arithmetic Coding
• The Lempel-Ziv Algorithm
• Run Length Encoding (RLE)
Shannon-Fano-Elias Coding
• Codes that use the codeword lengths $l(x) = \left\lceil \log \frac{1}{P(x)} \right\rceil$ are called Shannon Codes.
• Shannon codeword lengths satisfy the Kraft Inequality and can therefore be used to construct a
uniquely decodable code.
• We will discuss another simple method for constructing uniquely decodable codes, based on the
Shannon-Fano-Elias encoding technique.
• It uses the Cumulative Distribution Function to allocate the codewords.
• The cumulative distribution function is defined as
$F(x) = \sum_{z \le x} P(z)$
where P(z) is the probability of occurrence of the symbol z.
• The cumulative distribution function consists of steps of size P(x), as shown in the figure.
Shannon-Fano-Elias Coding
• Let us define the modified cumulative distribution function as
$\bar{F}(x) = \sum_{z < x} P(z) + \frac{1}{2} P(x)$
where F̅(x) represents the sum of the probabilities of all symbols less than x plus half the
probability of the symbol x.
• The value of the function F̅(x) is the midpoint of the step corresponding to x of the cumulative
distribution function.
• Since all probabilities are positive, F̅(x) ≠ F̅(y) if x ≠ y.
• Thus, it is possible to determine x given F̅(x) merely by looking at the graph of the cumulative
distribution function. Therefore, the value of F̅(x) can be used to code x.
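• As a quick illustration with an assumed three-symbol source (P(a) = 0.5, P(b) = 0.25, P(c) = 0.25): F(a) = 0.5, F(b) = 0.75, F(c) = 1.0, while the midpoints are F̅(a) = 0.25, F̅(b) = 0.625 and F̅(c) = 0.875. Each F̅ value falls strictly inside the step of its own symbol and therefore identifies that symbol uniquely.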
• In general, F̅(x) is a real number.
• This means we require an infinite number of bits to represent F̅(x), which would lead to an
inefficient code.
• Suppose we round off F̅(x) and use only the first l(x) bits, denoted by $\lfloor \bar{F}(x) \rfloor_{l(x)}$.
Shannon-Fano-Elias Coding
• By the definition of rounding off, we have
$\bar{F}(x) - \lfloor \bar{F}(x) \rfloor_{l(x)} < \frac{1}{2^{l(x)}}$
• If $l(x) = \left\lceil \log \frac{1}{P(x)} \right\rceil + 1$, then
$\frac{1}{2^{l(x)}} < \frac{P(x)}{2} = \bar{F}(x) - F(x-1)$
• This implies that $\lfloor \bar{F}(x) \rfloor_{l(x)}$ lies within the step corresponding to x, and l(x) bits are sufficient to describe x.
• The interval corresponding to any codeword is of length $2^{-l(x)}$. We see that this interval is less than half the height of the step corresponding to x.
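• Continuing the assumed three-symbol illustration (P(a) = 0.5, P(b) = 0.25, P(c) = 0.25): l(a) = ⌈log 2⌉ + 1 = 2 with F̅(a) = 0.25 = 0.01 in binary, giving the codeword 01; l(b) = ⌈log 4⌉ + 1 = 3 with F̅(b) = 0.625 = 0.101, giving 101; l(c) = 3 with F̅(c) = 0.875 = 0.111, giving 111. Each truncated value stays inside the step of its symbol, and the resulting code {01, 101, 111} is prefix-free.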
• Since we use $l(x) = \left\lceil \log \frac{1}{P(x)} \right\rceil + 1$ bits to represent x, the expected length of this code is
$R = \sum_{x} P(x)\, l(x) = \sum_{x} P(x) \left( \left\lceil \log \frac{1}{P(x)} \right\rceil + 1 \right) < H(X) + 2$
• The Shannon-Fano-Elias coding scheme achieves an average codeword length within two bits of the entropy.
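• For the same assumed three-symbol illustration, H(X) = 1.5 bits and the average length is 0.5 × 2 + 0.25 × 3 + 0.25 × 3 = 2.5 bits, which indeed lies between H(X) and H(X) + 2.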
Shannon-Fano-Elias Coding (Example - 1)
• Consider the D-adic distribution given in the following table.
Symbol | Probability P(x) | F(x) | F̅(x) | F̅(x) in binary | l(x) = ⌈log(1/P(x))⌉ + 1 | Codeword
• The entropy of this distribution is 1.75 bits. However, the average codeword length for the
Shannon-Fano-Elias coding scheme is 2.75 bits.
• It is easy to observe that if the last bit of every codeword is deleted, we get the optimal code (the Huffman code).
• It is worthwhile to note that, unlike in the Huffman coding procedure, here we do not have to arrange the probabilities in decreasing order first.
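The entries of the table are not reproduced in the text above, so the short Python sketch below assumes the dyadic distribution P = (0.25, 0.5, 0.125, 0.125) over symbols 1 to 4, which is consistent with the quoted entropy of 1.75 bits and average length of 2.75 bits; the symbol labels and their ordering are an assumption.

from math import ceil, log2

def sfe_code(symbols, probs):
    # Shannon-Fano-Elias: truncate the midpoint F-bar(x) to l(x) = ceil(log2(1/P(x))) + 1 bits.
    codewords = {}
    F = 0.0                                    # F(x-1): cumulative probability of earlier symbols
    for sym, p in zip(symbols, probs):
        f_bar = F + p / 2                      # midpoint of the step belonging to sym
        l = ceil(log2(1 / p)) + 1
        bits, frac = "", f_bar
        for _ in range(l):                     # first l bits of the binary expansion of F-bar
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codewords[sym] = bits
        F += p
    return codewords

probs = [0.25, 0.5, 0.125, 0.125]              # assumed D-adic distribution (entropy 1.75 bits)
codes = sfe_code([1, 2, 3, 4], probs)
print(codes)                                   # {1: '001', 2: '10', 3: '1101', 4: '1111'}
print(sum(p * len(c) for p, c in zip(probs, codes.values())))   # 2.75

For this assumed distribution, deleting the last bit of each codeword gives {00, 1, 110, 111}, a prefix code of average length 1.75 bits, matching the observation about the Huffman code above.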
Shannon-Fano-Elias Coding (Example - 2)
• Let us shuffle the probabilities and redo the exercise.
Symbol | Probability P(x) | F(x) | F̅(x) | F̅(x) in binary | l(x) = ⌈log(1/P(x))⌉ + 1 | Codeword
• We observe that the codewords obtained from the Shannon-Fano-Elias coding procedure are not unique. The average codeword length is again 2.75 bits.
• However, this time we cannot get the optimal code simply by deleting the last bit from every
codeword. If we do so, the code no longer remains a prefix code.
• The basic concept of Shannon-Fano-Elias coding is used in a computationally efficient algorithm
for encoding and decoding called Arithmetic Coding.
Shannon-Fano-Elias Coding (Example - 3)
Symbol | Probability P(x) | F(x) | F̅(x) | F̅(x) in binary | l(x) = ⌈log(1/P(x))⌉ + 1 | Codeword
Note:
• The bits shown in red keep on recurring.
Problem With Prefix Codes
• If we consider a prefix code as being generated using a binary tree, each decision between tree branches always takes one full bit.
• The Huffman Code also needs one bit for each decision.
• Huffman codes achieve the entropy limit exactly only if the probabilities of the symbols are negative powers of two.
• Arithmetic Coding does not have this restriction.
• It works by representing the file to be encoded by an interval of real numbers between 0 and 1.
• Successive symbols in the message reduce this interval according to the probability of that symbol.
Arithmetic Coding
• Let our alphabet consist of only three symbols A, B and C with probabilities of occurrence P(A) = 0.5, P(B) = 0.25 and P(C) = 0.25.
• We first divide the interval [0, 1) into three intervals proportional to their probabilities.
• Thus, the symbol A corresponds to [0, 0.5), the symbol B corresponds to [0.5, 0.75) and the symbol C corresponds to [0.75, 1.0).
• Note that the lengths of these intervals are proportional to their probabilities.
• Next, suppose the input symbol stream is B A C A ...
• We first encode B. This is simply choosing the corresponding interval, i.e., [0.5, 0.75).
• Now, this interval is again subdivided into three intervals, proportional to the probabilities of
occurrence.
• So, for the second step, the symbol A corresponds to [0.5, 0.625), the symbol B corresponds to [0.625, 0.6875) and the symbol C corresponds to [0.6875, 0.75).
• Since the next symbol to arrive after B is A, we choose the interval corresponding to A, which is
[0.5, 0.625).
• This is again subdivided to yield the interval [0.5, 0.5625) for A, the interval [0.5625, 0.59375) for
B and the interval [0.59375, 0.625) for C.
Arithmetic Coding
• Now we look at the next symbol to encode, which is C. This corresponds to the interval [0.59375,
0.625).
• Continuing this process, after encoding A, we are left with the interval [0.59375, 0.609375).
• The arithmetic code for B A C A is any number that lies within this interval.
• To complete this example, we can say that the arithmetic code for the sequence B A C A is 0.59375.
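A minimal Python sketch of this interval-narrowing procedure is given below, using the probabilities from the example; the function name and the ordered list of (symbol, probability) pairs are illustrative choices.

def arithmetic_encode(sequence, probs):
    # Narrow [low, high) once per symbol, in proportion to the symbol probabilities.
    low, high = 0.0, 1.0
    for sym in sequence:
        width = high - low
        cum = 0.0
        for s, p in probs:                     # probs: ordered list of (symbol, probability)
            if s == sym:
                high = low + (cum + p) * width
                low = low + cum * width
                break
            cum += p
    return low, high                           # any number in [low, high) encodes the sequence

probs = [("A", 0.5), ("B", 0.25), ("C", 0.25)]
print(arithmetic_encode("BACA", probs))        # (0.59375, 0.609375), the interval derived above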
Arithmetic Coding
• Next consider the decoding at the receiver. The receiver needs to know, a priori, the probabilities of A, B and C.
• So, it will also have an identical number line, partitioned into three segments, proportional to the
probabilities of A, B and C.
• Let us say that the receiver receives 0.59375.
• First it checks where this number lies. Clearly, 0.5 < 0.59375 < 0.75, which is the segment
corresponding to B. So the 1st decoded symbol is B.
• Now we split the segment corresponding to B into three sub-segments proportional to the
probabilities of A, B and C (exactly as we did at the encoder side).
• Again we map the received number 0.59375, and find that it lies in the region of A. So the 2nd decoded symbol is A.
• The decoding is instantaneous.
• We mechanically proceed to decode the next symbol, and so on.
• But, how will the receiver know when to stop?
• Therefore, we need to have a stopping criterion, or a pre-decided protocol to do so.
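The decoding side can be sketched in the same style; here the number of symbols to decode is passed in explicitly, which is one possible stopping criterion (a dedicated end-of-message symbol is another common choice).

def arithmetic_decode(value, probs, n_symbols):
    # Repeatedly locate 'value' inside the current interval and emit the matching symbol.
    low, high = 0.0, 1.0
    decoded = []
    for _ in range(n_symbols):
        width = high - low
        cum = 0.0
        for s, p in probs:
            seg_low = low + cum * width
            seg_high = low + (cum + p) * width
            if seg_low <= value < seg_high:    # segment containing the received number
                decoded.append(s)
                low, high = seg_low, seg_high
                break
            cum += p
    return "".join(decoded)

probs = [("A", 0.5), ("B", 0.25), ("C", 0.25)]
print(arithmetic_decode(0.59375, probs, 4))    # BACA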
The Lempel-Ziv Algorithm
• Huffman coding requires symbol probabilities.
• But most real-life scenarios do not provide the symbol probabilities in advance (i.e., the statistics of the source are unknown).
• In principle, it is possible to observe the output of the source for a long enough time period and estimate the symbol probabilities. However, this is impractical for real-time applications.
• Also, Huffman coding is optimal for a DMS (discrete memoryless source), where the occurrence of one symbol does not alter the probabilities of the subsequent symbols.
• Huffman coding is not the best choice for a source with memory.
• For example, consider the problem of compression of written text.
• We know that many letters occur in pairs or groups, like ‘q-u’, ‘t-h’, ‘i-n-g’, etc.
• It might be more efficient to use the statistical inter-dependence of the letters in the alphabet along
with their individual probabilities of occurrence.
• Such a scheme was proposed by Lempel and Ziv in 1977. Their source coding algorithm does not
need the source statistics.
• It is a variable-to-fixed length source coding algorithm and belongs to the class of Universal Source
Coding algorithms.
The Lempel-Ziv Algorithm
• The logic behind Lempel-Ziv Universal Coding is as follows.
• The compression of an arbitrary sequence of bits is possible by coding a series of 0’s and 1’s as
some previous such string (the prefix string) plus one new bit.
• Then, the new string formed by adding the new bit to the previously used prefix string becomes a
potential prefix string for future strings.
• These variable length blocks are called phrases.
• The phrases are listed in a dictionary which stores the existing phrases and their locations.
• In encoding a new phrase, we specify the location of the existing phrase in the dictionary and
append the new letter.
• We can derive a better understanding of how the Lempel-Ziv algorithm works by the following
example.
The Lempel-Ziv Algorithm (Example)
• Suppose we wish to code the string:
101011011010101011
• We will begin by parsing it into comma-separated phrases, each of which is a previously seen string (the prefix) plus one new bit.
• The first bit, a 1, has no predecessors, so it has a null prefix string and the one extra bit is itself.
1, 01011011010101011
• The same goes for the 0 that follows since it can’t be expressed in terms of the only existing prefix:
1, 0, 1011011010101011
• So far our dictionary contains the strings ‘1’ and ‘0’.
• Next we encounter a 1, but it already exists in our dictionary. Hence we proceed further.
• The following 10 is obviously a combination of the prefix 1 and a 0, so we now have:
1, 0, 10, 11011010101011
• Continuing in this way, we eventually parse the whole string as follows:
1, 0, 10, 11, 01, 101, 010, 1011
The Lempel-Ziv Algorithm (Example)
• Now, since we found 8 phrases, we will use a three-bit code to label the null phrase and the first seven phrases, for a total of 8 numbered entries (the eighth and last phrase we found does not need a label, since no later phrase uses it as a prefix).
• Next, we write the string in terms of the prefix phrase plus the new bit needed to create the new
phrase.
• We will use parentheses and commas to separate these at first, in order to aid our visualization of
the process.
• The eight phrases can be described by:
(000,1), (000,0), (001,0), (001,1), (010,1), (011,1), (101,0), (110,1)
• It can be read out as: (phrase at location 0, new bit 1), (phrase at location 0, new bit 0), (phrase at location 1, new bit 0), (phrase at location 1, new bit 1), (phrase at location 2, new bit 1), (phrase at location 3, new bit 1), ...
• Thus, the coded version of the above string is:
00010000001000110101011110101101
• The dictionary for this example is given in the table (next slide).
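A short Python sketch of this parsing-plus-indexing procedure is given below; the 3-bit index width is hard-coded to match this 8-phrase example, and the function name is illustrative.

def lz_encode(bits, index_bits=3):
    # Parse into phrases (known phrase + one new bit) and emit (location, new bit) codewords.
    dictionary = {"": 0}                       # location 0 is the null phrase
    phrases, codewords = [], []
    current = ""
    for b in bits:
        if current + b in dictionary:          # still matches an existing phrase; keep extending
            current += b
        else:                                  # new phrase found
            phrases.append(current + b)
            codewords.append(format(dictionary[current], f"0{index_bits}b") + b)
            dictionary[current + b] = len(dictionary)
            current = ""
    return phrases, codewords

phrases, codewords = lz_encode("101011011010101011")
print(phrases)                # ['1', '0', '10', '11', '01', '101', '010', '1011']
print("".join(codewords))     # 00010000001000110101011110101101, as above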
The Lempel-Ziv Algorithm (Example)
Table: Dictionary for the Lempel-Ziv algorithm
Dictionary Location | Dictionary Contents | Codeword
1 (001) | 1 | 0001
2 (010) | 0 | 0000
3 (011) | 10 | 0010
4 (100) | 11 | 0011
5 (101) | 01 | 0101
6 (110) | 101 | 0111
7 (111) | 010 | 1010
– | 1011 | 1101
The Lempel-Ziv Algorithm (Length of the Table)
• In this case, we have not obtained any compression; in fact, the coded string (32 bits) is longer than the original string (18 bits).
• However, the longer the initial string, the more saving we get as we move along, because longer and longer prefixes become representable by small numerical indices.
• In fact, Ziv proved that for long documents, the compression of the file approaches the optimum
obtainable as determined by the information content of the document.
• The next question is what should be the length of the table.
• In practical applications, regardless of the length of the table, it will eventually overflow.
• This problem can be solved by pre-deciding a large enough size of the dictionary.
• The encoder and decoder can update their dictionaries by periodically substituting the less used
phrases from their dictionaries by more used ones.
Run Length Encoding (RLE)
• Run Length Encoding or RLE is a technique used to reduce the size of a repeating string of
characters.
• This repeating string is called a run.
• Typically RLE encodes a run of symbols into two bytes, a count and a symbol.
• RLE can compress any type of data regardless of its information content, but the content of data to
be compressed affects the compression ratio.
• RLE cannot achieve high compression ratios compared to other compression methods, but it is easy
to implement and is quick to execute.
• RLE is supported by most bitmap file formats, such as TIFF, JPEG, BMP and PCX, and is also used by fax machines.
RLE (Example)
• Consider the following bit stream:
S = 11111111111111100000000000000000001111
• This can be represented as: fifteen 1’s, nineteen 0’s, four 1’s, i.e., (15, 1), (19, 0), (4, 1).
• Since the maximum number of repetitions is 19, which can be represented with 5 bits, we can
encode the bit stream as (01111, 1), (10011, 0), (00100, 1).
• The encoded stream needs 3 × (5 + 1) = 18 bits against the original 38 bits, so the compression ratio in this case is 18:38 ≈ 1:2.11.
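A minimal Python sketch of this run-length scheme, with the same fixed 5-bit count field as in the example, is shown below; the function name is illustrative.

def rle_encode(bits, count_bits=5):
    # Encode each run as a fixed-width count followed by the repeated symbol.
    runs, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:   # extend the current run
            j += 1
        runs.append((j - i, bits[i]))
        i = j
    encoded = "".join(format(count, f"0{count_bits}b") + sym for count, sym in runs)
    return runs, encoded

S = "1" * 15 + "0" * 19 + "1" * 4               # the 38-bit stream from the example
runs, encoded = rle_encode(S)
print(runs)                                     # [(15, '1'), (19, '0'), (4, '1')]
print(encoded, len(encoded))                    # 011111 100110 001001 concatenated, 18 bits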