MODELING AND CODING
Module 1
Data Compression
◦ Representing a file in the fewest possible bits while keeping it as
accurate as possible.
◦ Data compression reads symbols/characters from an input text,
processes them, and writes codes.
◦ To be effective, a data compression scheme must be able to transform
the compressed file back into an identical copy of the input text
(both encoding and decoding are required).
◦ Data compression implies sending or storing a smaller number of
bits.
[Figure: Source File (SF), 20 KB -> Compression -> Compressed File (CF), 5 KB -> Decompression -> Decompressed File (DF), 20 KB]
In some cases (lossy compression), we may not get the exact source file back after decompression.
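The lossless round trip described in the bullets above can be sketched as follows. This is only an illustration: it uses Python's standard `zlib` module (DEFLATE), which is not a method taken from these slides, and the input `source` is a made-up, highly redundant byte string.

```python
import zlib

# Hypothetical source file contents: 2000 bytes of a repeating pattern.
source = b"ABABABABAB" * 200

# Encoding: produce the compressed file (CF).
compressed = zlib.compress(source)

# Decoding: recover the decompressed file (DF).
restored = zlib.decompress(compressed)

print(len(source), "bytes before compression")
print(len(compressed), "bytes after compression")
print("identical copy recovered:", restored == source)
```

Because the scheme is lossless, the decompressed data equals the source exactly; a lossy scheme would not guarantee this.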
Modeling and Coding
The development of data compression algorithms for a
variety of data can be divided into two phases. The first
phase is usually referred to as modeling. In this phase we
try to extract information about any redundancy that
exists in the data and describe the redundancy in the form
of a model. The second phase is called coding.
Modeling
In this phase, we try to extract information about any
redundancy or similarity that exists in the data and
describe the redundancy of data in the form of a model.
This model acts as the basis of any data compression
algorithm, and the performance of the algorithm
depends on how well the model is formed.
Coding
This is the second phase. A description of the
model and a description of how the data differ
from the model are encoded, generally using
binary digits.
Example 1
Consider the following sequence of numbers:
9, 10, 11, 12, 13.
Plotting the data on graph paper shows that the
points lie on a straight line, so we model the
sequence with the equation
$x_n = n + 9$, where n = 0, 1, 2, 3, 4.
Example 2
◦ Consider the following sequence of
numbers {$x_1, x_2, x_3, \ldots$}:
9 11 11 11 14 13 15 17 16 17 20 21
◦ The plot of this sequence is approximately
linear. (The points lie close to a straight
line, though not all exactly on it.)
◦ Stored directly, each sample requires a
minimum of 5 bits, since the largest value
is 21. But we can store each sample in
fewer bits.
◦ We can model the above sequence
using the following equation:
$\hat{x}_n = n + 8$, n = 1, 2, ...
◦ To make use of this structure, let's examine
the difference between the data and the
model. The difference (or residual) is given
by the sequence
$e_n = x_n - \hat{x}_n$: 0 1 0 -1 1 -1 0 1 -1 -1 1 1
◦ The residual sequence consists of only three
numbers: {-1, 0, 1}. If we assign a code of 00
to -1, a code of 01 to 0, and a code of 10 to 1,
we need only 2 bits to represent each
element of the residual sequence, instead of
5 bits for each original sample.
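As a minimal sketch of Example 2, the code below computes the residuals against the model n + 8, encodes them with the 2-bit code table given above, and decodes them again. The function names (`model`, `decode`) are chosen here for illustration, and the bit counts ignore the cost of storing the model itself.

```python
# Sequence from Example 2.
data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

def model(n):
    """Predicted value for the n-th sample (n starts at 1): n + 8."""
    return n + 8

# Residual e_n = x_n - model(n) for every sample.
residuals = [x - model(n) for n, x in enumerate(data, start=1)]

# Fixed-length code from the example: -1 -> 00, 0 -> 01, 1 -> 10.
code_table = {-1: "00", 0: "01", 1: "10"}
encoded = "".join(code_table[e] for e in residuals)

# Bits needed without the model: enough bits to hold the largest sample.
raw_bits = max(data).bit_length() * len(data)

def decode(bits):
    """Rebuild the samples from the 2-bit residual codes plus the model."""
    inverse = {code: e for e, code in code_table.items()}
    res = [inverse[bits[i:i + 2]] for i in range(0, len(bits), 2)]
    return [model(n) + e for n, e in enumerate(res, start=1)]

print(residuals)
print(raw_bits, "bits raw vs", len(encoded), "bits for the residuals")
print(decode(encoded) == data)
```

The decoder needs the same model and code table as the encoder; only the residual codes have to be sent.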
Example 3
◦ Consider the following sequence of
numbers:
27 28 29 28 26 27 29 28 30 32 34 36 38
◦ In this sequence each value is close to
the previous value. Suppose we send
the first value as is; then, in place of
each subsequent value, we send the
difference between it and the previous
value.
◦ The sequence of transmitted values
would be:
27 1 1 −1 −2 1 2 −1 2 2 2 2 2
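The differencing scheme of Example 3 can be sketched as below. The function names `delta_encode` and `delta_decode` are illustrative, not from the slides; decoding is a running sum, since the first transmitted value is the sample itself.

```python
from itertools import accumulate

def delta_encode(values):
    """Keep the first value; replace each later value by its difference
    from the previous value."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode: the running sum restores the original values."""
    return list(accumulate(deltas))

sequence = [27, 28, 29, 28, 26, 27, 29, 28, 30, 32, 34, 36, 38]
transmitted = delta_encode(sequence)

print(transmitted)
print(delta_decode(transmitted) == sequence)
```

Apart from the first value, the transmitted numbers fall in a small range (-2 to 2 here), so they can be coded with fewer bits than the original samples.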
Example 4
◦ Consider the following sequence of numbers:
1, 1, 1, 2, 2, 2, 2, 3, 3.
◦ This sequence contains only 3 distinct values, each repeated in
consecutive positions. We can design a model that compresses
the sequence as follows:
1, 3, 2, 4, 3, 2
◦ Here the first element is the first value, and the second
element is the number of times that value appears
consecutively. The third element is the second distinct value,
and the fourth element is the number of times it appears
consecutively, and so on.
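The (value, count) scheme of Example 4 is commonly called run-length encoding. A minimal sketch, with illustrative function names `rle_encode` and `rle_decode`, might look like this:

```python
from itertools import groupby

def rle_encode(values):
    """Replace each run of equal consecutive values by: value, run length."""
    out = []
    for value, run in groupby(values):
        out.extend([value, len(list(run))])
    return out

def rle_decode(pairs):
    """Expand a flat list of value, count, value, count, ... back out."""
    out = []
    for value, count in zip(pairs[0::2], pairs[1::2]):
        out.extend([value] * count)
    return out

sequence = [1, 1, 1, 2, 2, 2, 2, 3, 3]
encoded = rle_encode(sequence)

print(encoded)
print(rle_decode(encoded) == sequence)
```

The scheme only pays off when runs are long; a sequence with no repeated neighbours would double in length.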
Performance measurement
◦ Compression ratio
$$CR = \frac{\text{bits required before compression (SF)}}{\text{bits required after compression (CF)}}$$
For example, if SF = 4 and CF = 1, then CR = 4:1.
If the compression ratio is high, we can say the model is good.
◦ Space
◦ Time
◦ Implementation complexity.
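As a small worked example of the compression ratio, the sketch below applies the definition to the numbers used in this module. The function name `compression_ratio` is illustrative, and 1 KB is assumed to be 1024 bytes (the ratio does not depend on that choice).

```python
def compression_ratio(source_bits, compressed_bits):
    """CR = bits before compression (SF) / bits after compression (CF)."""
    if compressed_bits <= 0:
        raise ValueError("compressed size must be positive")
    return source_bits / compressed_bits

# SF = 4, CF = 1 gives a ratio of 4:1.
cr_simple = compression_ratio(4, 1)

# 20 KB source file compressed to 5 KB (sizes converted to bits).
cr_file = compression_ratio(20 * 1024 * 8, 5 * 1024 * 8)

# Example 2: 12 samples at 5 bits each versus 12 residuals at 2 bits each.
cr_example2 = compression_ratio(12 * 5, 12 * 2)

print(cr_simple, cr_file, cr_example2)
```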