IS502: MULTIMEDIA DESIGN FOR INFORMATION SYSTEM
MULTIMEDIA DATA COMPRESSION
Presenter Name: Mahmood A. Moneim
Supervised By: Prof. Hesham A. Hefny
Winter 2014
Multimedia Data Compression
Reduce the size of data.
Reduces storage space and hence storage cost.
Reduces time to retrieve and transmit data.
compression ratio = original data size / compressed data size
[Figure: original data → (compress) → compressed data; compressed data → (decompress) → decompressed data]
Lossless And Lossy Compression
Compression ratios of lossy compressors are generally
higher than those of lossless compressors,
e.g. 100 (lossy) vs. 2 (lossless).
Lossless compression is essential in applications
such as text file compression.
Lossy compression is acceptable in many imaging
and voice applications.
E.g. JPEG, MP3, etc.
Kinds of Lossless Compressors
[1] Model and code
The source is modeled as a stochastic process.
The probabilities (or statistics) are given or acquired.
[2] Dictionary-based
There is no explicit model and no explicit statistics
gathering; instead, a codebook (or dictionary) is used
to map source words into codewords.
Model and Code
Example:
Shannon code
Huffman code
Arithmetic code
Dictionary-based
Example:
LZ family
run-length code
Basics of information theory
Entropy is a measure of the disorder of a system.
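For a source that emits symbol i with probability p_i, Shannon entropy gives the lower bound on the average number of bits per symbol achievable by any lossless code:

H = -\sum_i p_i \log_2 p_i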
Shannon-Fano Algorithm
To illustrate the algorithm, let's suppose the symbols
to be coded are the characters of the word HELLO. The
frequency count of the symbols is:

Symbol  H  E  L  O
Count   1  1  2  1

The algorithm, in top-down manner, is:
- Sort the symbols according to their frequency counts.
- Recursively divide the symbols into two parts, each with
approximately the same total count, until each part
contains only one symbol.
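A minimal Python sketch of this top-down procedure (the balancing rule used to pick the split point is one reasonable choice; textbook variants split slightly differently and may yield different but equally valid codes):

from collections import Counter

def shannon_fano(freqs):
    # Sort symbols by frequency, then recursively split into two halves
    # of roughly equal total count; one half gets a 0, the other a 1.
    symbols = sorted(freqs, key=freqs.get, reverse=True)

    def build(syms, prefix=""):
        if len(syms) == 1:
            return {syms[0]: prefix or "0"}
        total, running, split = sum(freqs[s] for s in syms), 0, 1
        for i, s in enumerate(syms[:-1], 1):   # find the balance point
            running += freqs[s]
            if running * 2 >= total:
                split = i
                break
        codes = build(syms[:split], prefix + "0")
        codes.update(build(syms[split:], prefix + "1"))
        return codes

    return build(symbols)

print(shannon_fano(Counter("HELLO")))
# -> {'L': '00', 'H': '01', 'E': '10', 'O': '11'}: 10 bits for HELLO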
[Figure: Coding tree for HELLO by the Shannon-Fano algorithm]
Cont.
Entropy of HELLO: with p(L) = 2/5 and p(H) = p(E) = p(O) = 1/5,
H = -(0.4 log2 0.4 + 3 × 0.2 log2 0.2) ≈ 1.92 bits per symbol.
Huffman code
Huffman code: (illustrated with a manageable example)

Letter  Frequency (%)
A       25
B       15
C       10
D       20
E       30
Huffman code
Huffman code: Code formation
- Assign weights to each character.
- Merge the two lightest weights into one root node
whose weight is the sum of the two.
- Repeat until one tree is left.
- Traverse the tree from root to leaf (for each node,
assign 0 to the left branch, 1 to the right).
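As a sketch, the same steps in Python using a min-heap (the integer tie-breaker keeps heap comparisons away from the node objects; ties mean the exact bit patterns can vary, though the code lengths are optimal either way):

import heapq

def huffman_codes(freqs):
    # Heap entries are (weight, tiebreak, node); a node is either a
    # character or a (left, right) pair of nodes.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                       # merge the two lightest nodes
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):            # internal node
            walk(node[0], prefix + "0")        # 0 to the left
            walk(node[1], prefix + "1")        # 1 to the right
        else:                                  # leaf: a character
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}))
# one optimal assignment: A, D, E get 2-bit codes; B, C get 3-bit codes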
Huffman code
Huffman code: Code Interpretation
- No-prefix (prefix-free) property: the code for any
character never appears as the prefix of another code.
(Verify.)
- The receiver keeps reading bits until they match a
code, then emits the corresponding character.
- 01110001110110110111 (extract the string)
Example. Find Huffman codes and the compression ratio (C.R.) for Table 1,
assuming that the uncompressed representation takes 8 bits per character and
that the size of the Huffman table is not part of the compressed size.

Table 1:
Char  Freq
A     90
B     60
C     50
D     20
E     12
F     8
G     7
H     3
Huffman Codes (the right-hand column is the equally valid
complementary assignment, with every 0 and 1 swapped):

Char  Code    Code (0/1 swapped)
A     00      11
B     01      10
C     10      01
D     111     000
E     1101    0010
F     11001   00110
G     110000  001111
H     110001  001110
Huffman Tree
                 250
               /     \
            150       100
           /   \     /   \
          A     B   C     50
                         /  \
                       30    D
                      /  \
                    18    E
                   /  \
                 10    F
                /  \
               G    H
Char  Freq  Huffman Code
A     90    00
B     60    01
C     50    10
D     20    111
E     12    1101
F     8     11001
G     7     110000
H     3     110001
C.R. = (250 × 8) / (2×90 + 2×60 + 2×50 + 3×20 + 4×12 + 5×8 + 6×7 + 6×3) = 2000 / 608 ≈ 3.29
Decompression - Huffman Codes
(Huffman codes from the table above)

Char  Code
A     00
B     01
C     10
D     111
E     1101
F     11001
G     110000
H     110001
Compress DEAF using the above Huffman codes.
Ans.: 111 1101 00 11001
Decompress 110001 1101 00 111.
Ans.: HEAD
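A small Python sketch of table-driven decoding; thanks to the prefix property, the first accumulated bit group that matches a codeword can be emitted immediately:

CODES = {"A": "00", "B": "01", "C": "10", "D": "111",
         "E": "1101", "F": "11001", "G": "110000", "H": "110001"}

def huffman_decode(bits, codes):
    lookup = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in lookup:        # a complete codeword has been read
            out.append(lookup[buf])
            buf = ""
    return "".join(out)

print(huffman_decode("110001" + "1101" + "00" + "111", CODES))   # -> HEAD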
Arithmetic compression
Arithmetic compression is based on interpreting a
character string as a single real number.
Letter  Frequency (%)  Subinterval [p, q]
A       25             [0, 0.25]
B       15             [0.25, 0.40]
C       10             [0.40, 0.50]
D       20             [0.50, 0.70]
E       30             [0.70, 1.0]
Arithmetic compression
Arithmetic compression: Coding CABAC
Generate subintervals of decreasing length; the
subintervals depend uniquely on the string's characters
and their frequencies.
If the current interval [x, y] has width w = y - x, then
the character with subinterval [p, q] narrows it to
[x + w·p, x + w·q].
Step 1: C → [0.4, 0.5], based on p = 0.4, q = 0.5.
Arithmetic compression
Step 2: A → [0.4, 0.425], based on p = 0, q = 0.25.
Step 3: B → [0.40625, 0.41], based on p = 0.25, q = 0.4.
Step 4: A → [0.40625, 0.4071875], based on p = 0, q = 0.25.
Step 5: C → [0.406625, 0.40671875], based on p = 0.4, q = 0.5.
Final representation (midpoint): (0.406625 + 0.40671875) / 2 ≈ 0.40667.
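A toy Python encoder for these steps (plain floats stand in for the arbitrary-precision arithmetic a production coder would use, so long strings would lose precision):

INTERVALS = {"A": (0.00, 0.25), "B": (0.25, 0.40),
             "C": (0.40, 0.50), "D": (0.50, 0.70), "E": (0.70, 1.00)}

def arithmetic_encode(text, intervals):
    x, y = 0.0, 1.0
    for ch in text:
        p, q = intervals[ch]
        w = y - x                          # width of the current interval
        x, y = x + w * p, x + w * q        # scale [p, q] into [x, y]
    return (x + y) / 2                     # any value in [x, y) identifies text

print(arithmetic_encode("CABAC", INTERVALS))   # ~0.406671875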
Arithmetic compression
Arithmetic compression: Extracting CABAC
N       Interval [p, q]  Width  Character  N - p    (N - p) / Width
0.4067  [0.4, 0.5]       0.1    C          0.0067   0.067
0.067   [0, 0.25]        0.25   A          0.067    0.268
0.268   [0.25, 0.4]      0.15   B          0.018    0.12
0.12    [0, 0.25]        0.25   A          0.12     0.48
0.48    [0.4, 0.5]       0.1    C          0.08     0.8
When to stop? A terminal character is added to the original
character set and encoded. During decompression, once it is
encountered, the process stops.
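A matching Python decoder (reusing INTERVALS from the encoding sketch above; the explicit length argument stands in for the terminal-character convention described on this slide):

def arithmetic_decode(n, intervals, length):
    out = []
    for _ in range(length):
        for ch, (p, q) in intervals.items():
            if p <= n < q:                 # which subinterval holds n?
                out.append(ch)
                n = (n - p) / (q - p)      # (N - p) / width, as in the table
                break
    return "".join(out)

print(arithmetic_decode(0.4067, INTERVALS, 5))   # -> CABAC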
LZW Algorithm
LZW Compression:

BEGIN
  s = next input character
  while not EOF
  {
    c = next input character
    if s + c exists in the dictionary
      s = s + c
    else
    {
      output the code for s
      add the string s + c to the dictionary with a new code
      s = c
    }
  }
  output the code for s
END
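A runnable Python version of this pseudocode, initialized with the three-character dictionary used in the example on the next slide:

def lzw_compress(text):
    dictionary = {"A": 1, "B": 2, "C": 3}      # initial three-symbol dictionary
    next_code = len(dictionary) + 1
    s, output = text[0], []
    for c in text[1:]:
        if s + c in dictionary:                # grow the current match
            s = s + c
        else:                                  # miss: emit s, learn s + c
            output.append(dictionary[s])
            dictionary[s + c] = next_code
            next_code += 1
            s = c
    output.append(dictionary[s])
    return output

print(lzw_compress("ABABBABCABABBA"))   # -> [1, 2, 4, 5, 2, 3, 4, 6, 1]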
LZW for the String ABABBABCABABBA
The dictionary initially contains only three characters: A = 1, B = 2, C = 3.
Cont.
The encoder outputs the codes 1 2 4 5 2 3 4 6 1, adding the entries
AB = 4, BA = 5, ABB = 6, BAB = 7, BC = 8, CA = 9, ABA = 10, ABBA = 11
to the dictionary along the way.
LZW Decompression
Cont.
The input code for the decoder is 1 2 4 5 2 3 4 6 1; decoding it
reproduces ABABBABCABABBA.
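A Python sketch of the standard LZW decoder, mirroring lzw_compress above; it rebuilds the dictionary on the fly, and the else branch handles the classic corner case of a code that refers to the entry currently being defined:

def lzw_decompress(codes):
    dictionary = {1: "A", 2: "B", 3: "C"}
    next_code = len(dictionary) + 1
    prev = dictionary[codes[0]]
    output = [prev]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                           # code being defined right now:
            entry = prev + prev[0]      # it must be prev + its first char
        dictionary[next_code] = prev + entry[0]   # learn the missed entry
        next_code += 1
        output.append(entry)
        prev = entry
    return "".join(output)

print(lzw_decompress([1, 2, 4, 5, 2, 3, 4, 6, 1]))   # -> ABABBABCABABBA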
Run Length Encoding
Huffman coding requires:
- frequency values
- bits grouped into characters or units
Many kinds of data do not fit this model:
- machine code files
- facsimile data (bits corresponding to light or
dark areas of a page)
- video signals
Run Length Encoding
For such files, RLE is used.
Instead of sending long runs of 0s or 1s, it
sends only how many bits are in each run.
On a typed page, 70%-80% of the space is white,
so RLE is useful.
Run Length Encoding
Runs with different characters:
Send the actual character along with the run length.
HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG
code = 7, H, 1, U, 9, F, 11, Y, 1, D, 5, G
Savings in bits (considering ASCII): the original takes
34 characters × 8 bits = 272 bits; if each count is also
stored in 8 bits, the code takes 6 × 16 = 96 bits,
saving 176 bits.
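A one-line Python sketch of this scheme using itertools.groupby, applied to the string above:

from itertools import groupby

def run_length_encode(text):
    # one (count, character) pair per run
    return [(len(list(run)), ch) for ch, run in groupby(text)]

print(run_length_encode("HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG"))
# -> [(7, 'H'), (1, 'U'), (9, 'F'), (11, 'Y'), (1, 'D'), (5, 'G')]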
QUESTIONS?