0% found this document useful (0 votes)
642 views25 pages

Data Compression Techniques Explained

Data compression techniques aim to optimize the use of limited storage space and transmission time by removing redundant data from files. There are two main categories of compression: lossless techniques exactly reconstruct the original data, while lossy techniques tolerate minor data losses. Common lossless methods include run-length encoding, Huffman coding, and Lempel-Ziv encoding, which substitute repeated patterns with shorter codes. Lossy methods like JPEG, MPEG, and MP3 are used for multimedia as human perception masks some data loss. They apply transforms, quantization, and entropy encoding to remove visual and auditory redundancies.

Uploaded by

khanimran182
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
642 views25 pages

Data Compression Techniques Explained

Data compression techniques aim to optimize the use of limited storage space and transmission time by removing redundant data from files. There are two main categories of compression: lossless techniques exactly reconstruct the original data, while lossy techniques tolerate minor data losses. Common lossless methods include run-length encoding, Huffman coding, and Lempel-Ziv encoding, which substitute repeated patterns with shorter codes. Lossy methods like JPEG, MPEG, and MP3 are used for multimedia as human perception masks some data loss. They apply transforms, quantization, and entropy encoding to remove visual and auditory redundancies.

Uploaded by

khanimran182
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 25

Data Compression

CS 147
Minh Nguyen

Why Data Compression?


Make

optimal use of limited


storage space

Save

time and help to optimize


resources
If compression and decompression are done in I/O processor,
less time is required to move data to or from storage
subsystem, freeing I/O bus for other work
In sending data over communication line: less time to transmit
and less storage to host

Data CompressionEntropy

Entropy is the measure of information


content in a message.
Messages with higher entropy carry more information
than messages with lower entropy.

How

to determine the entropy

Find the probability p(x) of symbol x in the message


The entropy H(x) of the symbol x is:
H(x) = - p(x) log2p(x)

The

average entropy over the entire


message is the sum of the entropy of all
n symbols in the message

Data Compression
Methods

Data compression is about storing and


sending a smaller number of bits.
Therere two major categories for
methods to compress data: lossless
and lossy methods

Lossless Compression
Methods

In lossless methods, original data and the


data after compression and
decompression are exactly the same.

Redundant

data is removed in
compression and added during
decompression.

Lossless

methods are used when we


cant afford to lose any data: legal and
medical documents, computer programs.

Run-length encoding

Simplest method of compression.

How: replace consecutive repeating occurrences of a


symbol by 1 occurrence of the symbol itself, then
followed by the number of occurrences.

The method can be more efficient if the data uses


only 2 symbols (0s and 1s) in bit patterns and 1
symbol is more frequent than another.

Huffman Coding
Assign

fewer bits to symbols that occur more


frequently and more bits to symbols appear less
often.
Theres no unique Huffman code and every
Huffman code has the same average code length.
Algorithm:

Make a leaf node for each code symbol


Add the generation probability of each symbol to the leaf
node
Take the two leaf nodes with the smallest probability and
connect them into a new node
Add 1 or 0 to each of the two branches
The probability of the new node is the sum of the
probabilities of the two connecting nodes
If there is only one node left, the code construction is
completed. If not, go back to (2)

Huffman Coding

Example

Huffman Coding
Encoding

Decoding

Lempel Ziv Encoding


It

is dictionary-based encoding

Basic

idea:

Create a dictionary(a table) of strings


used during communication.
If both sender and receiver have a copy
of the dictionary, then previouslyencountered strings can be substituted
by their index in the dictionary.

Lempel Ziv Compression


Have

2 phases:

Building an indexed dictionary


Compressing a string of symbols

Algorithm:
Extract the smallest substring that cannot be
found in the remaining uncompressed string.
Store that substring in the dictionary as a
new entry and assign it an index value
Substring is replaced with the index found in
the dictionary
Insert the index and the last character of the
substring into the compressed string

Lempel Ziv Compression

Compression
example:

Audio Encoding
Predictive

encoding

Only the differences

Lempel Ziv
Decompression
Its just the inverse

of compression process

Lossy Compression
Methods

Used for compressing images and


video files (our eyes cannot
distinguish subtle changes, so lossy
data is acceptable).
These methods are cheaper, less time
and space.
Several

methods:

JPEG: compress pictures and graphics


MPEG: compress video
MP3: compress audio

JPEG Encoding
Used

to compress pictures and


graphics.
In JPEG, a grayscale picture is divided
into 8x8 pixel blocks to decrease the
number of calculations.
Basic idea:
Change the picture into a linear (vector) sets of
numbers that reveals the redundancies.
The redundancies is then removed by one of
lossless compression methods.

JPEG Encoding- DCT


DCT:

Discrete Concise Transform


DCT transforms the 64 values in 8x8 pixel
block in a way that the relative relationships
between pixels are kept but the
redundancies are revealed.
Example:
A gradient grayscale

Quantization &
Compression

Quantization:

After T table is created, the values are quantized


to reduce the number of bits needed for encoding.
Quantization divides the number of bits by a
constant, then drops the fraction. This is done to
optimize the number of bits and the number of 0s
for each particular application.

Compression:
Quantized values are read from the table and
redundant 0s are removed.
To cluster the 0s together, the table is read
diagonally in an zigzag fashion. The reason is if
the table doesnt have fine changes, the bottom
right corner of the table is all 0s.
JPEG usually uses lossless run-length encoding at
the compression phase.

JPEG Encoding

MPEG Encoding
Used

to compress video.
Basic idea:
Each video is a rapid sequence of a set of
frames. Each frame is a spatial
combination of pixels, or a picture.
Compressing video =
spatially compressing each frame
+
temporally compressing a set of
frames.

MPEG Encoding
Spatial

Compression

Each frame is spatially compressed by JPEG.

Temporal Compression
Redundant frames are removed.
For example, in a static scene in which someone
is talking, most frames are the same except for
the segment around the speakers lips, which
changes from one frame to the next.

Audio Compression
Used

for speech or music

Speech: compress a 64 kHz digitized


signal
Music: compress a 1.411 MHz signal

Two categories of techniques:


Predictive encoding
Perceptual encoding

Audio Encoding
Predictive

Encoding

Only the differences between samples are


encoded, not the whole sample values.
Several standards: GSM (13 kbps), G.729 (8
kbps), and G.723.3 (6.4 or 5.3 kbps)

Perceptual Encoding: MP3


CD-quality audio needs at least 1.411
Mbps and cannot be sent over the
Internet without compression.
MP3 (MPEG audio layer 3) uses perceptual
encoding technique to compress audio.

References
http://www.csie.kuas.edu.tw/course/cs/

english/ch-15.ppt
CS157B-Lecture

19 by Professor

Lee
http://cs.sjsu.edu/~lee/cs157b/cs157b
.html
The

essentials of computer
organization and architecture by
Linda Null and Julia Nobur.

Data Compression

QUESTION?

You might also like