Data Compression Techniques Explained

Data compression techniques aim to optimize the use of limited storage space and transmission time by removing redundant data from files. There are two main categories of compression: lossless techniques exactly reconstruct the original data, while lossy techniques tolerate minor data losses. Common lossless methods include run-length encoding, Huffman coding, and Lempel-Ziv encoding, which substitute repeated patterns with shorter codes. Lossy methods like JPEG, MPEG, and MP3 are used for multimedia as human perception masks some data loss. They apply transforms, quantization, and entropy encoding to remove visual and auditory redundancies.

Uploaded by

khanimran182


Data Compression

CS 147
Minh Nguyen

Why Data Compression?

Make optimal use of limited storage space.
Save time and help to optimize resources:
  If compression and decompression are done in the I/O processor, less time is required to move data to or from the storage subsystem, freeing the I/O bus for other work.
  When sending data over a communication line, less time is needed to transmit and less storage is needed at the host.

Data Compression: Entropy

Entropy is the measure of information content in a message.
Messages with higher entropy carry more information than messages with lower entropy.

How to determine the entropy:
  Find the probability p(x) of symbol x in the message.
  The entropy H(x) of the symbol x is:
  H(x) = -p(x) log2 p(x)

The average entropy over the entire message is the sum of the entropy of all n symbols in the message.
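The entropy calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the probability p(x) is estimated from how often each symbol occurs in the message; the function names `symbol_entropies` and `average_entropy` are made up for this example.

```python
import math
from collections import Counter

def symbol_entropies(message):
    """Return {symbol: -p(x) * log2 p(x)} for each distinct symbol x."""
    counts = Counter(message)
    total = len(message)
    return {s: -(c / total) * math.log2(c / total) for s, c in counts.items()}

def average_entropy(message):
    """Sum the per-symbol entropies over all distinct symbols (bits per symbol)."""
    return sum(symbol_entropies(message).values())

# Two equally likely symbols carry one bit of information per symbol.
print(average_entropy("AABB"))  # 1.0
```

A message made of a single repeated symbol has entropy 0, which is why it compresses so well.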

Data Compression Methods

Data compression is about storing and sending a smaller number of bits.
There are two major categories of methods for compressing data: lossless and lossy methods.

Lossless Compression Methods

In lossless methods, the original data and the data after compression and decompression are exactly the same.
Redundant data is removed during compression and added back during decompression.
Lossless methods are used when we can't afford to lose any data: legal and medical documents, computer programs.

Run-length encoding

Simplest method of compression.
How: replace consecutive repeating occurrences of a symbol with one occurrence of the symbol itself, followed by the number of occurrences.
The method can be more efficient if the data uses only two symbols (0s and 1s) in bit patterns and one symbol is more frequent than the other.
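A minimal Python sketch of run-length encoding as described on this slide. The (symbol, count) pair representation and the names `rle_encode` and `rle_decode` are assumptions made for illustration; real formats pack the pairs into bytes in various ways.

```python
from itertools import groupby

def rle_encode(data):
    """Replace each run of a repeated symbol with a (symbol, run_length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Rebuild the original string by repeating each symbol run_length times."""
    return "".join(symbol * count for symbol, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

Note that data without long runs can grow under this scheme, since every symbol still costs a pair.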

Huffman Coding

Assign fewer bits to symbols that occur more frequently and more bits to symbols that appear less often.
There is no unique Huffman code, but every Huffman code for a given source has the same average code length.
Algorithm:
(1) Make a leaf node for each code symbol and add the generation probability of each symbol to its leaf node.
(2) Take the two nodes with the smallest probability and connect them into a new node. Add 1 or 0 to each of the two branches. The probability of the new node is the sum of the probabilities of the two connected nodes.
(3) If there is only one node left, the code construction is completed. If not, go back to (2).
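The construction steps can be sketched in Python with a priority queue. This is an illustrative sketch, not the slides' own implementation: the names `huffman_code` and `average_length` are hypothetical, and instead of an explicit tree each heap entry carries the partial codes of the symbols beneath it.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman code from {symbol: probability}; returns {symbol: bitstring}."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Take the two nodes with the smallest probability.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # Label the two branches 0 and 1 by prefixing the codes below them.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        # The new node's probability is the sum of the two it connects.
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def average_length(probabilities, codes):
    """Expected number of bits per symbol under the given code."""
    return sum(p * len(codes[s]) for s, p in probabilities.items())

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
codes = huffman_code(probs)
print(codes)                         # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
print(average_length(probs, codes))  # 1.75
```

Swapping which branch gets 0 and which gets 1 yields a different but equally good code, which is why the Huffman code is not unique.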

Huffman Coding: Example
[figure not reproduced]

Huffman Coding: Encoding and Decoding
[figures not reproduced]

Lempel Ziv Encoding

Lempel Ziv encoding is dictionary-based encoding.
Basic idea:
  Create a dictionary (a table) of strings used during communication.
  If both sender and receiver have a copy of the dictionary, then previously encountered strings can be substituted by their index in the dictionary.

Lempel Ziv Compression

Has 2 phases:
  Building an indexed dictionary
  Compressing a string of symbols

Algorithm:
  Extract from the remaining uncompressed string the smallest substring that cannot be found in the dictionary.
  Store that substring in the dictionary as a new entry and assign it an index value.
  Replace the substring, except for its last character, with the index found in the dictionary.
  Insert the index and the last character of the substring into the compressed string.
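The algorithm above can be sketched in Python as follows. This is a minimal sketch in the style of LZ78, under two assumptions not stated on the slide: index 0 stands for the empty prefix, and the output is kept as a list of (index, character) pairs rather than packed bits. The name `lz_compress` is hypothetical.

```python
def lz_compress(text):
    """Return a list of (index, last_char) pairs; index 0 means 'empty prefix'."""
    dictionary = {}  # substring -> index, starting at 1
    output = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # still a known substring, keep extending
        else:
            # Smallest substring not in the dictionary: emit the index of its
            # known prefix plus its last character, then store it as a new entry.
            output.append((dictionary.get(current, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            current = ""
    if current:
        # Input ended inside a known substring: emit its index with no character.
        output.append((dictionary[current], ""))
    return output

print(lz_compress("BAABABBBAABBBBAA"))
# [(0, 'B'), (0, 'A'), (2, 'B'), (3, 'B'), (1, 'A'), (4, 'B'), (5, 'A')]
```

In this run the dictionary grows as B, A, AB, ABB, BA, ABBB, BAA, so later substrings are described by ever longer known prefixes.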

Lempel Ziv Compression: Example
[figure not reproduced]

Lempel Ziv Decompression

Decompression is just the inverse of the compression process.
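A sketch of how the inverse process might look in Python, assuming the compressed data is a list of (index, character) pairs in which index 0 means the empty prefix and new dictionary entries are numbered from 1 in the order they are created. Both the pair format and the name `lz_decompress` are assumptions for illustration. The receiver rebuilds the same dictionary while decoding, so the dictionary itself never has to be transmitted.

```python
def lz_decompress(pairs):
    """Rebuild the text from (index, last_char) pairs, regrowing the dictionary."""
    dictionary = {0: ""}  # index -> substring; 0 is the empty prefix
    pieces = []
    for index, ch in pairs:
        entry = dictionary[index] + ch       # known prefix plus the new character
        dictionary[len(dictionary)] = entry  # next free index: 1, 2, 3, ...
        pieces.append(entry)
    return "".join(pieces)

pairs = [(0, "B"), (0, "A"), (2, "B"), (3, "B"), (1, "A"), (4, "B"), (5, "A")]
print(lz_decompress(pairs))  # BAABABBBAABBBBAA
```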

Lossy Compression Methods

Used for compressing images and video files (our eyes cannot distinguish subtle changes, so some loss of data is acceptable).
These methods are cheaper: they take less time and space.
Several methods:
  JPEG: compresses pictures and graphics
  MPEG: compresses video
  MP3: compresses audio

JPEG Encoding

Used to compress pictures and graphics.
In JPEG, a grayscale picture is divided into 8x8 pixel blocks to decrease the number of calculations.
Basic idea:
  Change the picture into a linear (vector) set of numbers that reveals the redundancies.
  The redundancies are then removed by one of the lossless compression methods.

JPEG Encoding: DCT

DCT: Discrete Cosine Transform
DCT transforms the 64 values in an 8x8 pixel block in a way that the relative relationships between pixels are kept but the redundancies are revealed.
Example: a gradient grayscale [figure not reproduced]
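The slides do not give the transform formula, so the sketch below assumes the standard two-dimensional type-II DCT with the usual 8x8 normalization. The function name `dct_2d` is hypothetical, and the direct four-loop form is chosen for readability rather than speed. It shows how a block with no variation collapses into a single non-zero value in the top-left corner of the T table.

```python
import math

N = 8  # JPEG block size; the 0.25 scale factor below is specific to N = 8

def dct_2d(block):
    """Return the N x N table T of DCT coefficients for an N x N block of pixels."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    table = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            table[u][v] = 0.25 * c(u) * c(v) * total
    return table

# A uniform gray block: every pixel has the same value.
uniform = [[20] * N for _ in range(N)]
T = dct_2d(uniform)
print(round(T[0][0], 6))  # 160.0; every other entry is zero up to rounding error
```

All 64 pixel values are now described by one number, which is the kind of redundancy the later stages exploit.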

Quantization & Compression

Quantization:
  After the T table is created, the values are quantized to reduce the number of bits needed for encoding.
  Quantization divides each value by a constant, then drops the fraction. This is done to optimize the number of bits and the number of 0s for each particular application.
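A minimal sketch of the quantization step as the slide describes it: one constant divisor for the whole table, with the fraction dropped (truncation toward zero). The name `quantize` and the sample values are made up for illustration. Practical JPEG encoders typically use a table of per-position divisors with rounding, which this sketch does not attempt to model.

```python
def quantize(table, divisor):
    """Divide each value by a constant and drop the fraction (truncate toward zero)."""
    return [[int(value / divisor) for value in row] for row in table]

# Hypothetical 2x2 corner of a T table, quantized with divisor 8.
print(quantize([[160.0, -7.9], [3.2, 0.4]], 8))  # [[20, 0], [0, 0]]
```

Small coefficients become 0, and this is the step where information is actually lost.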

Compression:
  Quantized values are read from the table and redundant 0s are removed.
  To cluster the 0s together, the table is read diagonally in a zigzag fashion. The reason is that if the picture doesn't have fine changes, the bottom right corner of the table is all 0s.
  JPEG usually uses lossless run-length encoding at the compression phase.
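The zigzag reading order can be sketched as follows. The function names `zigzag` and `drop_trailing_zeros` are hypothetical, and a 3x3 table is used only to keep the example short; the same code works for an 8x8 table. Dropping the trailing run of 0s is a simplified stand-in for the run-length step.

```python
def zigzag(table):
    """Read a square table along its anti-diagonals, alternating direction."""
    n = len(table)
    out = []
    for s in range(2 * n - 1):  # s = row + column of the current diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals are read bottom-left to top-right
        for i in rows:
            out.append(table[i][s - i])
    return out

def drop_trailing_zeros(values):
    """Remove the run of 0s at the end of the zigzag sequence."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end]

quantized = [[20, 3, 0],
             [2, 0, 0],
             [0, 0, 0]]
sequence = zigzag(quantized)
print(sequence)                       # [20, 3, 2, 0, 0, 0, 0, 0, 0]
print(drop_trailing_zeros(sequence))  # [20, 3, 2]
```

Reading row by row would have scattered the 0s between the non-zero values instead of collecting them at the end.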

JPEG Encoding
[figure not reproduced]

MPEG Encoding

Used to compress video.
Basic idea:
  Each video is a rapid sequence of frames. Each frame is a spatial combination of pixels, or a picture.
  Compressing video = spatially compressing each frame + temporally compressing a set of frames.

MPEG Encoding

Spatial Compression
  Each frame is spatially compressed by JPEG.

Temporal Compression
  Redundant frames are removed.
  For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker's lips, which changes from one frame to the next.

Audio Compression

Used for speech or music
  Speech: compress a 64 kbps digitized signal
  Music: compress a 1.411 Mbps signal

Two categories of techniques:
  Predictive encoding
  Perceptual encoding
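The figures of 64 for speech and 1.411 for music are bit rates, and they can be reproduced with a short calculation. The sketch assumes the usual uncompressed PCM parameters, which the slides do not state: telephone speech sampled at 8 kHz with 8 bits per sample on one channel, and CD audio sampled at 44.1 kHz with 16 bits per sample on two channels. The name `pcm_bit_rate` is hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

speech = pcm_bit_rate(8_000, 8, 1)   # assumed telephone-quality speech
music = pcm_bit_rate(44_100, 16, 2)  # assumed CD-quality stereo

print(speech)  # 64000   -> 64 kbps
print(music)   # 1411200 -> about 1.411 Mbps
```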

Audio Encoding

Predictive Encoding
  Only the differences between samples are encoded, not the whole sample values.
  Several standards: GSM (13 kbps), G.729 (8 kbps), and G.723.1 (6.3 or 5.3 kbps)
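The idea of encoding only the differences can be illustrated with plain sample-to-sample differencing. This is a deliberately simplified sketch: the standards listed above use far more elaborate predictors and quantize the result, and the names `delta_encode` and `delta_decode` are made up for the example.

```python
def delta_encode(samples):
    """Keep the first sample, then store each sample as a difference from the previous one."""
    previous = 0
    deltas = []
    for s in samples:
        deltas.append(s - previous)
        previous = s
    return deltas

def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    samples, current = [], 0
    for d in deltas:
        current += d
        samples.append(current)
    return samples

samples = [100, 102, 101, 105]
print(delta_encode(samples))                # [100, 2, -1, 4]
print(delta_decode(delta_encode(samples)))  # [100, 102, 101, 105]
```

Because neighboring audio samples tend to be close in value, the differences are small numbers that need fewer bits than the samples themselves.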

Perceptual Encoding: MP3

CD-quality audio needs at least 1.411 Mbps and cannot be sent over the Internet without compression.
MP3 (MPEG audio layer 3) uses a perceptual encoding technique to compress audio.

References
http://www.csie.kuas.edu.tw/course/cs/english/ch-15.ppt
CS157B-Lecture 19 by Professor Lee: http://cs.sjsu.edu/~lee/cs157b/cs157b.html
The Essentials of Computer Organization and Architecture by Linda Null and Julia Lobur.

Data Compression

QUESTION?
