Ch-5 Data Compression

The document discusses data compression techniques essential for managing the high storage and bandwidth demands of uncompressed multimedia data, highlighting methods such as JPEG, H.261, and MPEG. It outlines the coding requirements for various media types and the importance of compression ratios, entropy, and statistical coding methods like Huffman coding. It also details the steps involved in data compression, including preparation, processing, quantization, and entropy encoding, while emphasizing the need for compatibility and standards in multimedia systems.
Data Compression

Storage Space
 Uncompressed graphics, audio and video data require considerable storage capacity, which in the case of uncompressed video is often not even feasible given CD technology. The same is true for multimedia communications.
 Data transfer of uncompressed video data over digital networks requires very high bandwidth to be provided for a single point-to-point communication.
 To provide feasible and cost-effective solutions, most multimedia systems handle compressed digital video and audio data streams.
 There already exist many compression techniques that are in part competitive and in part complementary.
 The most important compression techniques are:
 JPEG (for single pictures)
 H.261 (for video)
 MPEG (for video and audio)
 Proprietary developments, including Intel's Digital Video Interactive (DVI) (for still images, audio and video)
Coding Requirements
 Images have considerably higher storage requirements
than text; audio and video have even more demanding
properties for data storage.
 Not only is a huge amount of storage required, but the
data rates for the communication of continuous media
are also significant.
 An uncompressed audio signal of telephone quality is sampled at 8 kHz and quantized with 8 bits per sample. This leads to a bandwidth requirement of 64 kbit/s and a storage requirement of 64 kbit to store one second of playback.
 An uncompressed stereo audio signal of CD quality is sampled at a rate of 44.1 kHz and quantized with 16 bits per sample; hence, the storage requirement per channel is 44.1 kHz × 16 bits = 705.6 × 10³ bits for one second of playback, and the bandwidth (throughput) requirement is 705.6 × 10³ bits/second per channel.
 To determine storage requirements for the PAL standard, we assume the same image resolution as used before (640 × 480 pixels) and 3 bytes/pixel to encode the luminance and chrominance components. Hence, the storage requirement for one image (frame) is 640 pixels × 480 pixels × 3 bytes = 921,600 bytes, or 7,372,800 bits. To store 25 frames/second, the requirement is 23.04 × 10⁶ bytes, or 184.32 × 10⁶ bits, per second (see the calculation sketch below).
 In the case of video, processing uncompressed data streams in an integrated multimedia system leads to secondary storage requirements of at least gigabytes, and buffer storage requirements in the range of megabytes.
 The throughput in a multimedia system can be as high as 140
Mbits/second, which must be transferred between different systems.
 This kind of data transfer rate is not realizable with today's
technology, or in the near future with reasonably priced hardware.
 The use of appropriate compression techniques considerably reduces the data transfer rates and, fortunately, research, development and standardization have progressed rapidly in this area during the last few years.
 Some compression techniques for different media are often mentioned in the literature and in product descriptions: JPEG for still image compression and H.261 for video conferencing; MPEG is used for video and audio compression, while DVI can be used for still images as well as for continuous media.
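
The storage and bandwidth figures quoted above follow directly from multiplying sampling rate, sample size, resolution and frame rate. A minimal Python sketch reproduces them (the function names are my own, purely illustrative):

def audio_rate(sample_rate_hz, bits_per_sample, channels=1):
    # Uncompressed audio data rate in bits per second.
    return sample_rate_hz * bits_per_sample * channels

def video_rate(width, height, bytes_per_pixel, frames_per_second):
    # Uncompressed video data rate in bits per second.
    return width * height * bytes_per_pixel * 8 * frames_per_second

print(audio_rate(8_000, 8))          # telephone quality: 64,000 bit/s
print(audio_rate(44_100, 16))        # CD quality, per channel: 705,600 bit/s
print(video_rate(640, 480, 3, 25))   # PAL example: 184,320,000 bit/s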
 Compression in multimedia systems is subject to certain
constraints.
 The quality of the coded, and later on, decoded data should be
as good as possible.
 To make a cost-effective implementation possible, the
complexity of the technique used should be minimal.
 The processing of the algorithm must not exceed certain time spans.
 For each compression technique, there are requirements that differ from those of other techniques.
 One can distinguish between the requirements of an application running in a "dialogue" mode and in a "retrieval" mode, where
 a dialogue mode means an interaction among human users via multimedia information, and
 a retrieval mode means a retrieval of multimedia information by a human user from a multimedia database.
 Some compression techniques, like p×64 (H.261), are more suitable for dialogue mode applications.
 Other techniques, like the Digital Video Interactive
Presentation Level Video (DVI PLV) mode, are optimized for
use in retrieval mode applications
 For both dialogue and retrieval mode, the following
requirements apply:
 To support scalable video in different systems it is necessary to
define a format independent of frame size and video frame
rate.
 Various audio and video data rates should be supported;
usually this leads to different qualities. Thus, depending on
specific system conditions, the data rates can be adjusted.
 It must be possible to synchronize audio with video data, as
well as with other media.
 To make an economical solution possible, coding should be realized using software (for a cheap, low-quality solution) or VLSI chips (for a high-quality solution).
 It should be possible to generate data on one multimedia
system and reproduce these data on another system. The
compression technique should be compatible.
 As many applications exchange multimedia data using
communication networks, the compatibility of
compression techniques is required.
 Standards like those from CCITT, ISO and the European Computer Manufacturers Association (ECMA), and/or de facto standards, are used to achieve this compatibility.
Compression Ratio

 A good metric for compression is the compression factor (or compression ratio), given by:

    compression factor = uncompressed size / compressed size

 If we have a 100 KB file that we compress to 40 KB, we have a compression factor of:

    100 KB / 40 KB = 2.5
Information and Entropy

 Compression is achieved by removing data


redundancy while preserving information content.
 The information content of a group of bytes (a
message) is its entropy.
 Data with low entropy permit a larger compression ratio than
data with high entropy.
 Entropy, H, is a function of symbol frequency. It is the weighted average of the number of bits required to encode the symbols of a message:

    H = -Σ P(xᵢ) log₂ P(xᵢ)
7A.1 Introduction

 The entropy of the entire message is the sum of the individual symbol entropies:

    H = -Σ P(xᵢ) log₂ P(xᵢ)
7A.2 Statistical Coding

 Consider the message: HELLO WORLD!

 The letter L has a probability of 3/12 = 1/4 of appearing in this message. The number of bits required to encode this symbol is -log₂(1/4) = 2.
 Using our formula, H = -Σ P(xᵢ) log₂ P(xᵢ), the average entropy of the entire message is 3.022 bits per character.
 This means that the theoretical minimum number of bits per character is 3.022.
 Theoretically, the message could be sent using only 37 bits (3.022 × 12 = 36.26, rounded up).

7A.2 Statistical Coding

 The entropy metric just described forms the basis


for statistical data compression.
 Two widely-used statistical coding algorithms are
Huffman coding and arithmetic coding.
 Huffman coding builds a binary tree from the letter
frequencies in the message.
 The binary symbols for each character are read directly
from the tree.
 Symbols with the highest frequencies end up at the
top of the tree, and result in the shortest codes.
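
 The tree construction is straightforward to sketch with a priority queue. A minimal Python version (my own illustrative implementation, not from the slides):

import heapq
from collections import Counter

def huffman_codes(message):
    # Build a code table by repeatedly merging the two least frequent subtrees.
    # Heap entries are (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(Counter(message).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        tie += 1
        heapq.heappush(heap, (f1 + f2, tie, merged))
    return heap[0][2]

codes = huffman_codes("HELLO WORLD!")
print(codes["L"])   # the most frequent symbol gets one of the shortest codes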

Source, Entropy and Hybrid Coding
Major Steps of Data Compression
 Preparation includes analog-to-digital conversion and
generating an appropriate digital representation of the
information. An image is divided into blocks of 8 x 8
pixels, and represented by a fixed number of bits per
pixel.
 Processing is actually the first step of the compression
process which makes use of sophisticated algorithms. A
transformation from the time to the frequency domain
can be performed using DCT. In the case of motion video
compression, inter-frame coding uses a motion vector for
each 8 x 8 block.
 Quantization processes the results of the previous step. It specifies the granularity of the mapping of real numbers onto integers. This process results in a reduction of precision; it can be considered the equivalent of the µ-law and A-law coding applied to audio data. In the transformed domain, the coefficients are distinguished according to their significance. For example, they could be quantized using a different number of bits per coefficient.
 Entropy encoding is usually the last step. It compresses a sequential
digital data stream without loss. For example, a sequence of zeroes in
a data stream can be compressed by specifying the number of
occurrences followed by the zero itself.
 After compression, the compressed video builds a data stream,
where a specification of the image starting point and an
identification of the compression technique may be part of this data
stream; the error correction code may also be added to the stream.
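
 The zero-run encoding mentioned above is easy to illustrate. A minimal sketch (my own, purely illustrative; real entropy coders combine this with Huffman or arithmetic coding):

def compress_zero_runs(data):
    # Replace each run of zeroes with the pair (number of occurrences, 0).
    out, i = [], 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            while i < len(data) and data[i] == 0:
                run += 1
                i += 1
            out.extend([run, 0])    # the count, followed by the zero itself
        else:
            out.append(data[i])
            i += 1
    return out

print(compress_zero_runs([5, 0, 0, 0, 0, 7, 0, 0, 3]))   # [5, 4, 0, 7, 2, 0, 3]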
JPEG
 JPEG is a joint project of ISO/IEC JTC1/SC2/WG10 and the commission Q.16 of CCITT SG VIII.
 In 1992, JPEG became an ISO International Standard (IS)
 An adaptive transformation coding technique based on
the DCT achieved the best (subjective) results and,
therefore, was adopted for JPEG.
 Four different variants of image compression can be determined, which lead to four modes. Each mode itself includes further combinations:
 The lossy sequential DCT-based mode (baseline process) must be supported by every JPEG implementation.
 The expanded lossy DCT-based mode provides a set of further enhancements to the baseline process.
 The lossless mode has a low compression ratio but allows perfect reconstruction of the original image.
 The hierarchical mode accommodates images of different resolutions.
Image Preparation
 For the first step of image preparation, JPEG specifies a
very general image model.
 With this model it is possible to describe most of the
well-known two-dimensional image representations.
 This fulfills the demand of image parameter independence, like
image size, image and pixel aspect ratio.
 The resolution of the individual components may be different. Figure 6.6 shows an image with half the number of columns (i.e., half the number of horizontal samples) in the second and third planes as compared to the first plane: Y1 = Y2 = Y3, and X1 = 2X2 = 2X3.
 A gray-scale image will, in most cases, consist of a single component.
 An RGB color representation has three components with equal resolution (i.e., the same number of lines, Y1 = Y2 = Y3, and the same number of columns, X1 = X2 = X3).
 For JPEG, YUV color image processing uses Y1 = 4Y2 = 4Y3 and X1 = 4X2 = 4X3.
 Each pixel is represented by p bits, with values in the range of 0 to 2^p - 1.
 All pixels of all components within the same image are coded
with the same number of bits.
 The lossy modes of JPEG use a precision of either 8 or 12 bits per pixel.
 Lossless modes use a precision of up to 12 bits per pixel.
 Given each component's horizontal and vertical sampling factors Hi and Vi and their maxima Hmax and Vmax, the component dimensions Xi and Yi are calculated with ceiling functions as follows:

    Xi = ⌈X × Hi / Hmax⌉    Yi = ⌈Y × Vi / Vmax⌉
 For the use of compression, the image is divided into data units. The lossless mode uses one pixel as one data unit.
 The lossy modes use blocks of 8 × 8 pixels.
 This definition of data units is a result of the DCT, which always transforms connected blocks. In most cases, the data units are processed component by component.
 Using this non-interleaved mode for an RGB-encoded image with very high resolution, the display would initially present only the red component; then, in turn, the blue and green would be drawn, until the original image colors are finally reconstructed.
 Due to the finite processing speed of the JPEG decoder, it is therefore often more suitable to interleave the data units.
Lossy Sequential DCT-based Mode
-Image Processing
 After image preparation, the uncompressed image samples are grouped into data units of 8 × 8 pixels and passed to the encoder; the order of these data units is defined by the minimum coded units (MCUs).
 In this baseline mode, single samples are encoded using p = 8 bits. Each pixel is an integer in the range of 0 to 255.
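
 The processing step then applies a two-dimensional DCT to each 8 × 8 data unit. A compact numpy sketch of the forward transform (illustrative; the function name is my own):

import numpy as np

def dct_2d(block):
    # Forward 2-D DCT (type II) of an 8 x 8 block, as used by baseline JPEG.
    n = 8
    k = np.arange(n)
    # Basis matrix: C[u, x] = alpha(u) * cos((2x + 1) * u * pi / 16).
    c = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    alpha = np.full(n, np.sqrt(2 / n))
    alpha[0] = np.sqrt(1 / n)
    c = alpha[:, None] * c
    return c @ block @ c.T

block = np.arange(64, dtype=float).reshape(8, 8) - 128   # level-shifted samples
coeffs = dct_2d(block)
print(round(coeffs[0, 0], 1))   # DC coefficient of this block: -772.0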
Lossy Sequential DCT-based Mode
-Quantization
 The quantization of all DCT-coefficients is performed. This is a
lossy transformation.
 For this step, the JPEG application provides a table with 64
entries. Each entry will be used for the quantization of one of
the 64 DCT-coefficients.
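
 A sketch of this step (illustrative; a real encoder uses the application-supplied 64-entry table rather than the placeholder below):

import numpy as np

def quantize(coeffs, qtable):
    # Divide each DCT coefficient by its table entry and round to an integer.
    return np.rint(coeffs / qtable).astype(int)

def dequantize(levels, qtable):
    # Decoder side: multiply back; the rounding error is the precision lost.
    return levels * qtable

qtable = np.full((8, 8), 16)                              # placeholder: one entry per coefficient
coeffs = np.linspace(-200.0, 200.0, 64).reshape(8, 8)     # stand-in DCT coefficients
levels = quantize(coeffs, qtable)
print(np.abs(dequantize(levels, qtable) - coeffs).max())  # at most qtable/2 = 8.0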
Lossy Sequential DCT-based Mode
-Entropy Encoding
 During the initial step of entropy encoding, the quantized DC-coefficients are treated separately from the quantized AC-coefficients.
 The DC-coefficients determine the basic color of the data units. Between adjacent data units, the variation of color is fairly small. Therefore, a DC-coefficient is encoded as the difference between the current DC-coefficient and the previous one; only the differences are subsequently processed.
 The processing order of the AC-coefficients uses the zig-zag sequence, so that coefficients with lower frequencies (typically with higher values) are encoded first, followed by the higher frequencies (with typically small, almost-zero values).
 JPEG specifies Huffman and arithmetic encoding as entropy encoding methods.
 For the lossy sequential DCT-based mode, discussed in this section, only Huffman
encoding is allowed.
 In both methods, a run-length encoding of zero values of the quantized AC-
coefficients is applied first.
 Additionally, non-zero AC-coefficients, as well as the DC-coefficients, are
transformed into a spectral representation to compress the data even more.
 The number of required bits depends on the coefficient's value. A non-zero AC-
coefficient will be represented using between 1 and 10 bits.
 For the representation of DC-coefficients, a higher resolution of 1 bit to a
maximum of 11 bits is used.
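
 The zig-zag ordering and the DC differencing described above are easy to sketch (illustrative code, my own):

def zigzag(block):
    # Read an 8 x 8 block of quantized coefficients in zig-zag order.
    n = 8
    order = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda p: (p[0] + p[1],                    # anti-diagonal index
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [block[x][y] for x, y in order]

def dc_differences(dc_values):
    # Encode each DC-coefficient as the difference from the previous one.
    return [cur - prev for prev, cur in zip([0] + list(dc_values), dc_values)]

print(dc_differences([50, 52, 51]))   # [50, 2, -1]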
Expanded Lossy DCT-based Mode
 Image preprocessing in this mode differs from the previously
described mode in terms of the number of bits per sample.
 Specifically, a sample precision of 12 bits per sample in addition
to 8 bits per sample can be used.
 For this mode, JPEG specifies progressive encoding in addition to sequential encoding. In the first run, a very rough representation of the image appears, which looks out of focus and is refined during successive steps.
 Progressive image representation is achieved by an expansion of quantization. This is also known as layered coding. For this expansion, a buffer is added at the output of the quantizer that temporarily stores all coefficients of the quantized DCT. Progressiveness is achieved in two different ways:
 By using spectral selection, in the first run only the quantized DCT-coefficients of low frequencies of each data unit are passed to the entropy encoding. In successive runs, the coefficients of higher frequencies are processed.
 Successive approximation transfers all of the quantized coefficients in each run, but single bits are differentiated according to their significance: the most-significant bits are encoded first, then the less-significant bits.
Lossless Mode
 Uses data units of single pixels for image preparation. Any
precision between 2 and 16 bits per pixel can be used.
 In this mode, image processing and quantization use a
predictive technique instead of a transformation encoding
technique.
 As shown in Figure 6.15, for each pixel X, one of eight possible
predictors is selected.
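
 The eight predictors are built from the neighbors A (left), B (above) and C (above-left) of the current pixel X. A sketch (the selector numbering follows the JPEG standard, but treat the code itself as illustrative):

def predict(a, b, c, selector):
    # JPEG lossless predictors for pixel X from its neighbors A, B and C.
    predictors = {
        0: 0,                  # no prediction
        1: a,                  # left neighbor
        2: b,                  # neighbor above
        3: c,                  # neighbor above-left
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors[selector]

# The encoder transmits only the prediction error X - predict(A, B, C, sel).
print(predict(100, 104, 96, 4))   # 108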
MPEG
 Moving Picture Experts Group
 Established in 1988
 Standards under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC)
 Official name: ISO/IEC JTC1/SC29/WG11
 MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media
 MPEG-2: a standard for digital television
 MPEG-4: a standard for multimedia applications
 MPEG-7: a content representation standard for information search
 MPEG-21: a multimedia framework standard (covering, among other things, metadata for audio and video files)

 MPEG strives for a data stream compression rate of about 1.2 Mbit/s
 MPEG delivers a data rate of at most 1,856,000 bits/second, which should not be exceeded
 Data rates for audio are between 32 and 448 Kbits/second; this data
rate enables video and audio compression of acceptable quality.
 The MPEG standard explicitly considers functionalities of other
standards:
 JPEG. Since a video sequence can be regarded as a sequence of still
images, and the JPEG standard development was always ahead of the
MPEG standard, the MPEG standard makes use of JPEG.
 H.261. Since the H.261 standard was already available during the work on the MPEG standard, the working group strived for compatibility (at least in some areas) with this standard. Implementations capable of H.261, as well as of MPEG, may arise; however, MPEG is the more advanced technique.
Video Encoding
 The MPEG data stream includes more information than a data
stream compressed according to the JPEG standard.
 MPEG provides 14 different image aspect ratios per pixel. The most
important are:
 A square pixel (1:1) is suitable for most computer graphics systems.
 For an image with 702 x 575 pixels, an aspect ratio of 4:3 is defined.
 For an image of 711 × 487 pixels, an aspect ratio of 4:3 is defined.
 For an image with 625 lines, an aspect ratio of 16:9 is defined, the ratio
required for European HDTV.
 For an image with 525 lines, an aspect ratio of 16:9 is defined, the ratio
required for U.S. HDTV.
 The image refresh frequency is also encoded in the data stream. Eight frequencies are defined: 23.976 Hz, 24 Hz, 25 Hz, 29.97 Hz, 30 Hz, 50 Hz, 59.94 Hz and 60 Hz.
 A temporal prediction of still images leads to a considerable
compression ratio.
 The use of temporal predictors requires the storage of a great
amount of information and image data.
 There is a need to balance this required storage capacity and
the achievable compression rate.
 In most cases, predictive encoding only makes sense for parts
of images and not for the whole image. Therefore, each image
is divided into areas called macro blocks. Each macro block is
partitioned into 16 x 16 pixels for the luminance component
and 8 x 8 pixels for each of the two chrominance components.
 These macro blocks turn out to be quite suitable for
compression based on motion estimation.
 This is a compromise of costs for prediction and the resulting
data reduction.
 MPEG distinguishes four types of image coding for
processing.
 The reasons behind this are the contradictory demands
for an efficient coding scheme and fast random access.
 To achieve a high compression ratio, temporal
redundancies of subsequent pictures must be exploited
(inter-frame), whereas the demand for fast random access
requires intra-frame coding.
 The following types of images are distinguished (image is
used as a synonym for still image or frame):
 I-frames (Intra-coded images) are self-contained, i.e., coded without any reference to other images. An I-frame is treated as a still image. MPEG makes use of JPEG for I-frames.
 I-frames use 8 x 8 blocks defined within a macro
block, on which a DCT is performed. The DC-
coefficients are then DPCM coded; differences of
successive blocks of one component are computed
and transformed using variable-length coding.
 P-frames (Predictive-coded frames) require information of
the previous I-frame and/or all previous P-frames for
encoding and decoding.
 The coding of P-frames is based on the fact that, in successive images, areas often do not change at all; instead, the whole area is shifted.
 In this case of temporal redundancy, the block of the last
P- or I-frame that is most similar to the block under
consideration is determined.
 Several methods for motion estimation are available to the encoder. The most processing-intensive methods tend to give better results, so the following trade-off must be made in the encoder: computational power, and hence cost, versus video quality. (A block-matching sketch follows this list of frame types.)
 B-frames (Bi-directionally predictive-coded frames)
require information of the previous and following I-and/or
P-frame for encoding and decoding.
 The highest compression ratio is attainable by using these
frames. A B-frame is defined as the difference of a
prediction of the past image and the following P- or I-
frame. B-frames can never be directly accessed in a
random fashion.
 D-frames (DC-coded frames) are intra-frame-encoded.
They can be used for fast forward or fast rewind modes.
 The DC-coefficients are DCT-coded; the AC-coefficients are neglected. D-frames consist only of the lowest frequencies of an image. They use only one type of macro block, and only the DC-coefficients are encoded.
 D-frames are used for display in fast-forward or fast-rewind modes. This could also be realized by a suitable order of I-frames; for this purpose, I-frames must occur periodically in the data stream.
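
 Motion estimation for P-frames, as described above, amounts to finding, for each macro block, the best-matching block in a reference frame. A minimal full-search sketch using the sum of absolute differences (my own illustration; real encoders use much faster search strategies):

import numpy as np

def motion_vector(ref, cur, top, left, size=16, search=8):
    # Full search: find the (dy, dx) shift in `ref` that best matches the
    # macro block of `cur` at (top, left), by sum of absolute differences.
    block = cur[top:top + size, left:left + size].astype(int)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue   # candidate block would fall outside the reference frame
            sad = np.abs(ref[y:y + size, x:x + size].astype(int) - block).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv   # the encoder transmits this vector plus the block difference

ref = np.random.randint(0, 256, (64, 64))
cur = np.roll(ref, 2, axis=1)             # the whole frame shifted 2 pixels right
print(motion_vector(ref, cur, 16, 16))    # (0, -2): the match lies 2 pixels left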
 The regularity of a sequence of I-, P- and B-frames is determined by the MPEG application.
 For fast random access, the best resolution would be achieved by coding the whole data stream as I-frames.
 On the other hand, the highest degree of compression is attained by using as many B-frames as possible.
 For practical applications, the sequence "IBBPBBPBB IBBPBBPBB ..." has proven useful. In this case, random access has a resolution of nine still images (i.e., about 330 milliseconds) while still providing a very good compression ratio (see the sketch at the end of this section).
 Concerning quantization, it should be mentioned that the AC-coefficients of B- and P-frames are usually large values, whereas those of I-frames are smaller values. Thus, the MPEG quantization is adjusted accordingly.
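
 The trade-off can be made concrete with a little arithmetic over the group-of-pictures pattern (illustrative sketch, not part of the standard):

def random_access_resolution(gop_pattern, frame_rate_hz):
    # Worst-case wait for the next I-frame, given a repeating frame pattern.
    return len(gop_pattern) / frame_rate_hz   # the pattern repeats once per I-frame

# "IBBPBBPBB" repeats every 9 frames: roughly a third of a second at 25-30 Hz.
print(random_access_resolution("IBBPBBPBB", 25))   # 0.36 s
print(random_access_resolution("IBBPBBPBB", 30))   # 0.3 s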
Audio Encoding
 MPEG audio coding uses the same sampling frequencies as Compact Disc Digital Audio (CD-DA) and Digital Audio Tape (DAT), i.e., 44.1 kHz and 48 kHz; additionally, 32 kHz is available, all at 16 bits.
 The audio coding can be performed with a single channel, two
independent channels or one stereo signal.
 In the definition of MPEG, there are two different stereo
modes: two channels that are processed either independently
or as joint stereo. In the case of joint stereo, MPEG exploits
redundancy of both channels and achieves a higher
compression ratio.
 Layer 1 allows for a maximum bit rate of 448 kbit/s, Layer 2 for 384 kbit/s and Layer 3 for 320 kbit/s.
Data Stream
 Audio Stream
 MPEG specifies a syntax for the interleaved audio and video
data streams.
 An audio data stream consists of frames, which are divided into
audio access units.
 Each audio access unit is composed of slots.
 Video Stream: A video data stream is comprised of six
layers
 At the highest level, the sequence layer, data buffering is
handled. A data stream should have low requirements in
terms of storage capacity.
 The group of pictures layer is the next layer. This layer consists
of a minimum of one I-frame, which is the first frame.
 The picture layer contains a whole picture. The temporal
reference is defined by an image number.
 The next layer is the slice layer. Each slice consists of a number
of macro blocks that may vary from one image to the next.
Additionally, the DCT quantization of each macro block of a
slice is specified.
 The fifth layer is the macro block layer. It contains the sum of
the features of each macro block as described above.
 The lowest layer is the block layer.
 The MPEG standard also specifies the combination of
data streams into a single data stream in the system
definition.
 The same idea was pursued in DVI to define the AVSS
(Audio/Video Support System) data format.
 The most important task of this process is the actual
multiplexing. It includes the coordination of input data
streams and output data streams, the adjustment of
clocks and buffer management.
 For further reading visit: [Link]85/2/ce342/resources/root/BOOK/Multimedia/215814-%20Chapter%[Link]
