Image Compression
Goal
 Store image data as efficiently as possible
 Ideally, want to
– Maximize image quality
– Minimize storage space and processing resources
 Can’t have best of both worlds
 What are some good compromises?
Why is it possible to compress images?
 Data != information/knowledge
 Data >> information
 Key idea in compression: only keep the info.
 But why is data != info? Answer: Redundancy
 Statistical redundancy
– Spatial redundancy and coding redundancy
 Psychovisual redundancy
– Intensity redundancy and frequency redundancy
Spatial redundancy
 Pixel values are not spatially independent
 High correlation among neighboring pixels
Coding redundancy
 Redundancy when mapping from the pixels (symbols)
to the final compressed binary code (Information theory)
 Example:
 Lavg,1 = 3 bits/symbol
 Lavg,2 = 4×0.1 + 2×0.2 + 1×0.5 + 4×0.05 + 3×0.15 = 1.95 bits/symbol
 Code 2 is also unique and shorter
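The two average code lengths can be checked with a short sketch; the probabilities and per-symbol code lengths are taken from the example above (the symbol labels are placeholders):

```python
# Average code length: L_avg = sum of p_i * l_i over all symbols.
probs = [0.1, 0.2, 0.5, 0.05, 0.15]

# Code 1: fixed-length, 3 bits for every symbol.
len1 = [3, 3, 3, 3, 3]
# Code 2: variable-length, short codes for frequent symbols.
len2 = [4, 2, 1, 4, 3]

l_avg1 = sum(p * l for p, l in zip(probs, len1))
l_avg2 = sum(p * l for p, l in zip(probs, len2))
print(round(l_avg1, 2))  # 3.0 bits/symbol
print(round(l_avg2, 2))  # 1.95 bits/symbol
```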
Psychovisual redundancy
 The “end-user” is a human => only represent the info.
which can be perceived by the Human Visual System
(HVS)
 From the data’s point of view => lossy
 From the HVS’s point of view => lossless
 Intensity redundancy
 Frequency redundancy
[Diagram: incident light → low-level processing unit → high-level processing unit → perceived visual information]
Intensity redundancy
 Weber’s law: ∆I / I1 ≈ constant,
where ∆I = I1 – I2 is the just-noticeable difference
 The high (bright) values need
a less accurate representation
compared to the low (dark) values
 Weber’s law holds for all
human senses!
[Figure: the just-noticeable ∆I between intensities I1 and I2 grows with intensity; the same holds for sound above the noise level]
Frequency redundancy
 Human eye functions as a lowpass filter =>
– High frequencies in an image can be ”ignored” without the
HVS noticing
– Key issue in lossy image compression
 Now we know why image compression is possible:
Redundancies
 Investigate how to implement these redundancies
in algorithms
Two main schools of image
compression
 Lossless
– Stored image data can
reproduce original image
exactly
– Takes more storage
space
– Uses entropy coding only
(or none at all)
– Examples: BMP, TIFF,
GIF
 Lossy
– Stored image data can
reproduce something that
looks “close” to the
original image
– Uses both quantization
and entropy coding
– Usually involves transform
into frequency or other
domain
– Examples: JPEG, JPEG-
2000
Concepts
 Lossless compression
– Astronomy, medicine => details
 Lossy compression
– ”normal” images => overall
 An image: 2D array of pixels (rows, columns)
– A pixel value:
 Greyscale/intensity: [8 bits]
 Color: [3 x 8 bits] [Red, Green, Blue] [R,G,B]
– Data:
 Greyscale: 300×300 = 90,000 pixels = 90 Kbyte = 720 Kbit
 Color: ×3 = 270 Kbyte = 2.16 Mbit
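The raw storage figures above follow directly from the pixel count, as this arithmetic sketch shows:

```python
# Raw (uncompressed) storage for a 300x300 image, as in the slide.
rows, cols = 300, 300
pixels = rows * cols                 # one byte per greyscale pixel

grey_bytes = pixels * 1              # 8 bits = 1 byte per pixel
color_bytes = pixels * 3             # 3 x 8 bits (R, G, B)

print(pixels)          # 90000 pixels
print(grey_bytes * 8)  # 720000 bits (~720 Kbit)
print(color_bytes)     # 270000 bytes (~270 Kbyte)
```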
BMP (Bitmap)
 Use 3 bytes per pixel, one
each for R, G, and B
 Can represent up to 2^24 = 16.7
million colors
 No entropy coding
 File size in bytes =
3*length*height, which can be
very large
 Can use fewer than 8 bits per
color, but you need to store
the color palette
 Performs well with ZIP, RAR,
etc.
GIF (Graphics Interchange Format)
 Can use up to 256 colors
from 24-bit RGB color space
– If source image contains
more than 256 colors, need
to reprocess image to fewer
colors
 Suitable for simpler images
such as logos and textual
graphics, not so much for
photographs
 Uses LZW lossless data
compression
JPEG (Joint Photographic Experts
Group)
 Most dominant image
format today
 Typical file size is about
10% of that of BMP (can
vary depending on
quality settings)
 Unlike GIF, JPEG is
suitable for photographs,
not so much for logos
and textual graphics
JPEG Encoding Steps
 Preprocess image
 Apply 2D forward DCT
 Quantize DCT coefficients
 Apply RLE, then entropy encoding
JPEG Block Diagram
FDCT
Source
Image
Quantizer
Entropy
Encoder
TableTable
Compressed
image data
DCT-based encoding
8x8 blocks
R
B
G
Preprocess
 Shift values [0, 2^P − 1] to [−2^(P−1), 2^(P−1) − 1]
– e.g. if P = 8, shift [0, 255] to [−128, 127]
– DCT requires range be centered around 0
 Segment each component into 8x8 blocks
 Interleave components (or not)
– may be sampled at different rates
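The level-shift step can be sketched in a few lines; the flat mid-grey block here is hypothetical sample data:

```python
# Level-shift an 8x8 block of P-bit samples so the range is centered
# on zero, as the DCT requires (P = 8: [0, 255] -> [-128, 127]).
P = 8
shift = 2 ** (P - 1)   # 128 for 8-bit samples

block = [[128] * 8 for _ in range(8)]          # flat mid-grey block
shifted = [[v - shift for v in row] for row in block]

print(shifted[0][0])          # 0  (mid-grey maps to zero)
print(0 - shift, 255 - shift) # -128 127  (the new range endpoints)
```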
Interleaving
 Non-interleaved: scan from left to right, top to
bottom for each color component
 Interleaved: compute one “unit” from each
color component, then repeat
– full color pixels after each step of decoding
– but components may have different resolution
Color Transformation (optional)
 Down-sample chrominance components
– little perceptible quality loss, since the HVS is less
sensitive to chrominance than to luminance
– e.g., YUV 4:2:2 or 4:1:1
 Example: 640 x 480 RGB to YUV 4:1:1
– Y is 640x480
– U is 160x120
– V is 160x120
Interleaving
 ith color component has dimension (xi, yi)
– maximum dimension value is 2^16
– [X, Y] where X=max(xi) and Y=max(yi)
 Sampling among components must be integral
– Hi and Vi must be within range [1, 4]
– [Hmax, Vmax] where Hmax=max(Hi) and Vmax=max(Vi)
 xi = X * Hi / Hmax
 yi = Y * Vi / Vmax
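The component-dimension formulas can be sketched as below; the sampling factors are chosen (as an assumption) to reproduce the earlier 640×480 → 160×120 chroma example, and `ceil` only matters when the division is not exact:

```python
# Component dimensions from sampling factors, per the formulas above:
#   x_i = ceil(X * H_i / Hmax),  y_i = ceil(Y * V_i / Vmax)
import math

X, Y = 640, 480
H = [4, 1, 1]   # horizontal sampling factors for Y, U, V (assumed)
V = [4, 1, 1]   # vertical sampling factors (assumed)
Hmax, Vmax = max(H), max(V)

dims = [(math.ceil(X * h / Hmax), math.ceil(Y * v / Vmax))
        for h, v in zip(H, V)]
print(dims)   # [(640, 480), (160, 120), (160, 120)]
```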
Example
[Wallace, 1991]
Forward DCT
 Convert from spatial to frequency domain
– convert intensity function into weighted sum of
periodic basis (cosine) functions
– identify bands of spectral information that can be
thrown away without loss of quality
 Intensity values in each color plane often
change slowly
Understanding DCT
 For example, in R^3, we can write (5, 2, 9) as the
sum of a set of basis vectors
– we know that [(1,0,0), (0,1,0), (0,0,1)] provides one
set of basis vectors in R^3
(5,2,9) = 5*(1,0,0) + 2*(0,1,0) + 9*(0,0,1)
 DCT is same process in function domain
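The R^3 decomposition above can be written out directly; the DCT does the same thing with cosine basis functions in place of the unit vectors:

```python
# Any vector is the sum of basis vectors weighted by its coordinates.
v = (5, 2, 9)
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Weights via dot products (trivial here: the basis is orthonormal).
weights = [sum(a * b for a, b in zip(v, e)) for e in basis]
recon = tuple(sum(w * e[i] for w, e in zip(weights, basis))
              for i in range(3))

print(weights)  # [5, 2, 9]
print(recon)    # (5, 2, 9)
```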
DCT Basis Functions
 Decompose the intensity function into a
weighted sum of cosine basis functions
Alternative Visualization
1D Forward DCT
 Given a list of n intensity values I(x),
where x = 0, …, n−1
 Compute the n DCT coefficients:
F(u) = \sqrt{2/n}\; C(u) \sum_{x=0}^{n-1} I(x) \cos\!\left[\frac{(2x+1)u\pi}{2n}\right], \quad u = 0, \ldots, n-1

where C(u) = 1/\sqrt{2} for u = 0, and C(u) = 1 otherwise
1D Inverse DCT
 Given a list of n DCT coefficients F(u),
where u = 0, …, n-1
 Compute the n intensity values:
I(x) = \sqrt{2/n} \sum_{u=0}^{n-1} C(u)\, F(u) \cos\!\left[\frac{(2x+1)u\pi}{2n}\right], \quad x = 0, \ldots, n-1

where C(u) = 1/\sqrt{2} for u = 0, and C(u) = 1 otherwise
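A direct (O(n²)) implementation of the 1D forward and inverse DCT formulas above; real codecs use fast factorizations, and the eight sample values are arbitrary:

```python
import math

def c(u):
    # Normalization factor C(u): 1/sqrt(2) for u = 0, else 1.
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct_1d(I):
    n = len(I)
    return [math.sqrt(2 / n) * c(u) *
            sum(I[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for x in range(n))
            for u in range(n)]

def idct_1d(F):
    n = len(F)
    return [math.sqrt(2 / n) *
            sum(c(u) * F[u] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for x in range(n)]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # arbitrary intensities
restored = idct_1d(dct_1d(samples))
print([round(v) for v in restored])   # [52, 55, 61, 66, 70, 61, 64, 73]
```

The forward and inverse transforms round-trip exactly (up to floating-point error), confirming the transform itself is lossless; loss enters only at quantization.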
Extend DCT from 1D to 2D
 Perform 1D DCT on each
row of the block
 Again for each column of
1D coefficients
– alternatively, transpose
the matrix and perform
DCT on the rows
Equations for 2D DCT
 Forward DCT:

F(u,v) = \frac{2}{\sqrt{nm}}\; C(u)\, C(v) \sum_{x=0}^{n-1} \sum_{y=0}^{m-1} I(x,y) \cos\!\left[\frac{(2x+1)u\pi}{2n}\right] \cos\!\left[\frac{(2y+1)v\pi}{2m}\right]

 Inverse DCT:

I(x,y) = \frac{2}{\sqrt{nm}} \sum_{u=0}^{n-1} \sum_{v=0}^{m-1} C(u)\, C(v)\, F(u,v) \cos\!\left[\frac{(2x+1)u\pi}{2n}\right] \cos\!\left[\frac{(2y+1)v\pi}{2m}\right]
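The 2D DCT is separable, so it can be computed exactly as the earlier slide describes: a 1D DCT on every row, then on every column. A sketch, using a constant (hypothetical) block to show all energy landing in the DC coefficient:

```python
import math

def c(u):
    return 1 / math.sqrt(2) if u == 0 else 1.0

def dct_1d(I):
    n = len(I)
    return [math.sqrt(2 / n) * c(u) *
            sum(I[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for x in range(n))
            for u in range(n)]

def dct_2d(block):
    rows = [dct_1d(r) for r in block]           # 1D DCT on each row
    cols = [dct_1d(col) for col in zip(*rows)]  # then on each column
    return [list(r) for r in zip(*cols)]        # transpose back

block = [[10] * 8 for _ in range(8)]   # constant 8x8 block
F = dct_2d(block)
print(round(F[0][0]))   # 80: all energy in the DC coefficient
print(round(F[0][1]))   # 0: no AC energy in a constant block
```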
Visualization of Basis Functions
(frequency increases left to right and top to bottom)
Quantization
 Divide each coefficient by an integer in [1, 255]
– the divisors come in a table, same size as a block
– divide each coefficient in the block by its table entry, round
result to nearest integer
 In the decoding process, multiply the quantized
coefficients by the corresponding table entries
– get back a number close to the original
– error is at most 1/2 of the quantization number
 Larger quantization numbers cause more loss
De facto Quantization Table
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
(the eye becomes less sensitive moving right and down in the table, i.e. toward higher frequencies)
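One coefficient's round trip through quantization illustrates both the compression gain and the error bound; the step comes from the first entry of the de facto table above, the coefficient value is made up:

```python
# Quantize / dequantize one DCT coefficient: divide and round on
# encode, multiply on decode. Error is at most half the step size.
q_table_row0 = [16, 11, 10, 16, 24, 40, 51, 61]   # first table row

coeff = 123.0
q = q_table_row0[0]            # step 16 for the top-left position

quantized = round(coeff / q)   # stored value
restored = quantized * q       # decoded value
error = abs(restored - coeff)

print(quantized, restored)  # 8 128
print(error <= q / 2)       # True
```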
Entropy Encoding
 Compress sequence of quantized DC and AC
coefficients from quantization step
– further increase compression, without loss
 Separate DC from AC components
– DC components change slowly, thus will be
encoded using difference encoding
DC Encoding
 DC represents average intensity of a block
– encode using difference encoding scheme
– each DC is encoded as the difference from the previous block’s DC
 Because the difference tends to be near zero, fewer
bits can be used in the encoding
– categorize difference into difference classes
– send the index of the difference class, followed by
bits representing the difference
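A minimal sketch of DC difference coding, assuming each DC is predicted from the previously coded block's DC (as in baseline JPEG) and that the difference class is the bit-size category of the difference; the DC values are hypothetical:

```python
# Encode each block's DC as the difference from the previous DC,
# then the bit-size category (category k covers magnitudes < 2^k).
dc_values = [120, 124, 123, 130, 130]   # hypothetical per-block DCs

prev = 0
diffs = []
for dc in dc_values:
    diffs.append(dc - prev)
    prev = dc

def category(d):
    return abs(d).bit_length()   # 0 for d == 0, else bits for |d|

print(diffs)                          # [120, 4, -1, 7, 0]
print([category(d) for d in diffs])   # [7, 3, 1, 3, 0]
```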
AC Encoding
 Use zig-zag ordering of coefficients
– orders frequency components from low to high
– produces long runs of 0s at the end
 Apply RLE to ordering
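Zig-zag scanning plus run-length coding can be sketched on a 4×4 block (real JPEG uses 8×8; the block values and the (run, value)/EOB symbol shapes here are illustrative, not the exact JPEG symbol format):

```python
# Zig-zag: traverse anti-diagonals, alternating direction.
N = 4
order = sorted(((x, y) for x in range(N) for y in range(N)),
               key=lambda p: (p[0] + p[1],
                              p[0] if (p[0] + p[1]) % 2 else -p[0]))

block = [[9, 2, 1, 0],
         [3, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]          # typical post-quantization shape
scan = [block[x][y] for x, y in order]

def rle(seq):
    out, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            out.append((run, v))   # (zero-run length, value)
            run = 0
    out.append("EOB")              # end-of-block marker
    return out

print(scan)       # low frequencies first, zeros bunched at the end
print(rle(scan))  # six (run, value) pairs, then EOB
```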
Huffman Encoding
 Sequence of DC difference indices and values
along with RLE of AC coefficients
 Apply Huffman encoding to sequence – Exploits
sequence’s statistics by assigning frequently used
symbols fewer bits than rare symbols
 Attach appropriate headers
 Finally have the JPEG image!
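A minimal Huffman-code construction for a hypothetical symbol-frequency table. Real JPEG uses Huffman tables in a constrained canonical format (or predefined tables); this only illustrates the principle that frequent symbols get shorter codes:

```python
import heapq
from itertools import count

def huffman(freqs):
    tick = count()   # tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent trees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]

codes = huffman({"a": 0.5, "b": 0.2, "c": 0.15, "d": 0.1, "e": 0.05})
# The most frequent symbol gets the shortest code:
print(sorted(codes, key=lambda s: len(codes[s]))[0])   # a
print({s: len(code) for s, code in codes.items()})
```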
JPEG Decoding Steps
 Basically reverse steps performed in encoding
process
– Parse compressed image data and perform
Huffman decoding to get RLE symbols
– Undo RLE to get DCT coefficient matrix
– Multiply by the quantization matrix
– Take 2-D inverse DCT of this matrix to get
reconstructed image data
Reconstruction Error
 Resulting image is “close” to original image
 Usually measure “closeness” with MSE (Mean
Squared Error) and PSNR (Peak Signal to
Noise Ratio) – Want low MSE and high PSNR
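MSE and PSNR follow directly from their definitions; the toy 2×2 "images" below are made-up data, and the peak value 255 assumes 8-bit samples:

```python
import math

orig  = [[52, 55], [61, 66]]   # original pixel values (hypothetical)
recon = [[50, 56], [60, 66]]   # reconstructed values (hypothetical)

n = sum(len(row) for row in orig)
mse = sum((a - b) ** 2
          for ro, rr in zip(orig, recon)
          for a, b in zip(ro, rr)) / n
psnr = 10 * math.log10(255 ** 2 / mse)   # undefined (infinite) if mse == 0

print(mse)             # 1.5
print(round(psnr, 1))  # 46.4 dB: lower MSE would push this higher
```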
Example - One everyday photo with
file size of 2.76 MB
Example - One everyday photo with
file size of 600 KB
Example - One everyday photo with
file size of 350 KB
Example - One everyday photo with
file size of 240 KB
Example - One everyday photo with
file size of 144 KB
Example - One everyday photo with
file size of 88 KB
Analysis
 Near-perfect image at 2.76 MB, so-so image at 88 KB
 Sharpness decreases as file size decreases
 False contours visible at 144 KB and 88 KB
– Can be fixed by dithering image before quantization
 Which file size is the best?
– No correct answer to this question
– Answer depends upon how strict we are about image quality, what
purpose image is to be used for, and the resources available