Module 1
Digital image processing serves two principal purposes:
• Improve pictorial information so that it is better suited for human interpretation.
• Facilitate autonomous (machine) perception by processing image data for storage, transmission, and representation.
Digital image: An image may be defined as a two-dimensional function f(x, y), where x and y are
spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the
intensity or gray level of the image at that point. When x, y, and the intensity values of f are all
finite, discrete quantities, we call the image a digital image. A digital image is composed of a finite
number of elements, each of which has a particular location and value. These elements are
commonly called pixels.
Digital image processing is processing of digital images using computers. We can divide the
processing of images as low-level, mid-level, and high-level processes.
A low-level process includes primitive operations such as reducing noise, enhancing
contrast, and sharpening of images. Both input and output of a low-level process are images.
A mid-level process includes segmentation (partitioning an image into regions or objects),
descriptions of those objects to reduce them to a form suitable for computer processing, and
classification of individual objects. Inputs to a mid-level process are images but outputs are
attributes extracted from those images (identity of individual objects, contours, and edges).
A high-level process involves making sense of the identified objects and performing
cognitive functions. This last step is commonly called computer vision.
Digital image processing generally involves low-level processes and mid-level processes.
Steps in digital image processing.
1. Image acquisition
2. Image enhancement
3. Image restoration
4. Color image processing
5. Wavelets
6. Compression
7. Morphological processing
8. Segmentation
9. Representation and Description
10. Recognition
Inputs to all these processes are images. The outputs of processes 1–7 are generally images, while the
outputs of processes 8–10 are generally image attributes. Process 7, morphological processing, lies
somewhere in between: its output may be images, image attributes, or both.
Image sensing – With reference to sensing, two elements are needed to acquire digital images. The
first is the physical device used to capture the image of the object of interest, and the second is the
digitizer, which converts the captured image into a digital image.
Image processing hardware – It contains the digitizer plus hardware (typically an ALU) that performs
primitive arithmetic and logic operations on an entire image. Digitization and ALU operations are
done in parallel; for example, such hardware can digitize and average video images at 30 frames per
second. This kind of performance cannot be expected from typical general-purpose computer hardware.
Computer – It is a general-purpose computer, which can range from a personal computer or laptop to a
supercomputer.
Image processing software – Such software consists of specialized modules that perform specific
operations on images, together with a programming facility for using those modules, usually exposed
as an Application Programming Interface (API). The APIs are usually available in multiple programming
languages.
Mass storage – The size of images can get out of hand very rapidly. As an example, even a tiny 16 × 16
image with 32-bit encoding occupies 16 × 16 × 4 bytes = 1 KB of memory. Most practical scenarios
involve images of far greater size and resolution. As a result, every image processing application must
include a mass storage facility. Digital storage for image processing applications falls into three
principal categories: (1) short-term storage for use during processing; (2) on-line storage for
relatively fast recall; and (3) archival storage, characterized by infrequent access. One method of
providing short-term storage is computer memory. Another is by specialized boards, called frame
buffers, that store one or more images and can be accessed rapidly, usually at video rates (e.g., at 30
complete images per second). The latter method allows virtually instantaneous image zoom, as well
as scroll (vertical shifts) and pan (horizontal shifts). Frame buffers usually are housed in the
specialized image processing hardware. On-line storage generally takes the form of magnetic disks
or optical-media storage. The key factor characterizing on-line storage is frequent access to the
stored data. Finally, archival storage is characterized by massive storage requirements but
infrequent need for access. Magnetic tapes and optical disks housed in “jukeboxes” are the usual
media for archival applications.
Image Displays – Image displays in use today are mainly color, flat screen monitors. Monitors are
driven by the outputs of image and graphics display cards that are an integral part of the computer
system.
Hardcopy Devices – Hardcopy devices for recording images include laser printers, film cameras,
heat-sensitive devices, ink-jet units, and digital units, such as optical and CD-ROM disks.
Networking – Networking and cloud communication are almost default functions in any computer
system in use today. Because of the large amount of data inherent in image processing applications,
the key consideration in image transmission is bandwidth. In dedicated networks, this typically is
not a problem, but communications with remote sites via the internet are not always as efficient.
In order to generate a 2-D image using a single sensing element, there has to be relative
displacements in both the x- and y-directions between the sensor and the area to be imaged. The
figure below shows an arrangement used in high-precision scanning, where a film negative is
mounted onto a drum whose mechanical rotation provides displacement in one dimension. The
sensor is mounted on a lead screw that provides motion in the perpendicular direction. A light
source is contained inside the drum. As the light passes through the film, its intensity is modified by
the film density before it is captured by the sensor. This "modulation" of the light intensity causes
corresponding variations in the sensor voltage, which are ultimately converted to image intensity
levels by digitization.
Image Interpolation
Interpolation is used in tasks such as zooming, shrinking, rotating, and geometrically correcting
digital images.
Interpolation is the process of using known data to estimate values at unknown locations. We begin
the discussion of this topic with a short example. Suppose that an image of size 500 * 500 pixels has
to be enlarged 1.5 times to 750 * 750 pixels. A simple way to visualize zooming is to create an
imaginary 750 * 750 grid with the same pixel spacing as the original image, then shrink it so that it
exactly overlays the original image. Obviously, the pixel spacing in the shrunken 750 * 750 grid
will be less than the pixel spacing in the original image. To assign an intensity value to any point in
the overlay, we look for its closest pixel in the underlying original image and assign the intensity of
that pixel to the new pixel in the 750 * 750 grid. When intensities have been assigned to all the
points in the overlay grid, we expand it back to the specified size to obtain the resized image. The
method just discussed is called nearest neighbor interpolation because it assigns to each new
location the intensity of its nearest neighbor in the original image. This approach is simple, but it
tends to produce undesirable artifacts, such as severe distortion of straight edges. A more
suitable approach is bilinear interpolation, in which we use the four nearest neighbors to estimate
the intensity at a given location. Let (x, y) denote the coordinates of the location to which we want
to assign an intensity value (think of it as a point of the grid described previously), and let v(x, y)
denote that intensity value. For bilinear interpolation, the assigned value is obtained using the
equation v(x, y) = ax + by + cxy + d, where the four coefficients are determined from the four
equations in four unknowns that can be written using the four nearest neighbors of point (x, y).
Bilinear interpolation gives much better results than nearest neighbor interpolation, with a modest
increase in computational burden.
The next level of complexity is bicubic interpolation, which involves the sixteen nearest
neighbors of a point. The sixteen coefficients are determined from the sixteen equations with
sixteen unknowns that can be written using the sixteen nearest neighbors of point (x, y). Bicubic
interpolation is the standard used in commercial image editing applications, such as Adobe
Photoshop and Corel Photopaint.
Although images are displayed with integer coordinates, it is possible during processing to
work with subpixel accuracy by increasing the size of the image using interpolation to “fill the
gaps” between pixels in the original image.
It is possible to use more neighbors in interpolation, and there are more complex techniques,
such as using splines or wavelets, that in some instances can yield better results than the methods
just discussed.
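As a concrete illustration of the two interpolation schemes discussed above, here is a minimal sketch in Python/NumPy. The function names, the array sizes, and the use of NumPy are my own choices for illustration, not something prescribed by the text.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor interpolation: each output pixel copies its closest input pixel."""
    h, w = img.shape
    # Map every output coordinate back onto the original grid and round to the nearest pixel.
    rows = np.clip(np.round(np.arange(new_h) * h / new_h).astype(int), 0, h - 1)
    cols = np.clip(np.round(np.arange(new_w) * w / new_w).astype(int), 0, w - 1)
    return img[rows[:, None], cols[None, :]]

def resize_bilinear(img, new_h, new_w):
    """Bilinear interpolation: equivalent to fitting v(x,y) = ax + by + cxy + d to the 4 nearest neighbors."""
    h, w = img.shape
    y = np.arange(new_h) * (h - 1) / max(new_h - 1, 1)   # fractional row positions
    x = np.arange(new_w) * (w - 1) / max(new_w - 1, 1)   # fractional column positions
    y0 = np.floor(y).astype(int); x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (y - y0)[:, None]; wx = (x - x0)[None, :]
    img = img.astype(float)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

small = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16
print(resize_nearest(small, 6, 6))
print(resize_bilinear(small, 6, 6).round(1))
```

The bilinear result is visibly smoother than the nearest-neighbor result, at a modest extra cost, which mirrors the trade-off described above.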
Basic Relationships Between Pixels
In this section, we discuss several important relationships between pixels in a digital image. When
referring in the following discussion to particular pixels, we use lowercase letters, such as p and q.
Neighbors of a Pixel
A pixel p at coordinates (x, y) has two horizontal and two vertical neighbors with coordinates
(x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1). This set of pixels, called the 4-neighbors of p, is denoted
as N4 (p). The four diagonal neighbors of p have coordinates (x + 1, y + 1), (x + 1, y − 1), (x − 1, y
+ 1), (x − 1, y − 1) and are denoted as ND (p). These neighbors, together with the 4-neighbors, are
called the 8-neighbors of p, denoted by N8(p). The set of image locations of the neighbors of a
point p is called the neighborhood of p. The neighborhood is said to be closed if it contains p.
Otherwise, the neighborhood is said to be open.
Distance Measures
For pixels p, q, and s, with coordinates (x, y), (u,v), and (w,z), respectively, D is a distance function
or metric if
a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
b) D(p,q) = D(q, p), and
c) D(p, s) ≤ D(p, q) + D(q, s).
Note that the D4 and D8 distances between p and q are independent of any paths that might exist
between these points because these distances involve only the coordinates of the points. In the case
of m-adjacency, however, the Dm distance between two points is defined as the shortest m-path
between the points. In this case, the distance between two pixels will depend on the values of the
pixels along the path, as well as the values of their neighbors.
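The D4 (city-block) and D8 (chessboard) distances referred to above, together with the Euclidean distance, are standard measures; a small sketch computing them for two pixel coordinates might look like this (the function names are my own):

```python
import math

def d_euclidean(p, q):
    # D_e(p, q) = sqrt((x - u)^2 + (y - v)^2)
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d4(p, q):
    # City-block distance: |x - u| + |y - v|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    # Chessboard distance: max(|x - u|, |y - v|)
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (2, 3), (5, 7)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0 7 4
```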
Arithmetic Operations: Arithmetic operations between two images f (x, y) and g(x, y) are denoted
as
s(x,y) = f(x,y) + g(x,y)
d(x,y) = f(x,y) – g(x,y)
p(x,y) = f(x,y) × g(x,y)
v(x,y) = f(x,y) / g(x,y)
These are element-wise operations which means that they are performed between corresponding
pixel pairs in f and g for x = 0, 1, 2,…,M − 1 and y = 0, 1, 2,…, N − 1. As usual, M and N are the
row and column sizes of the images. Clearly, s, d, p, and v are images of size M × N also.
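A tiny NumPy illustration of these element-wise operations follows; the specific arrays are arbitrary examples of mine, not data from the text.

```python
import numpy as np

f = np.array([[10, 20], [30, 40]], dtype=float)
g = np.array([[ 1,  2], [ 5, 10]], dtype=float)

s = f + g          # s(x, y) = f(x, y) + g(x, y)
d = f - g          # d(x, y) = f(x, y) - g(x, y)
p = f * g          # p(x, y) = f(x, y) * g(x, y)   (element-wise, not a matrix product)
v = f / g          # v(x, y) = f(x, y) / g(x, y)   (assumes g has no zeros)

print(s, d, p, v, sep="\n")
```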
Addition of images for noise reduction: Suppose that g(x, y) is a corrupted image formed by the
addition of noise η(x, y) to a noiseless image f(x, y); that is, g(x, y) = f(x, y) + η(x, y), where
the assumption is that at every pair of coordinates (x, y) the noise is uncorrelated and has zero
average value. We assume also that the noise and image values are uncorrelated (this is a typical
assumption for additive noise). The objective of the following procedure is to reduce the noise
content of the output image by adding a set of noisy input images, {gi(x,y)}. This is a technique
used frequently for image enhancement. If the noise satisfies the constraints just stated, it can be
shown that if an image ḡ(x, y) is formed by averaging K different noisy images,
ḡ(x, y) = (1/K) Σ g_i(x, y), with the sum taken over i = 1, …, K,
then it follows that E{ḡ(x, y)} = f(x, y) and σ²_ḡ(x, y) = (1/K) σ²_η(x, y). As K increases (more and
more images are averaged), the variance decreases; as a result, the variability of the pixel values at
each location decreases and the averaged image converges to its expected value f(x, y), the noiseless
image. In order to avoid blurring, the images must be aligned spatially (registered) before averaging.
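The variance reduction by a factor of K can be checked numerically. The following sketch simulates the situation with zero-mean Gaussian noise (the noise model, image size, and noise level are assumptions made for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)                 # noiseless image with constant intensity 100

# K noisy observations g_i(x, y) = f(x, y) + eta_i(x, y), zero-mean uncorrelated noise
K = 50
noisy = [f + rng.normal(0.0, 20.0, f.shape) for _ in range(K)]

g_bar = np.mean(noisy, axis=0)               # g_bar = (1/K) * sum_i g_i

print(np.var(noisy[0] - f))                  # ~400, the noise variance of a single image
print(np.var(g_bar - f))                     # ~400 / K = ~8, reduced by a factor of K
```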
An important application of image averaging is in the field of astronomy, where imaging
under very low light levels often causes sensor noise to render individual images virtually useless for
analysis (lowering the temperature of the sensor helps reduce noise).
Subtracting images for comparison: If two images are subtracted pixel-by-pixel, then the
resulting image will contain pixel values 0 (black) where there were no differences, 1 (white) where
pixels were completely different, and intermediate values elsewhere. One use of this technique is to
determine how far the resolution of an image can acceptably be reduced. If an image can be stored at a
lower resolution, it occupies less space in memory, but reducing resolution means losing information.
Subtracting images of the same object captured at different resolutions shows how much reduction is
acceptable, so that a balance is struck between loss of information and the size of the image.
Figure: (a) difference between the 930 dpi and 72 dpi images; (b) difference between the 930 dpi and 150 dpi images; (c) difference between the 930 dpi and 300 dpi images.
Using image multiplication and division for shading correction and for masking.
An important application of image multiplication (and division) is shading correction. Suppose that
an imaging sensor produces images that can be modeled as the product of a “perfect image,”
denoted by f (x, y), times a shading function, h(x, y); that is, g(x, y) = f (x, y)h(x, y). If h(x, y) is
known or can be estimated, we can obtain f (x, y) (or an estimate of it) by multiplying the sensed
image by the inverse of h(x, y) (i.e., dividing g by h using element wise division).
Another use of image multiplication is in masking, also called region of interest (ROI),
operations. The process consists of multiplying a given image by a mask image that has 1’s in the
ROI and 0’s elsewhere. There can be more than one ROI in the mask image, and the shape of the
ROI can be arbitrary.
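Both uses, shading correction by element-wise division and ROI masking by element-wise multiplication, are easy to sketch in NumPy. The shading function h and the square ROI below are arbitrary assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.integers(100, 200, size=(8, 8)).astype(float)    # synthetic "perfect" image

# Shading correction: g = f * h; recover an estimate of f by dividing g by h.
y, x = np.mgrid[0:8, 0:8]
h = 0.5 + 0.5 * x / 7.0            # assumed shading function, brighter toward the right
g = f * h
f_hat = g / h                      # element-wise division undoes the shading
print(np.allclose(f_hat, f))       # True

# ROI masking: multiply by a mask image with 1's inside the ROI and 0's elsewhere.
mask = np.zeros_like(f)
mask[2:6, 2:6] = 1.0               # a square ROI; the shape and number of ROIs are arbitrary
roi_only = g * mask
```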
Geometric Transformations
We use geometric transformations to modify the spatial arrangement of pixels in an image. These
transformations are called rubber-sheet transformations because they may be viewed as analogous
to “printing” an image on a rubber sheet, then stretching or shrinking the sheet according to a
predefined set of rules. Geometric transformations of digital images consist of two basic operations:
1. Spatial transformation of coordinates.
2. Intensity interpolation that assigns intensity values to the spatially transformed pixels.
The transformation of coordinates may be expressed as
    [x′]        [x]     [t11  t12] [x]
    [y′]  =  T  [y]  =  [t21  t22] [y]
where (x,y) are pixel coordinates in the original image and (x′, y′) are the corresponding pixel
coordinates of the transformed image. For example, the transformation (x′, y′) = (x/2,y/2) shrinks
the original image to half its size in both spatial directions.
Our interest is in so-called affine transformations, which include scaling, translation, rotation, and
shearing. The key characteristic of an affine transformation in 2-D is that it preserves points,
straight lines, and planes. The above can be used to express the transformations just mentioned,
except translation, which would require that a constant 2-D vector be added to the right side of the
equation. However, it is possible to use homogeneous coordinates to express all four affine
transformations using a single 3 × 3 matrix in the following general form:
    [x′]        [x]     [a11  a12  a13] [x]
    [y′]  =  A  [y]  =  [a21  a22  a23] [y]
    [1 ]        [1]     [ 0    0    1 ] [1]
This transformation can scale, rotate, translate, or shear an image, depending on the values chosen
for the elements of matrix A.
A significant advantage of being able to perform all transformations using the unified representation
in above equation is that it provides the framework for concatenating a sequence of operations. For
example, if we want to resize an image, rotate it, and move the result to some location, we simply
form a 3 × 3 matrix equal to the product of the scaling, rotation, and translation matrices.
We can use the equation in two basic ways. The first is forward mapping, which consists of
scanning the pixels of the input image and, at each location (x,y), computing the spatial location (x′,
y′) of the corresponding pixel in the output image using the equation directly. A problem with the
forward mapping approach is that two or more pixels in the input image can be transformed to the
same location in the output image, raising the question of how to combine multiple output values
into a single output pixel value. In addition, it is possible that some output locations may not be
assigned a pixel at all.
The second approach, called inverse mapping, scans the output pixel locations and, at each location
(x′, y′), computes the corresponding location in the input image using (x, y) = A⁻¹(x′, y′). It then
interpolates among the nearest input pixels to determine the intensity of the output pixel value.
Inverse mappings are more efficient to implement than forward mappings, and are used in
numerous commercial implementations of spatial transformations (for example, MATLAB uses this
approach).
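The following sketch applies an affine transform by inverse mapping with nearest-neighbor interpolation. The 3 × 3 homogeneous matrices, the output size, and the use of nearest-neighbor (rather than bilinear) interpolation are assumptions chosen to keep the example short:

```python
import numpy as np

def affine_inverse_map(img, A, out_shape):
    """Apply an affine transform by inverse mapping with nearest-neighbor interpolation.

    A is a 3x3 matrix in homogeneous coordinates mapping input (x, y, 1) to output (x', y', 1),
    with x as the column index and y as the row index.
    """
    H, W = out_shape
    A_inv = np.linalg.inv(A)
    out = np.zeros(out_shape, dtype=img.dtype)
    for yp in range(H):
        for xp in range(W):
            x, y, _ = A_inv @ np.array([xp, yp, 1.0])     # back-project the output pixel
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < img.shape[0] and 0 <= xi < img.shape[1]:
                out[yp, xp] = img[yi, xi]                 # nearest input pixel
    return out

img = np.arange(100, dtype=np.uint8).reshape(10, 10)
scale = np.array([[2.0, 0, 0], [0, 2.0, 0], [0, 0, 1]])   # scale by 2 in x and y
shift = np.array([[1.0, 0, 3], [0, 1.0, 2], [0, 0, 1]])   # translate by (3, 2)
A = shift @ scale                                          # concatenation of operations
zoomed = affine_inverse_map(img, A, (20, 20))
```

Concatenating the scaling and translation matrices before applying them is exactly the advantage of the unified 3 × 3 representation described above.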
Intensity Transformations and Spatial Filtering
Spatial domain techniques operate directly on the pixels of an image. The spatial domain processes
we discuss here are based on the expression g(x, y) = T[f (x, y)] where f (x, y) is an input image, g(x,
y) is the output image, and T is an operator on f defined over a neighborhood of point (x, y). The
operator can be applied to the pixels of a single image or to the pixels of a set of images, such as
performing the element wise sum of a sequence of images for noise reduction.
The following figure shows the basic implementation on a single image. The point (x0 , y0 ) shown
is an arbitrary location in the image, and the small region shown is a neighborhood of (x0 , y0 ).
Typically, the neighborhood is rectangular, centered on (x0 , y0), and much smaller in size than the
image.
The process that the figure illustrates consists of moving the center of the neighborhood from pixel
to pixel, and applying the operator T to the pixels in the neighborhood to yield an output value at
that location. Thus, for any specific location (x0 , y0 ), the value of the output image g at those
coordinates is equal to the result of applying T to the
neighborhood with origin at (x0 , y0 ) in f.
For example, suppose that the neighborhood is a square of size 3 × 3 and that operator T is defined as
“compute the average intensity of the pixels in the neighborhood.” Consider an arbitrary location in an
image, say (100,150). The result at that location in the output image, g(100,150), is the sum of
f(100,150) and its 8-neighbors, divided by 9. The center of the neighborhood is then moved to the next
adjacent location and the procedure is repeated to generate the next value of the output image g.
Typically, the process starts at the top left of the input image and proceeds pixel by pixel in a
horizontal (vertical) scan, one row (column) at a time.
The smallest possible neighborhood is of size 1 × 1. In this case, g depends only on the value of f at
a single point (x, y) and T becomes an intensity (also called a gray-level, or mapping)
transformation function of the form s = T(r), where, for simplicity in notation, we use s and r to
denote, respectively, the intensity of g and f at any point (x, y).
Figure: (A) a contrast-stretching transformation function T(r); (B) a thresholding function.
For example, if T(r) has the form in Fig. A, the result of applying the transformation to every pixel
in f to generate the corresponding pixels in g would be to produce an image of higher contrast than
the original, by darkening the intensity levels below k and brightening the levels above k. In this
technique, sometimes called contrast stretching, values of r lower than k reduce (darken) the values
of s, toward black. The opposite is true for values of r higher than k. Observe how an intensity value
r0 is mapped to obtain the corresponding value s0. In the limiting case shown in Fig. B, T(r)
produces a two level (binary) image. A mapping of this form is called a thresholding function.
Approaches whose results depend only on the intensity at a point sometimes are called point
processing techniques.
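As a small point-processing sketch, the code below applies a thresholding function (Fig. B) and one possible contrast-stretching curve (Fig. A) pixel by pixel. The particular stretching curve, the value of k, and the test data are my own choices; the text only fixes the qualitative shape (darken below k, brighten above k):

```python
import numpy as np

L = 256
img = np.random.default_rng(2).integers(0, L, size=(4, 4))
k = 128                                          # the value "k" of Fig. A / Fig. B

# Fig. B: thresholding -- a two-level (binary) point transformation s = T(r)
binary = np.where(img < k, 0, L - 1)

# Fig. A: one possible contrast-stretching curve that darkens values below k
# and brightens values above k (the exact curve shape is an assumption).
r = img.astype(float)
stretched = (L - 1) / (1.0 + (k / np.maximum(r, 1.0)) ** 2)
```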
Some Basic Intensity Transformation Functions
Image Negatives
The negative of an image with intensity levels in the range [0, L − 1] is obtained by using the
negative transformation function, which has the form: s = L − 1 − r. Reversing the intensity levels
of a digital image in this manner produces the equivalent of a photographic negative. This type of
processing is used, for example, in enhancing white or gray detail embedded in dark regions of an
image, especially when the black areas are dominant in size.
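The negative transformation is a one-line point operation; a minimal sketch (test image chosen arbitrarily):

```python
import numpy as np

L = 256                                   # number of intensity levels (8-bit image)
img = np.random.default_rng(3).integers(0, L, size=(4, 4), dtype=np.uint8)

negative = (L - 1) - img                  # s = L - 1 - r, applied to every pixel
```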
Log Transformations
The general form of the log transformation is s = c log(1 + r), where c is a constant and it is assumed
that r ≥ 0. The shape of the log curve shows that this transformation maps a narrow range of low
intensity values in the input into a wider range of output levels. For example, input levels in the
range [0, L/4] map to output levels in the range [0, 3L/4]. Conversely, higher values of input levels
are mapped to a narrower range in the output. We use a transformation of this type to expand the values
of dark pixels in an image while compressing the higher-level values. The opposite is true of the
inverse log (exponential) transformation.
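A short sketch of the log transformation; the choice of c below (scaling the output back into [0, L − 1]) is a common convention, not mandated by the text:

```python
import numpy as np

L = 256
r = np.random.default_rng(4).integers(0, L, size=(4, 4)).astype(float)

c = (L - 1) / np.log(L)                   # chosen so that r = L - 1 maps to s = L - 1
s = c * np.log(1.0 + r)                   # s = c * log(1 + r): expands dark values

s = np.clip(s, 0, L - 1).astype(np.uint8)
```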
Contrast Stretching
Low-contrast images can result from poor illumination, lack of dynamic range in the imaging
sensor, or even the wrong setting of a lens aperture during image acquisition. Contrast stretching
expands the range of intensity levels in an image so that it spans the ideal full intensity range of the
recording medium or display device.
Intensity-Level Slicing
There are applications in which it is of interest to highlight a specific range of intensities in an
image. Some of these applications include enhancing features in satellite imagery, such as masses of
water, and enhancing flaws in X-ray images. The method, called intensity-level slicing, can be
implemented in several ways, but most are variations of two basic themes. One approach is to
display in one value (say, white) all the values in the range of interest and in another (say, black) all
other intensities. This transformation produces a binary image. The second approach brightens (or
darkens) the desired range of intensities but leaves all other intensity levels in the image
unchanged.
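Both intensity-level slicing variants are simple point operations; a sketch with an arbitrarily chosen range of interest:

```python
import numpy as np

img = np.random.default_rng(5).integers(0, 256, size=(6, 6), dtype=np.uint8)
lo, hi = 100, 150                          # range of interest (arbitrary choice)

# Approach 1: binary output -- white inside the range, black elsewhere.
binary = np.where((img >= lo) & (img <= hi), 255, 0).astype(np.uint8)

# Approach 2: brighten the range of interest, leave everything else unchanged.
highlighted = img.copy()
highlighted[(img >= lo) & (img <= hi)] = 230
```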
Bit-Plane Slicing
Pixel values are integers composed of bits. For example, values in a 256-level grayscale image are
composed of 8 bits (one byte). Instead of highlighting intensity-level ranges, we could highlight the
contribution made to total image appearance by specific bits. An 8-bit image may be considered as
being composed of eight one-bit planes, with plane 1 containing the lowest-order bit of all pixels in
the image, and plane 8 all the highest-order bits.
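Extracting the eight bit-planes of an 8-bit image amounts to shifting and masking each pixel value; a minimal sketch:

```python
import numpy as np

img = np.random.default_rng(6).integers(0, 256, size=(4, 4), dtype=np.uint8)

# Plane 1 holds the lowest-order bit of every pixel, plane 8 the highest-order bit.
planes = [(img >> (k - 1)) & 1 for k in range(1, 9)]

print(planes[7])   # the most significant bit-plane (plane 8)
```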
Histogram Processing
Let rk, for k = 0, 1, 2, …, L − 1, denote the intensities of an L-level digital image, f(x, y). The
unnormalized histogram of f is defined as h(rk) = nk, for k = 0, 1, 2, …, L − 1, where nk is the number
of pixels in f with intensity rk, and the subdivisions of the intensity scale are called histogram bins.
Similarly, the normalized histogram of f is defined as p(rk) = h(rk) / MN = nk / MN, where, as usual,
M and N are the number of image rows and columns, respectively. Mostly, we work with
normalized histograms, which we refer to simply as histograms or image histograms. The sum of
p(rk) over all values of k is always 1. The components of p(rk) are estimates of the probabilities of
intensity levels occurring in an image. Histograms are simple to compute and are also suitable for
fast hardware implementations, thus making histogram-based techniques a popular tool for real-
time image processing.
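Computing the unnormalized and normalized histograms takes only a few lines; the test image below is an arbitrary random example:

```python
import numpy as np

L = 256
img = np.random.default_rng(7).integers(0, L, size=(128, 128), dtype=np.uint8)
M, N = img.shape

h = np.bincount(img.ravel(), minlength=L)      # unnormalized histogram: h(r_k) = n_k
p = h / (M * N)                                # normalized histogram: p(r_k) = n_k / MN

print(p.sum())                                 # 1.0
```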
In a dark image, the most populated histogram bins are concentrated on the lower (dark) end
of the intensity scale. Similarly, the most populated bins of a light image are biased toward the
higher end of the scale. An image with low contrast has a narrow histogram located typically toward
the middle of the intensity scale. For a monochrome image, this implies a dull, washed-out gray
look. Finally, we see that the components of the histogram of the high-contrast image cover a wide
range of the intensity scale, and the distribution of pixels is not too far from uniform, with few bins
being much higher than the others. Intuitively, it is reasonable to conclude that an image whose
pixels tend to occupy the entire range of possible intensity levels and, in addition, tend to be
distributed uniformly, will have an appearance of high contrast and will exhibit a large variety of
gray tones. The net effect will be an image that shows a great deal of gray-level detail and has a
high dynamic range.
Histogram Equalization
Assuming initially continuous intensity values, let the variable r denote the intensities of an image
to be processed. As usual, we assume that r is in the range [0,L − 1], with r = 0 representing black
and r = L − 1 representing white. For r satisfying these conditions, we focus attention on
transformations (intensity mappings) of the form s = T(r), 0 ≤ r ≤ L − 1, that produce an output
intensity value, s, for a given intensity value r in the input image. We assume that,
(a) T(r) is a monotonic increasing or strictly monotonic increasing function in the interval 0 ≤ r
≤ L − 1; and
(b) 0 ≤ T(r) ≤ L − 1 for 0 ≤ r ≤ L − 1.
The transformation equation for discrete values of k can be written as follows.
sk = T(rk) = (L − 1) Σ pr(rj), with the sum taken over j = 0, 1, …, k, for k = 0, 1, 2, …, L − 1,
where the probability of occurrence of intensity level rk in a digital image is given by
pr(rk) = nk / MN. This is called the histogram equalization or histogram linearization transformation.
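The discrete transformation is simply the scaled cumulative sum of the normalized histogram, applied as a lookup table. A sketch, using an artificially low-contrast image as input (the test data and the 0–79 intensity range are assumptions for the demo):

```python
import numpy as np

L = 256
img = np.random.default_rng(8).integers(0, 80, size=(64, 64), dtype=np.uint8)  # low-contrast
M, N = img.shape

p = np.bincount(img.ravel(), minlength=L) / (M * N)    # p_r(r_k) = n_k / MN
s = np.round((L - 1) * np.cumsum(p)).astype(np.uint8)  # s_k = (L - 1) * sum_{j<=k} p_r(r_j)

equalized = s[img]                                     # apply the mapping to every pixel
print(img.max(), equalized.max())                      # output now reaches the top of [0, 255]
```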
The left column in the figure shows the four images, and the center column shows the result of
performing histogram equalization on each of them. The first three results, from top to bottom, show
significant improvement. As expected, histogram equalization did not have much effect on the fourth
image because its intensities already span almost the full scale.
For example, consider a 1-D function (such as an intensity scan line through an image) and suppose
that we want to eliminate all its frequencies above a cutoff value, u0 , while “passing” all
frequencies below that value. Figure (a) shows a frequency-domain filter function for doing this.
(The term filter transfer function is used to denote filter functions in the frequency domain—this is
analogous to our use of the term “filter kernel” in the spatial domain.) Appropriately, the function in
Fig. (a) is called a lowpass filter transfer function. In fact, this is an ideal lowpass filter function
because it eliminates all frequencies above u0, while passing all frequencies below this value. That
is, the transition of the filter between low and high frequencies is instantaneous. Such filter
functions are not realizable with physical components, and have issues with “ringing” when
implemented digitally.
To lowpass-filter a spatial signal in the frequency domain, we first convert it to the frequency
domain by computing its Fourier transform, and then multiply the result by the filter transfer
function in Fig. (a) to eliminate frequency components with values higher than u0. To return to the
spatial domain, we take the inverse Fourier transform of the filtered signal. The result will be a
blurred spatial domain function.
Because of the duality between the spatial and frequency domains, we can obtain the same result in
the spatial domain by convolving the equivalent spatial domain filter kernel with the input spatial
function. The equivalent spatial filter kernel is the inverse Fourier transform of the frequency-
domain filter transfer function.
Approaches of constructing spatial filters
There are three general approaches as described below.
• One approach is based on formulating filters based on mathematical properties. For
example, a filter that computes the average of pixels in a neighbourhood blurs an image.
Computing an average is analogous to integration. Conversely, a filter that computes the
local derivative of an image sharpens the image.
• A second approach is based on sampling a 2-D spatial function whose shape has a desired
property. For example, we will show in the next section that samples from a Gaussian
function can be used to construct a weighted-average (lowpass) filter. These 2-D spatial
functions sometimes are generated as the inverse Fourier transform of 2-D filters specified
in the frequency domain.
• A third approach is to design a spatial filter with a specified frequency response. This
approach is based on the concepts discussed in the previous section, and falls in the area of
digital filter design. A 1-D spatial filter with the desired response is obtained (typically using
filter design software). The 1-D filter values can be expressed as a vector v, and a 2-D
separable kernel can then be obtained. Or the 1-D filter can be rotated about its centre to
generate a 2-D kernel that approximates a circularly symmetric function.
Smoothing (Lowpass) Spatial Filters
Smoothing (also called averaging) spatial filters are used to reduce sharp transitions in intensity.
Because random noise typically consists of sharp transitions in intensity, an obvious application of
smoothing is noise reduction. Smoothing prior to image resampling to reduce aliasing, is also a
common application. Smoothing is used to reduce irrelevant detail in an image, where “irrelevant”
refers to pixel regions that are small with respect to the size of the filter kernel. Another application
is for smoothing the false contours that result from using an insufficient number of intensity levels
in an image. Smoothing filters are used in combination with other techniques for image
enhancement, such as the histogram processing techniques.
Box Filter Kernels
The simplest, separable lowpass filter kernel is the box kernel, whose coefficients have the same
value (typically 1). The name “box kernel” comes from a constant kernel resembling a box when
viewed in 3-D. An m × n box filter is an m × n array of 1’s, with a normalizing constant in front,
whose value is 1 divided by the sum of the values of the coefficients (i.e., 1/mn when all the
coefficients are 1’s). This normalization, which we apply to all lowpass kernels, has two purposes.
First, the average value of an area of constant intensity would equal that intensity in the filtered
image, as it should. Second, normalizing the kernel in this way prevents introducing a bias during
filtering; that is, the sum of the pixels in the original and filtered images will be the same. Because
all rows and columns of a box kernel are identical, the kernel is separable.
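A direct (unoptimized) implementation of normalized box filtering follows. The replicate-style border handling and the impulse test image are choices of mine; a separable or FFT-based implementation would be faster but is not shown:

```python
import numpy as np

def box_filter(img, m, n):
    """Lowpass-filter img with a normalized m x n box kernel (all 1's, scaled by 1/(m*n))."""
    img = img.astype(float)
    H, W = img.shape
    pad_y, pad_x = m // 2, n // 2
    padded = np.pad(img, ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")  # replicate padding
    out = np.zeros_like(img)
    for y in range(H):
        for x in range(W):
            out[y, x] = padded[y:y + m, x:x + n].mean()   # average of the neighborhood
    return out

img = np.zeros((9, 9)); img[4, 4] = 81.0
print(box_filter(img, 3, 3))     # the impulse is spread into a 3x3 plateau of 9's
```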
Figure: (a) test pattern of size 1024 × 1024 pixels; (b)–(d) results of lowpass filtering with box kernels of sizes 3 × 3, 11 × 11, and 21 × 21, respectively.
Figure: examples of smoothing kernels: (a) a box kernel; (b) a Gaussian kernel.
Figure: (a) sampling a Gaussian function to obtain a discrete Gaussian kernel (the values shown are for K = 1 and s = 1); (b) the resulting 3 × 3 kernel.
Figure: (a) a test pattern of size 1024 × 1024; (b) result of lowpass filtering the pattern with a Gaussian kernel of size 21 × 21 and standard deviation s = 3.5; (c) result of using a kernel of size 43 × 43 with s = 7. K = 1 in all cases.
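The sampled Gaussian kernel referred to in the captions above can be generated by evaluating G(s, t) = K·exp(−(s² + t²)/(2σ²)) on an integer grid and normalizing. The normalization step and the function name are my own conventions:

```python
import numpy as np

def gaussian_kernel(size, sigma, K=1.0):
    """Sample K * exp(-(s^2 + t^2) / (2*sigma^2)) on a size x size grid, then normalize."""
    half = size // 2
    s, t = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = K * np.exp(-(s**2 + t**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()          # normalize so the coefficients sum to 1

print(gaussian_kernel(3, 1.0).round(4))   # a 3 x 3 kernel for K = 1, sigma = 1
```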
Using lowpass filtering and thresholding for region extraction.
Figure (a) is a 2566 × 2758 Hubble Telescope image of the Hickson Compact Group (see figure
caption), whose intensities were scaled to the range [0,1]. Our objective is to illustrate lowpass
filtering combined with intensity thresholding for eliminating irrelevant detail in this image. In the
present context, “irrelevant” refers to pixel regions that are small compared to kernel size.
Figure (b) is the result of filtering the original image with a Gaussian kernel of size 151 × 151
(approximately 6% of the image width) and standard deviation s = 25. We chose these parameter
values in order to generate a sharper, more selective Gaussian kernel shape than we used in earlier
examples. The filtered image shows four predominantly bright regions. We wish to extract only
those regions from the image. Figure (c) is the result of thresholding the filtered image with a
threshold T = 0.4. As the figure shows, this approach effectively extracted the four regions of
interest and eliminated details deemed irrelevant in this application.
Figure: (a) a 2566 × 2758 Hubble Telescope image of the Hickson Compact Group; (b) result of lowpass filtering with a Gaussian kernel; (c) result of thresholding the filtered image (intensities were scaled to the range [0, 1]). The Hickson Compact Group contains dwarf galaxies that have come together, setting off thousands of new star clusters. (Original image courtesy of NASA.)
Order-Statistic (Nonlinear) Filters
Order-statistic filters are nonlinear spatial filters whose response is based on ordering (ranking) the
pixels contained in the region encompassed by the filter. Smoothing is achieved by replacing the
value of the center pixel with the value determined by the ranking result. The best-known filter in
this category is the median filter, which, as its name implies, replaces the value of the center pixel
by the median of the intensity values in the neighborhood of that pixel (the value of the center pixel
is included in computing the median).
Median filters provide excellent noise reduction capabilities for certain types of random noise, with
considerably less blurring than linear smoothing filters of similar size. Median filters are
particularly effective in the presence of impulse noise (sometimes called salt-and-pepper noise
when it manifests itself as white and black dots superimposed on an image).
In order to perform median filtering at a point in an image, we first sort the values of the pixels in
the neighborhood, determine their median, and assign that value to the pixel in the filtered image
corresponding to the center of the neighborhood. For example, in a 3 × 3 neighborhood the median
is the 5th largest value, in a 5 × 5 neighborhood it is the 13th largest value, and so on. When several
values in a neighborhood are the same, all equal values are grouped. For example, suppose that a 3
× 3 neighborhood has values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10,
15, 20, 20, 20, 20, 20, 25, 100), which results in a median of 20.
The median filter is by far the most useful order-statistic filter in image processing, but is not the
only one. The median represents the 50th percentile of a ranked set of numbers, but ranking lends
itself to many other possibilities. For example, using the 100th percentile results in the so-called
max filter, which is useful for finding the brightest points in an image or for eroding dark areas
adjacent to light regions. The 0th percentile filter is the min filter, used for the opposite purpose.
Sharpening (Highpass) Spatial Filters
Sharpening highlights transitions in intensity. Uses of image sharpening range from electronic
printing and medical imaging to industrial inspection and autonomous guidance in military systems.
Unsharp Masking And Highboost Filtering
Subtracting an unsharp (smoothed) version of an image from the original image is a process that has
been used since the 1930s by the printing and publishing industry to sharpen images. This process,
called unsharp masking, consists of the following steps:
1. Blur the original image.
2. Subtract the blurred image from the original (the resulting difference is called the mask.)
3. Add the mask to the original.
Let fB(x, y) denote the blurred image and f(x, y) the original image; then the mask is defined as
gmask(x, y) = f(x, y) − fB(x, y). To get the sharpened image, we add a weighted portion of the mask back
to the original image: fsharp(x, y) = f(x, y) + k·gmask(x, y). When k = 1 we have unsharp masking, as
defined above. When k > 1, the process is referred to as highboost filtering. Choosing k < 1 reduces the
contribution of the unsharp mask.
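The three steps map directly to code. In the sketch below the blurring step uses a simple 3 × 3 box filter, which is an assumption; any lowpass filter (e.g., a Gaussian) could be used instead:

```python
import numpy as np

def blur3x3(img):
    """A simple 3 x 3 box blur used as the smoothing step (any lowpass filter would do)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = padded[y:y + 3, x:x + 3].mean()
    return out

def unsharp_mask(f, k=1.0):
    f = f.astype(float)
    f_blur = blur3x3(f)                    # step 1: blur the original image
    g_mask = f - f_blur                    # step 2: mask = original - blurred
    g = f + k * g_mask                     # step 3: add the (weighted) mask back
    return np.clip(g, 0, 255).astype(np.uint8)

img = np.tile(np.linspace(0, 255, 16, dtype=np.uint8), (16, 1))
sharpened = unsharp_mask(img, k=1.0)       # k = 1: unsharp masking; k > 1: highboost filtering
```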
Sobel filter: Let a 3 × 3 region of an image be represented by the pixel values z1 through z9, arranged
row by row with z5 at the center. We prefer kernels of odd sizes because they have a unique (integer)
center of spatial symmetry. The smallest kernels in which we are interested are of size 3 × 3.
Approximations to gx and gy using a 3 × 3 neighborhood centered on z5 are as follows:
gx = ∂f/∂x = (z7 + 2z8 + z9) − (z1 + 2z2 + z3) and gy = ∂f/∂y = (z3 + 2z6 + z9) − (z1 + 2z4 + z7).
The resultant image intensities are computed as
M(x, y) = [gx² + gy²]^(1/2) = {[(z7 + 2z8 + z9) − (z1 + 2z2 + z3)]² + [(z3 + 2z6 + z9) − (z1 + 2z4 + z7)]²}^(1/2).
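The two Sobel sums above correspond to 3 × 3 kernels; the sketch below applies them and combines the results into the gradient magnitude M(x, y). The edge-replicating border handling and the step-edge test image are assumptions of the example:

```python
import numpy as np

def sobel_magnitude(img):
    """Approximate M(x, y) = sqrt(gx^2 + gy^2) using the 3 x 3 Sobel kernels."""
    img = img.astype(float)
    # Kernels matching gx = (z7 + 2*z8 + z9) - (z1 + 2*z2 + z3) and
    #                  gy = (z3 + 2*z6 + z9) - (z1 + 2*z4 + z7)
    kx = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)
    ky = np.array([[-1,  0,  1],
                   [-2,  0,  2],
                   [-1,  0,  1]], dtype=float)
    padded = np.pad(img, 1, mode="edge")
    M = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            region = padded[y:y + 3, x:x + 3]
            gx = np.sum(kx * region)
            gy = np.sum(ky * region)
            M[y, x] = np.hypot(gx, gy)
    return M

img = np.zeros((8, 8)); img[:, 4:] = 100.0      # a vertical step edge
print(sobel_magnitude(img)[4])                  # strong response along the edge
```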
Fourier Series And Transform
The Fourier series is named after the French mathematician Jean Baptiste Joseph Fourier. His
contribution was that any periodic function can be expressed as the sum of sines and/or cosines of
different frequencies, each multiplied by a different coefficient. It does not matter how complicated
the function is; if it is periodic and satisfies some mathematical conditions, it can be represented by
such a sum.
Functions that are not periodic (but whose area under the curve is finite) can be expressed as
the integral of sines and/or cosines multiplied by a weighting function. The formulation in this case
is the Fourier transform, and its utility is even greater than the Fourier series in many theoretical
and applied disciplines.
Both representations share the important characteristic that a function, expressed in either a
Fourier series or transform, can be reconstructed (recovered) completely via an inverse process,
with no loss of information. This is one of the most important characteristics of these
representations because it allows us to work in the Fourier domain (generally called the frequency
domain) and then return to the original domain of the function without losing any information.
Frequency Domain Filtering Fundamentals
Filtering in the frequency domain consists of modifying the Fourier transform of an image, then
computing the inverse transform to obtain the spatial domain representation of the processed result.
Thus, given (a padded) digital image, f(x,y), of size P ×Q pixels, the basic filtering equation in
which we are interested has the form:
g(x, y) = Real{ F⁻¹[ H(u, v) F(u, v) ] }, where F⁻¹ is the IDFT, F(u,v) is the DFT of the input
image, f(x, y), H(u,v) is a filter transfer function (which we often call just a filter or filter function),
and g(x, y) is the filtered (output) image. Functions F, H, and g are arrays of size P ×Q, the same as
the padded input image. The product H(u,v)F(u,v) is formed using elementwise multiplication.
One of the simplest filter transfer functions we can construct is a function H(u,v) that is 0 at
the center of the (centered) transform, and 1’s elsewhere. This filter would reject the dc term and
“pass” (i.e., leave unchanged) all other terms of F(u,v) when we form the product H(u,v)F(u,v). We
know that the dc term is responsible for the average intensity of an image, so setting it to zero will
reduce the average intensity of the output image to zero. As a result, the image becomes much darker.
An average of zero implies the existence of negative intensities.
As noted earlier, low frequencies in the transform are related to slowly varying intensity
components in an image, such as the walls of a room or a cloudless sky in an outdoor scene. On the
other hand, high frequencies are caused by sharp transitions in intensity, such as edges and noise.
Therefore, we would expect that a function H(u,v) that attenuates (blocks) high frequencies while
passing low frequencies (called a lowpass filter, as noted before) would blur an image, while a filter
with the opposite property (called a highpass filter) would enhance sharp detail but cause a reduction
in contrast in the image.
Adding a small constant to the filter does not affect sharpening appreciably, but it does
prevent elimination of the dc term and thus preserves tonality.
Wraparound, saturation and zero padding: If an image is represented in a byte or integer pixel
format, the maximum pixel value is limited by the number of bits used for the representation; e.g., the
pixel values of an 8-bit image are limited to 255. However, many image processing operations produce
output values that exceed this maximum, and we have to decide how to handle the resulting pixel
overflow. This is particularly relevant for operations such as image addition and Fourier-based
processing, which combine values from two images or from many frequency components.
One possibility is to wrap around the overflowing pixel values. This means that if a value is
greater than the possible maximum, we subtract the pixel value range so that the value starts again
from the possible minimum value. For 8-bit format it means that 256 will be replaced by 0, 257 by
1, and so on.
Another possibility is to set all overflowing pixels to the maximum possible value, an effect
known as saturation. For the 8-bit format this means that all values greater than 255 are replaced
by 255.
If only a few pixels in the image exceed the maximum value, it is often better to apply the
latter technique, especially if the image is used for display purposes. However, by setting all
overflowing pixels to the same value we may lose a significant amount of information; in the worst
case, when all pixels exceed the maximum value, this would lead to an image of constant pixel values.
Wrapping around overflowing pixel values retains the differences between them. On the other hand,
values passing the maximum “jump” from the maximum to the minimum value, which may distort the
resulting image.
In digital image processing, a filter works like a sliding window containing (usually) an odd
number of pixels arranged in a grid. The filter is moved across the image, starting from the top-left
corner and sweeping across every row of pixels. For pixels on the edges of the image, part of the
filter falls outside the image area. This is compensated for by wrapping the filter around to the
opposite end of the image, which is known as circular convolution. In circular convolution, pixels on
one edge of the image therefore influence pixels on the opposite edge, which may result in distortion
at the edges of the image(s).
To remedy this problem, extra pixels are appended along the edges of the images, and those
pixels are usually set to zero. This is known as zero padding.
There are two other types of padding –
a) Mirror padding – In mirror padding, values outside the boundary of the image are
obtained by mirror-reflecting the image across its border.
b) Replicate padding – In replicate padding, values outside the boundary are set equal to the
nearest image border value.
Let us consider two functions, f (x) and h(x) composed of A and B samples, respectively. It
can be shown that if we append zeros to both functions so that they have the same length, denoted
by P, then wraparound is avoided by choosing P ≥ A + B − 1. In our example, each function has 400
points, so the minimum value we could use is P = 799, which implies that we would append 399
zeros to the trailing edge of each function.
In the two-dimensional case, the above can be expressed as follows. Let f(x,y) and h(x,y) be
two images of size A × B and C × D, respectively. Then P ≥ A + C − 1 and Q ≥ B + D − 1, where P and Q
are the dimensions of the resulting padded arrays. If both arrays are of the same size M × N, then
P ≥ 2M − 1 and Q ≥ 2N − 1.
As a rule, DFT algorithms tend to execute faster with arrays of even size, so it is good
practice to select P and Q as the smallest even integers that satisfy the preceding equations. If the
two arrays are of the same size, this means that P and Q are selected as: P = 2M and Q = 2N.
Steps For Filtering In The Frequency Domain
The process of filtering in the frequency domain can be summarized as follows:
1. Given an input image f (x, y) of size M × N, obtain the padding sizes P and Q; that is, P =
2M and Q = 2N.
2. Form a padded image fp(x, y) of size P ×Q using zero-, mirror-, or replicate padding.
3. Multiply fp(x, y) by (−1)^(x+y) to center the Fourier transform on the P × Q frequency rectangle.
4. Compute the DFT, F(u,v), of the image from Step 3.
5. Construct a real, symmetric filter transfer function, H(u,v), of size P ×Q with center at
(P/2,Q/2).
6. Form the product G(u,v) = H(u,v)F(u,v) using elementwise multiplication; that is,
G(i,k) = H(i,k)F(i,k) for i = 0, 1, 2,…,M − 1 and k = 0, 1, 2,…, N − 1.
7. Obtain the filtered image (of size P × Q) by computing the IDFT of G(u,v):
   gp(x, y) = { Real[ F⁻¹{ G(u, v) } ] } (−1)^(x+y).
8. Obtain the final filtered result, g(x, y), of the same size as the input image, by extracting the
M×N region from the top, left quadrant of gp(x,y).
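The eight steps translate almost line for line into NumPy. In this sketch I use a Gaussian lowpass transfer function as the concrete H(u, v); the cutoff D0, the test image, and the choice of zero padding are assumptions of the example:

```python
import numpy as np

def freq_domain_lowpass(f, D0=30.0):
    """Frequency-domain filtering following the steps above, with a Gaussian lowpass H(u, v)."""
    M, N = f.shape
    P, Q = 2 * M, 2 * N                                  # step 1: padding sizes
    fp = np.zeros((P, Q)); fp[:M, :N] = f                # step 2: zero-padded image
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing="ij")
    fp_c = fp * (-1.0) ** (u + v)                        # step 3: center the transform
    F = np.fft.fft2(fp_c)                                # step 4: DFT of the centered image
    D2 = (u - P / 2) ** 2 + (v - Q / 2) ** 2
    H = np.exp(-D2 / (2.0 * D0 ** 2))                    # step 5: real, symmetric filter centered at (P/2, Q/2)
    G = H * F                                            # step 6: element-wise product
    gp = np.real(np.fft.ifft2(G)) * (-1.0) ** (u + v)    # step 7: IDFT and undo the centering
    return gp[:M, :N]                                    # step 8: extract the M x N top-left region

f = np.zeros((64, 64)); f[24:40, 24:40] = 1.0            # a bright square on a dark background
blurred = freq_domain_lowpass(f, D0=20.0)                # edges of the square are smoothed
```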
Correspondence Between Filtering In The Spatial And Frequency Domains
The link between filtering in the spatial and frequency domains is the convolution theorem. We
defined filtering in the frequency domain as the elementwise product of a filter transfer function,
H(u,v), and F(u,v), the Fourier transform of the input image. It can be proven that, given an H(u,v),
its equivalent kernel in the spatial domain is F⁻¹{H(u,v)}, the inverse transform of the
frequency-domain filter transfer function.
Conversely, it follows from a similar analysis that, given a spatial filter kernel, we obtain its
frequency domain representation by taking the forward Fourier transform of the kernel. Therefore,
the two filters form a Fourier transform pair: h(x, y)⇔ H(u,v) where h(x, y) is the spatial kernel and
H(u,v) is the frequency kernel. Because this kernel can be obtained from the response of a
frequency domain filter to an impulse, h(x, y) sometimes is referred to as the impulse response of
H(u,v). Also, because all quantities in a discrete implementation are finite, such filters are called
finite impulse response (FIR) filters.
Image Smoothing Using Lowpass Frequency Domain Filters
Edges and other sharp intensity transitions (such as noise) in an image contribute significantly to the
high frequency content of its Fourier transform. Hence, smoothing (blurring) is achieved in the
frequency domain by high-frequency attenuation; that is, by lowpass filtering. We consider three
types of lowpass filters: ideal, Butterworth, and Gaussian. These three categories cover the range
from very sharp (ideal) to very smooth (Gaussian) filtering.
Ideal Lowpass Filters
A 2-D lowpass filter that passes without attenuation all frequencies within a circle of radius D0 from
the origin, and “cuts off” all frequencies outside this circle, is called an ideal lowpass filter
(ILPF); it is specified by the transfer function
H(u, v) = 1 if D(u, v) ≤ D0, and H(u, v) = 0 if D(u, v) > D0,
where D0 is a positive constant, and D(u,v) is the distance between a point (u,v) in the frequency
domain and the center of the P×Q frequency rectangle; that is, D(u, v) = [(u − P/2)² + (v − Q/2)²]^(1/2).
Figure: (a) perspective plot of an ideal lowpass-filter transfer function; (b) the function displayed as an image; (c) radial cross section.
The name ideal indicates that all frequencies on or inside a circle of radius D0 are passed without
attenuation, whereas all frequencies outside the circle are completely attenuated (filtered out). The
ideal lowpass filter transfer function is radially symmetric about the origin. This means that it is
defined completely by a radial cross section. A 2-D representation of the filter is obtained by
rotating the cross section 360°.
Figure: (a) original image of size 688 × 688 pixels; (b)–(f) results of filtering using ILPFs with cutoff frequencies set at radii values 10, 30, 60, 160, and 460.
Gaussian Lowpass Filters
Gaussian lowpass filter (GLPF) transfer functions have the form H(u, v) = e^(−D²(u,v) / 2σ²), where
D(u,v) is the distance from the center of the P×Q frequency rectangle to any point (u,v) contained by
the rectangle. As before, σ is a measure of spread about the center. By letting σ = D0, we can express
the Gaussian transfer function in the same notation as the other functions: H(u, v) = e^(−D²(u,v) / 2D0²),
where D0 is the cutoff frequency.
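Both the ideal and Gaussian lowpass transfer functions can be built directly on the P × Q frequency grid; a minimal sketch (the function names and the 128 × 128 grid size are my own choices):

```python
import numpy as np

def distance_grid(P, Q):
    """D(u, v): distance from each point of the P x Q frequency rectangle to its center (P/2, Q/2)."""
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing="ij")
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)

def ilpf(P, Q, D0):
    """Ideal lowpass: 1 on and inside the circle of radius D0, 0 outside it."""
    return (distance_grid(P, Q) <= D0).astype(float)

def glpf(P, Q, D0):
    """Gaussian lowpass: H(u, v) = exp(-D^2(u, v) / (2 * D0^2))."""
    D = distance_grid(P, Q)
    return np.exp(-(D ** 2) / (2.0 * D0 ** 2))

H_ideal = ilpf(128, 128, D0=30)
H_gauss = glpf(128, 128, D0=30)
```

The ideal filter has the abrupt (and ringing-prone) transition described earlier, while the Gaussian filter rolls off smoothly.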
Figure: (a) perspective plot of a GLPF transfer function; (b) the function displayed as an image; (c) radial cross sections for various values of D0.
Figure: (a) original image of size 688 × 688 pixels; (b)–(f) results of filtering using GLPFs with cutoff frequencies at the same radii as before.
Figure: (a) perspective plot of a Butterworth lowpass-filter transfer function; (b) the function displayed as an image; (c) radial cross sections of BLPFs of orders 1 through 4.