Module 1

Digital image processing has two principal application areas:

• Improving pictorial information so that it is better suited for human interpretation.
• Processing image data for storage, transmission, and representation to facilitate autonomous machine perception.
Digital image: An image may be defined as a two-dimensional function f(x, y), where x and y are
spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the
intensity or gray level of the image at that point. When x, y, and the intensity values of f are all
finite, discrete quantities, we call the image a digital image. A digital image is composed of a finite
number of elements, each of which has a particular location and value. These elements are
commonly called pixels.

Digital image processing is the processing of digital images using computers. We can divide the
processing of images as low-level, mid-level, and high-level processes.
A low-level process includes primitive operations such as reducing noise, enhancing
contrast, and sharpening of images. Both input and output of a low-level process are images.
A mid-level process includes segmentation (partitioning an image into regions or objects),
descriptions of those objects to reduce them to a form suitable for computer processing, and
classification of individual objects. Inputs to a mid-level process are images but outputs are
attributes extracted from those images (identity of individual objects, contours, and edges).
A high-level process involves making sense of the identified objects and performing
cognitive functions. This last step is commonly called computer vision.
Digital image processing generally involves low-level processes and mid-level processes.
Steps in digital image processing.
1. Image acquisition
2. Image enhancement
3. Image restoration
4. Color image processing
5. Wavelets
6. Compression
7. Morphological processing
8. Segmentation
9. Representation and Description
10. Recognition
Inputs to all these processes are images. Outputs of processes 1 – 7 are generally images and
outputs of processes 8 – 10 are generally image attributes. Process 7, morphological processing, may
be thought of as lying in between, in that its output may be either images or image attributes.

Components of a digital image processing system.


• Image sensors
• Image processing hardware
• Computer
• Image processing software
• Image displays
• Image hardcopy
• Image mass storage
• Computer networks

Image sensing – With reference to sensing, two elements are needed to acquire digital images. The
first is the physical device that is used to capture the image of the object of interest, and second one
is the digitizer, that converts the captured image into a digital image.
Image processing hardware – It contains the digitizer and hardware (an ALU) that performs other
primitive arithmetic and logical operations on an entire image. The digitization and ALU operations
are done in parallel. For example, such hardware can digitize and average video images at 30 fps.
This kind of performance cannot be expected from typical computer hardware.

Computer – It is a general-purpose computer, which can range from a personal computer or laptop to
a supercomputer.

Image processing software – Such software consists of specialized modules for performing specific
operations on images, together with a programming facility for using those modules, usually exposed
as an Application Programming Interface (API). APIs are usually available in multiple programming
languages.

Mass storage – The size of images can get out of hand very rapidly. As an example, even a tiny 16 × 16
image with 32 bits per pixel occupies 16 × 16 × 4 bytes = 1 KB of memory. Most practical scenarios
involve images of far greater size and resolution. As a result, every image processing application must
include a mass storage facility. Digital storage for image processing applications falls into three
principal categories: (1) short-term storage for use during processing; (2) on-line storage for
relatively fast recall; and (3) archival storage, characterized by infrequent access. One method of
providing short-term storage is computer memory. Another is by specialized boards, called frame
buffers, that store one or more images and can be accessed rapidly, usually at video rates (e.g., at 30
complete images per second). The latter method allows virtually instantaneous image zoom, as well
as scroll (vertical shifts) and pan (horizontal shifts). Frame buffers usually are housed in the
specialized image processing hardware. On-line storage generally takes the form of magnetic disks
or optical-media storage. The key factor characterizing on-line storage is frequent access to the
stored data. Finally, archival storage is characterized by massive storage requirements but
infrequent need for access. Magnetic tapes and optical disks housed in “jukeboxes” are the usual
media for archival applications.

Image Displays – Image displays in use today are mainly color, flat screen monitors. Monitors are
driven by the outputs of image and graphics display cards that are an integral part of the computer
system.

Hardcopy Devices – Hardcopy devices for recording images include laser printers, film cameras,
heat-sensitive devices, ink-jet units, and digital units, such as optical and CD-ROM disks.

Networking – Networking and cloud communication are almost default functions in any computer
system in use today. Because of the large amount of data inherent in image processing applications,
the key consideration in image transmission is bandwidth. In dedicated networks, this typically is
not a problem, but communications with remote sites via the internet are not always as efficient.

Image Sensing And Acquisition


Most of the images in which we are interested are generated by the combination of an
“illumination” source and the reflection or absorption of energy from that source by the elements of
the “scene” being imaged. For example, the illumination may originate from a source of
electromagnetic energy, such as a radar, infrared, or X-ray system. But, as noted earlier, it could
originate from less traditional sources, such as ultrasound or even a computer-generated
illumination pattern. Similarly, the scene elements could be familiar objects, but they can just as
easily be molecules, buried rock formations, or a human brain. Depending on the nature of the
source, illumination energy is reflected from, or transmitted through, objects.
Image Acquisition Using A Single Sensing Element
The figure below shows the components of a single sensing element. A familiar sensor of this type
is the photodiode, which is constructed of silicon materials and whose output is a voltage
proportional to light intensity. Using a filter in front of a sensor improves its selectivity. For
example, an optical green-transmission filter favors light in the green band of the color spectrum.
As a consequence, the sensor output would be stronger for green light than for other visible light
components.

In order to generate a 2-D image using a single sensing element, there has to be relative
displacements in both the x- and y-directions between the sensor and the area to be imaged. The
figure below shows an arrangement used in high-precision scanning, where a film negative is
mounted onto a drum whose mechanical rotation provides displacement in one dimension. The
sensor is mounted on a lead screw that provides motion in the perpendicular direction. A light
source is contained inside the drum. As the light passes through the film, its intensity is modified by
the film density before it is captured by the sensor. This "modulation" of the light intensity causes
corresponding variations in the sensor voltage, which are ultimately converted to image intensity
levels by digitization.

Image Acquisition Using Sensor Strips


A geometry used more frequently than single sensors is an in-line sensor strip. The strip provides
imaging elements in one direction. Motion perpendicular to the strip provides imaging in the other
direction. This arrangement is used in most flatbed scanners. Sensing devices with 4000 or more
in-line sensors are possible. In-line sensors are used routinely in airborne imaging applications, in
which the imaging system is mounted on an aircraft that flies at a constant altitude and speed over
the geographical area to be imaged. One dimensional imaging sensor strips that respond to various
bands of the electromagnetic spectrum are mounted perpendicular to the direction of flight. An
imaging strip gives one line of an image at a time, and the motion of the strip relative to the scene
completes the other dimension of a 2-D image. Lenses or other focusing schemes are used to project
the area to be scanned onto the sensors.
Sensor strips in a ring configuration are used in medical and industrial imaging to obtain cross-
sectional (“slice”) images of 3-D objects. A rotating X-ray source provides illumination, and X-ray
sensitive sensors opposite the source collect the energy that passes through the object. This is the
basis for medical and industrial computerized axial tomography (CAT) imaging. The output of the
sensors is processed by reconstruction algorithms whose objective is to transform the sensed data
into meaningful cross sectional images. In other words, images are not obtained directly from the
sensors by motion alone; they also require extensive computer processing.

Image Acquisition Using Sensor Arrays


Electromagnetic and ultrasonic sensing devices frequently are arranged in the form of a 2-D array.
This is also the predominant arrangement found in digital cameras. A typical sensor for these
cameras is a CCD (charge-coupled device) array, which can be manufactured with a broad range of
sensing properties and can be packaged in rugged arrays of 4000 * 4000 elements or more. CCD
sensors are used widely in digital cameras and other light-sensing instruments. The response of each
sensor is proportional to the integral of the light energy projected onto the surface of the sensor, a
property that is used in astronomical and other applications requiring low noise images. Noise
reduction is achieved by letting the sensor integrate the input light signal over minutes or even
hours. Because the sensor array is two dimensional, its key advantage is that a complete image can
be obtained by focusing the energy pattern onto the surface of the array. The figure below shows the
principal manner in which array sensors are used. This figure shows the energy from an
illumination source being reflected from a scene (as mentioned at the beginning of this section, the
energy also could be transmitted through the scene). The first function performed by the imaging
system is to collect the incoming energy and focus it onto an image plane. If the illumination
is light, the front end of the imaging system is an optical lens that projects the viewed scene onto the
focal plane of the lens. The sensor array, which is coincident with the focal plane, produces outputs
proportional to the integral of the light received at each sensor. Digital and analog circuitry sweep
these outputs and convert them to an analog signal, which is then digitized by another section of the
imaging system. The output is a digital image, as shown diagrammatically.
We denote images by two-dimensional functions of the form f (x, y). The value of f at spatial
coordinates (x, y) is a scalar quantity whose physical meaning is determined by the source of the
image, and whose values are proportional to energy radiated by a physical source (e.g.,
electromagnetic waves).
As a consequence, f (x, y) must be non-negative and finite; that is, 0 ≤ f (x, y) < ∞. Function f (x, y)
is characterized by two components: (1) the amount of source illumination incident on the scene
being viewed, and (2) the amount of illumination reflected by the objects in the scene.
Appropriately, these are called the illumination and reflectance components, and are denoted by
i(x, y) and r(x, y), respectively. The two functions combine as a product to form f (x, y):
f (x, y) = i(x, y)r(x, y) where 0 ≤ i(x, y) < ∞ and 0 ≤ r(x, y) ≤ 1.
Thus, reflectance is bounded by 0 (total absorption) and 1 (total reflectance). The nature of i(x, y) is
determined by the illumination source, and r(x, y) is determined by the characteristics of the imaged
objects. These expressions are applicable also to images formed via transmission of the illumination
through a medium, such as a chest X-ray. In this case, we would deal with a transmissivity instead
of a reflectivity function, but the limits would be the same and the image function formed would be
modeled as the product.
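
As a quick numerical illustration of this model, the sketch below (illustrative values only, NumPy assumed available) builds a small image as the element-wise product of a hypothetical illumination field i(x, y) and reflectance map r(x, y), and checks that the result is non-negative and finite:

```python
import numpy as np

# Hypothetical illumination field (here uniform) and reflectance map with values in [0, 1].
i = np.full((4, 4), 0.8)
r = np.array([[0.10, 0.20, 0.30, 0.40],
              [0.50, 0.60, 0.70, 0.80],
              [0.90, 1.00, 0.00, 0.50],
              [0.25, 0.75, 0.33, 0.66]])

f = i * r                                  # f(x, y) = i(x, y) r(x, y)
print(f.min() >= 0, np.isfinite(f).all())  # f is non-negative and finite, as required
```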

Image Sampling And Quantization


The output of most sensors is a continuous voltage waveform whose amplitude and spatial behavior
are related to the physical phenomenon being sensed. To create a digital image, we need to convert
the continuous sensed data into a digital format. This requires two processes: sampling and
quantization.

Basic Concepts In Sampling And Quantization


An image may be continuous with respect to the x- and y-coordinates, and also in amplitude. To
digitize it, we have to sample the function in both coordinates and also in amplitude. Digitizing the
coordinate values is called sampling. Digitizing the amplitude values is called quantization.
The one-dimensional function is a plot of amplitude (intensity level) values of the continuous image
along the line segment AB (see figure). The random variations are due to image noise. To sample
this function, we take equally spaced samples along line AB. The samples are shown as small dark
squares superimposed on the function, and their (discrete) spatial locations are indicated by
corresponding tick marks in the bottom of the figure. The set of dark squares constitute the sampled
function. However, the values of the samples still span (vertically) a continuous range of intensity
values. In order to form a digital function, the intensity values also must be converted (quantized)
into discrete quantities. The vertical gray bar depicts the intensity scale divided into eight discrete
intervals, ranging from black to white. The vertical tick marks indicate the specific value assigned
to each of the eight intensity intervals. The continuous intensity levels are quantized by assigning
one of the eight values to each sample, depending on the vertical proximity of a sample to a vertical
tick mark. The digital samples resulting from both sampling and quantization are shown as white
squares. Starting at the top of the continuous image and carrying out this procedure downward, line
by line, produces a two-dimensional digital image. In addition to the number of discrete levels used,
the accuracy achieved in quantization is highly dependent on the noise content of the sampled signal.
In practice, the method of sampling is determined by the sensor arrangement used to generate the
image. When an image is generated by a single sensing element combined with mechanical motion,
the output of the sensor is quantized in the manner described above. However, spatial sampling is
accomplished by selecting the number of individual mechanical increments at which we activate the
sensor to collect data.
When a sensing strip is used for image acquisition, the number of sensors in the strip
establishes the samples in the resulting image in one direction, and mechanical motion establishes
the number of samples in the other. Quantization of the sensor outputs completes the process of
generating a digital image.
When a sensing array is used for image acquisition, no motion is required. The number of
sensors in the array establishes the limits of sampling in both directions. The quality of a digital
image is determined to a large degree by the number of samples and discrete intensity levels used in
sampling and quantization.
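
The sketch below is a toy illustration of these two steps on a 1-D scan line (the profile and parameter values are invented for the example; NumPy assumed): a sinusoidal intensity profile is sampled at equally spaced points, and each sample is then quantized to the nearest of eight discrete levels.

```python
import numpy as np

# "Continuous" intensity profile along a scan line, represented densely.
x_dense = np.linspace(0.0, 1.0, 1000)
profile = 0.5 + 0.4 * np.sin(2 * np.pi * 3 * x_dense)     # values roughly in [0.1, 0.9]

# Sampling: keep 32 equally spaced samples of the profile.
num_samples = 32
sample_idx = np.linspace(0, x_dense.size - 1, num_samples).astype(int)
samples = profile[sample_idx]

# Quantization: map each sample to the nearest of 8 equally spaced intensity levels.
num_levels = 8
levels = np.linspace(profile.min(), profile.max(), num_levels)
quantized = levels[np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)]

print(quantized[:8])    # the first few digital values along the scan line
```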

Image Interpolation
Interpolation is used in tasks such as zooming, shrinking, rotating, and geometrically correcting
digital images.
Interpolation is the process of using known data to estimate values at unknown locations. We begin
the discussion of this topic with a short example. Suppose that an image of size 500 * 500 pixels has
to be enlarged 1.5 times to 750 * 750 pixels. A simple way to visualize zooming is to create an
imaginary 750 * 750 grid with the same pixel spacing as the original image, then shrink it so that it
exactly overlays the original image. Obviously, the pixel spacing in the shrunken 750 * 750 grid
will be less than the pixel spacing in the original image. To assign an intensity value to any point in
the overlay, we look for its closest pixel in the underlying original image and assign the intensity of
that pixel to the new pixel in the 750 * 750 grid. When intensities have been assigned to all the
points in the overlay grid, we expand it back to the specified size to obtain the resized image. The
method just discussed is called nearest neighbor interpolation because it assigns to each new
location the intensity of its nearest neighbor in the original image. This approach is simple, but it
tends to produce undesirable artifacts, such as severe distortion of straight edges. A more
suitable approach is bilinear interpolation, in which we use the four nearest neighbors to estimate
the intensity at a given location. Let (x, y) denote the coordinates of the location to which we want
to assign an intensity value (think of it as a point of the grid described previously), and let v(x, y)
denote that intensity value. For bilinear interpolation, the assigned value is obtained using the
equation v(x, y) = ax + by + cxy + d, where the four coefficients are determined from the four
equations in four unknowns that can be written using the four nearest neighbors of point (x, y).
Bilinear interpolation gives much better results than nearest neighbor interpolation, with a modest
increase in computational burden.
The next level of complexity is bicubic interpolation, which involves the sixteen nearest
neighbors of a point. The sixteen coefficients are determined from the sixteen equations with
sixteen unknowns that can be written using the sixteen nearest neighbors of point (x, y). Bicubic
interpolation is the standard used in commercial image editing applications, such as Adobe
Photoshop and Corel Photopaint.
Although images are displayed with integer coordinates, it is possible during processing to
work with subpixel accuracy by increasing the size of the image using interpolation to “fill the
gaps” between pixels in the original image.
It is possible to use more neighbors in interpolation, and there are more complex techniques,
such as using splines or wavelets, that in some instances can yield better results than the methods
just discussed.
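
A minimal sketch of the two simpler schemes is given below. The helper names zoom_nearest and zoom_bilinear are hypothetical, the coordinate mapping is one reasonable choice among several, and NumPy is assumed.

```python
import numpy as np

def zoom_nearest(img, new_h, new_w):
    """Nearest-neighbor interpolation: each output pixel copies its closest input pixel."""
    h, w = img.shape
    rows = np.clip(np.round(np.arange(new_h) * h / new_h).astype(int), 0, h - 1)
    cols = np.clip(np.round(np.arange(new_w) * w / new_w).astype(int), 0, w - 1)
    return img[rows[:, None], cols[None, :]]

def zoom_bilinear(img, new_h, new_w):
    """Bilinear interpolation: weighted combination of the four nearest input neighbors."""
    h, w = img.shape
    y = np.arange(new_h) * (h - 1) / (new_h - 1)    # output rows mapped into input coordinates
    x = np.arange(new_w) * (w - 1) / (new_w - 1)
    y0 = np.floor(y).astype(int); x0 = np.floor(x).astype(int)
    y1 = np.clip(y0 + 1, 0, h - 1); x1 = np.clip(x0 + 1, 0, w - 1)
    dy = (y - y0)[:, None]; dx = (x - x0)[None, :]
    tl = img[y0[:, None], x0[None, :]]; tr = img[y0[:, None], x1[None, :]]
    bl = img[y1[:, None], x0[None, :]]; br = img[y1[:, None], x1[None, :]]
    return (tl * (1 - dy) * (1 - dx) + tr * (1 - dy) * dx
            + bl * dy * (1 - dx) + br * dy * dx)

img = np.arange(16, dtype=float).reshape(4, 4)      # tiny test image
print(zoom_nearest(img, 6, 6).shape, zoom_bilinear(img, 6, 6).shape)
```
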
Basic Relationships Between Pixels
In this section, we discuss several important relationships between pixels in a digital image. When
referring in the following discussion to particular pixels, we use lowercase letters, such as p and q.

Neighbors of a Pixel
A pixel p at coordinates (x, y) has two horizontal and two vertical neighbors with coordinates
(x + 1, y), (x − 1, y), (x, y + 1), (x, y − 1). This set of pixels, called the 4-neighbors of p, is denoted
as N4 (p). The four diagonal neighbors of p have coordinates (x + 1, y + 1), (x + 1, y − 1), (x − 1, y
+ 1), (x − 1, y − 1) and are denoted as ND (p). These neighbors, together with the 4-neighbors, are
called the 8-neighbors of p, denoted by N8(p). The set of image locations of the neighbors of a
point p is called the neighborhood of p. The neighborhood is said to be closed if it contains p.
Otherwise, the neighborhood is said to be open.
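
These neighbor sets are simple to enumerate in code; the small sketch below (hypothetical helper names) returns the coordinate lists and leaves it to the caller to discard coordinates that fall outside the image.

```python
def n4(x, y):
    """4-neighbors of a pixel p at (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def nd(x, y):
    """Diagonal neighbors of p."""
    return [(x + 1, y + 1), (x + 1, y - 1), (x - 1, y + 1), (x - 1, y - 1)]

def n8(x, y):
    """8-neighbors: the union of N4(p) and ND(p)."""
    return n4(x, y) + nd(x, y)

print(len(n8(5, 5)))   # 8; neighbors outside the image border must be discarded by the caller
```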

Adjacency, Connectivity, Regions, and Boundaries


Let V be the set of intensity values used to define adjacency. In a binary image, V = {1} if we are
referring to adjacency of pixels with value 1. In a grayscale image, the idea is the same, but set V
typically contains more elements. For example, if we are dealing with the adjacency of pixels
whose values are in the range 0 to 255, set V could be any subset of these 256 values. We consider
three types of adjacency:
1. 4-adjacency: Two pixels p and q with values from V are 4-adjacent if q is in the set N4(p).
2. 8-adjacency: Two pixels p and q with values from V are 8-adjacent if q is in the set N8(p).
3. m-adjacency (also called mixed adjacency): Two pixels p and q with values from V are m-
adjacent if
a) q is in N8(p), or
b) q is in ND(p) and the set N4(p)∩N4(q) has no pixels whose values are from V.
A digital path (or curve) from pixel p with coordinates (x0, y0) to pixel q with coordinates (xn, yn) is
a sequence of distinct pixels with coordinates (x0, y0), (x1, y1), ..., (xn, yn) where points (xi, yi) and
(xi-1, yi-1) are adjacent for 1 ≤ i ≤ n. In this case, n is the length of the path. If (x0, y0) = (xn, yn) the
path is a closed path.
Let S represent a subset of pixels in an image. Two pixels p and q are said to be connected
in S if there exists a path between them consisting entirely of pixels in S. For any pixel p in S, the
set of pixels that are connected to it in S is called a connected component of S. If it only has one
component, and that component is connected, then S is called a connected set.
Let R represent a subset of pixels in an image. We call R a region of the image if R is a
connected set. Two regions, Ri and Rj are said to be adjacent if their union forms a connected set.
Regions that are not adjacent are said to be disjoint.
Suppose an image contains K disjoint regions, Rk , k =1, 2,…, K, none of which touches the
image border. Let Ru denote the union of all the K regions, and let (Ru)^c denote its complement
(recall that the complement of a set A is the set of points that are not in A). We call all the points in
Ru the foreground, and all the points in (Ru)^c the background of the image.
The boundary (also called the border or contour) of a region R is the set of pixels in R that
are adjacent to pixels in the complement of R. Stated another way, the border of a region is the set
of pixels in the region that have at least one background neighbor. Here again, we must specify the
connectivity being used to define adjacency. As a rule, adjacency between points in a region and its
background is defined using 8-connectivity. The preceding definition sometimes is referred to as the
inner border of the region to distinguish it from its outer border, which is the corresponding border
in the background. This distinction is important in the development of border-following algorithms.
Such algorithms usually are formulated to follow the outer boundary in order to guarantee that the
result will form a closed path. For instance, the inner border does not satisfy the definition of a
closed path. On the other hand, the outer border of the region does form a closed path around the
region. If R happens to be an entire image, then its boundary (or border) is defined as the set of
pixels in the first and last rows and columns of the image. This extra definition is required because
an image has no neighbors beyond its border.
The concept of an edge is found frequently in discussions dealing with regions and
boundaries. However, there is a key difference between these two concepts. The boundary of a
finite region forms a closed path and is thus a “global” concept. Edges are formed from pixels with
derivative values that exceed a preset threshold. Thus, an edge is a “local” concept that is based on a
measure of intensity-level discontinuity at a point. It is possible to link edge points into edge
segments, and sometimes these segments are linked in such a way that they correspond to
boundaries, but this is not always the case. The one exception in which edges and boundaries
correspond is in binary images.

Distance Measures
For pixels p, q, and s, with coordinates (x, y), (u,v), and (w,z), respectively, D is a distance function
or metric if
a) D(p,q) ≥ 0 (D(p,q) = 0 iff p = q),
b) D(p,q) = D(q, p), and
c) D(p, s) ≤ D(p, q) + D(q, s).

The Euclidean distance between p and q is defined as

De(p, q) = [(x − u)² + (y − v)²]^(1/2).

For this distance measure, the pixels having a distance less than or equal to some value r from (x, y)
are the points contained in a disk of radius r centered at (x, y).

The D4 distance (called the city-block distance) between p and q is defined as

D4(p, q) = |x − u| + |y − v|.

In this case, pixels having a D4 distance from (x, y) that is less than or equal to some value d form a
diamond centered at (x, y).

The D8 distance (called the chessboard distance) between p and q is defined as

D8(p, q) = max(|x − u|, |y − v|).

In this case, the pixels with D8 distance from (x, y) less than or equal to some value d form a square
centered at (x, y).

Note that the D4 and D8 distances between p and q are independent of any paths that might exist
between these points because these distances involve only the coordinates of the points. In the case
of m-adjacency, however, the Dm distance between two points is defined as the shortest m-path
between the points. In this case, the distance between two pixels will depend on the values of the
pixels along the path, as well as the values of their neighbors.
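
The three coordinate-only distances are straightforward to compute; the sketch below (illustrative helper names, plain Python) evaluates them for the pixels p = (0, 0) and q = (3, 4).

```python
import math

def d_euclidean(p, q):
    """Euclidean distance De(p, q)."""
    (x, y), (u, v) = p, q
    return math.sqrt((x - u) ** 2 + (y - v) ** 2)

def d4(p, q):
    """City-block distance D4(p, q) = |x - u| + |y - v|."""
    (x, y), (u, v) = p, q
    return abs(x - u) + abs(y - v)

def d8(p, q):
    """Chessboard distance D8(p, q) = max(|x - u|, |y - v|)."""
    (x, y), (u, v) = p, q
    return max(abs(x - u), abs(y - v))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0  7  4
```
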
Arithmetic Operations: Arithmetic operations between two images f (x, y) and g(x, y) are denoted
as
s(x,y) = f(x,y) + g(x,y)
d(x,y) = f(x,y) – g(x,y)
p(x,y) = f(x,y) × g(x,y)
v(x,y) = f(x,y) / g(x,y)
These are element-wise operations which means that they are performed between corresponding
pixel pairs in f and g for x = 0, 1, 2,…,M − 1 and y = 0, 1, 2,…, N − 1. As usual, M and N are the
row and column sizes of the images. Clearly, s, d, p, and v are images of size M × N also.

Addition of images for noise reduction: Suppose that g(x, y) is a corrupted image formed by the
addition of noise, η(x, y), to a noiseless image f(x, y); that is, g(x, y) = f(x, y) + η(x, y), where
the assumption is that at every pair of coordinates (x, y) the noise is uncorrelated and has zero
average value. We assume also that the noise and image values are uncorrelated (this is a typical
assumption for additive noise). The objective of the following procedure is to reduce the noise
content of the output image by adding a set of noisy input images, {gi(x,y)}. This is a technique
used frequently for image enhancement. If the noise satisfies the constraints just stated, it can be
shown that if an image is formed by averaging K different noisy images,

ḡ(x, y) = (1/K) Σ_{i=1}^{K} gi(x, y),

then it follows that E{ḡ(x, y)} = f(x, y) and σ²ḡ(x, y) = (1/K) σ²η(x, y). As K increases (more and more
images are averaged), the variance decreases; as a result, the variability of pixel values at each
location decreases and the averaged image converges to its expected value, f(x, y), which is the
noiseless image. In order to avoid blurring, the images must be aligned spatially (registered).
An important application of image averaging is in the field of astronomy, where imaging
under very low light levels often cause sensor noise to render individual images virtually useless for
analysis (lowering the temperature of the sensor helps reduce noise).
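
A small simulation makes the variance argument concrete. The sketch below (synthetic data, NumPy assumed) averages K = 100 noisy copies of an invented noiseless image and compares the noise variance before and after averaging; the reduction is roughly a factor of K.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented noiseless image f and K noisy observations g_i = f + eta_i,
# where eta_i is zero-mean, uncorrelated Gaussian noise.
f = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
K = 100
noisy = [f + rng.normal(0.0, 0.2, size=f.shape) for _ in range(K)]

g_bar = np.mean(noisy, axis=0)     # average of the K (already registered) noisy images

# The noise variance of the averaged image is roughly 1/K of that of a single image.
print(np.var(noisy[0] - f), np.var(g_bar - f))
```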

Subtracting images for comparison: If two images are subtracted pixel-by-pixel, then the
resulting image will contain pixel values 0 (black) where there were no differences, 1 (white) where
pixels were completely different and values between black and white for the rest. This technique is
used to find out the acceptable reduction limit of resolution of an image. If an image can be stored
using lower resolutions, then it will occupy smaller space in computer memory. But reducing
resolution means losing information. Subtraction of images of same object with different
resolutions tells how much reduction in resolution is acceptable, so that there is a balance between
loss of information and the size of the image.

(a) Difference between the 930 dpi and 72 dpi images. (b) Difference between the 930 dpi and
150 dpi images. (c) Difference between the 930 dpi and 300 dpi images.
Using image multiplication and division for shading correction and for masking.
An important application of image multiplication (and division) is shading correction. Suppose that
an imaging sensor produces images that can be modeled as the product of a “perfect image,”
denoted by f (x, y), times a shading function, h(x, y); that is, g(x, y) = f (x, y)h(x, y). If h(x, y) is
known or can be estimated, we can obtain f (x, y) (or an estimate of it) by multiplying the sensed
image by the inverse of h(x, y) (i.e., dividing g by h using element wise division).

Another use of image multiplication is in masking, also called region of interest (ROI),
operations. The process consists of multiplying a given image by a mask image that has 1’s in the
ROI and 0’s elsewhere. There can be more than one ROI in the mask image, and the shape of the
ROI can be arbitrary.
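
The sketch below illustrates both uses on synthetic data (the shading function h and all names are invented for the example; NumPy assumed): shading correction by element-wise division, and ROI masking by element-wise multiplication with a 0/1 mask image.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented "perfect" image f and a smooth multiplicative shading function h.
f = rng.uniform(0.2, 1.0, size=(64, 64))
xx = np.tile(np.arange(64, dtype=float), (64, 1))
h = 0.5 + 0.5 * (xx / 63.0)          # shading: darker on the left, brighter on the right
g = f * h                            # sensed image g(x, y) = f(x, y) h(x, y)

# Shading correction: divide the sensed image by the known (or estimated) shading function.
f_est = g / h
print(np.allclose(f_est, f))         # True here, because h is known exactly

# ROI masking: multiply by a mask with 1's in the region of interest and 0's elsewhere.
mask = np.zeros_like(g)
mask[16:48, 16:48] = 1.0
roi = g * mask                       # pixels outside the ROI become 0
```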

Geometric Transformations
We use geometric transformations to modify the spatial arrangement of pixels in an image. These
transformations are called rubber-sheet transformations because they may be viewed as analogous
to “printing” an image on a rubber sheet, then stretching or shrinking the sheet according to a
predefined set of rules. Geometric transformations of digital images consist of two basic operations:
1. Spatial transformation of coordinates.
2. Intensity interpolation that assigns intensity values to the spatially transformed pixels.
The transformation of coordinates may be expressed as

[x′]       [x]     [t11  t12] [x]
[y′]  =  T [y]  =  [t21  t22] [y]
where (x,y) are pixel coordinates in the original image and (x′, y′) are the corresponding pixel
coordinates of the transformed image. For example, the transformation (x′, y′) = (x/2,y/2) shrinks
the original image to half its size in both spatial directions.
Our interest is in so-called affine transformations, which include scaling, translation, rotation, and
shearing. The key characteristic of an affine transformation in 2-D is that it preserves points,
straight lines, and planes. The above can be used to express the transformations just mentioned,
except translation, which would require that a constant 2-D vector be added to the right side of the
equation. However, it is possible to use homogeneous coordinates to express all four affine
transformations using a single 3 × 3 matrix in the following general form:

[x′]       [x]     [a11  a12  a13] [x]
[y′]  =  A [y]  =  [a21  a22  a23] [y]
[1 ]       [1]     [ 0    0    1 ] [1]
This transformation can scale, rotate, translate, or shear an image, depending on the values chosen
for the elements of matrix A.
A significant advantage of being able to perform all transformations using the unified representation
in above equation is that it provides the framework for concatenating a sequence of operations. For
example, if we want to resize an image, rotate it, and move the result to some location, we simply
form a 3 × 3 matrix equal to the product of the scaling, rotation, and translation matrices.

We can use the equation in two basic ways. The first, is a forward mapping, which consists of
scanning the pixels of the input image and, at each location (x,y), computing the spatial location (x′,
y′) of the corresponding pixel in the output image using the equation directly. A problem with the
forward mapping approach is that two or more pixels in the input image can be transformed to the
same location in the output image, raising the question of how to combine multiple output values
into a single output pixel value. In addition, it is possible that some output locations may not be
assigned a pixel at all.
The second approach, called inverse mapping, scans the output pixel locations and, at each location
(x′, y′), computes the corresponding location in the input image using (x, y) = A⁻¹(x′, y′). It then
interpolates among the nearest input pixels to determine the intensity of the output pixel value.
Inverse mappings are more efficient to implement than forward mappings, and are used in
numerous commercial implementations of spatial transformations (for example, MATLAB uses this
approach).
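
The sketch below implements inverse mapping with a 3 × 3 homogeneous-coordinate matrix and nearest-neighbor interpolation (bilinear or bicubic could be substituted). The function name and the img[y, x] indexing convention are choices made for this example; NumPy is assumed.

```python
import numpy as np

def affine_warp_nearest(img, A):
    """Warp img with the 3x3 affine matrix A using inverse mapping and
    nearest-neighbor interpolation; output pixels that map outside img stay 0."""
    h, w = img.shape
    A_inv = np.linalg.inv(A)
    yy, xx = np.mgrid[0:h, 0:w]                                   # output pixel grid (x', y')
    coords = np.stack([xx.ravel(), yy.ravel(), np.ones(h * w)])   # homogeneous coordinates
    src = A_inv @ coords                                          # corresponding input locations (x, y)
    xs = np.round(src[0]).astype(int)
    ys = np.round(src[1]).astype(int)
    valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out = np.zeros(h * w, dtype=img.dtype)
    out[valid] = img[ys[valid], xs[valid]]
    return out.reshape(h, w)

# Example: scale by 0.5 in both directions, then translate by (10, 5) pixels.
S = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]])
T = np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]])
A = T @ S                                                         # concatenated transformation
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
print(affine_warp_nearest(img, A).shape)
```
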
Intensity Transformations and Spatial Filtering
Spatial domain techniques operate directly on the pixels of an image. The spatial domain processes
we discuss here are based on the expression g(x, y) = T[f (x, y)] where f (x, y) is an input image, g(x,
y) is the output image, and T is an operator on f defined over a neighborhood of point (x, y). The
operator can be applied to the pixels of a single image or to the pixels of a set of images, such as
performing the element wise sum of a sequence of images for noise reduction.
The following figure shows the basic implementation on a single image. The point (x0 , y0 ) shown
is an arbitrary location in the image, and the small region shown is a neighborhood of (x0 , y0 ).
Typically, the neighborhood is rectangular, centered on (x0 , y0), and much smaller in size than the
image.
The process that the figure illustrates consists of moving the center of the neighborhood from pixel
to pixel, and applying the operator T to the pixels in the neighborhood to yield an output value at
that location. Thus, for any specific location (x0 , y0 ), the value of the output image g at those
coordinates is equal to the result of applying T to the
neighborhood with origin at (x0 , y0 ) in f.
For example, suppose that the neighborhood is a square
of size 3 × 3 and that operator T is defined as “compute
the average intensity of the pixels in the neighborhood.”
Consider an arbitrary location in an image, say
(100,150). The result at that location in the output
image, g(100,150), is the sum of f (100,150) and its 8-
neighbors, divided by 9. The center of the neighborhood
is then moved to the next adjacent location and the
procedure is repeated to generate the next value of the
output image g. Typically, the process starts at the top
left of the input image and proceeds pixel by pixel in a
horizontal (vertical) scan, one row (column) at a time.
The smallest possible neighborhood is of size 1 × 1. In this case, g depends only on the value of f at
a single point (x, y) and T becomes an intensity (also called a gray-level, or mapping)
transformation function of the form s = T(r), where, for simplicity in notation, we use s and r to
denote, respectively, the intensity of g and f at any point (x, y).

(Fig. A: a contrast-stretching transformation function. Fig. B: a thresholding transformation function.)
For example, if T(r) has the form in Fig. A, the result of applying the transformation to every pixel
in f to generate the corresponding pixels in g would be to produce an image of higher contrast than
the original, by darkening the intensity levels below k and brightening the levels above k. In this
technique, sometimes called contrast stretching, values of r lower than k reduce (darken) the values
of s, toward black. The opposite is true for values of r higher than k. Observe how an intensity value
r0 is mapped to obtain the corresponding value s0. In the limiting case shown in Fig. B, T(r)
produces a two level (binary) image. A mapping of this form is called a thresholding function.
Approaches whose results depend only on the intensity at a point sometimes are called point
processing techniques.
Some Basic Intensity Transformation Functions

Image Negatives
The negative of an image with intensity levels in the range [0, L − 1] is obtained by using the
negative transformation function, which has the form: s = L − 1 − r. Reversing the intensity levels
of a digital image in this manner produces the equivalent of a photographic negative. This type of
processing is used, for example, in enhancing white or gray detail embedded in dark regions of an
image, especially when the black areas are dominant in size.

Log Transformations
The general form of the log transformation is
s = c log(1 + r), where c is a constant and it is assumed that r
≥ 0. The shape of the log curve in the figure shows that
this transformation maps a narrow range of low intensity
values in the input into a wider range of output levels.
For example, note how input levels in the range [0, L/4]
map to output levels in the range [0, 3L/4]. Conversely,
higher values of input levels are mapped to a narrower
range in the output. We use a transformation of this type
to expand the values of dark pixels in an image, while
compressing the higher-level values. The opposite is
true of the inverse log (exponential) transformation.

Power-Law (Gamma) Transformations


Power-law transformations have the form s = c·r^γ, where c and γ are positive constants. Sometimes
the equation is written as s = c(r + ε)^γ to account for offsets (that is, a measurable output when
the input is zero). However, offsets typically are an issue of display calibration, and as a result they
are normally ignored. As with log transformations, power-law curves with fractional values of γ
map a narrow range of dark input values into a wider range of output values, with the opposite
being true for higher values of input levels. Note also that a family of transformations can be
obtained simply by varying γ. Curves generated with values of γ > 1 have exactly the opposite
effect as those generated with values of γ < 1. When c = γ = 1 the equation reduces to the identity
transformation.
The response of many devices used for image capture,
printing, and display obeys a power law. By convention,
the exponent in a power-law equation is referred to as
gamma. The process used to correct these power-law
response phenomena is called gamma correction or
gamma encoding. For example, cathode ray tube (CRT)
devices have an intensity-to-voltage response that is a
power function, with exponents varying from
approximately 1.8 to 2.5. As the curve for γ = 2.5 suggests, such display systems would tend to produce
images that are darker than intended. Figure below illustrates this effect. Figure (a) is an image of
an intensity ramp displayed in a monitor with a gamma of 2.5. As expected, the output of the
monitor appears darker than the input, as Fig. (b) shows. In this case, gamma correction consists of
using the transformation s = r^(1/2.5) = r^0.4 to preprocess the image before inputting it into the monitor.
Figure (c) is the result. When input into the same monitor, the gamma-corrected image produces an
output that is close in appearance to the original image, as Fig. (d) shows.

(Figures: results of power-law transformations with γ = 3.0, 4.0, 5.0 and with γ = 0.3, 0.4, 0.6.)
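
A minimal sketch of gamma correction for intensities scaled to [0, 1] follows (hypothetical function name, NumPy assumed). Pre-processing with γ = 1/2.5 cancels a display whose response is a power law with exponent 2.5, as in the CRT example above.

```python
import numpy as np

def gamma_transform(img, c=1.0, gamma=1.0):
    """Power-law (gamma) transformation s = c * r**gamma for intensities in [0, 1]."""
    return c * np.power(img, gamma)

ramp = np.tile(np.linspace(0.0, 1.0, 256), (32, 1))       # intensity ramp in [0, 1]
corrected = gamma_transform(ramp, gamma=1.0 / 2.5)        # gamma correction: s = r**0.4
displayed = gamma_transform(corrected, gamma=2.5)         # what a gamma-2.5 display effectively shows
print(np.allclose(displayed, ramp))                       # True: the correction cancels the display gamma
```
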
Contrast Stretching
Low-contrast images can result from poor illumination, lack of dynamic range in the imaging
sensor, or even the wrong setting of a lens aperture during image acquisition. Contrast stretching
expands the range of intensity levels in an image so that it spans the ideal full intensity range of the
recording medium or display device.
Intensity-Level Slicing
There are applications in which it is of interest to highlight a specific range of intensities in an
image. Some of these applications include enhancing features in satellite imagery, such as masses of
water, and enhancing flaws in X-ray images. The method, called intensity-level slicing, can be
implemented in several ways, but most are variations of two basic themes. One approach is to
display in one value (say, white) all the values in the range of interest and in another (say, black) all
other intensities. This transformation produces a binary image. The second approach brightens (or
darkens) the desired range of intensities, but leaves all other intensity levels in the image
unchanged.
Bit-Plane Slicing
Pixel values are integers composed of bits. For example, values in a 256-level grayscale image are
composed of 8 bits (one byte). Instead of highlighting intensity-level ranges, we could highlight the
contribution made to total image appearance by specific bits. An 8-bit image may be considered as
being composed of eight one-bit planes, with plane 1 containing the lowest-order bit of all pixels in
the image, and plane 8 all the highest-order bits.
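
Extracting a bit plane amounts to shifting and masking the pixel values, as in the sketch below (hypothetical helper name, NumPy assumed). Plane 8, the highest-order bit, carries most of the visually significant structure, while plane 1 usually looks like noise.

```python
import numpy as np

def bit_plane(img_u8, plane):
    """Extract bit plane 1..8 of an 8-bit image (plane 1 is the least significant bit)."""
    return (img_u8 >> (plane - 1)) & 1

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

msb = bit_plane(img, 8)      # highest-order bit plane
lsb = bit_plane(img, 1)      # lowest-order bit plane
print(msb.max() <= 1, lsb.max() <= 1)   # both planes are binary (0/1) images
```
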
Histogram Processing
Let rk , for k = 0,1, 2,…,L − 1, denote the intensities of an L-level digital image, f (x, y). The
unnormalized histogram of f is defined as h(rk) = nk for k = 0, 1, 2, …, L − 1, where nk is the number of
pixels in f with intensity rk , and the subdivisions of the intensity scale are called histogram bins.
Similarly, the normalized histogram of f is defined as p(rk) = h(rk) / MN = nk / MN, where, as usual,
M and N are the number of image rows and columns, respectively. Mostly, we work with
normalized histograms, which we refer to simply as histograms or image histograms. The sum of
p(rk) over all values of k is always 1. The components of p(rk) are estimates of the probabilities of
intensity levels occurring in an image. Histograms are simple to compute and are also suitable for
fast hardware implementations, thus making histogram-based techniques a popular tool for real-
time image processing.

In a dark image, the most populated histogram bins are concentrated on the lower (dark) end
of the intensity scale. Similarly, the most populated bins of the light image are biased toward the
higher end of the scale. An image with low contrast has a narrow histogram located typically toward
the middle of the intensity scale. For a monochrome image, this implies a dull, washed-out gray
look. Finally, we see that the components of the histogram of the high-contrast image cover a wide
range of the intensity scale, and the distribution of pixels is not too far from uniform, with few bins
being much higher than the others. Intuitively, it is reasonable to conclude that an image whose
pixels tend to occupy the entire range of possible intensity levels and, in addition, tend to be
distributed uniformly, will have an appearance of high contrast and will exhibit a large variety of
gray tones. The net effect will be an image that shows a great deal of gray-level detail and has a
high dynamic range.
Histogram Equalization
Assuming initially continuous intensity values, let the variable r denote the intensities of an image
to be processed. As usual, we assume that r is in the range [0,L − 1], with r = 0 representing black
and r = L − 1 representing white. For r satisfying these conditions, we focus attention on
transformations (intensity mappings) of the form s = T(r), 0 ≤ r ≤ L − 1, that produce an output
intensity value, s, for a given intensity value r in the input image. We assume that,
(a) T(r) is a monotonic increasing or strictly monotonic increasing function in the interval 0 ≤ r
≤ L − 1; and
(b) 0 ≤ T(r) ≤ L − 1 for 0 ≤ r ≤ L − 1.
The transformation equation for discrete intensity values can be written as

sk = T(rk) = (L − 1) Σ_{j=0}^{k} pr(rj),   for k = 0, 1, 2, …, L − 1,

where the probability of occurrence of intensity level rk in a digital image is given by
pr(rk) = nk / MN. This is called a histogram equalization or histogram linearization transformation.
The left column in the figure shows
the four images, and the center
column shows the result of
performing histogram equalization on
each of these images. The first three
results from top to bottom show
significant improvement. As expected,
histogram equalization did not have
much effect on the fourth image
because its intensities span almost the
full scale already.
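
A compact sketch of discrete histogram equalization for an 8-bit image follows (hypothetical function name, NumPy assumed): it computes the normalized histogram, accumulates it, scales by L − 1, and maps every pixel through the resulting transformation.

```python
import numpy as np

def equalize_histogram(img_u8, L=256):
    """Histogram equalization: sk = (L - 1) * sum of pr(rj) for j = 0..k."""
    M, N = img_u8.shape
    hist = np.bincount(img_u8.ravel(), minlength=L)          # unnormalized histogram h(rk) = nk
    p = hist / (M * N)                                       # normalized histogram pr(rk)
    s = np.round((L - 1) * np.cumsum(p)).astype(np.uint8)    # equalization transformation s = T(r)
    return s[img_u8]                                         # map every pixel through T

rng = np.random.default_rng(3)
dark = rng.integers(0, 64, size=(64, 64), dtype=np.uint8)    # low-contrast, dark image
eq = equalize_histogram(dark)
print(dark.max(), eq.max())   # the equalized intensities reach the top of the [0, 255] range
```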

Local Histogram Processing


The histogram processing methods
discussed thus far are global, in the
sense that pixels are modified by a
transformation function based on the
intensity distribution of an entire
image. This global approach is
suitable for overall enhancement, but
generally fails when the objective is to
enhance details over small areas in an
image. This is because the number of
pixels in small areas has negligible
influence on the computation of
global transformations. The solution is
to devise transformation functions
based on the intensity distribution of
pixel neighborhoods. The histogram processing techniques previously described can be adapted to
local enhancement. The procedure is to define a neighborhood and move its center from pixel to
pixel in a horizontal or vertical direction. At each location, the histogram of the points in the
neighborhood is computed, and either a histogram equalization or histogram specification
transformation function is obtained. This function is used to map the intensity of the pixel centered
in the neighborhood. The center of the neighborhood is then moved to an adjacent pixel location
and the procedure is repeated. Because only one row or column of the neighborhood changes in a
one-pixel translation of the neighborhood, updating the histogram obtained in the previous location
with the new data introduced at each motion step is possible. This approach has obvious advantages
over repeatedly computing the histogram of all pixels in the neighborhood region each time the
region is moved one pixel location. Another approach used sometimes to reduce computation is to
utilize non-overlapping regions, but this method usually produces an undesirable “blocky” effect.
Fundamentals Of Spatial Filtering
Spatial filtering modifies an image by replacing the value of each pixel by a function of the values
of the pixel and its neighbors. If the operation performed on the image pixels is linear, then the filter
is called a linear spatial filter. Otherwise, the filter is a nonlinear spatial filter. We will focus
attention first on linear filters and then introduce some basic nonlinear filters.
The Mechanics Of Linear Spatial Filtering
A linear spatial filter performs a sum-of-products operation between an image f and a filter kernel,
w. The kernel is an array whose size defines the neighborhood of operation, and whose coefficients
determine the nature of the filter. Other terms used to refer to a spatial filter kernel are mask,
template, and window. We use the term filter kernel or simply kernel.
At any point (x, y) in the image, the response, g(x, y), of the filter is the sum of products of the
kernel coefficients and the image pixels encompassed by the kernel:
g(x, y) = w(-1,-1)f(x-1, y-1) + w(-1,0)f(x-1, y) + ... + w(0,0)f(x, y) + ... + w(1,1)f(x+1, y+1)
As coordinates x and y are varied, the center of the kernel moves from pixel to pixel, generating the
filtered image, g, in the process.
Observe that the centre coefficient of the kernel, w(0, 0), aligns with the pixel at location (x, y). For
a kernel of size m × n, we assume that m = 2a + 1 and n = 2b + 1, where a and b are nonnegative
integers. This means that our focus is on kernels of odd size in both coordinate directions. In
general, linear spatial filtering of an image of size M × N with a kernel of size m × n is given by the
expression

g(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x + s, y + t).
The linear spatial filtering process is also known as convolving a kernel with an image, or simply
convolution.
Sometimes an image is filtered (i.e., convolved) sequentially, in stages, using a different kernel in
each stage. For example, suppose that an image f is filtered with a kernel w1, the result filtered with
kernel w2, that result filtered with a third kernel, and so on, for Q stages. Because of the
commutative property of convolution, this multistage filtering can be done in a single filtering
operation, w * f, where w = w1 * w2 * w3 * ... * wQ.
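
The sketch below implements the filtering expression above directly (as written it is a correlation; for the symmetric kernels used in these notes it coincides with convolution). The zero padding at the image border and the helper name spatial_filter are choices made for the example; NumPy is assumed.

```python
import numpy as np

def spatial_filter(f, w):
    """Linear spatial filtering g(x, y) = sum_s sum_t w(s, t) f(x+s, y+t)
    for an odd-sized kernel w; the image is zero-padded at its borders."""
    m, n = w.shape
    a, b = m // 2, n // 2                       # m = 2a + 1, n = 2b + 1
    fp = np.pad(f, ((a, a), (b, b)), mode="constant")
    g = np.zeros_like(f, dtype=float)
    for s in range(-a, a + 1):
        for t in range(-b, b + 1):
            # shift the padded image by (s, t) and accumulate the weighted sum
            g += w[s + a, t + b] * fp[a + s:a + s + f.shape[0], b + t:b + t + f.shape[1]]
    return g

f = np.arange(25, dtype=float).reshape(5, 5)
box = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
print(spatial_filter(f, box)[2, 2])             # 12.0: the average of the central 3x3 neighborhood
```
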
Separable Filter Kernels
A 2-D function, G(x,y) is said to be separable if it can be written as the product of two 1-D
functions G1(x) and G2(y), that is, G(x,y) = G1(x)G2(y). A spatial filter kernel is a matrix, and a
separable kernel is a matrix that can be expressed as the outer product of two vectors.
A separable kernel of size m × n can be expressed as the outer product of two vectors, v1 and v2: w =
v1 v2T, where v1 and v2 are vectors of size m x 1 and n x 1, respectively. For a square kernel v of size
m x m, we write, w = vvT.
The importance of separable kernels lies in the computational advantages that result from the
associative property of convolution. If we have a kernel w that can be decomposed into two simpler
kernels, such that w = w1 * w2, then it follows from the commutative and associative properties of
convolution that w * f = (w1 * w2) * f = (w2 * w1) * f = w2 * (w1 * f) = (w1 * f) * w2. This equation says that
convolving a separable kernel with an image is the same as convolving w1 with f first, and then
convolving the result with w2.
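
Separability can be checked numerically with the spatial_filter() helper sketched in the previous example (assumed to be in scope here). The interior pixels of the one-pass and two-pass results agree; border pixels can differ slightly because of the zero padding.

```python
import numpy as np

v1 = np.ones((3, 1)) / 3.0               # column vector
v2 = np.ones((1, 3)) / 3.0               # row vector
w = v1 @ v2                              # 3x3 box kernel as an outer product, w = v1 v2^T

f = np.random.default_rng(4).uniform(size=(32, 32))

direct = spatial_filter(f, w)                          # one pass with the 2-D kernel
two_pass = spatial_filter(spatial_filter(f, v1), v2)   # two passes with the 1-D kernels

print(np.allclose(direct[2:-2, 2:-2], two_pass[2:-2, 2:-2]))   # True on the interior
```
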
Comparisons Between Filtering In The Spatial And Frequency Domains
The bridge between spatial- and frequency-domain processing is the Fourier transform. We use the
Fourier transform to go from the spatial to the frequency domain; to return to the spatial domain we
use the inverse Fourier transform. Two fundamental properties relate the spatial and frequency
domains:
1. Convolution, which is the basis for filtering in the spatial domain, is equivalent to
multiplication in the frequency domain, and vice versa.
2. An impulse of strength A in the spatial domain is a constant of value A in the frequency
domain, and vice versa. In addition, a function (e.g., an image) can be expressed as a sum of
sinusoids of different frequencies and amplitudes.
Thus, the appearance of an image depends on the frequencies of its sinusoidal components—
change the frequencies of those components, and you will change the appearance of the image.
What makes this a powerful concept is that it is possible to associate certain frequency bands with
image characteristics. For example, regions of an image with intensities that vary slowly (e.g., the
walls in an image of a room) are characterized by sinusoids of low frequencies. Similarly, edges and
other sharp intensity transitions are characterized by high frequencies. Thus, reducing the high
frequency components of an image will tend to blur it. Linear filtering is concerned with finding
suitable ways to modify the frequency content of an image. In the spatial domain we do this via
convolution filtering. In the frequency domain we do it with multiplicative filters.

For example, consider a 1-D function (such as an intensity scan line through an image) and suppose
that we want to eliminate all its frequencies above a cutoff value, u0 , while “passing” all
frequencies below that value. Figure (a) shows a frequency-domain filter function for doing this.
(The term filter transfer function is used to denote filter functions in the frequency domain—this is
analogous to our use of the term “filter kernel” in the spatial domain.) Appropriately, the function in
Fig. (a) is called a lowpass filter transfer function. In fact, this is an ideal lowpass filter function
because it eliminates all frequencies above u0, while passing all frequencies below this value. That
is, the transition of the filter between low and high frequencies is instantaneous. Such filter
functions are not realizable with physical components, and have issues with “ringing” when
implemented digitally.
To lowpass-filter a spatial signal in the frequency domain, we first convert it to the frequency
domain by computing its Fourier transform, and then multiply the result by the filter transfer
function in Fig. (a) to eliminate frequency components with values higher than u0. To return to the
spatial domain, we take the inverse Fourier transform of the filtered signal. The result will be a
blurred spatial domain function.
Because of the duality between the spatial and frequency domains, we can obtain the same result in
the spatial domain by convolving the equivalent spatial domain filter kernel with the input spatial
function. The equivalent spatial filter kernel is the inverse Fourier transform of the frequency-
domain filter transfer function.
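
The 2-D analogue of this procedure is sketched below (hypothetical function name; NumPy's FFT routines assumed): transform the image, multiply by an ideal lowpass transfer function, and transform back. As noted above, the sharp cutoff of an ideal filter tends to introduce ringing.

```python
import numpy as np

def ideal_lowpass(img, cutoff):
    """Ideal lowpass filtering in the frequency domain: keep frequency components whose
    distance from the center of the (shifted) spectrum is <= cutoff, zero out the rest."""
    M, N = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))               # Fourier transform, origin at the center
    u = np.arange(M) - M // 2
    v = np.arange(N) - N // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)      # distance from the spectrum center
    H = (D <= cutoff).astype(float)                     # ideal lowpass filter transfer function
    G = H * F                                           # filtering = multiplication
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))   # back to the spatial domain

rng = np.random.default_rng(5)
img = rng.uniform(size=(128, 128))
blurred = ideal_lowpass(img, cutoff=15)
print(round(img.std(), 3), round(blurred.std(), 3))     # the filtered image varies far less
```
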
Approaches of constructing spatial filters
There are three general approaches as described below.
• One approach is to formulate filters based on mathematical properties. For
example, a filter that computes the average of pixels in a neighborhood blurs an image.
Computing an average is analogous to integration. Conversely, a filter that computes the
local derivative of an image sharpens the image.
• A second approach is based on sampling a 2-D spatial function whose shape has a desired
property. For example, we will show in the next section that samples from a Gaussian
function can be used to construct a weighted-average (lowpass) filter. These 2-D spatial
functions sometimes are generated as the inverse Fourier transform of 2-D filters specified
in the frequency domain.
• A third approach is to design a spatial filter with a specified frequency response. This
approach is based on the concepts discussed in the previous section, and falls in the area of
digital filter design. A 1-D spatial filter with the desired response is obtained (typically using
filter design software). The 1-D filter values can be expressed as a vector v, and a 2-D
separable kernel can then be obtained. Or the 1-D filter can be rotated about its centre to
generate a 2-D kernel that approximates a circularly symmetric function.
Smoothing (Lowpass) Spatial Filters
Smoothing (also called averaging) spatial filters are used to reduce sharp transitions in intensity.
Because random noise typically consists of sharp transitions in intensity, an obvious application of
smoothing is noise reduction. Smoothing prior to image resampling to reduce aliasing is also a
common application. Smoothing is used to reduce irrelevant detail in an image, where “irrelevant”
refers to pixel regions that are small with respect to the size of the filter kernel. Another application
is for smoothing the false contours that result from using an insufficient number of intensity levels
in an image. Smoothing filters are used in combination with other techniques for image
enhancement, such as the histogram processing techniques.
Box Filter Kernels
The simplest, separable lowpass filter kernel is the box kernel, whose coefficients have the same
value (typically 1). The name “box kernel” comes from a constant kernel resembling a box when
viewed in 3-D. An m × n box filter is an m × n array of 1’s, with a normalizing constant in front,
whose value is 1 divided by the sum of the values of the coefficients (i.e., 1/mn when all the
coefficients are 1’s). This normalization, which we apply to all lowpass kernels, has two purposes.
First, the average value of an area of constant intensity would equal that intensity in the filtered
image, as it should. Second, normalizing the kernel in this way prevents introducing a bias during
filtering; that is, the sum of the pixels in the original and filtered images will be the same. Because
all rows and columns of a box kernel are identical, box kernels are separable.
(Figure: (a) Test pattern of size 1024 × 1024 pixels. (b)-(d) Results of lowpass filtering with box
kernels of sizes 3 × 3, 11 × 11, and 21 × 21, respectively.)

(Figure: Examples of smoothing kernels: (a) a box kernel; (b) a Gaussian kernel.)

(Figure: (a) Sampling a Gaussian function to obtain a discrete Gaussian kernel; the values shown are
for K = 1 and σ = 1. (b) Resulting 3 × 3 kernel.)

Lowpass Gaussian Filter Kernels


Because of their simplicity, box filters are suitable for quick experimentation and they often yield
smoothing results that are visually acceptable. They are useful also when it is desired to reduce the
effect of smoothing on edges.
However, box filters have limitations that make them poor choices in many applications. For
example, a defocused lens is often modeled as a lowpass filter, but box filters are poor
approximations to the blurring characteristics of lenses.
Another limitation is the fact that box filters favor blurring along perpendicular directions. In
applications involving images with a high level of detail, or with strong geometrical components,
the directionality of box filters often produces undesirable results.
These are but two applications in which box filters are not suitable. The kernels of choice in
applications such as those just mentioned are circularly symmetric (also called isotropic, meaning
their response is independent of orientation). As it turns out, Gaussian kernels of the form
w(s, t) = G(s, t) = K e^(−(s² + t²)/(2σ²)),
are the only circularly symmetric kernels that are also separable. Thus,
because Gaussian kernels of this form are separable, Gaussian filters enjoy the same computational
advantages as box filters, but have a host of additional properties that make them ideal for image
processing.
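As a sketch of how a Gaussian kernel can be obtained by sampling, assuming the form w(s, t) = K e^(−(s² + t²)/(2σ²)) given above:

```python
import numpy as np

def gaussian_kernel(size, sigma, K=1.0):
    """Sample w(s, t) = K * exp(-(s^2 + t^2) / (2 * sigma^2)) on an odd-sized grid
    and normalize the result so the coefficients sum to 1."""
    half = size // 2
    s, t = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    w = K * np.exp(-(s**2 + t**2) / (2.0 * sigma**2))
    return w / w.sum()

w = gaussian_kernel(3, 1.0)   # the 3 x 3 kernel discussed above (K = 1, sigma = 1)
```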
Figure: (a) A test pattern of size 1024 × 1024. (b) Result of lowpass filtering the pattern with a Gaussian kernel of size 21 × 21, with standard deviation σ = 3.5. (c) Result of using a kernel of size 43 × 43, with σ = 7. K = 1 was used in all cases.
Using lowpass filtering and thresholding for region extraction.
Figure (a) is a 2566 × 2758 Hubble Telescope image of the Hickson Compact Group (see figure
caption), whose intensities were scaled to the range [0,1]. Our objective is to illustrate lowpass
filtering combined with intensity thresholding for eliminating irrelevant detail in this image. In the
present context, “irrelevant” refers to pixel regions that are small compared to kernel size.
Figure (b) is the result of filtering the original image with a Gaussian kernel of size 151 × 151
(approximately 6% of the image width) and standard deviation σ = 25. We chose these parameter values in order to generate a sharper, more selective Gaussian kernel shape than we used in earlier
examples. The filtered image shows four predominantly bright regions. We wish to extract only
those regions from the image. Figure (c) is the result of thresholding the filtered image with a
threshold T = 0.4. As the figure shows, this approach effectively extracted the four regions of
interest, and eliminated details deemed irrelevant in this application.
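A minimal sketch of the lowpass-plus-thresholding idea, using SciPy's gaussian_filter as a stand-in for the 151 × 151 kernel described above; the random image is an assumed placeholder.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Assumed placeholder: an image scaled to [0, 1], as in the example above.
f = np.random.rand(1024, 1024)

# Gaussian lowpass filtering; sigma = 25 plays the role of the kernel above.
g = gaussian_filter(f, sigma=25)

# Threshold the blurred image to keep only the large, bright regions.
T = 0.4
regions = (g > T).astype(float)
```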
Figure: (a) A 2566 × 2758 Hubble Telescope image of the Hickson Compact Group. (b) Result of lowpass filtering with a Gaussian kernel. (c) Result of thresholding the filtered image (intensities were scaled to the range [0, 1]). The Hickson Compact Group contains dwarf galaxies that have come together, setting off thousands of new star clusters. (Original image courtesy of NASA.)
Order-Statistic (Nonlinear) Filters
Order-statistic filters are nonlinear spatial filters whose response is based on ordering (ranking) the
pixels contained in the region encompassed by the filter. Smoothing is achieved by replacing the
value of the center pixel with the value determined by the ranking result. The best-known filter in
this category is the median filter, which, as its name implies, replaces the value of the center pixel
by the median of the intensity values in the neighborhood of that pixel (the value of the center pixel
is included in computing the median).
Median filters provide excellent noise reduction capabilities for certain types of random noise, with
considerably less blurring than linear smoothing filters of similar size. Median filters are
particularly effective in the presence of impulse noise (sometimes called salt-and-pepper noise,
when it manifests itself as white and black dots superimposed on an image).
In order to perform median filtering at a point in an image, we first sort the values of the pixels in
the neighborhood, determine their median, and assign that value to the pixel in the filtered image
corresponding to the center of the neighborhood. For example, in a 3 × 3 neighborhood the median
is the 5th largest value, in a 5 × 5 neighborhood it is the 13th largest value, and so on. When several
values in a neighborhood are the same, all equal values are grouped. For example, suppose that a 3
× 3 neighborhood has values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10,
15, 20, 20, 20, 20, 20, 25, 100), which results in a median of 20.
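The worked example above can be checked directly, and median filtering of a whole image is available, for instance, through scipy.ndimage.median_filter; the random image below is an assumed placeholder.

```python
import numpy as np
from scipy.ndimage import median_filter

# The 3 x 3 neighborhood values from the example above.
neighborhood = np.array([10, 20, 20, 20, 15, 20, 20, 25, 100])
print(np.median(neighborhood))        # 20.0, as computed in the text

# Median filtering an image with a 3 x 3 window.
f = np.random.rand(256, 256)
g = median_filter(f, size=3)
```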
The median filter is by far the most useful order-statistic filter in image processing, but is not the
only one. The median represents the 50th percentile of a ranked set of numbers, but ranking lends
itself to many other possibilities. For example, using the 100th percentile results in the so-called
max filter, which is useful for finding the brightest points in an image or for eroding dark areas
adjacent to light regions. The 0th percentile filter is the min filter, used for the opposite purpose.
Sharpening (Highpass) Spatial Filters
Sharpening highlights transitions in intensity. Uses of image sharpening range from electronic
printing and medical imaging to industrial inspection and autonomous guidance in military systems.
Unsharp Masking And Highboost Filtering
Subtracting an unsharp (smoothed) version of an image from the original image is a process that has
been used since the 1930s by the printing and publishing industry to sharpen images. This process,
called unsharp masking, consists of the following steps:
1. Blur the original image.
2. Subtract the blurred image from the original (the resulting difference is called the mask.)
3. Add the mask to the original.
Let fB(x, y) denote the blurred image and f(x, y) the original image; then the mask is defined as gmask(x, y) = f(x, y) − fB(x, y). To obtain the sharpened image, we add a weighted portion of the mask back to the original image: fsharp(x, y) = f(x, y) + k·gmask(x, y). When k = 1 we have unsharp masking, as defined
above. When k > 1, the process is referred to as highboost filtering. Choosing k < 1 reduces the
contribution of the unsharp mask.
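A minimal sketch of unsharp masking and highboost filtering, assuming a Gaussian blur for step 1 and a placeholder random image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(f, k=1.0, sigma=3.0):
    """Sharpen f by adding k times the unsharp mask (f minus a blurred copy of f).
    k = 1 gives unsharp masking; k > 1 gives highboost filtering."""
    f = f.astype(float)
    f_blur = gaussian_filter(f, sigma=sigma)   # step 1: blur the original
    g_mask = f - f_blur                        # step 2: form the mask
    return f + k * g_mask                      # step 3: add the (weighted) mask back

f = np.random.rand(256, 256)                   # assumed placeholder image
sharpened = unsharp_mask(f, k=2.0)             # highboost filtering
```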
Sobel filter: Let a 3 × 3 region of an image be represented as follows, with z5 denoting the value of the center pixel:
z1 z2 z3
z4 z5 z6
z7 z8 z9
The Sobel filters, also known as Sobel operators (kernels), correspond to the expressions for gx and gy given below. The kernel for gx is
−1 −2 −1
 0  0  0
 1  2  1
and the kernel for gy is
−1 0 1
−2 0 2
−1 0 1
We prefer to use kernels of odd sizes because they have a unique (integer) center of spatial
symmetry. The smallest kernels in which we are interested are of size 3 × 3. Approximations to gx
and gy using a 3 × 3 neighborhood centered on z5 are as follows:
gx = ∂f/∂x = (z7 + 2z8 + z9) − (z1 + 2z2 + z3) and gy = ∂f/∂y = (z3 + 2z6 + z9) − (z1 + 2z4 + z7).
The resultant image intensities are computed as
M(x, y) = [gx² + gy²]^(1/2) = { [(z7 + 2z8 + z9) − (z1 + 2z2 + z3)]² + [(z3 + 2z6 + z9) − (z1 + 2z4 + z7)]² }^(1/2).
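A sketch of applying the Sobel kernels and forming the gradient magnitude M(x, y); the random image is an assumed placeholder, and scipy.ndimage.correlate handles the border pixels by reflection.

```python
import numpy as np
from scipy.ndimage import correlate

# Sobel kernels corresponding to the gx and gy expressions above.
sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
sobel_y = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

f = np.random.rand(256, 256)               # assumed placeholder image
gx = correlate(f, sobel_x, mode='reflect')
gy = correlate(f, sobel_y, mode='reflect')
M = np.hypot(gx, gy)                       # gradient magnitude M(x, y)
```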
Fourier Series And Transform
The Fourier series is named after the French mathematician Jean Baptiste Joseph Fourier. His
contribution was that any periodic function can be expressed as the sum of sines and/or cosines of
different frequencies, each multiplied by a different coefficient. It does not matter how complicated
the function is; if it is periodic and satisfies some mathematical conditions, it can be represented by
such a sum.
Functions that are not periodic (but whose area under the curve is finite) can be expressed as
the integral of sines and/or cosines multiplied by a weighting function. The formulation in this case
is the Fourier transform, and its utility is even greater than the Fourier series in many theoretical
and applied disciplines.
Both representations share the important characteristic that a function, expressed in either a
Fourier series or transform, can be reconstructed (recovered) completely via an inverse process,
with no loss of information. This is one of the most important characteristics of these
representations because it allows us to work in the Fourier domain (generally called the frequency
domain) and then return to the original domain of the function without losing any information.
Frequency Domain Filtering Fundamentals
Filtering in the frequency domain consists of modifying the Fourier transform of an image, then
computing the inverse transform to obtain the spatial domain representation of the processed result.
Thus, given (a padded) digital image, f(x,y), of size P ×Q pixels, the basic filtering equation in
which we are interested has the form:
g( x , y ) = Real { F−1 [ H (u , v) F(u , v)] }, where F−1 is the IDFT, F(u,v) is the DFT of the input
image, f(x, y), H(u,v) is a filter transfer function (which we often call just a filter or filter function),
and g(x, y) is the filtered (output) image. Functions F, H, and g are arrays of size P ×Q, the same as
the padded input image. The product H(u,v)F(u,v) is formed using elementwise multiplication.
One of the simplest filter transfer functions we can construct is a function H(u,v) that is 0 at
the center of the (centered) transform, and 1’s elsewhere. This filter would reject the dc term and
“pass” (i.e., leave unchanged) all other terms of F(u,v) when we form the product H(u,v)F(u,v). We
know that the dc term is responsible for the average intensity of an image, so setting it to zero will
reduce the average intensity of the output image to zero; as a result, the filtered image appears much darker. An
average of zero implies the existence of negative intensities.
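A minimal sketch of this dc-rejection filter, assuming a placeholder random image and ignoring padding for simplicity.

```python
import numpy as np

f = np.random.rand(256, 256)               # assumed placeholder image
F = np.fft.fftshift(np.fft.fft2(f))        # centered DFT

H = np.ones_like(F, dtype=float)
H[F.shape[0] // 2, F.shape[1] // 2] = 0.0  # reject the dc term, pass everything else

g = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
print(g.mean())                            # approximately zero average intensity
```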
As noted earlier, low frequencies in the transform are related to slowly varying intensity
components in an image, such as the walls of a room or a cloudless sky in an outdoor scene. On the
other hand, high frequencies are caused by sharp transitions in intensity, such as edges and noise.
Therefore, we would expect that a function H(u,v) that attenuates (blocks) high frequencies while passing low frequencies (called a lowpass filter, as noted before) would blur an image, while a filter with the opposite property (called a highpass filter) would enhance sharp detail but cause a reduction in contrast in the image.
Adding a small constant to the filter does not affect sharpening appreciably, but it does
prevent elimination of the dc term and thus preserves tonality.
Wraparound, saturation and Zero padding: If an image is represented in a byte or integer pixel
format, the maximum pixel value is limited by the number of bits used for the representation, e.g.
the pixel values of an 8-bit image are limited to 255. However, many image processing operations produce output values which are likely to exceed the given maximum value. In such cases, we have to decide how to handle this pixel overflow. This is especially true for Fourier-transform-based operations, because they involve combining (for example, adding) values derived from two images.
One possibility is to wrap around the overflowing pixel values. This means that if a value is
greater than the possible maximum, we subtract the pixel value range so that the value starts again
from the possible minimum value. For 8-bit format it means that 256 will be replaced by 0, 257 by
1, and so on.
Another possibility is to set all overflowing pixels to the maximum possible value, an effect known as saturation. For the 8-bit format, this means that all values greater than 255 will be replaced by 255.
If only a few pixels in the image exceed the maximum value it is often better to apply the
latter technique, especially if we use the image for display purposes. However, by setting all
overflowing pixels to the same value, we lose a substantial amount of information. In the worst case,
when all pixels exceed the maximum value, this would lead to an image of constant pixel values.
Wrapping around overflowing pixel values retains the differences between values. On the other hand, it
might cause the problem that pixel values passing the maximum ‘jump’ from the maximum to the
minimum value. This may distort the resulting image.
In digital image processing, a filter works like a sliding window containing (usually) an odd
number of pixels arranged in a grid. The filter is moved across the image, row by row, from the top left corner toward the bottom right. For the pixels that are on the edges of the image, some part of
the filter will be outside the image area. This is compensated by wrapping the filter around the
opposite end of the image. This is known as circular convolution. It means that in circular
convolution, pixels on the edges of the image will influence the pixels on the opposite end of the
image. This may result in distortion on the edges of the image(s).
To remedy this problem, extra rows and columns of pixels are appended along the edges of the image, and those pixels are usually set to zero. This is known as zero padding.
There are two other types of padding –
a) Mirror padding – In mirror padding, values outside the boundary of the image are
obtained by mirror-reflecting the image across its border.
b) Replicate padding – In replicate padding, values outside the boundary are set equal to the
nearest image border value.
Let us consider two functions, f (x) and h(x) composed of A and B samples, respectively. It
can be shown that if we append zeros to both functions so that they have the same length, denoted
by P, then wraparound is avoided by choosing P ≥ A + B − 1. For example, if each function has 400
points, so the minimum value we could use is P = 799, which implies that we would append 399
zeros to the trailing edge of each function.
In the two-dimensional case, the above can be expressed as follows. Let f(x, y) and h(x, y) be two images of size A × B and C × D, respectively. Then P ≥ A + C − 1 and Q ≥ B + D − 1, where P and Q are the dimensions of the padded result. If both arrays are of the same size, M × N, then P ≥ 2M − 1 and Q ≥ 2N − 1.
As a rule, DFT algorithms tend to execute faster with arrays of even size, so it is good
practice to select P and Q as the smallest even integers that satisfy the preceding equations. If the
two arrays are of the same size, this means that P and Q are selected as: P = 2M and Q = 2N.
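A small sketch of computing the padding sizes and padding an image with NumPy; np.pad's 'reflect' and 'edge' modes correspond roughly to the mirror and replicate padding described earlier.

```python
import numpy as np

def padded_size(M, N):
    """P = 2M, Q = 2N: the smallest even sizes satisfying P >= 2M - 1, Q >= 2N - 1."""
    return 2 * M, 2 * N

f = np.random.rand(300, 400)               # assumed placeholder image of size M x N
M, N = f.shape
P, Q = padded_size(M, N)
fp = np.pad(f, ((0, P - M), (0, Q - N)), mode='constant')   # zero padding
# mode='reflect' or mode='edge' would give mirror or replicate padding instead.
print(fp.shape)                            # (600, 800)
```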
Steps For Filtering In The Frequency Domain
The process of filtering in the frequency domain can be summarized as follows:
1. Given an input image f (x, y) of size M × N, obtain the padding sizes P and Q; that is, P =
2M and Q = 2N.
2. Form a padded image fp(x, y) of size P ×Q using zero-, mirror-, or replicate padding.
3. Multiply fp(x, y) by (−1)^(x+y) to center the Fourier transform on the P × Q frequency rectangle.
4. Compute the DFT, F(u,v), of the image from Step 3.
5. Construct a real, symmetric filter transfer function, H(u,v), of size P ×Q with center at
(P/2,Q/2).
6. Form the product G(u,v) = H(u,v)F(u,v) using elementwise multiplication; that is,
G(i,k) = H(i,k)F(i,k) for i = 0, 1, 2, …, P − 1 and k = 0, 1, 2, …, Q − 1.
7. Obtain the filtered image (of size P ×Q) by computing the IDFT of G(u,v):
gp(x, y) = ( real[ F−1{ G(u, v) } ] ) (−1)^(x+y).
8. Obtain the final filtered result, g(x, y), of the same size as the input image, by extracting the
M×N region from the top, left quadrant of gp(x,y).
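A sketch of the eight steps as a single function, assuming a user-supplied transfer-function builder make_H(P, Q) (a hypothetical name) that returns a real, centered P × Q array.

```python
import numpy as np

def frequency_filter(f, make_H):
    """Filter image f in the frequency domain, following the eight steps above.
    make_H(P, Q) is assumed to return a real, centered P x Q transfer function."""
    M, N = f.shape
    P, Q = 2 * M, 2 * N                                      # step 1: padding sizes
    fp = np.pad(f.astype(float), ((0, P - M), (0, Q - N)))   # step 2: zero padding
    cx = (-1.0) ** np.add.outer(np.arange(P), np.arange(Q))  # step 3: centering term (-1)^(x+y)
    F = np.fft.fft2(fp * cx)                                 # step 4: DFT of the centered image
    H = make_H(P, Q)                                         # step 5: transfer function
    G = H * F                                                # step 6: elementwise product
    gp = np.real(np.fft.ifft2(G)) * cx                       # step 7: IDFT, keep real part, de-center
    return gp[:M, :N]                                        # step 8: crop to the original size

# Example use, assuming a centered lowpass transfer function such as the
# gaussian_lowpass() sketched later in this document:
# g = frequency_filter(f, lambda P, Q: gaussian_lowpass(P, Q, D0=60))
```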
Correspondence Between Filtering In The Spatial And Frequency Domains
The link between filtering in the spatial and frequency domains is the convolution theorem. We
defined filtering in the frequency domain as the elementwise product of a filter transfer function,
H(u,v), and F(u,v), the Fourier transform of the input image. It can be proven that, given an H(u,v),
its equivalent kernel in the spatial domain is F−1{H(u,v)}, the inverse transform of the frequency domain filter transfer function.
Conversely, it follows from a similar analysis that, given a spatial filter kernel, we obtain its
frequency domain representation by taking the forward Fourier transform of the kernel. Therefore,
the two filters form a Fourier transform pair: h(x, y)⇔ H(u,v) where h(x, y) is the spatial kernel and
H(u,v) is the frequency kernel. Because this kernel can be obtained from the response of a
frequency domain filter to an impulse, h(x, y) sometimes is referred to as the impulse response of
H(u,v). Also, because all quantities in a discrete implementation are finite, such filters are called
finite impulse response (FIR) filters.
Image Smoothing Using Lowpass Frequency Domain Filters
Edges and other sharp intensity transitions (such as noise) in an image contribute significantly to the
high frequency content of its Fourier transform. Hence, smoothing (blurring) is achieved in the
frequency domain by high-frequency attenuation; that is, by lowpass filtering. We consider three
types of lowpass filters: ideal, Butterworth, and Gaussian. These three categories cover the range
from very sharp (ideal) to very smooth (Gaussian) filtering.
Ideal Lowpass Filters
A 2-D lowpass filter that passes without attenuation all frequencies within a circle of radius D0 from the origin, and “cuts off” all frequencies outside this circle, is called an ideal lowpass filter (ILPF); it is specified by the transfer function
H(u, v) = 1 if D(u, v) ≤ D0, and H(u, v) = 0 if D(u, v) > D0,
where D0 is a positive constant, and D(u, v) is the distance between a point (u, v) in the frequency domain and the center of the P × Q frequency rectangle; that is, D(u, v) = [ (u − P/2)² + (v − Q/2)² ]^(1/2).
Figure: (a) Perspective plot of an ideal lowpass-filter transfer function. (b) Function displayed as an image. (c) Radial cross section.
The name ideal indicates that all frequencies on or inside a circle of radius D0 are passed without
attenuation, whereas all frequencies outside the circle are completely attenuated (filtered out). The
ideal lowpass filter transfer function is radially symmetric about the origin. This means that it is
defined completely by a radial cross section. A 2-D representation of the filter is obtained by
rotating the cross section 360°.
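A minimal sketch of constructing an ILPF transfer function on a P × Q frequency rectangle, following the definition above.

```python
import numpy as np

def distance_grid(P, Q):
    """D(u, v): distance from each point to the center of the P x Q frequency rectangle."""
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    return np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)

def ideal_lowpass(P, Q, D0):
    """ILPF: 1 inside a circle of radius D0 about the center, 0 outside."""
    return (distance_grid(P, Q) <= D0).astype(float)

H = ideal_lowpass(512, 512, D0=60)
```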
Figure: (a) Original image of size 688 × 688 pixels. (b)–(f) Results of filtering using ILPFs with cutoff frequencies set at radii values 10, 30, 60, 160, and 460.
Gaussian Lowpass Filters
Gaussian lowpass filter (GLPF) transfer functions have the form H(u, v) = e^(−D²(u,v)/(2σ²)), where D(u, v) is the distance from the center of the P × Q frequency rectangle to any point (u, v) contained by the rectangle. As before, σ is a measure of spread about the center. By letting σ = D0, we can express the Gaussian transfer function in the same notation as the other functions: H(u, v) = e^(−D²(u,v)/(2D0²)), where D0 is the cutoff frequency.
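A corresponding sketch for the GLPF transfer function.

```python
import numpy as np

def gaussian_lowpass(P, Q, D0):
    """GLPF transfer function: H(u, v) = exp(-D^2(u, v) / (2 * D0^2))."""
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    D2 = (u - P / 2) ** 2 + (v - Q / 2) ** 2
    return np.exp(-D2 / (2.0 * D0 ** 2))

H = gaussian_lowpass(512, 512, D0=60)
```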
Figure: (a) Perspective plot of a GLPF transfer function. (b) Function displayed as an image. (c) Radial cross sections for various values of D0.
Figure: (a) Original image of size 688 × 688 pixels. (b)–(f) Results of filtering using GLPFs with cutoff frequencies at the same radii as in the ILPF example.
Butterworth Lowpass Filters
The transfer function of a Butterworth lowpass filter (BLPF) of order n, with cutoff frequency at a
distance D0 from the center of the frequency rectangle, is defined as
H(u, v) = 1 / ( 1 + [ D(u, v)/D0 ]^(2n) )
The BLPF transfer function can be controlled to approach the characteristics of the ILPF by using higher values of n, and those of the GLPF by using lower values of n, while providing a smooth transition from low to high frequencies.
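And a sketch of the BLPF transfer function of order n.

```python
import numpy as np

def butterworth_lowpass(P, Q, D0, n):
    """BLPF transfer function: H(u, v) = 1 / (1 + [D(u, v) / D0]^(2n))."""
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    return 1.0 / (1.0 + (D / D0) ** (2 * n))

H = butterworth_lowpass(512, 512, D0=60, n=2)   # n controls the sharpness of the transition
```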
Figure: (a) Perspective plot of a Butterworth lowpass-filter transfer function. (b) Function displayed as an image. (c) Radial cross sections of BLPFs of orders 1 through 4.
Image Sharpening Using Highpass Filters
Because edges and other abrupt changes in intensity are associated with high-frequency components, image sharpening can be achieved in the frequency domain by highpass filtering, which attenuates low-frequency components without disturbing the high-frequency components of the Fourier transform.
Ideal, Gaussian, And Butterworth Highpass Filters From Lowpass Filters
As was the case with kernels in the spatial domain, subtracting a lowpass filter transfer function
from 1 yields the corresponding highpass filter transfer function in the frequency domain:
HHP(u,v) = 1 − HLP(u,v), where HLP(u,v) is the transfer function of a lowpass filter. Thus, an ideal
highpass filter (IHPF) transfer function is given by
H(u, v) = 0 if D(u, v) ≤ D0, and H(u, v) = 1 if D(u, v) > D0,
where D0 is the cutoff frequency and D(u, v) is defined as before.
Figure: Top row: perspective plot, image, and radial cross section of an IHPF transfer function. Middle and bottom rows: the same sequence for GHPF and BHPF transfer functions.
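A minimal sketch of obtaining a highpass transfer function from a lowpass one via HHP(u, v) = 1 − HLP(u, v), using the ideal case as an example.

```python
import numpy as np

def ideal_highpass(P, Q, D0):
    """IHPF transfer function obtained as 1 minus the corresponding ILPF."""
    u, v = np.meshgrid(np.arange(P), np.arange(Q), indexing='ij')
    D = np.sqrt((u - P / 2) ** 2 + (v - Q / 2) ** 2)
    H_lowpass = (D <= D0).astype(float)
    return 1.0 - H_lowpass        # HHP(u, v) = 1 - HLP(u, v)

# The same subtraction turns the Gaussian and Butterworth lowpass transfer
# functions into GHPF and BHPF transfer functions, respectively.
H = ideal_highpass(512, 512, D0=60)
```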