Unit-3 Notes CV
Image Segmentation
Image segmentation is the process of dividing a digital image into multiple segments (sets of
pixels) to simplify the image and make it more meaningful for analysis. The goal is to assign
labels to every pixel in the image such that pixels with the same label share certain
characteristics. Segmentation helps in isolating objects and boundaries within an image,
which are critical for various applications like medical imaging, object recognition, and
computer vision.
1. Thresholding: Separates objects from the background based on pixel intensity values.
o Example: Otsu’s method for determining an optimal threshold.
2. Edge-based Segmentation: Identifies object boundaries by detecting edges, which
are sharp changes in intensity.
o Example: Sobel, Canny edge detectors.
3. Region-based Segmentation: Groups pixels into regions based on predefined criteria
such as intensity or texture.
o Example: Region growing, Watershed algorithm.
4. Clustering-based Segmentation: Divides the image into clusters based on pixel
attributes using algorithms like k-means or mean-shift clustering.
o Example: K-means clustering, Gaussian Mixture Models (GMM).
5. Deep Learning-based Segmentation: Uses convolutional neural networks (CNNs)
for tasks like semantic segmentation or instance segmentation.
o Example: U-Net, Fully Convolutional Networks (FCN), Mask R-CNN.
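As a concrete illustration of the clustering-based approach, the following minimal sketch segments a colour image with k-means in OpenCV. The file name "input.jpg" and the choice of k = 3 clusters are illustrative assumptions, not values taken from these notes.

```python
# Clustering-based segmentation: k-means on pixel colours (OpenCV).
import cv2
import numpy as np

img = cv2.imread("input.jpg")                      # BGR input image
pixels = img.reshape(-1, 3).astype(np.float32)     # one row per pixel

k = 3                                              # number of clusters (segments)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace each pixel with its cluster centre to visualise the segments
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite("segmented.jpg", segmented)
```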
Image Representation
Image representation involves converting segmented image regions into a form that can be
easily analyzed. This step is crucial as it provides the foundation for feature extraction, object
recognition, and interpretation. The choice of representation depends on the nature of the
application and the type of analysis required.
Types of Representations:
Image Description
Applications:
Together, segmentation, representation, and description form the foundation for advanced
image analysis tasks like pattern recognition, computer vision, and machine learning in image
processing.
1. Point Detection
Point detection is often used to find specific key points in an image, such as corners or blobs.
Common interest-point detectors include corner detectors (e.g., the Harris detector) and blob
detectors such as the Laplacian of Gaussian (LoG).
2. Edge Detection
Edge detection algorithms detect areas in an image where intensity changes sharply,
indicating the boundary of objects.
Canny Edge Detector: A widely used multi-stage edge detector. It uses Gaussian filtering,
gradient calculation, non-maximum suppression, and hysteresis thresholding to detect strong
and weak edges.
Sobel Operator: A gradient-based method that detects edges using convolution with Sobel
kernels in the horizontal and vertical directions.
Prewitt Operator: Similar to Sobel, but with slightly different kernel coefficients.
Laplacian of Gaussian (LoG): Detects edges by first applying a Gaussian blur to reduce
noise and then applying the Laplacian operator to highlight regions of rapid intensity change.
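The detectors above are all available in OpenCV; a minimal sketch is shown below. The file name and the Canny thresholds (100, 200) are illustrative choices.

```python
# Edge detection with Sobel, Canny, and Laplacian of Gaussian (OpenCV).
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Sobel: horizontal and vertical gradients combined into a magnitude image
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_mag = cv2.magnitude(gx, gy)

# Canny: smoothing, gradients, non-maximum suppression, hysteresis thresholding
edges = cv2.Canny(gray, 100, 200)

# Laplacian of Gaussian: Gaussian blur to reduce noise, then the Laplacian
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
log_edges = cv2.Laplacian(blurred, cv2.CV_64F)
```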
3. Line Detection
Line detection algorithms aim to find straight lines in an image. The most commonly used
algorithm is:
Hough Transform (for lines): A voting-based algorithm that detects lines by transforming
edge points into parameter space and finding peaks corresponding to potential lines.
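A minimal sketch of Hough line detection with OpenCV follows; the Canny thresholds and the Hough parameters (vote threshold, minimum line length, maximum gap) are illustrative values.

```python
# Line detection with the probabilistic Hough transform (OpenCV).
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)             # Hough works on an edge map

# Each detected line is returned as its two endpoints (x1, y1, x2, y2)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=10)

out = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(out, (x1, y1), (x2, y2), (0, 0, 255), 2)   # draw lines in red
```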
4. Corner Detection
Corner detection algorithms identify points where two or more edges meet. Corners are
useful for tracking features in images.
Harris Corner Detector: Uses the gradient of image intensity and looks for regions with
significant variation in intensity in two directions.
Shi-Tomasi Corner Detector (Good Features to Track): An improvement on the Harris
Corner Detector; it selects corners based on the minimum of the two eigenvalues of the local
structure (second-moment) matrix.
FAST (Features from Accelerated Segment Test): A fast corner detection algorithm that
checks the intensity difference between a candidate pixel and its surrounding pixels in a
circular neighborhood.
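All three corner detectors are available in OpenCV; the sketch below shows one possible usage, with the block size, quality level, and FAST threshold as illustrative parameters.

```python
# Harris, Shi-Tomasi, and FAST corner detection (OpenCV).
import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Harris: a response map in which large values indicate corners
harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)

# Shi-Tomasi ("good features to track"): up to 100 strong corner points
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=10)

# FAST: keypoints from the accelerated segment test on a circular neighbourhood
fast = cv2.FastFeatureDetector_create(threshold=25)
keypoints = fast.detect(gray, None)
```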
In a grayscale image, each pixel has an intensity value that typically ranges between 0 (black)
and 255 (white). Thresholding works by selecting a threshold value, T, and then classifying
each pixel based on whether its intensity is above or below this value.
1. Global Thresholding:
o A single threshold value T is selected for the entire image.
o Pixels with intensity values greater than T are classified as foreground (object), and
pixels with intensity values less than or equal to T are classified as background.
2. Adaptive Thresholding:
o Instead of using a single threshold for the entire image, adaptive thresholding
calculates different thresholds for small regions of the image. This is useful for
images where lighting conditions or contrasts vary significantly.
o Two common methods are:
1. Mean Adaptive Thresholding: Uses the mean of pixel intensities in the
local region to compute the threshold.
2. Gaussian Adaptive Thresholding: Uses a weighted sum of the pixel
intensities in the local region, where nearby pixels have higher weight.
3. Otsu’s Thresholding:
o Otsu's method is an automatic global thresholding technique. It determines the
optimal threshold value by minimizing the intra-class variance (variance within the
object and background) and maximizing inter-class variance.
o It is useful when the histogram of the image has two distinct peaks, representing the
object and background.
4. Multilevel Thresholding:
o Instead of converting the image to two classes (foreground and background),
multilevel thresholding divides the image into several classes based on multiple
threshold values.
o Useful in applications where more than two regions are needed, for example,
different intensity levels of an image.
5. Band Thresholding:
o A range of intensities is used to define a foreground object.
o Pixels whose intensity falls within a specified range are classified as foreground,
while others are treated as background.
6. Inverse Thresholding:
o The logic of the thresholding is reversed.
o Pixels below the threshold are set to foreground, and pixels above are set to
background.
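A minimal sketch of the main variants described above, using OpenCV; the fixed threshold of 127, the block size of 11, and the constant 2 are illustrative choices.

```python
# Global, Otsu, adaptive, and inverse thresholding (OpenCV).
import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Global thresholding with a fixed threshold T = 127
_, global_bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Otsu's method: the threshold is chosen automatically from the histogram
otsu_t, otsu_bw = cv2.threshold(gray, 0, 255,
                                cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive (local) thresholding: mean and Gaussian-weighted variants
mean_bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)
gauss_bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)

# Inverse thresholding: the foreground/background roles are swapped
_, inv_bw = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
```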
Applications of Thresholding:
Medical Imaging: Thresholding can separate tissues from the background in X-ray or MRI
scans.
Document Binarization: Converts scanned documents into black-and-white images for
Optical Character Recognition (OCR).
Object Detection: Separates objects from the background in surveillance or industrial
imaging systems.
Key Challenges: Thresholding is sensitive to noise, non-uniform illumination, and low contrast;
it fails when the intensity distributions of the object and background overlap significantly.
Edge and Boundary Linking is a process in image processing that follows edge
detection to form continuous, meaningful boundaries or contours of objects in an image.
After detecting edges, many edge detectors (like the Sobel or Canny operator) return
disconnected or fragmented edges, which may not clearly represent the full shape of objects.
Edge and boundary linking techniques help in connecting these disjointed edges to form
complete object boundaries.
Edge Linking
Edge linking is the process of connecting isolated edge pixels to form continuous edges based
on certain criteria, such as proximity, gradient direction, and intensity. It is crucial because,
after edge detection, many real-world images produce incomplete or broken edges that must
be linked together to represent object contours accurately.
1. Local Processing:
o Neighboring edge pixels are examined in a small neighborhood and linked if their
gradient magnitude and gradient direction are sufficiently similar, producing short
connected edge segments.
2. Hough Transform:
o Used for detecting geometric shapes like lines and circles. It converts points in
the image space into a parameter space, where patterns (lines, circles) are
easier to detect.
o Edge pixels that form a straight line are detected and linked by voting for lines
in parameter space.
o Line Detection (Hough Transform): Points on an edge are mapped to
sinusoidal curves in Hough (parameter) space. A point where many curves
intersect corresponds to a line in the image.
o Circle Detection (Hough Circle Transform): Similar to line detection but
detects circular shapes by mapping edge points to circles in parameter space.
3. Edge Relaxation:
o In this technique, nearby edge pixels are examined iteratively, and the decision to link
them is refined based on the analysis of nearby pixel relationships.
o Initially weak edges might get strengthened as more information from neighboring
pixels is considered.
Boundary Linking
1. Contour Following:
o In this method, once an edge pixel is found, the algorithm "follows" along the edge to
trace out the full boundary of the object.
o A common approach is to start at an edge pixel and move to adjacent pixels that are
likely part of the same boundary based on their proximity and gradient direction.
o The Freeman Chain Code is often used to represent the sequence of directions in
which pixels are connected along the boundary.
Challenges in Edge and Boundary Linking:
1. Noise and Weak Edges: Real-world images often contain noise, and some edges may be
weak or incomplete, making the linking process challenging.
2. Texture Complexity: Highly textured regions might produce many edges that are not part of
the actual object boundary, leading to spurious linking.
3. Broken or Disconnected Edges: Disjointed or broken edges need to be linked intelligently
without introducing false boundaries.
The Canny Edge Detector includes an edge-linking step through hysteresis thresholding:
pixels above the high threshold are accepted as strong edges, pixels below the low threshold
are discarded, and weak edges in between are kept only if they are connected to a strong edge.
Region-Based Segmentation
Region-based segmentation partitions an image into regions by grouping together pixels with
similar properties. The basic idea is that neighboring pixels within the same region share
similar characteristics, while pixels from different regions have distinct properties. This
method is widely used in applications like medical imaging, satellite image analysis, and
object recognition.
1. Region Growing
o How it works: This method starts with a set of seed points, which can be manually
selected or automatically determined. The algorithm then "grows" the regions by
adding neighboring pixels to each seed that are similar to it in terms of intensity or
other features. The process continues until no more pixels can be added to the region.
o Criteria for Growing: Pixels are added to a region if their intensity or color is within
a certain threshold compared to the seed pixel. Thresholding ensures that only similar
pixels are grouped together.
o Example:
Start with a seed pixel (say, with intensity 100).
Expand the region by adding neighboring pixels with intensities close to 100
(e.g., within a range of 90-110).
Challenges:
o Selecting good seed points is crucial. Poor selection can lead to incorrect
segmentations.
o Sensitivity to noise: Noise in the image can cause unwanted regions to grow.
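The region-growing idea described above can be sketched in a few lines of Python; this is a minimal, illustrative implementation that assumes a grayscale image, a single (row, column) seed, and a ±10 intensity tolerance.

```python
# Region growing from a single seed pixel using a 4-connected flood fill.
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    """Return a boolean mask of pixels whose intensity is within tol of the seed."""
    h, w = img.shape
    seed_val = int(img[seed])
    region = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if region[y, x] or abs(int(img[y, x]) - seed_val) > tol:
            continue
        region[y, x] = True
        # enqueue 4-connected neighbours that are still inside the image
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                queue.append((ny, nx))
    return region

# Example: grow a region from the pixel at row 120, column 200
# mask = region_grow(gray, seed=(120, 200), tol=10)
```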
2. Region Splitting and Merging
o Region Splitting: If a region does not meet the homogeneity criteria, it is split into
smaller regions (usually into quadrants).
o Region Merging: Once the splitting is done, adjacent regions are examined. If two
adjacent regions are found to be similar, they are merged into a larger region.
Example: The entire image is first treated as one region and recursively split into quadrants
until each quadrant is homogeneous; similar adjacent regions are then merged back together.
Challenges: The homogeneity criterion must be chosen carefully, and the quadrant-based
splitting tends to produce blocky, axis-aligned region boundaries.
3. Watershed Algorithm
o How it works: The watershed algorithm is inspired by the concept of a landscape or
topographic surface. Imagine that high-intensity pixels represent peaks (mountains),
and low-intensity pixels represent valleys. The algorithm simulates the flooding of
this landscape, starting from the lowest points. The "water" will gradually fill the
valleys (regions) and stop where it encounters a boundary (i.e., the watershed line)
between two basins.
o Steps:
First, a gradient image is computed, where the gradient values represent the
steepness of intensity changes.
Water starts flooding from the lowest gradient points.
As the flooding progresses, boundaries are formed where waters from
different sources meet.
Applications: Separating touching or overlapping objects, for example cells in microscopy
images or coins and grains in industrial inspection.
Challenges: The basic watershed tends to over-segment noisy images; marker-controlled
variants are commonly used to limit the number of catchment basins.
Applications of Region-Based Segmentation:
o Commonly used in computer vision for object detection, video segmentation, and
object tracking.
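One common way to realise the flooding idea in practice is a marker-controlled watershed, sketched below with OpenCV. The file name "coins.jpg", the 0.5 cut-off on the distance transform, and the structuring-element sizes are illustrative assumptions.

```python
# Marker-controlled watershed for separating touching objects (OpenCV).
import cv2
import numpy as np

img = cv2.imread("coins.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Sure foreground from the distance transform, sure background from dilation
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
sure_fg = sure_fg.astype(np.uint8)
sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)

# Label each sure-foreground blob as a marker and flood from the markers
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1            # background becomes 1, objects 2, 3, ...
markers[unknown == 255] = 0      # uncertain pixels carry no label yet
markers = cv2.watershed(img, markers)
img[markers == -1] = (0, 0, 255)  # watershed (boundary) lines drawn in red
```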
Advantages of Region-Based Segmentation
1. Region Homogeneity: This method naturally segments regions that are homogeneous in
intensity or texture, making it suitable for images with well-defined areas.
2. Accurate Boundary Localization: Since region-based methods work by growing and
merging regions, they can produce more accurate boundaries compared to edge-based
segmentation in some cases.
3. Flexibility: Region-based methods can be applied to different types of images, including
grayscale and color images, by adjusting the homogeneity criteria.
Boundary Representation (or B-Rep) is a method used in computer graphics and image
processing to define the shape of an object by representing its boundaries rather than its
interior. In digital images, boundary representation focuses on outlining the objects by
identifying their edges or contours, which provides a clear and compact way to describe the
shape and structure of objects.
1. Boundary Points: Points that lie on the boundary or edge of the object, usually obtained
through edge detection.
2. Boundary Curves or Lines: The sequence of connected boundary points that form a
continuous contour or shape around the object.
Chains of pixels (in raster images) that follow the contour of the object.
Lines, arcs, or curves (in vector graphics or 3D models) that define the shape
mathematically.
1. Chain Codes
o How it Works: The boundary is represented as a sequence of direction codes (e.g., the
4- or 8-direction Freeman chain code) describing the step from each boundary pixel to
the next.
2. Polygonal Approximation
o How it Works: This method approximates a boundary by a series of straight-line
segments, effectively converting the boundary into a polygon. It simplifies complex
boundaries by representing them with fewer vertices and edges.
o Methods: The Ramer-Douglas-Peucker algorithm and split-and-merge
algorithms are commonly used for polygonal approximation.
o Benefits: Reduces complexity by smoothing minor variations and noise along the
boundary, creating a simpler and more manageable representation.
3. Boundary Descriptors
o Boundary descriptors represent boundaries by analyzing their geometric properties
rather than focusing on individual pixels or segments. Common boundary descriptors
include:
1. Curvature: Measures the change in the direction of the boundary curve at
each point.
2. Fourier Descriptors: Uses the Fourier transform to represent boundaries as a
series of sinusoidal components. Fourier descriptors are rotation and scale-
invariant, making them ideal for shape matching and recognition.
3. Shape Signatures: These are 1D functions that describe the boundary shape,
such as the distance signature (distance from each boundary point to the
object’s centroid).
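A minimal OpenCV sketch of boundary extraction and polygonal approximation follows; the file name "shape.png" and the 1% arc-length epsilon for the Ramer-Douglas-Peucker step are illustrative choices.

```python
# Boundary extraction and polygonal approximation (OpenCV).
import cv2

gray = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Ordered boundary points (pixel chains) of each object
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_NONE)

for contour in contours:
    # Ramer-Douglas-Peucker approximation of the boundary by a polygon
    epsilon = 0.01 * cv2.arcLength(contour, closed=True)
    polygon = cv2.approxPolyDP(contour, epsilon, closed=True)
    print(len(contour), "boundary points ->", len(polygon), "polygon vertices")
```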
Boundary representation is useful in various image processing and computer vision tasks,
including:
1. Object Recognition: The boundary or shape of an object can be used to classify or recognize
objects, especially when color or texture is not reliable.
2. Shape Analysis: Boundary representation enables the analysis of shapes to measure
properties like perimeter, area, curvature, and orientation.
3. Pattern Matching and Feature Extraction: Using Fourier descriptors or curvature-based
descriptors, patterns in object boundaries can be matched to known shapes.
4. Medical Imaging: Boundary representation helps in segmenting and analyzing shapes of
anatomical structures like bones, organs, and tumors.
Advantages of Boundary Representation
1. Compact Representation: Boundaries are typically more compact than representing the
entire object interior, especially for thin or elongated shapes.
2. Shape Analysis: Boundaries allow for detailed shape analysis, which is useful for object
recognition and matching.
3. Flexible with Noise Reduction: Boundary representation methods like polygonal
approximation help reduce noise by smoothing minor variations in the boundary.
Disadvantages of Boundary Representation
1. Sensitivity to Noise: Boundary extraction can be sensitive to noise, which may result in
jagged or incomplete boundaries.
2. Dependence on Accurate Edge Detection: Boundary representation relies on precise edge
detection, which can be challenging in images with low contrast or complex textures.
3. Complex Shapes: Some methods (like chain codes) may struggle with representing highly
irregular or fractal-like boundaries efficiently.
Region Representations
1. Homogeneity: Region-based methods assume that pixels within a region are homogeneous,
meaning they share similar intensity, color, or texture.
2. Connectivity: Pixels within a region are connected, meaning each pixel can be reached from
any other pixel within the same region by following a path through neighboring pixels.
3. Compactness: Region representations should ideally be compact and efficient in terms of
storage, focusing on representing the overall structure and composition of the region.
There are several ways to represent regions within an image, depending on the nature of the
image data and the application’s requirements.
1. Region-Based Segmentation
How it Works: Region-based segmentation divides an image into distinct regions based on
specific properties like intensity or color.
Example: In a medical image, regions could represent different tissues or organs based on
intensity differences in MRI scans.
Methods:
o Region Growing: Begins with a set of seed points and expands by adding
neighboring pixels with similar properties.
o Region Splitting and Merging: Starts with the entire image as one region, splits it
based on homogeneity, and then merges similar adjacent regions.
2. Pixel-Based Representation
How it Works: Each pixel within a region is represented individually, with properties such as
intensity, color, or texture recorded for each pixel.
Storage: This method is straightforward but can be memory-intensive, especially for large or
high-resolution images.
Applications: Useful in cases where detailed analysis of individual pixels within a region is
required, such as in medical imaging or remote sensing.
3. Run-Length Encoding (RLE)
How it Works: RLE compresses a region by representing consecutive pixels of the same
value in each row as a single value (the pixel value) and the number of repetitions.
Example: A region of white pixels in a binary image can be represented as "white, length 5"
instead of storing each pixel individually.
Benefits: Reduces storage requirements, especially for images with large homogeneous
regions.
Applications: Commonly used in document processing, binary image representation, and
simple shape representation.
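A minimal sketch of run-length encoding for a single image row; the example row is an arbitrary illustration.

```python
# Run-length encoding of one row of a binary image.
def rle_encode_row(row):
    """Return a list of (value, run_length) pairs for a 1-D sequence of pixels."""
    runs = []
    run_value, run_length = row[0], 1
    for value in row[1:]:
        if value == run_value:
            run_length += 1
        else:
            runs.append((run_value, run_length))
            run_value, run_length = value, 1
    runs.append((run_value, run_length))
    return runs

# Example: 5 white pixels, 3 black pixels, 2 white pixels
print(rle_encode_row([1, 1, 1, 1, 1, 0, 0, 0, 1, 1]))
# -> [(1, 5), (0, 3), (1, 2)]
```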
4. Quadtrees
How it Works: Quadtrees divide an image region recursively into four quadrants until each
quadrant is homogeneous. The division stops when the region meets a homogeneity criterion,
resulting in a tree structure where each node is a region.
Storage: Quadtrees provide a hierarchical representation, where each node represents a
square region, and the level in the tree determines the size of the region.
Benefits: Efficient storage, as it allows a compact representation by grouping similar areas.
Applications: Widely used in image compression, geographic information systems (GIS),
and applications that require hierarchical image analysis.
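The recursive splitting can be sketched as below; this minimal version assumes a square grayscale image whose side is a power of two, and uses a max-minus-min intensity test (tolerance 10) as an illustrative homogeneity criterion.

```python
# Quadtree decomposition of a grayscale image into homogeneous square blocks.
import numpy as np

def quadtree(img, x=0, y=0, size=None, tol=10):
    """Return homogeneous leaf blocks as (x, y, size, mean_intensity) tuples."""
    if size is None:
        size = img.shape[0]
    block = img[y:y + size, x:x + size]
    if size == 1 or int(block.max()) - int(block.min()) <= tol:
        return [(x, y, size, float(block.mean()))]   # homogeneous leaf region
    half = size // 2
    leaves = []
    for dy in (0, half):                              # split into four quadrants
        for dx in (0, half):
            leaves += quadtree(img, x + dx, y + dy, half, tol)
    return leaves
```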
5. Binary Spatial Arrays (Bitmasks)
How it Works: A binary spatial array (also known as a bitmask) represents each region by
setting specific bits to 1 for pixels that belong to the region and 0 for those that do not.
Benefits: Efficient for representing regions in binary images (like text or silhouettes) where
each region is either present or absent.
Applications: Used in simple segmentation tasks, object counting, or areas where only binary
information (presence or absence) is needed.
6. Contour-Based Representation
How it Works: Contour-based methods can describe regions by their internal contours or any
closed shape that exists within the region. While contour-based methods are usually used for
boundaries, they can also describe internal regions when there are nested or complex shapes.
Benefits: Allows for hierarchical or layered representation if there are regions within regions
(e.g., an object with internal features).
Applications: Used in pattern recognition and analysis of images where nested or enclosed
structures need to be represented.
7. Skeletonization
How it Works: Skeletonization reduces the region to its “skeleton,” a thin representation of
the region's shape that preserves its structure and connectivity. This is done by iteratively
peeling off layers of pixels until only a central line remains, equidistant from the region
boundaries.
Benefits: Simplifies complex shapes and reduces the amount of data needed to represent the
region while preserving the topological structure.
Applications: Useful in shape analysis, object recognition, and applications where the
internal connectivity of a region is more important than its full representation.
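One readily available implementation is scikit-image's skeletonize, sketched below; "shape.png" is a placeholder for any binary shape image.

```python
# Skeletonization of a binary region (scikit-image).
from skimage.io import imread
from skimage.morphology import skeletonize

binary = imread("shape.png", as_gray=True) > 0   # boolean foreground mask
skeleton = skeletonize(binary)                   # thin, connectivity-preserving skeleton
```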
Applications of Region Representations
1. Medical Imaging: Different tissues or organs are often segmented and represented as regions
for analysis in MRI or CT scans, making it easier to quantify volumes and measure attributes.
2. Object Detection: Regions in an image are analyzed to detect and identify specific objects
based on size, shape, or color.
3. Pattern Recognition: Regions can represent specific textures or color patterns, aiding in the
identification of objects like foliage, urban areas, or geological formations in satellite
imagery.
4. Content-Based Image Retrieval: Regions with distinctive features (color, texture) are stored
as descriptors to enable efficient image search and retrieval based on visual similarity.
Advantages of Region Representation
1. Preserves Object Area: Region-based methods capture the entire interior of an object, which
is valuable for applications requiring measurements of area, volume, or texture.
2. Supports Complex Shape Analysis: By working with the entire area, region representation
is less sensitive to fragmentation and noise, which can affect edge or boundary-based
methods.
3. Flexibility in Property Analysis: Region-based representations allow for a wide range of
analysis methods, including intensity, texture, and color.
Boundary Descriptors and Regional Descriptors are methods used to characterize and
analyze the shape, structure, and properties of objects within an image. These descriptors play
a crucial role in applications like image recognition, classification, and analysis by providing
quantitative data on the features of an object’s boundary (external outline) and region
(interior properties).
1. Boundary Descriptors
Boundary Descriptors focus on the shape and characteristics of the outline or contour of an
object. These descriptors provide information about the form, structure, and spatial
arrangement of the object’s boundary, which is particularly useful in shape analysis and
object recognition.
1. Chain Codes
o How it Works: The boundary is encoded as a sequence of direction codes between
successive boundary pixels, giving a compact description of the contour that supports
shape matching.
2. Fourier Descriptors
o How it Works: Fourier descriptors are computed by applying a Fourier transform to
the boundary coordinates (x, y) of the object. They represent the shape as a series of
sinusoidal components, which capture various details about the boundary’s frequency
features.
o Benefits: Effective for shape recognition as Fourier descriptors are invariant to
translation, scaling, and rotation.
o Applications: Useful in matching and recognition tasks, such as recognizing objects
in different orientations.
3. Curvature
o How it Works: Curvature measures the rate at which the boundary curve changes
direction. It’s calculated as the angle change along the contour at each boundary
point.
o Benefits: Curvature-based descriptors help in identifying significant shape features,
like corners or points of high angular change, and are useful for distinguishing objects
with unique angles or contours.
4. Shape Signatures
o How it Works: Shape signatures convert the 2D boundary information into a 1D
function. Common shape signatures include:
Radial Distance Signature: Distance from the centroid to each boundary
point.
Angle Signature: The angle between each boundary point and a reference
line.
o Benefits: Reduces shape information to a simple 1D form, making it efficient for
matching and comparing shapes.
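A minimal NumPy/OpenCV sketch of Fourier descriptors: the boundary is extracted as a contour, treated as a complex signal x + jy, and transformed with the FFT. The file name "shape.png" and the choice to keep 20 coefficients are illustrative; the normalisation shown is one common way to obtain translation, scale, and rotation invariance.

```python
# Fourier descriptors of an object boundary.
import cv2
import numpy as np

gray = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
boundary = contours[0][:, 0, :]                  # (N, 2) array of (x, y) points

# Treat each boundary point as a complex number and take its Fourier transform
z = boundary[:, 0] + 1j * boundary[:, 1]
coeffs = np.fft.fft(z)

# Drop the DC term (translation), divide by |coeffs[1]| (scale), and keep
# magnitudes only (rotation and starting point), then retain 20 coefficients
descriptors = np.abs(coeffs[1:21]) / np.abs(coeffs[1])
```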
Applications of Boundary Descriptors
Object Recognition: Fourier descriptors and curvature are particularly effective for
recognizing and matching complex shapes.
Pattern Recognition: Chain codes and shape signatures allow efficient matching of known
shapes to detected objects.
Medical Imaging: Curvature and Fourier descriptors help identify anatomical structures or
boundaries of organs.
2. Regional Descriptors
Regional Descriptors (also called region-based descriptors) focus on the properties of the
pixels within the object region, rather than its boundary. These descriptors analyze
characteristics such as area, texture, color, and intensity, which provide a fuller representation
of the object’s internal features.
1. Area
o How it Works: Area is a measure of the number of pixels within the object region.
It’s a simple yet powerful descriptor used to differentiate objects based on size.
o Applications: Used in tasks where object size matters, such as counting objects in a
scene or distinguishing large regions from small ones.
2. Centroid
o How it Works: The centroid is the center of mass or the average position of all pixels
within the region. It’s calculated by averaging the x and y coordinates of the region’s
pixels.
o Applications: Useful for locating the spatial center of an object, which is often
needed in alignment, tracking, or measurement tasks.
3. Moment Invariants
o How it Works: Moment invariants are statistical measures based on the distribution
of pixel intensities in the region. Common moments include:
Raw Moments: Simple calculations that describe the shape’s area,
orientation, and spread.
Central Moments: Provide translation-invariant measures.
Hu’s Invariants: Seven specific moment invariants that are invariant to
rotation, scaling, and translation.
o Benefits: Provide robust and scale-invariant measures, useful for shape matching.
o Applications: Widely used in shape analysis and recognition, especially for complex
regions.
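Area, centroid, and Hu's invariants can all be read off the image moments; a minimal OpenCV sketch is shown below, with "shape.png" as a placeholder binary region.

```python
# Area, centroid, and Hu moment invariants of a binary region (OpenCV).
import cv2

gray = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

m = cv2.moments(binary, binaryImage=True)
area = m["m00"]                                      # number of foreground pixels
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]    # centroid coordinates
hu = cv2.HuMoments(m).flatten()                      # 7 rotation/scale/translation invariants
```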
4. Texture
o How it Works: Texture describes the variation in pixel intensity within a region and
provides information on the surface properties of the object. Common methods to
describe texture include:
Gray Level Co-occurrence Matrix (GLCM): Measures the frequency of
pixel pairs at a certain distance and orientation, providing metrics like
contrast, correlation, energy, and homogeneity.
Gabor Filters: Apply frequency-based filters to capture texture at different
scales and orientations.
o Applications: Texture descriptors are used extensively in medical imaging (e.g.,
tissue classification) and remote sensing (e.g., identifying land cover types).
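A minimal GLCM sketch with scikit-image follows; the single offset (distance 1, angle 0) and the file name are illustrative, and note that older scikit-image releases spell these functions greycomatrix/greycoprops.

```python
# GLCM texture features: contrast, energy, homogeneity, correlation (scikit-image).
import numpy as np
from skimage.io import imread
from skimage.feature import graycomatrix, graycoprops

gray = (imread("texture.png", as_gray=True) * 255).astype(np.uint8)

# Co-occurrence of pixel pairs at distance 1, horizontal direction
glcm = graycomatrix(gray, distances=[1], angles=[0],
                    levels=256, symmetric=True, normed=True)

contrast = graycoprops(glcm, "contrast")[0, 0]
energy = graycoprops(glcm, "energy")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
correlation = graycoprops(glcm, "correlation")[0, 0]
```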
Applications of Regional Descriptors
Classification: Texture and intensity-based descriptors are highly effective for classification
in remote sensing, object recognition, and medical imaging.
Object Analysis: Moment invariants, area, and eccentricity enable the analysis of object
shapes and properties, aiding in recognition and measurement tasks.
Image Retrieval: In content-based image retrieval systems, regional descriptors help in
finding images with similar texture or color distributions.
Image Warping
Image Warping is a transformation technique used in image processing to change the spatial
configuration of an image. This manipulation can alter the position, shape, or orientation of
objects within the image by adjusting pixel locations according to a mathematical function or
mapping rule. Image warping is commonly used in applications like image registration,
correction of perspective distortion, and aligning images in computer graphics or computer
vision.
Key Concepts in Image Warping
1. Geometric Transformations:
o Rotation: Rotates the entire image around a specific point, often the image center.
o Scaling: Enlarges or shrinks the image by scaling pixel coordinates.
o Shearing: Distorts the image by shifting one axis direction, creating a slant or skew.
o Translation: Moves the image by shifting all pixels in a particular direction.
2. Perspective Transformations:
o How it Works: Perspective transformations are used to change the viewpoint of an
image, useful for simulating 3D perspective. They can transform an image to appear
as if it was taken from a different angle.
o Applications: Correcting perspective distortion in photographs, creating realistic
scenes in graphics, and aligning images from different viewpoints.
4. Elastic Deformations:
o How it Works: Elastic warping stretches or contracts regions within an image, often
applied using thin-plate splines or radial basis functions.
o Applications: Useful for medical imaging (e.g., deforming anatomical structures to
fit templates), facial recognition, and texture mapping.
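A minimal OpenCV sketch of the warps described above; the rotation angle, scale, and the four source/destination corner points are illustrative values.

```python
# Affine (rotation + scale) and perspective warping (OpenCV).
import cv2
import numpy as np

img = cv2.imread("input.jpg")
h, w = img.shape[:2]

# Rotation and scaling about the image centre (an affine warp)
M_rot = cv2.getRotationMatrix2D(center=(w / 2, h / 2), angle=30, scale=1.0)
rotated = cv2.warpAffine(img, M_rot, (w, h))

# Perspective warp: map four source corners to a new quadrilateral
src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
dst = np.float32([[50, 50], [w - 80, 20], [w - 1, h - 1], [0, h - 1]])
M_persp = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M_persp, (w, h))
```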
Applications of Image Warping
1. Image Registration and Alignment: Aligning images taken from different angles, times, or
sensors by warping one image to match another.
2. Perspective Correction: Used in photo editing to correct distortions, such as keystone
distortion, where objects appear skewed due to perspective.
3. Augmented Reality: Warping is used to overlay digital objects onto real-world scenes in a
way that they align with the camera perspective.
4. Panorama Stitching: Warping is essential for stitching images together into a single,
seamless panoramic image, aligning features across overlapping images.
5. Cartography: Map projections use warping to transform 3D geographical data onto 2D
surfaces while minimizing distortions.