Unit-2 Notes CV
Color Fundamentals
Color fundamentals refer to the basic principles and concepts that describe how colors are
perceived, represented, and manipulated in various fields, including image processing,
computer graphics, and visual arts. Understanding these fundamentals is crucial for working
with color images and creating visually appealing designs.
1. Color Perception
Human Vision:
o The human eye perceives color through photoreceptor cells called cones,
which are sensitive to different wavelengths of light.
o There are three types of cones: S-cones (sensitive to short wavelengths, blue
light), M-cones (sensitive to medium wavelengths, green light), and L-cones
(sensitive to long wavelengths, red light).
o The brain processes signals from these cones to create the perception of
various colors.
Visible Spectrum:
o The visible spectrum is the range of electromagnetic wavelengths that the
human eye can detect, approximately from 380 nm (violet) to 700 nm (red).
o Colors are perceived based on the wavelength of light, with shorter
wavelengths appearing blue and longer wavelengths appearing red.
Color Attributes:
o Hue: The type of color, determined by the wavelength of light (e.g., red,
green, blue).
o Saturation: The intensity or purity of the color; a fully saturated color has no
addition of white.
o Brightness (Value or Lightness): The perceived intensity or luminance of the
color; how light or dark the color appears.
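These attributes can be examined directly by converting an image to the HSV color space. A minimal OpenCV sketch (the file name sample.jpg is only a placeholder):

import cv2

# Load a color image (OpenCV reads images in BGR order); "sample.jpg" is a placeholder path
img = cv2.imread("sample.jpg")

# Convert BGR to HSV: channel 0 = hue, channel 1 = saturation, channel 2 = value (brightness)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

print("Hue range:", h.min(), "-", h.max())          # OpenCV stores hue as 0-179 for 8-bit images
print("Saturation range:", s.min(), "-", s.max())   # 0 = gray, 255 = fully saturated
print("Value range:", v.min(), "-", v.max())        # 0 = black, 255 = brightest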
2. Color Models
3. Color Spaces
4. Color Mixing
5. Color Adjustment
White Balance:
o The process of adjusting the colors in an image to ensure that whites appear
neutral, compensating for the color temperature of the light source.
Color Correction:
o Adjusting the overall color balance of an image to achieve a desired look or to
correct color casts.
Gamma Correction:
o Adjusting the luminance of the colors in an image to correct for the nonlinear
response of display devices.
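As a rough sketch, gamma correction can be implemented as a power-law transform on normalized pixel values (the gamma value of 2.2 and the file names are illustrative assumptions):

import cv2
import numpy as np

def gamma_correct(img, gamma=2.2):
    # Normalize to [0, 1], apply the power-law transform, and rescale to 8-bit
    normalized = img.astype(np.float32) / 255.0
    corrected = np.power(normalized, 1.0 / gamma)
    return (corrected * 255).astype(np.uint8)

# "photo.jpg" is a placeholder path
img = cv2.imread("photo.jpg")
out = gamma_correct(img, gamma=2.2)
cv2.imwrite("photo_gamma.jpg", out)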
Applications of Color Fundamentals:
Digital Imaging: Understanding color models and spaces is crucial for accurate
image capture, editing, and display.
Computer Vision: Color information is used for object detection, recognition, and
tracking.
Graphics Design: Effective use of color models and color theory enhances visual
communication.
Printing: Correct use of subtractive color models ensures accurate reproduction of
colors in print media.
Color fundamentals provide the foundation for working with color in various fields, from
digital imaging to visual arts. They encompass the science of how colors are perceived,
represented, and manipulated in both analog and digital formats. Understanding these
concepts is essential for anyone involved in image processing, computer graphics,
photography, or any field that involves color management and reproduction.
Pseudo-Color Image Processing
Pseudo-color (false-color) processing assigns colors to the gray levels of a monochrome image so that its content becomes easier to interpret visually.
1. Grayscale Image:
o A grayscale image consists of varying shades of gray, where each pixel's
intensity value ranges from 0 (black) to 255 (white) in an 8-bit image.
o In pseudo-color processing, these grayscale intensity levels are mapped to a
set of colors.
2. Color Mapping:
o Color Look-Up Table (LUT):
A LUT is a predefined table that associates each intensity value in the
grayscale image with a specific color.
For example, an intensity value of 0 could be mapped to blue, 128 to
green, and 255 to red.
o Custom Color Maps:
Users can define custom color maps based on the specific application
or desired visualization effect.
3. Applications of Pseudo-Color:
o Medical Imaging:
Enhancing features in X-rays, MRIs, or CT scans to better visualize
tissues, tumors, or other structures.
o Remote Sensing:
Analyzing satellite images to highlight different land covers,
vegetation types, or water bodies.
o Thermography:
Representing temperature distributions in thermal images, where
different temperatures are assigned different colors.
o Scientific Visualization:
Visualizing data like elevation maps, heat maps, and other scalar fields
where color mapping helps in interpreting the data.
4. Methods of Pseudo-Color Processing:
o Intensity Slicing:
The grayscale image is divided into several ranges or "slices" of
intensity values. Each range is assigned a specific color.
Example: Pixel values from 0-50 could be mapped to blue, 51-100 to
green, 101-150 to yellow, and 151-255 to red (a code sketch of this slicing appears after this list).
o Color Transformation Functions:
A function is applied to the intensity values to produce corresponding
color values. Common functions include linear and nonlinear
transformations.
Example: Applying a rainbow color map where low intensities map to
blue, mid-range to green, and high intensities to red.
5. Advantages of Pseudo-Color Processing:
o Enhanced Visualization:
By adding color, important details and patterns in the data become
more apparent, facilitating easier interpretation.
o Improved Contrast:
Different regions of an image with similar grayscale intensities can be
distinguished more easily when color is applied.
o Customizability:
Users can apply different color mappings based on the specific
application, allowing for flexible and targeted visualization.
6. Disadvantages of Pseudo-Color Processing:
o Potential Misinterpretation:
Incorrect or misleading color mappings can cause confusion or
misinterpretation of the data.
o Loss of Intensity Information:
In some cases, the original intensity information might be obscured or
lost in the color mapping process.
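A minimal sketch of the intensity-slicing example above, together with a built-in look-up table, using OpenCV and NumPy (the file names are placeholders):

import cv2
import numpy as np

# "scan.png" is a placeholder for any grayscale input image
gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Intensity slicing: assign one BGR color to each range of gray levels
pseudo = np.zeros((*gray.shape, 3), dtype=np.uint8)
pseudo[gray <= 50] = (255, 0, 0)                      # blue
pseudo[(gray > 50) & (gray <= 100)] = (0, 255, 0)     # green
pseudo[(gray > 100) & (gray <= 150)] = (0, 255, 255)  # yellow
pseudo[gray > 150] = (0, 0, 255)                      # red

# Alternatively, a predefined look-up table (rainbow-style color map)
jet = cv2.applyColorMap(gray, cv2.COLORMAP_JET)

cv2.imwrite("scan_sliced.png", pseudo)
cv2.imwrite("scan_jet.png", jet)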
Pseudo-color image processing is a powerful tool for enhancing the visual interpretation of
grayscale images by assigning colors to different intensity levels. It is widely used in various
fields such as medical imaging, remote sensing, and scientific visualization, helping to
highlight important features and patterns that might not be easily detectable in grayscale.
While it offers significant advantages in terms of visualization, careful consideration is
required to avoid potential misinterpretation of the data.
Neural Networks for Image Processing
Neural networks, particularly convolutional neural networks (CNNs), are widely used for
image processing tasks such as classification, segmentation, object detection, and
enhancement. Neural networks are designed to learn patterns and features from images,
making them highly effective in solving complex image-related problems.
1. Convolutional Neural Networks (CNNs)
CNNs are the most popular type of neural network used for image processing. CNNs are
designed to automatically detect spatial hierarchies in images and learn representations of
image content through a series of layers.
Convolutional Layers: These layers apply a convolution operation (a filter or kernel) to the
input image to extract local features like edges, textures, and shapes.
Pooling Layers: These layers reduce the spatial dimensions of the feature maps, making the
network more efficient and reducing the risk of overfitting. Max pooling and average pooling
are common pooling techniques.
Fully Connected Layers: These layers are typically found at the end of the network and are
used to output the final classification result by combining all the learned features.
Activation Functions: Commonly used functions like ReLU (Rectified Linear Unit) are applied
after convolution to introduce non-linearity to the model, helping it to capture complex
patterns.
Image Classification: The CNN learns features from images and assigns them to a specific
class. For example, classifying an image as a cat or a dog.
Object Detection: CNNs can detect objects within an image, predicting both the object class
and its location using bounding boxes.
Image Segmentation: In tasks like semantic segmentation, CNNs assign a label to every pixel
in an image, identifying the objects or regions.
Image Enhancement: Neural networks can be used for image super-resolution, denoising,
and other enhancement tasks by learning the mapping from low-quality to high-quality
images.
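A minimal sketch of such a CNN using the Keras API, assuming 32x32 RGB inputs and 10 output classes (the layer sizes are illustrative only):

from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolutional layers extract local features such as edges and textures
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),          # pooling reduces the spatial dimensions
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    # Fully connected layers combine the learned features into a class prediction
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.summary()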
2. Training Neural Networks for Image Processing
Dataset Preparation: Large labeled datasets like ImageNet, CIFAR-10, or custom datasets are
prepared, with images resized and augmented to improve generalization.
Loss Function: The network is trained using a loss function, such as categorical cross-entropy
for classification or mean squared error for regression tasks.
Optimization: Backpropagation and optimizers like stochastic gradient descent (SGD) or
Adam are used to minimize the loss and adjust the network weights.
Evaluation: Once trained, the performance of the model is evaluated using metrics such as
accuracy, precision, recall, and the confusion matrix.
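Continuing the Keras sketch above, the training and evaluation steps just described might look roughly as follows (x_train, y_train, x_test, and y_test are placeholders for a prepared, labeled dataset with integer class labels):

# Cross-entropy loss and the Adam optimizer, as described above
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Backpropagation adjusts the weights over several passes through the data
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.1)

# Evaluation on held-out test data
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)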
To evaluate the performance of neural networks on image classification tasks, various metrics
are used. One of the most important tools for performance evaluation is the Confusion
Matrix.
1. Confusion Matrix
True Positives (TP): The number of instances where the model correctly predicted the
positive class.
True Negatives (TN): The number of instances where the model correctly predicted the
negative class.
False Positives (FP): The number of instances where the model incorrectly predicted the
positive class (also known as Type I error).
False Negatives (FN): The number of instances where the model incorrectly predicted the
negative class (also known as Type II error).
For a binary classification problem, the confusion matrix can be structured as:
                  Predicted Positive   Predicted Negative
Actual Positive   TP                   FN
Actual Negative   FP                   TN
For multi-class classification, the confusion matrix expands to show the results for each class.
2. Performance Metrics Derived from the Confusion Matrix
Several important performance metrics can be calculated using the confusion matrix:
Accuracy:
Accuracy is the ratio of correctly predicted instances (both positive and negative) to the total number of instances: Accuracy = (TP + TN) / (TP + TN + FP + FN).
Precision:
Precision measures how many of the predicted positive instances are actually positive: Precision = TP / (TP + FP). It is also called the Positive Predictive Value (PPV).
Recall:
Recall measures how many of the actual positive instances were correctly identified by the model: Recall = TP / (TP + FN).
F1-Score:
The F1-score is the harmonic mean of precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). It is useful when there is an imbalance between classes.
Specificity:
Specificity measures how many of the actual negative instances were correctly identified: Specificity = TN / (TN + FP).
3. Example: Confusion Matrix for Image Classification
Assume we have a model that classifies images into two classes: "Cats" and "Dogs." The
confusion matrix for this binary classification could look like this:
               Predicted Cats   Predicted Dogs
Actual Cats    TP               FN
Actual Dogs    FP               TN
The confusion matrix helps provide a detailed breakdown of model performance, especially
when working with imbalanced datasets where accuracy alone might be misleading. For
example, in a dataset where 95% of the images are of cats, a model that always predicts
"cats" will achieve high accuracy but perform poorly on dog images. The confusion matrix
reveals this problem by showing false negatives and false positives for each class.
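As a sketch, scikit-learn can compute the confusion matrix and the derived metrics directly from the true and predicted labels; the label arrays below are made up purely for illustration, with 1 representing cats and 0 representing dogs:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

# Illustrative labels only: 1 = cat, 0 = dog
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0]

# Rows = actual class, columns = predicted class
print(confusion_matrix(y_true, y_pred))

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))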
Neural networks, especially CNNs, play a critical role in modern image processing tasks like
classification, segmentation, and object detection. The performance of these models is often
evaluated using the confusion matrix and derived metrics such as accuracy, precision, recall,
and F1-score. The confusion matrix provides a comprehensive view of the model’s
performance by identifying the number of correct and incorrect predictions for each class,
offering insights beyond just accuracy.
Morphological Image Processing
1. Binary and Grayscale Images
Binary Images: In a binary image, each pixel has only two possible values: 0 (black) or 1
(white). These images are typically used in tasks where the goal is to distinguish between
foreground (objects) and background.
Grayscale Images: In grayscale images, each pixel can take a range of values, typically from 0
(black) to 255 (white), representing varying intensities of gray. Grayscale images are often
used when more detail is required compared to binary images.
2. Structuring Element
Before we dive into the specific operations, it's essential to understand the concept of the
structuring element (also called a kernel or mask). This is a small binary or grayscale matrix
used to probe or scan the input image. The structuring element defines the neighborhood
around each pixel that is considered during operations like dilation, erosion, or opening.
Square
Rectangle
Disk (circular)
Cross
The size and shape of the structuring element play a crucial role in how the operations affect
the image.
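In OpenCV, such structuring elements can be created with cv2.getStructuringElement; a short sketch (the 5x5 size is illustrative):

import cv2

square = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))     # square / rectangle
disk   = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # disk (circular)
cross  = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))    # cross
print(cross)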
3. Dilation
Dilation "grows" or thickens the boundaries of foreground objects, increasing their size.
Working of Dilation:
In binary images: Dilation adds pixels to the boundaries of objects. If at least one pixel in the
neighborhood (defined by the structuring element) is 1, the central pixel is set to 1.
In grayscale images: Dilation replaces the pixel value with the maximum value in the
neighborhood. As a result, bright regions (high intensity) grow outward.
Applications of Dilation:
Filling small holes and gaps in objects, joining broken or disconnected parts, and making thin features more prominent before further processing.
Example: A single white pixel in the center of a small binary image causes the surrounding black pixels to turn white after dilation, so the white region grows.
Dilation Function:
In OpenCV (Python):
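A minimal sketch, assuming a grayscale or binary input image and a 3x3 square structuring element (file names are placeholders):

import cv2
import numpy as np

# "binary.png" is a placeholder for a binary or grayscale input image
img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

kernel = np.ones((3, 3), np.uint8)               # 3x3 square structuring element
dilated = cv2.dilate(img, kernel, iterations=1)  # bright regions grow outward
cv2.imwrite("dilated.png", dilated)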
4. Erosion
Erosion is the opposite of dilation. It "shrinks" the boundaries of foreground objects, reducing
their size. In binary images, erosion removes pixels from the boundaries of objects. In
grayscale images, erosion darkens regions by reducing the intensity of bright areas.
Working of Erosion:
In binary images: If any pixel in the neighborhood is 0 (black), the central pixel is set to 0.
In grayscale images: Erosion replaces the pixel value with the minimum value in the
neighborhood. As a result, dark regions grow outward, and bright areas shrink.
Applications of Erosion:
Removing small specks of noise, separating objects that are touching, and shrinking or thinning foreground features.
Example: Here, the central white pixel is surrounded by black pixels, so it is eroded away, leaving a smaller shape.
Erosion Function:
In OpenCV (Python):
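A minimal sketch, under the same assumptions as the dilation example (file names and kernel size are placeholders):

import cv2
import numpy as np

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)   # placeholder input

kernel = np.ones((3, 3), np.uint8)
eroded = cv2.erode(img, kernel, iterations=1)           # bright regions shrink
cv2.imwrite("eroded.png", eroded)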
5. Opening
Opening is an erosion followed by a dilation, performed with the same structuring element.
Working of Opening:
First Step (Erosion): Removes small objects or noise from the image.
Second Step (Dilation): Restores the shape of the remaining objects after erosion.
Applications of Opening:
Removing small objects and noise from an image while preserving the shape and size of the larger objects, and smoothing object contours.
Example: The small noise is removed by the erosion step, and the dilation step restores the shape of the main object.
Opening Function:
In OpenCV (Python):
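A minimal sketch (the kernel size and file names are illustrative):

import cv2
import numpy as np

img = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)    # placeholder input

kernel = np.ones((5, 5), np.uint8)
# Opening = erosion followed by dilation with the same structuring element
opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
cv2.imwrite("opened.png", opened)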
Summary
Morphological operations like dilation, erosion, and opening are crucial in image processing
for tasks like noise removal, shape analysis, and object enhancement. These operations can be
applied to both binary and grayscale images using structuring elements.
These operations are foundational in image preprocessing and segmentation tasks, especially
in medical imaging, computer vision, and object detection.
These operations are crucial for tasks like object recognition, shape analysis, noise removal,
and image enhancement.
1. Opening and Closing
Both opening and closing are composite morphological operations involving combinations
of dilation and erosion.
Opening:
An erosion followed by a dilation; it removes small objects and noise while largely preserving the size and shape of the remaining objects.
Closing:
A dilation followed by an erosion; it fills small holes and gaps and connects nearby objects while largely preserving their size and shape.
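A minimal OpenCV sketch showing both composite operations (the kernel size and file name are illustrative):

import cv2
import numpy as np

img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)      # placeholder input
kernel = np.ones((5, 5), np.uint8)

opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)    # erosion then dilation
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)   # dilation then erosion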
2. Boundary Extraction
Boundary extraction is the process of obtaining the boundaries or edges of objects within an
image.
Method:
The remaining image will highlight the boundaries of objects, as the erosion will shrink the
objects, and subtracting it will leave only the boundary pixels.
Formula:
β(A) = A − (A ⊖ B)
Where:
A is the binary image, B is the structuring element, and ⊖ denotes erosion; subtracting the eroded image from the original leaves only the boundary pixels.
Application:
Extracting object outlines or contours for shape analysis and further edge-based processing.
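A minimal OpenCV sketch of this boundary extraction, assuming a binary input image (the file name is a placeholder):

import cv2
import numpy as np

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)   # placeholder binary input
kernel = np.ones((3, 3), np.uint8)

# Boundary = original image minus its erosion
boundary = cv2.subtract(img, cv2.erode(img, kernel))
cv2.imwrite("boundary.png", boundary)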
3. Region Extraction
Region extraction groups pixels that belong to the same object or area into connected regions.
Methods:
Flood fill algorithms: Group pixels with similar properties (e.g., color, intensity) into a single
region.
Morphological operations: Can help clean the image before region extraction, ensuring that
small noise is removed or objects are connected properly.
Application:
Extracting objects from images for further analysis (e.g., in medical imaging for tumor
detection).
4. Convex Hull
The convex hull of a shape is the smallest convex shape that entirely encloses the object. It's
like wrapping a rubber band around the shape — the convex hull is the tightest boundary that
can be drawn around it.
Algorithm:
The convex hull can be computed by identifying the extreme points of the object and
connecting them in such a way that the resulting shape is convex.
Application:
Convex hull is often used in shape analysis and object recognition tasks to simplify complex
shapes.
It is also used in image processing to approximate the shape of objects or to fill concave
regions.
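A minimal OpenCV sketch that draws the convex hull of each object in a binary image, assuming the OpenCV 4.x findContours signature (file names are placeholders):

import cv2

img = cv2.imread("blob.png", cv2.IMREAD_GRAYSCALE)     # placeholder binary input

# Find external contours, then compute the convex hull of each one
contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
output = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for cnt in contours:
    hull = cv2.convexHull(cnt)
    cv2.drawContours(output, [hull], -1, (0, 0, 255), 2)   # draw the hull in red
cv2.imwrite("hulls.png", output)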
5. Thinning
Thinning is a morphological operation that reduces the thickness of object boundaries or lines
to a single-pixel-wide skeleton while preserving the original structure and topology.
Method:
Thinning is done by repeatedly applying erosion operations until the objects are reduced to a
skeleton. The process retains the essential features of the shape, such as connectivity and
topology.
Applications:
Thinning is commonly used in OCR (Optical Character Recognition), where it helps extract
and recognize characters in text images.
Used in fingerprint recognition for extracting ridges and valleys.
6. Thickening
Thickening is the morphological dual of thinning: it adds pixels to the boundaries of objects, making thin structures wider while preserving their overall shape and connectivity.
Method:
A common approach is to thin the background (the complement of the image) and then take the complement of the result, so that boundary pixels are added to the object without merging separate components.
Applications:
Thickening is useful in applications where the objects are too thin and need to be more
prominent for subsequent processing, such as in road detection or vessel enhancement in
medical images.
7. Skeletonization
Skeletonization reduces a shape to its "skeleton", a thin version of the object that retains its
overall structure but is reduced to a single-pixel-wide representation. This is similar to
thinning, but with the goal of preserving the general form and connectivity of the object.
Method:
Skeletonization can be seen as a more extreme form of thinning, where all unnecessary
pixels are removed except for those that lie along the "centerline" of the shape.
Applications:
Skeletonization is useful in applications like pattern recognition and shape analysis, where
the skeleton gives a simplified representation of the object while retaining key structural
information.
Example (Skeletonization):
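A rough sketch using scikit-image's skeletonize on a binary image (the file name and the 127 threshold are illustrative assumptions):

import cv2
from skimage.morphology import skeletonize

img = cv2.imread("character.png", cv2.IMREAD_GRAYSCALE)   # placeholder binary input

# skeletonize expects a boolean image: True for foreground, False for background
skeleton = skeletonize(img > 127)
cv2.imwrite("skeleton.png", (skeleton * 255).astype("uint8"))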
8. Pruning
Pruning is a process used to remove small, spurious branches from the skeleton of an object.
After thinning or skeletonization, small extraneous parts may remain that are not part of the
main structure of the object. Pruning eliminates these artifacts while preserving the overall
skeleton.
Method:
End-point pixels of the skeleton are removed iteratively for a fixed number of steps (or until a spur is shorter than a chosen length), which deletes short spurious branches while leaving the main skeleton intact.
Application:
Cleaning up skeletons produced by thinning or skeletonization, for example removing stray spurs from character skeletons in OCR or from fingerprint ridge maps.
Summary of Operations:
Each of these morphological operations is essential in image processing tasks that require
shape analysis, noise removal, object detection, or image enhancement. They enable fine
control over the structure and features of objects in an image, making them powerful tools in
various applications such as medical imaging, character recognition, and object segmentation.