Lab Manual Cv-Final

The document outlines a series of experiments for a Computer Vision course, focusing on geometric transformations, texture feature extraction, morphological operations, and chain coding. Each experiment includes objectives, theoretical background, procedures, lab tasks, and post-lab questions to enhance understanding of image processing techniques. The aim is to equip students with practical skills using OpenCV for various image manipulation tasks.

T. Y. ECE - AIML Year 2023-24


Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 1

Name of the Experiment: Geometric Transformations of an image


Performed on: -----------------------------------------------

Submitted on: -----------------------------------------------

Aim: To perform geometric transformations and arithmetic operations on an image.

Pre-requisite: Image matrix formation


Objectives:
a) Understand basics of image processing
b) Understand geometric transformations like translation, scaling, rotation,
reflection, shearing
SCOPE: This experiment will help you understand the cv2.add(), cv2.subtract()
functions and the cv2.resize(), cv2.warpAffine() functions of OpenCV to perform
arithmetic and geometric transformations of an image.

Theory: Geometric transformations are needed to give an entity the required position,
orientation, or shape starting from an existing position, orientation, or shape. The basic
transformations are scaling, rotation, translation, and shear. Other important types of
transformations are projections and mappings.

By scaling relative to the origin, all coordinates of the points defining an entity are multiplied
by the same factor, possibly different for each axis. Scaling can also be performed relative to
an arbitrary point. Mirroring is a special kind of scaling, with one or more scaling factors
negative.

By translation, all coordinates of the points defining an entity are modified by adding the same
vector quantity. Rotation is performed by premultiplying the coordinates of the points defining
an entity by a special rotation matrix, dependent on the rotation angles. Shear produces a
deformation by forcing contacting parts or layers to slide upon each other in opposite directions
parallel to the plane of their contact.

Mappings are complex transformations, in CAD usually consisting of applying a
two-dimensional image onto the 3-D surface of an entity. They are used to improve the
appearance of modeled objects, adding texture information.

Projections are transformations between systems with different numbers of dimensions. The
most important use of projections is for rendering 3-D models on screen or paper (2-D
geometric entities). Traditional drafting uses orthographic projections (parallel to one of the
coordinate axes). To give the sensation of depth to the rendered scenes, the perspective
projection is applied. In such a projection, all lines in the scene that are not parallel to the screen
(projection plane) converge in one point for each direction. In the simplest case, all the lines
perpendicular to the screen converge in one point.

It is often necessary to perform a spatial transformation, for example to align (register) an image with a reference image or to correct geometric distortion.

Affine Transformation
An affine transformation is any transformation that preserves collinearity (i.e., all points lying
on a line initially still lie on a line after transformation) and ratios of distances (e.g., the
midpoint of a line segment remains the midpoint after transformation). In general, an affine
transformation is a composition of rotations, translations, magnifications, and shears.

In the affine transformation matrix, the elements c13 and c23 control translation, c11 and c22 control
magnification (scaling), and their combination produces rotations and shears.
The transformation matrices below can be used as building blocks.

Using these matrices, we can perform transformations like translation, scaling, and rotation of a given
image to match it with a reference image, which is the requirement of image registration.
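For illustration (a minimal sketch, not part of the original manual), the building blocks above can be written as 2x3 matrices in NumPy and applied with cv2.warpAffine. 'input.jpg' is a placeholder filename and the parameter values (tx, ty, sx, sy, theta, shx) are arbitrary examples.

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder filename
rows, cols = img.shape

# Translation by (tx, ty)
tx, ty = 50, 30
M_translate = np.float32([[1, 0, tx],
                          [0, 1, ty]])

# Scaling (magnification) about the origin by (sx, sy)
sx, sy = 1.5, 0.75
M_scale = np.float32([[sx, 0, 0],
                      [0, sy, 0]])

# Rotation about the origin by theta
theta = np.deg2rad(45)
M_rotate = np.float32([[np.cos(theta), -np.sin(theta), 0],
                       [np.sin(theta),  np.cos(theta), 0]])

# Shear along the x direction by factor shx
shx = 0.3
M_shear = np.float32([[1, shx, 0],
                      [0, 1,   0]])

for name, M in [("translated", M_translate), ("scaled", M_scale),
                ("rotated", M_rotate), ("sheared", M_shear)]:
    cv2.imshow(name, cv2.warpAffine(img, M, (cols, rows)))
cv2.waitKey(0)
cv2.destroyAllWindows()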
PROCEDURE:
1. Read image from the given path
2. Convert to grayscale and display image
3. Do Arithmetic operations like addition, subtraction, multiplication, division.
4. Perform geometric transformations translation, rotation, scaling, shearing with
cv2.warpAffine function.
5. Display all the results.
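A rough sketch of this procedure is given below (an assumed implementation, not prescribed by the manual; 'input.jpg' is a placeholder path).

import cv2
import numpy as np

# 1-2. Read the image from a (placeholder) path and convert to grayscale
img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow("Grayscale", gray)

# 3. Arithmetic operations (cv2 versions saturate instead of wrapping around)
const = np.full_like(gray, 60)
two = np.full_like(gray, 2)
bright = cv2.add(gray, const)       # addition
dark = cv2.subtract(gray, const)    # subtraction
double = cv2.multiply(gray, two)    # multiplication
half = cv2.divide(gray, two)        # division

# 4. Geometric transformations with cv2.warpAffine and cv2.resize
rows, cols = gray.shape
M_rot = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle=45, scale=1.0)
rotated = cv2.warpAffine(gray, M_rot, (cols, rows))
scaled = cv2.resize(gray, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR)

# 5. Display the results
for name, out in [("Bright", bright), ("Dark", dark), ("Half", half),
                  ("Rotated", rotated), ("Scaled", scaled)]:
    cv2.imshow(name, out)
cv2.waitKey(0)
cv2.destroyAllWindows()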
LAB TASKS:
Q.1 Write a program to add/subtract two images of your two-digit roll number. e.g., Add
1.jpg and 2.jpg to get 12.jpg
Q.2 Display following images

Q.3 Display distance patterns for D8 and De

Post Lab Questions:


1. What is an affine transformation? What is the difference between affine and perspective
transformations?
2. For Image pixel (-3,2) perform following operations:
i) Translate right by 3 units
ii) Scale in y direction by 5 units
iii) Rotate image about y-axis by 45 degree
iv) Shear in x direction by 30 units
3. What is the need of geometric transformations?

Conclusion:

Additional links:
https://medium.com/@livajorge7/geometric-transformation-in-image-processing-basics-
applications-and-cronj-as-an-expert-f06417193695

T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 2

Name of the Experiment: Texture feature extractions using GLCM


Performed on: -----------------------------------------------
Submitted on: -----------------------------------------------

Theory:


An image texture is a set of metrics calculated in image processing designed to quantify the
perceived texture of an image. Image texture gives us information about the spatial
arrangement of color or intensities in an image or selected region of an image.[1]

Image textures can be artificially created or found in natural scenes captured in an image. Image
textures are one way that can be used to help in segmentation or classification of images.

Gray-Level Co-occurrence Matrices (GLCMs)

Given a grayscale image F and a position operator such as "one pixel to the right and one pixel
below", the gray-level co-occurrence matrix C has entries c_ij that count the number of times
that F(x, y) = i and F(x + 1, y + 1) = j. For example, in the small sample image usually used to
illustrate this (not reproduced here), the first entry comes from the fact that 4 times a 0 appears
below and to the right of another 0. The factor 1/16 is because there are 16 pairs entering into
this matrix, so this normalizes the matrix entries to be estimates of the co-occurrence probabilities.

For statistical confidence in the estimation of the joint probability distribution, the matrix
must contain a reasonably large average occupancy level. This is achieved either by (a) restricting
the number of amplitude quantization levels (which causes a loss of accuracy for low-amplitude
textures), or (b) using a large measurement window (which causes errors if the texture changes over
the large window). A typical compromise is 16 gray levels and a window size of 30 or 50 pixels on
each side. Now we can analyze C using the following descriptors:

Maximum probability: max_{i,j} c_ij, the strongest response of the co-occurrence matrix.

Element difference moment of order k: sum_{i,j} (i - j)^k c_ij.
This descriptor has relatively low values when the high values of C are near the main
diagonal. For this position operator, high values near the main diagonal would
indicate that bands of constant intensity are likely. When k = 2, it is called the contrast.

Contrast = sum_{i,j} (i - j)^2 c_ij

Entropy = - sum_{i,j} c_ij log2(c_ij)

This is a measure of randomness, having its highest value when the elements of C are
all equal. In the case of a checkerboard, the entropy would be low.

Uniformity (also called Energy) = sum_{i,j} c_ij^2

Homogeneity = sum_{i,j} c_ij / (1 + |i - j|), which is large if the big values of C are on the main diagonal.
So by analysing GLCM, we can comment on the type of texture. Many statistical
features also can be used to describe the texture like mean, standard deviation, variance,
skewness, uniformity, entropy, etc.
PROCEDURE:
1. Consider minimum 3 images of different textures.
2. Find GLCM matrix for each.
3. Compare 3 textures and justify from the values of GLCM properties like contrast,
entropy, uniformity, homogeneity.
4. Segment any image based on texture.
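A minimal sketch of steps 1-3 is given below. It assumes scikit-image is available for the GLCM computation (the manual itself does not prescribe a library; in older scikit-image versions the functions are named greycomatrix/greycoprops), and the filenames are placeholders. Entropy is not provided by graycoprops, so it is computed directly from the normalized matrix.

import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(path, levels=16):
    # Read as grayscale and quantize to 'levels' gray levels (see the compromise above)
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = (img // (256 // levels)).astype(np.uint8)

    # Co-occurrence matrix for the offset "one pixel to the right", normalized to probabilities
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]

    feats = {prop: graycoprops(glcm, prop)[0, 0]
             for prop in ("contrast", "homogeneity", "energy")}
    feats["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    feats["max_probability"] = p.max()
    return feats

# Placeholder filenames -- replace with your own texture images
for name in ("smooth.jpg", "coarse.jpg", "random.jpg"):
    print(name, glcm_features(name))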

LAB TASKS:
1. Take 3 types of textured images like smooth, coarse, and random, and find their GLCM
matrix. Compare the GLCM properties and justify your conclusions about the texture.
2.

Post Lab Questions:


1) How is texture useful in segmentation? What are the different types of texture?
2)
3) What are the different statistical methods to describe texture features?
4) Is Fourier descriptor useful for texture identification? Justify your answer.
5) Here are 4 different texture patches of size 96x96 pixels. All the pixels in each patch
(quantized to 16 levels) were used to form the GLCMs shown below. The position operator
used is indicated next to each GLCM. Note that 3 of the plots show perspective views of the
GLCM from the vantage point of the (0,0) position. However, one of the plots has the (0,0)
matrix coordinate position placed in the upper left corner since that provides a better view,
so check the axis labels.

Conclusion:

Additional links:

T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 3

Name of the Experiment: To perform morphological operations.


Performed on: -----------------------------------------------

Submitted on: -----------------------------------------------

Aim: To perform morphological operations.

Pre-requisite: Morphological operations dilation, erosion, opening, closing.


Objectives:
c) Understand erosion, dilation
d) Understand opening and closing operation

SCOPE: At the end of this experiment we will be able to understand the various compression and
expansion methods like dilation, erosion, opening and closing of an image.
These operations are typically used to extract information about the forms and shapes of
structures.
FACILITIES:
Laptop/ PC with Python, Pycharm & Open CV package, different types of images

Theory:

Morphology is a broad set of image processing operations that process images based on
shapes. Morphological operations apply a structuring element to an input image, creating
an output image of the same size. In a morphological operation, the value of each pixel in
the output image is based on a comparison of the corresponding pixel in the input image
with its neighbors. By choosing the size and shape of the neighborhood, you can construct
operations that are sensitive to specific shapes in the input image.

The most basic morphological operations are dilation and erosion. Dilation adds pixels to
the boundaries of objects in an image, while erosion removes pixels on object boundaries.
The number of pixels added or removed from the objects in an image depends on the size
and shape of the structuring element used to process the image. In the morphological
dilation and erosion operations, the state of any given pixel in the output image is
determined by applying a rule to the corresponding pixel and its neighbors in the input
image. The rule used to process the pixels defines the operation as a dilation or an erosion.
Morphological operations are used predominantly for the following purposes:
- Image pre-processing (noise filtering, shape simplification).
- Enhancing object structure (skeletonizing, thinning, thickening, convex hull, object marking).
- Segmenting objects from the background.
- Quantitative description of objects (area, perimeter, projections, Euler-Poincare characteristic).

a) DILATION:
Dilation grows or thickens objects in a binary image. The specific manner and extent
of thickening is controlled by the shape of the structuring element. One of the simplest
applications of dilation is bridging gaps.
The morphological transformation dilation combines two sets using vector addition (or
Minkowski set addition, e.g., (a, b) + (c, d) = (a + c, b + d)). The dilation A ⊕ B is the
point set of all possible vector additions of pairs of elements, one from each of the sets
A and B:
A ⊕ B = {z : z = a + b, a ∈ A, b ∈ B}

b) EROSION

Erosion shrinks or thins objects in a binary image. We can view erosion as a
morphological filtering operation in which image details smaller than the structuring
element are filtered from the image. Erosion combines two sets using vector subtraction
of set elements and is the dual operator of dilation. Neither erosion nor dilation is an
invertible transformation.

c) OPENING
Opening generally smooths the contour of an object, breaks narrow
isthmuses (bridges/strips), and eliminates thin protrusions (projections). Erosion and
dilation are not inverse transformations: if an image is eroded and then dilated, the
original image is not re-obtained. Instead, the result is a simplified and less detailed
version of the original image.
Erosion followed by dilation creates an important morphological transformation
called opening.
The opening of an image A by the structuring element B is denoted by A ∘ B and is
defined as
A ∘ B = (A ⊖ B) ⊕ B

d) CLOSING
Closing also tends to smooth sections of contours but, as opposed to opening, it generally
fuses narrow breaks and long thin gulfs, eliminates small holes and fills gaps in the
contours. Dilation followed by erosion is called closing.
The closing of an image A by the structuring element B is denoted by A • B and is
defined as
A • B = (A ⊕ B) ⊖ B

If an image A is unchanged by opening with the structuring element B, it is


called open with respect to B. Similarly, if an image A is unchanged by closing with B,
it is called closed with respect to B.
Opening and closing with an isotropic structuring element are used to eliminate specific
image details smaller than the structuring element; the global shape of the objects is
not distorted. Closing connects objects that are close to each other, fills up small holes,
and smoothes the object outline by filling up narrow gulfs.
e) Structuring Element

An essential part of the dilation and erosion operations is the structuring element used
to probe the input image. A structuring element is a matrix consisting of only 0's and
1's that can have any arbitrary shape and size. The pixels with values of 1 define the
neighborhood.
Two-dimensional, or flat, structuring elements are typically much smaller than the
image being processed. The center pixel of the structuring element, called the origin,
identifies the pixel of interest -- the pixel being processed. The pixels in the structuring
element containing 1's define the neighborhood of the structuring element. These pixels
are also considered in dilation or erosion processing.

PROCEDURE:
The experiment is designed to understand and learn the morphological operations in the
images.
Steps to run the experiments:
1. Select an image on which to perform morphological operations.
2. Select one option from 'Dilation', 'Erosion' ,'Closing' and 'Opening' according to the
required output.
3. Select appropriate structuring element.
4. Display output using imshow function.
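A minimal OpenCV sketch of these steps is shown below; 'shapes.png' is a placeholder filename and a 5x5 rectangular structuring element is an arbitrary choice.

import cv2

# Placeholder filename -- use any binary or grayscale test image
img = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# 5x5 rectangular structuring element (origin at its centre)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

eroded = cv2.erode(binary, kernel, iterations=1)            # shrinks/thins objects
dilated = cv2.dilate(binary, kernel, iterations=1)          # grows/thickens objects
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion followed by dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation followed by erosion

for name, out in [("Eroded", eroded), ("Dilated", dilated),
                  ("Opened", opened), ("Closed", closed)]:
    cv2.imshow(name, out)
cv2.waitKey(0)
cv2.destroyAllWindows()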
LAB TASKS:
Lab Task 1: Perform Erosion on Fig 1 such that all balls get separated from each other.
Optional (you can further apply your connected component analysis algorithm to count total
number of balls present in this image)

Lab Task 2: Remove the noise from Fig 2 and then fill the holes or gap between thumb
impression. You can apply morphological closing and opening.
Lab Task 3: We have a 512x512 image of a head CT scan. Perform grayscale 3x3 dilation and
erosion on Fig 3. Also find the morphological gradient (the dilated image minus the eroded
image).

Post Lab Questions:


1) What are erosion and dilation? Explain with an example.
2) Explain opening and closing with any application.
3) Write an algorithm to count the number of balls in Lab Task 1.

Conclusion:

1) http://www.codebind.com/python/opencv-python-tutorial-beginners-
morphological-transformations/
2) https://www.geeksforgeeks.org/image-segmentation-using-morphological-
operation/

T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 4

Name of the Experiment: Chain coding and decoding


Performed on: -----------------------------------------------

Submitted on: -----------------------------------------------

Theory: Chain codes are used to represent a boundary as a connected sequence of straight line
segments of specified length and direction.

Typically, this representation is based on 4- or 8-connectivity of the segments. The boundary
is traversed in a clockwise direction and a direction number is assigned to the segments
connecting every pair of pixels.
This representation has two main drawbacks:

1) The chain code is quite long.

2) Any small disturbance along the boundary due to noise causes changes in the code that may
not be related to the shape of the boundary.
The chain code of a boundary also depends on the starting point; it can be normalized by
treating the code as a circular sequence and redefining the starting point so that the resulting
sequence forms an integer of minimum magnitude. The code can be normalized for rotation by
using the first difference of the chain code, obtained by counting the number of direction
changes (counterclockwise) that separate two adjacent elements of the code. For example, for
the 4-directional chain code 10103322 the first difference is 3133030, or 33133030 if the first
element of the difference is computed using the transition between the last and the first
element of the chain, as shown in figure 1. Size normalization can be achieved by
altering the size of the sampling grid.

The shape number of a chain-coded boundary is defined as the first difference of smallest
magnitude. The first difference is invariant to rotation of the boundary (in multiples of the
basic direction angle), but the coded boundary still depends on the orientation of the grid. The
order n of a shape number is defined as the number of digits in its representation. Chain coding
is an efficient representation of binary images composed of contours.

The chain codes could be generated by using conditional statements for each direction, but this
becomes very tedious for systems having a large number of directions (3-D grids can have up to
26 directions). Instead, we use a hash function. The differences in the X (dx) and Y (dy)
coordinates of two successive points are calculated and hashed to generate the key for the
chain code between the two points.

PROCEDURE:
1. Load image
2. Find contours
3. Find difference in x and y coordinates of two successive points to find direction.
4. Assign 4 or 8 directional chain code to respective direction.
5. Trace the contours as per the direction and append chaincodes.
6. Downsample if required to represent with order less than or equal to 10.
7. Find first difference and shape number
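The sketch below follows these steps for an 8-directional code; it is an illustrative assumption (the manual does not fix an implementation), 'digit.png' is a placeholder filename, and the downsampling step is omitted.

import cv2
import numpy as np

# 8-directional codes keyed by (dx, dy); note that image y grows downwards,
# so "up" corresponds to dy = -1
DIRECTIONS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
              (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

img = cv2.imread("digit.png", cv2.IMREAD_GRAYSCALE)     # placeholder filename
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

pts = contours[0][:, 0, :]                              # (x, y) boundary points of one contour
chain = []
for (x0, y0), (x1, y1) in zip(pts, np.roll(pts, -1, axis=0)):
    step = (int(x1 - x0), int(y1 - y0))
    if step in DIRECTIONS:                              # successive points are 8-connected
        chain.append(DIRECTIONS[step])

# First difference: counterclockwise direction changes between adjacent codes (circular)
diff = [(chain[(i + 1) % len(chain)] - chain[i]) % 8 for i in range(len(chain))]

# Shape number: the circular rotation of the first difference with minimum magnitude
shape_number = min(diff[i:] + diff[:i] for i in range(len(diff)))
print("Chain code:", chain)
print("First difference:", diff)
print("Shape number:", shape_number)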

LAB TASKS:
Take as input any digit of your roll number and display the chain code of the contours in it.
Find the first difference and shape number by downsampling and forming a chain code of
order less than 10.
Post Lab Questions:
1. What is the 8-directional shape number for the given shape?

2. What is the shape number for the given 4-directional chain code?

0103032221
3. Is the chain code translation, scale, and rotation invariant?
4. What are different applications of chain code?

Conclusion:

Additional links:
https://www.geeksforgeeks.org/chain-code-for-2d-line/
https://en.wikipedia.org/wiki/Chain_code

T.Y. ECE- AIML Academic Year 2023-24
Semester: VI Subject: Computer Vision
Name ------------------------------ Division ---------
Roll No ---------------------------- Batch ----------

Experiment No: 5

Name of the Experiment: Image registration with optimization technique.

Performed on: -----------------------------------------------

Submitted on: -----------------------------------------------

Aim: To perform Image registration with optimization technique.


Pre-requisite: Open cv, Image registration, optimization techniques.

Objectives:

1. To draw the contours for the various shapes present in the image.
2. To write the name of the shapes in its centre.

SCOPE: This experiment will help you understand cv2.findContours() function


and cv2.drawContours() function of OpenCV to draw edges on images. A contour
is an outline or a boundary of a shape.

Theory:
Image registration is an image processing technique used to align multiple scenes
into a single integrated image. It helps overcome issues such as image rotation,
scale, and skew that are common when overlaying images.
In other words, it is the process of transforming different sets of data into a single unified
coordinate system; it can be thought of as aligning images so that comparable
characteristics can be related easily. It involves mapping points from one image
to corresponding points in another image.
Image alignment and registration have a number of practical, real-world use
cases, including:

Medical: MRI scans, SPECT scans, and other medical scans produce multiple
images. To help doctors and physicians better interpret these scans, image
registration can be used to align multiple images together and overlay them on
top of each other. From there the doctor can read the results and provide a more
accurate diagnosis.
Military: Automatic Target Recognition (ATR) algorithms accept multiple input
images of the target, align them, and refine their internal parameters to improve
target recognition.
Optical Character Recognition (OCR): Image alignment (often called document
alignment in the context of OCR) can be used to build automatic form, invoice,
or receipt scanners. We first align the input image to a template of the document
we want to scan. From there OCR algorithms can read the text from each
individual field.
Scale-Invariant Feature Transform (SIFT): SIFT is an algorithm in computer
vision to detect and describe local features in images. It is widely used in image
processing. The processes of SIFT include Difference of Gaussians (DoG) space
generation, keypoint detection, and feature description.
Four steps of Scale-Invariant Feature Transform (SIFT):
Scale-space extrema selection: It is the first step of SIFT algorithm. The
potential interest points are located using difference-of-gaussian.
Keypoint localization: A model is fit to determine the location and scale at
each potential location. Keypoints are selected based on their stability.

Orientation assignment: orientations are assigned to keypoints locations
based on local image gradient direction.
Keypoint descriptor: It is the final step of SIFT algorithm. A coordinate
system around the feature point is created that remains the same for the
different views of the feature.
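As a small illustration (an assumption, since the manual does not show code here), SIFT keypoints and descriptors can be obtained in OpenCV as follows; cv2.SIFT_create() requires opencv-python 4.4 or newer, and 'scene.jpg' is a placeholder filename.

import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder filename

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), "keypoints, descriptor shape:", descriptors.shape)  # N x 128

# Draw keypoints with their scale and orientation
vis = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow("SIFT keypoints", vis)
cv2.waitKey(0)
cv2.destroyAllWindows()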

There are a number of image alignment and registration algorithms:

The most popular image alignment algorithms are feature-based and


include keypoint detectors (DoG, Harris, GFTT, etc.), local invariant
descriptors (SIFT, SURF, ORB, etc.), and keypoint matching (RANSAC
and its variants).
Medical applications often use similarity measures for image registration,
typically cross-correlation, sum of squared intensity differences, and
mutual information.
With the resurgence of neural networks, deep learning can even be used
for image alignment by automatically learning the homography transform.

How does image registration work?
Alignment can be looked at as a simple coordinate transform.
PROCEDURE
1. Import module
2. Import images as a reference image & aligned images.
3. Apply the registration effects on it.
The algorithm works as follows:
1. Convert both images to grayscale.
2. Match features from the image to be aligned, to the reference image and
store the coordinates of the corresponding key points.
3. Keypoints are simply the selected few points that are used to compute the
transform (generally points that stand out), and descriptors are histograms
of the image gradients to characterize the appearance of a keypoint.
4. Use the ORB (Oriented FAST and Rotated BRIEF) or SIFT (Scale-Invariant
Feature Transform) implementation in the OpenCV library, which
provides us with both the keypoints as well as their associated descriptors.
5. Match the key points between the two images.
6. Pick the top matches, and remove the noisy matches.
7. Find the homography transform.
8. Apply this transform to the original unaligned image to get the output
image.
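A minimal sketch of this algorithm using ORB (one of the options named in step 4) is given below; 'reference.jpg' and 'to_align.jpg' are placeholder filenames, and keeping the best 20% of matches is an arbitrary choice.

import cv2
import numpy as np

ref = cv2.imread("reference.jpg")                       # placeholder filenames
mov = cv2.imread("to_align.jpg")
gray_ref = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)        # step 1: grayscale
gray_mov = cv2.cvtColor(mov, cv2.COLOR_BGR2GRAY)

# Steps 2-4: detect keypoints and compute descriptors with ORB
orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(gray_mov, None)
kp2, des2 = orb.detectAndCompute(gray_ref, None)

# Steps 5-6: match descriptors (Hamming distance for ORB) and keep the best 20%
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
good = matches[:max(4, int(0.2 * len(matches)))]

# Step 7: estimate the homography with RANSAC to reject remaining noisy matches
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Step 8: warp the unaligned image into the reference frame
h, w = ref.shape[:2]
aligned = cv2.warpPerspective(mov, H, (w, h))
cv2.imwrite("aligned.jpg", aligned)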

LAB TASKS:
Acquire a reference image and another image of the same scene and try to align them with
keypoints using SIFT algorithm.
Post Lab Questions:
1. Explain why SIFT algorithm is invariant to scale and rotation.
2. Explain any other algorithm for keypoint descriptors.
3. What are the applications of image registration/ alignment? Explain any one in detail.
4. Explain any matching algorithm.

5. What is homography transform?

Conclusion:
Additional links:
https://www.geeksforgeeks.org/image-registration-using-opencv-python/

T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 6

Name of the Experiment: Face recognition using Viola-Jones algorithm or similar application
Performed on: -----------------------------------------------
Submitted on: -----------------------------------------------

Theory:

Object detection is one of the computer technologies that is connected to image processing and
computer vision. It is concerned with detecting instances of an object such as human faces,
buildings, trees, cars, etc. The primary aim of face detection algorithms is to determine whether
there is any face in an image or not.

In recent years, we have seen significant advancement of technologies that can detect and
recognise faces. Our mobile cameras are often equipped with such technology where we can
see a box around the faces. Although there are quite advanced face detection algorithms,
especially with the introduction of deep learning, the introduction of viola jones algorithm in
2001 was a breakthrough in this field. Now let us explore the viola jones algorithm in detail.

What is Viola Jones algorithm?

The Viola-Jones algorithm is named after the two computer vision researchers who proposed the
method in 2001, Paul Viola and Michael Jones, in their paper "Rapid Object Detection using a
Boosted Cascade of Simple Features". Despite being an older framework, Viola-Jones is
quite powerful, and its application has proven to be exceptionally notable in real-time face
detection. This algorithm is painfully slow to train but can detect faces in real time with
impressive speed.

Given an image (this algorithm works on grayscale images), the algorithm looks at many smaller
subregions and tries to find a face by looking for specific features in each subregion. It needs
to check many different positions and scales because an image can contain many faces of
various sizes. Viola and Jones used Haar-like features to detect faces in this algorithm.

The Viola Jones algorithm has four main steps, which we shall discuss in the sections to follow:

Selecting Haar-like features


Creating an integral image
Running AdaBoost training
Creating classifier cascades

What are Haar-Like Features?

In the early 20th century, the Hungarian mathematician Alfred Haar gave the concept of Haar
wavelets, a sequence of rescaled square-shaped functions which together form a
wavelet family or basis. Viola and Jones adapted the idea of using Haar wavelets and developed
the so-called Haar-like features.

Haar-like features are digital image features used in object recognition. All human faces share
some universal properties of the human face like the eyes region is darker than its neighbour
pixels, and the nose region is brighter than the eye region.

A simple way to find out which region is lighter or darker is to sum up the pixel values of both
regions and compare them. The sum of pixel values in the darker region will be smaller than
the sum of pixels in the lighter region. If one side is lighter than the other, it may be an edge of
an eyebrow; or sometimes the middle portion may be shinier than the surrounding boxes, which
can be interpreted as a nose. This can be accomplished using Haar-like features, and with their
help, we can interpret the different parts of a face.

There are 3 types of Haar-like features that Viola and Jones identified in their research:

Edge features
Line-features
Four-sided features

Edge features and Line features are useful for detecting edges and lines respectively. The four-
sided features are used for finding diagonal features.

The value of the feature is calculated as a single number: the sum of pixel values in the black
area minus the sum of pixel values in the white area. The value is zero for a plain surface in
which all the pixels have the same value, and thus provides no useful information.

Since our faces are of complex shapes with darker and brighter spots, a Haar-like feature gives
you a large number when the areas in the black and white rectangles are very different. Using
this value, we get a piece of valid information out of the image.
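The black-minus-white computation can be done efficiently with an integral image (summed-area table), which is what the "Creating an integral image" step exists for. The snippet below is an illustrative sketch with a hypothetical window position; 'face.jpg' is a placeholder filename.

import cv2

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)       # placeholder filename
ii = cv2.integral(img)                                    # (h+1) x (w+1) summed-area table

def rect_sum(ii, x, y, w, h):
    # Sum of pixel values inside the rectangle [y, y+h) x [x, x+w), in O(1)
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def edge_feature(ii, x, y, w, h):
    # Two-rectangle (edge) Haar-like feature: sum of top half minus sum of bottom half
    top = rect_sum(ii, x, y, w, h // 2)
    bottom = rect_sum(ii, x, y + h // 2, w, h // 2)
    return top - bottom

print(edge_feature(ii, x=60, y=40, w=24, h=24))           # hypothetical window position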

To be useful, a Haar-like feature needs to give you a large number, meaning that the areas in
the black and white rectangles are very different. There are known features that perform very
well to detect human faces:

For example, when we apply this specific haar-like feature to the bridge of the nose, we get a
good response. Similarly, we combine many of these features to understand if an image region
contains a human face.

How is AdaBoost used in viola jones algorithm?

In the Viola-Jones algorithm, each Haar-like feature represents a weak learner. To decide the
type and size of a feature that goes into the final classifier, AdaBoost checks the performance
of all classifiers that you supply to it.

We set up a cascaded system in which we divide the process of identifying a face into multiple
stages. In the first stage, we have a classifier which is made up of our best features, in other
words, in the first stage, the subregion passes through the best features such as the feature which
identifies the nose bridge or the one that identifies the eyes. In the next stages, we have all the
remaining features.

What are Cascading Classifiers?

When an image subregion enters the cascade, it is evaluated by the first stage. If that stage
evaluates the subregion as positive, meaning that it thinks it contains a face, the output of the
stage is a maybe.
When a subregion gets a maybe, it is sent to the next stage of the cascade and the process
continues as such till we reach the last stage.

If all classifiers approve the image, it is finally classified as a human face and is presented to
the user as a detection.

PROCEDURE:
1. Read image
2. Detect face using cv2.CascadeClassifier
3. Display rectangle around detected area
4. Design CNN with number of layers, strides, max pooling, activation function, etc.
5. Load training and testing dataset.
6. Recognize the face and display result.
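A minimal sketch of steps 1-3 with the pretrained frontal-face Haar cascade that ships with OpenCV is shown below; 'my_photo.jpg' is a placeholder filename, and the CNN recognition part (steps 4-6) is not included here.

import cv2

# Pretrained Haar cascade for frontal faces shipped with opencv-python
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("my_photo.jpg")                 # placeholder filename
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # Viola-Jones works on grayscale

# Detect faces at multiple positions and scales
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(30, 30))

# Draw a rectangle around each detected face
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Detected faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()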
LAB TASKS:
1. Detect your face using Viola-Jones algorithm.
2. Design and implement CNN to recognize your face.
Post Lab Questions:
1. What is the Viola-Jones algorithm?
2. Explain how can you recognize the face with CNN?
3. What are different performance metrics to check the accuracy?
4. Explain design of the CNN you used in your program in terms of hyperparameters.

Conclusion:

Additional links:

T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 7

Name of the Experiment: Disparity estimation / Depth map generation


Performed on: -----------------------------------------------
Submitted on: -----------------------------------------------

Theory:

Given two or more images of the same 3D scene, taken from different points of view, the
correspondence problem refers to the task of finding a set of points in one image which can be
identified as the same points in another image. To do this, points or features in one image are
matched with the points or features in another image, thus establishing corresponding points
or corresponding features, also known as homologous points or homologous features. The
images can be taken from a different point of view, at different times, or with objects in the
scene in general motion relative to the camera(s). There are two basic ways to find the
correspondences between two images.
Correlation-based: checking if one location in one image looks/seems like another in another
image.
Feature-based: finding features in the image and seeing if the layout of a subset of features is
similar in the two images.
Epipolar geometry, as shown in figure 1, is the geometry of stereo vision. When two cameras
view a 3D scene from two distinct positions, there are a number of geometric relations between
the 3D points and their projections onto the 2D images that lead to constraints between the
image points. These relations are derived based on the assumption that the cameras can be
approximated by the pinhole camera model.

Fig.1: Epipolar geometry
Epipole or epipolar point
Since the optical centers of the cameras' lenses are distinct, each center projects onto a distinct
point into the other camera's image plane. These two image points, denoted by eL and eR, are
called epipoles or epipolar points. Both epipoles eL and eR in their respective image planes
and both optical centers OL and OR lie on a single 3D line.
Epipolar line
The line OL X is seen by the left camera as a point because it is directly in line with that
camera's lens optical center. However, the right camera sees this line as a line in its image
plane. That line (eR xR) in the right camera is called an epipolar line. Symmetrically, the line
OR X is seen by the right camera as a point and is seen as the epipolar line eL xL by the left
camera.
An epipolar line is a function of the position of point X in the 3D space, i.e. as X varies, a set
of epipolar lines is generated in both images. Since the 3D line OL X passes through the optical
center of the lens OL, the corresponding epipolar line in the right image must pass through the
epipole eR (and correspondingly for epipolar lines in the left image). All epipolar lines in one
image contain the epipolar point of that image. In fact, any line which contains the epipolar
point is an epipolar line since it can be derived from some 3D point X.
Epipolar constraint and triangulation

If the relative position of the two cameras is known, this leads to two important observations:
Assume the projection point xL is known. Then the epipolar line eR xR is known, and the point
X projects into the right image on a point xR which must lie on this particular epipolar line.
This means that for each point observed in one image the same point must be observed in the
other image on a known epipolar line. This provides an epipolar constraint: the projection of X
on the right camera plane xR must be contained in the eR xR epipolar line. All points X e.g.
X1, X2, X3 on the OL XL line will verify that constraint. It means that it is possible to test if
two points correspond to the same 3D point. Epipolar constraints can also be described by the
essential matrix or the fundamental matrix between the two cameras.

If the points xL and xR are known, their projection lines are also known. If the two image
points correspond to the same 3D point X the projection lines must intersect precisely at X.
This means that X can be calculated from the coordinates of the two image points, a process
called triangulation as shown in figure 2. Depth of the scene can be estimated from disparity.

Fig.2: stereo rectification and triangulation
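As an illustrative sketch of triangulation (not part of the original manual), OpenCV's cv2.triangulatePoints can recover a 3D point from one pair of corresponding image points, given the two camera projection matrices; the intrinsics, baseline and pixel coordinates below are placeholder values for a rectified rig.

import cv2
import numpy as np

# Hypothetical rectified stereo rig: identical intrinsics K, baseline B along x
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
B = 0.06  # baseline in metres (placeholder)

# Projection matrices P = K [R | t] for the left and right cameras
P_left = K @ np.hstack((np.eye(3), np.zeros((3, 1))))
P_right = K @ np.hstack((np.eye(3), np.array([[-B], [0.0], [0.0]])))

# One pair of corresponding image points xL and xR (2 x N arrays, here N = 1);
# after rectification they lie on the same row, separated by the disparity (20 px)
xL = np.array([[400.0], [250.0]])
xR = np.array([[380.0], [250.0]])

X_h = cv2.triangulatePoints(P_left, P_right, xL, xR)   # 4 x N homogeneous coordinates
X = (X_h[:3] / X_h[3]).ravel()
print("Triangulated 3D point:", X)  # depth Z = f*B/disparity = 700*0.06/20 = 2.1 m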

We can recover depth by finding the image point that corresponds to x in the other image after
stereo rectification.

PROCEDURE:
1. Acquire 2 images of the same scene from 2 different cameras, i.e., left camera and right
camera.
2. Make sure their shape, size, dtype, etc. are the same.

3. Convert 2 images to gray.
4. Set the parameters of the function cv2.StereoBM_create() in OpenCV.
5. Find disparity map and display with normalization.
6. Find depth map and display with normalization if required.
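A minimal sketch of this procedure is given below; 'left.png' and 'right.png' are placeholder filenames for a rectified pair, the StereoBM parameters are arbitrary starting values, and f (focal length in pixels) and B (baseline) are placeholder calibration values used only to illustrate the depth-from-disparity relation Z = f*B/d.

import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # placeholder filenames
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)   # (a rectified stereo pair)

# Block matching along horizontal epipolar lines
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed point -> pixels

# Normalize the disparity map for display
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imshow("Disparity", disp_vis)

# Depth from disparity via triangulation: Z = f * B / d (placeholder f and B)
f, B = 700.0, 0.06
depth = np.where(disparity > 0, f * B / np.maximum(disparity, 1e-6), 0)
depth_vis = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imshow("Depth", depth_vis)
cv2.waitKey(0)
cv2.destroyAllWindows()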
LAB TASKS:
1. Acquire 2 stereo images of the same scene
2. Find disparity map and depth map.
Post Lab Questions:
1. What is stereo correspondence?
2. Explain epipolar geometry with neat diagram and define the terms- epipoles, epipolar
lines, baseline ?
3. How disparity and depth are related, explain with concept of triangulation?
4. Explain construction and working of Lidar and Kinect.

Conclusion:

Additional links:
https://en.wikipedia.org/wiki/Kinect
https://en.wikipedia.org/wiki/Fundamental_matrix_(computer_vision)
https://www.pyroistech.com/lidar/
https://web.eecs.umich.edu/~jjcorso/t/598F14/files/lecture_1027_stereo.pdf

T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision

Name ------------------------------ Division ---------


Roll No ---------------------------- Batch ----------

Experiment No: 8

Name of the Experiment: 3D modelling


Performed on: -----------------------------------------------
Submitted on: -----------------------------------------------

Theory:

3D computer vision extracts, processes, and analyzes 2D visual data to generate 3D models
from it. To do so, it employs different algorithms and data acquisition techniques that enable
computer vision models to reconstruct the dimensions, contours and spatial relationships of
objects within a given visual setting. The 3D CV techniques combine principles from multiple
disciplines, such as computer vision, photogrammetry, geometry and machine learning with
the objective of deriving valuable three-dimensional information from images, videos or sensor
data.

Spatial dimensions refer to the three orthogonal axes (X, Y, and Z) that make the 3D coordinate
system as shown in figure 1. These dimensions capture the height, width, and depth values of
objects. Spatial coordinates facilitate the representation, examination, and manipulation of 3D
data like point clouds, meshes, or voxel grids essential for applications such as robotics,
augmented reality, and 3D reconstruction.

Fig. 1: spatial dimension

Techniques for 3D Reconstruction in Computer Vision:

Passive Techniques:
Shape from Shading - Shape from shading estimates an object's 3D shape using just a single 2D
image. This technique analyzes how light hits the object (shading patterns) and how bright
different areas appear (intensity variations). By understanding how light interacts with the
surface, the surface orientation, and hence the shape, can be inferred.

Shape from Texture- Shape from texture is a method used in computer vision to determine the
three-dimensional shape of an object based on the distortions found in its surface texture. This
technique relies on the assumption that the surface possesses a textured pattern with known
characteristics

Depth from Defocus- Depth from defocus is a process that calculates the depth or three-
dimensional structure of a scene by examining the degree of blur or defocus present in areas of
an image. It works on the principle that objects situated at different distances from the camera lens will
exhibit varying levels of defocus blur. By comparing these blur levels throughout the image,
DfD can generate depth maps or three-dimensional models representing the scene.

Active Techniques:

Structured Light - Structured light is an active 3D CV technique where a specifically designed
light pattern or beam is projected onto a visual scene. This light pattern can be in various forms
including grids, stripes, or even more complex designs. As the light pattern strikes objects that
have varying shapes and depths, the light beams get deformed. Therefore, by analyzing how the
projected pattern is deformed, the system can recover the depth information of different points
on the object.

Time-of-Flight (ToF) Sensors- Time-of-flight (ToF) sensor is another active vision technique
that measures the time it takes for a light signal to travel from the sensor to an object and back.
Common light sources for ToF sensors are lasers or infrared (IR) LEDs. The sensor emits a
light pulse and then calculates the distance based on the time-of-flight of the reflected light
beam. By capturing this time for each pixel in the sensor array, a 3D depth map of the scene is
generated. Unlike regular cameras that capture color or brightness, ToF sensors provide depth
information for every point which essentially helps in building a 3D image of the surroundings.
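As a hedged illustration of how such a depth map becomes 3D data (an addition, not from the manual), the sketch below back-projects a depth image into a point cloud using an assumed pinhole camera model; the intrinsics (fx, fy, cx, cy) and the synthetic constant-depth image are placeholder values.

import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    # Back-project a depth map (in metres) to an N x 3 point cloud (X, Y, Z)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx      # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth reading

# Placeholder intrinsics and a synthetic constant-depth image for illustration
depth = np.full((480, 640), 1.5, dtype=np.float32)
cloud = depth_to_pointcloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)   # (307200, 3)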

PROCEDURE:
1. Install the Kinect camera.
2. Capture images of the scene.
3. Observe the 3D model of the scene.

Post Lab Questions:


1. How can shape be constructed from texture?
2. What is a point cloud? How can a 3D shape be represented with a point cloud?
3. Explain surface representation. What is a surface normal?
4. What is active range finding?

Conclusion:

Additional links:
https://viso.ai/computer-vision/3d-computer-vision/

