Lab Manual Cv-Final
Experiment No: 1
Theory: Geometric transformations are needed to give an entity the required position, orientation, or shape starting from an existing position, orientation, or shape. The basic
transformations are scaling, rotation, translation, and shear. Other important types of
transformations are projections and mappings.
By scaling relative to the origin, all coordinates of the points defining an entity are multiplied by the same factor, possibly different for each axis. Scaling can also be performed relative to an arbitrary point. Mirroring is a special kind of scaling in which one or more scaling factors are negative.
By translation, all coordinates of the points defining an entity are modified by adding the same
vector quantity. Rotation is performed by premultiplying the coordinates of the points defining
an entity by a special rotation matrix, dependent on the rotation angles. Shear produces a
deformation by forcing contacting parts or layers to slide upon each other in opposite directions
parallel to the plane of their contact.
Projections are transformations between systems with different numbers of dimensions. The
most important use of projections is for rendering 3-D models on screen or paper (2-D
geometric entities). Traditional drafting uses orthographic projections (parallel to one of the coordinate axes). To give the sensation of depth to the rendered scenes, the perspective
projection is applied. In such a projection, all lines in the scene that are not parallel to the screen
(projection plane) converge in one point for each direction. In the simplest case, all the lines
perpendicular to the screen converge in one point.
Affine Transformation
An affine transformation is any transformation that preserves collinearity (i.e., all points lying
on a line initially still lie on a line after transformation) and ratios of distances (e.g., the
midpoint of a line segment remains the midpoint after transformation). In general, an affine
transformation is a composition of rotations, translations, magnifications, and shears.
In homogeneous coordinates, a general 2-D affine transformation has the form
[x'; y'; 1] = [c11 c12 c13; c21 c22 c23; 0 0 1] [x; y; 1]
where c13 and c23 affect translations, c11 and c22 affect magnifications, and the combination affects rotations and shears.
The transformation matrices below can be used as building blocks.
Using these matrices, we can apply transformations such as translation, scaling, and rotation to a given image so that it matches a reference image, which is the requirement of image registration.
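As a quick illustration, here is a minimal NumPy sketch of composing the building-block matrices by matrix multiplication in homogeneous coordinates. The parameter values tx, ty, s, theta, and sh are arbitrary examples, not taken from this manual.

```python
import numpy as np

tx, ty = 40, 20          # translation in pixels (example values)
s = 1.5                  # uniform scale factor
theta = np.deg2rad(30)   # rotation angle
sh = 0.2                 # shear factor along x

T = np.array([[1, 0, tx],
              [0, 1, ty],
              [0, 0, 1]], dtype=np.float64)      # translation

S = np.array([[s, 0, 0],
              [0, s, 0],
              [0, 0, 1]], dtype=np.float64)      # scaling

R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]], dtype=np.float64)      # rotation about the origin

H = np.array([[1, sh, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=np.float64)      # shear

# Composite affine transform: scale first, then rotate, shear, and translate
M = T @ H @ R @ S
point = np.array([10, 5, 1])    # a point in homogeneous coordinates
print(M @ point)                # transformed point
```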
PROCEDURE:
1. Read image from the given path
2. Convert to grayscale and display image
3. Perform arithmetic operations like addition, subtraction, multiplication, and division.
4. Perform geometric transformations (translation, rotation, scaling, shearing) with the cv2.warpAffine function (a minimal sketch follows this list).
5. Display all the results.
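A minimal sketch of this procedure in Python/OpenCV is shown below; the file name input.jpg and the transformation parameters are placeholders to be replaced with your own values.

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                      # 1. read image from the given path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # 2. convert to grayscale
cv2.imshow("Grayscale", gray)

# 3. arithmetic operations (saturated, element-wise)
added = cv2.add(gray, gray)
subtracted = cv2.subtract(gray, np.full_like(gray, 50))

# 4. geometric transformations with cv2.warpAffine (2x3 matrices)
rows, cols = gray.shape
M_translate = np.float32([[1, 0, 50], [0, 1, 30]])                  # shift right 50, down 30
M_rotate = cv2.getRotationMatrix2D((cols / 2, rows / 2), 45, 1.0)   # 45 deg about the centre
M_shear = np.float32([[1, 0.3, 0], [0, 1, 0]])                      # shear along x

translated = cv2.warpAffine(gray, M_translate, (cols, rows))
rotated = cv2.warpAffine(gray, M_rotate, (cols, rows))
scaled = cv2.resize(gray, None, fx=1.5, fy=1.5)                     # scaling via resize
sheared = cv2.warpAffine(gray, M_shear, (int(cols * 1.3), rows))

# 5. display all the results
for name, result in [("Translated", translated), ("Rotated", rotated),
                     ("Scaled", scaled), ("Sheared", sheared)]:
    cv2.imshow(name, result)
cv2.waitKey(0)
cv2.destroyAllWindows()
```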
LAB TASKS:
Q.1 Write a program to add/subtract two images of your two-digit roll number. e.g., Add
1.jpg and 2.jpg to get 12.jpg
Q.2 Display the following images
Q.3 Display distance patterns for D8 and De
Conclusion:
Additional links:
https://medium.com/@livajorge7/geometric-transformation-in-image-processing-basics-applications-and-cronj-as-an-expert-f06417193695
T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision
Experiment No: 2
Theory:
An image texture is a set of metrics calculated in image processing designed to quantify the
perceived texture of an image. Image texture gives us information about the spatial
arrangement of color or intensities in an image or selected region of an image.[1]
Image textures can be artificially created or found in natural scenes captured in an image. Image
textures are one way that can be used to help in segmentation or classification of images.
The gray-level co-occurrence matrix (GLCM) of an image F is a matrix C in which an entry cij is a count of the number of times that F(x, y) = i and F(x + 1, y + 1) = j. For example, the first entry of the example co-occurrence matrix comes from the fact that 4 times a 0 appears below and to the right of another 0. The factor 1/16 appears because there are 16 pairs entering into this matrix, so it normalizes the matrix entries to be estimates of the co-occurrence probabilities.
For statistical confidence in the estimation of the joint probability distribution, the matrix must contain a reasonably large average occupancy level. This is achieved either by (a) restricting the number of amplitude quantization levels (which causes loss of accuracy for low-amplitude textures), or (b) using a large measurement window (which causes errors if the texture changes over the large window). A typical compromise is 16 gray levels and a window size of 30 to 50 pixels on each side. Now we can analyze C:
Contrast = Σi Σj (i − j)² C(i, j)
This weights each pair count by the squared gray-level difference, so it is large for textures with strong local intensity variation.
Entropy = −Σi Σj C(i, j) log C(i, j)
This is a measure of randomness, having its highest value when the elements of C are all equal. In the case of a checkerboard, the entropy would be low.
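As a rough illustration of how the GLCM and its properties can be computed, here is a minimal sketch using scikit-image (assuming version 0.19 or newer, where the functions are spelled graycomatrix/graycoprops; older releases spell them greycomatrix/greycoprops). The file name texture.png and the one-pixel horizontal offset are placeholders.

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # greycomatrix/greycoprops in skimage < 0.19

img = cv2.imread("texture.png", cv2.IMREAD_GRAYSCALE)  # "texture.png" is a placeholder
img = (img // 16).astype(np.uint8)                     # quantize 0..255 down to 16 gray levels (0..15)

# Normalized, symmetric GLCM for a one-pixel horizontal offset
glcm = graycomatrix(img, distances=[1], angles=[0], levels=16, symmetric=True, normed=True)

contrast = graycoprops(glcm, "contrast")[0, 0]
energy = graycoprops(glcm, "energy")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]

# Entropy is not provided by graycoprops, so compute it directly from the normalized matrix C
C = glcm[:, :, 0, 0]
entropy = -np.sum(C[C > 0] * np.log2(C[C > 0]))

print(f"contrast={contrast:.3f}  energy={energy:.3f}  "
      f"homogeneity={homogeneity:.3f}  entropy={entropy:.3f}")
```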
LAB TASKS:
1. Take 3 types of textured images (e.g., smooth, coarse, random) and find their GLCM matrices. Compare the GLCM properties and justify your observations about each texture.
2.
to each GLCM. Note that 3 of the plots show perspective views of the GLCM from
the vantage point of the (0,0) position. However, one of the plots has the (0,0) matrix
coordinate position placed in the upper left corner since that provides a better view.
So check the axis labels.
Conclusion:
Additional links:
T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision
Experiment No: 3
SCOPE: At the end of this experiment we will be able to understand the various compression and expansion methods, namely dilation, erosion, opening, and closing of an image.
These operations are typically used to extract information about forms and shapes of
structures.
FACILITIES:
Laptop/PC with Python, PyCharm & OpenCV package, different types of images
Theory:
Morphology is a broad set of image processing operations that process images based on
shapes. Morphological operations apply a structuring element to an input image, creating
an output image of the same size. In a morphological operation, the value of each pixel in
the output image is based on a comparison of the corresponding pixel in the input image
with its neighbors. By choosing the size and shape of the neighborhood, you can construct a morphological operation that is sensitive to specific shapes in the input image.
The most basic morphological operations are dilation and erosion. Dilation adds pixels to
the boundaries of objects in an image, while erosion removes pixels on object boundaries.
The number of pixels added or removed from the objects in an image depends on the size
and shape of the structuring element used to process the image. In the morphological
dilation and erosion operations, the state of any given pixel in the output image is
determined by applying a rule to the corresponding pixel and its neighbors in the input
image. The rule used to process the pixels defines the operation as a dilation or an erosion.
Morphological operations are used predominantly for the following purposes:
- Image pre-processing (noise filtering, shape simplification).
- Enhancing object structure (skeletonizing, thinning, thickening, convex hull, object marking).
- Quantitative description of objects (area, perimeter, Euler-Poincare characteristic).
a) DILATION:
Dilation grows or thickens objects in a binary image. The specific manner and extent of thickening is controlled by the shape of the structuring element. One of the simplest applications of dilation is bridging gaps.
The morphological transformation dilation combines two sets using vector addition (or Minkowski set addition, e.g., (a, b) + (c, d) = (a + c, b + d)). The dilation A ⊕ B is the point set of all possible vector additions of pairs of elements, one from each of the sets A and B:
A ⊕ B = {z | z = a + b, a ∈ A, b ∈ B}
b) EROSION
Erosion shrinks or thins objects in a binary image. We can view erosion as a morphological filtering operation in which image details smaller than the structuring element are filtered out of the image. Erosion combines two sets using vector subtraction of set elements and is the dual operator of dilation. Neither erosion nor dilation is an invertible transformation.
c) OPENING
Opening generally smooths the contour of an object, breaks narrow isthmuses (bridges/strips), and eliminates thin protrusions (projections). Erosion and dilation are not inverse transformations: if an image is eroded and then dilated, the original image is not re-obtained. Instead, the result is a simplified and less detailed version of the original image.
Erosion followed by dilation creates an important morphological transformation called opening.
The opening of an image A by the structuring element B is denoted by A ∘ B and is defined as
A ∘ B = (A ⊖ B) ⊕ B
d) CLOSING
Closing also tends to smooth sections of contours but, as opposed to opening, it generally fuses narrow breaks and long thin gulfs, eliminates small holes, and fills gaps in the contours. Dilation followed by erosion is called closing.
The closing of an image A by the structuring element B is denoted by A • B and is defined as
A • B = (A ⊕ B) ⊖ B
An essential part of the dilation and erosion operations is the structuring element used
to probe the input image. A structuring element is a matrix consisting of only 0's and
1's that can have any arbitrary shape and size. The pixels with values of 1 define the
neighborhood.
Two-dimensional, or flat, structuring elements are typically much smaller than the
image being processed. The center pixel of the structuring element, called the origin,
identifies the pixel of interest -- the pixel being processed. The pixels in the structuring
element containing 1's define the neighborhood of the structuring element. These pixels
are also considered in dilation or erosion processing.
PROCEDURE:
The experiment is designed to understand and learn the morphological operations in the
images.
Steps to run the experiments:
1. Select an image on which to perform morphological operations.
2. Select one option from 'Dilation', 'Erosion', 'Closing', and 'Opening' according to the required output.
3. Select an appropriate structuring element.
4. Display the output using the imshow function (a minimal sketch follows this list).
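A minimal sketch of this procedure with OpenCV is given below; the file name binary.png and the 5×5 elliptical structuring element are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)        # 1. select an image
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))   # 3. structuring element

dilated = cv2.dilate(binary, kernel, iterations=1)          # dilation
eroded = cv2.erode(binary, kernel, iterations=1)            # erosion
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion followed by dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation followed by erosion
gradient = cv2.morphologyEx(binary, cv2.MORPH_GRADIENT, kernel)  # dilation minus erosion

for name, result in [("Dilated", dilated), ("Eroded", eroded),
                     ("Opened", opened), ("Closed", closed), ("Gradient", gradient)]:
    cv2.imshow(name, result)                                # 4. display the output
cv2.waitKey(0)
cv2.destroyAllWindows()
```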
LAB TASKS:
Lab Task 1: Perform Erosion on Fig 1 such that all balls get separated from each other.
Optional (you can further apply your connected component analysis algorithm to count total
number of balls present in this image)
Lab Task 2: Remove the noise from Fig 2 and then fill the holes or gap between thumb
impression. You can apply morphological closing and opening.
Lab Task 3: We have a 512 × 512 image of a head CT scan. Perform grayscale 3×3 dilation and erosion on Fig 3. Also find the morphological gradient, computed as the difference between the dilation and the erosion:
gradient = (A ⊕ B) − (A ⊖ B)
Conclusion:
1) http://www.codebind.com/python/opencv-python-tutorial-beginners-morphological-transformations/
2) https://www.geeksforgeeks.org/image-segmentation-using-morphological-operation/
T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision
Experiment No: 4
Theory: Chain codes are used to represent a boundary as a connected sequence of straight-line segments of specified length and direction.
- The representation is based on 4- or 8-connectivity of the segments.
- The boundary is traversed in a clockwise direction, assigning a direction to the segments connecting every pair of pixels.
The shape number of a chain-coded boundary is defined as the first difference of smallest magnitude. The first difference of a chain code is independent of rotation, although the coded boundary still depends on the orientation of the grid. The order n of a shape number is defined as the number of digits in its representation. Chain coding is an efficient representation of binary images composed of contours.
The chain codes could be generated by using conditional statements for each direction, but this becomes very tedious to describe for systems having a large number of directions (3-D grids can have up to 26 directions). Instead we use a hash function: the differences in the X (dx) and Y (dy) coordinates of two successive points are calculated and hashed to generate the key for the chain code between the two points.
PROCEDURE:
1. Load the image.
2. Find contours.
3. Find the difference in the x and y coordinates of two successive points to find the direction.
4. Assign the 4- or 8-directional chain code to the respective direction.
5. Trace the contours as per the direction and append the chain codes.
6. Downsample if required to represent with order less than or equal to 10.
7. Find the first difference and shape number (a minimal sketch follows this list).
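A minimal sketch of this procedure is given below, assuming OpenCV 4.x (where cv2.findContours returns two values); the file name digit.png is a placeholder.

```python
import cv2
import numpy as np

# Hash from (dx, dy) to the 8-directional code: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE
# (in image coordinates y grows downward, so "north" means decreasing y)
DIRECTIONS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
              (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

img = cv2.imread("digit.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour = max(contours, key=cv2.contourArea).squeeze()     # (N, 2) array of (x, y) points

chain = []
for p, q in zip(contour, np.roll(contour, -1, axis=0)):    # successive boundary points
    dx, dy = int(np.sign(q[0] - p[0])), int(np.sign(q[1] - p[1]))
    if (dx, dy) in DIRECTIONS:
        chain.append(DIRECTIONS[(dx, dy)])

# First difference: counter-clockwise steps between successive codes, taken circularly (mod 8)
first_diff = [(chain[i] - chain[i - 1]) % 8 for i in range(len(chain))]

# Shape number: the circular rotation of the first difference with the smallest magnitude
rotations = [first_diff[i:] + first_diff[:i] for i in range(len(first_diff))]
shape_number = min(rotations)

print("chain code:", chain[:20], "...")
print("shape number (order %d):" % len(shape_number), shape_number[:20], "...")
```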
LAB TASKS:
Take any digit of your roll number as input and display the chain code of its contours. Find the first difference and shape number by downsampling and forming a chain code of order less than 10.
Post Lab Questions:
1. What is the 8-directional shape number for the given shape?
Conclusion:
Additional links:
https://www.geeksforgeeks.org/chain-code-for-2d-line/
https://en.wikipedia.org/wiki/Chain_code
T.Y. ECE- AIML Academic Year 2023-24
Semester: VI Subject: Computer Vision
Name ------------------------------ Division ---------
Roll No ---------------------------- Batch ----------
Experiment No: 5
Objectives:
1. To draw the contours for the various shapes present in the image.
2. To write the name of each shape at its centre.
Theory:
Image registration is an image processing technique used to align multiple scenes
into a single integrated image. It helps overcome issues such as image rotation,
scale, and skew that are common when overlaying images.
In other words, it is the process of transforming different sets of data into a single unified coordinate system, and it can be thought of as aligning images so that comparable characteristics can be related easily. It involves mapping points from one image to corresponding points in another image.
Image alignment and registration have a number of practical, real-world use
cases, including:
Medical: MRI scans, SPECT scans, and other medical scans produce multiple
images. To help doctors and physicians better interpret these scans, image
registration can be used to align multiple images together and overlay them on
top of each other. From there the doctor can read the results and provide a more
accurate diagnosis.
Military: Automatic Target Recognition (ATR) algorithms accept multiple input
images of the target, align them, and refine their internal parameters to improve
target recognition.
Optical Character Recognition (OCR): Image alignment (often called document
alignment in the context of OCR) can be used to build automatic form, invoice,
or receipt scanners. We first align the input image to a template of the document
we want to scan. From there OCR algorithms can read the text from each
individual field.
Scale-Invariant Feature Transform (SIFT)
SIFT is an algorithm in computer vision to detect and describe local features in images. It is widely used in image processing. The processes of SIFT include Difference of Gaussians (DoG) Space Generation, Keypoint Detection, and Feature Description.
Four steps of Scale-Invariant Feature Transform (SIFT)
Scale-space extrema selection: This is the first step of the SIFT algorithm. The potential interest points are located using the difference-of-Gaussian.
Keypoint localization: A model is fit to determine the location and scale at each potential location. Keypoints are selected based on their stability.
Orientation assignment: Orientations are assigned to keypoint locations based on local image gradient directions.
Keypoint descriptor: This is the final step of the SIFT algorithm. A coordinate system around the feature point is created that remains the same for different views of the feature.
How does image registration work?
Alignment can be looked at as a simple coordinate transform.
PROCEDURE
1. Import module
2. Import images as a reference image & aligned images.
3. Apply the registration effects on it.
The algorithm works as follows:
1. Convert both images to grayscale.
2. Match features from the image to be aligned, to the reference image and
store the coordinates of the corresponding key points.
3. Keypoints are simply the selected few points that are used to compute the
transform (generally points that stand out), and descriptors are histograms
of the image gradients to characterize the appearance of a keypoint.
4. Use the ORB (Oriented FAST and Rotated BRIEF) or SIFT (Scale-Invariant Feature Transform) implementation in the OpenCV library, which provides us with both the key points as well as their associated descriptors.
5. Match the key points between the two images.
6. Pick the top matches, and remove the noisy matches.
7. Find the homography transform.
8. Apply this transform to the original unaligned image to get the output image (a minimal sketch follows this list).
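A minimal sketch of this pipeline is shown below; the file names are placeholders, and cv2.SIFT_create assumes OpenCV 4.4 or newer (cv2.ORB_create can be substituted if SIFT is unavailable in your build).

```python
import cv2
import numpy as np

ref = cv2.imread("reference.jpg")                       # file names are placeholders
moving = cv2.imread("unaligned.jpg")
ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)        # 1. convert both to grayscale
mov_gray = cv2.cvtColor(moving, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()                                # 4. keypoints + descriptors
kp1, des1 = sift.detectAndCompute(mov_gray, None)
kp2, des2 = sift.detectAndCompute(ref_gray, None)

matcher = cv2.BFMatcher()                               # 5. match the keypoints
matches = matcher.knnMatch(des1, des2, k=2)

# 6. keep the good matches using Lowe's ratio test
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)   # 7. homography transform

h, w = ref_gray.shape
aligned = cv2.warpPerspective(moving, H, (w, h))        # 8. warp the unaligned image

cv2.imshow("Aligned", aligned)
cv2.waitKey(0)
cv2.destroyAllWindows()
```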
LAB TASKS:
Acquire a reference image and another image of the same scene, and try to align them using keypoints from the SIFT algorithm.
Post Lab Questions:
1. Explain why the SIFT algorithm is invariant to scale and rotation.
2. Explain any other algorithm for keypoint descriptors.
3. What are the applications of image registration/ alignment? Explain any one in detail.
4. Explain any matching algorithm.
5. What is homography transform?
Conclusion:
Additional links:
https://www.geeksforgeeks.org/image-registration-using-opencv-python/
T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision
Experiment No: 6
Name of the Experiment: Face recognition using Viola-Jones algorithm or similar application
Performed on: -----------------------------------------------
Submitted on: -----------------------------------------------
Theory:
Object detection is one of the computer technologies that is connected to image processing and
computer vision. It is concerned with detecting instances of an object such as human faces,
buildings, trees, cars, etc. The primary aim of face detection algorithms is to determine whether
there is any face in an image or not.
In recent years, we have seen significant advancement of technologies that can detect and
recognise faces. Our mobile cameras are often equipped with such technology where we can
see a box around the faces. Although there are quite advanced face detection algorithms,
especially with the introduction of deep learning, the introduction of the Viola-Jones algorithm in 2001 was a breakthrough in this field. Now let us explore the Viola-Jones algorithm in detail.
The Viola-Jones algorithm is named after the two computer vision researchers who proposed it in 2001, Paul Viola and Michael Jones. Despite its age, Viola-Jones is quite powerful, and its application has proven to be exceptionally notable in real-time face detection. This algorithm is painfully slow to train but can detect faces in real time with impressive speed.
Given an image (this algorithm works on grayscale images), the algorithm looks at many smaller subregions and tries to find a face by looking for specific features in each subregion. It needs to check many different positions and scales because an image can contain many faces of various sizes. Viola and Jones used Haar-like features to detect faces in this algorithm.
The Viola-Jones algorithm has four main steps, which we shall discuss in the sections to follow: selecting Haar-like features, creating an integral image, running AdaBoost training, and cascading the classifiers.
In the early 20th century, the Hungarian mathematician Alfred Haar gave the concept of Haar wavelets, a sequence of rescaled square-shaped functions which together form a wavelet family or basis. Viola and Jones adapted the idea of using Haar wavelets and developed the so-called Haar-like features.
Haar-like features are digital image features used in object recognition. All human faces share some universal properties, for example the eye region is darker than its neighbouring pixels, and the nose region is brighter than the eye region.
A simple way to find out which region is lighter or darker is to sum up the pixel values of both regions and compare them. The sum of pixel values in the darker region will be smaller than the sum of pixels in the lighter region. If one side is lighter than the other, it may be an edge of an eyebrow; or sometimes the middle portion may be shinier than the surrounding boxes, which can be interpreted as a nose. This can be accomplished using Haar-like features, and with their help we can interpret the different parts of a face.
There are 3 types of Haar-like features that Viola and Jones identified in their research:
Edge features
Line features
Four-sided features
Edge features and Line features are useful for detecting edges and lines respectively. The four-
sided features are used for finding diagonal features.
The value of the feature is calculated as a single number: the sum of pixel values in the black area minus the sum of pixel values in the white area. The value is zero for a plain surface in which all the pixels have the same value, and thus it provides no useful information.
Since our faces are of complex shapes with darker and brighter spots, a Haar-like feature gives
you a large number when the areas in the black and white rectangles are very different. Using
this value, we get a piece of valid information out of the image.
To be useful, a Haar-like feature needs to give you a large number, meaning that the areas in
the black and white rectangles are very different. There are known features that perform very
well to detect human faces:
For example, when we apply this specific haar-like feature to the bridge of the nose, we get a
good response. Similarly, we combine many of these features to understand if an image region
contains a human face.
In the Viola-Jones algorithm, each Haar-like feature represents a weak learner. To decide the
type and size of a feature that goes into the final classifier, AdaBoost checks the performance
of all classifiers that you supply to it.
We set up a cascaded system in which we divide the process of identifying a face into multiple
stages. In the first stage, we have a classifier which is made up of our best features, in other
words, in the first stage, the subregion passes through the best features such as the feature which
identifies the nose bridge or the one that identifies the eyes. In the next stages, we have all the
remaining features.
When an image subregion enters the cascade, it is evaluated by the first stage. If that stage evaluates the subregion as positive, meaning that it thinks it contains a face, the output of the stage is 'maybe'.
When a subregion gets a 'maybe', it is sent to the next stage of the cascade and the process continues as such till we reach the last stage.
If all classifiers approve the image, it is finally classified as a human face and is presented to
the user as a detection.
PROCEDURE:
1. Read image
2. Detect face using cv2.CascadeClassifier
3. Display a rectangle around the detected area (a minimal detection sketch follows this list).
4. Design CNN with number of layers, strides, max pooling, activation function, etc.
5. Load training and testing dataset.
6. Recognize the face and display result.
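A minimal sketch of steps 1-3 (detection only) is shown below; the file name face.jpg is a placeholder, and the frontal-face cascade file is the one bundled with the opencv-python package.

```python
import cv2

img = cv2.imread("face.jpg")                                   # 1. read image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)                 # 2. Viola-Jones cascade

faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                                     # 3. rectangle around detections
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("Detected faces", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```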
LAB TASKS:
1. Detect your face using Viola-Jones algorithm.
2. Design and implement CNN to recognize your face.
Post Lab Questions:
1. What is the Viola-Jones algorithm?
2. Explain how you can recognize a face with a CNN.
3. What are different performance metrics to check the accuracy?
4. Explain design of the CNN you used in your program in terms of hyperparameters.
Conclusion:
Additional links:
T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision
Experiment No: 7
Theory:
Given two or more images of the same 3D scene, taken from different points of view, the
correspondence problem refers to the task of finding a set of points in one image which can be
identified as the same points in another image. To do this, points or features in one image are
matched with the points or features in another image, thus establishing corresponding points
or corresponding features, also known as homologous points or homologous features. The
images can be taken from a different point of view, at different times, or with objects in the
scene in general motion relative to the camera(s). There are two basic ways to find the correspondences between two images:
Correlation-based: checking if one location in one image looks/seems like another location in another image.
Feature-based: finding features in the image and seeing if the layout of a subset of features is similar in the two images.
Epipolar geometry, as shown in figure 1, is the geometry of stereo vision. When two cameras
view a 3D scene from two distinct positions, there are a number of geometric relations between
the 3D points and their projections onto the 2D images that lead to constraints between the
image points. These relations are derived based on the assumption that the cameras can be
approximated by the pinhole camera model.
Fig.1: Epipolar geometry
Epipole or epipolar point
Since the optical centers of the cameras' lenses are distinct, each center projects onto a distinct
point into the other camera's image plane. These two image points, denoted by eL and eR, are
called epipoles or epipolar points. Both epipoles eL and eR in their respective image planes
and both optical centers OL and OR lie on a single 3D line.
Epipolar line
The line OL X is seen by the left camera as a point because it is directly in line with that
camera's lens optical center. However, the right camera sees this line as a line in its image
plane. That line (eR xR) in the right camera is called an epipolar line. Symmetrically, the line
OR X is seen by the right camera as a point and is seen as the epipolar line eL xL by the left camera.
An epipolar line is a function of the position of point X in the 3D space, i.e. as X varies, a set
of epipolar lines is generated in both images. Since the 3D line OL X passes through the optical
center of the lens OL, the corresponding epipolar line in the right image must pass through the
epipole eR (and correspondingly for epipolar lines in the left image). All epipolar lines in one
image contain the epipolar point of that image. In fact, any line which contains the epipolar
point is an epipolar line since it can be derived from some 3D point X.
Epipolar constraint and triangulation
If the relative position of the two cameras is known, this leads to two important observations:
If the projection point xL is known, then the epipolar line eR xR is known, and the point X projects into the right image on a point xR which must lie on this particular epipolar line.
This means that for each point observed in one image the same point must be observed in the
other image on a known epipolar line. This provides an epipolar constraint: the projection of X
on the right camera plane xR must be contained in the eR xR epipolar line. All points X e.g.
X1, X2, X3 on the OL XL line will verify that constraint. It means that it is possible to test if
two points correspond to the same 3D point. Epipolar constraints can also be described by the
essential matrix or the fundamental matrix between the two cameras.
If the points xL and xR are known, their projection lines are also known. If the two image
points correspond to the same 3D point X the projection lines must intersect precisely at X.
This means that X can be calculated from the coordinates of the two image points, a process
called triangulation as shown in figure 2. Depth of the scene can be estimated from disparity.
PROCEDURE:
1. Acquire 2 images of the same scene from 2 different cameras, i.e. a left camera and a right camera.
2. Make sure their shape, size, dtype, etc. are the same.
3. Convert the 2 images to grayscale.
4. Set the parameters of the function cv2.StereoBM_create() in OpenCV.
5. Find the disparity map and display it with normalization.
6. Find the depth map and display it with normalization if required (a minimal sketch follows this list).
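A minimal sketch of this procedure is given below; the file names, block-matching parameters, and the focal length/baseline values are placeholders that must be tuned or taken from your own camera calibration.

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # 1-3. rectified stereo pair in grayscale
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
assert left.shape == right.shape and left.dtype == right.dtype   # 2. same shape/size/dtype

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)    # 4. block-matching parameters
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # output is fixed-point, scale by 1/16

# 5. display the disparity map after normalizing to 0-255
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imshow("Disparity", disp_vis)

# 6. depth from disparity: Z = f * B / d (f = focal length in pixels, B = baseline in metres);
# f and B below are hypothetical values and must come from your calibration
f, B = 700.0, 0.06
depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = f * B / disparity[valid]
cv2.imshow("Depth", cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8))
cv2.waitKey(0)
cv2.destroyAllWindows()
```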
LAB TASKS:
1. Acquire 2 stereo images of the same scene
2. Find disparity map and depth map.
Post Lab Questions:
1. What is stereo correspondence?
2. Explain epipolar geometry with a neat diagram and define the terms: epipoles, epipolar lines, baseline.
3. How are disparity and depth related? Explain with the concept of triangulation.
4. Explain construction and working of Lidar and Kinect.
Conclusion:
Additional links:
https://en.wikipedia.org/wiki/Kinect
https://en.wikipedia.org/wiki/Fundamental_matrix_(computer_vision)
https://www.pyroistech.com/lidar/
https://web.eecs.umich.edu/~jjcorso/t/598F14/files/lecture_1027_stereo.pdf
T. Y. ECE - AIML Year 2023-24
Semester: VI Subject: Computer Vision
Experiment No: 8
Theory:
3D computer vision extracts, processes, and analyzes 2D visual data to generate 3D models. To do so, it employs different algorithms and data acquisition techniques that enable
computer vision models to reconstruct the dimensions, contours and spatial relationships of
objects within a given visual setting. The 3D CV techniques combine principles from multiple
disciplines, such as computer vision, photogrammetry, geometry and machine learning with
the objective of deriving valuable three-dimensional information from images, videos or sensor
data.
Spatial dimensions refer to the three orthogonal axes (X, Y, and Z) that make up the 3D coordinate system, as shown in figure 1. These dimensions capture the height, width, and depth values of
objects. Spatial coordinates facilitate the representation, examination, and manipulation of 3D
data like point clouds, meshes, or voxel grids essential for applications such as robotics,
augmented reality, and 3D reconstruction.
Fig. 1: Spatial dimensions
Passive Techniques:
Shape from Shading - This technique infers an object's three-dimensional shape using just a single 2D image. It analyzes how light hits the object (shading patterns) and how bright different areas appear (intensity variations). By understanding how light interacts with the surface, it recovers the surface orientation, and hence the shape, at each point.
Shape from Texture- Shape from texture is a method used in computer vision to determine the
three-dimensional shape of an object based on the distortions found in its surface texture. This
technique relies on the assumption that the surface possesses a textured pattern with known
characteristics.
Depth from Defocus - Depth from defocus (DfD) is a process that calculates the depth or three-dimensional structure of a scene by examining the degree of blur or defocus present in areas of an image. It works on the principle that objects situated at different distances from the camera lens will exhibit varying levels of defocus blur. By comparing these blur levels throughout the image, DfD can generate depth maps or three-dimensional models representing the scene.
Active Techniques:
Structured Light - This active technique projects a known pattern of light onto the scene and analyzes how the pattern deforms on the object's surface to compute the depth information of different points on the object.
Time-of-Flight (ToF) Sensors- Time-of-flight (ToF) sensor is another active vision technique
that measures the time it takes for a light signal to travel from the sensor to an object and back.
Common light sources for ToF sensors are lasers or infrared (IR) LEDs. The sensor emits a
light pulse and then calculates the distance based on the time-of-flight of the reflected light
beam. By capturing this time for each pixel in the sensor array, a 3D depth map of the scene is
generated. Unlike regular cameras that capture color or brightness, ToF sensors provide depth
information for every point which essentially helps in building a 3D image of the surroundings.
PROCEDURE:
1. Install the Kinect camera.
2. Capture the images of the scene.
3. Observe the 3D model of the scene (a minimal back-projection sketch follows this list).
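As a small illustration of how depth values map to spatial (X, Y, Z) coordinates, here is a minimal NumPy-only sketch that back-projects a depth image through the pinhole camera model; the file name depth.png and the intrinsics fx, fy, cx, cy are hypothetical values that must come from your sensor's calibration.

```python
import cv2
import numpy as np

depth = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED).astype(np.float32)  # depth image (placeholder)
fx, fy = 525.0, 525.0                              # focal lengths in pixels (hypothetical)
cx, cy = depth.shape[1] / 2, depth.shape[0] / 2    # principal point (image centre assumed)

# Pixel coordinate grid
u, v = np.meshgrid(np.arange(depth.shape[1]), np.arange(depth.shape[0]))

# Back-projection through the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth
Z = depth
X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy

points = np.dstack((X, Y, Z)).reshape(-1, 3)
points = points[points[:, 2] > 0]                  # keep valid depth readings only
print("point cloud shape:", points.shape)          # N x 3 array of (X, Y, Z) coordinates
```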
Conclusion:
Additional links:
https://viso.ai/computer-vision/3d-computer-vision/