COMPUTER VISION
UNIT-4
Dr.S.PRABU
Assistant Professor
CTech, SRMIST
3D VISION
• 3D vision aims to infer the spatial geometry and structure of a scene from
one or more images.
• Different projection schemes model how 3D points are projected onto 2D image planes, such as perspective, orthographic, or weak-perspective projections.
• Several computational techniques are employed to estimate depth and reconstruct 3D shape, such as shape from shading, photometric stereo, shape from texture, shape from focus, and stereo triangulation.
3D VISION
MOTION ANALYSIS
WHY 3D VISION?
• In the real world, all objects around us exist in three dimensions (3D): they have height, width, and depth.
STEREO VISION
• In stereo vision, two cameras capture the same scene from slightly different angles.
• By comparing the two images, the computer finds how much each object “shifts” between them (called disparity); nearby objects shift more than distant ones, which is how depth is recovered (see the sketch below).
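A minimal sketch of how disparity turns into depth for an ideal rectified stereo pair; the focal length, baseline, and disparity values below are invented for illustration:

```python
import numpy as np

f = 700.0   # assumed focal length, in pixels
B = 0.12    # assumed baseline (distance between the two cameras), in metres

# Disparity: how far (in pixels) each matched point shifts between
# the left and the right image.
disparity = np.array([35.0, 14.0, 7.0])

# For an ideal rectified pair, depth is inversely proportional to disparity: Z = f * B / d
depth = f * B / disparity
print(depth)   # larger shift -> closer object
```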
▪ The way light and shadows fall on an object gives hints about its
surface shape.
✓Bright areas face the light source.
✓Darker areas face away or are in shadow.
PHOTOMETRIC STEREO
▪ Similar to “shape from shading,” but uses several images of the same
object.
▪ Each image is taken under different lighting directions while the
camera position stays fixed.
▪ The brightness differences between these images help calculate the
surface normals and detailed texture.
PROJECTION SCHEMES
IMPORTANCE OF PROJECTION
ORTHOGRAPHIC PROJECTION
PERSPECTIVE PROJECTION
▪ In perspective projection:
✓ Real-world parallel lines appear to meet at a vanishing point in the
image.
✓ All vanishing points lying on the ground form the horizon line.
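A minimal sketch of the pinhole perspective model behind this behaviour, with an assumed focal length f; projecting two parallel 3D lines shows their images approaching a common vanishing point as depth grows:

```python
import numpy as np

f = 1.0  # assumed focal length of the pinhole model

def project(point_3d):
    """Perspective projection: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    X, Y, Z = point_3d
    return np.array([f * X / Z, f * Y / Z])

# Two parallel 3D lines (like rails) receding along the Z axis.
left_rail  = [(-1.0, -1.0, z) for z in (2.0, 5.0, 20.0, 100.0)]
right_rail = [( 1.0, -1.0, z) for z in (2.0, 5.0, 20.0, 100.0)]

# Their projections approach the same image point (the vanishing point)
# as Z grows, even though the 3D lines themselves never meet.
print([project(p) for p in left_rail])
print([project(p) for p in right_rail])
```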
PERSPECTIVE PROJECTION
• Projection through Lenses
• Cameras use convex lenses to project
3D scenes onto an image plane.
• (A) The real projection forms an
inverted image behind the lens at the
focal plane (F).
• (B) For geometric convenience,
projection is often represented as a
non-inverted image formed at a
virtual focal plane in front of the lens.
PROJECTIONS
16.1 (A): Orthographic projection of a rectangular box — parallel lines remain parallel.
16.1 (B): Perspective projection of a box — lines converge toward vanishing points, giving depth realism.
SHAPE FROM SHADING
• When light falls on an object, some parts appear bright and others dark.
• These variations in brightness or intensity are called shading.
• The shape from shading technique uses this brightness information from a single image to estimate the 3D shape and surface orientation of an object.
Applications
• Face and object modeling
• Surface inspection in industry
• Medical imaging (surface curvature estimation)
• Digital art and rendering
PHOTOMETRIC STEREO
• Let’s assume we have:
✓A fixed camera (same position and view for all images).
✓A Lambertian surface (light reflects evenly in all directions).
✓Three or more light sources, each shining from a different direction.
• Each light source gives one image.
• The brightness of a point changes depending on how the surface faces each light.
• For a Lambertian surface, the observed brightness is
I = ρ (L · N)
where I is the image intensity, ρ the surface albedo, L the unit light direction, and N the unit surface normal (a small worked example follows).
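A minimal sketch of the classic photometric-stereo computation for a single pixel, assuming three known, non-coplanar unit light directions and the Lambertian relation above; all numeric values are invented:

```python
import numpy as np

# Assumed unit light directions, one row per image (must not be coplanar).
L = np.array([
    [0.0, 0.0, 1.0],
    [0.7, 0.0, 0.714],
    [0.0, 0.7, 0.714],
])

# Intensities observed at the same pixel in the three images (made-up values).
I = np.array([0.80, 0.95, 0.60])

# Lambertian model: I = rho * (L . N)  =>  L @ g = I  with  g = rho * N.
g, *_ = np.linalg.lstsq(L, I, rcond=None)

rho = np.linalg.norm(g)   # albedo
N = g / rho               # unit surface normal
print(rho, N)
```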
PHOTOMETRIC STEREO
Applications
• Digital 3D modeling of sculptures, coins, and artifacts.
• Surface inspection in manufacturing (detecting dents or bumps).
• Medical imaging (skin texture analysis).
• Face reconstruction for biometric recognition.
• Archaeological documentation (restoring worn carvings).
SHAPE FROM TEXTURE
• The shape from texture method uses changes in texture patterns in an image to estimate the 3D shape, orientation, and depth of a surface.
Example
• Imagine looking at a brick wall:
• When you face it directly, all bricks look the same size.
• When you look at it at an angle, the bricks near you look big, and the bricks
far away look smaller and packed closely.
• This difference in texture density tells your brain that the wall is slanted.
Working Steps
1. Capture a single image with visible texture patterns.
2. Detect texture features (like dots, lines, or edges).
3. Measure how texture size and spacing change across the image.
4. Estimate surface tilt and depth variation using these texture changes.
5. Reconstruct the 3D shape of the surface (a small sketch of steps 3-4 follows).
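A small sketch of steps 3-4 under simplifying assumptions: the spacing of texture elements has already been measured at several image rows (the numbers are invented), and the fitted texture gradient indicates how strongly the surface is slanted:

```python
import numpy as np

# Assumed measurements: average spacing (in pixels) between texture elements
# at several image rows, for a surface receding from the camera (made-up values).
rows    = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
spacing = np.array([40.0, 31.0, 24.0, 19.0, 15.0])

# Texture gradient: how fast the element spacing shrinks across the image.
gradient = np.polyfit(rows, spacing, 1)[0]

# A gradient near zero suggests a fronto-parallel surface (no foreshortening);
# a strongly negative gradient suggests a surface slanted away from the viewer.
print("texture gradient (pixels of spacing per image row):", gradient)
```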
Applications
• Terrain mapping (from aerial or satellite images).
• Road slope detection in autonomous driving.
• Object modeling and surface reconstruction.
• Robot navigation (detecting ground surface).
• Augmented reality (estimating floor or wall orientation).
SHAPE FROM FOCUS
• In an image, objects at the focused distance appear sharp while nearer and farther objects appear blurred; this happens because the camera lens can focus sharply only at a certain distance.
• Shape from Focus (SFF) is a 3D shape recovery technique that uses this principle of sharpness or focus to estimate the depth (distance) of objects in a scene (a small sketch follows).
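A minimal sketch of the usual SFF recipe, assuming a focal stack of images taken at known focus settings (random arrays stand in for real images here): score sharpness per pixel with a focus measure such as the squared Laplacian, then pick, for each pixel, the focus setting where it was sharpest:

```python
import numpy as np
from scipy.ndimage import laplace

# Stand-in focal stack: 5 frames of the same scene at different focus settings.
rng = np.random.default_rng(0)
stack = rng.random((5, 64, 64))
focus_positions = np.array([10.0, 12.0, 14.0, 16.0, 18.0])  # assumed lens settings

# Focus measure per frame and pixel: squared Laplacian response
# (sharply focused regions have strong local intensity changes).
focus_measure = np.stack([laplace(img) ** 2 for img in stack])

# For each pixel, the frame with the highest focus measure tells us the focus
# setting at which that pixel was sharpest -> a coarse depth map.
best = np.argmax(focus_measure, axis=0)
depth_map = focus_positions[best]
print(depth_map.shape)
```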
Applications
• 3D shape measurement of small objects (microscopy, manufacturing).
• Quality control in industries.
• Medical imaging.
• Digital surface inspection.
Applications
• 3D scanning and mapping (LiDAR)
• Robotics and autonomous vehicles
• Industrial inspection and quality control
• Gesture and motion recognition (e.g., Kinect)
• Medical imaging and surgery assistance
SURFACE REPRESENTATIONS
POINT-BASED REPRESENTATION
Definition:
• Point-based representation describes a 3D surface using a set of discrete
3D points in space, often called a point cloud.
• Each point has 3D coordinates: (x, y, z).
• Points may also include additional information such as color or intensity.
How it is obtained:
• 3D scanners, LiDAR sensors, stereo cameras, or photogrammetry techniques.
• Instead of storing the object as a continuous surface, we record thousands or millions of tiny points (x, y, z) on its surface.
• When all these points are plotted together → the shape of the object appears.
• This collection of points is called a Point Cloud.
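A minimal sketch of a point cloud as nothing more than an array of (x, y, z) coordinates, optionally with a colour per point; all values are invented:

```python
import numpy as np

# A tiny point cloud: one row per surface point, columns are (x, y, z) in metres.
points = np.array([
    [0.10, 0.02, 1.50],
    [0.11, 0.03, 1.49],
    [0.12, 0.02, 1.51],
])

# Optional per-point attributes, e.g. an RGB colour for each point.
colors = np.array([
    [200, 180, 170],
    [198, 181, 169],
    [201, 179, 172],
], dtype=np.uint8)

# The shape is only implied by where the points lie; nothing connects them.
print("points:", points.shape, "centroid:", points.mean(axis=0))
```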
Characteristics:
• No polygons or surfaces are explicitly connected.
• The shape is implied by the distribution of points in space.
Advantages:
• Easy to capture from real-world objects.
• Simple and efficient to store.
• Useful for visualization and surface reconstruction.
Applications
• 3D modeling and reconstruction
• Robotics and autonomous navigation
• Cultural heritage and archaeology scanning
• Virtual reality (VR) / augmented reality (AR)
• Industrial inspection and reverse engineering
VOLUMETRIC REPRESENTATION
Definition:
• Volumetric representation describes a 3D object by dividing space into
small volume elements (voxels) rather than just representing its surface.
• A voxel (volume element) is similar to a pixel but in 3D.
• Each voxel may contain information like occupancy, density, or color.
Characteristics:
• Represents both interior and exterior of the object.
• Commonly used in medical imaging and 3D simulations.
• The 3D space around an object is divided into tiny cubes called voxels (short
for volume elements).
• Each voxel holds information such as:
✓Whether the point is part of the object or empty space
✓Or it may store distance, density, or color
Example:
• Imagine a 3D box made of tiny cubes.
• If a cube is part of the solid object, we mark it as 1 (occupied).
• If it is empty space, we mark it as 0 (empty).
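A minimal sketch of the occupancy idea in the example above, using a made-up 8×8×8 grid with a small solid block marked as occupied:

```python
import numpy as np

# An 8 x 8 x 8 block of space: 1 = voxel belongs to the object, 0 = empty.
grid = np.zeros((8, 8, 8), dtype=np.uint8)

# Mark a small 4 x 3 x 2 region as part of the solid object.
grid[2:6, 3:6, 1:3] = 1

occupied = int(grid.sum())
volume = occupied * (0.01 ** 3)   # assumed voxel size: 1 cm per side
print(occupied, "occupied voxels, approximate volume:", volume, "m^3")
```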
Applications
• Medical imaging (CT, MRI → represent human organs)
• 3D reconstruction in computer vision
• 3D modeling and graphics rendering
• Robotics (environment mapping)
• Additive manufacturing (3D printing)
3D OBJECT RECOGNITION
Steps in 3D Object Recognition
This diagram shows the basic steps of 3D Object Recognition — how a computer
identifies and names a 3D object.
INTRODUCTION TO MOTION
Basic Types of Motion
(a) Object Motion
• When the object moves in a scene and the camera is fixed.
• Example: A ball rolling on the floor recorded by a stationary camera.
(b) Camera Motion
• When the camera itself moves, even if the object is stationary.
• Example: A moving drone capturing a still building.
(c) Combined Motion
• Both the camera and object move at the same time.
• Example: Filming a moving car from another moving vehicle.
TRIANGULATION
Triangulation is a method to find the 3D position of a point using two or more 2D
images taken from different viewpoints.
The basic idea:
If we know where the cameras are and where the point appears in each image, we
can calculate the 3D coordinates of that point by geometry.
Example:
• When both your eyes look at an object, your brain uses the small difference
between the two views to estimate how far the object is — that’s triangulation!
Let:
C1, C2 → positions (optical centres) of the two cameras
P1, P2 → projections of the same scene point in the two images
Then the lines C1P1 and C2P2 are viewing rays, and their intersection gives the 3D point P (see the sketch below).
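A minimal sketch of one common way to intersect the two viewing rays (the midpoint method), assuming the camera centres and ray directions are already known; all numbers are invented:

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point midway between the closest points of two viewing rays."""
    w0 = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b               # ~0 only if the two rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = c1 + s * d1, c2 + t * d2   # closest point on each ray
    return 0.5 * (p1 + p2)

# Invented example: two cameras 0.2 m apart, both looking at a point near (0, 0, 2).
C1, C2 = np.array([-0.1, 0.0, 0.0]), np.array([0.1, 0.0, 0.0])
P_true = np.array([0.0, 0.0, 2.0])
print(triangulate_midpoint(C1, P_true - C1, C2, P_true - C2))   # ~ [0, 0, 2]
```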
TRIANGULATION
Applications
• 3D reconstruction – build 3D models from images
• Robotics – estimate distance to obstacles
• Autonomous vehicles – depth sensing for road objects
• Augmented Reality (AR) – align virtual objects correctly in 3D
• Stereo vision systems – depth perception like human eyes
BUNDLE ADJUSTMENT
In simple words:
• Bundle Adjustment improves the accuracy of 3D reconstruction by minimizing
errors between the observed image points and the projected 3D model
points.
Working Principle
• Bundle Adjustment works by minimizing the reprojection error.
• Reprojection error = the difference between where a 3D point actually appears in the image and where the model predicts it should appear.
The goal:
• Minimize ∑ (observed position - projected position)²
• This optimization is usually done using non-linear least squares methods (like
the Levenberg–Marquardt algorithm).
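A minimal sketch of that objective on a toy problem: two assumed camera positions observe three points, and scipy's Levenberg-Marquardt least-squares refines the 3D points so that the summed squared reprojection error is minimised (a real bundle adjustment also refines the camera parameters; all values here are invented):

```python
import numpy as np
from scipy.optimize import least_squares

f = 800.0   # assumed focal length in pixels, principal point at the image origin

# Two assumed camera centres (both looking down +Z); real BA would refine these too.
cam_centers = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])

def project(points_3d, cam):
    """Pinhole projection of Nx3 points as seen from camera centre `cam`."""
    rel = points_3d - cam
    return f * rel[:, :2] / rel[:, 2:3]

# Invented ground-truth structure and slightly noisy observations in both views.
rng = np.random.default_rng(0)
true_points = np.array([[0.1, 0.2, 4.0], [-0.3, 0.1, 5.0], [0.2, -0.2, 6.0]])
observed = [project(true_points, c) + rng.normal(0.0, 0.3, (3, 2)) for c in cam_centers]

def residuals(x):
    """Reprojection residuals (observed - projected), stacked over both cameras."""
    pts = x.reshape(-1, 3)
    return np.concatenate([(obs - project(pts, c)).ravel()
                           for obs, c in zip(observed, cam_centers)])

init = true_points.ravel() + 0.05                      # perturbed starting guess
result = least_squares(residuals, init, method="lm")   # Levenberg-Marquardt
print("sum of squared reprojection errors:", np.sum(result.fun ** 2))
```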
Applications
• Structure from Motion (SfM) – refining 3D scene reconstruction
• SLAM (Simultaneous Localization and Mapping) – robot and drone
mapping
• Photogrammetry – accurate 3D measurement from photos
• 3D scanning and modeling – improving geometry precision
• Augmented Reality – ensuring virtual overlays match real-world geometry
TRANSLATIONAL ALIGNMENT
• Translational alignment means aligning two or more images or 3D
datasets by shifting (translating) them in x, y, or z directions so that
they overlap correctly.
• It corrects position differences between images or scans taken from
different viewpoints.
Example:
• If two images of the same object are slightly shifted, translational
alignment moves one image horizontally or vertically until both line up
perfectly.
What is Translation?
• In geometry, translation means moving an object without rotating or resizing it.
• In image alignment, it involves shifting the entire image by a fixed distance.
Translation vector:
T=(t_x,t_y,t_z)
where
t_x→ shift in x-direction
t_y→ shift in y-direction
t_z→ shift in z-direction (for 3D data)
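A minimal sketch of estimating such a 2D translation automatically by phase correlation; the images are random stand-ins with a known shift, and only numpy is used:

```python
import numpy as np

def estimate_translation(img_a, img_b):
    """Estimate the (dy, dx) shift such that img_b ~ np.roll(img_a, (dy, dx))."""
    Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    cross = np.conj(Fa) * Fb
    cross /= np.abs(cross) + 1e-12            # keep only the phase difference
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = img_a.shape
    if dy > h // 2:                            # wrap large shifts to negative values
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(1)
a = rng.random((64, 64))
b = np.roll(a, shift=(5, -3), axis=(0, 1))     # b is a copy of a shifted by (5, -3)
print(estimate_translation(a, b))              # expected: (5, -3)
```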
How It Works
Step 1 – Input
• Two or more images (or 3D point sets) that represent the same scene but are
shifted relative to each other.
Applications
• Image registration – align satellite, medical, or microscope images
• Panorama stitching – combine overlapping photos
• Motion analysis – detect how far an object moved
• 3D reconstruction – align depth maps or point clouds
• Robotics and SLAM – align sensor data in navigation
PARAMETRIC MOTION
• In computer vision, motion means how objects move between image frames.
• We often need to describe and analyze this movement.
• For example, how a car moves across a video.
• Parametric motion means representing motion using a mathematical model
(equation) with a few parameters (like translation, rotation, scaling, etc.).
• “Parametric” means the motion is described by parameters.
• Instead of describing how each pixel moves separately, we describe the
overall motion of a region or object using a small set of parameters.
Mathematical Model
• The general form of a 2D motion model is:
[x’, y’]ᵀ = A · [x, y]ᵀ + [tₓ, tᵧ]ᵀ
where:
(x, y) → coordinates in the first frame
(x’, y’) → coordinates in the next frame
A → transformation matrix (rotation, scaling, shear)
tₓ, tᵧ → translation (shifts in x and y direction)
• This matrix A contains the parameters that define how the image
moves.
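A minimal sketch of applying such a parametric (affine) model: the matrix A and the translation below are invented parameters describing a small rotation, a slight scale change, and a shift, and every pixel coordinate in the region moves according to the same few numbers:

```python
import numpy as np

# Invented affine motion parameters: 3 degree rotation, 5% scale-up, and a shift.
theta = np.deg2rad(3.0)
A = 1.05 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
t = np.array([4.0, -2.0])                      # tx, ty in pixels

# Pixel coordinates (x, y) in the first frame.
pts = np.array([[10.0, 10.0], [50.0, 20.0], [30.0, 40.0]])

# Parametric motion: the whole region moves according to the same parameters.
moved = pts @ A.T + t                          # [x', y'] = A [x, y]^T + [tx, ty]
print(moved)
```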
SPLINE-BASED MOTION
• Spline-based motion represents the motion field with a small grid of control points whose displacements are interpolated smoothly (e.g., with splines) across the image, instead of estimating a separate motion vector at every pixel.
OPTICAL FLOW
• Optical flow is the apparent motion of brightness patterns between consecutive frames, described by a per-pixel motion vector (u, v).
• Mathematically:
I(x,y,t)=I(x+u,y+v,t+1)
where
I(x,y,t): intensity of pixel at position (x, y) at time t
(u,v): motion of that pixel between frames
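A minimal sketch of the Lucas-Kanade way of using this relation: linearising the brightness-constancy equation gives one constraint Ix·u + Iy·v = -It per pixel, and solving them jointly over a small window yields (u, v). The frames below are synthetic, with a known shift of one pixel to the right:

```python
import numpy as np

# Two small synthetic frames: the pattern in frame2 is frame1 moved 1 pixel right.
y, x = np.mgrid[0:32, 0:32].astype(float)
frame1 = np.sin(x / 3.0) + np.cos(y / 4.0)
frame2 = np.sin((x - 1.0) / 3.0) + np.cos(y / 4.0)

# Spatial and temporal derivatives (np.gradient returns d/dy first, then d/dx).
Iy, Ix = np.gradient(frame1)
It = frame2 - frame1

# Lucas-Kanade over one window: each pixel contributes Ix*u + Iy*v = -It,
# and the overdetermined system is solved in the least-squares sense.
win = (slice(8, 24), slice(8, 24))
A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
b = -It[win].ravel()
(u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
print("estimated flow:", u, v)   # u should come out close to +1, v close to 0
```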
LAYERED MOTION
• In layered motion analysis, the scene is decomposed into several layers that each move coherently.
Each layer:
• Has its own motion model (e.g., parametric motion: translation, affine, etc.)
• Represents one part of the image (like foreground, background, or a
moving object)
• This makes motion analysis simpler and more accurate.
Applications
• Video object segmentation
• Background subtraction
• Motion-based video compression
• 3D scene understanding
• Visual tracking and surveillance