
Computer Vision Unit 4

Unit 4 of the Computer Vision course focuses on vision and motion, detailing methods for 3D vision including projection schemes, shape from shading, photometric stereo, and motion analysis techniques. It explains how these methods help in estimating depth, recognizing shapes, and reconstructing 3D models, which are essential for applications like robotics and medical imaging. The unit also covers surface representations, emphasizing point-based and volumetric representations.

COMPUTER VISION – UNIT 4

COMPUTER VISION
UNIT-4

VISION & MOTION

Dr.S.PRABU
Assistant Professor
CTech, SRMIST
COMPUTER VISION – UNIT 4

VISION AND MOTION

Methods For 3D Vision – Projection Schemes – Shape From Shading –


Photometric Stereo – Shape From Texture – Shape From Focus – Active
Range Finding – Surface Representations – Point-based Representation –
Volumetric Representations – 3D Object Recognition – 3D Reconstruction
– Introduction To Motion – Triangulation – Bundle Adjustment –
Translational Alignment – Parametric Motion – Spline-based Motion –
Optical Flow – Layered Motion
COMPUTER VISION – UNIT 4

3D VISION

• 3D vision aims to infer the spatial geometry and structure of a scene from
one or more images.
• Different projection schemes model how 3D points are projected onto 2D
image planes, such as
• perspective, orthographic, or
• weak-perspective projections.
• Several computational techniques are employed to estimate depth and
reconstruct 3D shape:
COMPUTER VISION – UNIT 4

3D VISION

These techniques are:


▪ Shape from Shading
▪ Photometric Stereo
▪ Shape from Texture
▪ Shape from Focus/Defocus
▪ Active Range Finding
COMPUTER VISION – UNIT 4

MOTION ANALYSIS

• Understanding motion helps in analyzing how objects and observers move


within a scene.
• Motion estimation involves determining correspondences between image
frames and computing changes over time.
• Key techniques include:
✓ Triangulation
✓ Bundle Adjustment
✓ Spline-based Motion
✓ Optical Flow
✓ Layered Motion Models
COMPUTER VISION – UNIT 4

METHODS FOR 3D VISION

• In the real world, all objects around us exist in three dimensions (3D) — they
have height, width, and depth.

• However, when a camera captures a photo, it creates a 2D image, which


loses the depth information.

• 3D vision in computer vision refers to techniques that help a computer


recover this depth and understand the shape and position of objects in
space.
COMPUTER VISION – UNIT 4

WHY 3D VISION?

• 3D vision allows machines to:


✓ Estimate how far or near an object is (depth).
✓ Recognize shapes and objects more accurately.
✓ Move safely in the environment (as in robots or self-driving cars).
✓ Reconstruct 3D models for applications such as medical imaging,
virtual reality, and industrial inspection.
COMPUTER VISION – UNIT 4

3D VISION – MAIN METHODS

• There are many ways to get 3D information from 2D images.


• They can be divided into:
• Passive methods: Use only the available light from the environment
(no special light or sensors).
• Active methods: Use additional light sources or sensors to directly
measure depth.
• Let’s look at the common methods one by one.
COMPUTER VISION – UNIT 4

STEREO VISION (BINOCULAR VISION)


• Uses two cameras, like human eyes, placed a small distance apart.

• Both cameras capture the same scene from slightly different angles.

• By comparing the two images, the computer finds how much each object
“shifts” between them (called disparity).

• Using this shift, the depth (Z) of each point is calculated.


✓Near objects have a large disparity.
✓Far objects have a small disparity.
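A minimal sketch of the standard rectified-stereo relation Z = f·B/d, assuming the focal length f (in pixels) and the baseline B (in meters) are known from calibration; the numeric values below are hypothetical.

```python
import numpy as np

f = 700.0      # focal length in pixels (assumed calibration value)
B = 0.12       # baseline between the two cameras in meters (assumed)

disparity = np.array([50.0, 10.0, 2.0])   # pixel shifts of three example points

# Z = f * B / d: large disparity -> near object, small disparity -> far object
depth = f * B / disparity
print(depth)   # approx. [1.68, 8.4, 42.0] meters
```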
COMPUTER VISION – UNIT 4

SHAPE FROM SHADING

▪ Works with one image only.

▪ The way light and shadows fall on an object gives hints about its
surface shape.
✓Bright areas face the light source.
✓Darker areas face away or are in shadow.

▪ By studying these brightness variations, we can estimate the surface’s


orientation and curvature.
COMPUTER VISION – UNIT 4

PHOTOMETRIC STEREO

▪ Similar to “shape from shading,” but uses several images of the same
object.
▪ Each image is taken under different lighting directions while the
camera position stays fixed.
▪ The brightness differences between these images help calculate the
surface normals and detailed texture.
COMPUTER VISION – UNIT 4

PROJECTION SCHEMES

• When a 3D object is viewed through a camera, its image is formed on


a 2D image plane (camera sensor).
• This process of converting a 3D world (X, Y, Z) into a 2D image (x, y) is
called projection.
• Projection plays a key role in computer vision, since every captured
image is a projection of the real world.
• Different projection schemes define how 3D points are mapped to 2D
and help us understand image geometry.
COMPUTER VISION – UNIT 4

IMPORTANCE OF PROJECTION

• Projection models describe how a camera views the world.


• If we understand the projection model, we can reverse it — to estimate the
real-world 3D positions of objects.
• Applications include:
✓ 3D Reconstruction
✓ Camera Calibration
✓ Object Measurement
✓ Robot Navigation
COMPUTER VISION – UNIT 4

TYPES OF PROJECTION SCHEMES


COMPUTER VISION – UNIT 4

ORTHOGRAPHIC PROJECTION

▪ Simplest form of projection.


▪ Assumes light rays are parallel and fall perpendicularly on the image
plane.
▪ No perspective distortion — parallel lines stay parallel, and object
size does not change with distance.
▪ Features:
✓ Easy to calculate
✓ True-to-scale object dimensions
✓ Depth (Z) information is lost
COMPUTER VISION – UNIT 4

PERSPECTIVE PROJECTION

• Realistic projection similar to camera or human eye view.


• Light rays converge at a single point called the center of projection or focal
point.
• Distant objects appear smaller, and parallel lines seem to meet at a
vanishing point.
• Features:
✓ Produces realistic images with depth
✓ Causes distortion — parallel lines appear to converge
✓ Requires more complex calculations
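A small sketch contrasting the two projection schemes just described, assuming an idealized pinhole model with focal length f (an illustrative value); it shows that orthographic projection ignores depth while perspective projection makes farther points project smaller.

```python
import numpy as np

def orthographic(P):
    """Orthographic projection: parallel rays, simply drop the Z coordinate."""
    X, Y, Z = P
    return np.array([X, Y])

def perspective(P, f=1.0):
    """Perspective projection: rays converge at the focal point, x = f*X/Z, y = f*Y/Z."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

P_near = np.array([1.0, 1.0, 2.0])
P_far  = np.array([1.0, 1.0, 10.0])

print(orthographic(P_near), orthographic(P_far))  # same image position regardless of depth
print(perspective(P_near), perspective(P_far))    # the farther point projects smaller
```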
COMPUTER VISION – UNIT 4

PERSPECTIVE PROJECTION

▪ Vanishing Points and Horizon Line

▪ In perspective projection:
✓ Real-world parallel lines appear to meet at a vanishing point in the
image.
✓ The vanishing points of all parallel lines lying on the ground plane fall on the horizon line.

▪ By locating vanishing points, we can infer plane orientation (e.g.,


whether a road is flat or inclined).
COMPUTER VISION – UNIT 4

PERSPECTIVE PROJECTION

• Shows multiple vanishing points


(V₁, V₂, V₃) and a horizon line (H).

• Parallel road lines meet at


vanishing points along H,
representing realistic perspective
geometry.
COMPUTER VISION – UNIT 4

PERSPECTIVE PROJECTION
• Projection through Lenses
• Cameras use convex lenses to project
3D scenes onto an image plane.
• (A) The real projection forms an
inverted image behind the lens at the
focal plane (F).
• (B) For geometric convenience,
projection is often represented as a
non-inverted image formed at a
virtual focal plane in front of the lens.
COMPUTER VISION – UNIT 4

PROJECTIONS
16.1 (A): Orthographic projection of a rectangular box — parallel lines remain parallel.
16.1 (B): Perspective projection of a box — lines converge toward vanishing points, giving depth realism.
COMPUTER VISION – UNIT 4

SHAPE FROM SHADING

• When light falls on an object, some parts appear bright and others dark.
• These variations in brightness or intensity are called shading.
• The shape from shading technique uses this brightness information from a
single image to estimate the 3D shape and surface orientation of an object.
COMPUTER VISION – UNIT 4

SHAPE FROM SHADING

• Let’s consider a single light source shining on an object.


• Bright regions → surface faces toward the light.
• Dark regions → surface faces away from the light.
• Shadow regions → no light reaches the surface.
• By analyzing the intensity values of each pixel, we can estimate how the
surface bends or curves.
COMPUTER VISION – UNIT 4

SHAPE FROM SHADING


• The figure shows how feature
points (A, B, P, D, E) on a 3D
object are seen from two
different camera positions (C₁
and C₂).

• Each camera captures a 2D


projection of the same 3D object.

• Points like A₁, B₁, D₁ are the


projections of A, B, D as seen
from camera C₁, and A₂, B₂, D₂
are from camera C₂.
COMPUTER VISION – UNIT 4

SHAPE FROM SHADING


• The figure shows how light reflects from a surface.
COMPUTER VISION – UNIT 4

SHAPE FROM SHADING

Applications
• Face and object modeling
• Surface inspection in industry
• Medical imaging (surface curvature estimation)
• Digital art and rendering
COMPUTER VISION – UNIT 4

PHOTOMETRIC STEREO

• Photometric Stereo is an advanced method used in 3D vision to estimate the


shape and surface details of an object using multiple images taken under
different lighting directions but from the same camera position.
• In simple words — instead of changing the camera’s viewpoint (like in
stereo vision), we change the light direction and capture several images of
the same object.
• By studying how the brightness changes in each image, we can find the
surface orientation (normal) at every point and then reconstruct the 3D
shape.
COMPUTER VISION – UNIT 4

PHOTOMETRIC STEREO

• This figure shows how


we can find the
orientation of a surface
using photometric
stereo — a method in
computer vision.
COMPUTER VISION – UNIT 4

PHOTOMETRIC STEREO
• Let’s assume we have:
✓A fixed camera (same position and view for all images).
✓A Lambertian surface (light reflects evenly in all directions).
✓Three or more light sources, each shining from a different direction.
• Each light source gives one image.
• The brightness of a point changes depending on how the surface faces each light.
• For a Lambertian surface:
• I = ρ × (L ⋅ N)
where I is the measured brightness, ρ is the surface albedo (reflectivity), L is the light direction, and N is the surface normal.
COMPUTER VISION – UNIT 4

PHOTOMETRIC STEREO

• Steps in Photometric Stereo


1. Capture multiple images of the same object.
• Keep the camera fixed.
• Change only the direction of the light.
2. Measure brightness at each pixel in all images.
3. Solve equations for surface normal (N) using brightness differences.
4. Integrate the normals to reconstruct the 3D surface shape.
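A minimal sketch of steps 2–3 above for a single pixel, assuming a Lambertian surface and three calibrated light directions; the intensity and light-direction values are hypothetical. With three lights, I = ρ(L·N) can be solved directly by inverting the 3×3 light matrix.

```python
import numpy as np

# Rows are the three (unit) light directions, assumed known from calibration.
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.7],
              [0.0, 0.7, 0.7]])

# Brightness of one pixel in the three images (hypothetical measurements).
I = np.array([0.9, 0.8, 0.5])

# I = rho * (L @ N)  ->  solve for g = rho * N, then split into albedo and normal.
g = np.linalg.solve(L, I)      # with more than three lights, use np.linalg.lstsq
rho = np.linalg.norm(g)        # albedo
N = g / rho                    # unit surface normal at this pixel
print(rho, N)
```

Repeating this for every pixel gives a normal map, which step 4 integrates into a 3D surface.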
COMPUTER VISION – UNIT 4

PHOTOMETRIC STEREO

Applications
• Digital 3D modeling of sculptures, coins, and artifacts.
• Surface inspection in manufacturing (detecting dents or bumps).
• Medical imaging (skin texture analysis).
• Face reconstruction for biometric recognition.
• Archaeological documentation (restoring worn carvings).
COMPUTER VISION – UNIT 4

SHAPE FROM TEXTURE


• When we look at a photograph, we can often guess the shape of objects or
surfaces even if we don’t have depth information — this is because of
texture.

• Texture means the repeating pattern or surface detail of an object (for


example, the pattern on tiles, bricks, grass, or fabric).

• The shape from texture method uses changes in texture patterns in an image
to estimate the 3D shape, orientation, and depth of a surface.
COMPUTER VISION – UNIT 4

SHAPE FROM TEXTURE

Example
• Imagine looking at a brick wall:
• When you face it directly, all bricks look the same size.
• When you look at it at an angle, the bricks near you look big, and the bricks
far away look smaller and packed closely.
• This difference in texture density tells your brain that the wall is slanted.
COMPUTER VISION – UNIT 4

SHAPE FROM TEXTURE


COMPUTER VISION – UNIT 4

SHAPE FROM TEXTURE

Working Steps
1. Capture a single image with visible texture patterns.
2. Detect texture features (like dots, lines, or edges).
3. Measure how texture size and spacing change across the image.
4. Estimate surface tilt and depth variation using these texture changes.
5. Reconstruct the 3D shape of the surface.
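A rough sketch of step 3 only, under the assumption that a tilted textured plane shows more tightly packed texture (higher edge energy) toward its far side: it measures edge energy in horizontal bands and fits a line whose slope hints at the tilt. This is illustrative, not a complete shape-from-texture method; `gray` is assumed to be a 2D grayscale image.

```python
import numpy as np

def texture_density_profile(gray, n_bands=8):
    """Crude texture measure: mean edge energy in each horizontal band of the image.
    A systematic change from top to bottom suggests a tilted (receding) surface."""
    gy, gx = np.gradient(gray.astype(float))
    edge_energy = gx**2 + gy**2
    bands = np.array_split(edge_energy, n_bands, axis=0)
    return np.array([b.mean() for b in bands])

# profile = texture_density_profile(gray)
# slope, _ = np.polyfit(np.arange(len(profile)), profile, 1)  # sign/size hints at tilt
```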
COMPUTER VISION – UNIT 4

SHAPE FROM TEXTURE

Applications
• Terrain mapping (from aerial or satellite images).
• Road slope detection in autonomous driving.
• Object modeling and surface reconstruction.
• Robot navigation (detecting ground surface).
• Augmented reality (estimating floor or wall orientation).
COMPUTER VISION – UNIT 4

SHAPE FROM FOCUS


• When we take a photo, some parts of the image look sharp and clear, while
others look blurry.

• This happens because the camera lens can focus sharply only at a certain
distance.

• Shape from Focus (SFF) is a 3D shape recovery technique that uses this
principle of sharpness or focus to estimate the depth (distance) of objects in
a scene.
COMPUTER VISION – UNIT 4

SHAPE FROM FOCUS


• Take a series of images of the same scene by slightly changing the focus of the camera each time.
• Measure how sharp each pixel appears in every image; by combining these sharpness measurements for all pixels, we can build a 3D surface (depth map).
COMPUTER VISION – UNIT 4

SHAPE FROM FOCUS


Steps in shape from focus
• Image acquisition
• Capture multiple images of the same object at different focus levels (focus
stack).
• Focus measure calculation
• For each pixel, calculate a “focus value” that tells how sharp the image is at
that point.
• Focus level selection
• For each pixel, find which image has the highest focus measure → that is the
correct depth for that pixel.
• Depth map creation
• Use the selected focus level to estimate depth → produce a 3D surface.
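A minimal sketch of the four steps above, assuming a focus stack is already available as an array of grayscale images ordered by focus setting; it uses a simple gradient-energy sharpness measure per pixel (in practice the measure is usually averaged over a small window) and takes the argmax over the stack as the depth index.

```python
import numpy as np

def depth_from_focus(stack):
    """stack: array of shape (n_focus_levels, H, W) of grayscale images.
    Returns an index map (H, W): which focus level is sharpest at each pixel."""
    focus = []
    for img in stack:
        gy, gx = np.gradient(img.astype(float))
        focus.append(gx**2 + gy**2)          # per-pixel sharpness measure
    focus = np.stack(focus)                  # (n_focus_levels, H, W)
    return np.argmax(focus, axis=0)          # best-focused level per pixel

# depth_index = depth_from_focus(np.array(images))
# The metric depth for each index comes from the lens focus distance (calibration).
```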
COMPUTER VISION – UNIT 4

SHAPE FROM FOCUS

Applications
• 3D shape measurement of small objects (microscopy, manufacturing).
• Quality control in industries.
• Medical imaging.
• Digital surface inspection.
COMPUTER VISION – UNIT 4

ACTIVE RANGE FINDING

• Range finding means measuring the distance between a sensor (like a


camera) and an object.
• In active range finding, an external signal (like laser light or ultrasound) is
sent to the object, and its reflection is measured.
• From this, the distance (depth) can be calculated.
• This method is called “active” because it uses its own energy source — unlike
passive methods such as shape from shading or focus.
COMPUTER VISION – UNIT 4

ACTIVE RANGE FINDING


COMPUTER VISION – UNIT 4

ACTIVE RANGE FINDING


• Common Active Range Finding Methods
a) Time-of-Flight (ToF) – measure the round-trip time of an emitted light pulse and convert it to distance (see the sketch after this list).
b) Triangulation Method
• A light beam (e.g., laser) is projected on the object.
• A camera looks at where the light spot falls.
• The angle of the reflected beam helps calculate distance.
• Used in short-range, high-precision measurements.
c) Structured Light Method
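A tiny sketch of the time-of-flight relation mentioned in (a): the pulse travels to the object and back, so the one-way distance is (speed of light × round-trip time) / 2. The round-trip time used here is a hypothetical value.

```python
C = 299_792_458.0          # speed of light in m/s

def tof_distance(round_trip_time_s):
    """Time-of-flight ranging: half the round-trip path is the object distance."""
    return C * round_trip_time_s / 2.0

print(tof_distance(66.7e-9))   # ~10 m for a 66.7 ns round trip
```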
COMPUTER VISION – UNIT 4

ACTIVE RANGE FINDING

Applications
• 3D scanning and mapping (LiDAR)
• Robotics and autonomous vehicles
• Industrial inspection and quality control
• Gesture and motion recognition (e.g., Kinect)
• Medical imaging and surgery assistance
COMPUTER VISION – UNIT 4

SURFACE REPRESENTATIONS

• In computer vision and 3D modeling, surface representation refers to


how the shape of a 3D object is stored and displayed in the computer.
• Two major types of surface representation are:
1. Point-based representation
2. Volumetric representation
COMPUTER VISION – UNIT 4

POINT-BASED REPRESENTATION

Definition:
• Point-based representation describes a 3D surface using a set of discrete
3D points in space, often called a point cloud.
• Each point has 3D coordinates: (x, y, z).
• Points may also include additional information such as color or intensity.
How it is obtained:
• 3D scanners, LiDAR sensors, stereo cameras, or photogrammetry techniques.
COMPUTER VISION – UNIT 4

POINT-BASED REPRESENTATION
• Instead of storing a continuous surface, we record thousands or millions of tiny points (x, y, z) on its surface.
• When all these points are plotted together → the shape of the object appears.
• This collection of points is called a Point Cloud.
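A minimal sketch of a point cloud as an N×3 array of (x, y, z) samples, with an optional per-point color array; the coordinate and color values are hypothetical.

```python
import numpy as np

# A point cloud is just a set of (x, y, z) samples on the object's surface.
points = np.array([[0.10, 0.02, 1.53],
                   [0.11, 0.03, 1.52],
                   [0.09, 0.05, 1.55]])                 # shape (N, 3)

colors = np.array([[200, 180, 170],
                   [198, 182, 171],
                   [201, 179, 168]], dtype=np.uint8)    # optional per-point RGB

# Nothing connects the points; the shape is implied by their distribution in space.
print(points.shape, colors.shape)
```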
COMPUTER VISION – UNIT 4

POINT-BASED REPRESENTATION
Characteristics:
• No polygons or surfaces are explicitly connected.
• The shape is implied by the distribution of points in space.

Advantages:
• Easy to capture from real-world objects.
• Simple and efficient to store.
• Useful for visualization and surface reconstruction.
COMPUTER VISION – UNIT 4

POINT-BASED REPRESENTATION

Applications
• 3D modeling and reconstruction
• Robotics and autonomous navigation
• Cultural heritage and archaeology scanning
• Virtual reality (VR) / augmented reality (AR)
• Industrial inspection and reverse engineering
COMPUTER VISION – UNIT 4

VOLUMETRIC REPRESENTATION

Definition:
• Volumetric representation describes a 3D object by dividing space into
small volume elements (voxels) rather than just representing its surface.
• A voxel (volume element) is similar to a pixel but in 3D.
• Each voxel may contain information like occupancy, density, or color.

Characteristics:
• Represents both interior and exterior of the object.
• Commonly used in medical imaging and 3D simulations.
COMPUTER VISION – UNIT 4

VOLUMETRIC REPRESENTATION
COMPUTER VISION – UNIT 4

VOLUMETRIC REPRESENTATION

• The 3D space around an object is divided into tiny cubes called voxels (short
for volume elements).
• Each voxel holds information such as:
✓Whether the point is part of the object or empty space
✓Or it may store distance, density, or color
Example:
• Imagine a 3D box made of tiny cubes.
• If a cube is part of the solid object, we mark it as 1 (occupied).
• If it is empty space, we mark it as 0 (empty).
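A minimal sketch of the 1/0 occupancy idea above: space is split into a cubic voxel grid and a voxel is marked 1 if at least one point of a point cloud falls inside it. The grid bounds and resolution are assumed inputs.

```python
import numpy as np

def voxelize(points, grid_min, grid_max, resolution=32):
    """Occupancy grid: 1 where a point falls inside a voxel, 0 for empty space."""
    grid = np.zeros((resolution, resolution, resolution), dtype=np.uint8)
    scale = (resolution - 1) / (grid_max - grid_min)
    idx = np.floor((points - grid_min) * scale).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# grid = voxelize(point_cloud, grid_min=np.array([0., 0., 0.]),
#                 grid_max=np.array([1., 1., 1.]))
```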
COMPUTER VISION – UNIT 4

VOLUMETRIC REPRESENTATION

Applications
• Medical imaging (CT, MRI → represent human organs)
• 3D reconstruction in computer vision
• 3D modeling and graphics rendering
• Robotics (environment mapping)
• Additive manufacturing (3D printing)
COMPUTER VISION – UNIT 4

3D OBJECT RECOGNITION

• 3D Object Recognition means identifying and classifying objects in


three-dimensional space from visual data (like images, depth maps, or
3D point clouds).
• The goal is to answer:
✓ “What is this object?” and “Where is it located in 3D space?”
• It’s an important step after 3D reconstruction, where the scene’s shape
and surfaces are already known.
COMPUTER VISION – UNIT 4

3D OBJECT RECOGNITION

What Does It Do?


• The process involves:
1. Detecting the object in the scene.
2. Extracting features that describe its shape or geometry.
3. Comparing these features with known models in a database.
4. Recognizing the object (name, class, or type).
5. Estimating its position and orientation (pose) in 3D.
COMPUTER VISION – UNIT 4

3D OBJECT RECOGNITION
Steps in 3D Object Recognition
1. Data Acquisition – Capture 3D data using stereo cameras, LiDAR, or depth sensors.
2. Preprocessing – Remove noise and segment objects from the background.
3. Feature Extraction – Find shape features (edges, corners, planes, curves).
4. Matching – Compare extracted features with stored models or a database.
5. Pose Estimation – Compute the object’s position and orientation in 3D.
6. Verification – Confirm recognition using geometric constraints or error checking.
COMPUTER VISION – UNIT 4

3D OBJECT RECOGNITION

This diagram shows the basic steps of 3D Object Recognition — how a computer
identifies and names a 3D object.
COMPUTER VISION – UNIT 4

INTRODUCTION TO MOTION

• Motion means the change in position of objects over time in a sequence


of images or video frames.
• In computer vision, motion analysis helps the computer understand how
objects move, their direction, speed, and even 3D structure.
Example:
• When you record a moving car, each frame shows the car at a
different position — this difference is motion.
COMPUTER VISION – UNIT 4

INTRODUCTION TO MOTION

This diagram gives a simple introduction to motion in computer vision,


specifically explaining the concept of optical flow.
COMPUTER VISION – UNIT 4

INTRODUCTION TO MOTION

Components in the image:


1. Camera – captures a sequence of images (frames) over time.
2. Moving Object – something in the scene that changes position as time passes.
3. (dx, dy) – represents the change in position of the object between two frames:
dx = change in the x-direction (horizontal movement)
dy = change in the y-direction (vertical movement)
4. Optical Flow – describes the apparent motion of the object (or pixels)
between two consecutive frames, as seen by the camera.
COMPUTER VISION – UNIT 4

INTRODUCTION TO MOTION
Basic Types of Motion
(a) Object Motion
• When the object moves in a scene and the camera is fixed.
• Example: A ball rolling on the floor recorded by a stationary camera.
(b) Camera Motion
• When the camera itself moves, even if the object is stationary.
• Example: A moving drone capturing a still building.
(c) Combined Motion
• Both the camera and object move at the same time.
• Example: Filming a moving car from another moving vehicle.
COMPUTER VISION – UNIT 4

TRIANGULATION
Triangulation is a method to find the 3D position of a point using two or more 2D
images taken from different viewpoints.
The basic idea:
If we know where the cameras are and where the point appears in each image, we
can calculate the 3D coordinates of that point by geometry.
Example:
• When both your eyes look at an object, your brain uses the small difference
between the two views to estimate how far the object is — that’s triangulation!
COMPUTER VISION – UNIT 4

TRIANGULATION

Let:
• C₁, C₂ → positions of the two cameras
• P₁, P₂ → projections of the same 3D point P on the two image planes

Then the lines C₁P₁ and C₂P₂ are viewing rays, and their intersection gives the 3D point P.
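A minimal sketch of linear (DLT) triangulation, assuming the two 3×4 camera projection matrices P1 and P2 and the matched pixel coordinates of the point are already known; in practice the rays rarely intersect exactly, so the least-squares solution from the SVD is used.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation.
    P1, P2: 3x4 projection matrices; x1, x2: (u, v) of the same point in each image."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # homogeneous -> Euclidean 3D point

# X = triangulate(P1, P2, (320.5, 240.1), (310.2, 241.7))
```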
COMPUTER VISION – UNIT 4

TRIANGULATION

Applications
• 3D reconstruction – build 3D models from images
• Robotics – estimate distance to obstacles
• Autonomous vehicles – depth sensing for road objects
• Augmented Reality (AR) – align virtual objects correctly in 3D
• Stereo vision systems – depth perception like human eyes
COMPUTER VISION – UNIT 4

BUNDLE ADJUSTMENT

• Bundle Adjustment (BA) is a process used in 3D computer vision to refine


both:
1.The 3D points of the scene, and
2.The camera parameters (position, orientation, and focal length),
• so that all 3D points and camera views match as accurately as possible.

In simple words:
• Bundle Adjustment improves the accuracy of 3D reconstruction by minimizing
errors between the observed image points and the projected 3D model
points.
COMPUTER VISION – UNIT 4

BUNDLE ADJUSTMENT

Working Principle
• Bundle Adjustment works by minimizing the reprojection error.
• Reprojection Error = Difference between
• where a 3D point actually appears in the image, and
• where the model predicts it should appear.

The goal:
• Minimize ∑ (observed position − projected position)²
• This optimization is usually done using non-linear least squares methods (like
the Levenberg–Marquardt algorithm).
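A heavily simplified sketch of the reprojection-error residual that such a solver would minimize, here with scipy.optimize.least_squares and its Levenberg–Marquardt mode. It is illustrative only: the rotations and intrinsics are held fixed, only camera translations and 3D points are refined, and the parameter layout and helper names are assumptions, not a library API.

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, Ks, Rs, observations, n_cams, n_pts):
    """params packs the camera translations followed by the 3D points.
    observations: list of (camera index, point index, observed (u, v))."""
    ts = params[:n_cams * 3].reshape(n_cams, 3)
    pts = params[n_cams * 3:].reshape(n_pts, 3)
    res = []
    for cam_i, pt_i, observed_uv in observations:
        p_cam = Rs[cam_i] @ pts[pt_i] + ts[cam_i]   # world -> camera coordinates
        uvw = Ks[cam_i] @ p_cam                     # camera -> homogeneous image point
        projected_uv = uvw[:2] / uvw[2]
        res.extend(projected_uv - np.asarray(observed_uv))  # reprojection error
    return np.array(res)

# result = least_squares(reprojection_residuals, x0, method="lm",
#                        args=(Ks, Rs, observations, n_cams, n_pts))
```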
COMPUTER VISION – UNIT 4

BUNDLE ADJUSTMENT

• Imagine you take several photos


of a building from different
angles.

• After reconstructing it in 3D, the


structure looks slightly distorted.

• Bundle Adjustment corrects these


distortions by adjusting camera
angles and 3D points — making
the model precisely aligned with
the images.
COMPUTER VISION – UNIT 4

BUNDLE ADJUSTMENT

Applications
• Structure from Motion (SfM) – refining 3D scene reconstruction
• SLAM (Simultaneous Localization and Mapping) – robot and drone
mapping
• Photogrammetry – accurate 3D measurement from photos
• 3D scanning and modeling – improving geometry precision
• Augmented Reality – ensuring virtual overlays match real-
world geometry
COMPUTER VISION – UNIT 4

TRANSLATIONAL ALIGNMENT
• Translational alignment means aligning two or more images or 3D
datasets by shifting (translating) them in x, y, or z directions so that
they overlap correctly.
• It corrects position differences between images or scans taken from
different viewpoints.

Example:
• If two images of the same object are slightly shifted, translational
alignment moves one image horizontally or vertically until both line up
perfectly.
COMPUTER VISION – UNIT 4

TRANSLATIONAL ALIGNMENT
What is Translation?
• In geometry, translation means moving an object without rotating or resizing it.
• In image alignment, it involves shifting the entire image by a fixed distance.

Translation vector:
T = (tₓ, tᵧ, t_z)
where
tₓ → shift in x-direction
tᵧ → shift in y-direction
t_z → shift in z-direction (for 3D data)
COMPUTER VISION – UNIT 4

TRANSLATIONAL ALIGNMENT
How It Works
Step 1 – Input
• Two or more images (or 3D point sets) that represent the same scene but are
shifted relative to each other.

Step 2 – Find Translation Vector


• Estimate how much one image must move in x, y (and z for 3D) to align with
the other.

Step 3 – Apply Translation


• Shift one image using the calculated vector until the features or pixels match.
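One common way to carry out Steps 2–3 is phase correlation on the two images' Fourier transforms; a minimal sketch using OpenCV is below. cv2.phaseCorrelate expects single-channel floating-point images of the same size; the file names are hypothetical.

```python
import cv2
import numpy as np

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Step 2: estimate how far img2 is shifted relative to img1.
(dx, dy), response = cv2.phaseCorrelate(img1, img2)

# Step 3: shift img2 back by (-dx, -dy) so the two images line up.
M = np.float32([[1, 0, -dx], [0, 1, -dy]])
aligned = cv2.warpAffine(img2, M, (img2.shape[1], img2.shape[0]))
```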
COMPUTER VISION – UNIT 4

TRANSLATIONAL ALIGNMENT
COMPUTER VISION – UNIT 4

TRANSLATIONAL ALIGNMENT

Applications
• Image registration – align satellite, medical, or microscope images
• Panorama stitching – combine overlapping photos
• Motion analysis – detect how far an object moved
• 3D reconstruction – align depth maps or point clouds
• Robotics and SLAM – align sensor data in navigation
COMPUTER VISION – UNIT 4

PARAMETRIC MOTION

• In computer vision, motion means how objects move between image frames.
• We often need to describe and analyze this movement.
• For example, how a car moves across a video.
• Parametric motion means representing motion using a mathematical model
(equation) with a few parameters (like translation, rotation, scaling, etc.).
• “Parametric” means the motion is described by parameters.
• Instead of describing how each pixel moves separately, we describe the
overall motion of a region or object using a small set of parameters.
COMPUTER VISION – UNIT 4

PARAMETRIC MOTION
COMPUTER VISION – UNIT 4

PARAMETRIC MOTION
Mathematical Model
• The general form of a 2D (affine) motion model is:
x’ = a₁₁x + a₁₂y + tₓ
y’ = a₂₁x + a₂₂y + tᵧ
where:
(x, y) → coordinates in the first frame
(x’, y’) → coordinates in the next frame
A = [aᵢⱼ] → transformation matrix (rotation, scaling, shear)
tₓ, tᵧ → translation (shifts in x and y direction)
• This matrix A contains the parameters that define how the image moves.
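A minimal sketch applying this affine model to a few points; the values of A and t are made-up examples. In practice the six parameters could be estimated from point correspondences, for instance with OpenCV's cv2.estimateAffine2D.

```python
import numpy as np

# Affine (parametric) motion: [x', y']^T = A [x, y]^T + t
A = np.array([[1.02, -0.05],
              [0.05,  1.02]])      # small rotation plus slight scaling (assumed)
t = np.array([3.0, -1.5])          # translation tx, ty in pixels (assumed)

points = np.array([[10.0, 20.0],
                   [50.0, 80.0],
                   [120.0, 40.0]])

moved = points @ A.T + t           # where each point ends up in the next frame
print(moved)
```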
COMPUTER VISION – UNIT 4

PARAMETRIC MOTION

Why Use Parametric Motion?


• Reduces complexity — only a few parameters instead of thousands of pixel
motions.
• Easier to estimate and track motion between frames.
• Helps in tasks like:
✓ Video stabilization
✓ Object tracking
✓ Motion segmentation
✓ Image alignment (registration)
COMPUTER VISION – UNIT 4

SPLINE-BASED MOTION

• In motion analysis, we often need to describe how objects move smoothly


over time.
• Sometimes, the motion is not constant — for example, a flying bird, a
walking person, or a moving camera.
• To represent this kind of smooth and continuous motion, we use splines.
• A spline is a smooth curve made by joining several simple curve segments
together.
• Each segment is usually a polynomial (like a quadratic or cubic curve).
• Splines are used to approximate smooth motion paths in both 2D and 3D
space.
COMPUTER VISION – UNIT 4

SPLINE-BASED MOTION
COMPUTER VISION – UNIT 4

SPLINE-BASED MOTION

• Let’s say an object moves through these positions:


P₀(0,0), P₁(2,1), P₂(4,0), P₃(6,2)
• A cubic spline will create a smooth curve passing through all points —
• this curve represents how the object moves continuously from P₀ to P₃.
• So instead of linear jumps between frames, we get smooth motion.
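A minimal sketch of the example above, fitting a cubic spline through P₀…P₃ with scipy.interpolate.CubicSpline, parameterized by frame time; evaluating the spline at in-between times gives the smooth motion path.

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.array([0.0, 1.0, 2.0, 3.0])                       # frame times for P0..P3
positions = np.array([[0, 0], [2, 1], [4, 0], [6, 2]], dtype=float)

spline = CubicSpline(t, positions)                        # one smooth curve through all points

t_fine = np.linspace(0, 3, 31)
path = spline(t_fine)                                     # smooth interpolated motion path
print(path[:3])
```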
COMPUTER VISION – UNIT 4

SPLINE-BASED MOTION

Applications in Computer Vision


• Spline-based motion models are used in:
✓ Tracking moving objects (smooth path fitting)
✓ Camera motion estimation
✓ 3D reconstruction (estimating smooth motion of camera/object)
✓ Animation and robotics (smooth trajectory control)
✓ Optical flow smoothing (reducing noise in motion vectors)
COMPUTER VISION – UNIT 4

OPTICAL FLOW

• In a video or image sequence, objects move — and their pixels change


position from one frame to the next.
• Optical flow is a method to measure this apparent motion of brightness
patterns between consecutive frames.
• So, optical flow = motion of pixels over time.
• Optical flow describes how each pixel in an image moves between two
frames.
• It gives a velocity field — that means for each pixel, we estimate how far
(and in which direction) it has moved.
COMPUTER VISION – UNIT 4

OPTICAL FLOW
COMPUTER VISION – UNIT 4

OPTICAL FLOW

• Optical flow relies on a simple idea:


• The brightness (intensity) of a moving pixel does not change between two
consecutive frames.

• Mathematically:
I(x,y,t)=I(x+u,y+v,t+1)
where
I(x,y,t): intensity of pixel at position (x, y) at time t
(u,v): motion of that pixel between frames
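A minimal sketch of computing a dense (u, v) field between two grayscale frames using OpenCV's Farnebäck method, which is one standard dense optical-flow algorithm; the file names are hypothetical and the numeric parameters are typical defaults rather than tuned values.

```python
import cv2

prev = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Arguments after None: pyramid scale, levels, window size, iterations,
# polynomial neighborhood size, polynomial sigma, flags.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

u, v = flow[..., 0], flow[..., 1]    # horizontal and vertical motion of each pixel
```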
COMPUTER VISION – UNIT 4

OPTICAL FLOW

Applications of Optical Flow


• Motion detection and tracking
• Video compression
• 3D structure estimation
• Object segmentation
• Robot navigation
• Gesture and activity recognition
COMPUTER VISION – UNIT 4

LAYERED MOTION

• In real-world scenes, many objects move independently


• for example:
✓ A car moving on the road
✓ Trees swaying in the wind
✓ People walking in front of a building
• These different movements belong to different parts (layers) of the scene.
• The idea of Layered Motion is to model the motion of a complex scene as
multiple simpler layers, each having its own motion.
COMPUTER VISION – UNIT 4

LAYERED MOTION

• Layered Motion represents the overall motion of a scene as a combination


of several motion layers.
• Each layer corresponds to a region or object moving differently.

Each layer:
• Has its own motion model (e.g., parametric motion: translation, affine, etc.)
• Represents one part of the image (like foreground, background, or a
moving object)
• This makes motion analysis simpler and more accurate.
COMPUTER VISION – UNIT 4

LAYERED MOTION

Steps in Layered Motion Estimation


1. Divide scene into regions (layers) → based on motion differences
or object boundaries.
2. Estimate motion parameters for each layer.
3. Warp (transform) each layer according to its motion.
4. Combine the layers to reconstruct the overall moving scene.
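A very rough sketch of steps 1–2: cluster the per-pixel flow vectors into a few groups with plain k-means and treat each cluster mean as that layer's (purely translational) motion. Real layered-motion methods use richer parametric models per layer and iterate between assignment and motion estimation; `flow` is assumed to come from a dense optical-flow routine such as the one sketched earlier.

```python
import numpy as np

def flow_layers(flow, n_layers=3, iters=10):
    """Cluster per-pixel flow vectors into motion layers (simple k-means).
    Returns a per-pixel layer map and one (u, v) motion per layer."""
    vecs = flow.reshape(-1, 2)
    rng = np.random.default_rng(0)
    centers = vecs[rng.choice(len(vecs), n_layers, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((vecs[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([vecs[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(n_layers)])
    return labels.reshape(flow.shape[:2]), centers

# layer_map, layer_motions = flow_layers(flow)
```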
COMPUTER VISION – UNIT 4

LAYERED MOTION

Imagine a video with:


• Background (buildings) – static
• Car – moving right
• Pedestrian – moving left
We can create 3 motion layers:
1.Background layer → no motion
2.Car layer → rightward motion
3.Pedestrian layer → leftward motion
• Each layer has its own parametric motion, and together they form the full video
motion.
COMPUTER VISION – UNIT 4

LAYERED MOTION

Applications
• Video object segmentation
• Background subtraction
• Motion-based video compression
• 3D scene understanding
• Visual tracking and surveillance
COMPUTER VISION – UNIT 4
