DMET 901: Computer Vision
3D Vision (1)
Mohamed Karam Gabr
Winter 2024
3D Vision
• Human vision maps 3D scenes to 2D images
• 3D vision can be defined as:
From an image or a series of images of a scene, derive an accurate
three-dimensional geometric description of the scene and
quantitatively determine the properties of the objects in the scene
3D Vision
• 3D vision using intensity images as input is difficult because:
o The imaging system performs perspective projection
- Points along a line pointing from the optical center project to a single
point
- Parallel lines in the real world do not remain parallel in a perspective
image
o Mutual occlusion of objects
o Presence of noise and complexity of needed algorithms
• To understand 3D vision, we have to understand the basics of
projective geometry
Projective Geometry
• We start by defining the projective space as:
For any point located at (x, y) in the Euclidean space, add another
coordinate and represent the same point as (x, y, 1), where the added
coordinate is always 1
[Figure: coordinate axes x, y, z]
• In this space, scaling is not important so (x, y, 1) is equivalent to
(αx, αy, α), where α is non-zero
• In such space, points located at infinity are represented as (x, y, 0) since
such representation corresponds to the point (x/0, y/0)
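The conversion between Euclidean and projective coordinates described above can be sketched in a few lines of Python. The function names `to_homogeneous` and `from_homogeneous` are illustrative choices, not part of the lecture, and treating a zero last coordinate as an error is one possible convention for points at infinity.

```python
def to_homogeneous(x, y):
    """Lift a Euclidean point (x, y) to the projective point (x, y, 1)."""
    return (x, y, 1.0)


def from_homogeneous(p):
    """Map a projective point (x, y, w) back to Euclidean (x/w, y/w)."""
    x, y, w = p
    if w == 0:
        # (x, y, 0) is a point at infinity and has no Euclidean counterpart.
        raise ValueError("point at infinity has no Euclidean coordinates")
    return (x / w, y / w)


p = to_homogeneous(2.0, 3.0)
scaled = tuple(4.0 * c for c in p)  # (8, 12, 4): same point, scaled by alpha = 4

print(from_homogeneous(p))       # (2.0, 3.0)
print(from_homogeneous(scaled))  # (2.0, 3.0)
```

Both calls return the same Euclidean point, which is what "scaling is not important" means in practice.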
Projective Geometry
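• A line in the projective space is defined by
ax + by + c = 0
This equation can be written in vector form as:
\[
\begin{bmatrix} a & b & c \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = 0
\quad\Longrightarrow\quad
\mathbf{u}^T \mathbf{p} = 0
\]
where u = (a, b, c) represents the line and p = (x, y, 1) represents the point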
• Therefore, in projective space points and lines have the same
representation as 3-dimensional vectors
Projective Geometry
• From geometry, the equation of any straight line can be determined
from any 2 points on it using cross product
• Example: Consider the equation of the straight line passing through the
points (0, 0) and (1, 3). The two points can be represented in projective
space as a = (0, 0, 1) and b = (1, 3, 1). The equation of the straight line
can be found as
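\[
\mathbf{a} \times \mathbf{b}
= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 0 & 0 & 1 \\ 1 & 3 & 1 \end{vmatrix}
= (0 - 3)\,\mathbf{i} - (0 - 1)\,\mathbf{j} + (0 - 0)\,\mathbf{k}
= -3\,\mathbf{i} + 1\,\mathbf{j} + 0\,\mathbf{k} = (-3, 1, 0)
\]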
The equation of the straight line is thus -3x + y = 0 which is the line that
passes through the 2 points (0, 0) and (1, 3)
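The worked example can be checked numerically. The following is a minimal sketch in plain Python; the helper names `cross` and `line_through` are illustrative and not taken from the lecture.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Line (a, b, c) with ax + by + c = 0 through Euclidean points p and q."""
    return cross((p[0], p[1], 1), (q[0], q[1], 1))


line = line_through((0, 0), (1, 3))
print(line)  # (-3, 1, 0), i.e. -3x + y = 0

# Both points satisfy u^T p = 0.
a, b, c = line
print(a * 0 + b * 0 + c, a * 1 + b * 3 + c)  # 0 0
```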
Projective Geometry
• Some useful properties can be inferred from this representation:
- Consider the intersection of 2 lines u1 = (a1, b1, c1) and u2 = (a2, b2, c2)
which can be shown to be at
(b1c2 – b2 c1, a2c1 – a1c2, a1b2 – a2b1)
The exact same result can be obtained by the cross product of
u1 x u2
- If two lines u1 = (a1, b1, c1) and u2 = (a2, b2, c2) are parallel, their
slopes -a1/b1 and –a2/b2 have to be equal. Therefore, their point of
intersection is located at
u1 x u2 = (b1c2 – b2 c1, a2c1 – a1c2, 0)
which is equivalent to ([b1c2 – b2 c1]/0, [a2c1 – a1c2]/0) = (∞, ∞)
“Any two parallel lines intersect at infinity”
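Both properties can be illustrated with the same cross product. This is a small sketch with example lines chosen for illustration; `cross` and `intersect` are hypothetical helper names.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def intersect(u1, u2):
    """Projective intersection point of lines u1 = (a1, b1, c1), u2 = (a2, b2, c2)."""
    return cross(u1, u2)


# x + y - 2 = 0 and x - y = 0 meet at the Euclidean point (1, 1).
x, y, w = intersect((1, 1, -2), (1, -1, 0))
print((x, y, w), (x / w, y / w))  # (-2, -2, -2) (1.0, 1.0)

# x + y - 2 = 0 and x + y - 5 = 0 are parallel: the last coordinate is 0,
# so the intersection is a point at infinity.
print(intersect((1, 1, -2), (1, 1, -5)))  # (-3, 3, 0)
```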
Homography
• Homography is any linear mapping in the projective space defined as
u' = Hu
where H is a 3 x 3 matrix for two dimensional points in Euclidean space
• Example: To rotate by any angle θ
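[Figure: the point (x, y) rotated by θ to (x', y')]
Let L be the distance of (x, y) from the origin and φ its angle with the x-axis, so that L cos φ = x and L sin φ = y. Then
\[
\begin{aligned}
x' &= L\cos(\varphi + \theta) = L\cos\varphi\cos\theta - L\sin\varphi\sin\theta = x\cos\theta - y\sin\theta \\
y' &= L\sin(\varphi + \theta) = L\cos\varphi\sin\theta + L\sin\varphi\cos\theta = x\sin\theta + y\cos\theta
\end{aligned}
\]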
Homography
• Example: The rotation and translation geometric transform defined as
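\[
\begin{bmatrix} x' \\ y' \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} t_x \\ t_y \end{bmatrix}
\]
This can be redefined using homography in projective space as:
\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
where
\[
R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},
\qquad
\mathbf{t} = \begin{bmatrix} t_x \\ t_y \end{bmatrix}
\]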
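As an illustration, rotation and translation can be packed into one 3 x 3 matrix and applied with a single multiplication. This is a minimal sketch assuming a counter-clockwise rotation angle in radians; `rigid_homography` and `apply_h` are hypothetical helper names.

```python
import math


def rigid_homography(theta, tx, ty):
    """3x3 matrix [[R, t], [0^T, 1]] for rotation by theta, then translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def apply_h(H, p):
    """Multiply the 3x3 matrix H by the projective point p = (x, y, w)."""
    return tuple(sum(H[i][j] * p[j] for j in range(3)) for i in range(3))


# Rotate (1, 0) by 90 degrees, then shift by (2, 0): the result is (2, 1).
H = rigid_homography(math.pi / 2, 2.0, 0.0)
x, y, w = apply_h(H, (1.0, 0.0, 1.0))
print(round(x / w, 6), round(y / w, 6))  # 2.0 1.0
```

For this family of transformations the last coordinate stays 1, so the division by `w` changes nothing; it is kept so that the same code also works for a general H.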
Homography
• Another example of homography: Horizontal Shear
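\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0.5 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
[Figure: Original Image, Sheared Image]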
• After coordinate transformation, interpolation is used to determine the
colors of pixels as we did before with rotation
Homography
• The advantage of projective space representation is that the addition of
an extra dimension enables the representation of any transformation as
a single linear operation (matrix multiplication)
• For any general transformation H given by
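\[
\alpha \begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix}
= \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\]
In order to find the transformed coordinates u' and v', the scale factor α has to be
eliminated by dividing by the third row:
\[
u' = \frac{h_{11} u + h_{12} v + h_{13}}{h_{31} u + h_{32} v + h_{33}},
\qquad
v' = \frac{h_{21} u + h_{22} v + h_{23}}{h_{31} u + h_{32} v + h_{33}}
\]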
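The elimination of α can be sketched as follows. The function name `apply_homography` and the two example matrices are illustrative assumptions; the shear matrix is the one from the horizontal shear example.

```python
def apply_homography(H, u, v):
    """Apply a 3x3 homography H to (u, v) and divide out the scale alpha."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    alpha = H[2][0] * u + H[2][1] * v + H[2][2]
    if alpha == 0:
        raise ValueError("point maps to infinity")
    return (x / alpha, y / alpha)


# Horizontal shear: the third row is (0, 0, 1), so alpha = 1.
shear = [[1.0, 0.5, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
print(apply_homography(shear, 2.0, 4.0))  # (4.0, 4.0)

# A matrix with a non-trivial third row: alpha = 0.5 * 2 + 1 = 2.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0]]
print(apply_homography(H, 2.0, 4.0))  # (1.0, 2.0)
```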
A Single Perspective Camera
• The pinhole model of a camera is an approximation suitable for
computer vision applications
[Figure: pinhole camera model showing the real-world point and its projected point]
• It can be shown that the real world point X = (X, Y, Z) will be mapped to
the point x = (fX/Z, fY/Z, f), where f is the focal length of the camera
A Single Perspective Camera
• By ignoring the last coordinate in x = (fX/Z, fY/Z, f) (since the image is
always 2D), then the projected point can be written as x = (fX/Z, fY/Z)
• This point in the projective space can be written as x = (fX/Z, fY/Z, 1)
which is equivalent to x = (fX, fY, Z)
• Therefore, the mapping from the real world point in the projective space
X = (X, Y, Z, 1) to x = (fX, fY, Z) can be expressed using the following
homography
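\[
\mathbf{x} = \begin{bmatrix} fX \\ fY \\ Z \end{bmatrix}
= \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]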
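The pinhole mapping can be written directly from the 3 x 4 matrix. This is a minimal sketch; `project_pinhole` is a hypothetical name, and the focal length and point used below are arbitrary example values.

```python
def project_pinhole(X, Y, Z, f):
    """Project the 3D point (X, Y, Z) with focal length f to (fX/Z, fY/Z)."""
    P = [[f, 0.0, 0.0, 0.0],
         [0.0, f, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    Xh = (X, Y, Z, 1.0)  # point in 3D projective space
    x, y, w = (sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3))
    if w == 0:
        raise ValueError("point lies in the plane Z = 0 through the optical center")
    return (x / w, y / w)  # w equals Z, so this is (fX/Z, fY/Z)


print(project_pinhole(2.0, 4.0, 10.0, 0.5))  # (0.1, 0.2)
```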
A Single Perspective Camera
• In the most general case, the camera performs a linear transformation
from the 3D projective space to the 2D projective space
A Single Perspective Camera
• Projection into the image plane involves the following transformations:
1- World Euclidean coordinate system → Camera Euclidean coordinate system
(Rotation and Translation)
\[
\mathbf{X}_c = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{X}
\]
R and t are called the extrinsic camera
calibration parameters
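2- Camera Euclidean coordinate system → Image Euclidean coordinate system
We have shown that this is equivalent to
\[
\tilde{\mathbf{u}} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \mathbf{X}_c
\]
and, if f = 1,
\[
\tilde{\mathbf{u}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \mathbf{X}_c
\]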
A Single Perspective Camera
• Projection into the image plane involves the following transformations (continued):
3- Image Euclidean coordinate system → Image affine coordinate system
The previous transformation would have been enough if the origin of the image
plane were the same as the principal point. This is not generally the case. This
relationship is represented by
\[
\mathbf{u} = K \tilde{\mathbf{u}}
= \begin{bmatrix} f & s & u_0 \\ 0 & g & v_0 \\ 0 & 0 & 1 \end{bmatrix} \tilde{\mathbf{u}}
\]
The matrix K is called the intrinsic calibration matrix. f and g correspond to
the focal lengths in horizontal and vertical directions. s represents shear, and
u0 and v0 represent translation
A Single Perspective Camera
• By combining the 3 transformations, we get
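\[
\mathbf{u} = K
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \mathbf{X}
= K \begin{bmatrix} R \mid \mathbf{t} \end{bmatrix} \mathbf{X}
= M \mathbf{X}
\]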
The matrix M is called the projection matrix
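• Therefore, reconstructing the 3D location of a point X from the image
coordinates u amounts to inverting the projection, written formally as
\[
\mathbf{X} = M^{-1} \mathbf{u}
\]
Since M is 3 x 4 it has no ordinary inverse: a single image point determines X
only up to its position along the ray through the optical center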
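The full chain of transformations can be sketched by building M = K [R | t] and projecting a point. All numeric values below (focal lengths of 800 pixels, principal point (320, 240), zero shear, identity rotation, zero translation) are made-up illustrative values, and `matmul`, `projection_matrix` and `project` are hypothetical helper names.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def projection_matrix(K, R, t):
    """Build the 3x4 projection matrix M = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # 3x4 block [R | t]
    return matmul(K, Rt)


def project(M, X):
    """Project the 3D point X = (X, Y, Z) to image coordinates (u, v)."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(M[i][j] * Xh[j] for j in range(4)) for i in range(3))
    return (u / w, v / w)


# Assumed intrinsics: f = g = 800, s = 0, (u0, v0) = (320, 240).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
# Assumed extrinsics: camera frame coincides with the world frame.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

M = projection_matrix(K, R, t)
print(project(M, (0.5, -0.25, 2.0)))  # (520.0, 140.0)
print(project(M, (1.0, -0.5, 4.0)))   # (520.0, 140.0)
```

The two 3D points lie on the same ray through the optical center and land on the same pixel, which is the single-point projection of a whole line noted at the start of the lecture.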