TO COMPUTER VISION
(Computer Vision and Robotics)
Prepared by: Ahmed Eslam Mohammed Solyman (M.D.)
(February 2019)
Contents
CHAPTER 1
1.1 Introduction
CHAPTER 1
COMPUTER VISION
1.1 Introduction
This chapter describes vision-based control strategies for a pick-and-place robotic application. The software implementation of these strategies is accomplished using MATLAB/Simulink from MathWorks. The vision algorithms identify the models of the objects of interest and send their position and orientation data to the data-acquisition system and then to the microcontroller, which solves the inverse kinematics and commands the robot to pick these objects and place them at a target goal.
[Figure: system layout — PC, data-acquisition system and microcontroller, with the coordinate frames {C}, {B}, {W}, {S}, {G}, {T} and the angle α]
The robot kinematic equation, as discussed before, can be written as:

^B T_W = ^B T_S · ^S T_G · (^W T_{G=T})^{-1}    (1.1)

where ^B T_S and (^W T_{G=T})^{-1} are known from the physical dimensions of the robot (the goal frame {G} is taken to coincide with the tool frame {T}). The robot variables are contained in the robot matrix ^B T_W. To obtain ^B T_W, the matrix ^S T_G, which gives the position and orientation of the goal relative to the frame {S}, must be determined, as seen in section 1.7.
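As a small numerical illustration of equation (1.1), the transform composition can be carried out directly in MATLAB; the transforms used below are illustrative placeholders, not the actual dimensions of the robot.

% Minimal sketch of equation (1.1): B_T_W = B_T_S * S_T_G * inv(W_T_G).
% All numerical values are illustrative placeholders, not the real robot dimensions.
B_T_S = [eye(3), [300; 0; 50]; 0 0 0 1];   % station frame {S} expressed in base {B}, mm
S_T_G = [eye(3), [120; 80; 0]; 0 0 0 1];   % goal frame {G} expressed in station {S}, mm
W_T_G = [eye(3), [0; 0; 90];   0 0 0 1];   % goal (tool) frame expressed in wrist {W}, mm

B_T_W = B_T_S * S_T_G / W_T_G;             % A/B computes A*inv(B) in MATLAB
disp(B_T_W(1:3, 4));                       % wrist position that the inverse kinematics must reach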
[Figure: open-loop control block diagram — End-effector Position (X, Y in mm) → Robot Inverse Kinematics → Robot Variables (d1, θ2, θ3, θ4, θ5) → Robot]
Fig. 1.3 Open-loop control block diagram

In this open-loop control system, certain errors can arise from different sources such as inverse kinematic inaccuracy, robot precision and camera calibration.
These errors can be reduced by using the camera measurements as feedback and developing an algorithm to make the end effector coincide with the goal. The proposed algorithm in this case can be represented by the block diagram shown in Fig. 1.4.
[Figure: closed-loop control block diagram — Camera → Video Processing (x, y in pixels) → Camera Calibration (X, Y in mm) → Error with respect to the End-effector Position → Robot Inverse Kinematics → Robot Variables (d1, θ2, θ3, θ4, θ5) → Robot]
Fig. 1.4 Closed-loop control block diagram

In the previous chapters we have already discussed the robot and the inverse kinematics (the green blocks in Fig. 1.3 and Fig. 1.4); in this chapter the camera calibration, image processing and feedback algorithm are discussed (the red blocks in Fig. 1.3 and Fig. 1.4).
Image Coordinates
Assume that an image f(x, y) is sampled so that the resulting image has M rows and N columns; the image is then of size M × N. The values of
the coordinates are discrete quantities. For notational clarity and
convenience, we shall use integer values for these discrete coordinates.
The image origin is usually defined to be at (x, y) = (0, 0). The next
coordinate values along the first row of the image are (x, y) = (0, 1). The
notation (0, 1) is used to signify the second sample along the first row. It
does not mean that these are the actual values of physical coordinates
when the image was sampled.
Fig. 1.6 shows this coordinate convention. Note that (x) ranges from
0 to (M–1) and (y) from 0 to (N–1) in integer increments.
The coordinate system in Fig. 1.6 and the preceding discussion lead
to the following representation for a digitized image:
f(x, y) =
\begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0, N-1) \\
f(1,0) & f(1,1) & \cdots & f(1, N-1) \\
\vdots & \vdots & & \vdots \\
f(M-1,0) & f(M-1,1) & \cdots & f(M-1, N-1)
\end{bmatrix}
The right side of this equation is a digital image by definition. Each
element of this array is called an image element, picture element or pixel.
The terms image and pixel are used throughout the rest of our discussions
to denote a digital image and its elements.
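As a small illustration of this convention (the image file name is hypothetical), note that MATLAB stores an image as an M × N array but indexes it from 1, so the sample written as f(0, 0) in the text is f(1, 1) in MATLAB:

% Illustration of the M x N image convention; 'scene.png' is a hypothetical file name.
f = imread('scene.png');        % M x N (x 3 for RGB) array of samples
[M, N, ~] = size(f);

% The text uses 0-based coordinates (x = 0..M-1, y = 0..N-1);
% MATLAB arrays are 1-based, so f(x, y) of the text corresponds to f(x+1, y+1) here.
topLeft   = f(1, 1, :);         % f(0, 0) in the text's notation
lastPixel = f(M, N, :);         % f(M-1, N-1) in the text's notation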
Fig. 1.7 Illustration of the pinhole camera model, with the image plane placed in front of the lens to simplify calculations [Courtesy of Maria Magnusson Seger].
It is therefore often convenient to fix the world coordinate system to the object or scene.
The point where the Z axis pierces the image plane is known as the
principal point and the Z axis as the principal axis. The origin of the
image coordinate system is chosen, for now, as the principal point and its
x- and y axes are aligned with the X and Y axes of the camera coordinate
system. All of this is illustrated in Fig. 1.8.
In homogeneous coordinates this projection is described by a matrix P. We’ll variously
refer to this matrix as the camera matrix or the projection matrix,
depending on which aspect we wish to emphasize.
\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}    (1.2)

In compact form this is written as

x_image = P X_cam    (1.3)

with the camera (projection) matrix

P = diag(f, f, 1) [ I | 0 ]    (1.4)

The camera matrix derived above assumes that the origin of the image coordinate system is at the principal point.
If the principal point lies at (p_x, p_y) in the image coordinate system, the projection becomes

\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}    (1.5)

The intrinsic parameters are collected in the camera calibration matrix

K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}    (1.6)

so that

P = K [ I | 0 ]    (1.7)
Emphasizing the fact that we are projecting features described in
terms of the camera coordinate system, we rewrite the projection as:
x_image = K [ I | 0 ] X_cam    (1.8)
The next step is to introduce the world coordinate system and relate
it to the camera coordinate system.
X_world = [ X  Y  Z  1 ]^T    (1.9)
The two coordinate systems are related by a rotation (R) and a translation (t); as is clear from Fig. 1.10, we may write:

X_cam = \begin{bmatrix} R & -R C \\ 0^T & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
= \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} X_world    (1.11)
Fig. 1.11 shows the extrinsic and intrinsic parameters.
1.4.3
If the number of pixels per unit distance in image coordinates is m_x in the x direction and m_y in the y direction, the calibration matrix becomes

K = \begin{bmatrix} m_x & 0 & 0 \\ 0 & m_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}    (1.13)
For added generality, a skew parameter s can also be included:

K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}    (1.14)
The extrinsic parameters (rotation and translation of the station frame {S} with respect to the camera frame {C}) can be collected in the homogeneous transform

^C T_S = \begin{bmatrix} ^C R_S & ^C t_S \\ 0^T & 1 \end{bmatrix}
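To make the projection chain concrete, the following MATLAB sketch maps a world point to pixel coordinates through x_image = K [R t] X_world; the values of K, R, t and the world point are made-up illustrations, not the calibration results reported below.

% Minimal sketch of the projection x_image = K * [R t] * X_world.
% K, R, t and the world point are illustrative values only.
K = [1000    0  320;
        0 1000  240;
        0    0    1];            % intrinsic calibration matrix (1.14) with s = 0
R = eye(3);                      % rotation of the world frame into the camera frame
t = [0; 0; 500];                 % translation: camera 500 mm in front of the origin

X_world = [50; -20; 0; 1];       % homogeneous world point on the Z = 0 plane
x = K * [R, t] * X_world;        % homogeneous image point, as in (1.8) and (1.11)
pixel = x(1:2) / x(3);           % divide by the third component to get pixel coordinates
fprintf('pixel coordinates: (%.1f, %.1f)\n', pixel(1), pixel(2));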
It is assumed that the world reference system [the station frame {S}, O_S-(X_S, Y_S, Z_S)] is fixed with respect to the target objects, so that the extrinsic parameters (R, t) directly give the pose of the camera with respect to the target. Using the camera calibrator toolbox we can export the camera parameters (R, t and k):
^C R_S = \begin{bmatrix} 0.9992 & 0.0027 & -0.0399 \\ -0.0020 & 0.9999 & 0.0136 \\ 0.0399 & -0.0162 & 0.9991 \end{bmatrix}    (1.15)

^C t_S = [ -226.1425  -166.5716  1.24\times 10^{3} ]    (1.16)

k = \begin{bmatrix} 1.064\times 10^{3} & 0 & 0 \\ -0.8118 & 1.0658\times 10^{3} & 0 \\ 258.3899 & 295.171 & 1 \end{bmatrix}    (1.17)
Substituting these parameters gives the mapping from world to image coordinates:

\begin{bmatrix} x_image \\ y_image \\ z_image \end{bmatrix} =
\begin{bmatrix} 1.064\times 10^{3} & 0 & 0 \\ -0.8118 & 1.0658\times 10^{3} & 0 \\ 258.3899 & 295.171 & 1 \end{bmatrix} \cdot
\begin{bmatrix} 0.9992 & 0.0027 & -0.0399 & -226.1425 \\ -0.0020 & 0.9999 & 0.0136 & -166.5716 \\ 0.0399 & -0.0162 & 0.9991 & 1.24\times 10^{3} \end{bmatrix}
\begin{bmatrix} X_world \\ Y_world \\ Z_world \\ 1 \end{bmatrix}    (1.18)

which, for the target objects (taken to lie on the Z_world = 0 plane), gives:

\begin{bmatrix} x_image \\ y_image \\ z_image \end{bmatrix} =
\begin{bmatrix} 0.0106 & 0 & -0.0004 & -2.4079 \\ 0 & 0.0107 & 0.0002 & -1.7735 \\ 0.0026 & 0.0030 & 0 & -1.0636 \end{bmatrix}
\begin{bmatrix} X_world \\ Y_world \\ 0 \\ 1 \end{bmatrix}    (1.19)
From the first two rows of (1.19), with Z_world = 0:

X_world = (x_image + 2.4079) / 0.0106    (1.20)

Y_world = (y_image + 1.7735) / 0.0107    (1.21)
where x_image and y_image are in pixels and are determined by the blob analysis in the designed algorithms. So, after identifying the values of (x, y)_image in [pixels], we can calculate the (X, Y)_world in [mm] of the target objects from (1.20) and (1.21).
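The pixel-to-millimeter conversion of (1.20) and (1.21) can be sketched as a small MATLAB helper; the coefficients are the ones derived above, while the example centroid is a hypothetical value.

% Sketch of equations (1.20)/(1.21): convert a blob centroid from pixels to world
% millimeters, assuming the objects lie on the Z_world = 0 plane.
% The example centroid below is hypothetical.
pixelToWorld = @(x_img, y_img) deal((x_img + 2.4079) / 0.0106, ...   % X_world, eq. (1.20)
                                    (y_img + 1.7735) / 0.0107);      % Y_world, eq. (1.21)

[Xw, Yw] = pixelToWorld(310.5, 187.2);   % centroid in pixels from the blob analysis
fprintf('target at X = %.1f mm, Y = %.1f mm\n', Xw, Yw);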
The proposed algorithm should guide the gripper to grasp the objects at their centroids, so the centroid of each object must be obtained. The Blob Analysis block in Simulink is very similar to the regionprops function in MATLAB: both measure a set of properties for each connected object in an image, including area, centroid, bounding box, major and minor axes, orientation and so on (a minimal regionprops sketch is given below). The details of the proposed Simulink models will be explained in the next section. In the following sub-sections, three different image-processing algorithms are discussed.
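The sketch below illustrates what the Blob Analysis block and regionprops measure; the image file name and the stand-in segmentation step are assumptions for illustration only.

% Sketch: measure blob properties of a segmented binary image.
% 'workspace.png' is a hypothetical frame; a simple intensity threshold stands in
% for the real segmentation (e.g. the RGB Filter described later).
I  = imread('workspace.png');
BW = imbinarize(rgb2gray(I));
stats = regionprops(BW, 'Area', 'Centroid', 'BoundingBox', ...
                    'MajorAxisLength', 'MinorAxisLength', 'Orientation');

for k = 1:numel(stats)
    c = stats(k).Centroid;                 % [x_image, y_image] in pixels
    fprintf('blob %d: area = %d px, centroid = (%.1f, %.1f)\n', ...
            k, stats(k).Area, c(1), c(2));
end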
Motion estimation is defined here as the estimation of the displacement and velocity of features in an image frame with respect to the previous frame in a time sequence of 2D images.
The method tracks and estimates the velocity of the robot arm only. It assumes that all objects in the scene are rigid, so no shape changes are allowed. This assumption is often relaxed to local rigidity. It assures that optical flow captures real motions in a scene rather than expansions, contractions, deformations and/or shears of the various scene objects.
Optical flow methods try to calculate the motion between two image frames taken at times (t) and (t + δt) at every pixel position. These methods are called differential since they are based on local Taylor-series approximations of the image signal; that is, they use partial derivatives with respect to the spatial and temporal coordinates.
Assume I(x, y, t) is the center pixel in an n×n neighborhood that moves by δx, δy in time δt to I(x + δx, y + δy, t + δt). Since I(x, y, t) and I(x + δx, y + δy, t + δt) are the images of the same point (and therefore the same), we have:

I(x, y, t) = I(x + δx, y + δy, t + δt)
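The standard continuation of this argument (included here for completeness; it is not part of the extracted text) expands the right-hand side in a first-order Taylor series:

I(x + \delta x, y + \delta y, t + \delta t) \approx I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t

Subtracting I(x, y, t) and dividing by δt gives the optical-flow constraint equation

I_x u + I_y v + I_t = 0,  where u = δx/δt and v = δy/δt,

which the flow estimator solves (with additional smoothness or local-window assumptions) to obtain the velocity components u and v.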
The objective of this algorithm is to identify the target objects of interest and to track the moving objects within a video sequence. The tracking is based on the optical flow between video frames, in contrast to image-background-based detection. The proposed optical-flow method is straightforward, easier to implement than background-based detection, and gives better performance.
The idea of this project is derived from the tracking section of the demos listed on the MATLAB Computer Vision Toolbox website. The algorithm is implemented as a software simulation in Simulink.
For the velocity estimation, the Optical Flow block (yellow block) from the built-in Simulink library is used. The Optical Flow block reads the image intensity values and estimates the velocity of object motion. The velocity
estimation can be performed either between two images or between the current frame and the Nth frame back; see Fig. 1.12.
After obtaining the velocity from the Optical Flow block, the velocity threshold must be calculated in order to determine the minimum velocity magnitude that corresponds to a moving object (green subsystem block, see Fig. 1.12).
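A rough MATLAB counterpart of these two stages (the yellow Optical Flow block and the green threshold subsystem) is sketched below; the video file name and the threshold rule are assumptions for illustration.

% Rough equivalent of the Optical Flow block followed by a velocity threshold.
% 'robot.avi' and the threshold rule are assumed for illustration.
reader  = VideoReader('robot.avi');
flowObj = opticalFlowLK('NoiseThreshold', 0.01);    % Lucas-Kanade flow estimator

while hasFrame(reader)
    frame = rgb2gray(readFrame(reader));
    flow  = estimateFlow(flowObj, frame);           % flow between consecutive frames

    speed      = flow.Magnitude;                    % per-pixel velocity magnitude
    thr        = 0.5 * mean(speed(:));              % simple adaptive velocity threshold
    movingMask = speed > thr;                       % pixels treated as moving objects
end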
The size of the matrix that represents the RGB color space depends on the bit depth that is used. MATLAB uses a standard bit depth of 8 bits per channel when an image is imported. This means that there are 256 tones of each primary color, so the size of the color-space matrix is 256×256×256.
The Simulink model for this algorithm mainly consists of two parts, which are "Identifying RGB of target objects and Gripper Label" and "Boundary Box, Centroid Determination". For the RGB identification, a color-analyzer program (Camtasia Studio) is used to identify the RGB values of the objects, and a Simulink subsystem block called "RGB Filter" is built for the proposed RGB values as input; see Fig. 1.16 and Fig. 1.17.
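A rough MATLAB counterpart of the RGB Filter subsystem is sketched below; the image file name, reference color and tolerance are assumptions, not the values measured with Camtasia Studio.

% Sketch of an RGB filter: keep pixels close to a measured reference color.
% The file name, reference color and tolerance are assumed for illustration.
rgb       = imread('workspace.png');    % hypothetical captured frame
targetRGB = [200, 40, 35];              % assumed reference color of a target object
tol       = 30;                         % allowed deviation per channel (0-255)

d    = double(rgb) - reshape(targetRGB, 1, 1, 3);
mask = all(abs(d) <= tol, 3);           % true where all three channels are close
BW   = bwareaopen(mask, 50);            % remove small specks before blob analysis

The resulting binary mask BW is the kind of image that the Blob Analysis block (or regionprops) then measures.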
After the RGB values are obtained from the RGB Filter block, the result is passed to the Blob Analysis block in order to obtain the bounding box, the centroid of the object and the corresponding box area; see Fig. 1.14.
Fig. 1.19 Subsystem that takes the median over time of each pixel
The absolute value of the difference between the whole picture and the background is taken to eliminate negatives. Then a threshold is established, so that anything above it is in the foreground and becomes white, and anything below it is in the background and becomes black; see Fig. 1.20.
Fig. 1.20 Subsystem that determines the threshold and blob analysis
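A compact MATLAB sketch of this median-background approach is given below; the video file name and the threshold value are assumptions for illustration.

% Sketch of background subtraction with a per-pixel temporal median.
% 'workspace.avi' and the threshold are assumed for illustration.
reader = VideoReader('workspace.avi');
frames = {};
while hasFrame(reader)
    frames{end+1} = double(rgb2gray(readFrame(reader)));  %#ok<AGROW>
end
frames = cat(3, frames{:});                      % M x N x T stack of grayscale frames

background = median(frames, 3);                  % per-pixel median over time
current    = frames(:, :, end);                  % latest frame
diffImg    = abs(current - background);          % absolute difference, no negatives

thr        = 25;                                 % assumed intensity threshold
foreground = diffImg > thr;                      % white = foreground, black = background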
Unfortunately, this software does not work perfectly due to lag. The
system takes almost a full second to get a new frame and analyze it for
background and object detection.