
INTRODUCTION

TO COMPUTER VISION
(Computer Vision and Robotics)

Prepared by:

Ahmed Eslam Mohammed Solyman (M.D.)


Assistant Lecturer- Mechanical Power Engineering
Mechatronics and Robotics
Egyptian Atomic Energy Authority
Radioisotopes production facility
Cairo, Egypt
01000480510

(February 2019)

Contents

CHAPTER 1

1.1 Introduction
1.2 Motion Control Strategies
1.3 Digital Image Representation
    Image Coordinates
1.4 Pinhole Camera Model
    1.4.1 Central Projection in Homogeneous Coordinates
    1.4.2 The World Coordinate System
    1.4.3 The general camera calibration matrix K
1.5 Single Camera Calibration
1.6 Image Processing Algorithms
    1.6.1 Motion Tracking Based on Optical Flow
    1.6.2 Motion Tracking Based on RGB Color
    1.6.3 Object Tracking Using Background Subtraction

CHAPTER 1

COMPUTER VISION

1.1 Introduction
This chapter describes the vision-based control strategies for pick-and-place
robotic applications. The software implementation of these strategies is
accomplished using MATLAB/Simulink from MathWorks. The vision algorithms
identify the objects of interest and send their position and orientation data
to the data acquisition system and then to the microcontroller, which solves
the inverse kinematics and commands the robot to pick these objects and place
them at the target location.

Fig. 1.1 shows the sequence used to accomplish the above-mentioned


target. The camera captures the image/video and sends it to a PC where
the vision algorithms are implemented. The PC processes the image/video and
sends the position and orientation data of the objects and end effector to
the microcontroller, where the robot inverse kinematics are solved. The
microcontroller then sends the motion signals to the robot motors to
accomplish the task.


Fig. 1.1 Sequence of computer vision (blocks: Camera, PC, Data Acquisition, Microcontroller, Robot Motors)

1.2 Motion Control Strategies


Fig. 1.2 illustrates the basic idea of using the camera to get
information about the goal position and orientation. The frame {C} is the
camera frame and the frames {B}, {S}, {G}, {W} and {T} are as
illustrated in chapter three.

Fig. 1.2 Coordinate systems and camera frame ({C}, {B}, {S}, {G}, {W}, {T})

The robot kinematic equation as discussed before can be written as:

$$ {}^{B}T_{W} = {}^{B}T_{S} \cdot {}^{S}T_{G} \cdot \left({}^{W}T_{G=T}\right)^{-1} \qquad (1.1) $$

where ^B T_S and (^W T_{G=T})^{-1} are known from the physical dimensions of
the robot, and the robot joint variables are contained in ^B T_W. To obtain
^B T_W, the matrix ^S T_G, which gives the position and orientation of the
goal relative to the frame {S}, must be determined as described
in sections 1.5 and 1.6.
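As a minimal numerical sketch of evaluating (1.1) in MATLAB, with made-up 4×4 homogeneous transforms rather than the actual robot dimensions:

```matlab
% Minimal sketch of equation (1.1); the transform values are illustrative
% placeholders, not the real robot dimensions.
th  = deg2rad(30);                                   % example goal rotation about Z
Rz  = [cos(th) -sin(th) 0; sin(th) cos(th) 0; 0 0 1];

B_T_S = [eye(3) [0.30; 0.00; 0.10]; 0 0 0 1];        % base -> station (known geometry)
S_T_G = [Rz     [0.25; 0.10; 0.00]; 0 0 0 1];        % station -> goal (from vision)
W_T_G = [eye(3) [0.00; 0.00; 0.05]; 0 0 0 1];        % wrist -> goal (gripper offset)

B_T_W = B_T_S * S_T_G / W_T_G;                       % A / B is A * inv(B) in MATLAB
disp(B_T_W)
```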

It is worth noting that the image coordinates (x, y)_image are determined
through the image processing algorithms discussed in the following sections.
However, determining the world coordinates (X, Y)_world of the object centroid
in millimetres requires a proper calibration process (see section 1.5).

The above-mentioned approach is illustrated in the block diagram shown
in Fig. 1.3.

Fig. 1.3 Open-loop control block diagram (Camera → Image Processing (x, y in pixels) → Camera Calibration (X, Y in mm) → Robot Inverse Kinematics (robot variables d1, θ2, θ3, θ4, θ5) → Robot)

As can be seen, the above strategy is an open-loop control system. Certain
errors can arise from different sources such as inverse-kinematics inaccuracy,
robot precision and camera calibration.

Another strategy, based on a closed-loop feedback signal, is going to be
tested in the current research. The algorithm depends on capturing the
position of the end-effector and the position and orientation of the pencil
(the goal), and developing an algorithm that makes the end effector coincide
with the goal. The proposed algorithm in this case can be represented by the
block diagram shown in Fig. 1.4.

Fig. 1.4 Closed-loop control block diagram (Camera → Image/Video Processing (x, y in pixels) → Feedback Algorithm (error) → Camera Calibration (X, Y in mm) → Robot Inverse Kinematics (robot variables d1, θ2, θ3, θ4, θ5) → Robot)

In the previous chapters we have already discussed the robot and the inverse
kinematics (the green blocks in Fig. 1.3 and Fig. 1.4). In this chapter the
camera calibration, image processing and feedback algorithm are discussed
(the red blocks in Fig. 1.3 and Fig. 1.4).

1.3 Digital Image Representation


A digital image consists of a finite set of values called picture elements,
or pixels for short. These pixels are arranged in a regular grid (or raster)
of rows and columns, so it can be useful to think of an image as a matrix.
Every pixel in a greyscale image (also called an intensity image) is stored
as an 8-bit unsigned integer, meaning that it can have an integer value
between 0 and 255. A value of 0 corresponds to pitch black, a value of 255 to
pure white, and values between these extremes produce various grey levels
between black and white.

A color image is also stored as a raster of pixels. Every pixel is now
represented by three integer values between 0 and 255: one for red, one for
green and one for blue. These three primary intensities are added to
reproduce a certain color on the screen, and this commonly used way of
representing color is called the RGB color scheme (see Fig. 1.5).

Fig. 1.5 Examples of RGB color values [0:255].

An image may be defined as a two-dimensional function, f(x, y), where x and y
are spatial (plane) coordinates, and the amplitude of f at any pair of
coordinates (x, y) is called the intensity or grey level of the image at that
point. When x, y, and the amplitude values of f are all finite, discrete
quantities, we call the image a digital image. The field of digital image
processing refers to processing digital images by means of a digital
computer. Note that a digital image is composed of a finite number of
elements, each of which has a particular location and value. These elements
are referred to as picture elements, image elements, and pixels. Pixel is the
term most widely used to denote the elements of a digital image.

Image Coordinates
Assume that an image f(x, y) is sampled so that the resulting image has M
rows and N columns; the image is then of size M × N. The values of the
coordinates are discrete quantities. For notational clarity and convenience,
we shall use integer values for these discrete coordinates. The image origin
is usually defined to be at (x, y) = (0, 0). The next coordinate value along
the first row of the image is (x, y) = (0, 1). The notation (0, 1) is used to
signify the second sample along the first row. It does not mean that these
are the actual values of the physical coordinates when the image was sampled.

Fig. 1.6 shows this coordinate convention. Note that x ranges from 0 to (M−1)
and y from 0 to (N−1) in integer increments.

Fig. 1.6 Digital image coordinate conventions.

The coordinate system in Fig. 1.6 and the preceding discussion lead
to the following representation for a digitized image:

$$ f(x, y) = \begin{bmatrix} f(0,0) & f(0,1) & \cdots & f(0, N-1) \\ f(1,0) & f(1,1) & \cdots & f(1, N-1) \\ \vdots & \vdots & & \vdots \\ f(M-1,0) & f(M-1,1) & \cdots & f(M-1, N-1) \end{bmatrix} $$
The right side of this equation is a digital image by definition. Each
element of this array is called an image element, picture element or pixel.
The terms image and pixel are used throughout the rest of our discussions
to denote a digital image and its elements.
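
To make the pixel-level description above concrete, here is a small MATLAB check; the image file name is only a placeholder.

```matlab
% Inspect a greyscale image as a matrix of 8-bit intensity values.
% 'parts.png' is a placeholder file name, not from the original work.
I = imread('parts.png');            % read the image from disk
if ndims(I) == 3
    I = rgb2gray(I);                % convert an RGB image to greyscale
end
[M, N] = size(I);                   % M rows, N columns
fprintf('Image size: %d x %d, class %s\n', M, N, class(I));   % typically uint8
p = I(1, 1);                        % first pixel: 0 = black ... 255 = white
fprintf('f(0,0) = %d\n', p);        % note MATLAB indexes from 1, the text from 0
```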


1.4 Pinhole Camera Model


Here we develop a basic pinhole camera model. The model performs
well as long as the lens is thin and no wide-angle lens is used. In practice
the image plane is located behind the lens, but to simplify calculations
and relations between the coordinate systems, the image plane can be put
in front of the lens. Fig. 1.7 illustrates the pinhole camera model with the
image plane located in front of the lens.

Fig. 1.7 Illustration of the pinhole camera model, image plane in front
of the lens to simplify calculations [Courtesy of Maria
Magnusson Seger].

The hardest part of this model is keeping track of the different coordinate
systems. Before we continue, we have to define the camera coordinate system.
In most electro-optical cameras the image plane is defined by the sensor
plane. The camera coordinate system is centered at the focus of the camera
with its X and Y axes parallel to, and its Z axis perpendicular to, the image
plane (see Fig. 1.8).

The image plane is also provided with a coordinate system to record


the position of features on the image. In practice of course, positions will
be measured in pixel coordinates, so ultimately we’ll have to make
provision to measure in pixel coordinates. The object or scene to be
captured is described in terms of a world coordinate system. It is

therefore often convenient to fix the world coordinate system to the
object or scene.

This section is about deriving the relationships between these


coordinate systems. Written in homogeneous coordinates it is a linear
relationship expressed in terms of a matrix called the camera matrix.

1.4.1 Central Projection in Homogeneous Coordinates


Here we consider the central projection of a point X = [X Y Z]^T in the
camera coordinate system, with origin at the camera center C, onto the image
plane. The image plane is located at Z = f in the camera coordinate system,
where f is known as the focal length of the camera.

The point where the Z axis pierces the image plane is known as the
principal point and the Z axis as the principal axis. The origin of the
image coordinate system is chosen, for now, as the principal point and its
x- and y axes are aligned with the X and Y axes of the camera coordinate
system. All of this is illustrated in Fig. 1.8.

Fig. 1.8 Illustrating basic camera geometry.

If a point X has coordinates [X Y Z]T relative to the camera


coordinate system, X projects onto the point x on the image plane, with C
the center of the projection, as in Fig. 1.8. Using homogeneous

coordinates this projection is described by a matrix P. We’ll variously
refer to this matrix as the camera matrix or the projection matrix,
depending on which aspect we wish to emphasize.

The map from homogeneous camera coordinates to homogeneous


image coordinates is given by:

$$ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (1.2) $$

Using this notation, we can describe the central projection


from Xcam to ximage as:

x_image = P X_cam                                  (1.3)

where P is the 3×4 homogeneous camera projection matrix. This defines the
camera matrix for the central projection as:

$$ P = \operatorname{diag}(f, f, 1)\,[\,I \mid \mathbf{0}\,] \qquad (1.4) $$

The camera matrix derived above assumes that the origin of the image
coordinate system is at the principal point p. However, this is not usually
the case in practice. Suppose the coordinates of the principal point p are
(p_x, p_y) in the image coordinate system, as in Fig. 1.9.


Fig. 1.9 Illustrating camera geometry with offset image coordinates.

From Fig. 1.9 the mapping of X_cam to x_image is given by:

$$ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (1.5) $$

Then the camera calibration matrix is:

$$ K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (1.6) $$

The camera matrix P is given by:

$$ P = K\,[\,I \mid \mathbf{0}\,] \qquad (1.7) $$

Emphasizing the fact that we are projecting features described in
terms of the camera coordinate system, we rewrite the projection as:

$$ x_{image} = K\,[\,I \mid \mathbf{0}\,]\,X_{cam} \qquad (1.8) $$

The next step is to introduce the world coordinate system and relate
it to the camera coordinate system.

1.4.2 The World Coordinate System


In general, 3D objects are described in terms of coordinate systems fixed to
the objects, as shown in Fig. 1.10. In homogeneous coordinates a world point
is given by:

$$ X_{world} = [\,X \;\; Y \;\; Z \;\; 1\,]^{T} \qquad (1.9) $$

Fig. 1.10 Camera geometry in a general world coordinate system.

Since we already know how to project a feature in the camera


coordinate system onto the image coordinate system, we only need to
relate the world and camera coordinate systems, i.e. Xworld and Xcam. Since

the two coordinate systems are related by a rotation (R) and a translation
(t), as is clear from Fig. 1.10, we may write:

Xcam= R(Xworld – C) = RXworld + t (1.10)

The Euclidean vector C in (1.10) gives the coordinates of the camera center
in the world coordinate system, and the parameters (R, t) are called the
extrinsic parameters, that is, the rotation and translation which relate the
world coordinate system to the camera coordinate system (see Fig. 1.10).
From (1.10) the translation vector t is equal to (−RC). Also note that
X_cam = 0 if X_world = C, i.e. the camera coordinates are zero at the camera
center, as expected. Combining (1.9) and (1.10), we can write:

$$ X_{cam} = \begin{bmatrix} R & -RC \\ \mathbf{0}^{T} & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ \mathbf{0}^{T} & 1 \end{bmatrix} X_{world} \qquad (1.11) $$

Combining (1.8) and (1.11), we get:

$$ x_{image} = K\,[\,R \mid t\,]\,X_{world} \qquad (1.12) $$
where K is the camera intrinsic matrix and X_world is now given in the world
coordinate system. Note that all the parameters that refer to the specific
type of camera are contained in K; these parameters are referred to as the
intrinsic parameters. R and t describe the orientation and position of the
world coordinate system relative to the camera coordinate system and are
therefore referred to as the extrinsic parameters (see Fig. 1.11).

Fig. 1.11 The extrinsic and intrinsic parameters
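
As a small worked example of (1.12), sketched in MATLAB with illustrative numbers (not the calibrated values of section 1.5):

```matlab
% Worked example of x_image = K [R t] X_world, equation (1.12).
% All numbers are illustrative; the calibrated values appear in section 1.5.
f  = 1000;                            % focal length in pixel units
K  = [f 0 320; 0 f 240; 0 0 1];       % principal point assumed at (320, 240)
R  = eye(3);                          % camera axes aligned with the world frame
t  = [0; 0; 500];                     % world origin 500 units in front of the camera
Xw = [100; 50; 0; 1];                 % homogeneous world point

x  = K * [R t] * Xw;                  % homogeneous image point [x; y; z]
uv = x(1:2) / x(3);                   % divide by z to get pixel coordinates
fprintf('pixel coordinates: (%.1f, %.1f)\n', uv(1), uv(2));    % (520.0, 340.0)
```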

1.4.3 The general camera calibration matrix K


In the models above we assume that the image coordinate frame is
Euclidean with equal scales in both axial directions, which is not always
true. In particular, if the number of pixels per unit distance in image
coordinates are mx and my in the x and y directions, respectively, then the
calibration matrix becomes:

$$ K = \begin{bmatrix} m_x & 0 & 0 \\ 0 & m_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \alpha_x & 0 & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (1.13) $$

where α_x = f m_x and α_y = f m_y represent the focal length of the camera in
terms of pixel coordinates in the x and y directions, respectively. Similarly,
(x_0, y_0) is the principal point in terms of pixel coordinates, with
x_0 = m_x p_x and y_0 = m_y p_y.
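
For example, with hypothetical numbers (not the calibrated camera of section 1.5), a lens of focal length f = 4 mm on a sensor with m_x = 266 pixels/mm gives

$$ \alpha_x = f\,m_x = 4\ \text{mm} \times 266\ \text{pixels/mm} = 1064\ \text{pixels}, $$

which is the order of magnitude seen in the calibrated matrix of section 1.5.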

For added generality, we use a calibration matrix of the form:


$$ K = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (1.14) $$

where the added parameter s is referred to as the skew parameter. The skew
parameter will be zero for most normal cameras.

1.5 Single Camera Calibration


Using a position-based method, a 3D camera calibration is required in order
to map the 2D image-feature data to Cartesian-space data. That is to say, the
intrinsic and extrinsic parameters of the camera must be evaluated. Intrinsic
parameters depend exclusively on the optical characteristics, e.g. lens and
CCD sensor properties. The calibration of intrinsic parameters can be
performed offline when the optical setup is fixed during the operative tasks
of the robot. Extrinsic parameters indicate the relative pose of the camera
reference system O_C-(X_C, Y_C, Z_C) with respect to a generic world
reference system. It is assumed that the world reference system
O_S-(X_S, Y_S, Z_S) is a system fixed to the target objects (see Fig. 1.2),
so that the extrinsic parameters directly give the pose of the camera with
respect to the target. The extrinsic parameters matrix coincides with the
homogeneous transformation between the camera and the object reference
systems:

$$ {}^{C}T_{S} = \begin{bmatrix} {}^{C}R_{S} & {}^{C}t_{S} \\ \mathbf{0}^{T} & 1 \end{bmatrix} \qquad (1.15) $$

We can use the Camera Calibrator app of MATLAB's Computer Vision Toolbox to
estimate the camera intrinsic, extrinsic, and lens distortion parameters.
These camera parameters can be used for various computer vision applications,
such as removing the effects of lens distortion from an image, measuring
planar objects, or 3-D reconstruction from multiple cameras.
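
A minimal MATLAB sketch of this calibration workflow (the checkerboard image file names and the 25 mm square size are assumptions, not values from this work):

```matlab
% Minimal single-camera calibration sketch with the Computer Vision Toolbox.
% The image files and the 25 mm square size are assumed placeholders.
imageFiles  = {'calib01.jpg','calib02.jpg','calib03.jpg'};       % checkerboard views
[imagePoints, boardSize] = detectCheckerboardPoints(imageFiles); % corner detection
squareSize  = 25;                                                % in millimetres
worldPoints = generateCheckerboardPoints(boardSize, squareSize); % planar pattern points

cameraParams = estimateCameraParameters(imagePoints, worldPoints);  % intrinsics + distortion
K = cameraParams.IntrinsicMatrix;    % stored in transposed form

% Extrinsics (R, t) of one view, i.e. the pose of the camera w.r.t. the pattern:
I1 = imread(imageFiles{1});
[R, t] = extrinsics(detectCheckerboardPoints(I1), worldPoints, cameraParams);
```

Note that MATLAB stores the intrinsic matrix in transposed form, which matches the layout of the matrix k reported in (1.18) below.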

It is assumed that the world reference system [station frame {S},
O_S-(X_S, Y_S, Z_S)] is fixed to the target objects, so that the extrinsic
parameters (R, t) directly give the pose of the camera with respect to the
target. Using the Camera Calibrator we can export the camera parameters
R, t and k:

$$ {}^{C}R_{S} = \begin{bmatrix} 0.9992 & 0.0027 & -0.0399 \\ -0.0020 & 0.9999 & 0.0136 \\ 0.0399 & -0.0162 & 0.9991 \end{bmatrix} \qquad (1.16) $$

$$ {}^{C}t_{S} = \begin{bmatrix} -226.1425 & -166.5716 & 1.24\times 10^{3} \end{bmatrix} \qquad (1.17) $$

$$ k = \begin{bmatrix} 1.064\times 10^{3} & 0 & 0 \\ -0.8118 & 1.0658\times 10^{3} & 0 \\ 258.3899 & 295.171 & 1 \end{bmatrix} \qquad (1.18) $$

Substituting these calibration results into (1.12), we can write:

$$ \begin{bmatrix} x_{image} \\ y_{image} \\ z_{image} \end{bmatrix} = \begin{bmatrix} 1.064\times 10^{3} & 0 & 0 \\ -0.8118 & 1.0658\times 10^{3} & 0 \\ 258.3899 & 295.171 & 1 \end{bmatrix} \begin{bmatrix} 0.9992 & 0.0027 & -0.0399 & -226.1425 \\ -0.0020 & 0.9999 & 0.0136 & -166.5716 \\ 0.0399 & -0.0162 & 0.9991 & 1.24\times 10^{3} \end{bmatrix} \begin{bmatrix} X_{world} \\ Y_{world} \\ Z_{world} \\ 1 \end{bmatrix} \qquad (1.19) $$

$$ \begin{bmatrix} x_{image} \\ y_{image} \\ z_{image} \end{bmatrix} = \begin{bmatrix} 0.0106 & 0 & -0.0004 & -2.4079 \\ 0 & 0.0107 & 0.0002 & -1.7735 \\ 0.0026 & 0.0030 & 0 & -1.0636 \end{bmatrix} \begin{bmatrix} X_{world} \\ Y_{world} \\ Z_{world} \\ 1 \end{bmatrix} \qquad (1.20) $$

Solving (1.20), we get:

$$ X_{world} = \frac{x_{image} + 2.4079}{0.0106} \qquad (1.21) $$

$$ Y_{world} = \frac{y_{image} + 1.7735}{0.0107} \qquad (1.22) $$

where x_image and y_image are in pixels and are determined by the blob
analysis of the designed algorithms. So, after identifying the values of
(x, y)_image in pixels, we can calculate (X, Y)_world in mm for the target
objects from (1.21) and (1.22).
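
The linear maps (1.21) and (1.22) are simple enough to wrap in a small MATLAB helper, sketched here with the coefficients derived above:

```matlab
% Convert blob-analysis pixel coordinates to world coordinates in mm using
% the linear relations (1.21) and (1.22) derived above.
pixelToWorld = @(xImage, yImage) deal((xImage + 2.4079) / 0.0106, ...
                                      (yImage + 1.7735) / 0.0107);

[Xworld, Yworld] = pixelToWorld(10, 10);       % example centroid in pixels
fprintf('X = %.1f mm, Y = %.1f mm\n', Xworld, Yworld);
```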

1.6 Image Processing Algorithms


Image processing algorithms start with a given picture or video and end up
with useful information about a certain object, such as its position and
orientation or the number of objects, and so on. In the present study the
pencil position and orientation and the end-effector position are the data to
be determined by the image processing algorithms.

The proposed algorithm should guide the gripper to grasp the objects at their
centroids, so the centroids of the objects must be obtained. The blob
analysis block in Simulink is very similar to the "regionprops" function in
MATLAB. They both measure a set of properties for each connected object in an
image. The properties include area, centroid, bounding box, major and minor
axes, orientation and so on. The details of the proposed Simulink models will
be explained in the next section. In the following sub-sections three
different image processing algorithms are discussed.

1.6.1 Motion Tracking Based on Optical Flow


A great deal of information can be extracted by recording time-varying image
sequences using a fixed camera. An image sequence (or video) is a series of
2-D images that are sequentially ordered with respect to time. Motion
estimation is defined here as the estimation of the displacement and velocity
of features in an image frame with respect to the previous frame in a time
sequence of 2D images.

The method tracks and estimates the velocity of the robot arm only. It
assumes all objects in the scene are rigid, so no shape changes are allowed.
This assumption is often relaxed to local rigidity. It assures that optical
flow actually captures real motions in a scene rather than expansions,
contractions, deformations and/or shears of various scene objects.

Optical flow is the distribution of apparent velocities of movement of
brightness patterns in an image. Optical flow can arise from relative motion
of objects and the viewer. Consequently, optical flow can give important
information about the spatial arrangement of the objects viewed and the rate
of change of this arrangement. Computation of differential optical flow is,
essentially, a two-step procedure:

a. Measure the spatio-temporal intensity derivatives (which are


equivalent to measuring the velocities normal to the local intensity
structures).
b. Integrate normal velocities into full velocities, for example, either
locally via a least squares calculation or globally via regularization.

The optical flow methods try to calculate the motion between two
image frames which are taken at times (t) and (t + δt) at every voxel
position. These methods are called differential since they are based on
local Taylor series approximations of the image signal; that is, they
use partial derivatives with respect to the spatial and temporal
coordinates.

Assume I(x, y, t) is the center pixel in an n×n neighborhood that moves by
δx, δy in time δt to I(x+δx, y+δy, t+δt). Since I(x, y, t) and
I(x+δx, y+δy, t+δt) are images of the same point (and therefore equal) we
have:

I(x, y, t) = I(x + δx, y + δy, t + δt)                       (1.23)

Expanding the right-hand side of (1.23) in a first-order Taylor series and
neglecting higher-order terms gives the optical flow constraint equation,
which the Horn-Schunck method then solves together with a global smoothness
assumption:

I_x V_x + I_y V_y = −I_t                                     (1.24)

where I_x, I_y and I_t are the intensity derivatives with respect to x, y and
t respectively, and V_x and V_y are the x and y components of the velocity,
or optical flow, of I(x, y, t).

The video sequence is captured using a fixed camera. The Optical Flow block,
using the Horn-Schunck algorithm (1981), estimates the direction and speed of
object motion from one video frame to another and returns a matrix of
velocity components. Various image processing techniques such as thresholding
and median filtering are then sequentially applied to obtain labeled regions
for statistical analysis.

Thresholding is the simplest method of image segmentation. The


process of thresholding returns a threshold image differentiating the
objects in motion (in white) and static background (in black). More
precisely, it is the process of assigning a label to every pixel
in an image such that pixels with the same label share certain
visual characteristics.

A median filter is then applied to remove salt-and-pepper noise from the
thresholded image without significantly reducing the sharpness of the image.
Median filtering is a simple and very effective noise-removal filtering
process and an excellent filter for eliminating intensity spikes.

The objective of this algorithm is to identify the targeted objects of
interest and track the moving objects within a video sequence. The tracking
of the object is based on optical flow between video frames, in contrast to
image background-based detection. The proposed optical flow method is
straightforward, easier to implement than background-based detection, and
performs well.

The idea of this algorithm is derived from the tracking section of the demos
listed on the MATLAB Computer Vision Toolbox website. The algorithm is
implemented as a software simulation in Simulink.

The Simulink model for this algorithm mainly consists of


three parts, which are “Velocity Estimation (yellow block)”, “Velocity
Threshold Calculation (green block)” and “Blob analysis (Centroid
Determination) (red block)”, see Fig.1.12.

For the velocity estimation, the Optical Flow block (yellow block) from the
built-in Simulink library is used. The Optical Flow block reads the image
intensity values and estimates the velocity of object motion. The velocity
estimation can be performed either between two images or between the current
frame and the Nth frame back (see Fig. 1.12).

After obtaining the velocity from the Optical Flow block, the velocity
threshold must be calculated in order to determine the minimum velocity
magnitude that corresponds to a moving object (green subsystem block, see
Fig. 1.12).

The velocity threshold can be obtained, as shown in Fig. 1.13, by first
computing the mean velocity value across each frame and across time, passing
the velocity through a pair of Mean blocks (orange blocks).

After that, the input velocity is compared with the mean velocity value using
a relational operator block (gray block). If the input velocity is greater
than the mean value, it is mapped to one, and to zero otherwise. The output
of this comparison becomes a threshold intensity matrix and is passed to a
median filter block (green block) and a closing block (yellow block) to
remove noise (see Fig. 1.13).

Fig. 1.13 Subsystem that determines the velocity threshold


After segmenting the moving object from the background of the image, the
result is passed to the blob analysis block (red block, see Fig. 1.12) in
order to obtain the bounding box, the centroid of the object and the
corresponding box area (see Fig. 1.14).


Fig. 1.14 Subsystem that determines the blob analysis box
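
For reference, roughly the same pipeline can be sketched directly in MATLAB code using the Computer Vision Toolbox opticalFlowHS object instead of the Simulink blocks (the video file name, filter sizes and structuring element are assumptions):

```matlab
% Sketch of the optical-flow tracking pipeline: Horn-Schunck flow ->
% mean-based velocity threshold -> median filter and closing -> blob analysis.
% 'scene.avi' and the filter/structuring-element sizes are placeholders.
reader    = VideoReader('scene.avi');
opticFlow = opticalFlowHS;                        % Horn-Schunck optical flow estimator

while hasFrame(reader)
    frame = rgb2gray(im2single(readFrame(reader)));
    flow  = estimateFlow(opticFlow, frame);       % velocity w.r.t. the previous frame
    mag   = flow.Magnitude;

    mask  = mag > mean(mag(:));                   % threshold at the mean velocity
    mask  = medfilt2(uint8(mask), [5 5]) > 0;     % remove salt-and-pepper noise
    mask  = imclose(mask, strel('disk', 3));      % morphological closing

    stats = regionprops(mask, 'Centroid', 'BoundingBox', 'Area');   % blob analysis
    if ~isempty(stats)
        [~, k] = max([stats.Area]);
        disp(stats(k).Centroid);                  % centroid of the largest moving blob
    end
end
```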

1.6.2 Motion Tracking Based on RGB Color


In the RGB color space, each color is described as a combination of three
main colors, namely red, green and blue. This color space can be visualized
as a 3D matrix with the main colors set out on the axes. The values of the
main colors vary from 0 to 1. Each color is coded with three values: one for
red, one for green and one for blue. In this color space, an image imported
into a computer is thus transformed into three matrices with a value per
pixel for the corresponding main color (Fig. 1.15).

Fig. 1.15 RGB color space model



The size of the matrix that represents the RGB color space depends on the bit
depth that is used. MATLAB uses a standard bit depth of 8 bits per channel
when an image is imported. This means that there are 256 tones of each main
color, so the size of the color space is 256×256×256.

A 3D region inside this matrix must be defined to indicate a particular
color. This can be done by intuition, but it is a lot easier when the colors
are visualized. Because colors in the RGB space depend on three variables, a
single 2D image is not sufficient to visualize all colors. The color
definition can be done with two 2D images, but that is still rather
difficult.

The objective of this algorithm is to identify the target objects and track
the moving ones within a video sequence based on their colors. The tracking
of the gripper is based on the color of its label. The proposed RGB algorithm
is straightforward, easy to implement and performs well. This algorithm is
implemented as a software simulation in Simulink.

The Simulink model for this algorithm mainly consists of two parts, which are
"Identifying the RGB values of the target objects and the gripper label" and
"Bounding box and centroid determination". For the RGB identification, a
color analyzer (the Camtasia Studio program) is used to obtain the RGB values
of the objects, and a Simulink subsystem block called "RGB Filter" is built
for the chosen RGB input values (see Fig. 1.16 and Fig. 1.17).


Fig. 1.16 RGB-Based Simulink block diagram

Fig. 1.17 Subsystem that has the RGB values of objects

After obtaining the RGB values from the RGB Filter block, the result is
passed to the blob analysis block in order to obtain the bounding box, the
centroid of the object and the corresponding box area (see Fig. 1.14).
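
A rough MATLAB equivalent of the RGB Filter plus blob analysis stage (the frame file name and the numeric RGB bounds are placeholders for the values measured with the color analyzer):

```matlab
% Sketch of RGB-based segmentation followed by blob analysis. The RGB bounds
% below stand in for the values measured with the color analyzer.
frame = imread('frame.png');                       % placeholder frame
R = frame(:,:,1);  G = frame(:,:,2);  B = frame(:,:,3);

mask = R > 150 & G < 80 & B < 80;                  % keep pixels inside the RGB box (e.g. a red label)
mask = medfilt2(uint8(mask), [5 5]) > 0;           % clean up isolated noise pixels

stats = regionprops(mask, 'Centroid', 'BoundingBox', 'Area');   % blob analysis
for k = 1:numel(stats)
    fprintf('centroid (%.1f, %.1f) px, area %d px\n', ...
            stats(k).Centroid(1), stats(k).Centroid(2), stats(k).Area);
end
```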


1.6.3 Object Tracking Using Background Subtraction


Background subtraction, also known as foreground detection, is a technique in
the fields of image processing and computer vision in which an image's
foreground is extracted for further processing (object recognition, etc.).
Generally, an image's regions of interest are the objects (humans, cars,
text, etc.) in its foreground. Background subtraction is a widely used
approach for detecting moving objects in videos from static cameras. The
rationale of the approach is to detect the moving objects from the difference
between the current frame and a reference frame, often called the "background
image" or "background model".

Background subtraction is a particularly common technique for motion
segmentation in static scenes. It detects moving regions by subtracting the
current image, pixel by pixel, from a reference background image created by
averaging images over time during an initialization period. The basic idea of
the background subtraction method is first to initialize a background, and
then to subtract the current frame, in which the moving object is present,
from the background frame to detect the moving object. This method is simple
and easy to realize, and it accurately extracts the characteristics of the
target data, but it is sensitive to changes in the external environment, so
it is applicable only when the background is known.

This method of object detection involves finding the background and tracking
what is not a part of it. A median over time of each pixel of the video is
taken. The background is then subtracted from the image to get what remains
in the foreground. Technically, what we get is not the image of the
foreground but the difference between the background and the foreground.
This unfortunately means that if an object in the foreground is visually
close enough to the background, this method will not detect it (see Fig. 1.18
and Fig. 1.19).

Fig. 1.18 Background estimation Simulink block diagram

Fig. 1.19 Subsystem for the taken median over time of each pixel

The absolute value of the difference between the whole picture and the
background is taken to eliminate negatives. Then a threshold is established,
so that anything above it is in the foreground and becomes white, and
anything below it is in the background and becomes black (see Fig. 1.20).


Fig. 1.20 Subsystem that determines the threshold and blob analysis

Simulink can perform blob detection on the white objects and determine the
points needed to draw a rectangle around them (see Fig. 1.14).
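
The same background subtraction steps, sketched in MATLAB code (the video file name, the number of initialization frames and the threshold value are assumptions):

```matlab
% Sketch of background subtraction: per-pixel median over time as background,
% absolute difference, fixed threshold, then blob detection.
% 'scene.avi', the 50 initialization frames and the threshold 30 are assumptions.
reader = VideoReader('scene.avi');
nBg    = 50;
frames = zeros(reader.Height, reader.Width, nBg, 'uint8');
for k = 1:nBg
    frames(:,:,k) = rgb2gray(readFrame(reader));   % collect initialization frames
end
background = median(frames, 3);                    % per-pixel median over time

while hasFrame(reader)
    frame   = rgb2gray(readFrame(reader));
    diffImg = imabsdiff(frame, background);        % |current frame - background|
    mask    = diffImg > 30;                        % white = foreground, black = background
    stats   = regionprops(mask, 'BoundingBox');    % rectangles around the white objects
    % ...draw stats(k).BoundingBox on the frame as required
end
```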

Unfortunately, this software does not work perfectly due to lag. The
system takes almost a full second to get a new frame and analyze it for
background and object detection.

After applying the three software algorithms to the designed robot and
comparing them, it was noted that the RGB-based algorithm gave the best
response and results.

References
Solyman, A. E., Roman, M. R., Keshk, A. B., & Sharshar, K. A. (2016). Design and Simulation of 5-DOF Vision-Based Manipulator to Increase Radiation Safety for Industrial Cobalt-60 Irradiators. Arab Journal of Nuclear Sciences and Applications, 49(3), 250-261. DOI: 10.13140/RG.2.2.35899.46886

Solyman, A., Roman, M., Keshk, A., & Sharshar, K. (2016). Vision Control of 5-DOF Manipulator for Industrial Application. International Journal of Innovative Computing, Information and Control, 12(3), 735-746.

http://www.slideshare.net/sajit1975/26motion-and-feature-based-person-tracking

Willie Brink. Computer Vision, Applied Mathematics, Stellenbosch University, July 2013.

Rafael C. Gonzalez, Richard E. Woods and Steven L. Eddins. Digital Image Processing Using MATLAB, 2nd ed., Gatesmark Publishing, 2009.

Zhang, Z. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334.

Heikkilä, J., & Silvén, O. A four-step camera calibration procedure with implicit image correction. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1106-1112 (1997).

S. Indu, Manjari Gupta and Asok Bhattacharyya. Vehicle Tracking and Speed Estimation using Optical Flow Method. International Journal of Engineering Science and Technology (IJEST), 3(1), Jan 2011.

B. K. P. Horn and B. G. Schunck. Determining Optical Flow. Artificial Intelligence, Vol. 17, 1981, pp. 185-203.

Barron, J. L., Fleet, D. J., Beauchemin, S. S., & Burkitt, T. A. Performance of optical flow techniques. CVPR, 1992.

Johan Hallenberg. Robot Tool Center Point Calibration using Computer Vision. Master's thesis, Department of Electrical Engineering, Linköping University, SE-581 83, Linköping.
