
Received 29 December 2024, accepted 21 January 2025, date of publication 29 January 2025, date of current version 10 February 2025.

Digital Object Identifier 10.1109/ACCESS.2025.3536496

Vision-Based Pick and Place Control System for Industrial Robots Using an Eye-in-Hand Camera

VAN-TRUONG NGUYEN1, PHU-TUAN NGUYEN1, SHUN-FENG SU2 (Fellow, IEEE), PHAN XUAN TAN3 (Member, IEEE), AND THANH-LAM BUI1
1 Faculty of Mechatronics, SMAE, Hanoi University of Industry, Hanoi 11900, Vietnam
2 Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
3 College of Engineering, Shibaura Institute of Technology, Tokyo 135-8548, Japan
Corresponding authors: Van-Truong Nguyen ([email protected]) and Phan Xuan Tan ([email protected])
The associate editor coordinating the review of this manuscript and approving it for publication was Yingxiang Liu.
This work was supported by the Vingroup Innovation Foundation (VINIF) under Project VINIF.2023.DA089.

ABSTRACT In this paper, we present a vision-based pick-and-place control system for industrial robots
using an eye-in-hand camera. In industry, using robots with cameras greatly improves efficiency and
performance. Previous studies have focused on utilizing robotic arms for the pick-and-place process in
simulated environments. The challenge when experimenting with real systems lies in aligning the coordinate
systems between the robot and the camera, as well as ensuring high data accuracy during experimentation.
To address this issue, our research focuses on utilizing a low-cost 2D camera combined with deep learning
algorithms mounted on the end-effector of the robotic arm. This study is evaluated in both simulation
and real-world experiments. We propose a novel approach that combines the YOLOv7 (You Only Look
Once V7) deep learning network with GAN (Generative Adversarial Networks) to achieve fast and accurate
object recognition. This system uses deep learning to process camera data to extract object positions for
the robot in real-time. Due to its advantages of fast inference and high accuracy, YOLO is applied as the
baseline for research. By training the deep learning model on diverse objects, it effectively recognizes and
detects any object in the robot’s workspace. Through experimental results, we demonstrate the feasibility and
effectiveness of our vision-based pick-and-place system. Our research contributes an important advancement
in the field of industrial robots by showcasing the potential of using a 2D camera and an integrated deep
learning system for object manipulation.

INDEX TERMS Robotic arm, vision, object detection, calibration vision, robot real-time.

I. INTRODUCTION
Robotic arms have been utilized for several years, mostly in manufacturing facilities. Robotic arms are extensively utilized in the manufacturing and assembly processes of products, artificial intelligence, and automation. The basic function of a robotic arm is moving an object from the picking position to the target position [1], [2], [3]. To complete more complex tasks, the robotic arm needs to receive information from vision sensors [4], [5]. The information from a vision sensor typically includes the position and direction of all items on a conveyor belt [6], [7]. However, the integration of robotic arms with vision systems still presents challenges in terms of object recognition accuracy and difficulties in communication between vision and robot. Alternatively, industrial cameras could be adopted. Industrial cameras help the system operate with better efficiency. Robotic picking systems integrated with industrial cameras often require more standard accompanying equipment and complex operational procedures. Therefore, they are often very expensive. In practice, there have been solutions utilizing robotic arms for the pick-and-place process. However, these solutions are often limited to simulations, resulting in unclear practical applicability [8], [9]. Therefore, in this study, a solution using an eye-in-hand camera configuration mounted on the robotic arm's end-effector is proposed.

The proposed solution integrates YOLOv7 with GAN to enhance both accuracy and diversity in object detection tasks. Additionally, a method for calibrating the camera and robot coordinate systems using a square chessboard pattern is proposed to synchronize the system's coordinate frames.
In the literature review [10], the eye-hand model for a robot arm could be divided into five types: monocular eye-in-hand, monocular eye-to-hand, stereo eye-in-hand, stereo eye-to-hand, and hybrid multi-camera. In comparison to fixed cameras, the eye-in-hand camera structure offers greater flexibility in inspection and assembly tasks [11]. The eye-in-hand visual servoing structure, with a camera mounted at the end of the robot arm, offers many advantages when grasping stationary or moving objects. However, with the eye-to-hand structure, it is easy to see objects clearly and capture images from different angles as the robot executes its movement tasks [12]. Papanikolopoulos et al. [13] used a camera mounted on the robotic arm for observing moving objects with known depth information. The study presented an important method for monitoring moving objects using cameras. However, a system utilizing industrial cameras is often expensive and still heavily reliant on lighting conditions and image noise. Kijdech et al. [14] proposed a solution involving a robotic arm and an RGB-D camera with an eye-in-hand configuration. This system uses a camera mounted on the robot's end-effector in conjunction with YOLOv5 for pick-and-place tasks. The study employs an expensive RGB-D camera combined with YOLOv5 to determine the coordinates of objects in real time, achieving an accuracy rate of 90%-95%. F. S. Hameed and colleagues [15] proposed a method for object recognition and grasp orientation for a robotic arm using a 2D camera mounted on the robot's end-effector. This study utilizes a low-cost camera to identify object coordinates and grasp orientation for the robot. However, the method has not yet been tested on a real-world system, and challenges in real-time coordinate system synchronization between the camera and the robot remain unresolved. Ishak et al. [17] proposed using the eye-in-hand structure to grasp a stationary object in real time. However, the application of reclassification algorithms for the camera has not been implemented. Robots working in complex environments require supervision from a visual control system. Xingjian Liu et al. [18] proposed a calibration method between the robot and vision coordinate systems. The eye-in-hand configuration, with a scanner mounted on the end effector of the robotic arm, was employed, resulting in high accuracy and efficiency. This system uses a camera attached to the end of the robot arm to monitor movements in the work area [19], [20]. The experiments demonstrate the rationality of the vision system, but these methods have some limitations: they use expensive cameras, and the systems are often validated only in simulation. Additionally, in order for a robot to be able to recognise unfamiliar objects, it is necessary to utilize suitable image processing algorithms that can be integrated with the robotic arm [21]. The robotic arm is then capable of performing pick-and-place tasks with various objects, whether the objects are from a predefined dataset or previously unknown.
Nowadays, artificial intelligence (AI) is becoming popular and developing quickly. The rapid advancement of deep learning [22] has prompted researchers to undertake several studies pertaining to object categorization and localisation [23]. Choi et al. [24] utilized a CNN (Convolutional Neural Network) to detect various objects held by a robotic arm. Mohamed et al. [25] applied Faster R-CNN (Regions with Convolutional Neural Network features) based on 2D rangefinder data to detect and localize objects. In their study, T. Ye et al. [26] introduced a framework for detecting objects using UAVs (unmanned aerial vehicles) in infrared pictures and video. The feature is derived from the ground object, and object recognition is performed using the enhanced YOLOv5s model [27], [28]. Experiments showed the ability to detect and accurately locate various objects. However, these methods focus on image processing and have not been integrated into industrial robots. In industrial pick-and-place systems, 3D cameras or 3D scanners [29], [30] are commonly utilised. Such systems require a complex 3D reconstruction process, which involves the integration of depth sensors into the testing device, thereby increasing both the cost and complexity of the system [31]. Thus, for compatibility with industrial robotic systems, image data processing and communication with robots are necessary [32], [33].
Unlike previous studies, this research is evaluated on both simulated and real-world systems. The primary objective of this paper is to propose a comprehensive solution for accurately picking and placing objects using an eye-in-hand configuration mounted on a robotic arm. Therefore, this research proposes a stereo eye-in-hand pick-and-place system that, using a low-cost 2D camera, validates the system in both simulation and real-world dynamic environments. However, validating in real-world dynamic environments is extremely challenging: many objects cannot be covered by the training dataset, and the robot's coordinate system must be synchronized with that of the camera. Thus, our proposed system integrates a combination of YOLOv7 and GAN for effective data generation, appropriate real-time calibration using a chessboard pattern, and real-time, precise synchronization. In this setup, our system can effectively coordinate object recognition and position determination. The proposed method uses a YOLOv7 deep learning network combined with a GAN [34], which can identify objects with high speed and accuracy. The YOLOv7 algorithm is chosen for its superior performance and accuracy in real-time object detection compared to its predecessors. The GAN helps to generate more realistic and diverse data for training image sorting algorithms [35], [36], [37]. The YOLO algorithm is a standard network design across the entire process. This algorithm is simpler than the R-CNN algorithm [38], [39].
The data on the position and orientation (x-coordinate, y-coordinate, and rotation angle around the Z-axis) of objects are taken from the camera and converted into poses for the robot arm. The main difficulty lies in matching the coordinate systems of the camera and the robot. The proposed method uses a camera calibration algorithm with a chessboard structure consisting of 11 × 16 square cells. The parameters obtained after calibration are computed to align the coordinate systems of the camera and the robot. The coordinates of the feature points on the chessboard relative to the robot's origin are calculated and determined based on the end-effector. Through matrix transformations and calculations, all coordinates are synchronised to the robot's reference coordinate system.
This research achieves a breakthrough for industrial practice in that the robotic arm uses only a 2D camera and an integrated deep learning system to pick and place objects. This research paves the way for potential developments in applying various deep learning and reinforcement learning models to robotic arms. The proposed system is able to identify and detect any object in the workspace and execute the necessary operations, effectively performing pick-and-place tasks. This system demonstrates superior performance in comparison with baselines in simulation and shows effective performance in real-world validation.
Our main contributions can be summarized into four folds:
• Low-cost system: The use of a 2D camera significantly reduces the cost of the system while maintaining high accuracy for pick-and-place operations.
• Integration of YOLOv7 and GAN: The proposed method combines the YOLOv7 deep learning network with GAN to generate more diverse and realistic data for the robot to work in real time. This allows for effective data generation and high accuracy in object detection, even with limited real-world data.
• Calibration and synchronization: To use the system in a real-world environment, we introduce an appropriate calibration method using a chessboard pattern to ensure real-time synchronization between the robot and camera coordinate systems. This allows the system to perform pick-and-place tasks in real time with high accuracy and efficiency.
• Real-world validation: Simulation testing and evaluation of the system's performance were conducted. To ensure clarity in the research, both evaluations and experimental runs were carried out on the actual system. The product sorting system uses vision and a robotic arm to operate in real time with a sorting efficiency of 220-250 products per hour.

II. PROPOSED METHODOLOGY
This research builds a basic object recognition system using a Doosan robot with a vision module attached to the robot's end effector. The overall structure of the proposed system is shown in Figure 1.
FIGURE 1. Overall system architecture diagram.
The robot's end-effector movements are determined based on data from the eye-in-hand camera. The robot needs to collect two key types of data before it can perform pick-and-place tasks. Firstly, we need to determine the object type on the conveyor. Then, we use a deep learning network to generate an estimate of the object's 3D position. If there are multiple objects with different orientations and placements, the task becomes more complex. The data from the camera updates the object's status to the controller, which enables the Doosan robot to pick and place objects quickly and precisely.

A. HARDWARE STRUCTURE
The system used in this study includes a 6-DOF robotic arm, a camera mounted on the end-effector, a conveyor module, and sensors controlled by a programmable logic controller (PLC). Figure 2 illustrates the actual structure of the system we constructed.
The robotic arm operates within a circular work plane with a diameter of approximately 1.8 meters. The working distance between the focal point of the robot arm and the center of the camera is 779 mm, as shown in Figure 3. The maximum load the robot can handle within its workspace is 6 kg, and this varies according to the distance from the center of gravity. The arm's design and working parameters simplify its ability to move and perform tasks in 3-dimensional space.
We used a conveyor and sensors to position objects for optimal detection and classification. The sensor sends a signal to stop the conveyor when an object reaches its position, and the robotic arm then proceeds to recognize and classify the object. For communication between the conveyor, sensors, and Doosan robot, we chose the Modbus TCP/IP industrial communication protocol. This allows for reliable and efficient communication between the components of the system during operation, as sketched below.
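A minimal sketch of such a conveyor/sensor handshake over Modbus TCP/IP is shown next; it assumes the pymodbus library, and the PLC address, coil, and input numbers are hypothetical placeholders rather than values from our installation.

```python
# Illustrative sketch only: PLC IP address, coil/input addresses, and polling
# logic are hypothetical placeholders, not taken from the paper.
import time
from pymodbus.client import ModbusTcpClient

PLC_IP = "192.168.1.50"        # hypothetical PLC address
SENSOR_INPUT_ADDR = 0          # discrete input: object present at the stop position
CONVEYOR_COIL_ADDR = 0         # coil: conveyor run/stop command

def wait_for_object_and_stop_conveyor():
    """Poll the presence sensor; stop the conveyor when an object arrives."""
    client = ModbusTcpClient(PLC_IP, port=502)
    if not client.connect():
        raise ConnectionError("Cannot reach the PLC over Modbus TCP/IP")
    try:
        client.write_coil(CONVEYOR_COIL_ADDR, True)           # start the conveyor
        while True:
            rr = client.read_discrete_inputs(SENSOR_INPUT_ADDR, count=1)
            if not rr.isError() and rr.bits[0]:               # object detected
                client.write_coil(CONVEYOR_COIL_ADDR, False)  # stop the conveyor
                return
            time.sleep(0.05)                                  # 50 ms polling period
    finally:
        client.close()
```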
B. ROBOT HAND-EYE CALIBRATION METHOD
In recent years, there have been notable developments in camera calibration technology, resulting in improved efficiency of its results. Common modern calibration methods include point-to-point calibration [42], chessboard calibration [43], model-based calibration [44], and remote calibration [45].


The calibration of the 2D camera affixed to the end effector is a crucial procedure for determining an object's coordinates on the conveyor belt relative to the robot's coordinate system. To ensure precise grasping and placement of objects by the robot, it is imperative to attain a significant degree of precision in the synchronization of the two coordinate systems employed by the camera and the robot. The calibration process of a camera involves several sequential procedures, which encompass configuring the viewing angle and distance between the camera and the working position, establishing communication, and syncing the coordinate system of the robot with the camera.
In the eye-in-hand structure, the camera always moves in accordance with the robot's motion. This requires the intrinsic and extrinsic parameters of the camera to be consistent with every robot position. For a robot to be able to do pick-and-place tasks, the target coordinates need to be accurately determined. After successful calibration, the coordinates of the object relative to the camera are transformed into the reference coordinate system of the robot. The process of calibrating and computing the coordinate system between the camera and the robot's base coordinate system is quite challenging and requires precise arrangement and execution. The proposed method uses a structure and procedure for calibrating the robot hand-eye, as seen in Figure 5 and Figure 6. The system comprises a Doosan robotic arm, a 2D camera, and a calibration board. The calibration board, with dimensions of 210 × 135 mm and consisting of 11 × 16 square cells of 15 × 15 mm each, is securely mounted on a table. The ideal distance from the camera to the hand-eye panel is between 300 and 600 mm. The camera mounted on the last actuator of the robot captures and determines the pixel values of four points A, B, C, and D on the calibration board for calibration purposes. To complete the calibration process, the pixel coordinates obtained from the camera need to be transformed into the coordinates of the robot.
FIGURE 2. System hardware configuration.
FIGURE 4. The technique of calibrating the coordinates of the camera.
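As an illustration of this pixel-to-robot mapping on the conveyor plane, a minimal sketch using a planar homography computed from the four board corners A, B, C, and D is given below. This is not the Dart-Vision workflow used in the paper; OpenCV is assumed, and all numeric point values are hypothetical placeholders.

```python
# Illustrative sketch: map image pixels to robot-plane coordinates with a planar
# homography built from the four calibration points A, B, C, D. Values are placeholders.
import numpy as np
import cv2

# Pixel coordinates of A, B, C, D seen by the eye-in-hand camera (placeholders).
pixel_pts = np.array([[112.0, 96.0], [534.0, 101.0], [529.0, 388.0], [108.0, 383.0]],
                     dtype=np.float32)

# The same four corners expressed in the robot base frame, in millimetres (placeholders).
robot_pts = np.array([[450.0, -105.0], [450.0, 105.0], [315.0, 105.0], [315.0, -105.0]],
                     dtype=np.float32)

H = cv2.getPerspectiveTransform(pixel_pts, robot_pts)   # 3x3 planar homography

def pixel_to_robot(u: float, v: float):
    """Convert one detected pixel (u, v) to (x, y) on the conveyor plane in robot coordinates."""
    p = H @ np.array([u, v, 1.0])
    return float(p[0] / p[2]), float(p[1] / p[2])

print(pixel_to_robot(320.0, 240.0))
```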

For the estimation of the five intrinsic parameters and all extrinsic parameters, we apply the closed-form solution. The closed-form solution provides direct results through mathematical expressions. Because the closed-form solution does not rely on iteration, it is less affected by the numerical instability and convergence issues that can occur in iterative methods. This ensures that the results are computed quickly and with high accuracy.
FIGURE 3. Robot's workspace and payload capacity.

A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}   (1)

B = A^{-T} A^{-1} = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{bmatrix}   (2)

  = \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{bmatrix}   (3)

  = \begin{bmatrix} \frac{1}{\alpha^2} & -\frac{\gamma}{\alpha^2\beta} & \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} \\ -\frac{\gamma}{\alpha^2\beta} & \frac{\gamma^2}{\alpha^2\beta^2} + \frac{1}{\beta^2} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} \\ \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} & \frac{(v_0\gamma - u_0\beta)^2}{\alpha^2\beta^2} + \frac{v_0^2}{\beta^2} + 1 \end{bmatrix}   (4)

v_0 = \frac{B_{12}B_{13} - B_{11}B_{23}}{B_{11}B_{22} - B_{12}^2}   (5)

\lambda = B_{33} - \frac{B_{13}^2 + v_0(B_{12}B_{13} - B_{11}B_{23})}{B_{11}}   (6)

\alpha = \sqrt{\frac{\lambda}{B_{11}}}   (7)

\beta = \sqrt{\frac{\lambda B_{11}}{B_{11}B_{22} - B_{12}^2}}   (8)


\gamma = -\frac{B_{12}\alpha^2\beta}{\lambda}   (9)

u_0 = \frac{\gamma v_0}{\alpha} - \frac{B_{13}\alpha^2}{\lambda}   (10)

where A is the intrinsic matrix of the camera, with (u_0, v_0) being the principal point coordinates, γ the skew coefficient between the x-axis and y-axis of the image plane, λ a scaling factor, α the image scale factor along the u_0 and v_0 axes, and β the parameter describing the distortion of the two image axes. The matrix B is a symmetric matrix defined by a 6D vector. Once the matrix B is estimated, the intrinsic parameters can be extracted from it.
FIGURE 5. The calibration system and view camera.
FIGURE 6. The calibration of the camera is conducted within the working range.
We applied Tsai and Zhang's calibration method [46] to estimate the parameters in matrix (4). To reduce the initial assumptions, we assumed that some camera parameters were provided by the manufacturer. This technique utilizes n feature positions (n > 4) in every image and handles the calibration problem by solving a set of n linear equations based on the constraint of alignment with the robot's manipulator. We used Dart-Vision software to assist in calibrating, computing, and storing the camera intrinsic parameters in the controller.
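The paper performs this step with Dart-Vision; purely as an illustration of the same chessboard-based intrinsic estimation, a sketch using OpenCV is given below. The board geometry (11 × 16 squares of 15 mm, hence 10 × 15 inner corners) follows the text, while the image folder and file names are hypothetical.

```python
# Illustrative sketch (the paper used Dart-Vision): estimate the intrinsic matrix A of
# Eq. (1) from chessboard images with OpenCV. Image paths are hypothetical placeholders.
import glob
import numpy as np
import cv2

pattern_size = (15, 10)      # inner corners of an 11 x 16 board of squares
square_mm = 15.0

# 3D corner coordinates on the board plane (Z = 0), in millimetres.
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib_images/*.png"):        # hypothetical folder of board views
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)
        image_size = gray.shape[::-1]

# A is the 3x3 intrinsic matrix; rvecs/tvecs are the per-view extrinsics.
rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, image_size, None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix A:\n", A)
```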

C. OBJECT DETECTION
For object recognition problems, the tasks of object classification and object localization need to be addressed. The eye-in-hand camera method is applied to solve this matter. In this project, we use the YOLOv7 network provided by Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao [47], which detects objects at speeds and accuracies ranging from 5 FPS to 160 FPS and achieves the highest accuracy of 56.8% AP among all real-time [48] object detection models. Other popular networks with similar object detection capabilities, such as SSD, YOLOv4, and Mask R-CNN, can be used as baselines.
FIGURE 7. Overall network architecture of YOLOv7.
First, the YOLOv7 network resizes the input image to 640 × 640 and feeds it into the network's pipeline. The CBS, ELAN, and MP modules consecutively apply downsampling operations to the feature maps, reducing their dimensions by a factor of 1/2. As a consequence, both the length and width of the feature maps are decreased. Furthermore, these modules increase the number of output channels to twice the number of input channels. The fundamental modules of YOLO are preserved. The output of the pipeline network is three layers of feature maps with different sizes. After the Repconv module adjusts the final output channel number, three convolutional layers with kernel size 1 × 1 are used to perform the objectness, class, and box prediction tasks on the image to obtain the detection results.
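A minimal sketch of running such a trained detector on a single 640 × 640 camera frame is shown below. It assumes the YOLOv7 weights have been exported to ONNX, which is an assumption about deployment rather than something the paper states, and the file names are hypothetical.

```python
# Illustrative sketch: run a YOLOv7 model exported to ONNX on one camera frame.
# The export step and file names are assumptions; the 640 x 640 input follows the text.
import numpy as np
import cv2
import onnxruntime as ort

session = ort.InferenceSession("yolov7_conveyor.onnx")   # hypothetical exported model
input_name = session.get_inputs()[0].name

def detect(frame_bgr: np.ndarray) -> np.ndarray:
    """Resize to 640 x 640, normalise, and return the raw prediction tensor."""
    img = cv2.resize(frame_bgr, (640, 640))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    blob = img.transpose(2, 0, 1)[None, ...]              # NCHW, batch of 1
    preds = session.run(None, {input_name: blob})[0]
    return preds   # boxes + objectness + class scores, still to be thresholded/NMS-filtered

frame = cv2.imread("sample_frame.png")                    # hypothetical test image
print(detect(frame).shape)
```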


TABLE 1. The calibration system and view camera.
The input data we use for training consists of three objects, with 500 images for validation. Each object possesses several positions and different postures, which enhances the input data for the GAN algorithm and proves to be rather effective. The GAN has two parts that are trained simultaneously, namely the generative model (G) and the discriminative model (D). The discriminative model is used to detect whether a sample contains valid or invalid data. The generative model captures certain target information distributions to puzzle the discriminative model [49], [50]. The D model is a binary classifier that categorizes the data from the G model in the training system as either realistic or unrealistic. G minimizes its loss function by providing data to D that is classified as genuine. After the generator and discriminator detect an object, the system calculates the threshold value for each class. The output includes the confidence, the object ID, and its position. The threshold value is calculated to determine the difference needed for the model to distinguish between real objects and similar objects. If the difference value is too large, that image is not included in the dataset.
The YOLOv7 network and GAN aim to find a minimax solution. The discriminative model aims to maximize the accuracy of labeling both real and fake data, whereas the generative model aims to minimize the discriminative network's ability to distinguish them accurately. This technique works in Torch and TensorFlow [51]. The generative and discriminative networks are trained using the Adam optimization algorithm [48] with β1 = 0.3, β2 = 0.978, and a learning rate of 0.0011. The batch size is set to 32, the hyperparameter λ is set to 0.65, and the normalization method used is layer normalization. The number of iterations for the training process is set to 1000. Afterwards, the total numbers of images are 200 (Object-id1), 150 (Object-id2), and 150 (Object-id3), respectively. Furthermore, the dimensions of the images are 64 × 64 and 32 × 32, respectively, for the input and output. GAN techniques are employed to create synthetic images, which are then combined with real images for training to enhance the performance of the robotic system.
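A compact sketch of one such minimax update, using PyTorch with the Adam settings quoted above (β1 = 0.3, β2 = 0.978, learning rate 0.0011, batch size 32), is given below. The generator and discriminator architectures are placeholders, not the networks used in the paper.

```python
# Illustrative sketch of one GAN update with the optimiser settings quoted in the text.
# The Generator/Discriminator architectures are placeholders, not the paper's networks.
import torch
import torch.nn as nn

z_dim = 100
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 64 * 64 * 3), nn.Tanh())
D = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=0.0011, betas=(0.3, 0.978))
opt_d = torch.optim.Adam(D.parameters(), lr=0.0011, betas=(0.3, 0.978))
bce = nn.BCEWithLogitsLoss()

def train_step(real_images: torch.Tensor) -> None:
    """One minimax step: D learns to separate real/fake, G learns to fool D."""
    batch = real_images.size(0)                   # batch size 32 in the paper's setting
    real = real_images.view(batch, -1)
    noise = torch.randn(batch, z_dim)
    fake = G(noise)

    # Discriminator update: maximise correct labelling of real and generated samples.
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: minimise D's ability to flag generated samples as fake.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

train_step(torch.rand(32, 3, 64, 64))             # 64 x 64 inputs, as in the text
```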
Table 2 shows the setup of the YOLOv7-with-GAN experiment with the three different labels used in the research.
TABLE 2. YOLOv7 with GAN experiment setting.
The YOLO method encompasses a set of 9 hyperparameters, which include the learning rate, weight decay coefficient, and momentum, among others. The values of these hyperparameters are usually determined through an optimization process. The choice of hyperparameters often influences the convergence rate and overall performance of the model. To determine the optimal hyperparameters, the genetic algorithm (GA) [52] is employed here to optimize the hyperparameters listed in Table 1. We used Google Colab for the training process to reduce training time while ensuring the effectiveness and quality of the model. In the object detection setting, the input data provides a substantial number of candidate boxes. The task requires classifying these boxes; nevertheless, only a limited number of them contain tangible objects, giving rise to an unequal distribution of classes. In order to mitigate the problem of class imbalance, the GIoU loss function was employed. The GIoU loss function is an extension of the Intersection over Union (IoU) loss function [53]. The proposed approach not only preserves the inherent characteristics of IoU [54] but also successfully addresses the limitations associated with the IoU loss.

L = 1 - GIoU,   (11)

GIoU = IoU - \frac{A^c - U}{A^c}, \quad IoU = \frac{I}{U}   (12)

where L is the loss function based on GIoU, B^c is the smallest box that encloses both the predicted and ground-truth boxes, A^c is the area of B^c, I is the area of the intersection of the two boxes, and U is the area of their union.
As the formula shows, introducing the smallest box that encompasses both the predicted and ground-truth boxes produces a corresponding loss even when the boxes do not intersect. Meanwhile, when two boxes intersect and have the same IoU value under different overlapping modes, the better the overlapping mode, the smaller the value of the GIoU loss. The GIoU value is higher for cases with better-aligned orientations.
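A direct transcription of Eqs. (11)-(12) for two axis-aligned boxes in (x1, y1, x2, y2) form is sketched below; it is an illustration of the formula, not the loss implementation used inside the training framework.

```python
# Direct transcription of Eqs. (11)-(12) for two axis-aligned boxes (x1, y1, x2, y2).
def giou_loss(box_a, box_b, eps=1e-9):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection I and union U of the two boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / (union + eps)

    # Smallest enclosing box B^c and its area A^c.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    area_c = cw * ch

    giou = iou - (area_c - union) / (area_c + eps)   # Eq. (12)
    return 1.0 - giou                                # Eq. (11): L = 1 - GIoU

print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))         # two partially overlapping boxes
```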


D. ROBOT KINEMATICS AND ROBOTIC ARM CONTROL
To ensure smooth pick-and-place operations, the forward and inverse kinematics problems need to be computed for the robotic arm. The 6-DOF robot configuration used in this study falls under the category of cobots. With this structure, our robot can operate in various user-friendly postures and avoid many singularities, allowing humans to work safely alongside the robot in simple tasks. However, the forward and inverse kinematics solutions are more complex compared to traditional 6-DOF robot configurations. The forward kinematics of a manipulator pertains to the estimation of the kinematic parameters of the robot as its joints move from an initial state to a desired position. On the other hand, the inverse kinematics problem involves finding the proper joint angles that place the end effector at an ideal position and orientation. Table 3 lists the kinematic parameters according to the Denavit-Hartenberg (D-H) convention. In this research, we use the matrix transformation method to represent the kinematic parameters of the robotic arm and calculate them through matrix calculations. This involves breaking down the robot's movements into individual transformations, such as translations and rotations, using homogeneous transformation matrices. With these matrices, it is easy to convert between coordinate systems and calculate the orientation and spatial location of the end effector.
Assuming a location in global coordinates, the angular values of each joint are estimated using the inverse kinematics (IK) equation [40]. We use a numerical inverse kinematics solver to solve this problem for the operator. The velocities of the joints may be transferred to a Cartesian coordinate system using the Jacobian linearization method, as demonstrated in equation (16). The inverse kinematics is employed to solve for the joint velocities corresponding to a given linear speed in Cartesian space, according to equation (17) [41]. These equations allow the operator to smoothly control the robotic arm using linear movements in Cartesian space, while the pseudo-inverse method ensures that the joint velocities are properly calculated.
TABLE 3. The Denavit-Hartenberg parameters of the Doosan robot.
where the parameter a_i represents the distance along the X-axis between the Z-axis of the i-th link and the Z-axis of the (i-1)-th link, the variable α_i represents the angle of rotation from the Z-axis of the (i-1)-th link to the Z-axis of the i-th link about the X-axis, d_i is the distance along the Z-axis between the X-axis of the (i-1)-th link and the X-axis of the i-th link, and the variable θ_i represents the angular displacement of the joint.

T_i^{i-1} = \begin{bmatrix} c\theta_i & -s\theta_i c\alpha_i & s\theta_i s\alpha_i & a_i c\theta_i \\ s\theta_i & c\theta_i c\alpha_i & -c\theta_i s\alpha_i & a_i s\theta_i \\ 0 & s\alpha_i & c\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix}   (13)

T = T_1^0 T_2^1 T_3^2 T_4^3 T_5^4 T_6^5   (14)

T_6^0 = \begin{bmatrix} r_{11} & r_{12} & r_{13} & x \\ r_{21} & r_{22} & r_{23} & y \\ r_{31} & r_{32} & r_{33} & z \\ 0 & 0 & 0 & 1 \end{bmatrix}   (15)

where cθ_i and sθ_i represent the cosine and sine of the angle θ_i, respectively; cα_i and sα_i represent the cosine and sine of the angle α_i, respectively; a_i is the parameter "a" in the D-H table; d_i is the parameter "d" in the D-H table; T_i^{i-1} is the matrix that represents the position and orientation of frame i relative to frame i-1, comprising the matrices T_1^0, T_2^1, T_3^2, T_4^3, T_5^4, T_6^5; T is the transformation from the base frame (origin) to the end-effector frame (final link), namely the matrix T_6^0; [r_{11} r_{12} r_{13}; r_{21} r_{22} r_{23}; r_{31} r_{32} r_{33}] is the rotation matrix describing the orientation of the end-effector; and (x, y, z) is the position of the end-effector.

v = J(q)\dot{q}   (16)

\dot{q} = J'\dot{x}, \quad J' = (J^{T}J)^{-1}J^{T}   (17)

where v is the velocity of the end-effector, J(q) is the Jacobian matrix containing the partial derivatives of the positions or angles of the joints, \dot{q} is the vector of joint velocities, J' is the pseudo-inverse of the Jacobian matrix, and \dot{x} is the velocity vector of the end-effector.
The method for using the inverse kinematics algorithm is demonstrated in Figure 8. The camera data, which provides (x, y, z, object orientation), is transformed after calibration into the form (x, y, z, rx, ry, rz) to serve as input for the inverse kinematics of the robotic arm. To begin with, the robot manipulator pose is determined using forward kinematics (FK), and then the inverse kinematics solver updates the joint angles using a pseudo-inverse calculation. This process is repeated until the end-effector approaches the goal position within a suitable error zone. The gain factor allows the maximum and minimum velocities to be reached quickly, given that the initial values and rate of change are appropriately adjusted. During the iterations, the errors are influenced by the joint angles; if the angles are inappropriate, the gain factor is reduced flexibly and the step is reversed.
FIGURE 8. The process of solving inverse kinematics numerical method.
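A numerical sketch of Eqs. (13)-(15) together with the pseudo-inverse iteration of Eq. (17) and Figure 8 is given below. The D-H rows are placeholders rather than the Doosan values of Table 3, and the finite-difference Jacobian is a simplifying assumption, not the paper's exact solver.

```python
# Illustrative sketch of Eqs. (13)-(17). The D-H rows below are placeholders and do NOT
# reproduce Table 3; the finite-difference Jacobian is an assumption for illustration.
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform T_i^{i-1} of Eq. (13) for one D-H row."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(q, dh_rows):
    """Chain the six link transforms, Eq. (14), returning T_6^0 of Eq. (15)."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(q, dh_rows):
        T = T @ dh_transform(theta, d, a, alpha)
    return T

def position_jacobian(q, dh_rows, eps=1e-6):
    """3 x 6 Jacobian of the end-effector position, by finite differences."""
    p0 = forward_kinematics(q, dh_rows)[:3, 3]
    J = np.zeros((3, len(q)))
    for i in range(len(q)):
        dq = np.array(q, dtype=float); dq[i] += eps
        J[:, i] = (forward_kinematics(dq, dh_rows)[:3, 3] - p0) / eps
    return J

def solve_ik(target_xyz, q0, dh_rows, gain=0.5, tol=1e-4, max_iter=200):
    """Iterate q <- q + gain * J^+ (x_target - x_current), cf. Eq. (17) and Figure 8."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iter):
        err = np.asarray(target_xyz) - forward_kinematics(q, dh_rows)[:3, 3]
        if np.linalg.norm(err) < tol:         # within the acceptable error zone
            break
        q += gain * (np.linalg.pinv(position_jacobian(q, dh_rows)) @ err)
    return q

# Placeholder (d, a, alpha) per joint, in metres/radians.
dh_rows = [(0.15, 0.0, -np.pi / 2), (0.0, 0.41, 0.0), (0.0, 0.0, np.pi / 2),
           (0.37, 0.0, -np.pi / 2), (0.0, 0.0, np.pi / 2), (0.12, 0.0, 0.0)]
q0 = np.deg2rad([10.0, -30.0, 45.0, 0.0, 60.0, 0.0])
q_sol = solve_ik([0.35, 0.10, 0.40], q0, dh_rows)
print("joint solution (deg):", np.rad2deg(q_sol))
```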


Dynamics is a crucial field in robotic control. Utilising dynamic models for controlling a robotic arm ensures high accuracy and smooth motion during operation. Below is the general dynamic equation of a robot, which describes the relationship between the torques applied to the joints and the motion of the robotic arm:

M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = u   (18)

where M(q) is the inertia matrix, \ddot{q} is the vector of joint accelerations, C(q,\dot{q}) is the Coriolis and centrifugal force matrix, g(q) is the gravity vector, and u is the vector of control torques or forces applied to the joints (the input vector). The inverse dynamics has the form:

\ddot{q} = M(q)^{-1}\left( u - C(q,\dot{q})\dot{q} - g(q) \right)   (19)

The matrix M(q) can be calculated as:

M(q) = \sum_{i=1}^{n}\left( m_i J_{v_i}^{T} J_{v_i} + J_{\omega_i}^{T} R_i I_i R_i^{T} J_{\omega_i} \right)   (20)

where J_{v_i} and J_{\omega_i} are the linear and angular parts of the Jacobian matrix J_i, m_i is the mass of link i, R_i is the rotation matrix of the link, I_i is its inertia tensor, and \omega_i is the angular velocity of the link.
For deriving the matrix C(q,\dot{q}), it is useful to exploit the passivity property of a robotic arm, namely that \dot{M}(q) - 2C(q,\dot{q}) is skew-symmetric. To achieve this property, the elements c_{ij} of the matrix are calculated from the elements m_{ij} of the inertia matrix via the following formula:

c_{ij} = \sum_{k=1}^{n} \frac{1}{2}\left( \frac{\partial m_{ij}}{\partial q_k} + \frac{\partial m_{ik}}{\partial q_j} - \frac{\partial m_{kj}}{\partial q_i} \right)\dot{q}_k   (21)

where n is the number of joints (degrees of freedom), c_{ij} are the elements of the Coriolis and centrifugal force matrix C(q,\dot{q}), m_{ij} are the elements of the inertia matrix M(q), \partial m_{ij}/\partial q_k is the partial derivative of the inertia term, and \dot{q}_k is the joint velocity.
Next, the elements of the gravity vector g_i(q) are given by:

g_i(q) = \frac{\partial P}{\partial q_i}   (22)

where g_i(q) is the gravitational force on joint i, P is the potential energy of the robotic system due to gravity, and q_i is the joint position. Having M(q), C(q,\dot{q}), and g_i(q) completes the development of the dynamic model.

III. SIMULATION RESULTS AND EXPERIMENTAL RESULTS
A. SIMULATION RESULTS
Simulation and testing are crucial for costly and intricate robotic systems. Nowadays, modeling and simulation tools are widely used to optimize and speed up design processes such as kinematic simulation and robot trajectory planning. The goal of simulation is to create a system that closely replicates reality and to evaluate algorithms on this system. ROS, Robot Studio, RobotSim, and RoboDK are widely used software tools that considerably simplify the process of simulating robots. This study uses a 6-DOF cobot, which offers the advantage of operating in various postures that avoid singularities. This research conducted simulations to evaluate the working postures of the 6-DOF cobot in comparison to traditional industrial robots. As illustrated in Figures 9 and 10, the working poses of traditional industrial robots are not fully compatible with the eye-in-hand camera configuration. Simultaneously, the Doosan robot was also implemented in a simulation to evaluate appropriate working postures, as illustrated in Figure 11. For the purpose of evaluating the precision and practicality of implementing the inverse kinematics technique for governing the Doosan robot, we employed a Python code excerpt and software to produce a virtual robotic arm simulation (Figure 11). The RoboDK program facilitated the simulation of a robotic arm within a virtual environment. The precision of the forward and inverse kinematics solutions can be enhanced by careful consideration of the constraints and connections inherent in the various stages. A cuboid is utilized, and its position is altered within the operational region of the robotic system. Figure 11 depicts the proficient localization of the robot in diverse positions and rotations relative to the target.
FIGURE 9. The simulation of the working pose of a 6DOF ABB robot.
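The Python excerpt used for this check is not reproduced in the paper; as an illustration of how such a RoboDK verification could look, a minimal sketch with the robodk Python API is given below. The station item name, approach offset, and pose values are hypothetical placeholders.

```python
# Illustrative sketch only (not the paper's RoboDK script): move a robot in a RoboDK
# station to a pose derived from a detected object. Item names and values are placeholders.
from robodk.robolink import Robolink, ITEM_TYPE_ROBOT
from robodk.robomath import transl, rotz

RDK = Robolink()                                    # connect to the running RoboDK instance
robot = RDK.Item("Doosan M0609", ITEM_TYPE_ROBOT)   # hypothetical station item name

def simulate_pick(x_mm, y_mm, z_mm, rz_rad):
    """Approach 100 mm above the object in the base frame, then move linearly to the grasp pose."""
    grasp = transl(x_mm, y_mm, z_mm) * rotz(rz_rad)
    approach = transl(0, 0, 100) * grasp            # hypothetical 100 mm approach offset
    robot.MoveJ(approach)
    robot.MoveL(grasp)
    robot.MoveL(approach)

simulate_pick(350.0, 120.0, 80.0, 0.52)             # placeholder object coordinates
```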


In this study, kinematic modeling techniques were applied to a 6-DOF cobot. In previous research, the kinematics of a robotic arm were simulated to check and evaluate the accuracy of the robot. A common aspect between this study and the paper [55] is the use of Denavit-Hartenberg (DH) parameters to construct a mathematical model of the robotic system. The paper [55] focuses on evaluating modeling techniques using DH parameters, the Robotics Toolbox, and a rigid multibody model generated by SolidWorks software and the Simscape environment. These studies performed simulations and evaluated the modeling results using the Robotics Toolbox for MATLAB. The errors in the models used for that analysis ranged from 5% to 8%, as shown in Table 5, and our simulation results yielded similar outcomes, as shown in Table 4. Additionally, we conducted a simulation of the pick-and-place posture of the robotic arm using data obtained from the eye-in-hand camera. After the camera identified and located the object's coordinates relative to the robot's coordinate system in real-world conditions, the data was input into the RoboDK simulation software to simulate the robot's gripping posture, as illustrated in Figures 12 and 13. The simulation results for the robot and the eye-in-hand camera were evaluated using real-world measurements and forward kinematics parameters.
Compared to previous studies, our research demonstrates that the simulation of the robotic arm is easier and more efficient. The accuracy is also ensured to be similar to that of prior research. Based on the simulation results and accuracy evaluation, this study has been successfully implemented in practice, integrating an eye-in-hand configuration mounted on the end-effector.
TABLE 4. Evaluation of DH accuracy with the proposed kinematic solution.
TABLE 5. Evaluation of DH accuracy with robotics toolbox solution.
FIGURE 10. The simulation of the working pose of a 6DOF Hyundai robot.
FIGURE 11. The application of RoboDK for simulation purposes.
The purpose of the system is to perform pick-and-place tasks. Therefore, the robotic arm needs to use a gripper attached to the end effector. After getting the object's coordinates, the next step is to determine the appropriate gripping position. To ensure the system operates well in real time, the process of simulating the gripping and checking the gripper's operation is conducted as shown in Figures 12 and 13.
FIGURE 12. Simulation robotic arm use gripper (hand-opened).
FIGURE 13. Simulation robotic arm use gripper (hand-closed).

B. EXPERIMENTAL RESULTS
This research has two main tasks. The first task is object recognition using the eye-in-hand configuration integrated on the robot arm.


The object detection process is carried out using the YOLOv7 network. As shown in Fig. 14, we perform 200 training epochs over a period of 4 hours and achieve an accuracy of approximately 94%. All pick positions are calculated and labeled with corresponding numbers for each shape. We use the parameters obtained after calibration to transform object positions into the robot coordinate system.
FIGURE 14. The graph after training 200 epochs.
For performing the task of object classification, a procedure covering the interaction between the camera and the robot is constructed, as depicted in Table 6. The parameters obtained from object detection include the object coordinates with respect to the eye-in-hand camera system and the robot pose, as listed in Table 7. The similarity parameter is used to evaluate the accuracy of the detection process. The robot's setup contains a camera mounted on its hand, which is capable of capturing an object located at a designated point. Subsequently, the camera ascertains the pixel coordinates (Tx, Ty) of the center of the object. Utilizing the calibrated transformation matrix, the system then computes the corresponding coordinates (Tx, Ty) in the robot frame. Each object is represented by a unique ID. The information about the pose and position of the object is recorded to facilitate faster and continued pick operations in the future.
TABLE 6. Procedure pick and place object.
TABLE 7. Data obtained after object recognition.
The test setup process is shown in Figure 15. We tested with 3 different objects, as shown in Figure 16. The three objects are labeled with their respective IDs: ID1000, ID2000, and ID3000. The objects are initialized with different Vision Jobs for convenience of programming and data communication. After obtaining enough inputs, the pick-and-place process is carried out, and the robot can pick up different objects with the required accuracy and time, as shown in Figure 17.
FIGURE 15. System test setup.
FIGURE 16. Recognition results with 3 objects.
FIGURE 17. Robot arm picks and places objects.
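A high-level sketch of one such cycle (cf. the procedure in Table 6) is given below. Every callable passed in stands for a component discussed earlier (detection, pixel-to-robot mapping, robot motion, gripper I/O); the names and their signatures are hypothetical placeholders, not the paper's actual interfaces.

```python
# High-level sketch of the experimental cycle. The injected callables are placeholders
# standing in for the detector, calibration mapping, robot motion, and gripper control.
def pick_and_place_cycle(grab_frame, detect, pixel_to_robot, move_robot, set_gripper,
                         place_pose, confidence_min=0.5):
    """One cycle: detect the object, convert its pixel centre to robot coordinates,
    pick it, and drop it at the pose assigned to its class ID."""
    frame = grab_frame()
    detections = detect(frame)                 # e.g. [(class_id, confidence, u, v, angle), ...]
    for class_id, conf, u, v, angle in detections:
        if conf < confidence_min:
            continue                           # reject low-similarity detections
        x, y = pixel_to_robot(u, v)
        move_robot(x, y, angle)                # approach and grasp pose from calibration
        set_gripper(closed=True)
        move_robot(*place_pose(class_id))      # drop location chosen per object ID
        set_gripper(closed=False)
        return class_id
    return None
```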


TABLE 8. Testing accuracy results in performance.
FIGURE 18. Result train with CNN.
FIGURE 19. Result train with CNN, GAN.
FIGURE 20. Result train with YOLOv5.
FIGURE 21. Result train with YOLOv5, GAN.
For classification systems, not using an eye-in-hand camera in pick-and-place operations can limit the robot's ability to accurately identify and locate the main objects, particularly in unstable environments such as factories and workshops. Systems integrated with sensors such as LiDAR, scanners, and 3D cameras are typically expensive and contribute to the overall complexity of the system. Compared to integrating and configuring complex sensors, using a hand-mounted camera simplifies the setup and maintenance process while also reducing the required installation space and simplifying system operation. Table 8 displays the outcomes for objects in two groups: original images, and original images combined with GAN-generated images. The datasets were tested using CNN, YOLOv5, and YOLOv7. CNN achieves a maximum accuracy of 74.3% with the original-plus-GAN images. Furthermore, YOLOv5 achieves an optimal accuracy of 83.41% when applied to the original-plus-GAN images. Therefore, the experimental results in Table 8 demonstrate the accuracy of YOLOv7 compared to the other networks. The research has demonstrated significant effectiveness, with an average accuracy rate of over 94% for objects in the training data and a pick-and-place performance of 220-250 products per hour.

IV. CONCLUSION
This study presents a comprehensive analysis of the control system employed in the Doosan robot arm, as well as the object categorization technique that relies on a 2D camera positioned on the robot arm's end effector. In the present era, a significant proportion of solutions frequently depend on costly technologies such as industrial cameras or 3D cameras. Researchers encounter substantial limitations in terms of financial resources and practicality. To tackle these concerns, our research has effectively employed cost-effective 2D cameras alongside sophisticated deep learning algorithms for the purpose of executing machine-based categorization tasks. The paper also discusses techniques related to dynamics, camera calibration, and optimization of the learning process. The proposed methodology integrates the YOLOv7 deep learning network and GAN, with a focus on optimizing the training parameters to improve overall performance. This enables the capacity to swiftly and precisely recognize and categorize entities.


The utilization of a calibration table is employed as the calibration method to align the coordinates between the camera and the robot. The research has successfully performed object categorization by integrating a 2D camera with a robot, utilizing simulation and testing methodologies. The present work effectively employed sophisticated deep learning algorithms, hence augmenting the applicability of deep learning techniques prior to their implementation in industrial settings. In the future, the research could explore experiments with YOLOv8 or YOLOv10, as well as test the system on advanced and expensive 3D devices. The ultimate goal of future research is to develop a flexible system capable of integrating various models and devices while maintaining high operational efficiency. The utilization of industrial robotics exhibits significant potential in facilitating the advancement of economically viable and readily available solutions, hence augmenting the automation and manipulation of objects within many industrial domains.

REFERENCES
[1] Y. Zhou, T. Yu, W. Gao, W. Huang, Z. Lu, Q. Huang, and Y. Li, "Shared three-dimensional robotic arm control based on asynchronous BCI and computer vision," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 31, pp. 3163-3175, 2023, doi: 10.1109/TNSRE.2023.3299350.
[2] N. Lv, J. Liu, and Y. Jia, "Dynamic modeling and control of deformable linear objects for single-arm and dual-arm robot manipulations," IEEE Trans. Robot., vol. 38, no. 4, pp. 2341-2353, Aug. 2022, doi: 10.1109/TRO.2021.3139838.
[3] B. Kaczmarski, A. Goriely, E. Kuhl, and D. E. Moulton, "A simulation tool for physics-informed control of biomimetic soft robotic arms," IEEE Robot. Autom. Lett., vol. 8, no. 2, pp. 936-943, Feb. 2023, doi: 10.1109/LRA.2023.3234819.
[4] V.-T. Nguyen, X.-T. Kieu, D.-T. Chu, X. HoangVan, P. X. Tan, and T. N. Le, "Deep learning-enhanced defects detection for printed circuit boards," Results Eng., vol. 25, Mar. 2025, Art. no. 104067, doi: 10.1016/j.rineng.2025.104067.
[5] V.-T. Nguyen, C.-D. Do, T.-V. Dang, T.-L. Bui, and P. X. Tan, "A comprehensive RGB-D dataset for 6D pose estimation for industrial robots pick and place: Creation and real-world validation," Results Eng., vol. 24, Dec. 2024, Art. no. 103459, doi: 10.1016/j.rineng.2024.103459.
[6] Z. Deng, M. Stommel, and W. Xu, "Operation planning and closed-loop control of a soft robotic table for simultaneous multiple-object manipulation," IEEE Trans. Autom. Sci. Eng., vol. 17, no. 2, pp. 981-990, Apr. 2020, doi: 10.1109/TASE.2019.2953292.
[7] M. A. Selver, "A robotic system for warped stitching based compressive strength prediction of marbles," IEEE Trans. Ind. Informat., vol. 16, no. 11, pp. 6796-6805, Nov. 2020, doi: 10.1109/TII.2019.2926372.
[8] W. Ma, Q. Du, R. Zhu, W. Han, D. Chen, and Y. Geng, "Research on inverse kinematics of redundant robotic arms based on flexibility index," IEEE Robot. Autom. Lett., vol. 9, no. 8, pp. 7262-7269, Aug. 2024, doi: 10.1109/LRA.2024.3420704.
[9] K. M. Oikonomou, I. Kansizoglou, and A. Gasteratos, "A hybrid reinforcement learning approach with a spiking actor network for efficient robotic arm target reaching," IEEE Robot. Autom. Lett., vol. 8, no. 5, pp. 3007-3014, May 2023, doi: 10.1109/LRA.2023.3264836.
[10] X. Zhao, Y. He, X. Chen, and Z. Liu, "Human-robot collaborative assembly based on eye-hand and a finite state machine in a virtual environment," Appl. Sci., vol. 11, no. 12, p. 5754, Jun. 2021, doi: 10.3390/app11125754.
[11] V. T. Nguyen, P.-T. Nguyen, X.-T. Kieu, K. D. Nguyen, and D.-D. Khuat, "Real-time control method for a 6-DOF robot using an eye-in-hand camera based on visual servoing," in Proc. Int. Conf. Intell. Syst. Netw. (Lecture Notes in Networks and Systems, vol. 1077), T. D. L. Nguyen, M. Dawson, L. A. Ngoc, and K. Y. Lam, Eds. Singapore: Springer, 2024, doi: 10.1007/978-981-97-5504-2_52.
[12] K. He, R. Newbury, T. Tran, J. Haviland, B. Burgess-Limerick, D. Kulic, P. Corke, and A. Cosgun, "Visibility maximization controller for robotic manipulation," IEEE Robot. Autom. Lett., vol. 7, no. 3, pp. 8479-8486, Jul. 2022, doi: 10.1109/LRA.2022.3188430.
[13] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision," IEEE Trans. Robot. Autom., vol. 9, no. 1, pp. 14-35, Feb. 1993.
[14] D. Kijdech and S. Vongbunyong, "Pick-and-place application using a dual arm collaborative robot and an RGB-D camera with YOLOv5," IAES Int. J. Robot. Autom. (IJRA), vol. 12, no. 2, p. 197, Jun. 2023.
[15] F. Hameed, H. Alwan, and Q. Ateia, "Pose estimation of objects using digital image processing for pick-and-place applications of robotic arms," Eng. Technol. J., vol. 38, no. 5, pp. 707-718, May 2020.
[16] S. Garg, B. Harwood, G. Anand, and M. Milford, "Delta descriptors: Change-based place representation for robust visual localization," IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 5120-5127, Oct. 2020, doi: 10.1109/LRA.2020.3005627.
[17] A. J. Ishak and S. N. Mahmood, "Eye in hand robot arm based automated object grasping system," Periodicals Eng. Natural Sci. (PEN), vol. 7, no. 2, pp. 555-566, Jul. 2019.
[18] X. Liu, W. Chen, H. Madhusudanan, L. Du, and Y. Sun, "Camera orientation optimization in stereo vision systems for low measurement error," IEEE/ASME Trans. Mechatronics, vol. 26, no. 2, pp. 1178-1182, Apr. 2021, doi: 10.1109/TMECH.2020.3019305.
[19] A. V. Kudryavtsev, M. T. Chikhaoui, A. Liadov, P. Rougeot, F. Spindler, K. Rabenorosoa, J. Burgner-Kahrs, B. Tamadazte, and N. Andreff, "Eye-in-hand visual servoing of concentric tube robots," IEEE Robot. Autom. Lett., vol. 3, no. 3, pp. 2315-2321, Jul. 2018, doi: 10.1109/LRA.2018.2807592.
[20] X. Liu, H. Madhusudanan, W. Chen, D. Li, J. Ge, C. Ru, and Y. Sun, "Fast eye-in-hand 3-D scanner-robot calibration for low stitching errors," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8422-8432, Sep. 2021, doi: 10.1109/TIE.2020.3009568.
[21] Z. Li, S. Li, and X. Luo, "Using quadratic interpolated beetle antennae search to enhance robot arm calibration accuracy," IEEE Robot. Autom. Lett., vol. 7, no. 4, pp. 12046-12053, Oct. 2022, doi: 10.1109/LRA.2022.3211776.
[22] Y. E. Haj, A. H. El-Hag, and R. A. Ghunem, "Application of deep-learning via transfer learning to evaluate silicone rubber material surface erosion," IEEE Trans. Dielectr. Electr. Insul., vol. 28, no. 4, pp. 1465-1467, Aug. 2021, doi: 10.1109/TDEI.2021.009617.
[23] J. White, T. Kameneva, and C. McCarthy, "Vision processing for assistive vision: A deep reinforcement learning approach," IEEE Trans. Hum.-Mach. Syst., vol. 52, no. 1, pp. 123-133, Feb. 2022, doi: 10.1109/THMS.2021.3121661.
[24] C. Choi, W. Schwarting, J. DelPreto, and D. Rus, "Learning object grasping for soft robot hands," IEEE Robot. Autom. Lett., vol. 3, no. 3, pp. 2370-2377, Jul. 2018, doi: 10.1109/LRA.2018.2810544.
[25] I. S. Mohamed, A. Capitanelli, F. Mastrogiovanni, S. Rovetta, and R. Zaccaria, "Detection, localisation and tracking of pallets using machine learning techniques and 2D range data," Neural Comput. Appl., vol. 32, no. 13, pp. 8811-8828, Jul. 2020, doi: 10.1007/s00521-019-04352-0.
[26] T. Ye, W. Qin, Y. Li, S. Wang, J. Zhang, and Z. Zhao, "Dense and small object detection in UAV-vision based on a global-local feature enhanced network," IEEE Trans. Instrum. Meas., vol. 71, pp. 1-13, 2022, doi: 10.1109/TIM.2022.3196319.
[27] J. Xing, Y. Liu, and G.-Z. Zhang, "Improved YOLOV5-based UAV pavement crack detection," IEEE Sensors J., vol. 23, no. 14, pp. 15901-15909, Jul. 2023, doi: 10.1109/JSEN.2023.3281585.
[28] H. Wang, Y. Xu, Y. He, Y. Cai, L. Chen, Y. Li, M. A. Sotelo, and Z. Li, "YOLOv5-fog: A multiobjective visual detection algorithm for fog driving scenes based on improved YOLOv5," IEEE Trans. Instrum. Meas., vol. 71, pp. 1-12, 2022, doi: 10.1109/TIM.2022.3196954.
[29] V.-T. Nguyen, D.-T. Chu, D.-H. Phan, and N.-T. Tran, "An improvement of the camshift human tracking algorithm based on deep learning and the Kalman filter," J. Robot., vol. 2023, pp. 1-12, Mar. 2023.
[30] X. Wang, C. Fu, Z. Li, Y. Lai, and J. He, "DeepFusionMOT: A 3D multi-object tracking framework based on camera-LiDAR fusion with deep association," IEEE Robot. Autom. Lett., vol. 7, no. 3, pp. 8260-8267, Jul. 2022, doi: 10.1109/LRA.2022.3187264.
[31] S. Kobayashi, W. Wan, T. Kiyokawa, K. Koyama, and K. Harada, "Obtaining an object's 3D model using dual-arm robotic manipulation and stationary depth sensing," IEEE Trans. Autom. Sci. Eng., vol. 20, no. 3, pp. 2075-2087, Jul. 2023, doi: 10.1109/TASE.2022.3193691.

VAN-TRUONG NGUYEN received the B.S. and M.S. degrees in mechatronics engineering from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2012 and 2014, respectively, and the Ph.D. degree in mechanical engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2018. He is currently the Head of the Intelligent Robotics Laboratory and the Dean of the Faculty of Mechatronics, SMAE, Hanoi University of Industry, Hanoi. He has authored or co-authored over 60 journal and conference papers, some of which have been published in prestigious journals, such as IEEE TRANSACTIONS ON CYBERNETICS, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, and Journal of Manufacturing Processes. His current research interests include robotics, mobile robots, artificial intelligence, intelligent control systems, and computer vision applications. He was a recipient of the National Outstanding Innovation Award, in 2015, and the Best Student Paper Award of the International Automatic Control Conference, in 2018.
PHU-TUAN NGUYEN received the bachelor's degree in mechatronic technologies and the master's degree in mechatronics from Hanoi University of Industry, Vietnam, in 2023 and 2024, respectively. He is currently a Technology Project Assistant with the Intelligent Robotics Laboratory, Hanoi University of Industry. Motivated by the pervasiveness of Industry 4.0 in today's technological society, his work focuses on robot control, in particular adaptive controllers, optimization techniques, and artificial intelligence applications.
SHUN-FENG SU (Fellow, IEEE) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1983, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1989 and 1991, respectively. He is currently a Chair Professor with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei. He has published more than 300 refereed journal and conference papers in the areas of robotics, intelligent control, fuzzy systems, neural networks, and non-derivative optimization. His current research interests include computational intelligence, machine learning, virtual reality, intelligent transportation systems, smart home, robotics, and intelligent control. He is a fellow of IFSA, CACS, and RST. He currently serves as an Associate Editor for IEEE TRANSACTIONS ON CYBERNETICS AND INFORMATION SCIENCE, a Senior Editor and an Associate Editor of IEEE ACCESS, an Executive Editor of the Journal of the Chinese Institute of Engineers, and an Area Editor and an Associate Editor of the International Journal of Fuzzy Systems. He is very active in various international and domestic professional societies. He is currently the IEEE SMC Society Distinguished Lecturer Program Chair and a member of the Board of Governors of the IEEE SMC Society. He also serves as a board member for various academic societies and has acted as the general chair, the program chair, or in other roles for many international and domestic conferences.

PHAN XUAN TAN (Member, IEEE) received the B.E. degree in electrical-electronic engineering from the Military Technical Academy, Vietnam, the M.E. degree in computer and communication engineering from Hanoi University of Science and Technology, Vietnam, and the Ph.D. degree in functional control systems from Shibaura Institute of Technology, Japan. He is currently an Associate Professor with Shibaura Institute of Technology. His current research interests include computer vision, deep learning, and image processing.

THANH-LAM BUI received the B.S. degree in mechatronics engineering from Phuong Dong University, Hanoi, Vietnam, in 2006, the M.S. degree in mechatronics engineering from the Military Technical Academy, Hanoi, in 2012, and the Ph.D. degree in mechanical engineering from Hanoi University of Science and Technology, Hanoi, in 2018. Since 2007, he has been a Lecturer with the Faculty of Mechatronics, SMAE, Hanoi University of Industry, Hanoi. His current research interests include robotics, intelligent control systems, and nanotechnology.
