Vision-Based Pick and Place Control System For Industrial Robots Using An Eye-in-Hand Camera
ABSTRACT In this paper, we present a vision-based pick-and-place control system for industrial robots using an eye-in-hand camera. In industry, equipping robots with cameras greatly improves efficiency and performance. Previous studies have focused on utilizing robotic arms for the pick-and-place process in simulated environments. The challenge when experimenting with real systems lies in aligning the coordinate systems of the robot and the camera, as well as ensuring high data accuracy during experimentation. To address this issue, our research focuses on a low-cost 2D camera mounted on the end-effector of the robotic arm, combined with deep learning algorithms. This study is evaluated in both simulation and real-world experiments. We propose a novel approach that combines the YOLOv7 (You Only Look Once v7) deep learning network with a GAN (Generative Adversarial Network) to achieve fast and accurate object recognition. The system uses deep learning to process camera data and extract object positions for the robot in real time. Owing to its fast inference and high accuracy, YOLO is adopted as the baseline for this research. By training the deep learning model on diverse objects, the system effectively recognizes and detects any object in the robot's workspace. Through experimental results, we demonstrate the feasibility and effectiveness of our vision-based pick-and-place system. Our research contributes an important advancement in the field of industrial robots by showcasing the potential of using a 2D camera and an integrated deep learning system for object manipulation.
INDEX TERMS Robotic arm, vision, object detection, vision calibration, real-time robot control.
integrates YOLOv7 with GAN to enhance both accuracy and diversity in object detection tasks. Additionally, a method for calibrating the camera and robot coordinate systems using a square chessboard pattern is proposed to synchronize the system's coordinate frames.

In the literature review [10], the eye-hand model for a robot arm can be divided into five types: monocular eye-in-hand, monocular eye-to-hand, stereo eye-in-hand, stereo eye-to-hand, and hybrid multi-camera. In comparison to fixed cameras, the eye-in-hand camera structure offers greater flexibility in inspection and assembly tasks [11]. The eye-in-hand visual servoing structure, with a camera mounted at the end of the robot arm, offers many advantages when grasping stationary or moving objects. With the eye-to-hand structure, however, it is easy to see objects clearly and capture images from different angles as the robot executes its movement tasks [12]. Papanikolopoulos et al. [13] used a camera mounted on the robotic arm for observing moving objects with known depth information. That study presented an important method for monitoring moving objects using cameras. However, such systems rely on industrial cameras, which are often expensive and remain heavily dependent on lighting conditions and image noise. Kijdech et al. [14] proposed a solution involving a robotic arm and an RGB-D camera in an eye-in-hand configuration. This system uses a camera mounted on the robot's end-effector in conjunction with YOLOv5 for pick-and-place tasks. The study employs an expensive RGB-D camera combined with YOLOv5 to determine the coordinates of objects in real time, achieving an accuracy rate of 90%-95%. F. S. Hameed and colleagues [15] proposed a method for object recognition and grasp orientation for a robotic arm using a 2D camera mounted on the robot's end-effector. This study utilizes a low-cost camera to identify object coordinates and grasp orientation for the robot. However, the method has not yet been tested on a real-world system, and challenges in real-time coordinate system synchronization between the camera and the robot remain unresolved. Ishak et al. [17] proposed using the eye-in-hand structure to grasp a stationary object in real time. However, the application of reclassification algorithms for the camera has not been implemented. Robots working in complex environments require supervision from a visual control system. Xingjian Liu et al. [18] proposed a calibration method between the robot and vision coordinate systems. The eye-in-hand configuration, with a scanner mounted on the end effector of the robotic arm, was employed, resulting in high accuracy and efficiency. This system uses a camera attached to the end of the robot arm to monitor movements in the work area [19], [20]. The experiments demonstrate the soundness of the vision system, but these methods have some limitations: they use expensive cameras, and the systems are often validated only in simulation. Additionally, in order for a robot to be able to recognise unfamiliar objects, it is necessary to utilize suitable image processing algorithms that can be integrated with the robotic arm [21]. The robotic arm is capable of performing pick-and-place tasks with various objects, whether the objects are from a predefined dataset or previously unknown.

Nowadays, artificial intelligence (AI) is becoming popular and developing rapidly. The quick advancement of deep learning [22] has prompted researchers to undertake several studies pertaining to object categorization and localisation [23]. Choi et al. [24] utilized a CNN (Convolutional Neural Network) to detect various objects held by a robotic arm. Mohamed et al. [25] applied Faster R-CNN (Regions with Convolutional Neural Network features) based on 2D rangefinder data to detect and localize objects. In their study, T. Ye et al. [26] introduced a framework for detecting objects using UAVs (unmanned aerial vehicles) in infrared pictures and video. The features are derived from the ground object, and object recognition is performed using the enhanced YOLOv5s model [27], [28]. Experiments showed the ability to detect and accurately locate various objects. However, these methods focus on image processing and have not been integrated into industrial robots. In industrial pick-and-place systems, 3D cameras or 3D scanners [29], [30] are commonly utilised. Such a system requires a complex 3D reconstruction process, which involves the integration of depth sensors into the testing device, thereby increasing both the cost and complexity of the system [31]. Thus, for compatibility with industrial robotic systems, image data processing and communication with the robot are necessary [32], [33].

Unlike previous studies, this research is evaluated on both simulated and real-world systems. The primary objective of this paper is to propose a comprehensive solution for accurately picking and placing objects using an eye-in-hand configuration mounted on a robotic arm. Therefore, this research proposes a stereo eye-in-hand pick-and-place system that uses a low-cost 2D camera and validates the system in both simulation and real-world dynamic environments. However, validating in real-world dynamic environments is extremely challenging: many objects cannot be covered by the training dataset, and the robot's coordinate system must be synchronized with that of the camera. Thus, our proposed system integrates a combination of YOLOv7 and GAN for effective data generation, appropriate real-time calibration using a chessboard pattern, and real-time and precise synchronization. In this setup, our system can effectively coordinate object recognition and position determination. The proposed method uses a YOLOv7 deep learning network combined with a GAN [34], which can identify objects with high speed and accuracy. The YOLOv7 algorithm is chosen for its superior performance and accuracy in real-time object detection compared to its predecessors. The GAN helps to generate more realistic and diverse data for training the image sorting algorithms [35], [36], [37]. The YOLO algorithm is a standard network design across the entire process. This algorithm is simpler compared to the R-CNN algorithm [38], [39]. The data on the position and orientation (x-coordinate, y-coordinate, and rotation angle around the Z-axis) of objects are taken from the camera and
converted into poses for the robot arm. The main difficulty lies in matching the coordinate systems of the camera and the robot. The proposed method uses the camera calibration
algorithm with a chessboard structure consisting of square
cells measuring 11 × 16. The parameters obtained after
calibration are computed to align the coordinate systems of
the camera and the robot. The coordinates of the chessboard feature points relative to the robot's origin are computed from the end-effector pose. Through homogeneous matrix transformations, all coordinates are synchronised to the robot's reference coordinate system.
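As an illustration of this synchronisation step (a sketch under assumptions, not the authors' exact pipeline), the snippet below detects the chessboard corners with OpenCV, recovers the board pose with solvePnP, and maps one corner into the robot base frame through the end-effector and hand-eye transforms. The pattern size, square size, intrinsic matrix, and the two homogeneous transforms are placeholders to be replaced by calibrated values.

```python
# Hedged sketch: map a detected chessboard corner into the robot base frame.
import cv2
import numpy as np

PATTERN = (15, 10)        # inner corners of the 11 x 16 chessboard (assumed layout)
SQUARE_SIZE = 0.02        # square edge length in metres (assumed value)

K = np.array([[900.0, 0.0, 640.0],     # placeholder intrinsic matrix from calibration
              [0.0, 900.0, 360.0],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)                      # distortion assumed negligible for the sketch

T_base_ee = np.eye(4)                   # end-effector pose read from the robot controller
T_ee_cam = np.eye(4)                    # hand-eye transform obtained from calibration

def corner_in_base(image_bgr):
    """Return the first chessboard corner expressed in the robot base frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if not found:
        return None
    corners = cv2.cornerSubPix(
        gray, corners, (5, 5), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
    # 3-D corner positions on the board plane (Z = 0), in board coordinates.
    obj = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE
    _, rvec, tvec = cv2.solvePnP(obj, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    p_cam = R @ obj[0].reshape(3, 1) + tvec        # corner 0 in the camera frame
    p_cam_h = np.vstack([p_cam, [[1.0]]])          # homogeneous coordinates
    p_base = T_base_ee @ T_ee_cam @ p_cam_h        # chain of homogeneous transforms
    return p_base[:3, 0]
```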
This research has achieved a breakthrough in the industrial world when the robotic arm uses only a 2D camera and an integrated deep learning system to pick and place objects.
This research paves the way for potential developments in
applying various deep learning and reinforcement learning
models to robotic arms. The proposed system is able to identify and detect any object in the workspace and execute the necessary operations, effectively performing pick-and-place tasks. This system demonstrates superior performance in comparison with baselines in simulation and shows effective performance in real-world validation.
Our main contributions can be summarized in four folds:
• Low-cost system: The use of a 2D camera significantly reduces the cost of the system while maintaining high accuracy for pick-and-place operations.
• Integration of YOLOv7 and GAN: The proposed method combines the YOLOv7 deep learning network with a GAN to generate more diverse and realistic data for the robot to work in real time. This allows for effective data generation and high accuracy in object detection, even with limited real-world data.
• Calibration and Synchronization: To use the system in a real-world environment, we introduce an appropriate calibration method using a chessboard pattern to ensure real-time synchronization between the robot and camera coordinate systems. This allows the system to perform pick-and-place tasks in real time with high accuracy and efficiency.
• Real-world validation: Simulation testing and evaluation of the system's performance were conducted. To ensure clarity in the research, both evaluations and experimental runs were carried out on the actual system. The product sorting system uses vision and a robotic arm to operate in real time with a sorting efficiency of 220-250 products per hour.

II. PROPOSED METHODOLOGY
This research builds a basic object recognition system using a Doosan robot integrated with vision module capabilities attached to the robot's end effector. The overall structure of the proposed system is shown in Figure 1.
FIGURE 1. Overall system architecture diagram.
The robot's end-effector movements are determined based on data from the eye-in-hand camera. The robot needs to collect two key types of data in order to perform pick-and-place tasks. Firstly, we need to determine the object type on the conveyor. Then, we use a deep learning network to generate an estimate of the object's 3D position. If there are multiple objects with different orientations and placements, the task becomes more complex. The data from the camera updates the object's status to the controller, which enables the Doosan robot to pick and place objects quickly and precisely.

A. HARDWARE STRUCTURE
The system used in this study includes a 6-DOF robotic arm, a camera mounted on the end-effector, a conveyor module, and sensors controlled by a programmable logic controller (PLC). Figure 2 illustrates the actual structure of the system we constructed.
The robotic arm operates within a circular work plane with a diameter of approximately 1.8 meters. The working distance between the focal point of the robot arm and the center of the camera is 779 mm, as shown in Figure 3. The maximum load the robot can handle within its workspace is 6 kg, and this varies according to the distance from the center of gravity. The arm's design and working parameters simplify its ability to move and perform tasks in 3-dimensional space.
We used a conveyor and sensors to position objects for optimal detection and classification. The sensor sends a signal to stop the conveyor when an object reaches its position, and the robotic arm then proceeds to recognize and classify the object. For communication between the conveyor, sensors, and Doosan robot, we chose the Modbus TCP/IP industrial communication protocol. This allows for reliable and efficient communication between the components of the system during operation.

B. ROBOT HAND-EYE CALIBRATION METHOD
In recent years, there have been notable developments in camera calibration technology, resulting in improved efficiency. Common modern calibration methods include point-to-point calibration [42], chessboard calibration [43], model-based calibration [44], and remote calibration [45]. The calibration of the 2D camera affixed
$$\gamma = -\frac{B_{12}\,\alpha^{2}\beta}{\lambda} \quad (9)$$

$$u_{0} = \frac{\gamma v_{0}}{\alpha} - \frac{B_{13}\,\alpha^{2}}{\lambda} \quad (10)$$
where A is the intrinsic matrix of the camera, with (u0, v0) being the principal point coordinates, γ is the skew coefficient between the x-axis and y-axis of the image plane, λ is a scaling factor, and α and β are the image scale factors along the u-axis and v-axis, respectively. Matrix B is a symmetric matrix defined by a six-dimensional vector. Once the matrix B is estimated, the intrinsic parameters can be extracted from it in closed form.
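As a minimal worked example, this closed-form recovery of the intrinsics from B can be coded directly. Equations (9) and (10) are taken from the text above; the remaining relations for v0, λ, α, and β are the standard closed-form expressions of Zhang-style calibration and are assumed here rather than quoted from the paper.

```python
# Sketch: recover the intrinsic matrix A from the symmetric matrix B (Zhang-style).
import numpy as np

def intrinsics_from_B(B):
    """B is the 3x3 symmetric matrix estimated from the homographies."""
    B11, B12, B13 = B[0, 0], B[0, 1], B[0, 2]
    B22, B23, B33 = B[1, 1], B[1, 2], B[2, 2]

    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam              # Eq. (9)
    u0 = gamma * v0 / alpha - B13 * alpha ** 2 / lam    # Eq. (10)

    # Intrinsic matrix assembled from the recovered parameters.
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,  1.0]])
```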
kernel-size = 1 (1 × 1) are used to perform the object prediction, class, and box tasks on the image to obtain the detection results. The YOLOv7-with-GAN experiment is run with three different labels in this research.

TABLE 2. YOLOv7 with GAN experiment setting.
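The excerpt does not spell out how the GAN-generated images enter the YOLOv7 training set, so the sketch below is only one plausible arrangement: GAN-produced object crops are pasted onto background frames and written out in the standard YOLO label format for three placeholder classes (the class names, paths, and directory layout are assumptions).

```python
# Hedged sketch: build YOLO-format training samples from GAN-generated crops.
import random
from pathlib import Path
from PIL import Image

CLASSES = ["class_0", "class_1", "class_2"]   # placeholder names for the three labels

def compose_sample(background_path, crop_path, class_id, out_dir, index):
    bg = Image.open(background_path).convert("RGB")
    crop = Image.open(crop_path).convert("RGB")   # assumed smaller than the background
    # Random placement of the GAN-generated crop on the background.
    x = random.randint(0, bg.width - crop.width)
    y = random.randint(0, bg.height - crop.height)
    bg.paste(crop, (x, y))

    out_dir = Path(out_dir)
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    (out_dir / "labels").mkdir(parents=True, exist_ok=True)
    bg.save(out_dir / "images" / f"synthetic_{index:05d}.jpg")

    # YOLO label format: class x_center y_center width height, normalised to [0, 1].
    xc = (x + crop.width / 2) / bg.width
    yc = (y + crop.height / 2) / bg.height
    w, h = crop.width / bg.width, crop.height / bg.height
    label = f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n"
    (out_dir / "labels" / f"synthetic_{index:05d}.txt").write_text(label)
```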
robotic arm. The 6-DOF robot configuration used in this study falls under the category of cobots. With this structure, our robot can operate in various user-friendly postures and avoid many singularities, allowing humans to work alongside the robot in simple tasks safely. However, the forward and inverse kinematics solutions are more complex compared to traditional 6-DOF robot configurations. The forward kinematics of a manipulator pertains to the estimation of the kinematic parameters of a robot as its joints move from an initial state to an optimal position. On the other hand, the inverse kinematics problem involves finding the proper joint angles for placing the end effector at an ideal position and orientation. Table 1 illustrates the kinematic parameters according to the Denavit-Hartenberg (D-H) convention. In this research, we use the matrix transformation method to represent the kinematic parameters of the robotic arm and calculate them through matrix calculations. This involves breaking down the robot's movements into individual transformations, such as translations and rotations, using homogeneous transformation matrices. By using these matrices, it is easy to convert between coordinate systems and calculate the orientation and position of the end effector.

Assuming a target location in global coordinates, the angular values of each joint are estimated using the inverse kinematics (IK) equations [40]. We use a numerical inverse kinematics solver to solve this problem for the operator. The velocities of the joints may be transferred to a Cartesian coordinate system using the linearization method of the Jacobian, as demonstrated in equation (16). The inverse kinematics is employed to solve for the joint velocities corresponding to a given linear speed in Cartesian space, according to equation (17) [41]. These equations allow the operator to smoothly control the robotic arm using linear movements in Cartesian space, while the pseudo-inverse method ensures that the joint velocities are properly calculated.

TABLE 3. The Denavit-Hartenberg parameters of the Doosan robot.

where the parameter ai represents the distance along the X-axis between the Z-axis of the i-th mechanical block and the Z-axis of the (i-1)-th mechanical block, the variable αi represents the angle of rotation from the Z-axis of the (i-1)-th mechanical block to the Z-axis of the i-th mechanical block along the X-axis, di is the distance along the Z-axis between the X-axis of the (i-1)-th mechanical block and the X-axis of the i-th mechanical block, and the variable θi represents the angular displacement of the joints.

$$T_i^{i-1} = \begin{bmatrix} c\theta_i & -s\theta_i c\alpha_i & s\theta_i s\alpha_i & a_i c\theta_i \\ s\theta_i & c\theta_i c\alpha_i & -c\theta_i s\alpha_i & a_i s\theta_i \\ 0 & s\alpha_i & c\alpha_i & d_i \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (13)$$

$$T = T_1^0\, T_2^1\, T_3^2\, T_4^3\, T_5^4\, T_6^5 \quad (14)$$

$$T_6^0 = \begin{bmatrix} r_{11} & r_{12} & r_{13} & x \\ r_{21} & r_{22} & r_{23} & y \\ r_{31} & r_{32} & r_{33} & z \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (15)$$

where cθi and sθi represent the cosine and sine of the angle θi, respectively; cαi and sαi represent the cosine and sine of the angle αi, respectively; ai is the parameter "a" in the D-H table; di is the parameter "d" in the D-H table; T_i^{i-1} is the matrix that represents the position and orientation of frame i relative to frame i-1, comprising the matrices T_1^0, T_2^1, T_3^2, T_4^3, T_5^4, and T_6^5; T is the transformation from the base frame (origin) to the end-effector frame (final link), i.e. the matrix T_6^0; r11 to r33 form the rotation matrix describing the orientation of the end-effector; and x, y, z is the position of the end-effector.

$$v = J(q)\,\dot{q} \quad (16)$$

$$\dot{q} = J'\,\dot{x}, \qquad J' = (J^{T} J)^{-1} J^{T} \quad (17)$$

where v is the velocity of the end-effector, J(q) is the Jacobian matrix containing partial derivatives with respect to the positions or angles of the joints, q̇ is the vector of joint velocities, J′ is the pseudo-inverse of the Jacobian matrix, and ẋ is the velocity vector of the end-effector.

The method for using the inverse kinematics algorithm is demonstrated in Figure 8. The camera data, which provides (x, y, z, object orientation), is transformed after calibration into the form (x, y, z, rx, ry, rz) to serve as input for the inverse kinematics of the robotic arm. To begin with, the pose of the robot manipulator is determined using forward kinematics (FK), and then the inverse kinematics solver updates the joint angles using a pseudo-inverse calculation. This process is repeated until the end-effector approaches the goal position within a suitable error tolerance. The gain factor allows fast attainment of the maximum and minimum velocities, provided that the initial values and rate of change are appropriately adjusted. During the iterations, the errors are influenced by the joint angles; if the angles are inappropriate, the gain factor is reduced adaptively and the step is reversed.

Dynamics is a crucial field in robotic control. Utilising dynamic models for controlling a robotic arm ensures high accuracy and smooth motion during operation. Below is the general dynamic equation of a robot, which describes the relationship between the torques applied to the joints and the motion of the robotic arm:

$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = u \quad (18)$$
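For concreteness, the following is a minimal numerical sketch of Eqs. (13)-(17): each D-H link transform is built as in (13), chained as in (14)-(15), and a single velocity-level inverse-kinematics step is taken with a pseudo-inverse Jacobian in the spirit of (16)-(17). The D-H rows, test configuration, and Cartesian velocity are placeholder assumptions rather than the Doosan values of Table 3, and a position-only Jacobian is used for brevity.

```python
# Hedged sketch of the D-H forward kinematics and one pseudo-inverse IK velocity step.
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous link transform T_i^{i-1} as in Eq. (13)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(q, dh_table):
    """Chain the six link transforms as in Eq. (14); returns T_6^0 of Eq. (15)."""
    T = np.eye(4)
    for theta_i, (d, a, alpha) in zip(q, dh_table):
        T = T @ dh_transform(theta_i, d, a, alpha)
    return T

def numeric_jacobian(q, dh_table, eps=1e-6):
    """Position Jacobian by finite differences (orientation rows omitted for brevity)."""
    p0 = forward_kinematics(q, dh_table)[:3, 3]
    J = np.zeros((3, len(q)))
    for i in range(len(q)):
        dq = q.copy()
        dq[i] += eps
        J[:, i] = (forward_kinematics(dq, dh_table)[:3, 3] - p0) / eps
    return J

# One velocity-level IK step in the spirit of Eq. (17); np.linalg.pinv is used
# instead of the explicit (J^T J)^-1 J^T product for numerical robustness.
dh_table = [(0.15, 0.0, np.pi / 2)] * 6          # placeholder D-H rows (d, a, alpha)
q = np.zeros(6)                                  # placeholder joint configuration [rad]
x_dot = np.array([0.01, 0.0, 0.0])               # desired Cartesian velocity [m/s]
J = numeric_jacobian(q, dh_table)
q_dot = np.linalg.pinv(J) @ x_dot                # joint velocities mapped from x_dot
```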
FIGURE 10. The simulation of the working pose of a 6-DOF Hyundai robot.
TABLE 5. Evaluation of DH accuracy with robotics toolbox solution.
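Table 5 refers to an evaluation against a robotics-toolbox solution. The snippet below is only a hedged illustration of how such a cross-check could be set up with the Robotics Toolbox for Python (assuming the roboticstoolbox package is installed); the D-H rows are placeholders, not the Doosan parameters of Table 3.

```python
# Hedged sketch: evaluate the D-H model with the Robotics Toolbox for Python.
import numpy as np
import roboticstoolbox as rtb

# Placeholder D-H rows (d, a, alpha); substitute the Doosan values from Table 3.
links = [rtb.RevoluteDH(d=0.15, a=0.0, alpha=np.pi / 2) for _ in range(6)]
robot = rtb.DHRobot(links, name="placeholder_6dof")

q_test = np.zeros(6)                 # test joint configuration [rad]
T_toolbox = robot.fkine(q_test)      # end-effector pose computed by the toolbox
print(T_toolbox)                     # compare against the pose from the manual D-H chain
```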
FIGURE 20. Training results with YOLOv5.

The pick-and-place process is carried out, and the robot can pick up different objects with the required accuracy and time, as shown in Figure 17. For classification systems, not using an eye-to-hand camera in pick-and-place operations can limit the robot's

IV. CONCLUSION
This study presents a comprehensive analysis of the control system employed in the Doosan robot arm, as well as the object categorization technique that relies on a 2D camera positioned on the robot arm's end effector. In the present era, a significant proportion of solutions frequently depend on costly technologies such as industrial cameras or 3D cameras. Researchers encounter substantial limitations in terms of financial resources and practicality. To tackle these concerns, our research has effectively employed cost-effective 2D cameras alongside sophisticated deep learning algorithms for the purpose of executing machine-based categorization tasks. The paper also discusses techniques related to dynamics, camera calibration, and optimization of learning processes. The proposed methodology integrates the YOLOv7 deep learning network and GAN, with a focus on optimizing the training parameters to improve overall performance. This enables the capacity to swiftly and precisely recognize and categorize entities. A calibration table is employed as a calibration method to align the coordinates between the camera and the robot. The research has successfully performed object categorization by integrating a 2D camera with a robot, utilizing simulation and testing methodologies. The present work effectively employed sophisticated deep learning algorithms, hence augmenting the applicability of deep learning techniques prior to their implementation in industrial settings. In the future, the research could explore experiments with YOLOv8 or YOLOv10, as well as test the system on advanced and expensive 3D devices. The ultimate goal of future research is to develop a flexible system capable of integrating various models and devices while maintaining high operational efficiency. The utilization of industrial robotics exhibits significant potential in facilitating the advancement of economically viable and readily available solutions, hence augmenting the automation and manipulation of objects within many industrial domains.

REFERENCES
[1] Y. Zhou, T. Yu, W. Gao, W. Huang, Z. Lu, Q. Huang, and Y. Li, "Shared three-dimensional robotic arm control based on asynchronous BCI and computer vision," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 31, pp. 3163–3175, 2023, doi: 10.1109/TNSRE.2023.3299350.
[2] N. Lv, J. Liu, and Y. Jia, "Dynamic modeling and control of deformable linear objects for single-arm and dual-arm robot manipulations," IEEE Trans. Robot., vol. 38, no. 4, pp. 2341–2353, Aug. 2022, doi: 10.1109/TRO.2021.3139838.
[3] B. Kaczmarski, A. Goriely, E. Kuhl, and D. E. Moulton, "A simulation tool for physics-informed control of biomimetic soft robotic arms," IEEE Robot. Autom. Lett., vol. 8, no. 2, pp. 936–943, Feb. 2023, doi: 10.1109/LRA.2023.3234819.
[4] V.-T. Nguyen, X.-T. Kieu, D.-T. Chu, X. HoangVan, P. X. Tan, and T. N. Le, "Deep learning-enhanced defects detection for printed circuit boards," Results Eng., vol. 25, Mar. 2025, Art. no. 104067, doi: 10.1016/j.rineng.2025.104067.
[5] V.-T. Nguyen, C.-D. Do, T.-V. Dang, T.-L. Bui, and P. X. Tan, "A comprehensive RGB-D dataset for 6D pose estimation for industrial robots pick and place: Creation and real-world validation," Results Eng., vol. 24, Dec. 2024, Art. no. 103459, doi: 10.1016/j.rineng.2024.103459.
[6] Z. Deng, M. Stommel, and W. Xu, "Operation planning and closed-loop control of a soft robotic table for simultaneous multiple-object manipulation," IEEE Trans. Autom. Sci. Eng., vol. 17, no. 2, pp. 981–990, Apr. 2020, doi: 10.1109/TASE.2019.2953292.
[7] M. A. Selver, "A robotic system for warped stitching based compressive strength prediction of marbles," IEEE Trans. Ind. Informat., vol. 16, no. 11, pp. 6796–6805, Nov. 2020, doi: 10.1109/TII.2019.2926372.
[8] W. Ma, Q. Du, R. Zhu, W. Han, D. Chen, and Y. Geng, "Research on inverse kinematics of redundant robotic arms based on flexibility index," IEEE Robot. Autom. Lett., vol. 9, no. 8, pp. 7262–7269, Aug. 2024, doi: 10.1109/LRA.2024.3420704.
[9] K. M. Oikonomou, I. Kansizoglou, and A. Gasteratos, "A hybrid reinforcement learning approach with a spiking actor network for efficient robotic arm target reaching," IEEE Robot. Autom. Lett., vol. 8, no. 5, pp. 3007–3014, May 2023, doi: 10.1109/LRA.2023.3264836.
[10] X. Zhao, Y. He, X. Chen, and Z. Liu, "Human–robot collaborative assembly based on eye-hand and a finite state machine in a virtual environment," Appl. Sci., vol. 11, no. 12, p. 5754, Jun. 2021, doi: 10.3390/app11125754.
[11] V. T. Nguyen, P.-T. Nguyen, X.-T. Kieu, K. D. Nguyen, and D.-D. Khuat, "Real-time control method for a 6-DOF robot using an eye-in-hand camera based on visual servoing," in Proc. Int. Conf. Intell. Syst. Netw., in Lecture Notes in Networks and Systems, vol. 1077, T. D. L. Nguyen, M. Dawson, L. A. Ngoc, and K. Y. Lam, Eds., Singapore: Springer, 2024, doi: 10.1007/978-981-97-5504-2_52.
[12] K. He, R. Newbury, T. Tran, J. Haviland, B. Burgess-Limerick, D. Kulic, P. Corke, and A. Cosgun, "Visibility maximization controller for robotic manipulation," IEEE Robot. Autom. Lett., vol. 7, no. 3, pp. 8479–8486, Jul. 2022, doi: 10.1109/LRA.2022.3188430.
[13] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision," IEEE Trans. Robot. Autom., vol. 9, no. 1, pp. 14–35, Feb. 1993.
[14] D. Kijdech and S. Vongbunyong, "Pick-and-place application using a dual arm collaborative robot and an RGB-D camera with YOLOv5," IAES Int. J. Robot. Autom. (IJRA), vol. 12, no. 2, p. 197, Jun. 2023.
[15] F. Hameed, H. Alwan, and Q. Ateia, "Pose estimation of objects using digital image processing for pick-and-place applications of robotic arms," Eng. Technol. J., vol. 38, no. 5, pp. 707–718, May 2020.
[16] S. Garg, B. Harwood, G. Anand, and M. Milford, "Delta descriptors: Change-based place representation for robust visual localization," IEEE Robot. Autom. Lett., vol. 5, no. 4, pp. 5120–5127, Oct. 2020, doi: 10.1109/LRA.2020.3005627.
[17] A. J. Ishak and S. N. Mahmood, "Eye in hand robot arm based automated object grasping system," Periodicals Eng. Natural Sci. (PEN), vol. 7, no. 2, pp. 555–566, Jul. 2019.
[18] X. Liu, W. Chen, H. Madhusudanan, L. Du, and Y. Sun, "Camera orientation optimization in stereo vision systems for low measurement error," IEEE/ASME Trans. Mechatronics, vol. 26, no. 2, pp. 1178–1182, Apr. 2021, doi: 10.1109/TMECH.2020.3019305.
[19] A. V. Kudryavtsev, M. T. Chikhaoui, A. Liadov, P. Rougeot, F. Spindler, K. Rabenorosoa, J. Burgner-Kahrs, B. Tamadazte, and N. Andreff, "Eye-in-hand visual servoing of concentric tube robots," IEEE Robot. Autom. Lett., vol. 3, no. 3, pp. 2315–2321, Jul. 2018, doi: 10.1109/LRA.2018.2807592.
[20] X. Liu, H. Madhusudanan, W. Chen, D. Li, J. Ge, C. Ru, and Y. Sun, "Fast eye-in-hand 3-D scanner-robot calibration for low stitching errors," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8422–8432, Sep. 2021, doi: 10.1109/TIE.2020.3009568.
[21] Z. Li, S. Li, and X. Luo, "Using quadratic interpolated beetle antennae search to enhance robot arm calibration accuracy," IEEE Robot. Autom. Lett., vol. 7, no. 4, pp. 12046–12053, Oct. 2022, doi: 10.1109/LRA.2022.3211776.
[22] Y. E. Haj, A. H. El-Hag, and R. A. Ghunem, "Application of deep-learning via transfer learning to evaluate silicone rubber material surface erosion," IEEE Trans. Dielectr. Electr. Insul., vol. 28, no. 4, pp. 1465–1467, Aug. 2021, doi: 10.1109/TDEI.2021.009617.
[23] J. White, T. Kameneva, and C. McCarthy, "Vision processing for assistive vision: A deep reinforcement learning approach," IEEE Trans. Hum.-Mach. Syst., vol. 52, no. 1, pp. 123–133, Feb. 2022, doi: 10.1109/THMS.2021.3121661.
[24] C. Choi, W. Schwarting, J. DelPreto, and D. Rus, "Learning object grasping for soft robot hands," IEEE Robot. Autom. Lett., vol. 3, no. 3, pp. 2370–2377, Jul. 2018, doi: 10.1109/LRA.2018.2810544.
[25] I. S. Mohamed, A. Capitanelli, F. Mastrogiovanni, S. Rovetta, and R. Zaccaria, "Detection, localisation and tracking of pallets using machine learning techniques and 2D range data," Neural Comput. Appl., vol. 32, no. 13, pp. 8811–8828, Jul. 2020, doi: 10.1007/s00521-019-04352-0.
[26] T. Ye, W. Qin, Y. Li, S. Wang, J. Zhang, and Z. Zhao, "Dense and small object detection in UAV-vision based on a global-local feature enhanced network," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–13, 2022, doi: 10.1109/TIM.2022.3196319.
[27] J. Xing, Y. Liu, and G.-Z. Zhang, "Improved YOLOv5-based UAV pavement crack detection," IEEE Sensors J., vol. 23, no. 14, pp. 15901–15909, Jul. 2023, doi: 10.1109/JSEN.2023.3281585.
[28] H. Wang, Y. Xu, Y. He, Y. Cai, L. Chen, Y. Li, M. A. Sotelo, and Z. Li, "YOLOv5-fog: A multiobjective visual detection algorithm for fog driving scenes based on improved YOLOv5," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022, doi: 10.1109/TIM.2022.3196954.
[29] V.-T. Nguyen, D.-T. Chu, D.-H. Phan, and N.-T. Tran, "An improvement of the camshift human tracking algorithm based on deep learning and the Kalman filter," J. Robot., vol. 2023, pp. 1–12, Mar. 2023.
[30] X. Wang, C. Fu, Z. Li, Y. Lai, and J. He, "DeepFusionMOT: A 3D multi-object tracking framework based on camera-LiDAR fusion with deep association," IEEE Robot. Autom. Lett., vol. 7, no. 3, pp. 8260–8267, Jul. 2022, doi: 10.1109/LRA.2022.3187264.
[31] S. Kobayashi, W. Wan, T. Kiyokawa, K. Koyama, and K. Harada, "Obtaining an object's 3D model using dual-arm robotic manipulation and stationary depth sensing," IEEE Trans. Autom. Sci. Eng., vol. 20, no. 3, pp. 2075–2087, Jul. 2023, doi: 10.1109/TASE.2022.3193691.
[32] V.-T. Nguyen, C.-D. Do, D. H. Tien, D.-T. Nguyen, and N. T. Le, "Person detection for monitoring individuals accessing the robot working zones using YOLOv8," in Computational Intelligence Methods for Green Technology and Sustainable Development, in Lecture Notes in Networks and Systems, vol. 1195, Y. P. Huang, W. J. Wang, H. G. Le, and A. Q. Hoang, Eds., Cham, Switzerland: Springer, 2024, doi: 10.1007/978-3-031-76197-3_5.
[33] Z. Zhang, R. Dershan, A. M. S. Enayati, M. Yaghoubi, D. Richert, and H. Najjaran, "A high-fidelity simulation platform for industrial manufacturing by incorporating robotic dynamics into an industrial simulation tool," IEEE Robot. Autom. Lett., vol. 7, no. 4, pp. 9123–9128, Oct. 2022, doi: 10.1109/LRA.2022.3190096.
[34] S. García-Sánchez, R. Rengel, S. Pérez, T. González, and J. Mateos, "A deep learning-Monte Carlo combined prediction of side-effect impact ionization in highly doped GaN diodes," IEEE Trans. Electron Devices, vol. 70, no. 6, pp. 2981–2987, Jun. 2023, doi: 10.1109/TED.2023.3265625.
[35] T.-T. Le, T.-S. Le, Y.-R. Chen, J. Vidal, and C.-Y. Lin, "6D pose estimation with combined deep learning and 3D vision techniques for a fast and accurate object grasping," Robot. Auto. Syst., vol. 141, Jul. 2021, Art. no. 103775.
[36] X. Bi, J. Hu, B. Xiao, W. Li, and X. Gao, "IEMask R-CNN: Information-enhanced mask R-CNN," IEEE Trans. Big Data, vol. 9, no. 2, pp. 688–700, Apr. 2023, doi: 10.1109/TBDATA.2022.3187413.
[37] X. Lu, J. Ji, Z. Xing, and Q. Miao, "Attention and feature fusion SSD for remote sensing object detection," IEEE Trans. Instrum. Meas., vol. 70, pp. 1–9, 2021, doi: 10.1109/TIM.2021.3052575.
[38] Y. Zhang, Z. Zhang, K. Fu, and X. Luo, "Adaptive defect detection for 3-D printed lattice structures based on improved faster R-CNN," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–9, 2022, doi: 10.1109/TIM.2022.3200362.
[39] Y. Li, S. Zhang, and W.-Q. Wang, "A lightweight faster R-CNN for ship detection in SAR images," IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2020.3038901.
[40] S. Shirafuji and J. Ota, "Kinematic synthesis of a serial robotic manipulator by using generalized differential inverse kinematics," IEEE Trans. Robot., vol. 35, no. 4, pp. 1047–1054, Aug. 2019, doi: 10.1109/TRO.2019.2907810.
[41] R. K. Malhan, S. Thakar, A. M. Kabir, P. Rajendran, P. M. Bhatt, and S. K. Gupta, "Generation of configuration space trajectories over semi-constrained Cartesian paths for robotic manipulators," IEEE Trans. Autom. Sci. Eng., vol. 20, no. 1, pp. 193–205, Jan. 2023, doi: 10.1109/TASE.2022.3144673.
[42] A. Li, S. Zheng, J. Yin, X. Luo, and H. C. Luong, "A 21–48 GHz subharmonic injection-locked fractional-N frequency synthesizer for multiband point-to-point backhaul communications," IEEE J. Solid-State Circuits, vol. 49, no. 8, pp. 1785–1799, Aug. 2014, doi: 10.1109/JSSC.2014.2320952.
[43] S. Schramm, J. Rangel, D. A. Salazar, R. Schmoll, and A. Kroll, "Target analysis for the multispectral geometric calibration of cameras in visual and infrared spectral range," IEEE Sensors J., vol. 21, no. 2, pp. 2159–2168, Jan. 2021, doi: 10.1109/JSEN.2020.3019959.
[44] A. J. Petruska, J. Edelmann, and B. J. Nelson, "Model-based calibration for magnetic manipulation," IEEE Trans. Magn., vol. 53, no. 7, pp. 1–6, Jul. 2017, doi: 10.1109/TMAG.2017.2653080.
[45] R. Sima, X. Hao, J. Song, H. Qi, Z. Yuan, L. Ding, and Y. Duan, "Research on the temperature transfer relationship between miniature fixed-point and blackbody for on-orbit infrared remote sensor calibration," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 7, pp. 6266–6276, Jul. 2021, doi: 10.1109/TGRS.2020.3023455.
[46] D. Samper, J. Santolaria, F. J. Brosed, A. C. Majarena, and J. J. Aguilar, "Analysis of Tsai calibration method using two- and three-dimensional calibration objects," Mach. Vis. Appl., vol. 24, no. 1, pp. 117–131, Jan. 2013, doi: 10.1007/s00138-011-0398-9.
[47] W. Yan, W. Liu, H. Bi, C. Jiang, Q. Zhang, T. Wang, T. Dong, X. Ye, and Y. Sun, "YOLO-PD: Abnormal signal detection in gas pipelines based on improved YOLOv7," IEEE Sensors J., vol. 23, no. 17, pp. 19737–19746, Sep. 2023, doi: 10.1109/JSEN.2023.3296131.
[48] G. Lee, R. Mallipeddi, G.-J. Jang, and M. Lee, "A genetic algorithm-based moving object detection for real-time traffic surveillance," IEEE Signal Process. Lett., vol. 22, no. 10, pp. 1619–1622, Oct. 2015, doi: 10.1109/LSP.2015.2417592.
[49] X. Lu, P. Shen, Y. Tsao, and H. Kawai, "Coupling a generative model with a discriminative learning framework for speaker verification," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 29, pp. 3631–3641, 2021, doi: 10.1109/TASLP.2021.3129360.
[50] E. Zahedi, J. Dargahi, M. Kia, and M. Zadeh, "Gesture-based adaptive haptic guidance: A comparison of discriminative and generative modeling approaches," IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 1015–1022, Apr. 2017, doi: 10.1109/LRA.2017.2660071.
[51] C. G. Gutiérrez, M. L. S. Rodríguez, R. Á. F. Díaz, J. L. C. Rolle, N. R. Gutiérrez, and F. J. D. C. Juez, "Rapid tomographic reconstruction through GPU-based adaptive optics," Log. J. IGPL, vol. 27, no. 2, pp. 214–226, Mar. 2019, doi: 10.1093/jigpal/jzy034.
[52] C.-T. Lam, B. Ng, and C.-W. Chan, "Real-time traffic status detection from on-line images using generic object detection system with deep learning," in Proc. IEEE 19th Int. Conf. Commun. Technol. (ICCT), Xi'an, China, Oct. 2019, pp. 1506–1510, doi: 10.1109/ICCT46805.2019.8947064.
[53] R. Zhou, Y. Liu, K. Zhang, and O. Yang, "Genetic algorithm-based challenging scenarios generation for autonomous vehicle testing," IEEE J. Radio Freq. Identificat., vol. 6, pp. 928–933, 2022, doi: 10.1109/JRFID.2022.3223092.
[54] M. Cui, Y. Duan, C. Pan, J. Wang, and H. Liu, "Optimization for anchor-free object detection via scale-independent GIoU loss," IEEE Geosci. Remote Sens. Lett., vol. 20, pp. 1–5, 2023, doi: 10.1109/LGRS.2023.3240428.
[55] J. Iqbal, R. U. Islam, and H. Khan, "Modeling and analysis of a 6 DOF robotic arm manipulator," Can. J. Elect. Electron. Eng., vol. 3, no. 6, pp. 300–306, 2012.

VAN-TRUONG NGUYEN received the B.S. and M.S. degrees in mechatronics engineering from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2012 and 2014, respectively, and the Ph.D. degree in mechanical engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2018. He is currently the Head of the Intelligent Robotics Laboratory and the Dean of the Faculty of Mechatronics, SMAE, Hanoi University of Industry, Hanoi. He has authored or co-authored over 60 journal and conference papers, some of which are published in prestigious journals, such as IEEE TRANSACTIONS ON CYBERNETICS, IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, and Journal of Manufacturing Processes. His current research interests include robotics, mobile robots, artificial intelligence, intelligent control systems, and computer vision applications. He was a recipient of the National Outstanding Innovation Award, in 2015, and the Best Student Paper Award of the International Automatic Control Conference, in 2018.

PHU-TUAN NGUYEN received the bachelor's degree in mechatronic technologies and the master's degree in mechatronics from Hanoi University of Industry, Vietnam, in 2023 and 2024, respectively. He is currently a Technology Project Assistant with the Intelligent Robotics Laboratory, Hanoi University of Industry. The pervasiveness of industry 4.0 in today's technology society brings to light his work on the control of robotics, focusing on adaptive controllers, optimization techniques, and artificial intelligence applications.

SHUN-FENG SU (Fellow, IEEE) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1983, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1989 and 1991, respectively. He is currently a Chair Professor with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei. He has published more than 300 refereed journal and conference papers in the areas of robotics, intelligent control, fuzzy systems, neural networks, and non-derivative optimization. His current research interests include computational intelligence, machine learning, virtual reality, intelligent transportation systems, smart home, robotics, and intelligent control. He is a fellow of IFSA, CACS, and RST. He currently serves as an Associate Editor for IEEE TRANSACTIONS ON CYBERNETICS AND INFORMATION SCIENCE, a Senior Editor and an Associate Editor of IEEE ACCESS, an Executive Editor of the Journal of the Chinese Institute of Engineers, and an Area Editor and an Associate Editor of International Journal of Fuzzy Systems. He is very active in various international/domestic professional societies. He is currently the IEEE SMC Society Distinguished Lecturer Program Chair and a member of the Board of Governors of the IEEE SMC Society. He also serves as a board member for various academic societies. He also acted as the general chair, the program chair, or in various other positions for many international and domestic conferences.

PHAN XUAN TAN (Member, IEEE) received the B.E. degree in electrical-electronic engineering from the Military Technical Academy, Vietnam, the M.E. degree in computer and communication engineering from Hanoi University of Science and Technology, Vietnam, and the Ph.D. degree in functional control systems from Shibaura Institute of Technology, Japan. He is currently an Associate Professor with Shibaura Institute of Technology. His current research interests include computer vision, deep learning, and image processing.

THANH-LAM BUI received the B.S. degree in mechatronics engineering from Phuong Dong University, Hanoi, Vietnam, in 2006, the M.S. degree in mechatronics engineering from the Military Technical Academy, Hanoi, in 2012, and the Ph.D. degree in mechanical engineering from Hanoi University of Science and Technology, Hanoi, in 2018. Since 2007, he has been a Lecturer with the Faculty of Mechatronics, SMAE, Hanoi University of Industry, Hanoi. His current research interests include robotics, intelligent control systems, and nano technology.