AI-Powered Visual Sensors and Sensing: Where We Are and Where We Are Going
1 School of Electrical Engineering, International University, Ho Chi Minh City 700000, Vietnam;
nthieu@hcmiu.edu.vn
2 Vietnam National University, Ho Chi Minh City 700000, Vietnam
3 Neuroimaging Research Branch, National Institute on Drug Abuse, National Institutes of Health,
Baltimore, MD 21224, USA
4 SpreeAI, Incline Village, NV 89450, USA; minh.vo@spreeai.com
5 U.S. Army Research Laboratory, 2201 Aberdeen Boulevard, Aberdeen, MD 21005, USA;
john.s.hyatt11.civ@army.mil
6 Department of Mechanical Engineering, School of Engineering, The Catholic University of America,
Washington, DC 20064, USA
* Correspondence: wangz@cua.edu
1. Introduction
Deep learning, a machine learning method that mimics the neural network structures
of the human brain to process data, recognize patterns, and make decisions, traces its origins
back to the 1950s. It was not until the beginning of the 21st century that deep learning
truly began to flourish, driven by breakthroughs in algorithms, significant increases in
computing power, and the advent of large-scale data acquisition. As a subset of artificial
intelligence (AI), deep learning has acted as a driving force behind strengthening the impact
of AI in different fields and enhancing its integration into daily life.
In 2006, Geoffrey Hinton and his student [1] showed that deep belief networks, stacks
of restricted Boltzmann machines, could be trained layer by layer in an unsupervised man-
ner and fine-tuned using supervised learning. This model addressed the vanishing gradient
problem and allowed for the practical training of multilayer neural networks. It laid the
foundation for subsequent advancements and marked the revival of deep learning. A land-
mark success came in 2012 when AlexNet [2] delivered groundbreaking results on images
from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), outperforming all
previous methods. This achievement highlighted the overwhelming capabilities of deep
convolutional neural networks (CNNs) and quickly attracted widespread attention from
both academia and industry.
Over the past decade, deep learning has undergone rapid growth and has driven
numerous breakthroughs in AI. Some notable early achievements include, but are
not limited to, the dropout scheme [3] introduced in 2013, recurrent neural networks
(RNNs) [4,5] and generative adversarial networks (GANs) [6] proposed in 2014,
GoogleNet [7] and the residual neural network (ResNet) [8] in 2015, and DeepMind's WaveNet
model [9] and AlphaGo model [10] in 2016. It should be noted that some of the papers
were introduced as preprints in one year but were formally published in a subsequent year,
so there may be a difference between the year of introduction and the year of publication.
The year 2016 marked the successful transition of deep learning from theoretical research
to practical applications, and more sophisticated models have since started to blossom.
Google's Transformer architecture [11], introduced in 2017, abandoned traditional
RNNs or CNNs in favor of a self-attention mechanism, substantially improving the perfor-
mance of sequence modeling tasks, such as natural language processing (NLP). Google’s
BERT model [12], introduced in 2018, and OpenAI’s GPT-3 model [13], introduced in 2019,
both demonstrated powerful text generation and comprehension capabilities. They fur-
ther advanced the application of large models in NLP. Between 2020 and 2021, a Google
team proposed the Vision Transformer (ViT) [14], successfully applying Transformers to
computer vision tasks and challenging the dominance of convolutional neural networks in
image recognition. The release of YOLOv4 [15] demonstrated the efficiency and accuracy
of deep learning in real-time image analysis. Additionally, GANs and their variants, such
as BigGAN [16] and StyleGAN2 [17], continued to improve performance in image and
video generation tasks. In 2022, the launch of ChatGPT [18] (developed by OpenAI in
San Francisco, CA, USA) created a global sensation, marking a major breakthrough in
human–computer interaction and dialogue systems and setting new standards for future
intelligent assistants and AI customer service. OpenAI also introduced several large-scale
models based on Transformers, such as DALL·E, which generates images from textual
descriptions, the cross-modal model CLIP, and the code generation model Codex, all of
which obtained considerable attention in their respective fields. In 2023, Meta (Menlo Park,
CA, USA) released Llama [19], a highly efficient and accessible family of language models.
It aims to achieve high performance with fewer computational resources to facilitate ad-
vanced AI research and applications. Llama’s emphasis on accessibility and open collabora-
tion may significantly affect the trajectory of AI development. In late 2023, not surprisingly,
Google (Mountain View, CA, USA) launched Gemini [20] as a competitor to ChatGPT.
Then, in 2024, OpenAI introduced Sora, a cutting-edge tool to convert text into video [21].
This technological breakthrough generated widespread excitement and amazement around
the world. Amid the ongoing astonishment, Sora quickly faced intense competition from
emerging tools such as Google's Veo 2 (developed by Google DeepMind, Mountain View,
CA, USA), Kuaishou's Kling (developed by Kuaishou, Beijing, China), and Runway
(developed by Runway AI, New York, NY, USA), among others. This rapid development indicates how
technology is advancing at an unprecedented pace in the AI era.
The aforementioned milestone events have tremendously influenced academia and
driven technological innovation and application across industries. For instance, companies
in the autonomous driving sector, like Tesla and Waymo, rely on deep learning algorithms
to enhance vehicle perception and decision-making [22,23]. In healthcare, deep learning
models are employed to analyze medical images and assist doctors in diagnosing dis-
eases [24]. With the ongoing technological evolution, the future of deep learning promises
broader applications and more exciting innovations and discoveries. It will continue to
fundamentally change how we live and work. It is particularly noteworthy that both the
2024 Nobel Prize in Physics and the 2024 Nobel Prize in Chemistry were awarded for
research work related to deep learning.
As deep learning and AI pervade nearly every field of engineering and science,
computer vision remains one of the key areas of application, which has been considerably
enhanced and expanded. Integrating AI with computer vision-based sensors and sensing
technologies has resulted in many groundbreaking advances, such as highly accurate
object detection [25], facial recognition [26], image segmentation [27], optical character
recognition [28], human pose estimation [29], and real-time 3D reconstruction [30,31]. These
are challenging to achieve with conventional methods due to the combined accuracy, speed,
simplicity, and efficiency required.
In 2022, we launched a Special Issue on the progress of AI in computer vision re-
search and applications, which focused on new vision-based sensors and measurement
technologies. The Special Issue featured 30 articles covering a wide range of methods and
applications. This editorial aims to briefly summarize these articles, along with providing
insights into the future development of related technologies. Furthermore, recognizing the
rapid advancement of technologies in this field, we launched a new edition of the Special
Issue to present the latest progress and innovations in AI-powered computer vision for
sensors, sensing, and measurement research, along with their applications in engineering.
2. Where We Are
The previous Special Issue published 30 high-quality articles spanning multiple fields
related to sensors and sensing technologies. Here, we provide a brief review and summary
of these contributions.
Felipe and colleagues (Contribution 6) highlighted how errors in the pitch angle can
lead to significant distortion in 3D scene reconstruction. They proposed a machine learning-
based approach relying on regression algorithms to estimate and correct these pitch angle
errors. They used a range of regression methods, including Linear Regression, Regression
Trees, Regression Forests, and Multi-Layer Perceptron, trained on a variety of input–output
pairs that capture different real-world situations. This helped the calibration process reduce
distortion, resulting in more accurate 3D scene reconstructions.
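As a rough illustration of this regression-based correction idea, the short sketch below trains a random-forest regressor on synthetic input–output pairs; the features, data, and model choice are illustrative assumptions and not the authors' actual setup.

```python
# Illustrative sketch of regression-based pitch-angle error estimation; the features
# and synthetic data are placeholders, not Contribution 6's actual inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic training pairs: image-derived features (e.g., disparity statistics,
# horizon-line slope) mapped to a pitch-angle error in degrees.
n_samples, n_features = 2000, 6
X = rng.normal(size=(n_samples, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

print(f"MAE of estimated pitch error: {mean_absolute_error(y_test, model.predict(X_test)):.3f} deg")
# The estimated error would then be subtracted from the nominal pitch angle
# before triangulating points for 3D reconstruction.
```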
The authors of Contribution 11 detected modifications in assembled printed circuit boards
(PCBs) using deep convolutional autoencoders. Their loss function focuses on analyzing
higher-level features to compare the original image with the autoencoder's output, which
enhances the ability to segment various structures and
components. They validated their approach through experiments using a dataset that
mimics real-world conditions, and they claimed that their model outperformed other
leading techniques in the field of anomaly segmentation for the tested scenarios.
In their research (Contribution 12), the authors introduced a streamlined network seg-
mentation model named SEMD, which aimed to precisely segment images of standing trees
against complex backgrounds. This model utilizes multi-scale fusion with DeepLabV3+
to reduce the loss of feature information and incorporates the MobileNet architecture to
enhance computational efficiency. Additionally, an attention mechanism known as SENet is
included to effectively capture essential features while filtering out irrelevant data. The ex-
perimental results indicate that the SEMD model achieves a Mean Intersection over Union
(MIoU) of 91.78% in simpler settings and 86.90% against more intricate backgrounds.
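For readers unfamiliar with the SENet mechanism mentioned above, the following is a minimal PyTorch sketch of a generic squeeze-and-excitation channel-attention block; the layer sizes are illustrative and not taken from the SEMD paper.

```python
# Minimal PyTorch sketch of a squeeze-and-excitation (SE) channel-attention block,
# the general mechanism SEMD incorporates (layer sizes here are illustrative).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(                      # excitation: per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                   # (B, C)
        w = self.fc(w).view(b, c, 1, 1)               # channel weights in [0, 1]
        return x * w                                  # reweight feature maps

feats = torch.randn(2, 64, 32, 32)                    # e.g., a MobileNet feature map
print(SEBlock(64)(feats).shape)                       # torch.Size([2, 64, 32, 32])
```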
Acknowledging the limitations of prior deep learning methods for instance segmentation,
the authors of Contribution 13 presented a new technique called Boundary Refine (BRefine), which
enhances segmentation quality and detail. This method leverages the FCN backbone for
segmentation, along with a multistage fusion mask head to boost mask resolution. It also
introduced BRank and sort loss (BR and S loss) to tackle segmentation inconsistencies and
improve boundary detection. In comparison with previous models like Mask R-CNN,
BRefine showed improvements of 3.0, 4.2, and 3.5 AP on the COCO, LVIS, and Cityscapes
datasets, respectively, with a further enhancement of 5.0 AP for large objects within the
COCO dataset.
The research documented in (Contribution 14) combined YOLOv5s for object detec-
tion with Deeplabv3+ for image segmentation to facilitate meter-reading extraction. The
YOLOv5s model first localizes the meter dial, followed by Deeplabv3+, which uses a Mo-
bileNetv2 backbone to effectively extract tick marks and pointers. The results demonstrated
that this methodology enables the YOLOv5s model to achieve an impressive mean average
precision of 99.58% (mAP50) on the dataset, along with a rapid detection time of 22.2 ms.
The detection transformer (DETR), which utilizes a Transformer-based framework
for object detection, has attracted significant interest due to its strong performance on
the COCO val2017 dataset. However, these models face challenges when applied to new
environments that lack labeled data. To tackle this issue, (Contribution 15) proposed
an unsupervised domain adaptive technique known as DINO with cascading alignment
(CA-DINO). The approach introduces attention-enhanced double discriminators (AEDD)
and weak category-level token restraints (WROT). AEDD aligns the local and global
contexts, while WROT extends the Deep CORAL loss to adjust class tokens after embedding.
Experimental results on two rigorous benchmarks indicated a 41% relative performance
boost compared to the baseline on the Foggy Cityscapes dataset.
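Because WROT extends the Deep CORAL loss, a minimal PyTorch sketch of that generic loss may help: it penalizes the squared Frobenius distance between source- and target-domain feature covariances. This is the standard formulation, not the paper's exact implementation.

```python
# Minimal PyTorch sketch of the (Deep) CORAL loss that CA-DINO's WROT component
# extends; generic formulation, not the paper's code.
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """source, target: (N, d) batches of features/class tokens from each domain."""
    d = source.size(1)

    def covariance(x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(dim=0, keepdim=True)
        return x.t() @ x / (x.size(0) - 1)

    cs, ct = covariance(source), covariance(target)
    return ((cs - ct) ** 2).sum() / (4 * d * d)       # squared Frobenius norm, scaled

src = torch.randn(32, 256)   # e.g., class tokens from labeled source images
tgt = torch.randn(32, 256)   # class tokens from unlabeled target (foggy) images
print(coral_loss(src, tgt))
```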
Deep learning techniques have also been employed to detect liquid contents. The
method outlined in (Contribution 16) focuses on identifying liquids within transparent
containers, which is beneficial for various specialized applications, including service robots,
pouring robots, security inspections, and industrial monitoring.
Rather than depending on conventional object detection techniques that utilize visible
imaging, the research in (Contribution 17) approaches the challenge of accurately detecting
weak infrared targets in complex environments while fulfilling the real-time detection
needs. The authors developed a Bottleneck Transformer architecture and implemented
CoordConv techniques to enhance detection performance. This methodology led to a
notable accuracy increase, achieving a mean Average Precision (mAP) of 96.7%, which
reflects a 2.2 percentage point improvement over YOLOv5s, outperforming other leading
detection algorithms.
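As background on the CoordConv idea used here, the sketch below appends normalized coordinate channels to the input of a convolution so the layer can exploit absolute position; shapes and layer choices are illustrative assumptions.

```python
# Minimal PyTorch sketch of the CoordConv idea: append normalized x/y coordinate
# channels before a convolution (shapes here are illustrative).
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))   # add coordinate channels

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
print(layer(torch.randn(1, 3, 64, 64)).shape)              # torch.Size([1, 16, 64, 64])
```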
In the realm of human pose estimation, heatmap-based strategies have been predom-
inant, offering high performance but facing difficulties in accurately detecting smaller
individuals. To remedy this, SSA Net (Contribution 18) proposes an innovative solution.
It employs HRNetW48 as a feature extractor and utilizes the TDAA module to bolster
the perception of smaller scales. SSA Net replaces traditional heatmap methods with
coordinate vector regression, attaining an impressive AP of 77.4% on the COCO Validation
and competitive scores on the Tiny Validation and MPII datasets, showcasing its capability
across different benchmarks.
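To make the distinction concrete, the minimal sketch below contrasts heatmap decoding with direct coordinate-vector regression; the dimensions and the tiny regression head are illustrative and do not reproduce SSA Net's actual design.

```python
# Minimal PyTorch sketch contrasting heatmap decoding with coordinate-vector
# regression; sizes and the tiny head are illustrative placeholders.
import torch
import torch.nn as nn

num_joints, feat_dim = 17, 256

# Heatmap route: predict a (H, W) map per joint and take the argmax location.
heatmaps = torch.rand(1, num_joints, 64, 48)
flat_idx = heatmaps.flatten(2).argmax(dim=2)               # (1, num_joints)
coords_hm = torch.stack([flat_idx % 48, flat_idx // 48], dim=2).float()

# Regression route: a small head maps pooled features directly to (x, y) pairs.
head = nn.Linear(feat_dim, num_joints * 2)
features = torch.randn(1, feat_dim)                        # pooled backbone features
coords_reg = head(features).view(1, num_joints, 2)         # normalized coordinates

print(coords_hm.shape, coords_reg.shape)
```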
2.4. Applications
Addressing mismatches in computer vision—especially during the matching of image
pairs—is essential due to the inherent geometric and radiometric disparities that often
exist between images. These discrepancies can undermine the reliability of matching re-
sults, consequently impacting the accuracy of various vision-related tasks. Recognizing
the limitations of supervised learning techniques and the challenges in accurate labeling,
the authors of Contribution 23 introduced an innovative method that employs deep rein-
forcement learning (DRL). They developed an unsupervised learning framework named
Unsupervised Learning for Mismatch Removal (ULMR). When compared to traditional
supervised and unsupervised learning methods as well as conventional handcrafted tech-
niques, ULMR shows enhanced precision, a higher retention of correct matches, and a
decrease in false matches.
In the realm of video surveillance and behavior recognition, deep learning has show-
cased its potential, particularly in the medical sector. The approach detailed in Contribution
24 offers a comprehensive system for behavior-based video summarization and visualiza-
tion, which aims to monitor and evaluate the health and well-being of dogs. This system
consists of multiple phases, such as video acquisition and preprocessing, object detection
and cropping, dog behavior detection, and the creation of visual summaries that illustrate
the dog’s location and behavioral patterns.
Figure 1. Strain determination of a thin plate with a hole under tensile loading using a deep learning
scheme. From left to right: the initial shape, the deformed shape, and the corresponding shear
strain map.
Figure 2. AI-powered dynamic motion tracking of points in a tensegrity structure. Unlike conven-
tional methods, the deep learning-based approach can accurately track all points of interest. The
images were captured at 6600 fps, and the frame interval of the six representative images shown here
is 200.
The third example demonstrates the ability of a deep learning approach to identify
mechanical vibration modes (Figure 3). In this pilot study, multiple vibration modes,
specifically, amplitude and phase information for each point within the region of interest,
were extracted from a high-speed video clip of a freely vibrating thin plate. To prepare the
training dataset, theoretical solutions [38] were applied to the same plate (with one side
fixed and the other three sides free) to generate simulated video frames. The preliminary
work utilizes a CNN and transformer architecture.
Figure 3. AI-powered analysis of free vibration modes in a thin plate. The images were captured at
a high speed of 14,000 fps. Presented are two typical frames along with 12 identified vibration modes.
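For background, the free vibration of a thin plate can be written in the classical Kirchhoff form, in which each mode contributes a spatial amplitude pattern and a phase; this textbook relation is stated here only to clarify what the network is trained to recover, not as the study's exact model:

```latex
% Classical Kirchhoff thin-plate free vibration (background only; not the study's exact model)
D\,\nabla^{4} w(x,y,t) + \rho h\,\frac{\partial^{2} w}{\partial t^{2}} = 0,
\qquad D = \frac{E h^{3}}{12\,(1-\nu^{2})},
\qquad
w(x,y,t) = \sum_{n} A_{n}\,\phi_{n}(x,y)\,\cos\!\bigl(\omega_{n} t + \theta_{n}\bigr)
```

Here, the mode shapes φn are set by the boundary conditions (one edge clamped, three edges free), and the per-point amplitude An φn(x, y) and phase θn of each mode are the quantities extracted from the high-speed video.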
The above three examples demonstrate that AI-powered visual sensors and sensing
have opened up exciting possibilities for unprecedented performance in numerous scientific
and engineering applications. However, there are several challenges that researchers
and engineers must address in order to fully realize the potential of these AI-powered
technologies. One of the primary challenges is the difficulty of capturing real-world datasets.
Visual sensors rely on large amounts of data to train deep learning models effectively, but
acquiring high-quality, diverse, and representative datasets can be time-consuming and
expensive, and often requires complex equipment. Moreover, in certain environments,
such as extreme conditions or hard-to-reach locations, gathering the necessary data may
not be feasible.
A natural way to deal with the real dataset preparation issue is to use theoretical
simulation datasets, as shown in the examples mentioned previously. Simulated datasets,
though less resource-intensive to generate, often struggle to capture the nuanced complex-
ities of real-world scenarios. For example, complex geometries and deformation fields
are difficult to rigorously simulate. As a result, AI models trained on such datasets may
struggle to generalize to new and unseen environments in real-world applications.
Additionally, there is no one-size-fits-all deep learning model for general sensing
applications, and identifying or designing the right model that performs well across a wide
range of tasks remains a major obstacle. Researchers often need to customize or fine-tune
models based on complex trials, which can lead to long research and development cycles.
Looking toward the future, several promising directions could help unlock the full
potential of AI-powered visual sensors. One such approach is the use of finite element method
(FEM) simulation to generate highly detailed synthetic datasets that represent various physical
problems. By combining FEM with AI techniques, it is possible to create datasets that reflect
a wide range of scenario factors, such as stress, deformation, temperature, velocity, and
electric and magnetic field intensities, which would be difficult or expensive to capture in
the real world.
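As one illustration of how such simulation outputs could be packaged for training, the sketch below warps a reference pattern by a displacement field to produce an image pair together with its ground-truth deformation; the pattern and field are synthetic placeholders, and a real pipeline would read the field from an FEM solver.

```python
# Illustrative sketch: turning a simulated displacement field (e.g., exported from
# an FEM solver) into an image pair for training a deformation-measurement network.
# The reference pattern and displacement field below are synthetic placeholders.
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)
H, W = 256, 256

reference = rng.random((H, W))                       # stand-in speckle pattern

yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
u = 0.02 * xx                                        # placeholder horizontal displacement
v = 0.05 * np.sin(2 * np.pi * yy / H)                # placeholder vertical displacement

# Deformed image: sample the reference at the displaced coordinates.
deformed = map_coordinates(reference, [yy - v, xx - u], order=1, mode="nearest")

# (reference, deformed) would be the network input; (u, v) or the strain fields
# derived from them would be the training target.
print(reference.shape, deformed.shape, u.mean(), v.mean())
```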
Dataset preparation may also utilize game engines like Unreal Engine and Unity [39] to
generate realistic and controlled datasets. These engines are capable of simulating diverse
environments with high accuracy. They offer flexibility in manipulating variables such as
lighting, object positioning, and texture (e.g., transparent and high-reflective objects, which
are typically hard to measure for many sensors), making them an invaluable tool for testing
and training visual sensors under a broad range of situations.
It is important to distinguish the aforementioned simulated data from the “synthetic data”
produced via generative AI, which often cannot safely be used to train another AI model [40].
In recent years, one of the fastest-growing engineering applications of vision-based
sensing is non-contact movement detection, deformation measurement, and the health
monitoring of structures over long distances. The biggest challenge in accurately extracting
the desired information lies in eliminating visual distortions caused by variations in air
density and thermal haze over long-range measurements, as shown in Figure 4a. These
image distortions introduced by atmospheric effects can severely compromise the accu-
racy of motion and deformation analysis and often lead to errors in data interpretation.
As AI-powered sensing systems are increasingly being integrated into key engineering
applications such as remote sensing, autonomous vehicles, and surveillance, eliminating in-
terference is critical to obtaining reliable input image data. This is a technically demanding
task with a high priority in the field.
Figure 4. AI should be capable of distinguishing disturbances from the physical quantities that
need to be measured. (a) Image distortion in long-range sensing caused by variations in air density
and thermal haze. Ten representative frames of a target are zoomed in for better visualization;
(b) The motions of the solar panels under wind force, as captured by an outdoor camera, include the
rigid-body motions of the camera itself.
While some progress has been made to reduce atmospheric distortion through hard-
ware and software solutions [41], much work remains. Future advancements in AI-powered
computer vision should include exploring algorithms capable of isolating and compen-
sating for such distortions. Solving this problem will not only improve the accuracy and
reliability of long-range motion and deformation measurements but also unlock new levels
of performance for broader applications. For instance, Figure 4b shows an application
where the images documented both the vibrations of the solar panels and the more signifi-
cant movements of the outdoor camera. To accurately characterize the vibrations of the
panels under wind conditions, it is crucial to isolate the effects of the camera’s motion.
Figure 5 shows two microscopy measurements we encountered in the research. The
measurements not only highlight the applications of visual sensing technology in fields
such as biology and microelectronics but also demonstrate the difficulties of conducting
Figure 5. Visual sensing for motion and strain measurements using microscopy. (a) Bacteria motion
tracking under an optical microscope. It is difficult to accurately track all the bacteria motions;
(b) Microelectronics strain measurement under a scanning electron microscope. It is difficult to
determine the full-field strain map.
to sequences via rastering, but the resulting models are extremely large, computationally
expensive black boxes.
These challenges are exacerbated for multimodal data. In order to make predictions
based on heterogeneous data types, model inputs must be embedded into a common space,
typically via modality-specific encoding layers. This is particularly effective for modes
with similar feature spaces (e.g., RGB and IR). Beyond these cases, AI-powered sensing has
the potential to integrate data from very heterogeneous modes, views, and resolutions to
decrease uncertainty and improve predictive power. Examples might include combining
data from acoustic sensors and cameras or from multiple cameras with non-overlapping
fields of view. How to realize these abilities without prohibitively increasing model size
and operational cost remains an open question.
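A minimal sketch of the common-embedding idea follows: modality-specific encoders map heterogeneous inputs into a shared space before fusion. The encoders, dimensions, and late-fusion scheme are illustrative assumptions rather than a reference architecture.

```python
# Minimal PyTorch sketch of the common-embedding idea: modality-specific encoders
# project heterogeneous inputs (an image and an acoustic spectrogram) into a shared
# space before fusion. All dimensions and encoders are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, x):
        return self.net(x)

image_enc = ModalityEncoder(in_dim=2048)     # e.g., pooled CNN features of a camera frame
audio_enc = ModalityEncoder(in_dim=512)      # e.g., pooled features of an acoustic sensor

img_feat = image_enc(torch.randn(4, 2048))
aud_feat = audio_enc(torch.randn(4, 512))

fused = torch.cat([img_feat, aud_feat], dim=1)       # simple late fusion in the shared space
head = nn.Linear(fused.size(1), 10)                  # downstream prediction head
print(head(fused).shape)                             # torch.Size([4, 10])
```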
One promising cost-reduction approach that can be inserted into existing data–model
pipelines is to incorporate an extremely lightweight, near-sensor model that only deter-
mines whether a feature of interest may be present. The data are passed to the downstream
large model for full, costly processing only if this feature is detected [46]. In addition,
several fundamentally new approaches to AI modeling have seen rapid growth over the
past few years, although, in addition to open research questions, hardware suitable for
these methods is not as mature as that for conventional deep learning.
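The near-sensor gating approach described above can be sketched as follows; both networks and the threshold are illustrative placeholders rather than any published design.

```python
# Sketch of near-sensor gating: a tiny model screens each frame, and only frames
# flagged as "interesting" are passed to the expensive downstream model.
import torch
import torch.nn as nn

tiny_gate = nn.Sequential(                 # lightweight near-sensor screener
    nn.Conv2d(3, 4, 3, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 1), nn.Sigmoid(),
)
big_model = nn.Sequential(                 # stand-in for a large downstream network
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(64, 1000),
)

def process(frame: torch.Tensor, threshold: float = 0.5):
    score = tiny_gate(frame)               # cheap "is anything of interest here?" check
    if score.item() < threshold:
        return None                        # drop the frame; no costly inference
    return big_model(frame)                # full processing only when warranted

out = process(torch.randn(1, 3, 224, 224))
print("skipped" if out is None else out.shape)
```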
Hyperdimensional computing [47] encodes data into high-dimensional, nearly or-
thogonal vector representations, which can be processed with a small library of simple
vector-algebraic operations. The benefits of this approach include the comparatively trans-
parent “reasoning” regarding the data, high robustness to data faults, a low computational
cost, rapid training, and the straightforward integration of multimodal data [48]. Neuro-
symbolic AI combines conventional deep learning for feature extraction with symbolic
logic for transparent, robust causal reasoning, and generalizability [49]. In particular, con-
ventional deep learning is purely correlative by construction, and models capable of causal
reasoning represent a significant advance.
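To make the hyperdimensional computing operations concrete, the toy sketch below uses random bipolar hypervectors with binding (elementwise multiplication), bundling (elementwise summation and sign), and cosine-similarity lookup; the dimensionality and the example record are illustrative.

```python
# Toy sketch of hyperdimensional computing with random bipolar hypervectors.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                           # hypervector dimensionality

def random_hv():
    return rng.choice([-1, 1], size=D)

def bind(a, b):                                      # associate two concepts
    return a * b

def bundle(*vs):                                     # superpose several bound pairs
    return np.sign(np.sum(vs, axis=0))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Encode a toy multimodal record: {sensor: camera, object: pedestrian}.
roles = {"sensor": random_hv(), "object": random_hv()}
fillers = {"camera": random_hv(), "pedestrian": random_hv(), "lidar": random_hv()}

record = bundle(bind(roles["sensor"], fillers["camera"]),
                bind(roles["object"], fillers["pedestrian"]))

# Query: which sensor produced this record? Unbind and compare to known fillers.
query = bind(record, roles["sensor"])
for name, hv in fillers.items():
    print(name, round(cosine(query, hv), 3))          # "camera" scores highest
```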
4. Closing Remarks
Recently, OpenAI announced the o3 model, which has significantly improved reason-
ing and problem-solving capabilities compared to previous models. The exceptional perfor-
mance of the experimental model on science and math tests has surprised researchers [50].
Meanwhile, a new startup, DeepSeek, revealed that the cost of the computing power re-
quired to train a large model can be a fraction of that demanded by existing models for
similar tasks [51], which has already spurred big tech corporations to reexamine their
assumption of unlimited computing power. As we were finalizing this editorial, Elon
Musk’s xAI released the Grok-3 chatbot to rival ChatGPT, DeepSeek, and Llama. Such intense
competition will undoubtedly drive rapid advancements and evolutions in generative AI.
Although not as hot as the field of generative AI at this time, the visual sensing
and perception domain is still attracting considerable attention. It can benefit from the
continuous research and development advances in generative AI. The future of AI-powered
visual sensors and sensing holds tremendous promise, with innovations poised to transform
industries across numerous fields.
We have launched the second edition of this Special Issue. This new edition aims to
show the latest progress and innovations in AI-powered computer vision for sensor, sensing,
and measurement research and applications. It is noteworthy that the scope of
this Special Issue encompasses a wide range of imaging techniques, including conventional
imaging, X-ray imaging, microscopy imaging, magnetic resonance imaging, ultrasound
imaging, acoustic imaging, thermal imaging, endoscopic imaging, hyperspectral imaging,
radar imaging, and infrared imaging. Our editorial team has been expanded to include
co-Guest Editors from a university, a federal research institution, a high-tech company, and
a military agency to ensure the diversity of perspectives. We welcome contributions from
our readers.
Author Contributions: Conceptualization, H.N., M.V., J.H. and Z.W.; methodology, Z.W.; software,
M.V. and Z.W.; validation, M.V. and J.H.; formal analysis, H.N. and Z.W.; investigation, H.N. and
M.V.; resources, J.H. and Z.W.; data curation, H.N. and Z.W.; writing—original draft preparation,
H.N. and Z.W.; writing—review and editing, H.N., M.V., J.H. and Z.W.; visualization, M.V. and Z.W.;
supervision, Z.W.; project administration, J.H. and Z.W. All authors have read and agreed to the
published version of the manuscript.
Acknowledgments: Z. Wang would like to thank Weidong Zhu of the University of Maryland,
Baltimore County, and Luo of the Catholic University of America for the collaborations on the
tensegrity study and bacterial chemotaxis projects, respectively. Part of the preliminary work pre-
sented in this editorial was carried out using a high-performance computing (HPC) server equipped
with NVIDIA H100 GPUs, funded by the United States Army Research Office under grant number
W911NF-23-1-0367.
List of Contributions
1. Ibrahem, H.; Salem, A.; Kang, H.S. RT-ViT: Real-Time Monocular Depth Estimation
Using Lightweight Vision Transformers. Sensors 2022, 22, 3849.
2. Liu, B.; Chen, K.; Peng, S.L.; Zhao, M. Adaptive Aggregate Stereo Matching Network
with Depth Map Super-Resolution. Sensors 2022, 22, 4548.
3. You, X.; Wang, Y.; Zhao, X. A Lightweight Monocular 3D Face Reconstruction Method
Based on Improved 3D Morphing Models. Sensors 2023, 23, 6713.
4. Wang, J.; Li, Y.; Ji, Y.; Qian, J.; Che, Y.; Zuo, C.; Chen, Q.; Feng, S. Deep Learning-Based
3D Measurements with Near-Infrared Fringe Projection. Sensors 2022, 22, 6469.
5. Wang, X.; Yuan, Y.; Liu, M.; Niu, Y. Iterated Residual Graph Convolutional Neural
Network for Personalized Three-Dimensional Reconstruction of Left Myocardium
from Cardiac MR Images. Sensors 2023, 23, 7430.
6. Felipe, J.; Sigut, M.; Acosta, L. Calibration of a Stereoscopic Vision System in the
Presence of Errors in Pitch Angle. Sensors 2022, 23, 212.
7. Han, Y.; Zheng, B.; Kong, X.; Huang, J.; Wang, X.; Ding, T.; Chen, J. Underwater Fish
Segmentation Algorithm Based on Improved PSPNet Network. Sensors 2023, 23, 8072.
8. Dang, T.V.; Tran, D.M.C.; Tan, P.X. IRDC-Net: Lightweight Semantic Segmentation Net-
work Based on Monocular Camera for Mobile Robot Navigation. Sensors 2023, 23, 6907.
9. Lu, K.; Cheng, J.; Li, H.; Ouyang, T. MFAFNet: A Lightweight and Efficient Network
with Multi-Level Feature Adaptive Fusion for Real-Time Semantic Segmentation.
Sensors 2023, 23, 6382.
10. Su, E.; Tian, Y.; Liang, E.; Wang, J.; Zhang, Y. A Multiscale Instance Segmentation
Method Based on Cleaning Rubber Ball Images. Sensors 2023, 23, 4261.
11. Candido de Oliveira, D.; Nassu, B.T.; Wehrmeister, M.A. Image-Based Detection of
Modifications in Assembled PCBs with Deep Convolutional Autoencoders. Sensors
2023, 23, 1353.
12. Shi, L.; Wang, G.; Mo, L.; Yi, X.; Wu, X.; Wu, P. Automatic Segmentation of Standing
Trees from Forest Images Based on Deep Learning. Sensors 2022, 22, 6663.
13. Yu, J.; Yang, X.; Zhou, S.; Wang, S.; Hu, S. BRefine: Achieving High-Quality Instance
Segmentation. Sensors 2022, 22, 6499.
14. Deng, G.; Huang, T.; Lin, B.; Liu, H.; Yang, R.; Jing, W. Automatic Meter Reading from
UAV Inspection Photos in the Substation by Combining YOLOv5s and DeeplabV3+.
Sensors 2022, 22, 7090.
15. Geng, H.; Jiang, J.; Shen, J.; Hou, M. Cascading Alignment for Unsupervised Domain-
Adaptive DETR with Improved DeNoising Anchor Boxes. Sensors 2022, 22, 9629.
16. Wu, Y.; Ye, H.; Yang, Y.; Wang, Z.; Li, S. Liquid Content Detection in Transparent
Containers: A Benchmark. Sensors 2023, 23, 6656.
17. Fan, X.; Ding, W.; Qin, W.; Xiao, D.; Min, L.; Yuan, H. Fusing Self-Attention and
CoordConv to Improve the YOLOv5s Algorithm for Infrared Weak Target Detection.
Sensors 2023, 23, 6755.
18. Li, S.; Zhang, H.; Ma, H.; Feng, J.; Jiang, M. SSA Net: Small Scale-Aware Enhancement
Network for Human Pose Estimation. Sensors 2023, 23, 7299.
19. Teixeira, E.; Araujo, B.; Costa, V.; Mafra, S.; Figueiredo, F. Literature Review on Ship
Localization, Classification, and Detection Methods Based on Optical Sensors and
Neural Networks. Sensors 2022, 22, 6879.
20. Shahzad, H.F.; Rustam, F.; Soriano Flores, E.; Vidal Mazón, J.L.; de la Torre Diez, I.;
Ashraf, I. A Review of Image Processing Techniques for Deepfakes. Sensors 2022, 22, 4556.
21. Mo, S.; Lu, P.; Liu, X. AI-Generated Face Image Identification with Different Color
Space Channel Combinations. Sensors 2022, 22, 8228.
22. Kang, S.; Kim, G.; Yoo, C.D. Fair Facial Attribute Classification via Causal Graph-
Based Attribute Translation. Sensors 2022, 22, 5271.
23. Deng, C.; Chen, S.; Zhang, Y.; Zhang, Q.; Chen, F. ULMR: An Unsupervised Learning
Framework for Mismatch Removal. Sensors 2022, 22, 6110.
24. Atif, O.; Lee, J.; Park, D.; Chung, Y. Behavior-Based Video Summarization System for
Dog Health and Welfare Monitoring. Sensors 2023, 23, 2892.
25. Patil, R.R.; Mustafa, M.Y.; Calay, R.K.; Ansari, S.M. S-BIRD: A Novel Critical Multi-
Class Imagery Dataset for Sewer Monitoring and Maintenance Systems. Sensors 2023,
23, 2966.
26. Fang, C.; Liu, J.; Han, P.; Chen, M.; Liao, D. FSVM: A Few-Shot Threat Detection
Method for X-ray Security Images. Sensors 2023, 23, 4069.
27. Kim, M.; Choi, H.C. Compact Image-Style Transfer: Channel Pruning on the Single
Training of a Network. Sensors 2022, 22, 8427.
28. Xu, H.; Wang, J.; Zhang, Y.; Zhang, G.; Xiong, Z. Subgrid Variational Optimized
Optical Flow Estimation Algorithm for Image Velocimetry. Sensors 2022, 23, 437.
29. Zaharia, C.; Popescu, V.; Sandu, F. Hardware–Software Partitioning for Real-Time
Object Detection Using Dynamic Parameter Optimization. Sensors 2023, 23, 4894.
30. Zhang, H.; Zhang, X.; Yu, D.; Guan, L.; Wang, D.; Zhou, F.; Zhang, W. Multi-Modality
Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action
Recognition. Sensors 2023, 23, 5414.
References
1. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
[CrossRef] [PubMed]
2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of
the 25th International Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012;
pp. 1097–1105.
3. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks
from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
4. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations
using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
5. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 28th Conference
on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
6. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial
Nets. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014;
pp. 2672–2680.
7. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with
Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA,
7–12 June 2015; pp. 1–9.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
9. van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K.
WaveNet: A Generative Model for Raw Audio. In Proceedings of the 9th ISCA Speech Synthesis Workshop (SSW), Sunnyvale,
CA, USA, 13–15 September 2016; p. 125.
10. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam,
V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489.
[CrossRef]
11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You
Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA,
4–9 December 2017; pp. 5998–6008.
12. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understand-
ing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
13. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al.
Language Models are Few-Shot Learners. In Proceedings of the 34th Conference on Neural Information Processing Systems
(NeurIPS), Vancouver, BC, Canada, 6–12 December 2020; pp. 1877–1901.
14. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.;
Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International
Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021; pp. 1–17.
15. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
16. Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018,
arXiv:1809.11096.
17. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June
2020; pp. 8110–8119.
18. Achiam, O.J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.;
Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
19. Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al.
The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783.
20. Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini:
A Family of Highly Capable Multimodal Models. arXiv 2023, arXiv:2312.11805.
21. Liu, Y.; Zhang, K.; Li, Y.; Yan, Z.; Gao, C.; Chen, R.; Yuan, Z.; Huang, Y.; Sun, H.; Gao, J.; et al. Sora: A Review on Background,
Technology, Limitations, and Opportunities of Large Vision Models. arXiv 2024, arXiv:2402.17177.
22. Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review.
Sensors 2021, 21, 2140. [CrossRef]
23. Rana, K.; Khatri, N. Automotive intelligence: Unleashing the potential of AI beyond advanced driver-assistance system, a
comprehensive review. Comput. Electr. Eng. 2024, 117, 109237. [CrossRef]
24. Bajwa, J.; Munir, U.; Nori, A.; Williams, B. Artificial intelligence in healthcare: Transforming the practice of medicine. Future
Healthc. J. 2021, 8, e188–e194. [CrossRef]
25. Ye, M.; Ke, L.; Li, S.; Tai, Y.W.; Tang, C.K.; Danelljan, M.; Yu, F. Cascade-DETR: Delving into High-Quality Universal Object Detec-
tion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–10 October 2023;
pp. 6704–6714.
26. Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face Recognition Systems: A Survey. Sensors 2020, 20, 342. [CrossRef] [PubMed]
27. Ma, J.; He, Y.; Li, F.; Han, L.J.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [CrossRef]
[PubMed]
28. Li, M.; Lv, T.; Chen, J.; Cui, L.; Lu, Y.; Florencio, D.; Zhang, C.; Li, Z.; Wei, F. TrOCR: Transformer-Based Optical Character Recognition
with Pre-trained Models. Proc. AAAI Conf. Artif. Intell. 2023, 37, 13094–13102. [CrossRef]
29. Xu, J.; Guo, Y.; Peng, Y. FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models. In Proceedings
of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024;
pp. 561–570.
30. Nguyen, H.; Novak, E.; Wang, Z. Accurate 3D reconstruction via fringe-to-phase network. Measurement 2022, 190, 110663.
[CrossRef]
31. Mu, F.; Sifferman, C.; Jungerman, S.; Li, Y.; Han, M.; Gleicher, M.; Gupta, M.; Li, Y. Towards 3D Vision with Low-Cost Single-
Photon Cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA,
USA, 17–21 June 2024; pp. 5302–5311.
32. Zhang, H.; Ba, Y.; Yang, E.; Mehra, V.; Gella, B.; Suzuki, A.; Pfahnl, A.; Chandrappa, C.C.; Wong, A.; Kadambi, A. WeatherStream:
Light Transport Automation of Single Image Deweathering. In Proceedings of the 2023 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 13499–13509.
33. Ramazzina, A.; Bijelic, M.; Walz, S.; Sanvito, A.; Scheuble, D.; Heide, F. ScatterNeRF: Seeing Through Fog with Physically-Based
Inverse Neural Rendering. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris,
France, 1–6 October 2023; pp. 17911–17922.
34. Muskhelishvili, N. Some Basic Problems of the Mathematical Theory of Elasticity, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1977.
35. Fung, Y.; Tong, P. Classical and Computational Solid Mechanics; World Scientific: Singapore, 2001.
36. Altenbach, H.; Bogdanov, V.; Grigorenko, A.; Kushnir, R.; Nazarenko, V.; Eremeyev, V. Selected Problems of Solid Mechanics and
Solving Methods; Advanced Structured Materials; Springer Nature: Cham, Switzerland, 2024.
37. Karaev, N.; Makarov, I.; Wang, J.; Neverova, N.; Vedaldi, A.; Rupprecht, C. CoTracker3: Simpler and Better Point Tracking by
Pseudo-Labelling Real Videos. arXiv 2024, arXiv:2410.11831.
38. Kreyszig, E.; Kreyszig, H.; Norminton, E.J. Advanced Engineering Mathematics, 10th ed.; John Wiley & Sons: Hoboken, NJ,
USA, 2011.
39. Ciekanowska, A.; Kiszczak Gliński, A.; Dziedzic, K. Comparative Analysis of Unity and Unreal Engine Efficiency in Creating
Virtual Exhibitions of 3D Scanned Models. J. Comput. Sci. Inst. 2021, 20, 247–253. [CrossRef]
40. Alemohammad, S.; Casco-Rodriguez, J.; Luzi, L.; Humayun, A.I.; Babaei, H.; LeJeune, D.; Siahkoohi, A.; Baraniuk, R.G. Self-
Consuming Generative Models Go MAD. In Proceedings of the Twelfth International Conference on Learning Representations
(ICLR), Vienna, Austria, 7–11 May 2024.
41. Liu, Y.; Yu, L.; Wang, Z.; Pan, B. Neutralizing the impact of heat haze on digital image correlation measurements via deep
learning. Opt. Lasers Eng. 2023, 164, 107522. [CrossRef]
42. Ranftl, R.; Bochkovskiy, A.; Koltun, V. Vision Transformers for Dense Prediction. In Proceedings of the IEEE/CVF International
Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 12179–12188.
43. Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks are Universal Approximators. Neural Netw. 1989,
2, 359–366. [CrossRef]
44. Kidger, P.; Lyons, T. Universal Approximation with Deep Narrow Networks. In Proceedings of the 33rd Conference on Learning
Theory (COLT), Virtual Event, 9–12 July 2020; pp. 2306–2327.
45. Pilz, K.F.; Mahmood, Y.; Heim, L. AI’s Power Requirements Under Exponential Growth: Extrapolating AI Data Center Power Demand
and Assessing Its Potential Impact on U.S. Competitiveness; RAND Corporation: Santa Monica, CA, USA, 2025.
46. Rezvani, A.; Huang, W.; Chen, H.; Ni, Y.; Imani, M. Self-Trainable and Adaptive Sensor Intelligence for Selective Data Generation.
Front. Artif. Intell. 2024, 7, 1403187. [CrossRef] [PubMed]
47. Kanerva, P. Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional
Random Vectors. Cogn. Comput. 2009, 1, 139–159. [CrossRef]
48. Abdulrahman, M.; Wasif, S.; Wael, M.; Azab, E.; Mashaly, M.; Abd El Ghany, M.A.A. A Review on Hyperdimensional
Computing. In Proceedings of the 2023 International Conference on Microelectronics (ICM), Abu Dhabi, United Arab Emirates,
17–20 December 2023; pp. 74–79.
49. Colelough, B.C.; Regli, W. Neuro-Symbolic AI in 2024: A Systematic Review. arXiv 2025, arXiv:2501.05435.
50. Jones, N. How should we test AI for human-level intelligence? OpenAI’s o3 electrifies quest. Nature 2025, 637, 774–775. [CrossRef]
51. DeepSeek-AI; Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; et al. DeepSeek-R1: Incentivizing
Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2025, arXiv:2501.12948.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.