AI-Powered Visual Sensors and Sensing: Where We Are and Where We Are Going
1 School of Electrical Engineering, International University, Ho Chi Minh City 700000, Vietnam;
nthieu@hcmiu.edu.vn
2 Vietnam National University, Ho Chi Minh City 700000, Vietnam
3 Neuroimaging Research Branch, National Institute on Drug Abuse, National Institutes of Health,
Baltimore, MD 21224, USA
4 SpreeAI, Incline Village, NV 89450, USA; minh.vo@spreeai.com
5 U.S. Army Research Laboratory, 2201 Aberdeen Boulevard, Aberdeen, MD 21005, USA;
john.s.hyatt11.civ@army.mil
6 Department of Mechanical Engineering, School of Engineering, The Catholic University of America,
Washington, DC 20064, USA
* Correspondence: wangz@cua.edu
1. Introduction
Deep learning, a machine learning method that mimics the neural network structures
of the human brain to process data, recognize patterns, and make decisions, traces its origins
back to the 1950s. It was not until the beginning of the 21st century that deep learning
truly began to flourish, driven by breakthroughs in algorithms, significant increases in
computing power, and the advent of large-scale data acquisition. As a subset of artificial
intelligence (AI), deep learning has acted as a driving force behind strengthening the impact
of AI in different fields and enhancing its integration into daily life.
In 2006, Geoffrey Hinton and his student [1] showed that deep belief networks, stacks
of restricted Boltzmann machines, could be trained layer by layer in an unsupervised man-
ner and fine-tuned using supervised learning. This model addressed the vanishing gradient
problem and allowed for the practical training of multilayer neural networks. It laid the
foundation for subsequent advancements and marked the revival of deep learning. A land-
mark success came in 2012 when AlexNet [2] delivered groundbreaking results on images
from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), outperforming all
previous methods. This achievement highlighted the overwhelming capabilities of deep
convolutional neural networks (CNNs) and quickly attracted widespread attention from
both academia and industry.
Over the past decade, deep learning has undergone rapid growth and has driven
numerous breakthroughs in AI. Some notable early achievements include, but are
not limited to, the dropout scheme [3] introduced in 2013, recurrent neural networks
(RNNs) [4,5] and generative adversarial networks (GANs) [6] proposed in 2014,
GoogleNet [7] and the residual neural network (ResNet) [8] in 2015, and DeepMind's WaveNet
model [9] and AlphaGo model [10] in 2016. It should be noted that some of the papers
were introduced as preprints in one year but were formally published in a subsequent year,
so there may be a difference between the year of introduction and the year of publication.
The year 2016 marked the successful transition of deep learning from theoretical research
to practical applications, and more sophisticated models have since started to blossom.
Google's Transformer architecture [11], introduced in 2017, abandoned traditional
RNNs or CNNs in favor of a self-attention mechanism, substantially improving the perfor-
mance of sequence modeling tasks, such as natural language processing (NLP). Google’s
BERT model [12], introduced in 2018, and OpenAI’s GPT-3 model [13], introduced in 2019,
both demonstrated powerful text generation and comprehension capabilities. They fur-
ther advanced the application of large models in NLP. Between 2020 and 2021, a Google
team proposed the Vision Transformer (ViT) [14], successfully applying Transformers to
computer vision tasks and challenging the dominance of convolutional neural networks in
image recognition. The release of YOLOv4 [15] demonstrated the efficiency and accuracy
of deep learning in real-time image analysis. Additionally, GANs and their variants, such
as BigGAN [16] and StyleGAN2 [17], continued to improve performance in image and
video generation tasks. In 2022, the launch of ChatGPT [18] (developed by OpenAI in
San Francisco, CA, USA) created a global sensation, marking a major breakthrough in
human–computer interaction and dialogue systems and setting new standards for future
intelligent assistants and AI customer service. OpenAI also introduced several large-scale
models based on Transformers, such as DALL·E, which generates images from textual
descriptions, the cross-modal model CLIP, and the code generation model Codex, all of
which obtained considerable attention in their respective fields. In 2023, Meta (Menlo Park,
CA, USA) released Llama [19], a highly efficient and accessible family of language models.
It aims to achieve high performance with fewer computational resources to facilitate ad-
vanced AI research and applications. Llama’s emphasis on accessibility and open collabora-
tion may significantly affect the trajectory of AI development. In late 2023, not surprisingly,
Google (Mountain View, CA, USA) launched Gemini [20] as a competitor to ChatGPT.
Then, in 2024, OpenAI introduced Sora, a cutting-edge tool to convert text into video [21].
This technological breakthrough generated widespread excitement and amazement around
the world. Amid the ongoing astonishment, Sora quickly faced intense competition from
emerging tools such as Google's Veo 2 (developed by Google DeepMind, Mountain View,
CA, USA), Kuaishou's Kling (developed by Kuaishou, Beijing, China), and Runway
(developed by Runway AI, New York, NY, USA), among others. This rapid development indicates how
technology is advancing at an unprecedented pace in the AI era.
The aforementioned milestone events have tremendously influenced academia and
driven technological innovation and application across industries. For instance, companies
in the autonomous driving sector, like Tesla and Waymo, rely on deep learning algorithms
to enhance vehicle perception and decision-making [22,23]. In healthcare, deep learning
models are employed to analyze medical images and assist doctors in diagnosing dis-
eases [24]. With the ongoing technological evolution, the future of deep learning promises
broader applications and more exciting innovations and discoveries. It will continue to
fundamentally change how we live and work. It is particularly noteworthy that both the
2024 Nobel Prize in Physics and the 2024 Nobel Prize in Chemistry were awarded for
research work related to deep learning.
As deep learning and AI pervade nearly every field of engineering and science,
computer vision remains one of the key areas of application, which has been considerably
enhanced and expanded. Integrating AI with computer vision-based sensors and sensing
technologies has resulted in many groundbreaking advances, such as highly accurate
object detection [25], facial recognition [26], image segmentation [27], optical character
recognition [28], human pose estimation [29], and real-time 3D reconstruction [30,31]. These
are challenging to achieve with conventional methods due to the combined accuracy, speed,
simplicity, and efficiency required.
In 2022, we launched a Special Issue on the progress of AI in computer vision re-
search and applications, which focused on new vision-based sensors and measurement
technologies. The Special Issue featured 30 articles covering a wide range of methods and
applications. This editorial aims to briefly summarize these articles, along with providing
insights into the future development of related technologies. Furthermore, recognizing the
rapid advancement of technologies in this field, we launched a new edition of the Special
Issue to present the latest progress and innovations in AI-powered computer vision for
sensors, sensing, and measurement research, along with their applications in engineering.
2. Where We Are
The previous Special Issue published 30 high-quality articles spanning multiple fields
related to sensors and sensing technologies. Here, we provide a brief review and summary
of these contributions.
Felipe and colleagues (Contribution 6) highlighted how errors in the pitch angle can
lead to significant distortion in 3D scene reconstruction. They proposed a machine learning-
based approach relying on regression algorithms to estimate and correct these pitch angle
errors. They used a range of regression methods, including Linear Regression, Regression
Trees, Regression Forests, and Multi-Layer Perceptron, trained on a variety of input–output
pairs that capture different real-world situations. This helped the calibration process reduce
distortion, resulting in more accurate 3D scene reconstructions.
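As a rough illustration of this regression-based correction idea, the short sketch below trains a random-forest regressor on synthetic input–output pairs; the features, data, and model choice are illustrative assumptions and not the authors' actual setup.

```python
# Illustrative sketch of regression-based pitch-angle error estimation; the features
# and synthetic data are placeholders, not Contribution 6's actual inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic training pairs: image-derived features (e.g., disparity statistics,
# horizon-line slope) mapped to a pitch-angle error in degrees.
n_samples, n_features = 2000, 6
X = rng.normal(size=(n_samples, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

print(f"MAE of estimated pitch error: {mean_absolute_error(y_test, model.predict(X_test)):.3f} deg")
# The estimated error would then be subtracted from the nominal pitch angle
# before triangulating points for 3D reconstruction.
```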
The authors of Contribution 11 detected modifications in assembled printed circuit boards
(PCBs) using deep convolutional autoencoders. Their loss function focuses on analyzing
higher-level features to compare the original image with the autoencoder's output, which
enhances the ability to segment various structures and
components. They validated their approach through experiments using a dataset that
mimics real-world conditions, and they claimed that their model outperformed other
leading techniques in the field of anomaly segmentation for the tested scenarios.
In their research (Contribution 12), the authors introduced a streamlined network seg-
mentation model named SEMD, which aimed to precisely segment images of standing trees
against complex backgrounds. This model utilizes multi-scale fusion with DeepLabV3+
to reduce the loss of feature information and incorporates the MobileNet architecture to
enhance computational efficiency. Additionally, an attention mechanism known as SENet is
included to effectively capture essential features while filtering out irrelevant data. The ex-
perimental results indicate that the SEMD model achieves a Mean Intersection over Union
(MIoU) of 91.78% in simpler settings and 86.90% against more intricate backgrounds.
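For readers unfamiliar with the SENet mechanism mentioned above, the following is a minimal PyTorch sketch of a generic squeeze-and-excitation channel-attention block; the layer sizes are illustrative and not taken from the SEMD paper.

```python
# Minimal PyTorch sketch of a squeeze-and-excitation (SE) channel-attention block,
# the general mechanism SEMD incorporates (layer sizes here are illustrative).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(                      # excitation: per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                   # (B, C)
        w = self.fc(w).view(b, c, 1, 1)               # channel weights in [0, 1]
        return x * w                                  # reweight feature maps

feats = torch.randn(2, 64, 32, 32)                    # e.g., a MobileNet feature map
print(SEBlock(64)(feats).shape)                       # torch.Size([2, 64, 32, 32])
```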
Acknowledging the limitations of prior deep learning methods for instance segmentation,
the authors of Contribution 13 presented a new technique called Boundary Refine (BRefine), which
enhances segmentation quality and detail. This method leverages the FCN backbone for
segmentation, along with a multistage fusion mask head to boost mask resolution. It also
introduced BRank and sort loss (BR and S loss) to tackle segmentation inconsistencies and
improve boundary detection. In comparison with previous models like Mask R-CNN,
BRefine showed improvements of 3.0, 4.2, and 3.5 AP on the COCO, LVIS, and Cityscapes
datasets, respectively, with a further enhancement of 5.0 AP for large objects within the
COCO dataset.
The research documented in (Contribution 14) combined YOLOv5s for object detec-
tion with Deeplabv3+ for image segmentation to facilitate meter-reading extraction. The
YOLOv5s model first localizes the meter dial, followed by Deeplabv3+, which uses a Mo-
bileNetv2 backbone to effectively extract tick marks and pointers. The results demonstrated
that this methodology enables the YOLOv5s model to achieve an impressive mean average
precision of 99.58% (mAP50) on the dataset, along with a rapid detection time of 22.2 ms.
The detection transformer (DETR), which utilizes a Transformer-based framework
for object detection, has attracted significant interest due to its strong performance on
the COCO val2017 dataset. However, these models face challenges when applied to new
environments that lack labeled data. To tackle this issue, (Contribution 15) proposed
an unsupervised domain adaptive technique known as DINO with cascading alignment
(CA-DINO). The approach introduces attention-enhanced double discriminators (AEDD)
and weak category-level token restraints (WROT). AEDD aligns the local and global
contexts, while WROT extends the Deep CORAL loss to adjust class tokens after embedding.
Experimental results on two rigorous benchmarks indicated a 41% relative performance
boost compared to the baseline on the Foggy Cityscapes dataset.
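Because WROT extends the Deep CORAL loss, a minimal PyTorch sketch of that generic loss may help: it penalizes the squared Frobenius distance between source- and target-domain feature covariances. This is the standard formulation, not the paper's exact implementation.

```python
# Minimal PyTorch sketch of the (Deep) CORAL loss that CA-DINO's WROT component
# extends; generic formulation, not the paper's code.
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """source, target: (N, d) batches of features/class tokens from each domain."""
    d = source.size(1)

    def covariance(x: torch.Tensor) -> torch.Tensor:
        x = x - x.mean(dim=0, keepdim=True)
        return x.t() @ x / (x.size(0) - 1)

    cs, ct = covariance(source), covariance(target)
    return ((cs - ct) ** 2).sum() / (4 * d * d)       # squared Frobenius norm, scaled

src = torch.randn(32, 256)   # e.g., class tokens from labeled source images
tgt = torch.randn(32, 256)   # class tokens from unlabeled target (foggy) images
print(coral_loss(src, tgt))
```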
Deep learning techniques have also been employed to detect liquid contents. The
method outlined in (Contribution 16) focuses on identifying liquids within transparent
containers, which is beneficial for various specialized applications, including service robots,
pouring robots, security inspections, and industrial monitoring.
Rather than depending on conventional object detection techniques that utilize visible
imaging, the research in (Contribution 17) approaches the challenge of accurately detecting
weak infrared targets in complex environments while fulfilling the real-time detection
needs. The authors developed a Bottleneck Transformer architecture and implemented
CoordConv techniques to enhance detection performance. This methodology led to a
notable accuracy increase, achieving a mean Average Precision (mAP) of 96.7%, which
reflects a 2.2 percentage point improvement over YOLOv5s, outperforming other leading
detection algorithms.
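As background on the CoordConv idea used here, the sketch below appends normalized coordinate channels to the input of a convolution so the layer can exploit absolute position; shapes and layer choices are illustrative assumptions.

```python
# Minimal PyTorch sketch of the CoordConv idea: append normalized x/y coordinate
# channels before a convolution (shapes here are illustrative).
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))   # add coordinate channels

layer = CoordConv2d(3, 16, kernel_size=3, padding=1)
print(layer(torch.randn(1, 3, 64, 64)).shape)              # torch.Size([1, 16, 64, 64])
```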
In the realm of human pose estimation, heatmap-based strategies have been predom-
inant, offering high performance but facing difficulties in accurately detecting smaller
individuals. To remedy this, SSA Net (Contribution 18) proposes an innovative solution.
It employs HRNetW48 as a feature extractor and utilizes the TDAA module to bolster
the perception of smaller scales. SSA Net replaces traditional heatmap methods with
coordinate vector regression, attaining an impressive AP of 77.4% on the COCO Validation
and competitive scores on the Tiny Validation and MPII datasets, showcasing its capability
across different benchmarks.
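To make the distinction concrete, the minimal sketch below contrasts heatmap decoding with direct coordinate-vector regression; the dimensions and the tiny regression head are illustrative and do not reproduce SSA Net's actual design.

```python
# Minimal PyTorch sketch contrasting heatmap decoding with coordinate-vector
# regression; sizes and the tiny head are illustrative placeholders.
import torch
import torch.nn as nn

num_joints, feat_dim = 17, 256

# Heatmap route: predict a (H, W) map per joint and take the argmax location.
heatmaps = torch.rand(1, num_joints, 64, 48)
flat_idx = heatmaps.flatten(2).argmax(dim=2)               # (1, num_joints)
coords_hm = torch.stack([flat_idx % 48, flat_idx // 48], dim=2).float()

# Regression route: a small head maps pooled features directly to (x, y) pairs.
head = nn.Linear(feat_dim, num_joints * 2)
features = torch.randn(1, feat_dim)                        # pooled backbone features
coords_reg = head(features).view(1, num_joints, 2)         # normalized coordinates

print(coords_hm.shape, coords_reg.shape)
```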
2.4. Applications
Addressing mismatches in computer vision—especially during the matching of image
pairs—is essential due to the inherent geometric and radiometric disparities that often
exist between images. These discrepancies can undermine the reliability of matching re-
sults, consequently impacting the accuracy of various vision-related tasks. Recognizing
the limitations of supervised learning techniques and the challenges in accurate labeling,
the authors of Contribution 23 introduced an innovative method that employs deep rein-
forcement learning (DRL). They developed an unsupervised learning framework named
Unsupervised Learning for Mismatch Removal (ULMR). When compared to traditional
supervised and unsupervised learning methods as well as conventional handcrafted tech-
niques, ULMR shows enhanced precision, a higher retention of correct matches, and a
decrease in false matches.
In the realm of video surveillance and behavior recognition, deep learning has show-
cased its potential, particularly in the medical sector. The approach detailed in Contribution
24 offers a comprehensive system for behavior-based video summarization and visualiza-
tion, which aims to monitor and evaluate the health and well-being of dogs. This system
consists of multiple phases, such as video acquisition and preprocessing, object detection
and cropping, dog behavior detection, and the creation of visual summaries that illustrate
the dog’s location and behavioral patterns.
Figure 1. Strain determination of a thin plate with a hole under tensile loading using a deep learning
scheme. From left to right: the initial shape, the deformed shape, and the corresponding shear
strain map.
Figure 2. AI-powered dynamic motion tracking of points in a tensegrity structure. Unlike conven-
tional methods, the deep learning-based approach can accurately track all points of interest. The
images were captured at 6600 fps, and the frame interval of the six representative images shown here
is 200.
The third example demonstrates the ability of a deep learning approach to identify
mechanical vibration modes (Figure 3). In this pilot study, multiple vibration modes,
specifically, amplitude and phase information for each point within the region of interest,
were extracted from a high-speed video clip of a freely vibrating thin plate. To prepare the
training dataset, theoretical solutions [38] were applied to the same plate (with one side
fixed and the other three sides free) to generate simulated video frames. The preliminary
work utilizes a CNN and transformer architecture.
Figure 3. AI-powered analysis of free vibration modes in a thin plate. The images were captured at
a high speed of 14,000 fps. Presented are two typical frames along with 12 identified vibration modes.
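For background, the free vibration of a thin plate can be written in the classical Kirchhoff form, in which each mode contributes a spatial amplitude pattern and a phase; this textbook relation is stated here only to clarify what the network is trained to recover, not as the study's exact model:

```latex
% Classical Kirchhoff thin-plate free vibration (background only; not the study's exact model)
D\,\nabla^{4} w(x,y,t) + \rho h\,\frac{\partial^{2} w}{\partial t^{2}} = 0,
\qquad D = \frac{E h^{3}}{12\,(1-\nu^{2})},
\qquad
w(x,y,t) = \sum_{n} A_{n}\,\phi_{n}(x,y)\,\cos\!\bigl(\omega_{n} t + \theta_{n}\bigr)
```

Here, the mode shapes φn are set by the boundary conditions (one edge clamped, three edges free), and the per-point amplitude An φn(x, y) and phase θn of each mode are the quantities extracted from the high-speed video.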
The above three examples demonstrate that AI-powered visual sensors and sensing
have opened up exciting possibilities for unprecedented performance in numerous scientific
and engineering applications. However, there are several challenges that researchers
and engineers must address in order to fully realize the potential of these AI-powered
technologies. One of the primary challenges is the difficulty of capturing real-world datasets.
Visual sensors rely on large amounts of data to train deep learning models effectively, but
acquiring high-quality, diverse, and representative datasets can be time-consuming and
expensive, and often requires complex equipment. Moreover, in certain environments,
such as extreme conditions or hard-to-reach locations, gathering the necessary data may
not be feasible.
A natural way to deal with the real dataset preparation issue is to use theoretical
simulation datasets, as shown in the examples mentioned previously. Simulated datasets,
though less resource-intensive to generate, often struggle to capture the nuanced complex-
ities of real-world scenarios. For example, complex geometries and deformation fields
are difficult to rigorously simulate. As a result, AI models trained on such datasets may
struggle to generalize to new and unseen environments in real-world applications.
Additionally, there is no one-size-fits-all deep learning model for general sensing
applications, and identifying or designing the right model that performs well across a wide
range of tasks remains a major obstacle. Researchers often need to customize or fine-tune
models based on complex trials, which can lead to long research and development cycles.
Looking toward the future, several promising directions could help unlock the full
potential of AI-powered visual sensors. One such approach is the use of finite element method
(FEM) simulation to generate highly detailed synthetic datasets that represent various physical
problems. By combining FEM with AI techniques, it is possible to create datasets that reflect
a wide range of scenario factors, such as stress, deformation, temperature, velocity, and
electric and magnetic field intensities, which would be difficult or expensive to capture in
the real world.
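As one illustration of how such simulation outputs could be packaged for training, the sketch below warps a reference pattern by a displacement field to produce an image pair together with its ground-truth deformation; the pattern and field are synthetic placeholders, and a real pipeline would read the field from an FEM solver.

```python
# Illustrative sketch: turning a simulated displacement field (e.g., exported from
# an FEM solver) into an image pair for training a deformation-measurement network.
# The reference pattern and displacement field below are synthetic placeholders.
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)
H, W = 256, 256

reference = rng.random((H, W))                       # stand-in speckle pattern

yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
u = 0.02 * xx                                        # placeholder horizontal displacement
v = 0.05 * np.sin(2 * np.pi * yy / H)                # placeholder vertical displacement

# Deformed image: sample the reference at the displaced coordinates.
deformed = map_coordinates(reference, [yy - v, xx - u], order=1, mode="nearest")

# (reference, deformed) would be the network input; (u, v) or the strain fields
# derived from them would be the training target.
print(reference.shape, deformed.shape, u.mean(), v.mean())
```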
Dataset preparation may also utilize game engines like Unreal Engine and Unity [39] to
generate realistic and controlled datasets. These engines are capable of simulating diverse
environments with high accuracy. They offer flexibility in manipulating variables such as
lighting, object positioning, and texture (e.g., transparent and high-reflective objects, which
are typically hard to measure for many sensors), making them an invaluable tool for testing
and training visual sensors under a broad range of situations.
It is important to distinguish the aforementioned simulated data from the “synthetic data”
produced via generative AI, which often cannot safely be used to train another AI model [40].
In recent years, one of the fastest-growing engineering applications of vision-based
sensing is non-contact movement detection, deformation measurement, and the health
monitoring of structures over long distances. The biggest challenge in accurately extracting
the desired information lies in eliminating visual distortions caused by variations in air
density and thermal haze over long-range measurements, as shown in Figure 4a. These
image distortions introduced by atmospheric effects can severely compromise the accu-
racy of motion and deformation analysis and often lead to errors in data interpretation.
As AI-powered sensing systems are increasingly being integrated into key engineering
applications such as remote sensing, autonomous vehicles, and surveillance, eliminating in-
terference is critical to obtaining reliable input image data. This is a technically demanding
task with a high priority in the field.
Figure 4. AI should be capable of distinguishing disturbances from the physical quantities that
need to be measured. (a) Image distortion in long-range sensing caused by variations in air density
and thermal haze. Ten representative frames of a target are zoomed in for better visualization;
(b) The motions of the solar panels under wind force, as captured by an outdoor camera, include the
rigid-body motions of the camera itself.
While some progress has been made to reduce atmospheric distortion through hard-
ware and software solutions [41], much work remains. Future advancements in AI-powered
computer vision should include exploring algorithms capable of isolating and compen-
sating for such distortions. Solving this problem will not only improve the accuracy and
reliability of long-range motion and deformation measurements but also unlock new levels
of performance for broader applications. For instance, Figure 4b shows an application
where the images documented both the vibrations of the solar panels and the more signifi-
cant movements of the outdoor camera. To accurately characterize the vibrations of the
panels under wind conditions, it is crucial to isolate the effects of the camera’s motion.
Figure 5 shows two microscopy measurements we encountered in the research. The
measurements not only highlight the applications of visual sensing technology in fields
such as biology and microelectronics but also demonstrate the difficulties of conducting
Figure 5. Visual sensing for motion and strain measurements using microscopy. (a) Bacteria motion
tracking under an optical microscope. It is difficult to accurately track all the bacteria motions;
(b) Microelectronics strain measurement under a scanning electron microscope. It is difficult to
determine the full-field strain map.
to sequences via rastering, but the resulting models are extremely large, computationally
expensive black boxes.
These challenges are exacerbated for multimodal data. In order to make predictions
based on heterogeneous data types, model inputs must be embedded into a common space,
typically via modality-specific encoding layers. This is particularly effective for modes
with similar feature spaces (e.g., RGB and IR). Beyond these cases, AI-powered sensing has
the potential to integrate data from very heterogeneous modes, views, and resolutions to
decrease uncertainty and improve predictive power. Examples might include combining
data from acoustic sensors and cameras or from multiple cameras with non-overlapping
fields of view. How to realize these abilities without prohibitively increasing model size
and operational cost remains an open question.
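A minimal sketch of the common-embedding idea follows: modality-specific encoders map heterogeneous inputs into a shared space before fusion. The encoders, dimensions, and late-fusion scheme are illustrative assumptions rather than a reference architecture.

```python
# Minimal PyTorch sketch of the common-embedding idea: modality-specific encoders
# project heterogeneous inputs (an image and an acoustic spectrogram) into a shared
# space before fusion. All dimensions and encoders are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, x):
        return self.net(x)

image_enc = ModalityEncoder(in_dim=2048)     # e.g., pooled CNN features of a camera frame
audio_enc = ModalityEncoder(in_dim=512)      # e.g., pooled features of an acoustic sensor

img_feat = image_enc(torch.randn(4, 2048))
aud_feat = audio_enc(torch.randn(4, 512))

fused = torch.cat([img_feat, aud_feat], dim=1)       # simple late fusion in the shared space
head = nn.Linear(fused.size(1), 10)                  # downstream prediction head
print(head(fused).shape)                             # torch.Size([4, 10])
```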
One promising cost-reduction approach that can be inserted into existing data–model
pipelines is to incorporate an extremely lightweight, near-sensor model that only deter-
mines whether a feature of interest may be present. The data are passed to the downstream
large model for full, costly processing only if this feature is detected [46]. In addition,
several fundamentally new approaches to AI modeling have seen rapid growth over the
past few years, although, in addition to open research questions, hardware suitable for
these methods is not as mature as that for conventional deep learning.
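The near-sensor gating approach described above can be sketched as follows; both networks and the threshold are illustrative placeholders rather than any published design.

```python
# Sketch of near-sensor gating: a tiny model screens each frame, and only frames
# flagged as "interesting" are passed to the expensive downstream model.
import torch
import torch.nn as nn

tiny_gate = nn.Sequential(                 # lightweight near-sensor screener
    nn.Conv2d(3, 4, 3, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 1), nn.Sigmoid(),
)
big_model = nn.Sequential(                 # stand-in for a large downstream network
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(64, 1000),
)

def process(frame: torch.Tensor, threshold: float = 0.5):
    score = tiny_gate(frame)               # cheap "is anything of interest here?" check
    if score.item() < threshold:
        return None                        # drop the frame; no costly inference
    return big_model(frame)                # full processing only when warranted

out = process(torch.randn(1, 3, 224, 224))
print("skipped" if out is None else out.shape)
```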
Hyperdimensional computing [47] encodes data into high-dimensional, nearly or-
thogonal vector representations, which can be processed with a small library of simple
vector-algebraic operations. The benefits of this approach include the comparatively trans-
parent “reasoning” regarding the data, high robustness to data faults, a low computational
cost, rapid training, and the straightforward integration of multimodal data [48]. Neuro-
symbolic AI combines conventional deep learning for feature extraction with symbolic
logic for transparent, robust causal reasoning, and generalizability [49]. In particular, con-
ventional deep learning is purely correlative by construction, and models capable of causal
reasoning represent a significant advance.
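To make the hyperdimensional computing operations concrete, the toy sketch below uses random bipolar hypervectors with binding (elementwise multiplication), bundling (elementwise summation and sign), and cosine-similarity lookup; the dimensionality and the example record are illustrative.

```python
# Toy sketch of hyperdimensional computing with random bipolar hypervectors.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                           # hypervector dimensionality

def random_hv():
    return rng.choice([-1, 1], size=D)

def bind(a, b):                                      # associate two concepts
    return a * b

def bundle(*vs):                                     # superpose several bound pairs
    return np.sign(np.sum(vs, axis=0))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Encode a toy multimodal record: {sensor: camera, object: pedestrian}.
roles = {"sensor": random_hv(), "object": random_hv()}
fillers = {"camera": random_hv(), "pedestrian": random_hv(), "lidar": random_hv()}

record = bundle(bind(roles["sensor"], fillers["camera"]),
                bind(roles["object"], fillers["pedestrian"]))

# Query: which sensor produced this record? Unbind and compare to known fillers.
query = bind(record, roles["sensor"])
for name, hv in fillers.items():
    print(name, round(cosine(query, hv), 3))          # "camera" scores highest
```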
4. Closing Remarks
Recently, OpenAI announced the o3 model, which has significantly improved reason-
ing and problem-solving capabilities compared to previous models. The exceptional perfor-
mance of the experimental model on science and math tests has surprised researchers [50].
Meanwhile, a new startup, DeepSeek, revealed that the cost of the computing power re-
quired to train a large model can be a fraction of that demanded by existing models for
similar tasks [51], which has already spurred big tech corporations to reexamine their
assumption of unlimited computing power. As we were finalizing this editorial, Elon
Musk’s xAI released the Grok-3 chatbot to rival ChatGPT, DeepSeek, and Llama. Such intense
competition will undoubtedly drive rapid advancements and evolutions in generative AI.
Although not as hot as the field of generative AI at this time, the visual sensing
and perception domain is still attracting considerable attention. It can benefit from the
continuous research and development advances in generative AI. The future of AI-powered
visual sensors and sensing holds tremendous promise, with innovations poised to transform
industries across numerous fields.
We have launched the second edition of this Special Issue. This new edition aims to
show the latest progress and innovations in AI-powered computer vision for sensor, sensing,
and measurement research and applications. It is noteworthy that the scope of
this Special Issue encompasses a wide range of imaging techniques, including conventional
imaging, X-ray imaging, microscopy imaging, magnetic resonance imaging, ultrasound
imaging, acoustic imaging, thermal imaging, endoscopic imaging, hyperspectral imaging,
radar imaging, and infrared imaging. Our editorial team has been expanded to include
co-Guest Editors from a university, a federal research institution, a high-tech company, and
a military agency to ensure the diversity of perspectives. We welcome contributions from
our readers.
Author Contributions: Conceptualization, H.N., M.V., J.H. and Z.W.; methodology, Z.W.; software,
M.V. and Z.W.; validation, M.V. and J.H.; formal analysis, H.N. and Z.W.; investigation, H.N. and
M.V.; resources, J.H. and Z.W.; data curation, H.N. and Z.W.; writing—original draft preparation,
H.N. and Z.W.; writing—review and editing, H.N., M.V., J.H. and Z.W.; visualization, M.V. and Z.W.;
supervision, Z.W.; project administration, J.H. and Z.W. All authors have read and agreed to the
published version of the manuscript.
Acknowledgments: Z. Wang would like to thank Weidong Zhu of the University of Maryland,
Baltimore County, and Luo of the Catholic University of America for the collaborations on the
tensegrity study and bacterial chemotaxis projects, respectively. Part of the preliminary work pre-
sented in this editorial was carried out using a high-performance computing (HPC) server equipped
with NVIDIA H100 GPUs, funded by the United States Army Research Office under grant number
W911NF-23-1-0367.
List of Contributions
1. Ibrahem, H.; Salem, A.; Kang, H.S. RT-ViT: Real-Time Monocular Depth Estimation
Using Lightweight Vision Transformers. Sensors 2022, 22, 3849.
2. Liu, B.; Chen, K.; Peng, S.L.; Zhao, M. Adaptive Aggregate Stereo Matching Network
with Depth Map Super-Resolution. Sensors 2022, 22, 4548.
3. You, X.; Wang, Y.; Zhao, X. A Lightweight Monocular 3D Face Reconstruction Method
Based on Improved 3D Morphing Models. Sensors 2023, 23, 6713.
4. Wang, J.; Li, Y.; Ji, Y.; Qian, J.; Che, Y.; Zuo, C.; Chen, Q.; Feng, S. Deep Learning-Based
3D Measurements with Near-Infrared Fringe Projection. Sensors 2022, 22, 6469.
5. Wang, X.; Yuan, Y.; Liu, M.; Niu, Y. Iterated Residual Graph Convolutional Neural
Network for Personalized Three-Dimensional Reconstruction of Left Myocardium
from Cardiac MR Images. Sensors 2023, 23, 7430.
6. Felipe, J.; Sigut, M.; Acosta, L. Calibration of a Stereoscopic Vision System in the
Presence of Errors in Pitch Angle. Sensors 2022, 23, 212.
7. Han, Y.; Zheng, B.; Kong, X.; Huang, J.; Wang, X.; Ding, T.; Chen, J. Underwater Fish
Segmentation Algorithm Based on Improved PSPNet Network. Sensors 2023, 23, 8072.
8. Dang, T.V.; Tran, D.M.C.; Tan, P.X. IRDC-Net: Lightweight Semantic Segmentation Net-
work Based on Monocular Camera for Mobile Robot Navigation. Sensors 2023, 23, 6907.
9. Lu, K.; Cheng, J.; Li, H.; Ouyang, T. MFAFNet: A Lightweight and Efficient Network
with Multi-Level Feature Adaptive Fusion for Real-Time Semantic Segmentation.
Sensors 2023, 23, 6382.
10. Su, E.; Tian, Y.; Liang, E.; Wang, J.; Zhang, Y. A Multiscale Instance Segmentation
Method Based on Cleaning Rubber Ball Images. Sensors 2023, 23, 4261.
11. Candido de Oliveira, D.; Nassu, B.T.; Wehrmeister, M.A. Image-Based Detection of
Modifications in Assembled PCBs with Deep Convolutional Autoencoders. Sensors
2023, 23, 1353.
12. Shi, L.; Wang, G.; Mo, L.; Yi, X.; Wu, X.; Wu, P. Automatic Segmentation of Standing
Trees from Forest Images Based on Deep Learning. Sensors 2022, 22, 6663.
13. Yu, J.; Yang, X.; Zhou, S.; Wang, S.; Hu, S. BRefine: Achieving High-Quality Instance
Segmentation. Sensors 2022, 22, 6499.
14. Deng, G.; Huang, T.; Lin, B.; Liu, H.; Yang, R.; Jing, W. Automatic Meter Reading from
UAV Inspection Photos in the Substation by Combining YOLOv5s and DeeplabV3+.
Sensors 2022, 22, 7090.
15. Geng, H.; Jiang, J.; Shen, J.; Hou, M. Cascading Alignment for Unsupervised Domain-
Adaptive DETR with Improved DeNoising Anchor Boxes. Sensors 2022, 22, 9629.
16. Wu, Y.; Ye, H.; Yang, Y.; Wang, Z.; Li, S. Liquid Content Detection in Transparent
Containers: A Benchmark. Sensors 2023, 23, 6656.
17. Fan, X.; Ding, W.; Qin, W.; Xiao, D.; Min, L.; Yuan, H. Fusing Self-Attention and
CoordConv to Improve the YOLOv5s Algorithm for Infrared Weak Target Detection.
Sensors 2023, 23, 6755.
18. Li, S.; Zhang, H.; Ma, H.; Feng, J.; Jiang, M. SSA Net: Small Scale-Aware Enhancement
Network for Human Pose Estimation. Sensors 2023, 23, 7299.
19. Teixeira, E.; Araujo, B.; Costa, V.; Mafra, S.; Figueiredo, F. Literature Review on Ship
Localization, Classification, and Detection Methods Based on Optical Sensors and
Neural Networks. Sensors 2022, 22, 6879.
20. Shahzad, H.F.; Rustam, F.; Soriano Flores, E.; Vidal Mazón, J.L.; de la Torre Diez, I.;
Ashraf, I. A Review of Image Processing Techniques for Deepfakes. Sensors 2022, 22, 4556.
21. Mo, S.; Lu, P.; Liu, X. AI-Generated Face Image Identification with Different Color
Space Channel Combinations. Sensors 2022, 22, 8228.
22. Kang, S.; Kim, G.; Yoo, C.D. Fair Facial Attribute Classification via Causal Graph-
Based Attribute Translation. Sensors 2022, 22, 5271.
23. Deng, C.; Chen, S.; Zhang, Y.; Zhang, Q.; Chen, F. ULMR: An Unsupervised Learning
Framework for Mismatch Removal. Sensors 2022, 22, 6110.
24. Atif, O.; Lee, J.; Park, D.; Chung, Y. Behavior-Based Video Summarization System for
Dog Health and Welfare Monitoring. Sensors 2023, 23, 2892.
25. Patil, R.R.; Mustafa, M.Y.; Calay, R.K.; Ansari, S.M. S-BIRD: A Novel Critical Multi-
Class Imagery Dataset for Sewer Monitoring and Maintenance Systems. Sensors 2023,
23, 2966.
26. Fang, C.; Liu, J.; Han, P.; Chen, M.; Liao, D. FSVM: A Few-Shot Threat Detection
Method for X-ray Security Images. Sensors 2023, 23, 4069.
27. Kim, M.; Choi, H.C. Compact Image-Style Transfer: Channel Pruning on the Single
Training of a Network. Sensors 2022, 22, 8427.
28. Xu, H.; Wang, J.; Zhang, Y.; Zhang, G.; Xiong, Z. Subgrid Variational Optimized
Optical Flow Estimation Algorithm for Image Velocimetry. Sensors 2022, 23, 437.
29. Zaharia, C.; Popescu, V.; Sandu, F. Hardware–Software Partitioning for Real-Time
Object Detection Using Dynamic Parameter Optimization. Sensors 2023, 23, 4894.
30. Zhang, H.; Zhang, X.; Yu, D.; Guan, L.; Wang, D.; Zhou, F.; Zhang, W. Multi-Modality
Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action
Recognition. Sensors 2023, 23, 5414.
References
1. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
[CrossRef] [PubMed]
2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of
the 25th International Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 3–6 December 2012;
pp. 1097–1105.
3. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks
from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
4. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations
using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1724–1734.
5. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 28th Conference
on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
6. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial
Nets. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014;
pp. 2672–2680.
7. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with
Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA,
7–12 June 2015; pp. 1–9.
8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
9. van den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K.
WaveNet: A Generative Model for Raw Audio. In Proceedings of the 9th ISCA Speech Synthesis Workshop (SSW), Sunnyvale,
CA, USA, 13–15 September 2016; p. 125.
10. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam,
V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489.
[CrossRef]
11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You
Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA,
4–9 December 2017; pp. 5998–6008.
12. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understand-
ing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:
Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
13. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al.
Language Models are Few-Shot Learners. In Proceedings of the 34th Conference on Neural Information Processing Systems
(NeurIPS), Vancouver, BC, Canada, 6–12 December 2020; pp. 1877–1901.
14. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.;
Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International
Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021; pp. 1–17.
15. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
16. Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018,
arXiv:1809.11096.
17. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June
2020; pp. 8110–8119.
18. Achiam, O.J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.;
Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
19. Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al.
The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783.
20. Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini:
A Family of Highly Capable Multimodal Models. arXiv 2023, arXiv:2312.11805.
21. Liu, Y.; Zhang, K.; Li, Y.; Yan, Z.; Gao, C.; Chen, R.; Yuan, Z.; Huang, Y.; Sun, H.; Gao, J.; et al. Sora: A Review on Background,
Technology, Limitations, and Opportunities of Large Vision Models. arXiv 2024, arXiv:2402.17177.
22. Yeong, D.J.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review.
Sensors 2021, 21, 2140. [CrossRef]
23. Rana, K.; Khatri, N. Automotive intelligence: Unleashing the potential of AI beyond advanced driver-assistance system, a
comprehensive review. Comput. Electr. Eng. 2024, 117, 109237. [CrossRef]
24. Bajwa, J.; Munir, U.; Nori, A.; Williams, B. Artificial intelligence in healthcare: Transforming the practice of medicine. Future
Healthc. J. 2021, 8, e188–e194. [CrossRef]
25. Ye, M.; Ke, L.; Li, S.; Tai, Y.W.; Tang, C.K.; Danelljan, M.; Yu, F. Cascade-DETR: Delving into High-Quality Universal Object Detec-
tion. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–10 October 2023;
pp. 6704–6714.
26. Kortli, Y.; Jridi, M.; Al Falou, A.; Atri, M. Face Recognition Systems: A Survey. Sensors 2020, 20, 342. [CrossRef] [PubMed]
27. Ma, J.; He, Y.; Li, F.; Han, L.J.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654. [CrossRef]
[PubMed]
28. Li, M.; Lv, T.; Chen, J.; Cui, L.; Lu, Y.; Florencio, D.; Zhang, C.; Li, Z.; Wei, F. TrOCR: Transformer-Based Optical Character Recognition
with Pre-trained Models. Proc. AAAI Conf. Artif. Intell. 2023, 37, 13094–13102. [CrossRef]
29. Xu, J.; Guo, Y.; Peng, Y. FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models. In Proceedings
of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024;
pp. 561–570.
30. Nguyen, H.; Novak, E.; Wang, Z. Accurate 3D reconstruction via fringe-to-phase network. Measurement 2022, 190, 110663.
[CrossRef]
31. Mu, F.; Sifferman, C.; Jungerman, S.; Li, Y.; Han, M.; Gleicher, M.; Gupta, M.; Li, Y. Towards 3D Vision with Low-Cost Single-
Photon Cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA,
USA, 17–21 June 2024; pp. 5302–5311.
32. Zhang, H.; Ba, Y.; Yang, E.; Mehra, V.; Gella, B.; Suzuki, A.; Pfahnl, A.; Chandrappa, C.C.; Wong, A.; Kadambi, A. WeatherStream:
Light Transport Automation of Single Image Deweathering. In Proceedings of the 2023 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 13499–13509.
33. Ramazzina, A.; Bijelic, M.; Walz, S.; Sanvito, A.; Scheuble, D.; Heide, F. ScatterNeRF: Seeing Through Fog with Physically-Based
Inverse Neural Rendering. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris,
France, 1–6 October 2023; pp. 17911–17922.
34. Muskhelishvili, N. Some Basic Problems of the Mathematical Theory of Elasticity, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1977.
35. Fung, Y.; Tong, P. Classical and Computational Solid Mechanics; World Scientific: Singapore, 2001.
36. Altenbach, H.; Bogdanov, V.; Grigorenko, A.; Kushnir, R.; Nazarenko, V.; Eremeyev, V. Selected Problems of Solid Mechanics and
Solving Methods; Advanced Structured Materials; Springer Nature: Cham, Switzerland, 2024.
37. Karaev, N.; Makarov, I.; Wang, J.; Neverova, N.; Vedaldi, A.; Rupprecht, C. CoTracker3: Simpler and Better Point Tracking by
Pseudo-Labelling Real Videos. arXiv 2024, arXiv:2410.11831.
38. Kreyszig, E.; Kreyszig, H.; Norminton, E.J. Advanced Engineering Mathematics, 10th ed.; John Wiley & Sons: Hoboken, NJ,
USA, 2011.
39. Ciekanowska, A.; Kiszczak Gliński, A.; Dziedzic, K. Comparative Analysis of Unity and Unreal Engine Efficiency in Creating
Virtual Exhibitions of 3D Scanned Models. J. Comput. Sci. Inst. 2021, 20, 247–253. [CrossRef]
40. Alemohammad, S.; Casco-Rodriguez, J.; Luzi, L.; Humayun, A.I.; Babaei, H.; LeJeune, D.; Siahkoohi, A.; Baraniuk, R.G. Self-
Consuming Generative Models Go MAD. In Proceedings of the Twelfth International Conference on Learning Representations
(ICLR), Vienna, Austria, 7–11 May 2024.
41. Liu, Y.; Yu, L.; Wang, Z.; Pan, B. Neutralizing the impact of heat haze on digital image correlation measurements via deep
learning. Opt. Lasers Eng. 2023, 164, 107522. [CrossRef]
42. Ranftl, R.; Bochkovskiy, A.; Koltun, V. Vision Transformers for Dense Prediction. In Proceedings of the IEEE/CVF International
Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 12179–12188.
43. Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks are Universal Approximators. Neural Netw. 1989,
2, 359–366. [CrossRef]
44. Kidger, P.; Lyons, T. Universal Approximation with Deep Narrow Networks. In Proceedings of the 33rd Conference on Learning
Theory (COLT), Virtual Event, 9–12 July 2020; pp. 2306–2327.
45. Pilz, K.F.; Mahmood, Y.; Heim, L. AI’s Power Requirements Under Exponential Growth: Extrapolating AI Data Center Power Demand
and Assessing Its Potential Impact on U.S. Competitiveness; RAND Corporation: Santa Monica, CA, USA, 2025.
46. Rezvani, A.; Huang, W.; Chen, H.; Ni, Y.; Imani, M. Self-Trainable and Adaptive Sensor Intelligence for Selective Data Generation.
Front. Artif. Intell. 2024, 7, 1403187. [CrossRef] [PubMed]
47. Kanerva, P. Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional
Random Vectors. Cogn. Comput. 2009, 1, 139–159. [CrossRef]
48. Abdulrahman, M.; Wasif, S.; Wael, M.; Azab, E.; Mashaly, M.; Abd El Ghany, M.A.A. A Review on Hyperdimensional
Computing. In Proceedings of the 2023 International Conference on Microelectronics (ICM), Abu Dhabi, United Arab Emirates,
17–20 December 2023; pp. 74–79.
49. Colelough, B.C.; Regli, W. Neuro-Symbolic AI in 2024: A Systematic Review. arXiv 2025, arXiv:2501.05435.
50. Jones, N. How should we test AI for human-level intelligence? OpenAI’s o3 electrifies quest. Nature 2025, 637, 774–775. [CrossRef]
51. DeepSeek-AI; Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; et al. DeepSeek-R1: Incentivizing
Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2025, arXiv:2501.12948.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.