
2024 7th International Conference on Advanced Algorithms and Control Engineering (ICAACE)

Aerial Image Matching Algorithm Based on Edge Boxes

Qunzhong Fang*, Baoshu Xu, Chengshuo Zhang, Shiying Jin
School of Information Science and Engineering, Shenyang University of Technology, Shenyang, China
*[email protected]

979-8-3503-6144-5/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICAACE61206.2024.10548455
Abstract—In the process of precisely locating targets using reconnaissance images from unmanned aerial vehicles (UAVs), the effectiveness of feature extraction is often hindered by rotation and perspective changes in UAV images. To address this issue, this study proposes an image matching method based on object block detection, dividing the matching problem into two key stages. In the object block detection stage, the Edge Boxes algorithm is employed to detect objects in the images. In the matching stage, the HardNet network is utilized for feature extraction from the detected object blocks, followed by similarity measurement. Experimental results demonstrate that this method achieves accurate image matching, overcoming challenges such as object rotation and small-object extraction in UAV images. The proposed approach provides a feasible and effective means for UAV target localization.

Keywords: Aerial Image Matching; Edge Boxes Algorithm; HardNet Algorithm; Similarity Measurement

I. INTRODUCTION

In recent years, unmanned aerial vehicles (UAVs) have demonstrated outstanding performance in several high-tech regional conflicts, garnering significant attention worldwide. Confronted with the urgent need for precision strikes in modern information warfare, the development of military UAV technology has become crucial. In this context, the use of UAV reconnaissance images for precise target localization [1] has emerged as a key technology, as illustrated in Figure 1.

Figure 1: UAV Target Matching and Localization

Due to the complicated flight characteristics, high-altitude operations, and variable imaging conditions intrinsic to unmanned aerial vehicles [2], reconnaissance images taken by UAVs will inevitably show variations in lighting, scale, and angle in comparison to target reference images. Accurate matching of UAV images with target reference images is therefore crucial for achieving precise localization of the target. To achieve high accuracy and resilience, it is essential to develop image matching algorithms with superior robustness to changes in image scale, rotation, and lighting.

Image matching algorithms are designed to determine the similarity or degree of correspondence between two or more images. Typically, these algorithms compare one image (referred to as a reference image or template) with another image (referred to as a search image or scene) to ascertain their similarity [3]. Traditional image feature extraction relies on low-level information such as grayscale values, edges, and colors, which often struggles to provide an abstract representation of the image [3]. Conventional feature extraction methods (such as SIFT [4] and SURF [5]) require downsampling of the entire image. However, due to the small size of target images from the UAV perspective, downsampling leads to indistinct features, making extraction more challenging and resulting in suboptimal matching performance. In contrast, deep learning methods can mimic the human brain's ability to learn abstract high-level features from low-level features in images. These features can be recombined and applied to different scenarios, giving deep learning neural networks a natural advantage in image feature extraction [6].

Deep learning-based image matching methods such as SuperPoint [7] directly extract features from the entire image and perform matching through SuperGlue [8] or LightGlue [9]. While these methods demonstrate satisfactory matching performance for larger objects in images, they face challenges in achieving precise matching for smaller objects, particularly under the UAV perspective, where the features may not be as distinct. The dynamic and random uncertainties associated with UAV flight altitude, flight posture, and image acquisition conditions further exacerbate difficulties in feature extraction and matching, significantly impacting the accuracy of image matching.
To address these challenges, this paper proposes an image matching method based on object block detection. The feature extraction process is divided into two steps: first, object block detection is performed on the entire image, and then feature extraction is applied to the detected object blocks. Finally, similarity measurement is employed to detect matched object blocks in the image pairs.

II. OBJECT BLOCK MATCHING

This paper transforms the global image matching problem into a local object block matching problem within the image. The overall image matching process uses the Edge Boxes [10] algorithm for object detection in the image. The detected object blocks are then processed by the HardNet [11] network to extract deep features. Finally, a similarity measurement is employed to detect matching object blocks in the image.

A. Extraction of Object Blocks

The main idea of the Edge Boxes algorithm is to extract a series of object candidate boxes based on edge information, facilitating the acceleration of existing object detection algorithms. The algorithm initially employs the structured edge detection algorithm from reference [12] to obtain the edge response map of the image. Subsequently, multiple sets of edge fragments are formed by combining edge points based on spatial and directional constraints. Following this, each edge fragment is assigned a weight based on its spatial geometric relationship with a candidate box, and the candidate boxes are scored according to the weights of these edge fragments. The specific steps of the algorithm are summarized as follows:

Step 1: Input the image and use the edge detection method from reference [12] to obtain the edge response map through non-maximum suppression processing.

Step 2: Combine edge points based on spatial adjacency relationships and directional constraints to generate multiple edge fragments.

Step 3: Calculate the similarity between any two edge fragments, with the calculation formula as follows:

a(s_i, s_j) = cos(θ_i − θ_ij) · cos(θ_j − θ_ij)    (1)

where s_i and s_j are two edge fragments, θ_i and θ_j are the average orientation angles of the two sets of edge points, and θ_ij is the angular difference between the two sets of edges.

Step 4: Given a candidate box, calculate the weight of each edge fragment with respect to the box. This can be divided into four categories: edge fragments entirely within the candidate box are assigned a weight of 1; edge fragments intersecting the boundaries of the candidate box are assigned a weight of 0; edge fragments with both the geometric center and all edge points outside the candidate box are assigned a weight of 0; for the remaining edge fragments with edge points outside the box but their geometric center inside, the weight is calculated as follows:

ϖ_b(s_i) = 1 − max_T ∏_{j=1}^{|T|−1} a(t_j, t_{j+1})    (2)

where T represents a path formed by extending the edge fragment into the interior of the candidate box until it intersects the boundaries of the box. If no such path exists, the weight is assigned as 1.

Step 5: Aggregate all edge fragments associated with the candidate box and calculate the final score of the candidate box. The scoring method is defined as follows:

h_b = ( Σ_i ϖ_b(s_i) · m_i ) / ( 2(b_w + b_h) )^κ    (3)

h_b^in = h_b − ( Σ_{p ∈ b^in} m_p ) / ( 2(b_w + b_h) )^κ    (4)

where m_i is the length of edge fragment s_i, b_w and b_h are the width and height of the candidate box, and b^in is the set of edge fragments lying in the interior of the candidate box. The value of κ is 1.5, which is used to counteract the impact of larger rectangular boxes containing more edge fragments. Reference [15] found in experiments on general object candidate box extraction that edge fragments located inside the object often do not contribute significantly to the edges on the object's contour. Therefore, the final score of the candidate box is represented by h_b^in, calculated from Equation (4).
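To make the scoring step concrete, the following is a minimal numerical sketch of Equations (3) and (4), assuming the fragment weights ϖ_b(s_i) from Equation (2) and the fragment lengths m_i have already been computed; the function name and argument layout are illustrative and not part of the original implementation.

```python
import numpy as np

def candidate_box_score(weights, lengths, inner_lengths, box_w, box_h, kappa=1.5):
    """Score one candidate box from its associated edge fragments (Eqs. 3-4)."""
    norm = (2.0 * (box_w + box_h)) ** kappa                            # penalises larger boxes
    h_b = np.sum(np.asarray(weights) * np.asarray(lengths)) / norm     # Eq. (3)
    h_b_in = h_b - np.sum(inner_lengths) / norm                        # Eq. (4): subtract interior edges
    return h_b_in
```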
As objects appear relatively small from the perspective of a UAV, it is necessary to exclude detected overly large objects. Thus, size constraints are applied to the detected candidate boxes. Figure 2 illustrates the candidate boxes with the top 200 response values detected by the Edge Boxes algorithm.

Figure 2: Object Blocks Detected by the Edge Boxes Algorithm
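In practice, this proposal-extraction stage can be reproduced with the structured edge detection [12] and Edge Boxes [10] implementations in OpenCV's contrib modules. The sketch below is one possible realization under the assumptions that opencv-contrib-python is installed, that "model.yml.gz" is the pretrained structured-edge model distributed with OpenCV's extra data, and that the relative-area threshold used for the size constraint is an illustrative value rather than the paper's setting.

```python
import cv2
import numpy as np

def extract_object_blocks(image_bgr, edge_model_path="model.yml.gz",
                          max_boxes=200, max_rel_area=0.05):
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0

    # Structured edge detection, orientation map, and non-maximum suppression (Step 1)
    detector = cv2.ximgproc.createStructuredEdgeDetection(edge_model_path)
    edges = detector.detectEdges(rgb)
    orientation = detector.computeOrientation(edges)
    edges_nms = detector.edgesNms(edges, orientation)

    # Edge Boxes proposal generation (Steps 2-5 are handled internally by OpenCV)
    edge_boxes = cv2.ximgproc.createEdgeBoxes()
    edge_boxes.setMaxBoxes(max_boxes)  # keep the top-200 scoring boxes, as in Fig. 2
    result = edge_boxes.getBoundingBoxes(edges_nms, orientation)
    boxes = result[0] if isinstance(result, tuple) else result  # return type differs across OpenCV versions

    # Size constraint: discard boxes that are too large for a UAV-view target
    h, w = image_bgr.shape[:2]
    return [(x, y, bw, bh) for (x, y, bw, bh) in boxes
            if bw * bh <= max_rel_area * w * h]
```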
B. Matching network of HardNet

When substantial non-linear variations exist between two images, the stability of feature description cannot be guaranteed by methods that compute local gradient information, as employed by SIFT [13]. Particularly in scenarios characterized by blurred imagery, such as those encountered in unmanned aerial vehicle reconnaissance, manually designed feature descriptors exhibit limited expressive capability. Deep learning, and in particular convolutional neural networks (CNNs), has increasingly demonstrated superior performance in feature extraction [14]. The research focus in image feature extraction has therefore gradually shifted from SIFT features to CNN features.

This study employs the HardNet [11] network to extract deep features from object patches. The feature training is conducted on the publicly available Brown dataset [15], with the network taking extracted object patches as inputs and producing the corresponding patch descriptors as outputs. During training, a set of corresponding object patches is fed into the network simultaneously, and individual feature descriptors are generated by the feature extraction network. The loss function is constructed from the object patch labels, aiming to minimize the feature distances between matching object patches and maximize the distances between non-matching object patches. Experimental results indicate that descriptors learned by HardNet outperform traditionally handcrafted descriptors such as SIFT and SURF in the context of image matching. The network's ability to generate discriminative features improves performance in matching object patches, especially in scenarios where traditional methods struggle, such as significant deformations or variations in illumination.

1) HardNet network architecture

The HardNet network consists of 7 convolutional layers, as illustrated in Figure 3. The network takes grayscale object patches of a fixed size (32×32) as input. Zero-padding is applied to all convolutional layers except the last one. Each layer, except the final one, is followed by batch normalization and ReLU activation. The output of the network undergoes L2 normalization, yielding 128-dimensional descriptors of unit length as the final output of the network.

Figure 3 HardNet network structure
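A minimal PyTorch sketch of such a descriptor network is given below. It follows the description above (seven convolutional layers, 32×32 grayscale input, batch normalization and ReLU after every layer except the last, L2-normalized 128-dimensional output); the exact channel widths, strides, and dropout rate are taken from the original HardNet paper [11] and should be treated as assumptions, since they are not listed in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HardNet(nn.Module):
    def __init__(self):
        super().__init__()
        def block(cin, cout, stride=1):
            # 3x3 convolution with zero-padding, followed by BN and ReLU
            return [nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
                    nn.BatchNorm2d(cout, affine=False), nn.ReLU()]
        self.features = nn.Sequential(
            *block(1, 32), *block(32, 32),
            *block(32, 64, stride=2), *block(64, 64),
            *block(64, 128, stride=2), *block(128, 128),
            nn.Dropout(0.3),
            nn.Conv2d(128, 128, kernel_size=8, bias=False),  # 7th conv: no padding, maps 8x8 -> 1x1
            nn.BatchNorm2d(128, affine=False),
        )

    def forward(self, patches):            # patches: (B, 1, 32, 32) grayscale
        x = self.features(patches).view(patches.size(0), -1)
        return F.normalize(x, p=2, dim=1)  # unit-length 128-d descriptors

# Usage: HardNet()(torch.randn(8, 1, 32, 32)) returns a (8, 128) tensor of descriptors.
```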
2) HardNet loss function

The objective of the loss function is to ensure that the distance between matching pairs is significantly smaller than the distance between non-matching pairs. Consider two images, denoted as A and P, representing a training batch comprising 2n image patches, where n denotes the number of matching pairs and each patch has a corresponding feature descriptor. The elements of the distance matrix are computed through the following formula:

d(a_i, p_j) = √(2 − 2 · a_i · p_j)    (5)

Figure 4 illustrates the sampling process of the training set. For each matching pair of descriptors, the closest non-matching descriptors are designated as follows:

a_{k_min} is the closest non-matching descriptor to p_i, where k_min = arg min_{k = 1, 2, ..., n, k ≠ i} d(a_k, p_i);

p_{j_min} is the closest non-matching descriptor to a_i, where j_min = arg min_{j = 1, 2, ..., n, j ≠ i} d(a_i, p_j).

Figure 4 Training set sampling process

Each matching pair can therefore generate a quadruplet (a_i, p_i, p_{j_min}, a_{k_min}), from which a triplet is formed according to the following rule:

β = (a_i, p_i, p_{j_min}) if d(a_i, p_{j_min}) < d(a_{k_min}, p_i), otherwise (a_i, p_i, a_{k_min})    (6)

For each training batch, n triplets are generated, and the loss function is as follows:

L = (1/n) Σ_{i=1}^{n} max(0, 1 + d(a_i, p_i) − min(d(a_i, p_{j_min}), d(a_{k_min}, p_i)))    (7)

where min(d(a_i, p_{j_min}), d(a_{k_min}, p_i)) is the distance to the nearest negative sample.

The loss therefore minimizes the distance between matching descriptors in the two images while maximizing the distance between non-matching descriptors; in other words, it minimizes the distance for positive pairs and maximizes the distance to the nearest negative sample.
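The loss in Equations (5)-(7) amounts to hardest-in-batch negative mining over the pairwise distance matrix. A compact sketch consistent with the formulas above, assuming the descriptors are already L2-normalized as produced by the network sketch earlier, is:

```python
import torch

def hardnet_loss(a, p, margin=1.0):
    """a, p: (n, 128) L2-normalised descriptors; row i of a matches row i of p."""
    n = a.size(0)
    # Pairwise distance matrix, Eq. (5): d(a_i, p_j) = sqrt(2 - 2 a_i . p_j)
    dist = torch.sqrt(torch.clamp(2.0 - 2.0 * a @ p.t(), min=1e-8))

    # Mask the diagonal (matching pairs) so the minima below pick non-matching descriptors only
    eye = torch.eye(n, dtype=torch.bool, device=a.device)
    d_pos = dist.diagonal()                                            # d(a_i, p_i)
    d_a_to_p = dist.masked_fill(eye, float("inf")).min(dim=1).values   # d(a_i, p_{j_min})
    d_p_to_a = dist.masked_fill(eye, float("inf")).min(dim=0).values   # d(a_{k_min}, p_i)

    # Eqs. (6)-(7): compare against the closer of the two hardest negatives
    d_neg = torch.min(d_a_to_p, d_p_to_a)
    return torch.clamp(margin + d_pos - d_neg, min=0.0).mean()
```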
C. Similarity measure

The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a widely used measure of linear correlation between two variables and is commonly employed to quantify the degree of linear correlation between two samples [16]. For two samples A and B, the Pearson correlation coefficient is defined as:

ρ(A, B) = cov(A, B) / (σ_A · σ_B)    (8)

Here, cov(A, B) denotes the covariance between A and B, and σ_A and σ_B represent the standard deviations of A and B, respectively. The Pearson correlation coefficient ranges from −1 to 1, where 1 indicates perfect positive correlation, −1 indicates perfect negative correlation, and 0 indicates no correlation.
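For two descriptors treated as samples A and B, Equation (8) can be computed directly; the short sketch below is equivalent to NumPy's built-in np.corrcoef and is included only to make the measure explicit.

```python
import numpy as np

def pearson_similarity(desc_a, desc_b):
    a = np.asarray(desc_a, dtype=np.float64)
    b = np.asarray(desc_b, dtype=np.float64)
    cov = np.mean((a - a.mean()) * (b - b.mean()))   # cov(A, B)
    return cov / (a.std() * b.std())                 # rho(A, B) = cov / (sigma_A * sigma_B)

# Equivalent check: np.corrcoef(desc_a, desc_b)[0, 1]
```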
III. EXPERIMENTS AND RESULTS ANALYSIS

In this section, we detail the experiments conducted to evaluate the effectiveness of the proposed matching algorithm on unmanned aerial vehicle (UAV) images. The method involves object block extraction using the Edge Boxes algorithm, resizing the extracted blocks and the target image to a uniform size of 32×32, and concatenating them along columns. The concatenated image, with a width of 32 and a height of 32×(number of object blocks + 1), is then fed into the HardNet matching network for feature extraction, and each object block yields a 128-dimensional descriptor. Similarity is then computed between the descriptors of the extracted object blocks and that of the target image; the results are sorted, and the block with the maximum similarity value is selected.
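The following sketch ties the stages of this pipeline together using the hypothetical helpers introduced in the earlier sketches (extract_object_blocks, HardNet, and pearson_similarity); for simplicity, the 32×32 patches are passed to the network as a batch rather than as the concatenated 32×(N+1) strip described above, and the crop handling is illustrative.

```python
import cv2
import numpy as np
import torch

def match_target(uav_image, target_image, model):
    boxes = extract_object_blocks(uav_image)                       # Edge Boxes proposals
    crops = [cv2.resize(uav_image[y:y + h, x:x + w], (32, 32)) for (x, y, w, h) in boxes]
    crops.append(cv2.resize(target_image, (32, 32)))               # target patch goes last

    gray = np.stack([cv2.cvtColor(c, cv2.COLOR_BGR2GRAY) for c in crops]).astype(np.float32)
    batch = torch.from_numpy(gray).unsqueeze(1) / 255.0            # (N+1, 1, 32, 32)
    with torch.no_grad():
        desc = model(batch).numpy()                                # one 128-d descriptor per patch

    target_desc = desc[-1]
    scores = [pearson_similarity(d, target_desc) for d in desc[:-1]]
    best = int(np.argmax(scores))                                  # maximum-similarity block
    return boxes[best], scores[best]
```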
A. Scale Analysis

In this subsection, we analyze the performance of the proposed matching algorithm at varying distances. The experiments involve collecting images from a UAV at the same perspective, with distances ranging from far to near, all images having a consistent size of 1920×1080 pixels.

Table 1 shows the matching effect:

Table 1 Image matching effect at different distances

Distance (m) (target size)    Maximum box selection    Top-five box selection
200 (40×20 pixels)            False                    True
150 (60×30 pixels)            True                     True
100 (80×40 pixels)            True                     True
60 (110×50 pixels)            True                     True
20 (150×80 pixels)            True                     True
10 (200×100 pixels)           True                     True

Specifically, at a distance of approximately 200 meters the target object becomes too small, resulting in unclear contour information and challenging matching. At a distance of 150 meters, accurate matching is achieved. The size of the object to be matched is 40×20 pixels, and according to the COCO dataset definition, objects smaller than 32×32 pixels are considered small targets; therefore, the algorithm in this paper is well suited to matching small targets.

In each group, the top left corner of the picture is the target image, and the matching results are shown in Figure 5.

Figure 5 Effect of image matching at different distances: (a) distance of 200 meters; (b) distance of 60 meters

B. Rotation Angle Analysis

In this subsection, we analyze the performance of the proposed matching algorithm under different rotation angles. The experiment involved the drone collecting images at different distances with constantly changing angles, all images having a consistent size of 1920×1080 pixels.

The matching effect is shown in Table 2:

Table 2 Image matching effect at different angles

Angle    Target image distance (m)    Maximum box selection    Top-five box selection
−60°     200                          False                    False
−60°     120                          True                     True
−60°     80                           True                     True
−60°     40                           True                     True
−30°     200                          True                     True
−30°     120                          True                     True
−30°     80                           True                     True
−30°     40                           True                     True
30°      200                          True                     True
30°      120                          True                     True
30°      80                           True                     True
30°      40                           True                     True
60°      200                          False                    True
60°      120                          True                     True
60°      80                           True                     True
60°      40                           True                     True

In each group, the top left corner of the picture is the target image, and the matching results are shown in Figure 6.

Figure 6 Image matching effect at different angles: (a) matched image rotated right by 60°; (b) matched image rotated left by 30°
C. Comparison of traditional and deep learning methods

The algorithm proposed in this paper performs object block matching directly, without the need for image downsampling; it can therefore facilitate the extraction and matching of features for small targets. In contrast, traditional methods such as SIFT and SURF, as well as deep learning methods like SuperPoint, rely on extracting image feature points and subsequently matching these points. When the target is at a considerable distance, downsampling diminishes the salience of target features, making extraction more challenging, and both traditional methods and deep learning approaches encounter difficulties in achieving precise matching and localization of specified targets from the UAV perspective. To facilitate a comparison with the algorithm proposed in this paper, a test image set was constructed under close-proximity conditions, allowing the evaluation of various algorithms currently in use. The detailed results are presented in Table 3:

Table 3 Average matching accuracy on the close-proximity test image set

Algorithm                  Accuracy (%)
SIFT + RANSAC              72
SURF + RANSAC              75
SuperPoint + LightGlue     69
SuperPoint + SuperGlue     65
Ours                       85

By comparing the algorithms on the test image set constructed under close-proximity conditions, it can be concluded that the algorithm proposed in this paper outperforms the other algorithms even at close range. Overall, the proposed algorithm surpasses current traditional and deep learning matching methods in the UAV perspective.

IV. CONCLUSION

This paper proposes a two-stage image matching algorithm. First, it utilizes the Edge Boxes algorithm to detect object blocks within the image. Subsequently, the HardNet network is employed to extract deep features from the detected object blocks. Finally, similarity measurement is applied to identify matching object blocks in pairs of images. Compared with current traditional and deep learning matching methods, this approach achieves higher accuracy in image matching and demonstrates precise matching even under varying sizes and angles. In summary, the algorithm presented in this paper can be further applied to computer vision tasks requiring high precision.

REFERENCES

[1] Wang, Y., Li, H., & Zhang, L. (2020). Research on the method of precise target localization based on UAV reconnaissance images. Journal of Geo-Information Science, 20(6), 1079-1088.
[2] Zhang, B., Liao, J., Kuang, Y., Zhang, M., Zhou, S., & Kang, Y. (2020). Research status and development trend of the United States UAV swarm battlefield. Aero Weaponry, 27(6), 7-12.
[3] Ma, J., Jiang, X., Fan, A., et al. (2020). Image matching from handcrafted to deep features: A survey. International Journal of Computer Vision, 129(1).
[4] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2).
[5] Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. Proceedings of the 9th European Conference on Computer Vision. Graz: Springer, 404-417.
[6] Zhao, X. (2023). Research on image recognition technology based on deep learning algorithm. Modeling and Simulation.
[7] DeTone, D., et al. (2018). SuperPoint: Self-supervised interest point detection and description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[8] Sarlin, P.-E., et al. (2020). SuperGlue: Learning feature matching with graph neural networks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4937-4946.
[9] Lindenberger, P., et al. (2023). LightGlue: Local feature matching at light speed. arXiv:2306.13643.
[10] Zitnick, C. L., & Dollár, P. (2014). Edge Boxes: Locating object proposals from edges. European Conference on Computer Vision (ECCV). Springer International Publishing. DOI: 10.1007/978-3-319-10602-1_26.
[11] Mishchuk, A., Mishkin, D., Radenović, F., & Matas, J. (2017). Working hard to know your neighbor's margins: Local descriptor learning loss. NIPS 2017.
[12] Dollár, P., & Zitnick, C. L. (2013). Structured forests for fast edge detection. Proceedings of the IEEE International Conference on Computer Vision, 1841-1848.
[13] Ma, C., Liu, Y., & Zhang, Y. (2022). Robust feature description for non-linear image variations based on scale invariant deep convolutional neural networks. Pattern Recognition, 116, 107854.
[14] Li, L., Mao, Z., Hu, J., & Zhou, Z. H. (2019). Object detection using deep learning. IEEE Transactions on Neural Networks and Learning Systems, 30(1), 94-109.
[15] Brown, M., Hua, G., & Winder, S. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Gujarathi, R., & Costa, L. da S. (2020). Pearson correlation coefficient: A review of its history, interpretation, and application. Journal of Statistical Computation and Simulation, 90(7), 1514-1532.
