Aerial Image Matching Algorithm Based on Edge Boxes
detection is performed on the entire image, feature extraction is then applied to the detected object blocks, and finally similarity measurement is employed to detect matched object blocks in the image pairs.

II. OBJECT BLOCK MATCHING

This paper transforms the global image matching problem into a local object block matching problem within the image. The overall matching process first applies the Edge Boxes [10] algorithm to detect objects in the image. The detected object blocks are then processed by the HardNet [11] network to extract deep features. Finally, a similarity measure is used to identify matching object blocks between the two images.
A. Extraction of Object Blocks

The main idea of the Edge Boxes algorithm is to extract a series of object candidate boxes based on edge information, so as to accelerate existing object detection algorithms. The algorithm first employs the structured edge detection method of reference [12] to obtain the edge response map of the image. Multiple edge fragments are then formed by grouping edge points according to spatial and directional constraints. Each edge fragment is next assigned a weight based on its spatial geometric relationship with a candidate box, and the candidate boxes are scored according to the weights of these edge fragments. The specific steps of the algorithm are summarized as follows:

Step 1: Input the image and use the edge detection method of reference [12] to obtain the edge response map, followed by non-maximum suppression.

Step 2: Combine edge points into multiple edge fragments according to spatial adjacency and directional constraints.

Step 3: Calculate the similarity between any two edge fragments:

a(s_i, s_j) = cos(θ_i − θ_ij) cos(θ_j − θ_ij)        (1)

Step 4: Assign each edge fragment s_i a weight ϖ_b(s_i) with respect to candidate box b:

ϖ_b(s_i) = 1 − max_T ∏_j a(t_j, t_{j+1})        (2)

where T is the path formed by extending the edge fragment into the interior of the candidate box until it intersects the box boundary. If no such path exists, the weight is set to 1.

Step 5: Aggregate all edge fragments associated with the candidate box and calculate its final score:

h_b = Σ_i ϖ_b(s_i) m_i / (2(b_w + b_h)^k)        (3)

h_b^in = h_b − Σ_{p ∈ b^in} m_p / (2(b_w + b_h)^k)        (4)

where m_i is the length of edge fragment s_i, b_w and b_h are the width and height of the candidate box, and b^in is the set of edge fragments lying in the interior of the candidate box. The value of k is 1.5 and counteracts the bias of larger rectangular boxes containing more edge fragments. Reference [15] found in experiments on general object proposal extraction that edge fragments located inside the object usually contribute little to the edges on the object's contour; the final score of a candidate box is therefore given by h_b^in from Equation (4).

As objects appear relatively small from the perspective of a UAV, overly large detections must be excluded, so size constraints are applied to the detected candidate boxes. Figure 2 illustrates the candidate boxes with the top 200 response values detected by the Edge Boxes algorithm.
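For illustration, the proposal-extraction step described above can be reproduced with the Edge Boxes implementation in OpenCV's ximgproc module. The sketch below is an assumed setup rather than the implementation used in this paper; in particular the structured-edge model file, the 10% area cap, and the handling of the version-dependent return value of getBoundingBoxes are illustrative assumptions.

# Sketch: Edge Boxes proposals with OpenCV (requires opencv-contrib-python).
import cv2
import numpy as np

img = cv2.imread("uav_frame.jpg")                      # hypothetical input image
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Structured edge detector of reference [12]; "model.yml.gz" is the pre-trained
# model distributed with OpenCV's extra data (an assumption for this sketch).
sed = cv2.ximgproc.createStructuredEdgeDetection("model.yml.gz")
edges = sed.detectEdges(np.float32(rgb) / 255.0)
orientation = sed.computeOrientation(edges)
edges_nms = sed.edgesNms(edges, orientation)           # non-maximum suppression (Step 1)

eb = cv2.ximgproc.createEdgeBoxes()
eb.setMaxBoxes(200)                                    # keep the 200 highest-scoring proposals
result = eb.getBoundingBoxes(edges_nms, orientation)   # newer OpenCV also returns scores
boxes = result[0] if isinstance(result, tuple) else result

# Discard overly large proposals, since targets appear small from the UAV viewpoint.
h, w = img.shape[:2]
max_area = 0.1 * h * w                                 # illustrative size constraint
small_boxes = [b for b in boxes if b[2] * b[3] <= max_area]   # each box is (x, y, w, h)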
guaranteed by the method of computing local gradient information, as employed by SIFT [13]. Particularly in scenarios characterized by blurred imagery, such as those encountered in unmanned aerial vehicle reconnaissance, manually designed feature descriptors exhibit limited expressive capability. Deep learning, and in particular convolutional neural networks (CNNs), has increasingly demonstrated superior performance in feature extraction [14], and the research focus of image feature extraction has gradually shifted from SIFT features to CNN features.

Figure 3 HardNet network structure
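The HardNet network maps each 32×32 grayscale patch to a 128-dimensional descriptor. As an illustration of this descriptor-extraction step, the sketch below uses the pretrained HardNet weights shipped with the kornia library; this is an assumed convenience for demonstration, not the training setup or weights of this paper.

# Sketch: 128-D HardNet descriptors from 32x32 grayscale patches via kornia.
import torch
import kornia.feature as KF

hardnet = KF.HardNet(pretrained=True).eval()

patches = torch.rand(8, 1, 32, 32)        # 8 dummy grayscale patches in [0, 1]
with torch.no_grad():
    descs = hardnet(patches)              # shape (8, 128), L2-normalised descriptors
print(descs.shape)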
2) HardNet loss function

The objective of the loss function is to ensure that the distance between matching descriptor pairs is significantly smaller than the distance between non-matching pairs. Consider two images, denoted A and P, whose patches form a training batch of 2n image blocks, where n is the number of matching pairs, and each image block has a corresponding feature descriptor. The elements of the distance matrix are computed as

d(a_i, p_j) = √(2 − 2 a_i · p_j)        (5)

Figure 4 illustrates the sampling process of the training batch. For each matching pair, the closest non-matching descriptors are selected: p_jmin is the closest non-matching descriptor to a_i, and a_kmin is the closest non-matching descriptor to p_i, so that min(d(a_i, p_jmin), d(a_kmin, p_i)) is the distance to the nearest negative sample. The loss thus minimizes the distance between matching descriptors of the two images while maximizing the distance to the nearest non-matching descriptor; in other words, it minimizes the distance for positive samples and maximizes the distance to the nearest negative sample.
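The resulting hardest-in-batch triplet margin loss of [11] can be written compactly as below. This is a minimal sketch assuming L2-normalised descriptors and a margin of 1, not the authors' training code.

# Sketch of the HardNet hardest-in-batch loss [11]: pull matching descriptors
# together and push away the closest non-matching descriptor found in the batch.
import torch

def hardnet_loss(a, p, margin=1.0, eps=1e-8):
    # a, p: (n, 128) L2-normalised descriptors of the n matching pairs (a_i, p_i)
    d = torch.sqrt(torch.clamp(2.0 - 2.0 * a @ p.t(), min=eps))   # Eq. (5), pairwise distances
    pos = d.diag()                                                # d(a_i, p_i)
    d_neg = d + 10.0 * torch.eye(d.size(0), device=d.device)      # mask out matching pairs
    min_neg = torch.min(d_neg.min(dim=1).values,                  # d(a_i, p_jmin)
                        d_neg.min(dim=0).values)                  # d(a_kmin, p_i)
    return torch.clamp(margin + pos - min_neg, min=0.0).mean()

# toy usage with random unit-length descriptors
a = torch.nn.functional.normalize(torch.randn(16, 128), dim=1)
p = torch.nn.functional.normalize(torch.randn(16, 128), dim=1)
print(hardnet_loss(a, p))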
C. Similarity measure

The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a widely used measure of the linear correlation between two variables and is commonly employed to quantify the degree of linear correlation between two samples [16]. For two samples A and B, the Pearson correlation coefficient is defined as

ρ(A, B) = cov(A, B) / (σ_A σ_B)        (8)

where cov(A, B) denotes the covariance between A and B, and σ_A and σ_B are the standard deviations of A and B, respectively. The coefficient ranges from -1 to 1, where 1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no correlation.
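A direct way to evaluate Equation (8) between two 128-dimensional descriptors is shown below; this is a generic NumPy sketch rather than code from the paper, and NumPy's built-in corrcoef gives the same value.

# Sketch: Pearson correlation coefficient (Eq. 8) between two descriptors.
import numpy as np

def pearson(a, b):
    a = np.asarray(a, dtype=np.float64) - np.mean(a)
    b = np.asarray(b, dtype=np.float64) - np.mean(b)
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

x = np.random.rand(128)
y = 0.8 * x + 0.2 * np.random.rand(128)
print(pearson(x, y))            # close to 1 for strongly correlated samples
print(np.corrcoef(x, y)[0, 1])  # same value from NumPy's built-in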
III. EXPERIMENTS AND RESULTS ANALYSIS

In this section, we describe the experiments conducted to evaluate the effectiveness of the proposed matching algorithm on unmanned aerial vehicle (UAV) images. The method extracts object blocks with the Edge Boxes algorithm, resizes the extracted blocks and the target image to a uniform size of 32×32, and concatenates them along the column direction. The concatenated image, with a width of 32 and a height of 32×(number of object blocks + 1), is fed into the HardNet matching network for feature extraction, and each object block yields a 128-dimensional descriptor. Similarity is then computed between the descriptors of the extracted object blocks and the descriptor of the target image; the results are sorted and the block with the maximum similarity value is selected.
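To make this procedure concrete, the sketch below resizes the target image and the detected blocks to 32×32, describes every patch with HardNet, and returns the block whose descriptor has the highest Pearson correlation with the target descriptor. The use of kornia's pretrained HardNet and the variable names are illustrative assumptions, not the authors' implementation; passing the patches as a batch is equivalent to slicing the 32×(32·(N+1)) concatenated strip back into patches.

# Sketch of the matching step: 32x32 patches -> HardNet descriptors -> Pearson similarity.
import cv2
import numpy as np
import torch
import kornia.feature as KF

def to_patch(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img
    return cv2.resize(gray, (32, 32)).astype(np.float32) / 255.0

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_blocks(target, blocks):
    # 'target' is the target image, 'blocks' the Edge Boxes crops (assumed inputs)
    patches = [to_patch(target)] + [to_patch(b) for b in blocks]
    batch = torch.from_numpy(np.stack(patches)[:, None, :, :])        # (N+1, 1, 32, 32)
    with torch.no_grad():
        descs = KF.HardNet(pretrained=True).eval()(batch).numpy()     # (N+1, 128)
    scores = [pearson(descs[0], d) for d in descs[1:]]
    best = int(np.argmax(scores))
    return best, scores[best]   # index of the best-matching block and its similarity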
A. Scale Analysis

In this subsection, we analyze the performance of the proposed matching algorithm at varying distances. The experiments use images collected by a UAV from the same perspective, at distances ranging from far to near, all with a consistent size of 1920×1080 pixels. Table 1 shows the matching results.

Table 1 Image matching effect at different distances

In each group, the top left corner of the picture is the target image; the matching results are shown in Figure 5.

Figure 5 Effect diagram of image matching at different distances: (a) distance of 200 meters; (b) distance of 60 meters

B. Rotation Angle Analysis

In this subsection, we analyze the performance of the proposed matching algorithm under different rotation angles. The UAV collected images at different distances while the viewing angle was varied, all images again having a consistent size of 1920×1080 pixels. The matching results are shown in Table 2.

Table 2 Image matching effect at different angles

Angle   Target image distance (m)   Maximum box selection result   Top-five box selection
-60°    200                         False                          False
-60°    120                         True                           True
-60°    80                          True                           True
-60°    40                          True                           True
-30°    200                         True                           True
-30°    120                         True                           True
-30°    80                          True                           True
-30°    40                          True                           True
30°     200                         True                           True
30°     120                         True                           True
30°     80                          True                           True
30°     40                          True                           True
…       200                         False                          True
C. Comparison of traditional and deep learning methods

The algorithm proposed in this paper performs object block matching directly, without image downsampling, and therefore facilitates the extraction and matching of features of small targets. In contrast, traditional methods such as SIFT and SURF, as well as deep learning methods such as SuperPoint, rely on extracting image feature points and subsequently matching these points. When the target is at a considerable distance, downsampling diminishes the salience of the target's features and makes extraction more difficult, so both traditional and deep learning approaches have difficulty achieving precise matching and localization of specified targets from the UAV perspective. To compare with the algorithm proposed in this paper, a test image set was constructed under close-proximity conditions, allowing the evaluation of the algorithms currently in use. The detailed results are presented in Table 3.

Table 3 Average matching accuracy on the close-proximity test image set

Algorithm                Accuracy (%)
SIFT+RANSAC              72
SURF+RANSAC              75
SuperPoint+LightGlue     69
SuperPoint+SuperGlue     65
Ours                     85

Comparing the algorithms on the test image set constructed under close-proximity conditions shows that the algorithm proposed in this paper outperforms the other algorithms even at close range. Overall, the proposed algorithm surpasses current traditional methods and deep learning matching methods from the UAV perspective.
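The point-based baselines in Table 3 follow the classical detect-describe-match pipeline. A minimal OpenCV sketch of the SIFT+RANSAC variant is given below for reference; the file names, the 0.75 ratio threshold, and the RANSAC reprojection error are illustrative assumptions, not the settings used to produce Table 3.

# Sketch of a SIFT + RANSAC baseline using standard OpenCV calls.
import cv2
import numpy as np

img1 = cv2.imread("target.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical file names
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe's ratio test

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)        # RANSAC rejects outliers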
IV. CONCLUSION
This paper proposes a two-stage image matching algorithm.
Firstly, it utilizes the Edge Boxes algorithm to detect object
blocks within the image. Subsequently, the HardNet network is
employed to extract deep features from these detected object
blocks. Finally, similarity measurement is applied to identify
matching object blocks in pairs of images. Compared to current
traditional and deep learning matching methods, this approach
achieves higher accuracy in image matching. It demonstrates
precise matching even under conditions of varying scale and
angle. In summary, the algorithm presented in this paper can
be further applied to computer vision tasks requiring high
precision.
REFERENCES
[1] Wang Y, Li H, Zhang L. Research on the method of precise target localization based on UAV reconnaissance images. Journal of Geo-Information Science, 2020, 20(6): 1079-1088.
[2] Zhang B, Liao J, Kuang Y, Zhang M, Zhou S, Kang Y. Research status and development trend of the United States UAV swarm battlefield. Aero Weaponry, 2020, 27(6): 7-12.
[3] Ma J, Jiang X, Fan A, et al. Image matching from handcrafted to deep features: a survey. International Journal of Computer Vision, 2020, 129(1).
[4] Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
[5] Bay H, Tuytelaars T, Van Gool L. SURF: speeded up robust features. Proceedings of the 9th European Conference on Computer Vision. Graz: Springer, 2006: 404-417.
[6] Zhao X. Research on image recognition technology based on deep learning algorithm. Modeling and Simulation, 2023.
[7] DeTone D, Malisiewicz T, Rabinovich A. SuperPoint: self-supervised interest point detection and description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.
[8] Sarlin P E, et al. SuperGlue: learning feature matching with graph neural networks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 4937-4946.
[9] Lindenberger P, et al. LightGlue: local feature matching at light speed. arXiv preprint arXiv:2306.13643, 2023.
[10] Zitnick C L, Dollár P. Edge Boxes: locating object proposals from edges. European Conference on Computer Vision (ECCV). Springer International Publishing, 2014. DOI: 10.1007/978-3-319-10602-1_26.
[11] Mishchuk A, Mishkin D, Radenović F, Matas J. Working hard to know your neighbor's margins: local descriptor learning loss. NIPS, 2017.
[12] Dollár P, Zitnick C L. Structured forests for fast edge detection. Proceedings of the IEEE International Conference on Computer Vision, 2013: 1841-1848.
[13] Ma C, Liu Y, Zhang Y. Robust feature description for non-linear image variations based on scale invariant deep convolutional neural networks. Pattern Recognition, 2022, 116: 107854.
[14] Li L, Mao Z, Hu J, Zhou Z H. Object detection using deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(1): 94-109.
[15] Brown M, Hua G, Winder S. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(1): 43-57.
[16] Gujarathi R, Costa L da S. Pearson correlation coefficient: a review of its history, interpretation, and application. Journal of Statistical Computation and Simulation, 2020, 90(7): 1514-1532.