
Array 15 (2022) 100222

Contents lists available at ScienceDirect

Array
journal homepage: www.sciencedirect.com/journal/array

Visual SLAM algorithms and their application for AR, mapping, localization
and wayfinding☆
Charalambos Theodorou a,b,*, Vladan Velisavljevic a, Vladimir Dyo a, Fredi Nonyelu b

a School of Computer Science and Technology, University of Bedfordshire, Luton, LU1 3JU, United Kingdom
b Briteyellow Ltd, Bedford, MK43 0BT, United Kingdom

ARTICLE INFO

Keywords:
AR
Visual SLAM
Simultaneous localization and mapping (SLAM)

ABSTRACT

Visual simultaneous localization and mapping (vSLAM) algorithms use a device camera to estimate the agent's position and reconstruct structures in an unknown environment. As an essential part of the augmented reality (AR) experience, vSLAM enhances the real-world environment through the addition of virtual objects, based on localization (location) and environment structure (mapping). From both technical and historical perspectives, this paper categorizes and summarizes some of the most recent visual SLAM algorithms proposed in research communities, while also discussing their applications in augmented reality, mapping, navigation, and localization.

☆ This work was supported by Briteyellow Ltd and by Innovation Bridges.
* Corresponding author. School of Computer Science and Technology, University of Bedfordshire, Luton, LU1 3JU, United Kingdom. E-mail address: [email protected] (C. Theodorou).
https://doi.org/10.1016/j.array.2022.100222
Received 15 March 2022; Accepted 3 July 2022; Available online 8 August 2022
2590-0056/© 2022 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Visual SLAM, according to Fuentes-Pacheco et al. [1], is a set of SLAM techniques that use only images to map an environment and determine the position of the observer. Compared to sensors used in traditional SLAM, such as GPS (Global Positioning System) or LIDAR [2], cameras are more affordable and are able to gather more information about the environment, such as colour, texture, and appearance. In addition, modern cameras are compact and have low cost and low power consumption. Examples of recent applications that employ vSLAM are the control of humanoid robots [3], unmanned aerial and land vehicles [4], lunar rovers [5], autonomous underwater vehicles [6] and endoscopy [7].

Depending on the camera type, there are three basic types of SLAM: monocular, stereo, and RGB-D. Stereo SLAM is a multi-camera SLAM that can obtain a particular degree of trajectory resolution. Additionally, stereo SLAM has the advantage of being more versatile than RGB-D SLAM, which is more sensitive to sunlight and is mainly used indoors. The last two decades have seen significant success in developing algorithms such as MonoSLAM [8], PTAM [9], PTAM-Dense [10], DTAM [11] and SLAM++ [12]. However, most systems have been developed for motionless environments, and their robustness remains a concern in dynamic environments. Because these SLAM systems [13] assume that the camera is the only moving object in a stationary scene, they are typically not applicable there: moving objects affect the system's ability to estimate camera poses. Additionally, the extra object motion introduces calculation errors and reduces the accuracy of trajectory estimation due to the increased computational load. In such environments, the SLAM algorithm is required to deal with possible errors and a certain degree of uncertainty characteristic of sensory measurements.

Moreover, for virtual objects to be properly anchored in the real environment in an AR (Augmented Reality) [14] experience, it is necessary to apply tracking techniques. That means dynamically determining the viewer's pose (position and orientation) in relation to the actual elements of the scene. An alternative is the application of SLAM techniques, which aim precisely at the creation and updating of a map, as well as the location of the observer in relation to the structure of the environment. This confluence between visual SLAM and AR was the motivation for this survey. The objective of this research is to carry out a survey of the main visual SLAM algorithms, as well as their applications in AR, mapping, localization, and wayfinding. The main characteristics of the visual SLAM algorithms were identified, and the main AR applications of visual SLAM were found and analysed.

As opposed to presenting a general analysis of SLAM, this survey provides an in-depth review of different visual SLAM algorithms. The survey also includes various datasets that might be considered for evaluation, and different types of evaluation metrics. Existing studies in this area tend to describe only one SLAM algorithm, and some of them are rather old. To address this, a complete survey describing seminal and more recent SLAM algorithms was produced.
Even if some surveys include a description of different SLAM algorithms (e.g., Refs. [15–17]), this survey provides an expanded overview of SLAM algorithms, including those recently developed, together with a set of datasets that could be used to evaluate multiple SLAM algorithms and a set of evaluation metrics (Table 1). Additionally, the limitations of the evaluation metrics have been identified, which will be explored further in the future. Through this article, we hope to help readers better understand different SLAM algorithms and how they might be applied in different fields.

Table 1
Survey papers discussing SLAM algorithms.

Ref. | Year | Algorithm flowcharts | Datasets | Metrics
[15] | 2015 | No | No | No
[16] | 2017 | No | No | No
[17] | 2020 | No | No | No
Our work | 2022 | Yes | Yes | Yes

This survey is organized as follows. SLAM applications are introduced in Section II. In Section III, various SLAM algorithms are discussed. In Section IV, a table of different SLAM features is introduced. Section V discusses various datasets that could be used to experiment with SLAM algorithms. Section VI describes the two most commonly used evaluation metrics. Section VII is mainly focused on discussions about SLAM, and Section VIII concludes this survey.

2. Visual SLAM applications

SLAM algorithms make use of data from different sensors. Visual SLAM is a SLAM technique that uses only visual sensors, which may be a monocular RGB camera [18], a stereo camera [19], an omnidirectional camera (which captures images simultaneously in all 360-degree directions) [20] or an RGB-D camera (which captures depth information in addition to RGB images) [21]. In this section, Localization, Mapping, and Wayfinding, the three main categories of vSLAM application, are described in more depth, along with some relevant algorithms applicable to each category.

2.1. Localization

Localization systems assist users in identifying their location and orientation within an environment. Localization can be performed both indoors [22] and outdoors [23] using various methods. There are traditional approaches that do not involve technology, such as the use of mental maps that are built through guided exposure to the environment or auditory instructions, as well as tactile maps. Robots must be able to understand their current position in the environment in order to complete tasks securely and autonomously; this type of problem is defined as the requirement that the robot knows where it is in the environment.

Laser rangefinders [24] are among the most popular sensors used in developing SLAM algorithms. For the localization of unmanned ground vehicles (UGVs) in large-scale indoor environments, a scan matching method was proposed along with an algorithm for multi-resolution likelihood mapping, so that the search space can be reduced during the matching process. Using a laser rangefinder, the position of the robot can be determined with considerable accuracy, depending on the rangefinder's accuracy. Although the laser rangefinder enables the determination of pose, it struggles in large and highly irregular environments, where the surrounding walls or corridors have no variety. In an environment with simple geometry, like a park with trees and circles, a robot can easily get lost.

In recent years, interest in visual SLAM has increased significantly [25]. Visual SLAM systems typically triangulate the position of a set of points through successive camera frames, and simultaneously use this information to estimate the camera's pose. In essence, these systems are designed to map their surroundings in relation to their own location, in order to facilitate navigation. This can be accomplished with a single 3D vision camera, unlike other forms of SLAM. As long as there are sufficient points tracked through each frame, both the orientation of the sensor and the structure of the physical environment can be quickly understood.

The goal of all visual SLAM systems is to minimize the reprojection error, i.e. the difference between the projected and actual image points, by means of an algorithmic adjustment known as bundle adjustment. Because visual SLAM systems need to operate in real time, location and mapping data are bundle adjusted separately but simultaneously, to provide faster processing speeds, and then merged together.
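As a minimal illustration of the reprojection error that bundle adjustment minimizes, the sketch below projects a 3D point through a pinhole camera model and measures the pixel distance to the observed feature. The intrinsics, poses and points are hypothetical values chosen only for the example, not taken from any system described in this survey.

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy) -- example values only.
K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])

def project(point_w, R, t):
    """Project a 3D world point into pixel coordinates for pose (R, t)."""
    p_cam = R @ point_w + t          # world -> camera frame
    p_img = K @ p_cam                # camera frame -> image plane
    return p_img[:2] / p_img[2]      # perspective division

def reprojection_error(points_w, observations, R, t):
    """Sum of squared pixel distances between projected and observed points.
    Bundle adjustment minimizes this quantity over poses and points."""
    residuals = [project(p, R, t) - z for p, z in zip(points_w, observations)]
    return sum(np.sum(r ** 2) for r in residuals)

# Toy map and noisy observations of it.
points = [np.array([0.5, -0.2, 3.0]), np.array([-0.3, 0.1, 2.5])]
R0, t0 = np.eye(3), np.zeros(3)
obs = [project(p, R0, t0) + np.random.normal(0, 0.5, 2) for p in points]
print(reprojection_error(points, obs, R0, t0))
```

A full bundle adjustment would optimize R, t and the 3D points jointly, typically with a sparse Levenberg-Marquardt solver.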
Robots can also be equipped with a camera as another common sensor. Engel et al. propose a camera-based method for large-scale, semi-dense maps called LSD-SLAM [26] that does not require adjusting bundles of features. ORB-SLAM [27] is another approach to visual SLAM, proposed by Mur-Artal et al., which breaks the visual SLAM problem down into three subtasks: tracking, mapping, and optimizing. Due to the camera's sensitivity to light changes, using a monocular camera is not sufficient to solve the scaling problem, which could lead to errors during the localization process. A combination of a laser rangefinder and a monocular camera might be useful, since those two sensors are the ones most commonly used on robots. Although such algorithms improve the performance of feature extraction, they are limited to particular SLAM algorithms.

2.2. Mapping

Visual SLAM mapping is performed by using cameras to acquire data about an environment, followed by combining computer vision and odometry algorithms to map the environment. This allows robots to navigate independently and improves localization. The majority of robots have wheels, so measuring the distance they travel is easy, and inertial measurement units (IMUs) [28] have been added to some robots for measuring their body motion. Nonetheless, relying on odometry alone to estimate the robot's position is not accurate enough, because of the accumulated error produced by noise: after several turns the error has accumulated and the location becomes uncertain. Loop closure plays a key role here. Considering buildings and trees to be static, there should be a loop closure [29] at the exact same place the robot has already been (loop detection) [30]. It is then possible to correct and adjust the generated map for these accumulated noises [31].
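The sketch below illustrates why odometry alone is insufficient: a toy simulation (all noise levels are assumptions chosen for illustration) integrates noisy wheel odometry around a square path and shows the end-point error growing instead of returning to zero. This residual is exactly the drift that loop closure corrects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Drive a square: four straight legs of 40 steps, each followed by a 90-degree turn.
x = y = theta = 0.0
for leg in range(4):
    for _ in range(40):
        d = 0.1 + rng.normal(0, 0.005)        # noisy distance increment (assumed noise)
        theta += rng.normal(0, 0.002)          # small heading noise per step
        x += d * np.cos(theta)
        y += d * np.sin(theta)
    theta += np.pi / 2 + rng.normal(0, 0.01)   # noisy 90-degree turn

# The true path returns to the origin; the integrated estimate does not.
print(f"end-point drift: {np.hypot(x, y):.3f} m")
```

Recognizing the starting place (loop detection) adds the constraint that the end pose must equal the start pose; distributing the accumulated error along the trajectory is then a pose-graph optimization problem.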
The vSLAM concept is fundamental to any kind of robotic application where the robot must traverse a new environment and generate a map. The technique is not limited to robots, but can be used on smartphones and their cameras as well. vSLAM would be one aspect of the pipeline needed by some advanced AR use cases, for example, where virtual worlds need to be accurately mapped onto real environments.

2.3. Wayfinding

Wayfinding systems must be capable of planning and communicating effective paths to users. Localizing the user and planning the path to the user's desired destinations go hand in hand. Once a user has been localized, the optimal path to the destination can be determined and communicated to the user as accessible instructions. There is always a possibility that the user may veer from the recommended path for many reasons, and a smart navigation aid will be capable of dynamically re-planning the path to the user's destination based on his or her new location (see the sketch below). While remaining simple and effective, directions must include landmarks that can be sensed by the user during navigation. State filter-based SLAM algorithms such as MonoSLAM, and pose-graph SLAM algorithms such as PTAM, DTAM and LSD-SLAM, can be used for wayfinding.
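Dynamic re-planning of this kind can be realized with any shortest-path search over the mapped space. The sketch below uses a plain breadth-first search on a hypothetical occupancy grid; the grid and positions are made-up values, and a real navigation aid would typically run weighted A* over the map produced by SLAM.

```python
from collections import deque

def replan(grid, start, goal):
    """Breadth-first search for a shortest path on a 0/1 occupancy grid.
    Re-planning after the user strays is just calling this again from
    the newly localized position."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # goal unreachable

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(replan(grid, (0, 0), (2, 0)))   # original plan
print(replan(grid, (0, 3), (2, 0)))   # re-plan after the user veered to (0, 3)
```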
In wearable RNAs (robotic navigation aids) [32], multiple SLAM algorithms are implemented today, such as the algorithm proposed by Saez for 6-DOF Pose Estimation (PE) [33] using an RNA camera. A Visual Odometry (VO) algorithm [34] and entropy-based cost functions are used to determine the egomotion (change in pose between two camera views) of the camera. For estimating the pose of a wearable RNA based on stereovision, a metric-topological SLAM method (SLAM++) has been proposed: as features from the stereo camera images are extracted and tracked step by step, local topological maps of the area and global topological relations between the areas on the maps are updated. Stereo cameras cannot provide complete depth information about the scene, though, so these RNAs are not capable of detecting objects. Wearable RNAs [35] rely on RGB-D camera depth data due to its capability of providing more reliable depth data in feature-sparse environments. In order to estimate the camera's pose, a bundle-adjustment algorithm [36] is used, and visual features are extracted from images and compared against each other. It is possible to estimate camera pose in real time thanks to PL-SLAM [37], which separates the tasks of tracking and mapping into two separate threads and processes them on a dual-core computer. Recent SLAM methods align the whole image rather than matching features; however, these types of methods are typically less accurate than feature-based SLAM methods for estimating pose.

3. SLAM algorithms

In general, visual SLAM algorithms have three basic modules: initialization [38], tracking and mapping [39]. The initialization consists of defining the global coordinate system of the environment to be mapped, as well as reconstructing part of its elements, which will be used as a reference for the beginning of the tracking and mapping. This step can be quite challenging for some visual SLAM applications (Fig. 1). The next section of this paper is split into three categories: monocular based, stereo based, and monocular and stereo based vSLAM algorithms. Each algorithm is described in detail along with its advantages and disadvantages.

Fig. 1. Classification of Visual SLAM algorithms.

3.1. Monocular based

Monocular SLAM is a type of SLAM that relies exclusively on a monocular image sequence captured by a moving camera in order to perform mapping, tracking and wayfinding. A monocular image sequence is usually a set of images that are similar to each other.

PTAM. A hand-held camera can be tracked by Parallel Tracking and Mapping (PTAM) in an AR environment. Tracking and mapping are handled separately, in parallel threads: the first thread tracks the erratic motion of the hand-held device, while the second builds a 3D map of point features based on previous frames. A detailed map with thousands of landmarks is produced, which can be tracked at high frame rates, surpassing model-based systems in accuracy and robustness. In the process of mapping, there are two distinct stages [9]: the first stage involves creating an initial map with stereo techniques; after keyframes (map points) are added to the map by the tracking system, the mapping thread refines and expands the map (Fig. 2).

Fig. 2. PTAM system architecture.

Video images captured by the hand-held camera are used to maintain real-time estimates of the camera's position in relation to the built map. Once a video frame's pose has been estimated, graphics can be augmented over it. To calculate the final pose, the system uses the same procedure for every frame. The motion model generates a pose estimate every time the camera captures a new frame, and this estimate of the frame's prior pose determines how map points are projected into the image. A final pose estimate is computed based on the detection of coarse-scale features in the image; from these coarse matches, the camera pose is updated, and the overall pose is estimated.

PTAM is advantageous because it splits tracking and mapping into two separate tasks and processes them in parallel, allowing for batch optimization techniques that are not generally associated with real-time operation. One limitation of PTAM is that the map only serves as a tool for tracking the camera: virtual entities should be able to interact with the map's geometry, so it should not be static. PTAM also lacks occlusion handling, which means that it cannot track occluded objects without outside assistance. Another limitation is that it is not designed to close large loops. M-estimators are a general class of extremum estimators in which the objective function is a sample average; the non-linear least squares method and maximum likelihood estimation are both special cases of M-estimators. The tracker's M-estimators do not take feature map uncertainties into account, but this does not affect AR applications.

MonoSLAM. The first successful SLAM algorithm in mobile robotics was monocular SLAM (MonoSLAM). By moving rapidly along the trajectory of a monocular camera in an unknown environment, natural landmarks can be reconstructed into 3D maps, and an urban environment can be mapped using sparse but persistent points. In this approach, a map of natural landmarks is created online, in a probabilistic framework, from a sparse but persistent set of data. A fundamental aspect of MonoSLAM is the feature-based map, which is a probabilistic snapshot of the camera's current state at any given point, as well as the uncertainty in its estimations.
Fig. 3 shows how the Extended Kalman Filter continuously updates the map, from the moment the system starts up until the operation is complete. Motion of the camera and feature observations result in updated probabilistic state estimates. A new state is added when new features are observed and, if necessary, feature deletion is also possible.

Fig. 3. MonoSLAM system architecture.
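As a rough sketch of the predict-update cycle that MonoSLAM's Extended Kalman Filter runs for each frame (the state layout, motion model and noise values below are simplified assumptions, not the actual MonoSLAM equations):

```python
import numpy as np

# Toy state: [x, y, vx, vy] -- a real MonoSLAM state holds the 3D camera
# pose, velocities, and every 3D feature, with a dense covariance matrix.
x = np.zeros(4)
P = np.eye(4)
F = np.array([[1, 0, 1, 0],   # constant-velocity motion model
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = 0.01 * np.eye(4)          # assumed process noise
H = np.array([[1., 0, 0, 0],  # we observe position only
              [0., 1, 0, 0]])
R = 0.1 * np.eye(2)           # assumed measurement noise

def ekf_step(x, P, z):
    # Predict: propagate the state and grow uncertainty with the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: correct with the feature observation, shrinking uncertainty.
    y = z - H @ x                                  # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

x, P = ekf_step(x, P, z=np.array([0.12, -0.05]))
```

Adding a newly observed feature corresponds to appending entries to x and P; deleting a lost feature removes its rows and columns.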
A given image measurement and camera position cannot directly be used to invert the feature measurement model to determine the location of a new feature, since the feature depth is unknown. To determine the depth of a feature, several measurements from different viewpoints are required, as well as the camera's motion. Rather than tracking the new feature across several frames, its depth can be estimated by triangulating multiple views; 2D tracking alone would be of little use, since tracking a moving camera is very difficult. In addition, the initialization of features in a camera with a narrow viewing angle must be completed very quickly to prevent them from being overwritten. As an alternative, after identifying and measuring a new feature, an initial 3D line is added to the map: it starts at the estimated position of the camera and extends to infinity along the feature's viewing direction, with Gaussian uncertainty in its parameters.

Images captured with a camera are combined with computer graphics to generate composite scenes in augmented reality. To produce a convincing effect, the graphics must appear as if they are anchored to the 3D scene being observed, and an accurate understanding of the camera's motion is required to achieve this. The estimated location can then be fed into a standard 3D graphics engine, which renders the image correctly.

ORB-SLAM. ORB-SLAM is a feature-based, real-time SLAM system that works outdoors and indoors in a variety of environments. The system is robust: motion clutter can be tolerated, wide-baseline loops can be closed, relocalization can be performed, and the system is fully automatic. ORB-SLAM operates on three threads at the same time: tracking, local mapping, and loop closing.
With every frame, tracking locates the camera and determines when a new keyframe should be inserted. In order to optimize the pose, motion-only Bundle Adjustment (BA) is used, along with initial feature matching with the previous frame. In the event that tracking is lost (e.g., due to occlusions or abrupt movement), relocalization is performed globally using the place recognition module. The covisibility graph of keyframes maintained by the system is used to retrieve a local visible map based on a first estimate of the camera pose and feature matching. After all local map points are found, reprojection is used to find matches, and camera pose optimization is performed again. Last but not least, the tracking thread determines whether a keyframe needs to be added.

By processing new keyframes and performing local BA, local mapping is able to reconstruct the surrounding environment efficiently. By finding new correspondences for unmatched ORB features, new points are triangulated in the covisibility graph. A point culling policy is applied some time after the points are created, based on the results of the tracking; this ensures that only high-quality points are retained. Redundant keyframes are also culled by local mapping.

It is important to keep in mind that pixels that do not belong to the model can negatively affect tracking quality. Photometric errors over a certain threshold must be excluded from the analysis; as the least-squares method converges, this threshold is lowered with each iteration. As a result, this scheme makes it possible to observe unmodeled objects while tracking densely.

DTAM. Dense Tracking and Mapping (DTAM) is a real-time tracking and reconstruction approach that does not rely on feature extraction, but rather on dense, pixel-by-pixel tracking. When an RGB hand-held camera flies over a static scene, a dense patchwork surface of millions of vertices is created. A detailed textured depth map is generated for keyframes using the algorithm; to create a depth map, bundles of frames are reconstructed densely and with sub-pixel resolution. When estimating the pose of a live camera, one can determine which motion parameters produce the synthetic viewpoint that best matches the live image.

Two stages are involved in refining live camera poses: in the first stage, inter-frame rotations are estimated from the model; in the second stage, the model is refined with full 6DOF pose refinement. In both cases, a Lucas-Kanade-style nonlinear least-squares algorithm is used to iteratively minimize a photometric cost function. The global minimum, i.e. the true solution, is found as long as the system is initially placed within a convex basin. As a final step, efficiency is maximized by using a power-of-two image pyramid.
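A minimal sketch of the kind of photometric cost that such direct methods iterate on: a plain sum of squared intensity differences between a warped reference image and the live image. For brevity the "pose" here is a made-up integer 2D translation; DTAM itself optimizes a full 6DOF pose over a dense depth map.

```python
import numpy as np

def photometric_cost(ref, live, shift):
    """Sum of squared intensity differences after warping `ref` by an
    integer 2D translation `shift` -- a stand-in for the 6DOF warp."""
    dy, dx = shift
    h, w = ref.shape
    warped = np.roll(np.roll(ref, dy, axis=0), dx, axis=1)
    return float(np.sum((warped[1:h-1, 1:w-1] - live[1:h-1, 1:w-1]) ** 2))

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
live = np.roll(ref, 3, axis=1)   # live image is ref shifted by 3 px

# Evaluate the cost over candidate shifts and take the minimum, which lies
# at dx = 3; a real system descends the cost with Gauss-Newton iterations
# over an image pyramid instead of exhaustive search.
best = min((photometric_cost(ref, live, (0, dx)), dx) for dx in range(-5, 6))
print("estimated shift:", best[1])
```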
LSD-SLAM. Large-Scale Direct Monocular SLAM (LSD-SLAM) has three key components: tracking, depth map estimation, and map optimization. The tracking component continuously tracks new camera images as they are captured: using the previous frame's pose as initialization, it determines the rigid-body pose with respect to the current keyframe. Based on tracked frames, the depth map estimation component refines or replaces the current keyframe; the depth is refined through per-pixel comparisons over many small baselines over time, with interleaved spatial regularization. Whenever the camera moves too far from the current keyframe, points from nearby keyframes are projected into a new keyframe (Fig. 4).

Fig. 4. LSD-SLAM system architecture.

In general, keyframes that have been replaced as tracking references will not be refined further, since their depth maps are incorporated into the global map by the map optimization component. In order to detect loop closures and scale drift, a similarity transform is estimated using scale-aware direct image alignment.

The map is represented as a pose graph of keyframes, each of which carries a camera image, an inverse depth map, and the variance of the inverse depth. The depth map and variance are computed only for pixels in regions with sufficient intensity gradient, and thus only for semi-dense scenes. An alignment metric as well as a keyframe's covariance are determined by similarity transforms.

PMDS-SLAM. Probability Mesh Enhanced Semantic SLAM (PMDS-SLAM) [40] divides pixels into meshes and integrates motion probability information from past frames. The probabilities are propagated to the meshes of the new frame. With the use of motion checks, the probability of dynamic targets can be updated in the new frame, reducing their impact on tracking. This mesh probability is further used to remove feature points that are highly likely to be dynamic.

In PMDS-SLAM, images are captured and tracked, and the semantic information of each pixel is extracted using Mask-RCNN segmentation. Superpoint segmentation is then used to divide the current frame into a superpoint mesh. The initial mesh probability is generated from that semantic prior information and then propagated from past frames to the current one. For positions in the current frame where motion is present, the motion state of the superpoint mesh points at those locations is calculated, and the meshes are then updated using the Bayesian probability formula. Using the dynamic-area mask generated by the tracking thread from the mesh probability, no truly moving features appear in the result: the camera pose is calculated using only static feature points that match each other (Fig. 5).

Fig. 5. PMDS-SLAM system architecture.
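The per-mesh Bayesian update can be pictured as follows. This is a generic Bayes-rule sketch with assumed likelihood values, not the exact formula from the PMDS-SLAM paper:

```python
def update_dynamic_probability(prior, motion_check_fired,
                               p_fire_if_dynamic=0.9, p_fire_if_static=0.1):
    """Bayes update of the probability that a mesh is dynamic, given whether
    the motion-consistency check fired. Likelihoods are illustrative only."""
    if motion_check_fired:
        num = p_fire_if_dynamic * prior
        den = num + p_fire_if_static * (1.0 - prior)
    else:
        num = (1.0 - p_fire_if_dynamic) * prior
        den = num + (1.0 - p_fire_if_static) * (1.0 - prior)
    return num / den

p = 0.5   # prior from the semantic segmentation (e.g. the mesh covers a person)
for fired in (True, True, False):
    p = update_dynamic_probability(p, fired)
print(f"posterior that the mesh is dynamic: {p:.2f}")
# Feature points in meshes whose posterior exceeds a threshold are culled
# before camera pose estimation.
```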
In PMDS-SLAM, the image is subdivided and all the targets in the scene are segmented using superpoint segmentation, as opposed to deep learning's semantic segmentation. To achieve the segmentation effect, superpoints are sprinkled randomly over the RGB input image and their ranges are iteratively extended as necessary; the target is not separated individually, but subdivided instead. Using a fast implementation method [14], the image is segmented into superpixels by SLIC [13]. This approach allows the semantic segmentation network to segment otherwise unrecognizable targets with greater accuracy, and to pinpoint the motion feature point area more precisely, eliminating whole-contour feature points of targets that arise from partial joint motion. In comparison to ORB-SLAM2, PMDS-SLAM can significantly improve accuracy on low-dynamic sequences, by more than 27.5%, and improvements of more than 90% can be achieved for highly dynamic scenes. As a result, PMDS-SLAM eliminates interference from dynamic objects, thereby reducing pose errors.

VPS-SLAM. Visual Planar Semantic SLAM (VPS-SLAM) [41] is a lightweight and real-time framework. In this method, visual/visual-inertial odometry (VO/VIO) is combined with the geometrical data of planar surfaces derived from semantic objects. Using planar surfaces to estimate the shapes and sizes of selected semantic objects allows for rapid, highly accurate metric improvements. A graph-based approach utilizing several state-of-the-art VO/VIO algorithms and the latest object detectors can estimate the robot's six-degrees-of-freedom pose while simultaneously generating a sparse semantic model of the environment (Fig. 6). No prior knowledge of the objects is needed for this approach, and any object-based detector can be used to detect semantic objects in VPS-SLAM.

Fig. 6. VPS-SLAM system architecture.

3.2. Stereo based

Stereo-based vSLAM algorithms rely on feature points to estimate the camera trajectory and build a map of the environment. Feature points are usually points from the edges in an environment. The performance of such algorithms suffers in low-textured environments, where it is sometimes difficult to find a sufficient number of reliable point features.
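Stereo systems recover the depth of a matched feature point directly: matched features in the rectified left and right images differ by a horizontal disparity, and depth follows from similar triangles. A minimal sketch with assumed calibration values (the focal length and baseline below are illustrative, not from any particular camera):

```python
def stereo_depth(x_left, x_right, focal_px=700.0, baseline_m=0.12):
    """Depth from horizontal disparity for a rectified stereo pair:
    Z = f * B / d, where d is the disparity in pixels."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: bad match or point at infinity")
    return focal_px * baseline_m / disparity

# A feature at column 412 in the left image, matched at column 398 in the right:
z = stereo_depth(412.0, 398.0)
print(f"depth: {z:.2f} m")   # 700 * 0.12 / 14 = 6 m
```

This is why stereo systems avoid the monocular scale ambiguity: the known baseline fixes the metric scale of the triangulated points.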
DS-PTAM. Distributed Stereo Parallel Tracking and Mapping (DS-PTAM) is a stereo vision-based approach to SLAM. Its purpose is to build a map of the environment in which a robot operates in real time, while obtaining an accurate estimate of the robot's position. By dividing the tracking and mapping tasks into two independent execution threads and performing them in parallel, S-PTAM achieves very good performance compared to other current SLAM methods. S-PTAM assumes that the stereo camera starts at the origin of world coordinates and has no knowledge of its surroundings at the beginning of the process.

The map is then initialized with features extracted from the left and right frames using a triangulation process; points on the map are known as triangulations. The next stereo frame is then located against the current map, starting from the current pose of the camera. Based on previous poses, a decay-velocity model is used to predict the camera's initial pose, and this estimate is adjusted by comparing the descriptors of features extracted from the images against the map points. In this procedure, correspondences are obtained between 3D points in space and their observations made by the stereo camera; these correspondences are referred to as 2D-3D matches, or constraints. Additionally, stereo frames covering unknown regions of the world are selected for inclusion in the map, which adds constraints and new points to the map (Fig. 7). These frames are known as keyframes.

Fig. 7. DS-PTAM system architecture.
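The decay-velocity prediction can be sketched as follows: a damped constant-velocity model on the pose, shown here in 2D for brevity with an assumed decay factor (the same idea applies to the full 6DOF pose used by S-PTAM).

```python
import numpy as np

class DecayVelocityModel:
    """Predict the next camera pose from the previous motion, with the
    velocity damped so that the prediction degrades gracefully when the
    camera slows down or tracking briefly fails."""
    def __init__(self, decay=0.9):
        self.decay = decay                 # assumed damping factor
        self.velocity = np.zeros(2)        # 2D stand-in for a twist in se(3)

    def predict(self, last_pose):
        self.velocity *= self.decay        # decay the velocity estimate
        return last_pose + self.velocity   # extrapolate the pose

    def correct(self, predicted, refined):
        # After matching against the map, fold the refinement back in.
        self.velocity += refined - predicted

model = DecayVelocityModel()
pose = np.array([0.0, 0.0])
for refined_pose in (np.array([0.10, 0.0]), np.array([0.21, 0.02])):
    guess = model.predict(pose)
    model.correct(guess, refined_pose)
    pose = refined_pose
print("next pose guess:", model.predict(pose))
```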
As the camera moves along its path, new stereo images are obtained and processed using the described method, resulting in an incremental map whose size increases with each successive iteration. The Tracker module is responsible for this functionality. A second execution thread, called Mapper, runs concurrently with the Tracker and adjusts camera positions and point locations, i.e. refines the map; Bundle Adjustment is used for these refinements. This thread also increases the number of constraints between keyframes and map points by finding new matches. Further, map points and measurements considered unreliable or spurious are removed.

DOC-SLAM. The Dynamic Object Culling SLAM (DOC-SLAM) [42] system is a stereo SLAM that achieves good performance in highly dynamic environments by removing the actually moving objects. By combining semantic information from panoptic segmentation with optical flow points, DOC-SLAM can detect potentially moving objects. To accomplish dynamic object culling, a moving-consistency check module identifies and removes feature points on objects that are in motion.

Utilizing a direct method for estimating the camera trajectory, DOC-SLAM reduces the time consumed by feature extraction and tracking. To remove the objects in motion, the authors propose a moving-consistency check module, an alternative to the feature-based method, which evaluates matched points by their re-projection error. In Fig. 8, static point extraction is followed by dynamic point culling to extract the key points.

Fig. 8. DOC-SLAM system architecture.

VINS-Fusion (stereo) is the base on which DOC-SLAM's localization module is built. In static scenes, VINS-Fusion achieves accurate self-localization by using optimization-based state estimation.
3.3. Monocular and stereo based

Monocular and stereo based vSLAM algorithms can perform mapping, tracking and wayfinding using either a sequence of images or just feature points.

ORB-SLAM2. The ORB-SLAM2 system [43] is an integrated SLAM system for monocular, stereo, and RGB-D cameras that offers map reuse, loop closing, and relocalization functions. The system works on standard CPUs in a wide range of environments, from small handheld devices inside the home, to drones flying in factories, to cars traveling through city streets. Bundle adjustment combined with metric scale observations allows for accurate trajectory estimation on the back end. For localization, the system provides a lightweight mode that uses visual odometry to track non-mapped regions and matches those tracks to map points to ensure zero drift.

In ORB-SLAM2, a full Bundle Adjustment (BA) optimization is applied to reach the optimal solution. Since this optimization requires a lot of resources, it is performed in a separate thread, allowing the system to keep creating maps and identifying loops while the optimization runs. As a consequence, merging the bundle adjustment output with the existing map is challenging: a new loop may occur during the optimization, causing it to be aborted so that the loop can be closed, which triggers the full BA process again. Upon completion of the full BA, the updated subset of keyframes has to be merged, and all points inserted during the optimization must be optimized by the full BA. Through the spanning tree, the corrections applied to keyframes are propagated to the non-updated keyframes, and the non-updated points are transformed according to the correction applied to their reference keyframe (Fig. 9).

Fig. 9. ORB-SLAM2 system architecture.

Whenever environmental conditions do not change significantly in the long run, the localization mode can be used to enable lightweight long-term localization. In this mode, the local mapping and loop closure threads are deactivated, and the tracking thread continuously relocalizes the camera if necessary. As part of this mode, points are tracked using visual odometry matches: matches between ORB features in the current frame and 3D points created in the previous frame. Localization is thus robust to regions that have not been mapped, but drift may accumulate; by also matching map points, drift-free localization with respect to the existing map is ensured at all times.

DynaSLAM. DynaSLAM adds the capability of dynamic object detection and background inpainting to ORB-SLAM2. Whether monocular, stereo, or RGB-D, DynaSLAM works well in dynamic scenarios. The system is capable of detecting moving objects using either deep learning or multiview geometry. Static maps of scenes make it possible to inpaint frame backgrounds obscured by dynamic objects.

After segmenting the potentially dynamic content, the pose of the camera is tracked by analyzing the static part of the image. Because high-gradient areas tend to appear on segment contours, salient features tend to stand out there, and features in such contour areas are not taken into account. The tracking implemented at this stage is a simpler, lighter version of ORB-SLAM2's tracking: as shown in Fig. 10, the algorithm consists of projecting map features into the image frame, verifying the correspondences within the static areas of the image, and optimizing the camera pose by minimizing the re-projection errors.

Fig. 10. DynaSLAM system architecture.
For every removed dynamic object, background inpainting is used to reconstruct a realistic image by taking information from previous views and painting over the occluded background. After the map has been created, the synthetic frames may be used for camera relocation and tracking, as well as for applications such as virtual and augmented reality. Finally, the only limitation of DynaSLAM is that it is less accurate in scenes with dynamic objects.

ORB-SLAM3. Using pin-hole and fisheye lens models, ORB-SLAM3 [44] is the first system that can perform visual, visual-inertial, and multimap SLAM.

It is the first system to rely on maximum a posteriori (MAP) estimation during the initialization of the inertial measurement unit, resulting in two to ten times higher accuracy compared to other approaches in small and large, indoor and outdoor environments. A new multimap method allows ORB-SLAM3 to survive periods of poor visual data: it continues with an updated map when visual data become unavailable for some reason, and combines previous maps seamlessly as new data become available. In comparison with conventional odometry systems, ORB-SLAM3 retains the information of all previously processed keyframes with covisible frames from each stage, even if they are far apart in time or come from previous sessions, increasing overall accuracy.

Sensor data is processed by the tracking thread, which computes the position of the current frame relative to the current map in real time, so that feature projections can be matched with minimal error. It also determines whether the current frame should become a keyframe. The visual-inertial mode estimates body velocity and IMU bias by including residuals from the inertial sensors during the optimization. When tracking is lost, the tracking thread attempts to relocalize the current frame throughout all of Atlas' maps [44]. If the frame is relocalized, tracking resumes and the active map is switched; if the map is not re-initialized after a certain time, the active map is stored as non-active until it can be re-initialized from scratch.

4. Comparison

Monocular vSLAM algorithms such as MonoSLAM, PTAM, PMDS-SLAM, LSD-SLAM and ORB-SLAM lack the ability to perform well and fast in large, crowded indoor environments, for various reasons. For example, due to MonoSLAM's deterministic nature, it is difficult to estimate an exact normal vector at each feature location, because of the relatively simple texture patterns associated with many features, such as black-on-white corners, for which full warp estimation is not possible. In addition, the algorithm needs improvement to handle large indoor and outdoor environments. Moreover, ORB-SLAM's performance is not robust to changes in illumination while tracking, and global illumination changes do occur in real life.

On the other hand, stereo-based vSLAM algorithms such as DS-PTAM have tracker and map configurations that are prone to higher computational costs and take longer to execute. Because map updates are so complex, sending and receiving them is complicated. DS-PTAM has another limitation: tracking is estimated more slowly, because the configuration works better on an unoptimized map.

In monocular and stereo based vSLAM algorithms such as ORB-SLAM2, the optimization process requires a lot of resources, and merging the bundle adjustment output with the existing map is challenging. Also, if a new loop occurs, the optimization is stopped, the loop is closed, and the BA optimization starts over.

Both monocular and stereo based vSLAM algorithms use Bundle Adjustment, and some of them use Local Bundle Adjustment. In feature-based monocular SLAM, bundle adjustment plays an important role: as part of the 6DOF camera trajectory and 3D point cloud estimation, bundle adjustment is used to estimate the 3D map (3D point cloud) from the input feature tracks. However, SLAM systems using bundle adjustment suffer from two major weaknesses. In the first place, the need for careful initialization of the bundle adjustment requires the map to be estimated as accurately as possible and maintained over time, making the overall algorithm complex. A second challenge arises during periods of slow motion or rotation, when the SLAM algorithm has difficulty estimating the 3D structure (which requires an appropriate baseline).
A local bundle adjustment (LBA) is a method of estimating the geometry of image sequences taken by a calibrated camera. This approach has the advantage of reduced computational complexity, allowing real-time processing with accuracy comparable to standard (global) bundle adjustment.

A monocular and stereo based vSLAM algorithm that outperforms everything else is ORB-SLAM3. It is the first system that can perform visual, visual-inertial, and multimap SLAM. Moreover, it is the first system to rely on maximum a posteriori (MAP) estimation during the initialization of the inertial measurement unit, achieving two to ten times higher accuracy compared to other approaches in small and large, indoor and outdoor environments.
5. SLAM algorithms table

Table 2 shows a general description of the SLAM algorithms along different factors. The Purpose of an algorithm can be general use, AR or Robotics. Camera indicates which camera type can be used with each algorithm, and Environment shows in which environment each algorithm works. Finally, Table 2 shows the Resolution supported by each algorithm and its Estimation method.

Table 2
SLAM algorithms.

Algorithm | Purpose | Camera | Environment | Resolution | Estimation
MonoSLAM | General | Monocular | Indoor | Low | EKF
PTAM | AR | Monocular | Indoor | Low | BA
DS-PTAM | Robotics | Stereo | Indoor/Outdoor | Low-High | BA
PTAM-DENSE | Robotics | Monocular | Indoor | Low | BA
ORB-SLAM | General | Monocular | Indoor/Outdoor | Low-High | Local BA
ORB-SLAM2 | General | Monocular/Stereo | Indoor/Outdoor | Low-High | Local BA
PL-SLAM | General | Monocular | Indoor | Low-High | BA
DTAM | General | Monocular | Indoor | Low | Local BA
LSD-SLAM | General | Monocular | Indoor/Outdoor | Low-High | PG
SLAM++ | General | Depth | Indoor | Low | Local BA
DynaSLAM | General | Monocular/Stereo | Indoor/Outdoor | Low-High | BA
DOC-SLAM | Robotics | Stereo | Indoor/Outdoor | Low-High | Local BA
PMDS-SLAM | Robotics | Monocular | Indoor/Outdoor | Low-High | Full BA
VPS-SLAM | General | Monocular | Indoor/Outdoor | Low-High | Local BA
ORB-SLAM3 | General | Monocular | Indoor/Outdoor | High | Local BA

(EKF = Extended Kalman Filter; BA = Bundle Adjustment; PG = Pose Graph.)

6. Datasets

This section discusses open-source datasets that may be used to test SLAM algorithms, covering the most commonly used ones: the KITTI dataset, the EuRoC dataset, and the TUM RGB-D dataset.

KITTI Dataset. The Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) [45] datasets have been widely used in mobile robotics and autonomous driving research. A variety of sensor modalities were used to record hours of traffic scenarios, including high-resolution RGB and grayscale stereo cameras and a 3D laser scanner. Although incredibly popular, the dataset does not contain sufficient ground truth for semantic segmentation. A total of 7481 training images are annotated with 3D bounding boxes in the KITTI dataset.

EuRoC Dataset. The European RoC MAV dataset [46] is a visual-inertial dataset collected by a Micro Aerial Vehicle (MAV). Synchronized IMU measurements, as well as motion and structure ground truth, are present in the dataset. Visual-inertial localization algorithms can be designed and evaluated with this dataset.

TUM RGB-D Dataset. TUM RGB-D [47] is a dataset containing images with colour and depth information collected by a Microsoft Kinect sensor along its ground-truth trajectory. Recording was done at full frame rate (30 Hz) and sensor resolution (640 × 480). Ground-truth trajectory information was collected by eight high-speed tracking cameras (100 Hz), using high-precision motion capture.

7. Evaluation metrics

Relative pose error (RPE). For a given time interval Δ, the relative pose error measures the local accuracy of the trajectory; hence, the relative pose error is indicative of the drift of the trajectory, which is especially relevant for evaluating visual odometry systems. According to this definition, the relative pose error at time step i is

E_i := (Q_i^{-1} Q_{i+Δ})^{-1} (P_i^{-1} P_{i+Δ})    (1)

From a sequence of n camera poses, m = n − Δ relative pose errors are obtained in this way. The root mean squared error (RMSE) over all time indices of the translational component is then calculated as

RMSE(E_{1:n}, Δ) := ( (1/m) Σ_{i=1}^{m} ||trans(E_i)||^2 )^{1/2}    (2)

where trans(E_i) denotes the translational component of the relative pose error E_i. In certain situations the mean error is preferred to the root mean squared error, since it is less affected by outliers; it is also possible to compute the median rather than the mean, which gives outliers even less influence. The rotational error can be evaluated as well, but most of the time the translational errors are sufficient for comparison (since rotational errors also show up as translational errors when the camera is moved).

For systems that match consecutive frames, Δ = 1 is the natural choice, giving the drift per frame, RMSE(E_{1:n}, 1). For systems that use more than the previous frame, larger values of Δ can be appropriate; for example, Δ = 20 gives the drift per second for a sequence recorded at 20 Hz. A commonly chosen (but poor) method is to set Δ = n, which compares only the start point and the end point; because this choice penalizes rotational errors occurring early in the trajectory more than those towards its end [48,49], it is misleading. In order to evaluate SLAM systems, it therefore makes sense to average over all time intervals Δ, for example to compute

RMSE(E_{1:n}) := (1/n) Σ_{Δ=1}^{n} RMSE(E_{1:n}, Δ)    (3)

This expression has quadratic computational complexity in the trajectory length. Accordingly, it was proposed [46] to approximate it from a fixed number of relative pose samples.
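A direct transcription of Eqs. (1)-(3) for trajectories given as lists of 4x4 homogeneous pose matrices might look as follows. This is a sketch of the standard formulation; real benchmark tools such as the TUM evaluation scripts also handle timestamp association, which is omitted here.

```python
import numpy as np

def relative_pose_error(gt, est, delta=1):
    """Eq. (1): E_i = (Q_i^-1 Q_{i+d})^-1 (P_i^-1 P_{i+d}) for 4x4 poses."""
    errors = []
    for i in range(len(gt) - delta):
        dq = np.linalg.inv(gt[i]) @ gt[i + delta]    # ground-truth increment
        dp = np.linalg.inv(est[i]) @ est[i + delta]  # estimated increment
        errors.append(np.linalg.inv(dq) @ dp)
    return errors

def rpe_rmse(gt, est, delta=1):
    """Eq. (2): RMSE of the translational components trans(E_i)."""
    errors = relative_pose_error(gt, est, delta)
    sq = [np.sum(e[:3, 3] ** 2) for e in errors]     # ||trans(E_i)||^2
    return np.sqrt(np.mean(sq))

def rpe_rmse_averaged(gt, est):
    """Eq. (3): average over all time intervals (quadratic in length)."""
    n = len(gt)
    return np.mean([rpe_rmse(gt, est, d) for d in range(1, n)])

# Identity demo: a perfect estimate has zero drift.
poses = [np.eye(4) for _ in range(10)]
print(rpe_rmse(poses, poses, delta=1))   # 0.0
```

For Eq. (3) on long trajectories, sampling a fixed number of (i, Δ) pairs keeps the cost linear, as suggested above.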
Absolute trajectory error (ATE). For vSLAM systems, the absolute distance between the estimated trajectory and the ground truth trajectory is another important metric, used to assess the global consistency of the estimated trajectory. Because the two trajectories can be specified in arbitrary coordinate frames, they must be aligned first. With the Horn method [50], one can obtain the rigid-body transformation S that maps the estimated trajectory P_{1:n} onto the ground truth trajectory Q_{1:n} in the least-squares sense. Given this transformation, the absolute trajectory error at time step i can be computed as

F_i = Q_i^{-1} S P_i    (4)

For the translational components, it was proposed [47] to calculate the root mean squared error over all time indices, for instance

RMSE(F_{1:n}) := ( (1/n) Σ_{i=1}^{n} ||trans(F_i)||^2 )^{1/2}    (5)
By averaging over all possible time intervals, the RPE can also be used to evaluate the overall error of a trajectory. Translational and rotational errors are both taken into account by the RPE, while only translational errors are considered by the ATE; the RPE metric therefore provides an elegant way to combine rotational and translational errors into a single measure. The ATE, however, generally also detects rotational errors indirectly, because they show up as wrong translations.
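Equations (4)-(5), including the alignment step, can be sketched as follows. The closed-form alignment below is the SVD-based least-squares solution on the trajectory positions, a common stand-in for the quaternion-based Horn method cited above:

```python
import numpy as np

def align_rigid(P, Q):
    """Least-squares rigid-body transform (R, t) mapping points P onto Q.
    P, Q: (n, 3) arrays of trajectory positions."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                            # optimal rotation
    t = mu_q - R @ mu_p                           # optimal translation
    return R, t

def ate_rmse(P, Q):
    """Eq. (5): RMSE of trans(F_i) after aligning estimate P to ground truth Q."""
    R, t = align_rigid(P, Q)
    residuals = (P @ R.T + t) - Q                 # trans(F_i) for each i
    return np.sqrt(np.mean(np.sum(residuals ** 2, axis=1)))

# Toy check: a rotated and shifted copy of a trajectory has ATE close to zero.
rng = np.random.default_rng(2)
Q = rng.random((100, 3))
angle = 0.3
Rz = np.array([[np.cos(angle), -np.sin(angle), 0],
               [np.sin(angle),  np.cos(angle), 0],
               [0, 0, 1]])
P = Q @ Rz.T + np.array([1.0, -2.0, 0.5])
print(f"ATE RMSE: {ate_rmse(P, Q):.2e}")
```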
7.1. Limitation of evaluation metrics

Many researchers today still use older techniques and algorithms to evaluate the accuracy of their SLAM algorithms. These evaluation approaches have many limitations: they do not work for all algorithms, especially recent ones, and they do not always work if the environment is large and full of obstacles. These limitations motivate the exploration of new methods that could potentially be used to evaluate SLAM algorithms accurately.

8. Discussion

AR systems demonstrate the great potential of visual SLAM algorithms to deal with the registration problem. The mapping of the environment enables the user to include virtual objects in their view according to their point of observation, and the occlusion of virtual elements by real elements is solved by utilizing the tracking of the sensor device's pose. The results of this research show a trend towards the use of conventional (monocular) devices as sensors. Among the described algorithms, ORB-SLAM can be considered the state of the art among those that use a single camera as a sensor. Although PL-SLAM Monocular is robust, especially in environments that are poor in texture, this result is achieved at the expense of high computational capacity. The stereo versions of these algorithms, ORB-SLAM2 and PL-SLAM Stereo, respectively, show similar results, although there are inherent advantages to stereo technology, such as easier scaling and map initialization. Among the algorithms that are based on depth sensors, whether or not associated with RGB cameras, some present promising results, since behaviour that is invariant to ambient light is an important characteristic of these sensors. However, depth sensors are not as commonly found in devices as RGB cameras, and they do not usually perform well outdoors, due to their use of infrared rays.

Among the advantages of using visual SLAM algorithms for AR applications are the high availability of low-cost cameras, both for desktop computers and for mobile devices, and the non-invasive collection of information by optical devices. Among the disadvantages are the difficulty of working in low-light environments, the still-challenging initialization of the map in the monocular configuration, and the difficulty of working outdoors with infrared depth cameras.

9. Conclusion

The work carried out for this survey explored the main techniques of visual SLAM developed in recent years, with the aim of identifying their fundamental characteristics. The results show that solutions developed for general purposes are the majority, although AR is becoming a potential application for dedicated algorithms. The most common solutions are those that perform in real time without requiring prior knowledge of the environment. In addition, the predominance of applications deployed in indoor environments is due to the limitations of the sensors when operating outdoors. This scenario should only change if sensor technology evolves to the point of overcoming this limitation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Fuentes-Pacheco J, Ruiz-Ascencio J, Rendón-Mancha JM. Visual simultaneous localization and mapping: a survey. Artif Intell Rev 2015;43(1):55–81.
[2] Queralta JP, Yuhong F, Salomaa L, Qingqing L, Gia TN, Zou Z, Tenhunen H, Westerlund T. FPGA-based architecture for a low-cost 3D lidar design and implementation from multiple rotating 2D lidars with ROS. In: 2019 IEEE Sensors; 2019. p. 1–4. https://doi.org/10.1109/SENSORS43011.2019.8956928.
[3] Sheikh R, Oßwald S, Bennewitz M. A combined RGB and depth descriptor for SLAM with humanoids. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE; 2018. p. 1718–24.
[4] Shim JH, Im Cho Y. A visual localization technique for unmanned ground and aerial robots. In: 2017 first IEEE international conference on robotic computing (IRC). IEEE; 2017. p. 399–403.
[5] An P, Liu Y, Zhang W, Jin Z. Vision-based simultaneous localization and mapping on lunar rover. In: 2018 IEEE 3rd international conference on image, vision and computing (ICIVC). IEEE; 2018. p. 487–93.
[6] Zhang Q, Niu B, Zhang W, Li Y. Feature-based UKF-SLAM using imaging sonar in underwater structured environment. In: 2018 IEEE 8th international conference on underwater system technology: theory and applications (USYS). IEEE; 2018. p. 1–5.
[7] Xie C, Yao T, Wang J, Liu Q. Endoscope localization and gastrointestinal feature map construction based on monocular SLAM technology. J Infect Publ Health 2020;13(9):1314–21.
[8] Davison AJ, Reid ID, Molton ND, Stasse O. MonoSLAM: real-time single camera SLAM. IEEE Trans Pattern Anal Mach Intell 2007;29(6):1052–67.
[9] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: 2007 6th IEEE and ACM international symposium on mixed and augmented reality. IEEE; 2007. p. 225–34.
[10] Lovegrove S. Parametric dense visual SLAM.
[11] Newcombe RA, Lovegrove SJ, Davison AJ. DTAM: dense tracking and mapping in real-time. In: 2011 international conference on computer vision. IEEE; 2011. p. 2320–7.
[12] Salas-Moreno RF, Newcombe RA, Strasdat H, Kelly PH, Davison AJ. SLAM++: simultaneous localisation and mapping at the level of objects. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2013. p. 1352–9.
[13] Ai Y, Rui T, Lu M, Fu L, Liu S, Wang S. DDL-SLAM: a robust RGB-D SLAM in dynamic environments combined with deep learning. IEEE Access 2020;8:162335–42.
[14] Basiratzadeh S, Lemaire ED, Baddour N. Augmented reality approach for marker-based posture measurement on smartphones. In: 2020 42nd annual international conference of the IEEE engineering in medicine and biology society (EMBC); 2020. p. 4612–5. https://doi.org/10.1109/EMBC44109.2020.9175652.
[15] Huletski A, Kartashov D, Krinkin K. Evaluation of the modern visual SLAM methods. In: 2015 artificial intelligence and natural language and information extraction, social media and web search FRUCT conference (AINL-ISMW FRUCT). IEEE; 2015. p. 19–25.
[16] Taketomi T, Uchiyama H, Ikeda S. Visual SLAM algorithms: a survey from 2010 to 2016. IPSJ Trans Comput Vis Appl 2017;9(1):1–11.
[17] Covolan JPM, Sementille AC, Sanches SRR. A mapping of visual SLAM algorithms and their applications in augmented reality. In: 2020 22nd symposium on virtual and augmented reality (SVR). IEEE; 2020. p. 20–9.
[18] Munguia-Silva R, Martínez-Carranza J. Autonomous flight using RGB-D SLAM with a monocular onboard camera only. In: 2018 international conference on electronics, communications and computers (CONIELECOMP). IEEE; 2018. p. 200–6.
[19] Li Y, Lang S. A stereo-based visual-inertial odometry for SLAM. In: 2019 Chinese automation congress (CAC). IEEE; 2019. p. 594–8.
[20] Wang S, Yue J, Dong Y, Shen R, Zhang X. Real-time omnidirectional visual SLAM with semi-dense mapping. In: 2018 IEEE intelligent vehicles symposium (IV). IEEE; 2018. p. 695–700.
[21] Jo H, Jo S, Cho HM, Kim E. Efficient 3D mapping with RGB-D camera based on distance dependent update. In: 2016 16th international conference on control, automation and systems (ICCAS). IEEE; 2016. p. 873–5.
[22] Zafari F, Gkelias A, Leung KK. A survey of indoor localization systems and technologies. IEEE Commun Surv Tutor 2019;21(3).
[23] Ellwood SA, Newman C, Montgomery RA, Nicosia V, Buesching CD, Markham A, Mascolo C, Trigoni N, Pasztor B, Dyo V, et al. An active-radio-frequency-identification system capable of identifying co-locations and social-structure: validation with a wild free-ranging animal. Methods Ecol Evol 2017;8(12):1822–31.
[24] Misono Y, Goto Y, Tarutoko Y, Kobayashi K, Watanabe K. Development of laser rangefinder-based SLAM algorithm for mobile robot navigation. In: SICE annual conference 2007. IEEE; 2007. p. 392–6.
[25] Demim F, Nemra A, Boucheloukh A, Louadj K, Hamerlain M, Bazoula A. Robust SVSF-SLAM algorithm for unmanned vehicle in dynamic environment. In: 2018 international conference on signal, image, vision and their applications (SIVA). IEEE; 2018. p. 1–5.
[26] Engel J, Schöps T, Cremers D. LSD-SLAM: large-scale direct monocular SLAM. In: European conference on computer vision. Springer; 2014. p. 834–49.
[27] Mur-Artal R, Montiel JMM, Tardós JD. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans Robot 2015;31(5):1147–63.
[28] Korkishko YN, Fedorov V, Prilutskiy V, Ponomarev V, Fedorov I, Kostritskii S, Morev I, Obuhovich D, Prilutskiy S, Zuev A, et al. High-precision inertial measurement unit IMU-5000. In: 2018 IEEE international symposium on inertial sensors and systems (INERTIAL). IEEE; 2018. p. 1–4.
[29] Quan K, Xiao B, Wei Y. Intelligent descriptor of loop closure detection for visual SLAM systems. In: 2019 Chinese control and decision conference (CCDC). IEEE; 2019. p. 993–7.
[30] Deng C, Luo X, Zhong Y. Improved closed-loop detection and Octomap algorithm based on RGB-D SLAM. In: 2020 IEEE international conference on artificial intelligence and computer applications (ICAICA). IEEE; 2020. p. 73–6.
[31] Zheng J, Zhang H, Kong W, Tang K. A SLAM loop closure algorithm of BoW incorporating the gray level of pixel. In: 2020 international conference on computer vision, image and deep learning (CVIDL). IEEE; 2020. p. 360–3.
[32] Qian K, Zhao W, Li K, Ma X, Yu H. Visual SLAM with BoPLW pairs using egocentric stereo camera for wearable-assisted substation inspection. IEEE Sensor J 2019;20(3):1630–41.
[33] Ye C, Hong S, Tamjidi A. 6-DOF pose estimation of a robotic navigation aid by tracking visual and geometric features. IEEE Trans Autom Sci Eng 2015;12(4):1169–80.
[34] Fu Z, Guo Y, Lin Z, An W. FSVO: semi-direct monocular visual odometry using fixed maps. In: 2017 IEEE international conference on image processing (ICIP). IEEE; 2017. p. 2553–7.
[35] Bescos B, Fácil JM, Civera J, Neira J. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes. IEEE Rob Autom Lett 2018;3(4):4076–83.
[36] Liu K, Sun H, Ye P. Research on bundle adjustment for visual SLAM under large-scale scene. In: 2017 4th international conference on systems and informatics (ICSAI). IEEE; 2017. p. 220–4.
[37] Pumarola A, Vakhitov A, Agudo A, Sanfeliu A, Moreno-Noguer F. PL-SLAM: real-time monocular visual SLAM with points and lines. In: 2017 IEEE international conference on robotics and automation (ICRA). IEEE; 2017. p. 4503–8.
[38] Butt MM, Zhang H, Qiu X, Ge B. Monocular SLAM initialization using epipolar and homography model. In: 2020 5th international conference on control and robotics engineering (ICCRE). IEEE; 2020. p. 177–82.
[39] Spournias A, Skandamis T, Pappas E, Antonopoulos C, Voros N. Enhancing SLAM method for mapping and tracking using a low cost laser scanner. In: 2019 10th international conference on information, intelligence, systems and applications (IISA). IEEE; 2019. p. 1–4.
[40] Wang C, Zhang Y, Li X. PMDS-SLAM: probability mesh enhanced semantic SLAM in dynamic environments. In: 2020 5th international conference on control, robotics and cybernetics (CRC). IEEE; 2020. p. 40–4.
[41] Bavle H, De La Puente P, How JP, Campoy P. VPS-SLAM: visual planar semantic SLAM for aerial robotic systems. IEEE Access 2020;8:60704–18.
[42] Lyu L, Ding Y, Yuan Y, Zhang Y, Liu J, Li J. DOC-SLAM: robust stereo SLAM with dynamic object culling. In: 2021 7th international conference on automation, robotics and applications (ICARA). IEEE; 2021. p. 258–62.
[43] Mur-Artal R, Tardós JD. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans Robot 2017;33(5):1255–62.
[44] Campos C, Elvira R, Rodríguez JJG, Montiel JM, Tardós JD. ORB-SLAM3: an accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Trans Robot 2021;37(6):1874–90.
[45] Geiger A, Lenz P, Stiller C, Urtasun R. Vision meets robotics: the KITTI dataset. Int J Robot Res 2013;32(11):1231–7.
[46] Burri M, Nikolic J, Gohl P, Schneider T, Rehder J, Omari S, Achtelik MW, Siegwart R. The EuRoC micro aerial vehicle datasets. Int J Robot Res 2016;35(10):1157–63.
[47] Sturm J, Engelhard N, Endres F, Burgard W, Cremers D. A benchmark for the evaluation of RGB-D SLAM systems. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE; 2012. p. 573–80.
[48] Kümmerle R, Steder B, Dornhege C, Ruhnke M, Grisetti G, Stachniss C, Kleiner A. On measuring the accuracy of SLAM algorithms. Aut Robots 2009;27(4):387–407.
[49] Kelly A. Linearized error propagation in odometry. Int J Robot Res 2004;23(2):179–218.
[50] Horn BK. Closed-form solution of absolute orientation using unit quaternions. JOSA A 1987;4(4):629–42.
