
IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED JUNE, 2022

Exploring Event Camera-based Odometry for Planetary Robots

Florian Mahlknecht1, Daniel Gehrig2, Jeremy Nash1, Friedrich M. Rockenbauer1, Benjamin Morrell1, Jeff Delaune1, and Davide Scaramuzza2

Manuscript received: February 24, 2022; Revised: May 20, 2022; Accepted: June 14, 2022. This paper was recommended for publication by Editor Eric Marchand upon evaluation of the Associate Editor and Reviewers' comments. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. © 2021. All rights reserved. The other part was carried out at the Robotics and Perception Group, University of Zurich, under contracts with the National Centre of Competence in Research (NCCR) Robotics through the Swiss National Science Foundation (SNSF) and the European Research Council (ERC) under grant agreement No. 51NF40 185543. We thank Konstantin Kalenberg for the feature prediction implementation improving EKLT's computational efficiency.
1 F. Mahlknecht, J. Nash, F. M. Rockenbauer, B. Morrell, and J. Delaune are with the Jet Propulsion Laboratory, California Institute of Technology, USA.
2 D. Gehrig and D. Scaramuzza are with the Robotics and Perception Group, University of Zurich, Switzerland, https://rpg.ifi.uzh.ch.
Digital Object Identifier (DOI): see top of this page.

Abstract—Due to their resilience to motion blur and high robustness in low-light and high dynamic range conditions, event cameras are poised to become enabling sensors for vision-based exploration on future Mars helicopter missions. However, existing event-based visual-inertial odometry (VIO) algorithms either suffer from high tracking errors or are brittle, since they cannot cope with significant depth uncertainties caused by an unforeseen loss of tracking or other effects. In this work, we introduce EKLT-VIO, which addresses both limitations by combining a state-of-the-art event-based frontend with a filter-based backend. This makes it both accurate and robust to uncertainties, outperforming event- and frame-based VIO algorithms on challenging benchmarks by 32%. In addition, we demonstrate accurate performance in hover-like conditions (outperforming existing event-based methods) as well as high robustness in newly collected Mars-like and high-dynamic-range sequences, where existing frame-based methods fail. In doing so, we show that event-based VIO is the way forward for vision-based exploration on Mars.

Index Terms—Vision-Based Navigation; Space Robotics and Automation; Visual-Inertial SLAM

MULTIMEDIA MATERIAL:
For code and dataset please visit https://uzh-rpg.github.io/eklt-vio/.

Fig. 1: New mission scenario (a) enabled by EKLT-VIO for a Mars helicopter (b) scouting the entrance of lava tubes (c). Panels: (a) mission scenario, (b) Ingenuity Mars Helicopter, (c) lava tube.

I. INTRODUCTION

State estimation is critical for enabling autonomous navigation and control of mobile robots, with widespread applications from space exploration to household cleaning robots. There exist well-established algorithms, such as [1], [2], [3], [4], which estimate ego-motion from visual-inertial data. However, vision-based navigation is drastically impacted by the known limitations of conventional cameras, such as motion blur and low dynamic range.

Event cameras promise to address these limitations [5]. Unlike a standard camera that measures absolute pixel brightness using a global exposure time, event camera pixels independently detect positive or negative brightness changes at microsecond resolution. Event cameras can provide data at 1 MHz and 120 dB dynamic range, both orders of magnitude greater than what can be achieved with a standard 60 dB camera. This leads to a significant reduction in motion blur and enables operation in high dynamic range (HDR), low-light, and fast-motion conditions [6], [7].

On the application side, computer vision is increasingly used in modern planetary robotic missions [8], [9], [10], [11], [12]. The resilient properties of event cameras may enable robots to explore in conditions where frame cameras cannot operate, without introducing the size, weight, power, and range limitations of a 3D LiDAR.

In this paper, we focus on a scenario involving the exploration of the entrance of a lava tube by a Mars helicopter, as illustrated in Fig. 1. Lava tubes are natural tunnels created by lava flows in volcanic terrains. Those found on Mars have drawn significant attention because of the possibility that they might host microbial life [13]. The natural protection from radiation offered by lava tubes also makes them candidates to host the first human base on Mars.

Before sending a robotic mission [14] or astronauts to a specific lava tube, it would be desirable to scout and map
several locations. Mars helicopters are candidate platforms to scout multiple lava tubes throughout a single mission. However, Mars helicopters cannot fly LiDARs and have to rely on passive cameras for navigation. Frame cameras are ill-suited to explore lava tubes because of the HDR conditions created by the shadow at the entrance of the tube, as well as the low-light conditions once inside. This capability gap is filled by event cameras, which offer the potential to explore and map the lava tube for potentially tens of meters using residual light from the entrance.

Mars helicopters come with their own requirements on the state estimation system [15], [10]. They must rely on small, passive, lightweight cameras to observe the full state up to scale and gravity direction. The camera is fused with an inertial measurement unit (IMU), which makes gravity observable, enables a high estimation rate, and acts as an emergency landing sensor in case of camera failure. Finally, a laser range finder is used to observe scale in the absence of accelerometer excitation. The estimation backend must be able to handle the feature depth uncertainty associated with helicopter hovering and rotation-only dynamics. Due to this uncertainty, successful feature triangulation is often inhibited in these cases, leading to failure of optimization-based backends, which critically rely on triangulated features. By contrast, filter-based approaches leverage priors to initialize depth measurements and thus do not suffer from this issue [16]. This proved critical in the Ingenuity Mars helicopter's sixth flight on Mars, where an image timestamping anomaly caused roll and pitch oscillations greater than 20 degrees [17]. Such rotations cause a loss of features, which can lead to estimation failure in non-filter-based state estimation approaches, which are fundamentally unable to handle the depth uncertainty of the new feature tracks without a dedicated re-initialization procedure.

State-of-the-art event-based VIO methods are unsuitable in these conditions since they either (i) use optimization-based backends, which do not model depth uncertainty, thus featuring brittle performance in mission-typical rotation-only motion or when a significant portion of features is lost [6], or (ii) show a higher tracking error, due to the use of suboptimal event-based frontends [18]. Image-based VIO methods such as [15], [19] have addressed this by using depth priors [15] or motion classification [19].

In this work, we introduce EKLT-VIO, which builds on the EKF backend in [15], which handles pure rotational motion, and combines it with the state-of-the-art event-based feature tracker EKLT [20], thereby addressing the limitations above. EKLT-VIO is accurate, outperforming previous state-of-the-art frame-based and event-based methods on the challenging Event-Camera Dataset [21], with a 32% improvement in terms of pose accuracy. Moreover, by leveraging depth uncertainty it reduces its reliance on triangulating features, which both increases robustness during purely rotational motion and facilitates rapid initialization, both of which are limitations of existing optimization-based methods. This is because they require lengthy bootstrapping sequences, which would be impractical on Mars. Additionally, it maintains its state estimate even when frame-based methods fail due to excessive motion blur. We show that our event-based EKLT frontend has a higher tracking performance than existing methods on newly collected data in Mars-like conditions. This demonstrates the viability of our EKLT-VIO on Mars. Our contributions are:

• We introduce EKLT-VIO, an event-based VIO method that combines an accurate state-of-the-art event-based feature tracker, EKLT, with an EKF backend. It outperforms state-of-the-art event- and frame-based methods, reducing the overall tracking error by 32%.
• We show accurate and robust tracking even in rotation-only sequences, which are closest to the hover-like scenarios experienced by Mars helicopters, outperforming optimization-based and frame-based methods.
• We outperform existing methods on newly collected Mars-like sequences recorded in the JPL Mars Yard and Wells Cave for planetary exploration.

II. RELATED WORK

Frame-based VIO: An overview of existing approaches is given in [22]. Frame-based VIO algorithms can be roughly segmented into two classes: optimization-based and filter-based algorithms [22]. While both classes focus on tracking camera poses by minimizing both visual and inertial residuals, optimization-based methods solve this by performing iterative Gauss-Newton steps, while filter-based methods achieve this through Kalman filtering steps.

Since optimizing both 3D landmarks (i.e., SLAM features) and camera poses is costly, several filtering-based techniques exist that focus on refining camera poses from bearing measurements (i.e., multi-state constraint Kalman filter (MSCKF) features [23]) directly. However, MSCKF features need translational motion and provide updates only after the full feature track is known. The filtering-based approach xVIO [15] combines the advantages of both feature types, with robustness to depth uncertainty in rotation-only motion and computational efficiency with many MSCKF features.

Event-based VIO: The first event-based, 6-DOF visual odometry (VO) algorithms only started to appear recently [24], [25]. Later work incorporated an IMU to improve tracking performance and stability [26], [18], achieving impressive tracking on a fast spinning leash [26]. Despite their robustness, these methods are affected by drift due to the differential nature of the used sensors. This is why Ultimate SLAM (USLAM) [6] used a combination of events, frames, and IMU, all provided by the Dynamic and Active Vision Sensor (DAVIS) [27]. It tracks FAST corners [28] on frames and motion-compensated event frames separately using the Lucas-Kanade tracker (KLT) [29] and fuses these feature tracks with IMU measurements in a sliding window.

While addressing drift, USLAM still relies on a sliding-window optimization scheme, which is expensive and does not allow pose-only optimization through the use of MSCKF features. Moreover, its FAST/KLT frontend, first introduced in [26], is optimized explicitly for frame-like inputs and was shown to transfer suboptimally to event-based frames [20]. In this work, we incorporate the state-of-the-art event-based tracker EKLT [20], which takes a more principled approach to fusing events and frames, and thus achieves better feature tracking performance compared to [6], [26].
Fig. 2: We combine the feature tracker EKLT, which uses frames and events, with the filter-based backend xVIO to enable low-translation state estimation. In contrast to standard, frame-based VIO, an additional synchronization step converts asynchronous tracks to synchronous matches, which are used by the backend. This enables variable-rate backend updates.

[Fig. 3 plot: EKF updates per second (fixed updates every 20 ms vs. updates every 25k events) and APE [m] over time [s], for a slow and a fast segment.]
Fig. 3: Synchronous feature updates (red) tend to generate too many updates during slow sequences and too few during fast sequences, leading to high tracking error. Our irregular update strategy (purple) adapts to the event rate, and thus maintains low tracking error in both scenarios.

III. METHODOLOGY

In this section we present EKLT-VIO, which is illustrated in Fig. 2. It is an event-based VIO algorithm based on the state-of-the-art event tracker EKLT, coupled with a filter-based xVIO backend.

A. Backend

We start by providing a summary of the xVIO backend. For more details see [15]. The backend fuses data from an inertial measurement unit (IMU) and feature tracks from the frontend. It does this by using an extended Kalman filter (EKF) with an IMU state x_I and a visual state x_V:

    x = [x_I^\top \; x_V^\top]^\top    (1)

The IMU state follows an inertial propagation scheme as described in [30]. The visual state x_V is split into sliding window states x_S and feature states x_F:

    x_V = [x_F^\top \; x_S^\top]^\top,  \quad  x_F = [f_1^\top \dots f_N^\top]^\top    (2)
    x_S = [p_w^{c_1\top} \dots p_w^{c_M\top} \; q_w^{c_1\top} \dots q_w^{c_M\top}]^\top    (3)

The sliding window states contain the positions, p_w^{c_i}, and attitudes, parameterized as quaternions, q_w^{c_i}, of the last M camera poses {c_i} with respect to a world frame {w}. The feature states contain the 3D positions, f_j, of N SLAM features. In this work N = 15 and M = 10.
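To make the dimensions implied by Eqs. (1)-(3) concrete, the following minimal Python sketch lays out the state vector for N = 15 features and M = 10 sliding-window poses. The field names and the composition of the IMU block are our assumptions for illustration, not the xVIO implementation.

```python
import numpy as np

M = 10   # sliding-window camera poses
N = 15   # SLAM features (inverse-depth parametrization)

# Assumed IMU block as in typical EKF-based VIO: p, v, q, gyro bias, accel bias.
IMU_DIM = 3 + 3 + 4 + 3 + 3

FEATURE_DIM = 3 * N           # x_F: (alpha, beta, rho) per SLAM feature
WINDOW_DIM = 3 * M + 4 * M    # x_S: M positions followed by M quaternions

x = np.zeros(IMU_DIM + FEATURE_DIM + WINDOW_DIM)   # full filter state, Eq. (1)

x_I = x[:IMU_DIM]                          # inertial block
x_F = x[IMU_DIM:IMU_DIM + FEATURE_DIM]     # f_1 ... f_N, Eq. (2)
x_S = x[IMU_DIM + FEATURE_DIM:]            # p_w^{c_1..c_M}, q_w^{c_1..c_M}, Eq. (3)
```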
We use a discrete-time VIO approach, as opposed to one based on splines [31], [32], [33], [34]. Although splines can incorporate event data [31] more elegantly, they are notoriously computationally expensive [31] and less established. This is why we opt for discrete-time VIO and leave splines for future work.

SLAM features are parametrized with respect to an anchor pose p_w^{c_{a_j}} in the sliding window, and defined as f_j = [\alpha_j \; \beta_j \; \rho_j]^\top, with \alpha_j and \beta_j being normalized image coordinates and \rho_j being the inverse depth. Each time the feature tracks are updated, each SLAM feature j is converted from inverse depth to Cartesian coordinates in the associated anchor camera frame {c_{a_j}}:

    p_j^{c_i} = C(q_w^{c_i}) \left( p_w^{c_{a_j}} + \frac{1}{\rho_j} C(q_w^{c_{a_j}})^\top [\alpha_j \; \beta_j \; 1]^\top - p_w^{c_i} \right)    (4)

The measurement model is the normalized feature:

    z_j = \pi(p_j^{c_i}) + n_j,  \quad  \pi(x) = [x_1/x_3 \; x_2/x_3]^\top    (5)

where \pi(x) performs feature projection, n_j is Gaussian noise, and z_j are the new feature observations by the frontend, expressed in normalized image coordinates. Eqs. (4) and (5) can be used to develop the EKF update by linearizing the SLAM feature reprojection. Details are given in [15].
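For illustration, a minimal NumPy sketch of Eqs. (4) and (5) is given below. The helper names are ours, not xVIO's, and the quaternion convention of C(q) is an assumption.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix C(q) for a unit quaternion q = [w, x, y, z] (convention assumed)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def slam_feature_in_camera(f_j, p_w_ca, q_w_ca, p_w_ci, q_w_ci):
    """Eq. (4): inverse-depth feature f_j = (alpha, beta, rho), anchored at
    (p_w^{c_a}, q_w^{c_a}), expressed in the current camera frame c_i."""
    alpha, beta, rho = f_j
    bearing = np.array([alpha, beta, 1.0])
    p_world = p_w_ca + (1.0 / rho) * quat_to_rot(q_w_ca).T @ bearing
    return quat_to_rot(q_w_ci) @ (p_world - p_w_ci)

def project(p_c):
    """Eq. (5): normalized image coordinates pi(x) = [x1/x3, x2/x3]."""
    return np.array([p_c[0] / p_c[2], p_c[1] / p_c[2]])
```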
In addition to SLAM features, the backend maintains MSCKF features that additionally constrain the camera poses without an explicit inverse depth. MSCKF features are thus not part of the state, resulting in a smaller computational cost per feature. They need to be observed for the last 2 ≤ m ≤ M frames, providing a corresponding observation for each pose in the sliding window. MSCKF features require triangulation using those pose priors, so they can only be processed once a track with significant translation is observed. Successfully triangulated MSCKF features are used to initialize SLAM features. When there is insufficient translation for triangulation, xVIO instead initializes the inverse depth with \rho_0 = 1/(2 d_min) and uncertainty \sigma_0 = 1/(4 d_min), corresponding to a semi-infinite depth
prior, and discards the MSCKF feature track [35]. This depth prior is especially useful during pure rotation or initialization, where few features can be triangulated, since it can directly contribute to reducing the state covariance.
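A minimal sketch of this fallback initialization (illustrative names, not the xVIO API) shows why the prior is "semi-infinite": the 2-sigma interval in inverse depth is [0, 1/d_min], i.e., it covers depths from d_min out to infinity.

```python
def init_inverse_depth_prior(d_min):
    """Semi-infinite depth prior used when triangulation is not possible:
    rho_0 = 1/(2*d_min), sigma_0 = 1/(4*d_min), so the 2-sigma inverse-depth
    interval [0, 1/d_min] corresponds to depths in [d_min, infinity)."""
    rho_0 = 1.0 / (2.0 * d_min)
    sigma_0 = 1.0 / (4.0 * d_min)
    return rho_0, sigma_0

# Example: with a closest expected depth of 0.5 m, rho_0 = 1.0 1/m and sigma_0 = 0.5 1/m.
print(init_inverse_depth_prior(0.5))
```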
B. Frontend

Here we provide a summary of our EKLT frontend, and refer the reader to [20] for more details. EKLT tracks Harris corners, extracted on frames, by aligning the predicted and measured brightness increment in a patch around the corners. It minimizes the normalized distance between these patches to recover the warping parameters p and normalized optical flow v as

    {p, v} = \arg\min_{p,v} \left\| \frac{\Delta L(u)}{\|\Delta L(u)\|} - \frac{\Delta \hat{L}(u, p, v)}{\|\Delta \hat{L}(u, p, v)\|} \right\|    (6)

While \Delta L is defined as an aggregation of events in a local patch, \Delta \hat{L} is defined as the negative dot product between the local log-image gradient and the optical flow vector, following the linearized event generation model [36]. Here W(u, p) aligns the image gradient with the measured brightness increments according to the alignment parameters p. EKLT minimizes Eq. (6) using Gauss-Newton and the Ceres library [37], and recovers the alignment parameters p and the optical flow v. As opposed to the reference implementation of EKLT, which optimizes in a sliding-window fashion after a fixed number of events, we trigger the optimization only when the adaptive number of events is reached, using each event batch only once. This entails a significant speed-up without loss in accuracy.
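To illustrate the structure of the cost in Eq. (6), the sketch below builds the normalized-patch residual that a Gauss-Newton solver would minimize. This is our own simplification (translation-only warp, nearest-pixel shift as a stand-in for W(u, p)), not EKLT's Ceres-based implementation.

```python
import numpy as np

def predicted_increment(grad_x, grad_y, flow):
    """Linearized event-generation model [36]: per-pixel -<grad log I, v>."""
    return -(grad_x * flow[0] + grad_y * flow[1])

def eklt_residual(delta_L, grad_x, grad_y, p, v):
    """Normalized distance of Eq. (6) for one patch.
    delta_L: measured brightness increment (accumulated events) in the patch.
    grad_x, grad_y: log-image gradients; W(u, p) is approximated by an
    integer shift of the patch, which is a crude stand-in for the real warp."""
    dx, dy = int(round(p[0])), int(round(p[1]))
    gx = np.roll(np.roll(grad_x, dy, axis=0), dx, axis=1)
    gy = np.roll(np.roll(grad_y, dy, axis=0), dx, axis=1)
    pred = predicted_increment(gx, gy, v)
    a = delta_L / (np.linalg.norm(delta_L) + 1e-12)
    b = pred / (np.linalg.norm(pred) + 1e-12)
    return a - b   # stacked residual; p and v minimize its squared norm
```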

C. Frontend Adaptations

Asynchronous feature updates: We convert the asynchronous feature tracks provided by EKLT to synchronous feature tracks via a synchronization step (Fig. 2). This step produces a temporally synchronized list of feature positions, which are passed to the backend. The backend uses the associated correspondences z_i ⇔ z_j together with consecutive camera poses c_i and c_j to update the state as discussed in Sec. III-A. The synchronization is performed by selecting the most recent feature in the currently tracked feature set and extrapolating the positions of all other features to its timestamp. We synchronize every time a fixed number of events n_e is triggered, enabling variable-rate backend updates. We empirically found n_e = 3200 to work best, see Tab. I. We argue that reducing n_e will introduce additional noisy updates to the EKF, which reduce the accuracy, while too high an n_e makes our approach less robust during high-speed motion.

    n_e      500    1000   3200   4800   7200   9200   15000   20000
    MMPE     0.57   0.55   0.49   0.59   0.60   0.68   0.83    1.72

TABLE I: Median Mean Position Error (MMPE) [%] on the Event-Camera Dataset for different EKF event update thresholds.
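A simplified sketch of this synchronization step is given below. It is our own illustration: the linear extrapolation of feature positions is an assumption about how the prediction to the common timestamp could be done, and the data structures are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """Asynchronous feature track from the frontend: timestamps and image positions."""
    t: list = field(default_factory=list)
    xy: list = field(default_factory=list)

def synchronize(tracks, events_since_update, n_e=3200):
    """Every n_e events, extrapolate all tracks to the timestamp of the most
    recently updated feature and emit one synchronous match list for the EKF."""
    if events_since_update < n_e:
        return None
    t_sync = max(tr.t[-1] for tr in tracks if tr.t)
    matches = []
    for tr in tracks:
        if len(tr.t) < 2:
            continue
        # constant-velocity extrapolation of the feature position to t_sync
        (t0, t1), (p0, p1) = tr.t[-2:], tr.xy[-2:]
        vel = ((p1[0] - p0[0]) / (t1 - t0 + 1e-9), (p1[1] - p0[1]) / (t1 - t0 + 1e-9))
        matches.append((p1[0] + vel[0] * (t_sync - t1), p1[1] + vel[1] * (t_sync - t1)))
    return t_sync, matches
```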
This variable rate allows our algorithm to adapt to the scene dynamics (Fig. 3), leading to fewer EKF updates in slow sequences (Fig. 3, left) and a lower tracking error during high-speed sequences, compared to fixed-rate updating. These features motivate the use of an event-based frontend, since a purely frame-based one is limited by the framerate of the camera. Although this may lead to drift in purely stationary environments where no events are triggered, this can easily be amended by enforcing a minimal backend update rate, or by enforcing a no-motion prior when the event rate goes below a threshold, as in [6].

Outlier rejection: For EKLT we exclusively reject outliers by setting a maximum threshold on the optimized residual of the alignment score in Eq. (6). This allows outliers to be rejected quickly, without the need for costly geometric verification, such as 8-point RANSAC.

IV. EXPERIMENTS

We start by validating our approach on standard benchmarks in Sec. IV-B, where we compare the performance of EKLT-VIO against state-of-the-art event-based [18], frame-based [15], and event- and frame-based methods [6]. To study the effect of the event-based feature tracker, we also study an additional baseline, based on the HASTE feature tracker [38]. We then proceed to demonstrate the suitability of our approach on two important use cases motivated by the Mars exploration scenario: (i) pure rotational motion, imitating hover-like conditions on Mars (Sec. IV-C), and (ii) challenging HDR conditions on newly collected datasets in the JPL Mars Yard and at the entrance of the Wells Cave, emulating the entry into lava tubes (Sec. IV-D).

A. Baselines and Compared Methods

USLAM [6] is an event- and frame-based VIO method, which fuses feature tracks derived from frames and event frames in an optimization-based backend.
EVIO [18] uses only events and IMU. Events are used to generate asynchronous feature tracks, which are then fused in a filter-based backend. Since open-source code is not available, we only report results on real sequences.
KLT-VIO [15] is a frame-based VIO method that fuses feature tracks based on FAST/KLT in a filter-based backend, and is specifically designed for use during helicopter flight.
HASTE-VIO [38]: Finally, we combine the state-of-the-art purely event-based tracker HASTE [38] with xVIO as an additional baseline. Similar to EKLT, it produces asynchronous feature tracks, which are first synchronized using the method described in Sec. III-C before being fed into the backend.

B. Real Data

We benchmark our methods on the Event-Camera Dataset [21], recorded with a DAVIS 240C [27] with synchronized images, events, IMU measurements, and very fast hand-held motions in an HDR scenario. An OptiTrack is used for ground-truth camera trajectories. We evaluate the pose tracking accuracy using the same protocol as [6], and report the mean position error (MPE) in % of the total trajectory length and the mean yaw error (MYE) in deg/m in Tab. II.
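For reference, the two metrics can be sketched as follows. This is our own summary of the protocol; the actual evaluation of [6] also includes trajectory alignment, which is omitted here.

```python
import numpy as np

def mean_position_error_percent(p_est, p_gt):
    """MPE: mean translation error in percent of the total ground-truth trajectory
    length. p_est, p_gt: (T, 3) arrays of time-aligned positions."""
    traj_len = np.sum(np.linalg.norm(np.diff(p_gt, axis=0), axis=1))
    err = np.mean(np.linalg.norm(p_est - p_gt, axis=1))
    return 100.0 * err / traj_len

def mean_yaw_error_deg_per_m(yaw_est, yaw_gt, p_gt):
    """MYE: mean absolute yaw error (deg) normalized by trajectory length (m).
    yaw_est, yaw_gt are in radians."""
    traj_len = np.sum(np.linalg.norm(np.diff(p_gt, axis=0), axis=1))
    dyaw = np.degrees(np.abs(np.unwrap(yaw_est) - np.unwrap(yaw_gt)))
    return np.mean(dyaw) / traj_len
```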

Dataset               USLAM* [6]      USLAM [6]       EVIO [18]       KLT-VIO [15]    HASTE-VIO       EKLT-VIO (ours)
                      MPE    MYE      MPE    MYE      MPE    MYE      MPE    MYE      MPE    MYE      MPE    MYE
Boxes 6DOF            0.30   0.04     0.68   0.03     4.13   0.92     0.97   0.05     2.03   0.03     0.84   0.09
Boxes Translation     0.27   0.02     1.12   2.62     3.18   0.67     0.33   0.08     2.55   0.46     0.48   0.25
Dynamic 6DOF          0.19   0.10     0.76   0.09     3.38   1.20     0.78   0.03     0.52   0.06     0.79   0.06
Dynamic Translation   0.18   0.15     0.63   0.22     1.06   0.25     0.55   0.06     1.32   0.06     0.40   0.04
HDR Boxes             0.37   0.03     1.01   0.31     3.22   0.15     0.42   0.02     1.75   0.09     0.46   0.06
HDR Poster            0.31   0.05     1.48   0.09     1.41   0.13     0.77   0.03     0.57   0.02     0.65   0.04
Poster 6DOF           0.28   0.07     0.59   0.03     5.79   1.84     0.69   0.02     1.50   0.03     0.35   0.02
Poster Translation    0.12   0.04     0.24   0.02     1.59   0.38     0.16   0.02     1.34   0.02     0.35   0.03
Shapes 6DOF           0.10   0.04     1.07   0.03     2.52   0.61     1.80   0.03     2.35   0.02     0.60   0.03
Shapes Translation    0.26   0.06     1.36   0.01     4.56   2.60     1.38   0.02     1.09   0.02     0.51   0.03
Average               0.24   0.06     0.89   0.34     3.08   0.88     0.79   0.04     1.50   0.08     0.54   0.07

*Per-sequence hyperparameter tuning and correct IMU bias initialization.

TABLE II: Pose estimation accuracy comparison on the Event-Camera Dataset [21] in terms of mean position error (MPE) in % and mean yaw error (MYE) in deg/m. Grayed-out results with (*) by USLAM [6] were achieved through per-sequence parameter tuning and correct IMU bias initialization, while results in black used a single parameter set, tuned on all sequences simultaneously, and were initialized with an IMU bias of zero.

Dataset            USLAM [6]      KLT-VIO [15]       HASTE-VIO          EKLT-VIO (ours)
                   MPE    MYE     MPE      MYE       MPE      MYE       MPE    MYE
Dynamic Rotation   unfeasible     9.97     0.13      6.22     2.32      7.71   1.52
Boxes Rotation     unfeasible     diverging          20.57    1.32      8.78   1.36
Poster Rotation    unfeasible     diverging          3.96     0.09      1.44   0.09
Shapes Rotation    unfeasible     diverging          diverging          6.95   4.59

TABLE III: Mean position and yaw error (MPE and MYE) in % and deg/m on rotation-only sequences.

In [6], USLAM uses different parameters for each sequence and correct IMU bias initialization, resulting in the gray columns in Tab. II. We mark this method as USLAM*. However, on Mars, VIO systems should perform robustly in unknown environments, making parameter tuning and bias initialization infeasible. For this reason, we retune the parameters of USLAM to perform best on all sequences simultaneously, resulting in the black values in Tab. II. All other methods were tuned in the same way. Comparing USLAM* with USLAM shows that IMU bias initialization and per-sequence hyperparameter tuning are clearly important to achieve low tracking error, reducing the error from 0.89% to 0.24%. Our EKLT-VIO, on the other hand, achieves an average error of 0.54% without bias initialization, 39% lower than USLAM. This improvement indicates that EKLT-VIO is simultaneously more robust to zero IMU bias initialization and to the lack of per-sequence hyperparameter tuning.

In terms of position error, EKLT-VIO outperforms all other methods on 5 out of 10 sequences. With an average MPE of 0.54%, EKLT-VIO shows a 32% lower MPE than the runner-up KLT-VIO with 0.79%. Finally, with a 3.08% MPE, EVIO [18] is outperformed by EKLT-VIO by 82%.

C. Rotation-only Sequences

As a next step, we show the suitability of EKLT-VIO in a Mars-mission-like scenario. To do this, we evaluate all methods on the rotation-only sequences of the Event-Camera Dataset, which are challenging for optimization-based backends such as USLAM [6]. Similar to the hover-like conditions expected during Mars missions, these sequences translate only little compared to the average scene depth, which poses a challenge for keyframe generation and triangulation.

We adopt the same evaluation protocol as before and report results for all methods in Tab. III. We observed during this experiment that USLAM did not initialize during these sequences, since it could never detect sufficient translation to insert a new keyframe; it is thus marked as unfeasible. Frame-based KLT-VIO tracks well for the first 30 s, but diverges in the second part, where rapid shaking motion causes motion blur on the frames and high feature displacements, both of which significantly impact the accuracy of the KLT frontend. This leads to a diverging state estimate. By contrast, the event-based methods EKLT-VIO and HASTE-VIO can track robustly, because their event-based frontends are unaffected by motion blur. EKLT-VIO, however, is the only method to converge on all sequences and yields a consistently lower tracking error than all compared methods. In summary, EKLT-VIO leverages the advantages of event-based frontends for robust high-speed tracking and the advantages of a filter-based backend to fuse small translational motions. This shows that EKLT-VIO is most suitable in these conditions.

D. Mars-mission Scenario: Wells Cave and JPL Mars Yard

Finally, we show the capabilities of EKLT-VIO in Mars-like exploration scenarios, by comparing it to the image-based methods KLT-VIO [16], ORB-SLAM3 [41], OpenVINS [42], VINS-Mono [3], and ROVIO [40] on sequences recorded at the JPL Mars Yard (Fig. 4 (a)) and the Wells Cave Nature Preserve (Fig. 4 (e)).
[Fig. 4 panels: (a) Mars Yard preview, (b) overexposed image, (c) reconstruction from events, (d) trajectories; (e) Wells Cave preview, (f) underexposed image, (g) reconstruction from events, (h) trajectories.]
Fig. 4: In the Mars Yard (a) we test HDR conditions, which cause severe oversaturation artefacts in standard images (b). Instead, in the Wells Cave (e) we study low-light scenarios encountered in lava tubes, which cause undersaturation (f). HDR images reconstructed from events [39] (c, g) do not suffer from these artefacts and are used by our method. As a result, we outperform existing frame-based approaches KLT-VIO [15] and ROVIO [40] on both trajectories.

[Fig. 5 plot: number of SLAM features (0-16) over time [s] for KLT-VIO and EKLT-VIO, inside the cave (low-light) and at the entrance (HDR).]
Fig. 5: Tracked features on the Wells Cave sequence. While KLT-VIO and ROVIO quickly diverge due to lacking features (c), EKLT-VIO can track successfully.

The Mars Yard sequence features rapid illumination changes that challenge the autoexposure and result in overexposure in the images (Fig. 4 (b)). The Wells Cave instead is a cave system used by JPL to emulate lava tubes on Mars. It features low illumination, leading to underexposure in the images (Fig. 4 (f)). In the Wells Cave we use the DAVIS 346 [27], and in the Mars Yard we use a mvBlueFOX-MLC200wG standard camera, a DVXplorer event camera, and an MPU9250 IMU.

Here we show that EKLT-VIO can run on events alone, by using images reconstructed from events provided by the method E2VID [39]. They feature a much higher dynamic range than the standard images (Fig. 4 (c, g)). We reconstruct frames every 15'000 events, resulting in an HDR video used by our method. For a resolution of 640 × 480 these images can be provided at 30 FPS on a Quadro RTX 4000 GPU. However, EKLT-VIO only needs a subset of these images, since it only uses them for feature initialization.
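To illustrate the event-count-based reconstruction schedule, the sketch below groups the event stream into windows of 15,000 events and hands each window to a reconstruction callback. The callback is a stand-in for an E2VID-style network [39]; `reconstruct_frame` is hypothetical and not the released E2VID interface.

```python
def reconstruction_windows(events, window_size=15_000):
    """Yield consecutive batches of `window_size` events; one HDR frame is
    reconstructed per batch, so the frame rate adapts to the event rate."""
    batch = []
    for ev in events:          # ev = (t, x, y, polarity)
        batch.append(ev)
        if len(batch) == window_size:
            yield batch
            batch = []

# Usage sketch:
#   frames = [reconstruct_frame(b) for b in reconstruction_windows(event_stream)]
# where reconstruct_frame is a hypothetical wrapper around an E2VID-style network.
```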
Mars Yard: The trajectory used in this analysis is a hand-held circular motion with a diameter of 1.5 meters over a sharp shadow, with increasing speed. The trajectories tracked by all methods are shown in Fig. 4 (d). While EKLT-VIO consistently tracks the circular motion for at least two revolutions, the filter-based methods KLT-VIO and ROVIO diverge due to a lack of features caused by motion blur and HDR conditions. The optimization-based methods ORB-SLAM3 and VINS-Mono fail to initialize, since the sequence starts directly from hover and misses an initialization trajectory with which to generate an initial map. OpenVINS fails to initialize due to missing parallax. These methods are therefore not plotted. This shows that, thanks to the use of an event-based frontend and a filter-based backend, EKLT-VIO can overcome this condition.

Wells Cave: Finally, the trajectories in the Wells Cave for all methods are shown in Fig. 4 (h). Only the filter-based methods KLT-VIO and ROVIO manage to initialize, but they diverge quickly. EKLT-VIO tracks consistently until reaching the tunnel entrance. Again, ORB-SLAM3 and VINS-Mono fail to initialize and are therefore not plotted. OpenVINS fails to initialize due to missing features. As shown in Fig. 5, EKLT-VIO consistently maintains SLAM features, while KLT-VIO only does so once it exits the cave.

E. Limitations

[Fig. 6 panels: (a) real-time factor, (b) real-time factor per feature vs. event rate [Me/s], (c) computational pie chart (tracking, frontend, backend, feature management, visual update).]
Fig. 6: Real-time factor (RTF) (a) for EKLT-VIO (orange), HASTE-VIO (green) and KLT-VIO (blue) on Poster 6DOF. The RTF per tracked feature (b) increases with the event rate. Our method can process 89'000 events per second when tracking 45 features. As seen in (c), EKLT-VIO spends most of its computation time tracking features.

We study EKLT-VIO, KLT-VIO, and HASTE-VIO in terms of their real-time factor (RTF, Fig. 6 (a)) and report the RTF per feature (b) and computation allocations (c) for EKLT-VIO. We conduct all our experiments on a laptop with an Intel i7-7700HQ quad-core processor, exploiting however only a single core in the current implementation. The RTF measures how much time is spent to process a second of real time, and RTF < 1 indicates real-time performance. As seen in Fig. 6 (a), there exists a clear speed-accuracy trade-off between EKLT-VIO, HASTE-VIO, and KLT-VIO, since EKLT-VIO achieves a maximum real-time factor of around 45. Note that this is 45 times slower than real-time. For EKLT-VIO, the real-time factor correlates with the event rate (Fig. 6 (b)), which depends on the scene texture and camera speed. On Poster 6DOF it can process 89'000 events per second.
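The real-time factor can be measured as the ratio of wall-clock processing time to the duration of the processed data; a value below 1 means the pipeline keeps up with the sensor. A minimal sketch with a hypothetical `process` callable:

```python
import time

def realtime_factor(process, data, data_duration_s):
    """RTF = processing time / duration of the processed data.
    RTF < 1 indicates real-time performance; RTF = 45 is 45x slower than real time."""
    t0 = time.perf_counter()
    process(data)                       # hypothetical VIO pipeline call
    elapsed = time.perf_counter() - t0
    return elapsed / data_duration_s
```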
F. Speedup Strategies

Fig. 6 (c) shows that the EKLT frontend remains the bottleneck, which directs future work toward speeding up EKLT. Tab. IV illustrates three speedup strategies to achieve real-time capabilities, evaluated on Poster 6DOF: (i) we reduce the number of tracked frontend features from 45 to 15, (ii) we increase n_e, the number of events before triggering an update, by a factor of two, and (iii) we reduce the event rate with random filtering (RF), randomly keeping every r-th event, or refractory period filtering (RPF), where events within a time τ of the previous event are discarded. To improve the convergence in (ii) we additionally implemented IMU-based feature prediction [40] to improve the initial guess. While naive RF degrades performance, RPF with τ = 10 ms reduces the median RTF to 7.7. Reducing the frontend features results in an RTF of 8.7, and, when combined with filtering, leads to an RTF of 4.2. These steps lead to a minimal increase of the MPE from 0.36 to 0.41. Setting n_e = 6400 results in an RTF of 9.7, while reducing the MPE from 0.36 to 0.24. However, when combined with additional filtering, we found that the method diverges with an MPE of 3.79, but a lower RTF of 2.05. The remaining gap can be closed by software-side techniques, such as distributing the workload to multiple cores (see https://github.com/Doch88/rpg_eklt_multithreading). There, up to four cores were parallelized, leading to a 3.6-fold speedup.

Speedup method                              MPE    MYE    RTF Max   RTF Median
Baseline                                    0.36   0.02   43.6      17.9
RF r = 2                                    diverging     11.2      5.20
RF r = 5                                    diverging     5.70      2.20
RPF (τ = 1 ms)                              0.27   0.02   37.3      15.40
RPF (τ = 10 ms)                             0.48   0.02   15.7      7.70
n_e = 6400                                  0.24   0.02   21.2      9.70
15 Features                                 0.31   0.02   18.6      8.70
15 Features, RPF (τ = 10 ms)                0.41   0.02   8.20      4.20
15 Features, RPF (τ = 10 ms), n_e = 6400    3.79   0.02   4.39      2.05

TABLE IV: Real-time factor speedup on Poster 6DOF. We compare random filtering (RF), refractory period filtering (RPF), reducing the number of features, and increasing n_e. Our baseline tracks 45 features and updates each feature every n_e = 3200 events. RTF > 1 is slower than real-time.
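The two event-rate reduction filters can be sketched as follows. This is our own minimal implementation of the ideas described above: RF is interpreted as deterministic decimation (random subsampling with probability 1/r is an equally valid reading), and the per-pixel bookkeeping in RPF is an assumption about the granularity of the refractory period.

```python
def random_filter(events, r=2):
    """RF: keep every r-th event, discarding the rest."""
    return [ev for i, ev in enumerate(events) if i % r == 0]

def refractory_period_filter(events, tau=0.010):
    """RPF: drop an event if it arrives within tau seconds of the previously
    kept event at the same pixel (per-pixel refractory period is our assumption).
    events: iterable of (t, x, y, polarity) with t in seconds."""
    last_kept = {}
    kept = []
    for t, x, y, pol in events:
        if t - last_kept.get((x, y), -float("inf")) >= tau:
            kept.append((t, x, y, pol))
            last_kept[(x, y)] = t
    return kept
```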
V. CONCLUSION

Future planetary missions require us to venture into previously inaccessible domains, such as lava tubes on Mars, which pose challenging lighting conditions for traditional image-based VIO. We explored the use of event cameras, which promise to shed light in these domains due to their high dynamic range. We present EKLT-VIO, which integrates the state-of-the-art feature tracker EKLT with the filter-based backend xVIO, thus leveraging the advantages of both. The event-based frontend provides robust high-speed feature measurements even in low-light and HDR scenarios, while the filter-based backend addresses the limitations of traditional optimization-based VIO algorithms in near-hovering conditions. We show an evaluation on Mars-like sequences and challenging hand-held sequences of the Event-Camera Dataset. On these sequences, we demonstrate the robust pose tracking performance of our method, showing a mean position error reduction of up to 32% compared to event- and frame-based state-of-the-art methods. Additionally, we showcase the advantages of our backend and frontend in the first successful evaluation on the rotation-only sequences of the Event-Camera Dataset, with fast motion and challenging lighting conditions. Finally, we demonstrate our method's robustness in visually challenging conditions recorded in the JPL Mars Yard and in the Wells Cave, replicating our mission scenario. To spur further research in this direction, we open-source the implementation of this work and release our Mars-like sequences.
REFERENCES

[1] M. Li and A. I. Mourikis, "Optimization-based estimator design for vision-aided inertial navigation," in Robotics: Science and Systems, Berlin, Germany, 2013, pp. 241-248.
[2] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual-inertial SLAM using nonlinear optimization," Int. J. Robot. Research, 2015.
[3] T. Qin, P. Li, and S. Shen, "VINS-Mono: A robust and versatile monocular visual-inertial state estimator," IEEE Trans. Robot., vol. 34, no. 4, pp. 1004-1020, 2018.
[4] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, "On-manifold preintegration for real-time visual-inertial odometry," IEEE Trans. Robot., vol. 33, no. 1, pp. 1-21, 2016.
[5] G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza, "Event-based vision: A survey," IEEE Trans. Pattern Anal. Mach. Intell., 2020.
[6] A. Rosinol Vidal, H. Rebecq, T. Horstschaefer, and D. Scaramuzza, "Ultimate SLAM? Combining events, images, and IMU for robust visual SLAM in HDR and high speed scenarios," IEEE Robot. Autom. Lett., vol. 3, no. 2, pp. 994-1001, Apr. 2018.
[7] S. Sun, G. Cioffi, C. De Visser, and D. Scaramuzza, "Autonomous quadrotor flight despite rotor failure with onboard vision sensors: Frames vs. events," IEEE Robot. Autom. Lett., vol. 6, no. 2, pp. 580-587, 2021.
[8] A. Johnson, S. Aaron, J. Chang, Y. Cheng, J. Montgomery, S. Mohan, S. Schroeder, B. Tweddle, N. Trawny, and J. Zheng, "The lander vision system for Mars 2020 entry descent and landing," in AAS Guidance, Navigation, and Control Conference, 2017.
[9] B. Bos, M. Ravine, M. Caplinger, J. Schaffner, J. Ladewig, R. Olds, C. Norman, D. Huish, M. Hughes, S. Anderson, D. Lorenz, A. May, C. Adam, D. Nelson, M. Moreau, D. Kubitschek, K. Getzandanner, K. Gordon, A. Eberhardt, and D. Lauretta, "Touch and go camera system (TAGCAMS) for the OSIRIS-REx asteroid sample return mission," Space Science Reviews, vol. 214, 2018.
[10] D. S. Bayard, D. T. Conway, R. Brockers, J. H. Delaune, L. H. Matthies, H. F. Grip, G. B. Merewether, T. L. Brown, and A. M. San Martin, "Vision-based navigation for the NASA Mars helicopter," in AIAA Scitech 2019 Forum, 2019, p. 1411.
[11] M. Maimone, Y. Cheng, and L. Matthies, "Two years of visual odometry on the Mars exploration rovers," J. Field Robot., vol. 24, no. 3, pp. 169-186, 2007.
[12] J. Delaune, R. Brockers, D. S. Bayard, H. Dor, R. Hewitt, J. Sawoniewicz, G. Kubiak, T. Tzanetos, L. Matthies, and J. Balaram, "Extended navigation capabilities for a future Mars science helicopter concept," in IEEE Aerospace Conference, 2020, pp. 1-10.
[13] B. Carrier, D. Beaty, M. Meyer, J. Blank, L. Chou, S. DasSarma, D. Des Marais, J. Eigenbrode, N. Grefenstette, N. Lanza, A. Schuerger, P. Schwendner, H. Smith, C. Stoker, J. Tarnas, K. Webster, C. Bakermans, B. Baxter, M. Bell, and J. G. Xu, "Mars extant life: What's next? Conference report," Astrobiology, vol. 20, 2020.
[14] C. Phillips-Lander, J. Wynne, N. Chanover, C. Demirel-Floyd, K. Uckert, K. Williams, T. Titus, J. Blank, P. Boston, K. Mitchell, D. Wyrick, S. Shkolyar, K. Retherford, and F. J. Martín-Torres, "Mars astrobiological cave and internal habitability explorer (MACIE): A new frontiers mission concept," in 38th Mars Exploration Program Analysis Group, 2020.
[15] J. Delaune, D. S. Bayard, and R. Brockers, "Range-visual-inertial odometry: Scale observability without excitation," IEEE Robot. Autom. Lett., vol. 6, no. 2, pp. 2421-2428, 2021.
[16] J. Delaune, D. S. Bayard, and R. Brockers, "xVIO: A range-visual-inertial odometry framework," arXiv preprint arXiv:2010.06677, 2020.
[17] H. Grip, "Surviving an in-flight anomaly: What happened on Ingenuity's sixth flight," NASA, Tech. Rep., 2021.
[18] A. Z. Zhu, N. Atanasov, and K. Daniilidis, "Event-based visual inertial odometry," in IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2017, pp. 5816-5824.
[19] D. G. Kottas, K. J. Wu, and S. I. Roumeliotis, "Detecting and dealing with hovering maneuvers in vision-aided inertial navigation systems," in IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS), 2013.
[20] D. Gehrig, H. Rebecq, G. Gallego, and D. Scaramuzza, "EKLT: Asynchronous photometric feature tracking using events and frames," Int. J. Comput. Vis., 2019.
[21] E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, and D. Scaramuzza, "The event-camera dataset and simulator: Event-based data for pose estimation, visual odometry, and SLAM," Int. J. Robot. Research, vol. 36, no. 2, pp. 142-149, 2017.
[22] J. Delmerico and D. Scaramuzza, "A benchmark comparison of monocular visual-inertial odometry algorithms for flying robots," in IEEE Int. Conf. Robot. Autom. (ICRA), 2018.
[23] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint Kalman filter for vision-aided inertial navigation," in IEEE Int. Conf. Robot. Autom. (ICRA), 2007, pp. 3565-3572.
[24] H. Kim, S. Leutenegger, and A. J. Davison, "Real-time 3D reconstruction and 6-DoF tracking with an event camera," in Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 349-364.
[25] H. Rebecq, T. Horstschäfer, G. Gallego, and D. Scaramuzza, "EVO: A geometric approach to event-based 6-DOF parallel tracking and mapping in real-time," IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 593-600, 2017.
[26] H. Rebecq, T. Horstschaefer, and D. Scaramuzza, "Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear optimization," in British Mach. Vis. Conf. (BMVC), 2017.
[27] C. Brandli, R. Berner, M. Yang, S.-C. Liu, and T. Delbruck, "A 240x180 130dB 3µs latency global shutter spatiotemporal vision sensor," IEEE J. Solid-State Circuits, vol. 49, no. 10, pp. 2333-2341, 2014.
[28] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," in Eur. Conf. Comput. Vis. (ECCV), 2006, pp. 430-443.
[29] B. D. Lucas, T. Kanade, et al., "An iterative image registration technique with an application to stereo vision," Vancouver, British Columbia, 1981.
[30] S. Weiss, M. Achtelik, S. Lynen, M. Chli, and R. Siegwart, "Real-time onboard visual-inertial state estimation and self-calibration of MAVs in unknown environments," in IEEE Int. Conf. Robot. Autom. (ICRA), 2012.
[31] E. Mueggler, G. Gallego, H. Rebecq, and D. Scaramuzza, "Continuous-time visual-inertial odometry for event cameras," IEEE Trans. Robot., vol. 34, no. 6, pp. 1425-1440, Dec. 2018.
[32] G. Cioffi, T. Ciesleski, and D. Scaramuzza, "Continuous-time vs. discrete-time vision-based SLAM: A comparative study," IEEE Robot. Autom. Lett. (RA-L), 2022.
[33] M. Li and A. Mourikis, "Vision-aided inertial navigation with rolling-shutter cameras," Int. J. Robot. Research, 2014.
[34] K. Eckenhoff, P. Geneva, and G. Huang, "MIMC-VINS: A versatile and resilient multi-IMU multi-camera visual-inertial navigation system," IEEE Trans. Robot., 2021.
[35] J. Montiel, J. Civera, and A. Davison, "Unified inverse depth parametrization for monocular SLAM," in Robotics: Science and Systems (RSS), 2006.
[36] G. Gallego, J. E. A. Lund, E. Mueggler, H. Rebecq, T. Delbruck, and D. Scaramuzza, "Event-based, 6-DOF camera tracking from photometric depth maps," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 10, pp. 2402-2412, Oct. 2018.
[37] S. Agarwal, K. Mierle, and The Ceres Solver Team, "Ceres Solver," Mar. 2022. [Online]. Available: https://github.com/ceres-solver/ceres-solver
[38] I. Alzugaray and M. Chli, "Asynchronous multi-hypothesis tracking of features with event cameras," in Int. Conf. 3D Vision (3DV), 2019, pp. 269-278.
[39] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza, "High speed and high dynamic range video with an event camera," IEEE Trans. Pattern Anal. Mach. Intell., 2019.
[40] M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, "Robust visual inertial odometry using a direct EKF-based approach," in IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS), 2015.
[41] C. Campos, R. Elvira, J. Rodriguez, J. Montiel, and J. Tardos, "ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM," IEEE Trans. Robot., vol. 37, no. 6, pp. 1874-1890, 2021.
[42] P. Geneva, K. Eckenhoff, W. Lee, Y. Yang, and G. Huang, "OpenVINS: A research platform for visual-inertial estimation," in IEEE Int. Conf. Robot. Autom. (ICRA), Paris, France, 2020.
