
SN Computer Science (2024) 5:582

https://doi.org/10.1007/s42979-024-02849-7

ORIGINAL RESEARCH

A Novel Multi‑camera Fusion Approach at Plant Scale: From 2D to 3D


Edgar S. Correa1,2,3 · Francisco C. Calderon1 · Julian D. Colorado1,4

Received: 10 March 2022 / Accepted: 30 March 2024


© The Author(s) 2024

Abstract
Non-invasive crop phenotyping is essential for crop modeling, which relies on image processing techniques. This research presents a plant-scale vision system that can acquire multispectral plant data in agricultural fields. This paper proposes a sensory fusion method that uses three cameras: two multispectral cameras and an RGB-depth camera. The sensory fusion method applies pattern recognition and statistical optimization to produce a single multispectral 3D image that combines thermal and near-infrared (NIR) images from crops. The multi-camera sensory fusion method incorporates five multispectral bands: three from the visible range and two from the non-visible range, namely NIR and mid-infrared. The object recognition method examines about 7000 features in each image and runs only once during calibration. The outcome of the sensory fusion process is a homographic transformation model that integrates multispectral and RGB data into a coherent 3D representation. This approach can handle occlusions, allowing an accurate extraction of crop features. The result is a 3D point cloud that contains thermal and NIR multispectral data that were initially obtained separately in 2D.

Keywords Multi-spectral imagery · Light-field plenoptic cameras · Phenotyping · Plant modeling · 3D plant morphology

This article is part of the topical collection "Advances in Applied Image Processing and Pattern Recognition" guest edited by K C Santosh.

* Edgar S. Correa
[email protected]

Francisco C. Calderon
[email protected]

Julian D. Colorado
[email protected]

1 School of Engineering, Pontificia Universidad Javeriana, Cra. 7 No. 40‑62, 110311, Bogotá, Colombia
2 Faculty of Sciences, Université de Montpellier, Montpellier, France
3 CIRAD, AGAP, Montpellier, France
4 Omics Science Research Institute, iOMICAS, Pontificia Universidad Javeriana, Cali 760031, Colombia

Introduction

The world population and food demand are increasing, making the development of sustainable agricultural technologies a vital task [1]. Rice is one of the most important foods worldwide. Experimental phenotyping of different rice varieties enables genomic selection models and assessment of agronomic traits such as temperature and humidity tolerance, radiation levels, aluminum toxicity in soils, and biotic stress [2–6]. Phenotypic quantification requires accurate morphological modeling, which offers useful information to validate new agricultural varieties for higher productivity and food security [7, 8]. Plant morphological traits are key variables in estimating grain yield and crop health. Traditional methods are often invasive [9] or destructive [10], depending on biological samples [11–14]. To overcome the drawbacks of traditional methods, image processing techniques have emerged as a non-destructive alternative. These techniques allow qualitative and quantitative analysis of light absorption and reflection at different bands, enabling the characterization of crop conditions. This, for instance, permits the detection of nitrogen-deficient plants [15–17].

Abiotic stress in plants causes changes in fluorescence due to the absorption and reflection of light at different bands. These variations occur within the 650 to 800 nm range of the electromagnetic spectrum, corresponding to the chlorophyll fluorescence [18, 19]. Traditional methods usually involve direct point measurements, using two main components: (i) image data captured by RGB or multispectral cameras, such as near-infrared, mid-infrared, or thermal cameras, and (ii) three-dimensional sensors, such as LiDAR, stereo cameras, or plenoptic cameras. The fusion of data from these components allows the generation of a four-dimensional (4D) model [20].
To the best of the authors' knowledge, few works in the literature have used light-field cameras to reconstruct 4D plant models. In this arena, the PhenoBot research is one of the few studies that address plant phenotyping through 4D models using a plenoptic camera [21]. Progress in this area has been restricted to using a single frame or a single camera to extract plant features, despite the informative data of the plants being available in different multispectral cameras. This requires a multi-camera fusion method. To tackle this challenge, our research aimed to develop a method for extracting non-invasive multispectral data of the plant through a multi-camera fusion method, enabling the acquisition of infrared and thermal images in 3D space that were initially in 2D.

Related Work

Multi-spectral sensory fusion enhances robustness and reliability across a broader range of applications compared to using only single-wavelength information. The Convolutional Neural Network (CNN) is a popular deep network architecture widely employed for analyzing visual imagery [22]. Some studies have introduced CNNs for multispectral remote sensing image analysis to enhance the performance of detection algorithms. These CNN-based detectors are trained on large-scale satellite image datasets [23–25]. While these studies focus on object detection applications, they do not encompass homographic transformation to unify images from different sensors into a single image; they lack a multi-camera fusion approach. Furthermore, they typically require a large number of images for training, often exceeding 1200 images. Although these studies introduce object detection applications utilizing multispectral information, they do not employ a multisensory fusion approach [26]. Other research integrates 3D information with multispectral remote sensing images to model 3D tree canopies [27, 28], achieving sensory fusion through public libraries and free software. However, these approaches do not utilize sensory fusion via homographic transformations, because the distance from the sensor to the canopy is often significant, making such transformations unnecessary. Conversely, our research focuses on sensors positioned close to the plants, necessitating homographic transformations. Another approach involves multi-sensory fusion applications over different point clouds using 3D spatial join techniques [13, 29, 30]. While promising, this approach is not suitable when the multispectral information is obtained from 2D sensors. Lastly, multi-camera sensor fusion is commonly utilized for visual odometry using artificial neural networks [31], or for tracking objects of interest, but it typically does not integrate multispectral information [32, 33].

This research presents a multisensory fusion at plant scale. The challenge in this proposal is to adapt techniques traditionally associated with other contexts [34–36]. By utilizing both 2D and 3D cameras, this study proposes a multisensory data approach. Accordingly, our research develops a strategy based on pattern recognition and statistical optimization to model projective transformations through a complex object recognition application [37]. The visible light spectrum (VIS) wavelengths are captured using the plenoptic Raytrix R42 camera and the Kinect One V02 sensor. The non-visible near-infrared spectrum (NIR) is captured using the Parrot Sequoia camera, while the mid-infrared spectral band (MIR) is captured using the Fluke Ti400 thermal camera.

Section "Materials and Setup" presents the materials, which encompass the mechanical configuration and calibration of the cameras. Section "Methodology" introduces the methodology, covering (i) feature detection, (ii) feature matching with statistical optimization, and (iii) homographic model transformation and the integration with multispectral cameras. Finally, the results and conclusions are presented in Sections "Results and Discussion" and "Conclusions", respectively.

Materials and Setup

Images Acquisition

The sensory fusion approach is implemented using three cameras: (i) a 3D camera operating in the visible spectrum (VIS), (ii) a near-infrared (NIR) multispectral camera, and (iii) a mid-infrared thermal multispectral camera. This approach requires the setup of the mechanical assembly and the configuration of the acquisition software.

Camera Assembly and Mounting Structure

In Fig. 1, two different configurations are observed. These camera setups share the characteristic of integrating a 3D camera with three VIS channels, a multispectral camera with an infrared channel, and a thermal camera with only one channel. The entire assembly is mounted on a tripod to ensure stability and maintain the integrity of the homographic alignment.

Plenoptic Camera Calibration

Light field cameras have garnered attention for their innovative capabilities. This technology captures both the intensity and direction of light rays as they propagate through space.
Fig. 1 a Mechanical assembly of plenoptic camera, multispectral camera, and thermal camera. b Mechanical assembly of Kinect V2, multispectral camera, and thermal camera

Fig. 2 Plenoptic camera. a 3D light field camera [9]. b Projective model of the plenoptic camera based on the micro-lens array [40]

MLA (Micro-Lens Array) calibration. Figure 2a illustrates the structure of the plenoptic sensor developed by Raytrix. This image highlights the need for camera calibration due to the micro-lens architecture. In Fig. 2b, the general projective model of a plenoptic camera, based on the literature [38, 39], is presented. In this model, point P represents the real spatial information of the scene (P_x, P_y, P_z), and the light rays from point P are captured by the main lens, resulting in corresponding points Q forming the image captured by the camera. The set of micro-lenses, denoted as l, forms the basis of plenoptic technology and directly influences the generation of pixels p. When P_z > 0, Eq. 1 describes the relationship, where d is the distance between the camera sensor and the micro-lens array, D is the distance between the micro-lens array and the main lens, and F is the focal length of the main lens and the object in the scene [38].

\frac{1}{F} = \frac{1}{P_z} - \frac{1}{Q_z} \qquad (1)
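For reference, Eq. 1 can be rearranged to make the conjugate distance Q_z explicit; this is plain algebra on Eq. 1, not an additional modelling assumption:

\frac{1}{Q_z} = \frac{1}{P_z} - \frac{1}{F}
\quad\Longrightarrow\quad
Q_z = \frac{F\,P_z}{F - P_z}

A change in the focal length F or in the working distance P_z therefore shifts Q_z, which is consistent with the recalibration requirement discussed below whenever the lens, aperture, or focus setting is altered.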
The camera calibration process is performed using the RxLive tool and comprises two essential components: the calibration filter and a light source, as demonstrated in Fig. 3.

Fig. 3 Raytrix R42 camera with filter calibration disk and light source [40]

Three crucial components are essential to the calibration process, necessitating manual fine-tuning: the camera's primary lens (in this case, a 12 mm lens), the diaphragm aperture (controlling the light influx to the sensor), and the focus setting (establishing the camera-to-subject distance). Alterations to any of these parameters necessitate a subsequent recalibration. As depicted in Fig. 3, the distance from the camera to the desktop is recorded at 360 mm, suggesting the need for corresponding adjustments to the focal length.

Image illumination is controlled by adjusting the exposure time while maintaining a constant aperture setting. For the given lighting conditions, the exposure time is established at 55 milliseconds. In Fig. 4a, an overexposure effect is evident, a common occurrence when the light source is positioned directly in front of the camera, as noted by Co et al. (2022). Figure 4b showcases an image with optimal exposure, whereas Fig. 4c illustrates the calibration process for the micro-lens array. Images obtained with the calibrated camera setup are presented in Fig. 5a. The metric calibration is performed using the RxLive 5.0 software calibration wizard. A 22 mm calibration target is utilized, and a total of 44 images are captured with varying positions, inclinations, and rotations. Figure 6a depicts the calibration interface, while Fig. 6b illustrates the 3D acquisition process.

Fig. 4 a Conditions of overexposure. b Good lighting conditions. c Micro-lens array calibration [40]

Fig. 5 a Light field image captured with a R42 Raytrix plenoptic camera, at a focal length of 360 mm. b Calibrated light field image captured with a R42 Raytrix plenoptic camera, at a focal length of 360 mm

Kinect: 3D sensor acquisition. The acquisition of data from this sensor is performed using the MATLAB programming tool. It is important to ensure that the Kinect SDK and Runtime drivers are installed.

Parrot Sequoia: multi-spectral camera acquisition. The configuration of this sensor is carried out using the service provided by the camera via a WiFi connection.

Fluke Ti400: thermal image acquisition. The data captured by this camera is stored on a USB memory device, which is then read and processed using the SmartView software tool.

Methodology

The use of 2D and 3D cameras proposes a multisensory data approach. Thereby, this research work develops a strategy based on pattern recognition and statistical optimization to model projective transformation through a complex object recognition application.
Fig. 6 a Metric calibration interface of RxLive 5.0. b 3D acquisition with the plenoptic camera in the RxLive 5.0 software

Complex Object Detection Approach

The methodology is elaborated in four stages, as depicted in Fig. 7. The first stage involves image acquisition with calibrated cameras, followed by feature detection using descriptor vectors in the second stage. The third stage encompasses the matching of these features based on probabilistic optimization. Finally, the fourth stage involves estimating the spatial transformation model and implementing homographic projection to validate the methodology.

Fig. 7 Object detection approach. a Image acquisition. b Pattern recognition stage. c Matching–Optimization. d Projective transformation

Figure 8 shows the experiment structure: on the right, the object of interest, an image complex in shape and color distribution; on the left, the scene in which this object is to be detected.

Feature Detection

To accurately characterize the scene, prominent and distinctive areas of the image must be detected. The robustness of the object detector application relies on the descriptor used. Features should be invariant to illumination, 3D projective transforms, and common object variations. In this research, the Scale Invariant Feature Transform (SIFT) approach is employed.

SIFT transforms an image into an extensive collection of local feature vectors. The algorithm is inspired by the response of neurons in the inferior temporal cortex of primates to vision [41]. The resulting feature vector comprises 128 dimensions, with each descriptor assigned a position in the image denoted by coordinates (X, Y). A more complex image, containing a greater number of details, will yield a larger number of descriptors.

Figure 9 illustrates the position (x, y) of each descriptor found in the image, marked with an asterisk.
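To make the detection stage concrete, the following minimal sketch (Python with OpenCV, not the authors' implementation) extracts SIFT keypoints and their 128-dimensional descriptors for a scene image and a reference object image; the file names are placeholders.

# Minimal sketch of the feature-detection stage (SIFT); file names are placeholders.
import cv2

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                                 # 128-dimensional descriptors
kp_scene, des_scene = sift.detectAndCompute(scene, None)
kp_ref, des_ref = sift.detectAndCompute(reference, None)

positions = [kp.pt for kp in kp_scene]                   # (x, y) of each descriptor, as marked in Fig. 9
print(len(kp_scene), "features in the scene;", len(kp_ref), "in the reference image")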
Fig. 8 Topology of the experiment design: on the right, the object of interest; on the left, a scene containing several objects with many similar characteristics

Fig. 9 The composition of images; an asterisk marks the position of each descriptor generated with the SIFT algorithm for each image

Feature Matching

The matching stage establishes the relationship between the information generated in two images. This correspondence links each feature by calculating the distance metric between descriptors. In the scene, there are 7100 features, while the interest-object image contains 1184 features. Consequently, Fig. 10 displays the 7100 matches generated.

To optimize the pattern recognition application, it is desirable to process the least amount of information possible. Algorithm 1 presents the feature correspondence by brute force, achieved through the security metric distRatio to identify the most prominent matches. Figure 11 illustrates the result of the brute-force matching filter applied to Fig. 10, yielding 251 matches.

Algorithm 1: Feature matching by brute force
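Algorithm 1, as described above, performs brute-force matching filtered by the distRatio metric; the sketch below is one possible rendering of that step, not the authors' code, and the value distRatio = 0.75 is an illustrative assumption since the text does not state the threshold used.

# Sketch of brute-force descriptor matching with a distance-ratio (distRatio) filter.
import cv2

def match_brute_force(des_ref, des_scene, dist_ratio=0.75):
    matcher = cv2.BFMatcher(cv2.NORM_L2)                  # Euclidean distance between SIFT vectors
    candidates = matcher.knnMatch(des_ref, des_scene, k=2)  # two nearest neighbours per descriptor
    good = []
    for pair in candidates:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < dist_ratio * n.distance:          # keep only clearly dominant matches
            good.append(m)
    return good

Applied to the 1184 reference descriptors and 7100 scene descriptors mentioned above, a filter of this kind reduces the candidate set to the few hundred prominent matches shown in Fig. 11.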
Transform Model Estimation

The transformation model is a mapping function that establishes a relationship between the object of interest in the scene and the reference image, which solely features the object of interest in the foreground. This mapping is accomplished through the homography matrix, calculated using the positional information of the descriptors. To achieve this, the Random Sample Consensus (RANSAC) method is employed as a search strategy. RANSAC is an iterative technique used to estimate the parameters of a mathematical model from a set of observed data, which may include both inliers and outliers. The methodology for implementing RANSAC is outlined in Algorithm 2.

Algorithm 2: RANSAC algorithm

Fig. 10 7100 matches between the reference image and the scene image

Fig. 11 251 matches generated with Algorithm 1 over the 7100 matches from Fig. 10

This process involves estimating the optimal transformation statistically, based on a chi-square probability distribution. The probability that a point is an inlier is set to α = 0.95, and to calculate the homography, σ² = 5.99 [42]. This approach ensures that the number of samples chosen is representative, guaranteeing with a probability p that at least one of the random samples of s points is free of outliers, meaning the estimated transformation is free of outliers with a probability of p = 0.99.

To ensure this, the probability of selecting an outlier is defined as ε, and 1 − ε as the probability that a selected point is an inlier. At least N selections of s points are required to ensure (1 − (1 − ε)^s)^N = 1 − p, resulting in the model represented by Eq. 2, with ε given by Eq. 3.

N = \frac{\log(1 - p)}{\log\left(1 - (1 - \varepsilon)^{s}\right)} \qquad (2)

\varepsilon = 1 - \frac{\text{number of inliers}}{\text{total number of points}} \qquad (3)
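As a worked example of Eq. 2, with illustrative values rather than the experiment's measured inlier ratio: a homography requires s = 4 correspondences per sample, so for p = 0.99 and an outlier fraction ε = 0.5, about 72 iterations suffice.

# Worked example of Eq. 2: number of RANSAC iterations N (p, eps are illustrative values).
import math

def ransac_iterations(p=0.99, eps=0.5, s=4):
    # N = log(1 - p) / log(1 - (1 - eps)^s), rounded up to the next integer
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - eps) ** s))

print(ransac_iterations())          # 72 iterations for p = 0.99, eps = 0.5, s = 4
print(ransac_iterations(eps=0.25))  # 13 iterations when most matches are inliers

In practice, this sampling loop, together with the inlier test, is what library routines such as OpenCV's cv2.findHomography with the RANSAC flag encapsulate.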
The consensus process concludes when the modeled probability exceeds the threshold set by the number of events. The spatial transformation is accomplished using the homographic matrices described in Eqs. 4 to 6. The first equation relates to a rotational transformation, the second incorporates linear transformations along the (x, y) axes, and the third represents a complete homography transformation in space. The latter transformation, expressed in Eq. 7, is utilized in this work.

\begin{bmatrix} X_T \\ Y_T \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_R \\ Y_R \\ 1 \end{bmatrix} \qquad (4)

\begin{bmatrix} X_T \\ Y_T \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_R \\ Y_R \\ 1 \end{bmatrix} \qquad (5)

\begin{bmatrix} X_T \\ Y_T \\ 1 \end{bmatrix} =
\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}
\begin{bmatrix} X_R \\ Y_R \\ 1 \end{bmatrix} \qquad (6)

X_T = H \cdot X_R \qquad (7)

The goal is to find the transformation matrix H, written in vectorized form as h = [h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, h_{31}, h_{32}, h_{33}]^T. The transformation X_T = H \cdot X_R can be expressed as a linear system A h = 0 [43–45]. This system is solved using Gaussian elimination with a pseudo-inverse method, as shown in Eq. 8. The matrix resolution is implemented with the matrix A and the vector b, which are presented in Eqs. 9 and 10, respectively.

h = (A^{T} A)^{-1} A^{T} b \qquad (8)

A = \begin{bmatrix}
X_{R1} & Y_{R1} & 1 & 0 & 0 & 0 & -X_{R1} X_{T1} & -Y_{R1} X_{T1} & -X_{T1} \\
0 & 0 & 0 & X_{R1} & Y_{R1} & 1 & -X_{R1} Y_{T1} & -Y_{R1} Y_{T1} & -Y_{T1} \\
\vdots & & & & & & & & \vdots \\
X_{Rn} & Y_{Rn} & 1 & 0 & 0 & 0 & -X_{Rn} X_{Tn} & -Y_{Rn} X_{Tn} & -X_{Tn} \\
0 & 0 & 0 & X_{Rn} & Y_{Rn} & 1 & -X_{Rn} Y_{Tn} & -Y_{Rn} Y_{Tn} & -Y_{Tn}
\end{bmatrix} \qquad (9)

b = \begin{bmatrix} X_{T1} & Y_{T1} & \cdots & X_{Tn} & Y_{Tn} \end{bmatrix}^{T} = A\,h, \qquad
h = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{21} & h_{22} & h_{23} & h_{31} & h_{32} & h_{33} \end{bmatrix}^{T} \qquad (10)

Image Transformation

In Fig. 12, the reference image is depicted in red, while the same object is shown in blue in the scene image. The transformation is performed using the homographic matrix h, which is applied to each corner of the complex object.

Fig. 12 Object recognition with occlusion under controlled conditions

The output of the complex object recognition stage is the [3×3] homographic matrix. This matrix models the projective transformation of the scenes, aligning two images from different cameras to create a single multispectral image. This relationship enables the development of a general application of sensory fusion for multispectral images.
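The following sketch illustrates Eqs. 8–10 and the corner projection of the image-transformation step: it assembles A and b from matched points, recovers h with the pseudo-inverse, and maps the reference-image corners into the scene. It assumes the common normalization h_{33} = 1, so only the first eight columns of Eq. 9 are needed, and is a schematic reading of the equations rather than the authors' code.

# Sketch of the pseudo-inverse homography estimation (Eqs. 8-10) and corner projection (Eq. 7),
# assuming the usual normalization h33 = 1 so that eight parameters are solved.
import numpy as np

def estimate_homography(ref_pts, tgt_pts):
    # ref_pts, tgt_pts: (n, 2) arrays of matched (X_R, Y_R) -> (X_T, Y_T) points, n >= 4
    A, b = [], []
    for (xr, yr), (xt, yt) in zip(ref_pts, tgt_pts):
        # Two rows of A (first eight columns of Eq. 9) and two entries of b (Eq. 10) per match.
        A.append([xr, yr, 1, 0, 0, 0, -xr * xt, -yr * xt]); b.append(xt)
        A.append([0, 0, 0, xr, yr, 1, -xr * yt, -yr * yt]); b.append(yt)
    A, b = np.asarray(A, float), np.asarray(b, float)
    h = np.linalg.pinv(A) @ b          # h = (A^T A)^{-1} A^T b  (Eq. 8)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    # Apply Eq. 7 to a set of points, e.g. the four corners of the reference image.
    pts_h = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    q = (H @ pts_h.T).T
    return q[:, :2] / q[:, 2:3]        # back from homogeneous coordinates

# Example: map the corners of a 640x480 reference image into the scene frame.
# corners = np.array([[0, 0], [639, 0], [639, 479], [0, 479]], float)
# H = estimate_homography(ref_pts, tgt_pts); print(project(H, corners))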
Multi-Sensory Image Making Up

Each of the cameras integrated into this research produces 2D information. The first step towards achieving sensory fusion is to utilize this 2D information for pattern recognition. Subsequently, the resulting homography matrices are used to relate the 2D information to the referenced 3D sensor. The research revolves around the acquisition process depicted in Fig. 13.

Fig. 13 Plant image acquisition, in field conditions, with the plenoptic, multispectral, and thermal camera configuration

In Fig. 14, the images acquired by each sensor for the same scene are depicted: the plenoptic, multispectral, and thermal cameras, respectively. It is evident that each camera has a different resolution and covers a distinct area of the crop. Furthermore, the topology of each image varies, even though they were captured simultaneously from the same scene. Specifically, the thermal image appears smaller relative to the other two.

Fig. 14 a Plenoptic image with the Raytrix R42 camera. Resolution = [960×1381]. b Infrared (NIR) image with the Parrot Sequoia camera. Resolution = [960×1280]. c Thermal IR image with the Fluke Ti400 camera. Resolution = [240×320]

The challenge lies in developing an algorithm capable of generating a single composite image by transforming each available channel. These channels consist of (i) RGB: three channels from the 3D sensor, (ii) near IR: one channel from the infrared image, and (iii) medium IR: one channel from the thermal image.

The objective is to align the multispectral information captured by each camera with the reference frame of the 3D sensor. This involves two steps: (i) establishing the relationship between the plenoptic and multispectral NIR camera, and (ii) determining the relationship between the plenoptic and multispectral MIR camera.

Matching between the plenoptic camera and the NIR camera. Figure 15a illustrates the homographic relationship between the plenoptic camera and the multispectral NIR camera. The region of the image captured with the plenoptic camera is highlighted in red, while the region captured with the multispectral camera is indicated in cyan. The homographic transformation obtained using Eq. 8 is depicted in blue.

Fig. 15 a Homographic projection between the plenoptic camera and the NIR multispectral camera. b Homographic transformation of the NIR multispectral image. c RGN image, composed of the RG channels of the plenoptic image and the NIR channel of the multispectral camera. Resolution = [960×1381]
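A minimal sketch of this composition step: assuming H_nir is the 3×3 homography from the NIR frame to the plenoptic frame obtained at calibration time, the NIR band is resampled onto the plenoptic pixel grid and stacked with the R and G channels, in the spirit of the RGN image of Fig. 15c. The variable names and the RGB channel order are assumptions.

# Sketch: warp the NIR channel into the plenoptic reference frame and build an RGN composite.
import cv2
import numpy as np

def compose_rgn(plenoptic_rgb, nir, H_nir):
    # plenoptic_rgb: (h, w, 3) image assumed in R, G, B order; nir: single-band image; H_nir: 3x3 homography
    h, w = plenoptic_rgb.shape[:2]
    nir_warped = cv2.warpPerspective(nir, H_nir, (w, h))   # resample NIR onto the plenoptic grid
    r, g = plenoptic_rgb[:, :, 0], plenoptic_rgb[:, :, 1]
    return np.dstack([r, g, nir_warped])                   # pseudo-colour R-G-NIR composite

The same call with the thermal homography yields the RG+MIR composite described for Fig. 16c.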
Figure 15b depicts the transformation of the entire NIR multispectral image. The blue box represents the overlapping region of the multispectral camera with the frame of the plenoptic image, providing a new channel that can be related to the three-dimensional information. In Fig. 15c, an image composed of the RG channels of the plenoptic camera and the NIR multispectral channel is presented.

Matching between the plenoptic camera and the thermal camera. Figure 16a illustrates the homographic relationship between the plenoptic camera and the MIR thermal multispectral camera. The region of the image captured with the plenoptic camera is highlighted in red, while the region captured with the thermal multispectral camera is indicated in cyan. The homographic transformation obtained using Eq. 8 in the thermal image is depicted in blue.

Figure 16b illustrates the transformation of the complete thermal multispectral image. In contrast to Fig. 15b, the entire image is utilized in this case, and there is no need to crop a fraction.

In Fig. 16c, an image is composed using the RG channels of the plenoptic camera and the thermal MIR multispectral channel.

Fig. 16 a Homographic projection between the plenoptic camera and the MIR thermal multispectral camera. b Homographic transformation of the thermal multispectral MIR image. c RGN2 image, composed of the RG channels of the plenoptic image and the MIR channel of the thermal multispectral camera. Resolution = [716×915]

Figure 17a presents an image composition consisting of the R channel of the plenoptic camera, the IR channel of the NIR multispectral camera, and the IR channel of the MIR thermal multispectral camera. Finally, Fig. 17b displays the five channels available for generating different image compositions and calculating the vegetative index.

Fig. 17 a Five channels in the same frame: RGB from the plenoptic camera, NIR from the multispectral camera, and IR-thermal from the thermal camera. Resolution = [960×1381]. b RNN image, composed of the R channel of the plenoptic image, the IR channel of the NIR multispectral camera, and the IR channel of the MIR thermal multispectral camera. Resolution = [716×915]

Figure 18 displays the homographic matrices that establish the correspondence between common features in each scene, such as the position of the known pattern. This information facilitates the integration of images into a single frame, even when they are captured by different cameras. In this application, the target frame is the one associated with three-dimensional information, which in this research is linked to the plenoptic camera or the Kinect sensor.
Fig. 18 Homographic relationship through the complex pattern: visualization of changes in shape and color
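Once the bands share a single pixel grid, a vegetative index can be computed per pixel. The text does not name a specific index; NDVI is used below purely as a familiar example.

# Illustrative vegetation index on the aligned channels (NDVI chosen only as an example).
import numpy as np

def ndvi(nir, red, eps=1e-6):
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)   # values in [-1, 1]; higher over healthy vegetation

# ndvi_map = ndvi(nir_warped, plenoptic_rgb[:, :, 0])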

Proposed System Model

Figure 19 illustrates the diagram depicting the stages involved in implementing the multispectral sensory fusion approach. The diagram is structured into three modules: (a) the first module involves object recognition with a complex pattern within the same scene captured by three different sensors; (b) the second module focuses on setting up an image with various configurations of channels from the three introduced cameras, integrating them with 3D spatial information; (c) the final module aims to validate the model characterized by homographic projections in a new scene. This validation stage solely considers the homographic model with new images, without taking into account the complex pattern.

Fig. 19 Sensory fusion, going from 2D to 3D: (a) object detection approach, (b) 3D integration, (c) validation
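The three modules can be summarized by a schematic class such as the one below (class and method names are illustrative, not taken from the paper): homographies are estimated once from the complex-pattern correspondences and then reused to warp every 2D channel of a new scene into the reference frame.

# Schematic of the three-module flow in Fig. 19; names are illustrative.
import cv2
import numpy as np

class MultiCameraFusion:
    def __init__(self):
        self.homographies = {}  # one 3x3 matrix per 2D camera, fixed after calibration

    def calibrate(self, camera, src_pts, dst_pts):
        # Module (a): complex-pattern correspondences -> homography, estimated once with RANSAC.
        H, _ = cv2.findHomography(np.float32(src_pts), np.float32(dst_pts), cv2.RANSAC, 5.0)
        self.homographies[camera] = H

    def fuse(self, reference_rgb, spectral_images):
        # Modules (b)-(c): warp every registered 2D channel into the 3D-sensor frame of a new scene.
        h, w = reference_rgb.shape[:2]
        fused = {"rgb": reference_rgb}
        for camera, image in spectral_images.items():
            fused[camera] = cv2.warpPerspective(image, self.homographies[camera], (w, h))
        return fused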
Results and Discussion

The validation of the proposed methodology involves reproducing the matching results and integrating information from all sensors into the 3D reference frame. This results in a three-dimensional model composed of data from three cameras, representing a multi-sensory fusion approach over multispectral wavelengths.

In Fig. 19a, images captured in the laboratory with different sensors are displayed. These include three channels: the visible RGB spectrum (VIS), the multispectral near-infrared channel (NIR), and the thermal middle-infrared channel (MIR).

Figure 20 illustrates the resulting channels within the 3D reference frame. Further, the 3D information is depicted in Fig. 21a, b, integrating spectral bands from the various cameras: (i) a channel from the visible RGB spectrum of the 3D sensor, (ii) the multispectral NIR channel of the infrared camera, and (iii) the infrared channel of the thermal camera.

Fig. 20 Five channels in the same frame: RGB from the 3D camera, NIR from the multispectral camera, and IR-thermal from the thermal camera. Resolution = [368×490]

Fig. 21 a RGB initial 3D model, captured with the Kinect V02 sensor. b Resulting 3D model, composed of: (i) the G channel of the 3D sensor, (ii) the IR channel of the Parrot camera, and (iii) the infrared thermal channel of the Fluke thermal camera

In Fig. 22, the homographic transformation relating the infrared image to the RGB image is depicted. The resulting image combines common regions, as shown in Fig. 23. This clipping represents a new channel available for generating an integrated image along with the RGB image.

Fig. 22 Homographic projection between the Kinect camera and: (a) the NIR multispectral camera and (b) the MIR thermal multispectral camera

Fig. 23 Homographic transformation of: (a) the NIR multispectral image and (b) the MIR multispectral image

Finally, Fig. 24 presents an image composed of the RG channels of the Kinect sensor (depicted in the red box in Fig. 22), along with (a) the NIR channel of the Parrot sensor and (b) the MIR channel of the Fluke sensor (shown in the blue box in Fig. 23).

Fig. 24 (a) RGN image with Resolution = [757×740]. (b) RGT image with Resolution = [368×490]

The images integrating multispectral information in Fig. 20, such as the one depicted in Fig. 24, which combines VIS and NIR-MIR information, are displayed in pseudo color in the RGB standard. This coloring is necessary because the human visual system cannot perceive these wavelengths. In Fig. 24, we observe the result of the investigation: an image composed of the green visible channel (G) along with two multispectral channels, MIR and NIR, referenced in 3D space.
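A sketch of the final association between aligned channels and the 3D data: once the NIR and MIR images have been warped into the Kinect colour frame, each 3D point can inherit the spectral values of its pixel. The pixel-to-point correspondence arrays below are assumptions for illustration; in practice they come from the depth-to-colour registration of the Kinect SDK.

# Sketch: attach aligned multispectral channels to the 3D point cloud (cf. Fig. 21b).
# Assumes points_xyz[i] corresponds to pixel (rows[i], cols[i]) of the colour frame,
# and that nir_aligned / mir_aligned were warped into that frame beforehand.
import numpy as np

def multispectral_point_cloud(points_xyz, rows, cols, g, nir_aligned, mir_aligned):
    # Returns an (n, 6) array: X, Y, Z, G, NIR, MIR per 3D point.
    spectra = np.stack([g[rows, cols],
                        nir_aligned[rows, cols],
                        mir_aligned[rows, cols]], axis=1).astype(np.float32)
    return np.hstack([np.asarray(points_xyz, np.float32), spectra])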
The resulting image size varies based on the common regions captured by the cameras in the scene. The sensory fusion strategy, employing pattern recognition and statistical optimization, effectively models the capture of different images to generate a single, multisensory integrated image. This approach, crucial for plant-related applications, performs object recognition once during calibration and is capable of overcoming occlusions. Furthermore, it offers the potential to propose new robust descriptors.

The success of the proposed sensory fusion, as depicted in Fig. 19, is evident. Initially, the projective model implements a complex pattern recognition approach, followed by validation using multi-camera scene acquisition. The result is a point cloud composed of multispectral information that was initially in 2D, demonstrating the efficacy of the strategy.

Conclusions

This paper proposes a novel multi-camera sensory fusion technique for complex object detection based on homography model transformations. The proposed technique uses probabilistic optimization and pattern recognition methods that are widely used in pose estimation, odometry, and tracking tasks. The technique performs sensory fusion in a multisensory setting, which is a novel contribution.

The technique produces homographic transformations for each camera, using images with rich patterns that do not require a large-scale dataset. This benefit allows for detailed morphological modeling of individual plants or crops, as the cameras are close to the target, where comprehensive datasets may not be available. The technique performs object recognition only once during the initial setup, and then applies sensory fusion to unknown scenes. The technique generates a 3D representation from initially 2D multispectral images, which include near-infrared or thermal-infrared information. However, the technique is sensitive to distance changes and needs re-calibration if the setup is altered.

Future work will be oriented towards implementing and comparing convolutional neural networks for object recognition to enhance the sensory fusion performance. This will involve integrating homographic transformations, with a main challenge being the creation of a large image database.

The resulting model combines 3D information, facilitating precise morphological plant measurements, with multispectral data, enabling assessment of plant conditions such as weather stress or nitrogen deficiency. The sensory fusion approach generates valuable information for crop modeling applications, with the potential to extract morphological variables such as the number and size of leaves, stems, or plant height, in addition to multispectral information at the plant scale; it is non-invasive and can be installed in agricultural production fields.

Funding Open Access funding provided by Colombia Consortium. This work was partly funded by Pontificia Universidad Javeriana in Bogotá, Colombia, under the project ID 20366 - Optimized navigation control applied to monitoring and inspection; and by the Omics Science Research Institute (iOMICAS) anchored in Pontificia Universidad Javeriana in Cali, Colombia.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References

1. Copyright. In: Bhullar GS, Bhullar NK, editors. Agricultural Sustainability. San Diego: Academic Press; 2013. https://doi.org/10.1016/B978-0-12-404560-6.00017-4. https://www.sciencedirect.com/science/article/pii/B9780124045606000174
2. Rose MT, Rose TJ, Pariasca-Tanaka J, Widodo, Wissuwa M. Revisiting the role of organic acids in the bicarbonate tolerance of zinc-efficient rice genotypes. Funct Plant Biol. 2011;38(6):493–504. https://doi.org/10.1071/FP11008.
3. Wu C, Zou Q, Xue S, Mo J, Pan W, Lou L, Wong MH. Effects of silicon (Si) on arsenic (As) accumulation and speciation in rice (Oryza sativa L.) genotypes with different radial oxygen loss (ROL). Chemosphere. 2015;138:447–53. https://doi.org/10.1016/j.chemosphere.2015.06.081.
4. Wu C, Zou Q, Xue S-G, Pan W-S, Yue X, Hartley W, Huang L, Mo J-Y. Effect of silicate on arsenic fractionation in soils and its accumulation in rice plants. Chemosphere. 2016;165:478–86. https://doi.org/10.1016/j.chemosphere.2016.09.061.
5. Zhang L, Yang Q, Wang S, Li W, Jiang S, Liu Y. Influence of silicon treatment on antimony uptake and translocation in rice genotypes with different radial oxygen loss. Ecotoxicol Environ Saf. 2017;144:572–7. https://doi.org/10.1016/j.ecoenv.2017.06.076.
6. Matsubara K, Yonemaru J-I, Kobayashi N, Ishii T, Yamamoto E, Mizobuchi R, Tsunematsu H, Yamamoto T, Kato H, Yano M. A follow-up study for biomass yield QTLs in rice. PLoS ONE. 2018;13(10). https://doi.org/10.1371/journal.pone.0206054.
7. McCouch WMTCS. Open access resources for genome-wide association mapping in rice. Nat Commun. 2016;7:1. https://doi.org/10.1038/ncomms10532.
8. Bouman BAM, Peng S, Castañeda AR, Visperas RM. Yield and water use of irrigated tropical aerobic rice systems. Agric Water Manag. 2005;74(2):87–105. https://doi.org/10.1016/j.agwat.2004.11.007.
9. Kamffer Z, Bindon KA, Oberholster A. Optimization of a method for the extraction and quantification of carotenoids and chlorophylls during ripening in grape berries (Vitis vinifera cv. Merlot). J Agric Food Chem. 2020;58. https://doi.org/10.1021/jf1004308
10. Ling Q, Wang S, Ding Y, Li G. Re-evaluation of using the color difference between the top 3rd leaf and the 4th leaf as a unified indicator for high-yielding rice. Sci Agric Sin. 2017;50(24):4705–13. https://doi.org/10.3864/j.issn.0578-1752.2017.24.004.
11. Colorado JD, Calderon F, Mendez D, Petro E, Rojas JP, Correa ES, Mondragon IF, Rebolledo MC, Jaramillo-Botero A. A novel NIR-image segmentation method for the precise estimation of above-ground biomass in rice crops. PLoS ONE. 2020;15(10):6.
12. Correa ES, Calderon F, Colorado JD. GFKuts: a novel multispectral image segmentation method applied to precision agriculture. In: 2020 Virtual Symposium in Plant Omics Sciences, OMICAS 2020 – Conference Proceedings; 2020.
13. Jing Z, Guan H, Zhao P, Li D, Yu Y, Zang Y, Wang H, Li J. Multispectral LiDAR point cloud classification using SE-PointNet++. Remote Sens. 2021;13(13):8.
14. Jimenez-Sierra DA, Correa ES, Benítez-Restrepo HD, Calderon FC, Mondragon IF, Colorado JD. Novel feature-extraction methods for the estimation of above-ground biomass in rice crops. Sensors. 2021;21(13):4369.
15. Yang J, Song S, Du L, Shi S, Gong W, Sun J, Chen B. Analyzing the effect of fluorescence characteristics on leaf nitrogen concentration estimation. Remote Sens. 2018;10:9. https://doi.org/10.3390/rs10091402.
16. Yuan Z, Ata-Ul-Karim ST, Cao Q, Lu Z, Cao W, Zhu Y, Liu X. Indicators for diagnosing nitrogen status of rice based on chlorophyll meter readings. Field Crops Res. 2016;185:12–20. https://doi.org/10.1016/j.fcr.2015.10.003.
17. Yamane K, Kawasaki M, Taniguchi M, Miyake H. Correlation between chloroplast ultrastructure and chlorophyll fluorescence characteristics in the leaves of rice (Oryza sativa L.) grown under salinity. Plant Prod Sci. 2008;11(1):139–45. https://doi.org/10.1626/pps.11.139.
18. Zhang H, Zhu L-F, Hu H, Zheng K-F, Jin Q-Y. Monitoring leaf chlorophyll fluorescence with spectral reflectance in rice (Oryza sativa L.). Proc Eng. 2011;15:4403–8. https://doi.org/10.1016/j.proeng.2011.08.827. (CEIS 2011).
19. Subhash N, Mohanan CN. Laser-induced red chlorophyll fluorescence signatures as nutrient stress indicator in rice plants. Remote Sens Environ. 1994;47(1):45–50. https://doi.org/10.1016/0034-4257(94)90126-0. (Fluorescence Measurements of Vegetation).
20. Liu S. Phenotyping wheat by combining ADEL-wheat 4D structure model with proximal remote sensing measurements along the growth cycle. PhD thesis; 2016.
21. Polder G, Hofstee JW. Phenotyping large tomato plants in the greenhouse using a 3D light-field camera. Vol. 1, pp. 153–159. American Society of Agricultural and Biological Engineers; 2014.
22. Sandhya Devi RS, Vijay Kumar VR, Sivakumar P. A review of image classification and object detection on machine learning and deep learning techniques. In: Proceedings of the 5th International Conference on Electronics, Communication and Aerospace Technology, ICECA 2021; 2021.
23. Qingyun F, Zhaokui W. Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery. Pattern Recogn. 2022;1:30.
24. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: Common Objects in Context. Lecture Notes in Computer Science, vol. 8693 LNCS, pp. 740–755; 2014.
25. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The PASCAL Visual Object Classes (VOC) challenge. Int J Comput Vis. 2010;88(2):303–38.
26. Gani MO, Kuiry S, Das A, Nasipuri M, Das N. Multispectral object detection with deep learning. Communications in Computer and Information Science, vol. 1406 CCIS, pp. 105–117; 2021.
27. Münzinger M, Prechtel N, Behnisch M. Mapping the urban forest in detail: from LiDAR point clouds to 3D tree models. Urban For Urban Green. 2022;74:2.
28. Li H, Zech J, Ludwig C, Fendrich S, Shapiro A, Schultz M, Zipf A. Automatic mapping of national surface water with OpenStreetMap and Sentinel-2 MSI data using deep learning. Int J Appl Earth Observ Geoinform. 2021;104:2.
29. Jurado JM, López A, Pádua L, Sousa JJ. Remote sensing image fusion on 3D scenarios: a review of applications for agriculture and forestry. Int J Appl Earth Observ Geoinform. 2022;11:2.
30. Wichmann V, Bremer M, Lindenberger J, Rutzinger M, Georges C, Petrini-Monteferri F. Evaluating the potential of multispectral airborne lidar for topographic mapping and land cover classification. ISPRS Ann Photogramm Remote Sens Spatial Inf Sci. 2015;II-3/W5:113–119. https://doi.org/10.5194/isprsannals-II-3-W5-113-2015
31. Kaygusuz N, Mendez O, Bowden R. Multi-camera sensor fusion for visual odometry using deep uncertainty estimation. 2021. https://doi.org/10.1109/itsc48978.2021.9565079.
32. Dockstader SL, Tekalp AM. Multiple camera fusion for multi-object tracking. In: Proceedings 2001 IEEE Workshop on Multi-Object Tracking; 2001. p. 95–102. https://doi.org/10.1109/MOT.2001.937987
33. Cachique SM, Correa ES, Rodriguez-Garavito C. Intelligent digital tutor to assemble puzzles based on artificial intelligence techniques. In: International Conference on Applied Informatics; 2020. p. 56–71. Springer.
34. Alam MS, Morshidi MA, Gunawan TS, Olanrewaju RF, Arifin F. Pose estimation algorithm for mobile augmented reality based on inertial sensor fusion. Int J Electr Comput Eng. 2022;12(4):3620–31.
35. Yang L, Li Y, Li X, Meng Z, Luo H. Efficient plane extraction using normal estimation and RANSAC from 3D point cloud. Comput Stand Interfaces. 2022;82.
36. Gao L, Zhao Y, Han J, Liu H. Research on multi-view 3D reconstruction technology based on SfM. Sensors. 2022;22:12.
37. Correa ES, Parra CA, Vizcaya PR, Calderon FC, Colorado JD. Complex object detection using light-field plenoptic camera. 1576 CCIS; 2022. p. 119–133.
38. Zhang C. Decoding and calibration method on focused plenoptic camera. Comput Vis Med. 2016;2:2096–662. https://doi.org/10.1007/s41095-016-0040-x.
39. O'Brien S, Trumpf J, Ila V, Mahony R. Calibrating light-field cameras using plenoptic disc features. In: 2018 International Conference on 3D Vision (3DV); 2018. p. 286–294. https://doi.org/10.1109/3DV.2018.00041
40. Correa ES, Parra CA, Colorado JD. Complex object detection using light-field plenoptic camera. 2022;21:977–1000. https://doi.org/10.1016/S0262-8856(03)00137-9
41. Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis. 2004;60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94.
42. Fotouhi M, et al. SC-RANSAC: spatial consistency on RANSAC. Multimed Tools Appl. 2019;78(7):9429–61. https://doi.org/10.1007/s11042-018-6475-6
43. Solem JE. Programming Computer Vision with Python: tools and algorithms for analyzing images. O'Reilly Media, Inc.; 2012. p. 72–74.
44. Zhuang L, Yu J, Song Y. Panoramic image mosaic method based on image segmentation and improved SIFT algorithm. 2021;2113. Chap. 1.
45. Luo X, Chen W, Du X. A matching algorithm based on the topological structure of feature points. 2021;11720.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.