Estimation of Low Nutrients in Tomato Crops Through The Analysis of Leaf Images Using Machine Learning
(Received 19 March 2021; Revised 30 March 2021; Accepted 09 April 2021; Published online 15 April 2021)
Abstract: Tomato crops are considered among the most important agricultural products worldwide. However, the quality of tomatoes depends mainly on their nutrient levels. Farmers perform visual inspection to anticipate nutrient deficiencies in the plants. Recently, precision agriculture has explored opportunities to automate nutrient-level monitoring. Previous work has demonstrated that a convolutional neural network is able to estimate low nutrients in tomato plants using images of their leaves; however, the performance of that convolutional neural network was not adequate. Thus, this work proposes a novel convolutional neural network-based classifier, namely CNN+AHN, for estimating low nutrients in tomato crops using an image of the tomato leaves. The CNN+AHN incorporates a set of convolutional layers as the feature extraction part, and a supervised learning method called artificial hydrocarbon network as the dense layer. Different combinations of the CNN+AHN architecture were examined. Experimental results showed that our best CNN+AHN classifier is able to estimate low nutrients in tomato plants with an accuracy of 95.57% and an F1-score of 95.75%, outperforming the literature.
Key words: agriculture; image processing; deep learning; computer vision; color analysis
In this work, we take advantage of deep learning to analyze the leaves of tomato crops for detecting nutrient deficiency. A successful vision-based application of deep learning is our previous work [13], in which a simple CNN predicts nutrient deficiency in tomato crops from an image of their leaves; that CNN-based work achieved an accuracy of 86.59% on the same dataset used in this work.

The current work builds on that research [13], in which we showed that a simple, nonoptimized CNN model reaches an accuracy of 86.59%. For that study, we collected and released a public dataset of tomato leaves with their nutrient levels. We then performed four different experiments using the original dataset, a set of enhanced images from the dataset, the original images augmented with others from the Internet, and the enhancement of the original plus the augmented images. In contrast, the current work assumes that a CNN model is able to perform the classification task of low-nutrient detection, and improves the architecture of the CNN via Bayesian optimization and the inclusion of the AHN model at the dense layer. We outperform our previous work, as shown in Section IV.

A molecule behaves as in (1); its weights H_ir are known as molecular parameters, and they resemble the hydrogen and carbon atoms of a hydrocarbon molecule in nature.

φ(x, k) = σ Σ_{r=1}^{n} Σ_{i=1}^{k≤4} H_ir x_r^i.   (1)

Molecules are arranged in groups, so-called compounds. These are structures that represent nonlinearities among molecules. They are associated with a functional behavior as in (2), where m is the number of molecules in the compound and Σ_j is a partition of the input x such that Σ_j = {x | argmin_j ‖x − μ_j‖ = j}, and μ_j ∈ R^n is the center of the j-th molecule [22]. In fact, Σ_{j1} ∩ Σ_{j2} = ∅ if j1 ≠ j2. The compound behavior written in (2) is known as a linear chain of m molecules, since it is similar to organic chains in chemical nature [31].

ψ(x) = { φ_1(x, 3) if x ∈ Σ_1;  φ_2(x, 2) if x ∈ Σ_2;  …;  φ_{m−1}(x, 2) if x ∈ Σ_{m−1};  φ_m(x, 3) if x ∈ Σ_m }.   (2)
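As a concrete illustration, the molecule behavior in (1) and the nearest-center compound partition in (2) can be sketched in a few lines of Python. This is an illustrative sketch only, not the trained SPE-AHN implementation; the molecular parameters H, the centers mu, and the function names are hypothetical choices of ours, with the carbon value sigma fixed to 1.

```python
import numpy as np

def molecule(x, H, k, sigma=1.0):
    """Molecule behavior as in (1): a degree-k polynomial over each input
    feature x_r, weighted by the molecular parameters H[i-1][r]."""
    n = len(x)
    return sigma * sum(H[i - 1][r] * x[r] ** i
                       for r in range(n) for i in range(1, k + 1))

def compound(x, molecules, centers):
    """Compound behavior as in (2): evaluate the molecule whose center
    mu_j is nearest to the input x, i.e., the partition Sigma_j."""
    x = np.asarray(x, dtype=float)
    j = int(np.argmin([np.linalg.norm(x - np.asarray(mu)) for mu in centers]))
    H, k = molecules[j]
    return molecule(x, H, k)

# Hypothetical two-molecule linear chain over a single input feature
mols = [([[2.0]], 1), ([[3.0]], 1)]   # (H, k) for each molecule
mus = [[0.0], [1.0]]                  # molecule centers mu_j
print(compound([0.4], mols, mus))     # nearest center is mu_1, so 2.0 * 0.4
```

Note that the partition is implicit: no Σ_j sets are stored, only the centers, and each input is routed to its nearest molecule, mirroring the argmin definition above.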
Fig. 2. Schematic of the vision-based monitoring system for detecting low nutrients in tomato plants.
2) IMAGE PREPROCESSING. In our previous work [13], we proved that contrast enhancement and image resizing improve the performance of the machine learning classifier. In the current work, we adopted the same preprocessing to keep the comparison consistent.

First, we apply contrast enhancement to the original images, emphasizing the color of the leaves, using the gamma transformation on the Red-Green-Blue (RGB) channels [32], as shown in (4), where r is the input gray level (red, green, or blue intensity value) to the gamma transformation, L is the maximum intensity value in the channel, s is the resulting output gray level, and [a, b] is the input range of gray levels to enhance. For all images in the experimentation, the γ value was set to 1, and we used the following input ranges of gray levels for contrast enhancement: [0.2(L − 1), 0.6(L − 1)] for the red channel, [0.3(L − 1), 0.7(L − 1)] for the green channel, and [0, (L − 1)] for the blue channel.

s = { 0 if r < a;  (L − 1)[(r − a)/(b − a)]^γ if a ≤ r ≤ b;  L − 1 if r > b }.   (4)

Then, we reduce the original images (3024 × 4032 px) to a 28 × 28 px size to lighten the computing task in the CNN+AHN model.

C. DEVELOPMENT OF THE CNN+AHN MODEL

The proposed CNN+AHN model consists of a set of convolutional layers that act as the feature extractor, and an AHN as the dense layer (Fig. 3). To design this architecture, we first train and optimize a simple CNN model using a dataset of tomato leaf images with low-nutrient labels. Then, we use the feature extraction layers of the CNN as the first part of our model and place an AHN in sequence. Later, we train the AHN for the classification task, to finally obtain the proposed CNN+AHN model.

1) CNN MODEL BACKBONE. We propose a CNN backbone that receives as input a 28 × 28 px RGB color image. The image passes through a sequence of three convolutional layers with 8, 16, and 32 filters of 3 × 3 size. Each of these layers is followed by a rectified linear unit (ReLU)-based layer and a max-pooling layer that reduces the spatial size of the feature maps. Finally, there is a fully connected layer with a Softmax layer of four units. The output of the CNN is the class label of the low nutrient estimated in the image; the possible classes are nitrogen, phosphorus, potassium, and normal. It is worth noting that this CNN architecture was obtained using a Bayesian optimization method [33] that searched over the following hyperparameters: the number of convolutional layers (from 1 to 5), the initial learning rate (from 0.001 to 0.01), and the regularization term (from 1 × 10−10 to 1 × 10−2). The number of filters and the filter sizes of the convolutional layers were fixed.

We used the stochastic gradient descent with momentum algorithm for training, with the optimized hyperparameters: three convolutional layers, an initial learning rate of 0.005044, and a regularization term of 1.6792 × 10−10.

2) AHN AS DENSE LAYER. To develop the CNN+AHN model, after training the CNN, we isolate the first three convolutional layers with their respective ReLU-based and max-pooling layers. Then, we place an AHN in sequence. We use Bayesian optimization to determine the suitable number of molecules (from 1 to 20) in the AHN model, as the only hyperparameter. The output of the AHN is then connected to a Softmax layer to perform the classification task. Fig. 3 shows the architecture of the proposed CNN+AHN model.

Fig. 3. Architecture of the proposed CNN+AHN model. It receives an input RGB image of the tomato leaves with 28 × 28 px resolution. Then, this image goes through the three convolutional-based layers and the AHN dense layer. Finally, the estimated class is output using a Softmax layer.

To train the AHN dense layer, we input the images into the CNN and take the output of the last max-pooling layer. These outputs were used as inputs to the AHN, and the same class labels were used as targets. We used the SPE-AHN algorithm to train the AHN with four molecules.

All the experiments were implemented in MATLAB using the Deep Learning Toolbox, on a Dell personal computer with an Intel Core i7-8850H processor at 2.6 GHz, six CPU cores, and 16 GB of RAM.
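As an illustration, the per-channel contrast enhancement of (4) can be sketched with NumPy. This is a sketch under the settings reported above (γ = 1 and 8-bit channels, so L = 256), not the MATLAB implementation used in the experiments; the function and variable names are ours.

```python
import numpy as np

def gamma_stretch(channel, a, b, gamma=1.0, L=256):
    """Piecewise gamma transformation of (4): gray levels below a map to 0,
    levels above b saturate at L - 1, and [a, b] is stretched with exponent gamma."""
    r = np.clip(channel.astype(float), a, b)  # handles the r < a and r > b cases
    return (L - 1) * ((r - a) / (b - a)) ** gamma

def enhance_rgb(img, L=256):
    """Channel-wise enhancement with the input ranges used in the experiments."""
    ranges = [(0.2 * (L - 1), 0.6 * (L - 1)),  # red
              (0.3 * (L - 1), 0.7 * (L - 1)),  # green
              (0.0, 1.0 * (L - 1))]            # blue (identity range)
    return np.stack([gamma_stretch(img[..., c], a, b, 1.0, L)
                     for c, (a, b) in enumerate(ranges)], axis=-1)
```

Clipping to [a, b] before the power law reproduces the three cases of (4) in one vectorized expression: clipped values at a map to 0 and values at b map to L − 1.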
3) FEATURE REDUCTION LAYER. The literature reports that a large number of features in the data might reduce the predictive power of the AHN [22]. To minimize the impact of the large number of features from the last convolutional layer, we propose to implement a feature reduction layer in the CNN+AHN before the AHN. To do so, we use principal component analysis (PCA) [34] to reduce the number of features. This reduction layer takes the convolutional features as input; then, the principal components are computed; and finally, the subset of the first k components is selected that explains a given degree, that is, a threshold p, of the data variance. For this work, we select a threshold of p = 97% of explained variance. Finally, those k components are the inputs of the AHN layer.

D. EVALUATION

We evaluate the performance of the CNN+AHN classifier with widely used metrics in machine learning [35]: accuracy (5), precision (6), sensitivity (7), specificity (8), and F1-score (9), where TP refers to true positives, TN to true negatives, FP to false positives, and FN to false negatives.

accuracy = (TP + TN) / (TP + TN + FP + FN),   (5)

precision = TP / (TP + FP),   (6)

sensitivity = TP / (TP + FN),   (7)

specificity = TN / (TN + FP),   (8)

F1-score = 2 · (precision · sensitivity) / (precision + sensitivity).   (9)

From our previous work [13], we determined that model training is better with an augmentation of the dataset. In this regard, the current work adopts the same augmentation procedure, which consists of 84 images retrieved from the Internet. Those were collected manually by inspection, and the nutrient levels were tagged using the information in the descriptions of the web sources. The augmented images were preprocessed in the same way as the original images.

IV. RESULTS AND DISCUSSION

We compare the performance of the CNN+AHN classifier with the CNN model reported in our previous work [13]. We also evaluated different configurations to validate the effectiveness of the proposal: the single CNN model (backbone), the CNN+AHN, and the CNN+AHN with a PCA layer.

For the experiments, we conducted a fivefold cross-validation for each of the models. In Table I, we report the mean and standard deviation of each model with respect to the performance metrics.

Table I shows that the baseline CNN model reported in [13] performs with an accuracy of 86.59 ± 2.34%, far from the new results found in this work. For instance, the CNN backbone classifier reaches an accuracy of 93.83 ± 1.72%, and the proposed CNN+AHN reaches accuracies of 95.33 ± 0.17% and 95.36 ± 0.23% without and with the PCA layer, respectively. This indicates that the combined CNN+AHN improves the performance of the single CNN model in all the metrics. Moreover, the standard deviation of the single CNN model is slightly larger than the one computed with the CNN+AHN.

Fig. 4 shows the confusion matrix of the best model obtained during cross-validation using the CNN+AHN with the PCA layer (accuracy: 95.57%, F1-score: 95.75%, precision: 95.94%, sensitivity: 95.61%, specificity: 98.40%). It can be observed that most images are correctly classified with the target low nutrient, except where the target class is potassium and the model incorrectly classifies the image as nitrogen. This can be explained because low nitrogen is related to yellow leaves and low potassium to leaves with yellow edges, a condition that is difficult to discriminate visually.

A. DISCUSSION

The experimental results show that the proposed CNN+AHN with the PCA layer is the best model in terms of all the performance metrics evaluated in this work. As noted, the single optimized CNN classifier found in this work is better than the previous baseline CNN. Also, the optimized CNN classifier is able to transfer its feature extraction layers into the CNN+AHN, whose response is slightly better in all the metrics (mean and standard deviation). However, the CNN+AHN with the PCA layer does not represent a major improvement by itself. A reason to choose the CNN+AHN with the PCA layer as the best model, in contrast with the CNN+AHN without the PCA layer, is that the feature reduction positively impacts the number of learning parameters of the AHN: the AHN associated with the model without the PCA layer has 28,224 learning parameters, while the AHN with the PCA layer has only 5,634, a significant reduction.
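The feature reduction layer described above (keep the first k principal components whose cumulative explained variance reaches the threshold p = 97%) can be sketched as follows. This is an illustrative NumPy version, not the MATLAB implementation used in the experiments, and the function name is ours.

```python
import numpy as np

def pca_reduce(X, p=0.97):
    """Project the convolutional features X (samples x features) onto the
    first k principal components that explain at least a fraction p of
    the data variance; returns the reduced features and k."""
    Xc = X - X.mean(axis=0)                    # center the features
    # SVD of the centered data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)      # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), p) + 1)
    return Xc @ Vt[:k].T, k                    # k components feed the AHN layer
```

In the proposed model, the resulting k components replace the raw convolutional features as the inputs of the AHN layer, which is what shrinks the AHN parameter count.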
Fig. 4. Confusion matrix of the best CNN+AHN with PCA layer classifier (accuracy: 95.57%, F1-score: 95.75%, precision: 95.94%, sensitivity: 95.61%, specificity: 98.40%).

The advantages of our method are that the CNN+AHN classifier significantly improves the vision-based monitoring system for anticipating the insufficiency of primary nutrients in tomato crops using only images of leaves, and that the CNN+AHN is validated to work with different images with no restrictions on how the photograph is taken (angle or distance). Some weaknesses of the proposed CNN+AHN are that the dataset is very limited, so a larger dataset is required for robust validation; that the CNN+AHN was not evaluated under different light intensities; that the resizing preprocessing might discard interesting features that were not taken into account in this research; and that the CNN+AHN was validated only on tomato leaf images, so no other crops are considered so far.

To this end, and to the best of our knowledge, this is the first time that the combination of CNN and AHN is used in a vision-based monitoring system to detect low nutrients in tomato plants. Therefore, we consider our current work to be very promising for future precision agriculture applications. Currently, we photograph the tomato leaves manually. Later, we could adopt a drone approach [36] to automatically and systematically photograph the tomato leaves along planned paths, extending our research to massive farming lands.

V. CONCLUSIONS

This work proposed a CNN+AHN classifier to estimate low nutrients (nitrogen, phosphorus, or potassium) in tomato plants using an image of their leaves. The method consists of a hybrid model divided into two parts. The first comprises a set of convolutional layers that act as the feature extraction process. Then, a PCA layer was used to reduce the number of features that enter the final layer, comprised of an AHN with a Softmax function. We optimized the CNN backbone and the AHN separately. Based on the comparative results against the baseline CNN from previous work and different architecture configurations of the CNN+AHN, we validated that the CNN+AHN with the PCA layer is the best-performing model.

REFERENCES

[1] Infoagro, "El control de plagas reduce el desperdicio de alimentos," Apr. 2018.
[2] Organización de las Naciones Unidas para la Alimentación y la Agricultura, "Pérdidas y desperdicios de alimentos en América Latina y el Caribe," Apr. 2018.
[3] Infoagro, "Buenas prácticas en el uso de fertilizantes," Apr. 2018.
[4] Conoce Hidroponia, "Importancia del cultivo de jitomate en México," Apr. 2018.
[5] SAGARPA, "Planeación Agrícola Nacional," Jun. 2018.
[6] E. Heuvelink, Tomatoes. Netherlands: CABI, 2005.
[7] Infoagro, "El cultivo del tomate (1a parte)," May 2018.
[8] Infojardin, "Tomate, tomatera, jitomate," May 2018.
[9] J. B. Jones, Tomato Plant Culture: In the Field, Greenhouse, and Home Garden, 2nd ed. Florida: CRC Press, 2007.
[10] Organización de las Naciones Unidas para la Alimentación y la Agricultura, "El Cultivo de Tomate con Buenas Prácticas Agrícolas en la Agricultura Urbana y Periurbana," Apr. 2018.
[11] Agrologica, "Deficiencias y excesos nutricionales en tomate: síntomas y corrección," Mar. 2018.
[12] L. Chanabá and J. Andrés, Efecto de la Fertilización Química y Orgánica en el Tomate de Árbol. Quito: INIAP Archivo Histórico, 2003.
[13] C. Cevallos, H. Ponce, E. Moya-Albor, and J. Brieva, "Vision-based analysis on leaves of tomato crops for classifying nutrient deficiency using convolutional neural networks," in 2020 Int. Joint Conf. Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1–7.
[14] D. Blancard, Tomato Diseases: Identification, Biology and Control. Versailles Cedex: Elsevier, 2009.
[15] J. De Baerdemaeker, "Precision agriculture technology and robotics for good agricultural practices," IFAC Proc. Vol., vol. 44, pp. 1–4, 2016.
[16] E. Hemming, J. Henten, C. Bac, and Y. Edan, "Robotics in protected cultivation," IFAC Proc. Vol., vol. 46, pp. 170–177, 2016.
[17] P. Wan, A. Toudeshki, H. Tan, and R. Ehsani, "A methodology for fresh tomato maturity detection using computer vision," Comput. Electron. Agric., vol. 146, pp. 43–50, 2018.
[18] N. El-Bendary, E. E. Hariri, A. E. Hassanien, and A. Badr, "Using machine learning techniques for evaluating tomato ripeness," Expert Syst. Appl., vol. 42, pp. 1892–1905, 2014.
[19] M. P. Arakeri and Lakshmana, "Computer vision based fruit grading system for quality evaluation of tomato in agriculture industry," Procedia Comput. Sci., vol. 79, pp. 426–433, 2016.
[20] N. Goel and P. Sehgal, "Fuzzy classification of pre-harvest tomatoes for ripeness estimation – an approach based on automatic rule learning using decision tree," Appl. Soft Comput., vol. 36, pp. 45–56, 2015.
[21] H. Ponce and P. Ponce, "Artificial organic networks," in IEEE Electron. Robotics Automotive Mech. Conf. (CERMA), Cuernavaca, Mexico, 2011, pp. 29–34.
[22] H. Ponce, P. V. de Campos Souza, A. J. Guimarães, and G. González-Mora, "Stochastic parallel extreme artificial hydrocarbon networks: An implementation for fast and robust supervised machine learning in high-dimensional data," Eng. Appl. Artif. Intell., vol. 89, p. 103427, 2020.
[23] N. Noguchi and O. Barawid, "Robot farming system using multiple robot tractors in Japan agriculture," IFAC Proc. Vol., vol. 44, pp. 633–637, 2016.
[24] S. Cubero, F. Albert, J. M. Prats-Moltabán, D. G. Fernández-Pacheco, J. Blasco, and N. Aleixos, "Application for the estimation of the standard citrus colour index (CCI) using image processing in mobile devices," Biosyst. Eng., vol. 167, pp. 63–74, 2017.
[25] L. F. Santos, S. Barbon, N. Valous, and D. Fernandes, "Predicting the ripening of papaya fruit with digital imaging and random forests," Comput. Electron. Agric., vol. 145, pp. 76–82, 2018.
[26] K. P. Ferentinos, "Deep learning models for plant disease detection and diagnosis," Comput. Electron. Agric., vol. 145, pp. 311–318, 2018.
[27] D. Jiang, G. Hu, G. Qi, and N. Mazur, "A fully convolutional neural network-based regression approach for effective chemical composition analysis using near-infrared spectroscopy in cloud," J. Artif. Intell. Technol., vol. 1, pp. 74–82, 2021.
[28] D. Jiang, G. Qi, G. Hu, N. Mazur, Z. Zhu, and D. Wang, "A residual neural network-based method for the classification of tobacco cultivation regions using near-infrared spectroscopy sensors," Infrared Phys. Technol., vol. 111, p. 103494, 2020.
[29] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[30] J. Hu, Y. Kuang, B. Liao, L. Cao, S. Dong, and P. Li, "A multichannel 2D convolutional neural network model for task-evoked fMRI data classification," Comput. Intell. Neurosci., vol. 2019, no. 1, p. 5065214, 2019.
[31] H. Ponce, P. Ponce, and A. Molina, Artificial Organic Networks: Artificial Intelligence Based on Carbon Networks, vol. 521 of Studies in Computational Intelligence. Switzerland: Springer, 2014.
[32] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 2006.
[33] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian optimization of machine learning algorithms," in Adv. Neural Inf. Process. Syst., Lake Tahoe, NV, USA, 2012, pp. 2951–2959.
[34] I. Jolliffe, "Principal components as a small number of interpretable variables: some examples," in Principal Component Analysis. New York, NY, USA: Springer, 2002, pp. 63–77.
[35] Y. Liu, Y. Zhou, S. Wen, and C. Tang, "A strategy on selecting performance metrics for classifier evaluation," Int. J. Mob. Comput. Multimedia Commun., vol. 6, no. 4, pp. 20–35, 2014.
[36] M. De La Rosa and Y. Chen, "A machine learning platform for multirotor activity training and recognition," in 2019 IEEE 14th Int. Symp. Auton. Decentralized Syst. (ISADS), Utrecht, Netherlands, 2019, pp. 1–8.