Bee GP
All content following this page was uploaded by Thi-Thu-Hong Phan on 07 March 2023.
1 INTRODUCTION
Honey bees (Apis mellifera) make a significant contribution to sustainable agricultural
development and economic growth in many countries around the world. In agriculture,
honey bees are considered one of the most crucial insects because they are a key factor
in plant pollination: approximately 35% of the world’s cultivated
crops depend on some variety of bee [3]. From an economic point of view, honey bee products such as
honey, pollen, royal jelly and beeswax have high nutritional value, are good for health and have great
commercial value¹.
Because of this value, monitoring the health of honey bees is a vital
task for beekeepers in many countries. Monitoring may involve checking whether the queen bee is
in the hive, checking for pests or diseases such as bacterial or mite infections (e.g. Varroa mites),
or detecting swarming early. Traditionally, beekeepers open
the hive and look inside to examine its status. This method has many disadvantages,
∗ Both authors contributed equally to this research.
1 https://www.statista.com/statistics/933928/global-market-value-of-honey
Authors’ addresses: Hien Nguyen Thi, [email protected], LeQuyDon Technical University, 236 Hoang Quoc Viet str,
Hanoi; Thi-Thu-Hong Phan, [email protected], FPT University, Danang; Cao Truong Tran, [email protected],
LeQuyDon Technical University, Vietnam.
© 2022 Association for Computing Machinery.
for example, it can cause stress, tension, or panic in the bees. In addition, some hive
problems can only be discovered by experienced beekeepers. This is the impetus to find modern
technology-based approaches for monitoring beehives remotely and reducing the disadvantages of
traditional methods. In such approaches, an Internet of
Things (IoT) system is mounted on the hive to continuously collect information about the honey bees,
and these data are then processed and analyzed by machine learning (ML) methods to detect the
status, behavior, and activities of the honey bees.
Among the data collected for monitoring beehives, bee sound is one of the most important sources
of information about the health and behavior of the bees, such as exposure to airborne
toxicants, a missing queen, or swarming [4–6]. The first basic task of any sound-based hive monitoring
technology is to recognize bee sounds and distinguish them from non-bee sounds. Non-bee sounds
are often related to the ambient environment and to events occurring in the surroundings of the hive,
such as urban sounds, rain, or other animals such as crickets. If the system fails to distinguish
bee sounds from non-bee sounds, all subsequent data analysis for specific problems
will fail. Therefore, distinguishing bee sounds from non-bee sounds has received the attention of
many studies.
Over the past decades, ML methods have achieved remarkable results in many areas, including
the monitoring of beehives [5, 14, 16]. Genetic programming (GP) is an interesting ML approach
for solving a wide range of problems because of the flexibility and expressiveness of its computer
program representation, paired with the power of evolutionary search. The success of the GP
method in many different studies has led us to apply this approach to the challenging task of
classifying bee sound samples in this study.
The rest of the paper is organized as follows. The next section reviews related work in
the literature. The proposed method, GP, is presented in Section 3. The dataset
and the experimental settings are described in Section 4. Section 5 presents the experimental results
and compares GP with other approaches. Finally, the paper concludes with a
summary of the most important points and future work.
2 RELATED WORK
In recent years, machine learning methods have made great contributions to automated bee
monitoring systems. For example, Nolasco et al. applied ML methods to automatically
recognize different states of a beehive [14]. The authors investigated support vector machines
(SVM) and convolutional neural networks (CNN) for beehive state recognition using audio data
from beehives. The results pointed out the potential of ML methods for generalizing the system to
new hives. In [5], Cejrowski et al. used the SVM algorithm to detect swarming in a beehive based on
acquired bee sound signals. In [16], Zgank applied deep neural networks (DNN) and hidden Markov
models (HMM) to classify swarm activity from audio data collected by an IoT system. The results
showed that, for acoustic monitoring of bee activity, the DNN models outperformed the
HMM model. Nolasco and Benetos [13] utilized the SVM and CNN algorithms for beehive sound
identification.
Among ML methods, genetic programming (GP) is an algorithm proposed for solving a wide
range of classification problems. This method is based on Darwinian concepts: natural
selection and recombination are used to evolve a population of programs toward an appropriate
solution for a specific problem. GP has been employed with various representations, such
as classification rule sets and decision tree classifiers [8] and linear and graph classifiers [1]. It
has also been used to create a new type of classifier representation [10, 15, 17], numerical
expression (tree-like) classifiers, which have been successfully applied to real-world classification
problems such as identifying and distinguishing certain classes of objects in images. This shows the
promise of GP as a generic technique for dealing with classification problems. In addition to
basic GP, combining GP with other strategies to solve classification problems has also been broadly
studied. Bhowan et al. [2] suggested a multi-objective genetic programming (MOGP) technique
for developing accurate and diverse ensembles of genetic program classifiers that perform well
on both the minority and majority classes. In [12], the authors investigated the performance
of evolving diverse ensembles using GP for software defect prediction with imbalanced data,
employing colonization and migration operators as well as three ensemble selection techniques for
the multi-objective evolutionary algorithm.
Motivated by the success of the GP model in various domains, we propose to apply this approach
to identifying bee sound samples in the beehive.
3 PROPOSED METHOD
In this section, we present our approach using genetic programming (GP) for the task of bee audio
classification (BAC). We first provide a justification for employing an evolutionary strategy, and
then we describe the functioning of the GP algorithm in broad terms. Finally, we describe our
modeling approach to the BAC challenge.
GP [7] is a popular evolutionary technique for solving difficult optimization and learning problems
in which the search space for an optimal solution is extremely large. This method simulates the
evolution of a population of individuals in accordance with Darwin’s theory of survival of the
fittest. Each individual is usually represented as a tree consisting of terminals (leaf nodes) and
non-terminals (functions, for instance arithmetic or logic operators; see Fig. 1). The steps
GP follows to find solutions are given in Algorithm 1.
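To make the tree representation concrete, the following minimal Python sketch encodes an individual as nested tuples and evaluates it on a feature vector. The encoding, the function names in FUNCTIONS, and the "x0", "x1" naming of variable terminals are illustrative assumptions, not the implementation used in this work.

```python
import math
import operator

# Hypothetical function set for illustration: name -> (callable, arity).
FUNCTIONS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
    "sin": (math.sin, 1),
}


def evaluate(node, features):
    """Recursively evaluate a GP tree on one feature vector."""
    if isinstance(node, tuple):
        # Non-terminal: (function name, child, child, ...).
        func, arity = FUNCTIONS[node[0]]
        args = [evaluate(child, features) for child in node[1:]]
        assert len(args) == arity, "wrong number of children"
        return func(*args)
    if isinstance(node, str):
        # Variable terminal "xK" reads the K-th feature.
        return features[int(node[1:])]
    # Constant terminal.
    return float(node)


# The tree for the expression (x0 + 2.0) * x1.
tree = ("mul", ("add", "x0", 2.0), "x1")
result = evaluate(tree, [1.0, 3.0])  # (1.0 + 2.0) * 3.0
```

Leaves are either feature references or constants, and every inner node is a function applied to its children, which is the terminal/non-terminal split described above.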
\[
class(X_i) =
\begin{cases}
Class_1, & f(X_i) \leq \theta_1 \\
Class_2, & \theta_1 < f(X_i) \leq \theta_2 \\
Class_3, & f(X_i) > \theta_2
\end{cases}
\tag{1}
\]
where $n$ is the number of samples, $f$ is the evolved GP expression, $f(X_i)$ is its output value
for sample $X_i$, and $\theta_1$, $\theta_2$ are the dynamic class boundaries defined by Eq. 2:
\[
\theta_i = \min(D_{Train}) + \frac{\bigcap_{i<m} T_i(D_{Train})}{|T(D_{Train})|}
\tag{2}
\]
Finally, the fitness of a program is calculated by Eq. 3:
\[
f(T) = \sum_{i=1}^{n} M(T_i)
\tag{3}
\]
where $M$ is a Boolean function that is equal to 1 if sample $T_i$ is correctly classified and
equal to 0 otherwise.
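As an illustration of how a program output is mapped to a class and how the fitness count is accumulated, consider the short Python sketch below. It assumes that every output above the second boundary belongs to the third class; the boundary values, the toy program, and the sample data are made up for the example.

```python
def classify(output, theta1, theta2):
    """Map a program output to a class label using two boundaries."""
    if output <= theta1:
        return "Class1"
    if output <= theta2:
        return "Class2"
    return "Class3"  # assumption: everything above theta2 is the third class


def fitness(program, samples, labels, theta1, theta2):
    """Count the correctly classified samples (M is 1 if correct, 0 otherwise)."""
    return sum(
        1
        for x, y in zip(samples, labels)
        if classify(program(x), theta1, theta2) == y
    )


def toy_program(x):
    """Stand-in for an evolved GP expression (hypothetical)."""
    return x[0] - x[1]


samples = [[0.0, 1.0], [1.0, 0.5], [3.0, 0.0], [0.0, 0.0]]
labels = ["Class1", "Class2", "Class3", "Class3"]
score = fitness(toy_program, samples, labels, theta1=0.0, theta2=1.0)
```

Here the first three samples fall in the regions matching their labels, while the last one produces an output of 0.0, which lies in the first region and therefore does not count toward the fitness.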
3.4 Elitism
The purpose of elite selection is to keep the fittest chromosomes. When the population
is updated, elitism guarantees that high-quality chromosomes are not lost in
later generations. We use the elitism strategy to preserve the individuals with the highest fitness.
After the elitism stage is completed, we proceed to the following phase, which is the creation of
a new population.
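The elitism step can be sketched in a few lines of Python. The function name, the n_elite parameter, and the breed callback are hypothetical; the sketch only shows the idea of copying the fittest individuals unchanged before the rest of the new population is created.

```python
def next_generation_with_elitism(population, fitnesses, breed, n_elite=1):
    """Copy the n_elite fittest individuals, then fill up with new offspring."""
    ranked = sorted(
        range(len(population)), key=lambda i: fitnesses[i], reverse=True
    )
    elites = [population[i] for i in ranked[:n_elite]]
    # breed() stands in for selection followed by crossover or mutation.
    offspring = [breed() for _ in range(len(population) - n_elite)]
    return elites + offspring


population = ["a", "b", "c"]
fitnesses = [1, 5, 3]
new_population = next_generation_with_elitism(
    population, fitnesses, breed=lambda: "child"
)
```

The population size stays constant, and the best individual of the old population is guaranteed to be present in the new one.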
Selection. The tournament selection technique is used to choose parents for the next generation.
First, we form a group of individuals drawn at random from the current population.
The best candidate in the group, according to its fitness, is then chosen as the first parent to breed a
new individual, and the second parent is selected in the same manner.
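A minimal Python sketch of tournament selection is given below. The tournament size of 7 follows the experimental settings; the population, the fitness values, and the function name are placeholders for illustration.

```python
import random


def tournament_select(population, fitnesses, size, rng):
    """Draw `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


rng = random.Random(0)
population = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7"]
fitnesses = [3, 9, 1, 7, 5, 2, 8, 4]

# Each parent comes from its own independent tournament.
parent1 = tournament_select(population, fitnesses, size=7, rng=rng)
parent2 = tournament_select(population, fitnesses, size=7, rng=rng)
```

A larger tournament size increases selection pressure: with 7 contenders out of 8 individuals, the winner here can only be the best or the second-best individual.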
Crossover. The purpose of the crossover operator is to create new offspring from a pair of parents.
It builds new children from portions of each parent; hence, crossover introduces variety within the
population. To perform standard crossover, two parents are picked using the selection
procedure. Next, one sub-tree is picked at random in each parent. If the two selected sub-trees
meet the required conditions (depth of the resulting offspring, syntactic closure property, and so
on), the crossover operation swaps them. The resulting children are then included in the next
generation [10].
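The sub-tree swap can be sketched as follows in Python, using nested tuples of the form (function name, child, ...) as an assumed tree encoding. The helper names and the max_depth limit are illustrative, and only the depth condition is checked here; returning the parents unchanged when the limit is exceeded is one possible policy among several.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the sub-tree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def crossover(a, b, rng, max_depth=8):
    """Swap one randomly chosen sub-tree of each parent."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    child_a = replace(a, pa, get(b, pb))
    child_b = replace(b, pb, get(a, pa))
    if depth(child_a) > max_depth or depth(child_b) > max_depth:
        return a, b  # depth condition violated: keep the parents
    return child_a, child_b


parent_a = ("add", "x0", ("mul", "x1", 2.0))
parent_b = ("sub", ("sin", "x2"), 1.0)
child_a, child_b = crossover(parent_a, parent_b, random.Random(0))
```

Because the two sub-trees are exchanged rather than copied, the total number of nodes across the two children equals that of the two parents.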
Mutation. First, a mutation point is picked at random. The sub-tree rooted at the mutation
point is then removed and replaced by a randomly generated one. The mutation
operator is employed to maintain the population’s genetic variety. The crossover rate is kept high,
whereas the mutation rate is kept low. The low mutation rate keeps some randomness in the
population and prevents chromosomal repetition, while the high crossover rate keeps the search
from converging too quickly to a local optimum.
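Sub-tree mutation can be sketched in Python as shown below, again with nested tuples as an assumed tree encoding. The function and terminal sets, the 0.3 probability of stopping at a terminal, and the depth limit of the generated sub-tree are illustrative choices rather than settings taken from the experiments.

```python
import random

# Hypothetical primitive sets for illustration.
FUNCTIONS = {"add": 2, "sub": 2, "mul": 2, "sin": 1}  # name -> arity
TERMINALS = ["x0", "x1", 1.0]


def random_tree(rng, max_depth):
    """Grow a random tree that is at most max_depth levels deep."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    children = tuple(
        random_tree(rng, max_depth - 1) for _ in range(FUNCTIONS[name])
    )
    return (name,) + children


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def replace(tree, path, new):
    """Return a copy of `tree` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def mutate(tree, rng, max_subtree_depth=2):
    """Replace the sub-tree at a random mutation point with a new random one."""
    point = rng.choice(list(paths(tree)))
    return replace(tree, point, random_tree(rng, max_subtree_depth))


parent = ("add", "x0", ("mul", "x1", 1.0))
mutant = mutate(parent, random.Random(0))
```

The parent is left untouched because tuples are immutable; the mutant is a new tree that shares the unmodified branches with its parent.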
4 EXPERIMENTAL SETUP
4.1 Data set
In this study, we use the bee sound dataset introduced in [9]. The data were collected from April
2017 to September 2017 in two honey bee hives on two bee farms located 17 km from each other.
Every 15 minutes, 30 seconds of audio were acquired by a multi-sensor EBM in different beehives.
Each 30-second audio file in .wav format was then cut into 2-second segments with a 1-second
overlap. Each of these audio samples was then labeled as belonging to one of three classes: buzzing
bees (B), crickets chirping (C), and ambient noise (N) [9]. In this paper, we focus on the BUZZ2
dataset, which consists of 9914 audio samples. The training and testing datasets include 7582
samples taken from hive 1.1, and the 2332 samples taken from hive 2.1 are used for validation. The
validation dataset is completely separate in terms of hive and hive location, which makes bee sound
recognition challenging for machine learning methods. The details of the BUZZ2 dataset are
presented in Table 1.
the seed of the pseudo-random number generator is different. We also apply elitism in
our evolutionary process, that is, the best individual is copied into the following generation.
Parameter                Value
Function set             +, -, ×, / (protected division), sin, cos, sqrt, log
Variable terminals       All features
Constant terminals       Random float values
Population size          1024
Initialization           Ramped half-and-half
Generations              50
Crossover probability    60%
Mutation probability     30%
Reproduction rate        10%
Selection type           Tournament (size = 7)
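The Python sketch below shows one way these settings could be encoded and used to choose a genetic operator for each new individual. The dictionary keys and function names are hypothetical. Returning 1.0 for a zero denominator is a common convention for protected division and is an assumption here, since the exact protection rule is not specified above.

```python
import random


def protected_div(a, b):
    """Division that returns 1.0 when the denominator is zero (assumed convention)."""
    return a / b if b != 0 else 1.0


# Values taken from the parameter table; key names are illustrative.
PARAMS = {
    "population_size": 1024,
    "generations": 50,
    "p_crossover": 0.60,
    "p_mutation": 0.30,
    "p_reproduction": 0.10,
    "tournament_size": 7,
}


def pick_operator(rng, params=PARAMS):
    """Choose crossover, mutation, or reproduction with the configured rates."""
    r = rng.random()
    if r < params["p_crossover"]:
        return "crossover"
    if r < params["p_crossover"] + params["p_mutation"]:
        return "mutation"
    return "reproduction"


rng = random.Random(0)
draws = [pick_operator(rng) for _ in range(10000)]
crossover_share = draws.count("crossover") / len(draws)
```

The three rates sum to 100%, so exactly one operator is applied to produce each individual of the next generation.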
three models are even better than the best DL model (namely, RawConvNet 1, with an accuracy
of 95.7%) in [9]. When comparing the DL and ML models on the test set, we see that two models,
LR (with an accuracy of 89%) and SVM (82.3%), perform poorly: they have lower accuracy than
ConvNets 1, 2, and 3 (94.9%, 95.5% and 94.6%, respectively) and RawConvNet 1
(95.7%). In general, all methods produce high accuracy on the test set except the SVM (one vs
all) model, which has the worst result.
However, the picture is different on the more challenging BUZZ2 validation set.
The validation samples are completely separated from the training samples by beehive, location,
time, and bee race. In terms of accuracy, nearly all DL-based models generalize better than their
ML counterparts. All the DL models in [9] reach an accuracy of at least 83.2% on the validation
set. In contrast, ML models yield lower results than DL models, except that KNN gives better results
than three DL models (ConvNet 1, 2, 3). However, as Table 3 shows, GP once again proves its
capacity for identifying bee sound samples. It achieves better accuracy than
all compared ML methods and results comparable with the best DL model (RawConvNet 1) in [9].
Table 3. The accuracy of deep learning models and machine learning models on the Test and
Validation sets of the BUZZ2 dataset.
models can be obtained by analyzing the distribution from the given data to identify the best of
these models that depend on the context.
When comparing two models, the model with the lower error is generally considered better. When
two models have the same error, the model with lower complexity is typically selected in
the belief that it will predict future data well. The experimental results clearly show
that GP is a suitable method for the identification of bee sound samples. This method generalizes
better, takes less execution time, and has low algorithmic complexity.
REFERENCES
[1] Wolfgang Banzhaf, Frank D. Francone, Robert E. Keller, and Peter Nordin. 1998. Genetic Programming: An Introduction:
On the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann Publishers Inc., San Francisco,
CA, USA.
[2] Urvesh Bhowan, Mark Johnston, Mengjie Zhang, and Xin Yao. 2013. Evolving Diverse Ensembles Using Genetic
Programming for Classification With Unbalanced Data. IEEE Transactions on Evolutionary Computation 17, 3 (2013),
368–386. https://doi.org/10.1109/TEVC.2012.2199119
[3] T. D. Breeze, A. P. Bailey, K. G. Balcombe, and S. G. Potts. 2011. Pollination services in the UK: How important are
honeybees? Agriculture, Ecosystems & Environment 142, 3 (Aug. 2011), 137–143.
[4] Jerry J Bromenshenk, Colin B Henderson, Robert A Seccomb, Steven D Rice, and Robert T Etter. 2009. Honey bee
acoustic recording and analysis system for monitoring hive health. US Patent 7,549,907.
[5] Tymoteusz Cejrowski, Julian Szymański, Higinio Mora, and David Gil. 2018. Detection of the Bee Queen Presence
Using Sound Analysis. In Intelligent Information and Database Systems (Lecture Notes in Computer Science), Ngoc Thanh
Nguyen, Duong Hung Hoang, Tzung-Pei Hong, Hoang Pham, and Bogdan Trawiński (Eds.). Springer International
Publishing, Cham, 297–306. https://doi.org/10.1007/978-3-319-75420-8_28
[6] Sara Ferrari, Mitchell Silva, Marcella Guarino, and Daniel Berckmans. 2008. Monitoring of swarming sounds in bee
hives for early detection of the swarming period. Computers and electronics in agriculture 64, 1 (2008), 72–77.
[7] John R. Koza. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press,
Cambridge, MA, USA.
[8] John R. Koza. 1994. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA,
USA.
[9] Vladimir Kulyukin, Sarbajit Mukherjee, and Prakhar Amlathe. 2018. Toward Audio Beehive Monitoring: Deep Learning
vs. Standard Machine Learning in Classifying Beehive Audio Samples. Applied Sciences 8, 9 (Sept. 2018), 1573.
[10] T. Loveard and V. Ciesielski. 2001. Representing classification problems in genetic programming. In Proceedings of the
2001 Congress on Evolutionary Computation (IEEE Cat. No.01TH8546), Vol. 2. 1070–1077.
https://doi.org/10.1109/CEC.2001.934310
[11] Sean Luke, Liviu Panait, Gabriel Balan, Sean Paus, Zbigniew Skolicki, Jeff Bassett, Robert Hubley, and A Chircop. 2006.
ECJ: A Java-based evolutionary computation research system. Downloadable versions and documentation can be found
at the following URL: http://cs.gmu.edu/eclab/projects/ecj
[12] Goran Mauša and Tihana Galinac Grbac. 2017. Co-evolutionary multi-population genetic programming for classification
in software defect prediction: An empirical case study. Applied Soft Computing 55 (June 2017), 331–351.
https://doi.org/10.1016/j.asoc.2017.01.050
[13] Inês Nolasco and Emmanouil Benetos. 2018. To bee or not to bee: Investigating machine learning approaches for
beehive sound recognition. arXiv preprint arXiv:1811.06016.
[14] Inês Nolasco, Alessandro Terenzi, Stefania Cecchi, Simone Orcioni, Helen L Bear, and Emmanouil Benetos. 2019.
Audio-based identification of beehive states. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP). IEEE, 8256–8260.
[15] A. Song, V. Ciesielski, and H.E. Williams. 2002. Texture classifiers generated by genetic programming. In Proceedings
of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No.02TH8600), Vol. 1. 243–248.
https://doi.org/10.1109/CEC.2002.1006241
[16] Andrej Zgank. 2019. Bee Swarm Activity Acoustic Classification for an IoT-Based Farm Service. Sensors 20, 1 (Dec.
2019), 21.
[17] Mengjie Zhang and Victor Ciesielski. 1999. Genetic Programming for Multiple Class Object Detection. In Advanced
Topics in Artificial Intelligence, Norman Foo (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 180–192.
[18] Mengjie Zhang, Victor B. Ciesielski, and Peter Andreae. 2003. A Domain-Independent Window Approach to Multiclass
Object Detection Using Genetic Programming. EURASIP Journal on Advances in Signal Processing 2003, 8 (Dec. 2003),
1–19. https://doi.org/10.1155/S1110865703303063