Article
Particulate Matter Forecasting Using Different Deep Neural
Network Topologies and Wavelets for Feature Augmentation
Stephanie Lima Jorge Galvão 1, Júnia Cristina Ortiz Matos 1, Yasmin Kaore Lago Kitagawa 1,2, Flávio Santos Conterato 1, Davidson Martins Moreira 1, Prashant Kumar 2 and Erick Giovani Sperandio Nascimento 1,2,3,*
Citation: Galvão, S.L.J.; Matos, J.C.O.; Kitagawa, Y.K.L.; Conterato, F.S.; Moreira, D.M.; Kumar, P.; Nascimento, E.G.S. Particulate Matter Forecasting Using Different Deep Neural Network Topologies and Wavelets for Feature Augmentation. Atmosphere 2022, 13, 1451. https://doi.org/10.3390/atmos13091451

Academic Editors: Yuanqing Zhu, Long Liu and Stephan Havemann

Received: 29 July 2022; Accepted: 1 September 2022; Published: 8 September 2022

Abstract: The concern about air pollution in urban areas has substantially increased worldwide. One of its main components, particulate matter (PM) with aerodynamic diameter of ≤2.5 µm (PM2.5), can be inhaled and deposited in deeper regions of the respiratory system, causing adverse effects on human health, which are even more harmful to children. In this sense, the use of deterministic and stochastic models has become a key tool for predicting atmospheric behavior and, thus, providing information for decision makers to adopt preventive actions to mitigate air pollution impacts. However, stochastic models present their own strengths and weaknesses. To overcome some of the disadvantages of deterministic models, there has been an increasing interest in the use of deep learning, due to its simpler implementation and its success on multiple tasks, including time series and air quality forecasting. Thus, the objective of the present study is to develop and evaluate the use of four different topologies of deep artificial neural networks (DNNs), analyzing the impact of feature augmentation in the prediction of PM2.5 concentrations by using five levels of discrete wavelet transform (DWT). The following types of deep neural networks were trained and tested on data collected from two living lab stations next to high-traffic roads in Guildford, UK: multi-layer perceptron (MLP), long short-term memory (LSTM), one-dimensional convolutional neural network (1D-CNN) and a hybrid neural network composed of LSTM and 1D-CNN. The performance of each model in making predictions up to twenty-four hours ahead was quantitatively assessed through statistical metrics. The results show that wavelets improved the forecasting results and that the discrete wavelet transform is a relevant tool to enhance the performance of DNN topologies, with special emphasis on the hybrid topology, which achieved the best results among the applied models.

Keywords: particulate matter; air pollution; artificial neural networks; deep learning; forecasting; wavelets
1. Introduction
The increase in air pollution in urban areas is a concern on a global scale. Such pollution occurs especially due to anthropogenic activities, such as industrialization, the growth of urbanization, automotive vehicles powered by fossil fuels and agricultural burning [1]. According to the United Nations, more than half of the world's population (around 55%) lives in urban regions, and this number is increasing; in some European countries, such as the United Kingdom, more than 83% of the population lives in urban environments, a figure that continues to increase over time. Consequently, humans have been constantly exposed to a variety of harmful components from many sources, mainly those from road vehicles, which are the dominant source of ambient air pollutants, such as particulate
matter (PM), nitrogen oxide (NOx), carbon monoxide (CO) and volatile organic compounds
(VOCs) [2].
Among these pollutants, PM can be highlighted as one of the most critical, as it can cause
numerous adverse effects on human health, such as asthma attacks, chronic bronchitis,
diabetes, cardiovascular disease and lung cancer [3], and it is strongly associated with
respiratory diseases in children [2].
PM is an atmospheric pollutant composed of a mixture of solid and liquid particles sus-
pended in the air [2]. These kinds of particles can be directly emitted through anthropogenic
or non-anthropogenic activities, and they are classified according to their aerodynamic
diameter and their impacts on human health. PM2.5 includes fine particles with a diameter of up to 2.5 µm, which can enter the cardiorespiratory system. The World Health Organization (WHO) estimates that long-term exposure to PM2.5 increases the risk of cardiopulmonary mortality by 6% to 13% per 10 µg/m³ of PM2.5 [4]. Furthermore, results from the European project Aphekom indicate that life expectancy in the most polluted cities could be increased by approximately 20 months if long-term exposure to PM2.5 were reduced to the annual limits established by the WHO [2].
For these reasons, countries have been encouraged to adopt even more stringent
standards and actions to help control and reduce temporal PM concentrations in urban
environments [4]. Hence, the construction of models that predict the concentration of
this component up to 24 h ahead in densely populated areas with lower computational
complexity and cost arises as a key and strategic tool to assist the monitoring process,
support control and preventive actions to improve air quality and, consequently, reduce
impacts on the health of the population.
Thus, the objective of this work is to build and evaluate the performance of four deep
artificial neural network (DNN) models to predict hourly concentrations of PM2.5 up to
24 h ahead of time, as well as the impact on model performance of applying five-level
discrete wavelet transform (DWT) on the data as a feature augmentation method. The
DNN types applied were multilayer perceptron (MLP), long short-term memory (LSTM),
one-dimensional convolutional neural network (1D-CNN) and a hybrid model (LSTM
with 1D-CNN). To train and test the DNN models, data from densely populated areas in
Surrey County, UK, characterized by high vehicle traffic were used and augmented by the
addition of new features based on the reconstructed detail and approximation signals of
wavelet transform from levels 1 to 5. In order to assess the performance of the deep neural
networks in the prediction task, all results were compared to a linear regression model
as a baseline. Then, they were statistically evaluated according to the following metrics:
mean squared error (MSE), mean absolute error (MAE), Pearson’s r and normalized mean
squared error (NMSE).
This paper is organized into five sections. In Section 1, we introduce the background
and research gaps in the topic areas. In Section 2, we explore the related works in the area
of air pollutant forecasting. In Section 3, we present the case study, data, basic concepts of
DNN, DWT and additional methods used in this work. In Section 4, we present and discuss
the results. Finally, in Section 5, we highlight the main points and present our conclusions,
indicating aspects to be explored in future investigations.
2. Related Works
In recent years, several methods have been applied to the task of forecasting air
pollution components, mainly using statistical, econometric and deep learning models.
Zhang et al. [5] and Badicu et al. [6] assessed the Autoregressive Integrated Moving Average
Model (ARIMA), a powerful statistical model, to predict PM concentrations. The former
used monthly PM2.5 data from the city of Fuzhou, China, during the period from August 2014 to July 2016 to train the model and predicted the period of July 2016 to July 2017. The
training results presented a mean absolute error (MAE) of 11.4%, with the highest error
values in cold seasons, when the real values from PM2.5 were higher than those predicted by
the model. The latter worked with data from Bucharest, Romania, considering the period
of March to May 2019 with a frequency of 15 min to predict PM10 and PM2.5 concentrations.
The results showed that in 89% of cases, the predicted values were under an acceptable
limit of uncertainty. However, this kind of approach has some limitations in long-term
forecasting, as it uses only past data and it has difficulty reaching high peaks, such as in [5],
where it was not able to reach the real peaks of PM2.5 .
Considering these limitations, artificial intelligence (AI) methodologies have been used
to improve forecasting performance due to their ability to learn from complex nonlinear
patterns, their robustness and self-adaptation and their ability to, once correctly trained,
perform predictions with limited computational resources and cost when compared to
other approaches, such as numerical modeling. Reis Jr. et al. [7] analyzed the use of
recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to predict
short-term (24 h) ozone concentration. They compared the performance of CNN, recurrent
neural network long short-term memory (LSTM) and gated recurrent unit (GRU) structures
with a simple multi-layer perceptron (MLP) model. The data were collected between
2001 and 2005 in the region of Vitória in southeastern Brazil. The results showed that the
LSTM topology presented an average performance similar to that of MLP but with slightly
worse results. However, when considering individual time steps, the LSTM presented the
most suitable results for the 9th hour, demonstrating the potential of LSTM for learning
long-term behaviors. Ozone forecasting up to 24 h in advance was also evaluated by
Alves et al. [8] using the same data but comparing only the MLP model with baseline
models: the persistence model and the lasso regression technique. The MLP model proved
to be the most effective according to statistical analyses, outperforming the others in almost
all forecasting steps, except for the 1st hour.
Regarding PM forecasting, the use of MLP topology to forecast PM particles was
investigated by Ahani et al. [9], who compared its performance with that of the ARIMAX
model (ARIMA with exogenous variables) to predict PM2.5 up to 10 h ahead using different
feature selection methods. The applied data were from Tehran City, the capital of Iran, and
represented a period from 2011 to 2015. The ARIMAX model presented a smaller RMSE in
almost all time steps considered, except for the second and the last time steps, for which
the MLP presented similar results. This shows that, despite its higher capacity, the single
application of artificial neural network (ANN) structures to some data may not outperform
simpler methodologies. Thus, it is possible to assess complementary methodologies to
make them even more robust. Yang et al. [10] used four different DNN topologies to
predict PM2.5 and PM10 , including two hybrid models. The DNNs used were GRU, LSTM,
CNN-GRU and CNN-LSTM. Data from 2015 to 2018 were used to make predictions 15 days
in advance. The results demonstrated that 15-day predictions remained reliable; however,
the most accurate forecasts were up to 7 days in advance. The hybrid models outperformed
the single models for all stations, and the CNN-LSTM model produced the fewest errors.
Despite the research that has been conducted using ANNs to predict air pollution
components, forecasting accuracy depends on the quality of data provided to the model.
This means that the results can still be improved by different representations of data,
which can reveal hidden patterns, as well as the application of feature augmentation tech-
niques. Therefore, various studies involving preprocessing methods for time series, such
as wavelets, have demonstrated the benefits of their application in improving the perfor-
mance of ANNs in the task of forecasting PM concentrations. For instance, Wang et al. [11]
presented the advantages of using hybrid models combining machine learning techniques
and wavelet transforms to predict the PM2.5 signal. The prediction was performed 1 h ahead by
decomposition of PM2.5 data in low- and high-frequency components that capture the trend
and noise from the original signal. The temporal resolution of data was the hourly average
concentration in the period from 2016 to 2017. The machine learning methods used were a
backpropagation neural network (BPNN) and a support vector machine (SVM). The results
indicate that hybrid models are more accurate and stable when using wavelets, highlighting
their importance in detecting time and frequency behaviors. Bai et al. [12] also used a
BPNN model based on wavelet decomposition to forecast air pollutant (PM10, SO2 and NO2) concentrations.
Figure 1. Location of the monitoring stations, represented as numbers: “1” represents Stoke Park LLS, and “2” represents Sutherland Memorial Park LLS. (Source: https://livinglabs.iscapeproject.eu/, accessed on 31 August 2022).
Table 1. Description of the available measurements in the dataset from both stations.

Variable | Description
Time | Time of the sample, with one-minute frequency
TEMP | Air temperature collected at the station
HUM | Air humidity collected at the station
PRESS | Air pressure collected at the station
PM2.5 | Concentration of particulate matter with a size ≤2.5 µm
CO | Concentration of carbon monoxide
NO2 | Concentration of nitrogen dioxide
O3 | Concentration of ozone

3.2. Artificial Neural Networks

ANNs are composed of a basic structure called neurons. These structures are combined linearly with associated weights, which are assigned with random values at the start of the training, then passed into an activation function that inserts non-linearities capable of modelling complex relationships. Through the relation of the basic components and the activation functions, ANNs can assume different topologies.
The ANNs explored in this paper were MLP, LSTM, CNN and a hybrid model with the aim of improving the results of LSTM and CNN. A brief explanation of each model is presented in this section.
3.2.1. Multi-Layer Perceptron (MLP)

The multi-layer perceptron neural network (MLP) is the simplest artificial neural network topology possible. It is basically a combination of multiple perceptrons, which are the basic neuron units. The functioning of each neuron, or perceptron, can be mathematically expressed by Equation (1).

y_xw = f(∑_{i=1}^{m} w_i x_i + b) (1)

where y_xw is the output of the perceptron, f is the activation function, x_i is an attribute or feature from input data vector x of size m, w_i represents each weight from weight vector w and b is the bias. In summary, the objective is to determine whether the output of the function (f) triggers (i.e., returns a value other than zero) after summing up the product of the input features and the weights, which are the parameters that are automatically learned through a supervised learning algorithm.
An MLP is generally composed of three or more fully connected layers. Figure 2 presents a schematic diagram of a typical MLP architecture. At least three layers are required: an input layer, a hidden layer and an output layer.
MLPs are suitable for several applications, with their main parameters represented by the number of layers, the activation functions and the number of neurons in each layer [8], with a flexible topology. The definition of the number of layers and neurons is variable, and the optimal composition is problem-specific. The number of outputs depends on the specific application requirements, permitting multi-step and multivariate forecasting. The most adequate configuration of these attributes is chosen mostly empirically for each application. All the connections between MLP layers are of the forward kind, which means that backward signal propagation is only possible through a backpropagation algorithm [8]. Although MLPs were not specifically designed to deal with time series forecasting, due to their simplicity and ability to solve complex problems, they have been employed in many studies to predict air pollution components, such as in [5–9,11,22].
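To make the architecture concrete, the following is a minimal sketch in Python with TensorFlow/Keras (an assumed dependency, since the paper does not state its software stack) of an MLP that maps one 86-feature sample to the next 24 hourly PM2.5 values; the hidden-layer sizes are illustrative and not the authors' exact configuration (their settings are given in Tables 2–6).

```python
# Minimal sketch (not the authors' exact configuration): an MLP that maps the
# 86 preprocessed features of one hourly sample (see Section 3.4) to the next
# 24 hourly PM2.5 concentrations.
from tensorflow import keras
from tensorflow.keras import layers

n_features = 86   # original + wavelet + time features (Section 3.4)
horizon = 24      # one output neuron per forecasting hour

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(128, activation="relu"),   # hidden sizes are illustrative
    layers.Dense(64, activation="relu"),
    layers.Dense(horizon),                  # linear output for regression
])
model.compile(optimizer="adam", loss="mse")

# x_train: (n_samples, 86) scaled to [0, 1]; y_train: (n_samples, 24)
# model.fit(x_train, y_train, validation_split=0.3, epochs=100)
```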
3.2.2. Long Short-Term Memory (LSTM)

Figure 3. LSTM cell structure. Arrows, squares and circles represent data flow, pointwise operations and activation functions, respectively.
In an LSTM, the cell state acts as an internal selective memory of the past, represented in Figure 3 by the horizontal line starting at c_{t−1} and ending at c_t. The output of an LSTM cell is represented by h, i.e., the hidden state. The following equations depict the mathematical procedure of an LSTM cell:

f_t = σ(W_f [h_{t−1}, x_t] + b_f) (2)

i_t = σ(W_i [h_{t−1}, x_t] + b_i) (3)

Cs_t = S(W_C [h_{t−1}, x_t] + b_C) (4)

C_t = f_t C_{t−1} + i_t Cs_t (5)

o_t = σ(W_o [h_{t−1}, x_t] + b_o) (6)

where f_t is the forget gate; i_t is the input gate; Cs_t and C_t are the candidates for the cell state and the cell state at timestep t, respectively; o_t is the output gate at t; σ is the sigmoid function; S is the hyperbolic tangent function; W_x is the weight matrix of x neurons; h_t is the cell output at t; x_t is the input at t; and b_x is the bias matrix corresponding to x.
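The gate equations translate directly into code. The sketch below is a literal NumPy rendering of Equations (2)–(6), with the standard hidden-state update h_t = o_t · tanh(C_t) added to complete the cell; that last step is shown in Figure 3 but is not numbered in the text.

```python
# Sketch of one LSTM cell step, transcribing Equations (2)-(6) in NumPy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """W and b hold one weight matrix/bias vector per gate: 'f', 'i', 'C', 'o'."""
    z = np.concatenate([h_prev, x_t])           # [h_(t-1), x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Equation (2)
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate, Equation (3)
    Cs_t = np.tanh(W["C"] @ z + b["C"])         # candidate state, Equation (4)
    C_t = f_t * C_prev + i_t * Cs_t             # cell state, Equation (5)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Equation (6)
    h_t = o_t * np.tanh(C_t)                    # hidden state (Figure 3)
    return h_t, C_t
```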
3.2.3. Convolutional Neural Networks (CNN)
A CNN is a type of neural network that learns patterns from data through the application of convolutions aimed at learning filters that extract the main features from the data to perform a specific task (see Figure 4). Thus, CNNs are able to learn spatial and temporal relations from data [7]. Consequently, CNNs are able to resize and automatically detect new elements and patterns from data. In addition, pooling layers reduce the size of the input sequence, followed by the application of flattening layers, which adjust the shape of the data to enter a final regular MLP that concludes the specified task. CNNs are widely applied in image processing [24], and their benefits can be explored and assessed for time series predictions, for which lookback is also required as an input to the CNN.
Figure 4. Typical CNN architecture.
The following equations mathematically describe the convolution layer:

G[m, n] = (f ∗ k)[m, n] = ∑_j ∑_i k[j, i] f[m − j, n − i] (7)

C^l = a(V^l) (8)

V^l = K^l · C^{l−1} + b^l (9)
where G is the feature map; f is the input; k, m and n represent the kernel, rows and columns of the result matrix, respectively; the indices j and i are related to the kernel; l is the layer index; V is the intermediate value; K is the tensor that has filters or kernels; C is the result of the convolution; b is the bias; and a is the corresponding activation function.
In addition, a pooling layer can be employed to reduce the dimensionality of the output of the convolution step, e.g., by extracting the maximum value (MaxPooling) or the average value (AvgPooling) from the learned and extracted kernels/filters within a fixed-size window, thus decreasing the required processing power for network training.
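As a sketch of how these pieces fit together, the following Keras model (assumed stack; filter counts are illustrative) chains a Conv1D layer, a MaxPooling1D reduction, a Flatten layer and a final dense stage for a 24-step forecast over a lookback window of three samples, as used in Section 3.4.

```python
# Sketch of a 1D-CNN regressor for multi-step forecasting (Section 3.2.3):
# Conv1D learns filters over the lookback window, MaxPooling1D reduces the
# sequence, Flatten + Dense conclude the task.
from tensorflow import keras
from tensorflow.keras import layers

lookback, n_features, horizon = 3, 86, 24

model = keras.Sequential([
    layers.Input(shape=(lookback, n_features)),
    layers.Conv1D(filters=64, kernel_size=2, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(horizon),
])
model.compile(optimizer="adam", loss="mse")
```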
3.2.4. Hybrid 1DCNN-LSTM

This method exploits the advantages of CNNs, extracting the most important multidimensional attributes from data, resizing them and sending them as input to the LSTM layers, which can extract more attributes related to temporal relationships. The combination of a CNN and LSTM is expected to deliver more reliable predictions. A representation of such an architecture is shown in Figure 5, with some internal layers that allow for connections between the parts. Thus, this architecture will be evaluated along with the other models.
Figure 5. Representation of a CNN-LSTM model considering the inputs, outputs and internal layers.
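A minimal sketch of this hybrid idea in Keras (assumed stack; sizes are illustrative, not the authors' configuration) passes the convolutional feature sequence directly to an LSTM layer:

```python
# Sketch of the hybrid 1DCNN-LSTM idea (Figure 5): convolutional layers
# extract multidimensional attributes, which are then passed to an LSTM
# that models temporal relationships.
from tensorflow import keras
from tensorflow.keras import layers

lookback, n_features, horizon = 3, 86, 24

model = keras.Sequential([
    layers.Input(shape=(lookback, n_features)),
    layers.Conv1D(64, kernel_size=2, activation="relu", padding="same"),
    layers.LSTM(64),            # consumes the CNN feature sequence
    layers.Dense(horizon),      # 24 outputs, one per forecast hour
])
model.compile(optimizer="adam", loss="mse")
```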
3.3. Wavelet Decomposition for Feature Extraction
WT of a time-domain function is a tool that emerged as an improved version of the Fourier transform. The Fourier transform consists of taking a time-domain signal and breaking it into a weighted sum of sine and cosine waves to represent it in the frequency domain [25]. However, scientists needed a more appropriate function to represent choppy signals [26], and beyond that, it is necessary to overcome the problem of the window size not changing with frequency [11]. Wavelet analysis can work with different signal temporal resolutions and different basis functions, providing a detailed frequency assessment of all discontinuities and signal patterns, processing data at different scales.
Despite the algebra involved in the process, the discrete wavelet transform (DWT) of a signal is calculated by multiple applications of high-pass and low-pass filters, as shown in Figure 6. The outputs from the former are detail coefficients, and those from the latter are approximation coefficients. The number of times that the filters are used is determined by the level of decomposition required. The combination of the two outputs contains the same frequency content as the input signal, but the amount of data is doubled. Therefore, a downsampling procedure by a factor of two is applied to the filter outputs, as shown in Figure 6.
For each feature, there is a specific wavelet family that most satisfactorily represents the original signal in terms of separating more and less significant frequencies. To automate the process of selecting the most suitable wavelet family, Zucatelli et al. [22] proposed a method based on the use of RMSE between the original signal and the reconstructed approximation signal to obtain the most appropriate family for a specific feature. In the present study, this process was applied to all features considered relevant to the analysis.
The importance of WT in machine learning applications lies in the fact that it permits the generation of new features using the approximation and detail coefficients from a pre-determined level of decomposition. The most interesting characteristic of WT is that its individual functions are localized in time and frequency [27], allowing the data to be reconstructed in the same length as the original data, which is relevant to improving ANN model training.
Figure 6. Illustration of the wavelet decomposition process. LPF, low-pass filter; HPF, high-pass filter. The outputs are downsampled by a factor of two. CA, approximation coefficient; CD, detail coefficient; numbers identify the decomposition level.
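In code, this level-by-level reconstruction can be sketched with the PyWavelets library (an assumed dependency; the "db4" family below is a placeholder, since the paper selects the family per feature by the RMSE criterion of [22]):

```python
# Sketch of the feature-augmentation idea: for levels 1..5, decompose the
# signal and reconstruct the approximation and detail parts back to the
# original length, yielding two new features per level.
import numpy as np
import pywt

def wavelet_features(signal, family="db4", max_level=5):
    """Return {level: (approximation, detail)} reconstructions of `signal`."""
    out = {}
    for level in range(1, max_level + 1):
        coeffs = pywt.wavedec(signal, family, level=level)
        # Zero the detail coefficients to rebuild the approximation signal...
        approx = pywt.waverec([coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]],
                              family)[: len(signal)]
        # ...and zero the approximation to rebuild the detail signal.
        detail = pywt.waverec([np.zeros_like(coeffs[0])] + list(coeffs[1:]),
                              family)[: len(signal)]
        out[level] = (approx, detail)
    return out
```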
3.4. Model Setup

Before applying the data to the ANN models, some preprocessing was performed. The available data from the two stations were concatenated to provide more data for the training step. Latitude, longitude and altitude were added to distinguish the regions, and the data were resampled by the average of each hour. Then, five levels of wavelet transform were applied, using the family selection criteria described in [22]. For each feature, five reconstructed detail and approximation signals were obtained.
Previous studies, such as [8], showed the importance of transforming time variables into periodical information by employing trigonometric functions to enable the representation of time cycles, which can lead to improved forecasting performance of DNN models. Thus, the time variable was converted into periodic sine and cosine features with the aim of improving the ability of the DNN to learn periodic and temporal relationships [8], depicting them in six new features corresponding to the sine and cosine of hours, days and months according to the following equations:

sin_{ta} = sin(2π t_a / f) (10)

cos_{ta} = cos(2π t_a / f) (11)

where t_a is the value of the time attribute being calculated, i.e., hour of the day, day of the month or month of the year; and f is the number of possible values of that time attribute in the corresponding time scale, i.e., for hour, the number of hours in a day (24); for day, the number of days in that month; and for month, the number of months in a year (12).
As a result, the final dataset was composed of 86 features, 8 of which were the original features and the remainder of which were the preprocessed and augmented features, as previously described. Finally, all variables were scaled to the same range between zero and one so that they all had the same degree of importance.
Tables 2–5 present the configurations of each implemented DNN topology. The number of neurons at the input for each DNN is related to the amount of features required as input, i.e., 86, considering all the features generated by the wavelet transforms, as previously explained. In the case of the LSTM, 1D-CNN and the hybrid 1DCNN-LSTM models, a
lookback of three samples was set up for training. The output layer of each DNN was set
to 24, one for each forecasting hour ahead, totaling 24 h. Therefore, to make predictions
once the model was trained, the raw features were collected, as expressed in Table 1, the
time series were resampled for hourly frequency, the time attributes were preprocessed
as detailed in Equations (10) and (11), the corresponding wavelet transforms and levels were generated, the features were scaled between zero and one and the lookback for each sample was processed (if working with LSTM, 1D-CNN or 1DCNN-LSTM models). As a
result, the models output the next 24 h of PM2.5 concentrations, given the input.
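The preprocessing chain described above can be sketched as follows (Python with pandas and scikit-learn as assumed dependencies; column names are hypothetical):

```python
# Sketch of the Section 3.4 preprocessing: cyclic time encoding from
# Equations (10)-(11), [0, 1] scaling and lookback windowing.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def add_time_features(df):
    """df has a DatetimeIndex with hourly frequency."""
    for name, value, period in [
        ("hour", df.index.hour, 24),
        ("day", df.index.day, df.index.days_in_month),
        ("month", df.index.month, 12),
    ]:
        df[f"sin_{name}"] = np.sin(2 * np.pi * value / period)
        df[f"cos_{name}"] = np.cos(2 * np.pi * value / period)
    return df

def make_windows(features, target, lookback=3, horizon=24):
    """Pair each lookback window with the next `horizon` target values."""
    X, y = [], []
    for t in range(lookback, len(features) - horizon + 1):
        X.append(features[t - lookback:t])
        y.append(target[t:t + horizon])
    return np.array(X), np.array(y)

# features = MinMaxScaler().fit_transform(df.values)   # scale to [0, 1]
# X, y = make_windows(features, df["PM2.5"].values)    # "PM2.5" is hypothetical
```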
The training, validation and test datasets were separated prior to the building, vali-
dation and assessment of the models. The training dataset consisted of the concatenation
of the Stoke Park data from February to June, in addition to August and September, and
the Sutherland Memorial Park dataset corresponding to the months of June, in addition to
August to October, both in 2019. From the training dataset, 30% was randomly separated
for validation. The month of July 2019 was separated as the test dataset, corresponding to
about 15.38% of the total dataset, and was never seen by the models during the training
and validation of data from both stations. This was done to assess the final performance of
the models in predicting PM2.5 concentrations in order to standardize the tests for the same
period for which data were available for both regions.
Table 6 presents the hyperparameters used to train each DNN. No specific hyperpa-
rameter search technique was implemented, as the primary target was to evaluate different
DNN topologies for the task of forecasting PM2.5 for the next 24 h using WT for feature
augmentation. The parameters were set to be practically the same in order to guarantee comparability between each topology, except for MLP, which required 100 more epochs than the other models to be successfully trained.
MSE = (1/n) ∑_{i=1}^{n} (O_i − F_i)² (12)

MAE = (1/n) ∑_{i=1}^{n} |O_i − F_i| (13)

NMSE = MSE / Var(O) (14)

r = [∑_{i=1}^{n} (O_i − Ō)(F_i − F̄)] / [√(∑_{i=1}^{n} (O_i − Ō)²) √(∑_{i=1}^{n} (F_i − F̄)²)] (15)

R² = 1 − [∑_{i=1}^{n} (O_i − F_i)²] / [∑_{i=1}^{n} (O_i − Ō)²] (16)
where n is the number of samples; Oi is the i-th observed sample; Fi is the corresponding
predicted value; O and F are the average of all observed and predicted values, respectively;
and Var(O) denotes the variance of the O set of observed samples.
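For reference, Equations (12)–(16) translate directly into NumPy, assuming arrays of observed and forecast values:

```python
# Sketch of the evaluation metrics in Equations (12)-(16).
import numpy as np

def mse(o, f):                                       # Equation (12)
    return np.mean((o - f) ** 2)

def mae(o, f):                                       # Equation (13)
    return np.mean(np.abs(o - f))

def nmse(o, f):                                      # Equation (14)
    return mse(o, f) / np.var(o)

def pearson_r(o, f):                                 # Equation (15)
    do, df_ = o - o.mean(), f - f.mean()
    return np.sum(do * df_) / np.sqrt(np.sum(do ** 2) * np.sum(df_ ** 2))

def r2(o, f):                                        # Equation (16)
    return 1 - np.sum((o - f) ** 2) / np.sum((o - o.mean()) ** 2)
```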
In addition, once the models were trained, the prediction intervals for each model and
each forecasting horizon were estimated by applying quantile regression to the errors of the
predictions made in the validation dataset—which, in this case, was used as the calibration
set. To this end, a quantile of q = 0.95 was employed, meaning that the prediction intervals
contained a range of values that should include the actual future value with a probability
of 95% [28]. The prediction intervals were calculated for each forecast horizon in the test
dataset and averaged to generate the final prediction intervals for each model.
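A simple way to realize this calibration step is sketched below; it approximates the described quantile-based procedure with empirical quantiles of the validation errors per forecasting horizon (an assumption, as the exact estimator is not spelled out in the text):

```python
# Sketch: derive a prediction interval for one forecast horizon from the
# errors on the calibration (validation) set, with q = 0.95.
import numpy as np

def prediction_interval(errors, q=0.95):
    """errors: 1-D array of (observed - predicted) for one horizon."""
    lo = np.quantile(errors, (1 - q) / 2)       # e.g., 2.5th percentile
    hi = np.quantile(errors, 1 - (1 - q) / 2)   # e.g., 97.5th percentile
    return lo, hi

# For a new prediction y_hat at that horizon:
# interval = (y_hat + lo, y_hat + hi)
```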
After assessing all the prediction results in the test dataset, the model selected as
presenting the best metrics was evaluated to determine whether its predictions differed
in distribution relative to those of the other models, i.e., whether they were statistically
equivalent or not. To this end, the Wilcoxon signed-rank test [29] was employed, as it is
a nonparametric statistical technique for comparing two paired or related samples and
determining whether their distributions are equal or not. For a given statistical significance
(α), if the null hypothesis (H0) can be rejected, i.e., if p ≤ α, where p is the calculated p value
according to the test; then, the samples are drawn from different distributions. On the
contrary, if p > α, H0 cannot be rejected, meaning that the samples are drawn from the same
distribution. For this work, α = 0.05 was used.
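In practice, the test can be applied with scipy.stats.wilcoxon (an assumed dependency), comparing two paired prediction series:

```python
# Sketch of the Wilcoxon signed-rank test on two paired prediction series,
# with alpha = 0.05 as used in this work.
from scipy.stats import wilcoxon

def same_distribution(pred_a, pred_b, alpha=0.05):
    """Return True if H0 (equal distributions) cannot be rejected."""
    _, p_value = wilcoxon(pred_a, pred_b)
    return p_value > alpha
```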
4. Results and Discussion

Table 7. Average results without wavelets considering both stations in the train dataset.
Table 8. Average results with five levels of wavelet transform considering both stations in the
train dataset.
Table 9. Average results without wavelets considering both stations in the test dataset.
Table 10. Average results with five levels of wavelet transform considering both stations in the
test dataset.
Tables 7 and 8 present the metrics for the model results without and with wavelet transforms in the train dataset, respectively. For the train dataset, the hybrid 1DCNN-LSTM model presented the best results for all metrics. The hybrid model achieved the best results with features augmented with wavelets, showing that they were key to increasing the models' performance.
Tables 9 and 10 present the results for the metrics of the test dataset, showing that the application of wavelets improved the metrics and, consequently, the results of all topologies, except for MLP, which obtained worse values, except for R². The hybrid 1DCNN-LSTM model exhibited the best improvement, with a reduction in MSE from 127.27 to 17.09, representing a reduction of almost 90% in the test dataset. The hybrid architecture with wavelets also presented the best values for all metrics considered, whereas 1D-CNN and LSTM with wavelets demonstrated similar results individually but performed worse than the hybrid 1DCNN-LSTM architecture. Metrics changed little from the training to the test assessment; in general, the application of wavelet transforms improved the models' ability to generalize, as the metrics did not change considerably between the two datasets, presenting similar levels. This highlights the importance of feature augmentation with the wavelet transform of time series data to improve the learning ability of DNNs due to their capacity to capture information from the time and frequency domains at the same time and at different scales.
Tables 11 and 12 present the prediction intervals for each DNN model for all forecasting
horizons evaluated in the test dataset without and with wavelets. The smaller the prediction
interval, the better, because the uncertainties in the predictions are smaller. According to the results, the hybrid model outperformed the others in both
cases, presenting the best prediction interval with the 5-level wavelet decomposition.
Table 11. Average prediction intervals for all forecasting horizons for each model in the test dataset
without wavelets for both stations.
Model | Mean Prediction Interval (+/−) | Mean Prediction Interval Lower Bound | Mean Prediction | Mean Prediction Interval Upper Bound
MLP | 18.23 | −8.91 | 9.32 | 27.55
LSTM | 18.17 | −8.77 | 9.40 | 27.57
1D-CNN | 16.23 | −6.64 | 9.59 | 25.82
1DCNN-LSTM | 9.09 | 3.85 | 12.95 | 22.05
Table 12. Average prediction intervals for all forecasting horizons for each model in the test dataset
with the five levels of wavelet decomposition for both stations.
Model | Mean Prediction Interval (+/−) | Mean Prediction Interval Lower Bound | Mean Prediction | Mean Prediction Interval Upper Bound
MLP | 13.85 | −2.91 | 10.94 | 24.80
LSTM | 12.09 | −3.82 | 8.27 | 20.36
1D-CNN | 10.76 | −2.65 | 8.11 | 18.86
1DCNN-LSTM | 8.59 | 0.21 | 8.79 | 17.38
Figure 7. Metrics of the hybrid 1DCNN-LSTM model with five-level wavelet decomposition for each
forecasting horizon calculated for Stoke Park (a) and Sutherland (b) stations.
With respect to the NMSE for Stoke Park, the LSTM and MLP models outperformed
the 1DCNN-LSTM model for last-hour forecasting, but 1DCNN-LSTM performed better
for the overall forecasting hours, with a more consistent and robust performance than that
of the other models, including for the Pearson r metric. However, for Sutherland Memorial
Park, the MLP model presented the highest errors for the entire range for NMSE, whereas
the 1DCNN-LSTM model achieved the best performance in almost all steps, except when LSTM slightly surpassed its results for the last forecasting hours. Linear regression and MLP
presented the worst performance in Stoke Park for NMSE and Pearson r, whereas MLP
presented the worst results for Sutherland Memorial Park data for both metrics, followed
by 1DCNN. For both datasets, the smallest error occurred in the first step, increasing along
the forecasting horizon. This behavior occurs due to the decreasing ability of all methods
with respect to longer-term forecasting, which reduces the capacity of the trained DNN
models to make precise inferences about events farther in the future. Therefore, the
prediction performance decreases as the time horizon increases, making the metrics worse
in the 24th hour. This also provides a basis for future research in the field of deep neural
networks, with the aim of improving the ability of such models to learn and represent
longer-term temporal relationships for multivariate time series forecasting. These results
are related to a unique model trained for both stations at once.
Figures 8–10 show a qualitative analysis of the forecasting behavior for Stoke and
Sutherland parks of the 1st, 12th and 24th hours using the 1DCNN-LSTM model built with
and without wavelet transforms, including the prediction intervals of the model—plotted as
shadowed regions around the predictions. It is possible to notice the qualitative differences
between the observed data and the predictions using the specified model with and without
wavelets. In general, the application of wavelets increased the model’s ability to predict
PM2.5 concentrations. Wavelets contributed to smoother and more robust predictions, presenting a behavior closer to the real data, with more precise behavior and less noise, which was not the case without wavelets. This behavior was more evident for Stoke Park than for Sutherland Memorial Park, where the predictions without wavelets preserved some characteristics of the original signal but, in general, performed worse than those using wavelets.
The results of this analysis are in agreement with the quantitative metrics, reflecting the lower error values for the approach using 1DCNN-LSTM and wavelet transforms.
Figure 8. Comparison between the predictions of PM2.5 concentrations made with and without wavelets by 1DCNN-LSTM for the 1st hour for (a) Stoke Park Station and (b) Sutherland Memorial Park Station. The shadowed regions represent the prediction intervals of the model.
Figure 9. Comparison between the predictions of PM2.5 concentrations made with and without wavelets by 1DCNN-LSTM for the 12th hour for (a) Stoke Park Station and (b) Sutherland Memorial Park Station. The shadowed regions represent the prediction intervals of the model.
Figure 10. Comparison between the predictions of PM2.5 concentrations made with and without wavelets by 1DCNN-LSTM for the 24th hour for (a) Stoke Park Station and (b) Sutherland Memorial Park Station. The shadowed regions represent the prediction intervals of the model.
4.3. Evaluation of the Generalization Ability of the DNN

It is important to evaluate the generalization ability of the DNN, which is demonstrated by analyzing the evolution of the loss (MSE) value for each epoch during the training and validation procedures, using the portion of the data separated for each purpose. The aim is to evaluate whether the model performs well during training with the same behavior in both the training and validation sets. If the model presents different results of loss along the epochs, it may be suffering some sort of under- or overfitting, depending on the behavior of the loss curve measured at each epoch for each set.
Figure 11 presents a graphical evolution, showing that the model generalizes well, presenting no overfitting or underfitting, as the loss of both training and validation presented the same convergence behavior, and PM2.5 predictions in the test dataset, which had not been seen before by the model, were successfully performed.
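A curve like Figure 11 can be produced directly from the History object that Keras returns from fit(); a minimal sketch (assuming the Keras stack and matplotlib; the epoch count is illustrative):

```python
# Sketch: plot per-epoch training ("loss") and validation ("val_loss") MSE,
# as in Figure 11, to check for under- or overfitting.
import matplotlib.pyplot as plt

# history = model.fit(X_train, y_train, validation_split=0.3, epochs=500)
def plot_loss(history):
    plt.plot(history.history["loss"], label="loss")
    plt.plot(history.history["val_loss"], label="val_loss")
    plt.xlabel("epoch")
    plt.ylabel("MSE")
    plt.legend()
    plt.show()
```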
Figure 11. Training and validation 1DCNN-LSTM loss graph. “Loss” and “val_loss” represent the evolution of the loss measured at each epoch for the training and validation datasets, respectively.

4.4. Assessment of the Statistical Difference of the Predictions

As presented in Section 3.5, the Wilcoxon signed-rank test was employed to assess whether the models' predictions differed in terms of distribution and whether they were statistically equivalent.
Table 13. Wilcoxon signed-rank test assessment of the predictions of the hybrid 1DCNN-LSTM model
against the other models with five-level wavelet transforms. If H0 is rejected, i.e., if p ≤ α, α = 0.05,
the distributions are different.
Model | Stoke p-Value | Sutherland p-Value | Test Result
MLP | 0.00 | 0.00 | Different distribution
LSTM | 0.00 | 0.04 | Different distribution
1D-CNN | 0.00 | 0.00 | Different distribution
5. Conclusions
In the present study, we systematically evaluated different deep learning models, along
with WT, to predict the concentration of PM2.5 up to 24 h ahead in two open-road regions
of Surrey, UK, characterized by the proximity of parks where children and adults perform
recreational activities and by the high vehicle traffic, which are relevant factors with respect
to air pollution monitoring and assessment. The methodology implemented consisted of
developing and validating the use of deep learning associated with WT and comparing the
results of the tested models with those of simpler methodologies. Different deep neural
network topologies were implemented, namely MLP, LSTM, 1D-CNN and 1DCNN-LSTM,
with and without WT, along with a linear regressor model as a baseline. The results showed
that the best performance was achieved by the 1DCNN-LSTM model among all other DNN
architectures, with WT applied on the time series data. The final deep neural network
model captured the real data behavior and presented a good generalization of the problem
in test data, despite being related to a period of data that was never seen by the model
during the training and validation.
WT was implemented with the aim of decomposing the original time-series signals
into several low- and high-frequency components, extracting some information from the
data that was not yet available. This improved the results of all deep neural networks, which is in line with other previously developed studies [12,13,22]. Our results highlight the positive impact of WT with respect to improving DNN performance and show how this approach is appropriate for dealing with complex problems.
Thus, this methodology proved to have great potential for use by academics, authorities, industry and society to construct and validate deep learning models to predict hourly PM2.5 concentrations in advance for the next 24 h with good performance. This
research provides a solid basis for understanding, developing, and evaluating deep learning
models for this task, enabling the adoption of preventive or mitigation actions when
necessary, such as alerting people to avoid highly polluted areas when the predictions of
PM2.5 concentrations reach hazardous levels, avoiding imminent health risks associated
with exposure to air pollutants.
In future studies, this methodology can be assessed in other places and scenarios
under varying conditions to verify its robustness. Furthermore, other deep neural network
approaches and models can be implemented, such as transformers or physics-informed
neural networks (PINNs), including feature augmentation methodologies, to assess their
capability of predicting long-term PM2.5 concentrations with high fidelity.
Author Contributions: Conceptualization, E.G.S.N. and P.K.; methodology, S.L.J.G. and E.G.S.N.;
software, S.L.J.G., J.C.O.M., Y.K.L.K. and F.S.C.; validation, S.L.J.G., E.G.S.N. and D.M.M.; formal
analysis, S.L.J.G., J.C.O.M., Y.K.L.K. and E.G.S.N.; investigation, S.L.J.G., D.M.M. and E.G.S.N.;
resources, P.K., D.M.M. and E.G.S.N.; data curation, S.L.J.G. and Y.K.L.K.; writing—original draft
preparation, S.L.J.G., J.C.O.M., Y.K.L.K. and F.S.C.; writing—review and editing, S.L.J.G., P.K., D.M.M.
and E.G.S.N.; visualization, S.L.J.G. and E.G.S.N.; supervision, E.G.S.N.; project administration,
E.G.S.N.; funding acquisition, P.K., D.M.M. and E.G.S.N. All authors have read and agreed to the
published version of the manuscript.
Funding: This work was partially supported by the Bahia State Research Support Foundation
(Fundação de Amparo à Pesquisa do Estado da Bahia—FAPESB, Brazil) at SENAI CIMATEC, under
project nº CNV 0002/2015. The authors thank the Reference Center on Artificial Intelligence (CRIA)
and the Supercomputing Center for Industrial Innovation (CS2i), both from SENAI CIMATEC, as
well as the NVIDIA/CIMATEC AI Joint Lab, for infrastructure, technical and scientific support. The
authors also thank the iSCAPE (Improving Smart Control of Air Pollution in Europe) project, which
was funded by the European Community’s H2020 Programme (H2020-SC5-04-2015) under Grant
Agreement No. 689954, as well as the team from the University of Surrey’s Global Centre for Clean
Air Research (GCARE), United Kingdom, for providing the data.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data are publicly available at https://www.iscapeproject.eu/,
accessed on 31 August 2022.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Doreswamy, K.S.; Harishkumar, K.M.; Gad, I. Forecasting Air Pollution Particulate Matter (PM2.5 ) Using Machine Learning
Regression Models. Procedia Comput. Sci. 2020, 171, 2057–2066. [CrossRef]
2. World Health Organization. Health Effects of Particulate Matter, Policy Implications for Countries in Eastern Europe, Caucasus
and Central Asia; World Health Organization. Regional Office for Europe. Available online: https://apps.who.int/iris/handle/10
665/344854 (accessed on 31 August 2022).
3. World Health Organization. Air Pollution, The United Nations. Available online: https://www.who.int/health-topics/air-
pollution#tab=tab_2 (accessed on 31 August 2022).
4. World Health Organization. Occupational and Environmental Health Team, Air Quality Guidelines for Particulate Matter, Ozone,
Nitrogen Dioxide, and Sulfur Dioxide: Global Update 2005: Summary of Risk Assessment; World Health Organization: Geneva,
Switzerland, 2006.
5. Zhang, L.; Lin, J.; Qiu, R.; Hu, X.; Zhang, H.; Chen, Q.; Tan, H.; Lin, D.; Wang, J. Trend analysis and forecast of PM2.5 in Fuzhou,
China using the ARIMA model. Ecol. Indic. 2018, 95, 702–710. [CrossRef]
6. Badicu, A.; Suciu, G.; Balanescu, M.; Dobrea, M.; Birdici, A.; Orza, O.; Pasat, A. PMs concentration forecasting using ARIMA algo-
rithm. In Proceedings of the IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020.
[CrossRef]
7. Reis, A.S., Jr.; Nascimento, E.G.S.; Moreira, D.M. Assessing recurrent and convolutional neural networks for tropospheric ozone
forecasting in the region of Vitória, Brazil. WIT Trans. Ecol. Environ. 2020, 244, 101–112. [CrossRef]
8. Alves, L.V.B.; Nascimento, E.G.S.; Moreira, D.M. Hourly tropospheric ozone concentration forecasting using deep learning. WIT
Trans. Ecol. Environ. 2019, 236, 129–138. [CrossRef]
9. Ida, K.A.; Majid, S.; Alireza, S. Statistical models for multi-step-ahead forecasting of fine particulate matter in urban areas. Atmos.
Pollut. Res. 2019, 10, 689–700. [CrossRef]
10. Yang, G.; Lee, H.; Lee, G. A hybrid deep learning model to forecast particulate matter concentration levels in Seoul, South Korea.
Atmosphere 2020, 11, 348. [CrossRef]
11. Wang, P.; Zhang, G.; Chen, F.; He, Y. A hybrid-wavelet model applied for forecasting PM2.5 concentrations in Taiyuan city, China.
Atmos. Pollut. Res. 2019, 10, 1884–1894. [CrossRef]
12. Bai, Y.; Li, Y.; Wang, X.; Xie, J.; Li, C. Air pollutants concentrations forecasting using back propagation neural network based on
wavelet decomposition with meteorological conditions. Atmos. Pollut. Res. 2016, 7, 557–566. [CrossRef]
13. Qiao, W.; Tian, W.; Tian, Y.; Yang, Q.; Wang, Y.; Zhang, J. The forecasting of PM2.5 using a hybrid model based on wavelet
transform and an improved deep learning algorithm. IEEE Access 2019, 7, 142814–142825. [CrossRef]
14. Huang, C.-J.; Kuo, P.-H. A deep cnn-lstm model for particulate matter (PM2.5 ) forecasting in smart cities. Sensors 2018, 18, 2220.
[CrossRef]
15. Li, T.; Hua, M.; Wu, X. A Hybrid CNN-LSTM Model for Forecasting Particulate Matter (PM2.5 ). IEEE Access 2020, 8, 26933–26940.
[CrossRef]
16. Zohre, E.K.; Ruhollah, T.M.; Mohamad, K.; Ali, R.N. Predicting the ground-level pollutants concentrations and identifying the
influencing factors using machine learning, wavelet transformation, and remote sensing techniques. Atmos. Pollut. Res. 2021,
12, 101064. [CrossRef]
17. Mirzadeh, S.M.; Nejadkoorki, F.; Mirhoseini, S.A.; Moosavi, V. Developing a wavelet-AI hybrid model for short- and long-term
predictions of the pollutant concentration of particulate matter10. Int. J. Environ. Sci. Technol. 2022, 19, 209–222. [CrossRef]
18. Liu, B.; Yu, X.; Chen, J.; Wang, Q. Air pollution concentration forecasting based on wavelet transform and combined weighting
forecasting model. Atmos. Pollut. Res. 2021, 12, 101144. [CrossRef]
19. Kim, J.; Wang, X.; Kang, C.; Yu, J.; Li, P. Forecasting air pollutant concentration using a novel spatiotemporal deep learning model
based on clustering, feature selection and empirical wavelet transform. Sci. Total Environ. 2021, 801, 149654. [CrossRef]
20. Araujo, M.L.S.; Kitagawa, Y.K.L.; Moreira, D.M.; Nascimento, E.G.S. Forecasting Tropospheric Ozone Using Neural Networks
and Wavelets: Case Study of a Tropical Coastal-Urban Area. In Computational Intelligence Methodologies Applied to Sustainable
Development Goals; Studies in Computational Intelligence; Verdegay, J.L., Brito, J., Cruz, C., Eds.; Springer: Cham, Switzerland,
2022; Volume 1036. [CrossRef]
21. Abhijith, K.V.; Prashant, K. Field investigations for evaluating green infrastructure effects on air quality in open-road conditions.
Atmos. Environ. 2019, 201, 132–147. [CrossRef]
22. Zucatelli, P.J.; Nascimento, E.G.S.; Santos, A.Á.B.; Arce, A.M.G.; Moreira, D.M. An investigation on deep learning and wavelet
transform to nowcast wind power and wind power ramp: A case study in Brazil and Uruguay. Energy 2021, 230, 120842.
[CrossRef]
23. Le, X.H.; Ho, H.V.; Lee, G.; Jung, S. Application of Long Short-Term Memory (LSTM) neural network for flood forecasting. Water
2019, 11, 1387. [CrossRef]
24. Paolo, A.; Adriano, B.; Maide, B.; Luigi, F. Image processing for medical diagnosis using CNN. Nucl. Instrum. Methods Phys. Res.
Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2003, 497, 174–178. [CrossRef]
25. National Instruments. Understanding FFTs and Windowing, Technical Report. Available online: https://www.ni.com/pt-br/
innovations/white-papers/06/understanding-ffts-and-windowing.html (accessed on 31 August 2022).
26. Graps, A. An Introduction to Wavelets. IEEE Comput. Sci. Eng. 1995, 2, 50–61. [CrossRef]
27. Sifuzzaman, M.; Islam, M.R.; Ali, M.Z. Application of Wavelet Transform and its Advantages Compared to Fourier Transform.
J. Phys. Sci. 2009, 13, 121–134.
28. Hoshmand, A.R. Business Forecasting: A Practical Approach, 2nd ed.; Routledge: New York, NY, USA, 2010; ISBN 978-1592576128.
29. Corder, G.W.; Foreman, D.I. Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach, 1st ed.; Wiley: New York, NY,
USA, 2009; ISBN 978-0470454619.