Article
Particulate Matter Forecasting Using Different Deep Neural
Network Topologies and Wavelets for Feature Augmentation
Stephanie Lima Jorge Galvão 1, Júnia Cristina Ortiz Matos 1, Yasmin Kaore Lago Kitagawa 1,2, Flávio Santos Conterato 1, Davidson Martins Moreira 1, Prashant Kumar 2 and Erick Giovani Sperandio Nascimento 1,2,3,*
Citation: Galvão, S.L.J.; Matos, J.C.O.; Kitagawa, Y.K.L.; Conterato, F.S.; Moreira, D.M.; Kumar, P.; Nascimento, E.G.S. Particulate Matter Forecasting Using Different Deep Neural Network Topologies and Wavelets for Feature Augmentation. Atmosphere 2022, 13, 1451. https://doi.org/10.3390/atmos13091451

Academic Editors: Yuanqing Zhu, Long Liu and Stephan Havemann

Received: 29 July 2022; Accepted: 1 September 2022; Published: 8 September 2022

Abstract: The concern about air pollution in urban areas has substantially increased worldwide. One of its main components, particulate matter (PM) with aerodynamic diameter of ≤2.5 µm (PM2.5), can be inhaled and deposited in deeper regions of the respiratory system, causing adverse effects on human health, which are even more harmful to children. In this sense, the use of deterministic and stochastic models has become a key tool for predicting atmospheric behavior and, thus, providing information for decision makers to adopt preventive actions to mitigate air pollution impacts. However, stochastic models present their own strengths and weaknesses. To overcome some of the disadvantages of deterministic models, there has been an increasing interest in the use of deep learning, due to its simpler implementation and its success on multiple tasks, including time series and air quality forecasting. Thus, the objective of the present study is to develop and evaluate the use of four different topologies of deep artificial neural networks (DNNs), analyzing the impact of feature augmentation in the prediction of PM2.5 concentrations by using five levels of discrete wavelet transform (DWT). The following types of deep neural networks were trained and tested on data collected from two living lab stations next to high-traffic roads in Guildford, UK: multi-layer perceptron (MLP), long short-term memory (LSTM), one-dimensional convolutional neural network (1D-CNN) and a hybrid neural network composed of LSTM and 1D-CNN. The performance of each model in making predictions up to twenty-four hours ahead was quantitatively assessed through statistical metrics. The results show that wavelets improved the forecasting results and that the discrete wavelet transform is a relevant tool to enhance the performance of DNN topologies, with special emphasis on the hybrid topology, which achieved the best results among the applied models.

Keywords: particulate matter; air pollution; artificial neural networks; deep learning; forecasting; wavelets
1. Introduction
The increase in air pollution in urban areas is a concern on a global scale. Such pollution occurs especially due to anthropogenic activities, such as industrialization, the growth of urbanization, automotive vehicles powered by fossil fuels and agricultural burning [1]. According to the United Nations, more than half of the world's population (around 55%) lives in urban regions, and this number is increasing; in some European countries, such as the United Kingdom, more than 83% of the population lives in urban environments, a figure that continues to increase over time. Consequently, humans have been constantly exposed to a variety of harmful components from many sources, mainly those from road vehicles, which are the dominant source of ambient air pollutants, such as particulate
matter (PM), nitrogen oxide (NOx), carbon monoxide (CO) and volatile organic compounds
(VOCs) [2].
Among these pollutants, PM can be highlighted as one of the most critical, as it can cause
numerous adverse effects on human health, such as asthma attacks, chronic bronchitis,
diabetes, cardiovascular disease and lung cancer [3], and it is strongly associated with
respiratory diseases in children [2].
PM is an atmospheric pollutant composed of a mixture of solid and liquid particles sus-
pended in the air [2]. These kinds of particles can be directly emitted through anthropogenic
or non-anthropogenic activities, and they are classified according to their aerodynamic
diameter and their impacts on human health. PM2.5 includes fine particles with a diameter of up to 2.5 µm, which can enter the cardiorespiratory system. The World Health Organization (WHO) estimates that long-term exposure to PM2.5 increases the risk of cardiopulmonary mortality by 6% to 13% per 10 µg/m³ of PM2.5 [4]. Furthermore, results from the European project Aphekom indicate that life expectancy in the most polluted cities could be increased by approximately 20 months if long-term exposure to PM2.5 were reduced to the annual limits established by the WHO [2].
For these reasons, countries have been encouraged to adopt even more stringent
standards and actions to help control and reduce temporal PM concentrations in urban
environments [4]. Hence, the construction of models that predict the concentration of
this component up to 24 h ahead in densely populated areas with lower computational
complexity and cost arises as a key and strategic tool to assist the monitoring process,
support control and preventive actions to improve air quality and, consequently, reduce
impacts on the health of the population.
Thus, the objective of this work is to build and evaluate the performance of four deep
artificial neural network (DNN) models to predict hourly concentrations of PM2.5 up to
24 h ahead of time, as well as the impact on model performance of applying five-level
discrete wavelet transform (DWT) on the data as a feature augmentation method. The
DNN types applied were multilayer perceptron (MLP), long short-term memory (LSTM),
one-dimensional convolutional neural network (1D-CNN) and a hybrid model (LSTM
with 1D-CNN). To train and test the DNN models, data from densely populated areas in
Surrey County, UK, characterized by high vehicle traffic were used and augmented by the
addition of new features based on the reconstructed detail and approximation signals of
wavelet transform from levels 1 to 5. In order to assess the performance of the deep neural
networks in the prediction task, all results were compared to a linear regression model
as a baseline. Then, they were statistically evaluated according to the following metrics:
mean squared error (MSE), mean absolute error (MAE), Pearson’s r and normalized mean
squared error (NMSE).
This paper is organized into five sections. In Section 1, we introduce the background
and research gaps in the topic areas. In Section 2, we explore the related works in the area
of air pollutant forecasting. In Section 3, we present the case study, data, basic concepts of
DNN, DWT and additional methods used in this work. In Section 4, we present and discuss
the results. Finally, in Section 5, we highlight the main points and present our conclusions,
indicating aspects to be explored in future investigations.
2. Related Works
In recent years, several methods have been applied to the task of forecasting air
pollution components, mainly using statistical, econometric and deep learning models.
Zhang et al. [5] and Badicu et al. [6] assessed the Autoregressive Integrated Moving Average
Model (ARIMA), a powerful statistical model, to predict PM concentrations. The former
used monthly PM2.5 data from the city of Fuzhou, China, during the period from August 2014 to July 2016 to train the model and predicted the period of July 2016 to July 2017. The
training results presented a mean absolute error (MAE) of 11.4%, with the highest error
values in cold seasons, when the real values from PM2.5 were higher than those predicted by
the model. The latter worked with data from Bucharest, Romania, considering the period
of March to May 2019 with a frequency of 15 min to predict PM10 and PM2.5 concentrations.
The results showed that in 89% of cases, the predicted values were under an acceptable
limit of uncertainty. However, this kind of approach has some limitations in long-term
forecasting, as it uses only past data and it has difficulty reaching high peaks, such as in [5],
where it was not able to reach the real peaks of PM2.5 .
Considering these limitations, artificial intelligence (AI) methodologies have been used
to improve forecasting performance due to their ability to learn from complex nonlinear
patterns, their robustness and self-adaptation and their ability to, once correctly trained,
perform predictions with limited computational resources and cost when compared to
other approaches, such as numerical modeling. Reis Jr. et al. [7] analyzed the use of
recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to predict
short-term (24 h) ozone concentration. They compared the performance of CNN, recurrent
neural network long short-term memory (LSTM) and gated recurrent unit (GRU) structures
with a simple multi-layer perceptron (MLP) model. The data were collected between
2001 and 2005 in the region of Vitória in southeastern Brazil. The results showed that the
LSTM topology presented an average performance similar to that of MLP but with slightly
worse results. However, when considering individual time steps, the LSTM presented the
most suitable results for the 9th hour, demonstrating the potential of LSTM for learning
long-term behaviors. Ozone forecasting up to 24 h in advance was also evaluated by
Alves et al. [8] using the same data but comparing only the MLP model with baseline
models: the persistence model and the lasso regression technique. The MLP model proved
to be the most effective according to statistical analyses, outperforming the others in almost
all forecasting steps, except for the 1st hour.
Regarding PM forecasting, the use of MLP topology to forecast PM particles was
investigated by Ahani et al. [9], who compared its performance with that of the ARIMAX
model (ARIMA with exogenous variables) to predict PM2.5 up to 10 h ahead using different
feature selection methods. The applied data were from Tehran City, the capital of Iran, and
represented a period from 2011 to 2015. The ARIMAX model presented a smaller RMSE in
almost all time steps considered, except for the second and the last time steps, for which
the MLP presented similar results. This shows that, despite its higher capacity, the single
application of artificial neural network (ANN) structures to some data may not outperform
simpler methodologies. Thus, it is possible to assess complementary methodologies to
make them even more robust. Yang et al. [10] used four different DNN topologies to
predict PM2.5 and PM10 , including two hybrid models. The DNNs used were GRU, LSTM,
CNN-GRU and CNN-LSTM. Data from 2015 to 2018 were used to make predictions 15 days
in advance. The results demonstrated that 15-day predictions remained reliable; however,
the most accurate forecasts were up to 7 days in advance. The hybrid models outperformed
the single models for all stations, and the CNN-LSTM model produced the fewest errors.
Despite the research that has been conducted using ANNs to predict air pollution
components, forecasting accuracy depends on the quality of data provided to the model.
This means that the results can still be improved by different representations of data,
which can reveal hidden patterns, as well as the application of feature augmentation tech-
niques. Therefore, various studies involving preprocessing methods for time series, such
as wavelets, have demonstrated the benefits of their application in improving the perfor-
mance of ANNs in the task of forecasting PM concentrations. For instance, Wang et al. [11]
presented the advantages of using hybrid models combining machine learning techniques
and wavelet transforms to predict the PM2.5 signal. The prediction was performed 1 h ahead by
decomposition of PM2.5 data in low- and high-frequency components that capture the trend
and noise from the original signal. The temporal resolution of data was the hourly average
concentration in the period from 2016 to 2017. The machine learning methods used were a
backpropagation neural network (BPNN) and a support vector machine (SVM). The results
indicate that hybrid models are more accurate and stable when using wavelets, highlighting
their importance in detecting time and frequency behaviors. Bai et al. [12] also used a
BPNN model based on wavelet decomposition to forecast air pollutant (PM10, SO2 and NO2) concentrations.
Figure 1. Location of the monitoring stations, represented as numbers: “1” represents Stoke Park LLS, and “2” represents Sutherland Memorial Park LLS. (Source: https://livinglabs.iscapeproject.eu/, accessed on 31 August 2022).
Table 1. Description of the available measurements in the dataset from both stations.

Variable | Description
Time | Time of the sample, with one-minute frequency
TEMP | Air temperature collected at the station
HUM | Air humidity collected at the station
PRESS | Air pressure collected at the station
PM2.5 | Concentration of particulate matter with a size ≤2.5 µm
CO | Concentration of carbon monoxide
NO2 | Concentration of nitrogen dioxide
O3 | Concentration of ozone

3.2. Artificial Neural Networks

ANNs are composed of a basic structure called neurons. These structures are combined linearly with associated weights, which are assigned with random values at the start of the training, then passed into an activation function that inserts non-linearities capable of modelling complex relationships. Through the relation of the basic components and the activation functions, ANNs can assume different topologies.
The ANNs explored in this paper were MLP, LSTM, CNN and a hybrid model with the aim of improving the results of LSTM and CNN. A brief explanation of each model is presented in this section.
3.2.1. Multi-Layer Perceptron (MLP)

The multi-layer perceptron neural network (MLP) is the simplest artificial neural network topology possible. It is basically a combination of multiple perceptrons, which are the basic neuron units. The functioning of each neuron, or perceptron, can be mathematically expressed by Equation (1).

y_xw = f(∑_{i=1}^{m} w_i x_i + b) (1)

where y_xw is the output of the perceptron, f is the activation function, x_i is an attribute or feature from input data vector x of size m, w_i represents each weight from weight vector w and b is the bias. In summary, the objective is to determine whether the output of the function (f) triggers (i.e., returns a value other than zero) after summing up the product of the input features and the weights, which are the parameters that are automatically learned through a supervised learning algorithm.
An MLP is generally composed of three or more fully connected layers. Figure 2 presents a schematic diagram of a typical MLP architecture. At least three layers are required: an input layer, a hidden layer and an output layer.
MLPs are suitable for several applications, with their main parameters represented by the number of layers, the activation functions and the number of neurons in each layer [8], with a flexible topology. The definition of the number of layers and neurons is variable, and the optimal composition is problem-specific. The number of outputs depends on the specific application requirements, permitting multi-step and multivariate forecasting. The most adequate configuration of these attributes is chosen mostly empirically for each application. All the connections between MLP layers are of the forward kind, which means that backward signal propagation is only possible through a backpropagation algorithm [8]. Although MLPs were not specifically designed to deal with time series forecasting, due to their simplicity and ability to solve complex problems, they have been employed in many studies to predict air pollution components, such as in [5–9,11,22].
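To make the architecture concrete, the following is a minimal sketch in Python with TensorFlow/Keras (an assumed dependency, since the paper does not state its software stack) of an MLP that maps one 86-feature sample to the next 24 hourly PM2.5 values; the hidden-layer sizes are illustrative and not the authors' exact configuration (their settings are given in Tables 2–6).

```python
# Minimal sketch (not the authors' exact configuration): an MLP that maps the
# 86 preprocessed features of one hourly sample (see Section 3.4) to the next
# 24 hourly PM2.5 concentrations.
from tensorflow import keras
from tensorflow.keras import layers

n_features = 86   # original + wavelet + time features (Section 3.4)
horizon = 24      # one output neuron per forecasting hour

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(128, activation="relu"),   # hidden sizes are illustrative
    layers.Dense(64, activation="relu"),
    layers.Dense(horizon),                  # linear output for regression
])
model.compile(optimizer="adam", loss="mse")

# x_train: (n_samples, 86) scaled to [0, 1]; y_train: (n_samples, 24)
# model.fit(x_train, y_train, validation_split=0.3, epochs=100)
```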
3.2.2. Long Short-Term Memory (LSTM)

Figure 3. LSTM cell structure. Arrows, squares and circles represent data flow, pointwise operations and activation functions, respectively.
In an LSTM, the cell state acts as an internal selective memory of the past, represented in Figure 3 by the horizontal line starting at c_{t−1} and ending at c_t. The output of an LSTM cell is represented by h, i.e., the hidden state. The following equations depict the mathematical procedure of an LSTM cell:

f_t = σ(W_f [h_{t−1}, x_t] + b_f) (2)

i_t = σ(W_i [h_{t−1}, x_t] + b_i) (3)

Cs_t = S(W_C [h_{t−1}, x_t] + b_C) (4)

C_t = f_t C_{t−1} + i_t Cs_t (5)

o_t = σ(W_o [h_{t−1}, x_t] + b_o) (6)

where f_t is the forget gate; i_t is the input gate; Cs_t and C_t are the candidates for the cell state and the cell state at timestep t, respectively; o_t is the output gate at t; σ is the sigmoid function; S is the hyperbolic tangent function; W_x is the weight matrix of x neurons; h_t is the cell output at t; x_t is the input at t; and b_x is the bias matrix corresponding to x.
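The gate equations translate directly into code. The sketch below is a literal NumPy rendering of Equations (2)–(6), with the standard hidden-state update h_t = o_t · tanh(C_t) added to complete the cell; that last step is shown in Figure 3 but is not numbered in the text.

```python
# Sketch of one LSTM cell step, transcribing Equations (2)-(6) in NumPy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """W and b hold one weight matrix/bias vector per gate: 'f', 'i', 'C', 'o'."""
    z = np.concatenate([h_prev, x_t])           # [h_(t-1), x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Equation (2)
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate, Equation (3)
    Cs_t = np.tanh(W["C"] @ z + b["C"])         # candidate state, Equation (4)
    C_t = f_t * C_prev + i_t * Cs_t             # cell state, Equation (5)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Equation (6)
    h_t = o_t * np.tanh(C_t)                    # hidden state (Figure 3)
    return h_t, C_t
```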
3.2.3. Convolutional Neural Networks (CNN)
A CNN is a type of neural network that learns patterns from data through the application of convolutions aimed at learning filters that extract the main features from the data to perform a specific task (see Figure 4). Thus, CNNs are able to learn spatial and temporal relations from data [7]. Consequently, CNNs are able to resize and automatically detect new elements and patterns from data. In addition, pooling layers reduce the size of the input sequence, followed by the application of flattening layers, which adjust the shape of the data to enter a final regular MLP that concludes the specified task. CNNs are widely applied in image processing [24], and their benefits can be explored and assessed for time series predictions, for which lookback is also required as an input to the CNN.
Figure 4. Typical CNN architecture.
The following equations mathematically describe the convolution layer:

G[m, n] = (f ∗ k)[m, n] = ∑_j ∑_i k[j, i] f[m − j, n − i] (7)

C^l = a(V^l) (8)

V^l = K^l · C^{l−1} + b^l (9)
where G is the feature map; f is the input; k, m and n represent the kernel, rows and columns of the result matrix, respectively; the indices j and i are related to the kernel; l is the layer index; V is the intermediate value; K is the tensor that has filters or kernels; C is the result of the convolution; b is the bias; and a is the corresponding activation function.
In addition, a pooling layer can be employed to reduce the dimensionality of the output of the convolution step, e.g., by extracting the maximum value (MaxPooling) or the average value (AvgPooling) from the learned and extracted kernels/filters within a fixed-size window, thus decreasing the required processing power for network training.
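As a sketch of how these pieces fit together, the following Keras model (assumed stack; filter counts are illustrative) chains a Conv1D layer, a MaxPooling1D reduction, a Flatten layer and a final dense stage for a 24-step forecast over a lookback window of three samples, as used in Section 3.4.

```python
# Sketch of a 1D-CNN regressor for multi-step forecasting (Section 3.2.3):
# Conv1D learns filters over the lookback window, MaxPooling1D reduces the
# sequence, Flatten + Dense conclude the task.
from tensorflow import keras
from tensorflow.keras import layers

lookback, n_features, horizon = 3, 86, 24

model = keras.Sequential([
    layers.Input(shape=(lookback, n_features)),
    layers.Conv1D(filters=64, kernel_size=2, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(horizon),
])
model.compile(optimizer="adam", loss="mse")
```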
3.2.4. Hybrid 1DCNN-LSTM

This method exploits the advantages of CNNs, extracting the most important multidimensional attributes from data, resizing them and sending them as input to the LSTM layers, which can extract more attributes related to temporal relationships. The combination of a CNN and LSTM is expected to deliver more reliable predictions. A representation of such an architecture is shown in Figure 5, with some internal layers that allow for connections between the parts. Thus, this architecture will be evaluated along with the other models.
Figure 5. Representation of a CNN-LSTM model considering the inputs, outputs and internal layers.
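A minimal sketch of this hybrid idea in Keras (assumed stack; sizes are illustrative, not the authors' configuration) passes the convolutional feature sequence directly to an LSTM layer:

```python
# Sketch of the hybrid 1DCNN-LSTM idea (Figure 5): convolutional layers
# extract multidimensional attributes, which are then passed to an LSTM
# that models temporal relationships.
from tensorflow import keras
from tensorflow.keras import layers

lookback, n_features, horizon = 3, 86, 24

model = keras.Sequential([
    layers.Input(shape=(lookback, n_features)),
    layers.Conv1D(64, kernel_size=2, activation="relu", padding="same"),
    layers.LSTM(64),            # consumes the CNN feature sequence
    layers.Dense(horizon),      # 24 outputs, one per forecast hour
])
model.compile(optimizer="adam", loss="mse")
```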
3.3. Wavelet Decomposition for Feature Extraction
WT of a time-domain function is a tool that emerged as an improved version of the Fourier transform. The Fourier transform consists of taking a time-domain signal and breaking it into a weighted sum of sine and cosine waves to represent it in the frequency domain [25]. However, scientists needed a more appropriate function to represent choppy signals [26], and beyond that, it is necessary to overcome the problem of the window size not changing with frequency [11]. Wavelet analysis can work with different signal temporal resolutions and different basis functions, providing a detailed frequency assessment of all discontinuities and signal patterns, processing data at different scales.
Despite the algebra involved in the process, the discrete wavelet transform (DWT) of a signal is calculated by multiple applications of high-pass and low-pass filters, as shown in Figure 6. The outputs from the former are detail coefficients, and those from the latter are approximation coefficients. The number of times that the filters are used is determined by the level of decomposition required. The combination of the two outputs contains the same frequency content as the input signal, but the amount of data is doubled. Therefore, a downsampling procedure by a factor of two is applied to the filter outputs, as shown in Figure 6.
For each feature, there is a specific wavelet family that most satisfactorily represents the original signal in terms of separating more and less significant frequencies. To automate the process of selecting the most suitable wavelet family, Zucatelli et al. [22] proposed a method based on the use of RMSE between the original signal and the reconstructed approximation signal to obtain the most appropriate family for a specific feature. In the present study, this process was applied to all features considered relevant to the analysis.
The importance of WT in machine learning applications lies in the fact that it permits the generation of new features using the approximation and detail coefficients from a pre-determined level of decomposition. The most interesting characteristic of WT is that its individual functions are localized in time and frequency [27], allowing the data to be reconstructed in the same length as the original data, which is relevant to improving ANN model training.
Figure 6. Illustration of the wavelet decomposition process. LPF, low-pass filter; HPF, high-pass filter. The outputs are downsampled by a factor of two. CA, approximation coefficient; CD, detail coefficient; numbers identify the decomposition level.
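In code, this level-by-level reconstruction can be sketched with the PyWavelets library (an assumed dependency; the "db4" family below is a placeholder, since the paper selects the family per feature by the RMSE criterion of [22]):

```python
# Sketch of the feature-augmentation idea: for levels 1..5, decompose the
# signal and reconstruct the approximation and detail parts back to the
# original length, yielding two new features per level.
import numpy as np
import pywt

def wavelet_features(signal, family="db4", max_level=5):
    """Return {level: (approximation, detail)} reconstructions of `signal`."""
    out = {}
    for level in range(1, max_level + 1):
        coeffs = pywt.wavedec(signal, family, level=level)
        # Zero the detail coefficients to rebuild the approximation signal...
        approx = pywt.waverec([coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]],
                              family)[: len(signal)]
        # ...and zero the approximation to rebuild the detail signal.
        detail = pywt.waverec([np.zeros_like(coeffs[0])] + list(coeffs[1:]),
                              family)[: len(signal)]
        out[level] = (approx, detail)
    return out
```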
3.4. Model Setup

Before applying the data to the ANN models, some preprocessing was performed. The available data from the two stations were concatenated to provide more data for the training step. Latitude, longitude and altitude were added to distinguish the regions, and the data were resampled by the average of each hour. Then, five levels of wavelet transform were applied, using the family selection criteria described in [22]. For each feature, five reconstructed detail and approximation signals were obtained.
Previous studies, such as [8], showed the importance of transforming time variables into periodical information by employing trigonometric functions to enable the representation of time cycles, which can lead to improved forecasting performance of DNN models. Thus, the time variable was converted into periodic sine and cosine features with the aim of improving the ability of the DNN to learn periodic and temporal relationships [8], depicting them in six new features corresponding to the sine and cosine of hours, days and months according to the following equations:

sin_{ta} = sin(2π t_a / f) (10)

cos_{ta} = cos(2π t_a / f) (11)

where t_a is the value of the time attribute being calculated, i.e., hour of the day, day of the month or month of the year; and f is the number of possible values of that time attribute in the corresponding time scale, i.e., for hour, the number of hours in a day (24); for day, the number of days in that month; and for month, the number of months in a year (12).
As a result, the final dataset was composed of 86 features, 8 of which were the original features and the remainder of which were the preprocessed and augmented features, as previously described. Finally, all variables were scaled to the same range between zero and one so that they all had the same degree of importance.
Tables 2–5 present the configurations of each implemented DNN topology. The number of neurons at the input for each DNN is related to the amount of features required as input, i.e., 86, considering all the features generated by the wavelet transforms, as previously explained. In the case of the LSTM, 1D-CNN and the hybrid 1DCNN-LSTM models, a
lookback of three samples was set up for training. The output layer of each DNN was set
to 24, one for each forecasting hour ahead, totaling 24 h. Therefore, to make predictions
once the model was trained, the raw features were collected, as expressed in Table 1, the
time series were resampled for hourly frequency, the time attributes were preprocessed
as detailed in Equations (10) and (11), the corresponding wavelet transforms and levels were generated, the features were scaled between zero and one and the lookback for each sample was processed (if working with LSTM, 1D-CNN or 1DCNN-LSTM models). As a
result, the models output the next 24 h of PM2.5 concentrations, given the input.
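The preprocessing chain described above can be sketched as follows (Python with pandas and scikit-learn as assumed dependencies; column names are hypothetical):

```python
# Sketch of the Section 3.4 preprocessing: cyclic time encoding from
# Equations (10)-(11), [0, 1] scaling and lookback windowing.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def add_time_features(df):
    """df has a DatetimeIndex with hourly frequency."""
    for name, value, period in [
        ("hour", df.index.hour, 24),
        ("day", df.index.day, df.index.days_in_month),
        ("month", df.index.month, 12),
    ]:
        df[f"sin_{name}"] = np.sin(2 * np.pi * value / period)
        df[f"cos_{name}"] = np.cos(2 * np.pi * value / period)
    return df

def make_windows(features, target, lookback=3, horizon=24):
    """Pair each lookback window with the next `horizon` target values."""
    X, y = [], []
    for t in range(lookback, len(features) - horizon + 1):
        X.append(features[t - lookback:t])
        y.append(target[t:t + horizon])
    return np.array(X), np.array(y)

# features = MinMaxScaler().fit_transform(df.values)   # scale to [0, 1]
# X, y = make_windows(features, df["PM2.5"].values)    # "PM2.5" is hypothetical
```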
The training, validation and test datasets were separated prior to the building, vali-
dation and assessment of the models. The training dataset consisted of the concatenation
of the Stoke Park data from February to June, in addition to August and September, and
the Sutherland Memorial Park dataset corresponding to the months of June, in addition to
August to October, both in 2019. From the training dataset, 30% was randomly separated
for validation. The month of July 2019 was separated as the test dataset, corresponding to
about 15.38% of the total dataset, and was never seen by the models during the training
and validation of data from both stations. This was done to assess the final performance of
the models in predicting PM2.5 concentrations in order to standardize the tests for the same
period for which data were available for both regions.
Table 6 presents the hyperparameters used to train each DNN. No specific hyperpa-
rameter search technique was implemented, as the primary target was to evaluate different
DNN topologies for the task of forecasting PM2.5 for the next 24 h using WT for feature
augmentation. The parameters were set to be practically the same in order to guarantee comparability between each topology, except for MLP, which required 100 more epochs than the other models to be successfully trained.
MSE = (1/n) ∑_{i=1}^{n} (O_i − F_i)² (12)

MAE = (1/n) ∑_{i=1}^{n} |O_i − F_i| (13)

NMSE = MSE / Var(O) (14)

r = [∑_{i=1}^{n} (O_i − Ō)(F_i − F̄)] / [√(∑_{i=1}^{n} (O_i − Ō)²) √(∑_{i=1}^{n} (F_i − F̄)²)] (15)

R² = 1 − [∑_{i=1}^{n} (O_i − F_i)²] / [∑_{i=1}^{n} (O_i − Ō)²] (16)
where n is the number of samples; Oi is the i-th observed sample; Fi is the corresponding
predicted value; O and F are the average of all observed and predicted values, respectively;
and Var(O) denotes the variance of the O set of observed samples.
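For reference, Equations (12)–(16) translate directly into NumPy, assuming arrays of observed and forecast values:

```python
# Sketch of the evaluation metrics in Equations (12)-(16).
import numpy as np

def mse(o, f):                                       # Equation (12)
    return np.mean((o - f) ** 2)

def mae(o, f):                                       # Equation (13)
    return np.mean(np.abs(o - f))

def nmse(o, f):                                      # Equation (14)
    return mse(o, f) / np.var(o)

def pearson_r(o, f):                                 # Equation (15)
    do, df_ = o - o.mean(), f - f.mean()
    return np.sum(do * df_) / np.sqrt(np.sum(do ** 2) * np.sum(df_ ** 2))

def r2(o, f):                                        # Equation (16)
    return 1 - np.sum((o - f) ** 2) / np.sum((o - o.mean()) ** 2)
```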
In addition, once the models were trained, the prediction intervals for each model and
each forecasting horizon were estimated by applying quantile regression to the errors of the
predictions made in the validation dataset—which, in this case, was used as the calibration
set. To this end, a quantile of q = 0.95 was employed, meaning that the prediction intervals
contained a range of values that should include the actual future value with a probability
of 95% [28]. The prediction intervals were calculated for each forecast horizon in the test
dataset and averaged to generate the final prediction intervals for each model.
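A simple way to realize this calibration step is sketched below; it approximates the described quantile-based procedure with empirical quantiles of the validation errors per forecasting horizon (an assumption, as the exact estimator is not spelled out in the text):

```python
# Sketch: derive a prediction interval for one forecast horizon from the
# errors on the calibration (validation) set, with q = 0.95.
import numpy as np

def prediction_interval(errors, q=0.95):
    """errors: 1-D array of (observed - predicted) for one horizon."""
    lo = np.quantile(errors, (1 - q) / 2)       # e.g., 2.5th percentile
    hi = np.quantile(errors, 1 - (1 - q) / 2)   # e.g., 97.5th percentile
    return lo, hi

# For a new prediction y_hat at that horizon:
# interval = (y_hat + lo, y_hat + hi)
```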
After assessing all the prediction results in the test dataset, the model selected as
presenting the best metrics was evaluated to determine whether its predictions differed
in distribution relative to those of the other models, i.e., whether they were statistically
equivalent or not. To this end, the Wilcoxon signed-rank test [29] was employed, as it is
a nonparametric statistical technique for comparing two paired or related samples and
determining whether their distributions are equal or not. For a given statistical significance
(α), if the null hypothesis (H0) can be rejected, i.e., if p ≤ α, where p is the calculated p value
according to the test; then, the samples are drawn from different distributions. On the
contrary, if p > α, H0 cannot be rejected, meaning that the samples are drawn from the same
distribution. For this work, α = 0.05 was used.
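In practice, the test can be applied with scipy.stats.wilcoxon (an assumed dependency), comparing two paired prediction series:

```python
# Sketch of the Wilcoxon signed-rank test on two paired prediction series,
# with alpha = 0.05 as used in this work.
from scipy.stats import wilcoxon

def same_distribution(pred_a, pred_b, alpha=0.05):
    """Return True if H0 (equal distributions) cannot be rejected."""
    _, p_value = wilcoxon(pred_a, pred_b)
    return p_value > alpha
```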
4. Results and Discussion

Table 7. Average results without wavelets considering both stations in the train dataset.
Table 8. Average results with five levels of wavelet transform considering both stations in the
train dataset.
Table 9. Average results without wavelets considering both stations in the test dataset.
Table 10. Average results with five levels of wavelet transform considering both stations in the
test dataset.
Tables 7 and 8 present the metrics for the model results without and with wavelet transforms in the train dataset, respectively. For the train dataset, the hybrid 1DCNN-LSTM model presented the best results for all metrics. The hybrid model achieved the best results with features augmented with wavelets, showing that they were key to increasing the models' performance.
Tables 9 and 10 present the results for the metrics of the test dataset, showing that the application of wavelets improved the metrics and, consequently, the results of all topologies, except for MLP, which obtained worse values, except for R². The hybrid 1DCNN-LSTM model exhibited the best improvement, with a reduction in MSE from 127.27 to 17.09, representing a reduction of almost 90% in the test dataset. The hybrid architecture with wavelets also presented the best values for all metrics considered, whereas 1D-CNN and LSTM with wavelets demonstrated similar results individually but performed worse than the hybrid 1DCNN-LSTM architecture. Metrics changed little from the training to the test assessment; in general, the application of wavelet transforms improved the models' ability to generalize, as the metrics did not change considerably between the two datasets, presenting similar levels. This highlights the importance of feature augmentation with the wavelet transform of time series data to improve the learning ability of DNNs due to their capacity to capture information from the time and frequency domains at the same time and at different scales.
Tables 11 and 12 present the prediction intervals for each DNN model for all forecasting
horizons evaluated in the test dataset without and with wavelets. The smaller the prediction
interval, the better, because the uncertainties in the predictions are smaller. According to the results, the hybrid model outperformed the others in both
cases, presenting the best prediction interval with the 5-level wavelet decomposition.
Table 11. Average prediction intervals for all forecasting horizons for each model in the test dataset
without wavelets for both stations.
Model | Mean Prediction Interval (+/−) | Mean Prediction Interval Lower Bound | Mean Prediction | Mean Prediction Interval Upper Bound
MLP | 18.23 | −8.91 | 9.32 | 27.55
LSTM | 18.17 | −8.77 | 9.40 | 27.57
1D-CNN | 16.23 | −6.64 | 9.59 | 25.82
1DCNN-LSTM | 9.09 | 3.85 | 12.95 | 22.05
Table 12. Average prediction intervals for all forecasting horizons for each model in the test dataset
with the five levels of wavelet decomposition for both stations.
Model | Mean Prediction Interval (+/−) | Mean Prediction Interval Lower Bound | Mean Prediction | Mean Prediction Interval Upper Bound
MLP | 13.85 | −2.91 | 10.94 | 24.80
LSTM | 12.09 | −3.82 | 8.27 | 20.36
1D-CNN | 10.76 | −2.65 | 8.11 | 18.86
1DCNN-LSTM | 8.59 | 0.21 | 8.79 | 17.38
Figure 7. Metrics of the hybrid 1DCNN-LSTM model with five-level wavelet decomposition for each
forecasting horizon calculated for Stoke Park (a) and Sutherland (b) stations.
With respect to the NMSE for Stoke Park, the LSTM and MLP models outperformed
the 1DCNN-LSTM model for last-hour forecasting, but 1DCNN-LSTM performed better
for the overall forecasting hours, with a more consistent and robust performance than that
of the other models, including for the Pearson r metric. However, for Sutherland Memorial
Park, the MLP model presented the highest errors for the entire range for NMSE, whereas
the 1DCNN-LSTM model achieved the best performance in almost all steps, except when LSTM slightly surpassed its results for the last forecasting hours. Linear regression and MLP
presented the worst performance in Stoke Park for NMSE and Pearson r, whereas MLP
presented the worst results for Sutherland Memorial Park data for both metrics, followed
by 1DCNN. For both datasets, the smallest error occurred in the first step, increasing along
the forecasting horizon. This behavior occurs due to the decreasing ability of all methods
with respect to longer-term forecasting, which reduces the capacity of the trained DNN
models to make precise inferences about events farther in the future. Therefore, the
prediction performance decreases as the time horizon increases, making the metrics worse
in the 24th hour. This also provides a basis for future research in the field of deep neural
networks, with the aim of improving the ability of such models to learn and represent
longer-term temporal relationships for multivariate time series forecasting. These results
are related to a unique model trained for both stations at once.
Figures 8–10 show a qualitative analysis of the forecasting behavior for Stoke and
Sutherland parks of the 1st, 12th and 24th hours using the 1DCNN-LSTM model built with
and without wavelet transforms, including the prediction intervals of the model—plotted as
shadowed regions around the predictions. It is possible to notice the qualitative differences
between the observed data and the predictions using the specified model with and without
wavelets. In general, the application of wavelets increased the model’s ability to predict
PM2.5 concentrations. Wavelets contributed to smoother and more robust predictions, presenting a behavior closer to the real data, with more precise behavior and less noise, which was not the case without wavelets. This behavior was more evident for Stoke Park than for Sutherland Memorial Park, where the predictions without wavelets preserved some characteristics of the original signal but, in general, performed worse than those using wavelets.
The results of this analysis are in agreement with the quantitative metrics, reflecting the lower error values for the approach using 1DCNN-LSTM and wavelet transforms.
Figure 8. Comparison between the predictions of PM2.5 concentrations made with and without wavelets by 1DCNN-LSTM for the 1st hour for (a) Stoke Park Station and (b) Sutherland Memorial Park Station. The shadowed regions represent the prediction intervals of the model.
Figure 9. Comparison between the predictions of PM2.5 concentrations made with and without wavelets by 1DCNN-LSTM for the 12th hour for (a) Stoke Park Station and (b) Sutherland Memorial Park Station. The shadowed regions represent the prediction intervals of the model.
Figure 10. Comparison between the predictions of PM2.5 concentrations made with and without wavelets by 1DCNN-LSTM for the 24th hour for (a) Stoke Park Station and (b) Sutherland Memorial Park Station. The shadowed regions represent the prediction intervals of the model.
4.3. Evaluation of the Generalization Ability of the DNN

It is important to evaluate the generalization ability of the DNN, which is demonstrated by analyzing the evolution of the loss (MSE) value for each epoch during the training and validation procedures, using the portion of the data separated for each purpose. The aim is to evaluate whether the model performs well during training with the same behavior in both the training and validation sets. If the model presents different results of loss along the epochs, it may be suffering some sort of under- or overfitting, depending on the behavior of the loss curve measured at each epoch for each set.
Figure 11 presents a graphical evolution, showing that the model generalizes well, presenting no overfitting or underfitting, as the loss of both training and validation presented the same convergence behavior, and PM2.5 predictions in the test dataset, which had not been seen before by the model, were successfully performed.
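A curve like Figure 11 can be produced directly from the History object that Keras returns from fit(); a minimal sketch (assuming the Keras stack and matplotlib; the epoch count is illustrative):

```python
# Sketch: plot per-epoch training ("loss") and validation ("val_loss") MSE,
# as in Figure 11, to check for under- or overfitting.
import matplotlib.pyplot as plt

# history = model.fit(X_train, y_train, validation_split=0.3, epochs=500)
def plot_loss(history):
    plt.plot(history.history["loss"], label="loss")
    plt.plot(history.history["val_loss"], label="val_loss")
    plt.xlabel("epoch")
    plt.ylabel("MSE")
    plt.legend()
    plt.show()
```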
Figure 11. Training and validation 1DCNN-LSTM loss graph. “Loss” and “val_loss” represent the evolution of the loss measured at each epoch for the training and validation datasets, respectively.

4.4. Assessment of the Statistical Difference of the Predictions

As presented in Section 3.5, the Wilcoxon signed-rank test was employed to assess whether the models' predictions differed in terms of distribution and whether they were statistically equivalent.
Table 13. Wilcoxon signed-rank test assessment of the predictions of the hybrid 1DCNN-LSTM model
against the other models with five-level wavelet transforms. If H0 is rejected, i.e., if p ≤ α, α = 0.05,
the distributions are different.
Model | Stoke p-Value | Sutherland p-Value | Test Result
MLP | 0.00 | 0.00 | Different distribution
LSTM | 0.00 | 0.04 | Different distribution
1D-CNN | 0.00 | 0.00 | Different distribution
5. Conclusions
In the present study, we systematically evaluated different deep learning models, along
with WT, to predict the concentration of PM2.5 up to 24 h ahead in two open-road regions
of Surrey, UK, characterized by the proximity of parks where children and adults perform
recreational activities and by the high vehicle traffic, which are relevant factors with respect
to air pollution monitoring and assessment. The methodology implemented consisted of
developing and validating the use of deep learning associated with WT and comparing the
results of the tested models with those of simpler methodologies. Different deep neural
network topologies were implemented, namely MLP, LSTM, 1D-CNN and 1DCNN-LSTM,
with and without WT, along with a linear regressor model as a baseline. The results showed
that the best performance was achieved by the 1DCNN-LSTM model among all other DNN
architectures, with WT applied on the time series data. The final deep neural network
model captured the real data behavior and presented a good generalization of the problem
in test data, despite being related to a period of data that was never seen by the model
during the training and validation.
WT was implemented with the aim of decomposing the original time-series signals
into several low- and high-frequency components, extracting some information from the
data that was not yet available. This improved the results of all deep neural networks, which is in line with other previously developed studies [12,13,22]. Our results highlight the positive impact of WT with respect to improving DNN performance and show how this approach is appropriate for dealing with complex problems.
Thus, this methodology proved to have great potential for use by academics, authorities, industry and society to construct and validate deep learning models to predict hourly PM2.5 concentrations in advance for the next 24 h with good performance. This
research provides a solid basis for understanding, developing, and evaluating deep learning
models for this task, enabling the adoption of preventive or mitigation actions when
necessary, such as alerting people to avoid highly polluted areas when the predictions of
PM2.5 concentrations reach hazardous levels, avoiding imminent health risks associated
with exposure to air pollutants.
In future studies, this methodology can be assessed in other places and scenarios
under varying conditions to verify its robustness. Furthermore, other deep neural network
approaches and models can be implemented, such as transformers or physics-informed
neural networks (PINNs), including feature augmentation methodologies, to assess their
capability of predicting long-term PM2.5 concentrations with high fidelity.
Author Contributions: Conceptualization, E.G.S.N. and P.K.; methodology, S.L.J.G. and E.G.S.N.;
software, S.L.J.G., J.C.O.M., Y.K.L.K. and F.S.C.; validation, S.L.J.G., E.G.S.N. and D.M.M.; formal
analysis, S.L.J.G., J.C.O.M., Y.K.L.K. and E.G.S.N.; investigation, S.L.J.G., D.M.M. and E.G.S.N.;
resources, P.K., D.M.M. and E.G.S.N.; data curation, S.L.J.G. and Y.K.L.K.; writing—original draft
preparation, S.L.J.G., J.C.O.M., Y.K.L.K. and F.S.C.; writing—review and editing, S.L.J.G., P.K., D.M.M.
and E.G.S.N.; visualization, S.L.J.G. and E.G.S.N.; supervision, E.G.S.N.; project administration,
E.G.S.N.; funding acquisition, P.K., D.M.M. and E.G.S.N. All authors have read and agreed to the
published version of the manuscript.
Funding: This work was partially supported by the Bahia State Research Support Foundation
(Fundação de Amparo à Pesquisa do Estado da Bahia—FAPESB, Brazil) at SENAI CIMATEC, under
project nº CNV 0002/2015. The authors thank the Reference Center on Artificial Intelligence (CRIA)
and the Supercomputing Center for Industrial Innovation (CS2i), both from SENAI CIMATEC, as
well as the NVIDIA/CIMATEC AI Joint Lab, for infrastructure, technical and scientific support. The
authors also thank the iSCAPE (Improving Smart Control of Air Pollution in Europe) project, which
was funded by the European Community’s H2020 Programme (H2020-SC5-04-2015) under Grant
Agreement No. 689954, as well as the team from the University of Surrey’s Global Centre for Clean
Air Research (GCARE), United Kingdom, for providing the data.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data are publicly available at https://www.iscapeproject.eu/,
accessed on 31 August 2022.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Doreswamy, K.S.; Harishkumar, K.M.; Gad, I. Forecasting Air Pollution Particulate Matter (PM2.5 ) Using Machine Learning
Regression Models. Procedia Comput. Sci. 2020, 171, 2057–2066. [CrossRef]
2. World Health Organization. Health Effects of Particulate Matter, Policy Implications for Countries in Eastern Europe, Caucasus
and Central Asia; World Health Organization. Regional Office for Europe. Available online: https://apps.who.int/iris/handle/10
665/344854 (accessed on 31 August 2022).
3. World Health Organization. Air Pollution, The United Nations. Available online: https://www.who.int/health-topics/air-
pollution#tab=tab_2 (accessed on 31 August 2022).
4. World Health Organization. Occupational and Environmental Health Team, Air Quality Guidelines for Particulate Matter, Ozone,
Nitrogen Dioxide, and Sulfur Dioxide: Global Update 2005: Summary of Risk Assessment; World Health Organization: Geneva,
Switzerland, 2006.
5. Zhang, L.; Lin, J.; Qiu, R.; Hu, X.; Zhang, H.; Chen, Q.; Tan, H.; Lin, D.; Wang, J. Trend analysis and forecast of PM2.5 in Fuzhou,
China using the ARIMA model. Ecol. Indic. 2018, 95, 702–710. [CrossRef]
6. Badicu, A.; Suciu, G.; Balanescu, M.; Dobrea, M.; Birdici, A.; Orza, O.; Pasat, A. PMs concentration forecasting using ARIMA algo-
rithm. In Proceedings of the IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020.
[CrossRef]
7. Reis, A.S., Jr.; Nascimento, E.G.S.; Moreira, D.M. Assessing recurrent and convolutional neural networks for tropospheric ozone
forecasting in the region of Vitória, Brazil. WIT Trans. Ecol. Environ. 2020, 244, 101–112. [CrossRef]
8. Alves, L.V.B.; Nascimento, E.G.S.; Moreira, D.M. Hourly tropospheric ozone concentration forecasting using deep learning. WIT
Trans. Ecol. Environ. 2019, 236, 129–138. [CrossRef]
9. Ida, K.A.; Majid, S.; Alireza, S. Statistical models for multi-step-ahead forecasting of fine particulate matter in urban areas. Atmos.
Pollut. Res. 2019, 10, 689–700. [CrossRef]
10. Yang, G.; Lee, H.; Lee, G. A hybrid deep learning model to forecast particulate matter concentration levels in Seoul, South Korea.
Atmosphere 2020, 11, 348. [CrossRef]
11. Wang, P.; Zhang, G.; Chen, F.; He, Y. A hybrid-wavelet model applied for forecasting PM2.5 concentrations in Taiyuan city, China.
Atmos. Pollut. Res. 2019, 10, 1884–1894. [CrossRef]
12. Bai, Y.; Li, Y.; Wang, X.; Xie, J.; Li, C. Air pollutants concentrations forecasting using back propagation neural network based on
wavelet decomposition with meteorological conditions. Atmos. Pollut. Res. 2016, 7, 557–566. [CrossRef]
13. Qiao, W.; Tian, W.; Tian, Y.; Yang, Q.; Wang, Y.; Zhang, J. The forecasting of PM2.5 using a hybrid model based on wavelet
transform and an improved deep learning algorithm. IEEE Access 2019, 7, 142814–142825. [CrossRef]
14. Huang, C.-J.; Kuo, P.-H. A deep cnn-lstm model for particulate matter (PM2.5 ) forecasting in smart cities. Sensors 2018, 18, 2220.
[CrossRef]
15. Li, T.; Hua, M.; Wu, X. A Hybrid CNN-LSTM Model for Forecasting Particulate Matter (PM2.5 ). IEEE Access 2020, 8, 26933–26940.
[CrossRef]
16. Zohre, E.K.; Ruhollah, T.M.; Mohamad, K.; Ali, R.N. Predicting the ground-level pollutants concentrations and identifying the
influencing factors using machine learning, wavelet transformation, and remote sensing techniques. Atmos. Pollut. Res. 2021,
12, 101064. [CrossRef]
17. Mirzadeh, S.M.; Nejadkoorki, F.; Mirhoseini, S.A.; Moosavi, V. Developing a wavelet-AI hybrid model for short- and long-term
predictions of the pollutant concentration of particulate matter10. Int. J. Environ. Sci. Technol. 2022, 19, 209–222. [CrossRef]
18. Liu, B.; Yu, X.; Chen, J.; Wang, Q. Air pollution concentration forecasting based on wavelet transform and combined weighting
forecasting model. Atmos. Pollut. Res. 2021, 12, 101144. [CrossRef]
19. Kim, J.; Wang, X.; Kang, C.; Yu, J.; Li, P. Forecasting air pollutant concentration using a novel spatiotemporal deep learning model
based on clustering, feature selection and empirical wavelet transform. Sci. Total Environ. 2021, 801, 149654. [CrossRef]
20. Araujo, M.L.S.; Kitagawa, Y.K.L.; Moreira, D.M.; Nascimento, E.G.S. Forecasting Tropospheric Ozone Using Neural Networks
and Wavelets: Case Study of a Tropical Coastal-Urban Area. In Computational Intelligence Methodologies Applied to Sustainable
Development Goals; Studies in Computational Intelligence; Verdegay, J.L., Brito, J., Cruz, C., Eds.; Springer: Cham, Switzerland,
2022; Volume 1036. [CrossRef]
21. Abhijith, K.V.; Prashant, K. Field investigations for evaluating green infrastructure effects on air quality in open-road conditions.
Atmos. Environ. 2019, 201, 132–147. [CrossRef]
22. Zucatelli, P.J.; Nascimento, E.G.S.; Santos, A.Á.B.; Arce, A.M.G.; Moreira, D.M. An investigation on deep learning and wavelet
transform to nowcast wind power and wind power ramp: A case study in Brazil and Uruguay. Energy 2021, 230, 120842.
[CrossRef]
23. Le, X.H.; Ho, H.V.; Lee, G.; Jung, S. Application of Long Short-Term Memory (LSTM) neural network for flood forecasting. Water
2019, 11, 1387. [CrossRef]
24. Paolo, A.; Adriano, B.; Maide, B.; Luigi, F. Image processing for medical diagnosis using CNN. Nucl. Instrum. Methods Phys. Res.
Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2003, 497, 174–178. [CrossRef]
25. National Instruments. Understanding FFTs and Windowing, Technical Report. Available online: https://www.ni.com/pt-br/
innovations/white-papers/06/understanding-ffts-and-windowing.html (accessed on 31 August 2022).
26. Graps, A. An Introduction to Wavelets. IEEE Comput. Sci. Eng. 1995, 2, 50–61. [CrossRef]
27. Sifuzzaman, M.; Islam, M.R.; Ali, M.Z. Application of Wavelet Transform and its Advantages Compared to Fourier Transform.
J. Phys. Sci. 2009, 13, 121–134.
28. Hoshmand, A.R. Business Forecasting: A Practical Approach, 2nd ed.; Routledge: New York, NY, USA, 2010; ISBN 978-1592576128.
29. Corder, G.W.; Foreman, D.I. Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach, 1st ed.; Wiley: New York, NY,
USA, 2009; ISBN 978-0470454619.