Article
A Machine Learning Approach for Air Quality
Prediction: Model Regularization and Optimization
Dixian Zhu 1, *, Changjie Cai 2 , Tianbao Yang 1 and Xun Zhou 3
1 Department of Computer Science, University of Iowa, Iowa City, IA 52242, USA; [email protected]
2 Department of Occupational and Environmental Health, University of Oklahoma Health Sciences Center,
Oklahoma City, OK 73104, USA; [email protected]
3 Department of Management Sciences, University of Iowa, Iowa City, IA 52242, USA; [email protected]
* Correspondence: [email protected]
Abstract: In this paper, we tackle air quality forecasting by using machine learning approaches
to predict the hourly concentration of air pollutants (e.g., ozone, particulate matter (PM2.5) and sulfur
dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model
on big data by using large-scale optimization algorithms. Although there exist some works applying
machine learning to air quality prediction, most of the prior studies are restricted to several-year
data and simply train standard regression models (linear or nonlinear) to predict the hourly air
pollution concentration. In this work, we propose refined models to predict the hourly air pollution
concentration on the basis of meteorological data of previous days by formulating the prediction over
24 h as a multi-task learning (MTL) problem. This enables us to select a good model with different
regularization techniques. We propose a useful regularization by enforcing the prediction models of
consecutive hours to be close to each other and compare it with several typical regularizations for
MTL, including standard Frobenius norm regularization, nuclear norm regularization, and ℓ2,1-norm
regularization. Our experiments show that the proposed parameter-reducing formulations
and consecutive-hour-related regularizations achieve better performance than existing standard
regression models and existing regularizations.
1. Introduction
Adverse health impacts from exposure to outdoor air pollutants are complicated functions
of pollutant compositions and concentrations [1]. Major outdoor air pollutants in cities include
ozone (O3), particulate matter (PM), sulfur dioxide (SO2), carbon monoxide (CO), nitrogen oxides
(NOx ), volatile organic compounds (VOCs), pesticides, and metals, among others [2,3]. Increased
mortality and morbidity rates have been found in association with increased air pollutants (such as O3 ,
PM and SO2 ) concentrations [3–5]. According to the report from the American Lung Association [6],
a 10 parts per billion (ppb) increase in the O3 mixing ratio might cause over 3700 premature deaths
annually in the United States (U.S.). Chicago, like many other megacities in the U.S., has struggled
with air pollution as a result of industrialization and urbanization. Although O3 precursor (such as
VOCs, NOx , and CO) emissions have significantly decreased since the late 1970s, O3 levels in Chicago
have not been in compliance with standards set by the Environmental Protection Agency (EPA) to
protect public health [7]. Particle size is critical in determining the particle deposition location in the
human respiratory system [8]. PM2.5 , referring to particles with a diameter less than or equal to 2.5 µm,
has been an increasing concern, as these particles can be deposited into the lung gas-exchange region,
the alveoli [9]. The U.S. EPA revised the annual standard of PM2.5 by lowering the concentration to
12 µg/m3 to provide improved protection against health effects associated with long- and short-term
exposure [10]. SO2 , as an important precursor of new particle formation and particle growth, has also
been found to be associated with respiratory diseases in many countries [11–15]. Therefore, we selected
O3 , PM2.5 and SO2 for testing in this study.
Meteorological conditions, including regional and synoptic meteorology, are critical in
determining the air pollutant concentrations [16–21]. According to the study by Holloway et al. [22],
the O3 concentration over Chicago was found to be most sensitive to air temperature, wind speed
and direction, relative humidity, incoming solar radiation, and cloud cover. For example, a lower
ambient temperature and incoming solar radiation slow down photochemical reactions and lead to less
secondary air pollutants, such as O3 [23]. Increasing wind speed could either increase or decrease the
air pollutant concentrations. For instance, when the wind speed was low (weak dispersion/ventilation),
the pollutants associated with traffic were found at the highest concentrations [24,25]. However, strong
wind speeds might form dust storms by blowing up the particles on the ground [26]. High humidity is
usually associated with high concentrations of certain air pollutants (such as PM, CO and SO2 ) but
with low concentrations of other air pollutants (such as NO2 and O3 ) because of various formation and
removal mechanisms [25]. In addition, high humidity can be an indicator of precipitation events, which
result in strong wet deposition leading to low concentrations of air pollutants [27]. Because various
particle compositions and their interactions with light were found to be the most important factors in
attenuating visibility [28,29], low visibility could be an indicator of high PM concentrations. Cloud can
scatter and absorb solar radiation, which is significant for the formation of some air pollutants (e.g.,
O3 ) [23,30]. Therefore, these important meteorological variables were selected to predict air pollutant
concentrations in this study.
Statistical models have been applied for air pollution prediction on the basis of meteorological
data [31–35]. However, existing studies on statistical modeling have mostly been restricted to simply
utilizing standard classification or regression models, which have neglected the nature of the problem
itself or ignored the correlation between sub-models in different time slots. On the other hand, machine
learning approaches have been developed for over 60 years and have achieved tremendous success
in a variety of areas [36–41]. There exist various new tools and techniques invented by the machine
learning community, which allow for more refined modeling of a specific problem. In particular,
model regularization is a fundamental technique for improving the generalization performance of
a predictive model. Accordingly, many efficient optimization algorithms have been developed for
solving various machine learning formulations with different regularizations.
In this study, we focus on refined modeling for predicting hourly air pollutant concentrations
on the basis of historical meteorological data and air pollution data. A striking difference between this
work and the previous works is that we emphasize how to regularize the model in order to improve
its generalization performance and how to learn a complex regularized model from big data with
advanced optimization algorithms. We collected 10 years worth of meteorological and air pollution
data from the Chicago area. The air pollutant data was from the EPA [42,43], and the meteorological
data was from MesoWest [44]. From their databases, we fetched consecutive hourly measurements
of various meteorological variables and pollutants reported by two weather stations
and two air pollutant monitoring sites in the Chicago area. Each record of hourly measurements
included meteorological variables such as solar radiation, wind direction and speed, temperature,
and atmospheric pressure; as well as air pollutants, including PM2.5 , O3 , and SO2 . We used two
methods for model regularization: (i) explicitly controlling the number of parameters in the model;
(ii) explicitly enforcing a certain structure in the model parameters. For controlling the number of
parameters in the model, we compared three different model formulations, which can be considered in
a unified multi-task learning (MTL) framework with a diagonal- or full-matrix model. For enforcing
the model matrix into a certain structure, we have considered the relationship between prediction
models of different hours and compared three different regularizations with standard Frobenius
norm regularization. The experimental results show that the model with the intermediate size and
the proposed regularization, which enforces the prediction models of two consecutive hours to be
Big Data Cogn. Comput. 2018, 2, 5 3 of 15
close, achieved the best results and was far better than standard regression models. We have also
developed efficient optimization algorithms for solving different formulations and demonstrated their
effectiveness through experiments.
The rest of the paper is organized as follows. In Section 2, we discuss related work. In Section 3,
we describe the data collection and preprocessing. In Section 4, we describe the proposed solutions,
including formulations, regularizations and optimizations. In Section 5, we present the experimental
studies and the results. In Section 6, we give conclusions and indicate future work.
2. Related Work
Many previous works have applied machine learning algorithms to air
quality prediction. Some researchers have aimed to predict targets as discretized levels.
Kalapanidas et al. [32] studied the effects on air pollution of meteorological features such
as temperature, wind, precipitation, solar radiation, and humidity, and classified air pollution into
different levels (low, med, high, and alarm) by using a lazy learning approach, the case-based reasoning
(CBR) system. Athanasiadis et al. [45] employed the σ-fuzzy lattice neurocomputing classifier to predict
and categorize O3 concentrations into three levels (low, mid, and high) on the basis of meteorological
features and other pollutants such as SO2 , NO, NO2 , and so on. Kurt and Oktay [33] modeled
geographic connections into a neural network model and predicted daily concentration levels of SO2 ,
CO, and PM10 3 days in advance. However, the process of converting regression tasks to classification
tasks is problematic, as it ignores the magnitude of the numeric data and consequently is inaccurate.
Other researchers have worked on predicting concentrations of pollutants. Corani [46] worked on
training neural network models to predict hourly O3 and PM10 concentrations on the basis of data from
the previous day, mainly comparing the performances of feed-forward neural networks (FFNNs)
and pruned neural networks (PNNs). Further efforts have been made on FFNNs: Fu et al. [47] applied
a rolling mechanism and gray model to improve traditional FFNN models. Jiang et al. [48] explored
multiple models (physical and chemical model, regression model, and multiple layer perceptron) on
the air pollutant prediction task, and their results show that statistical models are competitive with the
classical physical and chemical models. Ni et al. [49] compared multiple statistical models on the
basis of PM2.5 data around Beijing, and their results implied that linear regression models can in some
cases be better than the other models.
MTL focuses on jointly learning multiple tasks that share commonalities [50], which can improve the efficiency
and accuracy of the models. It has achieved tremendous success in many fields, such as natural
language processing [37], image recognition [38], bioinformatics [39,40], marketing prediction [41], and so
on. A variety of regularizations can be utilized to enhance the commonalities of the related tasks, including
the ℓ2,1-norm [51], nuclear norm [52], spectral norm [53], Frobenius norm [54], and so on. However, most
of the prior machine learning works on air pollutant prediction did not consider the similarities
between the models and only focused on improving the model performance for a single task, that is,
improving prediction performance for each hour either separately or identically.
Therefore, we decided to use meteorological and pollutant data to perform predictions of hourly
concentrations on the basis of linear models. In this work, we focused on three different prediction
model formulations and used the MTL framework with different regularizations. To the best of
our knowledge, this is the first work that has utilized MTL for the air pollutant prediction task.
We exploited analytical approaches and optimization techniques to obtain the optimal solutions.
The model’s evaluation metric was the root-mean-squared error (RMSE).
3. Data Collection and Preprocessing
3.1. Data Collection
The air pollutant data used in this study included the concentrations of O3, PM2.5 and SO2. We downloaded the air pollutant data from
the U.S. EPA’s Air Quality System (AQS) database (https://www.epa.gov/outdoor-air-quality-data),
which has been widely used for model evaluation [42,43]. We selected the meteorological variables
that would affect the air pollutant concentrations, including air temperature, relative humidity, wind
speed and direction, wind gust, precipitation accumulation, visibility, dew point, wind cardinal
direction, pressure, and weather conditions. We downloaded the meteorological data from MesoWest
(http://mesowest.utah.edu/), a project within the Department of Meteorology at the University of
Utah, which has been aggregating meteorological data since 2002 [44].
The locations of the two air quality monitoring sites and two weather stations are shown
in Figure 1. The Alsip Village (AV) air quality monitoring site is located in a suburban
residential area in southern Cook County, Illinois (AQS ID: 17-031-0001; latitude/longitude:
41.670992/−87.732457). The Lemont Village (LV) air quality monitoring site is also located in a
suburban residential area, in southwestern Cook County, Illinois (AQS ID: 17-031-1601;
latitude/longitude: 41.66812/−87.99057). The weather station situated at Lansing Municipal Airport
(LMA) is the closest meteorological site (MesoWest ID: KIGQ; latitude/longitude: 41.54125/−87.52822)
to the AV air quality monitoring site. The weather station positioned at Lewis University (LU) is the
closest meteorological site (MesoWest ID: KLOT; latitude/longitude: 41.60307/−88.10164) to the LV
air quality monitoring site.
Figure 1. Locations of measurement sites. Blue stars denote the two air quality monitoring sites.
Red circles denote the two meteorological sites.
3.2. Preprocessing
We paired the collected meteorological data and air pollutant data on the basis of time to obtain
the required data format for applying the machine learning methods. In particular, for each variable,
we formed one value for each hour. However, the original data may have contained multiple records
or missing values at some hours. To preprocess the data, we calculated the hourly mean value of each
numeric variable if there were multiple observed records within an hour and chose the category with
the highest frequency per hour for each categorical variable if there were multiple values. Missing
values existed for some variables, which was not tolerable for applying the machine learning methods
used in this study. Therefore, we imputed the missing values by using the closest-neighbor values
for four continuous variables and one categorical variable: wind gust, pressure, altimeter reading,
precipitation, and weather conditions. We deleted the days that still had missing values after imputation.
We applied dummy coding for two categorical variables, the cardinal wind direction (16 values, e.g.,
N, S, E, W, etc.) and weather conditions (31 values, e.g., sunny, rainy, windy, etc.). Then, we added the
weekday and weekend as two boolean features. Finally, we obtained 60 features in total (9 numerical
meteorological features, 16 dummy codings for wind direction, 31 dummy codings for weather
conditions, 2 boolean features for weekday/weekend, 1 numerical feature for pollutants, and 1 bias
term). We applied normalization for all the features and pollutant targets to make their values fall in
the range [0, 1].
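The preprocessing steps above can be sketched with pandas (a minimal illustration on toy records; the column names and values are hypothetical and stand in for the full variable set):

```python
import numpy as np
import pandas as pd

# Toy records: two observations fall within the 13:00 hour, one within 14:00.
raw = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-07-01 13:05", "2016-07-01 13:45", "2016-07-01 14:10",
    ]),
    "temp": [20.0, 22.0, 25.0],        # numeric variable
    "wind_dir": ["N", "N", "E"],       # categorical variable
})

hourly = raw.set_index("time").resample("h").agg({
    "temp": "mean",                            # hourly mean for numeric variables
    "wind_dir": lambda s: s.mode().iloc[0],    # most frequent category per hour
})

# Min-max normalization to [0, 1] for the numeric feature.
t = hourly["temp"]
hourly["temp"] = (t - t.min()) / (t.max() - t.min())
print(hourly)
```

Closest-neighbor imputation of the remaining gaps would correspond to forward/backward filling (e.g., `ffill`/`bfill`) on the affected columns before deleting still-incomplete days.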
4. Proposed Solutions
The 24 h prediction problem is formulated as a regularized learning problem:

$$\min_{W} \frac{1}{n}\sum_{i=1}^{n}\big(f(W, x_i) - y_i\big)^2 + \varphi(W) \qquad (1)$$

where W denotes the parameters of the model, $f(W, x_i)$ denotes the prediction of the air pollutant
concentration with observed value $y_i$, and ϕ(·) denotes a regularization function of the model parameters W.
Next, we introduce two levels of model regularization. The first level is to explicitly control the
number of model parameters. The second level is to explicitly impose a certain regularization on the
model parameter. For the first level, we consider three models that are described below:
• Baseline Model. The first model is a baseline model that has been considered in existing studies
and has the fewest number of parameters. In particular, the prediction of the air pollutant
concentration is given by
$$f_k(W, x_i) = \sum_{j=1}^{D} e_k^\top u_{i,j} \cdot w_j + e_k^\top v_i \cdot w_{D+1} + w_0, \quad k = 1, \ldots, 24$$

where $e_k \in \mathbb{R}^{24}$ is a basis vector with 1 at only the kth position and 0 at other positions;
$w_0, w_1, \ldots, w_D, w_{D+1} \in \mathbb{R}$ are the model parameters, where $w_0$ is the bias term. We denote this
model by $W = (w_0, w_1, \ldots, w_{D+1})^\top$. It is notable that this model predicts the hourly concentration
on the basis of the same hourly historical data of the previous day and that it has D + 2 parameters.
This simple model assumes that all 24 h share the same model parameter.
• Heavy Model. The second model takes all the data of the previous day into account when
predicting the concentration of every hour of the second day. In particular, for the kth hour,
the prediction is given by
$$f_k(W, x_i) = \sum_{j=1}^{D} u_{i,j}^\top w_{k,j} + v_i^\top w_{k,D+1} + w_{k,0}, \quad k = 1, \ldots, 24$$
We note that each column of W corresponds to the prediction model for each hour. There are
a total of 24 × (24 × (D + 1) + 1) parameters. It is notable that the baseline model is a special
case obtained by enforcing all columns of W to be the same and each $w_{k,j}$ to have only one non-zero
element at the kth position.
• Light Model. The third model is between the baseline model and the heavy model. It considers
the 24 h pattern of the air pollutants in the previous day and the same hourly meteorological data
of the previous day to predict the concentration at a particular hour. The prediction is given by
$$f_k(W, x_i) = \sum_{j=1}^{D} e_k^\top u_{i,j} \cdot w_{k,j} + v_i^\top w_{k,D+1} + w_{k,0}, \quad k = 1, \ldots, 24$$
It is also notable that each column corresponds to the predictive model for one hour and that W
has a total of 24 × (D + 1) + 24 × 24 parameters.
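To make the size difference between the three formulations concrete, their parameter counts can be computed directly from the formulas above (the value of D used here is only illustrative; the actual D depends on the final feature set):

```python
def baseline_params(D):
    # Shared model: w_0 (bias), w_1, ..., w_D, and w_{D+1}.
    return D + 2

def heavy_params(D):
    # Per hour k: a 24-dim weight vector for each of the D + 1 inputs, plus a bias.
    return 24 * (24 * (D + 1) + 1)

def light_params(D):
    # Per hour k: same-hour feature weights and bias (D + 1 scalars),
    # plus a 24-dim weight vector over the previous day's pollutant pattern.
    return 24 * (D + 1) + 24 * 24

D = 58  # illustrative feature dimension
print(baseline_params(D), light_params(D), heavy_params(D))
```

The light model sits roughly an order of magnitude between the other two, which is the "intermediate size" referred to later in the experiments.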
• ℓ2,1-norm regularization. The ℓ2,1-norm of W is defined as

$$\|W\|_{2,1} = \sum_{j=1}^{d} \|W_{j,*}\|_2$$

where $W_{j,*}$ denotes the jth row of W. We consider an ℓ2,1-norm regularizer $\varphi(W) = \lambda\|W\|_{2,1}$.
• Nuclear norm regularization. The nuclear norm is defined as the sum of singular values of
a matrix, which is a standard regularization for enforcing a matrix to have a low rank. The
motivation for using a low-rank matrix is that models for consecutive hours are highly correlated,
which could render the matrix W low-rank. We denote by $\|W\|_*$ the nuclear norm of a matrix
W; the regularization is $\varphi(W) = \lambda\|W\|_*$.
• Consecutive close (CC) regularization. Finally, we propose a useful regularization for the
considered problem that explicitly enforces the predictive models for two consecutive hours
to be close to each other. The intuition is that usually the concentrations of air pollutants for two
consecutive hours are close to each other. We denote the model by $W = (w_1, \ldots, w_K)$ and
$\mathrm{Cons}(W) = [(w_1 - w_2), (w_2 - w_3), \ldots, (w_{K-1} - w_K)]$. The CC regularization is given by

$$\varphi(W) = \lambda \sum_{j=1}^{K-1} \|w_j - w_{j+1}\|_p^p \qquad (2)$$

where p = 1 or p = 2.
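The three structured regularizers can each be written in a few lines of NumPy (a sketch; the matrix and λ values are arbitrary examples):

```python
import numpy as np

def l21_norm(W):
    # Sum of Euclidean norms of the rows of W; encourages whole rows to be zero.
    return np.sum(np.linalg.norm(W, axis=1))

def nuclear_norm(W):
    # Sum of singular values; encourages W to be low-rank.
    return np.sum(np.linalg.svd(W, compute_uv=False))

def cc_reg(W, lam, p=2):
    # lam * sum_j ||w_j - w_{j+1}||_p^p over consecutive columns (Equation (2)).
    diffs = W[:, :-1] - W[:, 1:]
    return lam * np.sum(np.abs(diffs) ** p)

W = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0]])
print(l21_norm(W))               # row norms 5 and 0 -> 5.0
print(nuclear_norm(W))           # rank-1 matrix with singular value 5 -> 5.0
print(cc_reg(W, lam=0.1, p=1))   # 0.1 * (|3-4| + |4-0|) = 0.5
```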
$$W_t' = W_{t-1} - 2\eta_s \frac{\partial F(W_{t-1}, x_i)}{\partial W_{t-1}}^{\top} e\,\big(F(W_{t-1}, x_i) - Y_i\big) \qquad (3)$$

where $\eta_s$ is the stage-wise step size, i is a sampled index, and e is a vector with 1 for all its elements.
Then a proximal mapping is applied (denoting $\tilde{\lambda} = 2\eta_s \lambda$):

$$W_t = \arg\min_{W} \frac{1}{2}\|W - W_t'\|_F^2 + \tilde{\lambda}\|W\|_{2,1} \qquad (4)$$
The above problem has an analytical solution. We denote by $w_i$ the ith column vector of $W^\top$ and by $w_i'$ the
ith column vector of ${W_t'}^\top$. Then the solution to Equation (4) can be computed by the following [51]:

$$w_i = \begin{cases} \left(1 - \dfrac{\tilde{\lambda}}{\|w_i'\|_2}\right) w_i', & \tilde{\lambda} > 0,\ \|w_i'\|_2 > \tilde{\lambda} \\ 0, & \tilde{\lambda} > 0,\ \|w_i'\|_2 \le \tilde{\lambda} \\ w_i', & \tilde{\lambda} = 0 \end{cases} \qquad (5)$$
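Equation (5) is the standard group soft-thresholding operator, which can be implemented row-wise (a sketch):

```python
import numpy as np

def prox_l21(W_prime, lam_tilde):
    # Row-wise solution of Equation (4): shrink each row of W' toward zero,
    # zeroing rows whose Euclidean norm is at most lam_tilde.
    if lam_tilde == 0:
        return W_prime.copy()
    W = np.zeros_like(W_prime)
    norms = np.linalg.norm(W_prime, axis=1)
    keep = norms > lam_tilde
    W[keep] = (1.0 - lam_tilde / norms[keep])[:, None] * W_prime[keep]
    return W

W_prime = np.array([[3.0, 4.0],    # norm 5 > 1 -> scaled by (1 - 1/5)
                    [0.3, 0.4]])   # norm 0.5 <= 1 -> set to zero
print(prox_l21(W_prime, 1.0))
```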
Algorithm 1: ASSG method with proximal mapping for solving the ℓ2,1-norm regularized model.
Input: X, Y, W0, η0, S, and T
for s = 1, . . . , S do
    ηs = ηs−1/2
    for t = 1, . . . , T do
        sample i ∈ {1, . . . , n}
        update W′t using Equation (3)
        update Wt using Equation (4)
    end
    $W_0 = \sum_{t=1}^{T} W_t / T$
end
Output: W0
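Algorithm 1 can be sketched on synthetic data as follows (a simplified stand-in for the paper's setting, with F(W, x) = x⊤W for W ∈ R^{d×K}; the step sizes, stage counts, and data are illustrative):

```python
import numpy as np

def group_soft_threshold(W, thr):
    # Proximal mapping of Equation (4): row-wise shrinkage, as in Equation (5).
    if thr == 0:
        return W.copy()
    out = np.zeros_like(W)
    norms = np.linalg.norm(W, axis=1)
    keep = norms > thr
    out[keep] = (1.0 - thr / norms[keep])[:, None] * W[keep]
    return out

def assg_l21(X, Y, W0, eta0, lam, S=4, T=500, seed=0):
    rng = np.random.default_rng(seed)
    W, eta = W0.copy(), eta0
    for s in range(S):
        eta /= 2.0                                          # eta_s = eta_{s-1} / 2
        W_sum = np.zeros_like(W)
        for t in range(T):
            i = rng.integers(len(X))                        # sample i in {1, ..., n}
            resid = X[i] @ W - Y[i]                         # F(W, x_i) - Y_i
            W_prime = W - 2.0 * eta * np.outer(X[i], resid)     # Equation (3)
            W = group_soft_threshold(W_prime, 2.0 * eta * lam)  # Equation (4)
            W_sum += W
        W = W_sum / T    # average the iterates and restart the next stage from it
    return W

rng = np.random.default_rng(1)
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # last row is irrelevant
X = rng.normal(size=(50, 3))
Y = X @ W_true
W_hat = assg_l21(X, Y, np.zeros((3, 2)), eta0=0.1, lam=0.01)
```

On this noiseless toy problem, the recovered `W_hat` is close to `W_true`, with the irrelevant third row driven toward zero by the row-wise shrinkage.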
Then stochastic gradient descent and ascent are used to update W and U at each iteration:
$$W_t = W_{t-1} - \eta_{t-1}\Big(2\frac{\partial F(W_{t-1}, x_i)}{\partial W_{t-1}}^{\top} e\,\big(F(W_{t-1}, x_i) - Y_i\big) + \lambda U_{t-1}\Big)$$
$$U_t = U_{t-1} + \tau_{t-1}\big(\lambda W_{t-1} - \rho\,\partial[\|U_{t-1}\|_2 - 1]_+\big) \qquad (6)$$

where $\rho \ge \|Y\|_F^2$ and $\partial[\|U_t\|_2 - 1]_+$ can be computed as $u_1 v_1^\top \mathbb{1}[\sigma_1 > 1]$, with $(u_1, v_1)$ being the top left
and right singular vectors of $U_t$ and $\sigma_1$ being the top singular value. The pseudocode for the algorithm
is as follows:
Algorithm 2: Stochastic primal-dual method for solving the nuclear norm regularized model.
Input: X, Y, W0, U0, η, τ, and T
for t = 1, . . . , T do
    sample i ∈ {1, . . . , n}
    update Wt and Ut using Equation (6)
end
Output: WT
Here, $E = (\hat{e}_1, \ldots, \hat{e}_{K-1})$, where $\hat{e}_i = (0, \ldots, 1, -1, \ldots, 0)^\top$, $i = 1, \ldots, K-1$, has its ith element equal to 1 and its
(i + 1)th element equal to −1. Therefore, Cons(W) = WE. A dummy variable U = WE is introduced to
decouple the last term from the first term, and the augmented Lagrangian function is formed as follows:
$$\mathcal{L}(W, U, \Lambda) = \frac{1}{n}\sum_{i=1}^{n} \|F(W, x_i) - Y_i\|_2^2 + \lambda\|U\|_{1,1} - \mathrm{tr}\big(\Lambda^\top (WE - U)\big) + \frac{\beta}{2}\|WE - U\|_F^2 \qquad (7)$$
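The matrix E and the identity Cons(W) = WE can be checked numerically (a sketch with K = 5 standing in for the 24 hourly models):

```python
import numpy as np

K = 5  # stands in for the K = 24 hourly models
# Column i of E has 1 at position i and -1 at position i + 1.
E = np.zeros((K, K - 1))
for i in range(K - 1):
    E[i, i] = 1.0
    E[i + 1, i] = -1.0

W = np.arange(2.0 * K).reshape(2, K)   # an arbitrary 2 x K model matrix
cons = W[:, :-1] - W[:, 1:]            # [(w_1 - w_2), ..., (w_{K-1} - w_K)]
print(np.allclose(W @ E, cons))        # Cons(W) = W E
```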
Algorithm 3: LA-SADMM for solving the consecutive close (CC) regularized problem with the ℓ1-norm.
Input: X, Y, W0, U0, Λ0, β1, η1, S, and T
for s = 1, . . . , S do
    for τ = 1, . . . , T do
        sample i ∈ {1, . . . , n}
        update Wτ, Uτ, and Λτ using Equation (8)
    end
    $W_T = \sum_{\tau=1}^{T} W_\tau / T$
    W0 = WT, U0 = UT, and Λ0 = ΛT
    βs+1 = 2βs, and ηs+1 = ηs/2
end
Output: WT
5. Experiments
We denote the two datasets by the names of the paired weather stations and air quality
monitoring sites, that is, LU–LV and LMA–AV. LU–LV contained the data to predict the concentration
of the two air pollutants O3 and SO2 . LMA–AV contained the data to predict the concentration of the
two air pollutants O3 and PM2.5 .
We compared 11 different models that were learned with different combinations of model
formulations and regularizations. The 11 models were the following:
• Baseline: the baseline model with standard Frobenius norm regularization.
• Heavy–F: the heavy model with standard Frobenius norm regularization.
• Light–F: the light model with standard Frobenius norm regularization.
• Heavy–ℓ2,1: the heavy model with ℓ2,1-norm regularization.
• Heavy–nuclear: the heavy model with nuclear-norm regularization.
• Heavy–CCL2: the heavy model with CC regularization using the `2 -norm.
• Heavy–CCL1: the heavy model with CC regularization using the `1 -norm.
• Light–ℓ2,1: the light model with ℓ2,1-norm regularization.
• Light–nuclear: the light model with nuclear-norm regularization.
• Light–CCL2: the light model with CC regularization using the `2 -norm.
• Light–CCL1: the light model with CC regularization using the `1 -norm.
It is noteworthy that we also added the standard Frobenius norm regularizer for the
heavy/light–nuclear, –CCL2, and –CCL1 models, because their regularizers were mainly considered
for controlling the similarities of submodels and may not have been enough for preventing overfitting.
We divided each dataset into two parts: training data and testing data. Each model was trained on
the training data with proper regularization parameters and the learning rate selected on the basis
of 5-fold cross-validation. Each trained model was evaluated on the testing data. The splitting of the
data was done by dividing all days into a number of chunks of 11 consecutive days, for which the first
8 days were used for training and the next 3 days were used for testing. We have used the RMSE as
the evaluation metric.
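The chunked split can be sketched as follows (over day indices only; this sketch drops any trailing partial chunk, which the paper does not specify):

```python
def chunk_split(num_days, chunk=11, train_days=8):
    # First 8 days of every 11-day chunk go to training, the remaining 3 to testing.
    train, test = [], []
    for start in range(0, num_days - chunk + 1, chunk):
        train.extend(range(start, start + train_days))
        test.extend(range(start + train_days, start + chunk))
    return train, test

train, test = chunk_split(22)
print(len(train), len(test))  # 16 6
```

Splitting by blocks of consecutive days, rather than by random hours, keeps temporally adjacent (and hence correlated) records from leaking between training and testing.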
We first report the improvement of each method over the baseline method. The improvement
was measured by a positive or negative percentage over the performance of the baseline method,
that is, (RMSE of compared method - RMSE of the baseline method)×100/RMSE of the baseline
method. The results are shown in Figures 2 and 3. To facilitate the comparison between different
methods, for each air pollutant of each dataset, we report two figures, with one grouping the results by
regularizations and the other grouping the results by the model formulations. From the results, we can
see that (i) the light model formulation had a clear advantage over the heavy model formulation and
the baseline model formulation, which implied that controlling the number of parameters is important
for improving generalization performance; and (ii) the proposed CC regularization yielded a better
performance than other regularizations, which verified that considering the similarities between
models of consecutive hours is helpful. We also report the exact RMSE of each method in Table 2.
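The evaluation metric and the improvement percentage above can be computed as follows (note the sign convention of the formula: a method with lower RMSE than the baseline gets a negative percentage):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-squared error.
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def improvement_pct(rmse_method, rmse_baseline):
    # (RMSE of compared method - RMSE of the baseline method) * 100 / RMSE of the baseline method.
    return (rmse_method - rmse_baseline) * 100.0 / rmse_baseline

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # sqrt(4/3) ~ 1.1547
print(improvement_pct(9.0, 10.0))               # -10.0
```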
[Figure 2 panels: improving percentage (%) over the baseline for each pollutant of the LU–LV dataset, grouped by regularization (F, ℓ2,1, Nuclear, CCL2, CCL1) and by model formulation (Heavy, Light).]
Figure 2. Improvement of different methods over the baseline method for Lewis University–Lemont
Village (LU–LV) dataset.
[Figure 3 panels: improving percentage (%) over the baseline for each pollutant of the LMA–AV dataset, grouped by regularization (F, ℓ2,1, Nuclear, CCL2, CCL1) and by model formulation (Heavy, Light).]
Figure 3. Improvement of different methods over the baseline method for Lansing Municipal
Airport–Alsip Village (LMA–AV) dataset.
Table 2. Root-mean-squared error (RMSE) for all approaches and datasets. The best approaches are
marked in bold.
Finally, we compared the convergence speed of the employed optimization algorithms with their
standard counterparts: ASSG versus SSG for optimizing the ℓ2,1-norm regularized problem, the
stochastic primal-dual method versus SSG for the nuclear norm regularized problem, and LA-SADMM
versus SADMM for the CC regularized problem. The results are plotted in Figure 4 and demonstrate
that the employed advanced optimization techniques converged much faster than the classical
techniques.
[Figure 4: objective value versus the number of iterations for the compared optimization algorithms.]
6. Conclusions
In this paper, we have developed efficient machine learning methods for air pollutant prediction.
We have formulated the problem as regularized MTL and employed advanced optimization algorithms
for solving different formulations. We have focused on reducing model complexity by controlling the
number of model parameters and on improving the performance by using a structured regularizer.
Our results show that the proposed light formulation achieves much better performance than the
other two model formulations and that the regularization by enforcing prediction models for two
consecutive hours to be close can also boost the performance of predictions. We have also shown that
advanced optimization techniques are important for improving the convergence of optimization and
that they speed up the training process for big data. For future work, we will further consider the
commonalities between nearby meteorology stations and combine them in an MTL framework, which
may further boost the prediction performance.
Acknowledgments: The authors would like to thank the Environmental Health Sciences Research
Center at the University of Iowa and National Science Foundation Grant No. IIS-1566386 for funding and facilitating
this research.
Author Contributions: Dixian Zhu, Tianbao Yang, and Xun Zhou conceived and designed the experiments;
Changjie Cai collected the data; Dixian Zhu and Changjie Cai analyzed the data; Dixian Zhu performed the
experiments; Xun Zhou and Tianbao Yang contributed to the development of the research idea; Tianbao Yang, Changjie Cai
and Dixian Zhu wrote the paper. All authors have read and approved the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Curtis, L.; Rea, W.; Smith-Willis, P.; Fenyves, E.; Pan, Y. Adverse health effects of outdoor air pollutants.
Environ. Int. 2006, 32, 815–830.
2. Mayer, H. Air pollution in cities. Atmos. Environ. 1999, 33, 4029–4037.
3. Samet, J.M.; Zeger, S.L.; Dominici, F.; Curriero, F.; Coursac, I.; Dockery, D.W.; Schwartz, J.; Zanobetti, A.
The national morbidity, mortality, and air pollution study. Part II: Morbidity and mortality from air pollution
in the United States. Res. Rep. Health Eff. Inst. 2000, 94, 5–79.
4. Dockery, D.W.; Schwartz, J.; Spengler, J.D. Air pollution and daily mortality: Associations with particulates
and acid aerosols. Environ. Res. 1992, 59, 362–373.
5. Schwartz, J.; Dockery, D.W. Increased mortality in Philadelphia associated with daily air pollution
concentrations. Am. Rev. Respir. Dis. 1992, 145, 600–604.
6. American Lung Association. State of the Air Report; ALA: New York, NY, USA, 2007; pp. 19–27.
7. Environmental Protection Agency (EPA). Region 5: State Designations, as of September 18, 2009.
Available online: https://archive.epa.gov/ozonedesignations/web/html/region5desig.html (accessed
on 17 December 2017).
8. Hinds, W.C. Aerosol Technology: Properties, Behavior, and Measurement of Airborne Particles; John Wiley & Sons:
Hoboken, NJ, USA, 2012.
9. Soukup, J.M.; Becker, S. Human alveolar macrophage responses to air pollution particulates are associated
with insoluble components of coarse material, including particulate endotoxin. Toxicol. Appl. Pharmacol.
2001, 171, 20–26.
10. Environmental Protection Agency (EPA). CFR Parts 50, 51, 52, 53, and 58-National Ambient Air Quality
Standards for Particulate Matter: Final Rule. Fed. Regist. 2013, 78, 3086–3286.
11. Schwartz, J. Short term fluctuations in air pollution and hospital admissions of the elderly for respiratory
disease. Thorax 1995, 50, 531–538.
12. De Leon, A.P.; Anderson, H.R.; Bland, J.M.; Strachan, D.P.; Bower, J. Effects of air pollution on daily hospital
admissions for respiratory disease in London between 1987-88 and 1991-92. J. Epidemiol. Community Health
1996, 50 (Suppl. 1), s63–s70.
13. Birmili, W.; Wiedensohler, A. New particle formation in the continental boundary layer: Meteorological and
gas phase parameter influence. Geophys. Res. Lett. 2000, 27, 3325–3328.
14. Lee, J.-T.; Kim, H.; Song, H.; Hong, Y.C.; Cho, Y.S.; Shin, S.Y.; Hyun, Y.J.; Kim, Y.S. Air pollution and asthma
among children in Seoul, Korea. Epidemiology 2002, 13, 481–484.
15. Cai, C.; Zhang, X.; Wang, K.; Zhang, Y.; Wang, L.; Zhang, Q.; Duan, F.; He, K.; Yu, S.-C. Incorporation of
new particle formation and early growth treatments into WRF/Chem: Model improvement, evaluation,
and impacts of anthropogenic aerosols over East Asia. Atmos. Environ. 2016, 124, 262–284.
16. Kalkstein, L.S.; Corrigan, P. A synoptic climatological approach for geographical analysis: Assessment of
sulfur dioxide concentrations. Ann. Assoc. Am. Geogr. 1986, 76, 381–395.
17. Comrie, A.C. A synoptic climatology of rural ozone pollution at three forest sites in Pennsylvania.
Atmos. Environ. 1994, 28, 1601–1614.
18. Eder, B.K.; Davis, J.M.; Bloomfield, P. An automated classification scheme designed to better elucidate the
dependence of ozone on meteorology. J. Appl. Meteorol. 1994, 33, 1182–1199.
19. Zelenka, M.P. An analysis of the meteorological parameters affecting ambient concentrations of acid aerosols
in Uniontown, Pennsylvania. Atmos. Environ. 1997, 31, 869–878.
20. Laakso, L.; Hussein, T.; Aarnio, P.; Komppula, M.; Hiltunen, V.; Viisanen, Y.; Kulmala, M. Diurnal and
annual characteristics of particle mass and number concentrations in urban, rural and Arctic environments
in Finland. Atmos. Environ. 2003, 37, 2629–2641.
21. Jacob, D.J.; Winner, D.A. Effect of climate change on air quality. Atmos. Environ. 2009, 43, 51–63.
22. Holloway, T.; Spak, S.N.; Barker, D.; Bretl, M.; Moberg, C.; Hayhoe, K.; Van Dorn, J.; Wuebbles, D. Change in
ozone air pollution over Chicago associated with global climate change. J. Geophys. Res. Atmos. 2008, 113,
doi:10.1029/2007JD009775.
23. Akbari, H. Shade trees reduce building energy use and CO2 emissions from power plants. Environ. Pollut.
2002, 116, S119–S126.
24. DeGaetano, A.T.; Doherty, O.M. Temporal, spatial and meteorological variations in hourly PM2.5
concentration extremes in New York City. Atmos. Environ. 2004, 38, 1547–1558.
25. Elminir, H.K. Dependence of urban air pollutants on meteorology. Sci. Total Environ. 2005, 350, 225–237.
26. Natsagdorj, L.; Jugder, D.; Chung, Y.S. Analysis of dust storms observed in Mongolia during 1937–1999.
Atmos. Environ. 2003, 37, 1401–1411.
27. Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change;
John Wiley & Sons: Hoboken, NJ, USA, 2016.
28. Appel, B.R.; Tokiwa, Y.; Hsu, J.; Kothny, E.L.; Hahn, E. Visibility as related to atmospheric aerosol constituents.
Atmos. Environ. (1967) 1985, 19, 1525–1534.
29. Deng, X.; Tie, X.; Wu, D.; Zhou, X.; Bi, X.; Tan, H.; Li, F.; Jiang, C. Long-term trend of visibility and its
characterizations in the Pearl River Delta (PRD) region, China. Atmos. Environ. 2008, 42, 1424–1435.
30. Twomey, S. The influence of pollution on the shortwave albedo of clouds. J. Atmos. Sci. 1977, 34, 1149–1152.
31. Zheng, Y.; Liu, F.; Hsieh, H.-P. U-Air: When urban air quality inference meets big data. In Proceedings of the
19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA,
11–14 August 2013.
32. Kalapanidas, E.; Avouris, N. Short-term air quality prediction using a case-based classifier. Environ. Model.
Softw. 2001, 16, 263–272.
33. Kurt, A.; Oktay, A.B. Forecasting air pollutant indicator levels with geographic models 3 days in advance
using neural networks. Expert Syst. Appl. 2010, 37, 7986–7992.
34. Kleine Deters, J.; Zalakeviciute, R.; Gonzalez, M.; Rybarczyk, Y. Modeling PM2.5 urban pollution using
machine learning and selected meteorological parameters. J. Electr. Comput. Eng. 2017, 2017, 5106045.
35. Bougoudis, I.; Demertzis, K.; Iliadis, L.; Anezakis, V.-D.; Papaleonidas, A. FuSSFFra, a fuzzy semi-supervised
forecasting framework: The case of the air pollution in Athens. In Neural Computing and Applications; Springer:
Berlin, Germany, 2017; pp. 1–14.
36. Yuan, Z.; Zhou, X.; Yang, T.; Tamerius, J.; Mantilla, R. Predicting Traffic Accidents Through Heterogeneous
Urban Data: A Case Study. In Proceedings of the 6th International Workshop on Urban Computing
(UrbComp 2017), Halifax, NS, Canada, 14 August 2017.
37. Collobert, R.; Weston, J. A unified architecture for natural language processing: Deep neural networks with
multitask learning. In Proceedings of the 25th International Conference on Machine Learning, Helsinki,
Finland, 5–9 July 2008.
38. Fan, J.; Gao, Y.; Luo, H. Integrating concept ontology and multitask learning to achieve more effective
classifier training for multilevel image annotation. IEEE Trans. Image Process. 2008, 17, 407–426.
39. Widmer, C.; Leiva, J.; Altun, Y.; Rätsch, G. Leveraging sequence classification by taxonomy-based multitask
learning. In Annual International Conference on Research in Computational Molecular Biology; Springer:
Berlin/Heidelberg, Germany, 2010.
40. Kshirsagar, M.; Carbonell, J.; Klein-Seetharaman, J. Multitask learning for host-pathogen protein interactions.
Bioinformatics 2013, 29, i217–i226.
41. Lindbeck, A.; Snower, D.J. Multitask learning and the reorganization of work: From tayloristic to holistic
organization. J. Labor Econ. 2000, 18, 353–376.
42. Foley, K.M.; Roselle, S.J.; Appel, K.W.; Bhave, P.V.; Pleim, J.E.; Otte, T.L.; Mathur, R.; Sarwar, G.; Young, J.O.;
Gilliam, R.C.; et al. Incremental testing of the Community Multiscale Air Quality (CMAQ) modeling system
version 4.7. Geosci. Model Dev. 2010, 3, 205–226.
43. Yahya, K.; Wang, K.; Campbell, P.; Chen, Y.; Glotfelty, T.; He, J.; Pirhalla, M.; Zhang, Y. Decadal application of
WRF/Chem for regional air quality and climate modeling over the US under the representative concentration
pathways scenarios. Part 1: Model evaluation and impact of downscaling. Atmos. Environ. 2017, 152, 562–583.
44. Horel, J.; Splitt, M.; Dunn, L.; Pechmann, J.; White, B.; Ciliberti, C.; Lazarus, S.; Slemmer, J.; Zaff, D.;
Burks, J.; et al. Mesowest: Cooperative mesonets in the western United States. Bull. Am. Meteorol. Soc. 2002,
83, 211–225.
45. Athanasiadis, I.N.; Kaburlasos, V.G.; Mitkas, P.A.; Petridis, V. Applying machine learning techniques on air
quality data for real-time decision support. In Proceedings of the First international NAISO Symposium on
Information Technologies in Environmental Engineering (ITEE’2003), Gdansk, Poland, 24–27 June 2003.
46. Corani, G. Air quality prediction in Milan: Feed-forward neural networks, pruned neural networks and lazy
learning. Ecol. Model. 2005, 185, 513–529.
47. Fu, M.; Wang, W.; Le, Z.; Khorram, M.S. Prediction of particular matter concentrations by developed
feed-forward neural network with rolling mechanism and gray model. Neural Comput. Appl. 2015, 26,
1789–1797.
48. Jiang, D.; Zhang, Y.; Hu, X.; Zeng, Y.; Tan, J.; Shao, D. Progress in developing an ANN model for air pollution
index forecast. Atmos. Environ. 2004, 38, 7055–7064.
49. Ni, X.Y.; Huang, H.; Du, W.P. Relevance analysis and short-term prediction of PM2.5 concentrations in
Beijing based on multi-source data. Atmos. Environ. 2017, 150, 146–161.
50. Caruana, R. Multitask learning. In Learning to Learn; Springer: Boston, MA, USA, 1998; pp. 95–133.
51. Liu, J.; Ji, S.; Ye, J. Multi-task feature learning via efficient ℓ2,1-norm minimization. In Proceedings of the
Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009.
52. Recht, B.; Fazel, M.; Parrilo, P.A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear
norm minimization. SIAM Rev. 2010, 52, 471–501.
53. Argyriou, A.; Micchelli, C.A.; Pontil, M. On spectral learning. J. Mach. Learn. Res. 2010, 11, 935–953.
54. Maurer, A. Bounds for linear multi-task learning. J. Mach. Learn. Res. 2006, 7, 117–139.
55. Zhang, T. Solving large scale linear prediction problems using stochastic gradient descent algorithms.
In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada,
4–8 July 2004.
56. Xu, Y.; Lin, Q.; Yang, T. Stochastic Convex Optimization: Faster Local Growth Implies Faster Global
Convergence. In Proceedings of the International Conference on Machine Learning, Sydney, Australia,
6–11 August 2017.
57. Parikh, N.; Boyd, S. Proximal algorithms. Found. Trends Optim. 2014, 1, 127–239.
58. Xiao, Y.; Li, Z.; Yang, T.; Zhang, L. SVD-free convex-concave approaches for nuclear norm regularization.
In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne,
Australia, 19–25 August 2017.
59. Xu, Y.; Liu, M.; Lin, Q.; Yang, T. ADMM without a Fixed Penalty Parameter: Faster Convergence with
New Adaptive Penalization. In Proceedings of the Advances in Neural Information Processing Systems,
Long Beach, CA, USA, 4–9 December 2017.
60. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).