
Computational and Theoretical Chemistry 1244 (2025) 115003

Contents lists available at ScienceDirect

Computational and Theoretical Chemistry


journal homepage: www.elsevier.com/locate/comptc

Machine learning predictive model to estimate the photo-degradation performance of stannates and hydroxystannates photocatalysts on a variety of waterborne contaminants
Anouar Soltani, Faiçal Djani*, Yassine Abdesslam
Laboratory of Molecular Chemistry and Environment, University of Biskra, BP 145 RP, Biskra 07000, Algeria

ARTICLE INFO

Keywords:
Photocatalytic degradation
Machine learning
Molecular fingerprint
Random Forest
KNN

ABSTRACT

In this work, a comprehensive machine learning (ML) methodology was used to predict the degradation efficiency of different stannate and hydroxystannate photocatalysts on a wide range of waterborne pollutants. Structural and atomic features and molecular fingerprints (MF) were used as descriptors of the crystalline phase of the photocatalysts and of the organic compounds, respectively. The encoded features of the photocatalysts and contaminants, along with the experimental variables of the degradation process, were input to two ML models: RF (random forest) and KNN (K nearest neighbor). The RF model achieved a very good prediction of the photocatalytic degradation efficiency (%) of different photocatalysts over a wide range of organic contaminants. The RF model performance was investigated by applying two different training strategies. The effects of different factors on photocatalytic degradation performance were further evaluated by feature importance analyses. Two illustrative applications of the ML model are provided: optimal photocatalyst selection and the assessment of other types of photocatalysts for different environmental applications.
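The RF-versus-KNN comparison summarized above rests on cross-validated scoring of each model with R2, MAE and RMSE. As an illustration only, the following dependency-free Python sketch shows the KNN half of that workflow. It assumes numeric, already-normalized feature vectors (for example, experimental variables concatenated with fingerprint bits) and a D.E target expressed as a fraction. It is not the Weka implementation the authors used: k = 3, plain Euclidean distance and unweighted neighbor averaging are illustrative assumptions, random forest is omitted for brevity, and R2 is computed here as 1 − SS_res/SS_tot.

```python
"""Minimal sketch: KNN regression scored by k-fold cross-validation.

Illustrative assumptions (not the authors' Weka set-up): features are
already-normalised numeric vectors, the target is D.E as a fraction,
distance is Euclidean and the k nearest targets are averaged unweighted.
"""
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def metrics(obs, pred):
    """Return (R2, MAE, RMSE); R2 here is 1 - SS_res / SS_tot."""
    n = len(obs)
    mean_o = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    # R2 is undefined (NaN) when every observed value is identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


def cross_validate(x, y, n_folds=3, k=3, seed=0):
    """Score KNN on pooled out-of-fold predictions.

    n_folds == len(x) gives leave-one-out cross-validation (LOOCV).
    """
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    preds = [0.0] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        for j in test:
            preds[j] = knn_predict(train_x, train_y, x[j], k)
    return metrics(y, preds)


if __name__ == "__main__":
    # Hypothetical toy data: two normalised features and a fractional D.E.
    xs = [(i / 19, (7 * i % 20) / 19) for i in range(20)]
    ys = [0.8 * a + 0.2 * b for a, b in xs]
    print("3-fold:", cross_validate(xs, ys, n_folds=3))
    print("LOOCV:", cross_validate(xs, ys, n_folds=len(xs)))
```

Out-of-fold predictions are pooled before scoring, rather than averaged fold by fold, because R2 is undefined on the single-sample folds of leave-one-out cross-validation; this is a design choice of the sketch, not a statement of how the reported scores were aggregated.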

1. Introduction

In pursuit of the diverse needs of continuously growing societies and surging technological advancement, new chemicals and substances continually enter our daily lives [1]. These compounds, often designed for specific purposes, include synthetic chemicals, pharmaceuticals, pesticides, heavy metals, and various other substances that inadvertently find their way into water bodies, soil, and air through various pathways, eventually leading to adverse effects on human health, including hormonal imbalances and potential long-term health risks [1–4]. Therefore, balancing the benefits of industrialization with environmental preservation has become a critical global challenge, necessitating innovative technologies, policy interventions, and a shift towards more eco-friendly practices to mitigate its adverse impacts on the planet [5,6] (see Table 1).

Several techniques are available for treating effluents containing contaminants. These treatment processes can be divided into two main classes: biological processes and physical–chemical processes [7,8]. Among the physical methods, adsorption is usually considered suitable and efficient for almost all types of pollutants, compared with counterparts such as membrane filtration and coagulation-flocculation, in terms of its ease, simplicity, flexibility, operability, insensitivity to noxious materials, and economic feasibility [9]. However, it still has numerous drawbacks: many adsorbents, such as bio-adsorbents, activated carbon, silica, alumina, clay, metal oxides and titania [10–12], have limitations that hamper their performance, such as increasing total organic carbon (e.g., activated carbon, AC) [13], compromised adsorption capacity (e.g., alumina) [14], and the adsorption-induced deformation of certain nanoporous materials such as TiO2, Fe3O4, and ZnO [15].

In the context of biological processes, microorganisms have shown the potential to decompose organic colorants via aerobic or anaerobic cycles, and biological treatment enjoys good public acceptance because of its simplicity and economic feasibility [16]. However, various dyes used in the textile industry are harmful to aerobic organisms and cause sludge rising, flocculation, and sludge bulking. Consequently, aerobic biological treatment has been shown to be insufficient for tackling textile dyes, specifically azo dyes [17].

In recent years, photocatalysis has emerged, along with numerous promising chemical methods such as electrochemical oxidation [18], Fenton's oxidation [19] and ozonation [20], as a water purification route that addresses the performance shortcomings of conventional treatment methods. This process utilizes catalysts to break down organic pollutants

* Corresponding author.
E-mail address: [email protected] (F. Djani).

https://doi.org/10.1016/j.comptc.2024.115003
Received 3 April 2024; Received in revised form 8 November 2024; Accepted 24 November 2024
Available online 2 December 2024
2210-271X/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

when exposed to light, effectively degrading various contaminants. A broad spectrum of photosensitive materials has been investigated, including TiO2 [21], WO3 [22], ZnO [23], SnO2, Fe2O3 and β-MnO2 [24], as well as ternary oxides such as titanates (ATiO3), tantalates (ATaO3), vanadates (AVO4) and stannates (ASnO3) [25]. The integration of carbon-based materials, particularly graphene oxide (GO), reduced graphene oxide (rGO) and multiwalled carbon nanotubes (MWCNTs), into innovative composite photocatalysts has been found to enhance photocatalytic activity owing to their high chemical stability across a wide pH range and their customizable structural and electronic properties [26].

Stannates (ASnO3) and hydroxystannates (ASn(OH)6), where A is either an alkaline earth element or a transition metal, have recently attracted significant attention due to their environmental friendliness, outstanding electronic, optical and magnetic properties, and highly flexible structure, which have led to their use in many applications, especially electronic devices, solar cells, fuel cells, methane combustion, hydrogen production and heterogeneous photocatalysis [27–31]. However, according to Honorio et al. [32], the number of published papers on stannates applied as photocatalysts is relatively small compared with their binary counterparts such as TiO2, SnO2, Fe2O3, WO3 and β-MnO2.

The effectiveness of photocatalysts may be assessed by quantifying the exact change in pollutant concentration, which can ultimately be expressed in terms of the degradation rate constant k or the degradation percentage (%) [33]. Assessing the efficiency of a photocatalyst towards a range of waterborne contaminants can be very challenging, given the laborious nature of the synthesis process; the costly characterization of the crystalline structure, grain size and shape, specific surface area and pore structure; and finally the complex multivariable experimental setup of the optimization process, which includes factors like photocatalyst dosage, medium pH, contaminant concentration, and light wavelength and intensity [34]. The practicality of the standard experimental approach is therefore severely jeopardized by the vast spectrum of waterborne pollutants.

Recent advancements in machine learning (ML) have significantly impacted research in chemistry, making several tasks more flexible than experiments and more efficient than theoretical modeling, at an affordable computational cost. New organic and inorganic materials with desired properties have been designed [34–40], aiding the design of more efficient batteries, catalysts, polymers, and other materials by quickly screening and identifying potential candidates from vast chemical spaces [41].

In the context of assessing photocatalyst efficiency, several intelligent algorithms have been investigated to establish connections between experimental conditions and molecular descriptors (inputs) and the reaction rate constant (k) or degradation efficiency (%) (outputs), as well as to provide a comprehensive understanding of the key factors involved in the degradation process [42–48]. This approach can potentially lead to crucial advances in the design of more efficient environmental remediation and water treatment solutions.

In previous reports, the weights assigned to each feature were assumed to be identical, meaning that the input features were treated equally in the model without distinguishing between significant and insignificant data. Moreover, because these models only take into account a small number of contaminants and a single photocatalyst, their applicability is restricted to organic pollutants with a comparable chemical structure.

Jiang et al. [49] proposed a novel hybrid methodology combining a crystal graph convolutional neural network (CGCNN), molecular fingerprints (MF) and an artificial neural network (ANN) to predict the photo-oxidation performance of different photocatalysts on various water pollutants. The model achieved satisfactory agreement with the experimental results and showed high prediction precision in terms of R2, RMSE and MAE. However, the authors used average particle size as the morphological descriptor of the photocatalyst, which is less informative than specific surface area (SSA). This could diminish the accuracy of the model, since average particle size does not provide information about the distribution of larger and smaller particles, which also contribute to the catalytic process.

A large number of algorithms have been used in regression tasks to connect a target property (output) with chemical or physical descriptors (input). While no single algorithm is suitable for all problems, ML algorithms differ notably in their explainability, i.e., the ability to explain the outputs of a model in human terms. Most of the ML work reported in the literature is based on less interpretable methods with complex architectures and numerous parameters, such as artificial neural networks (ANN). More easily interpretable methods such as RF and KNN are preferable, as they provide feature importance scores and help identify and diagnose issues within the model, such as data leakage, irrelevant features, or model biases; to the best of our knowledge, such methods have not yet been considered for these applications.

The objective of this work is to provide a comprehensive procedure for predicting the degradation efficiency of stannates and hydroxystannates on a range of contaminants. Data from published research were collected to build a photocatalysis database that includes the experimental variables. The relationships between the features of each photocatalyst, the molecular fingerprint (MF) representation of the contaminants and the experimental conditions were investigated using two ML algorithms, random forest (RF) and K nearest neighbor (KNN). The RF model achieved the highest prediction accuracy and was selected for further performance analysis using two different training strategies. The developed model can forecast the performance of novel photocatalysts and contaminants, thereby providing a simple and promising decision-making tool to support photocatalytic process optimization and cost-effective operation.

Table 1
Technical and atomic parameter descriptors used in this study.

No   Descriptor
01   Photocatalyst dosage
02   Contaminant dosage
03   pH
04   SSA
05   Mulliken electronegativity of the A element
06   Mulliken electronegativity of the B element
07   Mulliken electronegativity of oxygen
08   Mulliken electronegativity of hydrogen
09   Ionic radius of the A element
10   Ionic radius of the B element
11   Ionic radius of oxygen
12   Ionic radius of hydrogen
13   Number of valence electrons of the A element
14   Number of valence electrons of the B element
15   Number of valence electrons of oxygen
16   Number of valence electrons of hydrogen
17   Ionization energy of the A element
18   Ionization energy of the B element
19   Ionization energy of oxygen
20   Ionization energy of hydrogen
21   Average of ionization energies of the elements constituting the photocatalyst material
22   Average of ionic radii of the elements constituting the photocatalyst material
23   Average of Mulliken electronegativities of the elements constituting the photocatalyst material
24   Molecular fingerprints of the contaminants (1024 features)
25   Light type
26   Irradiation time
27   Calcination temperature
28   Calcination time
29   Drying temperature
30   Drying time
31   Particle shape
32   Pore size
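The Introduction notes that photocatalytic performance can be reported either as a degradation percentage or as a rate constant k. As an illustration only, the sketch below converts C(t)/C0 readings into D.E (%) = (1 − C(t)/C0) × 100 and fits a pseudo-first-order model, ln(C0/C(t)) = k·t, by least squares through the origin. The pseudo-first-order form, the zero-intercept fit, the function names and the sample readings are assumptions chosen for clarity, not the authors' procedure.

```python
"""Sketch: D.E (%) and a pseudo-first-order rate constant from C(t)/C0.

Illustrative assumptions: inputs are concentration ratios C(t)/C0 > 0 and
the kinetics follow ln(C0/C(t)) = k * t (fitted through the origin).
"""
import math


def degradation_efficiency(c_ratio):
    """D.E (%) = (1 - C(t)/C0) * 100."""
    return (1.0 - c_ratio) * 100.0


def pseudo_first_order_k(times, c_ratios):
    """Least-squares k for ln(C0/C) = k * t with zero intercept, plus R2."""
    ys = [-math.log(r) for r in c_ratios]  # ln(C0/C) = -ln(C/C0)
    k = sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - k * t) ** 2 for t, y in zip(times, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # Centred SS_tot is used; for a zero-intercept fit this R2 should be
    # read with caution, since it can even be negative.
    return k, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical readings (time in min, C(t)/C0) digitised from a curve.
    t = [0, 15, 30, 45, 60]
    r = [1.0, 0.72, 0.52, 0.38, 0.27]
    print([round(degradation_efficiency(x), 1) for x in r])
    print(pseudo_first_order_k(t, r))
```

Other kinetic forms (for example, second-order) could be fitted the same way and compared by R2; the pseudo-first-order model is shown only because it is a common starting point.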

2. Material and methods

All machine learning models were built using the Weka 3.8.6 machine learning software. The study was performed in four stages. First, a thorough review of 31 articles allowed us to build a comprehensive dataset of 597 data points containing photocatalyst type, contaminant type, initial contaminant concentration, photocatalyst concentration, specific surface area (SSA), light type and pH (Supporting information). Degradation efficiency data were extracted from C(t)/C0 = f(t) or degradation efficiency (D.E (%)) = f(t) plots using WebPlotDigitizer 4.6 software. In the second stage, the canonical SMILES of the contaminants were gathered from PubChem, and the Python package RDKit was then used to convert the SMILES to a binary digit vector, where "0" or "1" represents, respectively, the absence or presence of a particular substructure. Hybrid features for the photocatalysts, including crystal- and composition-based features, were generated. In the third stage, a preprocessing step including outlier detection and removal and data normalization was performed using Weka built-in filters. Finally, two ML algorithms, namely random forest (RF) and K nearest neighbor (KNN), were applied with default hyperparameters, and the prediction results were gathered and visualized. The algorithm with the highest R2 and lowest MAE and RMSE scores was selected for further performance analysis using another training methodology, i.e., training on each photocatalyst subset only. The RF model was interpreted using Shapley Additive explanations (SHAP) for each of the features. Evaluation metrics of the different prediction tasks were expressed in terms of R2, MAE and RMSE as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(p_i - o_i)^2}{\sum_{i=1}^{n}(o_i - \bar{o})^2} \qquad (1)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert o_i - p_i\rvert \qquad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(o_i - p_i)^2} \qquad (3)$$

where n is the total number of samples, o_i and p_i represent the observed and the predicted degradation efficiencies, respectively, and p̄ and ō are the averages of the predicted and the measured degradation efficiencies, respectively.

3. Results

3.1. Selection of molecular descriptors

Descriptor selection is an effective way to eliminate redundant, noisy, or unnecessary descriptors during model construction, reducing the complexity of the input space without sacrificing crucial information. The classifier subset evaluator evaluates a subset of descriptors for modeling on the training data or on a separate hold-out test set. The Greedy Stepwise algorithm was adopted as the search method. An important benefit of using a classifier subset evaluator is its ability to pinpoint the most relevant characteristics, thereby decreasing the complexity of the data, streamlining the model, and possibly enhancing its efficacy, which serves the objective of our study. The merit of a set of attributes is measured by estimating RMSE as a function of the descriptor subset, such that the minimum error corresponds to the optimal subset of features.

Fig. 1 illustrates how RMSE varies as a function of feature generations. The smallest RMSE, reached after 16 generations of evolution, corresponds to the optimal subset of features, including two atomic parameters and three technical parameters. The selected features are photocatalyst dosage, contaminant dosage, pH, SSA, light type, irradiation time, lattice parameter c, average of ionic radii of the elements constituting the photocatalyst material, average of ionization energies of the elements constituting the photocatalyst material, and ionization energy of the A element, along with 6 molecular fingerprints of the contaminants.

Fig. 1. RMSE versus generation of evolution for the feature subsets.

To select the ML model used for the further performance analysis with the two training strategies described in this manuscript, the whole dataset was first randomly divided into n = 2, 3 and 597 subsets, with (n − 1) subsets used for model training and the remaining subset used for testing, in turn. After each validation round, the evaluation metrics (R2, MAE, RMSE) were calculated, and the final testing score was expressed as the average of each metric over the entire validation process. Three-fold cross-validation is a widely recognized resampling technique that can mitigate the error of model prediction. The prediction performance of the two ML models, in terms of coefficient of determination (R2), mean absolute error (MAE) and root mean square error (RMSE), on the three testing subgroups and in leave-one-out cross-validation (LOOCV), with and without processing and morphological parameters, is listed in Table 2.

Table 2
The performance of the two ML models in one-, two-, three-fold and leave-one-out cross-validation (LOOCV) predictions.

Folds                                                                      1       2        3        LOOCV
Random Forest (RF), processing and morphological parameters excluded
  R2                                                                       0.994   0.87     0.9      0.943
  MAE                                                                      0.026   0.11     0.094    0.0718
  RMSE                                                                     0.036   0.146    0.129    0.098
Random Forest (RF), processing and morphological parameters included
  R2                                                                       0.993   0.85     0.89     0.943
  MAE                                                                      0.028   0.12     0.096    0.072
  RMSE                                                                     0.038   0.15     0.13     0.1
K nearest neighbor (KNN), processing and morphological parameters excluded
  R2                                                                       0.99    0.8165   0.847    0.903
  MAE                                                                      0.01    0.125    0.1128   0.095
  RMSE                                                                     0.1     0.17     0.1556   0.1246

The scatter plots in Fig. 2 summarize the predicted vs experimentally measured values of D.E (%) for the two ML models. The more points


that lie along the diagonal 1:1 line, the closer the model is to perfection. The highest R2 in all cross-validation folds was achieved by the random forest algorithm, which reached a mean absolute error of only 0.09 and 0.07 in the three-fold and the leave-one-out cross-validation, respectively, as visualized by the clustering of most points close to the diagonal 1:1 line. The error decreased as the number of folds rose, because as the number of folds increases, the size of the training sample likewise increases, providing the model with more opportunities to capture new instances. The RF model demonstrated good prediction performance, considering the complex interactions at play and the relatively small dataset used for training.

The K nearest neighbor (KNN) algorithm achieved less accurate predictions than random forest, exhibiting considerably larger errors and lower correlation coefficients, reflected in the spread of the points around the 1:1 regression diagonal. Random forest (RF) typically outperforms KNN in situations with many features and a large number of training examples. KNN, in contrast, is better suited to low-dimensional data and can be sensitive to the choice of distance metric [50].

Fig. 2. Scatter plot describing predicted vs experimentally measured values of D.E (%). (a) RF. (b) KNN.

Additional investigations were performed to examine the accuracy of the RF model in predicting the extent of photocatalytic degradation across various photocatalysts and pollutants. The first investigation examined the RF model on the data acquired for several photocatalysts, using the three-fold cross-validation approach specified above. The second set of analyses compared these findings with those obtained by training only on particular subgroups of photocatalysts. To this end, the data were partitioned into subsets by photocatalyst, and each subset was then used to train and evaluate an RF model specific to that photocatalyst. The scatter plots in Fig. S1 (a, c, e, g, i, k, m) in the Supporting information show the predicted vs actual D.E values for each group of photocatalysts; these predictions resulted from training the RF model with all the data and testing on each photocatalyst group separately. Fig. S1 (b, d, f, h, j, l, n) shows the prediction results for each group of photocatalysts acting on a variety of contaminants, trained and tested separately on its own data. Table 3 summarizes the prediction performance under these two training and testing procedures for the photocatalysts with the most data points.

For the ML model trained with the whole dataset (Fig. 2), the prediction performance metrics (R2, MAE, RMSE; Table 2) lay between those obtained for the individual photocatalysts (Table 3).

The Random Forest model trained on all available data exhibited superior predictive performance compared with a model trained only on the data specific to a given photocatalyst. This might appear paradoxical at first glance, especially here, where different photocatalysts and contaminants share no direct connections. Still, machine learning algorithms become more efficient at pattern recognition as the volume of data increases. Additionally, the model trained without processing and morphological properties performed better than the one in which these features were included (Table 2). Furthermore, the significance of data quality and variety cannot be overstated; both were noticeably diminished in the second training approach. ML models can learn more patterns and associations when provided with a more extensive dataset; however, such a dataset may also include irrelevant or misleading information, sometimes referred to as 'noise'. The impact of data on the performance of the ML model is confirmed by the findings in Table 3, which emphasize the essentials of a data-driven approach, namely the significance of both the quantity and the variety of data.

Fig. 3a and b show the sensitivity of the model's performance, in terms of R2, MAE and RMSE, to the training data volume and to data diversity, respectively. The RF model provides valuable insight into its learning process, demonstrating significant improvements in adapting to new patterns as the amount of data rises. Additionally, Fig. 3b outlines a specific feature of machine learning models: unlike physics-based models, they are able to learn from data with no direct relevance between them. This observation is further supported by several studies, especially in drug discovery [51,52].

Table 3
The performance of the RF model in the first (trained with the whole dataset and tested on each photocatalyst) and the second (trained for individual photocatalysts) training strategy.

Photocatalyst                                    ZnSn(OH)6  ZnSnO3  SrSn(OH)6  SrSnO3  MgSn(OH)6  MgSnO3  CaSnO3
RF model trained with all data, tested on each photocatalyst
  R2                                             0.964      0.974   0.989      0.986   0.96       0.996   0.972
  MAE                                            0.09       0.058   0.049      0.054   0.088      0.025   0.088
  RMSE                                           0.11       0.073   0.065      0.072   0.104      0.03    0.109
RF model trained for individual photocatalyst
  R2                                             0.932      0.941   0.928      0.879   0.933      0.962   0.932
  MAE                                            0.079      0.057   0.089      0.098   0.084      0.055   0.089
  RMSE                                           0.1        0.088   0.123      0.146   0.105      0.068   0.11

4. Performance of RF model for different types of contaminants

In addition, we conducted an analysis of the RF model performance


with respect to several categories of pollutants. The dataset used to train the ML model comprised 8 distinct water pollutants, with the number of data points differing from one pollutant to another, as presented in Table 4. The majority of the results were relatively precise, with mean absolute error (MAE) values below 0.1. The previously discussed trend, that increasing the amount of data reduces prediction error, was valid for six contaminant categories, namely methyl orange, dimethyl phthalate ester, diethyl phthalate ester, ciprofloxacin, toluene and Remazol golden yellow. However, two specific pollutants, methylene blue and rhodamine B, deviated from this trend. The complex structure of these two pollutants may be at the origin of this paradoxical outcome, which implies that a substantially larger quantity of data is needed to effectively improve the ML model accuracy for these two contaminants.

Testing the RF model performance with respect to each contaminant and photocatalyst category (see the previous section) has thus helped identify subsets that need additional enhancement. Consequently, it is essential to conduct a multi-scale performance test to analyze the strengths and weaknesses of the model in question.

Table 4
The performance of the RF model in predicting D.E with respect to each contaminant subgroup.

Contaminant             No. of data points   R2      MAE     RMSE
MB                      97                   0.95    0.1     0.12
MO                      111                  0.88    0.12    0.15
RhB                     115                  0.97    0.098   0.11
DMP                     42                   0.994   0.024   0.03
DEP                     42                   0.997   0.024   0.029
CIP                     30                   0.968   0.07    0.085
Toluene                 33                   0.96    0.07    0.098
Remazol Golden Yellow   36                   0.99    0.048   0.058

Fig. 3. RF Model performance as a function of training data volume (a) and number of subgroups (b).

4.1. Feature importance and model interpretability

The previous results indicated that the RF model trained with the whole dataset and tested with respect to each photocatalyst and contaminant achieved decent performance in predicting the photocatalytic removal efficiency of different photocatalysts over a wide range of contaminants. As the volume and diversity of the data increase, the ability of machine learning models to deduce patterns improves. However, in situations where it is important to understand how the model arrived at its decision, the challenge is to find a suitable interpretation of the model's outcome.

In the present study, estimating the Shapley Additive explanations (SHAP) value of each variable helps assess the contribution of a specific feature to the overall target value by comparing predictions made with and without that attribute.

The mean SHAP values of the six experimental variables are shown in Fig. 4a. The influence of the magnitude of each independent variable is shown in Fig. 4b, in which each point is color coded, shifting from blue to red as the value of the variable increases.

Combining the information extracted from Fig. 4a and b, photocatalyst dosage, pH and contaminant dosage have almost the same influence, with photocatalyst concentration showing a slightly greater influence. This ML-guided conclusion agrees well with experimental findings [53,54], since diluted solutions are usually used in photocatalysis for several purposes, including improving mass transfer [55], reducing the recombination probability of photogenerated electron-hole pairs [56] and promoting light penetration [57].

Another, less straightforward, piece of information that can be extracted is that specific surface area (SSA) did not have as much significance as might be expected. In fact, other factors such as crystallinity also play an important role in photocatalysis, in some cases even more important than surface area [58–60], which was well captured by our model, further indicating its consistency with real experiments.

Ultimately, the developed RF model offers a remarkable understanding of how the unique properties of a photocatalyst might impact its activity. Based on the SHAP values shown in Fig. 4b, there is a positive correlation between a higher ionization energy of the metal in the A position and the catalytic activity of the material. This outcome can be understood from a chemical standpoint. Elements with elevated ionization energies tend to be more electronegative owing to their strong retention of electrons. As a result, the electronic density around the metal in the B position is diminished, and its electrons become less tightly bound and are easily excited when exposed to light, which is crucial for better photocatalytic activity.

The clustering of the SHAP values for light type on the right, with blue data points (UV light), indicates that UV irradiation positively affects the contaminant's photodegradation efficiency; this effect weakens toward higher feature values of light type (visible light and simulated sunlight, labelled 2 and 3, respectively).

4.2. Application of the RF model in predicting the photocatalytic performance of a novel photocatalyst

One of the important possible applications of the developed RF model is to forecast the degradation efficiency of an unfamiliar photocatalyst, cadmium stannate CdSnO3, which was not included in our dataset. For this purpose, a testing set was prepared by retrieving the experimental variables adopted for the photocatalytic degradation of rhodamine B in the work of Liu et al. [61]. Table 5 compares the predicted values with the actual ones extracted from the ln(C(t)/C0) plot in that manuscript. Fig. 5 visualizes the RF-predicted vs experimental values of the target D.E (%). The overall predicted D.E was only about 14 percentage points lower than the actual value (predicted 61 % vs actual 75 %), which is acceptable given that the model had no prior exposure to the novel photocatalyst (see Table 6).

Making predictions with the RF model pre-trained on the RhB


subgroup gave rise to an absolute error of 18 percentage points (predicted 57 % vs actual 75 %). This observation once again emphasizes the necessity of training the model with the largest and most diverse dataset possible. Regarding the RhB subgroup, it is noteworthy that the model predicted the degradation efficiency of the six photocatalysts while mostly preserving the original reactivity sequence: ZnSnO3 < CdSnO3 < MgSnO3 < CaSnO3 < SrSn(OH)6. Finally, the predicted D.E values may be used to derive the degradation rate constant (k) by fitting kinetic models and selecting the one with the highest coefficient of determination R2, which can be regarded as an additional application of the model.

Fig. 4. (a) The mean SHAP values of the six experimental variables (highlighting the relative impact of each experimental variable on the performance of the RF model). (b) The color-coded distribution of SHAP values (showing whether a given variable has a positive or negative effect on the ML model prediction).

4.3. Application of the RF model in predicting the D.E of a novel contaminant

Crystal violet, also known as gentian violet, is another contaminant that has not been included in the dataset and whose degradation efficiency we want to forecast. In this application, the procedure described in the methodology section was repeated for strontium hydroxystannate, with the experimental conditions retrieved from the work of Xue et al. [62].

Surprisingly, the model was able to make a close guess for the novel contaminant, where the predicted value of degradation efficiency of 74

6
A. Soltani et al. Computational and Theoretical Chemistry 1244 (2025) 115003

Table 5 4.4. Comparative study


The performance of RF model in predicting photocatalytic efficiency of cadmium
stannate CdSnO3 for the degradation of rhodamine B (RhB) using the first Table 7 presents a comparison between the results obtained in the
method. current research and the findings from earlier studies on assessing
Actual D.E Predicted D.E Error organic pollutants. The findings indicate that the random forest model,
43.81 40.36 − 0.034 has superior R2 values and lower MAE and RMSE values compared to the
50.9 49.7 − 0.012 majority of previously published studies. Furthermore, the unique
58.96 51.47 − 0.075 method adopted to multiply the data points, by taking into account the
63.04 57.14 − 0.059 evolution of degradation efficiency through time, resulting in a total of
67.44 58.07 0.0936
597 instances, had a positive impact on the quality of the obtained re­

70.43 59.74 − 0.11
72.96 61.61 − 0.11 sults than it would be if the dataset was built with only single D.E for
75.65 61.61 − 0.14 each case. This emphasizes the importance of varying input character­
istics and experimental conditions from one hand and opens up insights
into further developing a model that governs stannate and hydrox­
ystannate photocatalytic properties from the other hand.

Table 7
Summar of feature selection with the new set of 2D
molecular descriptors.
No Descriptor

01 Photocatalyst dosage
02 Contaminant dosage
03 pH
04 SSA
05 Light type
Fig. 5. (a) D.E (%) predictions vs. experimental values of the RhB degraded by
06 Irradiation time
CdSnO3; (b) reactivity order as predicted using the pre-trained RF using the first
07 Ia
methodology. 08 EState_VSA5
09 EState_VSA9
% was estimated against an actual value close to 100 %. Fig. 6(a and b) 10 fr_bicyclic
11 fr_methoxy
shows the ML predicted vs actual D.E for CV contaminant by SrSn(OH)6
12 MaxAbsPartialCharge
using the two strategies. Retraining the RF model with the dataset by 13 MaxPartialCharge
involving the data from crystal violet article gave an impressive pre­ 14 MinAbsPartialCharge
diction of 95 %. 15 MinPartialCharge
16 PEOE_VSA7
17 SlogP_VSA11
18 SMR_VSA5
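Section 4.2 closes by suggesting that predicted D.E values, such as those in Table 5, could be converted into an apparent rate constant k by fitting kinetic models and keeping the one with the highest R2. The Python sketch below is one way to do this under stated assumptions: three common linearised forms (zero-order, pseudo-first-order and second-order, written in terms of C/C0 = 1 − D.E/100), each fitted by ordinary least squares with an intercept. Because Table 5 does not list irradiation times, the demo uses synthetic data generated from a pseudo-first-order decay with k = 0.01 min^-1 rather than the authors' values.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares y = a + b*x; returns (slope b, intercept a, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b, a, 1 - ss_res / ss_tot


# Linearised kinetic models expressed through r = C/C0 = 1 - D.E/100 (assumed model set).
KINETIC_MODELS = {
    "zero-order": lambda r: 1 - r,                 # (C0 - C)/C0 = k0' t
    "pseudo-first-order": lambda r: -math.log(r),  # ln(C0/C) = k1 t
    "second-order": lambda r: 1 / r - 1,           # C0/C - 1 = k2' t
}


def best_kinetic_model(times, de_percent):
    """Fit every model to D.E (%) values and keep the one with the highest R^2."""
    ratios = [1 - de / 100 for de in de_percent]
    fits = {}
    for name, transform in KINETIC_MODELS.items():
        k, _, r2 = linear_fit(times, [transform(r) for r in ratios])
        fits[name] = (k, r2)
    best = max(fits, key=lambda m: fits[m][1])
    return best, fits[best]


# Synthetic demo: D.E generated from a pseudo-first-order decay with k = 0.01 min^-1.
times = [0, 20, 40, 60, 80, 100, 120]
de = [100 * (1 - math.exp(-0.01 * t)) for t in times]
name, (k, r2) = best_kinetic_model(times, de)
print(name, round(k, 4), round(r2, 4))  # pseudo-first-order 0.01 1.0
```

With real predictions, the slope returned for the winning model is the apparent rate constant in the units of the time axis; other kinetic forms can be added to `KINETIC_MODELS` in the same way.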

Table 6
Comparative table of some similar predictive tasks from the literature for the photocatalytic degradation of numerous organic pollutants.
Task performed                                                   No. of data points   Best algorithm   R2      MAE     RMSE     Ref.
MG degradation efficiency prediction                             1200                 CatBoost         0.999   0.64    1.34     [43]
Tetracycline degradation prediction                              96                   ANN              0.98    –       0.0162   [45]
45 distinct types of contaminants (results of the 3-fold CV)     450                  CGCNN-MF-ANN     0.768   0.193   0.266    [49]
Perfluorooctanoic acid (PFOA)                                    1343                 GBM              0.878   6.009   10.328   [63]
18 different types of contaminants (results of the 3-fold CV)    597                  RF               0.9     0.09    0.12     This work
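The 597 instances in the "This work" row of Table 6 result from the data-multiplication strategy described in Section 4.4: each experiment contributes one data point per sampled irradiation time instead of a single final D.E, with time itself becoming an input feature. A minimal Python sketch of that expansion follows; the `Experiment` record, its field names and the demo numbers are hypothetical, and skipping the t = 0 point is an assumption (Table 8 lists 20–180 min as the covered time range).

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """One reported degradation experiment (hypothetical schema, not the authors' format)."""
    photocatalyst: str
    contaminant: str
    catalyst_dosage_g_l: float
    contaminant_mg_l: float
    ph: float
    light_type: int  # 1 = UV, 2 = visible light, 3 = simulated sunlight, as coded in Table 8
    de_curve: dict   # irradiation time (min) -> degradation efficiency D.E (%)


def expand_over_time(experiments):
    """Emit one training row per sampled irradiation time instead of one per experiment."""
    rows = []
    for exp in experiments:
        for t, de in sorted(exp.de_curve.items()):
            if t == 0:
                continue  # assumption: the trivial D.E = 0 point at t = 0 is not used
            rows.append({
                "photocatalyst": exp.photocatalyst,
                "contaminant": exp.contaminant,
                "catalyst_dosage_g_l": exp.catalyst_dosage_g_l,
                "contaminant_mg_l": exp.contaminant_mg_l,
                "ph": exp.ph,
                "light_type": exp.light_type,
                "time_min": t,     # time becomes an input feature
                "de_percent": de,  # regression target
            })
    return rows


# Placeholder experiment with illustrative numbers, not taken from any cited study.
demo = Experiment("ZnSnO3", "MB", 0.5, 10.0, 7.0, 1, {0: 0.0, 30: 40.0, 60: 65.0, 90: 80.0})
rows = expand_over_time([demo])
print(len(rows), [r["time_min"] for r in rows])  # 3 [30, 60, 90]
```

Because rows from the same experiment share every condition except time, any train/test split built on such a table should keep all rows of one experiment on the same side to avoid leakage.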

Fig. 6. (a) Predictions with the RF model pre-trained on the whole dataset without the crystal violet (CV) data. (b) Predictions with the RF model pre-trained on the whole dataset with the crystal violet data included. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


4.5. Applicability domain

In the present study, the applicability domain (AD) of the model was determined using the k-means clustering technique. Furthermore, in this section we illustrate how the clustering technique can provide insights into which contaminants beyond the 18 categories utilized in this study fall within the domain of applicability. To address this issue, we first replaced the Morgan fingerprints used earlier, which, although powerful, suffer from a major lack of interpretability, with another set of 2D QSAR descriptors calculated using the rdkit.Chem.Descriptors Python module for all 18 contaminants addressed in this study, and performed a feature importance calculation. The results of feature selection are shown in Table 7.

Table 8
Summary of the results of the K-means clustering.
Feature                                     Confidence zone
Time (min)                                  20–180
pH                                          2.8–9
Photocatalyst dosage (g/l)                  0.1–1
Initial contaminant concentration (mg/l)    5, 10, 20 and 50
SSA (m2/g)                                  10.1–138.7
Light type                                  1, 2 and 3 (UV, visible light and simulated sunlight, respectively)
Ionization energy (eV)                      5.7328, 6.1132, 7.7558, 8.7656 and 9.393
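The AD procedure outlined in this section (k-means clustering of the selected feature space, the Euclidean distance of each point to its cluster centroid, and a 90 % boundary) can be sketched in plain Python as below. This is a simplified stand-in, not the authors' implementation: features are z-scored, k is fixed by hand, centroids are initialised deterministically by farthest-point selection, and the 90 % confidence ellipse is approximated by a per-cluster radius equal to the 90th percentile of training distances, i.e. a sphere in standardized space. The two-feature training rows (pH and irradiation time) are made up for illustration.

```python
import math


def standardize(rows):
    """Z-score each feature column; returns scaled rows and per-column (mean, std)."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [scale(r, stats) for r in rows], stats


def scale(row, stats):
    return [(v - m) / s for v, (m, s) in zip(row, stats)]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(v) / len(members) for v in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def fit_ad(train_rows, k=2, coverage=0.90):
    """Per-cluster radius = smallest distance covering `coverage` of that cluster's points."""
    scaled, stats = standardize(train_rows)
    centroids, labels = kmeans(scaled, k)
    radii = []
    for j, c in enumerate(centroids):
        d = sorted(math.dist(p, c) for p, lab in zip(scaled, labels) if lab == j)
        radii.append(d[math.ceil(coverage * len(d)) - 1] if d else 0.0)
    return stats, centroids, radii


def in_domain(query, model):
    """A query is inside the AD if it lies within the radius of its nearest centroid."""
    stats, centroids, radii = model
    q = scale(query, stats)
    j = min(range(len(centroids)), key=lambda i: math.dist(q, centroids[i]))
    return math.dist(q, centroids[j]) <= radii[j]


# Toy training rows (pH, irradiation time in min); made-up values forming two groups.
train = [[3.0 + 0.1 * i, 30 + i] for i in range(10)] + [[9.0 - 0.1 * i, 150 - i] for i in range(10)]
model = fit_ad(train)
print(in_domain([3.5, 35], model))    # True: close to the first cluster
print(in_domain([6.0, 90.0], model))  # False: between the clusters, outside both radii
```

In practice the same check would run on the 18 selected features of Table 7, and a query returning False would be treated as an extrapolation.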
Afterwards, the selected feature space was used to perform a clustering calculation, in which the segmentation of the feature space enables us to define a measurable boundary that delineates the model's applicability domain. The Euclidean distance was chosen to measure the distance of each point to its cluster centroid. The boundaries of the applicability domain were determined using the 90 % confidence ellipse (prediction results with a 10 % standard deviation of the differences between the actual experimental and predicted photodegradation values were omitted). The results are shown in Table S2 in the Supplementary data. The range of values corresponding to the confidence zone of each feature is given in Table 8.

According to the clustering results, the model achieved good accuracy at ionization energies of 5.7328, 6.1132, 7.7558, 8.7656 and 9.393 eV. These ionization energy values are attributed to BaSnO3, SrSnO3, MgSnO3, CaSnO3, ZnSnO3 and their hydroxides, respectively.

EState_VSA5 and EState_VSA9 belong to the EState (electrotopological state) VSA family of descriptors, which sum the van der Waals surface area (VSA) of atoms whose EState index falls within a given range and are used in cheminformatics to characterize the electronic properties of molecules. MaxAbsPartialCharge, MaxPartialCharge, MinAbsPartialCharge and MinPartialCharge are associated with the distribution of electronic charge in a molecule: they denote the extreme values of the partial charges allocated to atoms within a molecular structure and thus reflect electronic properties that may affect molecular interactions, stability and reactivity. PEOE_VSA7, SlogP_VSA11 and SMR_VSA5 are likewise binned VSA descriptors, based respectively on Gasteiger (PEOE) partial charges, on atomic contributions to the octanol–water partition coefficient logP (a measure of hydrophobicity), and on atomic contributions to molar refractivity (related to polarizability). These descriptors can guide future predictions by helping to identify which contaminants beyond the 18 categories utilized in this study fall within the domain of applicability. A future prediction can be considered reliable when the query falls within the established boundaries of the AD (Table S2 in Supplementary data). Conversely, any prediction that falls outside the AD should be regarded as an extrapolation beyond the scope of the model and consequently treated as uncertain and less reliable.

5. Conclusion

A random forest machine learning model, named RF, was designed to accurately forecast the efficacy of several stannate and hydroxystannate photocatalysts in the degradation of a broad range of pollutants. Molecular fingerprints (MF) were used to encode the structures of the pollutants, whereas a mixed structural-compositional feature vector was used to capture the characteristics of the photocatalysts. The encoded data of the photocatalysts and pollutants were merged with the experimental parameters and fed into the ML models. A dataset including 14 photocatalyst subsets and 18 distinct contaminants was constructed and used to train and validate the RF and KNN models. The RF model outperformed KNN in predicting the degradation efficiency, as shown by the relatively strong correlation between the predicted and actual data (R2 = 0.9) and the low mean absolute error (MAE = 0.09) and root mean square error (RMSE = 0.129). The interpretability of the ML model was assessed by examining the significance of the various variables for the performance of the model through the computation of their SHAP values and distributions. The feature importance analysis revealed the impact of experimental factors on the ML model predictions, in agreement with the experimental observations. The RF model was used to showcase its practical use in selecting the most effective catalysts for removing pollutants. The existing machine learning model was also extended to forecast the behavior of additional photocatalysts. Additionally, an approach for re-training the model was suggested to enhance its overall effectiveness and applicability.

CRediT authorship contribution statement

Anouar Soltani: Writing – original draft, Visualization, Validation, Supervision, Software, Methodology, Investigation, Formal analysis, Data curation. Faiçal Djani: Supervision, Conceptualization. Yassine Abdesslam: Writing – review & editing, Software, Methodology, Investigation, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The authors would like to acknowledge the support provided by Mr. Kotiba Hamad, associate professor at Sungkyunkwan University, School of Advanced Materials Science & Engineering, South Korea, whose constructive comments helped improve the quality of this article.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.comptc.2024.115003.

Data availability

Data will be made available on request.

References

[1] B.S. Rathi, P.S. Kumar, D.-V.-N. Vo, Critical review on hazardous pollutants in water environment: occurrence, monitoring, fate, removal technologies and risk assessment, Sci. Total Environ. 797 (2021) 149134, https://doi.org/10.1016/j.scitotenv.2021.149134.
[2] J.C. Egbueri, J.C. Agbasi, D.A. Ayejoto, M.I. Khan, M.Y.A. Khan, Extent of anthropogenic influence on groundwater quality and human health-related risks: an integrated assessment based on selected physicochemical characteristics, Geocarto Int. 38 (1) (2023), https://doi.org/10.1080/10106049.2023.2210100.
[3] M. Owsianiak, et al., Ecotoxicity characterization of chemicals: global recommendations and implementation in USEtox, Chemosphere 310 (2023) 136807, https://doi.org/10.1016/j.chemosphere.2022.136807.

[4] E. Drakvik, et al., Statement on advancing the assessment of chemical mixtures and their risks for human health and the environment, Environ. Int. 134 (2020) 105267, https://doi.org/10.1016/j.envint.2019.105267.
[5] A.C. Bejarano, J.E. Adams, J. McDowell, T.F. Parkerton, M.L. Hanson, Recommendations for improving the reporting and communication of aquatic toxicity studies for oil spill planning, response, and environmental assessment, Aquat. Toxicol. 255 (2023) 106391, https://doi.org/10.1016/j.aquatox.2022.106391.
[6] H.A. Muhammed, A. Yahaya, S.S. Abdullahi, A.H. Jagaba, A.H. Birniwa, Mitigating water contamination by controlling anthropogenic activities of organochlorine pesticides (OCPs) for surface water quality assurance, Case Stud. Chem. Environ. Eng. 8 (2023) 100474, https://doi.org/10.1016/j.cscee.2023.100474.
[7] I.A. Saleh, N. Zouari, M.A. Al-Ghouti, Removal of pesticides from water and wastewater: chemical, physical and biological treatment approaches, Environ. Technol. Innov. 19 (2020) 101026, https://doi.org/10.1016/j.eti.2020.101026.
[8] S.F. Ahmed, et al., Recent developments in physical, biological, chemical, and hybrid treatment techniques for removing emerging contaminants from wastewater, J. Hazard. Mater. 416 (2021) 125912, https://doi.org/10.1016/j.jhazmat.2021.125912.
[9] R. Rashid, I. Shafiq, P. Akhter, M.J. Iqbal, M. Hussain, A state-of-the-art review on wastewater treatment techniques: the effectiveness of adsorption method, Environ. Sci. Pollut. Res. 28 (8) (2021) 9050–9066, https://doi.org/10.1007/s11356-021-12395-x.
[10] S. Wadhawan, A. Jain, J. Nayyar, S.K. Mehta, Role of nanomaterials as adsorbents in heavy metal ion removal from waste water: a review, J. Water Process Eng. 33 (2020) 101038, https://doi.org/10.1016/j.jwpe.2019.101038.
[11] A.K. Prajapati, S. Das, M.K. Mondal, Exhaustive studies on toxic Cr(VI) removal mechanism from aqueous solution using activated carbon of Aloe vera waste leaves, J. Mol. Liq. 307 (2020) 112956, https://doi.org/10.1016/j.molliq.2020.112956.
[12] A.K. Prajapati, M.K. Mondal, Hazardous As(III) removal using nanoporous activated carbon of waste garlic stem as adsorbent: kinetic and mass transfer mechanisms, Korean J. Chem. Eng. 36 (11) (2019) 1900–1914, https://doi.org/10.1007/s11814-019-0376-x.
[13] A. Chatterjee, S. Shamim, A.K. Jana, J.K. Basu, Insights into the competitive adsorption of pollutants on a mesoporous alumina–silica nano-sorbent synthesized from coal fly ash and a waste aluminium foil, RSC Adv. 10 (26) (2020) 15514–15522, https://doi.org/10.1039/D0RA01397H.
[14] U. Kumari, H. Siddiqi, M. Bal, B.C. Meikap, Calcium and zirconium modified acid activated alumina for adsorptive removal of fluoride: performance evaluation, kinetics, isotherm, characterization and industrial wastewater treatment, Adv. Powder Technol. 31 (5) (2020) 2045–2060, https://doi.org/10.1016/j.apt.2020.02.035.
[15] G.Y. Gor, P. Huber, N. Bernstein, Adsorption-induced deformation of nanoporous materials—a review, Appl. Phys. Rev. 4 (1) (2017), https://doi.org/10.1063/1.4975001.
[16] G. Saxena, R. Bharagava, Organic and inorganic pollutants in industrial wastes, in: Environmental Pollutants and Their Bioremediation Approaches, CRC Press, 2017, pp. 23–56, https://doi.org/10.1201/9781315173351-3.
[17] M. Ibrahim, A. Siddique, L. Verma, J. Singh, J.R. Koduru, Adsorptive removal of fluoride from aqueous solution by biogenic iron permeated activated carbon derived from sweet lime waste, Acta Chim. Slov. (2019) 123–136, https://doi.org/10.17344/acsi.2018.4717.
[18] S.O. Ganiyu, C.A. Martínez-Huitle, M.A. Oturan, Electrochemical advanced oxidation processes for wastewater treatment: advances in formation and detection of reactive species and mechanisms, Curr. Opin. Electrochem. 27 (2021) 100678, https://doi.org/10.1016/j.coelec.2020.100678.
[19] D. Ghernaout, N. Elboughdiri, S. Ghareba, Fenton technology for wastewater treatment: dares and trends, Oalib 07 (01) (2020) 1–26, https://doi.org/10.4236/oalib.1106045.
[20] Y. Guo, et al., Modelling of emerging contaminant removal during heterogeneous catalytic ozonation using chemical kinetic approaches, J. Hazard. Mater. 380 (2019) 120888, https://doi.org/10.1016/j.jhazmat.2019.120888.
[21] K.P. Gopinath, N.V. Madhav, A. Krishnan, R. Malolan, G. Rangarajan, Present applications of titanium dioxide for the photocatalytic removal of pollutants from water: a review, J. Environ. Manage. 270 (2020) 110906, https://doi.org/10.1016/j.jenvman.2020.110906.
[22] P. Shandilya, S. Sambyal, R. Sharma, P. Mandyal, B. Fang, Properties, optimized morphologies, and advanced strategies for photocatalytic applications of WO3 based photocatalysts, J. Hazard. Mater. 428 (2022) 128218, https://doi.org/10.1016/j.jhazmat.2022.128218.
[23] F. Güell, et al., ZnO-based nanomaterials approach for photocatalytic and sensing applications: recent progress and trends, Mater. Adv. 4 (17) (2023) 3685–3707, https://doi.org/10.1039/D3MA00227F.
[24] G. Ramanathan, K.R. Murali, Photocatalytic activity of SnO2 nanoparticles, J. Appl. Electrochem. 52 (5) (2022) 849–859, https://doi.org/10.1007/s10800-022-01676-z.
[25] O.V. Nkwachukwu, O.A. Arotiba, Perovskite oxide–based materials for photocatalytic and photoelectrocatalytic treatment of water, Front. Chem. 9 (2021), https://doi.org/10.3389/fchem.2021.634630.
[26] M. Paszkiewicz-Gawron, et al., Stannates, titanates and tantalates modified with carbon and graphene quantum dots for enhancement of visible-light photocatalytic activity, Appl. Surf. Sci. 541 (2021) 148425, https://doi.org/10.1016/j.apsusc.2020.148425.
[27] J. Huang, et al., Size-controlled synthesis of porous ZnSnO3 cubes and their gas-sensing and photocatalysis properties, Sens. Actuators B Chem. 171–172 (2012) 572–579, https://doi.org/10.1016/j.snb.2012.05.036.
[28] W.-H. Pan, W.-J. Yang, C.-X. Wei, L.-Y. Hao, H.-D. Lu, W. Yang, Recent advances in zinc hydroxystannate-based flame retardant polymer blends, Polymers (Basel) 14 (11) (2022) 2175, https://doi.org/10.3390/polym14112175.
[29] N. Kumar, U. Jung, B. Jung, J. Park, M. Naushad, Zinc hydroxystannate/zinc-tin oxide heterojunctions for the UVC-assisted photocatalytic degradation of methyl orange and tetracycline, Environ. Pollut. 316 (2023) 120353, https://doi.org/10.1016/j.envpol.2022.120353.
[30] G. Gnanamoorthy, V.K. Yadav, D. Latha, V. Karthikeyan, V. Narayanan, Enhanced photocatalytic performance of ZnSnO3/rGO nanocomposite, Chem. Phys. Lett. 739 (2020), https://doi.org/10.1016/j.cplett.2019.137050.
[31] J. Joseph, S.B. Saseendran, S.R. Achary, A.A. Sukumaran, M.K. Jayaraj, Zinc stannate flakes for optoelectronic and antibacterial applications, 2019, p. 030026, https://doi.org/10.1063/1.5112865.
[32] L.M.C. Honorio, M.V.B. Santos, E.C. da Silva Filho, J.A. Osajima, A.S. Maia, I.M.G. dos Santos, Alkaline earth stannates applied in photocatalysis: prospection and review of literature, Cerâmica 64 (372) (2018) 559–569, https://doi.org/10.1590/0366-69132018643722480.
[33] A.K. Ganguli, G.B. Kunde, W. Raza, S. Kumar, P. Yadav, Assessment of performance of photocatalytic nanostructured materials with varied morphology based on reaction conditions, Molecules 27 (22) (2022) 7778, https://doi.org/10.3390/molecules27227778.
[34] H. Tao, T. Wu, M. Aldeghi, T.C. Wu, A. Aspuru-Guzik, E. Kumacheva, Nanoparticle synthesis assisted by machine learning, Nat. Rev. Mater. 6 (8) (2021) 701–716, https://doi.org/10.1038/s41578-021-00337-5.
[35] G. Hautier, C. Fischer, V. Ehrlacher, A. Jain, G. Ceder, Data mined ionic substitutions for the discovery of new compounds, Inorg. Chem. 50 (2) (2011) 656–663, https://doi.org/10.1021/ic102031h.
[36] C.L. Phillips, G.A. Voth, Discovering crystals using shape matching and machine learning, Soft Matter 9 (35) (2013) 8552, https://doi.org/10.1039/c3sm51449h.
[37] P. Raccuglia, et al., Machine-learning-assisted materials discovery using failed experiments, Nature 533 (7601) (2016) 73–76, https://doi.org/10.1038/nature17439.
[38] B. Meredig, et al., Combinatorial screening for new materials in unconstrained composition space with machine learning, Phys. Rev. B 89 (9) (2014) 094104, https://doi.org/10.1103/PhysRevB.89.094104.
[39] G. Hautier, C.C. Fischer, A. Jain, T. Mueller, G. Ceder, Finding nature's missing ternary oxide compounds using machine learning and density functional theory, Chem. Mater. 22 (12) (2010) 3762–3767, https://doi.org/10.1021/cm100795d.
[40] G.V.S.M. Carrera, L.C. Branco, J. Aires-de-Sousa, C.A.M. Afonso, Exploration of quantitative structure–property relationships (QSPR) for the design of new guanidinium ionic liquids, Tetrahedron 64 (9) (2008) 2216–2224, https://doi.org/10.1016/j.tet.2007.12.021.
[41] D. Farrusseng, F. Clerc, C. Mirodatos, R. Rakotomalala, Virtual screening of materials using neuro-genetic approach: concepts and implementation, Comput. Mater. Sci. 45 (1) (2009) 52–59, https://doi.org/10.1016/j.commatsci.2008.03.060.
[42] I. Salahshoori, M. Namayandeh Jorabchi, A. Baghban, H.A. Khonakdar, Integrative analysis of multi machine learning models for tetracycline photocatalytic degradation with MOFs in wastewater treatment, Chemosphere 350 (2024) 141010, https://doi.org/10.1016/j.chemosphere.2023.141010.
[43] Z.H. Jaffari, et al., Machine learning approaches to predict the photocatalytic performance of bismuth ferrite-based materials in the removal of malachite green, J. Hazard. Mater. 442 (2023) 130031, https://doi.org/10.1016/j.jhazmat.2022.130031.
[44] A. Esmaeili, et al., Pharmaceutical wastewater treatment using TiO2 nanosheets deposited by cobalt co-catalyst as hybrid photocatalysts: combined experimental study and artificial intelligence modeling, Chem. Prod. Process Model. 18 (4) (2023) 611–631, https://doi.org/10.1515/cppm-2022-0070.
[45] F.-S. Tabatabai-Yazdi, A. Ebrahimian Pirbazari, F. Esmaeili Khalil Saraei, N. Gilani, Construction of graphene based photocatalysts for photocatalytic degradation of organic pollutant and modeling using artificial intelligence techniques, Physica B Condens. Matter 608 (2021) 412869, https://doi.org/10.1016/j.physb.2021.412869.
[46] A. Esmaeili, et al., CdS nanocrystallites sensitized ZnO nanosheets for visible light induced sonophotocatalytic/photocatalytic degradation of tetracycline: from experimental results to a generalized model based on machine learning methods, Chemosphere 332 (2023) 138852, https://doi.org/10.1016/j.chemosphere.2023.138852.
[47] N. Esmaeili, et al., Estimation of 2,4-dichlorophenol photocatalytic removal using different artificial intelligence approaches, Chem. Prod. Process Model. 18 (2) (2023) 247–263, https://doi.org/10.1515/cppm-2021-0065.
[48] C.-M. Kim, Z.H. Jaffari, A. Abbas, M.F. Chowdhury, K.H. Cho, Machine learning analysis to interpret the effect of the photocatalytic reaction rate constant (k) of semiconductor-based photocatalysts on dye removal, J. Hazard. Mater. 465 (2024) 132995, https://doi.org/10.1016/j.jhazmat.2023.132995.
[49] Z. Jiang, J. Hu, M. Tong, A.C. Samia, H. (Judy) Zhang, X. (Bill) Yu, A novel machine learning model to predict the photo-degradation performance of different photocatalysts on a variety of water contaminants, Catalysts 11 (9) (2021) 1107, https://doi.org/10.3390/catal11091107.
[50] L. Chen, et al., Optimization and comparison of machine learning methods in estimation of carbon dioxide loading in chemical solvents for environmental applications, J. Mol. Liq. 349 (2022) 118513, https://doi.org/10.1016/j.molliq.2022.118513.
[51] T. Pereira, M. Abbasi, B. Ribeiro, J.P. Arrais, Diversity oriented Deep Reinforcement Learning for targeted molecule generation, J. Cheminform. 13 (1) (2021) 21, https://doi.org/10.1186/s13321-021-00498-z.
[52] T.B. Dunn, et al., Diversity and chemical library networks of large data sets, J. Chem. Inf. Model. 62 (9) (2022) 2186–2201, https://doi.org/10.1021/acs.jcim.1c01013.
[53] M. Pavel, C. Anastasescu, R.-N. State, A. Vasile, F. Papa, I. Balint, Photocatalytic degradation of organic and inorganic pollutants to harmless end products: assessment of practical application potential for water and air cleaning, Catalysts 13 (2) (2023) 380, https://doi.org/10.3390/catal13020380.
[54] N. Li, C. Wang, K. Zhang, H. Lv, M. Yuan, D.W. Bahnemann, Progress and prospects of photocatalytic conversion of low-concentration NO, Chin. J. Catal. 43 (9) (2022) 2363–2387, https://doi.org/10.1016/S1872-2067(22)64139-1.
[55] A.K. Prajapati, M.K. Mondal, Comprehensive kinetic and mass transfer modeling for methylene blue dye adsorption onto CuO nanoparticles loaded on nanoporous activated carbon prepared from waste coconut shell, J. Mol. Liq. 307 (2020) 112949, https://doi.org/10.1016/j.molliq.2020.112949.
[56] E. Dhanaraman, A. Verma, P. Chen, N. Chen, Y. Siddiqui, Y. Fu, Bi2WO6 incorporation of g-C3N4 to enhance the photocatalytic N2 reduction reaction and antibiotic pollutants removal, Sol. RRL (2024), https://doi.org/10.1002/solr.202300981.
[57] S. Wu, M. Li, L. Xin, H. Long, X. Gao, Simultaneously photocatalytic removal of Cr(VI) and metronidazole by asynchronous cross-linked modified sodium alginate, J. Ind. Eng. Chem. 116 (2022) 339–350, https://doi.org/10.1016/j.jiec.2022.09.024.
[58] H. Cheng, J. Wang, Y. Zhao, X. Han, Effect of phase composition, morphology, and specific surface area on the photocatalytic activity of TiO2 nanomaterials, RSC Adv. 4 (87) (2014) 47031–47038, https://doi.org/10.1039/C4RA05509H.
[59] K. Zhang, J. Wang, R. Ninakanti, S.W. Verbruggen, Solvothermal synthesis of mesoporous TiO2 with tunable surface area, crystal size and surface hydroxylation for efficient photocatalytic acetaldehyde degradation, Chem. Eng. J. 474 (2023) 145188, https://doi.org/10.1016/j.cej.2023.145188.
[60] Z. Liang, X. Zhuang, Z. Tang, Q. Deng, H. Li, W. Kang, High-crystalline polymeric carbon nitride flake composed porous nanotubes with significantly improved photocatalytic water splitting activity: the optimal balance between crystallinity and surface area, Chem. Eng. J. 432 (2022) 134388, https://doi.org/10.1016/j.cej.2021.134388.
[61] C. Liu, et al., Controlled synthesis and structure tunability of photocatalytically active mesoporous metal-based stannate nanostructures, Appl. Surf. Sci. 296 (2014) 53–60, https://doi.org/10.1016/j.apsusc.2014.01.030.
[62] Z. Xue, et al., Low temperature synthesis of SnSr(OH)6 nanoflowers and photocatalytic performance for organic pollutants, Int. J. Mater. Res. 113 (1) (2022) 80–90, https://doi.org/10.1515/ijmr-2021-8333.
[63] A.H. Navidpour, A. Hosseinzadeh, Z. Huang, D. Li, J.L. Zhou, Application of machine learning algorithms in predicting the photocatalytic degradation of perfluorooctanoic acid, Catal. Rev. (2022) 1–26, https://doi.org/10.1080/01614940.2022.2082650.
