The Status of ML Algorithms for Structure-property Relationships Using Matbench as a Test Protocol

Anubhav Jain
Lawrence Berkeley National Laboratory
TMS Spring 2022, March 2022
Slides (already) posted to hackingmaterials.lbl.gov

ML is quickly becoming a standard tool for materials screening
[Figure labels: Machine learning; High-throughput DFT; Expensive calculation; Experiment; Millions of candidates]

There are many new algorithms being published for ML in materials, with new ones constantly reported!

Q: Which one is the “best” based on the literature?
A: Can’t tell! They’re nearly all done on different data.

Difficulty of comparing ML algorithms
[Figure labels: Data set used in study A; Data set used in study B; Data set used in study C]
• Different data sets
  • Source (e.g., OQMD vs. MP)
  • Quantity (e.g., MP 2018 vs. MP 2019)
  • Subset / data filtering (e.g., ehull < X)
• Different evaluation metrics
  • Test set vs. cross-validation?
  • Different test set fraction?
• Often no runnable version of a published algorithm
Example: "MAE, 5-fold CV = 0.102 eV" vs. "RMSE, test set = 0.098 eV". Which is better? The two scores cannot be compared directly.
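To illustrate why scores reported with different metrics cannot be ranked against each other, the short sketch below computes MAE and RMSE for the same set of prediction errors. The residual values are made up for illustration and do not come from any study; the only general fact relied on is that RMSE is never smaller than MAE for the same residuals.

```python
import math


def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root-mean-square error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical residuals (eV) from one model on one test set.
residuals = [0.02, -0.05, 0.10, -0.01, 0.30]

# The same predictions yield two different headline numbers.
print(f"MAE  = {mae(residuals):.3f} eV")
print(f"RMSE = {rmse(residuals):.3f} eV")
```

Because a single large residual inflates RMSE more than MAE, a model reported by RMSE can look worse than one reported by MAE even when its predictions are identical.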

What’s needed: an “ImageNet” for materials science
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/

What does a standard data set do for a field?
One of the reasons computer science / machine learning seems to advance so quickly is that they decouple data generation from algorithm development.
This allows groups to focus on algorithm development without all the data generation, data cleaning, etc. that often make up the majority of an end-to-end data science project.

The ingredients of the Matbench benchmark
☐ Standard data sets
☐ Standard test splits according to nested cross-validation procedure
☐ An online leaderboard that encourages reproducible results

How to design good data sets for materials science?
• There is no single type of problem that materials scientists are trying to solve
• For now, focus on materials property prediction (from structure or composition)
• We want a test set that contains a diverse array of problems
  • Smaller data versus larger data
  • Different applications (electronic, mechanical, etc.)
  • Composition-only or structure information available
  • Experimental vs. ab initio
  • Classification or regression

Matbench includes 13 different ML tasks
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.

The tasks encompass a variety of problems
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.

✓ Standard data sets
☐ Standard test splits according to nested cross-validation procedure
☐ An online leaderboard that encourages reproducible results

The most common method: a single hold-out test set
• Training/validation is used for model selection
• Test/hold-out is used only for error estimation (i.e., final score)
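A minimal sketch of the single hold-out procedure, assuming a toy one-dimensional data set and a k-nearest-neighbour regressor. Both are chosen only to keep the example self-contained; neither is part of Matbench.

```python
import random
from statistics import mean


def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:n_neighbors]
    return mean(y for _, y in nearest)


def mae(train, points, n_neighbors):
    """Mean absolute error of the k-NN model on `points`."""
    return mean(abs(knn_predict(train, x, n_neighbors) - y) for x, y in points)


# Hypothetical noisy linear data: y = 2x + noise.
rng = random.Random(0)
data = [(i / 10, 2.0 * (i / 10) + rng.gauss(0, 0.2)) for i in range(100)]
rng.shuffle(data)

# One fixed split: 60% training, 20% validation, 20% hold-out test.
train, validation, test = data[:60], data[60:80], data[80:]

# Model selection looks only at the training and validation data.
best_n = min((1, 3, 5, 9), key=lambda n: mae(train, validation, n))

# The hold-out test set is used once, for the final error estimate.
test_mae = mae(train + validation, test, best_n)
print(f"selected n_neighbors={best_n}, hold-out MAE={test_mae:.3f}")
```

The reported score depends on which 20% of the data happened to land in the hold-out set, which is the weakness that varying the hold-out set addresses.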

Nested CV as a standard scoring metric
Nested CV is like hold-out, but varies the hold-out set. Think of it as N different “universes”: we have a different training + validation of the model in each universe and a different hold-out.
“A nested CV procedure provides an almost unbiased estimate of the true error.”
Varma and Simon, Bias in error estimation when using cross-validation for model selection (2006)
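The “universes” picture can be sketched in a few lines. This is an illustrative implementation under stated assumptions (toy 1-D data, a k-nearest-neighbour regressor, 5 outer and 3 inner folds), not the splitting code Matbench itself uses.

```python
import random
from statistics import mean


def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:n_neighbors]
    return mean(y for _, y in nearest)


def mae(train, points, n_neighbors):
    return mean(abs(knn_predict(train, x, n_neighbors) - y) for x, y in points)


def split_folds(n_items, n_folds, seed):
    """Shuffle the indices 0..n_items-1 and deal them into n_folds groups."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def without(items, excluded_indices):
    excluded = set(excluded_indices)
    return [item for j, item in enumerate(items) if j not in excluded]


def select_model(train_val, candidates, n_inner, seed):
    """Inner loop: pick the hyperparameter with the lowest inner-CV MAE."""
    folds = split_folds(len(train_val), n_inner, seed)

    def cv_error(n_neighbors):
        return mean(
            mae(without(train_val, fold), [train_val[j] for j in fold], n_neighbors)
            for fold in folds
        )

    return min(candidates, key=cv_error)


def nested_cv(data, candidates=(1, 3, 5, 9), n_outer=5, n_inner=3):
    """Outer loop: each fold is one 'universe' with its own hold-out set."""
    scores = []
    for i, fold in enumerate(split_folds(len(data), n_outer, seed=0)):
        hold_out = [data[j] for j in fold]
        train_val = without(data, fold)
        best = select_model(train_val, candidates, n_inner, seed=i + 1)
        scores.append(mae(train_val, hold_out, best))
    return scores


# Hypothetical noisy linear data: y = 2x + noise.
rng = random.Random(0)
data = [(i / 10, 2.0 * (i / 10) + rng.gauss(0, 0.2)) for i in range(100)]
fold_scores = nested_cv(data)
print(f"per-fold MAE: {[round(s, 3) for s in fold_scores]}")
print(f"mean MAE over folds: {mean(fold_scores):.3f}")
```

Model selection never sees the hold-out fold of its own universe, and every sample serves as hold-out data exactly once.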

✓ Standard test splits according to nested cross-validation procedure
☐ An online leaderboard that encourages reproducible results

Matbench Website – now complete!
https://matbench.materialsproject.org

Matbench compares ML algorithms
[Figure: comparison of ML algorithms across tasks; axis annotations: “Bigger datasets” and “Better relative performance”]

Access to Datasets/ML tasks
• Interactively, via Materials Project: ml.materialsproject.org
• Programmatically via matbench in Python (2 lines; loads all 13 tasks)
• Programmatically via matminer in Python (2 lines)
• Direct download, via matbench.materialsproject.org
Preferred/easiest method!
https://github.com/hackingmaterials/matminer
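The code shown on the original slide is not part of this transcript. As a rough sketch of what programmatic access can look like, the helpers below assume the matbench and matminer packages are installed and use `MatbenchBenchmark` and `load_dataset` as described in those packages' public documentation; the function names and the default dataset name `matbench_expt_gap` are choices made for this example. The imports sit inside the functions so that nothing is downloaded until a helper is called.

```python
def load_all_tasks():
    """Load every Matbench task through the matbench package (downloads data)."""
    # Assumes `pip install matbench`.
    from matbench.bench import MatbenchBenchmark

    return MatbenchBenchmark(autoload=True)


def load_one_dataset(name="matbench_expt_gap"):
    """Load a single Matbench dataset as a DataFrame through matminer."""
    # Assumes `pip install matminer`.
    from matminer.datasets import load_dataset

    return load_dataset(name)
```

Calling `load_one_dataset()` is the lighter option when only one task's data is needed.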

Programmatic Access and Analysis of Submissions
• Run a benchmark on your own algorithm in ~10 lines of code
• Run on any combination or all of the 13 existing tasks
• If your entry outperforms an existing entry, submit your algorithm in a pull request!
• Existing notebooks/code and software requirements for reproducing any benchmark, e.g.:
  {'python': [['crabnet==1.2.1', 'scikit_learn==1.0.2', 'matbench==0.5']]}
• Comprehensive raw data on all benchmarks (accessible via the matbench Python package or any JSON-capable language), publicly available to anyone!
• In-depth performance metrics for individual ML tasks for all submissions, both visually on the website and programmatically
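A sketch of what “~10 lines of code” can look like, following the task/fold loop described in the matbench package documentation (`task.load`, `task.folds`, `task.get_train_and_val_data`, `task.get_test_data`, `task.record`). The `run_benchmark` wrapper and the `make_model` factory are hypothetical names introduced here; the model is assumed to expose scikit-learn style `fit` and `predict` methods.

```python
def run_benchmark(benchmark, make_model):
    """Train and record one fresh model per fold of every task.

    `benchmark` is expected to behave like matbench's MatbenchBenchmark.
    `make_model` is a hypothetical factory returning an object with
    fit(inputs, targets) and predict(inputs) methods.
    """
    for task in benchmark.tasks:
        task.load()
        for fold in task.folds:
            # Training + validation data for this fold.
            train_inputs, train_targets = task.get_train_and_val_data(fold)
            model = make_model()
            model.fit(train_inputs, train_targets)

            # Test inputs only; the targets stay hidden from the model.
            test_inputs = task.get_test_data(fold, include_target=False)
            task.record(fold, model.predict(test_inputs))
    return benchmark


# Usage sketch (requires the matbench package and downloads data):
#   from matbench.bench import MatbenchBenchmark
#   mb = MatbenchBenchmark(autoload=False, subset=["matbench_expt_gap"])
#   run_benchmark(mb, MyModel)            # MyModel is your own class
#   mb.to_file("my_results.json.gz")      # file to include in a pull request
```

Passing the benchmark object in as an argument keeps the loop independent of which subset of the 13 tasks is selected.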

✓ Standard test splits according to nested cross-validation procedure
✓ An online leaderboard that encourages reproducible results

What algorithms have been tested on the matbench data set so far?
• Magpie + sine Coulomb matrix random forest (feature-based random forests)
  • Ward, L.; Agrawal, A.; Choudhary, A.; et al. A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials. npj Comput Mater 2016, 2, 16028. https://doi.org/10.1038/npjcompumats.2016.28
  • Faber, F.; et al. Crystal Structure Representations for Machine Learning Models of Formation Energies. International Journal of Quantum Chemistry 2015, 115 (16), 1094–1101.
• Automatminer (feature-based AutoML)
  • Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138.
• CGCNN (graph neural network)
  • Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301.
• MEGNet (graph neural network)
  • Chen, C.; Ye, W.; Zuo, Y.; Zheng, C.; Ong, S. P. Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chemistry of Materials 2019, 31 (9), 3564–3572.
• MODNet (feature-based neural network)
  • De Breuck, P.-P.; Evans, M. L.; Rignanese, G.-M. Robust Model Benchmarking and Bias-Imbalance in Data-Driven Materials Science: A Case Study on MODNet. arXiv:2102.02263 [cond-mat] 2021.
• CrabNet (attention-based composition neural network)
  • Wang, A.; Kauwe, S.; Murdock, R.; Sparks, T. Compositionally-Restricted Attention-Based Network for Materials Property Prediction; ChemRxiv, 2020. https://doi.org/10.26434/chemrxiv.11869026.v1.
• ALIGNN (graph neural network with bond angles)
  • Choudhary, K.; DeCost, B. Atomistic Line Graph Neural Network for Improved Materials Property Predictions. npj Computational Materials 2021, 7 (1), 1–8.

Insights from standardized comparisons
• Originally, we found traditional “hand-crafted” feature models generally performed best when the dataset size n < 10^4
• So it seemed matsci data (typically small datasets, esp. experimental) was best modelled by traditional ML/feature methods, e.g. Random Forest
• Clever developments in neural networks have improved GNN models on smaller datasets, in part powered by competition on the Matbench leaderboard
• A standardized platform has made it easier to identify techniques that work well for certain problems, and those that do not

Errors Predicting Final Phonon DOS Peak Frequencies

Algorithm               Mean MAE (cm^-1)   Mean RMSE (cm^-1)   Maximum max_error (cm^-1)
ALIGNN (2022)           29.5385            53.501              615.3466
MODNet v0.1.10 (2021)   38.7524            78.222              1031.8168
CrabNet (2021)          55.1114            138.3775            1452.7562
AMMExpress (2020)       56.1706            109.7048            1151.557
CGCNN (2019)            57.7635            141.7018            2504.8743

[Figure: Mean absolute error (mean ± standard deviation of MAE) predicting final PhDOS peaks; annotations: Structural GNN (2022), Composition GNN (2021), SoTA early 2020]

Same data, same test; so, why are some algorithms best?
• ALIGNN: Incorporation of bond angle into crystal graph
• Bond angle/local env importance for vibrational properties?
• Matbench enables these sorts of “instant” ablation studies
Errors Predicting Expt. E_gap

Algorithm            Mean MAE (eV)   Std. MAE (eV)   Mean RMSE (eV)
CrabNet              0.3463          0.0088          0.8504
MODNet (v0.1.10)     0.347           0.0222          0.7437
CrabNet v1.2.1       0.3757          0.0207          0.8805
AMMExpress v2020     0.4161          0.0194          0.9918

[Figure: Mean absolute error (mean ± standard deviation of MAE) predicting expt. E_gap; annotations: Composition GNN, Traditional Features + Encoding/selection, SoTA early 2020]

Same data, same test; so, why are some algorithms best?
• CrabNet: Importance of attention mechanism for compositional props.; low variability across folds
• MODNet: Normalized Mutual Information feature selection results in high performance, at the risk of higher variability across folds

Improvements to Materials ML Benchmarks
Two directions: standardized uncertainty quantification (UQ), and more datasets + better tasks!
Standardized Uncertainty Quantification:
• ML-driven materials design is improved by UQ of each prediction
• Enables adaptive design
• Practical: modern models (e.g., MODNet) produce UQ estimates naturally
• Useful: can analyze UQ to tell us how often samples' true values actually fall outside the UQ range
• In progress: coming soon to the matbench package!
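The “how often do true values fall outside the UQ range” check can be sketched as a simple coverage calculation. The function name, the choice of a symmetric prediction ± n_sigma × std interval, and the numbers are all assumptions made for illustration; this is not the planned matbench UQ interface.

```python
def fraction_outside(y_true, y_pred, y_std, n_sigma=1.0):
    """Fraction of samples whose true value lies outside pred ± n_sigma * std."""
    outside = sum(
        1
        for true, pred, std in zip(y_true, y_pred, y_std)
        if abs(true - pred) > n_sigma * std
    )
    return outside / len(y_true)


# Hypothetical predictions with a constant uncertainty estimate of 0.2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.5, 2.9, 4.0]
y_std = [0.2, 0.2, 0.2, 0.2]

print(fraction_outside(y_true, y_pred, y_std))
print(fraction_outside(y_true, y_pred, y_std, n_sigma=3.0))
```

For well-calibrated Gaussian uncertainties, roughly 32% of samples are expected to fall outside the ±1σ range, so a much larger or smaller fraction hints at over- or under-confident estimates.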
More Datasets + Better Tasks:
• Impossible to represent the full field of materials design in a single set of benchmarks
• However… can we come close? Aim to include a wider variety of properties and sources:
  • Expt. load-dependent Vickers hardness
  • Expt. superconductor Tc
  • Expt. ΔH_f from crystal structure
  • Expt. UV-Vis measurements of metal oxides
• Unique, domain-specific procedures for each task
  • For example: segregation of CV samples into clusters based on structure/composition (LOCOCV)
• Evaluation procedures which most closely resemble real-world usage of these algorithms in the most computationally feasible fashion
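A minimal sketch of the leave-one-cluster-out idea (LOCOCV), assuming each sample has already been assigned a cluster label based on its structure or composition. How the clusters are obtained is outside this example, and the labels below are hypothetical.

```python
def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_indices, test_indices) per cluster."""
    for held_out in sorted(set(cluster_labels)):
        test_indices = [i for i, c in enumerate(cluster_labels) if c == held_out]
        train_indices = [i for i, c in enumerate(cluster_labels) if c != held_out]
        yield held_out, train_indices, test_indices


# Hypothetical cluster assignment for five samples.
labels = ["oxide", "oxide", "nitride", "sulfide", "nitride"]

for cluster, train_indices, test_indices in leave_one_cluster_out(labels):
    print(f"hold out {cluster}: train={train_indices} test={test_indices}")
```

Unlike a random split, every test fold contains only materials from a cluster the model has never seen, which is closer to predicting properties for a new chemical family.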

Conclusions and future
• As the community increasingly develops new algorithms for machine learning materials properties, a standard way to test these algorithms is needed
• Matbench represents such a standard and allows you to test your algorithms against others
• Matbench also allows us to measure overall progress in the field
• We hope to see you on the leaderboard!

Acknowledgements
Alex Dunn (lead developer)
Qi Wang
Alex Ganose
Daniel Dopp
