
Nima Shoghi
ML PhD Student
Atlanta, GA
I'm a PhD student in Machine Learning at Georgia Tech, where I focus on deep learning for scientific applications under the guidance of Dr. Pan Li and Dr. Victor Fung. I earned my B.S. and M.S. degrees in Computer Science from Georgia Tech, during which I conducted research at the High Performance Computer Architecture Lab on accelerating ML training and inference. Before starting my PhD, I completed a two-year AI residency on Meta AI's FAIR Chemistry team, where I developed large pre-trained models for general-purpose chemical property prediction, trained on a diverse mixture of chemical data spanning multiple domains. My research interests lie in the development and application of deep learning techniques to challenging problems in science and engineering. I am particularly excited about deep learning's potential to accelerate discovery and understanding in fields like chemistry and climate science.
Recent Updates
Started a Research Scientist Internship at ByteDance Research (ByteDance Seed).
Our paper on MatterTune is now available on arXiv!
Gave invited talks on pre-training for chemistry at various venues, including the 2024 Machine Learning for Materials and Molecular Discoveries Symposium in Gothenburg, the AI for Science Institute (AISI) Beijing, King Abdullah University of Science and Technology (KAUST), and SES AI.
Started my PhD in Machine Learning at Georgia Tech with Dr. Pan Li and Dr. Victor Fung.
Wrote a blog post on From Molecules to Materials for Valence Labs.
Our paper on large-scale diverse pre-training for chemical property prediction was accepted to ICLR 2024!
Featured Publications
Selected research publications
From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
Nima Shoghi, Adeesh Kolluru, John Kitchin, Zachary Ulissi, C. Lawrence Zitnick, Brandon Wood
Introduces Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that leverages diverse data to advance atomic property prediction across chemical domains, achieving state-of-the-art performance on 34 out of 40 downstream tasks.
The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts
Richard Tran, Janice Lan, ..., Nima Shoghi, ..., C. Lawrence Zitnick
Introduces the Open Catalyst 2022 (OC22) dataset, consisting of 62,331 DFT relaxations, to accelerate machine learning for oxide electrocatalysts and establish benchmarks for the field.
MatterTune: An Integrated, User-Friendly Platform for Fine-Tuning Atomistic Foundation Models to Accelerate Materials Simulation and Discovery
Lingyu Kong, Nima Shoghi, Guoxiang Hu, Pan Li, Victor Fung
Introduces MatterTune, a modular platform that enables fine-tuning of pre-trained atomistic foundation models for materials science applications.
Experience
Selected professional and research experience

Research Scientist Intern, AI for Science, ByteDance Research (ByteDance Seed)
- Developed foundation spatio-temporal models for protein dynamics, producing realistic conformational ensembles and long-horizon trajectories to support drug-discovery use cases such as mechanism insight and pathway exploration.
- Adapted video diffusion models with history-aware temporal attention and noise-aware training, improving long-horizon stability and robustness and achieving state-of-the-art quality, diversity, and coverage on key benchmarks.
- Built an end-to-end pipeline (protein-dynamics generation → physics-based relaxation with a simulator → quality/diversity evaluation), operating at scale with distributed multi-GPU training and backtesting-style analysis.

AI Resident, FAIR Chemistry Team, Meta AI
- Developed large foundation models for atomic property prediction, pre-trained on data from diverse chemical domains, and fine-tuned them to achieve state-of-the-art results on 34 out of 40 downstream tasks. (ICLR 2024)
- Contributed to the development of a transfer learning approach using Graph Neural Networks to generalize models across domains in molecular and catalyst discovery, reducing the need for large, domain-specific datasets. (J Chem Phys 2022)
- Benchmarked state-of-the-art machine learning interatomic potentials on the Open Catalyst 2022 dataset, one of the largest datasets for catalyst discovery. (ACS Catalysis 2023)

Machine Learning Intern
- Developed transformer models pre-trained on approximately 500,000 time-series data points from manufacturing processes to predict process outcomes and detect anomalies, improving prediction accuracy on real-world manufacturing datasets.
Education

PhD in Machine Learning, Georgia Tech, 4.0 GPA
- Advisors: Dr. Pan Li and Dr. Victor Fung
- Research Interest: Developing ML techniques to solve complex problems in science and engineering.

M.S. in Computer Science (ML Focus), Georgia Tech, 4.0 GPA
