ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
arXiv preprint arXiv:2502.05567, 2025
Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvement is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Distinct from prior approaches, ATLAS begins with a concept repository, accelerates the improvement of the student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. Running ATLAS for 10 iterations, we construct an undergraduate-level dataset comprising 117k theorem statements and develop the ATLAS Translator, which demonstrates statistically significant improvements over both the HERALD Translator and the Kimina-Autoformalizer across all benchmarks (two-sided t-test), achieving a new state of the art. The datasets, model, and code will be released to the public soon.
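As an illustration only: the abstract does not name the target formal language, but both HERALD and Kimina-Autoformalizer target Lean 4 with Mathlib, so the sketch below assumes that setting. It shows the kind of informal-to-formal theorem-statement pair such a parallel corpus contains; the theorem name and statement are hypothetical examples, not drawn from the ATLAS dataset.

```lean
import Mathlib

/-- Informal statement: "For every natural number n, the sum of the first n
odd numbers equals n squared." The formal statement below is a hand-written,
illustrative counterpart. The corpus pairs statements rather than proofs, so
the proof is elided with `sorry`. -/
theorem sum_first_odd_numbers (n : ℕ) :
    ∑ i ∈ Finset.range n, (2 * i + 1) = n ^ 2 := by
  sorry
```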