ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

Liu, Xiaoyang; Bao, Kangjie; Zhang, Jiashuo; Liu, Yunqi; Liu, Yuntian; Chen, Yu; Jiao, Yang; Luo, Tao

arXiv:2502.05567 (cs)

[Submitted on 8 Feb 2025 (v1), last revised 19 May 2025 (this version, v2)]

Abstract: Autoformalization, the automatic translation of mathematical content from natural language into machine-verifiable formal languages, has seen significant progress driven by advances in large language models (LLMs). Nonetheless, a primary barrier to further improvement is the limited availability of parallel corpora that map informal mathematical text to its formal counterpart. To address this limitation, we propose ATLAS (Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data), a novel data generation framework designed to produce large-scale, high-quality parallel corpora of theorem statements. Unlike prior approaches, ATLAS begins with a concept repository, accelerates the improvement of the student model through expert iteration combined with knowledge distillation, and introduces two novel augmentation strategies that exploit the structural characteristics of formal languages. Running ATLAS for 10 iterations, we construct an undergraduate-level dataset of 117k theorem statements and develop the ATLAS Translator, which shows statistically significant improvements over both the HERALD Translator and the Kimina-Autoformalizer on all benchmarks ($p<0.05$, two-sided t-test), achieving a new state of the art. The datasets, model, and code will be released to the public soon.
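
To make "parallel corpus of theorem statements" concrete, the sketch below shows what informal/formal statement pairs might look like. It assumes Lean 4 with Mathlib as the target formal language (an assumption: the abstract does not name one), and the pairs are illustrative, not drawn from the ATLAS dataset. Proofs are left as `sorry` because the corpus pairs statements, not proofs.

```lean
import Mathlib

-- Assumption: Lean 4 + Mathlib is the target formal language.
-- Illustrative informal/formal pairs; not taken from the ATLAS dataset.

-- Informal: "The sum of two even integers is even."
example (a b : ℤ) (ha : Even a) (hb : Even b) : Even (a + b) := by
  sorry

-- Informal: "A function continuous on [a, b] is bounded above on [a, b]."
example (f : ℝ → ℝ) (a b : ℝ) (hf : ContinuousOn f (Set.Icc a b)) :
    BddAbove (f '' Set.Icc a b) := by
  sorry
```

Everything before `:=` is the formal statement that such a corpus pairs with the informal sentence; the `sorry` placeholder lets Lean type-check the statement (with a warning) without supplying a proof.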

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2502.05567 [cs.CL]
	(or arXiv:2502.05567v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.05567

Submission history

From: Xiaoyang Liu
[v1] Sat, 8 Feb 2025 13:28:51 UTC (308 KB)
[v2] Mon, 19 May 2025 04:17:39 UTC (1,065 KB)