AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Liu, Zihan; Chen, Yang; Shoeybi, Mohammad; Catanzaro, Bryan; Ping, Wei


arXiv:2412.15084 (cs)

[Submitted on 19 Dec 2024 (v1), last revised 17 Jan 2025 (this version, v2)]


Abstract: In this paper, we introduce AceMath, a suite of frontier math models that excel in solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct, greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o and Claude-3.5 Sonnet. To develop a math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. After that, we present a systematic approach to building our math reward models. The resulting model, AceMath-72B-RM, consistently outperforms state-of-the-art reward models. Furthermore, when combining AceMath-72B-Instruct with AceMath-72B-RM, we achieve the highest average rm@8 score across the math reasoning benchmarks. We release model weights, training data, and evaluation benchmarks at: this https URL
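
The rm@8 score mentioned in the abstract can be read as best-of-8 selection: for each problem, eight candidate solutions are sampled, the reward model scores each one, and only the top-scored candidate is checked against the reference answer. The following is a minimal sketch under that reading, not the authors' evaluation code. The names `rm_at_k`, `reward_fn`, and `is_correct` are hypothetical, and the toy scorer and checker stand in for a real reward model and answer matcher.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout: (question, candidate solutions, reference answer).
Problem = Tuple[str, Sequence[str], str]


def rm_at_k(
    problems: Sequence[Problem],
    reward_fn: Callable[[str, str], float],
    is_correct: Callable[[str, str], bool],
    k: int = 8,
) -> float:
    """Fraction of problems whose highest-reward candidate (among the first k) is correct."""
    if not problems:
        raise ValueError("problems must not be empty")
    hits = 0
    for question, candidates, reference in problems:
        pool = list(candidates[:k])
        if not pool:
            continue  # a problem with no candidates counts as a miss
        # Select the candidate that the reward function scores highest.
        best = max(pool, key=lambda c: reward_fn(question, c))
        hits += int(is_correct(best, reference))
    return hits / len(problems)


# Toy stand-ins (assumptions): score by response length, check the final token.
toy_reward = lambda question, response: float(len(response))
toy_check = lambda response, reference: response.strip().endswith(reference)

toy_problems = [
    ("1+1", ["3", "so 2"], "2"),       # longest candidate is correct
    ("2+2", ["it is 5", "4"], "4"),    # longest candidate is wrong
]
toy_score = rm_at_k(toy_problems, toy_reward, toy_check, k=8)
```

In practice `reward_fn` would wrap a reward model such as AceMath-72B-RM and `is_correct` would wrap a math answer checker; both are left abstract here because the abstract does not specify their interfaces.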

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2412.15084 [cs.CL]
	(or arXiv:2412.15084v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.15084

Submission history

From: Wei Ping
[v1] Thu, 19 Dec 2024 17:29:44 UTC (663 KB)
[v2] Fri, 17 Jan 2025 07:12:55 UTC (713 KB)
