MathArena

Competition performance

Competition	Category	Accuracy	Rank	Cost	Output Tokens
IMProofBench - Proofs	🕵️ Research Math	50.23% ± 14.00%	1/5	N/A	N/A
IMProofBench - Final Answers	🕵️ Research Math	52.93% ± 15.10%	5/11	N/A	N/A
Apex	🏔️ Apex	1.04% ± 1.44%	12/22	$5.54	46122
Overall	👁️ Visual Mathematics	78.75% ± 2.97%	4/13	$2.04	7243
Kangaroo 2025 1-2	👁️ Visual Mathematics	68.75% ± 9.27%	4/13	$1.52	6222
Kangaroo 2025 3-4	👁️ Visual Mathematics	60.42% ± 9.78%	8/13	$2.39	9838
Kangaroo 2025 5-6	👁️ Visual Mathematics	65.00% ± 8.53%	8/13	$2.41	7952
Kangaroo 2025 7-8	👁️ Visual Mathematics	90.83% ± 5.16%	2/13	$1.96	6449
Kangaroo 2025 9-10	👁️ Visual Mathematics	92.50% ± 4.71%	7/13	$1.72	5632
Kangaroo 2025 11-12	👁️ Visual Mathematics	95.00% ± 3.90%	4/13	$2.24	7363
Overall	🔢 Final-Answer Competitions	91.02% ± 1.96%	8/17	$5.04	14156
AIME 2025	🔢 Final-Answer Competitions	95.00% ± 3.90%	4/54	$4.08	13475
HMMT Feb 2025	🔢 Final-Answer Competitions	88.33% ± 5.74%	15/54	$5.00	16380
BRUMO 2025	🔢 Final-Answer Competitions	91.67% ± 4.95%	16/40	$3.28	10760
SMT 2025	🔢 Final-Answer Competitions	91.98% ± 3.66%	3/38	$6.29	11731
CMIMC 2025	🔢 Final-Answer Competitions	90.00% ± 4.65%	6/31	$6.94	17108
HMMT Nov 2025	🔢 Final-Answer Competitions	89.17% ± 5.56%	11/17	$4.65	15483
IMO 2025	✍️ Proof-Based Competitions	38.10% ± 19.43%	1/7	$53.61	725147
Project Euler	💻 Project Euler	N/A	N/A	$28.93	39853
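
The page does not say how the ± values in the Accuracy column are computed. One reading that appears consistent with several rows is a 95% normal-approximation (Wald) interval on a proportion. The sketch below checks that reading for AIME 2025 and HMMT Feb 2025, assuming 120 graded samples per competition (for example 30 problems × 4 runs). The sample count, the z value of 1.96, and the function name `ci_half_width` are assumptions made for illustration, not details taken from MathArena.

```python
import math


def ci_half_width(accuracy, n_samples, z=1.96):
    """Half-width of a normal-approximation (Wald) interval for a proportion.

    accuracy is a fraction in [0, 1]; n_samples is the number of graded attempts.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Assumed sample count: 30 problems x 4 runs. This is NOT stated in the source.
N_SAMPLES = 120

# Accuracies taken from the table, written as fractions of 120 attempts.
rows = [
    ("AIME 2025", 114 / 120),      # 95.00%
    ("HMMT Feb 2025", 106 / 120),  # 88.33%
]

for name, acc in rows:
    print(f"{name}: {acc:.2%} ± {ci_half_width(acc, N_SAMPLES):.2%}")
```

Under these assumptions the computed half-widths round to the 3.90% and 5.74% reported for those two rows. Rows with different problem counts or run counts would need their own sample size.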

Sampling parameters

Additional parameters

{
  "reasoning": {
    "summary": "auto"
  }
}

Most surprising traces (Item Response Theory)

Computed once using a Rasch-style logistic fit; excludes Project Euler where traces are hidden.
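
For readers unfamiliar with the Rasch model, the following is a minimal sketch of how a surprise score could be derived from such a fit. It assumes a model ability and per-problem difficulties have already been estimated, and it scores each trace by the negative log-likelihood of its observed outcome. The scoring rule, the function names, and the example values are all hypothetical; the page does not describe MathArena's exact procedure.

```python
import math


def rasch_probability(ability, difficulty):
    """Rasch model: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def surprise(correct, ability, difficulty):
    """Negative log-likelihood of the observed outcome under the Rasch model.

    A correct answer on a hard problem, or a wrong answer on an easy one,
    yields a large value.
    """
    p = rasch_probability(ability, difficulty)
    return -math.log(p if correct else 1.0 - p)


def rank_by_surprise(traces, ability):
    """Sort (trace_id, correct, difficulty) tuples from most to least surprising."""
    scored = [(tid, surprise(ok, ability, diff)) for tid, ok, diff in traces]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical traces: (trace id, answered correctly?, fitted problem difficulty).
example_traces = [
    ("t1", True, -2.0),   # easy problem solved: expected
    ("t2", True, 3.0),    # hard problem solved: surprising
    ("t3", False, 0.0),   # medium problem failed: somewhat surprising
]

for trace_id, score in rank_by_surprise(example_traces, ability=1.0):
    print(f"{trace_id}: surprise = {score:.3f}")
```

In this toy example the solved hard problem ranks first and the solved easy problem ranks last, which is the ordering a "most surprising traces" view would be expected to surface.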


GPT-5 (high)