EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories

J Li, G Li, X Zhang, Y Dong, Z Jin - arXiv preprint arXiv:2404.00599, 2024 - arxiv.org
How to evaluate Large Language Models (LLMs) in code generation is an open question.
Existing benchmarks demonstrate poor alignment with real-world code repositories and are
insufficient to evaluate the coding abilities of LLMs. This paper proposes a new benchmark,
EvoCodeBench, to address the preceding problems, which has three primary advances. (1)
EvoCodeBench aligns with real-world repositories in multiple dimensions, e.g., code
distributions and dependency distributions. (2) EvoCodeBench offers comprehensive …
