EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
How to evaluate Large Language Models (LLMs) in code generation is an open question. Existing benchmarks are poorly aligned with real-world code repositories and are insufficient for evaluating the coding abilities of LLMs. This paper proposes a new benchmark, EvoCodeBench, to address the preceding problems. It has three primary advances. (1) EvoCodeBench aligns with real-world repositories in multiple dimensions, e.g., code distributions and dependency distributions. (2) EvoCodeBench offers comprehensive …
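To make the notion of a "dependency distribution" concrete, the sketch below shows one rough way such a statistic could be computed for a Python repository. This is a hypothetical illustration, not EvoCodeBench's actual methodology; the function name and the three dependency categories (in-repo, stdlib, third-party) are invented for this example.

```python
# Hypothetical sketch: approximate a repository's dependency
# distribution by classifying every import in every Python file
# as in-repo, standard-library, or third-party. Not the paper's
# actual analysis; all names here are illustrative.
import ast
import sys
from collections import Counter
from pathlib import Path

def dependency_distribution(repo_root: str) -> Counter:
    root = Path(repo_root)
    # Names that resolve inside the repository: module files plus
    # package directories (a simplification of real import resolution).
    local_modules = {p.stem for p in root.rglob("*.py")} | {
        p.parent.name for p in root.rglob("__init__.py")
    }
    counts = Counter()
    for path in root.rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"),
                             filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that do not parse cleanly
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                if node.level:           # relative import -> in-repo
                    counts["in-repo"] += 1
                    continue
                if not node.module:
                    continue
                names = [node.module.split(".")[0]]
            else:
                continue
            for name in names:
                if name in local_modules:
                    counts["in-repo"] += 1
                elif name in sys.stdlib_module_names:  # Python 3.10+
                    counts["stdlib"] += 1
                else:
                    counts["third-party"] += 1
    return counts

if __name__ == "__main__":
    print(dependency_distribution("."))
```

A benchmark builder could then, for instance, sample tasks so that the tasks' tally matches the repository-wide tally, which is one plausible reading of "aligning dependency distributions."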
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories. CoRR abs/2404.00599 (2024)