How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs
Jialun Cao, Yuk-Kit Chan, 12 authors, S. Cheung
2025 · DOI: 10.48550/arXiv.2501.10711
arXiv.org · 6 Citations
TLDR
The paper proposes How2Bench, a 55-criteria checklist that serves as a set of guidelines to comprehensively govern the development of code-related benchmarks and to assure their quality, reliability, and reproducibility.
