How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs
Jialun Cao, Yuk-Kit Chan, [12 authors], S. Cheung
2025 · DOI: 10.48550/arXiv.2501.10711
arXiv.org · 6 Citations
TLDR
The paper proposes How2Bench, a 55-criteria checklist of guidelines that comprehensively governs the development of code-related benchmarks to assure their quality, reliability, and reproducibility.
