How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs

Jialun Cao, Yuk-Kit Chan, 12 Authors, S. Cheung

2025 · DOI: 10.48550/arXiv.2501.10711
arXiv.org · 6 Citations

TLDR

This paper proposes How2Bench, a 55-criteria checklist that serves as a set of guidelines for governing the development of code-related benchmarks, with the aim of assuring their quality, reliability, and reproducibility.