RubricEval: A Scalable Human-LLM Evaluation Framework for Open-Ended Tasks
Vineel Bhat
Cited 0 times
TLDR
This work proposes RubricEval, a human-LLM evaluation framework that scores model responses against instruction-level rubrics and provides interpretable summary feedback to model developers. Two mechanisms for feedback generation were implemented, and the LLM-generated feedback was found to be broadly informative and helpful.
