RubricEval: A Scalable Human-LLM Evaluation Framework for Open-Ended Tasks
Vineel Bhat
Cited 0 times
TLDR
This work proposes RubricEval, a human-LLM evaluation framework that scores model responses against instruction-level rubrics and provides interpretable summary feedback to model developers. Two mechanisms for feedback generation were implemented, and the LLM-generated feedback was found to be broadly informative and helpful.
