How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation

TLDR

Two metrics for assessing the believability of LLM-based agents are introduced, consistency and robustness, together with a benchmark, SimulateBench, which is used to evaluate the consistency and robustness of agents implemented with popular LLMs.