Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning

Tianbao Xie, Siheng Zhao, 5 more authors, Tao Yu

2023 · DOI: 10.48550/arXiv.2309.11489
arXiv.org · Cited 83 times

TLDR

Text2Reward is introduced: a data-free framework that uses large language models (LLMs) to automate the generation of dense reward functions. It produces interpretable, free-form dense reward code that covers a wide range of tasks, utilizes existing packages, and allows iterative refinement with human feedback.
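
To make "dense reward code" concrete, here is a minimal Python sketch of the kind of function such a framework might emit for a reaching task, set next to a sparse reward for contrast. It is an illustration only, not output from Text2Reward: the function names, the arguments (`gripper_pos`, `goal_pos`, `action`), the tolerance, and the weighting coefficients are all assumptions chosen for the example.

```python
import numpy as np


def sparse_reward(gripper_pos, goal_pos, tol=0.05):
    """Sparse signal: 1.0 only when the gripper is within `tol` of the goal."""
    dist = float(np.linalg.norm(np.asarray(gripper_pos) - np.asarray(goal_pos)))
    return 1.0 if dist < tol else 0.0


def dense_reward(gripper_pos, goal_pos, action, tol=0.05):
    """Dense signal: informative feedback at every step (hypothetical weights)."""
    dist = float(np.linalg.norm(np.asarray(gripper_pos) - np.asarray(goal_pos)))
    reach_term = -dist  # closer to the goal is better
    control_penalty = -0.01 * float(np.sum(np.square(action)))  # discourage large actions
    success_bonus = 1.0 if dist < tol else 0.0  # extra reward on success
    return reach_term + control_penalty + success_bonus
```

The sparse version returns the same value for every state outside the tolerance, while the dense version distinguishes a near miss from a distant one. Each term is a named line of code, which is what makes a reward of this form readable and open to revision from human feedback.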