Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning
Tianbao Xie, Siheng Zhao, + 5 authors, Tao Yu
2023 · DOI: 10.48550/arXiv.2309.11489
arXiv.org · Cited by 83
TLDR
Text2Reward is introduced: a data-free framework that uses large language models (LLMs) to automate the generation of dense reward functions. It produces interpretable, free-form dense reward code that covers a wide range of tasks, utilizes existing packages, and allows iterative refinement with human feedback.
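To make "dense reward code" concrete, below is a minimal hand-written sketch of the kind of staged, shaped reward function such a framework targets. It is not code generated by Text2Reward. The task ("lift a cube"), the function name dense_lift_reward, its arguments, the tanh shaping, and the weights are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def dense_lift_reward(
    ee_pos: Sequence[float],
    obj_pos: Sequence[float],
    goal_height: float,
    grasped: bool,
) -> float:
    """Hypothetical dense reward for a 'lift the cube' task.

    Illustrative only: the staging, shaping function, and weights are
    assumptions, not reward code produced by Text2Reward.
    """
    # Stage 1: shaped term in (0, 1] that grows as the end effector
    # approaches the object (1.0 when they coincide).
    reach_dist = math.dist(ee_pos, obj_pos)
    reward = 1.0 - math.tanh(5.0 * reach_dist)

    # Stage 2: once the object is grasped, add a fixed bonus plus a
    # shaped term for progress toward the target lift height.
    if grasped:
        reward += 1.0
        height_gap = max(goal_height - obj_pos[2], 0.0)
        reward += 1.0 - math.tanh(5.0 * height_gap)

    return reward
```

Unlike a sparse reward that pays out only on task success, this function returns a graded signal at every step (closer is better, grasping is better, higher is better), which is the property "dense" refers to.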
