Policy Gradient Methods for Reinforcement Learning with Function Approximation
R. Sutton, David A. McAllester, Satinder Singh, Y. Mansour
1999 · DBLP: conf/nips/SuttonMSM99
Neural Information Processing Systems · 7,261 Citations
TLDR
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation converges to a locally optimal policy.
