Q-learning

C. Watkins, P. Dayan

1992 · DOI: 10.1007/BF00992698
Machine-mediated learning · 11,676 Citations

TLDR

This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
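
To make the two conditions in the summary concrete, here is a minimal sketch of tabular Q-learning in Python. It is an illustration, not code from the paper. The three-state chain environment, the function name `q_learning`, and all parameter values are assumptions chosen for the example. Action-values are held in a lookup table (the discrete representation), and actions are drawn uniformly at random so that every action keeps being sampled in every state. A constant step size is used only because this toy environment is deterministic; the general theorem assumes suitably decaying learning rates.

```python
import random


def q_learning(n_steps=5000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical 3-state chain (state 2 is terminal)."""
    rng = random.Random(seed)
    n_states, n_actions, terminal = 3, 2, 2
    # Discrete representation: one table entry per (state, action) pair.
    q = [[0.0] * n_actions for _ in range(n_states)]

    def step(state, action):
        # Action 0 moves left (bounded at state 0), action 1 moves right.
        next_state = max(state - 1, 0) if action == 0 else state + 1
        reward = 1.0 if next_state == terminal else 0.0
        return next_state, reward

    state = 0
    for _ in range(n_steps):
        # Uniform random behaviour keeps sampling every action in every state.
        action = rng.randrange(n_actions)
        next_state, reward = step(state, action)
        # The terminal state has no future value.
        best_next = 0.0 if next_state == terminal else max(q[next_state])
        # One-step Q-learning update toward reward + gamma * max_a' Q(s', a').
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        # Restart the episode once the terminal state is reached.
        state = 0 if next_state == terminal else next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()[:2]):
        print(s, [round(v, 3) for v in row])
```

For this chain with gamma = 0.9, the optimal action-values can be worked out by hand: moving right from state 1 is worth 1, moving right from state 0 is worth 0.9, and both left moves are worth 0.81. The learned table should approach those values even though the behaviour is purely random, because Q-learning's update uses the maximum over next actions rather than the action actually taken.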