Q-learning
C. Watkins, P. Dayan
1992 · DOI: 10.1007/BF00992698
Machine-mediated learning · 11,676 citations
TLDR
This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
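
To make the two conditions in the summary concrete, here is a minimal sketch of tabular Q-learning in Python. It is an illustration, not code from the paper. The two-state MDP, the names `TRANSITIONS`, `GAMMA`, and `q_learning`, the uniformly random behaviour policy, and the step-size schedule are all assumptions chosen for the example. The table `q` is the discrete representation of the action-values, and the random policy is one simple way to keep sampling every action in every state.

```python
import random

# Hypothetical deterministic MDP with 2 states and 2 actions (not from the paper).
# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 1.0), 1: (1, 2.0)},
}
GAMMA = 0.5  # discount factor (assumed)


def q_learning(steps=20000, seed=0):
    """Run tabular Q-learning and return the learned action-value table."""
    rng = random.Random(seed)
    # Discrete (lookup-table) representation: one entry per (state, action).
    q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    visits = {s: {a: 0 for a in (0, 1)} for s in (0, 1)}
    state = 0
    for _ in range(steps):
        # Uniformly random behaviour policy: every action keeps being
        # sampled in every state that is visited.
        action = rng.choice((0, 1))
        next_state, reward = TRANSITIONS[state][action]
        visits[state][action] += 1
        # Assumed decaying step size per (state, action) pair; it stays
        # below 1 and shrinks as the pair is visited more often.
        alpha = (visits[state][action] + 1) ** -0.7
        # One-step update toward reward plus discounted best next value.
        target = reward + GAMMA * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


if __name__ == "__main__":
    for s, row in q_learning().items():
        print(s, row)
```

For this toy MDP the optimal action-values can be found by hand from the Bellman optimality equations: Q*(0,0) = 1, Q*(0,1) = 2, Q*(1,0) = 2, Q*(1,1) = 4. The learned table should approach these values as the number of steps grows.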
