Author of the publication

Weighted importance sampling for off-policy learning with linear function approximation.

A. Mahmood, H. van Hasselt, and R. Sutton. NIPS, page 3014-3022. (2014)

Other publications of authors with the same name

Learning to Predict by Methods of Temporal Differences. R. Sutton. TR87-509. GTE Laboratories Inc., Waltham, MA, (1987)

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. R. Sutton. Advances in Neural Information Processing Systems 8, Cambridge, MA: MIT Press, (1996)

Associative Search Network: A Reinforcement Learning Associative Memory. A. Barto, R. Sutton, and P. Brouwer. Biological Cybernetics, (1981)

Experiments with reinforcement learning in problems with continuous state and action spaces. J. Santamaría, R. Sutton, and A. Ram. Adapt. Behav., 6 (2): 163-217 (1997)

DYNA, an integrated architecture for learning, planning, and reacting. R. Sutton. Working Notes of the 1991 AAAI Spring Symposium on Integrated Intelligent Architectures, (1991)

Reinforcement Learning of Local Shape in the Game of Go. D. Silver, R. Sutton, and M. 0003. IJCAI, page 1053-1058. (2007)

Multi-step Reinforcement Learning: A Unifying Algorithm. K. Asis, J. Hernandez-Garcia, G. Holland, and R. Sutton. CoRR, (2017)

Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning. A. Kearney, V. Veeriah, J. Travnik, P. Pilarski, and R. Sutton. CoRR, (2019)

A new Q(lambda) with interim forward view and Monte Carlo equivalence. R. Sutton, A. Mahmood, D. Precup, and H. van Hasselt. ICML, volume 32 of JMLR Workshop and Conference Proceedings, page 568-576. JMLR.org, (2014)

Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. E. Ludvig, R. Sutton, and E. Kehoe. Neural Comput., 20 (12): 3034-3054 (2008)

BibSonomy

Disambiguation of "Sutton, Richard S."
