Author of the publication

Weighted importance sampling for off-policy learning with linear function approximation.

NIPS, pages 3014-3022. (2014)

Other publications of authors with the same name

Learning to Predict by Methods of Temporal Differences. TR87-509. GTE Laboratories Inc., Waltham, MA, (1987)

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. Advances in Neural Information Processing Systems 8, Cambridge, MA: MIT Press, (1996)

Associative Search Network: A Reinforcement Learning Associative Memory. Biological Cybernetics, (1981)

Experiments with reinforcement learning in problems with continuous state and action spaces. Adapt. Behav., 6 (2): 163-217 (1997)

DYNA, an integrated architecture for learning, planning, and reacting. Working Notes of the 1991 AAAI Spring Symposium on Integrated Intelligent Architectures, (1991)

Reinforcement Learning of Local Shape in the Game of Go. IJCAI, pages 1053-1058. (2007)

Multi-step Reinforcement Learning: A Unifying Algorithm. CoRR, (2017)

Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning. CoRR, (2019)

A new Q(lambda) with interim forward view and Monte Carlo equivalence. ICML, volume 32 of JMLR Workshop and Conference Proceedings, pages 568-576. JMLR.org, (2014)

Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Comput., 20 (12): 3034-3054 (2008)