We introduce "Search with Amortized Value Estimates" (SAVE), an approach for
combining model-free Q-learning with model-based Monte-Carlo Tree Search
(MCTS). In SAVE, a learned prior over state-action values is used to guide
MCTS, which estimates an improved set of state-action values. The new
Q-estimates are then used in combination with real experience to update the
prior. This effectively amortizes the value computation performed by MCTS,
resulting in a cooperative relationship between model-free learning and
model-based search. SAVE can be implemented on top of any Q-learning agent with
access to a model, which we demonstrate by incorporating it into agents that
perform challenging physical reasoning tasks and Atari. SAVE consistently
achieves higher rewards with fewer training steps, and---in contrast to typical
model-based search approaches---yields strong performance with very small
search budgets. By combining real experience with information computed during
search, SAVE demonstrates that it is possible to improve on both the
performance of model-free learning and the computational cost of planning.
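
To make the loop the abstract describes concrete, here is a minimal Python
sketch of the idea on a toy problem. It is an illustration, not the authors'
code: the tabular Q-table, the six-state chain MDP, the particular UCT
variant, and the L2 amortization update are all simplifying assumptions
(SAVE itself uses a neural Q-network and a cross-entropy-based amortization
loss). The key structure is the same: the learned prior seeds the search,
and the search-improved Q-estimates feed back into the learning update.

import numpy as np

# Toy deterministic chain MDP: states 0..N_STATES-1, actions {0: left,
# 1: right}, reward 1 for reaching the terminal rightmost state.
N_STATES, N_ACTIONS, HORIZON, GAMMA = 6, 2, 10, 0.99

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))  # learned prior over state-action values

def mcts(root, q_prior, n_sim=10, c_uct=1.0, max_depth=6):
    # UCT-style search whose node values are seeded by the learned prior
    # (the SAVE idea of using the learned Q to guide search); returns the
    # search-improved Q-estimates at the root.
    qs, visits = {}, {}

    def expand(s):
        qs[s] = q_prior[s].copy()
        visits[s] = np.ones(N_ACTIONS)  # pseudo-counts give the prior weight

    expand(root)
    for _ in range(n_sim):
        s, path, done, depth = root, [], False, 0
        while not done and depth < max_depth:
            if s not in qs:
                expand(s)
            ucb = qs[s] + c_uct * np.sqrt(np.log(visits[s].sum()) / visits[s])
            a = int(np.argmax(ucb))
            s2, r, done = step(s, a)
            path.append((s, a, r))
            s, depth = s2, depth + 1
        g = 0.0 if done else q_prior[s].max()  # bootstrap leaf with the prior
        for s_t, a_t, r_t in reversed(path):   # back up the return
            g = r_t + GAMMA * g
            visits[s_t][a_t] += 1
            qs[s_t][a_t] += (g - qs[s_t][a_t]) / visits[s_t][a_t]
    return qs[root]

ALPHA, BETA = 0.2, 0.5  # TD step size; weight on the amortization update
for episode in range(200):
    s, done, t = 0, False, 0
    while not done and t < HORIZON:
        q_search = mcts(s, Q)        # search guided by the learned prior
        a = int(np.argmax(q_search))
        s2, r, done = step(s, a)
        # Q-learning update from real experience ...
        td_target = r + (0.0 if done else GAMMA * Q[s2].max())
        Q[s, a] += ALPHA * (td_target - Q[s, a])
        # ... plus an amortization update pulling the prior toward the
        # search-improved Q-estimates (the paper uses a cross-entropy loss
        # over softmaxed Q-values; a simple L2 pull stands in for it here).
        Q[s] += ALPHA * BETA * (q_search - Q[s])
        s, t = s2, t + 1

print(np.round(Q, 2))

On this toy chain the prior should quickly come to favor moving right, so
later searches need only a handful of simulations to act well; this is the
small-search-budget behavior the abstract highlights.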
@misc{hamrick2019combining,
  author        = {Hamrick, Jessica B. and Bapst, Victor and Sanchez-Gonzalez, Alvaro and Pfaff, Tobias and Weber, Theophane and Buesing, Lars and Battaglia, Peter W.},
  title         = {Combining Q-Learning and Search with Amortized Value Estimates},
  year          = {2019},
  eprint        = {1912.02807},
  archiveprefix = {arXiv},
  url           = {http://arxiv.org/abs/1912.02807}
}