Abstract
Modern deep learning methods provide an effective means to learn good
representations. However, is a good representation itself sufficient for
efficient reinforcement learning? This question is largely unexplored: the
extant literature mainly focuses on conditions that permit efficient
reinforcement learning, with little understanding of which conditions are
necessary. This work provides strong negative results for reinforcement
learning with function approximation in the setting where a good
representation (feature extractor) is known to the agent, focusing on natural
representational conditions relevant to value-based and policy-based learning.
For value-based learning, we show that even
if the agent has a highly accurate linear representation, the agent still needs
to sample exponentially many trajectories in order to find a near-optimal
policy. For policy-based learning, we show that even if the agent's linear
representation is capable of perfectly representing the optimal policy, the
agent still needs to sample exponentially many trajectories in order to find a
near-optimal policy.
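To make the value-based condition concrete, one minimal way to formalize it is
the following sketch (the notation is ours, not fixed by this abstract: \(\phi\)
is the known \(d\)-dimensional feature map, \(Q^\pi\) the action-value function
of policy \(\pi\), and \(\delta\) the approximation error):
\[
\forall \pi \;\; \exists\, \theta_\pi \in \mathbb{R}^d \;\; \text{such that} \;\;
\big| Q^\pi(s,a) - \phi(s,a)^\top \theta_\pi \big| \le \delta \;\; \text{for all } (s,a).
\]
Even under a guarantee of this form, the lower bound asserts that the number of
trajectories required to find a near-optimal policy is exponential in problem
parameters such as the horizon; the precise quantifiers and constants are those
of the formal statements in the body of the paper.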
These lower bounds highlight that a good (value-based or policy-based)
representation is by itself insufficient for efficient
reinforcement learning. In particular, these results provide new insights into
why the existing provably efficient reinforcement learning methods rely on
further assumptions, which are often model-based in nature. Additionally, our
lower bounds imply exponential separations in the sample complexity between 1)
value-based learning with perfect representation and value-based learning with
a good-but-not-perfect representation, 2) value-based learning and policy-based
learning, 3) policy-based learning and supervised learning, and 4) reinforcement
learning and imitation learning.