Article

Introspective Q-learning and learning from demonstration

M. Li, T. Brys, and D. Kudenko.
The Knowledge Engineering Review, vol. 34 (2019)
DOI: 10.1017/S0269888919000031

Abstract

One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that significantly further speeds up the learning. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good quality decisions, according to a Monte Carlo estimation, will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art approaches in RLfD in both domains.
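The abstract describes the introspective mechanism only at a high level. The Python sketch below illustrates one possible reading of it under assumptions that are not taken from the paper: decisions are ranked by their discounted Monte Carlo return from a single finished episode, the queue evicts its lowest-return entry when full, and the potential function is a Gaussian similarity to the nearest stored decision with the same action, used in a state-action shaping term F = γΦ(s', a') − Φ(s, a). All names (`IntrospectiveQueue`, `monte_carlo_returns`, `introspect`, `potential`, `shaping_reward`) and the width `sigma` are hypothetical illustrations, not the authors' implementation.

```python
import heapq
import itertools
import math


class IntrospectiveQueue:
    """Hypothetical bounded priority queue of (state, action) decisions,
    ranked by a Monte Carlo return estimate. When full, a new decision is
    kept only if it beats the worst stored one, which is then evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap on return: the worst kept decision sits at index 0
        self._tie = itertools.count()  # tie-breaker so states are never compared

    def offer(self, mc_return, state, action):
        entry = (mc_return, next(self._tie), tuple(state), action)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if mc_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the worst, insert the new one
            return True
        return False  # rejected: no better than the worst kept decision

    def demonstrations(self):
        return [(state, action) for _, _, state, action in self._heap]


def monte_carlo_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step of a finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def introspect(queue, episode, gamma):
    """Offer each (state, action, reward) step of an episode to the queue with its MC return."""
    rewards = [r for _, _, r in episode]
    for (state, action, _), g in zip(episode, monte_carlo_returns(rewards, gamma)):
        queue.offer(g, state, action)


def potential(state, action, demos, sigma=0.5):
    """Hypothetical potential Phi(s, a): Gaussian similarity to the closest
    stored decision that chose the same action, or 0.0 if there is none."""
    best = 0.0
    for demo_state, demo_action in demos:
        if demo_action == action:
            d2 = sum((x - y) ** 2 for x, y in zip(state, demo_state))
            best = max(best, math.exp(-d2 / (2 * sigma ** 2)))
    return best


def shaping_reward(s, a, s_next, a_next, demos, gamma):
    """State-action potential-based shaping term F = gamma * Phi(s', a') - Phi(s, a)."""
    return gamma * potential(s_next, a_next, demos) - potential(s, a, demos)


if __name__ == "__main__":
    queue = IntrospectiveQueue(capacity=2)
    episode = [([0.0], 0, 0.0), ([1.0], 1, 0.0), ([2.0], 0, 1.0)]
    introspect(queue, episode, gamma=0.9)
    print(sorted(queue.demonstrations()))  # prints [((1.0,), 1), ((2.0,), 0)]
```

Keying the min-heap on the return makes the "better than the worst kept decision?" check O(1) and the eviction O(log n). In this sketch, a human demonstration could be passed through `introspect` before learning starts to seed the queue, mirroring the initialization the abstract mentions, and `shaping_reward` would be added to the environment reward inside an ordinary Q-learning update.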

BibTeX key: li2019introspective
entry type: article
year: 2019
journal: The Knowledge Engineering Review
publisher: Cambridge University Press
volume: 34
issn: 0269-8889
DOI: 10.1017/S0269888919000031
url: https://www.cambridge.org/core/article/introspective-qlearning-and-learning-from-demonstration/33B9FD738F4B3935F2788021F2D3E885

Cite this publication

@article{li2019introspective,
  abstract = {One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help to resolve the aforementioned issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstration to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that significantly further speeds up the learning. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good quality decisions, according to a Monte Carlo estimation, will be kept in the queue, while poorer decisions will be rejected. The queue is then used as demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art approaches in RLfD in both domains.},
  added-at = {2020-01-21T12:41:31.000+0100},
  author = {Li, Mao and Brys, Tim and Kudenko, Daniel},
  biburl = {https://www.bibsonomy.org/bibtex/2aefd6718424528f3f8d2e1ff68becd40/kudenko},
  description = {Introspective Q-learning and learning from demonstration | The Knowledge Engineering Review | Cambridge Core},
  doi = {10.1017/S0269888919000031},
  editor = {McBurney, Peter},
  interhash = {bd2d3f88a5f1251636d312d2cd4c32a7},
  intrahash = {aefd6718424528f3f8d2e1ff68becd40},
  issn = {0269-8889},
  journal = {The Knowledge Engineering Review},
  keywords = {myown},
  publisher = {Cambridge University Press},
  timestamp = {2020-01-21T12:53:19.000+0100},
  title = {Introspective {Q}-learning and learning from demonstration},
  url = {https://www.cambridge.org/core/article/introspective-qlearning-and-learning-from-demonstration/33B9FD738F4B3935F2788021F2D3E885},
  volume = 34,
  year = 2019
}