Article,

Model-Based Reinforcement Learning for Whole-Chain Recommendations

X. Zhao, L. Xia, Y. Zhao, D. Yin, and J. Tang.
(2019). arXiv:1902.03987.

Abstract

With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in developing RL-based recommender systems. In practical recommendation sessions, users will sequentially access multiple scenarios, such as the entrance pages and the item detail pages, and each scenario has its own recommendation strategy. However, the majority of existing RL-based recommender systems focus on separately optimizing each strategy, which could lead to sub-optimal overall performance, because independently optimizing each scenario (i) overlooks the sequential correlation among scenarios, (ii) ignores users' behavior data from other scenarios, and (iii) only optimizes its own objective but neglects the overall objective of a session. Therefore, in this paper, we study the recommendation problem with multiple (consecutive) scenarios, i.e., whole-chain recommendations. We propose a multi-agent reinforcement learning based approach (DeepChain), which can capture the sequential correlation among different scenarios and jointly optimize multiple recommendation strategies. To be specific, all recommender agents share the same memory of users' historical behaviors, and they work collaboratively to maximize the overall reward of a session. Note that optimizing multiple recommendation strategies jointly faces two challenges: (i) it requires huge amounts of user behavior data, and (ii) the distribution of reward (users' feedback) is extremely unbalanced. In this paper, we introduce model-based reinforcement learning techniques to reduce the training data requirement and execute more accurate strategy updates. The experimental results based on data from a real e-commerce platform demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to validate the importance of each component of DeepChain.

BibTeX key: zhao2019ChainRecommendations
entry type: article
year: 2019
url: http://arxiv.org/abs/1902.03987
note: cite arxiv:1902.03987

Cite this publication

%0 Journal Article
%1 zhao2019ChainRecommendations
%A Zhao, Xiangyu
%A Xia, Long
%A Zhao, Yihong
%A Yin, Dawei
%A Tang, Jiliang
%D 2019
%K
%T Model-Based Reinforcement Learning for Whole-Chain Recommendations
%U http://arxiv.org/abs/1902.03987
%X With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in developing RL-based recommender systems. In practical recommendation sessions, users will sequentially access multiple scenarios, such as the entrance pages and the item detail pages, and each scenario has its own recommendation strategy. However, the majority of existing RL-based recommender systems focus on separately optimizing each strategy, which could lead to sub-optimal overall performance, because independently optimizing each scenario (i) overlooks the sequential correlation among scenarios, (ii) ignores users' behavior data from other scenarios, and (iii) only optimizes its own objective but neglects the overall objective of a session. Therefore, in this paper, we study the recommendation problem with multiple (consecutive) scenarios, i.e., whole-chain recommendations. We propose a multi-agent reinforcement learning based approach (DeepChain), which can capture the sequential correlation among different scenarios and jointly optimize multiple recommendation strategies. To be specific, all recommender agents share the same memory of users' historical behaviors, and they work collaboratively to maximize the overall reward of a session. Note that optimizing multiple recommendation strategies jointly faces two challenges: (i) it requires huge amounts of user behavior data, and (ii) the distribution of reward (users' feedback) is extremely unbalanced. In this paper, we introduce model-based reinforcement learning techniques to reduce the training data requirement and execute more accurate strategy updates. The experimental results based on data from a real e-commerce platform demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to validate the importance of each component of DeepChain.

@article{zhao2019ChainRecommendations,
  abstract = {With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in developing RL-based recommender systems. In practical recommendation sessions, users will sequentially access multiple scenarios, such as the entrance pages and the item detail pages, and each scenario has its own recommendation strategy. However, the majority of existing RL-based recommender systems focus on separately optimizing each strategy, which could lead to sub-optimal overall performance, because independently optimizing each scenario (i) overlooks the sequential correlation among scenarios, (ii) ignores users' behavior data from other scenarios, and (iii) only optimizes its own objective but neglects the overall objective of a session. Therefore, in this paper, we study the recommendation problem with multiple (consecutive) scenarios, i.e., whole-chain recommendations. We propose a multi-agent reinforcement learning based approach (DeepChain), which can capture the sequential correlation among different scenarios and jointly optimize multiple recommendation strategies. To be specific, all recommender agents share the same memory of users' historical behaviors, and they work collaboratively to maximize the overall reward of a session. Note that optimizing multiple recommendation strategies jointly faces two challenges: (i) it requires huge amounts of user behavior data, and (ii) the distribution of reward (users' feedback) is extremely unbalanced. In this paper, we introduce model-based reinforcement learning techniques to reduce the training data requirement and execute more accurate strategy updates. The experimental results based on data from a real e-commerce platform demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to validate the importance of each component of DeepChain.},
  added-at = {2019-07-13T10:17:27.000+0200},
  author = {Zhao, Xiangyu and Xia, Long and Zhao, Yihong and Yin, Dawei and Tang, Jiliang},
  biburl = {https://www.bibsonomy.org/bibtex/2a8688f6b15dd7790db3c2e5cdd7969f3/lanteunis},
  interhash = {e88b3029f1f174e033f57634081513ec},
  intrahash = {a8688f6b15dd7790db3c2e5cdd7969f3},
  keywords = {},
  note = {cite arxiv:1902.03987},
  timestamp = {2019-07-13T10:17:27.000+0200},
  title = {Model-Based Reinforcement Learning for Whole-Chain Recommendations},
  url = {http://arxiv.org/abs/1902.03987},
  year = 2019
}
