Abstract
With the recent prevalence of Reinforcement Learning (RL), there has been
tremendous interest in developing RL-based recommender systems. In practical
recommendation sessions, users will sequentially access multiple scenarios,
such as the entrance pages and the item detail pages, and each scenario has its
own recommendation strategy. However, the majority of existing RL-based
recommender systems focus on separately optimizing each strategy, which could
lead to sub-optimal overall performance, because independently optimizing each
scenario (i) overlooks the sequential correlation among scenarios, (ii) ignores
users' behavior data from other scenarios, and (iii) only optimizes its own
objective but neglects the overall objective of a session. Therefore, in this
paper, we study the recommendation problem with multiple (consecutive)
scenarios, i.e., whole-chain recommendations. We propose a multi-agent
RL-based approach (DeepChain), which can capture the
sequential correlation among different scenarios and jointly optimize multiple
recommendation strategies. To be specific, all recommender agents share the
same memory of users' historical behaviors, and they work collaboratively to
maximize the overall reward of a session. Note that optimizing multiple
recommendation strategies jointly faces two challenges: (i) it requires huge
amounts of user behavior data, and (ii) the distribution of rewards (users'
feedback) is extremely unbalanced. To address these challenges, we introduce model-based
reinforcement learning techniques to reduce the training data requirement and
execute more accurate strategy updates. The experimental results based on data
from a real e-commerce platform demonstrate the effectiveness of the proposed
framework. Further experiments have been conducted to validate the importance
of each component of DeepChain.