论文阅读8-----基于强化学习的推荐系统

Whole-Chain Recommendations

With the recent prevalence of Reinforcement Learning (RL), there have been tremendous interests in developing RL-based recommender systems.

RL可以用于推荐系统。

In practical recommendation sessions, users will sequentially access multiple scenarios, such as the entrance pages and the item detail pages, and each scenario has its specific characteristics.

推荐用很多个场景，比如进入页面，页面详细页面等，每一个页面都有其特别的特质。

However, the majority of existing RL-based recommende systems focus on optimizing one strategy for all scenarios or separately optimizing each strategy, which could lead to sub-optimal overall performance.

其他的RL只能优化一个场景或是分别优化场景，可能会导致次优化结果。

In this paper, we study the recommendation problem with multiple (consecutive) scenarios, i.e., whole-chain recommendations. We propose a multi-agent RL-based approach (DeepChain), which can capture the sequential correlation among different scenarios and jointly optimize multiple recommendationstrategies.

为了实现多个场景的共同优化，我们提出了multi-agent RL的推荐系统来共同推荐。

To be specific, all recommender agents (RAs) share the same memory of users’ historical behaviors, and they work collaboratively to maximize the overall reward of a session. Note that optimizing multiple recommendation strategies jointly faces two challenges in the existing model-free RL model [22] -

model-free的有两个缺点

(i) it requires huge amounts of user behavior data, and

第一需要大量的数据

(ii) the distribution of reward (users’ feedback) are extremely unbalanced.

奖励的分布极其不平衡

In this paper, we introduce model-based RL techniques to reduce the training data requirement and execute more accurate strategy updates.

所以，我们是model-based的方法

The experimental results based on a real e-commerce platform demonstrate the effectiveness of the proposed framework.

实验证明我们很厉害