Paper Reading 8: Recommender Systems Based on Reinforcement Learning

2021-01-18

Whole-Chain Recommendations

With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in developing RL-based recommender systems.

RL can be applied to recommender systems.

In practical recommendation sessions, users will sequentially access multiple scenarios, such as the entrance pages and the item detail pages, and each scenario has its specific characteristics.

A real recommendation session spans multiple scenarios, such as the entrance page and the item detail page, and each scenario has its own characteristics.

However, the majority of existing RL-based recommender systems focus on optimizing one strategy for all scenarios or separately optimizing each strategy, which could lead to sub-optimal overall performance.

Existing RL methods either optimize a single strategy for all scenarios or optimize each scenario separately, which can lead to sub-optimal overall results.

In this paper, we study the recommendation problem with multiple (consecutive) scenarios, i.e., whole-chain recommendations. We propose a multi-agent RL-based approach (DeepChain), which can capture the sequential correlation among different scenarios and jointly optimize multiple recommendation strategies.

To jointly optimize the strategies across multiple scenarios, the paper proposes a multi-agent RL recommender (DeepChain) whose agents recommend collaboratively.

To be specific, all recommender agents (RAs) share the same memory of users' historical behaviors, and they work collaboratively to maximize the overall reward of a session. Note that jointly optimizing multiple recommendation strategies faces two challenges in existing model-free RL models [22]:

Model-free RL has two drawbacks:

(i) it requires huge amounts of user behavior data, and

First, it needs huge amounts of user behavior data.

(ii) the distribution of rewards (users' feedback) is extremely unbalanced.

Second, the reward distribution (users' feedback) is extremely unbalanced.

In this paper, we introduce model-based RL techniques to reduce the training data requirement and execute more accurate strategy updates.

Hence, the paper adopts a model-based approach instead.
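To make the model-based idea concrete, here is a minimal sketch, assuming a learned user model that predicts feedback (reward) and the next state; simulated rollouts from it supplement real logs so the policy needs fewer real interactions. All names (UserModel, rollout, the dimensions) are hypothetical, not from the paper's code.

```python
import torch
import torch.nn as nn

class UserModel(nn.Module):
    """Approximates the environment: predicts the user's feedback (reward)
    and next state from the current state and a recommended item."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU())
        self.reward_head = nn.Linear(hidden, 1)
        self.state_head = nn.Linear(hidden, state_dim)

    def forward(self, state, action):
        h = self.net(torch.cat([state, action], dim=-1))
        return self.reward_head(h), self.state_head(h)

def rollout(user_model, policy, state, steps=5):
    """Generate simulated transitions so the policy can be updated
    without consuming more real user-behavior logs."""
    transitions = []
    for _ in range(steps):
        action = policy(state)
        reward, next_state = user_model(state, action)
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions
```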

The experimental results based on a real e-commerce platform demonstrate the effectiveness of the proposed framework.

Experiments on a real e-commerce platform show that the framework works well.

Within a session, the user jumps from one scenario to another.
There are two actors: actor-M is in charge of providing recommendations on the entrance page, while actor-D handles recommendations on the item detail page.
The A_t operation comes from a similar idea as NARM.
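For reference, this is roughly NARM's attention over the session's GRU hidden states, alpha_j = v^T sigmoid(A1 h_t + A2 h_j); a small sketch with my own naming, shown here only because the note says A_t follows a similar idea:

```python
import torch
import torch.nn as nn

class NarmAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.A1 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # acts on h_t
        self.A2 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # acts on h_j
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, hidden_states, last_hidden):
        # hidden_states: (batch, seq_len, dim); last_hidden: (batch, dim)
        scores = self.v(torch.sigmoid(
            self.A1(last_hidden).unsqueeze(1) + self.A2(hidden_states)))
        # NARM uses the raw weights; softmax is a common normalizing variant.
        weights = torch.softmax(scores, dim=1)
        return (weights * hidden_states).sum(dim=1)  # session representation
```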
The basic mechanism: on the entrance page actor-M is used, on the detail page actor-D is used; in any case only one actor is activated at a time.
In other words, whichever page the user is on decides which actor is used, as sketched below.
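A minimal sketch of that dispatch, assuming both actors read the same shared encoding of the user's historical behaviors (class and argument names are mine, not the paper's):

```python
import torch.nn as nn

class WholeChainAgents(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        def make_actor():
            return nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, action_dim))
        self.actor_m = make_actor()  # recommends on the entrance page
        self.actor_d = make_actor()  # recommends on the item detail page

    def forward(self, shared_state, scenario):
        # Only the actor matching the current scenario is activated.
        if scenario == "entrance":
            return self.actor_m(shared_state)
        return self.actor_d(shared_state)
```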
Off-policy learning is used for data collection (see my other posts, e.g., 4, 5, and 3, for details); the model is trained from offline data cast into a sequential-recommendation format.
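In code, off-policy training from offline logs boils down to replaying stored transitions instead of interacting with live users; a toy sketch under my own assumed buffer layout:

```python
import random

def train_off_policy(buffer, update_fn, batch_size=32, steps=1000):
    """buffer: list of (state, action, reward, next_state, done) tuples
    extracted from offline sessions in sequential-recommendation format."""
    for _ in range(steps):
        batch = random.sample(buffer, batch_size)
        update_fn(batch)  # e.g., a Q-learning or actor-critic update
```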
Once trained offline, the model is tested online.
Only labeled data carry a reward, so only those samples are used.
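One simple way to honor this, as a sketch (the masking scheme is my assumption): weight each sample's loss by whether its feedback was actually recorded.

```python
import torch

def masked_reward_loss(pred_reward, reward, has_label):
    # has_label: 1.0 where feedback was recorded, 0.0 otherwise;
    # unlabeled samples contribute nothing to the loss.
    se = (pred_reward - reward) ** 2
    return (se * has_label).sum() / has_label.sum().clamp(min=1.0)
```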

That's it! If you want to do recommender-system research but aren't sure how to write the code, check out my GitHub page or RecBole from Renmin University of China:

https://github.com/xingkongxiaxia/Sequential_Recommendation_System (mainstream recommendation algorithms implemented in PyTorch)

https://github.com/xingkongxiaxia/tensorflow_recommend_system (I also have TensorFlow-based code)

https://github.com/RUCAIBox/RecBole (RecBole: all kinds, more than 60 recommendation algorithms)

Feel free to give them a star!
