Paper Reading 11 ----- Reinforcement-Learning-Based Recommender Systems


Large-Scale Interactive Recommendation with Tree-Structured Policy Gradient

Abstract: Reinforcement learning (RL) has recently been introduced to interactive recommender systems (IRS) because of its nature of learning from dynamic interactions and planning for long-run performance.

RL is a good fit for IRS because it learns from dynamic interactions and plans for long-run performance.

As IRS is always with thousands of items to recommend (i.e., thousands of actions), most existing RL-based methods, however, fail to handle such a large discrete action space problem and thus become inefficient. The existing work that tries to deal with the large discrete action space problem by utilizing the deep deterministic policy gradient framework suffers from the inconsistency between the continuous action representation (the output of the actor network) and the real discrete action.

There are typically thousands of items to recommend. To make RL work in a recommender system, prior work often adopts the DDPG framework, but DDPG introduces a gap between the real discrete action and the continuous action output by the actor (the closest real item is usually picked by cosine similarity or Euclidean distance).
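To make that nearest-neighbor step concrete, here is a minimal sketch (not the paper's code; the embedding matrix and dimensions are made up) of how a DDPG-style recommender maps the actor's continuous output back to a real item:

```python
import numpy as np

def nearest_item(action_vec, item_embeddings):
    """Map the actor's continuous output to the closest real item
    by cosine similarity (the step that causes the inconsistency)."""
    a = action_vec / (np.linalg.norm(action_vec) + 1e-8)
    e = item_embeddings / (np.linalg.norm(item_embeddings, axis=1, keepdims=True) + 1e-8)
    scores = e @ a                      # cosine similarity to every item
    return int(np.argmax(scores))       # index of the recommended item

# toy usage: 10,000 items, 32-dim embeddings, a random "proto-action" from the actor
items = np.random.randn(10000, 32)
proto_action = np.random.randn(32)
print(nearest_item(proto_action, items))
```

The item actually recommended can drift away from what the actor "meant", which is exactly the inconsistency TPGR is designed to avoid.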

To avoid such inconsistency and achieve high efficiency and recommendation effectiveness, in this paper,

In other words, the paper removes this inconsistency while also improving efficiency.

we propose a Tree-structured Policy Gradient Recommendation (TPGR) framework, where a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree.

The idea is to build a balanced hierarchical clustering tree over the items: put simply, you walk down the tree level by level, each leaf node is an action (an item), and at every internal node a policy trained with policy gradient chooses which child to descend into, until a leaf is reached.
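As a rough sketch of this decomposition (simplified and hypothetical: the per-node policies here are plain linear scorers, not the REINFORCE-trained networks and clustering used in the paper), picking an item amounts to sampling one child at each internal node of a balanced c-ary tree:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_item(state, node_params, c, depth):
    """Descend a balanced c-ary tree: each internal node has its own small
    policy that scores its c children; the leaf reached is the recommended item."""
    node, item = 0, 0
    for _ in range(depth):
        logits = node_params[node] @ state        # scores for the c children
        child = np.random.choice(c, p=softmax(logits))
        item = item * c + child                   # accumulate the leaf / item index
        node = node * c + child + 1               # index of the chosen child node
    return item

# toy usage: 8 items arranged in a binary tree of depth 3 (c = 2)
c, depth, state_dim = 2, 3, 16
n_internal = (c ** depth - 1) // (c - 1)          # 7 internal nodes
node_params = {i: np.random.randn(c, state_dim) for i in range(n_internal)}
state = np.random.randn(state_dim)
print(sample_item(state, node_params, c, depth))  # item id in [0, 8)
```

In the paper the tree is built by balanced hierarchical clustering over the items and all node policies are trained jointly with policy gradient; the sketch only shows how a root-to-leaf path replaces one huge softmax over all items.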

Extensive experiments on carefully-designed environments based on two real-world datasets demonstrate that our model provides superior recommendation performance and significant efficiency improvement over state-of-the-art methods.

Experiments show that the method beats state-of-the-art baselines in both recommendation quality and efficiency.

Let's first look at the model diagram.

Going from top to bottom, a choice is made at each level until a leaf is reached. The sequence of choices forms an index path such as (1, 2, 4, 8); in the figure, the selected node is labeled (1, 2, 4, 8), which corresponds to item 8. That is basically the whole idea of the model, and the path produced this way can be fed directly into the computation.

(Figure: state representation)

Overall the idea is quite distinctive: decomposing the decision level by level greatly shrinks the action space, which is neat.

(Figure: number of policy-gradient networks)

The two figures above together make up one episode.
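To see why this shrinks the action space, here is a quick back-of-the-envelope calculation (the numbers are illustrative, not taken from the paper):

```python
import math

n_items = 10_000                                  # size of the flat action space
depth = 2                                         # tree depth used for the decomposition
branching = math.ceil(n_items ** (1 / depth))     # children per internal node -> 100

# a flat policy scores all 10,000 items in one softmax;
# the tree policy only scores `branching` children per decision, `depth` times
print(branching, depth * branching)               # 100 scores per level, 200 in total
```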

For those who want to do recommender-systems research but aren't sure how to write the code, check out my GitHub page or RecBole from Renmin University of China:

https://github.com/xingkongxiaxia/Sequential_Recommendation_System mainstream recommendation algorithms implemented in PyTorch

https://github.com/xingkongxiaxia/tensorflow_recommend_system I also have TensorFlow-based implementations

https://github.com/RUCAIBox/RecBole RecBole (models of all kinds, more than 60 recommendation algorithms)

Feel free to give them a star.
