Paper Reading 9: Reinforcement-Learning-Based Recommender Systems

Simulating User Feedback for Reinforcement Learning Based Recommendations

Abstract

With the recent advances in Reinforcement Learning (RL), there have been tremendous interests in employing RL for recommender systems. However, directly training and evaluating a new RL-based recommendation algorithm needs to collect users’ real-time feedback in the real system, which is time and efforts consuming and could negatively impact on users’ experiences.

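The standard abstract opening: RL for recommender systems is popular, but training and testing an RL recommender directly online hurts the user experience.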

Thus, it calls for a user simulator that can mimic real users’ behaviors where we can pre-train and evaluate new recommendation algorithms. Simulating users’ behaviors in a dynamic system faces immense challenges –

So here we come to solve that problem. Simulating users runs into the following challenges.

(i) the underlying item distribution is complex, and

The distribution of items is very complex.

(ii) historical logs for each user are limited.

The historical logs are limited.

In this paper, we develop a user simulator based on Generative Adversarial Network (GAN).

So we built a simulator that is guaranteed to sweep you off your feet.

To be specific, the generator captures the underlying distribution of users’ historical logs and generates realistic logs that can be considered as augmentations of real logs;

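Our simulator is really powerful: the generator uses historical data to produce realistic data (much like supervised learning predicting what has not happened yet), which serves as data augmentation.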

while the discriminator not only distinguishes real and fake logs but also predicts users’ behaviors.

The discriminator is pretty impressive too: it not only tells real data from fake data but can also predict user behavior.
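To make the "two jobs" of the discriminator concrete, here is a minimal sketch in pure Python. It is only an illustration under my own assumptions, not the paper's architecture: the class name `TwoHeadDiscriminator`, the linear layers, the feature vector, and the three feedback types are all hypothetical, and the weights are random and untrained. It just shows one input going to two outputs: a real-vs-fake probability and a distribution over user feedback types.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class TwoHeadDiscriminator:
    """Toy linear discriminator with two outputs (hypothetical, untrained)."""

    def __init__(self, n_features, n_feedback_types, seed=0):
        rng = random.Random(seed)
        # Head 1: one weight vector that scores "is this log real?".
        self.w_real = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        # Head 2: one weight vector per feedback type (assumed: skip/click/buy).
        self.w_feedback = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for _ in range(n_feedback_types)
        ]

    def forward(self, log_features):
        # Probability that the log is real rather than generated.
        p_real = sigmoid(dot(self.w_real, log_features))
        # Predicted distribution over the user's feedback types.
        p_feedback = softmax([dot(w, log_features) for w in self.w_feedback])
        return p_real, p_feedback


disc = TwoHeadDiscriminator(n_features=4, n_feedback_types=3)
p_real, p_feedback = disc.forward([0.2, -1.0, 0.5, 0.3])
print(p_real, p_feedback)
```

In a real GAN both heads would be trained jointly; this sketch only shows the shape of the outputs.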

The experimental results based on real-world e-commerce data demonstrate the effectiveness of the proposed simulator.

The experiments prove that we are great.

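RL drives the recommender system, and the simulator plays the role of the environment (put plainly, it scores the items that the recommender recommends).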
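The "simulator as environment" idea can be sketched as a tiny interaction loop. This is a toy under my own assumptions, not the paper's simulator: `ToySimulator`, its hard-coded set of liked items, the 0/1 reward, and `run_episode` are hypothetical names and rules made up for illustration. A learned simulator would replace the hard-coded rule with a model trained on logs.

```python
import random


class ToySimulator:
    """Stand-in for a learned user simulator: scores a recommended item."""

    def __init__(self, liked_items):
        self.liked_items = set(liked_items)

    def feedback(self, history, recommended_item):
        # Hypothetical rule: reward 1.0 if the user likes the item, else 0.0.
        # The history is ignored here; a learned simulator would condition on it.
        return 1.0 if recommended_item in self.liked_items else 0.0


def run_episode(simulator, policy, n_steps):
    """Let a policy recommend items and collect the simulated rewards."""
    history = []
    total_reward = 0.0
    for _ in range(n_steps):
        item = policy(history)                      # recommender picks an item
        reward = simulator.feedback(history, item)  # simulator scores it
        history.append((item, reward))
        total_reward += reward
    return total_reward


rng = random.Random(0)
catalog = ["a", "b", "c", "d"]


def random_policy(history):
    # Baseline recommender: pick any catalog item uniformly at random.
    return rng.choice(catalog)


sim = ToySimulator(liked_items=["a", "c"])
total_reward = run_episode(sim, random_policy, n_steps=10)
print(total_reward)
```

An RL recommender would use these simulated rewards to update its policy offline, without touching real users.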

Introduction

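A typical introduction goes like this:

1. Some field is developing, and it has to be the kind of field with a bright future.

2. Then a problem shows up in that field, along with the harm it will do. In short, it will be too late if nobody solves it.

3. Existing approaches (some papers have this part, some do not; pioneering work may have none). They are mentioned not for the merits of their ideas but for their long list of shortcomings.

4. Then we arrive: our idea solves the problem without those side effects, or improves on what came before.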

Contributions:

The usual bragging time.

1. We propose a method that can generate realistic data from offline data. (We solved the problem.)

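2. We propose a model based on that method that can be used to solve the problem. (Some papers do not phrase it this way, but they are all basically the same; you need at least three contributions.)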

3. Experiments show that we really are that good. (Practically every paper has this one.)