With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in utilizing RL for online advertising in recommendation platforms (e.g., e-commerce and news feed sites).
However, most RL-based advertising algorithms focus solely on optimizing the revenue of ads while ignoring the possible negative influence of ads on the user experience of recommended items (products, articles and videos).
Developing an optimal advertising algorithm for recommendations is challenging because interpolating ads improperly or too frequently may degrade the user experience, while interpolating fewer ads will reduce the advertising revenue.
Thus, in this paper, we propose a novel advertising strategy for the rec/ads trade-off.
To be specific, we develop a reinforcement learning based framework that can continuously update its advertising strategies and maximize reward in the long run.
Given a recommendation list, we design a novel Deep Q-network architecture that can determine three internally related tasks jointly, i.e.,
(i) whether to interpolate an ad or not in the recommendation list, and if yes,
(ii) the optimal ad and
(iii) the optimal location to interpolate.
The experimental results based on real-world data demonstrate the effectiveness of the proposed framework.
Online advertising is a form of advertising that leverages the Internet to deliver promotional marketing messages to consumers. The goal of online advertising is to assign the right ads to the right consumers so as to maximize the revenue, click-through rate (CTR) or return on investment (ROI) of the advertising campaign. The two main marketing strategies in online advertising are guaranteed delivery (GD) and real-time bidding (RTB).
For guaranteed delivery, ad exposures to consumers are guaranteed by contracts signed between advertisers and publishers in advance (Jia et al. 2016).
For real-time bidding, advertisers bid on each ad impression in real time, as soon as the impression is generated from a consumer visit (Cai et al. 2017).
However, the majority of online advertising techniques are based on offline/static optimization algorithms. These algorithms treat each impression independently and maximize the immediate revenue of each impression, which works poorly in real-world business, especially when the environment is unstable.
Therefore, considerable effort has gone into developing reinforcement learning based online advertising techniques (Cai et al. 2017; Wang et al. 2018a; Zhao et al. 2018b; Rohde et al. 2018; Wu et al. 2018b; Jin et al. 2018). These techniques continuously update their advertising strategies while interacting with consumers, and they obtain the optimal strategy by maximizing the expected long-term cumulative revenue from consumers.
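The sketch below makes "expected long-term cumulative revenue" concrete. It computes a discounted return G = r_0 + γ·r_1 + γ²·r_2 + ... for two hypothetical sessions. The discount factor γ = 0.9 and both reward sequences are made-up illustration values, not data from the paper. The point is that a policy which grabs more revenue on the first request can still lose out over a whole session.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical per-request ad revenues over one user session.
greedy = [1.0, 0.0, 0.0]   # aggressive ad up front, then the user stops generating revenue
patient = [0.5, 0.5, 0.5]  # milder ads, the user keeps browsing

print(discounted_return(greedy), discounted_return(patient))
```

Under these toy numbers, the greedy session earns more immediately (1.0 vs 0.5). It earns less in discounted total, however (1.0 vs about 1.355). A static, per-impression optimizer cannot see this gap.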
However, most existing works focus on maximizing the income of ads while ignoring the negative influence of ads on the user experience of recommendations.
Designing an appropriate advertising strategy is a challenging problem, since
(i) displaying too many ads or improper ads will degrade user experience and engagement;and
(ii) displaying insufficient ads will reduce the advertising revenue of the platforms.
In real-world platforms, as shown in Figure 1, ads are often displayed alongside normal recommended items. The recommendation and advertising strategies are typically developed by different departments and optimized with different techniques and different metrics (Feng et al. 2018). Upon a user's request, the recommendation system first generates a list of recommendations according to the user's interests. The advertising system then needs to make three decisions (sub-actions): whether to interpolate an ad into the current recommendation list (rec-list), and if so, which ad is optimal and at which location to interpolate it. For example, in Figure 1 the advertising agent (AA) decides to interpolate ad_9 between rec_2 and rec_3 of the rec-list. The first sub-action controls the frequency of ads, while the other two sub-actions aim to control the appropriateness of ads. The goal of the advertising strategy is to simultaneously maximize the income of ads and minimize the negative influence of ads on user experience.
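The following is a minimal sketch of how the three sub-actions could be represented and applied to a rec-list. The `AdAction` class and `apply_action` function are hypothetical. The convention that location l means "insert before the (l+1)-th recommendation" is an assumption, chosen so that ad_9 at location 2 lands between rec_2 and rec_3 as in the Figure 1 example. Under this convention, a rec-list of length L has L+1 possible insertion slots.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AdAction:
    """Hypothetical encoding of the advertising agent's three sub-actions."""
    interpolate: bool               # sub-action 1: insert an ad at all?
    ad: Optional[str] = None        # sub-action 2: which ad (only if interpolate)
    location: Optional[int] = None  # sub-action 3: slot 0..L, 0 = before rec_1 (assumed)


def apply_action(rec_list: List[str], action: AdAction) -> List[str]:
    """Return the list actually shown to the user after the AA's decision."""
    if not action.interpolate:
        return list(rec_list)
    if action.ad is None or action.location is None:
        raise ValueError("an interpolated ad needs both an ad and a location")
    if not 0 <= action.location <= len(rec_list):
        raise ValueError("location must lie in 0..L")
    shown = list(rec_list)
    shown.insert(action.location, action.ad)
    return shown


recs = ["rec1", "rec2", "rec3", "rec4"]
print(apply_action(recs, AdAction(True, "ad9", 2)))
# ['rec1', 'rec2', 'ad9', 'rec3', 'rec4']
print(apply_action(recs, AdAction(False)))
# ['rec1', 'rec2', 'rec3', 'rec4']
```

Only when `interpolate` is true do the other two fields matter. This mirrors how the three decisions depend on one another.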
The above-mentioned three decisions (sub-actions) are internally related: (only) when the AA decides to interpolate an ad do the location and the candidate ad together determine the reward. Figure 2 illustrates the two conventional Deep Q-network (DQN) architectures for online advertising.
Note that in this paper we suppose (i) there are |A| candidate ads for each request, and (ii) the length of the recommendation list (or rec-list) is L. The DQN in Figure 2(a) takes the state as input and outputs the Q-values of all locations. This architecture can determine the optimal location but cannot choose the specific ad to interpolate. The DQN in Figure 2(b) takes a state-action pair as input and outputs the Q-value corresponding to a specific action (ad). This architecture can select a specific ad but cannot decide the optimal location.
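Figure 2: (a) takes the state as input and outputs the value of each location. It can therefore pick a good location, but it cannot decide which ad to show. (b) takes a (state, action) pair as input, where the action is an ad, and outputs its value. It can therefore decide which ad is good, but it cannot decide which location is better.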
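The limitation of the two conventional designs is easiest to see from their input and output shapes. This NumPy sketch uses single random linear layers in place of real networks, and all sizes (`STATE_DIM`, `AD_DIM`, `L`, `NUM_ADS`) are hypothetical. Design (a) yields a best location with no ad attached. Design (b) yields a best ad with no location attached.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, AD_DIM, L, NUM_ADS = 8, 4, 5, 3  # hypothetical sizes

# (a) state -> one Q-value per location; no ad features ever enter the network.
W_a = rng.normal(size=(STATE_DIM, L))


def q_locations(state):
    return state @ W_a  # shape (L,)


# (b) (state, ad) -> one scalar Q-value; the location never enters the network.
w_b = rng.normal(size=STATE_DIM + AD_DIM)


def q_state_ad(state, ad):
    return float(np.concatenate([state, ad]) @ w_b)


state = rng.normal(size=STATE_DIM)
ads = rng.normal(size=(NUM_ADS, AD_DIM))

best_location = int(np.argmax(q_locations(state)))             # ...but which ad?
best_ad = int(np.argmax([q_state_ad(state, a) for a in ads]))  # ...but where?
print(best_location, best_ad)
```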
An alternative is to feed a representation of the location (e.g., a one-hot vector) as an additional input. However, O(|A| · L) evaluations (one forward pass per ad-location pair) are then necessary to find the action that maximizes the optimal action-value function Q*(s, a). This cost prevents such a DQN architecture from being adopted in practical advertising systems. It is also worth noting that neither architecture can determine whether to interpolate an ad at all into a given rec-list.
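The cost gap can be checked by counting forward passes. In this toy sketch (random linear "networks", hypothetical sizes), the one-hot design needs one call per (ad, location) pair, i.e. |A| · L calls. The second function is an assumption about how a network can avoid that cost. It takes (state, ad) and returns L+1 values at once, one per location plus an assumed "no ad" output at index 0, so it needs only |A| calls.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, AD_DIM, L, NUM_ADS = 8, 4, 10, 50  # hypothetical sizes

calls = {"onehot": 0, "multihead": 0}

W1 = rng.normal(size=STATE_DIM + AD_DIM + L)


def q_onehot(state, ad, loc):
    """One scalar Q per (ad, location): input = state ++ ad ++ one_hot(loc)."""
    calls["onehot"] += 1
    return float(np.concatenate([state, ad, np.eye(L)[loc]]) @ W1)


W2 = rng.normal(size=(STATE_DIM + AD_DIM, L + 1))


def q_multihead(state, ad):
    """All L+1 Q-values for one ad in a single pass (index 0 = 'no ad', assumed)."""
    calls["multihead"] += 1
    return np.concatenate([state, ad]) @ W2


state = rng.normal(size=STATE_DIM)
ads = rng.normal(size=(NUM_ADS, AD_DIM))

# Exhaustive search over every (ad index, location) pair with the one-hot design.
best_onehot = max(itertools.product(range(NUM_ADS), range(L)),
                  key=lambda p: q_onehot(state, ads[p[0]], p[1]))
# One pass per ad with the multi-output design gives an (|A|, L+1) table.
q_table = np.stack([q_multihead(state, a) for a in ads])
print(calls)  # {'onehot': 500, 'multihead': 50}
```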
Thus, in this paper, we design a new DEep reinforcement learning framework with a novel DQN architecture for online Advertising in Recommender systems (DEAR), which can determine the aforementioned three tasks simultaneously with reasonable time complexity. We summarize our major contributions as follows:
• We identify the phenomena of online advertising with recommendations and provide a principled approach for better advertising strategy;
• We propose a deep reinforcement learning based framework DEAR and a novel Q-network architecture, which can simultaneously determine whether to interpolate an ad, the optimal location and which ad to interpolate;
• We demonstrate the effectiveness of the proposed framework on a real-world short-video site.
Problem Statement
Proposed Framework
As mentioned above, the problem of online advertising in recommender systems is challenging because
(i) the action of the advertising agent (AA) is complex and consists of three sub-actions: whether to interpolate an ad into the current rec-list, and if so, which ad is optimal and where the best location is;
(ii) the three sub-actions are internally related: when the AA decides to interpolate an ad, the choice of ad and the choice of location jointly determine the reward, which prevents traditional DQN architectures from being employed in online advertising systems; and
(iii) the AA should simultaneously maximize the income of ads and minimize the negative influence of ads on user experience.
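Challenge (iii) can be made concrete with a per-step reward that adds ad income to a user-experience signal. The form used here is r = r_ad + α · r_ex, where r_ex = +1 if the user keeps browsing and -1 if they leave. This form is an assumption for illustration, since this section does not spell out the reward. `step_reward` and `alpha` are hypothetical names.

```python
def step_reward(ad_revenue: float, user_stayed: bool, alpha: float = 1.0) -> float:
    """Hypothetical per-step reward: ad income plus a weighted user-experience term.

    ad_revenue:  income from the interpolated ad (0.0 when no ad is shown).
    user_stayed: True if the user keeps browsing after this rec-list, False if they leave.
    alpha:       assumed trade-off weight between ad income and user experience.
    """
    experience = 1.0 if user_stayed else -1.0
    return ad_revenue + alpha * experience


print(step_reward(0.5, user_stayed=False))  # -0.5: the ad paid, but it drove the user away
print(step_reward(0.0, user_stayed=True))   # 1.0: no ad income, but the user stayed
```

Raising `alpha` makes the agent more conservative about showing ads. Lowering it pushes the agent toward pure revenue maximization.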
To address these challenges, we propose a deep reinforcement learning framework with a novel Deep Q-network architecture. In the following, we first introduce the processing of state and action features, and then we describe the proposed DQN architecture along with its optimization algorithm.
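The sequential records of recommendations and of ads are fed into two separate RNNs, and their final hidden states are used as the state. C is contextual information (the user's phone model, age, etc.), REC_t is the list of items to be recommended, and A_t is the ad. The output is the value of placing the ad at each location. Now comes the clever part.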
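The pipeline described in the note above can be sketched in NumPy. This is a toy forward pass under several assumptions:

- a plain tanh RNN stands in for whatever recurrent cell the paper uses;
- every item and ad is a fixed-size feature vector;
- the weights are random rather than trained;
- output index 0 is taken to mean "do not interpolate an ad", giving L+1 outputs and covering the first sub-action.

All function names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
ITEM_DIM, HID, CTX_DIM, L = 6, 8, 3, 5  # hypothetical feature sizes and rec-list length


def make_rnn(in_dim, hid):
    """Random (untrained) input and recurrent weights for a plain tanh RNN."""
    return (rng.normal(scale=0.3, size=(in_dim, hid)),
            rng.normal(scale=0.3, size=(hid, hid)))


def rnn_final_state(seq, params):
    """Run a tanh RNN over seq (T x in_dim) and return the last hidden state."""
    W_x, W_h = params
    h = np.zeros(W_h.shape[0])
    for x in seq:
        h = np.tanh(x @ W_x + h @ W_h)
    return h


rec_rnn = make_rnn(ITEM_DIM, HID)  # encodes the sequence of recommended items
ad_rnn = make_rnn(ITEM_DIM, HID)   # encodes the sequence of ads shown
IN_DIM = 2 * HID + CTX_DIM + L * ITEM_DIM + ITEM_DIM
W_out = rng.normal(scale=0.1, size=(IN_DIM, L + 1))


def q_values(rec_hist, ad_hist, context, rec_list, ad):
    """L+1 Q-values for one candidate ad; index 0 = 'no ad', index l = insert after rec_l."""
    x = np.concatenate([
        rnn_final_state(rec_hist, rec_rnn),
        rnn_final_state(ad_hist, ad_rnn),
        context,           # C: contextual information (device, age, ...)
        rec_list.ravel(),  # REC_t: the L items about to be recommended
        ad,                # A_t: the candidate ad
    ])
    return x @ W_out


def choose_action(rec_hist, ad_hist, context, rec_list, candidate_ads):
    """Greedy action over all candidates: None (no ad) or (ad index, insert position)."""
    q = np.stack([q_values(rec_hist, ad_hist, context, rec_list, a)
                  for a in candidate_ads])  # shape (|A|, L+1)
    ad_idx, slot = np.unravel_index(int(np.argmax(q)), q.shape)
    if slot == 0:
        return None
    return int(ad_idx), int(slot)  # list.insert(slot, ad) places the ad after rec_slot


rec_hist = rng.normal(size=(7, ITEM_DIM))
ad_hist = rng.normal(size=(3, ITEM_DIM))
context = rng.normal(size=CTX_DIM)
rec_list = rng.normal(size=(L, ITEM_DIM))
candidates = rng.normal(size=(4, ITEM_DIM))
print(choose_action(rec_hist, ad_hist, context, rec_list, candidates))
```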
The Reward Function
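https://github.com/xingkongxiaxia/Sequential_Recommendation_System PyTorch-based implementations of today's mainstream recommendation algorithms
https://github.com/xingkongxiaxia/tensorflow_recommend_system I also have TensorFlow-based code
https://github.com/RUCAIBox/RecBole RecBole (more than 60 recommendation algorithms of various types)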