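ICLR (the International Conference on Learning Representations) is a top-tier conference in deep learning. This edition received 4,956 submissions and accepted 1,574, an acceptance rate of roughly 32%. Five of the accepted papers concern recommender systems; they are collected here for reference.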
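The full list of accepted papers is available at: https://iclr.cc/virtual/2023/papers.html
Because ICLR uses OpenReview, the review process of every paper is public. You can read the reviews and scores of the papers you care about, and learn how authors conduct a cordial, friendly "battle" with reviewers during the rebuttal.
Below, each paper is listed with its title, paper links, presentation page, and abstract. The papers mainly cover reinforcement learning for recommendation, graph-based recommendation, robust recommender systems, and debiased recommender systems. The links lead to the detailed reviews and the original paper files.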
1. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor
pdf: https://openreview.net/pdf?id=HmPOzJQhbwg ppt: https://iclr.cc/media/iclr-2023/Slides/12018_pBpzLdl.pdf openview: https://openreview.net/forum?id=HmPOzJQhbwg presentation: https://iclr.cc/virtual/2023/poster/12018
Wanqi Xue · Qingpeng Cai · Ruohan Zhan · Dong Zheng · Peng Jiang · Kun Gai · Bo An
Long-term engagement is preferred over immediate engagement in sequential recommendation as it directly affects product operational metrics such as daily active users (DAUs) and dwell time. Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation. However, due to expensive online interactions, it is very difficult for RL algorithms to perform state-action value estimation, exploration and feature extraction when optimizing long-term engagement. In this paper, we propose ResAct which seeks a policy that is close to, but better than, the online-serving policy. In this way, we can collect sufficient data near the learned policy so that state-action values can be properly estimated, and there is no need to perform online exploration. ResAct optimizes the policy by first reconstructing the online behaviors and then improving it via a Residual Actor. To extract long-term information, ResAct utilizes two information-theoretical regularizers to confirm the expressiveness and conciseness of features. We conduct experiments on a benchmark dataset and a large-scale industrial dataset which consists of tens of millions of recommendation requests. Experimental results show that our method significantly outperforms the state-of-the-art baselines in various long-term engagement optimization tasks.
2. LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation
pdf: https://openreview.net/pdf?id=FKXVK9dyMM openview: https://openreview.net/forum?id=FKXVK9dyMM presentation: https://iclr.cc/virtual/2023/poster/11723
Xuheng Cai · Chao Huang · Lianghao Xia · Xubin Ren
Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, aiming at dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on the heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by the noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues impairing the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables the unconstrained structural refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate the significant improvement in performance of our model over the state-of-the-arts. Further analyses demonstrate the superiority of LightGCL's robustness against data sparsity and popularity bias. The source code of our model is available at https://github.com/HKUDS/LightGCL.
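The abstract states only that LightGCL uses singular value decomposition to build its contrastive view. As a rough illustration of that idea, the sketch below builds a low-rank reconstruction of a toy user-item matrix. Everything here is an assumption made for illustration: the helper names (`top_singular_triplet`, `low_rank_view`), the toy data, and the use of power iteration with deflation as a stand-in for a real SVD routine. It is not the authors' implementation, which is available in the repository linked above.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def top_singular_triplet(matrix, iters=200, seed=0):
    """Approximate the leading (sigma, u, v) of `matrix` by power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    transposed = [list(col) for col in zip(*matrix)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(cols)]
    for _ in range(iters):
        v = matvec(transposed, matvec(matrix, v))  # one step on A^T A
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # zero matrix: nothing left to extract
            return 0.0, [0.0] * rows, [0.0] * cols
        v = [x / norm for x in v]
    u = matvec(matrix, v)
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * rows, [0.0] * cols
    return sigma, [x / sigma for x in u], v


def low_rank_view(interactions, rank):
    """Sum the top `rank` singular components (hypothetical helper)."""
    residual = [[float(x) for x in row] for row in interactions]
    view = [[0.0] * len(row) for row in interactions]
    for _ in range(rank):
        sigma, u, v = top_singular_triplet(residual)
        for i, ui in enumerate(u):
            for j, vj in enumerate(v):
                component = sigma * ui * vj
                view[i][j] += component      # accumulate the low-rank view
                residual[i][j] -= component  # deflate before the next pass
    return view


if __name__ == "__main__":
    # Toy data: 3 users x 3 items, 1 = observed interaction.
    toy = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
    for row in low_rank_view(toy, rank=1):
        print([round(x, 3) for x in row])
```

With `rank=1` the reconstruction keeps the dominant block of co-interactions and drops the isolated one, which is the sense in which a truncated SVD emphasizes global structure over individual edges.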
3. StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random
pdf: https://openreview.net/pdf?id=3VO1y5N7K1H openview: https://openreview.net/forum?id=3VO1y5N7K1H
Haoxuan Li · Chunyuan Zheng · Peng Wu
In recommender systems, users always choose the favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds to extremely small propensities. Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods.
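For readers unfamiliar with the estimator this abstract critiques, here is a minimal sketch of the standard doubly robust estimate of average prediction error as it is commonly written in the debiased recommendation literature: the mean over all user-item pairs of `e_hat + o * (e - e_hat) / p_hat`, where `e` is the prediction error, `e_hat` the imputed error, `o` the observation indicator, and `p_hat` the estimated propensity. The function name and the toy numbers are assumptions for illustration; this is not the StableDR estimator, whose definition is given in the paper.

```python
def dr_estimate(errors, observed, imputed, propensities):
    """Standard doubly robust estimate of the mean prediction error.

    errors[i]       true error e_i (only used where observed[i] == 1)
    observed[i]     1 if the rating was observed, else 0
    imputed[i]      imputed error e_hat_i from an imputation model
    propensities[i] estimated probability p_hat_i of observing the rating
    """
    total = 0.0
    for e, o, e_hat, p_hat in zip(errors, observed, imputed, propensities):
        # Imputed error plus an inverse-propensity-weighted correction
        # that is applied only to observed entries.
        total += e_hat + o * (e - e_hat) / p_hat
    return total / len(errors)


if __name__ == "__main__":
    errors = [0.2, 0.4, 0.6, 0.8]
    observed = [1, 0, 1, 0]

    # Accurate imputation: the correction vanishes, so the estimate equals
    # the true mean error even though the propensities are arbitrary.
    print(dr_estimate(errors, observed, errors, [0.9, 0.1, 0.3, 0.5]))

    # Inaccurate imputation plus a tiny propensity: a single observed
    # entry dominates the estimate.
    print(dr_estimate([1.0], [1], [0.5], [0.001]))
```

The second call shows the instability the abstract refers to: the correction term is divided by the propensity, so an imputation error of 0.5 at a propensity of 0.001 contributes 500 to the sum.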
4. Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
pdf: https://openreview.net/pdf?id=wzlWiO_WY4 openview: https://openreview.net/forum?id=wzlWiO_WY4 presentation: https://iclr.cc/virtual/2023/poster/11322
Yewen Fan · Nian Si · Kun Zhang
Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration often suffers from a problem called “maximization bias”. Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions are achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we quantify maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problem under covariate shifts, without incurring additional online serving costs or compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
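The maximization bias described in this abstract can be reproduced with a small simulation. The sketch below is an illustrative toy under stated assumptions, not the paper's VAD algorithm: the function names, the uniform range of true click rates, the Gaussian noise level, and the top-10% selection rule are all made up for the example. Predictions are unbiased for every ad, yet calibration measured on the ads the model itself ranks highest comes out well above 1.

```python
import random


def calibration(predicted, actual):
    """Ratio of the average predicted click rate to the average true rate."""
    return (sum(predicted) / len(predicted)) / (sum(actual) / len(actual))


def simulate(num_ads=10000, top_fraction=0.1, noise_std=0.01, seed=0):
    """Return (calibration on all ads, calibration on the top-ranked ads)."""
    rng = random.Random(seed)
    true_ctr = [rng.uniform(0.01, 0.05) for _ in range(num_ads)]
    # Unbiased predictions: true rate plus zero-mean noise. Values are not
    # clipped (some may be negative) so that the noise stays zero-mean.
    predicted = [p + rng.gauss(0.0, noise_std) for p in true_ctr]

    # The model selects the ads with the highest predicted click rate.
    ranked = sorted(range(num_ads), key=lambda i: predicted[i], reverse=True)
    chosen = ranked[: int(num_ads * top_fraction)]

    overall = calibration(predicted, true_ctr)
    selected = calibration(
        [predicted[i] for i in chosen], [true_ctr[i] for i in chosen]
    )
    return overall, selected


if __name__ == "__main__":
    overall, selected = simulate()
    print(f"calibration on all ads:      {overall:.3f}")
    print(f"calibration on selected ads: {selected:.3f}")
```

Selecting by predicted value favors ads whose noise happened to be positive, so the selected set is over-predicted on average even though no individual prediction is biased.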
5. TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations
pdf: https://openreview.net/pdf?id=EIgLnNx_lC openview: https://openreview.net/forum?id=EIgLnNx_lC
Haoxuan Li · Yan Lyu · Chunyuan Zheng · Peng Wu
Bias is a common problem inherent in recommender systems, which is entangled with users' preferences and poses a great challenge to unbiased learning. For debiasing tasks, the doubly robust (DR) method and its variants show superior performance due to the double robustness property, that is, DR is unbiased when either imputed errors or learned propensities are accurate. However, our theoretical analysis reveals that DR usually has a large variance. Meanwhile, DR would suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which usually occur in practice. In this paper, we propose a principled approach that can effectively reduce the bias and variance simultaneously for existing DR approaches when the error imputation model is misspecified. In addition, we further propose a novel semi-parametric collaborative learning approach that decomposes imputed errors into parametric and nonparametric parts and updates them collaboratively, resulting in more accurate predictions. Both theoretical analysis and experiments demonstrate the superiority of the proposed methods compared with existing debiasing methods.