An Overview of Recommender-System Submissions to ICLR 2024

2023-11-30 14:53:34

The reviews for this year's ICLR are now public, so I have compiled the recommender-system papers, 31 in total. Note that the top-conference papers compiled previously, such as the NeurIPS 2023 recommender-system collection, were all formally accepted papers. Because ICLR runs on OpenReview, the list below is the submission list, not the final acceptance list. For that very reason, we can follow each paper through the review process: see the reviews and scores it received, and learn how to conduct a "cordial and friendly" battle with the reviewers during the rebuttal. For each paper I list the title, its current ratings, a link, and the abstract; the links lead to the full reviews and the original submission files.

Surveying this batch of ICLR recommender-system papers shows a fairly broad range of research directions, including causal recommendation, reinforcement-learning-based recommendation, strategic recommendation, graph-based recommendation, safe collaborative filtering, cross-domain recommendation, disentangled representations, LLM-based recommendation, and recommendation unlearning.

More ICLR papers can be found at the link below.

https://openreview.net/group?id=ICLR.cc/2024/Conference

The title, ratings, link, and abstract of each paper follow; if a paper interests you, follow the link to read it in full.

1. STUDY: Socially Aware Temporally Causal Decoder Recommender Systems

2. SUBER: An RL Environment with Simulated Human Behavior for Recommender Systems

3. UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems

4. Strategic Recommendations for Improved Outcomes in Congestion Games

5. Categorical Features of entities in Recommendation Systems Using Graph Neural Networks

6. Safe Collaborative Filtering

7. Cross-domain Recommendation from Implicit Feedback

8. Disentangled Heterogeneous Collaborative Filtering

9. Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference for Recommendation

10. Demystifying Embedding Spaces using Large Language Models

11. FIITED: Fine-grained embedding dimension optimization during training for recommender systems

12. From Deterministic to Probabilistic World: Balancing Enhanced Doubly Robust Learning for Debiased Recommendation

13. How Does Message Passing Improve Collaborative Filtering?

14. VibeSpace: Automatic vector embedding creation for arbitrary domains and mapping between them using large language models

15. Unifying User Preferences and Critic Opinions: A Multi-View Cross-Domain Item-sharing Recommender System

16. GNN-based Reinforcement Learning Agent for Session-based Recommendation

17. Basis Function Encoding of Numerical Features in Factorization Machines for Improved Accuracy

18. MOESART: An Effective Sampling-based Router for Sparse Mixture of Experts

19. On the Embedding Collapse When Scaling up Recommendation Models

20. Hyperbolic Embeddings in Sequential Self-Attention for Improved Next-Item Recommendations

21. Constraining Non-Negative Matrix Factorization to Improve Signature Learning

22. Farzi Data: Autoregressive Data Distillation

23. Factual and Personalized Recommendation Language Modeling with Reinforcement Learning

24. ConvFormer: Revisiting Token-mixers for Sequential User Modeling

25. Talk like a Graph: Encoding Graphs for Large Language Models

26. Weight Uncertainty in Individual Treatment Effect

27. Explaining recommendation systems through contrapositive perturbations

28. Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators

29. Evidential Conservative Q-Learning for Dynamic Recommendations

30. Unlearning the Unwanted Data from a Personalized Recommendation Model

31. AFDGCF: Adaptive Feature De-correlation Graph Collaborative Filtering for Recommendations

1. STUDY: Socially Aware Temporally Causal Decoder Recommender Systems

Ratings: 5, 1, 5, 3, 5

https://openreview.net/forum?id=6CfJp9NG6Q

Recommender systems are widely used to help people find items that are tailored to their interests. These interests are often influenced by social networks, making it important to use social network information effectively in recommender systems, especially for demographic groups with interests that differ from the majority. This paper introduces STUDY, a Socially-aware Temporally caUsal Decoder recommender sYstem. The STUDY architecture is significantly more efficient to learn and train than existing methods and performs joint inference over socially-connected groups in a single forward pass of a modified transformer decoder network. We demonstrate the benefits of STUDY in the recommendation of books for students who have dyslexia or are struggling readers. Students with dyslexia often have difficulty engaging with reading material, making it critical to recommend books that are tailored to their interests. We worked with our non-profit partner Learning Ally to evaluate STUDY on a dataset of struggling readers. STUDY was able to generate recommendations that more accurately predicted student engagement, when compared with existing methods.

2. SUBER: An RL Environment with Simulated Human Behavior for Recommender Systems

Ratings: 6, 5, 3, 3

https://openreview.net/forum?id=w327zcRpYn

Reinforcement learning (RL) has gained popularity in the realm of recommender systems due to its ability to optimize long-term rewards and guide users in discovering relevant content. However, the successful implementation of RL in recommender systems is challenging because of several factors, including the limited availability of online data for training on-policy methods. This scarcity requires expensive human interaction for online model training. Furthermore, the development of effective evaluation frameworks that accurately reflect the quality of models remains a fundamental challenge in recommender systems. To address these challenges, we propose a comprehensive framework for synthetic environments that simulate human behavior by harnessing the capabilities of large language models (LLMs). We complement our framework with in-depth ablation studies and demonstrate its effectiveness with experiments on movie and book recommendations. By utilizing LLMs as synthetic users, this work introduces a modular and novel framework for training RL-based recommender systems. The software, including the RL environment, is publicly available.

3. UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender Systems

Ratings: 5, 8, 3

https://openreview.net/forum?id=hJCinlknXn

Reinforcement learning (RL) has gained traction for enhancing user long-term experiences in recommender systems by effectively exploring users' interests. However, modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration. For example, user behaviors with different activity levels require varying intensity of exploration, while previous studies often overlook this aspect and apply a uniform exploration strategy to all users, which ultimately hurts user experiences in the long run. To address these challenges, we propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups. We first construct a distributional critic which allows policy optimization under varying quantile levels of cumulative reward feedback from users, representing user groups with varying activity levels. Guided by this critic, we devise a population of distinct actors aimed at effective and fine-grained exploration within their respective user groups. To simultaneously enhance diversity and stability during the exploration process, we further introduce a population-level diversity regularization term and a supervision module. Experimental results on public recommendation datasets demonstrate that our approach outperforms all other baselines in terms of long-term performance, validating its user-oriented exploration effectiveness. Meanwhile, further analyses reveal our approach's additional benefits of improved performance for low-activity users as well as increased fairness among users.

4. Strategic Recommendations for Improved Outcomes in Congestion Games

Ratings: 5, 3, 3, 6

https://openreview.net/forum?id=YJxhZnGU1q

Traffic on roads, packets on the Internet, and electricity on power grids share a structure abstracted in congestion games, where self-interested behaviour can lead to socially sub-optimal results. External recommendations may seek to alleviate these issues, but recommenders must take into account the effect that their recommendations have on the system. In this paper, we investigate the effects that dynamic recommendations have on Q-learners as they repeatedly play congestion games. To do so, we propose a novel model of recommendation whereby a Q-learner receives a recommendation as a state. Thus, the recommender strategically picks states during learning, which we call the Learning Dynamic Manipulation Problem. We define the manipulative potential of these recommenders in repeated congestion games and propose an algorithm for the Learning Dynamic Manipulation Problem designed to drive the actions of Q-learners toward a target action distribution. We simulate our algorithm and show that it can drive the system to convergence at the social optimum of a well-known congestion game. Our results show theoretically and empirically that increasing the recommendation space can increase the manipulative potential of the recommender.
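The paper's core mechanism, a Q-learner whose "state" is the recommendation it receives, can be sketched as a toy in a few lines of numpy. The reward values, problem sizes, and the random next-state choice below are my illustrative assumptions, not the paper's manipulation algorithm:

```python
import numpy as np

# Toy setting: a Q-learner whose state is simply the recommendation it was given.
rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
target_action = 1                      # action the recommender wants to promote

def reward(a):
    # congestion-style payoff: the target action is socially better (illustrative)
    return 1.0 if a == target_action else 0.2

s = 0
for _ in range(2000):
    # epsilon-greedy action selection by the learner
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    # the paper's recommender picks the next state strategically; random stand-in here
    s_next = int(rng.integers(n_states))
    Q[s, a] += alpha * (reward(a) + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

Even with random state choices, the reward gap surfaces in the learned Q-values; the Learning Dynamic Manipulation Problem is about choosing `s_next` to steer this process deliberately.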

5. Categorical Features of entities in Recommendation Systems Using Graph Neural Networks

Ratings: 6, 3, 3, 3

https://openreview.net/forum?id=PuCno7nwgH

The paper tackles the challenge of capturing entity attribute-specific preferences in recommender systems, with a particular focus on the role of categorical features within GNN-based user-item recommender engines. Despite the significant influence of categorical features such as brand, category, and price bucket on the user decision-making process, there are not many studies dedicated to understanding the GNN's capability to extract and model such preferences effectively. The study extensively compares and tests various techniques for incorporating categorical features into the GNN framework to address this gap. These techniques include one-hot encoding-based node features, category-value nodes, and hyperedges. Three real-world datasets are used to determine the most effective way to incorporate such information. In addition, the paper introduces a novel hyperedge-based method designed to leverage categorical features more effectively compared to existing approaches. The advantage of the hyperedge approach is demonstrated through extensive experiments in effectively modeling categorical features and extracting user attribute-specific preferences.

6. Safe Collaborative Filtering

Ratings: 6, 8, 8

https://openreview.net/forum?id=yarUvgEXq3

Excellent tail performance is crucial for modern machine learning tasks, such as algorithmic fairness, class imbalance, and risk-sensitive decision making, as it ensures the effective handling of challenging samples within a dataset. Tail performance is also a vital determinant of success for personalized recommender systems to reduce the risk of losing users with low satisfaction. This study introduces a "safe" collaborative filtering method that prioritizes recommendation quality for less-satisfied users rather than focusing on the average performance. Our approach minimizes the conditional value at risk (CVaR), which represents the average risk over the tails of users' loss. To overcome computational challenges for web-scale recommender systems, we develop a robust yet practical algorithm that extends the most scalable method, implicit alternating least squares (iALS). Empirical evaluation on real-world datasets demonstrates the excellent tail performance of our approach while maintaining competitive computational efficiency.
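The CVaR objective at the heart of the paper is easy to state concretely: instead of minimizing the average loss, minimize the average over the worst α-fraction of users. A minimal numpy sketch (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def cvar(losses, alpha=0.4):
    """Conditional value at risk: the mean loss over the worst
    alpha-fraction of users (the tail that 'safe' CF prioritizes)."""
    losses = np.sort(np.asarray(losses))[::-1]        # worst losses first
    k = max(1, int(np.ceil(alpha * len(losses))))     # size of the tail
    return losses[:k].mean()

per_user_loss = np.array([0.1, 0.2, 0.9, 1.5, 0.3])
avg = per_user_loss.mean()        # what standard CF optimizes -> 0.6
tail = cvar(per_user_loss)        # mean of the worst 40% -> 1.2
```

A model that only improves `avg` can leave the two worst-served users untouched; optimizing `tail` forces attention onto exactly those users.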

7. Cross-domain Recommendation from Implicit Feedback

Ratings: 3, 1, 3, 5

https://openreview.net/forum?id=wi8wMFuO0H

Existing cross-domain recommendation (CDR) algorithms aim to leverage explicit feedback from richer source domains to enhance recommendations in a target domain with limited records. However, practical scenarios often involve easily obtainable implicit feedback, such as user clicks and purchase history, instead of explicit feedback. Thus, in this paper, we consider a more practical problem setting, called cross-domain recommendation from implicit feedback (CDRIF), where both source and target domains are based on implicit feedback. We initially observe that current CDR algorithms struggle to make recommendations when implicit feedback exists in both source and target domains. The primary issue with current CDR algorithms lies in the fact that implicit feedback can only approximately express user preferences in the dataset, inevitably introducing noisy information during the training of recommender systems. To this end, we propose a noise-aware reweighting framework (NARF) for CDRIF, which effectively alleviates the negative effects brought by the implicit feedback and improves recommendation performance. Extensive experiments conducted on both synthetic and large real-world datasets demonstrate that NARF, implemented by two representative CDR algorithms, significantly outperforms the baseline methods, which further underscores the significance of handling implicit feedback in CDR. The code is available in an anonymous Github repository: https://anonymous.4open.science/r/CDR-3E2A/README.md.

8. Disentangled Heterogeneous Collaborative Filtering

Ratings: 6, 3, 5

https://openreview.net/forum?id=KQm3IUWxwb

Modern recommender systems often utilize low-dimensional latent representations to embed users and items based on their observed interactions. However, many existing recommendation models are primarily designed for coarse-grained and homogeneous interactions, which limits their effectiveness in two key dimensions: i) They fail to exploit the relational dependencies across different types of user behaviors, such as page views, add-to-favorites, and purchases. ii) They struggle to encode the fine-grained latent factors that drive user interaction patterns. In this study, we introduce DHCF, an efficient and effective contrastive learning recommendation model that effectively disentangles users' multi-behavior interaction patterns and the latent intent factors behind each behavior. Our model achieves this through the integration of intent disentanglement and multi-behavior modeling using a parameterized heterogeneous hypergraph architecture. Additionally, we propose a novel contrastive learning paradigm that adaptively explores the benefits of multi-behavior contrastive self-supervised augmentation, thereby improving the model's robustness against data sparsity. Through extensive experiments conducted on three public datasets, we demonstrate the effectiveness of DHCF, which significantly outperforms various strong baselines with competitive efficiency.

9. Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference for Recommendation

Ratings: 8, 6, 5, 6

https://openreview.net/forum?id=52fz5sUAy2

The interaction between users and recommender systems is not only affected by selection bias but also the neighborhood effect, i.e., the interaction between a user and an item is affected by the interactions between other users and other items, or between the same user and other items, or between other users and the same item. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model, but the lack of consideration of neighborhood effects can lead to biased estimates and suboptimal performance of the prediction model. In this paper, we formally formulate the neighborhood effect as an interference problem from the perspective of causal inference and introduce a treatment representation to capture the neighborhood effect. On this basis, we propose a novel ideal loss that can be used to deal with selection bias in the presence of neighborhood effects. In addition, we further develop two novel estimators for the ideal loss. We theoretically establish the connection between the proposed methods and previous methods ignoring the neighborhood effect and show that the proposed methods achieve unbiased learning when both selection bias and neighborhood effects are present, while the existing methods are biased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed methods.
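For context, the standard inverse-propensity-scoring (IPS) estimator that such debiasing methods build on, and that this paper generalizes to handle interference, can be shown in a self-contained numpy simulation. The exposure mechanism below is entirely made up for illustration:

```python
import numpy as np

# Simulate selection bias: user-item pairs with low error get exposed more often.
rng = np.random.default_rng(0)
n = 100_000
error = rng.normal(1.0, 0.1, n)                 # per-pair prediction error
propensity = np.where(error < 1.0, 0.8, 0.2)    # biased exposure probabilities
observed = rng.random(n) < propensity

naive = error[observed].mean()                  # biased by the exposure mechanism
ips = (observed * error / propensity).mean()    # inverse-propensity-scoring estimate
true_mean = error.mean()
```

IPS recovers the true mean when each pair's exposure is independent of every other pair; the paper's point is that under neighborhood effects (interference) this independence fails, so even such "unbiased" estimators become biased.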

10. Demystifying Embedding Spaces using Large Language Models

Ratings: 5, 8, 6, 8

https://openreview.net/forum?id=qoYogklIPz

Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing large language models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.

11. FIITED: Fine-grained embedding dimension optimization during training for recommender systems

Ratings: 5, 3, 8, 3, 3

https://openreview.net/forum?id=gDDW5zMKFe

Huge embedding tables in modern Deep Learning Recommender Models (DLRM) require prohibitively large memory during training and inference. Aiming to reduce the memory footprint of training, this paper proposes FIne-grained In-Training Embedding Dimension optimization (FIITED). Given the observation that embedding vectors are not equally important, FIITED adjusts the dimension of each individual embedding vector continuously during training, assigning longer dimensions to more important embeddings while adapting to dynamic changes in data. A novel embedding storage system based on virtually hashed physically indexed hash tables is designed to efficiently implement the embedding dimension adjustment and effectively enable memory saving. Experiments on two industry models show that FIITED is able to reduce the size of embeddings by more than 65% while maintaining the trained model’s quality, saving significantly more memory than a state-of-the-art in-training embedding pruning method. On public click-through rate prediction datasets, FIITED is able to prune up to 93.75%-99.75% embeddings without significant accuracy loss. Given the same embedding size reduction, FIITED is able to achieve better model quality than the baselines.
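The core idea, giving each embedding vector its own dimension according to importance and masking the rest, can be sketched as follows. Everything here (access frequency as the importance proxy, the rank-proportional dimension assignment) is an illustrative simplification, not FIITED's actual policy or storage system:

```python
import numpy as np

rng = np.random.default_rng(0)
num_emb, full_dim = 1000, 32
table = rng.normal(size=(num_emb, full_dim))
freq = rng.zipf(2.0, num_emb)            # proxy for importance: access frequency

# Rank vectors by importance and give more frequent ones longer dimensions.
ranks = freq.argsort().argsort()         # 0 = least frequent
dims = np.clip((ranks + 1) * full_dim // num_emb, 4, full_dim)
mask = np.arange(full_dim) < dims[:, None]
pruned = table * mask                    # trailing dimensions zeroed out

kept = mask.sum() / mask.size            # fraction of parameters retained
```

In the paper this assignment is adjusted continuously during training and realized with virtually-hashed, physically-indexed tables so the memory saving is real rather than just zeroed entries.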

12. From Deterministic to Probabilistic World: Balancing Enhanced Doubly Robust Learning for Debiased Recommendation

Ratings: 6, 3, 8

https://openreview.net/forum?id=2uwvigLUr8

In recommender systems, selection bias arises from the users' selective interactions with items, which poses a widely-recognized challenge for unbiased evaluation and learning for recommendation models. Recently, doubly robust learning and its variants have been widely studied to achieve debiased learning of prediction models, which enables unbiasedness when either imputed errors or learned propensities are accurate. However, we find that previous studies that achieve unbiasedness using doubly robust learning approaches are all based on a deterministic error-imputation model and a deterministic propensity model, and these approaches fail to be unbiased when using probabilistic models to impute errors and learn propensities. To tackle this problem, in this paper, we first derive the bias of doubly robust learning methods and provide alternative unbiasedness conditions for probabilistic models. Then we propose a novel balancing-enhanced doubly robust joint learning approach, which improves the accuracy of the imputed errors and leads to unbiased learning under probabilistic error imputations and learned propensities. We further derive the generalization error bound when using the probabilistic models, and show that it can be effectively controlled by the proposed learning approach. We conduct extensive experiments on three real-world datasets, including a large-scale industrial dataset, to demonstrate the effectiveness of the proposed method.
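As background, the classical doubly robust estimator the paper builds on combines imputed errors with a propensity-weighted correction, and stays unbiased even under a badly mis-specified imputation as long as the propensities are right. A numpy sketch with a simulated selection mechanism (all numbers illustrative):

```python
import numpy as np

def doubly_robust(o, e_obs, e_hat, p_hat):
    """DR estimate of the mean error over ALL user-item pairs.
    o: 1 if the pair was observed; e_obs: observed errors (0 elsewhere);
    e_hat: imputed errors; p_hat: estimated propensities."""
    return np.mean(e_hat + o * (e_obs - e_hat) / p_hat)

rng = np.random.default_rng(0)
n = 200_000
e = rng.normal(1.0, 0.2, n)               # ground-truth errors (mean 1.0)
p = np.where(e < 1.0, 0.9, 0.3)           # selection bias in exposure
o = (rng.random(n) < p).astype(float)

naive = (o * e).sum() / o.sum()           # biased average over observed pairs
est = doubly_robust(o, o * e, np.full(n, 5.0), p)  # imputation is badly wrong
```

Despite the imputation being off by a factor of five, the propensity correction recovers the truth; symmetrically, accurate imputations rescue wrong propensities. The paper's observation is that this double protection breaks down once the imputation and propensity models are probabilistic rather than deterministic.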

13. How Does Message Passing Improve Collaborative Filtering?

Ratings: 8, 3, 3, 5

https://openreview.net/forum?id=JZC8cEmMWY

Collaborative filtering (CF) has exhibited prominent results for recommender systems and is broadly utilized for real-world applications. A branch of research enhances CF methods with message passing used in graph neural networks, due to its strong capabilities of extracting knowledge from graph-structured data, like user-item bipartite graphs that naturally exist in CF. They assume that message passing helps CF methods in a manner akin to its benefits for graph-based learning tasks in general (e.g., node classification). However, whether or not this assumption is correct still needs verification, even though message passing empirically improves CF. To address this gap, we formally investigate why message passing helps CF from multiple perspectives (i.e., information passed from neighbors, additional gradients for neighbors, and individual improvement gains of subgroups w.r.t. the node degree) and show that many assumptions made by previous works are not entirely accurate. With our rigorously designed ablation studies and analyses, we discover that message passing (i) improves the CF performance primarily by information passed from neighbors instead of their accompanying gradients and (ii) usually helps low-degree nodes more than high-degree nodes. Utilizing these novel findings, we present Test-time Aggregation for Collaborative Filtering, namely TAG-CF, a test-time augmentation framework that only conducts message passing once at inference time. It can be used as a plug-and-play module and is effective at enhancing representations trained by different CF supervision signals. Evaluated on five datasets, TAG-CF performs on par with or better than trending graph-based CF methods with less than 1% of their total training time. Furthermore, we show that test-time aggregation in TAG-CF improves recommendation performance in similar ways as the training-time aggregation does, demonstrating the legitimacy of our findings on why message passing improves CF.
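The test-time aggregation idea is simple to sketch: take embeddings trained without any message passing and apply one normalized neighborhood aggregation only at inference. The symmetric normalization and residual connection below are my assumptions for illustration, not necessarily TAG-CF's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 50, 80, 16
U = rng.normal(size=(n_users, d))   # embeddings trained WITHOUT message passing
V = rng.normal(size=(n_items, d))
A = (rng.random((n_users, n_items)) < 0.1).astype(float)  # user-item interactions

# One symmetric-normalized aggregation applied only at inference time.
du = np.maximum(A.sum(axis=1, keepdims=True), 1.0)   # user degrees
di = np.maximum(A.sum(axis=0, keepdims=True), 1.0)   # item degrees
A_norm = A / np.sqrt(du) / np.sqrt(di)

U_agg = U + A_norm @ V              # users absorb neighboring item signals
V_agg = V + A_norm.T @ U            # and vice versa
scores = U_agg @ V_agg.T            # recommendation scores
```

Because this runs once at inference, it adds graph information to any pre-trained CF model without the cost of training-time message passing.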

14. VibeSpace: Automatic vector embedding creation for arbitrary domains and mapping between them using large language models

Ratings: 3, 3, 5, 1

https://openreview.net/forum?id=BxPqibGUPR

We present VibeSpace, a method for the fully unsupervised construction of interpretable embedding spaces applicable to arbitrary domain areas. By leveraging knowledge contained within large language models, our method automates otherwise costly data acquisition processes and assesses the similarity of entities, allowing for meaningful and interpretable positioning within vector spaces. Our approach is also capable of learning intelligent mappings between vector space representations of non-overlapping domains, allowing for a novel form of cross-domain similarity analysis. First, we demonstrate that our data collection methodology yields comprehensive and rich datasets across multiple domains, including songs, books, and movies. Second, we show that our method yields single-domain embedding spaces which are separable by various domain specific features. These representations provide a solid foundation upon which we can develop classifiers and initialise recommender systems, demonstrating our method's utility as a data-free solution to the cold-start problem. Further, these spaces can be interactively queried to obtain semantic information about different regions in embedding spaces. Lastly, we argue that by exploiting the unique capabilities of current state-of-the-art large language models, we produce cross-domain mappings which capture contextual relationships between heterogeneous entities which may not be attainable through traditional methods. The presented method facilitates the creation of embedding spaces for any domain, circumventing the need for collection and calibration of sensitive user data, while providing deeper insights and better interpretations of multi-domain data.

15. Unifying User Preferences and Critic Opinions: A Multi-View Cross-Domain Item-sharing Recommender System

Ratings: 5, 5, 5, 3

https://openreview.net/forum?id=Z7OWaSze0V

Traditional cross-domain recommender systems often assume user overlap and similar user behavior across domains. However, these presumptions may not always hold true in real-world situations. In this paper, we explore a less explored but practical scenario: cross-domain recommendation with distinct user groups, sharing only item-specific data. Specifically, we consider user and critic review scenarios. Critic reviews, typically from professional media outlets, provide expert and objective perspectives, while user reviews offer personalized insights based on individual experiences. The challenge lies in leveraging critic expertise to enhance personalized user recommendations without sharing user data. To tackle this, we propose a Multi-View Cross-domain Item-sharing Recommendation (MCIR) framework that synergizes user preferences with critic opinions. We develop separate embedding networks for users and critics. The user-rating network leverages a variational autoencoder to capture user scoring embeddings, while the user-review network uses pretrained text embeddings to obtain user commentary embeddings. In contrast, the critic network utilizes multi-task learning to derive insights from critic ratings and reviews. Further, we use Graph Convolutional Network layers to gather neighborhood information from the user-critic-item graph, and implement an attentive integration mechanism and cross-view contrastive learning mechanism to align embeddings across different views. Real-world dataset experiments validate the effectiveness of the proposed MCIR framework, demonstrating its superiority over many state-of-the-art methods.

16. GNN-based Reinforcement Learning Agent for Session-based Recommendation

Ratings: 3, 1, 1, 3

https://openreview.net/forum?id=Iv60x1iAvp

This paper focuses on session-based item recommendation and the challenges of using Reinforcement Learning (RL) in recommender systems. While traditional RL methods rely on one-hot encoded vectors as user state, they often fail to capture user-specific characteristics, which may provide misleading results. In contrast, recently, Graph Neural Networks (GNNs) have emerged as a promising technique for learning user-item representations effectively. However, GNNs prioritize static rating prediction, which does not fully capture the dynamic nature of session-based recommendations. To address these limitations, we propose a novel approach called GNN-RL-based Recommender System (GRRS), which combines both frameworks to provide a unique solution for session-based recommendation (code available at https://anonymous.4open.science/r/iclr24_gnn_rl/). We demonstrate that our method can leverage the strengths of both GNNs and RL while overcoming their respective shortcomings. Our experiments on several logged public datasets validate the efficacy of our approach over various SOTA algorithms. Additionally, we offer a solution to the offline training problem, which is often encountered by RL algorithms when employed on logged datasets, and which may be of independent interest.

17. Basis Function Encoding of Numerical Features in Factorization Machines for Improved Accuracy

Ratings: 3, 3, 5, 5, 5, 6, 5, 6

https://openreview.net/forum?id=HmKav4WZ9w

Factorization machine (FM) variants are widely used for large scale real-time content recommendation systems, since they offer an excellent balance between model accuracy and low computational costs for training and inference. These systems are trained on tabular data with both numerical and categorical columns. Incorporating numerical columns poses a challenge, and they are typically incorporated using a scalar transformation or binning, which can be either learned or chosen a-priori. In this work, we provide a systematic and theoretically-justified way to incorporate numerical features into FM variants by encoding them into a vector of function values for a set of functions of one's choice.

We view factorization machines as approximators of segmentized functions, namely, functions from a field's value to the real numbers, assuming the remaining fields are assigned some given constants, which we refer to as the segment. From this perspective, we show that our technique yields a model that learns segmentized functions of the numerical feature spanned by the set of functions of one's choice, namely, the spanning coefficients vary between segments. Hence, to improve model accuracy we advocate the use of functions known to have strong approximation power, and offer the B-Spline basis due to its well-known approximation power, availability in software libraries, and efficiency. Our technique preserves fast training and inference, and requires only a small modification of the computational graph of an FM model. Therefore, it is easy to incorporate into an existing system to improve its performance. Finally, we back our claims with a set of experiments that include a synthetic experiment, performance evaluation on several data-sets, and an A/B test on a real online advertising system which shows improved performance. The results can be reproduced with the code in the supplemental material.
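The encoding itself is straightforward: replace a scalar feature with its vector of basis-function values. A sketch using degree-1 B-splines (hat functions), which already form a partition of unity on the knot grid; the B-splines the paper advocates have higher degree but follow the same pattern:

```python
import numpy as np

def hat_basis(x, knots):
    """Encode a scalar feature as the vector of degree-1 B-spline (hat
    function) values on a uniform knot grid; at most two entries are
    nonzero and they sum to one (a partition of unity)."""
    x = np.clip(x, knots[0], knots[-1])
    step = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x - knots) / step)

knots = np.linspace(0.0, 1.0, 6)   # 6 basis functions on [0, 1]
enc = hat_basis(0.37, knots)       # nonzero only at knots 0.2 and 0.4
```

In an FM, each basis coordinate then carries its own latent vector, so the feature's effective embedding interpolates smoothly between knots (my reading of the construction, not a quote from the paper).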

18. MOESART: An Effective Sampling-based Router for Sparse Mixture of Experts

Ratings: 6, 5, 5, 3

https://openreview.net/forum?id=KTq2XSBNsa

The sparse Mixture-of-Experts (Sparse-MoE) is a promising framework for efficiently scaling up model capacity. This framework consists of a set of experts (subnetworks) and one or more routers. The routers activate only a small subset of the experts on a per-example basis, which can save on resources. Among the most widely used sparse routers are Top-k and its variants, which activate k experts for each example during training. While very effective at model scaling, these routers are prone to performance issues because of the discontinuous nature of the routing problem. Differentiable routers have been shown to mitigate the performance issues of Top-k, but these are not k-sparse during training, which limits their utility. To address this challenge, we propose MOESART: a novel k-sparse routing approach, which maintains k-sparsity during both training and inference. Unlike existing routers, MOESART aims at learning a good k-sparse approximation of the classical, softmax router. We achieve this through carefully designed sampling and expert weighting strategies. We compare MOESART with state-of-the-art MoE routers, through large-scale experiments on 14 datasets from various domains, including recommender systems, vision, and natural language processing. MOESART achieves up to 16% (relative) reduction in out-of-sample loss on standard image datasets, and up to 15% (relative) improvement in AUC on standard recommender systems, over popular k-sparse routers, e.g., Top-k, V-MoE, Expert Choice Router and X-MoE. Moreover, for distilling natural language processing models, MOESART can improve predictive performance by 0.5% (absolute) on average over the Top-k router across 7 GLUE and 2 SQuAD benchmarks.
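For reference, the classical Top-k router that MOESART approximates can be sketched as follows. This is a minimal numpy version; real implementations add load-balancing losses and expert capacity limits:

```python
import numpy as np

def topk_route(logits, k=2):
    """Classical Top-k routing: keep the k largest gate logits per example,
    softmax-normalize them, and zero out all other experts."""
    idx = np.argsort(logits, axis=-1)[:, -k:]            # indices of top-k experts
    chosen = np.take_along_axis(logits, idx, axis=-1)
    w = np.exp(chosen - chosen.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    gates = np.zeros_like(logits)
    np.put_along_axis(gates, idx, w, axis=-1)            # sparse gate weights
    return gates

logits = np.array([[2.0, 0.1, 1.0, -1.0]])   # one example, four experts
g = topk_route(logits)                       # only experts 0 and 2 are active
```

The hard cut at rank k is exactly the discontinuity the abstract refers to: an infinitesimal change in logits can swap which experts are active.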

19. On the Embedding Collapse When Scaling up Recommendation Models

Ratings: 3, 5, 5, 5

https://openreview.net/forum?id=0IaTFNJner

Recent advances in deep foundation models have led to a promising trend of developing large recommendation models to leverage vast amounts of available data. However, when we experiment with scaling up existing recommendation models, we observe that the enlarged models do not improve satisfactorily. In this context, we investigate the embedding layers of enlarged models and identify a phenomenon of embedding collapse, which ultimately hinders scalability, wherein the embedding matrix tends to reside in a low-dimensional subspace. Through empirical and theoretical analysis, we demonstrate that the feature interaction module specific to recommendation models has a two-sided effect. On the one hand, the interaction restricts embedding learning when interacting with collapsed embeddings, exacerbating the collapse issue. On the other hand, feature interaction is crucial in mitigating the fitting of spurious features, thereby improving scalability. Based on this analysis, we propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to capture diverse patterns and reduce collapse. Extensive experiments demonstrate that this proposed design provides consistent scalability for various recommendation models.
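Embedding collapse of this kind is easy to detect empirically: check how much spectral energy the embedding matrix concentrates in its top singular directions. A numpy sketch (the cutoff `top=4` and the synthetic matrices are illustrative choices, not the paper's diagnostic):

```python
import numpy as np

def spectral_mass(emb, top=4):
    """Fraction of spectral energy in the top singular directions; values
    near 1 indicate the embeddings live in a low-dimensional subspace."""
    s = np.linalg.svd(emb, compute_uv=False)
    return (s[:top] ** 2).sum() / (s ** 2).sum()

rng = np.random.default_rng(0)
healthy = rng.normal(size=(1000, 32))                              # full-rank
collapsed = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 32))  # rank 4

m_healthy = spectral_mass(healthy)
m_collapsed = spectral_mass(collapsed)
```

A collapsed table wastes most of its nominal dimensions, which is why simply enlarging the embedding size fails to help.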

20. Hyperbolic Embeddings in Sequential Self-Attention for Improved Next-Item Recommendations

Ratings: 3, 3, 3, 5

https://openreview.net/forum?id=0TZs6WOs16

In recent years, self-attentive sequential learning models have surpassed conventional collaborative filtering techniques in next-item recommendation tasks. However, Euclidean geometry utilized in these models may not be optimal for capturing a complex structure of the behavioral data. Building on recent advances in the application of hyperbolic geometry to collaborative filtering tasks, we propose a novel approach that leverages hyperbolic geometry in the sequential learning setting. Our approach involves transitioning the learned parameters to a Poincaré ball, which enables a linear predictor in a non-linear space. Our experimental results demonstrate that under certain conditions hyperbolic models may simultaneously improve recommendation quality and gain representational capacity. We identify several determining factors that affect the results, which include the ability of a loss function to preserve hyperbolic structure and the general compatibility of data with hyperbolic geometry. For the latter, we propose an empirical approach based on Gromov delta-hyperbolicity estimation that allows categorizing datasets as either compatible or not.
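
The Gromov delta-hyperbolicity check mentioned at the end can be sketched directly from the four-point definition on a pairwise distance matrix. An exhaustive O(n³) loop like the one below only works for small samples; the paper's estimator presumably subsamples points, and its exact procedure may differ:

```python
import numpy as np
from itertools import combinations

def delta_hyperbolicity(d, base=0):
    """Four-point Gromov delta w.r.t. a base point: exactly 0 for tree
    metrics, and larger for 'flatter' (more Euclidean-like) data."""
    def gp(x, y):   # Gromov product (x, y) at the base point
        return 0.5 * (d[base, x] + d[base, y] - d[x, y])
    delta = 0.0
    for x, y, z in combinations(range(d.shape[0]), 3):
        a, b, _ = sorted((gp(x, y), gp(x, z), gp(y, z)))
        delta = max(delta, b - a)   # middle minus smallest Gromov product
    return delta

# Star tree: center 0 joined to leaves 1..4; tree metrics are 0-hyperbolic.
star = np.full((5, 5), 2.0)
star[0, :] = star[:, 0] = 1.0
np.fill_diagonal(star, 0.0)
```

A small delta relative to the dataset's diameter suggests the data is compatible with a hyperbolic embedding space.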

21. Constraining Non-Negative Matrix Factorization to Improve Signature Learning

Ratings: 3, 6, 3

https://openreview.net/forum?id=AcGUW5655J

Collaborative filtering approaches are fundamental for learning meaningful low-dimensional representations when only association data is available. Among these methods, Non-negative Matrix Factorization (NMF) has gained prominence due to its capability to yield interpretable and meaningful low-dimensional representations. However, one significant challenge for NMF is the vast number of solutions for the same problem instance, making the selection of high-quality signatures a complex task. In response to this challenge, our work introduces a novel approach, Self-Matrix Factorization (SMF), which leverages NMF by incorporating constraints that preserve the relationships inherent in the original data. This is achieved by drawing inspiration from a distinct family of matrix decomposition methods, known as Self-Expressive Models (SEM). In our experimental analyses, conducted on two diverse benchmark datasets, our findings present a compelling narrative. SMF consistently delivers competitive or even superior performance when compared to NMF in predictive tasks. However, what truly sets SMF apart, as validated by our empirical results, is its remarkable ability to consistently generate significantly more meaningful object representations.
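
As a baseline for what SMF constrains, plain NMF with Lee–Seung multiplicative updates looks like the sketch below. The self-expressive term SMF adds (roughly, encouraging the factorization to respect relations of the form X ≈ XC) is only noted in a comment, since the abstract does not give its exact formulation:

```python
import numpy as np

def nmf(X, r, iters=300, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates. SMF, as described,
    would add a self-expressive constraint tying the factors to relations
    inherent in the original data; that extra term is omitted here."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))
    H = rng.random((r, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # update H, keep non-negative
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)   # update W, keep non-negative
    return W, H

rng = np.random.default_rng(1)
X = rng.random((8, 2)) @ rng.random((2, 6))   # non-negative, rank-2 data
W, H = nmf(X, r=2)
```

Multiplicative updates preserve non-negativity by construction, which is what makes the resulting factors interpretable as additive parts.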

22. Farzi Data: Autoregressive Data Distillation

Ratings: 6, 6, 6, 5

https://openreview.net/forum?id=H9DYMIpz9c

We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences — Farzi Data — which are optimized to maintain (if not improve) model performance compared to training on the full dataset. Under the hood, Farzi conducts memory-efficient data distillation by (i) deriving efficient reverse-mode differentiation of the Adam optimizer by leveraging Hessian-Vector Products; and (ii) factorizing the high-dimensional discrete event-space into a latent-space which provably promotes implicit regularization. Empirically, for sequential recommendation and language modeling tasks, we are able to achieve 98 − 120% of downstream full-data performance when training state-of-the-art models on Farzi Data of size as little as 0.1% of the original dataset. Notably, being able to train better models with significantly less data sheds light on the design of future large auto-regressive models, and opens up new opportunities to further scale up model and data sizes.

23. Factual and Personalized Recommendation Language Modeling with Reinforcement Learning

Ratings: 3, 5, 6, 5

https://openreview.net/forum?id=fQxLgR9gx7

Recommender systems (RSs) play a central role in connecting users to content, products and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs interact with users in natural language. In this work, we develop a comPelling, Precise, Personalized, Preference-relevant language model (P^4LM) that recommends items to users in a way that better explains item characteristics and their relevance to a user's preferences. To do this, P^4LM uses the embedding space representation of a user's preferences constructed by a traditional RS to generate compelling responses that are factually-grounded and relevant w.r.t. those preferences. Moreover, we develop a joint reward function that measures precision, appeal, and personalization, which we use as AI-based feedback for reinforcement learning-based language modeling. Using MovieLens data, we show that P^4LM can deliver compelling, personalized movie narratives to users.

24. ConvFormer: Revisiting Token-mixers for Sequential User Modeling

Ratings: 1, 6, 5

https://openreview.net/forum?id=Gny0PVtKz2

Sequential user modeling is essential for building recommender systems, aiming to predict users' subsequent preferences based on their historical behavior. Despite the widespread success of the Transformer architecture in various domains, we observe that its self-attentive token mixer is outperformed by simpler strategies in the realm of sequential user modeling. This observation motivates our study, which aims to revisit and optimize the design of token mixers for this specific application. We start by examining the core building blocks of the self-attentive token mixer, identifying three empirically-validated criteria essential for designing effective token mixers in sequential user models. To validate the utility of these criteria, we develop ConvFormer, a streamlined modification to the Transformer architecture that satisfies the proposed criteria simultaneously. We also present an acceleration technique to handle the computational cost of processing long sequences. Experimental results on four public datasets reveal that even a simple model, when designed in accordance with the proposed criteria, can surpass various complex and delicate solutions, validating the efficacy of the proposed criteria.
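
One example of the "simpler strategies" that can replace self-attention in sequential user models is a causal depthwise-convolution token mixer. The sketch below is a generic mixer of that kind, not ConvFormer's actual block design:

```python
import numpy as np

def causal_depthwise_mix(X, kernel):
    """Causal depthwise-convolution token mixer: each feature channel is
    mixed across time by its own small 1-D filter, and only past positions
    are visible (left padding keeps the operation causal)."""
    T, D = X.shape
    K = kernel.shape[0]                        # kernel shape: (K, D)
    Xp = np.vstack([np.zeros((K - 1, D)), X])  # left-pad for causality
    return np.stack([(Xp[t:t + K] * kernel).sum(axis=0) for t in range(T)])
```

Unlike self-attention, the mixing pattern here is fixed and local, which keeps the cost linear in sequence length.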

25. Talk like a Graph: Encoding Graphs for Large Language Models

Ratings: 6, 6, 6, 6

https://openreview.net/forum?id=IuXR1CCrSi

Graphs are a powerful tool for representing and analyzing complex relationships in real-world applications such as social networks, recommender systems, and computational finance. Reasoning on graphs is essential for drawing inferences about the relationships between entities in a complex system, and to identify hidden patterns and trends. Despite the remarkable progress in automated reasoning with natural text, reasoning on graphs with large language models (LLMs) remains an understudied problem. In this work, we perform the first comprehensive study of encoding graph-structured data as text for consumption by LLMs. We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered. These novel results provide valuable insight on strategies for encoding graphs as text. Using these insights we illustrate how the correct choice of encoders can boost performance on graph reasoning tasks inside LLMs by 4.8% to 61.8%, depending on the task.
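
A minimal example of encoding graph-structured data as text for an LLM prompt: the template below is one plausible encoding; the paper compares many such choices, and its exact templates are not reproduced here:

```python
def encode_graph_as_text(num_nodes, edges, names=None):
    """Render a graph as plain text for an LLM prompt. This named-entity
    style ('Alice is connected to Bob') is one of several encoding
    families such a study can compare; the phrasing is illustrative."""
    names = names or {i: f"node {i}" for i in range(num_nodes)}
    lines = [f"This graph has {num_nodes} nodes and {len(edges)} edges."]
    lines += [f"{names[u]} is connected to {names[v]}." for u, v in edges]
    return "\n".join(lines)

prompt = encode_graph_as_text(3, [(0, 1), (1, 2)],
                              names={0: "Alice", 1: "Bob", 2: "Carol"})
```

The paper's finding is precisely that choices at this level — integer IDs vs. names, edge lists vs. adjacency phrasing — materially change LLM reasoning accuracy.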

26. Weight Uncertainty in Individual Treatment Effect

Ratings: 5, 1, 3, 5, 3

https://openreview.net/forum?id=j2AWbl4L3K

The estimation of individual treatment effects (ITE) has recently gained significant attention from both the research and industrial communities due to its potential applications in various fields such as healthcare, economics, and education. However, the sparsity of observational data often leads to a lack of robustness and over-fitting in most existing methods. To address this issue, this paper investigates the benefits of incorporating uncertainty modeling in the process of optimizing parameters for robust ITE estimation. Specifically, we derive an informative generalization bound that connects to Bayesian inference and propose a variational bound in closed form to learn a probability distribution on the weights of a hypothesis and representation function. Through experiments on one synthetic dataset and two benchmark datasets, we demonstrate the effectiveness of our proposed model in comparison to state-of-the-art methods. Moreover, we conduct experiments on a real-world dataset in recommender scenarios to verify the benefits of uncertainty in causal inference. The results of our experiments provide evidence of the practicality of our model, which aligns with our initial expectations.

27. Explaining recommendation systems through contrapositive perturbations

Ratings: 5, 5, 3

https://openreview.net/forum?id=mavWQw7DnC

Recommender systems are widely used to help users discover new items online. A popular method for recommendations is factorization models, which predict a user's preference for an item based on latent factors derived from their interaction history. However, explaining why a particular item was recommended to a user is challenging, and current approaches such as counterfactual explanations can be computationally expensive. In this paper, we propose a new approach called contrapositive explanations that leverages a different logical structure from counterfactual explanations. We show how contrapositive explanations can be used to explain recommendation systems by finding the minimum change that would have resulted in a different recommendation. Specifically, we present a methodology that focuses on finding an explanation in the form of "Because the user interacted with item j, we recommend item i to the user," which is easier to compute and find compared to traditional counterfactual approaches, which aim at "Because the user did not interact with item j, we did not recommend item i to the user." We evaluate our approach on several real-world datasets and show that it provides effective and efficient explanations compared to other existing methods.
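
For a factorization model, finding the interacted item j behind the statement "because the user interacted with item j, we recommend item i" can be approximated by latent-factor alignment. The heuristic below is a sketch in the spirit of the abstract, not the paper's actual search procedure:

```python
import numpy as np

def contrapositive_explanation(history, item_factors, rec_item):
    """Pick the interacted item j whose latent factors align best with the
    recommended item i, supporting 'because you interacted with j, we
    recommend i'. A simple alignment heuristic, not an exact method."""
    i_vec = item_factors[rec_item]
    return max(history, key=lambda j: float(item_factors[j] @ i_vec))

item_factors = np.array([
    [1.0, 0.0],   # item 0: the recommended item
    [0.9, 0.1],   # item 1: closely aligned with item 0
    [0.0, 1.0],   # item 2: orthogonal to item 0
])
```

Note how this only scans items the user *did* interact with, which is why it avoids the cost of counterfactual searches over the full item space.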

28. Benchmarks for Reinforcement Learning with Biased Offline Data and Imperfect Simulators

Ratings: 6, 3, 3, 3

https://openreview.net/forum?id=uwjDyJfe3m

In many reinforcement learning (RL) applications one cannot easily let the agent act in the world; this is true for autonomous vehicles, healthcare applications, and even some recommender systems, to name a few examples. Offline RL provides a way to train agents without exploration, but is often faced with biases due to data distribution shifts, limited exploration, and incomplete representation of the environment. To address these issues, practical applications have tried to combine simulators with grounded offline data, using so-called hybrid methods. However, constructing a reliable simulator is in itself often challenging due to intricate system complexities as well as missing or incomplete information. In this work, we outline four principal challenges for combining offline data with imperfect simulators in RL: simulator modeling error, partial observability, state and action discrepancies, and hidden confounding. To help drive the RL community to pursue these problems, we construct ''Benchmarks for Mechanistic Offline Reinforcement Learning'' (B4MRL), which provide dataset-simulator benchmarks for the aforementioned challenges. Finally, we propose a new approach to combine an imperfect simulator with biased data and demonstrate its efficiency. Our results suggest the key necessity of such benchmarks for future research.

29. Evidential Conservative Q-Learning for Dynamic Recommendations

Ratings: 5, 5, 3, 5

https://openreview.net/forum?id=QwNj5TP9gm

Reinforcement learning (RL) has been leveraged in recommender systems (RS) to capture users' evolving preferences and continuously improve the quality of recommendations. In this paper, we propose a novel evidential conservative Q-learning framework (ECQL) that learns an effective and conservative recommendation policy by integrating evidence-based uncertainty and conservative learning. ECQL conducts evidence-aware explorations to discover items that lie beyond current observations but reflect users' long-term interests. Also, it provides an uncertainty-aware conservative view on policy evaluation to discourage deviating too much from users' current interests. Two central components of ECQL include a uniquely designed sequential state encoder and a novel conservative evidential-actor-critic (CEAC) module. The former generates the current state of the environment by aggregating historical information and a sliding window that contains the current user interactions as well as newly recommended items from RL exploration that may represent future interests. The latter performs an evidence-based rating prediction by maximizing the conservative evidential Q-value and leverages a ranking score to explore the item space for a more diverse and valuable recommendation. Experiments on multiple real-world dynamic datasets demonstrate the state-of-the-art performance of ECQL and its capability to capture users' long-term interests.
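
The "conservative view on policy evaluation" builds on CQL-style penalties. Below is a minimal sketch of the standard conservative term only; ECQL's evidential uncertainty weighting is not detailed in the abstract and is omitted:

```python
import numpy as np

def conservative_penalty(q_values, data_action, alpha=1.0):
    """CQL-style conservatism: the log-sum-exp term pushes Q-values down
    across all actions, while the Q-value of the action observed in the
    data is pushed back up, so the learned policy stays close to logged
    user behavior. Always non-negative for alpha > 0."""
    m = q_values.max()
    lse = m + np.log(np.exp(q_values - m).sum())   # numerically stable
    return alpha * (lse - q_values[data_action])

q = np.array([1.0, 2.0, 3.0])
```

Minimizing this alongside the usual Bellman loss penalizes overestimated Q-values on out-of-distribution recommendations.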

30. Unlearning the Unwanted Data from a Personalized Recommendation Model

Ratings: 3, 5, 3, 3

https://openreview.net/forum?id=3Ok7ccvtf3

Recommender Systems (RS) learn user behavior by monitoring their activities on the online platform. In a few scenarios, users consume the content but do not want it to drive their recommendations because: a) they consumed the content by mistake, and those interactions have been utilized in personalizing the model; b) the content was consumed by someone else on their behalf; c) data acquisition was faulty because of machine failure; d) the user has lost interest in the service; etc. For any of these reasons, the user wants the data that was used for generating the recommendations to be unlearned by the RS. The constraints on this unlearning are: 1) the user's other data should be intact; 2) the personalized experience should not be affected; and 3) we cannot afford training from scratch. To solve the stated problem, a few unlearning strategies have already been proposed, but unlearning matrix factorization-based models is not much explored. In this work, we propose a solution for unlearning from the faulty recommendation model (m1) by diluting the impact of unwanted data. To do so, we first correct the unwanted data and prepare an intermediate tiny model m2, referred to as the rescue model. Further, we apply the convolution fusion function (CFF) on the latent features acquired using m1 and m2. The performance of the proposed method is evaluated on multiple public datasets. We observe that the proposed method outperforms SOTA benchmark models on recommendation tasks.

31. AFDGCF: Adaptive Feature De-correlation Graph Collaborative Filtering for Recommendations

Ratings: 8, 5, 5, 8

https://openreview.net/forum?id=53kW6e1uNN

Collaborative filtering methods based on graph neural networks (GNNs) have witnessed significant success in recommender systems (RS), capitalizing on their ability to capture collaborative signals within intricate user-item relationships via message-passing mechanisms. However, these GNN-based RS inadvertently introduce a linear correlation between user and item embeddings, contradicting the goal of providing personalized recommendations. While existing research predominantly ascribes this flaw to the over-smoothing problem, this paper underscores the critical, often overlooked role of the over-correlation issue in diminishing the effectiveness of GNN representations and subsequent recommendation performance. The unclear relationship between over-correlation and over-smoothing in RS, coupled with the difficulty of adaptively minimizing the impact of over-correlation while preserving collaborative filtering signals, makes this problem quite challenging. To this end, this paper aims to address the aforementioned gap by undertaking a comprehensive study of the over-correlation issue in graph collaborative filtering models. Empirical evidence substantiates the widespread prevalence of over-correlation in these models. Furthermore, a theoretical analysis establishes a pivotal connection between the over-correlation and over-smoothing predicaments. Leveraging these insights, we introduce the Adaptive Feature De-correlation Graph Collaborative Filtering (AFDGCF) Framework, which dynamically applies correlation penalties to the feature dimensions of the representation matrix, effectively alleviating both over-correlation and over-smoothing challenges. The efficacy of the proposed framework is corroborated through extensive experiments conducted with four different graph collaborative filtering models across four publicly available datasets, demonstrating the superiority of AFDGCF in enhancing the performance landscape of graph collaborative filtering models.
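
The building block AFDGCF applies adaptively — a correlation penalty on feature dimensions — can be sketched as the squared off-diagonal mass of the feature correlation matrix. The adaptive weighting scheme that gives the framework its name is not described in the abstract and is omitted here:

```python
import numpy as np

def decorrelation_penalty(E):
    """Sum of squared off-diagonal entries of the feature-feature
    correlation matrix of a representation matrix E (rows = nodes,
    columns = feature dimensions): 0 for perfectly de-correlated
    dimensions, large under over-correlation."""
    Z = (E - E.mean(axis=0)) / (E.std(axis=0) + 1e-12)  # standardize dims
    C = (Z.T @ Z) / len(E)                              # correlation matrix
    off = C - np.diag(np.diag(C))
    return float((off ** 2).sum())

rng = np.random.default_rng(0)
indep = rng.normal(size=(500, 4))                # nearly uncorrelated dims
redundant = np.column_stack([indep[:, 0]] * 4)   # fully correlated dims
```

Adding such a term to the training loss discourages the representation dimensions from collapsing onto each other as message-passing layers are stacked.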
