Machine Learning arXiv Digest [12.9]

2021-12-09 20:22:43

cs.LG: 76 papers today

Graph (graph learning | graph neural networks | graph optimization, etc.) (5 papers)

【1】 A graph representation based on fluid diffusion model for multimodal data analysis: theoretical aspects and enhanced community detection Link: https://arxiv.org/abs/2112.04388

作者:Andrea Marinoni, Christian Jutten, Mark Girolami Notes: 26 pages, 17 figures Abstract: Representing data by means of graph structures is one of the most effective approaches to extracting information in several data analysis applications. This is especially true when multimodal datasets are investigated, as records collected by means of diverse sensing strategies are taken into account and explored. Nevertheless, classic graph signal processing is based on a model for information propagation configured according to a heat diffusion mechanism. This system imposes several constraints and assumptions on the data properties that might not be valid for multimodal data analysis, especially when large-scale datasets collected from heterogeneous sources are considered, so that the accuracy and robustness of the outcomes might be severely jeopardized. In this paper, we introduce a novel model for graph definition based on fluid diffusion. The proposed approach improves the ability of graph-based data analysis to take into account several issues of modern data analysis in operational scenarios, so as to provide a platform for precise, versatile, and efficient understanding of the phenomena underlying the records under exam, and to fully exploit the potential provided by the diversity of the records in obtaining a thorough characterization of the data and their significance. In this work, we focus our attention on using this fluid diffusion model to drive a community detection scheme, i.e., to divide multimodal datasets into many groups according to similarity among nodes in an unsupervised fashion. Experimental results achieved by testing real multimodal datasets in diverse application scenarios show that our method is able to strongly outperform state-of-the-art schemes for community detection in multimodal data analysis.

【2】 Improving the Training of Graph Neural Networks with Consistency Regularization Link: https://arxiv.org/abs/2112.04319

作者:Chenhui Zhang, Yufei He, Yukuo Cen, Zhenyu Hou, Jie Tang Abstract: Graph neural networks (GNNs) have achieved notable success in the semi-supervised learning scenario. The message passing mechanism in graph neural networks helps unlabeled nodes gather supervision signals from their labeled neighbors. In this work, we investigate how consistency regularization, a widely adopted semi-supervised learning method, can help improve the performance of graph neural networks. We revisit two methods of consistency regularization for graph neural networks: simple consistency regularization (SCR) and mean-teacher consistency regularization (MCR). We combine the consistency regularization methods with two state-of-the-art GNNs and conduct experiments on the ogbn-products dataset. With consistency regularization, the performance of state-of-the-art GNNs can be improved by 0.3% on the ogbn-products dataset of the Open Graph Benchmark (OGB), both with and without external data.
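As a rough illustration of the SCR idea, a consistency term can be added on top of the usual supervised loss by comparing two stochastic forward passes of the same GNN (with dropout or edge dropping active). The `model(graph, x)` interface and the KL form of the penalty are assumptions for illustration, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def scr_loss(model, graph, x, labeled_idx, labels, lam=0.5):
    """Sketch of simple consistency regularization (SCR) for a GNN."""
    model.train()
    logits_a = model(graph, x)          # first stochastic forward pass
    logits_b = model(graph, x)          # second stochastic forward pass
    # Supervised term on labeled nodes only.
    ce = F.cross_entropy(logits_a[labeled_idx], labels)
    # Consistency term: the two predictive distributions should agree on all nodes.
    cons = F.kl_div(F.log_softmax(logits_a, dim=-1),
                    F.softmax(logits_b.detach(), dim=-1),
                    reduction="batchmean")
    return ce + lam * cons
```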

【3】 On the Use of Unrealistic Predictions in Hundreds of Papers Evaluating Graph Representations Link: https://arxiv.org/abs/2112.04274

作者:Li-Chung Lin, Cheng-Hung Liu, Chih-Ming Chen, Kai-Chin Hsu, I-Feng Wu, Ming-Feng Tsai, Chih-Jen Lin Notes: Accepted by AAAI 2022 Abstract: Prediction using the ground truth sounds like an oxymoron in machine learning. However, such an unrealistic setting was used in hundreds, if not thousands, of papers in the area of finding graph representations. To evaluate the multi-label problem of node classification using the obtained representations, many works assume in the prediction stage that the number of labels of each test instance is known. In practice such ground-truth information is rarely available, but we point out that this inappropriate setting is now ubiquitous in the research area. We investigate in detail why the situation occurs. Our analysis indicates that with unrealistic information, the performance is likely over-estimated. To see why suitable predictions were not used, we identify difficulties in applying some multi-label techniques. For use in future studies, we propose simple and effective settings without using practically unknown information. Finally, we take this chance to conduct a fair and serious comparison of major graph-representation learning methods on multi-label node classification.

【4】 Shortest Paths in Graphs with Matrix-Valued Edges: Concepts, Algorithm and Application to 3D Multi-Shape Analysis Link: https://arxiv.org/abs/2112.04165

作者:Viktoria Ehm, Daniel Cremers, Florian Bernard Notes: published at 3DV Abstract: Finding shortest paths in a graph is relevant for numerous problems in computer vision and graphics, including image segmentation, shape matching, and the computation of geodesic distances on discrete surfaces. Traditionally, the concept of a shortest path is considered for graphs with scalar edge weights, which makes it possible to compute the length of a path by adding up the individual edge weights. Yet, graphs with scalar edge weights are severely limited in their expressivity, since oftentimes edges are used to encode significantly more complex interrelations. In this work we compensate for this modelling limitation and introduce the novel graph-theoretic concept of a shortest path in a graph with matrix-valued edges. To this end, we define a meaningful way of quantifying the path length for matrix-valued edges, and we propose a simple yet effective algorithm to compute the respective shortest path. While our formalism is universal and thus applicable to a wide range of settings in vision, graphics and beyond, we focus on demonstrating its merits in the context of 3D multi-shape analysis.
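The abstract does not spell out the paper's path-length definition, but the flavor of the problem can be sketched with a Dijkstra-style search in which edges carry matrices and a scalarization turns the accumulated matrix into a comparable cost. Everything below (the summation accumulator, the spectral-norm cost, the greedy search) is an illustrative assumption; with a non-additive cost, optimal substructure is not guaranteed, so this is only a heuristic sketch, not the paper's algorithm:

```python
import heapq
import numpy as np

def matrix_dijkstra(edges, source, cost=lambda M: np.linalg.norm(M, 2)):
    """Greedy shortest-path search over matrix-valued edges (illustrative).

    edges: dict u -> list of (v, M_uv) pairs, where M_uv is a (d, d) ndarray.
    A path accumulates the sum of its edge matrices; `cost` scalarizes the
    accumulated matrix (spectral norm here) so candidate paths can be compared.
    """
    d = edges[source][0][1].shape[0]
    best = {source: 0.0}
    acc = {source: np.zeros((d, d))}
    pq = [(0.0, source)]
    while pq:
        c, u = heapq.heappop(pq)
        if c > best.get(u, float("inf")):
            continue                        # stale queue entry
        for v, M in edges.get(u, []):
            A = acc[u] + M                  # matrix-valued path "length"
            cv = cost(A)                    # scalar comparison value
            if cv < best.get(v, float("inf")):
                best[v], acc[v] = cv, A
                heapq.heappush(pq, (cv, v))
    return best                             # node -> scalarized path length
```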

【5】 Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks Link: https://arxiv.org/abs/2112.03968

作者:Pascal Mattia Esser, Leena Chennuru Vankadara, Debarghya Ghoshdastidar Notes: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Abstract: In recent years, several results in the supervised learning setting suggested that classical statistical learning-theoretic measures, such as VC dimension, do not adequately explain the performance of deep learning models, which prompted a slew of work in the infinite-width and iteration regimes. However, there is little theoretical explanation for the success of neural networks beyond the supervised setting. In this paper we argue that, under some distributional assumptions, classical learning-theoretic measures can sufficiently explain generalization for graph neural networks in the transductive setting. In particular, we provide a rigorous analysis of the performance of neural networks in the context of transductive inference, specifically by analysing the generalisation properties of graph convolutional networks for the problem of node classification. While VC dimension does result in trivial generalisation error bounds in this setting as well, we show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for stochastic block models. We further use the generalisation error bounds based on transductive Rademacher complexity to demonstrate the role of graph convolutions and network architectures in achieving smaller generalisation error, and provide insights into when the graph structure can help in learning. The findings of this paper could renew interest in studying generalisation in neural networks in terms of learning-theoretic measures, albeit in specific problems.
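For background, the transductive Rademacher complexity used in such bounds is commonly defined following El-Yaniv and Pechyony; this is the standard definition over m labeled and u unlabeled points, stated here for reference (the paper's specific bound is not reproduced):

```latex
% Transductive Rademacher complexity of a set of output vectors
% \mathcal{V} \subseteq \mathbb{R}^{m+u}.
\mathfrak{R}_{m+u}(\mathcal{V})
  = \left(\frac{1}{m} + \frac{1}{u}\right)
    \mathbb{E}_{\boldsymbol{\sigma}}
    \left[\, \sup_{\mathbf{v} \in \mathcal{V}} \boldsymbol{\sigma}^{\top} \mathbf{v} \,\right],
\qquad
\sigma_i =
\begin{cases}
  +1 & \text{with probability } p,\\
  -1 & \text{with probability } p,\\
  \;0 & \text{with probability } 1 - 2p,
\end{cases}
\quad p = \frac{mu}{(m+u)^2}.
```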

Transformer (1 paper)

【1】 Relating transformers to models and neural representations of the hippocampal formation Link: https://arxiv.org/abs/2112.04035

作者:James C. R. Whittington, Joseph Warren, Timothy E. J. Behrens Abstract: Many deep neural network architectures loosely based on brain networks have recently been shown to replicate neural firing patterns observed in the brain. One of the most exciting and promising novel architectures, the Transformer neural network, was developed without the brain in mind. In this work, we show that transformers, when equipped with recurrent position encodings, replicate the precisely tuned spatial representations of the hippocampal formation, most notably place and grid cells. Furthermore, we show that this result is no surprise since it is closely related to current hippocampal models from neuroscience. We additionally show that the transformer version offers dramatic performance gains over the neuroscience version. This work continues to bind computations of artificial and brain networks, offers a novel understanding of the hippocampal-cortical interaction, and suggests how wider cortical areas may perform complex tasks beyond current neuroscience models, such as language comprehension.

GAN | Adversarial | Attacks | Generation (2 papers)

【1】 Does Structure Matter? Leveraging Data-to-Text Generation for Answering Complex Information Needs Link: https://arxiv.org/abs/2112.04344

作者:Hanane Djeddal, Thomas Gerald, Laure Soulier, Karen Pinel-Sauvagnat, Lynda Tamine Notes: 8 pages, 1 figure, ECIR 2022 short paper Abstract: In this work, our aim is to provide a structured answer in natural language to a complex information need. In particular, we envision using generative models from the perspective of data-to-text generation. We propose the use of a content selection and planning pipeline which aims at structuring the answer by generating intermediate plans. The experimental evaluation is performed using the TREC Complex Answer Retrieval (CAR) dataset. We evaluate both the generated answer and its corresponding structure and show the effectiveness of planning-based models in comparison to a text-to-text model.

【2】 Generative Adversarial Network (GAN) and Enhanced Root Mean Square Error (ERMSE): Deep Learning for Stock Price Movement Prediction Link: https://arxiv.org/abs/2112.03946

作者:Ashish Kumar, Abeer Alsadoon, P. W. C. Prasad, Salma Abdullah, Tarik A. Rashid, Duong Thu Hang Pham, Tran Quoc Vinh Nguyen Notes: 18 pages. Multimed Tools Appl, 2021 Abstract: The prediction of stock price movement direction is significant in financial circles and academia. Stock prices contain complex, incomplete, and fuzzy information, which makes predicting their development trend an extremely difficult task. Predicting and analysing financial data is a nonlinear, time-dependent problem. With rapid developments in machine learning and deep learning, this task can be performed more effectively by a purposely designed network. This paper aims to improve prediction accuracy and minimize forecasting error loss through a deep learning architecture based on Generative Adversarial Networks. We propose a generic model consisting of a Phase-space Reconstruction (PSR) method for reconstructing price series and a Generative Adversarial Network (GAN), a combination of two neural networks: a Long Short-Term Memory (LSTM) network as the generative model and a Convolutional Neural Network (CNN) as the discriminative model, trained adversarially to forecast the stock market. The LSTM generates new instances based on historical basic indicator information, and the CNN then estimates whether the data is predicted by the LSTM or is real. We found that the GAN performed well on the enhanced root mean square error compared to LSTM, as it was 4.35% more accurate in predicting the direction and reduced processing time and RMSE by 78 seconds and 0.029, respectively. The proposed system concentrates on minimizing the root mean square error and processing time while improving direction prediction accuracy, and provides better results on the accuracy of the stock index.
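A minimal sketch of the described adversarial setup: an LSTM generator regresses a window of future prices from historical indicators, and a 1-D CNN discriminator judges real versus generated windows. Layer sizes, the prediction horizon, and the plain BCE objective are placeholder assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """LSTM generator: maps a window of historical indicators to future prices."""
    def __init__(self, n_features, hidden=64, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                    # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])         # (batch, horizon)

class Discriminator(nn.Module):
    """1-D CNN discriminator: scores whether a price window is real or generated."""
    def __init__(self, horizon=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.LeakyReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.LeakyReLU(),
            nn.Flatten(), nn.Linear(32 * horizon, 1))

    def forward(self, seq):                  # seq: (batch, horizon)
        return self.net(seq.unsqueeze(1))    # logits, shape (batch, 1)

bce = nn.BCEWithLogitsLoss()

def adversarial_step(G, D, opt_g, opt_d, x_hist, y_real):
    y_fake = G(x_hist)
    ones, zeros = torch.ones(len(y_real), 1), torch.zeros(len(y_real), 1)
    # Discriminator: real windows -> 1, generated windows -> 0.
    d_loss = bce(D(y_real), ones) + bce(D(y_fake.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: try to fool the discriminator.
    g_loss = bce(D(y_fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```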

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (9 papers)

【1】 Exploring Temporal Granularity in Self-Supervised Video Representation Learning Link: https://arxiv.org/abs/2112.04480

作者:Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui Abstract: This work presents a self-supervised learning framework named TeG to explore Temporal Granularity in learning video representations. In TeG, we sample a long clip from a video and a short clip that lies inside the long clip. We then extract their dense temporal embeddings. The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips. Our study reveals the impact of temporal granularity with three major findings. 1) Different video tasks may require features of different temporal granularities. 2) Intriguingly, some tasks that are widely considered to require temporal awareness can actually be well addressed by temporally persistent features. 3) The flexibility of TeG gives rise to state-of-the-art results on 8 video benchmarks, outperforming supervised pre-training in most cases.
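A toy rendering of the two TeG objectives on one (long clip, short clip) pair, assuming simple cosine-similarity forms; the paper's exact losses may differ:

```python
import torch
import torch.nn.functional as F

def teg_loss(long_emb, short_emb, long_global, short_global, start):
    """Combined fine-grained + persistent objective (illustrative sketch).

    long_emb:  (T_long, D) dense temporal embeddings of the long clip
    short_emb: (T_short, D) dense temporal embeddings of the short clip
    start:     frame index where the short clip begins inside the long clip
    """
    # Fine-grained term: corresponding timesteps of the two clips should agree.
    matched = long_emb[start:start + short_emb.shape[0]]
    fine = -F.cosine_similarity(matched, short_emb, dim=-1).mean()
    # Persistent term: pull together the clips' global embeddings.
    persistent = -F.cosine_similarity(long_global, short_global, dim=0)
    return fine + persistent
```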

【2】 Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features Link: https://arxiv.org/abs/2112.04424

作者:Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida Abstract: Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data. Recently, self-supervised learning of speech representation has been shown to produce useful linguistic units without using transcripts, which can be directly passed to a VC model. In this paper, we show that high-quality audio samples can be achieved by using a length resampling decoder, which enables the VC model to work in conjunction with different linguistic feature extractors and vocoders without requiring them to operate on the same sequence length. We show that our method can outperform many baselines on the VCTK dataset. Without modifying the architecture, we further demonstrate that a) using pairs of different audio segments from the same speaker, b) adding a cycle consistency loss, and c) adding a speaker classification loss can help to learn a better speaker embedding. Our model trained on LibriTTS using these techniques achieves the best performance, producing audio samples transferred well to the target speaker's voice, while preserving linguistic content that is comparable with actual human utterances in terms of Character Error Rate.

【3】 Radar Occupancy Prediction with Lidar Supervision while Preserving Long-Range Sensing and Penetrating Capabilities Link: https://arxiv.org/abs/2112.04282

作者:Pou-Chun Kung, Chieh-Chih Wang, Wen-Chieh Lin Abstract: Radar shows great potential for autonomous driving by accomplishing long-range sensing under diverse weather conditions. But radar is also a particularly challenging sensing modality due to radar noise. Recent works have made enormous progress in classifying free and occupied spaces in radar images by leveraging lidar label supervision. However, there are still several unsolved issues. Firstly, the sensing distance of the results is limited by the sensing range of lidar. Secondly, the performance of the results is degraded by lidar due to the physical sensing discrepancies between the two sensors. For example, some objects visible to lidar are invisible to radar, and some objects occluded in lidar scans are visible in radar images because of the radar's penetrating capability. These sensing differences cause false positives and degeneration of the penetrating capability, respectively. In this paper, we propose training data preprocessing and polar sliding window inference to solve these issues. The data preprocessing aims to reduce the effect caused by radar-invisible measurements in lidar scans. The polar sliding window inference aims to solve the limited sensing range issue by applying a near-range trained network to the long-range region. Instead of using the common Cartesian representation, we propose to use a polar representation to reduce the shape dissimilarity between long-range and near-range data. We find that extending a near-range trained network to long-range region inference in the polar space has 4.2 times better IoU than in Cartesian space. Besides, the polar sliding window inference can preserve the radar penetrating capability by changing the viewpoint of the inference region, which makes some occluded measurements seem non-occluded for a pretrained network.

【4】 Self-Supervised Models are Continual Learners Link: https://arxiv.org/abs/2112.04215

作者:Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal Abstract: Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However, their efficacy is catastrophically reduced in a Continual Learning (CL) scenario where data is presented to the model sequentially. In this paper, we show that self-supervised loss functions can be seamlessly converted into distillation mechanisms for CL by adding a predictor network that maps the current state of the representations to their past state. This enables us to devise a framework for Continual self-supervised visual representation Learning that (i) significantly improves the quality of the learned representations, (ii) is compatible with several state-of-the-art self-supervised objectives, and (iii) needs little to no hyperparameter tuning. We demonstrate the effectiveness of our approach empirically by training six popular self-supervised models in various CL settings.
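The core mechanism can be sketched in a few lines: a predictor maps current representations onto a frozen snapshot of the encoder from before the current task, turning the self-supervised similarity loss into a distillation term. The cosine form and the function names are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def distillation_loss(current_encoder, frozen_past_encoder, predictor, x):
    """Map current representations to their past state and match them."""
    z_now = current_encoder(x)                  # representation on the new task
    with torch.no_grad():
        z_past = frozen_past_encoder(x)         # snapshot from before this task
    p = predictor(z_now)                        # current -> past representation space
    return -F.cosine_similarity(p, z_past, dim=-1).mean()
```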

【5】 Learning music audio representations via weak language supervision Link: https://arxiv.org/abs/2112.04214

作者:Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas Notes: 5 pages, 5 figures Abstract: Audio representations for music information retrieval are typically learned via supervised learning in a task-specific fashion. Although effective at producing state-of-the-art results, this scheme lacks flexibility with respect to the range of applications a model can have and requires extensively annotated datasets. In this work, we pose the question of whether it may be possible to exploit weakly aligned text as the only supervisory signal to learn general-purpose music audio representations. To address this question, we design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks. Weak supervision is provided in the form of noisy natural language descriptions conveying the overall musical content of the track. After pre-training, we transfer the audio backbone of the model to a set of music audio classification and regression tasks. We demonstrate the usefulness of our approach by comparing the performance of audio representations produced by the same audio backbone with different training strategies and show that our pre-training method consistently achieves comparable or higher scores on all tasks and datasets considered. Our experiments also confirm that MuLaP effectively leverages audio-caption pairs to learn representations that are competitive with audio-only and cross-modal self-supervised methods in the literature.

【6】 Model-Value Inconsistency as a Signal for Epistemic Uncertainty Link: https://arxiv.org/abs/2112.04153

作者:Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero Notes: The first three authors contributed equally Abstract: Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent's epistemic uncertainty; we term this signal model-value inconsistency, or self-inconsistency for short. Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a model.
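A minimal sketch of computing the self-inconsistency signal, assuming a one-step model interface `model.step(s) -> (next_state, reward)` and a scalar `value_fn(s)`:

```python
import numpy as np

def self_inconsistency(model, value_fn, state, max_k=5, gamma=0.99):
    """Spread of k-step model-based value estimates as an uncertainty proxy."""
    estimates = [value_fn(state)]               # k = 0: pure value bootstrap
    s, ret, discount = state, 0.0, 1.0
    for _ in range(max_k):
        s, r = model.step(s)                    # imagined transition
        ret += discount * r
        discount *= gamma
        estimates.append(ret + discount * value_fn(s))
    # Disagreement within the implicit value ensemble = model-value inconsistency.
    return float(np.std(estimates))
```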

【7】 Active Sensing for Communications by Learning Link: https://arxiv.org/abs/2112.04075

作者:Foad Sohrabi, Tao Jiang, Wei Cui, Wei Yu Notes: 14 pages, 9 figures Abstract: This paper proposes a deep learning approach to a class of active sensing problems in wireless communications in which an agent sequentially interacts with an environment over a predetermined number of time frames to gather information in order to perform a sensing or actuation task for maximizing some utility function. In such an active learning setting, the agent needs to design an adaptive sensing strategy sequentially based on the observations made so far. To tackle such a challenging problem in which the dimension of historical observations increases over time, we propose to use a long short-term memory (LSTM) network to exploit the temporal correlations in the sequence of observations and to map each observation to a fixed-size state information vector. We then use a deep neural network (DNN) to map the LSTM state at each time frame to the design of the next measurement step. Finally, we employ another DNN to map the final LSTM state to the desired solution. We investigate the performance of the proposed framework for adaptive channel sensing problems in wireless communications. In particular, we consider the adaptive beamforming problem for mmWave beam alignment and the adaptive reconfigurable intelligent surface sensing problem for reflection alignment. Numerical results demonstrate that the proposed deep active sensing strategy outperforms the existing adaptive or nonadaptive sensing schemes.
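A sketch of the described pipeline, with placeholder dimensions and an assumed `env(design) -> observation` callable standing in for the channel-sensing environment:

```python
import torch
import torch.nn as nn

class ActiveSensor(nn.Module):
    """LSTM state tracker + DNNs for measurement design and the final solution."""
    def __init__(self, obs_dim, state_dim, design_dim, sol_dim):
        super().__init__()
        self.design_dim = design_dim
        self.cell = nn.LSTMCell(obs_dim, state_dim)
        self.design = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                    nn.Linear(64, design_dim))
        self.solve = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                   nn.Linear(64, sol_dim))

    def forward(self, env, n_frames, batch):
        h = torch.zeros(batch, self.cell.hidden_size)
        c = torch.zeros_like(h)
        w = torch.zeros(batch, self.design_dim)   # initial (blind) measurement design
        for _ in range(n_frames):
            obs = env(w)                          # noisy measurement under design w
            h, c = self.cell(obs, (h, c))         # summarize history into fixed state
            w = self.design(h)                    # adaptively design the next step
        return self.solve(h)                      # e.g., the final beamforming vector
```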

【8】 Unsupervised Representation Learning via Neural Activation Coding Link: https://arxiv.org/abs/2112.04014

作者:Yookoon Park, Sangho Lee, Gunhee Kim, David M. Blei Notes: Published in International Conference on Machine Learning (ICML), 2021 Abstract: We present neural activation coding (NAC) as a novel approach for learning deep representations from unlabeled data for downstream applications. We argue that the deep encoder should maximize its nonlinear expressivity on the data for downstream predictors to take full advantage of its representation power. To this end, NAC maximizes the mutual information between activation patterns of the encoder and the data over a noisy communication channel. We show that learning for a noise-robust activation code increases the number of distinct linear regions of ReLU encoders, hence the maximum nonlinear expressivity. More interestingly, NAC learns both continuous and discrete representations of data, which we respectively evaluate on two downstream tasks: (i) linear classification on CIFAR-10 and ImageNet-1K and (ii) nearest neighbor retrieval on CIFAR-10 and FLICKR-25K. Empirical results show that NAC attains better or comparable performance on both tasks over recent baselines including SimCLR and DistillHash. In addition, NAC pretraining provides significant benefits to the training of deep generative models. Our code is available at https://github.com/yookoon/nac.

【9】 Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization Link: https://arxiv.org/abs/2112.04459

作者:Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan Notes: Submitted to ICASSP 2022 Abstract: Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel regularization strategy to facilitate self-supervised speaker representation learning. Different from contrastive learning-based self-supervised learning methods, the proposed self-supervised regularization (SSReg) focuses exclusively on the similarity between the latent representations of positive data pairs. We also explore the effectiveness of alternative online data augmentation strategies on both the time domain and frequency domain. With our strong online data augmentation strategy, the proposed SSReg shows the potential of self-supervised learning without using negative pairs, and it can significantly improve the performance of self-supervised speaker representation learning with a simple Siamese network architecture. Comprehensive experiments on the VoxCeleb datasets demonstrate that our proposed self-supervised approach obtains a 23.4% relative improvement by adding the effective self-supervised regularization and outperforms other previous works.
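Since SSReg uses only positive pairs within a simple Siamese setup, it can be sketched along SimSiam lines; the predictor head and stop-gradient are assumptions made for illustration, not necessarily the paper's exact design:

```python
import torch.nn.functional as F

def ssreg_loss(encoder, predictor, view_a, view_b):
    """Positive-pair-only regularization with stop-gradient (SimSiam-style sketch).

    view_a, view_b: two augmented views of the same utterance batch.
    """
    z_a, z_b = encoder(view_a), encoder(view_b)
    sim = (F.cosine_similarity(predictor(z_a), z_b.detach(), dim=-1).mean()
           + F.cosine_similarity(predictor(z_b), z_a.detach(), dim=-1).mean()) / 2
    return -sim        # maximize agreement between the two views; no negatives
```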

Transfer | Zero/Few/One-Shot | Adaptation (5 papers)

【1】 Burn After Reading: Online Adaptation for Cross-domain Streaming Data Link: https://arxiv.org/abs/2112.04345

作者:Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Abhinav Shrivastava, Chetan Ramaiah Abstract: In the context of online privacy, many methods propose complex privacy and security preserving measures to protect sensitive data. In this paper, we argue that not storing any sensitive data is the best form of security. Thus we propose an online framework that "burns after reading", i.e. each online sample is immediately deleted after it is processed. Meanwhile, we tackle the inevitable distribution shift between the labeled public data and unlabeled private data as a problem of unsupervised domain adaptation. Specifically, we propose a novel algorithm that aims at the most fundamental challenge of the online adaptation setting: the lack of diverse source-target data pairs. Therefore, we design a Cross-Domain Bootstrapping approach, called CroDoBo, to increase the combined diversity across domains. Further, to fully exploit the valuable discrepancies among the diverse combinations, we employ the training strategy of multiple learners with co-supervision. CroDoBo achieves state-of-the-art online performance on four domain adaptation benchmarks.

【2】 Pareto Domain Adaptation Link: https://arxiv.org/abs/2112.04137

作者:Fangrui Lv, Jian Liang, Kaixiong Gong, Shuang Li, Chi Harold Liu, Han Li, Di Liu, Guoren Wang Notes: Accepted in NeurIPS 2021 Abstract: Domain adaptation (DA) attempts to transfer the knowledge from a labeled source domain to an unlabeled target domain that follows a different distribution from the source. To achieve this, DA methods include a source classification objective to extract the source knowledge and a domain alignment objective to diminish the domain shift, ensuring knowledge transfer. Typically, previous DA methods adopt some weight hyper-parameters to linearly combine the training objectives into an overall objective. However, the gradient directions of these objectives may conflict with each other due to domain shift. Under such circumstances, the linear optimization scheme might decrease the overall objective value at the expense of damaging one of the training objectives, leading to restricted solutions. In this paper, we rethink the optimization scheme for DA from a gradient-based perspective. We propose a Pareto Domain Adaptation (ParetoDA) approach to control the overall optimization direction, aiming to cooperatively optimize all training objectives. Specifically, to reach a desirable solution on the target domain, we design a surrogate loss mimicking target classification. To improve target-prediction accuracy to support the mimicking, we propose a target-prediction refining mechanism which exploits domain labels via Bayes' theorem. On the other hand, since prior knowledge of weighting schemes for objectives is often unavailable to guide optimization to approach the optimal solution on the target domain, we propose a dynamic preference mechanism to dynamically guide our cooperative optimization by the gradient of the surrogate loss on a held-out unlabeled target dataset. Extensive experiments on image classification and semantic segmentation benchmarks demonstrate the effectiveness of ParetoDA.

【3】 Multinational Address Parsing: A Zero-Shot Evaluation Link: https://arxiv.org/abs/2112.04008

作者:Marouane Yassine, David Beauchemin, François Laviolette, Luc Lamontagne Notes: Accepted in the International Journal of Information Science and Technology (iJIST). arXiv admin note: text overlap with arXiv:2006.16152 Abstract: Address parsing consists of identifying the segments that make up an address, such as a street name or a postal code. Because of its importance for tasks like record linkage, address parsing has been approached with many techniques, the latest relying on neural networks. While these models yield notable results, previous work on neural networks has only focused on parsing addresses from a single source country. This paper explores the possibility of transferring the address parsing knowledge acquired by training deep learning models on some countries' addresses to others with no further training in a zero-shot transfer learning setting. We also experiment with using an attention mechanism and a domain adversarial training algorithm in the same zero-shot transfer setting to improve performance. Both methods yield state-of-the-art performance for most of the tested countries while giving good results for the remaining ones. We also explore the effect of incomplete addresses on our best model, and we evaluate the impact of using incomplete addresses during training. In addition, we propose an open-source Python implementation of some of our trained models.

【4】 Which images to label for few-shot medical landmark detection? Link: https://arxiv.org/abs/2112.04386

作者:Quan Quan, Qingsong Yao, Jun Li, S. Kevin Zhou Abstract: The success of deep learning methods relies on the availability of well-labeled large-scale datasets. However, for medical images, annotating such abundant training data often requires experienced radiologists and consumes their limited time. Few-shot learning is developed to alleviate this burden, achieving competitive performance with only a few labeled samples. However, a crucial yet previously overlooked problem in few-shot learning concerns the selection of template images for annotation before learning, which affects the final performance. We herein propose a novel Sample Choosing Policy (SCP) to select "the most worthy" images for annotation, in the context of few-shot medical landmark detection. SCP consists of three parts: 1) Self-supervised training for building a pre-trained deep model to extract features from radiological images, 2) Key Point Proposal for localizing informative patches, and 3) Representative Score Estimation for searching the most representative samples or templates. The advantage of SCP is demonstrated by various experiments on three widely-used public datasets. For one-shot medical landmark detection, its use reduces the mean radial errors on the Cephalometric and HandXray datasets by 14.2% (from 3.595mm to 3.083mm) and 35.5% (4.114mm to 2.653mm), respectively.

【5】 Adaptive R-Peak Detection on Wearable ECG Sensors for High-Intensity Exercise Link: https://arxiv.org/abs/2112.04369

作者:Elisabetta De Giovanni, Tomas Teijeiro, Grégoire P. Millet, David Atienza Notes: 12 pages, 14 figures, 2 tables Abstract: Objective: Continuous monitoring of biosignals via wearable sensors has quickly expanded in the medical and wellness fields. At rest, automatic detection of vital parameters is generally accurate. However, in conditions such as high-intensity exercise, sudden physiological changes occur in the signals, compromising the robustness of standard algorithms. Methods: Our method, called BayeSlope, is based on unsupervised learning, Bayesian filtering, and non-linear normalization to enhance and correctly detect the R peaks according to their expected positions in the ECG. Furthermore, as BayeSlope is computationally heavy and can drain the device battery quickly, we propose an online design that adapts its robustness to sudden physiological changes, and its complexity to the heterogeneous resources of modern embedded platforms. This method combines BayeSlope with a lightweight algorithm, executed in cores with different capabilities, to reduce the energy consumption while preserving the accuracy. Results: BayeSlope achieves an F1 score of 99.3% in experiments during intense cycling exercise with 20 subjects. Additionally, the online adaptive process achieves an F1 score of 99% across five different exercise intensities, with a total energy consumption of 1.55 ± 0.54 mJ. Conclusion: We propose a highly accurate and robust method, and a complete energy-efficient implementation on a modern ultra-low-power embedded platform, to improve R peak detection in challenging conditions, such as during high-intensity exercise. Significance: The experiments show that BayeSlope outperforms a state-of-the-art algorithm by up to 8.4% in F1 score, while our online adaptive method can reach energy savings of up to 38.7% on modern heterogeneous wearable platforms.

Reinforcement Learning (1 paper)

【1】 Application of Deep Reinforcement Learning to Payment Fraud Link: https://arxiv.org/abs/2112.04236

作者:Siddharth Vimal, Kanishka Kayathwal, Hardik Wadhwa, Gaurav Dhama Notes: Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond at KDD 2021 Abstract: The large variety of digital payment choices available to consumers today has been a key driver of e-commerce transactions in the past decade. Unfortunately, this has also given rise to cybercriminals and fraudsters who are constantly looking for vulnerabilities in these systems by deploying increasingly sophisticated fraud attacks. A typical fraud detection system employs standard supervised learning methods where the focus is on maximizing the fraud recall rate. However, we argue that such a formulation can lead to sub-optimal solutions. The design requirements for these fraud models require that they are robust to the high class imbalance in the data, adaptive to changes in fraud patterns, maintain a balance between the fraud rate and the decline rate to maximize revenue, and be amenable to asynchronous feedback, since usually there is a significant lag between the transaction and the fraud realization. To achieve this, we formulate fraud detection as a sequential decision-making problem by including the utility maximization within the model in the form of the reward function. The historical decline rate and fraud rate define the state of the system, with a binary action space composed of approving or declining the transaction. In this study, we primarily focus on utility maximization and explore different reward functions to this end. The performance of the proposed Reinforcement Learning system has been evaluated for two publicly available fraud datasets using Deep Q-learning and compared with different classifiers. We aim to address the rest of the issues in future work.
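A toy version of the described MDP: the state is the historical (decline rate, fraud rate) pair and the binary action approves or declines a transaction. The reward below (a revenue fee for good approvals, a chargeback for approved fraud) is one plausible utility; the exact reward functions explored in the paper are not reproduced:

```python
import numpy as np

class FraudEnv:
    """Minimal sketch of the fraud-detection MDP."""
    def __init__(self):
        self.declines = self.frauds = self.seen = 0

    def state(self):
        n = max(self.seen, 1)
        # Historical decline rate and fraud rate define the state.
        return np.array([self.declines / n, self.frauds / n])

    def step(self, action, amount, is_fraud):
        """action: 1 = approve, 0 = decline. Returns the reward."""
        self.seen += 1
        if action == 0:                        # decline: no revenue, no fraud risk
            self.declines += 1
            return 0.0
        self.frauds += int(is_fraud)
        # Approved fraud costs the full amount; a good approval earns a fee.
        return -amount if is_fraud else 0.01 * amount
```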

Meta-Learning (1 paper)

【1】 CoMPS: Continual Meta Policy Search Link: https://arxiv.org/abs/2112.04467

作者:Glen Berseth, Zhiwei Zhang, Grace Zhang, Chelsea Finn, Sergey Levine Notes: 23 pages, under review Abstract: We develop a new continual meta-learning method to address challenges in sequential multi-task learning. In this setting, the agent's goal is to achieve high reward over any sequence of tasks quickly. Prior meta-reinforcement learning algorithms have demonstrated promising results in accelerating the acquisition of new tasks. However, they require access to all tasks during training. Beyond simply transferring past experience to new tasks, our goal is to devise continual reinforcement learning algorithms that learn to learn, using their experience on previous tasks to learn new tasks more quickly. We introduce a new method, continual meta-policy search (CoMPS), that removes this limitation by meta-training in an incremental fashion, over each task in a sequence, without revisiting prior tasks. CoMPS continuously repeats two subroutines: learning a new task using RL and using the experience from RL to perform completely offline meta-learning to prepare for subsequent task learning. We find that CoMPS outperforms prior continual learning and off-policy meta-reinforcement methods on several sequences of challenging continuous control tasks.

Symbolic | Symbolic Learning (1 paper)

【1】 Accelerating Understanding of Scientific Experiments with End to End Symbolic Regression Link: https://arxiv.org/abs/2112.04023

作者:Nikos Arechiga, Francine Chen, Yan-Ying Chen, Yanxia Zhang, Rumen Iliev, Heishiro Toyoda, Kent Lyons Abstract: We consider the problem of learning free-form symbolic expressions from raw data, such as that produced by an experiment in any scientific domain. Accurate and interpretable models of scientific phenomena are the cornerstone of scientific research. Simple yet interpretable models, such as linear or logistic regression and decision trees, often lack predictive accuracy. Alternatively, accurate blackbox models such as deep neural networks provide high predictive accuracy, but do not readily admit human understanding in a way that would enrich the scientific theory of the phenomenon. Many great breakthroughs in science revolve around the development of parsimonious equational models with high predictive accuracy, such as Newton's laws, universal gravitation, and Maxwell's equations. Previous work on automating the search for equational models from data combines domain-specific heuristics as well as computationally expensive techniques, such as genetic programming and Monte-Carlo search. We develop a deep neural network (MACSYMA) to address the symbolic regression problem as an end-to-end supervised learning problem. MACSYMA can generate symbolic expressions that describe a dataset. The computational complexity of the task is reduced to the feedforward computation of a neural network. We train our neural network on a synthetic dataset consisting of data tables of varying length and varying levels of noise, for which the neural network must learn to produce the correct symbolic expression token by token. Finally, we validate our technique by running on a public dataset from behavioral science.

Medicine (1 paper)

【1】 Sentiment Analysis and Effect of COVID-19 Pandemic using College SubReddit Data Link: https://arxiv.org/abs/2112.04351

作者:Tian Yan, Fang Liu Abstract: The COVID-19 pandemic has affected societies and human health and well-being in various ways. In this study, we collected Reddit data from 2019 (pre-pandemic) and 2020 (pandemic) from the subreddit communities associated with 8 universities, applied natural language processing (NLP) techniques, and trained graphical neural networks with social media data, to study how the pandemic has affected people's emotions and psychological states compared to the pre-pandemic era. Specifically, we first applied a pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) to learn embeddings from the semantic information of Reddit messages and trained a graph attention network (GAT) for sentiment classification. The usage of GAT allows us to leverage the relational information among the messages during training. We then applied subgroup-adaptive model stacking to combine the prediction probabilities from RoBERTa and GAT to yield the final sentiment classification. With the manually labeled and model-predicted sentiment labels on the collected data, we applied a generalized linear mixed-effects model to estimate the effects of pandemic and online teaching on people's sentiment in a statistically significant manner. The results suggest the odds of negative sentiments in 2020 were 14.6% higher than the odds in 2019 (p-value < 0.001), and the odds of negative sentiments were 41.6% higher with in-person teaching than with online teaching in 2020 (p-value = 0.037) in the studied population.
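A sketch of the GAT sentiment classifier over message nodes, assuming RoBERTa-base (768-d) node features, edges connecting related messages, and a `torch_geometric` implementation; hyperparameters are placeholders:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class SentimentGAT(torch.nn.Module):
    """Two-layer GAT: node features are RoBERTa message embeddings."""
    def __init__(self, in_dim=768, hidden=64, heads=4, n_classes=2):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads, dropout=0.2)
        self.gat2 = GATConv(hidden * heads, n_classes, heads=1, dropout=0.2)

    def forward(self, x, edge_index):
        # Attention over neighboring messages injects relational information.
        x = F.elu(self.gat1(x, edge_index))
        return self.gat2(x, edge_index)      # per-message sentiment logits
```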

Reasoning | Analysis | Understanding | Explanation (1 paper)

【1】 Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework Link: https://arxiv.org/abs/2112.04468

作者:Ching-Yun Ko, Jeet Mohapatra, Sijia Liu, Pin-Yu Chen, Luca Daniel, Lily Weng Notes: The full version of SSLNeurIPS'21 contributed talk (NeurIPS 2021 Workshop: Self-Supervised Learning - Theory and Practice). Work in progress Abstract: As a seminal tool in self-supervised representation learning, contrastive learning has gained unprecedented attention in recent years. In essence, contrastive learning aims to leverage pairs of positive and negative samples for representation learning, which relates to exploiting neighborhood information in a feature space. By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. Under our proposed framework, we show a new methodology to design integrated contrastive losses that could simultaneously achieve good accuracy and robustness on downstream tasks. With the integrated framework, we achieve up to 6% improvement on the standard accuracy and 17% improvement on the adversarial accuracy.
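The stochastic-nearest-neighbor view can be made concrete with an NCA-style loss over an embedding batch: each sample should, with high probability, pick a positive as its stochastic neighbor. The masking convention and temperature below are assumptions, and the paper's integrated losses generalize this basic form:

```python
import torch
import torch.nn.functional as F

def nca_contrastive(z, pos_mask, temperature=0.1):
    """NCA-style contrastive loss (sketch).

    z:        (B, D) batch of embeddings
    pos_mask: (B, B) bool matrix; pos_mask[i, j] marks j as a positive of i
              (each row is assumed to contain at least one positive).
    """
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float("-inf"))        # a point cannot pick itself
    log_p = sim.log_softmax(dim=-1)          # stochastic neighbor distribution
    # Maximize the probability mass assigned to positives (NCA objective).
    log_p_pos = log_p.masked_fill(~pos_mask, float("-inf")).logsumexp(dim=-1)
    return -log_p_pos.mean()
```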

Detection (3 papers)

【1】 GCA-Net: Utilizing Gated Context Attention for Improving Image Forgery Localization and Detection Link: https://arxiv.org/abs/2112.04298

作者:Sowmen Das, Md. Saiful Islam, Md. Ruhul Amin Abstract: Forensic analysis depends on the identification of hidden traces from manipulated images. Traditional neural networks fail in this task because of their inability to handle feature attenuation and their reliance on dominant spatial features. In this work we propose a novel Gated Context Attention Network (GCA-Net) that utilizes a non-local attention block for global context learning. Additionally, we utilize a gated attention mechanism in conjunction with a dense decoder network to direct the flow of relevant features during the decoding phase, allowing for precise localization. The proposed attention framework allows the network to focus on relevant regions by filtering the coarse features. Furthermore, by utilizing multi-scale feature fusion and efficient learning strategies, GCA-Net can better handle the scale variation of manipulated regions. We show that our method outperforms state-of-the-art networks by an average of 4.2%-5.4% AUC on multiple benchmark datasets. Lastly, we also conduct extensive ablation experiments to demonstrate the method's robustness for image forensics.

【2】 Learnable Faster Kernel-PCA for Nonlinear Fault Detection: Deep Autoencoder-Based Realization Link: https://arxiv.org/abs/2112.04193

作者:Zelin Ren, Xuebing Yang, Yuchen Jiang, Wensheng Zhang Notes: 11 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible Abstract: Kernel principal component analysis (KPCA) is a well-recognized nonlinear dimensionality reduction method that has been widely used in nonlinear fault detection tasks. As a kernel trick-based method, KPCA inherits two major problems. First, the form and the parameters of the kernel function are usually selected blindly, depending heavily on trial-and-error. As a result, there may be serious performance degradation in case of inappropriate selections. Second, at the online monitoring stage, KPCA has a heavy computational burden and poor real-time performance, because the kernel method requires leveraging all the offline training data. In this work, to deal with the two drawbacks, a learnable faster realization of the conventional KPCA is proposed. The core idea is to parameterize all feasible kernel functions using the novel nonlinear DAE-FE (deep autoencoder based feature extraction) framework, and the DAE-PCA (deep autoencoder based principal component analysis) approach is proposed in detail. The proposed DAE-PCA method is proved to be equivalent to KPCA but has the advantage of automatically searching for the most suitable nonlinear high-dimensional space according to the inputs. Furthermore, the online computational efficiency improves by approximately 100 times compared with conventional KPCA. With the Tennessee Eastman (TE) process benchmark, the effectiveness and superiority of the proposed method are illustrated.
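A rough sketch of the two-stage idea (nonlinear feature extraction, then linear PCA and a T²-style monitoring statistic), using an sklearn MLP autoencoder as a stand-in for the deep autoencoder; the paper's learnable-kernel parameterization is not reproduced:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

def dae_pca_monitor(X_train, X_test, latent_dim=5, n_pc=2):
    """Autoencoder features + PCA + Hotelling T^2 fault statistic (sketch)."""
    # Train an autoencoder to reconstruct the input (hidden bottleneck = latent_dim).
    ae = MLPRegressor(hidden_layer_sizes=(32, latent_dim, 32), max_iter=500)
    ae.fit(X_train, X_train)

    def encode(X):
        # Forward pass through the first two (ReLU) layers, up to the bottleneck.
        H = X
        for W, b in list(zip(ae.coefs_, ae.intercepts_))[:2]:
            H = np.maximum(H @ W + b, 0.0)
        return H

    pca = PCA(n_components=n_pc).fit(encode(X_train))
    scores = pca.transform(encode(X_test))
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)  # T^2 statistic
    return t2                                  # large values flag potential faults
```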

【3】 Scalable 3D Semantic Segmentation for Gun Detection in CT Scans Link: https://arxiv.org/abs/2112.03917

作者:Marius Memmel, Christoph Reich, Nicolas Wagner, Faraz Saeedan Notes: This work was part of the Project Lab Deep Learning in Computer Vision Winter Semester 2019/2020 at TU Darmstadt Abstract: With the increased availability of 3D data, the need for solutions processing those also increased rapidly. However, adding a dimension to already reliably accurate 2D approaches leads to immense memory consumption and higher computational complexity. These issues cause current hardware to reach its limitations, with most methods forced to reduce the input resolution drastically. Our main contribution is a novel deep 3D semantic segmentation method for gun detection in baggage CT scans that enables fast training and low video memory consumption for high-resolution voxelized volumes. We introduce a moving pyramid approach that utilizes multiple forward passes at inference time for segmenting an instance.

分类|识别(6篇)

【1】 Enhancing Counterfactual Classification via Self-Training 标题:通过自我训练加强反事实分类 链接:https://arxiv.org/abs/2112.04461

作者:Ruijiang Gao,Max Biggs,Wei Sun,Ligong Han 备注:AAAI 2022 摘要:与传统的监督学习不同,在许多情况下,只有部分反馈可用。我们可能只观察所选行动的结果,但不观察与其他备选方案相关的反事实结果。这些设置涵盖了广泛的应用,包括定价、在线营销和精准医疗。一个关键的挑战是观测数据受到系统中部署的历史政策的影响,从而产生有偏差的数据分布。我们将这项任务视为一个领域适应问题,并提出了一种自我训练算法,该算法使用观察数据中有限的不可见行为的分类值来估算结果,以通过伪标记模拟随机试验,我们称之为反事实自我训练(CST)。CST迭代插补伪标签并重新训练模型。此外,我们还发现输入一致性损失可以进一步改善CST性能,这在最近的伪标记理论分析中得到了证明。我们在合成数据集和真实数据集上证明了所提算法的有效性。 摘要:Unlike traditional supervised learning, in many settings only partial feedback is available. We may only observe outcomes for the chosen actions, but not the counterfactual outcomes associated with other alternatives. Such settings encompass a wide variety of applications including pricing, online marketing and precision medicine. A key challenge is that observational data are influenced by historical policies deployed in the system, yielding a biased data distribution. We approach this task as a domain adaptation problem and propose a self-training algorithm which imputes outcomes with categorical values for finite unseen actions in the observational data to simulate a randomized trial through pseudolabeling, which we refer to as Counterfactual Self-Training (CST). CST iteratively imputes pseudolabels and retrains the model. In addition, we show input consistency loss can further improve CST performance which is shown in recent theoretical analysis of pseudolabeling. We demonstrate the effectiveness of the proposed algorithms on both synthetic and real datasets.
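下面用 numpy 和 scikit-learn 给出"反事实自我训练"这一迭代流程(为未观测动作插补伪标签后重训)的玩具示意。数据生成方式、迭代次数等均为笔者假设,仅演示摘要描述的 CST 循环,而非论文实现。

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, K = 2000, 5, 4                      # 样本数、特征维度、动作数
X = rng.normal(size=(n, d))
A = rng.integers(0, K, size=n)            # 历史策略选择的动作(分布有偏)
Y = (X[:, 0] + 0.5 * A + rng.normal(size=n) > 1).astype(int)  # 只观测到所选动作的结果

def featurize(X, A):
    # 动作 one-hot 后与特征拼接
    onehot = np.eye(K)[A]
    return np.hstack([X, onehot])

model = LogisticRegression(max_iter=1000).fit(featurize(X, A), Y)
for _ in range(5):                        # CST 式迭代:插补伪标签 -> 重新训练
    Xs, As, Ys = [X], [A], [Y]
    for a in range(K):
        mask = A != a                     # 该动作未被观测到的(反事实)样本
        pseudo = model.predict(featurize(X[mask], np.full(mask.sum(), a)))
        Xs.append(X[mask]); As.append(np.full(mask.sum(), a)); Ys.append(pseudo)
    model = LogisticRegression(max_iter=1000).fit(
        featurize(np.vstack(Xs), np.concatenate(As)), np.concatenate(Ys))
print("训练完成,可对任意动作估计反事实结果")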

【2】 Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition 标题:用于细粒度识别的移动网络渐进式多阶段交互训练 链接:https://arxiv.org/abs/2112.04223

作者:Zhenxin Wu,Qingliang Chen,Yifeng Liu,Yinqi Zhang,Chengkai Zhu,Yang Yu 摘要:细粒度视觉分类(FGVC)旨在从子类别中识别对象。这是一项非常具有挑战性的任务,因为类别之间的差异十分微妙。现有的研究采用大规模卷积神经网络或视觉变换器作为特征抽取器,这在计算上非常昂贵。事实上,细粒度识别的真实场景通常需要一个更轻量级的移动网络,可以离线使用。然而,与大规模模型相比,移动网络的基本特征提取能力较弱。本文基于轻量级MobilenetV2,提出了一种基于递归马赛克生成器(RMG-PMSI)的渐进式多阶段交互式训练方法。首先,我们提出了一种递归马赛克生成器(RMG),它在不同的阶段生成不同粒度的图像。然后,不同阶段的特征通过一个多阶段交互(MSI)模块,该模块加强和补充了不同阶段的相应特征。最后,使用渐进训练(P),模型在不同阶段提取的特征可以充分利用并相互融合。在三个著名的细粒度基准测试上的实验表明,RMG-PMSI能够显著提高性能,具有良好的鲁棒性和可迁移性。 摘要:Fine-grained Visual Classification (FGVC) aims to identify objects from subcategories. It is a very challenging task because of the subtle inter-class differences. Existing research applies large-scale convolutional neural networks or visual transformers as the feature extractor, which is extremely computationally expensive. In fact, real-world scenarios of fine-grained recognition often require a more lightweight mobile network that can be utilized offline. However, the fundamental mobile network feature extraction capability is weaker than large-scale models. In this paper, based on the lightweight MobilenetV2, we propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI). First, we propose a Recursive Mosaic Generator (RMG) that generates images with different granularities in different phases. Then, the features of different stages pass through a Multi-Stage Interaction (MSI) module, which strengthens and complements the corresponding features of different stages. Finally, using the progressive training (P), the features extracted by the model in different stages can be fully utilized and fused with each other. Experiments on three prestigious fine-grained benchmarks show that RMG-PMSI can significantly improve the performance with good robustness and transferability.
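下面是"按不同粒度切块、打乱再重拼"这一马赛克生成做法的 PyTorch 示意(函数名 mosaic 与粒度序列均为笔者假设):论文的 RMG 在渐进式训练的不同阶段递归地使用不同粒度。

import torch

def mosaic(images, grid):
    """把每张图切成 grid x grid 的块并随机打乱重拼(示意实现)。"""
    b, c, h, w = images.shape
    ph, pw = h // grid, w // grid
    patches = images.reshape(b, c, grid, ph, grid, pw).permute(0, 2, 4, 1, 3, 5)
    patches = patches.reshape(b, grid * grid, c, ph, pw)
    perm = torch.randperm(grid * grid)        # 打乱块的位置
    patches = patches[:, perm]
    patches = patches.reshape(b, grid, grid, c, ph, pw).permute(0, 3, 1, 4, 2, 5)
    return patches.reshape(b, c, h, w)

x = torch.randn(2, 3, 224, 224)
for g in (8, 4, 2, 1):                        # 渐进式训练:粒度由细到粗
    print(g, mosaic(x, g).shape)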

【3】 Best Arm Identification under Additive Transfer Bandits 标题:加性迁移老虎机下的最佳臂识别 链接:https://arxiv.org/abs/2112.04083

作者:Ojash Neopane,Aaditya Ramdas,Aarti Singh 摘要:我们考虑多臂老虎机(MAB)中最佳臂识别(BAI)问题的一个变体,其中有两组臂(源臂和目标臂),目标是在只拉动源臂的情况下确定最佳目标臂。在本文中,我们研究了虽然均值未知、但源MAB实例与目标MAB实例之间存在已知加性关系的设置。我们将展示我们的框架如何涵盖一系列以前研究过的纯探索问题,并额外捕获新问题。我们提出并从理论上分析了一种LUCB类型的算法,以高概率识别$\epsilon$-最优目标臂。我们的理论分析强调了这一迁移学习问题中在典型BAI设置下不会出现的方面,同时作为特例恢复了单域BAI的LUCB算法。 摘要:We consider a variant of the best arm identification (BAI) problem in multi-armed bandits (MAB) in which there are two sets of arms (source and target), and the objective is to determine the best target arm while only pulling source arms. In this paper, we study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances. We show how our framework covers a range of previously studied pure exploration problems and additionally captures new problems. We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability. Our theoretical analysis highlights aspects of this transfer learning problem that do not arise in the typical BAI setup, and yet recover the LUCB algorithm for single domain BAI as a special case.
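下面给出单域 LUCB 的一个 numpy 示意(高斯奖励、置信半径常数等均为笔者假设的教科书式取法),对应摘要最后一句"作为特例恢复单域 BAI 的 LUCB 算法";论文的迁移版本还需利用源臂与目标臂之间已知的加性关系。

import numpy as np

def lucb(means, eps=0.05, delta=0.05, max_pulls=200000, seed=0):
    """单域 LUCB 示意:以高概率返回一个 eps-最优臂。"""
    rng = np.random.default_rng(seed)
    K = len(means)
    n = np.ones(K)                                   # 每臂先拉一次
    s = np.array([rng.normal(m) for m in means])     # 各臂奖励之和(高斯奖励假设)
    t = K
    while t < max_pulls:
        mu = s / n
        rad = np.sqrt(np.log(4 * K * t**2 / delta) / (2 * n))   # 置信半径
        leader = np.argmax(mu)
        ucb = mu + rad; ucb[leader] = -np.inf
        challenger = np.argmax(ucb)                  # 领先臂之外 UCB 最大的挑战者
        if mu[leader] - rad[leader] >= mu[challenger] + rad[challenger] - eps:
            return int(leader), int(n.sum())         # 置信区间分离则停止
        for i in (leader, challenger):               # 各拉一次
            s[i] += rng.normal(means[i]); n[i] += 1; t += 1
    return int(np.argmax(s / n)), int(n.sum())

arm, pulls = lucb(means=[0.2, 0.5, 0.55, 0.9])
print(f"识别出的最佳臂: {arm}, 总拉臂次数: {pulls}")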

【4】 Image classifiers can not be made robust to small perturbations 标题:不能使图像分类器对小扰动具有鲁棒性 链接:https://arxiv.org/abs/2112.04033

作者:Zheng Dai,David K. Gifford 摘要:图像分类器对输入小扰动的敏感性通常被视为其结构的缺陷。我们证明了这种敏感性是分类器的一个基本属性。对于$n$-by-$n$图像集上的任意分类器,我们表明,对于除一个类别外的所有类别,与以任何$p$-标准(包括汉明距离)测量的图像空间直径相比,可以通过微小的修改来改变该类别中除一小部分以外的所有图像的分类。然后,我们研究这一现象在人类视觉感知中的表现,并讨论其对计算机视觉系统设计考虑的影响。 摘要:The sensitivity of image classifiers to small perturbations in the input is often viewed as a defect of their construction. We demonstrate that this sensitivity is a fundamental property of classifiers. For any arbitrary classifier over the set of $n$-by-$n$ images, we show that for all but one class it is possible to change the classification of all but a tiny fraction of the images in that class with a tiny modification compared to the diameter of the image space when measured in any $p$-norm, including the hamming distance. We then examine how this phenomenon manifests in human visual perception and discuss its implications for the design considerations of computer vision systems.

【5】 DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification 标题:DeepFace-EMD:使用补丁推土机距离进行重新排序改进了非分布人脸识别 链接:https://arxiv.org/abs/2112.04016

作者:Hai Phan,Anh Nguyen 摘要:人脸识别(FI)无处不在,并推动执法部门做出许多高风险决策。最先进的FI方法通过获取图像嵌入之间的余弦相似性来比较两幅图像。然而,这种方法对未包含在训练集或图库中的新类型图像(例如,当查询人脸被遮挡、裁剪或旋转时)的分布外(out-of-distribution,OOD)泛化较差。在这里,我们提出了一种重新排序的方法,该方法使用图像块深层空间特征上的推土机距离(Earth Mover's Distance)来比较两张人脸。我们的额外比较阶段明确地在细粒度级别(例如,眼睛到眼睛)检查图像相似性,并且比传统FI对OOD扰动和遮挡更鲁棒。有趣的是,在没有微调特征提取器的情况下,我们的方法持续提高了所有测试的OOD查询(遮挡、裁剪、旋转和对抗扰动)的准确性,同时在分布内图像上获得类似的结果。 摘要:Face identification (FI) is ubiquitous and drives many high-stake decisions made by law enforcement. State-of-the-art FI approaches compare two images by taking the cosine similarity between their image embeddings. Yet, such an approach suffers from poor out-of-distribution (OOD) generalization to new types of images (e.g., when a query face is masked, cropped, or rotated) not included in the training set or the gallery. Here, we propose a re-ranking approach that compares two faces using the Earth Mover's Distance on the deep, spatial features of image patches. Our extra comparison stage explicitly examines image similarity at a fine-grained level (e.g., eyes to eyes) and is more robust to OOD perturbations and occlusions than traditional FI. Interestingly, without finetuning feature extractors, our method consistently improves the accuracy on all tested OOD queries: masked, cropped, rotated, and adversarial while obtaining similar results on in-distribution images.
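当图像块权重取均匀、且两幅图的块数相同时,推土机距离退化为一个最优匹配问题,可用匈牙利算法求解。下面用 scipy 给出这一特例下 patch 级 EMD 重排序的示意;特征在此用随机向量代替真实的深层特征,块数与维度均为笔者假设。

import numpy as np
from scipy.optimize import linear_sum_assignment

def patch_emd(f1, f2):
    """f1, f2: (P, d) 的图像块特征。均匀块权重下,EMD 即最优匹配的平均代价。"""
    a = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    b = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                     # 余弦距离作为"搬运"代价
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()

# 重排序示意:第一阶段用全局余弦相似度取 top-k,第二阶段按 patch 级 EMD 重排
rng = np.random.default_rng(0)
query = rng.normal(size=(49, 128))           # 假设 7x7=49 个块、128 维特征
gallery = [rng.normal(size=(49, 128)) for _ in range(10)]
order = sorted(range(len(gallery)), key=lambda i: patch_emd(query, gallery[i]))
print("EMD 重排序结果:", order)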

【6】 Dyadic Sex Composition and Task Classification Using fNIRS Hyperscanning Data 标题:基于fNIRS超扫描数据的双人组性别构成与任务分类 链接:https://arxiv.org/abs/2112.03911

作者:Liam A. Kruse,Allan L. Reiss,Mykel J. Kochenderfer,Stephanie Balters 备注:20th IEEE International Conference on Machine Learning and Applications 摘要:功能性近红外光谱(fNIRS)超扫描技术是一种新兴的神经成像应用,它可以测量潜在社会互动的细微神经特征。研究人员评估了性别和任务类型(如合作与竞争)对人与人互动过程中脑间相干性的影响。然而,目前还没有研究使用基于深度学习的方法来深入了解fNIRS超扫描环境中的性别和任务差异。这项工作提出了一种基于卷积神经网络的方法,在包含$N=222$名参与者的大规模超扫描数据集上进行双人组性别构成与任务分类,并使用动态时间规整(DTW)计算的脑间信号相似性作为输入数据。该方法的分类准确率最高可达80%以上,为探索和理解复杂的大脑行为提供了新的途径。 摘要:Hyperscanning with functional near-infrared spectroscopy (fNIRS) is an emerging neuroimaging application that measures the nuanced neural signatures underlying social interactions. Researchers have assessed the effect of sex and task type (e.g., cooperation versus competition) on inter-brain coherence during human-to-human interactions. However, no work has yet used deep learning-based approaches to extract insights into sex and task-based differences in an fNIRS hyperscanning context. This work proposes a convolutional neural network-based approach to dyadic sex composition and task classification for an extensive hyperscanning dataset with $N = 222$ participants. Inter-brain signal similarity computed using dynamic time warping is used as the input data. The proposed approach achieves a maximum classification accuracy of greater than $80$ percent, thereby providing a new avenue for exploring and understanding complex brain behavior.
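摘要中作为输入特征的动态时间规整(DTW)可以用经典的动态规划直接实现。下面给出一个 numpy 示意(逐点距离取绝对差,边界与步长约束取最简形式):

import numpy as np

def dtw_distance(x, y):
    """动态时间规整的经典动态规划实现:可作为两条信号相似度的度量。"""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # 允许匹配、插入、删除三种移动
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 100)
print(dtw_distance(np.sin(t), np.sin(t + 0.3)))   # 两条相位错开的信号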

表征(2篇)

【1】 Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections 标题:Gaudí:与深层表示的对话交互以生成图像集合 链接:https://arxiv.org/abs/2112.04404

作者:Victor S. Bursztyn,Jennifer Healey,Vishwa Vinay 备注:Accepted at the NeurIPS 2021 Workshop on Machine Learning for Creativity and Design 摘要:基于现实主义语言建模(GPT-3)和跨模态表达(CLIP)的最新进展,Gaudí被开发用于帮助设计师使用自然语言搜索灵感图像。在设计过程的早期阶段,设计师通常会创建主题集,将鼓舞人心的图像称为"情绪板",目的是引出客户的首选创意方向。创建情绪板涉及顺序图像搜索,当前使用关键字或图像执行这些搜索。Gaudí将这个过程转化为一个对话,用户在对话中逐渐详细描述情绪板的主题。这种表示方式允许我们的AI根据GPT-3假设的主题,直接从项目简报中从头开始生成新的搜索查询。与之前的情绪板创建计算方法相比,据我们所知,我们首次尝试将情绪板表示为设计师在向客户展示创意方向时讲述的故事。 摘要:Based on recent advances in realistic language modeling (GPT-3) and cross-modal representations (CLIP), Gaudí was developed to help designers search for inspirational images using natural language. In the early stages of the design process, with the goal of eliciting a client's preferred creative direction, designers will typically create thematic collections of inspirational images called "mood-boards". Creating a mood-board involves sequential image searches which are currently performed using keywords or images. Gaudí transforms this process into a conversation where the user is gradually detailing the mood-board's theme. This representation allows our AI to generate new search queries from scratch, straight from a project briefing, following a theme hypothesized by GPT-3. Compared to previous computational approaches to mood-board creation, to the best of our knowledge, ours is the first attempt to represent mood-boards as the stories that designers tell when presenting a creative direction to a client.

【2】 Implicit Neural Representations for Image Compression 标题:用于图像压缩的隐式神经表示法 链接:https://arxiv.org/abs/2112.04267

作者:Yannick Strümpler,Janis Postels,Ren Yang,Luc van Gool,Federico Tombari 摘要:近年来,隐式神经表征(INRs)作为一种新的、有效的数据类型表征方法受到了广泛的关注。到目前为止,以前的工作主要集中在优化其重建性能。这项工作从一个新的角度研究INRs,即作为图像压缩工具。为此,我们提出了第一个基于INRs的综合压缩管道,包括量化、量化感知再训练和熵编码。使用INRs编码,即过度拟合数据样本,通常要慢几个数量级。为了缓解这一缺点,我们利用基于MAML的元学习初始化以较少的梯度更新达到编码,这通常也提高了INRs的率失真性能。我们发现,我们使用INRs进行源压缩的方法大大优于以前类似的工作,与专门为图像设计的普通压缩算法具有竞争力,并且与基于率失真自动编码器的最新学习方法相比,缩小了差距。此外,我们还对我们的方法的各个组成部分的重要性进行了广泛的研究,希望这有助于进一步研究这种新的图像压缩方法。 摘要:Recently Implicit Neural Representations (INRs) gained attention as a novel and effective representation for various data types. Thus far, prior work mostly focused on optimizing their reconstruction performance. This work investigates INRs from a novel perspective, i.e., as a tool for image compression. To this end, we propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding. Encoding with INRs, i.e. overfitting to a data sample, is typically orders of magnitude slower. To mitigate this drawback, we leverage meta-learned initializations based on MAML to reach the encoding in fewer gradient updates which also generally improves rate-distortion performance of INRs. We find that our approach to source compression with INRs vastly outperforms similar prior work, is competitive with common compression algorithms designed specifically for images and closes the gap to state-of-the-art learned approaches based on Rate-Distortion Autoencoders. Moreover, we provide an extensive ablation study on the importance of individual components of our method which we hope facilitates future research on this novel approach to image compression.
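下面用 PyTorch 给出"以 INR 编码图像 + 权重量化"这一流程的玩具示意:对单张(此处为随机)图像过拟合一个坐标到像素的小型 MLP,再把权重量化到 8 bit。网络结构、训练步数与量化方式均为笔者假设;论文还包含量化感知再训练、熵编码与基于 MAML 的元学习初始化。

import torch
import torch.nn as nn

class TinyINR(nn.Module):
    # 以 (x, y) 坐标为输入、RGB 像素值为输出;此处用 ReLU 的极简版本
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )
    def forward(self, xy):
        return self.net(xy)

H = W = 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
image = torch.rand(H * W, 3)                 # 这里用随机图像代替真实样本

inr = TinyINR()
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)
for step in range(500):                      # "编码"即对单张图像过拟合
    loss = ((inr(coords) - image) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                        # 压缩:把权重量化到 8 bit
    for p in inr.parameters():
        scale = p.abs().max() / 127.0
        p.copy_(torch.round(p / scale) * scale)
print("量化后重建误差:", ((inr(coords) - image) ** 2).mean().item())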

编码器(1篇)

【1】 Autoencoder-based Communications with Reconfigurable Intelligent Surfaces 标题:基于自动编码器的可重构智能曲面通信 链接:https://arxiv.org/abs/2112.04441

作者:Tugba Erpek,Yalin E. Sagduyu,Ahmed Alkhateeb,Aylin Yener 摘要:本文提出了一种新的联合设计可重构智能表面(RIS)和发射机-接收机对的方法,它们作为一组深度神经网络(DNN)一起训练,以优化接收机的端到端通信性能。RIS是一个软件定义的单元阵列,可根据散射和反射剖面进行控制,以将来自发射器的输入信号聚焦到接收器。RIS的好处是通过克服视线(LoS)链路的物理障碍来提高无线通信的覆盖范围和频谱效率。RIS波束码字(来自预定义码本)的选择过程被表述为DNN,而发射机-接收机对的操作被建模为两个DNN,一个用于自动编码器的编码器(在发射机处),另一个用于其解码器(在接收机处),并考虑了包括中间RIS所引起的在内的信道效应。底层DNN被联合训练以最小化接收机处的符号错误率。数值结果表明,与各种基线方案(不使用RIS,或将RIS波束选择与收发机对的设计相分离)相比,所提出的设计在误差性能上取得了显著增益。 摘要:This paper presents a novel approach for the joint design of a reconfigurable intelligent surface (RIS) and a transmitter-receiver pair that are trained together as a set of deep neural networks (DNNs) to optimize the end-to-end communication performance at the receiver. The RIS is a software-defined array of unit cells that can be controlled in terms of the scattering and reflection profiles to focus the incoming signals from the transmitter to the receiver. The benefit of the RIS is to improve the coverage and spectral efficiency for wireless communications by overcoming physical obstructions of the line-of-sight (LoS) links. The selection process of the RIS beam codeword (out of a pre-defined codebook) is formulated as a DNN, while the operations of the transmitter-receiver pair are modeled as two DNNs, one for the encoder (at the transmitter) and the other one for the decoder (at the receiver) of an autoencoder, by accounting for channel effects including those induced by the RIS in between. The underlying DNNs are jointly trained to minimize the symbol error rate at the receiver. Numerical results show that the proposed design achieves major gains in error performance with respect to various baseline schemes, where no RIS is used or the selection of the RIS beam is separated from the design of the transmitter-receiver pair.

优化|敛散性(4篇)

【1】 Convergence Results For Q-Learning With Experience Replay 标题:带经验回放的Q-学习的收敛性结果 链接:https://arxiv.org/abs/2112.04213

作者:Liran Szlak,Ohad Shamir 摘要:RL中一种常用的启发式方法是经验重放(例如~\citet{lin1993reinforcement, mnih2015human}),在这种方法中,学习者存储并重复使用过去的轨迹,就像在线采样一样。在这项工作中,我们开始在表格Q-学习环境中对这种启发式进行严格的研究。我们提供了一个收敛速度保证,并讨论了它如何与Q-学习的收敛性进行比较,这取决于重要参数,如重放迭代的频率和次数。通过引入和分析一类简单的MDP,我们还提供了理论证据,表明我们何时可以期望这种启发式严格提高性能。最后,我们提供了一些实验来支持我们的理论发现。 摘要:A commonly used heuristic in RL is experience replay (e.g.~\citet{lin1993reinforcement, mnih2015human}), in which a learner stores and re-uses past trajectories as if they were sampled online. In this work, we initiate a rigorous study of this heuristic in the setting of tabular Q-learning. We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of replay iterations. We also provide theoretical evidence showing when we might expect this heuristic to strictly improve performance, by introducing and analyzing a simple class of MDPs. Finally, we provide some experiments to support our theoretical findings.
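下面用 numpy 给出"表格型 Q-learning + 经验重放"的最小示意:5 状态链式 MDP,每回合结束后从缓冲区均匀采样若干条转移做更新。环境与超参数均为笔者假设,仅演示论文所分析的启发式本身。

import numpy as np

n_states, n_actions = 5, 2                  # 链式 MDP:向右走到终点得 +1
Q = np.zeros((n_states, n_actions))
buffer, rng = [], np.random.default_rng(0)
alpha, gamma, eps = 0.1, 0.95, 0.2

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        buffer.append((s, a, r, s2, done))   # 把轨迹存入重放缓冲区
        s = s2
    # 每回合结束后做若干次重放更新,就像这些转移是在线采样的一样
    for _ in range(32):
        s0, a0, r0, s1, d = buffer[rng.integers(len(buffer))]
        target = r0 + (0.0 if d else gamma * np.max(Q[s1]))
        Q[s0, a0] += alpha * (target - Q[s0, a0])

print("学到的 Q 表:\n", np.round(Q, 2))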

【2】 Pretrained Cost Model for Distributed Constraint Optimization Problems 标题:分布式约束优化问题的预训练成本模型 链接:https://arxiv.org/abs/2112.04187

作者:Yanchen Deng,Shufeng Kong,Bo An 备注:Accepted by AAAI-22 摘要:分布式约束优化问题(DCOP)是组合优化问题的一个重要子类,其中信息和控制分布在多个自治代理之间。以前,机器学习(ML)主要通过学习有效的启发式算法来解决组合优化问题。然而,现有的基于ML的启发式方法往往不能推广到不同的搜索算法。最重要的是,这些方法通常需要完全了解要解决的问题,这不适用于分布式环境,因为在分布式环境中,由于地理限制或隐私问题,集中化不现实。为了解决通用性问题,我们提出了一种新的DCOP有向无环图表示模式,并利用图注意网络(GAT)嵌入图表示。然后,我们的模型GAT-PCM以离线方式使用最佳标记数据进行预训练,以便构造有效的启发式算法,以促进广泛的DCOP算法,其中评估部分分配的质量至关重要,例如局部搜索或回溯搜索。此外,为了实现分散的模型推理,我们提出了GAT-PCM的分布式嵌入模式,其中每个代理只交换嵌入向量,并展示了其合理性和复杂性。最后,我们将模型与局部搜索或回溯搜索算法相结合,证明了模型的有效性。大量的实证评估表明,GAT PCM增强算法在各种基准测试中显著优于最先进的方法。预训练模型可在以下网址获得:https://github.com/dyc941126/GAT-PCM. 摘要:Distributed Constraint Optimization Problems (DCOPs) are an important subclass of combinatorial optimization problems, where information and controls are distributed among multiple autonomous agents. Previously, Machine Learning (ML) has been largely applied to solve combinatorial optimization problems by learning effective heuristics. However, existing ML-based heuristic methods are often not generalizable to different search algorithms. Most importantly, these methods usually require full knowledge about the problems to be solved, which are not suitable for distributed settings where centralization is not realistic due to geographical limitations or privacy concerns. To address the generality issue, we propose a novel directed acyclic graph representation schema for DCOPs and leverage the Graph Attention Networks (GATs) to embed graph representations. Our model, GAT-PCM, is then pretrained with optimally labelled data in an offline manner, so as to construct effective heuristics to boost a broad range of DCOP algorithms where evaluating the quality of a partial assignment is critical, such as local search or backtracking search. Furthermore, to enable decentralized model inference, we propose a distributed embedding schema of GAT-PCM where each agent exchanges only embedded vectors, and show its soundness and complexity. Finally, we demonstrate the effectiveness of our model by combining it with a local search or a backtracking search algorithm. Extensive empirical evaluations indicate that the GAT-PCM-boosted algorithms significantly outperform the state-of-the-art methods in various benchmarks. The pretrained model is available at https://github.com/dyc941126/GAT-PCM.

【3】 Hyper-parameter optimization based on soft actor critic and hierarchical mixture regularization 标题:基于软演员-评论家和分层混合正则化的超参数优化 链接:https://arxiv.org/abs/2112.04084

作者:Chaoyue Liu,Yulai Zhang 摘要:超参数优化是机器学习中的一个关键问题,因为它的目标是在任何模型中实现最先进的性能。在这一领域已经做出了很大的努力,如随机搜索、网格搜索、贝叶斯优化。在本文中,我们将超参数优化过程建模为一个马尔可夫决策过程,并用强化学习进行处理。提出了一种新的基于软演员-评论家(soft actor critic)和分层混合正则化的超参数优化方法。实验表明,该方法能在较短的时间内获得较好的超参数。 摘要:Hyper-parameter optimization is a crucial problem in machine learning as it aims to achieve the state-of-the-art performance in any model. Great efforts have been made in this field, such as random search, grid search, Bayesian optimization. In this paper, we model hyper-parameter optimization process as a Markov decision process, and tackle it with reinforcement learning. A novel hyper-parameter optimization method based on soft actor critic and hierarchical mixture regularization has been proposed. Experiments show that the proposed method can obtain better hyper-parameters in a shorter time.

【4】 Tailored neural networks for learning optimal value functions in MPC 标题:MPC中学习最优值函数的定制神经网络 链接:https://arxiv.org/abs/2112.03975

作者:Dieter Teichrib,Moritz Schulze Darup 备注:7 pages, 2 figures, 1 table 摘要:基于学习的预测控制是基于优化的预测控制的一种很有前途的替代方法。然而,有效地学习最优控制策略、最优值函数或Q函数需要合适的函数逼近器。通常,人们会考虑使用人工神经网络(ANN),但选择合适的拓扑结构也很重要。在这种背景下,最近的研究表明,定制的人工神经网络原则上可以利用其分段仿射结构准确地描述线性预测控制中的最优控制策略。在本文中,我们提供了一个类似的结果来表示最优值函数和Q函数,这两个函数都是线性MPC的分段二次函数。 摘要:Learning-based predictive control is a promising alternative to optimization-based MPC. However, efficiently learning the optimal control policy, the optimal value function, or the Q-function requires suitable function approximators. Often, artificial neural networks (ANN) are considered but choosing a suitable topology is also non-trivial. Against this background, it has recently been shown that tailored ANN allow, in principle, to exactly describe the optimal control policy in linear MPC by exploiting its piecewise affine structure. In this paper, we provide a similar result for representing the optimal value function and the Q-function that are both known to be piecewise quadratic for linear MPC.

预测|估计(3篇)

【1】 Non parametric estimation of causal populations in a counterfactual scenario 标题:反事实情景下因果总体的非参数估计 链接:https://arxiv.org/abs/2112.04288

作者:Celine Beji,Florian Yger,Jamal Atif 摘要:在因果关系中,在没有混淆推理的情况下估计治疗的效果仍然是一个主要问题,因为需要在有和没有治疗的情况下评估结果。由于不能同时观察这两种情况,潜在结果的估计仍然是一项具有挑战性的任务。我们提出了一种创新方法,将问题重新表述为缺失数据模型。目的是估计因果人群的隐藏分布,定义为治疗和结果的函数。因果自动编码器(CAE)通过对治疗和结果信息的先验依赖性增强,将潜在空间同化为目标人群的概率分布。这些特征在被缩减到一个潜在空间后被重建,并被网络中间层引入的包含治疗和结果信息的掩模所约束。 摘要:In causality, estimating the effect of a treatment without confounding inference remains a major issue because requires to assess the outcome in both case with and without treatment. Not being able to observe simultaneously both of them, the estimation of potential outcome remains a challenging task. We propose an innovative approach where the problem is reformulated as a missing data model. The aim is to estimate the hidden distribution of \emph{causal populations}, defined as a function of treatment and outcome. A Causal Auto-Encoder (CAE), enhanced by a prior dependent on treatment and outcome information, assimilates the latent space to the probability distribution of the target populations. The features are reconstructed after being reduced to a latent space and constrained by a mask introduced in the intermediate layer of the network, containing treatment and outcome information.

【2】 Vision-Cloud Data Fusion for ADAS: A Lane Change Prediction Case Study 标题:面向ADAS的视云数据融合:车道变化预测实例研究 链接:https://arxiv.org/abs/2112.04042

作者:Yongkang Liu,Ziran Wang,Kyungtae Han,Zhenyu Shou,Prashant Tiwari,John H. L. Hansen 备注:Published on IEEE Transactions on Intelligent Vehicles 摘要:随着智能车辆和高级驾驶员辅助系统(ADAS)的快速发展,一个新的趋势是交通系统将涉及混合级别的驾驶员参与。因此,在这种情况下,为驾驶员提供必要的视觉指导对于预防潜在风险至关重要。为了推动视觉引导系统的发展,我们引入了一种新的视觉云数据融合方法,将云中的摄像机图像和数字孪生信息集成在一起,以帮助智能车辆做出更好的决策。目标车辆边界框是在目标检测器(在ego车辆上运行)和位置信息(从云端接收)的帮助下绘制和匹配的。以深度图像作为附加特征源,在交并比(IoU)阈值取0.7时,最佳匹配结果的准确率达到79.2%。通过对车道变化预测的实例分析,验证了所提出的数据融合方法的有效性。在案例研究中,提出了一种多层感知器算法和改进的车道变化预测方法。从Unity game engine获得的人在回路仿真结果表明,所提出的模型可以在安全性、舒适性和环境可持续性方面显著改善公路驾驶性能。 摘要:With the rapid development of intelligent vehicles and Advanced Driver-Assistance Systems (ADAS), a new trend is that mixed levels of human driver engagements will be involved in the transportation system. Therefore, necessary visual guidance for drivers is vitally important under this situation to prevent potential risks. To advance the development of visual guidance systems, we introduce a novel vision-cloud data fusion methodology, integrating camera image and Digital Twin information from the cloud to help intelligent vehicles make better decisions. Target vehicle bounding box is drawn and matched with the help of the object detector (running on the ego-vehicle) and position information (received from the cloud). The best matching result, a 79.2% accuracy under 0.7 intersection over union threshold, is obtained with depth images served as an additional feature source. A case study on lane change prediction is conducted to show the effectiveness of the proposed data fusion methodology. In the case study, a multi-layer perceptron algorithm is proposed with modified lane change prediction approaches. Human-in-the-loop simulation results obtained from the Unity game engine reveal that the proposed model can improve highway driving performance significantly in terms of safety, comfort, and environmental sustainability.
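摘要中"0.7 交并比阈值下的匹配"可以用如下纯 Python 示意:检测框与由云端位置信息投影得到的框两两计算 IoU,超过阈值即视为同一目标。框坐标与阈值用法均为笔者为演示而假设。

def iou(box_a, box_b):
    """box: (x1, y1, x2, y2)。计算两个边界框的交并比。"""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

# 匹配示意:IoU >= 0.7 视为同一目标
detections = [(10, 10, 50, 60), (100, 40, 160, 120)]        # 本车检测到的框
cloud_boxes = {"vehicle_A": (12, 8, 52, 58), "vehicle_B": (300, 40, 360, 120)}
for name, cb in cloud_boxes.items():
    best = max(detections, key=lambda d: iou(d, cb))
    matched = iou(best, cb) >= 0.7
    print(name, "matched" if matched else "unmatched", round(iou(best, cb), 2))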

【3】 Estimation in Rotationally Invariant Generalized Linear Models via Approximate Message Passing 标题:旋转不变广义线性模型的近似消息传递估计 链接:https://arxiv.org/abs/2112.04330

作者:Ramji Venkataramanan,Kevin Kögler,Marco Mondelli 备注:31 pages, 4 figures 摘要:我们考虑通过旋转不变的设计矩阵定义的广义线性模型中的信号估计问题。由于这些矩阵可以具有任意的谱分布,因此该模型非常适合捕捉应用中经常出现的复杂相关结构。我们提出了一系列新的近似消息传递(AMP)信号估计算法,并通过状态演化递归严格描述了它们在高维极限下的性能。假设已知设计矩阵的谱,我们的旋转不变AMP与现有高斯矩阵AMP的复杂度同阶;作为特例,它还能恢复现有的AMP。数值结果表明,该算法的性能接近于向量AMP(在某些情况下被认为是Bayes最优的),但其复杂度要低得多,因为该算法不需要计算昂贵的奇异值分解。 摘要:We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices. Since these matrices can have an arbitrary spectral distribution, this model is well suited to capture complex correlation structures which often arise in applications. We propose a novel family of approximate message passing (AMP) algorithms for signal estimation, and rigorously characterize their performance in the high-dimensional limit via a state evolution recursion. Assuming knowledge of the design matrix spectrum, our rotationally invariant AMP has complexity of the same order as the existing AMP for Gaussian matrices; it also recovers the existing AMP as a special case. Numerical results showcase a performance close to Vector AMP (which is conjectured to be Bayes-optimal in some settings), but obtained with a much lower complexity, as the proposed algorithm does not require a computationally expensive singular value decomposition.

其他神经网络|深度学习|模型|建模(18篇)

【1】 MLP Architectures for Vision-and-Language Modeling: An Empirical Study 标题:用于视觉和语言建模的MLP体系结构:一项实证研究 链接:https://arxiv.org/abs/2112.04453

作者:Yixin Nie,Linjie Li,Zhe Gan,Shuohang Wang,Chenguang Zhu,Michael Zeng,Zicheng Liu,Mohit Bansal,Lijuan Wang 备注:15 pages 摘要:我们开展了第一个关于使用MLP架构进行视觉和语言(VL)融合的实证研究。通过对5个VL任务和5个稳健的VQA基准的大量实验,我们发现:(i)在没有预训练的情况下,使用MLP进行多模态融合与Transformer相比有明显的性能差距;(ii)然而,VL预训练有助于缩小性能差距;(iii)在MLP上增加微小的单头注意力,而不是繁重的多头注意力,足以实现与Transformer相当的性能。此外,我们还发现,当在更难的鲁棒VQA基准上进行评估时,MLP和Transformer之间的性能差距并未扩大,这表明将MLP用于VL融合可以大致推广到与使用Transformer类似的程度。这些结果提示,MLP可以有效地学习对齐从低级编码器中提取的视觉和文本特征,而无需严重依赖自注意力。基于此,我们提出了一个更大胆的问题:我们是否可以有一个用于VL建模的全MLP架构,其中VL融合和视觉编码器都被MLP取代?我们的结果表明,当两个模型都经过预训练时,与最先进的全功能VL模型相比,全MLP VL模型是次优的。然而,与未经预训练的全功能Transformer模型相比,全MLP的预训练平均得分更高。这表明了大规模预训练MLP类体系结构用于VL建模的潜力,并启发了未来的研究方向,即以较少的归纳设计偏差简化已建立的VL建模。我们的代码可在以下网站公开获取:https://github.com/easonnie/mlp-vil 摘要:We initiate the first empirical study on the use of MLP architectures for vision-and-language (VL) fusion. Through extensive experiments on 5 VL tasks and 5 robust VQA benchmarks, we find that: (i) Without pre-training, using MLPs for multimodal fusion has a noticeable performance gap compared to transformers; (ii) However, VL pre-training can help close the performance gap; (iii) Instead of heavy multi-head attention, adding tiny one-head attention to MLPs is sufficient to achieve comparable performance to transformers. Moreover, we also find that the performance gap between MLPs and transformers is not widened when being evaluated on the harder robust VQA benchmarks, suggesting using MLPs for VL fusion can generalize roughly to a similar degree as using transformers. These results hint that MLPs can effectively learn to align vision and text features extracted from lower-level encoders without heavy reliance on self-attention. Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs? Our result shows that an all-MLP VL model is sub-optimal compared to state-of-the-art full-featured VL models when both of them get pre-trained. However, pre-training an all-MLP can surprisingly achieve a better average score than full-featured transformer models without pre-training. This indicates the potential of large-scale pre-training of MLP-like architectures for VL modeling and inspires the future research direction on simplifying well-established VL modeling with less inductive design bias. Our code is publicly available at: https://github.com/easonnie/mlp-vil
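下面用 PyTorch 给出"在 MLP 融合中加入微小单头注意力"这一设计的示意:视觉与文本 token 拼接后先过一层单头注意力、再过 MLP。维度与残差连接方式均为笔者假设,并非论文的确切结构。

import torch
import torch.nn as nn

class MLPFusionWithTinyAttention(nn.Module):
    """MLP 融合 + 单头注意力的示意:两种模态的 token 只交互一次。"""
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))

    def forward(self, vis_tokens, txt_tokens):
        x = torch.cat([vis_tokens, txt_tokens], dim=1)       # (B, Nv+Nt, dim)
        x = x + self.attn(x, x, x, need_weights=False)[0]    # 单头自注意力 + 残差
        return x + self.mlp(x)                               # MLP + 残差

fusion = MLPFusionWithTinyAttention()
out = fusion(torch.randn(2, 49, 256), torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 65, 256])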

【2】 Improving language models by retrieving from trillions of tokens 标题:通过从数万亿个令牌中检索来改进语言模型 链接:https://arxiv.org/abs/2112.04426

作者:Sebastian Borgeaud,Arthur Mensch,Jordan Hoffmann,Trevor Cai,Eliza Rutherford,Katie Millican,George van den Driessche,Jean-Baptiste Lespiau,Bogdan Damoc,Aidan Clark,Diego de Las Casas,Aurelia Guy,Jacob Menick,Roman Ring,Tom Hennigan,Saffron Huang,Loren Maggiore,Chris Jones,Albin Cassirer,Andy Brock,Michela Paganini,Geoffrey Irving,Oriol Vinyals,Simon Osindero,Karen Simonyan,Jack W. Rae,Erich Elsen,Laurent Sifre 摘要:我们通过基于与前面标记的局部相似性,对从大型语料库检索到的文档块进行条件化处理来增强自回归语言模型。借助一个包含2万亿token的数据库,我们的检索增强型Transformer(RETRO)在Pile上获得了与GPT-3和Jurassic-1相当的性能,尽管使用的参数少25倍。经过微调后,RETRO性能将转化为下游知识密集型任务,如问答。RETRO结合了一个冻结的Bert检索器、一个可微编码器和一个分块交叉注意机制,根据比训练期间通常消耗的数据多一个数量级的数据来预测令牌。我们通常从头开始训练RETRO,但也可以通过检索快速改装预先训练过的Transformer,并且仍然获得良好的性能。我们的工作以前所未有的规模通过显式记忆为改进语言模型开辟了新的途径。 摘要:We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
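RETRO 的核心步骤之一是按嵌入相似度从海量数据库中为每个 chunk 检索近邻。下面用 numpy 给出这一步的玩具示意(嵌入为随机向量、库规模极小;真实系统使用冻结 BERT 嵌入与大规模近似最近邻索引,此处的库与函数名均为笔者假设):

import numpy as np

rng = np.random.default_rng(0)
db_embeddings = rng.normal(size=(10000, 64)).astype(np.float32)   # 冻结检索器产出的库
db_embeddings /= np.linalg.norm(db_embeddings, axis=1, keepdims=True)

def retrieve(chunk_embedding, k=4):
    """按余弦相似度取 top-k 近邻 chunk 的索引。"""
    q = chunk_embedding / np.linalg.norm(chunk_embedding)
    scores = db_embeddings @ q
    return np.argsort(-scores)[:k]

neighbors = retrieve(rng.normal(size=64).astype(np.float32))
print("检索到的邻居 chunk:", neighbors)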

【3】 Mixed Membership Distribution-Free model 标题:混合成员分布自由模型 链接:https://arxiv.org/abs/2112.04389

作者:Huan Qing 备注:15 pages, 7 figures, comments are welcome 摘要:考虑混合成员加权网络中的潜在社团信息的检测问题,其中节点具有混合成员,节点之间的连接可以是有限实数。针对这个问题,我们提出了一个通用的混合成员分布自由模型。该模型没有边的分布约束,只有期望值,可以看作是以前一些模型的推广。我们使用一种有效的谱算法来估计模型下的社区成员。我们还利用精细谱分析推导了该算法在该模型下的收敛速度。我们展示了混合成员分布自由模型的优势,并将其应用于小规模的模拟网络,当边遵循不同的分布时。 摘要:We consider the problem of detecting latent community information of mixed membership weighted network in which nodes have mixed memberships and edges connecting between nodes can be finite real numbers. We propose a general mixed membership distribution-free model for this problem. The model has no distribution constraints of edges but only the expected values, and can be viewed as generalizations of some previous models. We use an efficient spectral algorithm to estimate community memberships under the model. We also derive the convergence rate of the proposed algorithm under the model using delicate spectral analysis. We demonstrate the advantages of mixed membership distribution-free model with applications to a small scale of simulated networks when edges follow different distributions.

【4】 Player Modeling using Behavioral Signals in Competitive Online Games 标题:竞技性在线游戏中基于行为信号的玩家建模 链接:https://arxiv.org/abs/2112.04379

作者:Arman Dehpanah,Muheeb Faizan Ghori,Jonathan Gemmell,Bamshad Mobasher 备注:Accepted in the 2021 International Conference on Computational Science and Computational Intelligence (CSCI'21) 摘要:竞争性在线游戏使用评级系统来匹配具有类似技能的玩家,以确保玩家获得满意的体验。在本文中,我们重点关注在为玩家建模以创建比赛时解决游戏行为不同方面的重要性。为此,我们从75000多个battle royale比赛数据集中设计了几个行为特征,并根据检索到的特征创建玩家模型。然后,我们使用创建的模型来预测数据中不同玩家组的排名。将预测的等级与三种流行评级系统的等级进行比较。我们的结果显示了简单行为模型优于主流评级系统。一些行为特征为所有玩家组提供了准确的预测,而另一些则被证明对某些玩家组有用。这项研究的结果强调了在分配任务时考虑玩家行为的不同方面的必要性,如目标、策略和专业知识。 摘要:Competitive online games use rating systems to match players with similar skills to ensure a satisfying experience for players. In this paper, we focus on the importance of addressing different aspects of playing behavior when modeling players for creating match-ups. To this end, we engineer several behavioral features from a dataset of over 75,000 battle royale matches and create player models based on the retrieved features. We then use the created models to predict ranks for different groups of players in the data. The predicted ranks are compared to those of three popular rating systems. Our results show the superiority of simple behavioral models over mainstream rating systems. Some behavioral features provided accurate predictions for all groups of players while others proved useful for certain groups of players. The results of this study highlight the necessity of considering different aspects of the player's behavior such as goals, strategy, and expertise when making assignments.

【5】 Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks 标题:迭代恢复算法的神经网络推广误差界 链接:https://arxiv.org/abs/2112.04364

作者:Ekkehard Schnoor,Arash Behboodi,Holger Rauhut 备注:29 pages, 6 figures 摘要:受学习迭代软阈值算法(LISTA)的启发,我们介绍了一类适用于从少量线性测量值进行稀疏重建的神经网络。通过允许各层之间广泛程度的权重共享,我们能够对非常不同的神经网络类型进行统一分析,从递归神经网络到更类似于标准前馈神经网络的网络。基于训练样本,通过经验风险最小化,我们的目标是学习最佳网络参数,从而获得从低维线性测量重构信号的最佳网络。我们通过分析由这样的深度网络组成的假设类的Rademacher复杂度得出了推广界,同时也考虑了阈值参数。我们得到了样本复杂度的估计值,该估计值本质上仅与参数数量和深度成线性关系。我们应用我们的主要结果来获得几个实际例子的特定泛化边界,包括(隐式)字典学习的不同算法和卷积神经网络。 摘要:Motivated by the learned iterative soft thresholding algorithm (LISTA), we introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements. By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types, ranging from recurrent ones to networks more similar to standard feedforward neural networks. Based on training samples, via empirical risk minimization we aim at learning the optimal network parameters and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks, that also take into account the thresholding parameters. We obtain estimates of the sample complexity that essentially depend only linearly on the number of parameters and on the depth. We apply our main result to obtain specific generalization bounds for several practical examples, including different algorithms for (implicit) dictionary learning, and convolutional neural networks.
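摘要所依据的 LISTA 把迭代软阈值算法展开为网络层。下面给出一个跨层共享权重的 PyTorch 示意(问题规模、层数与阈值初值均为笔者假设),它正是文中"权重共享程度可调"谱系的一个端点:

import torch
import torch.nn as nn

class LISTA(nn.Module):
    """学习型迭代软阈值(LISTA)的示意,这里各层共享 W 与 S。"""
    def __init__(self, m, n, n_layers=8):
        super().__init__()
        self.W = nn.Linear(m, n, bias=False)       # 对应 A^T 的可学习版本
        self.S = nn.Linear(n, n, bias=False)       # 对应 I - A^T A 的可学习版本
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))  # 每层的软阈值
        self.n_layers = n_layers

    def forward(self, y):
        x = torch.zeros(y.shape[0], self.S.in_features, device=y.device)
        b = self.W(y)
        for t in range(self.n_layers):
            z = b + self.S(x)
            x = torch.sign(z) * torch.relu(z.abs() - self.theta[t])  # 软阈值
        return x

m, n = 32, 128
A = torch.randn(m, n) / m ** 0.5
x_true = torch.randn(64, n) * (torch.rand(64, n) < 0.1)   # 稀疏信号
y = x_true @ A.T
model = LISTA(m, n)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):
    loss = ((model(y) - x_true) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("训练后重建 MSE:", loss.item())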

【6】 Deep Learning and Mathematical Intuition: A Review of (Davies et al. 2021) 标题:深度学习与数学直觉:戴维斯等人(Davies et al.2021年) 链接:https://arxiv.org/abs/2112.04324

作者:Ernest Davis 摘要:Davies等人(2021年)最近的一篇论文描述了深度学习(DL)技术是如何被用来发现可能的假设,这些假设导致了两个原始的数学结果:一个是在纽结理论中,一个是在表示论中。在这里,我认为DL技术应用于数学的重要性和新颖性在所审查的论文中被大大夸大了,在大众科学媒体的一些报道中更是被严重夸大。在纽结理论的结果中,DL的作用很小,传统的统计分析可能已经足够了。在表示论结果中,DL的作用更大;然而,它与几十年来在实验数学中所做的并没有太大区别。此外,目前还不清楚使DL在此处有用的那些显著特征是否能在广泛的数学问题中适用。最后,我认为"DL在此引导了人类直觉"这一说法既无帮助又有误导性;DL主要做的是把许多可能的猜想标记为错误,并把少数其他猜想标记为可能值得研究。当然,表示论的结果代表了DL在数学研究中的一个新颖而有趣的应用,但其更大的意义尚不确定。 摘要:A recent paper by Davies et al (2021) describes how deep learning (DL) technology was used to find plausible hypotheses that have led to two original mathematical results: one in knot theory, one in representation theory. I argue here that the significance and novelty of this application of DL technology to mathematics is significantly overstated in the paper under review and has been wildly overstated in some of the accounts in the popular science press. In the knot theory result, the role of DL was small, and a conventional statistical analysis would probably have sufficed. In the representation theory result, the role of DL is much larger; however, it is not very different in kind from what has been done in experimental mathematics for decades. Moreover, it is not clear whether the distinctive features of DL that make it useful here will apply across a wide range of mathematical problems. Finally, I argue that the claim that the DL here "guides human intuition" is unhelpful and misleading; what the DL primarily does is to mark many possible conjectures as false and a few others as possibly worthy of study. Certainly the representation theory result represents an original and interesting application of DL to mathematical research, but its larger significance is uncertain.

【7】 FastSGD: A Fast Compressed SGD Framework for Distributed Machine Learning 标题:FastSGD:一种面向分布式机器学习的快速压缩SGD框架 链接:https://arxiv.org/abs/2112.04291

作者:Keyu Yang,Lu Chen,Zhihao Zeng,Yunjun Gao 摘要:随着大数据量的快速增长,分布式机器学习(ML)被广泛应用于大规模模型的训练。随机梯度下降(SGD)可以说是ML的主要算法。由SGD训练的分布式ML模型涉及大量的梯度通信,这限制了分布式ML的可扩展性。因此,压缩梯度对于减少通信非常重要。在本文中,我们提出了一种用于分布式ML的快速压缩SGD框架FastSGD。为了以低成本实现高压缩比,FastSGD将梯度表示为键值对,并以线性时间复杂度压缩梯度键和值。对于梯度值压缩,FastSGD首先使用倒数映射器将原始值转换为倒数值,然后利用对数量化将倒数值进一步减少为小整数。最后,FastSGD滤波器将梯度整数减少一个给定的阈值。对于梯度密钥压缩,FastSGD提供了一种自适应细粒度增量编码方法来存储具有较少比特的梯度密钥。在实际ML模型和数据集上的大量实验表明,与最先进的方法相比,FastSGD实现了高达4个数量级的压缩比,并将收敛时间加快了8倍。 摘要:With the rapid increase of big data, distributed Machine Learning (ML) has been widely applied in training large-scale models. Stochastic Gradient Descent (SGD) is arguably the workhorse algorithm of ML. Distributed ML models trained by SGD involve large amounts of gradient communication, which limits the scalability of distributed ML. Thus, it is important to compress the gradients for reducing communication. In this paper, we propose FastSGD, a Fast compressed SGD framework for distributed ML. To achieve a high compression ratio at a low cost, FastSGD represents the gradients as key-value pairs, and compresses both the gradient keys and values in linear time complexity. For the gradient value compression, FastSGD first uses a reciprocal mapper to transform original values into reciprocal values, and then, it utilizes a logarithm quantization to further reduce reciprocal values to small integers. Finally, FastSGD filters reduced gradient integers by a given threshold. For the gradient key compression, FastSGD provides an adaptive fine-grained delta encoding method to store gradient keys with fewer bits. Extensive experiments on practical ML models and datasets demonstrate that FastSGD achieves the compression ratio up to 4 orders of magnitude, and accelerates the convergence time up to 8x, compared with state-of-the-art methods.
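按摘要描述的流程(阈值过滤、倒数映射、对数量化为小整数、键做增量编码),下面给出一个 numpy 示意。各常数与编码细节均为笔者假设,仅演示压缩/解压的流水线,而非论文实现。

import numpy as np

def compress(grad, threshold=1e-3):
    """FastSGD 思想的示意:阈值过滤 -> 倒数映射 -> 对数量化,键做增量编码。"""
    keys = np.nonzero(np.abs(grad) > threshold)[0]      # 过滤小梯度
    vals = grad[keys]
    recip = 1.0 / vals                                  # 倒数映射:大梯度 -> 小倒数
    sign = np.sign(recip).astype(np.int8)
    q = np.round(np.log2(np.abs(recip))).astype(np.int8)  # 对数量化为小整数
    delta_keys = np.diff(keys, prepend=0)               # 键的增量编码,可用更少比特存储
    return delta_keys, sign, q

def decompress(delta_keys, sign, q, dim):
    keys = np.cumsum(delta_keys)                        # 由增量还原键
    grad = np.zeros(dim)
    grad[keys] = 1.0 / (sign * (2.0 ** q.astype(np.float64)))
    return grad

g = np.random.default_rng(0).normal(scale=0.01, size=1000)
packed = compress(g)
g_hat = decompress(*packed, dim=g.size)
print("压缩后非零项:", packed[0].size,
      " 重建相对误差:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))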

【8】 Learning over All Stabilizing Nonlinear Controllers for a Partially-Observed Linear System 标题:部分观测线性系统的全镇定非线性控制器学习 链接:https://arxiv.org/abs/2112.04219

作者:Ruigang Wang,Nicholas Barbara,Max Revay,Ian R. Manchester 备注:submitted to L4DC-2022 摘要:基于最近发展起来的一类称为循环平衡网络(REN)的神经网络和Youla参数化的非线性版本,我们提出了一种线性动力系统非线性输出反馈控制器的参数化方法。我们的方法保证了部分可观测线性动力系统的闭环稳定性,而不需要满足任何约束条件。这大大简化了模型拟合,因为在保持稳定性的同时,可以应用任何无约束优化程序。我们在强化学习任务上用精确梯度和近似梯度两种方法演示了我们的方法。仿真研究表明,在相同的问题设置下,我们的方法具有更强的可扩展性,并且明显优于其他方法。 摘要:We propose a parameterization of nonlinear output feedback controllers for linear dynamical systems based on a recently developed class of neural network called the recurrent equilibrium network (REN), and a nonlinear version of the Youla parameterization. Our approach guarantees the closed-loop stability of partially observable linear dynamical systems without requiring any constraints to be satisfied. This significantly simplifies model fitting as any unconstrained optimization procedure can be applied whilst still maintaining stability. We demonstrate our method on reinforcement learning tasks with both exact and approximate gradient methods. Simulation studies show that our method is significantly more scalable and significantly outperforms other approaches in the same problem setting.

【9】 Contrastive Instruction-Trajectory Learning for Vision-Language Navigation 标题:视觉语言导航的对比教学-轨迹学习 链接:https://arxiv.org/abs/2112.04138

作者:Xiwen Liang,Fengda Zhu,Yi Zhu,Bingqian Lin,Bing Wang,Xiaodan Liang 备注:Accepted by AAAI 2022 摘要:视觉语言导航(VLN)任务要求agent在自然语言指令的指导下到达目标。以前的作品学习按照指示一步一步地导航。然而,这些工作可能无法区分指令轨迹对之间的相似性和差异,并且忽略了子指令的时间连续性。这些问题阻碍了代理学习独特的视觉和语言表示,损害了导航策略的健壮性和通用性。在本文中,我们提出了一个对比指令轨迹学习(CITL)框架,该框架探索了相似数据样本之间的不变性和不同数据样本之间的差异,以学习鲁棒导航的独特表示。具体而言,我们提出:(1)粗粒度对比学习目标,通过对比全轨迹观察和指令的语义来增强视觉和语言表征;(2) 利用子指令的时间信息感知指令的细粒度对比学习目标;(3) 一种用于对比学习的成对样本重加权机制,用于挖掘硬样本,从而减轻对比学习中数据采样偏差的影响。我们的CITL可以轻松地与VLN主干集成,形成新的学习范式,并在看不见的环境中实现更好的通用性。大量实验表明,使用CITL的模型在R2R、R4R和RxR上优于以前的最新方法。 摘要:The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction. Previous works learn to navigate step-by-step following an instruction. However, these works may fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions. These problems hinder agents from learning distinctive vision-and-language representations, harming the robustness and generalizability of the navigation policy. In this paper, we propose a Contrastive Instruction-Trajectory Learning (CITL) framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation. Specifically, we propose: (1) a coarse-grained contrastive learning objective to enhance vision-and-language representations by contrasting semantics of full trajectory observations and instructions, respectively; (2) a fine-grained contrastive learning objective to perceive instructions by leveraging the temporal information of the sub-instructions; (3) a pairwise sample-reweighting mechanism for contrastive learning to mine hard samples and hence mitigate the influence of data sampling bias in contrastive learning. Our CITL can be easily integrated with VLN backbones to form a new learning paradigm and achieve better generalizability in unseen environments. Extensive experiments show that the model with CITL surpasses the previous state-of-the-art methods on R2R, R4R, and RxR.
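CITL 的粗、细两个粒度的对比目标都可以看作 InfoNCE 形式的损失。下面给出通用 InfoNCE 的 PyTorch 示意(温度、批大小均为假设):把配对的指令-轨迹表示拉近、其余推远;论文在此之上还加了成对样本重加权与难样本挖掘。

import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE 对比损失:anchor 与 positive 为配对表示,negatives 为负样本集合。"""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    l_pos = (a * pos).sum(-1, keepdim=True) / tau          # (B, 1) 正样本相似度
    l_neg = a @ neg.T / tau                                # (B, N) 负样本相似度
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(a.size(0), dtype=torch.long)      # 正样本在第 0 列
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128), torch.randn(32, 128))
print(loss.item())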

【10】 Uncovering the Local Hidden Community Structure in Social Networks 标题:揭示社会网络中的局部隐藏社区结构 链接:https://arxiv.org/abs/2112.04100

作者:Meng Wang,Boyu Li,Kun He,John E. Hopcroft 备注:22 pages, 9 figures, submitted to a journal 摘要:隐藏社区是最近提出的一个有用的社会网络分析概念。为了应对网络规模的快速增长,在这项工作中,我们从局部的角度探索隐藏社区的检测,并提出了一种新的方法,该方法在从原始网络采样的子图上迭代检测和提升每一层。我们首先基于改进的局部谱方法从单个种子节点扩展种子集,并检测出一个初始的优势局部群体。然后,我们临时删除该社区的成员以及它们与其他节点的连接,并检测剩余子图中的所有邻域社区,包括一些仅包含原始网络中一小部分成员的“断开的社区”。本地社区和邻里社区形成了一个主导层,通过减少这些社区内的边缘权重,我们削弱了该层的结构,以揭示隐藏层。最后,我们重复整个过程,所有包含种子节点的社区都可以被检测到并迭代地提升。我们从理论上证明,我们的方法可以避免一些情况,即一个破碎的社区和局部社区被视为子图中的一个社区,从而导致由全局隐藏社区检测方法导致的检测不准确。大量的实验表明,我们的方法可以显著优于设计用于全局隐藏社区检测或多个局部社区检测的最新基线。 摘要:Hidden community is a useful concept proposed recently for social network analysis. To handle the rapid growth of network scale, in this work, we explore the detection of hidden communities from the local perspective, and propose a new method that detects and boosts each layer iteratively on a subgraph sampled from the original network. We first expand the seed set from a single seed node based on our modified local spectral method and detect an initial dominant local community. Then we temporarily remove the members of this community as well as their connections to other nodes, and detect all the neighborhood communities in the remaining subgraph, including some "broken communities" that only contain a fraction of members in the original network. The local community and neighborhood communities form a dominant layer, and by reducing the edge weights inside these communities, we weaken this layer's structure to reveal the hidden layers. Eventually, we repeat the whole process and all communities containing the seed node can be detected and boosted iteratively. We theoretically show that our method can avoid some situations that a broken community and the local community are regarded as one community in the subgraph, leading to the inaccuracy on detection which can be caused by global hidden community detection methods. Extensive experiments show that our method could significantly outperform the state-of-the-art baselines designed for either global hidden community detection or multiple local community detection.

【11】 The Effect of Model Size on Worst-Group Generalization 标题:模型规模对最差组泛化的影响 链接:https://arxiv.org/abs/2112.04094

作者:Alan Pham,Eunice Chan,Vikranth Srivatsa,Dhruba Ghosh,Yaoqing Yang,Yaodong Yu,Ruiqi Zhong,Joseph E. Gonzalez,Jacob Steinhardt 备注:The first four authors contributed equally to the work 摘要:在子群信息已知的各种设置下,过度参数化会导致罕见子群的测试精度较低。为了获得更完整的图景,我们考虑子群信息未知的情形。我们研究了在经验风险最小化(ERM)条件下,模型大小对最差群体泛化的影响,包括:1)架构(ResNet、VGG或BERT),2)领域(视觉或自然语言处理),3)模型大小(宽度或深度),以及4)初始化(预训练或随机权重)。我们的系统评估表明,模型尺寸的增加不会影响,并且可能有助于在所有设置中ERM下最差的组测试性能。特别是,增加预训练模型的大小,可以持续提高在Waterbirds和MultiNLI上的性能。当子组标签未知时,我们建议从业者使用更大的预训练模型。 摘要:Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architectures (ResNet, VGG, or BERT), 2) domains (vision or natural language processing), 3) model size (width or depth), and 4) initialization (with pre-trained or random weights). Our systematic evaluation reveals that increasing model size does not hurt, and may help, worst-group test performance under ERM across all setups. In particular, increasing pre-trained model size consistently improves performance on Waterbirds and MultiNLI. We advise practitioners to use larger pre-trained models when subgroup labels are unknown.

【12】 KoopmanizingFlows: Diffeomorphically Learning Stable Koopman Operators 标题:KoopmanizingFlows:微分学习稳定的Koopman算子 链接:https://arxiv.org/abs/2112.04085

作者:Petar Bevanda,Max Beier,Sebastian Kerz,Armin Lederer,Stefan Sosnowski,Sandra Hirche 备注:Submitted to the 4th Annual Learning for Dynamics & Control Conference 摘要:我们提出了一个新的框架,用于构造一类稳定非线性动力学的库普曼算子数据驱动表示的线性时不变(LTI)模型。Koopman算子(生成器)将有限维非线性系统提升到可能无限维的线性特征空间。为了利用它进行建模,需要发现Koopman算子的有限维表示。学习合适的特征具有挑战性,因为人们需要学习库普曼不变(在动态下线性演化)以及相关(跨越原始状态)的LTI特征——这是一项通常无监督的学习任务。为了从理论上很好地解决这个问题,我们建议通过组成一个具有潜在线性模型的提升聚合系统的微分同胚学习器来学习Koopman不变坐标。使用稳定矩阵的无约束参数化以及上述特征构造,我们学习Koopman算子特征,无需假设预定义的函数库或了解谱,同时确保稳定性,而不考虑算子近似精度。在著名的LASA手写数据集上,与最先进的方法相比,我们证明了该方法的优越性。 摘要:We propose a novel framework for constructing linear time-invariant (LTI) models for data-driven representations of the Koopman operator for a class of stable nonlinear dynamics. The Koopman operator (generator) lifts a finite-dimensional nonlinear system to a possibly infinite-dimensional linear feature space. To utilize it for modeling, one needs to discover finite-dimensional representations of the Koopman operator. Learning suitable features is challenging, as one needs to learn LTI features that are both Koopman-invariant (evolve linearly under the dynamics) as well as relevant (spanning the original state) - a generally unsupervised learning task. For a theoretically well-founded solution to this problem, we propose learning Koopman-invariant coordinates by composing a diffeomorphic learner with a lifted aggregate system of a latent linear model. Using an unconstrained parameterization of stable matrices along with the aforementioned feature construction, we learn the Koopman operator features without assuming a predefined library of functions or knowing the spectrum, while ensuring stability regardless of the operator approximation accuracy. We demonstrate the superior efficacy of the proposed method in comparison to a state-of-the-art method on the well-known LASA handwriting dataset.
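摘要提到"稳定矩阵的无约束参数化"。下面给出此类参数化中最简单的一种的 PyTorch 示意:把任意方阵按谱范数缩放为收缩矩阵,从而无需任何约束即可保证离散时间线性模型 z_{t+1} = A z_t 稳定。注意这只覆盖收缩矩阵这一子集,并非论文所用的完整参数化,类名与常数均为笔者假设。

import torch
import torch.nn as nn

class ContractiveA(nn.Module):
    """用无约束参数 X 生成谱范数 < 1 的矩阵 A,保证线性动态稳定。"""
    def __init__(self, dim, rho=0.99):
        super().__init__()
        self.X = nn.Parameter(torch.randn(dim, dim))
        self.rho = rho

    def A(self):
        spec = torch.linalg.matrix_norm(self.X, ord=2)   # 谱范数(最大奇异值)
        return self.rho * self.X / (spec + 1e-6)

m = ContractiveA(dim=8)
A = m.A()
print("谱范数(谱半径的上界):", torch.linalg.matrix_norm(A, ord=2).item())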

【13】 A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules 标题:一种可移植的多芯片模块机器学习模型划分方法 链接:https://arxiv.org/abs/2112.04041

作者:Xinfeng Xie,Prakash Prabhu,Ulysse Beaugnon,Phitchaya Mangpo Phothilimthana,Sudip Roy,Azalia Mirhoseini,Eugene Brevdo,James Laudon,Yanqi Zhou 摘要:多芯片模块(MCMS)降低了机器学习(ML)加速器的设计和制造成本,同时将性能和能量效率与单片大芯片相媲美。然而,针对MCM的ML编译器需要以最佳方式高效地解决复杂的优化问题,以实现这种高性能。其中一个问题是多芯片分区问题,编译器在MCM中的芯片上确定张量计算图中操作的最佳分区和位置。为MCM划分ML图特别困难,因为搜索空间随着可用芯片的数量和神经网络中节点的数量呈指数增长。此外,底层硬件施加的约束会产生一个搜索空间,其中有效的解决方案非常稀疏。在本文中,我们提出了一种策略,使用深度强化学习(RL)框架来生成可能无效的候选分区,然后由约束求解器进行纠正。与未学习策略相比,使用约束解算器可确保RL在稀疏空间中遇到有效解的频率足以收敛到更少的样本。我们为策略网络所做的架构选择允许我们在不同的ML图之间进行概括。我们在实际硬件上对生产规模模型BERT的评估表明,使用RL策略生成的分区的吞吐量比随机搜索和模拟退火分别高6.11%和5.85%。此外,微调预先训练的RL策略将搜索时间从3小时减少到仅9分钟,同时实现与从头开始训练RL策略相同的吞吐量。 摘要:Multi-Chip-Modules (MCMs) reduce the design and fabrication cost of machine learning (ML) accelerators while delivering performance and energy efficiency on par with a monolithic large chip. However, ML compilers targeting MCMs need to solve complex optimization problems optimally and efficiently to achieve this high performance. One such problem is the multi-chip partitioning problem where compilers determine the optimal partitioning and placement of operations in tensor computation graphs on chiplets in MCMs. Partitioning ML graphs for MCMs is particularly hard as the search space grows exponentially with the number of chiplets available and the number of nodes in the neural network. Furthermore, the constraints imposed by the underlying hardware produce a search space where valid solutions are extremely sparse. In this paper, we present a strategy using a deep reinforcement learning (RL) framework to emit a possibly invalid candidate partition that is then corrected by a constraint solver. Using the constraint solver ensures that RL encounters valid solutions in the sparse space frequently enough to converge with fewer samples as compared to non-learned strategies. The architectural choices we make for the policy network allow us to generalize across different ML graphs. Our evaluation of a production-scale model, BERT, on real hardware reveals that the partitioning generated using RL policy achieves 6.11% and 5.85% higher throughput than random search and simulated annealing. In addition, fine-tuning the pre-trained RL policy reduces the search time from 3 hours to only 9 minutes, while achieving the same throughput as training RL policy from scratch.

【14】 DeepDiagnosis: Automatically Diagnosing Faults and Recommending Actionable Fixes in Deep Learning Programs 标题:深度诊断:在深度学习计划中自动诊断故障并推荐可行的修复 链接:https://arxiv.org/abs/2112.04036

作者:Mohammad Wardat,Breno Dantas Cruz,Wei Le,Hridesh Rajan 备注:Accepted at ICSE 2022 摘要:深度神经网络(DNN)有着广泛的应用。然而,与任何软件应用程序一样,基于DNN的应用程序也受到bug的困扰。以前的工作观察到DNN错误修复模式不同于传统的错误修复模式。此外,由于有多个选项可以修复莫名其妙的错误,这些错误模型的诊断和修复非常困难。为了支持开发人员定位和修复bug,我们提出了DeepDiagnosis,这是一种新的调试方法,用于定位故障、报告错误症状并为DNN程序提出修复建议。在第一阶段,我们的技术监控一个训练模型,定期检查八种类型的错误条件。然后,如果出现问题,它会报告包含足够信息的消息,以便对模型执行可操作的修复。在评估中,我们彻底检查了444个模型——53个来自GitHub和Stack Overflow的真实模型,以及391个由AUTOTRAINER策划的模型。与UMLUAT和DeepLocalize相比,DeepDiagnosis具有更高的准确性。我们的技术在故障定位方面比AUTOTRAINER更快。结果表明,我们的方法可以支持其他类型的模型,而最先进的方法只能处理分类模型。我们的技术能够报告在训练期间不会表现为数字错误的错误。此外,它还可以为修复提供可操作的见解,而DeepLocalize只能报告在训练期间导致数值错误的故障。与其他方法相比,DeepDiagnosis具有最佳的故障检测、缺陷定位和症状识别能力。 摘要:Deep Neural Networks (DNNs) are used in a wide variety of applications. However, as in any software application, DNN-based apps are afflicted with bugs. Previous work observed that DNN bug fix patterns are different from traditional bug fix patterns. Furthermore, those buggy models are non-trivial to diagnose and fix due to inexplicit errors with several options to fix them. To support developers in locating and fixing bugs, we propose DeepDiagnosis, a novel debugging approach that localizes the faults, reports error symptoms and suggests fixes for DNN programs. In the first phase, our technique monitors a training model, periodically checking for eight types of error conditions. Then, in case of problems, it reports messages containing sufficient information to perform actionable repairs to the model. In the evaluation, we thoroughly examine 444 models -53 real-world from GitHub and Stack Overflow, and 391 curated by AUTOTRAINER. DeepDiagnosis provides superior accuracy when compared to UMLUAT and DeepLocalize. Our technique is faster than AUTOTRAINER for fault localization. The results show that our approach can support additional types of models, while state-of-the-art was only able to handle classification ones. Our technique was able to report bugs that do not manifest as numerical errors during training. Also, it can provide actionable insights for fix whereas DeepLocalize can only report faults that lead to numerical errors during training. DeepDiagnosis manifests the best capabilities of fault detection, bug localization, and symptoms identification when compared to other approaches.
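DeepDiagnosis 的第一阶段是在训练过程中周期性检查若干类错误条件。下面用纯 Python 给出这一思想的示意,只演示几种常见条件(NaN 损失、梯度爆炸/消失、输出饱和);阈值与提示文案均为笔者假设,并非论文监控的全部 8 类条件。

import math

def check_training_step(loss, grads, outputs):
    """训练期错误条件检查的示意:返回可操作的修复建议列表。"""
    problems = []
    if math.isnan(loss) or math.isinf(loss):
        problems.append("loss 出现 NaN/Inf:检查学习率或数据预处理")
    gnorm = sum(g * g for g in grads) ** 0.5
    if gnorm > 1e3:
        problems.append("梯度爆炸:建议梯度裁剪或减小学习率")
    if gnorm < 1e-8:
        problems.append("梯度消失:检查激活函数与权重初始化")
    if outputs and all(abs(o) > 0.999 for o in outputs):
        problems.append("输出饱和:最后一层激活函数可能选错")
    return problems

print(check_training_step(loss=float("nan"), grads=[0.1, -0.2], outputs=[0.3, 0.7]))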

【15】 SHRIMP: Sparser Random Feature Models via Iterative Magnitude Pruning 标题:SHRIMP:基于迭代幅度剪枝的稀疏随机特征模型 链接:https://arxiv.org/abs/2112.04002

作者:Yuege Xie,Bobby Shi,Hayden Schaeffer,Rachel Ward 摘要:稀疏收缩加性模型和稀疏随机特征模型分别作为学习低阶函数的方法被开发,在低阶函数中,变量之间几乎没有交互作用,但两者都不能提供计算效率。另一方面,基于$\ell_2$的收缩加性模型是有效的,但由于生成的系数向量密集,因此不提供特征选择。受迭代幅度剪枝技术在寻找神经网络彩票方面的成功启发,我们提出了一种新方法——通过IMP(ShRIMP)建立稀疏随机特征模型——以稀疏变量依赖的形式有效地拟合具有固有低维结构的高维数据。我们的方法可以看作是构造和查找两层密集网络的稀疏彩票的组合过程。我们通过对阈值基追踪的泛化误差和由此产生的特征值界的精细分析,解释了SHRIMP的观察优势。通过对合成数据和真实基准数据集的函数近似实验,我们表明SHRIMP比最先进的稀疏特征方法和加性方法(如SRFE-S、SSAM和SALSA)获得更好或具有竞争力的测试精度。同时,SHRIMP以较低的计算复杂度执行特征选择,并且对剪枝率具有鲁棒性,表明所获得的子网络结构具有鲁棒性。通过SHRIMP,我们注意到我们的模型和权重/神经元子网络之间的对应关系,从而深入了解了彩票假设。 摘要:Sparse shrunk additive models and sparse random feature models have been developed separately as methods to learn low-order functions, where there are few interactions between variables, but neither offers computational efficiency. On the other hand, $\ell_2$-based shrunk additive models are efficient but do not offer feature selection as the resulting coefficient vectors are dense. Inspired by the success of the iterative magnitude pruning technique in finding lottery tickets of neural networks, we propose a new method -- Sparser Random Feature Models via IMP (ShRIMP) -- to efficiently fit high-dimensional data with inherent low-dimensional structure in the form of sparse variable dependencies. Our method can be viewed as a combined process to construct and find sparse lottery tickets for two-layer dense networks. We explain the observed benefit of SHRIMP through a refined analysis on the generalization error for thresholded Basis Pursuit and resulting bounds on eigenvalues. From function approximation experiments on both synthetic data and real-world benchmark datasets, we show that SHRIMP obtains better than or competitive test accuracy compared to state-of-art sparse feature and additive methods such as SRFE-S, SSAM, and SALSA. Meanwhile, SHRIMP performs feature selection with low computational complexity and is robust to the pruning rate, indicating a robustness in the structure of the obtained subnetworks. We gain insight into the lottery ticket hypothesis through SHRIMP by noting a correspondence between our model and weight/neuron subnetworks.
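下面用 numpy 给出"在随机特征模型上做迭代幅度剪枝"的玩具示意:固定随机余弦特征,反复"拟合系数、剪掉幅度最小的一半特征"。目标函数、剪枝率与轮数均为笔者假设,仅演示 IMP 循环本身。

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 400, 10, 512                       # 样本数、输入维度、随机特征数
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2     # 低阶目标函数:只依赖少数变量
W = rng.normal(size=(d, m))                  # 固定的随机特征权重
Phi = np.cos(X @ W)                          # 随机余弦特征

mask = np.ones(m, dtype=bool)
for round_ in range(6):                      # 迭代幅度剪枝:拟合 -> 剪掉小系数特征
    A = Phi[:, mask]
    c = np.linalg.lstsq(A, y, rcond=None)[0] # 最小二乘拟合系数
    keep = np.abs(c) >= np.quantile(np.abs(c), 0.5)   # 每轮剪掉幅度最小的一半
    idx = np.flatnonzero(mask)
    mask[idx[~keep]] = False

print("保留的随机特征数:", mask.sum())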

【16】 Optimistic Rates: A Unifying Theory for Interpolation Learning and Regularization in Linear Regression 标题:乐观率:线性回归中插值学习和正则化的统一理论 链接:https://arxiv.org/abs/2112.04470

作者:Lijia Zhou,Frederic Koehler,Danica J. Sutherland,Nathan Srebro 摘要:我们研究了高斯数据线性回归的局部一致收敛概念,称为“乐观率”(Panchenko 2002;Srebro等人,2010)。我们的精细分析避免了现有结果中隐藏的常数和对数因子,这在高维环境中是至关重要的,特别是对于理解插值学习。作为特例,我们的分析恢复了Koehler et al.(2021)的保证,该保证紧密地刻画了良性过拟合条件下低范数插值器的总体风险。然而,我们的乐观率界也分析了具有任意训练误差的预测值。这使我们能够恢复随机设计下岭回归和套索回归的一些经典统计保证,并帮助我们获得对过度参数化区域中近内插子过度风险的精确理解。 摘要:We study a localized notion of uniform convergence known as an "optimistic rate" (Panchenko 2002; Srebro et al. 2010) for linear regression with Gaussian data. Our refined analysis avoids the hidden constant and logarithmic factor in existing results, which are known to be crucial in high-dimensional settings, especially for understanding interpolation learning. As a special case, our analysis recovers the guarantee from Koehler et al. (2021), which tightly characterizes the population risk of low-norm interpolators under the benign overfitting conditions. Our optimistic rate bound, though, also analyzes predictors with arbitrary training error. This allows us to recover some classical statistical guarantees for ridge and LASSO regression under random designs, and helps us obtain a precise understanding of the excess risk of near-interpolators in the over-parameterized regime.

【17】 Learning Linear Models Using Distributed Iterative Hessian Sketching 标题:使用分布式迭代Hessian草图学习线性模型 链接:https://arxiv.org/abs/2112.04101

作者:Han Wang,James Anderson 摘要:这项工作考虑从观测数据中学习线性系统马尔可夫参数的问题。最近的非渐近系统辨识结果刻画了该问题在单轨迹和多轨迹(rollout)设置下的样本复杂度。在这两种情况下,获得可接受估计所需的样本数量,可能会产生对二阶算法而言决策变量数量大到难以处理的优化问题。我们证明了基于Hessian草图的随机分布式牛顿算法可以产生$\epsilon$-最优解并几何收敛。此外,该算法天然可并行。我们的结果适用于各种草图矩阵,并用数值例子说明了理论。 摘要:This work considers the problem of learning the Markov parameters of a linear system from observed data. Recent non-asymptotic system identification results have characterized the sample complexity of this problem in the single and multi-rollout setting. In both instances, the number of samples required in order to obtain acceptable estimates can produce optimization problems with an intractably large number of decision variables for a second-order algorithm. We show that a randomized and distributed Newton algorithm based on Hessian-sketching can produce $\epsilon$-optimal solutions and converges geometrically. Moreover, the algorithm is trivially parallelizable. Our results hold for a variety of sketching matrices and we illustrate the theory with numerical examples.
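下面用 numpy 给出"迭代 Hessian 草图牛顿法"在最小二乘问题上的示意:每轮用高斯草图矩阵近似 Hessian、配合精确梯度做牛顿步。草图维度等常数为笔者假设;当草图维度远大于参数维度时可观察到几何收敛。

import numpy as np

def sketched_newton_ls(A, b, sketch_dim=200, iters=10, seed=0):
    """迭代 Hessian 草图求解 min ||Ax - b||^2:草图近似 Hessian,梯度用全量数据。"""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.normal(size=(sketch_dim, n)) / np.sqrt(sketch_dim)  # 高斯草图矩阵
        SA = S @ A
        H = SA.T @ SA                                  # 近似 Hessian ≈ A^T A
        g = A.T @ (A @ x - b)                          # 精确梯度
        x = x - np.linalg.solve(H + 1e-8 * np.eye(d), g)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(2000, 20)); x_true = rng.normal(size=20)
b = A @ x_true + 0.01 * rng.normal(size=2000)
x_hat = sketched_newton_ls(A, b)
print("参数误差:", np.linalg.norm(x_hat - x_true))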

【18】 A deep learning model for data-driven discovery of functional connectivity 标题:一种用于数据驱动的功能连通性发现的深度学习模型 链接:https://arxiv.org/abs/2112.04013

作者:Usman Mahmood,Zening Fu,Vince Calhoun,Sergey Plis 备注:None 摘要:功能连接性(FC)研究已经证明了通过功能磁共振相关矩阵的无向加权图研究大脑及其疾病的总体价值。然而,FC的大部分工作取决于连接性的计算方式,并进一步取决于FC矩阵的手动事后分析。在这项工作中,我们提出了一个深度学习架构BrainGNN,它学习连通性结构,作为学习分类主题的一部分。它同时将图形神经网络应用于该学习图,并学习选择对预测任务重要的大脑区域的稀疏子集。我们在精神分裂症功能磁共振数据集上展示了该模型最先进的分类性能,并展示了内省如何导致与疾病相关的发现。该模型学习到的图具有很强的类判别性,相关区域的稀疏子集与文献一致。 摘要:Functional connectivity (FC) studies have demonstrated the overarching value of studying the brain and its disorders through the undirected weighted graph of fMRI correlation matrix. Most of the work with the FC, however, depends on the way the connectivity is computed, and further depends on the manual post-hoc analysis of the FC matrices. In this work we propose a deep learning architecture BrainGNN that learns the connectivity structure as part of learning to classify subjects. It simultaneously applies a graphical neural network to this learned graph and learns to select a sparse subset of brain regions important to the prediction task. We demonstrate the model's state-of-the-art classification performance on a schizophrenia fMRI dataset and demonstrate how introspection leads to disorder relevant findings. The graphs learned by the model exhibit strong class discrimination and the sparse subset of relevant regions are consistent with the schizophrenia literature.

其他(12篇)

【1】 What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods 标题:我无法预测的,我便无法理解:一个以人为中心的可解释性方法评估框架 链接:https://arxiv.org/abs/2112.04417

作者:Thomas Fel,Julien Colin,Remi Cadene,Thomas Serre 摘要:人们提出了多种解释方法和理论评估分数。然而,目前尚不清楚:(1)这些方法在现实世界场景中有多有用;(2)理论度量能在多大程度上预测这些方法对人类实际使用的有用性。为了填补这一空白,我们开展了大规模的人类心理物理学实验,以评估人类参与者(n=1150)利用代表性归因方法学习预测不同图像分类器决策的能力。我们的研究结果表明,用于对可解释性方法进行评分的理论度量很难反映出个体归因方法在现实场景中的实用性。此外,个体归因方法在多大程度上帮助人类参与者预测分类器的决策,在不同的分类任务和数据集上差异很大。总的来说,我们的结果突出了该领域的基本挑战——表明迫切需要开发更好的可解释性方法,并部署以人为中心的评估方法。我们将提供我们框架的代码,以便于对新的可解释性方法进行系统评估。 摘要:A multitude of explainability methods and theoretical evaluation scores have been proposed. However, it is not yet known: (1) how useful these methods are in real-world scenarios and (2) how well theoretical measures predict the usefulness of these methods for practical use by a human. To fill this gap, we conducted human psychophysics experiments at scale to evaluate the ability of human participants (n=1,150) to leverage representative attribution methods to learn to predict the decision of different image classifiers. Our results demonstrate that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios. Furthermore, the degree to which individual attribution methods helped human participants predict classifiers' decisions varied widely across categorization tasks and datasets. Overall, our results highlight fundamental challenges for the field -- suggesting a critical need to develop better explainability methods and to deploy human-centered evaluation approaches. We will make the code of our framework available to ease the systematic evaluation of novel explainability methods.

【2】 Trainability for Universal GNNs Through Surgical Randomness 标题:通过外科手术式随机性实现通用GNN的可训练性 链接:https://arxiv.org/abs/2112.04314

作者:Billy Joe Franks,Markus Anders,Marius Kloft,Pascal Schweitzer 摘要:消息传递神经网络(MPNN)具有可证明的局限性,而通用网络可以克服这些局限性。然而,通用网络通常是不切实际的。唯一的例外是随机节点初始化(RNI),这是一种数据增强方法,可以产生可证明通用的网络。不幸的是,RNI存在严重的缺点,如收敛速度慢和对超参数变化高度敏感。我们将图同构测试实践中的强大技术迁移到MPNN,解决了这些缺点。这最终形成了个体化-细化节点初始化(IRNI)。我们将RNI中不加区分、随意使用的随机性,替换为在精心选择的节点上仅注入几个随机位的“外科手术式切口”。我们新颖的非侵入式数据增强方案在解决可训练性问题的同时保持了网络的通用性。我们正式证明了所声称的通用性,并在先前专为此目的设计的合成基准集上实验证实了IRNI克服了MPNN的局限性。我们还在标准基准数据集PROTEINS和NCI1上验证了我们方法的实际有效性。 摘要:Message passing neural networks (MPNN) have provable limitations, which can be overcome by universal networks. However, universal networks are typically impractical. The only exception is random node initialization (RNI), a data augmentation method that results in provably universal networks. Unfortunately, RNI suffers from severe drawbacks such as slow convergence and high sensitivity to changes in hyperparameters. We transfer powerful techniques from the practical world of graph isomorphism testing to MPNNs, resolving these drawbacks. This culminates in individualization-refinement node initialization (IRNI). We replace the indiscriminate and haphazard randomness used in RNI by a surgical incision of only a few random bits at well-selected nodes. Our novel non-intrusive data-augmentation scheme maintains the networks' universality while resolving the trainability issues. We formally prove the claimed universality and corroborate experimentally -- on synthetic benchmarks sets previously explicitly designed for that purpose -- that IRNI overcomes the limitations of MPNNs. We also verify the practical efficacy of our approach on the standard benchmark data sets PROTEINS and NCI1.
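为说明“只在精心选择的节点上注入少量随机位”与RNI全局注入随机特征的差别,下面给出一个示意性的数据增强草图;其中用度数最高的节点代替论文中由个体化-细化过程选出的节点,这是本文之外的简化假设。

```python
import numpy as np

def rni(features, n_bits=4, rng=None):
    """随机节点初始化(RNI)的示意:给所有节点都拼接随机位。"""
    rng = rng or np.random.default_rng()
    n = features.shape[0]
    return np.hstack([features, rng.integers(0, 2, (n, n_bits)).astype(float)])

def irni_like(features, adjacency, n_nodes=2, n_bits=4, rng=None):
    """IRNI 风格的示意:只给少数“关键”节点注入随机位,其余节点补零。
    这里以度数最高的节点近似论文中由个体化-细化选出的节点(简化假设)。"""
    rng = rng or np.random.default_rng()
    n = features.shape[0]
    extra = np.zeros((n, n_bits))
    chosen = np.argsort(adjacency.sum(axis=1))[-n_nodes:]   # 度数最高的节点
    extra[chosen] = rng.integers(0, 2, (n_nodes, n_bits))
    return np.hstack([features, extra])

# 一个 6 节点环图上的用法示例
A = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
X = np.ones((6, 3))                       # 原始节点特征(示意)
print(irni_like(X, A, rng=np.random.default_rng(0)).shape)  # (6, 7)
```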

【3】 Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering 标题:几何引导的渐进式NeRF:可泛化且高效的神经人体渲染 链接:https://arxiv.org/abs/2112.04312

作者:Mingfei Chen,Jianfeng Zhang,Xiangyu Xu,Lijuan Liu,Jiashi Feng,Shuicheng Yan 摘要:在这项工作中,我们开发了一个可泛化且高效的神经辐射场(NeRF)管线,用于在稀疏相机视图设置下进行高保真自由视点人体合成。尽管现有的基于NeRF的方法可以为人体合成相当逼真的细节,但当输入存在自遮挡时,它们往往产生较差的结果,尤其是对稀疏视图下未见过的人体。此外,这些方法通常需要大量的采样点进行渲染,这导致效率低下,并限制了它们在现实世界中的适用性。为了应对这些挑战,我们提出了一种几何引导的渐进式NeRF(GP-NeRF)。特别是,为了更好地处理自遮挡,我们设计了一种几何引导的多视图特征整合方法,利用估计的几何先验来整合来自输入视图的不完整信息,并为目标人体构建完整的几何体积。同时,为了获得更高的渲染效率,我们引入了一种几何引导的渐进式渲染管线,它利用几何特征体积和预测的密度值来逐步减少采样点的数量并加快渲染过程。在ZJU-MoCap和THUman数据集上的实验表明,我们的方法在多个泛化设置中显著优于最新技术,同时通过应用我们高效的渐进式渲染管线,时间成本降低了70%以上。 摘要:In this work we develop a generalizable and efficient Neural Radiance Field (NeRF) pipeline for high-fidelity free-viewpoint human body synthesis under settings with sparse camera views. Though existing NeRF-based methods can synthesize rather realistic details for human body, they tend to produce poor results when the input has self-occlusion, especially for unseen humans under sparse views. Moreover, these methods often require a large number of sampling points for rendering, which leads to low efficiency and limits their real-world applicability. To address these challenges, we propose a Geometry-guided Progressive NeRF (GP-NeRF). In particular, to better tackle self-occlusion, we devise a geometry-guided multi-view feature integration approach that utilizes the estimated geometry prior to integrate the incomplete information from input views and construct a complete geometry volume for the target human body. Meanwhile, for achieving higher rendering efficiency, we introduce a geometry-guided progressive rendering pipeline, which leverages the geometric feature volume and the predicted density values to progressively reduce the number of sampling points and speed up the rendering process. Experiments on the ZJU-MoCap and THUman datasets show that our method outperforms the state-of-the-arts significantly across multiple generalization settings, while the time cost is reduced >70% via applying our efficient progressive rendering pipeline.
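下面用一个一维的玩具示意说明渐进式渲染中“按预测密度裁剪采样点”的思路(保留比例、采样点数与密度函数均为假设的演示值,并非论文实现)。

```python
import numpy as np

def progressive_prune(t_vals, density_fn, keep_ratio=0.25):
    """粗阶段:在射线上均匀采样并查询密度;只保留密度最高的一部分点进入细阶段(示意)。"""
    sigma = density_fn(t_vals)                       # 粗阶段预测的体密度
    k = max(1, int(len(t_vals) * keep_ratio))
    keep = np.argsort(sigma)[-k:]                    # 保留密度最高的 k 个采样点
    return np.sort(t_vals[keep])

# 假设人体表面位于 t≈2.0 附近:密度呈一个窄峰(演示用)
density = lambda t: np.exp(-((t - 2.0) ** 2) / 0.01)
coarse_t = np.linspace(0.0, 4.0, 64)                 # 粗阶段 64 个采样点
fine_t = progressive_prune(coarse_t, density)        # 细阶段只剩 16 个点
print(len(coarse_t), "->", len(fine_t), "个采样点, 集中在",
      fine_t.min().round(2), "~", fine_t.max().round(2))
```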

【4】 Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost 标题:一种高效的垂直联合XGBoost批量同态加密算法 链接:https://arxiv.org/abs/2112.04261

作者:Wuxing Xu,Hao Fan,Kaixin Li,Kai Yang 摘要:越来越多的组织和机构致力于利用外部数据来提高人工智能服务的性能。为了解决数据隐私和安全问题,联邦学习吸引了学术界和工业界越来越多的关注,以跨多个孤立的数据提供方安全地构建AI模型。在本文中,我们研究了将实际应用中广泛使用的XGBoost模型适配到垂直联邦学习环境时的效率问题。最先进的垂直联邦XGBoost框架需要大量加密操作和密文传输,这使得模型训练的效率远远低于在本地训练XGBoost模型。为了弥补这一差距,我们提出了一种新的批量同态加密方法,将与加密相关的计算和传输成本降低近一半。这是通过将一阶导数和二阶导数编码为一个数来实现的,该数用于加密、密文传输和同态加法操作;多个一阶导数与二阶导数之和可以从编码值之和中同时解码出来。我们受BatchCrypt在水平联邦学习工作中的批处理思想启发,设计了一种新的批处理方法,以解决其仅能容纳极少量负数的限制。所提批处理方法的编码过程包括移位、截断、量化和批处理四个步骤,而解码过程包括去量化和反向移位。通过理论分析和大量数值实验证明了该方法的优越性。 摘要:More and more organizations and institutions make efforts on using external data to improve the performance of AI services. To address the data privacy and security concerns, federated learning has attracted increasing attention from both academia and industry to securely construct AI models across multiple isolated data providers. In this paper, we studied the efficiency problem of adapting widely used XGBoost model in real-world applications to vertical federated learning setting. State-of-the-art vertical federated XGBoost frameworks require a large number of encryption operations and ciphertext transmissions, which makes the model training much less efficient than training XGBoost models locally. To bridge this gap, we proposed a novel batch homomorphic encryption method to cut the cost of encryption-related computation and transmission in nearly half. This is achieved by encoding the first-order derivative and the second-order derivative into a single number for encryption, ciphertext transmission, and homomorphic addition operations. The sum of multiple first-order derivatives and second-order derivatives can be simultaneously decoded from the sum of encoded values. We are motivated by the batch idea in the work of BatchCrypt for horizontal federated learning, and design a novel batch method to address the limitation of allowing only a small number of negative numbers. The encode procedure of the proposed batch method consists of four steps, including shifting, truncating, quantizing and batching, while the decoding procedure consists of de-quantization and shifting back. The advantages of our method are demonstrated through theoretical analysis and extensive numerical experiments.
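为直观说明“把一阶、二阶导数编码进同一个数,且编码值之和可同时解码出两者之和”的做法,下面给出一个纯明文的示意实现;位宽、量化范围与预留的进位位均为假设值,移位、截断、量化、批处理四个步骤与上文描述对应,同态加密本身不在此示意范围内。

```python
B = 16              # 每个导数的量化位宽(假设值)
R = 8               # 预留的进位位,允许最多 2^R 个编码值相加而不串位(假设值)
LO, HI = -8.0, 8.0  # 量化前先把取值截断到该范围(假设值)

def encode(g, h):
    """移位+截断+量化后,把 (g, h) 打包成一个非负整数(示意)。"""
    def quantize(x):                       # 先平移到非负区间再量化,从而容纳负数
        x = min(max(x, LO), HI)            # 截断
        return int(round((x - LO) / (HI - LO) * (2 ** B - 1)))
    return quantize(g) << (B + R) | quantize(h)   # 高位放 g,低位放 h(含进位余量)

def decode_sum(code_sum, n):
    """从 n 个编码值之和中同时解码出 sum(g) 与 sum(h)。"""
    def dequantize(q_sum):                 # 去量化,并减去 n 次平移引入的偏移
        return q_sum * (HI - LO) / (2 ** B - 1) + n * LO
    return (dequantize(code_sum >> (B + R)),
            dequantize(code_sum & ((1 << (B + R)) - 1)))

# 用法示例:同态加法在明文侧对应整数相加,这里直接用整数和模拟
pairs = [(0.5, 1.0), (-1.2, 0.3), (2.0, 0.7)]
total = sum(encode(g, h) for g, h in pairs)
print(decode_sum(total, len(pairs)))       # 约等于 (1.3, 2.0)
```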

【5】 Replay For Safety 标题:为安全而重放 链接:https://arxiv.org/abs/2112.04229

作者:Liran Szlak,Ohad Shamir 摘要:经验重放(Lin, 1993; Mnih et al., 2015)是一种被广泛使用的技术,用于在RL算法中高效利用数据并提高性能。在经验重放中,过去的转移存储在内存缓冲区中,并在学习过程中重复使用。以前的工作对回放缓冲区的采样方案提出了各种建议,试图以最优方式选出最有助于收敛到最优策略的那些经验。这里,我们给出了保证收敛性的重放采样方案的一些条件,重点讨论著名的表格型Q学习算法。在建立了收敛的充分条件之后,我们转而提出经验重放的一种略有不同的用法——以有偏的方式重放记忆,作为改变所得策略性质的手段。我们开启了对经验重放作为控制和修改所得策略性质的工具的严格研究。特别是,我们证明了使用适当的有偏采样方案可以获得“安全”(safe)的策略。我们相信,将经验重放用作一种偏置机制,从而以期望的方式控制所得策略,这一想法在许多应用中具有广阔的潜力。 摘要:Experience replay (Lin, 1993; Mnih et al., 2015) is a widely used technique to achieve efficient use of data and improved performance in RL algorithms. In experience replay, past transitions are stored in a memory buffer and re-used during learning. Various suggestions for sampling schemes from the replay buffer have been suggested in previous works, attempting to optimally choose those experiences which will most contribute to the convergence to an optimal policy. Here, we give some conditions on the replay sampling scheme that will ensure convergence, focusing on the well-known Q-learning algorithm in the tabular setting. After establishing sufficient conditions for convergence, we turn to suggest a slightly different usage for experience replay - replaying memories in a biased manner as a means to change the properties of the resulting policy. We initiate a rigorous study of experience replay as a tool to control and modify the properties of the resulting policy. In particular, we show that using an appropriate biased sampling scheme can allow us to achieve a safe policy. We believe that using experience replay as a biasing mechanism that allows controlling the resulting policy in desirable ways is an idea with promising potential for many applications.
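下面给出“以有偏方式重放记忆”的一个表格型Q学习示意:采样概率向低安全代价的转移倾斜;缓冲区数据为随机生成,权重函数 exp(-cost/temperature) 与温度参数均为示意性假设,并非论文中的具体方案。

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_biased(buffer, batch_size, temperature=1.0):
    """按 exp(-cost/temperature) 的权重有偏地从回放缓冲区采样(示意)。"""
    costs = np.array([t[4] for t in buffer])          # 每条转移附带一个安全代价
    p = np.exp(-costs / temperature)
    p /= p.sum()
    idx = rng.choice(len(buffer), size=batch_size, p=p)
    return [buffer[i] for i in idx]

# 缓冲区中的转移:(s, a, r, s', cost);这里用随机生成的小 MDP 数据演示
n_states, n_actions = 5, 2
buffer = [(rng.integers(n_states), rng.integers(n_actions),
           rng.normal(), rng.integers(n_states), rng.exponential())
          for _ in range(1000)]

Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9
for s, a, r, s2, _ in sample_biased(buffer, batch_size=5000):
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # 标准 Q 学习更新
print(Q.round(2))
```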

【6】 Specializing Versatile Skill Libraries using Local Mixture of Experts 标题:利用局部专家混合实现多功能技能库的专业化 链接:https://arxiv.org/abs/2112.04216

作者:Onur Celik,Dongzhuoran Zhou,Ge Li,Philipp Becker,Gerhard Neumann 备注:published at CoRL 2021 London 摘要:机器人学的一个长期梦想是为机器人配备与人类的多功能性和精确性相匹配的技能。例如,在打乒乓球时,机器人应该能够以各种方式将球返回,同时精确地将球放置在所需位置。对这种多功能行为建模的常用方法是使用专家混合(MoE)模型,其中每个专家都是一个上下文运动原语。然而,学习这样的MoE是一个挑战,因为大多数目标迫使模型覆盖整个上下文空间,这会阻止原语的专门化,从而导致质量相当低的组件。从最大熵强化学习(RL)出发,我们将目标分解为逐个混合组件优化的单独下界。此外,我们引入了一个课程,允许组件关注局部上下文区域,使模型能够学习高度准确的技能表示。为此,我们使用与专家原语联合适配的局部上下文分布。我们的下界提倡迭代添加新组件,新组件将集中于当前MoE未覆盖的局部上下文区域。这种局部和增量学习产生了高精度和多功能性的模块化MoE模型,在该模型中,这两种特性都可以通过动态添加更多组件来扩展。我们通过广泛的消融实验和两个具有挑战性的模拟机器人技能学习任务证明了这一点。我们将所取得的性能与LaDiPS和HiREPS(一种用于学习多样化技能的著名分层策略搜索方法)进行了比较。 摘要:A long-cherished vision in robotics is to equip robots with skills that match the versatility and precision of humans. For example, when playing table tennis, a robot should be capable of returning the ball in various ways while precisely placing it at the desired location. A common approach to model such versatile behavior is to use a Mixture of Experts (MoE) model, where each expert is a contextual motion primitive. However, learning such MoEs is challenging as most objectives force the model to cover the entire context space, which prevents specialization of the primitives resulting in rather low-quality components. Starting from maximum entropy reinforcement learning (RL), we decompose the objective into optimizing an individual lower bound per mixture component. Further, we introduce a curriculum by allowing the components to focus on a local context region, enabling the model to learn highly accurate skill representations. To this end, we use local context distributions that are adapted jointly with the expert primitives. Our lower bound advocates an iterative addition of new components, where new components will concentrate on local context regions not covered by the current MoE. This local and incremental learning results in a modular MoE model of high accuracy and versatility, where both properties can be scaled by adding more components on the fly. We demonstrate this by an extensive ablation and on two challenging simulated robot skill learning tasks. We compare our achieved performance to LaDiPS and HiREPS, a known hierarchical policy search method for learning diverse skills.

【7】 A Fast Algorithm for PAC Combinatorial Pure Exploration 标题:一种用于PAC组合纯探索的快速算法 链接:https://arxiv.org/abs/2112.04197

作者:Noa Ben-David,Sivan Sabato 备注:Full version of paper accepted to AAAI-22 摘要:我们考虑组合纯探索(CPE)问题:在各个臂的奖励事先未知、必须通过拉臂来估计的情况下,寻找一个具有高奖励的臂的组合集合。以前针对该问题的算法虽然在许多情况下降低了样本复杂度,但计算量很大,因此即使对于中等规模的问题也不可行。在这项工作中,我们在PAC设置下提出了一种新的CPE算法,该算法计算量小,因此可以很容易地应用于有数万个臂的问题。这得益于所提算法只需要极少次数的组合oracle调用。该算法基于对臂的逐次接受,并结合基于问题组合结构的淘汰。我们为算法提供了样本复杂度保证,并在实验中证明了它在大型问题上的有效性,而以前的算法即使在只有几十个臂的问题上也不可行。算法和实验的代码见 https://github.com/noabdavid/csale. 摘要:We consider the problem of Combinatorial Pure Exploration (CPE), which deals with finding a combinatorial set of arms with a high reward, when the rewards of individual arms are unknown in advance and must be estimated using arm pulls. Previous algorithms for this problem, while obtaining sample complexity reductions in many cases, are highly computationally intensive, thus making them impractical even for mildly large problems. In this work, we propose a new CPE algorithm in the PAC setting, which is computationally light weight, and so can easily be applied to problems with tens of thousands of arms. This is achieved since the proposed algorithm requires a very small number of combinatorial oracle calls. The algorithm is based on successive acceptance of arms, along with elimination which is based on the combinatorial structure of the problem. We provide sample complexity guarantees for our algorithm, and demonstrate in experiments its usefulness on large problems, whereas previous algorithms are impractical to run on problems of even a few dozen arms. The code for the algorithms and experiments is provided at https://github.com/noabdavid/csale.
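作为直观参考,下面给出一个逐次接受/淘汰式的PAC臂选择示意:组合结构取最简单的 top-k(此时组合oracle退化为一次排序),置信半径采用Hoeffding型界——这些都是示意性选择,并非论文算法本身。

```python
import numpy as np

def pac_topk(means, k=3, eps=0.1, delta=0.05, seed=0):
    """逐次接受/淘汰的 PAC top-k 选择(示意):置信区间分离时即可定案。"""
    rng = np.random.default_rng(seed)
    n = len(means)
    counts, sums = np.zeros(n), np.zeros(n)
    active, accepted = set(range(n)), set()
    t = 0
    while len(accepted) < k and len(active) > k - len(accepted):
        t += 1
        for i in active:                                   # 拉动所有活跃臂一次
            sums[i] += means[i] + rng.normal()             # 带噪声的奖励观测
            counts[i] += 1
        mu = sums / np.maximum(counts, 1)
        rad = np.sqrt(np.log(4 * n * t * t / delta) / (2 * counts.clip(min=1)))
        order = sorted(active, key=lambda i: mu[i], reverse=True)  # “组合oracle”=排序
        need = k - len(accepted)
        top, rest = order[:need], order[need:]
        for i in top:                                      # 接受:明显优于最优落选臂
            if rest and mu[i] - rad[i] > mu[rest[0]] + rad[rest[0]] - eps:
                accepted.add(i); active.discard(i)
        for j in rest:                                     # 淘汰:明显劣于最差入选臂
            if top and mu[j] + rad[j] < mu[top[-1]] - rad[top[-1]]:
                active.discard(j)
    return accepted | set(list(active)[:k - len(accepted)])

print(pac_topk(np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1]), k=3))  # 期望输出 {0, 1, 2}
```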

【8】 Transformaly -- Two (Feature Spaces) Are Better Than One 标题:Transformaly——两个(特征空间)胜过一个 链接:https://arxiv.org/abs/2112.04185

作者:Matan Jacob Cohen,Shai Avidan 摘要:异常检测是一个成熟的研究领域,旨在识别预定分布以外的样本。异常检测管线由两个主要阶段组成:(1)特征提取和(2)正常性评分分配。最近的论文使用预训练网络进行特征提取,取得了最先进的结果。然而,使用预训练网络并不能充分利用训练时可用的正常样本。本文建议通过师生训练来利用这一信息。在我们的设置中,使用预训练的教师网络在正常训练样本上训练学生网络。由于学生网络仅在正常样本上训练,因此在异常情况下,学生网络预期会偏离教师网络。这种差异可以作为预训练特征向量的补充表示。我们的方法——Transformaly——利用预训练的视觉Transformer(ViT)来提取两种特征向量:预训练的(任务无关的)特征和师生(微调的)特征。我们报告了最先进的AUROC结果,包括常见的单模态设置(一个类别视为正常,其余视为异常)和多模态设置(除一个类别外均视为正常,仅一个类别视为异常)。代码可在 https://github.com/MatanCohen1/Transformaly 获取。 摘要:Anomaly detection is a well-established research area that seeks to identify samples outside of a predetermined distribution. An anomaly detection pipeline is comprised of two main stages: (1) feature extraction and (2) normality score assignment. Recent papers used pre-trained networks for feature extraction achieving state-of-the-art results. However, the use of pre-trained networks does not fully-utilize the normal samples that are available at train time. This paper suggests taking advantage of this information by using teacher-student training. In our setting, a pretrained teacher network is used to train a student network on the normal training samples. Since the student network is trained only on normal samples, it is expected to deviate from the teacher network in abnormal cases. This difference can serve as a complementary representation to the pre-trained feature vector. Our method -- Transformaly -- exploits a pre-trained Vision Transformer (ViT) to extract both feature vectors: the pre-trained (agnostic) features and the teacher-student (fine-tuned) features. We report state-of-the-art AUROC results in both the common unimodal setting, where one class is considered normal and the rest are considered abnormal, and the multimodal setting, where all classes but one are considered normal, and just one class is considered abnormal. The code is available at https://github.com/MatanCohen1/Transformaly.
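下面示意“两个特征空间各算一个分数再组合”的打分逻辑:预训练特征空间用k近邻距离,师生特征空间用两者输出的差异;其中的特征用随机向量代替真实的ViT特征,仅用于说明组合方式。

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_score(train_feats, test_feats, k=5):
    """预训练(任务无关)特征空间:到正常训练特征的 k 近邻平均距离。"""
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def ts_score(teacher_feats, student_feats):
    """师生(微调)特征空间:学生输出对教师输出的偏离程度。"""
    return np.linalg.norm(teacher_feats - student_feats, axis=-1)

# 用随机向量模拟:正常样本的师生特征一致,异常样本出现偏离
train = rng.normal(size=(100, 32))
test_normal = rng.normal(size=(10, 32))
test_anom = rng.normal(loc=3.0, size=(10, 32))
test = np.vstack([test_normal, test_anom])

teacher = test
student = np.vstack([test_normal,                              # 正常:学生贴近教师
                     test_anom + rng.normal(size=(10, 32))])   # 异常:学生偏离教师

score = knn_score(train, test) + ts_score(teacher, student)    # 两个空间的分数相加
print("正常样本均分 %.2f, 异常样本均分 %.2f" % (score[:10].mean(), score[10:].mean()))
```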

【9】 ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives 标题:ShinRL:一个从理论和实践角度评估RL算法的库 链接:https://arxiv.org/abs/2112.04123

作者:Toshinori Kitamura,Ryo Yonetani 备注:Published at the NeurIPS Deep RL Workshop (2021) 摘要:我们介绍了ShinRL,这是一个开源库,专门用于从理论和实践两个角度评估强化学习(RL)算法。现有的RL库通常允许用户通过回报来评估深度RL算法的实际性能。然而,这些库对于分析算法是否如理论预期的那样运行并不一定有用,例如Q学习是否真的收敛到最优Q函数。相比之下,ShinRL提供了一个RL环境接口,该接口可以计算用于深入研究RL算法行为的度量,例如学习到的Q值与最优Q值之间的差距,以及状态访问频率。此外,我们还引入了一个灵活的求解器接口,用于以一致的方式评估理论上有依据的算法(例如,动态规划和表格型RL)和实践中有效的算法(即,深度RL,通常带有一些额外的扩展和正则化)。作为一个案例研究,我们展示了如何结合ShinRL的这两个特性,使深度Q学习的行为更容易分析。此外,我们证明了ShinRL可用于实证验证最近的理论发现,如KL正则化对价值迭代和深度Q学习的影响,以及熵正则化策略对对抗性奖励的鲁棒性。ShinRL的源代码可在GitHub上获得:https://github.com/omron-sinicx/ShinRL. 摘要:We present ShinRL, an open-source library specialized for the evaluation of reinforcement learning (RL) algorithms from both theoretical and practical perspectives. Existing RL libraries typically allow users to evaluate practical performances of deep RL algorithms through returns. Nevertheless, these libraries are not necessarily useful for analyzing if the algorithms perform as theoretically expected, such as if Q learning really achieves the optimal Q function. In contrast, ShinRL provides an RL environment interface that can compute metrics for delving into the behaviors of RL algorithms, such as the gap between learned and the optimal Q values and state visitation frequencies. In addition, we introduce a flexible solver interface for evaluating both theoretically justified algorithms (e.g., dynamic programming and tabular RL) and practically effective ones (i.e., deep RL, typically with some additional extensions and regularizations) in a consistent fashion. As a case study, we show how combining these two features of ShinRL makes it easier to analyze the behavior of deep Q learning. Furthermore, we demonstrate that ShinRL can be used to empirically validate recent theoretical findings such as the effect of KL regularization for value iteration and for deep Q learning, and the robustness of entropy-regularized policies to adversarial rewards. The source code for ShinRL is available on GitHub: https://github.com/omron-sinicx/ShinRL.
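下面用一个随机生成的小型表格MDP示意这类“理论诊断”指标的计算方式——学习到的Q值与最优Q值之间的差距;价值迭代与表格Q学习为自行实现,并非ShinRL的真实API。

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 8, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # 随机转移核 P[s, a] -> 下一状态分布
R = rng.normal(size=(nS, nA))                   # 随机奖励

# 价值迭代求最优 Q*(理论基准)
Q_star = np.zeros((nS, nA))
for _ in range(1000):
    Q_star = R + gamma * P @ Q_star.max(axis=1)

# 表格 Q 学习(实践算法),epsilon-贪心探索
Q = np.zeros((nS, nA))
s, alpha = 0, 0.1
for _ in range(50000):
    a = rng.integers(nA) if rng.random() < 0.2 else Q[s].argmax()
    s2 = rng.choice(nS, p=P[s, a])
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
    s = s2

print("最大 Q 值差距:", np.abs(Q - Q_star).max())   # 诊断指标:|Q - Q*| 的上确界
```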

【10】 Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project 标题:基于MIMIC-III的合成急性低血压和脓毒症数据集,并作为健康健身房项目的一部分出版 链接:https://arxiv.org/abs/2112.03914

作者:Nicholas I-Hsien Kuo,Mark Polizzotto,Simon Finfer,Louisa Jorm,Sebastiano Barbieri 摘要:这两个合成数据集包括重症监护病房(ICU)中3910名急性低血压患者和2164名败血症患者的生命体征、实验室检查结果、输注的液体推注(fluid boluses)和血管升压药。患者队列是使用先前发表的纳入和排除标准建立的,数据是使用生成对抗网络(GANs)和MIMIC-III临床数据库创建的。与这些数据发布相关的身份披露风险估计非常低(0.045%)。这些数据集是作为Health Gym项目的一部分生成和发布的,该项目旨在公开发布合成纵向健康数据,用于开发机器学习算法(特别关注离线强化学习)和教育目的。 摘要:These two synthetic datasets comprise vital signs, laboratory test results, administered fluid boluses and vasopressors for 3,910 patients with acute hypotension and for 2,164 patients with sepsis in the Intensive Care Unit (ICU). The patient cohorts were built using previously published inclusion and exclusion criteria and the data were created using Generative Adversarial Networks (GANs) and the MIMIC-III Clinical Database. The risk of identity disclosure associated with the release of these data was estimated to be very low (0.045%). The datasets were generated and published as part of the Health Gym, a project aiming to publicly distribute synthetic longitudinal health data for developing machine learning algorithms (with a particular focus on offline reinforcement learning) and for educational purposes.

【11】 RID-Noise: Towards Robust Inverse Design under Noisy Environments 标题:RID-Noise:面向噪声环境下的鲁棒逆设计 链接:https://arxiv.org/abs/2112.03912

作者:Jia-Qi Yang,Ke-Bin Fan,Hao Ma,De-Chuan Zhan 备注:AAAI'22 摘要:从工程角度来看,设计不仅应在理想条件下运行良好,还应抵抗噪声。这种设计方法,即稳健设计,已广泛应用于工业产品质量控制。然而,传统稳健设计需要对单个设计目标进行大量评估,而这些评估的结果不能复用于新的目标。为了实现数据高效的鲁棒设计,我们提出了噪声下的鲁棒逆设计(RID-Noise),它可以利用现有的含噪数据来训练条件可逆神经网络(cINN)。具体来说,我们通过前向神经网络的预测误差来衡量设计参数的可预测性,从而估计其鲁棒性。我们还定义了样本级权重,可用于基于cINN的逆模型的加权极大似然估计。通过实验的可视化结果,我们清楚地说明了RID-Noise是如何通过从数据中学习分布和鲁棒性来工作的。在多个含噪的真实基准任务上的进一步实验证实,我们的方法比其他最先进的逆设计方法更有效。代码和补充材料公开于 https://github.com/ThyrixYang/rid-noise-aaai22 摘要:From an engineering perspective, a design should not only perform well in an ideal condition, but should also resist noises. Such a design methodology, namely robust design, has been widely implemented in the industry for product quality control. However, classic robust design requires a lot of evaluations for a single design target, while the results of these evaluations could not be reused for a new target. To achieve a data-efficient robust design, we propose Robust Inverse Design under Noise (RID-Noise), which can utilize existing noisy data to train a conditional invertible neural network (cINN). Specifically, we estimate the robustness of a design parameter by its predictability, measured by the prediction error of a forward neural network. We also define a sample-wise weight, which can be used in the maximum weighted likelihood estimation of an inverse model based on a cINN. With the visual results from experiments, we clearly justify how RID-Noise works by learning the distribution and robustness from data. Further experiments on several real-world benchmark tasks with noises confirm that our method is more effective than other state-of-the-art inverse design methods. Code and supplementary are publicly available at https://github.com/ThyrixYang/rid-noise-aaai22
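下面示意“用前向模型的预测误差度量可预测性、并据此给样本加权做加权极大似然”的想法;为保持自包含,这里用线性前向模型与加权最小二乘代替论文中的神经网络与cINN,属于刻意的简化假设(温度参数等均为演示值)。

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, size=(n, 1))                 # 设计参数
noise = np.where(np.abs(x[:, 0]) > 1, 1.0, 0.05)    # 某些设计区域的评估噪声更大
y = 3 * x[:, 0] + noise * rng.normal(size=n)        # 含噪的性能评估

# 第一步:拟合前向模型 x -> y,用每个样本的预测误差度量“可预测性”
Phi = np.c_[x, np.ones(n)]
w_fwd = np.linalg.lstsq(Phi, y, rcond=None)[0]
err = (y - Phi @ w_fwd) ** 2

# 第二步:样本权重随误差衰减——可预测(更鲁棒)的样本权重更高
weights = np.exp(-err / 0.1)

# 第三步:加权极大似然——此处退化为加权最小二乘,拟合逆模型 y -> x
A = np.c_[y, np.ones(n)] * np.sqrt(weights)[:, None]
b = x[:, 0] * np.sqrt(weights)
w_inv = np.linalg.lstsq(A, b, rcond=None)[0]
print("逆模型斜率(理想值约 1/3):", w_inv[0].round(3))
```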

【12】 Multiway Ensemble Kalman Filter 标题:多路集成卡尔曼滤波器 链接:https://arxiv.org/abs/2112.04322

作者:Yu Wang,Alfred Hero 备注:Appeared in NeurIPS'21 Workshop on Machine Learning and the Physical Sciences 摘要:在这项工作中,我们研究了由偏微分方程(PDE)控制的动力学过程的二阶统计特征中出现的稀疏性和多路结构。我们考虑了几种最先进的多路协方差和逆协方差(精度)矩阵估计量,并在将其纳入集成卡尔曼滤波器(EnKF)进行物理驱动预测的背景下,考察了它们在准确性和可解释性方面的优缺点。特别是,我们表明,当结合适当的协方差和精度矩阵估计量时,EnKF可以精确跟踪由泊松型和对流扩散型偏微分方程生成的多路数据。 摘要:In this work, we study the emergence of sparsity and multiway structures in second-order statistical characterizations of dynamical processes governed by partial differential equations (PDEs). We consider several state-of-the-art multiway covariance and inverse covariance (precision) matrix estimators and examine their pros and cons in terms of accuracy and interpretability in the context of physics-driven forecasting when incorporated into the ensemble Kalman filter (EnKF). In particular, we show that multiway data generated from the Poisson and the convection-diffusion types of PDEs can be accurately tracked via EnKF when integrated with appropriate covariance and precision matrix estimators.
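作为背景,下面给出随机扰动观测版EnKF分析步的一个最小实现:状态协方差用朴素的样本协方差(plug-in)估计——论文研究的正是把这一估计量替换为多路/稀疏结构化估计量后的精度与可解释性(以下仅为通用示意,观测算子与噪声均为演示设定)。

```python
import numpy as np

def enkf_analysis(ensemble, H, y_obs, R, rng):
    """随机 EnKF 分析步:ensemble 形状 (N, d),H 为观测算子,R 为观测噪声协方差。"""
    N, d = ensemble.shape
    C = np.cov(ensemble, rowvar=False)            # plug-in 样本协方差(可替换为多路估计量)
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)  # 卡尔曼增益
    perturbed = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R, size=N)
    return ensemble + (perturbed - ensemble @ H.T) @ K.T

rng = np.random.default_rng(0)
d, N = 10, 50
truth = np.sin(np.linspace(0, np.pi, d))          # 假想的真实状态场
H = np.eye(d)[::2]                                # 只观测一半格点
R = 0.01 * np.eye(H.shape[0])
y = H @ truth + rng.multivariate_normal(np.zeros(H.shape[0]), R)

ens = truth + rng.normal(scale=0.5, size=(N, d))  # 先验集合
ens = enkf_analysis(ens, H, y, R, rng)
print("分析后均方误差:", np.mean((ens.mean(axis=0) - truth) ** 2).round(4))
```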

机器翻译,仅供参考
