Machine Learning Academic Digest [6.28]

cs.LG: 98 papers today

Graph (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】 Data efficiency in graph networks through equivariance

Authors: Francesco Farina, Emma Slade
Comments: Accepted at the ICML 2021 Workshop on Subset Selection in Machine Learning: From Theory to Practice. arXiv admin note: text overlap with arXiv:2105.14058
Link: https://arxiv.org/abs/2106.13786
Abstract: We introduce a novel architecture for graph networks which is equivariant to any transformation in the coordinate embeddings that preserves the distance between neighbouring nodes. In particular, it is equivariant to the Euclidean and conformal orthogonal groups in $n$ dimensions. Thanks to its equivariance properties, the proposed model is far more data efficient than classical graph architectures and is also intrinsically equipped with a better inductive bias. We show that, learning on a minimal amount of data, the architecture we propose can perfectly generalise to unseen data in a synthetic problem, while a standard model requires much more training data to reach comparable performance.
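The property this abstract relies on, that distances between neighbouring nodes survive Euclidean transformations, can be checked numerically. The following is a minimal sketch and not the authors' architecture: the function names, the 2-D coordinates, and the toy edge list are all assumptions made for illustration.

```python
import math

def pairwise_distances(coords, edges):
    """Euclidean distance for each edge (i, j) of the graph."""
    return [math.dist(coords[i], coords[j]) for i, j in edges]

def rotate_translate_2d(coords, angle, shift):
    """Rotate every point by `angle` (radians), then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1]) for x, y in coords]

# Hypothetical toy graph: three nodes with 2-D coordinate embeddings.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
edges = [(0, 1), (1, 2), (0, 2)]

before = pairwise_distances(coords, edges)
after = pairwise_distances(rotate_translate_2d(coords, 0.7, (3.0, -1.0)), edges)
# The edge lengths agree up to floating-point error.
```

Any feature computed only from these edge lengths is therefore unchanged by rotations and translations of the embedding, which is one common way to build layers with this kind of symmetry.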

【2】 VEGN: Variant Effect Prediction with Graph Neural Networks

Authors: Jun Cheng, Carolin Lawrence, Mathias Niepert
Comments: Accepted at Workshop on Computational Biology, co-located with the 38th International Conference on Machine Learning
Link: https://arxiv.org/abs/2106.13642
Abstract: Genetic mutations can cause disease by disrupting normal gene function. Identifying the disease-causing mutations from millions of genetic variants within an individual patient is a challenging problem. Computational methods which can prioritize disease-causing mutations therefore have enormous applications. It is well known that genes function through a complex regulatory network. However, existing variant effect prediction models only consider a variant in isolation. In contrast, we propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants. The graph is created by assigning variants to genes and connecting genes with a gene-gene interaction network. In this context, we explore an approach where a gene-gene graph is given and another where VEGN learns the gene-gene graph and therefore operates on both given and learnt edges. The graph neural network is trained to aggregate information between genes, and between genes and variants. Variants can exchange information via the genes they connect to. This approach improves the performance of existing state-of-the-art models.

【3】 Temporal Graph Signal Decomposition

Authors: Maxwell McNeil, Lin Zhang, Petko Bogdanov
Affiliations: University at Albany, SUNY, USA
Comments: 9 main pages, 2 supplement pages; to be published in the research track of the Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021), August 14 through August 18, 2021, Virtual Event, Singapore
Link: https://arxiv.org/abs/2106.13517
Abstract: Temporal graph signals are multivariate time series with individual components associated with nodes of a fixed graph structure. Data of this kind arises in many domains including activity of social network users, sensor network readings over time, and time course gene expression within the interaction network of a model organism. Traditional matrix decomposition methods applied to such data fall short of exploiting structural regularities encoded in the underlying graph and also in the temporal patterns of the signal. How can we take into account such structure to obtain a succinct and interpretable representation of temporal graph signals? We propose a general, dictionary-based framework for temporal graph signal decomposition (TGSD). The key idea is to learn a low-rank, joint encoding of the data via a combination of graph and time dictionaries. We propose a highly scalable decomposition algorithm for both complete and incomplete data, and demonstrate its advantage for matrix decomposition, imputation of missing values, temporal interpolation, clustering, period estimation, and rank estimation in synthetic and real-world data ranging from traffic patterns to social media activity. Our framework achieves a 28% reduction in RMSE compared to baselines for temporal interpolation when as many as 75% of the observations are missing. It scales best among baselines, taking under 20 seconds on 3.5 million data points, and produces the most parsimonious models. To the best of our knowledge, TGSD is the first framework to jointly model graph signals by temporal and graph dictionaries.

【4】 Reliable Graph Neural Network Explanations Through Adversarial Training

Authors: Donald Loveland, Shusen Liu, Bhavya Kailkhura, Anna Hiszpanski, Yong Han
Affiliations: MSD, Physical and Life Science, Lawrence Livermore National Lab, Livermore, USA, CASC, Computation
Comments: 4 pages, 3 figures, ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
Link: https://arxiv.org/abs/2106.13427
Abstract: Graph neural network (GNN) explanations have largely been facilitated through post-hoc introspection. While this has been deemed successful, many post-hoc explanation methods have been shown to fail in capturing a model's learned representation. Due to this problem, it is worthwhile to consider how one might train a model so that it is more amenable to post-hoc analysis. Given the success of adversarial training in the computer vision domain to train models with more reliable representations, we propose a similar training paradigm for GNNs and analyze the respective impact on a model's explanations. In instances without ground truth labels, we also determine how well an explanation method is utilizing a model's learned representation through a new metric and demonstrate that adversarial training can help better extract domain-relevant insights in chemistry.

【5】 Federated Graph Classification over Non-IID Graphs

Authors: Han Xie, Jing Ma, Li Xiong, Carl Yang
Affiliations: Department of Computer Science, Emory University
Link: https://arxiv.org/abs/2106.13423
Abstract: Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data sample, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation towards such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustering federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe that the gradients of GNNs fluctuate considerably in GCFL, which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks.
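Dynamic time warping, which the abstract names as the basis for clustering gradient sequences, can be sketched with the textbook dynamic program. This is an illustration only: the 1-D sequences and the absolute-difference cost are assumptions, and the paper's actual clustering mechanism is not reproduced here.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j], using the
    absolute difference as the local cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because warping lets one element align with several, two sequences that follow the same trend at different speeds (for instance `[1, 2, 3]` and `[1, 2, 2, 3]`) get distance zero, which is the appeal of this measure for fluctuating sequences.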

【6】 Scalable Perception-Action-Communication Loops with Convolutional and Graph Neural Networks

Authors: Ting-Kuei Hu, Fernando Gama, Tianlong Chen, Wenqing Zheng, Zhangyang Wang, Alejandro Ribeiro, Brian M. Sadler
Link: https://arxiv.org/abs/2106.13358
Abstract: In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addressing agent-level visual perception and feature learning, as well as swarm-level communication, local information aggregation and agent action inference, respectively. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties, with training in smaller teams and application in larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information.

【7】 Geometric learning of the conformational dynamics of molecules using dynamic graph neural networks

Authors: Michael Hunter Ashby, Jenna A. Bilbrey
Affiliations: Pacific Northwest National Laboratory, Richland, WA, USA
Comments: 11 pages, 4 figures
Link: https://arxiv.org/abs/2106.13277
Abstract: We apply a temporal edge prediction model for weighted dynamic graphs to predict time-dependent changes in molecular structure. Each molecule is represented as a complete graph in which each atom is a vertex and all vertex pairs are connected by an edge weighted by the Euclidean distance between atom pairs. We ingest a sequence of complete molecular graphs into a dynamic graph neural network (GNN) to predict the graph at the next time step. Our dynamic GNN predicts atom-to-atom distances with a mean absolute error of 0.017 Å, which is considered "chemically accurate" for molecular simulations. We also explored the transferability of a trained network to new molecular systems and found that finetuning with less than 10% of the total trajectory provides a mean absolute error of the same order of magnitude as that when training from scratch on the full molecular trajectory.
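The graph representation and error metric described in this abstract are simple to write down. The sketch below assumes atoms are given as 3-D coordinate tuples; the function names and the three-atom coordinates are hypothetical, and no GNN is involved.

```python
import math
from itertools import combinations

def distance_graph(positions):
    """Complete graph as {(i, j): Euclidean distance} over all atom pairs i < j."""
    return {
        (i, j): math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
    }

def mean_absolute_error(predicted, reference):
    """Mean absolute difference between predicted and reference edge weights."""
    return sum(abs(predicted[e] - reference[e]) for e in reference) / len(reference)

# Hypothetical three-atom frame (coordinates in angstroms).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
graph = distance_graph(positions)
```

A trajectory then becomes a sequence of such dictionaries, one per time step, and a predicted next-step graph can be scored against the true one with `mean_absolute_error`.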

【8】 Realistic molecule optimization on a learned graph manifold

Authors: Rémy Brossard, Oriel Frigo, David Dehaene
Comments: 15 pages (9 page main article without refs or appendix) and 2 figures. In review at NeurIPS 2021
Link: https://arxiv.org/abs/2106.13318
Abstract: Deep learning based molecular graph generation and optimization has recently been attracting attention due to its great potential for de novo drug design. On the one hand, recent models are able to efficiently learn a given graph distribution, and many approaches have proven very effective at producing a molecule that maximizes a given score. On the other hand, previous studies have shown that generated optimized molecules are often unrealistic, even with the inclusion of mechanics to enforce similarity to a dataset of real drug molecules. In this work we use a hybrid approach, where the dataset distribution is learned using an autoregressive model while the score optimization is done using the Metropolis algorithm, biased toward the learned distribution. We show that the resulting method, which we call learned realism sampling (LRS), produces empirically more realistic molecules and outperforms all recent baselines in the task of molecule optimization with similarity constraints.
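The idea of running the Metropolis algorithm on a score while biasing it toward a learned distribution can be illustrated on a toy state space. Everything below is an assumption for illustration: states are integers on a ring, `prior` is a lookup table standing in for the learned distribution, and the target density `prior[x] * exp(score(x) / T)` is one plausible way to combine the two, not necessarily the paper's.

```python
import math
import random

def unnormalized_target(x, prior, score, temperature=1.0):
    """prior[x] * exp(score(x) / T): high score is rewarded, low prior is penalized."""
    return prior[x] * math.exp(score(x) / temperature)

def metropolis(prior, score, steps, start, rng, temperature=1.0):
    """Metropolis sampling with a symmetric +/-1 proposal on a ring of len(prior) states.

    `start` must have non-zero prior so the acceptance ratio is well defined.
    """
    n = len(prior)
    x, samples = start, []
    for _ in range(steps):
        proposal = (x + rng.choice((-1, 1))) % n
        ratio = (unnormalized_target(proposal, prior, score, temperature)
                 / unnormalized_target(x, prior, score, temperature))
        if rng.random() < min(1.0, ratio):
            x = proposal
        samples.append(x)
    return samples

# State 5 has zero prior ("unrealistic"), so it is never accepted,
# no matter how high its score is.
prior = [0.1] * 10
prior[5] = 0.0
samples = metropolis(prior, lambda x: float(x), 500, 0, random.Random(0))
```

The zero-prior state shows the effect of the bias: score alone would favour moving through state 5, but the prior term makes its acceptance probability zero.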

Transformer (2 papers)

【1】 Vision Transformer Architecture Search

Authors: Xiu Su, Shan You, Jiyang Xie, Mingkai Zheng, Fei Wang, Chen Qian, Changshui Zhang, Xiaogang Wang, Chang Xu
Affiliations: School of Computer Science, The University of Sydney, SenseTime Research, Beijing University of Posts and Telecommunications, Department of Automation, Tsinghua University, Institute for Artificial Intelligence, Tsinghua University (THUAI)
Link: https://arxiv.org/abs/2106.13700
Abstract: Recently, transformers have shown great superiority in solving computer vision tasks by modeling images as a sequence of manually-split patches with a self-attention mechanism. However, current architectures of vision transformers (ViTs) are simply inherited from natural language processing (NLP) tasks and have not been sufficiently investigated and optimized. In this paper, we make a further step by examining the intrinsic structure of transformers for vision tasks and propose an architecture search method, dubbed ViTAS, to search for the optimal architecture with similar hardware budgets. Concretely, we design a new effective yet efficient weight sharing paradigm for ViTs, such that architectures with different token embedding, sequence size, number of heads, width, and depth can be derived from a single super-transformer. Moreover, to cater for the variance of distinct architectures, we introduce a private class token and self-attention maps in the super-transformer. In addition, to adapt the search to different budgets, we propose to search the sampling probability of the identity operation. Experimental results show that our ViTAS attains excellent results compared to existing pure transformer architectures. For example, with a 1.3G FLOPs budget, our searched architecture achieves 74.7% top-1 accuracy on ImageNet and is 2.5% higher than the current baseline ViT architecture. Code is available at https://github.com/xiusu/ViTAS
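The abstract describes vision transformers as modelling an image as a sequence of patches. A minimal sketch of that splitting step follows, assuming a single-channel image stored as a list of rows whose height and width are multiples of the patch size. The function name is illustrative and is not taken from the ViTAS code.

```python
def split_into_patches(image, patch):
    """Split an H x W image into a row-major sequence of flattened patch vectors.

    Assumes H and W are both multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    return [
        [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
        for r in range(0, height, patch)
        for c in range(0, width, patch)
    ]

# A 4 x 4 image with pixel values 0..15 yields four 2 x 2 patches.
image = [[4 * r + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
```

The sequence length is `(H / patch) * (W / patch)`, so the patch size directly controls the sequence size that the abstract lists as one of the searched dimensions.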

【2】 Shape registration in the time of transformers

Authors: Giovanni Trappolini, Luca Cosmo, Luca Moschella, Riccardo Marin, Emanuele Rodolà
Affiliations: Department of Computer Engineering, Sapienza University of Rome, Department of Computer Science, Simone Melzi
Link: https://arxiv.org/abs/2106.13679
Abstract: In this paper, we propose a transformer-based procedure for the efficient registration of non-rigid 3D point clouds. The proposed approach is data-driven and adopts for the first time the transformer architecture in the registration task. Our method is general and applies to different settings. Given a fixed template with some desired properties (e.g. skinning weights or other animation cues), we can register raw acquired data to it, thereby transferring all the template properties to the input geometry. Alternatively, given a pair of shapes, our method can register the first onto the second (or vice versa), obtaining a high-quality dense correspondence between the two. In both contexts, the quality of our results enables us to target real applications such as texture transfer and shape interpolation. Furthermore, we also show that including an estimation of the underlying density of the surface eases the learning process. By exploiting the potential of this architecture, we can train our model requiring only a sparse set of ground truth correspondences (10-20% of the total points). The proposed model and the analysis that we perform pave the way for future exploration of transformer-based architectures for registration and matching applications. Qualitative and quantitative evaluations demonstrate that our pipeline outperforms state-of-the-art methods for deformable and unordered 3D data registration on different datasets and scenarios.

GAN | Adversarial | Attack | Generation (4 papers)

【1】 Fostering Diversity in Spatial Evolutionary Generative Adversarial Networks

Authors: Jamal Toutouh, Erik Hemberg, Una-May O'Reilly
Affiliations: Massachusetts Institute of Technology, Cambridge, MA, USA
Comments: Accepted to be presented during Conference of the Spanish Association of Artificial Intelligence (CAEPIA 2021). arXiv admin note: substantial text overlap with arXiv:1905.12702
Link: https://arxiv.org/abs/2106.13590
Abstract: Generative adversarial networks (GANs) suffer from training pathologies such as instability and mode collapse, which mainly arise from a lack of diversity in their adversarial interactions. Co-evolutionary GAN (CoE-GAN) training algorithms have been shown to be resilient to these pathologies. This article introduces Mustangs, a spatially distributed CoE-GAN, which fosters diversity by using different loss functions during training. Experimental analysis on MNIST and CelebA demonstrated that Mustangs trains statistically more accurate generators.

【2】 Subgraph Federated Learning with Missing Neighbor Generation

Authors: Ke Zhang, Carl Yang, Xiaoxiao Li, Lichao Sun, Siu Ming Yiu
Affiliations: Emory University, Princeton University, Lehigh University, The University of Hong Kong
Link: https://arxiv.org/abs/2106.13430
Abstract: Graphs have been widely used in data mining and machine learning due to their unique representation of real-world objects and their interactions. As graphs are getting bigger and bigger nowadays, it is common to see their subgraphs separately collected and stored in multiple local systems. Therefore, it is natural to consider the subgraph federated learning setting, where each local system holds a small subgraph that may be biased from the distribution of the whole graph. Hence, subgraph federated learning aims to collaboratively train a powerful and generalizable graph mining model without directly sharing graph data. In this work, towards the novel yet realistic setting of subgraph federated learning, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; (2) FedSage+, which trains a missing neighbor generator along with FedSage to deal with missing links across local subgraphs. Empirical results on four real-world graph datasets with synthesized subgraph federated learning settings demonstrate the effectiveness and efficiency of our proposed techniques. At the same time, consistent theoretical implications are made towards their generalization ability on the global graphs.
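FedAvg, which the abstract names as the aggregation rule underlying FedSage, combines client models by a sample-weighted average of their parameters. The sketch below shows only that averaging step, assuming each client's parameters are flattened into a list of floats; the function name and the example numbers are illustrative.

```python
def fed_avg(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]

# Two hypothetical clients holding 1 and 3 samples respectively.
global_weights = fed_avg([[1.0, 0.0], [3.0, 4.0]], [1, 3])
```

Only parameters and sample counts cross the network in this step, which is why the raw subgraphs can stay on their local systems.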

【3】 On the (Un-)Avoidability of Adversarial Examples

Authors: Sadia Chowdhury, Ruth Urner
Affiliations: Lassonde School of Engineering, York University
Comments: ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI
Link: https://arxiv.org/abs/2106.13326
Abstract: The phenomenon of adversarial examples in deep learning models has caused substantial concern over their reliability. While many deep neural networks have shown impressive performance in terms of predictive accuracy, it has been shown that in many instances an imperceptible perturbation can falsely flip the network's prediction. Most research has then focused on developing defenses against adversarial attacks or learning under a worst-case adversarial loss. In this work, we take a step back and aim to provide a framework for determining whether a model's label change under small perturbation is justified (and when it is not). We carefully argue that adversarial robustness should be defined as a locally adaptive measure complying with the underlying distribution. We then suggest a definition for an adaptive robust loss, derive an empirical version of it, and develop a resulting data-augmentation framework. We prove that our adaptive data-augmentation maintains consistency of 1-nearest neighbor classification under deterministic labels and provide illustrative empirical evaluations.

【4】 A variational autoencoder approach for choice set generation and implicit perception of alternatives in choice modeling

Authors: Rui Yao, Shlomo Bekhor
Affiliations: Department of Civil and Environmental Engineering, Technion – Israel Institute of Technology, Haifa, Israel
Link: https://arxiv.org/abs/2106.13319
Abstract: This paper derives the generalized extreme value (GEV) model with implicit availability/perception (IAP) of alternatives and proposes a variational autoencoder (VAE) approach for choice set generation and implicit perception of alternatives. Specifically, the cross-nested logit (CNL) model with IAP is derived as an example of IAP-GEV models. The VAE approach is adapted to model the choice set generation process, in which the likelihood of perceiving chosen alternatives in the choice set is maximized. The VAE approach for route choice set generation is exemplified using a real dataset. The estimated IAP-CNL model has the best performance in terms of goodness-of-fit and prediction performance, compared to multinomial logit models and conventional choice set generation methods.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (3 papers)

【1】 Multi-Domain Active Learning: A Comparative Study

Authors: Rui He, Shan He, Ke Tang
Affiliations: Department of Computer Science and Engineering, Southern University of Science and Technology
Link: https://arxiv.org/abs/2106.13516
Abstract: Building classifiers on multiple domains is a practical problem in real life. Instead of building classifiers one by one, multi-domain learning (MDL) simultaneously builds classifiers on multiple domains. MDL utilizes the information shared among the domains to improve performance. As a supervised learning problem, the labeling effort is still high in MDL problems. Usually, this high labeling cost can be relieved by using active learning. Thus, it is natural to utilize active learning to reduce the labeling effort in MDL, and we refer to this setting as multi-domain active learning (MDAL). However, only a few works are built on this setting, and when researchers have to face this problem, there are no off-the-shelf solutions. Under this circumstance, combining current multi-domain learning models with single-domain active learning strategies might be a preliminary solution for the MDAL problem. To find out the potential of this preliminary solution, a comparative study over 5 models and 4 selection strategies is made in this paper. To the best of our knowledge, this is the first work that provides a formal definition of MDAL. Besides, this is the first comparative work for the MDAL problem. From the results, the Multinomial Adversarial Networks (MAN) model with a simple best vs second best (BvSB) uncertainty strategy shows its superiority in most cases. We take this combination as our off-the-shelf recommendation for the MDAL problem.
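The best vs second best (BvSB) strategy mentioned in the abstract ranks unlabeled samples by the gap between the two largest predicted class probabilities: the smaller the gap, the more uncertain the classifier. A minimal sketch, assuming each sample comes with a list of class probabilities; the function names and the example rows are illustrative.

```python
def bvsb_margin(probs):
    """Best-vs-second-best margin: a small margin means high uncertainty."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_queries(prob_rows, budget):
    """Indices of the `budget` unlabeled samples with the smallest margins."""
    order = sorted(range(len(prob_rows)), key=lambda i: bvsb_margin(prob_rows[i]))
    return order[:budget]

# Hypothetical predictions for three unlabeled samples over three classes.
rows = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
queries = select_queries(rows, 2)
```

Here the second sample (margin 0.05) is queried first and the third (margin 0.3) next, while the confidently classified first sample is left unlabeled.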

【2】 Active Learning with Multifidelity Modeling for Efficient Rare Event Simulation

Authors: S. L. N. Dhulipala, M. D. Shields, B. W. Spencer, C. Bolisetti, A. E. Slaughter, V. M. Laboure, P. Chakroborty
Affiliations: Department of Civil and Systems Engineering, Johns Hopkins University
Link: https://arxiv.org/abs/2106.13790
Abstract: While multifidelity modeling provides a cost-effective way to conduct uncertainty quantification with computationally expensive models, much greater efficiency can be achieved by adaptively deciding the number of required high-fidelity (HF) simulations, depending on the type and complexity of the problem and the desired accuracy in the results. We propose a framework for active learning with multifidelity modeling emphasizing the efficient estimation of rare events. Our framework works by fusing a low-fidelity (LF) prediction with an HF-inferred correction, filtering the corrected LF prediction to decide whether to call the high-fidelity model, and for enhanced subsequent accuracy, adapting the correction for the LF prediction after every HF model call. The framework does not make any assumptions as to the LF model type or its correlations with the HF model. In addition, for improved robustness when estimating smaller failure probabilities, we propose using dynamic active learning functions that decide when to call the HF model. We demonstrate our framework using several academic case studies and two finite element (FE) model case studies: estimating Navier-Stokes velocities using the Stokes approximation and estimating stresses in a transversely isotropic model subjected to displacements via a coarsely meshed isotropic model. Across these case studies, not only did the proposed framework estimate the failure probabilities accurately, but compared with either Monte Carlo or a standard variance reduction method, it also required only a small fraction of the calls to the HF model.

【3】 A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images

Authors: Gabriel García, Anna Esteve, Adrián Colomer, David Ramos, Valery Naranjo
Affiliations: Instituto de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, Valencia, Spain, Hospital Universitario y Politécnico La Fe, Avinguda de Fernando Abril Martorell, Valencia, Spain
Link: https://arxiv.org/abs/2106.13559
Abstract: Recently, bladder cancer has increased significantly in terms of incidence and mortality. Currently, two subtypes are known based on tumour growth: non-muscle invasive (NMIBC) and muscle-invasive bladder cancer (MIBC). In this work, we focus on the MIBC subtype because it has the worst prognosis and can spread to adjacent organs. We present a self-learning framework to grade bladder cancer from histological images stained via immunohistochemical techniques. Specifically, we propose a novel Deep Convolutional Embedded Attention Clustering (DCEAC) which allows classifying histological patches into different severity levels of the disease, according to the patterns established in the literature. The proposed DCEAC model follows a two-step fully unsupervised learning methodology to discern between non-tumour, mild and infiltrative patterns from high-resolution samples of 512x512 pixels. Our system outperforms previous clustering-based methods by including a convolutional attention module, which allows refining the features of the latent space before the classification stage. The proposed network exceeds state-of-the-art approaches by 2-3% across different metrics, achieving a final average accuracy of 0.9034 in a multi-class scenario. Furthermore, the reported class activation maps show that our model is able to learn by itself the same patterns that clinicians consider relevant, without requiring prior annotation steps. This represents a breakthrough in muscle-invasive bladder cancer grading, narrowing the gap with respect to models trained on labelled data.

Transfer | Zero/Few/One-Shot | Adaptation (3 papers)

【1】 Private Adaptive Gradient Methods for Convex Optimization

Authors: Hilal Asi, John Duchi, Alireza Fallah, Omid Javidbakht, Kunal Talwar
Affiliations: Department of Electrical Engineering & Computer Science, Massachusetts Institute of Technology
Comments: To appear in 38th International Conference on Machine Learning (ICML 2021)
Link: https://arxiv.org/abs/2106.13756
Abstract: We study adaptive methods for differentially private convex optimization, proposing and analyzing differentially private variants of a Stochastic Gradient Descent (SGD) algorithm with adaptive stepsizes, as well as the AdaGrad algorithm. We provide upper bounds on the regret of both algorithms and show that the bounds are (worst-case) optimal. As a consequence of our development, we show that our private versions of AdaGrad outperform adaptive SGD, which in turn outperforms traditional SGD in scenarios with non-isotropic gradients where (non-private) AdaGrad provably outperforms SGD. The major challenge is that the isotropic noise typically added for privacy dominates the signal in gradient geometry for high-dimensional problems; approaches to this that effectively optimize over lower-dimensional subspaces simply ignore the actual problems that varying gradient geometries introduce. In contrast, we study non-isotropic clipping and noise addition, developing a principled theoretical approach; the consequent procedures also enjoy significantly stronger empirical performance than prior approaches.

【2】 Privileged Zero-Shot AutoML

Authors: Nikhil Singh, Brandon Kates, Jeff Mentch, Anant Kharkar, Madeleine Udell, Iddo Drori
Affiliations: Cornell University, Harvard University, Columbia University
Comments: 16 pages, 4 figures
Link: https://arxiv.org/abs/2106.13743
Abstract: This work improves the quality of automated machine learning (AutoML) systems by using dataset and function descriptions while significantly decreasing computation time from minutes to milliseconds by using a zero-shot approach. Given a new dataset and a well-defined machine learning task, humans begin by reading a description of the dataset and documentation for the algorithms to be used. This work is the first to use these textual descriptions, which we call privileged information, for AutoML. We use a pre-trained Transformer model to process the privileged text and demonstrate that using this information improves AutoML performance. Thus, our approach leverages the progress of unsupervised representation learning in natural language processing to provide a significant boost to AutoML. We demonstrate that using only textual descriptions of the data and functions achieves reasonable classification performance, and adding textual descriptions to data meta-features improves classification across tabular datasets. To achieve zero-shot AutoML we train a graph neural network with these description embeddings and the data meta-features. Each node represents a training dataset, which we use to predict the best machine learning pipeline for a new test dataset in a zero-shot fashion. Our zero-shot approach rapidly predicts a high-quality pipeline for a supervised learning task and dataset. In contrast, most AutoML systems require tens or hundreds of pipeline evaluations. We show that zero-shot AutoML reduces running and prediction times from minutes to milliseconds, consistently across datasets. By speeding up AutoML by orders of magnitude this work demonstrates real-time AutoML.

【3】 Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Authors: Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel
Affiliations: UC Irvine, University of Edinburgh, UC Berkeley, Facebook AI Research, University College London
Link: https://arxiv.org/abs/2106.13353
Abstract: Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts, prompts that contain neither task-specific templates nor training examples, and achieve competitive accuracy to manually-tuned prompts across a wide range of tasks. While finetuning LMs does introduce new parameters for each downstream task, we show that this memory overhead can be substantially reduced: finetuning only the bias terms can achieve comparable or better accuracy than standard finetuning while only updating 0.1% of the parameters. All in all, we recommend finetuning LMs for few-shot learning as it is more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.
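
The bias-only finetuning idea in the abstract above can be illustrated with a minimal sketch. The parameter names and layer sizes below are hypothetical (they are not taken from the paper), and the code only shows how one might select the bias terms as the trainable subset and measure how small that subset is; it does not train anything.

```python
from typing import Dict, List


def select_bias_parameters(param_sizes: Dict[str, int]) -> List[str]:
    """Return the names of parameters that would stay trainable (bias terms only)."""
    return [name for name in param_sizes if name.endswith(".bias")]


def trainable_fraction(param_sizes: Dict[str, int], trainable: List[str]) -> float:
    """Fraction of all scalar parameters that belong to the trainable subset."""
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in trainable) / total


# Hypothetical parameter table for a tiny model: name -> number of scalars.
toy_model = {
    "encoder.dense.weight": 768 * 768,
    "encoder.dense.bias": 768,
    "classifier.weight": 768 * 2,
    "classifier.bias": 2,
}

trainable = select_bias_parameters(toy_model)
fraction = trainable_fraction(toy_model, trainable)
print(trainable, f"{fraction:.4%}")
```

In a real framework the same selection would typically be expressed by freezing every parameter whose name does not match the bias pattern; the exact mechanism depends on the library in use.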

Reinforcement Learning (4 papers)

【1】 Multi-Goal Reinforcement Learning environments for simulated Franka Emika Panda robot

Authors: Quentin Gallouédec, Nicolas Cazin, Emmanuel Dellandréa, Liming Chen
Affiliations: École Centrale de Lyon, LIRIS, CNRS UMR, France
Comments: 9 pages, 5 figures, 2 tables
Link: https://arxiv.org/abs/2106.13687
Abstract: This technical report presents panda-gym, a set of Reinforcement Learning (RL) environments for the Franka Emika Panda robot integrated with OpenAI Gym. Five tasks are included: reach, push, slide, pick & place and stack. They all follow a Multi-Goal RL framework, allowing the use of goal-oriented RL algorithms. To foster open research, we chose to use the open-source physics engine PyBullet. The implementation chosen for this package makes it very easy to define new tasks or new robots. This report also presents a baseline of results obtained with state-of-the-art model-free off-policy algorithms. panda-gym is open-source at https://github.com/qgallouedec/panda-gym.
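
As a rough illustration of what a multi-goal interface looks like, the sketch below follows the common goal-conditioned convention of a dictionary observation with `observation`, `achieved_goal` and `desired_goal` keys plus a sparse reward computed from the two goals. It is not panda-gym's code: the helper names and the 0.05 distance threshold are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence

DISTANCE_THRESHOLD = 0.05  # assumed success tolerance, chosen for illustration


def compute_reward(achieved_goal: Sequence[float],
                   desired_goal: Sequence[float],
                   threshold: float = DISTANCE_THRESHOLD) -> float:
    """Sparse reward: 0.0 when the goal is reached, -1.0 otherwise."""
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= threshold else -1.0


def make_observation(state: Sequence[float],
                     achieved_goal: Sequence[float],
                     desired_goal: Sequence[float]) -> Dict[str, List[float]]:
    """Bundle the three parts of a goal-conditioned observation."""
    return {
        "observation": list(state),
        "achieved_goal": list(achieved_goal),
        "desired_goal": list(desired_goal),
    }


def relabel_with_achieved_goal(observation: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hindsight-style relabeling: pretend the achieved goal was the target."""
    relabeled = dict(observation)
    relabeled["desired_goal"] = list(observation["achieved_goal"])
    return relabeled


obs = make_observation(state=[0.1, 0.0, 0.2],
                       achieved_goal=[0.1, 0.0, 0.2],
                       desired_goal=[0.3, 0.0, 0.2])
reward = compute_reward(obs["achieved_goal"], obs["desired_goal"])
hindsight = relabel_with_achieved_goal(obs)
hindsight_reward = compute_reward(hindsight["achieved_goal"], hindsight["desired_goal"])
```

Because the reward depends only on the two goals, a stored transition can be re-scored against a different goal after the fact, which is what goal-oriented algorithms such as hindsight relabeling rely on.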

【2】 Branch Prediction as a Reinforcement Learning Problem: Why, How and Case Studies

Authors: Anastasios Zouzias, Kleovoulos Kalaitzidis, Boris Grot
Affiliations: Huawei Technologies, Zurich Research Center, Switzerland; University of Edinburgh, School of Informatics, United Kingdom
Comments: 6 pages, appeared in ML workshop for Computer Architecture and Systems 2021
Link: https://arxiv.org/abs/2106.13429
Abstract: Recent years have seen stagnating improvements to branch predictor (BP) efficacy and a dearth of fresh ideas in branch predictor design, calling for fresh thinking in this area. This paper argues that looking at BP from the viewpoint of Reinforcement Learning (RL) facilitates systematic reasoning about, and exploration of, BP designs. We describe how to apply the RL formulation to branch predictors, show that existing predictors can be succinctly expressed in this formulation, and study two RL-based variants of conventional BPs.
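
To make the RL reading of a branch predictor concrete, here is a small sketch that treats a classic two-bit saturating counter as an agent: the counter is its internal state, the prediction is its action, and a reward of +1 or -1 reflects whether the prediction matched the branch outcome. This mapping is an illustration chosen for this digest, not necessarily the formulation used in the paper.

```python
from typing import Iterable


class TwoBitCounterPredictor:
    """Two-bit saturating counter: values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counter = 2  # start in the "weakly taken" state

    def predict(self) -> bool:
        """Action: predict taken when the counter is in its upper half."""
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        """Move the counter one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def total_reward(trace: Iterable[bool]) -> int:
    """Run the predictor over a branch trace; +1 per correct prediction, -1 per miss."""
    predictor = TwoBitCounterPredictor()
    reward = 0
    for taken in trace:
        action = predictor.predict()
        reward += 1 if action == taken else -1
        predictor.update(taken)
    return reward


# A loop branch that is taken nine times and then falls through once.
loop_trace = [True] * 9 + [False]
print(total_reward(loop_trace))  # 9 correct predictions and 1 miss -> 8
```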

【3】 Multi-Robot Deep Reinforcement Learning for Mobile Navigation

Authors: Katie Kang, Gregory Kahn, Sergey Levine
Affiliations: University of California, Berkeley
Link: https://arxiv.org/abs/2106.13280
Abstract: Deep reinforcement learning algorithms require large and diverse datasets in order to learn successful policies for perception-based mobile navigation. However, gathering such datasets with a single robot can be prohibitively expensive. Collecting data with multiple different robotic platforms with possibly different dynamics is a more scalable approach to large-scale data collection. But how can deep reinforcement learning algorithms leverage such heterogeneous datasets? In this work, we propose a deep reinforcement learning algorithm with hierarchically integrated models (HInt). At training time, HInt learns separate perception and dynamics models, and at test time, HInt integrates the two models in a hierarchical manner and plans actions with the integrated model. This method of planning with hierarchically integrated models allows the algorithm to train on datasets gathered by a variety of different platforms, while respecting the physical capabilities of the deployment robot at test time. Our mobile navigation experiments show that HInt outperforms conventional hierarchical policies and single-source approaches.

【4】 Reinforcement Learning for Mean Field Games, with Applications to Economics

Authors: Andrea Angiuli, Jean-Pierre Fouque, Mathieu Laurière
Link: https://arxiv.org/abs/2106.13755
Abstract: Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large finite number of agents and have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which leads to the development of reinforcement learning (RL) methods. After reviewing the literature on this topic, we present a two-timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm. The main novelty of this method is to simultaneously update an action-value function and a distribution but with different rates, in a model-free fashion. Depending on the ratio of the two learning rates, the algorithm learns either the MFG or the MFC solution. To illustrate this method, we apply it to a mean field problem of accumulated consumption in finite horizon with HARA utility function, and to a trader's optimal liquidation problem.

Medical (3 papers)

【1】 Assessing the Lockdown Effects on Air Quality during COVID-19 Era

Authors: Ioannis Kavouras, Eftychios Protopapadakis, Maria Kaselimia, Emmanuel Sardis, Nikolaos Doulamis
Link: https://arxiv.org/abs/2106.13750
Abstract: In this work we investigate the short-term variations in air quality emissions attributed to the prevention measures applied in different cities to mitigate the spread of COVID-19. In particular, we focus on the concentration effects regarding specific pollutant gases, such as carbon monoxide (CO), ozone (O3), nitrogen dioxide (NO2) and sulphur dioxide (SO2). The assessment of the impact of lockdown on air quality focused on four European cities (Athens, Gladsaxe, Lodz and Rome). Available data on pollutant factors were obtained using global satellite observations. The level of the employed prevention measures is determined using the Oxford COVID-19 Government Response Tracker. The second part of the analysis employed a variety of machine learning tools to estimate the concentration of each pollutant two days ahead. The results showed that a weak to moderate correlation exists between the corresponding measures and the pollutant factors and that it is possible to create models which can predict the behaviour of the pollutant gases under daily human activities.

【2】 Interpreting Depression From Question-wise Long-term Video Recording of SDS Evaluation

Authors: Wanqing Xie, Lizhong Liang, Yao Lu, Chen Wang, Jihong Shen, Hui Luo, Xiaofeng Liu
Affiliations: School of Computer Science and Engineering, Sun Yat-sen University; Harbin Engineering University
Comments: Published in IEEE Journal of Biomedical and Health Informatics
Link: https://arxiv.org/abs/2106.13393
Abstract: The Self-Rating Depression Scale (SDS) questionnaire has frequently been used for efficient preliminary depression screening. However, this uncontrolled self-administered measure can easily be affected by careless or deceptive answers, producing results that differ from the clinician-administered Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically, facial expression (FE) and actions play a vital role in clinician-administered evaluation, while FE and action are underexplored for self-administered evaluations. In this work, we collect a novel dataset of 200 subjects to demonstrate the validity of self-rating questionnaires with their corresponding question-wise video recording. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for the long-term variable-length video, which is also conditioned on the questionnaire results and the answering time. Specifically, we resort to a hierarchical model which utilizes a 3D CNN for local temporal pattern exploration and a redundancy-aware self-attention (RAS) scheme for question-wise global feature aggregation. Targeting redundant long-term FE video processing, our RAS is able to effectively exploit the correlations of each video clip within a question set to emphasize the discriminative information and eliminate the redundancy based on feature pair-wise affinity. Then, the question-wise video feature is concatenated with the questionnaire scores for final depression detection. Our thorough evaluations also show the validity of fusing SDS evaluation and its video recording, and the superiority of our framework to the conventional state-of-the-art temporal modeling methods.

【3】 Disease Progression Modeling Workbench 360

Authors: Parthasarathy Suryanarayanan, Prithwish Chakraborty, Piyush Madan, Kibichii Bore, William Ogallo, Rachita Chandra, Mohamed Ghalwash, Italo Buleje, Sekou Remy, Shilpa Mahatma, Pablo Meyer, Jianying Hu
Affiliations: Center for Computational Health, IBM Research, NY, USA; IBM Research, Nairobi, Kenya
Comments: Submitted to OHDSI Collaborator Showcase, 2021
Link: https://arxiv.org/abs/2106.13265
Abstract: In this work we introduce Disease Progression Modeling workbench 360 (DPM360), an open-source clinical informatics framework for collaborative research and delivery of healthcare AI. DPM360, when fully developed, will manage the entire modeling life cycle, from data analysis (e.g., cohort identification) to machine learning algorithm development and prototyping. DPM360 augments the advantages of data model standardization and tooling (OMOP-CDM, Athena, ATLAS) provided by the widely-adopted OHDSI initiative with a powerful machine learning training framework, and a mechanism for rapid prototyping through automatic deployment of models as containerized services to a cloud environment.

Recommendation (1 paper)

【1】 Semantic annotation for computational pathology: Multidisciplinary experience and best practice recommendations

Authors: Noorul Wahab, Islam M Miligy, Katherine Dodd, Harvir Sahota, Michael Toss, Wenqi Lu, Mostafa Jahanifar, Mohsin Bilal, Simon Graham, Young Park, Giorgos Hadjigeorghiou, Abhir Bhalerao, Ayat Lashen, Asmaa Ibrahim, Ayaka Katayama, Henry O Ebili, Matthew Parkin, Tom Sorell, Shan E Ahmed Raza, Emily Hero, Hesham Eldaly, Yee Wah Tsang, Kishore Gopalakrishnan, David Snead, Emad Rakha, Nasir Rajpoot, Fayyaz Minhas
Affiliations: Tissue Image Analytics Centre, University of Warwick, Coventry, UK; University of Nottingham, Nottingham, UK; Department of Pathology, Menoufia University, Egypt; University Hospital Coventry and Warwickshire, Coventry, UK
Link: https://arxiv.org/abs/2106.13689
Abstract: Recent advances in whole slide imaging (WSI) technology have led to the development of a myriad of computer vision and artificial intelligence (AI) based diagnostic, prognostic, and predictive algorithms. Computational Pathology (CPath) offers an integrated solution to utilize information embedded in pathology WSIs beyond what we obtain through visual assessment. For automated analysis of WSIs and validation of machine learning (ML) models, annotations at the slide, tissue and cellular levels are required. The annotation of important visual constructs in pathology images is an important component of CPath projects. Improper annotations can result in algorithms which are hard to interpret and can potentially produce inaccurate and inconsistent results. Despite the crucial role of annotations in CPath projects, there are no well-defined guidelines or best practices on how annotations should be carried out. In this paper, we address this shortcoming by presenting the experience and best practices acquired during the execution of a large-scale annotation exercise involving a multidisciplinary team of pathologists, ML experts and researchers as part of the Pathology image data Lake for Analytics, Knowledge and Education (PathLAKE) consortium. We present a real-world case study along with examples of different types of annotations, diagnostic algorithm, annotation data dictionary and annotation constructs. The analyses reported in this work highlight best practice recommendations that can be used as annotation guidelines over the lifecycle of a CPath project.

Federated Learning | Privacy Protection | Encryption (2 papers)

【1】 Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy

Authors: Xinwei Zhang, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu, Jinfeng Yi
Affiliations: Department of Electrical and Computer Engineering, University of Minnesota; School of Computer Science, Carnegie Mellon University
Link: https://arxiv.org/abs/2106.13673
Abstract: Providing privacy protection has been one of the primary motivations of Federated Learning (FL). Recently, there has been a line of work on incorporating the formal privacy notion of differential privacy with FL. To guarantee the client-level differential privacy in FL algorithms, the clients' transmitted model updates have to be clipped before adding privacy noise. Such a clipping operation is substantially different from its counterpart of gradient clipping in the centralized differentially private SGD and has not been well-understood. In this paper, we first empirically demonstrate that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients' updates become similar for several popular deep architectures. Based on this key observation, we provide the convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates. To the best of our knowledge, this is the first work that rigorously investigates theoretical and empirical issues regarding the clipping operation in FL algorithms.
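
The "clip each client's update, then add noise" step described in the abstract can be sketched in a few lines. This is a generic illustration of client-level clipping with Gaussian noise, not the paper's exact algorithm: the function names, the noise scale `noise_multiplier * clip_norm / num_clients`, and the plain-list vectors are all assumptions made for this example.

```python
import math
import random
from typing import List, Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def clip_update(update: Sequence[float], clip_norm: float) -> List[float]:
    """Scale the update down so that its L2 norm is at most clip_norm."""
    norm = l2_norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(global_model: Sequence[float],
                    client_updates: Sequence[Sequence[float]],
                    clip_norm: float,
                    noise_multiplier: float,
                    rng: random.Random) -> List[float]:
    """One server round: clip every client update, average, add Gaussian noise, apply."""
    clipped = [clip_update(update, clip_norm) for update in client_updates]
    num_clients = len(clipped)
    average = [sum(update[i] for update in clipped) / num_clients
               for i in range(len(global_model))]
    # Assumed noise scale: proportional to the clipping norm, shrinking with more clients.
    sigma = noise_multiplier * clip_norm / num_clients
    noisy_average = [value + rng.gauss(0.0, sigma) for value in average]
    return [weight + delta for weight, delta in zip(global_model, noisy_average)]


updates = [[3.0, 4.0], [0.3, 0.4]]  # the first update has norm 5.0, the second 0.5
new_model = dp_fedavg_round([0.0, 0.0], updates, clip_norm=1.0,
                            noise_multiplier=0.0, rng=random.Random(0))
```

With `noise_multiplier=0.0` the round is deterministic, which makes the clipping bias visible: the large update is shrunk to norm 1.0 while the small one passes through unchanged, so the average is pulled toward the smaller client.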

【2】 Federated Noisy Client Learning

Authors: Li Li, Huazhu Fu, Bo Han, Cheng-Zhong Xu, Ling Shao
Affiliations: Shenzhen Institutes of Advanced Technology, CAS; Inception Institute of Artificial Intelligence, UAE; Department of Computer Science, Hong Kong Baptist University; University of Macau
Link: https://arxiv.org/abs/2106.13239
Abstract: Federated learning (FL) collaboratively aggregates a shared global model depending on multiple local clients, while keeping the training data decentralized in order to preserve data privacy. However, standard FL methods ignore the noisy client issue, which may harm the overall performance of the aggregated model. In this paper, we first analyze the noisy client statement, and then model noisy clients with different noise distributions (e.g., Bernoulli and truncated Gaussian distributions). To learn with noisy clients, we propose a simple yet effective FL framework, named Federated Noisy Client Learning (Fed-NCL), which is a plug-and-play algorithm and contains two main components: a data quality measurement (DQM) to dynamically quantify the data quality of each participating client, and a noise robust aggregation (NRA) to adaptively aggregate the local models of each client by jointly considering the amount of local training data and the data quality of each client. Our Fed-NCL can be easily applied in any standard FL workflow to handle the noisy client issue. Experimental results on various datasets demonstrate that our algorithm boosts the performances of different state-of-the-art systems with noisy clients.
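
The abstract says the aggregation step weighs each client by both its amount of training data and its data quality, but does not give the formula. The sketch below therefore uses a deliberately simple stand-in, weights proportional to `num_samples * quality_score`, purely to show the shape of such an aggregation; the function name, the product rule, and the example numbers are assumptions, not Fed-NCL's actual NRA rule.

```python
from typing import List, Sequence


def quality_weighted_aggregate(client_models: Sequence[Sequence[float]],
                               num_samples: Sequence[int],
                               quality_scores: Sequence[float]) -> List[float]:
    """Average client models with weights proportional to data amount times quality."""
    raw_weights = [n * q for n, q in zip(num_samples, quality_scores)]
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("at least one client needs positive data amount and quality")
    weights = [w / total for w in raw_weights]
    dimension = len(client_models[0])
    return [sum(w * model[i] for w, model in zip(weights, client_models))
            for i in range(dimension)]


# Two clients with equal data; the second is judged fully noisy (quality 0.0).
models = [[1.0, 1.0], [3.0, 5.0]]
robust = quality_weighted_aggregate(models, num_samples=[100, 100],
                                    quality_scores=[1.0, 0.0])
# With equal quality scores the rule reduces to sample-count weighting, as in plain FedAvg.
plain = quality_weighted_aggregate(models, num_samples=[100, 100],
                                   quality_scores=[1.0, 1.0])
```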

Inference | Analysis | Understanding | Interpretation (7 papers)

【1】 Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent

Authors: Spencer Frei, Quanquan Gu
Comments: 14 pages
Link: https://arxiv.org/abs/2106.13792
Abstract: Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the problem studied in each setting, relying on different assumptions on the distribution, optimization parameters, and network architectures, making it difficult to generalize across different settings. In this work, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.

【2】 Transient Stability Analysis with Physics-Informed Neural Networks

Authors: Jochen Stiasny, Georgios S. Misyris, Spyros Chatzivasileiadis
Affiliations: Technical University of Denmark, Department of Electrical Engineering
Link: https://arxiv.org/abs/2106.13638
Abstract: Solving the ordinary differential equations that govern the power system is an indispensable part of transient stability analysis. However, the traditionally applied methods either carry a significant computational burden, require model simplifications, or use overly conservative surrogate models. Neural networks can circumvent these limitations but are faced with high demands on the used datasets. Furthermore, they are agnostic to the underlying governing equations. Physics-informed neural networks tackle this problem and we explore their advantages and challenges in this paper. We illustrate the findings on the Kundur two-area system and highlight possible pathways forward in developing this method further.

【3】 Limitations of machine learning for building energy prediction: ASHRAE Great Energy Predictor III Kaggle competition error analysis

Authors: Clayton Miller, Bianca Picchetti, Chun Fu, Jovan Pantelic
Affiliations: Building and Urban Data Science (BUDS) Lab, National University of Singapore (NUS), Singapore; KU Leuven, Belgium
Link: https://arxiv.org/abs/2106.13475
Abstract: Machine learning for building energy prediction has exploded in popularity in recent years, yet an understanding of its limitations and potential for improvement is lacking. The ASHRAE Great Energy Predictor III (GEPIII) Kaggle competition was the largest building energy meter machine learning competition ever held with 4,370 participants who submitted 39,403 predictions. The test data set included two years of hourly electricity, hot water, chilled water, and steam readings from 2,380 meters in 1,448 buildings at 16 locations. This paper analyzes the various sources and types of residual model error from an aggregation of the competition's top 50 solutions. This analysis reveals the limitations for machine learning using the standard model inputs of historical meter, weather, and basic building metadata. The types of error are classified according to the amount of time errors occur in each instance, abrupt versus gradual behavior, the magnitude of error, and whether the error existed on single buildings or several buildings at once from a single location. The results show machine learning models have errors within a range of acceptability on 79.1% of the test data. Lower magnitude model errors occur in 16.1% of the test data. These discrepancies can likely be addressed through additional training data sources or innovations in machine learning. Higher magnitude errors occur in 4.8% of the test data and are unlikely to be accurately predicted regardless of innovation. There is a diversity of error behavior depending on the energy meter type (electricity prediction models have unacceptable error in under 10% of test data, while hot water is over 60%) and building use type (public service less than 14%, while technology/science is just over 46%).

【4】 Bayesian Inference in High-Dimensional Time-Series with the Orthogonal Stochastic Linear Mixing Model

Authors: Rui Meng, Kristofer Bouchard
Affiliations: Lawrence Berkeley National Laboratory; University of California, Berkeley
Link: https://arxiv.org/abs/2106.13379
Abstract: Many modern time-series datasets contain large numbers of output response variables sampled for prolonged periods of time. For example, in neuroscience, the activities of hundreds to thousands of neurons are recorded during behaviors and in response to sensory stimuli. Multi-output Gaussian process models leverage the nonparametric nature of Gaussian processes to capture structure across multiple outputs. However, this class of models typically assumes that the correlations between the output response variables are invariant in the input space. Stochastic linear mixing models (SLMM) assume the mixture coefficients depend on input, making them more flexible and effective to capture complex output dependence. However, currently, the inference for SLMMs is intractable for large datasets, making them inapplicable to several modern time-series problems. In this paper, we propose a new regression framework, the orthogonal stochastic linear mixing model (OSLMM) that introduces an orthogonal constraint amongst the mixing coefficients. This constraint reduces the computational burden of inference while retaining the capability to handle complex output dependence. We provide Markov chain Monte Carlo inference procedures for both SLMM and OSLMM and demonstrate superior model scalability and reduced prediction error of OSLMM compared with state-of-the-art methods on several real-world applications. In neurophysiology recordings, we use the inferred latent functions for compact visualization of population responses to auditory stimuli, and demonstrate superior results compared to a competing method (GPFA). Together, these results demonstrate that OSLMM will be useful for the analysis of diverse, large-scale time-series datasets.

【5】 CausalCity: Complex Simulations with Agency for Causal Discovery and Reasoning

Authors: Daniel McDuff, Yale Song, Jiyoung Lee, Vibhav Vineet, Sai Vemprala, Nicholas Gyde, Hadi Salman, Shuang Ma, Kwanghoon Sohn, Ashish Kapoor Affiliations: Microsoft, Redmond, USA; Yonsei University, South Korea; MIT, Cambridge, USA Link: https://arxiv.org/abs/2106.13364 Abstract: The ability to perform causal and counterfactual reasoning is a central property of human intelligence. Decision-making systems that can perform these types of reasoning have the potential to be more generalizable and interpretable. Simulations have helped advance the state-of-the-art in this domain, by providing the ability to systematically vary parameters (e.g., confounders) and generate examples of the outcomes in the case of counterfactual scenarios. However, simulating complex temporal causal events in multi-agent scenarios, such as those that exist in driving and vehicle navigation, is challenging. To help address this, we present a high-fidelity simulation environment that is designed for developing algorithms for causal discovery and counterfactual reasoning in the safety-critical context. A core component of our work is to introduce agency, such that it is simple to define and create complex scenarios using high-level definitions. The vehicles then operate with agency to complete these objectives, meaning low-level behaviors need only be controlled if necessary. We perform experiments with three state-of-the-art methods to create baselines and highlight the affordances of this environment. Finally, we highlight challenges and opportunities for future work.

【6】 What will it take to generate fairness-preserving explanations?

Authors: Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju Affiliations: Brown University, USA; Harvard University Comments: Presented at ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI Link: https://arxiv.org/abs/2106.13346 Abstract: In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern. However, the link between the fairness of the black-box model and the behavior of explanations for the black-box is unclear. We focus on explanations applied to tabular datasets, suggesting that explanations do not necessarily preserve the fairness properties of the black-box algorithm. In other words, explanation algorithms can ignore or obscure critical relevant properties, creating incorrect or misleading explanations. More broadly, we propose future research directions for evaluating and generating explanations such that they are informative and relevant from a fairness perspective.

【7】 Tighter Analysis of Alternating Stochastic Gradient Method for Stochastic Nested Problems

Authors: Tianyi Chen, Yuejiao Sun, Wotao Yin Affiliations: Rensselaer Polytechnic Institute; University of California, Los Angeles Comments: Submitted for publication Link: https://arxiv.org/abs/2106.13781 Abstract: Stochastic nested optimization, including stochastic compositional, min-max and bilevel optimization, is gaining popularity in many machine learning applications. While the three problems share the nested structure, existing works often treat them separately, and thus develop problem-specific algorithms and their analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) are still prevalent in solving this class of nested problems, but they are believed to have a slower convergence rate than that of the non-nested problems. This paper unifies several SGD-type updates for stochastic nested problems into a single SGD approach that we term the ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging the hidden smoothness of the problem, this paper presents a tighter analysis of ALSET for stochastic nested problems. Under the new analysis, achieving an $\epsilon$-stationary point of the nested problem requires $\mathcal{O}(\epsilon^{-2})$ samples. Under certain regularity conditions, applying our results to stochastic compositional, min-max and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases. Our results explain why simple SGD-type algorithms in stochastic nested problems all work very well in practice without the need for further modifications.
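
The idea of alternating SGD-type updates on a nested problem can be illustrated with a toy example. The sketch below is not the ALSET algorithm from the paper: the objective, the constant step sizes `alpha` and `beta`, the Gaussian gradient noise, and the function name `alternating_sgd` are all assumptions chosen for illustration. It alternates a noisy ascent step on the inner variable with a noisy descent step on the outer variable of a small min-max problem whose saddle point is at the origin.

```python
import random


def alternating_sgd(steps=2000, alpha=0.05, beta=0.05, noise=0.1, seed=0):
    """Toy min-max problem: min_x max_y f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.

    The saddle point is (0, 0). Gradients are corrupted with Gaussian noise
    to mimic stochastic gradient estimates (an assumption of this sketch).
    """
    rng = random.Random(seed)
    x, y = 2.0, -1.0
    for _ in range(steps):
        # Inner (max) variable: noisy gradient ascent, df/dy = x - y.
        grad_y = (x - y) + rng.gauss(0.0, noise)
        y += beta * grad_y
        # Outer (min) variable: noisy gradient descent using the fresh y,
        # df/dx = x + y.
        grad_x = (x + y) + rng.gauss(0.0, noise)
        x -= alpha * grad_x
    return x, y


x_final, y_final = alternating_sgd()
print(x_final, y_final)
```

With these settings the iterates settle into a small neighbourhood of the saddle point; the paper's sample-complexity result concerns a far more general setting than this toy case.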

Detection (5 papers)

【1】 Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

Authors: Amir Ivry, Baruch Berdugo, Israel Cohen Affiliations: Technion-Israel Institute of Technology Link: https://arxiv.org/abs/2106.13763 Abstract: We address voice activity detection in acoustic environments of transients and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder-decoder based neural network architecture. This structure involves an encoder that maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, which is trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known Diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods. The improvement is achieved in accuracy, robustness, and generalization ability. Our model performs in a real-time manner and can be integrated into audio-based communication systems. We also present a batch algorithm which obtains an even higher accuracy for off-line applications.

【2】 Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning

Authors: Alec Farid, Sushant Veer, Anirudha Majumdar Affiliations: Department of Mechanical and Aerospace Engineering, Princeton University Link: https://arxiv.org/abs/2106.13703 Abstract: Our goal is to perform out-of-distribution (OOD) detection, i.e., to detect when a robot is operating in environments that are drawn from a different distribution than the environments used to train the robot. We leverage Probably Approximately Correct (PAC)-Bayes theory in order to train a policy with a guaranteed bound on performance on the training distribution. Our key idea for OOD detection then relies on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD. We formalize this via statistical techniques based on p-values and concentration inequalities. The resulting approach (i) provides guaranteed confidence bounds on OOD detection, and (ii) is task-driven and sensitive only to changes that impact the robot's performance. We demonstrate our approach on a simulated example of grasping objects with unfamiliar poses or shapes. We also present both simulation and hardware experiments for a drone performing vision-based obstacle avoidance in unfamiliar environments (including wind disturbances and different obstacle densities). Our examples demonstrate that we can perform task-driven OOD detection within just a handful of trials. Comparisons with baselines also demonstrate the advantages of our approach in terms of providing statistical guarantees and being insensitive to task-irrelevant distribution shifts.
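
The intuition that a violated performance bound is evidence of OOD operation can be sketched with a concentration inequality. The example below is one possible reading, not necessarily the paper's exact test: it assumes per-trial costs lie in [0, 1], that `cost_bound` is a valid upper bound on the expected cost in distribution, and it uses Hoeffding's inequality; the function name `hoeffding_p_value` is hypothetical.

```python
import math


def hoeffding_p_value(test_costs, cost_bound):
    """Upper bound on the p-value for the null 'expected cost <= cost_bound'.

    Assumes i.i.d. costs in [0, 1]. By Hoeffding's inequality, under the null
    the probability that the sample mean exceeds the bound by at least t is
    at most exp(-2 * n * t**2).
    """
    n = len(test_costs)
    mean_cost = sum(test_costs) / n
    excess = max(0.0, mean_cost - cost_bound)
    return math.exp(-2.0 * n * excess ** 2)


# Ten trials with high cost against a bound of 0.2: strong evidence of OOD.
p_shifted = hoeffding_p_value([0.9] * 10, cost_bound=0.2)
# Ten trials within the bound: no evidence against in-distribution operation.
p_nominal = hoeffding_p_value([0.1] * 10, cost_bound=0.2)
print(p_shifted, p_nominal)
```

A small p-value after only a few trials matches the abstract's remark that detection is possible within a handful of trials when the performance drop is large.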

【3】 Vulnerability and Transaction behavior based detection of Malicious Smart Contracts

Authors: Rachit Agarwal, Tanmay Thapliyal, Sandeep Kumar Shukla Affiliations: CSE Department, IIT Kanpur Comments: Submitted to a conference Link: https://arxiv.org/abs/2106.13422 Abstract: Smart Contracts (SCs) in Ethereum can automate tasks and provide different functionalities to a user. Such automation is enabled by the 'Turing-complete' nature of the programming language (Solidity) in which SCs are written. This also opens up different vulnerabilities and bugs in SCs that malicious actors exploit to carry out malicious or illegal activities on the cryptocurrency platform. In this work, we study the correlation between malicious activities and the vulnerabilities present in SCs and find that some malicious activities are correlated with certain types of vulnerabilities. We then develop and study the feasibility of a scoring mechanism that corresponds to the severity of the vulnerabilities present in SCs to determine if it is a relevant feature to identify suspicious SCs. We analyze the utility of the severity score for detecting suspicious SCs using unsupervised machine learning (ML) algorithms across different temporal granularities and identify behavioral changes. In our experiments with on-chain SCs, we were able to find a total of 1094 benign SCs across different granularities which behave similarly to malicious SCs, with the inclusion of the smart contract vulnerability scores in the feature set.

【4】 Deep Learning for High-Impedance Fault Detection: Convolutional Autoencoders

Authors: Khushwant Rai, Farnam Hojatpanah, Firouz Badrkhani Ajaei, Katarina Grolinger Comments: Citation: Rai, K.; Hojatpanah, F.; Badrkhani Ajaei, F.; Grolinger, K. Deep Learning for High-Impedance Fault Detection: Convolutional Autoencoders. Energies. Link: https://arxiv.org/abs/2106.13276 Abstract: High-impedance faults (HIF) are difficult to detect because of their low current amplitude and highly diverse characteristics. In recent years, machine learning (ML) has been gaining popularity in HIF detection because ML techniques learn patterns from data and successfully detect HIFs. However, as these methods are based on supervised learning, they fail to reliably detect any scenario, fault or non-fault, not present in the training data. Consequently, this paper takes advantage of unsupervised learning and proposes a convolutional autoencoder framework for HIF detection (CAE-HIFD). Contrary to the conventional autoencoders that learn from normal behavior, the convolutional autoencoder (CAE) in CAE-HIFD learns only from the HIF signals, eliminating the need for diverse non-HIF scenarios in the CAE training. CAE distinguishes HIFs from non-HIF operating conditions by employing cross-correlation. To discriminate HIFs from transient disturbances such as capacitor or load switching, CAE-HIFD uses kurtosis, a statistical measure of the probability distribution shape. The performance evaluation studies conducted using the IEEE 13-node test feeder indicate that the CAE-HIFD reliably detects HIFs, outperforms the state-of-the-art HIF detection techniques, and is robust against noise.
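
Kurtosis, the statistic the abstract uses to separate HIFs from transient disturbances, is easy to compute directly. The sketch below shows only the statistic itself (fourth central moment divided by squared variance); the window contents, the function name `kurtosis`, and the two example signals are illustrative assumptions, and the abstract does not state the thresholds or windowing the authors use.

```python
def kurtosis(samples):
    """Population kurtosis: fourth central moment over squared variance.

    A Gaussian signal gives a value near 3; heavy tails or isolated spikes
    push the value up. Undefined (division by zero) for a constant signal.
    """
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 ** 2)


# A square-wave-like window versus a window dominated by one sharp spike.
flat_window = [-1.0, 1.0] * 5
spiky_window = [0.0] * 9 + [10.0]
print(kurtosis(flat_window), kurtosis(spiky_window))
```

The spiky window yields a much larger value than the flat one, which is the property that makes kurtosis useful for flagging impulsive events.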

【5】 Hate Speech Detection in Clubhouse

Authors: Hadi Mansourifar, Dana Alsagheer, Reza Fathi, Weidong Shi, Lan Ni, Yan Huang Affiliations: University of Houston, Houston, Texas, USA Comments: Accepted paper in International KDD Workshop on Misinformation and Misbehavior Mining on the Web 2021 Link: https://arxiv.org/abs/2106.13238 Abstract: With the high prevalence of offensive language against minorities in social media, counter hate speech generation is considered an automatic way to tackle this challenge. Counter hate speech is supposed to appear as a third voice that educates people and keeps the social red lines bold without limiting the principle of freedom of speech. Counter hate speech generation is based on the optimistic assumption that any attempt to intervene in hate speech in social media can play a positive role in this context. Beyond that, previous work did not investigate the sequence of comments before and after counter speech. To the best of our knowledge, no attempt has been made to measure the impact of counter hate speech from a statistical point of view. In this paper, we take the first step in this direction by measuring the impact of counter hate speech on the next comments in terms of Google Perspective Scores. Furthermore, our experiments show that counter hate speech can cause negative impacts, a phenomenon which is called aggression in social media.

Classification | Recognition (5 papers)

【1】 Littlestone Classes are Privately Online Learnable

Authors: Noah Golowich, Roi Livni Link: https://arxiv.org/abs/2106.13513 Abstract: We consider the problem of online classification under a privacy constraint. In this setting a learner observes sequentially a stream of labelled examples $(x_t, y_t)$, for $1 \leq t \leq T$, and returns at each iteration $t$ a hypothesis $h_t$ which is used to predict the label of each new example $x_t$. The learner's performance is measured by her regret against a known hypothesis class $\mathcal{H}$. We require that the algorithm satisfies the following privacy constraint: the sequence $h_1, \ldots, h_T$ of hypotheses output by the algorithm needs to be an $(\epsilon, \delta)$-differentially private function of the whole input sequence $(x_1, y_1), \ldots, (x_T, y_T)$. We provide the first non-trivial regret bound for the realizable setting. Specifically, we show that if the class $\mathcal{H}$ has constant Littlestone dimension then, given an oblivious sequence of labelled examples, there is a private learner that makes in expectation at most $O(\log T)$ mistakes -- comparable to the optimal mistake bound in the non-private case, up to a logarithmic factor. Moreover, for general values of the Littlestone dimension $d$, the same mistake bound holds but with a factor doubly exponential in $d$. A recent line of work has demonstrated a strong connection between classes that are online learnable and those that are differentially-private learnable. Our results strengthen this connection and show that an online learning algorithm can in fact be directly privatized (in the realizable setting). We also discuss an adaptive setting and provide a sublinear regret bound of $O(\sqrt{T})$.
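
For readers unfamiliar with mistake bounds in the realizable online setting, the classical non-private Halving algorithm gives a concrete reference point. The sketch below is that textbook baseline, not the private learner of the paper: it assumes a finite hypothesis class (threshold functions here, chosen for illustration) and a stream labelled by one of its members, and it makes at most $\log_2 |\mathcal{H}|$ mistakes. The names `halving_learner` and `thresholds` are hypothetical.

```python
import math


def halving_learner(hypotheses, stream):
    """Non-private Halving algorithm for a finite class, realizable setting.

    Predicts by majority vote over the hypotheses still consistent with the
    data seen so far. Every mistake removes at least half of them, so the
    number of mistakes is at most log2(len(hypotheses)).
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes >= len(version_space) else 0
        if prediction != y:
            mistakes += 1
        # Keep only hypotheses that agree with the revealed label.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes


# Illustrative class: thresholds h_k(x) = 1 if x >= k, for k = 0..15.
thresholds = [lambda x, k=k: int(x >= k) for k in range(16)]
target = thresholds[11]
stream = [(x, target(x)) for x in [8, 4, 12, 10, 11, 3, 15, 0, 7, 13]]
mistakes = halving_learner(thresholds, stream)
print(mistakes, "<=", math.log2(len(thresholds)))
```

The paper's contribution is to obtain a comparable guarantee while the whole sequence of output hypotheses is differentially private, which this baseline does not attempt.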

【2】 A hybrid model-based and learning-based approach for classification using limited number of training samples

Authors: Alireza Nooraiepour, Waheed U. Bajwa, Narayan B. Mandayam Affiliations: Rutgers University Comments: 21 pages, 8 figures, Journal Link: https://arxiv.org/abs/2106.13436 Abstract: The fundamental task of classification given a limited number of training data samples is considered for physical systems with known parametric statistical models. The standalone learning-based and statistical model-based classifiers face major challenges in fulfilling the classification task using a small training set. Specifically, classifiers that solely rely on the physics-based statistical models usually suffer from their inability to properly tune the underlying unobservable parameters, which leads to a mismatched representation of the system's behaviors. Learning-based classifiers, on the other hand, typically rely on a large number of training data from the underlying physical process, which might not be feasible in most practical scenarios. In this paper, a hybrid classification method -- termed HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers. The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers by fusing their respective strengths. The proposed hybrid approach first estimates the unobservable model parameters using the available (suboptimal) statistical estimation procedures, and subsequently uses the physics-based statistical models to generate synthetic data. Then, the training data samples are incorporated with the synthetic data in a learning-based classifier that is based on domain-adversarial training of neural networks. Specifically, in order to address the mismatch problem, the classifier learns a mapping from the training data and the synthetic data to a common feature space. Simultaneously, the classifier is trained to find discriminative features within this space in order to fulfill the classification task.

【3】 byteSteady: Fast Classification Using Byte-Level n-Gram Embeddings

Authors: Xiang Zhang, Alexandre Drouin, Raymond Li Affiliations: Montreal, Quebec, Canada; Element AI; ServiceNow Link: https://arxiv.org/abs/2106.13302 Abstract: This article introduces byteSteady -- a fast model for classification using byte-level n-gram embeddings. byteSteady assumes that each input comes as a sequence of bytes. A representation vector is produced using the averaged embedding vectors of byte-level n-grams, with a pre-defined set of n. The hashing trick is used to reduce the number of embedding vectors. This input representation vector is then fed into a linear classifier. A straightforward application of byteSteady is text classification. We also apply byteSteady to one type of non-language data -- DNA sequences for gene classification. For both problems we achieved competitive classification results against strong baselines, suggesting that byteSteady can be applied to both language and non-language data. Furthermore, we find that simple compression using Huffman coding does not significantly impact the results, which offers an accuracy-speed trade-off previously unexplored in machine learning.
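
The pipeline described in the abstract (byte-level n-grams, hashing trick, averaged embeddings, linear classifier) can be sketched in a few lines. Everything concrete below is an assumption made for illustration and not taken from the paper: the n-gram sizes `(1, 2, 4)`, the use of CRC32 as the hash, the table size, the embedding dimension, and the untrained random weights. All function names are hypothetical.

```python
import random
import zlib


def hashed_ngram_ids(data: bytes, ngram_sizes=(1, 2, 4), num_buckets=1024):
    """Map every byte-level n-gram of `data` to a bucket id (hashing trick)."""
    ids = []
    for n in ngram_sizes:
        for start in range(len(data) - n + 1):
            ids.append(zlib.crc32(data[start:start + n]) % num_buckets)
    return ids


def make_table(num_buckets=1024, dim=8, seed=0):
    """Randomly initialised embedding table; training would update it."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_buckets)]


def represent(data: bytes, table, ngram_sizes=(1, 2, 4)):
    """Average the embeddings of all hashed n-grams into one input vector."""
    dim = len(table[0])
    ids = hashed_ngram_ids(data, ngram_sizes, len(table))
    if not ids:
        return [0.0] * dim
    return [sum(table[i][d] for i in ids) / len(ids) for d in range(dim)]


def linear_scores(vector, weights, biases):
    """Linear classifier: one score per class."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


table = make_table()
vec = represent(b"ACGTACGT", table)
scores = linear_scores(vec, weights=[[1.0] * 8, [-1.0] * 8], biases=[0.0, 0.0])
print(len(vec), scores)
```

Because the input is treated as raw bytes, the same code path handles text and DNA strings alike, which is the property the abstract highlights.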

【4】 Multitask Learning for Citation Purpose Classification

Authors: Alex Oesterling, Angikar Ghosal, Haoyang Yu, Rui Xin, Yasa Baig, Lesia Semenova, Cynthia Rudin Affiliations: Duke University Comments: Second Workshop on Scholarly Document Processing Link: https://arxiv.org/abs/2106.13275 Abstract: We present our entry into the 2021 3C Shared Task Citation Context Classification based on Purpose competition. The goal of the competition is to classify a citation in a scientific article based on its purpose. This task is important because it could potentially lead to more comprehensive ways of summarizing the purpose and uses of scientific articles, but it is also difficult, mainly due to the limited amount of available training data in which the purposes of each citation have been hand-labeled, along with the subjectivity of these labels. Our entry in the competition is a multi-task model that combines multiple modules designed to handle the problem from different perspectives, including hand-generated linguistic features, TF-IDF features, and an LSTM-with-attention model. We also provide an ablation study and feature analysis whose insights could lead to future work.

【5】 Generalized One-Class Learning Using Pairs of Complementary Classifiers

Authors: Anoop Cherian, Jue Wang Affiliations: Research School of Engineering, The Australian National University (Jue Wang) Comments: Accepted at Trans. PAMI. arXiv admin note: text overlap with arXiv:1908.05884 Link: https://arxiv.org/abs/2106.13272 Abstract: One-class learning is the classic problem of fitting a model to the data for which annotations are available only for a single class. In this paper, we explore novel objectives for one-class learning, which we collectively refer to as Generalized One-class Discriminative Subspaces (GODS). Our key idea is to learn a pair of complementary classifiers to flexibly bound the one-class data distribution, where the data belongs to the positive half-space of one of the classifiers in the complementary pair and to the negative half-space of the other. To avoid redundancy while allowing non-linearity in the classifier decision surfaces, we propose to design each classifier as an orthonormal frame and seek to learn these frames via jointly optimizing for two conflicting objectives, namely: i) to minimize the distance between the two frames, and ii) to maximize the margin between the frames and the data. The learned orthonormal frames will thus characterize a piecewise linear decision surface that allows for efficient inference, while our objectives seek to bound the data within a minimal volume that maximizes the decision margin, thereby robustly capturing the data distribution. We explore several variants of our formulation under different constraints on the constituent classifiers, including kernelized feature maps. We demonstrate the empirical benefits of our approach via experiments on data from several applications in computer vision, such as anomaly detection in video sequences, human poses, and human activities. We also explore the generality and effectiveness of GODS for non-vision tasks via experiments on several UCI datasets, demonstrating state-of-the-art results.

Representation (1 paper)

【1】 Decomposed Mutual Information Estimation for Contrastive Representation Learning

Authors: Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet Affiliations: Microsoft Research; University of Alberta Comments: ICML 2021 Link: https://arxiv.org/abs/2106.13401 Abstract: Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
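
The decomposition described in the abstract can be written out for the simplest case of a single split. The notation is an assumption of this sketch rather than the paper's: $x$ and $y$ are the two views, $x' = g(x)$ is a less informative subview of $x$, and $K$ is the number of samples scored by a contrastive (InfoNCE-style) bound. The last line records the well-known ceiling of such bounds, which is what makes estimating one large MI term harder than estimating several modest ones.

```latex
% x, y: two views; x' = g(x): a subview of x (notation assumed for this sketch)
\begin{align}
I(x; y) &= I(x', x; y)
  && \text{since } x' \text{ is a function of } x \\
        &= I(x'; y) + I(x; y \mid x')
  && \text{chain rule for mutual information} \\
I_{\mathrm{NCE}} &\le \log K
  && \text{a contrastive bound with } K \text{ samples cannot exceed } \log K
\end{align}
```

Each term on the right of the second line can be smaller than the total, so each has more headroom under the $\log K$ ceiling when bounded separately.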

Optimization | Convergence (1 paper)

【1】 A mechanistic-based data-driven approach to accelerate structural topology optimization through finite element convolutional neural network (FE-CNN)

Authors: Tianle Yue, Hang Yang, Zongliang Du, Chang Liu, Khalil I. Elkhodary, Shan Tang, Xu Guo Affiliations: State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, Dalian University of Technology, Dalian, PR China Link: https://arxiv.org/abs/2106.13652 Abstract: In this paper, a mechanistic data-driven approach is proposed to accelerate structural topology optimization, employing an in-house developed finite element convolutional neural network (FE-CNN). Our approach can be divided into two stages: offline training, and online optimization. During offline training, a mapping function is built between high and low resolution representations of a given design domain. The mapping is expressed by a FE-CNN, which targets a common objective function value (e.g., structural compliance) across design domains of differing resolutions. During online optimization, an arbitrary design domain of high resolution is reduced to low resolution through the trained mapping function. The original high-resolution domain is thus designed by computations performed on only the low-resolution version, followed by an inverse mapping back to the high-resolution domain. Numerical examples demonstrate that this approach can accelerate optimization by up to an order of magnitude in computational time. Our proposed approach therefore shows great potential to overcome the curse of dimensionality incurred by density-based structural topology optimization. The limitations of our present approach are also discussed.

Prediction | Estimation (6 papers)

【1】 Deep Interpretable Criminal Charge Prediction and Algorithmic Bias

Authors: Abdul Rafae Khan, Jia Xu, Peter Varsanyi, Rachit Pabreja Comments: First two authors alphabetically ordered Link: https://arxiv.org/abs/2106.13456 Abstract: While predictive policing has become increasingly common in assisting with decisions in the criminal justice system, the use of these results is still controversial. Some software based on deep learning lacks accuracy (e.g., in F-1), and many decision processes are not transparent, causing doubt about decision bias, such as perceived racial, age, and gender disparities. This paper addresses bias issues with post-hoc explanations to provide a trustable prediction of whether a person will receive future criminal charges given their previous criminal records, by learning temporal behavior patterns over twenty years. Bi-LSTM relieves the vanishing gradient problem, and attention mechanisms allow learning and interpretation of feature importance. Our approach shows consistent and reliable prediction precision and recall on a real-life dataset. Our analysis of the importance of each input feature shows the critical causal impact on decision-making, suggesting that criminal histories are statistically significant factors, while identifiers, such as race, gender, and age, are not. Finally, our algorithm indicates that a suspect tends to gradually rather than suddenly increase crime severity level over time.

【2】 Fine-grained Geolocation Prediction of Tweets with Human Machine Collaboration

Authors: Florina Dutt, Subhajit Das Affiliations: Georgia Institute of Technology, Atlanta, GA, USA Comments: 7 pages Link: https://arxiv.org/abs/2106.13411 Abstract: Twitter is a useful resource to analyze people's opinions on various topics. Often these topics are correlated or associated with the locations from which the tweets are posted. For example, restaurant owners may need to know where their target customers eat with respect to the sentiment of the posts made related to food, and policy planners may need to analyze citizens' opinions on relevant issues such as crime, safety, and congestion with respect to specific parts of the city, county, or state. As promising as this is, less than 1% of the crawled tweets come with geolocation tags. This makes accurate location prediction for non-geo-tagged tweets critical for analyzing data in various domains. In this research, we utilized millions of Twitter posts and end-users' domain expertise to build a set of deep neural network models using natural language processing (NLP) techniques that predict the geolocation of non-geo-tagged tweets at various levels of granularity, such as neighborhood, zipcode, and longitude with latitude. With multiple neural architecture experiments and a collaborative human-machine workflow design, our ongoing work on geolocation detection shows promising results that empower end-users to correlate relationships between variables of choice with the location information.

【3】 Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

Authors: Gavin Brown, Marco Gaboardi, Adam Smith, Jonathan Ullman, Lydia Zakynthinou Link: https://arxiv.org/abs/2106.13329 Abstract: We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions with unknown covariance. Informally, given $n \gtrsim d/\alpha^2$ samples from such a distribution with mean $\mu$ and covariance $\Sigma$, our estimators output $\tilde\mu$ such that $\| \tilde\mu - \mu \|_{\Sigma} \leq \alpha$, where $\| \cdot \|_{\Sigma}$ is the Mahalanobis distance. All previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require $\Omega(d^{3/2})$ samples. Each of our estimators is based on a simple, general approach to designing differentially private mechanisms, but with novel technical steps to make the estimator private and sample-efficient. Our first estimator samples a point with approximately maximum Tukey depth using the exponential mechanism, but restricted to the set of points of large Tukey depth. Proving that this mechanism is private requires a novel analysis. Our second estimator perturbs the empirical mean of the data set with noise calibrated to the empirical covariance, without releasing the covariance itself. Its sample complexity guarantees hold more generally for subgaussian distributions, albeit with a slightly worse dependence on the privacy parameter. For both estimators, careful preprocessing of the data is required to satisfy differential privacy.
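
The error metric in the abstract above is the Mahalanobis distance, $\| x - \mu \|_{\Sigma} = \sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$. The following is a minimal sketch of that metric only, not of either private estimator; the names `solve` and `mahalanobis` and the small Gaussian-elimination helper are illustrative assumptions, and the covariance is assumed to be non-singular.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def mahalanobis(x, mu, sigma):
    """Return sqrt((x - mu)^T sigma^{-1} (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(sigma, diff)  # z = sigma^{-1} (x - mu)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))
```

With an identity covariance the function reduces to the Euclidean distance; a larger variance along one axis shrinks the distance measured along that axis, which is why the guarantee is described as covariance-aware.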

【4】 Domain-guided Machine Learning for Remotely Sensed In-Season Crop Growth Estimation

Authors: George Worrall, Anand Rangarajan, Jasmeet Judge Affiliations: University of Florida; A. Rangarajan is with the Department of Computer & Information Science & Engineering, University of Florida Comments: 7 pages, 7 tables, 11 figures Link: https://arxiv.org/abs/2106.13323 Abstract: Advanced machine learning techniques have been used in remote sensing (RS) applications such as crop mapping and yield prediction, but remain under-utilized for tracking crop progress. In this study, we demonstrate the use of agronomic knowledge of crop growth drivers in a Long Short-Term Memory-based, Domain-guided neural network (DgNN) for in-season crop progress estimation. The DgNN uses a branched structure and attention to separate independent crop growth drivers and capture their varying importance throughout the growing season. The DgNN is implemented for corn, using RS data in Iowa for the period 2003-2019, with USDA crop progress reports used as ground truth. State-wide DgNN performance shows significant improvement over sequential and dense-only NN structures, and a widely-used Hidden Markov Model method. The DgNN had a 3.5% higher Nash-Sutcliffe efficiency over all growth stages and 33% more weeks with highest cosine similarity than the other NNs during test years. The DgNN and Sequential NN were more robust during periods of abnormal crop progress, though estimating the Silking-Grainfill transition was difficult for all methods. Finally, Uniform Manifold Approximation and Projection visualizations of layer activations showed how LSTM-based NNs separate crop growth time-series differently from a dense-only structure. Results from this study exhibit both the viability of NNs in crop growth stage estimation (CGSE) and the benefits of using domain knowledge. The DgNN methodology presented here can be extended to provide near-real-time CGSE of other crops.
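
For readers unfamiliar with the Nash-Sutcliffe efficiency mentioned in the abstract, here is a minimal sketch assuming the standard definition, $\mathrm{NSE} = 1 - \sum_i (o_i - p_i)^2 / \sum_i (o_i - \bar{o})^2$. How the paper aggregates the score across growth stages is not stated in the abstract, so the function name `nash_sutcliffe` and the flat-list inputs are assumptions made for illustration.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model versus squared error of the mean predictor.
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed values are constant; NSE is undefined")
    return 1.0 - residual / variance
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the observed mean.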

【5】 Prediction of Hereditary Cancers Using Neural Networks

Authors: Zoe Guan, Giovanni Parmigiani, Danielle Braun, Lorenzo Trippa Link: https://arxiv.org/abs/2106.13682 Abstract: Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but they rely on many assumptions about cancer susceptibility genes that are either unrealistic or challenging to validate due to low mutation prevalence. Training more flexible models, such as neural networks, on large databases of pedigrees can potentially lead to accuracy gains. In this paper, we develop a framework to apply neural networks to family history data and investigate their ability to learn inherited susceptibility to cancer. While there is an extensive literature on neural networks and their state-of-the-art performance in many tasks, there is little work applying them to family history data. We propose adaptations of fully-connected neural networks and convolutional neural networks to pedigrees. In data simulated under Mendelian inheritance, we demonstrate that our proposed neural network models are able to achieve nearly optimal prediction performance. Moreover, when the observed family history includes misreported cancer diagnoses, neural networks are able to outperform the Mendelian BRCAPRO model embedding the correct inheritance laws. Using a large dataset of over 200,000 family histories, the Risk Service cohort, we train prediction models for future risk of breast cancer. We validate the models using data from the Cancer Genetics Network.

【6】 Prediction of geophysical properties of rocks on rare well data and attributes of seismic waves by machine learning methods on the example of the Achimov formation

Authors: Dmitry Ivlev Comments: 15 pages, 10 figures, 1 table Link: https://arxiv.org/abs/2106.13274 Abstract: The purpose of this research is to forecast the development of sand bodies in productive sediments based on well log data and seismic attributes. The object of the study is the productive intervals of the Achimov sedimentary complex in part of an oil field located in Western Siberia. The research presents a technological stack of machine learning algorithms, methods for enriching the source data with synthetic data, and algorithms for creating new features. The result is a regression model relating the natural radioactivity of rocks to seismic wave field attributes, with acceptable prediction quality. The acceptable quality of the forecast is confirmed both by model cross-validation and by data obtained from a new well.

Other neural networks | deep learning | models | modeling (19 papers)

【1】 Self-training Converts Weak Learners to Strong Learners in Mixture Models

Authors: Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu Comments: 21 pages Link: https://arxiv.org/abs/2106.13805 Abstract: We consider a binary classification problem when the data comes from a mixture of two isotropic distributions satisfying concentration and anti-concentration properties enjoyed by log-concave distributions among others. We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension. That is, self-training converts weak learners to strong learners using only unlabeled examples. We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples (i.e., independent of $\varepsilon$). Together our results imply that mixture models can be learned to within $\varepsilon$ of the Bayes-optimal accuracy using at most $O(d)$ labeled examples and $\tilde O(d/\varepsilon^2)$ unlabeled examples by way of a semi-supervised self-training algorithm.
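
The self-training loop described in the abstract can be sketched roughly as follows. This is one plausible instantiation, not necessarily the authors' exact update rule: it takes one online gradient step on the logistic loss per unlabeled example, using the pseudolabel $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ from the current iterate. The step size `lr`, the `temperature` parameter, and the renormalization of the weights after each step are assumptions introduced for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def self_train(beta_pl, unlabeled, lr=0.1, temperature=1.0):
    """Refine a weak linear classifier beta_pl using unlabeled points only."""
    beta = normalize(beta_pl)  # beta_0 := beta_pl (rescaled to unit norm)
    for x in unlabeled:
        score = dot(beta, x)
        y_hat = 1.0 if score >= 0 else -1.0  # pseudolabel sgn(<beta_t, x>)
        margin = y_hat * score / temperature  # non-negative by construction
        # Gradient of log(1 + exp(-margin)) w.r.t. beta, with y_hat held fixed.
        coeff = -y_hat / (temperature * (1.0 + math.exp(margin)))
        beta = normalize([b - lr * coeff * xi for b, xi in zip(beta, x)])
    return beta
```

Each step pushes the weights toward points the current classifier is least sure about, because the logistic gradient is largest when the margin is small. The sketch makes no claim about the sample complexity bounds stated in the abstract.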

【2】 Conjugate Energy-Based Models

Authors: Hao Wu, Babak Esmaeili, Michael Wick, Jean-Baptiste Tristan, Jan-Willem van de Meent Affiliations: Northeastern University Link: https://arxiv.org/abs/2106.13798 Abstract: In this paper, we propose conjugate energy-based models (CEBMs), a new class of energy-based models that define a joint density over data and latent variables. The joint density of a CEBM decomposes into an intractable distribution over data and a tractable posterior over latent variables. CEBMs have similar use cases to variational autoencoders, in the sense that they learn an unsupervised mapping from data to latent variables. However, these models omit a generator network, which allows them to learn more flexible notions of similarity between data points. Our experiments demonstrate that conjugate EBMs achieve competitive results in terms of image modelling, predictive power of latent space, and out-of-domain detection on a variety of datasets.

【3】 Nonlinear Acoustic Echo Cancellation with Deep Learning

Authors: Amir Ivry, Israel Cohen, Baruch Berdugo Affiliations: Technion – Israel Institute of Technology, Technion City, Haifa, Israel Comments: Accepted to Interspeech 2021 Link: https://arxiv.org/abs/2106.13754 Abstract: We propose a nonlinear acoustic echo cancellation system, which aims to model the echo path from the far-end signal to the near-end microphone in two parts. Inspired by the physical behavior of modern hands-free devices, we first introduce a novel neural network architecture that is specifically designed to model the nonlinear distortions these devices induce between receiving and playing the far-end signal. To account for variations between devices, we construct this network with trainable memory length and nonlinear activation functions that are not parameterized in advance, but are rather optimized during the training stage using the training data. Second, the network is succeeded by a standard adaptive linear filter that constantly tracks the echo path between the loudspeaker output and the microphone. During training, the network and filter are jointly optimized to learn the network parameters. This system requires 17 thousand parameters that consume 500 million floating-point operations per second and 40 kilobytes of memory. It also satisfies hands-free communication timing requirements on a standard neural processor, which renders it adequate for embedding on hands-free communication devices. Using 280 hours of real and synthetic data, experiments show advantageous performance compared to competing methods.

【4】 Recurrent Coupled Topic Modeling over Sequential Documents

Authors: Jinjin Guo, Longbing Cao, Zhiguo Gong Affiliations: of Macau, China Link: https://arxiv.org/abs/2106.13732 Abstract: Abundant sequential documents such as online archives, social media, and news feeds are updated in a streaming fashion, where each chunk of documents is incorporated with smoothly evolving yet dependent topics. Such digital texts have attracted extensive research on dynamic topic modeling to infer hidden evolving topics and their temporal dependencies. However, most of the existing approaches focus on single-topic-thread evolution and ignore the fact that a current topic may be coupled with multiple relevant prior topics. In addition, these approaches also incur an intractable inference problem when inferring latent parameters, resulting in a high computational cost and performance degradation. In this work, we assume that a current topic evolves from all prior topics with corresponding coupling weights, forming the multi-topic-thread evolution. Our method models the dependencies between evolving topics and thoroughly encodes their complex multi-couplings across time steps. To conquer the intractable inference challenge, a new solution with a set of novel data augmentation techniques is proposed, which successfully decomposes the multi-couplings between evolving topics. A fully conjugate model is thus obtained to guarantee the effectiveness and efficiency of the inference technique. A novel Gibbs sampler with a backward-forward filter algorithm efficiently learns latent time-evolving parameters in closed form. In addition, the latent Indian Buffet Process (IBP) compound distribution is exploited to automatically infer the overall topic number and customize the sparse topic proportions for each sequential document without bias. The proposed method is evaluated on both synthetic and real-world datasets against competitive baselines, demonstrating its superiority over the baselines in terms of low per-word perplexity, highly coherent topics, and better document time prediction.

【5】 Ranger21: a synergistic deep learning optimizer

Authors: Less Wright, Nestor Demeure Affiliations: AudereNow.org, nd Ave Ste , Seattle, WA, USA, National Energy Research Scientific Computing Center, Lawrence Berkeley National Lab, Cyclotron Road, Berkeley, California Comments: for associated code, see this https URL Link: https://arxiv.org/abs/2106.13731 Abstract: As optimizers are critical to the performance of neural networks, every year a large number of papers innovating on the subject are published. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than composable algorithms. Thus, many worthwhile improvements are rarely seen outside their initial publication. Taking advantage of this untapped potential, we introduce Ranger21, a new optimizer which combines AdamW with eight components, carefully selected after reviewing and testing ideas from the literature. We found that the resulting optimizer provides significantly improved validation accuracy and training speed, smoother training curves, and is even able to train a ResNet50 on ImageNet2012 without Batch Normalization layers, a problem on which AdamW stays systematically stuck in a bad initial state.

【6】 Interval and fuzzy physics-informed neural networks for uncertain fields

Authors: Jan Niklas Fuhg, Amélie Fau, Nikolaos Bouklas Affiliations: Sibley School of Mechanical and Aerospace Engineering, Cornell University, New York, USA, Université Paris-Saclay, ENS Paris-Saclay, CNRS, LMT, Laboratoire de Mécanique et Technologie, Gif-sur-Yvette, France Comments: 13 pages, 12 figures Link: https://arxiv.org/abs/2106.13727 Abstract: Temporally and spatially dependent uncertain parameters are regularly encountered in engineering applications. Commonly these uncertainties are accounted for using random fields and processes, which require knowledge about the underlying probability distribution functions that is not readily available. In these cases non-probabilistic approaches such as interval analysis and fuzzy set theory are helpful uncertainty measures. Partial differential equations involving fuzzy and interval fields are traditionally solved using the finite element method, where the input fields are sampled using some basis function expansion methods. This approach, however, is problematic, as it is reliant on knowledge about the spatial correlation fields. In this work we utilize physics-informed neural networks (PINNs) to solve interval and fuzzy partial differential equations. The resulting network structures, termed interval physics-informed neural networks (iPINNs) and fuzzy physics-informed neural networks (fPINNs), show promising results for obtaining bounded solutions of equations involving spatially uncertain parameter fields. In contrast to finite element approaches, neither a correlation length specification of the input fields nor averaging via Monte Carlo simulations is necessary. In fact, information about the input interval fields is obtained directly as a byproduct of the presented solution scheme. Furthermore, all major advantages of PINNs are retained, i.e., the meshfree nature of the scheme and the ease of inverse problem set-up.

【7】 Bayesian Neural Networks: Essentials

Authors: Daniel T. Chang Link: https://arxiv.org/abs/2106.13594 Abstract: Bayesian neural networks utilize probabilistic layers that capture uncertainty over weights and activations, and are trained using Bayesian inference. Since these probabilistic layers are designed to be drop-in replacements for their deterministic counterparts, Bayesian neural networks provide a direct and natural way to extend conventional deep neural networks to support probabilistic deep learning. However, it is nontrivial to understand, design and train Bayesian neural networks due to their complexities. We discuss the essentials of Bayesian neural networks including duality (deep neural networks, probabilistic models), approximate Bayesian inference, Bayesian priors, Bayesian posteriors, and deep variational learning. We use TensorFlow Probability APIs and code examples for illustration. The main problem with Bayesian neural networks is that the architecture of deep neural networks makes it quite redundant, and costly, to account for uncertainty for a large number of successive layers. Hybrid Bayesian neural networks, which use few probabilistic layers judiciously positioned in the networks, provide a practical solution.

【8】 Learning Gradual Argumentation Frameworks using Genetic Algorithms

Authors: Jonathan Spieler, Nico Potyka, Steffen Staab Affiliations: University of Stuttgart Link: https://arxiv.org/abs/2106.13585 Abstract: Gradual argumentation frameworks represent arguments and their relationships in a weighted graph. Their graphical structure and intuitive semantics make them a potentially interesting tool for interpretable machine learning. It has been noted recently that their mechanics are closely related to neural networks, which allows learning their weights from data by standard deep learning frameworks. As a first proof of concept, we propose a genetic algorithm to simultaneously learn the structure of argumentative classification models. To obtain a well-interpretable model, the fitness function balances sparseness and accuracy of the classifier. We discuss our algorithm and present first experimental results on standard benchmarks from the UCI machine learning repository. Our prototype learns argumentative classification models that are comparable to decision trees in terms of learning performance and interpretability.

【9】 Tensor-based framework for training flexible neural networks

Authors: Yassine Zniyed, Konstantin Usevich, Sebastian Miron, David Brie Affiliations: Université de Lorraine Comments: 26 pages, 13 figures Link: https://arxiv.org/abs/2106.13542 Abstract: Activation functions (AFs) are an important part of the design of neural networks (NNs), and their choice plays a predominant role in the performance of a NN. In this work, we are particularly interested in the estimation of flexible activation functions using tensor-based solutions, where the AFs are expressed as a weighted sum of predefined basis functions. To do so, we propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem. This technique fuses the first and zeroth order information of the NN, where the first-order information is contained in a Jacobian tensor, following a constrained canonical polyadic decomposition (CPD). The proposed algorithm can handle different decomposition bases. The goal of this method is to compress large pretrained NN models by replacing subnetworks, i.e., one or multiple layers of the original network, with a new flexible layer. The approach is applied to a pretrained convolutional neural network (CNN) used for character classification.
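
The idea of a flexible activation function expressed as a weighted sum of predefined basis functions, $f(x) = \sum_k c_k \phi_k(x)$, can be illustrated with a short sketch. The particular basis below (identity, square, tanh) and the names `BASIS` and `flexible_activation` are hypothetical choices for illustration; the abstract does not say which bases the authors use, and the tensor factorization that estimates the coefficients is not shown.

```python
import math

# Hypothetical basis: the abstract only says "predefined basis functions".
BASIS = [lambda x: x, lambda x: x * x, math.tanh]


def flexible_activation(x, coeffs, basis=BASIS):
    """Evaluate f(x) = sum_k coeffs[k] * basis[k](x)."""
    if len(coeffs) != len(basis):
        raise ValueError("need exactly one coefficient per basis function")
    return sum(c * phi(x) for c, phi in zip(coeffs, basis))
```

With coefficients `[0.0, 0.0, 1.0]` the function is an ordinary tanh activation, so a fixed activation is a special case; learning the coefficients per unit is what makes the layer flexible.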

【10】 Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification

Authors: Yan Liu, Zheng Li, Lin Li, Qingyang Hong Affiliations: School of Electronic Science and Engineering, Xiamen University, China, School of Informatics, Xiamen University, China Link: https://arxiv.org/abs/2106.13514 Abstract: This paper proposes a multi-task learning network with phoneme-aware and channel-wise attentive learning strategies for text-dependent Speaker Verification (SV). In the proposed structure, frame-level multi-task learning along with segment-level adversarial learning is adopted for speaker embedding extraction. Phoneme-aware attentive pooling is exploited on frame-level features in the main network for the speaker classifier, with the corresponding posterior probability for the phoneme distribution in the auxiliary subnet. Further, the introduction of Squeeze and Excitation (SE-block) performs dynamic channel-wise feature recalibration, which improves the representational ability. The proposed method exploits speaker idiosyncrasies associated with pass-phrases, and is further improved by the phoneme-aware attentive pooling and SE-block from temporal and channel-wise aspects, respectively. The experiments conducted on the RSR2015 Part 1 database confirm that the proposed system achieves outstanding results for text-dependent SV.

【11】 Evaluation of Deep-Learning-Based Voice Activity Detectors and Room Impulse Response Models in Reverberant Environments

Authors: Amir Ivry, Israel Cohen, Baruch Berdugo Affiliations: Technion – Israel Institute of Technology, Technion City, Haifa, Israel Comments: Accepted to ICASSP 2020 Link: https://arxiv.org/abs/2106.13511 Abstract: State-of-the-art deep-learning-based voice activity detectors (VADs) are often trained with anechoic data. However, real acoustic environments are generally reverberant, which causes the performance to significantly deteriorate. To mitigate this mismatch between training data and real data, we simulate an augmented training set that contains nearly five million utterances. This extension comprises anechoic utterances and their reverberant modifications, generated by convolutions of the anechoic utterances with a variety of room impulse responses (RIRs). We consider five different models to generate RIRs, and five different VADs that are trained with the augmented training set. We test all trained systems in three different real reverberant environments. Experimental results show a 20% increase on average in accuracy, precision and recall for all detectors and response models, compared to anechoic training. Furthermore, one of the RIR models consistently yields better performance than the other models, for all the tested VADs. Additionally, one of the VADs consistently outperformed the other VADs in all experiments.
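
The augmentation step in the abstract, producing a reverberant utterance by convolving an anechoic utterance with a room impulse response, can be sketched as a direct discrete convolution. This is a toy illustration on short lists of samples; the function name `convolve` is an assumption, and the five RIR generation models compared in the paper are not reproduced here.

```python
def convolve(signal, rir):
    """Full discrete convolution of an anechoic signal with an impulse response."""
    if not signal or not rir:
        return []
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(rir):
            # Each input sample excites a delayed, scaled copy of the RIR.
            out[i + j] += sample * tap
    return out
```

An RIR of `[1.0]` leaves the signal unchanged, while additional decaying taps add delayed echoes of every sample. Realistic utterances would normally use an FFT-based convolution for speed, which this sketch omits.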

【12】 Promises and Pitfalls of Black-Box Concept Learning Models

Authors: Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, Weiwei Pan Affiliations: Harvard University Link: https://arxiv.org/abs/2106.13314 Abstract: Machine learning models that incorporate concept learning as an intermediate step in their decision making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying the information leakage and suggest recourse for mitigating its effects.

【13】 Continual Competitive Memory: A Neural System for Online Task-Free Lifelong Learning

Authors: Alexander G. Ororbia Affiliations: Rochester Institute of Technology, Rochester Link: https://arxiv.org/abs/2106.13300 Abstract: In this article, we propose a novel form of unsupervised learning, continual competitive memory (CCM), as well as a computational framework to unify related neural models that operate under the principles of competition. The resulting neural system is shown to offer an effective approach for combating catastrophic forgetting in online continual classification problems. We demonstrate that the proposed CCM system not only outperforms other competitive learning neural models but also yields performance that is competitive with several modern, state-of-the-art lifelong learning approaches on benchmarks such as Split MNIST and Split NotMNIST. CCM yields a promising path forward for acquiring representations that are robust to interference from data streams, especially when the task is unknown to the model and must be inferred without external guidance.

【14】 You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks 标题：You Are AllSet：超图神经网络的多集函数框架

作者：Eli Chien,Chao Pan,Jianhao Peng,Olgica Milenkovic 机构：Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign 链接：https://arxiv.org/abs/2106.13264 摘要：超图被用来模拟agent之间的高阶相互作用，超图数据集存在许多实际相关的实例。为了有效地处理超图结构数据，人们提出了几种学习超图性质和结构的超图神经网络平台，特别是节点分类。然而，几乎所有现有的方法都使用启发式传播规则，并且在许多数据集上提供次优的性能。我们提出了一种新的超图神经网络范式AllSet，它代表了一个高度通用的（超）图神经网络框架，并首次将超图神经网络层实现为两个多集函数的组合，可以有效地学习每个任务和每个数据集。此外，AllSet还利用了超图神经网络与多集函数深度学习的最新进展之间的新联系。特别是，所提出的体系结构利用了深集和集变换体系结构，允许显著的建模灵活性并提供高表达能力。为了评估AllSet的性能，我们进行了迄今为止最广泛的实验，包括十个已知的基准数据集和三个新整理的数据集，这些数据集代表了超图节点分类的重大挑战。结果表明，AllSet具有独特的能力，无论是一致匹配或优于所有其他超图神经网络测试数据集。我们的实现和数据集将在验收后发布。摘要：Hypergraphs are used to model higher-order interactions amongst agents and there exist many practically relevant instances of hypergraph datasets. To enable efficient processing of hypergraph-structured data, several hypergraph neural network platforms have been proposed for learning hypergraph properties and structure, with a special focus on node classification. However, almost all existing methods use heuristic propagation rules and offer suboptimal performance on many datasets. We propose AllSet, a new hypergraph neural network paradigm that represents a highly general framework for (hyper)graph neural networks and for the first time implements hypergraph neural network layers as compositions of two multiset functions that can be efficiently learned for each task and each dataset. Furthermore, AllSet draws on new connections between hypergraph neural networks and recent advances in deep learning of multiset functions. In particular, the proposed architecture utilizes Deep Sets and Set Transformer architectures that allow for significant modeling flexibility and offer high expressive power. To evaluate the performance of AllSet, we conduct the most extensive experiments to date involving ten known benchmarking datasets and three newly curated datasets that represent significant challenges for hypergraph node classification. 
The results demonstrate that AllSet has the unique ability to consistently either match or outperform all other hypergraph neural networks across the tested datasets. Our implementation and dataset will be released upon acceptance.
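
To make the "composition of two multiset functions" idea in entry 【14】 concrete, here is a minimal pure-Python sketch of one hypergraph layer. It is an illustration only, not the authors' implementation: the names `deep_set` and `allset_like_layer`, the mean pooling, and the fixed `phi`/`rho` callables are all assumptions, whereas in the paper these functions are learned (Deep Sets or Set Transformer modules).

```python
from typing import Callable, Dict, List

Vector = List[float]
Fn = Callable[[Vector], Vector]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Permutation-invariant pooling: element-wise mean of a non-empty multiset."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def deep_set(vectors: List[Vector], phi: Fn, rho: Fn) -> Vector:
    """Deep Sets-style multiset function: rho(pool(phi(x) for x in multiset))."""
    return rho(mean_pool([phi(v) for v in vectors]))


def allset_like_layer(
    node_feats: Dict[int, Vector],
    hyperedges: Dict[str, List[int]],
    phi: Fn,
    rho: Fn,
) -> Dict[int, Vector]:
    """One layer as two multiset functions: nodes -> hyperedges -> nodes."""
    # First multiset function: each hyperedge summarises its member nodes.
    edge_feats = {
        e: deep_set([node_feats[v] for v in members], phi, rho)
        for e, members in hyperedges.items()
    }
    # Second multiset function: each node summarises its incident hyperedges.
    new_feats: Dict[int, Vector] = {}
    for v, feat in node_feats.items():
        incident = [edge_feats[e] for e, members in hyperedges.items() if v in members]
        # Hypothetical choice: isolated nodes keep their old features.
        new_feats[v] = deep_set(incident, phi, rho) if incident else feat
    return new_feats
```

With identity functions for `phi` and `rho`, the layer reduces to plain mean aggregation, which is one way to see that heuristic propagation rules are special cases of this pattern.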

【15】 Post Selections Using Test Sets (PSUTS) and How Developmental Networks Avoid Them 标题：使用测试集的后选择(PSUTS)以及发育网络如何避免它们

作者：Juyang Weng 机构：Department of Computer Science and Engineering, Cognitive Science Program, Neuroscience Program, Michigan State University, East Lansing, MI, USA, GENISAMA LLC, Okemos, MI, USA 备注：13 pages, 2 figures. The first part has been accepted as an IJCNN 2021 paper and the second has been accepted as an ICDL 2021 paper 链接：https://arxiv.org/abs/2106.13233 摘要：本文提出了一个很少被报道的人工智能（AI）实践，称为使用测试集的后选择（PSUTS）。因此，深度学习中流行的误差反向传播方法缺乏可接受的泛化能力。所有人工智能方法分为两大流派：连接主义和符号主义。PSUTS分为两种：机器PSUTS和人类PSUTS。由于网络参数数量庞大，加上如今更糟糕的机器PSUTS，连接主义学派因其“杂乱”而受到批评；但看似“干净”的符号主义学派由于使用人类PSUTS、泛化能力较弱，似乎更加脆弱。本文正式定义了PSUTS，分析了采用随机初始权值的误差反向传播方法为什么会陷入严重的局部极小值，PSUTS为什么违反了公认的研究伦理，以及每一篇使用PSUTS的论文本应如何至少透明地报告PSUTS。为了提高未来出版物的透明度，本文提出了一个新的人工智能性能评估标准，即所有已训练网络的发育误差，以及三个学习条件：（1）增量学习架构，（2）训练经验，（3）有限的计算资源。发育网络避免了PSUTS，并且不“杂乱”，因为它们驱动涌现图灵机，并且在整个生命周期的最大似然意义上是最优的。摘要：This paper raises a rarely reported practice in Artificial Intelligence (AI) called Post Selection Using Test Sets (PSUTS). Consequently, the popular error-backprop methodology in deep learning lacks an acceptable generalization power. All AI methods fall into two broad schools, connectionist and symbolic. The PSUTS fall into two kinds, machine PSUTS and human PSUTS. The connectionist school received criticisms for its "scruffiness" due to a huge number of network parameters and now the worse machine PSUTS; but the seemingly "clean" symbolic school seems more brittle because of a weaker generalization power using human PSUTS. This paper formally defines what PSUTS is, analyzes why error-backprop methods with random initial weights suffer from severe local minima, why PSUTS violates well-established research ethics, and how every paper that used PSUTS should have at least transparently reported PSUTS. 
For improved transparency in future publications, this paper proposes a new standard for performance evaluation of AI, called developmental errors for all networks trained, along with Three Learning Conditions: (1) an incremental learning architecture, (2) a training experience and (3) a limited amount of computational resources. Developmental Networks avoid PSUTS and are not "scruffy" because they drive Emergent Turing Machines and are optimal in the sense of maximum-likelihood across lifetime.

【16】 Circumpapillary OCT-Focused Hybrid Learning for Glaucoma Grading Using Tailored Prototypical Neural Networks 标题：基于定制原型神经网络、面向视盘周围OCT的青光眼分级混合学习

作者：Gabriel García,Rocío del Amor,Adrián Colomer,Rafael Verdú-Monedero,Juan Morales-Sánchez,Valery Naranjo 机构：Universitat Politecnica de Valencia, Universidad Politécnica de Cartagena 链接：https://arxiv.org/abs/2106.13551 摘要：青光眼是世界范围内致盲的主要原因之一，光学相干断层扫描（OCT）是检测青光眼的典型成像技术。与大多数专注于青光眼检测的最新研究不同，本文首次提出了一种基于原始视盘周围B扫描的青光眼分级新框架。特别是，我们提出了一种新的基于OCT的混合网络，它结合了手工设计算法和深度学习算法。我们提出了一种OCT特异性描述符，用于提取与视网膜神经纤维层（RNFL）相关的手工特征。同时，我们开发了一个创新的CNN，利用跳跃连接引入定制的残差模块和注意力模块，以细化潜在空间的自动特征。该体系结构作为主干，在静态和动态原型网络的基础上实现了一种新的Few-Shot学习。k-shot范式被重新定义，产生了一个有监督的端到端系统，在区分健康、早期和晚期青光眼样本方面有实质性改进。动态原型网络的训练和评估在通过海德堡Spectralis系统获得的两个融合数据库上进行。验证和测试结果对青光眼分级的分类准确率分别为0.9459和0.8788。此外，所提出模型在青光眼检测方面的高性能值得特别一提。类激活图的结果与临床医生的观点直接一致，因为热图指出RNFL是诊断青光眼最相关的结构。摘要：Glaucoma is one of the leading causes of blindness worldwide and Optical Coherence Tomography (OCT) is the quintessential imaging technique for its detection. Unlike most of the state-of-the-art studies focused on glaucoma detection, in this paper, we propose, for the first time, a novel framework for glaucoma grading using raw circumpapillary B-scans. In particular, we set out a new OCT-based hybrid network which combines hand-driven and deep learning algorithms. An OCT-specific descriptor is proposed to extract hand-crafted features related to the retinal nerve fibre layer (RNFL). In parallel, an innovative CNN is developed using skip-connections to include tailored residual and attention modules to refine the automatic features of the latent space. The proposed architecture is used as a backbone to conduct a novel few-shot learning based on static and dynamic prototypical networks. The k-shot paradigm is redefined giving rise to a supervised end-to-end system which provides substantial improvements discriminating between healthy, early and advanced glaucoma samples. The training and evaluation processes of the dynamic prototypical network are addressed from two fused databases acquired via Heidelberg Spectralis system. 
Validation and testing results reach a categorical accuracy of 0.9459 and 0.8788 for glaucoma grading, respectively. Besides, the high performance reported by the proposed model for glaucoma detection deserves a special mention. The findings from the class activation maps are directly in line with the clinicians' opinion since the heatmaps pointed out the RNFL as the most relevant structure for glaucoma diagnosis.
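
The static prototypical-network step mentioned in entry 【16】 can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. This is only the generic technique under stated assumptions; the function names, the 2-D toy embeddings, and the Euclidean metric are illustrative choices, and the paper's redefined k-shot paradigm and dynamic prototypes are not reproduced here.

```python
import math
from typing import Dict, List

Embedding = List[float]


def class_prototypes(support: Dict[str, List[Embedding]]) -> Dict[str, Embedding]:
    """Prototype of each class = element-wise mean of its support embeddings."""
    return {
        label: [sum(col) / len(embs) for col in zip(*embs)]
        for label, embs in support.items()
    }


def classify(query: Embedding, prototypes: Dict[str, Embedding]) -> str:
    """Assign the query to the class whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy embeddings standing in for the output of a CNN backbone (hypothetical values).
support = {
    "healthy": [[0.0, 0.0], [0.0, 2.0]],
    "early": [[4.0, 4.0], [6.0, 4.0]],
}
prototypes = class_prototypes(support)
label = classify([1.0, 1.0], prototypes)
```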

【17】 Multifidelity Modeling for Physics-Informed Neural Networks (PINNs) 标题：物理信息神经网络(PINN)的多保真建模

作者：Michael Penwarden,Shandian Zhe,Akil Narayan,Robert M. Kirby 机构：School of Computing and Scientific Computing and Imaging Institute, University of Utah, School of Computing, University of Utah, Salt Lake City, UT, Department of Mathematics and Scientific Computing and Imaging Institute, University of 链接：https://arxiv.org/abs/2106.13361 摘要：多保真度仿真方法经常被用来以提高精度、节省成本的方式，将低保真度和高保真度的仿真结果审慎地结合起来。这种方法的候选对象是保真度差异伴随着显著计算成本差异的仿真方法。由于采用不同保真度（以网络结构的宽度和深度以及优化准则表示）时所需的训练时间存在显著差异，物理信息神经网络（PINN）正是这类方法的候选对象。在本文中，我们提出了一种应用于PINN的、利用低秩结构的特定多保真度方法。我们证明了宽度、深度和优化准则可以作为与模型保真度相关的参数，并从数值上说明了保真度参数的选择所导致的训练成本差异。我们在新兴的PINN文献中出现的各种典型正向偏微分方程模型上测试了我们的多保真度方案。摘要：Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature.

【18】 Using Machine Learning and Data Mining to Leverage Community Knowledge for the Engineering of Stable Metal-Organic Frameworks 标题：利用机器学习和数据挖掘利用社区知识进行稳定金属-有机骨架工程

作者：Aditya Nandy,Chenru Duan,Heather J. Kulik 机构：Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 链接：https://arxiv.org/abs/2106.13327 摘要：尽管MOF定制的金属活性位点和多孔结构在从气体分离到催化等工程挑战中有很大的应用前景，但对如何提高其稳定性的认识不足限制了它们的实际应用。为了克服这一局限，我们提取了数千份已发表的报告，内容涉及MOF实际应用所必需的稳定性关键方面：耐高温而不降解的能力，以及通过去除溶剂分子而被活化的能力。我们使用自然语言处理和自动图像分析，从近4000份手稿中获得了2000多个溶剂去除稳定性指标和3000多个热降解温度。我们分析了该数据集中稳定性性质与化学结构和几何结构之间的关系，以确定源自较小MOF集合的既有启发式规则的局限。通过训练预测性机器学习（ML，即高斯过程和人工神经网络）模型，用基于图和孔隙结构的表示来编码结构-性质关系，我们能够以比传统的基于物理的建模或实验快若干数量级的速度预测稳定性。对ML模型中重要特征的解释提供了一些见解，我们利用这些见解来确定策略，为通常不稳定、常被用于催化应用的含3d金属的MOF设计更高的稳定性。我们希望我们的方法能够加快发现稳定、实用的MOF材料的速度，使其具有广泛的应用前景。摘要：Although the tailored metal active sites and porous architectures of MOFs hold great promise for engineering challenges ranging from gas separations to catalysis, a lack of understanding of how to improve their stability limits their use in practice. To overcome this limitation, we extract thousands of published reports of the key aspects of MOF stability necessary for their practical application: the ability to withstand high temperatures without degrading and the capacity to be activated by removal of solvent molecules. From nearly 4,000 manuscripts, we use natural language processing and automated image analysis to obtain over 2,000 solvent-removal stability measures and 3,000 thermal degradation temperatures. We analyze the relationships between stability properties and the chemical and geometric structures in this set to identify limits of prior heuristics derived from smaller sets of MOFs. By training predictive machine learning (ML, i.e., Gaussian process and artificial neural network) models to encode the structure-property relationships with graph- and pore-structure-based representations, we are able to make predictions of stability orders of magnitude faster than conventional physics-based modeling or experiment. 
Interpretation of important features in ML models provides insights that we use to identify strategies to engineer increased stability into typically unstable 3d-containing MOFs that are frequently targeted for catalytic applications. We expect our approach to accelerate the time to discovery of stable, practical MOF materials for a wide range of applications.

【19】 Prior Image-Constrained Reconstruction using Style-Based Generative Models 标题：基于样式生成模型的先验图像约束重建

作者：Varun A. Kelkar,Mark A. Anastasio 机构：University of Illinois at Urbana-Champaign 备注：Accepted for publication at the International Conference on Machine Learning (ICML) 2021 链接：https://arxiv.org/abs/2102.12525 摘要：从高度不完全的成像测量中获得对物体有用的估计仍然是成像科学的圣杯。深度学习方法在学习对象的先验知识或约束条件以改善不适定成像反问题的条件化方面显示出良好的前景。在这项研究中，我们提出了一个框架来估计与已知先验图像语义相关的感兴趣对象。在基于风格的生成模型的解纠缠潜在空间中提出了一个优化问题，并利用先验图像的解纠缠潜在表示施加语义上有意义的约束。从理论上分析了利用先验图像进行不完全测量的稳定恢复。数值实验表明，与相关方法相比，该方法具有更好的性能。摘要：Obtaining a useful estimate of an object from highly incomplete imaging measurements remains a holy grail of imaging science. Deep learning methods have shown promise in learning object priors or constraints to improve the conditioning of an ill-posed imaging inverse problem. In this study, a framework for estimating an object of interest that is semantically related to a known prior image, is proposed. An optimization problem is formulated in the disentangled latent space of a style-based generative model, and semantically meaningful constraints are imposed using the disentangled latent representation of the prior image. Stable recovery from incomplete measurements with the help of a prior image is theoretically analyzed. Numerical experiments demonstrating the superior performance of our approach as compared to related methods are presented.

其他(24篇)

【1】 Single Image Texture Translation for Data Augmentation 标题：用于数据增强的单幅图像纹理转换

作者：Boyi Li,Yin Cui,Tsung-Yi Lin,Serge Belongie 机构：Cornell University, Cornell Tech, Google Research, Brain Team 链接：https://arxiv.org/abs/2106.13804 摘要：图像合成的最新进展使人们能够通过学习源域和目标域之间的映射来翻译图像。现有的方法倾向于通过在各种数据集上训练一个模型来学习分布，结果的评估主要以主观的方式进行。然而，这方面的研究相对较少，研究语义图像翻译方法在图像识别任务中的潜在应用。在本文中，我们探讨了使用单一图像纹理转换（SITT）的数据增强。我们首先提出一个轻量级的模型来将纹理转换成基于单一输入的图像，允许快速的训练和测试。在此基础上，探讨了增强数据在长尾和Few-Shot图像分类中的应用。我们发现该方法能够将输入数据转换到目标域，从而提高图像识别性能。最后，我们研究了SITT和相关的图像翻译方法如何为数据高效、增强工程的模型训练方法提供基础。摘要：Recent advances in image synthesis enables one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on a variety of datasets, with results evaluated largely in a subjective manner. Relatively few works in this area, however, study the potential use of semantic image translation methods for image recognition tasks. In this paper, we explore the use of Single Image Texture Translation (SITT) for data augmentation. We first propose a lightweight model for translating texture to images based on a single input of source texture, allowing for fast training and testing. Based on SITT, we then explore the use of augmented data in long-tailed and few-shot image classification tasks. We find the proposed method is capable of translating input data into a target domain, leading to consistent improved image recognition performance. Finally, we examine how SITT and related image translation methods can provide a basis for a data-efficient, augmentation engineering approach to model training.

【2】 Assessing Generalization of SGD via Disagreement 标题：通过不一致率评估SGD的泛化能力

作者：Yiding Jiang,Vaishnavh Nagarajan,Christina Baek,J. Zico Kolter 机构：Carnegie Mellon University, Bosch Center for AI, Pittsburgh 链接：https://arxiv.org/abs/2106.13799 摘要：我们通过实验表明，深度网络的测试误差可以这样估计：在相同的训练集上训练相同的结构，但使用不同的随机梯度下降（SGD）运行，然后在未标记的测试数据上测量两个网络之间的不一致率。这建立在Nakkiran & Bansal '20的观察之上，并且是其更强的版本；后者要求第二次训练必须在全新的训练集上进行。我们进一步从理论上证明，这种特殊现象源于SGD训练模型的集成具有良好校准的性质。这一发现不仅为利用未标记测试数据直接预测测试误差提供了一种简单的实证方法，而且在泛化和校准之间建立了一种新的概念联系。摘要：We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on -- and is a stronger version of -- the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the well-calibrated nature of ensembles of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
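
The disagreement rate described in entry 【2】 is simple to compute once two independently trained networks have produced predictions on the same unlabeled test inputs. The sketch below assumes the predictions are already available as label lists; the function name and the toy labels are hypothetical.

```python
from typing import Sequence


def disagreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of test inputs on which two models predict different labels.

    No ground-truth labels are needed; per the abstract, this quantity is
    used as an estimate of the test error of the SGD-trained networks.
    """
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and of equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Predictions of two SGD runs of the same architecture on the same training set.
run_1 = [0, 1, 2, 1]
run_2 = [0, 2, 2, 0]
estimate = disagreement_rate(run_1, run_2)
```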

【3】 HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters 标题：HyperNP：多维投影超参数的交互式可视化探索

作者：Gabriel Appleby,Mateus Espadoto,Rui Chen,Samuel Goree,Alexandru Telea,Erik W Anderson,Remco Chang 链接：https://arxiv.org/abs/2106.13777 摘要：投影算法（如t-SNE或UMAP）对于高维数据的可视化是有用的，但依赖于必须仔细调整的超参数。不幸的是，由于这些方法的随机性，迭代地重新计算投影来寻找最优的超参数值是计算密集和不直观的。在本文中，我们提出了一种可扩展的方法，它允许通过训练神经网络近似来实时交互探索投影方法的超参数。超NP可以在总数据实例和超参数配置的一小部分上进行训练，并且可以以交互速度计算新数据和超参数的投影。HyperNP具有体积小、计算速度快的特点，因此可以嵌入到web浏览器等轻量级可视化系统中。我们从性能和速度两个方面评估了超NP在三个数据集上的性能。结果表明，超NP是准确的，可扩展的，交互式的，适合在现实世界中使用。摘要：Projection algorithms such as t-SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter value is computationally intensive and unintuitive due to the stochastic nature of these methods. In this paper we propose HyperNP, a scalable method that allows for real-time interactive hyperparameter exploration of projection methods by training neural network approximations. HyperNP can be trained on a fraction of the total data instances and hyperparameter configurations and can compute projections for new data and hyperparameters at interactive speeds. HyperNP is compact in size and fast to compute, thus allowing it to be embedded in lightweight visualization systems such as web browsers. We evaluate the performance of the HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP is accurate, scalable, interactive, and appropriate for use in real-world settings.

【4】 Jitter: Random Jittering Loss Function 标题：抖动：随机抖动损失函数

作者：Zhicheng Cai,Chenglei Peng,Sidan Du 机构：School of Electronic Science and Engineering, Nanjing University, Nanjing, China 备注：IJCNN 2021 链接：https://arxiv.org/abs/2106.13749 摘要：正则化在机器学习优化中起着至关重要的作用。一种称为泛洪（flooding）的新正则化方法使训练损失在泛洪水平上下波动。它打算使模型继续随机游走，直到它到达一个平坦的损失景观，以增强泛化。然而，泛洪方法中的超参数泛洪电平选择不合理，不能统一。我们提出了一种新的方法称为抖动来改善它。抖动本质上是一种随机损失函数。在训练之前，我们从特定的概率分布中随机抽取抖动点。用抖动点代替泛洪水位，得到新的目标函数，并对模型进行相应的训练。由于抖动点作为一个随机因素，我们实际上在损失函数中加入了一些随机性，这与机器学习模型在学习过程中存在无数的随机行为这一事实是一致的，从而使模型更具鲁棒性。此外，Jitter随机执行随机游走，将损耗曲线划分为几个小间隔，然后将它们翻转过来，理想情况下可以使损耗曲线更加平坦，增强泛化能力。此外，抖动可以是一种域、任务和模型无关的正则化方法，在训练误差减小到零后，可以有效地训练模型。实验结果表明，与以往的泛洪方法相比，抖动方法能显著提高模型性能，使测试损失曲线下降两倍。摘要：Regularization plays a vital role in machine learning optimization. One novel regularization method called flooding makes the training loss fluctuate around the flooding level. It intends to make the model continue to random walk until it comes to a flat loss landscape to enhance generalization. However, the hyper-parameter flooding level of the flooding method fails to be selected properly and uniformly. We propose a novel method called Jitter to improve it. Jitter is essentially a kind of random loss function. Before training, we randomly sample the Jitter Point from a specific probability distribution. The flooding level should be replaced by Jitter point to obtain a new target function and train the model accordingly. As Jitter point acting as a random factor, we actually add some randomness to the loss function, which is consistent with the fact that there exists innumerable random behaviors in the learning process of the machine learning model and is supposed to make the model more robust. In addition, Jitter performs random walk randomly which divides the loss curve into small intervals and then flipping them over, ideally making the loss curve much flatter and enhancing generalization ability. 
Moreover, Jitter can be a domain-, task-, and model-independent regularization method and train the model effectively after the training error reduces to zero. Our experimental results show that Jitter method can improve model performance more significantly than the previous flooding method and make the test loss curve descend twice.
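
For orientation on entry 【4】, the flooding objective that Jitter builds on replaces a training loss `L` with `|L - b| + b`, so gradient descent turns into ascent whenever the loss drops below the flooding level `b`. The sketch below shows that formula and a randomly sampled level in place of a fixed one. The clipped Gaussian sampler, its parameters, and all function names are assumptions for illustration; the abstract only states that the Jitter point is drawn from "a specific probability distribution".

```python
import random


def flooding_loss(loss: float, level: float) -> float:
    """Flooding objective: |loss - level| + level.

    Above the level it equals the plain loss; below it, the sign of the
    gradient flips, so the training loss hovers around the level.
    """
    return abs(loss - level) + level


def sample_jitter_point(mean: float, std: float, rng: random.Random) -> float:
    """Hypothetical sampler: a Gaussian draw clipped at zero."""
    return max(0.0, rng.gauss(mean, std))


rng = random.Random(0)                             # fixed seed for reproducibility
jitter_point = sample_jitter_point(0.05, 0.02, rng)
objective = flooding_loss(0.02, jitter_point)      # random level replaces fixed b
```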

【5】 Re-parameterizing VAEs for stability 标题：重新参数化VAE以确保稳定性

作者：David Dehaene,Rémy Brossard 链接：https://arxiv.org/abs/2106.13739 摘要：提出了一种变分自动编码器（VAE）训练数值稳定性的理论方法。我们的工作是由最近的研究推动的，这些研究使VAE能够在复杂的图像数据集上获得最先进的生成结果。这些非常深的VAE架构，以及使用更复杂输出分布的VAE，突出了随意产生高训练梯度以及NaN损失的趋势。尽管有其局限性，但为训练他们而提出的经验修正既没有充分的理论依据，也不足以在实践中普遍适用。在此基础上，我们将问题源定位在模型神经网络与其输出概率分布之间的接口处。我们解释了一个共同的不稳定来源，源于编码正态分布方差的不谨慎公式，并将相同的方法应用于其他不太明显的来源。我们证明，通过对参数化正态分布的方法进行微小的改变，可以安全地训练VAE。摘要：We propose a theoretical approach towards the training numerical stability of Variational AutoEncoders (VAE). Our work is motivated by recent studies empowering VAEs to reach state of the art generative results on complex image datasets. These very deep VAE architectures, as well as VAEs using more complex output distributions, highlight a tendency to haphazardly produce high training gradients as well as NaN losses. The empirical fixes proposed to train them despite their limitations are neither fully theoretically grounded nor generally sufficient in practice. Building on this, we localize the source of the problem at the interface between the model's neural networks and their output probabilistic distributions. We explain a common source of instability stemming from an incautious formulation of the encoded Normal distribution's variance, and apply the same approach on other, less obvious sources. We show that by implementing small changes to the way we parameterize the Normal distributions on which they rely, VAEs can securely be trained.
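
The kind of instability entry 【5】 attributes to an "incautious formulation" of the Normal variance can be illustrated with a common pattern: mapping an unbounded network output to a standard deviation through `exp` overflows for large inputs, whereas a softplus with a small floor grows only linearly. This is a generic sketch under assumptions, not the re-parameterization proposed in the paper; the function names and the floor `eps` are hypothetical.

```python
import math


def sigma_exp(raw: float) -> float:
    """Treat the network output as a log-variance: sigma = exp(raw / 2).

    For large raw values this overflows (OverflowError in pure Python,
    inf/NaN in most array libraries).
    """
    return math.exp(0.5 * raw)


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x)) without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigma_softplus(raw: float, eps: float = 1e-3) -> float:
    """Alternative mapping: linear growth for large raw, floor eps for small raw."""
    return softplus(raw) + eps
```

The floor keeps the log-density finite when the raw output is very negative, and the linear growth keeps gradients bounded when it is very positive.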

【6】 CADDA: Class-wise Automatic Differentiable Data Augmentation for EEG Signals 标题：CADDA：面向脑电信号的按类别自动可微数据增强

作者：Cédric Rommel,Thomas Moreau,Alexandre Gramfort 机构：Inria - CEA, Université Paris-Saclay 链接：https://arxiv.org/abs/2106.13695 摘要：数据扩充是深度学习管道的一个关键元素，因为它在训练期间向网络通知输入数据的转换，以保持标签不变。然而，为给定的管道手动寻找适当的增强方法和参数是非常麻烦的。特别是，虽然直觉可以指导图像的决策，但对于更复杂类型的数据（如神经科学信号），增强策略的设计和选择仍然不清楚。此外，独立于标签的策略可能不适合这种结构化数据，依赖于类的扩充可能是必要的。这一想法在文献中被意外地未被探索过，但它是相当直观的：改变汽车图像的颜色并不会改变要预测的对象类，但对橙色的图片做同样的操作会改变。本文旨在通过类级数据扩充来提高泛化能力。然而，由于寻求依赖于类的转换大大增加了任务的复杂性，使用无梯度优化技术（如大多数现有的自动方法所做的）对于真实世界的数据集来说变得很困难。基于这个原因，我们建议使用基于梯度学习的可微数据扩充。脑电信号是一个完美的例子，数据的良好的增强政策大多是未知的。在这项工作中，我们证明了我们的方法对临床相关的睡眠分期分类任务的相关性，为此我们还提出了可微转换。摘要：Data augmentation is a key element of deep learning pipelines, as it informs the network during training about transformations of the input data that keep the label unchanged. Manually finding adequate augmentation methods and parameters for a given pipeline is however rapidly cumbersome. In particular, while intuition can guide this decision for images, the design and choice of augmentation policies remains unclear for more complex types of data, such as neuroscience signals. Moreover, label independent strategies might not be suitable for such structured data and class-dependent augmentations might be necessary. This idea has been surprisingly unexplored in the literature, while it is quite intuitive: changing the color of a car image does not change the object class to be predicted, but doing the same to the picture of an orange does. This paper aims to increase the generalization power added through class-wise data augmentation. Yet, as seeking transformations depending on the class largely increases the complexity of the task, using gradient-free optimization techniques as done by most existing automatic approaches becomes intractable for real-world datasets. For this reason we propose to use differentiable data augmentation amenable to gradient-based learning. EEG signals are a perfect example of data for which good augmentation policies are mostly unknown. 
In this work, we demonstrate the relevance of our approach on the clinically relevant sleep staging classification task, for which we also propose differentiable transformations.

【7】 Robust Matrix Factorization with Grouping Effect 标题：具有分组效应的鲁棒矩阵分解

作者：Haiyan Jiang,Shuyu Li,Luwei Zhang,Haoyi Xiong,Dejing Dou 机构： Baidu Research, Baidu Inc., China, Columbia University, New York, NY, USA 备注：22 pages, 5 figures, 4 tables 链接：https://arxiv.org/abs/2106.13681 摘要：虽然许多技术已被应用于矩阵分解（MF），但它们可能没有充分利用特征结构。本文将分组效应引入到MF中，提出了一种新的基于分组效应的鲁棒矩阵分解方法（GRMF）。分组效应是稀疏效应的推广，它通过将相似值聚集在多个中心而不是0左右来进行去噪。与现有算法相比，本文提出的GRMF算法可以在不需要先验知识的情况下自动学习MF中的分组结构和稀疏性，通过引入一种自然可调的非凸正则化，实现了同时稀疏和分组的效果。具体地说，GRMF采用了一种高效的交替极小化框架来执行MF，该框架首先通过凸差分（DC）规划将原非凸问题转化为凸问题，然后用交替方向乘子法（ADMM）求解。此外，GRMF可以很容易地扩展到非负矩阵分解（NMF）设置。在实际数据集上进行了大量实验，实验结果表明，与五种基准算法相比，GRMF算法具有更好的性能和鲁棒性。摘要：Although many techniques have been applied to matrix factorization (MF), they may not fully exploit the feature structure. In this paper, we incorporate the grouping effect into MF and propose a novel method called Robust Matrix Factorization with Grouping effect (GRMF). The grouping effect is a generalization of the sparsity effect, which conducts denoising by clustering similar values around multiple centers instead of just around 0. Compared with existing algorithms, the proposed GRMF can automatically learn the grouping structure and sparsity in MF without prior knowledge, by introducing a naturally adjustable non-convex regularization to achieve simultaneous sparsity and grouping effect. Specifically, GRMF uses an efficient alternating minimization framework to perform MF, in which the original non-convex problem is first converted into a convex problem through Difference-of-Convex (DC) programming, and then solved by Alternating Direction Method of Multipliers (ADMM). In addition, GRMF can be easily extended to the Non-negative Matrix Factorization (NMF) settings. Extensive experiments have been conducted using real-world data sets with outliers and contaminated noise, where the experimental results show that GRMF has promoted performance and robustness, compared to five benchmark algorithms.

【8】 Multi-player Multi-armed Bandits with Collision-Dependent Reward Distributions 标题：奖励分布依赖于碰撞的多人多臂老虎机

作者：Chengshuai Shi,Cong Shen 机构：Brown Department of Electrical and Computer Engineering, University of Virginia 备注：17 pages, 14 figures. Accepted to IEEE Transactions on Signal Processing 链接：https://arxiv.org/abs/2106.13669 摘要：本文研究了一个新的随机多人多臂老虎机（MP-MAB）问题，其中当臂上发生碰撞时，奖励分布会发生变化。现有文献总是假设发生碰撞时参与者的奖励为零，但对于认知无线电等应用，更现实的情况是碰撞会降低平均奖励，但不一定降为零。我们关注更实用的无感知设置，即玩家不能直接感知碰撞，并提出了纠错碰撞通信（EC3）算法，该算法将隐式通信建模为噪声信道上的可靠通信问题，并利用随机编码误差指数建立了任何通信协议都无法超越的最优遗憾。最后，优化码长和译码错误率之间的折衷，可以得到接近集中式MP-MAB遗憾的遗憾，而后者是一个自然的下界。在合成数据集和真实数据集上使用实用纠错码的实验证明了EC3的优越性。特别是，结果表明编码方案的选择对遗憾性能有着深刻的影响。摘要：We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm. Existing literature always assumes a zero reward for involved players if collision happens, but for applications such as cognitive radio, the more realistic scenario is that collision reduces the mean reward but not necessarily to zero. We focus on the more practical no-sensing setting where players do not perceive collisions directly, and propose the Error-Correction Collision Communication (EC3) algorithm that models implicit communication as a reliable communication over noisy channel problem, for which random coding error exponent is used to establish the optimal regret that no communication protocol can beat. Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound. Experiments with practical error-correction codes on both synthetic and real-world datasets demonstrate the superiority of EC3. In particular, the results show that the choice of coding schemes has a profound impact on the regret performance.

【9】 DeepLoc: A Ubiquitous Accurate and Low-Overhead Outdoor Cellular Localization System 标题：DeepLoc：一种无处不在的高精度低开销户外蜂窝定位系统

作者：Ahmed Shokry,Marwan Torki,Moustafa Youssef 机构：Alexandria University, Alexandria, Egypt, Egypt-Japan Univ. of Sc. & Tech. 链接：https://arxiv.org/abs/2106.13632 摘要：近年来，户外定位服务发展迅速。虽然GPS被认为是一个无处不在的定位系统，但它不受低端手机的支持，需要与卫星保持直接视线，并且会很快耗尽手机电池。本文提出了一种基于深度学习的户外定位系统DeepLoc，该系统能在没有这些限制的情况下获得类似GPS的定位精度。特别是，DeepLoc利用移动设备从不同基站接收到的无处不在的蜂窝信号作为定位的线索。为此，我们利用众包采集的、带地理标记的、来自不同基站的接收信号强度信息来训练一个深度模型，用于推断用户的位置。作为DeepLoc设计的一部分，我们引入了一些模块来解决若干实际挑战，包括将数据收集扩展到大面积区域、处理蜂窝信号和地理标记数据中的固有噪声，以及以低开销提供深度学习模型所需的足够数据。我们在不同的Android设备上实现了DeepLoc。在实际城乡环境中的评估结果表明，DeepLoc在城市地区的定位精度中值在18.8m以内，在农村地区的定位精度中值在15.7m以内。这种精确度比最先进的基于蜂窝的系统高出470%以上，并且比GPS节省了330%的功率。这突出了DeepLoc作为一个无处不在的精确和低开销定位系统的前景。摘要：Recent years have witnessed fast growth in outdoor location-based services. While GPS is considered a ubiquitous localization system, it is not supported by low-end phones, requires direct line of sight to the satellites, and can drain the phone battery quickly. In this paper, we propose DeepLoc: a deep learning-based outdoor localization system that obtains GPS-like localization accuracy without its limitations. In particular, DeepLoc leverages the ubiquitous cellular signals received from the different cell towers heard by the mobile device as hints to localize it. To do that, crowd-sensed geo-tagged received signal strength information coming from different cell towers is used to train a deep model that is used to infer the user's position. As part of DeepLoc design, we introduce modules to address a number of practical challenges including scaling the data collection to large areas, handling the inherent noise in the cellular signal and geo-tagged data, as well as providing enough data that is required for deep learning models with low-overhead. We implemented DeepLoc on different Android devices. Evaluation results in realistic urban and rural environments show that DeepLoc can achieve a median localization accuracy within 18.8m in urban areas and within 15.7m in rural areas. 
This accuracy outperforms the state-of-the-art cellular-based systems by more than 470% and comes with 330% savings in power compared to the GPS. This highlights the promise of DeepLoc as a ubiquitous accurate and low-overhead localization system.

【10】 Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote 标题：加权多数票的Chebyshev-Cantelli PAC-Bayes-Bennett不等式

作者：Yi-Shan Wu,Andrés R. Masegosa,Stephan S. Lorenzen,Christian Igel,Yevgeny Seldin 机构：University of Copenhagen, University of Almería 备注：arXiv admin note: text overlap with arXiv:2007.13532 链接：https://arxiv.org/abs/2106.13624 摘要：我们提出一个新的二阶甲骨文界的预期风险加权多数票。该界基于Chebyshev-Cantelli不等式（又称单侧Chebyshev不等式）的一种新的参数形式，该不等式易于有效极小化。新形式解决了基于Chebyshev-Cantelli不等式、C-界[Germain et al.，2015]的先验oracle界面临的优化挑战，同时改进了基于Masegosa et al.[2020]引入的二阶Markov不等式的oracle界。我们还推导了PAC-Bayes-Bennett不等式，并将其用于oracle界的经验估计。PAC-Bayes-Bennett不等式改进了Seldin等人[2012]提出的PAC-Bayes-Bernstein不等式。我们提供了一个实证评估，证明新的边界可以改进Masegosa等人[2020]的工作。Chebyshev-Cantelli不等式和PAC-Bayes-Bennett不等式的参数形式对于研究其他领域的测度集中问题可能具有独立的意义。摘要：We present a new second-order oracle bound for the expected risk of a weighted majority vote. The bound is based on a novel parametric form of the Chebyshev-Cantelli inequality (a.k.a. one-sided Chebyshev's), which is amenable to efficient minimization. The new form resolves the optimization challenge faced by prior oracle bounds based on the Chebyshev-Cantelli inequality, the C-bounds [Germain et al., 2015], and, at the same time, it improves on the oracle bound based on second order Markov's inequality introduced by Masegosa et al. [2020]. We also derive the PAC-Bayes-Bennett inequality, which we use for empirical estimation of the oracle bound. The PAC-Bayes-Bennett inequality improves on the PAC-Bayes-Bernstein inequality by Seldin et al. [2012]. We provide an empirical evaluation demonstrating that the new bounds can improve on the work by Masegosa et al. [2020]. Both the parametric form of the Chebyshev-Cantelli inequality and the PAC-Bayes-Bennett inequality may be of independent interest for the study of concentration of measure in other domains.
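
As background for entry 【10】, the classical (non-parametric) Chebyshev-Cantelli inequality is stated below for a random variable $Z$ with finite variance. This is the textbook form only; the paper's new parametric form is not reproduced here, and the symbols $Z$ and $\varepsilon$ are notation chosen for this note.

```latex
\[
  \mathbb{P}\bigl(Z - \mathbb{E}[Z] \ge \varepsilon\bigr)
  \;\le\;
  \frac{\operatorname{Var}[Z]}{\operatorname{Var}[Z] + \varepsilon^{2}},
  \qquad \varepsilon > 0 .
\]
```

Unlike the two-sided Chebyshev bound $\operatorname{Var}[Z]/\varepsilon^{2}$, the right-hand side never exceeds 1, which is why it is the natural starting point for one-sided bounds of this kind.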

【11】 Connecting Sphere Manifolds Hierarchically for Regularization 标题：球面流形的分层连接正则化

作者：Damien Scieur,Youngsung Kim 机构：Montreal, Samsung Advanced Institute of Technology (SAIT) 链接：https://arxiv.org/abs/2106.13549 摘要：本文研究具有层次组织类的分类问题。我们强制每个类的分类器（超平面）属于一个球体流形，其中心是其超类的分类器。然后，根据各个球流形的层次关系将它们连接起来。我们的技术通过结合一个球形的完全连接层和一个层次结构层来代替神经网络的最后一层。这种正则化可以提高广泛使用的深度神经网络结构（ResNet和DenseNet）在公开数据集（CIFAR100、CUB200、Stanford dogs、Stanford cars和Tiny ImageNet）上的性能。摘要：This paper considers classification problems with hierarchically organized classes. We force the classifier (hyperplane) of each class to belong to a sphere manifold, whose center is the classifier of its super-class. Then, individual sphere manifolds are connected based on their hierarchical relations. Our technique replaces the last layer of a neural network by combining a spherical fully-connected layer with a hierarchical layer. This regularization is shown to improve the performance of widely used deep neural network architectures (ResNet and DenseNet) on publicly available datasets (CIFAR100, CUB200, Stanford dogs, Stanford cars, and Tiny-ImageNet).

【12】 Dealing with Expert Bias in Collective Decision-Making 标题：处理集体决策中的专家偏差问题

作者：Axel Abels,Tom Lenaerts,Vito Trianni,Ann Nowé 机构：Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, AI Lab, Vrije Universiteit Brussel, Brussels, Belgium, Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy 链接：https://arxiv.org/abs/2106.13539 摘要：相当多的现实世界问题可以表述为决策问题，其中必须从一组备选方案中反复做出适当的选择。专家的判断，无论来自人类还是人工系统，都有助于做出正确的决定，尤其是在探索替代方案成本高昂的情况下。由于专家的意见可能存在分歧，寻找正确备选方案的问题可以作为一个集体决策（CDM）问题来处理。目前解决CDM问题的最先进方法受限于专家组中最优秀专家的水平，并且在专家不合格或偏差过大时表现不佳，从而有可能使决策过程偏离正轨。在本文中，我们提出了一种基于上下文多臂老虎机问题（CMAB）的新算法方法，用来识别并抵消这类有偏差的专家意见。我们考察了同质、异质和两极分化的专家组，结果表明，无论所提供的建议是否直接有利于取得良好表现，这种方法都能有效地利用集体的专业知识，其表现优于最先进的方法，尤其是在所提供专业知识的质量下降时。我们这种受CMAB启发的新方法实现了更高的最终性能，同时收敛速度比以前的自适应算法更快，尤其是在异质专业知识容易获得时。摘要：Quite some real-world problems can be formulated as decision-making problems wherein one must repeatedly make an appropriate choice from a set of alternatives. Expert judgements, whether human or artificial, can help in taking correct decisions, especially when exploration of alternative solutions is costly. As expert opinions might deviate, the problem of finding the right alternative can be approached as a collective decision making problem (CDM). Current state-of-the-art approaches to solve CDM are limited by the quality of the best expert in the group, and perform poorly if experts are not qualified or if they are overly biased, thus potentially derailing the decision-making process. In this paper, we propose a new algorithmic approach based on contextual multi-armed bandit problems (CMAB) to identify and counteract such biased expertises. We explore homogeneous, heterogeneous and polarised expert groups and show that this approach is able to effectively exploit the collective expertise, irrespective of whether the provided advice is directly conducive to good performance, outperforming state-of-the-art methods, especially when the quality of the provided expertise degrades. 
Our novel CMAB-inspired approach achieves a higher final performance and does so while converging more rapidly than previous adaptive algorithms, especially when heterogeneous expertise is readily available.

【13】 Deep Residual Echo Suppression with A Tunable Tradeoff Between Signal Distortion and Echo Suppression

Authors: Amir Ivry, Israel Cohen, Baruch Berdugo Affiliations: Technion – Israel Institute of Technology, Technion City, Haifa, Israel Link: https://arxiv.org/abs/2106.13531 Abstract: In this paper, we propose a residual echo suppression method using a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. This system embeds a design parameter that allows a tunable tradeoff between the desired-signal distortion and residual echo suppression in double-talk scenarios. The system has 136 thousand parameters and requires 1.6 giga floating-point operations per second and 10 megabytes of memory. The implementation satisfies both the timing requirements of the AEC challenge and the computational and memory limitations of on-device applications. Experiments are conducted with 161 h of data from the AEC challenge database and from real independent recordings. We demonstrate the performance of the proposed system in real-life conditions and compare it with two competing methods regarding echo suppression and desired-signal distortion, generalization to various environments, and robustness to high echo levels.

【14】 Data-based Design of Inferential Sensors for Petrochemical Industry

Authors: Martin Mojto, Karol Ľubušký, Miroslav Fikar, Radoslav Paulen Affiliations: Slovak University of Technology in Bratislava, Radlinského, Bratislava, Slovakia; Slovnaft, a.s., Vlčie hrdlo, Bratislava, Slovakia Link: https://arxiv.org/abs/2106.13503 Abstract: Inferential (or soft) sensors are used in industry to infer the values of imprecisely and rarely measured (or completely unmeasured) variables from variables measured online (e.g., pressures, temperatures). The main challenge in designing an effective inferential sensor, akin to classical model overfitting, is selecting a correct sensor structure. The sensor structure is represented by the number of inputs to the sensor, which correspond to the variables measured online and their (simple) combinations. This work focuses on the design of inferential sensors for the product composition of an industrial distillation column in two oil refinery units, a Fluid Catalytic Cracking unit and a Vacuum Gasoil Hydrogenation unit. As the first design step, we use several well-known data pre-treatment (gross error detection) methods and compare the ability of these approaches to indicate systematic errors and outliers in the available industrial data. We then study the effectiveness of various methods for designing the inferential sensors, taking into account the complexity and accuracy of the resulting model. The effectiveness analysis indicates that the improvements achieved over the current inferential sensors are up to 19%.

【15】 Identifying malicious accounts in Blockchains using Domain Names and associated temporal properties

Authors: Rohit Kumar Sachan, Rachit Agarwal, Sandeep Kumar Shukla Affiliations: CSE Department, IIT Kanpur Comments: Submitted to a journal Link: https://arxiv.org/abs/2106.13420 Abstract: The rise in the adoption of blockchain technology has led to increased illegal activities by cyber-criminals, costing billions of dollars. Many machine learning algorithms are applied to detect such illegal behavior. These algorithms are often trained on the transaction behavior and, in some cases, on the vulnerabilities that exist in the system. In our approach, we study the feasibility of using metadata such as the Domain Name (DN) associated with an account in the blockchain to identify whether the account should be tagged as malicious or not. Here, we leverage the temporal aspects attached to the DNs. Our results identify 144930 DNs that show malicious behavior, and out of these, 54114 DNs show persistent malicious behavior over time. Nonetheless, none of these identified malicious DNs were reported in new officially tagged malicious blockchain DNs.

【16】 Building Intelligent Autonomous Navigation Agents

Authors: Devendra Singh Chaplot Affiliations: Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Thesis Committee: Ruslan Salakhutdinov (Chair), Abhinav Gupta, Deva Ramanan, Jitendra Malik Comments: CMU Ph.D. Thesis, March 2021. Further details are linked from the arXiv listing Link: https://arxiv.org/abs/2106.13415 Abstract: Breakthroughs in machine learning in the last decade have led to 'digital intelligence', i.e. machine learning models capable of learning from vast amounts of labeled data to perform several digital tasks such as speech recognition, face recognition, machine translation and so on. The goal of this thesis is to make progress towards designing algorithms capable of 'physical intelligence', i.e. building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle at long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. 
In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods, to tackle long-term navigation tasks. We show that these methods are able to effectively tackle challenges such as localization, mapping, long-term planning, exploration and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.

【17】 A Source-Criticism Debiasing Method for GloVe Embeddings

Authors: Hope McGovern Affiliations: University of Cambridge Link: https://arxiv.org/abs/2106.13382 Abstract: It is well-documented that word embeddings trained on large public corpora consistently exhibit known human social biases. Although many methods for debiasing exist, almost all fixate on completely eliminating biased information from the embeddings and often diminish training set size in the process. In this paper, we present a simple yet effective method for debiasing GloVe word embeddings (Pennington et al., 2014) which works by incorporating explicit information about training set bias rather than removing biased data outright. Our method runs quickly and efficiently with the help of a fast bias gradient approximation method from Brunet et al. (2019). As our approach is akin to the notion of 'source criticism' in the humanities, we term our method Source-Critical GloVe (SC-GloVe). We show that SC-GloVe reduces the effect size on Word Embedding Association Test (WEAT) sets without sacrificing training data or TOP-1 performance.
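
For readers unfamiliar with the WEAT effect size mentioned in the abstract, the following is a minimal sketch of how that statistic is commonly computed. It is not the SC-GloVe method itself. The two-dimensional toy vectors are hypothetical, and the use of the sample standard deviation is an assumption (implementations differ on this point).

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in attrs_a) - mean(cosine(w, b) for b in attrs_b)


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Difference of mean associations of X and Y, normalised by the
    standard deviation of the associations over X union Y."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings: X leans towards attribute A, Y towards B.
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value indicates that the X targets are more strongly associated with A than the Y targets are; a debiasing method aims to move this value towards zero.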

【18】 Physics perception in sloshing scenes with guaranteed thermodynamic consistency

Authors: Beatriz Moya, Alberto Badias, David Gonzalez, Francisco Chinesta, Elias Cueto Affiliations: Aragon Institute in Engineering Research, University of Zaragoza, Zaragoza, Spain; ESI Group chair, PIMM Lab., ENSAM Institute of Technology, Paris, France Comments: 20 pages, 11 figures Link: https://arxiv.org/abs/2106.13301 Abstract: Physics perception very often faces the problem that only limited data or partial measurements on the scene are available. In this work, we propose a strategy to learn the full state of sloshing liquids from measurements of the free surface. Our approach is based on recurrent neural networks (RNN) that project the limited information available to a reduced-order manifold so as to not only reconstruct the unknown information, but also to be capable of performing fluid reasoning about future scenarios in real time. To obtain physically consistent predictions, we train deep neural networks on the reduced-order manifold that, through the use of inductive biases, ensure the fulfillment of the principles of thermodynamics. RNNs learn from history the required hidden information to correlate the limited information with the latent space where the simulation occurs. Finally, a decoder returns data back to the high-dimensional manifold, so as to provide the user with insightful information in the form of augmented reality. This algorithm is connected to a computer vision system to test the performance of the proposed methodology with real information, resulting in a system capable of understanding and predicting future states of the observed fluid in real time.

【19】 InteL-VAEs: Adding Inductive Biases to Variational Auto-Encoders via Intermediary Latents

Authors: Ning Miao, Emile Mathieu, N. Siddharth, Yee Whye Teh, Tom Rainforth Affiliations: Department of Statistics, University of Oxford; University of Edinburgh and the Alan Turing Institute Link: https://arxiv.org/abs/2106.13746 Abstract: We introduce a simple and effective method for learning VAEs with controllable inductive biases by using an intermediary set of latent variables. This allows us to overcome the limitations of the standard Gaussian prior assumption. In particular, it allows us to impose desired properties like sparsity or clustering on learned representations, and incorporate prior information into the learned model. Our approach, which we refer to as the Intermediary Latent Space VAE (InteL-VAE), is based around controlling the stochasticity of the encoding process with the intermediary latent variables, before deterministically mapping them forward to our target latent representation, from which reconstruction is performed. This allows us to maintain all the advantages of the traditional VAE framework, while incorporating desired prior information, inductive biases, and even topological information through the latent mapping. We show that this, in turn, allows InteL-VAEs to learn both better generative models and representations.

【20】 Primordial non-Gaussianity from the Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey I: Catalogue Preparation and Systematic Mitigation

Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Eva-Maria Mueller, Will J. Percival, Grant Merz, Reza Katebi, Razvan C. Bunescu, Julian Bautista, Joel R. Brownstein, Etienne Burtin, Kyle Dawson, Héctor Gil-Marín, Jiamin Hou, Eleanor B. Lyke, Axel de la Macorra, Graziano Rossi, Donald P. Schneider, Pauline Zarrouk, Gong-Bo Zhao Affiliations: Department of Physics and Astronomy, Ohio University, Athens, OH, USA; Center of Cosmology and AstroParticle Physics, The Ohio State University, Columbus, OH, USA Comments: 17 pages, 13 figures, 2 tables. Accepted for publication in MNRAS. The associated code and value-added catalogs are linked from the arXiv listing Link: https://arxiv.org/abs/2106.13724 Abstract: We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate spurious fluctuations in the density field caused by spatial variations in the quality of the imaging data used to select targets for follow-up spectroscopy. Simulations with the same angular and radial distributions as the real data are used to estimate covariance matrices, perform error analyses, and assess residual systematic uncertainties. 
We measure the mean density contrast and cross-correlations of the eBOSS quasars against maps of potential sources of imaging systematics to address algorithm effectiveness, finding that the neural network-based approach outperforms standard linear regression. Stellar density is one of the most important sources of spurious fluctuations, and a new template constructed using data from the Gaia spacecraft provides the best match to the observed quasar clustering. The end-product from this work is a new value-added quasar catalogue with the improved weights to correct for nonlinear imaging systematic effects, which will be made public. Our quasar catalogue is used to measure the local-type primordial non-Gaussianity in our companion paper, Mueller et al. in preparation.

【21】 Accelerated Computation of a High Dimensional Kolmogorov-Smirnov Distance

Authors: Alex Hagen, Shane Jackson, James Kahn, Jan Strube, Isabel Haide, Karl Pazdernik, Connor Hainje Affiliations: Pacific Northwest National Laboratory, Richland, WA, USA; Karlsruhe Institute of Technology, Karlsruhe, Germany Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence Link: https://arxiv.org/abs/2106.13706 Abstract: Statistical testing is widespread and critical for a variety of scientific disciplines. The advent of machine learning and the increase of computing power have increased the interest in the analysis and statistical testing of multidimensional data. We extend the powerful Kolmogorov-Smirnov two sample test to a high dimensional form in a similar manner to Fasano (Fasano, 1987). We call our result the d-dimensional Kolmogorov-Smirnov test (ddKS) and provide three novel contributions therewith: we develop an analytical equation for the significance of a given ddKS score, we provide an algorithm for computation of ddKS on modern computing hardware that is of constant time complexity for small sample sizes and dimensions, and we provide two approximate calculations of ddKS: one that reduces the time complexity to linear at larger sample sizes, and another that reduces the time complexity to linear with increasing dimension. 
We perform power analysis of ddKS and its approximations on a corpus of datasets and compare to other common high dimensional two sample tests and distances: Hotelling's T^2 test and Kullback-Leibler divergence. Our ddKS test performs well for all datasets, dimensions, and sizes tested, whereas the other tests and distances fail to reject the null hypothesis on at least one dataset. We therefore conclude that ddKS is a powerful multidimensional two sample test for general use, and can be calculated in a fast and efficient manner using our parallel or approximate methods. Open source implementations of all methods described in this work are located at https://github.com/pnnl/ddks.
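
As background for the abstract above, here is a minimal sketch of the classical one-dimensional two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical cumulative distribution functions. This is the textbook quantity that ddKS generalises, not the ddKS algorithm or its approximations; the function name is hypothetical.

```python
def ks_two_sample_statistic(sample_a, sample_b):
    """Return sup_x |F_a(x) - F_b(x)| for two 1-D samples,
    where F_a and F_b are the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Sweep over the merged sorted values; after consuming every point
    # <= x from both samples, i/n and j/m are the two ECDF values at x.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its ECDF is 1 and the gap can only shrink.
    return d


print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

Identical samples give a statistic of 0 and fully separated samples give 1. The naive extension of this idea to d dimensions requires comparing counts in all orthants around each point, which is the cost the paper's methods aim to reduce.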

【22】 A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems

Authors: Peipei Tang, Chengjing Wang, Bo Jiang Comments: 31 pages, 7 tables Link: https://arxiv.org/abs/2106.13683 Abstract: In this paper, we introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems. The basic idea is to apply the proximal majorization-minimization algorithm to solve the nonconvex problem, with the inner subproblems solved by a proximal point algorithm (PPA) based on a sparse semismooth Newton (SSN) method. We emphasize that the main difficulty in the design of the algorithm lies in how to overcome the singularity of the inner subproblem. Furthermore, we prove that the PPMM algorithm converges to a d-stationary point. Using the Kurdyka-Lojasiewicz (KL) property of the problem, we also present the convergence rate of the PPMM algorithm. Numerical experiments demonstrate that our proposed algorithm outperforms the existing state-of-the-art algorithms.

【23】 Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

Authors: Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar Affiliations: Facebook AI Research, TLV, Israel; Facebook Reality Labs, Redmond, WA, USA; University of Maryland, College Park, MD, USA Link: https://arxiv.org/abs/2106.13493 Abstract: Deep neural networks have recently shown great success in the task of blind source separation, both under monaural and binaural settings. Although these methods were shown to produce high-quality separations, they were mainly applied under offline settings, in which the model has access to the full input signal while separating the signal. In this study, we convert a non-causal state-of-the-art separation model into a causal and real-time model and evaluate its performance under both online and offline settings. We compare the performance of the proposed model to several baseline methods under anechoic, noisy, and noisy-reverberant recording conditions while exploring both monaural and binaural inputs and outputs. Our findings shed light on the relative difference between causal and non-causal models when performing separation. Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8dB for monaural inputs and 0.3dB for binaural inputs, while reaching a real-time factor of 0.65. Samples can be found under the following link: https://kwanum.github.io/sagrnnc-stream-results/.
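
To make the reported real-time factor concrete, the sketch below uses the usual definition, processing time divided by the duration of the audio processed. Whether the paper measures it in exactly this way is an assumption, and the function name and timings are hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor: time spent processing divided by audio duration.
    Values below 1.0 mean the system keeps up with the incoming stream."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical timing: 10 s of audio separated in 6.5 s of compute.
rtf = real_time_factor(6.5, 10.0)
print(f"RTF = {rtf:.2f}, real-time capable: {rtf < 1.0}")
```

Under this definition, a factor of 0.65 leaves roughly a third of each second of wall-clock time as headroom.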

【24】 Binary Matrix Factorisation and Completion via Integer Programming

Authors: Reka A. Kovacs, Oktay Gunluk, Raphael A. Hauser Affiliations: Cornell University; University of Oxford; The Alan Turing Institute Link: https://arxiv.org/abs/2106.13434 Abstract: Binary matrix factorisation is an essential tool for identifying discrete patterns in binary data. In this paper we consider the rank-k binary matrix factorisation problem (k-BMF) under Boolean arithmetic: we are given an n x m binary matrix X with possibly missing entries and need to find two binary matrices A and B of dimension n x k and k x m respectively, which minimise the distance between X and the Boolean product of A and B in the squared Frobenius distance. We present a compact and two exponential size integer programs (IPs) for k-BMF and show that the compact IP has a weak LP relaxation, while the exponential size IPs have a stronger equivalent LP relaxation. We introduce a new objective function, which differs from the traditional squared Frobenius objective in attributing a weight to zero entries of the input matrix that is proportional to the number of times the zero is erroneously covered in a rank-k factorisation. For one of the exponential size IPs we describe a computational approach based on column generation. Experimental results on synthetic and real world datasets suggest that our integer programming approach is competitive against available methods for k-BMF and provides accurate low-error factorisations.
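
The k-BMF objective described above can be illustrated with a short sketch: a Boolean matrix product (OR of ANDs) and the squared Frobenius error against X. This only evaluates a given factorisation and is not the paper's integer programming approach. Representing missing entries as `None` and skipping them in the error is an assumption, and the matrices are hypothetical.

```python
def boolean_product(a, b):
    """Boolean product of an n x k and a k x m 0/1 matrix:
    entry (i, j) is 1 if a[i][l] and b[l][j] are both 1 for some l."""
    k, m = len(b), len(b[0])
    return [[int(any(row[l] and b[l][j] for l in range(k))) for j in range(m)]
            for row in a]


def squared_frobenius_error(x, a, b):
    """Sum of squared differences between X and the Boolean product of A and B.
    Assumption: entries of X equal to None are missing and are skipped."""
    z = boolean_product(a, b)
    return sum((x[i][j] - z[i][j]) ** 2
               for i in range(len(x))
               for j in range(len(x[0]))
               if x[i][j] is not None)


# Hypothetical 3 x 3 example with one missing entry.
X = [[1, 1, None],
     [1, 1, 1],
     [0, 1, 1]]
A = [[1, 0], [1, 1], [0, 1]]   # n x k with k = 2
B = [[1, 1, 0], [0, 1, 1]]     # k x m

print(boolean_product(A, B))             # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
print(squared_frobenius_error(X, A, B))  # 0
```

Note that the middle entry of the product is 1 rather than 2: under Boolean arithmetic, overlapping rank-1 patterns do not add up, which is what distinguishes k-BMF from ordinary low-rank factorisation.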
