Machine Learning Academic Digest [12.20]

2021-12-22 17:12:34

cs.LG: 82 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)

【1】 Embedding Graph Convolutional Networks in Recurrent Neural Networks for Predictive Monitoring Link: https://arxiv.org/abs/2112.09641

Authors: Efrén Rama-Maneiro, Juan C. Vidal, Manuel Lama
Affiliations: Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain; Departamento de Electrónica e Computación, Universidade de Santiago de Compostela
Abstract: Predictive monitoring of business processes is a subfield of process mining that aims to predict, among other things, the characteristics of the next event or the sequence of next events. Although multiple approaches based on deep learning have been proposed, mainly recurrent neural networks and convolutional neural networks, none of them really exploit the structural information available in process models. This paper proposes an approach based on graph convolutional networks and recurrent neural networks that uses information directly from the process model. An experimental evaluation on real-life event logs shows that our approach is more consistent and outperforms the current state-of-the-art approaches.
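The abstract describes embedding a GCN inside an RNN over event prefixes. As a rough illustration of that combination (not the authors' exact architecture; layer sizes, mean pooling, and the single-shared-graph assumption are all illustrative), a minimal PyTorch sketch:

```python
# Sketch: a GCN layer encodes the process-model graph at each prefix step,
# and an LSTM consumes the pooled graph embeddings to predict the next activity.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, x):                  # a_hat: normalized adjacency (N, N)
        return torch.relu(self.lin(a_hat @ x))    # x: node features (N, in_dim)

class GCNRNNPredictor(nn.Module):
    def __init__(self, node_dim, hidden, n_activities):
        super().__init__()
        self.gcn = GCNLayer(node_dim, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_activities)

    def forward(self, a_hat, x_seq):              # x_seq: (T, N, node_dim), one graph state per event
        pooled = torch.stack([self.gcn(a_hat, x).mean(dim=0) for x in x_seq])  # (T, hidden)
        h, _ = self.rnn(pooled.unsqueeze(0))      # (1, T, hidden)
        return self.out(h[:, -1])                 # logits for the next activity
```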

【2】 Learning from Heterogeneous Data Based on Social Interactions over Graphs Link: https://arxiv.org/abs/2112.09483

Authors: Virginia Bordignon, Stefan Vlaski, Vincenzo Matta, Ali H. Sayed
Abstract: This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions and arising from possibly different distributions. In the context of social learning, several useful strategies have been developed, which solve decision making problems through local cooperation across distributed agents and allow them to learn from streaming data. However, traditional social learning strategies rely on the fundamental assumption that each agent has significant prior knowledge of the underlying distribution of the observations. In this work we overcome this issue by introducing a machine learning framework that exploits social interactions over a graph, leading to a fully data-driven solution to the distributed classification problem. In the proposed social machine learning (SML) strategy, two phases are present: in the training phase, classifiers are independently trained to generate a belief over a set of hypotheses using a finite number of training samples; in the prediction phase, classifiers evaluate streaming unlabeled observations and share their instantaneous beliefs with neighboring classifiers. We show that the SML strategy enables the agents to learn consistently under this highly-heterogeneous setting and allows the network to continue learning even during the prediction phase when it is deciding on unlabeled samples. The prediction decisions are used to continually improve performance thereafter in a manner that is markedly different from most existing static classification schemes where, following training, the decisions on unlabeled data are not re-used to improve future performance.
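To make the prediction-phase cooperation concrete, here is a toy sketch of one standard social-learning update: a local Bayesian step followed by geometric averaging of neighbors' beliefs over the graph. This is an assumed form; the paper's exact recursion may differ.

```python
import numpy as np

def social_learning_step(beliefs, likelihoods, W):
    """beliefs: (n_agents, n_hyp); likelihoods: (n_agents, n_hyp) for the current
    observation; W: (n_agents, n_agents) row-stochastic combination weights
    encoding the graph."""
    local = beliefs * likelihoods
    local /= local.sum(axis=1, keepdims=True)      # Bayesian update per agent
    log_combined = W @ np.log(local + 1e-12)       # geometric averaging over neighbors
    combined = np.exp(log_combined)
    return combined / combined.sum(axis=1, keepdims=True)
```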

【3】 Community-based Layerwise Distributed Training of Graph Convolutional Networks Link: https://arxiv.org/abs/2112.09335

Authors: Hongyi Li, Junxiang Wang, Yongchao Wang, Yue Cheng, Liang Zhao
Affiliations: The State Key Laboratory of Integrated Service Networks, Xidian University, Xi'an, Shaanxi, China; Department of Computer Science and Informatics, Emory University, Atlanta, Georgia, USA
Note: accepted by the NeurIPS 2021 OPT workshop
Abstract: The Graph Convolutional Network (GCN) has been successfully applied to many graph-based applications. Training a large-scale GCN model, however, is still challenging: Due to the node dependency and layer dependency of the GCN architecture, a huge amount of computational time and memory is required in the training process. In this paper, we propose a parallel and distributed GCN training algorithm based on the Alternating Direction Method of Multipliers (ADMM) to tackle the two challenges simultaneously. We first split GCN layers into independent blocks to achieve layer parallelism. Furthermore, we reduce node dependency by dividing the graph into several dense communities such that each of them can be trained with an agent in parallel. Finally, we provide solutions for all subproblems in the community-based ADMM algorithm. Preliminary results demonstrate that our proposed community-based ADMM training algorithm can lead to more than triple speedup while achieving the best performance compared with state-of-the-art methods.

【4】 Link-Intensive Alignment for Incomplete Knowledge Graphs Link: https://arxiv.org/abs/2112.09266

Authors: Vinh Van Tong, Thanh Trung Huynh, Thanh Tam Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen, Quyet Thang Huynh
Affiliations: Hanoi University of Science and Technology, Vietnam; Griffith University, Australia; The University of Queensland, Australia
Abstract: Knowledge graph (KG) alignment - the task of recognizing entities referring to the same thing in different KGs - is recognized as one of the most important operations in the field of KG construction and completion. However, existing alignment techniques often assume that the input KGs are complete and isomorphic, which is not true due to the real-world heterogeneity in the domain, size, and sparsity. In this work, we address the problem of aligning incomplete KGs with representation learning. Our KG embedding framework exploits two feature channels: transitivity-based and proximity-based. The former captures the consistency constraints between entities via translation paths, while the latter captures the neighbourhood structure of KGs via attention guided relation-aware graph neural network. The two feature channels are jointly learned to exchange important features between the input KGs while enforcing the output representations of the input KGs in the same embedding space. Also, we develop a missing links detector that discovers and recovers the missing links in the input KGs during the training process, which helps mitigate the incompleteness issue and thus improve the compatibility of the learned representations. The embeddings then are fused to generate the alignment result, and the high-confidence matched node pairs are updated to the pre-aligned supervision data to improve the embeddings gradually. Empirical results show that our model is up to 15.2% more accurate than the SOTA and is robust against different levels of incompleteness. We also demonstrate that the knowledge exchanging between the KGs helps reveal the unseen facts from knowledge graphs (a.k.a. knowledge completion), with the result being 3.5% higher than the SOTA knowledge graph completion techniques.

【5】 Two-view Graph Neural Networks for Knowledge Graph Completion Link: https://arxiv.org/abs/2112.09231

Authors: Vinh Tong, Dai Quoc Nguyen, Dinh Phung, Dat Quoc Nguyen
Affiliations: VinAI Research, Vietnam; Oracle Labs, Australia; Monash University, Australia
Abstract: In this paper, we introduce a novel GNN-based knowledge graph embedding model, named WGE, to capture entity-focused graph structure and relation-focused graph structure. In particular, given the knowledge graph, WGE builds a single undirected entity-focused graph that views entities as nodes. In addition, WGE also constructs another single undirected graph from relation-focused constraints, which views entities and relations as nodes. WGE then proposes a new architecture of utilizing two vanilla GNNs directly on these two single graphs to better update vector representations of entities and relations, followed by a weighted score function to return the triple scores. Experimental results show that WGE obtains state-of-the-art performances on three new and challenging benchmark datasets CoDEx for knowledge graph completion.

【6】 Constraint-based graph network simulator Link: https://arxiv.org/abs/2112.09161

Authors: Yulia Rubanova, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Peter Battaglia
Affiliations: DeepMind, London, UK
Abstract: In the rapidly advancing area of learned physical simulators, nearly all methods train forward models that directly predict future states from input states. However, many traditional simulation engines use a constraint-based approach instead of direct prediction. Here we present a framework for constraint-based learned simulation, where a scalar constraint function is implemented as a neural network, and future predictions are computed as the solutions to optimization problems under these learned constraints. We implement our method using a graph neural network as the constraint function and gradient descent as the constraint solver. The architecture can be trained by standard backpropagation. We test the model on a variety of challenging physical domains, including simulated ropes, bouncing balls, colliding irregular shapes and splashing fluids. Our model achieves better or comparable performance to top learned simulators. A key advantage of our model is the ability to generalize to more solver iterations at test time to improve the simulation accuracy. We also show how hand-designed constraints can be added at test time to satisfy objectives which were not present in the training data, which is not possible with forward approaches. Our constraint-based framework is applicable to any setting where forward learned simulators are used, and demonstrates how learned simulators can leverage additional inductive biases as well as the techniques from the field of numerical methods.
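The core mechanism, predicting the next state by minimizing a learned scalar constraint at inference time, can be sketched in a few lines. Here a plain MLP stands in for the paper's GNN constraint, and dimensions, step size, and iteration count are illustrative.

```python
import torch
import torch.nn as nn

# Learned scalar constraint c(s_t, s_{t+1}); an MLP stands in for the GNN.
constraint = nn.Sequential(nn.Linear(2 * 8, 64), nn.ReLU(), nn.Linear(64, 1))

def predict_next(state, n_iters=20, lr=0.1):
    nxt = state.clone().requires_grad_(True)     # initialize the solver at the current state
    for _ in range(n_iters):                     # more iterations -> higher accuracy at test time
        c = constraint(torch.cat([state, nxt], dim=-1)).pow(2).sum()
        (grad,) = torch.autograd.grad(c, nxt)
        nxt = (nxt - lr * grad).detach().requires_grad_(True)
    return nxt.detach()

state = torch.randn(8)
print(predict_next(state))
```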

GAN | Adversarial | Attacks | Generation (4 papers)

【1】 Information-theoretic stochastic contrastive conditional GAN: InfoSCC-GAN Link: https://arxiv.org/abs/2112.09653

Authors: Vitaliy Kinakh, Mariia Drozdova, Guillaume Quétant, Tobias Golling, Slava Voloshynovskiy
Affiliations: Department of Computer Science and Department of Particle Physics, University of Geneva, Switzerland
Abstract: Conditional generation is a subclass of generative problems where the output of the generation is conditioned by the attribute information. In this paper, we present a stochastic contrastive conditional generative adversarial network (InfoSCC-GAN) with an explorable latent space. The InfoSCC-GAN architecture is based on an unsupervised contrastive encoder built on the InfoNCE paradigm, an attribute classifier and an EigenGAN generator. We propose a novel training method, based on generator regularization using external or internal attributes every $n$-th iteration, using a pre-trained contrastive encoder and a pre-trained classifier. The proposed InfoSCC-GAN is derived based on an information-theoretic formulation of mutual information maximization between input data and latent space representation as well as latent space and generated data. Thus, we demonstrate a link between the training objective functions and the above information-theoretic formulation. The experimental results show that InfoSCC-GAN outperforms the "vanilla" EigenGAN in the image generation on AFHQ and CelebA datasets. In addition, we investigate the impact of discriminator architectures and loss functions by performing ablation studies. Finally, we demonstrate that thanks to the EigenGAN generator, the proposed framework enjoys a stochastic generation in contrast to vanilla deterministic GANs yet with the independent training of encoder, classifier, and generator in contrast to existing frameworks. Code, experimental results, and demos are available online at https://github.com/vkinakh/InfoSCC-GAN.
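A schematic of the proposed every-$n$-th-iteration generator regularization with frozen pre-trained modules; all component interfaces here (`generator.latent_dim`, `classifier.n_classes`, `gan_loss_fn`) are placeholders for illustration, not the released API.

```python
import torch
import torch.nn.functional as F

def train_generator(generator, encoder, classifier, gan_loss_fn, opt,
                    steps=1000, n_reg=5):
    """encoder and classifier are frozen, pre-trained modules."""
    for step in range(steps):
        z = torch.randn(16, generator.latent_dim)
        attrs = torch.randint(0, classifier.n_classes, (16,))
        fake = generator(z, attrs)
        loss = gan_loss_fn(fake)                    # usual adversarial objective
        if step % n_reg == 0:                       # every n-th iteration:
            logits = classifier(encoder(fake))      # attribute consistency regularizer
            loss = loss + F.cross_entropy(logits, attrs)
        opt.zero_grad(); loss.backward(); opt.step()
```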

【2】 Generation of data on discontinuous manifolds via continuous stochastic non-invertible networks Link: https://arxiv.org/abs/2112.09646

Authors: Mariia Drozdova, Vitaliy Kinakh, Guillaume Quétant, Tobias Golling, Slava Voloshynovskiy
Affiliations: Department of Computer Science and Department of Particle Physics, University of Geneva, Switzerland
Abstract: The generation of discontinuous distributions is a difficult task for most known frameworks such as generative autoencoders and generative adversarial networks. Generative non-invertible models are unable to accurately generate such distributions, require long training and often are subject to mode collapse. Variational autoencoders (VAEs), which are based on the idea of keeping the latent space to be Gaussian for the sake of a simple sampling, allow an accurate reconstruction, while they experience significant limitations at generation task. In this work, instead of trying to keep the latent space to be Gaussian, we use a pre-trained contrastive encoder to obtain a clustered latent space. Then, for each cluster, representing a unimodal submanifold, we train a dedicated low complexity network to generate this submanifold from the Gaussian distribution. The proposed framework is based on the information-theoretic formulation of mutual information maximization between the input data and latent space representation. We derive a link between the cost functions and the information-theoretic formulation. We apply our approach to synthetic 2D distributions to demonstrate both reconstruction and generation of discontinuous distributions using continuous stochastic networks.

【3】 A Binded VAE for Inorganic Material Generation Link: https://arxiv.org/abs/2112.09570

Authors: Fouad Oubari, Antoine de Mathelin, Rodrigue Décatoire, Mathilde Mougeot
Affiliations: Centre Borelli, UMR, ENS Paris Saclay; Michelin; ENSIIE
Note: NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications
Abstract: Designing new industrial materials with desired properties can be very expensive and time consuming. The main difficulty is to generate compounds that correspond to realistic materials. Indeed, the description of compounds as vectors of components' proportions is characterized by discrete features and a severe sparsity. Furthermore, traditional generative-model validation processes such as visual verification, FID and Inception scores are tailored for images and thus cannot be used as such in this context. To tackle these issues, we develop an original Binded-VAE model dedicated to the generation of discrete datasets with high sparsity. We validate the model with novel metrics adapted to the problem of compounds generation. We show on a real issue of rubber compound design that the proposed approach outperforms the standard generative models, which opens new perspectives for material design optimization.

【4】 Provable Adversarial Robustness in the Quantum Model Link: https://arxiv.org/abs/2112.09625

Authors: Khashayar Barooti, Grzegorz Głuch, Ruediger Urbanke
Affiliations: EPFL, Lausanne, Switzerland
Abstract: Modern machine learning systems have been applied successfully to a variety of tasks in recent years but making such systems robust against adversarially chosen modifications of input instances seems to be a much harder problem. It is probably fair to say that no fully satisfying solution has been found up to date and it is not clear if the standard formulation even allows for a principled solution. Hence, rather than following the classical path of bounded perturbations, we consider a model similar to the quantum PAC-learning model introduced by Bshouty and Jackson [1995]. Our first key contribution shows that in this model we can reduce adversarial robustness to the conjunction of two classical learning theory problems, namely (Problem 1) the problem of finding generative models and (Problem 2) the problem of devising classifiers that are robust with respect to distributional shifts. Our second key contribution is that the considered framework does not rely on specific (and hence also somewhat arbitrary) threat models like $\ell_p$ bounded perturbations. Instead, our reduction guarantees that in order to solve the adversarial robustness problem in our model it suffices to consider a single distance notion, i.e. the Hellinger distance. From the technical perspective our protocols are heavily based on the recent advances on delegation of quantum computation, e.g. Mahadev [2018]. Although the considered model is quantum and therefore not immediately applicable to "real-world" situations, one might hope that in the future either one can find a way to embed "real-world" problems into a quantum framework or that classical algorithms can be found that are capable of mimicking their powerful quantum counterparts.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (8 papers)

【1】 Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation Link: https://arxiv.org/abs/2112.09645

Authors: Krishna Chaitanya, Ertunc Erdil, Neerav Karani, Ender Konukoglu
Note: 13 pages, 4 figures, 7 tables. This article is under review at a journal
Abstract: Supervised deep learning-based methods yield accurate results for medical image segmentation. However, they require large labeled datasets for this, and obtaining them is a laborious task that requires clinical expertise. Semi/self-supervised learning-based approaches address this limitation by exploiting unlabeled data along with limited annotated data. Recent self-supervised learning methods use contrastive loss to learn good global level representations from unlabeled images and achieve high performance in classification tasks on popular natural image datasets like ImageNet. In pixel-level prediction tasks such as segmentation, it is crucial to also learn good local level representations along with global representations to achieve better accuracy. However, the impact of the existing local contrastive loss-based methods remains limited for learning good local representations because similar and dissimilar local regions are defined based on random augmentations and spatial proximity; not based on the semantic label of local regions due to lack of large-scale expert annotations in the semi/self-supervised setting. In this paper, we propose a local contrastive loss to learn good pixel level features useful for segmentation by exploiting semantic label information obtained from pseudo-labels of unlabeled images alongside limited annotated images. In particular, we define the proposed loss to encourage similar representations for the pixels that have the same pseudo-label/label while being dissimilar to the representation of pixels with different pseudo-label/label in the dataset. We perform pseudo-label based self-training and train the network by jointly optimizing the proposed contrastive loss on both labeled and unlabeled sets and segmentation loss on only the limited labeled set. We evaluate on three public cardiac and prostate datasets and obtain high segmentation performance.
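The proposed loss can be illustrated with a supervised-contrastive form over sampled pixel embeddings, where pixels sharing a (pseudo-)label are positives; the paper's exact sampling and weighting details may differ.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feats, labels, tau=0.1):
    """feats: (P, D) sampled pixel embeddings; labels: (P,) pseudo-labels or labels."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / tau                     # pairwise similarities
    sim.fill_diagonal_(-1e9)                          # exclude self-pairs from the softmax
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)  # positives share a (pseudo-)label
    pos.fill_diagonal_(False)
    logprob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    n_pos = pos.sum(1).clamp(min=1)
    return -(logprob * pos).sum(1).div(n_pos).mean()
```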

【2】 Watermarking Images in Self-Supervised Latent Spaces Link: https://arxiv.org/abs/2112.09581

Authors: Pierre Fernandez, Alexandre Sablayrolles, Teddy Furon, Hervé Jégou, Matthijs Douze
Affiliations: Facebook AI; Inria Rennes
Abstract: We revisit watermarking techniques based on pre-trained deep networks, in the light of self-supervised approaches. We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time. Our method can operate at any resolution and creates watermarks robust to a broad range of transformations (rotations, crops, JPEG, contrast, etc). It significantly outperforms the previous zero-bit methods, and its performance on multi-bit watermarking is on par with state-of-the-art encoder-decoder architectures trained end-to-end for watermarking. Our implementation and models will be made publicly available.

【3】 ActKnow: Active External Knowledge Infusion Learning for Question Answering in Low Data Regime Link: https://arxiv.org/abs/2112.09423

Authors: K. M. Annervaz, Pritam Kumar Nath, Ambedkar Dukkipati
Affiliations: Computer Science and Automation, Indian Institute of Science
Note: 14 pages, 5 figures, conference
Abstract: Deep learning models have set benchmark results in various Natural Language Processing tasks. However, these models require an enormous amount of training data, which is infeasible in many practical problems. While various techniques like domain adaptation and few-shot learning techniques address this problem, we introduce a new technique of actively infusing external knowledge into learning to solve low data regime problems. We propose a technique called ActKnow that actively infuses knowledge from Knowledge Graphs (KG) based "on-demand" into learning for Question Answering (QA). By infusing world knowledge from Concept-Net, we show significant improvements on the ARC Challenge-set benchmark over purely text-based transformer models like RoBERTa in the low data regime. For example, by using only 20% training examples, we demonstrate a 4% accuracy improvement on both ARC-Challenge and OpenBookQA.

【4】 Semi-Supervised Clustering via Markov Chain Aggregation Link: https://arxiv.org/abs/2112.09397

Authors: Sophie Steger, Bernhard C. Geiger, Marek Smieja
Affiliations: Jagiellonian University, Poland
Note: 13 pages, 6 figures; this is an extended version of a short paper accepted at ACM SAC 2022
Abstract: We connect the problem of semi-supervised clustering to constrained Markov aggregation, i.e., the task of partitioning the state space of a Markov chain. We achieve this connection by considering every data point in the dataset as an element of the Markov chain's state space, by defining the transition probabilities between states via similarities between corresponding data points, and by incorporating semi-supervision information as hard constraints in a Hartigan-style algorithm. The introduced Constrained Markov Clustering (CoMaC) is an extension of a recent information-theoretic framework for (unsupervised) Markov aggregation to the semi-supervised case. Instantiating CoMaC for certain parameter settings further generalizes two previous information-theoretic objectives for unsupervised clustering. Our results indicate that CoMaC is competitive with the state-of-the-art. Furthermore, our approach is less sensitive to hyperparameter settings than the unsupervised counterpart, which is especially attractive in the semi-supervised setting characterized by little labeled data.
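The construction step, turning data points into Markov-chain states with similarity-derived transition probabilities, might look as follows (a Gaussian kernel is assumed here; the constrained aggregation step itself is omitted).

```python
import numpy as np

def similarity_transition_matrix(X, sigma=1.0):
    """X: (n, d) data points -> row-stochastic transition matrix P over states."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    S = np.exp(-d2 / (2 * sigma ** 2))                    # Gaussian similarities
    np.fill_diagonal(S, 0.0)                              # no self-transitions
    return S / S.sum(axis=1, keepdims=True)
```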

【5】 Expedition: A System for the Unsupervised Learning of a Hierarchy of Concepts Link: https://arxiv.org/abs/2112.09348

Authors: Omid Madani
Affiliations: Cisco Secure Workload, Palo Alto, CA
Abstract: We present a system for bottom-up cumulative learning of myriad concepts corresponding to meaningful character strings, and their part-related and prediction edges. The learning is self-supervised in that the concepts discovered are used as predictors as well as targets of prediction. We devise an objective for segmenting with the learned concepts, derived from comparing to a baseline prediction system, that promotes making and using larger concepts, which in turn allows for predicting larger spans of text, and we describe a simple technique to promote exploration, i.e. trying out newly generated concepts in the segmentation process. We motivate and explain a layering of the concepts, to help separate the (conditional) distributions learnt among concepts. The layering of the concepts roughly corresponds to a part-whole concept hierarchy. With rudimentary segmentation and learning algorithms, the system is promising in that it acquires many concepts (tens of thousands in our small-scale experiments), and it learns to segment text well: when fed with English text with spaces removed, starting at the character level, much of what is learned respects word or phrase boundaries, and over time the average number of "bad" splits within segmentations, i.e. splits inside words, decreases as larger concepts are discovered and the system learns when to use them during segmentation. We report on promising experiments when the input text is converted to binary and the system begins with only two concepts, "0" and "1". The system is transparent, in the sense that it is easy to tell what the concepts learned correspond to, and which ones are active in a segmentation, or how the system "sees" its input. We expect this framework to be extensible and we discuss the current limitations and a number of directions for enhancing the learning and inference capabilities.

【6】 Benchmarking Uncertainty Qualification on Biosignal Classification Tasks under Dataset Shift Link: https://arxiv.org/abs/2112.09196

Authors: Tong Xia, Jing Han, Cecilia Mascolo
Affiliations: Department of Computer Science and Technology, University of Cambridge, UK
Note: Accepted by the 6th International Workshop on Health Intelligence (W3PHIAI-22)
Abstract: A biosignal is a signal that can be continuously measured from human bodies, such as respiratory sounds, heart activity (ECG), brain waves (EEG), etc, based on which, machine learning models have been developed with very promising performance for automatic disease detection and health status monitoring. However, dataset shift, i.e., data distribution of inference varies from the distribution of the training, is not uncommon for real biosignal-based applications. To improve the robustness, probabilistic models with uncertainty qualification are adapted to capture how reliable a prediction is. Yet, assessing the quality of the estimated uncertainty remains a challenge. In this work, we propose a framework to evaluate the capability of the estimated uncertainty in capturing different types of biosignal dataset shifts with various degrees. In particular, we use three classification tasks based on respiratory sounds and electrocardiography signals to benchmark five representative uncertainty qualification methods. Extensive experiments show that, although Ensemble and Bayesian models could provide relatively better uncertainty estimations under dataset shifts, all tested models fail to meet the promise in trustworthy prediction and model calibration. Our work paves the way for a comprehensive evaluation for any newly developed biosignal classifiers.
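One of the benchmarked families, deep ensembles, commonly scores uncertainty via the predictive entropy of the averaged class probabilities; a minimal sketch of that recipe (models and inputs are placeholders, and this is not necessarily the paper's exact implementation):

```python
import torch

def ensemble_uncertainty(models, x):
    """models: list of trained classifiers; x: a batch of biosignal features."""
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models]).mean(0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)   # predictive entropy
    return probs, entropy   # track entropy as the degree of dataset shift grows
```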

【7】 High Fidelity Visualization of What Your Self-Supervised Representation Knows About Link: https://arxiv.org/abs/2112.09164

Authors: Florian Bordes, Randall Balestriero, Pascal Vincent
Affiliations: Meta AI; Mila, Université de Montréal; Canada CIFAR AI Chair
Abstract: Discovering what is learned by neural networks remains a challenge. In self-supervised learning, classification is the most common task used to evaluate how good a representation is. However, relying only on such downstream task can limit our understanding of how much information is retained in the representation of a given input. In this work, we showcase the use of a conditional diffusion based generative model (RCDM) to visualize representations learned with self-supervised models. We further demonstrate how this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning. By using this new tool to analyze self-supervised models, we can show visually that i) SSL (backbone) representations are not really invariant to many data augmentations they were trained on, ii) SSL projector embeddings appear too invariant for tasks like classification, iii) SSL representations are more robust to small adversarial perturbations of their inputs, and iv) there is an inherent structure learned with the SSL model that can be used for image manipulation.

【8】 An overview of active learning methods for insurance with fairness appreciation Link: https://arxiv.org/abs/2112.09466

Authors: Romuald Elie, Caroline Hillairet, François Hu, Marc Juillard
Affiliations: Université Gustave Eiffel; ENSAE-CREST; Société Générale Insurance
Abstract: This paper addresses and solves some challenges in the adoption of machine learning in insurance with the democratization of model deployment. The first challenge is reducing the labelling effort (hence focusing on the data quality) with the help of active learning, a feedback loop between the model inference and an oracle: as in insurance the unlabeled data is usually abundant, active learning can become a significant asset in reducing the labelling cost. For that purpose, this paper sketches out various classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is the fairness issue in model inferences. We will introduce and integrate a post-processing fairness for multi-class tasks in this active learning framework to solve these two issues. Finally, numerical experiments on unfair datasets highlight that the proposed setup presents a good compromise between model precision and fairness.
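A minimal pool-based uncertainty-sampling loop of the kind such an overview covers (least-confidence acquisition with a logistic-regression learner; all choices below are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(X_pool, y_pool, n_init=20, n_rounds=10, batch=10):
    """Simulated oracle: y_pool holds the labels revealed on query."""
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X_pool), n_init, replace=False))
    for _ in range(n_rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        conf = clf.predict_proba(X_pool[unlabeled]).max(axis=1)
        labeled += list(unlabeled[np.argsort(conf)[:batch]])   # query least-confident points
    return clf, labeled
```

A fairness-aware variant would post-process the final classifier's multi-class decisions, as the paper proposes.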

Transfer | Zero/Few/One-Shot | Adaptation (3 papers)

【1】 Towards fuzzification of adaptation rules in self-adaptive architectures Link: https://arxiv.org/abs/2112.09468

Authors: Tomáš Bureš, Petr Hnětynka, Martin Kruliš, Danylo Khalyeyev, Sebastian Hahner, Stephan Seifermann, Maximilian Walter, Robert Heinrich
Affiliations: Charles University, Prague, Czech Republic; Karlsruhe Institute of Technology (KIT), Germany
Abstract: In this paper, we focus on exploiting neural networks for the analysis and planning stage in self-adaptive architectures. The studied motivating cases in the paper involve existing (legacy) self-adaptive architectures and their adaptation logic, which has been specified by logical rules. We further assume that there is a need to endow these systems with the ability to learn based on examples of inputs and expected outputs. One simple option to address such a need is to replace the reasoning based on logical rules with a neural network. However, this step brings several problems that often create at least a temporary regress. The reason is the logical rules typically represent a large and tested body of domain knowledge, which may be lost if the logical rules are replaced by a neural network. Further, the black-box nature of generic neural networks obfuscates how the systems work inside and consequently introduces more uncertainty. In this paper, we present a method that makes it possible to endow existing self-adaptive architectures with the ability to learn using neural networks, while preserving domain knowledge existing in the logical rules. We introduce a continuum between the existing rule-based system and a system based on a generic neural network. We show how to navigate in this continuum and create a neural network architecture that naturally embeds the original logical rules and how to gradually scale the learning potential of the network, thus controlling the uncertainty inherent to all soft computing models. We showcase and evaluate the approach on representative excerpts from two larger real-life use cases.

【2】 Confidence-Aware Subject-to-Subject Transfer Learning for Brain-Computer Interface Link: https://arxiv.org/abs/2112.09243

Authors: Dong-Kyun Han, Serkan Musellim, Dong-Young Kim
Affiliations: Dept. Brain and Cognitive Engineering and Dept. Artificial Intelligence, Korea University, Seoul, Republic of Korea
Note: Submitted to the 2022 10th IEEE International Winter Conference on Brain-Computer Interface
Abstract: The inter/intra-subject variability of electroencephalography (EEG) makes the practical use of the brain-computer interface (BCI) difficult. In general, the BCI system requires a calibration procedure to tune the model every time the system is used. This problem is recognized as a major obstacle to BCI, and to overcome it, approaches based on transfer learning (TL) have recently emerged. However, many BCI paradigms are limited in that they consist of a structure that shows labels first and then measures "imagery"; the negative effects of source subjects containing data without control signals have been ignored in many cases of the subject-to-subject TL process. The main purpose of this paper is to propose a method of excluding subjects that are expected to have a negative impact on subject-to-subject TL training, which generally uses data from as many subjects as possible. In this paper, we propose a BCI framework using only high-confidence subjects for TL training. In our framework, a deep neural network selects useful subjects for the TL process and excludes noisy subjects, using a co-teaching algorithm based on the small-loss trick. We experimented with leave-one-subject-out validation on two public datasets (2020 international BCI competition track 4 and the OpenBMI dataset). Our experimental results showed that confidence-aware TL, which selects subjects with small loss instances, improves the generalization performance of BCI.
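The small-loss co-teaching step at the heart of the selection mechanism can be sketched as follows; the BCI-specific subject-level selection is simplified here to sample-level selection.

```python
import torch
import torch.nn.functional as F

def coteaching_step(net1, net2, opt1, opt2, x, y, keep_ratio=0.8):
    """Each network keeps its lowest-loss samples and passes them to its peer."""
    n_keep = int(keep_ratio * len(y))
    loss1 = F.cross_entropy(net1(x), y, reduction="none")
    loss2 = F.cross_entropy(net2(x), y, reduction="none")
    idx1 = torch.argsort(loss1)[:n_keep]    # samples net1 trusts -> used to train net2
    idx2 = torch.argsort(loss2)[:n_keep]    # samples net2 trusts -> used to train net1
    opt1.zero_grad(); F.cross_entropy(net1(x[idx2]), y[idx2]).backward(); opt1.step()
    opt2.zero_grad(); F.cross_entropy(net2(x[idx1]), y[idx1]).backward(); opt2.step()
```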

【3】 Predicting Shallow Water Dynamics using Echo-State Networks with Transfer Learning Link: https://arxiv.org/abs/2112.09182

Authors: Xiaoqian Chen, Balasubramanya T. Nadiga, Ilya Timofeyev
Abstract: In this paper we demonstrate that reservoir computing can be used to learn the dynamics of the shallow-water equations. In particular, while most previous applications of reservoir computing have required training on a particular trajectory to further predict the evolution along that trajectory alone, we show the capability of reservoir computing to predict trajectories of the shallow-water equations with initial conditions not seen in the training process. However, in this setting, we find that the performance of the network deteriorates for initial conditions with ambient conditions (such as total water height and average velocity) that are different from those in the training dataset. To circumvent this deficiency, we introduce a transfer learning approach wherein a small additional training step with the relevant ambient conditions is used to improve the predictions.
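A compact echo-state-network sketch: a fixed random reservoir with a ridge-regression readout. In this reduction, "transfer" would amount to refitting the readout on a small amount of data from the new ambient conditions; all hyperparameters are illustrative.

```python
import numpy as np

class ESN:
    def __init__(self, n_in, n_res=500, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((n_res, n_res))
        self.W = rho * W / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
        self.W_in = rng.standard_normal((n_res, n_in)) * 0.1
        self.W_out = None

    def run(self, U):                        # U: (T, n_in) -> reservoir states (T, n_res)
        x, X = np.zeros(self.W.shape[0]), []
        for u in U:
            x = np.tanh(self.W @ x + self.W_in @ u)
            X.append(x.copy())
        return np.array(X)

    def fit_readout(self, U, Y, reg=1e-6):   # ridge regression on reservoir states
        X = self.run(U)
        self.W_out = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y).T
```

Calling `fit_readout` again on a short trajectory from the new regime plays the role of the small additional training step.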

Reinforcement Learning (2 papers)

【1】 Autonomous Reinforcement Learning: Formalism and Benchmarking Link: https://arxiv.org/abs/2112.09605

Authors: Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn
Affiliations: Stanford University; University of California, Berkeley; MIT; Google Brain
Abstract: Reinforcement learning (RL) provides a naturalistic framing for learning through trial and error, which is appealing both because of its simplicity and effectiveness and because of its resemblance to how humans and animals acquire skills through experience. However, real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world, whereas common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts. This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms, such as robots. In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials. We introduce a simulated benchmark EARL around this framework, containing a set of diverse and challenging simulated tasks reflective of the hurdles introduced to learning when only a minimal reliance on extrinsic intervention can be assumed. We show that standard approaches to episodic RL and existing approaches struggle as interventions are minimized, underscoring the need for developing new algorithms for reinforcement learning with a greater focus on autonomy.

【2】 Learning Reward Machines: A Study in Partially Observable Reinforcement Learning Link: https://arxiv.org/abs/2112.09477

Authors: Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith
Affiliations: Pontificia Universidad Católica de Chile, Vicuña Mackenna, Macul, RM, Chile; University of Toronto, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada; Ryerson University, Toronto, ON, Canada
Abstract: Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment -- where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.

Medical (2 papers)

【1】 CPPE-5: Medical Personal Protective Equipment Dataset Link: https://arxiv.org/abs/2112.09569

Authors: Rishit Dagli, Ali Mustufa Shaikh
Affiliations: High School, Narayana Junior College, Mumbai, India; Student Community Lead, Postman Inc
Note: 16 pages, 6 tables, 6 figures. Code and models are available at https://git.io/cppe5-dataset
Abstract: We present a new challenging dataset, CPPE - 5 (Medical Personal Protective Equipment), with the goal to allow the study of subordinate categorization of medical personal protective equipment, which is not possible with other popular data sets that focus on broad level categories (such as PASCAL VOC, ImageNet, Microsoft COCO, OpenImages, etc). To make it easy for models trained on this dataset to be used in practical scenarios in complex scenes, our dataset mainly contains images that show complex scenes with several objects in each scene in their natural context. The image collection for this dataset focuses on obtaining as many non-iconic images as possible and making sure all the images are real-life images, unlike other existing datasets in this area. Our dataset includes 5 object categories (coveralls, face shield, gloves, mask, and goggles) and each image is annotated with a set of bounding boxes and positive labels. We present a detailed analysis of the dataset in comparison to other popular broad category datasets as well as datasets focusing on personal protective equipment; we also find that at present there exist no such publicly available datasets. Finally, we also analyze performance and compare model complexities on baseline and state-of-the-art models for bounding box results. Our code, data, and trained models are available at https://git.io/cppe5-dataset .

【2】 Towards Launching AI Algorithms for Cellular Pathology into Clinical & Pharmaceutical Orbits Link: https://arxiv.org/abs/2112.09496

Authors: Amina Asif, Kashif Rajpoot, David Snead, Fayyaz Minhas, Nasir Rajpoot
Affiliations: Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, UK; Department of Computer Science, University of Birmingham, UK; Department of Pathology, University Hospitals Coventry & Warwickshire, UK; The Alan Turing Institute, UK
Abstract: Computational Pathology (CPath) is an emerging field concerned with the study of tissue pathology via computational algorithms for the processing and analysis of digitized high-resolution images of tissue slides. Recent deep learning based developments in CPath have successfully leveraged sheer volume of raw pixel data in histology images for predicting target parameters in the domains of diagnostics, prognostics, treatment sensitivity and patient stratification -- heralding the promise of a new data-driven AI era for both histopathology and oncology. With data serving as the fuel and AI as the engine, CPath algorithms are poised to be ready for takeoff and eventual launch into clinical and pharmaceutical orbits. In this paper, we discuss CPath limitations and associated challenges to enable readers to distinguish hope from hype, and provide directions for future research to overcome some of the major challenges faced by this budding field to enable its launch into the two orbits.

Distillation | Knowledge Extraction (3 papers)

【1】 Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report) Link: https://arxiv.org/abs/2112.09655

Authors: Florent Delgrange, Ann Nowé, Guillermo A. Pérez
Affiliations: AI Lab, Vrije Universiteit Brussel; University of Antwerp – Flanders Make
Note: Accepted at AAAI 2022; technical report including supplementary material (10 pages main text, 14 pages appendix)
Abstract: We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.

【2】 Pixel Distillation: A New Knowledge Distillation Scheme for Low-Resolution Image Recognition Link: https://arxiv.org/abs/2112.09532

Authors: Guangyu Guo, Longfei Han, Junwei Han, Dingwen Zhang
Affiliations: The Brain and Artificial Intelligence Laboratory, Northwestern Polytechnical University, Xi'an, China; Beijing Technology and Business University, Beijing, China
Abstract: The great success of deep learning is mainly due to the large-scale network architecture and the high-quality training data. However, it is still challenging to deploy recent deep models on portable devices with limited memory and imaging ability. Some existing works have engaged to compress the model via knowledge distillation. Unfortunately, these methods cannot deal with images with reduced image quality, such as the low-resolution (LR) images. To this end, we make a pioneering effort to distill helpful knowledge from a heavy network model learned from high-resolution (HR) images to a compact network model that will handle LR images, thus advancing the current knowledge distillation technique with the novel pixel distillation. To achieve this goal, we propose a Teacher-Assistant-Student (TAS) framework, which disentangles knowledge distillation into the model compression stage and the high resolution representation transfer stage. By equipping a novel Feature Super Resolution (FSR) module, our approach can learn a lightweight network model that can achieve similar accuracy as the heavy teacher model but with much fewer parameters, faster inference speed, and lower-resolution inputs. Comprehensive experiments on three widely-used benchmarks, i.e., CUB-200-2011, PASCAL VOC 2007, and ImageNetSub, demonstrate the effectiveness of our approach.

【3】 Feature extraction and classification algorithm, which one is more essential? An experimental study on a specific task of vibration signal diagnosis Link: https://arxiv.org/abs/2112.09389

Authors: Qiang Liu, Jiade Zhang, Jingna Liu, Zhi Yang
Affiliations: Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, Hebei, China; College of Mathematics and Information Science, Hebei University, Baoding, Hebei, China
Abstract: With the development of machine learning, data-driven models have been widely used in vibration signal fault diagnosis. Most data-driven machine learning algorithms are built based on well-designed features, but feature extraction is usually required to be completed in advance. In the deep learning era, feature extraction and classifier learning are conducted simultaneously, which will lead to an end-to-end learning system. This paper explores which one of the two key factors, i.e., feature extraction and classification algorithm, is more essential for the specific task of vibration signal diagnosis while a learning system is generated. Feature extractions from vibration signals based on both the well-known Gaussian model and statistical characteristics are discussed, respectively. Several classification algorithms are then selected to experimentally validate the comparative impact of both feature extraction and classification algorithms on prediction performance.
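A sketch of the kind of protocol the paper studies: hand-crafted statistical features from windowed vibration signals fed to interchangeable off-the-shelf classifiers. The feature set and models below are illustrative, not the paper's exact choices.

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def statistical_features(window):
    """Common time-domain statistics of one vibration-signal window."""
    return np.array([window.mean(), window.std(), skew(window),
                     kurtosis(window), np.sqrt((window ** 2).mean()),  # RMS
                     window.max() - window.min()])                     # peak-to-peak

def compare(X_windows, y, X_test, y_test):
    F_tr = np.array([statistical_features(w) for w in X_windows])
    F_te = np.array([statistical_features(w) for w in X_test])
    for clf in (SVC(), RandomForestClassifier()):       # swap classifiers, fix features
        print(type(clf).__name__, clf.fit(F_tr, y).score(F_te, y_test))
```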

Super-Resolution | Denoising | Deblurring | Dehazing (1 paper)

【1】 Super-resolution reconstruction of cytoskeleton image based on A-net deep learning network Link: https://arxiv.org/abs/2112.09574

Authors: Qian Chen, Haoxin Bai, Bingchen Che, Tianyun Zhao, Ce Zhang, Kaige Wang, Jintao Bai, Wei Zhao
Affiliations: School of Automation, Northwestern Polytechnical University, Xi'an, China; State Key Laboratory of Photon-Technology in Western China Energy, International Collaborative Center on Photoelectric Technology and Nano Functional Materials
Note: The manuscript has 17 pages, 10 figures and 58 references
Abstract: To date, live-cell imaging at the nanometer scale remains challenging. Even though super-resolution microscopy methods have enabled visualization of subcellular structures below the optical resolution limit, the spatial resolution is still far from enough for the structural reconstruction of biomolecules in vivo (i.e. ~24 nm thickness of microtubule fiber). In this study, we proposed an A-net network and showed that the resolution of cytoskeleton images captured by a confocal microscope can be significantly improved by combining the A-net deep learning network with the DWDC algorithm based on degradation model. Utilizing the DWDC algorithm to construct new datasets and taking advantage of A-net neural network's features (i.e., considerably fewer layers), we successfully removed the noise and flocculent structures, which originally interfere with the cellular structure in the raw image, and improved the spatial resolution by 10 times using relatively small dataset. We, therefore, conclude that the proposed algorithm that combines A-net neural network with the DWDC method is a suitable and universal approach for extracting structural details of biomolecules, cells and organs from low-resolution images.

联邦学习|隐私保护|加密(1篇)

【1】 Federated Learning with Heterogeneous Data: A Superquantile Optimization Approach 标题:异构数据联合学习:一种超分位数优化方法 链接:https://arxiv.org/abs/2112.09429

作者:Krishna Pillutla,Yassine Laguel,Jérôme Malick,Zaid Harchaoui 机构:University of Washington, Seattle, WA, USA, Univ. Grenoble Alpes, Grenoble, France, CNRS, Grenoble, France 备注:This is the longer version of a conference paper published in IEEE CISS 2021 摘要:我们提出了一个联邦学习框架，该框架旨在跨异构数据的各个客户端提供良好的预测性能。所提出的方法依赖于基于超分位数的学习目标，该目标捕获异构客户机上错误分布的尾部统计信息。我们提出了一种随机训练算法，该算法将差分隐私的客户端重新加权步骤与联邦平均步骤交织在一起。该算法具有有限时间收敛性保证，同时覆盖凸和非凸设置。在联邦学习的基准数据集上的实验结果表明，我们的方法在平均误差方面与经典方法具有竞争力，并且在误差的尾部统计方面优于经典方法。 摘要:We present a federated learning framework that is designed to robustly deliver good predictive performance across individual clients with heterogeneous data. The proposed approach hinges upon a superquantile-based learning objective that captures the tail statistics of the error distribution over heterogeneous clients. We present a stochastic training algorithm which interleaves differentially private client reweighting steps with federated averaging steps. The proposed algorithm is supported with finite time convergence guarantees that cover both convex and non-convex settings. Experimental results on benchmark datasets for federated learning demonstrate that our approach is competitive with classical ones in terms of average error and outperforms them in terms of tail statistics of the error.
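超分位数（superquantile，亦即CVaR）目标与按尾部重新加权的联邦平均可以用如下Python草图示意。其中的阈值水平与权重方案均为假设，摘要中提到的差分隐私步骤此处从略。

```python
import numpy as np

def superquantile(losses, theta=0.8):
    # theta 水平的超分位数：损失分布最差 (1 - theta) 尾部的平均值
    q = np.quantile(losses, theta)
    return losses[losses >= q].mean()

def tail_weights(losses, theta=0.8):
    # 按是否落入尾部给客户端分配聚合权重（示意；原文采用更平滑的再加权）
    q = np.quantile(losses, theta)
    w = (losses >= q).astype(float)
    return w / w.sum()

client_losses = np.array([0.2, 0.5, 1.3, 0.4, 2.1])  # 各客户端的经验损失（占位）
print("超分位数目标:", superquantile(client_losses))
print("聚合权重:", tail_weights(client_losses))  # 联邦平均时按该权重加权客户端更新
```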

推理|分析|理解|解释(2篇)

【1】 Communication-oriented Model Fine-tuning for Packet-loss Resilient Distributed Inference under Highly Lossy IoT Networks 标题:高丢失物联网环境下面向通信的抗丢包分布式推理模型微调 链接:https://arxiv.org/abs/2112.09407

作者:Sohei Itahara,Takayuki Nishio,Yusuke Koda,Koji Yamamoto 机构:(Senior Member, IEEE), This work was supported in part by JST PRESTO Grant Number JPMJPR. 备注:Submitted to IEEE Access 摘要:分布式推理（DI）框架作为一种在资源受限物联网（IoT）设备上通过尖端的深度机器学习（ML）实现的实时应用技术，已经获得了广泛的关注。在DI中，计算任务通过有损物联网网络从物联网设备卸载到边缘服务器。然而，一般来说，在通信延迟和可靠性之间存在通信系统级的权衡；因此，为了提供准确的DI结果，需要采用可靠且高延迟的通信系统，这导致DI的端到端延迟不可忽略。这促使我们通过对ML技术的研究来改善通信延迟和准确性之间的平衡。具体而言，我们提出了一种面向通信的模型调整（COMtune），其目的是实现具有低延迟但不可靠通信链路的高精度DI。在COMtune中，关键思想是通过应用dropout技术模拟不可靠通信链路的影响来微调ML模型。这使得DI系统能够获得针对不可靠通信链路的鲁棒性。我们的ML实验表明，COMtune能够在低延迟和有损网络下进行准确预测。 摘要:The distributed inference (DI) framework has gained traction as a technique for real-time applications empowered by cutting-edge deep machine learning (ML) on resource-constrained Internet of things (IoT) devices. In DI, computational tasks are offloaded from the IoT device to the edge server via lossy IoT networks. However, generally, there is a communication system-level trade-off between communication latency and reliability; thus, to provide accurate DI results, a reliable and high-latency communication system is required to be adapted, which results in non-negligible end-to-end latency of the DI. This motivated us to improve the trade-off between the communication latency and accuracy by efforts on ML techniques. Specifically, we have proposed a communication-oriented model tuning (COMtune), which aims to achieve highly accurate DI with low-latency but unreliable communication links. In COMtune, the key idea is to fine-tune the ML model by emulating the effect of unreliable communication links through the application of the dropout technique. This enables the DI system to obtain robustness against unreliable communication links. Our ML experiments revealed that COMtune enables accurate predictions with low latency and under lossy networks.
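COMtune的关键思想（用dropout模拟不可靠链路）可以用下面的PyTorch草图示意：在设备端与服务器端模型的切分点上，以丢包率p随机置零中间特征后再进行微调。网络结构、切分位置与丢包率均为假设，并非论文的具体实现。

```python
import torch
import torch.nn as nn

class DistributedInference(nn.Module):
    def __init__(self, p_loss=0.3):
        super().__init__()
        self.device_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # 设备端子网
        self.server_part = nn.Sequential(nn.Linear(64, 10))             # 边缘服务器子网
        # 训练时用 dropout 模拟链路丢包；推理时可按真实丢包情况置零特征
        self.channel = nn.Dropout(p=p_loss)

    def forward(self, x):
        feat = self.device_part(x)     # 设备端计算
        feat = self.channel(feat)      # 经过有损链路：随机丢弃部分特征
        return self.server_part(feat)  # 服务器端完成推理

model = DistributedInference()
out = model(torch.randn(8, 32))  # 微调时照常反向传播即可获得对丢包的鲁棒性
print(out.shape)
```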

【2】 Marginalization in Bayesian Networks: Integrating Exact and Approximate Inference 标题:贝叶斯网络中的边际化:精确推理与近似推理的结合 链接:https://arxiv.org/abs/2112.09217

作者:Fritz M. Bayer,Giusi Moffa,Niko Beerenwinkel,Jack Kuipers 机构:Giusi Moffa, University of Basel; Niko Beerenwinkel, ETH Zurich; Jack Kuipers, ETH Zurich 摘要:贝叶斯网络是一种概率图形模型，可以简洁地表示随机变量之间的依赖关系。缺失数据和隐藏变量需要计算变量子集的边际概率分布。虽然边际概率分布的知识对于统计和机器学习中的各种问题至关重要，但由于这项任务的NP难度，它的精确计算对于分类变量通常是不可行的。我们开发了一种分而治之的方法，利用贝叶斯网络的图形特性将边际概率分布的计算划分为低维的子计算，从而降低了总体计算复杂度。利用这一性质，我们提出了一种有效且可扩展的分类变量边际概率分布估计算法。在基准研究中，将新方法与最先进的近似推理方法进行了比较，在基准研究中，新方法表现出了优异的性能。作为一个直接的应用，我们演示了如何使用边际概率分布根据贝叶斯网络对不完整数据进行分类，并使用此方法识别肾癌患者样本的癌症亚型。 摘要:Bayesian Networks are probabilistic graphical models that can compactly represent dependencies among random variables. Missing data and hidden variables require calculating the marginal probability distribution of a subset of the variables. While knowledge of the marginal probability distribution is crucial for various problems in statistics and machine learning, its exact computation is generally not feasible for categorical variables due to the NP-hardness of this task. We develop a divide-and-conquer approach using the graphical properties of Bayesian networks to split the computation of the marginal probability distribution into sub-calculations of lower dimensionality, reducing the overall computational complexity. Exploiting this property, we present an efficient and scalable algorithm for estimating the marginal probability distribution for categorical variables. The novel method is compared against state-of-the-art approximate inference methods in a benchmarking study, where it displays superior performance. As an immediate application, we demonstrate how the marginal probability distribution can be used to classify incomplete data against Bayesian networks and use this approach for identifying the cancer subtype of kidney cancer patient samples.

检测相关(2篇)

【1】 A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models 标题:LSTM和TCN模型在时序数据异常检测中的比较研究 链接:https://arxiv.org/abs/2112.09293

作者:Saroj Gopali,Faranak Abri,Sima Siami-Namini,Akbar Siami Namin 机构:Department of Computer Science, School of Planning and Public Policy, Texas Tech University, Rutgers University 备注:15 pages, 3 figures, IEEE BigData 2021 摘要:存在几种数据驱动方法，使我们能够对时间序列数据进行建模，包括传统的基于回归的建模方法（即ARIMA）。最近，在时间序列分析和预测的背景下引入和探索了深度学习技术。一个主要的研究问题是这些不同的深度学习技术在预测时间序列数据方面的表现。本文比较了两种重要的深度学习建模技术。比较了基于递归神经网络（RNN）的长短时记忆（LSTM）和基于卷积神经网络（CNN）的时间卷积网络（TCN），并报告了它们的性能和训练时间。根据我们的实验结果，这两种建模技术表现相当，基于TCN的模型略优于LSTM。此外，基于CNN的TCN模型比基于RNN的LSTM模型更快地建立稳定的模型。 摘要:There exist several data-driven approaches that enable us to model time series data, including traditional regression-based modeling approaches (i.e., ARIMA). Recently, deep learning techniques have been introduced and explored in the context of time series analysis and prediction. A major research question is how these many variations of deep learning techniques perform in predicting time series data. This paper compares two prominent deep learning modeling techniques. The Recurrent Neural Network (RNN)-based Long Short-Term Memory (LSTM) and the Convolutional Neural Network (CNN)-based Temporal Convolutional Networks (TCN) are compared and their performance and training time are reported. According to our experimental results, both modeling techniques perform comparably, with TCN-based models slightly outperforming LSTM. Moreover, the CNN-based TCN model builds a stable model faster than the RNN-based LSTM models.
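作为两类模型如何消费同一时间序列输入的示意，下面用PyTorch搭建最小化的LSTM与基于膨胀因果卷积的TCN骨架。层数与超参数均为假设，仅说明结构差异，不代表论文的实验配置。

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):                 # x: (batch, time, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # 用最后时刻的隐状态做一步预测

class TinyTCN(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # 膨胀因果卷积：只在左侧补零，保证不"偷看"未来
        self.conv1 = nn.Conv1d(1, ch, kernel_size=3, dilation=1)
        self.conv2 = nn.Conv1d(ch, ch, kernel_size=3, dilation=2)
        self.head = nn.Linear(ch, 1)
    def forward(self, x):                 # x: (batch, time, 1)
        h = x.transpose(1, 2)
        h = torch.relu(self.conv1(nn.functional.pad(h, (2, 0))))
        h = torch.relu(self.conv2(nn.functional.pad(h, (4, 0))))
        return self.head(h[:, :, -1])

x = torch.randn(16, 100, 1)
print(LSTMForecaster()(x).shape, TinyTCN()(x).shape)
```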

【2】 ALEBk: Feasibility Study of Attention Level Estimation via Blink Detection applied to e-Learning 标题:ALEBk:基于眨眼检测的注意力水平估计应用于e-Learning的可行性研究 链接:https://arxiv.org/abs/2112.09165

作者:Roberto Daza,Daniel DeAlcala,Aythami Morales,Ruben Tolosana,Ruth Cobos,Julian Fierrez 机构:School of Engineering, Autonomous University of Madrid 备注:Preprint of the paper presented to the Workshop on Artificial Intelligence for Education (AI4EDU) of AAAI 2022 摘要:本文提出了一种基于眨眼频率的远程注意水平估计的可行性研究。我们首先提出了一种基于卷积神经网络（CNNs）的眨眼检测系统，该系统在相关工作中具有很强的竞争力。使用该检测器，我们通过实验评估了眨眼率与在线课程中学生注意力水平之间的关系。该实验框架是使用公共多模式数据库进行的，该数据库用于眨眼检测和注意力水平估计，称为mEBAL，该数据库包含来自38名学生和多个采集传感器的数据，特别是，i)提供来自学生认知信息的时间信号的脑电图（EEG）波段，以及ii)RGB和NIR摄像头捕捉学生的面部姿势。研究结果表明，眨眼频率与注意力水平呈负相关。在我们提出的称为ALEBk的方法中使用了这种关系，该方法将注意力水平估计为眨眼频率的倒数。我们的研究结果开辟了一条新的研究路线，将该技术引入未来电子学习平台的注意力水平估计，以及基于人脸分析的行为生物特征识别的其他应用。 摘要:This work presents a feasibility study of remote attention level estimation based on eye blink frequency. We first propose an eye blink detection system based on Convolutional Neural Networks (CNNs), very competitive with respect to related works. Using this detector, we experimentally evaluate the relationship between the eye blink rate and the attention level of students captured during online sessions. The experimental framework is carried out using a public multimodal database for eye blink detection and attention level estimation called mEBAL, which comprises data from 38 students and multiple acquisition sensors, in particular, i) an electroencephalogram (EEG) band which provides the time signals coming from the student's cognitive information, and ii) RGB and NIR cameras to capture the students' face gestures. The results achieved suggest an inverse correlation between the eye blink frequency and the attention level. This relation is used in our proposed method called ALEBk for estimating the attention level as the inverse of the eye blink frequency. Our results open a new research line to introduce this technology for attention level estimation on future e-learning platforms, among other applications of this kind of behavioral biometrics based on face analysis.
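ALEBk把注意力水平估计为眨眼频率的倒数，这一关系本身可以用几行Python示意（逐帧眨眼检测结果用随机布尔序列占位，具体标定方式为假设）：

```python
import numpy as np

def attention_level(blink_events, duration_min):
    # blink_events: 每帧是否检测到眨眼的布尔序列（由眨眼检测器给出）
    blinks_per_min = np.sum(blink_events) / duration_min
    return 1.0 / max(blinks_per_min, 1e-6)  # 注意力 ~ 眨眼频率的倒数

frames = np.random.default_rng(0).random(3000) < 0.005  # 伪造的逐帧检测输出
print(attention_level(frames, duration_min=2.0))
```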

分类|识别(4篇)

【1】 Can Machine Learning Tools Support the Identification of Sustainable Design Leads From Product Reviews? Opportunities and Challenges 标题:机器学习工具能支持从产品评审中识别可持续设计线索吗?机遇与挑战 链接:https://arxiv.org/abs/2112.09391

作者:Michael Saidani,Harrison Kim,Bernard Yannou 机构:Department of Industrial and Enterprise, Systems Engineering, University of Illinois at, Urbana-Champaign, Illinois, USA, Laboratoire Genie Industriel, CentraleSupélec, Université Paris Saclay, Gif-sur-Yvette, France 备注:ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Aug 2021, Virtual, United States 摘要:越来越多的产品评论发布在网上，这对于设计师来说是一座金矿，他们可以通过捕捉客户的声音更好地了解自己开发的产品，并相应地改进这些产品。与此同时，产品设计和开发在创造更可持续的未来方面发挥着至关重要的作用。随着人工智能技术在自然语言处理领域的最新进展，本研究旨在开发一个集成的机器学习解决方案，从在线产品评论中自动获得可持续的设计见解。在本文中，将讨论、说明和定位现有框架（包括Python库、包以及最先进的算法，如BERT）带来的机遇和挑战，并将其定位在一个特别的机器学习过程中。这篇文章讨论了构建机器学习流水线的机会和挑战，以便从产品评论中获得见解，设计更可持续的产品，包括从识别可持续性相关评论到解释可持续设计线索的以下五个阶段：数据收集、数据格式化、模型训练、模型评估和模型部署。文中给出了通过产品评论挖掘和处理可以产生的可持续设计见解示例。最后，为该领域的未来研究提供了有希望的路线，包括将标准产品与其可持续替代品平行对照的案例研究，以比较客户所重视的特征，并最终产生相关的可持续设计线索。 摘要:The increasing number of product reviews posted online is a gold mine for designers to know better about the products they develop, by capturing the voice of customers, and to improve these products accordingly. In the meantime, product design and development have an essential role in creating a more sustainable future. With the recent advance of artificial intelligence techniques in the field of natural language processing, this research aims to develop an integrated machine learning solution to obtain sustainable design insights from online product reviews automatically. In this paper, the opportunities and challenges offered by existing frameworks - including Python libraries, packages, as well as state-of-the-art algorithms like BERT - are discussed, illustrated, and positioned along an ad hoc machine learning process. This contribution discusses the opportunities to reach and the challenges to address for building a machine learning pipeline, in order to get insights from product reviews to design more sustainable products, including the five following stages, from the identification of sustainability-related reviews to the interpretation of sustainable design leads: data collection, data formatting, model training, model evaluation, and model deployment. Examples of sustainable design insights that can be produced out of product review mining and processing are given. Finally, promising lines for future research in the field are provided, including case studies putting in parallel standard products with their sustainable alternatives, to compare the features valued by customers and to ultimately generate relevant sustainable design leads.

【2】 KGBoost: A Classification-based Knowledge Base Completion Method with Negative Sampling 标题:KGBoost:一种基于分类的负抽样知识库补全方法 链接:https://arxiv.org/abs/2112.09340

作者:Yun-Cheng Wang,Xiou Ge,Bin Wang,C. -C. Jay Kuo 机构:University of Southern California, Los Angeles, USA, National University of Singapore, C.-C. Jay Kuo 摘要:在这项工作中,知识库完成被描述为一个二元分类问题,使用知识图(KG)中的相关链接为每个关系训练XGBoost二元分类器。新方法KGBoost采用模块化设计,尝试寻找硬负样本,从而训练出一个强大的缺失链预测分类器。我们在多个基准数据集上进行了实验,并证明KGBoost在大多数数据集上都优于最先进的方法。此外,与通过端到端优化训练的模型相比,KGBoost在低维设置下工作良好,从而允许更小的模型尺寸。 摘要:Knowledge base completion is formulated as a binary classification problem in this work, where an XGBoost binary classifier is trained for each relation using relevant links in knowledge graphs (KGs). The new method, named KGBoost, adopts a modularized design and attempts to find hard negative samples so as to train a powerful classifier for missing link prediction. We conduct experiments on multiple benchmark datasets, and demonstrate that KGBoost outperforms state-of-the-art methods across most datasets. Furthermore, as compared with models trained by end-to-end optimization, KGBoost works well under the low-dimensional setting so as to allow a smaller model size.
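KGBoost按关系把链接预测化为二元分类。下面的Python草图用“正三元组 + 替换尾实体得到的负样本”训练一个XGBoost分类器；实体特征用随机嵌入占位，论文强调的硬负样本挑选策略此处从略（需要安装xgboost）。

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n_entities, dim = 100, 16
emb = rng.normal(size=(n_entities, dim))          # 占位的实体嵌入
pos = rng.integers(0, n_entities, size=(200, 2))  # 某一关系下的正样本 (h, t)

# 负采样：随机替换尾实体（原文会进一步挑选"难"负样本）
neg = pos.copy()
neg[:, 1] = rng.integers(0, n_entities, size=len(neg))

def pair_features(pairs):
    # 把 (h, t) 对表示为两个实体嵌入的拼接
    return np.hstack([emb[pairs[:, 0]], emb[pairs[:, 1]]])

X = np.vstack([pair_features(pos), pair_features(neg)])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
clf = XGBClassifier(n_estimators=50).fit(X, y)    # 每个关系训练一个这样的分类器
print(clf.predict_proba(pair_features(pos[:3]))[:, 1])
```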

【3】 An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification 标题:一种用于拥挤场景分类的视听数据集和深度学习框架 链接:https://arxiv.org/abs/2112.09172

作者:Lam Pham,Dat Ngo,Phu X. Nguyen,Truong Hoang,Alexander Schindler 机构: Austrian Institute of Technology, Ngo is with School of Computer Science and Electronic Engineering, University of Essex, Nguyen is with Department of Computer Fundamentals, FPT University 摘要:本文提出了一个视听场景分类（SC）的任务，其中输入视频被分类为五个真实拥挤场景之一：“暴乱”、“噪音街道”、“烟火事件”、“音乐事件”和“运动氛围”。为此，我们首先从Youtube（在野外场景中）收集这五个拥挤环境的视听数据集（视频）。然后，提出了广泛的深度学习框架来独立部署音频或视频输入数据。最后，将从高性能深度学习框架中获得的结果进行融合，以获得最佳准确度分数。我们的实验结果表明，音频和视频输入因素独立地影响SC任务的性能。值得注意的是，集成探索音频或视频输入数据的多个深度学习框架可以达到95.7%的最佳准确率。 摘要:This paper presents a task of audio-visual scene classification (SC) where input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we firstly collect an audio-visual dataset (videos) of these five crowded contexts from Youtube (in-the-wild scenes). Then, a wide range of deep learning frameworks are proposed to deploy either audio or visual input data independently. Finally, results obtained from high-performed deep learning frameworks are fused to achieve the best accuracy score. Our experimental results indicate that audio and visual input factors independently contribute to the SC task's performance. Significantly, an ensemble of deep learning frameworks exploring either audio or visual input data can achieve the best accuracy of 95.7%.

【4】 Continual Learning for Monolingual End-to-End Automatic Speech Recognition 标题:基于连续学习的单语端到端自动语音识别 链接:https://arxiv.org/abs/2112.09427

作者:Steven Vander Eeckt,Hugo Van hamme 机构:KU Leuven, Department Electrical Engineering ESAT-PSI, Leuven, Belgium 备注:Submitted to ICASSP 2021. 5 pages, 1 figure 摘要:使自动语音识别(ASR)模型适应新的领域会导致原始领域的性能下降,这种现象称为灾难性遗忘(CF)。即使是单语ASR模型也无法扩展到新的口音、方言、主题等,而不会受到CF的影响,这使得它们无法在不存储所有过去数据的情况下不断增强。幸运的是,可以使用持续学习(CL)方法,其目的是在克服CF的同时实现持续适应。在本文中,我们为端到端ASR实现了大量的CL方法,并测试和比较了它们在四个新任务中扩展单语混合CTCTransformer模型的能力。我们发现,性能最佳的CL方法将微调模型(下限)和所有任务联合训练的模型(上限)之间的差距缩小了40%以上,同时只需要访问0.6%的原始数据。 摘要:Adapting Automatic Speech Recognition (ASR) models to new domains leads to a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.

3D|3D重建等相关(1篇)

【1】 Methods for segmenting cracks in 3d images of concrete: A comparison based on semi-synthetic images 标题:基于半合成图像的混凝土三维图像裂缝分割方法比较 链接:https://arxiv.org/abs/2112.09493

作者:Tin Barisin,Christian Jung,Franziska Müsebeck,Claudia Redenbach,Katja Schladitz 机构:Franziska Müsebeck, Fraunhofer Institut für Techno- und Wirtschaftsmathematik, Fraunhofer-Platz, Kaiserslautern, Germany, Technische Universität Kaiserslautern, Gottlieb-Daimler-Straße, Kaiserslautern, Germany 摘要:混凝土是建筑物、桥梁和道路的标准建筑材料。由于安全在此类结构的设计、监测和维护中起着核心作用，因此了解混凝土的开裂行为非常重要。计算机断层扫描捕捉建筑材料的微观结构，并允许研究裂纹萌生和扩展。在大型3d图像中手动分割裂纹表面是不可行的。本文对三维图像中的裂纹自动分割方法进行了综述和比较。经典的图像处理方法（边缘检测滤波器、模板匹配、最小路径和区域生长算法）和学习方法（卷积神经网络、随机森林）被考虑并在半合成3d图像上进行了测试。它们的性能很大程度上取决于参数选择，参数选择应适应图像的灰度分布和混凝土的几何特性。一般来说，学习方法表现最好，尤其是对于薄裂纹和低灰度值对比度。 摘要:Concrete is the standard construction material for buildings, bridges, and roads. As safety plays a central role in the design, monitoring, and maintenance of such constructions, it is important to understand the cracking behavior of concrete. Computed tomography captures the microstructure of building materials and allows to study crack initiation and propagation. Manual segmentation of crack surfaces in large 3d images is not feasible. In this paper, automatic crack segmentation methods for 3d images are reviewed and compared. Classical image processing methods (edge detection filters, template matching, minimal path and region growing algorithms) and learning methods (convolutional neural networks, random forests) are considered and tested on semi-synthetic 3d images. Their performance strongly depends on parameter selection which should be adapted to the grayvalue distribution of the images and the geometric properties of the concrete. In general, the learning methods perform best, in particular for thin cracks and low grayvalue contrast.

编码器(1篇)

【1】 Nearest neighbor search with compact codes: A decoder perspective 标题:紧凑码最近邻搜索:一种解码器的观点 链接:https://arxiv.org/abs/2112.09568

作者:Kenza Amara,Matthijs Douze,Alexandre Sablayrolles,Hervé Jégou 机构:Facebook AI 摘要:在十亿规模的数据集上快速检索相似向量的现代方法依赖于压缩域方法,如二进制草图或产品量化。这些方法将一定的损失最小化,通常是均方误差或其他针对检索问题定制的目标函数。在本文中,我们将二进制哈希或乘积量化器等流行方法重新解释为自动编码器,并指出它们隐含地对解码器的形式做出了次优假设。我们设计了向后兼容的解码器,改进了从相同代码中重构向量的过程,从而在最近邻搜索中获得更好的性能。与二进制散列方法或流行基准上的产品量化相比,我们的方法有了显著的改进。 摘要:Modern approaches for fast retrieval of similar vectors on billion-scaled datasets rely on compressed-domain approaches such as binary sketches or product quantization. These methods minimize a certain loss, typically the mean squared error or other objective functions tailored to the retrieval problem. In this paper, we re-interpret popular methods such as binary hashing or product quantizers as auto-encoders, and point out that they implicitly make suboptimal assumptions on the form of the decoder. We design backward-compatible decoders that improve the reconstruction of the vectors from the same codes, which translates to a better performance in nearest neighbor search. Our method significantly improves over binary hashing methods or product quantization on popular benchmarks.
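把乘积量化（PQ）看作自编码器的视角可以这样示意：编码器把向量切块并映射到各子码本最近质心的索引，解码器用码本质心重建向量；论文的要点是在固定编码的前提下解码器（码本/重建方式）还可以进一步优化。下面仅是标准PQ的Python草图，子空间数与码本大小为假设。

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
n_sub, k = 4, 16                       # 4 个子空间，每个子空间 16 个质心
sub = np.split(X, n_sub, axis=1)

# 每个子空间学一个码本（k-means 质心）
codebooks = [KMeans(n_clusters=k, n_init=4, random_state=0).fit(s) for s in sub]

def encode(x):
    # 编码器：每个子向量 -> 最近质心的索引（即"紧凑码"）
    return [cb.predict(s) for cb, s in zip(codebooks, np.split(x, n_sub, axis=1))]

def decode(codes):
    # 解码器：索引 -> 质心拼接；可在固定编码下再优化码本以降低重建误差
    return np.hstack([cb.cluster_centers_[c] for cb, c in zip(codebooks, codes)])

codes = encode(X)
X_hat = decode(codes)
print("重建均方误差:", np.mean((X - X_hat) ** 2))
```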

优化|敛散性(5篇)

【1】 From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated Optimization 标题:从恶化到加速:一种修复联合优化中步长不同步的校准方法 链接:https://arxiv.org/abs/2112.09355

作者:Feijie Wu,Song Guo,Haozhao Wang,Zhihao Qu,Haobo Zhang,Jie Zhang,Ziming Liu 机构:The Hong Kong Polytechnic University,Huazhong University of Science and Technology,Hohai University,National, University of Singapore 摘要:在联邦优化设置中，全局模型定期聚合，当参与者充分利用其计算资源进行模型训练时，会出现阶跃异步。众所周知，在非i.i.d.数据下，阶跃异步会导致客观不一致，从而降低模型精度。为了解决这个问题，我们提出了一种新的算法FedaGrac，它将局部方向校准为预测的全局方向。利用估计的方向，我们保证聚合模型不会过度偏离预期方向，同时充分利用更快节点的局部更新。我们从理论上证明了FedaGrac比最新的方法具有更好的收敛速度，并消除了阶跃异步的负面影响。实验结果表明，该算法加快了训练速度，提高了最终精度。 摘要:In the setting of federated optimization, where a global model is aggregated periodically, step asynchronism occurs when participants conduct model training with fully utilizing their computational resources. It is well acknowledged that step asynchronism leads to objective inconsistency under non-i.i.d. data, which degrades the model accuracy. To address this issue, we propose a new algorithm FedaGrac, which calibrates the local direction to a predictive global orientation. Taking the advantage of estimated orientation, we guarantee that the aggregated model does not excessively deviate from the expected orientation while fully utilizing the local updates of faster nodes. We theoretically prove that FedaGrac holds an improved order of convergence rate than the state-of-the-art approaches and eliminates the negative effect of step asynchronism. Empirical results show that our algorithm accelerates the training and enhances the final accuracy.

【2】 Optimal discharge of patients from intensive care via a data-driven policy learning framework 标题:通过数据驱动的策略学习框架优化重症监护患者的出院决策 链接:https://arxiv.org/abs/2112.09315

作者:Fernando Lejarza,Jacob Calvert,Misty M Attwood,Daniel Evans,Qingqing Mao 机构:Dascena, Inc., Sowden Road, Suite B, Houston, TX ,-, USA, McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, TX 摘要:植根于机器学习和优化的临床决策支持工具可以为医疗保健提供者提供重要价值，包括更好地管理重症监护病房。尤其重要的是，患者出院任务应处理减少患者住院时间（和相关住院费用）与出院决定后再入院甚至死亡风险之间的微妙权衡。这项工作引入了一个端到端的通用框架，用于捕获这种权衡，从而根据患者的电子健康记录推荐最佳出院时间决策。数据驱动的方法用于导出一种简洁、离散的状态空间表示法，以捕获患者的生理状况。基于该模型和给定的成本函数，建立了一个无限时域贴现马尔可夫决策过程，并对其进行了数值求解，计算出一个最优出院策略，其价值采用离策略（off-policy）评估方法进行评估。通过大量的数值实验，使用真实的重症监护病房患者数据验证了所提出的框架。 摘要:Clinical decision support tools rooted in machine learning and optimization can provide significant value to healthcare providers, including through better management of intensive care units. In particular, it is important that the patient discharge task addresses the nuanced trade-off between decreasing a patient's length of stay (and associated hospitalization costs) and the risk of readmission or even death following the discharge decision. This work introduces an end-to-end general framework for capturing this trade-off to recommend optimal discharge timing decisions given a patient's electronic health records. A data-driven approach is used to derive a parsimonious, discrete state space representation that captures a patient's physiological condition. Based on this model and a given cost function, an infinite-horizon discounted Markov decision process is formulated and solved numerically to compute an optimal discharge policy, whose value is assessed using off-policy evaluation strategies. Extensive numerical experiments are performed to validate the proposed framework using real-life intensive care unit patient data.
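摘要中的“无限时域贴现MDP + 数值求解”可用标准值迭代示意。下面Python草图中状态为离散化的生理状况、动作为{继续留院, 出院}，转移矩阵与成本均为随意构造的占位数据，并非论文使用的真实模型。

```python
import numpy as np

n_states, gamma = 5, 0.95
actions = ["stay", "discharge"]
rng = np.random.default_rng(0)

# 占位的转移矩阵 P[a][s, s'] 与即时成本 c[a][s]
P = {a: rng.dirichlet(np.ones(n_states), size=n_states) for a in actions}
cost = {"stay": np.full(n_states, 1.0),                  # 继续住院的成本
        "discharge": np.linspace(0.0, 10.0, n_states)}   # 病情越重，出院风险成本越高

V = np.zeros(n_states)
for _ in range(500):  # 值迭代：V(s) = min_a [ c(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = {a: cost[a] + gamma * P[a] @ V for a in actions}
    V = np.minimum(Q["stay"], Q["discharge"])

policy = np.where(Q["discharge"] <= Q["stay"], "discharge", "stay")
print(policy)  # 每个离散生理状态下的最优动作
```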

【3】 A Robust Optimization Approach to Deep Learning 标题:一种用于深度学习的鲁棒优化方法 链接:https://arxiv.org/abs/2112.09279

作者:Dimitris Bertsimas,Xavier Boix,Kimberly Villalobos Carballo,Dick den Hertog 机构:Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA , USA, Department of Brain and Cognitive Sciences, Amsterdam Business School, University of Amsterdam 摘要:许多最先进的对抗性训练方法利用对抗性损失的上限来提供安全保障。然而，这些方法需要在每个训练步骤进行计算，而这些计算不能包含在反向传播的梯度中。我们介绍了一种新的、更具原则性的对抗性训练方法，该方法基于对抗性损失上界的封闭形式解，可以通过反向传播进行有效训练。这一上界得益于稳健优化领域的最新工具。我们用我们的方法导出了两种新方法。第一种方法（近似鲁棒上界或aRUB）使用网络的一阶近似以及线性鲁棒优化的基本工具，以获得易于实现的对抗性损失的近似上界。第二种方法（鲁棒上界或RUB）计算对抗损失的精确上界。通过各种表格和视觉数据集，我们证明了我们更具原则性的方法的有效性——对于较大的扰动，RUB比最先进的方法更加稳健，而对于较小的扰动，aRUB的性能与最先进的方法相当。此外，RUB和aRUB的速度都比标准对抗训练快（以增加内存为代价）。所有重现结果的代码都可以在https://github.com/kimvc7/Robustness找到。 摘要:Many state-of-the-art adversarial training methods leverage upper bounds of the adversarial loss to provide security guarantees. Yet, these methods require computations at each training step that can not be incorporated in the gradient for backpropagation. We introduce a new, more principled approach to adversarial training based on a closed form solution of an upper bound of the adversarial loss, which can be effectively trained with backpropagation. This bound is facilitated by state-of-the-art tools from robust optimization. We derive two new methods with our approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from linear robust optimization to obtain an approximate upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes an exact upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our more principled approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations. Also, both RUB and aRUB run faster than standard adversarial training (at the expense of an increase in memory). All the code to reproduce the results can be found at https://github.com/kimvc7/Robustness.
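aRUB的思路是对网络做一阶近似并借助线性鲁棒优化工具得到可反向传播的近似上界。下面的PyTorch草图给出一个同类的一阶近似界（ℓ∞扰动的对偶范数是ℓ1），它并非论文的精确公式，仅作示意。

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(4, 10, requires_grad=True)
y = torch.randint(0, 2, (4,))
eps = 0.1

loss = loss_fn(model(x), y)
grad, = torch.autograd.grad(loss, x, create_graph=True)
# 一阶近似：max_{||δ||_∞ ≤ ε} L(x+δ) ≈ L(x) + ε·||∇_x L||_1（ℓ∞ 的对偶范数是 ℓ1）
robust_upper_bound = loss + eps * grad.abs().sum(dim=1).mean()
robust_upper_bound.backward()  # 该近似上界可直接反向传播用于训练
print(float(loss), float(robust_upper_bound))
```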

【4】 On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks 标题:深度神经网络训练中梯度下降法的全局极小值存在性及收敛性分析 链接:https://arxiv.org/abs/2112.09684

作者:Arnulf Jentzen,Adrian Riekert 机构: Applied Mathematics: Institute for Analysis and Numerics, School of Data Science and Shenzhen Research Institute of Big Data 备注:93 pages, 2 figures. arXiv admin note: text overlap with arXiv:2112.07369, arXiv:2108.04620 摘要:在本文中，我们研究了具有任意多个隐层的全连接前馈深度ReLU神经网络，并在以下假设下证明了随机初始化的GD优化方法在训练此类网络时风险的收敛性：所考虑的监督学习问题的输入数据的概率分布具有分段多项式的未归一化概率密度函数；目标函数（描述输入数据与输出数据之间的关系）是分段多项式；且所考虑的监督学习问题的风险函数至少存在一个正则全局极小值。此外，在只有一个隐藏层和一维输入的浅层人工神经网络的特殊情况下，我们还通过证明对每个Lipschitz连续目标函数风险景观中都存在全局极小值来验证这一假设。最后，在训练具有ReLU激活的深度神经网络时，我们还研究了梯度流（GF）微分方程的解，并证明了每条非发散的GF轨迹都以多项式收敛速度收敛到一个临界点（在限制Fréchet次微分的意义上）。我们的数学收敛性分析建立在实代数几何工具（如半代数函数和广义Kurdyka-Łojasiewicz不等式的概念）、泛函分析工具（如Arzelà-Ascoli定理）、非光滑分析工具（如限制Fréchet次梯度的概念）之上，并利用了Petersen等人揭示的事实：具有固定结构的浅层ReLU神经网络的实现函数集构成连续函数集的一个闭子集。 摘要:In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, under the assumption that the target function (describing the relationship between input data and the output data) is piecewise polynomial, and under the assumption that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input we also verify this assumption by proving in the training of such shallow ANNs that for every Lipschitz continuous target function there exists a global minimum in the risk landscape. Finally, in the training of deep ANNs with ReLU activation we also study solutions of gradient flow (GF) differential equations and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fréchet subdifferentiability). Our mathematical convergence analysis builds up on tools from real algebraic geometry such as the concept of semi-algebraic functions and generalized Kurdyka-Łojasiewicz inequalities, on tools from functional analysis such as the Arzelà-Ascoli theorem, on tools from nonsmooth analysis such as the concept of limiting Fréchet subgradients, as well as on the fact that the set of realization functions of shallow ReLU ANNs with fixed architecture forms a closed subset of the set of continuous functions revealed by Petersen et al.

【5】 Convergence Rates of Two-Time-Scale Gradient Descent-Ascent Dynamics for Solving Nonconvex Min-Max Problems 标题:解非凸Min-Max问题的双时间尺度梯度下降-上升动力学的收敛速度 链接:https://arxiv.org/abs/2112.09579

作者:Thinh T. Doan 机构: Doan is with the Bradley Department of Electrical and Computer Engineering 摘要:由于其在机器学习、网络资源分配和分布式优化等领域的广泛应用，近年来在解决非凸极小极大优化问题方面引起了广泛的兴趣。也许，在求解最小-最大优化问题时，最流行的一阶方法是所谓的同步（或单循环）梯度下降-上升算法，因为其实现简单。然而，该算法收敛性的理论保证是非常稀疏的，因为即使在一个简单的双线性问题中它也可能发散。在本文中，我们的重点是描述连续时变同步梯度下降上升算法的有限时间性能（或收敛速度）。特别地，我们推导了该方法在若干不同条件下对基本目标函数的收敛速度，即双边Polyak-Łojasiewicz（PL）、单边PL、非凸强凹和强凸非凹条件。在相同的目标函数条件下，我们的收敛结果改进了前人的工作。我们分析的关键思想是使用经典奇异摄动理论和耦合李雅普诺夫函数来解决梯度下降和上升动力学之间的时间尺度差异和相互作用。我们关于连续时间算法行为的结果可以用来增强其离散时间对应算法的收敛性。 摘要:There is much recent interest in solving nonconvex min-max optimization problems due to their broad applications in many areas including machine learning, networked resource allocations, and distributed optimization. Perhaps, the most popular first-order method in solving min-max optimization is the so-called simultaneous (or single-loop) gradient descent-ascent algorithm due to its simplicity in implementation. However, theoretical guarantees on the convergence of this algorithm is very sparse since it can diverge even in a simple bilinear problem. In this paper, our focus is to characterize the finite-time performance (or convergence rates) of the continuous-time variant of simultaneous gradient descent-ascent algorithm. In particular, we derive the rates of convergence of this method under a number of different conditions on the underlying objective function, namely, two-sided Polyak-Łojasiewicz (PL), one-sided PL, nonconvex-strongly concave, and strongly convex-nonconcave conditions. Our convergence results improve the ones in prior works under the same conditions of objective functions. The key idea in our analysis is to use the classic singular perturbation theory and coupling Lyapunov functions to address the time-scale difference and interactions between the gradient descent and ascent dynamics. Our results on the behavior of continuous-time algorithm may be used to enhance the convergence properties of its discrete-time counterpart.
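连续时间的同步梯度下降-上升动力学可以直接数值积分来观察其收敛行为。下面的Python草图在一个强凸-强凹的玩具目标上用欧拉法模拟两时间尺度动力学（时间尺度比τ为假设参数，目标函数亦为随意构造的示例）：

```python
import numpy as np

def f_grad(x, y):
    # 玩具目标 f(x, y) = 0.5*x^2 + x*y - 0.5*y^2（强凸-强凹，鞍点在原点）
    return x + y, x - y   # (∂f/∂x, ∂f/∂y)

x, y = 2.0, -1.5
dt, tau = 1e-3, 0.1       # tau 控制下降/上升两套动力学的时间尺度之比
for _ in range(20000):    # 欧拉法数值积分连续时间动力学
    gx, gy = f_grad(x, y)
    x -= dt * gx          # 慢时间尺度：对 x 做梯度下降
    y += dt / tau * gy    # 快时间尺度：对 y 做梯度上升
print(x, y)               # 应收敛到鞍点 (0, 0) 附近
```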

其他神经网络|深度学习|模型|建模(18篇)

【1】 Deep Learning for Spatiotemporal Modeling of Urbanization 标题:深度学习在城市化时空建模中的应用 链接:https://arxiv.org/abs/2112.09668

作者:Tang Li,Jing Gao,Xi Peng 机构:University of Delaware, Newark, DE 备注:Accepted by NeurIPS 2021 MLPH (Machine Learning in Public Health) Workshop; Best Paper Awarded by NeurIPS 2021 MLPH (Machine Learning in Public Health) Workshop 摘要:城市化对全世界人口的健康和福祉产生了巨大影响。因此,城市化的预测性空间模型可以成为有效公共卫生规划的有用工具。许多空间城市化模型已经使用经典的机器学习和数值建模技术开发出来。然而,深度学习及其捕获复杂时空现象的能力尚未应用于城市化建模。在这里,我们探讨了城市化预测模型的深层空间学习能力。我们将数字地理空间数据视为具有像素和通道的图像,并通过扩充来丰富数据集,以利用深度学习的高容量。我们得到的模型可以生成端到端的多变量城市化预测,并在初步比较中优于最先进的经典机器学习城市化模型。 摘要:Urbanization has a strong impact on the health and wellbeing of populations across the world. Predictive spatial modeling of urbanization therefore can be a useful tool for effective public health planning. Many spatial urbanization models have been developed using classic machine learning and numerical modeling techniques. However, deep learning with its proven capacity to capture complex spatiotemporal phenomena has not been applied to urbanization modeling. Here we explore the capacity of deep spatial learning for the predictive modeling of urbanization. We treat numerical geospatial data as images with pixels and channels, and enrich the dataset by augmentation, in order to leverage the high capacity of deep learning. Our resulting model can generate end-to-end multi-variable urbanization predictions, and outperforms a state-of-the-art classic machine learning urbanization model in preliminary comparisons.

【2】 ColO-RAN: Developing Machine Learning-based xApps for Open RAN Closed-loop Control on Programmable Experimental Platforms 标题:ColO-RAN:在可编程实验平台上开发用于开放RAN闭环控制的基于机器学习的xApps 链接:https://arxiv.org/abs/2112.09559

作者:Michele Polese,Leonardo Bonati,Salvatore D'Oro,Stefano Basagni,Tommaso Melodia 备注:Submitted for publication to the IEEE. 13 pages, 13 figures, 2 tables 摘要:尽管开放RAN带来了新的机遇，但基于ML的网络自动化进展缓慢，主要原因是无法获得大规模数据集和实验测试基础设施。这减缓了深度强化学习（DRL）代理在真实网络上的发展和广泛采用，延缓了智能和自主RAN控制的进展。在本文中，我们通过为开放RAN中基于DRL的闭环控制的设计、训练、测试和实验评估提出实用的解决方案和软件管道来应对这些挑战。我们介绍了ColO-RAN，这是第一个公开的大规模O-RAN测试框架，其中包含软件定义的无线电。基于Colosseum无线网络仿真器的规模和计算能力，ColO-RAN使用O-RAN组件、可编程基站和“无线数据工厂”实现了大规模的ML研究。具体而言，我们设计并开发了三个用于基于DRL的RAN切片控制、调度和在线模型训练的示例性xApps，并在具有7个软件化基站和42个用户的蜂窝网络上评估了它们的性能。最后，我们通过在室内可编程测试平台Arena上部署ColO-RAN，展示了它在不同平台上的可移植性。我们首次开展的大规模评估的广泛结果突出了基于DRL的自适应控制的优点和挑战。它们还提供了关于无线DRL管道开发的见解，从数据分析到DRL代理设计，以及与在真实RAN上训练相关的权衡。ColO-RAN和收集的大规模数据集将向研究社区公开。 摘要:In spite of the new opportunities brought about by the Open RAN, advances in ML-based network automation have been slow, mainly because of the unavailability of large-scale datasets and experimental testing infrastructure. This slows down the development and widespread adoption of Deep Reinforcement Learning (DRL) agents on real networks, delaying progress in intelligent and autonomous RAN control. In this paper, we address these challenges by proposing practical solutions and software pipelines for the design, training, testing, and experimental evaluation of DRL-based closed-loop control in the Open RAN. We introduce ColO-RAN, the first publicly-available large-scale O-RAN testing framework with software-defined radios-in-the-loop. Building on the scale and computational capabilities of the Colosseum wireless network emulator, ColO-RAN enables ML research at scale using O-RAN components, programmable base stations, and a "wireless data factory". Specifically, we design and develop three exemplary xApps for DRL-based control of RAN slicing, scheduling and online model training, and evaluate their performance on a cellular network with 7 softwarized base stations and 42 users. Finally, we showcase the portability of ColO-RAN to different platforms by deploying it on Arena, an indoor programmable testbed. Extensive results from our first-of-its-kind large-scale evaluation highlight the benefits and challenges of DRL-based adaptive control. They also provide insights on the development of wireless DRL pipelines, from data analysis to the design of DRL agents, and on the tradeoffs associated to training on a live RAN. ColO-RAN and the collected large-scale dataset will be made publicly available to the research community.

【3】 Stability Verification in Stochastic Control Systems via Neural Network Supermartingales 标题:基于神经网络上鞅的随机控制系统稳定性验证 链接:https://arxiv.org/abs/2112.09495

作者:Mathias Lechner,Đorđe Žikelić,Krishnendu Chatterjee,Thomas A. Henzinger 机构:IST Austria, Klosterneuburg, Austria 备注:Accepted by AAAI 2022 摘要:研究了离散时间非线性随机控制系统的几乎必然（a.s.）渐近稳定性的形式化验证问题。虽然确定性控制系统的稳定性验证在文献中得到了广泛的研究，但随机控制系统的稳定性验证是一个开放的问题。关于这一主题的一些现有的工作要么只考虑专门形式的随机性，要么对系统做出限制性假设，使它们不适用于使用神经网络策略的学习算法。在这项工作中，我们提出了一种求解一般非线性随机控制问题的方法，具有两个新的方面：(a)我们使用排序超鞅(RSMs)来证明a.s.渐近稳定性，而不是经典的Lyapunov函数的随机延拓；(b)我们提出了一种学习神经网络RSMs的方法。我们证明了我们的方法保证了系统的a.s.渐近稳定性，并提供了第一种获得稳定时间界的方法，而随机Lyapunov函数则没有。最后，我们在一组具有神经网络策略的非线性随机强化学习环境中进行了实验验证。 摘要:We consider the problem of formally verifying almost-sure (a.s.) asymptotic stability in discrete-time nonlinear stochastic control systems. While verifying stability in deterministic control systems is extensively studied in the literature, verifying stability in stochastic control systems is an open problem. The few existing works on this topic either consider only specialized forms of stochasticity or make restrictive assumptions on the system, rendering them inapplicable to learning algorithms with neural network policies. In this work, we present an approach for general nonlinear stochastic control problems with two novel aspects: (a) instead of classical stochastic extensions of Lyapunov functions, we use ranking supermartingales (RSMs) to certify a.s. asymptotic stability, and (b) we present a method for learning neural network RSMs. We prove that our approach guarantees a.s. asymptotic stability of the system and provides the first method to obtain bounds on the stabilization time, which stochastic Lyapunov functions do not. Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies.
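学习神经网络RSM的核心约束是在目标集合之外满足 E[V(x')] ≤ V(x) − ε。下面的PyTorch草图用采样转移与hinge损失示意这种训练方式；动力学、采样范围与网络结构均为假设，并且省略了论文中的形式化验证步骤。

```python
import torch
import torch.nn as nn

# 候选 RSM：非负标量函数 V（用 Softplus 保证非负）
V = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(V.parameters(), lr=1e-3)
eps = 0.1

def step(x):
    # 假设的随机闭环动力学：收缩映射 + 高斯噪声
    return 0.9 * x + 0.05 * torch.randn_like(x)

for it in range(2000):
    x = 4 * torch.rand(256, 2) - 2                      # 在状态空间中采样
    x_next = torch.stack([step(x) for _ in range(8)])   # 多次采样近似期望
    exp_v_next = V(x_next.reshape(-1, 2)).reshape(8, -1, 1).mean(0)
    # RSM 条件 E[V(x')] ≤ V(x) − ε 的 hinge 违反量作为训练损失
    loss = torch.relu(exp_v_next - V(x) + eps).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))  # 损失趋近 0 表示采样点上近似满足 RSM 条件
```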

【4】 Learning in Restless Bandits under Exogenous Global Markov Process 标题:外生全局马尔可夫过程下的不安分多臂老虎机学习 链接:https://arxiv.org/abs/2112.09484

作者:Tomer Gafni,Michal Yemini,Kobi Cohen 备注:14 pages, 6 figures. arXiv admin note: text overlap with arXiv:1906.08120 摘要:我们考虑臂动态未知的不安分多臂老虎机（RMAB）问题的一个扩展，其中一个未知的外生全局马尔可夫过程支配着每个臂的奖励分布。在每个全局状态下，每个臂的奖励过程按照未知的马尔可夫规则演化，且该规则在不同臂之间并不相同。每个时刻，玩家从$N$个臂中选择一个臂进行游戏，并从一组有限的奖励状态中获得随机奖励。这些臂是不安分的，也就是说，无论玩家的动作如何，它们的局部状态都会演化。沿用最近对相关RMAB设置的研究，遗憾被定义为相对于一个了解问题动态、并在每个时刻$t$选择使预期即时价值最大化的臂的玩家的奖励损失。我们的目标是设计一个使遗憾最小化的臂选择策略。为此，我们开发了“外生马尔可夫过程下的学习”（LEMP）算法。我们从理论上分析了LEMP，并建立了关于遗憾的有限样本界。我们证明了LEMP随时间达到对数阶的遗憾。我们进一步对LEMP进行了数值分析，给出了支持理论结果的仿真结果，表明LEMP明显优于其他算法。 摘要:We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics, where an unknown exogenous global Markov process governs the rewards distribution of each arm. Under each global state, the rewards process of each arm evolves according to an unknown Markovian rule, which is non-identical among different arms. At each time, a player chooses an arm out of $N$ arms to play, and receives a random reward from a finite set of reward states. The arms are restless, that is, their local state evolves regardless of the player's actions. Motivated by recent studies on related RMAB settings, the regret is defined as the reward loss with respect to a player that knows the dynamics of the problem, and plays at each time $t$ the arm that maximizes the expected immediate value. The objective is to develop an arm-selection policy that minimizes the regret. To that end, we develop the Learning under Exogenous Markov Process (LEMP) algorithm. We analyze LEMP theoretically and establish a finite-sample bound on the regret. We show that LEMP achieves a logarithmic regret order with time. We further analyze LEMP numerically and present simulation results that support the theoretical findings and demonstrate that LEMP significantly outperforms alternative algorithms.

【5】 Visual Learning-based Planning for Continuous High-Dimensional POMDPs 标题:基于可视化学习的连续高维POMDP规划 链接:https://arxiv.org/abs/2112.09456

作者:Sampada Deglurkar,Michael H. Lim,Johnathan Tucker,Zachary N. Sunberg,Aleksandra Faust,Claire J. Tomlin 机构:Department of Electrical Engineering and Computer Sciences, UC Berkeley, Department of Aerospace Engineering Science, CU Boulder, Google Research 摘要:部分可观测马尔可夫决策过程（POMDP）是一个强大的框架，用于捕获涉及状态和转移不确定性的决策问题。然而，目前大多数POMDP规划器无法有效处理它们在现实世界中经常遇到的高维观测（例如机器人领域中的图像观测）。在这项工作中，我们提出了可视化树搜索（VTS），这是一种学习和规划过程，将离线学习的生成模型与在线基于模型的POMDP规划相结合。VTS通过利用一组深度生成观测模型在蒙特卡罗树搜索规划器中预测和评估图像观测的可能性，将离线模型训练和在线规划联系起来。我们证明了VTS对不同的观测噪声具有鲁棒性，并且由于它采用了在线、基于模型的规划，可以适应不同的奖励结构，而无需重新训练。这种新方法优于最先进的在策略（on-policy）规划基线算法，同时显著减少了离线训练时间。 摘要:The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle very high-dimensional observations they often encounter in the real world (e.g. image observations in robotic domains). In this work, we propose Visual Tree Search (VTS), a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner. We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train. This new approach outperforms a baseline state-of-the-art on-policy planning algorithm while using significantly less offline training time.

【6】 Quality of Data in Machine Learning 标题:机器学习中的数据质量 链接:https://arxiv.org/abs/2112.09400

作者:Antti Kariluoto,Arto Pärnänen,Joni Kultanen,Jukka Soininen,Pekka Abrahamsson 备注:Presented in International Workshop on Data Quality for Intelligent Systems (DQIS), which was a co-located event of QRS 2021 (The 21st IEEE International Conference on Software Quality, Reliability, and Security) 摘要:一个常见的假设是，当机器学习模型有更多的数据可供学习时，它们会提高性能。在这项研究中，作者希望通过利用新的职业学生数据进行实证实验来澄清这一困境。实验比较了不同的机器学习算法，同时改变了可用于训练和测试模型的数据和特征组合的数量。实验表明，数据记录或其采样频率的增加不会立即导致模型精度或性能的显著提高，但在集合模型的情况下，精度的方差会减小。随着模型输入特征数量的增加，也出现了类似的现象。该研究反驳了最初的假设，并继续指出，在这种情况下，数据的重要性在于数据的质量，而不是数据的数量。 摘要:A common assumption exists according to which machine learning models improve their performance when they have more data to learn from. In this study, the authors wished to clarify the dilemma by performing an empirical experiment utilizing novel vocational student data. The experiment compared different machine learning algorithms while varying the number of data and feature combinations available for training and testing the models. The experiment revealed that the increase of data records or their sample frequency does not immediately lead to significant increases in the model accuracies or performance, however the variance of accuracies does diminish in the case of ensemble models. Similar phenomenon was witnessed while increasing the number of input features for the models. The study refutes the starting assumption and continues to state that in this case the significance in data lies in the quality of the data instead of the quantity of the data.

【7】 Improving evidential deep learning via multi-task learning 标题:利用多任务学习改进证据深度学习 链接:https://arxiv.org/abs/2112.09368

作者:Dongpin Oh,Bonggun Shin 机构: Deargen Inc., Seoul, South Korea, Deargen USA Inc., Atlanta, GA 备注:Accepted by AAAI-2022 摘要:证据回归网络(ENet)估计连续目标及其预测不确定性,无需昂贵的贝叶斯模型平均。然而,由于ENet原始损失函数的梯度收缩问题,即负对数边际似然(NLL)损失,目标预测可能不准确。在本文中,目标是通过解决梯度收缩问题来提高ENet的预测精度,同时保持其有效的不确定性估计。为了实现这一目标,提出了一个多任务学习(MTL)框架,称为MT-ENet。在MTL中,我们将Lipschitz修正均方误差(MSE)损失函数定义为另一损失,并将其添加到现有NLL损失中。Lipschitz修正的MSE损耗通过动态调整其Lipschitz常数来缓解与NLL损耗的梯度冲突。这样,Lipschitz MSE损失不会干扰NLL损失的不确定性估计。MT-ENet在不损失合成数据集和真实基准(包括药物靶点亲和力(DTA)回归)的不确定性估计能力的情况下,提高了ENet的预测准确性。此外,MT-ENet在DTA基准上显示出显著的校准和分布外检测能力。 摘要:The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define the Lipschitz modified mean squared error (MSE) loss function as another loss and add it to the existing NLL loss. The Lipschitz modified MSE loss is designed to mitigate the gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on the synthetic dataset and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.
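MT-ENet的总损失可以理解为“证据回归NLL + Lipschitz修正的MSE”。下面的PyTorch草图采用深度证据回归（NIG先验）的标准NLL，并用Huber型截断近似“Lipschitz修正”这一项；后者并非论文的精确形式（原文按动态调整的Lipschitz常数缓解与NLL的梯度冲突），仅作示意。

```python
import math
import torch

def nig_nll(y, gamma, nu, alpha, beta):
    # 深度证据回归（Normal-Inverse-Gamma 先验）的负对数边际似然
    omega = 2 * beta * (1 + nu)
    return (0.5 * torch.log(math.pi / nu)
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log(nu * (y - gamma) ** 2 + omega)
            + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5)).mean()

def lipschitz_mse(y, gamma, max_grad=1.0):
    # 用 Huber 型截断示意"梯度有界的 MSE"（假设的近似，非原文公式）
    return torch.nn.functional.huber_loss(gamma, y, delta=max_grad)

y = torch.randn(32)
gamma = torch.randn(32, requires_grad=True)      # 预测均值
nu = torch.ones(32)                              # 其余证据参数此处取常数占位
alpha = torch.full((32,), 2.0)
beta = torch.ones(32)

loss = nig_nll(y, gamma, nu, alpha, beta) + lipschitz_mse(y, gamma)  # 多任务总损失
loss.backward()
print(float(loss))
```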

【8】 ST2Vec: Spatio-Temporal Trajectory Similarity Learning in Road Networks 标题:ST2VEC:路网中的时空轨迹相似性学习 链接:https://arxiv.org/abs/2112.09339

作者:Ziquan Fang,Yuntao Du,Xinjun Zhu,Lu Chen,Yunjun Gao,Christian S. Jensen 机构:†College of Computer Science, Zhejiang University, Hangzhou, China, ♯School of Software, Zhejiang University, Ningbo, China, §Department of Computer Science, Aalborg University, Aalborg, Denmark 摘要:人和车辆的轨迹体现了交通基础设施的重要信息，轨迹相似性计算是许多涉及轨迹数据分析的实际应用中的一项基础功能。近年来，基于深度学习的轨迹相似技术与传统的相似技术相比，具有提高效率和适应性的潜力。然而，现有的轨迹相似性学习方案强调空间相似性而不是时间相似性，这使得它们在时间感知分析中不是最优的。为此，我们提出了ST2Vec，这是一种基于轨迹表示学习的体系结构，它考虑了道路网络中用于时空相似性学习的轨迹对之间的细粒度空间和时间相关性。据我们所知，这是第一个针对时空轨迹相似性分析的深度学习方案。具体而言，ST2Vec包括三个阶段：(i)选择代表性训练样本的训练数据准备；(ii)对轨迹的空间和时间特征进行编码的空间和时间建模，其中设计了通用时间建模模块（TMM）；(iii)时空共同注意融合（STCF），其中开发了统一融合（UF）方法，以帮助生成统一的时空轨迹嵌入，从而捕获轨迹之间的时空相似关系。此外，受课程学习理念的启发，ST2Vec采用课程学习进行模型优化，以提高收敛性和有效性。一项实验研究表明，ST2Vec在有效性、效率和可扩展性方面大大优于所有最先进的竞争对手，同时显示出较低的参数敏感性和良好的模型稳健性。 摘要:People and vehicle trajectories embody important information of transportation infrastructures, and trajectory similarity computation is functionality in many real-world applications involving trajectory data analysis. Recently, deep-learning based trajectory similarity techniques hold the potential to offer improved efficiency and adaptability over traditional similarity techniques. Nevertheless, the existing trajectory similarity learning proposals emphasize spatial similarity over temporal similarity, making them suboptimal for time-aware analyses. To this end, we propose ST2Vec, a trajectory-representation-learning based architecture that considers fine-grained spatial and temporal correlations between pairs of trajectories for spatio-temporal similarity learning in road networks. To the best of our knowledge, this is the first deep-learning proposal for spatio-temporal trajectory similarity analytics. Specifically, ST2Vec encompasses three phases: (i) training data preparation that selects representative training samples; (ii) spatial and temporal modeling that encode spatial and temporal characteristics of trajectories, where a generic temporal modeling module (TMM) is designed; and (iii) spatio-temporal co-attention fusion (STCF), where a unified fusion (UF) approach is developed to help generating unified spatio-temporal trajectory embeddings that capture the spatio-temporal similarity relations between trajectories. Further, inspired by curriculum concept, ST2Vec employs the curriculum learning for model optimization to improve both convergence and effectiveness. An experimental study offers evidence that ST2Vec outperforms all state-of-the-art competitors substantially in terms of effectiveness, efficiency, and scalability, while showing low parameter sensitivity and good model robustness.

【9】 Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards 标题:通过合成数据奖励激励机器学习中的协作 链接:https://arxiv.org/abs/2112.09327

作者:Sebastian Shenghong Tay,Xinyi Xu,Chuan Sheng Foo,Bryan Kian Hsiang Low 机构:Department of Computer Science, National University of Singapore, Singapore, Institute for Infocomm Research, ASTAR, Singapore 备注:36th AAAI Conference on Artificial Intelligence (AAAI 2022), Extended version with derivations, 42 pages 摘要:本文提出了一种新的协作生成性建模(CGM)框架,该框架鼓励自利方之间的协作,向生成性模型(如GAN)训练池贡献数据,从中提取合成数据并将其分发给各方,作为与其贡献相称的奖励。将合成数据作为奖励(而不是训练有素的模型或金钱)分发给下游学习任务提供了任务和模型不可知的好处,并且不太可能违反数据隐私条例。为了实现该框架,我们首先提出了一个基于最大均值差(MMD)的数据估值函数,该函数根据数据的数量和质量来评估数据是否接近真实数据分布,并提供了理论结果,指导基于MMD的数据估值函数的核选择。然后,我们将奖励方案描述为一个线性优化问题,当该问题得到解决时,在CGM框架中保证一定的激励,如公平性。我们设计了一种加权抽样算法,用于生成作为奖励分发给各方的合成数据,以使其数据和合成数据的值与奖励方案分配的奖励值相匹配。我们通过使用模拟和真实数据集的经验表明,各方的综合数据奖励与其贡献相称。 摘要:This paper presents a novel collaborative generative modeling (CGM) framework that incentivizes collaboration among self-interested parties to contribute data to a pool for training a generative model (e.g., GAN), from which synthetic data are drawn and distributed to the parties as rewards commensurate to their contributions. Distributing synthetic data as rewards (instead of trained models or money) offers task- and model-agnostic benefits for downstream learning tasks and is less likely to violate data privacy regulation. To realize the framework, we firstly propose a data valuation function using maximum mean discrepancy (MMD) that values data based on its quantity and quality in terms of its closeness to the true data distribution and provide theoretical results guiding the kernel choice in our MMD-based data valuation function. Then, we formulate the reward scheme as a linear optimization problem that when solved, guarantees certain incentives such as fairness in the CGM framework. We devise a weighted sampling algorithm for generating synthetic data to be distributed to each party as reward such that the value of its data and the synthetic data combined matches its assigned reward value by the reward scheme. We empirically show using simulated and real-world datasets that the parties' synthetic data rewards are commensurate to their contributions.
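基于MMD的数据估值可以用RBF核下MMD²的（有偏）经验估计示意：某一方的数据越接近真实数据分布，其MMD越小、估值越高。核与带宽的选择在此为假设（论文给出了指导核选择的理论结果）。

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # MMD^2 的有偏经验估计：E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2 * rbf_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
true_data = rng.normal(0, 1, size=(500, 2))
party_a = rng.normal(0, 1, size=(200, 2))   # 数据接近真实分布 -> 估值高
party_b = rng.normal(3, 1, size=(200, 2))   # 数据偏离真实分布 -> 估值低
print(mmd2(party_a, true_data), mmd2(party_b, true_data))  # 前者应明显更小
```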

【10】 Procedural Kernel Networks 标题:过程化核网络 链接:https://arxiv.org/abs/2112.09318

作者:Bartlomiej Wronski 机构:Google Research 备注:11 pages, technical report 摘要:在过去的十年中，卷积神经网络（CNN）定义了许多低级图像处理和恢复任务的最新技术，如去噪、去马赛克、放大或修复。然而，设备上的移动摄影仍然被传统的图像处理技术所主导，并且大多使用简单的机器学习技术，或者将神经网络处理局限于生成低分辨率的掩模。CNN的高计算和内存要求、移动设备的有限处理能力和热约束，以及大的输出图像分辨率（通常为8-12 MPix）阻碍了其更广泛的应用。在这项工作中，我们介绍了过程核网络（PKNs），一系列机器学习模型，用于生成图像过滤核或其他传统算法的参数。一个轻量级的CNN以较低的分辨率处理输入图像，与其他基于内核的机器学习方法相比，它产生了显著的加速，并允许新的应用。该体系结构是端到端学习的，特别适合于广泛的低级图像处理任务，在这些任务中，它提高了许多传统算法的性能。我们还描述了该框架如何将以前的一些工作结合起来，将机器学习应用于常见的图像恢复任务。 摘要:In the last decade Convolutional Neural Networks (CNNs) have defined the state of the art for many low level image processing and restoration tasks such as denoising, demosaicking, upscaling, or inpainting. However, on-device mobile photography is still dominated by traditional image processing techniques, and uses mostly simple machine learning techniques or limits the neural network processing to producing low resolution masks. High computational and memory requirements of CNNs, limited processing power and thermal constraints of mobile devices, combined with large output image resolutions (typically 8--12 MPix) prevent their wider application. In this work, we introduce Procedural Kernel Networks (PKNs), a family of machine learning models which generate parameters of image filter kernels or other traditional algorithms. A lightweight CNN processes the input image at a lower resolution, which yields a significant speedup compared to other kernel-based machine learning methods and allows for new applications. The architecture is learned end-to-end and is especially well suited for a wide range of low-level image processing tasks, where it improves the performance of many traditional algorithms. We also describe how this framework unifies some previous work applying machine learning for common image restoration tasks.

【11】 MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling 标题:MIDI-DDSP:基于分层建模的音乐演奏细节控制 链接:https://arxiv.org/abs/2112.09312

作者:Yusong Wu,Ethan Manilow,Yi Deng,Rigel Swavely,Kyle Kastner,Tim Cooijmans,Aaron Courville,Cheng-Zhi Anna Huang,Jesse Engel 机构:∗ Equal Contribution, Mila, Quebec Artificial Intelligence Institute, Universit´e de Montr´eal, Northwestern University,New York University,Google Brain 摘要:音乐表达需要控制演奏的音符和演奏方式。传统的音频合成器提供了详细的表现力控制，但以牺牲真实感为代价。黑盒神经音频合成和拼接式采样器可以产生逼真的音频，但控制机制很少。在这项工作中，我们介绍了MIDI-DDSP这一乐器的分层模型，它支持真实的神经音频合成和详细的用户控制。从可解释可微数字信号处理（DDSP）合成参数开始，我们推断音符及其表现性能的高级属性（如音色、颤音、动力学和清晰度）。这将创建一个三级层次结构（音符、演奏、合成），使个人可以选择在每一级进行干预，或利用经过训练的先验知识（给定音符的演奏、给定演奏的合成）进行创造性帮助。通过定量实验和听力测试，我们证明了该层次结构可以重建高保真音频，准确预测音符序列的性能属性，独立操作给定性能的属性，并且作为一个完整的系统，可以从一个新的音符序列生成逼真的音频。MIDI-DDSP通过利用可解释的层次结构和多层次的粒度，打开了辅助工具的大门，使个人能够跨越各种各样的音乐体验。 摘要:Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.

【12】 DNA: Dynamic Network Augmentation 标题:DNA:动态网络增强 链接:https://arxiv.org/abs/2112.09277

作者:Scott Mahan,Tim Doster,Henry Kvinge 机构:University of California, San Diego, Pacific Northwest National Laboratory 摘要:在许多分类问题中，我们需要一个对一系列非语义转换具有鲁棒性的分类器。例如，无论狗出现的方向和姿势如何，人类都可以在图片中识别出它。大量证据表明，这种不变性可以显著提高机器学习模型的准确性和泛化能力。教授模型几何不变性的一种常用技术是使用变换后的输入增加训练数据。然而，对于给定的分类任务，需要哪些不变性并不总是已知的。确定有效的数据扩充策略可能需要领域专业知识或广泛的数据预处理。最近的工作（如AutoAugment）通过在数据扩充策略的参数化搜索空间上进行优化来自动化扩充过程。虽然AutoAugment和类似方法在几种常见数据集上实现了最先进的分类精度，但它们仅限于学习一种数据扩充策略。通常情况下，不同的类或特征需要不同的几何不变性。我们引入动态网络扩充（DNA），它学习输入条件扩充策略。我们模型中的增广参数是神经网络的输出，并且随着网络权值的更新而隐式学习。我们的模型允许动态扩充策略，并且在具有输入特征条件的几何变换的数据上表现良好。 摘要:In many classification problems, we want a classifier that is robust to a range of non-semantic transformations. For example, a human can identify a dog in a picture regardless of the orientation and pose in which it appears. There is substantial evidence that this kind of invariance can significantly improve the accuracy and generalization of machine learning models. A common technique to teach a model geometric invariances is to augment training data with transformed inputs. However, which invariances are desired for a given classification task is not always known. Determining an effective data augmentation policy can require domain expertise or extensive data pre-processing. Recent efforts like AutoAugment optimize over a parameterized search space of data augmentation policies to automate the augmentation process. While AutoAugment and similar methods achieve state-of-the-art classification accuracy on several common datasets, they are limited to learning one data augmentation policy. Often times different classes or features call for different geometric invariances. We introduce Dynamic Network Augmentation (DNA), which learns input-conditional augmentation policies. Augmentation parameters in our model are outputs of a neural network and are implicitly learned as the network weights are updated. Our model allows for dynamic augmentation policies and performs well on data with geometric transformations conditional on input features.

【13】 Automated Deep Learning: Neural Architecture Search Is Not the End 标题:自动化深度学习:神经架构搜索不是终点 链接:https://arxiv.org/abs/2112.09245

作者:Xuanyi Dong,David Jacob Kedziora,Katarzyna Musial,Bogdan Gabrys 机构: University of Technology Sydney 备注:65 pages, 9 tables, 4 figures 摘要:深度学习（DL）已被证明是在不同环境下开发模型的一种非常有效的方法，包括视觉感知、语音识别和机器翻译。然而，应用DL的端到端过程并不简单。它需要处理问题表述和上下文理解、数据工程、模型开发、部署、持续监控和维护等问题。此外，这些步骤中的每一步在知识和互动方面都严重依赖于人类，这阻碍了DL的进一步发展和民主化。因此，为了应对这些问题，在过去几年中出现了一个新的领域：自动化深度学习（AutoDL）。这一努力旨在尽量减少人类参与的需要，并以其在神经架构搜索（NAS）方面的成就而闻名，NAS是几项调查的焦点。尽管如此，NAS并不是AutoDL的全部。因此，本综述采用了一个总体的观点，检查了在整个原型DL工作流中对自动化的研究工作。在这样做的过程中，这项工作还提出了一套全面的十项标准，用于评估个人出版物和更广泛研究领域的现有工作。这些标准包括：新颖性、解决方案质量、效率、稳定性、可解释性、再现性、工程质量、可扩展性、可推广性和生态友好性。因此，最终，本综述提供了2020年代初期AutoDL的评估性概述，确定了未来可能存在的进展机会。 摘要:Deep learning (DL) has proven to be a highly effective approach for developing models in diverse contexts, including visual perception, speech recognition, and machine translation. However, the end-to-end process for applying DL is not trivial. It requires grappling with problem formulation and context understanding, data engineering, model development, deployment, continuous monitoring and maintenance, and so on. Moreover, each of these steps typically relies heavily on humans, in terms of both knowledge and interactions, which impedes the further advancement and democratization of DL. Consequently, in response to these issues, a new field has emerged over the last few years: automated deep learning (AutoDL). This endeavor seeks to minimize the need for human involvement and is best known for its achievements in neural architecture search (NAS), a topic that has been the focus of several surveys. That stated, NAS is not the be-all and end-all of AutoDL. Accordingly, this review adopts an overarching perspective, examining research efforts into automation across the entirety of an archetypal DL workflow. In so doing, this work also proposes a comprehensive set of ten criteria by which to assess existing work in both individual publications and broader research areas. These criteria are: novelty, solution quality, efficiency, stability, interpretability, reproducibility, engineering quality, scalability, generalizability, and eco-friendliness. Thus, ultimately, this review provides an evaluative overview of AutoDL in the early 2020s, identifying where future opportunities for progress may exist.

【14】 Approximation of functions with one-bit neural networks Link: https://arxiv.org/abs/2112.09181

Authors: C. Sinan Güntürk, Weilin Li Comments: 35 pages, 5 figures Abstract: This paper examines the approximation capabilities of coarsely quantized neural networks -- those whose parameters are selected from a small set of allowable values. We show that any smooth multivariate function can be arbitrarily well approximated by an appropriate coarsely quantized neural network and provide a quantitative approximation rate. For the quadratic activation, this can be done with only a one-bit alphabet; for the ReLU activation, we use a three-bit alphabet. The main theorems rely on important properties of Bernstein polynomials. We prove new results on approximation of functions with Bernstein polynomials, noise-shaping quantization on the Bernstein basis, and implementation of the Bernstein polynomials by coarsely quantized neural networks.
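
Since the main theorems lean on Bernstein polynomials, it may help to recall the standard definition; the display below is the textbook degree-$n$ Bernstein approximant on $[0,1]$, stated for context rather than taken from the paper.

```latex
% Textbook definition, stated for context (not the paper's refined rates):
% the degree-n Bernstein approximant of f on [0,1].
\[
  B_n(f)(x) \;=\; \sum_{k=0}^{n} f\!\left(\tfrac{k}{n}\right)
  \binom{n}{k}\, x^{k} (1-x)^{n-k}, \qquad x \in [0,1],
\]
% with B_n(f) -> f uniformly for every continuous f (Bernstein's theorem).
```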

【15】 Effective prevention of semantic drift as angular distance in memory-less continual deep neural networks Link: https://arxiv.org/abs/2112.09175

Authors: Khouloud Saadi, Muhammad Taimoor Khan Affiliations: Independent Researcher; Department of Computer Science, National University of Computer and Emerging Sciences, Pakistan Comments: 5 pages, 3 figures Abstract: Lifelong machine learning or continual learning models attempt to learn incrementally by accumulating knowledge across a sequence of tasks. Therefore, these models learn better and faster. They are used in various intelligent systems that have to interact with humans or any dynamic environment, e.g., chatbots and self-driving cars. The memory-less approach is more often used with deep neural networks, which accommodate incoming information from tasks within their architecture. It allows them to perform well on all the seen tasks. These models suffer from semantic drift, or the plasticity-stability dilemma. The existing models use Minkowski distance measures to decide which nodes to freeze, update or duplicate. These distance metrics do not provide good separation of nodes, as they are susceptible to high-dimensional sparse vectors. In our proposed approach, we use angular distance to evaluate the semantic drift in individual nodes, which provides better separation of nodes and thus better balancing between stability and plasticity. The proposed approach outperforms state-of-the-art models by maintaining higher accuracy on standard datasets.
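
As a rough illustration of why an angular measure can separate high-dimensional sparse vectors better than a Minkowski one, the snippet below contrasts Euclidean distance with angular distance on two sparse vectors with disjoint supports; the example is a simplification for intuition, not the authors' freeze/update/duplicate procedure.

```python
# Contrast Minkowski (Euclidean) vs. angular distance on sparse vectors.
# Simplified illustration; the paper's node-level decision rule is richer.
import numpy as np

def angular_distance(u, v):
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi  # normalized to [0, 1]

d = 10_000
u = np.zeros(d); v = np.zeros(d)
u[:5] = 0.1            # two sparse vectors with disjoint supports:
v[5:10] = 0.1          # maximally different directions, tiny magnitudes
print(np.linalg.norm(u - v))     # ~0.316: small Euclidean gap
print(angular_distance(u, v))    # 0.5: orthogonal, a clear angular gap
```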

【16】 Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions Link: https://arxiv.org/abs/2112.09159

Authors: Jonathan M. Goodwill, Nitin Prasad, Brian D. Hoskins, Matthew W. Daniels, Advait Madhavan, Lei Wan, Tiffany S. Santos, Michael Tran, Jordan A. Katine, Patrick M. Braganca, Mark D. Stiles, Jabez J. McClelland Affiliations: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA; Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA Comments: 22 pages plus 8 pages supplemental material; 7 figures plus 7 supplemental figures Abstract: The increasing scale of neural networks and their growing application space have produced demand for more energy- and memory-efficient artificial-intelligence-specific hardware. Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and the inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3% with proper tuning of network parameters in 15 x 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware.

【17】 An Empirical Investigation of the Role of Pre-training in Lifelong Learning Link: https://arxiv.org/abs/2112.09153

Authors: Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell Affiliations: Carnegie Mellon University; Mila - Quebec AI Institute; University of Montreal; École Polytechnique de Montréal; Canada CIFAR AI Chair Comments: 30 pages Abstract: The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning, but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel dataset of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness in order to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach leads to performance comparable to the state-of-the-art in task-sequential continual learning across multiple settings, without retaining a memory that scales in size with the number of tasks.
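
One common way to put a number on "loss basin sharpness" is the SAM-style probe: perturb the weights a small step in the gradient direction and measure the loss increase. The sketch below is one such diagnostic under that reading; it is an assumption for illustration, not necessarily the authors' exact formulation of the joint objective.

```python
# SAM-style sharpness probe: loss increase after a small gradient-direction
# weight perturbation. Illustrative diagnostic only; jointly optimizing this
# with the task loss requires the usual SAM-style two-pass update, and the
# paper's objective may differ in its exact form.
import torch

@torch.no_grad()
def _perturb(params, directions, scale):
    for p, d in zip(params, directions):
        p.add_(scale * d)

def sharpness(model, loss_fn, x, y, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    directions = [g / norm for g in grads]
    _perturb(params, directions, rho)          # climb toward the basin's rim
    with torch.no_grad():
        perturbed = loss_fn(model(x), y)
    _perturb(params, directions, -rho)         # undo the perturbation
    return (perturbed - loss).item()           # larger value -> sharper basin
```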

【18】 A random energy approach to deep learning Link: https://arxiv.org/abs/2112.09420

Authors: Rongrong Xie, Matteo Marsili Affiliations: Key Laboratory of Quark and Lepton Physics (MOE) and Institute of Particle Physics, Central China Normal University (CCNU), Wuhan, China; Quantitative Life Sciences Section, The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy Comments: 16 pages, 4 figures Abstract: We study a generic ensemble of deep belief networks which is parametrized by the distribution of energy levels of the hidden states of each layer. We show that, within a random energy approach, statistical dependence can propagate from the visible to deep layers only if each layer is tuned close to the critical point during learning. As a consequence, efficiently trained learning machines are characterised by a broad distribution of energy levels. The analysis of Deep Belief Networks and Restricted Boltzmann Machines on different datasets confirms these conclusions.

Others (19 papers)

【1】 An Online Data-Driven Emergency-Response Method for Autonomous Agents in Unforeseen Situations Link: https://arxiv.org/abs/2112.09670

Authors: Glenn Maguire, Nicholas Ketz, Praveen Pilly, Jean-Baptiste Mouret Affiliations: Inria, CNRS, Université de Lorraine; Center for Human-Machine Collaboration, Information and Systems Sciences Laboratory, HRL Laboratories Abstract: Reinforcement learning agents perform well when presented with inputs within the distribution of those encountered during training. However, they are unable to respond effectively when faced with novel, out-of-distribution events, until they have undergone additional training. This paper presents an online, data-driven, emergency-response method that aims to provide autonomous agents the ability to react to unexpected situations that are very different from those it has been trained or designed to address. In such situations, learned policies cannot be expected to perform appropriately since the observations obtained in these novel situations would fall outside the distribution of inputs that the agent has been optimized to handle. The proposed approach devises a customized response to the unforeseen situation sequentially, by selecting actions that minimize the rate of increase of the reconstruction error from a variational auto-encoder. This optimization is achieved online in a data-efficient manner (on the order of 30 data-points) using a modified Bayesian optimization procedure. We demonstrate the potential of this approach in a simulated 3D car driving scenario, in which the agent devises a response in under 2 seconds to avoid collisions with objects it has not seen during training.
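
The action-selection loop can be pictured as follows. Here `vae_recon_error`, `predict_next_obs`, and `candidate_actions` are hypothetical hooks, and the greedy enumeration stands in for the paper's modified Bayesian optimization over actions, which this sketch does not reproduce.

```python
# Greedy sketch of emergency response: pick the action whose predicted next
# observation minimizes the rate of increase of VAE reconstruction error.
# All hooks below are hypothetical; the paper optimizes actions with a
# modified, data-efficient Bayesian optimization rather than this loop.
import numpy as np

def select_emergency_action(obs, prev_error, candidate_actions,
                            predict_next_obs, vae_recon_error):
    best_action, best_rate = None, np.inf
    for a in candidate_actions:
        next_obs = predict_next_obs(obs, a)            # e.g., a learned dynamics model
        rate = vae_recon_error(next_obs) - prev_error  # increase of the novelty signal
        if rate < best_rate:
            best_action, best_rate = a, rate
    return best_action
```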

【2】 Oil Spill SAR Image Segmentation via Probability Distribution Modelling Link: https://arxiv.org/abs/2112.09638

Authors: Fang Chen, Aihua Zhang, Heiko Balzter, Peng Ren, Huiyu Zhou Affiliations: Beijing Normal University-Hong Kong Baptist University UIC Abstract: Segmentation of marine oil spills in Synthetic Aperture Radar (SAR) images is a challenging task because of the complexity and irregularities in SAR images. In this work, we aim to develop an effective segmentation method which addresses marine oil spill identification in SAR images by investigating the distribution representation of SAR images. To seek effective oil spill segmentation, we revisit the SAR imaging mechanism in order to attain the probability distribution representation of oil spill SAR images, in which the characteristics of SAR images are properly modelled. We then exploit the distribution representation to formulate the segmentation energy functional, by which oil spill characteristics are incorporated to guide oil spill segmentation. Moreover, the oil spill segmentation model contains the oil spill contour regularisation term and the updated level set regularisation term which enhance the representational power of the segmentation energy functional. Benefiting from the synchronisation of SAR image representation and oil spill segmentation, our proposed method establishes an effective oil spill segmentation framework. Experimental evaluations demonstrate the effectiveness of our proposed segmentation framework for different types of marine oil spill SAR image segmentation.

【3】 Sublinear Time Approximation of Text Similarity Matrices Link: https://arxiv.org/abs/2112.09631

Authors: Archan Ray, Nicholas Monath, Andrew McCallum, Cameron Musco Affiliations: University of Massachusetts Amherst Comments: 25 pages, 10 figures Abstract: We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
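
A bare-bones Nyström approximation looks like the following; using a pseudoinverse of the landmark block is one standard way to keep the construction well-defined for indefinite matrices, though the paper's actual generalization and its error analysis go further than this sketch.

```python
# Bare-bones Nystrom low-rank approximation of a similarity matrix.
# Sampling s landmark columns gives K_hat = C @ pinv(W) @ C.T with only
# O(n*s) exact similarity evaluations. The pseudoinverse keeps this defined
# for indefinite (non-PSD) matrices; the paper's method and guarantees go further.
import numpy as np

def nystrom(K_column, n, s, rng):
    """K_column(j) returns column j of the n x n similarity matrix."""
    idx = rng.choice(n, size=s, replace=False)
    C = np.column_stack([K_column(j) for j in idx])   # n x s exact similarities
    W = C[idx, :]                                      # s x s landmark block
    return C @ np.linalg.pinv(W) @ C.T                 # rank-<=s approximation

# Toy check on a similarity matrix that is indefinite in general.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
K = np.tanh(X @ X.T)
K_hat = nystrom(lambda j: K[:, j], n=200, s=40, rng=rng)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))   # relative error
```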

【4】 A Multimodal Approach for Automatic Mania Assessment in Bipolar Disorder Link: https://arxiv.org/abs/2112.09467

Authors: Pınar Baki Affiliations: B.S., Computer Engineering, Boğaziçi University; submitted to the Institute for Graduate Studies in Science and Engineering in partial fulfillment of the requirements for the degree of Master of Science, Graduate Program in Computer Engineering Abstract: Bipolar disorder is a mental health disorder that causes mood swings that range from depression to mania. Diagnosis of bipolar disorder is usually done based on patient interviews, and reports obtained from the caregivers of the patients. Subsequently, the diagnosis depends on the experience of the expert, and it is possible to have confusions of the disorder with other mental disorders. Automated processes in the diagnosis of bipolar disorder can help providing quantitative indicators, and allow easier observations of the patients for longer periods. Furthermore, the need for remote treatment and diagnosis became especially important during the COVID-19 pandemic. In this thesis, we create a multimodal decision system based on recordings of the patient in acoustic, linguistic, and visual modalities. The system is trained on the Bipolar Disorder corpus. Comprehensive analysis of unimodal and multimodal systems, as well as various fusion techniques are performed. Besides processing entire patient sessions using unimodal features, a task-level investigation of the clips is studied. Using acoustic, linguistic, and visual features in a multimodal fusion system, we achieved a 64.8% unweighted average recall score, which improves the state-of-the-art performance achieved on this dataset.

【5】 Adaptively Customizing Activation Functions for Various Layers Link: https://arxiv.org/abs/2112.09442

Authors: Haigen Hu, Aizhu Liu, Qiu Guan, Xiaoxin Li, Shengyong Chen, Qianwei Zhou Affiliations: College of Computer Science and Technology, Zhejiang University of Technology; Key Laboratory of Visual Media Intelligent Processing Technology of Zhejiang Province; School of Computer Science and Engineering, Tianjin University of Technology Abstract: To enhance the nonlinearity of neural networks and increase their mapping abilities between the inputs and response variables, activation functions play a crucial role to model more complex relationships and patterns in the data. In this work, a novel methodology is proposed to adaptively customize activation functions only by adding very few parameters to the traditional activation functions such as Sigmoid, Tanh, and ReLU. To verify the effectiveness of the proposed methodology, some theoretical and experimental analysis on accelerating the convergence and improving the performance is presented, and a series of experiments are conducted based on various network models (such as AlexNet, VGGNet, GoogLeNet, ResNet and DenseNet), and various datasets (such as CIFAR10, CIFAR100, miniImageNet, PASCAL VOC and COCO). To further verify the validity and suitability in various optimization strategies and usage scenarios, some comparison experiments are also implemented among different optimization strategies (such as SGD, Momentum, AdaGrad, AdaDelta and ADAM) and different recognition tasks like classification and detection. The results show that the proposed methodology is very simple but with significant performance in convergence speed, precision and generalization, and it can surpass other popular methods like ReLU and adaptive functions like Swish in almost all experiments in terms of overall performance. The code is publicly available at https://github.com/HuHaigen/Adaptively-Customizing-Activation-Functions. The package includes the proposed three adaptive activation functions for reproducibility purposes.
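
One way to picture "very few added parameters" is a learnable scale and shift wrapped around a standard activation, as in the PyTorch sketch below; the exact parameterization here is an assumption for illustration, not the paper's definitions (which are given in the linked repository).

```python
# Sketch of an adaptive activation: a standard ReLU augmented with two
# learnable scalars per layer. The parameterization is an illustrative
# assumption; the paper defines its own few-parameter extensions of
# Sigmoid, Tanh, and ReLU.
import torch
import torch.nn as nn

class AdaptiveReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # learnable output scale
        self.beta = nn.Parameter(torch.zeros(1))   # learnable input shift

    def forward(self, x):
        return self.alpha * torch.relu(x + self.beta)

layer = nn.Sequential(nn.Linear(16, 32), AdaptiveReLU())
out = layer(torch.randn(8, 16))
print(out.shape)  # torch.Size([8, 32]); alpha and beta train with the weights
```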

【6】 ML Supported Predictions for SAT Solvers Performance Link: https://arxiv.org/abs/2112.09438

Authors: A.-M. Leventi-Peetz, Jörg-Volker Peetz, Martina Rohde Affiliations: Federal Office for Information Security, Bonn, Germany Abstract: In order to classify the indeterministic termination behavior of the open source SAT solver CryptoMiniSat in multi-threading mode while processing hard-to-solve boolean satisfiability problem instances, internal solver runtime parameters have been collected and analyzed. A subset of these parameters has been selected and employed as a feature vector to successfully create a machine learning model for the binary classification of the solver's termination behavior with any single new solving run of a not yet solved instance. The model can be used for the early estimation of a solving attempt as belonging or not belonging to the class of candidates with good chances for a fast termination. In this context, a combination of active profiles of runtime characteristics appears to mirror the influence of the solver's momentary heuristics on the immediate quality of the solver's resolution process. Because runtime parameters of already the first two solving iterations are enough to forecast termination of the attempt with good success scores, the results of the present work deliver a promising basis which can be further developed in order to enrich CryptoMiniSat or generally any modern SAT solver with AI abilities.
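
In spirit, the model is a binary classifier over early-iteration runtime features; a minimal sketch follows. The feature names and synthetic data are invented placeholders, not CryptoMiniSat's actual internal parameters, and the classifier choice is an assumption.

```python
# Binary classification of solver termination behavior from early runtime
# features. Feature names and data are invented placeholders; the paper uses
# internal CryptoMiniSat runtime parameters from the first two iterations.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# columns (hypothetical): propagation_rate, conflict_rate, restarts, learnt_len
X = rng.random((500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500) > 0.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("fast-termination accuracy:", clf.score(X_te, y_te))
```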

【7】 Privacy preserving n-party scalar product protocol Link: https://arxiv.org/abs/2112.09436

Authors: Florian van Daalen, Inigo Bermejo, Lianne Ippel, Andre Dekkers Affiliations: GROW School for Oncology and Developmental Biology, Maastricht University Medical Centre, the Netherlands; Heerlen Abstract: Privacy-preserving machine learning enables the training of models on decentralized datasets without the need to reveal the data, on both horizontally and vertically partitioned data. However, it relies on specialized techniques and algorithms to perform the necessary computations. The privacy-preserving scalar product protocol, which enables the dot product of vectors without revealing them, is one popular example thanks to its versatility. Unfortunately, the solutions currently proposed in the literature focus mainly on two-party scenarios, even though scenarios with a higher number of data parties are becoming more relevant, for example when performing analyses that require counting the number of samples which fulfill criteria defined across various sites, such as calculating the information gain at a node in a decision tree. In this paper we propose a generalization of the protocol for an arbitrary number of parties, based on an existing two-party method. Our proposed solution relies on a recursive resolution of smaller scalar products. After describing our proposed method, we discuss potential scalability issues. Finally, we describe the privacy guarantees and identify any concerns, as well as comparing the proposed method to the original solution in this aspect.
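
For intuition about the two-party primitive such constructions build upon, the sketch below shows a classic commodity-server (Du-Atallah style) scalar product, written over floats for readability. This is a standard textbook construction offered as context; it is not necessarily the specific two-party method the paper extends, and real protocols operate over a finite field.

```python
# Two-party secure scalar product in the commodity-server model (Du-Atallah
# style), over floats for readability; production protocols use a finite
# field. Illustrates the two-party primitive, not the paper's n-party scheme.
import numpy as np

rng = np.random.default_rng(0)
d = 6
x = rng.random(d)   # Alice's private vector
y = rng.random(d)   # Bob's private vector

# Commodity server: correlated randomness, never sees x or y.
ra, rb = rng.random(d), rng.random(d)
s_a = rng.random()
s_b = ra @ rb - s_a            # so that s_a + s_b = ra . rb

# One round of masked exchange.
x_masked = x + ra              # Alice -> Bob
y_masked = y + rb              # Bob -> Alice

u_b = x_masked @ y + s_b       # Bob's additive share of the result
u_a = -(ra @ y_masked) + s_a   # Alice's additive share of the result

assert np.isclose(u_a + u_b, x @ y)   # shares reconstruct the dot product
print(u_a + u_b, x @ y)
```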

【8】 Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem Link: https://arxiv.org/abs/2112.09382

Authors: Jing Shi, Xuankai Chang, Tomoki Hayashi, Yen-Ju Lu, Shinji Watanabe, Bo Xu Affiliations: Institute of Automation, Chinese Academy of Sciences (CASIA); Carnegie Mellon University; Nagoya University; Academia Sinica; Human Dataware Lab. Co., Ltd. Comments: 5 pages, this https URL Abstract: Deep learning based models have significantly improved the performance of speech separation with input mixtures like the cocktail party. Prominent methods (e.g., frequency-domain and time-domain speech separation) usually build regression models to predict the ground-truth speech from the mixture, using the masking-based design and the signal-level loss criterion (e.g., MSE or SI-SNR). This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem, with great flexibility and strong potential. Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols, and convert the paradigm of the speech separation/enhancement related tasks from regression to classification. By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized. Evaluation results based on the WSJ0-2mix and VCTK-noisy corpora in various settings show that our proposed method can steadily synthesize the separated speech with high speech quality and without any interference, which is difficult to avoid in regression-based methods. In addition, with negligible loss of listening quality, the speaker conversion of enhanced/separated speech could be easily realized through our method.

【9】 Balancing Fairness and Robustness via Partial Invariance Link: https://arxiv.org/abs/2112.09346

Authors: Moulik Choraria, Ibtihal Ferwana, Ankur Mani, Lav R. Varshney Affiliations: University of Illinois at Urbana-Champaign; University of Minnesota, Twin Cities Abstract: The Invariant Risk Minimization (IRM) framework aims to learn invariant features from a set of environments for solving the out-of-distribution (OOD) generalization problem. The underlying assumption is that the causal components of the data generating distributions remain constant across the environments or alternately, the data "overlaps" across environments to find meaningful invariant features. Consequently, when the "overlap" assumption does not hold, the set of truly invariant features may not be sufficient for optimal prediction performance. Such cases arise naturally in networked settings and hierarchical data-generating models, wherein the IRM performance becomes suboptimal. To mitigate this failure case, we argue for a partial invariance framework. The key idea is to introduce flexibility into the IRM framework by partitioning the environments based on hierarchical differences, while enforcing invariance locally within the partitions. We motivate this framework in classification settings with causal distribution shifts across environments. Our results show the capability of the partial invariant risk minimization to alleviate the trade-off between fairness and risk in certain settings.
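
For readers unfamiliar with IRM, the common IRMv1 surrogate (from the original IRM paper by Arjovsky et al.) penalizes the gradient of each environment's risk with respect to a dummy scale on the logits; under the partial-invariance idea, such a penalty would be enforced only within environment partitions. The sketch below shows the standard penalty with the partitioning left as a placeholder assumption.

```python
# IRMv1-style invariance penalty: squared gradient of the per-environment
# risk w.r.t. a dummy scale multiplying the logits (Arjovsky et al.).
# The partition loop is a placeholder reading of "invariance locally within
# the partitions", not the paper's exact objective.
import torch
import torch.nn.functional as F

def irm_penalty(logits, labels):
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    (grad,) = torch.autograd.grad(loss, scale, create_graph=True)
    return (grad ** 2).sum()

def partial_invariance_loss(env_batches, model, partitions):
    # env_batches: hypothetical dict env -> {"x": inputs, "y": float 0/1 labels}
    total = 0.0
    for part in partitions:                  # enforce invariance locally
        for env in part:
            logits = model(env_batches[env]["x"]).squeeze(-1)
            y = env_batches[env]["y"]
            total = total + F.binary_cross_entropy_with_logits(logits, y) \
                          + irm_penalty(logits, y)
    return total
```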

【10】 Personalized On-Device E-health Analytics with Decentralized Block Coordinate Descent Link: https://arxiv.org/abs/2112.09341

Authors: Guanhua Ye, Hongzhi Yin, Tong Chen, Miao Xu, Quoc Viet Hung Nguyen, Jiangning Song Abstract: Driven by the growing attention to personal healthcare and by the pandemic, the popularity of E-health is proliferating. Nowadays, enhancement on medical diagnosis via machine learning models has been highly effective in many aspects of e-health analytics. Nevertheless, in the classic cloud-based/centralized e-health paradigms, all the data will be centrally stored on the server to facilitate model training, which inevitably incurs privacy concerns and high time delay. Distributed solutions like Decentralized Stochastic Gradient Descent (D-SGD) are proposed to provide safe and timely diagnostic results based on personal devices. However, methods like D-SGD are subject to the gradient vanishing issue and usually proceed slowly at the early training stage, thereby impeding the effectiveness and efficiency of training. In addition, existing methods are prone to learning models that are biased towards users with dense data, compromising the fairness when providing E-health analytics for minority groups. In this paper, we propose a Decentralized Block Coordinate Descent (D-BCD) learning framework that can better optimize deep neural network-based models distributed on decentralized devices for E-health analytics. Benchmarking experiments on three real-world datasets illustrate the effectiveness and practicality of our proposed D-BCD, where additional simulation study showcases the strong applicability of D-BCD in real-life E-health scenarios.
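
The underlying optimization pattern, block coordinate descent, updates one block of parameters at a time while the rest stay fixed. The centralized, quadratic toy below only shows that update pattern; the paper's D-BCD runs the idea decentralized across devices on deep models.

```python
# Generic block coordinate descent on a least-squares objective: cycle over
# parameter blocks, taking a gradient step on the active block while the
# others stay frozen. Centralized quadratic toy; the paper's D-BCD applies
# the pattern to deep models distributed across devices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
b = rng.normal(size=100)
w = np.zeros(20)
blocks = [range(0, 10), range(10, 20)]        # two coordinate blocks

lr = 1.0 / np.linalg.norm(A, 2) ** 2          # safe step size
for it in range(400):
    blk = list(blocks[it % len(blocks)])
    residual = A @ w - b
    w[blk] -= lr * (A[:, blk].T @ residual)   # step on the active block only
print("objective:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```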

【11】 WebGPT: Browser-assisted question-answering with human feedback Link: https://arxiv.org/abs/2112.09332

Authors: Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman Affiliations: OpenAI Comments: 30 pages Abstract: We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
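
Rejection sampling (best-of-n) against a reward model is simple enough to state in a few lines; `policy.sample` and `reward_model.score` below are hypothetical interfaces standing in for the behavior-cloned GPT-3 policy and the trained preference model.

```python
# Best-of-n rejection sampling against a reward model. `policy` and
# `reward_model` are hypothetical interfaces standing in for the
# behavior-cloned policy and the human-preference reward model.
def best_of_n(question, policy, reward_model, n=16):
    candidates = [policy.sample(question) for _ in range(n)]
    return max(candidates, key=lambda ans: reward_model.score(question, ans))
```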

【12】 Gaussian RBF Centered Kernel Alignment (CKA) in the Large Bandwidth Limit Link: https://arxiv.org/abs/2112.09305

Authors: Sergio A. Alvarez Affiliations: Department of Computer Science, Boston College, Chestnut Hill, MA, USA Comments: 11 pages, 3 figures Abstract: We prove that Centered Kernel Alignment (CKA) based on a Gaussian RBF kernel converges to linear CKA in the large-bandwidth limit. We show that convergence onset is sensitive to the geometry of the feature representations, and that representation eccentricity bounds the range of bandwidths for which Gaussian CKA behaves nonlinearly.
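
The claimed limit is easy to probe numerically with the standard centered-Gram CKA formula; the check below is an illustration of the statement, not the paper's proof.

```python
# Numerical probe of the limit: Gaussian-RBF CKA should approach linear CKA
# as the bandwidth sigma grows. Uses the standard centered-Gram CKA formula;
# this check is an illustration, not the paper's proof.
import numpy as np

def center(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    Kc, Lc = center(K), center(L)
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def rbf_gram(X, sigma):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(50, 8)), rng.normal(size=(50, 8))
linear = cka(X @ X.T, Y @ Y.T)
for sigma in (1.0, 10.0, 100.0):
    print(sigma, cka(rbf_gram(X, sigma), rbf_gram(Y, sigma)), "vs linear", linear)
```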

【13】 PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision Link: https://arxiv.org/abs/2112.09290

Authors: Salehe Erfanian Ebadi, You-Cyuan Jhang, Alex Zook, Saurav Dhakad, Adam Crespi, Pete Parisi, Steven Borkman, Jonathan Hogins, Sujoy Ganguly Affiliations: Unity Technologies Comments: PeopleSansPeople template Unity environment, benchmark binaries, and source code are available at: this https URL Abstract: In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creation of synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator PeopleSansPeople which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network using synthetic data and fine-tuning on target real-world data (few-shot transfer to limited subsets of COCO-person train [2]) resulted in a keypoint AP of $60.37 \pm 0.48$ (COCO test-dev2017) outperforming models trained with the same real data alone (keypoint AP of $55.80$) and pre-trained with ImageNet (keypoint AP of $57.50$). This freely-available data generator should enable a wide range of research into the emerging field of simulation to real transfer learning in the critical area of human-centric computer vision.

【14】 Sim2Real Docs: Domain Randomization for Documents in Natural Scenes using Ray-traced Rendering Link: https://arxiv.org/abs/2112.09220

Authors: Nikhil Maddikunta, Huijun Zhao, Sumit Keswani, Alfy Samuel, Fu-Ming Guo, Nishan Srishankar, Vishwa Pardeshi, Austin Huang Affiliations: Fidelity Investments, Artificial Intelligence Center of Excellence Comments: Accepted to the NeurIPS 2021 Data-Centric AI (DCAI) Workshop Abstract: In the past, computer vision systems for digitized documents could rely on systematically captured, high-quality scans. Today, transactions involving digital documents are more likely to start as mobile phone photo uploads taken by non-professionals. As such, computer vision for document automation must now account for documents captured in natural scene contexts. An additional challenge is that task objectives for document processing can be highly use-case specific, which makes publicly-available datasets limited in their utility, while manual data labeling is also costly and poorly translates between use cases. To address these issues we created Sim2Real Docs - a framework for synthesizing datasets and performing domain randomization of documents in natural scenes. Sim2Real Docs enables programmatic 3D rendering of documents using Blender, an open source tool for 3D modeling and ray-traced rendering. By using rendering that simulates physical interactions of light, geometry, camera, and background, we synthesize datasets of documents in a natural scene context. Each render is paired with use-case specific ground truth data specifying latent characteristics of interest, producing unlimited fit-for-task training data. The role of machine learning models is then to solve the inverse problem posed by the rendering pipeline. Such models can be further iterated upon with real-world data by either fine tuning or making adjustments to domain randomization parameters.

【15】 Sparse Coding with Multi-Layer Decoders using Variance Regularization Link: https://arxiv.org/abs/2112.09214

Authors: Katrina Evtimova, Yann LeCun Affiliations: Center for Data Science, New York University; Courant Institute, New York University; Meta AI Research Abstract: Sparse coding with an $l_1$ penalty and a learned linear dictionary requires regularization of the dictionary to prevent a collapse in the $l_1$ norms of the codes. Typically, this regularization entails bounding the Euclidean norms of the dictionary's elements. In this work, we propose a novel sparse coding protocol which prevents a collapse in the codes without the need to regularize the decoder. Our method regularizes the codes directly so that each latent code component has variance greater than a fixed threshold over a set of sparse representations for a given set of inputs. Furthermore, we explore ways to effectively train sparse coding systems with multi-layer decoders since they can model more complex relationships than linear dictionaries. In our experiments with MNIST and natural image patches, we show that decoders learned with our approach have interpretable features both in the linear and multi-layer case. Moreover, we show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations when compared to autoencoders with linear dictionaries. Additionally, sparse representations obtained with our variance regularization approach are useful in the downstream tasks of denoising and classification in the low-data regime.
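
The variance constraint can be imposed as a hinge penalty on the per-component standard deviation of the codes across a batch; the sketch below is one plausible form under that reading, not necessarily the paper's exact regularization term.

```python
# Hinge-style variance regularizer on latent codes: push the standard
# deviation of every code component (over a batch of sparse representations)
# above a fixed threshold so the codes cannot collapse. One plausible form
# under the abstract's description, not necessarily the paper's exact term.
import torch

def variance_regularizer(codes, threshold=1.0, eps=1e-4):
    # codes: (batch, latent_dim) sparse representations
    std = torch.sqrt(codes.var(dim=0) + eps)
    return torch.relu(threshold - std).mean()

codes = torch.randn(64, 128) * 0.1   # toy near-collapsed codes
print(variance_regularizer(codes))   # large penalty -> pushes variance back up
```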

【16】 Mitigating the Bias of Centered Objects in Common Datasets Link: https://arxiv.org/abs/2112.09195

Authors: Gergely Szabó, András Horváth Affiliations: Péter Pázmány Catholic University, Faculty of Information Technology and Bionics, Budapest Abstract: Convolutional networks are considered shift invariant, but it was demonstrated that their response may vary according to the exact location of the objects. In this paper we will demonstrate that most commonly investigated datasets have a bias, where objects are over-represented at the center of the image during training. This bias and the boundary condition of these networks can have a significant effect on the performance of these architectures and their accuracy drops significantly as an object approaches the boundary. We will also demonstrate how this effect can be mitigated with data augmentation techniques.
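
A simple mitigation in this spirit is random translation, so that objects also appear near the image borders during training; the snippet is a generic augmentation sketch, not the authors' exact setup.

```python
# Random-translation augmentation so objects are not always centered: shift
# each training image by a random offset, letting objects reach the borders.
# Generic sketch in the spirit of the paper's mitigation, not the authors'
# exact augmentation pipeline.
import numpy as np
from scipy.ndimage import shift

rng = np.random.default_rng(0)

def random_translate(image, max_shift=8):
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return shift(image, (dy, dx), mode="constant", cval=0.0)

img = np.zeros((32, 32)); img[12:20, 12:20] = 1.0   # centered toy object
aug = random_translate(img)
print(np.argwhere(aug > 0.5).mean(axis=0))          # object center moved off-center
```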

【17】 On Optimizing Interventions in Shared Autonomy Link: https://arxiv.org/abs/2112.09169

Authors: Weihao Tan, David Koleczek, Siddhant Pradhan, Nicholas Perello, Vivek Chettiar, Vishal Rohra, Aaslesha Rajaram, Soundararajan Srinivasan, H M Sajjad Hossain, Yash Chandak Affiliations: University of Massachusetts Amherst; Microsoft; MassMutual Data Science Comments: Accepted by AAAI 2022 Abstract: Shared autonomy refers to approaches for enabling an autonomous agent to collaborate with a human with the aim of improving human performance. However, besides improving performance, it may often also be beneficial for the agent to concurrently account for preserving the user's experience or satisfaction with the collaboration. In order to address this additional goal, we examine approaches for improving the user experience by constraining the number of interventions by the autonomous agent. We propose two model-free reinforcement learning methods that can account for both hard and soft constraints on the number of interventions. We show that not only does our method outperform the existing baseline, but also eliminates the need to manually tune a black-box hyperparameter for controlling the level of assistance. We also provide an in-depth analysis of intervention scenarios in order to further illuminate system understanding.

【18】 Correlated Product of Experts for Sparse Gaussian Process Regression Link: https://arxiv.org/abs/2112.09519

Authors: Manuel Schürch, Dario Azzimonti, Alessio Benavoli, Marco Zaffalon Affiliations: Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland; Università della Svizzera italiana (USI), Lugano, Switzerland; University of Limerick (UL), Limerick, Ireland Abstract: Gaussian processes (GPs) are an important tool in machine learning and statistics with applications ranging from social and natural science through engineering. They constitute a powerful kernelized non-parametric method with well-calibrated uncertainty estimates, however, off-the-shelf GP inference procedures are limited to datasets with several thousand data points because of their cubic computational complexity. For this reason, many sparse GPs techniques have been developed over the past years. In this paper, we focus on GP regression tasks and propose a new approach based on aggregating predictions from several local and correlated experts. Thereby, the degree of correlation between the experts can vary between independent up to fully correlated experts. The individual predictions of the experts are aggregated taking into account their correlation resulting in consistent uncertainty estimates. Our method recovers independent Product of Experts, sparse GP and full GP in the limiting cases. The presented framework can deal with a general kernel function and multiple variables, and has a time and space complexity which is linear in the number of experts and data samples, which makes our approach highly scalable. We demonstrate superior performance, in a time vs. accuracy sense, of our proposed method against state-of-the-art GP approximation methods for synthetic as well as several real-world datasets with deterministic and stochastic optimization.
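
As a point of reference, the independent product-of-experts rule that the method recovers in one limiting case combines expert predictions by precision weighting; the sketch below shows that baseline aggregation only, since the correlated generalization that is the paper's contribution is richer than this.

```python
# Independent product-of-experts aggregation of GP predictions: combine
# expert means by precision weighting. This is the limiting case the paper
# recovers; its correlated aggregation is not reproduced here.
import numpy as np

def poe_aggregate(means, variances):
    means, variances = np.asarray(means), np.asarray(variances)
    precisions = 1.0 / variances
    agg_var = 1.0 / precisions.sum(axis=0)            # precisions add
    agg_mean = agg_var * (precisions * means).sum(axis=0)
    return agg_mean, agg_var

# Three experts predicting at the same test point.
mu, var = poe_aggregate(means=[0.9, 1.1, 1.0], variances=[0.2, 0.1, 0.4])
print(mu, var)   # confident experts dominate the combined estimate
```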

【19】 Colloquium: Advances in automation of quantum dot devices control Link: https://arxiv.org/abs/2112.09362

Authors: Justyna P. Zwolak, Jacob M. Taylor Affiliations: National Institute of Standards and Technology, Gaithersburg, Maryland, USA; Joint Quantum Institute; Joint Center for Quantum Information and Computer Science, University of Maryland, College Park Comments: 19 pages, 9 figures Abstract: Arrays of quantum dots (QDs) are a promising candidate system to realize scalable, coupled qubits systems and serve as a fundamental building block for quantum computers. In such semiconductor quantum systems, devices now have tens of individual electrostatic and dynamical voltages that must be carefully set to localize the system into the single-electron regime and to realize good qubit operational performance. The mapping of requisite dot locations and charges to gate voltages presents a challenging classical control problem. With an increasing number of QD qubits, the relevant parameter space grows sufficiently to make heuristic control unfeasible. In recent years, there has been a considerable effort to automate device control that combines script-based algorithms with machine learning (ML) techniques. In this Colloquium, we present a comprehensive overview of the recent progress in the automation of QD device control, with a particular emphasis on silicon- and GaAs-based QDs formed in two-dimensional electron gases. Combining physics-based modeling with modern numerical optimization and ML has proven quite effective in yielding efficient, scalable control. Further integration of theoretical, computational, and experimental efforts with computer science and ML holds tremendous potential in advancing semiconductor and other platforms for quantum computing.

