机器学习学术速递[6.17]

2021-07-02 18:42:41

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!

cs.LG 方向,今日共计138篇

Graph相关(图学习|图神经网络|图优化等)(10篇)

【1】 Data Augmentation for Graph Convolutional Network on Semi-Supervised Classification 标题:面向半监督分类的图卷积网络数据增强

作者:Zhengzheng Tang,Ziyue Qiao,Xuehai Hong,Yang Wang,Fayaz Ali Dharejo,Yuanchun Zhou,Yi Du 备注:16 pages, 6 figures,APWeb-WAIM 2021: The 5th APWeb-WAIM International Joint Conference on Web and Big Data 链接:https://arxiv.org/abs/2106.08848 摘要:数据增强的目的是从原始数据中生成新的、合成的特征,从而获得更好的数据表示,提高下游任务的性能和泛化能力。然而,基于图的模型的数据增强仍然是一个具有挑战性的问题,因为图数据比传统数据更为复杂,它由两类性质不同的特征组成:图拓扑和节点属性。本文在改进半监督节点分类的节点嵌入这一背景下,研究了图卷积网络(GCN)的图数据增强问题。具体地说,我们对原始特征进行基于余弦相似度的交叉操作来创建新的图特征,包括新的节点属性和新的图拓扑,并将它们组合为特定GCN的新成对输入。然后,我们提出一个注意力整合模型,对这些GCN编码的隐藏节点嵌入加权求和,得到最终的节点嵌入。我们还在训练时对这些隐藏节点嵌入施加差异性约束,以确保从不同的特征中捕获非冗余信息。在5个真实数据集上的实验结果表明,该方法比原始GCN模型的分类精度有明显提升(2.5%~84.2%)。 摘要:Data augmentation aims to generate new and synthetic features from the original data, which can identify a better representation of data and improve the performance and generalizability of downstream tasks. However, data augmentation for graph-based models remains a challenging problem, as graph data is more complex than traditional data, which consists of two features with different properties: graph topology and node attributes. In this paper, we study the problem of graph data augmentation for Graph Convolutional Network (GCN) in the context of improving the node embeddings for semi-supervised node classification. Specifically, we conduct cosine similarity based cross operation on the original features to create new graph features, including new node attributes and new graph topologies, and we combine them as new pairwise inputs for specific GCNs. Then, we propose an attentional integrating model to weighted sum the hidden node embeddings encoded by these GCNs into the final node embeddings. We also conduct a disparity constraint on these hidden node embeddings when training to ensure that non-redundant information is captured from different features. Experimental results on five real-world datasets show that our method improves the classification accuracy by a clear margin (2.5%-84.2%) over the original GCN model.
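下面用一个极简的NumPy示意,说明摘要中"基于余弦相似度的交叉操作"如何从原始节点属性构造一条新的图拓扑;其中阈值化的方式与超参数均为假设,并非论文的原始实现:

```python
import numpy as np

def cosine_cross_topology(X, threshold=0.5):
    """基于节点属性的两两余弦相似度构造一条新的图拓扑(邻接矩阵)。
    threshold 为示意用的假设超参数。"""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)  # 行归一化
    S = Xn @ Xn.T                         # 两两余弦相似度
    A_new = (S > threshold).astype(float)
    np.fill_diagonal(A_new, 0.0)          # 去掉自环
    return A_new

X = np.random.rand(6, 4)                  # 6 个节点、4 维属性的玩具数据
print(cosine_cross_topology(X).sum())     # 新拓扑中的(有向计数)边数
```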

【2】 PRASEMap: A Probabilistic Reasoning and Semantic Embedding based Knowledge Graph Alignment System 标题:PRASEMap:一种基于概率推理和语义嵌入的知识图对齐系统

作者:Zhiyuan Qi,Ziheng Zhang,Jiaoyan Chen,Xi Chen,Yefeng Zheng 链接:https://arxiv.org/abs/2106.08801 摘要:知识图谱(KG)对齐的目的是寻找两个KG之间的等价实体和关系(即映射)。现有的方法要么利用基于推理的技术,要么利用基于语义嵌入的技术,但很少有研究探索二者的结合。在这个演示中,我们介绍了PRASEMap,一个无监督的KG对齐系统,它结合概率推理(PR)和语义嵌入(SE)技术迭代地计算映射。PRASEMap支持将各种基于嵌入的KG对齐方法作为其SE模块,并支持便捷的人机交互:用户可以选择将映射标注反馈给系统,以获得更好的结果。该演示通过一个界面友好的独立Web应用程序展示了这些功能。 摘要:Knowledge Graph (KG) alignment aims at finding equivalent entities and relations (i.e., mappings) between two KGs. The existing approaches utilize either reasoning-based or semantic embedding-based techniques, but few studies explore their combination. In this demonstration, we present PRASEMap, an unsupervised KG alignment system that iteratively computes the Mappings with both Probabilistic Reasoning (PR) And Semantic Embedding (SE) techniques. PRASEMap can support various embedding-based KG alignment approaches as the SE module, and enables easy human-computer interaction that additionally provides an option for users to feed the mapping annotations back to the system for better results. The demonstration showcases these features via a stand-alone Web application with user-friendly interfaces.

【3】 Counterfactual Graphs for Explainable Classification of Brain Networks 标题:用于脑网络可解释分类的反事实图

作者:Carlo Abrate,Francesco Bonchi 备注:In proceedings of KDD 2021 链接:https://arxiv.org/abs/2106.08640 摘要:训练能够区分健康大脑和功能失调大脑的图分类器,有助于识别与特定认知表型相关的子结构。然而,仅有预测能力的图分类器对神经科学家的意义有限,因为他们已经拥有大量诊断特定精神障碍的工具;真正重要的是模型的解释,因为它可以提供新的见解和新的假设。在本文中,我们提出用"反事实图"来为任意黑盒图分类器生成局部的事后解释。给定一个图和一个黑盒,反事实图是这样一个图:它与原图有很高的结构相似性,却被黑盒分到不同的类别。我们提出并实证比较了几种反事实图搜索策略。在已知最优反事实的白盒分类器上的实验表明,我们的方法虽然是启发式的,但能产生非常接近最优的反事实。最后,我们展示了如何使用反事实图来构建全局解释,正确刻画不同黑盒分类器的行为,并为神经科学家提供有趣的见解。 摘要:Training graph classifiers able to distinguish between healthy brains and dysfunctional ones can help identify substructures associated with specific cognitive phenotypes. However, the mere predictive power of the graph classifier is of limited interest to the neuroscientists, who have plenty of tools for the diagnosis of specific mental disorders. What matters is the interpretation of the model, as it can provide novel insights and new hypotheses. In this paper we propose counterfactual graphs as a way to produce local post-hoc explanations of any black-box graph classifier. Given a graph and a black-box, a counterfactual is a graph which, while having high structural similarity with the original graph, is classified by the black-box in a different class. We propose and empirically compare several strategies for counterfactual graph search. Our experiments against a white-box classifier with known optimal counterfactual show that our methods, although heuristic, can produce counterfactuals very close to the optimal one. Finally, we show how to use counterfactual graphs to build global explanations correctly capturing the behaviour of different black-box classifiers and providing interesting insights for the neuroscientists.
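下面给出反事实图搜索的一个极简启发式草图(随机翻转边直至黑盒分类结果改变),仅用于说明"反事实图"的定义与搜索问题本身;论文中比较的具体搜索策略请以原文为准:

```python
import random

def counterfactual_search(adj, blackbox, max_flips=50, seed=0):
    """随机翻转边,直到黑盒分类结果改变;返回找到的反事实图或 None。
    blackbox(adj) 返回类别标签;该贪心策略仅为示意假设,非论文算法。"""
    rng = random.Random(seed)
    orig = blackbox(adj)
    cur = [row[:] for row in adj]
    n = len(adj)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(max_flips):
        i, j = rng.choice(pairs)
        cur[i][j] = cur[j][i] = 1 - cur[i][j]   # 翻转一条边
        if blackbox(cur) != orig:
            return cur
    return None

bb = lambda a: sum(map(sum, a)) // 2 % 2        # 玩具黑盒:按边数奇偶分类
g = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(counterfactual_search(g, bb) is not None)  # True
```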

【4】 Adaptive Visibility Graph Neural Network and Its Application in Modulation Classification 标题:自适应可见性图神经网络及其在调制分类中的应用

作者:Qi Xuan,Kunfeng Qiu,Jinchao Zhou,Zhuangzhi Chen,Dongwei Xu,Shilian Zheng,Xiaoniu Yang 链接:https://arxiv.org/abs/2106.08564 摘要:我们的数字世界充满了时间序列和图,它们刻画了许多复杂系统的各个方面。传统上,处理这两种不同类型的数据各有其方法,如递归神经网络(RNN)和图神经网络(GNN);而近年来,时间序列可以通过可见性图(VG)等技术映射为图,使研究人员能够用图算法挖掘时间序列中的知识。这类映射方法在时间序列和图之间架起了一座桥梁,对分析各种真实世界的时间序列具有很大潜力。然而,VG方法及其变种只是基于固定的规则,缺乏灵活性,极大地限制了其在现实中的应用。本文提出了一种自适应可见性图(AVG)算法,能够将时间序列自适应地映射成图;在此基础上,我们以GNN模型DiffPool作为分类器,进一步建立了端到端分类框架AVGNet。随后我们将AVGNet用于无线电信号调制分类这一无线通信领域的重要任务。仿真结果验证了AVGNet优于一系列先进的深度学习方法,在该任务上达到了最先进的性能。 摘要:Our digital world is full of time series and graphs which capture the various aspects of many complex systems. Traditionally, there are respective methods in processing these two different types of data, e.g., Recurrent Neural Network (RNN) and Graph Neural Network (GNN), while in recent years, time series could be mapped to graphs by using the techniques such as Visibility Graph (VG), so that researchers can use graph algorithms to mine the knowledge in time series. Such mapping methods establish a bridge between time series and graphs, and have high potential to facilitate the analysis of various real-world time series. However, the VG method and its variants are just based on fixed rules and thus lack flexibility, largely limiting their application in reality. In this paper, we propose an Adaptive Visibility Graph (AVG) algorithm that can adaptively map time series into graphs, based on which we further establish an end-to-end classification framework AVGNet, by utilizing GNN model DiffPool as the classifier. We then adopt AVGNet for radio signal modulation classification which is an important task in the field of wireless communication. The simulations validate that AVGNet outperforms a series of advanced deep learning methods, achieving the state-of-the-art performance in this task.
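作为参照,下面几行Python实现了经典的(固定规则)自然可见性图映射,即论文中AVG所要推广的基线方法:两个时间点之间若连线不被中间点遮挡,则连一条边。

```python
def natural_visibility_graph(series):
    """经典自然可见性图:节点为时间点,按凸视线规则连边。"""
    n = len(series)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            visible = all(
                series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.append((a, b))
    return edges

print(natural_visibility_graph([1.0, 0.5, 2.0, 0.8, 1.5]))
```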

【5】 Fast Quantum Property Prediction via Deeper 2D and 3D Graph Networks 标题:基于更深层次二维和三维图网络的快速量子性质预测

作者:Meng Liu,Cong Fu,Xuan Zhang,Limei Wang,Yaochen Xie,Hao Yuan,Youzhi Luo,Zhao Xu,Shenglong Xu,Shuiwang Ji 备注:One of the winners of 2021 KDD Cup on OGB Large-Scale Challenge 链接:https://arxiv.org/abs/2106.08551 摘要:分子性质预测因其应用广泛而受到越来越多的关注。其中一个特别有趣和重要的任务,是在没有三维平衡结构的情况下预测量子化学性质。这在实践中很有价值,因为获得三维平衡结构的计算代价极其高昂。在这项工作中,我们设计了一个深度图神经网络,直接从二维分子图学习来预测量子性质。此外,我们还提出了一种三维图神经网络,用于从低成本的构象(conformer)集合中学习,这些构象可以用开源工具以低廉的代价获得。我们利用这些方法参加了2021年KDD杯OGB大规模挑战赛(OGB-LSC),该比赛旨在预测分子的HOMO-LUMO能隙。最终评估结果显示,我们以在保留测试集上0.1235的平均绝对误差成为获胜队伍之一。我们的实现已作为MoleculeX包的一部分发布(https://github.com/divelab/MoleculeX)。 摘要:Molecular property prediction is gaining increasing attention due to its diverse applications. One task of particular interest and importance is to predict quantum chemical properties without 3D equilibrium structures. This is practically favorable since obtaining 3D equilibrium structures requires extremely expensive calculations. In this work, we design a deep graph neural network to predict quantum properties by directly learning from 2D molecular graphs. In addition, we propose a 3D graph neural network to learn from low-cost conformer sets, which can be obtained with open-source tools using an affordable budget. We employ our methods to participate in the 2021 KDD Cup on OGB Large-Scale Challenge (OGB-LSC), which aims to predict the HOMO-LUMO energy gap of molecules. Final evaluation results reveal that we are one of the winners with a mean absolute error of 0.1235 on the holdout test set. Our implementation is available as part of the MoleculeX package (https://github.com/divelab/MoleculeX).

【6】 Distilling Self-Knowledge From Contrastive Links to Classify Graph Nodes Without Passing Messages 标题:从对比链接中提取自知识实现不传递消息的图节点分类

作者:Yi Luo,Aiguo Chen,Ke Yan,Ling Tian 备注:9 pages, 2 figures, 4 tables 链接:https://arxiv.org/abs/2106.08541 摘要:目前,基于消息传递范式的图神经网络(GNN)已成为在图数据上学习的主流方法。该范式下的模型必须花费额外的空间存储邻接矩阵以查找相邻节点,并花费额外的时间聚合来自相邻节点的多条消息。为了解决这个问题,我们开发了一种称为LinkDist的方法,它将相连节点对中的自知识蒸馏到一个多层感知机(MLP)中,而无需聚合消息。在8个真实数据集上的实验表明,由LinkDist导出的MLP无需知道节点的邻接关系即可预测其标签,且在半监督和全监督节点分类场景下达到与GNN相当的精度。此外,得益于其非消息传递范式,LinkDist还可以以对比的方式从任意采样的节点对中蒸馏自知识,从而进一步提升性能。 摘要:Nowadays, Graph Neural Networks (GNNs) following the Message Passing paradigm have become the dominant way to learn on graph data. Models in this paradigm have to spend extra space to look up adjacent nodes with adjacency matrices and extra time to aggregate multiple messages from adjacent nodes. To address this issue, we develop a method called LinkDist that distils self-knowledge from connected node pairs into a Multi-Layer Perceptron (MLP) without the need to aggregate messages. Experiments with 8 real-world datasets show the MLP derived from LinkDist can predict the label of a node without knowing its adjacencies but achieves comparable accuracy against GNNs in the contexts of semi- and full-supervised node classification. Moreover, LinkDist benefits from its Non-Message Passing paradigm that we can also distil self-knowledge from arbitrarily sampled node pairs in a contrastive way to further boost the performance of LinkDist.
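下面是一个高度简化的PyTorch草图,示意"从相连节点对向MLP蒸馏自知识"这类训练形式;损失的具体形式(此处用对称KL)为假设,并非论文实现:

```python
import torch
import torch.nn.functional as F

mlp = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 4))       # 4 类玩具 MLP
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
x = torch.randn(10, 16)                                 # 10 个节点的特征
edges = torch.tensor([[0, 1], [2, 3], [4, 5]])          # 相连节点对 (u, v)

for _ in range(5):
    u, v = edges[:, 0], edges[:, 1]
    logit_u, logit_v = mlp(x[u]), mlp(x[v])
    # 链接蒸馏:让一端的预测逼近另一端的软预测(对称 KL,仅为示意)
    loss = F.kl_div(F.log_softmax(logit_u, -1),
                    F.softmax(logit_v, -1).detach(), reduction="batchmean") + \
           F.kl_div(F.log_softmax(logit_v, -1),
                    F.softmax(logit_u, -1).detach(), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))   # 推断时只需 mlp(x),完全不需要邻接信息
```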

【7】 SEEN: Sharpening Explanations for Graph Neural Networks using Explanations from Neighborhoods 标题:SEEN:利用邻域解释锐化图神经网络的解释

作者:Hyeoncheol Cho,Youngrock Oh,Eunjoo Jeon 备注:15 pages, 3 figures 链接:https://arxiv.org/abs/2106.08532 摘要:解释图神经网络(GNN)预测的依据,对于在真实世界问题中可靠地使用GNN模型至关重要。随着GNN应用的迅速增长,敏感性分析、扰动方法、归因方法等解释GNN预测的最新进展,展示了解释GNN预测的巨大机会和可能性。在本研究中,我们提出了一种名为SEEN的方法,通过聚合来自重要邻近节点的辅助解释来提升节点分类任务的解释质量,并且可以以事后(post hoc)方式应用。应用SEEN不需要修改图,且由于其独立的机制,可以与各种可解释性技术配合使用。在给定图中匹配参与motif的节点的实验表明,该方法将解释准确率最多提升了12.71%,并通过利用辅助解释的贡献,证明了辅助解释与解释准确率提升之间的相关性。SEEN为提升GNN模型输出的解释质量提供了一种简单而有效的方法,并且可与大多数可解释性技术结合使用。 摘要:Explaining the foundations for predictions obtained from graph neural networks (GNNs) is critical for credible use of GNN models for real-world problems. Owing to the rapid growth of GNN applications, recent progress in explaining predictions from GNNs, such as sensitivity analysis, perturbation methods, and attribution methods, showed great opportunities and possibilities for explaining GNN predictions. In this study, we propose a method to improve the explanation quality of node classification tasks that can be applied in a post hoc manner through aggregation of auxiliary explanations from important neighboring nodes, named SEEN. Applying SEEN does not require modification of a graph and can be used with diverse explainability techniques due to its independent mechanism. Experiments on matching motif-participating nodes from a given graph show great improvement in explanation accuracy of up to 12.71% and demonstrate the correlation between the auxiliary explanations and the enhanced explanation accuracy through leveraging their contributions. SEEN provides a simple but effective method to enhance the explanation quality of GNN model outputs, and this method is applicable in combination with most explainability techniques.
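下面用NumPy给出SEEN思路的一个示意:把重要邻居节点的辅助解释加权聚合后,叠加到目标节点自身的解释分数上;权重与叠加方式均为假设:

```python
import numpy as np

def seen_aggregate(node_expl, neighbor_expls, weights=None):
    """事后聚合示意:目标节点解释 + 邻居辅助解释的加权平均。"""
    if weights is None:
        weights = np.ones(len(neighbor_expls)) / max(len(neighbor_expls), 1)
    aux = sum(w * e for w, e in zip(weights, neighbor_expls))
    return node_expl + aux

target = np.array([0.2, 0.7, 0.1])               # 目标节点的原始归因分数
neighbors = [np.array([0.1, 0.8, 0.1]), np.array([0.3, 0.5, 0.2])]
print(seen_aggregate(target, neighbors))
```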

【8】 Online Learning with Uncertain Feedback Graphs 标题:具有不确定反馈图的在线学习

作者:Pouya M Ghari,Yanning Shen 链接:https://arxiv.org/abs/2106.08441 摘要:基于专家建议的在线学习广泛应用于各种机器学习任务中。它考虑的问题是学习者从一组专家中选择一个来听取建议并做出决定。在许多学习问题中,专家可能是相关的,因此学习者可以观察到与所选专家相关的专家子集的损失。在这种情况下,专家之间的关系可以通过反馈图来捕捉,它可以用来帮助学习者做出决策。然而,在实际应用中,名义反馈图往往带有不确定性,这使得专家之间的实际关系难以揭示。为了应对这一挑战,本文研究了各种潜在不确定性的情况,提出了一种新的在线学习算法,利用不确定性反馈图来处理不确定性。在温和的条件下,证明了该算法具有次线性遗憾。在实际数据集上的实验验证了新算法的有效性。 摘要:Online learning with expert advice is widely used in various machine learning tasks. It considers the problem where a learner chooses one from a set of experts to take advice and make a decision. In many learning problems, experts may be related, henceforth the learner can observe the losses associated with a subset of experts that are related to the chosen one. In this context, the relationship among experts can be captured by a feedback graph, which can be used to assist the learner's decision making. However, in practice, the nominal feedback graph often entails uncertainties, which renders it impossible to reveal the actual relationship among experts. To cope with this challenge, the present work studies various cases of potential uncertainties, and develops novel online learning algorithms to deal with uncertainties while making use of the uncertain feedback graph. The proposed algorithms are proved to enjoy sublinear regret under mild conditions. Experiments on real datasets are presented to demonstrate the effectiveness of the novel algorithms.
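作为背景,下面给出图反馈在线学习的一个标准化简草图(Exp3风格的指数加权,配合按观察概率的重要性加权),用于说明反馈图如何辅助学习者决策;它并不是论文针对不确定反馈图提出的新算法:

```python
import numpy as np

def exp3_graph(losses, G, eta=0.1, seed=0):
    """选中专家 i 后可观察反馈图 G 中 i 指向的专家(含自身)的损失;
    对每个被观察的损失按其被观察到的概率做重要性加权。"""
    rng = np.random.default_rng(seed)
    T, n = losses.shape
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        p = w / w.sum()
        i = rng.choice(n, p=p)
        total += losses[t, i]
        observed = np.where(G[i] > 0)[0]                         # 可观察的专家集合
        q = np.array([p[G[:, j] > 0].sum() for j in observed])   # 各损失被观察到的概率
        est = np.zeros(n)
        est[observed] = losses[t, observed] / q                  # 重要性加权损失估计
        w *= np.exp(-eta * est)
    return total / T

G = np.ones((3, 3))                       # 完全反馈图(退化为全信息设定)
L = np.random.default_rng(1).random((50, 3))
print(exp3_graph(L, G))                   # 平均损失
```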

【9】 GemNet: Universal Directional Graph Neural Networks for Molecules 标题:GemNet:面向分子的通用方向性图神经网络

作者:Johannes Klicpera,Florian Becker,Stephan Günnemann 链接:https://arxiv.org/abs/2106.08903 摘要:有效地预测分子间的相互作用有可能将分子动力学加速多个数量级,从而彻底改变化学模拟。最近,图神经网络(GNN)在这项任务上取得了巨大的成功,超过了基于固定分子核的经典方法。然而,从理论的角度来看,它们仍然显得非常有限,因为常规GNN不能区分某些类型的图。在这项工作中,我们缩小了理论和实践之间的差距。我们证明了具有有向边嵌入和两跳消息传递的GNN确实是通用逼近器,可逼近对全局旋转和平移不变、对置换等变的预测函数。然后,我们利用这些见解和多项结构改进,提出了几何消息传递神经网络(GemNet)。我们在多项消融实验中证明了所提改进的好处。GemNet在COLL和MD17分子动力学数据集上比以前的模型好36%,在最具挑战性的分子上表现尤其出色。 摘要:Effectively predicting molecular interactions has the potential to accelerate molecular dynamics by multiple orders of magnitude and thus revolutionize chemical simulations. Graph neural networks (GNNs) have recently shown great successes for this task, overtaking classical methods based on fixed molecular kernels. However, they still appear very limited from a theoretical perspective, since regular GNNs cannot distinguish certain types of graphs. In this work we close this gap between theory and practice. We show that GNNs with directed edge embeddings and two-hop message passing are indeed universal approximators for predictions that are invariant to global rotation and translation, and equivariant to permutation. We then leverage these insights and multiple structural improvements to propose the geometric message passing neural network (GemNet). We demonstrate the benefits of the proposed changes in multiple ablation studies. GemNet outperforms previous models on the COLL and MD17 molecular dynamics datasets by 36%, performing especially well on the most challenging molecules.

【10】 Directed Graph Embeddings in Pseudo-Riemannian Manifolds 标题:伪黎曼流形中的有向图嵌入

作者:Aaron Sim,Maciej Wiatrak,Angus Brayne,Páidí Creed,Saee Paliwal 备注:Accepted at ICML 2021 链接:https://arxiv.org/abs/2106.08678 摘要:图表示学习算法的归纳偏差通常被编码在其嵌入空间的背景几何中。在本文中,我们证明了一般有向图可以由一个嵌入模型有效地表示,该嵌入模型由三个部分组成:伪黎曼度量结构、非平凡全局拓扑和一个在嵌入空间中显式地包含一个优先方向的唯一似然函数。通过将该方法应用于自然语言应用和生物学中一系列合成的和真实的有向图的链接预测任务,证明了该方法的表示能力。特别地,我们证明了低维圆柱Minkowski和anti-de-Sitter时空可以产生与高维曲黎曼流形相等或更好的图表示。 摘要:The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions.
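下面用几行NumPy示意伪黎曼嵌入的核心构件——非正定的Minkowski内积:前若干坐标取负号使度量不再正定,从而可以在嵌入空间中自然地编码"优先方向";time_dims等约定仅为示意:

```python
import numpy as np

def minkowski_inner(u, v, time_dims=1):
    """Minkowski(伪黎曼)内积:前 time_dims 个坐标取负号、其余取正号。"""
    sign = np.ones(len(u))
    sign[:time_dims] = -1.0
    return float(np.sum(sign * u * v))

u = np.array([1.0, 0.5, 0.2])
v = np.array([0.8, 0.1, 0.3])
print(minkowski_inner(u, v))   # -0.8 + 0.05 + 0.06 = -0.69
```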

Transformer(2篇)

【1】 Grounding Spatio-Temporal Language with Transformers 标题:用Transformer将时空语言接地

作者:Tristan Karch,Laetitia Teodorescu,Katja Hofmann,Clément Moulin-Frier,Pierre-Yves Oudeyer 备注:Contains main article and supplementaries 链接:https://arxiv.org/abs/2106.08858 摘要:语言是与外部世界的接口。为了让具身智能体使用它,语言必须建立在其他感觉运动模态的基础上。虽然已有大量文献研究机器如何学习接地语言,但如何学习时空语言概念这一主题在很大程度上仍未被探索。为了在这个方向上取得进展,我们引入了一个新的时空语言接地任务,其目标是学习具身智能体行为轨迹的时空描述的含义。这通过训练一个真值函数来实现,该函数预测一条描述是否与给定的观测历史相匹配。描述涉及过去时和现在时的时间扩展谓词,以及对场景中对象的时空指代。为了研究架构偏置在该任务中的作用,我们训练了包括多模态Transformer架构在内的多个模型;后者在空间和时间维度上实现了单词与对象之间不同的注意力计算。我们测试了模型的两类泛化能力:1)对随机留出句子的泛化;2)对语法原语的泛化。我们观察到,在Transformer的注意力计算中保持对象的同一性,对于获得良好的整体泛化性能至关重要,而把对象轨迹汇总为单个标记对性能影响很小。随后我们讨论了这如何为语言引导的自主具身智能体开辟新的视角。我们还以开源许可证发布了代码以及预训练模型和数据集,以鼓励更广泛的社区在未来基于我们的工作进行扩展。 摘要:Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temporal language grounding task where the goal is to learn the meaning of spatio-temporal descriptions of behavioral traces of an embodied agent. This is achieved by training a truth function that predicts if a description matches a given history of observations. The descriptions involve time-extended predicates in past and present tense as well as spatio-temporal references to objects in the scene. To study the role of architectural biases in this task, we train several models including multimodal Transformer architectures; the latter implement different attention computations between words and objects across space and time. We test models on two classes of generalization: 1) generalization to randomly held-out sentences; 2) generalization to grammar primitives. We observe that maintaining object identity in the attention computation of our Transformers is instrumental to achieving good performance on generalization overall, and that summarizing object traces in a single token has little influence on performance. We then discuss how this opens new perspectives for language-guided autonomous embodied agents. We also release our code under open-source license as well as pretrained models and datasets to encourage the wider community to build upon and extend our work in the future.

【2】 Scene Transformer: A unified multi-task model for behavior prediction and planning 标题:Scene Transformer:一种用于行为预测和规划的统一多任务模型

作者:Jiquan Ngiam,Benjamin Caine,Vijay Vasudevan,Zhengdong Zhang,Hao-Tien Lewis Chiang,Jeffrey Ling,Rebecca Roelofs,Alex Bewley,Chenxi Liu,Ashish Venugopal,David Weiss,Ben Sapp,Zhifeng Chen,Jonathon Shlens 链接:https://arxiv.org/abs/2106.08417 摘要:预测多个智能体的未来运动对于动态环境下的规划是必要的。这项任务对自动驾驶极具挑战性,因为智能体(例如车辆和行人)及其相关行为可能多种多样并相互影响。以前的工作大多先根据所有过去的运动为每个智能体独立预测未来,然后再针对这些独立预测进行规划。然而,针对固定预测进行规划,可能因无法表示不同智能体之间未来交互的可能性而导致次优规划。在这项工作中,我们建立了一个统一的模型,联合预测真实驾驶环境中所有智能体的行为。受最近语言建模方法的启发,我们把掩蔽策略用作对模型的查询,使得单个模型就能以多种方式预测智能体行为,例如以自动驾驶车辆的目标或完整未来轨迹、或环境中其他智能体的行为为条件。我们的模型架构在一个统一的Transformer架构中融合异构的世界状态,通过对道路元素、智能体交互和时间步施加注意力来实现。我们在自动驾驶数据集上评估了该行为预测方法,取得了最先进的性能。我们的工作表明,将行为预测问题表述为带掩蔽策略的统一架构,可以让单个模型有效地执行多种运动预测和规划相关任务。 摘要:Predicting the future motion of multiple agents is necessary for planning in dynamic environments. This task is challenging for autonomous driving since agents (e.g., vehicles and pedestrians) and their associated behaviors may be diverse and influence each other. Most prior work has focused on first predicting independent futures for each agent based on all past motion, and then planning against these independent predictions. However, planning against fixed predictions can suffer from the inability to represent the future interaction possibilities between different agents, leading to sub-optimal planning. In this work, we formulate a model for predicting the behavior of all agents jointly in real-world driving environments in a unified manner. Inspired by recent language modeling approaches, we use a masking strategy as the query to our model, enabling one to invoke a single model to predict agent behavior in many ways, such as potentially conditioned on the goal or full future trajectory of the autonomous vehicle or the behavior of other agents in the environment. Our model architecture fuses heterogeneous world state in a unified Transformer architecture by employing attention across road elements, agent interactions and time steps. We evaluate our approach on autonomous driving datasets for behavior prediction, and achieve state-of-the-art performance. Our work demonstrates that formulating the problem of behavior prediction in a unified architecture with a masking strategy may allow us to have a single model that can perform multiple motion prediction and planning related tasks effectively.

GAN|对抗|攻击|生成相关(7篇)

【1】 Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis 标题:用于多模态图像合成的级联模块化网络(CAM-Net)

作者:Shichong Peng,Alireza Moazeni,Ke Li 备注:Videos available as ancillary files 链接:https://arxiv.org/abs/2106.09015 摘要:近年来,GAN等深度生成模型在条件图像合成方面取得了令人瞩目的进展。由于模式崩溃问题,从同一输入图像生成多种不同版本的输出图像一直是一个难题:因为每个输入图像只给出一个真值输出图像,所以只有条件分布的一个模式被建模。本文关注多模态条件图像合成问题,并以最近提出的隐式极大似然估计(IMLE)技术为基础。以往基于IMLE的方法对不同任务需要不同的体系结构,这限制了它们的适用性,且生成的图像缺乏精细细节。我们提出了CAM-Net,一个可应用于广泛任务的统一体系结构。此外,它能够生成令人信服的高频细节,与基线相比,Frechet Inception距离(FID)最多降低了45.3%。 摘要:Deep generative models such as GANs have driven impressive advances in conditional image synthesis in recent years. A persistent challenge has been to generate diverse versions of output images from the same input image, due to the problem of mode collapse: because only one ground truth output image is given per input image, only one mode of the conditional distribution is modelled. In this paper, we focus on this problem of multimodal conditional image synthesis and build on the recently proposed technique of Implicit Maximum Likelihood Estimation (IMLE). Prior IMLE-based methods required different architectures for different tasks, which limit their applicability, and were lacking in fine details in the generated images. We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks. Additionally, it is capable of generating convincing high frequency details, achieving a reduction of the Frechet Inception Distance (FID) by up to 45.3% compared to the baseline.
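下面是IMLE训练循环的一个玩具级PyTorch草图:对每个样本采样多个潜码,只对生成结果最接近真值的那个回传梯度;网络结构、距离与超参数均为示意假设。与GAN不同,IMLE保证每个真实样本附近都存在生成样本,这正是其缓解模式崩溃的出发点。

```python
import torch

G = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, 2))         # 玩具"生成器"
opt = torch.optim.Adam(G.parameters(), lr=1e-2)
y = torch.randn(16, 2)                                  # 16 个真实样本

for _ in range(10):
    z = torch.randn(16, 10, 8)                          # 每个样本采 10 个潜码
    out = G(z)                                          # (16, 10, 2)
    d = ((out - y[:, None, :]) ** 2).sum(-1)            # 到真值的平方距离
    idx = d.argmin(dim=1)                               # IMLE:选最近的生成结果
    loss = d[torch.arange(16), idx].mean()              # 只最小化被选中样本的距离
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```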

【2】 Real-time Attacks Against Deep Reinforcement Learning Policies 标题:针对深度强化学习策略的实时攻击

作者:Buse G. A. Tekgul,Shelly Wang,Samuel Marchal,N. Asokan 备注:10 pages, 3 figures 链接:https://arxiv.org/abs/2106.08746 摘要:最近的研究发现,深度强化学习(DRL)策略容易受到对抗样本的攻击。这些攻击通过扰动智能体观察到的环境状态来误导DRL智能体的策略。它们在原则上可行,但速度太慢,无法实时欺骗DRL策略。我们提出了一种新的欺骗DRL策略的攻击,它既有效又高效,足以实时发起。我们利用通用对抗扰动(UAP)方法计算与具体输入无关的有效扰动。通过在Atari 2600游戏上的大量评估,我们证明了该技术是有效的:它能完全降低确定性和随机策略的性能(最多100%,即使扰动的$l_\infty$界小到0.005)。我们还证明了攻击的高效性:平均在线计算开销仅为0.027ms,比具有不同DRL策略的智能体的响应时间(平均0.6ms)更快,也比以前的攻击(平均2.7ms)快得多。此外,我们证明了已知的防御对通用扰动无效,并提出了一种有效的检测技术,它可以为针对基于通用扰动的攻击的鲁棒防御奠定基础。 摘要:Recent work has discovered that deep reinforcement learning (DRL) policies are vulnerable to adversarial examples. These attacks mislead the policy of DRL agents by perturbing the state of the environment observed by agents. They are feasible in principle but too slow to fool DRL policies in real time. We propose a new attack to fool DRL policies that is both effective and efficient enough to be mounted in real time. We utilize the Universal Adversarial Perturbation (UAP) method to compute effective perturbations independent of the individual inputs to which they are applied. Via an extensive evaluation using Atari 2600 games, we show that our technique is effective, as it fully degrades the performance of both deterministic and stochastic policies (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.005). We also show that our attack is efficient, incurring an online computational cost of 0.027ms on average. It is faster compared to the response time (0.6ms on average) of agents with different DRL policies, and considerably faster than prior attacks (2.7ms on average). Furthermore, we demonstrate that known defenses are ineffective against universal perturbations. We propose an effective detection technique which can form the basis for robust defenses against attacks based on universal perturbations.
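下面的NumPy片段示意这种攻击为何能实时进行:通用扰动离线算好后,在线阶段只需裁剪并叠加,无需针对单个输入再做优化;观测的形状与[0,1]归一化方式为假设,扰动此处用随机数代替:

```python
import numpy as np

def apply_uap(obs, uap, eps=0.005):
    """把预先计算好的通用扰动裁剪到 l∞ 球内再叠加到观测上(eps 取摘要中的 0.005)。"""
    delta = np.clip(uap, -eps, eps)
    return np.clip(obs + delta, 0.0, 1.0)   # 假设观测像素已归一化到 [0,1]

obs = np.random.rand(84, 84)                # Atari 风格的灰度观测(示意)
uap = np.random.randn(84, 84) * 0.01        # 离线预先计算的通用扰动(此处随机代替)
print(np.abs(apply_uap(obs, uap) - obs).max() <= 0.005 + 1e-9)   # True
```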

【3】 Self-supervised GANs with Label Augmentation 标题:具有标签增强的自监督GAN

作者:Liang Hou,Huawei Shen,Qi Cao,Xueqi Cheng 链接:https://arxiv.org/abs/2106.08601 摘要:近年来,基于变换的自监督学习被应用于生成对抗网络(GAN),通过学习稳定的表示来缓解判别器的灾难性遗忘问题。然而,现有自监督GAN中相互独立的自监督任务导致了与生成式建模不一致的目标,因为生成器是从对生成器分布不可知的分类器中学习的。为了解决这个问题,我们提出了一个新的带标签增强的自监督GAN框架,即用自监督伪标签来增强GAN标签(真或假)。特别地,判别器和自监督分类器被统一起来,学习预测增强标签这一单一任务,使判别器/分类器感知生成器分布;而生成器则通过优化变换后的真实分布和生成分布之间的差异来混淆判别器/分类器。理论上,我们证明了生成器在平衡点处收敛于复制数据分布。实证上,我们证明了该方法在多个基准数据集的生成式建模和表示学习上都显著优于有竞争力的基线。 摘要:Recently, transformation-based self-supervised learning has been applied to generative adversarial networks (GANs) to mitigate the catastrophic forgetting problem of discriminator by learning stable representations. However, the separate self-supervised tasks in existing self-supervised GANs cause an inconsistent goal with generative modeling due to the learning of the generator from their generator distribution-agnostic classifiers. To address this issue, we propose a novel self-supervised GANs framework with label augmentation, i.e., augmenting the GAN labels (real or fake) with the self-supervised pseudo-labels. In particular, the discriminator and the self-supervised classifier are unified to learn a single task that predicts the augmented label such that the discriminator/classifier is aware of the generator distribution, while the generator tries to confuse the discriminator/classifier by optimizing the discrepancy between the transformed real and generated distributions. Theoretically, we prove that the generator, at the equilibrium point, converges to replicate the data distribution. Empirically, we demonstrate that the proposed method significantly outperforms competitive baselines on both generative modeling and representation learning across benchmark datasets.
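下面的小段PyTorch代码示意"标签增强"的标签构造方式:以旋转作为K=4个自监督变换,把真/假标签与变换标签合并成2K个类别;类别编号的约定仅为示意假设:

```python
import torch

def augmented_label(real, k_transform, num_transforms=4):
    """前 K 类对应真实样本的各变换,后 K 类对应生成样本的各变换(编号方式为假设)。"""
    return k_transform if real else num_transforms + k_transform

x = torch.randn(1, 3, 32, 32)                  # 一张玩具图像
for k in range(4):                             # 旋转 0/90/180/270 度共 K=4 个变换
    xk = torch.rot90(x, k, dims=(2, 3))        # 第 k 个变换后的输入
    print(k, augmented_label(True, k), augmented_label(False, k))
```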

【4】 TSO: Curriculum Generation using continuous optimization 标题:TSO:基于连续优化的课程生成

作者:Dipankar Sarkar,Mukur Gupta 备注:10 pages, along with all experiment details 链接:https://arxiv.org/abs/2106.08569 摘要:深度学习模型的训练面临巨大挑战,包括参数调整和训练数据排序。在课程学习领域,人们为优化训练数据的顺序做了大量研究。最近的工作侧重于使用复杂的强化学习技术来寻找最优的数据排序策略,以最大化给定网络的学习效果。本文提出了一种简单高效的基于连续优化的技术,我们称之为训练序列优化(TSO)。该方法有三个关键组成部分:(a)编码器网络将训练序列映射/嵌入到连续空间中;(b)预测器网络以策略的连续表示为输入,预测固定网络结构下的精度;(c)解码器进一步将策略的连续表示映射回有序的训练数据集。性能预测器和编码器使我们能够在连续空间中进行基于梯度的优化,找到可能具有更高精度的最优训练数据排序的嵌入。实验表明,在CIFAR-100数据集上,我们生成的最优课程策略相比随机策略可以获得2 AP的提升,并且优于最先进的课程学习算法。我们还通过改变体系结构、数据集和样本规模进行了消融研究,展示了该方法的鲁棒性。 摘要:The training of deep learning models poses vast challenges of including parameter tuning and ordering of training data. Significant research has been done in Curriculum learning for optimizing the sequence of training data. Recent works have focused on using complex reinforcement learning techniques to find the optimal data ordering strategy to maximize learning for a given network. In this paper, we present a simple and efficient technique based on continuous optimization. We call this new approach Training Sequence Optimization (TSO). There are three critical components in our proposed approach: (a) An encoder network maps/embeds training sequence into continuous space. (b) A predictor network uses the continuous representation of a strategy as input and predicts the accuracy for fixed network architecture. (c) A decoder further maps a continuous representation of a strategy to the ordered training dataset. The performance predictor and encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of optimal training data ordering with potentially better accuracy. Experiments show that we can gain 2AP with our generated optimal curriculum strategy over the random strategy using the CIFAR-100 dataset and have better boosts than the state of the art CL algorithms. We do an ablation study varying the architecture, dataset and sample sizes showcasing our approach's robustness.

【5】 Dynamically Grown Generative Adversarial Networks 标题:动态增长的生成性对抗性网络

作者:Lanlan Liu,Yuting Zhang,Jia Deng,Stefano Soatto 备注:Accepted to AAAI 2021 链接:https://arxiv.org/abs/2106.08505 摘要:最近的工作提出用渐进式网络生长来简化大型GAN的训练,这是一个很有前途的方向,但模型设计和架构生长策略仍有待探索,且需要针对不同的图像数据进行人工设计。本文提出了一种在训练过程中动态生长GAN的方法,以自动化方式同时优化网络架构及其参数。该方法将架构搜索技术作为与基于梯度的训练交织进行的步骤,周期性地为生成器和判别器寻找最优的架构生长策略。它同时享有两方面的好处:渐进式生长带来的训练简化,以及更大的架构设计空间带来的性能提升。实验结果在图像生成上达到了新的最先进水平。搜索过程中的观察也为GAN模型设计提供了建设性的见解,例如生成器与判别器的平衡以及卷积层的选择。 摘要:Recent work introduced progressive network growing as a promising way to ease the training for large GANs, but the model design and architecture-growing strategy still remain under-explored and needs manual design for different image data. In this paper, we propose a method to dynamically grow a GAN during training, optimizing the network architecture and its parameters together with automation. The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator. It enjoys the benefits of both eased training because of progressive growing and improved performance because of broader architecture design space. Experimental results demonstrate new state-of-the-art of image generation. Observations in the search procedure also provide constructive insights into the GAN model design such as generator-discriminator balance and convolutional layer choices.

【6】 Towards Adversarial Robustness via Transductive Learning 标题:通过转导学习走向对抗鲁棒性

作者:Jiefeng Chen,Yang Guo,Xi Wu,Tianqi Li,Qicheng Lao,Yingyu Liang,Somesh Jha 链接:https://arxiv.org/abs/2106.08387 摘要:利用转导学习来增强对抗鲁棒性已经引起了人们的兴趣(Goldwasser等人,NeurIPS 2020;Wu等人,ICML 2020)。与传统的"测试时"防御机制相比,这些防御机制通过转导学习,基于测试时的输入对模型进行"动态再训练";从理论上讲,攻击这些防御归结为双层优化,这似乎增加了自适应攻击的难度。在本文中,我们首先形式化并分析了转导鲁棒性的建模问题。在此基础上,我们提出了求解双层攻击目标的"攻击模型空间"原则,并给出了该原则的一个实例,攻破了以往的转导防御。这些攻击表明,利用转导学习提高对抗鲁棒性存在重大困难。为此,我们提出了新的理论和实证证据来支持转导学习的效用。 摘要:There has been emerging interest to use transductive learning for adversarial robustness (Goldwasser et al., NeurIPS 2020; Wu et al., ICML 2020). Compared to traditional "test-time" defenses, these defense mechanisms "dynamically retrain" the model based on test time input via transductive learning; and theoretically, attacking these defenses boils down to bilevel optimization, which seems to raise the difficulty for adaptive attacks. In this paper, we first formalize and analyze modeling aspects of transductive robustness. Then, we propose the principle of attacking model space for solving bilevel attack objectives, and present an instantiation of the principle which breaks previous transductive defenses. These attacks thus point to significant difficulties in the use of transductive learning to improve adversarial robustness. To this end, we present new theoretical and empirical evidence in support of the utility of transductive learning.

【7】 Adversarial Attacks on Deep Models for Financial Transaction Records 标题:对金融交易记录深层模型的对抗性攻击

作者:Ivan Fursov,Matvey Morozov,Nina Kaploukhaya,Elizaveta Kovtun,Rodrigo Rivera-Castro,Gleb Gusev,Dmitry Babaev,Ivan Kireev,Alexey Zaytsev,Evgeny Burnaev 链接:https://arxiv.org/abs/2106.08361 摘要:以交易记录为输入的机器学习模型在金融机构中很流行。最有效的模型使用与NLP社区类似的深度学习体系结构,其庞大的参数量和有限的鲁棒性带来了挑战。特别是,深度学习模型容易受到对抗攻击:输入的微小变化就会破坏模型的输出。在这项工作中,我们研究了针对交易记录数据的对抗攻击以及对这些攻击的防御。交易记录数据的结构与典型的NLP或时间序列数据不同:相邻记录之间的关联比句子中的单词更弱,且每条记录由离散的商户代码和连续的交易金额组成。我们考虑黑盒攻击场景,即攻击者不知道真正的决策模型,并特别关注在序列末尾添加交易标记的做法。这些限制提供了更现实、以前在NLP领域未被探索过的场景。在金融行业的相关数据集上,所提出的对抗攻击和相应的防御都表现出显著的效果。我们的结果表明,少量生成的交易就足以欺骗深度学习模型。此外,我们通过对抗训练或单独的对抗样本检测来提高模型的鲁棒性。这项工作表明,嵌入对对抗攻击的防护可以提高模型的鲁棒性,使银行和金融业能够更广泛地采用交易记录的深度模型。 摘要:Machine learning models using transaction records as inputs are popular among financial institutions. The most efficient models use deep-learning architectures similar to those in the NLP community, posing a challenge due to their tremendous number of parameters and limited robustness. In particular, deep-learning models are vulnerable to adversarial attacks: a little change in the input harms the model's output. In this work, we examine adversarial attacks on transaction records data and defences from these attacks. The transaction records data have a different structure than the canonical NLP or time series data, as neighbouring records are less connected than words in sentences, and each record consists of both discrete merchant code and continuous transaction amount. We consider a black-box attack scenario, where the attack doesn't know the true decision model, and pay special attention to adding transaction tokens to the end of a sequence. These limitations provide a more realistic scenario, previously unexplored in the NLP world. The proposed adversarial attacks and the respective defences demonstrate remarkable performance using relevant datasets from the financial industry. Our results show that a couple of generated transactions are sufficient to fool a deep-learning model. Further, we improve model robustness via adversarial training or separate adversarial examples detection. This work shows that embedding protection from adversarial attacks improves model robustness, allowing a wider adoption of deep models for transaction records in banking and finance.

半/弱/无/有监督|不确定性|主动学习(6篇)

【1】 Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation 标题:无监督图像到图像翻译中解缠潜在样式空间的平滑

作者:Yahui Liu,Enver Sangineto,Yajing Chen,Linchao Bao,Haoxian Zhang,Nicu Sebe,Bruno Lepri,Wei Wang,Marco De Nadai 备注:Accepted to CVPR 2021 链接:https://arxiv.org/abs/2106.09016 摘要:图像到图像(I2I)多域翻译模型通常也使用其语义插值结果的质量进行评估。然而,最先进的模型在插值过程中经常会出现图像外观的突变,并且在跨域插值时表现较差。在本文中,我们提出了一种新的基于三种特定损失的训练协议,该协议有助于翻译网络学习一个平滑且不纠缠的潜在风格空间:1)域内和域间插值都对应于生成图像的渐变;2)在翻译过程中更好地保留源图像的内容。此外,我们还提出了一种新的评价指标来衡量I2I翻译模型潜在风格空间的平滑度。我们在不同数据集上的大量实验表明,该方法能显著提高生成图像的质量和插值的渐进性。 摘要:Image-to-Image (I2I) multi-domain translation models are usually evaluated also using the quality of their semantic interpolation results. However, state-of-the-art models frequently show abrupt changes in the image appearance during interpolation, and usually perform poorly in interpolations across domains. In this paper, we propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space in which: 1) Both intra- and inter-domain interpolations correspond to gradual changes in the generated images and 2) The content of the source image is better preserved during the translation. Moreover, we propose a novel evaluation metric to properly measure the smoothness of latent style space of I2I translation models. The proposed method can be plugged into existing translation approaches, and our extensive experiments on different datasets show that it can significantly boost the quality of the generated images and the graduality of the interpolations.

【2】 Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization 标题:超越Tikhonov:通过迭代正则化实现自协调损失下的更快学习

作者:Gaspard Beugnot,Julien Mairal,Alessandro Rudi 链接:https://arxiv.org/abs/2106.08855 摘要:谱滤波理论是理解核学习统计特性的重要工具。对于最小二乘,它允许导出各种正则化方案,使超额风险的收敛速度比Tikhonov正则化更快。这通常是通过利用称为源条件和容量条件的经典假设来实现的,这些假设刻画了学习任务的难度。为了理解由其他损失函数导出的估计量,Marteau-Ferey等人将Tikhonov正则化理论推广到广义自协调损失函数(GSC),其中包含例如logistic损失。本文更进一步,证明了使用迭代Tikhonov正则化方案可以使GSC达到快速且最优的速率;该方案与优化中的近端点法有着内在联系,并克服了经典Tikhonov正则化的局限。 摘要:The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows to derive various regularization schemes that yield faster convergence rates of the excess risk than with Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. In order to understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self concordant loss functions (GSC), which contain, e.g., the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization, and overcomes the limitation of the classical Tikhonov regularization.
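下面用NumPy给出核最小二乘情形下迭代Tikhonov的一个可运行小例子,每步求解 (K + λI) c_t = y + λ c_{t-1};迭代次数越多,对应的谱滤波阶数越高。论文将这一思想(及其与近端点法的联系)推广到logistic等GSC损失;此处的核与超参数均为示意:

```python
import numpy as np

def iterated_tikhonov(K, y, lam, steps=3):
    """核最小二乘下的迭代 Tikhonov:steps=1 时退化为普通 Tikhonov(核岭回归)。"""
    n = K.shape[0]
    c = np.zeros(n)
    for _ in range(steps):
        c = np.linalg.solve(K + lam * np.eye(n), y + lam * c)
    return c

rng = np.random.default_rng(0)
X = rng.random((20, 1))
K = np.exp(-((X - X.T) ** 2) / 0.1)        # 高斯核矩阵(示意带宽)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)
print(iterated_tikhonov(K, y, lam=0.1)[:3])
```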

【3】 Out-of-Scope Intent Detection with Self-Supervision and Discriminative Training 标题:基于自监督和判别式训练的超范围意图检测

作者:Li-Ming Zhan,Haowen Liang,Bo Liu,Lu Fan,Xiao-Ming Wu,Albert Y. S. Lam 备注:ACL2021 链接:https://arxiv.org/abs/2106.08616 摘要:超范围(out-of-scope)意图检测在面向任务的对话系统中具有重要的实际意义。由于离群语句的分布在训练阶段是任意且未知的,现有方法通常依赖对数据分布的强假设(如混合高斯)来做推断,导致要么需要复杂的多步训练流程,要么需要人工设计的规则(如为离群检测选择置信度阈值)。本文提出了一种简单而有效的方法,通过在训练中模拟测试场景,以完全端到端的方式训练超范围意图分类器;该方法既不需要对数据分布做假设,也不需要额外的后处理或阈值设定。具体地说,我们在训练阶段构造一组伪离群样本:一方面利用域内(inlier)特征通过自监督生成合成离群样本,另一方面从易于获得的开放域数据集中采样范围外语句。这些伪离群样本用于训练一个判别式分类器,该分类器可直接应用于测试任务并具有良好的泛化能力。我们在四个基准对话数据集上对该方法进行了广泛评估,观察到相比最先进方法的显著改进。我们的代码已发布在https://github.com/liam0949/DCLOOS. 摘要:Out-of-scope intent detection is of practical importance in task-oriented dialogue systems. Since the distribution of outlier utterances is arbitrary and unknown in the training stage, existing methods commonly rely on strong assumptions on data distribution such as mixture of Gaussians to make inference, resulting in either complex multi-step training procedures or hand-crafted rules such as confidence threshold selection for outlier detection. In this paper, we propose a simple yet effective method to train an out-of-scope intent classifier in a fully end-to-end manner by simulating the test scenario in training, which requires no assumption on data distribution and no additional post-processing or threshold setting. Specifically, we construct a set of pseudo outliers in the training stage, by generating synthetic outliers using inliner features via self-supervision and sampling out-of-scope sentences from easily available open-domain datasets. The pseudo outliers are used to train a discriminative classifier that can be directly applied to and generalize well on the test task. We evaluate our method extensively on four benchmark dialogue datasets and observe significant improvements over state-of-the-art approaches. Our code has been released at https://github.com/liam0949/DCLOOS.
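下面的PyTorch片段示意伪离群样本的一种构造方式——对域内特征做两两凸组合;组合系数与配对方式为示意假设。这类合成样本与从开放域语料采样的句子一起,构成训练判别式分类器所用的"范围外"类别:

```python
import torch

def synth_outliers(feats, alpha=0.5):
    """随机配对域内特征并做凸组合,得到落在域内簇之间的伪离群特征。"""
    idx = torch.randperm(feats.size(0))
    return alpha * feats + (1 - alpha) * feats[idx]

inlier = torch.randn(8, 128)              # 8 条域内语句的编码特征(示意)
pseudo = synth_outliers(inlier)
labels = torch.cat([torch.zeros(8), torch.ones(8)])   # 0=域内, 1=范围外(简化示意)
print(pseudo.shape, labels.shape)
```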

【4】 Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty 标题:内生和外生不确定性环境下强化学习的基本极限

作者:Rongpeng Li 备注:Manuscript has been submitted to an IEEE journal. Copyright may be transferred without further notice 链接:https://arxiv.org/abs/2106.08477 摘要:在线强化学习(RL)被广泛应用于信息处理场景,这些场景由于信道和服务需求的内在随机性往往表现出很大的不确定性。本文研究一类同时具有内生和外生不确定性的一般马尔可夫决策过程(MDP)中的无折扣RL问题:奖励和状态转移概率对RL智能体都是未知的,并且只要各自的变化不超过一定的动态预算(即上界),它们就可以随时间演化。我们首先提出了一种可按依赖于变化程度的调度重启的、基于Bernstein置信上界的变化感知强化学习算法(VB-UCRL)。我们成功地克服了外生不确定性带来的挑战,建立了与文献中最新结果相比最多节省$\sqrt{S}$或$S^{\frac{1}{6}}T^{\frac{1}{12}}$因子的遗憾界,其中$S$表示MDP的状态数,$T$表示学习步骤的迭代指标。 摘要:Online reinforcement learning (RL) has been widely applied in information processing scenarios, which usually exhibit much uncertainty due to the intrinsic randomness of channels and service demands. In this paper, we consider an un-discounted RL in general Markov decision processes (MDPs) with both endogeneous and exogeneous uncertainty, where both the rewards and state transition probability are unknown to the RL agent and evolve with the time as long as their respective variations do not exceed certain dynamic budget (i.e., upper bound). We first develop a variation-aware Bernstein-based upper confidence reinforcement learning (VB-UCRL), which we allow to restart according to a schedule dependent on the variations. We successfully overcome the challenges due to the exogeneous uncertainty and establish a regret bound of saving at most $\sqrt{S}$ or $S^{\frac{1}{6}}T^{\frac{1}{12}}$ compared with the latest results in the literature, where $S$ denotes the state size of the MDP and $T$ indicates the iteration index of learning steps.

【5】 Nonequilibrium thermodynamics of self-supervised learning 标题:自我监督学习的非平衡热力学

作者:Domingos S. P. Salazar 备注:6 pages, 1 figure 链接:https://arxiv.org/abs/2106.08981 摘要:基于能量模型的自监督学习(SSL)与平衡热力学有着直观的联系,因为softmax层将能量映射到概率,是Gibbs分布。然而,SSL以何种方式是热力学过程?我们证明了一些SSL范式表现为一个热力学复合系统,由表示和自标记与非平衡储层接触而形成。此外,该系统还受到绝热膨胀和等容加热等常规热力学循环的影响,产生了广义吉布斯系综。在这幅图中,我们展示了学习被视为一个恶魔,它通过反馈测量来从系统中提取负功。作为应用程序,我们研究了一些使用这种思想的SSL算法。 摘要:Self-supervised learning (SSL) of energy based models has an intuitive relation to equilibrium thermodynamics because the softmax layer, mapping energies to probabilities, is a Gibbs distribution. However, in what way SSL is a thermodynamic process? We show that some SSL paradigms behave as a thermodynamic composite system formed by representations and self-labels in contact with a nonequilibrium reservoir. Moreover, this system is subjected to usual thermodynamic cycles, such as adiabatic expansion and isochoric heating, resulting in a generalized Gibbs ensemble (GGE). In this picture, we show that learning is seen as a demon that operates in cycles using feedback measurements to extract negative work from the system. As applications, we examine some SSL algorithms using this idea.

【6】 Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition 标题:用于半监督语音识别的动量伪标注

作者:Yosuke Higuchi,Niko Moritz,Jonathan Le Roux,Takaaki Hori 备注:Accepted to Interspeech 2021 链接:https://arxiv.org/abs/2106.08922 摘要:伪标注(PL)已被证明是半监督自动语音识别(ASR)中的一种有效方法:用从无标注数据生成的伪标签对基础模型进行自训练。随着模型的演进迭代更新伪标签可以进一步改进PL,但以前的大多数方法要么涉及低效的模型再训练,要么需要对标签更新进行繁琐的控制。我们提出动量伪标注(MPL),一种简单而有效的半监督ASR策略。受mean teacher方法的启发,MPL由一对相互交互、相互学习的在线和离线模型组成:在线模型被训练去预测由离线模型即时生成的伪标签;离线模型则维护在线模型基于动量的滑动平均。MPL在单次训练过程中完成,两个模型之间的交互有效地帮助它们相互增强,从而提高ASR性能。我们将MPL应用于基于连接时序分类(CTC)的端到端ASR模型。实验结果表明,MPL有效地改进了基础模型,并可扩展到数据量或领域失配程度不同的多种半监督场景。 摘要:Pseudo-labeling (PL) has been shown to be effective in semi-supervised automatic speech recognition (ASR), where a base model is self-trained with pseudo-labels generated from unlabeled data. While PL can be further improved by iteratively updating pseudo-labels as the model evolves, most of the previous approaches involve inefficient retraining of the model or intricate control of the label update. We present momentum pseudo-labeling (MPL), a simple yet effective strategy for semi-supervised ASR. MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method. The online model is trained to predict pseudo-labels generated on the fly by the offline model. The offline model maintains a momentum-based moving average of the online model. MPL is performed in a single training process and the interaction between the two models effectively helps them reinforce each other to improve the ASR performance. We apply MPL to an end-to-end ASR model based on the connectionist temporal classification. The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios with varying amounts of data or domain mismatch.
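下面给出离线模型基于动量的滑动平均更新(mean teacher式)的一个极简PyTorch草图;动量取值为该类方法的常见默认,并非论文给定:

```python
import torch

@torch.no_grad()
def ema_update(offline, online, momentum=0.999):
    """离线模型参数 <- momentum * 离线参数 + (1 - momentum) * 在线参数。"""
    for p_off, p_on in zip(offline.parameters(), online.parameters()):
        p_off.mul_(momentum).add_(p_on, alpha=1 - momentum)

online = torch.nn.Linear(4, 2)
offline = torch.nn.Linear(4, 2)
offline.load_state_dict(online.state_dict())   # 初始时两者一致
# ……在线模型对离线模型即时生成的伪标签做一步梯度更新之后:
ema_update(offline, online)
```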

迁移|Zero/Few/One-Shot|自适应(5篇)

【1】 Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation 标题:架起多任务学习与元学习之间的桥梁:迈向高效训练和有效适应

作者:Haoxiang Wang,Han Zhao,Bo Li 备注:ICML 2021 camera-ready version. Code is released at this https URL 链接:https://arxiv.org/abs/2106.09017 摘要:多任务学习(MTL)旨在通过联合学习多个相关任务来提高它们的泛化能力。相比之下,现代元学习在联合训练之外,还允许在测试阶段处理标签有限的未见任务,以期快速适应。尽管MTL和元学习在问题表述上存在细微差异,但两种学习范式共享同一个洞见:现有训练任务之间的共享结构可以带来更好的泛化和适应能力。本文通过理论分析和实证研究,进一步揭示这两种学习范式之间的密切联系。理论上,我们首先证明了MTL与一类基于梯度的元学习(GBML)算法具有相同的优化表述;然后证明对于具有足够深度的过参数化神经网络,MTL和GBML学到的预测函数是接近的。特别地,这一结果表明两个模型在同一个未见任务上给出的预测是相似的。实证上,我们证实了上述理论发现:通过适当的实现,MTL在一组小样本图像分类基准上可与最先进的GBML算法竞争。由于现有GBML算法往往涉及昂贵的二阶双层优化,我们的一阶MTL方法在mini-ImageNet等大规模数据集上要快一个数量级。我们相信这项工作有助于弥合这两种学习范式之间的差距,并提供一种计算高效、同时支持快速任务适应的GBML替代方案。 摘要:Multi-task learning (MTL) aims to improve the generalization of several related tasks by learning them jointly. As a comparison, in addition to the joint training scheme, modern meta-learning allows unseen tasks with limited labels during the test phase, in the hope of fast adaptation over them. Despite the subtle difference between MTL and meta-learning in the problem formulation, both learning paradigms share the same insight that the shared structure between existing training tasks could lead to better generalization and adaptation. In this paper, we take one important step further to understand the close connection between these two learning paradigms, through both theoretical analysis and empirical investigation. Theoretically, we first demonstrate that MTL shares the same optimization formulation with a class of gradient-based meta-learning (GBML) algorithms. We then prove that for over-parameterized neural networks with sufficient depth, the learned predictive functions of MTL and GBML are close. In particular, this result implies that the predictions given by these two models are similar over the same unseen task. Empirically, we corroborate our theoretical findings by showing that, with proper implementation, MTL is competitive against state-of-the-art GBML algorithms on a set of few-shot image classification benchmarks. Since existing GBML algorithms often involve costly second-order bi-level optimization, our first-order MTL method is an order of magnitude faster on large-scale datasets such as mini-ImageNet. We believe this work could help bridge the gap between these two learning paradigms, and provide a computationally efficient alternative to GBML that also supports fast task adaptation.

【2】 Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments 标题:Voicy:噪声混响环境中的零样本非平行语音转换

作者:Alejandro Mottini,Jaime Lorenzo-Trueba,Sri Vishnu Kumar Karlapati,Thomas Drugman 备注:Presented at the Speech Synthesis Workshops 2021 (SSW11) 链接:https://arxiv.org/abs/2106.08873 摘要:语音转换(Voice Conversion,VC)是一种通过变换源语音的非语言信息来改变说话人感知身份的技术。尽管关于VC的文献很丰富,但大多数方法都是在干净的语音录音上训练和评估的。然而,许多声学环境带有噪声和混响,严重限制了流行VC方法在此类场景中的适用性。为了解决这一局限,我们提出了Voicy,一个专为带噪语音设计的新VC框架。我们的方法受去噪自编码器框架的启发,由四个编码器(说话人、内容、音素和声学ASR)和一个解码器组成。重要的是,Voicy能够执行非平行的零样本VC,这是任何需要处理训练中未见说话人的VC系统的重要要求。我们使用LibriSpeech数据集的带噪混响版本验证了该方法。实验结果表明,在带噪混响环境中,Voicy在自然度和目标说话人相似度方面优于其他被测VC技术。 摘要:Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and evaluated on clean speech recordings. However, many acoustic environments are noisy and reverberant, severely restricting the applicability of popular VC methods to such scenarios. To address this limitation, we propose Voicy, a new VC framework particularly tailored for noisy speech. Our method, which is inspired by the de-noising auto-encoders framework, is comprised of four encoders (speaker, content, phonetic and acoustic-ASR) and one decoder. Importantly, Voicy is capable of performing non-parallel zero-shot VC, an important requirement for any VC system that needs to work on speakers not seen during training. We have validated our approach using a noisy reverberant version of the LibriSpeech dataset. Experimental results show that Voicy outperforms other tested VC techniques in terms of naturalness and target speaker similarity in noisy reverberant environments.

【3】 Knowledge-Adaptation Priors 标题:知识适应先验

作者:Mohammad Emtiyaz Khan,Siddharth Swaroop 链接:https://arxiv.org/abs/2106.08769 摘要:人类和动物天生具有快速适应环境的能力,而机器学习模型在环境发生变化时往往需要从头开始完全再训练。我们提出知识适应先验(K-priors),通过对各种各样的任务和模型实现快速而准确的适应来降低再训练成本。其做法是结合权重空间和函数空间先验来重建过去的梯度,这一机制恢复并推广了许多现有的、看似互不相关的适应策略。只要选择足够大的过去数据记忆,用简单的一阶梯度方法训练,通常就能以任意精度恢复出精确的再训练模型。实证结果证实,这种适应既廉价又准确,是再训练的一个有前途的替代方案。 摘要:Humans and animals have a natural ability to quickly adapt to their surroundings, but machine-learning models, when subjected to changes, often require a complete retraining from scratch. We present Knowledge-adaptation priors (K-priors) to reduce the cost of retraining by enabling quick and accurate adaptation for a wide-variety of tasks and models. This is made possible by a combination of weight and function-space priors to reconstruct the gradients of the past, which recovers and generalizes many existing, but seemingly-unrelated, adaptation strategies. Training with simple first-order gradient methods can often recover the exact retrained model to an arbitrary accuracy by choosing a sufficiently large memory of the past data. Empirical results confirm that the adaptation can be cheap and accurate, and a promising alternative to retraining.

【4】 HELP: Hardware-Adaptive Efficient Latency Predictor for NAS via Meta-Learning 标题:帮助:通过元学习实现NAS的硬件自适应高效延迟预测器

作者:Hayeon Lee,Sewoong Lee,Song Chong,Sung Ju Hwang 链接:https://arxiv.org/abs/2106.08630 摘要:在部署时,神经架构搜索应该具有硬件感知能力,以满足设备特定的约束(例如内存占用、延迟和能耗)并提高模型效率。现有的硬件感知NAS方法需要从目标设备收集大量样本(例如精度和延迟),以构建查找表或延迟估计器。然而,这种做法在现实场景中并不可行:存在大量硬件规格各异的设备,从如此多的设备收集样本将带来难以承受的计算和金钱成本。为了克服这些局限,我们提出了硬件自适应高效延迟预测器(HELP),它把设备特定的延迟估计问题表述为元学习问题,从而只需少量样本就能估计给定任务的模型在未见设备上的延迟。为此,我们引入了新颖的硬件嵌入,把任何设备视为输出延迟的黑盒函数加以嵌入,并利用硬件嵌入以设备相关的方式元学习硬件自适应延迟预测器。我们验证了HELP在未见平台上的延迟估计性能:它只需10个测量样本即可获得很高的估计性能,优于所有相关基线。我们还对比了使用与不使用HELP的端到端NAS框架,结果表明在延迟受限的设置下,HELP大幅降低了基础NAS方法的总时间开销。 摘要:For deployment, neural architecture search should be hardware-aware, in order to satisfy the device-specific constraints (e.g., memory usage, latency and energy consumption) and enhance the model efficiency. Existing methods on hardware-aware NAS collect a large number of samples (e.g., accuracy and latency) from a target device, either builds a lookup table or a latency estimator. However, such approach is impractical in real-world scenarios as there exist numerous devices with different hardware specifications, and collecting samples from such a large number of devices will require prohibitive computational and monetary cost. To overcome such limitations, we propose Hardware-adaptive Efficient Latency Predictor (HELP), which formulates the device-specific latency estimation problem as a meta-learning problem, such that we can estimate the latency of a model's performance for a given task on an unseen device with a few samples. To this end, we introduce novel hardware embeddings to embed any devices considering them as black-box functions that output latencies, and meta-learn the hardware-adaptive latency predictor in a device-dependent manner, using the hardware embeddings. We validate the proposed HELP for its latency estimation performance on unseen platforms, on which it achieves high estimation performance with as few as 10 measurement samples, outperforming all relevant baselines. We also validate end-to-end NAS frameworks using HELP against ones without it, and show that it largely reduces the total time cost of the base NAS method, in latency-constrained settings.

【5】 Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domain by Adaptive Discretization 标题:Ada-BKB:基于自适应离散化的连续域可扩展高斯过程优化

作者:Marco Rando,Luigi Carratino,Silvia Villa,Lorenzo Rosasco 链接:https://arxiv.org/abs/2106.08598 摘要:高斯过程优化是一类通过序贯评估来优化黑盒函数的成功算法(如GP-UCB)。然而,当函数的定义域连续时,高斯过程优化要么依赖于空间的固定离散化,要么在每次评估时求解一个非凸优化子问题。第一种做法会对性能产生负面影响,而第二种做法会给算法带来沉重的计算负担。第三种选择是自适应地离散化函数定义域,其理论研究直到最近才出现。尽管这种做法避免了额外的非凸优化开销,但总体计算复杂度仍然过高:GP-UCB之类的算法的运行时间为$O(T^4)$,其中$T$是迭代次数。本文提出Ada-BKB(Adaptive Budgeted Kernelized Bandit),一种针对连续定义域上函数的无遗憾高斯过程优化算法,其运行时间可证明为$O(T^2 d_{\text{eff}}^2)$,其中$d_{\text{eff}}$是所探索空间的有效维数,通常远小于$T$。我们通过在合成非凸函数以及超参数优化这一真实问题上的实验验证了我们的发现。 摘要:Gaussian process optimization is a successful class of algorithms (e.g. GP-UCB) to optimize a black-box function through sequential evaluations. However, when the domain of the function is continuous, Gaussian process optimization has to either rely on a fixed discretization of the space, or solve a non-convex optimization subproblem at each evaluation. The first approach can negatively affect performance, while the second one puts a heavy computational burden on the algorithm. A third option, that only recently has been theoretically studied, is to adaptively discretize the function domain. Even though this approach avoids the extra non-convex optimization costs, the overall computational complexity is still prohibitive. An algorithm such as GP-UCB has a runtime of $O(T^4)$, where $T$ is the number of iterations. In this paper, we introduce Ada-BKB (Adaptive Budgeted Kernelized Bandit), a no-regret Gaussian process optimization algorithm for functions on continuous domains, that provably runs in $O(T^2 d_{\text{eff}}^2)$, where $d_{\text{eff}}$ is the effective dimension of the explored space, and which is typically much smaller than $T$. We corroborate our findings with experiments on synthetic non-convex functions and on the real-world problem of hyper-parameter optimization.

强化学习(3篇)

【1】 Unbiased Methods for Multi-Goal Reinforcement Learning 标题:多目标强化学习的无偏方法

作者:Léonard Blier,Yann Ollivier 备注:9 pages 链接:https://arxiv.org/abs/2106.08863 摘要:在多目标强化学习(RL)设定中,每个目标的奖励是稀疏的,且位于目标的一个小邻域内。在高维情形下,到达奖励的概率趋于零,智能体几乎得不到学习信号。事后经验回放(HER)等方法通过同时从"已实现但非计划"的目标中学习来解决这一问题。但已知HER会引入偏差,并可能因高估偶然结果而收敛到低回报策略。首先,我们为HER正名:证明它在确定性环境(例如许多最优控制设定)中实际上是无偏的。接着,对于连续空间中的随机环境,我们通过直接取无限稀疏奖励的极限来处理稀疏奖励问题。我们完整地形式化了每个目标处具有无限稀疏Dirac奖励的多目标RL问题,提出了能处理这种无限稀疏奖励的无偏深度Q学习和演员-评论家算法,并在玩具环境中对其进行了测试。 摘要:In multi-goal reinforcement learning (RL) settings, the reward for each goal is sparse, and located in a small neighborhood of the goal. In large dimension, the probability of reaching a reward vanishes and the agent receives little learning signal. Methods such as Hindsight Experience Replay (HER) tackle this issue by also learning from realized but unplanned-for goals. But HER is known to introduce bias, and can converge to low-return policies by overestimating chancy outcomes. First, we vindicate HER by proving that it is actually unbiased in deterministic environments, such as many optimal control settings. Next, for stochastic environments in continuous spaces, we tackle sparse rewards by directly taking the infinitely sparse reward limit. We fully formalize the problem of multi-goal RL with infinitely sparse Dirac rewards at each goal. We introduce unbiased deep Q-learning and actor-critic algorithms that can handle such infinitely sparse rewards, and test them in toy environments.

【2】 Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism? 标题:马尔可夫强盗的强化学习:后验抽样比乐观主义更具可扩展性吗?

作者:Nicolas Gast,Bruno Gaujal,Kimang Khun 链接:https://arxiv.org/abs/2106.08771 摘要:我们研究带折扣的经典马尔可夫强盗问题的学习算法,并解释如何调整PSRL[24]和UCRL2[2]以利用问题结构,所得变体称为MB-PSRL和MB-UCRL2。虽然PSRL和UCRL2的朴素实现的遗憾界和运行时间随强盗数量呈指数增长,但我们证明MB-PSRL和MB-UCRL2的情节遗憾为$\tilde{O}(S\sqrt{nK})$,其中$K$是情节数,$n$是强盗数,$S$是每个强盗的状态数(关于$S$、$n$和$K$的精确界在文中给出)。这与我们在文中同时推导的$\Omega(\sqrt{SnK})$下界至多相差一个$\sqrt{S}$因子。MB-PSRL在计算上也很高效:其运行时间与强盗数量成线性关系。我们进一步证明,把UCRL2或UCBVI等经典非贝叶斯算法改造用于马尔可夫强盗问题无法达到这种线性运行时间。最后,数值实验证实,MB-PSRL在遗憾和计算时间两方面都优于其他现有算法。 摘要:We study learning algorithms for the classical Markovian bandit problem with discount. We explain how to adapt PSRL [24] and UCRL2 [2] to exploit the problem structure. These variants are called MB-PSRL and MB-UCRL2. While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$ where $K$ is the number of episodes, $n$ is the number of bandits and $S$ is the number of states of each bandit (the exact bound in $S$, $n$ and $K$ is given in the paper). Up to a factor $\sqrt{S}$, this matches the lower bound of $\Omega(\sqrt{SnK})$ that we also derive in the paper. MB-PSRL is also computationally efficient: its runtime is linear in the number of bandits. We further show that this linear runtime cannot be achieved by adapting classical non-Bayesian algorithms such as UCRL2 or UCBVI to Markovian bandit problems. Finally, we perform numerical experiments that confirm that MB-PSRL outperforms other existing algorithms in practice, both in terms of regret and of computation time.

【3】 Robust Reinforcement Learning Under Minimax Regret for Green Security 标题:面向绿色安全的极小极大遗憾下的鲁棒强化学习

作者:Lily Xu,Andrew Perrault,Fei Fang,Haipeng Chen,Milind Tambe 备注:Accepted at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021. 11 pages, 5 figures 链接:https://arxiv.org/abs/2106.08413 摘要:绿色安全领域的特点是防御者在偷猎者、非法伐木者和非法渔民的敌对行为不确定的情况下计划巡逻。重要的是,巡逻对敌方未来行为的威慑作用使得巡逻计划成为一个序贯决策问题。因此,本文主要研究基于极小极大遗憾(minimax regret)准则的绿色安全鲁棒序贯巡逻规划问题,该问题在已有文献中尚未被考虑。我们将问题建模为防御者与控制对抗行为参数取值的"自然"(nature)之间的博弈,并设计了算法MIRROR来寻找鲁棒策略。MIRROR使用两个基于强化学习的oracle,在只考虑有限防御策略和参数取值的情况下求解一个受限博弈。我们在真实世界的偷猎数据上对MIRROR进行了评估。 摘要:Green security domains feature defenders who plan patrols in the face of uncertainty about the adversarial behavior of poachers, illegal loggers, and illegal fishers. Importantly, the deterrence effect of patrols on adversaries' future behavior makes patrol planning a sequential decision-making problem. Therefore, we focus on robust sequential patrol planning for green security following the minimax regret criterion, which has not been considered in the literature. We formulate the problem as a game between the defender and nature who controls the parameter values of the adversarial behavior and design an algorithm MIRROR to find a robust policy. MIRROR uses two reinforcement learning-based oracles and solves a restricted game considering limited defender strategies and parameter values. We evaluate MIRROR on real-world poaching data.

医学相关(3篇)

【1】 Cardiovascular Disease Prediction using Recursive Feature Elimination and Gradient Boosting Classification Techniques 标题:基于递归特征消除和梯度提升分类技术的心血管疾病预测

作者:Prasannavenkatesan Theerthagiri,Vidya J 备注:20 pages, 9 figures 链接:https://arxiv.org/abs/2106.08889 摘要:心血管疾病是影响人类健康的最常见的慢性疾病之一。早期发现心血管疾病可以通过预防或降低疾病的严重程度来降低死亡率。机器学习算法是一种很有前途的识别风险因素的方法。为了获得准确的心脏病预测,提出了一种基于递归特征消除的梯度提升(RFE-GB)算法。对具有重要心血管疾病特征的患者健康档案进行分析,评价结果。此外,还采用了其他几种机器学习方法建立了预测模型,并与所提出的模型进行了比较。结果表明,递归特征消除和梯度提升相结合的算法具有最高的精度(89.7%)。此外,在曲线下面积为0.84的情况下,所提出的RFE-GB算法被认为是优越的,并且与其他技术相比获得了可观的增益。因此,所提出的RFE-GB算法将成为CVD估计和治疗的重要模型。 摘要:Cardiovascular diseases (CVDs) are one of the most common chronic illnesses that affect peoples health. Early detection of CVDs can reduce mortality rates by preventing or reducing the severity of the disease. Machine learning algorithms are a promising method for identifying risk factors. This paper proposes a recursive feature elimination-based gradient boosting (RFE-GB) algorithm in order to obtain accurate heart disease prediction. The patients' health records with important CVD features have been analyzed for the evaluation of the results. Several other machine learning methods were also used to build the prediction model, and the results were compared with the proposed model. The results of this proposed model infer that the combined recursive feature elimination and gradient boosting algorithm achieves the highest accuracy (89.7%). Further, with an area under the curve of 0.84, the proposed RFE-GB algorithm was found superior and had obtained a substantial gain over other techniques. Thus, the proposed RFE-GB algorithm will serve as a prominent model for CVD estimation and treatment.
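论文的RFE-GB思路可以用scikit-learn的现成组件拼出一个示意版本(数据为随机占位数据,保留的特征数与超参数均为假设,并非论文原始实现):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# 随机生成的"心血管特征"占位数据:500 个样本、13 个特征
X, y = make_classification(n_samples=500, n_features=13,
                           n_informative=6, random_state=0)

pipe = Pipeline([
    # 先用梯度提升的特征重要性做递归特征消除,保留 8 个特征(假设值)
    ("rfe", RFE(GradientBoostingClassifier(random_state=0), n_features_to_select=8)),
    # 再在筛选后的特征上训练最终的梯度提升分类器
    ("clf", GradientBoostingClassifier(random_state=0)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```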

【2】 COVID-19 Vaccines: Characterizing Misinformation Campaigns and Vaccine Hesitancy on Twitter 标题:COVID-19疫苗:描述Twitter上的错误信息宣传和疫苗犹豫

作者:Karishma Sharma,Yizhou Zhang,Yan Liu 链接:https://arxiv.org/abs/2106.08423 摘要:疫苗犹豫和社交媒体上的错误信息增加了人们对获得群体免疫和战胜大流行所需的COVID-19疫苗接种率的担忧。然而,反科学和政治性的错误信息与阴谋论在整个大流行期间一直猖獗。针对COVID-19疫苗,我们调查了错误信息和阴谋论宣传活动及其特征行为。我们确定疫苗相关讨论中是否存在推广错误信息的协同行为,并发现一些账户在协同推广宣扬"大重置"(Great Reset)阴谋论的团体,传播疫苗相关错误信息以及强烈的反疫苗、反社会信息,如抵制疫苗护照、反对封锁和口罩。我们从信息扩散结构的角度刻画了其他错误信息社区,研究了大的反疫苗错误信息社区和较小的反疫苗社区,包括一个极右翼的反疫苗阴谋论团体。与更倾向于支持疫苗的主流新闻、健康新闻以及左倾群体相比,右倾群体更多地受到反疫苗和极右错误信息/阴谋论社区的影响。错误信息社区无论在疫苗讨论还是政治讨论中声音都更大,我们还发现不同社区的特征行为存在其他差异。最后,我们通过主题建模以及与已报告疫苗副作用(VAERS)的比较,研究了可能增加疫苗犹豫的错误信息叙事和信息失真策略,发现较罕见的副作用在社交媒体上被讨论得更频繁。 摘要:Vaccine hesitancy and misinformation on social media has increased concerns about COVID-19 vaccine uptake required to achieve herd immunity and overcome the pandemic. However anti-science and political misinformation and conspiracies have been rampant throughout the pandemic. For COVID-19 vaccines, we investigate misinformation and conspiracy campaigns and their characteristic behaviours. We identify whether coordinated efforts are used to promote misinformation in vaccine related discussions, and find accounts coordinately promoting a `Great Reset' conspiracy group promoting vaccine related misinformation and strong anti-vaccine and anti-social messages such as boycott vaccine passports, no lock-downs and masks. We characterize other misinformation communities from the information diffusion structure, and study the large anti-vaccine misinformation community and smaller anti-vaccine communities, including a far-right anti-vaccine conspiracy group. In comparison with the mainstream and health news, left-leaning group, which are more pro-vaccine, the right-leaning group is influenced more by the anti-vaccine and far-right misinformation/conspiracy communities. The misinformation communities are more vocal either specific to the vaccine discussion or political discussion, and we find other differences in the characteristic behaviours of different communities. Lastly, we investigate misinformation narratives and tactics of information distortion that can increase vaccine hesitancy, using topic modeling and comparison with reported vaccine side-effects (VAERS) finding rarer side-effects are more frequently discussed on social media.

【3】 Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI 标题:实时MRI中基于声道形状动力学的无声语音和情感识别

作者:Laxmi Pandey,Ahmed Sabbir Arif 备注:8 pages 链接:https://arxiv.org/abs/2106.08706 摘要:口语的语音是通过改变发音器在声道周围的结构来获得的。它们包含了丰富的信息,可以用来更好地理解人类言语产生的潜在机制。我们提出了一种新的基于深度神经网络的学习框架,该框架能够理解语音产生过程中声道形状可变长度序列中的声学信息,并通过实时磁共振成像(rtMRI)捕获这些信息,然后将其翻译成文本。提出的框架包括时空卷积、循环网络和连接主义时间分类损失,完全端到端训练。在USC-TIMIT语料库上,该模型在句子级别取得了40.6%的音素错误率(PER),明显优于现有模型。据我们所知,这是第一个基于rtMRI视频捕捉到的个人发音运动来识别整个口语句子的研究。我们还分析了不同情绪和性别下声道各子区域(即咽部、软腭与舌背、硬腭、唇部收缩区)发音几何结构的变化。结果表明,每个子区域的形变都同时受到情绪和性别的影响。 摘要:Speech sounds of spoken language are obtained by varying configuration of the articulators surrounding the vocal tract. They contain abundant information that can be utilized to better understand the underlying mechanism of human speech production. We propose a novel deep neural network-based learning framework that understands acoustic information in the variable-length sequence of vocal tract shaping during speech production, captured by real-time magnetic resonance imaging (rtMRI), and translate it into text. The proposed framework comprises of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. On the USC-TIMIT corpus, the model achieved a 40.6% PER at sentence-level, much better compared to the existing models. To the best of our knowledge, this is the first study that demonstrates the recognition of entire spoken sentence based on an individual's articulatory motions captured by rtMRI video. We also performed an analysis of variations in the geometry of articulation in each sub-regions of the vocal tract (i.e., pharyngeal, velar and dorsal, hard palate, labial constriction region) with respect to different emotions and genders. Results suggest that each sub-region's distortion is affected by both emotion and gender.

蒸馏|知识提取(1篇)

【1】 Topology Distillation for Recommender System 标题:面向推荐系统的拓扑蒸馏

作者:SeongKu Kang,Junyoung Hwang,Wonbin Kweon,Hwanjo Yu 备注:KDD 2021. 9 pages + appendix (2 pages), 8 figures 链接:https://arxiv.org/abs/2106.08700 摘要:推荐系统(RS)采用了知识蒸馏这一模型压缩技术,利用预先训练好的大型教师模型中的知识来训练一个紧凑的学生模型。最近的研究表明,从教师中间层转移知识可以显著提高学生模型的推荐质量。然而,这些方法是逐点地传递单个表示的知识,因而存在一个局限:RS的主要信息在于表示空间中的关系。本文提出了一种新的拓扑蒸馏方法,通过传递建立在教师空间中关系之上的拓扑结构来指导学生模型。我们首先观察到,简单地让学生学习整个拓扑结构并不总是有效的,甚至会降低学生模型的性能。我们证明,由于学生模型的容量与教师相比非常有限,学习整个拓扑结构对学生来说是难以承受的。为了解决这一问题,我们提出了一种新的分层拓扑蒸馏方法(HTD),该方法分层地蒸馏拓扑结构以应对巨大的容量差距。我们在真实数据集上的大量实验表明,该方法明显优于现有的竞争方法。我们还提供了深入的分析,以确认为RS蒸馏拓扑结构的收益。 摘要:Recommender Systems (RS) have employed knowledge distillation which is a model compression technique training a compact student model with the knowledge transferred from a pre-trained large teacher model. Recent work has shown that transferring knowledge from the teacher's intermediate layer significantly improves the recommendation quality of the student. However, they transfer the knowledge of individual representation point-wise and thus have a limitation in that primary information of RS lies in the relations in the representation space. This paper proposes a new topology distillation approach that guides the student by transferring the topological structure built upon the relations in the teacher space. We first observe that simply making the student learn the whole topological structure is not always effective and even degrades the student's performance. We demonstrate that because the capacity of the student is highly limited compared to that of the teacher, learning the whole topological structure is daunting for the student. To address this issue, we propose a novel method named Hierarchical Topology Distillation (HTD) which distills the topology hierarchically to cope with the large capacity gap. Our extensive experiments on real-world datasets show that the proposed method significantly outperforms the state-of-the-art competitors. We also provide in-depth analyses to ascertain the benefit of distilling the topology for RS.

聚类(2篇)

【1】 Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation 标题:基于列表可解码均值估计的近线性时间混合模型聚类

作者:Ilias Diakonikolas,Daniel M. Kane,Daniel Kongsgaard,Jerry Li,Kevin Tian 备注:64 pages, 1 figure 链接:https://arxiv.org/abs/2106.08537 摘要:我们研究了列表可解码均值估计问题,其中对手可以破坏数据集中的大部分样本。具体地说,给定$mathbb{R}^d$中由$n$个点组成的集合$T$和参数$0 < alpha < frac{1}{2}$,使得$T$中$alpha$比例的点是来自良好分布$mathcal{D}$的i.i.d.样本,其余$(1-alpha)$比例的点是任意的。目标是输出一个小的向量列表,其中至少有一个向量接近$mathcal{D}$的均值。作为主要贡献,我们为列表可解码均值估计开发了新算法,在运行时间$n^{1+o(1)}d$内获得接近最优的统计保证。该问题此前的所有算法都带有额外的$frac{1}{alpha}$多项式因子。作为推论,我们得到了首个用于聚类$k$个分离的良好分布混合物的近线性时间算法,几乎匹配谱方法的统计保证。以前的聚类算法本质上依赖于$k$-PCA,因此运行时为$Omega(ndk)$。这是近20年来这一基本统计问题的首次运行时改进。该方法的出发点是$alpha to 1$区域中一个新颖而简单的近线性时间鲁棒均值估计算法,它基于受一次性(one-shot)矩阵乘性权重启发的势函数下降。我们在Diakonikolas等人'18、'20的迭代多重滤波技术的背景下关键性地利用了这一新算法框架,给出了一种利用一维投影同时对点做聚类和降采样的方法,从而绕过了先前算法所需的$k$-PCA子例程。 摘要:We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set $T$ of $n$ points in $mathbb{R}^d$ and a parameter $0 < alpha < frac{1}{2}$ such that an $alpha$-fraction of the points in $T$ are i.i.d. samples from a well-behaved distribution $mathcal{D}$ and the remaining $(1-alpha)$-fraction of the points are arbitrary. The goal is to output a small list of vectors at least one of which is close to the mean of $mathcal{D}$. As our main contribution, we develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees, with running time $n^{1+o(1)} d$. All prior algorithms for this problem had additional polynomial factors in $frac{1}{alpha}$. As a corollary, we obtain the first almost-linear time algorithms for clustering mixtures of $k$ separated well-behaved distributions, nearly-matching the statistical guarantees of spectral methods. Prior clustering algorithms inherently relied on an application of $k$-PCA, thereby incurring runtimes of $Omega(n d k)$. This marks the first runtime improvement for this basic statistical problem in nearly two decades. The starting point of our approach is a novel and simpler near-linear time robust mean estimation algorithm in the $alpha to 1$ regime, based on a one-shot matrix multiplicative weights-inspired potential decrease. We crucially leverage this new algorithmic framework in the context of the iterative multi-filtering technique of Diakonikolas et al. '18, '20, providing a method to simultaneously cluster and downsample points using one-dimensional projections --- thus, bypassing the $k$-PCA subroutines required by prior algorithms.

【2】 Correlation Clustering in Constant Many Parallel Rounds 标题:恒定多个平行轮次的相关聚类

作者:Vincent Cohen-Addad,Silvio Lattanzi,Slobodan Mitrović,Ashkan Norouzi-Fard,Nikos Parotsidis,Jakub Tarnawski 备注:ICML 2021 (long talk) 链接:https://arxiv.org/abs/2106.08448 摘要:相关聚类是无监督学习中的一个核心问题,在机器学习和数据挖掘中有着广泛的应用。在相关聚类中,输入是一个有符号图,目标是对其进行划分以最小化不一致的数目。在这项工作中,我们针对该问题提出了一种大规模并行计算(MPC)算法,其速度明显快于以往工作。特别地,我们的算法所用机器的内存相对于图中节点数是次线性的,并且只需运行常数轮即可返回常数近似解。据我们所知,这是第一个在次线性内存条件下、可证明只用常数轮MPC就能近似求解图上聚类问题的算法。我们还通过对所提技术的实验分析来补充理论分析。 摘要:Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental analysis of our techniques.

自动驾驶|车辆|车道检测等(1篇)

【1】 A Multi-Layered Approach for Measuring the Simulation-to-Reality Gap of Radar Perception for Autonomous Driving 标题:一种多层次测量自动驾驶雷达感知仿真与现实差距的方法

作者:Anthony Ngo,Max Paul Bauer,Michael Resch 备注:Accepted at the 24th IEEE International Conference on Intelligent Transportation Systems (ITSC 2021) 链接:https://arxiv.org/abs/2106.08372 摘要:随着自动驾驶汽车发布的安全验证要求的不断提高,除了传统的真实世界测试之外,还出现了替代方法,例如基于模拟的测试。为了依赖虚拟测试,必须对所采用的传感器模型进行验证。因此,有必要量化模拟与现实之间的差异,以确定某一保真度是否足以满足预期用途。目前还没有一种可靠的方法来测量自动驾驶雷达感知模拟与现实的差距。我们通过引入一种多层评估方法来解决这个问题,该方法包括显式和隐式传感器模型评估的组合。前者直接评估综合生成的传感器数据的真实性,而后者指的是对下游目标应用的评估。为了验证该方法,我们评估了三种典型雷达模型(理想、数据驱动、基于射线跟踪)的保真度及其在基于雷达的多目标跟踪虚拟测试中的适用性。我们已经证明了所提出的方法的有效性,提供了一个深入的传感器模型评估,使现有的差距可见,并使一个现实的估计总体模型保真度在不同的场景。 摘要:With the increasing safety validation requirements for the release of a self-driving car, alternative approaches, such as simulation-based testing, are emerging in addition to conventional real-world testing. In order to rely on virtual tests the employed sensor models have to be validated. For this reason, it is necessary to quantify the discrepancy between simulation and reality in order to determine whether a certain fidelity is sufficient for a desired intended use. There exists no sound method to measure this simulation-to-reality gap of radar perception for autonomous driving. We address this problem by introducing a multi-layered evaluation approach, which consists of a combination of an explicit and an implicit sensor model evaluation. The former directly evaluates the realism of the synthetically generated sensor data, while the latter refers to an evaluation of a downstream target application. In order to demonstrate the method, we evaluated the fidelity of three typical radar model types (ideal, data-driven, ray tracing-based) and their applicability for virtually testing radar-based multi-object tracking. We have shown the effectiveness of the proposed approach in terms of providing an in-depth sensor model assessment that renders existing disparities visible and enables a realistic estimation of the overall model fidelity across different scenarios.

推理|分析|理解|解释(5篇)

【1】 Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation 标题:自我注意的特征分析及其部分计算重构

作者:Srinadh Bhojanapalli,Ayan Chakrabarti,Himanshu Jain,Sanjiv Kumar,Michal Lukasik,Andreas Veit 备注:14 pages 链接:https://arxiv.org/abs/2106.08823 摘要:最先进的Transformer模型使用基于两两点积的自我注意,其计算成本是输入序列长度的二次方。在本文中,我们研究了在一个典型的输入分布上用这种点积机制计算的注意分数的整体结构,并研究了它们变化的主成分。通过对全注意得分矩阵及其各行的特征分析,我们发现注意得分之间的变化主要存在于低维特征空间中。此外,对于不同的层甚至不同的Transformer模型,我们发现这些特征空间之间有明显的重叠。在此基础上,我们提出只计算一部分记号对的分数,并用它们来估计其余记号对的分数。除了调查重建注意分数本身的准确性之外,我们还调查了采用这些近似的训练Transformer模型,并分析了对整体准确性的影响。我们的分析和提出的方法提供了如何平衡精确的成对注意的好处和其显著的计算开销的见解。 摘要:State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed using this dot product mechanism on a typical distribution of inputs, and study the principal components of their variation. Through eigen analysis of full attention score matrices, as well as of their individual rows, we find that most of the variation among attention scores lie in a low-dimensional eigenspace. Moreover, we find significant overlap between these eigenspaces for different layers and even different transformer models. Based on this, we propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs. Beyond investigating the accuracy of reconstructing attention scores themselves, we investigate training transformer models that employ these approximations, and analyze the effect on overall accuracy. Our analysis and the proposed method provide insights into how to balance the benefits of exact pair-wise attention and its significant computational expense.
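文中"注意力得分的变化集中在低维特征空间"这一观察所对应的分析流程,可以用如下numpy草图演示(Q、K取随机矩阵,仅用于展示谱分析与低秩重构的流程;真实模型在自然输入上的注意力矩阵才会表现出文中所述的低秩集中现象):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 128, 64                      # 序列长度、注意力头维度(假设值)
Q, K = rng.normal(size=(n, d)), rng.normal(size=(n, d))

scores = Q @ K.T / np.sqrt(d)       # 点积注意力得分(softmax 之前)

# 对得分矩阵做 SVD,检查谱能量集中在多少个主成分上
U, s, Vt = np.linalg.svd(scores, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.95)) + 1
print(f"覆盖 95% 能量所需的秩: {r} / {min(n, d)}")

# 用前 r 个主成分重构全部得分,对应"只精确计算部分 token 对、其余靠低秩估计"的思路
scores_lowrank = (U[:, :r] * s[:r]) @ Vt[:r]
print("重构相对误差:", np.linalg.norm(scores - scores_lowrank) / np.linalg.norm(scores))
```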

【2】 Analysis and Optimisation of Bellman Residual Errors with Neural Function Approximation 标题:神经函数逼近下Bellman残差的分析与优化

作者:Martin Gottwald,Sven Gronauer,Hao Shen,Klaus Diepold 备注:29 pages, 8 figures 链接:https://arxiv.org/abs/2106.08774 摘要:近年来,深度强化学习的发展证明了神经网络在解决具有较大甚至连续状态空间的挑战性问题方面的优越性能。一种具体的方法是通过最小化均方Bellman误差函数来部署神经网络以逼近值函数。尽管深度强化学习取得了巨大的成功,但开发可靠有效的数值算法来最小化Bellman误差仍然具有重要的科学意义和实际需求。这一挑战部分源于底层优化问题高度非凸,或像半梯度算法那样使用了不正确的梯度信息。在这项工作中,我们从光滑优化的角度、结合残差梯度(Residual Gradient)的表述来分析均方Bellman误差。我们的贡献是双重的。首先,我们分析了误差函数的临界点,并就神经网络的优化过程和设计选择提供了技术见解。当假设全局极小值存在且目标满足一定条件时,利用过参数化神经网络可以消除次优局部极小值。在分析的基础上,我们构造了一个有效的近似牛顿算法,并在数值上证实了该算法的理论性质,例如局部二次收敛到全局极小值。其次,我们用连续控制问题实证地证明了该算法的可行性和泛化能力,并对我们的临界点分析进行了数值验证。我们指出了半梯度方法的不足:为了从近似牛顿算法中获益,训练过程中必须考虑均方Bellman误差的完整导数。 摘要:Recent development of Deep Reinforcement Learning has demonstrated superior performance of neural networks in solving challenging problems with large or even continuous state spaces. One specific approach is to deploy neural networks to approximate value functions by minimising the Mean Squared Bellman Error function. Despite great successes of Deep Reinforcement Learning, development of reliable and efficient numerical algorithms to minimise the Bellman Error is still of great scientific interest and practical demand. Such a challenge is partially due to the underlying optimisation problem being highly non-convex or using incorrect gradient information as done in Semi-Gradient algorithms. In this work, we analyse the Mean Squared Bellman Error from a smooth optimisation perspective combined with a Residual Gradient formulation. Our contribution is two-fold. First, we analyse critical points of the error function and provide technical insights on the optimisation procedure and design choices for neural networks. When the existence of global minima is assumed and the objective fulfils certain conditions we can eliminate suboptimal local minima when using over-parametrised neural networks. We can construct an efficient Approximate Newton's algorithm based on our analysis and confirm theoretical properties of this algorithm such as being locally quadratically convergent to a global minimum numerically. Second, we demonstrate feasibility and generalisation capabilities of the proposed algorithm empirically using continuous control problems and provide a numerical verification of our critical point analysis. We outline the shortcoming of Semi-Gradients. To benefit from an approximate Newton's algorithm complete derivatives of the Mean Squared Bellman error must be considered during training.
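残差梯度与半梯度的区别可以在一个微型线性函数逼近的例子里写清楚(以下为示意性草图,马尔可夫奖励过程、特征与步长均为本速递的假设,并非论文中的神经网络设定):

```python
import numpy as np

# 一个两状态马尔可夫奖励过程,线性值函数 V = Phi @ w
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # 转移矩阵
r = np.array([0.0, 1.0])             # 即时奖励
gamma = 0.9
Phi = np.eye(2)                       # 特征取 one-hot(w 即为 V)

def bellman_residual(w):
    V = Phi @ w
    return V - (r + gamma * P @ V)    # delta = V - T V

def msbe_full_grad(w):
    # 残差梯度:对 delta 中的两处 V 都求导(Baird 的 residual gradient)
    J = Phi - gamma * P @ Phi         # d delta / d w
    return 2 * J.T @ bellman_residual(w) / len(r)

def msbe_semi_grad(w):
    # 半梯度:把目标 r + gamma P V 视为常数,只对第一处 V 求导
    return 2 * Phi.T @ bellman_residual(w) / len(r)

w = np.zeros(2)
for _ in range(3000):
    w -= 2.0 * msbe_full_grad(w)      # 换成 msbe_semi_grad 即可对比两者差异
print("学到的 V:", Phi @ w, " MSBE:", np.mean(bellman_residual(w) ** 2))
```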

【3】 Best of both worlds: local and global explanations with human-understandable concepts 标题:两全其美:使用人类可理解的概念进行本地和全球解释

作者:Jessica Schrouff,Sebastien Baur,Shaobo Hou,Diana Mincu,Eric Loreaux,Ralph Blanes,James Wexler,Alan Karthikesalingam,Been Kim 链接:https://arxiv.org/abs/2106.08641 摘要:可解释性技术旨在提供模型决策背后的基本原理,通常通过解释单个预测(局部解释,例如"为什么这个病人被诊断出这种情况")或一类预测(全局解释,例如"为什么患者总体上会被诊断出这种情况")。虽然有许多方法都侧重于其中一种,但很少有框架能够以一致的方式提供局部和全局解释。在这项工作中,我们结合两个强大的现有技术,一个局部(积分梯度,IG)和一个全局(概念激活向量测试,TCAV),以提供局部和全局的基于概念的解释。我们首先用两个已知地面真实度的合成数据集来验证我们的想法,然后用一个基准自然图像数据集来进一步演示。我们用各种概念、目标类、模型架构和IG基线来测试我们的方法。我们表明,与地面真实值相比,我们的方法改进了TCAV的全局解释,并提供了有用的见解。我们希望我们的工作为在许多现有的局部和全局方法之间建立桥梁提供了一步,以实现两个世界的最佳效果。 摘要:Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. `why is this patient diagnosed with this condition') or a class of predictions (global explanation, e.g. `why are patients diagnosed with this condition in general'). While there are many methods focused on either one, few frameworks can provide both local and global explanations in a consistent manner. In this work, we combine two powerful existing techniques, one local (Integrated Gradients, IG) and one global (Testing with Concept Activation Vectors), to provide local, and global concept-based explanations. We first validate our idea using two synthetic datasets with a known ground truth, and further demonstrate with a benchmark natural image dataset. We test our method with various concepts, target classes, model architectures and IG baselines. We show that our method improves global explanations over TCAV when compared to ground truth, and provides useful insights. We hope our work provides a step towards building bridges between many existing local and global methods to get the best of both worlds.
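作为文中所用的局部解释组件,积分梯度(IG)的核心计算可用几行PyTorch给出(模型、基线与步数均为示意性假设;对线性模型,IG恰好等于 w_i·(x_i − x'_i),可用来自检):

```python
import torch

def integrated_gradients(model, x, baseline, steps=64):
    """沿基线到输入的直线路径,对梯度做黎曼和近似(IG 的极简示意)。"""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = baseline + alphas * (x - baseline)       # (steps, d) 条插值输入
    path.requires_grad_(True)
    model(path).sum().backward()                    # 一次性求整条路径上的梯度
    avg_grad = path.grad.mean(dim=0)                # 路径平均梯度
    return (x - baseline).squeeze(0) * avg_grad     # 逐特征的 IG 归因

# 示意模型:一个线性层;IG 应接近 w * (x - baseline)
model = torch.nn.Linear(3, 1)
x = torch.tensor([[1.0, 2.0, 3.0]])
baseline = torch.zeros_like(x)
print(integrated_gradients(model, x, baseline))
```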

【4】 Machine learning-based analysis of hyperspectral images for automated sepsis diagnosis 标题:基于机器学习的高光谱图像分析在脓毒症自动诊断中的应用

作者:Maximilian Dietrich,Silvia Seidlitz,Nicholas Schreck,Manuel Wiesenfarth,Patrick Godau,Minu Tizabi,Jan Sellner,Sebastian Marx,Samuel Knödler,Michael M. Allers,Leonardo Ayala,Karsten Schmidt,Thorsten Brenner,Alexander Studier-Fischer,Felix Nickel,Beat P. Müller-Stich,Annette Kopp-Schneider,Markus A. Weigand,Lena Maier-Hein 备注:Maximilian Dietrich and Silvia Seidlitz contributed equally. Markus A. Weigand and Lena Maier-Hein contributed equally 链接:https://arxiv.org/abs/2106.08445 摘要:脓毒症是世界范围内导致死亡和严重疾病的主要原因。虽然用于早期诊断的可靠生物标志物仍然缺失,但最近的研究表明,高光谱成像(HSI)有可能通过监测微循环改变来克服这一瓶颈。然而,基于HSI数据、以机器学习进行的脓毒症自动诊断至今尚未被探索。鉴于文献中的这一空白,我们利用现有的数据集(1)研究基于HSI的脓毒症自动诊断是否可行,以及(2)提出一系列与基于HSI的组织分类相关的可能混杂因素。虽然我们能够利用现有数据以超过98%的准确度对脓毒症进行分类,但我们的研究还揭示了几个与受试者、治疗和成像相关的混杂因素,当这些因素在患者组之间不平衡时,可能导致高估算法性能。我们的结论是,有必要针对这些混杂因素精心设计进一步的前瞻性研究,以证实本研究获得的初步结果。 摘要:Sepsis is a leading cause of mortality and critical illness worldwide. While robust biomarkers for early diagnosis are still missing, recent work indicates that hyperspectral imaging (HSI) has the potential to overcome this bottleneck by monitoring microcirculatory alterations. Automated machine learning-based diagnosis of sepsis based on HSI data, however, has not been explored to date. Given this gap in the literature, we leveraged an existing data set to (1) investigate whether HSI-based automated diagnosis of sepsis is possible and (2) put forth a list of possible confounders relevant for HSI-based tissue classification. While we were able to classify sepsis with an accuracy of over $98,%$ using the existing data, our research also revealed several subject-, therapy- and imaging-related confounders that may lead to an overestimation of algorithm performance when not balanced across the patient groups. We conclude that further prospective studies, carefully designed with respect to these confounders, are necessary to confirm the preliminary results obtained in this study.

【5】 Scalable Quasi-Bayesian Inference for Instrumental Variable Regression 标题:工具变量回归的可扩展拟贝叶斯推断

作者:Ziyu Wang,Yuhao Zhou,Tongzheng Ren,Jun Zhu 备注:ZW and YZ contribute equally 链接:https://arxiv.org/abs/2106.08750 摘要:近年来,使用灵活的机器学习模型进行工具变量(IV)回归的研究兴起,但不确定性量化方法的研究还很缺乏。在这项工作中,我们提出了一个可扩展的准贝叶斯过程IV回归,建立在最近发展的核IV模型。与针对IV的贝叶斯建模相反,我们的方法不需要对数据生成过程进行额外的假设,并且产生了一种可伸缩的近似推理算法,其时间开销与相应的点估计方法相当。我们的算法可以进一步扩展到神经网络模型。我们分析了所提出的准后验方法的理论性质,并通过实证分析验证了该方法的竞争性能。 摘要:Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work we present a scalable quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models. Contrary to Bayesian modeling for IV, our approach does not require additional assumptions on the data generating process, and leads to a scalable approximate inference algorithm with time cost comparable to the corresponding point estimation methods. Our algorithm can be further extended to work with neural network models. We analyze the theoretical properties of the proposed quasi-posterior, and demonstrate through empirical evaluation the competitive performance of our method.

检测相关(4篇)

【1】 Early fault detection with multi-target neural networks 标题:基于多目标神经网络的早期故障检测

作者:Angela Meyer 链接:https://arxiv.org/abs/2106.08957 摘要:风力发电在全世界都有强劲的增长。与此同时,能源市场利润率的不断收缩,促使风电场管理者探索降低风机运行和维护成本的方案。基于传感器的状态监测有助于对风电机组子系统进行远程诊断,从而在需要不可预见的维护时实现更快的响应。已有工作提出利用风电机组监控与数据采集(SCADA)系统的数据进行状态监测,并引入了基于SCADA、针对机组状态变量的单任务正常运行模型的故障检测与诊断方法。随着SCADA通道数量的强劲增长,如今监控单台机组就需要数千个独立的单目标模型。多目标学习最近被提出用于限制模型数量。本研究将多目标神经网络应用于传动系部件的早期故障检测任务,并将齿轮轴承故障检测的精度和延迟与最新的单目标方法进行了比较。我们发现,多目标多层感知器(MLP)检测故障的时间至少不晚于单目标MLP,且在许多情况下更早。多目标MLP可以比单目标模型提前几天检测到故障,这可以为维护工作的规划和执行提供显著优势。同时,多目标MLP达到了相同的预测稳定性水平。 摘要:Wind power is seeing a strong growth around the world. At the same time, shrinking profit margins in the energy markets let wind farm managers explore options for cost reductions in the turbine operation and maintenance. Sensor-based condition monitoring facilitates remote diagnostics of turbine subsystems, enabling faster responses when unforeseen maintenance is required. Condition monitoring with data from the turbines' supervisory control and data acquisition (SCADA) systems was proposed and SCADA-based fault detection and diagnosis approaches introduced based on single-task normal operation models of turbine state variables. As the number of SCADA channels has grown strongly, thousands of independent single-target models are in place today for monitoring a single turbine. Multi-target learning was recently proposed to limit the number of models. This study applied multi-target neural networks to the task of early fault detection in drive-train components. The accuracy and delay of detecting gear bearing faults were compared to state-of-the-art single-target approaches. We found that multi-target multi-layer perceptrons (MLPs) detected faults at least as early and in many cases earlier than single-target MLPs. The multi-target MLPs could detect faults up to several days earlier than the single-target models. This can deliver a significant advantage in the planning and performance of maintenance work. At the same time, the multi-target MLPs achieved the same level of prediction stability.
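"一个多目标模型同时监测多路通道"的做法可以这样示意:一个多输出MLP重建多路SCADA状态变量,用重建残差做故障指示(以下为草图,占位数据、网络结构与报警阈值均为本速递的假设,并非论文实现):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 占位 SCADA 数据:输入为 5 个工况量,目标为 8 路部件状态变量
X = rng.normal(size=(2000, 5))
W = rng.normal(size=(5, 8))
Y = X @ W + 0.1 * rng.normal(size=(2000, 8))      # 正常运行数据(线性+噪声,仅作示意)

scaler = StandardScaler().fit(X)
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500,
                     random_state=0).fit(scaler.transform(X), Y)  # 一个模型,多个目标

# 在线监测:残差超过训练残差 4 倍标准差即报警(阈值为假设)
resid_train = Y - model.predict(scaler.transform(X))
threshold = 4 * resid_train.std(axis=0)
x_new = rng.normal(size=(1, 5))
y_new = x_new @ W + np.array([[0, 0, 2.0, 0, 0, 0, 0, 0]])        # 人为在第 3 路注入故障
alarm = np.abs(y_new - model.predict(scaler.transform(x_new))) > threshold
print("报警通道:", np.where(alarm[0])[0])
```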

【2】 Detecting chaos in lineage-trees: A deep learning approach 标题:谱系树中的混沌检测:一种深度学习方法

作者:Hagai Rappeport,Irit Levin Reisman,Naftali Tishby,Nathalie Q. Balaban 备注:12 pages, 7 figures 链接:https://arxiv.org/abs/2106.08956 摘要:许多复杂的现象,从天气系统到心跳节律模式,都被有效地模拟为低维动力系统。这类系统在一定条件下可能表现出混沌行为,因此基于经验测量的混沌检测能力是表征和预测这些过程的一个重要步骤。将一个系统分类为混沌通常需要估计其最大Lyapunov指数,该指数量化了状态空间中初始闭合轨迹的平均收敛或发散速率,并且通常将正值作为混沌的操作定义。在受动态噪声影响的系统中,从过程的观测值估计最大Lyapunov指数尤其具有挑战性,这对于许多实际过程的模型,特别是生物系统的模型都是如此。我们描述了一种基于综合生成的轨迹训练深度学习模型的从数据中估计最大Lyapunov指数的新方法,并且证明了该方法在相对较短的输入和一系列不同的动态系统中产生准确和噪声鲁棒的预测。我们的方法是独特的,因为它可以分析树形数据,在生物环境中无处不在的拓扑结构,特别是在细胞或生物体的谱系动力学。我们还描述了模型为预测而提取的输入信息的类型,从而可以更深入地理解在不同拓扑中分析混沌的不同方法。 摘要:Many complex phenomena, from weather systems to heartbeat rhythm patterns, are effectively modeled as low-dimensional dynamical systems. Such systems may behave chaotically under certain conditions, and so the ability to detect chaos based on empirical measurement is an important step in characterizing and predicting these processes. Classifying a system as chaotic usually requires estimating its largest Lyapunov exponent, which quantifies the average rate of convergence or divergence of initially close trajectories in state space, and for which a positive value is generally accepted as an operational definition of chaos. Estimating the largest Lyapunov exponent from observations of a process is especially challenging in systems affected by dynamical noise, which is the case for many models of real-world processes, in particular models of biological systems. We describe a novel method for estimating the largest Lyapunov exponent from data, based on training Deep Learning models on synthetically generated trajectories, and demonstrate that this method yields accurate and noise-robust predictions given relatively short inputs and across a range of different dynamical systems. Our method is unique in that it can analyze tree-shaped data, a ubiquitous topology in biological settings, and specifically in dynamics over lineages of cells or organisms. We also characterize the types of input information extracted by our models for their predictions, allowing for a deeper understanding into the different ways by which chaos can be analyzed in different topologies.

【3】 ModelDiff: Testing-Based DNN Similarity Comparison for Model Reuse Detection 标题:ModelDiff:基于测试的DNN相似度比较模型重用检测

作者:Yuanchun Li,Ziqi Zhang,Bingyan Liu,Ziyue Yang,Yunxin Liu 备注:ISSTA 2021 链接:https://arxiv.org/abs/2106.08890 摘要:深度学习模式的知识可能会转移到学生模式,导致知识产权侵权或漏洞传播。检测这样的知识重用是非常重要的,因为可疑的模型可能不是白盒可访问的和/或可能服务于不同的任务。本文提出了一种基于测试的深度学习模型相似性比较方法ModelDiff。我们不是直接比较两个模型的权重、激活或输出,而是在同一组测试输入上比较它们的行为模式。具体来说,模型的行为模式被表示为决策距离向量(DDV),其中每个元素是模型对一对输入的反应之间的距离。两个模型之间的知识相似度用其ddv之间的余弦相似度来度量。为了评估ModelDiff,我们创建了一个包含144对模型的基准,这些模型涵盖了最流行的模型重用方法,包括转移学习、模型压缩和模型窃取。我们的方法在基准测试上达到了91.7%的正确率,证明了使用ModelDiff进行模型重用检测的有效性。一项针对移动深度学习应用程序的研究表明,ModelDiff在现实世界模型上是可行的。 摘要:The knowledge of a deep learning model may be transferred to a student model, leading to intellectual property infringement or vulnerability propagation. Detecting such knowledge reuse is nontrivial because the suspect models may not be white-box accessible and/or may serve different tasks. In this paper, we propose ModelDiff, a testing-based approach to deep learning model similarity comparison. Instead of directly comparing the weights, activations, or outputs of two models, we compare their behavioral patterns on the same set of test inputs. Specifically, the behavioral pattern of a model is represented as a decision distance vector (DDV), in which each element is the distance between the model's reactions to a pair of inputs. The knowledge similarity between two models is measured with the cosine similarity between their DDVs. To evaluate ModelDiff, we created a benchmark that contains 144 pairs of models that cover most popular model reuse methods, including transfer learning, model compression, and model stealing. Our method achieved 91.7% correctness on the benchmark, which demonstrates the effectiveness of using ModelDiff for model reuse detection. A study on mobile deep learning apps has shown the feasibility of ModelDiff on real-world models.
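决策距离向量(DDV)本身的构造很轻量,下面用numpy给出一个示意(以模型输出向量代表"反应",距离取余弦距离;该距离选择是本文档的假设,仅用于说明思路):

```python
import numpy as np

def ddv(model, inputs):
    """决策距离向量:模型对每一对测试输入的反应之间的距离。"""
    outs = np.stack([model(x) for x in inputs])           # (n, c) 个输出向量
    n, dists = len(outs), []
    for i in range(n):
        for j in range(i + 1, n):
            u, v = outs[i], outs[j]
            cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            dists.append(1.0 - cos)                        # 余弦距离
    return np.array(dists)

def knowledge_similarity(model_a, model_b, inputs):
    a, b = ddv(model_a, inputs), ddv(model_b, inputs)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# 示意:B 是 A 的缩放版(模拟知识重用),C 是无关模型
rng = np.random.default_rng(0)
Wa, Wc = rng.normal(size=(10, 5)), rng.normal(size=(10, 5))
A, B, C = (lambda x: x @ Wa), (lambda x: 0.5 * (x @ Wa)), (lambda x: x @ Wc)
tests = rng.normal(size=(16, 10))
print("sim(A,B)=", knowledge_similarity(A, B, tests))      # 接近 1
print("sim(A,C)=", knowledge_similarity(A, C, tests))      # 明显更低
```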

【4】 Comparison of Outlier Detection Techniques for Structured Data 标题:结构化数据孤立点检测技术的比较

作者:Amulya Agarwal,Nitin Gupta 备注:13 pages 链接:https://arxiv.org/abs/2106.08779 摘要:离群点是指一个观测值或一个数据点远离给定数据集中的其余数据点,或者我们可以说离群点远离观测值的质心。异常值的存在会扭曲统计度量和数据分布,从而导致对基础数据和关系的错误表示。可以看出,在建模之前从训练数据集中去除异常值可以得到更好的预测。随着机器学习的发展,离群点检测模型也在飞速发展。这项工作的目的是突出和比较一些现有的离群点检测技术,以便数据科学家在建立机器学习模型时使用这些信息来选择离群点算法。 摘要:An outlier is an observation or a data point that is far from rest of the data points in a given dataset or we can be said that an outlier is away from the center of mass of observations. Presence of outliers can skew statistical measures and data distributions which can lead to misleading representation of the underlying data and relationships. It is seen that the removal of outliers from the training dataset before modeling can give better predictions. With the advancement of machine learning, the outlier detection models are also advancing at a good pace. The goal of this work is to highlight and compare some of the existing outlier detection techniques for the data scientists to use that information for outlier algorithm selection while building a machine learning model.
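用scikit-learn可以几行代码并排比较几类常见的离群点检测器(数据与污染率均为假设,仅作示意):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
X_in = rng.normal(size=(300, 2))                 # 正常点
X_out = rng.uniform(-6, 6, size=(15, 2))         # 人为注入的离群点
X = np.vstack([X_in, X_out])
y_true = np.r_[np.ones(300), -np.ones(15)]       # sklearn 约定:-1 表示离群

detectors = {
    "IsolationForest": IsolationForest(contamination=0.05, random_state=0),
    "LocalOutlierFactor": LocalOutlierFactor(contamination=0.05),
    "OneClassSVM": OneClassSVM(nu=0.05),
    "EllipticEnvelope": EllipticEnvelope(contamination=0.05, random_state=0),
}
for name, det in detectors.items():
    y_pred = det.fit_predict(X)
    recall = np.mean(y_pred[y_true == -1] == -1)  # 检出真实离群点的比例
    print(f"{name:20s} 离群点召回率: {recall:.2f}")
```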

分类|识别(5篇)

【1】 Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data 标题:具有小的强标注数据和大的弱标注数据的命名实体识别

作者:Haoming Jiang,Danqing Zhang,Tianyu Cao,Bing Yin,Tuo Zhao 备注:The 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021) 链接:https://arxiv.org/abs/2106.08977 摘要:弱监督在许多自然语言处理任务中显示出了良好的效果,如命名实体识别(NER)。现有的研究主要集中在仅用弱监督(即不需要任何人工标注)学习深度NER模型,并表明仅使用弱标注数据就可以获得很好的性能,但仍不如使用人工/强标注数据的完全监督NER。在本文中,我们考虑一个更实际的场景:我们同时拥有少量的强标注数据和大量的弱标注数据。不幸的是,我们观察到,当我们在强标注和弱标注数据的简单或加权组合上训练深度NER模型时,弱标注数据不一定能改善模型性能,甚至会使其恶化(由于弱标签中的大量噪声)。为了解决这个问题,我们提出了一个新的多阶段计算框架NEEDLE,它包含三个基本要素:(1)弱标签补全;(2)噪声感知损失函数;(3)在强标注数据上的最终微调。通过在电子商务查询NER和生物医学NER上的实验,证明了NEEDLE能够有效地抑制弱标签的噪声,并优于现有方法。特别是,我们在3个生物医学NER数据集上获得了新的SOTA F1分数:BC5CDR-chem 93.74、BC5CDR-disease 90.69、NCBI-disease 92.28。 摘要:Weak supervision has shown promising results in many natural language processing tasks, such as Named Entity Recognition (NER). Existing work mainly focuses on learning deep NER models only with weak supervision, i.e., without any human annotation, and shows that by merely using weakly labeled data, one can achieve good performance, though still underperforms fully supervised NER with manually/strongly labeled data. In this paper, we consider a more practical scenario, where we have both a small amount of strongly labeled data and a large amount of weakly labeled data. Unfortunately, we observe that weakly labeled data does not necessarily improve, or even deteriorate the model performance (due to the extensive noise in the weak labels) when we train deep NER models over a simple or weighted combination of the strongly labeled and weakly labeled data. To address this issue, we propose a new multi-stage computational framework -- NEEDLE with three essential ingredients: (1) weak label completion, (2) noise-aware loss function, and (3) final fine-tuning over the strongly labeled data. Through experiments on E-commerce query NER and Biomedical NER, we demonstrate that NEEDLE can effectively suppress the noise of the weak labels and outperforms existing methods. In particular, we achieve new SOTA F1-scores on 3 Biomedical NER datasets: BC5CDR-chem 93.74, BC5CDR-disease 90.69, NCBI-disease 92.28.

【2】 Multi-Class Classification from Single-Class Data with Confidences 标题:基于置信度的单类数据的多类分类

作者:Yuzhou Cao,Lei Feng,Senlin Shu,Yitian Xu,Bo An,Gang Niu,Masashi Sugiyama 备注:23 pages, 1 figure 链接:https://arxiv.org/abs/2106.08864 摘要:我们能从单个类的数据中学习多类分类器吗?我们证明了在没有损失函数、模型和优化器的任何假设的情况下,当置信度(即所有类的类后验概率)可用时,我们可以在严格的一致性保证下从单个类的数据中成功地学习多类分类器。具体来说,我们提出了一个与损失/模型/优化器无关的经验风险最小化框架。该方法不需要在给定的类和其他类之间建立边界,即使没有其他类的数据,也可以在所有类之间进行区分分类。我们进一步从理论和实验上证明,即使所提供的置信度是高噪声的,我们的方法也能与一个简单的修正保持Bayes一致。然后,我们提供了一个扩展我们的方法的情况下,数据从一个子集的所有类是可用的。实验结果证明了该方法的有效性。 摘要:Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an empirical risk minimization framework that is loss-/model-/optimizer-independent. Instead of constructing a boundary between the given class and other classes, our method can conduct discriminative classification between all the classes even if no data from the other classes are provided. We further theoretically and experimentally show that our method can be Bayes-consistent with a simple modification even if the provided confidences are highly noisy. Then, we provide an extension of our method for the case where data from a subset of all the classes are available. Experimental results demonstrate the effectiveness of our methods.

【3】 A Dataset-Level Geometric Framework for Ensemble Classifiers 标题:一种面向集成分类器的数据集级几何框架

作者:Shengli Wu,Weimin Ding 备注:32 pages, 2 figures 链接:https://arxiv.org/abs/2106.08658 摘要:集成分类器在人工智能和机器学习领域已经得到了广泛的研究。多数投票和加权多数投票是集成学习中两种常用的组合方案。然而,对它们的理解充其量是不完整的,有些性质甚至被误解。本文在数据集层次的几何框架下,形式化地给出了这两种方案的一组性质。两个关键因素,每一个组件基分类器的性能和每一对组件分类器之间的不相似性是由相同的度量-欧氏距离来评估的。因此,集成成为一个确定性问题,集成的性能可以通过一个公式直接计算。我们证明了几个有趣的定理,并解释了它们对集合的含义。特别地,我们比较和对比了组件分类器数目对这两种集成方案的影响。在使用其他指标(如准确度)时,也进行了实证研究以验证理论结果。我们相信,本文的结果对于我们理解这两种组合方案的基本性质以及总体上集成分类器的原理是非常有用的。这些结果也有助于我们研究集成分类器中的一些问题,如集成性能预测、选择少量的基分类器以获得高效的集成。 摘要:Ensemble classifiers have been investigated by many in the artificial intelligence and machine learning community. Majority voting and weighted majority voting are two commonly used combination schemes in ensemble learning. However, understanding of them is incomplete at best, with some properties even misunderstood. In this paper, we present a group of properties of these two schemes formally under a dataset-level geometric framework. Two key factors, every component base classifier's performance and dissimilarity between each pair of component classifiers are evaluated by the same metric - the Euclidean distance. Consequently, ensembling becomes a deterministic problem and the performance of an ensemble can be calculated directly by a formula. We prove several theorems of interest and explain their implications for ensembles. In particular, we compare and contrast the effect of the number of component classifiers on these two types of ensemble schemes. Empirical investigation is also conducted to verify the theoretical results when other metrics such as accuracy are used. We believe that the results from this paper are very useful for us to understand the fundamental properties of these two combination schemes and the principles of ensemble classifiers in general. The results are also helpful for us to investigate some issues in ensemble classifiers, such as ensemble performance prediction, selecting a small number of base classifiers to obtain efficient and effective ensembles.

【4】 Silhouettes and quasi residual plots for neural nets and tree-based classifiers 标题:神经网络和树型分类器的轮廓图和拟残差图

作者:Jakob Raymaekers,Peter J. Rousseeuw 链接:https://arxiv.org/abs/2106.08814 摘要:神经网络分类和基于树的分类方法是机器学习的有力工具。对这些分类器以及其他分类器的内部工作机制已有一些有趣的可视化。在这里,我们追求一个不同的目标,即可视化训练数据或测试数据中被分类的案例。一个重要的方面是,一个案例是被分类到它的给定类(标签),还是分类器想要将它分配到不同的类。这反映在替代类的(条件和后验)概率(PAC)中。高PAC表示标签偏差,即案例被错误标记的可能性。PAC被用来构建轮廓图,其思想与聚类分析中的轮廓图(Rousseeuw, 1987)类似。平均轮廓宽度可用于比较同一数据集的不同分类。我们还将绘制PAC相对于数据特征的准残差图,这可能带来对数据的更深入了解;其中一个数据特征是每个案例离其给定类有多远。这些图形显示在包含图像、混合特征和tweet的基准数据集上得到了说明和解释。 摘要:Classification by neural nets and by tree-based methods are powerful tools of machine learning. There exist interesting visualizations of the inner workings of these and other classifiers. Here we pursue a different goal, which is to visualize the cases being classified, either in training data or in test data. An important aspect is whether a case has been classified to its given class (label) or whether the classifier wants to assign it to different class. This is reflected in the (conditional and posterior) probability of the alternative class (PAC). A high PAC indicates label bias, i.e. the possibility that the case was mislabeled. The PAC is used to construct a silhouette plot which is similar in spirit to the silhouette plot for cluster analysis (Rousseeuw, 1987). The average silhouette width can be used to compare different classifications of the same dataset. We will also draw quasi residual plots of the PAC versus a data feature, which may lead to more insight in the data. One of these data features is how far each case lies from its given class. The graphical displays are illustrated and interpreted on benchmark data sets containing images, mixed features, and tweets.

【5】 Lorenz System State Stability Identification using Neural Networks 标题:基于神经网络的Lorenz系统状态稳定性辨识

作者:Megha Subramanian,Ramakrishna Tipireddy,Samrat Chatterjee 链接:https://arxiv.org/abs/2106.08489 摘要:非线性动力学系统,如Lorenz63方程,在本质上是混沌的,对初始条件敏感。结果,初始条件中的一个小扰动在几个时间步之后导致状态轨迹的偏差。精确识别系统状态所需的算法和计算资源取决于解是否处于过渡区。我们把过渡区和非过渡区分别称为不稳定区和稳定区。如果一个系统状态的最近过去和未来状态都位于同一状态域(regime)中,我们就将其标记为稳定。然而,在给定的时间步,我们并不预先知道系统是处于稳定区域还是不稳定区域。本文提出并训练一个前馈(多层感知器)神经网络,将Lorenz系统的状态分为稳定和不稳定两类。我们把这个任务作为一个有监督学习问题:在状态已标记为稳定或不稳定的Lorenz系统上训练神经网络,然后测试该神经网络模型在另一个由不同初始条件生成的Lorenz系统上识别稳定和不稳定状态的能力。我们还评估了不匹配情况下的分类性能,即训练和验证数据的初始条件从不同区间采样的情形。我们证明了某些归一化方案可以极大地提高神经网络的性能,尤其是在这些不匹配的情形下。本文提出的分类框架可以作为更大范围的序贯决策框架的预处理器,在该框架中基于观测到的稳定或不稳定状态进行决策。 摘要:Nonlinear dynamical systems such as Lorenz63 equations are known to be chaotic in nature and sensitive to initial conditions. As a result, a small perturbation in the initial conditions results in deviation in state trajectory after a few time steps. The algorithms and computational resources needed to accurately identify the system states vary depending on whether the solution is in transition region or not. We refer to the transition and non-transition regions as unstable and stable regions respectively. We label a system state to be stable if it's immediate past and future states reside in the same regime. However, at a given time step we don't have the prior knowledge about whether system is in stable or unstable region. In this paper, we develop and train a feed forward (multi-layer perceptron) Neural Network to classify the system states of a Lorenz system as stable and unstable. We pose this task as a supervised learning problem where we train the neural network on Lorenz system which have states labeled as stable or unstable. We then test the ability of the neural network models to identify the stable and unstable states on a different Lorenz system that is generated using different initial conditions. We also evaluate the classification performance in the mismatched case i.e., when the initial conditions for training and validation data are sampled from different intervals. We show that certain normalization schemes can greatly improve the performance of neural networks in especially these mismatched scenarios. The classification framework developed in the paper can be a preprocessor for a larger context of sequential decision making framework where the decision making is performed based on observed stable or unstable states.
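下述草图展示了这一监督学习流程的骨架:数值积分Lorenz63、按示意性规则打稳定/不稳定标签、再训练一个MLP分类器(其中"以x的符号区分吸引子的两个'翼'"只是本文档为演示而作的状态域简化假设,论文中的精确标注规则见原文):

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def lorenz(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = u
    return [sigma * (y - x), x * (rho - z), x * y - beta * z]

sol = solve_ivp(lorenz, (0, 200), [1.0, 1.0, 1.0],
                t_eval=np.arange(0, 200, 0.01))
traj = sol.y.T[2000:]                        # 丢弃暂态部分

# 示意性标签:以 x 的符号区分两个"翼";前后 k 步都同翼则记为稳定
k, w = 20, 10
wing = np.sign(traj[:, 0])
X, y = [], []
for t in range(w + k, len(traj) - k):
    stable = np.all(wing[t - k:t + k + 1] == wing[t])
    X.append(traj[t - w:t].ravel())           # 最近 w 步的状态作为特征
    y.append(int(stable))
X, y = np.array(X), np.array(y)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                    random_state=0).fit(Xtr, ytr)
print("测试准确率:", clf.score(Xte, yte))
```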

表征(2篇)

【1】 Evolving Image Compositions for Feature Representation Learning 标题:用于特征表征学习的进化图像合成

作者:Paola Cascante-Bonilla,Arshdeep Sekhon,Yanjun Qi,Vicente Ordonez 链接:https://arxiv.org/abs/2106.09011 摘要:用于视觉识别的卷积神经网络需要大量的训练样本,并且通常受益于数据增强。本文提出了PatchMix,一种数据增强方法,通过将成对的图像按网格模式组合成补丁来创建新的样本。这些新样本的地面真值标签设置为与每个图像的补丁数成比例。然后,我们在补丁级别添加一组额外的损失,以在补丁和图像级别进行正则化并鼓励良好的表示。使用PatchMix在ImageNet上训练的ResNet-50模型在一系列基准测试中表现出了优越的迁移学习能力。虽然PatchMix可以依赖于随机配对和随机网格模式进行混合,但我们探索了进化搜索作为一种指导策略,来联合发现最佳网格模式和图像配对。为此,我们设计了一个适应度函数,它绕过了重新训练模型来评估每个选择的需要。通过这种方式,PatchMix在CIFAR-10(+1.91)、CIFAR-100(+5.31)、Tiny ImageNet(+3.52)和ImageNet(+1.16)上以显著优势超过基础模型,也优于以前最先进的成对增强策略。 摘要:Convolutional neural networks for visual recognition require large amounts of training samples and usually benefit from data augmentation. This paper proposes PatchMix, a data augmentation method that creates new samples by composing patches from pairs of images in a grid-like pattern. These new samples' ground truth labels are set as proportional to the number of patches from each image. We then add a set of additional losses at the patch-level to regularize and to encourage good representations at both the patch and image levels. A ResNet-50 model trained on ImageNet using PatchMix exhibits superior transfer learning capabilities across a wide array of benchmarks. Although PatchMix can rely on random pairings and random grid-like patterns for mixing, we explore evolutionary search as a guiding strategy to discover optimal grid-like patterns and image pairing jointly. For this purpose, we conceive a fitness function that bypasses the need to re-train a model to evaluate each choice. In this way, PatchMix outperforms a base model on CIFAR-10 (+1.91), CIFAR-100 (+5.31), Tiny Imagenet (+3.52), and ImageNet (+1.16) by significant margins, also outperforming previous state-of-the-art pairwise augmentation strategies.
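PatchMix"网格拼补丁、按占比设软标签"的机制可以用如下numpy草图说明(网格数与混合概率均为假设,仅演示数据增强本身,不含论文中的补丁级损失与进化搜索):

```python
import numpy as np

def patchmix(img_a, img_b, grid=4, p=0.5, rng=np.random.default_rng(0)):
    """按 grid×grid 网格随机从两张图各取若干补丁拼成新样本。

    返回混合图及 img_a 所占补丁比例(用作软标签权重)。
    """
    h, w = img_a.shape[:2]
    assert h % grid == 0 and w % grid == 0
    ph, pw = h // grid, w // grid
    mask = rng.random((grid, grid)) < p            # True 处取 img_a 的补丁
    out = img_b.copy()
    for i in range(grid):
        for j in range(grid):
            if mask[i, j]:
                out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = img_a[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
    lam = mask.mean()                               # 软标签:label = lam*y_a + (1-lam)*y_b
    return out, lam

# 示意用法:32×32 的两张"图"
a, b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
mixed, lam = patchmix(a, b)
print("img_a 占比:", lam)
```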

【2】 RefBERT: Compressing BERT by Referencing to Pre-computed Representations 标题:RefBERT:通过参考预计算表示来压缩BERT

作者:Xinyi Wang,Haiqin Yang,Liang Zhao,Yang Mo,Jianping Shen 备注:8 pages, 1 figure, 3 tables, in IJCNN'21 链接:https://arxiv.org/abs/2106.08898 摘要:最近开发的大型预训练语言模型,如BERT,在许多下游自然语言处理应用中取得了显著的效果。这些预先训练好的语言模型通常包含数亿个参数,并且在实际应用中存在计算量大和延迟高的问题。为了实现快速训练和推理,需要在保持模型在下游应用中性能的同时,减少模型的计算开销。已有多项工作利用知识蒸馏将教师模型压缩为较小的学生模型,然而它们在推理时通常会丢弃教师的知识。与之不同,本文提出RefBERT来利用从教师那里学到的知识,即借助参考样本上预先计算的BERT表示,将BERT压缩成一个较小的学生模型。为了支撑我们的方案,我们对损失函数和参考样本的使用提供了理论依据。值得注意的是,理论结果表明,引入参考样本上预先计算的教师表示确实增加了学习学生模型时的互信息。最后,我们进行了实证评估,结果表明RefBERT比原版TinyBERT高出8.1%以上,并在GLUE基准上达到BERT$_{rm BASE}$性能的94%以上。同时,RefBERT比BERT$_{rm BASE}$小7.4倍,推理速度快9.5倍。 摘要:Recently developed large pre-trained language models, e.g., BERT, have achieved remarkable performance in many downstream natural language processing applications. These pre-trained language models often contain hundreds of millions of parameters and suffer from high computation and latency in real-world applications. It is desirable to reduce the computation overhead of the models for fast training and inference while keeping the model performance in downstream applications. Several lines of work utilize knowledge distillation to compress the teacher model to a smaller student model. However, they usually discard the teacher's knowledge when in inference. Differently, in this paper, we propose RefBERT to leverage the knowledge learned from the teacher, i.e., facilitating the pre-computed BERT representation on the reference sample and compressing BERT into a smaller student model. To guarantee our proposal, we provide theoretical justification on the loss function and the usage of reference samples. Significantly, the theoretical result shows that including the pre-computed teacher's representations on the reference samples indeed increases the mutual information in learning the student model. Finally, we conduct the empirical evaluation and show that our RefBERT can beat the vanilla TinyBERT over 8.1% and achieves more than 94% of the performance of BERT$_{rm BASE}$ on the GLUE benchmark. Meanwhile, RefBERT is 7.4x smaller and 9.5x faster on inference than BERT$_{rm BASE}$.

优化|敛散性(6篇)

【1】 Optimized ensemble deep learning framework for scalable forecasting of dynamics containing extreme events 标题:用于包含极端事件的动态的可扩展预测的优化集成深度学习框架

作者:Arnob Ray,Tanujit Chakraborty,Dibakar Ghosh 备注:14 pages, 8 figures, any comments are welcome 链接:https://arxiv.org/abs/2106.08968 摘要:深度学习模型和集成方法的显著灵活性和适应性导致了它们在理解许多物理现象方面的广泛应用。传统上,这两种技术在实际应用中基本上被视为独立的方法。本研究开发了一个优化的集成深度学习(OEDL)框架,其中这两种机器学习技术被联合使用,以实现模型精度、稳定性、可扩展性和可再现性的协同改进,从而推动了动力学预测的新一轮应用。不可预测性是混沌动力学的一个重要特征,因此预测非线性系统的此类动力学是科学界关注的课题;当极端事件的预测成为关注焦点时,这一任务变得更具挑战性。在这种情况下,基于前馈神经网络、储层计算和长短期记忆的最佳凸组合的OEDL模型可以在推进极端事件动力学预测中发挥关键作用。对于数值模拟和真实世界的数据集,组合框架都能产生比单个深度学习器和标准集成框架更好的样本外性能。我们展示了OEDL框架在预测Lienard型系统产生的极端事件、巴西COVID-19病例、圣胡安登革热病例和尼诺3.4地区海面温度方面的突出表现。 摘要:The remarkable flexibility and adaptability of both deep learning models and ensemble methods have led to the proliferation for their application in understanding many physical phenomena. Traditionally, these two techniques have largely been treated as independent methodologies in practical applications. This study develops an optimized ensemble deep learning (OEDL) framework wherein these two machine learning techniques are jointly used to achieve synergistic improvements in model accuracy, stability, scalability, and reproducibility prompting a new wave of applications in the forecasting of dynamics. Unpredictability is considered as one of the key features of chaotic dynamics, so forecasting such dynamics of nonlinear systems is a relevant issue in the scientific community. It becomes more challenging when the prediction of extreme events is the focus issue for us. In this circumstance, the proposed OEDL model based on a best convex combination of feed-forward neural networks, reservoir computing, and long short-term memory can play a key role in advancing predictions of dynamics consisting of extreme events. The combined framework can generate the best out-of-sample performance than the individual deep learners and standard ensemble framework for both numerically simulated and real world data sets. We exhibit the outstanding performance of the OEDL framework for forecasting extreme events generated from Lienard-type system, prediction of COVID-19 cases in Brazil, dengue cases in San Juan, and sea surface temperature in Nino 3.4 region.
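文中"基学习器的最佳凸组合"可以归结为单纯形上的一个小型优化问题,下面给出示意(以三组虚构的验证集预测代替FFNN、储层计算与LSTM的输出;损失函数与求解器选择均为本文档的假设):

```python
import numpy as np
from scipy.optimize import minimize

def best_convex_combination(preds, y):
    """在单纯形上求使验证 MSE 最小的基学习器凸组合权重。

    preds: (m, n) —— m 个基学习器对 n 个验证点的预测。
    """
    m = preds.shape[0]
    loss = lambda w: np.mean((w @ preds - y) ** 2)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)   # 权重和为 1
    res = minimize(loss, np.full(m, 1.0 / m), bounds=[(0, 1)] * m,
                   constraints=cons, method="SLSQP")
    return res.x

# 示意:三个"基学习器"的验证集预测
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 6, 200))
preds = np.stack([y + 0.1 * rng.normal(size=200),    # 较准的学习器
                  y + 0.5 * rng.normal(size=200),    # 噪声较大的学习器
                  0.8 * y + 0.2])                    # 有系统偏差的学习器
print("最优凸组合权重:", np.round(best_convex_combination(preds, y), 3))
```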

【2】 Optimal Accounting of Differential Privacy via Characteristic Function 标题:基于特征函数的差分隐私最优计费

作者:Yuqing Zhu,Jinshuo Dong,Yu-Xiang Wang 链接:https://arxiv.org/abs/2106.08567 摘要:在差分隐私(DP)中,描述组合下的隐私退化,即隐私核算,是一个基础课题,在差分隐私机器学习和联邦学习中有许多应用。我们建议通过某个"最坏情况"隐私损失随机变量的特征函数($phi$-函数)来统一最近的进展(Renyi DP、隐私配置文件、$f$-DP和PLD形式化)。我们表明,我们的方法像Renyi DP一样支持自然的自适应组合,像PLD一样提供严格精确的隐私核算,并且可以(通常是无损地)转换为隐私配置文件和$f$-DP,从而提供$(epsilon, delta)$-DP保证和可解释的折衷函数。在算法上,我们提出了一种解析傅立叶核算器,它符号化地表示$phi$-函数的复对数,并使用高斯求积进行数值计算。在几个流行的DP机制及其子采样版本上,我们从理论和实验两方面证明了我们方法的灵活性和严密性。 摘要:Characterizing the privacy degradation over compositions, i.e., privacy accounting, is a fundamental topic in differential privacy (DP) with many applications to differentially private machine learning and federated learning. We propose a unification of recent advances (Renyi DP, privacy profiles, $f$-DP and the PLD formalism) via the characteristic function ($phi$-function) of a certain ``worst-case'' privacy loss random variable. We show that our approach allows natural adaptive composition like Renyi DP, provides exactly tight privacy accounting like PLD, and can be (often losslessly) converted to privacy profile and $f$-DP, thus providing $(epsilon,delta)$-DP guarantees and interpretable tradeoff functions. Algorithmically, we propose an analytical Fourier accountant that represents the complex logarithm of $phi$-functions symbolically and uses Gaussian quadrature for numerical computation. On several popular DP mechanisms and their subsampled counterparts, we demonstrate the flexibility and tightness of our approach in theory and experiments.

【3】 Non-PSD Matrix Sketching with Applications to Regression and Optimization 标题:非PSD矩阵绘制及其在回归和优化中的应用

作者:Zhili Feng,Fred Roosta,David P. Woodruff 链接:https://arxiv.org/abs/2106.08544 摘要:各种降维技术已被应用于涉及大型矩阵的计算:底层矩阵被随机压缩成一个较小的矩阵,同时大致保留其许多原始性质,于是许多昂贵的计算可以在小矩阵上执行。半正定(PSD)矩阵的草图化已经被很好地理解,但在许多应用中相关的矩阵并不是PSD的,包括非凸优化中的Hessian矩阵和涉及复数的回归应用中的协方差矩阵。本文提出了针对非PSD矩阵及其"平方根"(涉及复数元素的矩阵)的新降维方法,并展示了这些技术如何用于多个下游任务。特别地,我们展示了如何将所提出的矩阵草图技术用于凸和非凸优化、对每个$1 leq p leq infty$的$ell_p$-回归,以及向量-矩阵-向量查询。 摘要:A variety of dimensionality reduction techniques have been applied for computations involving large matrices. The underlying matrix is randomly compressed into a smaller one, while approximately retaining many of its original properties. As a result, much of the expensive computation can be performed on the small matrix. The sketching of positive semidefinite (PSD) matrices is well understood, but there are many applications where the related matrices are not PSD, including Hessian matrices in non-convex optimization and covariance matrices in regression applications involving complex numbers. In this paper, we present novel dimensionality reduction methods for non-PSD matrices, as well as their ``square-roots'', which involve matrices with complex entries. We show how these techniques can be used for multiple downstream tasks. In particular, we show how to use the proposed matrix sketching techniques for both convex and non-convex optimization, $ell_p$-regression for every $1 leq p leq infty$, and vector-matrix-vector queries.
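压缩非PSD矩阵并近似回答向量-矩阵-向量查询的基本套路可示意如下(高斯草图仅为示意,矩阵构造与维度均为本文档的假设;论文中的构造更精细,且给出了严格的误差保证):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 400                         # 原始维度与草图维度(均为假设值)

# 构造一个非 PSD 的对称矩阵:特征值有正有负(类似非凸优化中的 Hessian)
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
eigs = np.concatenate([np.ones(5), -np.ones(5), 0.01 * rng.normal(size=n - 10)])
A = (Q * eigs) @ Q.T

# 高斯草图矩阵 S 满足 E[S^T S] = I,可把 n×n 的 A 压缩为 k×k 的 S A S^T
S = rng.normal(size=(k, n)) / np.sqrt(k)
A_sketch = S @ A @ S.T

# 用压缩矩阵近似回答向量-矩阵-向量查询 x^T A y(此处取 x = y 为单位向量)
x = Q[:, 0]                              # 与最大特征方向对齐,精确值 x^T A x = 1
approx = (S @ x) @ A_sketch @ (S @ x)    # 近似值,误差随 k 增大而减小
print("精确: %.3f  草图近似: %.3f" % (x @ A @ x, approx))
```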

【4】 Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path 标题:随机最短路径的隐式有限域逼近及高效优化算法

作者:Liyu Chen,Mehdi Jafarnia-Jahromi,Rahul Jain,Haipeng Luo 链接:https://arxiv.org/abs/2106.08377 摘要:在随机最短路径(SSP)模型中,我们引入了一个通用的模板来开发后悔最小化算法,只要保证一定的性质,就可以实现最小化最大后悔。我们分析的关键是一种新的技术,称为隐式有限视界近似,它只在分析中用有限视界近似SSP模型,而没有显式实现。利用这个模板,我们开发了两个新的算法:第一个是无模型的(据我们所知是文献中的第一个)和严格正成本下的极大极小优化算法;第二种方法是基于模型的,即使在零成本状态-动作对的情况下也是极小极大最优的,与[Tarbouriech等人,2021b]的现有最佳结果相匹配。重要的是,这两种算法都允许高度稀疏的更新,使得它们的计算效率比所有现有算法都高。此外,两者都可以完全无参数化。 摘要:We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured. The key of our analysis is a new technique called implicit finite-horizon approximation, which approximates the SSP model by a finite-horizon counterpart only in the analysis without explicit implementation. Using this template, we develop two new algorithms: the first one is model-free (the first in the literature to our knowledge) and minimax optimal under strictly positive costs; the second one is model-based and minimax optimal even with zero-cost state-action pairs, matching the best existing result from [Tarbouriech et al., 2021b]. Importantly, both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms. Moreover, both can be made completely parameter-free.

【5】 Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent 标题:Bures-Wasserstein流形上的平均:梯度下降的维数无关收敛性

作者:Jason M. Altschuler,Sinho Chewi,Patrik Gerber,Austin J. Stromme 备注:48 pages, 8 figures 链接:https://arxiv.org/abs/2106.08502 摘要:我们研究了在最优传输度量下计算高斯分布重心的一阶优化算法。尽管目标函数是测地非凸的,黎曼GD在经验上收敛很快,实际上比欧几里得GD和SDP求解器等现成方法更快。这与黎曼GD已知最好的理论结果形成鲜明对比,后者对维数呈指数依赖。在这项工作中,我们证明了新的测地凸性结果,它们对迭代点提供了更强的控制,从而产生与维数无关的收敛速度。我们的技术还可用于分析两个相关的平均概念——熵正则化重心和几何中值,为这些问题上的黎曼GD提供了首个收敛保证。 摘要:We study first-order optimization algorithms for computing the barycenter of Gaussian distributions with respect to the optimal transport metric. Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. This stands in stark contrast to the best-known theoretical results for Riemannian GD, which depend exponentially on the dimension. In this work, we prove new geodesic convexity results which provide stronger control of the iterates, yielding a dimension-free convergence rate. Our techniques also enable the analysis of two related notions of averaging, the entropically-regularized barycenter and the geometric median, providing the first convergence guarantees for Riemannian GD for these problems.
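高斯重心可以用一个与黎曼GD密切相关的经典不动点迭代求得,下面给出示意(采用Álvarez-Esteban等人的迭代形式处理零均值情形,并非论文中的完整算法与步长分析):

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def bw_barycenter(covs, iters=50):
    """零均值高斯分布在 Bures-Wasserstein 度量下重心的不动点迭代。

    迭代:S <- S^{-1/2} M^2 S^{-1/2},其中 M = mean_i (S^{1/2} C_i S^{1/2})^{1/2}。
    """
    S = np.mean(covs, axis=0)                         # 初始化为算术平均
    for _ in range(iters):
        S_half = np.real(sqrtm(S))
        S_half_inv = inv(S_half)
        M = np.mean([np.real(sqrtm(S_half @ C @ S_half)) for C in covs], axis=0)
        S = S_half_inv @ M @ M @ S_half_inv
    return S

# 示意:三个随机 2×2 协方差矩阵
rng = np.random.default_rng(0)
covs = []
for _ in range(3):
    B = rng.normal(size=(2, 2))
    covs.append(B @ B.T + 0.1 * np.eye(2))            # 保证正定
print(np.round(bw_barycenter(np.array(covs)), 4))
```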

【6】 A Framework for Discovering Optimal Solutions in Photonic Inverse Design 标题:光子反设计中寻找最优解的一个框架

作者:Jagrit Digani,Phillip Hon,Artur R. Davoyan 备注:16 pages, 4 figures 链接:https://arxiv.org/abs/2106.08419 摘要:光子反设计已成为复杂光学系统不可缺少的工程工具。在许多情况下,需要同时优化材料和几何结构,这会导致复杂、非光滑且有多个局部极小值的搜索空间,寻找接近全局最优的解可能是一项计算上困难的任务。在此,我们开发了一个框架,可加快在复杂优化空间中搜索接近全局最优的解。我们研究了具有代表性的黑盒优化算法的工作方式,包括遗传算法(GA)、粒子群优化算法(PSO)、模拟退火算法(SA)和网格自适应直接搜索算法(NOMAD)。然后,我们提出并利用一种两步方法,在任意复杂的搜索空间中确定性能最佳的算法。我们揭示了搜索空间复杂度和算法性能之间的联系,发现对于光子反设计中遇到的混合整数问题(尤其是考虑材料组合时),PSO和NOMAD始终表现更好。我们的结果与人们通常预期的遗传算法优势不同。这些发现将有助于更高效地设计具有最佳性能的光子系统。 摘要:Photonic inverse design has emerged as an indispensable engineering tool for complex optical systems. In many instances it is important to optimize for both material and geometry configurations, which results in complex non-smooth search spaces with multiple local minima. Finding solutions approaching global optimum may present a computationally intractable task. Here, we develop a framework that allows expediting the search of solutions close to global optimum on complex optimization spaces. We study the way representative black box optimization algorithms work, including genetic algorithm (GA), particle swarm optimization (PSO), simulated annealing (SA), and mesh adaptive direct search (NOMAD). We then propose and utilize a two-step approach that identifies best performance algorithms on arbitrarily complex search spaces. We reveal a connection between the search space complexity and algorithm performance and find that PSO and NOMAD consistently deliver better performance for mixed integer problems encountered in photonic inverse design, particularly with the account of material combinations. Our results differ from a commonly anticipated advantage of GA. Our findings will foster more efficient design of photonic systems with optimal performance.

预测|估计(11篇)

【1】 Intelligent Tire-Based Slip Ratio Estimation Using Different Machine Learning Algorithms 标题:基于不同机器学习算法的智能轮胎滑移率估计

作者:Nan Xu,Zepeng Tang,Jianfeng Zhou,Hassan Askari 链接:https://arxiv.org/abs/2106.08961 摘要:轮胎纵向滑移率的估算对于提高车辆在行驶和制动条件下的控制性能具有重要意义。本文基于智能轮胎系统所用三轴MEMS加速度计的加速度信号,采用四种机器学习算法(神经网络、梯度提升机、随机森林和支持向量机)估计轮胎的滑移率。实验数据通过MTS实验平台采集,滤波后提取轮胎接触印迹内相应的加速度信号,用于上述机器学习算法的训练。我们使用10折交叉验证(CV)对所实现的机器学习算法进行了比较。CV结果中的NRMS误差表明,与其他方法相比,神经网络具有最高的精度:NN、GBM、RF和SVM的NRMS误差分别为2.59%、3.30%、4.21%和5.34%。在这些技术中,GBM的输出方差最小,结果更稳定。本文将智能轮胎系统与机器学习算法相结合,为轮胎滑移率的精确估计奠定了基础,这对于开发可靠的车辆控制算法至关重要。 摘要:Estimation of the longitudinal slip ratio of tires is important in boosting the control performance of the vehicle under driving and braking conditions. In this paper, the slip ratio is estimated using four machine learning algorithms (Neural Network, Gradient Boosting Machine, Random Forest and Support Vector Machine) based on the acceleration signals from the tri-axial MEMS accelerometers utilized in the intelligent tire system. The experimental data are collected through the MTS experimental platform. The corresponding acceleration signals within the tire contact patch are extracted after filtering to be used for the training the aforesaid machine learning algorithms. A comparison is provided between the implemented ML algorithms using a 10-fold CV. NRMS errors in the CV results indicate that NN has the highest accuracy in comparison with other techniques. The NRMS errors of NN, GBM, RF, and SVM are 2.59%, 3.30%, 4.21%, and 5.34%, respectively. Among these techniques, GBM has more stable results as it has the smallest output variance. The present study with the fusion of intelligent tire system and machine learning algorithms paves the way for the accurate estimation of tire slip ratio, which is critical for the development of reliable vehicle control algorithms.

【2】 FGLP: A Federated Fine-Grained Location Prediction System for Mobile Users 标题:FGLP:一种面向移动用户的联邦细粒度位置预测系统

作者:Xiaopeng Jiang,Shuai Zhao,Guy Jacobson,Rittwik Jana,Wen-Ling Hsu,Manoop Talasila,Syed Anwar Aftab,Yi Chen,Cristian Borcea 链接:https://arxiv.org/abs/2106.08946 摘要:智能手机上的细粒度位置预测可以用来提高应用程序/系统的性能。应用场景包括根据预测用户位置处的5G网络质量进行视频质量自适应,以及基于预测用户位置加速内容渲染的增强现实应用。此类用例要求预测误差与GPS误差在同一范围内,而现有的位置预测工作无法达到这种精度水平。我们提出了一个基于手机上采集的GPS轨迹的移动用户细粒度位置预测(FGLP)系统。FGLP由两部分组成:联邦学习框架和预测模型。该框架运行在用户的手机上,也运行在一个协调系统中所有用户学习的服务器上。FGLP将用户位置数据表示为抽象2D空间中的相对点,这使得能够跨不同的物理空间进行学习。该模型融合了双向长短时记忆(BiLSTM)和卷积神经网络(CNN),其中BiLSTM学习移动用户的速度和方向,CNN学习用户的移动偏好等信息。FGLP使用联邦学习来保护用户隐私并减少带宽消耗。基于一个包含60多万用户的数据集,实验结果表明FGLP在预测精度上优于基线模型。我们还证明了FGLP与迁移学习相结合可以很好地实现模型的可重用性。最后,在几种Android手机上的基准测试结果证明了FGLP在现实生活中的可行性。 摘要:Fine-grained location prediction on smart phones can be used to improve app/system performance. Application scenarios include video quality adaptation as a function of the 5G network quality at predicted user locations, and augmented reality apps that speed up content rendering based on predicted user locations. Such use cases require prediction error in the same range as the GPS error, and no existing works on location prediction can achieve this level of accuracy. We present a system for fine-grained location prediction (FGLP) of mobile users, based on GPS traces collected on the phones. FGLP has two components: a federated learning framework and a prediction model. The framework runs on the phones of the users and also on a server that coordinates learning from all users in the system. FGLP represents the user location data as relative points in an abstract 2D space, which enables learning across different physical spaces. The model merges Bidirectional Long Short-Term Memory (BiLSTM) and Convolutional Neural Networks (CNN), where BiLSTM learns the speed and direction of the mobile users, and CNN learns information such as user movement preferences. FGLP uses federated learning to protect user privacy and reduce bandwidth consumption. Our experimental results, using a dataset with over 600,000 users, demonstrate that FGLP outperforms baseline models in terms of prediction accuracy. We also demonstrate that FGLP works well in conjunction with transfer learning, which enables model reusability. Finally, benchmark results on several types of Android phones demonstrate FGLP's feasibility in real life.
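下面是一个示意性的PyTorch模型草图(非论文官方实现,层数与尺寸均为假设值;FGLPSketch为本文自拟的名称),大致对应摘要中"BiLSTM学习速度与方向、CNN学习移动偏好"并在末端融合的结构:

    import torch
    import torch.nn as nn

    class FGLPSketch(nn.Module):
        """示意模型:融合BiLSTM与CNN,输入为抽象2D空间中的相对坐标序列。"""
        def __init__(self, hidden=64):
            super().__init__()
            # BiLSTM分支:从相对坐标序列中学习速度与方向模式
            self.lstm = nn.LSTM(input_size=2, hidden_size=hidden,
                                batch_first=True, bidirectional=True)
            # CNN分支:从同一序列中提取移动偏好等局部模式
            self.cnn = nn.Sequential(
                nn.Conv1d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            # 融合两个分支,回归下一个相对位置(x, y)
            self.head = nn.Linear(2 * hidden + 32, 2)

        def forward(self, traj):              # traj: (batch, seq_len, 2)
            lstm_out, _ = self.lstm(traj)
            lstm_feat = lstm_out[:, -1]       # 取最后时刻的双向隐状态
            cnn_feat = self.cnn(traj.transpose(1, 2)).squeeze(-1)
            return self.head(torch.cat([lstm_feat, cnn_feat], dim=1))

    model = FGLPSketch()
    pred = model(torch.randn(8, 32, 2))      # 预测8条轨迹的下一个相对位置
    print(pred.shape)                        # torch.Size([8, 2])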

【3】 Nonparametric Empirical Bayes Estimation and Testing for Sparse and Heteroscedastic Signals 标题:稀疏和异方差信号的非参数经验Bayes估计与检验

作者:Junhui Cai,Xu Han,Ya'acov Ritov,Linda Zhao 链接:https://arxiv.org/abs/2106.08881 摘要:现代大尺度数据往往涉及对高维未知参数的估计和检验。准确识别“大海捞针”式的稀疏信号并控制错误发现是很有必要的。然而,现代数据结构中前所未有的复杂性和异构性要求新的机器学习工具能够有效地利用共性,并对稀疏性和异构性进行有力的调整。此外,高维参数的估计往往缺乏不确定性量化。本文提出了一种新的尖峰-非参数混合先验(SNP)——用尖峰部分促进稀疏性,用非参数结构捕获信号。与现有方法相比,本文提出的方法能同时解决估计和检验问题,具有以下优点:1) 稀疏性估计准确;2) 具有收缩/软阈值特性的点估计;3) 用于不确定性量化的可信区间;4) 控制错误发现率的最优多重检验程序。我们的方法在模拟数据和基因表达案例研究中都表现出了很好的实证效果。 摘要:Large-scale modern data often involves estimation and testing for high-dimensional unknown parameters. It is desirable to identify the sparse signals, ``the needles in the haystack'', with accuracy and false discovery control. However, the unprecedented complexity and heterogeneity in modern data structure require new machine learning tools to effectively exploit commonalities and to robustly adjust for both sparsity and heterogeneity. In addition, estimates for high-dimensional parameters often lack uncertainty quantification. In this paper, we propose a novel Spike-and-Nonparametric mixture prior (SNP) -- a spike to promote the sparsity and a nonparametric structure to capture signals. In contrast to the state-of-the-art methods, the proposed methods solve the estimation and testing problem at once with several merits: 1) an accurate sparsity estimation; 2) point estimates with shrinkage/soft-thresholding property; 3) credible intervals for uncertainty quantification; 4) an optimal multiple testing procedure that controls false discovery rate. Our method exhibits promising empirical performance on both simulated data and a gene expression case study.

【4】 Predicting crop yields with little ground truth: A simple statistical model for in-season forecasting 标题:几乎没有地面真实的作物产量预测:一种简单的季节预测统计模型

作者:Nemo Semret 链接:https://arxiv.org/abs/2106.08720 摘要:我们提出了一个全自动的季内作物产量预测模型,旨在在缺乏次国家级“地面真值”信息的情况下工作。我们的方法主要依赖于卫星数据,其特点是将仔细的特征工程与简单的回归模型相结合。因此,它几乎可以在世界任何地方工作。将其应用于10个不同的作物-国家对(5种谷物——玉米、小麦、高粱、大麦和谷子,在埃塞俄比亚和肯尼亚这两个国家),对于年内第9个月作出的预测,RMSE为5%-10%;对于第3个月作出的预测,RMSE为7%-14%。该模型每日输出对当年最终产量的预测。每个作物-国家对使用大约400万个数据点进行训练。这些数据包括:历史国家水平的年产量、作物日历、作物覆盖率、NDVI、温度、降雨量和蒸散量。 摘要:We present a fully automated model for in-season crop yield prediction, designed to work where there is a dearth of sub-national "ground truth" information. Our approach relies primarily on satellite data and is characterized by careful feature engineering combined with a simple regression model. As such, it can work almost anywhere in the world. Applying it to 10 different crop-country pairs (5 cereals -- corn, wheat, sorghum, barley and millet, in 2 countries -- Ethiopia and Kenya), we achieve RMSEs of 5%-10% for predictions 9 months into the year, and 7%-14% for predictions 3 months into the year. The model outputs daily forecasts for the final yield of the current year. It is trained using approximately 4 million data points for each crop-country pair. These consist of: historical country-level annual yields, crop calendars, crop cover, NDVI, temperature, rainfall, and evapotranspiration.

【5】 Predictive Modeling of Hospital Readmission: Challenges and Solutions 标题:医院再入院预测建模:挑战与解决方案

作者:Shuwen Wang,Xingquan Zhu 链接:https://arxiv.org/abs/2106.08488 摘要:医院再入院预测是从历史医学数据中学习模型,预测患者出院后某一段时间(30天或90天)内再入院的概率。其动机是帮助医疗服务提供者提供更好的治疗和出院后策略,降低再入院率,并最终降低医疗费用。由于疾病和医疗生态系统的内在复杂性,建模医院再入院面临许多挑战。到目前为止,各种方法已经被开发出来,但是现有文献不能提供一个完整的图景来回答一些基本问题,例如建模医院再入院的主要挑战和解决方案是什么;用于再入院预测的典型特征/模型有哪些;如何实现有意义和透明的决策预测;以及在实际应用中部署预测方法时可能发生的冲突。在本文中,我们系统地回顾了医院再入院预测的计算模型,并提出了包含四个主要类别的挑战分类:(1)数据的多样性和复杂性;(2)数据不平衡、局部性和隐私性;(3)模型可解释性;(4)模型实现。该综述总结了每一类中的方法,并强调了为应对这些挑战而提出的技术解决方案。此外,对可用于医院再入院建模的数据集和资源的梳理也提供了第一手资料,以支持研究人员和从业者设计有效和高效的医院再入院预测新方法。 摘要:Hospital readmission prediction is a study to learn models from historical medical data to predict probability of a patient returning to hospital in a certain period, 30 or 90 days, after the discharge. The motivation is to help health providers deliver better treatment and post-discharge strategies, lower the hospital readmission rate, and eventually reduce the medical costs. Due to inherent complexity of diseases and healthcare ecosystems, modeling hospital readmission is facing many challenges. By now, a variety of methods have been developed, but existing literature fails to deliver a complete picture to answer some fundamental questions, such as what are the main challenges and solutions in modeling hospital readmission; what are typical features/models used for readmission prediction; how to achieve meaningful and transparent predictions for decision making; and what are possible conflicts when deploying predictive approaches for real-world usages. In this paper, we systematically review computational models for hospital readmission prediction, and propose a taxonomy of challenges featuring four main categories: (1) data variety and complexity; (2) data imbalance, locality and privacy; (3) model interpretability; and (4) model implementation. The review summarizes methods in each category, and highlights technical solutions proposed to address the challenges. In addition, a review of datasets and resources available for hospital readmission modeling also provides firsthand materials to support researchers and practitioners to design new approaches for effective and efficient hospital readmission prediction.

【6】 Learning-based Support Estimation in Sublinear Time 标题:次线性时间下基于学习的支持度估计

作者:Talya Eden,Piotr Indyk,Shyam Narayanan,Ronitt Rubinfeld,Sandeep Silwal,Tal Wagner 备注:17 pages. Published as a conference paper in ICLR 2021 链接:https://arxiv.org/abs/2106.08396 摘要:我们考虑从一个随机的元素样本中估计一个大数据集中不同元素的数量(或者,等价地,由数据集引起的分布的支持大小)的问题。这个问题出现在许多应用领域,包括生物学、基因组学、计算机系统和语言学。过去十年的一系列研究产生了一种算法,该算法从大小为$O(\log^2(1/\varepsilon)\cdot n/\log n)$的样本中将支持度估计到$\pm\varepsilon n$以内,其中$n$是数据集大小。不幸的是,这个界限已知是紧的,限制了对这个问题复杂性的进一步改进。在本文中,我们考虑了带有基于机器学习的预测器的估计算法,该预测器在给定任何元素的情况下,返回其频率的估计值。我们证明,如果预测器在一个常数近似因子内是正确的,那么样本复杂度可以显著降低到$\log(1/\varepsilon)\cdot n^{1-\Theta(1/\log(1/\varepsilon))}$。我们使用来自(Hsu et al., ICLR'19)的基于神经网络的估计器作为预测器,在一组数据集上评估所提出的算法。我们的实验证明,与最先进的算法相比,估计精度有了实质性的提高(高达3倍)。 摘要:We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to $\pm \varepsilon n$ from a sample of size $O(\log^2(1/\varepsilon) \cdot n/\log n)$, where $n$ is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machine-learning-based predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to $\log(1/\varepsilon) \cdot n^{1-\Theta(1/\log(1/\varepsilon))}$. We evaluate the proposed algorithms on a collection of data sets, using the neural-network based estimators from (Hsu et al., ICLR'19) as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state of the art algorithm.

【7】 Predicting Unreliable Predictions by Shattering a Neural Network 标题:通过粉碎神经网络来预测不可靠的预测

作者:Xu Ji,Razvan Pascanu,Devon Hjelm,Andrea Vedaldi,Balaji Lakshminarayanan,Yoshua Bengio 链接:https://arxiv.org/abs/2106.08365 摘要:分段线性神经网络可以分为子函数,每个子函数都有自己的激活模式、域和经验误差。整个网络的经验误差可以写成子函数经验误差的期望。构造子函数经验误差的推广界表明,子函数在表示空间中被训练样本包围的越密集,其预测就越可靠。此外,它还表明,激活区较少的模型泛化效果更好,而抽象知识程度较高的模型泛化效果更好,其他条件都相同。我们不仅提出了一个推理子函数误差界的理论框架,而且提出了一种近似计算子函数误差界的实用方法,用于预测网络不能成功推广到哪些样本。我们测试了我们的方法对错误分类和分布外样本的检测,发现它在这两种情况下都有竞争力。简而言之,一些网络激活模式比其他模式具有更高的可靠性,并且可以使用子功能错误界限来识别这些模式。 摘要:Piecewise linear neural networks can be split into subfunctions, each with its own activation pattern, domain, and empirical error. Empirical error for the full network can be written as an expectation over empirical error of subfunctions. Constructing a generalization bound on subfunction empirical error indicates that the more densely a subfunction is surrounded by training samples in representation space, the more reliable its predictions are. Further, it suggests that models with fewer activation regions generalize better, and models that abstract knowledge to a greater degree generalize better, all else equal. We propose not only a theoretical framework to reason about subfunction error bounds but also a pragmatic way of approximately evaluating it, which we apply to predicting which samples the network will not successfully generalize to. We test our method on detection of misclassification and out-of-distribution samples, finding that it performs competitively in both cases. In short, some network activation patterns are associated with higher reliability than others, and these can be identified using subfunction error bounds.
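下面的示意代码(非论文实现)演示"激活模式"这一核心概念:每个输入样本在各ReLU层的激活符号构成其所属子函数的签名,可据此统计测试样本附近训练样本的稠密程度,作为预测可靠性的粗略代理(网络结构与阈值均为假设值):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(),
                        nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 3))

    def activation_pattern(x):
        """返回每个样本在所有ReLU层上的激活符号(0/1)签名。"""
        patterns = []
        h = x
        for layer in net:
            h = layer(h)
            if isinstance(layer, nn.ReLU):
                patterns.append((h > 0).int())
        return torch.cat(patterns, dim=1)    # (batch, 隐单元总数)

    train_x = torch.randn(256, 4)
    test_x = torch.randn(5, 4)
    with torch.no_grad():
        train_pat = activation_pattern(train_x)
        test_pat = activation_pattern(test_x)

    # 以"与测试样本激活模式的汉明距离很小的训练样本数"作为可靠性代理
    for i, p in enumerate(test_pat):
        dist = (train_pat != p).sum(dim=1)   # 到每个训练样本模式的汉明距离
        support = int((dist <= 2).sum())     # 假设阈值:距离不超过2
        print(f"test sample {i}: nearby training patterns = {support}")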

【8】 Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding 标题:基于改进CNN的低复杂度视频编码帧间预测内插滤波器学习

作者:Luka Murn,Saverio Blasi,Alan F. Smeaton,Marta Mrak 备注:IEEE Open Journal of Signal Processing Special Issue on Applied AI and Machine Learning for Video Coding and Streaming, June 2021 链接:https://arxiv.org/abs/2106.08936 摘要:最新机器学习方法的多功能性使其成为改进下一代视频压缩解决方案的理想选择。不幸的是,这些方法通常会带来计算复杂度的显著增加,并且难以解释为可解释的模型,从而影响它们在实际视频编码应用中的实现潜力。提出了一种新的基于可解释神经网络的帧间预测方法,以提高分数精度运动补偿所需参考样本的插值精度。该方法需要训练一个神经网络,并从中导出一个完整的四分之一像素插值滤波器集,因为该网络具有线性结构,易于解释。一种新颖的训练框架使得每个网络分支都类似于一个特定的分数移位。这种实用的解决方案使得它与传统的视频编码方案一起使用非常有效。当在最先进的多用途视频编码(VVC)测试模型的上下文中实现时,对于在随机存取、低延迟B和低延迟P配置下的低分辨率序列,可分别平均实现0.77%、1.27%和2.25%的BD速率节省,与全cnn插值相比,学习的插值算法的复杂度显著降低。 摘要:The versatility of recent machine learning approaches makes them ideal for improvement of next generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to interpret into explainable models, affecting their potential for implementation within practical video coding applications. This paper introduces a novel explainable neural network-based inter-prediction scheme, to improve the interpolation of reference samples needed for fractional precision motion compensation. The approach requires a single neural network to be trained from which a full quarter-pixel interpolation filter set is derived, as the network is easily interpretable due to its linear structure. A novel training framework enables each network branch to resemble a specific fractional shift. This practical solution makes it very efficient to use alongside conventional video coding schemes. When implemented in the context of the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved on average for lower resolution sequences under the random access, low-delay B and low-delay P configurations, respectively, while the complexity of the learned interpolation schemes is significantly reduced compared to the interpolation with full CNNs.

【9】 Economic Nowcasting with Long Short-Term Memory Artificial Neural Networks (LSTM) 标题:用长短期记忆人工神经网络(LSTM)进行经济预测

作者:Daniel Hopp 备注:21 pages, 3 figures 链接:https://arxiv.org/abs/2106.08901 摘要:近年来,人工神经网络(ANNs)推动了众多领域和学科的诸多进展。然而,它们对经济学的影响相对较小。其中一种神经网络,即长短期记忆网络(LSTM),特别适合处理经济时间序列。在这里,对该体系结构的性能和特性进行了评估,并与目前经济即时预测(nowcasting)领域流行的动态因子模型(DFM)进行了比较。在三个变量(全球商品出口额与出口量,以及全球服务出口)的即时预测中,LSTM的结果优于DFM。进一步的优势包括它们能够处理各种时间频率的大量输入特征。缺点是无法将输入特征的贡献归因于模型输出,这是所有人工神经网络的共同点。为了免除使用者掌握深度学习库的门槛、促进该方法的后续应用研究,我们基于PyTorch开发了一个配套的Python库:https://pypi.org/project/nowcast-lstm/. 摘要:Artificial neural networks (ANNs) have been the catalyst to numerous advances in a variety of fields and disciplines in recent years. Their impact on economics, however, has been comparatively muted. One type of ANN, the long short-term memory network (LSTM), is particularly well-suited to deal with economic time-series. Here, the architecture's performance and characteristics are evaluated in comparison with the dynamic factor model (DFM), currently a popular choice in the field of economic nowcasting. LSTMs are found to produce superior results to DFMs in the nowcasting of three separate variables; global merchandise export values and volumes, and global services exports. Further advantages include their ability to handle large numbers of input features in a variety of time frequencies. A disadvantage is the inability to ascribe contributions of input features to model outputs, common to all ANNs. In order to facilitate continued applied research of the methodology by avoiding the need for any knowledge of deep-learning libraries, an accompanying Python library was developed using PyTorch, https://pypi.org/project/nowcast-lstm/.
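作为补充,下面给出一个与该方法思路一致、但不依赖nowcast-lstm库具体API的极简PyTorch草图(数据为随机占位,NowcastLSTM等名称与窗口长度均为本文自拟的假设):

    import torch
    import torch.nn as nn

    # 占位数据:48个月、每月10个宏观指标,目标为当月出口额(均为随机数)
    X = torch.randn(48, 10)
    y = torch.randn(48, 1)
    seq_len = 12                              # 假设用过去12个月的指标做即时预测

    def make_windows(X, y, L):
        xs = torch.stack([X[i:i + L] for i in range(len(X) - L)])
        return xs, y[L:]

    class NowcastLSTM(nn.Module):
        def __init__(self, n_feat=10, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)
        def forward(self, x):
            h, _ = self.lstm(x)
            return self.out(h[:, -1])         # 用最后时刻隐状态回归目标

    xs, ys = make_windows(X, y, seq_len)
    model = NowcastLSTM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for epoch in range(50):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xs), ys)
        loss.backward()
        opt.step()
    print(f"final MSE: {loss.item():.4f}")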

【10】 Breaking The Dimension Dependence in Sparse Distribution Estimation under Communication Constraints 标题:在通信约束下打破稀疏分布估计中的维数依赖

作者:Wei-Ning Chen,Peter Kairouz,Ayfer Özgür 链接:https://arxiv.org/abs/2106.08597 摘要:我们考虑了在$b$位通信约束下,从样本中估计一个$d$维$s$-稀疏离散分布的问题。关于此问题的$\ell_2$估计误差,已知的最好先前结果是$O\left(\frac{s\log(d/s)}{n2^b}\right)$。令人惊讶的是,当样本量$n$超过最小阈值$n^*(s, d, b)$时,我们可以得到$O\left(\frac{s}{n2^b}\right)$的$\ell_2$估计误差。这意味着当$n>n^*(s, d, b)$时,收敛速度不依赖于环境维度$d$,与预先知道分布支持集的情形相同。接下来我们要问:“允许与维度无关的收敛的最小$n^*(s, d, b)$是多少?”。为了给出$n^*(s, d, b)$的上界,我们开发了新的定位方案来准确高效地定位未知支持集。对于非交互设置,我们证明$n^*(s, d, b) = O\left(\min\left({d^2\log^2 d}/{2^b}, {s^4\log^2 d}/{2^b}\right)\right)$。此外,我们将此问题与非自适应组测试(group testing)联系起来,得到了当$n = \tilde{\Omega}\left({s^4\log^4 d}/{2^b}\right)$时的多项式时间估计方案。这种基于组测试的方案对稀疏参数$s$是自适应的,因此可以在不知道$s$的情况下应用。对于交互设置,我们提出了一种新的基于树的估计方案,并表明实现与维度无关的收敛所需的最小样本量可进一步降低到$n^*(s, d, b) = \tilde{O}\left({s^2\log^2 d}/{2^b}\right)$。 摘要:We consider the problem of estimating a $d$-dimensional $s$-sparse discrete distribution from its samples observed under a $b$-bit communication constraint. The best-known previous result on $\ell_2$ estimation error for this problem is $O\left(\frac{s\log(d/s)}{n2^b}\right)$. Surprisingly, we show that when sample size $n$ exceeds a minimum threshold $n^*(s, d, b)$, we can achieve an $\ell_2$ estimation error of $O\left(\frac{s}{n2^b}\right)$. This implies that when $n>n^*(s, d, b)$ the convergence rate does not depend on the ambient dimension $d$ and is the same as knowing the support of the distribution beforehand. We next ask the question: ``what is the minimum $n^*(s, d, b)$ that allows dimension-free convergence?''. To upper bound $n^*(s, d, b)$, we develop novel localization schemes to accurately and efficiently localize the unknown support. For the non-interactive setting, we show that $n^*(s, d, b) = O\left(\min\left({d^2\log^2 d}/{2^b}, {s^4\log^2 d}/{2^b}\right)\right)$. Moreover, we connect the problem with non-adaptive group testing and obtain a polynomial-time estimation scheme when $n = \tilde{\Omega}\left({s^4\log^4 d}/{2^b}\right)$. This group testing based scheme is adaptive to the sparsity parameter $s$, and hence can be applied without knowing it. For the interactive setting, we propose a novel tree-based estimation scheme and show that the minimum sample-size needed to achieve dimension-free convergence can be further reduced to $n^*(s, d, b) = \tilde{O}\left({s^2\log^2 d}/{2^b}\right)$.

【11】 Model Predictive Control with and without Terminal Weight: Stability and Algorithms 标题:有终端权和无终端权的模型预测控制:稳定性和算法

作者:Wen-Hua Chen 链接:https://arxiv.org/abs/2011.14193 摘要:本文提出了针对带终端权重与不带终端权重的模型预测控制(MPC)的稳定性分析工具。有限时域、无终端权重的MPC稳定性分析是一个长期存在的开放性问题。利用修正值函数作为Lyapunov函数的候选函数,并利用最优性原理,本文建立了这类广泛应用的MPC算法的稳定性条件。提出了一种新的无终端权重的稳定性保证MPC算法(MPCS)。通过设计一个新的、由一步超前阶段代价的值函数定义的子水平集,给出了检验该算法递归可行性和稳定性的条件。新的稳定性条件和导出的MPCS克服了现有基于终端权重的MPC框架中存在的困难,包括需要寻找合适的终端权重,以及终端权重不合适可能导致的性能下降。为保持完整性,这项工作还进一步推广到带终端权重的MPC。数值算例表明了该工具的有效性,而现有的稳定性分析方法要么不适用,要么结果相当保守。结果表明,所提出的工具提供了多种实现稳定性的机制:调整状态和/或控制权重、延长时域长度,以及在优化过程中对第一或第二状态添加简单的额外约束。 摘要:This paper presents stability analysis tools for model predictive control (MPC) with and without terminal weight. Stability analysis of MPC with a limited horizon but without terminal weight is a long-standing open problem. By using a modified value function as an Lyapunov function candidate and the principle of optimality, this paper establishes stability conditions for this type of widely spread MPC algorithms. A new stability guaranteed MPC algorithm without terminal weight (MPCS) is presented. With the help of designing a new sublevel set defined by the value function of one-step ahead stage cost, conditions for checking its recursive feasibility and stability of the proposed MPC algorithm are presented. The new stability condition and the derived MPCS overcome the difficulties arising in the existing terminal weight based MPC framework, including the need of searching a suitable terminal weight and possible poor performance caused by an inappropriate terminal weight. This work is further extended to MPC with a terminal weight for the completeness. Numerical examples are presented to demonstrate the effectiveness of the proposed tool, whereas the existing stability analysis tools are either not applicable or lead to quite conservative results. It shows that the proposed tools offer a number of mechanisms to achieve stability: adjusting state and/or control weights, extending the length of horizon, and adding a simple extra constraint on the first or second state in the optimisation.
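为便于理解"无终端权重"的含义,这里补充一个示意性的有限时域MPC代价函数(记号为一般性约定,并非论文原文):$J(x_0,\mathbf{u})=\sum_{k=0}^{N-1}\ell(x_k,u_k)+V_f(x_N)$,其中阶段代价常取$\ell(x_k,u_k)=x_k^\top Q x_k+u_k^\top R u_k$,$V_f$即终端权重项。论文所研究的"无终端权重"情形即取$V_f\equiv 0$:此时经典的基于终端代价/终端约束的稳定性论证不再适用,因而需要上述以修正值函数作为Lyapunov候选函数的新分析工具。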

其他神经网络|深度学习|模型|建模(39篇)

【1】 Model-Based Counterfactual Synthesizer for Interpretation 标题:基于模型的反事实解释合成器

作者:Fan Yang,Sahan Suresh Alva,Jiahao Chen,Xia Hu 备注:11 pages in total; To appear in KDD'21; 链接:https://arxiv.org/abs/2106.08971 摘要:反事实作为一种新兴的模型解释,近年来受到了研究者和实践者的广泛关注。反事实解释将“假设”情景的探索形式化,是使用一组假设数据样本的基于实例的推理实例。反事实本质上显示了模型决策如何随输入扰动而改变。现有的反事实生成方法主要是基于算法的,时间效率低,并且对不同的查询假设相同的反事实域。为了解决这些局限性,我们提出了一个基于模型的反事实合成器(MCS)框架来解释机器学习模型。我们首先分析了基于模型的反事实过程,并利用条件生成对抗网(CGAN)构造了一个基本合成器。为了更好地逼近这些罕见查询的反事实世界,我们采用伞形抽样技术对MCS框架进行训练。此外,我们还将属性间的因果关系与模型归纳偏差相结合,对MCS框架进行了改进,并从因果关系识别的角度验证了MCS框架的设计正确性。在多个数据集上的实验结果证明了本文提出的MCS框架的有效性和有效性,并验证了与其他方案相比的优势。 摘要:Counterfactuals, serving as one of the emerging type of model interpretations, have recently received attention from both researchers and practitioners. Counterfactual explanations formalize the exploration of ``what-if'' scenarios, and are an instance of example-based reasoning using a set of hypothetical data samples. Counterfactuals essentially show how the model decision alters with input perturbations. Existing methods for generating counterfactuals are mainly algorithm-based, which are time-inefficient and assume the same counterfactual universe for different queries. To address these limitations, we propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models. We first analyze the model-based counterfactual process and construct a base synthesizer using a conditional generative adversarial net (CGAN). To better approximate the counterfactual universe for those rare queries, we novelly employ the umbrella sampling technique to conduct the MCS framework training. Besides, we also enhance the MCS framework by incorporating the causal dependence among attributes with model inductive bias, and validate its design correctness from the causality identification perspective. Experimental results on several datasets demonstrate the effectiveness as well as efficiency of our proposed MCS framework, and verify the advantages compared with other alternatives.
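下面是一个示意性的条件生成器草图(非论文实现,未包含伞形抽样与因果约束;CFGenerator为本文自拟名称,维度均为假设值):CGAN的生成器以查询样本与目标类别为条件,直接合成反事实样本:

    import torch
    import torch.nn as nn

    class CFGenerator(nn.Module):
        """条件生成器:输入(查询样本, 目标类别, 噪声),输出反事实样本。"""
        def __init__(self, x_dim=10, n_class=2, z_dim=8):
            super().__init__()
            self.embed = nn.Embedding(n_class, 4)
            self.net = nn.Sequential(
                nn.Linear(x_dim + 4 + z_dim, 64), nn.ReLU(),
                nn.Linear(64, x_dim),
            )
        def forward(self, x, target, z):
            cond = torch.cat([x, self.embed(target), z], dim=1)
            return self.net(cond)

    G = CFGenerator()
    x = torch.randn(16, 10)                       # 一批查询样本
    target = torch.randint(0, 2, (16,))           # 期望的反事实类别
    z = torch.randn(16, 8)
    x_cf = G(x, target, z)                        # 合成的反事实样本
    # 训练时:判别器区分(真实样本, 类别)与(生成样本, 类别)对;
    # 还可加入 ||x_cf - x|| 的近邻正则项,使反事实只做最小改动。
    print(x_cf.shape)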

【2】 Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch 标题:潜伏Agent:用于从头开始训练的神经网络的可扩展隐藏触发器后门

作者:Hossein Souri,Micah Goldblum,Liam Fowl,Rama Chellappa,Tom Goldstein 链接:https://arxiv.org/abs/2106.08970 摘要:随着机器学习数据的管理变得越来越自动化,数据集篡改是一个越来越大的威胁。后门攻击者篡改训练数据,将漏洞嵌入基于该数据训练的模型中。然后在推理时,通过在模型的输入中放置“触发器”来激活此漏洞。典型的后门攻击将触发器直接插入到训练数据中,尽管在检查时可以看到这种攻击的存在。相比之下,隐藏的触发器后门攻击完全不需要在训练数据中放置触发器就可以实现中毒。然而,这种隐藏的触发攻击对从零开始训练的神经网络是无效的。我们开发了一种新的隐藏触发攻击Sleeper Agent,该Agent在构造过程中采用了梯度匹配、数据选择和目标模型再训练等技术。Sleeper Agent是第一个能够有效对抗从头训练的神经网络的隐藏触发后门攻击。我们在ImageNet和黑盒环境中演示了它的有效性。我们的实现代码可以在https://github.com/hsouri/Sleeper-Agent. 摘要:As the curation of data for machine learning becomes increasingly automated, dataset tampering is a mounting threat. Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data. This vulnerability is then activated at inference time by placing a "trigger" into the model's input. Typical backdoor attacks insert the trigger directly into the training data, although the presence of such an attack may be visible upon inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all. However, this hidden trigger attack is ineffective at poisoning neural networks trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process. Sleeper Agent is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch. We demonstrate its effectiveness on ImageNet and in black-box settings. Our implementation code can be found at https://github.com/hsouri/Sleeper-Agent.
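下面的片段(示意,非论文官方代码;模型、数据与扰动约束数值均为假设)展示其核心的"梯度匹配"目标:让带扰动的中毒样本在模型参数上诱导的梯度,与"带触发器样本被预测为目标类别"这一攻击目标的梯度方向对齐:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))  # 占位受害者模型

    def grad_vector(loss):
        grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
        return torch.cat([g.reshape(-1) for g in grads])

    # 攻击者目标:带触发器的样本 x_trig 被分类为 target_cls
    x_trig = torch.randn(4, 3, 8, 8)
    target_cls = torch.full((4,), 7)
    target_grad = grad_vector(F.cross_entropy(model(x_trig), target_cls)).detach()

    # 可学习的扰动 delta 加在干净训练样本上(标签保持不变,扰动带范数约束)
    x_clean = torch.randn(4, 3, 8, 8)
    y_clean = torch.randint(0, 10, (4,))
    delta = torch.zeros_like(x_clean, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.01)

    for step in range(100):
        opt.zero_grad()
        poison_grad = grad_vector(F.cross_entropy(model(x_clean + delta), y_clean))
        # 梯度匹配损失:最大化两个梯度向量的余弦相似度
        loss = 1 - F.cosine_similarity(poison_grad, target_grad, dim=0)
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-8 / 255, 8 / 255)   # 假设的不可感知扰动约束
    print(f"final matching loss: {loss.item():.4f}")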

【3】 Estimating the Robustness of Public Transport Systems Using Machine Learning 标题:基于机器学习的公交系统鲁棒性评估

作者:Matthias Müller-Hannemann,Ralf Rückert,Alexander Schiewe,Anita Schöbel 链接:https://arxiv.org/abs/2106.08967 摘要:具有吸引力和成本效益的公共交通系统规划是一个高度复杂的优化过程,涉及许多步骤。从乘客的角度集成健壮性使任务更具挑战性。由于文献中对鲁棒性的定义多种多样,对公共交通系统鲁棒性的真实评估是在大量可能的场景下模拟其性能。不幸的是,这在计算上非常昂贵。因此,本文利用机器学习的方法,探索了一种新的基于场景的鲁棒性逼近方法。我们通过收集公共交通系统及其乘客需求的关键特征子集并训练人工神经网络来学习给定鲁棒性测试的结果,从而实现了一种具有非常高精度的快速方法。然后,该网络仅利用其关键特征就能够高精度地预测未经训练实例的鲁棒性,从而为交通规划者提供了一个在恒定时间内近似鲁棒性的鲁棒性预言。这样的oracle可以作为一个黑匣子来提高综合公共交通规划的局部搜索框架的健壮性。在不同基准实例的计算实验中,我们证明了我们的预测具有很好的质量。 摘要:The planning of attractive and cost efficient public transport systems is a highly complex optimization process involving many steps. Integrating robustness from a passenger's point of view makes the task even more challenging. With numerous different definitions of robustness in literature, a real-world acceptable evaluation of the robustness of a public transport system is to simulate its performance under a large number of possible scenarios. Unfortunately, this is computationally very expensive. In this paper, we therefore explore a new way of such a scenario-based robustness approximation by using methods from machine learning. We achieve a fast approach with a very high accuracy by gathering a subset of key features of a public transport system and its passenger demand and training an artificial neural network to learn the outcome of a given set of robustness tests. The network is then able to predict the robustness of untrained instances with high accuracy using only its key features, allowing for a robustness oracle for transport planners that approximates the robustness in constant time. Such an oracle can be used as black box to increase the robustness within a local search framework for integrated public transportation planning. In computational experiments with different benchmark instances we demonstrate an excellent quality of our predictions.

【4】 Deep-learning based Tools for Automated Protocol Definition of Advanced Diagnostic Imaging Exams 标题:基于深度学习的高级诊断影像检查自动协议定义工具

作者:Andrew S. Nencka,Mohammad Sherafati,Timothy Goebel,Parag Tolat,Kevin M. Koch 链接:https://arxiv.org/abs/2106.08963 摘要:目的:本研究采用自然语言处理(NLP)和深度学习(DL)方法,评估基于检查申请单(order)的磁共振成像(MRI)检查协议自动分配的有效性和影响。方法:应用NLP工具对116000多例MRI检查的申请单进行回顾性处理,涉及200种不同的亚专科协议(“Local”协议类)。分别针对“Local”协议、93个美国放射学会(“ACR”)协议和48个“General”协议,用70%的处理数据训练独立的DL模型。DL模型在“自动协议分配(AP)”推理模式(仅返回最高建议)和“临床决策支持(CDS)”推理模式(返回至多10个协议供放射科医生审阅)下进行评估。根据相应神经网络对前两个推荐的归一化输出得分之差,计算并分析每个协议推荐的准确性。结果:AP模式下,“General”、“ACR”和“Local”协议类的最高预测协议正确率分别为82.8%、73.8%和69.3%。在CDS模式下,所有协议类的准确率均高于96%。然而,在目前的验证性能水平下,所提出的模型对大规模影像网络仅带来有限的正向财务影响。结论:基于DL的协议自动化是可行的,经过调优后可将相当一部分检查路由至自动协议分配,且协议越通用准确性越高。对所测试算法的经济性分析表明,要为亚专科影像检查提供实用的自动协议分配工具,还需进一步提高算法性能。 摘要:Purpose: This study evaluates the effectiveness and impact of automated order-based protocol assignment for magnetic resonance imaging (MRI) exams using natural language processing (NLP) and deep learning (DL). Methods: NLP tools were applied to retrospectively process orders from over 116,000 MRI exams with 200 unique sub-specialized protocols ("Local" protocol class). Separate DL models were trained on 70% of the processed data for "Local" protocols as well as 93 American College of Radiology ("ACR") protocols and 48 "General" protocols. The DL Models were assessed in an "auto-protocoling (AP)" inference mode which returns the top recommendation and in a "clinical decision support (CDS)" inference mode which returns up to 10 protocols for radiologist review. The accuracy of each protocol recommendation was computed and analyzed based on the difference between the normalized output score of the corresponding neural net for the top two recommendations. Results: The top predicted protocol in AP mode was correct for 82.8%, 73.8%, and 69.3% of the test cases for "General", "ACR", and "Local" protocol classes, respectively. Higher levels of accuracy over 96% were obtained for all protocol classes in CDS mode. However, at current validation performance levels, the proposed models offer modest, positive, financial impact on large-scale imaging networks. Conclusions: DL-based protocol automation is feasible and can be tuned to route substantial fractions of exams for auto-protocoling, with higher accuracy with more general protocols. Economic analyses of the tested algorithms indicate that improved algorithm performance is required to yield a practical exam auto-protocoling tool for sub-specialized imaging exams.
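下面用一小段示意代码(非论文实现;协议名、得分与0.2的路由阈值均为假设值)说明两种推理模式的区别:AP模式只返回最高建议,CDS模式返回前10个候选,并以前两名归一化得分之差作为路由依据:

    import torch

    protocol_names = [f"protocol_{i}" for i in range(200)]  # 假设的"Local"协议类
    logits = torch.randn(200)                               # 某份检查申请单的网络输出
    scores = torch.softmax(logits, dim=0)                   # 归一化得分

    # AP模式:只返回最高建议
    top = scores.argmax()
    print("AP:", protocol_names[top])

    # CDS模式:返回前10个协议供放射科医生审阅
    top10 = scores.topk(10)
    for s, idx in zip(top10.values, top10.indices):
        print(f"CDS candidate: {protocol_names[idx]} ({s:.3f})")

    # 前两名得分之差可用来决定自动分配还是转交人工审阅
    gap = top10.values[0] - top10.values[1]
    route = "auto-protocol" if gap > 0.2 else "radiologist review"  # 阈值为假设值
    print(f"score gap = {gap:.3f} -> {route}")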

【5】 Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better 标题:高效深度学习:使深度学习模型更小、更快、更好的研究综述

作者:Gaurav Menghani 链接:https://arxiv.org/abs/2106.08962 摘要:深度学习已经彻底改变了计算机视觉、自然语言理解、语音识别、信息检索等领域。然而,随着深度学习模式的逐步改进,其参数数量、延迟时间、训练所需资源等都有了显著增加。因此,关注模型的这些足迹度量也变得非常重要,而不仅仅是它的质量。我们提出并激发了深度学习中的效率问题,随后对模型效率的五个核心领域(跨建模技术、基础设施和硬件)进行了全面的调查,并在那里进行了开创性的工作。我们还提供了一个基于实验的指南和代码,供实践者优化他们的模型训练和部署。我们相信这是在高效深度学习领域的第一次全面调查,涵盖了从建模技术到硬件支持的模型效率。我们希望这项调查能为读者提供心理模型和对该领域的必要理解,以便应用通用效率技术立即获得显著的改进,同时也为他们提供进一步研究和实验的想法,以获得额外的收益。 摘要:Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, resources required to train, etc. have all have increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code, for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey would provide the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques to immediately get significant improvements, and also equip them with ideas for further research and experimentation to achieve additional gains.

【6】 Recursive Construction of Stable Assemblies of Recurrent Neural Networks 标题:递归构造递归神经网络的稳定集合

作者:Michaela Ennis,Leo Kozachkov,Jean-Jacques Slotine 备注:23 pages, 3 figures 链接:https://arxiv.org/abs/2106.08928 摘要:现代机器学习的高级应用可能会涉及到经过训练的网络的组合,就像已经在诸如DeepMind的AlphaGo这样壮观的系统中使用的那样。以一种有效和稳定的方式递归地构建这样的组合,同时也允许对单个网络的不断完善——就像自然界对生物网络所做的那样——将需要新的分析工具。本文在这一方向上迈出了一步,建立了大类非线性递归网络和神经常微分方程的压缩性质,并展示了这些量化性质如何反过来以系统的方式递归地构造稳定的网络。这些结果也可以用来稳定地结合具有量化收缩特性的递归网络和物理系统。类似地,它们也可以应用于认知的模块化计算模型。 摘要:Advanced applications of modern machine learning will likely involve combinations of trained networks, as are already used in spectacular systems such as DeepMind's AlphaGo. Recursively building such combinations in an effective and stable fashion while also allowing for continual refinement of the individual networks - as nature does for biological networks - will require new analysis tools. This paper takes a step in this direction by establishing contraction properties of broad classes of nonlinear recurrent networks and neural ODEs, and showing how these quantified properties allow in turn to recursively construct stable networks of networks in a systematic fashion. The results can also be used to stably combine recurrent networks and physical systems with quantified contraction properties. Similarly, they may be applied to modular computational models of cognition.

【7】 On the long-term learning ability of LSTM LMs 标题:论LSTM LMS的长期学习能力

作者:Wim Boes,Robbe Van Rompaey,Lyan Verwimp,Joris Pelemans,Hugo Van hamme,Patrick Wambacq 备注:None 链接:https://arxiv.org/abs/2106.08927 摘要:我们通过为句子级和语篇级的长短期记忆语言模型(LSTM LMs)引入基于连续词袋(CBOW)模型的上下文扩展并分析其性能,来考察LSTM LMs的长期学习能力。我们在文本和语音上进行评估。使用长期上下文模块的句子级模型,其表现与普通的语篇级LSTM LMs相当。另一方面,这种扩展并没有为语篇级模型带来收益。这些发现表明,语篇级LSTM LMs已经在依赖上下文信息进行长期学习。 摘要:We inspect the long-term learning ability of Long Short-Term Memory language models (LSTM LMs) by evaluating a contextual extension based on the Continuous Bag-of-Words (CBOW) model for both sentence- and discourse-level LSTM LMs and by analyzing its performance. We evaluate on text and speech. Sentence-level models using the long-term contextual module perform comparably to vanilla discourse-level LSTM LMs. On the other hand, the extension does not provide gains for discourse-level models. These findings indicate that discourse-level LSTM LMs already rely on contextual information to perform long-term learning.

【8】 A Spiking Neural Network for Image Segmentation 标题:一种用于图像分割的尖峰神经网络

作者:Kinjal Patel,Eric Hunsberger,Sean Batir,Chris Eliasmith 链接:https://arxiv.org/abs/2106.08921 摘要:我们试图研究神经形态计算在计算机视觉中的可扩展性,目的是在降低功耗的同时复制计算机视觉任务的非神经形态性能。我们使用Nengo框架将深度人工神经网络(ANN)结构U-Net转换为尖峰神经网络(SNN)结构。使用由细胞显微图像组成的ISBI 2D EM分割数据集的修改版本,基于速率和基于峰值的模型都经过训练和优化,以获得基准性能和功率。提出了一种优化片间通信的划分方法,以提高在Loihi神经形态芯片上部署多芯片网络的速度和能量效率。我们探讨了调整Loihi神经元的放电频率的优势,以使ANN转换为SNN的准确度损失最小,并优化了能量消耗。我们提出了一个基于百分位数的正则化损失函数来限制神经元的尖峰率在期望的范围内。SNN直接由相应的ANN转换而来,在神经元数目和权值相同的情况下,表现出与ANN相似的语义分割。然而,Intel Loihi神经形态芯片上的神经形态实现在联机运行时(一次一个图像)比传统硬件(CPU、GPU)的能效高出2倍以上。当所有权重(Loihi、CPU和GPU网络)被量化为8位时,在不牺牲网络的任务性能精度的情况下实现这些功率改进。 摘要:We seek to investigate the scalability of neuromorphic computing for computer vision, with the objective of replicating non-neuromorphic performance on computer vision tasks while reducing power consumption. We convert the deep Artificial Neural Network (ANN) architecture U-Net to a Spiking Neural Network (SNN) architecture using the Nengo framework. Both rate-based and spike-based models are trained and optimized for benchmarking performance and power, using a modified version of the ISBI 2D EM Segmentation dataset consisting of microscope images of cells. We propose a partitioning method to optimize inter-chip communication to improve speed and energy efficiency when deploying multi-chip networks on the Loihi neuromorphic chip. We explore the advantages of regularizing firing rates of Loihi neurons for converting ANN to SNN with minimum accuracy loss and optimized energy consumption. We propose a percentile based regularization loss function to limit the spiking rate of the neuron between a desired range. The SNN is converted directly from the corresponding ANN, and demonstrates similar semantic segmentation as the ANN using the same number of neurons and weights. However, the neuromorphic implementation on the Intel Loihi neuromorphic chip is over 2x more energy-efficient than conventional hardware (CPU, GPU) when running online (one image at a time). These power improvements are achieved without sacrificing the task performance accuracy of the network, and when all weights (Loihi, CPU, and GPU networks) are quantized to 8 bits.
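以下给出摘要中"基于百分位数的正则化损失"的一个示意实现(非论文官方代码,放电率区间与百分位数均为假设值):

    import numpy as np

    def percentile_rate_loss(rates, low=5.0, high=95.0, r_min=0.05, r_max=0.5):
        """示意:惩罚放电率分布中低于r_min的低百分位与高于r_max的高百分位。

        rates: 各神经元的放电率(归一化到[0, 1])。
        r_min, r_max: 期望的放电率区间(假设值)。
        """
        p_low = np.percentile(rates, low)     # 第5百分位
        p_high = np.percentile(rates, high)   # 第95百分位
        loss = 0.0
        if p_low < r_min:                     # 大量神经元几乎不放电
            loss += (r_min - p_low) ** 2
        if p_high > r_max:                    # 少数神经元放电过快
            loss += (p_high - r_max) ** 2
        return loss

    rates = np.random.beta(2, 5, size=1000)   # 占位的放电率分布
    print(f"regularization loss = {percentile_rate_loss(rates):.4f}")
    # 训练时将该项加到分割任务损失上,以便在ANN到SNN转换中同时控制精度损失与能耗。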

【9】 C^3: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues 标题:C^3:基于视频的对话的组合式反事实对比学习

作者:Hung Le,Nancy F. Chen,Steven C. H. Hoi 备注:22 pages, 11 figures, 7 tables 链接:https://arxiv.org/abs/2106.08914 摘要:基于视频的对话系统旨在整合视频理解和对话理解,以产生与对话和视频背景相关的回应。大多数现有的方法都采用了深度学习模型,并且在相对较小的数据集上取得了显著的效果。然而,这些结果部分是通过利用数据集中的偏差而不是发展多模态推理来实现的,导致泛化能力有限。在本文中,我们提出了一种新的组合式反事实对比学习方法($C^3$),在基于视频的对话中开展事实样本和反事实样本之间的对比训练。具体来说,我们设计了基于视频中的时间步和对话中的标记的事实/反事实抽样,并提出了利用对象级或动作级差异的对比损失函数。与以往的方法不同,我们着眼于组合输出标记之间的隐藏状态表示,以优化生成场景下的表示空间。我们在视听场景感知对话(AVSD)基准上取得了很好的性能,并展示了我们的方法在视频和对话背景方面的优势。 摘要:Video-grounded dialogue systems aim to integrate video understanding and dialogue understanding to generate responses that are relevant to both the dialogue and video context. Most existing approaches employ deep learning models and have achieved remarkable performance, given the relatively small datasets available. However, the results are partly accomplished by exploiting biases in the datasets rather than developing multimodal reasoning, resulting in limited generalization. In this paper, we propose a novel approach of Compositional Counterfactual Contrastive Learning ($C^3$) to develop contrastive training between factual and counterfactual samples in video-grounded dialogues. Specifically, we design factual/counterfactual sampling based on the temporal steps in videos and tokens in dialogues and propose contrastive loss functions that exploit object-level or action-level variance. Different from prior approaches, we focus on contrastive hidden state representations among compositional output tokens to optimize the representation space in a generation setting. We achieved promising performance gains on the Audio-Visual Scene-Aware Dialogues (AVSD) benchmark and showed the benefits of our approach in grounding video and dialogue context.

【10】 A Neural Model for Joint Document and Snippet Ranking in Question Answering for Large Document Collections 标题:大文档集问答中联合文档和摘录排序的神经模型

作者:Dimitris Pappas,Ion Androutsopoulos 备注:12 pages, 3 figures, 4 tables, ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2106.08908 摘要:用于大型文档集合的问答(QA)系统通常使用管道(i)检索可能相关的文档,(ii)对它们重新排序,(iii)对排名靠前的文档的段落或其他片段进行排序,以及(iv)选择排名靠前的片段的跨度作为精确答案。管道在概念上很简单,但错误会从一个组件传播到下一个组件,而后面的组件无法修改早期的决策。我们提出了一个文档和代码片段联合排序的体系结构,即两个中间阶段,它利用了一种直觉,即相关文档有好的代码片段,而好的代码片段来自相关文档。该体系结构是通用的,可以用于任何神经文本相关ranker。我们在POSIT-DRMM(PDRMM)和基于BERT的ranker的基础上对该体系结构的两个主要实例进行了实验。对BIOASQ的生物医学数据进行的实验表明,我们的联合模型在片段检索(QA的主要目标)方面的性能大大优于流水线,可训练的参数更少,在文档检索方面也保持竞争力。此外,我们的基于PDRMM的联合模型与基于BERT的模型相比具有竞争力,尽管使用的参数数量级较少。这些说法也得到了对两批BIOASQ的人体评估的支持。为了在另一个数据集上测试我们的关键发现,我们修改了自然问题数据集,以便它也可以用于文档和片段检索。我们的基于PDRMM的联合模型在修改后的自然问题数据集的片段检索中再次优于相应的流水线,尽管它在文档检索中的性能比流水线差。我们公开了我们的代码和修改后的自然问题数据集。 摘要:Question answering (QA) systems for large document collections typically use pipelines that (i) retrieve possibly relevant documents, (ii) re-rank them, (iii) rank paragraphs or other snippets of the top-ranked documents, and (iv) select spans of the top-ranked snippets as exact answers. Pipelines are conceptually simple, but errors propagate from one component to the next, without later components being able to revise earlier decisions. We present an architecture for joint document and snippet ranking, the two middle stages, which leverages the intuition that relevant documents have good snippets and good snippets come from relevant documents. The architecture is general and can be used with any neural text relevance ranker. We experiment with two main instantiations of the architecture, based on POSIT-DRMM (PDRMM) and a BERT-based ranker. Experiments on biomedical data from BIOASQ show that our joint models vastly outperform the pipelines in snippet retrieval, the main goal for QA, with fewer trainable parameters, also remaining competitive in document retrieval. Furthermore, our joint PDRMM-based model is competitive with BERT-based models, despite using orders of magnitude fewer parameters. These claims are also supported by human evaluation on two test batches of BIOASQ. To test our key findings on another dataset, we modified the Natural Questions dataset so that it can also be used for document and snippet retrieval. Our joint PDRMM-based model again outperforms the corresponding pipeline in snippet retrieval on the modified Natural Questions dataset, even though it performs worse than the pipeline in document retrieval. We make our code and the modified Natural Questions dataset publicly available.

【11】 Random feature neural networks learn Black-Scholes type PDEs without curse of dimensionality 标题:随机特征神经网络学习无维数灾的Black-Scholes型偏微分方程

作者:Lukas Gonon 链接:https://arxiv.org/abs/2106.08900 摘要:本文研究了用随机特征神经网络学习与Black-Scholes和更一般的指数Lévy模型有关的Kolmogorov偏(积分)微分方程。随机特征神经网络是一种仅输出权值可训练的单隐层前馈神经网络。这使得训练特别简单,但是(先验地)降低了表达能力。有趣的是,这不是Black-Scholes型偏微分方程的情况,如我们在这里所示。我们推导了学习充分非退化Black-Scholes型模型的随机神经网络预测误差的界。文中给出了一个完整的误差分析,结果表明所导出的边界不受维数灾难的影响。我们还研究了这些结果在篮子期权中的应用,并在数值上验证了边界。这些结果证明了神经网络能够在没有维数灾难的情况下学习Black-Scholes型偏微分方程的解。此外,这提供了一个相关学习问题的例子,其中随机特征神经网络是可证明有效的。 摘要:This article investigates the use of random feature neural networks for learning Kolmogorov partial (integro-)differential equations associated to Black-Scholes and more general exponential Lévy models. Random feature neural networks are single-hidden-layer feedforward neural networks in which only the output weights are trainable. This makes training particularly simple, but (a priori) reduces expressivity. Interestingly, this is not the case for Black-Scholes type PDEs, as we show here. We derive bounds for the prediction error of random neural networks for learning sufficiently non-degenerate Black-Scholes type models. A full error analysis is provided and it is shown that the derived bounds do not suffer from the curse of dimensionality. We also investigate an application of these results to basket options and validate the bounds numerically. These results prove that neural networks are able to learn solutions to Black-Scholes type PDEs without the curse of dimensionality. In addition, this provides an example of a relevant learning problem in which random feature neural networks are provably efficient.
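随机特征网络的训练可归结为线性最小二乘:隐层权重随机采样后固定,仅拟合输出权重。下面是一个一维回归的通用示意(并非论文的PDE实验设置,目标函数与各项超参数均为占位假设):

    import numpy as np

    rng = np.random.default_rng(0)

    # 随机特征:phi(x) = ReLU(W x + b),W、b随机采样后固定不训练
    n_features, n_samples = 200, 500
    W = rng.normal(size=(n_features, 1))
    b = rng.uniform(-1, 1, size=n_features)

    def features(x):                          # x: (n, 1)
        return np.maximum(x @ W.T + b, 0.0)

    # 占位目标函数(论文中为Black-Scholes型PDE的解)
    x = rng.uniform(-2, 2, size=(n_samples, 1))
    y = np.sin(2 * x[:, 0]) + 0.01 * rng.normal(size=n_samples)

    # 仅训练输出权重:带岭正则的线性最小二乘,存在闭式解
    Phi = features(x)
    lam = 1e-6
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y)

    x_test = np.linspace(-2, 2, 5).reshape(-1, 1)
    print(features(x_test) @ theta)           # 网络在测试点上的预测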

【12】 Simultaneous Training of Partially Masked Neural Networks 标题:部分掩蔽神经网络的同时训练

作者:Amirkeivan Mohtashami,Martin Jaggi,Sebastian U. Stich 链接:https://arxiv.org/abs/2106.08895 摘要:为了将深度学习模型部署到低端设备,有必要训练对资源要求较低的最新体系结构变体。这并不能消除对更昂贵模型的需求,因为后者性能更高。为了避免训练两个独立的模型,我们证明了可以用这样一种方式训练神经网络:预先定义的“核心”子网络可以从训练好的完整网络中分离出来,并保持相当好的性能。以往方法只关注宽度较小的核心网络,我们在此基础上扩展到支持任意的核心网络架构。我们提出的训练方案在“仅优化网络核心部分”与“优化完整网络”之间交替切换。完整模型的精度保持可比,而核心网络的性能优于单独训练时的水平。特别地,我们证明了训练一个带低秩核心的Transformer,得到的低秩模型性能优于单独训练的低秩模型。我们从理论上分析了我们的训练方案,并在标准或实际合理的假设下证明了它的收敛性。此外,我们证明了所发展的理论框架允许分析许多其他的神经网络部分训练方案。 摘要:For deploying deep learning models to lower end devices, it is necessary to train less resource-demanding variants of state-of-the-art architectures. This does not eliminate the need for more expensive models as they have a higher performance. In order to avoid training two separate models, we show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split-off from the trained full network with remarkable good performance. We extend on prior methods that focused only on core networks of smaller width, while we focus on supporting arbitrary core network architectures. Our proposed training scheme switches consecutively between optimizing only the core part of the network and the full one. The accuracy of the full model remains comparable, while the core network achieves better performance than when it is trained in isolation. In particular, we show that training a Transformer with a low-rank core gives a low-rank model with superior performance than when training the low-rank model alone. We analyze our training scheme theoretically, and show its convergence under assumptions that are either standard or practically justified. Moreover, we show that the developed theoretical framework allows analyzing many other partial training schemes for neural networks.
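其训练方案可用如下示意代码(非论文实现)概括:预先固定一个"核心"参数掩码,训练步在"仅更新核心子网"与"更新完整网络"之间交替(此处的掩码划分方式与网络结构均为本文假设):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

    # 预定义核心子网:假设核心只保留每层前一半的隐单元连接
    core_mask = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    core_mask["0.weight"][:32, :] = 1
    core_mask["0.bias"][:32] = 1
    core_mask["2.weight"][:, :32] = 1
    core_mask["2.bias"][:] = 1

    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))

    for step in range(100):
        opt.zero_grad()
        if step % 2 == 0:
            # 核心步:前向时把非核心权重置零,等价于只训练核心子网
            backup = [p.detach().clone() for p in model.parameters()]
            with torch.no_grad():
                for name, p in model.named_parameters():
                    p.mul_(core_mask[name])
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            with torch.no_grad():             # 恢复完整权重,仅在核心位置应用梯度
                for (name, p), bkp in zip(model.named_parameters(), backup):
                    p.copy_(bkp)
                    p.grad.mul_(core_mask[name])
        else:
            loss = nn.functional.cross_entropy(model(x), y)  # 完整步:正常训练
            loss.backward()
        opt.step()
    print(f"loss: {loss.item():.4f}")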

【13】 Development of Quantized DNN Library for Exact Hardware Emulation 标题:面向精确硬件仿真的量化DNN库的开发

作者:Masato Kiyama,Motoki Amagasaki,Masahiro Iida 链接:https://arxiv.org/abs/2106.08892 摘要:在AI芯片等边缘设备上运行深度神经网络(DNN)时,量化可以加快执行速度并节省功耗。为了研究量化的效果,我们需要先将32位浮点精度的DNN权值量化到某一位宽,再反量化回32位浮点精度后执行推理,这是因为DNN库只能处理浮点数。然而,这种仿真并不能提供与硬件完全一致的数值精度。我们需要精确的数值行为来检测MAC运算中的溢出,或验证模型在边缘设备上的运行。我们开发了PyParch,这是一个以与硬件完全相同的行为执行量化DNN(QNN)的DNN库。在本文中,我们描述了PyParch的新方案与实现。评估结果表明,对于像YOLOv5这样大型且复杂的DNN,可以估计任意位宽QNN的精度,并能检测到溢出。我们评估了仿真时间的开销,发现与正常DNN执行时间相比,QNN慢5.6倍,带溢出检测的QNN慢42倍。 摘要:Quantization is used to speed up execution time and save power when running Deep neural networks (DNNs) on edge devices like AI chips. To investigate the effect of quantization, we need performing inference after quantizing the weights of DNN with 32-bit floating-point precision by some bit width, and then quantizing them back to 32-bit floating-point precision. This is because the DNN library can only handle floating-point numbers. However, the accuracy of the emulation does not provide accurate precision. We need accurate precision to detect overflow in MAC operations or to verify the operation on edge devices. We have developed PyParch, a DNN library that executes quantized DNNs (QNNs) with exactly the same behavior as hardware. In this paper, we describe a new proposal and implementation of PyParch. As a result of the evaluation, the accuracy of QNNs with arbitrary bit widths can be estimated for large and complex DNNs such as YOLOv5, and the overflow can be detected. We evaluated the overhead of the emulation time and found that it was 5.6 times slower for QNN and 42 times slower for QNN with overflow detection compared to the normal DNN execution time.
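下面的示意代码(与PyParch的实际API无关)演示摘要所述的两个要点:常规仿真中"量化再反量化"的权值处理,以及定点MAC运算中的溢出检测(位宽与数据均为示例值):

    import numpy as np

    def fake_quantize(w, bits=8):
        """把32位浮点权值量化到给定位宽,再反量化回浮点(常规仿真做法)。"""
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q * scale, q.astype(np.int32), scale

    def mac_with_overflow_check(q_x, q_w, acc_bits=16):
        """定点乘累加,并检测累加器是否溢出(精确硬件行为所需)。"""
        acc_max = 2 ** (acc_bits - 1) - 1
        acc, overflow = 0, False
        for a, b in zip(q_x, q_w):
            acc += int(a) * int(b)
            if not (-acc_max - 1 <= acc <= acc_max):
                overflow = True               # 浮点仿真无法暴露的溢出
        return acc, overflow

    rng = np.random.default_rng(0)
    w = rng.normal(size=64).astype(np.float32)
    x = rng.normal(size=64).astype(np.float32)
    _, q_w, s_w = fake_quantize(w)
    _, q_x, s_x = fake_quantize(x)
    acc, overflow = mac_with_overflow_check(q_x, q_w)
    print(f"dot ~ {acc * s_w * s_x:.4f}, float = {float(w @ x):.4f}, overflow = {overflow}")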

【14】 Bandit Modeling of Map Selection in Counter-Strike: Global Offensive 标题:《反恐精英:全球攻势》中地图选择的Bandit建模

作者:Guido Petri,Michael H. Stanley,Alec B. Hon,Alexander Dong,Peter Xenopoulos,Cláudio Silva 备注:6 pages, 3 figures, IJCAI-AISA 2021 链接:https://arxiv.org/abs/2106.08888 摘要:许多电子竞技在比赛开始前都会通过选用与禁用(pick and ban)流程来确定比赛参数。在《反恐精英:全球攻势》(CSGO)比赛中,两队首先对要进行比赛的地图(即虚拟世界)进行选用和禁用。团队通常基于多种因素做出禁选决策,比如禁掉自己不练习的地图,或根据近期表现来选图。我们引入一个上下文bandit框架来解决CSGO中的地图选择问题,并研究团队的选用与禁用决策。使用包含3500多场CSGO比赛和25000多个地图选择决策的数据集,我们考虑了问题的不同框架、不同的上下文和不同的奖励指标。我们发现团队无论在选图还是禁图上都采取了次优策略。我们还定义了一种对禁用行为进行奖励的方法(这在bandit设定中尚未被探索),并发现加入禁用奖励可以提高模型性能。最后,我们确定,对于实力相当的两队,使用我们的模型可使预测的地图获胜概率提高至多11%,并使整体比赛获胜概率提高19.8%。 摘要:Many esports use a pick and ban process to define the parameters of a match before it starts. In Counter-Strike: Global Offensive (CSGO) matches, two teams first pick and ban maps, or virtual worlds, to play. Teams typically ban and pick maps based on a variety of factors, such as banning maps which they do not practice, or choosing maps based on the team's recent performance. We introduce a contextual bandit framework to tackle the problem of map selection in CSGO and to investigate teams' pick and ban decision-making. Using a data set of over 3,500 CSGO matches and over 25,000 map selection decisions, we consider different framings for the problem, different contexts, and different reward metrics. We find that teams have suboptimal map choice policies with respect to both picking and banning. We also define an approach for rewarding bans, which has not been explored in the bandit setting, and find that incorporating ban rewards improves model performance. Finally, we determine that usage of our model could improve teams' predicted map win probability by up to 11% and raise overall match win probabilities by 19.8% for evenly-matched teams.
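下面是一个示意性的上下文bandit草图(非论文实现):上下文为两队状态特征,动作为从7张地图的地图池中选图,奖励为是否获胜;这里用每张地图一个线性模型的epsilon-贪心策略,环境、特征维度与各项参数均为模拟假设:

    import numpy as np

    rng = np.random.default_rng(0)
    n_maps, d = 7, 6                             # CSGO地图池为7张;d为假设的上下文维度
    A = [np.eye(d) for _ in range(n_maps)]       # 每张地图一个岭回归统计量
    b = [np.zeros(d) for _ in range(n_maps)]
    true_theta = rng.normal(size=(n_maps, d))    # 模拟环境:各地图的真实胜率参数

    wins = 0
    for t in range(2000):
        ctx = rng.normal(size=d)                 # 两队近期表现等特征(占位)
        if rng.random() < 0.1:                   # epsilon-贪心探索
            arm = int(rng.integers(n_maps))
        else:
            est = [np.linalg.solve(A[k], b[k]) @ ctx for k in range(n_maps)]
            arm = int(np.argmax(est))
        p_win = 1 / (1 + np.exp(-true_theta[arm] @ ctx))
        reward = float(rng.random() < p_win)     # 奖励:该地图上是否获胜
        A[arm] += np.outer(ctx, ctx)             # 更新所选地图的线性模型
        b[arm] += reward * ctx
        wins += reward
    print(f"win rate over 2000 simulated picks: {wins / 2000:.3f}")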

【15】 WaveNet-Based Deep Neural Networks for the Characterization of Anomalous Diffusion (WADNet) 标题:用于异常扩散表征的基于WaveNet的深度神经网络(WADNet)

作者:Dezhong Li,Qiujin Yao,Zihan Huang 备注:18 pages, 9 figures 链接:https://arxiv.org/abs/2106.08887 摘要:反常扩散是一种偏离标准布朗运动框架的输运动力学现象,参与了各种物理、化学、生物和经济系统的演化。研究这类随机过程对于揭示随机行走者和复杂系统的物理性质具有重要意义。然而,描述反常扩散的经典方法往往不适用于单个的短轨迹,导致了反常扩散(AnDi)挑战的产生。这项挑战的目的是客观地评估和比较单轨道表征的新方法,涉及三个不同方面:反常扩散指数的推断;扩散模型的分类;以及轨迹的分割。为了解决这一挑战中的推理和分类任务,我们将改进的WaveNet编码器与长-短期记忆网络相结合,开发了一种基于WaveNet的深度神经网络(WADNet),它不需要任何异常扩散的先验知识。由于我们的模型在所有维度的两个任务(6个子任务)上的性能都超过了目前挑战排行榜的第一名,WADNet可以成为最先进的技术的一部分来解码AnDi数据库。我们的方法为未来的研究提供了一个基准,并可以加速开发一个多功能的工具来表征反常扩散。 摘要:Anomalous diffusion, which shows a deviation of transport dynamics from the framework of standard Brownian motion, is involved in the evolution of various physical, chemical, biological, and economic systems. The study of such random processes is of fundamental importance in unveiling the physical properties of random walkers and complex systems. However, classical methods to characterize anomalous diffusion are often disqualified for individual short trajectories, leading to the launch of the Anomalous Diffusion (AnDi) Challenge. This challenge aims at objectively assessing and comparing new approaches for single trajectory characterization, with respect to three different aspects: the inference of the anomalous diffusion exponent; the classification of the diffusion model; and the segmentation of trajectories. In this article, to address the inference and classification tasks in the challenge, we develop a WaveNet-based deep neural network (WADNet) by combining a modified WaveNet encoder with long short-term memory networks, without any prior knowledge of anomalous diffusion. As the performance of our model has surpassed the current 1st places in the challenge leaderboard on both two tasks for all dimensions (6 subtasks), WADNet could be the part of state-of-the-art techniques to decode the AnDi database. Our method presents a benchmark for future research, and could accelerate the development of a versatile tool for the characterization of anomalous diffusion.

【16】 How memory architecture affects performance and learning in simple POMDPs 标题:内存体系结构如何影响简单POMDP的性能和学习

作者:Mario Geiger,Christophe Eloy,Matthieu Wyart 链接:https://arxiv.org/abs/2106.08849 摘要:当agent的观察是局部的或有噪声的时,强化学习变得更加复杂。这种情况对应于部分可观测马尔可夫决策过程(POMDP)。在POMDP中寻求良好性能的一种策略是赋予代理有限的内存,其更新由策略控制。然而,在这种情况下,策略优化是非凸的,并且随机初始化会导致较差的训练性能。从经验上看,可以通过限制内存体系结构来提高性能,代价是牺牲最优性以促进训练。在这里,我们在双臂bandit问题中研究这种权衡,并比较两种极端情况:(i)允许在$M$个内存状态之间进行任意转换的随机访问内存,和(ii)代理可以访问其最后$m$个动作和奖励的固定内存。对于(i),已知在最优策略下,选中较差手臂的概率$q$随$M$呈指数级减小。我们的主要结果是表明,尽管(ii)的内存体系结构很简单,它也能达到类似的性能:借助关于格雷序二进制项链(Gray-ordered binary necklaces)的一个猜想,我们找到了使$q$随$2^m$呈指数级减小的策略,即对某个$\alpha<1$有$q\sim\alpha^{2^m}$。有趣的是,我们从经验上观察到,随机初始化的训练导致(i)的结果非常差,而(ii)的结果明显更好。 摘要:Reinforcement learning is made much more complex when the agent's observation is partial or noisy. This case corresponds to a partially observable Markov decision process (POMDP). One strategy to seek good performance in POMDPs is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance for random initialization. The performance can be empirically improved by constraining the memory architecture, then sacrificing optimality to facilitate training. Here we study this trade-off in the two-arm bandit problem, and compare two extreme cases: (i) the random access memory where any transitions between $M$ memory states are allowed and (ii) a fixed memory where the agent can access its last $m$ actions and rewards. For (i), the probability $q$ to play the worst arm is known to be exponentially small in $M$ for the optimal policy. Our main result is to show that similar performance can be reached for (ii) as well, despite the simplicity of the memory architecture: using a conjecture on Gray-ordered binary necklaces, we find policies for which $q$ is exponentially small in $2^m$, i.e. $q \sim \alpha^{2^m}$ for some $\alpha < 1$. Interestingly, we observe empirically that training from random initialization leads to very poor results for (i), and significantly better results for (ii).

【17】 Algorithm to Compilation Codesign: An Integrated View of Neural Network Sparsity 标题:算法-编译协同设计:神经网络稀疏性的整体视角

作者:Fu-Ming Guo,Austin Huang 链接:https://arxiv.org/abs/2106.08846 摘要:减少神经网络的计算量、推理延迟和内存占用是剪枝和稀疏性研究中常被引用的动机。然而,如何落实这些好处,以及算法设计和正则化对运行时执行的端到端影响,通常没有被深入研究。在这里,我们将结构化和非结构化剪枝应用于BERT语言模型Transformer块的注意力权重,同时扩展了TVM编译器中的块稀疏表示(BSR)操作。BSR操作的集成使得TVM运行时执行能够利用由模型正则化引起的结构化模式稀疏性。这种对剪枝算法的整体视角使我们能够研究建模决策之间的关系及其对稀疏化执行的直接影响。我们的主要发现是:1) 我们验证了结构化稀疏块正则化的性能优势必须通过对TVM的BSR增强来实现,相对于vanilla PyTorch有4倍加速比,相对于标准TVM编译(无扩展BSR支持)有2.2倍加速比;2) 对于注意力权重,在此CPU推理场景下,端到端最优的块稀疏形状不是正方形块(如 Gray et al., 2017),而是线性的32x1块;3) 性能与块大小/形状之间的关系提示了模型正则化参数如何与任务调度器优化相互作用,从而产生所观察到的端到端性能。 摘要:Reducing computation cost, inference latency, and memory footprint of neural networks are frequently cited as research motivations for pruning and sparsity. However, operationalizing those benefits and understanding the end-to-end effect of algorithm design and regularization on the runtime execution is not often examined in depth. Here we apply structured and unstructured pruning to attention weights of transformer blocks of the BERT language model, while also expanding block sparse representation (BSR) operations in the TVM compiler. Integration of BSR operations enables the TVM runtime execution to leverage structured pattern sparsity induced by model regularization. This integrated view of pruning algorithms enables us to study relationships between modeling decisions and their direct impact on sparsity-enhanced execution. Our main findings are: 1) we validate that performance benefits of structured sparsity block regularization must be enabled by the BSR augmentations to TVM, with 4x speedup relative to vanilla PyTorch and 2.2x speedup relative to standard TVM compilation (without expanded BSR support); 2) for BERT attention weights, the end-to-end optimal block sparsity shape in this CPU inference context is not a square block (as in Gray et al., 2017) but rather a linear 32x1 block; 3) the relationship between performance and block size / shape is suggestive of how model regularization parameters interact with task scheduler optimizations, resulting in the observed end-to-end performance.
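摘要的核心发现之一是32x1的线性块形状最优。下面的示意代码(与TVM无关,仅演示块结构化剪枝本身;矩阵尺寸与稀疏率为假设值)按32x1块的L2范数对注意力权重矩阵做块稀疏化:

    import torch

    def block_prune_32x1(weight, sparsity=0.9):
        """按32x1块的L2范数剪枝:范数最小的块整体置零,形成BSR友好的结构。"""
        rows, cols = weight.shape
        assert rows % 32 == 0, "行数需为块高32的整数倍(示意代码的简化假设)"
        blocks = weight.reshape(rows // 32, 32, cols)        # (块行, 32, 列)
        norms = blocks.norm(dim=1)                           # 每个32x1块的L2范数
        k = int(norms.numel() * sparsity)
        threshold = norms.reshape(-1).kthvalue(k).values
        mask = (norms > threshold).float().unsqueeze(1)      # 广播到块内32个元素
        return (blocks * mask).reshape(rows, cols)

    w = torch.randn(768, 768)          # 假设的BERT注意力权重矩阵
    w_sparse = block_prune_32x1(w)
    print(f"nonzero ratio: {(w_sparse != 0).float().mean():.3f}")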

【18】 To Raise or Not To Raise: The Autonomous Learning Rate Question 标题:提高还是不提高:自主学习率问题

作者:Xiaomeng Dong,Tao Tan,Michael Potter,Yun-Chan Tsai,Gaurav Kumar,V. Ratna Saripalli 链接:https://arxiv.org/abs/2106.08767 摘要:在深度学习的世界里,有一个无处不在的参数:学习率。同样,还有一个无处不在的问题:学习率应该设为多少?要获得这个问题的真正答案往往乏味且费时,近年来还积累了大量关于如何选择和调整学习率以达到最佳训练效果的玄妙经验。此外,一旦网络结构、优化器、数据集或初始条件发生哪怕微小的变化,先前为打磨完美学习率所花费的大量时间就可能付诸东流。但事情不必如此。对于学习率这一重大问题,我们提出了一个新答案:自主学习率控制器。代码见 https://github.com/fastestimator/ARC 摘要:There is a parameter ubiquitous throughout the deep learning world: learning rate. There is likewise a ubiquitous question: what should that learning rate be? The true answer to this question is often tedious and time consuming to obtain, and a great deal of arcane knowledge has accumulated in recent years over how to pick and modify learning rates to achieve optimal training performance. Moreover, the long hours spent carefully crafting the perfect learning rate can come to nothing the moment your network architecture, optimizer, dataset, or initial conditions change ever so slightly. But it need not be this way. We propose a new answer to the great learning rate question: the Autonomous Learning Rate Controller. Find it at https://github.com/fastestimator/ARC

【19】 Input Invex Neural Network 标题:输入不变凸神经网络

作者:Suman Sapkota,Binod Bhattarai 备注:20 pages 链接:https://arxiv.org/abs/2106.08748 摘要:本文提出了一种新的神经网络凸性约束方法。不变凸函数保证每个驻点都是全局极小值。因此,从任何一点开始的梯度下降都会导致全局极小值。invexity在神经网络上的另一个优点是通过简单地对输出进行阈值化,将数据空间局部划分为两个具有高度非线性决策边界的连通集。为此,我们构造了一个泛不变凸函数逼近器,并利用它来增强神经网络的不变凸性。我们称之为输入不变凸神经网络(II-NN)。首先用已知的不变凸函数拟合数据,然后用神经网络进行修正,比较梯度的方向,如果梯度的方向与参考不变凸函数的方向相矛盾,则对神经网络上的梯度方向进行惩罚。为了惩罚梯度的方向,我们采用梯度裁剪梯度惩罚(GC-GP)。我们将我们的方法应用于现有的神经网络,用于图像分类和回归任务。通过大量的实证和定性实验,我们发现该方法具有与普通神经网络相似的性能,但具有不变性。该方法比线性神经网络和输入凸神经网络(ICNN)具有更大的优势。我们在github上发布了我们的代码和实现细节。 摘要:In this paper, we present a novel method to constrain invexity on Neural Networks (NN). Invex functions ensure every stationary point is global minima. Hence, gradient descent commenced from any point will lead to the global minima. Another advantage of invexity on NN is to divide data space locally into two connected sets with a highly non-linear decision boundary by simply thresholding the output. To this end, we formulate a universal invex function approximator and employ it to enforce invexity in NN. We call it Input Invex Neural Networks (II-NN). We first fit data with a known invex function, followed by modification with a NN, compare the direction of the gradient and penalize the direction of gradient on NN if it contradicts with the direction of reference invex function. In order to penalize the direction of the gradient we perform Gradient Clipped Gradient Penalty (GC-GP). We applied our method to the existing NNs for both image classification and regression tasks. From the extensive empirical and qualitative experiments, we observe that our method gives the performance similar to ordinary NN yet having invexity. Our method outperforms linear NN and Input Convex Neural Network (ICNN) with a large margin. We publish our code and implementation details at github.

【20】 Memorization and Generalization in Neural Code Intelligence Models 标题:神经编码智能模型中的记忆和泛化

作者:Md Rafiqul Islam Rabin,Aftab Hussain,Vincent J. Hellendoorn,Mohammad Amin Alipour 备注:manuscript in preparation 链接:https://arxiv.org/abs/2106.08704 摘要:深度神经网络(DNN)在软件工程和代码智能任务中的应用越来越广泛。这些是强大的工具,能够通过数百万个参数从大型数据集中学习高度概括的模式。同时,训练DNN犹如行走在刀锋之上,因为其巨大的容量也使它们容易记住单个数据点。虽然记忆传统上被认为是过度训练的一个表现,但最近的研究表明,当训练数据集嘈杂且记忆是唯一的手段时,记忆风险表现得尤为明显。不幸的是,大多数代码智能任务依赖于非常容易产生噪音和重复的数据源(例如GitHub),由于其庞大的规模,无法手动检查和评估。我们通过一个跨多个基准测试和模型族的案例研究,评估了神经代码智能模型中的记忆和泛化趋势,方法是借鉴其他使用DNN的领域中的既有做法,例如在训练数据集中引入有针对性的噪声。除了加强先前关于DNN中记忆程度的一般发现外,我们的结果还揭示了噪声数据集对训练的影响。 摘要:Deep Neural Networks (DNN) are increasingly commonly used in software engineering and code intelligence tasks. These are powerful tools that are capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, training DNNs means walking a knife's edges, because their large capacity also renders them prone to memorizing data points. While traditionally thought of as an aspect of over-training, recent work suggests that the memorization risk manifests especially strongly when the training datasets are noisy and memorization is the only recourse. Unfortunately, most code intelligence tasks rely on rather noise-prone and repetitive data sources, such as GitHub, which, due to their sheer size, cannot be manually inspected and evaluated. We evaluate the memorization and generalization tendencies in neural code intelligence models through a case study across several benchmarks and model families by leveraging established approaches from other fields that use DNNs, such as introducing targeted noise into the training dataset. In addition to reinforcing prior general findings about the extent of memorization in DNNs, our results shed light on the impact of noisy dataset in training.

【21】 Comparison of Automated Machine Learning Tools for SMS Spam Message Filtering 标题:用于垃圾短信过滤的自动机器学习工具的比较

作者:Waddah Saeed 备注:10 pages, 3 figures 链接:https://arxiv.org/abs/2106.08671 摘要:短消息服务(SMS)是一种非常流行的移动用户通信服务。然而,这项流行的服务也可能被滥用于从事非法活动,带来安全风险。如今已有许多自动机器学习(AutoML)工具,可以帮助领域专家和普通用户在几乎不具备机器学习知识的情况下建立高质量的机器学习模型。在这项工作中,我们比较了三种AutoML工具在短信垃圾邮件过滤上的分类性能,它们是mljar-supervised AutoML、H2O AutoML和基于树的管道优化工具(TPOT)AutoML。实验结果表明,集成模型取得了最好的分类性能。使用H2O AutoML构建的堆叠集成(Stacked Ensemble)模型在对数损失(0.8370)、真阳性(1088/1116)和真阴性(281/287)指标方面取得了最佳性能,其对数损失相对TPOT AutoML改善了19.05%,相对mljar-supervised AutoML改善了10.53%。AutoML工具取得的令人满意的过滤性能表明,其有望用于自动确定最适合短信垃圾邮件过滤的ML模型。 摘要:Short Message Service (SMS) is a very popular service used for communication by mobile users. However, this popular service can be abused by executing illegal activities and influencing security risks. Nowadays, many automatic machine learning (AutoML) tools exist which can help domain experts and lay users to build high-quality ML models with little or no machine learning knowledge. In this work, a classification performance comparison was conducted between three automatic ML tools for SMS spam message filtering. These tools are mljar-supervised AutoML, H2O AutoML, and Tree-based Pipeline Optimization Tool (TPOT) AutoML. Experimental results showed that ensemble models achieved the best classification performance. The Stacked Ensemble model, which was built using H2O AutoML, achieved the best performance in terms of Log Loss (0.8370), true positive (1088/1116), and true negative (281/287) metrics. There is a 19.05% improvement in Log Loss with respect to TPOT AutoML and 10.53% improvement with respect to mljar-supervised AutoML. The satisfactory filtering performance achieved with AutoML tools provides a potential application for AutoML tools to automatically determine the best ML model that can perform best for SMS spam message filtering.
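下面是用 H2O AutoML 与 TPOT 复现这类对比的流程草图(数据为合成占位,真实实验应替换为向量化后的短信语料;各参数仅为示例):

```python
import h2o
import pandas as pd
from h2o.automl import H2OAutoML
from tpot import TPOTClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# --- H2O AutoML ---
h2o.init()
df = pd.DataFrame(X_tr, columns=[f"f{i}" for i in range(X_tr.shape[1])])
df["label"] = y_tr
train = h2o.H2OFrame(df)
train["label"] = train["label"].asfactor()        # 分类任务需将标签转为因子
aml = H2OAutoML(max_models=10, sort_metric="logloss", seed=1)
aml.train(y="label", training_frame=train)        # 其余列自动作为特征
print(aml.leaderboard.head())                     # 榜首常为 Stacked Ensemble

# --- TPOT ---
tpot = TPOTClassifier(generations=5, population_size=20,
                      scoring="neg_log_loss", random_state=1)
tpot.fit(X_tr, y_tr)
print(tpot.score(X_te, y_te))
```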

【22】 Discrete Auto-regressive Variational Attention Models for Text Modeling 标题:用于文本建模的离散自回归变分注意力模型

作者:Xianghong Fang,Haoli Bai,Jian Li,Zenglin Xu,Michael Lyu,Irwin King 备注:IJCNN 2021 链接:https://arxiv.org/abs/2106.08571 摘要:变分自编码器(VAE)已被广泛应用于文本建模。然而在实践中,它面临两个挑战:信息表征不足和后验坍缩。前者源于只有LSTM编码器的最后一个隐藏状态被映射到潜在空间,这通常不足以概括整个输入;后者是VAE训练中长期存在的问题:优化过程会陷入灾难性的局部最优。本文提出离散自回归变分注意力模型(DAVAM)来应对这些挑战。具体来说,我们引入一种自回归变分注意力方法,通过有效捕捉输入的语义依赖来丰富潜在空间;我们进一步为变分注意力设计了离散潜在空间,并从数学上证明了我们的模型不存在后验坍缩。在语言建模任务上的大量实验证明了DAVAM相对多种VAE变体的优越性。 摘要:Variational autoencoders (VAEs) have been widely applied for text modeling. In practice, however, they are troubled by two challenges: information underrepresentation and posterior collapse. The former arises as only the last hidden state of LSTM encoder is transformed into the latent space, which is generally insufficient to summarize the data. The latter is a long-standing problem during the training of VAEs as the optimization is trapped to a disastrous local optimum. In this paper, we propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges. Specifically, we introduce an auto-regressive variational attention approach to enrich the latent space by effectively capturing the semantic dependency from the input. We further design discrete latent space for the variational attention and mathematically show that our model is free from posterior collapse. Extensive experiments on language modeling tasks demonstrate the superiority of DAVAM against several VAE counterparts.

【23】 Developing a Fidelity Evaluation Approach for Interpretable Machine Learning 标题:开发一种可解释机器学习的保真度评估方法

作者:Mythreyi Velmurugan,Chun Ouyang,Catarina Moreira,Renuka Sindhgatta 链接:https://arxiv.org/abs/2106.08492 摘要:尽管现代机器学习和深度学习方法允许进行复杂而深入的数据分析,但这些方法生成的预测模型通常非常复杂且缺乏透明度。可解释人工智能(XAI)方法被用来提高这些复杂模型的可解释性,从而提高透明度。然而,这些可解释方法本身的适用性很难评估。特别是,评估解释相对底层黑盒模型的保真度的方法需要进一步发展,对表格数据尤其如此。在本文中,我们(a)提出了一个三阶段方法来开发评估方法;(b)将一种主要面向图像和文本数据的现有评估方法改造用于评估在表格数据上训练的模型;以及(c)用该评估方法评估两种流行的可解释方法。我们的评估表明,底层预测模型的内部机制、所用可解释方法的内部机制以及模型和数据的复杂度都会影响解释保真度。鉴于解释保真度对上下文、所用工具和数据如此敏感,我们无法明确认定某种可解释方法优于另一种方法。 摘要:Although modern machine learning and deep learning methods allow for complex and in-depth data analytics, the predictive models generated by these methods are often highly complex, and lack transparency. Explainable AI (XAI) methods are used to improve the interpretability of these complex models, and in doing so improve transparency. However, the inherent fitness of these explainable methods can be hard to evaluate. In particular, methods to evaluate the fidelity of the explanation to the underlying black box require further development, especially for tabular data. In this paper, we (a) propose a three phase approach to developing an evaluation method; (b) adapt an existing evaluation method primarily for image and text data to evaluate models trained on tabular data; and (c) evaluate two popular explainable methods using this evaluation method. Our evaluations suggest that the internal mechanism of the underlying predictive model, the internal mechanism of the explainable method used and model and data complexity all affect explanation fidelity. Given that explanation fidelity is so sensitive to context and tools and data used, we could not clearly identify any specific explainable method as being superior to another.

【24】 Achieving Domain Robustness in Stereo Matching Networks by Removing Shortcut Learning 标题:消除捷径学习实现立体匹配网络的域鲁棒性

作者:WeiQin Chuah,Ruwan Tennakoon,Alireza Bab-Hadiashar,David Suter 备注:11 pages, 7 figures 链接:https://arxiv.org/abs/2106.08486 摘要:基于学习的立体匹配和深度估计网络目前在公共基准上表现出色,取得了令人印象深刻的结果。然而,最先进的网络往往无法从合成图像泛化到更具挑战性的真实数据领域。本文试图通过分析合成图像学习对真实数据性能的影响,揭示实现域鲁棒性的秘诀,特别是发现立体匹配网络泛化成功的重要因素。我们提供的证据表明,立体匹配网络在合成域中的特征学习受到合成数据中存在的两个"捷径"的严重影响:(1)合成立体图像中匹配像素之间完全相同的局部统计量(RGB颜色特征);(2)游戏引擎中模拟的3D物体的合成纹理缺乏真实感。我们将展示,通过移除这些捷径,我们可以在最先进的立体匹配框架中实现域鲁棒性,并且尽管网络仅在合成数据上训练,仍能在多个真实数据集上取得显著的性能。我们的实验结果表明,消除合成数据中的捷径是实现合成数据域与真实数据域之间域不变泛化的关键。 摘要:Learning-based stereo matching and depth estimation networks currently excel on public benchmarks with impressive results. However, state-of-the-art networks often fail to generalize from synthetic imagery to more challenging real data domains. This paper is an attempt to uncover hidden secrets of achieving domain robustness and in particular, discovering the important ingredients of generalization success of stereo matching networks by analyzing the effect of synthetic image learning on real data performance. We provide evidence that demonstrates that learning of features in the synthetic domain by a stereo matching network is heavily influenced by two "shortcuts" presented in the synthetic data: (1) identical local statistics (RGB colour features) between matching pixels in the synthetic stereo images and (2) lack of realism in synthetic textures on 3D objects simulated in game engines. We will show that by removing such shortcuts, we can achieve domain robustness in the state-of-the-art stereo matching frameworks and produce a remarkable performance on multiple realistic datasets, despite the fact that the networks were trained on synthetic data, only. Our experimental results point to the fact that eliminating shortcuts from the synthetic data is key to achieve domain-invariant generalization between synthetic and real data domains.

【25】 Circa: Stochastic ReLUs for Private Deep Learning 标题:Circa:面向私密深度学习的随机ReLU

作者:Zahra Ghodsi,Nandan Kumar Jha,Brandon Reagen,Siddharth Garg 链接:https://arxiv.org/abs/2106.08475 摘要:机器学习即服务的兴起与对用户隐私的日益关注,共同催生了对私密推理(PI)的需求。虽然最近的工作表明利用密码学原语可以实现PI,但其计算开销使之难以实用。社区在很大程度上尚未准备好解决这些开销,因为PI的减速源于ReLU算子,而明文推理的优化侧重于优化FLOPs。在本文中,我们重新思考了ReLU计算,并针对神经网络的特性提出了面向PI的优化方法。具体来说,我们将ReLU重新表述为一个近似的符号测试,并为该符号测试引入一种新的截断方法,显著降低了每个ReLU的成本。这些优化产生了一种特定类型的随机ReLU。关键的观察是,这种随机故障行为非常契合神经网络推理的容错特性,因此我们在不影响准确性的情况下获得了显著的节省。我们将这些优化统称为Circa,并展示了与基线实现相比最多4.7倍的存储改进和3倍的运行时改进;我们进一步表明,Circa可以叠加在最近的PI优化之上,获得额外1.8倍的加速。 摘要:The simultaneous rise of machine learning as a service and concerns over user privacy have increasingly motivated the need for private inference (PI). While recent work demonstrates PI is possible using cryptographic primitives, the computational overheads render it impractical. The community is largely unprepared to address these overheads, as the source of slowdown in PI stems from the ReLU operator whereas optimizations for plaintext inference focus on optimizing FLOPs. In this paper we re-think the ReLU computation and propose optimizations for PI tailored to properties of neural networks. Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test that significantly reduces the cost per ReLU. These optimizations result in a specific type of stochastic ReLU. The key observation is that the stochastic fault behavior is well suited for the fault-tolerant properties of neural network inference. Thus, we provide significant savings without impacting accuracy. We collectively call the optimizations Circa and demonstrate improvements of up to 4.7x storage and 3x runtime over baseline implementations; we further show that Circa can be used on top of recent PI optimizations to obtain 1.8x additional speedup.
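为直观理解"符号测试 + 截断"为何引入随机故障,下面给出一个纯明文的数值示意(与论文中基于密码学协议的实现无关;定点位宽均为假设值):

```python
import numpy as np

def stochastic_relu(x, frac_bits=16, drop_bits=8):
    """ReLU(x) = x * 1[x > 0]:先将 x 定点化,再截断低位后做符号测试。
    截断会使小幅值输入偶发判错,对应文中可被网络容忍的随机故障。"""
    fixed = np.round(x * (1 << frac_bits)).astype(np.int64)  # 定点编码
    truncated = fixed // (1 << drop_bits)                    # 截断低位,降低符号测试成本
    return x * (truncated > 0)
```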

【26】 Gradient-trained Weights in Wide Neural Networks Align Layerwise to Error-scaled Input Correlations 标题:宽神经网络中梯度训练权值与误差比例输入相关性的分层对齐

作者:Akhilan Boopathy,Ila Fiete 备注:22 pages, 11 figures 链接:https://arxiv.org/abs/2106.08453 摘要:最近的工作研究了能够解决各种困难问题的深度神经网络如何结合训练数据的统计信息以获得成功。然而,现有的结果只在有限的设定下成立。在这项工作中,我们推导了以梯度下降训练、带非线性激活的无限宽神经网络的分层权重动力学。我们从理论上证明了权重更新与由误差加权的中间层输入相关性对齐,并通过实验证明该结果在有限宽度的宽网络中同样成立。这一对齐结果使我们能够构造无需反向传播的学习规则,称为Align-zero和Align-ada,它们在理论上实现与反向传播相同的对齐。最后,我们在前馈和递归神经网络的基准问题上测试了这些学习规则,并证明其在宽网络中可达到与反向传播相当的性能。 摘要:Recent works have examined how deep neural networks, which can solve a variety of difficult problems, incorporate the statistics of training data to achieve their success. However, existing results have been established only in limited settings. In this work, we derive the layerwise weight dynamics of infinite-width neural networks with nonlinear activations trained by gradient descent. We show theoretically that weight updates are aligned with input correlations from intermediate layers weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate backpropagation-free learning rules, named Align-zero and Align-ada, that theoretically achieve the same alignment as backpropagation. Finally, we test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation.

【27】 Bridge Networks 标题:网桥网络

作者:Wilkie Olin-Ammentorp,Maxim Bazhenov 备注:5 pages, 5 figures 链接:https://arxiv.org/abs/2106.08446 摘要:尽管进展迅速,当前的深度学习方法仍面临许多严峻挑战,其中包括高能耗、灾难性遗忘、对全局损失的依赖以及无法进行符号推理。通过结合信息瓶颈理论和向量符号体系结构的概念,我们提出并实现了一种新的信息处理体系结构"桥网络(Bridge network)"。我们证明该体系结构具有独特的优势,可以解决全局损失和灾难性遗忘问题。此外,我们认为它为提高执行的能源效率和符号推理能力提供了进一步的基础。 摘要:Despite rapid progress, current deep learning methods face a number of critical challenges. These include high energy consumption, catastrophic forgetting, dependance on global losses, and an inability to reason symbolically. By combining concepts from information bottleneck theory and vector-symbolic architectures, we propose and implement a novel information processing architecture, the 'Bridge network.' We show this architecture provides unique advantages which can address the problem of global losses and catastrophic forgetting. Furthermore, we argue that it provides a further basis for increasing energy efficiency of execution and the ability to reason symbolically.

【28】 CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction 标题:CODA:面向实例依赖Dropout架构构建的建构主义学习

作者:Xiaoli Li 链接:https://arxiv.org/abs/2106.08444 摘要:Dropout作为一种防止过拟合的有效方法,正在深度学习领域引起浓厚的研究兴趣。与忽略结构信息的方法相比,最近在决定丢弃哪些单元时引入结构信息的做法取得了有希望的结果。然而,现有工作的一个主要问题是,在构建dropout架构时未能区分不同实例,这对许多应用来说可能是一个显著的缺陷。为了解决这一问题,我们提出了面向实例依赖Dropout架构的建构主义学习(CODA),其灵感来自哲学理论"建构主义学习"。特别地,基于这一理论,我们利用贝叶斯非参数方法"均匀过程(Uniform Process)"设计了一种更好的dropout技术:均匀过程混合模型(Uniform Process Mixture Models)。我们在5个真实数据集上评估了所提方法,并与其他最先进的dropout技术进行了性能比较。实验结果证明了CODA的有效性。 摘要:Dropout is attracting intensive research interest in deep learning as an efficient approach to prevent overfitting. Recently incorporating structural information when deciding which units to drop out produced promising results comparing to methods that ignore the structural information. However, a major issue of the existing work is that it failed to differentiate among instances when constructing the dropout architecture. This can be a significant deficiency for many applications. To solve this issue, we propose Constructivism learning for instance-dependent Dropout Architecture (CODA), which is inspired from a philosophical theory, constructivism learning. Specially, based on the theory we have designed a better drop out technique, Uniform Process Mixture Models, using a Bayesian nonparametric method Uniform process. We have evaluated our proposed method on 5 real-world datasets and compared the performance with other state-of-the-art dropout techniques. The experimental results demonstrated the effectiveness of CODA.

【29】 Code to Comment Translation: A Comparative Study on Model Effectiveness & Errors 标题:代码到注释翻译:模型有效性与错误的比较研究

作者:Junayed Mahmud,Fahim Faisal,Raihan Islam Arnob,Antonios Anastasopoulos,Kevin Moran 备注:Accepted to the 2021 NLP4Prog Workshop co-located with The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) 链接:https://arxiv.org/abs/2106.08415 摘要:自动源代码摘要是一个热门的软件工程研究课题,其中机器翻译模型被用来将代码片段"翻译"成相应的自然语言描述。大多数此类模型的评估都是使用基于参考的自动度量进行的。然而,考虑到编程语言和自然语言之间相对较大的语义差距,我们认为这一研究方向将受益于对当前最先进模型各种错误模式的定性研究。因此,在这项工作中,我们对最近提出的三种源代码摘要模型进行了定量和定性的比较。在定量评估中,我们基于平滑BLEU-4、METEOR和ROUGE-L机器翻译度量比较这些模型;在定性评估中,我们对模型输出与真实描述(ground truth)相比所犯的最常见错误进行了人工开放编码。我们的研究揭示了基于度量的性能与模型预测错误之间关系的新见解,并建立在一套经验推导的错误分类法之上,可用于推动未来的研究工作。 摘要:Automated source code summarization is a popular software engineering research topic wherein machine translation models are employed to "translate" code snippets into relevant natural language descriptions. Most evaluations of such models are conducted using automatic reference-based metrics. However, given the relatively large semantic gap between programming languages and natural language, we argue that this line of research would benefit from a qualitative investigation into the various error modes of current state-of-the-art models. Therefore, in this work, we perform both a quantitative and qualitative comparison of three recently proposed source code summarization models. In our quantitative evaluation, we compare the models based on the smoothed BLEU-4, METEOR, and ROUGE-L machine translation metrics, and in our qualitative evaluation, we perform a manual open-coding of the most common errors committed by the models when compared to ground truth captions. Our investigation reveals new insights into the relationship between metric-based performance and model prediction errors grounded in an empirically derived error taxonomy that can be used to drive future research efforts.
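定量部分的度量可以用现成库计算;下面是平滑BLEU-4与ROUGE-L的最小示例(示例句子为虚构;METEOR 可用 nltk 以类似方式计算):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

ref = "returns the sum of two integers".split()
hyp = "return sum of two numbers".split()

# 短句上未平滑的 BLEU-4 常为 0,故需平滑
bleu4 = sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method4)

# ROUGE-L 基于最长公共子序列
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rougeL = scorer.score(" ".join(ref), " ".join(hyp))["rougeL"].fmeasure

print(f"BLEU-4: {bleu4:.3f}  ROUGE-L: {rougeL:.3f}")
```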

【30】 Spoofing Generalization: When Can't You Trust Proprietary Models? 标题:欺骗泛化:什么时候不能信任专有模型?

作者:Ankur Moitra,Elchanan Mossel,Colin Sandon 链接:https://arxiv.org/abs/2106.08393 摘要:在这项工作中,我们研究了判定一个完全拟合训练数据的机器学习模型能否泛化到未见数据的计算复杂性。特别地,我们研究了恶意代理的能力:其目标是构造一个模型g,它与训练数据完全吻合且仅此而已,但与精确的模型f不可区分。如果没有多项式时间算法可以区分g和f,我们就说g强欺骗了f;如果将算法限制为对某个固定的$c$在$n^c$时间内运行,我们就说g c-弱欺骗了f。我们的主要结果是:1)在密码学假设下,强欺骗是可能的;2)对于任何c>0,c-弱欺骗是无条件可能的。尽管恶意代理是一种极端假设(希望训练大型模型的公司并非恶意),但我们相信它揭示了盲目信任大型专有模型或数据的固有困难。 摘要:In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalize to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model g that fits its training data and nothing else, but is indistinguishable from an accurate model f. We say that g strongly spoofs f if no polynomial-time algorithm can tell them apart. If instead we restrict to algorithms that run in $n^c$ time for some fixed $c$, we say that g c-weakly spoofs f. Our main results are: 1. Under cryptographic assumptions, strong spoofing is possible, and 2. For any c > 0, c-weak spoofing is possible unconditionally. While the assumption of a malicious agent is an extreme scenario (hopefully companies training large models are not malicious), we believe that it sheds light on the inherent difficulties of blindly trusting large proprietary models or data.

【31】 DMSANet: Dual Multi Scale Attention Network 标题:DMSANet:双重多尺度注意力网络

作者:Abhinav Sagar 备注:11 pages, 3 figures, 8 tables, Submitted to Neurips 2021 链接:https://arxiv.org/abs/2106.08382 摘要:注意力机制近来在计算机视觉社区非常流行。为提高网络性能,人们已做了大量工作,尽管这几乎总是导致计算复杂度的增加。在本文中,我们提出了一个新的注意力模块,它不仅达到了最佳性能,而且与大多数现有模型相比参数更少。由于其轻量特性,我们的注意力模块可以很容易地与其他卷积神经网络集成。所提出的双重多尺度注意力网络(DMSANet)由两部分组成:第一部分用于提取不同尺度的特征并进行聚合;第二部分并行使用空间和通道注意力模块,自适应地将局部特征与其全局依赖性相结合。我们在ImageNet数据集上对图像分类、在MS COCO数据集上对目标检测和实例分割进行了基准测试。 摘要:Attention mechanism of late has been quite popular in the computer vision community. A lot of work has been done to improve the performance of the network, although almost always it results in increased computational complexity. In this paper, we propose a new attention module that not only achieves the best performance but also has lesser parameters compared to most existing models. Our attention module can easily be integrated with other convolutional neural networks because of its lightweight nature. The proposed network named Dual Multi Scale Attention Network (DMSANet) is comprised of two parts: the first part is used to extract features at various scales and aggregate them, the second part uses spatial and channel attention modules in parallel to adaptively integrate local features with their global dependencies. We benchmark our network performance for Image Classification on ImageNet dataset, Object Detection and Instance Segmentation both on MS COCO dataset.
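"并行整合通道与空间注意力"的思路可用如下通用草图示意(这并非DMSANet的官方结构;两支注意力的具体设计与多尺度聚合部分均被简化,通道数、卷积核大小等皆为假设):

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """并行的通道注意力(SE 风格)与空间注意力的通用草图。"""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid())

    def forward(self, x):
        # 两支并行地作用于同一输入,再求和融合
        return x * self.channel(x) + x * self.spatial(x)
```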

【32】 Learning effective stochastic differential equations from microscopic simulations: combining stochastic numerics and deep learning 标题:从微观模拟中学习有效的随机微分方程:随机数值与深度学习相结合

作者:Felix Dietrich,Alexei Makeev,George Kevrekidis,Nikolaos Evangelou,Tom Bertalan,Sebastian Reich,Ioannis G. Kevrekidis 备注:19 pages, includes supplemental material 链接:https://arxiv.org/abs/2106.09004 摘要:我们为细粒度的粒子模拟或基于代理的模拟的粗粒度观测量识别有效的随机微分方程(SDE);这些SDE进而为细尺度动力学提供粗粒度的代理模型。我们用神经网络逼近这些有效SDE中的漂移函数和扩散函数,这些网络可以被视为有效的随机ResNet。损失函数受成熟的随机数值积分器(此处为Euler-Maruyama和Milstein)的启发并体现其结构,因此我们的近似可以受益于这些基本数值格式的误差分析。当存在近似的粗粒度模型(如平均场方程)时,它们也自然适用于"物理信息"的灰盒辨识。我们的方法不需要长轨迹,可以处理零散的快照数据,并且天然能够处理每个快照不同的时间步长。我们既考虑了粗粒度集体观测量预先已知的情形,也考虑了必须以数据驱动方式发现它们的情形。 摘要:We identify effective stochastic differential equations (SDE) for coarse observables of fine-grained particle- or agent-based simulations; these SDE then provide coarse surrogate models of the fine scale dynamics. We approximate the drift and diffusivity functions in these effective SDE through neural networks, which can be thought of as effective stochastic ResNets. The loss function is inspired by, and embodies, the structure of established stochastic numerical integrators (here, Euler-Maruyama and Milstein); our approximations can thus benefit from error analysis of these underlying numerical schemes. They also lend themselves naturally to "physics-informed" gray-box identification when approximate coarse models, such as mean field equations, are available. Our approach does not require long trajectories, works on scattered snapshot data, and is designed to naturally handle different time steps per snapshot. We consider both the case where the coarse collective observables are known in advance, as well as the case where they must be found in a data-driven manner.
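下面用 PyTorch 给出核心损失的最小草图(基于 Euler-Maruyama 转移密度;网络规模与数据接口均为假设):在 Euler-Maruyama 格式下,x1 近似服从 N(x0 + f(x0)·dt, g(x0)²·dt),于是可以直接最小化快照对的高斯负对数似然,且每个快照对的 dt 可以不同。

```python
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    """用两个小网络分别近似漂移 f 和扩散 g。"""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.g = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, dim), nn.Softplus())  # 扩散须为正

    def em_nll(self, x0, x1, dt):
        """快照对 (x0, x1) 相隔 dt(可为逐样本张量)。"""
        mean = x0 + self.f(x0) * dt
        var = self.g(x0) ** 2 * dt + 1e-8
        return 0.5 * (((x1 - mean) ** 2) / var + torch.log(var)).sum(-1).mean()
```

Milstein 格式对应在转移密度中加入与 g 的导数有关的修正项,损失结构类似。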

【33】 Deriving Autism Spectrum Disorder Functional Networks from RS-FMRI Data using Group ICA and Dictionary Learning 标题:利用组ICA和字典学习从RS-fMRI数据中提取自闭症谱系障碍功能网络

作者:Xin Yang,Ning Zhang,Donglin Wang 备注:Conference 链接:https://arxiv.org/abs/2106.09000 摘要:本研究的目的是结合使用组ICA和字典学习模型,导出自闭症谱系障碍(ASD)人群的功能网络,并利用由导出的功能网络计算出的功能连接对ASD和典型发育(TD)参与者进行分类。在我们的实验中,ASD功能网络来自静息态功能磁共振成像(rs-fMRI)数据。我们总共下载了120个训练样本,包括58名ASD和62名TD参与者,这些样本来自公共数据库:自闭症脑成像数据交换I(ABIDE I)。我们的方法和结果有五个主要部分。首先,我们利用组ICA模型从ASD组中提取功能网络,并对排名前20的感兴趣区域(ROI)进行排序。其次,我们利用字典学习模型从ASD组中提取功能网络,并对排名前20的ROI进行排序。第三,我们将这两个模型选出的共40个ROI合并为ASD功能网络。第四,我们基于组ICA选出的20个ROI、字典学习选出的20个ROI以及二者合并的40个ROI,生成三个相应的掩模。最后,利用上述三个掩模提取所有训练样本的ROI,并将计算出的功能连接作为ASD与TD分类的特征。分类结果表明,由ICA和字典学习共同导出的功能网络优于由单一ICA模型或单一字典学习模型导出的功能网络。 摘要:The objective of this study is to derive functional networks for the autism spectrum disorder (ASD) population using the group ICA and dictionary learning model together and to classify ASD and typically developing (TD) participants using the functional connectivity calculated from the derived functional networks. In our experiments, the ASD functional networks were derived from resting-state functional magnetic resonance imaging (rs-fMRI) data. We downloaded a total of 120 training samples, including 58 ASD and 62 TD participants, which were obtained from the public repository: Autism Brain Imaging Data Exchange I (ABIDE I). Our methodology and results have five main parts. First, we utilize a group ICA model to extract functional networks from the ASD group and rank the top 20 regions of interest (ROIs). Second, we utilize a dictionary learning model to extract functional networks from the ASD group and rank the top 20 ROIs. Third, we merged the 40 selected ROIs from the two models together as the ASD functional networks. Fourth, we generate three corresponding masks based on the 20 selected ROIs from group ICA, the 20 ROIs selected from dictionary learning, and the 40 combined ROIs selected from both. Finally, we extract ROIs for all training samples using the above three masks, and the calculated functional connectivity was used as features for ASD and TD classification. The classification results showed that the functional networks derived from ICA and dictionary learning together outperform those derived from a single ICA model or a single dictionary learning model.
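这一流程可以用 scikit-learn 粗略示意如下(真实研究通常使用专门的组ICA工具链,如 nilearn;此处数据为随机占位,仅演示"ICA成分 + 字典原子合并为候选功能网络"的步骤,成分数等参数为假设):

```python
import numpy as np
from sklearn.decomposition import FastICA, MiniBatchDictionaryLearning

# X: (时间点 × 体素) 的组级 rs-fMRI 数据矩阵;此处以随机数据代替
X = np.random.randn(200, 5000)

ica = FastICA(n_components=20, random_state=0)
ica_maps = ica.fit_transform(X.T).T          # 20 个空间独立成分(功能网络)

dl = MiniBatchDictionaryLearning(n_components=20, alpha=1.0, random_state=0)
dl_maps = dl.fit(X).components_              # 20 个字典原子作为空间图

networks = np.vstack([ica_maps, dl_maps])    # 合并两组成分,共 40 个候选 ROI 图
```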

【34】 mSHAP: SHAP Values for Two-Part Models 标题:mSHAP:两部分模型的SHAP值

作者:Spencer Matthews,Brian Hartman 链接:https://arxiv.org/abs/2106.08990 摘要:两部分模型在保险和精算科学中十分重要且被广泛使用。由于注册汽车、获得抵押贷款和参与某些业务都需要保险,为保险单定价的模型必须公平且无歧视,这一点尤为重要。黑盒模型会使我们很难知道哪些协变量在影响结果。SHAP值能够解释各种黑盒模型,但在两部分模型上进展甚微。在本文中,我们提出mSHAP(即乘法SHAP),一种利用各个子模型的SHAP值计算两部分模型SHAP值的方法。该方法使两部分模型的预测可以在单个观测的层面上得到解释。在提出mSHAP之后,我们进行了深入的模拟研究。尽管kernelSHAP算法也能够为两部分模型计算近似的SHAP值,但与我们方法的比较表明,mSHAP的速度要快上指数级。最后,我们将mSHAP应用于个人汽车财产损失保险的两部分费率模型。此外,我们提供了R包(mshap),可在多种应用中轻松实现该方法。 摘要:Two-part models are important to and used throughout insurance and actuarial science. Since insurance is required for registering a car, obtaining a mortgage, and participating in certain businesses, it is especially important that the models which price insurance policies are fair and non-discriminatory. Black box models can make it very difficult to know which covariates are influencing the results. SHAP values enable interpretation of various black box models, but little progress has been made in two-part models. In this paper, we propose mSHAP (or multiplicative SHAP), a method for computing SHAP values of two-part models using the SHAP values of the individual models. This method will allow for the predictions of two-part models to be explained at an individual observation level. After developing mSHAP, we perform an in-depth simulation study. Although the kernelSHAP algorithm is also capable of computing approximate SHAP values for a two-part model, a comparison with our method demonstrates that mSHAP is exponentially faster. Ultimately, we apply mSHAP to a two-part ratemaking model for personal auto property damage insurance coverage. Additionally, an R package (mshap) is available to easily implement the method in a wide variety of applications.

【35】 Collaborative Learning and Personalization in Multi-Agent Stochastic Linear Bandits 标题:多智能体随机线性bandits中的协作学习与个性化

作者:Avishek Ghosh,Abishek Sankararaman,Kannan Ramchandran 备注:25 pages, 8 figures 链接:https://arxiv.org/abs/2106.08902 摘要:我们考虑在一个含$N$个agent的异构随机线性bandits框架中最小化遗憾的问题,其中agent(用户)彼此相似但不完全相同。我们采用实践中常用的两种思路来建模用户异构性:(i)聚类框架,其中用户被划分为若干组,同组用户彼此相同,不同组之间则不同;(ii)个性化框架,其中任意两个用户不必相同,但用户的参数接近总体平均。在聚类用户设定下,我们提出了一种基于簇身份逐次细化和遗憾最小化的新算法。我们证明,对于任何agent,如果它位于一个"良好分离"的簇中,则遗憾的规模为$\mathcal{O}(\sqrt{T/N})$;如果其所在簇没有良好分离,则遗憾的规模为$\mathcal{O}(T^{\frac{1}{2}+\varepsilon}/N^{\frac{1}{2}-\varepsilon})$,其中$\varepsilon$为正且可任意接近$0$。我们的算法对簇间分离是自适应的,并且是无参数的:它不需要知道簇的数目、分离程度和簇的大小,而遗憾保证能自适应问题的内在复杂度。在个性化框架中,我们引入了一种自然算法,其中个体bandit实例由全局平均模型的估计值初始化。我们证明,参数与总体平均偏差为$\epsilon_i$的agent $i$,其遗憾规模为$\widetilde{O}(\epsilon_i\sqrt{T})$。这表明,如果用户表示彼此接近($\epsilon_i$较小),所产生的遗憾就低,反之亦然。实验结果验证了这些结论,我们观察到自适应算法优于非自适应基线。 摘要:We consider the problem of minimizing regret in an $N$ agent heterogeneous stochastic linear bandits framework, where the agents (users) are similar but not all identical. We model user heterogeneity using two popularly used ideas in practice; (i) A clustering framework where users are partitioned into groups with users in the same group being identical to each other, but different across groups, and (ii) a personalization framework where no two users are necessarily identical, but a user's parameters are close to that of the population average. In the clustered users' setup, we propose a novel algorithm, based on successive refinement of cluster identities and regret minimization. We show that, for any agent, the regret scales as $\mathcal{O}(\sqrt{T/N})$, if the agent is in a `well separated' cluster, or scales as $\mathcal{O}(T^{\frac{1}{2}+\varepsilon}/N^{\frac{1}{2}-\varepsilon})$ if its cluster is not well separated, where $\varepsilon$ is positive and arbitrarily close to $0$. Our algorithm is adaptive to the cluster separation, and is parameter free -- it does not need to know the number of clusters, separation and cluster size, yet the regret guarantee adapts to the inherent complexity. In the personalization framework, we introduce a natural algorithm where, the personal bandit instances are initialized with the estimates of the global average model. We show that, an agent $i$ whose parameter deviates from the population average by $\epsilon_i$, attains a regret scaling of $\widetilde{O}(\epsilon_i\sqrt{T})$. This demonstrates that if the user representations are close (small $\epsilon_i$), the resulting regret is low, and vice-versa. The results are empirically validated and we observe superior performance of our adaptive algorithms over non-adaptive baselines.

【36】 Using Machine Learning to Select High-Quality Measurements 标题:使用机器学习选择高质量的测量值

作者:Andrew Edmonds,David Brown,Luciano Vinas,Samantha Pagan 备注:8 pages, 3 figures 链接:https://arxiv.org/abs/2106.08891 摘要:我们描述了如何使用机器学习算法为Mu2e实验挑选高质量的测量。对于存在由测量误差引起的本底的实验而言,这种技术非常重要。算法利用多条对测量质量敏感的辅助信息来区分高质量和低质量的测量。 摘要:We describe the use of machine learning algorithms to select high-quality measurements for the Mu2e experiment. This technique is important for experiments with backgrounds that arise due to measurement errors. The algorithms use multiple pieces of ancillary information that are sensitive to measurement quality to separate high-quality and low-quality measurements.

【37】 Covariance-based smoothed particle hydrodynamics. A machine-learning application to simulating disc fragmentation 标题:基于协方差的平滑粒子流体力学。机器学习在模拟盘碎裂中的应用

作者:Eraldo Pereira Marinho 备注:18 pages, 6 figures 链接:https://arxiv.org/abs/2106.08870 摘要:提出了一种基于PCA的机器学习版SPH方法。在本方案中,利用改进的八叉树数据结构计算平滑张量,使其特征值与协方差的主成分成比例,从而实现各向异性自调节kNN的快速估计。每个SPH粒子都是一个最优kNN簇的中心,即该簇的协方差张量恰好允许按照马氏度量找到这个kNN簇本身。这种机器学习构成了一个不动点问题。最终(自调节)的kNN簇定义了执行各向异性插值所需的平滑体积,或者更准确地说,平滑椭球。因此,平滑核具有椭球形轮廓,这改变了核梯度的计算方式。作为应用,我们模拟了一个非磁性旋转气体球的坍缩与碎裂。一个有趣的结果是盘碎裂中原恒星的形成:各向异性模拟中的原恒星比各向同性情形下更持久、更丰富。 摘要:A PCA-based, machine learning version of the SPH method is proposed. In the present scheme, the smoothing tensor is computed to have their eigenvalues proportional to the covariance's principal components, using a modified octree data structure, which allows the fast estimation of the anisotropic self-regulating kNN. Each SPH particle is the center of such an optimal kNN cluster, i.e., the one whose covariance tensor allows the find of the kNN cluster itself according to the Mahalanobis metric. Such machine learning constitutes a fixed point problem. The definitive (self-regulating) kNN cluster defines the smoothing volume, or properly saying, the smoothing ellipsoid, required to perform the anisotropic interpolation. Thus, the smoothing kernel has an ellipsoidal profile, which changes how the kernel gradients are computed. As an application, it was performed the simulation of collapse and fragmentation of a non-magnetic, rotating gaseous sphere. An interesting outcome was the formation of protostars in the disc fragmentation, shown to be much more persistent and much more abundant in the anisotropic simulation than in the isotropic case.
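核心的"自调节kNN"不动点迭代可示意如下(省略了论文中的八叉树加速;k 与迭代上限为假设值):反复用当前kNN簇的协方差定义马氏度量,再在该度量下重新选取kNN,直至簇不再变化。

```python
import numpy as np

def self_regulating_knn(points, center_idx, k=32, iters=10):
    """points: (N, dim)。返回不动点处的簇成员下标与定义平滑椭球的协方差。"""
    c = points[center_idx]
    cov = np.eye(points.shape[1])
    idx = None
    for _ in range(iters):
        d = points - c
        m_dist = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)  # 马氏距离平方
        new_idx = np.argsort(m_dist)[:k]
        if idx is not None and set(new_idx) == set(idx):
            break                                     # 达到不动点
        idx = new_idx
        cov = np.cov(points[idx].T) + 1e-9 * np.eye(points.shape[1])
    return idx, cov
```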

【38】 Reproducing Kernel Hilbert Space, Mercer's Theorem, Eigenfunctions, Nyström Method, and Use of Kernels in Machine Learning: Tutorial and Survey 标题:再生核Hilbert空间、Mercer定理、特征函数、Nyström方法和核在机器学习中的应用:教程和综述

作者:Benyamin Ghojogh,Ali Ghodsi,Fakhri Karray,Mark Crowley 备注:To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning 链接:https://arxiv.org/abs/2106.08443 摘要:这是一篇关于核、核方法及相关领域的教程与综述论文。我们首先回顾核在泛函分析和机器学习中的历史。然后详细介绍了Mercer核、Hilbert与Banach空间、再生核Hilbert空间(RKHS)、Mercer定理及其证明、常用核、由距离度量构造核、重要的核类别(包括有界核、积分正定核、泛核、平稳核和特征核)、核的中心化与归一化以及特征函数。接着介绍了核在机器学习中的几类用法,包括核方法(如核支持向量机)、基于半定规划的核学习、Hilbert-Schmidt独立性准则、最大均值差异、核均值嵌入和核降维。我们还讨论了核矩阵的秩与分解,以及用Nyström方法逼近特征函数和核。本文可用于机器学习、降维、数学中的泛函分析以及量子力学中的数学物理等多个领域。 摘要:This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start with reviewing the history of kernels in functional analysis and machine learning. Then, Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from distance metric, important classes of kernels (including bounded, integrally positive definite, universal, stationary, and characteristic kernels), kernel centering and normalization, and eigenfunctions are explained in detail. Then, we introduce types of use of kernels in machine learning including kernel methods (such as kernel support vector machines), kernel learning by semi-definite programming, Hilbert-Schmidt independence criterion, maximum mean discrepancy, kernel mean embedding, and kernel dimensionality reduction. We also cover rank and factorization of kernel matrix as well as the approximation of eigenfunctions and kernels using the Nyström method. This paper can be useful for various fields of science including machine learning, dimensionality reduction, functional analysis in mathematics, and mathematical physics in quantum mechanics.
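Nyström 方法本身可以用几行 numpy 写出(RBF核与地标数均为示例选择):随机选 m 个"地标"样本,用 K ≈ K_nm · pinv(K_mm) · K_nm^T 低秩近似完整核矩阵。

```python
import numpy as np

def nystrom_approximation(X, m=50, gamma=1.0, seed=0):
    """X: (n, d)。返回完整 RBF 核矩阵的 (n, n) 低秩近似。"""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=m, replace=False)]

    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K_nm = rbf(X, landmarks)          # (n, m)
    K_mm = rbf(landmarks, landmarks)  # (m, m)
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T
```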

【39】 Quantum-inspired event reconstruction with Tensor Networks: Matrix Product States 标题:基于张量网络的量子启发事件重建:矩阵乘积态

作者:Jack Y. Araz,Michael Spannowsky 备注:25 pages, 23 figures 链接:https://arxiv.org/abs/2106.08334 摘要:张量网络是高维张量的非平凡表示,最初设计用于描述量子多体系统。我们证明了张量网络是连接量子力学概念和机器学习技术的理想载体,从而提高了神经网络的可解释性。本文提出了一种基于矩阵积状态分类器的顶夸克信号在QCD背景过程中的识别方法。我们证明了纠缠熵可以用来解释网络所学的知识,它可以在不损失一般性和性能的前提下降低网络和特征空间的复杂度。为了优化网络,我们比较了密度矩阵重整化群(DMRG)算法和随机梯度下降(SGD)算法,并提出了一种联合训练算法来利用DMRG的可解释性和SGD的效率。 摘要:Tensor Networks are non-trivial representations of high-dimensional tensors, originally designed to describe quantum many-body systems. We show that Tensor Networks are ideal vehicles to connect quantum mechanical concepts to machine learning techniques, thereby facilitating an improved interpretability of neural networks. This study presents the discrimination of top quark signal over QCD background processes using a Matrix Product State classifier. We show that entanglement entropy can be used to interpret what a network learns, which can be used to reduce the complexity of the network and feature space without loss of generality or performance. For the optimisation of the network, we compare the Density Matrix Renormalization Group (DMRG) algorithm to stochastic gradient descent (SGD) and propose a joined training algorithm to harness the explainability of DMRG with the efficiency of SGD.

其他(26篇)

【1】 Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate 标题:基于可微代理的约束构型摊销综合

作者:Xingyuan Sun,Tianju Xue,Szymon M. Rusinkiewicz,Ryan P. Adams 备注:16 pages, 9 figures 链接:https://arxiv.org/abs/2106.09019 摘要:在设计、制造和控制问题中,我们经常面临综合任务:必须生成满足一组约束的对象或配置,同时最大化一个或多个目标函数。综合问题的典型特征是一个物理过程,其中许多不同的实现都可能达成目标。这种多对一映射给前馈综合的监督学习带来了挑战,因为可行设计的集合可能具有复杂的结构。此外,许多物理模拟的不可微性阻碍了直接优化。我们用一个可视为自编码器的两阶段神经网络架构来解决这两个问题:首先学习解码器,即近似多对一物理实现过程的可微代理;然后学习编码器,它从目标映射到设计,同时使用固定的解码器来评估实现的质量。我们在两个案例研究上评估了该方法:增材制造中的挤出头路径规划和受约束软体机器人的逆运动学。我们将我们的方法与使用学习到的代理直接优化设计以及对综合问题的监督学习进行了比较。我们发现,我们的方法产生的解的质量高于监督学习,同时在质量上与直接优化相当,而计算成本大大降低。 摘要:In design, fabrication, and control problems, we are often faced with the task of synthesis, in which we must generate an object or configuration that satisfies a set of constraints while maximizing one or more objective functions. The synthesis problem is typically characterized by a physical process in which many different realizations may achieve the goal. This many-to-one map presents challenges to the supervised learning of feed-forward synthesis, as the set of viable designs may have a complex structure. In addition, the non-differentiable nature of many physical simulations prevents direct optimization. We address both of these problems with a two-stage neural network architecture that we may consider to be an autoencoder. We first learn the decoder: a differentiable surrogate that approximates the many-to-one physical realization process. We then learn the encoder, which maps from goal to design, while using the fixed decoder to evaluate the quality of the realization. We evaluate the approach on two case studies: extruder path planning in additive manufacturing and constrained soft robot inverse kinematics. We compare our approach to direct optimization of design using the learned surrogate, and to supervised learning of the synthesis problem. We find that our approach produces higher quality solutions than supervised learning, while being competitive in quality with direct optimization, at a greatly reduced computational cost.
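两阶段训练可示意如下(simulate 为假设的黑盒仿真接口;网络结构与超参数均为占位):先离线收集仿真结果训练可微代理 decoder,再冻结 decoder 训练 encoder。

```python
import torch
import torch.nn as nn

def train_two_stage(designs, goals, simulate, dim_d, dim_g, epochs=100):
    """designs: (N, dim_d) 张量;goals: (M, dim_g) 张量;
    simulate(design) 返回单个设计的实现结果(不可微黑盒)。"""
    decoder = nn.Sequential(nn.Linear(dim_d, 128), nn.ReLU(), nn.Linear(128, dim_g))
    encoder = nn.Sequential(nn.Linear(dim_g, 128), nn.ReLU(), nn.Linear(128, dim_d))

    realized = torch.stack([simulate(d) for d in designs])      # 离线收集仿真结果
    opt_d = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    for _ in range(epochs):                                     # 阶段一:训练代理
        opt_d.zero_grad()
        nn.functional.mse_loss(decoder(designs), realized).backward()
        opt_d.step()

    for p in decoder.parameters():
        p.requires_grad_(False)                                 # 冻结代理
    opt_e = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    for _ in range(epochs):                                     # 阶段二:训练编码器
        opt_e.zero_grad()
        nn.functional.mse_loss(decoder(encoder(goals)), goals).backward()
        opt_e.step()
    return encoder, decoder
```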

【2】 An unifying point of view on expressive power of GNNs 标题:关于GNN表现力的统一观点

作者:Giuseppe Alessio D'Inverno,Monica Bianchini,Maria Lucia Sampoli,Franco Scarselli 备注:16 pages, 3 figures 链接:https://arxiv.org/abs/2106.08992 摘要:图神经网络(GNN)是一大类用于图处理的连接主义模型。它们在每个节点及其邻居上执行迭代的消息传递操作,以解决某些节点上或整个图上的分类/聚类任务,并且无论消息顺序如何都收集所有这些消息。尽管这一类中各种模型之间存在差异,但大多数模型采用相同的计算方案,基于局部聚合机制;直观地说,局部计算框架是GNN表达能力的主要来源。本文证明了Weisfeiler-Lehman检验在图节点上诱导出一个等价关系,它恰好对应于在原始GNN模型上定义的展开等价(unfolding equivalence)。因此,关于原始GNN表达能力的结果可以推广到一般GNN:在温和的条件下,可以证明GNN能够以概率意义、以任意精度逼近图上任何尊重展开等价的函数。 摘要:Graph Neural Networks (GNNs) are a wide class of connectionist models for graph processing. They perform an iterative message passing operation on each node and its neighbors, to solve classification/clustering tasks (on some nodes or on the whole graph), collecting all such messages, regardless of their order. Despite the differences among the various models belonging to this class, most of them adopt the same computation scheme, based on a local aggregation mechanism and, intuitively, the local computation framework is mainly responsible for the expressive power of GNNs. In this paper, we prove that the Weisfeiler-Lehman test induces an equivalence relationship on the graph nodes that exactly corresponds to the unfolding equivalence, defined on the original GNN model. Therefore, the results on the expressive power of the original GNNs can be extended to general GNNs which, under mild conditions, can be proved capable of approximating, in probability and up to any precision, any function on graphs that respects the unfolding equivalence.
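文中作为枢纽的 Weisfeiler-Lehman 检验即经典的 1-WL 颜色细化,可用几行代码实现;稳定着色给出的节点划分就是与展开等价对应的等价类:

```python
def wl_refinement(adj, rounds=3):
    """1-WL 颜色细化。adj 为邻接表,如 {0: [1, 2], 1: [0], 2: [0]}。
    每轮用"自身颜色 + 邻居颜色多重集"重新着色,着色稳定即停止。"""
    colors = {v: 0 for v in adj}                      # 初始时所有节点同色
    for _ in range(rounds):
        signatures = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                      for v in adj}
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if new_colors == colors:                      # 着色稳定,细化结束
            break
        colors = new_colors
    return colors

print(wl_refinement({0: [1, 2], 1: [0], 2: [0], 3: [4], 4: [3]}))
```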

【3】 Banker Online Mirror Descent 标题:银行家在线镜像下降

作者:Jiatai Huang,Longbo Huang 链接:https://arxiv.org/abs/2106.08943 摘要:我们提出了一个新的框架Banker-OMD,推广了在线学习算法设计中经典的在线镜像下降(OMD)技术。Banker-OMD使算法能够稳健地处理延迟反馈,并提供了一种在各种延迟反馈在线学习任务中实现$\tilde{O}(\sqrt{T}+\sqrt{D})$风格遗憾界的通用方法,其中$T$是时间范围长度,$D$是总反馈延迟。我们通过三个重要的延迟反馈bandit场景展示了Banker-OMD的能力,包括延迟对抗性多臂bandit(MAB)、延迟对抗性线性bandit,以及一种新的延迟"两全其美"MAB设定。Banker-OMD在这三种设定中都达到了近乎最优的性能。特别地,它导出了第一个实现$\tilde{O}(\text{poly}(n)(\sqrt{T}+\sqrt{D}))$遗憾的延迟对抗性线性bandit算法。 摘要:We propose Banker-OMD, a novel framework generalizing the classical Online Mirror Descent (OMD) technique in online learning algorithm design. Banker-OMD allows algorithms to robustly handle delayed feedback, and offers a general methodology for achieving $\tilde{O}(\sqrt{T}+\sqrt{D})$-style regret bounds in various delayed-feedback online learning tasks, where $T$ is the time horizon length and $D$ is the total feedback delay. We demonstrate the power of Banker-OMD with applications to three important bandit scenarios with delayed feedback, including delayed adversarial Multi-armed bandits (MAB), delayed adversarial linear bandits, and a novel delayed best-of-both-worlds MAB setting. Banker-OMD achieves nearly-optimal performance in all the three settings. In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T}+\sqrt{D}))$ regret.

【4】 Towards Automatic Actor-Critic Solutions to Continuous Control 标题:迈向连续控制的自动演员-评论家解决方案

作者:Jake Grigsby,Jin Yong Yoo,Yanjun Qi 备注:10 pages, 4 figures 链接:https://arxiv.org/abs/2106.08918 摘要:无模型的离策略演员-评论家(actor-critic)方法是解决复杂连续控制任务的有效方案。然而,这些算法依赖大量设计技巧和众多超参数,使其在新领域的应用困难且计算代价高昂。本文建立了一种进化方法,自动调整这些设计决策,并从Soft Actor-Critic算法中消除RL特有的超参数。我们的设计具有样本效率,并相对基线方法具备实际优势,包括改进的探索、对多种控制频率的泛化,以及由高性能策略组成的稳健集成。实验表明,我们的智能体在DeepMind Control套件的流行基准上优于精心调参的超参数设置。随后我们将其应用于新的控制任务,以最少的计算和研究成本找到高性能的解决方案。 摘要:Model-free off-policy actor-critic methods are an efficient solution to complex continuous control tasks. However, these algorithms rely on a number of design tricks and many hyperparameters, making their applications to new domains difficult and computationally expensive. This paper creates an evolutionary approach that automatically tunes these design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm. Our design is sample efficient and provides practical advantages over baseline approaches, including improved exploration, generalization over multiple control frequencies, and a robust ensemble of high-performance policies. Empirically, we show that our agent outperforms well-tuned hyperparameter settings in popular benchmarks from the DeepMind Control Suite. We then apply it to new control tasks to find high-performance solutions with minimal compute and research effort.

【5】 Offline RL Without Off-Policy Evaluation 标题:无需离策略评估的离线RL

作者:David Brandfonbrener,William F. Whitney,Rajesh Ranganath,Joan Bruna 链接:https://arxiv.org/abs/2106.08909 摘要:以往大多数离线强化学习(RL)方法都采用涉及离策略评估的迭代式演员-评论家方法。在本文中,我们证明,仅用行为策略的on-policy Q估计做一步约束/正则化的策略改进,其性能出奇地好。这种一步算法在D4RL基准的很大一部分任务上超过了此前报道的迭代算法的结果。这个简单的一步基线无需以往迭代算法所用的许多技巧即可取得如此强的性能,并且对超参数更加鲁棒。我们认为,迭代方法相对较差的性能源于离策略评估固有的高方差,并因针对这些高方差估计反复优化策略而被放大。此外,我们推测一步算法的强劲表现得益于环境和行为策略中有利结构的结合。 摘要:Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-critic approach involving off-policy evaluation. In this paper we show that simply doing one step of constrained/regularized policy improvement using an on-policy Q estimate of the behavior policy performs surprisingly well. This one-step algorithm beats the previously reported results of iterative algorithms on a large portion of the D4RL benchmark. The simple one-step baseline achieves this strong performance without many of the tricks used by previously proposed iterative algorithms and is more robust to hyperparameters. We argue that the relatively poor performance of iterative approaches is a result of the high variance inherent in doing off-policy evaluation and magnified by the repeated optimization of policies against those high-variance estimates. In addition, we hypothesize that the strong performance of the one-step algorithm is due to a combination of favorable structure in the environment and behavior policy.
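"一步式"流程可粗略示意如下(离散动作的简化草图,仅表达"SARSA式拟合 + 一步改进"的思想;论文面向连续控制,且此处用均匀行为先验代替了对行为策略的估计,真实实现通常需先做行为克隆):

```python
import torch
import torch.nn as nn

def one_step_offline_rl(dataset, n_actions, obs_dim, gamma=0.99, beta=1.0):
    """dataset = (s, a, r, s2, a2),其中 a、a2 为 (N, 1) 的 int64 张量,
    r 为 (N, 1)。先用 SARSA 式回归拟合行为策略的 on-policy Q 估计
    (无需离策略评估),再对该 Q 做一步经正则化的策略改进。"""
    s, a, r, s2, a2 = dataset
    q = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))
    opt = torch.optim.Adam(q.parameters(), lr=3e-4)
    for _ in range(1000):
        with torch.no_grad():
            target = r + gamma * q(s2).gather(1, a2)   # SARSA 目标:用数据中的 a'
        loss = nn.functional.mse_loss(q(s).gather(1, a), target)
        opt.zero_grad(); loss.backward(); opt.step()

    # 一步策略改进:pi(a|s) 正比于 prior(a|s) * exp(Q(s, a) / beta)
    with torch.no_grad():
        pi = torch.softmax(q(s) / beta, dim=1)
    return q, pi
```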

【6】 Robust Training in High Dimensions via Block Coordinate Geometric Median Descent 标题:基于块坐标几何中值下降的高维稳健训练

作者:Anish Acharya,Abolfazl Hashemi,Prateek Jain,Sujay Sanghavi,Inderjit S. Dhillon,Ufuk Topcu 链接:https://arxiv.org/abs/2106.08882 摘要:几何中值(Geometric median,GM)是统计学中的经典方法,用于对未受污染的数据实现稳健估计;在严重污染下,它达到0.5的最优崩溃点。然而,其计算复杂度使得用它来增强随机梯度下降(SGD)以求解高维优化问题并不可行。在本文中,我们证明,每次仅对一个明智选择的坐标块应用GM并配合记忆机制,就可以在光滑非凸问题上保留0.5的崩溃点,且非渐近收敛速度与使用GM的SGD相当。 摘要:Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to the SGD with GM.
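块坐标GM聚合的骨架如下(几何中值用经典的 Weiszfeld 迭代求解;坐标块的选取策略与记忆机制在此被简化,维度等均为示例):

```python
import numpy as np

def weiszfeld(points, iters=50, eps=1e-8):
    """Weiszfeld 迭代求几何中值。points: (workers, dim)。"""
    z = points.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.linalg.norm(points - z, axis=1), eps)
        z = (points * w[:, None]).sum(axis=0) / w.sum()
    return z

def block_gm_aggregate(grads, block, state):
    """本轮只对选中的坐标块 block 计算各 worker 梯度的几何中值,
    其余坐标沿用上一轮的聚合结果(state 充当记忆机制)。"""
    agg = state.copy()
    agg[block] = weiszfeld(grads[:, block])
    return agg

grads = np.random.randn(10, 100)          # 10 个 worker 的 100 维梯度(示例数据)
state = grads.mean(axis=0)
block = np.arange(0, 20)                  # 本轮选择的坐标块(此处简单取前 20 维)
print(block_gm_aggregate(grads, block, state)[:5])
```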

【7】 Multilinear Dirichlet Processes 标题:多线性Dirichlet过程

作者:Xiaoli Li 链接:https://arxiv.org/abs/2106.08852 摘要:依赖Dirichlet过程(DDP)已被广泛用于对来自彼此相关的测度集合上的分布的数据进行建模。另一方面,近年来机器学习和数据挖掘领域有越来越多的研究致力于处理涉及两个或多个因素交互作用的数据。然而,很少有研究者利用DDP技术来处理多因素调制带来的数据异质性。在本文中,我们提出了一种新技术:多线性Dirichlet过程(MLDP),通过将DP与最先进的因子分析技术"多线性因子分析器(MLFA)"相结合来构建DDP。我们在面向不同应用的真实世界数据集上评估了MLDP,并取得了最先进的性能。 摘要:Dependent Dirichlet processes (DDP) have been widely applied to model data from distributions over collections of measures which are correlated in some way. On the other hand, in recent years, increasing research efforts in machine learning and data mining have been dedicated to dealing with data involving interactions from two or more factors. However, few researchers have addressed the heterogeneous relationship in data brought by modulation of multiple factors using techniques of DDP. In this paper, we propose a novel technique, MultiLinear Dirichlet Processes (MLDP), to constructing DDPs by combining DP with a state-of-the-art factor analysis technique, multilinear factor analyzers (MLFA). We have evaluated MLDP on real-world data sets for different applications and have achieved state-of-the-art performance.

【8】 Solving Continuous Control with Episodic Memory 标题:用情景记忆解决连续控制问题

作者:Igor Kuznetsov,Andrey Filchenkov 备注:To appear in the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021) 链接:https://arxiv.org/abs/2106.08832 摘要:情景记忆使强化学习算法能够记住并利用过去有前景的经验来提高智能体性能。以往关于记忆机制的研究表明,在离散动作问题中使用基于情景的数据结构有利于样本效率。而将情景记忆应用于具有大动作空间的连续控制并非易事。我们的研究旨在回答这一问题:情景记忆能否用于提高智能体在连续控制中的性能?我们提出的算法通过修改评论家的目标,将情景记忆与演员-评论家架构相结合,并通过引入基于情景的回放缓冲区优先级进一步提升性能。我们在OpenAI Gym环境中评估了该算法,结果显示,与最先进的无模型离策略算法相比,它具有更高的样本效率。 摘要:Episodic memory lets reinforcement learning algorithms remember and exploit promising experience from the past to improve agent performance. Previous works on memory mechanisms show benefits of using episodic-based data structures for discrete action problems in terms of sample-efficiency. The application of episodic memory for continuous control with a large action space is not trivial. Our study aims to answer the question: can episodic memory be used to improve agent's performance in continuous control? Our proposed algorithm combines episodic memory with Actor-Critic architecture by modifying critic's objective. We further improve performance by introducing episodic-based replay buffer prioritization. We evaluate our algorithm on OpenAI gym domains and show greater sample-efficiency compared with the state-of-the-art model-free off-policy algorithms.

【9】 Costs and Benefits of Wasserstein Fair Regression 标题:瓦瑟斯坦公平回归的成本与收益

作者:Han Zhao 链接:https://arxiv.org/abs/2106.08812 摘要:在现实世界中,机器学习工具在高风险领域的应用通常被规定为公平的,即预测的目标应该满足一些关于受保护属性的等价性的定量概念。然而,公平性和准确性与实值目标之间的确切权衡尚不清楚。在本文中,我们通过提供一个关于任何公平回归器误差的下界来描述回归设置中统计奇偶性和准确性之间的内在权衡。我们的下界是尖锐的,算法无关,并允许一个简单的解释:当目标的时刻不同的群体,任何公平的算法必须作出一个大的错误,至少一个群体。我们进一步扩展了这个结果,给出了任意(近似)公平算法联合误差的一个下界,利用Wasserstein距离来度量近似的质量。另一方面,我们建立了个体公平性、准确性平价和瓦塞尔斯坦距离之间的第一个联系,证明了如果一个回归变量是个体公平的,它也近似地验证了准确性平价,其中差距由两组之间的瓦塞尔斯坦距离给出。受理论结果的启发,我们从表征学习的角度提出了一种公平回归的实用算法,并在真实数据集上进行了实验验证。 摘要:Real-world applications of machine learning tools in high-stakes domains are often regulated to be fair, in the sense that the predicted target should satisfy some quantitative notion of parity with respect to a protected attribute. However, the exact tradeoff between fairness and accuracy with a real-valued target is not clear. In this paper, we characterize the inherent tradeoff between statistical parity and accuracy in the regression setting by providing a lower bound on the error of any fair regressor. Our lower bound is sharp, algorithm-independent, and admits a simple interpretation: when the moments of the target differ between groups, any fair algorithm has to make a large error on at least one of the groups. We further extend this result to give a lower bound on the joint error of any (approximately) fair algorithm, using the Wasserstein distance to measure the quality of the approximation. On the upside, we establish the first connection between individual fairness, accuracy parity, and the Wasserstein distance by showing that if a regressor is individually fair, it also approximately verifies the accuracy parity, where the gap is given by the Wasserstein distance between the two groups. Inspired by our theoretical results, we develop a practical algorithm for fair regression through the lens of representation learning, and conduct experiments on a real-world dataset to corroborate our findings.
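文中用于刻画组间差距与近似公平质量的一维 Wasserstein 距离可以直接用 scipy 计算(分组数据为合成示例,用不同均值与方差的正态样本示意"组间矩不同"):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
y_group_a = rng.normal(loc=0.0, scale=1.0, size=5000)   # 群体 A 的实值目标
y_group_b = rng.normal(loc=1.5, scale=2.0, size=5000)   # 群体 B 的实值目标

# 组间一维 Wasserstein 距离:按文中的下界,组间矩差越大,
# 任何公平回归器至少在一个群体上的误差就越大
print(wasserstein_distance(y_group_a, y_group_b))
```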

【10】 Automating Augmentation Through Random Unidimensional Search 标题:通过随机一维搜索实现自动增强

作者:Xiaomeng Dong,Michael Potter,Gaurav Kumar,Yun-Chan Tsai,V. Ratna Saripalli 链接:https://arxiv.org/abs/2106.08756 摘要:在深度学习研究者中,一个公开的秘密是:能否在训练中找到正确的数据增强策略,往往决定了结果是最先进水平还是平庸排名。为此,社区已做出许多努力,试图自动化为任意任务寻找完美增强流程的过程。不幸的是,即便是最新的前沿方法也带来巨大的计算开销,需要多达100次完整的模型训练才能确定一个理想配置。我们展示了如何仅用7次就取得更好的性能:使用随机一维增强(Random Unidimensional Augmentation)。源代码位于https://github.com/fastestimator/RUA 摘要:It is no secret amongst deep learning researchers that finding the right data augmentation strategy during training can mean the difference between a state-of-the-art result and a run-of-the-mill ranking. To that end, the community has seen many efforts to automate the process of finding the perfect augmentation procedure for any task at hand. Unfortunately, even recent cutting-edge methods bring massive computational overhead, requiring as many as 100 full model trainings to settle on an ideal configuration. We show how to achieve even better performance in just 7: with Random Unidimensional Augmentation. Source code is available at https://github.com/fastestimator/RUA

【11】 Probabilistic DAG Search 标题:概率DAG搜索

作者:Julia Grosse,Cheng Zhang,Philipp Hennig 备注:10 pages, 8 figures, to be published at the Conference on Uncertainty in Artificial Intelligence (UAI) 2021 链接:https://arxiv.org/abs/2106.08717 摘要:令人兴奋的当代机器学习问题最近在树搜索的经典形式主义中得到了表述——最著名的是围棋游戏。有趣的是,这些顺序决策问题背后的状态空间通常具有比树更一般的潜在结构。在这项工作中,我们开发了一个概率框架来利用搜索空间的潜在结构,从而在整个搜索树上共享信息。该方法是基于对问题的探索部分的联合高斯模型中的近似推理和对未探索部分的抽象的结合,这使得复杂度的降低成为可能。在Tic-Tac-Toe和一个特征选择应用中,我们经验地发现我们的算法与现有的非概率方案相比是有利的。 摘要:Exciting contemporary machine learning problems have recently been phrased in the classic formalism of tree search -- most famously, the game of Go. Interestingly, the state-space underlying these sequential decision-making problems often posses a more general latent structure than can be captured by a tree. In this work, we develop a probabilistic framework to exploit a search space's latent structure and thereby share information across the search tree. The method is based on a combination of approximate inference in jointly Gaussian models for the explored part of the problem, and an abstraction for the unexplored part that imposes a reduction of complexity ad hoc. We empirically find our algorithm to compare favorably to existing non-probabilistic alternatives in Tic-Tac-Toe and a feature selection application.

【12】 Mobile Augmented Reality: User Interfaces, Frameworks, and Intelligence 标题:移动增强现实:用户界面、框架和智能

作者:Jacky Cao,Kit-Yung Lam,Lik-Hang Lee,Xiaoli Liu,Pan Hui,Xiang Su 备注:This work is currently under review in an international journal 链接:https://arxiv.org/abs/2106.08710 摘要:移动增强现实(MAR)将计算机生成的虚拟对象与移动设备的物理环境相融合。MAR系统使用户能够与智能手机、头戴式可穿戴设备等MAR设备交互,实现从物理世界到混合了数字实体的世界的无缝过渡。这些MAR系统通过使用MAR设备提供对数字内容的普适访问来支撑用户体验。在过去20年中,已经开发了许多MAR系统,然而,MAR框架的研究和设计还没有从以用户为中心的设计角度得到系统的综述。本文首次对现有MAR框架(共37个)进行了调研,并进一步以自顶向下的方式讨论MAR的最新研究:1)MAR应用;2)适应用户移动性和情境的MAR可视化技术;3)对MAR框架的系统评估,包括支持的平台及相应的功能,如跟踪、特征提取与感知能力;以及4)支撑MAR系统内智能操作的底层机器学习方法。最后,我们总结了新兴研究领域的发展和当前的最新水平,并讨论了重要的开放挑战和可能的理论与技术方向。本调研旨在使研究人员和MAR系统开发人员都能受益。 摘要:Mobile Augmented Reality (MAR) integrates computer-generated virtual objects with physical environments for mobile devices. MAR systems enable users to interact with MAR devices, such as smartphones and head-worn wearables, and performs seamless transitions from the physical world to a mixed world with digital entities. These MAR systems support user experiences by using MAR devices to provide universal accessibility to digital contents. Over the past 20 years, a number of MAR systems have been developed, however, the studies and design of MAR frameworks have not yet been systematically reviewed from the perspective of user-centric design. This article presents the first effort of surveying existing MAR frameworks (count: 37) and further discusses the latest studies on MAR through a top-down approach: 1) MAR applications; 2) MAR visualisation techniques adaptive to user mobility and contexts; 3) systematic evaluation of MAR frameworks including supported platforms and corresponding features such as tracking, feature extraction plus sensing capabilities; and 4) underlying machine learning approaches supporting intelligent operations within MAR systems. Finally, we summarise the development of emerging research fields, current state-of-the-art, and discuss the important open challenges and possible theoretical and technical directions. This survey aims to benefit both researchers and MAR system developers alike.

【13】 Source Separation-based Data Augmentation for Improved Joint Beat and Downbeat Tracking 标题:基于源分离的数据增强用于改进节拍与强拍联合跟踪

作者:Ching-Yu Chiu,Joann Ching,Wen-Yi Hsiao,Yu-Hua Chen,Alvin Wen-Yu Su,Yi-Hsuan Yang 备注:Accepted to European Signal Processing Conference (EUSIPCO 2021) 链接:https://arxiv.org/abs/2106.08703 摘要:近年来,得益于深度学习技术的发展,音乐音频信号中自动节拍与强拍跟踪的性能有了很大提高。在训练此类基于深度学习的模型时,数据增强被认为是一项重要技术。然而,现有的数据增强方法主要着眼于平衡训练数据在速度(tempo)上的分布。在本文中,我们研究另一种数据增强思路,以考虑训练数据在打击乐与非打击乐声源方面的构成。具体来说,我们提出采用一个盲鼓分离模型,从每个训练音频信号中分离出鼓声与非鼓声,滤除不含鼓的训练信号,然后利用得到的鼓声与非鼓声分轨来扩充训练数据。我们在四个完全未见过的测试集上进行了实验,验证了所提方法的有效性,并相应地验证了训练数据中鼓声构成对节拍与强拍跟踪的重要性。 摘要:Due to advances in deep learning, the performance of automatic beat and downbeat tracking in musical audio signals has seen great improvement in recent years. In training such deep learning based models, data augmentation has been found an important technique. However, existing data augmentation methods for this task mainly target at balancing the distribution of the training data with respect to their tempo. In this paper, we investigate another approach for data augmentation, to account for the composition of the training data in terms of the percussive and non-percussive sound sources. Specifically, we propose to employ a blind drum separation model to segregate the drum and non-drum sounds from each training audio signal, filtering out training signals that are drumless, and then use the obtained drum and non-drum stems to augment the training data. We report experiments on four completely unseen test sets, validating the effectiveness of the proposed method, and accordingly the importance of drum sound composition in the training data for beat and downbeat tracking.
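若手头没有论文中的盲鼓分离模型,可以先用 librosa 自带的谐波-打击乐分离(HPSS)来体会这一增强流程(能量阈值与示例音频均为假设;首次运行需联网下载示例音频):

```python
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))          # librosa 自带示例音频
y_harmonic, y_percussive = librosa.effects.hpss(y)   # 谐波 / 打击乐分量

# 简单的"无鼓"过滤:打击乐分量能量占比过低的训练信号可被滤除
perc_ratio = (y_percussive ** 2).sum() / (y ** 2).sum()
if perc_ratio > 0.05:                                # 阈值为假设值
    # 用分离出的分量(类比鼓/非鼓分轨)扩充训练数据
    augmented_examples = [y, y_harmonic, y_percussive]
```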

【14】 ParticleAugment: Sampling-Based Data Augmentation 标题:ParticleAugment:基于采样的数据增强

作者:Alexander Tsaregorodtsev,Vasileios Belagiannis 备注:11 pages. Submitted to NeurIPS 2021 链接:https://arxiv.org/abs/2106.08693 摘要:我们提出了一种用于图像分类的自动数据增强方法。我们将问题表述为蒙特卡罗采样,目标是逼近最优的增强策略。我们提出一种粒子滤波方法,用于在模型训练过程中寻找最优增强策略及其调度。我们的性能度量过程依赖于训练集的一个验证子集,而策略转移模型依赖于高斯先验和一个可选的增强速度参数。实验表明,使用针对该问题的标准网络架构,我们的自动增强方法在CIFAR-10、CIFAR-100和ImageNet数据集上取得了可喜的结果。通过与相关工作的比较,我们还表明该方法在策略搜索的计算代价与模型性能之间取得了平衡。 摘要:We present an automated data augmentation approach for image classification. We formulate the problem as Monte Carlo sampling where our goal is to approximate the optimal augmentation policies. We propose a particle filtering formulation to find optimal augmentation policies and their schedules during model training. Our performance measurement procedure relies on a validation subset of our training set, while the policy transition model depends on a Gaussian prior and an optional augmentation velocity parameter. In our experiments, we show that our formulation for automated augmentation reaches promising results on CIFAR-10, CIFAR-100, and ImageNet datasets using the standard network architectures for this problem. By comparing with the related work, we also show that our method reaches a balance between the computational cost of policy search and the model performance.

【15】 Leveraging Probabilistic Circuits for Nonparametric Multi-Output Regression 标题:利用概率电路实现非参数多输出回归

作者:Zhongjie Yu,Mingye Zhu,Martin Trapp,Arseny Skryagin,Kristian Kersting 备注:Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021) 链接:https://arxiv.org/abs/2106.08687 摘要:受基于专家的高斯过程近似(GPs)领域的最新进展的启发,我们提出了一种基于专家的方法来使用单输出GP专家进行大规模多输出回归。通过概率电路编码的单输出GPs的深层结构混合,我们可以准确地捕捉多个输出维度之间的相关性。通过对协变量空间和输出空间的递归划分,我们的模型中的后验推理简化为对单个输出GP专家的推理,只需要对观测值的一小部分进行条件化。我们证明,在我们的模型中,推理可以准确而有效地进行,它可以捕获输出维度之间的相关性,因此,通常优于不包含输出间相关性的方法,如在负对数预测密度方面的几个数据集上所示。 摘要:Inspired by recent advances in the field of expert-based approximations of Gaussian processes (GPs), we present an expert-based approach to large-scale multi-output regression using single-output GP experts. Employing a deeply structured mixture of single-output GPs encoded via a probabilistic circuit allows us to capture correlations between multiple output dimensions accurately. By recursively partitioning the covariate space and the output space, posterior inference in our model reduces to inference on single-output GP experts, which only need to be conditioned on a small subset of the observations. We show that inference can be performed exactly and efficiently in our model, that it can capture correlations between output dimensions and, hence, often outperforms approaches that do not incorporate inter-output correlations, as demonstrated on several data sets in terms of the negative log predictive density.

【16】 Drum-Aware Ensemble Architecture for Improved Joint Musical Beat and Downbeat Tracking 标题:用于改进音乐节拍与强拍联合跟踪的鼓感知集成架构

作者:Ching-Yu Chiu,Alvin Wen-Yu Su,Yi-Hsuan Yang 备注:Accepted to IEEE Signal Processing Letters (May 2021) 链接:https://arxiv.org/abs/2106.08685 摘要:本文提出了一种新的系统架构,将盲源分离与音乐音频信号的节拍与强拍联合跟踪相结合。源分离模块将输入信号中的打击乐分量与非打击乐分量分开,分别在这两个分量上进行节拍与强拍跟踪,然后通过可学习的融合机制聚合结果。这样,系统可以自适应地决定输入信号的跟踪结果在多大程度上依赖其打击乐或非打击乐分量。在四个鼓声占比各不相同的测试集上的评估表明,新架构始终优于广泛采用的、不使用源分离的基线架构。 摘要:This paper presents a novel system architecture that integrates blind source separation with joint beat and downbeat tracking in musical audio signals. The source separation module segregates the percussive and non-percussive components of the input signal, over which beat and downbeat tracking are performed separately and then the results are aggregated with a learnable fusion mechanism. This way, the system can adaptively determine how much the tracking result for an input signal should depend on the input's percussive or non-percussive components. Evaluation on four testing sets that feature different levels of presence of drum sounds shows that the new architecture consistently outperforms the widely-adopted baseline architecture that does not employ source separation.

【17】 Evaluating Gender Bias in Hindi-English Machine Translation 标题:评价印英机器翻译中的性别偏见

作者:Gauri Gupta,Krithika Ramesh,Sanjay Singh 链接:https://arxiv.org/abs/2106.08680 摘要:随着语言模型在现实世界中的部署日益广泛,解决其输出的公平性问题变得至关重要。这些语言模型的词嵌入表示往往隐含着不应有的关联,在模型中形成社会偏见。印地语等性别化语言的特性给偏见的量化和消减带来了额外的问题,因为句子中单词的形式会随主语的性别而变化。此外,针对印度语言的偏见测量与去偏系统方面的工作还很稀少。在我们的工作中,我们尝试评估并量化一个印地语-英语机器翻译系统中的性别偏见。我们基于印地语的语法特点实现了现有TGBI度量的一个修改版本。我们还针对预训练词嵌入以及我们的机器翻译模型学到的词嵌入,比较和对比了多种度量下的偏见测量结果。 摘要:With language models being deployed increasingly in the real world, it is essential to address the issue of the fairness of their outputs. The word embedding representations of these language models often implicitly draw unwanted associations that form a social bias within the model. The nature of gendered languages like Hindi, poses an additional problem to the quantification and mitigation of bias, owing to the change in the form of the words in the sentence, based on the gender of the subject. Additionally, there is sparse work done in the realm of measuring and debiasing systems for Indic languages. In our work, we attempt to evaluate and quantify the gender bias within a Hindi-English machine translation system. We implement a modified version of the existing TGBI metric based on the grammatical considerations for Hindi. We also compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.

【18】 Maxmin-Fair Ranking: Individual Fairness under Group-Fairness Constraints

Authors: David Garcia-Soriano, Francesco Bonchi Note: In proceedings of KDD 2021 Link: https://arxiv.org/abs/2106.08652 Abstract: We study a novel problem of fairness in ranking aimed at minimizing the amount of individual unfairness introduced when enforcing group-fairness constraints. Our proposal is rooted in distributional maxmin fairness theory, which uses randomization to maximize the expected satisfaction of the worst-off individuals. We devise an exact polynomial-time algorithm to find maxmin-fair distributions for general search problems (including, but not limited to, ranking), and show that our algorithm can produce rankings which, while satisfying the given group-fairness constraints, ensure that the maximum possible value is brought to individuals.
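A toy instance shows what "randomization for the worst-off" means: mix two group-fair rankings A and B with a probability $p$ chosen to maximize the minimum expected utility across individuals. The utilities below are invented for illustration; the paper's exact polynomial-time algorithm handles far more general search problems.

```python
# Maxmin-fair mixing of two rankings: choose p so the worst-off
# individual's expected utility is as large as possible.
import numpy as np

# u[i][r] = utility individual i receives under ranking r (r in {A, B}).
u = np.array([[1.0, 0.0],   # individual 0 prefers A
              [0.0, 1.0],   # individual 1 prefers B
              [0.6, 0.4]])  # individual 2 slightly prefers A

best_p, best_val = 0.0, -np.inf
for p in np.linspace(0, 1, 1001):
    expected = p * u[:, 0] + (1 - p) * u[:, 1]
    worst = expected.min()
    if worst > best_val:
        best_p, best_val = p, worst

print(best_p, best_val)  # p = 0.5: every individual expects at least 0.5
```

Note that any deterministic choice of A or B would leave one individual with utility 0; randomization is what lifts the worst-off guarantee to 0.5.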

【19】 Mining Interpretable Spatio-temporal Logic Properties for Spatially Distributed Systems

Authors: Sara Mohammadinejad, Jyotirmy V. Deshmukh, Laura Nenzi Link: https://arxiv.org/abs/2106.08548 Abstract: The Internet of Things, complex sensor networks, and multi-agent cyber-physical systems are all examples of spatially distributed systems that continuously evolve in time. Such systems generate huge amounts of spatio-temporal data, and system designers are often interested in analyzing and discovering structure within the data. There has been considerable interest in learning causal and logical properties of temporal data using logics such as Signal Temporal Logic (STL); however, there is limited work on discovering such relations in spatio-temporal data. We propose the first set of algorithms for unsupervised learning on spatio-temporal data. Our method performs automatic feature extraction by projecting the data onto the parameter space of a parametric spatio-temporal reach and escape logic (PSTREL). We propose an agglomerative hierarchical clustering technique that guarantees that each cluster satisfies a distinct STREL formula. We show that our method generates STREL formulas of bounded description complexity, using a novel decision-tree approach that generalizes previous unsupervised learning techniques for Signal Temporal Logic. We demonstrate the effectiveness of our approach on case studies from diverse domains such as urban transportation, epidemiology, green infrastructure, and air quality monitoring.
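The clustering step can be sketched compactly: each trajectory is projected to a point in a low-dimensional formula-parameter space, and those points are clustered hierarchically. The two-dimensional "parameter" data below is synthetic, and the paper's extra guarantee (each cluster satisfying a distinct STREL formula) is omitted:

```python
# Agglomerative clustering in a (synthetic) formula-parameter space.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Stand-in for "tightest (reach radius, time bound) making the template
# formula true" per spatio-temporal trajectory.
params = np.vstack([rng.normal([1, 2], 0.1, (50, 2)),
                    rng.normal([4, 1], 0.1, (50, 2))])

labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(params)
print(np.bincount(labels))  # two well-separated groups of 50 trajectories
```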

【20】 Multi-Resolution Continuous Normalizing Flows

Authors: Vikram Voleti, Chris Finlay, Adam Oberman, Christopher Pal Note: 9 pages, 5 figures, 3 tables, 17 equations Link: https://arxiv.org/abs/2106.08462 Abstract: Recent work has shown that Neural Ordinary Differential Equations (ODEs) can serve as generative models of images from the perspective of Continuous Normalizing Flows (CNFs). Such models offer exact likelihood calculation and invertible generation/density estimation. In this work we introduce a Multi-Resolution variant of such models (MRCNF) by characterizing the conditional distribution over the additional information required to generate a fine image that is consistent with the coarse image. We introduce a transformation between resolutions that leaves the log-likelihood unchanged. We show that this approach yields comparable likelihood values for various image datasets, with improved performance at higher resolutions and fewer parameters, using only one GPU.
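Schematically (our notation, not the paper's), the likelihood-preserving decomposition rests on the standard change of variables: an invertible map splits an image $x$ into a coarse image $\bar{x}$ and residual detail $z$, so the exact log-likelihood factorizes across resolutions:

```latex
% Standard change of variables behind a multi-resolution factorization
% (schematic; the paper's transformation is constructed so the Jacobian
% term does not alter the total log-likelihood).
\log p(x) \;=\; \log p(\bar{x}) \;+\; \log p(z \mid \bar{x})
          \;+\; \log \left| \det \frac{\partial (\bar{x}, z)}{\partial x} \right|
```

Applying this recursively gives one CNF per resolution, each conditioned on the next-coarser image, which is consistent with the coarse-to-fine generation the abstract describes.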

【21】 On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

Authors: Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel Link: https://arxiv.org/abs/2106.08414 Abstract: Reinforcement learning is a framework for interactive decision-making in which incentives are sequentially revealed across time without a system dynamics model. Because it scales to continuous spaces, we focus on policy search, where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space the non-convexity poses a pathological challenge, as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by heavier-tailed distributions with tail-index parameter $\alpha$, which increases the likelihood of jumping in state space. Doing so invalidates the smoothness conditions on the score function common to PG analyses. Thus, we establish how the convergence rate to stationarity depends on the policy's tail index $\alpha$, a Hölder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced here for the first time. Further, we characterize the dependence of the set of local maxima on the tail index through an exit- and transition-time analysis of a suitably defined Markov chain, identifying that policies associated with Lévy processes of heavier tail converge to wider peaks. This phenomenon yields improved stability to perturbations in supervised learning, and we corroborate that it also manifests in improved performance of policy search, especially when myopic and farsighted incentives are misaligned.
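To see what a heavier-tailed parameterization looks like in practice, the sketch below swaps the usual Gaussian action distribution for a Student-t, whose degrees of freedom act as a tail index: smaller values mean heavier tails and larger jumps. This illustrates the general idea, not the paper's specific family or analysis.

```python
# Heavy-tailed vs. Gaussian action sampling; score-function (REINFORCE)
# gradients flow through log_prob to the policy parameter `mean`.
import torch
from torch.distributions import StudentT, Normal

mean, scale = torch.zeros(1, requires_grad=True), torch.ones(1)

def sample_action(heavy_tailed: bool, df: float = 1.5):
    dist = StudentT(df, loc=mean, scale=scale) if heavy_tailed else Normal(mean, scale)
    a = dist.sample()
    return a, dist.log_prob(a)  # grad of log_prob w.r.t. `mean` drives PG updates

torch.manual_seed(0)
heavy = torch.stack([sample_action(True)[0] for _ in range(5000)])
light = torch.stack([sample_action(False)[0] for _ in range(5000)])
print(heavy.abs().max(), light.abs().max())  # heavy tail -> far larger excursions
```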

【22】 On the Objective Evaluation of Post Hoc Explainers

Authors: Zachariah Carmichael, Walter J. Scheirer Note: 14 pages, 4 figures. Under review Link: https://arxiv.org/abs/2106.08376 Abstract: Many applications of data-driven models demand transparency of decisions, especially in health care, criminal justice, and other high-stakes environments. Modern trends in machine learning research have led to algorithms that are increasingly intricate, to the degree that they are considered black boxes. In an effort to reduce the opacity of decisions, methods have been proposed to construe the inner workings of such models in a human-comprehensible manner. These post hoc techniques are described as universal explainers, capable of faithfully augmenting decisions with algorithmic insight. Unfortunately, there is little agreement about what constitutes a "good" explanation. Moreover, current methods of explanation evaluation rely on either subjective or proxy means. In this work, we propose a framework for evaluating post hoc explainers against ground truth that is directly derived from the additive structure of a model. We demonstrate the efficacy of the framework in understanding explainers by evaluating popular explainers on thousands of synthetic and several real-world tasks. The framework reveals that explanations may be accurate yet misattribute the importance of individual features.
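The ground-truth construction is easy to emulate: build a model that is additive by definition, so every feature's exact contribution at every point is known, then score an explainer by its agreement with those contributions. The functions, the centring convention, and `my_explainer` below are hypothetical choices for illustration:

```python
# Evaluate an explainer against exact attributions of an additive model.
import numpy as np

f = [lambda x: 2.0 * x, lambda x: x ** 2, lambda x: 0.0 * x]  # known effects

def model(X):
    return sum(fi(X[:, i]) for i, fi in enumerate(f))

def ground_truth_attributions(X):
    # Per-feature contribution, centred so attributions sum to f(x) - E[f].
    return np.stack([fi(X[:, i]) - fi(X[:, i]).mean() for i, fi in enumerate(f)], axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
truth = ground_truth_attributions(X)

def score(explainer):
    est = explainer(model, X)  # expected: (n, d) attribution matrix
    # Mean cosine similarity per sample between estimated and true attributions.
    num = (est * truth).sum(1)
    den = np.linalg.norm(est, axis=1) * np.linalg.norm(truth, axis=1) + 1e-12
    return (num / den).mean()

# e.g. score(my_explainer) -> 1.0 would mean perfectly faithful attributions
```

Note how the third (inactive) feature makes misattribution detectable: an explainer that assigns it weight loses cosine similarity even if its overall ranking looks plausible.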

【23】 KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support

Authors: Pierre Glaser, Michael Arbel, Arthur Gretton Link: https://arxiv.org/abs/2106.08929 Abstract: We study the gradient flow for a relaxed approximation to the Kullback-Leibler (KL) divergence between a moving source and a fixed target distribution. This approximation, termed the KALE (KL approximate lower-bound estimator), solves a regularized version of the Fenchel dual problem defining the KL over a restricted class of functions. When a Reproducing Kernel Hilbert Space (RKHS) is used to define the function class, we show that the KALE continuously interpolates between the KL and the Maximum Mean Discrepancy (MMD). Like the MMD and other Integral Probability Metrics, the KALE remains well defined for mutually singular distributions. Nonetheless, compared with the MMD, the KALE inherits from the limiting KL a greater sensitivity to mismatch in the support of the distributions. These two properties make the KALE gradient flow particularly well suited when the target distribution is supported on a low-dimensional manifold. Under an assumption of sufficient smoothness of the trajectories, we show the global convergence of the KALE flow. We propose a particle implementation of the flow, given initial samples from the source and the target distribution, which we use to empirically confirm the KALE's properties.
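For orientation, here is one schematic way to write the object being studied (our notation; constants and scaling may differ from the paper). The KL admits the Fenchel-dual representation $\mathrm{KL}(P\|Q) = \sup_f \mathbb{E}_P[f] - \mathbb{E}_Q[e^f - 1]$, and KALE restricts $f$ to an RKHS via a norm penalty:

```latex
% Schematic KALE objective: the Fenchel dual of the KL, regularized by
% an RKHS norm so the supremum ranges over a restricted function class.
\mathrm{KALE}(P \,\|\, Q)
  \;=\; \sup_{f \in \mathcal{H}}\;
        \mathbb{E}_P[f] \;-\; \mathbb{E}_Q\!\left[e^{f} - 1\right]
        \;-\; \frac{\lambda}{2}\,\|f\|_{\mathcal{H}}^{2}
```

As $\lambda \to 0$ the penalty vanishes and the objective approaches the KL, while a strong penalty pushes it towards MMD-like behaviour, matching the interpolation described in the abstract.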

【24】 Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Authors: Alessandro Favero, Francesco Cagnetta, Matthieu Wyart Note: 27 pages, 3 figures Link: https://arxiv.org/abs/2106.08619 Abstract: Convolutional neural networks perform a local and translationally invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using 'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning-curve exponent $\beta$ (which relates the test error $\epsilon_t \sim P^{-\beta}$ to the size $P$ of the training set), whereas translational invariance is not. In particular, if the filter size of the teacher $t$ is smaller than that of the student $s$, then $\beta$ is a function of $s$ only and does not depend on the input dimension. We confirm our predictions on $\beta$ empirically. Theoretically, in some cases (including when teacher and student are equal) this prediction can be shown to be an upper bound on performance. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to learning-curve exponents similar to those we obtain in the ridgeless case.
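Since the whole analysis revolves around the exponent $\beta$ in $\epsilon_t \sim P^{-\beta}$, it is worth spelling out how such an exponent is measured empirically: fit (training-set size, test error) pairs by least squares in log-log space, where the power law becomes a line. A minimal sketch with synthetic data:

```python
# Estimate the learning-curve exponent beta from (P, test error) pairs.
import numpy as np

def fit_beta(P, eps):
    slope, _ = np.polyfit(np.log(P), np.log(eps), 1)
    return -slope  # eps ~ P^(-beta)  =>  log-log slope = -beta

P = np.array([1e2, 1e3, 1e4, 1e5])
eps = 0.5 * P ** -0.33   # synthetic learning curve with beta = 0.33
print(fit_beta(P, eps))  # ~0.33
```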

【25】 Global Rhythm Style Transfer Without Text Transcriptions

Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson Link: https://arxiv.org/abs/2106.08519 Abstract: Prosody plays an important role in characterizing the style of a speaker or an emotion, yet most non-parallel voice- or emotion-style transfer algorithms do not convert any prosody information. The two major components of prosody are pitch and rhythm. Disentangling the prosody information, particularly the rhythm component, from speech is challenging because it involves breaking the synchrony between the input speech and the disentangled speech representation. As a result, most existing prosody style transfer algorithms need to rely on some form of text transcription to identify the content information, which confines their application to high-resource languages only. Recently, SpeechSplit has made sizeable progress towards unsupervised prosody style transfer, but it is unable to extract high-level global prosody style in an unsupervised manner. In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions. AutoPST is an Autoencoder-based Prosody Style Transfer framework with a thorough rhythm removal module guided by self-expressive representation learning. Experiments on different style transfer tasks show that AutoPST can effectively convert prosody in a way that correctly reflects the styles of the target domains.

【26】 Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Authors: Devang S Ram Mohan, Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King Note: To be published in Interspeech 2021. 5 pages, 4 figures Link: https://arxiv.org/abs/2106.08352 Abstract: Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct renditions of a text to be produced. Since much of the unexplained variation is in the prosody, we propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: $F_{0}$, energy, and duration. The model is flexible about how the values of these features are specified: they can be externally provided, predicted from text, or predicted and then modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally precise, and disentangled control. When automatically predicting the acoustic features from text, it generates speech that is more natural than that from a Tacotron 2 model with a reference encoder. Subsequent human-in-the-loop modification of the predicted acoustic features can significantly further increase naturalness.
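A minimal sketch of what "explicitly conditioned on $F_{0}$, energy, and duration" can look like: concatenate each phone embedding with its three prosody values before decoding, so the controls can be supplied externally, predicted, or hand-edited. Module names, shapes, and the GRU decoder are our assumptions, not the paper's architecture.

```python
# Decoder that consumes per-phone prosody controls alongside phone ids.
import torch
import torch.nn as nn

class ProsodyConditionedDecoder(nn.Module):
    def __init__(self, n_phones=64, emb=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(n_phones, emb)
        self.rnn = nn.GRU(emb + 3, 256, batch_first=True)  # +3: F0, energy, duration
        self.to_mel = nn.Linear(256, n_mels)

    def forward(self, phones, f0, energy, duration):
        # phones: (B, T) ids; f0 / energy / duration: (B, T) per-phone controls
        prosody = torch.stack([f0, energy, duration], dim=-1)  # (B, T, 3)
        x = torch.cat([self.embed(phones), prosody], dim=-1)
        h, _ = self.rnn(x)
        return self.to_mel(h)                                  # (B, T, n_mels)

dec = ProsodyConditionedDecoder()
mel = dec(torch.randint(0, 64, (1, 12)), *(torch.rand(1, 12) for _ in range(3)))
```

Because the three controls enter as plain input features rather than a learned latent code, editing any one of them at any time step changes exactly that correlate of the output, which is the interpretable, temporally precise control the abstract claims.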
