cs.LG: 190 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (15 papers)
【1】 Boundary Graph Neural Networks for 3D Simulations
Authors: Andreas Mayr, Sebastian Lehner, Arno Mayrhofer, Christoph Kloss, Sepp Hochreiter, Johannes Brandstetter Affiliations: ELLIS Unit Linz, LIT AI Lab, Johannes Kepler University Linz; DCS Computing GmbH, Linz, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI); University of Amsterdam Link: https://arxiv.org/abs/2106.11299 Abstract: The abundance of data has given machine learning huge momentum in natural sciences and engineering. However, the modeling of simulated physical processes remains difficult. A key problem in doing so is the correct handling of geometric boundaries. While triangularized geometric boundaries are very common in engineering applications, they are notoriously difficult to model by machine learning approaches due to their heterogeneity with respect to size and orientation. In this work, we introduce Boundary Graph Neural Networks (BGNNs), which dynamically modify graph structures to address boundary conditions. Boundary graph structures are constructed via modifying edges, augmenting node features, and dynamically inserting virtual nodes. The new BGNNs are tested on complex 3D granular flow processes of hoppers and rotating drums which are standard parts of industrial machinery. Using precise simulations that are obtained by an expensive and complex discrete element method, BGNNs are evaluated in terms of computational efficiency as well as prediction accuracy of particle flows and mixing entropies. Even if complex boundaries are present, BGNNs are able to accurately reproduce 3D granular flows within simulation uncertainties over hundreds of thousands of simulation timesteps, and most notably particles completely stay within the geometric objects without using handcrafted conditions or restrictions.
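To make the boundary-graph idea concrete, here is a minimal sketch (not the authors' code) of dynamically inserting virtual nodes for particles that come close to a triangulated wall. The cutoff value and the simplification of projecting onto each triangle's plane (ignoring triangle edges/vertices and the paper's edge-feature modifications) are illustrative assumptions.

```python
# Sketch: virtual boundary nodes for particles near a triangulated wall.
import numpy as np

def virtual_boundary_nodes(particles, triangles, cutoff=0.1):
    """particles: (N, 3); triangles: list of (3, 3) vertex arrays."""
    virtual_nodes, edges = [], []
    for tri in triangles:
        n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
        n /= np.linalg.norm(n)                      # unit normal of the plane
        for i, p in enumerate(particles):
            d = np.dot(p - tri[0], n)               # signed distance to plane
            if abs(d) < cutoff:
                proj = p - d * n                    # closest point on the plane
                virtual_nodes.append(proj)          # dynamically inserted node
                edges.append((i, len(virtual_nodes) - 1))  # particle-wall edge
    return np.array(virtual_nodes), edges

particles = np.random.rand(50, 3)
wall = [np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])]  # z = 0 plane
vnodes, vedges = virtual_boundary_nodes(particles, wall)
print(len(vnodes), "virtual nodes inserted")
```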
【2】 GraphMixup: Improving Class-Imbalanced Node Classification on Graphs by Self-supervised Context Prediction
Authors: Lirong Wu, Haitao Lin, Zhangyang Gao, Cheng Tan, Stan Z. Li Affiliations: AI Lab, School of Engineering, Westlake University, Hangzhou, Zhejiang Province, China; Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou, Zhejiang Province, China; Zhejiang University, Hangzhou, Zhejiang Province, China Link: https://arxiv.org/abs/2106.11133 Abstract: Recent years have witnessed great success in handling node classification tasks with Graph Neural Networks (GNNs). However, most existing GNNs are based on the assumption that node samples for different classes are balanced, while for many real-world graphs, there exists the problem of class imbalance, i.e., some classes may have much fewer samples than others. In this case, directly training a GNN classifier with raw data would under-represent samples from those minority classes and result in sub-optimal performance. This paper presents GraphMixup, a novel mixup-based framework for improving class-imbalanced node classification on graphs. However, directly performing mixup in the input space or embedding space may produce out-of-domain samples due to the extreme sparsity of minority classes; hence we construct semantic relation spaces that allow the Feature Mixup to be performed at the semantic level. Moreover, we apply two context-based self-supervised techniques to capture both local and global information in the graph structure and then propose Edge Mixup specifically for graph data. Finally, we develop a Reinforcement Mixup mechanism to adaptively determine how many samples are to be generated by mixup for those minority classes. Extensive experiments on three real-world datasets show that GraphMixup yields truly encouraging results for class-imbalanced node classification tasks.
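As a reference point, the following is a minimal sketch of the plain feature-mixup step for a minority class, assuming node embeddings are already computed. GraphMixup itself performs this in learned semantic relation spaces and adds Edge Mixup and the reinforcement mechanism, all omitted here.

```python
# Sketch: beta-weighted mixup of minority-class node embeddings.
import numpy as np

rng = np.random.default_rng(0)
minority_emb = rng.normal(size=(8, 16))   # embeddings of minority-class nodes

def mixup_minority(emb, n_new, alpha=1.0):
    i = rng.integers(0, len(emb), size=n_new)
    j = rng.integers(0, len(emb), size=n_new)
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    return lam * emb[i] + (1.0 - lam) * emb[j]    # synthetic minority samples

synthetic = mixup_minority(minority_emb, n_new=4)
print(synthetic.shape)  # (4, 16)
```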
【3】 BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation
Authors: Mingguo He, Zhewei Wei, Zengfeng Huang, Hongteng Xu Affiliations: Renmin University of China; Fudan University Note: 14 pages, 31 figures Link: https://arxiv.org/abs/2106.10994 Abstract: Many representative graph neural networks, $e.g.$, GPR-GNN and ChebyNet, approximate graph convolutions with graph spectral filters. However, existing work either applies predefined filter weights or learns them without necessary constraints, which may lead to oversimplified or ill-posed filters. To overcome these issues, we propose $\textit{BernNet}$, a novel graph neural network with theoretical support that provides a simple but effective scheme for designing and learning arbitrary graph spectral filters. In particular, for any filter over the normalized Laplacian spectrum of a graph, our BernNet estimates it by an order-$K$ Bernstein polynomial approximation and designs its spectral property by setting the coefficients of the Bernstein basis. Moreover, we can learn the coefficients (and the corresponding filter weights) based on observed graphs and their associated signals and thus achieve the BernNet specialized for the data. Our experiments demonstrate that BernNet can learn arbitrary spectral filters, including complicated band-rejection and comb filters, and it achieves superior performance in real-world graph modeling tasks.
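To make the filter construction concrete, here is a minimal numpy sketch (not the authors' implementation) of an order-$K$ Bernstein spectral filter over the normalized Laplacian $L$ (eigenvalues in $[0, 2]$): $h(L) = \sum_{k=0}^{K} \theta_k \binom{K}{k} 2^{-K} (2I - L)^{K-k} L^k$. In BernNet the coefficients $\theta_k$ are learned; the values below are fixed purely for illustration.

```python
# Sketch: Bernstein-basis spectral filtering of a graph signal.
import numpy as np
from math import comb

def bernstein_filter(L, x, theta):
    K = len(theta) - 1
    I = np.eye(L.shape[0])
    out = np.zeros_like(x)
    for k in range(K + 1):
        term = comb(K, k) / 2**K * np.linalg.matrix_power(2 * I - L, K - k) \
               @ np.linalg.matrix_power(L, k) @ x
        out += theta[k] * term
    return out

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph
d = A.sum(1)
L = np.eye(3) - A / np.sqrt(np.outer(d, d))                    # normalized Laplacian
x = np.array([1.0, 0.0, 0.0])
theta = [1.0, 0.5, 0.0]           # low-pass-like coefficient profile (illustrative)
print(bernstein_filter(L, x, theta))
```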
【4】 Extractive approach for text summarisation using graphs
Authors: Kastriot Kadriu, Milenko Obradovic Affiliations: University of Ljubljana, Večna pot, Ljubljana, Slovenia Note: 4 pages, 2 figures, 5 tables Link: https://arxiv.org/abs/2106.10955 Abstract: Natural language processing is an important discipline with the aim of understanding text by its digital representation, that due to the diverse way we write and speak, is often not accurate enough. Our paper explores different graph-related algorithms that can be used in solving the text summarization problem using an extractive approach. We consider two metrics: sentence overlap and edit distance for measuring sentence similarity.
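As one illustration of the graph-based extractive setup, the sketch below builds a sentence graph weighted by word overlap and ranks sentences with a PageRank-style power iteration. The overlap normalization, damping factor, and the choice of a PageRank-style ranking are assumptions; the paper also evaluates edit distance as the similarity metric.

```python
# Sketch: sentence graph + power-iteration ranking for extractive summarization.
import numpy as np

sentences = [
    "graphs can represent documents",
    "sentences become nodes in the graph",
    "edges encode sentence similarity",
    "similar sentences share many words",
]

def overlap(s1, s2):
    w1, w2 = set(s1.split()), set(s2.split())
    denom = np.log(len(w1) + 1) + np.log(len(w2) + 1)
    return len(w1 & w2) / denom if denom else 0.0

n = len(sentences)
W = np.array([[overlap(a, b) if a != b else 0.0 for b in sentences]
              for a in sentences])
P = W / np.maximum(W.sum(1, keepdims=True), 1e-12)   # row-normalize weights
r = np.full(n, 1.0 / n)
for _ in range(50):                                   # power iteration
    r = 0.15 / n + 0.85 * P.T @ r
print(sentences[int(np.argmax(r))])                   # top-ranked sentence
```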
【5】 GRAND: Graph Neural Diffusion
Authors: Benjamin Paul Chamberlain, James Rowbottom, Maria Gorinova, Stefan Webb, Emanuele Rossi, Michael M. Bronstein Note: 15 pages, 4 figures. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s) Link: https://arxiv.org/abs/2106.10934 Abstract: We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE. In our model, the layer structure and topology correspond to the discretisation choices of temporal and spatial operators. Our approach allows a principled development of a broad new class of GNNs that are able to address the common plights of graph learning models such as depth, oversmoothing, and bottlenecks. Key to the success of our models are stability with respect to perturbations in the data and this is addressed for both implicit and explicit discretisation schemes. We develop linear and nonlinear versions of GRAND, which achieve competitive results on many standard graph benchmarks.
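A minimal sketch of the diffusion view, under the assumption that a fixed row-normalized adjacency stands in for the learned attention matrix: the dynamics $dX/dt = (A - I)X$ are discretized with an explicit Euler step, so the number of steps plays the role of layer depth.

```python
# Sketch: explicit-Euler discretization of graph diffusion (GRAND-style view).
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6)) < 0.4
A = np.triu(A, 1); A = (A | A.T).astype(float)       # random undirected graph
A /= np.maximum(A.sum(1, keepdims=True), 1)          # row-stochastic "attention"
X = rng.normal(size=(6, 4))                          # node features

tau, steps = 0.5, 20
for _ in range(steps):                                # explicit Euler scheme
    X = X + tau * (A @ X - X)
print(X.std(0))   # features smooth toward a graph-dependent equilibrium
```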
【6】 DisenHAN: Disentangled Heterogeneous Graph Attention Network for Recommendation
Authors: Yifan Wang, Suyao Tang, Yuntong Lei, Weiping Song, Sheng Wang, Ming Zhang Affiliations: Department of Computer Science, School of EECS, Peking University; Paul G. Allen School of Computer Science, University of Washington Note: Accepted at CIKM2020 Link: https://arxiv.org/abs/2106.10879 Abstract: Heterogeneous information network has been widely used to alleviate sparsity and cold start problems in recommender systems since it can model rich context information in user-item interactions. Graph neural network is able to encode this rich context information through propagation on the graph. However, existing heterogeneous graph neural networks neglect entanglement of the latent factors stemming from different aspects. Moreover, meta paths in existing approaches are simplified as connecting paths or side information between node pairs, overlooking the rich semantic information in the paths. In this paper, we propose a novel disentangled heterogeneous graph attention network DisenHAN for top-$N$ recommendation, which learns disentangled user/item representations from different aspects in a heterogeneous information network. In particular, we use meta relations to decompose high-order connectivity between node pairs and propose a disentangled embedding propagation layer which can iteratively identify the major aspect of meta relations. Our model aggregates corresponding aspect features from each meta relation for the target user/item. With different layers of embedding propagation, DisenHAN is able to explicitly capture the collaborative filtering effect semantically. Extensive experiments on three real-world datasets show that DisenHAN consistently outperforms state-of-the-art approaches. We further demonstrate the effectiveness and interpretability of the learned disentangled representations via insightful case studies and visualization.
【7】 Graph Attention Networks with LSTM-based Path Reweighting
Authors: Jianpeng Chen, Yujing Wang, Ming Zeng, Zongyi Xiang, Yazhou Ren Affiliations: University of Electronic Science and Technology of China; Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; Carnegie Mellon University Note: 15 pages with 10 figures, submitted to NeurIPS 2021 Link: https://arxiv.org/abs/2106.10866 Abstract: Graph Neural Networks (GNNs) have been extensively used for mining graph-structured data with impressive performance. However, traditional GNNs suffer from over-smoothing, non-robustness and over-fitting problems. To solve these weaknesses, we design a novel GNN solution, namely Graph Attention Network with LSTM-based Path Reweighting (PR-GAT). PR-GAT can automatically aggregate multi-hop information, highlight important paths and filter out noises. In addition, we utilize random path sampling in PR-GAT for data augmentation. The augmented data is used for predicting the distribution of corresponding labels. Finally, we demonstrate that PR-GAT can mitigate the issues of over-smoothing, non-robustness and overfitting. We achieve state-of-the-art accuracy on 5 out of 7 datasets and competitive accuracy on the other 2. The average accuracy over the 7 datasets improves on the best SOTA from the literature by 0.5%.
【8】 ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction
Authors: Chen-Yu Lee, Chun-Liang Li, Chu Wang, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister Affiliations: Google Cloud AI; McGill University; Google Research Note: Accepted to ACL-IJCNLP 2021 (Oral) Link: https://arxiv.org/abs/2106.10786 Abstract: Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks including word labeling and word grouping on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs with a margin up to 8.4% F1-score.
【9】 Adversarial Attack on Graph Neural Networks as An Influence Maximization Problem
Authors: Jiaqi Ma, Junwei Deng, Qiaozhu Mei Affiliations: School of Information, University of Michigan; Department of EECS Link: https://arxiv.org/abs/2106.10785 Abstract: Graph neural networks (GNNs) have attracted increasing interests. With broad deployments of GNNs in real-world applications, there is an urgent need for understanding the robustness of GNNs under adversarial attacks, especially in realistic setups. In this work, we study the problem of attacking GNNs in a restricted and realistic setup, by perturbing the features of a small set of nodes, with no access to model parameters and model predictions. Our formal analysis draws a connection between this type of attacks and an influence maximization problem on the graph. This connection not only enhances our understanding on the problem of adversarial attack on GNNs, but also allows us to propose a group of effective and practical attack strategies. Our experiments verify that the proposed attack strategies significantly degrade the performance of three popular GNN models and outperform baseline adversarial attack strategies.
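For readers unfamiliar with the combinatorial problem the abstract connects the attack to, here is a minimal sketch of greedy influence maximization under an independent cascade model. The graph, the propagation probability, and the Monte Carlo estimator are illustrative; this shows the node-selection problem, not the paper's attack itself.

```python
# Sketch: greedy influence maximization via Monte Carlo independent cascades.
import random

def cascade(adj, seeds, p=0.3, trials=200):
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = [v for u in frontier for v in adj[u]
                   if v not in active and random.random() < p]
            active.update(nxt); frontier = nxt
        total += len(active)
    return total / trials                       # expected influence spread

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
chosen = []
for _ in range(2):                              # greedy seed selection
    best = max((v for v in adj if v not in chosen),
               key=lambda v: cascade(adj, chosen + [v]))
    chosen.append(best)
print("nodes to perturb:", chosen)
```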
【10】 Opportunities and challenges in partitioning the graph measure space of real-world networks
Authors: Máté Józsa, Alpár S. Lázár, Zsolt I. Lázár Affiliations: Department of Physics, Babeş-Bolyai University, Cluj-Napoca, Romania; University of East Anglia, Norwich, UK Note: 11 pages, 6 figures Link: https://arxiv.org/abs/2106.10753 Abstract: Based on a large dataset containing thousands of real-world networks ranging from genetic, protein interaction, and metabolic networks to brain, language, ecology, and social networks we search for defining structural measures of the different complex network domains (CND). We calculate 208 measures for all networks and using a comprehensive and scrupulous workflow of statistical and machine learning methods we investigated the limitations and possibilities of identifying the key graph measures of CNDs. Our approach managed to identify well distinguishable groups of network domains and confer their relevant features. These features turn out to be CND specific and not unique even at the level of individual CNDs. The presented methodology may be applied to other similar scenarios involving highly unbalanced and skewed datasets.
【11】 TD-GEN: Graph Generation With Tree Decomposition
Authors: Hamed Shirzad, Hossein Hajimirsadeghi, Amir H. Abdi, Greg Mori Affiliations: Borealis AI & Simon Fraser University Link: https://arxiv.org/abs/2106.10656 Abstract: We propose TD-GEN, a graph generation framework based on tree decomposition, and introduce a reduced upper bound on the maximum number of decisions needed for graph generation. The framework includes a permutation invariant tree generation model which forms the backbone of graph generation. Tree nodes are supernodes, each representing a cluster of nodes in the graph. Graph nodes and edges are incrementally generated inside the clusters by traversing the tree supernodes, respecting the structure of the tree decomposition, and following node sharing decisions between the clusters. Finally, we discuss the shortcomings of standard evaluation criteria based on statistical properties of the generated graphs as performance measures. We propose to compare the performance of models based on likelihood. Empirical results on a variety of standard graph generation datasets demonstrate the superior performance of our method.
【12】 Graph Neural Networks for Learning Real-Time Prices in Electricity Market
Authors: Shaohui Liu, Chengyang Wu, Hao Zhu Affiliations: Department of Electrical and Computer Engineering Link: https://arxiv.org/abs/2106.10529 Abstract: Solving the optimal power flow (OPF) problem in real-time electricity market improves the efficiency and reliability in the integration of low-carbon energy resources into the power grids. To address the scalability and adaptivity issues of existing end-to-end OPF learning solutions, we propose a new graph neural network (GNN) framework for predicting the electricity market prices from solving OPFs. The proposed GNN-for-OPF framework innovatively exploits the locality property of prices and introduces physics-aware regularization, while attaining reduced model complexity and fast adaptivity to varying grid topology. Numerical tests have validated the learning efficiency and adaptivity improvements of our proposed method over existing approaches.
【13】 Stability of Graph Convolutional Neural Networks to Stochastic Perturbations
Authors: Zhan Gao, Elvin Isufi, Alejandro Ribeiro Affiliations: Department of Intelligent Systems Link: https://arxiv.org/abs/2106.10526 Abstract: Graph convolutional neural networks (GCNNs) are nonlinear processing tools to learn representations from network data. A key property of GCNNs is their stability to graph perturbations. Current analysis considers deterministic perturbations but fails to provide relevant insights when topological changes are random. This paper investigates the stability of GCNNs to stochastic graph perturbations induced by link losses. In particular, it proves the expected output difference between the GCNN over random perturbed graphs and the GCNN over the nominal graph is upper bounded by a factor that is linear in the link loss probability. We perform the stability analysis in the graph spectral domain such that the result holds uniformly for any graph. This result also shows the role of the nonlinearity and the architecture width and depth, and allows identifying handles to improve the GCNN robustness. Numerical simulations on source localization and robot swarm control corroborate our theoretical findings.
【14】 A Unified View of Algorithms for Path Planning Using Probabilistic Inference on Factor Graphs
Authors: Francesco A. N. Palmieri, Krishna R. Pattipati, Giovanni Di Gennaro, Giovanni Fioretti, Francesco Verolla, Amedeo Buonanno Affiliations: Dipartimento di Ingegneria, Università degli Studi della Campania "Luigi Vanvitelli", Aversa (CE), Italy; Department of Electrical and Computer Engineering, University of Connecticut, Storrs (CT), USA; ENEA Link: https://arxiv.org/abs/2106.10442 Abstract: Even if path planning can be solved using standard techniques from dynamic programming and control, the problem can also be approached using probabilistic inference. The algorithms that emerge using the latter framework bear some appealing characteristics that qualify the probabilistic approach as a powerful alternative to the more traditional control formulations. The idea of using estimation on stochastic models to solve control problems is not new and the inference approach considered here falls under the rubric of Active Inference (AI) and Control as Inference (CAI). In this work, we look at the specific recursions that arise from various cost functions that, although they may appear similar in scope, bear noticeable differences, at least when applied to typical path planning problems. We start by posing the path planning problem on a probabilistic factor graph, and show how the various algorithms translate into specific message composition rules. We then show how this unified approach, presented both in probability space and in log space, provides a very general framework that includes the Sum-product, the Max-product, Dynamic programming and mixed Reward/Entropy criteria-based algorithms. The framework also expands algorithmic design options for smoother or sharper policy distributions, including generalized Sum/Max-product algorithm, a Smooth Dynamic programming algorithm and modified versions of the Reward/Entropy recursions. We provide a comprehensive table of recursions and a comparison through simulations, first on a synthetic small grid with a single goal with obstacles, and then on a grid extrapolated from a real-world scene with multiple goals and a semantic map.
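A minimal sketch of the unified view on a toy chain: in log space, dynamic programming / max-product, sum-product, and a temperature-smoothed variant are all instances of one backward recursion that only differ in how future messages are combined. The function names, the random rewards, and the transition mask are illustrative assumptions.

```python
# Sketch: one backward recursion, three combination rules (max / logsumexp / smoothed).
import numpy as np

def backward(log_r, T, combine):
    """log_r: (steps, S) log-rewards; T: (S, S) log transition mask."""
    v = log_r[-1].copy()
    for t in range(len(log_r) - 2, -1, -1):
        v = log_r[t] + combine(T + v[None, :])   # message from the future
    return v

S, steps = 4, 6
rng = np.random.default_rng(1)
log_r = rng.normal(size=(steps, S))
T = np.where(rng.random((S, S)) < 0.7, 0.0, -np.inf)   # allowed moves

hard = backward(log_r, T, lambda m: m.max(1))                          # max-product / DP
soft = backward(log_r, T, lambda m: np.logaddexp.reduce(m, axis=1))    # sum-product
lam = 5.0                                                              # temperature
smooth = backward(log_r, T, lambda m: np.logaddexp.reduce(lam * m, axis=1) / lam)
print(hard, soft, smooth, sep="\n")
```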
【15】 Predicting Critical Nodes in Temporal Networks by Dynamic Graph Convolutional Networks
Authors: En-Yu Yu, Yan Fu, Jun-Lin Zhou, Hong-Liang Sun, Duan-Bing Chen Affiliations: Big Data Research Center, University of Electronic Science and Technology of China; School of Information Engineering, Nanjing University of Finance and Economics; School of Computer Science and Technology, University of Nottingham Ningbo, P. R. China Link: https://arxiv.org/abs/2106.10419 Abstract: Many real-world systems can be expressed in temporal networks with nodes playing far different roles in structure and function and edges representing the relationships between nodes. Identifying critical nodes can help us control the spread of public opinions or epidemics, predict leading figures in academia, conduct advertisements for various commodities, and so on. However, it is rather difficult to identify critical nodes because the network structure changes over time in temporal networks. In this paper, considering the sequence topological information of temporal networks, a novel and effective learning framework based on the combination of special GCNs and RNNs is proposed to identify nodes with the best spreading ability. The effectiveness of the approach is evaluated by a weighted Susceptible-Infected-Recovered model. Experimental results on four real-world temporal networks demonstrate that the proposed method outperforms both traditional and deep learning benchmark methods in terms of the Kendall $\tau$ coefficient and top-$k$ hit rate.
Transformer (2 papers)
【1】 Exploring Vision Transformers for Fine-grained Classification
Authors: Marcos V. Conde, Kerem Turgutlu Affiliations: Universidad de Valladolid; University of San Francisco Note: 4 pages, 5 figures, 4 tables. Published in The Eighth Workshop on Fine-Grained Visual Categorization, for code see this https URL, for workshop papers see this https URL Link: https://arxiv.org/abs/2106.10587 Abstract: Existing computer vision research in categorization struggles with fine-grained attributes recognition due to the inherently high intra-class variances and low inter-class variances. SOTA methods tackle this challenge by locating the most informative image regions and rely on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows its strong performance in both traditional and fine-grained classification tasks. In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks, which localizes the informative image regions without requiring architectural changes using the inherent multi-head self-attention mechanism. We also introduce attention-guided augmentations for improving the model's capabilities. We demonstrate the value of our approach by experimenting with four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also prove our model's interpretability via qualitative results.
【2】 Transformer-based Spatial-Temporal Feature Learning for EEG Decoding
Authors: Yonghao Song, Xueyu Jia, Lie Yang, Longhan Xie Affiliations: Shien-Ming Wu School of Intelligent Engineering, South China University of Technology Note: 10 pages, 6 figures Link: https://arxiv.org/abs/2106.11170 Abstract: At present, people usually use some methods based on convolutional neural networks (CNNs) for Electroencephalograph (EEG) decoding. However, CNNs have limitations in perceiving global dependencies, which is not adequate for common EEG paradigms with a strong overall relationship. Regarding this issue, we propose a novel EEG decoding method that mainly relies on the attention mechanism. The EEG data is firstly preprocessed and spatially filtered. And then, we apply attention transforming on the feature-channel dimension so that the model can enhance more relevant spatial features. The most crucial step is to slice the data in the time dimension for attention transforming, and finally obtain a highly distinguishable representation. At this time, global averaging pooling and a simple fully-connected layer are used to classify different categories of EEG data. Experiments on two public datasets indicate that the strategy of attention transforming effectively utilizes spatial and temporal features. And we have reached the level of the state-of-the-art in multi-classification of EEG, with fewer parameters. As far as we know, it is the first time that a detailed and complete method based on the transformer idea has been proposed in this field. It has good potential to promote the practicality of brain-computer interface (BCI). The source code can be found at https://github.com/anranknight/EEG-Transformer.
GAN | adversarial | attacks | generation (8 papers)
【1】 Delving into the pixels of adversarial samples
Authors: Blerta Lindqvist Affiliations: Department of Computer Science, Aalto University, Helsinki, Finland Link: https://arxiv.org/abs/2106.10996 Abstract: Despite extensive research into adversarial attacks, we do not know how adversarial attacks affect image pixels. Knowing how image pixels are affected by adversarial attacks has the potential to lead us to better adversarial defenses. Motivated by instances that we find where strong attacks do not transfer, we delve into adversarial examples at pixel level to scrutinize how adversarial attacks affect image pixel values. We consider several ImageNet architectures, InceptionV3, VGG19 and ResNet50, as well as several strong attacks. We find that attacks can have different effects at pixel level depending on classifier architecture. In particular, input pre-processing plays a previously overlooked role in the effect that attacks have on pixels. Based on the insights of pixel-level examination, we find new ways to detect some of the strongest current attacks.
【2】 Adversarial Examples Make Strong Poisons
Authors: Liam Fowl, Micah Goldblum, Ping-yeh Chiang, Jonas Geiping, Wojtek Czaja, Tom Goldstein Affiliations: Department of Mathematics, University of Maryland; Department of Computer Science; Department of Electrical Engineering, University of Siegen Link: https://arxiv.org/abs/2106.10807 Abstract: The adversarial machine learning literature is largely partitioned into evasion attacks on testing data and poisoning attacks on training data. In this work, we show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning. Our findings indicate that adversarial examples, when assigned the original label of their natural base image, cannot be used to train a classifier for natural images. Furthermore, when adversarial examples are assigned their adversarial class label, they are useful for training. This suggests that adversarial examples contain useful semantic content, just with the ``wrong'' labels (according to a network, but not a human). Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release, and we release a poisoned version of ImageNet, ImageNet-P, to encourage research into the strength of this form of data obfuscation.
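A toy sketch of the two labeling variants the abstract contrasts, using an FGSM-style perturbation against a simple linear "pre-trained" model as a stand-in for the deep networks used in the paper; the model, epsilon, and data are all illustrative assumptions.

```python
# Sketch: adversarial examples kept with original vs. adversarial labels.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                       # "pre-trained" linear classifier
X = rng.normal(size=(100, 5))
y = (X @ w > 0).astype(float)                # clean labels

eps = 1.0                                    # large step so most points cross the boundary
grad_sign = np.sign(np.outer(2 * y - 1, w))  # FGSM sign of the loss gradient w.r.t. x
X_adv = X - eps * grad_sign                  # adversarial examples

poison_original = (X_adv, y)                 # original labels -> poisons training
poison_flipped = (X_adv, (X_adv @ w > 0).astype(float))  # adversarial labels -> useful
print("label agreement after attack:",
      ((X_adv @ w > 0) == y.astype(bool)).mean())
```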
【3】 Attack to Fool and Explain Deep Networks
Authors: Naveed Akhtar, Muhammad A. A. K. Jalwana, Mohammed Bennamoun, Ajmal Mian Affiliations: Department of Computer Science and Software Engineering, University of Western Australia Note: To appear in IEEE TPAMI. arXiv admin note: text overlap with arXiv:1905.11544 Link: https://arxiv.org/abs/2106.10606 Abstract: Deep visual models are susceptible to adversarial perturbations to inputs. Although these signals are carefully crafted, they still appear noise-like patterns to humans. This observation has led to the argument that deep visual representation is misaligned with human perception. We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations. We first propose an attack that fools a network to confuse a whole category of objects (source class) with a target label. Our attack also limits the unintended fooling by samples from non-sources classes, thereby circumscribing human-defined semantic notions for network fooling. We show that the proposed attack not only leads to the emergence of regular geometric patterns in the perturbations, but also reveals insightful information about the decision boundaries of deep models. Exploring this phenomenon further, we alter the `adversarial' objective of our attack to use it as a tool to `explain' deep visual representation. We show that by careful channeling and projection of the perturbations computed by our method, we can visualize a model's understanding of human-defined semantic notions. Finally, we exploit the explainability properties of our perturbations to perform image generation, inpainting and interactive image manipulation by attacking adversarially robust `classifiers'. In all, our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models. The article also makes secondary contributions in terms of establishing the utility of our attack beyond the adversarial objective with multiple interesting applications.
【4】 Accelerated Policy Evaluation: Learning Adversarial Environments with Adaptive Importance Sampling
Authors: Mengdi Xu, Peide Huang, Fengpei Li, Jiacheng Zhu, Xuewei Qi, Kentaro Oguchi, Zhiyuan Huang, Henry Lam, Ding Zhao Affiliations: Carnegie Mellon University; Columbia University; Morgan Stanley AI CoE; Toyota Motor North America R&D; Tongji University Note: 10 pages, 5 figures Link: https://arxiv.org/abs/2106.10566 Abstract: The evaluation of rare but high-stakes events remains one of the main difficulties in obtaining reliable policies from intelligent agents, especially in large or continuous state/action spaces where limited scalability enforces the use of a prohibitively large number of testing iterations. On the other hand, a biased or inaccurate policy evaluation in a safety-critical system could potentially cause unexpected catastrophic failures during deployment. In this paper, we propose the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment nature as an adversarial agent and learns towards, through adaptive importance sampling, the zero-variance sampling distribution for the policy evaluation. Moreover, APE is scalable to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence properties of proposed algorithms under suitable regularity conditions. Our empirical studies show that APE estimates rare event probability with a smaller variance while only using orders of magnitude fewer samples compared to baseline methods in both multi-agent and single-agent environments.
【5】 Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions
Authors: Tian Xu, Ziniu Li, Yang Yu Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China; Pazhou Lab, Guangzhou, China; Polixir.ai Link: https://arxiv.org/abs/2106.10424 Abstract: This paper is dedicated to designing provably efficient adversarial imitation learning (AIL) algorithms that directly optimize policies from expert demonstrations. Firstly, we develop a transition-aware AIL algorithm named TAIL with an expert sample complexity of $\tilde{O}(H^{3/2}|S|/\varepsilon)$ under the known transition setting, where $H$ is the planning horizon, $|S|$ is the state space size and $\varepsilon$ is the desired policy value gap. This improves upon the previous best bound of $\tilde{O}(H^2|S|/\varepsilon^2)$ for AIL methods and matches the lower bound of $\tilde{\Omega}(H^{3/2}|S|/\varepsilon)$ in [Rajaraman et al., 2021] up to a logarithmic factor. The key ingredient of TAIL is a fine-grained estimator for the expert state-action distribution, which explicitly utilizes the transition function information. Secondly, considering practical settings where the transition functions are usually unknown but environment interaction is allowed, we accordingly develop a model-based transition-aware AIL algorithm named MB-TAIL. In particular, MB-TAIL builds an empirical transition model by interacting with the environment and performs imitation under the recovered empirical model. The interaction complexity of MB-TAIL is $\tilde{O}(H^3|S|^2|A|/\varepsilon^2)$, which improves the best known result of $\tilde{O}(H^4|S|^2|A|/\varepsilon^2)$ in [Shani et al., 2021]. Finally, our theoretical results are supported by numerical evaluation and detailed analysis on two challenging MDPs.
【6】 Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments
Authors: Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia Affiliations: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley Note: First two authors contributed equally. Currently under review Link: https://arxiv.org/abs/2106.10365 Abstract: The capability of reinforcement learning (RL) agent directly depends on the diversity of learning scenarios the environment generates and how closely it captures real-world situations. However, existing environments/simulators lack the support to systematically model distributions over initial states and transition dynamics. Furthermore, in complex domains such as soccer, the space of possible scenarios is infinite, which makes it impossible for one research group to provide a comprehensive set of scenarios to train, test, and benchmark RL algorithms. To address this issue, for the first time, we adopt an existing formal scenario specification language, SCENIC, to intuitively model and generate interactive scenarios. We interfaced SCENIC to Google Research Soccer environment to create a platform called SCENIC4RL. Using this platform, we provide a dataset consisting of 36 scenario programs encoded in SCENIC and demonstration data generated from a subset of them. We share our experimental results to show the effectiveness of our dataset and the platform to train, test, and benchmark RL algorithms. More importantly, we open-source our platform to enable RL community to collectively contribute to constructing a comprehensive set of scenarios.
【7】 Group-Structured Adversarial Training
Authors: Farzan Farnia, Amirali Aghazadeh, James Zou, David Tse Affiliations: University of California; Department of Biomedical Data Science, Stanford University Link: https://arxiv.org/abs/2106.10324 Abstract: Robust training methods against perturbations to the input data have received great attention in the machine learning literature. A standard approach in this direction is adversarial training which learns a model using adversarially-perturbed training samples. However, adversarial training performs suboptimally against perturbations structured across samples such as universal and group-sparse shifts that are commonly present in biological data such as gene expression levels of different tissues. In this work, we seek to close this optimality gap and introduce Group-Structured Adversarial Training (GSAT) which learns a model robust to perturbations structured across samples. We formulate GSAT as a non-convex concave minimax optimization problem which minimizes a group-structured optimal transport cost. Specifically, we focus on the applications of GSAT for group-sparse and rank-constrained perturbations modeled using group and nuclear norm penalties. In order to solve GSAT's non-smooth optimization problem in those cases, we propose a new minimax optimization algorithm called GDADMM by combining Gradient Descent Ascent (GDA) and Alternating Direction Method of Multipliers (ADMM). We present several applications of the GSAT framework to gain robustness against structured perturbations for image recognition and computational biology datasets.
【8】 Generative Model Adversarial Training for Deep Compressed Sensing
Authors: Ashkan Esmaeili Link: https://arxiv.org/abs/2106.10696 Abstract: Deep compressed sensing assumes the data has sparse representation in a latent space, i.e., it is intrinsically of low-dimension. The original data is assumed to be mapped from a low-dimensional space through a low-to-high-dimensional generator. In this work, we propound how to design such a low-to-high dimensional deep learning-based generator suiting for compressed sensing, while satisfying robustness to universal adversarial perturbations in the latent domain. We also justify why the noise is considered in the latent space. The work is also buttressed with theoretical analysis on the robustness of the trained generator to adversarial perturbations. Experiments on real-world datasets are provided to substantiate the efficacy of the proposed generative model adversarial training for deep compressed sensing.
Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (11 papers)
【1】 Affinity Mixup for Weakly Supervised Sound Event Detection
Authors: Mohammad Rasool Izadi, Robert Stevenson, Laura N. Kloepper Link: https://arxiv.org/abs/2106.11233 Abstract: The weakly supervised sound event detection problem is the task of predicting the presence of sound events and their corresponding starting and ending points in a weakly labeled dataset. A weak dataset associates each training sample (a short recording) to one or more present sources. Networks that solely rely on convolutional and recurrent layers cannot directly relate multiple frames in a recording. Motivated by attention and graph neural networks, we introduce the concept of an affinity mixup to incorporate time-level similarities and make a connection between frames. This regularization technique mixes up features in different layers using an adaptive affinity matrix. Our proposed affinity mixup network improves event-F1 scores over state-of-the-art techniques by $8.2\%$.
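A minimal sketch of the core operation, assuming the affinity comes from a softmax over feature similarities: each frame's feature is blended with its temporally related frames through a row-stochastic affinity matrix. The temperature, mixing weight, and function name are illustrative; the paper's affinity matrix is adaptive and applied across layers.

```python
# Sketch: mixing frame-level features through an affinity matrix.
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 8))                  # (time, feature) for one clip

def affinity_mixup(x, temperature=1.0, mix=0.5):
    sim = x @ x.T / temperature
    aff = np.exp(sim - sim.max(1, keepdims=True))
    aff /= aff.sum(1, keepdims=True)               # row-stochastic affinity
    return (1 - mix) * x + mix * aff @ x           # mix features across frames

print(affinity_mixup(frames).shape)                # (10, 8)
```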
【2】 Corruption Robust Active Learning
Authors: Yifang Chen, Simon S. Du, Kevin Jamieson Affiliations: Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA Link: https://arxiv.org/abs/2106.11220 Abstract: We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions. In this setting, every time before the learner observes a sample, the adversary decides whether to corrupt the label or not. First, we show that, in a benign corruption setting (which includes the misspecification setting as a special case), with a slight enlargement on the hypothesis elimination threshold, the classical RobustCAL framework can (surprisingly) achieve nearly the same label complexity guarantee as in the non-corrupted setting. However, this algorithm can fail in the general corruption setting. To resolve this drawback, we propose a new algorithm which is provably correct without any assumptions on the presence of corruptions. Furthermore, this algorithm enjoys the minimax label complexity in the non-corrupted setting (which is achieved by RobustCAL) and only requires $\tilde{\mathcal{O}}(C_{\mathrm{total}})$ additional labels in the corrupted setting to achieve $\mathcal{O}(\varepsilon + \frac{C_{\mathrm{total}}}{n})$, where $\varepsilon$ is the target accuracy, $C_{\mathrm{total}}$ is the total number of corruptions and $n$ is the total number of unlabeled samples.
【3】 Deep Learning-Based Active User Detection for Grant-free SCMA Systems
Authors: Thushan Sivalingam, Samad Ali, Nurul Huda Mahmood, Nandana Rajatheva, Matti Latva-Aho Affiliations: Centre for Wireless Communications, University of Oulu, Oulu, Finland Note: Accepted for 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) Link: https://arxiv.org/abs/2106.11198 Abstract: Grant-free random access and uplink non-orthogonal multiple access (NOMA) have been introduced to reduce transmission latency and signaling overhead in massive machine-type communication (mMTC). In this paper, we propose two novel group-based deep neural network active user detection (AUD) schemes for the grant-free sparse code multiple access (SCMA) system in mMTC uplink framework. The proposed AUD schemes learn the nonlinear mapping, i.e., multi-dimensional codebook structure and the channel characteristic. This is accomplished through the received signal which incorporates the sparse structure of device activity with the training dataset. Moreover, the offline pre-trained model is able to detect the active devices without any channel state information and prior knowledge of the device sparsity level. Simulation results show that with several active devices, the proposed schemes obtain more than twice the probability of detection compared to the conventional AUD schemes over the signal to noise ratio range of interest.
【4】 Low-rank Dictionary Learning for Unsupervised Feature Selection
Authors: Mohsen Ghassemi Parsa, Hadi Zare, Mehdi Ghatee Affiliations: University of Tehran, Tehran, Iran; Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran Link: https://arxiv.org/abs/2106.11102 Abstract: There exist many high-dimensional data in real-world applications such as biology, computer vision, and social networks. Feature selection approaches are devised to confront with high-dimensional data challenges with the aim of efficient learning technologies as well as reduction of models complexity. Due to the hardship of labeling on these datasets, there are a variety of approaches on feature selection process in an unsupervised setting by considering some important characteristics of data. In this paper, we introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation. Dictionary learning in a low-rank representation not only enables us to provide a new representation, but it also maintains feature correlation. Then, spectral analysis is employed to preserve sample similarities. Finally, a unified objective function for unsupervised feature selection is proposed in a sparse way by an $\ell_{2,1}$-norm regularization. Furthermore, an efficient numerical algorithm is designed to solve the corresponding optimization problem. We demonstrate the performance of the proposed method based on a variety of standard datasets from different applied domains. Our experimental findings reveal that the proposed method outperforms the state-of-the-art algorithm.
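A minimal sketch of the row-sparsity mechanism behind the $\ell_{2,1}$ penalty, assuming a proximal-style update (the paper's solver is more involved): the proximal operator shrinks entire rows of the weight matrix to zero, which is what makes the learned representation usable for feature selection.

```python
# Sketch: proximal operator of the l2,1 norm -> row-sparse weights.
import numpy as np

def prox_l21(W, lam):
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * W                       # rows with norm <= lam vanish

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4)) * np.array([[2], [0.1], [3], [0.05], [1], [0.2]])
W_sparse = prox_l21(W, lam=0.5)
selected = np.flatnonzero(np.linalg.norm(W_sparse, axis=1) > 0)
print("selected feature rows:", selected)
```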
【5】 Visual Probing: Cognitive Framework for Explaining Self-Supervised Image Representations
Authors: Witold Oleszkiewicz, Dominika Basaj, Igor Sieradzki, Michał Górszczak, Barbara Rychalska, Koryna Lewandowska, Tomasz Trzciński, Bartosz Zieliński Affiliations: Department of Cognitive Neuroscience and Neuroergonomics Link: https://arxiv.org/abs/2106.11054 Abstract: Recently introduced self-supervised methods for image representation learning provide on par or superior results to their fully supervised competitors, yet the corresponding efforts to explain the self-supervised approaches lag behind. Motivated by this observation, we introduce a novel visual probing framework for explaining the self-supervised models by leveraging probing tasks employed previously in natural language processing. The probing tasks require knowledge about semantic relationships between image parts. Hence, we propose a systematic approach to obtain analogs of natural language in vision, such as visual words, context, and taxonomy. Our proposal is grounded in Marr's computational theory of vision and concerns features like textures, shapes, and lines. We show the effectiveness and applicability of those analogs in the context of explaining self-supervised representations. Our key findings emphasize that relations between language and vision can serve as an effective yet intuitive tool for discovering how machine learning models work, independently of data modality. Our work opens a plethora of research pathways towards more explainable and transparent AI.
【6】 Active Learning for Deep Neural Networks on Edge Devices
Authors: Yuya Senzaki, Christian Hamelain Affiliations: Idein Inc., Tokyo, Japan Link: https://arxiv.org/abs/2106.10836 Abstract: When dealing with deep neural network (DNN) applications on edge devices, continuously updating the model is important. Although updating a model with real incoming data is ideal, using all of them is not always feasible due to limits, such as labeling and communication costs. Thus, it is necessary to filter and select the data to use for training (i.e., active learning) on the device. In this paper, we formalize a practical active learning problem for DNNs on edge devices and propose a general task-agnostic framework to tackle this problem, which reduces it to a stream submodular maximization. This framework is light enough to be run with low computational resources, yet provides solutions whose quality is theoretically guaranteed thanks to the submodular property. Through this framework, we can configure data selection criteria flexibly, including using methods proposed in previous active learning studies. We evaluate our approach on both classification and object detection tasks in a practical setting to simulate a real-life scenario. The results of our study show that the proposed framework outperforms all other methods in both tasks, while running at a practical speed on real devices.
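A minimal sketch of the reduction described in the abstract: selecting at most $k$ samples from a stream with a thresholded greedy rule and a submodular, facility-location-style coverage objective over feature similarities. The objective, the reference pool, and the fixed threshold are illustrative assumptions; the paper's framework lets the selection criterion be configured.

```python
# Sketch: one-pass streaming selection with a submodular coverage gain.
import numpy as np

rng = np.random.default_rng(0)
stream = rng.normal(size=(200, 8))                     # incoming feature vectors

def gain(candidate, selected, pool):
    """Marginal coverage gain of adding `candidate` (facility location)."""
    sims = pool @ candidate
    if not selected:
        return np.maximum(sims, 0).sum()
    best = np.max(pool @ np.array(selected).T, axis=1)
    return np.maximum(sims - best, 0).sum()            # only improvements count

pool = stream[rng.choice(len(stream), 50, replace=False)]  # reference sample
selected, k, tau = [], 10, 5.0
for x in stream:                                       # one pass over the stream
    if len(selected) < k and gain(x, selected, pool) >= tau:
        selected.append(x)
print(len(selected), "samples chosen for labeling")
```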
【7】 Demonstration of Panda: A Weakly Supervised Entity Matching System
Authors: Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He Affiliations: Georgia Institute of Technology; Microsoft Research Link: https://arxiv.org/abs/2106.10821 Abstract: Entity matching (EM) refers to the problem of identifying tuple pairs in one or more relations that refer to the same real world entities. Supervised machine learning (ML) approaches, and deep learning based approaches in particular, typically achieve state-of-the-art matching results. However, these approaches require many labeled examples, in the form of matching and non-matching pairs, which are expensive and time-consuming to label. In this paper, we introduce Panda, a weakly supervised system specifically designed for EM. Panda uses the same labeling function abstraction as Snorkel, where labeling functions (LF) are user-provided programs that can generate large amounts of (somewhat noisy) labels quickly and cheaply, which can then be combined via a labeling model to generate accurate final predictions. To support users developing LFs for EM, Panda provides an integrated development environment (IDE) that lives in a modern browser architecture. Panda's IDE facilitates the development, debugging, and life-cycle management of LFs in the context of EM tasks, similar to how IDEs such as Visual Studio or Eclipse excel in general-purpose programming. Panda's IDE includes many novel features purpose-built for EM, such as smart data sampling, a builtin library of EM utility functions, automatically generated LFs, visual debugging of LFs, and finally, an EM-specific labeling model. We show in this demo that Panda IDE can greatly accelerate the development of high-quality EM solutions using weak supervision.
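A minimal sketch of the labeling-function abstraction for entity matching: small user-written heuristics vote on record pairs and a simple majority vote combines them. The record fields and heuristics are made up for illustration, and Panda's learned, EM-specific label model is far more elaborate than a majority vote.

```python
# Sketch: labeling functions + majority-vote combiner for entity matching.
MATCH, NO_MATCH, ABSTAIN = 1, 0, -1

def lf_same_zip(a, b):
    return MATCH if a["zip"] == b["zip"] else ABSTAIN

def lf_name_overlap(a, b):
    shared = set(a["name"].lower().split()) & set(b["name"].lower().split())
    return MATCH if len(shared) >= 2 else NO_MATCH

def lf_phone(a, b):
    return NO_MATCH if a["phone"] != b["phone"] else MATCH

def majority(pair, lfs):
    votes = [lf(*pair) for lf in lfs if lf(*pair) != ABSTAIN]
    return MATCH if sum(votes) > len(votes) / 2 else NO_MATCH

a = {"name": "Joe's Coffee Shop", "zip": "94105", "phone": "555-0101"}
b = {"name": "Joes Coffee", "zip": "94105", "phone": "555-0101"}
print(majority((a, b), [lf_same_zip, lf_name_overlap, lf_phone]))  # 1 = match
```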
【8】 Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences 标题:标签、复制或预测:使用序列进行视觉信息提取的统一弱监督学习框架
作者:Jiapeng Wang,Tianwei Wang,Guozhi Tang,Lianwen Jin,Weihong Ma,Kai Ding,Yichao Huang 机构:School of Electronic and Information Engineering, South China University of Technology, China, IntSig Information Co., Ltd, Shanghai, China, Guangdong Artificial Intelligence and Digital Economy Laboratory (Pazhou Lab), Guangzhou, China 备注:IJCAI2021 链接:https://arxiv.org/abs/2106.10681 摘要:视觉信息抽取(VIE)近年来受到越来越多的关注。现有的方法通常是先将光学字符识别(OCR)结果组织成纯文本,然后利用标记级实体标注作为监督来训练序列标注模型。然而,这耗费了大量的标注成本,并且可能导致标签混淆,OCR错误也会严重影响最终的性能。本文提出了一个统一的弱监督学习框架TCPN(Tag,Copy或Predict Network),该框架引入了:1) 一个高效的编码器,可同时对二维OCR结果中的语义和布局信息进行建模;2) 一种仅利用关键信息序列作为监督的弱监督训练策略;3) 一种灵活可切换的解码器,包含两种推理模式:一种(复制或预测模式)是通过在每个时间步从输入中复制一个令牌或预测一个新令牌来输出不同类别的关键信息序列;另一种(标记模式)是在单个前向传递中直接标记输入序列。我们的方法在几个公共基准上显示了最新的性能,这充分证明了它的有效性。 摘要:Visual information extraction (VIE) has attracted increasing attention in recent years. The existing methods usually first organized optical character recognition (OCR) results into plain texts and then utilized token-level entity annotations as supervision to train a sequence tagging model. However, it expends great annotation costs and may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible and switchable decoder which contains two inference modes: one (Copy or Predict Mode) is to output key information sequences of different categories by copying a token from the input or predicting one in each time step, and the other (Tag Mode) is to directly tag the input sequence in a single forward pass. Our method shows new state-of-the-art performance on several public benchmarks, which fully proves its effectiveness.
【9】 Semi-supervised Optimal Transport with Self-paced Ensemble for Cross-hospital Sepsis Early Detection 标题:基于自定步调集成的半监督最优传输用于跨医院脓毒症早期检测
作者:Ruiqing Ding,Yu Zhou,Jie Xu,Yan Xie,Qiqiang Liang,He Ren,Yixuan Wang,Yanlin Chen,Leye Wang,Man Huang 机构:Zhejiang University School of Medicine Second Affiliated Hospital 备注:14 pages, 9 figures 链接:https://arxiv.org/abs/2106.10352 摘要:近年来,利用计算机技术解决医学场景中的问题引起了人们的广泛关注,但仍有很大的潜力和探索空间。其中,机器学习被广泛应用于脓毒症的预测、诊断甚至治疗。然而,最先进的方法需要大量的标记医学数据进行监督学习。在实际应用中,如果一家医院想要部署一个新的脓毒症检测系统,缺少标记数据将造成巨大的障碍。与监督学习不同的是,我们需要使用已知的信息(例如,来自另一家有丰富标记数据的医院)来帮助建立一个性能可以接受的模型,即迁移学习。在本文中,我们提出了一个用于脓毒症早期检测的半监督最优传输与自定步调集成框架,称为SPSSOT,从其他拥有丰富标记数据的医院迁移知识。在SPSSOT中,我们首先从源域(例如标记数据丰富的医院)和目标域(例如标记数据较少的医院)中提取相同的临床指标,然后将基于最优传输理论的半监督域自适应算法与自定步调欠采样相结合,避免了由于协变量漂移和类不平衡引起的负迁移。总的来说,SPSSOT是一种用于脓毒症早期检测的端到端迁移学习方法,它可以根据迭代次数分别从两个域中自动选择合适的样本,并对齐两个域的特征空间。在两个开放的临床数据集上进行的大量实验表明,与其他方法相比,我们提出的SPSSOT在 MIMIC $\rightarrow$ Challenge 和 Challenge $\rightarrow$ MIMIC 这两个迁移学习场景中,仅用目标域中1%的标记数据即可显著提高AUC值。 摘要:The utilization of computer technology to solve problems in medical scenarios has attracted considerable attention in recent years, which still has great potential and space for exploration. Among them, machine learning has been widely used in the prediction, diagnosis and even treatment of Sepsis. However, state-of-the-art methods require large amounts of labeled medical data for supervised learning. In real-world applications, the lack of labeled data will cause enormous obstacles if one hospital wants to deploy a new Sepsis detection system. Different from the supervised learning setting, we need to use known information (e.g., from another hospital with rich labeled data) to help build a model with acceptable performance, i.e., transfer learning. In this paper, we propose a semi-supervised optimal transport with self-paced ensemble framework for Sepsis early detection, called SPSSOT, to transfer knowledge from the other that has rich labeled data. In SPSSOT, we first extract the same clinical indicators from the source domain (e.g., hospital with rich labeled data) and the target domain (e.g., hospital with little labeled data), then we combine the semi-supervised domain adaptation based on optimal transport theory with self-paced under-sampling to avoid a negative transfer possibly caused by covariate shift and class imbalance. On the whole, SPSSOT is an end-to-end transfer learning method for Sepsis early detection which can automatically select suitable samples from two domains respectively according to the number of iterations and align feature space of two domains. Extensive experiments on two open clinical datasets demonstrate that comparing with other methods, our proposed SPSSOT, can significantly improve the AUC values with only 1% labeled data in the target domain in two transfer learning scenarios, MIMIC $\rightarrow$ Challenge and Challenge $\rightarrow$ MIMIC.
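SPSSOT 中的最优传输对齐一步可以用 POT 库写成如下极简示意(仅演示熵正则化 Sinkhorn 的重心映射,未包含论文中的自定步调欠采样与集成;reg 等参数为假设值):

```python
import numpy as np
import ot  # POT 库: pip install pot

def ot_align(Xs, Xt, reg=0.1):
    """把源域样本经熵正则化 Sinkhorn 计划映射到目标域(重心映射)."""
    a = np.full(len(Xs), 1.0 / len(Xs))  # 源域均匀权重
    b = np.full(len(Xt), 1.0 / len(Xt))  # 目标域均匀权重
    M = ot.dist(Xs, Xt)                  # 平方欧氏距离代价矩阵
    G = ot.sinkhorn(a, b, M / M.max(), reg)
    # 每个源样本被映射到其运输目标的加权平均位置
    return (G / G.sum(axis=1, keepdims=True)) @ Xt

Xs = np.random.randn(50, 4) + 2.0   # 源域(示意: 存在协变量漂移)
Xt = np.random.randn(40, 4)
Xs_aligned = ot_align(Xs, Xt)
```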
【10】 Dependency Structure Misspecification in Multi-Source Weak Supervision Models 标题:多源弱监督模型中的依赖结构设定错误
作者:Salva Rühling Cachay,Benedikt Boecking,Artur Dubrawski 机构:Carnegie Mellon University 备注:Oral presentation at the Workshop on Weakly Supervised Learning at ICLR 2021 链接:https://arxiv.org/abs/2106.10302 摘要:数据编程(DP)已被证明是昂贵的手工数据标注的一种有吸引力的替代方案。在DP中,用户将领域知识编码为\emph{labeling functions}(LF),这是一种启发式方法,它对数据的子集进行有噪声的标记,并且可能具有复杂的依赖关系。然后将标签模型拟合到LFs上,以产生未知类标签的估计。标签模型设定错误对下游分类器测试集性能的影响目前还缺乏研究。这给实践者带来了严重的认识盲区,特别是因为在DP的现场应用中,LFs之间的依赖结构常常被忽略。我们分析了由于结构过度设定而导致的建模误差。我们推导了建模误差的新的理论界,并从经验上证明了这种误差可能是巨大的,即使在建模一个看似合理的结构时也是如此。 摘要:Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into \emph{labeling functions} (LF), heuristics that label a subset of the data noisily and may have complex dependencies. A label model is then fit to the LFs to produce an estimate of the unknown class label. The effects of label model misspecification on test set performance of a downstream classifier are understudied. This presents a serious awareness gap to practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.
【11】 Machine Learning based optimization for interval uncertainty propagation with application to vibro-acoustic models 标题:基于机器学习的区间不确定性传播优化及其在振声模型中的应用
作者:Alice Cicirello,Filippo Giunta 机构:Section of Mechanics and Physics of Structures, Delft University of Technology, Delft, NL 备注:Preprint submitted to Mechanical Systems and Signal Processing 链接:https://arxiv.org/abs/2106.11215 摘要:本文提出了两种非侵入式的不确定性传播方法,用于分析由评估代价高昂的确定性计算机模型描述、参数定义为区间变量的工程系统的性能。这些方法采用一种基于机器学习的优化策略,即所谓的贝叶斯优化,用于在每个区间变量于其范围内独立变化时所产生的可能响应集合上,评估通用响应变量的上下界。由于没有对所有可能的区间变量组合评估响应函数而导致的知识缺乏,通过使用高斯过程回归模型对响应变量本身建立概率描述来加以考虑。我们开发了一个迭代程序,利用成熟的采集函数选择少量待评估的模拟来更新该统计模型,并评估响应边界。两种方法都定义了初始训练数据集:一种方法迭代构建两个不同的训练数据集,分别用于评估响应变量的上界和下界;另一种方法迭代构建单个训练数据集。因此,这两种方法在每次迭代中会产生不同的界估计。上界和下界响应表示为由后验分布均值函数得到的点估计。此外,当这些估计是在未进行确定性模拟的区间变量组合处获得时,还为每个估计提供置信区间,以便有效地与工程师沟通。最后,提出了两个度量,用于定义评估预测的界估计是否令人满意的条件。 摘要:Two non-intrusive uncertainty propagation approaches are proposed for the performance analysis of engineering systems described by expensive-to-evaluate deterministic computer models with parameters defined as interval variables. These approaches employ a machine learning based optimization strategy, the so-called Bayesian optimization, for evaluating the upper and lower bounds of a generic response variable over the set of possible responses obtained when each interval variable varies independently over its range. The lack of knowledge caused by not evaluating the response function for all the possible combinations of the interval variables is accounted for by developing a probabilistic description of the response variable itself by using a Gaussian Process regression model. An iterative procedure is developed for selecting a small number of simulations to be evaluated for updating this statistical model by using well-established acquisition functions and to assess the response bounds. In both approaches, an initial training dataset is defined. While one approach builds iteratively two distinct training datasets for evaluating separately the upper and lower bounds of the response variable, the other builds iteratively a single training dataset. Consequently, the two approaches will produce different bound estimates at each iteration. The upper and lower bound responses are expressed as point estimates obtained from the mean function of the posterior distribution. Moreover, a confidence interval on each estimate is provided for effectively communicating to engineers when these estimates are obtained for a combination of the interval variables for which no deterministic simulation has been run. Finally, two metrics are proposed to define conditions for assessing if the predicted bound estimates can be considered satisfactory.
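用高斯过程加期望改进(EI)估计响应上界的流程,可用 scikit-learn 写成如下极简示意(下界可对 -f 重复同一过程;候选点随机采样、Matern 核与迭代次数均为示意性假设,并非论文的具体配置):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bo_upper_bound(f, lo, hi, n_init=5, n_iter=20, n_cand=2000, seed=0):
    """用 GP + 期望改进搜索 f 在区间盒 [lo, hi] 上的最大响应(上界点估计)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        C = rng.uniform(lo, hi, size=(n_cand, len(lo)))   # 随机候选点
        mu, sd = gp.predict(C, return_std=True)
        z = (mu - y.max()) / np.maximum(sd, 1e-9)
        ei = (mu - y.max()) * norm.cdf(z) + sd * norm.pdf(z)  # 期望改进
        x_next = C[np.argmax(ei)]                          # 下一次确定性模拟
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return y.max()

# 示例: 上界; 下界可对 lambda x: -f(x) 重复同一过程再取负
print(bo_upper_bound(lambda x: -np.sum((x - 0.3) ** 2), lo=[0, 0], hi=[1, 1]))
```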
迁移|Zero/Few/One-Shot|自适应(7篇)
【1】 Towards Better Shale Gas Production Forecasting Using Transfer Learning 标题:利用迁移学习实现更好的页岩气产量预测
作者:Omar S. Alolayan,Samuel J. Raymond,Justin B. Montgomery,John R. Williams 机构:Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, The Center for Computational Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 链接:https://arxiv.org/abs/2106.11051 摘要:在样本井数有限的县域,利用迁移学习,深度神经网络可以生成更准确的页岩气产量预测。本文提供了一种将在相邻县域上训练的其他深度神经网络模型所获得的知识迁移到感兴趣县域的方法。本文使用来自德克萨斯州巴奈特和宾夕法尼亚州马塞勒斯页岩地层17个县的6000多口页岩气井的数据来测试迁移学习的能力。结果表明,与广泛应用的Arps递减曲线模型相比,预测误差降低了11%~47%。 摘要:Deep neural networks can generate more accurate shale gas production forecasts in counties with a limited number of sample wells by utilizing transfer learning. This paper provides a way of transferring the knowledge gained from other deep neural network models trained on adjacent counties into the county of interest. The paper uses data from more than 6000 shale gas wells across 17 counties from Texas Barnett and Pennsylvania Marcellus shale formations to test the capabilities of transfer learning. The results reduce the forecasting error between 11% and 47% compared to the widely used Arps decline curve model.
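这种迁移用法可以用如下 PyTorch 草图示意:冻结在邻近县数据上预训练的网络主干,只在目标县的少量样本井上微调最后一层。网络结构与超参数均为假设,并非论文的原始配置:

```python
import torch
import torch.nn as nn

# 一个示意性的产量预测网络; 假设它已在邻近县的井数据上预训练
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))

def transfer_finetune(model, target_loader, lr=1e-3, epochs=50):
    """冻结主干, 只在目标县的少量样本井上微调最后一层."""
    for p in model[:-1].parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model[-1].parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in target_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```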
【2】 STEP-EZ: Syntax Tree guided semantic ExPlanation for Explainable Zero-shot modeling of clinical depression symptoms from text 标题:STEP-EZ:句法树引导的语义解释,用于从文本对临床抑郁症状进行可解释零样本建模
作者:Nawshad Farruque,Randy Goebel,Osmar Zaiane,Sudhakar Sivapalan 机构:Department of Computing Science, University of Alberta, Alberta Machine Intelligence Institute (AMII), University of Alberta, Department of Psychiatry, University of Alberta 链接:https://arxiv.org/abs/2106.10928 摘要:我们致力于探索Zero-Shot学习(ZSL)的各种方法,以及它们对一个因训练数据匮乏而臭名昭著、具有挑战性但又很重要的有监督学习任务的解释能力,即从文本中检测抑郁症状(DSD)。我们首先在执业临床医生的帮助下,对我们ZSL建模的不同组成部分进行综合,并对基本事实样本和抑郁症状线索的整理过程进行分析。接下来,我们分析各种最先进的ZSL模型的准确性以及它们对我们任务的潜在增强。此外,我们还为使用ZSL进行基于文本的分层解释机制勾画了一个框架,我们称之为句法树引导的语义解释(STEP)。最后,我们总结了实验结果,从中可以得出结论:我们可以使用ZSL模型,达到以所提出的可解释性指数(EI)衡量的合理准确性和可解释性。据我们所知,这项工作是第一次从准确性和可解释性两个方面全面探讨ZSL模型在DSD任务中的有效性。 摘要:We focus on exploring various approaches of Zero-Shot Learning (ZSL) and their explainability for a challenging yet important supervised learning task notorious for training data scarcity, i.e. Depression Symptoms Detection (DSD) from text. We start with a comprehensive synthesis of different components of our ZSL modeling and analysis of our ground truth samples and Depression symptom clues curation process with the help of a practicing clinician. We next analyze the accuracy of various state-of-the-art ZSL models and their potential enhancements for our task. Further, we sketch a framework for the use of ZSL for hierarchical text-based explanation mechanism, which we call, Syntax Tree-Guided Semantic Explanation (STEP). Finally, we summarize experiments from which we conclude that we can use ZSL models and achieve reasonable accuracy and explainability, measured by a proposed Explainability Index (EI). This work is, to our knowledge, the first work to exhaustively explore the efficacy of ZSL models for DSD task, both in terms of accuracy and explainability.
【3】 Trainable Class Prototypes for Few-Shot Learning 标题:用于Few-Shot学习的可训练类原型
作者:Jianyi Li,Guizhong Liu 机构:School of Information and Communications, Xi’an Jiaotong University, Xi’an, P.R. China 备注:8 pages, 2 figures, and 3 Tables. arXiv admin note: substantial text overlap with arXiv:2008.09942 链接:https://arxiv.org/abs/2106.10846 摘要:度量学习是一种广泛应用的Few-Shot学习方法,其中原型的质量是算法的关键。本文在元训练和任务训练框架下,提出了用于距离度量的可训练原型,以取代人工设定的原型。同时,为了避免幕式元训练带来的弊端,我们采用了基于自监督学习的非幕式元训练。总体而言,我们分两个阶段来解决Few-Shot任务:通过自监督学习元训练一个可迁移的特征提取器,以及训练用于度量分类的原型。此外,元训练和任务训练都采用了简单的注意机制。我们的方法在标准的Few-Shot视觉分类数据集上,对各种既有的Few-Shot任务实现了最先进的性能,与现有的无监督Few-Shot学习方法相比提高了约20%。 摘要:Metric learning is a widely used method for few shot learning in which the quality of prototypes plays a key role in the algorithm. In this paper we propose the trainable prototypes for distance measure instead of the artificial ones within the meta-training and task-training framework. Also to avoid the disadvantages that the episodic meta-training brought, we adopt non-episodic meta-training based on self-supervised learning. Overall we solve the few-shot tasks in two phases: meta-training a transferable feature extractor via self-supervised learning and training the prototypes for metric classification. In addition, the simple attention mechanism is used in both meta-training and task-training. Our method achieves state-of-the-art performance in a variety of established few-shot tasks on the standard few-shot visual classification dataset, with about 20% increase compared to the available unsupervised few-shot learning methods.
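原型的均值初始化与度量分类可用如下 PyTorch 草图示意(论文中的原型在此初始化后会作为可训练参数继续优化;余弦相似度与变量名为示意性选择):

```python
import torch
import torch.nn.functional as F

def init_prototypes(support_feats, support_labels, n_way):
    # 每类支持集特征的均值作为原型的初始化(论文中原型随后作为参数继续训练)
    return torch.stack([support_feats[support_labels == c].mean(dim=0)
                        for c in range(n_way)])

def classify(query_feats, protos):
    # 余弦相似度度量分类: (Q, 1, C) 与 (1, N, C) 广播成 (Q, N)
    sims = F.cosine_similarity(query_feats[:, None, :], protos[None, :, :], dim=-1)
    return sims.argmax(dim=1)

feats = torch.randn(25, 64)                  # 5-way 5-shot 支持集特征(示意)
labels = torch.arange(5).repeat_interleave(5)
protos = init_prototypes(feats, labels, n_way=5)
pred = classify(torch.randn(10, 64), protos)
```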
【4】 Transfer Bayesian Meta-learning via Weighted Free Energy Minimization 标题:基于加权自由能最小化的迁移贝叶斯元学习
作者:Yunchuan Zhang,Sharu Theresa Jose,Osvaldo Simeone 机构:King’s Communications, Learning and Information Processing (KCLIP) Lab, Department of Engineering, King’s College London, London, UK 备注:9 pages, 5 figures, submitted to IEEE International Workshop on MACHINE LEARNING FOR SIGNAL PROCESSING 2021 链接:https://arxiv.org/abs/2106.10711 摘要:元学习基于从许多辅助任务中采集的数据,优化训练过程的超参数,如初始化、内核或学习速率。一个关键的基本假设是,辅助任务(称为元训练任务)与部署时遇到的任务(称为元测试任务)共享相同的生成分布。然而,当测试环境不同于元训练条件时,情况可能并非如此。为了解决元训练和元测试阶段任务生成分布的变化问题,本文引入了加权自由能最小化(WFEM)的迁移元学习方法。我们提出了一种基于高斯过程(GPs)的非参数贝叶斯回归分类方法。通过与PACOH实现的GP先验标准元学习的比较,验证了该方法在玩具正弦回归问题以及miniImagenet和CUB数据集分类上的有效性。 摘要:Meta-learning optimizes the hyperparameters of a training procedure, such as its initialization, kernel, or learning rate, based on data sampled from a number of auxiliary tasks. A key underlying assumption is that the auxiliary tasks, known as meta-training tasks, share the same generating distribution as the tasks to be encountered at deployment time, known as meta-test tasks. This may, however, not be the case when the test environment differ from the meta-training conditions. To address shifts in task generating distribution between meta-training and meta-testing phases, this paper introduces weighted free energy minimization (WFEM) for transfer meta-learning. We instantiate the proposed approach for non-parametric Bayesian regression and classification via Gaussian Processes (GPs). The method is validated on a toy sinusoidal regression problem, as well as on classification using miniImagenet and CUB data sets, through comparison with standard meta-learning of GP priors as implemented by PACOH.
【5】 Task Attended Meta-Learning for Few-Shot Learning 标题:面向Few-Shot学习的任务参与元学习
作者:Aroof Aimen,Sahil Sidheekh,Narayanan C. Krishnan 机构:Indian Institute of Technology, Ropar 链接:https://arxiv.org/abs/2106.10642 摘要:元学习(Meta-learning,ML)是在资源受限条件下(如Few-Shot学习)学习模型的一个很有前途的发展方向。目前流行的ML方法要么学习一个可推广的初始模型,要么通过幕式训练学习一个通用的参数优化器。前一种方法利用一批任务的知识来学习最优先验知识。在这项工作中,我们研究了批处理对ML的重要性。具体来说,我们首先引入了批处理-幕式训练方案来改进泛型参数优化器的学习。我们还假设,在间歇式训练中,一个批次中的每个任务对学习最优元模型的贡献相等的共同假设不一定是真的。我们建议根据任务在元模型学习中的“重要性”对任务进行分批加权。为此,我们引入了一种以人类选择性聚焦为动机的训练课程,称为任务参与元训练。任务注意是一个独立的模块,可以与任何间歇式训练方案相结合。通过在minimagenet和tieredImageNet等复杂数据集上与非任务参与模型的比较,验证了该方法的有效性。 摘要:Meta-learning (ML) has emerged as a promising direction in learning models under constrained resource settings like few-shot learning. The popular approaches for ML either learn a generalizable initial model or a generic parametric optimizer through episodic training. The former approaches leverage the knowledge from a batch of tasks to learn an optimal prior. In this work, we study the importance of a batch for ML. Specifically, we first incorporate a batch episodic training regimen to improve the learning of the generic parametric optimizer. We also hypothesize that the common assumption in batch episodic training that each task in a batch has an equal contribution to learning an optimal meta-model need not be true. We propose to weight the tasks in a batch according to their "importance" in improving the meta-model's learning. To this end, we introduce a training curriculum motivated by selective focus in humans, called task attended meta-training, to weight the tasks in a batch. Task attention is a standalone module that can be integrated with any batch episodic training regimen. The comparisons of the models with their non-task-attended counterparts on complex datasets like miniImageNet and tieredImageNet validate its effectiveness.
【6】 High-level Features for Resource Economy and Fast Learning in Skill Transfer 标题:面向技能迁移中资源节约与快速学习的高层特征
作者:Alper Ahmetoglu,Emre Ugur,Minoru Asada,Erhan Oztop 机构:Department of Computer Engineering, Bogazici University, Turkey, OTRISISREC, Osaka University, Osaka, Japan, Department of Computer Science, Ozyegin University, Turkey 链接:https://arxiv.org/abs/2106.10354 摘要:抽象是智能的一个重要方面,它使智能体能够构造有效决策的健壮表示。在过去的十年中,深度网络被证明是有效的,因为它们能够形成越来越复杂的抽象。然而,这些抽象分布在许多神经元上,使得重复使用所学技能的代价高昂。以前的工作要么强制形成抽象,造成设计者偏见,要么使用大量的神经单元,而没有研究如何获得能更有效地捕获源任务的高级特征。为了避免设计者偏见和无节制的资源使用,我们建议利用神经反应动力学形成紧凑的表示,用于技能转移。为此,我们考虑了两种相互竞争的方法,分别基于(1)最大信息压缩原理和(2)抽象事件倾向于产生缓慢变化信号这一概念,并将它们应用于任务执行过程中产生的神经信号。具体地说,在我们的模拟实验中,我们对深度网络执行源任务时从最后一个隐藏层采集的信号进行主成分分析(PCA)或慢特征分析(SFA),并在新的目标任务中使用这些特征进行技能转移。我们比较了这些方案与两种基线(使用全层输出的技能转移和无转移设置)的泛化性能。我们的结果表明,SFA单元是最成功的技能转移方式。与通常的技能转移相比,SFA和PCA消耗的资源更少,由此形成的许多单元显示出反映末端执行器-障碍-目标关系的局部反应。最后,具有最小特征值的SFA单元类似于与关节角等高级特征高度相关的符号表示,而这些特征可被视为完全符号系统的前兆。 摘要:Abstraction is an important aspect of intelligence which enables agents to construct robust representations for effective decision making. In the last decade, deep networks are proven to be effective due to their ability to form increasingly complex abstractions. However, these abstractions are distributed over many neurons, making the re-use of a learned skill costly. Previous work either enforced formation of abstractions creating a designer bias, or used a large number of neural units without investigating how to obtain high-level features that may more effectively capture the source task. For avoiding designer bias and unsparing resource use, we propose to exploit neural response dynamics to form compact representations to use in skill transfer. For this, we consider two competing methods based on (1) maximum information compression principle and (2) the notion that abstract events tend to generate slowly changing signals, and apply them to the neural signals generated during task execution. To be concrete, in our simulation experiments, we either apply principal component analysis (PCA) or slow feature analysis (SFA) on the signals collected from the last hidden layer of a deep network while it performs a source task, and use these features for skill transfer in a new target task. We compare the generalization performance of these alternatives with the baselines of skill transfer with full layer output and no-transfer settings. Our results show that SFA units are the most successful for skill transfer. SFA as well as PCA, incur less resources compared to usual skill transfer, whereby many units formed show a localized response reflecting end-effector-obstacle-goal relations. Finally, SFA units with lowest eigenvalues resembles symbolic representations that highly correlate with high-level features such as joint angles which might be thought of precursors for fully symbolic systems.
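慢特征分析(SFA)本身可以写成很短的线性代数流程:先白化信号,再取时间差分协方差的最小特征方向。下面是一个 numpy 示意(正则项 1e-8 等细节为假设):

```python
import numpy as np

def sfa(X, n_components):
    """慢特征分析的极简实现. X: (T, D) 的时间信号; 返回最慢的 n_components 个特征."""
    X = X - X.mean(axis=0)
    # 1) 白化
    d, E = np.linalg.eigh(np.cov(X.T))
    W_white = E @ np.diag(1.0 / np.sqrt(d + 1e-8)) @ E.T
    Z = X @ W_white
    # 2) 时间差分协方差的最小特征方向 = 变化最慢的方向
    dZ = np.diff(Z, axis=0)
    d2, E2 = np.linalg.eigh(np.cov(dZ.T))
    return Z @ E2[:, :n_components]   # eigh 按特征值升序排列, 最前即最慢

t = np.linspace(0, 10, 1000)
X = np.column_stack([np.sin(0.5 * t), np.sin(20 * t), np.random.randn(1000)])
slow = sfa(X, n_components=1)   # 应主要恢复慢变分量 sin(0.5 t)
```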
【7】 Deep Learning for Functional Data Analysis with Adaptive Basis Layers 标题:基于自适应基本层的函数数据分析深度学习
作者:Junwen Yao,Jonas Mueller,Jane-Ling Wang 备注:ICML 2021 链接:https://arxiv.org/abs/2106.10414 摘要:尽管深度神经网络已经取得了广泛的成功,但其在函数型数据中的应用仍然很少。函数型数据的无限维性意味着标准的学习算法只有在适当的降维后才能应用,通常通过基展开来实现。目前,这些基是在没有手头任务信息的情况下事先选定的,因此可能对指定的任务无效。相反,我们建议以端到端的方式自适应地学习这些基。我们介绍了一种新的神经网络,它采用一个新的基层(Basis Layer),其隐单元即是基函数本身,每个基函数都实现为一个微型神经网络。我们的架构学习对函数型输入应用简约降维,只关注与目标相关的信息,而不是输入函数中不相关的变化。在众多的函数型数据分类/回归任务中,我们的方法在经验上优于其他类型的神经网络,并且我们证明了我们的方法在统计上是一致的、泛化误差低。代码位于:\url{https://github.com/jwyyy/AdaFNN}. 摘要:Despite their widespread success, the application of deep neural networks to functional data remains scarce today. The infinite dimensionality of functional data means standard learning algorithms can be applied only after appropriate dimension reduction, typically achieved via basis expansions. Currently, these bases are chosen a priori without the information for the task at hand and thus may not be effective for the designated task. We instead propose to adaptively learn these bases in an end-to-end fashion. We introduce neural networks that employ a new Basis Layer whose hidden units are each basis functions themselves implemented as a micro neural network. Our architecture learns to apply parsimonious dimension reduction to functional inputs that focuses only on information relevant to the target rather than irrelevant variation in the input function. Across numerous classification/regression tasks with functional data, our method empirically outperforms other types of neural networks, and we prove that our approach is statistically consistent with low generalization error. Code is available at: \url{https://github.com/jwyyy/AdaFNN}.
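论文的核心想法(每个基函数本身是一个微型神经网络)可以用如下 PyTorch 草图示意;实际 AdaFNN 另含正则化等细节,这里的网络宽度与数值积分方式均为示意性假设:

```python
import torch
import torch.nn as nn

class BasisLayer(nn.Module):
    """每个基函数是一个微型 MLP: t -> b_i(t); 输出系数为数值积分 <f, b_i>."""
    def __init__(self, n_bases=4, hidden=32):
        super().__init__()
        self.bases = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_bases)])

    def forward(self, f_vals, grid):
        # f_vals: (B, T) 函数在 grid 上的取值; grid: (T, 1) 的 [0,1] 网格
        B = torch.cat([b(grid) for b in self.bases], dim=1)  # (T, n_bases)
        return f_vals @ B / grid.shape[0]  # (B, n_bases) 黎曼和近似积分

grid = torch.linspace(0, 1, 100).unsqueeze(1)
scores = BasisLayer()(torch.sin(6.28 * grid.T).repeat(5, 1), grid)
print(scores.shape)  # torch.Size([5, 4])
```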
强化学习(5篇)
【1】 A Max-Min Entropy Framework for Reinforcement Learning 标题:一种强化学习的极大-最小熵框架
作者:Seungyul Han,Youngchul Sung 机构:Dept. of Electrical Engineering, KAIST, Daejeon, South Korea 备注:Submitted to NIPS 2021 链接:https://arxiv.org/abs/2106.10517 摘要:针对最大熵强化学习框架在无模型样本学习中的局限性,提出了一种最大最小熵强化学习框架。最大熵RL框架指导学习策略到达高熵状态,而max-min-entropy框架旨在学习访问低熵状态并最大化这些低熵状态的熵以促进探索。对于一般马尔可夫决策过程(MDPs),在提出的最大最小熵框架下,基于探索与开发的分离,构造了一种有效的算法。数值结果表明,与现有的RL算法相比,该算法的性能有了很大的提高。 摘要:In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitation of the maximum entropy RL framework in model-free sample-based learning. Whereas the maximum entropy RL framework guides learning for policies to reach states with high entropy in the future, the proposed max-min entropy framework aims to learn to visit states with low entropy and maximize the entropy of these low-entropy states to promote exploration. For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework based on disentanglement of exploration and exploitation. Numerical results show that the proposed algorithm yields drastic performance improvement over the current state-of-the-art RL algorithms.
【2】 Boosting Offline Reinforcement Learning with Residual Generative Modeling 标题:利用残差生成建模提升离线强化学习
作者:Hua Wei,Deheng Ye,Zhao Liu,Hao Wu,Bo Yuan,Qiang Fu,Wei Yang,Zhenhui Li 机构:Tencent AI Lab, Shenzhen, China, The Pennsylvania State University, University Park, USA 备注:Accepted by IJCAI 2021, appendix included, 9 pages, 4 figures, 2 tables 链接:https://arxiv.org/abs/2106.10411 摘要:离线强化学习(RL)试图在不进行在线探索的情况下,利用记录的离线经验来学习近似最优策略。现有的离线RL研究包括:1) 生成性建模,即使用固定数据近似策略;2) 学习状态-动作值函数。目前的研究大多通过减少训练数据分布偏移引起的值函数逼近自举误差,聚焦于状态-动作值函数部分,而忽略了生成建模中误差传播的影响。本文分析了生成性建模中的误差。我们提出了一个残差生成模型AQL(action-conditioned Q-learning)来减少离线RL的策略逼近误差。我们证明了我们的方法可以在不同的基准数据集中学习更精确的策略近似。此外,在多人在线竞技场(MOBA)游戏《王者荣耀》(Honor of Kings)中,我们还证明了所提出的离线RL方法可以在复杂的控制任务中学习到更有竞争力的人工智能。 摘要:Offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL. We show that our method can learn more accurate policy approximations in different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game Honor of Kings.
【3】 Sample Efficient Social Navigation Using Inverse Reinforcement Learning 标题:基于逆向强化学习的样本高效社交导航
作者:Bobak H. Baghi,Gregory Dudek 机构:School of Computer Science, McGill University 链接:https://arxiv.org/abs/2106.10318 摘要:在本文中,我们提出了一种从人类轨迹观察中高效学习符合社会规范的导航策略的算法。当移动机器人开始进入并穿行于社交空间时,它们必须考虑社交线索,并以符合社会规范的方式行事。我们专注于从示例中学习这些线索。我们描述了一种基于逆强化学习的算法,该算法在不知道人的具体动作的情况下,从人的轨迹观察中学习。我们利用重放缓冲区的概念(许多离策略强化学习方法中都有)消除了逆强化学习带来的额外样本复杂性,从而提高了方法的样本效率。我们通过使用公开的行人运动数据集训练智能体来评估我们的方法,并将其与相关方法进行比较。我们表明,该方法在降低训练时间和样本复杂度的同时,具有更好的性能。 摘要:In this paper, we present an algorithm to efficiently learn socially-compliant navigation policies from observations of human trajectories. As mobile robots come to inhabit and traffic social spaces, they must account for social cues and behave in a socially compliant manner. We focus on learning such cues from examples. We describe an inverse reinforcement learning based algorithm which learns from human trajectory observations without knowing their specific actions. We increase the sample-efficiency of our approach over alternative methods by leveraging the notion of a replay buffer (found in many off-policy reinforcement learning methods) to eliminate the additional sample complexity associated with inverse reinforcement learning. We evaluate our method by training agents using publicly available pedestrian motion data sets and compare it to related methods. We show that our approach yields better performance while also decreasing training time and sample complexity.
【4】 Scientific multi-agent reinforcement learning for wall-models of turbulent flows 标题:湍流壁面模型的科学多智能体强化学习
作者:H. Jane Bae,Petros Koumoutsakos 机构:Institute of Applied Computational Science, Harvard University, Cambridge, MA, Graduate Aerospace Laboratories, California Institute of Technology, Pasadena, CA , Computational Science and Engineering Laboratory, ETH Zurich, CH-, Switzerland 链接:https://arxiv.org/abs/2106.11144 摘要:湍流模拟的预测能力,对于空气动力学设计和天气预报至关重要,取决于湍流模型的选择。来自实验和模拟的大量数据以及机器学习的出现为这些建模工作提供了动力。然而,由于启发式算法和有监督学习无法模拟近壁动力学,湍流模拟仍然受到阻碍。我们通过引入科学的多智能体强化学习(SciMARL)来发现大涡模拟(LES)的壁面模型来应对这一挑战。在SciMARL中,离散化点还充当协作代理,学习提供LES闭包模型。代理使用有限的数据自学习,并推广到极端雷诺数和以前看不到的几何。目前的模拟减少了几个数量级的计算成本比完全解决模拟,同时再现关键流量。我们相信SciMARL为湍流的模拟创造了新的能力。 摘要:The predictive capabilities of turbulent flow simulations, critical for aerodynamic design and weather prediction, hinge on the choice of turbulence models. The abundance of data from experiments and simulations and the advent of machine learning have provided a boost to these modeling efforts. However, simulations of turbulent flows remain hindered by the inability of heuristics and supervised learning to model the near-wall dynamics. We address this challenge by introducing scientific multi-agent reinforcement learning (SciMARL) for the discovery of wall models for large-eddy simulations (LES). In SciMARL, discretization points act also as cooperating agents that learn to supply the LES closure model. The agents self-learn using limited data and generalize to extreme Reynolds numbers and previously unseen geometries. The present simulations reduce by several orders of magnitude the computational cost over fully-resolved simulations while reproducing key flow quantities. We believe that SciMARL creates new capabilities for the simulation of turbulent flows.
【5】 Reinforcement learning for pursuit and evasion of microswimmers at low Reynolds number 标题:追打的强化学习与微泳者在低雷诺数下的逃避
作者:Francesco Borra,Luca Biferale,Massimo Cencini,Antonio Celani 机构:Dipartimento di Fisica, Università “Sapienza”, Rome, Italy, Department of Physics and INFN, University of Rome Tor Vergata, Rome, Italy 备注:6 pages, 3 figures (Supplementary Material in ancillary directory) 链接:https://arxiv.org/abs/2106.08609 摘要:水生生物可以利用水动力线索导航,找到猎物,逃离捕食者。我们考虑了两个相互竞争的微泳者在低雷诺数环境中执行追逃任务的模型。玩家的能力有限:他们只能感知水动力扰动(它提供了关于对手位置的一些线索),并执行简单的机动。追捕者的目标是在尽可能短的时间内捕获逃避者;相反,逃避者的目的是尽可能推迟被捕获。我们表明,借助强化学习,玩家可以找到高效且物理上可解释的策略,非平凡地利用水动力环境。这篇快报为利用强化学习发现水环境中的捕食者-猎物策略提供了概念验证,并有望应用于水下机器人。 摘要:Aquatic organisms can use hydrodynamic cues to navigate, find their preys and escape from predators. We consider a model of two competing microswimmers engaged in a pursue-evasion task while immersed in a low-Reynolds-number environment. The players have limited abilities: they can only sense hydrodynamic disturbances, which provide some cue about the opponent's position, and perform simple manoeuvres. The goal of the pursuer is to capture the evader in the shortest possible time. Conversely the evader aims at deferring capture as much as possible. We show that by means of Reinforcement Learning the players find efficient and physically explainable strategies which non-trivially exploit the hydrodynamic environment. This Letter offers a proof-of-concept for the use of Reinforcement Learning to discover prey-predator strategies in aquatic environments, with potential applications to underwater robotics.
元学习(2篇)
【1】 Compositional Federated Learning: Applications in Distributionally Robust Averaging and Meta Learning 标题:组合联邦学习:在分布稳健平均和元学习中的应用
作者:Feihu Huang,Junyi Li,Heng Huang 机构:Department of Electrical and Computer Engineering, University of Pittsburgh 备注:21 pages, 8 figures 链接:https://arxiv.org/abs/2106.11264 摘要:本文提出了一种高效的组合联邦学习(ComFedL)算法来求解一类新的组合联邦学习(FL)框架,这类框架经常出现在许多具有层次结构的机器学习问题中,如分布鲁棒联邦学习和模型不可知元学习(MAML)。此外,我们还研究了ComFedL算法在一些温和条件下的收敛性分析,证明了它的收敛速度达到$O(\frac{1}{\sqrt{T}})$,其中$T$表示迭代次数。据我们所知,我们的算法是第一个将联邦学习与组合随机优化相结合的算法。特别地,我们首先利用KL散度正则化将分布鲁棒FL(即minimax优化问题)转化为一个简单的组合优化问题。同时,我们也首次将分布不可知的MAML问题(即极大极小优化问题)转化为一个简单的组合优化问题。最后,我们应用两个流行的机器学习任务,即分布鲁棒FL和MAML来验证我们算法的有效性。 摘要:In the paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many machine learning problems with a hierarchical structure such as distributionally robust federated learning and model-agnostic meta learning (MAML). Moreover, we study the convergence analysis of our ComFedL algorithm under some mild conditions, and prove that it achieves a fast convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iteration. To the best of our knowledge, our algorithm is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also first transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple composition optimization problem. Finally, we apply two popular machine learning tasks, i.e., distributionally robust FL and MAML to demonstrate the effectiveness of our algorithm.
【2】 EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization 标题:EvoGrad:高效的基于梯度的元学习和超参数优化
作者:Ondrej Bohdal,Yongxin Yang,Timothy Hospedales 机构:School of Informatics, The University of Edinburgh 链接:https://arxiv.org/abs/2106.10575 摘要:近年来,基于梯度的元学习和超参数优化技术取得了显著的进展,使得神经网络的端到端训练和许多超参数训练成为可能。然而,现有的方法是相对昂贵的,因为他们需要计算二阶导数和存储一个较长的计算图。这一成本使它们无法扩展到更大的网络体系结构。我们介绍了EvoGrad,一种新的元学习方法,它利用进化技术更有效地计算超梯度。EvoGrad在不计算二阶梯度或存储更长的计算图的情况下估计超参数的超梯度,从而显著提高了效率。我们评估了EvoGrad在最近两个重要的元学习应用,即跨域的Few-Shot学习与特征转换和噪声标签学习与MetaWeightNet。结果表明,EvoGrad显著提高了效率,并支持将元学习扩展到更大的CNN架构,如从ResNet18到ResNet34。 摘要:Gradient-based meta-learning and hyperparameter optimization have seen significant progress recently, enabling practical end-to-end training of neural networks together with many hyperparameters. Nevertheless, existing approaches are relatively expensive as they need to compute second-order derivatives and store a longer computational graph. This cost prevents scaling them to larger network architectures. We present EvoGrad, a new approach to meta-learning that draws upon evolutionary techniques to more efficiently compute hypergradients. EvoGrad estimates hypergradient with respect to hyperparameters without calculating second-order gradients, or storing a longer computational graph, leading to significant improvements in efficiency. We evaluate EvoGrad on two substantial recent meta-learning applications, namely cross-domain few-shot learning with feature-wise transformations and noisy label learning with MetaWeightNet. The results show that EvoGrad significantly improves efficiency and enables scaling meta-learning to bigger CNN architectures such as from ResNet18 to ResNet34.
符号|符号学习(2篇)
【1】 An interpretable prediction model for longitudinal dispersion coefficient in natural streams based on evolutionary symbolic regression network 标题:基于进化符号回归网络的天然河流纵向分散系数可解释预测模型
作者:Yifeng Zhao,Zicheng Liu,Pei Zhang,Stan Z. Li,S. A. Galindo-Torres 机构:College of Environmental and Resources Science, Zhejiang University, Hangzhou, Zhejiang Province, China, School of Engineering, Westlake University, Shilongshan Road, Hangzhou 备注:Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file 链接:https://arxiv.org/abs/2106.11026 摘要:为了更好地理解天然河流中的弥散,需要了解纵向弥散系数(LDC)。人们提出了各种方法来预测LDC。这些研究可以分为三类:分析研究、统计研究和ML驱动的研究(隐式和显式)。然而,对它们的综合评价仍然缺乏。本文首先对这些方法进行了深入的分析,找出了它们的不足。这是在一个广泛的数据库中进行的,该数据库由全球660个水力学和航道特性样本组成。通过使用最大不相似子集选择(SSMD)进行测试集选择和四分位距(IQR)去除离群值,提高了利用数据的可靠性和代表性。评价结果表明,这些方法的优劣顺序为:ML驱动法>统计法>分析法。隐式ML驱动方法本质上是黑箱,而显式ML驱动方法在预测LDC方面更具潜力。另外,过拟合是现有模型普遍存在的问题。这些模型还受到固定参数组合的影响。为了建立可解释的LDC预测模型,设计了一种新的符号回归方法——进化符号回归网络(ESRN)。它是遗传算法和神经网络的结合。引入了避免过度拟合和探索更多参数组合的策略。结果表明,ESRN模型在性能上优于现有的符号模型。该模型参数要求低(只需w和U*),适合于实际工程问题。对于无法进行现场试验或只能获得有限的现场信息的情况,它可以提供令人信服的解决方案。 摘要:A better understanding of dispersion in natural streams requires knowledge of longitudinal dispersion coefficient(LDC). Various methods have been proposed for predictions of LDC. Those studies can be grouped into three types: analytical, statistical and ML-driven researches(Implicit and explicit). However, a comprehensive evaluation of them is still lacking. In this paper, we first present an in-depth analysis of those methods and find out their defects. This is carried out on an extensive database composed of 660 samples of hydraulic and channel properties worldwide. The reliability and representativeness of utilized data are enhanced through the deployment of the Subset Selection of Maximum Dissimilarity(SSMD) for testing set selection and the Inter Quartile Range(IQR) for removal of the outlier. The evaluation reveals the rank of those methods as: ML-driven method > the statistical method > the analytical method. Whereas implicit ML-driven methods are black-boxes in nature, explicit ML-driven methods have more potential in prediction of LDC. Besides, overfitting is a universal problem in existing models. Those models also suffer from a fixed parameter combination. To establish an interpretable model for LDC prediction with higher performance, we then design a novel symbolic regression method called evolutionary symbolic regression network(ESRN). It is a combination of genetic algorithms and neural networks. Strategies are introduced to avoid overfitting and explore more parameter combinations. Results show that the ESRN model has superiorities over other existing symbolic models in performance. The proposed model is suitable for practical engineering problems due to its advantage in low requirement of parameters (only w and U* are required). It can provide convincing solutions for situations where the field test cannot be carried out or limited field information can be obtained.
【2】 Signal Processing Based Deep Learning for Blind Symbol Decoding and Modulation Classification 标题:基于信号处理的符号盲解码和调制分类深度学习
作者:Samer Hanna,Chris Dick,Danijela Cabric 机构: University of California 链接:https://arxiv.org/abs/2106.10543 摘要:对信号进行盲解码需要估计其未知的传输参数、补偿无线信道损伤以及识别调制类型。虽然深度学习可以解决复杂的问题,但数字信号处理(DSP)是可解释的,并且计算效率更高。为了将两者结合起来,我们提出了双路径网络(DPN)。它由一条DSP操作的信号路径和一条神经网络的特征路径组成,前者用于恢复信号,后者用于估计未知的传输参数。通过在多个恢复阶段上互连路径,后期阶段从恢复的信号中获益,并重用所有先前提取的特征。与缺乏特征共享或无法访问恢复信号的替代设计相比,所提出的设计在调制分类方面提高了5%。在模拟数据集上,DPN的估计结果及其盲解码性能均优于BPSK和QPSK盲信号处理算法。一个空中软件定义的无线电捕获被用来验证高信噪比下的DPN结果。DPN设计可以处理可变长度的输入,并且在调制分类中比依赖固定长度的输入具有高达15%的长信号预测平均性能。 摘要:Blindly decoding a signal requires estimating its unknown transmit parameters, compensating for the wireless channel impairments, and identifying the modulation type. While deep learning can solve complex problems, digital signal processing (DSP) is interpretable and can be more computationally efficient. To combine both, we propose the dual path network (DPN). It consists of a signal path of DSP operations that recover the signal, and a feature path of neural networks that estimate the unknown transmit parameters. By interconnecting the paths over several recovery stages, later stages benefit from the recovered signals and reuse all the previously extracted features. The proposed design is demonstrated to provide 5% improvement in modulation classification compared to alternative designs lacking either feature sharing or access to recovered signals. The estimation results of DPN along with its blind decoding performance are shown to outperform a blind signal processing algorithm for BPSK and QPSK on a simulated dataset. An over-the-air software-defined-radio capture was used to verify DPN results at high SNRs. DPN design can process variable length inputs and is shown to outperform relying on fixed length inputs with prediction averaging on longer signals by up to 15% in modulation classification.
分层学习(1篇)
【1】 Learning Timestamp-Level Representations for Time Series with Hierarchical Contrastive Loss 标题:利用分层对比损失学习时间序列的时间戳级表示
作者:Zhihan Yue,Yujing Wang,Juanyong Duan,Tianmeng Yang,Congrui Huang,Bixiong Xu 机构:Peking University,Microsoft 备注:20 pages, 6 figures 链接:https://arxiv.org/abs/2106.10466 摘要:提出了一种学习时间序列时间戳级表示的通用框架TS2Vec。与现有方法不同,TS2Vec执行时间戳区分,它直接为每个时间戳学习上下文表示向量。我们发现学习到的表征具有很强的预测能力。在有监督的时间序列预测中,基于学习表示的线性回归算法的性能优于以往的sota算法。此外,实例级表示可以简单地通过在所有时间戳的学习表示之上应用最大池层来获得。我们对时间序列分类任务进行了大量的实验,以评估实例级表示的质量。结果表明,在125个UCR数据集和29个UEA数据集上,TS2Vec与现有的无监督时间序列表示方法相比有了显著的改进。源代码在https://github.com/yuezhihan/ts2vec. 摘要:This paper presents TS2Vec, a universal framework for learning timestamp-level representations of time series. Unlike existing methods, TS2Vec performs timestamp-wise discrimination, which learns a contextual representation vector directly for each timestamp. We find that the learned representations have superior predictive ability. A linear regression trained on top of the learned representations outperforms previous SOTAs for supervised time series forecasting. Also, the instance-level representations can be simply obtained by applying a max pooling layer on top of learned representations of all timestamps. We conduct extensive experiments on time series classification tasks to evaluate the quality of instance-level representations. As a result, TS2Vec achieves significant improvement compared with existing SOTAs of unsupervised time series representation on 125 UCR datasets and 29 UEA datasets. The source code is publicly available at https://github.com/yuezhihan/ts2vec.
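从时间戳级表示得到实例级表示并用于下游预测的流程可示意如下(max 池化对应摘要所述;Ridge 回归与参数为示意性选择,并非论文的确切设置):

```python
import numpy as np
from sklearn.linear_model import Ridge

def instance_repr(ts_reprs):
    # ts_reprs: (T, C) 每个时间戳的上下文表示; 最大池化得到实例级表示
    return ts_reprs.max(axis=0)

def fit_forecaster(repr_matrix, targets):
    # repr_matrix: (N, C) 表示矩阵, targets: (N,) 预测目标; 线性模型做下游预测
    return Ridge(alpha=1.0).fit(repr_matrix, targets)

reprs = np.stack([instance_repr(np.random.randn(50, 32)) for _ in range(100)])
model = fit_forecaster(reprs, np.random.randn(100))
```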
医学相关(2篇)
【1】 Plant Disease Detection Using Image Processing and Machine Learning 标题:基于图像处理和机器学习的植物病害检测
作者:Pranesh Kulkarni,Atharva Karwande,Tejas Kolhe,Soham Kamble,Akshay Joshi,Medha Wyawahare 机构: Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, India. 链接:https://arxiv.org/abs/2106.10698 摘要:在农业实践中,一项重要而繁琐的任务是检测作物上的病害。它需要大量的时间和熟练的劳动力。利用计算机视觉和机器学习技术,提出了一种智能高效的作物病害检测技术。该系统可检测5种常见植物的20种病害,准确率达93%。 摘要:One of the important and tedious task in agricultural practices is the detection of the disease on crops. It requires huge time as well as skilled labor. This paper proposes a smart and efficient technique for detection of crop disease which uses computer vision and machine learning techniques. The proposed system is able to detect 20 different diseases of 5 common plants with 93% accuracy.
【2】 Hybrid approach to detecting symptoms of depression in social media entries 标题:检测社交媒体条目中抑郁症状的混合方法
作者:Agnieszka Wołk,Karol Chlasta,Paweł Holas 机构:Polish-Japanese Academy of Information Technology, Warsaw, The Institute of Literary Research of the Polish Academy of Sciences, Warsaw, Kozminski University, Warsaw, University of Warsaw 备注:11 pages, 4 figures, 2 tables, The Pacific Asia Conference on Information Systems (PACIS2021) 链接:https://arxiv.org/abs/2106.10485 摘要:情绪和词汇分析被广泛用于检测抑郁症或焦虑症。有文献记载,与健康人相比,情绪障碍患者所使用的语言存在显著差异。尽管如此,这些词汇方法的有效性还可以进一步提高,因为目前的分析重点是社交媒体条目是关于什么的,而不是它们是如何写的。在这项研究中,我们将重点放在这些短文彼此相似的方面,以及它们是如何产生的。我们提出了一种解决抑郁症筛查问题的新颖方法:应用Collgram分析,这是一种已知的从文本中获取语言信息的有效方法。我们将这些结果与基于BERT结构的情感分析进行了比较。最后,我们建立了一个混合模型,实现了71%的诊断准确率。 摘要:Sentiment and lexical analyses are widely used to detect depression or anxiety disorders. It has been documented that there are significant differences in the language used by a person with emotional disorders in comparison to a healthy individual. Still, the effectiveness of these lexical approaches could be improved further because the current analysis focuses on what the social media entries are about, and not how they are written. In this study, we focus on aspects in which these short texts are similar to each other, and how they were created. We present an innovative approach to the depression screening problem by applying Collgram analysis, which is a known effective method of obtaining linguistic information from texts. We compare these results with sentiment analysis based on the BERT architecture. Finally, we create a hybrid model achieving a diagnostic accuracy of 71%.
蒸馏|知识提取(1篇)
【1】 Teacher's pet: understanding and mitigating biases in distillation 标题:教师的宠儿:理解和减轻蒸馏过程中的偏差
作者:Michal Lukasik,Srinadh Bhojanapalli,Aditya Krishna Menon,Sanjiv Kumar 机构:Google Research, New York 备注:17 pages, 8 figures 链接:https://arxiv.org/abs/2106.10494 摘要:知识提炼被广泛用作一种手段,利用复杂教师模型的预测来提高相对简单的学生模型的性能。有几项研究表明,蒸馏能显著提高学生的整体表现;然而,这些收益在所有数据子组中是一致的吗?在本文中,我们证明了蒸馏可能损害某些子组的性能,例如关联样本很少的类。我们将这种行为追溯到教师分布所产生的错误,这些错误被转移到学生模型并被其放大。为了缓解这一问题,我们提出了一些技术,在教师不太可靠的子组中软化其影响。在多个图像分类基准上的实验表明,这些改进的蒸馏保持了整体精度的提升,同时确保了子组性能的改善。 摘要:Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model. Several works have shown that distillation significantly boosts the student's overall performance; however, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain subgroups, e.g., classes with few associated samples. We trace this behaviour to errors made by the teacher distribution being transferred to and amplified by the student model. To mitigate this problem, we present techniques which soften the teacher influence for subgroups where it is less reliable. Experiments on several image classification benchmarks show that these modifications of distillation maintain boost in overall accuracy, while additionally ensuring improvement in subgroup performance.
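按样本所属子组软化教师影响的蒸馏损失可写成如下草图;这里的 reliab 权重构造是假设,仅示意温度蒸馏与逐样本加权的组合方式,并非论文的确切公式:

```python
import torch
import torch.nn.functional as F

def softened_distill_loss(student_logits, teacher_logits, labels, reliab,
                          T=4.0, alpha=0.5):
    """逐样本软化教师影响的蒸馏损失.
    reliab: (B,) 张量, 样本所属子组中教师的可靠度, 取值 [0,1](构造方式为假设)."""
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1) * (T * T)
    w = alpha * reliab                  # 教师不可靠的子组 -> 蒸馏权重更小
    return ((1.0 - w) * ce + w * kd).mean()

loss = softened_distill_loss(torch.randn(8, 10), torch.randn(8, 10),
                             torch.randint(0, 10, (8,)), torch.rand(8))
```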
推荐(3篇)
【1】 Data Optimisation for a Deep Learning Recommender System 标题:深度学习推荐系统中的数据优化
作者:Gustav Hertz,Sandhya Sachidanandan,Balázs Tóth,Emil S. Jørgensen,Martin Tegnér 机构:Oxford-Man Institute, University of Oxford 链接:https://arxiv.org/abs/2106.11218 摘要:本文倡导在为推荐系统收集用户数据时加入隐私保护要求。我们的研究目的是双重的。首先,我们询问对数据收集的限制是否会损害基于RNN的推荐的测试质量。我们研究验证性能如何依赖于可用的训练数据量。为此,我们结合使用top-K精度、目录覆盖率和新颖性,因为对用户有益的推荐不一定能被传统的精度指标所刻画。其次,我们询问是否可以在数据量最小的情况下,通过使用辅助数据源来提高质量。为此,我们提出了知识迁移,并构造了一个表示来度量数据中购买行为之间的相似性,以便对哪个源域贡献最大做出有依据的判断。我们的结果表明:(i)当训练规模增加到临界点以上时,测试性能存在饱和。我们还讨论了不同性能指标和数据属性之间的相互作用。此外,我们证明了(ii)我们的表示对于衡量购买行为是有意义的。特别是,结果表明,如果我们根据该相似性度量选择相关的源域,就可以利用辅助数据来提高验证性能。 摘要:This paper advocates privacy preserving requirements on collection of user data for recommender systems. The purpose of our study is twofold. First, we ask if restrictions on data collection will hurt test quality of RNN-based recommendations. We study how validation performance depends on the available amount of training data. We use a combination of top-K accuracy, catalog coverage and novelty for this purpose, since good recommendations for the user is not necessarily captured by a traditional accuracy metric. Second, we ask if we can improve the quality under minimal data by using secondary data sources. We propose knowledge transfer for this purpose and construct a representation to measure similarities between purchase behaviour in data. This to make qualified judgements of which source domain will contribute the most. Our results show that (i) there is a saturation in test performance when training size is increased above a critical point. We also discuss the interplay between different performance metrics, and properties of data. Moreover, we demonstrate that (ii) our representation is meaningful for measuring purchase behaviour. In particular, results show that we can leverage secondary data to improve validation performance if we select a relevant source domain according to our similarly measure.
【2】 BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System 标题:BanditMF:基于多臂Bandit的矩阵分解推荐系统
作者:Shenghao Xu 机构:The Chinese University of Hong Kong (MSc report, Supervisor: Prof. John C.S. Lui) 备注:MSc dissertation 链接:https://arxiv.org/abs/2106.10898 摘要:多臂老虎机(Multi-armed bandits,MAB)提供了一种有原则的在线学习方法,以实现探索与利用之间的平衡。凭借优越的性能和仅需少量反馈即可学习的特点,多臂老虎机在推荐系统等应用中受到了广泛关注。同样,在推荐系统中,协同过滤(CF)可以说是最早和最有影响力的方法。最关键的是,新用户和不断变化的推荐项目池是推荐系统需要解决的挑战。对于协同过滤,经典的方法是离线训练模型,然后进行在线测试,但是这种方法已经不能处理用户偏好的动态变化,即所谓的\textit{cold start}(冷启动)问题。那么在缺乏有效信息的情况下,如何有效地向用户推荐商品呢?针对上述问题,提出了一种基于多臂老虎机的协同过滤推荐系统BanditMF。BanditMF旨在解决多臂老虎机算法和协同过滤中的两个难题:(1)如何在有效信息稀缺的条件下解决协同过滤的冷启动问题;(2)如何解决强社会关系域中bandit算法由于独立估计与每个用户相关的未知参数、忽略用户之间的相关性而导致的次优问题。 摘要:Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation. Due to their superior performance and ability to learn from limited feedback, multi-armed bandits have drawn widespread attention in applications such as recommender systems. Likewise, within recommender systems, collaborative filtering (CF) is arguably the earliest and most influential method. Crucially, new users and an ever-changing pool of recommended items are the challenges that recommender systems need to address. For collaborative filtering, the classical method is training the model offline and then performing online testing, but this approach can no longer handle the dynamic changes in user preferences, which is the so-called \textit{cold start} problem. So how can items be effectively recommended to users in the absence of effective information? To address the aforementioned problems, a multi-armed bandit based collaborative filtering recommender system has been proposed, named BanditMF. BanditMF is designed to address two challenges in the multi-armed bandits algorithm and collaborative filtering: (1) how to solve the cold start problem for collaborative filtering under the condition of scarcity of valid information, (2) how to solve the sub-optimal problem of bandit algorithms in strong social relations domains caused by independently estimating unknown parameters associated with each user and ignoring correlations between users.
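将老虎机探索与矩阵分解结合来处理冷启动,可用如下 ε-贪婪草图示意;项目因子矩阵 Q 假设已离线学得,学习率等超参数均为假设,并非 BanditMF 的具体算法:

```python
import numpy as np

class EpsGreedyMF:
    """ε-贪婪 + 矩阵分解的极简结合: 项目因子 Q 已离线学得,
    新用户(冷启动)的因子向量在线用 SGD 更新."""
    def __init__(self, Q, eps=0.1, lr=0.05):
        self.Q, self.eps, self.lr = Q, eps, lr
        self.p = np.zeros(Q.shape[1])           # 新用户因子从零开始

    def recommend(self):
        if np.random.rand() < self.eps:         # 探索: 随机推荐
            return np.random.randint(len(self.Q))
        return int(np.argmax(self.Q @ self.p))  # 利用: 预测评分最高的项目

    def update(self, item, reward):
        err = reward - self.Q[item] @ self.p
        self.p += self.lr * err * self.Q[item]  # 用观测反馈更新用户因子

bandit = EpsGreedyMF(Q=np.random.randn(100, 8))
i = bandit.recommend()
bandit.update(i, reward=1.0)
```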
【3】 A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems 标题:非神经网络协同过滤推荐系统综述
作者:Carmel Wenga,Majirus Fansi,Sébastien Chabrier,Jean-Martial Mari,Alban Gabillon 机构:(a) Université de la Polynésie Française 备注:29 pages, 7 tables and 2 figures 链接:https://arxiv.org/abs/2106.10679 摘要:在过去的二十年里,由于在线应用程序中数据量的爆炸式增长,推荐系统吸引了很多人的兴趣。协同过滤是信息推荐应用中应用最为广泛的一种过滤方式。协同过滤(CF)利用一组用户的已知偏好对其他用户的未知偏好进行预测和推荐(推荐是基于用户过去的行为)。20世纪90年代首次提出,各种越来越成功的模式被提出。由于机器学习技术在许多领域的成功,在推荐系统中的应用越来越受到重视。在本文中,我们概述了推荐系统的CF方法,它们的两个主要类别,以及它们的评估指标。通过介绍经典机器学习算法从最初的使用案例到先进的机器学习模型的演变过程,重点研究了经典机器学习算法在CF推荐系统中的应用。我们试图对CF系统(使用python实现)提供一个全面和比较的概述,作为这一领域研究和实践的指南。 摘要:Over the past two decades, recommender systems have attracted a lot of interest due to the explosion in the amount of data in online applications. A particular attention has been paid to collaborative filtering, which is the most widely used in applications that involve information recommendations. Collaborative filtering (CF) uses the known preference of a group of users to make predictions and recommendations about the unknown preferences of other users (recommendations are made based on the past behavior of users). First introduced in the 1990s, a wide variety of increasingly successful models have been proposed. Due to the success of machine learning techniques in many areas, there has been a growing emphasis on the application of such algorithms in recommendation systems. In this article, we present an overview of the CF approaches for recommender systems, their two main categories, and their evaluation metrics. We focus on the application of classical Machine Learning algorithms to CF recommender systems by presenting their evolution from their first use-cases to advanced Machine Learning models. We attempt to provide a comprehensive and comparative overview of CF systems (with python implementations) that can serve as a guideline for research and practice in this area.
聚类(3篇)
【1】 Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view Clustering 标题:Multi-VAE:学习用于多视图聚类的非纠缠视图公共和视图特有的视觉表示
作者:Jie Xu,Yazhou Ren,Huayi Tang,Xiaorong Pu,Xiaofeng Zhu,Ming Zeng,Lifang He 机构:University of Electronic Science and Technology of China, Carnegie Mellon University,Lehigh University 链接:https://arxiv.org/abs/2106.11232 摘要:多视图聚类是一个由来已久的重要研究课题,主要研究从不同的角度挖掘互补信息。然而,现有的研究往往在一个公共特征空间中融合多个视图的表示或处理聚类,这可能导致它们的纠缠,特别是对于视觉表示。为了解决这个问题,我们提出了一个新的基于VAE的多视图聚类框架(multi-VAE)。具体地说,在生成模型中定义了一个视图公共变量和多个视图特殊变量。视图公共变量的先验服从近似离散的Gumbel-Softmax分布,引入该分布提取多视图的公共聚类因子。同时,视图特有变量的先验服从连续高斯分布,用来表示每个视图特有的视觉因素。通过控制互信息容量来分离视图的公共表示和视图的特殊表示,可以分离出多个视图的连续视觉信息,从而有效地挖掘出它们的公共离散聚类信息。实验结果表明,与现有的聚类方法相比,Multi-VAE在获得更好的聚类性能的同时,具有良好的可解释性。 摘要:Multi-view clustering, a long-standing and important research problem, focuses on mining complementary information from diverse views. However, existing works often fuse multiple views' representations or handle clustering in a common feature space, which may result in their entanglement especially for visual representations. To address this issue, we present a novel VAE-based multi-view clustering framework (Multi-VAE) by learning disentangled visual representations. Concretely, we define a view-common variable and multiple view-peculiar variables in the generative model. The prior of view-common variable obeys approximately discrete Gumbel Softmax distribution, which is introduced to extract the common cluster factor of multiple views. Meanwhile, the prior of view-peculiar variable follows continuous Gaussian distribution, which is used to represent each view's peculiar visual factors. By controlling the mutual information capacity to disentangle the view-common and view-peculiar representations, continuous visual information of multiple views can be separated so that their common discrete cluster information can be effectively mined. Experimental results demonstrate that Multi-VAE enjoys the disentangled and explainable visual representations, while obtaining superior clustering performance compared with state-of-the-art methods.
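摘要中两类隐变量的采样方式可示意如下:视图公共变量用 PyTorch 自带的 Gumbel-Softmax 近似离散采样,视图特有变量用高斯重参数化;维度与温度均为示意取值:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)        # (batch, 簇数) 的未归一化得分(示意)
# 视图公共变量: Gumbel-Softmax 近似离散采样, 温度越低越接近 one-hot
z_common = F.gumbel_softmax(logits, tau=0.5, hard=False)
# 视图特有变量: 连续高斯先验, 用重参数化技巧采样
mu, logvar = torch.zeros(8, 16), torch.zeros(8, 16)
z_peculiar = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
print(z_common.shape, z_peculiar.shape)
```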
【2】 Contrastive Multi-Modal Clustering 标题:对比多模态聚类
作者:Jie Xu,Huayi Tang,Yazhou Ren,Xiaofeng Zhu,Lifang He 机构:University of Electronic Science and Technology of China, Lehigh University 链接:https://arxiv.org/abs/2106.11193 摘要:多模态聚类从多个模态或视图中挖掘互补信息,越来越受到人们的关注。然而,现有的研究很少关注提取多模态的高层语义信息进行聚类。本文提出了对比多模态聚类(CMMC),通过对比学习挖掘高层语义信息。具体来说,我们的框架包括三个部分:(1) 优化多个自动编码器,以保持每个模态的多样性、学习互补信息;(2) 提出一个特征对比模块,从不同模态中学习共同的高级语义特征;(3) 标签对比模块旨在为所有模态学习一致的聚类分配。通过所提出的多模态对比学习,在保持低层潜在特征多样性的同时,最大化了高层特征的互信息。此外,为了利用学习到的高级语义特征,我们进一步通过求解最大匹配问题来生成伪标签,以微调聚类分配。大量实验表明,CMMC具有良好的可扩展性,并优于现有的多模态聚类方法。 摘要:Multi-modal clustering, which explores complementary information from multiple modalities or views, has attracted people's increasing attentions. However, existing works rarely focus on extracting high-level semantic information of multiple modalities for clustering. In this paper, we propose Contrastive Multi-Modal Clustering (CMMC) which can mine high-level semantic information via contrastive learning. Concretely, our framework consists of three parts. (1) Multiple autoencoders are optimized to maintain each modality's diversity to learn complementary information. (2) A feature contrastive module is proposed to learn common high-level semantic features from different modalities. (3) A label contrastive module aims to learn consistent cluster assignments for all modalities. By the proposed multi-modal contrastive learning, the mutual information of high-level features is maximized, while the diversity of the low-level latent features is maintained. In addition, to utilize the learned high-level semantic features, we further generate pseudo labels by solving a maximum matching problem to fine-tune the cluster assignments. Extensive experiments demonstrate that CMMC has good scalability and outperforms state-of-the-art multi-modal clustering methods.
【3】 Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle 标题:面向故障Oracle的查询优化和时间效率聚类算法
作者:Pan Peng,Jiapeng Zhang 机构:Department of Computer Science, University of Sheffield, Department of Computer Science, University of Southern California 备注:Accepted for presentation at the Conference on Learning Theory (COLT) 2021 链接:https://arxiv.org/abs/2106.10374 摘要:基于数据库中的众包实体解析、社交网络中的符号边缘预测和相关聚类等应用,Mazumdar和Saha[NIPS 2017]提出了一个优雅的理论模型,用于研究具有错误oracle的聚类。在这个模型中,给定一组属于$k$个未知组(或簇)的$n$个项,我们的目标是通过向oracle发出成对查询来恢复簇。这个oracle可以回答"项目$u$和$v$是否属于同一个簇?"这样的查询。但是,对每个成对查询的回答都会以概率$\varepsilon$出错,其中$\varepsilon \in (0, \frac{1}{2})$。Mazumdar和Saha在该模型下给出了两种算法:一种算法是查询最优的,但时间效率不高(即以准多项式时间运行);另一种算法是时间高效的(即以多项式时间运行),但查询次优。Larsen、Mitzenmacher和Tsourakakis[WWW 2020]针对$2$个簇的特殊情形给出了一种新的时间高效算法,当模型的偏差$\delta:=1-2\varepsilon$较大时,该算法是查询最优的。对于$k$个簇的一般情形和$\delta$的其他区域,能否得到一个查询最优且时间高效的算法仍是一个悬而未决的问题。在本文中,我们在上述问题上取得了进展,在信息论上可恢复的区域内,为所有常数$k$和任意$\delta$提供了一个查询复杂度接近最优(至多相差$O(\log^2 n)$因子)的时间高效算法。我们的算法建立在与随机块模型的联系之上。 摘要:Motivated by applications in crowdsourced entity resolution in database, signed edge prediction in social networks and correlation clustering, Mazumdar and Saha [NIPS 2017] proposed an elegant theoretical model for studying clustering with a faulty oracle. In this model, given a set of $n$ items which belong to $k$ unknown groups (or clusters), our goal is to recover the clusters by asking pairwise queries to an oracle. This oracle can answer the query that ``do items $u$ and $v$ belong to the same cluster?''. However, the answer to each pairwise query errs with probability $\varepsilon$, for some $\varepsilon\in(0,\frac{1}{2})$. Mazumdar and Saha provided two algorithms under this model: one algorithm is query-optimal while time-inefficient (i.e., running in quasi-polynomial time), the other is time efficient (i.e., in polynomial time) while query-suboptimal. Larsen, Mitzenmacher and Tsourakakis [WWW 2020] then gave a new time-efficient algorithm for the special case of $2$ clusters, which is query-optimal if the bias $\delta:=1-2\varepsilon$ of the model is large. It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of $k$ clusters and other regimes of $\delta$. In this paper, we make progress on the above question and provide a time-efficient algorithm with nearly-optimal query complexity (up to a factor of $O(\log^2 n)$) for all constant $k$ and any $\delta$ in the regime when information-theoretic recovery is possible. Our algorithm is built on a connection to the stochastic block model.
超分辨率|去噪|去模糊|去雾(1篇)
【1】 One-to-many Approach for Improving Super-Resolution 标题:提高超分辨率的一对多方法
作者:Sieun Park,Eunho Lee 机构:Goldsmiths, University of London, London; Paul Math School, Chungcheongbuk-do 链接:https://arxiv.org/abs/2106.10437 摘要:超分辨率(SR)是一个一对多的任务,有多种可能的解决方案。然而,以往的研究并不关注这一特征。对于一对多管道,生成器应该能够生成重建的多个估计,并且不会因为生成相似且同样真实的图像而受到惩罚。为了实现这一点,我们建议在每个残差中残差密集块(RRDB)之后加入加权的逐像素噪声,以使生成器能够生成多样的图像。我们修改了严格的内容损失,使其在内容一致的前提下不惩罚重建图像中的随机变化。此外,我们观察到,在DIV2K和DIV8K数据集中,存在一些提供无益指导的失焦区域。我们使用[10]的方法过滤训练数据中的模糊区域。最后,我们修改判别器,使其在接收目标图像的同时接收低分辨率图像作为参考,从而向生成器提供更好的反馈。使用我们提出的方法,我们能够提高ESRGAN在x4感知SR中的性能,并在x16感知极限SR中获得最先进的LPIPS分数。 摘要:Super-resolution (SR) is a one-to-many task with multiple possible solutions. However, previous works were not concerned about this characteristic. For a one-to-many pipeline, the generator should be able to generate multiple estimates of the reconstruction, and not be penalized for generating similar and equally realistic images. To achieve this, we propose adding weighted pixel-wise noise after every Residual-in-Residual Dense Block (RRDB) to enable the generator to generate various images. We modify the strict content loss to not penalize the stochastic variation in reconstructed images as long as it has consistent content. Additionally, we observe that there are out-of-focus regions in the DIV2K, DIV8K datasets that provide unhelpful guidelines. We filter blurry regions in the training data using the method of [10]. Finally, we modify the discriminator to receive the low-resolution image as a reference image along with the target image to provide better feedback to the generator. Using our proposed methods, we were able to improve the performance of ESRGAN in x4 perceptual SR and achieve the state-of-the-art LPIPS score in x16 perceptual extreme SR.
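下面用 PyTorch 给出"在每个RRDB输出后注入按通道加权的逐像素高斯噪声"这一思路的最小示意;RRDB本体假设由现成的ESRGAN实现提供(此处用恒等映射代替),噪声权重初始化为零并在训练中学习,细节与论文实现可能不同。

```python
import torch
import torch.nn as nn

class NoisyRRDB(nn.Module):
    """包装任意残差块:在其输出上叠加按通道加权的逐像素高斯噪声(示意)。"""
    def __init__(self, block, channels):
        super().__init__()
        self.block = block
        self.noise_weight = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        h = self.block(x)
        # 注入噪声,使同一低分辨率输入可以对应多个同样真实的重建
        return h + self.noise_weight * torch.randn_like(h)

# 用法示意:真实场景中 block 应为 ESRGAN 的 RRDB 模块
blk = NoisyRRDB(nn.Identity(), channels=64)
x = torch.randn(1, 64, 32, 32)
y1, y2 = blk(x), blk(x)  # 噪声权重学到非零值后,两次前向将给出不同输出
```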
自动驾驶|车辆|车道检测等(4篇)
【1】 Attention-based Neural Network for Driving Environment Complexity Perception 标题:基于注意力的神经网络在驾驶环境复杂性感知中的应用
作者:Ce Zhang,Azim Eskandarian,Xuelai Du 机构:Virginia Tech Autonomous Systems and Intelligent Machines (ASIM) Lab 备注:Accepted by 2021 IEEE Intelligent Transportation Systems Conference 链接:https://arxiv.org/abs/2106.11277 摘要:环境感知对自动驾驶汽车(AV)的安全至关重要。现有的AV感知算法大多没有研究周围环境的复杂度,没有考虑环境复杂度参数。提出了一种新的基于注意的神经网络模型来预测周围驾驶环境的复杂程度。该模型以自然驾驶视频和相应的车辆动力学参数作为输入。它由一个Yolo-v3目标检测算法、一个热图生成算法、基于CNN的特征提取器和基于注意力的特征提取器组成,用于视频和时间序列车辆动力学数据输入以提取特征。该算法的输出是一个环境复杂度参数。利用Berkeley DeepDrive数据集(BDD数据集)和主观标注的环境复杂度水平对算法进行模型训练和验证。提出的基于注意的网络对周围环境的复杂度进行分类,平均分类准确率达到91.22%。结果表明,该算法能够准确地预测环境复杂度水平,并可用于未来AVs的环境感知研究。 摘要:Environment perception is crucial for autonomous vehicle (AV) safety. Most existing AV perception algorithms have not studied the surrounding environment complexity and failed to include the environment complexity parameter. This paper proposes a novel attention-based neural network model to predict the complexity level of the surrounding driving environment. The proposed model takes naturalistic driving videos and corresponding vehicle dynamics parameters as input. It consists of a Yolo-v3 object detection algorithm, a heat map generation algorithm, CNN-based feature extractors, and attention-based feature extractors for both video and time-series vehicle dynamics data inputs to extract features. The output from the proposed algorithm is a surrounding environment complexity parameter. The Berkeley DeepDrive dataset (BDD Dataset) and subjectively labeled surrounding environment complexity levels are used for model training and validation to evaluate the algorithm. The proposed attention-based network achieves 91.22% average classification accuracy to classify the surrounding environment complexity. It proves that the environment complexity level can be accurately predicted and applied for future AVs' environment perception studies.
【2】 Vehicle Trajectory Prediction in City-scale Road Networks using a Direction-based Sequence-to-Sequence Model with Spatiotemporal Attention Mechanisms 标题:具有时空注意机制的基于方向的序列到序列模型在城市路网车辆轨迹预测中的应用
作者:Yuebing Liang,Zhan Zhao 机构:Department of Urban Planning and Design, University of Hong Kong 链接:https://arxiv.org/abs/2106.11175 摘要:城市尺度下的车辆轨迹预测对于车辆导航、交通管理、位置推荐等各种基于位置的应用具有重要意义。现有方法通常将轨迹表示为一系列网格单元、路段或意图集。这些方法都不理想,因为基于单元的表示方法忽略了道路网的结构,另外两种方法在分析城市规模的道路网时效率较低。另外,大多数模型都侧重于预测下一个位置,难以推广到较长的序列。为了解决这些问题,我们提出了一种新的序列到序列模型D-LSTM(Direction-based Long Short-Term Memory),它将每条轨迹表示为一系列交叉点和相关的运动方向,然后将它们输入到LSTM编解码网络中,用于未来的轨迹生成。此外,我们还引入了一种空间注意机制来捕捉道路网络中的动态空间依赖,以及一种带有滑动上下文窗口的时间注意机制来捕捉轨迹数据中的短期和长期时间依赖。基于两个真实的大规模出租车轨迹数据集的大量实验表明,D-LSTM算法的性能优于现有的最新车辆轨迹预测方法,验证了所提出的轨迹表示方法和时空注意机制的有效性。 摘要:Trajectory prediction of vehicles at the city scale is of great importance to various location-based applications such as vehicle navigation, traffic management, and location-based recommendations. Existing methods typically represent a trajectory as a sequence of grid cells, road segments or intention sets. None of them is ideal, as the cell-based representation ignores the road network structures and the other two are less efficient in analyzing city-scale road networks. In addition, most models focus on predicting the immediate next position, and are difficult to generalize for longer sequences. To address these problems, we propose a novel sequence-to-sequence model named D-LSTM (Direction-based Long Short-Term Memory), which represents each trajectory as a sequence of intersections and associated movement directions, and then feeds them into a LSTM encoder-decoder network for future trajectory generation. Furthermore, we introduce a spatial attention mechanism to capture dynamic spatial dependencies in road networks, and a temporal attention mechanism with a sliding context window to capture both short- and long-term temporal dependencies in trajectory data. Extensive experiments based on two real-world large-scale taxi trajectory datasets show that D-LSTM outperforms the existing state-of-the-art methods for vehicle trajectory prediction, validating the effectiveness of the proposed trajectory representation method and spatiotemporal attention mechanisms.
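下面给出"交叉口序列+运动方向序列"的LSTM编码器-解码器最小骨架(PyTorch),用于说明D-LSTM的轨迹表示方式;n_nodes、n_dirs等超参数均为假设,且省略了论文中的空间与时间注意机制。

```python
import torch
import torch.nn as nn

class Seq2SeqTrajectory(nn.Module):
    """将轨迹表示为(交叉口, 运动方向)序列的编码器-解码器示意。"""
    def __init__(self, n_nodes, n_dirs, emb=64, hid=128):
        super().__init__()
        self.node_emb = nn.Embedding(n_nodes, emb)
        self.dir_emb = nn.Embedding(n_dirs, emb)
        self.encoder = nn.LSTM(2 * emb, hid, batch_first=True)
        self.decoder = nn.LSTM(2 * emb, hid, batch_first=True)
        self.out = nn.Linear(hid, n_dirs)   # 预测每一步的运动方向分布

    def forward(self, nodes, dirs, dec_nodes, dec_dirs):
        x = torch.cat([self.node_emb(nodes), self.dir_emb(dirs)], dim=-1)
        _, state = self.encoder(x)          # 编码历史轨迹
        y = torch.cat([self.node_emb(dec_nodes), self.dir_emb(dec_dirs)], dim=-1)
        h, _ = self.decoder(y, state)       # 训练时采用教师强制
        return self.out(h)

model = Seq2SeqTrajectory(n_nodes=1000, n_dirs=8)
logits = model(torch.randint(0, 1000, (4, 10)), torch.randint(0, 8, (4, 10)),
               torch.randint(0, 1000, (4, 5)), torch.randint(0, 8, (4, 5)))
```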
【3】 Deep Spatio-Temporal Forecasting of Electrical Vehicle Charging Demand 标题:电动汽车充电需求的深空时预测
作者:Frederik Boe Hüttel,Inon Peled,Filipe Rodrigues,Francisco C. Pereira 机构:Department of Management, Technical University of Denmark 链接:https://arxiv.org/abs/2106.10940 摘要:电动汽车可以提供一个低碳排放的解决方案,以扭转不断上升的排放趋势。然而,这要求用于满足需求的能源是绿色的。为了满足这一要求,准确预测充电需求至关重要。短期和长期充电需求预测将有助于更好地优化电网和未来基础设施扩建。在本文中,我们建议使用公开的数据来预测电动汽车充电需求。为了模拟充电站之间复杂的时空相关性,我们认为时态图卷积模型最适合捕捉这些相关性。与其他预测方法相比,本文提出的时态图卷积网络为短期和长期预测提供了最准确的预测。 摘要:Electric vehicles can offer a low carbon emission solution to reverse rising emission trends. However, this requires that the energy used to meet the demand is green. To meet this requirement, accurate forecasting of the charging demand is vital. Short and long-term charging demand forecasting will allow for better optimisation of the power grid and future infrastructure expansions. In this paper, we propose to use publicly available data to forecast the electric vehicle charging demand. To model the complex spatial-temporal correlations between charging stations, we argue that Temporal Graph Convolution Models are the most suitable to capture the correlations. The proposed Temporal Graph Convolutional Networks provide the most accurate forecasts for short and long-term forecasting compared with other forecasting methods.
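作为"时态图卷积"思想的最小示意,下面用一阶图卷积在充电站图上提取空间特征,再用GRU沿时间维更新;A_hat为归一化邻接矩阵,隐层维度等均为假设,并非论文的确切模型结构。

```python
import torch
import torch.nn as nn

class TGCNCell(nn.Module):
    """图卷积(空间) + GRU(时间)的概念骨架。"""
    def __init__(self, in_dim, hid):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hid)
        self.gru = nn.GRUCell(hid, hid)
        self.hid = hid

    def forward(self, x_seq, A_hat):
        # x_seq: (T, n_nodes, in_dim);A_hat: (n_nodes, n_nodes) 归一化邻接矩阵
        h = x_seq.new_zeros(x_seq.shape[1], self.hid)
        for x_t in x_seq:
            s = torch.relu(self.gcn(A_hat @ x_t))   # 聚合邻居充电站的需求特征
            h = self.gru(s, h)                      # 沿时间维更新隐状态
        return h   # 各站点的隐状态,可再接线性层得到需求预测

cell = TGCNCell(in_dim=1, hid=32)
h = cell(torch.randn(24, 50, 1), torch.eye(50))     # 24 个时间步、50 个站点
```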
【4】 Learning Space Partitions for Path Planning 标题:用于路径规划的学习空间划分
作者:Kevin Yang,Tianjun Zhang,Chris Cummins,Brandon Cui,Benoit Steiner,Linnan Wang,Joseph E. Gonzalez,Dan Klein,Yuandong Tian 备注:Under submission to NeurIPS 2021 链接:https://arxiv.org/abs/2106.10544 摘要:路径规划是一个有效地发现高报酬轨迹的问题,通常需要优化高维多模态报酬函数。像CEM和CMA-ES这样的流行方法贪婪地关注搜索空间中有前途的区域,并且可能陷入局部极大值。DOO和VOOT平衡探索和开发,但使用独立于奖励函数的空间划分策略进行优化。最近,LaMCTS在经验上学会了以奖励敏感的方式划分搜索空间进行黑盒优化。在本文中,我们发展了一种新的形式遗憾分析,以确定这种自适应区域划分方案何时以及为什么有效。我们还提出了一种新的路径规划方法PlaLaM,它改进了每个子区域内的函数值估计,并使用了搜索空间的潜在表示。根据经验,PlaLaM在二维导航任务中的性能优于现有的路径规划方法,特别是在存在难以逃逸的局部最优解的情况下,并且当插入带有规划组件(如PETS)的基于模型的RL时显示出优势。这些增益可迁移到高度多模态的现实任务中:在编译器阶段排序方面,我们最多比强基线高出245%;在分子设计方面,在取值范围为0-1的属性上最多高出0.4。 摘要:Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies independent of the reward function to be optimized. Recently, LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis for when and why such an adaptive region partitioning scheme works. We also propose a new path planning method PlaLaM which improves the function value estimation within each sub-region, and uses a latent representation of the search space. Empirically, PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima, and shows benefits when plugged into model-based RL with planning components such as PETS. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties on a 0-1 scale.
点云|SLAM|雷达|激光|深度RGBD相关(2篇)
【1】 DiGS: Divergence guided shape implicit neural representation for unoriented point clouds 标题:DiGS:无方向点云的散度引导形状隐式神经表示
作者:Yizhak Ben-Shabat,Chamin Hewa Koneputugodage,Stephen Gould 机构:The Australian National University, Technion Israel Institute of Technology 链接:https://arxiv.org/abs/2106.10811 摘要:最近,神经形状表示在形状分析和重建任务中被证明是有效的。现有的神经网络方法需要点坐标和相应的法向量来学习形状的隐式水平集。通常不提供法向量作为原始数据,因此需要近似和重定向作为预处理阶段,这两个阶段都会引入噪声。本文提出了一种不需要法向量作为输入的散度引导的形状表示学习方法。我们证明,在距离函数的散度上加入软约束有利于平滑解,该解可靠地确定梯度方向以匹配每个点的未知法线,在某些情况下甚至比直接使用地面真值法向量的方法更好。此外,我们提出了一种新的正弦形状表示网络的几何初始化方法,进一步提高了收敛到期望解的能力。我们评估了我们的方法在表面重建任务中的有效性,并与其他无定向方法相比显示了最先进的性能,与定向方法相比,显示了相当的性能。 摘要:Neural shape representations have recently shown to be effective in shape analysis and reconstruction tasks. Existing neural network methods require point coordinates and corresponding normal vectors to learn the implicit level sets of the shape. Normal vectors are often not provided as raw data, therefore, approximation and reorientation are required as pre-processing stages, both of which can introduce noise. In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input. We show that incorporating a soft constraint on the divergence of the distance function favours smooth solutions that reliably orients gradients to match the unknown normal at each point, in some cases even better than approaches that use ground truth normal vectors directly. Additionally, we introduce a novel geometric initialization method for sinusoidal shape representation networks that further improves convergence to the desired solution. We evaluate the effectiveness of our approach on the task of surface reconstruction and show state-of-the-art performance compared to other unoriented methods and on-par performance compared to oriented methods.
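文中"对距离函数散度的软约束"可用自动微分直接实现:散度 div(∇f) 即 f 的拉普拉斯。下面是一个最小示意,net 假设为将 (N,3) 点映射到 (N,1) 标量场的任意网络,损失权重由使用者自定。

```python
import torch
import torch.nn as nn

def digs_losses(net, pts):
    """返回 eikonal 项(|∇f|=1)与散度项(|div ∇f| 即 |Laplacian f|)的示意实现。"""
    pts = pts.clone().requires_grad_(True)
    f = net(pts)
    grad = torch.autograd.grad(f.sum(), pts, create_graph=True)[0]   # ∇f, (N, 3)
    eikonal = ((grad.norm(dim=-1) - 1.0) ** 2).mean()
    lap = 0.0
    for i in range(pts.shape[1]):   # 逐维求二阶导并相加,得到拉普拉斯
        lap = lap + torch.autograd.grad(grad[:, i].sum(), pts,
                                        create_graph=True)[0][:, i]
    divergence = lap.abs().mean()   # 散度软约束,偏好平滑、可定向的解
    return eikonal, divergence

net = nn.Sequential(nn.Linear(3, 64), nn.Softplus(), nn.Linear(64, 1))
e, d = digs_losses(net, torch.rand(128, 3))
```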
【2】 Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis 标题:仿射不变综合秩加权深度:定义、性质和有限样本分析
作者:Guillaume Staerman,Pavlo Mozharovskyi,Stéphan Clémençon 机构:LTCI, Télécom Paris, Institut Polytechnique de Paris 链接:https://arxiv.org/abs/2106.11068 摘要:因为统计深度的概念决定了观测值在$\mathbb{R}^d$($d\geq 2$)中的由中心向外的排序,所以它允许定义多变量数据的分位数和秩,并将它们用于各种统计任务(例如推断、假设检验)。尽管自文献[Tukey75]的开创性贡献以来,文献中提出了许多深度函数,但并非所有深度函数都具有模拟单变量概率分布分位数函数概念所需的性质。在本文中,我们提出了综合秩加权统计深度(简称IRW深度,最初在文献[IRW]中引入)的一个扩展,对其加以修改以满足仿射不变性,从而满足文献[ZuoS00a]所阐述的命名体系中列出的全部四条关键公理。我们提出的变体,称为仿射不变IRW深度(简称AI-IRW),涉及所研究的(假定平方可积的)$d$维随机向量$X$的协方差/精度矩阵,以便在将深度值赋给任意点$x\in\mathbb{R}^d$时考虑$X$变化最大的方向。我们从非渐近的角度研究了AI-IRW深度采样版本的精度,即证明了AI-IRW深度统计对应量的一个集中性结果。除理论分析外,还考虑了在异常检测中的应用并展示了数值结果,为我们提出的深度函数的相关性提供了有力的经验证据。 摘要:Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth permits to define quantiles and ranks for multivariate data and use them for various statistical tasks (\textit{e.g.} inference, hypothesis testing). Whereas many depth functions have been proposed \textit{ad-hoc} in the literature since the seminal contribution of \cite{Tukey75}, not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the \textit{integrated rank-weighted} statistical depth (IRW depth in abbreviated form) originally introduced in \cite{IRW}, modified in order to satisfy the property of \textit{affine-invariance}, fulfilling thus all the four key axioms listed in the nomenclature elaborated by \cite{ZuoS00a}. The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x\in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.
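按定义,AI-IRW深度可用蒙特卡罗近似:先用协方差矩阵将数据白化以获得仿射不变性,再在随机单位方向上计算投影的单变量深度并取平均。下面是一个numpy示意(方向数等为假设,仅展示计算流程,非论文代码)。

```python
import numpy as np

def ai_irw_depth(X, x_query, n_dirs=1000, seed=0):
    """AI-IRW 深度的蒙特卡罗近似示意:白化 + 随机方向投影 + 单变量深度平均。"""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    W = V @ np.diag(w ** -0.5) @ V.T          # 协方差的逆平方根,用于白化
    Xw, qw = (X - mu) @ W, (x_query - mu) @ W
    U = rng.normal(size=(n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # 随机单位方向
    F = (Xw @ U.T <= qw @ U.T).mean(axis=0)         # 各方向上的经验 CDF 值
    return np.minimum(F, 1.0 - F).mean()            # 单变量深度沿方向取平均

X = np.random.default_rng(1).normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.5])
print(ai_irw_depth(X, X.mean(axis=0)))   # 中心点的深度应接近 0.5
```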
联邦学习|隐私保护|加密(5篇)
【1】 Federated Learning with Positive and Unlabeled Data 标题:具有正样本和未标注数据的联邦学习
作者:Xinyang Lin,Hanting Chen,Yixing Xu,Chao Xu,Xiaolin Gui,Yiping Deng,Yunhe Wang 机构:School of Electronics and Information Engineering, Xi'an Jiaotong University; Key Lab of Machine Perception (MOE), Dept. of Machine Intelligence, Peking University; Noah's Ark Lab, Huawei Technologies; Central Software Institution, Huawei Technologies 链接:https://arxiv.org/abs/2106.10904 摘要:我们研究了在联邦环境下从正数据和未标记(PU)数据中学习的问题,由于资源和时间的限制,每个客户机只标记其数据集的一小部分。与传统PU学习中负类由单个类组成的设置不同,联邦设置中客户端无法识别的负样本可能来自客户端未知的多个类。因此,现有的PU学习方法很难应用于这种情况。为了解决这个问题,我们提出了一个新的框架,即使用正数据和未标记数据的联邦学习(FedPU),通过利用其他客户机中的标记数据来最小化多个负类的预期风险。我们从理论上证明了所提出的FedPU可以达到不劣于完全监督模型$C\sqrt{C}$倍(其中$C$表示类的数目)的泛化界。实验结果表明,FedPU比传统的只能使用正数据的学习方法具有更好的学习效果。 摘要:We study the problem of learning from positive and unlabeled (PU) data in the federated setting, where each client only labels a little part of their dataset due to the limitation of resources and time. Different from the settings in traditional PU learning where the negative class consists of a single class, the negative samples which cannot be identified by a client in the federated setting may come from multiple classes which are unknown to the client. Therefore, existing PU learning methods can be hardly applied in this situation. To address this problem, we propose a novel framework, namely Federated learning with Positive and Unlabeled data (FedPU), to minimize the expected risk of multiple negative classes by leveraging the labeled data in other clients. We theoretically prove that the proposed FedPU can achieve a generalization bound which is no worse than $C\sqrt{C}$ times (where $C$ denotes the number of classes) of the fully-supervised model. Empirical experiments show that the FedPU can achieve much better performance than conventional learning methods which can only use positive data.
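FedPU建立在PU风险估计之上。作为背景,下面给出单机、单一负类情形下经典的非负PU风险(Kiryo等人,2017)的PyTorch示意,以说明"用正类先验校正未标注样本上的风险"这一核心思想;它并非论文提出的联邦多负类目标。

```python
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_p, scores_u, prior):
    """非负 PU 风险示意:scores_p / scores_u 为正样本与未标注样本上的模型输出,
    prior 为正类先验 pi_p;softplus(-z) 即对正类的逻辑损失。"""
    r_p_pos = F.softplus(-scores_p).mean()        # 正样本被判为正的风险
    r_p_neg = F.softplus(scores_p).mean()         # 正样本被判为负的风险
    r_u_neg = F.softplus(scores_u).mean()         # 未标注样本被判为负的风险
    neg_risk = r_u_neg - prior * r_p_neg          # 校正出的负类风险估计
    return prior * r_p_pos + torch.clamp(neg_risk, min=0.0)   # 截断保证非负

risk = nn_pu_risk(torch.randn(32), torch.randn(128), prior=0.3)
```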
【2】 FedCM: Federated Learning with Client-level Momentum 标题:FedCM:具有客户端级动量的联邦学习
作者:Jing Xu,Sen Wang,Liwei Wang,Andrew Chi-Chih Yao 机构: Theory Lab, Labs, Huawei Technologies, Co.Ltd., Hong Kong, Key Laboratory of Machine Perception, MOE, School of EECS, Peking University, Institute for Interdisciplinary Information Sciences, Tsinghua University 链接:https://arxiv.org/abs/2106.10874 摘要:联邦学习是一种分布式机器学习方法,它可以在不共享数据的情况下进行模型训练。本文提出了一种新的联邦学习算法,即基于客户端动量的联邦平均算法(FedCM),以解决实际联邦学习应用中的部分参与和客户端异构性问题。FedCM在前几轮通信中聚集全局梯度信息,用类动量项修正客户梯度下降,有效地修正了偏差,提高了局部SGD的稳定性。我们提供理论分析来强调FedCM的好处。我们还进行了广泛的实证研究,并证明FedCM在各种任务中取得了优异的性能,并且对不同水平的客户数量、参与率和客户异质性具有鲁棒性。 摘要:Federated Learning is a distributed machine learning approach which enables model training without data sharing. In this paper, we propose a new federated learning algorithm, Federated Averaging with Client-level Momentum (FedCM), to tackle problems of partial participation and client heterogeneity in real-world federated learning applications. FedCM aggregates global gradient information in previous communication rounds and modifies client gradient descent with a momentum-like term, which can effectively correct the bias and improve the stability of local SGD. We provide theoretical analysis to highlight the benefits of FedCM. We also perform extensive empirical studies and demonstrate that FedCM achieves superior performance in various tasks and is robust to different levels of client numbers, participation rate and client heterogeneity.
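FedCM的关键是在客户端本地更新中混入服务器维护的全局方向(动量项)。下面是一个numpy层面的单轮流程示意,学习率、alpha与伪梯度的聚合方式均为假设,细节与论文算法可能不同。

```python
import numpy as np

def fedcm_round(w_global, momentum, client_grad_fns, lr=0.05, alpha=0.1, local_steps=5):
    """一轮 FedCM 示意:每个客户端用 d = alpha*g_i + (1-alpha)*momentum 做本地 SGD,
    服务器聚合各客户端的平均更新方向作为新的全局动量。"""
    deltas = []
    for grad_fn in client_grad_fns:               # 本轮被选中的客户端
        w = w_global.copy()
        for _ in range(local_steps):
            d = alpha * grad_fn(w) + (1.0 - alpha) * momentum
            w -= lr * d
        deltas.append((w_global - w) / (local_steps * lr))   # 客户端伪梯度
    momentum = np.mean(deltas, axis=0)
    return w_global - lr * local_steps * momentum, momentum

# 两个客户端的局部目标分别以 +1 和 -1 为最优,全局最优在 0 附近
fns = [lambda w, t=1.0: 2 * (w - t), lambda w, t=-1.0: 2 * (w - t)]
w, m = np.array([2.0]), np.zeros(1)
for _ in range(100):
    w, m = fedcm_round(w, m, fns)
print(w)   # 接近 0
```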
【3】 Is Shapley Value fair? Improving Client Selection for Mavericks in Federated Learning 标题:Shapley值公平吗?改进联邦学习中Maverick客户的选择
作者:Jiyue Huang,Chi Hong,Lydia Y. Chen,Stefanie Roos 机构:Delft University of Technology 链接:https://arxiv.org/abs/2106.10734 摘要:Shapley值通常用于衡量和激励客户参与联邦学习。在本文中,我们从理论上和模拟上证明了Shapley值低估了一种常见的客户类型:Maverick。Maverick是在数据分布和数据量上都与众不同的客户端,可能是某些类型数据的唯一所有者。在正确的时刻选择正确的客户端对于联邦学习减少收敛时间和提高准确性非常重要。我们提出了FedEMD,一种基于局部和全局数据分布之间的Wasserstein距离的自适应客户选择策略。由于FedEMD调整了选择概率,使得当模型受益于稀有类的改进时优先选择Maverick客户,因此它始终确保在存在不同类型Maverick客户的情况下快速收敛。与现有的策略(包括基于Shapley值的策略)相比,FedEMD将FedAvg聚合的神经网络分类器的收敛性提高了至少26.9%。 摘要:Shapley Value is commonly adopted to measure and incentivize client participation in federated learning. In this paper, we show -- theoretically and through simulations -- that Shapley Value underestimates the contribution of a common type of client: the Maverick. Mavericks are clients that differ both in data distribution and data quantity and can be the sole owners of certain types of data. Selecting the right clients at the right moment is important for federated learning to reduce convergence times and improve accuracy. We propose FedEMD, an adaptive client selection strategy based on the Wasserstein distance between the local and global data distributions. As FedEMD adapts the selection probability such that Mavericks are preferably selected when the model benefits from improvement on rare classes, it consistently ensures the fast convergence in the presence of different types of Mavericks. Compared to existing strategies, including Shapley Value-based ones, FedEMD improves the convergence of neural network classifiers by at least 26.9% for FedAvg aggregation compared with the state of the art.
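FedEMD按客户端数据分布与全局分布之间的Wasserstein距离调节选择概率。下面用标签直方图上的一维EMD给出一个简化的numpy示意;温度参数与加权形式均为假设,论文中的偏好还会随训练进程自适应变化,此处省略。

```python
import numpy as np

def label_emd(p, q):
    """有序离散分布间的一维 EMD(Wasserstein-1):累计分布差的绝对值之和。"""
    return np.abs(np.cumsum(p - q)).sum()

def selection_probs(client_dists, global_dist, temp=0.5):
    """示意:与全局标签分布差异越大的客户端(如持有稀有类的 Maverick)
    被选中的概率越高。"""
    d = np.array([label_emd(p, global_dist) for p in client_dists])
    w = np.exp(d / temp)
    return w / w.sum()

g = np.array([0.25, 0.25, 0.25, 0.25])
clients = [np.array([0.25, 0.25, 0.25, 0.25]),   # 普通客户端
           np.array([0.05, 0.05, 0.05, 0.85])]   # Maverick:几乎只持有第 4 类
print(selection_probs(clients, g))               # 第二个客户端的概率更高
```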
【4】 FedXGBoost: Privacy-Preserving XGBoost for Federated Learning 标题:FedXGBoost:面向联邦学习的隐私保护XGBoost
作者:Nhan Khanh Le,Yang Liu,Quang Minh Nguyen,Qingchen Liu,Fangzhou Liu,Quanwei Cai,Sandra Hirche 机构:Chair of Information-Oriented Control, Technical University of Munich, Security Research, Bytedance Inc., Chair of Automatic Control Engineering, Technical University of Munich, Department of EECS, Massachusetts Institute of Technology 链接:https://arxiv.org/abs/2106.10662 摘要:联邦学习是一种支持跨多方协作训练、同时确保数据隐私的分布式机器学习框架。由于传统隐私保护方法开销高昂,XGBoost(最先进的树提升框架)在联邦学习中的实际应用仍然有限。为了解决这个问题,我们提出了两种具有隐私保证的联邦XGBoost变体:FedXGBoost-SMM和FedXGBoost-LDP。我们的第一个协议FedXGBoost-SMM部署了增强的安全矩阵乘法方法,以无损的准确性和比基于加密的技术更低的开销来保护隐私。独立开发的第二个协议FedXGBoost-LDP是在局部差分隐私的噪声扰动下启发式设计的,并在真实世界和合成数据集上进行了实证评估。 摘要:Federated learning is the distributed machine learning framework that enables collaborative training across multiple parties while ensuring data privacy. Practical adaptation of XGBoost, the state-of-the-art tree boosting framework, to federated learning remains limited due to high cost incurred by conventional privacy-preserving methods. To address the problem, we propose two variants of federated XGBoost with privacy guarantee: FedXGBoost-SMM and FedXGBoost-LDP. Our first protocol FedXGBoost-SMM deploys enhanced secure matrix multiplication method to preserve privacy with lossless accuracy and lower overhead than encryption-based techniques. Developed independently, the second protocol FedXGBoost-LDP is heuristically designed with noise perturbation for local differential privacy, and empirically evaluated on real-world and synthetic datasets.
【5】 STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning 标题:STEM:一种为联邦学习实现近似最优样本与通信复杂度的随机双边动量算法
作者:Prashant Khanduri,Pranay Sharma,Haibo Yang,Mingyi Hong,Jia Liu,Ketan Rajawat,Pramod K. Varshney 机构:Department of Electrical and Computer Engineering, The Ohio State University, OH, USA; Department of Electrical and Computer Engineering, University of Minnesota, MN, USA; Department of Electrical Engineering and Computer Science, Syracuse University, NY, USA 链接:https://arxiv.org/abs/2106.10435 摘要:联邦学习(FL)是指多个工作节点(WNs)使用本地数据建立联合模型的范式。对于一个一般的非凸FL问题,如何选择WNs和服务器的更新方向、小批量大小和本地更新频率,使WNs使用最少的样本数和通信轮数来达到期望的解,目前还不清楚。这项工作解决了上述问题,并考虑了一类随机算法,其中WNs在通信前执行若干次局部更新。我们证明,当WN和服务器的方向都基于随机动量估计器来选择时,该算法需要$\tilde{\mathcal{O}}(\epsilon^{-3/2})$个样本和$\tilde{\mathcal{O}}(\epsilon^{-1})$轮通信来计算$\epsilon$-平稳解。据我们所知,这是第一个同时达到近似最优(near-optimal)样本复杂度和通信复杂度的FL算法。进一步,我们证明了在局部更新频率和局部小批量大小之间存在一条折衷曲线,在该曲线上可以保持上述样本和通信复杂度。最后,我们证明了对于经典的FedAvg(又称局部SGD,它是STEM的无动量特例),存在一条类似的折衷曲线,尽管其样本和通信复杂度更差。我们对这种权衡的见解为选择FL算法的重要设计元素(更新频率、更新方向和小批量大小)以实现最佳性能提供了指导。 摘要:Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear, how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(\epsilon^{-3/2})$ samples and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication rounds to compute an $\epsilon$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such \textit{near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of the STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provides guidelines for choosing the four important design elements for FL algorithms, the update frequency, directions, and minibatch sizes to achieve the best performance.
推理|分析|理解|解释(7篇)
【1】 Conditional Neural Relational Inference for Interacting Systems 标题:交互系统的条件神经关系推理
作者:Joao A. Candido Ramos,Lionel Blondé,Stéphane Armand,Alexandros Kalousis 机构:University of Applied Sciences and Arts Western Switzerland, University of Geneva 备注:17 pages, 5 figures 链接:https://arxiv.org/abs/2106.11083 摘要:在这项工作中,我们想学习如何模拟相似但又不同的交互对象组的动力学。这些群体遵循一些共同的物理规律,其特殊性通过向量描述来刻画。我们开发了一个模型,允许我们在给定向量描述的条件下,对任何这样的组进行条件生成。与以往只能完成轨迹补全、并且需要在生成时提供一部分轨迹动力学作为输入的动力系统学习工作不同,我们仅使用条件向量进行生成,不需要访问生成时的轨迹。我们在人体步态建模(尤其是病理步态)的场景下评估了我们的模型。 摘要:In this work, we want to learn to model the dynamics of similar yet distinct groups of interacting objects. These groups follow some common physical laws that exhibit specificities that are captured through some vectorial description. We develop a model that allows us to do conditional generation from any such group given its vectorial description. Unlike previous work on learning dynamical systems that can only do trajectory completion and require a part of the trajectory dynamics to be provided as input in generation time, we do generation using only the conditioning vector with no access to generation time's trajectories. We evaluate our model in the setting of modeling human gait and, in particular pathological human gait.
【2】 Understanding the Dynamics between Vaping and Cannabis Legalization Using Twitter Opinions 标题:用推特上的观点理解大麻合法化和Vaping之间的动态
作者:Shishir Adhikari,Akshay Uppal,Robin Mermelstein,Tanya Berger-Wolf,Elena Zheleva 机构:Computer Science, University of Illinois at Chicago, Psychology; Institute for Health Research and Policy, University of Illinois at Chicago, Computer Science and Engineering; Electrical and Computer Engineering; Evolution, Ecology, and Organismal Biology; 备注:Published at ICWSM 2021 链接:https://arxiv.org/abs/2106.11029 摘要:大麻合法化受到美国许多州的欢迎,但其在从使用烟草电子烟升级为吸食大麻方面的作用尚不清楚。与此同时,吸食大麻与新的肺部疾病和青少年使用率上升有关。为了了解大麻合法化对升级的影响,我们设计了一项观察性研究,以评估娱乐性大麻合法化对电子烟使用者亲大麻态度发展的因果影响。我们收集并分析了Twitter数据,其中包含了对大麻和JUUL(一个非常流行的电子香烟品牌)的看法。我们使用弱监督学习对个人微博进行过滤,并分类进行姿态检测。我们发现,休闲大麻合法化政策对已经支持电子烟的使用者的亲大麻态度的发展产生了影响。 摘要:Cannabis legalization has been welcomed by many U.S. states but its role in escalation from tobacco e-cigarette use to cannabis vaping is unclear. Meanwhile, cannabis vaping has been associated with new lung diseases and rising adolescent use. To understand the impact of cannabis legalization on escalation, we design an observational study to estimate the causal effect of recreational cannabis legalization on the development of pro-cannabis attitude for e-cigarette users. We collect and analyze Twitter data which contains opinions about cannabis and JUUL, a very popular e-cigarette brand. We use weakly supervised learning for personal tweet filtering and classification for stance detection. We discover that recreational cannabis legalization policy has an effect on increased development of pro-cannabis attitudes for users already in favor of e-cigarettes.
【3】 Bayesian inference of ODEs with Gaussian processes 标题:具有高斯过程的常微分方程的贝叶斯推断
作者:Pashupati Hegde,Çağatay Yıldız,Harri Lähdesmäki,Samuel Kaski,Markus Heinonen 机构:Department of Computer Science, Aalto University, Finland 链接:https://arxiv.org/abs/2106.10905 摘要:机器学习的最新进展提出了直接从数据中估计未知连续时间系统动力学的黑盒方法。然而,早期的工作是基于近似常微分方程解或点估计。我们提出了一种新的贝叶斯非参数模型,利用高斯过程直接从数据中推断未知ODE系统的后验概率。我们推导了用解耦函数采样表示向量场后验概率的稀疏变分推理。我们还介绍了一种概率射击增强,使得能够从任意长的轨迹进行有效推断。该方法证明了计算向量场后验概率的优势,在多个ODE学习任务中,预测不确定性得分优于其他方法。 摘要:Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data. However, earlier works are based on approximative ODE solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.
【4】 TinyML: Analysis of Xtensa LX6 microprocessor for Neural Network Applications by ESP32 SoC 标题:TinyML:用ESP32 SoC分析用于神经网络的Xtensa LX6微处理器
作者:Md Ziaul Haque Zim 机构:Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh; Saint Petersburg Electrotechnical University "LETI", Saint Petersburg, Russia 链接:https://arxiv.org/abs/2106.10652 摘要:近几十年来,机器学习在许多计算应用中变得极其重要。超低功耗嵌入式设备的普及,如带有微型机器学习(tinyML)应用的ESP32或ESP32 Cam,将使人工智能驱动的嵌入式物联网设备大量增加。在过去几年中,微控制器设备(Espressif ESP32)变得足够强大,可以用于小型/微型机器学习(tinyML)任务。Arduino IDE、MicroPython和TensorFlow Lite(TF)等平台以及tinyML应用程序的易用性使其成为移动机器人、现代计算机科学和电气工程领域不可或缺的研究课题。本文的目的是通过运行一个神经网络应用程序来分析Xtensa双核32位LX6微处理器的速度。我们将不同数量的输入(9、36、144和576)送入具有一个或两个隐藏层、不同神经元数量的神经网络。之所以分析Xtensa LX6微处理器,是因为它内置于Espressif ESP32和ESP32 Cam中,而这两者是非常易于使用的即插即用物联网设备。本文分析了Xtensa LX6微处理器在前馈模式下的速度。 摘要:In recent decades, Machine Learning (ML) has become extremely important for many computing applications. The pervasiveness of ultra-low-power embedded devices such as ESP32 or ESP32 Cam with tiny Machine Learning (tinyML) applications will enable the mass proliferation of Artificial Intelligent powered Embedded IoT Devices. In the last few years, the microcontroller device (Espressif ESP32) became powerful enough to be used for small/tiny machine learning (tinyML) tasks. The ease of use of platforms like Arduino IDE, MicroPython and TensorFlow Lite (TF) with tinyML application make it an indispensable topic of research for mobile robotics, modern computer science and electrical engineering. The goal of this paper is to analyze the speed of the Xtensa dual core 32-bit LX6 microprocessor by running a neural network application. The different number of inputs (9, 36, 144 and 576) inputted through the different number of neurons in neural networks with one and two hidden layers. Xtensa LX6 microprocessor has been analyzed because it comes inside with Espressif ESP32 and ESP32 Cam which are very easy to use, plug and play IoT device. In this paper speed of the Xtensa LX6 microprocessor in feed-forward mode has been analyzed.
【5】 Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis 标题:数据管理和机器学习中基于分数的解释:反事实分析的答案集编程方法
作者:Leopoldo Bertossi 机构:Universidad Adolfo Ibáñez and Millennium Inst. for Foundational Research on Data (IMFD), Santiago, Chile 备注:Paper associated to forthcoming short course at Fall School. arXiv admin note: text overlap with arXiv:2007.12799 链接:https://arxiv.org/abs/2106.10562 摘要:我们描述了近年来针对数据库查询答案以及机器学习分类模型输出的基于分数的解释方法,重点是作者及其合作者所做的工作。特别强调基于答案集编程的声明式方法,以及利用反事实推理进行分数定义与计算。文中通过若干例子说明了这些方法的灵活性。 摘要:We describe some recent approaches to score-based explanations for query answers in databases and outcomes from classification models in machine learning. The focus is on work done by the author and collaborators. Special emphasis is placed on declarative approaches based on answer-set programming to the use of counterfactual reasoning for score specification and computation. Several examples that illustrate the flexibility of these methods are shown.
【6】 Nested Variational Inference 标题:嵌套变分推理
作者:Heiko Zimmermann,Hao Wu,Babak Esmaeili,Jan-Willem van de Meent 机构:Khoury College of Computer Sciences, Northeastern University 链接:https://arxiv.org/abs/2106.11302 摘要:我们开发了嵌套变分推理(NVI),这是一族通过最小化嵌套每一层上的正向或反向KL散度,来学习嵌套重要性采样器之提议分布的方法。NVI适用于许多常用的重要性采样策略,并提供了一种学习中间密度的机制,这些中间密度可以作为指导采样的启发式。我们的实验将NVI应用于:(a) 使用学习到的退火路径从多峰分布中采样;(b) 学习用于近似隐马尔可夫模型中未来观测似然的启发式;(c) 在层次化深度生成模型中执行摊销推理。我们观察到,优化嵌套目标可以在对数平均权重和有效样本量方面提高样本质量。 摘要:We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing a forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonly-used importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model and (c) to perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.
【7】 Outlier Detection and Spatial Analysis Algorithms 标题:离群点检测与空间分析算法
作者:Jacob John 机构:School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology, Vellore, India 备注:7 pages, 14 figures 链接:https://arxiv.org/abs/2106.10669 摘要:离群点检测是数据挖掘的一个重要领域。根据离群值的有效性及其重要性,它既可以用于在分析之前对数据进行预处理,也可以用于处理阶段之后(可视化之前)的后处理。离群点检测的应用延伸到信用卡欺诈、网络入侵、机器故障预测、潜在恐怖袭击等领域。离群值是那些特征显著不同的数据点。它们偏离数据集,在分析过程中造成不一致、噪声和异常,并导致对原始点的修改。然而,一个常见的误解是,必须立即从数据集中消除或替换离群值。这些点若单独分析可能是有用的,因为它们可能完全来自另一种机制,这使其对所研究的问题十分重要。本文综述了用于空间分析的不同离群点检测方法。空间数据或地理空间数据是那些表现出地理属性(如位置或面积)的数据,例如为某个特定区域收集的降水量、温度、风速等天气数据。 摘要:Outlier detection is a significant area in data mining. It can be either used to pre-process the data prior to an analysis or post the processing phase (before visualization) depending on the effectiveness of the outlier and its importance. Outlier detection extends to several fields such as detection of credit card fraud, network intrusions, machine failure prediction, potential terrorist attacks, and so on. Outliers are those data points with characteristics considerably different. They deviate from the data set, causing inconsistencies, noise and anomalies during analysis, and result in modification of the original points. However, a common misconception is that outliers have to be immediately eliminated or replaced from the data set. Such points could be considered useful if analyzed separately, as they could be obtained from a separate mechanism entirely, making it important to the research question. This study surveys the different methods of outlier detection for spatial analysis. Spatial data or geospatial data are those that exhibit geographic properties or attributes such as position or areas. An example would be weather data such as precipitation, temperature, wind velocity, and so on collected for a defined region.
检测相关(3篇)
【1】 AOMD: An Analogy-aware Approach to Offensive Meme Detection on Social Media 标题:AOMD:一种基于类比感知的社交媒体攻击性模因检测方法
作者:Lanyu Shang,Yang Zhang,Yuheng Zha,Yingxi Chen,Christina Youn,Dong Wang 机构:Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA 链接:https://arxiv.org/abs/2106.11229 摘要:本文主要研究网络社交媒体中攻击性类比模因的检测问题,即通过视觉内容和模因的文本/标题进行类比来传递攻击性信息。现有的攻击性模因检测方法往往忽略了模因的视觉和文本内容之间的隐含关系,不足以识别攻击性类比模因。准确地检测攻击性类比模因有两个重要的挑战:一是捕捉模因隐含的类比并非易事;ii)有效地将模因中不同数据模式之间的复杂类比联系起来也是一个挑战。为了应对上述挑战,我们开发了一个基于深度学习的类比感知攻击性模因检测框架,从模因的多模态内容中学习内隐类比,有效地检测攻击性类比模因。我们在两个来自在线社交媒体的真实数据集上评估AOMD。评估结果表明,与最新的基线相比,AOMD通过更准确地检测攻击性类比模因获得了显著的性能提升。 摘要:This paper focuses on an important problem of detecting offensive analogy meme on online social media where the visual content and the texts/captions of the meme together make an analogy to convey the offensive information. Existing offensive meme detection solutions often ignore the implicit relation between the visual and textual contents of the meme and are insufficient to identify the offensive analogy memes. Two important challenges exist in accurately detecting the offensive analogy memes: i) it is not trivial to capture the analogy that is often implicitly conveyed by a meme; ii) it is also challenging to effectively align the complex analogy across different data modalities in a meme. To address the above challenges, we develop a deep learning based Analogy-aware Offensive Meme Detection (AOMD) framework to learn the implicit analogy from the multi-modal contents of the meme and effectively detect offensive analogy memes. We evaluate AOMD on two real-world datasets from online social media. Evaluation results show that AOMD achieves significant performance gains compared to state-of-the-art baselines by detecting offensive analogy memes more accurately.
【2】 Anticipatory Detection of Compulsive Body-focused Repetitive Behaviors with Wearables 标题:可穿戴设备对强迫性身体聚焦重复行为的预测检测
作者:Benjamin Lucas Searle,Dimitris Spathis,Marios Constantinides,Daniele Quercia,Cecilia Mascolo 机构: University of Cambridge 备注:Accepted to ACM MobileHCI 2021 (20 pages, dataset/code: this https URL) 链接:https://arxiv.org/abs/2106.10970 摘要:以身体为中心的重复行为(BFRBs)是一种手驱动的行为,如不及早识别和治疗,会损害人的外貌。自动检测技术仍处于探索阶段,以前的工作很少局限于具有单一模式(如运动)的可穿戴设备。在这里,我们提出了一种结合运动、方向和心率传感器的多感官方法来检测BFRBs。我们进行了一项可行性研究,参与者(N=10)暴露于BFRBs诱导任务中,并在广泛评估传感模式、交叉验证方法和观察窗的情况下分析了380分钟的信号。我们的模型在区分BFRBs方面的AUC>0.90,这在行为发生前5分钟的观察窗中比1分钟的观察窗更明显。在后续的定性调查中,我们发现,在设计预防BFRBs的及时干预措施时,不仅检测时机很重要,而且模型也需要有上下文意识。 摘要:Body-focused repetitive behaviors (BFRBs), like face-touching or skin-picking, are hand-driven behaviors which can damage one's appearance, if not identified early and treated. Technology for automatic detection is still under-explored, with few previous works being limited to wearables with single modalities (e.g., motion). Here, we propose a multi-sensory approach combining motion, orientation, and heart rate sensors to detect BFRBs. We conducted a feasibility study in which participants (N=10) were exposed to BFRBs-inducing tasks, and analyzed 380 mins of signals under an extensive evaluation of sensing modalities, cross-validation methods, and observation windows. Our models achieved an AUC > 0.90 in distinguishing BFRBs, which were more evident in observation windows 5 mins prior to the behavior as opposed to 1-min ones. In a follow-up qualitative survey, we found that not only the timing of detection matters but also models need to be context-aware, when designing just-in-time interventions to prevent BFRBs.
【3】 Hard hat wearing detection based on head keypoint localization 标题:基于头部关键点定位的安全帽佩戴检测
作者:Bartosz Wójcik,Mateusz Żarski,Kamil Książek,Jarosław Adam Miszczak,Mirosław Jan Skibniewski 机构:Jan Skibniewskid,b,e,f, Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka ,-, Gliwice, Poland, Electronics and Computer Science, Silesian University of Technology, Akademicka ,-, Gliwice, Poland 备注:15 pages, 9 figures and 9 tables 链接:https://arxiv.org/abs/2106.10944 摘要:近年来,基于视觉的施工现场安全系统中的深度学习方法受到了广泛关注,尤其是个人防护装备。然而,尽管如此,仍然没有可靠的方法来确定工人和他们的安全帽之间的关系。为了解决这一问题,本文提出了一种结合深度学习、目标检测和头部关键点定位的方法,并结合简单的基于规则的推理。在测试中,该方法超越了以往基于不同实例的相对包围盒位置以及直接检测戴头盔者和未戴头盔者的方法。结果表明,将新的深度学习方法与基于规则的人性化解释系统相结合,可以得到既可靠又能成功模拟人工现场监督的解决方案。这项工作是发展完全自主的建筑工地安全系统的下一步,表明这方面仍有改进的余地。 摘要:In recent years, a lot of attention is paid to deep learning methods in the context of vision-based construction site safety systems, especially regarding personal protective equipment. However, despite all this attention, there is still no reliable way to establish the relationship between workers and their hard hats. To answer this problem a combination of deep learning, object detection and head keypoint localization, with simple rule-based reasoning is proposed in this article. In tests, this solution surpassed the previous methods based on the relative bounding box position of different instances, as well as direct detection of hard hat wearers and non-wearers. The results show that the conjunction of novel deep learning methods with humanly-interpretable rule-based systems can result in a solution that is both reliable and can successfully mimic manual, on-site supervision. This work is the next step in the development of fully autonomous construction site safety systems and shows that there is still room for improvement in this area.
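文中"检测框+头部关键点"的规则推理核心可以写得非常简单:判断每个头部关键点是否落入某个安全帽检测框内。下面是一个示意,坐标与容差均为虚构示例。

```python
def wears_hard_hat(head_kp, hat_boxes, margin=10):
    """若头部关键点 (x, y) 落在任一安全帽框 (x1, y1, x2, y2) 内(含容差),判为佩戴。"""
    x, y = head_kp
    return any(x1 - margin <= x <= x2 + margin and y1 - margin <= y <= y2 + margin
               for (x1, y1, x2, y2) in hat_boxes)

# 用法示意:检测器输出两名工人的头部关键点与一个安全帽框
heads = [(320, 110), (540, 95)]
hats = [(300, 80, 350, 130)]
print([wears_hard_hat(kp, hats) for kp in heads])   # [True, False]
```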
分类|识别(13篇)
【1】 On fine-tuning of Autoencoders for Fuzzy rule classifiers 标题:模糊规则分类器自动编码器的微调研究
作者:Rahul Kumar Sevakula,Nishchal Kumar Verma,Hisao Ishibuchi 链接:https://arxiv.org/abs/2106.11182 摘要:深层神经网络的最新发现使研究人员能够处理一些非常复杂的问题,如图像分类和音频分类,并改进了理论和经验证明。提出了一种将自动编码器应用于模糊规则分类器的新方案。自动编码器在堆叠时可以学习到数据之间复杂的非线性关系,而所提出的基于FRC的框架可以让用户向系统输入专家知识。本文进一步介绍了四种新的自动编码器微调策略,以提高FRC的分类和规则约简性能。该框架已经在五个真实的基准数据集上进行了测试。与之前15项研究的详细比较,以及10倍交叉验证性能表明,所提出的方法能够构建FRCs,从而提供最先进的精确度。 摘要:Recent discoveries in Deep Neural Networks are allowing researchers to tackle some very complex problems such as image classification and audio classification, with improved theoretical and empirical justifications. This paper presents a novel scheme to incorporate the use of autoencoders in Fuzzy rule classifiers (FRC). Autoencoders when stacked can learn the complex non-linear relationships amongst data, and the proposed framework built towards FRC can allow users to input expert knowledge to the system. This paper further introduces four novel fine-tuning strategies for autoencoders to improve the FRC's classification and rule reduction performance. The proposed framework has been tested across five real-world benchmark datasets. Elaborate comparisons with over 15 previous studies, and across 10-fold cross validation performance, suggest that the proposed methods are capable of building FRCs which can provide state of the art accuracies.
【2】 Classification of Documents Extracted from Images with Optical Character Recognition Methods 标题:用光学字符识别方法对从图像中提取的文档进行分类
作者:Omer Aydin 备注:None 链接:https://arxiv.org/abs/2106.11125 摘要:在过去的十年里,机器学习方法给了我们无人驾驶汽车、语音识别、有效的网络搜索以及对人类基因组的更好理解。机器学习在今天是如此普遍,以至于它一天被使用几十次,可能是在不知不觉中。尝试让机器学习某些过程或情境,可以让它们预测一些人脑难以预测的结果。这些方法也帮助我们在短时间内完成一些通过人类活动通常不可能或难以完成的操作。基于这些原因,机器学习在今天是如此重要。在这项研究中,两种不同的机器学习方法相结合。为了解决现实世界中的问题,手稿文件首先被传送到计算机,然后被分类。整个过程采用了三种基本的实现方法。手写或打印的文件已被扫描仪或数码相机数字化。这些文件经过了两种不同的光学字符识别(OCR)处理。然后利用朴素贝叶斯算法对生成的文本进行分类。所有项目都是在Windows操作系统上的Microsoft Visual Studio 12平台上进行编程的。研究的所有部分都使用了C#编程语言。此外,还使用了一些现成的代码和DLL。 摘要:Over the past decade, machine learning methods have given us driverless cars, voice recognition, effective web search, and a much better understanding of the human genome. Machine learning is so common today that it is used dozens of times a day, possibly unknowingly. Trying to teach a machine some processes or some situations can make them predict some results that are difficult to predict by the human brain. These methods also help us do some operations that are often impossible or difficult to do with human activities in a short time. For these reasons, machine learning is so important today. In this study, two different machine learning methods were combined. In order to solve a real-world problem, the manuscript documents were first transferred to the computer and then classified. We used three basic methods to realize the whole process. Handwriting or printed documents have been digitalized by a scanner or digital camera. These documents have been processed with two different Optical Character Recognition (OCR) operation. After that generated texts are classified by using Naive Bayes algorithm. All project was programmed in Microsoft Visual Studio 12 platform on Windows operating system. C# programming language was used for all parts of the study. Also, some prepared codes and DLLs were used.
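原研究使用C#实现;作为流程示意,下面用Python给出"OCR文本到词频特征再到朴素贝叶斯分类"的等价骨架。示例文本与类别为虚构,OCR步骤假设由pytesseract等工具完成。

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# 假设 texts 为 OCR(例如 pytesseract.image_to_string)从扫描件中提取的文本
texts = ["invoice total amount due", "dear sir regarding your application",
         "meeting agenda item one", "payment received thank you"]
labels = ["finance", "letter", "minutes", "finance"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)                       # 词频特征 + 朴素贝叶斯
print(clf.predict(["total payment due today"]))   # 对新文档分类
```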
【3】 Paradigm selection for Data Fusion of SAR and Multispectral Sentinel data applied to Land-Cover Classification 标题:用于土地覆盖分类的SAR与多光谱哨兵数据融合范例选择
作者:Alessandro Sebastianelli,Maria Pia Del Rosso,Pierre Philippe Mathieu,Silvia Liberata Ullo 机构: University of Sannio 备注:This work has been submitted to the IEEE Geoscience and Remote Sensing Letters for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 链接:https://arxiv.org/abs/2106.11056 摘要:数据融合是一项众所周知的技术,在人工智能对地观测(AI4EO)领域得到越来越广泛的应用,主要是因为它能够通过组合多个数据源来加强AI4EO的应用,从而产生更好的结果。另一方面,与其他卫星数据分析方法一样,由于人工智能(AI)的集成,数据融合本身也在受益和发展。本文分析并实现了四种基于卷积神经网络的数据融合方法。其目的是在确定了CNN的基本结构之后,为选择最佳的数据融合框架提供一个系统的过程,从而得到最佳的分类结果,并在涉及数据融合应用于遥感时帮助感兴趣的研究人员进行工作。该方法已在土地覆被分类中得到验证,但也可以推广到其他情况。 摘要:Data fusion is a well-known technique, becoming more and more popular in the Artificial Intelligence for Earth Observation (AI4EO) domain mainly due to its ability of reinforcing AI4EO applications by combining multiple data sources and thus bringing better results. On the other hand, like other methods for satellite data analysis, data fusion itself is also benefiting and evolving thanks to the integration of Artificial Intelligence (AI). In this letter, four data fusion paradigms, based on Convolutional Neural Networks (CNNs), are analyzed and implemented. The goals are to provide a systematic procedure for choosing the best data fusion framework, resulting in the best classification results, once the basic structure for the CNN has been defined, and to help interested researchers in their work when data fusion applied to remote sensing is involved. The procedure has been validated for land-cover classification but it can be transferred to other cases.
【4】 Performance Evaluation of Classification Models for Household Income, Consumption and Expenditure Data Set 标题:家庭收入、消费和支出数据集分类模型的性能评价
作者:Mersha Nigus,Dorsewamy 机构:Department of Computer Science, Mangalore University, Karnataka, India 链接:https://arxiv.org/abs/2106.11055 摘要:由于最近区域和全球一级的粮食短缺以及主要捐助国对消除长期饥饿的新承诺,粮食安全在今天的政策议程上比过去更加突出。机器学习可用于家庭食品不安全分类的一个领域。在这项研究中,我们建立了一个稳健的方法,用机器学习算法来分类一个家庭是食品安全还是食品不安全。在本研究中,我们使用十种机器学习算法来分类家庭的食品安全状况。梯度增强(GB)、随机森林(RF)、额外树(ET)、Bagging、K-最近邻(KNN)、决策树(DT)、支持向量机(SVM)、Logistic回归(LR)、Ada Boost(AB)和朴素贝叶斯(NB)是本研究中使用的分类算法。然后,我们从HICE调查数据中收集数据、由领域专家验证,构建了家庭食品安全状况数据集,并完成分类任务。所有分类器在各项性能指标上都取得了较好的结果。随机森林模型和梯度增强模型的表现尤为突出,测试精度为0.9997,其他分类器如Bagging、决策树、Ada Boost、额外树、K-近邻、Logistic回归、SVM和朴素贝叶斯的得分分别为0.9996、0.09996、0.9994、0.95675、0.9415、0.8915、0.7853和0.7595。 摘要:Food security is more prominent on the policy agenda today than it has been in the past, thanks to recent food shortages at both the regional and global levels as well as renewed promises from major donor countries to combat chronic hunger. One field where machine learning can be used is in the classification of household food insecurity. In this study, we establish a robust methodology to categorize whether or not a household is being food secure and food insecure by machine learning algorithms. In this study, we have used ten machine learning algorithms to classify the food security status of the Household. Gradient Boosting (GB), Random Forest (RF), Extra Tree (ET), Bagging, K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Ada Boost (AB) and Naive Bayes (NB) were the classification algorithms used throughout this study. Then, we perform classification tasks from developing data set for household food security status by gathering data from HICE survey data and validating it by Domain Experts. The performance of all classifiers has better results for all performance metrics. The performance of the Random Forest and Gradient Boosting models are outstanding with a testing accuracy of 0.9997 and the other classifier such as Bagging, Decision tree, Ada Boost, Extra tree, K-nearest neighbor, Logistic Regression, SVM and Naive Bayes are scored 0.9996, 0.09996, 0.9994, 0.95675, 0.9415, 0.8915, 0.7853 and 0.7595, respectively.
【5】 SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild 标题:SHREC 2021:野外基于骨架的手势识别跟踪
作者:Ariel Caputo,Andrea Giachetti,Simone Soso,Deborah Pintani,Andrea D'Eusanio,Stefano Pini,Guido Borghi,Alessandro Simoni,Roberto Vezzani,Rita Cucchiara,Andrea Ranieri,Franca Giannini,Katia Lupinetti,Marina Monti,Mehran Maghoumi,Joseph J. LaViola Jr,Minh-Quan Le,Hai-Dang Nguyen,Minh-Triet Tran 机构:Department of Computer Science, University of Verona, Italy, b Università di Modena e Reggio Emilia, Dipartimento di Ingegneria "Enzo Ferrari", Dipartimento di Informatica - Scienza e Ingegneria 备注:12 pages, to be published on Computers & Graphics 链接:https://arxiv.org/abs/2106.10980 摘要:手势识别是一种基本的工具,可以在各种应用场景中实现新的交互模式,如混合现实环境、非接触公共信息亭、娱乐系统等。如今,手势识别可以直接从低成本跟踪器(Ultraleap)和MR耳机(Hololens、Oculus Quest)提供的软件或视频处理软件模块(例如Google Mediapipe)估计的手骨架流中执行。尽管最近在骨骼的手势和动作识别方面取得了一些进展,但目前的最新技术在识别大量异构手势的真实场景中的表现如何尚不清楚,因为许多基准测试不测试在线识别,并且使用的词典有限。这推动了shrec2021的提案:野外基于骨架的手势识别跟踪。在本次比赛中,我们创建了一个新的数据集,其中包含具有不同类型和持续时间的异构手势。这些手势必须在在线识别场景中的序列中找到。本文介绍了比赛的结果,展示了四个研究小组提出的技术在挑战性任务上的表现,并与简单的基线方法进行了比较。 摘要:Gesture recognition is a fundamental tool to enable novel interaction paradigms in a variety of application scenarios like Mixed Reality environments, touchless public kiosks, entertainment systems, and more. Recognition of hand gestures can be nowadays performed directly from the stream of hand skeletons estimated by software provided by low-cost trackers (Ultraleap) and MR headsets (Hololens, Oculus Quest) or by video processing software modules (e.g. Google Mediapipe). Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario for the recognition of a wide set of heterogeneous gestures, as many benchmarks do not test online recognition and use limited dictionaries. This motivated the proposal of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild. For this contest, we created a novel dataset with heterogeneous gestures featuring different types and duration. These gestures have to be found inside sequences in an online recognition scenario. This paper presents the result of the contest, showing the performances of the techniques proposed by four research groups on the challenging task compared with a simple baseline method.
【6】 Leveraging Conditional Generative Models in a General Explanation Framework of Classifier Decisions 标题:在分类器决策的一般解释框架中利用条件生成模型
作者:Martin Charachon,Paul-Henry Cournède,Céline Hudelot,Roberto Ardon 机构:Ardon, Incepto Medical, France, MICS, Universit´e Paris-Saclay, CentraleSup´elec, France 链接:https://arxiv.org/abs/2106.10947 摘要:为分类器的决策提供一个人类可以理解的解释,对于在日常任务中使用分类器产生信任已成为当务之急。尽管许多工作已经通过生成视觉解释图来解决这个问题,但是它们经常提供噪声和不准确的结果,迫使使用与所讨论的分类器无关的启发式正则化。在本文中,我们提出了一个新的一般视角的视觉解释问题克服这些限制。我们证明了通过两个特定的条件生成模型得到的两幅图像之间的差异可以产生视觉解释。这两种生成模型都是使用分类器来解释的,并使用数据库来实现以下特性:(i)第一个生成器生成的所有图像分类与输入图像相似,而第二个生成器的输出分类则相反(ii)生成的图像属于真实图像的分布(iii)输入图像和相应生成图像之间的距离最小,因此生成元素之间的差异仅揭示所研究分类器的相关信息。利用对称约束和循环约束,我们给出了一般公式的两种不同的近似和实现。在实验上,我们在三个不同的公共数据集上证明了与最新技术相比的显著改进。特别地,影响分类器的区域的定位与人类注释是一致的。 摘要:Providing a human-understandable explanation of classifiers' decisions has become imperative to generate trust in their use for day-to-day tasks. Although many works have addressed this problem by generating visual explanation maps, they often provide noisy and inaccurate results forcing the use of heuristic regularization unrelated to the classifier in question. In this paper, we propose a new general perspective of the visual explanation problem overcoming these limitations. We show that visual explanation can be produced as the difference between two generated images obtained via two specific conditional generative models. Both generative models are trained using the classifier to explain and a database to enforce the following properties: (i) All images generated by the first generator are classified similarly to the input image, whereas the second generator's outputs are classified oppositely. (ii) Generated images belong to the distribution of real images. (iii) The distances between the input image and the corresponding generated images are minimal so that the difference between the generated elements only reveals relevant information for the studied classifier. Using symmetrical and cyclic constraints, we present two different approximations and implementations of the general formulation. Experimentally, we demonstrate significant improvements w.r.t the state-of-the-art on three different public data sets. In particular, the localization of regions influencing the classifier is consistent with human annotations.
【7】 Practical Transferability Estimation for Image Classification Tasks 标题:一种实用的图像分类任务可转移性估计
作者:Yang Tan,Yang Li,Shao-Lun Huang 机构:Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, China 备注:12 pages 链接:https://arxiv.org/abs/2106.10479 摘要:迁移性估计是迁移学习中的一个重要问题,它用来预测将源模型(源任务)迁移到目标任务时的性能。最近的分析性可转移性度量被广泛应用于源模型选择和多任务学习。在具有挑战性的跨域跨任务传输设置下,早期的指标并不能很好地工作,但是最近的OTCE评分通过使用辅助任务获得了显著的性能。一个名为OT-based NCE score的简化版本牺牲了准确度以提高效率,但它可以进一步改进。因此,我们提出了一个实用的可转移性度量JC-NCE评分,以进一步提高跨域跨任务可转移性估计的性能,该评分比OTCE评分更有效,比基于OT的NCE评分更准确。具体来说,我们通过求解一个同时考虑样本距离和标签距离的最优传输问题来建立源数据和目标数据之间的联合对应关系,然后计算可传输性得分作为负条件熵。在数据集内和数据集间转移设置下的广泛验证表明,我们的JC-NCE得分优于基于OT的NCE得分,分别获得约7%和12%的收益。 摘要:Transferability estimation is an essential problem in transfer learning to predict how good the performance is when transfer a source model (source task) to a target task. Recent analytical transferability metrics have been widely used for source model selection and multi-task learning. Earlier metrics does not work sufficiently well under the challenging cross-domain cross-task transfer settings, but recent OTCE score achieves a noteworthy performance using auxiliary tasks. A simplified version named OT-based NCE score sacrifices accuracy to be more efficient, but it can be further improved. Consequently, we propose a practical transferability metric called JC-NCE score to further improve the cross-domain cross-task transferability estimation performance, which is more efficient than the OTCE score and more accurate than the OT-based NCE score. Specifically, we build the joint correspondences between source and target data via solving an optimal transport problem with considering both the sample distance and label distance, and then compute the transferability score as the negative conditional entropy. Extensive validations under the intra-dataset and inter-dataset transfer settings demonstrate that our JC-NCE score outperforms the OT-based NCE score with about 7% and 12% gains, respectively.
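给定源、目标样本间的耦合(联合对应)矩阵后,JC-NCE式的得分可按"负条件熵"计算。下面的numpy示意仅演示由耦合与两侧标签构造标签联合分布并求 -H(Y_t|Y_s) 的步骤;真实方法中耦合应来自同时考虑样本距离与标签距离的最优传输问题,此处用均匀矩阵代替。

```python
import numpy as np

def neg_conditional_entropy(coupling, ys, yt):
    """由耦合矩阵聚合出标签联合分布 P(y_s, y_t),返回 -H(y_t | y_s)。"""
    P = np.zeros((ys.max() + 1, yt.max() + 1))
    for i, s in enumerate(ys):
        for j, t in enumerate(yt):
            P[s, t] += coupling[i, j]
    Ps = P.sum(axis=1, keepdims=True)                      # 边缘分布 P(y_s)
    cond = np.divide(P, Ps, out=np.zeros_like(P), where=Ps > 0)
    logc = np.log(cond, out=np.zeros_like(P), where=cond > 0)
    return (P * logc).sum()          # 即 -H(y_t|y_s),得分越高可转移性越好

rng = np.random.default_rng(0)
ys, yt = rng.integers(0, 3, 50), rng.integers(0, 4, 60)
coupling = np.full((50, 60), 1.0 / (50 * 60))   # 示意:以均匀耦合代替 OT 解
print(neg_conditional_entropy(coupling, ys, yt))
```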
【8】 Neural Network Classifier as Mutual Information Evaluator 标题:作为互信息评估器的神经网络分类器
作者:Zhenyue Qin,Dongwoo Kim,Tom Gedeon 机构: From a varia-Equal contribution 1School of Computing, Australian Na-tional University 2GSAI 备注:arXiv admin note: substantial text overlap with arXiv:1911.10688 链接:https://arxiv.org/abs/2106.10471 摘要:具有softmax输出的交叉熵损失是训练神经网络分类器的标准选择。提出了以softmax和交叉熵作为互信息评价器的神经网络分类器的新观点。我们证明了当数据集是平衡的,训练一个具有交叉熵的神经网络通过互信息的变分形式使输入和标签之间的互信息最大化。因此,我们开发了一种新形式的softmax,当数据集不平衡时,它还将分类器转换为互信息评估器。实验结果表明,这种新的分类形式具有更好的分类精度,特别是对于不平衡数据集。 摘要:Cross-entropy loss with softmax output is a standard choice to train neural network classifiers. We give a new view of neural network classifiers with softmax and cross-entropy as mutual information evaluators. We show that when the dataset is balanced, training a neural network with cross-entropy maximises the mutual information between inputs and labels through a variational form of mutual information. Thereby, we develop a new form of softmax that also converts a classifier to a mutual information evaluator when the dataset is imbalanced. Experimental results show that the new form leads to better classification accuracy, in particular for imbalanced datasets.
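文中"交叉熵即互信息的变分形式"可以写成一个标准的下界关系:当$K$个类别均衡时$H(Y)=\log K$,而任意分类器$q_\theta$的交叉熵都是条件熵$H(Y\mid X)$的上界,因此最小化交叉熵等价于最大化互信息的一个下界。记号为说明所设的通用形式:

```latex
% 均衡数据下交叉熵训练与互信息的关系(变分下界)
\begin{aligned}
I(X;Y) &= H(Y) - H(Y\mid X)\\
       &\geq \log K - \mathbb{E}_{(x,y)}\bigl[-\log q_\theta(y\mid x)\bigr],
\end{aligned}
% 其中 K 为类别数,q_\theta(y|x) 为 softmax 输出,右侧第二项即交叉熵损失。
```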
【9】 Improving Compositional Generalization in Classification Tasks via Structure Annotations 标题:通过结构标注改进分类任务中的成分泛化
作者:Juyong Kim,Pradeep Ravikumar,Joshua Ainslie,Santiago Ontañón 机构:Carnegie Mellon University, Google Research 备注:Accepted as a short paper at ACL 2021 链接:https://arxiv.org/abs/2106.10434 摘要:组合泛化是通过组合已知的成分,系统地泛化到新的数据分布的能力。虽然人类似乎具有很强的组合泛化能力,但最先进的神经模型很难做到这一点。在这项工作中,我们研究了分类任务中的组合泛化,并给出两个主要贡献。首先,我们研究如何将一个自然语言的序列到序列数据集转换为同样需要组合泛化的分类数据集。其次,我们展示了提供结构提示(特别是将解析树和实体链接作为Transformer模型的注意力掩码)有助于组合泛化。 摘要:Compositional generalization is the ability to generalize systematically to a new data distribution by combining known components. Although humans seem to have a great ability to generalize compositionally, state-of-the-art neural models struggle to do so. In this work, we study compositional generalization in classification tasks and present two main contributions. First, we study ways to convert a natural language sequence-to-sequence dataset to a classification dataset that also requires compositional generalization. Second, we show that providing structural hints (specifically, providing parse trees and entity links as attention masks for a Transformer model) helps compositional generalization.
【10】 Variance-Dependent Best Arm Identification 标题:基于方差的最佳臂识别
作者:Pinyan Lu,Chao Tao,Xiaojin Zhang 机构:ITCS, Shanghai University of Finance and Economics, Department of Computer Science, Indiana University Bloomington, Department of Computer Science and Engineering, The Chinese University of Hong Kong 链接:https://arxiv.org/abs/2106.10417 摘要:研究了随机多臂赌博机博弈中最佳臂的识别问题。给定一组从$1$到$n$索引的$n$个臂,每个臂$i$与一个未知的奖励分布相关联,该分布支撑在$[0,1]$上,均值为$\theta_i$,方差为$\sigma_i^2$。假设$\theta_1 > \theta_2 \geq \cdots \geq \theta_n$。我们提出了一种自适应算法,利用一种称为分组中值消元的新方法,探索各臂奖励的差距和方差,并根据收集到的信息做出后续决策。该算法保证以概率$(1-\delta)$输出最佳臂,且最多使用$O\left(\sum_{i=1}^n \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\left(\ln \delta^{-1} + \ln\ln \Delta_i^{-1}\right)\right)$个样本,其中$\Delta_i$($i\geq 2$)表示臂$i$与最佳臂之间的奖励差距,并定义$\Delta_1=\Delta_2$。在某些有利的情形下,这与方差无关的算法相比具有显著优势,并且与最新技术相比,这是第一个在最佳臂上消除额外$\ln n$因子的结果。我们进一步证明,任何达到同样目标的算法都需要$\Omega\left(\sum_{i=1}^n \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)\ln \delta^{-1}\right)$个样本,从而说明我们的算法在双对数因子意义下是最优的。 摘要:We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq \theta_n$. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms and makes future decisions based on the gathered information using a novel approach called \textit{grouped median elimination}. The proposed algorithm guarantees to output the best arm with probability $(1-\delta)$ and uses at most $O\left(\sum_{i = 1}^n \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)(\ln \delta^{-1} + \ln \ln \Delta_i^{-1})\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over the variance-independent algorithms in some favorable scenarios and is the first result that removes the extra $\ln n$ factor on the best arm compared with the state-of-the-art. We further show that $\Omega\left(\sum_{i = 1}^n \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right) \ln \delta^{-1}\right)$ samples are necessary for an algorithm to achieve the same goal, thereby illustrating that our algorithm is optimal up to doubly logarithmic terms.
【11】 Exoskeleton-Based Multimodal Action and Movement Recognition: Identifying and Developing the Optimal Boosted Learning Approach 标题:基于外骨骼的多模态动作与运动识别:识别并开发最优的提升(Boosting)学习方法
作者:Nirmalya Thakur,Chia Y. Han 机构:Department of Electrical Engineering and Computer Science, University of Cincinnati 备注:None 链接:https://arxiv.org/abs/2106.10331 摘要:本文在基于外骨骼的动作和运动识别领域做出了两项科学贡献。首先,提出了一种新的基于机器学习和模式识别的框架,该框架能够检测多种动作和运动:行走、上楼、下楼、坐、站、卧,以及站到坐、坐到站、坐到卧、卧到坐、站到卧、卧到站等姿态转换,总体准确率为82.63%。其次,对应用于该框架的不同学习方法进行了综合比较研究,包括随机森林、人工神经网络、决策树、多路决策树、支持向量机、k-NN、梯度提升树、决策树桩、Auto MLP、线性回归、向量线性回归、随机树、朴素贝叶斯、朴素贝叶斯(核)、线性判别分析、二次判别分析和深度学习。采用AdaBoost算法提升每种学习方法的性能,并采用交叉验证进行训练和测试。结果表明,经提升后的k-NN分类器优于所有其他提升后的学习方法,因此是实现这一目的的最佳学习方法。所呈现和讨论的结果支持了这项工作的重要性,有助于在未来基于物联网的生活环境(如智能家居)中增强老年人基于外骨骼的辅助与独立生活能力。作为一个具体用例,我们还讨论了我们的研究结果如何有助于增强混合辅助肢体(Hybrid Assistive Limb)外骨骼(一种功能强大的下肢外骨骼)的能力。 摘要:This paper makes two scientific contributions to the field of exoskeleton-based action and movement recognition. First, it presents a novel machine learning and pattern recognition-based framework that can detect a wide range of actions and movements - walking, walking upstairs, walking downstairs, sitting, standing, lying, stand to sit, sit to stand, sit to lie, lie to sit, stand to lie, and lie to stand, with an overall accuracy of 82.63%. Second, it presents a comprehensive comparative study of different learning approaches - Random Forest, Artificial Neural Network, Decision Tree, Multiway Decision Tree, Support Vector Machine, k-NN, Gradient Boosted Trees, Decision Stump, Auto MLP, Linear Regression, Vector Linear Regression, Random Tree, Naïve Bayes, Naïve Bayes (Kernel), Linear Discriminant Analysis, Quadratic Discriminant Analysis, and Deep Learning applied to this framework. The performance of each of these learning approaches was boosted by using the AdaBoost algorithm, and the Cross Validation approach was used for training and testing. The results show that in boosted form, the k-NN classifier outperforms all the other boosted learning approaches and is, therefore, the optimal learning method for this purpose. The results presented and discussed uphold the importance of this work to contribute towards augmenting the abilities of exoskeleton-based assisted and independent living of the elderly in the future of Internet of Things-based living environments, such as Smart Homes. As a specific use case, we also discuss how the findings of our work are relevant for augmenting the capabilities of the Hybrid Assistive Limb exoskeleton, a highly functional lower limb exoskeleton.
【12】 Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation 标题:多类分类中的良性过拟合:所有道路都会导致插值
作者:Ke Wang,Vidya Muthukumar,Christos Thrampoulidis 机构:∗Department of Statistics and Applied Probability, University of California Santa Barbara, †Electrical and Computer Engineering & Industrial and Systems Engineering, Georgia Institute of Technology 链接:https://arxiv.org/abs/2106.10865 摘要:关于过参数化模型中"良性过拟合"的文献越来越多,但大多局限于回归或二元分类设置;然而,现代机器学习的大多数成功案例都是在多类环境中取得的。基于这种差异,我们研究了多类线性分类中的良性过拟合。具体来说,我们考虑了以下几种常用的可分数据训练算法:(i)交叉熵损失的经验风险最小化(ERM),它收敛于多类支持向量机(SVM)解;(ii)带最小二乘损失的ERM,收敛到最小范数插值(MNI)解;以及(iii)一对多(one-vs-all)SVM分类器。首先,我们提供了一个简单的充分条件,在这个条件下,所有三种算法都可以得到插值训练数据且具有相同精度的分类器。当数据由高斯混合或多项logistic模型生成时,在足够高的有效过参数化条件下,该条件成立。其次,我们推导了MNI分类器精度的新误差界,从而表明在充分的过参数化条件下,所有三种训练算法都会导致良性过拟合。最后,我们的分析表明,在典型的基于间隔(margin)的界限适用范围之外,SVM解仍然可能获得良好的泛化。 摘要:The growing literature on "benign overfitting" in overparameterized models has been mostly restricted to regression or binary classification settings; however, most success stories of modern machine learning have been recorded in multiclass settings. Motivated by this discrepancy, we study benign overfitting in multiclass linear classification. Specifically, we consider the following popular training algorithms on separable data: (i) empirical risk minimization (ERM) with cross-entropy loss, which converges to the multiclass support vector machine (SVM) solution; (ii) ERM with least-squares loss, which converges to the min-norm interpolating (MNI) solution; and, (iii) the one-vs-all SVM classifier. First, we provide a simple sufficient condition under which all three algorithms lead to classifiers that interpolate the training data and have equal accuracy. When the data is generated from Gaussian mixtures or a multinomial logistic model, this condition holds under high enough effective overparameterization. Second, we derive novel error bounds on the accuracy of the MNI classifier, thereby showing that all three training algorithms lead to benign overfitting under sufficient overparameterization. Ultimately, our analysis shows that good generalization is possible for SVM solutions beyond the realm in which typical margin-based bounds apply.
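下面用一个极简的NumPy草图演示摘要中的最小范数插值(MNI)解,即最小二乘损失ERM在过参数化($d>n$)数据上收敛到的闭式解(数据生成方式与维度均为假设的玩具设置):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 60, 300, 3                                    # 过参数化:d > n
centers = rng.normal(size=(k, d)) * 2.0                 # 高斯混合的类中心
labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.normal(size=(n, d))           # 训练样本
Y = np.eye(k)[labels]                                   # one-hot 标签

W = np.linalg.pinv(X) @ Y                               # 最小范数插值解
print("训练误差(应为0,即完全插值):", np.mean((X @ W).argmax(1) != labels))

X_te = centers[labels] + rng.normal(size=(n, d))        # 同分布测试样本
print("测试误差(良性过拟合时应很小):", np.mean((X_te @ W).argmax(1) != labels))
```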
【13】 Learning Signal Representations for EEG Cross-Subject Channel Selection and Trial Classification 标题:用于脑电跨被试通道选择和试验分类的信号表示学习
作者:Michela C. Massi,Francesca Ieva 机构:a MOX - Dept. of Mathematics, Politecnico di Milano, b CADS - Human Technopole, c CHRP - Bicocca University 链接:https://arxiv.org/abs/2106.10633 摘要:脑电(EEG)技术在许多领域都有应用。目前,大多数脑电系统都要求受试者在头皮上佩戴多个电极才能发挥作用。然而,许多通道可能包含噪声信息和冗余信号,导致较长的准备时间,并增加任何脑电解码自动系统的计算时间。将通道选择与特征提取相结合是降低噪声影响、提高分类精度的一种方法,但脑电信号以被试间变异性高而著称。在这项工作中,我们介绍了一种新的与被试无关(subject-independent)的脑电通道选择算法。该算法以多通道试验记录为统计单元,以脑电解码任务为参考类别:(i)以有监督的方式利用通道特定的一维卷积神经网络(1D-CNNs)作为特征抽取器,最大限度地提高类的可分性;(ii)通过连接各通道的嵌入,将高维多通道试验表示压缩为单一试验向量;(iii)在通道选择阶段利用自编码器(AE)集成,从这些向量中识别出对分类最相关的通道,从而恢复复杂的通道间关系。训练完成后,只需将选定通道所对应的1D-CNN参数子集迁移到新被试的信号上,即可得到低维、高信息量的试验向量,并将其提供给任何分类器。 摘要:EEG technology finds applications in several domains. Currently, most EEG systems require subjects to wear several electrodes on the scalp to be effective. However, several channels might include noisy information, redundant signals, induce longer preparation times and increase computational times of any automated system for EEG decoding. One way to reduce the signal-to-noise ratio and improve classification accuracy is to combine channel selection with feature extraction, but EEG signals are known to present high inter-subject variability. In this work we introduce a novel algorithm for subject-independent channel selection of EEG recordings. Considering multi-channel trial recordings as statistical units and the EEG decoding task as the class of reference, the algorithm (i) exploits channel-specific 1D-Convolutional Neural Networks (1D-CNNs) as feature extractors in a supervised fashion to maximize class separability; (ii) it reduces a high dimensional multi-channel trial representation into a unique trial vector by concatenating the channels' embeddings and (iii) recovers the complex inter-channel relationships during channel selection, by exploiting an ensemble of AutoEncoders (AE) to identify from these vectors the most relevant channels to perform classification. After training, the algorithm can be exploited by transferring only the parametrized subgroup of selected channel-specific 1D-CNNs to new signals from new subjects and obtain low-dimensional and highly informative trial vectors to be fed to any classifier.
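下面是上述"通道特定1D-CNN + 嵌入拼接"思路的一个极简PyTorch草图(网络结构、嵌入维度与通道数均为假设,并非论文的原始配置):

```python
import torch
import torch.nn as nn

class ChannelCNN(nn.Module):
    """单个EEG通道的一维CNN特征抽取器。"""
    def __init__(self, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, emb_dim),
        )

    def forward(self, x):                 # x: (batch, 1, time)
        return self.net(x)

n_channels, t_len = 8, 256
extractors = nn.ModuleList([ChannelCNN() for _ in range(n_channels)])

trial = torch.randn(4, n_channels, t_len)             # (batch, 通道, 时间)
embs = [extractors[c](trial[:, c:c + 1, :]) for c in range(n_channels)]
trial_vec = torch.cat(embs, dim=1)                    # 拼接为单一试验向量
print(trial_vec.shape)                                # torch.Size([4, 128])
```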
表征(2篇)
【1】 Universal Rate-Distortion-Perception Representations for Lossy Compression 标题:有损压缩的通用率失真感知表示法
作者:George Zhang,Jingjing Qian,Jun Chen,Ashish Khisti 机构:University of Toronto; Department of Electrical and Computer Engineering, McMaster University 链接:https://arxiv.org/abs/2106.10311 摘要:在有损压缩的背景下,Blau & Michaeli(2019)采用了感知质量的数学概念,定义了信息率失真感知函数,推广了经典的率失真权衡。我们考虑通用表示的概念,即固定编码器、只改变解码器,以达到一组失真与感知约束中的任意一点。我们证明了相应的信息论通用率失真感知函数在近似意义下是可操作实现的(operationally achievable)。在均方误差失真下,我们证明了高斯信源的整个失真感知折衷可以由同一码率的单个编码器渐近实现。然后,在任意分布的情况下,我们刻画了固定表示下可实现的失真感知区域,确定了上述结果继续近似成立的条件,并研究了码率不预先固定的情形。这启发了对在RDP权衡上近似通用的实际构造的研究,从而免去了为每个目标单独设计编码器的需要。我们在MNIST和SVHN上的实验结果表明,在图像压缩任务中,固定编码器的机器学习模型所达到的操作折衷,与可变编码器的对应模型相比只有很小的损失。 摘要:In the context of lossy compression, Blau & Michaeli (2019) adopt a mathematical notion of perceptual quality and define the information rate-distortion-perception function, generalizing the classical rate-distortion tradeoff. We consider the notion of universal representations in which one may fix an encoder and vary the decoder to achieve any point within a collection of distortion and perception constraints. We prove that the corresponding information-theoretic universal rate-distortion-perception function is operationally achievable in an approximate sense. Under MSE distortion, we show that the entire distortion-perception tradeoff of a Gaussian source can be achieved by a single encoder of the same rate asymptotically. We then characterize the achievable distortion-perception region for a fixed representation in the case of arbitrary distributions, identify conditions under which the aforementioned results continue to hold approximately, and study the case when the rate is not fixed in advance. This motivates the study of practical constructions that are approximately universal across the RDP tradeoff, thereby alleviating the need to design a new encoder for each objective. We provide experimental results on MNIST and SVHN suggesting that on image compression tasks, the operational tradeoffs achieved by machine learning models with a fixed encoder suffer only a small penalty when compared to their variable encoder counterparts.
【2】 Representations and Strategies for Transferable Machine Learning Models in Chemical Discovery 标题:化学发现中可迁移机器学习模型的表示与策略
作者:Daniel R. Harper,Aditya Nandy,Naveen Arunachalam,Chenru Duan,Jon Paul Janet,Heather J. Kulik 机构:Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, #These authors contributed equally. 链接:https://arxiv.org/abs/2106.10768 摘要:在材料组成空间中具有通用性的机器学习(ML)加速发现策略是必不可少的,但ML的演示大多局限于狭窄的组成变化。对于开壳过渡金属配合物这类具有挑战性的目标,化学空间中有前景的区域往往数据稀缺;利用通用的表示和可迁移的ML模型来挖掘现有数据中的已知关系,将加速这类发现。在一大组(约1000个)同价过渡金属配合物中,我们量化了周期表不同行(即3d/4d金属和2p/3p配体)之间在不同性质(即自旋分裂和配体解离)上的明显关系。我们展示了基于图的修正自相关(RAC)表示的一个扩展(即eRAC),它在核电荷启发式之外引入了有效核电荷,而单纯的核电荷启发式会高估同价配合物之间的相异性。为了解决在数据有限的新空间中进行发现这一常见挑战,我们引入了一种迁移学习方法:用来自周期表某一行的大量数据训练模型,并用来自另一行的少量数据点作为种子。我们展示了eRAC与这种迁移学习策略相结合的协同价值,能够持续提高模型性能。对这些模型的分析表明,该方法之所以成功,是因为它将配合物之间的距离重新排序,使之与周期表更加一致;我们预计这一特性对其他材料领域也将有广泛的用处。 摘要:Strategies for machine-learning(ML)-accelerated discovery that are general across materials composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets like open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (ca. 1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the periodic table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the effective nuclear charge alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data is limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the periodic table, a property we expect to be broadly useful for other materials domains.
编码器(1篇)
【1】 Matrix Encoding Networks for Neural Combinatorial Optimization 标题:神经组合优化的矩阵编码网络
作者:Yeong-Dae Kwon,Jinho Choo,Iljoo Yoon,Minah Park,Duwon Park,Youngjune Gwon 机构:Samsung SDS 备注:under review 链接:https://arxiv.org/abs/2106.11113 摘要:机器学习(ML)有助于更好地求解组合优化(CO)问题。一种流行的方法是使用神经网络处理给定CO问题的参数,并提取有用的信息来指导对好解的搜索。许多具有实际意义的CO问题可以用参数矩阵的形式来描述,这些参数量化了两组对象之间的关系。然而,目前还没有一种神经网络模型能够接受这种矩阵式的关系数据作为输入。因此,这类CO问题一直超出ML工程师的能力范围。本文介绍了矩阵编码网络(MatNet),并说明它如何方便地接受和处理这类复杂CO问题的参数。我们采用基于MatNet的端到端模型,作为最早的神经方法求解了非对称旅行商问题(ATSP)和柔性流水车间问题(FFSP)。特别是,对于我们测试MatNet的一类FFSP,其经验性能远远优于迄今已知的任何方法(无论是否基于神经网络)。 摘要:Machine Learning (ML) can help solve combinatorial optimization (CO) problems better. A popular approach is to use a neural net to compute on the parameters of a given CO problem and extract useful information that guides the search for good solutions. Many CO problems of practical importance can be specified in a matrix form of parameters quantifying the relationship between two groups of items. There is currently no neural net model, however, that takes in such matrix-style relationship data as an input. Consequently, these types of CO problems have been out of reach for ML engineers. In this paper, we introduce Matrix Encoding Network (MatNet) and show how conveniently it takes in and processes parameters of such complex CO problems. Using an end-to-end model based on MatNet, we solve asymmetric traveling salesman (ATSP) and flexible flow shop (FFSP) problems as the earliest neural approach. In particular, for a class of FFSP we have tested MatNet on, we demonstrate a far superior empirical performance to any methods (neural or not) known to date.
优化|敛散性(7篇)
【1】 How Do Adam and Training Strategies Help BNNs Optimization? 标题:ADAM和训练策略如何帮助BNN优化?
作者:Zechun Liu,Zhiqiang Shen,Shichao Li,Koen Helwegen,Dong Huang,Kwang-Ting Cheng 机构:Hong Kong University of Science and Technology, Carnegie Mellon University, Plumerai 备注:ICML 2021. Code and models are available at this https URL 链接:https://arxiv.org/abs/2106.11309 摘要:性能最佳的二值神经网络(BNNs)通常是通过Adam优化及其多步训练变体得到的。然而,据我们所知,很少有研究探讨Adam在BNN优化上优于SGD等其他优化器的根本原因,或者为特定训练策略提供分析性解释。为了解决这个问题,本文首先研究了训练过程中梯度和权重的轨迹。我们证明了Adam中二阶动量的正则化效应,对于恢复BNN中因激活饱和而"死亡"的权重至关重要。我们发现Adam通过其自适应学习率策略,能够更好地应对BNN崎岖的损失面,并达到泛化能力更强的更优解。此外,我们还考察了实值权重在二值网络中的有趣作用,揭示了权重衰减对BNN优化的稳定性和迟滞性的影响。通过大量的实验和分析,我们在现有基于Adam的优化基础上得到了一个简单的训练方案:使用与最先进的ReActNet相同的体系结构,在ImageNet数据集上实现了70.5%的top-1精度,比其高1.1%。代码和模型见https://github.com/liuzechun/AdamBNN. 摘要:The best performing Binary Neural Networks (BNNs) are usually attained using Adam optimization and its multi-step training variants. However, to the best of our knowledge, few studies explore the fundamental reasons why Adam is superior to other optimizers like SGD for BNN optimization or provide analytical explanations that support specific training strategies. To address this, in this paper we first investigate the trajectories of gradients and weights in BNNs during the training process. We show the regularization effect of second-order momentum in Adam is crucial to revitalize the weights that are dead due to the activation saturation in BNNs. We find that Adam, through its adaptive learning rate strategy, is better equipped to handle the rugged loss surface of BNNs and reaches a better optimum with higher generalization ability. Furthermore, we inspect the intriguing role of the real-valued weights in binary networks, and reveal the effect of weight decay on the stability and sluggishness of BNN optimization. Through extensive experiments and analysis, we derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as the state-of-the-art ReActNet while achieving 1.1% higher accuracy. Code and models are available at https://github.com/liuzechun/AdamBNN.
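下面用一个极简的PyTorch草图演示"直通估计器(STE)+ Adam"这一BNN训练的基本形式(网络规模、学习率与玩具任务均为假设,并非论文的完整方案):

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    """前向用 sign(w) 二值化权重,反向让梯度直通到实值权重。"""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)

    def forward(self, x):
        w = self.weight
        wb = w + (torch.sign(w) - w).detach()   # STE 技巧
        return x @ wb.t()

model = nn.Sequential(BinaryLinear(20, 64), nn.ReLU(), BinaryLinear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)  # 二阶动量有助于复活"死亡"权重
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 20)
y = (x[:, 0] > 0).long()                       # 假设的玩具二分类任务
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("最终训练损失:", loss.item())
```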
【2】 Does Optimal Source Task Performance Imply Optimal Pre-training for a Target Task? 标题:最优的源任务性能是否意味着对目标任务的最优预训练?
作者:Steven Gutstein,Brent Lance,Sanjay Shakkottai 机构:Department of Electrical and Computer Engineering, The University of Texas at Austin 链接:https://arxiv.org/abs/2106.11174 摘要:预训练的深度网络通常用于提高神经网络的精度、缩短训练时间。人们通常认为,为获得最佳源任务性能而预训练的网络,最有利于学习任意目标任务。事实往往并非如此。在达到最佳性能之前就停止源任务训练,可以得到一个更适合学习新任务的预训练网络。我们通过若干实验证明了这种效应,以及训练量和学习率的影响。此外,我们还表明,这反映了学习能力的普遍丧失,甚至延伸到重新学习源任务本身。 摘要:Pre-trained deep nets are commonly used to improve accuracies and training times for neural nets. It is generally assumed that pre-training a net for optimal source task performance best prepares it to learn an arbitrary target task. This is generally not true. Stopping source task training, prior to optimal performance, can create a pre-trained net better suited for learning a new task. We performed several experiments demonstrating this effect, as well as the influence of amount of training and of learning rate. Additionally, we show that this reflects a general loss of learning ability that even extends to relearning the source task.
【3】 OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation 标题:OptiDICE:基于平稳分布修正估计的离线策略优化
作者:Jongmin Lee,Wonseok Jeon,Byung-Jun Lee,Joelle Pineau,Kee-Eung Kim 机构:School of Computing; Quebec AI Institute; School of Computer Science; Facebook AI Research; Graduate School of AI 备注:26 pages, 11 figures, Accepted at ICML 2021 链接:https://arxiv.org/abs/2106.10783 摘要:我们考虑离线强化学习(RL)设置,其中智能体的目标是仅从数据中优化策略,而不进行进一步的环境交互。在离线RL中,分布偏移成为主要的困难来源,它源于被优化的目标策略偏离了用于数据收集的行为策略。这通常会导致对动作值的高估,对使用自举的无模型算法造成严重问题。为了缓解这一问题,以往的离线RL算法常常使用鼓励低估动作值的复杂技术,这又引入了一组需要适当调整的额外超参数。在本文中,我们提出了一种以更具原则性的方式防止高估的离线RL算法。与以往的离线RL算法不同,我们的算法OptiDICE直接估计最优策略的平稳分布修正,而不依赖于策略梯度。在一组广泛的离线RL基准数据集上,我们证明OptiDICE的性能与最先进的方法相当。 摘要:We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduces an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy and does not rely on policy-gradients, unlike previous offline RL algorithms. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.
【4】 Optimal Strategies for Decision Theoretic Online Learning 标题:决策论在线学习的最优策略
作者:Yoav Freund 链接:https://arxiv.org/abs/2106.10717 摘要:我们将漂移博弈(drifting games)分析推广到连续时间,并证明:当值函数具有直到四阶的严格正导数时,最优对手是布朗运动。 摘要:We extend the drifting games analysis to continuous time and show that the optimal adversary, if the value function has strictly positive derivatives up to fourth order, is Brownian motion.
【5】 Complexity-Free Generalization via Distributionally Robust Optimization 标题:基于分布鲁棒优化的无复杂度泛化
作者:Henry Lam,Yibo Zeng 机构:Department of Industrial Engineering and Operations Research, Columbia University, New York, NY 链接:https://arxiv.org/abs/2106.11180 摘要:在数据驱动优化和机器学习中,获得泛化界的方法大多建立在经验风险最小化(ERM)的基础上,而经验风险最小化在很大程度上依赖于假设类的函数复杂性。在本文中,我们提出了另一条途径,从分布鲁棒优化(DRO)的解出发获得这类界;DRO是近年来基于最坏情况分析和模糊集(ambiguity set)概念来刻画统计不确定性的数据驱动优化框架。与ERM中依赖假设类复杂性不同,我们的DRO界依赖于模糊集的几何结构及其与真实损失函数的相容性。值得注意的是,当使用最大均值差异(maximum mean discrepancy)作为DRO距离度量时,据我们所知,我们的分析给出了文献中第一个完全依赖于真实损失函数的泛化界,完全不涉及任何复杂度度量或对假设类的限制。 摘要:Established approaches to obtain generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtain these bounds on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of ambiguity set to capture statistical uncertainty. In contrast to the hypothesis class complexity in ERM, our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function. Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies, to the best of our knowledge, the first generalization bound in the literature that depends solely on the true loss function, entirely free of any complexity measures or bounds on the hypothesis class.
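下面给出高斯核下MMD²有偏估计的一个极简NumPy草图,用以说明摘要中作为DRO距离度量的"最大均值差异"概念(核与带宽的选择均为假设):

```python
import numpy as np

def mmd2(x, y, sigma=1.0):
    """有偏的 MMD^2 估计:E[k(x,x')] + E[k(y,y')] - 2E[k(x,y)](高斯核)。"""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2 * gram(x, y).mean()

rng = np.random.default_rng(0)
p1 = rng.normal(0.0, 1.0, size=(200, 2))      # 来自分布 P 的样本
p2 = rng.normal(0.0, 1.0, size=(200, 2))      # P 的另一批样本
q = rng.normal(0.5, 1.0, size=(200, 2))       # 来自分布 Q 的样本
print("MMD^2(P, P') ≈", mmd2(p1, p2))          # 接近 0
print("MMD^2(P, Q)  ≈", mmd2(p1, q))           # 应明显更大
```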
【6】 α-Stable convergence of heavy-tailed infinitely-wide neural networks 标题:重尾无限宽神经网络的α-稳定收敛性
作者:Paul Jung,Hoil Lee,Jiho Lee,Hongseok Yang 链接:https://arxiv.org/abs/2106.11064 摘要:我们考虑无限宽的多层感知器(MLP),它是标准深度前馈神经网络的极限。我们假设,对于每一层,MLP的权重都用i.i.d.样本初始化,其分布或为轻尾(有限方差)分布,或为处于对称$\alpha$-稳定分布吸引域中的重尾分布,其中$\alpha\in(0,2]$可能取决于层。对于该层的偏置项,我们假设用具有同一层$\alpha$参数的对称$\alpha$-稳定分布进行i.i.d.初始化。然后,我们推广了Favaro、Fortini和Peluchetti(2020)的最新结果,证明在适当的缩放下,给定隐藏层所有节点的预激活值向量在极限意义下收敛到一个具有对称$\alpha$-稳定分布的i.i.d.随机变量向量。 摘要:We consider infinitely-wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with i.i.d. samples from either a light-tailed (finite variance) or heavy-tailed distribution in the domain of attraction of a symmetric $\alpha$-stable distribution, where $\alpha\in(0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $\alpha$-stable distribution having the same $\alpha$ parameter of that layer. We then extend a recent result of Favaro, Fortini, and Peluchetti (2020), to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $\alpha$-stable distributions.
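下面是一个极简的NumPy/SciPy草图,演示用对称α-稳定分布初始化一层权重并观察预激活值的重尾行为(其中缩放因子 n**(-1/alpha) 是稳定极限下的常见选择,此处作为假设使用):

```python
import numpy as np
from scipy.stats import levy_stable

alpha, n_in, n_hidden = 1.5, 200, 100
rng = np.random.default_rng(0)

# beta=0 对应对称 alpha-稳定分布
W = levy_stable.rvs(alpha, 0.0, size=(n_hidden, n_in), random_state=1)
b = levy_stable.rvs(alpha, 0.0, size=n_hidden, random_state=2)
x = rng.normal(size=n_in)                        # 任意输入向量

pre_act = (W @ x) * n_in ** (-1.0 / alpha) + b   # 该隐藏层的预激活值
print("预激活值的若干分位数(可见重尾):",
      np.round(np.quantile(pre_act, [0.01, 0.25, 0.5, 0.75, 0.99]), 2))
```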
【7】 Rayleigh-Gauss-Newton optimization with enhanced sampling for variational Monte Carlo 标题:变分蒙特卡罗的增强抽样Rayleigh-Gauss-Newton优化
作者:Robert J. Webber,Michael Lindsey 机构:Courant Institute of Mathematical Sciences, New York University, New York , USA 备注:12 pages, 7 figures 链接:https://arxiv.org/abs/2106.10558 摘要:变分蒙特卡罗(VMC)是一种计算基态波函数的方法,由于引入了基于神经网络的波函数参数化,这种方法最近变得更加强大。然而,有效地训练神经波函数使其收敛到能量最小值仍然是一个难题。在这项工作中,我们分析了VMC中使用的优化和抽样方法,并介绍了改进方法来提高它们的性能。首先,基于理论上的无噪声收敛性分析,提出了一种新的优化算法Rayleigh-Gauss-Newton法,它可以改进梯度下降法和自然梯度下降法,实现超线性收敛。其次,为了在随机噪声存在的情况下实现这种有利的比较,我们分析了采样误差对VMC参数更新的影响,并通过实验证明了并行回火方法可以降低这种影响。特别地,我们证明了RGN对于在优化过程中采样器获得新的配置空间区域时出现的能量峰值具有鲁棒性。最后,将理论应用到实际中,我们将改进的优化和采样方法应用到大晶格的横向场Ising和XXZ模型中,只需更新200-500个参数,就可以得到非常高精度的基态能量估计。 摘要:Variational Monte Carlo (VMC) is an approach for computing ground-state wavefunctions that has recently become more powerful due to the introduction of neural network-based wavefunction parametrizations. However, efficiently training neural wavefunctions to converge to an energy minimum remains a difficult problem. In this work, we analyze optimization and sampling methods used in VMC and introduce alterations to improve their performance. First, based on theoretical convergence analysis in a noiseless setting, we motivate a new optimizer that we call the Rayleigh-Gauss-Newton method, which can improve upon gradient descent and natural gradient descent to achieve superlinear convergence. Second, in order to realize this favorable comparison in the presence of stochastic noise, we analyze the effect of sampling error on VMC parameter updates and experimentally demonstrate that it can be reduced by the parallel tempering method. In particular, we demonstrate that RGN can be made robust to energy spikes that occur when new regions of configuration space become available to the sampler over the course of optimization. Finally, putting theory into practice, we apply our enhanced optimization and sampling methods to the transverse-field Ising and XXZ models on large lattices, yielding ground-state energy estimates with remarkably high accuracy after just 200-500 parameter updates.
预测|估计(11篇)
【1】 VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning 标题:VIMPAC:基于掩蔽标记预测和对比学习的视频预训练
作者:Hao Tan,Jie Lei,Thomas Wolf,Mohit Bansal 机构:UNC Chapel Hill, Huggingface 备注:Under review, 23 Pages 链接:https://arxiv.org/abs/2106.11250 摘要:视频理解依赖于感知全局内容并建模其内部联系(例如因果关系、运动和时空对应)。为了学习这些交互,我们在通过VQ-VAE生成的离散化视频令牌上应用"先掩蔽后预测"的预训练任务。与语言不同(文本令牌之间更独立),相邻的视频令牌通常具有很强的相关性(例如,连续的视频帧通常看起来非常相似),因此均匀地掩蔽单个令牌会使任务过于简单,无法学到有用的表示。为了解决这个问题,我们提出了一种分块掩蔽策略,同时在空间域和时间域中掩蔽相邻的视频令牌。我们还添加了一种无需数据增强的对比学习方法,通过预测视频片段是否来自同一视频来进一步捕获全局内容。我们在未经整理(uncurated)的视频上对模型进行预训练,并表明预训练后的模型可以在多个视频理解数据集(如SSV2、Diving48)上达到最先进的结果。最后,我们对模型的可扩展性和预训练方法的设计进行了详细分析。代码发布于https://github.com/airsplay/vimpac. 摘要:Video understanding relies on perceiving the global content and modeling its internal connections (e.g., causality, movement, and spatio-temporal correspondence). To learn these interactions, we apply a mask-then-predict pre-training task on discretized video tokens generated via VQ-VAE. Unlike language, where the text tokens are more independent, neighboring video tokens typically have strong correlations (e.g., consecutive video frames usually look very similar), and hence uniformly masking individual tokens will make the task too trivial to learn useful representations. To deal with this issue, we propose a block-wise masking strategy where we mask neighboring video tokens in both spatial and temporal domains. We also add an augmentation-free contrastive learning method to further capture the global content by predicting whether the video clips are sampled from the same video. We pre-train our model on uncurated videos and show that our pre-trained model can reach state-of-the-art results on several video understanding datasets (e.g., SSV2, Diving48). Lastly, we provide detailed analyses on model scalability and pre-training method design. Code is released at https://github.com/airsplay/vimpac.
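下面用一个极简的NumPy草图演示摘要所述的分块掩蔽:在(时间, 高, 宽)的视频令牌网格上随机掩蔽若干时空邻域块(块数与块尺寸均为假设的超参数):

```python
import numpy as np

def blockwise_mask(t, h, w, n_blocks=3, max_span=(4, 6, 6), rng=None):
    """在 (t, h, w) 的令牌网格上掩蔽 n_blocks 个随机时空块。"""
    rng = rng or np.random.default_rng(0)
    mask = np.zeros((t, h, w), dtype=bool)
    for _ in range(n_blocks):
        dt, dh, dw = (int(rng.integers(1, m + 1)) for m in max_span)
        t0 = rng.integers(0, t - dt + 1)
        h0 = rng.integers(0, h - dh + 1)
        w0 = rng.integers(0, w - dw + 1)
        mask[t0:t0 + dt, h0:h0 + dh, w0:w0 + dw] = True   # 同时覆盖时空邻域
    return mask

m = blockwise_mask(8, 16, 16)
print("被掩蔽令牌的比例:", round(float(m.mean()), 3))
```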
【2】 Neural Controlled Differential Equations for Online Prediction Tasks 标题:用于在线预测任务的神经控制微分方程
作者:James Morrill,Patrick Kidger,Lingyi Yang,Terry Lyons 机构:Mathematical Institute, University of Oxford, Oxford, UK 链接:https://arxiv.org/abs/2106.11028 摘要:神经受控微分方程(Neural CDEs)是递归神经网络(RNNs)的连续时间扩展,在不规则时间序列函数的建模方面取得了最先进的性能。为了在连续时间内解释离散数据,当前的实现依赖于对数据的非因果插值。当整个时间序列可以预先观察到时这没有问题,但这意味着Neural CDEs不适合用于需要实时做出预测的在线预测任务,而这正是递归网络的主要用例。本文展示了如何纠正这一局限。首先,我们确定了Neural CDE插值格式应满足的若干理论条件,如有界性和唯一性。其次,我们以此为动机引入了满足这些条件的新插值格式,特别是提供了可测性(用于在线预测)和光滑性(用于速度)。第三,我们在MIMIC-IV医学数据库的三个连续监测任务上对我们的在线Neural CDE模型进行了实证基准测试:在所有任务上都优于ODE基线,并且在三个任务中的两个上优于SOTA非ODE基线。 摘要:Neural controlled differential equations (Neural CDEs) are a continuous-time extension of recurrent neural networks (RNNs), achieving state-of-the-art (SOTA) performance at modelling functions of irregular time series. In order to interpret discrete data in continuous time, current implementations rely on non-causal interpolations of the data. This is fine when the whole time series is observed in advance, but means that Neural CDEs are not suitable for use in \textit{online prediction tasks}, where predictions need to be made in real-time: a major use case for recurrent networks. Here, we show how this limitation may be rectified. First, we identify several theoretical conditions that interpolation schemes for Neural CDEs should satisfy, such as boundedness and uniqueness. Second, we use these to motivate the introduction of new schemes that address these conditions, offering in particular measurability (for online prediction), and smoothness (for speed). Third, we empirically benchmark our online Neural CDE model on three continuous monitoring tasks from the MIMIC-IV medical database: we demonstrate improved performance on all tasks against ODE benchmarks, and on two of the three tasks against SOTA non-ODE benchmarks.
【3】 Lossy Compression for Lossless Prediction 标题:有损压缩在无损预测中的应用
作者:Yann Dubois,Benjamin Bloem-Reddy,Karen Ullrich,Chris J. Maddison 机构:The University of British Columbia, Facebook AI Research, University of Toronto, Vector Institute 链接:https://arxiv.org/abs/2106.10800 摘要:大多数数据是自动收集的,并且只被算法"看到"。然而,数据压缩器保持的是感知保真度,而不仅仅是执行下游任务的算法所需的信息。在本文中,我们刻画了在一组变换(如数据增强)下保持不变的所有预测任务上确保高性能所需的比特率。基于我们的理论,我们设计了训练神经压缩器的无监督目标。利用这些目标,我们训练了一个通用图像压缩器,与8个数据集上的JPEG相比,在不降低下游分类性能的情况下实现了显著的码率节省(在ImageNet上超过$1000\times$)。 摘要:Most data is automatically collected and only ever "seen" by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize the bit-rate required to ensure high performance on all predictive tasks that are invariant under a set of transformations, such as data augmentations. Based on our theory, we design unsupervised objectives for training neural compressors. Using these objectives, we train a generic image compressor that achieves substantial rate savings (more than $1000\times$ on ImageNet) compared to JPEG on 8 datasets, without decreasing downstream classification performance.
【4】 On predicting research grants productivity 标题:论研究资助产出的预测
作者:Jorge A. V. Tohalino,Diego R. Amancio 机构:Institute of Mathematics and Computer Science, Department of Computer Science, University of São Paulo, São Carlos, SP, Brazil 链接:https://arxiv.org/abs/2106.10700 摘要:了解与成功提案相关的原因对于改进评估过程至关重要。在此背景下,我们分析了文献计量学特征是否能够预测研究资助的成功。我们提取了巴西研究人员学术史的特征,包括研究主题、隶属关系、出版物数量和知名度。然后利用提取的特征通过机器学习预测医学、牙科和兽医三个主要研究领域的生产率。我们发现,研究主题和出版历史在预测生产力方面发挥了作用。此外,基于机构的特征在与其他特征结合时证明是相关的。虽然最好的结果优于基于文本的属性,但评估的特征没有高度的区分性。我们的研究结果表明,预测研究资助的成功,至少在考虑到这一系列文献计量学特征的情况下,不是一项简单的任务。 摘要:Understanding the reasons associated with successful proposals is of paramount importance to improve evaluation processes. In this context, we analyzed whether bibliometric features are able to predict the success of research grants. We extracted features aiming at characterizing the academic history of Brazilian researchers, including research topics, affiliations, number of publications and visibility. The extracted features were then used to predict grants productivity via machine learning in three major research areas, namely Medicine, Dentistry and Veterinary Medicine. We found that research subject and publication history play a role in predicting productivity. In addition, institution-based features turned out to be relevant when combined with other features. While the best results outperformed text-based attributes, the evaluated features were not highly discriminative. Our findings indicate that predicting grants success, at least with the considered set of bibliometric features, is not a trivial task.
【5】 Fast PDN Impedance Prediction Using Deep Learning 标题:基于深度学习的PDN阻抗快速预测
作者:Ling Zhang,Jack Juang,Zurab Kiguradze,Bo Pu,Shuai Jin,Songping Wu,Zhiping Yang,Chulsoon Hwang 机构:EMC Laboratory, Missouri University, of Science and Technology, Rolla, MO, Google Inc., Mountain View, CA, USA, Correspondence, Univeristy of Science and Technology., Present Address, Enterprise Dr., Rolla, MO 链接:https://arxiv.org/abs/2106.10693 摘要:对于具有不规则板形和多层堆叠的印刷电路板(pcb)的配电网(PDN)的建模与仿真,采用全波仿真方法计算效率很低。提出了一种利用深度学习进行PDN阻抗预测的新概念。采用边界元法(BEM)有效地计算了任意板形和叠层的阻抗。然后随机生成100多万块不同形状、堆叠、IC定位和decap布局的电路板来训练深度神经网络(DNN)。经过训练的DNN可以准确地预测没有用于训练的新电路板配置的阻抗。使用训练的DNN所消耗的时间仅为0.1秒,比BEM方法快100多倍,比全波模拟快5000倍。 摘要:Modeling and simulating a power distribution network (PDN) for printed circuit boards (PCBs) with irregular board shapes and multi-layer stackup is computationally inefficient using full-wave simulations. This paper presents a new concept of using deep learning for PDN impedance prediction. A boundary element method (BEM) is applied to efficiently calculate the impedance for arbitrary board shape and stackup. Then over one million boards with different shapes, stackup, IC location, and decap placement are randomly generated to train a deep neural network (DNN). The trained DNN can predict the impedance accurately for new board configurations that have not been used for training. The consumed time using the trained DNN is only 0.1 seconds, which is over 100 times faster than the BEM method and 5000 times faster than full-wave simulations.
【6】 Neural network interpretability for forecasting of aggregated renewable generation 标题:聚合可再生发电量预测的神经网络可解释性
作者:Yucun Lu,Ilgiz Murzakhanov,Spyros Chatzivasileiadis 机构:Center for Electric Power and Energy, Technical University of Denmark, Kgs. Lyngby, Denmark 链接:https://arxiv.org/abs/2106.10476 摘要:随着可再生能源的快速发展,出现了大量小型光伏(PV)产消者(prosumers)。由于太阳能发电的不确定性,聚合后的产消者需要预测太阳能发电量,以及发电量是否会大于负荷。本文提出了两种可解释的神经网络来解决这一问题:一种二元分类神经网络和一种回归神经网络。神经网络基于TensorFlow构建。通过三种基于梯度的方法(积分梯度、期望梯度和DeepLIFT)来考察全局特征重要性和局部特征贡献。此外,我们还利用贝叶斯神经网络估计预测的不确定性,以检测预测可能失败的异常情况。由基于梯度的方法解释、并辅以不确定性估计的神经网络,为决策者提供了稳健且可解释的预测。 摘要:With the rapid growth of renewable energy, lots of small photovoltaic (PV) prosumers emerge. Due to the uncertainty of solar power generation, there is a need for aggregated prosumers to predict solar power generation and whether solar power generation will be larger than load. This paper presents two interpretable neural networks to solve the problem: one binary classification neural network and one regression neural network. The neural networks are built using TensorFlow. The global feature importance and local feature contributions are examined by three gradient-based methods: Integrated Gradients, Expected Gradients, and DeepLIFT. Moreover, we detect abnormal cases when predictions might fail by estimating the prediction uncertainty using Bayesian neural networks. Neural networks, which are interpreted by gradient-based methods and complemented with uncertainty estimation, provide robust and explainable forecasting for decision-makers.
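下面给出摘要中提到的积分梯度(Integrated Gradients)的一个极简PyTorch实现草图(模型与输入为假设的玩具示例,步数等参数也是假设):

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """沿基线到输入的直线路径对梯度取平均,再乘以 (x - baseline)。"""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = (baseline + alphas * (x - baseline)).requires_grad_(True)
    model(path).sum().backward()                  # 一次性求路径上所有点的梯度
    return (x - baseline) * path.grad.mean(dim=0)

# 用法示意:一个假设的小型打分模型(例如发电量相对负荷的打分)
model = torch.nn.Sequential(torch.nn.Linear(5, 8), torch.nn.Tanh(),
                            torch.nn.Linear(8, 1))
x = torch.randn(5)
print("各输入特征的归因:", integrated_gradients(model, x))
```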
【7】 Prediction of the facial growth direction with Machine Learning methods 标题:基于机器学习方法的面部生长方向预测
作者:Stanisław Kaźmierczak,Zofia Juszka,Piotr Fudalej,Jacek Mańdziuk 机构:Warsaw University of Technology, Warsaw, Poland, Prof. Loster's Orthodontics, ul. Bartłomieja Nowodworskiego, Krakow, Poland, Department of Orthodontics, Jagiellonian University in Krakow, Krakow, Poland 链接:https://arxiv.org/abs/2106.10464 摘要:预测面部生长(FG)方向的首次尝试是在半个多世纪前做出的。尽管经过多次尝试和时间的流逝,令人满意的方法仍未建立,这个问题仍然对医学专家构成挑战。据我们所知,本文是第一个用机器学习方法预测FG方向的工作。数据分析揭示了问题的内在复杂性,并解释了基于二维X射线图像预测FG方向困难的原因。为了进行生长预测,我们采用了从logistic回归、树集成到神经网络的多种算法,并考虑了三种略有不同的问题表述。所得分类精度在71%到75%之间。 摘要:First attempts of prediction of the facial growth (FG) direction were made over half of a century ago. Despite numerous attempts and elapsed time, a satisfactory method has not been established yet and the problem still poses a challenge for medical experts. To our knowledge, this paper is the first Machine Learning approach to the prediction of FG direction. Conducted data analysis reveals the inherent complexity of the problem and explains the reasons of difficulty in FG direction prediction based on 2D X-ray images. To perform growth forecasting, we employ a wide range of algorithms, from logistic regression, through tree ensembles to neural networks and consider three, slightly different, problem formulations. The resulting classification accuracy varies between 71% and 75%.
【8】 Robust M-estimation-based Tensor Ring Completion: a Half-quadratic Minimization Approach 标题:基于鲁棒M-估计的张量环完备化:半二次最小化方法
作者:Yicong He,George K. Atia 机构: Atia are with the Department of Electricaland Computer Engineering, University of Central Florida 链接:https://arxiv.org/abs/2106.10422 摘要:张量完备是从部分观测的数据中估计高阶数据缺失值的问题。在张量秩的几种定义中,张量环秩提供了建立不同阶张量模型所需的灵活性和准确性,这激发了近年来张量环完备的研究。然而,由于普遍存在的异常值而导致的数据损坏对现有算法提出了重大挑战。在本文中,我们提出了一种稳健的张量环完备方法,该方法使用M-估计作为其误差统计量,可以显著减轻异常值的影响。利用半二次(HQ)方法,我们将问题转化为一个加权张量完备问题。提出了两种基于截断奇异值分解和矩阵分解的HQ算法,并对其收敛性和复杂性进行了分析。讨论了该方法对张量秩的替代定义的可扩展性。实验结果表明,该方法优于现有的张量完备鲁棒算法。 摘要:Tensor completion is the problem of estimating the missing values of high-order data from partially observed entries. Among several definitions of tensor rank, tensor ring rank affords the flexibility and accuracy needed to model tensors of different orders, which motivated recent efforts on tensor-ring completion. However, data corruption due to prevailing outliers poses major challenges to existing algorithms. In this paper, we develop a robust approach to tensor ring completion that uses an M-estimator as its error statistic, which can significantly alleviate the effect of outliers. Leveraging a half-quadratic (HQ) method, we reformulate the problem as one of weighted tensor completion. We present two HQ-based algorithms based on truncated singular value decomposition and matrix factorization along with their convergence and complexity analysis. Extendibility of the proposed approach to alternative definitions of tensor rank is also discussed. The experimental results demonstrate the superior performance of the proposed approach over state-of-the-art robust algorithms for tensor completion.
【9】 Prediction-Free, Real-Time Flexible Control of Tidal Lagoons through Proximal Policy Optimisation: A Case Study for the Swansea Lagoon 标题:通过近端策略优化实现潮汐泻湖的免预报实时灵活控制:斯旺西泻湖案例研究
作者:Túlio Marcondes Moreira,Jackson Geraldo de Faria Jr,Pedro O. S. Vaz de Melo,Luiz Chaimowicz,Gilberto Medeiros-Ribeiro 机构:Computer Science Department (DCC), Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil 备注:33 pages, 10 figures and 11 tables 链接:https://arxiv.org/abs/2106.10360 摘要:潮差结构因其有潜力在不排放温室气体的情况下产生较为可预测的能量,而被考虑用于大规模发电。由于驱动潮汐的主要强迫分量具有确定性动力学,给定潮汐电站的可用能量一直可以通过解析和数值优化程序被估计为一个基本可预测的事件。这一约束使得最先进的灵活运行方法依赖潮汐预报(与实测数据同步,并向未来延伸若干半潮周期)来推断潮汐泻湖的最优运行策略,而且每来一次新的潮汐都需要重新运行优化程序,带来额外成本。在本文中,我们提出了一种新方法:通过Unity ML-Agents,用近端策略优化(PPO)对潮汐泻湖进行优化运行。我们以斯旺西湾潮汐泻湖为例,将该技术与从文献中整理出的6种不同的运行优化方法(基线)进行了比较。结果表明,我们的方法通过优化涡轮机和水闸的运行策略成功实现了发电量最大化,无论使用何种测试数据,都能取得与最先进优化方法相当的结果,而且只需训练一次,即可仅凭实测海洋数据执行实时灵活控制。 摘要:Tidal range structures have been considered for large scale electricity generation for their potential ability to produce reasonable predictable energy without the emission of greenhouse gases. Once the main forcing components for driving the tides have deterministic dynamics, the available energy in a given tidal power plant has been estimated, through analytical and numerical optimisation routines, as a mostly predictable event. This constraint imposes state-of-art flexible operation methods to rely on tidal predictions (concurrent with measured data and up to a multiple of half-tidal cycles into the future) to infer best operational strategies for tidal lagoons, with the additional cost of requiring to run optimisation routines for every new tide. In this paper, we propose a novel optimised operation of tidal lagoons with proximal policy optimisation through Unity ML-Agents. We compare this technique with 6 different operation optimisation approaches (baselines) devised from the literature, utilising the Swansea Bay Tidal Lagoon as a case study. We show that our approach is successful in maximising energy generation through an optimised operational policy of turbines and sluices, yielding competitive results with state-of-the-art methods of optimisation, regardless of test data used, requiring training once and performing real-time flexible control with measured ocean data only.
【10】 Low-rank Characteristic Tensor Density Estimation Part II: Compression and Latent Density Estimation 标题:低秩特征张量密度估计第二部分:压缩和潜在密度估计
作者:Magda Amiridi,Nikos Kargas,Nicholas D. Sidiropoulos 机构:University of Minnesota; now with Amazon 链接:https://arxiv.org/abs/2106.10591 摘要:生成概率模型的学习是机器学习中的一个核心问题,由于维数灾难的存在,它给机器学习带来了巨大的挑战。本文提出了一个联合降维和非参数密度估计框架,使用一种新的估计器,可以显式地捕捉输入数据的适当降维表示的潜在分布。其思想是联合设计一个非线性降维自动编码器,将训练数据建模为一组简约的潜在随机变量,并学习潜在变量在Fourier域联合分布的标准低秩张量模型。所提出的潜在密度模型是非参数且通用的,这与变分自编码器中预先设定的先验不同。自动编码器和潜在密度估计器的联合优化通过一个公式来实现,该公式通过最小化潜在域中的负对数似然和自动编码器重建损失的组合来学习两者。我们证明了所提出的模型在玩具、表格和图像数据集的回归任务、采样和异常检测方面取得了很好的效果。 摘要:Learning generative probabilistic models is a core problem in machine learning, which presents significant challenges due to the curse of dimensionality. This paper proposes a joint dimensionality reduction and non-parametric density estimation framework, using a novel estimator that can explicitly capture the underlying distribution of appropriate reduced-dimension representations of the input data. The idea is to jointly design a nonlinear dimensionality reducing auto-encoder to model the training data in terms of a parsimonious set of latent random variables, and learn a canonical low-rank tensor model of the joint distribution of the latent variables in the Fourier domain. The proposed latent density model is non-parametric and universal, as opposed to the predefined prior that is assumed in variational auto-encoders. Joint optimization of the auto-encoder and the latent density estimator is pursued via a formulation which learns both by minimizing a combination of the negative log-likelihood in the latent domain and the auto-encoder reconstruction loss. We demonstrate that the proposed model achieves very promising results on toy, tabular, and image datasets on regression tasks, sampling, and anomaly detection.
【11】 On the benefits of maximum likelihood estimation for Regression and Forecasting 标题:论极大似然估计在回归预测中的效益
作者:Pranjal Awasthi,Abhimanyu Das,Rajat Sen,Ananda Theertha Suresh 机构:Google Research 链接:https://arxiv.org/abs/2106.10370 摘要:我们主张用一种实用的最大似然估计(MLE)方法进行回归和预测,作为对特定目标度量的典型经验风险最小化(ERM)方法的替代方法。这种方法更适合于捕获归纳偏差,例如数据集中的先验领域知识,并且可以在推理时输出事后估计器,从而优化不同类型的目标度量。我们给出的理论结果表明,在某些一般条件下,我们的方法总是与目标度量的任何估计相竞争的,并且在许多实际情况下(如Poisson回归)实际上可以比ERM优越得多。我们的经验证明,我们的方法实例化一个设计良好的通用混合似然家庭可以获得优于ERM的各种任务的时间序列预测和回归数据集不同的数据分布。 摘要:We advocate for a practical Maximum Likelihood Estimation (MLE) approach for regression and forecasting, as an alternative to the typical approach of Empirical Risk Minimization (ERM) for a specific target metric. This approach is better suited to capture inductive biases such as prior domain knowledge in datasets, and can output post-hoc estimators at inference time that can optimize different types of target metrics. We present theoretical results to demonstrate that our approach is always competitive with any estimator for the target metric under some general conditions, and in many practical settings (such as Poisson Regression) can actually be much superior to ERM. We demonstrate empirically that our method instantiated with a well-designed general purpose mixture likelihood family can obtain superior performance over ERM for a variety of tasks across time-series forecasting and regression datasets with different data distributions.
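下面用一个极简的NumPy/SciPy草图对比泊松回归中的MLE与平方损失ERM(数据生成过程与参数均为假设的玩具设置,仅用于说明两种拟合准则的差异):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([0.8, -0.5, 0.3])
y = rng.poisson(np.exp(X @ w_true))              # 泊松计数观测

def neg_log_lik(w):                              # 泊松负对数似然(略去 log y! 常数)
    eta = X @ w
    return np.sum(np.exp(eta) - y * eta)

def squared_err(w):                              # 对同一参数化的平方损失 ERM
    return np.sum((y - np.exp(X @ w)) ** 2)

w_mle = minimize(neg_log_lik, np.zeros(d)).x
w_erm = minimize(squared_err, np.zeros(d)).x
print("真值 :", w_true)
print("MLE  :", np.round(w_mle, 3))
print("ERM  :", np.round(w_erm, 3))
```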
其他神经网络|深度学习|模型|建模(37篇)
【1】 A Discriminative Entity-Aware Language Model for Virtual Assistants 标题:一种面向虚拟助手的判别式实体感知语言模型
作者:Mandana Saebi,Ernest Pusateri,Aaksha Meghawat,Christophe Van Gysel 机构:University of Notre Dame, Notre Dame, IN, USA, Apple, Cupertino, CA, USA 备注:To appear in Interspeech 2021 链接:https://arxiv.org/abs/2106.11292 摘要:高质量的自动语音识别(ASR)是虚拟助手(VAs)正常工作的关键。然而,ASR在包含命名实体的VA请求上往往表现很差。在这项工作中,我们从一个观察出发:命名实体上的许多ASR错误与真实世界知识不一致。我们扩展了以往的判别式n-gram语言建模方法,利用能够刻画实体类型-实体以及实体-实体关系的特征,引入来自知识图谱(KG)的真实世界知识。我们通过一个高效的格重打分(lattice rescoring)过程来应用我们的模型,在若干覆盖较冷门实体的合成测试集上实现了超过25%的相对句错误率下降,而在均匀采样的VA测试集上几乎没有性能退化。 摘要:High-quality automatic speech recognition (ASR) is essential for virtual assistants (VAs) to work well. However, ASR often performs poorly on VA requests containing named entities. In this work, we start from the observation that many ASR errors on named entities are inconsistent with real-world knowledge. We extend previous discriminative n-gram language modeling approaches to incorporate real-world knowledge from a Knowledge Graph (KG), using features that capture entity type-entity and entity-entity relationships. We apply our model through an efficient lattice rescoring process, achieving relative sentence error rate reductions of more than 25% on some synthesized test sets covering less popular entities, with minimal degradation on a uniformly sampled VA test set.
【2】 Can contrastive learning avoid shortcut solutions? 标题:对比学习能避免捷径解吗?
作者:Joshua Robinson,Li Sun,Ke Yu,Kayhan Batmanghelich,Stefanie Jegelka,Suvrit Sra 机构:†Massachusetts Institute of Technology, ∗University of Pittsburgh 备注:Preprint 链接:https://arxiv.org/abs/2106.11230 摘要:通过对比学习获得的表示的泛化能力,关键取决于从数据中提取了哪些特征。然而,我们观察到,对比损失并不总能充分引导提取哪些特征,这种行为可能通过"捷径"(即无意中抑制重要的预测特征)对下游任务的性能产生负面影响。我们发现,特征提取受到所谓实例判别任务(即从不相似点对中区分相似点对的任务)难度的影响。尽管更难的样本对改进了某些特征的表示,但这种改进是以抑制先前已被良好表示的特征为代价的。对此,我们提出了隐式特征修正(implicit feature modification, IFM),一种改变正负样本的方法,以引导对比模型捕捉更广泛的预测特征。经验上,我们观察到IFM减少了特征抑制,从而提高了视觉和医学成像任务的性能。代码见:https://github.com/joshr17/IFM 摘要:The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features. We find that feature extraction is influenced by the difficulty of the so-called instance discrimination task (i.e., the task of discriminating pairs of similar points from pairs of dissimilar ones). Although harder pairs improve the representation of some features, the improvement comes at the cost of suppressing previously well represented features. In response, we propose implicit feature modification (IFM), a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features. Empirically, we observe that IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks. The code is available at: https://github.com/joshr17/IFM
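下面给出实例判别所用InfoNCE对比损失的一个极简PyTorch草图(SimCLR式的简化形式,未包含IFM对样本的修改;温度等超参数为假设):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """z1[i] 与 z2[i] 是同一样本两个视图的嵌入(正对),批内其余为负对。"""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau               # (batch, batch) 余弦相似度
    labels = torch.arange(z1.size(0))        # 对角线元素为正对
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)   # 假设的两组视图嵌入
print("对比损失:", info_nce(z1, z2).item())
```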
【3】 Effects of boundary conditions in fully convolutional networks for learning spatio-temporal dynamics 标题:全卷积网络中边界条件对时空动力学学习的影响
作者:Antonio Alguacil,Gonçalves Pinto,Michael Bauerheim,Marc C. Jacob,Stéphane Moreau 机构:Department of Mechanical Engineering, University of Sherbrooke, Sherbrooke, QC, Canada; Department of Aerodynamics, Energetics and Propulsion, ISAE-Supaero, Avenue Edouard Belin, Toulouse, France 备注:17 pages, 8 figures, submitted to ECML PKDD 2021 Conference 链接:https://arxiv.org/abs/2106.11160 摘要:边界条件的精确建模是计算物理的关键。神经网络越来越多地被用作物理相关问题的代理模型,这要求人们更好地理解边界条件的处理及其对网络精度的影响。本文在将全卷积网络应用于递推任务的背景下,研究了施加边界条件的几种策略(即填充、改进的空间上下文,以及物理边界的显式编码)。这些策略在两个由偏微分方程描述的时空演化问题上进行了评估:声波的二维传播(双曲型PDE)和热方程(抛物型PDE)。结果表明,在这类递推任务中,精度和稳定性对边界实现都高度敏感。进而证明了最佳填充策略的选择与数据语义直接相关。此外,引入额外的输入空间上下文或基于物理的显式规则,可以更好地处理边界,尤其是在递推次数很多时,从而得到更鲁棒、更稳定的神经网络,同时有利于此类网络的设计和通用性。 摘要:Accurate modeling of boundary conditions is crucial in computational physics. The ever increasing use of neural networks as surrogates for physics-related problems calls for an improved understanding of boundary condition treatment, and its influence on the network accuracy. In this paper, several strategies to impose boundary conditions (namely padding, improved spatial context, and explicit encoding of physical boundaries) are investigated in the context of fully convolutional networks applied to recurrent tasks. These strategies are evaluated on two spatio-temporal evolving problems modeled by partial differential equations: the 2D propagation of acoustic waves (hyperbolic PDE) and the heat equation (parabolic PDE). Results reveal a high sensitivity of both accuracy and stability on the boundary implementation in such recurrent tasks. It is then demonstrated that the choice of the optimal padding strategy is directly linked to the data semantics. Furthermore, the inclusion of additional input spatial context or explicit physics-based rules allows a better handling of boundaries in particular for large number of recurrences, resulting in more robust and stable neural networks, while facilitating the design and versatility of such networks.
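下面用一个极简的PyTorch草图对比卷积层的几种填充(边界)策略,对应文中讨论的零填充、周期边界等(数值与卷积核仅用于观察边界处输出的差异):

```python
import torch
import torch.nn as nn

x = torch.arange(1.0, 10.0).view(1, 1, 3, 3)        # 3x3 的玩具输入
for mode in ["zeros", "circular", "replicate"]:
    conv = nn.Conv2d(1, 1, kernel_size=3, padding=1,
                     padding_mode=mode, bias=False)
    nn.init.constant_(conv.weight, 1.0 / 9.0)        # 均值滤波核,便于比较
    with torch.no_grad():
        print(mode, conv(x)[0, 0, 0].tolist())       # 打印第一行,观察边界效应
```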
【4】 Curriculum-Driven Multi-Agent Learning and the Role of Implicit Communication in Teamwork 标题:课程驱动的多Agent学习与隐性沟通在团队合作中的作用
作者:Niko A. Grupen,Daniel D. Lee,Bart Selman 机构: Cornell University, Cornell Tech 备注:18 pages, 10 figures 链接:https://arxiv.org/abs/2106.11156 摘要:我们提出了一个课程驱动的学习策略来解决多智能体协调任务。我们的方法受到动物通信研究的启发,这表明两个简单的设计特征(相互奖励和分散)在本质上支持广泛的通信协议。我们强调了将紧急通信类似地解释为频谱的重要性。我们引入了一个环形的,连续的空间追踪回避环境,并证明了单纯的分散学习并不能很好地执行。在此基础上,提出了一种新的课程驱动的多智能体学习策略。对追踪规避的实验表明,我们的方法能够使分散的追踪者学会协调和捕获一个优秀的规避者,显著优于复杂的分析策略。我们通过额外的定量分析(包括基于影响力的措施,如即时协调)认为,紧急内隐沟通在实现更高水平的协调方面发挥着重要作用。 摘要:We propose a curriculum-driven learning strategy for solving difficult multi-agent coordination tasks. Our method is inspired by a study of animal communication, which shows that two straightforward design features (mutual reward and decentralization) support a vast spectrum of communication protocols in nature. We highlight the importance of similarly interpreting emergent communication as a spectrum. We introduce a toroidal, continuous-space pursuit-evasion environment and show that naive decentralized learning does not perform well. We then propose a novel curriculum-driven strategy for multi-agent learning. Experiments with pursuit-evasion show that our approach enables decentralized pursuers to learn to coordinate and capture a superior evader, significantly outperforming sophisticated analytical policies. We argue through additional quantitative analysis -- including influence-based measures such as Instantaneous Coordination -- that emergent implicit communication plays a large role in enabling superior levels of coordination.
【5】 Decadal Forecasts with ResDMD: a Residual DMD Neural Network 标题:基于ResDMD的年代际预报:一种残差DMD神经网络
作者:Eduardo Rodrigues,Bianca Zadrozny,Campbell Watson,David Gold 备注:Accepted to ICML 2021 Workshop Tackling Climate Change with Machine Learning 链接:https://arxiv.org/abs/2106.11111 摘要:运营预测中心正在投资十年(1-10年)预测系统,以支持更具气候适应能力的社会的长期决策。以前采用的一种方法是动态模式分解(DMD)算法,也称为线性逆模型,它将线性动态模型与数据相匹配。虽然DMD通常将真实动力学中的非线性项近似为随机噪声的线性系统,但是我们研究了DMD的一个扩展,它将非线性项显式地表示为神经网络。我们的权值初始化允许网络在训练前产生合理的结果,然后在训练后当数据可用时改进预测。在这篇短文中,我们评估了所提出的模拟全球海面温度的架构,并将其结果与最先进的动态模式CFSv2所产生的标准DMD和季节预报进行了比较。 摘要:Operational forecasting centers are investing in decadal (1-10 year) forecast systems to support long-term decision making for a more climate-resilient society. One method that has previously been employed is the Dynamic Mode Decomposition (DMD) algorithm - also known as the Linear Inverse Model - which fits linear dynamical models to data. While the DMD usually approximates non-linear terms in the true dynamics as a linear system with random noise, we investigate an extension to the DMD that explicitly represents the non-linear terms as a neural network. Our weight initialization allows the network to produce sensible results before training and then improve the prediction after training as data becomes available. In this short paper, we evaluate the proposed architecture for simulating global sea surface temperatures and compare the results with the standard DMD and seasonal forecasts produced by the state-of-the-art dynamical model, CFSv2.
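下面用一个极简的NumPy草图演示标准DMD(线性逆模型)的核心步骤:用最小二乘拟合线性演化算子A,使X_{t+1}≈A X_t;ResDMD再用神经网络显式拟合其非线性残差,此处仅演示线性部分(玩具动力系统为假设):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 200
A_true = np.linalg.qr(rng.normal(size=(d, d)))[0] * 0.95   # 稳定的真实算子
X = np.zeros((d, T))
X[:, 0] = rng.normal(size=d)
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + 0.01 * rng.normal(size=d)

X0, X1 = X[:, :-1], X[:, 1:]
A_dmd = X1 @ np.linalg.pinv(X0)                  # DMD:最小二乘意义下的线性算子
print("算子估计误差:", np.linalg.norm(A_dmd - A_true))

residual = X1 - A_dmd @ X0                       # 可交由神经网络学习的非线性残差
print("线性模型残差范数:", np.linalg.norm(residual))
```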
【6】 Analytically Tractable Bayesian Deep Q-Learning 标题:分析易处理的贝叶斯深度Q-学习
作者:Luong-Ha Nguyen,James-A. Goulet 机构:Department of Civil, Geologic and Mining Engineering, Polytechnique Montréal, CANADA 备注:19 pages, 4 figures 链接:https://arxiv.org/abs/2106.11086 摘要:自从深度Q-learning(DQN)证明强化学习(RL)能够在视频游戏基准测试上达到人类水平以来,RL受到了越来越多的关注。在这类复杂环境中训练神经网络的共识做法是依赖基于梯度的优化。尽管存在替代的贝叶斯深度学习方法,但其中大多数仍然依赖基于梯度的优化,并且通常无法扩展到Atari游戏环境这类基准上。此外,这些方法都不支持对定义神经网络的权重和偏置进行解析推断。在本文中,我们展示了如何调整时间差分Q-learning框架,使之与可处理近似高斯推断(TAGI)兼容;TAGI允许用封闭形式的解析方法学习神经网络的参数。通过同策略(on-policy)和异策略(off-policy)强化学习方法的实验,我们证明TAGI在使用更少超参数且不依赖基于梯度优化的情况下,可以达到与反向传播训练网络相当的性能。 摘要:Reinforcement learning (RL) has gained increasing interest since the demonstration it was able to reach human performance on video game benchmarks using deep Q-learning (DQN). The current consensus for training neural networks on such complex environments is to rely on gradient-based optimization. Although alternative Bayesian deep learning methods exist, most of them still rely on gradient-based optimization, and they typically do not scale on benchmarks such as the Atari game environment. Moreover none of these approaches allow performing the analytical inference for the weights and biases defining the neural network. In this paper, we present how we can adapt the temporal difference Q-learning framework to make it compatible with the tractable approximate Gaussian inference (TAGI), which allows learning the parameters of a neural network using a closed-form analytical method. Throughout the experiments with on- and off-policy reinforcement learning approaches, we demonstrate that TAGI can reach a performance comparable to backpropagation-trained networks while using fewer hyperparameters, and without relying on gradient-based optimization.
【7】 Improving Multi-Modal Learning with Uni-Modal Teachers 标题:用单模态教师促进多模态学习
作者:Chenzhuang Du,Tingle Li,Yichen Liu,Zixin Wen,Tianyu Hua,Yue Wang,Hang Zhao 机构:IIIS, Tsinghua University, UIBE Beijing, Massachusetts Institute of Technology, Shanghai Qi Zhi Institute 链接:https://arxiv.org/abs/2106.11059 摘要:学习多模态表示是迈向机器人真实世界应用的重要一步,为此人们已经开发了各种多模态融合模型。然而,我们观察到,现有模型的目标大多基于联合训练,往往存在对每个模态学习到较差表示的问题。我们将这一问题命名为"模态失效"(Modality Failure),并假设融合方法中各模态的不平衡以及共同目标的隐式偏差,使得每个模态的编码器无法进行充分的特征学习。为此,本文提出了一种新的多模态学习方法,即单模态教师(Uni-Modal Teacher),将融合目标与单模态蒸馏相结合来解决模态失效问题。结果表明,该方法不仅大幅改善了各模态的表示,而且提高了多模态任务的整体性能。我们的方法可以有效推广到大多数多模态融合方法。我们在VGGSound视听分类任务上取得了3%以上的提升,在NYU Depth V2 RGB-D图像分割任务上也提高了性能。 摘要:Learning multi-modal representations is an essential step towards real-world robotic applications, and various multi-modal fusion models have been developed for this purpose. However, we observe that existing models, whose objectives are mostly based on joint training, often suffer from learning inferior representations of each modality. We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion method prevent encoders of each modality from sufficient feature learning. To this end, we propose a new multi-modal learning method, Uni-Modal Teacher, which combines the fusion objective and uni-modal distillation to tackle the modality failure problem. We show that our method not only drastically improves the representation of each modality, but also improves the overall multi-modal task performance. Our method can be effectively generalized to most multi-modal fusion approaches. We achieve more than 3% improvement on the VGGSound audio-visual classification task, as well as improving performance on the NYU depth V2 RGB-D image segmentation task.
【8】 Leveraging Language to Learn Program Abstractions and Search Heuristics 标题:利用语言学习程序抽象和搜索启发式
作者:Catherine Wong,Kevin Ellis,Joshua B. Tenenbaum,Jacob Andreas 机构:MIT, Cornell University, Center for Brains, Minds and Machines 备注:appeared in Thirty-eighth International Conference on Machine Learning (ICML 2021) 链接:https://arxiv.org/abs/2106.11053 摘要:归纳程序综合,即从期望行为的示例中推断程序,为构建可解释、健壮且可泛化的机器学习系统提供了一个通用范例。有效的程序综合取决于两个关键要素:一个强大的函数库,从中构建程序;以及一个高效的搜索策略,用于查找解决给定任务的程序。我们介绍了LAPS(Language for Abstraction and Program Search),一种利用自然语言注释来共同指导库学习与面向综合的神经引导搜索模型的技术。当集成到最先进的库学习系统(DreamCoder)中时,LAPS生成更高质量的库,并在字符串编辑、图像合成和场景抽象推理三个领域提高搜索效率和泛化能力,即使在测试时没有可用的自然语言提示。 摘要:Inductive program synthesis, or inferring programs from examples of desired behavior, offers a general paradigm for building interpretable, robust, and generalizable machine learning systems. Effective program synthesis depends on two key ingredients: a strong library of functions from which to build programs, and an efficient search strategy for finding programs that solve a given task. We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization on three domains -- string editing, image composition, and abstract reasoning about scenes -- even when no natural language hints are available at test time.
【9】 Know Your Model (KYM): Increasing Trust in AI and Machine Learning 标题:了解你的模型(KYM):增强对人工智能和机器学习的信任
作者:Mary Roszel,Robert Norvill,Jean Hilger,Radu State 机构:University of Luxembourg, Banque et Caisse d'Epargne de l'Etat 备注:10 pages 链接:https://arxiv.org/abs/2106.11036 摘要:人工智能系统的广泛应用引起了人们对此类系统对社会的潜在影响的关注。特别值得关注的是预测错误可能对现实世界场景产生的后果,以及人类对人工智能系统的信任。有必要了解我们如何评估人工智能的可信性,以及个人和实体如何开发可信的人工智能系统。在本文中,我们分析了可信性的每一个要素,并提供了一套20条准则,这些准则可用于确保最佳的人工智能功能,同时考虑到对人类更广泛的道德、技术和实际影响。此外,这些准则有助于确保可信性是可证明且可演示的,它们与实现无关,并且可以应用于任何部门的任何人工智能系统。 摘要:The widespread utilization of AI systems has drawn attention to the potential impacts of such systems on society. Of particular concern are the consequences that prediction errors may have on real-world scenarios, and the trust humanity places in AI systems. It is necessary to understand how we can evaluate trustworthiness in AI and how individuals and entities alike can develop trustworthy AI systems. In this paper, we analyze each element of trustworthiness and provide a set of 20 guidelines that can be leveraged to ensure optimal AI functionality while taking into account the greater ethical, technical, and practical impacts to humanity. Moreover, the guidelines help ensure that trustworthiness is provable and can be demonstrated, they are implementation agnostic, and they can be applied to any AI system in any sector.
【10】 Friendly Training: Neural Networks Can Adapt Data To Make Learning Easier 标题:友好的训练:神经网络可以调整数据,使学习变得更容易
作者:Simone Marullo,Matteo Tiezzi,Marco Gori,Stefano Melacci 机构:∗Dept. of Information Engineering, University of Florence, Florence, Italy, †Dept. of Information Engineering and Mathematics, University of Siena, Siena, Italy, ‡MAASAI, Universite Cˆote d’Azur, Nice, France 备注:9 pages, 5 figures 链接:https://arxiv.org/abs/2106.10974 摘要:近十年来,在深度学习成功的推动下,科学界提出了几种方法来提高神经网络学习过程的有效性。当我们关注训练数据如何被提供给学习机时,可以区分基于随机梯度优化的经典随机选择,以及更复杂的、通过设计课程来组织数据并逐渐增加训练集复杂性的技术。在本文中,我们提出了一种名为“友好训练”(Friendly Training)的新训练方法,与上述方法不同,它通过改变训练样本来帮助模型更好地满足其学习准则。模型可以简化那些在训练过程的某个阶段还难以分类的样本。数据变换由一个发展计划控制,该计划在训练期间逐步减小其影响,直至完全消失。在某种意义上,这与通常为增强对抗样本鲁棒性所做的事情(即对抗训练)恰恰相反。在多个数据集上的实验表明,相对于基于信息的数据子选择策略和随机选择,友好训练都能带来改进,在深度卷积结构中尤为明显。结果表明,调整输入数据是稳定学习、提高网络泛化能力的可行方法。 摘要:In the last decade, motivated by the success of Deep Learning, the scientific community proposed several approaches to make the learning procedure of Neural Networks more effective. When focussing on the way in which the training data are provided to the learning machine, we can distinguish between the classic random selection of stochastic gradient-based optimization and more involved techniques that devise curricula to organize data, and progressively increase the complexity of the training set. In this paper, we propose a novel training procedure named Friendly Training that, differently from the aforementioned approaches, involves altering the training examples in order to help the model to better fulfil its learning criterion. The model is allowed to simplify those examples that are too hard to be classified at a certain stage of the training procedure. The data transformation is controlled by a developmental plan that progressively reduces its impact during training, until it completely vanishes. In a sense, this is the opposite of what is commonly done in order to increase robustness against adversarial examples, i.e., Adversarial Training. Experiments on multiple datasets are provided, showing that Friendly Training yields improvements with respect to informed data sub-selection routines and random selection, especially in deep convolutional architectures. Results suggest that adapting the input data is a feasible way to stabilize learning and improve the generalization skills of the network.
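摘要中“让模型简化难例、且扰动幅度随训练衰减”的思路可以用一个很短的示意说明(假设性写法,并非作者实现):沿输入梯度的反方向移动样本以降低损失,恰好是对抗训练的反向操作。

```python
import torch
import torch.nn.functional as F

def friendly_perturb(model, x, y, eps):
    """Hypothetical sketch: move inputs in the direction that DECREASES the
    loss (the opposite of an adversarial step), making the batch easier."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x - eps * grad.sign()).detach()

def eps_schedule(t, t_max, eps0=0.1):
    """A simple 'developmental plan': the perturbation budget decays to zero."""
    return eps0 * max(0.0, 1.0 - t / t_max)
```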
【11】 A Game-Theoretic Taxonomy of Visual Concepts in DNNs 标题:DNNs中视觉概念的博弈论分类
作者:Xu Cheng,Chuntung Chu,Yi Zheng,Jie Ren,Quanshi Zhang 机构:Shanghai Jiao Tong University, University of Toronto, China University of Mining & Technology 备注:12 pages 链接:https://arxiv.org/abs/2106.10938 摘要:本文从一个新的角度,即图像中像素之间的博弈论多阶交互,重新思考了DNN如何编码不同复杂性的视觉概念。除了对象的分类法和纹理和形状的认知分类法之外,我们还提供了一种新的视觉概念分类法,帮助我们从概念复杂性的角度解释形状和纹理的编码。这样,基于多阶相互作用,我们发现了DNNs编码纹理的三种不同的信号处理行为。此外,我们还发现DNN编码形状的灵活性低于编码纹理的灵活性。此外,我们还分析了DNNs如何对异常样本进行编码,并探讨了网络结构对交互的影响。此外,我们还阐明了多阶交互在实际应用中的关键作用。该代码将在论文被接受时发布。 摘要:In this paper, we rethink how a DNN encodes visual concepts of different complexities from a new perspective, i.e. the game-theoretic multi-order interactions between pixels in an image. Beyond the categorical taxonomy of objects and the cognitive taxonomy of textures and shapes, we provide a new taxonomy of visual concepts, which helps us interpret the encoding of shapes and textures, in terms of concept complexities. In this way, based on multi-order interactions, we find three distinctive signal-processing behaviors of DNNs encoding textures. Besides, we also discover the flexibility for a DNN to encode shapes is lower than the flexibility of encoding textures. Furthermore, we analyze how DNNs encode outlier samples, and explore the impacts of network architectures on interactions. Additionally, we clarify the crucial role of the multi-order interactions in real-world applications. The code will be released when the paper is accepted.
【12】 Approximation capabilities of measure-preserving neural networks 标题:保测度神经网络的逼近能力
作者:Aiqing Zhu,Pengzhan Jin,Yifa Tang 机构:LSEC, ICMSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, School of Mathematical Sciences, University of Chinese Academy of Sciences 链接:https://arxiv.org/abs/2106.10911 摘要:保测度神经网络是一类发展成熟的可逆模型,但其逼近能力尚不清楚。本文严格地建立了用保测度神经网络逼近保测度映射的一般充分条件。结果表明,对于紧致的$U \subset \mathbb{R}^D$($D \geq 2$),每一个内射且有界的保测度映射$\psi: U \to \mathbb{R}^D$都可以用保测度神经网络在$L^p$-范数下逼近。具体地说,雅可比行列式为$\pm 1$的可微映射在$U$上是保测度、内射且有界的,因而具有上述逼近性质。 摘要:Measure-preserving neural networks are well-developed invertible models, however, the approximation capabilities remain unexplored. This paper rigorously establishes the general sufficient conditions for approximating measure-preserving maps using measure-preserving neural networks. It is shown that for compact $U \subset \mathbb{R}^D$ with $D \geq 2$, every measure-preserving map $\psi: U \to \mathbb{R}^D$ which is injective and bounded can be approximated in the $L^p$-norm by measure-preserving neural networks. Specifically, the differentiable maps with $\pm 1$ determinants of Jacobians are measure-preserving, injective and bounded on $U$, thus hold the approximation property.
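一个满足摘要条件的具体例子(确定的数学事实,仅作说明):二维剪切映射的雅可比行列式恒为1,因而保测度;若其中的 $f$ 可微且在紧集上有界,该映射同时满足内射与有界条件,正落在上述逼近定理覆盖的函数类中。

```latex
% A concrete measure-preserving map: a shear in R^2,
% \psi(x_1, x_2) = (x_1 + f(x_2),\; x_2) for differentiable f.
\[
J_\psi =
\begin{pmatrix}
1 & f'(x_2) \\
0 & 1
\end{pmatrix},
\qquad
\det J_\psi = 1 ,
\]
% so \psi preserves Lebesgue measure; it is injective, and bounded on a
% compact U whenever f is bounded there.
```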
【13】 Multirate Training of Neural Networks 标题:神经网络的多速率训练
作者:Tiffany Vlaar,Benedict Leimkuhler 机构:Department of Mathematics, University of Edinburgh 链接:https://arxiv.org/abs/2106.10771 摘要:我们提出了神经网络的多速率训练:将神经网络参数划分为“快”和“慢”两部分,用不同的学习率同时训练。通过选择适当的划分,我们可以在迁移学习任务中获得大幅的计算加速。我们证明,对于视觉和自然语言处理中的各种迁移学习应用,我们可以用几乎一半的时间对深度神经网络进行微调,而不降低所得模型的泛化性能。我们还讨论了在从头训练神经网络的场景下,有利于提高泛化性能的其他参数划分选择。最后,我们提出了另一种多速率技术,它通过在不同时间尺度上同时训练整个网络来学习数据中的不同特征。我们以图像数据上的ResNet体系结构说明了这种方法的好处。本文揭示了将多速率技术用于神经网络训练的潜力,并为该方向的后续工作提供了许多出发点。 摘要:We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained simultaneously using different learning rates. By choosing appropriate partitionings we can obtain large computational speed-ups for transfer learning tasks. We show that for various transfer learning applications in vision and NLP we can fine-tune deep neural networks in almost half the time, without reducing the generalization performance of the resulting model. We also discuss other splitting choices for the neural network parameters which are beneficial in enhancing generalization performance in settings where neural networks are trained from scratch. Finally, we propose an additional multirate technique which can learn different features present in the data by training the full network on different time scales simultaneously. The benefits of using this approach are illustrated for ResNet architectures on image data. Our paper unlocks the potential of using multirate techniques for neural network training and provides many starting points for future work in this area.
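摘要中“快/慢参数分组、不同学习率”的做法在任何支持参数组的优化器里都能直接表达。下面是一个极简示意(划分方式为假设,例如骨干层为“慢”、任务头为“快”,仅作说明):

```python
import torch

# A minimal sketch of multirate training: one optimizer, two parameter
# groups with different learning rates (the partitioning is an assumption
# for illustration, not the authors' exact scheme).
model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(),  # "slow" feature layers
    torch.nn.Linear(256, 10))                    # "fast" task head
optimizer = torch.optim.SGD(
    [{"params": model[0].parameters(), "lr": 1e-4},   # slow part
     {"params": model[2].parameters(), "lr": 1e-2}],  # fast part
    lr=1e-3, momentum=0.9)  # default lr, overridden per group
```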
【14】 On Stein Variational Neural Network Ensembles 标题:关于Stein变分神经网络集成
作者:Francesco D'Angelo,Vincent Fortuin,Florian Wenzel 机构:ETH Zürich, Zürich, Switzerland, Humboldt University of Berlin, Berlin, Germany 链接:https://arxiv.org/abs/2106.10760 摘要:深层神经网络的集成最近取得了巨大成功,但它们并没有提供恰当的贝叶斯依据。此外,虽然它们允许对多个假设的预测进行平均,但并不能保证假设的多样性,从而导致函数空间中的冗余解。相比之下,基于粒子的推理方法,如Stein变分梯度下降(SVGD),提供了一个贝叶斯框架,但依赖于核的选择来度量集成成员之间的相似性。在这项工作中,我们研究了在权值空间、函数空间以及混合设置中运行的不同SVGD方法。我们将这些SVGD方法与其他基于集成的方法在理论性质上进行比较,并在合成任务和真实任务上评估其实证性能。我们发现,使用函数核和混合核的SVGD可以克服深度集成的局限性:它改进了函数多样性和不确定性估计,并更接近真实的贝叶斯后验。此外,我们还表明,相比标准的确定性更新,使用随机SVGD更新可以进一步提高性能。 摘要:Ensembles of deep neural networks have achieved great success recently, but they do not offer a proper Bayesian justification. Moreover, while they allow for averaging of predictions over several hypotheses, they do not provide any guarantees for their diversity, leading to redundant solutions in function space. In contrast, particle-based inference methods, such as Stein variational gradient descent (SVGD), offer a Bayesian framework, but rely on the choice of a kernel to measure the similarity between ensemble members. In this work, we study different SVGD methods operating in the weight space, function space, and in a hybrid setting. We compare the SVGD approaches to other ensembling-based methods in terms of their theoretical properties and assess their empirical performance on synthetic and real-world tasks. We find that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles. It improves on functional diversity and uncertainty estimation and approaches the true Bayesian posterior more closely. Moreover, we show that using stochastic SVGD updates, as opposed to the standard deterministic ones, can further improve the performance.
【15】 Robust Regression via Model Based Methods 标题:基于模型方法的稳健回归
作者:Armin Moharrer,Khashayar Kamran,Edmund Ye,Stratis Ioannidis 机构:Northeastern University, Boston MA, USA 链接:https://arxiv.org/abs/2106.10759 摘要:均方误差损失在许多应用中得到了广泛使用,包括自动编码器、多目标回归和矩阵分解等。尽管其可微性带来了计算上的优势,但它对异常值不具有鲁棒性。相比之下,l_p范数是已知鲁棒的,但由于不可微,无法通过随机梯度下降等方法直接优化。我们提出了一种受基于模型的优化(MBO)[35, 36]启发的算法,它用凸模型函数代替非凸目标,并在优化模型函数和更新解之间交替进行。我们将其应用于稳健回归,提出了在线交替方向乘子法(OADM)[50]的随机变体SADM,用以求解MBO中的内层优化问题。我们证明了SADM以O(log T/T)的速度收敛。最后,我们通过实验证明了(a) l_p范数对异常值的鲁棒性,以及(b)在自编码器和多目标回归任务上,与梯度方法相比,我们提出的基于模型的算法的效率。 摘要:The mean squared error loss is widely used in many applications, including auto-encoders, multi-target regression, and matrix factorization, to name a few. Despite computational advantages due to its differentiability, it is not robust to outliers. In contrast, l_p norms are known to be robust, but cannot be optimized via, e.g., stochastic gradient descent, as they are non-differentiable. We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function and alternates between optimizing the model function and updating the solution. We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [50] to solve the inner optimization in MBO. We show that SADM converges with the rate O(log T/T). Finally, we demonstrate experimentally (a) the robustness of l_p norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient methods on autoencoders and multi-target regression.
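作为背景,可以用经典的迭代重加权最小二乘(IRLS)小例子说明为什么 l_p 范数(如 p=1)对异常值鲁棒:残差越大的样本获得的权重越小。注意这只是标准基线,并非本文提出的SADM:

```python
import numpy as np

def irls_lp(X, y, p=1.0, iters=50, eps=1e-6):
    """Classic IRLS for min_w ||Xw - y||_p^p -- a standard robust-regression
    baseline (NOT the paper's SADM). Each step solves a weighted least-squares
    problem in which large residuals receive small weights."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = np.abs(X @ w - y)
        weights = np.maximum(r, eps) ** (p - 2)  # p=1 -> weight ~ 1/|residual|
        w = np.linalg.solve(X.T @ (weights[:, None] * X), X.T @ (weights * y))
    return w
```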
【16】 On the Cryptographic Hardness of Learning Single Periodic Neurons 标题:关于学习单个周期神经元的密码学困难性
作者:Min Jae Song,Ilias Zadik,Joan Bruna 机构:Courant Institute of Mathematical Sciences, New York University, New York, Center for Data Science, New York University, New York 备注:54 pages 链接:https://arxiv.org/abs/2106.10744 摘要:我们给出了一个简单的归约,证明了在噪声存在的情况下,在各向同性高斯分布上学习单个周期神经元的密码学困难性。更确切地说,我们的归约表明,任何在小噪声下学习此类函数的多项式时间算法(不一定基于梯度)都蕴含一个求解最坏情况格问题的多项式时间量子算法,而这些格问题的困难性构成了基于格的密码学的基础。我们的核心困难函数族可被单隐层神经网络很好地逼近,其一般形式是将一元周期函数作用于数据的仿射投影。这些函数曾出现在以前的开创性工作中,这些工作证明了它们对基于梯度的算法(Shamir'18)和统计查询(SQ)算法(Song et al.'17)的困难性。我们证明,如果在标签上添加(多项式)小噪声,则在上述密码学假设下,学习这些函数的困难性适用于所有多项式时间算法。此外,我们通过设计一个能在指数小的对抗噪声下学习某些此类函数族的多项式时间算法,证明了困难性结果中噪声的必要性。我们提出的算法不是基于梯度或SQ的算法,而是基于著名的Lenstra-Lenstra-Lovász(LLL)格基约简算法。此外,在没有噪声的情况下,该算法可直接用于解决CLWE检测(Bruna等人'21)和相位恢复,最佳样本复杂度为$d+1$个样本。在前一种情况下,这改进了(Bruna等人'21)中所需的随$d$二次增长的样本复杂度;在后一种情况下,这改进了最先进的基于AMP的算法,该算法需要大约$1.128d$个样本(Barbier等人'19)。 摘要:We show a simple reduction which demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that any polynomial-time algorithm (not necessarily gradient-based) for learning such functions under small noise implies a polynomial-time quantum algorithm for solving worst-case lattice problems, whose hardness forms the foundation of lattice-based cryptography. Our core hard family of functions, which are well-approximated by one-layer neural networks, take the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works which demonstrate their hardness against gradient-based (Shamir'18), and Statistical Query (SQ) algorithms (Song et al.'17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is not a gradient-based or an SQ algorithm, but is rather based on the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al.'21) and phase retrieval with an optimal sample complexity of $d+1$ samples. In the former case, this improves upon the quadratic-in-$d$ sample complexity required in (Bruna et al.'21). In the latter case, this improves upon the state-of-the-art AMP-based algorithm, which requires approximately $1.128d$ samples (Barbier et al.'19).
【17】 Neighborhood Contrastive Learning for Novel Class Discovery 标题:邻域对比学习在新颖类发现中的应用
作者:Zhun Zhong,Enrico Fini,Subhankar Roy,Zhiming Luo,Elisa Ricci,Nicu Sebe 机构:University of Trento, Xiamen University, Fondazione Bruno Kessler 备注:CVPR 2021 链接:https://arxiv.org/abs/2106.10731 摘要:在本文中,我们讨论了新类发现(NCD),即在给定已知类的标记数据集的一组未标记样本中发现新类的任务。我们利用NCD的特点建立了一个新的框架,称为邻域对比学习(NCL),用来学习对聚类性能有重要影响的区分性表示。我们的贡献是双重的。首先,我们发现在标记集上训练的特征提取器生成的表示中,泛型查询样本及其邻居可能共享同一类。我们利用这一观察结果,通过对比学习来检索和聚合伪正对,从而鼓励模型学习更多的区分性表征。其次,我们注意到大多数实例很容易被网络所识别,对对比损失的贡献较小。为了克服这个问题,我们提出在特征空间中混合标记样本和未标记样本来产生难负例。我们通过实验证明了这两个因素对聚类性能的显著贡献,并使我们的模型大大优于最先进的方法(例如,在CIFAR-100上聚类准确率提高13%,在ImageNet上提高8%)。 摘要:In this paper, we address Novel Class Discovery (NCD), the task of unveiling new classes in a set of unlabeled samples given a labeled dataset with known classes. We exploit the peculiarities of NCD to build a new framework, named Neighborhood Contrastive Learning (NCL), to learn discriminative representations that are important to clustering performance. Our contribution is twofold. First, we find that a feature extractor trained on the labeled set generates representations in which a generic query sample and its neighbors are likely to share the same class. We exploit this observation to retrieve and aggregate pseudo-positive pairs with contrastive learning, thus encouraging the model to learn more discriminative representations. Second, we notice that most of the instances are easily discriminated by the network, contributing less to the contrastive loss. To overcome this issue, we propose to generate hard negatives by mixing labeled and unlabeled samples in the feature space. We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin (e.g., clustering accuracy +13% on CIFAR-100 and +8% on ImageNet).
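摘要第二点“在特征空间混合有标注与无标注样本来构造难负例”可以写成一个很短的示意(形式为假设,并非作者实现;这里假定两批特征的批大小相同):

```python
import torch

def mix_hard_negatives(z_labeled, z_unlabeled, alpha=2.0):
    """Hypothetical sketch: build hard negatives by convexly mixing labeled
    and unlabeled features in the embedding space."""
    n = z_labeled.size(0)
    lam = torch.distributions.Beta(alpha, alpha).sample((n, 1))  # mixing ratios
    perm = torch.randperm(n)
    return lam * z_labeled[perm] + (1 - lam) * z_unlabeled[perm]
```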
【18】 Machine learning in the social and health sciences 标题:社会科学和健康科学中的机器学习
作者:Anja K. Leist,Matthias Klee,Jung Hyun Kim,David H. Rehkopf,Stéphane P. A. Bordas,Graciela Muniz-Terrera,Sara Wade 机构:University of Luxembourg, Department of Social Sciences – Institute for Research on Socio-Economic Inequality (IRSEI), Esch-sur-Alzette, LU; Stanford University, Department of Epidemiology and Population Health, Palo Alto, CA 链接:https://arxiv.org/abs/2106.10716 摘要:机器学习(ML)方法在社会科学和健康科学中的普及相当缓慢,使用ML解决社会和健康研究问题的研究仍然零散。这可能是由于计算/数据科学与社会和健康科学的研究各自独立发展,也缺乏面向非数据科学背景研究人员的易读综述和适当的ML技术培训。本文将社会和健康科学中的研究问题元映射(meta-mapping)到适当的ML方法上,并纳入了这些学科对统计分析的必要要求。我们将“描述、预测、因果推断”这一既有分类映射到常见的研究目标上,例如估计不良健康或社会结果的患病率、预测事件的风险,以及识别不良结果的风险因素或原因。这种元映射旨在克服学科壁垒,在社会和健康科学的研究人员与受过方法论训练的研究人员之间开启流畅的对话。这种映射还有助于在考虑社会和健康科学特有的领域因素的同时充分发挥ML的优势,并有望加速ML应用的普及,以推进基础与应用层面的社会和健康科学研究。 摘要:The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data versus social and health sciences as well as a lack of accessible overviews and adequate training in ML techniques for non data science researchers. This paper provides a meta-mapping of research questions in the social and health sciences to appropriate ML approaches, by incorporating the necessary requirements to statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference to common research goals, such as estimating prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-mapping aims at overcoming disciplinary barriers and starting a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such mapping may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and hopefully contribute to the acceleration of the uptake of ML applications to advance both basic and applied social and health sciences research.
【19】 Memory Augmented Optimizers for Deep Learning 标题:用于深度学习的记忆增强型优化器
作者:Paul-Aymeric McRae,Prasanna Parthasarathi,Mahmoud Assran,Sarath Chandar 机构:Mila - Quebec AI Institute, Canada, McGill University, Canada, École Polytechnique de Montréal, Canada, Canada CIFAR AI Chair 备注:24 Pages. Currently under review 链接:https://arxiv.org/abs/2106.10708 摘要:在数据驱动的学习中,降低损失的流行方法通常涉及对梯度历史的抽象概括或显式保留,以实现有效的参数更新。聚合的梯度历史会将参数更新推向正确的方向,即使任一给定步骤的梯度缺乏信息量也是如此。尽管以元参数形式概括或显式存储在内存中的梯度历史已在理论和实践中被证明有效,但在决定参数更新时,需要历史中的全部梯度还是仅其一个子集就已足够,这一问题仍然没有答案。在本文中,我们提出了一个记忆增强梯度下降优化器的框架,这类优化器在内部存储中只保留其梯度历史的有限视图。这样的优化器可以很好地扩展到大型真实数据集;我们的实验表明,标准优化器的记忆增强扩展在我们考虑的大多数计算机视觉和语言任务上收敛更快、性能更好。此外,我们证明了所提出的一类具有固定大小内存的优化器在强凸性假设下收敛,无论选择哪些梯度、以及如何线性组合它们来形成更新步骤。 摘要:Popular approaches for minimizing loss in data-driven learning often involve an abstraction or an explicit retention of the history of gradients for efficient parameter updates. The aggregated history of gradients nudges the parameter updates in the right direction even when the gradients at any given step are not informative. Although the history of gradients summarized in meta-parameters or explicitly stored in memory has been shown effective in theory and practice, the question of whether all or only a subset of the gradients in the history are sufficient in deciding the parameter updates remains unanswered. In this paper, we propose a framework of memory-augmented gradient descent optimizers that retain a limited view of their gradient history in their internal memory. Such optimizers scale well to large real-life datasets, and our experiments show that the memory augmented extensions of standard optimizers enjoy accelerated convergence and improved performance on a majority of computer vision and language tasks that we considered. Additionally, we prove that the proposed class of optimizers with fixed-size memory converge under assumptions of strong convexity, regardless of which gradients are selected or how they are linearly combined to form the update step.
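摘要描述的“在内存中保留有限梯度历史”的更新规则,可以用一个极简优化器示意(聚合方式为假设:对缓冲区取平均,并非论文的具体设计):

```python
from collections import deque
import torch

class MemorySGD:
    """Hypothetical sketch of a memory-augmented optimizer: keep a bounded
    buffer of past gradients per parameter and step along their average."""
    def __init__(self, params, lr=0.01, memory=5):
        self.params = list(params)
        self.lr = lr
        self.buffers = [deque(maxlen=memory) for _ in self.params]

    @torch.no_grad()
    def step(self):
        for p, buf in zip(self.params, self.buffers):
            if p.grad is None:
                continue
            buf.append(p.grad.clone())  # retain a limited view of the history
            p -= self.lr * torch.stack(tuple(buf)).mean(dim=0)
```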
【20】 A compressive multi-kernel method for privacy-preserving machine learning 标题:一种用于隐私保护的压缩多核机器学习方法
作者:Thee Chanyaswad,J. Morris Chang,S. Y. Kung 机构:Department of Electrical Engineering, Princeton University, Princeton, New Jersey; University of South Florida, Tampa, Florida 备注:Published in 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017 链接:https://arxiv.org/abs/2106.10671 摘要:随着分析工具变得越来越强大、每天产生的数据越来越多,数据隐私问题随之而来,这推动了对隐私保护机器学习算法设计的研究。围绕效用最大化和隐私损失最小化这两个目标,本工作基于两个此前互不相交的机制,即压缩隐私(Compressive Privacy)和多核方法。压缩隐私是一种隐私框架,采用保效用的有损编码方案来保护数据隐私;多核方法则是一种基于核的机器学习机制,探索使用多个核来构建更好的预测器。所提出的压缩多核方法分为两个阶段:压缩阶段和多核阶段。压缩阶段遵循压缩隐私范式来提供所需的隐私保护:每个核矩阵都用源自判别成分分析(DCA)的有损投影矩阵进行压缩。多核阶段利用每个核的信噪比(SNR)得分对多个压缩核进行非均匀组合。该方法在MHEALTH和HAR两个移动感知数据集上进行了评估,其中活动识别被定义为效用,个人识别被定义为隐私。结果表明,所有实验中隐私分类的准确率几乎都处于随机猜测水平,说明压缩机制在隐私保护上是成功的。另一方面,在两个数据集上,新的基于信噪比的多核方法在效用分类精度上均优于现有最优方法。这些结果为隐私保护机器学习的研究指明了一个有前景的方向。 摘要:As the analytic tools become more powerful, and more data are generated on a daily basis, the issue of data privacy arises. This leads to the study of the design of privacy-preserving machine learning algorithms. Given two objectives, namely, utility maximization and privacy-loss minimization, this work is based on two previously non-intersecting regimes -- Compressive Privacy and multi-kernel method. Compressive Privacy is a privacy framework that employs utility-preserving lossy-encoding scheme to protect the privacy of the data, while multi-kernel method is a kernel based machine learning regime that explores the idea of using multiple kernels for building better predictors. The compressive multi-kernel method proposed consists of two stages -- the compression stage and the multi-kernel stage. The compression stage follows the Compressive Privacy paradigm to provide the desired privacy protection. Each kernel matrix is compressed with a lossy projection matrix derived from the Discriminant Component Analysis (DCA). The multi-kernel stage uses the signal-to-noise ratio (SNR) score of each kernel to non-uniformly combine multiple compressive kernels. The proposed method is evaluated on two mobile-sensing datasets -- MHEALTH and HAR -- where activity recognition is defined as utility and person identification is defined as privacy. The results show that the compression regime is successful in privacy preservation as the privacy classification accuracies are almost at the random-guess level in all experiments. On the other hand, the novel SNR-based multi-kernel shows utility classification accuracy improvement upon the state-of-the-art in both datasets. These results indicate a promising direction for research in privacy-preserving machine learning.
【21】 Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples 标题:基于对比实例的深度网络泛化性能健壮性实用评估
作者:Xuanyu Wu,Xuhong Li,Haoyi Xiong,Xiao Zhang,Siyu Huang,Dejing Dou 机构:University of Pennsylvania, Philadelphia, USA, Baidu Research, Baidu Inc., China, Tsinghua University, Beijing, China, Nanyang Technological University, Singapore 链接:https://arxiv.org/abs/2106.10653 摘要:为了补充用于评估深度神经网络(DNN)泛化性能的测试集,有人提出将经过数据变换的训练图像作为对比示例。在这项工作中,我们提出了一个实用的框架ContRE(“contre”一词在法语中意为“反对”或“对抗”),它使用对比示例来估计DNN的泛化性能。具体而言,ContRE遵循对比学习中的假设:泛化性能良好的稳健DNN模型能够在不同的数据变换下从同一幅图像中提取一致的特征集并做出一致的预测。通过在训练集上结合一组设计良好的随机数据变换策略,ContRE采用生成的对比示例上的分类错误和Fisher比率,来评估和分析深度模型的泛化性能,作为对测试集的补充。为了展示ContRE的有效性和效率,我们在三个开源基准数据集上使用各种DNN模型进行了广泛的实验,并辅以深入的消融研究和适用性分析。实验结果证实:(1)深度模型在对比示例上的行为与其在测试集上的行为有很强的相关性;(2)在各种设置下,ContRE都是对测试集的一种稳健的泛化性能补充度量。 摘要:Training images with data transformations have been suggested as contrastive examples to complement the testing set for generalization performance evaluation of deep neural networks (DNNs). In this work, we propose a practical framework ContRE (The word "contre" means "against" or "versus" in French.) that uses Contrastive examples for DNN geneRalization performance Estimation. Specifically, ContRE follows the assumption in contrastive learning that robust DNN models with good generalization performance are capable of extracting a consistent set of features and making consistent predictions from the same image under varying data transformations. Incorporating with a set of randomized strategies for well-designed data transformations over the training set, ContRE adopts classification errors and Fisher ratios on the generated contrastive examples to assess and analyze the generalization performance of deep models in complement with a testing set. To show the effectiveness and the efficiency of ContRE, extensive experiments have been done using various DNN models on three open source benchmark datasets with thorough ablation studies and applicability analyses. Our experiment results confirm that (1) behaviors of deep models on contrastive examples are strongly correlated to what on the testing set, and (2) ContRE is a robust measure of generalization performance complementing to the testing set in various settings.
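下面用几行代码示意这种“用随机变换下的对比示例估计泛化”的思路(变换集合为假设,且这里只示意分类错误这一项,不含Fisher比率):

```python
import torch
import torchvision.transforms as T

# Assumed transform set, for illustration only.
augment = T.Compose([T.RandomHorizontalFlip(), T.RandomCrop(32, padding=4)])

@torch.no_grad()
def contrastive_error(model, images, labels, n_views=4):
    """Average classification error over several randomly transformed copies
    of the (training) images; higher error suggests worse generalization."""
    model.eval()
    errs = []
    for _ in range(n_views):
        views = torch.stack([augment(img) for img in images])
        preds = model(views).argmax(dim=1)
        errs.append((preds != labels).float().mean().item())
    return sum(errs) / len(errs)
```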
【22】 Large-Scale Network Embedding in Apache Spark 标题:Apache Spark中的大规模网络嵌入
作者:Wenqing Lin 机构:Interactive Entertainment Group, Tencent, Shenzhen, Guangdong, China 备注:Accepted in KDD 2021 链接:https://arxiv.org/abs/2106.10620 摘要:Network embedding has been widely used in social recommendation and network analysis, such as recommendation systems and anomaly detection with graphs. However, most of previous approaches cannot handle large graphs efficiently, due to that (i) computation on graphs is often costly and (ii) the size of graph or the intermediate results of vectors could be prohibitively large, rendering it difficult to be processed on a single machine. In this paper, we propose an efficient and effective distributed algorithm for network embedding on large graphs using Apache Spark, which recursively partitions a graph into several small-sized subgraphs to capture the internal and external structural information of nodes, and then computes the network embedding for each subgraph in parallel. Finally, by aggregating the outputs on all subgraphs, we obtain the embeddings of nodes in a linear cost. After that, we demonstrate in various experiments that our proposed approach is able to handle graphs with billions of edges within a few hours and is at least 4 times faster than the state-of-the-art approaches. Besides, it achieves up to 4.25% and 4.27% improvements on link prediction and node classification tasks respectively. In the end, we deploy the proposed algorithms in two online games of Tencent with the applications of friend recommendation and item recommendation, which improve the competitors by up to 91.11% in running time and up to 12.80% in the corresponding evaluation metrics.
【23】 Cogradient Descent for Dependable Learning 标题:面向可靠学习的协梯度下降(Cogradient Descent)
作者:Runqi Wang,Baochang Zhang,Li'an Zhuo,Qixiang Ye,David Doermann 机构:School of Automation Science and Electrical Engineering, Beihang University, Beijing, China; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; University at Buffalo, Buffalo, USA 备注:arXiv admin note: substantial text overlap with arXiv:2006.09142 链接:https://arxiv.org/abs/2106.10617 摘要:Conventional gradient descent methods compute the gradients for multiple variables through the partial derivative. Treating the coupled variables independently while ignoring the interaction, however, leads to an insufficient optimization for bilinear models. In this paper, we propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem, providing a systematic way to coordinate the gradients of coupling variables based on a kernelized projection function. CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint, as often occurs in modern learning paradigms. CoGD can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs) and improve the model capacity. CoGD is applied in representative bilinear problems, including image reconstruction, image inpainting, network pruning and CNN training. Extensive experiments show that CoGD improves the state-of-the-arts by significant margins. Code is available at https://github.com/bczhangbczhang/CoGD.
【24】 Heterogeneous Multi-task Learning with Expert Diversity 标题:具有专家多样性的异构多任务学习
作者:Raquel Aoki,Frederick Tung,Gabriel L. Oliveira 机构:Simon Fraser University, Burnaby, Canada, Borealis AI, Vancouver, Canada 备注:10 pages, 7 figures, BIOKDD 链接:https://arxiv.org/abs/2106.10595 摘要:Predicting multiple heterogeneous biological and medical targets is a challenge for traditional deep learning models. In contrast to single-task learning, in which a separate model is trained for each target, multi-task learning (MTL) optimizes a single model to predict multiple related targets simultaneously. To address this challenge, we propose the Multi-gate Mixture-of-Experts with Exclusivity (MMoEEx). Our work aims to tackle the heterogeneous MTL setting, in which the same model optimizes multiple tasks with different characteristics. Such a scenario can overwhelm current MTL approaches due to the challenges in balancing shared and task-specific representations and the need to optimize tasks with competing optimization paths. Our method makes two key contributions: first, we introduce an approach to induce more diversity among experts, thus creating representations more suitable for highly imbalanced and heterogenous MTL learning; second, we adopt a two-step optimization [6, 11] approach to balancing the tasks at the gradient level. We validate our method on three MTL benchmark datasets, including Medical Information Mart for Intensive Care (MIMIC-III) and PubChem BioAssay (PCBA).
【25】 Learning and Generalization in Overparameterized Normalizing Flows 标题:过参数化标准化流(Normalizing Flows)中的学习与泛化
作者:Kulin Shah,Amit Deshpande,Navin Goyal 机构:Microsoft Research 备注:80 pages, 79 figures 链接:https://arxiv.org/abs/2106.10535 摘要:In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is one-hidden-layer overparameterized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for a class of NFs containing most of the existing NF models, overparametrization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized.
【26】 Learning to Reach, Swim, Walk and Fly in One Trial: Data-Driven Control with Scarce Data and Side Information 标题:在一次试验中学会伸手、游泳、行走和飞行:使用稀缺数据和辅助信息的数据驱动控制
作者:Franck Djeumou,Ufuk Topcu 机构:Department of Electrical and Computer Engineering, University of Texas at Austin United States, Department of Aerospace Engineering and Engineering Mechanics 备注:Initial submission to CoRL 2021 链接:https://arxiv.org/abs/2106.10533 摘要:We develop a learning-based control algorithm for unknown dynamical systems under very severe data limitations. Specifically, the algorithm has access to streaming data only from a single and ongoing trial. Despite the scarcity of data, we show -- through a series of examples -- that the algorithm can provide performance comparable to reinforcement learning algorithms trained over millions of environment interactions. It accomplishes such performance by effectively leveraging various forms of side information on the dynamics to reduce the sample complexity. Such side information typically comes from elementary laws of physics and qualitative properties of the system. More precisely, the algorithm approximately solves an optimal control problem encoding the system's desired behavior. To this end, it constructs and refines a differential inclusion that contains the unknown vector field of the dynamics. The differential inclusion, used in an interval Taylor-based method, enables to over-approximate the set of states the system may reach. Theoretically, we establish a bound on the suboptimality of the approximate solution with respect to the case of known dynamics. We show that the longer the trial or the more side information is available, the tighter the bound. Empirically, experiments in a high-fidelity F-16 aircraft simulator and MuJoCo's environments such as the Reacher, Swimmer, and Cheetah illustrate the algorithm's effectiveness.
【27】 MSN: Efficient Online Mask Selection Network for Video Instance Segmentation 标题:MSN:高效的视频实例分割在线掩码选择网络
作者:Vidit Goel,Jiachen Li,Shubhika Garg,Harsh Maheshwari,Humphrey Shi 机构:Picsart AI Research (PAIR) 备注:3rd Place Solution to the YouTube-VIS Challenge at CVPR 2021 链接:https://arxiv.org/abs/2106.10452 摘要:In this work we present a novel solution for Video Instance Segmentation(VIS), that is automatically generating instance level segmentation masks along with object class and tracking them in a video. Our method improves the masks from segmentation and propagation branches in an online manner using the Mask Selection Network (MSN) hence limiting the noise accumulation during mask tracking. We propose an effective design of MSN by using patch-based convolutional neural network. The network is able to distinguish between very subtle differences between the masks and choose the better masks out of the associated masks accurately. Further, we make use of temporal consistency and process the video sequences in both forward and reverse manner as a post processing step to recover lost objects. The proposed method can be used to adapt any video object segmentation method for the task of VIS. Our method achieves a score of 49.1 mAP on 2021 YouTube-VIS Challenge and was ranked third place among more than 30 global teams. Our code will be available at https://github.com/SHI-Labs/Mask-Selection-Networks.
【28】 Algorithm Unrolling for Massive Access via Deep Neural Network with Theoretical Guarantee 标题:具有理论保证的基于深度神经网络的大规模接入算法展开
作者:Yandong Shi,Hayoung Choi,Yuanming Shi,Yong Zhou 机构:Department of Mathematics, Kyungpook National University, Daegu 备注:15 pages, 15 figures, this paper has been submitted to IEEE Transactions on Wireless Communications 链接:https://arxiv.org/abs/2106.10426 摘要:Massive access is a critical design challenge of Internet of Things (IoT) networks. In this paper, we consider the grant-free uplink transmission of an IoT network with a multiple-antenna base station (BS) and a large number of single-antenna IoT devices. Taking into account the sporadic nature of IoT devices, we formulate the joint activity detection and channel estimation (JADCE) problem as a group-sparse matrix estimation problem. This problem can be solved by applying the existing compressed sensing techniques, which however either suffer from high computational complexities or lack of algorithm robustness. To this end, we propose a novel algorithm unrolling framework based on the deep neural network to simultaneously achieve low computational complexity and high robustness for solving the JADCE problem. Specifically, we map the original iterative shrinkage thresholding algorithm (ISTA) into an unrolled recurrent neural network (RNN), thereby improving the convergence rate and computational efficiency through end-to-end training. Moreover, the proposed algorithm unrolling approach inherits the structure and domain knowledge of the ISTA, thereby maintaining the algorithm robustness, which can handle non-Gaussian preamble sequence matrix in massive access. With rigorous theoretical analysis, we further simplify the unrolled network structure by reducing the redundant training parameters. Furthermore, we prove that the simplified unrolled deep neural network structures enjoy a linear convergence rate. Extensive simulations based on various preamble signatures show that the proposed unrolled networks outperform the existing methods in terms of the convergence rate, robustness and estimation accuracy.
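摘要中“把ISTA迭代展开为网络层”的做法就是经典的LISTA思路,可用如下极简示意说明(维度、层数与初始化均为假设,并非论文的具体结构):

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """Sketch of algorithm unrolling (LISTA-style, assumed dimensions): each
    ISTA iteration x <- soft_threshold(W_e y + S x, theta) becomes one layer
    with learnable weights and thresholds, trained end to end."""
    def __init__(self, m, n, n_layers=10):
        super().__init__()
        self.We = nn.ModuleList(nn.Linear(m, n, bias=False) for _ in range(n_layers))
        self.S = nn.ModuleList(nn.Linear(n, n, bias=False) for _ in range(n_layers))
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))

    def forward(self, y):
        x = y.new_zeros(y.size(0), self.S[0].in_features)
        for We, S, th in zip(self.We, self.S, self.theta):
            z = We(y) + S(x)
            x = torch.sign(z) * torch.relu(z.abs() - th)  # soft-thresholding
        return x
```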
【29】 Deep Generative Learning via Schrödinger Bridge 标题:基于薛定谔桥的深度生成性学习
作者:Gefei Wang,Yuling Jiao,Qian Xu,Yang Wang,Can Yang 机构:Department of Mathematics, The Hong Kong University of Science and Technology, China; School of Mathematics and Statistics, Wuhan University 链接:https://arxiv.org/abs/2106.10410 摘要:We propose to learn a generative model via entropy interpolation with a Schrödinger Bridge. The generative learning task can be formulated as interpolating between a reference distribution and a target distribution based on the Kullback-Leibler divergence. At the population level, this entropy interpolation is characterized via an SDE on $[0,1]$ with a time-varying drift term. At the sample level, we derive our Schrödinger Bridge algorithm by plugging the drift term estimated by a deep score estimator and a deep density ratio estimator into the Euler-Maruyama method. Under some mild smoothness assumptions of the target distribution, we prove the consistency of both the score estimator and the density ratio estimator, and then establish the consistency of the proposed Schrödinger Bridge approach. Our theoretical results guarantee that the distribution learned by our approach converges to the target distribution. Experimental results on multimodal synthetic data and benchmark data support our theoretical findings and indicate that the generative model via Schrödinger Bridge is comparable with state-of-the-art GANs, suggesting a new formulation of generative learning. We demonstrate its usefulness in image interpolation and image inpainting.
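样本层面的模拟就是标准的Euler-Maruyama离散化;下面给出一个通用示意(drift 的接口为占位假设,论文中它由score估计器与密度比估计器构造):

```python
import numpy as np

def euler_maruyama(drift, x0, sigma=1.0, n_steps=100, rng=None):
    """Generic Euler-Maruyama integration of dX_t = b(X_t, t) dt + sigma dW_t
    on [0, 1]. Here `drift` is a placeholder callable; in the paper it would
    be built from the learned score / density-ratio networks."""
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + drift(x, t) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x
```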
【30】 The Perils of Learning Before Optimizing 标题:先学习后优化的危险
作者:Chris Cameron,Jason Hartford,Taylor Lundy,Kevin Leyton-Brown 机构:Department of Computer Science, University of British Columbia 链接:https://arxiv.org/abs/2106.10349 摘要:Formulating real-world optimization problems often begins with making predictions from historical data (e.g., an optimizer that aims to recommend fast routes relies upon travel-time predictions). Typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Recent work has showed how such prediction models can be learned end-to-end by differentiating through the optimization task. Such methods often yield empirical improvements, which are typically attributed to end-to-end making better error tradeoffs than the standard loss function used in a two-stage solution. We refine this explanation and more precisely characterize when end-to-end can improve performance. When prediction targets are stochastic, a two-stage solution must make an a priori choice about which statistics of the target distribution to model -- we consider expectations over prediction targets -- while an end-to-end solution can make this choice adaptively. We show that the performance gap between a two-stage and end-to-end approach is closely related to the price of correlation concept in stochastic optimization and show the implications of some existing POC results for our predict-then-optimize problem. We then consider a novel and particularly practical setting, where coefficients in the objective function depend on multiple prediction targets. We give explicit constructions where (1) two-stage performs unboundedly worse than end-to-end; and (2) two-stage is optimal. We identify a large set of real-world applications whose objective functions rely on multiple prediction targets but which nevertheless deploy two-stage solutions. We also use simulations to experimentally quantify performance gaps.
【31】 Multi-Task Learning for User Engagement and Adoption in Live Video Streaming Events 标题:视频直播活动中用户参与度和采用率的多任务学习
作者:Stefanos Antaris,Dimitrios Rafailidis,Romina Arriaza 机构: KTH Royal Institute of Technology, Sweden, Hive Streaming AB, Sweden, University of Thessaly, Greece 链接:https://arxiv.org/abs/2106.10305 摘要:Nowadays, live video streaming events have become a mainstay in viewer's communication in large international enterprises. Provided that viewers are distributed worldwide, the main challenge resides on how to schedule the optimal event's time so as to improve both the viewer's engagement and adoption. In this paper we present a multi-task deep reinforcement learning model to select the time of a live video streaming event, aiming to optimize the viewer's engagement and adoption at the same time. We consider the engagement and adoption of the viewers as independent tasks and formulate a unified loss function to learn a common policy. In addition, we account for the fact that each task might have different contribution to the training strategy of the agent. Therefore, to determine the contribution of each task to the agent's training, we design a Transformer's architecture for the state-action transitions of each task. We evaluate our proposed model on four real-world datasets, generated by the live video streaming events of four large enterprises spanning from January 2019 until March 2021. Our experiments demonstrate the effectiveness of the proposed model when compared with several state-of-the-art strategies. For reproduction purposes, our evaluation datasets and implementation are publicly available at https://github.com/stefanosantaris/merlin.
【32】 Stratified Learning: a general-purpose statistical method for improved learning under Covariate Shift 标题:分层学习:协变量漂移条件下改进学习的一种通用统计方法
作者:Maximilian Autenrieth,David A. van Dyk,Roberto Trotta,David C. Stenning 机构:Imperial College London; SISSA (Trieste); Simon Fraser University 链接:https://arxiv.org/abs/2106.11211 摘要:Covariate shift arises when the labelled training (source) data is not representative of the unlabelled (target) data due to systematic differences in the covariate distributions. A supervised model trained on the source data subject to covariate shift may suffer from poor generalization on the target data. We propose a novel, statistically principled and theoretically justified method to improve learning under covariate shift conditions, based on propensity score stratification, a well-established methodology in causal inference. We show that the effects of covariate shift can be reduced or altogether eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners on subgroups ("strata") constructed by partitioning the data based on the estimated propensity scores, leading to balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on contemporary research questions in observational cosmology, and on additional benchmark examples, matching or outperforming state-of-the-art importance weighting methods, widely studied in the covariate shift literature. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge" and improve upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data.
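摘要的核心流程(估计倾向得分、按分位数分层、逐层拟合学习器)可以用几行 sklearn 代码示意(估计器与分层数均为假设,并非作者的具体流程):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def stratified_learning(Xs, ys, Xt, n_strata=5):
    """Sketch of propensity-score stratification for covariate shift."""
    X = np.vstack([Xs, Xt])
    domain = np.r_[np.zeros(len(Xs)), np.ones(len(Xt))]   # 0=source, 1=target
    ps = LogisticRegression(max_iter=1000).fit(X, domain) # propensity model
    e_s = ps.predict_proba(Xs)[:, 1]
    e_t = ps.predict_proba(Xt)[:, 1]
    edges = np.quantile(np.r_[e_s, e_t], np.linspace(0, 1, n_strata + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    y_pred = np.full(len(Xt), np.nan)                     # NaN if a stratum is empty
    for k in range(n_strata):
        s_mask = (e_s >= edges[k]) & (e_s < edges[k + 1])
        t_mask = (e_t >= edges[k]) & (e_t < edges[k + 1])
        if s_mask.any() and t_mask.any():                 # one learner per stratum
            y_pred[t_mask] = Ridge().fit(Xs[s_mask], ys[s_mask]).predict(Xt[t_mask])
    return y_pred
```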
【33】 UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control 标题:UniTTS:用于语音风格控制的统一嵌入空间的残差学习
作者:Minsu Kang,Sungjae Kim,Injung Kim 机构:Department of Computer Science and Electronic Engineering, Handong Global University 链接:https://arxiv.org/abs/2106.11171 摘要:We propose a novel high-fidelity expressive speech synthesis model, UniTTS, that learns and controls overlapping style attributes avoiding interference. UniTTS represents multiple style attributes in a single unified embedding space by the residuals between the phoneme embeddings before and after applying the attributes. The proposed method is especially effective in controlling multiple attributes that are difficult to separate cleanly, such as speaker ID and emotion, because it minimizes redundancy when adding variance in speaker ID and emotion, and additionally, predicts duration, pitch, and energy based on the speaker ID and emotion. In experiments, the visualization results exhibit that the proposed methods learned multiple attributes harmoniously in a manner that can be easily separated again. As well, UniTTS synthesized high-fidelity speech signals controlling multiple style attributes. The synthesized speech samples are presented at https://jackson-kang.github.io/paper_works/UniTTS/demos.
【34】 Spliced Binned-Pareto Distribution for Robust Modeling of Heavy-tailed Time Series 标题:用于重尾时间序列稳健建模的拼接分箱-帕累托分布
作者:Elena Ehrlich,Laurent Callot,François-Xavier Aubet 机构:AWS ProServe, Miami, FL, USA; Amazon Research, Seattle, WA, USA; Vienna, Austria 备注:Accepted at RobustWorkshop@ICLR2021: <this https URL> 链接:https://arxiv.org/abs/2106.10952 摘要:This work proposes a novel method to robustly and accurately model time series with heavy-tailed noise, in non-stationary scenarios. In many practical applications, time series have heavy-tailed noise that significantly impacts the performance of classical forecasting models; in particular, accurately modeling a distribution over extreme events is crucial to performing accurate time series anomaly detection. We propose a Spliced Binned-Pareto distribution which is both robust to extreme observations and allows accurate modeling of the full distribution. Our method allows the capture of time dependencies in the higher order moments of the distribution such as the tail heaviness. We compare the robustness and the accuracy of the tail estimation of our method to other state-of-the-art methods on Twitter mentions count time series.
【35】 Quantum Machine Learning: Fad or Future? 标题:量子机器学习:时尚还是未来?
作者:Arhum Ishtiaq,Sara Mahmood 机构:Computer Science Department, Habib University, Karachi, Pakistan 链接:https://arxiv.org/abs/2106.10714 摘要:For the last few decades, classical machine learning has allowed us to improve the lives of many through automation, natural language processing, predictive analytics and much more. However, a major concern is the fact that we're fast approaching the threshold of the maximum possible computational capacity available to us by the means of classical computing devices including CPUs, GPUs and Application Specific Integrated Circuits (ASICs). This is due to the exponential increase in model sizes which now have parameters in the magnitude of billions and trillions, requiring a significant amount of computing resources across a significant amount of time, just to converge one single model. To observe the efficacy of using quantum computing for certain machine learning tasks and explore the improved potential of convergence, error reduction and robustness to noisy data, this paper will look forth to test and verify the aspects in which quantum machine learning can help improve over classical machine learning approaches while also shedding light on the likely limitations that have prevented quantum approaches from becoming the mainstream. A major focus will be to recreate the work by Farhi et al. and conduct experiments using their theory of performing machine learning in a quantum context, with assistance from the Tensorflow Quantum documentation.
【36】 Parallel frequency function-deep neural network for efficient complex broadband signal approximation 标题:用于复杂宽带信号高效逼近的并行频率函数-深度神经网络
作者:Zhi Zeng,Pengpeng Shi,Fulei Ma,Peihan Qi 机构:Xi'an University of Architecture and Technology, China 链接:https://arxiv.org/abs/2106.10401 摘要:A neural network is essentially a high-dimensional complex mapping model by adjusting network weights for feature fitting. However, the spectral bias in network training leads to unbearable training epochs for fitting the high-frequency components in broadband signals. To improve the fitting efficiency of high-frequency components, the PhaseDNN was proposed recently by combining complex frequency band extraction and frequency shift techniques [Cai et al. SIAM J. SCI. COMPUT. 42, A3285 (2020)]. Our paper is devoted to an alternative candidate for fitting complex signals with high-frequency components. Here, a parallel frequency function-deep neural network (PFF-DNN) is proposed to suppress computational overhead while ensuring fitting accuracy by utilizing fast Fourier analysis of broadband signals and the spectral bias nature of neural networks. The effectiveness and efficiency of the proposed PFF-DNN method are verified based on detailed numerical experiments for six typical broadband signals.
【37】 Learning the Preferences of Uncertain Humans with Inverse Decision Theory 标题:用逆决策理论学习不确定人的偏好
作者:Cassidy Laidlaw,Stuart Russell 机构:University of California, Berkeley 链接:https://arxiv.org/abs/2106.10394 摘要:Existing observational approaches for learning human preferences, such as inverse reinforcement learning, usually make strong assumptions about the observability of the human's environment. However, in reality, people make many important decisions under uncertainty. To better understand preference learning in these cases, we study the setting of inverse decision theory (IDT), a previously proposed framework where a human is observed making non-sequential binary decisions under uncertainty. In IDT, the human's preferences are conveyed through their loss function, which expresses a tradeoff between different types of mistakes. We give the first statistical analysis of IDT, providing conditions necessary to identify these preferences and characterizing the sample complexity -- the number of decisions that must be observed to learn the tradeoff the human is making to a desired precision. Interestingly, we show that it is actually easier to identify preferences when the decision problem is more uncertain. Furthermore, uncertain decision problems allow us to relax the unrealistic assumption that the human is an optimal decision maker but still identify their exact preferences; we give sample complexities in this suboptimal case as well. Our analysis contradicts the intuition that partial observability should make preference learning more difficult. It also provides a first step towards understanding and improving preference learning methods for uncertain and suboptimal humans.
其他(35篇)
【1】 TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? 标题:TokenLearner:8个学习过的令牌对图片和视频有什么作用?
作者:Michael S. Ryoo,AJ Piergiovanni,Anurag Arnab,Mostafa Dehghani,Anelia Angelova 机构:Google Research, Stony Brook University 链接:https://arxiv.org/abs/2106.11297 摘要:In this paper, we introduce a novel visual representation learning which relies on a handful of adaptively learned tokens, and which is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in visual data. This results in efficiently and effectively finding a few important visual tokens and enables modeling of pairwise attention between such tokens, over a longer temporal horizon for videos, or the spatial content in images. Our experiments demonstrate strong performance on several challenging benchmarks for both image and video recognition tasks. Importantly, due to our tokens being adaptive, we accomplish competitive results at significantly reduced compute amount.
【2】 Smooth Sequential Optimisation with Delayed Feedback 标题:延迟反馈的光滑序贯优化
作者:Srivas Chennu,Jamie Martin,Puli Liyanagama,Phil Mohr 机构:Apple 备注:Workshop on Bayesian causal inference for real world interactive systems, 27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021) 链接:https://arxiv.org/abs/2106.11294 摘要:Stochastic delays in feedback lead to unstable sequential learning using multi-armed bandits. Recently, empirical Bayesian shrinkage has been shown to improve reward estimation in bandit learning. Here, we propose a novel adaptation to shrinkage that estimates smoothed reward estimates from windowed cumulative inputs, to deal with incomplete knowledge from delayed feedback and non-stationary rewards. Using numerical simulations, we show that this adaptation retains the benefits of shrinkage, and improves the stability of reward estimation by more than 50%. Our proposal reduces variability in treatment allocations to the best arm by up to 3.8x, and improves statistical accuracy - with up to 8% improvement in true positive rates and 37% reduction in false positive rates. Together, these advantages enable control of the trade-off between speed and stability of adaptation, and facilitate human-in-the-loop sequential optimisation.
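摘要里“窗口化累积输入 + 收缩估计”的组合可以用一个简单的经验贝叶斯式收缩示意(收缩形式为假设:各臂的窗口均值向总体均值收缩,并非论文的具体公式):

```python
import numpy as np

def shrunk_window_means(rewards_per_arm, window=100, tau=10.0):
    """Hypothetical sketch: per-arm reward means over a recent window,
    shrunk toward the grand mean (an empirical-Bayes-style estimate).
    Arms with little recent data are pulled strongly toward the grand mean."""
    means, counts = [], []
    for r in rewards_per_arm:
        w = np.asarray(r[-window:], dtype=float)   # windowed cumulative inputs
        means.append(w.mean() if len(w) else 0.0)
        counts.append(len(w))
    means, counts = np.array(means), np.array(counts)
    grand = means.mean()
    return (counts * means + tau * grand) / (counts + tau)
```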
【3】 Neural Marching Cubes 标题:神经行进立方体
作者:Zhiqin Chen,Hao Zhang 机构: Simon Fraser University 链接:https://arxiv.org/abs/2106.11272 摘要:We introduce Neural Marching Cubes (NMC), a data-driven approach for extracting a triangle mesh from a discretized implicit field. Classical MC is defined by coarse tessellation templates isolated to individual cubes. While more refined tessellations have been proposed, they all make heuristic assumptions, such as trilinearity, when determining the vertex positions and local mesh topologies in each cube. In principle, none of these approaches can reconstruct geometric features that reveal coherence or dependencies between nearby cubes (e.g., a sharp edge), as such information is unaccounted for, resulting in poor estimates of the true underlying implicit field. To tackle these challenges, we re-cast MC from a deep learning perspective, by designing tessellation templates more apt at preserving geometric features, and learning the vertex positions and mesh topologies from training meshes, to account for contextual information from nearby cubes. We develop a compact per-cube parameterization to represent the output triangle mesh, while being compatible with neural processing, so that a simple 3D convolutional network can be employed for the training. We show that all topological cases in each cube that are applicable to our design can be easily derived using our representation, and the resulting tessellations can also be obtained naturally and efficiently by following a few design guidelines. In addition, our network learns local features with limited receptive fields, hence it generalizes well to new shapes and new datasets. We evaluate our neural MC approach by quantitative and qualitative comparisons to all well-known MC variants. In particular, we demonstrate the ability of our network to recover sharp features such as edges and corners, a long-standing issue of MC and its variants. Our network also reconstructs local mesh topologies more accurately than previous approaches.
【4】 Secure Distributed Training at Scale 标题:确保大规模分布式训练的安全
作者:Eduard Gorbunov,Alexander Borzunov,Michael Diskin,Max Ryabinin 机构:Yandex, MIPT, HSE University 备注:55 pages, 6 figures. Code: this https URL 链接:https://arxiv.org/abs/2106.11257 摘要:Some of the hardest problems in deep learning can be solved with the combined effort of many independent parties, as is the case for volunteer computing and federated learning. These setups rely on high numbers of peers to provide computational resources or train on decentralized datasets. Unfortunately, participants in such systems are not always reliable. Any single participant can jeopardize the entire training run by sending incorrect updates, whether deliberately or by mistake. Training in presence of such peers requires specialized distributed training algorithms with Byzantine tolerance. These algorithms often sacrifice efficiency by introducing redundant communication or passing all updates through a trusted server. As a result, it can be infeasible to apply such algorithms to large-scale distributed deep learning, where models can have billions of parameters. In this work, we propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency. We rigorously analyze this protocol: in particular, we provide theoretical bounds for its resistance against Byzantine and Sybil attacks and show that it has a marginal communication overhead. To demonstrate its practical effectiveness, we conduct large-scale experiments on image classification and language modeling in presence of Byzantine attackers.
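作为背景,拜占庭鲁棒聚合最经典的基线之一是坐标中位数。注意:这只是示意“鲁棒聚合”的含义,并非本文提出的通信高效协议:

```python
import torch

def coordinate_median(grads):
    """Classic Byzantine-robust baseline: aggregate peer gradients by the
    coordinate-wise median, which tolerates a minority of arbitrary updates.
    (Illustration only -- NOT the paper's communication-efficient protocol.)"""
    return torch.median(torch.stack(grads), dim=0).values

# usage: aggregated = coordinate_median([g1, g2, g3, g4])  # one tensor per peer
```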
【5】 A causal view on compositional data 标题:关于成分数据的一种因果观
作者:Elisabeth Ailer,Christian L. Müller,Niki Kilbertus 机构:Helmholtz AI, Munich, LMU & Helmholtz Zentrum Munich, Flatiron Institute, New York 备注:Code available on this https URL 链接:https://arxiv.org/abs/2106.11234 摘要:Many scientific datasets are compositional in nature. Important examples include species abundances in ecology, rock compositions in geology, topic compositions in large-scale text corpora, and sequencing count data in molecular biology. Here, we provide a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. Throughout, we pay particular attention to the interpretation of compositional causes from the viewpoint of interventions and crisply articulate potential pitfalls for practitioners. Focusing on modern high-dimensional microbiome sequencing data as a timely illustrative use case, our analysis first reveals that popular one-dimensional information-theoretic summary statistics, such as diversity and richness, may be insufficient for drawing causal conclusions from ecological data. Instead, we advocate for multivariate alternatives using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account. In a comparative analysis on synthetic and semi-synthetic data we show the advantages and limitations of our proposal. We posit that our framework may provide a useful starting point for cause-effect estimation in the context of compositional data.
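摘要主张的“顾及成分样本空间特殊结构的数据变换”中,最常用的是中心对数比(CLR)变换,这是成分数据分析的标准工具;下面是一个示意(eps 的取值为假设,用于处理零计数):

```python
import numpy as np

def clr(X, eps=1e-9):
    """Centered log-ratio transform: map compositions (rows on the simplex)
    into unconstrained Euclidean space before applying regression methods."""
    L = np.log(np.asarray(X, dtype=float) + eps)  # eps guards zero entries
    return L - L.mean(axis=1, keepdims=True)
```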
【6】 Regularization is all you Need: Simple Neural Nets can Excel on Tabular Data 标题:正则化是您所需要的全部:简单的神经网络可以在表格数据上出类拔萃
作者:Arlind Kadra,Marius Lindauer,Frank Hutter,Josif Grabocka 机构:Department of Representation Learning, University of Freiburg, Freiburg, Germany, Institute for Information Processing, Leibniz University Hannover, Hannover, Germany, Department of Machine Learning 链接:https://arxiv.org/abs/2106.11189 摘要:Tabular datasets are the last "unconquered castle" for deep learning, with traditional ML methods like Gradient-Boosted Decision Trees still performing strongly even against recent specialized neural architectures. In this paper, we hypothesize that the key to boosting the performance of neural networks lies in rethinking the joint and simultaneous application of a large set of modern regularization techniques. As a result, we propose regularizing plain Multilayer Perceptron (MLP) networks by searching for the optimal combination/cocktail of 13 regularization techniques for each dataset using a joint optimization over the decision on which regularizers to apply and their subsidiary hyperparameters. We empirically assess the impact of these regularization cocktails for MLPs on a large-scale empirical study comprising 40 tabular datasets and demonstrate that (i) well-regularized plain MLPs significantly outperform recent state-of-the-art specialized neural network architectures, and (ii) they even outperform strong traditional ML methods, such as XGBoost.
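论文针对每个数据集在 13 种正则化技术的开关与超参数上做联合搜索;下面的 PyTorch 草图只是硬编码了一种可能的"鸡尾酒"组合(批归一化、dropout、解耦权重衰减、标签平滑),用来演示这些技术如何在一个朴素 MLP 上叠加使用,并非论文搜索得到的最优配方。

```python
import torch
import torch.nn as nn

# A plain MLP with a hard-coded "cocktail" of common regularizers.
class RegularizedMLP(nn.Module):
    def __init__(self, in_dim, n_classes, hidden=256, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = RegularizedMLP(in_dim=20, n_classes=2)
# AdamW gives decoupled weight decay; label smoothing regularizes the targets.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
loss = loss_fn(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```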
【7】 Graceful Degradation and Related Fields 标题:优美退化及其相关领域
作者:Jack Dymond 机构:School of Electronics and Computer Science, University of Southampton 链接:https://arxiv.org/abs/2106.11119 摘要:When machine learning models encounter data which is out of the distribution on which they were trained they have a tendency to behave poorly, most prominently over-confidence in erroneous predictions. Such behaviours will have disastrous effects on real-world machine learning systems. In this field graceful degradation refers to the optimisation of model performance as it encounters this out-of-distribution data. This work presents a definition and discussion of graceful degradation and where it can be applied in deployed visual systems. Following this a survey of relevant areas is undertaken, novelly splitting the graceful degradation problem into active and passive approaches. In passive approaches, graceful degradation is handled and achieved by the model in a self-contained manner, in active approaches the model is updated upon encountering epistemic uncertainties. This work communicates the importance of the problem and aims to prompt the development of machine learning strategies that are aware of graceful degradation.
【8】 Multivariate Data Explanation by Jumping Emerging Patterns Visualization 标题:跳跃显现模式可视化在多元数据解释中的应用
作者:Mário Popolin Neto,Fernando V. Paulovich 机构:University of São Paulo (USP) 链接:https://arxiv.org/abs/2106.11112 摘要:Visual Analytics (VA) tools and techniques have been shown to be instrumental in supporting users to build better classification models, interpret model decisions and audit results. In a different direction, VA has recently been applied to transform classification models into descriptive mechanisms instead of predictive. The idea is to use such models as surrogates for data patterns, visualizing the model to understand the phenomenon represented by the data. Although very useful and inspiring, the few proposed approaches have opted to use low-complexity classification models to promote straightforward interpretation, which limits their ability to capture intricate data patterns. In this paper, we present VAX (multiVariate dAta eXplanation), a new VA method to support the identification and visual interpretation of patterns in multivariate data sets. Unlike the existing similar approaches, VAX uses the concept of Jumping Emerging Patterns to identify and aggregate several diversified patterns, producing explanations through logic combinations of data variables. The potential of VAX to interpret complex multivariate datasets is demonstrated through case studies using two real-world data sets covering different scenarios.
【9】 Techniques for Symbol Grounding with SATNet 标题:使用SATNet实现符号接地的技术
作者:Sever Topan,David Rolnick,Xujie Si 机构:McGill University, NVIDIA, Mila – Quebec AI Institute 备注:Code available at this https URL 链接:https://arxiv.org/abs/2106.11072 摘要:Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures. The recently proposed differentiable MAXSAT solver, SATNet, was a breakthrough in its capacity to integrate with a traditional neural network and solve visual reasoning problems. For instance, it can learn the rules of Sudoku purely from image examples. Despite its success, SATNet was shown to succumb to a key challenge in neurosymbolic systems known as the Symbol Grounding Problem: the inability to map visual inputs to symbolic variables without explicit supervision ("label leakage"). In this work, we present a self-supervised pre-training pipeline that enables SATNet to overcome this limitation, thus broadening the class of problems that SATNet architectures can solve to include datasets where no intermediary labels are available at all. We demonstrate that our method allows SATNet to attain full accuracy even with a harder problem setup that prevents any label leakage. We additionally introduce a proofreading method that further improves the performance of SATNet architectures, beating the state-of-the-art on Visual Sudoku.
【10】 QuaPy: A Python-Based Framework for Quantification 标题:QuaPy:一个基于Python的量化框架
作者:Alejandro Moreo,Andrea Esuli,Fabrizio Sebastiani 机构:Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi , Pisa, Italy 链接:https://arxiv.org/abs/2106.11057 摘要:QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that estimates the relative frequencies (a.k.a. prevalence values) of the classes of interest in a sample of unlabelled data. While quantification can be trivially performed by applying a standard classifier to each unlabelled data item and counting how many data items have been assigned to each class, it has been shown that this "classify and count" method is outperformed by methods specifically designed for quantification. QuaPy provides implementations of a number of baseline methods and advanced quantification methods, of routines for quantification-oriented model selection, of several broadly accepted evaluation measures, and of robust evaluation protocols routinely used in the field. QuaPy also makes available datasets commonly used for testing quantifiers, and offers visualization tools for facilitating the analysis and interpretation of the results. The software is open-source and publicly available under a BSD-3 licence via https://github.com/HLT-ISTI/QuaPy, and can be installed via pip (https://pypi.org/project/QuaPy/)
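摘要中提到的朴素"classify and count"基线很容易用代码表达。下面用 sklearn 给出一个最小草图;它正是 QuaPy 中各种专用量化方法旨在超越的朴素基线,并不复现 QuaPy 本身的 API,其中的玩具数据完全是演示用的假设。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def classify_and_count(clf, X_unlabelled):
    """Naive CC baseline: predict each item, report class frequencies."""
    preds = clf.predict(X_unlabelled)
    return np.bincount(preds, minlength=len(clf.classes_)) / len(preds)

# Toy binary data: train on 50/50, then quantify a 90/10 test sample.
rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(2, 1, (500, 2))])
y_tr = np.array([0] * 500 + [1] * 500)
X_te = np.vstack([rng.normal(0, 1, (900, 2)), rng.normal(2, 1, (100, 2))])

clf = LogisticRegression().fit(X_tr, y_tr)
print(classify_and_count(clf, X_te))  # biased toward the training prevalence
```

CC 的估计会系统性地偏向训练集的类别比例,这正是专用量化方法(如对误分类率做校正的 adjusted classify and count)存在的原因。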
【11】 Attribute Selection using Contranominal Scales 标题:利用反名义标度进行属性选择
作者:Dominik Dürrschnabel,Maren Koyda,Gerd Stumme 机构:Knowledge & Data Engineering Group, University of Kassel, Germany, Interdisciplinary Research Center for Information System Design 备注:17 pages, 2 figures, 3 tables, 1 algorithm, 26th International Conference on Conceptual Structures 链接:https://arxiv.org/abs/2106.10978 摘要:Formal Concept Analysis (FCA) allows one to analyze binary data by deriving concepts and ordering them in lattices. One of the main goals of FCA is to enable humans to comprehend the information that is encapsulated in the data; however, the large size of concept lattices is a limiting factor for the feasibility of understanding the underlying structural properties. The size of such a lattice depends on the number of subcontexts in the corresponding formal context that are isomorphic to a contranominal scale of high dimension. In this work, we propose the algorithm ContraFinder that enables the computation of all contranominal scales of a given formal context. Leveraging this algorithm, we introduce delta-adjusting, a novel approach in order to decrease the number of contranominal scales in a formal context by the selection of an appropriate attribute subset. We demonstrate that delta-adjusting a context reduces the size of the hereby emerging sub-semilattice and that the implication set is restricted to meaningful implications. This is evaluated with respect to its associated knowledge by means of a classification task. Hence, our proposed technique strongly improves understandability while preserving important conceptual structures.
【12】 Towards a Framework for Changing-Contact Robot Manipulation 标题:一种变接触机器人操作框架的研究
作者:Saif Sidhik,Mohan Sridharan,Dirk Ruiken 机构:Intelligent Robotics Lab, School of Computer Science, University of Birmingham, UK, Honda Research Institute Europe GmbH, Offenbach am Main, Germany 备注:Submitted to "Autonomous Robots and Multirobot Systems (ARMS) Workshop" at 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2021 链接:https://arxiv.org/abs/2106.10969 摘要:Many robot manipulation tasks require the robot to make and break contact with objects and surfaces. The dynamics of such changing-contact robot manipulation tasks are discontinuous when contact is made or broken, and continuous elsewhere. These discontinuities make it difficult to construct and use a single dynamics model or control strategy for any such task. We present a framework for smooth dynamics and control of such changing-contact manipulation tasks. For any given target motion trajectory, the framework incrementally improves its prediction of when contacts will occur. This prediction and a model relating approach velocity to impact force modify the velocity profile of the motion sequence such that it is $C^\infty$ smooth, and help achieve a desired force on impact. We implement this framework by building on our hybrid force-motion variable impedance controller for continuous contact tasks. We experimentally evaluate our framework in the illustrative context of sliding tasks involving multiple contact changes with transitions between surfaces of different properties.
【13】 Segmentation of cell-level anomalies in electroluminescence images of photovoltaic modules 标题:光伏组件电致发光图像中单元级异常的分割
作者:Urtzi Otamendi,Iñigo Martinez,Marco Quartulli,Igor G. Olaizola,Elisabeth Viles,Werther Cambarau 机构:Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), Donostia-San Sebastián, Spain, TECNUN School of Engineering, University of Navarra, Donostia-San Sebastián, Spain 备注:None 链接:https://arxiv.org/abs/2106.10962 摘要:In the operation & maintenance (O&M) of photovoltaic (PV) plants, the early identification of failures has become crucial to maintain productivity and prolong components' life. Of all defects, cell-level anomalies can lead to serious failures and may affect surrounding PV modules in the long run. These fine defects are usually captured with high spatial resolution electroluminescence (EL) imaging. The difficulty of acquiring such images has limited the availability of data. For this work, multiple data resources and augmentation techniques have been used to surpass this limitation. Current state-of-the-art detection methods extract barely low-level information from individual PV cell images, and their performance is conditioned by the available training data. In this article, we propose an end-to-end deep learning pipeline that detects, locates and segments cell-level anomalies from entire photovoltaic modules via EL images. The proposed modular pipeline combines three deep learning techniques: 1. object detection (modified Faster R-CNN), 2. image classification (EfficientNet) and 3. weakly supervised segmentation (autoencoder). The modular nature of the pipeline allows to upgrade the deep learning models to the further improvements in the state-of-the-art and also extend the pipeline towards new functionalities.
【14】 On Limited-Memory Subsampling Strategies for Bandits 标题:关于Bandits的有限存储子采样策略
作者:Dorian Baudry,Yoan Russac,Olivier Cappé 机构: Université PSL 备注:None 链接:https://arxiv.org/abs/2106.10935 摘要:There has been a recent surge of interest in nonparametric bandit algorithms based on subsampling. One drawback however of these approaches is the additional complexity required by random subsampling and the storage of the full history of rewards. Our first contribution is to show that a simple deterministic subsampling rule, proposed in the recent work of Baudry et al. (2020) under the name of ''last-block subsampling'', is asymptotically optimal in one-parameter exponential families. In addition, we prove that these guarantees also hold when limiting the algorithm memory to a polylogarithmic function of the time horizon. These findings open up new perspectives, in particular for non-stationary scenarios in which the arm distributions evolve over time. We propose a variant of the algorithm in which only the most recent observations are used for subsampling, achieving optimal regret guarantees under the assumption of a known number of abrupt changes. Extensive numerical simulations highlight the merits of this approach, particularly when the changes are not only affecting the means of the rewards.
【15】 Open-set Label Noise Can Improve Robustness Against Inherent Label Noise 标题:开集标签噪声可以提高对固有标签噪声的鲁棒性
作者:Hongxin Wei,Lue Tao,Renchunzi Xie,Bo An 机构:Nanyang Technological University, Singapore, Nanjing University of Aeronautics and Astronautics 链接:https://arxiv.org/abs/2106.10891 摘要:Learning with noisy labels is a practically challenging problem in weakly supervised learning. In the existing literature, open-set noises are always considered to be poisonous for generalization, similar to closed-set noises. In this paper, we empirically show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. Inspired by the observations, we propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training. With ODNL, the extra capacity of the neural network can be largely consumed in a way that does not interfere with learning patterns from clean data. Through the lens of SGD noise, we show that the noises induced by our method are random-direction, conflict-free and biased, which may help the model converge to a flat minimum with superior stability and enforce the model to produce conservative predictions on Out-of-Distribution instances. Extensive experimental results on benchmark datasets with various types of noisy labels demonstrate that the proposed method not only enhances the performance of many existing robust algorithms but also achieves significant improvement on Out-of-Distribution detection tasks even in the label noise setting.
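按摘要的描述,ODNL 的核心做法可以简化地理解为:在训练中混入开集(分布外)辅助样本,并在每一步为它们重新随机抽取标签("动态噪声标签")。下面的 PyTorch 草图编码了这种简化理解;损失权重 lam 等细节是演示用的假设,并非论文的完整方法。

```python
import torch
import torch.nn.functional as F

def odnl_loss(model, x_clean, y_noisy, x_openset, n_classes, lam=1.0):
    """One training step in the spirit of ODNL: open-set auxiliary
    samples get labels re-drawn uniformly at random every time they
    are seen ("dynamic noisy labels")."""
    y_rand = torch.randint(0, n_classes, (x_openset.size(0),),
                           device=x_openset.device)
    loss_clean = F.cross_entropy(model(x_clean), y_noisy)
    loss_open = F.cross_entropy(model(x_openset), y_rand)
    return loss_clean + lam * loss_open
```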
【16】 Multiplying Matrices Without Multiplying 标题:不用乘法的矩阵乘法
作者:Davis Blalock,John Guttag 备注:To appear at ICML 2021 链接:https://arxiv.org/abs/2106.10860 摘要:Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs $100\times$ faster than exact matrix products and $10\times$ faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling (the core operations of our method) could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.
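为了直观理解"用查表代替乘法"的思路,下面给出一个经典乘积量化(PQ)草图:将 A 的行按子空间用质心编码(这里用 k-means,而论文学习的是快速哈希函数),再对预先算好的"质心×B"查找表求和。这只演示查表原理本身,并不是论文的 MADDNESS 式算法。

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_matmul(A, B, n_subspaces=4, k=16, seed=0):
    """Approximate A @ B via product quantization: split the columns of A
    into subspaces, snap each sub-row to its nearest centroid, and sum
    precomputed centroid-times-B lookup tables."""
    n, d = A.shape
    sub = d // n_subspaces
    out = np.zeros((n, B.shape[1]))
    for s in range(n_subspaces):
        cols = slice(s * sub, (s + 1) * sub)
        km = KMeans(n_clusters=k, n_init=4, random_state=seed).fit(A[:, cols])
        table = km.cluster_centers_ @ B[cols]  # k x B.shape[1], precomputable
        out += table[km.labels_]               # gather instead of multiply
    return out

rng = np.random.default_rng(0)
A, B = rng.normal(size=(1000, 64)), rng.normal(size=(64, 8))
err = np.linalg.norm(pq_matmul(A, B) - A @ B) / np.linalg.norm(A @ B)
print(f"relative error: {err:.3f}")
```

实际使用中质心会在训练集上离线学习,对已知的 B 只需存查找表;这里为简洁起见直接在 A 上拟合。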
【17】 Compressing Deep ODE-Nets using Basis Function Expansions 标题:基于基函数展开的深层ODE-Net压缩
作者:Alejandro Queiruga,N. Benjamin Erichson,Liam Hodgkinson,Michael W. Mahoney 机构:Google Research, ICSI and UC Berkeley 链接:https://arxiv.org/abs/2106.10820 摘要:The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-depth functions using linear combinations of basis functions. This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance. In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments. Furthermore, our framework enables meaningful continuous-in-time batch normalization layers using function projections. The performance of basis function compression is demonstrated by applying continuous-depth models to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.
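基变换的思想可以用一个随深度变化的标量权重来演示:在离散"层"上采样,投影到少数几个勒让德多项式基函数上,只保留系数即可压缩。下面的基选择与规模都是演示用的假设;论文将同样的操作应用于 ODE-Net 的完整权重张量。

```python
import numpy as np

# A depth-dependent weight w(t) stored at 32 discrete "layers" can be
# re-expressed in a small basis and compressed, without retraining,
# by keeping only the leading coefficients.
n_layers, n_basis = 32, 4
t = np.linspace(-1, 1, n_layers)
rng = np.random.default_rng(0)
w_discrete = 0.5 * t + 0.3 * (3 * t**2 - 1) / 2 + 0.02 * rng.normal(size=n_layers)

# Change of basis: least-squares projection onto the first n_basis
# Legendre polynomials.
Phi = np.polynomial.legendre.legvander(t, n_basis - 1)     # (n_layers, n_basis)
coeffs, *_ = np.linalg.lstsq(Phi, w_discrete, rcond=None)  # 4 numbers instead of 32
w_reconstructed = Phi @ coeffs

print("storage ratio:", n_basis / n_layers)
print("max abs error:", np.abs(w_discrete - w_reconstructed).max())
```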
【18】 CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation 标题:CD-SGD:带压缩和延迟补偿的分布式随机梯度下降
作者:Enda Yu,Dezun Dong,Yemao Xu,Shuo Ouyang,Xiangke Liao 机构:College of Computer, National University of Defense Technology, Changsha, China 备注:12 pages 链接:https://arxiv.org/abs/2106.10796 摘要:Communication overhead is the key challenge for distributed training. Gradient compression is a widely used approach to reduce communication traffic. When combined with parallel communication mechanisms such as pipelining, gradient compression can greatly alleviate the impact of communication overhead. However, two problems of gradient compression remain to be solved. Firstly, gradient compression brings in extra computation cost, which will delay the next training iteration. Secondly, gradient compression usually leads to a decrease in convergence accuracy.
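摘要列出的第二个问题(精度下降)在压缩文献中通常用误差反馈(error feedback)缓解。下面是一个通用的 top-k 压缩器加误差反馈缓冲的参考草图;这是压缩文献中的标准技巧,并非 CD-SGD 协议本身。

```python
import torch

class TopKWithErrorFeedback:
    """Generic top-k gradient compressor with an error-feedback buffer,
    a standard remedy for compression-induced accuracy loss."""
    def __init__(self, ratio=0.01):
        self.ratio = ratio
        self.residual = None

    def compress(self, grad):
        if self.residual is None:
            self.residual = torch.zeros_like(grad)
        corrected = grad + self.residual       # re-inject previously dropped mass
        flat = corrected.flatten()
        k = max(1, int(self.ratio * flat.numel()))
        idx = flat.abs().topk(k).indices
        sparse = torch.zeros_like(flat)
        sparse[idx] = flat[idx]
        self.residual = (flat - sparse).view_as(grad)  # remember what was dropped
        return sparse.view_as(grad)

g = torch.randn(1000)
comp = TopKWithErrorFeedback(ratio=0.05)
print(comp.compress(g).count_nonzero())  # 50 surviving entries
```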
【19】 iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients 标题:IDARTS:随机隐式梯度可微体系结构搜索
作者:Miao Zhang,Steven Su,Shirui Pan,Xiaojun Chang,Ehsan Abbasnejad,Reza Haffari 机构:Monash University, Australia; Australian Institute for Machine Learning, University of Adelaide 备注:ICML2021 链接:https://arxiv.org/abs/2106.10784 摘要:Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS) due to its efficiency and simplicity. With a gradient-based bi-level optimization, DARTS alternately optimizes the inner model weights and the outer architecture parameter in a weight-sharing supernet. A key challenge to the scalability and quality of the learned architectures is the need for differentiating through the inner-loop optimisation. While much has been discussed about several potentially fatal factors in DARTS, the architecture gradient, a.k.a. hypergradient, has received less attention. In this paper, we tackle the hypergradient computation in DARTS based on the implicit function theorem, making it depend only on the obtained solution to the inner-loop optimization and agnostic to the optimization path. To further reduce the computational requirements, we formulate a stochastic hypergradient approximation for differentiable NAS, and theoretically show that the architecture optimization with the proposed method, named iDARTS, is expected to converge to a stationary point. Comprehensive experiments on two NAS benchmark search spaces and the common NAS search space verify the effectiveness of our proposed method. It leads to architectures outperforming, with large margins, those learned by the baseline methods.
【20】 Neural Spectral Marked Point Processes 标题:神经频谱标记点过程
作者:Shixiang Zhu,Haoyun Wang,Xiuyuan Cheng,Yao Xie 链接:https://arxiv.org/abs/2106.10773 摘要:Self- and mutually-exciting point processes are popular models in machine learning and statistics for dependent discrete event data. To date, most existing models assume stationary kernels (including the classical Hawkes processes) and simple parametric models. Modern applications with complex event data require more general point process models that can incorporate contextual information of the events, called marks, besides the temporal and location information. Moreover, such applications often require non-stationary models to capture more complex spatio-temporal dependence. To tackle these challenges, a key question is to devise a versatile influence kernel in the point process model. In this paper, we introduce a novel and general neural network-based non-stationary influence kernel with high expressiveness for handling complex discrete events data while providing theoretical performance guarantees. We demonstrate the superior performance of our proposed method compared with the state-of-the-art on synthetic and real data.
【21】 Generalization in the Face of Adaptivity: A Bayesian Perspective 标题:适应性面前的泛化:贝叶斯视角
作者:Moshe Shenfeld,Katrina Ligett 链接:https://arxiv.org/abs/2106.10761 摘要:Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the issued queries yield answers on the sample that differ wildly from the values of those queries on the underlying data distribution. Differential privacy provides a tool to ensure generalization despite adaptively-chosen queries, but its worst-case nature means that it cannot, for example, yield improved results for low-variance queries. In this paper, we give a simple new characterization that illuminates the core problem of adaptive data analysis. We show explicitly that the harms of adaptivity come from the covariance between the behavior of future queries and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries. We leverage this intuition to introduce a new stability notion; we then use it to prove new generalization results for the most basic noise-addition mechanisms (Laplace and Gaussian noise addition), with guarantees that scale with the variance of the queries rather than the square of their range. Our characterization opens the door to new insights and new algorithms for the fundamental problem of achieving generalization in adaptive data analysis.
【22】 Better Training using Weight-Constrained Stochastic Dynamics 标题:使用权重约束随机动力学进行更好的训练
作者:Benedict Leimkuhler,Tiffany Vlaar,Timothée Pouchon,Amos Storkey 机构:Department of Mathematics, University of Edinburgh, United Kingdom; Department of Informatics, University of Edinburgh 备注:None 链接:https://arxiv.org/abs/2106.10704 摘要:We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided both for the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimization schemes can be used directly, without needing to adapt neural network architecture design choices or to modify the objective with regularization terms, and see performance improvements in classification tasks.
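为了让"带约束的随机动力学"更具体,下面给出一个刻意朴素的草图:先做一步过阻尼朗之万更新,再通过极分解投影回正交矩阵流形。论文是将约束直接构造进离散化积分器中,而非这种"先走一步再投影"的做法;步长、温度与投影方式均为演示用的假设。

```python
import torch

def orth_project(W):
    """Closest orthogonal matrix to W (polar factor via SVD)."""
    U, _, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ Vh

def constrained_sgld_step(W, grad, lr=1e-3, temperature=1e-4):
    """One overdamped Langevin step, then retract onto the constraint
    manifold. Naive project-after-step, shown for illustration only."""
    noise = torch.randn_like(W) * (2 * lr * temperature) ** 0.5
    return orth_project(W - lr * grad + noise)

W = orth_project(torch.randn(8, 8))
W = constrained_sgld_step(W, grad=torch.randn(8, 8))
print(torch.allclose(W.T @ W, torch.eye(8), atol=1e-5))  # stays orthogonal
```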
【23】 Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling 标题:推特和脸书上的双面人:挖掘社交多媒体进行人格画像
作者:Qi Yang,Aleksandr Farseev,Andrey Filchenkov 机构:ITMO University, SUSS School of Business, SoMin.ai Research, Russia, Singapore 链接:https://arxiv.org/abs/2106.10673 摘要:Human personality traits are the key drivers behind our decision-making, influencing our life path on a daily basis. Inference of personality traits, such as Myers-Briggs Personality Type, as well as an understanding of dependencies between personality traits and users' behavior on various social media platforms is of crucial importance to modern research and industry applications. The emergence of diverse and cross-purpose social media avenues makes it possible to perform user personality profiling automatically and efficiently based on data represented across multiple data modalities. However, the research efforts on personality profiling from multi-source multi-modal social media data are relatively sparse, and the level of impact of different social network data on machine learning performance has yet to be comprehensively evaluated. Furthermore, there is no such dataset in the research community to benchmark against. This study is one of the first attempts towards bridging such an important research gap. Specifically, in this work, we infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS", and comparing the performance results not just across data modalities but also with respect to different social network data sources. Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources. We have also found that the selection of a machine learning approach is of crucial importance when choosing social network data sources and that people tend to reveal multiple facets of their personality in different social media avenues. Our released social multimedia dataset facilitates future research in this direction.
【24】 CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency 标题:摄像机:增强分辨率并保持合理性的图像显著性类激活映射
作者:Mohammad A. A. K. Jalwana,Naveed Akhtar,Mohammed Bennamoun,Ajmal Mian 机构:Computer Science and Software Engineering, The University of Western Australia 备注:IEEE CVPR 2021 paper 链接:https://arxiv.org/abs/2106.10649 摘要:Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input. However, class-insensitivity of the earlier layers in a network only allows saliency computation with low resolution activation maps of the deeper layers, resulting in compromised image saliency. Remedying this can lead to sanity failures. We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors and preserving the map sanity. Our method systematically performs multi-scale accumulation and fusion of the activation maps and backpropagated gradients to compute precise saliency maps. From accurate image saliency to articulation of relative importance of input features for different models, and precise discrimination between model perception of visually similar objects, our high-resolution mapping offers multiple novel insights into the black-box deep visual models, which are presented in the paper. We also demonstrate the utility of our saliency maps in an adversarial setup by drastically reducing the norm of attack signals by focusing them on the precise regions identified by our maps. Our method also inspires new evaluation metrics and a sanity check for this developing research direction. Code is available at https://github.com/VisMIL/CAMERAS
【25】 DiffLoop: Tuning PID controllers by differentiating through the feedback loop 标题:DiffLoop:通过反馈环路微分来调整PID控制器
作者:Athindran Ramesh Kumar,Peter J. Ramadge 机构:Department of Electrical Engineering, Princeton University, Princeton, USA 备注:Extension of paper in 2021 55th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2021 链接:https://arxiv.org/abs/2106.10516 摘要:Since most industrial control applications use PID controllers, PID tuning and anti-windup measures are significant problems. This paper investigates tuning the feedback gains of a PID controller via back-calculation and automatic differentiation tools. In particular, we episodically use a cost function to generate gradients and perform gradient descent to improve controller performance. We provide a theoretical framework for analyzing this non-convex optimization and establish a relationship between back-calculation and disturbance feedback policies. We include numerical experiments on linear systems with actuator saturation to show the efficacy of this approach.
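其核心思想——将跟踪代价对展开的反馈回路仿真求导——用几行 PyTorch 即可表达。下面的一阶被控对象、饱和限幅及所有常数均为演示用的假设;论文还分析了用于抗积分饱和的 back-calculation,此处从略。

```python
import torch

# Tune PID gains by unrolling a discrete-time plant simulation and
# backpropagating a tracking cost through the feedback loop.
kp, ki, kd = (torch.tensor(v, requires_grad=True) for v in (1.0, 0.1, 0.05))
opt = torch.optim.Adam([kp, ki, kd], lr=0.02)
dt, setpoint = 0.05, 1.0

for episode in range(200):
    y, integral, prev_err = torch.zeros(()), torch.zeros(()), torch.zeros(())
    cost = torch.zeros(())
    for _ in range(100):
        err = setpoint - y
        integral = integral + err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv
        u = torch.clamp(u, -5.0, 5.0)  # actuator saturation, as in the paper's experiments
        y = y + dt * (-y + u)          # illustrative first-order plant: y' = -y + u
        cost = cost + err**2 * dt
        prev_err = err
    opt.zero_grad(); cost.backward(); opt.step()

print(kp.item(), ki.item(), kd.item())
```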
【26】 GLIB: Towards Automated Test Oracle for Graphically-Rich Applications 标题:GLIB:面向图形丰富的应用程序的自动化测试Oracle
作者:Ke Chen,Yufei Li,Yingfeng Chen,Changjie Fan,Zhipeng Hu,Wei Yang 机构:Fuxi AI Lab in Netease, Hangzhou, China, The University of Texas at Dallas, Dallas, USA 备注:Accepted by ESEC/FSE 2021 链接:https://arxiv.org/abs/2106.10507 摘要:Graphically-rich applications such as games are ubiquitous, with attractive Graphical User Interface (GUI) visual effects that offer a bridge between software applications and end-users. However, various types of graphical glitches may arise from such GUI complexity and have become one of the main components of software compatibility issues. Our study on bug reports from game development teams in NetEase Inc. indicates that graphical glitches frequently occur during the GUI rendering and severely degrade the quality of graphically-rich applications such as video games. Existing automated testing techniques for such applications focus mainly on generating various GUI test sequences and checking whether the test sequences can cause crashes. These techniques require constant human attention to capture non-crashing bugs such as those causing graphical glitches. In this paper, we present the first step in automating the test oracle for detecting non-crashing bugs in graphically-rich applications. Specifically, we propose GLIB, based on a code-based data augmentation technique, to detect game GUI glitches. We perform an evaluation of GLIB on 20 real-world game apps (with bug reports available) and the result shows that GLIB can achieve 100% precision and 99.5% recall in detecting non-crashing bugs such as game GUI glitches. Practical application of GLIB on another 14 real-world games (without bug reports) further demonstrates that GLIB can effectively uncover GUI glitches, with 48 of 53 bugs reported by GLIB having been confirmed and fixed so far.
【27】 Informative Class Activation Maps 标题:信息性类激活图
作者:Zhenyue Qin,Dongwoo Kim,Tom Gedeon 机构:School of Computing, Australian National University; GSAI 备注:arXiv admin note: substantial text overlap with arXiv:1911.10688 链接:https://arxiv.org/abs/2106.10472 摘要:We study how to evaluate the quantitative information content of a region within an image for a particular label. To this end, we bridge class activation maps with information theory. We develop an informative class activation map (infoCAM). Given a classification task, infoCAM depicts how information accumulates from partial regions to the entire image toward a label. Thus, we can utilise infoCAM to locate the most informative features for a label. When applied to an image classification task, infoCAM performs better than the traditional classification map in the weakly supervised object localisation task. We achieve state-of-the-art results on Tiny-ImageNet.
【28】 Sparse Training via Boosting Pruning Plasticity with Neuroregeneration 标题:神经再生增强修剪可塑性的稀疏训练
作者:Shiwei Liu,Tianlong Chen,Xiaohan Chen,Zahra Atashgahi,Lu Yin,Huanyu Kou,Li Shen,Mykola Pechenizkiy,Zhangyang Wang,Decebal Constantin Mocanu 机构:Eindhoven University of Technology, University of Texas at Austin, University of Twente, University of Leeds, JD Explore Academy 链接:https://arxiv.org/abs/2106.10404 摘要:Work on the lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) has recently drawn much attention to post-training pruning (iterative magnitude pruning) and before-training pruning (pruning at initialization). The former method suffers from an extremely large computation cost and the latter category of methods usually struggles with insufficient performance. In comparison, during-training pruning, a class of pruning methods that simultaneously enjoys the training/inference efficiency and the comparable performance, has so far been less explored. To better understand during-training pruning, we quantitatively study the effect of pruning throughout training from the perspective of pruning plasticity (the ability of the pruned networks to recover the original performance). Pruning plasticity can help explain several other empirical observations about neural network pruning in the literature. We further find that pruning plasticity can be substantially improved by injecting a brain-inspired mechanism called neuroregeneration, i.e., to regenerate the same number of connections as pruned. Based on the insights from pruning plasticity, we design a novel gradual magnitude pruning (GMP) method, named gradual pruning with zero-cost neuroregeneration (GraNet), and its dynamic sparse training (DST) variant (GraNet-ST). Both of them advance the state of the art. Perhaps most impressively, the latter for the first time boosts the sparse-to-sparse training performance over various dense-to-sparse methods by a large margin with ResNet-50 on ImageNet. We will release all codes.
【29】 The Animal ID Problem: Continual Curation 标题:动物身份问题:持续管护
作者:Charles V. Stewart,Jason R. Parham,Jason Holmberg,Tanya Y. Berger-Wolf 机构:Rensselaer Polytechnic Institute; Ohio State University 备注:4 pages, 2 figures, non-archival in 2021 CVPR workshop 链接:https://arxiv.org/abs/2106.10377 摘要:Hoping to stimulate new research in individual animal identification from images, we propose to formulate the problem as the human-machine Continual Curation of images and animal identities. This is an open world recognition problem, where most new animals enter the system after its algorithms are initially trained and deployed. Continual Curation, as defined here, requires (1) an improvement in the effectiveness of current recognition methods, (2) a pairwise verification algorithm that allows the possibility of no decision, and (3) an algorithmic decision mechanism that seeks human input to guide the curation process. Error metrics must evaluate the ability of recognition algorithms to identify not only animals that have been seen just once or twice but also recognize new animals not in the database. An important measure of overall system performance is accuracy as a function of the amount of human input required.
【30】 Non-parametric Differentially Private Confidence Intervals for the Median 标题:中位数的非参数差分隐私置信区间
作者:Joerg Drechsler,Ira Globus-Harris,Audra McMillan,Jayshree Sarathy,Adam Smith 机构:Institute for Employment Research, Germany, The Joint Program in Survey Methodology, University of Maryland, USA, University of Pennsylvania, USA, Apple, USA, Harvard John A. Paulson School of Engineering and Applied Sciences, USA 备注:44 pages, 15 figures 链接:https://arxiv.org/abs/2106.10333 摘要:Differential privacy is a restriction on data processing algorithms that provides strong confidentiality guarantees for individual records in the data. However, research on proper statistical inference, that is, research on properly quantifying the uncertainty of the (noisy) sample estimate regarding the true value in the population, is currently still limited. This paper proposes and evaluates several strategies to compute valid differentially private confidence intervals for the median. Instead of computing a differentially private point estimate and deriving its uncertainty, we directly estimate the interval bounds and discuss why this approach is superior if ensuring privacy is important. We also illustrate that addressing both sources of uncertainty--the error from sampling and the error from protecting the output--simultaneously should be preferred over simpler approaches that incorporate the uncertainty in a sequential fashion. We evaluate the performance of the different algorithms under various parameter settings in extensive simulation studies and demonstrate how the findings could be applied in practical settings using data from the 1940 Decennial Census.
【31】 Proper Value Equivalence 标题:恰当值等价
作者:Christopher Grimm,André Barreto,Gregory Farquhar,David Silver,Satinder Singh 机构:Computer Science & Engineering, University of Michigan, DeepMind 链接:https://arxiv.org/abs/2106.10316 摘要:One of the main challenges in model-based reinforcement learning (RL) is to decide which aspects of the environment should be modeled. The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning. Technically, VE distinguishes models based on a set of policies and a set of functions: a model is said to be VE to the environment if the Bellman operators it induces for the policies yield the correct result when applied to the functions. As the number of policies and functions increase, the set of VE models shrinks, eventually collapsing to a single point corresponding to a perfect model. A fundamental question underlying the VE principle is thus how to select the smallest sets of policies and functions that are sufficient for planning. In this paper we take an important step towards answering this question. We start by generalizing the concept of VE to order-$k$ counterparts defined with respect to $k$ applications of the Bellman operator. This leads to a family of VE classes that increase in size as $k \rightarrow \infty$. In the limit, all functions become value functions, and we have a special instantiation of VE which we call proper VE or simply PVE. Unlike VE, the PVE class may contain multiple models even in the limit when all value functions are used. Crucially, all these models are sufficient for planning, meaning that they will yield an optimal policy despite the fact that they may ignore many aspects of the environment. We construct a loss function for learning PVE models and argue that popular algorithms such as MuZero and Muesli can be understood as minimizing an upper bound for this loss. We leverage this connection to propose a modification to MuZero and show that it can lead to improved performance in practice.
【32】 Fully automated quantification of in vivo viscoelasticity of prostate zones using magnetic resonance elastography with Dense U-net segmentation 标题:用密集U网分割磁共振弹性成像全自动定量前列腺区的体内粘弹性
作者:Nader Aldoj,Federico Biavati,Marc Dewey,Anja Hennemuth,Patrick Asbach,Ingolf Sack 链接:https://arxiv.org/abs/2106.11284 摘要:Magnetic resonance elastography (MRE) for measuring viscoelasticity heavily depends on proper tissue segmentation, especially in heterogeneous organs such as the prostate. Using trained network-based image segmentation, we investigated if MRE data suffice to extract anatomical and viscoelastic information for automatic tabulation of zonal mechanical properties of the prostate. Overall, 40 patients with benign prostatic hyperplasia (BPH) or prostate cancer (PCa) were examined with three magnetic resonance imaging (MRI) sequences: T2-weighted MRI (T2w), diffusion-weighted imaging (DWI), and MRE-based tomoelastography yielding six independent sets of imaging data per patient (T2w, DWI, apparent diffusion coefficient (ADC), MRE magnitude, shear wave speed, and loss angle maps). Combinations of these data were used to train Dense U-nets with manually segmented masks of the entire prostate gland (PG), central zone (CZ), and peripheral zone (PZ) in 30 patients and to validate them in 10 patients. Dice score (DS), sensitivity, specificity, and Hausdorff distance were determined. We found that segmentation based on MRE magnitude maps alone (DS, PG: 0.93$\pm$0.04, CZ: 0.95$\pm$0.03, PZ: 0.77$\pm$0.05) was more accurate than magnitude maps combined with T2w and DWI_b (DS, PG: 0.91$\pm$0.04, CZ: 0.91$\pm$0.06, PZ: 0.63$\pm$0.16) or T2w alone (DS, PG: 0.92$\pm$0.03, CZ: 0.91$\pm$0.04, PZ: 0.65$\pm$0.08). Automatically tabulated MRE values were not different from ground-truth values (P>0.05). In conclusion: MRE combined with Dense U-net segmentation allows tabulation of quantitative imaging markers without manual analysis and independent of other MRI sequences and can thus contribute to PCa detection and classification.
【33】 Liquid Sensing Using WiFi Signals 标题:基于WiFi信号的液体传感
作者:Yili Ren,Sheng Tan,Linghan Zhang,Zi Wang,Zhi Wang,Jie Yang 机构:Florida State University, USA; Trinity University, USA 链接:https://arxiv.org/abs/2106.10356 摘要:The popularity of Internet-of-Things (IoT) has provided us with unprecedented opportunities to enable a variety of emerging services in a smart home environment. Among those services, sensing the liquid level in a container is critical to building many smart home and mobile healthcare applications that improve the quality of life. This paper presents LiquidSense, a liquid-level sensing system that is low-cost, highly accurate, widely applicable to different daily liquids and containers, and can be easily integrated with existing smart home networks. LiquidSense uses an existing home WiFi network and a low-cost transducer attached to the container to sense the resonance of the container for liquid level detection. In particular, our system mounts a low-cost transducer on the surface of the container and emits a well-designed chirp signal to make the container resonant, which introduces subtle changes to the home WiFi signals. By analyzing the subtle phase changes of the WiFi signals, LiquidSense extracts the resonance frequency as a feature for liquid level detection. Our system constructs prediction models for both continuous and discrete predictions using curve fitting and SVM respectively. We evaluate LiquidSense in home environments with containers of three different materials and six types of liquids. Results show that LiquidSense achieves an overall accuracy of 97% for continuous prediction and an overall F-score of 0.968 for discrete prediction. Results also show that our system has a large coverage in a home environment and works well under non-line-of-sight (NLOS) scenarios.
【34】 Differentiable Particle Filtering without Modifying the Forward Pass 标题:无需修改前向传播的可微粒子滤波
作者:Adam Ścibior,Vaden Masrani,Frank Wood 机构:University of British Columbia, Inverted AI, Mila 备注:11 pages, 3 figures 链接:https://arxiv.org/abs/2106.10314 摘要:In recent years particle filters have been used as components in systems optimized end-to-end with gradient descent. However, the resampling step in a particle filter is not differentiable, which biases gradients and interferes with optimization. To remedy this problem, several differentiable variants of resampling have been proposed, all of which modify the behavior of the particle filter in significant and potentially undesirable ways. In this paper, we show how to obtain unbiased estimators of the gradient of the marginal likelihood by only modifying messages used in backpropagation, leaving the standard forward pass of a particle filter unchanged. Our method is simple to implement, has a low computational overhead, does not introduce additional hyperparameters, and extends to derivatives of higher orders. We call it stop-gradient resampling, since it can easily be implemented with automatic differentiation libraries using the stop-gradient operator instead of explicitly modifying the backward messages.
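实现这一思想的一种方式是:前向传播保留普通的(不可微的)重采样,仅通过权重比 w / stop_grad(w)(其前向值恒为 1)重新接入梯度。下面的 PyTorch 草图按摘要的描述演示这一思路,并非论文估计量的逐字复现。

```python
import torch

def stop_gradient_resample(particles, log_weights):
    """Multinomial resampling whose forward pass is the standard
    (non-differentiable) one; gradients are reattached only through the
    ratio w / stop_grad(w), leaving the forward computation untouched."""
    n = particles.shape[0]
    w = torch.softmax(log_weights, dim=0)
    idx = torch.multinomial(w.detach(), n, replacement=True)  # ordinary resampling
    new_particles = particles[idx]
    # Forward value of this factor is exactly 1/n; backward carries dw.
    new_weights = w[idx] / (n * w[idx].detach())
    return new_particles, new_weights

x = torch.randn(100, 2)
lw = torch.randn(100, requires_grad=True)
xp, wp = stop_gradient_resample(x, lw)
print(wp.sum())  # equals 1 in the forward pass, yet differentiable w.r.t. lw
```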
【35】 GPLA-12: An Acoustic Signal Dataset of Gas Pipeline Leakage 标题:GPLA-12:天然气管道泄漏声信号数据集
作者:Jie Li,Lizhong Yao 机构:School of Intelligent Technology and Engineering, Chongqing University of Science and Technology, Chongqing, China, School of Electrical Engineering 链接:https://arxiv.org/abs/2106.10277 摘要:In this paper, we introduce a new acoustic leakage dataset of gas pipelines, called GPLA-12, which has 12 categories over 684 training/testing acoustic signals. Unlike massive image and voice datasets, there are relatively few acoustic signal datasets, especially for engineering fault detection. In order to enhance the development of fault diagnosis, we collect acoustic leakage signals on the basis of an intact gas pipe system with external artificial leakages, and then preprocess the collected data with structured tailoring, which is turned into GPLA-12. GPLA-12 is dedicated to serving as a feature learning dataset for time-series tasks and classifications. To further understand the dataset, we train both shallow and deep learning algorithms to observe the performance. The dataset as well as the pretrained models have been released at both www.daip.club and github.com/Deep-AI-Application-DAIP.